How I automated my backups to Amazon S3 using rsync and s3fs.

October 27th, 2008

The following is how I automated my backups to Amazon S3 in about 5 minutes.

A lot has changed since my original post on automating my backups to S3 using s3sync. There are more mature and easier-to-use solutions now. I am switching because s3fs gives you many more options for using S3, is easier to set up, and is faster.

I now use s3fs to mount an S3 bucket on a local directory and then use rsync to keep the bucket up to date with my files. The following directions are geared towards Ubuntu Linux, but could be modified for any Linux distribution and Mac OS X.


STEP 1: Install s3fs

The first step is to install s3fs dependencies. (Assuming Ubuntu)

sudo apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev

Next, install the most recent version of s3fs. As of now the most recent is r177, but a quick check of s3fs downloads will show the most recent.

wget http://s3fs.googlecode.com/files/s3fs-r177-source.tar.gz
tar -xzf s3fs*
cd s3fs
make
sudo make install
sudo mkdir /mnt/s3
sudo chown yourusername:yourusername /mnt/s3
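
Before moving on, it may be worth confirming that the build landed on your PATH and that FUSE is available, since a missing FUSE device is one reason a mount can fail. The following is only a minimal sketch and not part of s3fs itself: the function name check_s3fs_prereqs is made up for illustration, and the optional argument (the FUSE device path, defaulting to /dev/fuse) exists mainly so the check is easy to try out.

```shell
#!/bin/bash
# Hypothetical helper (not part of s3fs): report whether s3fs and FUSE look usable.
# $1 (optional) is the FUSE device path; it defaults to /dev/fuse.
check_s3fs_prereqs() {
    local fuse_dev="${1:-/dev/fuse}"
    local rc=0

    # s3fs should be on the PATH after 'sudo make install'
    if ! command -v s3fs >/dev/null 2>&1; then
        echo "s3fs not found on PATH" >&2
        rc=1
    fi

    # The FUSE device must exist before anything can be mounted
    if [ ! -e "$fuse_dev" ]; then
        echo "$fuse_dev is missing (try: sudo modprobe fuse)" >&2
        rc=1
    fi

    return "$rc"
}
```

Running check_s3fs_prereqs && echo "ready to mount" prints the message only when both checks pass.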

STEP 2: Create a script to mount your Amazon S3 bucket using s3fs and sync files.

The following assumes you already have a bucket created on Amazon S3. If this is not the case, you can use a tool like s3Fox to create one.

Open a text editor of your choice and make a shell script that mounts your bucket, runs rsync, then unmounts. It is not necessary to unmount your S3 directory after each rsync, but I prefer to be safe: one mistake like an ‘rm’ on your root directory could wipe all of the files on your machine and on your S3 mount. You should probably start with a test directory to be safe.

Make the file s3fs.sh

#!/bin/bash
/usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3
/usr/bin/rsync -avz --delete /home/username/dir/you/want/to/backup /mnt/s3
/bin/umount /mnt/s3
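
One weakness of a script like this is that rsync still runs if the mount fails, in which case the files would be copied into the empty local /mnt/s3 directory instead of the bucket. A minimal sketch of a more defensive variant follows. It assumes the mountpoint utility (part of util-linux) is installed. The function name backup_to_s3 and the variable names are made up for illustration, and commands are called by name rather than by full path, so adjust them if the PATH used by cron differs on your system.

```shell
#!/bin/bash
# Sketch of a more defensive s3fs.sh. The bucket, keys, and paths are the
# same placeholders used in the script above.
BUCKET=yourbucket
MOUNT_DIR=/mnt/s3
SOURCE_DIR=/home/username/dir/you/want/to/backup

backup_to_s3() {
    local rc

    # Mount the bucket; stop here if s3fs reports a failure
    s3fs "$BUCKET" -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey "$MOUNT_DIR" || return 1

    # Assumption: 'mountpoint' (util-linux) is available. If the bucket is not
    # really mounted, skip rsync so nothing lands in the local directory.
    if ! mountpoint -q "$MOUNT_DIR"; then
        echo "$MOUNT_DIR is not mounted, skipping backup" >&2
        return 1
    fi

    # -z is left out: the destination is a local mount, so rsync's
    # transfer compression gains nothing here
    rsync -av --delete "$SOURCE_DIR" "$MOUNT_DIR"
    rc=$?

    # Always unmount, then report rsync's exit status to the caller
    umount "$MOUNT_DIR"
    return "$rc"
}
```

To use it as a script, add a final line that calls backup_to_s3. The function returns rsync's exit status, so a failed run can be detected by whatever calls it.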

Note the --delete option. This will delete any files from the backup that have been removed on the ‘source’.

Change permissions to make the script executable:

chmod 700 s3fs.sh

Before you run the entire script, you might want to run each line separately to make sure everything is working properly. The paths to rsync and umount might be different on your system (use ‘which rsync’ to check). Just for fun, I did a ‘df -h’, which showed I now have 256 Terabytes available on the s3 mount!
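
Because --delete removes files from the destination, a dry run is a cheap way to see what rsync intends to do before it touches anything. This is a small sketch rather than part of the original script: the function name preview_backup is hypothetical, while -n (the short form of --dry-run) is a standard rsync option.

```shell
#!/bin/bash
# Hypothetical helper: list what rsync would copy or delete, changing nothing.
# $1 = source directory, $2 = destination (for example /mnt/s3)
preview_backup() {
    # -n (--dry-run) makes rsync report its actions without performing them
    rsync -avn --delete "$1" "$2"
}
```

For example, preview_backup /home/username/dir/you/want/to/backup /mnt/s3 lists the files that would be transferred and any that would be deleted on the mount.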

Next, run the script and let it do its work. This could take a long time depending on how much data you are uploading initially. Your internet upload speed will be the bottleneck.

sudo ./s3fs.sh

That’s it! You are backing up to Amazon S3. You probably want to automate this using cron once you are sure everything is running OK. To keep this tutorial simple, let’s assume you are setting up the cron job as root so we don’t need to worry about permissions for mounting and unmounting the directory.

STEP 3: Automate it with cron

sudo su
crontab -e
# this runs it every day at midnight
0 0 * * * /path/to/s3fs.sh
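
Since cron runs unattended, it can help to keep a record of whether each run succeeded. The following is one possible sketch, not something the setup above requires: the function name run_and_log and the log path in the usage note are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, capture its output in a log file,
# and append a timestamped line with its exit status.
# $1 = log file, remaining arguments = the command to run
run_and_log() {
    local log_file="$1"
    shift
    local rc=0

    # Send both stdout and stderr of the command to the log
    "$@" >>"$log_file" 2>&1 || rc=$?

    echo "$(date '+%Y-%m-%d %H:%M:%S') exit status $rc: $*" >>"$log_file"
    return "$rc"
}
```

A small wrapper script called from cron could then end with something like run_and_log /var/log/s3fs-backup.log /path/to/s3fs.sh, and the last line of the log shows whether the most recent backup exited cleanly.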

p.s. I use this in combination with hourly backups to a second local machine using git to have revision history. I only back up nightly to S3, without revision history, in case my house burns down etc. If you would like to know how I set up my git backups locally, just leave a comment and I can make a follow-up post.

  1. October 28th, 2008 at 07:05

    Hi John, great write-up! Just an FYI: in this case the rsync -z switch (compression) has no effect because there is no remote rsync server. If required, the http://www.subcloud.com version provides compression (and encryption).

  2. October 28th, 2008 at 07:29

    Would be great if someone made a .deb and a gui for this.

    Yes, me lazy…

  3. October 28th, 2008 at 08:51

    Thanks Randy, I have updated the post.

  4. Richard
    October 30th, 2008 at 13:57

    this is awesome! And it even works. thank you so much!

  5. Shamus R
    November 5th, 2008 at 13:03

    This is fantastic — exactly what I’m looking to back-up my home server. One question: how would you go about adding e-mail verification? i.e. If the back-up is successful it sends an e-mail confirmation.

  6. Dave
    November 6th, 2008 at 10:11

    Thanks for the excellent article. I’m running into a “fuse: device not found, try ‘modprobe fuse’ first”. I’ve tried everything I can think of with no luck. sudo modprobe fuse runs (no output). Anyone else run into this or have any idea what’s wrong?

  7. November 6th, 2008 at 21:31

    Dave, I had the same problem with my Gutsy EC2 instance. You might want to check out this thread: http://groups.google.com/group/ec2ubuntu/browse_thread/thread/9093236bc07d220b/2bf41010b95f8646?hl=en&lnk=gst

    I installed fuse:
    apt-get install -y fuse-utils encfs

    and it worked for me. Not sure if I needed encfs but installed it anyway.

    BTW - Great post John - keep up the awesome work!

  8. Jay
    November 14th, 2008 at 05:22

    I for one would like to see more information on how you set up your computer to perform hourly backups using git to have a revision history.

    As a second part to my post:
    I added a few lines to the backup script described above to provide email support alerting me that the backup took place and describing the backup procedure. Here is an abbreviated sample of the script:

    #!/bin/bash
    SENDMAIL=/usr/sbin/sendmail
    EMAIL=jay@localhost
    # script to upload local directory up to s3
    # change to directory containing script
    cd /jdata/s3sync
    # jdata Directory
    export AWS_ACCESS_KEY_ID=88888888
    export AWS_SECRET_ACCESS_KEY=88888888
    export SSL_CERT_DIR=/jdata/s3sync/certs

    echo -e "To: ${EMAIL}\nSubject: s3backup results\nContent-type: text/plain\n\n" > /tmp/s3backup.log

    # add -n for a dry run
    # >> appends, so the mail headers written above are kept
    ruby s3sync.rb -r -v --ssl --delete /jdata/ jayNewBucket:/jdata >> /tmp/s3backup.log
    # copy and modify line above for each additional folder to be synced

    # home directory
    ruby s3sync.rb -r -v --ssl --delete /home/ jayNewBucket:/home >> /tmp/s3backup.log
    # copy and modify line above for each additional folder to be synced

    cat /tmp/s3backup.log | ${SENDMAIL} "${EMAIL}"

  9. Tom Metro
    November 14th, 2008 at 08:37

    Backing up to S3 isn’t necessarily the hard part. Backing up to S3 securely and efficiently is. Two things should be addressed in the intro to this howto: 1. Does using rsync in this fashion take full advantage of rsync? In other words, does s3fs permit rsync to obtain a hash of a portion of a file, and update a portion of a file, or do those operations require the transfer of an entire file? 2. While S3 may encrypt things on their end, some users would prefer a solution where encryption happens locally, so the data is safe over the wire, as well as when in storage. Where, if anywhere, does s3fs encrypt the data?

  10. Jack
    November 17th, 2008 at 09:52

    Just curious, how do your S3 charges look?

  11. Jay
    November 18th, 2008 at 07:06

    I have 16 GBs of storage. Below is my cost for the past month

    Greetings from Amazon Web Services,

    This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:

    Total: $2.52

    Please see the Account Activity area of the AWS web site for detailed account information:

  12. Chris
    November 19th, 2008 at 08:13

    What does your restore procedure look like? Backing up data is one thing, getting it back in a decent manner is another.

  13. November 20th, 2008 at 05:12

    Dave: Thanks for the excellent article. I’m running into a “fuse: device not found, try ‘modprobe fuse’ first”. I’ve tried everything I can think of with no luck. sudo modprobe fuse runs (no output). Anyone else run into this or have any idea what’s wrong?

    I’ve got the same issue. No solution found yet.
    Have to use s3cmd.

  14. Dave
    November 20th, 2008 at 07:08

    I ran into a few problems on three different boxes setting this up. I never got one of them working but the other two are working fine. See this thread http://groups.google.com/group/s3fs-devel/browse_thread/thread/34df46c5ca90560b

  15. Anonymous
    November 24th, 2008 at 08:38

    Very nice article, thank you for your time and detailed script. If you could help me, I am trying to figure out something and you might already have the answer.

    Ok, this is what I understand from the documentation of rsync/s3fs and s3sync:
    - s3sync uses MD5 checksum to check if a file has changed on your disk. This md5 is provided in the file listing from s3 (i.e. LIST request)
    - rsync compares the actual content of the files (doing md5 on portions of files) to determine what parts of the file have changed and only uploads what is really needed. However, s3 doesn’t allow retrieval of blocks but will send the whole file to you. s3fs actually does a cache of the files to limit the bandwidth, but comparing files with rsync will still require download of what is already on s3 to this cache.

    So, now, I wonder if there is no big bandwidth usage difference between using s3fs/rsync instead of s3sync?
    Did you evaluate the difference of bandwidth usage/price between when you had the s3sync backup and now?

  16. Bertrand
    November 24th, 2008 at 08:40

    The rsync --delete is not working properly for me. When I delete a single file, it works well: the file is also deleted on the S3 bucket. But when I delete a folder, both folders and files contained in it are still on the S3 bucket when I do a “s3cmd ls”. Do you have the same problem?

  17. Bertrand
    November 26th, 2008 at 08:20

    The rsync was really slow with s3fs, so searching around I found that Duplicity supports S3 backup. It was easy to configure and it works really well for me: embedded compression to save on space in S3 and encryption with gpg. I did quite a few trials and speed is also good: 140 MB backup in 15 min.

  18. Dragos
    November 27th, 2008 at 10:52

    Hi all.

    I have successfully installed s3fs on Ubuntu 8.04 2.6.15-51-server.

    The thing is, for any I/O operation on the mounted dir /mnt/dir-bkp I get an I/O error. Same thing for rsync

    eg rsync -va /home/dir1 /mnt/dir1-bkp/

    Output
    rsync: recv_generator: mkdir "/mnt/expo-bkp/dir1" failed: Input/output error (5)
    *** Skipping any contents from this failed directory ***

    Any ideas ?

  19. ChristianD
    December 16th, 2008 at 13:27

    Hey John,

    Happened upon your blog via google :)

    Do you have any experience syncing the other way around? I would like to keep a copy of our s3 assets in sync on the server. I have changed Paperclip to use the file system in development mode, and downloading GB’s of data from S3 is error prone.

  20. January 3rd, 2009 at 13:38

    Anyone else having problems with s3fs going ballistic, even when idle, and using 100% CPU? My laptop almost caught fire!

    There’s a note on 177 that it’s fixed but not for me.

    Anyone else?

  21. January 3rd, 2009 at 13:38

    Forgot to mention OS X 10.5 Intel.

  22. Dragos
    January 3rd, 2009 at 14:20

    I know it won’t help if you have decided to use s3fs, but as an alternative for backups I use http://s3sync.net/ or, if you want a more stable and professional solution, you can use an EC2 image acting as an rsync server.
