How I automated my backups to Amazon S3 using rsync and s3fs.
The following is how I automated my backups to Amazon S3 in about 5 minutes.
A lot has changed since my original post on automating my backups to S3 using s3sync. There are more mature and easier-to-use solutions now. I am switching because s3fs gives you many more options for using S3, it is easier to set up, and it is faster.
I now use a combination of s3fs to mount an S3 bucket to a local directory and rsync to keep that directory up to date with my files. The following directions are geared towards Ubuntu Linux, but could be modified for any Linux distribution and Mac OS X.
STEP 1: Install s3fs
The first step is to install the s3fs dependencies (assuming Ubuntu):
sudo apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev
Next, install the most recent version of s3fs. As of this writing the most recent is r177, but a quick check of the s3fs downloads page will show the latest release.
wget http://s3fs.googlecode.com/files/s3fs-r177-source.tar.gz
tar -xzf s3fs-r177-source.tar.gz
cd s3fs
make
sudo make install
sudo mkdir /mnt/s3
sudo chown yourusername:yourusername /mnt/s3
STEP 2: Create a script to mount your Amazon S3 bucket using s3fs and sync files.
The following assumes you already have a bucket created on Amazon S3. If this is not the case, you can use a tool like s3Fox to create one.
Using a text editor of your choice, make a shell script to mount your bucket, perform the rsync, then unmount. It is not necessary to unmount your S3 directory after each rsync, but I prefer to be safe: one mistake like an ‘rm’ on your root directory could wipe all of the files on your machine and your S3 mount. You should probably start with a test directory to be safe.
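A related hazard with this setup: if the s3fs mount fails, /mnt/s3 is just an ordinary empty local directory, and rsync will quietly copy everything onto your local disk instead of S3. The sketch below is one way to guard against that. It reads the Linux /proc/mounts table (an assumption: it will not work as-is on Mac OS X), and `is_mountpoint` and `safe_to_sync` are hypothetical helper names, not part of s3fs or rsync.

```shell
#!/bin/bash
# Sketch only: guard helpers for the backup script (hypothetical names, Linux-only).

# Return 0 if DIR appears as a mount point in MOUNTS_FILE (default /proc/mounts).
# Simplification: /proc/mounts escapes spaces as \040, so DIR must not contain spaces.
is_mountpoint() {
  local dir="$1" mounts="${2:-/proc/mounts}"
  [ -r "$mounts" ] || return 1
  # Field 2 of each /proc/mounts line is the mount point.
  awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}

# Refuse to continue unless DIR is really mounted.
safe_to_sync() {
  if ! is_mountpoint "$1" "${2:-/proc/mounts}"; then
    echo "refusing to sync: $1 is not mounted" >&2
    return 1
  fi
}

# Possible use in the script, between the s3fs and rsync lines:
#   safe_to_sync /mnt/s3 || exit 1
```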
Make the file s3fs.sh
#!/bin/bash
/usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3
/usr/bin/rsync -avz --delete /home/username/dir/you/want/to/backup /mnt/s3
/bin/umount /mnt/s3
Note the --delete option: it deletes any files on the S3 mount that have been removed from the source.
Change permissions to make it executable:
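Because --delete mirrors removals, it can help to preview what it would remove before the first real run. This sketch relies on rsync's standard --dry-run and --itemize-changes options, which report each file that would be removed on a line starting with `*deleting`; `preview_deletions` is a hypothetical helper name. Pass it the same source and destination arguments your script uses so the preview matches the real run.

```shell
#!/bin/bash
# Sketch: show what "rsync --delete" would remove, without changing anything.
preview_deletions() {
  local src="$1" dest="$2"
  # --dry-run: make no changes; --itemize-changes: one line per change,
  # where deletions are reported as "*deleting <path>".
  rsync -a --delete --dry-run --itemize-changes "$src" "$dest" | grep '^\*deleting'
}

# Example (with /mnt/s3 mounted):
#   preview_deletions /home/username/dir/you/want/to/backup /mnt/s3
```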
chmod 700 s3fs.sh
Before you run the entire script, you might want to run each line separately to make sure everything is working properly. The paths to rsync and umount might be different on your system (use ‘which rsync’ to check). Just for fun, I ran ‘df -h’, which showed I now have 256 terabytes available on the S3 mount!
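Rather than hard-coding /usr/bin/rsync and friends, you can look the paths up first. This sketch uses the shell builtin `command -v` (similar to `which`) and prints name=path pairs you could copy into the script; `resolve_tools` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print "name=path" for each command; fail and list any that are missing.
resolve_tools() {
  local tool path missing=""
  for tool in "$@"; do
    if path="$(command -v "$tool")"; then
      echo "$tool=$path"
    else
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "not found:$missing" >&2
    return 1
  fi
}

# Example:
#   resolve_tools s3fs rsync umount
```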
Next, run the script and let it do its work. This could take a long time depending on how much data you are uploading initially. Your internet upload speed will be the bottleneck.
sudo ./s3fs.sh
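To get a rough idea of how long that first upload will take, divide the data size by your upload bandwidth. The sketch below assumes a sustained upload rate in megabits per second (10^6 bits) and ignores protocol overhead, so treat the result as a lower bound; `upload_hours` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: rough lower bound on upload time, in hours.
# $1 = size in kilobytes (as printed by "du -sk"), $2 = upload speed in Mbit/s.
upload_hours() {
  awk -v kb="$1" -v mbit="$2" \
    'BEGIN { printf "%.1f\n", kb * 1024 * 8 / (mbit * 1000000) / 3600 }'
}

# Example: size of the backup directory at an assumed 1 Mbit/s upload speed.
#   upload_hours "$(du -sk /home/username/dir/you/want/to/backup | cut -f1)" 1
```

For instance, 16 GB (16777216 KB) at 1 Mbit/s works out to roughly 38 hours.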
That’s it! You are backing up to Amazon S3. You probably want to automate this using cron once you are sure everything is running OK. For the simplicity of this tutorial, let’s assume you are setting up the cron job as root so we don’t need to worry about permissions for mounting/unmounting the directory.
STEP 3: Automate it with cron
sudo su
crontab -e
and add the line:
0 0 * * * /path/to/s3fs.sh # this runs it every day at midnight
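Editing the crontab by hand works; if you would rather script it, one approach is to filter the existing crontab through a small function that appends the entry only when it is not already present, so re-running the setup does not create duplicate midnight jobs. `append_cron_line` is a hypothetical helper; `crontab -l` and `crontab -` are the standard ways to print and replace the current user's crontab (run them as root to match the post).

```shell
#!/bin/bash
# Sketch: print the crontab read on stdin, appending LINE only if it is missing.
append_cron_line() {
  local line="$1" current
  current="$(cat)"
  # Keep existing entries as-is.
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -x: whole-line match, -F: fixed string (so "*" is not treated as a pattern).
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Example (as root; "crontab -l" fails harmlessly when no crontab exists yet):
#   crontab -l 2>/dev/null | append_cron_line '0 0 * * * /path/to/s3fs.sh' | crontab -
```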
P.S. I use this in combination with hourly backups to a second local machine using git, which gives me revision history. I only back up nightly to S3, without revision history, in case my house burns down etc. If you would like to know how I set up my git backups locally, just leave a comment and I can make a follow-up post.

Hi John, great write-up! Just an FYI: in this case the rsync -z switch (compression) has no effect because there is no remote rsync server; if required, the http://www.subcloud.com version provides compression (and encryption).
Would be great if someone made a .deb and a gui for this.
Yes, me lazy…
Thanks Randy, I have updated the post.
This is awesome! And it even works. Thank you so much!
This is fantastic, exactly what I’m looking for to back up my home server. One question: how would you go about adding e-mail verification? I.e., if the backup is successful, it sends an e-mail confirmation.
Thanks for the excellent article. I’m running into a “fuse: device not found, try ‘modprobe fuse’ first”. I’ve tried everything I can think of with no luck. sudo modprobe fuse runs (no output). Anyone else run into this or have any idea what’s wrong?
Dave, I had the same problem with my Gutsy EC2 instance. You might want to check out this thread: http://groups.google.com/group/ec2ubuntu/browse_thread/thread/9093236bc07d220b/2bf41010b95f8646?hl=en&lnk=gst
I installed fuse:
apt-get install -y fuse-utils encfs
and it worked for me. Not sure if I needed encfs, but I installed it anyway.
BTW, great post John, keep up the awesome work!
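The "fuse: device not found" error usually means the FUSE device node /dev/fuse is missing, typically because the fuse kernel module is not loaded. A quick check before mounting is sketched below; `check_fuse` is a hypothetical helper, and /dev/fuse is the standard device path on Linux.

```shell
#!/bin/bash
# Sketch: check for the FUSE character device before running s3fs (Linux).
check_fuse() {
  local dev="${1:-/dev/fuse}"
  if [ -c "$dev" ]; then
    echo "ok: $dev is present"
    return 0
  fi
  echo "missing: $dev (try 'sudo modprobe fuse' and check that the fuse package is installed)" >&2
  return 1
}

# Example, before the s3fs line in the backup script:
#   check_fuse || exit 1
```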
I for one would like to see more information on how you set up your computer to perform hourly backups using git to have a revision history.
As a second part to my post:
I added a few lines to the backup script described above so that it e-mails me when the backup has run, along with the backup log. Here is an abbreviated sample of the script:
#!/bin/bash
SENDMAIL=/usr/sbin/sendmail
EMAIL=jay@localhost
# script to upload local directory up to s3
# change to directory containing script
cd /jdata/s3sync
# jdata Directory
export AWS_ACCESS_KEY_ID=88888888
export AWS_SECRET_ACCESS_KEY=88888888
export SSL_CERT_DIR=/jdata/s3sync/certs
echo -e "To: ${EMAIL}\nSubject: s3backup results\nContent-type: text/plain\n\n" > /tmp/s3backup.log
# add -n for a dry run
ruby s3sync.rb -r -v --ssl --delete /jdata/ jayNewBucket:/jdata >> /tmp/s3backup.log
# copy and modify line above for each additional folder to be synced
# home directory
ruby s3sync.rb -r -v --ssl --delete /home/ jayNewBucket:/home >> /tmp/s3backup.log
# copy and modify line above for each additional folder to be synced
cat /tmp/s3backup.log | ${SENDMAIL} "${EMAIL}"
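To get a different subject line depending on whether the backup succeeded, one option is to build the message from the backup command's exit status. In this sketch `make_report` is a hypothetical helper and the address and log path are placeholders; `sendmail -t` tells sendmail to take the recipient from the To: header.

```shell
#!/bin/bash
# Sketch: build a plain-text e-mail whose subject reflects the backup's exit status.
# $1 = recipient, $2 = exit status of the backup command, $3 = log file for the body.
make_report() {
  local to="$1" status="$2" logfile="$3" result
  if [ "$status" -eq 0 ]; then
    result="OK"
  else
    result="FAILED (exit $status)"
  fi
  printf 'To: %s\nSubject: s3backup %s\nContent-Type: text/plain\n\n' "$to" "$result"
  cat "$logfile"
}

# Example (placeholder paths and address):
#   /path/to/s3fs.sh > /tmp/s3backup.log 2>&1
#   make_report jay@localhost $? /tmp/s3backup.log | /usr/sbin/sendmail -t
```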
Backing up to S3 isn’t necessarily the hard part. Backing up to S3 securely and efficiently is. Two things should be addressed in the intro to this how-to:
1. Does using rsync in this fashion take full advantage of rsync? In other words, does s3fs permit rsync to obtain a hash of a portion of a file and update a portion of a file, or do those operations require the transfer of an entire file?
2. While S3 may encrypt things on their end, some users would prefer a solution where encryption happens locally, so the data is safe over the wire as well as in storage. Where, if anywhere, does s3fs encrypt the data?
Just curious, how do your S3 charges look?
I have 16 GB of storage. Below is my cost for the past month:
Greetings from Amazon Web Services,
This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:
Total: $2.52
Please see the Account Activity area of the AWS web site for detailed account information:
And what does your restore procedure look like? Backing up data is one thing; getting it back in a decent manner is another.
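One straightforward restore approach with this setup is to reverse the rsync: mount the bucket as in s3fs.sh, then copy from the mount into a fresh directory, without --delete so nothing local is removed. In this sketch `restore_backup` is a hypothetical helper that refuses to write into a non-empty directory; the paths are placeholders.

```shell
#!/bin/bash
# Sketch: copy a backup from the mounted bucket into an empty restore directory.
# $1 = backup directory on the mount, $2 = restore target (created if missing).
restore_backup() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # Refuse to restore on top of existing files.
  if [ -n "$(ls -A "$dest")" ]; then
    echo "refusing to restore: $dest is not empty" >&2
    return 1
  fi
  # Trailing slashes: copy the contents of src into dest. No --delete here.
  rsync -a "$src/" "$dest/"
}

# Example (with the bucket mounted at /mnt/s3 as in s3fs.sh):
#   restore_backup /mnt/s3/backup /home/username/restore
```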
I’ve got the same issue. No solution found yet.
Have to use s3cmd.
I ran into a few problems on three different boxes setting this up. I never got one of them working but the other two are working fine. See this thread http://groups.google.com/group/s3fs-devel/browse_thread/thread/34df46c5ca90560b
Very nice article, thank you for your time and detailed script. If you could help me, I am trying to figure out something and you might already have the answer.
Ok, this is what I understand from the documentation of rsync/s3fs and s3sync:
- s3sync uses an MD5 checksum to check if a file has changed on your disk. This MD5 is provided in the file listing from S3 (i.e. the LIST request).
- rsync compares the actual content of the files (doing MD5 on portions of files) to determine which parts of a file have changed, and only uploads what is really needed. However, S3 doesn’t allow retrieval of blocks but will send the whole file to you. s3fs does cache files to limit the bandwidth, but comparing files with rsync will still require downloading what is already on S3 into this cache.
So now I wonder whether there is any real difference in bandwidth usage between s3fs/rsync and s3sync.
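Some background that may help with these bandwidth questions: when both source and destination are local paths, which is how rsync sees an s3fs mount, rsync defaults to whole-file copies (its delta-transfer algorithm is turned off), and by default it decides whether to copy a file with a "quick check" of size and modification time rather than by reading file contents. The sketch below mimics that quick check in shell purely as an illustration; it is not rsync's code, `needs_copy` is a hypothetical name, and it assumes GNU `stat -c` (Linux). Whether unchanged files then cost any S3 transfer depends on s3fs answering size and mtime from metadata, which is an assumption here.

```shell
#!/bin/bash
# Sketch of rsync's default "quick check" (illustration only, not rsync's code):
# a file is copied if the destination is missing or differs in size or mtime.
# Assumes GNU coreutils "stat -c" (Linux).
needs_copy() {
  local src="$1" dest="$2"
  [ -e "$dest" ] || return 0
  # %s = size in bytes, %Y = modification time in seconds since the epoch.
  [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dest")" ]
}
```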
Did you evaluate the difference of bandwidth usage/price between when you had the s3sync backup and now?
rsync --delete is not working properly for me. When I delete a single file, it works well: the file is also deleted on the S3 bucket. But when I delete a folder, both the folder and the files it contained are still on the S3 bucket when I do an “s3cmd ls”. Do you have the same problem?
rsync was really slow with s3fs, so searching around I found that Duplicity supports S3 backup. It was easy to configure and it works really well for me: built-in compression to save space on S3, and encryption with gpg. I did quite a few trials and the speed is also good: a 140 MB backup in 15 min.
Hi all.
I have successfully installed s3fs on Ubuntu 8.04 2.6.15-51-server.
The thing is, any I/O operation on the mounted dir /mnt/dir-bkp gives me an I/O error. Same thing for rsync, e.g.:
rsync -va /home/dir1 /mnt/dir1-bkp/
Output:
rsync: recv_generator: mkdir "/mnt/expo-bkp/dir1" failed: Input/output error (5)
*** Skipping any contents from this failed directory ***
Any ideas ?
Hey John,
Happened upon your blog via Google.
Do you have any experience syncing the other way around? I would like to keep a copy of our S3 assets in sync on the server. I have changed Paperclip to use the file system in development mode, and downloading GBs of data from S3 is error-prone.
Anyone else having problems with s3fs going ballistic, even when idle, and using 100% CPU? My laptop almost caught fire!
There’s a note on 177 that it’s fixed but not for me.
Anyone else?
Forgot to mention OS X 10.5 Intel.
I know it won’t help if you decided to use s3fs, but as an alternative for backups I use http://s3sync.net/, or, if you want a more stable and professional solution, you can use an EC2 image acting as an rsync server.