How I automated my backups to Amazon S3 using s3sync.

Posted on 10/09/06

UPDATE: See my newer article for the way I currently back up to Amazon S3.

Jeremy Zawodny has an excellent article/discussion about the different tools currently available to take advantage of Amazon Simple Storage Service (S3). After testing many of the tools currently available for S3, I decided to use the Ruby program s3sync to back up my data to S3.
As I explained in an earlier post, I wanted a simple, low-level tool to perform automatic backups to S3. I decided to use s3sync to do the heavy lifting and the jets3t Cockpit GUI to monitor my S3 account. The following explains how I successfully started automating my backups to S3 using s3sync and Cockpit.

My server is running Ubuntu Dapper with a Samba server. All the machines in my house use a “public” drive on the Samba server to store all files from Windows and Linux. All of our important files, like photos, home movies, and documents, are stored on this “public” drive. This simplifies the backup procedure, since I don’t have to back up multiple sources.

The following steps describe how I back up my “public” drive to Amazon’s awesome S3 storage service. I decided to post this because I haven’t found a fairly “simple” guide to automating backups to S3 in a way that works much like rsync on Linux. This is a follow-up to my original post on choosing a backup solution.


STEP 1: Activate an Amazon S3 account.

Go to http://www.amazon.com/s3 and sign up for an S3 web service account.

Have your Access Key ID and your Secret Access Key handy.

STEP 2: Install a management tool

(Update: I no longer use Cockpit. I now use the command line tools that come with s3sync, which were not available when I wrote the original article. See Option 1.)

Option 1: Use the command line shell tools that are included with s3sync (my new preferred method)

Here is a sampling of commands from the readme file for the command line tool, s3cmd.rb, that can be used to create buckets and verify upload success or failure. If you use this option, make sure you have the correct version of Ruby installed on your system and that you have downloaded the s3sync package (see Step 3).

List all the buckets your account owns:

s3cmd.rb listbuckets

Create a new bucket:

s3cmd.rb createbucket BucketName

Delete an old bucket you don’t want any more:

s3cmd.rb deletebucket BucketName

Find out what’s in a bucket, 10 lines at a time:

s3cmd.rb list BucketName 10

Only look in a particular prefix:

s3cmd.rb list BucketName:startsWithThis

I plan to write a shell script to verify success of backup and run via cron job each night, but I haven’t done it yet. I will update here when I do.
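
A verification pass like the one mentioned above could be as simple as comparing the local file list against a saved listing of the bucket's keys. The sketch below is only one possible shape for it. The function name missing_from_remote is made up for illustration, and it assumes you already have a text file with one S3 key per line, relative to the same root as the local folder. How to produce that file from s3cmd.rb output is not covered here and would need checking against the real output format.

```shell
#!/bin/bash
# Print local files (paths relative to $1) that are absent from a key listing.
# ASSUMPTION: $2 is a plain-text file with one S3 key per line, relative to
# the same root as $1. This helper is hypothetical, not part of s3sync.
missing_from_remote() {
    # List files relative to the local root, then drop every path that
    # appears verbatim in the listing (-F fixed, -x whole line, -v invert).
    (cd "$1" && find . -type f) | sed 's|^\./||' | grep -Fxv -f "$2"
    # grep exits 1 when nothing is missing; only a status above 1 is an error.
    [ $? -le 1 ]
}
```

Any path it prints exists locally but was not found in the listing, so empty output suggests the upload is complete.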

Option 2: Use a GUI tool (the original option that I used before the s3sync command line shell tools were available)
UPDATE: I have had trouble getting this (or any other GUI) to work for folders containing large numbers of files. If you plan to store thousands of files at Amazon, I suggest Option 1.

Download a GUI tool and make sure you can log into your S3 account, create a bucket, add files, and delete them.

I have tried a lot of them, but I prefer jets3t Cockpit. It is written in Java and is open source, and it can read objects uploaded to S3 by other tools. Some tools, like Jungle Disk, create buckets and objects in a proprietary format, which means you would not be able to use Jungle Disk to see files uploaded to S3 by other tools.

[Screenshot: Cockpit]

Create a bucket that you will store your backups in. Make sure to give your bucket a unique name, because bucket names have to be unique across all users of S3. Many people recommend using your S3 Access Key ID as a prefix, for example fakeaccesskey1234.backups. For the rest of this article, I will assume our bucket name is “mybucket”.

Cockpit will be a handy tool for you to monitor your backups in S3, but the actual file uploading/downloading will be done with a shell script using s3sync.

STEP 3: Install s3sync (ruby)

s3sync is an open source Ruby script that acts much like rsync, the Linux file sync program. Remember to read the README file that comes with s3sync. Also, all the normal warnings apply. Test this on a couple of folders and files you don’t care about, and make sure you understand what you are doing. Put the source and destination in the wrong order while using the --delete option and you could blow away all of your precious data.

Let’s move on.

The following steps apply to a Debian/Ubuntu-based distribution, but could easily be adapted to your own distro.

First, make sure you have Ruby 1.8.4 or greater and the OpenSSL library for Ruby.

$ sudo apt-get install ruby libopenssl-ruby

check ruby version

$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]

change into the directory where you want to install s3sync, like /home/john/s3sync

download and unpack s3sync

$ wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
$ tar xvzf s3sync.tar.gz

clean up

$ rm s3sync.tar.gz

make directory for ssl certificates and download some (important, read README for info about these SSL certs)

$ mkdir certs
$ cd certs
$ wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar

run this shell archive

$ sh ssl.certs.shar

get back into main s3sync dir

$ cd ..

create two files with your favorite editor, upload.sh and download.sh with the following contents and update to suit your needs. (Important, like rsync, slashes matter, see README for examples)

upload.sh —————————————-

#!/bin/bash
# Script to upload a local directory up to S3.
# Run from the directory that contains s3sync.rb; stop if it is missing.
cd /path/to/yourshellscript/ || exit 1
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete /home/john/localuploadfolder/ mybucket:/remotefolder
# copy and modify line above for each additional folder to be synced

download.sh —————————————-

#!/bin/bash
# Script to download a remote S3 folder to a local directory.
# Run from the directory that contains s3sync.rb; stop if it is missing.
cd /path/to/yourshellscript/ || exit 1
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete mybucket:/remotefolder/ /home/john/localdownloadfolder
# copy and modify line above for each additional folder to be synced

NOTICE: These scripts use the --delete option, which deletes any file on the destination that is not on the source. Also, these shell scripts contain your Amazon secret info, so you will want to make sure they are readable only by you (chmod 700, credit Kelvin below). You can also add the “-v” option to get verbose output about the changes. I did this after my initial upload, so I can monitor activity via cron job emails.
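
Because --delete mirrors whatever the source looks like, an empty or unmounted source folder would empty the destination as well. One cheap precaution is a check that refuses to continue when the source has nothing in it. This is a minimal sketch under that assumption. The helper name source_is_populated is hypothetical and is not part of s3sync.

```shell
#!/bin/bash
# Return success only if $1 is an existing directory with at least one entry.
# Hypothetical helper, not part of s3sync: it guards against running a
# --delete sync from an empty or unmounted source directory.
source_is_populated() {
    [ -d "$1" ] || return 1
    # ls -A lists every entry except . and .. so empty output means empty dir.
    [ -n "$(ls -A "$1" 2>/dev/null)" ]
}
```

In upload.sh it could sit just before the s3sync line, for example: source_is_populated /home/john/localuploadfolder || exit 1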

Create the local upload and download directories and put some test files in the upload folder

$ mkdir localuploadfolder
$ mkdir localdownloadfolder

change the permissions on the files

$ chmod 700 upload.sh
$ chmod 700 download.sh

Test upload.sh

$ ./upload.sh

Use s3cmd.rb or Cockpit to make sure you can see the files made it to Amazon.

Test download.sh

$ ./download.sh

The files you uploaded to S3 should now be in your localdownloadfolder.

Once you are confident everything is working and you understand what you are doing, change the shell scripts to back up your actual folders. Run the scripts manually first to ensure everything is working properly. Remember, the upload script will be limited by the upload speed of your ISP, which can be very slow. With a typical cable internet upload speed of 384 kbps, it will take approximately 6 hours to upload 1 GB. Download speeds are usually much faster, approximately 1 GB per 20 minutes, but hopefully you never need it.
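
To estimate the first upload for your own connection, the arithmetic is just size in bits divided by link speed. Here is a small sketch of that calculation. The function name transfer_seconds is made up for illustration, and the result ignores protocol overhead, so treat it as a lower bound.

```shell
#!/bin/bash
# Estimate transfer time in whole seconds.
#   $1 = size in bytes, $2 = link speed in bits per second
# Ignores protocol overhead, so real transfers will take somewhat longer.
transfer_seconds() {
    echo $(( $1 * 8 / $2 ))
}
```

For 1 GB (1000000000 bytes) at 384 kbps, transfer_seconds 1000000000 384000 prints 20833, which is a little under 6 hours.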

STEP 4: Set up a cron job to run the backup script once a week/month etc.

Once you are sure the script is working for your uploads, you can automate the task by creating a cron job to run once a week, day or month. I have it run once a week, because I do nightly backups locally to my Desktop machine using rsync.

$ crontab -e

add the following line.

30 2 * * sun /path/to/upload.sh

save and exit.

Obviously, monitor to make sure everything is working.
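
One way to make that monitoring easier is to keep a log of each run in addition to the cron emails. The wrapper below is a rough sketch, not something that ships with s3sync. The name run_logged and the log path in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# Run a command, append its output to a log file between timestamped
# start/end markers, and return the command's own exit status.
#   $1 = log file, remaining arguments = command to run
# Hypothetical helper for cron use; not part of s3sync.
run_logged() {
    local log=$1
    shift
    local status
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') start: $*"
        "$@"
        status=$?
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') end: exit $status"
    } >>"$log" 2>&1
    return "$status"
}
```

Inside upload.sh the sync line could then be wrapped, for example: run_logged /home/john/s3sync/upload.log ruby s3sync.rb -r -v --ssl --delete /home/john/localuploadfolder/ mybucket:/remotefolder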

STEP 5: kick back and relax

Now you can relax. If your laptop battery explodes and burns down your house, you know your data is safe sitting on Amazon’s geo-redundant servers right between some bits describing a new book from Oprah and a bad review of the latest Ben Affleck movie!

Feel free to leave a comment if you find this useful, incorrect, or just plain uninteresting.

UPDATE 1: One additional step I took was to create another bucket, where I uploaded all the code and scripts needed to restore my files using s3sync (minus my S3 account information).
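
If you do the same, the keys need to be blanked out of the scripts before they are uploaded. This is a minimal sketch of one way to do that. The function name strip_aws_keys and the CHANGEME placeholder are made up for illustration, and it only handles the two export lines in the exact form used in the scripts above.

```shell
#!/bin/bash
# Print a copy of the script in $1 with the values of the two AWS export
# lines replaced by a placeholder. Only lines of the exact form
# "export AWS_ACCESS_KEY_ID=..." and "export AWS_SECRET_ACCESS_KEY=..."
# are changed; everything else passes through untouched.
# Hypothetical helper, not part of s3sync.
strip_aws_keys() {
    sed -E 's/^(export AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY)=).*/\1CHANGEME/' "$1"
}
```

For example, strip_aws_keys upload.sh > upload.sh.template writes a copy that is safe to store, and it is worth reading the result once to confirm no key slipped through.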

UPDATE 2: I have changed chmod 755 to chmod 700 so the scripts are not readable by everyone (credit Kelvin below). I also updated the information about the tools I use. I no longer use Cockpit to verify success; I mostly rely on the s3sync command line tools that were not present at the time I wrote the original article.

UPDATE 3: I never gave enough credit to the actual author of s3sync. Without him, this entire process would not be possible, thanks again.

62 Comments

  1. Darren says:
    Wednesday, December 10, 2008 at 11:38am

    Been meaning to get round to this for ages and your tutorial really speeded up the process. Many thanks

  2. Kalid says:
    Sunday, January 4, 2009 at 6:40pm

    Thanks for the tutorial & walkthrough. I just setup a syncing cron job and it works well. I prefer S3Fox to the java applets, in case anyone is looking for a quick in-browser way to view your buckets. Again, appreciate the tutorial.

  3. rafiks says:
    Tuesday, January 13, 2009 at 3:17pm

    Hi! I currently have 2 Linux servers, a Windows PC and a MacBook that I am backing up with backuppc. I was wondering if anybody has ever tried backing up the local backuppc files to S3. I haven’t seen any backuppc integration with S3 as of now.

  4. Jason says:
    Saturday, February 14, 2009 at 10:39am

    I can’t get this to work for the life of me. Each time I run this command:

    ruby s3sync.rb -vdn -r --ssl /home/myfolder/localuploadfolder/mysql/ mybucketname:mysql

    All I get is:

    s3Prefix mysql
    localPrefix /home/myfolder/localuploadfolder/mysql/
    localTreeRecurse /home/myfolder/localuploadfolder/mysql
    s3TreeRecurse mybucketname mysql
    Creating new connection
    Trying command list_bucket mybucketname max-keys 200 prefix mysql delimiter / with 100 retries left
    Response code: 200

    I’ve tried adding slashes before the prefix, after it, all kinds of options, and I always get the same response. Any ideas?

  5. Matthew Clark says:
    Saturday, April 4, 2009 at 10:36am

    Brilliant, man… brilliant. I’m such a bad administrator — I run many websites from my home server for family and friends, but have never backed them up. I never had a problem, so I always thought “I’ll get to it tomorrow”. Your article helped me finally get around to it!

  6. svittal says:
    Monday, April 27, 2009 at 10:06am

    I’m trying to sync /var/log to S3.
    There are a few files in /var/log which belong to ‘syslog.adm’.
    I see files owned by root.root are being transferred without any problem. Files owned by syslog.adm are not moved.

    Any idea why?

  7. Anonymous says:
    Wednesday, September 23, 2009 at 10:02am

    A high-frequency s3sync over a large number of files is costly in terms of LIST request.

  8. Emre Akkas says:
    Friday, November 13, 2009 at 12:13pm

    Thanks for the great post. I checked the s3sync readme as well but could not figure out how to monitor progress (I am not much of a Linux person). Is there a way to write the progress to a log file (what has been uploaded etc.)?

  9. Simon says:
    Tuesday, November 24, 2009 at 4:44am

    Great article – thanks.

    Just to note that, in order to get s3sync to work, I had to make a small change to my s3config.rb to get it to check for the s3config.yml file in the local directory, as follows:-
    FROM: confpath = ["#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
    TO: confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]

    Hope this helps someone!

  10. Amedee Van Gasse says:
    Monday, December 14, 2009 at 3:52am

    Great article!
    I’m going to adapt it a bit to my own needs, in combination with backup2l, and then I’ll write a detailed article about it on my blog. That will be in Dutch.

    I’m also thinking about “bouncing” an EC2 server:
    * start the EC2 server
    * rsync from my machine to EC2
    * copy the data from EC2 to S3
    * shut down the EC2
    Ideally it would take less than one hour to do this so it would only cost me a couple of cents per day (or per week) to run the EC2 and I could use the rsync protocol more efficiently.

    By the way, you may want to delete some of the spam comments

  11. Kelso says:
    Thursday, August 12, 2010 at 2:04am

    Will this method still work?

  12. Amedee says:
    Monday, August 16, 2010 at 7:47am

    @Slow Down Music:
    I prefer BOTH: a hands-on backup AND a backup at a remote location. Preferably in another continent.
    Just in case a meteor strikes… ;-)

42 Trackbacks

  1. By Filter for 11/10 2006 - Felt on October 10, 2006 at 9:18 pm

    [...] John Eberly: How I automated my backups to Amazon S3 using s3sync Finally, a step-by-step on how to backup using Amazon S3. [...]

  2. By links for 2006-10-14 « Amy G. Dala on October 14, 2006 at 7:18 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: amazon s3 ruby software geekery) [...]

  3. By PapaScott » Blog Archive » links for 2006-10-16 on October 15, 2006 at 11:43 pm

    [...] How I automated my backups to Amazon S3 using s3sync (tags: backup amazon s3 s3sync rsync) [...]

  4. [...] How I automated my backups to Amazon S3 using s3sync.: interesting. I should probably try it. [...]

  5. By The JJW Blog :: links for 2006-10-16 on October 17, 2006 at 2:45 pm

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: s3 storage backup) [...]

  6. By links for 2006-10-23 at 59ideas on October 23, 2006 at 9:20 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: backup rsync) [...]

  7. By Marc Abramowitz » links for 2006-11-01 on November 1, 2006 at 11:07 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: backup s3 amazon) [...]

  8. By links for 2006-11-06 « Gobán Saor on November 6, 2006 at 8:43 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: s3 backup amazon s3sync) [...]

  9. By links for 2006-11-14 « Gobán Saor on November 14, 2006 at 7:34 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly’s Geek Blog (tags: backup s3 amazon ruby sync s3sync) [...]

  10. By links for 2006-11-29 « Bloggitation on November 28, 2006 at 5:20 pm

    [...] How I automated my backups to Amazon S3 using s3sync (tags: amazon s3 ruby backup) [...]

  11. By tecosystems » Friday Grab Bag From Frigid Denver on January 12, 2007 at 5:44 pm

    [...] Speaking of personal backups, I’ve finally settled on S3 as the will-be solution to my backup issues. I’ll maintain local copies for the sake of convenience, but given the fact that my music collection is – apart from my apartment – my most valuable material possession (I’m pretty sure it’s worth more than my car), I need offsite backups and S3 is the solution of choice. The problem is a.) the size of my music collection (50 GBs+; small by some standards, but large enough to be a problem), and b.) my absurdly slow upload cap (768, I think). Forgetting the math, because of spikes in upload capacity, it’s going to take days for the collection to upload. During that time, my local bandwidth will be negatively impacted, so the current plan is to initiate the sync to S3 shortly before I head to Boston next week for Mashup Camp. The upload client I’ve selected – it’s a Windows box, so this solution would require too much overhead – is JungleDisk. Anybody happen to know how it will behave if the upload is terminated prematurely? [...]

  12. [...] Next, I needed to establish an automated backup of both the webroot and our backed up MySQL databases to our predetermined offsite provider, Amazon’s S3. To do so, I followed these simple instructions. The author, John Eberly, walks you through the installation of a Ruby based rsync clone, s3sync, the creation of a simple bash script that will execute that script, and the scheduling of that job. While the notes are excellent and quite complete, a couple of issues/clarifications: [...]

  13. [...] Oh well, there is amazon and s3 python libraries. One last pain with this is that on the default debian 3.1 it doesn’t work with the version of ruby installed which is 1.8.2 but it needs 1.8.4 or greater… In case you are interested in setting it up for backups, there’s a great post on automating backups using s3 and s3sync , enjoy. [...]

  14. [...] This is extremely cheap for the peace of mind you can enjoy when you know “your data is safe sitting on Amazon’s geo-redundant servers right between some bits describing a new book from Oprah and a [...]

  15. [...] tool I looked at was the ruby-based s3sync. Following some instructions google found for me on John Eberly’s blog, I set about creating a storage bucket and began uploading the 1.5Gb of photos from my main [...]

  16. [...] http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/ Tags: backup, s3, amazon, ruby, sysadmin, storage, sync(del.icio.us history) [...]

  17. By Unatine :: blog : links for 2007-07-30 on July 30, 2007 at 5:32 pm

    [...] How I automated my backups to Amazon S3 using s3sync. Tags: none July 31, 2007, at 4:30 — links — BY-NC-SA [...]

  18. [...] How I automated my backups to Amazon S3 using s3sync Write-up of using s3sync to back up a server to Amazon S3. (tags: administration howto article aws) [...]

  19. [...] How I automated my backups to Amazon S3 using s3sync. Excellent step-by-step guide to ensuring that your data is safe. Considering that all of my data is on a VPS at Linode, this is something to look into. It certainly looks better than my current rsync strategy. (tags: s3 ruby sysadmin) [...]

  20. By Amazon S3 Storage Tools | Vinod Live! on August 19, 2007 at 12:03 pm

    [...] directory and an S3 bucket:prefix. It behaves somewhat, but not precisely, like the rsync program. John Eberly has an efficient way of using S3Sync with Jets3t. [...]

  22. By Tech Messages | 2007-09-11 | Slaptijack on September 11, 2007 at 5:32 pm

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly – I’m seriously considering moving all my backups to Amazon S3. Do you have any experience with the service? [...]

  23. By Nelson’s Backups on October 13, 2007 at 7:27 am

    [...] storing some (if not all) of his important files out on Amazon’s S3. There is even a great little ruby app that makes this super easy. Typical Debian operating [...]

  24. By links for 2007-11-13 on November 13, 2007 at 1:33 am

    [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly Jeremy Zawodny has an excellent article/discussion about the different tools currently available to take advantage of Amazon simple storage service (S3). After testing many tools available for S3 currently, I decided to use the ruby program s3sync to back (tags: article automation blog code command computer data filesystem guide hack hacks hosting howto imported linux mac macosx network online osx programming rails reference rubyonrails scripting server services software startup storage sysadmin Tech tool tools tutorial tutorials ubuntu Web2.0 webdev webservices windows work sync Ruby amazon backup s3) [...]

  25. [...] This is extremely cheap for the peace of mind you can enjoy when you know “your data is safe sitting on Amazon’s geo-redundant servers right between some bits describing a new book from Oprah [...]

  26. By On Amazon S3 and competitive advantage « by jan on December 23, 2007 at 4:15 am

    [...] can find many tutorials on how to use s3sync to do the backups as well. All that is very easy. Too easy actually. [...]

  27. [...] Para enviar o backup realizado para um conta no Amazon S3, que é o web service de storage da Amazon, siga as instruções abaixo que foram retiradas deste link. [...]

  28. [...] How I automated my backups to Amazon S3 using s3sync [...]

  29. [...] the most part, I took the advice of John Eberly in his automated S3 backups article. However, I did several things differently so I thought I would show what I did in an [...]

  30. [...] May 17, 2008 at 11:37 pm · Filed under How-to How I automated my backups to Amazon S3 using s3sync. | John Eberly: [...]

  31. [...] How I automated my backups to Amazon S3 using s3sync. | John Eberly (tags: .en amazon backup sauvegarde aws-s3 tutorial billet 2008 tutoriel) [...]

  32. [...] The mechanics of this – while not exactly rocket science – are not trivial either at this point. One needs an intermidiary piece of software to handle the mechnics of the backups and restore. You can read all the technical details of one person’s solution using S3 as a backup here: http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/. [...]

  34. [...] per Howto su Linux : John Eberly Questo post è stato scritto da antonde e pubblicato il Marzo 23, 2009 alle 10:04 am in howtocon [...]

  35. [...] John Eberly » How I automated my backups to Amazon S3 using s3sync. (tags: s3 backup server) Subscribe in a reader Subscribe to Arc Iris by Email Language [...]

  36. By Amazon S3 | I-Tek on April 29, 2009 at 8:47 am

    [...] How I automated my backups to Amazon S3 using s3sync Dit was geschreven door Herman. Geplaatst op dinsdag, 28 april 2009, at 21:15. Opgeslagen onder WebTek. Tagged Blog, internet, mobiel, sync, toepassing, web 2.0, WordPress. Bookmark de permalink. Volg commentaar via de RSS feed. Plaats een reactie of plaats een trackback. [...]

  37. [...] John Eberly » How I automated my backups to Amazon S3 using s3sync. (tags: webservices amazons3 amazon ec2 s3 backups admin reference sysadmin backup rsync sync aws automation ruby data server mac tools tutorial blog article software howto online todo storage ubuntu linux s3sync) [...]

  38. [...] Setup a backup script that pushes select important items up to s3 each week. I used s3sync to do this but you can use whatever you like. s3sync is straight forward and easy to use in command line scripts. Here is the s3sync site and here is a blog post that describes usage well. [...]

  39. [...] John Eberly’s blog was an inspiration to get started. Follow the link to his excellent blog post. Possibly related posts: (automatically generated)Use Amazon S3 online storage as an extra harddisk with Googles s3fsManage Amazon S3 BucketsUsing the Directory Editor in EmacsUsing Amazon S3 via s3sync [...]

  40. By Amazon S3 Backup script with encryption | *.hosting on February 16, 2010 at 1:36 pm

    [...] are already a few guides that show you how to implement s3sync on your [...]
