How I automated my backups to Amazon S3 using s3sync.

October 9th, 2006

UPDATE: See my newer article for the way I currently back up to Amazon S3.

Jeremy Zawodny has an excellent article/discussion about the different tools currently available to take advantage of Amazon Simple Storage Service (S3). After testing many of the tools currently available for S3, I decided to use the Ruby program s3sync to back up my data to S3.
As I explained in an earlier post, I wanted a simple low-level tool to perform automatic backups to S3. I decided to use s3sync to do the heavy lifting and the jets3t Cockpit GUI to monitor my S3 account. The following explains how I successfully started automating my backups to S3 using s3sync and Cockpit.

My server is running Ubuntu Dapper with a Samba server. All the machines in my house use a “Public” drive on the Samba server to store all files from Windows and Linux. All of our important files like photos, home movies, and documents are stored on this “public” drive. This simplifies the backup procedure, since I don’t have to back up multiple sources.

The following steps describe how I back up my “public” drive to Amazon’s awesome S3 storage service. I decided to post this because I haven’t found a fairly “simple” guide to actually automating backups to S3 in a way that functions similarly to rsync on Linux. This is a follow-up to my original post on choosing a backup solution.


STEP 1: Activate an Amazon S3 account.

Go to http://www.amazon.com/s3 and sign up for an S3 web service account.

Have your Access Key ID and your Secret Access Key handy.

STEP 2: Install a management tool

(Update: I no longer use Cockpit. I now use the command line tools that come with s3sync, which were not available when I wrote the original article; see Option 1.)

Option 1: use the command line shell tools that are included with s3sync (my new preferred method)

Here is a sampling of commands from the README for the command line tool s3cmd.rb, which can be used to create buckets and verify upload success or failure. If you use this option, make sure you have the correct version of Ruby installed on your system and that you have downloaded the s3sync package (see Step 3).

List all the buckets your account owns:

s3cmd.rb listbuckets

Create a new bucket:

s3cmd.rb createbucket BucketName

Delete an old bucket you don’t want any more:

s3cmd.rb deletebucket BucketName

Find out what’s in a bucket, 10 lines at a time:

s3cmd.rb list BucketName 10

Only look in a particular prefix:

s3cmd.rb list BucketName:startsWithThis

I plan to write a shell script to verify success of backup and run via cron job each night, but I haven’t done it yet. I will update here when I do.
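
Until then, here is a rough sketch of what such a check might look like. It is not the script referred to above. It assumes you already have two plain text files with one relative path per line: one listing the local files and one listing the keys in the bucket. The function name report_missing and the file paths are made up for illustration, and turning the output of "s3cmd.rb list" into one key per line is left to you, since its exact format is not assumed here.

```shell
#!/bin/sh
# Sketch only: print every path that is in the local list but not in the remote list.
# $1 = file with local paths, $2 = file with remote keys (one entry per line, an assumption).
report_missing() {
    sorted_local=$(mktemp) || return 1
    sorted_remote=$(mktemp) || return 1
    # comm requires both inputs to be sorted
    sort "$1" > "$sorted_local"
    sort "$2" > "$sorted_remote"
    # -23 suppresses lines unique to the remote list and lines common to both
    comm -23 "$sorted_local" "$sorted_remote"
    rm -f "$sorted_local" "$sorted_remote"
}

# Hypothetical use:
#   (cd /home/john/localuploadfolder && find . -type f | sed 's|^\./||') > /tmp/local.txt
#   report_missing /tmp/local.txt /tmp/remote.txt
```

If the function prints nothing, every local path was found in the remote list. Run from cron, any output would be mailed to you like the output of any other cron job.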

Option 2 (the original option that I used before the s3sync command line shell tools were available)
UPDATE: I have had trouble getting this (or any other GUI) to work for folders containing a large number of files. If you plan to store thousands of files at Amazon, I suggest Option 1.

Download a GUI tool and make sure you can log into your S3 account, create a bucket, add files, and delete them.

I have tried a lot of them, but I prefer jets3t Cockpit. It is written in Java and open source, plus it is able to read objects uploaded to S3 by other tools. Some tools like Jungle Disk create buckets and objects in a proprietary format. This means that with Jungle Disk you would not be able to see files uploaded to S3 by other tools.

[Screenshot: Cockpit]

Create a bucket that you will store your backups in. Make sure to give your bucket a unique name, because bucket names have to be unique across all users of S3. Many recommend using your Access Key ID from S3 as a prefix, for example fakeaccesskey1234.backups. For the rest of this article, I will assume our bucket name is “mybucket”.

Cockpit will be a handy tool for you to monitor your backups in S3, but the actual file uploading/downloading will be done with a shell script using s3sync.

STEP 3: Install s3sync (ruby)

s3sync is an open source Ruby script that acts similarly to rsync, the Linux file sync program. Remember to read the README file from s3sync. Also, all the normal warnings apply. Test this on a couple of folders and files you don’t care about and make sure you understand what you are doing. Put the source and destination in the wrong order while using the --delete option and you could blow away all of your precious data.

Let’s move on.

The following apply to a Debian/Ubuntu based distribution, but could easily be adapted to your own distro.

First, make sure you have Ruby 1.8.4 or greater and the OpenSSL library for Ruby

$ sudo apt-get install ruby libopenssl-ruby

check ruby version

$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]

change into the directory where you want to install s3sync, like /home/john/s3sync

download and unpack s3sync

$ wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
$ tar xvzf s3sync.tar.gz

clean up

$ rm s3sync.tar.gz

make directory for ssl certificates and download some (important, read README for info about these SSL certs)

$ mkdir certs
$ cd certs
$ wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar

run this shell archive

$ sh ssl.certs.shar

get back into main s3sync dir

$ cd ..

create two files with your favorite editor, upload.sh and download.sh with the following contents and update to suit your needs. (Important, like rsync, slashes matter, see README for examples)

upload.sh —————————————-

#!/bin/bash
# script to upload a local directory up to S3
cd /path/to/yourshellscript/
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete /home/john/localuploadfolder/ mybucket:/remotefolder
# copy and modify line above for each additional folder to be synced

download.sh —————————————-

#!/bin/bash
# script to download a remote S3 folder to a local directory
cd /path/to/yourshellscript/
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete mybucket:/remotefolder/ /home/john/localdownloadfolder
# copy and modify line above for each additional folder to be synced

NOTICE: These scripts use the --delete option, which deletes any file on the destination that is not on the source. Also, these shell scripts contain your Amazon secret info, so make sure they are readable only by you (chmod 700, credit Kelvin below). You can also add the “-v” option to get a verbose account of the changes. I did this after my initial upload so I can monitor activity via cron job emails.
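
Because --delete removes anything on the destination that is missing from the source, an empty or unmounted source folder could wipe out the copy on S3. The following is a minimal sketch of a guard you could add to upload.sh before the s3sync.rb line. It is plain shell, not an s3sync feature, and the function name source_is_populated is made up for illustration.

```shell
#!/bin/sh
# Sketch only: succeed when the directory exists and contains at least one entry.
# $1 = directory to check
source_is_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical use inside upload.sh, before the ruby s3sync.rb line:
#   source_is_populated /home/john/localuploadfolder/ || { echo "source is empty, skipping sync" >&2; exit 1; }
```

This only catches the completely empty case. It will not protect you from swapping source and destination, so testing on throwaway folders is still the most important step.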

Create the local upload and download directories and put some test files in the upload folder

$ mkdir localuploadfolder
$ mkdir localdownloadfolder

change the permissions on the files

$ chmod 700 upload.sh
$ chmod 700 download.sh

Test upload.sh

$ ./upload.sh

Use s3cmd.rb or Cockpit to make sure you can see the files made it to Amazon.

Test download.sh

$ ./download.sh

The files you uploaded to S3 should now be in your localdownloadfolder.

Once you are confident everything is working fine and you understand what you are doing, change the shell scripts to back up your actual folders. Run the scripts manually first to ensure everything is working properly. Remember, the upload script will be limited by the upload speed of your ISP, which can be very slow. With a typical cable internet upload speed of 384 kbps, it will take approximately 6 hours to upload 1GB. Download speeds are usually much faster, approximately 20 minutes per GB, but hopefully you never need it.
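
To get a feel for how long your own initial upload might take, the arithmetic can be sketched in shell. This is a back-of-the-envelope estimate that assumes 1 GB = 1000 MB and ignores protocol overhead and any throttling by your ISP. The function name transfer_seconds is made up for illustration.

```shell
#!/bin/sh
# Sketch only: estimate transfer time in whole seconds.
# $1 = size in megabytes (decimal), $2 = link speed in kilobits per second
transfer_seconds() {
    # megabytes -> kilobits is a factor of 8 * 1000
    echo $(( $1 * 8 * 1000 / $2 ))
}

transfer_seconds 1000 384   # 1 GB at 384 kbps, prints 20833
```

20833 seconds is a little under 5.8 hours, which is where the figure of roughly 6 hours per GB comes from.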

STEP 4: set up cronjob to run backup script once a week/month etc.

Once you are sure the script is working for your uploads, you can automate the task by creating a cron job to run once a week, day or month. I have it run once a week, because I do nightly backups locally to my Desktop machine using rsync.

$ crontab -e

add the following line.

30 2 * * sun /path/to/upload.sh

save and exit.

Obviously, monitor to make sure everything is working.
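
One way to make that monitoring easier is to have the cron job write a log with a timestamp and the exit status of each run. The following is a minimal sketch, not part of s3sync. The function name run_logged, the log path, and the wrapper script mentioned in the comments are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: run a command, appending its output and exit status to a log file.
# $1 = log file, remaining arguments = the command to run
run_logged() {
    logfile="$1"
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $*" >> "$logfile"
    status=0
    "$@" >> "$logfile" 2>&1 || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') END status=$status" >> "$logfile"
    return "$status"
}

# Hypothetical use in a small wrapper script called from crontab:
#   run_logged /home/john/s3sync/backup.log /path/to/upload.sh
```

A quick look at the last END line in the log then tells you whether the most recent run exited cleanly.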

STEP 5: kick back and relax

Now you can relax. If your laptop battery explodes and burns down your house, you know your data is safe sitting on Amazon’s geo-redundant servers, right between some bits describing a new book from Oprah and a bad review of the latest Ben Affleck movie!

Feel free to leave a comment if you find this useful, incorrect, or just plain uninteresting.

UPDATE 1: One additional step I took was to create a second bucket, where I uploaded all the necessary code/scripts to restore my files using s3sync (minus my S3 information).
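
If you want to do the same, the keys can be blanked out of a copy of each script before it is uploaded. The following is a minimal sketch and not necessarily how it was done here. It assumes the scripts set the keys with the same export lines shown in upload.sh and download.sh above. The function name redact_keys and the restore-kit folder are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read a script on stdin and replace the two AWS key values with a placeholder.
redact_keys() {
    sed -E 's/^(export AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY)=).*/\1REDACTED/'
}

# Hypothetical use:
#   redact_keys < upload.sh > restore-kit/upload.sh
```

Check the redacted copies by eye before uploading them, since a key set any other way would not be caught by this pattern.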

UPDATE 2: I have changed the chmod 755 to chmod 700 so the scripts are not readable by everyone (credit Kelvin below). I also updated the information about the tools I use. I no longer use Cockpit to verify success; I mostly rely on the s3sync command line tools that were not present at the time I wrote the original article.

UPDATE 3: I never gave enough credit to the actual author of s3sync. Without him, this entire process would not be possible, thanks again.

  1. mark
    October 11th, 2006 at 09:28

    Is there a way with s3sync to exclude certain files/folders like there is with rsync? I want to keep the cache directories on my web server from being backed up…there are thousands of files in each one that change every day.

  2. john.eberly
    October 11th, 2006 at 09:32

    Mark, good question, I was wondering that myself. Currently, I just specify each folder I want backed up in the parent directory, but exclude the folders I want to skip. However, this is not my preferred way either, because it means I would have to modify my script every time I add a new folder.

    I will research this further and post my findings here.

    Thanks, John

  3. john.eberly
    October 11th, 2006 at 10:27

    Mark, I tried the --exclude option with the current s3sync.rb and it does not recognize it. I also searched through the README and the Ruby code, but there is no mention of “exclude”, so I sent the developer an email asking if such a feature is planned.

  4. john.eberly
    October 11th, 2006 at 13:35

    I got an email back from the author of s3sync stating that there is no --exclude option for directories yet, but he plans on implementing it when he has time, if someone else doesn’t add it before him.

  5. October 13th, 2006 at 03:11

    Thanks, John. Nice article — I just wish you’d posted it about 2 weeks ago before I’d spent a few frustrating hours trying to get s3sync working for me properly.

    I appreciate the pointer to jets3t Cockpit, looks like the perfect tool, and no doubt a better solution than the S3 Organizer extension for Firefox.

    My question — why did you choose s3sync.rb over jets3t Synchronize?

  6. john.eberly
    October 14th, 2006 at 08:50

    Clay, I chose s3sync over jets3t not really for any solid reason. I had started working on my solution using s3sync before I knew about jets3t. It looks like jets3t would work nicely as well, but s3sync is in Ruby. I reasoned that if I had to modify anything, I would rather try to learn more about Ruby than Java.

  7. October 28th, 2006 at 15:29

    John,

    Great article. I have been using Jungle Disk, but the performance is horrible. What kind of performance are you getting with your solution? (I realize it is somewhat directly related to your upstream bandwidth.)

  8. John
    October 28th, 2006 at 15:57

    I tried jungledisk via mounting the webdav, but I had nothing but problems, slow, errors, crashes, etc. I wanted a pure low level command line tool, and s3sync has worked well for me.

    It usually takes me 3 hours/GB for the initial upload, and 15 minutes/GB down (but I very rarely pull data down). I have upgraded my Comcast plan to 8 Mbps down and 768 kbps up.

  9. gunfus
    November 5th, 2006 at 15:36

    John, nice write-up. I am looking for solutions to back up to S3, but I didn’t find much that was interesting. This one is particularly well detailed and uses Linux.

    One question I do have to ask is about your nightly backups to your desktop. So if I understood correctly:

    1) You have the public folder in your linux, and then of that you do backups to your desktop (windows?)

    2) then weekly you do a s3 sync?

    What is your monthly average cost? of the solution?

  10. John Eberly
    November 5th, 2006 at 16:02

    gunfus,

    I currently have the following at home

    1. Linux Server running samba with /var/public/ shared as “public” drive (320GB)
    2. Linux Desktop (250GB)
    3. Windows Desktop
    4. Windows Laptop
    5. Dual boot Linux/Windows laptop

    All machines have the “public” drive mounted or mapped and this is where all data is saved. I do not backup any local folders on the individual machines.

    Nightly, I use rsync to push changes on /var/public to my linux desktop in /backups/. Weekly, I run the s3sync to copy all changes on /var/public to S3.

    Currently, I have about 10GB on s3 and probably upload less than 1GB per week to s3. It currently costs me about $2-3 per week for storage on s3.

    Honestly, the next thing I would like to do is try and reduce my energy used by leaving on the desktop machine and server all night.

  11. gunfus
    November 6th, 2006 at 05:03

    John,
    Thanks for that info. If I may ask, why do a backup to your desktop and then to s3? Why not go directly to s3?

    I am asking because I am re-thinking my whole backup solution and how my computers are configured (in view of some recent loss of data).

    This time I am definitely going to upload to s3, and maybe every 6 months or a year I will do a DVD backup.

  12. John Eberly
    November 6th, 2006 at 11:01

    gunfus,

    My weekly backups do go directly to s3, I just do the nightly backups locally.

    My current reasons for backing up in order of priority:
    1. accidental deletion (backup locally/s3)
    2. hardware failure (backup locally)
    3. keep multiple versions (backup locally)
    4. fire etc, destroys my house. (backup s3)

    I didn’t mention this earlier to keep it simple, but I have data on the “public” drive that I don’t backup to s3 like GBs of disk images, music etc (non-critical/could be lost in fire).

    The reason I back up to a local machine instead of to s3 is that I back up the entire public drive (100+ GB) every night locally, but only the data I deem critical in case of fire, etc. gets backed up to s3 weekly, directly from my server. This way, if someone accidentally deletes all their files and the daily backup job runs, I would still have a copy of their files in the weekly s3 backup. If my house burns down I would lose my disk images and music, something I can live with to avoid additional off-site storage/transfer costs.

    There are thousands of ways you can choose to back up your data; just remember to take accidental deletion into account, because if you use the --delete option on s3sync it will automatically delete all files on s3 that do not exist locally.

    Ideally, if s3 was free and I had a much faster upload speed on my internet connection, I would make daily, weekly, monthly, and yearly backups to s3. But currently, I choose to make my daily and monthly backups locally, and send my weekly backups to s3.

    For what it is worth here are my personal opinions for a home backup solution:

    - Write down your plan before trying to set it up.
    - Keep it simple, try to get all the data in one location/directory that needs to be backed up. This is obviously not necessary, but I have found having data spread all over quickly gets confusing and you will forget what is backed up and what is not.
    - Automate it, otherwise it will never get done. This is why I wanted to avoid DVDs entirely.
    - Test a restore on a separate machine.
    - Keep weekly, monthly, and possibly yearly copies of your data, not just nightly copies.
    - Monitor your backups to make sure your process is working.

    I hope this response is not too confusing, just let me know if you still have questions. Basically, any backup solution should be tailored to your needs and risks. Like I said earlier, next I want to analyze how much electricity I am using to keep two machines running all night. If I decide that the costs of running the second machine just for local backups matches or exceeds the additional cost to backup to s3, I will consider switching all my backups to s3. Your post has prompted me to start looking at that again.

    Thanks for your posts gunfus.

  13. gunfus
    November 6th, 2006 at 20:54

    Hey thanks, for the wealth of advice. Do you want to look at it together..? maybe we can come up with some awesome website that talks about what we did and why we did it.. :) I tend to be a gregarious person. Anyways.. Yea.. I like your advice of writing down the backup strategy and the automation. Previously I had no write up of the backup strategy, I just kind of came up with it on the fly, had a lot of files spread and never really had tried to restore it, and it wasn’t fully automated to with the DVD backups. I manually had to copy the files every once in a while and burn the .zip files.

    This time around, with all the information and options out there and my data getting out of hand (with me and my wife taking pictures like mad), I decided to write down a layout of the network (because that is getting complicated as well). With that, I am now deciding how the backup strategy will work, and will then implement it. After all, this is how I learned to develop software, and I learned it the hard way.

    Just today, during work hours I downloaded jets3t Cockpit to try it out, and I am still setting up my server.. the latest ubuntu 6.10 server version seems to have some glitches that I am trying to resolve.

  14. John Eberly
    November 8th, 2006 at 21:40

    gunfus, glad you found some of my advice useful.

    I am also revisiting my backup strategy to possibly move more of it to S3.

    As more and more people put more of their life in digital format, there will be an increasing value in backups so we don’t lose any of our history.

    Keep me informed of your progress by posting here. I am always interested in other people’s solutions and by posting your results, it gives other readers multiple perspectives.

  15. November 13th, 2006 at 18:54

    John,

    Many thanks for the tips – I’m currently setting up s3sync to run on my NSLU2 (a $100 NAS device from Linksys) – first successful sync today! – and this post helped a lot :-)

    (Here’s the first of the blog posts I made as I went along, in case you or your readers are interested: http://www.gilesthomas.com/?p=5)

    Cheers,

    Giles

  16. November 21st, 2006 at 05:47

    Hi John,

    I do really appreciate your help. I have documented my solution on my website. I haven’t quite finished because I am still in testing mode, but the solution is there.

    For anyone using ubuntu and wanting to use a full java solution please visit: http://larinaandangel.gunfus.com and expand the linux section.

  17. December 3rd, 2006 at 05:20

    Thanks for the howto. I thought I’d give it a try on my MacBook running OS X 10.4.8.

    Unfortunately I’m unable to install libopenssl-ruby, since it isn’t available using Fink (the package management system for OS X).

    I tried to install a debian-based package from packages.debian.org/…/libopenssl-ruby but got an error message using
    sudo dpkg -i libopenssl-ruby1.8_1.8.2-7sarge4_i386.deb:

    “package architecture (i386) does not match system (darwin-i386)”

    ‘Doing the google’ for libopenssl-ruby doesn’t come up with a solution – Any ideas what to do, to get your setup working on mac os x?

  18. December 11th, 2006 at 11:13

    Phil, did you try using the full java solution instead of RUBY, just use the sync program that comes with jets3.

  19. John Eberly
    December 11th, 2006 at 13:02

    Sorry for the late response Phil. I am afraid I can’t be of much help with your MAC issue. But I also agree with gunfus, you might want to look at the jets3 java solution. It seems to do the same thing, and gunfus has a good post about how he used jets3. (see comment 25 for link to his site)

  20. January 23rd, 2007 at 19:18

    Nice writeup — very clear. My only question: you mentioned making the upload/download scripts readable “by only you,” (since you are putting your access keys inside them). But you did a chmod 755 *.sh — should you do a chmod 700 *.sh? (or maybe a 711?)

    Just my .02 — great tips either way!

  21. John Eberly
    February 1st, 2007 at 09:18

    Thanks Kelvin,

    You are correct, I should have used chmod 700 instead of 755. I have updated this posting to reflect your advice.

  22. February 17th, 2007 at 07:48

    Just a minor point, John; your download.sh (as posted) won’t work because you’re missing a # in front of “script to sync local directory upto s3″.

  23. John Eberly
    February 17th, 2007 at 10:33

    Thanks Luis, I have updated my post to reflect the changes.

  24. Jon
    April 13th, 2007 at 01:49

    John,
    I want to setup a system similar to yours. After hours of research it appears that I differ slightly in my needs.

    Instead of syncing up the data from the server level to s3, I instead want to set up webdrive to map a drive that will point to an external server, as well as communicate with s3. I know Jungle Disk can accomplish this (it can map its own drive). However, I have chosen not to use that tool because of its proprietary encryption format.

    So any pointers, how can I use webdrive to get data to s3?

    Also what did you use to map all your local data to store it on your public drive on the samba box?

    Thanks for the assistance!
    Jon

  25. John Eberly
    April 13th, 2007 at 09:34

    Jon, If you want an alternative to JungleDisk that maps to a windows drive letter, you might want to look at http://www.s3drive.net/ I have not tried it, but it looks like it does what you are looking for.

    There is also s3fox, a firefox extension for accessing your s3 data.

    For my “public” drive, I have a linux server running samba where I created a folder to store all of my data. All of my windows machines then are able to map the drive like any normal windows network share.

    I am not sure if I answered your questions or not, just let me know.
    John

  26. Jon
    April 13th, 2007 at 14:10

    John,
    So to map your public drive on your local machine do you simply use the built in ability within xp, or are you using another piece of software?

    Thank You
    Jon

  27. John Eberly
    April 13th, 2007 at 14:23

    Yes, that is the benefit to setting up a samba server on linux. It allows windows machines to share files without any special software, just use windows explorer and tools->map network drive after you have configured samba on the linux server.

    You can install basically any linux distro and samba or you could use some special purpose distro specially built for setting up a file server like the following:

    http://www.openfiler.com
    clarkconnect
    freenas

  28. Jon
    April 13th, 2007 at 14:32

    And because you are going direct from your linux box to s3 you don’t have to deal with slow uploads (for ex comcast) that make products like Jungle Disk less appealing.

  29. John Eberly
    April 13th, 2007 at 14:43

    Well actually, I still have to worry about my upload speed of my ISP (comcast) because my linux server is located at home. I upgraded my comcast plan which doubled my upload speed (to 768kbps), which helped. But it still took a couple of days to get the initial upload up to S3, but now it is much faster because I am only uploading the changes. I also tried jungle disk on both windows and linux awhile back, but had nothing but problems. I prefer the set up I have now and it has been working perfectly for me.

  30. April 13th, 2007 at 19:36

    Jon and John:

    J#2: Thanks again for the write-up — I’m using this on my VPS to backup; I do a sync without the delete flag every day, then weekly do a sync with the delete flag. A cheap man’s backup.

    Considering grabbing the python code and playing around with it, as I know python quite well, but know no ruby.

    J#1: If you haven’t already implemented a solution, the first thing that jumped into my head was using S3 with FUSE. In short, you get to have a mount that acts like a mounted drive, but the “drive” is S3. This thread on the Amazon Forum should help you out:

    http://developer.amazonwebservices.com/connect/thread.jspa?threadID=10271&start=0&tstart=0

  31. John Eberly
    April 13th, 2007 at 20:53

    Kelvin,

    That is a good idea with the delete flag only every week. I think I will switch my backups to using the delete flag only once a month, in case I totally mess something up, I have a longer period to recognize it.

  32. John Eberly
    April 16th, 2007 at 23:16

    Kelvin,

    I had been following the s3 fuse forum thread for awhile, but I kind of lost track of the progress. I think the ultimate solution would be mounting s3 using fuse.

    Have you successfully done this with fuse?

  33. April 25th, 2007 at 15:08

    I have been using s3sync.rb to back up a large number of files. It seems to me that for every file, on every backup, s3sync compares to see if it is newer than the remote file. This is extremely expensive in time and bandwidth. I’m taking a look at duplicity now because unfortunately, for larger backups, s3sync.rb seems to create a lot of waste. Am I correct?

  34. May 5th, 2007 at 23:00

    @John: I haven’t tried using FUSE+S3 quite yet, as the need hasn’t arisen.

    For others:

    S3 just got a tad cheaper:

    Current bandwidth price (through May 31, 2007)
    $0.20 / GB – uploaded
    $0.20 / GB – downloaded

    New bandwidth price (effective June 1, 2007)
    $0.10 per GB – all data uploaded

    $0.18 per GB – first 10 TB / month data downloaded
    $0.16 per GB – next 40 TB / month data downloaded
    $0.13 per GB – data downloaded / month over 50 TB
    Data transferred between Amazon S3 and Amazon EC2 will remain free of charge

    New request-based price (effective June 1, 2007)
    $0.01 per 1,000 PUT or LIST requests
    $0.01 per 10,000 GET and all other requests*
    * No charge for delete requests

  35. May 23rd, 2007 at 23:47

    Colin:

    Yes, I believe you are correct. I too am looking for more optimized solutions. In the end I broke up my upload.sh file into several different files, depending on how often I need to do backups. For instance, I might have upload-mysql.sh upload every day, but upload-home.sh upload once/week.

  36. Brian
    July 30th, 2007 at 05:49

    I’d love to hear how people compare using S3 for backups vs. Mozy. On a price point alone, the 200GB or so I would want to backup would cost me $30+ using S3 but only $6 using Mozy. Also, I don’t have to do anything myself other than tell Mozy what to backup. No scripting.

    Thoughts?

  37. John Eberly
    July 30th, 2007 at 06:25

    Brian,

    I think it kind of depends on your situation. For me, I was looking for a solution where I could back up directly from a Linux Server. Also, I only needed between 10-20GB of storage, so s3 seemed like a better solution for me.

    But for the average person with a Mac or PC, mozy seems like a very good option as well. Does Mozy support networked drives yet?

    Currently, I would like to switch from using s3sync to duplicity when I get a chance. As Colin said above, s3sync has to make a remote call for every file which is expensive in bandwidth and time. I plan to write a post once I have it successfully working.

  38. Daniel Grüßing
    August 11th, 2007 at 07:24

    Very cool Howto! Thank you very much!
    Brilliant work! =)

  40. taylor@unwirednation
    October 8th, 2007 at 15:34

    Phil,

    Try using macports for getting the libopenssl package installed… I would use macports for the entire ruby setup also.

  41. January 14th, 2008 at 02:49

    i wrote a rake script that uses this idea to
    commit/update files for rails project.

    certificate/library are installed automatically, only need to specify keys/what to sync and you’re done :>

    rake s3commit / s3update

    PS: plz mention it above if u like it

  42. March 21st, 2008 at 01:50

    Thanks for this awesome tool. It appears that Amazon is timing out the connection, and that s3sync isn’t successful in re-establishing the connection

  43. addady
    March 22nd, 2008 at 23:50

    Every time a local file has been changed, it uploads the whole new file, not just what has changed. That means that if you are using s3sync for regular backups, you are using bandwidth unnecessarily.

    Most of the daily change in user data is files that have been updated, not new files.

    You can bypass this limitation using rsync and 3rd party gateway like: http://www.s3rsync.com/

  44. Rafael
    June 26th, 2008 at 11:59

    John,

    Have you made any progress on getting Duplicity working?

  45. John Eberly
    July 21st, 2008 at 14:50

    Rafael, not yet. Actually, I have switched to using git for local backups, but unsure how I want to push all of that up to S3. I will post something about this later.

  46. david
    July 31st, 2008 at 22:52

    what happen?

    www:s3sync shidavid$ ./upload.sh
    /Library/Ruby/Site/1.8/net/https.rb:128:in `on_connect’: undefined method `ca_file=’ for #> (NoMethodError)
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:600:in `connect’
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:557:in `do_start’
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:552:in `start’
    from ./S3_s3sync_mod.rb:55:in `make_http’
    from ./s3try.rb:62:in `S3tryConnect’
    from ./s3try.rb:71:in `S3try’
    from s3sync.rb:285:in `s3TreeRecurse’
    from s3sync.rb:346:in `main’
    from ./thread_generator.rb:79:in `call’
    from ./thread_generator.rb:79:in `initialize’
    from ./thread_generator.rb:76:in `new’
    from ./thread_generator.rb:76:in `initialize’
    from s3sync.rb:267:in `new’
    from s3sync.rb:267:in `main’
    from s3sync.rb:735

  47. September 8th, 2008 at 05:28

    I use s3cmd from http://s3tools.logix.cz/s3cmd for my backups to S3. It can do rsync-style syncs to Amazon and PGP/GPG-encrypt data; that’s all I need.

    What I like is that it’s a Python script. Python is available virtually everywhere, while Ruby is rarely there by default. The second good thing about Python is that I know it (but not Ruby), which allows me to do simple tweaking myself.

    Next time consider giving s3cmd a try.

  48. Gabriel Sosa
    September 28th, 2008 at 09:14

    Hi there,
    This was a cool tutorial, but I’m having some trouble doing the backup through cron.

    I wrote my script, and when I run it manually it works fine, but when I set it up in the crontab it doesn’t work. Do you know if there is any way to debug why it isn’t working when run from cron?

    tty
    gabriel

  49. John Eberly
    September 28th, 2008 at 09:24

    @Gabriel, cron logs to syslog in Ubuntu.

    sudo grep cron /var/log/syslog

    But first off, make sure you use the full path to all programs/scripts in crontab.

    i.e., /home/me/script.rb, not just script.rb
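
    One way to reproduce this kind of cron-only failure from a normal terminal is to run the script under a stripped-down environment. The following is only a sketch: the function name run_like_cron, the PATH value, and the paths in the sample crontab line are assumptions for illustration, and the environment your cron actually provides may differ.

```shell
#!/bin/sh
# Sketch: run a command under a minimal, cron-like environment so that
# problems caused by a short PATH or missing variables show up interactively.
# The PATH value below is an assumption; check what your cron really uses.
run_like_cron() {
    env -i HOME="$HOME" LOGNAME="$(id -un)" PATH=/usr/bin:/bin SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Show the PATH a job would see under this assumed environment.
run_like_cron 'echo "$PATH"'

# Hypothetical crontab entry that uses full paths and appends all output
# (stdout and stderr) to a log file for later inspection:
# 30 2 * * * /home/me/upload.sh >> /home/me/upload.log 2>&1
```

    If the script works in your shell but fails inside run_like_cron, the cause is probably something your login environment provides and cron does not, such as a longer PATH.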

  50. Neil Smith
    October 13th, 2008 at 10:19

    Thanks for an easy to follow tutorial.
