<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Eberly</title>
	<atom:link href="http://blog.eberly.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.eberly.org</link>
	<description>suggest a tagline....</description>
	<lastBuildDate>Mon, 28 Feb 2011 04:22:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Rails fuzzy searching with Sunspot gem</title>
		<link>http://blog.eberly.org/2010/07/27/rails-fuzzy-searching-with-sunspot-gem/</link>
		<comments>http://blog.eberly.org/2010/07/27/rails-fuzzy-searching-with-sunspot-gem/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 03:05:38 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
		<category><![CDATA[double-metaphone]]></category>
		<category><![CDATA[fuzzy]]></category>
		<category><![CDATA[metaphone]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[wildcard]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/?p=209</guid>
		<description><![CDATA[SEARCH FOR: &#8216;Jon Smath&#8217; and get => &#8216;John Smith&#8217; This post explains the easy way to get &#8220;fuzzy&#8221; search results when using sunspot with ruby on rails. This is probably obvious to solr experts out there, but I found the information to be lacking in the rails community. I originally had compared the results of [...]]]></description>
			<content:encoded><![CDATA[<p><strong>SEARCH FOR:  &#8216;Jon Smath&#8217; and get => &#8216;John Smith&#8217;</strong></p>
<p>This post explains an easy way to get &#8220;fuzzy&#8221; search results when using <a href="http://github.com/outoftime/sunspot">Sunspot</a> with <a href="http://rubyonrails.org/">Ruby on Rails</a>.   This is probably obvious to Solr experts, but I found the information lacking in the Rails community.   I originally <a href="http://blog.eberly.org/2009/07/14/fuzzy-search-results-solr-vs-sphinx/">compared the results of Solr vs Sphinx for fuzzy searching</a>.  In that comparison I used the Levenshtein distance for Solr, which turns out not to scale well and doesn&#8217;t always return the best results when there are exact matches.</p>
<p>For this reason, we switched to the &#8220;<a href="http://en.wikipedia.org/wiki/Double_Metaphone">Double-Metaphone</a>&#8221; algorithm for fuzzy searching in Solr.  It is a simple way to get fuzzy results from Solr searches that still scales well, since most of the work is done at indexing time.</p>
<p>Here is how to make it work using Sunspot.</p>
<p>Edit solr/conf/schema.xml and add the following line to the &lt;analyzer&gt; section.</p>
<p><script src="http://gist.github.com/493218.js?file=gistfile1.txt"></script></p>
<p>So it looks something like this.<br />
<script src="http://gist.github.com/493097.js?file=gistfile1.txt"></script></p>
<p>Note: order is important, and you will want this towards the bottom of the config block.   See the <a href="http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters">Solr Wiki</a> for more detailed information.</p>
<p>Last step: restart Solr and reindex.   Then test your searches with misspellings, such as the &#8216;Jon Smath&#8217; example above.</p>
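<p>The restart-and-reindex step can be sketched as a small shell function. This assumes Solr is the instance managed by the sunspot_rails rake tasks (sunspot:solr:stop, sunspot:solr:start, sunspot:reindex); the function name and the SOLR_WAIT pause are made up for illustration, so adjust them to your setup.</p>
<pre>
```shell
#!/bin/bash
# Sketch only: assumes Solr is managed through the sunspot_rails rake tasks.
# restart_and_reindex and SOLR_WAIT are illustrative names, not part of Sunspot.
restart_and_reindex() {
    # Stopping may fail if Solr was not running; carry on regardless.
    rake sunspot:solr:stop
    rake sunspot:solr:start || return 1
    # Give Solr a few seconds to come up before indexing against it.
    sleep "${SOLR_WAIT:-5}"
    rake sunspot:reindex
}
```
</pre>
<p>Run it from the Rails application root so that rake can find the Rakefile.</p>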
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2010/07/27/rails-fuzzy-searching-with-sunspot-gem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fuzzy search results, solr vs sphinx</title>
		<link>http://blog.eberly.org/2009/07/14/fuzzy-search-results-solr-vs-sphinx/</link>
		<comments>http://blog.eberly.org/2009/07/14/fuzzy-search-results-solr-vs-sphinx/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 04:15:02 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
				<category><![CDATA[rails]]></category>
		<category><![CDATA[full text search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[sphinx]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/?p=173</guid>
		<description><![CDATA[A post I made on my company blog about fuzzy search results comparing solr to sphinx. Fuzzy search results, solr vs sphinx]]></description>
			<content:encoded><![CDATA[<p>A post I made on my company blog about fuzzy search results comparing solr to sphinx.<br />
<a href="http://www.rubyglob.com/solr-vs-sphinx-fuzzy-search/">Fuzzy search results, solr vs sphinx</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2009/07/14/fuzzy-search-results-solr-vs-sphinx/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multi-staged deployment with versioning using git and capistrano.</title>
		<link>http://blog.eberly.org/2008/12/18/multi-staged-deployment-with-versioning-using-git-and-capistrano/</link>
		<comments>http://blog.eberly.org/2008/12/18/multi-staged-deployment-with-versioning-using-git-and-capistrano/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 23:41:41 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
		<category><![CDATA[git rails capistrano]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/?p=122</guid>
		<description><![CDATA[I recently posted an article on my company&#8217;s blog about using git with capistrano for deployment.]]></description>
			<content:encoded><![CDATA[<p>I recently posted an article on my company&#8217;s blog about <a href="http://www.rubyglob.com/multi-staged-deployment-with-versioning-using-git/">using git with capistrano for deployment</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2008/12/18/multi-staged-deployment-with-versioning-using-git-and-capistrano/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How I automated my backups to Amazon S3 using rsync and s3fs.</title>
		<link>http://blog.eberly.org/2008/10/27/how-i-automated-my-backups-to-amazon-s3-using-rsync/</link>
		<comments>http://blog.eberly.org/2008/10/27/how-i-automated-my-backups-to-amazon-s3-using-rsync/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 03:24:14 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[backups]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/?p=93</guid>
		<description><![CDATA[The following is how I automated my backups to Amazon S3 in about 5 minutes. A lot has changed since my original post on automating my backups to s3 using s3sync. There are more mature and easier to use solutions now. I am switching because using s3fs gives you many more options for using s3, [...]]]></description>
			<content:encoded><![CDATA[<p>The following is how I automated my backups to <a href="http://aws.amazon.com/s3/">Amazon S3</a> in about 5 minutes.</p>
<p>A lot has changed since my original post on <a href="http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/">automating my backups to s3 using s3sync</a>.  There are more mature and easier-to-use solutions now.  I am switching because s3fs gives you many more options for using S3, is easier to set up, and is faster.</p>
<p>I now use <a href="http://s3fs.googlecode.com/">s3fs</a> to mount an S3 bucket on a local directory and then use rsync to keep the bucket up to date with my files.  The following directions are geared towards Ubuntu Linux, but could be adapted for any Linux distribution or <a href="http://www.rsaccon.com/2007/10/mount-amazon-s3-on-your-mac.html">Mac OS X</a>.</p>
<p><span id="more-93"></span><br />
<strong>STEP 1: Install s3fs</strong></p>
<p>The first step is to install s3fs dependencies.  (Assuming Ubuntu)</p>
<pre>
sudo apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev
</pre>
<p>Next, install the most recent version of <a href="http://code.google.com/p/s3fs/">s3fs</a>.  As of now the most recent is r177, but a quick check of <a href="http://code.google.com/p/s3fs/downloads/list">s3fs downloads</a> will show the most recent.</p>
<pre>
wget http://s3fs.googlecode.com/files/s3fs-r177-source.tar.gz
tar -xzf s3fs*
cd s3fs
make
sudo make install
sudo mkdir /mnt/s3
sudo chown yourusername:yourusername /mnt/s3
</pre>
<p><strong>STEP 2: Create script to mount your Amazon s3 bucket using s3fs and sync files.</strong></p>
<p>The following assumes you already have a bucket created on Amazon S3.  If this is not the case, you can use a tool like <a href="https://addons.mozilla.org/en-US/firefox/addon/3247">s3Fox</a> to create one.</p>
<p>Use a text editor of your choice to make a shell script that mounts your bucket, runs rsync, then unmounts.  It is not necessary to unmount your S3 directory after each rsync, but I prefer to be safe: one mistake like an &#8216;rm&#8217; on your root directory could wipe all the files on your machine and on your S3 mount.  You should probably start with a test directory.</p>
<p>Make the file s3fs.sh</p>
<pre>
#!/bin/bash
/usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3
/usr/bin/rsync -avz --delete /home/username/dir/you/want/to/backup /mnt/s3
/bin/umount /mnt/s3
</pre>
<p>Note the --delete option: it removes from the destination any files that have been deleted on the &#8216;source&#8217;.<br />
Change the permissions to make the script executable:</p>
<pre>
chmod 700 s3fs.sh
</pre>
<p>Before you run the entire script, you might want to run each line separately to make sure everything is working properly.  The paths to rsync and umount might be different on your system (use &#8216;which rsync&#8217; to check).  Just for fun, I did a &#8216;df -h&#8217;, which showed I now have 256 terabytes available on the s3 mount!</p>
<p>Next, run the script and let it do its work.  This could take a long time depending on how much data you are uploading initially.  Your internet upload speed will be the bottleneck. </p>
<pre>
sudo ./s3fs.sh
</pre>
<p>That&#8217;s it!  You are backing up to Amazon S3.  You will probably want to automate this with cron once you are sure everything is running OK.  To keep this tutorial simple, let&#8217;s assume you set up the cron job as root, so we don&#8217;t need to worry about permissions for mounting and unmounting the directory.</p>
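<p>Before handing the script to cron, one extra guard is worth considering: if the s3fs mount fails, rsync would quietly copy everything into the plain local /mnt/s3 directory instead. The following is a minimal sketch of such a check. The function name sync_if_mounted is hypothetical, and it assumes the mountpoint utility from util-linux is installed.</p>
<pre>
```shell
#!/bin/bash
# Sketch only: sync_if_mounted is a hypothetical helper, not part of s3fs.
# It runs rsync only when the destination is an active mount point, so a
# failed s3fs mount does not silently fill the local /mnt/s3 directory.
sync_if_mounted() {
    local src="$1" dest="$2"
    # mountpoint(1) is part of util-linux on most Linux distributions.
    if mountpoint -q "$dest"; then
        rsync -avz --delete "$src" "$dest"
    else
        echo "refusing to sync: $dest is not a mount point" >&2
        return 1
    fi
}
```
</pre>
<p>In s3fs.sh the rsync line would then become: sync_if_mounted /home/username/dir/you/want/to/backup /mnt/s3</p>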
<p><strong>STEP 3: Automate it with cron</strong></p>
<pre>
sudo su
crontab -e
# add the line below to the crontab; it runs the script every day at midnight
0 0 * * * /path/to/s3fs.sh
</pre>
<p>p.s. I use this in combination with hourly backups to a second local machine using git to have revision history.  I only back up nightly to s3, without revision history, in case my house burns down etc.  If you would like to know how I set up my git backups locally, just leave a comment and I can write a follow-up post.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2008/10/27/how-i-automated-my-backups-to-amazon-s3-using-rsync/feed/</wfw:commentRss>
		<slash:comments>56</slash:comments>
		</item>
		<item>
		<title>Google Docs &amp; Spreadsheets toolbar button.</title>
		<link>http://blog.eberly.org/2007/01/29/google-docs-spreadsheets-toolbar-button/</link>
		<comments>http://blog.eberly.org/2007/01/29/google-docs-spreadsheets-toolbar-button/#comments</comments>
		<pubDate>Tue, 30 Jan 2007 04:29:04 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
				<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/2007/01/29/google-docs-spreadsheets-toolbar-button/</guid>
		<description><![CDATA[I use both Google Docs and the Google toolbar every day, and I was surprised to find there was no toolbar button for Google Docs &#038; Spreadsheets&#8230; so I made one. Download the Google Docs &#038; Spreadsheets toolbar button here. Enjoy!]]></description>
			<content:encoded><![CDATA[<p>I use both Google Docs and the Google toolbar every day, and I was surprised to find there was no toolbar button for Google Docs &#038; Spreadsheets&#8230; so I made one.</p>
<p>Download the <a title="Google Docs &#038; Spreadsheets toolbar button" href="http://toolbar.google.com/buttons/add?url=http://static.eberly.org/google/docstoolbarbutton.xml">Google Docs &#038; Spreadsheets toolbar button here.</a></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2007/01/29/google-docs-spreadsheets-toolbar-button/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>World&#8217;s worst use of a jpeg.</title>
		<link>http://blog.eberly.org/2006/10/12/worlds-worst-use-of-a-jpeg/</link>
		<comments>http://blog.eberly.org/2006/10/12/worlds-worst-use-of-a-jpeg/#comments</comments>
		<pubDate>Thu, 12 Oct 2006 22:19:19 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
				<category><![CDATA[random]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/2006/10/12/worlds-worst-use-of-a-jpeg/</guid>
		<description><![CDATA[NOTE: This was the top link on reddit.com for a while A jpeg is good for a lot of things, but a live data feed is not one of them. In fact it might be a violation of the Americans with Disabilities Act. The Seattle Fire Department has been hosting a live html feed of the latest [...]]]></description>
			<content:encoded><![CDATA[<p>NOTE: <a href="http://www.reddit.com/r/reddit.com/comments/lxbt/worlds_worst_use_of_a_jpeg/?already_submitted=true">This was the top link on reddit.com for a while</a></p>
<p>A jpeg is good for a lot of things, but a live data feed is not one of them.  In fact it might be a violation of the Americans with Disabilities Act.</p>
<p>The Seattle Fire Department has been hosting a <a href="http://www2.cityofseattle.net/fire/realTime911/getDatePubTab.asp">live html feed</a> of its latest 911 calls. Many people have found this information useful for various reasons, e.g. avoiding areas of major accidents or activity.   I have been using this data as a feed for <a href="http://www.seattle911.com">seattle911.com</a>, a classic Google Maps mashup that displays the 911 call data on a map of the Seattle area.  I host the site at my own expense, without any advertising.</p>
<p>Last night, I noticed www.seattle911.com was suddenly broken.  After 30 seconds of investigation, I found out that they had switched their data feed from text to a jpeg.<br />
<img alt="datafeed" id="image17" src="http://blog.eberly.org/wp-content/uploads/2006/10/datafeed.jpg" /></p>
<p><span id="more-14"></span></p>
<p><a href="http://www2.cityofseattle.net/fire/realTime911/showIncidentsSmall.htm">See it in action for yourself.</a></p>
<p>Here are their reasons for switching the feed to a jpeg, which I found via a message board at <a href="http://psrg.org/pipermail/qst_psrg.org/2006-October/000062.html">psrg.org</a>:</p>
<blockquote><p>&#8220;The following message has been posted to our web site: &#8220;PLEASE NOTE:  To address security concerns raised by the public safety community, Seattle Fire Department now displays current incident data as an image within your browser rather than as plain text.  The information displayed in the image is updated every minute and then automatically refreshed to the screen.  Historical incident information will continue to be displayed as text, just as it has been in the past. We are sorry for any inconvenience this action has caused.  <strong>Our intent is to enhance the safety of personnel and the public but still provide information about current emergencies in our community.</strong>  I&#8217;ve posted the above on the real-time page.&#8221;I know this does not address your specific concerns but it is the reason for the change.&#8221;</p>
<p>When I questioned the safety concerns, he responded:</p>
<p><strong>I cant go into details but putting the information in text (data) format allows people to pipe the data into other computer programs to instantly analyze patterns.  An image makes that very difficult (although not impossible).  We don&#8217;t want to make it easy for the &#8220;bad guys&#8221;.</strong></p>
<p>Rob  Bruce Miller wrote:  Starting today, SFD implemented a change, which When I questioned the safety concerns, he responded: &#8220;I can&#8217;t go into details but putting the information in text (data) format allows people to pipe the data into other computer programs to instantly analyze patterns.  An image makes that very difficult (although not impossible).  <strong>We don&#8217;t want to make it easy for the &#8220;bad guys&#8221;.&#8221;</strong></p>
<p>I think this might be a case of &#8220;we try to make it harder for the bad guys, who will still be able to do what they want, but in reality we just hurt everybody else&#8217;s fair use.&#8221;</p></blockquote>
<div style="margin: 0px; min-height: 14px">I don&#8217;t really understand the &#8220;security concerns&#8221;, but there are numerous reasons this is a step backwards for usability not to mention a possible violation of the  Americans with Disabilities Act:</div>
<div style="margin: 0px; min-height: 14px">
<ol>
<li>fonts can&#8217;t be resized for the visually impaired</li>
<li>takes 8 times longer to download for people on dial-up modems</li>
<li>unusable with screen readers for the blind, and awkward on PDAs and other small devices</li>
</ol>
</div>
<div style="margin: 0px; min-height: 14px">I am sure if the &#8220;bad guys&#8221; wanted to, they could download the image and get the text out mechanically anyway.  But what could they really do with this information?</div>
<p>If you want to show your support or opposition to this change, here are a few email addresses:</p>
<p>Nick Licata, City Council Person who oversees Public Safety: nick.licata {at] seattle.gov<br />
Gregory Dean, Seattle Fire Chief (through his assistant): debbie.brooks [at} seattle.gov</p>
<p>Here is a screenshot of seattle911.com before the lost data feed.<img id="image16" alt="Seattle911.com" src="http://blog.eberly.org/wp-content/uploads/2006/10/seattle911.gif" /></p>
<p>I am not really concerned about losing the feed for my site (it is less work for me anyway), but this change does nothing but hurt those who were using the feed for legitimate purposes, and does nothing to prevent the so-called &#8220;bad guys&#8221; from using the information.</p>
<p>UPDATE 1:  Just to prove how silly this idea is, I used this one line to extract all the data from the new &#8220;jpeg&#8221; method. If I can do this, I am sure the &#8220;bad guys&#8221; could as well: <code>djpeg -pnm -gray text.jpg | gocr -</code></p>
<p>UPDATE 2: I like <a href="http://reddit.com/">reddit.com</a>. <a href="http://reddit.com/info/lxbt/comments">Here is a link to the comments over there</a>.  This page was the number one link at reddit for a while last night, which sent over 20,000 unique visitors here over a 24-hour period.</p>
<p>UPDATE 3:  This topic was covered by the <a href="http://seattlepi.nwsource.com/local/288661_fireweb14.html">Seattle PI</a>, <a href="http://www.seattlest.com/archives/2006/10/13/seattle_fire_department_doesnt_want_you_looking_at_911_data.php">Seattlest</a> and <a href="http://www.technorati.com/search/http://blog.eberly.org/">many more</a>.</p>
<p>UPDATE 4: This post has been picked up by <a href="http://yro.slashdot.org/article.pl?sid=06/10/15/0017211">Slashdot.</a>  Also, the jpeg thing can be thwarted by one line <code>curl "www2.cityofseattle.net/fire/realTime911/sfdIncidentList.jpg" | djpeg -pnm -gray | gocr -</code> Credit <span class="little"><a href="http://reddit.com/user/probablycorey"><strong>probablycorey</strong></a>  from reddit for piping it in via curl.  I probably would have made it 2 lines.  Silly me.</span></p>
<p>UPDATE 5: Article at <a href="http://blog.programmableweb.com/?p=444">Programmableweb.com</a></p>
<p>UPDATE 6: A <a href="http://apps.leg.wa.gov/wslrcwsup/RCW%20%2042%20%20TITLE/RCW%20%2042%20.%2056%20%20CHAPTER/RCW%20%2042%20.%2056%20.030.htm">link to a Washington State law</a> regarding this topic.  Credit <a href="http://slashdot.org/%7Eamemily">amemily (462019)</a> from <a href="http://yro.slashdot.org/comments.pl?sid=200805&#038;cid=16448209">Slashdot.</a></p>
<p>UPDATE 7: <a href="http://msnbc.msn.com/id/15419164/wid/11915829?GT1=8618">A lawsuit related to this issue.</a></p>
<p>UPDATE 8: An interesting perspective on <a href="http://www.forbes.com/2006/04/15/open-source-intelligence_cx_rs_06slate_0418steele.html">Open Source Intelligence.</a></p>
<p>UPDATE 9: An article in my home state newspaper &#8211; <a href="http://journalstar.com/articles/2007/01/03/news/local/doc459b0613d4796004217407.txt">Lincoln Journal-Star </a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2006/10/12/worlds-worst-use-of-a-jpeg/feed/</wfw:commentRss>
		<slash:comments>58</slash:comments>
		</item>
		<item>
		<title>How I automated my backups to Amazon S3 using s3sync.</title>
		<link>http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/</link>
		<comments>http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/#comments</comments>
		<pubDate>Tue, 10 Oct 2006 00:45:40 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
				<category><![CDATA[backups]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[aws]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/archive/how-automate-your-backup-to-amazon-s3-using-s3sync/</guid>
		<description><![CDATA[UPDATE: See my newer article for the way I currently back up to Amazon S3. Jeremy Zawodny has an excellent article/discussion about the different tools currently available to take advantage of Amazon Simple Storage Service (S3). After testing many tools available for S3 currently, I decided to use the ruby program s3sync to back up my data [...]]]></description>
			<content:encoded><![CDATA[<p>UPDATE:  <a href="http://blog.eberly.org/2008/10/27/how-i-automated-my-backups-to-amazon-s3-using-rsync/">See my newer article for the way I currently back up to Amazon S3.</a></p>
<p><a href="http://jeremy.zawodny.com/blog/archives/007641.html">Jeremy Zawodny</a> has an excellent article/discussion about the different tools currently available to take advantage of Amazon Simple Storage Service (S3).  After testing many of the tools currently available for S3, I decided to use the Ruby program s3sync to back up my data to S3.<br />
As I explained in an earlier post, I wanted a simple low-level tool to perform automatic backups to S3.  I decided to use <a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=11975&#038;start=0&#038;tstart=0">s3sync</a> to do the heavy lifting and the <a href="https://jets3t.dev.java.net/cockpit.html">jets3t Cockpit</a> GUI to monitor my S3 account.  The following explains how I successfully started automating my backups to S3 using s3sync and Cockpit.</p>
<p>My server is running Ubuntu Dapper with a Samba server. All the machines in my house use a &#8220;Public&#8221; drive on the Samba server to store all files from Windows and Linux. All of our important files like photos, home movies, and documents are stored on this &#8220;public&#8221; drive. This simplifies the backup procedure, since I don&#8217;t have to back up multiple sources.</p>
<p>The following steps describe how I back up my &#8220;public drive&#8221; to Amazon&#8217;s awesome S3 storage service. I decided to post this because I haven&#8217;t found a reasonably &#8220;simple&#8221; guide to automating backups to S3 in a way that works like rsync on Linux. This is a follow-up to my original post on <a href="http://blog.eberly.org/2006/10/02/cheap-reliable-secure-off-site-storage-for-digital-life-backup-where-are-you/">choosing a backup solution</a>.</p>
<p><span id="more-10"></span><br />
<strong>STEP 1: Activate an Amazon s3 account.</strong></p>
<p>Go to <a href="http://www.amazon.com/s3">http://www.amazon.com/s3</a> and sign up for an S3 web service account.</p>
<p>Have your Access Key ID and your Secret Access Key handy.</p>
<p><strong>STEP 2: Install a management tool</strong></p>
<p>(Update: I no longer use Cockpit. I now use the command line tools that come with s3sync, which were not available when I wrote the original article; see Option 1.)</p>
<p><strong>Option 1:</strong> use the command line shell tools that are included with s3sync (my new preferred method)</p>
<p>Here is a sampling of commands from the readme file for the command line tool s3cmd.rb, which can be used to create buckets and verify upload success or failure.  If you use this option, make sure you have the correct version of Ruby installed on your system and have downloaded the s3sync package (see Step 3).</p>
<p>List all the buckets your account owns:</p>
<pre>
s3cmd.rb listbuckets</pre>
<p>Create a new bucket:</p>
<pre>
s3cmd.rb createbucket BucketName</pre>
<p>Delete an old bucket you don&#8217;t want any more:</p>
<pre>
s3cmd.rb deletebucket BucketName</pre>
<p>Find out what&#8217;s in a bucket, 10 lines at a time:</p>
<pre>
s3cmd.rb list BucketName 10</pre>
<p>Only look in a particular prefix:</p>
<pre>
s3cmd.rb list BucketName:startsWithThis</pre>
<p>I plan to write a shell script that verifies the success of the backup and runs as a cron job each night, but I haven&#8217;t done it yet.  I will update here when I do.</p>
<p><strong>Option 2</strong> (original option that I used before s3sync command line shell tools were available)<br />
UPDATE:  I have had trouble getting this (or any other GUI) to work for folders containing large numbers of files.  If you plan to store thousands of files at Amazon, I suggest Option 1.</p>
<p>Download a GUI tool and make sure you can log into your S3 account, create a bucket, add files, and delete them.</p>
<p>I have tried a lot of them, but I prefer <a href="https://jets3t.dev.java.net/cockpit.html">jets3t Cockpit</a>. It is written in Java and open source, plus it is able to read objects uploaded to S3 by other tools. Some tools like Jungle Disk create buckets and objects in a proprietary format. This means that with Jungle Disk you would not be able to see files uploaded to S3 by other tools.<br />
Here is a screenshot of Cockpit.</p>
<p><img id="image11" alt="Cockpit" src="http://blog.eberly.org/wp-content/uploads/2006/10/cockpit.jpg" /><br />
Create a bucket to store your backups in. Make sure to give your bucket a unique name, because bucket names have to be unique across all users of S3. Many people recommend using your Access Key ID from S3 as a prefix, for example fakeaccesskey1234.backups. For the rest of this article, I will assume our bucket name is &#8220;mybucket&#8221;.</p>
<p>Cockpit will be a handy tool for you to monitor your backups in S3, but the actual file uploading/downloading will be done with a shell script using s3sync.</p>
<p><strong>STEP 3: Install </strong><a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=11975&#038;start=0&#038;tstart=0">s3sync</a><strong> (ruby)</strong></p>
<p><a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=11975&#038;start=0&#038;tstart=0">s3sync</a> is an open source Ruby script that works much like rsync, the Linux file sync program. Remember to read the README file from s3sync. Also, all the normal warnings apply. Test this on a couple of folders and files you don&#8217;t care about and make sure you understand what you are doing. Put the source and destination in the wrong order while using the --delete option and you could blow away all of your precious data.</p>
<p>Let&#8217;s move on.</p>
<p>The following steps apply to a Debian/Ubuntu based distribution, but could easily be adapted to your own distro.</p>
<p>First, make sure you have Ruby 1.8.4 or greater and the OpenSSL library for Ruby:</p>
<pre>$ sudo apt-get install ruby libopenssl-ruby</pre>
<p>check ruby version</p>
<pre>$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]</pre>
<p>change into the directory where you want to install s3sync, like /home/john/s3sync</p>
<p>download and unpack s3sync</p>
<pre>$ wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
$ tar xvzf s3sync.tar.gz</pre>
<p>clean up</p>
<pre>$ rm s3sync.tar.gz</pre>
<p>make directory for ssl certificates and download some (important, read <a href="http://s3.amazonaws.com/ServEdge_pub/s3sync/README.txt">README</a> for info about these SSL certs)</p>
<pre>$ mkdir certs
$ cd certs
$ wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar</pre>
<p>run this shell archive</p>
<pre>$ sh ssl.certs.shar</pre>
<p>get back into main s3sync dir</p>
<pre>$ cd ..</pre>
<p>With your favorite editor, create two files, upload.sh and download.sh, with the following contents, and update them to suit your needs. (Important: as with rsync, trailing slashes matter; see the README for examples.)</p>
<p>upload.sh:</p>
<pre>#!/bin/bash
# script to upload a local directory up to S3
cd /path/to/yourshellscript/
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete /home/john/localuploadfolder/ mybucket:/remotefolder
# copy and modify line above for each additional folder to be synced
</pre>
<p>download.sh:</p>
<pre>#!/bin/bash
# script to download a remote S3 folder to a local directory
cd /path/to/yourshellscript/
export AWS_ACCESS_KEY_ID=yourS3accesskey
export AWS_SECRET_ACCESS_KEY=yourS3secretkey
export SSL_CERT_DIR=/your/path/to/s3sync/certs
ruby s3sync.rb -r --ssl --delete mybucket:/remotefolder/ /home/john/localdownloadfolder
# copy and modify line above for each additional folder to be synced
</pre>
<p>NOTICE: These scripts use the --delete option, which deletes any file on the destination that is not on the source. Also, these shell scripts contain your Amazon secret info, so make sure they are readable only by you (chmod 700, credit Kelvin below).  You can also add the &#8220;-v&#8221; option to get verbose output of the changes.  I did this after my initial upload, so I can monitor activity via cron job emails.</p>
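<p>One way to limit the exposure of the secret key is to keep the two export lines in a separate file and have the scripts load it only when its permissions are tight. The following is a rough sketch of that idea, not something s3sync provides: the function name load_s3_credentials and the key file are hypothetical, and stat -c is the GNU coreutils form found on Debian/Ubuntu.</p>
<pre>
```shell
#!/bin/bash
# Sketch only: load_s3_credentials is a hypothetical helper, not part of s3sync.
# It sources a key file only if that file is readable by its owner alone.
load_s3_credentials() {
    local file="$1" mode
    # stat -c is the GNU coreutils form (Debian/Ubuntu); BSD stat differs.
    mode=$(stat -c '%a' "$file") || return 1
    if [ "$mode" != "600" ]; then
        echo "refusing to read $file: mode is $mode, expected 600" >&2
        return 1
    fi
    # The file is expected to hold the AWS_ACCESS_KEY_ID and
    # AWS_SECRET_ACCESS_KEY export lines used in upload.sh and download.sh.
    . "$file"
}
```
</pre>
<p>upload.sh and download.sh could then call load_s3_credentials with the path to that file (and exit if it fails) in place of their two export lines.</p>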
<p>Create the local upload and download directories and put some test files in the upload folder</p>
<pre>$ mkdir localuploadfolder
$ mkdir localdownloadfolder</pre>
<p>change the permissions on the files</p>
<pre>$ chmod 700 upload.sh
$ chmod 700 download.sh</pre>
<p>Test upload.sh</p>
<pre>$ ./upload.sh</pre>
<p>Use s3cmd.rb or Cockpit to make sure you can see the files made it to Amazon.</p>
<p>Test download.sh</p>
<pre>$ ./download.sh</pre>
<p>The files you uploaded to S3 should now be in your localdownloadfolder.</p>
<p>Once you are confident everything is working and you understand what you are doing, change the shell scripts to back up your actual folders. Run the scripts manually first to ensure everything is working properly. Remember, the upload script will be limited by the upload speed of your ISP, which can be very slow. With a typical cable internet upload speed of 384 kbit/s, it will take approx. 6 hours to upload 1GB. Download speeds are usually much faster, approx. 1GB per 20 min, but hopefully you never need it.</p>
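<p>The upload estimate above is simple arithmetic, and a small function makes it easy to redo for your own connection. This is only a back-of-the-envelope sketch: upload_seconds is a made-up name, it takes 1 GB as 1,000,000,000 bytes and 1 kbit as 1000 bits, and it ignores protocol overhead.</p>
<pre>
```shell
#!/bin/bash
# Sketch only: upload_seconds is a hypothetical helper.
# Usage: upload_seconds BYTES KBITS_PER_SECOND
# Prints the whole number of seconds needed to send BYTES at that rate.
upload_seconds() {
    local bytes="$1" kbps="$2"
    # bytes * 8 gives bits; kbps * 1000 gives bits per second.
    echo $(( bytes * 8 / (kbps * 1000) ))
}
```
</pre>
<p>For example, upload_seconds 1000000000 384 prints 20833, which is a little under six hours.</p>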
<p><strong>STEP 4: set up cronjob to run backup script once a week/month etc.</strong></p>
<p>Once you are sure the script is working for your uploads, you can automate the task by creating a cron job to run once a week, day or month. I have it run once a week, because I do nightly backups locally to my Desktop machine using rsync.</p>
<pre>$ crontab -e</pre>
<p>Add the following line:</p>
<pre>30 2 * * sun /path/to/upload.sh</pre>
<p>Save and exit.</p>
<p>Obviously, monitor to make sure everything is working.</p>
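<p>One way to make that monitoring easier is to wrap the script so every run leaves a trail. This is only a sketch: the run_logged function and the log path in the note below are hypothetical names, not part of s3sync.</p>
<pre># Hypothetical helper: run a command and append its start time,
# its output, and its exit status to a log file.
run_logged() {
  log=$1
  shift
  echo "$(date) starting: $*" >> "$log"
  "$@" >> "$log" 2>&1
  status=$?
  echo "$(date) finished with exit status $status" >> "$log"
  return $status
}</pre>
<p>A small script called from cron could then run something like run_logged /home/john/s3backup.log /path/to/upload.sh, where the log location is whatever suits your setup. A non-zero exit status in the log is the cue to look closer.</p>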
<p><strong>STEP 5: kick back and relax</strong></p>
<p>Now you can relax. If your laptop battery explodes and burns down your house, you know your data is safe sitting on Amazon&#8217;s geo-redundant servers, right between some bits describing a new book from Oprah and a bad review of the latest Ben Affleck movie!</p>
<p>Feel free to leave a comment if you find this useful, incorrect, or just plain uninteresting.</p>
<p>UPDATE 1:  One additional step I took was to create another bucket where I uploaded all the code and scripts needed to restore my files using s3sync (minus my S3 information).</p>
<p>UPDATE 2: I have changed the chmod 755 to chmod 700 so the scripts are not readable by everyone (credit Kelvin below).  I also updated the information about the tools I use.  I no longer use Cockpit to verify success; I mostly rely on the s3sync command line tools that were not present at the time I wrote the original article.</p>
<p>UPDATE 3:  I never gave enough credit to the <a href="http://developer.amazonwebservices.com/connect/profile.jspa?userID=18616">actual author of s3sync.</a>  Without him, this entire process would not be possible, thanks again.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/feed/</wfw:commentRss>
		<slash:comments>122</slash:comments>
		</item>
		<item>
		<title>Cheap, reliable, secure off-site storage for backup&#8230;</title>
		<link>http://blog.eberly.org/2006/10/02/cheap-reliable-secure-off-site-storage-for-digital-life-backup-where-are-you/</link>
		<comments>http://blog.eberly.org/2006/10/02/cheap-reliable-secure-off-site-storage-for-digital-life-backup-where-are-you/#comments</comments>
		<pubDate>Mon, 02 Oct 2006 17:09:27 +0000</pubDate>
		<dc:creator>John Eberly</dc:creator>
				<category><![CDATA[backups]]></category>

		<guid isPermaLink="false">http://blog.eberly.org/index.php/2006/10/09/cheap-reliable-secure-off-site-storage-for-digital-life-backup-where-are-you/</guid>
		<description><![CDATA[There have been numerous posts on online backup solutions (for good reason), but I haven&#8217;t found the best solution for me yet. What I want: cheap, reliable off-site backups for my home Linux (Ubuntu Server Edition) file server. O.k. I am a classic digital packrat. My actual garage is very clean and organized, but my digital [...]]]></description>
			<content:encoded><![CDATA[<p>There have been numerous posts on online backup solutions (for good reason), but I haven&#8217;t found the best solution for me yet. What I want: cheap, reliable off-site backups for my home Linux (Ubuntu Server Edition) file server.</p>
<p>O.k. I am a classic digital packrat. My actual garage is very clean and organized, but my digital garage is full of every byte I have saved since 1998: old job emails, photos, music, websites, some videos, etc. I never, ever want to lose this stuff, even if my house burns down because my laptop battery explodes. So I need to back up my stuff, simple huh? Well, I already store all of our files on a Linux file server via Samba. This machine is backed up nightly to my Linux desktop machine with cron and rsync, but this doesn&#8217;t solve the fire/flood issue.  I used to copy data to DVD and store it off-site, but burning DVDs is a pain and my data is outgrowing DVDs. I currently have approx. 10GB of data, but expect it to grow, especially since I want to store our digital camcorder footage as well. So I began the quest for remote storage.  While these are not all of the options I considered (I looked at and tried many), they came out at the top of the heap for me.<br />
My dream remote storage:</p>
<ol>
<li>Cheap, of course ($10 or less per month)</li>
<li>Secure, some of my files have info I don&#8217;t want in the wrong hands.</li>
<li>Relatively fast, though I am sure my upload speed (768 kbps) from my ISP, Comcast, will be the bottleneck.</li>
<li>Some sort of Linux server with shell access (so I can automate with cron and rsync)</li>
<li>No need to set up another full-fledged energy hog of a server</li>
<li>Bonus, unlimited scalability.</li>
</ol>
<p>How to store files offsite? Let&#8217;s go searchin&#8217;&#8230;&#8230;.</p>
<p><span id="more-4"></span></p>
<p><strong>Option 1: Amazon S3</strong><br />
This was actually my first choice, but I haven&#8217;t been able to find a good tool for using the service. I tried JungleDisk (on Linux and Windows), but it crashed so often that I decided never to use it again. I am still interested in S3, though. I have heard people express concern about whether Amazon S3 storage will be around forever, and I cannot answer that. I do think Amazon S3 will be successful, and I trust Amazon more than most to provide adequate time to move off their service if they ever decide to discontinue it.</p>
<p>Pros:</p>
<ul>
<li>From Amazon, a big name company with a very good reputation</li>
<li>Relatively cheap at $0.15/GB stored and $0.20/GB transferred</li>
<li>Infinitely scalable and stored by Amazon in geographically distributed data centers (huge plus)</li>
<li>Reliability&#8230; I assume this would be the most reliable of all my options</li>
</ul>
<p>Cons:</p>
<ul>
<li>No simple way to use rsync to upload data</li>
</ul>
<p>Summary: I just wish there was a reliable tool, like rsync over ssh, to get my data uploaded. <a title="s3sync" href="http://developer.amazonwebservices.com/connect/thread.jspa?messageID=44471">s3sync</a>  looks promising but I haven&#8217;t tried it yet.</p>
<p><strong>Option 2: Dreamhost.com</strong><br />
After researching S3 for a while, I thought maybe I should just get another hosting account. Dreamhost had everything I wanted: shell access, plenty of storage (200GB), and only $40/year after googling for a coupon code. Sounds too good to be true. 1and1 had similar offerings, but I have heard too many horror stories to try 1and1.</p>
<p>Pros:</p>
<ul>
<li>Cheap</li>
<li>Relatively good reputation</li>
<li>Shell access</li>
<li>Ability to host websites there as well</li>
</ul>
<p>Cons:</p>
<ul>
<li>Security concerns</li>
</ul>
<p>Summary: Well, they obviously oversell;  <a title="they even admit it" href="http://blog.dreamhost.com/2006/05/18/the-truth-about-overselling/">they even admit it</a>. But that didn&#8217;t bother me, as long as I could use the 10-50GB I would need. So I signed up and started to rsync some of my data, but then I started to worry a little. Anybody with $40 could sign up for an account with shell access. What would happen if someone gained access to my files? After a simple ls of the home directory, the first site I looked at that was hosted on my server was a blog from an ethical hacker. I decided to jump ship entirely. Not that I think Dreamhost doesn&#8217;t secure their servers, I am sure they do, but it just didn&#8217;t feel right to leave my somewhat sensitive data there. I would trust it for my photos and videos, but not my documents. So now what?</p>
<p><strong>Option 3: Modified <span style="font-weight: bold"><a href="http://www.amazon.com/gp/product/B000E5E868?ie=UTF8&#038;tag=netphoneresea-20&#038;linkCode=as2&#038;camp=1789&#038;creative=9325&#038;creativeASIN=B000E5E868">Linksys WRTSL54GS</a></span> router with openwrt or dd-wrt firmware and external USB drive.</strong><br />
I thought about taking a Linksys WRTSL54GS, flashing it with the much improved firmware available from openwrt or dd-wrt, slapping a USB drive on top, and then putting it at my mom&#8217;s house. This would cost around $200 for about 250 GB and a router. This one appeals to the DIY part of me. I would set it all up and do the initial data sync locally, then mail it to her to save a lot of time on the initial upload.</p>
<p>Pros:</p>
<ul>
<li>No monthly fees</li>
<li>I assume this setup would consume less energy than a traditional PC, but I have not calculated anything yet.</li>
<li>All hardware is under my control</li>
<li>Fun to set up</li>
</ul>
<p>Cons:</p>
<ul>
<li>One more piece of hardware to buy and maintain, secure etc.</li>
<li>Limited to upload/download speed of my mom&#8217;s DSL</li>
<li>Consumes her electricity</li>
</ul>
<p><strong>Option 4: Carbonite.com</strong><br />
Pros:</p>
<ul>
<li>Easy install on Windows and &#8220;unlimited&#8221; storage for $5/month. I thought I could use it with a Samba share, but I was wrong.</li>
</ul>
<p>Cons:</p>
<ul>
<li>Doesn&#8217;t work with network shares</li>
</ul>
<p>Summary: Since it doesn&#8217;t work with network shares, this is not going to work as a solution for me, but it would be a very good solution for the average Windows user with photos, music, etc. in their My Documents folder.</p>
<p><strong>Option 5: Use my extra space on my webhost ~ 10GB</strong><br />
Although this is a tempting option, I don&#8217;t want to mix my hosting with my backups. If I accidentally filled up my webhost quota, I could risk downtime. I ruled this option out right away, but I thought I should mention it.</p>
<p><strong>Conclusion</strong></p>
<p>Here are my cost guesstimates for two scenarios&#8230;.</p>
<p>Assuming 2 years and 20GB of storage</p>
<table width="450" cellspacing="0" cellpadding="3" border="1" bgcolor="#cccccc">
<tr>
<td style="width: 20%"></td>
<td style="width: 20%">Storage Costs</td>
<td style="width: 20%">Transfer Costs</td>
<td style="width: 20%">Hardware Costs</td>
<td style="width: 20%">Total for 2 years</td>
</tr>
<tr>
<td style="width: 20%">Amazon S3</td>
<td style="width: 20%">$3/mo</td>
<td style="width: 20%">~$0.50/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$84</td>
</tr>
<tr>
<td style="width: 20%">Dreamhost</td>
<td style="width: 20%">$8/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$192</td>
</tr>
<tr>
<td style="width: 20%">Linksys/USB drive</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$200</td>
<td style="width: 20%">$200</td>
</tr>
<tr>
<td style="width: 20%">Carbonite</td>
<td style="width: 20%">$5/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$120</td>
</tr>
</table>
<p>Assuming 2 years and 100GB of storage</p>
<table width="450" cellspacing="0" cellpadding="3" border="1" bgcolor="#cccccc">
<tr>
<td style="width: 20%"></td>
<td style="width: 20%">Storage Costs</td>
<td style="width: 20%">Transfer Costs</td>
<td style="width: 20%">Hardware Costs</td>
<td style="width: 20%">Total for 2 years</td>
</tr>
<tr>
<td style="width: 20%">Amazon S3</td>
<td style="width: 20%">$15/mo</td>
<td style="width: 20%">~$2/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$408</td>
</tr>
<tr>
<td style="width: 20%">Dreamhost</td>
<td style="width: 20%">$8/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$192</td>
</tr>
<tr>
<td style="width: 20%">Linksys/USB drive</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$200</td>
<td style="width: 20%">$200</td>
</tr>
<tr>
<td style="width: 20%">Carbonite</td>
<td style="width: 20%">$5/mo</td>
<td style="width: 20%">none</td>
<td style="width: 20%">none</td>
<td style="width: 20%">$120</td>
</tr>
</table>
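<p>The Amazon S3 rows in both tables can be reproduced with a few lines of shell. This is a sketch under the assumptions already stated above: the $0.15 per GB per month storage price, a flat 24 months at full size, and my own rough monthly transfer guess passed in as cents. The function name s3_two_year_total is just for illustration.</p>
<pre># Two-year S3 total in whole dollars.
# Storage is priced at 15 cents per GB per month; the transfer
# figure is a rough monthly estimate in cents, not a quoted price.
s3_two_year_total() {
  gb=$1                 # GB stored
  transfer_cents=$2     # estimated transfer cost per month, in cents
  storage_cents=$((gb * 15))
  echo $(( (storage_cents + transfer_cents) * 24 / 100 ))
}

s3_two_year_total 20 50      # prints 84
s3_two_year_total 100 200    # prints 408</pre>
<p>Working in cents keeps everything in integer arithmetic, which is all plain shell offers. The flat-fee options need no formula: the monthly price times 24.</p>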
<p>So what am I going to use? Good question. First of all, Carbonite will not work for me at all: there is no way to back up from Linux or even a Samba share. The Linksys/USB solution would be kind of cool, but it is one more device for me to maintain, and it will be offsite. Plus, any restore would be limited to my mom&#8217;s DSL upload speed. It would literally be faster to overnight the drive to me than to download 20-100GB.</p>
<p>I am left with Dreamhost or Amazon S3. While Dreamhost seems like a good fit for me, I am very nervous at the thought of letting my data sit on a server that 500 people have shell access to, and it is not very scalable. So I am left with Amazon S3. I only have a couple of reservations about this. First, the privacy of my data: I am trusting that completely to Amazon, for better or worse. Second, I am worried I might start to back up much more data, which would push the monthly price over my $10 limit.</p>
<p>My next post will be on the success or failure of my Amazon S3 via s3sync option. Stay tuned&#8230;..</p>
<p>Leave a comment if you have other ideas on how best to solve the growing challenge of offsite storage.</p>
<p>While I have been researching this for a while now, I was inspired to write my findings down after reading a good post on this topic from  <a href="http://jeremy.zawodny.com/blog/archives/007624.html">Jeremy  Zawodny</a>  and realizing many people are interested in this topic.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eberly.org/2006/10/02/cheap-reliable-secure-off-site-storage-for-digital-life-backup-where-are-you/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

