July 27th, 2010

SEARCH FOR: ‘Jon Smath’ and get => ‘John Smith’

This post explains an easy way to get “fuzzy” search results when using Sunspot with Ruby on Rails. This is probably obvious to the Solr experts out there, but I found the information to be lacking in the Rails community. I had originally compared the results of Solr vs Sphinx for fuzzy searching. In that comparison I was using the Levenshtein distance for Solr, which turns out not to scale well and doesn’t always return the best results when there are exact matches.

For this reason, we switched to using the “Double-Metaphone” algorithm for fuzzy searching in Solr. It provides a simple way to get fuzzy results for Solr searches while still scaling well, since most of the work is done at indexing time.

Here is how to make it work using Sunspot.

Edit solr/conf/schema.xml and add the following line to the <analyzer> section.

So it looks something like this.

Note: order is important, and you will want this towards the bottom of the config block. See the Solr Wiki for more detailed information.
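
For reference, here is a rough sketch of what the edited block might look like. It assumes the stock "text" field type that ships in Sunspot's schema.xml and uses Solr's DoubleMetaphoneFilterFactory; the tokenizer and the other filters shown are assumptions, so your own schema may differ. Setting inject to "true" keeps the original token in the index alongside its phonetic code, which lets exact matches keep working.

```xml
<!-- Sketch only: the field type name and the neighbouring filters are
     assumed from a default Sunspot schema.xml and may not match yours. -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Phonetic filter goes last, after the text has been normalized. -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>
  </analyzer>
</fieldType>
```

Because the analyzer has no type attribute, it runs at both index and query time, so a misspelled query is reduced to the same phonetic code as the indexed name.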

Last step: restart Solr and reindex. Then test your searches with misspellings, etc.

July 14th, 2009

A post I made on my company blog about fuzzy search results comparing solr to sphinx.
Fuzzy search results, solr vs sphinx

December 18th, 2008

I recently posted an article on my company’s blog about using git with capistrano for deployment.

October 27th, 2008

The following is how I automated my backups to Amazon S3 in about 5 minutes.

A lot has changed since my original post on automating my backups to S3 using s3sync. There are more mature and easier-to-use solutions now. I am switching because s3fs gives you many more options for using S3, is easier to set up, and is faster.

I now use s3fs to mount an S3 bucket to a local directory and then use rsync to keep it up to date with my files. The following directions are geared towards Ubuntu Linux, but could be modified for any Linux distribution and Mac OS X.

Read more…

January 29th, 2007

I use both Google Docs and the Google toolbar every day, and I was surprised to find there was no toolbar button for Google Docs & Spreadsheets… so I made one.

Download the Google Docs & Spreadsheets toolbar button here.

Enjoy!

October 12th, 2006

NOTE: This was the top link on reddit.com for a while.

A jpeg is good for a lot of things, but a live data feed is not one of them. In fact, it might be a violation of the Americans with Disabilities Act.

The Seattle Fire Department has been hosting a live HTML feed of the latest 911 calls to the Seattle Fire Department. Many people have found this information useful for various reasons, e.g. avoiding areas of major accidents/activity. I have been using this data as a feed for seattle911.com, which is a classic Google Maps mashup that displays the 911 call data on a map of the Seattle area. I host the site at my own expense, without any advertising.

Last night, I noticed www.seattle911.com was suddenly broken. After 30 seconds of investigation, I found out that they had switched their data feed from text to a jpeg.

Read more…

October 9th, 2006

UPDATE: See my newer article for the way I currently back up to Amazon S3.

Jeremy Zawodny has an excellent article/discussion about the different tools currently available to take advantage of Amazon Simple Storage Service (S3). After testing many of the tools currently available for S3, I decided to use the Ruby program s3sync to back up my data to S3.
As I explained in an earlier post, I wanted a simple low-level tool to perform automatic backups to S3. I decided to use s3sync to do the heavy lifting and use the jets3t Cockpit GUI to monitor my S3 account. The following explains how I successfully started automating my backups to S3 using s3sync and Cockpit.

My server is running Ubuntu Dapper with a Samba server. All the machines in my house use a “public” drive on the Samba server to store all files from Windows and Linux. All of our important files like photos, home movies, and documents are stored on this “public” drive. This simplifies the backup procedure, since I don’t have to back up multiple sources.

The following steps describe how I back up my “public” drive to Amazon’s awesome S3 storage service. I decided to post this because I haven’t found a fairly “simple” guide to automating backups to S3 in a way that functions similarly to rsync on Linux. This is a follow-up to my original post on choosing a backup solution.

Read more…

October 2nd, 2006

There have been numerous posts about online backup solutions (for good reason), but I haven’t found the best solution for me yet. What I want: cheap, reliable off-site backups for my home Linux (Ubuntu Server Edition) file server.

OK, I am a classic digital packrat. My actual garage is very clean and organized, but my digital garage is full of every byte I have saved since 1998: old job emails, photos, music, websites, some videos, etc. I never, ever want to lose this stuff, even if my house burns down because my laptop battery explodes. So I need to back up my stuff. Simple, huh? Well, I already store all of our files on a Linux file server via Samba. This machine is backed up nightly to my Linux desktop machine with cron and rsync, but this doesn’t solve the fire/flood issue. I used to copy data to DVD and store it off-site, but burning DVDs is a pain and my data is outgrowing DVDs. I currently have approx. 10GB of data, but expect it to grow, especially since I want to store our digital camcorder footage as well. So I began the quest for remote storage. While these are not all of the options I considered (I looked at and tried many), they came out at the top of the heap for me.
My dream remote storage:

  1. Cheap, of course ($10 or less per month)
  2. Secure, some of my files have info I don’t want in the wrong hands.
  3. Relatively fast, but I am sure my upload speed (768 k) from my ISP, Comcast, will be the bottleneck.
  4. Some sort of Linux server with shell access (so I can automate with cron and rsync)
  5. No need to set up another full-fledged energy hog of a server
  6. Bonus: unlimited scalability.

How to store files off-site? Let’s go searchin’…

Read more…
