Welcome to End Point’s blog

Ongoing observations by End Point people

Acts As Xapian - It Just Works

I recently started listening to the podcast done by the guys at RailsEnvy. It's an excellent resource for keeping up on what's new in the Rails world, and it's how I found out about the new acts_as_xapian search plugin for Rails. The podcast mentioned this blog post, which contains a very thorough rundown of the full-text search options currently available for Rails. The timing of the article couldn't have been better, since I was in the market for a new solution.

I was approaching a deadline on a client project here at End Point and I was having lots of trouble with my existing search solution, which was acts_as_ferret. Setting up ferret was relatively easy and I was very impressed with the Lucene syntax that it supported. It seemed like a perfect solution at first, but then came "the troubles."

Ferret is extremely fragile. The slightest problem and your server will just crash. What was causing the crash? Unfortunately the server logs won't give you much help there. You will receive some cryptic message coming from the C++ library if you're lucky. Note that I skipped the suggested Drb server setup since this was a development box.

After a while I would notice something wrong in my model code that might have caused an error while updating the search index. Unfortunately this was impossible to verify since I could not predictably reproduce the error. So in the end, I think there may have been issues with my model fields but ferret was of no help in tracking these problems down. The final straw came when the client started testing and almost immediately crashed the server after doing a search.

Enter acts_as_xapian. Jim Mulholland's excellent tutorial was pretty much all I needed to get it up and running on my Mac. Documentation for acts_as_xapian is a bit thin. It consists primarily of the aforementioned tutorial and a very detailed README. The mailing list is starting to become more active, however, and you are likely to get a response there to any thoughtful questions you might have.

One major difference with xapian (vs. ferret) is that it does not rebuild your index automatically with each model update. When you modify an ActiveRecord instance it will update the acts_as_xapian_jobs table with the id and model type of your record so that the index can be updated later. The index is then updated via a rake command that you can easily schedule via cron. You can also rebuild the entire index using a different rake command but that shouldn't really be necessary.
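The deferred-update idea can be sketched in plain Ruby. This is only an illustration of the pattern, not the plugin's actual code: the DeferredIndex class, its method names, and the :action field are all made up for the example, and a Hash stands in for both the database and the search index.

```ruby
# Hypothetical in-memory sketch of deferred indexing.
# None of these names come from acts_as_xapian itself.
class DeferredIndex
  Job = Struct.new(:model, :model_id, :action)

  attr_reader :jobs, :documents

  def initialize
    @jobs = []       # plays the role of the acts_as_xapian_jobs table
    @documents = {}  # plays the role of the search index
  end

  # Called when a record is saved or destroyed. This is cheap: it only
  # notes which record changed and replaces any older pending job for it.
  def enqueue(model, model_id, action = :update)
    @jobs.reject! { |job| job.model == model && job.model_id == model_id }
    @jobs << Job.new(model, model_id, action)
  end

  # Called later, for example from a scheduled task. This is where the
  # slow indexing work happens, away from the request that saved the record.
  def process_jobs(store)
    until @jobs.empty?
      job = @jobs.shift
      key = [job.model, job.model_id]
      if job.action == :destroy
        @documents.delete(key)
      else
        @documents[key] = store.fetch(key)
      end
    end
  end
end

store = { ["Product", 1] => "red widget" }
index = DeferredIndex.new
index.enqueue("Product", 1)
index.documents.empty?          # true until the batch runs
index.process_jobs(store)
index.documents[["Product", 1]] # the text is searchable only now
```

The trade-off is visible in the last four lines: saving is fast, but a record is not searchable until the next batch run.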

I was a bit concerned about the lack of a continuously updated index but I came to realize that it has some significant advantages. The biggest advantage is that it's much faster to update your model records since you are not waiting for the re-indexing to complete on the same thread. It also means you can skip the step of setting up a separate Drb server for ferret in your production environment.

With xapian you can index "related fields" in other models by constructing a pseudo-attribute in your model that returns the value of the associated model as a text string. Ferret allows you to do this as well, but unlike ferret, xapian gives excellent feedback about any mistakes you might have made while constructing them. If you have a nil exception somewhere in one of these related fields, xapian will complain and tell you exactly which line it's bombing out on.
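Here is a rough sketch of what such pseudo-attributes might look like. Plain Ruby objects stand in for ActiveRecord models so the example runs on its own. Product, Category, Tag, and the method names are hypothetical, not taken from my project.

```ruby
# Plain-Ruby stand-ins for ActiveRecord models; all names are hypothetical.
Category = Struct.new(:name)
Tag = Struct.new(:label)

class Product
  attr_reader :name, :category, :tags

  def initialize(name, category: nil, tags: [])
    @name = name
    @category = category
    @tags = tags
  end

  # Pseudo-attribute: flattens the associated category into plain text.
  # The nil guard avoids the kind of nil error that would otherwise
  # surface while the index is being built.
  def category_name
    category ? category.name.to_s : ""
  end

  # Pseudo-attribute for a to-many association: one space-separated string.
  def tag_labels
    tags.map(&:label).compact.join(" ")
  end
end

product = Product.new("Widget",
                      category: Category.new("Tools"),
                      tags: [Tag.new("red"), Tag.new("small")])
product.category_name
product.tag_labels
```

In a real model you would then list these method names alongside the ordinary columns in the plugin's list of text fields (the :texts option, if I am reading the README correctly), so they get indexed like any other attribute.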

I was also able to set up paging for my search results with paginating_find, which I prefer to will_paginate (just a personal preference -- nothing wrong with will_paginate). There is also a cool feature that will suggest other possible terms ("Did you mean?") if your search returns no results. So far the only disappointment has been the lack of an obvious way to do searches on specific fields.

If you are in the market for a new full-text search solution for Rails, you should really give xapian a try.


Ethan Rowe said...

A few questions:

* Did you try the Ferret recommended configuration before switching?

* Does your xapian experience at present include production environment usage/support, or are you speaking strictly from a development standpoint at the moment?

Thanks. I'll take a look at xapian soon.
- Ethan

sean.schofield said...

If by recommended configuration you mean the Drb server, then no, I did not try that. Since I was the only person hitting the server the lack of a Drb server wasn't really a factor. I ultimately would have used the Drb server when deploying in a production environment but I never got that far.

As for my production xapian experience, I deployed in a production environment last week. So far it's working great (including the periodic updating of the index).

Rob Lambert said...

Cool Sean. I tried out acts_as_xapian right after I read this, and yes "it just works". Nice.

Any chance you could give a quick overview of how you would get acts_as_xapian working with paginating_find? It wasn't immediately obvious to me...


sean.schofield said...

Here's a brief blog post on how you can do this.

rgh said...

For the record.

My own experience with ferret in a production environment is to *never* use ferret.

I am currently using both xapian and ferret (neither using acts_as...) in a moderately sized application: xapian for the main index (holding about 1.9 million documents) and ferret for individual indexes. The xapian index has worked flawlessly since it was deployed. Ferret worked well initially, but after a period of time it just started giving random errors. What's more, I could find nothing on how to fix those errors.

Replacing ferret is *very* high on my list of things to do.

Anonymous said...

Sadly, installing on a Mac seems to no longer 'just work'. acts_as_xapian fails to find the Xapian installation even when following the blog post that everyone seems to point to when this question arises.

Dale Cook said...

We just implemented Xapian (with acts_as_xapian) in glunote and overall the experience was pretty painless. We've used Ferret before (on a different project), and while the initial results were good, production started to deteriorate rapidly.

There was one slight gotcha though. Our development environment was Ubuntu 9.04 32-bit and the servers are Ubuntu 9.04 64-bit. Everything was fine on the development machines, but on production the Xapian ruby bindings were unable to load correctly. We fixed it by changing the location of the file in the xapian.rb file to be the absolute location. Problem fixed, but obviously not a good long-term solution, so we'll have to work out something different.

Other than that - great and seems to be kicking along just fine.