Evaluating Lucene
April 27, 2005 5:36 AM

I’ve been assigned to evaluate the feasibility of using Lucene for our website, a large, high-use government site with rapidly-changing data.

It’s not the only search engine we’re evaluating, but it’s the one I’m looking at. Anything I should know?

I know this is a very general question, but I’m not particularly technical, and I’ve just started looking into it, so I don’t even know what questions to ask.
posted by MrMoonPie to Computers & Internet (3 answers total)
MrMoonPie posted "I know this is a very general question, but I’m not particularly technical, and I’ve just started looking into it, so I don’t even know what questions to ask."

So test it. You need to see how it works for the end user, so you don't need to be that technical.

Install a trial copy, and then ask your boss to give you ten GS-7s for an hour a day for two weeks to do searches (you want all the 7s at one time, as you're trying to test how the thing responds when multiple users hit it, among other things).
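If you also want a rough machine-side check of the "multiple users at once" part, something like the following sketch might help. It is not a substitute for the human testers, and `run_search` is a hypothetical placeholder, not a Lucene call: you would swap in whatever request your trial install actually accepts.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_search(query):
    """Hypothetical stand-in: replace the body with a real request to the trial install."""
    time.sleep(0.01)  # pretend the search took about 10 ms
    return ["placeholder result for " + query]

def timed_search(query):
    """Run one search and return (query, seconds elapsed, number of hits)."""
    start = time.perf_counter()
    hits = run_search(query)
    return query, time.perf_counter() - start, len(hits)

def simulate_users(queries, users=10):
    """Issue the queries from `users` threads at once, mimicking ten simultaneous testers."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(timed_search, queries))

results = simulate_users(["panda bears press release"] * 10)
```

Each entry in `results` pairs a query with its elapsed time and hit count, so you can see whether response times stretch out as the number of simultaneous users goes up.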

Each day, give the 7s a list of thirty things to find (e.g., "our last press release that mentions panda bears"). Emphatically let the 7s know you're not testing them or their speed; you're testing the search software. Spend forty minutes on this, then spend the remaining twenty minutes collecting the quantitative results -- how many of the thirty items were found by each 7, and how long each search took -- and the 7s' subjective response. They've used Google; ask them how Lucene compares.

Then spend two hours with five GS-15s (try for a mix of lawyers and engineers) doing the same thing. Have the 15s come up with their own rather technical searches, and then have each one hand his list of searches to another one of the 15s.

Finally, get someone who is technical to give you a run-down of the technical pros and cons of the software.

At the end of the whole thing, ask your 7s and your 15s if Lucene is easy to use and makes your agency look good. Compile the quantitative results and qualitative results into a report for your boss.
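For the quantitative half of that report, a small tally along these lines could do the job. This is only a sketch: the log layout, the tester names, and the numbers are made up for illustration, and a spreadsheet would work just as well.

```python
from statistics import mean

# Hypothetical log, one row per attempted search:
# (tester, item number, found?, seconds taken)
log = [
    ("tester_a", 1, True, 12.0),
    ("tester_a", 2, False, 40.0),
    ("tester_b", 1, True, 8.0),
    ("tester_b", 2, True, 20.0),
]

def summarize(rows):
    """Per tester: items found, items attempted, and mean seconds per search."""
    by_tester = {}
    for tester, _item, found, seconds in rows:
        stats = by_tester.setdefault(
            tester, {"found": 0, "attempted": 0, "times": []}
        )
        stats["attempted"] += 1
        stats["found"] += int(found)
        stats["times"].append(seconds)
    return {
        tester: {
            "found": s["found"],
            "attempted": s["attempted"],
            "mean_seconds": mean(s["times"]),
        }
        for tester, s in by_tester.items()
    }

summary = summarize(log)
```

The subjective responses don't reduce to numbers like this, so they would still go into the report as written comments.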
posted by orthogonality at 5:56 AM on April 27, 2005

I use Lucene pretty extensively. As a developer, I appreciate how easy it is to write my own custom indexers and searchers. Plus there're neat tools like Luke that let me sift through the index and double check the data.

Is your website J2EE-based? I found it pretty trivial to integrate Lucene into a Tomcat/Velocity environment, but it could be a different experience if you're running IIS.

On preview, I'd go straight to user testing with some GS-30s. I hear they're, like, twice as good as GS-15s.
posted by Loser at 9:29 AM on April 27, 2005

The other developer with my project has been doing some testing of Lucene and has been quite pleased with how easy it is to do indexing. She is also enamoured with the transparency of how result sets are generated. FWIW, our data is all METS/MODS XML so Lucene is ideal for our purposes. We have also been using Oracle's XDB but are looking to put together an entirely OSS version of our application.

The biggest problem she has run across is indexing accent-free Unicode. Much of our data runs outside the core Latin-1 and we need to index with and without the special characters (i.e., 'resume' will match both 'resume' and 'resumé'). And, on glancing at my inbox, it appears she has solved this problem now...
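I don't know how she actually solved it, but the general idea of matching with and without accents can be sketched like this. It is plain Python using the standard `unicodedata` module, not Lucene's API, and `fold_accents` and `index_terms` are names I made up for illustration.

```python
import unicodedata

def fold_accents(text):
    """Strip combining marks so 'resumé' and 'resume' reduce to the same term."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_terms(word):
    """Return both the original form and the accent-free form, lowercased."""
    lowered = word.lower()
    return {lowered, fold_accents(lowered)}

print(fold_accents("resumé"))
print(index_terms("Resumé"))
```

Indexing both forms means a search for either spelling can hit the same document. Characters that have no Unicode decomposition, such as 'ø' or 'ß', pass through unchanged here and would need an explicit mapping.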

Sorry, I can't be more specific than this. I'm the interface/usability guy and a bit weak when it comes to how the back-end of our project is put together.
posted by Fezboy! at 11:01 AM on April 27, 2005
