What's the best open source search engine?
August 26, 2008 10:37 AM

My company is looking to build a search engine to index our web site, and I'm wondering if anyone has used an open source package for the task. Names, URLs, and experiences would be highly appreciated.
posted by seeminglee to Technology (6 answers total) 10 users marked this as a favorite
 
Lucene is the standard for this.

Or, if it's a public site, I'd just use Google's site search. It's free, but you can pay some extra money to: a) remove ads, and b) customize the look and feel.
posted by zippy at 10:55 AM on August 26, 2008 [1 favorite]


Lucene, Xapian and Sphinx are the big ones.

There are no real standards here. Use whatever fits your project best. If you tell us which web platform and database you are using, you will get better answers. What you want is the one that integrates best with what you already have.
posted by uandt at 11:24 AM on August 26, 2008


You almost certainly will not be using Lucene directly. Solr has an indexer and a nice web interface that cut out a lot of the work. I've been using it for a large site (20k+ pages), and it is pretty flexible and moderately fast, though not lightning-fast. Nutch is a level above Solr. (Lucene = text search library, Solr = Lucene + indexer & API, Nutch = Solr + crawler & web interface.)
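
To make the "text search library" layer concrete, here is a toy sketch in Python of the core idea such a library is built around: an inverted index that maps words to the pages containing them. This is not Lucene's API or implementation; the function names (tokenize, build_index, search) and the sample pages are made up for illustration, and a real library adds ranking, stemming, and on-disk storage on top.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every token in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return sorted(results)


# Hypothetical pages; in a real setup a crawler or indexer would supply these.
pages = {
    "/about": "About our company and our search team",
    "/products": "Product search and product catalog",
    "/contact": "Contact the company",
}
index = build_index(pages)
```

With these sample pages, search(index, "search company") matches only "/about", since it is the only page containing both words. The layers described above (an indexer, an HTTP API, a crawler) are what feed pages into a structure like this and expose queries against it.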

If your site is public and large, and your resources limited, use Google custom search. If you're doing this because your site is 20,000 static HTML files, make a new website.
posted by tmcw at 11:37 AM on August 26, 2008


I've been using Perlfect Search. It works great.
posted by Class Goat at 11:55 AM on August 26, 2008


+1 Nutch. That's what it's for.
posted by rachelpapers at 2:44 PM on August 26, 2008


Solr + Lucene is a combination that's worked really well for a number of sites I've been involved with. If you need fine-grained control over the content (for example, you're using it as the index for a CMS that you control), it has the advantage of being able to take arbitrary metadata that you define, like 'topic' or 'author', and use that for smarter indexing.
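
As a rough illustration of the metadata point, the Python sketch below builds the kind of XML update message Solr accepts (an add element wrapping a doc with named field elements). It assumes that 'topic' and 'author' have been declared as fields in your Solr schema; the helper name solr_add_xml and the sample page record are hypothetical, and actually sending the message to your Solr update handler is left out.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc):
    """Build a Solr-style <add><doc>...</doc></add> update message from a dict."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Each key becomes <field name="key">value</field>.
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical page record; 'topic' and 'author' are the custom metadata fields
# and are assumed to exist in the Solr schema.
page = {
    "id": "/articles/search-engines",
    "title": "Choosing a search engine",
    "topic": "search",
    "author": "jane",
}
xml_message = solr_add_xml(page)
```

Once documents are indexed with fields like these, queries can be restricted or faceted by them, which is what makes the CMS use case work well.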
posted by verb at 1:33 PM on August 27, 2008

