What's the latest technology in open-source search engines for private websites?
June 14, 2004 11:27 PM

The last time I had to set up a search engine for a basic web site, Swish-e was the free indexer of choice. However, its relevance sorting and other features (as I remember them) leave much to be desired in today's world, where everyone expects every search engine to be as good as Google. What's the LAMP favorite nowadays? Where is the interesting development happening in open-source search indexing and query services?
posted by scarabic to Computers & Internet (10 answers total)
I use Google on my site. Works fine.
posted by kindall at 11:30 PM on June 14, 2004

Response by poster: Right. But what about a site that's not publicly available, or an intranet? Or something so large that Google might notice and complain? Don't they have rack appliances to sell? Setting up a Google search redirect doesn't work for all purposes.
posted by scarabic at 11:35 PM on June 14, 2004

Response by poster: And I'm also just curious whether there is still a lively community of open-source search developers, and if Google's ubiquity has inspired them to build a new generation of better tools, or taken the wind out of all their sails.

posted by scarabic at 11:37 PM on June 14, 2004

ht://Dig used to be the bomb for local searching, but I hear Nutch is where it's at these days.
posted by mathowie at 12:01 AM on June 15, 2004

Google does sell server appliances.

As far as being too large for Google's liking, I really don't think Google cares. It's archived practically all of MetaFilter, and MetaFilter is small compared to other sites that have received similar crawling. In theory, the more data Google digests, the better Google gets, and the more profitable Google becomes.
posted by Danelope at 12:15 AM on June 15, 2004

Response by poster: Okay, you're probably right about that. They are likely prepared to absorb the search needs of even a site with a million+ visits per day. More AdWords ads served, no problem! True. I think this is very rarely put to the test, though, because there are very few 1M-visits-per-day sites that can afford to wait around for Google's "couple times a month" indexing schedule.
posted by scarabic at 12:53 AM on June 15, 2004

I used htdig for a few years, but Google's PDF indexing was just too useful not to take advantage of. I don't know of an indexer I could install on my server that would provide that, and my employer's content doesn't change that quickly.
posted by yeahyeahyeahwhoo at 7:50 AM on June 15, 2004

I used Atomz on my public site a while back and it seemed really slick. You might want to check out Jakarta Lucene.
posted by hyperizer at 2:58 PM on June 15, 2004

I use htdig on my Roald Dahl site and it seems to work pretty well. I like that I "own" the code so I can tweak it to look and work exactly as I want.
posted by web-goddess at 4:16 PM on June 15, 2004

For the most comprehensive information and tools list, check out Search Tools.

Lucene is being used by a lot of Java projects, but in terms of pure functionality, I think your best bet is mnoGoSearch (DataparkSearch is a potentially interesting new fork).
posted by lhl at 9:15 PM on June 15, 2004
