need a library for text classification
July 20, 2008 4:17 PM
I need to automatically identify the language (English, Japanese, Russian, etc.) in which a particular blog post was written. (The lang attribute may or may not be available.)
A few years ago I came across a library for RSS feeds that did roughly what I need, but I can't find it anymore.
http://languid.cantbedone.org/
It's written as a Perl library, available from the site above.
posted by thebabelfish at 4:50 PM on July 20, 2008
Nice find, eponysterical-babelfish.
CPAN also turns up modules like Lingua::Identify, Text::Language::Guess, Lingua::Ident (which looks like it implements my idea above)...
posted by hattifattener at 5:05 PM on July 20, 2008
Response by poster: http://languid.cantbedone.org/ is the one I was looking for. Many, many thanks!
posted by chexov at 11:08 PM on July 20, 2008
Not a very sophisticated algorithm, but it's simple and might work perfectly well.
posted by hattifattener at 4:43 PM on July 20, 2008
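
For illustration only: one simple way to approach the original question is to count common function words per language and pick the language with the most hits, with a script check for Japanese. This is a rough sketch under that assumption, not the algorithm used by any of the libraries named in this thread; the word lists and the `guess_language` name are made up for the example.

```python
import re

# Tiny, hand-picked function-word lists (illustrative, far from exhaustive).
STOPWORDS = {
    "en": {"the", "and", "is", "in", "of", "to", "it", "that"},
    "de": {"der", "die", "und", "ist", "nicht", "das", "ein", "zu"},
    "fr": {"le", "la", "et", "est", "les", "des", "un", "une"},
    "ru": {"и", "в", "не", "на", "что", "это", "я", "с"},
}


def guess_language(text):
    """Return a best-guess language code, or None if nothing matches."""
    # Hiragana or katakana is a strong hint for Japanese, which is written
    # without spaces and so would not work with the word counts below.
    if re.search(r"[\u3040-\u30ff]", text):
        return "ja"
    words = re.findall(r"\w+", text.lower())
    # Score each language by how many words appear in its function-word list.
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("The cat is in the garden and it is happy"))
print(guess_language("Это не то, что я думал"))
```

Short posts and closely related languages would trip this up quickly, so for real use one of the libraries mentioned above is likely the better choice.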