Natural Language Parsing library?
June 22, 2007 12:33 PM

A freely distributable Natural Language Parsing library?

I'm looking for a library (preferably C or C++) that I can feed text to and receive in return a parse tree with clauses, roots, and parts of speech noted appropriately.

The first thing I stumbled across was FreeLing, which provides something like this (represented here using the text output of their sample program):

Input: "This is sample text."
Output:

S_[
  (This this DT)
  verb_[
    +(is be VBZ)
  ]
  grup-n_[
    +(sample sample NN)
  ]
  grup-n_[
    +(text text NN)
  ]
  (. . Fp)
]
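
For anyone who wants to work with output like this programmatically, here is a rough sketch of loading it into a nested tree. It assumes only what the sample shows: a chunk opens with `label_[`, closes with `]`, a leaf is `(word lemma tag)`, and a leading `+` marks the head. The names `Node`, `parse_tree`, and `leaves` are made up for this example and are not part of FreeLing, and the real output format may have cases the sample doesn't cover.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The sample output from the post; indentation is ignored by the parser.
SAMPLE = """
S_[
(This this DT)
verb_[
+(is be VBZ)
]
grup-n_[
+(sample sample NN)
]
grup-n_[
+(text text NN)
]
(. . Fp)
]
"""


@dataclass
class Node:
    label: str                    # chunk label ("S", "grup-n") or the tag of a leaf
    head: bool = False            # True when the line was prefixed with "+"
    word: Optional[str] = None    # set only on leaves
    lemma: Optional[str] = None
    tag: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    """Parse the line-oriented bracket format into a Node tree."""
    root: Optional[Node] = None
    stack: List[Node] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "]":
            if not stack:
                raise ValueError("closing bracket without an open chunk")
            stack.pop()
            continue
        head = line.startswith("+")
        if head:
            line = line[1:]
        if line.endswith("_["):
            node = Node(label=line[:-2], head=head)
        elif line.startswith("(") and line.endswith(")"):
            # Assumes exactly three whitespace-separated fields per leaf.
            word, lemma, tag = line[1:-1].split()
            node = Node(label=tag, head=head, word=word, lemma=lemma, tag=tag)
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
        if stack:
            stack[-1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root")
        if node.word is None:      # only chunks can contain further lines
            stack.append(node)
    if root is None or stack:
        raise ValueError("empty or unbalanced input")
    return root


def leaves(node: Node) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs in sentence order."""
    if node.word is not None:
        return [(node.word, node.tag)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(leaves(child))
    return pairs


if __name__ == "__main__":
    tree = parse_tree(SAMPLE)
    print(tree.label, [child.label for child in tree.children])
    print(leaves(tree))
```

Keeping the chunk structure in a tree rather than a flat token list makes it easy to see how the sentence was grouped, for example that "sample" and "text" ended up in two separate `grup-n` chunks.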


Looking through this list of potential resources I am quite overwhelmed.

The ideal library will come with the support files for several major languages. One of the problems with FreeLing is that there only appear to be support files for English, Spanish, Italian, Catalan, and Galician[*].

The project I'm investigating would be a freeware e-book reader with built-in support for people reading outside of their native languages.

In any case, as I said I'm quite overwhelmed with choices. Please, help me become whelmed again.



* If you know what Galician is without having to look it up, consider me impressed.
posted by tkolar to Science & Nature (5 answers total)


Beware of premature optimization. I'd start with FreeLing and just get it working with two languages or something.
posted by rhizome at 1:15 PM on June 22, 2007


I've already started with FreeLing, but given the mistake it makes parsing my example above, my faith is not high in the technology as a whole.

If it turns out that no freely available library exists that does what I need, I'll be scrubbing the whole project. Better to find that out sooner rather than later.
posted by tkolar at 2:06 PM on June 22, 2007


um, all of those packages are for research.

There are no good working broad coverage parsers available.
posted by MonkeySaltedNuts at 2:39 PM on June 22, 2007


I assume you're looking for automatic Phrase Structure generators or taggers, in which case I can't say for certain. If you could do with just a part of speech (PoS) tagger, then there seem to be many of them available (do a google search for "Automatic Part of Speech Tagger" - it seems like the ones based on Hidden Markov Models (HMM) are the best).
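
To give a feel for what an HMM-based tagger does under the hood, here is a toy sketch of Viterbi decoding over a bigram HMM. The three tags and all of the probabilities are invented for illustration and do not come from any corpus or any particular tagger; a real tagger would estimate them from tagged text and handle unknown words far more carefully than the crude `floor` value used here.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    Probabilities missing from the tables are replaced by `floor` so that
    unseen events are heavily penalized instead of being impossible.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-probability of the best path ending in tag t at
    #               position i, tag chosen at position i - 1)
    best = [{
        t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(word, 0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hand-set toy model (assumed values, for illustration only).
TAGS = ["DT", "VBZ", "NN"]
START_P = {"DT": 0.8, "NN": 0.2}
TRANS_P = {
    "DT": {"VBZ": 0.5, "NN": 0.5},
    "VBZ": {"DT": 0.4, "NN": 0.6},
    "NN": {"NN": 0.5, "VBZ": 0.5},
}
EMIT_P = {
    "DT": {"this": 1.0},
    "VBZ": {"is": 1.0},
    "NN": {"sample": 0.5, "text": 0.5},
}

if __name__ == "__main__":
    sentence = ["this", "is", "sample", "text"]
    print(list(zip(sentence, viterbi(sentence, TAGS, START_P, TRANS_P, EMIT_P))))
```

Working in log space avoids underflow on longer sentences. Note that this only assigns one tag per word; it produces no phrase structure or dependency tree.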

It's surprising to see this here, because I've been searching high and low for Automatic Dependency Tree Generators and it took me the last two days, extreme google-fu, and every trick any librarian has ever taught me to find just one or two downloadable libraries (not including FreeLing, which doesn't seem to do Dependency Tree Generation for English). So, use lots of buzzwords. If you find an academic article that seems relevant, google the authors and see if they have anything on their website.
posted by Galt at 2:59 PM on June 22, 2007


You're at the Stanford page, but I'm wondering if you overlooked their parsing software (GPL, for Java) - there's an online demo that seems to have considerably better luck with your test sentence than FreeLing did.
posted by aparrish at 9:51 PM on June 22, 2007

