Are there any OCR programs that can process Syriac?
March 14, 2008 8:16 PM

Are there any OCR programs that can read Syriac?

This is a pretty straightforward one: I'm looking for an OCR program that can recognize Syriac and output it as a digital document. Does anyone have any leads on software that can do this?
posted by perpetualstroll to religion & philosophy (3 comments total)
Considering what the script looks like, and how vitally important it is in OCR to tell where one letter ends and the next begins, I'd say it's a significantly harder problem than OCRing cursive English.
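
To make the letter-boundary point concrete, here's a rough sketch of the simplest segmentation trick I know of: cutting a line of text wherever a pixel column has no ink in it. The tiny bitmaps and the function name split_on_blank_columns are made up for illustration, and I'm not claiming any real OCR package works exactly this way.

```python
def split_on_blank_columns(rows):
    """Return (start, end) column ranges separated by fully blank columns."""
    width = len(rows[0])
    # Vertical projection: count ink pixels ('#') in each column.
    ink = [sum(row[x] == "#" for row in rows) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(ink):
        if count and start is None:
            start = x                      # a run of inked columns begins
        elif not count and start is not None:
            segments.append((start, x))    # a blank column ends the run
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments


# Hypothetical toy bitmaps: three separate glyphs ...
printed = [
    "##.#..##",
    "#..#..#.",
    "##.#..##",
]
# ... and the same glyphs joined along the baseline, as in a connected script.
joined = [
    "##.#..##",
    "#..#..#.",
    "########",
]

print(split_on_blank_columns(printed))
print(split_on_blank_columns(joined))
```

The separated glyphs come out as three pieces, while the joined ones come out as a single blob, so a recognizer for a connected script needs something smarter than looking for gaps.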

That said, Googling leads me to this: William Clocksin and Prem Fernando are the experts, this is the relevant paper, and Clocksin describes having a program on his page linked above.

I suggest emailing him and asking if his program is available. Whether he'll say "sure, here's a copy" or "I rent that out at $5000 per hour!" is uncertain, but it's worth asking.
posted by aeschenkarnos at 8:46 PM on March 14


Further interesting stuff: the 242-page PDF of the PhD thesis "Automatic Recognition Of Words In Arabic Manuscripts" (June 2000) by Mohammed S. M. Khorsheed, a student of William Clocksin's.

There's also the open source program Tesseract, but a quick perusal of its associated files indicates that the most typographically complex language it can handle so far is Vietnamese, which uses a Roman-script variant with 29 base letters and 6 tonal indicators.

I think the short answer is it doesn't really exist, yet. But I'm sure there's a fortune to be earned for making it.
posted by aeschenkarnos at 9:03 PM on March 14


GOCR can be trained to recognise new alphabets, but it may have trouble separating the letters.

I suggest posting to the GOCR mailing list (on jocr.sf.net) and linking to some high-res images of Syriac text; you may get some interest.
posted by Jerub at 3:56 AM on March 15

