Are there any OCR programs that can process Syriac?
March 14, 2008 8:16 PM
Are there any OCR programs that can read Syriac?
This one's pretty straightforward: I'm just looking for an OCR program that can recognize Syriac and output it into a digital document. Does anyone have any leads on software that may do this?
Best answer: Further interesting stuff: the 242-page PhD thesis (PDF) "Automatic Recognition Of Words In Arabic Manuscripts" by Mohammed S. M. Khorsheed, a student of William Clocksin's, from June 2000.
There is the open source program Tesseract, but a quick perusal of its associated files indicates that the most typographically complex language it can handle so far is Vietnamese, which is a Roman script variation with 29 base letters and 6 tonal indicators.
I think the short answer is it doesn't really exist, yet. But I'm sure there's a fortune to be earned for making it.
posted by aeschenkarnos at 9:03 PM on March 14, 2008
Best answer: GOCR can be trained to recognise new alphabets, but it might have trouble separating the letters.
I suggest posting to the GOCR mailing list (on jocr.sf.net) and linking to some high-res images of Syriac text; you may get some interest.
posted by Jerub at 3:56 AM on March 15, 2008
That said, Googling leads me to this: William Clocksin and Prem Fernando are the experts, this is the relevant paper, and Clocksin describes having a program on his page linked above.
I suggest emailing him and asking if his program is available. Whether he'll say "sure, here's a copy" or "I rent that out at $5000 per hour!" is uncertain, but it's worth asking.
posted by aeschenkarnos at 8:46 PM on March 14, 2008