Corpus of printed letter images
January 16, 2014 1:45 AM

I like playing around with text recognition algorithms, but am stymied by a lack of a good corpus to train and test my code against. I'm looking for a large number of images of individual printed letters, labelled with the correct letter. (With the letter in the file name, or each set of letters in a directory, or something equivalent like a metadata file.) Something like this, but more of it.
posted by Zarkonnen to Computers & Internet (3 answers total) 1 user marked this as a favorite
You could generate them yourself. One possible way is to use node-canvas, but there are plenty of other ways to generate images with text in them.
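
To make the "generate them yourself" idea concrete, here is a rough sketch. It is not node-canvas. It swaps in plain Python with only the standard library so it runs anywhere. The three 5x7 glyphs, the function names, the 16x16 image size, and the speckle noise are all made up for illustration. A real corpus would render actual fonts with node-canvas or an imaging library. The part that carries over is the layout: one directory per letter, with the letter also in the file name.

```python
import random
from pathlib import Path

# Hypothetical 5x7 bitmap glyphs, hand-drawn for this sketch ('#' = ink).
# A real corpus would render actual fonts instead.
GLYPHS = {
    "A": [".###.", "#...#", "#...#", "#####", "#...#", "#...#", "#...#"],
    "B": ["####.", "#...#", "#...#", "####.", "#...#", "#...#", "####."],
    "C": [".####", "#....", "#....", "#....", "#....", "#....", ".####"],
}


def render(letter, rng, size=16, noise=0.02):
    """Return a size x size grid of grey values (0 = ink, 255 = paper)."""
    glyph = GLYPHS[letter]
    h, w = len(glyph), len(glyph[0])
    # Random offset so samples of the same letter differ slightly.
    dx = rng.randint(0, size - w)
    dy = rng.randint(0, size - h)
    grid = [[255] * size for _ in range(size)]
    for y, row in enumerate(glyph):
        for x, cell in enumerate(row):
            if cell == "#":
                grid[dy + y][dx + x] = 0
    # Invert a small fraction of pixels to imitate print or scan speckle.
    for y in range(size):
        for x in range(size):
            if rng.random() < noise:
                grid[y][x] = 255 - grid[y][x]
    return grid


def write_pgm(path, grid):
    """Write the grid as a plain-text (P2) PGM greyscale image."""
    lines = ["P2", f"{len(grid[0])} {len(grid)}", "255"]
    lines += [" ".join(str(v) for v in row) for row in grid]
    path.write_text("\n".join(lines) + "\n")


def generate_corpus(out_dir, per_letter=5, seed=0):
    """Create out_dir/<letter>/<letter>_<n>.pgm and return the paths."""
    rng = random.Random(seed)
    paths = []
    for letter in GLYPHS:
        letter_dir = Path(out_dir) / letter
        letter_dir.mkdir(parents=True, exist_ok=True)
        for n in range(per_letter):
            path = letter_dir / f"{letter}_{n:03d}.pgm"
            write_pgm(path, render(letter, rng))
            paths.append(path)
    return paths
```

Calling generate_corpus("letters") would write files such as letters/A/A_000.pgm, so the label can be read back from either the directory or the file name. Fixing the seed keeps the set reproducible between test runs.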
posted by ignignokt at 4:49 AM on January 16, 2014

The National Institute of Standards and Technology has your back.
posted by oceanjesse at 5:12 AM on January 16, 2014 [3 favorites]

The Kaggle website has a data set that may suit your needs. It's listed among the resources for learning data science. It has a variety of badly spelled numbers and (if I remember correctly) letters. It's been a while since I was on there, but it stands out in my mind as both curious and useful.
posted by jibberish at 1:36 PM on January 16, 2014
