

Corpus of printed letter images
January 16, 2014 1:45 AM

I like playing around with text recognition algorithms, but am stymied by a lack of a good corpus to train and test my code against. I'm looking for a large number of images of individual printed letters, labelled with the correct letter. (With the letter in the file name, or each set of letters in a directory, or something equivalent like a metadata file.) Something like this, but more of it.
posted by Zarkonnen to Computers & Internet (3 answers total) 1 user marked this as a favorite
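For the two layouts mentioned in the question (one directory per letter, or the letter in the file name), a minimal Python sketch of a loader might look like this. The function name `load_labelled_corpus` and the `A_0001.png` naming convention are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def load_labelled_corpus(root):
    """Collect (image_path, label) pairs from a corpus directory.

    Hypothetical layouts handled:
      root/A/0001.png   -> label "A" (directory name is the label)
      root/A_0001.png   -> label "A" (text before the first "_" is the label)
    Files without an underscore directly under root are skipped.
    """
    root = Path(root)
    pairs = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Directory-per-label layout: every file inside gets the dir name.
            for image in sorted(entry.iterdir()):
                if image.is_file():
                    pairs.append((image, entry.name))
        elif entry.is_file() and "_" in entry.stem:
            # Label-in-filename layout: take everything before the first "_".
            pairs.append((entry, entry.stem.split("_", 1)[0]))
    return pairs
```

Usage would be something like `for path, label in load_labelled_corpus("letters/"): ...`, after which each image can be handed to whatever decoder and classifier you are testing.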
 
You could generate them yourself. One possible way is to use node-canvas, but there are plenty of other ways to generate images with text in them.
posted by ignignokt at 4:49 AM on January 16
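As a rough sketch of the generate-it-yourself approach: instead of node-canvas, the Python below uses a tiny hand-drawn 5x7 bitmap font (only A, B and C, with hypothetical glyph shapes) so that it needs nothing beyond the standard library. Each sample is placed at a random offset, sprinkled with a little salt-and-pepper noise, and written as a binary PGM into one directory per letter. Rendering real fonts with a proper graphics library would give far more realistic variation; all helper names here are made up for illustration.

```python
import random
from pathlib import Path

# Hypothetical 5x7 bitmap glyphs standing in for real fonts; extend as needed.
GLYPHS = {
    "A": [".###.", "#...#", "#...#", "#####", "#...#", "#...#", "#...#"],
    "B": ["####.", "#...#", "#...#", "####.", "#...#", "#...#", "####."],
    "C": [".####", "#....", "#....", "#....", "#....", "#....", ".####"],
}


def render_letter(letter, size=32, scale=3, noise=0.02, rng=random):
    """Return a size x size grayscale image as rows of ints (0=black, 255=white)."""
    glyph = GLYPHS[letter]
    glyph_h, glyph_w = len(glyph) * scale, len(glyph[0]) * scale
    # Random placement so a classifier cannot rely on a fixed position.
    top = rng.randint(0, size - glyph_h)
    left = rng.randint(0, size - glyph_w)
    image = [[255] * size for _ in range(size)]
    for y, row in enumerate(glyph):
        for x, cell in enumerate(row):
            if cell == "#":
                # Each set cell becomes a scale x scale block of black pixels.
                for dy in range(scale):
                    for dx in range(scale):
                        image[top + y * scale + dy][left + x * scale + dx] = 0
    # Salt-and-pepper noise: invert a small random fraction of pixels.
    for row in image:
        for i in range(size):
            if rng.random() < noise:
                row[i] = 255 - row[i]
    return image


def write_pgm(path, image):
    """Write an image as an 8-bit binary PGM (P5), readable by most image tools."""
    height, width = len(image), len(image[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    path.write_bytes(header + bytes(v for row in image for v in row))


def generate_corpus(out_dir, per_letter=100, seed=0):
    """Write per_letter samples of each glyph to out_dir/<letter>/NNNN.pgm."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    for letter in GLYPHS:
        letter_dir = out_dir / letter
        letter_dir.mkdir(parents=True, exist_ok=True)
        for i in range(per_letter):
            write_pgm(letter_dir / f"{i:04d}.pgm", render_letter(letter, rng=rng))
```

For example, `generate_corpus("letters", per_letter=500)` would produce `letters/A/0000.pgm` through `letters/C/0499.pgm`, which matches the directory-per-letter labelling the question asks for. The fixed seed keeps the corpus reproducible between runs.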


The National Institute of Standards and Technology has your back.
posted by oceanjesse at 5:12 AM on January 16 [3 favorites]


The Kaggle website has a file set that may suit your needs. It's listed under the training resources for data science. It has a variety of badly written numbers and, if I remember correctly, letters. It's been a while since I was on there, but it stands out in my mind as 1. curious and 2. useful.
posted by jibberish at 1:36 PM on January 16
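If the set is distributed as a CSV with a `label` column followed by one column per pixel (the layout used by Kaggle's Digit Recognizer `train.csv`, assuming it hasn't changed), a standard-library reader could look like the sketch below. `load_label_pixel_csv` is a hypothetical helper, and it assumes the pixel columns describe a square image.

```python
import csv
import math


def load_label_pixel_csv(path):
    """Yield (label, image) pairs from a CSV whose first column is 'label'
    and whose remaining columns are grayscale pixel values of a square image.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header[0] != "label":
            raise ValueError("expected the first column to be 'label'")
        n_pixels = len(header) - 1
        side = math.isqrt(n_pixels)
        if side * side != n_pixels:
            raise ValueError("pixel columns do not form a square image")
        for row in reader:
            pixels = [int(v) for v in row[1:]]
            # Reshape the flat pixel list into rows of length `side`.
            image = [pixels[r * side:(r + 1) * side] for r in range(side)]
            yield row[0], image
```

Because it is a generator, large files can be streamed without loading every image into memory at once.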


