Corpus of printed letter images
January 16, 2014 1:45 AM
I like playing around with text recognition algorithms, but am stymied by a lack of a good corpus to train and test my code against. I'm looking for a large number of images of individual printed letters, labelled with the correct letter. (With the letter in the file name, or each set of letters in a directory, or something equivalent like a metadata file.) Something like this, but more of it.
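Whichever corpus turns up, the three labelling conventions mentioned above (letter in the file name, one directory per letter, or a metadata file) are easy to turn into (image path, label) pairs for training and testing. Here is a minimal sketch using only the Python standard library. The layouts it expects are assumptions, not the format of any particular dataset: a per-letter folder such as `root/A/0001.png`, a flat file named like `A_0001.png` where the label is the part before the first underscore, or a hypothetical CSV with `filename` and `label` columns.

```python
import csv
from pathlib import Path

# Image extensions to accept; adjust for the corpus you actually download.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def load_labelled_images(root):
    """Return a sorted list of (path, label) pairs found under `root`.

    Assumed (hypothetical) layouts:
      root/A/0001.png -> label "A"  (one directory per letter)
      root/A_0001.png -> label "A"  (label is the file-name prefix before "_")
    """
    root = Path(root)
    pairs = []
    for path in root.rglob("*"):
        # Skip directories and non-image files such as README or notes files.
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTS:
            continue
        if path.parent != root:
            # Directory-per-letter layout: the immediate parent folder is the label.
            label = path.parent.name
        else:
            # Flat layout: the label is everything before the first underscore.
            label = path.stem.split("_", 1)[0]
        pairs.append((path, label))
    # Sort by path so repeated runs (and train/test splits) are deterministic.
    pairs.sort(key=lambda pair: str(pair[0]))
    return pairs


def load_from_metadata(csv_path):
    """Read (path, label) pairs from a CSV with assumed columns `filename,label`.

    File names are resolved relative to the directory containing the CSV.
    """
    base = Path(csv_path).parent
    with open(csv_path, newline="") as f:
        return [(base / row["filename"], row["label"]) for row in csv.DictReader(f)]
```

Keeping the labels separate from the pixels like this means the same recognizer code can be pointed at several corpora; only the loader needs to change when a dataset uses a different naming scheme.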
The National Institute of Standards and Technology has your back.
posted by oceanjesse at 5:12 AM on January 16, 2014 [3 favorites]
The Kaggle website has a file set that may suit your needs. It's listed under the training resources for data science. It has a variety of badly written numbers and (if I remember correctly) letters. It's been a while since I was on there, but it stands out in my mind as 1. curious and 2. useful.
posted by jibberish at 1:36 PM on January 16, 2014
posted by ignignokt at 4:49 AM on January 16, 2014