OCR for charts and graphs
October 13, 2014 9:51 AM

This happens all the time in my work: I see a chart of valuable data and want the exact numeric values, or close approximations, but the data is (purposely) unlabeled, or only the key data points are labeled. Is there a way to automatically estimate the values?

See this chart from the Economist as an example:

http://theeconomist.tumblr.com/post/3748598014/daily-chart-the-worlds-biggest-defence

I could spend the time to make a similar chart in Excel and then tweak the numbers until I'm satisfied with the results (I've done it before), but I'm hoping there's a better option.

I can imagine a plugin that counts the pixels on a bar chart, or measures the angles on a pie chart, that sort of thing.
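For what it's worth, the pixel-counting and angle-measuring ideas are simple enough to sketch. This is a minimal illustration, not any existing plugin: it assumes the image has already been loaded as a grid of RGB tuples (row 0 at the top), that the bar color is known, that bars start at zero, and that you have worked out a units-per-pixel scale from a labeled gridline. The function names and the tiny synthetic image are hypothetical.

```python
# Hypothetical sketch: estimate bar values and pie shares from pixel measurements.
# Assumes `image` is a list of rows, each row a list of RGB tuples (row 0 = top).

def bar_height_px(image, column, bar_color):
    """Count the pixels of bar_color in one column of the image."""
    return sum(1 for row in image if row[column] == bar_color)

def bar_value(height_px, units_per_px):
    """Convert a bar's pixel height to data units (assumes the bar starts at zero)."""
    return height_px * units_per_px

def pie_shares(angles_deg):
    """Convert measured wedge angles (degrees) to percentages of the whole.
    Dividing by the measured total, not 360, absorbs small measurement errors."""
    total = sum(angles_deg)
    return [100.0 * a / total for a in angles_deg]

# Made-up example: a 10-row x 3-column image whose middle column holds
# a bar 6 pixels tall; assume a gridline shows 1 pixel = 2.5 units.
WHITE, BLUE = (255, 255, 255), (0, 0, 255)
image = [[WHITE, BLUE if r >= 4 else WHITE, WHITE] for r in range(10)]

h = bar_height_px(image, 1, BLUE)
print(h, bar_value(h, 2.5))        # 6 15.0
print(pie_shares([180, 90, 90]))   # [50.0, 25.0, 25.0]
```

Real charts add anti-aliasing, gridlines, and labels drawn over the bars, so exact color matching is fragile; this is part of why the dedicated tools mentioned below let you click on points instead.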
posted by 2bucksplus to Computers & Internet (11 answers total) 8 users marked this as a favorite
 
It's not automatic, but you could do this using (free) ImageJ.
posted by Comrade_robot at 9:58 AM on October 13, 2014


The google term you are looking for is graph digitization, or digitizer.

I don't know if a good free solution exists, but it's not an intractable problem: Origin does this, and I vaguely remember seeing it done with Mathcad (or maybe Matlab, I forget).
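Digitizers of this kind generally work by calibration: you mark two points of known value on each axis, and every other pixel position is mapped to a data value by linear interpolation. The sketch below illustrates that idea for linear axes only; it is not the code of Origin or any other tool, and the calibration numbers are made up.

```python
# Hypothetical sketch of two-point axis calibration for a linear axis.

def make_axis(p0, v0, p1, v1):
    """Return a function mapping a pixel coordinate to a data value along one axis,
    given two reference pixels p0, p1 whose data values v0, v1 are known."""
    if p0 == p1:
        raise ValueError("calibration points must be at different pixel positions")
    scale = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + (p - p0) * scale

# Made-up calibration: x pixel 50 is the year 2000 and x pixel 450 is 2010;
# y pixel 300 is 0 and y pixel 100 is 100 (image y grows downward,
# so the y scale comes out negative).
to_x = make_axis(50, 2000, 450, 2010)
to_y = make_axis(300, 0, 100, 100)

point_px = (250, 180)                    # a point clicked on the plotted line
print(to_x(point_px[0]), to_y(point_px[1]))   # 2005.0 60.0
```

Note the one-pixel resolution limit: here each vertical pixel is worth 0.5 units, so readings are only as precise as your clicks and the original rendering.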
posted by Dr Dracator at 10:07 AM on October 13, 2014


Check out WebPlotDigitizer, which is free (video demo).
posted by jjwiseman at 10:09 AM on October 13, 2014 [3 favorites]


You have to have a lot of faith that the unlabeled graphic accurately depicts the underlying numerical values. It may, if the numbers were used to generate the graphic. But the graphic could have been made by somebody approximating by hand. I'd never trust anything but the numbers themselves.
posted by Longtime Listener at 10:17 AM on October 13, 2014 [5 favorites]


A few options for graph digitization that I'm aware of (in addition to those already mentioned above):

GrabIt!
DataThief
DigitizeIt
Engauge Digitizer
Ungraph

I haven't used any of these and can't vouch for their relative quality; I just work with someone whose research has in the past involved a fair amount of digitizing graphs.
posted by pemberkins at 10:32 AM on October 13, 2014 [1 favorite]


I use GraphClick which is not free but excellent. PlotDigitizer is a free multiplatform option.
posted by roofus at 10:48 AM on October 13, 2014


I've used DataThief before for this purpose. It's tedious and fiddly but does what you want (noting the caveat above that you have to assume whoever drew the graphic was using actual numbers in the first place).
posted by une_heure_pleine at 11:23 AM on October 13, 2014


I've used DataThief in doing meta-analysis for journal articles. It is not super user-friendly and it is quite slow, but it does work. If the source data are good (for instance, if the graph is published in an academic journal), this is a sound, recognized way of doing this.
posted by hydropsyche at 11:24 AM on October 13, 2014


Longtime Listener: You have to have a lot of faith that the unlabeled graphic accurately depicts the underlying numerical values. It may, if the numbers were used to generate the graphic. But the graphic could have been made by somebody approximating by hand. I'd never trust anything but the numbers themselves.
Since the OP might be intending to digitize the drawing in order to check its accuracy, your point really isn't valid.

Numbers can be faked as easily as charts; would you warn someone wanting to use OCR that they must "have a lot of faith that the numbers accurately blah blah blah"?
posted by IAmBroom at 12:23 PM on October 13, 2014


Re-read the question. The OP wants to find out what the exact numeric values are, or get a close approximation. The OP is not trying to check the chart against numbers that already are known. Many times in putting together publications I've had to send a graphic back to be re-done because the artist didn't get it right. No software can give you reliable numbers unless the lines or bars are rendered properly in the first place.
posted by Longtime Listener at 1:24 PM on October 13, 2014


Response by poster: (For what it's worth: many of the charts I want were obviously made in Excel, meaning it would be more work for them to be incorrect. But I'll take the caveat that sometimes charts are an inaccurate representation of their underlying data.)
posted by 2bucksplus at 3:24 PM on October 13, 2014

