scanner pen recommendations
April 17, 2009 8:45 PM

My husband has been wondering about getting an OCR scanner pen. There have been a few questions before on this topic, but the technology may have moved on, and he has some specific needs.

Here is his text:

"What I'm looking for: information on OCR [optical character recognition] scanner pens which I could use to quickly extract and digitise paragraph-sized blocks of text from printed library books.

Context: I'm a historical researcher. I must frequently consult, and take notes from, printed library books. Very often, I only have these books for a short period of time, and I'm often not allowed to take them out of the library at all. But the notes I take from them have to be complete and useful enough to cite in my work months or even years later. The best way to preserve enough context to cite the books accurately is to copy the relevant paragraphs word for word, but this often takes far too long (don't worry, I'm well aware of the dangers of mistaking these quotations for my notes and inadvertently plagiarising; I have a well developed note-taking format for unambiguously marking out quoted text). In order to facilitate this note taking, I've started to wonder whether I should get one of those scanner pens which would copy the text for me.

What I Need: It's often difficult to get a desk near a power outlet, so any scanner pen I got would need to be mains-independent for the whole working day. I need to scan English text, and it would be nice to do Russian, although I'm prepared to live without it. I don't mind a system that has to be hooked up to a computer all the time. In fact, in some ways this would be preferable, as it would allow me to intersperse quotes with more free-form notes about why I think the text is useful or important. However, any pen would have to be as platform-independent as possible. In some circumstances I carry a Windows laptop, but I often use a Linux-powered Eee PC. So a scanner that worked with any OS would be important.

Does anybody have any experience with scanner pens? Which brands and models are good? Are there any tips and tricks to using them? Is this even a good idea in the first place?"
posted by jb to Technology (13 answers total)
 
This article is interesting, and though not focused directly on your question, it does contain this little nugget of information that seems quite relevant:

"For the newer books, OCR is about 90 percent accurate. But that success rate drops to as low as 60 percent for older texts, which often contain fonts that are blurry and less uniform."
posted by If only I had a penguin... at 9:54 PM on April 17, 2009


Take a picture, it'll last longer.

You could stick a bunch of photos of one book into a PDF, OCR that and annotate it without destroying your original.
posted by Brian Puccio at 9:57 PM on April 17, 2009


I would look into hand-held scanners, the kind you manually pull across a page to scan it. They were popular before cheap flatbed scanners became common. They look like these.

With something like this, you can scan a whole page as an image and later use whatever OCR you want. (I'd recommend Adobe Acrobat, which is very easy to use and can display the original image while making the text in it copy-and-pastable.)

I've recently seen a battery-operated standalone version, full-page width, with internal memory, so you don't even need a PC for the scanning itself. I'm sorry, I forgot the name of the device.

You'd probably get similar and possibly cheaper results with a digital camera, of course. I wouldn't go for a pen-sized scanner thingy, as you need to go over every line of text individually and, from what I've heard, they are not very accurate.
posted by cdx at 2:39 AM on April 18, 2009


I have an ancient Hewlett-Packard CapShare hand-held scanner that is useful for this, in the sense that it can capture a picture or PDF as well.

As mentioned above, OCR has its drawbacks. Basically, the OCR software needs some training on a first scan so that it can anticipate some of the errors in the following ones.

So I scan the text, in several strokes when necessary, because the scanner stitches everything together at the end, but I prefer to capture a PDF.

[Also, I am a historian, not everything I scan is printed text].

Hand-held scanners were ridiculously expensive when I bought mine. Maybe they have come down in price. But they certainly last.
posted by ijsbrand at 4:34 AM on April 18, 2009


Best answer: I had a similar situation and tried a similar solution. Caveat: I'm on a Mac, this was a few years ago, and I tried the IRISPen (because of the acceptable price). The upshot was that the software and hardware weren't nearly reliable enough to make this a working solution - I spent more time struggling with the device than getting things done. On a PC, with newer software and more money spent, it might work.
posted by outlier at 6:22 AM on April 18, 2009


He'll be happier with a digital camera and transcribing the text himself (especially if it is ONLY short paragraphs). Those pens suck.
posted by i_am_a_Jedi at 7:10 AM on April 18, 2009


Response by poster: Reaction from husband:

"Thanks to all those providing help and thoughts. Yes, we use cameras all the time. The trouble is that in order to take OCR quality pictures in the library, you have to put the camera on a tripod and the book a long way from the lens and take long exposures. This is impossible to do inconspicuously and it's just not worth it when you're reading through large volumes of text and want to take the occasional quote. Cameras are good when you need to image a chapter and when the library is fine with you 'disrupting' their reading room.

"Also, people keep telling me that scanner pens sucked a couple of years ago but have now got much better. Does anybody have recent experience?"
posted by jb at 7:30 AM on April 18, 2009


I'd go with the digital camera route. In my current job we do a lot of book imaging, and people I've worked with at other companies do much the same thing, using a camera with or without a copy stand. In my experience, you don't need to be a long way off or take long exposures, but you do need a camera on which you can set aperture, focus, and exposure manually, and it should be in the over-10MP range. A 15-megapixel image is about 4500 x 3300 pixels, so if you're taking a full-span two-page image of a 6x9 book (a 12x9 image), that's over 300dpi (4500 / 12 = 375, 3300 / 9 = about 367), more than enough for OCR to do a pretty good job; a consumer-grade 4MP camera is definitely not going to cut it.

Speed is excellent, too: an entire 600-page book, using a copy stand, can be duplicated in about 20 minutes (turn, click, turn, click, etc.), and I've known researchers who don't bother with the stand at all: line up the book, click the picture, and let software deskew and equalize it. We use a Canon Rebel XSi, and while we don't OCR our images, they would definitely work well for it.

My usual warning about OCR is that 90% accuracy still means a lot of fixing afterwards. If you're figuring 1500 characters per page, that's 150 wrong characters per page, and finding the mistakes requires you to look at every character and tell a "1" (one) from an "l" (lower-case L), or an "rn" (r, n) from an "m". Don't blame your scanning process for too many of the errors, because a big part of it is the accepted limitations of OCR. Google, with all their resources, still has a huge number of mistakes in the OCRed versions of the books in Google Book Search. If you're just trying to get a few paragraphs here and there, take your laptop along and type it out yourself; it'll be faster and easier than an image/OCR/software/proofing process.
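For anyone who wants to redo the back-of-envelope numbers in the comment above for their own camera and books, here is a rough Python sketch. It assumes the image exactly covers the page area (no margins or wasted frame) and treats OCR accuracy as a simple per-character rate; the 4500 x 3300 frame, the 12x9 inch spread, the 1500 characters per page and the 90%/60% accuracy figures come from the thread, while the 2272 x 1704 frame for a "4MP" camera is an assumed typical size, not something anyone here stated.

```python
# Back-of-envelope figures: camera resolution vs. page size, and how many
# characters an OCR pass at a given per-character accuracy will get wrong.


def effective_dpi(px_wide: int, px_high: int, width_in: float, height_in: float) -> float:
    """Lower of the horizontal and vertical resolutions (dots per inch), assuming
    a px_wide x px_high image exactly covers a width_in x height_in area."""
    return min(px_wide / width_in, px_high / height_in)


def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misrecognised characters per page, treating
    accuracy as a simple per-character success rate between 0 and 1."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


if __name__ == "__main__":
    # ~15 MP frame covering a two-page spread of a 6x9 in book (12 x 9 in).
    print(f"15 MP frame: {effective_dpi(4500, 3300, 12, 9):.0f} dpi")
    # Assumed typical 4 MP frame size (hypothetical, for comparison only).
    print(f"4 MP frame:  {effective_dpi(2272, 1704, 12, 9):.0f} dpi")
    # 1500 characters per page at the 90% and 60% accuracy figures quoted in the thread.
    for acc in (0.90, 0.60):
        print(f"{acc:.0%} accuracy: about {expected_errors(1500, acc):.0f} wrong characters per page")
```

Taking the smaller of the two resolutions is deliberate: the spread rarely matches the sensor's aspect ratio exactly, and OCR quality is limited by the coarser direction. Dropping the accuracy from 90% to 60% takes a page from about 150 to about 600 characters to fix.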
posted by AzraelBrown at 8:18 AM on April 18, 2009


Response by poster: My husband and I own two digital cameras for archive work (7MP and 8MP; I would love one over 10MP, because I have worked on manuscripts that are 3 feet wide), and we have taken thousands of photographs of books and manuscripts. We are very familiar with cameras. We are looking for a different kind of technology. Cameras are not allowed in many libraries, for copyright reasons; SLRs are also too loud for library or archival use. (Yes, I know some people do use them - and I'm annoyed at them. I can hear every click.)

It sounds like the scanner pen technology just may not be up to the task yet.
posted by jb at 8:36 AM on April 18, 2009


I've used a Nikon D80 to capture images for OCR processing. With Nuance's PaperPort and/or OmniPage Pro I get quite good results. The white balance in the camera has to be set to match the lighting, and with the ISO set anywhere between 1000 and 1600 I can shoot without a flash or tripod and get workable images. The camera does indeed have a loud shutter click. A single 2GB card will hold 1000+ images, which can be OCRed one at a time or much more quickly by batching them.
posted by X4ster at 5:48 PM on April 18, 2009


I've read very good things about the (somewhat gimmicky-looking) DocuPen line. You drag it across the page like a hand-held scanner; it takes a microSD card and scans in color at 400dpi. Then you can plug it into your USB port and suck all that sweet, sweet data into your computer for review/OCR/copying. They're small enough to stow in a laptop bag or briefcase and big enough to scan entire A4 sheets of paper (book pages). They can be had on the cheap (RC800s go for $250-300, with bundle packages) from everybody's favorite online garage sale, eBay.

As far as OCR goes, I think the DocuPen may come with PaperPort, but I don't know if it's the 'true' version or a 'shareware' version. There are also eBay sellers who claim to bundle it with a legit version of ABBYY (one of the best!). I once scanned an entire book (300 pages, double spaced) with some free OCR software I found on the net, although I can't recall the name. Send me a note if you're interested and I'll go look it up. Microsoft Document Imaging (which comes with Office Standard) is also pretty great.
posted by ostranenie at 5:55 PM on April 18, 2009


Here's an it-doesn't-suck review of the RC800; I'm sure there are loads more.
posted by ostranenie at 5:58 PM on April 18, 2009


Response by poster: Thank you all for your kind responses.
posted by jb at 8:20 PM on April 18, 2009

