How to make searchable digital version of my cookbooks?
September 5, 2010 1:10 PM

I have a small collection of cookbooks that I'd like to digitize with the goal of making them searchable. I am fine with destroying the physical books in the process, e.g. cutting off the binding to make the book easier to scan. What is the best way to go about this? My initial idea is to scan the books with an automatic sheet feeder (hence losing the binding) and use OCR (Acrobat Pro) to yield searchable PDFs. I use a Mac, but have access to PCs. I tried this suggestion, but the results weren't thorough or viewable enough for me.
posted by kami_ita to Computers & Internet (11 answers total) 14 users marked this as a favorite
 
If you were on a PC I'd say OneNote 2010. OCR built in and the search is pretty awesome.
posted by Kappi at 1:27 PM on September 5, 2010


Google Books reportedly puts the books below a camera and has someone turn the pages by hand.
posted by Mike1024 at 1:42 PM on September 5, 2010


If the suggestion in the thread you linked was intriguing, you might also consider eatyourbooks.com. They let you search cookbooks that you actually own. They have 1600 cookbooks indexed and you can request that they index others. I'm not a member so I can't speak to the breadth of their coverage.

I'm also not a user of goobledybook.com, so I can't tell you anything about them either. Their mission is to solve a similar problem, but they rely on users to enter recipes from cookbooks.
posted by stuart_s at 1:44 PM on September 5, 2010


Also, some guy posted a detailed set of instructions for building a book scanner on instructables.com. It works as Mike1024 describes. Since then, he's developed several new and improved models and a community has grown up around them. You should be able to find advice on sourcing materials, software and miscellaneous whatever from there.
posted by stuart_s at 1:54 PM on September 5, 2010


There's an interesting device called a BookLiberator, promoted on a site that seems to have started up only in the last few months; it may be a way to speed up the process that preserves the binding and doesn't involve paying for a scanner with an auto-feeder. I know you don't care much about the bindings, but I mention it because it appears to be a much faster way to scan through a book than a traditional flatbed scanner. It's sort of the cheap DIY version of the methods used by Google and the Internet Archive. (It also seems somewhat simpler than the book scanner mentioned by stuart_s, which itself has better instructions than the ones on Instructables, available if you go here.)

Anyway, since scanners with auto-feeders don't seem to be too expensive nowadays, doing it the way you're thinking of seems the most rational approach: break the bindings to free the pages, then use a paper cutter to trim any jagged edges that would prevent clean feeding.

As far as OCR software goes, you have several options. There is a very good, active open-source project built around the Tesseract OCR engine, and especially around plugging it into the OCRopus document analysis system. It's very cool and fun to play with, but the last time I tried it, about two years ago, it was generally a hackers-only affair, and my sense is that it hasn't gotten more user-friendly since. (High-volume OCR is more of an industrial need than a consumer one, so I think that makes sense.)

Personally, I think your best bet is one of the well-maintained, easy-to-use proprietary OCR programs; specifically, I think you'd have the most luck with ABBYY FineReader, which costs some money but which I've found fairly accurate and user-friendly. That's especially true because there's a fairly cheap Mac version, FineReader Express Edition, which can be had for a mere $99. That's my own recommendation, anyway.
posted by koeselitz at 2:16 PM on September 5, 2010
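For anyone curious what the Tesseract route looks like in practice, here is a minimal Python sketch (not from this thread) that batch-runs the `tesseract` command-line tool over a folder of page images. It assumes Tesseract is installed and on your PATH, that page images are named so filename order matches page order (e.g. `page_001.png`), and that your Tesseract release is new enough to accept the `pdf` output config, which postdates this thread. `build_tesseract_commands` and `run_all` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_commands(image_dir, out_dir, pattern="*.png"):
    """Return one tesseract command per page image, in sorted filename order.

    Assumes filenames sort into reading order, e.g. page_001.png, page_002.png.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    commands = []
    for image in sorted(image_dir.glob(pattern)):
        # tesseract takes an output *base* name and adds the extension itself;
        # the trailing "pdf" config asks for a searchable PDF instead of .txt.
        out_base = out_dir / image.stem
        commands.append(["tesseract", str(image), str(out_base), "pdf"])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    for cmd in commands:
        # Make sure the output folder exists before tesseract writes into it.
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Each page comes out as its own searchable PDF, so you would still need to combine them afterwards (Acrobat can do that). Building the command list separately from running it lets you check the page order before committing to a long OCR run.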


Also, some guy posted a detailed set of instructions for building a book scanner on instructables.com.

Cool note: This DIY book scanner is actually a project by MeFi's Own™ member fake! He's posted often here about scanning/digitizing books, photos, etc. and runs the diybookscanner.org forums linked by stuart_s. The DIY book scanner was also mentioned in this MeFi post about digitizing books.

Here's a recent AskMe about digitizing a personal library of books -- granted, the library in that question involves a few thousand books, but hopefully the comments there (which also include a few from fake) will give you some more leads on possible hardware, software and general techniques.
posted by rangefinder 1.4 at 3:27 PM on September 5, 2010 [1 favorite]


Wow, thanks.

Kami_ita, you are certainly welcome to join the DIY Book Scanner forum, where we talk about this stuff all day -- and we have a lot simpler designs than the first one I posted. I actually updated the homepage today to give a little more general information about the process.

And FYI, the Book Liberator people have switched to using the same software we've been helping to develop for the last year or so, so the only real advantage there is that they're selling something and we're not.

I see your situation in the following way:

1. Destroy the books to digitize them. Though somewhat unpalatable, this is the fastest and easiest. You go on eBay and buy a Fujitsu ScanSnap that will work with your Mac. You also buy/find/borrow a table saw or take your books to a local printing press where they have large-scale paper shears. Cut the bindings off. Feed the pages into the ScanSnap using the included software. OCR these scans with ABBYY or Acrobat (the open source OCR stuff just isn't that easy to use yet). Sell the ScanSnap on eBay when you are done.

2. Do not destroy the books to digitize them. You can do this with as little as a point-and-shoot camera and a cardboard box (PDF) or as much as you like. My forum now has around a hundred different designs for people with different building capabilities. You process the output images with Scan Tailor, our favorite Free Software for processing camera images (there's an OS X build in that thread or on the forum somewhere). Those images then go into ABBYY or Acrobat or whatever for OCR.

If I were you, I would think about this two ways.

A. If you have spare money and don't like building things, and don't mind destroying books, go with #1.

B. If you like building things, and may want to digitize other books or are interested in making your own ebooks, go with #2.

In any case, you are invited to hang with the book scanning crew that's gotten together to solve this problem. We're a friendly bunch and we are big into sharing techniques and talking about the various problems. Since we're not selling anything, we're also totally honest about drawbacks to any given approach.
posted by fake at 4:09 PM on September 5, 2010 [2 favorites]


Just to follow up on that for clarity: You might be able to get done what you want to get done with a digital camera you already own. It can't hurt to give that a try before you go and spend a bunch of money and time trying other things.
posted by fake at 4:15 PM on September 5, 2010


Looking back to your original question, you might check out Cookbooker. It's a kind of user-generated meta-index and recipe review site, and its structure might be one you could use to build out your own recipe index. Because, really, as long as you still have the cookbooks to cook out of, all you need to make searchable is the metadata.
posted by libraryhead at 4:19 PM on September 5, 2010
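Building on the point that only the metadata needs to be searchable, here is a minimal Python sketch of a home-made recipe index: you type in each recipe's title, book, page and main ingredients once, then search by keyword. It does not use Cookbooker or its data; the `Recipe` and `RecipeIndex` names and fields are hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Recipe:
    title: str
    book: str
    page: int
    ingredients: list = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation and digits."""
    return re.findall(r"[a-z]+", text.lower())


class RecipeIndex:
    """Inverted index: each word maps to the set of recipe ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._recipes = []

    def add(self, recipe):
        rid = len(self._recipes)
        self._recipes.append(recipe)
        # Index words from the title and every ingredient line.
        for text in [recipe.title, *recipe.ingredients]:
            for word in tokenize(text):
                self._postings[word].add(rid)

    def search(self, query):
        """Return recipes whose title/ingredients contain *every* query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return [self._recipes[i] for i in sorted(hits)]
```

Requiring every query word to match means a search like "flour butter" narrows the results rather than widening them, and the result tells you which book and page to open.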


When I attempted this project, I photographed each page of the cookbook in question, then fed it into ABBYY FineReader. This was OK, except that ABBYY doesn't seem to handle fractions very well; it would often (but not always) read ¼ as 34, for example. Rather a problem for recipes.

It's still a ton better than typing in the whole thing yourself, but like any OCR project, this will involve lots and lots and lots of checking over the computer's work.
posted by JDHarper at 5:48 PM on September 5, 2010
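Since the fraction problem is exactly where the manual checking hurts, a small script can at least point you at the lines most likely to be wrong. The Python sketch below flags lines where a two-digit number that could be a collapsed fraction (12 for ½, 34 for ¾, and so on) sits right before a measuring unit. The lists of suspect numbers and units are assumptions, not anything ABBYY documents, and because ¼ came out as 34 in the example above, the script only flags lines for review rather than trying to correct them.

```python
import re

# Two-digit strings a vulgar fraction (¼ ½ ¾ ⅓ ⅔ ⅛ ⅜ ⅝ ⅞) might collapse into
# when OCR drops the fraction bar. An assumed, non-exhaustive list.
SUSPECT_NUMBERS = {"12", "13", "14", "18", "23", "34", "38", "58", "78"}

# Common measuring units; \b on both sides keeps "cup" from matching "cupcakes".
UNITS = r"(?:cups?|tablespoons?|tbsps?|teaspoons?|tsps?|ounces?|oz|pounds?|lbs?|inch(?:es)?)"
QUANTITY = re.compile(rf"\b(\d+)\s+{UNITS}\b", re.IGNORECASE)


def flag_suspect_quantities(text):
    """Yield (line_number, line) for lines whose quantities look like mangled fractions."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in QUANTITY.finditer(line):
            if match.group(1) in SUSPECT_NUMBERS:
                yield lineno, line.strip()
                break  # one flag per line is enough for a human to review it
```

Expect some false positives (a recipe really can call for 12 ounces of something), which is why it flags rather than fixes.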


I know I'm a little late in replying to this, but I would highly, highly recommend Evernote (evernote.com). I scanned in all my cookbooks this way, and they are very searchable. The neat thing about Evernote is the added web-clipping capability for grabbing recipes from blogs and such, and Evernote's general portability.

You'll need to upgrade to Pro while you are saving everything, but it's only $5 per month for the larger upload cap, and you can cancel any time and still keep everything in Evernote.
posted by tdreyer at 12:34 PM on September 14, 2010

