squeezing grad school notebooks into a digital realm? is iphoto the way to go?
April 25, 2008 1:49 PM

i'm about to embark on a document-imaging and archiving project. i have dozens of handwritten notebooks, a 7 megapixel camera, tripod, macbook, and large external drive. is iphoto going to be useful for this? (or is it sketchy if i use a separate iphoto library on an external drive?) are alternative software/strategies preferred for managing all these document images? (in case it matters, i'm not interested in putting them online at this point.)
posted by garfy3 to Computers & Internet (12 answers total)
 
Wouldn't it be easier to use a scanner? If you're willing to take apart your notebooks, you could even use one that has a sheet feeder.
posted by pete0r at 1:55 PM on April 25, 2008


Response by poster: actually, i use a scanner all the time at work, and just couldn't deal with waiting 1 minute/page for the scans. photos are darn quicker, and from what i've seen nearly equal in quality. (i also don't want to go out and buy a scanner at this point).
posted by garfy3 at 2:02 PM on April 25, 2008


I worked for many years for a company that did this sort of thing professionally. Most of the work was big jobs for hospitals, law firms, gov'ts, etc., but we had lots of small jobs like this. You might find a company (look under "imaging services") in your local yellow pages that can do this for $.05 per page or less. Dozens of notebooks makes me think hundreds of pages (or even over 1,000), which seems like it would be super time consuming to do at home.
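
To put rough numbers on that trade-off, here is a minimal Python sketch. The $.05 per page and the 1 minute/page scanner figure come from this thread; the notebook and page counts are made-up placeholders, so swap in your own.

```python
# Rough cost/time estimate. The page counts below are hypothetical placeholders.
NOTEBOOKS = 24                    # assumption: "dozens" of notebooks
PAGES_PER_NOTEBOOK = 80           # assumption: a typical notebook length
SERVICE_COST_PER_PAGE = 0.05      # dollars per page, as quoted in the thread
HOME_SCAN_MINUTES_PER_PAGE = 1.0  # the poster's flatbed scanner estimate


def estimate(notebooks, pages_per_notebook, cost_per_page, minutes_per_page):
    """Return (total pages, service cost in dollars, home scanning hours)."""
    pages = notebooks * pages_per_notebook
    cost = pages * cost_per_page
    hours = pages * minutes_per_page / 60
    return pages, cost, hours


pages, cost, hours = estimate(
    NOTEBOOKS, PAGES_PER_NOTEBOOK, SERVICE_COST_PER_PAGE, HOME_SCAN_MINUTES_PER_PAGE
)
print(f"{pages} pages: about ${cost:.2f} at a service, or {hours:.0f} hours on a flatbed")
```

With those placeholder counts that is 1,920 pages, so roughly $96 at a service versus about 32 hours of flatbed scanning.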
posted by vito90 at 3:26 PM on April 25, 2008


Scan them and OCR them with a program like IRIS or ABBYY. Then you should be able to organize and search them.
posted by lunchbox at 5:04 PM on April 25, 2008


Response by poster: lunchbox: thanks for the idea -- but i'm particularly looking to grab some of the side-margin doodles and such, so OCR probably won't do.
posted by garfy3 at 5:41 PM on April 25, 2008


Even using a tripod and camera would require more time, I think. You'd need to make sure the pages were super-flat, or different parts of the page would be in and out of focus. You'd need to make sure the light was right, etc. If you're willing to cut up the notebooks, you could use a sheet-fed scanner.
posted by delmoi at 7:16 PM on April 25, 2008


You may want to consider the DjVu file format for storing the images.
DjVu (pronounced déjà vu) is a computer file format designed primarily to store scanned images, especially those containing text and line drawings. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal images. This allows for high quality, readable images to be stored in a minimum of space, so that they can be made available on the web.
posted by Sitegeist at 2:51 AM on April 26, 2008


Response by poster: Sitegeist: thanks for the DjVu tip -- i'm checking that out.
posted by garfy3 at 5:41 AM on April 26, 2008


From a preservation standpoint, I'd urge you to use a lossless, well-supported format. You want something that is openly documented, widely adopted, cross-platform, uses either no compression or lossless compression, and, if possible, is not proprietary. TIFF might be a good option for this project. While you could use OCR (your margin doodles should be ignored by the process), you could also use tags; TIFF is tag-based. If you do decide to tag, pick specific terms that you'll use throughout the process and stick with them. That will make it possible for you to search and sort effectively.
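
As a small illustration of sticking to a fixed set of terms, here is a Python sketch of a controlled vocabulary. It keeps the tags in a plain dictionary alongside the files rather than writing them into the TIFFs themselves, and the tag terms and file names are invented for the example.

```python
# Controlled vocabulary: these terms are made-up examples, pick your own.
ALLOWED_TAGS = {"lecture", "lab", "doodle", "equation", "todo"}


def normalize_tags(raw_tags):
    """Lowercase and strip tags; raise if any tag is outside the vocabulary."""
    cleaned = sorted({t.strip().lower() for t in raw_tags})
    unknown = [t for t in cleaned if t not in ALLOWED_TAGS]
    if unknown:
        raise ValueError(f"not in vocabulary: {unknown}")
    return cleaned


# Hypothetical index: image file name -> list of tags.
index = {
    "nb01_p0001.tif": normalize_tags(["Lecture", "doodle "]),
    "nb01_p0002.tif": normalize_tags(["lab"]),
    "nb02_p0001.tif": normalize_tags(["doodle", "equation"]),
}


def find(index, tag):
    """Return the sorted file names that carry the given tag."""
    return sorted(name for name, tags in index.items() if tag in tags)


print(find(index, "doodle"))
```

Rejecting anything outside the list is the point: a stray "doodles" or "sketch" would otherwise split your search results later.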

If you want more information on formats and how to choose the right one for your project, see the Library of Congress's work on Sustainability of Digital Formats, particularly their content categories for Still Images and Text. You might also be interested in their Format Descriptions to help you decide on a format.
posted by k8lin at 8:52 AM on April 26, 2008


seconding vito90- hire someone to do it for you...
posted by mhaw at 9:55 AM on April 26, 2008


A couple of things to think about:

-what is the unit of access that you'll want when organizing the digitized documents? Notebook? Page? Range of pages?
-are there any photographs or graphics other than b&w line drawings in the notebooks?
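
If the answer to the first question turns out to be "page, grouped by notebook", one way to bake that into the files is a zero-padded naming scheme. This is just a sketch in Python; the `nb03_p0042.jpg` pattern is a hypothetical convention, not something any particular tool requires.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: nb<notebook>_p<page>.<ext>, zero-padded so names sort correctly.
NAME_PATTERN = re.compile(r"^nb(\d{2})_p(\d{4})\.(?:jpg|tif|pdf)$")


def page_name(notebook, page, ext="jpg"):
    """Build a file name such as nb03_p0042.jpg."""
    return f"nb{notebook:02d}_p{page:04d}.{ext}"


def group_by_notebook(names):
    """Map notebook number -> sorted page numbers, skipping names that don't match."""
    groups = defaultdict(list)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            groups[int(match.group(1))].append(int(match.group(2)))
    return {notebook: sorted(pages) for notebook, pages in groups.items()}


names = [page_name(3, 12), page_name(3, 2), page_name(1, 1), "IMG_0042.jpg"]
print(group_by_notebook(names))
```

Straight-off-the-camera names like `IMG_0042.jpg` are skipped here, which makes it easy to spot pages that still need renaming.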

A typical business/legal/governmental imaging approach to digitizing documents is bitonal scanning (300-600dpi) to TIFF with CCITT Group 4 compression. This creates very small files that are losslessly compressed, but works best if your source material is truly bitonal (i.e. black and white only -- even light pencil markings can sometimes be lost in the thresholding process). Group 4 TIFFs also turn into PDFs very nicely; I would consider PDF as an access/use format--there are lots of ways to manage libraries of PDF documents. Plus, you can use features of PDF to insert "bookmarks" or other structures & metadata into your digital files (since OCRing handwriting is not so reliable right now, I'd look for a solution where you can add some structure and tags within your documents, at minimum).
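
To show what "lost in the thresholding process" means, here is a toy Python sketch. It is not how any particular scanner does it: the 4x4 "page" and its gray values are invented, with 30 standing in for dark ink and 190 for a light pencil mark.

```python
# Tiny grayscale "page": 0 = black, 255 = white paper.
# Values are invented: 30 is dark ink, 190 is a light pencil mark.
PAGE = [
    [255, 255, 255, 255],
    [255,  30,  30, 255],
    [255, 190, 190, 255],
    [255, 255, 255, 255],
]


def to_bitonal(gray, threshold):
    """Return 1 (black) where a pixel is darker than the threshold, else 0 (white)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def count_black(bitonal):
    """Count the pixels that ended up black."""
    return sum(sum(row) for row in bitonal)


for threshold in (128, 200):
    print(threshold, count_black(to_bitonal(PAGE, threshold)))
```

At a threshold of 128 only the two ink pixels survive and the pencil row turns to white paper; raising it to 200 keeps all four marks, but on a real page a higher threshold also starts picking up shadows and paper texture.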

You could go ahead with your plan to photograph -- although like others, I think it's gonna be more of a PITA than you think, with the set-up, page flattening, lighting, aligning, cropping, etc. -- and still convert your camera RAW or jpegs to bitonal tiff. But all in all, I'm thirding the suggestion to pay someone to do the scanning for you.
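
For comparing a photo against the 300-600dpi range mentioned above, here is a quick back-of-the-envelope Python sketch. It assumes the 7 megapixel camera produces 3072 x 2304 pixel frames (a common 4:3 size) and that a letter-size page fills the whole frame; both are assumptions, so check your own camera and paper.

```python
# Assumptions: 3072 x 2304 pixel frames and an 8.5 x 11 inch page filling the frame.
def effective_dpi(px_long, px_short, page_long_in, page_short_in):
    """Pixels per inch when the page fills the frame; the tighter axis sets the limit."""
    return min(px_long / page_long_in, px_short / page_short_in)


dpi = effective_dpi(3072, 2304, 11.0, 8.5)
print(f"about {dpi:.0f} dpi")
```

Under those assumptions the result lands a bit under 300 dpi, and any margin left around the page in the frame pulls it lower.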
posted by alb at 7:28 PM on April 26, 2008


if you scan or photograph the pages and then compile them into a PDF with acrobat pro, you can run OCR that will make them searchable (incl. through google desktop or spotlight) without messing up the formatting or losing the images. I don't know enough about the various OCR options to rate acrobat's OCR quality, but it worked well for a small project I had. Plus your pages stay in "book" format, which is nice if you ever want to read through them in order again. I personally wouldn't use iPhoto, but if you do, the iPhoto Library Manager works as advertised for keeping multiple libraries (esp. if it's on an external drive that won't always be available).
posted by Chris4d at 2:18 PM on May 2, 2008

