Organizations like Google and the Open Content Alliance are, of course, digitizing millions of public domain books. The OCA provides a variety of downloadable files (JPEG 2000, PDF in full color or b/w, DjVu) for its content. Their intent in scanning is to create images that look great on a computer screen. Printing and binding are not a consideration for them.
Here is a sample title. Its various files are located here.
So, of the variety of files offered, which would be the easiest to work with in order to create a legible paper copy from a vendor like Lulu.com? Things I'd need to be able to do:
- add binding margin, or occasionally margin all around,
- normalize page size (as images sometimes seem randomly cropped) with minimal image distortion,
- if working with the OCA PDF files, strip hidden characters (OCR),
- batch process, or minimize page-by-page work so that each can be prepared in less than 60 minutes,
- clean up shadows or distracting grayscale fields behind the text,
- deliver small-ish b/w PDF
I have had marginal (ha, get it?!) success with a few of these steps depending on which source I begin with, but no success on all points. I have access to Windows, Mac, and Linux operating systems, and Adobe Acrobat Pro on the Mac, but no budget for other pro tools. I'd love to hear of pro tools that would do the job, though, if there's a true magic bullet for all of the above processes.
I see this kind of material being put out by the University of Michigan and Cornell, but I'm not sure if they're starting from files meant for this kind of operation. Oh, and the 'why' and 'should you even try' questions are someone else's domain; I'm just trying to figure out how.
For actual page processing, my instincts would be to work with something like Photoshop: record some well-calculated actions, rasterize the pages (or possibly use the JPEG 2000 files) and batch process them all. Rasterizing would also take care of the hidden OCR text, since only the page image survives. Photoshop itself can cost a pretty penny, but there are free alternatives whose praises are often sung. The GIMP is your most likely bet, but I am not familiar with its depth and I don't know if its batch processing is nearly as capable as Photoshop's.
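If you'd rather script it than record actions, here is a rough sketch of the same batch idea in Python using the Pillow imaging library. To be clear, this is my own illustration and not something I've run on OCA files: the function names, page size, margin, and threshold are all made-up starting values you'd need to tune per book, and reading JPEG 2000 files assumes your Pillow install was built with OpenJPEG support.

```python
from PIL import Image


def prepare_page(img, page_size=(1800, 2700), binding_margin=150,
                 threshold=160, left_hand=False):
    """Fit one scanned page onto a fixed-size white canvas and make it b/w.

    All default values are illustrative assumptions, not recommendations.
    """
    gray = img.convert("L")

    # Printable area is the page minus the binding margin on one side.
    avail_w = page_size[0] - binding_margin
    avail_h = page_size[1]

    # One scale factor for both axes, so the image is never distorted.
    scale = min(avail_w / gray.width, avail_h / gray.height)
    new_size = (max(1, round(gray.width * scale)),
                max(1, round(gray.height * scale)))
    fitted = gray.resize(new_size, Image.LANCZOS)

    # Center the scan in the printable area. Right-hand pages get the
    # margin on the left edge, left-hand pages on the right edge.
    page = Image.new("L", page_size, 255)
    x = (avail_w - fitted.width) // 2 + (0 if left_hand else binding_margin)
    y = (avail_h - fitted.height) // 2
    page.paste(fitted, (x, y))

    # Anything lighter than the threshold becomes pure white, which drops
    # light shadows and gray fields behind the text.
    bw = page.point(lambda v: 255 if v > threshold else 0)
    return bw.convert("1")


def build_pdf(image_paths, out_path, dpi=300):
    """Process page images in reading order and write a single b/w PDF."""
    pages = []
    for index, path in enumerate(image_paths):
        with Image.open(path) as img:
            # Assumes the first image is a right-hand page.
            pages.append(prepare_page(img, left_hand=(index % 2 == 1)))
    pages[0].save(out_path, save_all=True,
                  append_images=pages[1:], resolution=dpi)
```

You'd call something like build_pdf(sorted(list_of_page_files), "book.pdf"). A fixed threshold is the crudest possible cleanup, so expect to adjust it, or swap in something smarter, for pages with heavy shadows.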
Two tools that also help with myriad PDF processing issues are Quite software (Quite a Box of Tricks and Quite Imposing are what I've used) and PitStop Professional. Unfortunately, each costs hundreds of dollars, so they're not what you're looking for. Just thought I'd point them out. While pretty powerful, they're also quite kludgey and often frustrating, so don't think they'd be your 'magic bullet.'
Good luck.
posted by shemko at 12:55 AM on September 19