Now that we've put the books online, how do we get them back onto paper?
September 18, 2009 1:40 PM
How do I prepare digitized public domain works for clean print on demand?
Organizations like Google and the Open Content Alliance are, of course, digitizing millions of public domain books. The OCA provides a variety of downloadable files (jpeg 2000, pdf full color or b/w, djvu) for their content. Their intent in scanning is to create images that are great for viewing on a computer screen. Printing and binding is not a consideration for them.
Here is a sample title. Its various files are located here.
So, of the variety of files offered, which would be the easiest to work with in order to create a legible paper copy from a vendor like Lulu.com? Things I'd need to be able to do:
- add binding margin, or occasionally margin all around,
- normalize page size (as images sometimes seem randomly cropped) with minimal image distortion,
- if working with the OCA PDF files, strip hidden characters (OCR),
- batch process, or minimize page-by-page work so that each can be prepared in less than 60 minutes,
- clean up shadows or distracting grayscale fields behind the text,
- deliver small-ish b/w PDF
I have had marginal (ha, get it?!) success with a few of these steps depending on which source I begin with, but no success on all points. I have access to Windows, Mac, and Linux operating systems, and Adobe Acrobat Pro on the Mac, but no budget for other pro tools. I'd love to hear of pro tools that would do the job, though, if there's a true magic bullet for all of the above processes.
I see this kind of material being put out by the University of Michigan and Cornell, but I'm not sure if they're starting from files meant for this kind of operation. Oh, and the 'why' and 'should you even try' questions are someone else's domain; I'm just trying to figure out how.
I'm not sure about cleaning up the images, but, on a Mac, I bet you could use some workflow with Automator and Preview or some other free image program. (I believe the tutorial shows you how to batch convert color images to b/w.)
The margins are going to be determined by how many pages you're printing, which also affects the binding you can use. Remember that margins are different for right-hand and left-hand pages. The inner margin, called the gutter, is the one affected by the binding. (Remember, page 1 is always a right-hand page.) I suggest you print out sample pages on 8.5 x 11 paper and measure how much more space you need according to Lulu's specs.
To actually set the margins, the university presses probably used InDesign, but I think you'll be able to use Word or a free word processing program that understands what a gutter is. You'll create a blank document with the correct margins and place an image on each page. Then you'll save the whole thing as a PDF.
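The "blank page with the correct margins, one image per page" step comes down to a little arithmetic, sketched here in Python rather than in a word processor. The 6 x 9 inch trim size and the margin values are placeholder assumptions, not Lulu's actual specs, and `place_scan` and `Placement` are names made up for this sketch. It only works out where each scan would sit; it doesn't write a PDF.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    x: float       # left edge of the scan on the page, in points
    y: float       # bottom edge of the scan on the page, in points
    width: float   # scaled width of the scan, in points
    height: float  # scaled height of the scan, in points


def place_scan(page_number, scan_w, scan_h,
               page_w=432.0, page_h=648.0,  # assumed 6 x 9 in trim (72 pt per inch)
               gutter=54.0, outer=36.0,     # assumed 0.75 in gutter, 0.5 in outer margin
               top=36.0, bottom=36.0):      # assumed 0.5 in top and bottom margins
    """Fit one scan inside the margins of one page, keeping its aspect ratio."""
    box_w = page_w - gutter - outer
    box_h = page_h - top - bottom
    # One scale factor for both axes, so the image is never stretched.
    scale = min(box_w / scan_w, box_h / scan_h)
    width, height = scan_w * scale, scan_h * scale
    # Odd pages are right-hand pages, so their gutter is on the left.
    left_margin = gutter if page_number % 2 == 1 else outer
    # Centre the scan inside the box left over after the margins.
    x = left_margin + (box_w - width) / 2
    y = bottom + (box_h - height) / 2
    return Placement(x, y, width, height)
```

With these assumed numbers, a 684 x 1026 pixel scan is scaled to 342 x 513 points and starts 54 points from the left edge on page 1 but 36 points from the left edge on page 2, which is the right-hand/left-hand difference described above. Because randomly cropped scans are all fitted into the same box, this also takes care of normalizing the page size without distorting the image.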
You can probably automate placing these files using AppleScript. Your script would have a loop and a counter; the PDFs you place would share a common filename and differ only by a serially incremented number. (You can use A Better Finder Rename to change the filenames in a batch.) In the AppleScript, you'll basically go to page x and place bookImagex.pdf and so on.
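The loop-and-counter idea looks roughly like this, written in Python instead of AppleScript just to show the shape of it. The `bookImage` prefix comes from the example above; the four-digit zero padding and the function name `placement_plan` are assumptions of this sketch (padding keeps the files in page order when sorted by name).

```python
def placement_plan(page_count, prefix="bookImage", ext=".pdf", digits=4):
    """Return (page number, side, filename) for every page of the book."""
    plan = []
    for page in range(1, page_count + 1):
        # Page 1 is a right-hand page, so odd pages are right and even pages are left.
        side = "right" if page % 2 == 1 else "left"
        # The counter is the only part of the filename that changes.
        filename = f"{prefix}{page:0{digits}d}{ext}"
        plan.append((page, side, filename))
    return plan


for page, side, filename in placement_plan(3):
    print(page, side, filename)
```

Whatever does the actual placing (a script, a layout program) would then walk through that list one page at a time.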
posted by ifandonlyif at 8:54 AM on September 19, 2009
For actual page processing, my instinct would be to work with something like Photoshop: record some well-calculated actions, rasterize the pages (or possibly use the JPEG 2000 files) and batch process them all. Photoshop itself can cost a pretty penny, but there are free alternatives whose praises are often sung. The GIMP is your most likely bet, but I am not familiar with its depth and I don't know if it has batch processing nearly as complex as Photoshop's.
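Whatever program runs the batch, the step that gets rid of shadows and gray fields behind the text is at heart a threshold: anything darker than some cutoff becomes black, everything else becomes white. Here is a bare-bones sketch in plain Python on a grid of 0-255 gray values. The cutoff of 160 is an arbitrary assumption you'd tune per book, `to_black_and_white` is a made-up name, and a real run would read and write the page images with an image library rather than nested lists.

```python
def to_black_and_white(gray_rows, cutoff=160):
    """Map each 0-255 gray value to 0 (black) or 255 (white)."""
    return [
        [0 if value < cutoff else 255 for value in row]
        for row in gray_rows
    ]


# Tiny stand-in for a scanned page: dark ink (30), gray shadow (200, 180), paper (250).
page = [
    [30, 200, 250],
    [180, 30, 250],
]
cleaned = to_black_and_white(page)
```

A single fixed cutoff is the crudest version of this; it will struggle on pages where the shadow is nearly as dark as the ink, which is where page-by-page fiddling tends to creep back in.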
Two tools that also help with myriad PDF processing issues are Quite software (Quite a Box of Tricks and Quite Imposing are what I've used) and PitStop Professional. Unfortunately, each costs hundreds of dollars, so they're not what you're looking for. Just thought I'd point them out. While pretty powerful, they're also quite kludgey and often frustrating, so don't think they'd be your 'magic bullet.'
Good luck.
posted by shemko at 12:55 AM on September 19, 2009