help me find obscure ebooks and make them where necessary
September 17, 2006 7:49 AM

I care enough about getting all my physical media into digital form that I'm ready to hand-scan my remaining books where necessary. Before I do this, what are all the ways I should search to be sure a given book doesn't exist as an ebook I could buy (or in any other digital form)? Also: any hints for efficiently doing the scanning & OCR?
posted by allterrainbrain to Media & Arts (10 answers total) 4 users marked this as a favorite
Are you sure you want to buy ebooks? As far as I know they're all DRM-ridden (except for the ones whose copyright has expired). The ones that I got are keyed to the credit card number I bought them with. Of course when I get a new card and/or a new device to view them on, I need to re-authorize them (if I can...) and re-download them. And the software. If it's available for my device.

Also, probably not many of your books exist as ebooks. The selection of ebooks compared to even the smallest bookstore is pretty awful.

On the OCR front, Google recently (re)released a pretty decent OCR app for free.

However the best way to do it would be to rent time on a professional book scanner. See if a library, school, university, museum or historical society around you has one. Otherwise, it's either going to be a pain or cost you tens of thousands of dollars.
posted by Ookseer at 8:16 AM on September 17, 2006

Response by poster: I will look into getting cheap time on a dedicated book scanner -- thanks!

I think a clumsily-DRM'd ebook is still a much better value than putting in the human hours to hand-scan, OCR and proof the text of a whole book.

I'm primarily asking this question to make sure I'm not missing a special site or way to search for existing ebooks.
posted by allterrainbrain at 8:29 AM on September 17, 2006

Just curious ... what's the impetus for wanting to give up physical media?
posted by jbickers at 9:18 AM on September 17, 2006

Don't forget to check Project Gutenberg for books.
posted by zerokey at 9:31 AM on September 17, 2006

IRC, if you can get over downloading already scanned/OCR'ed copies of books that you already own: #ebooks

The best book scanner on the market for a consumer is the Plustek OpticBook 3600. It's not that great for mass market paperback books (because it needs about 5 mm for the gutter), but for your own collection of hardbacks/trade paperbacks it's really gentle on the spine (since you don't have to flex it open past about 100 degrees).

I wouldn't buy an ebook that has been crippled by DRM.

I like to read on a PDA. Call me crazy but it's damned convenient. I hand-scan on occasion. I can scan a 400-page book on the above scanner in an hour; OCR takes about 15 minutes, plus 15 minutes or so for spell-checking and file clean-up. But I've been doing it for a while and I've gotten pretty good at it over time. I was not that fast in the beginning.
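
A rough sketch of the arithmetic behind those figures, assuming the quoted times (60 minutes scanning, 15 OCR, 15 clean-up for 400 pages) scale linearly with page count. The function name and the 100-book library are made up for illustration, and a beginner should expect to be slower.

```python
# Times quoted above for one 400-page book, in minutes.
SCAN_MIN = 60
OCR_MIN = 15
CLEANUP_MIN = 15
REFERENCE_PAGES = 400


def estimate_minutes(pages: int) -> float:
    """Estimate total minutes to digitize one book of the given length.

    Assumption: effort scales linearly with page count, which is a
    simplification rather than anything measured.
    """
    total_for_reference = SCAN_MIN + OCR_MIN + CLEANUP_MIN
    return pages * total_for_reference / REFERENCE_PAGES


if __name__ == "__main__":
    print(f"One 400-page book: {estimate_minutes(400):.0f} minutes")
    # Hypothetical library: 100 books averaging 300 pages each.
    library_hours = 100 * estimate_minutes(300) / 60
    print(f"100 books of 300 pages: {library_hours:.1f} hours")
```

At these rates a single 400-page book comes to about an hour and a half of hands-on time, so the totals add up quickly for a whole shelf.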
posted by i_am_a_Jedi at 10:56 AM on September 17, 2006

Response by poster: To answer jbickers, I've always been super-minimal in terms of the physical stuff I own (I spent a lot of my childhood living in a bus and traveling penniless-hippie style, and I'm still traveling/floating some as an adult). I care about the flexibility and the near-weightlessness of electronic text.

(People forget that other media types can't really be called weightless... e.g., if you have a 10-oz external drive with 30 movies on it, they each physically weigh a third of an ounce. Probably more than a physical DVD weighs. But they're way more flexible to use, organize, back up & physically store/protect; and of course the higher your drive capacity, the less they would weigh each.)
posted by allterrainbrain at 12:24 PM on September 17, 2006

Response by poster: Thanks a lot for the OpticBook 3600 suggestion! It's selling for as low as $220 new as of this writing (nothing cheaper on eBay). It's Windows-only, but I can deal with that.

Here's a review:

and some very good closeup pics:

posted by allterrainbrain at 12:38 PM on September 17, 2006

[...] has about the easiest-to-live-with DRM around...books are keyed to your credit card number, and since they keep your complete download library available online they make re-keying your books very easy. Readers available for Windows, Palm, Windows Mobile and Symbian. Not sure about other OSs.
posted by lhauser at 7:46 PM on September 17, 2006

If you're going to go for the 'lots of OCR' option, you might be interested in browsing the forums at PGDP, where the people who do a lot of the proofing for Project Gutenberg have developed all sorts of tools for speeding up the process of turning raw OCR output back into books. I don't know how they work or exactly what they do. That probably has something to do with the way my first project is still lurking, unprocessed, in a dark corner of my hard drive, and also with the way they mostly don't work on OS X unless you know how to do complicated setup stuff.

You've got to register to see the forums, but it's not like you've got to promise you'll only use your skills on public domain text....
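
For a sense of the kind of cleanup that raw OCR output needs, here is a minimal sketch. It is not one of the PGDP tools, and the function names are made up for illustration. It assumes plain-text OCR output with hard line breaks, and it only rejoins words split by a hyphen at a line end and reflows each paragraph onto one line.

```python
import re


def dehyphenate(text: str) -> str:
    """Rejoin words that were split with a hyphen at a line break.

    Caveat: a genuinely hyphenated compound that happens to break at its
    hyphen (e.g. "well-" / "known") is merged too and needs a manual check.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def reflow(text: str) -> str:
    """Join the lines of each paragraph; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def clean_ocr(text: str) -> str:
    """Apply both cleanup steps in order."""
    return reflow(dehyphenate(text))


if __name__ == "__main__":
    raw = (
        "It was a dark and stor-\n"
        "my night; the rain\n"
        "fell in torrents.\n"
        "\n"
        "A new para-\n"
        "graph begins here.\n"
    )
    print(clean_ocr(raw))
```

Proofreading against the page images is still needed afterwards; a pass like this only removes the most mechanical damage.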
posted by Lebannen at 6:30 AM on September 18, 2006

I've looked into this a lot, because I'd really like a searchable, annotatable, digital copy of my physical library. I've been slowly working on that for a few years, and of the 500 or so books I have, I've probably been able to find about 250-300 of them in open formats in various places online. First, for nicely-formatted versions of most of the Gutenberg stuff, there is Manybooks. Also, there was a site called Blackmask that had all the Gutenberg stuff plus their own additional scanning/OCRing projects, but those got them into trouble and they got shut down last spring. They sold CD sets with their whole archive, so you might be able to find it on eBay or Usenet.

Outside of public domain stuff, you have to go to really obscure corners of the net to find anything. The best two places are the various ebooks binaries groups on Usenet and the #bookz channel on IRC (Undernet). #bookz is really the place to go to find out the full extent of what's out there, though new stuff tends to pop up first on the newsgroups. I gather a lot of the same people are involved in both, just from looking at the handles that tend to pop up in both places. This is, of course, not exactly legal, and not totally under the radar, so be careful.

As far as what actually is out there, well, it reflects the tastes of the kind of people who would go to the trouble of scanning and proofing books and uploading them to Usenet or distributing them via bots on IRC. In other words, there are metric tons of bad scifi/genre, comics, and technical nonfiction. Everything else is very hit-and-miss. Classic 20th century fiction has been fleshed out a lot better over the past few years, to the point that I can find most of what's in my library in that area. New bestseller fiction tends to pop up pretty quickly. Other than that, it's a total crapshoot. You're not likely to find much in the way of politics/history, humanities, poetry, and so on, though every once in a while you'll be surprised.

Beyond that, you can search the various P2P networks and turn something interesting up once in a while, and I bet BitTorrent is getting to be a better source of this kind of thing too, though I haven't looked into it much yet. I've even found whole in-copyright books in HTML by just Googling for them sometimes, though those tend not to last long of course.
posted by jdunn_entropy at 1:01 PM on September 18, 2006
