Looking for mostly text .tif files
April 26, 2012 11:41 AM

I'm looking to download a large number, at least 500, of mostly text .tif files to use in developing a piece of software.

Optimal would be a WikiLeaks email data set, but I haven't been able to find that as .tif files.
posted by rakish_yet_centered to Computers & Internet (12 answers total)
 
Call your county recorder -- most have gone digital, and because they're public records available to anyone there are no privacy issues about disclosure, so they may be able to just dump a bunch onto a thumb drive or CD for you.
posted by AzraelBrown at 11:58 AM on April 26, 2012


Also, the free version of Bullzip PDF Printer will let you print almost anything to a TIFF, with the option to save as Group 4, 1-bit images, like a fax and like most OCR software prefers. So, load a large WikiLeaks file, print it to a TIFF using Bullzip, and you'll have your TIFF version.
posted by AzraelBrown at 12:01 PM on April 26, 2012


What are the parameters of what you need? Does it need to be scanned? If not, there are tons of utilities that will export a PDF as a series of TIFFs, and PDFs abound.
posted by supercres at 12:01 PM on April 26, 2012
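
One way the PDF-to-TIFF export could look, as a minimal sketch rather than anything a commenter actually posted: it assumes Ghostscript is installed and reachable as `gs`, and uses its `tiffg4` device to write one 1-bit, Group 4 TIFF per page. The function names, the 300 dpi default, and the output naming pattern are all made up for illustration.

```python
import subprocess
from pathlib import Path


def ghostscript_tiff_command(pdf_path, out_dir, dpi=300, gs_binary="gs"):
    """Build a Ghostscript argv that writes one Group 4 TIFF per PDF page.

    Assumption: `gs_binary` names a Ghostscript executable on the PATH.
    """
    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    # Ghostscript expands %04d to the page number (0001, 0002, ...).
    output_pattern = out_dir / f"{pdf_path.stem}-%04d.tif"
    return [
        gs_binary,
        "-dNOPAUSE",            # do not pause between pages
        "-dBATCH",              # exit when the input is finished
        "-dSAFER",              # restrict file access from the PDF
        "-sDEVICE=tiffg4",      # 1-bit TIFF with CCITT Group 4 compression
        f"-r{dpi}",             # rendering resolution in dpi
        f"-sOutputFile={output_pattern}",
        str(pdf_path),
    ]


def pdf_to_tiffs(pdf_path, out_dir, dpi=300, gs_binary="gs"):
    """Run Ghostscript on a single PDF; raises if the command fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    command = ghostscript_tiff_command(pdf_path, out_dir, dpi, gs_binary)
    subprocess.run(command, check=True)
```

Calling `pdf_to_tiffs("report.pdf", "tiffs")` on a hypothetical `report.pdf` should leave files such as `tiffs/report-0001.tif` behind, one per page.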


Response by poster: Getting PDFs first? Could do that, but it isn't optimal. When I poked around WikiLeaks I only found emails embedded in HTML pages. I like the county recorder idea though.

Scanned? Yes. It's a document review program that converts tif to txt, creates a thumbnail, that sort of thing.
posted by rakish_yet_centered at 12:11 PM on April 26, 2012


Response by poster: Scanned? I meant no...
posted by rakish_yet_centered at 12:18 PM on April 26, 2012 [1 favorite]
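
Since the program takes TIFFs as input, it may help to screen a collected test set before running anything on it. The sketch below is only an illustration, not part of the poster's software: it checks the first four bytes of each file for a classic TIFF header (byte-order mark `II` or `MM` followed by the number 42) and counts the files that pass, which makes it easy to see whether a folder reaches the 500-file target. The function names are hypothetical, and BigTIFF files (magic number 43) are deliberately not counted.

```python
import struct
from pathlib import Path


def is_tiff_header(header):
    """Return True if the bytes start with a classic TIFF header."""
    if len(header) < 4:
        return False
    if header[:2] == b"II":      # little-endian ("Intel") byte order
        fmt = "<H"
    elif header[:2] == b"MM":    # big-endian ("Motorola") byte order
        fmt = ">H"
    else:
        return False
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42           # classic TIFF magic number


def count_tiffs(directory):
    """Count files under `directory` named .tif/.tiff with a valid header."""
    count = 0
    for path in Path(directory).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".tif", ".tiff"}:
            with path.open("rb") as handle:
                if is_tiff_header(handle.read(4)):
                    count += 1
    return count
```

Something like `count_tiffs("corpus") >= 500` on a hypothetical `corpus` folder would then confirm the set is big enough.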


If you will drop the "TIFF" requirement up front, you might get more sources. ImageMagick will convert a batch of images to whatever format you want.
posted by cmiller at 12:55 PM on April 26, 2012
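
A rough sketch of what that batch conversion might look like when driven from a script, assuming ImageMagick is installed and its `convert` command is on the PATH (newer installs use `magick` instead, hence the `program` parameter). The function names and the list of accepted extensions are invented for this example; `-monochrome` and `-compress Group4` are standard ImageMagick options that give fax-style 1-bit output.

```python
import subprocess
from pathlib import Path

# Assumption: these are the source formats worth converting.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def imagemagick_commands(sources, out_dir, program="convert"):
    """Build one ImageMagick argv per image, skipping non-image files."""
    out_dir = Path(out_dir)
    commands = []
    for source in sources:
        src = Path(source)
        if src.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        dst = out_dir / (src.stem + ".tif")
        commands.append(
            [program, str(src), "-monochrome", "-compress", "Group4", str(dst)]
        )
    return commands


def convert_all(src_dir, out_dir, program="convert"):
    """Convert every supported image in src_dir to a Group 4 TIFF in out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    sources = sorted(Path(src_dir).iterdir())
    for command in imagemagick_commands(sources, out_dir, program):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to print and inspect the commands before converting a few hundred files.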


Response by poster: The .tif is not really a requirement. I use ImageMagick or GraphicsMagick, I forget which, to convert from image to text, and to image. But TIFF is better, because that's what the people I know actually used for document review projects.

You're right though, in the end I'll probably be using PDFs.
posted by rakish_yet_centered at 1:07 PM on April 26, 2012


http://archive.org/details/opensource_English

They are in PDF, but it should not be difficult to separate the PDFs into individual images.
posted by demiurge at 2:14 PM on April 26, 2012


Response by poster: Internet Archive is a good idea. Not exactly what I was looking for, but it might have to do.
posted by rakish_yet_centered at 3:03 PM on April 26, 2012


Why don't you just take a PDF book, pamphlet, etc., and save it into individual TIF files?
posted by wongcorgi at 5:27 PM on April 26, 2012


Best answer: I have over 6000 pages of evidence from the investigation into Lisa McPherson's death at The Lisa McPherson Files (formerly on MeFi Projects), all in TIF format.

Most are typed, but some (especially the Scientology-produced documents) are handwritten. You'd find mostly typed materials in the listing of police documents.

For example, this 1-page Florida Department of Law Enforcement summary has a link to its TIF file.

If these would be useful to you, feel free to MeMail me if I can make them easier for you to download. If you have Dropbox or an FTP directory, I'd be happy to send you over the whole collection of TIFs, or the subset that'd be most useful to you.
posted by kristi at 1:19 PM on April 28, 2012


Response by poster: Yes, kristi, that is exactly what I'm looking for. MeMailing.
posted by rakish_yet_centered at 6:33 PM on April 28, 2012

