A program that searches .tif files?
December 5, 2010 9:40 AM

Is there an inexpensive or free way to search a large number of .tif files for words or phrases within the documents?

I have 10 CDs of documents, with more than 10,000 documents overall. Most of the information is not pertinent to what I need, but there may very well be 2 or 3 things I need for a project I am working on. I know there are eDiscovery products like Concordance and Summation that are made for this, but they are not cost-effective, and this may be the only time I will need to do this.

I have been using IrfanView to view the documents as a slideshow, but this is very time-consuming. Any ideas?
posted by readery to Computers & Internet (7 answers total)
 
Evernote scans images for text, but I don't think it works on TIFF files. Perhaps you could use something like GraphicConverter to bulk-convert them to JPEGs (possibly at a smaller size) and then import them into Evernote.
posted by kdern at 9:45 AM on December 5, 2010


Best answer: Get a trial version of Acrobat and make it batch-convert the TIFFs to PDFs. Then batch-OCR the PDFs. Optionally batch-index them. Done. Probably still 2 to 3 hours of work, though.
posted by oxit at 9:48 AM on December 5, 2010
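Acrobat's own batch index is what this answer suggests. If you would rather search outside Acrobat, and can get the OCR'd text out as one plain-text file per TIFF (how you export it depends on your OCR tool), a small word index is a rough stand-in for that last step. What follows is a minimal Python sketch under that assumption; the folder layout and the names load_texts, build_index and search are hypothetical and not part of Acrobat or any other product mentioned in this thread.

```python
import re
from collections import defaultdict
from pathlib import Path

# Simple word pattern: lowercase letters, digits and apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return WORD.findall(text.lower())


def load_texts(folder) -> dict[str, str]:
    """Read every .txt file in folder (assumed: OCR output, one file per TIFF)."""
    return {p.name: p.read_text(errors="ignore") for p in sorted(Path(folder).glob("*.txt"))}


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of file names whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in texts.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], texts: dict[str, str], phrase: str) -> list[str]:
    """Return sorted names of files containing the phrase as consecutive words.

    The index narrows the candidates to files that contain every word; the
    phrase check then runs on the tokenized text, so case, punctuation and
    line breaks between the words do not matter.
    """
    words = tokenize(phrase)
    if not words:
        return []
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    needle = " " + " ".join(words) + " "
    return sorted(n for n in candidates if needle in " " + " ".join(tokenize(texts[n])) + " ")
```

For example, after texts = load_texts("ocr_text") and index = build_index(texts), search(index, texts, "acme corp") would list the files in which those two words appear next to each other ("ocr_text" and "acme corp" are placeholders). OCR mistakes will still cause misses, so it is worth trying a few spellings of each term.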


Seconding batch conversion and running them through Evernote... but watch the upload limit!
posted by Master Gunner at 10:53 AM on December 5, 2010


Best answer: oxit has it.

Specifically, File->Create PDF->From Multiple Files (namely, all the TIFFs, temporarily copied onto your local hard drive) and then Document->OCR Text Recognition->Recognize Text using OCR.

It will take a while to run, and the accuracy will, of course, vary with the quality of the images. DO NOT downsample them to JPEGs: JPEG is a lossy compression format, so it will give the OCR algorithm less data and reduce its accuracy.

If your images have no pagination or other embedded reference marks, note the order in which Acrobat reads them, so you can cite your discoveries later.
posted by d. z. wang at 11:11 AM on December 5, 2010
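The copy-to-your-hard-drive and note-the-order steps in this answer can be scripted. Below is a minimal Python sketch, assuming each CD (or a copy of it) is a folder you can pass in; gather_tiffs, the zero-padded prefix scheme and manifest.csv are hypothetical choices, not something Acrobat requires. Each copied file gets a sequence number, so sorting the copies by name keeps one fixed order, and a CSV records where each copy came from. The thread does not establish that Acrobat adds multiple files in name order, so check the order in the combined PDF before relying on it.

```python
import csv
import re
import shutil
from pathlib import Path

# Extensions treated as TIFF (compared case-insensitively).
TIFF_SUFFIXES = {".tif", ".tiff"}


def natural_key(path: Path):
    """Sort key that puts 'doc2.tif' before 'doc10.tif', ignoring case."""
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r"(\d+)", str(path))]


def gather_tiffs(sources, dest):
    """Copy every TIFF under the source folders (e.g. one per CD) into dest.

    Sources are processed in the order given; files within a source are taken
    in natural filename order. Each copy is named with a zero-padded sequence
    prefix, and dest/manifest.csv maps each copy back to its original path.
    Returns the manifest rows as (sequence, local_name, original_path).
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    found = []
    for src in sources:
        hits = [p for p in Path(src).rglob("*")
                if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES]
        found.extend(sorted(hits, key=natural_key))

    rows = []
    for seq, original in enumerate(found, start=1):
        local_name = f"{seq:05d}_{original.name}"
        shutil.copy2(original, dest / local_name)  # copy2 also keeps timestamps
        rows.append((seq, local_name, str(original)))

    with open(dest / "manifest.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sequence", "local_name", "original_path"])
        writer.writerows(rows)
    return rows
```

For example, gather_tiffs(["cd01", "cd02"], "local_tiffs") (placeholder folder names) would fill local_tiffs with numbered copies plus a manifest you can use to trace a finding back to its CD and original file. Keep the destination folder outside the source folders so the copies are not picked up again.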


For optimum clarity, I would first run them through Scan Tailor to clean them up a little, if they are scans.

Then use either ABBYY or Acrobat to convert them to searchable PDFs. I prefer ABBYY, but YMMV.

It would take an hour or so to set it all up; then it can be batch-processed while you are away from the computer.
posted by moochoo at 1:34 PM on December 5, 2010


If you have Microsoft Office (XP up to 2007), you already have OCR software (Microsoft Office Document Imaging) that works directly with the TIF files, so there is no need to convert to PDF.

Instructions here!
posted by geodave at 6:56 PM on December 5, 2010


If you batch-convert the images to JPG or PNG, you can then use Google Docs to do this.
posted by asymptotic at 5:43 AM on December 6, 2010

