<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Program to find and extract images in book scans?</title>
	<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans/</link>
	<description>Comments on Ask MetaFilter post Program to find and extract images in book scans?</description>
	<pubDate>Fri, 03 Oct 2008 08:08:39 -0800</pubDate>
	<lastBuildDate>Fri, 03 Oct 2008 08:08:39 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Program to find and extract images in book scans?</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans</link>	
		<description>Is there a program that will automatically take scans of pages that contain text and images and then extract the images and place them in separate files? Open source would be best. &lt;br /&gt;&lt;br /&gt; I have hundreds of pages of a scanned book that contain text and images. I&apos;d like to have all the images grabbed out of those scans and placed in their own files, with unique names, and with some kind of indication in the image file names which original scan they came from. This is a monstrous chore to do manually, but it strikes me as something that should be fairly easy to do programmatically.</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.103319</guid>
		<pubDate>Fri, 03 Oct 2008 07:35:47 -0800</pubDate>
		<dc:creator>Mo Nickels</dc:creator>
		
			<category>scans</category>
		
			<category>images</category>
		
			<category>scanning</category>
		
			<category>image</category>
		
			<category>opensource</category>
		
			<category>resolved</category>
		
	</item> <item>
		<title>By: JJ86</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496773</link>	
		<description>Not very likely. It is very complex to do programmatically unless all the scanned images are in the same location/size on each page.&lt;br&gt;
&lt;br&gt;
You might be able to create an action in Photoshop that could do it, but it would be very complicated and might not work well, if at all.&lt;br&gt;
&lt;br&gt;
If you want open source, the best place to start searching is &lt;a href=&quot;http://sourceforge.net&quot;&gt;SourceForge&lt;/a&gt;. Good luck.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496773</guid>
		<pubDate>Fri, 03 Oct 2008 08:08:39 -0800</pubDate>
		<dc:creator>JJ86</dc:creator>
	</item><item>
		<title>By: mmascolino</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496785</link>	
		<description>Actually, in the grand scheme of image processing/analysis this sounds relatively easy (emphasis on relatively).  You&apos;d basically be looking for rectangular areas that were mostly not white.  You&apos;d have to compensate for the rectangles not being exactly rectangular due to how the pages were scanned and deal with the vagueness of what it means to be not-white.&lt;br&gt;
&lt;br&gt;
The problem is that this sounds like a really special-purpose app, so it&apos;s possible that nobody has written and released a tool that does what you want.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496785</guid>
		<pubDate>Fri, 03 Oct 2008 08:21:41 -0800</pubDate>
		<dc:creator>mmascolino</dc:creator>
	</item><item>
		<title>By: scruss</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496786</link>	
		<description>It&apos;s going to be hard to detect images programmatically. I&apos;ve done this manually for several books: working from copies of the original files, open them 32 pages at a time in GIMP, roughly crop out the images, and save them. If there are multiple images on a page, duplicate the image in GIMP before you crop. You can then, if you have a clean white background, splat the rough crops through pnmcrop to clean off the excess.&lt;br&gt;
&lt;br&gt;
You might want to use something like &lt;a href=&quot;http://unpaper.berlios.de/&quot;&gt;unpaper&lt;/a&gt; to batch clean and deskew the scans. I wish it had existed when I was doing this sort of thing.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496786</guid>
		<pubDate>Fri, 03 Oct 2008 08:22:19 -0800</pubDate>
		<dc:creator>scruss</dc:creator>
	</item><item>
		<title>By: mrbarrett.com</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496788</link>	
		<description>Also useful for digitizing all those girly magazines in the closet...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496788</guid>
		<pubDate>Fri, 03 Oct 2008 08:23:45 -0800</pubDate>
		<dc:creator>mrbarrett.com</dc:creator>
	</item><item>
		<title>By: Mo Nickels</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496847</link>	
		<description>By the way, Adobe Acrobat Pro will do it as part of its OCR function, but its crops tend to be poor, leaving most images clipped one way or another. It&apos;s better than nothing but it&apos;s not good enough for daily use.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496847</guid>
		<pubDate>Fri, 03 Oct 2008 09:18:12 -0800</pubDate>
		<dc:creator>Mo Nickels</dc:creator>
	</item><item>
		<title>By: bashos_frog</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1496914</link>	
		<description>Not sure, but would something like &lt;a href=&quot;http://evernote.com/&quot;&gt;Evernote&lt;/a&gt; help?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1496914</guid>
		<pubDate>Fri, 03 Oct 2008 10:27:29 -0800</pubDate>
		<dc:creator>bashos_frog</dc:creator>
	</item><item>
		<title>By: rajbot</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1497300</link>	
		<description>I wrote an image extractor for the Internet Archive that grabbed image coordinates from the XML output of a proprietary OCR engine. If the books you have are public domain, upload the images to archive.org and we can run them through the same OCR system. The image coordinates are very accurate.&lt;br&gt;
&lt;br&gt;
If I were going to redo this using open source tools, I would use the layout analysis engine from OCRopus to get image coordinates.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1497300</guid>
		<pubDate>Fri, 03 Oct 2008 18:20:26 -0800</pubDate>
		<dc:creator>rajbot</dc:creator>
	</item><item>
		<title>By: kristi</title>
		<link>http://ask.metafilter.com/103319/Program-to-find-and-extract-images-in-book-scans#1497381</link>	
		<description>Maybe I&apos;m mistaken, but I think OmniPage Pro does this reasonably well. Not open source, not cheap, and no trial version, although I seem to remember them having a money-back guarantee.&lt;br&gt;
&lt;br&gt;
Have you tried &lt;a href=&quot;http://code.google.com/p/ocropus/downloads/list&quot;&gt;OCRopus&lt;/a&gt;? I don&apos;t see a feature list, so I don&apos;t know whether it purports to extract images into files, but it might be worth a try.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103319-1497381</guid>
		<pubDate>Fri, 03 Oct 2008 21:35:03 -0800</pubDate>
		<dc:creator>kristi</dc:creator>
	</item>
	</channel>
</rss>
