How do I extract bitmaps from Excel documents?
July 5, 2005 3:35 AM   Subscribe

I have a number of Excel files with thousands of small bitmap images embedded. I need to extract these into individual files which a designer can put into Quark and match up with the original text.

I work for a very small publishing company and somebody came to us with a reference book they had put together as a hobby. My boss wants to publish it but when we recieved the files they were all in Excel files (and the writer has deleted the original images).

I managed to convert it into HTML and PNG's with openoffice.org but it gave the PNG's arbitrary names and there are just too many to pair them up manually. If they were named sequentially it would be fine.

The document was formated so that each row had one image and the relevent names / dates. There are 66 Excel files with on average 100 images each.

I am using a Mac with OS X, I could try PC based utilities but not immediatly so would prefer a Mac based solution.

If anybody can give suggestions it would be appreciated.
posted by skarmj to Computers & Internet (13 answers total)
 
Just how arbitrary are the PNG's names?
posted by orthogonality at 3:59 AM on July 5, 2005


Response by poster: For example assay1_html_3a8cbefc.png and assay1_html_7ae03be6.png

assay1 is the name of the original Excel file in these examples.

The pictures themselves can be very similar to each other because some are made of initials and dates. The book is about the manufacturer marks on antiques.
posted by skarmj at 4:18 AM on July 5, 2005


God DAMN I HATE when they submit in Office format. I shuddered when I saw "Excel", "Quark" and "images".

Have you checked the size of the PNGs? There's a very, very good chance (ie 100%) that they're going to be far too small to use for press. Office squishes and mangles when it imports images, and if the guy who created them didn't know well enough not to sodding put them into Excel, they probably weren't big enough in the first place.

Worth checking before going through the pain of what you're attempting.
posted by bonaldi at 4:35 AM on July 5, 2005


(What you want for your specific task, btw, is a script that lists the folder the images are in, sorting by date, then takes that list and renames all the files sequentially. It sounds v. trivial for a unix person to write, but I'm not really one of them, sorry )
posted by bonaldi at 4:39 AM on July 5, 2005


Response by poster: The images would need to be printed about 2cm high (about the size of the mark they represent in the real world) and the PNG files, according to photoshop, are about 10cm at 72res (2.5cm at 300res). Because they are only two-tone black on white images (like a silhouette) we hope to get away with it.

Unfortunately the dates of the files are all the same.

I was wondering whether a bulk downloading tool to leech images from websites might be able to rename from top of the page down. I looked at couple of extensions for firefox but they did not seem to rename files.
posted by skarmj at 5:24 AM on July 5, 2005


First idea, hop over to Mr Excel - you may get a broader selection of ideas...

For what it's worth, though, I think that it sounds like you'd be best off with some custom macro code. You can programmatically read each line and extract the relevant data (initials, date, whatever). It should also be possible to identify the objects (i.e. the pictures) embedded into the spreadsheet, and to save these off to a specific directory - which I would imagine will be named the same as the sheet name, and each image name made up of the extracted data?

I've just fired up my PC (don't have Excel on the Mac...) to have a play around - I'll drop you a note with any findings, but in the meantime hopefully this will give you some more avenues of research...
posted by Chunder at 5:52 AM on July 5, 2005


Response by poster: Thanks. Since I don't have Excel on this Mac (it is on the office PC though) I haven't been able to experiment with it much, and had not thought about using a macro in Excel either. That idea does sound right.

I'll have a look at Mr Excel now.
posted by skarmj at 6:30 AM on July 5, 2005


Hmm - possibly that wasn't quite as useful as I'd originally thought... it's possible to programmatically run through each picture on a sheet, copy the picture to the clipboard, and find the values of whatever cells you want offset from the cell in which the top left corner of the image falls... all using a tiny bit of code:

Sub test()

Dim o As Object

For Each o In ActiveSheet.Shapes
   If o.Type = 13 Then
      o.CopyPicture 'Stick the picture onto the clipboard
      var1 = Cells(o.TopLeftCell.Row, 1).Value 'This reads the cell value from column 1 in the same row as the picture...
      var2 = Cells(o.TopLeftCell.Row, 2).Value 'and this reads the cell value from column 2...
   End If
Next 'Breakpoint here, otherwise it will zoom through and you'll only ever see the last picture on the clipboard
     'as it will be overwritten each time it gets to the "CopyClipboard" command ;-)

End Sub


Unfortunately, I can't find a way to get things off the damned clipboard once they're there :-) There's a large (and complicated; perhaps over-complicated) solution here but that relies on Windows OLE linkages to Word and stuff - i.e. it opens Word (or a paint program, or whatever), pastes the clipboard contents as a new document, and saves it. I have a sneaky suspicion that this code won't work on the Mac ;-)

What version of MacOS are you using? If you've got Automator (which I think was part of Tiger; not sure if it was available before...) then perhaps you could check what options are made available from Excel - i.e. open automator, create a new workflow, and under the Applications, select Excel and see what Actions are shown...

Apart from that, I don't think that I can be any more help - sorry!

ARGH! When you preview, it removes things like the   code and replaces it with the actual character... doh!
posted by Chunder at 6:47 AM on July 5, 2005


Are the HTML files a series of tables? If so, you could try using the perl module HTML::TableContentParser. You could write a script that would, when encountering an IMG tag, capture the text beside it (i.e. names/dates) and rename the file to something a little easier to read.
posted by bachelor#3 at 7:37 AM on July 5, 2005


Response by poster: I found this via Mr Excel which exports from Excel the images as picture1.tif, picture2.tif and so on, but there are still two problems with it. First, I would need to reset the image sizes in Excel to 100% (they are currently all 11%, perhaps a macro could do this?), and second, the number order is not entirely right. Of my test run about 15 of 67 were given later numbers than they appeared on the page. I suspect it is using the order they were inserted into the document.
This is certainly closer than before, but I will keep looking.

Yes, the HTML is a basically a table. I've never used perl so don't know how easy it will be to do. It would need to take the text within the TD tag that follows the IMG tag but not overwrite when there are duplicate names.
posted by skarmj at 8:12 AM on July 5, 2005


@skarmj - yeah, objects will be numbered and processed in the order in which they were added, rather than the order in which they appear.
The quick fix would be to use the code posted previously, and replace everything between the "If o.Type = 13 Then" and "End If" statements with the following:

o.Name = "Picture " & o.TopLeftCell.row

This will rename each picture with the row number in which it was placed. Alternatively, you could hybridise the two bits of code and name the files using the data that was extracted from the sheet (e.g. "var1" and "var2", above)...

No idea how to do the scaling thing, though - I can't see any property of the picture object relating to zoom, scale, original_size, actual_size, or anything... :-(
posted by Chunder at 8:34 AM on July 5, 2005


Damn - odinsdream got it ;-) It's a method of the Shape object, and - naturally - doesn't appear in the Watch window. Doh. Obviously need more coffee...
posted by Chunder at 8:37 AM on July 5, 2005


Response by poster: Thanks odinsdream, I should have tried exporting from Excel before as this saves them in the right order. It also seems to save two copies of the image (big and small) but I could work with this. We only have Excel on the Windows PC used by the administrator so my ability to experiment with Excel is limited to when she's away (afternoons).

Also thanks Chunder for your effort.
posted by skarmj at 9:12 AM on July 5, 2005


« Older When I sneeze, I get hiccups!   |   Info on Kalahari Waterpark Newer »
This thread is closed to new comments.