Join 3,424 readers in helping fund MetaFilter (Hide)


html tables are killin' me.
June 14, 2012 6:12 PM   Subscribe

How do I import tables from HTML into Excel when some cells have multiple lines?

OK, so I've been given an output of a table in HTML, and I need to get it into a spreadsheet so I can work with it. The cells are all text, and while a lot of the cells are single-line, there are 20%+ that show up as multiple lines in the cell.

Problem: When I try to paste the table into Excel, all of the cells with multiple lines end up as separate cells in the table, which means I can't sort them. I could probably consolidate each group of cells, but this table has several thousand entries, so I can't just go through, find each offending cell and fix it, I need a global solution. It doesn't matter if I have to run it through another tool to remove breaks, or whatever. I'm just not quite sure how to fix it, and google isn't helping me.

The data is sensitive, so I can't provide a screenshot. Hopefully the description I'e given makes sense. if not, I can modify the data and post an example.
posted by KGMoney to Computers & Internet (6 answers total) 1 user marked this as a favorite
 
Are you copying from a browser window to Excel?

The Table2Clipboard plugin for Firefox has worked for me in the past, though comments seem to indicate it's broken in FF12. TableTools2 might work too?
posted by Pinback at 6:31 PM on June 14, 2012


I assume that these multi-line cells are actually single cells with line breaks inside them, rather than separate cells that are just formatted to look like single cells.

You could copy the source code into a text editor, find/replace all the line break code with some symbol or sequence of symbols that doesn't appear in the table (e.g. ***), open the edited HTML page in a browser, copy it to Excel, and then find/replace the symbols to bring it back to the desired state.

Alternatively, if you don't need to retain the line breaks, you could just find/replace the line break code with a space.
posted by Paragon at 6:32 PM on June 14, 2012


Here is something to try which could work easily, if you have the file in HTM or HTML format and it really is that format.

Open the HTML file in Word and delete any non-table stuff at the beginning. "Save As" RTF.

In Excel, open the RTF file.
posted by caclwmr4 at 6:42 PM on June 14, 2012


check your memail
posted by lampshade at 7:16 PM on June 14, 2012


You know that excel has an actual HTML table importer in it, right? Not the copy/paste mechanism, but a full parser with a built-in selector tool. Try this (in E2010):
1. Go to the "Data" tab in ribbon
2. Click "From Web"
3. An excel window with your default web browser opens.
4. Navigate to the web page containing your secret HTML table
5. Select the table on screen

Does that give you higher quality results?
posted by blindcarboncopy at 11:41 PM on June 14, 2012 [1 favorite]


Thanks for the help, everybody.

I had tried the importer, but ended up with the same results. I'm sure there was probably a better way to do it, but what ended up working really quickly was just to copy it into Word and global replace the ^l, then pull it into Excel.
posted by KGMoney at 6:45 AM on June 15, 2012


« Older Where can I buy balance scales...   |  I'm in NYC and I may need to s... Newer »
This thread is closed to new comments.