Strip HTML from Excel
October 27, 2006 2:22 AM   Subscribe

I have a large excel spreadsheet that contains various html tags, mainly bold and anchor tags (in upper case and lower case). How can i automagically remove them?

I've tried doing a search and replace to get rid of the bold tags, but it tells me "the formula is too long".

Any ideas?

(it also shows lots of box characters, which i presume are linebreaks from when the data was online)
posted by anonaccount to Computers & Internet (9 answers total)
 
Export to CSV... Then search and replace in anything better equipped?

(I am *NOT* and Excelpert, and I'm sure there are automagical ways of doing this. Plus, exporting to CSV might screw up any formulas. Or it might not. Seriously, I don't know anything. This is what you get, asking for advice from strangers on the internets. ;-)
posted by disillusioned at 2:24 AM on October 27, 2006


For the record, my thought process was perhaps that you could use regular expressions elsewhere.

Or maybe use this.
posted by disillusioned at 2:26 AM on October 27, 2006


Response by poster: There are no formulars, its just data that contains HTML from when it stored online content. Its being repurposed and no longer needs all the HTML gumph and has to have it removed.

To complicate things more, the nitwit who managed it before would sometimes, nay, frequently, forget to close tags.

Saving it as a CSV and then doing a search and replace in Notepad sounds like a great idea. I'll give that a shot.
posted by anonaccount at 2:32 AM on October 27, 2006


Any reason you can't do the search & replace in the Excel file (Ctrl + H) instead of converting it to a .csv?

I'm not sure what you mean by the message "the formula is too long", but I just typed <b&gt into an Excel cell, then used Find & Replace (Ctrl + H) to find "<b&gt" and replace it with "" (i.e. nothing), and it found the bold tag fine and deleted it.
posted by EndsOfInvention at 2:38 AM on October 27, 2006


OK, I messed up the bold tags, but you know what I mean.
posted by EndsOfInvention at 3:21 AM on October 27, 2006


Depending on how the HTML is stored in your sheet, this might work:

Highlight the cells with HTML, then go to the Edit menu and select Clear->Formats.
posted by blue mustard at 5:25 AM on October 27, 2006


I'm a fan of ASAP Utilities, an add on for Excel that automates all those things that can be annoying to do in Excel like removing extra spaces and hyperlinks.

www.asap-utilities.com

The web cleaning tools are under 'web'.
posted by Ness at 7:59 AM on October 27, 2006


Text Pad is nice for searching for regular expression (\n for line breaks and \t for tabs) and has some limited macro support. I use it all the time to quickly format large text files.

Here's a link
posted by johnstein at 9:19 AM on October 27, 2006


I suggest trying with OpenOffice Calc. It won't have that specific bug, and you can do regular expression search and replace.
posted by Sharcho at 9:03 PM on October 27, 2006


« Older Last minute Halloween advice. With a twist!   |   Migrating Slang Newer »
This thread is closed to new comments.