<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Strip my tags, please!</title>
	<link>http://ask.metafilter.com/26402/Strip-my-tags-please/</link>
	<description>Comments on Ask MetaFilter post Strip my tags, please!</description>
	<pubDate>Mon, 31 Oct 2005 13:35:20 -0800</pubDate>
	<lastBuildDate>Mon, 31 Oct 2005 13:35:20 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Strip my tags, please!</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please</link>	
		<description>GeekFilter: I want to strip all HTML tags from a page of text, leaving plain text. I have TextWrangler and OS X 10.4. I thought it would be easy... &lt;br /&gt;&lt;br /&gt; I&apos;ve tried Googling for a script that would do this, but haven&apos;t been able to figure it out. I tried using the regular expression &lt;code&gt;&amp;lt; [^&amp;gt;]*&amp;gt;&lt;/code&gt; in TextWrangler with the &quot;Use Grep&quot; option, but that doesn&apos;t seem to work either. I don&apos;t want to have to pay $25 for something like TextSoap. I have an AppleScript that will make the clipboard plain text. The ideal would be something similar that I can invoke to remove tags from the contents of the clipboard. Thanks!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2005:site.26402</guid>
		<pubDate>Mon, 31 Oct 2005 13:31:47 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
		
			<category>computer</category>
		
			<category>osx</category>
		
			<category>text</category>
		
			<category>html</category>
		
	</item> <item>
		<title>By: kcm</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416263</link>	
		<description>lynx -dump</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416263</guid>
		<pubDate>Mon, 31 Oct 2005 13:35:20 -0800</pubDate>
		<dc:creator>kcm</dc:creator>
	</item><item>
		<title>By: sbutler</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416286</link>	
		<description>I don&apos;t know if it&apos;s a MeFi typo, but your regex has a space in it. Also, it needs to account for the closing tag, so try: &amp;lt;/?[^&amp;gt;]*&amp;gt;. That works for me in TW (also make sure to click &quot;Start at Top&quot;).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416286</guid>
		<pubDate>Mon, 31 Oct 2005 13:43:37 -0800</pubDate>
		<dc:creator>sbutler</dc:creator>
	</item><item>
		<title>By: al_fresco</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416291</link>	
		<description>&lt;i&gt;lynx -dump&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
Can you explain how I would use that? I don&apos;t know Unix (or any other language, for that matter), but I can usually Google and figure out what I need on a task-by-task basis. Context?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416291</guid>
		<pubDate>Mon, 31 Oct 2005 13:47:16 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
	</item><item>
		<title>By: sbutler</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416301</link>	
		<description>(Actually, I guess it already did account for the closing tag. Since things worked, I assume the problem was the space.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416301</guid>
		<pubDate>Mon, 31 Oct 2005 13:51:10 -0800</pubDate>
		<dc:creator>sbutler</dc:creator>
	</item><item>
		<title>By: al_fresco</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416310</link>	
		<description>Thanks &lt;b&gt;sbutler&lt;/b&gt;. That did the trick! I don&apos;t know how the space got there in my regex.&lt;br&gt;
&lt;br&gt;
Keep the answers coming, though. I&apos;d just as soon find a few ways to do this.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416310</guid>
		<pubDate>Mon, 31 Oct 2005 13:53:49 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
	</item><item>
		<title>By: unixrat</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416311</link>	
		<description>Lynx doesn&apos;t appear to be standard with OSX.&lt;br&gt;
&lt;br&gt;
However, if you installed it (or were on a Linux box), you could do:&lt;br&gt;
&lt;br&gt;
lynx -dump http://yoursite.com&lt;br&gt;
&lt;br&gt;
and it would print the website on stdout, formatted as &apos;text only&apos;.  (I prefer not to have the &apos;list of links&apos; for each page dumped, so I include -nolist as one of the options.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416311</guid>
		<pubDate>Mon, 31 Oct 2005 13:54:53 -0800</pubDate>
		<dc:creator>unixrat</dc:creator>
	</item><item>
		<title>By: unixrat</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416316</link>	
		<description>^^^ (While running &apos;terminal&apos; on your OSX box.  You need to be on a command line to run lynx.)  &lt;br&gt;
&lt;br&gt;
I did have that in there, I swear.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416316</guid>
		<pubDate>Mon, 31 Oct 2005 13:55:44 -0800</pubDate>
		<dc:creator>unixrat</dc:creator>
	</item><item>
		<title>By: adamrice</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416359</link>	
		<description>Call me crazy, but wouldn&apos;t the simplest method for this be to:&lt;br&gt;
&lt;br&gt;
1. View the web page in the browser of your choice;&lt;br&gt;
2. Select all;&lt;br&gt;
3. Copy;&lt;br&gt;
4. Go into the text editor of your choice;&lt;br&gt;
5. Paste.&lt;br&gt;
&lt;br&gt;
Not so great if you need to do this in batch mode, I guess.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416359</guid>
		<pubDate>Mon, 31 Oct 2005 14:26:28 -0800</pubDate>
		<dc:creator>adamrice</dc:creator>
	</item><item>
		<title>By: werty</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416363</link>	
		<description>In BBEdit, you can simply select all, then choose Remove Markup. Voila--no more tags. Not certain if TextWrangler has a similar feature.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416363</guid>
		<pubDate>Mon, 31 Oct 2005 14:28:28 -0800</pubDate>
		<dc:creator>werty</dc:creator>
	</item><item>
		<title>By: al_fresco</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416381</link>	
		<description>&lt;i&gt;Call me crazy, but wouldn&apos;t the simplest method for this be to:&lt;br&gt;
posted by &lt;b&gt;adamrice&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
This would work fine if I wanted to just grab the text of my whole page. I&apos;m sending out an HTML-formatted email newsletter, though, in an OS X app called Newsletter. I format the HTML portion the way I want it (which is different from my website), and then I want to clean the tags out for the Plain-text Alternative. Make sense? Thanks, though!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416381</guid>
		<pubDate>Mon, 31 Oct 2005 14:44:46 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
	</item><item>
		<title>By: ralawrence</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416383</link>	
		<description>It is worth pointing out that the above regexp will fail when you have HTML with &amp;lt; and/or &amp;gt; in the middle of a comment or as an attribute value.&lt;br&gt;
&lt;br&gt;
I&apos;ve not seen many sites that do this though.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416383</guid>
		<pubDate>Mon, 31 Oct 2005 14:45:30 -0800</pubDate>
		<dc:creator>ralawrence</dc:creator>
	</item><item>
		<title>By: al_fresco</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416385</link>	
		<description>&lt;i&gt;In BBEdit, you can simply select all, then choose Remove Markup. Voila--no more tags. Not certain if TextWrangler has a similar feature.&lt;br&gt;
posted by &lt;b&gt;werty&lt;/b&gt;&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
If it&apos;s there, I&apos;m not finding it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416385</guid>
		<pubDate>Mon, 31 Oct 2005 14:46:12 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
	</item><item>
		<title>By: Dick Paris</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416395</link>	
		<description>Excuse my nearsightedness, but I don&apos;t see how the &quot;Remove Markup&quot; command works here. It seems to leave much behind. What I do wonder, though, is why one would not just copy and paste the text from a web browser or a BBEdit (or TextWrangler, if available) preview? (Oops! I see A. Rice has the same question.)&lt;br&gt;
&lt;br&gt;
Thanks from me as well for the Grep expression though. :-)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416395</guid>
		<pubDate>Mon, 31 Oct 2005 14:49:56 -0800</pubDate>
		<dc:creator>Dick Paris</dc:creator>
	</item><item>
		<title>By: Dick Paris</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416396</link>	
		<description>&lt;small&gt;Another victim of live preview!&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416396</guid>
		<pubDate>Mon, 31 Oct 2005 14:51:03 -0800</pubDate>
		<dc:creator>Dick Paris</dc:creator>
	</item><item>
		<title>By: al_fresco</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416417</link>	
		<description>&lt;i&gt;&lt;b&gt;adamrice&lt;/b&gt;&lt;/i&gt; &amp;amp; &lt;i&gt;&lt;b&gt;Dick Paris&lt;/b&gt;&lt;/i&gt;:&lt;br&gt;
&lt;br&gt;
Actually, I just realized that Newsletter&apos;s Preview window works just like a browser in this regard, so I could have just selected the contents of my preview and gotten the results I wanted. Not as sexy, but gets the job done.&lt;br&gt;
&lt;br&gt;
Thanks, all.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416417</guid>
		<pubDate>Mon, 31 Oct 2005 14:59:24 -0800</pubDate>
		<dc:creator>al_fresco</dc:creator>
	</item><item>
		<title>By: AmbroseChapel</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416441</link>	
		<description>For the moment? Download a demo of BBEdit. Use its &quot;Translate&quot; command (Under Markup/Utilities) with the &quot;Translate HTML to Text&quot; options.&lt;br&gt;
&lt;br&gt;
That will do a great job. &lt;br&gt;
&lt;br&gt;
For the longer term, you probably need to write a script to do this, because some tags just need to be removed and others need to be replaced so that paragraphs, etc., are preserved.&lt;br&gt;
&lt;br&gt;
You might want to translate HTML headings to ***TEXT*** for instance, to get as much impact as you can from plain text.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416441</guid>
		<pubDate>Mon, 31 Oct 2005 15:16:01 -0800</pubDate>
		<dc:creator>AmbroseChapel</dc:creator>
	</item><item>
		<title>By: holloway</title>
		<link>http://ask.metafilter.com/26402/Strip-my-tags-please#416752</link>	
		<description>The advantage of Lynx or Links (text browsers themselves) is that they&apos;ll reproduce structure (headings / tables / lists) in plain-text. So lists get bullets as *s, and tables are framed using + and | and - characters, images get ALT text, and stuff like that.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.26402-416752</guid>
		<pubDate>Mon, 31 Oct 2005 23:41:29 -0800</pubDate>
		<dc:creator>holloway</dc:creator>
	</item>
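	<!--
Editorial sketch, not code from any commenter. The regular expression
suggested in this thread, and the caveat that it fails when angle brackets
appear inside a comment or an attribute value, can be illustrated in Python.
The function and class names below are hypothetical, chosen only for this
example. The second function swaps in the standard library HTML parser, which
is one possible way around the caveat under the assumption that the input is
reasonably well-formed HTML.

```python
import re
from html import unescape
from html.parser import HTMLParser

# The pattern from the thread: "<", an optional "/", anything up to the next ">".
TAG_RE = re.compile(r"</?[^>]*>")


def strip_tags_regex(html_text):
    """Delete everything that looks like a tag, then decode entities."""
    return unescape(TAG_RE.sub("", html_text))


class _TextCollector(HTMLParser):
    """Collects only the text nodes that the parser reports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_parser(html_text):
    """Strip tags with a real parser, so a quoted ">" does not end a tag early."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


if __name__ == "__main__":
    simple = "<p>Hello <b>world</b></p>"
    tricky = '<a title="a > b">link</a>'
    print(strip_tags_regex(simple))
    print(strip_tags_regex(tricky))   # leftover attribute debris remains
    print(strip_tags_parser(tricky))
```

Neither function reproduces structure such as headings or lists; for that,
a text browser dump as described in the comments above is the better fit.
	-->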
	</channel>
</rss>
