<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: How do I convert text to bare-bones HTML?</title>
	<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML/</link>
	<description>Comments on Ask MetaFilter post How do I convert text to bare-bones HTML?</description>
	<pubDate>Tue, 29 Nov 2005 11:24:35 -0800</pubDate>
	<lastBuildDate>Tue, 29 Nov 2005 11:24:35 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: How do I convert text to bare-bones HTML?</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML</link>	
		<description>How do I convert basic text formatting (italics, bold, underline, superscript, etc.) into HTML formatting on a semi-automated basis? &lt;br /&gt;&lt;br /&gt; Many of my clients&apos; websites are CMS-based, much like blogging software, allowing them to easily add new articles, update pages, etc.  They don&apos;t need to know about paragraph tags, break tags or any of the document-level HTML tags.  But they do need to insert character-formatting tags, like &lt;em&gt;em&lt;/em&gt;, &lt;em&gt;strong&lt;/em&gt;, and so on.  A clever UI, with &quot;bold&quot; and &quot;italic&quot; buttons means that they don&apos;t need to know HTML in order to mark these up.&lt;br&gt;
&lt;br&gt;
When porting large amounts of information, such as a twenty-page Word document, pasting the text inside of a textarea loses the formatting, and so somebody must go through and laboriously mark up the text with HTML to match the formatting of the original document.  This is impractical and error-prone.&lt;br&gt;
&lt;br&gt;
I&apos;ve tried programs like &lt;a href=&quot;http://wvware.sourceforge.net/&quot;&gt;wvWare&lt;/a&gt; and I&apos;ve tried saving the original content as HTML and then running it through &lt;a href=&quot;http://tidy.sourceforge.net/&quot;&gt;HTML Tidy&lt;/a&gt;, but I&apos;ve had no luck.  They create webpages.  I just want the inline markup converted, with no block-level or page-level tags.&lt;br&gt;
&lt;br&gt;
I figure that this can happen either by parsing an RTF file or through some JavaScript or OS-level magic, based on the text in the clipboard.  This must be a common need for anybody building a CMS, and yet I can&apos;t find any solutions to the problem.  Is there any widget (Flash, Java, whatever) into which I can paste formatted text and it will retain that formatting and generate HTML?  Some command-line application that will do the same?  Or do I need to -- god help me -- write my own PHP-based RTF parser?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2005:site.28082</guid>
		<pubDate>Tue, 29 Nov 2005 11:10:51 -0800</pubDate>
		<dc:creator>waldo</dc:creator>
		
			<category>cms</category>
		
			<category>html</category>
		
			<category>format</category>
		
			<category>convert</category>
		
			<category>text</category>
		
	</item> <item>
		<title>By: scottreynen</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442036</link>	
		<description>Check out &lt;a href=&quot;http://www.fckeditor.net/&quot;&gt;FCK editor&lt;/a&gt;. Works in IE and Firefox, and can paste directly from MS Word.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442036</guid>
		<pubDate>Tue, 29 Nov 2005 11:24:35 -0800</pubDate>
		<dc:creator>scottreynen</dc:creator>
	</item><item>
		<title>By: evariste</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442041</link>	
		<description>waldo:&lt;br&gt;
&lt;br&gt;
1. Fog Creek&apos;s CityDesk will output XHTML from text edited in an MS Word/Outlook-like rich-text environment. So they can paste from Word into CityDesk, click the HTML tab, and copy the XHTML into the textarea. However, it would include things like &amp;lt;br /&amp;gt; and &amp;lt;p&amp;gt; tags.&lt;br&gt;
&lt;br&gt;
2. Not exactly what you want: Textile and Markdown let you write attractive plaintext that is translated into XHTML by text-formatting plugins available for most CMSes, although that would involve training your clients.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442041</guid>
		<pubDate>Tue, 29 Nov 2005 11:29:16 -0800</pubDate>
		<dc:creator>evariste</dc:creator>
	</item><item>
		<title>By: evariste</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442043</link>	
		<description>Or what scottreynen said, which answers your question a lot better than my suggestions.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442043</guid>
		<pubDate>Tue, 29 Nov 2005 11:29:52 -0800</pubDate>
		<dc:creator>evariste</dc:creator>
	</item><item>
		<title>By: smackfu</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442044</link>	
		<description>How about using the &lt;a href=&quot;http://kb.mozillazine.org/Firefox_:_Midas&quot;&gt;midas editor control&lt;/a&gt;, like &lt;a href=&quot;http://www.mozilla.org/editor/midasdemo/&quot;&gt;this&lt;/a&gt;? It works in IE and Firefox and Safari.  You can paste in formatted text and get HTML out of it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442044</guid>
		<pubDate>Tue, 29 Nov 2005 11:30:03 -0800</pubDate>
		<dc:creator>smackfu</dc:creator>
	</item><item>
		<title>By: scruss</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442056</link>	
		<description>waldo, you don&apos;t want to write your own RTF parser. You don&apos;t even want to deal with an RTF token stream.&lt;br&gt;
&lt;br&gt;
All these WYSIWYG editors produce dreadful HTML. Looks worse than the cruft that Word produces.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442056</guid>
		<pubDate>Tue, 29 Nov 2005 11:40:48 -0800</pubDate>
		<dc:creator>scruss</dc:creator>
	</item><item>
		<title>By: waldo</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442064</link>	
		<description>Yeah, I didn&apos;t really want to say it, scruss, but...yeah.  I mean, these form-based editors are definitely pretty neat -- they produce code that faithfully mirrors the line height, font specifications, and word spacing of the original.  I&apos;d have to write a whole other program just to strip all that stuff out.&lt;br&gt;
&lt;br&gt;
I just want bold, italics, underline, subscript, and superscript conversion, or a similarly stripped-down level of conversion.  I&apos;d think this would be a pretty common need, with the newfound popularity of blogging software.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442064</guid>
		<pubDate>Tue, 29 Nov 2005 11:45:40 -0800</pubDate>
		<dc:creator>waldo</dc:creator>
	</item><item>
		<title>By: twiggy</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442097</link>	
		<description>waldo: There&apos;s a difference between WYSIWYG editors, and javascript tools that just enable WYSIWYG text input in an HTML textbox...  Some of those might generate much cleaner HTML code...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442097</guid>
		<pubDate>Tue, 29 Nov 2005 12:05:58 -0800</pubDate>
		<dc:creator>twiggy</dc:creator>
	</item><item>
		<title>By: zsazsa</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442116</link>	
		<description>It looks like those WYSIWYG input boxes are just taking down the HTML instance of what&apos;s in the clipboard.  The source is ugly because Word, OpenOffice, or whatever is actually generating the HTML, not the browser, so you can&apos;t really improve on that client-side.  What you can do is run things through HTML Tidy on the server once it&apos;s submitted to make things sane again.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442116</guid>
		<pubDate>Tue, 29 Nov 2005 12:18:48 -0800</pubDate>
		<dc:creator>zsazsa</dc:creator>
	</item><item>
		<title>By: If I Had An Anus</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442125</link>	
		<description>I thought FCK editor put out decent code.&lt;br&gt;
&lt;br&gt;
Tidy with the &lt;a href=&apos;http://tidy.sourceforge.net/docs/quickref.html#show-body-only&apos;&gt;show-body-only&lt;/a&gt; option should help.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442125</guid>
		<pubDate>Tue, 29 Nov 2005 12:27:00 -0800</pubDate>
		<dc:creator>If I Had An Anus</dc:creator>
	</item><item>
		<title>By: yerfatma</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442128</link>	
		<description>&lt;em&gt;All these WYSIWYG editors produce dreadful HTML. Looks worse than the cruft that Word produces.&lt;/em&gt;&lt;br&gt;
&lt;br&gt;
Not true. We use FCK in a custom CMS and it works fine. It gives you fine-grained control over what you can do. It requires a bit of work to customize it, but it lets you customize it, and in a well-thought-out way. It lets you choose which buttons you expose to your users (so you could easily define a toolbar with just the controls you want). &lt;br&gt;
&lt;br&gt;
The only downside is you need to strip out all the extraneous tags yourself, but that&apos;s a solved problem you can find on the net in most every language (except for the one we had to work in, ASP/VBScript). The basic idea is you pass the submitted content through a regex filter that strips out anything that looks like an HTML tag unless it matches a set of tags you allow (you should also consider allowing some tags by themselves and some tags with attributes).</description>
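		<!-- Editorial sketch, not part of the original comment. The allowlist filter described above could look roughly like the following in Python. The tag set, the names ALLOWED_TAGS and strip_disallowed_tags, and the choice to drop every attribute are assumptions made for illustration; a regex filter like this is a rough cleanup step and not a hardened sanitizer.

```python
import re

# Hypothetical allowlist: the inline tags asked for elsewhere in the thread.
ALLOWED_TAGS = {"em", "strong", "b", "i", "u", "sub", "sup"}

# Matches anything that looks like an opening or closing tag and captures
# the optional slash and the tag name.
TAG_PATTERN = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def strip_disallowed_tags(html):
    """Remove tags outside ALLOWED_TAGS; keep allowed tags without attributes."""

    def replace(match):
        closing, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            # Rebuild the tag from its name only, discarding style/class cruft.
            return "<" + closing + name + ">"
        return ""  # Drop the tag itself but keep the text around it.

    return TAG_PATTERN.sub(replace, html)


if __name__ == "__main__":
    print(strip_disallowed_tags('<p class="MsoNormal">Hello <b style="x">world</b></p>'))
```

		Allowing some tags with attributes, as the comment suggests, would need a per-tag attribute allowlist on top of this. -->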
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442128</guid>
		<pubDate>Tue, 29 Nov 2005 12:29:24 -0800</pubDate>
		<dc:creator>yerfatma</dc:creator>
	</item><item>
		<title>By: kirkaracha</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442131</link>	
		<description>&lt;a href=&quot;http://tinymce.moxiecode.com/&quot;&gt;TinyMCE&lt;/a&gt; (similar to FCKEditor) works pretty well, and you can &lt;a href=&quot;http://tinymce.moxiecode.com/tinymce/docs/reference_configuration.html&quot;&gt;configure it&lt;/a&gt; to produce clean code.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442131</guid>
		<pubDate>Tue, 29 Nov 2005 12:32:27 -0800</pubDate>
		<dc:creator>kirkaracha</dc:creator>
	</item><item>
		<title>By: waldo</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442146</link>	
		<description>This WYSIWYG-meets-Tidy-meets-&lt;a href=&quot;http://us3.php.net/strip-tags&quot;&gt;strip_tags&lt;/a&gt; option sounds pretty compelling.  Hideous.  But compelling. :)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442146</guid>
		<pubDate>Tue, 29 Nov 2005 12:43:14 -0800</pubDate>
		<dc:creator>waldo</dc:creator>
	</item><item>
		<title>By: miniape</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442179</link>	
		<description>&lt;a href=&quot;http://www.kevinroth.com/rte/demo.php&quot;&gt;This&lt;/a&gt; has a little more than you want, but the code is all public domain so you can make changes if you&apos;d like.&lt;br&gt;
&lt;br&gt;
It even won some sort of contest.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442179</guid>
		<pubDate>Tue, 29 Nov 2005 13:17:27 -0800</pubDate>
		<dc:creator>miniape</dc:creator>
	</item><item>
		<title>By: CrayDrygu</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442276</link>	
		<description>Now, I haven&apos;t actually tried it, but it seems like wvWare and HTMLTidy did exactly what you want, except that it produced a whole page instead of just a chunk of HTML.  Why not take the output from that, and trim out the &amp;lt;body&amp;gt; tag and everything before it, and the &amp;lt;/body&amp;gt; tag and everything after?  I don&apos;t remember my regular expressions, but I&apos;m sure it could be done with sed.</description>
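		<!-- Editorial sketch, not part of the original comment. The comment suggests sed; as a stand-in, the same trimming idea can be sketched in Python. The names BODY_PATTERN and extract_body are hypothetical, and the sketch assumes the converter output contains a single body element.

```python
import re

# Captures everything between the opening <body ...> tag and the closing </body> tag.
BODY_PATTERN = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the contents of the body element, or the input unchanged if none is found."""
    match = BODY_PATTERN.search(html)
    return match.group(1).strip() if match else html


if __name__ == "__main__":
    page = '<html><head><title>t</title></head><body lang="en">\n<p>Hi <i>there</i></p>\n</body></html>'
    print(extract_body(page))
```

		The result still contains block-level tags such as p, so a tag-stripping pass would follow it. -->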
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442276</guid>
		<pubDate>Tue, 29 Nov 2005 14:56:20 -0800</pubDate>
		<dc:creator>CrayDrygu</dc:creator>
	</item><item>
		<title>By: yclipse</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442308</link>	
		<description>The &lt;a href=&quot;http://www.atlantiswordprocessor.com&quot;&gt;Atlantis Word Processor&lt;/a&gt; is a very capable RTF editor which produces nice clean HTML when you choose &quot;Save as web page&quot;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442308</guid>
		<pubDate>Tue, 29 Nov 2005 15:24:11 -0800</pubDate>
		<dc:creator>yclipse</dc:creator>
	</item><item>
		<title>By: waldo</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442403</link>	
		<description>CrayDrygu, I&apos;m starting to think that you&apos;re right.  It&apos;s not pretty, and it will still require post-processing with strip_tags (or regex, as you point out), but it may well work.  I&apos;m playing with it now.  It may be a good 90% solution, which is better than the 0% where I&apos;m at now. :)&lt;br&gt;
&lt;br&gt;
This makes me want to learn to write Firefox plugins, just so I can solve this problem for good.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442403</guid>
		<pubDate>Tue, 29 Nov 2005 16:54:38 -0800</pubDate>
		<dc:creator>waldo</dc:creator>
	</item><item>
		<title>By: CrayDrygu</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442431</link>	
		<description>Yeah, &quot;not pretty&quot; is just about the best way to describe any *nix-based solution like this, for one simple reason: it&apos;s all based on the &quot;no jackknives&quot; theory.  A program should do one thing, and do it well, rather than trying to do everything (and inevitably doing them all poorly).&lt;br&gt;
&lt;br&gt;
So you have a program that converts Word documents to HTML documents.  You feed that into a program that beautifies/simplifies the HTML.  Then you feed &lt;i&gt;that&lt;/i&gt; into a program that strips out any unwanted tags.  Combine it with an upload script, and a script to put the final output into the blog, and you&apos;ve got yourself a real utility there.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442431</guid>
		<pubDate>Tue, 29 Nov 2005 17:13:58 -0800</pubDate>
		<dc:creator>CrayDrygu</dc:creator>
	</item><item>
		<title>By: IndigoRain</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442710</link>	
		<description>&lt;a href=&quot;http://www.cybermatrix.com/txt2html.html&quot;&gt;Text 2 HTML freeware&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
I HATE HTML Tidy. It turns &lt;a href=&quot;http://www.upontherainbow.com/images/neat.jpg&quot;&gt;my nice, neat hand-coding&lt;/a&gt; into &lt;a href=&quot;http://www.upontherainbow.com/images/tidy.jpg&quot;&gt;this unreadable garbage.&lt;/a&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442710</guid>
		<pubDate>Tue, 29 Nov 2005 23:06:11 -0800</pubDate>
		<dc:creator>IndigoRain</dc:creator>
	</item><item>
		<title>By: If I Had An Anus</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#442813</link>	
		<description>Heh, center tags are funny.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-442813</guid>
		<pubDate>Wed, 30 Nov 2005 06:19:11 -0800</pubDate>
		<dc:creator>If I Had An Anus</dc:creator>
	</item><item>
		<title>By: waldo</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#443140</link>	
		<description>IndigoRain, you&apos;re using HTML Tidy wrong.  It&apos;s a powerful tool, but not if you skip reading the manual. :)  You want to use the --wrap 0 flag.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-443140</guid>
		<pubDate>Wed, 30 Nov 2005 10:53:55 -0800</pubDate>
		<dc:creator>waldo</dc:creator>
	</item><item>
		<title>By: IndigoRain</title>
		<link>http://ask.metafilter.com/28082/How-do-I-convert-text-to-barebones-HTML#443936</link>	
		<description>Perhaps I have been unfair.  I shall read the manual and re-evaluate.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28082-443936</guid>
		<pubDate>Wed, 30 Nov 2005 22:10:26 -0800</pubDate>
		<dc:creator>IndigoRain</dc:creator>
	</item>
	</channel>
</rss>
