How do I convert basic text formatting (italics, bold, underline, superscript, etc.) into HTML formatting on a semi-automated basis?
Many of my clients' websites are CMS-based, much like blogging software, allowing them to easily add new articles, update pages, etc. They don't need to know about paragraph tags, break tags or any of the document-level HTML tags. But they do need to insert character-formatting tags, like em, strong, and so on. A clever UI with "bold" and "italic" buttons means that they don't need to know HTML in order to mark these up.
When porting large amounts of content, such as a twenty-page Word document, pasting the text into a textarea strips the formatting, so somebody must go through and laboriously mark up the text with HTML to match the formatting of the original document. This is impractical and error-prone.
I've tried programs like wvWare, and I've tried saving the original content as HTML and then running it through HTML Tidy, but I've had no luck. Both approaches produce complete webpages. I just want the inline markup converted, with no block-level or page-level tags.
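To make the goal concrete, here is a minimal Python sketch (standard library only) of the kind of filter I have in mind. It is not a solution I have found anywhere. It assumes the document has already been saved as HTML, and that the export marks formatting with plain tags such as b, i, u, sup and sub rather than CSS-styled spans, which may not hold for every export. The names inline_only, KEEP and RENAME are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Inline tags to keep in the output (an assumed whitelist; adjust to taste).
KEEP = {"em", "strong", "u", "sup", "sub"}
# Presentational tags rewritten to the tags the CMS expects (assumption).
RENAME = {"i": "em", "b": "strong"}
# Elements whose text content is discarded entirely.
SKIP_CONTENT = {"title", "style", "script"}
# Block-level tags that become a blank line when they close.
PARAGRAPH_END = {"p", "div"}


class InlineOnly(HTMLParser):
    """Collects text plus whitelisted inline tags; drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if tag == "br":
            self.out.append("\n")
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            # Attributes (class, style, ...) are dropped on purpose.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag in PARAGRAPH_END:
            self.out.append("\n\n")
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the result is still valid HTML text.
            self.out.append(escape(data, quote=False))


def inline_only(html_text):
    """Return the text of html_text with only whitelisted inline markup."""
    parser = InlineOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out).strip()
```

Fed a full exported page, something like this would return only the text with em, strong, u, sup and sub tags, with paragraphs separated by blank lines, which could then be pasted into the CMS textarea. It does nothing for formatting that the export expresses through style attributes.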
I figure that this can happen either by parsing an RTF file or through some JavaScript or OS-level magic, based on the text in the clipboard. This must be a common need for anybody building a CMS, and yet I can't find any solutions to the problem. Is there any widget (Flash, Java, whatever) into which I can paste formatted text and have it retain that formatting and generate HTML? Some command-line application that will do the same? Or do I need to -- god help me -- write my own PHP-based RTF parser?
posted by scottreynen at 11:24 AM on November 29, 2005