Wikibook -> ebook?
December 1, 2008 8:06 AM

How can I convert HTML+Javascript+CSS to RTF (or plain HTML)?

I'm trying to put a Wikibook on my phone to read on the subway. The "Printable Version" (Example) of each article appears to be simple HTML, but looking at the source reveals that there's a whole lot of element-hiding javascript in there. It just plain crashes µBook.

I tried copying the page into MS Word, but then all the hidden links and elements appear.

Is there a way to save post-javascript HTML? (Or, better yet, RTF -- so the pictures get embedded?)
posted by Jonathan Harford to computers & internet (10 comments total)
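For reference, here is a rough sketch of the post-processing route in Python. It does not run the page's JavaScript. Instead it re-emits an already-saved page while dropping `<script>`/`<style>` blocks and any element whose class marks it as hidden, and it keeps formatting tags like `<b>`, `<h2>` and `<img>`. The class names in `HIDDEN_CLASSES` are assumptions: MediaWiki has put the "[edit]" links in `editsection` spans, but check the actual page source. `strip_hidden` is a hypothetical helper name, and the sketch assumes reasonably well-formed markup like MediaWiki's printable output.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: class names marking elements the page's JS/CSS hides.
# These are illustrative guesses; check the real page source.
HIDDEN_CLASSES = {"editsection", "mw-editsection", "noprint"}
DROP_TAGS = {"script", "style", "noscript"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class HiddenStripper(HTMLParser):
    """Re-emit HTML, skipping scripts, styles and hidden-class elements.

    Comments are dropped too (the default handle_comment ignores them).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_tag = None  # tag name of the element being dropped
        self.skip_depth = 0   # nesting depth of that same tag name

    @staticmethod
    def _render(tag, attrs):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + ">"

    @staticmethod
    def _is_hidden(tag, attrs):
        if tag in DROP_TAGS:
            return True
        classes = (dict(attrs).get("class") or "").split()
        return any(c in HIDDEN_CLASSES for c in classes)

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if self._is_hidden(tag, attrs):
            if tag not in VOID_TAGS:  # void tags have no content to skip
                self.skip_tag, self.skip_depth = tag, 1
            return
        self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        if tag not in VOID_TAGS:  # <br/> reports an end tag; don't emit </br>
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(escape(data, quote=False))

    def handle_decl(self, decl):
        if not self.skip_tag:
            self.out.append(f"<!{decl}>")


def strip_hidden(html_text):
    parser = HiddenStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Something like `strip_hidden(open("page.html", encoding="utf-8").read())` on a page saved as "Web Page, HTML only" should give script-free HTML with the formatting tags intact. Whether µBook then renders it cleanly is untested.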
Try this version. I simply copied the text from your Example link into TextEdit (a basic text editor that comes with all Macs). Formatting is mostly maintained without the ugly code underneath.
posted by furtive at 8:35 AM on December 1, 2008


If you're using Firefox, you can go to File > Save Page As... and then choose "Web Page, HTML only" as the type in the save dialog.
posted by symbollocks at 8:36 AM on December 1, 2008


Thanks, furtive -- but it's copying over the "[edit]"s that should be hidden! It's also removing all the formatting, which seems a shame.

symbollocks, I tried your solution and got the same scripty mess.
posted by Jonathan Harford at 8:49 AM on December 1, 2008


I got a better version by opening with Word, saving as rtf, and then opening with WordPad. That removed the [edit]s, preserved the formatting, but did leave in a couple of icons indicating scripts at the top & bottom of the file. They were easy to snip out, however.
posted by clerestory at 9:18 AM on December 1, 2008


Clarifications: By opening with Word, I mean that I pasted the URL into the file open dialog box in Word. I don't know if that makes a difference. Also, there's still gunk in the file; I don't know if µBook will choke on it.
posted by clerestory at 9:22 AM on December 1, 2008


Fantastic! That's exactly what I wanted, clerestory. Shame OpenOffice doesn't seem to have a similar ability — it'd be nice to have a libre solution to this problem.
posted by Jonathan Harford at 9:31 AM on December 1, 2008


I usually use Notepad for doing that kind of "display-only" stripping out. (Usually I'm doing it to get rid of Excel formulae.)

And if you want a libre solution to the problem, you'll find that Notepad++ is not only open-source (it's based on the Scintilla editor engine) but incredibly full-featured. For example, if you open your e-book there, you'll find that the only thing left will be the little HTML tags - and you can remove those with a single command: TextFX->Convert->Strip HTML tags.

I highly recommend it.
posted by koeselitz at 10:06 AM on December 1, 2008
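A rough Python equivalent of that "Strip HTML tags" step looks like the sketch below. It is not TextFX's actual implementation; `strip_tags`, `TextOnly` and `BLOCK_TAGS` are names made up for this example. It keeps only the text, ends a line after each block element or `<br>`, and discards script and style bodies, which is exactly why all the formatting disappears.

```python
from html.parser import HTMLParser

# Elements after which a line break is emitted (an illustrative choice).
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collect text content and discard every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # "&amp;" becomes "&" in text
        self.chunks = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_script = True
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_script = False
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.in_script:
            self.chunks.append(data)


def strip_tags(html_text):
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)
```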


Ah, but koeselitz -- then I remove ALL formatting, which is something I'm trying to avoid.

(PS: Also a lover of Notepad++)
posted by Jonathan Harford at 10:36 AM on December 1, 2008


Aww. It turns out the RTF that MS Word makes leaves hidden markup that µBook chokes upon.
posted by Jonathan Harford at 4:10 PM on December 1, 2008
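If the goal is RTF without Word's hidden markup, one option is to write a deliberately tiny RTF file yourself. Below is a hypothetical sketch (`html_to_rtf` is not an existing tool) that maps only a handful of tags to RTF control words (`\b`, `\i`, `\par`, `\line`) and escapes braces, backslashes and non-ASCII characters as `\uN?`. It does not embed pictures, and whether µBook accepts the result is an untested assumption.

```python
from html.parser import HTMLParser

# Illustrative tag -> RTF control-word mapping (deliberately tiny).
INLINE = {"b": ("\\b ", "\\b0 "), "strong": ("\\b ", "\\b0 "),
          "i": ("\\i ", "\\i0 "), "em": ("\\i ", "\\i0 ")}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
PARAGRAPHS = {"p", "div", "li"}


def rtf_escape(text):
    """Escape text for RTF; non-ASCII becomes \\uN? (signed UTF-16 units)."""
    out = []
    for ch in text:
        if ch in "\r\n\t":
            out.append(" ")  # raw line breaks are ignored by RTF readers
        elif ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            data = ch.encode("utf-16-le")  # surrogate pair outside the BMP
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536  # \u takes a signed 16-bit number
                out.append(f"\\u{unit}?")  # '?' is the fallback character
    return "".join(out)


class RtfWriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.body = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_script = True
        elif tag in INLINE:
            self.body.append(INLINE[tag][0])
        elif tag in HEADINGS:
            self.body.append("\\b ")  # headings rendered as bold paragraphs
        elif tag == "br":
            self.body.append("\\line ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_script = False
        elif tag in INLINE:
            self.body.append(INLINE[tag][1])
        elif tag in HEADINGS:
            self.body.append("\\b0\\par\n")
        elif tag in PARAGRAPHS:
            self.body.append("\\par\n")

    def handle_data(self, data):
        if not self.in_script:
            self.body.append(rtf_escape(data))


def html_to_rtf(html_text):
    writer = RtfWriter()
    writer.feed(html_text)
    writer.close()
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + "".join(writer.body) + "}"
```

Keeping the output to a few control words is the point: the fewer constructs in the file, the less there is for a small reader to choke on.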


I was afraid of that. What about using the PDF option on Wikibooks and sending it through a PDF-to-HTML converter?
posted by clerestory at 10:00 AM on December 2, 2008

