Turning HTML into a book?
December 20, 2006 7:26 PM
I'm trying to turn someone's blog into annual books for them as a gift (not Christmas). What's the best way to do this?
The book printer wants a PDF. I've already captured the HTML for the blog into files, and have written a Perl script to parse the HTML into a basic structured text file that identifies the title, date, and other blog metadata in a standardized way.
My specific question: how can I import this text (with some HTML content in the bodies) into a word processor or other page layout program so that (a) the (simple) HTML formatting inside the blog content is preserved, and (b) styles are automatically assigned to the title, date and such so that I don't have to manually do it?
Constraints:
- I'm flexible about the layout software, but if it's not MS Word 2003 or Publisher, it needs to be free (and able to handle 200-300 page manuscripts in a single file).
- The structure of the file to import is flexible, since I'm creating it... for example, creating XML would not be difficult.
- I'm aware there are numerous HTML -> PDF options, but I really want to use a WYSIWYG style layout program that supports TOC creation, page numbering, and so on.
- The work to automate can't be too elaborate, since this is a one-off (there are about 600 blog entries spread out over three books).
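To make the import question concrete, here is a minimal sketch of one possible intermediate file: a single HTML document in which each entry's title becomes an `<h1>`, the date gets its own classed paragraph, and the body HTML is passed through untouched. It is in Python purely for illustration (the existing script is Perl), and the `entries_to_html` function, the `title`/`date`/`body` keys, and the class names are all assumptions, not anything from the actual script. Word processors that open HTML commonly map `<h1>` to a built-in heading style, but how a particular program treats the class names is something to check by trial.

```python
from html import escape


def entries_to_html(entries):
    """Build one HTML document from a list of entry dicts.

    Each entry is assumed (hypothetically) to have 'title', 'date',
    and 'body' keys. 'body' already contains simple HTML and is
    passed through unchanged; title and date are escaped.
    """
    parts = ['<html><head><meta charset="utf-8"><title>Blog book</title></head><body>']
    for entry in entries:
        # Title as a heading, so the importing program can treat it as one.
        parts.append('<h1 class="entry-title">%s</h1>' % escape(entry["title"]))
        # Date in its own classed paragraph so it can be restyled in one pass.
        parts.append('<p class="entry-date">%s</p>' % escape(entry["date"]))
        # Body HTML is inserted as-is to keep its inline formatting.
        parts.append('<div class="entry-body">%s</div>' % entry["body"])
    parts.append("</body></html>")
    return "\n".join(parts)


sample = [
    {
        "title": "Cats & dogs",
        "date": "2006-12-20",
        "body": "<p>Some <em>simple</em> HTML.</p>",
    }
]
book_html = entries_to_html(sample)
```

Writing `book_html` to a file and opening that file in the layout program would be the next step. Since every title is a heading, a generated TOC should have something to work from, assuming the import keeps the heading level.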
Any suggestions?
posted by reborndata to computers & internet (15 comments total)
You'll probably have to be a bit wary of its interpretation of HTML (you might have to tweak your Perl script and re-export a couple of times), but it should work.
posted by AmbroseChapel at 8:12 PM on December 20, 2006