Export CHM book w/ no contents tree, only displays one random page?
November 11, 2021 9:03 PM

Years ago I downloaded a book of brief quotations in CHM format. On each page, there is only the book title, a quotation, and a button that says "Another?", clicking on which brings one to another random quotation/page, but there is no Table of Contents (tree) or way of moving to a specific or non-random page. I would like, if possible, to free up the text contained in the file so that I can have it all in one text file.

On my now ancient Mac, I can see the "Another?" button in both Chimp and Chmox, and view random pages one at a time. However, my free version of Chimp will only let me export the first five topics to PDF, one per page.

Anyone know if there's a free Windows utility (since I now also have a Win10 box) that would let me export the contents of this .chm to a text file? Neither MS HTML Help executable nor Xchm show the TOC/contents tree (and xchm doesn't even show the "Another?" button!), so no luck with either of those. Before DLing and installing a bunch of CHM editors to see if one will do this, I thought I'd ask here first. Thank you!!
posted by tenderly to Technology (5 answers total)
 
Looks like you can use HTML Help Viewer to extract the CHM contents which should be a bunch of HTML pages, basically.
posted by jimw at 9:22 PM on November 11, 2021 [1 favorite]


if i'm reading correctly, then CHM is compiled HTML- it's basically a zip file of a website using LZX. i downloaded a sample chm file and was able to open it from 7zip.

from there you're likely dealing with an html file for each quote along with all the html for the header/button/etc. if you don't want to do it one-by-one, you may also be able to get away with combining all the html files and doing a search/replace in notepad or word to remove the extra html. otherwise you'll need to use some sort of screen scraper or write code to pull out the quotes.
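
if you go the "write code" route, here's a minimal sketch of the combine-and-strip idea in python, standard library only. it assumes the pages were already extracted (e.g. with 7zip) into a folder, that they end in .htm or .html, and that they read acceptably as utf-8 (an old chm may well use a windows code page instead, so adjust the encoding if the text comes out garbled). the names `html_to_text` and `combine` are just made up for this example.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextOnly(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML page, one text chunk per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def combine(folder, out_file):
    """Write the text of every .htm/.html file under folder into out_file."""
    pages = sorted(Path(folder).rglob("*.htm*"))
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            # Assumption: utf-8; undecodable bytes are replaced, not fatal.
            html = page.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(html) + "\n\n")
    return len(pages)
```

calling `combine("extracted_chm", "quotes.txt")` (both names are placeholders) would write one block of text per page. the book title will still repeat on every page, so that part still needs a search/replace afterwards.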
posted by noloveforned at 9:24 PM on November 11, 2021 [1 favorite]


Calibre will convert chm to text
posted by soelo at 9:30 PM on November 11, 2021 [1 favorite]


Response by poster: Thanks for your replies, folks!

Realizing there was an .htm file in there that I could access via 7zip, I went ahead and did so, opened it in FF, and since what it showed was the same page with one quote and the "Another?" button, I opened the Web Developer menu to view Page Source, and there were all the quotations!

(I'll try Calibre, though, to see if it gives me a more eye-pleasing result!)
posted by tenderly at 9:39 PM on November 11, 2021


It's likely the page uses some JavaScript to randomly display a quote from an array in the page source. You may have to extract the content separately by piping it through some sort of editor and some regex or hefty search and replace.
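
As a rough sketch of the regex approach, here is some Python that pulls quoted string literals out of the `<script>` blocks of a saved page. I haven't seen the actual file, so this assumes the quotations are stored as ordinary single- or double-quoted JavaScript strings; `extract_quotes` and the `min_len` cutoff (used to skip short strings that are probably not quotations) are names I made up for illustration.

```python
import re

# A double- or single-quoted JavaScript string literal, allowing backslash escapes.
STRING_LITERAL = re.compile(
    r'"((?:[^"\\]|\\.)*)"|\'((?:[^\'\\]|\\.)*)\'', re.DOTALL
)
# The contents of each <script> element in the page.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)


def unescape(text):
    """Turn common backslash escapes (\\n, \\t, \\', \\", \\\\) into plain characters."""
    simple = {"n": "\n", "t": "\t", "r": "\r"}
    return re.sub(
        r"\\(.)",
        lambda m: simple.get(m.group(1), m.group(1)),
        text,
        flags=re.DOTALL,
    )


def extract_quotes(page_source, min_len=20):
    """Return string literals of at least min_len characters found in script blocks."""
    quotes = []
    for script in SCRIPT_BLOCK.findall(page_source):
        for double_quoted, single_quoted in STRING_LITERAL.findall(script):
            literal = unescape(double_quoted or single_quoted)
            if len(literal) >= min_len:
                quotes.append(literal)
    return quotes
```

You would read the extracted .htm file into a string, pass it to `extract_quotes`, and write the results out one per line. If the page builds its quotes some other way (string concatenation, an external .js file), the pattern would need adjusting.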
posted by kschang at 12:39 AM on November 12, 2021

