Converting LaTeX to HTML or Word
October 2, 2014 6:00 PM

I have a complex document in LaTeX/PDF format that I want to make available to others in a more accessible Word or HTML format.

I have been involved with the creation of a complex LaTeX file:

LaTeX: http://toronto350.org/divest/fossil-fuel-divest.tex

PDF: http://toronto350.org/divest/fossil-fuel-divest.pdf

The document is of use to people at a number of other schools, but they are frustrated by how difficult it is to copy and paste from the PDF that LaTeX generates.

Has anyone found a good way to convert a LaTeX file into Word or HTML format? There seem to be some projects around the web, but a much more code-savvy member of the team behind this project has looked into them and not yet found anything that works well.
posted by sindark to Computers & Internet (11 answers total) 1 user marked this as a favorite
 
There seem to be some projects around the web, but a much more code-savvy member of the team behind this project has looked into them and not yet found anything that works well.

It would be helpful if you were to provide more detail about the software your organization has already evaluated and why it won't work.
posted by jingzuo at 6:24 PM on October 2, 2014


Pandoc is probably the best of the lot, but someone will likely have to go through and clean up its output a bit, no matter what output you pick.

Basically, this is not something I would do for 190 pages. To be honest, I'm baffled that anyone thinks this ought not to be distributed as a PDF. (What do you want people to be able to cut and paste? They can certainly cut and paste text from a PDF. You could just make the figures available separately on the website in case anyone wants to reuse them.)
posted by hoyland at 6:32 PM on October 2, 2014 [1 favorite]
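
A minimal sketch of the Pandoc route suggested above, assuming pandoc is installed and on the PATH. The helper names `pandoc_command` and `convert` are made up for illustration, and the output will most likely still need the manual cleanup mentioned above, especially for a document this long.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(tex_path, target):
    """Build a pandoc command line that converts a LaTeX file to 'docx' or 'html'.

    The output file sits next to the input, with the extension swapped.
    """
    if target not in ("docx", "html"):
        raise ValueError("target must be 'docx' or 'html'")
    out_path = Path(tex_path).with_suffix("." + target)
    cmd = [
        "pandoc", str(tex_path),
        "--from", "latex",
        "--to", target,
        "--output", str(out_path),
    ]
    if target == "html":
        # Produce a complete HTML page rather than a bare fragment.
        cmd.append("--standalone")
    return cmd


def convert(tex_path, target):
    """Run pandoc; raises if pandoc is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(tex_path, target), check=True)
```

With the file from the question saved locally, `convert("fossil-fuel-divest.tex", "docx")` would attempt to write `fossil-fuel-divest.docx` alongside it. How well figures, citations, and custom macros survive is something to check by hand.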


I have had great luck with latex2rtf, even with complicated citations.
posted by supercres at 6:34 PM on October 2, 2014


When I searched the Mac App Store for "extension:tex", a number of apps came up that convert this format.
File View Pro seems to do the job very well.
posted by Mac-Expert at 6:57 PM on October 2, 2014


I have not dealt with LaTeX, but I have been using Pandoc a bit, and I agree that it seems like the right tool for this job.
posted by adamrice at 7:39 PM on October 2, 2014


I think that going through PDF is a bad idea. htlatex is probably already part of your distribution; try that to go straight to HTML from the tex file.
posted by supercres at 7:49 PM on October 2, 2014 [2 favorites]


htlatex seems to work fine (if not as prettily), though you'll have to disable the fontspec package in your file. I don't use LuaTeX or XeTeX regularly, but htlatex doesn't seem to work with them.
posted by supercres at 9:17 AM on October 3, 2014
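
A minimal sketch of the fontspec workaround mentioned above: it comments out the line that loads fontspec, plus the font-selection commands fontspec defines, so the file can be fed to htlatex. The function name `disable_fontspec` is hypothetical, and the pattern only handles the simple case where fontspec is loaded on its own `\usepackage` line. Anything more elaborate would need editing by hand.

```python
import re

# Matches a preamble line that loads fontspec on its own, or that calls one of
# the font-selection commands fontspec provides.
FONTSPEC_LINE = re.compile(
    r"^\s*\\(usepackage(\[[^\]]*\])?\{fontspec\}"
    r"|setmainfont|setsansfont|setmonofont|defaultfontfeatures)"
)


def disable_fontspec(tex_source):
    """Return the TeX source with fontspec-related lines commented out."""
    out = []
    for line in tex_source.splitlines(keepends=True):
        if FONTSPEC_LINE.match(line):
            out.append("% " + line)  # keep the original line, just commented
        else:
            out.append(line)
    return "".join(out)
```

One way to use it is to write the result to a copy of the source and then run `htlatex` on that copy, leaving the original file untouched for the normal PDF build.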


Have you tried and rejected LaTeX2HTML already? That is (used to be?) the canonical solution when you already have the input TeX files.
posted by RedOrGreen at 10:59 AM on October 3, 2014


I know this isn't directly answering your question, but I can cut & paste text from your PDF with all formatting intact when cutting from Adobe Acrobat, whereas if I cut the text from the Firefox built-in PDF reader the output is a mess. If that's the source of the problem, could you just ask your readership to open the document in Acrobat?

(For some reason, if I generate a fresh PDF from your tex file on my machine, Firefox does a little better in that it no longer smooshes words together without spaces between them when it copies the text, but it's still not as good as Acrobat, which manages to preserve pretty much all the formatting, presumably because it's cutting RTF rather than raw text.)
posted by pharm at 12:40 PM on October 3, 2014 [1 favorite]


Everything I've heard is that LaTeX to HTML is a lot safer than LaTeX to Word. I'd suggest looking around at http://tex.stackexchange.com/; they are very helpful and a lot of package authors hang out there.

(Also as a note, it should be 2 °C, not 2°C. There is always a space before units.)
posted by Canageek at 1:56 PM on October 3, 2014


Response by poster: Thank you all for your suggestions.

I know this isn't directly answering your question, but I can cut & paste text from your PDF with all formatting intact when cutting from Adobe Acrobat, whereas if I cut the text from the Firefox built-in PDF reader the output is a mess. If that's the source of the problem, could you just ask your readership to open the document in Acrobat?

This is an especially great help, and may actually eliminate the need to convert the PDF in the first place. I am also going to track down someone who has the full version of Acrobat to try to use the built-in PDF to Word tool.

I will also give some of the other methods listed a try. I am a big fan of LaTeX myself and intend to work with it a lot in the future (including for my PhD dissertation). Being able to easily export HTML or Word files would be useful for collaboration, when it comes to people unwilling to work with LaTeX code and unhappy to annotate PDFs.

Thanks!
posted by sindark at 8:40 AM on October 7, 2014


This thread is closed to new comments.