How can I create a Word document from a Markdown file--but keep tabs?
February 26, 2017 4:39 AM

I need to write Microsoft Word documents with tab characters in the text. Text files are so much easier to handle, and I've grown accustomed to writing in markdown. Pandoc converts markdown to .docx very well--but converts all tabs to spaces! Is there anything I can do to write in a text file, then be able to convert to Word and back, while preserving the tab characters?

In Pandoc, the "--preserve-tabs" option is close, but not quite right, because that only preserves tabs when they are in code blocks.

I've tried writing in CommonMark, and that keeps the tabs in the .docx files, but when converting back to .txt, replaces all tabs with spaces. Is there anything I can do? Thank you.
posted by surenoproblem to Computers & Internet (11 answers total) 2 users marked this as a favorite
 
Is the conversion to a series of spaces, or a single space? If tabs are converted to say, 4 spaces, then you can do a Find and Replace on 4 spaces inside double quotes "    " and replace with ^t.
posted by SuperSquirrel at 5:01 AM on February 26, 2017


Response by poster: Unfortunately, the tabs are converted into just one space!
posted by surenoproblem at 5:25 AM on February 26, 2017


Then can you do the search and replace by hunting for a paragraph mark (caret-p, I think, but I'm on my phone and can't check) followed by a space and replace that with a paragraph mark followed by a tab?
posted by carmicha at 6:48 AM on February 26, 2017


Best answer: Can you convert the tabs to a different character (or the string "TAB"), then run Pandoc, then search/replace back to a tab?
posted by Mr Stickfigure at 7:28 AM on February 26, 2017 [3 favorites]
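
A minimal sketch of the placeholder idea suggested above, assuming a POSIX shell with sed. The token XXTABXX, the function name tabs_to_placeholder and the file names in the comments are hypothetical; any token works as long as it never occurs in the real text.

```sh
#!/bin/sh
# Hypothetical placeholder token; pick any string that never occurs in the text.
PLACEHOLDER='XXTABXX'

# Read text on stdin and replace every literal tab with the placeholder token.
tabs_to_placeholder() {
    tab=$(printf '\t')   # a literal tab, obtained portably
    sed "s/${tab}/${PLACEHOLDER}/g"
}

# Possible usage (assumes pandoc is installed; file names are made up):
#   tabs_to_placeholder < script.md | pandoc -f markdown -o script.docx
```

After the conversion, a Find and Replace in Word from XXTABXX to ^t should put the tabs back.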


What are you using the tab characters for? If it's for table alignment, then you need to convert them to some other table markup. Markdown might not be the best choice for your source format, as it doesn't do tables or xrefs well (if at all: passing through HTML is avoiding the problem).

If you're using tabs for paragraph indentation, then you need some kind of stylesheet management.

Almost every markup language assumes that tabs, spaces and single newlines can be smooshed together into a single space. With pandoc, you can't turn this assumption off for tabs in running text.
posted by scruss at 7:48 AM on February 26, 2017 [1 favorite]


I haven't tried it, but can you format the markdown that has tabs as "code" (either indented with a tab or surrounded by ` marks)? Then output to HTML, then put that in Word? I would think that tabs should be preserved in code, but maybe not...

I've never used Pandoc myself, but I also strongly prefer writing in Markdown. When I need the result in Word or whatever, I convert to HTML (where I can also control the font/colors/etc with css) and copy/paste that into Word.
posted by cgg at 9:39 AM on February 26, 2017


If you embed encoded entities in your source text, they will be converted to the equivalent in the output.

So instead of a literal tab character in your markdown, use &#9; instead.

Depending on how you are running pandoc on your markdown source, you could probably set up some kind of preprocessing filter or script or something to change literal tab characters to &#9; and then you could just type as normal and not worry about it.
posted by roosterboy at 11:46 AM on February 26, 2017


Best answer: Something like this, perhaps:

sed -e $'s/\t/\&#9;/g' tab_test.txt | pandoc -s -S -o tab_test.docx

This will take any tabs in the source file, convert them to &#9; and then pass it to pandoc, which will then convert &#9; back into tab.

Note that I haven't extensively tested this. It's just something I whipped up quickly to see if it would work.
posted by roosterboy at 11:55 AM on February 26, 2017 [2 favorites]
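
The $'...' quoting in the one-liner above is not understood by every shell (dash, for instance, does not support it). A rough portable variant, assuming only POSIX sh and sed, might look like the sketch below; the function name tabs_to_entities is made up for illustration, and whether pandoc turns the entity back into a tab is taken from the answer above rather than tested here.

```sh
#!/bin/sh
# Read text on stdin and replace every literal tab with the numeric entity &#9;.
tabs_to_entities() {
    tab=$(printf '\t')   # a literal tab, without relying on $'...' quoting
    # In a sed replacement a bare & means "the matched text", so it is escaped.
    sed "s/${tab}/\\&#9;/g"
}

# Possible usage (assumes pandoc is installed; file names as in the answer above):
#   tabs_to_entities < tab_test.txt | pandoc -s -o tab_test.docx
```

The -S flag is left out of the usage comment because pandoc 2.0 and later removed it; on those versions smart punctuation is requested with -f markdown+smart instead.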


So, just in case this is an instance of the XY Problem... Could you please explain in some detail why you want to preserve the tabs / what it is you're ultimately wanting to accomplish?

There may be some easier or more natural way to get what you want.
posted by teatime at 12:07 PM on February 26, 2017 [1 favorite]


Response by poster: Thank you for all of the great answers!

The reason I'm trying to preserve the tabs is to be able to write a document that roughly adheres to these formatting guidelines:

https://australianplays.org/assets/images/files/ASC_script_format_example.pdf
posted by surenoproblem at 5:30 PM on February 26, 2017


  … keeps the tabs in the .docx files, but when converting back to .txt …

This makes it a bit more complex: are you, or is someone else, editing the Word document and then you want to convert it back to Markdown and edit it some more? If so, that's a tougher one, as you're trying to parse implicit presentation markup back to document structure and wedge it into a markup language that's really too simple for the task at hand. Usually this kind of conversion is a one way process, a bit like printing. Make the edits in the markup language, then render it to Word or PDF as final.

The dialogue you're trying to present is effectively a series of single row tables with two columns, the speaker and the text.

You can sometimes manage this workflow if you know exactly what word processor is going to be used, and code a conversion/style accordingly. If there are multiple editors, invariably one of them will be using CrappyWord 1.0a (which could be even MS Word 2011 on the same platform, as MS make big internal changes to their code now and again) and all your special style work will be gone.
posted by scruss at 8:22 PM on February 26, 2017 [1 favorite]

