Application to Generate Word documents
March 24, 2020 8:36 AM

I create lots of documents by taking an old document and changing a few things in it. What's the best option to automate this?

I've done it with spreadsheets and Word's mail merge function and found it not that great. I'm open to learning a little bit of coding if needed. I use Windows, but something web-based would be cool too. Willing to pay for something good for this.
posted by Xalf to Technology (4 answers total) 1 user marked this as a favorite
I'm unsure if you need document type conversion. If so, try Pandoc. There is a Windows installer.
posted by mr_bovis at 9:45 AM on March 24 [1 favorite]

If you're up for coding, you should be able to automate this using PowerShell.
posted by sjswitzer at 11:09 AM on March 24

Or possibly the scripting language that comes with Office, VBA.
posted by clew at 11:22 AM on March 24 [1 favorite]

Can you give some examples of the changes you want to make? Is it simple text substitution? More complex conditionals? Is it text diffs only, or conditionally included paragraphs? Conditional formatting?

What about mail merge did you find not that great?

How do you define "automate"? I mean, you've got to specify what changes you want to make, right? That seems interactive and non-automatable, unless it's triggered by some data that's produced periodically, like a weekly report or a data dump.

Pandoc is great (yeah, Haskell), even if you don't need document conversion. You can read a docx into an AST (abstract syntax tree), manipulate that, and write it back out to docx. Look at the -F (--filter) and -S options.
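
A rough sketch of that AST round trip in Python, assuming pandoc is installed and on the PATH. The function names and the CLIENT_NAME placeholder are made up for illustration. It asks pandoc for its JSON AST, rewrites the text of Str nodes, and hands the result back to pandoc to write the new file. The same substitute function could instead be packaged as a standalone filter and passed with -F.

```python
import json
import subprocess


def substitute(node, replacements):
    """Recursively walk pandoc's JSON AST and rewrite the text of Str nodes."""
    if isinstance(node, list):
        return [substitute(child, replacements) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            text = node["c"]
            for placeholder, value in replacements.items():
                text = text.replace(placeholder, value)
            return {"t": "Str", "c": text}
        return {key: substitute(val, replacements) for key, val in node.items()}
    return node  # strings, numbers, etc. pass through unchanged


def fill_with_pandoc(src, dst, replacements):
    """Round-trip src through pandoc's JSON AST and write the edited dst."""
    ast_json = subprocess.run(
        ["pandoc", src, "-t", "json"],
        check=True, capture_output=True, encoding="utf-8",
    ).stdout
    edited = substitute(json.loads(ast_json), replacements)
    # With no input file, pandoc reads the JSON document from stdin.
    subprocess.run(
        ["pandoc", "-f", "json", "-o", dst],
        input=json.dumps(edited), check=True, encoding="utf-8",
    )
```

Usage would look something like fill_with_pandoc("old.docx", "new.docx", {"CLIENT_NAME": "Acme"}). One caveat: pandoc only keeps what its AST can represent, so some Word formatting may not survive the trip. Passing the original file via --reference-doc may help carry the styles over.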

Note that docx files are zip files. You can make a copy of a docx file, rename the copy to a .zip extension, extract it to a folder, and look inside, in particular at word/document.xml. You could manipulate that file and zip everything back up.
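
The unzip, edit, and re-zip steps can also be scripted without extracting anything to disk. This is a minimal sketch using Python's standard zipfile module. The function name and the idea of plain-text placeholders typed into the template are assumptions for illustration.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx_zip(template_path, output_path, replacements):
    """Copy a .docx, substituting placeholders inside word/document.xml."""
    with zipfile.ZipFile(template_path) as src:
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == "word/document.xml":
                    text = data.decode("utf-8")
                    for placeholder, value in replacements.items():
                        # Escape &, < and > so the result is still valid XML.
                        text = text.replace(placeholder, escape(value))
                    data = text.encode("utf-8")
                # Every other part of the package is copied through unchanged.
                dst.writestr(item, data)
```

For example, fill_docx_zip("template.docx", "letter.docx", {"CLIENT_NAME": "Smith & Co"}). Be aware that Word sometimes splits a word across several XML runs, so a placeholder that looks contiguous on the page may not be contiguous in document.xml. Typing the placeholder in one go, without editing it afterwards, tends to avoid that.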

Another approach would be to do it all in RTF files. Back when xml hadn't infected the world, and .doc was the norm, I used RTF (rich text format), an older Microsoft format that's plain text and therefore manipulable by many tools, such as awk, grep, and sed. It's also supported by WordPad, which is on every Windows box (MS Word isn't). It's also understood by Google Docs, LibreOffice, etc. (as is docx, mostly).
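
Since RTF is plain text, the substitution can be a simple search and replace. Here is a small sketch in Python. The function names and the CLIENT_NAME placeholder are hypothetical, and it assumes the template was saved as RTF with the placeholder typed as ordinary text. The only subtlety is escaping: backslashes and braces are RTF syntax, and non-ASCII characters are written as \uN escapes.

```python
from pathlib import Path


def rtf_escape(text):
    """Escape a plain string so it can be dropped into RTF body text."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF control characters need a backslash
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes signed 16-bit UTF-16 code units; "?" is the fallback
            # character shown by readers that do not understand \u.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % code)
    return "".join(out)


def fill_rtf(template_path, output_path, replacements):
    """Write a copy of an RTF template with placeholders replaced."""
    # latin-1 maps every byte to one character, so the file round-trips as-is.
    text = Path(template_path).read_bytes().decode("latin-1")
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, rtf_escape(value))
    Path(output_path).write_bytes(text.encode("latin-1"))
```

Something like fill_rtf("template.rtf", "letter.rtf", {"CLIENT_NAME": "José"}) would then produce a file that Word, WordPad, or LibreOffice can open.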
posted by at at 12:47 AM on March 25 [1 favorite]
