How to merge data fields into a plain text file?
November 25, 2009 8:04 AM

Is there some way to do a "mail merge" type operation on plain text, xml or html files without using a word processor?

I've been working on our company wiki and I want to bulk create a bunch of pages by merging certain field data from spreadsheets, csv files or other datasources into the template text files that I've created. The output should be XML or plain text.

I have a separate task where I have a simple html template that I'd like to populate with data pulled from a csv. The output needs to be standard, simple HTML (no funny MS Word markup)

I have access to MS Office, Open Office and Abiword but everything I've tried so far seems to "pollute" my files with word processor type formatting. (I am running KDE on Ubuntu 9.10, but I also have Win XP running as a virtual machine under Virtualbox - I have access to any version of the MS Office software, and obviously anything open source).

I'm sure there's some open source tool designed to merge text files. I'm not opposed to using a command line interface although I probably would prefer a GUI. My Google-fu is failing me on this one.

Any tips on how I should achieve this, or pointers to the tools I should be looking at?

posted by geekgirl397 to Computers & Internet (11 answers total)
Response by poster: oh, I should mention that I am not a programmer, but I would be willing to try script-based solutions if they are straightforward enough.
posted by geekgirl397 at 8:15 AM on November 25, 2009

Best answer: I think you are going about your problem the wrong way: most people would use a language like PHP to generate the pages dynamically rather than generating every page from the static data in advance. Your second task in particular is a textbook example of what PHP was created for. If you are reasonably familiar with HTML it's not that hard to get into, but might be too much for you if you have zero programming experience.

If you still want to generate static pages from plain text files + templates, look into the Perl Template Toolkit, which is a great system overall but might require jumping through too many hoops in your scenario.
posted by Dr Dracator at 8:41 AM on November 25, 2009

Your job is complex and specific to your situation. I doubt there is a general tool out there.

That said, doing that is about 100 lines of Python. Considering you are not a programmer, I'd say find one at your company or find a friend. If they're good at what they do, little tasks like this can be fun, meaning they'd do it for you.
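As a rough illustration of the kind of script meant here, this is a minimal sketch using only Python's standard library (csv and string.Template). The template text, the column names (title, owner) and the sample data are all made up for the example, not taken from the question; real input would be read from files instead.

```python
import csv
import io
from string import Template


def merge_rows(template_text, csv_file):
    """Yield one filled-in copy of the template per CSV data row."""
    template = Template(template_text)
    # DictReader uses the header row as field names, so $title in the
    # template is filled from the "title" column.
    for row in csv.DictReader(csv_file):
        yield template.substitute(row)


# Hypothetical template and data, inlined so the sketch runs as-is.
# In practice these would come from open("...", newline="").
TEMPLATE = "= $title =\nOwner: $owner\n"
DATA = "title,owner\nBuild server,Alice\nWiki style guide,Bob\n"

pages = list(merge_rows(TEMPLATE, io.StringIO(DATA)))
for page in pages:
    print(page)
```

Each item in pages is plain text with no word processor markup, so it could be written straight to its own file. Template.substitute raises KeyError if the template names a column the CSV lacks, which is a cheap way to catch typos in field names.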
posted by phrakture at 9:31 AM on November 25, 2009

FWIW, you could probably elance this for a hundred bucks or so.
posted by jenkinsEar at 9:33 AM on November 25, 2009

If the input is CSV, then it is very easy to use awk to generate whatever you are making.

As far as merging text files goes, read the manual pages for cut, join, and sort. These will work on CSV files. You are on Unix so all the tools you need are already there.
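For comparison, here is a sketch of a join-style merge written in Python rather than with the command-line tools above. It is not how join itself works; it is just the same idea (combine rows from two sources that share a key). The column names and sample data are hypothetical.

```python
import csv
import io


def join_on_key(left_file, right_file, key):
    """Merge rows from two CSV sources that share a value in `key`."""
    # Index the right-hand rows by key for quick lookup.
    right_rows = {row[key]: row for row in csv.DictReader(right_file)}
    merged = []
    for row in csv.DictReader(left_file):
        match = right_rows.get(row[key])
        if match is not None:  # keep only rows present in both inputs
            merged.append({**row, **match})
    return merged


# Hypothetical inline data standing in for two CSV files.
PEOPLE = "id,name\n1,Alice\n2,Bob\n3,Carol\n"
TEAMS = "id,team\n1,Docs\n3,Ops\n"

rows = join_on_key(io.StringIO(PEOPLE), io.StringIO(TEAMS), "id")
for row in rows:
    print(row)
```

Because the csv module parses quoted fields, this version copes with values that contain commas, and the inputs do not need to be sorted first.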

Writing awk scripts is easy, but you will have to set aside some time to learn how. I have Googled "awk tutorial" before and come up with lots of easy to read stuff; also, GNU Awk comes with a great user guide.
posted by massysett at 10:09 AM on November 25, 2009 [1 favorite]

Oh, and I can't think of a GUI program that would do this at all.

For your second task you can try using M4 rather than PHP; M4 is an old Unix tool which will be available in Ubuntu (though you might have to apt-get it.) GNU M4 has a good manual. You'd probably need to write some shell scripts to glue it all together though.

If this sounds suspiciously like programming, that's because it really is basic programming...sorry :)
posted by massysett at 10:14 AM on November 25, 2009

I would do this via a Python script, but I can't really give you much more guidance than that. It is probably also possible to do some of what you want with XSL, but there is little you can do with XSL that you can't do in Python, and Python gives me less of a headache.
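For the HTML task in the question, one possible shape for such a script is sketched below. It assumes a tiny template held in a string and CSV columns called name and note, both invented for the example. The values are passed through html.escape so that characters such as & and < in the data cannot break the markup.

```python
import csv
import html
import io
from string import Template

# Hypothetical HTML template; $name and $note are assumed column names.
PAGE = Template("<html><body><h1>$name</h1><p>$note</p></body></html>")


def render_pages(csv_file):
    """Return a dict mapping each row's name to its rendered HTML."""
    pages = {}
    for row in csv.DictReader(csv_file):
        # Escape every value before it is placed into the markup.
        safe = {key: html.escape(value) for key, value in row.items()}
        pages[row["name"]] = PAGE.substitute(safe)
    return pages


# Hypothetical inline data standing in for the CSV file.
DATA = 'name,note\nWidgets,"Nuts & bolts, <small> parts"\n'
result = render_pages(io.StringIO(DATA))
print(result["Widgets"])
```

The output is exactly the template plus the escaped field values, so nothing beyond the hand-written HTML ends up in the file.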

The Text Processing in Python ebook is a good read, although it assumes some general knowledge of Python that you may not have. For that, you might want to consult the excellent (official) Python Tutorial.

There might be some GUIs to something like m4, but I've never seen one. You're definitely in custom-script territory, but it's a pretty easy task as programming tasks go.
posted by Kadin2048 at 10:52 AM on November 25, 2009

Response by poster: I had always been meaning to make a serious effort at starting to program by writing some simple scripts. Looks like now's the time.

Not sure whether to start with Perl, Python or PHP (or .. Ruby?). Python seems to be a popular choice.

I'm surrounded by programmers at work and at home, but they tend to be so busy - I could end up waiting in their "queue" for longer than it would take for me to learn to do this myself.

Thank you so much for all your answers so far.
posted by geekgirl397 at 12:11 PM on November 25, 2009

Perl, Python, PHP, and Ruby all have their places, but I'd recommend starting with Python. It is a very popular language, particularly with beginners, it has lots of tutorials, and the language is designed to try and push you towards doing things "the right way" when it can. The documentation is also extremely good. (And that text processing book mentioned in my earlier comment has helped me out quite a bit.)

There's nothing wrong with Perl (the desire to snark is so strong...), and for text processing it is very powerful. But the syntax is very compact and I've just always found it to have a steeper learning/relearning curve for the occasional user than Python.

PHP would be a fine choice as well, especially for generating HTML (as someone upthread pointed out). The only reason I'm a bit more wary of PHP than I'd otherwise be, is that there's a ton of very bad sample code floating around. Granted, you can find bad examples in Python, Ruby, or Perl, but I've just seen more of them in PHP. If you do go the PHP route, try to stick to tutorials and be careful of junk that gets posted to messageboards. (This is a bigger concern when you're writing web apps or CGIs where there are possible security implications, but it can just lead to general bad habits even in offline programs.)

I'm not familiar enough with Ruby to really speak to it. Most people I know who use Ruby do so because they're using a big web-app framework that's written in it, and there are some advanced features of Ruby that are (depending on who you ask) improvements over Python. I think it also has built-in regexps, as opposed to getting them from an include. But I've never seen a hugely compelling reason to switch for my little text processors and other one-off utilities. The clincher for me is that Python is installed by default on more systems that I work on, or easier to talk admins into installing. So you should certainly look into it if you're interested; people are certainly doing neat stuff with it these days.

Whichever you choose, I think you have a good initial project to cut your teeth on. (And if you really want to compare two languages, try writing the same utility in one and then in the other. I've done this a few times with Java and Python and it's an interesting experience.)
posted by Kadin2048 at 12:37 PM on November 25, 2009

If you want to pick up a general purpose tool, Python is the popular answer these days. Perl is better suited to the quick and dirty, though it's a rich, mature language by now. Text manipulation is both ridiculously powerful and easy in Perl, but I wouldn't suggest it as a first language as it might encourage sloppy programming practice.

PHP is for web development, good for the task at hand but should not really be used for general purpose work. Ruby is nice and easy to learn, but it seems to have lost some momentum recently.
posted by Dr Dracator at 12:41 PM on November 25, 2009

Have a look at Useful File Utilities - specifically the "Batch Replacer" Utility.
posted by bigmusic at 2:41 PM on November 25, 2009

This thread is closed to new comments.