Automating monkeywork in Excel and Word
January 6, 2009 1:15 PM
Automation Filter: How can I automate the insertion of data from a spreadsheet into a Word report template?
My glorious analysis job has a hell of a lot more reporting in it than I expected - way more spreadsheet-to-Word than I learned how to deal with in academia. Example: quarterly survey results that need to be reported on. One report needs responses overall and broken down by respondent category. The next wants to compare all respondents from the entire time that the reports have been going. The final set of reports compares respondents from a particular quarter across each year.
My predecessor created elaborate Excel spreadsheets to calculate mean values, etc, and then copy/pasted these hundreds of values by hand into Word. I don't even have the data for the old surveys - just the reports. So for the longitudinal reports, I have to copy the old counts, add to the new, and then recalculate the percentages.
This all strikes me as insanely primitive, and a huge waste of my time. This is what I envision: the (web-administered) survey results are given a unique ID, dated, and appended to a file of all past respondents. Once the survey is closed for the quarter, I tell the robots to hit it. They mercilessly plug new dates and updated numbers into the Word reports and save them to an appropriate folder. My sole acts: skim for errors, press send on an email, laugh diabolically at what my legion of electronic minions hath wrought.
I can think of a number of ways to do this: I'm sure there is some third-party stuff that I could buy, I could write a program to produce a LaTeX file that publishes to PDF, there's probably an official Microsoft protocol, etc. I would love to program this myself, and I would get major credit from the boss for doing it for free in this time of budget crisis. But I haven't actually DONE anything along those lines. So what I would like to know is: how would YOU do this, preferably on the cheap?
I'm familiar with Visual Basic .NET, I'm learning Python, and I'm quite comfortable in R. I'm happy to learn something new, though, especially because I have a real problem to test it on. I've got Office 2007 and Adobe Acrobat in full.
Thanks in advance, Mefites. I only hope that I might someday rock as hard as you.
My glorious analysis job has a hell of a lot more reporting in it than I expected - way more spreadsheet-to-Word than I learned how to deal with in academia. Example: quarterly survey results that need to be reported on. One report needs responses overall and broken down by respondent category. The next wants to compare all respondents from the entire time that the reports have been going. The final set of reports compares respondents from a particular quarter across each year.
My predecessor created elaborate Excel spreadsheets to calculate mean values, etc, and then copy/pasted these hundreds of values by hand into Word. I don't even have the data for the old surveys - just the reports. So for the longitudinal reports, I have to copy the old counts, add to the new, and then recalculate the percentages.
This all strikes me as insanely primitive, and a huge waste of my time. This is what I envision: the (web-administered) survey results are given a unique ID, dated, and appended to a file of all past respondents. Once the survey is closed for the quarter, I tell the robots to hit it. They mercilessly plug new dates and updated numbers into the Word reports and save them to an appropriate folder. My sole acts: skim for errors, press send on an email, laugh diabolically at what my legion of electronic minions hath wrought.
I can think of a number of ways to do this: I'm sure there is some third-party stuff that I could buy, I could write a program to produce a LaTeX file that publishes to PDF, there's probably an official Microsoft protocol, etc. I would love to program this myself, and I would get major credit from the boss for doing it for free in this time of budget crisis. But I haven't actually DONE anything along those lines. So what I would like to know is: how would YOU do this, preferably on the cheap?
I'm familiar with Visual Basic .NET, I'm learning Python, and I'm quite comfortable in R. I'm happy to learn something new, though, especially because I have a real problem to test it on. I've got Office 2007 and Adobe Acrobat in full.
Thanks in advance, Mefites. I only hope that I might someday rock as hard as you.
There are several modules on the CPAN that will move data in both directions between a Perl program and Excel and Word documents -- for example, ParseExcel.pm and Writer.pm. The search page would be a good place to find others.
posted by harmfulray at 7:40 PM on January 6, 2009
posted by harmfulray at 7:40 PM on January 6, 2009
I do this twice a year on a very large scale -- think 80 different Word documents (300+ pages each reporting on five years' worth of data on 40 different outcome measures at about 15 different levels of demographic detail for every county, region and class in Pennsylvania) populated almost entirely with tables populated directly from Excel data (the Excel files being generated from data in a SQL Server database).
What I inherited when this first became my responsibility was a messy hodgepodge of Access, Excel, SQL Server, and even some VBScript. We now have a new data management suite (written as a desktop application in .NET) that does it all pretty seamlessly, making heavy use of the Microsoft.Office.Interop namespace.
I've done a lot more work since then on automating data conversion and retrieval, as well as custom document creation using Visual Studio and the Office,Interop namespace. Personally, I'd recommend keeping the number of tools/formats you have to use to a minimum, unless you can be sure that your successors will have the same level of familiarity with them as you. It's going to be a lot easier for someone to maintain this if it's all done in a single environment than if you have a ten-page HOWTO that uses six different technologies to parse and convert the data through three or four different document formats.
If you're interested in some of the more nitty-gritty details, email's in profile.
posted by Doofus Magoo at 4:57 AM on January 7, 2009
What I inherited when this first became my responsibility was a messy hodgepodge of Access, Excel, SQL Server, and even some VBScript. We now have a new data management suite (written as a desktop application in .NET) that does it all pretty seamlessly, making heavy use of the Microsoft.Office.Interop namespace.
I've done a lot more work since then on automating data conversion and retrieval, as well as custom document creation using Visual Studio and the Office,Interop namespace. Personally, I'd recommend keeping the number of tools/formats you have to use to a minimum, unless you can be sure that your successors will have the same level of familiarity with them as you. It's going to be a lot easier for someone to maintain this if it's all done in a single environment than if you have a ten-page HOWTO that uses six different technologies to parse and convert the data through three or four different document formats.
If you're interested in some of the more nitty-gritty details, email's in profile.
posted by Doofus Magoo at 4:57 AM on January 7, 2009
I agree that you could probably use mailmerge, though I haven't done anything as complicated as that in it (e.g., spanning multiple tables, etc.).
However, since you mentioned comfort with R and some familiarity with LaTeX, I'd suggest also looking into Sweave - if you're not familiar with it, it allows you to seamlessly combine R & LaTeX code into one document. You don't even have to reinvent the Excel bits if you don't want to - just write code to read what you need into dataframes using read.xls (package:xlsReadWrite) and you're good to go. Look into the latex function (package:Hmisc for fancy, or package:xtable for basic) to format tables for pretty display (warning: some futzing involved to figure out how to get things looking right). If the report has narrative that needs to include single values from your data output in the text, Sweave can do that too.
posted by shelbaroo at 5:12 AM on January 7, 2009
However, since you mentioned comfort with R and some familiarity with LaTeX, I'd suggest also looking into Sweave - if you're not familiar with it, it allows you to seamlessly combine R & LaTeX code into one document. You don't even have to reinvent the Excel bits if you don't want to - just write code to read what you need into dataframes using read.xls (package:xlsReadWrite) and you're good to go. Look into the latex function (package:Hmisc for fancy, or package:xtable for basic) to format tables for pretty display (warning: some futzing involved to figure out how to get things looking right). If the report has narrative that needs to include single values from your data output in the text, Sweave can do that too.
posted by shelbaroo at 5:12 AM on January 7, 2009
My approach would be to drive from Excel to Word, rather than using Word to extract from Excel. (And rather than using a third application.)
If you are au fait with VB.Net that VBA should be easy to pick up. Within VBA, in Excel, you can open a Word object using OLE (either invisibly or visibly), dump the data in there and produce your report i.e. ready to email with no need to edit in Word.
So you would start off with Set objWord = CreateObject() and go on from there. If you need more pointers on this approach post back.
posted by NailsTheCat at 12:57 PM on January 7, 2009
If you are au fait with VB.Net that VBA should be easy to pick up. Within VBA, in Excel, you can open a Word object using OLE (either invisibly or visibly), dump the data in there and produce your report i.e. ready to email with no need to edit in Word.
So you would start off with Set objWord = CreateObject() and go on from there. If you need more pointers on this approach post back.
posted by NailsTheCat at 12:57 PM on January 7, 2009
Thanks, everyone. This is a great variety of responses. I'm going to tinker around with your ideas a bit and will return to let you know how things go (and to mark best answers, I guess, but they all seem good).
If you have any more ideas, or if you're new to this question, feel free to comment. I'll be keeping an eye on it.
posted by McBearclaw at 2:49 PM on January 7, 2009
If you have any more ideas, or if you're new to this question, feel free to comment. I'll be keeping an eye on it.
posted by McBearclaw at 2:49 PM on January 7, 2009
Update: Sweave is awesome.
A trio of pointers for those who follow:
1) Do not attempt to adjust your margins without the geometry package
2) Standard Sweave output looks like hell. Read this article by Ross Ihaka, co-creator of R, to improve it.
3) Most of the introductory materials on Sweave focus on getting straight R output into the document. \Sexpr{} is your friend for just inserting values into text - e.g. "The mean response to Question 13 is \Sexpr{mean(data$Q13)}."
I tried shelbaroo's suggestion first because of my familiarity with R and desire to learn LaTeX. I'm sure the other solutions work, too; I just didn't get to them before Sweave rocked my face off. If you don't have any familiarity with the methods, though, I'd recommend it - LaTeX is honestly not that hard to learn if you have any grasp of how HTML works, and R is pretty straightforward until you get into the optional packages.
Thanks again, everyone.
posted by McBearclaw at 10:36 PM on January 12, 2009
A trio of pointers for those who follow:
1) Do not attempt to adjust your margins without the geometry package
2) Standard Sweave output looks like hell. Read this article by Ross Ihaka, co-creator of R, to improve it.
3) Most of the introductory materials on Sweave focus on getting straight R output into the document. \Sexpr{} is your friend for just inserting values into text - e.g. "The mean response to Question 13 is \Sexpr{mean(data$Q13)}."
I tried shelbaroo's suggestion first because of my familiarity with R and desire to learn LaTeX. I'm sure the other solutions work, too; I just didn't get to them before Sweave rocked my face off. If you don't have any familiarity with the methods, though, I'd recommend it - LaTeX is honestly not that hard to learn if you have any grasp of how HTML works, and R is pretty straightforward until you get into the optional packages.
Thanks again, everyone.
posted by McBearclaw at 10:36 PM on January 12, 2009
This thread is closed to new comments.
posted by Pants! at 1:35 PM on January 6, 2009