Programmatically generating Excel files
November 28, 2018 3:07 PM   Subscribe

How can I programmatically generate Excel spreadsheets? I know about generating a CSV and importing it, but I need the resulting spreadsheet to use Excel-only features not available in CSV.

I often need to generate Excel spreadsheets from script output.

These spreadsheets use a lot of Excel's features. They don't just present static data. They will be interacted with by my coworkers, and they include formulas, conditional formatting, etc., to enable that interaction. (But, in case it matters, they don't include any plots or other graphics.)

Right now, my scripts write CSVs, I import them into Excel, and I add all the fancy stuff by hand. I'd like to be able to programmatically generate the Excel file directly, fancy stuff and all, rather than sit down and add it by hand each time.

Barring that, I'd like to programmatically generate some intermediate format that's richer than CSV, such that when I import it into Excel, the fancy stuff comes along.

Ideally, I'd like to do all this in Python, or using command-line tools that can be invoked from a Bash script. But I'd even consider learning a new language for this if there was a sufficiently good solution in that language.
posted by nebulawindphone to Computers & Internet (10 answers total) 10 users marked this as a favorite
XLXSWriter does formulas, conditional formatting -- good enough?
posted by clew at 3:13 PM on November 28, 2018 [3 favorites]

There's also openpyxl which can use another Excel file as a template.
posted by RobotVoodooPower at 3:31 PM on November 28, 2018 [2 favorites]

You may want to look at learning VBA which is baked in to Excel and is pretty powerful. It's been a while since I've done much with it but I think you can call python scripts with it.
posted by backseatpilot at 3:39 PM on November 28, 2018 [2 favorites]

If you're handy with perl, there's a .pm for that.
posted by hanov3r at 3:57 PM on November 28, 2018 [2 favorites]

I used XLSXWriter to create and format some workbooks and that was pretty easy to get working. But I'm looking into using either openpyxl or win32com to write data into a template (with pivot tables, graphs, etc.).
posted by bread-eater at 5:10 PM on November 28, 2018

I've been in the same spot twice in the past, at two different jobs.

The first time (about 10 years ago) I generated the data in Python, output it into a CSV with some custom annotations for styling and things, then processed that into Excel with Apache POI using Jython. The POI API is extremely verbose and somewhat fiddly, but you can do pretty much anything in it. The translation layer in between tended to get ugly as I added features to support different use cases, but it did work well. You do have to have Java installed, but all the code you write is Python.

The second time (maybe 4 years ago) I used Ruby with axlsx for both pulling data and formatting. The axlsx API is almost a pleasure to use - probably about as good as it gets for Excel sheet generation. If you're comfortable with Ruby, this is a really nice way to do it, and axlsx supports almost as many Excel features as POI.

I've also tried using COM automation to control Excel. I would definitely not try this for any real work. It's so fragile, the API is brutal, you have nasty timing dependencies and all kinds of crap. I would sooner create a VBA macro to format my CSV, and that's saying a lot.

Looking at other folks' posts here, if I had to do it today, I'd probably try XlsxWriter - it looks pretty full-featured and nice to use. But there are definitely a few workable ways to go.
posted by pocams at 6:43 PM on November 28, 2018 [1 favorite]

Powershell is another good tool for this sort of thing. Windows created it to work with Windows stuff especially.
posted by irisclara at 7:25 PM on November 28, 2018

Right now, my scripts write CSVs, I import them into Excel, and I add all the fancy stuff by hand. I'd like to be able to programmatically generate the Excel file directly, fancy stuff and all, rather than sit down and add it by hand each time.

If that were my use case, the path that I would perceive as that of least resistance would be to use Excel itself to prepare a template workbook containing the formulas and formatting I needed, in which one of the worksheets was named with an Import tab and only ever used by clearing it completely and then pasting in CSV data generated from my script. Or if I wanted to avoid even the copy and paste step, so that my Excel spreadsheet could just pop out at the end of an automated build process, I'd use cross-worksheet references to a CSV instead of pasting it into a sheet.

If I actually needed script-generated formulas I'd activate them using the EVALUATE() trick.
posted by flabdablet at 11:21 PM on November 28, 2018 [2 favorites]

I've used both xlsxwriter and openpyxl with good results, including formatting, freezing the top row, setting cell types, setting up input validation, etc. Assuming you're generating xlsx files and not old-style xls files, they do basically everything Excel supports. Both are well-documented and pretty stable.

openpyxl also supports reading Excel files, so if you need a complete read/write solution, I would recommend it.
posted by jedicus at 7:56 AM on November 29, 2018 [1 favorite]

As a programmer, I created spreadsheets with formulas, etc using Visual Studio/C# with Interop.Excel. Googling, I see it's not the only option.
posted by SemiSalt at 11:29 AM on November 29, 2018

« Older How do I safely dispose of acetone-soaked rags?   |   how calculate cost per home of road paving in... Newer »

You are not logged in, either login or create an account to post comments