Tips for working with Excel as an API?
December 11, 2020 12:49 AM Subscribe
I recently was sought out by a company, looking to build an Excel "microservice." Basically they want to feed data raw data in and have a proper Excel document binary come out. The CSV won't cut out the client has specific requirements. Some are simply color and branding. Others are having filter drop down functionality set. They want it to work in the latest version of Excel, for now, they did not think that Teams and Web might not behave the same. There's a lot out there like Rust Chamomile but I'm more interested in the process and pitfalls. Scale is not an issue.
An excel plugin won't work. Their business model is they have sophisticated backends producing results, for whatever reason. They're a bit afraid of some simpler node modules might not be up to task or being quirky and looking to get low-level, I can't explain much but here's the workflow:
1, Company spends months analyzing a business. They create a framework, that among other things, is a bespoke Excel (and Powerpoint), this allows upper management as one of many ways to track and enact the changes in the organization. Assume that all excel docs are tailored to the company and to an extent the CEO. That's not my concern that's a data issue.
2. They're looking beyond simply export as CSV. They have good reasons for this not the least is branding, They immediately want this filtering capability but will want more. I think the larger roadmap is that calculated fields are generated in a non-Excel format and then trnaspiled from Rust/Go/Javascript/C++ to a Excel standard formulate but this is future state, In the end they want to provide a standard Excel document with no plugins that looks handmade and can exist with their support.
I'm going to ask some basic questions:
1. Is the VB API well defined that you can build off of it?
2. Regardless of legality is decompiling to see how things work a mess or just a thing you do?
4. Since this is an executable it gets tricky to deploy but say we got it in a container and ignored licensing is that the best way to have an "Excel service," where data goes in and excel comes out?
5. Again not looking to create plugins a vanilla XLSX with data and some calculated fields available,
I imagine the work could get tedious but given the other libraries this can't be impossible. I kind of of like the challenge,. How possible is this? Has someone tried to do something like this and Microsoft products are such a chore to work through doing something simple is difficult?
An excel plugin won't work. Their business model is they have sophisticated backends producing results, for whatever reason. They're a bit afraid of some simpler node modules might not be up to task or being quirky and looking to get low-level, I can't explain much but here's the workflow:
1, Company spends months analyzing a business. They create a framework, that among other things, is a bespoke Excel (and Powerpoint), this allows upper management as one of many ways to track and enact the changes in the organization. Assume that all excel docs are tailored to the company and to an extent the CEO. That's not my concern that's a data issue.
2. They're looking beyond simply export as CSV. They have good reasons for this not the least is branding, They immediately want this filtering capability but will want more. I think the larger roadmap is that calculated fields are generated in a non-Excel format and then trnaspiled from Rust/Go/Javascript/C++ to a Excel standard formulate but this is future state, In the end they want to provide a standard Excel document with no plugins that looks handmade and can exist with their support.
I'm going to ask some basic questions:
1. Is the VB API well defined that you can build off of it?
2. Regardless of legality is decompiling to see how things work a mess or just a thing you do?
4. Since this is an executable it gets tricky to deploy but say we got it in a container and ignored licensing is that the best way to have an "Excel service," where data goes in and excel comes out?
5. Again not looking to create plugins a vanilla XLSX with data and some calculated fields available,
I imagine the work could get tedious but given the other libraries this can't be impossible. I kind of of like the challenge,. How possible is this? Has someone tried to do something like this and Microsoft products are such a chore to work through doing something simple is difficult?
In my experience, generating xlsx files with the python library XlsxWriter is pretty easy and covers a lot of your use cases. I'm sure there are a ton of C# libraries as well, as this is a pretty common business thing to do.
posted by samj at 2:07 AM on December 11, 2020 [3 favorites]
posted by samj at 2:07 AM on December 11, 2020 [3 favorites]
Harnessing ancient editions of Excel in VM's is a business model that has significant customer-base hampered only by finding and buying old licence codes for each instance in each VM and using those to keep MS' lawyers from shutting you down.
You can bind the Office assemblies in dotNET and create objects that you save as Excel XSLX -- explore the API's here.
Or you could capture Excel in a Windows VM and drive the windows with AutoHotkey. There's also LibreOffice, which has a cloud-hostable edition or local edition you could drive via automation and accessibility tools, that you could harness to spit out xlsx files -- at the risk it's not exactly the MS implementation your customer wants.
In cloud-centric world, there's also Office 365 API's (eg worked example) which you might be able to wrap.
posted by k3ninho at 2:09 AM on December 11, 2020
You can bind the Office assemblies in dotNET and create objects that you save as Excel XSLX -- explore the API's here.
Or you could capture Excel in a Windows VM and drive the windows with AutoHotkey. There's also LibreOffice, which has a cloud-hostable edition or local edition you could drive via automation and accessibility tools, that you could harness to spit out xlsx files -- at the risk it's not exactly the MS implementation your customer wants.
In cloud-centric world, there's also Office 365 API's (eg worked example) which you might be able to wrap.
posted by k3ninho at 2:09 AM on December 11, 2020
I’ve used aspose before successfully. Aspose.cells in this case. It’s available in several languages. It is not a free product.
posted by gatorbiddy at 7:51 AM on December 11, 2020
posted by gatorbiddy at 7:51 AM on December 11, 2020
I have to get ready for a vacation but just had to drop in quick to offer my two cents: the company I work for made use of C#-driven excel instances to convert data to XSLX in certain use cases, and did everything they could to get away from that, it was horrible. Some issues:
Unexpectedly large licensing cost issues.
If the customer data has errors, or your code has errors, the Excel API will often come back with extremely opaque error codes that have been reused for multiple things over the decades, making diagnosis an tremendous chore. By the end it felt like Microsoft never really wanted their office suite to be API-driven, and that's probably true.
XslxWriter was so close to good enough we tried really hard to make that our primary path, eventually succeeding. It was worth it.
Although aspose also strikes a chord, I've since moved on from the team that had to deal with that stuff, but they may have switched again or at least investigated it.
Tl;dr it would certainly be a challenge but I would predict it to have significant frustration and tedium.
posted by traveler_ at 8:00 AM on December 11, 2020
Unexpectedly large licensing cost issues.
If the customer data has errors, or your code has errors, the Excel API will often come back with extremely opaque error codes that have been reused for multiple things over the decades, making diagnosis an tremendous chore. By the end it felt like Microsoft never really wanted their office suite to be API-driven, and that's probably true.
XslxWriter was so close to good enough we tried really hard to make that our primary path, eventually succeeding. It was worth it.
Although aspose also strikes a chord, I've since moved on from the team that had to deal with that stuff, but they may have switched again or at least investigated it.
Tl;dr it would certainly be a challenge but I would predict it to have significant frustration and tedium.
posted by traveler_ at 8:00 AM on December 11, 2020
Why not use an ETL software? If cost is an issue, both Talend and Pentaho have free open-source version, and if you use MS SQL Server in-house, I think SSIS comes more or less free.
posted by snakeling at 9:49 AM on December 11, 2020
posted by snakeling at 9:49 AM on December 11, 2020
Can you create a template spreadsheet in Excel, where everything is dynamically calculated from data sheets within using formulas? Then whenever you need to generate a excel file, you can overwrite with new data, and everything would update automatically when it's opened?
Or if more complex than something that can be done with dynamic formulas (but you can doa lot with them), have VBA in the file that runs on open to set filter values, refresh pivots, etc?
posted by Boobus Tuber at 10:05 AM on December 11, 2020
Or if more complex than something that can be done with dynamic formulas (but you can doa lot with them), have VBA in the file that runs on open to set filter values, refresh pivots, etc?
posted by Boobus Tuber at 10:05 AM on December 11, 2020
When I had to do this a couple jobs ago SSIS was exactly it. I found the whole thing to be a bit more Microsoft-y than I liked for a process that needed to perform the same set of steps for a whole bunch of data sets. I don’t really like MS’s task scheduler and the output formatting took a lot of trial and error, but the end result was about 50 times more maintainable than the hodgepodge it replaced, and it turned out that execution time went way down as a result.
posted by fedward at 10:18 AM on December 11, 2020
posted by fedward at 10:18 AM on December 11, 2020
« Older Things that look like a part but are actually a... | When is the morally right time to disclose an... Newer »
This thread is closed to new comments.
- package structure
- spreadsheet content
posted by crocomancer at 1:19 AM on December 11, 2020