I recently had the pleasure of automating a reporting system for a client that required the use of an existing complex and pre-formatted Excel template with specific page breaks, merged cells, borders, and more. Here I describe how I was able to use Python and Excel together while maintaining the original template structure and formatting.
In this case, the client was using Survey123 for ArcGIS to have field personnel and contractors submit inspection and reading information from an engineered air-sparge system with over 200 air-sparge wells. They had been manually transcribing from field forms prior to this and already had a complex Excel template setup just the way they wanted. Somehow I needed to populate this template with the data from ArcGIS Online after the Survey123 was submitted then e-mail the filled in template and PDF version to the client. Also, the data needed to be inserted into a SQL server for long-term storage and to allow for advanced query operations. All of this seemed easy enough as Python has libraries for AGOL, SQL Server, and Excel. Or so I thought.
So - I get to work writing the Python script that will be used to complete the task. Reading AGOL goes well. Inserting in SQL server and executing stored procedures goes well. Then comes Excel. And it was one of those annoying, almost "I could live with it" issues, except that this was too obvious and too difficult to work-around without losing something else or causing problems even further down the road.
I attempted to use Win32com and xlrd to enter the data into the various cells. That part was fine, however I would lose a lot of the formatting - especially merged cells. No matter what arguments I passed or tried to do, I always lost some part of the formatting in the template.
To solve this issue I used Excel to beat Excel! Instead of writing the data directly to the template I dumped it out into it's own "data" file so that the template was not changed by Python. When I dumped out the data I used specific pre-processing in my SQL and Pandas dataframe to create a column of lookup values. Then in the template I setup a connection to the data file that was set to automatically refresh on open. I then setup formulas in the template cells to retrieve data from the data file (e.g., vlookup or index/match).
Then all Python had to do was open the template file and print to PDF, which Win32com handles very well.
There were also some unexpected benefits of doing it this way. For example, now instead of having to make modifications to the Python script if a cell moved or new data were added, we only needed to modify the template file. This also gave the client a lot more power to decide on changes without worrying they would break the Python script. Win-Win!