Best practices for Excel workbooks?
November 29, 2016 1:18 PM Subscribe
What are your best practices for error checking when creating and maintaining Excel workbooks?
My work involves a lot of Excel workbooks. Some of these are gigantic (the largest I currently use has 80 tabs) - some of you may feel that I am using an inappropriate tool for some of this work, but assume for the moment that I am constrained in my choice of tool. I try my level best to be careful, methodical and consistent when creating these workbooks but errors do creep in. I have come to realize the importance of creating error catchers and routines to trap these errors, because humans are just inherently fallible. What are some of the best practices you use to ensure your Excel workbooks are as error-free as possible? What do you when you inherit a workbook from someone else to check for errors? One of the things I sometimes find hard to do is to switch between forest and trees mode - I might spend a lot of time checking the minutiae of Excel formulas and miss that the input seems entirely unreasonable and warrants a follow-up with the person who provided the data. What are your tips for error-checking at different levels? Your tips for error checking when there is a substantial amount of VBA involved would also be useful - if you did not write the code yourself, VBA can sometimes be black-boxish and hard to decipher, and can make error checking more challenging. I am keeping this question intentionally fairly broad, so please tell me your Excel wisdom even if you feel it does not exactly pertain.
My work involves a lot of Excel workbooks. Some of these are gigantic (the largest I currently use has 80 tabs) - some of you may feel that I am using an inappropriate tool for some of this work, but assume for the moment that I am constrained in my choice of tool. I try my level best to be careful, methodical and consistent when creating these workbooks but errors do creep in. I have come to realize the importance of creating error catchers and routines to trap these errors, because humans are just inherently fallible. What are some of the best practices you use to ensure your Excel workbooks are as error-free as possible? What do you when you inherit a workbook from someone else to check for errors? One of the things I sometimes find hard to do is to switch between forest and trees mode - I might spend a lot of time checking the minutiae of Excel formulas and miss that the input seems entirely unreasonable and warrants a follow-up with the person who provided the data. What are your tips for error-checking at different levels? Your tips for error checking when there is a substantial amount of VBA involved would also be useful - if you did not write the code yourself, VBA can sometimes be black-boxish and hard to decipher, and can make error checking more challenging. I am keeping this question intentionally fairly broad, so please tell me your Excel wisdom even if you feel it does not exactly pertain.
I've become a big fan of using defined ranges (with the tab name in the range name) in conjunction with xxIFS formulas to both allow flexible querying and QAQC.
Eg, I can have the same variable, let's say "sales", across any number of tabs 'a' through 'f' and I'll set up a defined range a.sales, b.sales... and I also keep a standard starting column that contains my typical search parameters (year, fiscal year, month, weekday) for each tab.
Then you can write stuff like =sumifs(a.sales,a.year,2016,a.month,"November") etc. Having a common nomenclature for fields for all tabs means that it's easy to pull from anywhere within the workbook and not have to check back to a given place to check that you know what you're talking about.
posted by Reasonably Everything Happens at 1:55 PM on November 29, 2016 [3 favorites]
Eg, I can have the same variable, let's say "sales", across any number of tabs 'a' through 'f' and I'll set up a defined range a.sales, b.sales... and I also keep a standard starting column that contains my typical search parameters (year, fiscal year, month, weekday) for each tab.
Then you can write stuff like =sumifs(a.sales,a.year,2016,a.month,"November") etc. Having a common nomenclature for fields for all tabs means that it's easy to pull from anywhere within the workbook and not have to check back to a given place to check that you know what you're talking about.
posted by Reasonably Everything Happens at 1:55 PM on November 29, 2016 [3 favorites]
Similar to above you can create named formulas to ensure that you don't accidentally click in a cell and fuck up your enormous formula, and so that long complex formulas are a lot easier to read, and so that you can be sure you're making the exact same formula call across any number of cells and tabs.
learn how here
posted by beerperson at 2:05 PM on November 29, 2016 [3 favorites]
learn how here
posted by beerperson at 2:05 PM on November 29, 2016 [3 favorites]
Also, per ...These are Pants?, I keep the top row of every sheet for a Subtotal formula and the top left cell of every cell as a pull-down list of 1-11.
This cell usually sits as 2 or 9, so I can see the count or sum for each column as I'm going through (I also freeze the top 2 rows ;>).
posted by Reasonably Everything Happens at 2:11 PM on November 29, 2016
This cell usually sits as 2 or 9, so I can see the count or sum for each column as I'm going through (I also freeze the top 2 rows ;>).
posted by Reasonably Everything Happens at 2:11 PM on November 29, 2016
Checking sums and totals is important (things that should sum to 100% is a key one for me). I have all the values in the lower right bar turned on; sum, max, min, count, count nums - so I can highlight a column to do a quick check of whatever it is.
Simplifying the formulas is also helpful; if there's a lot of nested conditional stuff {e.g. =IF(C3<2>
Similarly, using conditional formatting to highlight outliers or potential errors is very helpful. When cleaning data for example, I might have a column that should be zero or positive; with conditional formatting I can fix all the red rows until the zeroes are red.
Above all, stepping back and using a little bit of a mental model to verify that the results are generally what is expected. What are reasonable results? If this input is changed, roughly how much should the output change?2>
posted by Homeboy Trouble at 2:27 PM on November 29, 2016
Simplifying the formulas is also helpful; if there's a lot of nested conditional stuff {e.g. =IF(C3<2>
Similarly, using conditional formatting to highlight outliers or potential errors is very helpful. When cleaning data for example, I might have a column that should be zero or positive; with conditional formatting I can fix all the red rows until the zeroes are red.
Above all, stepping back and using a little bit of a mental model to verify that the results are generally what is expected. What are reasonable results? If this input is changed, roughly how much should the output change?2>
posted by Homeboy Trouble at 2:27 PM on November 29, 2016
Do you have backups? If possible try to use some sort of version control system so you can go back in time as far as you need.
posted by dilaudid at 2:32 PM on November 29, 2016 [3 favorites]
posted by dilaudid at 2:32 PM on November 29, 2016 [3 favorites]
On the low-tech end, I use a lot of colour-coding, especially I I have lots of tabs that look similar eg if I have ten regions to analyse, and four sheets per region, I'll colour the whole of the sheets for Region A yellow, Region B orange, etc. Not exactly a Professional White Background, but it means I always know where I am.
Agree with the above re named ranges and checksums, and especially conditional formatting (more colours!).
You probably already know that INDEX(MATCH) is way more useful and less prone to creating errors that VLOOKUP and HLOOKUP, but I'll mention it anyway because I've met quite advanced Excel users who weren't familiar with it. If you use it in conjunction with named tables it will respond dynamically to being sorted or filtered.
posted by une_heure_pleine at 4:15 PM on November 29, 2016 [1 favorite]
Agree with the above re named ranges and checksums, and especially conditional formatting (more colours!).
You probably already know that INDEX(MATCH) is way more useful and less prone to creating errors that VLOOKUP and HLOOKUP, but I'll mention it anyway because I've met quite advanced Excel users who weren't familiar with it. If you use it in conjunction with named tables it will respond dynamically to being sorted or filtered.
posted by une_heure_pleine at 4:15 PM on November 29, 2016 [1 favorite]
Yes! Never ever ever LOOKUP. INDEXMATCH always.
Also, you can color code the tabs themselves, which helps with organization.
posted by So You're Saying These Are Pants? at 4:21 PM on November 29, 2016 [1 favorite]
Also, you can color code the tabs themselves, which helps with organization.
posted by So You're Saying These Are Pants? at 4:21 PM on November 29, 2016 [1 favorite]
I suggest you start with the EUropean Spreadsheet Risks Interest Group's (EUSPRIG) section of best practices.
Another real great source to learn about how to standardize and implement best practices in your work, is to look at solutions from Excel Modeling competitions (yes they are things, and no I haven't participated ;) ).
I suggest you spend some time reading through the papers and presentations on the EuSpRIG website. Start with this paper from IBM.
If you need sample workbooks, please feel free to memail me.
Happy spreadsheeting!
posted by rippersid at 1:02 AM on November 30, 2016 [2 favorites]
Another real great source to learn about how to standardize and implement best practices in your work, is to look at solutions from Excel Modeling competitions (yes they are things, and no I haven't participated ;) ).
I suggest you spend some time reading through the papers and presentations on the EuSpRIG website. Start with this paper from IBM.
If you need sample workbooks, please feel free to memail me.
Happy spreadsheeting!
posted by rippersid at 1:02 AM on November 30, 2016 [2 favorites]
One that I like to use when inheriting a spreadsheet is to change all the inputs to zero and to check what that does to the model- i.e. does it give a different value/sums/total to what I expected. If it is not as expected then I either don't understand the spreadsheet, or there is an error in the way it is set up. Either way I know I have to dig in to the details more to ensure I know what is going on.
posted by Gratishades at 1:58 AM on November 30, 2016 [1 favorite]
posted by Gratishades at 1:58 AM on November 30, 2016 [1 favorite]
Regarding the general spreadsheet wisdom part of the question: I think the work of Felienne Hermans fits the bill. She's trying to bridge the gap between software engineering best practices and spreadsheet usage with some really nice results.
As for working with VBA, you may find the following series of 10 blog posts useful, starting here: Excel VBA - industrialisation.
posted by kmt at 6:38 AM on November 30, 2016
As for working with VBA, you may find the following series of 10 blog posts useful, starting here: Excel VBA - industrialisation.
posted by kmt at 6:38 AM on November 30, 2016
Maybe obvious, but worth checking off the list: can tabs be consolidated?
Typically moving through a table is easier (especially with filters) than moving across tabs. Often I'll consolidate data by just adding that field. Eg, if you have sales and a tab for each region, then just add a column for region and consolidate.
I typically use different tabs when (1) the data have different sources or (2) they have a different scale (time?) along the vertical. 2c
posted by Reasonably Everything Happens at 7:34 AM on November 30, 2016
Typically moving through a table is easier (especially with filters) than moving across tabs. Often I'll consolidate data by just adding that field. Eg, if you have sales and a tab for each region, then just add a column for region and consolidate.
I typically use different tabs when (1) the data have different sources or (2) they have a different scale (time?) along the vertical. 2c
posted by Reasonably Everything Happens at 7:34 AM on November 30, 2016
Here are the standards from the Spreadsheet Standards Review Board.
posted by rippersid at 12:10 AM on December 1, 2016
posted by rippersid at 12:10 AM on December 1, 2016
« Older Comfortable, good quality, quiet-soled slippers. | Should I try to "grow" feelings for this person or... Newer »
This thread is closed to new comments.
I was doing cash flow analysis, so, for example, column A would always be divided up among columns B - G. Enforce that by checking column sums. Eventually, there were ~50 or so Final columns, the sum of which could be checked against the inputs.
There were also many columns whos values were all over the place but whose sum had to be zero. Every time I had one of those, I would add a check to enforce that and to loudly alert me if something was inconsistent. You can also centralize these alerts by adding a tab for audit/consistency checking (tab 81!!).
posted by So You're Saying These Are Pants? at 1:55 PM on November 29, 2016 [1 favorite]