Google Sheets: counts with multiply criteria within the same cells
June 2, 2017 9:55 AM   Subscribe

my boss has asked for possibly the impossible.... I am trying to get a count of cells but need to weed out multiply criteria. The basic setup of each cell is "model-serialnumber notes" with not all the cells having the notes

Name
200Q-6710
120W-6713
327DFI/52-6447
120/52-6612
235/61-6600
425/61-6712
325D/61-6500


I am trying to get a count of cells but need to weed out multiply criteria.
The basic setup of each cell is "model-serialnumber notes" with not all the cells having the notes


I want to count any model (the first number) starting with 2,3, or 4 and has a serial number (is after a "-") between 6601 and 6714 but not count anything with a "W". So the count of the above example would be only be 2 (200Q-6710 and 425/61-6712)

I have figured out how to use "countif" and get numbers for the models and the "W"s. I can't figure out how to count between two numbers within the middle of a text. also I am having a little trouble NOT double counting things.
posted by aetg to Computers & Internet (11 answers total)
 
Perhaps you can make new columns with the model, serial number and notes shown separately, using a regular expression function? That would make your task much more straightforward.
posted by demiurge at 10:43 AM on June 2, 2017


Yeah, your first problem is that you have three unique data keys stored in the same cell. I would first work to get them separated out into three different cells and then it should be fairly straightforward to use the COUNTIFS function (since you have multiple criteria, COUNTIF isn't going to work well).
posted by Automocar at 11:01 AM on June 2, 2017


Best answer: Getting them separated into 3 cells isn't very hard. You can do it using RegEx for sure (and the formulas would be shorter) but I'm going to do it with standard Excel/Google Docs functions:

I'm assuming your original cell is A2 (because your column headers are in row 1: raw, model, serial number, notes)
in B2, put:
=left (A2,find("-",A2) - 1)
in C2, put:
=value(mid(A2,find("-",A2) + 1,4))
(this assumes that the serial numbers are always 4 digits - if they're not let me know, and I can fix this one)

in D2, put:
=iferror(right(A2,len(A2)-find(" ",A2)))
Then you can do a (still kind of complicated) countifs:
=countifs(B:B,"2*",B:B,"<>*W*",C:C,">=6601",C:C,"<=6714") 
+ countifs(B:B,"3*",B:B,"<>*W*",C:C,">=6601",C:C,"<=6714") 
+ countifs(B:B,"4*",B:B,"<>*W*",C:C,">=6601",C:C,"<=6714") 
(there are some ways to make that an array so you don't have to add, but it's not worth going into here, and this will eliminate any double counting).

Let me know if you have any questions!
posted by brainmouse at 11:27 AM on June 2, 2017 [2 favorites]


Response by poster: I can't really change the existing format since I am mining information out of a spreadsheet used across the company.

I am however thinking of copying out all the columns I need out of the dozen or so spreadsheets into a single one. basically a sheet with raw data.


brainmouse
the serial numbers are not 4 digits but always the same length. I don't really want to say how long since it a company thing and we are protective of our information.

I assume I would change "=value(mid(A2,find("-",A2) + 1,4))" to the actual number of digits

Do blank cells create problems?
posted by aetg at 12:40 PM on June 2, 2017


You can always create new columns and then hide them, if that's allowed (nobody will be able to see them but you'll know they're there and be able to use them), but a master "raw data" sheet is also a good/valid way to go.

You're correct on changing that 4 to however many digits it is. If it varied, we could do some extra things to find the distance between the "-" and the space or the end, but if it's always the same length it's a lot easier to just change that 4.

Blank cells will not create problems. The "iferror" part on the notes one is so that you get blanks instead of errors when there are no notes, and you can add that to the other 2 also if there are weirdnesses around some cells not having a "-" (which the first 2 formulas rely on), and it shouldn't make any difference at all.
posted by brainmouse at 12:47 PM on June 2, 2017


This seems totally possible with Sheets' regular expression functions. If you're not familiar with RegEx, they are for parsing text, are incredibly powerful and also somewhat difficult to wrap your head around (I am very bad at them).

Here's a decent place to begin, with a focus on the three Google Sheets regex functions.

Matching numeric ranges with regex is not the easiest, but it can totally be done.

You can use a tool like this to test your regex.
posted by wemayfreeze at 1:41 PM on June 2, 2017


Best answer: This can be done with a single logical test formula, though it ends up being a bit of a nested monstrosity. Assuming data in column A, this will give you TRUE / FALSE for all cells for the criteria you outline:

=AND(VALUE(LEFT(A2, 1))>=2, VALUE(LEFT(A2, 1))<=4, ISERROR(FIND("W", A2)), VALUE(RIGHT(A2, 4))>=6601, VALUE(RIGHT(A2, 4))<=6714)

You can then just count the matches in that column:

=COUNTIF(B:B, "TRUE")

To unpack the long formula, it's making 5 logical tests, all of which must pass to return TRUE out of the AND formula:

Test 1: Is the VALUE of digit 1 from the LEFT end of cell A2 greater than or equal to 2?
Test 2: Is the VALUE of digit 1 from the LEFT end of cell A2 less than or equal to 4?
Test 3: Is there an ERROR when trying to FIND the value W in cell A2?
Test 4: Is the VALUE of digits 1 to 4 from the RIGHT end of cell A2 greater than or equal to 6601?
Test 5: Is the VALUE of digits 1 to 4 from the RIGHT end of cell A2 less than or equal to 6714?

As above, you will need to change the 4 in the RIGHT formulae to the length of your serial string.

VALUE is required to convert the characters pulled from mixed data to numeric values, otherwise the greater / less than tests won't work.

FIND will return the position of the starting character of a string within text as a number, so here e.g. 120W-6713 returns 4. If the string isn't found it returns a #VALUE! error; checking for this with ISERROR allows the neccessary TRUE (i.e. "Yes, there is an error which means no W character found here") to pass through the sequence.

I actually wrote this as half a dozen smaller separate formulae columns first, one to extract the first digit, one to check for presence of W, one to extract the serial string, and three to test each of these separately. This is how I'd get this done 99% of the time at work, ideally using Excel to then filter select all the trues from the last 3 columns. I'd only go back and nest it all together if it had to be done for one of those Sheets That Must Not Be Altered ;)
posted by protorp at 1:49 PM on June 2, 2017 [1 favorite]


People are pointing you in the right direction but if this information is important to the company you really should be doing things differently. If you have a dozen sheets I assume there is a lot of data: hundreds rather than tens of rows. If so, this isn't even a job for Excel let alone Google sheets and it will, sooner or later, blow up in your face. So, beyond the immediate question also consider maintainability, mission creep (as here), data validation, access control and backups.

To get directly back on topic, I would suggest working on copies of the sheets rather than on the production sheets until you know it is all bulletproof, fully documented and tested.
posted by epo at 3:19 AM on June 3, 2017


I am however thinking of copying out all the columns I need out of the dozen or so spreadsheets into a single one. basically a sheet with raw data.

This is the right way to go about it, in my opinion, if you're stuck with using this particular tool.

Have an entire subsheet with its own tab, whose contents you are always completely free to erase and overwrite with pasted copies of your original data, in whatever format it's supplied in.

On a separate sheet within the same document, add some formulas that parse and check and clean up anything non-blank found on the first sheet, breaking any column containing multiple embedded values (like model code and serial number) into separate single-valued columns.

Use conditional formatting on that second sheet to put a red background on any row you can't make sense of. That way, you will get instant feedback when you paste stuff onto the first sheet that lets you know when you're dealing with data whose format quirks you haven't seen before.

On other sheets within that same document, add your analysis stuff, which will all be based off the pre-parsed columns from the second sheet. Take no notice of your analysis results if any red cells show up on the cleanup sheet.

Data parsing and cleanup is a whole can of worms on its own. Having a whole subsheet devoted purely to dealing with that part should let you simplify your analysis stuff enough to get it both understandable and reliable.
posted by flabdablet at 8:45 AM on June 3, 2017


Matching numeric ranges with regex is not the easiest, but it can totally be done.

Perceiving a need to write regexes to match numeric ranges is pretty strong code smell, and usually means that you should have broken your data up better beforehand.

Regexes are a good tool for recognizing and extracting digit-strings from whatever they're embedded in, but trying to use regexes to do tests on the numeric values of such extracted digit-strings is definitely doing things the hard way. Just dump each extracted digit-string in its own cell, and let Sheets turn it into an actual number for you.
posted by flabdablet at 8:54 AM on June 3, 2017 [1 favorite]


Response by poster: Protorp - I left out something that screwed with using your method. sometimes the serial string end with a letter but not ever time. I couldn't figure out how to modify your rather simple method to work around that.

Brainmouse seems to do the trick. At least I gave a realistic number to my bosses.

Thanks everyone!
posted by aetg at 1:56 PM on June 5, 2017


« Older What's the brand of your durable urban shopping...   |   Airbnb for the disabled? Newer »
This thread is closed to new comments.