# Please help a data analysis amateur!

July 6, 2010 10:03 AM Subscribe

How to sort to count yes/no answers.... In Access or Excel. Help!

I took a class on Access and Excel about 5 years ago, but itâ€™s all hazy now and I need help solving a very specific problem. I need to sort data by Customer to figure out the accuracy of a test (False positives, false negatives, correct positives, and correct negatives). Many customers tested negative, and were negative. Some tested positive and were positive. But the most important thing (and the thing I am counting) is when customers tested positive and actually weren't, or when they tested negative and actually were. I am counting the number of times this happens, even if that was not the first occurrence.

There are over 6,000 entries for about 1,100 customers. Please help me not have to do this by hand! Please ask me if you need clarification. I am 100% confused by how to proceed.

Due to confidentiality, I cannot post the original document. I am having trouble formatting this chart, but here is an example of what I am trying to accomplish:

Date Customer # Tested Result (Pos/Neg) Actual Result (Pos/Neg)

3/29/2010 Dancer Neg Neg

3/29/2010 Dancer Neg Neg

3/30/2010 Dancer Neg Pos

5/29/2010 Donner Pos Neg

5/29/2010 Prancer Pos Neg

5/30/2010 Prancer Pos Pos

5/31/2010 Prancer Neg Neg

5/29/2010 Cupid Pos Pos

5/30/2010 Cupid Pos Pos

5/31/2010 Cupid Pos Neg

5/29/2010 Dasher Neg Neg

5/30/2010 Dasher Neg Neg

5/26/2010 Vixen Neg Neg

5/27/2010 Vixen Neg Pos

5/28/2010 Vixen Pos Pos

5/28/2010 Vixen Neg Neg

Tested Result (Pos) Tested Result (Neg)

Actual Result Pos Prancer Dancer, Vixen

Actual Result Neg Cupid, Donner Dasher

I took a class on Access and Excel about 5 years ago, but itâ€™s all hazy now and I need help solving a very specific problem. I need to sort data by Customer to figure out the accuracy of a test (False positives, false negatives, correct positives, and correct negatives). Many customers tested negative, and were negative. Some tested positive and were positive. But the most important thing (and the thing I am counting) is when customers tested positive and actually weren't, or when they tested negative and actually were. I am counting the number of times this happens, even if that was not the first occurrence.

There are over 6,000 entries for about 1,100 customers. Please help me not have to do this by hand! Please ask me if you need clarification. I am 100% confused by how to proceed.

Due to confidentiality, I cannot post the original document. I am having trouble formatting this chart, but here is an example of what I am trying to accomplish:

Date Customer # Tested Result (Pos/Neg) Actual Result (Pos/Neg)

3/29/2010 Dancer Neg Neg

3/29/2010 Dancer Neg Neg

3/30/2010 Dancer Neg Pos

5/29/2010 Donner Pos Neg

5/29/2010 Prancer Pos Neg

5/30/2010 Prancer Pos Pos

5/31/2010 Prancer Neg Neg

5/29/2010 Cupid Pos Pos

5/30/2010 Cupid Pos Pos

5/31/2010 Cupid Pos Neg

5/29/2010 Dasher Neg Neg

5/30/2010 Dasher Neg Neg

5/26/2010 Vixen Neg Neg

5/27/2010 Vixen Neg Pos

5/28/2010 Vixen Pos Pos

5/28/2010 Vixen Neg Neg

Tested Result (Pos) Tested Result (Neg)

Actual Result Pos Prancer Dancer, Vixen

Actual Result Neg Cupid, Donner Dasher

Best answer: You could add an aditional column in which you test, using the IF function if the two Pos/Neg columns have different values:

if(C1=D1,0,1)

This will give a 0 for N/N or P/P and a one for P/N or N/P.

And then, using the SUM formula you can add all the 1s and 0s to get the number of P/N and N/P cases.

I hope the is (somewhat) clear.

Or, on preview, use Llama's method.....

posted by PaulZ at 10:15 AM on July 6, 2010

if(C1=D1,0,1)

This will give a 0 for N/N or P/P and a one for P/N or N/P.

And then, using the SUM formula you can add all the 1s and 0s to get the number of P/N and N/P cases.

I hope the is (somewhat) clear.

Or, on preview, use Llama's method.....

posted by PaulZ at 10:15 AM on July 6, 2010

You can pretty easily count the number of positives or negatives using the COUNTIF function.

You could count false positives by making a fifth column to the right and doing something like this: =IF(A3=A4, 0, 1)

That would put a 1 in the fifth column whenever there was a false test. Sum the fifth column and you have your total false tests. Note that this depends on well-formed data; if you have a mixture of POS and Pos and pos and Positive and so on you'd have to fix that first.

posted by PercussivePaul at 10:16 AM on July 6, 2010

You could count false positives by making a fifth column to the right and doing something like this: =IF(A3=A4, 0, 1)

That would put a 1 in the fifth column whenever there was a false test. Sum the fifth column and you have your total false tests. Note that this depends on well-formed data; if you have a mixture of POS and Pos and pos and Positive and so on you'd have to fix that first.

posted by PercussivePaul at 10:16 AM on July 6, 2010

yeah, countif

=countif (D:D, "Neg") -- would count the neg marks

=countif (D:D, "Pos") -- would count the pos marks

posted by Brent Parker at 10:16 AM on July 6, 2010

=countif (D:D, "Neg") -- would count the neg marks

=countif (D:D, "Pos") -- would count the pos marks

posted by Brent Parker at 10:16 AM on July 6, 2010

If you have Excel 2007, you can use the "Countifs" formula like so:

B:B refers to the letter of the "tested result" column and C:C is the letter of the "actual result" column. What this formula does is count up only those instances when both the B result is "neg" and the C result is "pos". Then you can have one of these formulas for each combination you want.

This will only work if you have Excel 2007.

posted by amethysts at 10:16 AM on July 6, 2010

`=COUNTIFS(B:B,"neg",C:C,"pos")`

B:B refers to the letter of the "tested result" column and C:C is the letter of the "actual result" column. What this formula does is count up only those instances when both the B result is "neg" and the C result is "pos". Then you can have one of these formulas for each combination you want.

This will only work if you have Excel 2007.

posted by amethysts at 10:16 AM on July 6, 2010

In column E, row 2, put this formula: C2=D2 and copy the formula down the entire column.

Any row that says "TRUE" was accurate, any that says False was either a false positive or negative. Copy the whole column and then Paste over it using Paste Special... and selecting Values. This removes the formula and makes the cell actually say "TRUE" or "FALSE".

Now, turn on Auto Filter. Filter column E for FALSE. Let's say that row 5 is the first row in your newly filtered data. Put this in cell F5: =C5. Turn off Auto Filter, copy and past special... values all of column F.

Now, to count the Neg and Pos values, =COUNTIF(F:F,"Neg") in one cell in column G and =COUNTIF(F:F,"Pos") in another.

posted by soelo at 10:23 AM on July 6, 2010

Any row that says "TRUE" was accurate, any that says False was either a false positive or negative. Copy the whole column and then Paste over it using Paste Special... and selecting Values. This removes the formula and makes the cell actually say "TRUE" or "FALSE".

Now, turn on Auto Filter. Filter column E for FALSE. Let's say that row 5 is the first row in your newly filtered data. Put this in cell F5: =C5. Turn off Auto Filter, copy and past special... values all of column F.

Now, to count the Neg and Pos values, =COUNTIF(F:F,"Neg") in one cell in column G and =COUNTIF(F:F,"Pos") in another.

posted by soelo at 10:23 AM on July 6, 2010

After this line"Put this in cell F5: =C5.", my directions should say to copy that formula down the whole column before turning off Auto Filter.

posted by soelo at 10:25 AM on July 6, 2010

posted by soelo at 10:25 AM on July 6, 2010

Response by poster: The problem is I only want to count one record per customer, e.g. 1 false positve for Cupid, even if a false positive occurred twice for him/her. Am I asking too much?

posted by BusyBusyBusy at 10:26 AM on July 6, 2010

posted by BusyBusyBusy at 10:26 AM on July 6, 2010

You need a pivot table.

posted by dfriedman at 10:28 AM on July 6, 2010 [1 favorite]

posted by dfriedman at 10:28 AM on July 6, 2010 [1 favorite]

So does the question boil down to "which customers have at least one false positive or false negative?"

posted by artlung at 10:57 AM on July 6, 2010

posted by artlung at 10:57 AM on July 6, 2010

Try this:

=IF(C1="pos",(IF(D1="neg","False positive " &B1,""))," ")

That will tell you whether a false positive exists at all and will include the name. Then you can do a filter for unique values and it will give you a list of all of the times where this occurred showing only one instance for each, and you can just see how many rows you've wound up with and there's your number (% of the original spreadsheet).

(Sorry if I'm not understanding.)

You might have to 'select visible cells' and paste onto a clean spreadsheet to get the number of rows.

There's probably ten easier ways to do this...

posted by A Terrible Llama at 11:03 AM on July 6, 2010

=IF(C1="pos",(IF(D1="neg","False positive " &B1,""))," ")

That will tell you whether a false positive exists at all and will include the name. Then you can do a filter for unique values and it will give you a list of all of the times where this occurred showing only one instance for each, and you can just see how many rows you've wound up with and there's your number (% of the original spreadsheet).

(Sorry if I'm not understanding.)

You might have to 'select visible cells' and paste onto a clean spreadsheet to get the number of rows.

There's probably ten easier ways to do this...

posted by A Terrible Llama at 11:03 AM on July 6, 2010

Is this in access? because it's a pretty easy one in SQL:

SELECT COUNT(DISTINCT cust_id) WHERE test_result != actual_result

posted by pwnguin at 1:51 PM on July 6, 2010 [1 favorite]

SELECT COUNT(DISTINCT cust_id) WHERE test_result != actual_result

posted by pwnguin at 1:51 PM on July 6, 2010 [1 favorite]

Also, if you're unfamiliar with SQL in general and do these sorts of queries regularly, it could be worth your time to run through SQLzoo.

posted by pwnguin at 1:53 PM on July 6, 2010

posted by pwnguin at 1:53 PM on July 6, 2010

Best answer: dfriedman is unfortunately correct; if you want to do this in pure Excel

There

1. Add a column (e) to print only names on records with false negatives:

2. Add a column (f) to test whether each false negative is the

3. Autofill down (ie copy and paste into the whole column below, or drag the corner downward) to fill the rest of each column. At the bottom of column f, just count the number of unique false positives:

... where x is the last row before that calculation.

Pretty basic. And kind of sloppy. You have to just put the names in column e, rows 1 and 2, by hand based on whether they're the first false negatives or not. But this solution should only take a few seconds.

posted by koeselitz at 2:52 PM on July 6, 2010

*cleanly*, you should probably use a pivot table. But pivot tables are pointlessly complex, and if you're going to learn something pointlessly complex, you may as us Access.There

*is*a fast-and-dirty way to do this in pure Excel, if you want; but you have to add two columns. It should be easy, since you can just paste a formula into those columns. (You'll have to start on row 3, and do 1 and 2 separately, although that allows you to do an infinite number of rows after that; yeah, it's a little sloppy.)1. Add a column (e) to print only names on records with false negatives:

`=if(and(c3="Pos",d3="Neg"),b3,"")`2. Add a column (f) to test whether each false negative is the

*first*time a given name has had a false positive:`=if(countif(e$1:e2,e3)=0,"unique_false_positive","")`3. Autofill down (ie copy and paste into the whole column below, or drag the corner downward) to fill the rest of each column. At the bottom of column f, just count the number of unique false positives:

`=countif(f1:fx,"unique_false_positive")`... where x is the last row before that calculation.

Pretty basic. And kind of sloppy. You have to just put the names in column e, rows 1 and 2, by hand based on whether they're the first false negatives or not. But this solution should only take a few seconds.

posted by koeselitz at 2:52 PM on July 6, 2010

(The reason you'd have to use a pivot table for this in Excel, by the way, is because the =countif function accepts one

posted by koeselitz at 2:56 PM on July 6, 2010

*and only one*condition. This limitation is sort of notorious, as I recall. In your situation, it's why you can't countif to test whether a name is the first on the list*and*a false positive. You just have to add columns for those conditions.)posted by koeselitz at 2:56 PM on July 6, 2010

Response by poster: You guys are the best. Artlung has it right that I am essentially asking for which customers and false positives and negatives. I'll work on this again tomorrow morning, and I'm eager to try out these suggestions. Thanks everyone!

posted by BusyBusyBusy at 6:35 PM on July 6, 2010

posted by BusyBusyBusy at 6:35 PM on July 6, 2010

This thread is closed to new comments.

=countif(d:d, b1)

Count everything in column D that matches b1.

posted by A Terrible Llama at 10:11 AM on July 6, 2010