Comments on: Quick Data Collection Problem
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem/
Comments on Ask MetaFilter post Quick Data Collection Problem
Tue, 10 Nov 2015 13:52:58 -0800
Tue, 10 Nov 2015 13:56:27 -0800
Question: Quick Data Collection Problem
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem
I have a CSV with one column that contains User IDs. I need to count how many times each User ID appears in the document and output the counts to another CSV that has something like UID1 >> 3; UID4 >> 1; UID20 >> 0. There are thousands of User IDs, so I need to be able to set up a range for it to scan for. Is this possible with free software and not much programming expertise? <br /><br /> Here's a sample of the data:<br>
<br>
The full range of IDs is 1 to 5. The data in the spreadsheet is:<br>
<br>
1<br>
1<br>
1<br>
3<br>
3<br>
5<br>
<br>
I need it to output that there are three 1's, zero 2's, two 3's, zero 4's, and one 5. <br>
<br>
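For the sample above, the zero-filled count could be sketched in Python roughly as follows. This is only an illustration, not something from the thread: the inline `raw_csv` string stands in for the real file, and the `low`/`high` bounds are assumptions taken from the 1 to 5 example.

```python
import csv
import io
from collections import Counter

# Sample data from the question: one ID per row in a single CSV column.
# (Assumption: a string stands in for the real CSV file here.)
raw_csv = "1\n1\n1\n3\n3\n5\n"

# Read the first column of every non-empty row as an integer ID.
ids = [int(row[0]) for row in csv.reader(io.StringIO(raw_csv)) if row]

counts = Counter(ids)

# Walk the whole range so IDs that never appear still get a 0.
# A Counter returns 0 for missing keys instead of raising KeyError.
low, high = 1, 5  # assumed bounds, taken from the example
rows = [(uid, counts[uid]) for uid in range(low, high + 1)]

# Write the result as a two-column CSV (to a string here; a real file works the same way).
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Swapping the two `io.StringIO` objects for files opened with `open(..., newline="")` would read and write real CSVs.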
Even though 4 and 2 don't appear in the document, I still need it to go down the list and fill in '0' for those while counting the other, extant values. This will be applied to lists that are a few thousand entries long.
post:ask.metafilter.com,2015:site.288286
Tue, 10 Nov 2015 13:52:58 -0800
codacorolla
data dataanalysis statistics counts programming datascience resolved
By: codacorolla
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175851
Actually, I just realized that what I'm asking is a little off.<br>
<br>
Instead of counting EVERYTHING in a range, I just want the script to count based on another list. So going back to the previous example,<br>
<br>
Counting Master List: 1, 3, 4<br>
<br>
Data Sheet:<br>
1<br>
1<br>
2<br>
3<br>
3<br>
5<br>
<br>
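A minimal Python sketch of this master-list variant, using the two lists shown just above. The variable names are made up for illustration, and the values are typed in directly rather than read from a file.

```python
from collections import Counter

master_list = [1, 3, 4]          # the "Counting Master List"
data_sheet = [1, 1, 2, 3, 3, 5]  # the "Data Sheet" column

counts = Counter(data_sheet)

# Only IDs on the master list are reported; Counter gives 0 for absent ones,
# and IDs that are not on the master list (2 and 5) are simply never looked up.
result = {uid: counts[uid] for uid in master_list}
print(result)  # {1: 2, 3: 2, 4: 0}
```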
That would count two 1's, two 3's and zero 4's while ignoring the 2 and the 5, since they aren't on the list.
comment:ask.metafilter.com,2015:site.288286-4175851
Tue, 10 Nov 2015 13:56:27 -0800
codacorolla
By: GuyZero
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175857
A pivot table in Excel or even Google Sheets should do this. Open Office Calc does pivot tables too.
comment:ask.metafilter.com,2015:site.288286-4175857
Tue, 10 Nov 2015 14:01:04 -0800
GuyZero
By: GuyZero
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175859
Although a pivot table may not do the filtering you want, it can definitely do counting/summing.
comment:ask.metafilter.com,2015:site.288286-4175859
Tue, 10 Nov 2015 14:01:37 -0800
GuyZero
By: brainmouse
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175862
This is very easy with any spreadsheet program (I know Google Docs will do it, and while I don't know the exact features of some of the downloadable Excel replacements, I assume they all have this feature): just create a pivot table. You want the userid as the row labels, and the count of userid as the values. Change the range of your pivot table so it's the entirety of the columns you care about (probably something like A:A).<br>
<br>
There are a few ways to do part 2. One is to keep your master list as another list of numbers, like:<br>
1<br>
3<br>
4<br>
<br>
Then do a VLOOKUP on your data with a condition for errors, so the second column of your main data sheet would contain (this is Excel formatting, but other programs are similar):<br>
=iferror(vlookup(A1,[range of your master list],1,false),"error")<br>
<br>
That should give you the same value back if it's in your master list, and "error" otherwise. Then you can add that column as a filter to your pivot table and exclude "error".
comment:ask.metafilter.com,2015:site.288286-4175862
Tue, 10 Nov 2015 14:03:25 -0800
brainmouse
By: Mrs. Pterodactyl
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175863
Do you have Excel? If so, this can be done pretty easily! Here is how I would do it -- modify these instructions as you see fit!<br>
<br>
I'm changing this from numbers to animals to make it less confusing, so if you have a list from A1 to A7 that's like:<br>
<br>
Pig<br>
Pig<br>
Cow<br>
Pig<br>
Horse<br>
Cow<br>
Sheep<br>
<br>
I would copy/paste them all into a new column and hit the "remove duplicates" button for that list, so you have them somewhere like C1 to C4 and it looks like:<br>
Pig<br>
Cow<br>
Horse<br>
Sheep<br>
<br>
In the column next to it, use the simple formula =COUNTIF(range, criteria), where the range is the first list and the criteria is the matching cell in the second list (e.g. =COUNTIF(A$1:A$7, C1)). Then you can just drag it down (=COUNTIF(A$1:A$7, C2), etc.) and you have a nice list that looks like:<br>
<br>
Pig 3<br>
Cow 2<br>
Horse 1<br>
Sheep 1<br>
<br>
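For anyone without a spreadsheet handy, the same two steps (remove duplicates, then COUNTIF down the deduplicated column) might look like this in Python. It is a rough equivalent rather than what Excel does internally, and the variable names are invented for the example.

```python
animals = ["Pig", "Pig", "Cow", "Pig", "Horse", "Cow", "Sheep"]  # column A

# "Remove duplicates": dict.fromkeys keeps the first occurrence of each value, in order.
unique = list(dict.fromkeys(animals))  # column C

# The equivalent of COUNTIF(A$1:A$7, C1), dragged down column C.
table = [(name, animals.count(name)) for name in unique]

for name, n in table:
    print(name, n)
```

`list.count` rescans the whole list for every name, which is fine for a few thousand rows. Replacing `unique` with a hand-written master list would also report 0 for names that never appear, just as COUNTIF does.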
If you don't have Excel, I imagine there's an OpenOffice version that will do basically the same thing. Good luck!
comment:ask.metafilter.com,2015:site.288286-4175863
Tue, 10 Nov 2015 14:03:38 -0800
Mrs. Pterodactyl
By: mhoye
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175874
In a unix:<br>
<br>
<code>anglachel:~/tmp> cat data<br>
Pig<br>
Pig<br>
Cow<br>
Pig<br>
Horse<br>
Cow<br>
Sheep<br>
anglachel:~/tmp> cat data | sort | uniq -c | sort -n<br>
1 Horse<br>
1 Sheep<br>
2 Cow<br>
3 Pig<br>
anglachel:~/tmp></code><br>
<br>
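If no Unix shell is available, roughly the same result as the pipeline above can be sketched in Python. The list below is assumed to hold the lines of the `data` file; reading them from disk is left out.

```python
from collections import Counter

lines = ["Pig", "Pig", "Cow", "Pig", "Horse", "Cow", "Sheep"]  # contents of `data`

# Counter does the job of `sort | uniq -c`: one tally per distinct line.
counts = Counter(lines)

# Order by count, then by name, smallest first, to mirror the transcript's final `sort -n`.
ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))

for name, n in ordered:
    print(n, name)
```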
If you've got comma-delimited data there, you can use the <code>cut</code> utility to slice out the field you want (for example, <code>cut -d, -f1 data</code> for the first field).
comment:ask.metafilter.com,2015:site.288286-4175874
Tue, 10 Nov 2015 14:11:41 -0800
mhoye
By: If only I had a penguin...
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4175882
If you're only going to do this once, download a <a href="https://www-01.ibm.com/marketing/iwm/iwmdocs/tnd/data/web/en_US/trialprograms/W110742E06714B29.html">free trial of SPSS</a>. From the menu, choose Data > Aggregate. Set the ID variable as your break variable, check the option that says something like "number of cases", and then type a variable name for the number of cases in the field next to that option. <br>
<br>
Assuming you want a completely new data file with this information in it, choose "create new data file". If you want it merged into your existing data instead, that's the default, and it would look like this:<br>
ID NumTimes<br>
1 3<br>
1 3<br>
1 3<br>
3 2<br>
3 2<br>
5 1<br>
<br>
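The merged layout, where every original row keeps its place and gains its ID's total, can also be sketched outside SPSS. The Python below is only an illustration of that output shape, not of how SPSS's Aggregate works internally, and `ids` is typed in rather than loaded from a data file.

```python
from collections import Counter

ids = [1, 1, 1, 3, 3, 5]  # the ID column, one entry per case

counts = Counter(ids)

# Keep one output row per input row and attach that ID's total to each.
merged = [(uid, counts[uid]) for uid in ids]

print("ID NumTimes")
for uid, num_times in merged:
    print(uid, num_times)
```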
You can also at the same time calculate other aggregate measures if that's of interest. Like let's say your data contains a variable that is each person's hours worked that week (so each case is a person-week), then you can calculate at the same time the average hours worked per week for that ID. Or if you think people did more than one survey and might have listed different satisfaction levels each time you can get the mean or the min or the max for each person or whatever you want.<br>
<br>
PSPP is an open-source alternative to SPSS and probably has similar functions.
comment:ask.metafilter.com,2015:site.288286-4175882
Tue, 10 Nov 2015 14:17:37 -0800
If only I had a penguin...
By: codacorolla
http://ask.metafilter.com/288286/Quick-Data-Collection-Problem#4176018
Mrs. Pterodactyl's method worked like a charm. Thanks!
comment:ask.metafilter.com,2015:site.288286-4176018
Tue, 10 Nov 2015 16:58:05 -0800
codacorolla