help me parse this range of data using excel, by number of events per time span...
August 4, 2010 7:37 AM   Subscribe

help me parse this range of data using excel, by number of events per time span...

i've got data like this:

7/23/2010 12:2:40
7/23/2010 12:2:40
7/23/2010 12:2:40
7/23/2010 12:2:40
7/23/2010 12:2:43
7/23/2010 12:2:43
7/23/2010 12:21:22
7/23/2010 12:21:22
7/23/2010 12:23:23
7/23/2010 12:23:23
7/23/2010 12:31:6


which spans a time period of about 12 days. i want to parse it simply to report back there were "X" lines per astral half hour. so from the period 7/23/2010 12:00:00 to 7/23/2010 12:30:00 how many lines were there? (so a report on the above would be 12-12:30 = 10... 12:30 to 13:00 = 1)

i've got A LOT of data across those days (181761 events) but for reasons not worth explaining the tool i have to use is excel 2007....

thoughts? (i'm open to line by line formulas etc... i've no preconceived notions of a solution here but date math always burns me up in excel).

thanks
posted by chasles to Computers & Internet (15 answers total) 1 user marked this as a favorite
 
does xls recognize the data as a date and time? If so this is easy. If not - then yeah...
posted by JPD at 7:56 AM on August 4, 2010


Response by poster: yessir. after much parsiing/concatenating of single columns the above values are right out of excel, recognized as dates...
posted by chasles at 8:00 AM on August 4, 2010


This would get you most of the way there, to hours rather than half-hours (in Excel 2007): Label the column as Dates & Times at the top, then select all of the column and on the insert menu, insert a PivotTable onto a new worksheet. Check the Date & Time box at right (under "Choose fields to add to the report") to add it to the report. Then right-click on the data in the PivotTable and select "Group." The pre-set groupings include date and hour. Select both (deselect any others) and click OK.
posted by jocelmeow at 8:13 AM on August 4, 2010


Pivot table's probably the cleanest way to do it. Other option:

Create a new sheet.
In A1 put: "Count in half hour ended..."
In A2 put the earliest date/time you have, eg: 7/23/2010 12:00:00
In A3 put: =A2+TIME(0,30,0)
Then drag that formula down until the last date/time. This will fill in A with half-hour increments.

Then in B2 put: =COUNTIFS(data_range,"<>" & A1)
"data_range" is simply where the full data is, eg. "$A$1:$A181761".

And drag that down all the way. It basically says, count it if it's earlier than this time, but later than the previous time (eg. within the half hour ending at the time in column A of this row).
posted by inigo2 at 8:29 AM on August 4, 2010


To do it to half hours, I'd put a second column next to the data called "Rounded"; I'd then fill it with =rounddown(A2*48,0)/48. This takes advantage of the dates-are-numbers thing (each day is 1), so multiplying them by 48 converts it to half hours; then rounds down to the nearest half hour, then converts it back to the original date/time, but rounded. (If you use round rather than rounddown, you'll get 1:15:00-1:44:59 in a bin, and I'm guessing you want 1:00:00-1:29:59, which will all be rounded to 1:00:00.) A Pivot Table can finish you off from there.
posted by Homeboy Trouble at 8:33 AM on August 4, 2010


Best answer: This calls for a little array function dark magic.

(DISCLAIMER: The following appears to work for my simplistic test, use at your own risk)

Put your huge list of values in column A, say reaching down to Annn where nnn is some big number. Set up a list of bin intervals in columns B and C. Bin intervals can be arbitrary, not just half-hours. Your spreadsheet may look something like this:
A                                      B                                       C
7/23/2010 12:2:40          7/3/2010 12:00                7/3/2010 12:30
7/23/2010 12:2:40          7/3/2010 12:30                7/3/2010 13:00
7/23/2010 12:2:40          7/3/2010 13:00                7/3/2010 13:30
7/23/2010 12:2:40 
7/23/2010 12:2:43 
7/23/2010 12:2:43 
7/23/2010 12:21:22 
7/23/2010 12:21:22 
7/23/2010 12:23:23 
7/23/2010 12:23:23 
7/23/2010 12:31:6
Now put this in cell D1:
 =SUM(IF(A$1:A$nnn,>=B1,IF(A$1:A$nnn,<C1,1,0),0))
When you are done typing, press Ctrl+Shift+Enter, not just enter - this tells excel this is an array formula that must be iterated over the entire range. If you do it right, clicking on D1 should show you the formula in curly braces. Drag down along your bin intervals and you are done.

Dark forces apparently already at work
posted by Dr Dracator at 8:42 AM on August 4, 2010


Response by poster: @ jocelmeow i've never really used pivto tables > when you say right click on the data in the pivot table...... includes date and hour....

i'm lost. i tried selecting all the values, i tried using filters instead of grouping etc etc. i can't seem to draw a line from your description to what i need to do but in essence i don't see any method for automatically grouping by hour for example...


thanks so much for your help.
posted by chasles at 8:48 AM on August 4, 2010


Oh boo -- just realized my less thans and greaters thans got borked.

....in B2 put:
=COUNTIFS($A$1:$A181761,"<>" & A13)

posted by inigo2 at 8:54 AM on August 4, 2010


Ok, last try...

=COUNTIFS($A$1:$A181761,"<" & A2, $A$1:$A181761,">" & A1)
posted by inigo2 at 8:56 AM on August 4, 2010


Here's a quick video on grouping. I'll admit, f you haven't used PivotTables before, that feature might not be the the easiest place to start. One thing that video reminded me of that I'd forgotten about, though, is that you can also group manually. You've got good ideas from others here, too, though.
posted by jocelmeow at 10:10 AM on August 4, 2010


Response by poster: @dr Dracator

this is where i had started but it was kicking my a$$. there seems to be an issue in your formular for D1:

=SUM(IF(A$1:A$nnn,>=B1,IF(A$1:A$nnn,
where you've got commas after A$1:A$nnn meaning there is no logical comparison being made so excel bitches. i've monkyed around but no joy so far, i cant coax it into giving me a non zero result. i think this is the answer though so i will keep after it, and post back!

thanks all, i love askme.


-c

posted by chasles at 10:38 AM on August 4, 2010


Create a histogram. It will generate a nice graph and raw data of the # of instances within each bin. You need to define your bins of 30 minutes.
posted by Diddly at 10:41 AM on August 4, 2010


Response by poster: honestly, pivot tables and histograms look to be a perfectly elegant solution, but the array formulas are right in my bread basket and worked perfectly. thanks everybody!!!
posted by chasles at 11:15 AM on August 4, 2010


Hey, I had no idea about this. It's a better solution, nested IFs are a bitch to debug - use it and save yourself the potential headache.
posted by Dr Dracator at 11:48 PM on August 4, 2010


Dr Dracator, just be aware the ability to do COUNTIFS (with multiple conditions) was new with 2007; 2003 and earlier only had the singular COUNTIF.
posted by inigo2 at 4:00 AM on August 5, 2010


« Older How to be a great first time TA?   |   What to say the last day of a break up when... Newer »
This thread is closed to new comments.