February 12, 2010 11:27 AM Subscribe

Excel Filter: Say I have a spreadsheet with two types of data. a) list of transactions with date/time + transaction, and b) days with traffic. How do I reconcile the two sets of data swiftly & easily?

I have two types of data. One is a line-by-line list of transactions and the exact date/time they occurred on - one transaction per line, so one date can span across multiple rows. The other is traffic data - visits per day - one day per line.

Normally I reconcile these by going through the tedious task of tallying the transactions into a # of transactions per day spreadsheet. This involves lots of IF statements and adding things together, then removing all the rows that aren't the tally for that date & then adding in any dates that have zero transactions. It's all rather manual. Since I generate this report once a week, I'd much rather paste the data in & allow the formulas to do the matching for me.

Example (using commas to represent colums)

sale date, transaction ID (SKU), traffic date, traffic

1/1/2010, 1, 1/1/2010, 500

1/1/2010, 1, 1/2/2010, 700

1/1/2010, 1, 1/3/2010, 900

1/2/2010, 1, 1/4/2010, 850

etc. - as you can see there are 3 transactions on the first day and it takes up 3 lines, but the traffic stats are one day per line. I normally create an extra column with this formula [ =IF(A2=A1,C1+1,1) ] to tally each day's transactions and copy/paste into notepad & back into EXCEL to turn the formula values into hard numbers, and then delete the lower numbered duplicate date lines.

Is there a way to have Excel just reconcile these two bits of data? If this is possible, I'd imagine the logic would be something like this:

Column A = transaction date

Column B = transaction ID (what was sold)

Column C = tally of total products sold that day (if A2<A1, then C1+B2, else 1 - or something like this, a formula like this will keep adding +1 each time a transaction occurs, but reset at the start of a new day)

Column D = date

Column E = # of visitors on that date

Column F = the mythical column where my new formula appears that shows the # of transactions on date

Column F would have something like:

"show me the cell in column C that corresponds to the cell in column A that has the highest date that's equal to or less than the date in D (and if there is no date equal to D, then show zero)"

or

"show me the highest cell in column C that has a date in column A that corresponds to the date in column D, otherwise (if there is no equal date), show zero."

or, eliminating the need for column C

"Tally all cells in column B where the date in column A matches the date in column D, otherwise show 0."

Or am I just stuck doing this by hand?
posted by Muffy to Computers & Internet (8 answers total) 1 user marked this as a favorite

I have two types of data. One is a line-by-line list of transactions and the exact date/time they occurred on - one transaction per line, so one date can span across multiple rows. The other is traffic data - visits per day - one day per line.

Normally I reconcile these by going through the tedious task of tallying the transactions into a # of transactions per day spreadsheet. This involves lots of IF statements and adding things together, then removing all the rows that aren't the tally for that date & then adding in any dates that have zero transactions. It's all rather manual. Since I generate this report once a week, I'd much rather paste the data in & allow the formulas to do the matching for me.

Example (using commas to represent colums)

sale date, transaction ID (SKU), traffic date, traffic

1/1/2010, 1, 1/1/2010, 500

1/1/2010, 1, 1/2/2010, 700

1/1/2010, 1, 1/3/2010, 900

1/2/2010, 1, 1/4/2010, 850

etc. - as you can see there are 3 transactions on the first day and it takes up 3 lines, but the traffic stats are one day per line. I normally create an extra column with this formula [ =IF(A2=A1,C1+1,1) ] to tally each day's transactions and copy/paste into notepad & back into EXCEL to turn the formula values into hard numbers, and then delete the lower numbered duplicate date lines.

Is there a way to have Excel just reconcile these two bits of data? If this is possible, I'd imagine the logic would be something like this:

Column A = transaction date

Column B = transaction ID (what was sold)

Column C = tally of total products sold that day (if A2<A1, then C1+B2, else 1 - or something like this, a formula like this will keep adding +1 each time a transaction occurs, but reset at the start of a new day)

Column D = date

Column E = # of visitors on that date

Column F = the mythical column where my new formula appears that shows the # of transactions on date

Column F would have something like:

"show me the cell in column C that corresponds to the cell in column A that has the highest date that's equal to or less than the date in D (and if there is no date equal to D, then show zero)"

or

"show me the highest cell in column C that has a date in column A that corresponds to the date in column D, otherwise (if there is no equal date), show zero."

or, eliminating the need for column C

"Tally all cells in column B where the date in column A matches the date in column D, otherwise show 0."

Or am I just stuck doing this by hand?

I'm a little bit confused about what you're trying to accomplish, but could this be done using pivot tables? Row labels are dates, and you could have a column of count of date or sum of traffic or whatever you want.

posted by brainmouse at 11:48 AM on February 12, 2010

posted by brainmouse at 11:48 AM on February 12, 2010

Ok, let's assume the following: In column A you have transaction dates, column B you have SKUs. In column D you have traffic dates, column E you have traffic data. So, columns A and B correspond to your first set of transaction data, and columns D and E correspond to your second set of traffic data.

Populate column G with the dates that you're interested in (let's say that row 1 is headers and the dates start in G2 and go down to G9999 or whatever). In cell H2, do this:

If I've understood your problem correctly, what you should get now is column H is the number of transactions per date and column I is the total traffic per date.

posted by mhum at 12:04 PM on February 12, 2010

Populate column G with the dates that you're interested in (let's say that row 1 is headers and the dates start in G2 and go down to G9999 or whatever). In cell H2, do this:

=COUNTIF(A$2:A$9999, G2)Copy this formula all down column H. In cell I2, do this:

=SUMIF(D$2:D$9999, G2, E$2:E$9999)Copy this formula all down column I. You can adjust the ranges (e.g.: A$2:A$9999) as needed.

If I've understood your problem correctly, what you should get now is column H is the number of transactions per date and column I is the total traffic per date.

posted by mhum at 12:04 PM on February 12, 2010

To find out the # of transactions per day, you definitely want to use pivot tables.

Is your transaction data in a separate spreadsheet than the traffic? If not, separate them. Make sure you have a first row with words saying what's in each column (like 'transaction date'). Then click anywhere in your transaction data and insert a Pivot table (what version of Excel are you using, I could be more specific?)

For the pivot table, you want 'transaction date' to be in the 'rows' area and your other column to be in 'values'. This will give you the count of transactions per day.

posted by beyond_pink at 12:09 PM on February 12, 2010

Is your transaction data in a separate spreadsheet than the traffic? If not, separate them. Make sure you have a first row with words saying what's in each column (like 'transaction date'). Then click anywhere in your transaction data and insert a Pivot table (what version of Excel are you using, I could be more specific?)

For the pivot table, you want 'transaction date' to be in the 'rows' area and your other column to be in 'values'. This will give you the count of transactions per day.

posted by beyond_pink at 12:09 PM on February 12, 2010

mhum - that did it, thanks. I had to do a tiny bit of adjusting because the dates in column A included the time and were therefore not equal to column G (so I just got rid of the time).

Thank you Metafilter - another mystery of Excel solved.

Here's a screenshot of the result, for as long as imageshack keeps hosting it.

posted by Muffy at 1:17 PM on February 12, 2010

Thank you Metafilter - another mystery of Excel solved.

Here's a screenshot of the result, for as long as imageshack keeps hosting it.

posted by Muffy at 1:17 PM on February 12, 2010

You can also strip off the times within Excel this way:

=DATE(YEAR(A2),MONTH(A2),DAY(A2))as long as Excel recognizes A2 as a valid date/time.

posted by mhum at 1:37 PM on February 12, 2010

Just wanted to say thanks again - I'm using this to create my first weekly report with up-to-date data, rather than having the data slowly go out of data until I do all the manual calculations again. It's made my life much easier - just export from the database & plug it in to excel, tweak one or two things & I'm done.

posted by Muffy at 8:52 AM on February 18, 2010

posted by Muffy at 8:52 AM on February 18, 2010

This thread is closed to new comments.

If you're doing what I think you're doing, I suspect the magic words you'll need to learn are SUMIF and/or COUNTIF.

posted by mhum at 11:44 AM on February 12, 2010