Excel function for consolidating a bunch of rows of nearly identical data?
May 3, 2012 2:01 PM   Subscribe

I need help sorting out a bunch of data in Excel where the row data is almost identical except for one column that I need to retain.

I have a set of data like this in an Excel file: (the / indicates a column)

1234-2345 / ABC / Title1 / Date1 / Date2
1234-2345 / DEF / Title1 / Date1 / Date2
1234-2345 / GHI / Title1 / Date1 / Date2
1234-2345 / LMN / Title1 / Date1 / Date2
1234-2345 / XYZ / Title1 / Date1 / Date2

So basically every row may have another row that is almost identical EXCEPT for the 2nd column which is going to be different. There are many thousands of sets of these.

What I'd like to end up with is something that's set up like this:

1234-2345 / ABC / DEF / GHI / LMN / XYZ / Title1 / Date1 / Date2

I know the column A data (1234-2345) is going to be the key to match these all up.

How do I consolidate these in Excel without a ton of cutting and pasting? I'm not even sure how to Google for instructions for something like this. I can struggle through a VLookup and a simple two column pivot table but that's about the extent of my skills.
posted by marylynn to Computers & Internet (13 answers total) 2 users marked this as a favorite
Are there the same number of the duplicate rows every time? If so, you could use a set of nested IF formulas in a separate column to check if there are that number of rows with the duplicate value below (or above) and if there are, grab all the text strings and CONCATENATE them into one new one.

You would still need to do a bit of fussing around afterwards with sorting, deleting and potentially splitting that column to get the final result you want.

If there aren't the same number of rows each time, you could use a similar method, but with a few more checks in the formula at the beginning to make sure that you're getting all the matches.

This is the first method that pops into my head. There might be something more elegant and/or easier.
posted by Vectorcon Systems at 2:22 PM on May 3, 2012

No, there aren't the same number. MOST of the sets will have the same number of rows (6 of them), in some cases, there may be unique rows or rows with a number <6 of duplicates.
posted by marylynn at 2:27 PM on May 3, 2012

I was thinking concatenate, too, but I also found this method for merging cells without losing data... it may be on too small a scale for you, but with the macro, it could work. Definitely less painful than copy/paste, and easy to get rid of the spare rows with blank cells in column B.
posted by snorkmaiden at 2:30 PM on May 3, 2012

I think the problem with that method is that it would have to be done for each set individually, which is still pretty painful (plus, I'm on Excel 2008 for Mac if that matters - it doesn't even have the ribbon. "It does not wear the ribbon!").

I feel like since the first column is going to be the thing that matches them up I should be able to make use of it, but maybe this is a db thing and not an Excel thing. I usually have a Perl programmer stashed around here but he's too busy with other things to write nifty scripts to solve my title list woes today.
posted by marylynn at 2:36 PM on May 3, 2012

Here's my question, are the ABC DEF GHI items static, like a product name or part number? Meaning is there a finite number of them, or would they be different for each line item?

If so, my thought is to add a column with an 1 in it. Then, do a pivot table with the main number on the left, the Product across the top and the new column to SUM the column with the numbers.

I'm thinking it would look like:

Account Number ABC DEF GHI JKL

1234-5678 1 1 1
1234-5679 1 1 1

Like that.

Then cut and past special values, the pivot table into a new spreadsheet, this will be data.

Then do a new tab, and draw the information from the cut and pasted values.

For example

A1 B1
=Data!a1 =if(Data!b1="1",ABC,"")

I'm hoping you get the idea.

This should leave you with rows and rows of Account Numbers, and Product Names. Then, Concatinate the Product names. (You can build a comma and/or a space with your IF Statement)

So now you should have:

1234-5678 ABC DEF GHI JKL

NOW you can do a VLOOKUP for the rest of the stuff.

To make it simpler, make a new table, removing the Product column, and then remove duplicates.

It's seems like a lot of work, but not as much as cutting and pasting.
posted by Ruthless Bunny at 2:39 PM on May 3, 2012

I'm into my second bottle of wine this evening, so bear with me...

The first thing I would do (and this is not necessarily the most elegant solution, but it is what I would do and I think it will work) is....

Concatenate together column A, C, D,E in column F
Add a countif to column G that looks like this =COUNTIF($F$1:F3,F3) (for G3) and fill down.
This will give you a referenceable number for each item.

In column H, concatenate A and G.
In Column I, a simple reference, so that I1=B1

Then you get outputs by having J=A
K =
and L1 looks like this:

and so on up to O (if you have 6 available types.
P is = C
Q = D
R = E


reformat accordingly, then choose copy and paste values on the lot and then clear duplicates.

Bam, Done.

(Oh man that looks slapdash. Anyway I have an example sheet which I can send you if you like, which might be better than my atrocious, 2007 cotes du rhoneimpaired explanation)
posted by Just this guy, y'know at 2:50 PM on May 3, 2012

my example will only work for 12 rows. Columns K,L,M,N,O,P (or more if there are more than 6 types of column B) should be:


The iferror is in there to neaten things up if you have less than 6 values of B for a particular value of A.
posted by Just this guy, y'know at 3:03 PM on May 3, 2012

I'm working through some of this but I think I may have hit on a much simpler solution. I did a sub-total and found that there are only a few rows that are actually unique (of the 4580 groups, only about 20 had fewer than 6 of those codes - they're customer numbers). So I'm just looking those up individually and then can assume that the other 4560+ have all 6 codes. Then I'll just vlookup for the rest of the data.
posted by marylynn at 3:10 PM on May 3, 2012

OK, here's a formula, based on my thoughts above, that will work for six or fewer duplicates:


Just make sure the data is sorted by column A, post that into a new column, and do some cleanup.
posted by Vectorcon Systems at 3:11 PM on May 3, 2012

That's starting in row 2, with a blank row above.
posted by Vectorcon Systems at 3:12 PM on May 3, 2012

marylynn, if you would like I can forward my test spreadsheet to you, and I reckon you could just drop your data in and it should work.
posted by Just this guy, y'know at 3:15 PM on May 3, 2012

If you can handle a two column pivot table, that should be all you need.
"Summarize" the sheet by Column A and all of the other columns you need and then in the DATA area (2003, can't remember the 2007 term), put a column you don't need. Run the pivot table, copy the whole sheet and Paste Special... Values over the whole thing. Now you will still need to do some cutting and pasting, but it should be less now.
posted by soelo at 4:40 PM on May 3, 2012

I might use some of these in the future since this kind of task comes up over and over, with different kinds of lists. Just this guy, if you want to send it to me I'd appreciate it - my email is in my profile.

Thank you everyone!
posted by marylynn at 9:25 AM on May 4, 2012

« Older Any free-form whiteboarding desktop apps like...   |   Does it do any good to report a dangerous driver? Newer »
This thread is closed to new comments.