Join 3,424 readers in helping fund MetaFilter (Hide)


Excel Merging Rows Help
June 23, 2010 1:57 PM   Subscribe

I have an excel 2007 spreadsheet that has 4,000 rows and 6 columns of information. Need help merging some of the rows together.....


Columns = name, address, primarykey, company, telephone, email

The information comes from two different spreadsheets and I've merged them into one.

Row 1 looks like this:
Bob Smith, ,Atlas, 555-555-1212, smithb@atlas.com

Row 2 may look like this:
Bob Smith, 123 Road, smithb, , ,

Some of the information is out of date and I may not have the 2nd row of information.

I basically need to:

Find duplicate names
merge the address and primary key column information into the 1st row and delete the 2nd row


Any ideas?
posted by bleucube to Computers & Internet (10 answers total)
 
Well, you could add a new column and then figure out all the duplicate names, anyway.
If names are in column A, and new column is column B, formula in B2 should be:
=COUNTIF(A:A,B2)

That will be 1 for all names that are not repeated, and > 1 for any name that is repeated.

Depending on the number of duped names, this could make it an easy manual task, or not really help at all.
posted by inigo2 at 2:11 PM on June 23, 2010


2 step process:

1 - in a 7th column use a formula like "=if(a2=b2,"duplicate","") to identify duplicates. copy all the way down
Use AutoFilter (under Tools/Filter) to view only duplicates and delete them.

2 - join columns A and C together in a separate column using concatenate - "=concatenate(a1," ",c1)
Change the value within the "s if you need to
Copy and past-values (Edit/paste special/values) back into row A and delete row C
posted by jcm at 2:12 PM on June 23, 2010


too early in the morning here. Inigo2 countif is the correct way to go
posted by jcm at 2:14 PM on June 23, 2010


Excel 2007 has a de-duplicator built in. One thing you can do to help it is make a new column that combines the first and second cell of the record to make a more unique record. Then run the Excel de-duper (it is in the Data menu) to clean out the cells that appear more than once. De-duping is not a totally involved process, but it can take some time to whittle down the possible dupes.

Regarding the merge & delete operation, that is something that cannot be totally automated. The duplicated records can be highlighted and you can manually add or delete information.

When I do this for clients, I usually run my de-duper that highlights the rows, then send the file back with the possible dupes back to the client for approval. In an added column, the client can put in a note to keep, delete or combine the record.

Hope this helps.
posted by lampshade at 2:18 PM on June 23, 2010


I'm confused. You don't need to delete the dupes, you need them to complete the record, right? Also, even if you don't have the Row 2 information, does the row still exist, along with the name, with all blanks for the fields?
posted by prenominal at 3:21 PM on June 23, 2010


Can you separate all the "row 2s" from "row 1s"? If you can, I think I would take this approach:
- Place second dataset on a separate worksheet.
- Filter first dataset by a blanks for any column with blanks and use VLOOKUP to lookup the a common key (e.g. the name) in the second dataset and populate that field from the corresponding column in the second dataset.
(Assumes dataset 1 is the most current.)
posted by NailsTheCat at 5:00 PM on June 23, 2010


I turn on condition formatting to show number of duped, 1383 of the records need to be complete.

Prenominal, your right i'm looking to complete the information in the rows that are duped.

i will try suggestions tomorrow and post findings
posted by bleucube at 6:09 PM on June 23, 2010


If you have two rows for each record:

First, create a new column to the left of "name" for row id. For the very first row, enter "1", for the second, enter "2". Copy-paste these two cells for all cells in this column to the end of your data. You now have row ids for each record.

Second, sort the data by this column. You should now have a set of row 1's followed by a set of row 2's.

Now create your complete data. To the right of your current data, create columns for name, address, primarykey, company, telephone, email. In the first row, create "equal to" formulae in each cell to copy the first set of fields from row 1 and for the second set of fields from row 2 in the corresponding columns. Copy-paste this set of formulae until the end of all the row 1s.

Hope this makes sense.
posted by prenominal at 7:44 PM on June 23, 2010


If you are about to have these sorts of issues also in the future, prob. the most reasonable choice is to write a VBA script that does that for you. It's a bit annoying in the beginning, but

a) it is still faster and less risk prone than fixing the changes by hand,
b) once you learn to do that it will save your time tremendously in the future, and
c) You won't have problems in the future you would not be able to solve...

Stay calm as a Zen Master ;)

DB
posted by Doggiebreath at 12:40 AM on June 24, 2010


Seconding VBA as probably your best option here, but I also think you could do this with a series of pivot tables. If all the columns (other than names) were numeric, it would be pretty easy: just select the whole range, create a pivot table with the names as row labels, and for the data field choose any of the remaining data fields. If you change the value field to report the sum, this would return a merge the results for each name into one value, which you could then select as a column and copy-paste somewhere else.

The problem is that your data is in strings, so you'd need some kind of dictionary or hash table to convert to a format that Excel can add.
posted by albrecht at 12:22 PM on June 25, 2010


« Older AC Installation question - So ...   |  Experienced Web / Software Dev... Newer »
This thread is closed to new comments.