How to reconcile two lists in Excel?
December 5, 2018 12:48 PM   Subscribe

I have two long lists that I need to manually reconcile. The lists contain many of the same items, but List B has lots of inconsistency in formatting. I need to rename the second list to have the standard names from List A. Also, each list has some items that are not in the other list, and one is about twice as long as the other.

I'm looking for a way to spread them out, putting items with similar names side-by-side, to make the manual work quicker. I imagine the most straightforward way to define "similar" is alphabetical order. Example:

List A:
Apple Granny Smith
Apple Yellow Delicious
Banana
Pear

List B:
Apple-GrannySmith
Banana-yellow
Pear-
Watermelon

Matched (two columns):
Apple Granny Smith, Apple-GrannySmith
Apple Yellow Delicious, [blank]
Banana, Banana-yellow
[blank], Watermelon
Pear, Pear-
posted by gray17 to Computers & Internet (9 answers total) 4 users marked this as a favorite
 
Depending on how big your data is, you could find the unique column values from each list (Highlight the column, then go to Data, then Remove Duplicates). You can order this however way you like using Sort (Home, then Sort & Filter all the way on the right).

Then create a table where you match up the desired List A values with List B values. You can then create a new column in List B that uses a VLOOKUP command to find the corresponding List A value.
posted by devrim at 12:53 PM on December 5


Excel isn't the greatest tool for this, but you could make some headway by creating new columns next to your lists that have more uniform versions of the entries. Here's an example formula that will remove dashes and spaces and convert all letters to uppercase:
=UPPER(SUBSTITUTE(SUBSTITUTE(A1," ",""),"-",""))
Then you use the MATCH() function to look up uniform versions of each entry in the opposing list. If nothing else, this would flag a lot of the matches.
posted by paper chromatographologist at 12:56 PM on December 5


Or, depending on your data, and the nature of your variations you might be able to just check first n characters, using a formula like this instead:
=LEFT(A1,5)
Or you could combine this with the formula I gave above.
posted by paper chromatographologist at 12:59 PM on December 5


I would import the lists into OpenRefine and use the clustering function. It detects similar entries and can rename them to make everything consistent. You can then export back out to Excel.
posted by mariokrat at 1:27 PM on December 5 [8 favorites]


Yes, this is a great use case for openRefine.
posted by rockindata at 1:30 PM on December 5 [1 favorite]


nthing OpenRefine
posted by kbuxton at 1:31 PM on December 5


If list B always has the hyphen in between, you could try text to columns, and use the hyphen as a delimiter. Then you'd get a List C that was just the items after the - in List B.

You could use CONCAT to combine A and B then, though I'm not sure that's what you're going for?
posted by nakedmolerats at 2:13 PM on December 5


This isn't quite a direct answer, but I had a somewhat similar Excel challenge that I just could not figure out, and I ended up hiring someone on Fiverr to assist me. Not only did that get me out of the fire for what I needed in the immediate sense, but I also now have the Excel formulae that I need to adapt it for future use. Just food for thought!
posted by kryptondog at 8:40 AM on December 6


If you really want to stay in excel to do it, and you're on windows, the fuzzy lookup add-in (it's from microsoft, not a random thing) should be just what you need.
posted by mosst at 10:48 AM on December 6 [1 favorite]


« Older I want a new gmail email address. Can you help me?   |   SpiderFilter: Yet another spider ID question Newer »

You are not logged in, either login or create an account to post comments