Join 3,416 readers in helping fund MetaFilter (Hide)

Tags:

Finding Duplicates in Excel
May 29, 2008 6:25 AM   Subscribe

I'm looking for an easy way to find (not remove) duplicate information in an excel database. Ideally I need to be able to easily cross reference the other records with identical information in a certain column.

I'm sorry for going to MeFi so quickly, but you are so dog gone smart. I'm not bad at figuring out excel, and I have written a macro or two, I'm just having trouble getting started.

The database is around 5k records. The information contained in one column might have 25 duplicate records. I'd like to find and be able to cross reference the rows which have duplicates in this one column.
posted by Classic Diner to Computers & Internet (10 answers total) 7 users marked this as a favorite
 
A real quick way of doing it is to make a new column consisting of counts of the relevant data.
eg. if your data is column A, then in column B do "=COUNT(A1,$A$1:$A$100)" and copy/paste that to cover all the cells. Then anything greater than 1 has a duplicate.
posted by inigo2 at 6:33 AM on May 29, 2008


Not sure about the type of cross-referncing you're hoping to do, but you can identify duplicates pretty easily using the COUNTIF Function.

Assuming your 5K records are in column A, in Cell B2 type =COUNTIF(A:A,A2)

Copy and past this formula down the length the records. Any duplicates will return a value greater than 1.
posted by Gee Your Hair Smells Terrific at 6:33 AM on May 29, 2008


I've done this with VLOOKUP. It checks all rows above it and if there is a duplicate, it writes the duplicated in name in its cell. If there is no duplicate it puts "#N/A". To use it, create a blank column (I put mine to the left of the column I'm checking) and on the second row (no need to put it on the first row of information.) Here's the formula, you'll need to modify the cell identifiers for your spread sheet. You'll see it is designed to check all of column B beginning with Row 8.

=VLOOKUP(B8,$B$7:$B7,1,FALSE)
posted by i_love_squirrels at 6:34 AM on May 29, 2008


You've won this round, inigo2...
posted by Gee Your Hair Smells Terrific at 6:34 AM on May 29, 2008


If you aren't looking to create your own chain or macro, the List Compare macro from www.pspsoftware.co.za is really awesome. I use it to compare lists with thousands and thousands of entries on each column and it sorts it effectively, and in an easy to sort formula (by color).
posted by banannafish at 7:41 AM on May 29, 2008


can you give us an example of the data you are working with? Where are the two sets of data? are they each in their own column? are they in the same sheet?

Knowing this stuff would help tailor a better answer to fit your needs. All of the answers above me will work well and I have used them in the past myself.

And, not to be a troll, but please don't use the term "excel database". It is an excel spreadsheet if you are referring to one tab of data and an excel work book if there are many tabs. If you use Microsoft Access then you are working with a database.

Just a pet peeve of mine :)
posted by remthewanderer at 7:44 AM on May 29, 2008


I just reread my answer -- sorry, but I led you astray a bit. The formula posted by Gee Your Hair Smells Terrific is correct (the countif one). No more posting in the morning for me....
posted by inigo2 at 8:53 AM on May 29, 2008


A much simpler way, especially if you're looking for only a few dozen duplicates:

1. highlight the entire spreadsheet a bright color

2. Data >Filter>Advance Filter> Unique Records Only

3. Unhighlight the result

4. Date > Filter > Show All

(The duplicates will still be highlighted)
posted by BrooklynCouch at 9:32 AM on May 29, 2008 [2 favorites]


@remthewanderer
If the workbook contains a single spreadsheet with one record per row + a header row, it is, in fact, a single-table database. It just isn't a relational database and Excel isn't a RDBMS (nor even, by design, a very good DBMS). Databases are structured data. Lots of Excel spreadsheets would qualify (as would many other sorts of files). It's a very common term and an accurate one for describing that type of data.

@Classic Diner
An even simpler solution (Excel 2007):
1) Highlight the column where you expect to find duplicates
2) From the Home tab, Styles group, choose Conditional Formatting, Highlight Cells Rules, Duplicate Values.... Then pick a color combo from the Duplicate Values dialog box and click the OK button.

Optionally, in the Duplicate Values dialog box, you can choose to highlight the unique values (rather than the duplicates).

You turn off the filtering by choosing Clear Rules, Clear Rules from Entire Sheet from the same Conditional Formatting drop down menu.
posted by wheat at 10:31 AM on May 29, 2008


If you don't have Excel 2007, the free ASAP Utilities software makes it very easy to perform these kinds of actions on spreadsheets. It installs as an add-in and is compatible with versions back to Excel 2000.
posted by fuse theorem at 12:33 PM on May 29, 2008 [1 favorite]


« Older Please help with slogan ideas ...   |  Archaeology in the Caucasus - ... Newer »
This thread is closed to new comments.