Join 3,574 readers in helping fund MetaFilter (Hide)


Another remove duplicates from Excel Q: Find the dupe, remove the ROW
February 20, 2013 2:44 PM   Subscribe

So, Excel 2011 has a handy function for removing duplicates. Neat! Except that when it removes the dupe, it shifts all the cells UP, thereby ruining the relationship between the columns in the row. What I need it to do is this: -Look for dupes -IF it find a dupe, delete the WHOLE ROW that dupe exists in -Profit.

Here's the rub: I'm sorting by e-mail addresses. These are the unique ID. All the other columns could be different, but if that ID is the same, I wanna remove the 2nd (And 3rd and 4th...) instance of it from the sheet.
posted by GilloD to Technology (7 answers total) 4 users marked this as a favorite
 
Realized that's a little cloudy. If column C has a dupe, I wanna delete the whole row, Cols A-E.
posted by GilloD at 2:51 PM on February 20, 2013


If you select the entire area and then click the "Remove Duplicates" button it should give you a dialog box where you can unclick the columns you aren't testing for dupes.
posted by mullacc at 2:58 PM on February 20, 2013 [2 favorites]


O know that isn't directly answering the question, but if you have messy data- you might find that OpenRefine is a drastically better tool for the job.
posted by rockindata at 3:47 PM on February 20, 2013 [3 favorites]


What are you selecting before you hit the remove duplicates command? The whole sheet, or just the column? If you are just selecting the column, you are telling it to do exactly what it is doing: ignoring the rest of the columns. Select the whole sheet (or everything except the column headings) and then do the remove duplicates task. That should work. The same way sort does.
posted by gjc at 5:37 PM on February 20, 2013


You could try this:

(1) Sort the whole thing based on email addresses.

(2) Create a column "B" next to the column where the email addresses are ("A").

(3) Put this formula into B: =IF(A1=A2,0,1) and drag it down. It should put a 1 in column B next to the last occurance of an email address in column A.

(4) Sort the whole sheet on Column B descending to put the 1s up top and the 0s at the bottom.

(5) Delete everything that has a 0 in column B.
posted by alphanerd at 7:50 PM on February 20, 2013


What mullac said. Select all columns, click Remove Duplicates. In the dialog box click Uncheck All, then check only your Email column. Only the email column will be compared for duplicates but the entire row will be removed.

Note that the first instance of an email address is the one that will be kept, so you may want to consider how you're sorting the table before removing duplicates. For example, if there was a "Last Updated" date column and you wanted to keep the most recent update of the duplicated emails, you'd want to sort by Email, then Last Updated (descending) before you remove duplicates.
posted by teg at 10:06 PM on February 20, 2013


If you use the process that alphanerd suggests, you need to do a step between 3 and 4:
Select the new column, right-click Copy, right-click Paste Special Values, enter.

This will change all the formulas into actual values of 1 and 0. Then you can sort by that column and delete all the non-zero rows. If you sort without converting to values, the formula in each cell re-calculates using the newly sorted data (instead of your original sort) and you lose the indicator that tells if the email address was a dupe.
posted by CathyG at 7:30 AM on February 21, 2013


« Older Job Filter: What am I getting...   |  I'm a dual Portuguese - Canadi... Newer »
This thread is closed to new comments.