Excel function/filtering. Help!
August 10, 2011 1:02 PM   Subscribe

In Excel, how would I do the following: if a cell in a certain column has a duplicate in that column, plus the corresponding cell in that row but another column is blank, delete the whole row?

Hi. I feel like I've got the dumb today, and I can't quite come up with how to search for this, so I'm asking the most fun people on the tubes!

I have a spreadsheet. (Well, I have a number of spreadsheets, and will have many more.) I need to remove entries that aren't needed before I upload it to our processing software. The entries that need removing are ones that have a certain column blank. However, not all of the blank ones need to be removed! The ones that do need to be cut have an entry in a different column that is a duplicate of another entry in another row, but I can't just cut everything that has a duplicate, because there's also a set of duplicates that need to stay.

Therefore, like so:
A B C
1 2 3
1 4 3
1 5
6 7

if I know that I need to identify the lines where column A is the same and C is blank (so I can delete them) like line 3, but not pick up lines like line 2 (where A is the same but C is filled) or 4 (where C is blank but A is different), how do I do it?
posted by dust.wind.dude to Computers & Internet (12 answers total) 1 user marked this as a favorite
 
Huh. How about if you sort by A, then C. Then create a new column D. Formula in cell D2: =if(exact(A2,A1),if(isblank(C2),1,0),0)

I think that will give you what you want.
posted by Perplexity at 1:08 PM on August 10, 2011


(In case it's not clear, that will populate column D with 1s and 0s, where the 1s are rows you want to delete. Then you cut column D, and paste special > paste values. Then you can sort by column D and delete all the rows with value 1).
posted by Perplexity at 1:09 PM on August 10, 2011


Response by poster: omg, 6 minutes. is that a record? thank you Perplexity! this definitely worked.

of course, if anyone knows a more efficient way, I'd love to hear it as well - as would my co-workers, I'm sure!
posted by dust.wind.dude at 1:32 PM on August 10, 2011


You can find duplicates by using a COUNTIF formula. If you are looking for them in column A, you would make the formula =COUNTIF(A:A, A2). The result will be how many times that value shows up in column A. So, in this case you could filter (Data->Autofilter) for the values that are above one. then you can also filter for Blanks in column C. You don't need to resort to delete rows. However, what would your desired result be if you had two of the same value in A and they were both blank. Would you want them both deleted, or would you want one to stay? This way, both will be deleted.
posted by soelo at 1:35 PM on August 10, 2011


Response by poster: soelo, yes, if I have two entries in A and both have C blank, I need one of them kept. (And yes, I do need to eventually delete these rows out rather than just filter them away, in order to make the import code read properly, but it doesn't matter at what point that happens.)

Plus, there are some entries that are duplicates in A that both need to be kept (because they each have a value in C).

Basically, a vendor sends me a list of serial numbers of products, with a line item for each service contract or warranty each item has. this means that I could have 4 lines of the same serial number, two of which are two types of service contracts (filled in a certain column) and need to be kept, and two of which are warranties (blank in a certain column), which need to be deleted (because the SLA supersedes the warranty info in the db.) Plus, in the same file, I could have 2 lines of the same serial, showing two different kinds of warranty information (blank in a certain column), but we don't track the individual types, so one line needs to be deleted and the other kept (to be converted into the right code format, which I think I've got covered).
posted by dust.wind.dude at 1:43 PM on August 10, 2011


Hmm, I'd use Access at this point especially if it going into another database, but you can still do this in Excel if you must. Really, the database you're uploading to should be able to handle this type of deduplication.

Back to Excel, though. Pivot Tables will help you do this. I am assuming you just need the three columns A-C in this example. The interface is different from the 2003 to the 2007 version, but it's still the same basic concepts.

In 2007, go to the Insert tab and the first button will be Pivot Table, hit it and select pivot table from the drop down. In the box that pops up, make sure the range is selecting all of your data and pick new Worksheet on the bottom set of radio buttons. Hit OK and this will create a new tab with the Pivot Table Field list showing your column names. Drag all of the column names to the box on the bottom named "Row Labels". Close the Field list. Now your data is there but messy. Right click on a value from column A and uncheck the menu option that starts with "subtotal". Do the same for all columns on the same line as the A value. Copy the whole sheet and Paste Special...values to remove the pivot table formulas.

Now you will have no duplicates in column A at all, but you do want duplicates if they have different service contracts, right? All your contracts are still there but they are blank in column A (and any other column that had the same values in both rows), so you need to copy the value immediately above the blank cells. Filter for blanks, fill in the formula in the first cell and paste it to the rest of the blank cells.
posted by soelo at 2:59 PM on August 10, 2011


VBA. Bedtime here in the uk, but I'll post something tomorrow.
posted by MattWPBS at 4:33 PM on August 10, 2011


An all-in-one formula that you could stick in D2 and copy down is:

=IF(COUNTIF(A:A,A2)=1,"",IF(NOT(ISBLANK(C2)),"",IF(SUMPRODUCT((A:A=A2)*NOT(ISBLANK(C:C))),"Delete - Non-Blank Exists",IF(SUMPRODUCT(($A$1:A1=A2)*ISBLANK($C$1:C1)),"Delete - Duplicate Blank",""))))

The first IF checks if A of the current row is unique. If so, the output is an empty string.
The second IF checks if C of the current row is not blank. If so, the output is an empty string.
The third IF checks if there is a row with A that matches A of the current row and non-blank C. If so, the output is "Delete - Non-Blank Exists"
The fourth IF checks if there is a row with A that matches A of the current row and blank C but only looks in the rows preceding the current row. If so, the output is "Delete - Duplicate Blank", otherwise the output is an empty string.

This relies on there being a header row, otherwise the part of the fourth IF that only checks the rows preceding the current one will fail.

If you're not familiar with using SUMPRODUCT for conditional logic (as the third and fourth IFs do) then it might be useful to look here.
posted by MUD at 5:56 PM on August 10, 2011


This appears to do what you're asking for:

Create a button on the sheet with the data. For the button's code, enter:
Private Sub CommandButton1_Click()
    DeleteDuplicates
End Sub
Then define the sub DeleteDuplicates in a module as follows:
Option Explicit
Dim LastRow As Long
Dim MyRange As Range
Dim Cell As Range

Sub DeleteDuplicates()
    LastRow = Range("A65536").End(xlUp).Row
    Set MyRange = Range("A1:A" & LastRow)
    For Each Cell In MyRange
        Cell.Select
        If Not IsEmpty(Cell) And _
            Application.WorksheetFunction _
            .CountIf(MyRange, Range("A" & Cell.Row).Text) > 1 _
            And IsEmpty(Cell.Offset(0, 2)) Then _
            Range("A" & Cell.Row).EntireRow.Delete
    Next Cell
    
End Sub
Of course, you can insert the code into the CommandButton1_Click() sub directly, but by putting it in a separate module and calling it indirectly you can tie other buttons to it (or a keystroke if you prefer) and it will work regardless of what sheet is active when you use it.
posted by perspicio at 6:34 PM on August 11, 2011


On closer look, that doesn't quite work right. It misses candidate rows. A quick search of the inestimable Ozgrid revealed that it needs to loop from the last cell to the 1st (presumably so as not to lose its place when the row gets deleted).

Reformulating the For loop would solve that.
posted by perspicio at 7:04 PM on August 11, 2011


Fixed and improved:
Option Explicit
Dim LastRow As Long
Dim Cellrow As Long

Sub DeleteDuplicates()
    LastRow = Range("A65536").End(xlUp).Row
    With Application
        .ScreenUpdating = False
        .Calculation = xlCalculationManual
    End With
    For Cellrow = LastRow To 1 Step -1
        If Not IsEmpty(Range("A" & Cellrow)) And _
            Application.WorksheetFunction.CountIf _
            (Range("A1:A" & LastRow), Range("A" & Cellrow).Text) > 1 _
            And IsEmpty(Range("A" & Cellrow).Offset(0, 2)) Then _
            Range("A" & Cellrow).EntireRow.Delete
    Next Cellrow
    With Application
        .ScreenUpdating = True
        .Calculation = xlCalculationAutomatic
    End With
End Sub

posted by perspicio at 7:50 PM on August 11, 2011


Response by poster: Wow, you guys are great! Thank you!

but we discovered this morning we don't have to remove any of this stuff after all

Thank you for all your efforts.
posted by dust.wind.dude at 12:31 PM on August 12, 2011


« Older So...I'm officially sexually active now, huh?   |   Which Mac Mini? Newer »
This thread is closed to new comments.