Save me from having to hire a data enterer.
May 28, 2010 1:20 PM   Subscribe

Bah! Help me deal with my difficult excel data!

Sorry for the exclamation points.

So I have an excel spreadsheet with 4000ish cells all containing a customers name and address (street number, street, city, state and zip code). All of this data is in one cell and I need to split it up. I need one column with names, one column with their street number and street name, one column with their city, one with their state and one with their zip code.

It looks like this:

Mr. Bill Smith 123 E Fake St. Chicago, IL 10101
Bob Loblaw and Associates 534 N. Clark Ave. Hammond, IN 32132
Mrs. Jane Doe 5435 W Wall Blvd. Beach City, CA 90210
Dr. Dave Johnson PO Box 303 Philadelphia, PA 43453

I've tried copying it into a text file and then importing it again, but because the names and address often have different different lengths it has posed some issues. Having only lightly dabbled in excel, I am clueless.
posted by masterscruffy to Technology (12 answers total)
 
you can really easily peel off the zipcode and the state: the command =Right(cell,5) will take the rightmost 5 digits from the cell and put them in your new cell. Someone else may have better ideas for how to peel off the city/address, but frankly, since not only are everything different lengths, there's no indication of where once ends and the other begins (since they don't all start with numbers, see PO Box), at best you're going to be doing a lot of fixing.
posted by brainmouse at 1:23 PM on May 28, 2010


Before doing things the difficult way. Maybe try re-importing the text file into excel and playing around with the deliminator settings on the text file while importing.
posted by MechEng at 1:27 PM on May 28, 2010


Ooof. Are you limited to doing this within Excel itself - have you got access to any, say, Unix utilities like cut or sed?
posted by sldownard at 1:28 PM on May 28, 2010


Do the addresses always start with a number or PO? If so you could use regular expressions in excel to separate the name from address (once you've stripped out zip and city).

You can first split the data (using data to columns) with , as the delimiter. Then you get state and zip into a column. You can split that one based on a single space as a delimiter.
posted by special-k at 1:28 PM on May 28, 2010


No need to reimport. Playing around with delimiters like commas, spaces, and periods in the "Text to columns" function will accomplish the same thing. The hard part will be coming up with rules to do that for you. This will do a lot of the work, but you'll have to do a lot of cleanup.
posted by supercres at 1:30 PM on May 28, 2010 [1 favorite]


Response by poster: Theoretically I could go download an Ubuntu live cd if I had to. I would very much like to avoid it.
posted by masterscruffy at 1:31 PM on May 28, 2010


Replace spaces with % symbol then text to columns based on that, then concatenate (=a1&b1") to put them back together semi-meaningfully, then edit based on that?
posted by A Terrible Llama at 2:05 PM on May 28, 2010


Best answer: A page I have bookmarked for this problem: http://office.microsoft.com/en-us/excel/HA011498501033.aspx
It details how to use all sorts of functions to split text into columns by looking for other things. It's on Microsoft's website so you might have to use IE to get past the windows Live crap.
posted by msbutah at 2:08 PM on May 28, 2010 [1 favorite]


I don't think you can really do this with functions. The text functions within excel can do some things very well but...

Though admittedly you can get the "IL" and "10101" parts separated out using functions.

=MID(A1,LEN(A1)-7,2) = "IL"

=RIGHT(A1,5) = "10101"

I don't think 'text to columns' will be that much use either. Even on your 4 row example it's not clear to me what you would use as a delimiter to get much closer to what you want.

To be honest if you can't reimport the data format with delimiters, I think your stuck doing most of this more or less manually.
posted by selton at 2:46 PM on May 28, 2010


Best answer: Okay, I'm trying so hard to not go to the bar, I did this instead. The macro below (hopefully I can get it to paste correctly) works on your sample data, anyways. There's a few assumptions that I list at the bottom. This should be pretty straightforward, and I included lots of comments that should make this pretty clear, and also provide you the ability to deal with additional address formats. It's pretty brute force (to say the least), but it works 100% on your sample data, anyways...
Sub parse()
'
' parse Macro
'
'SAMPLE DATA:
'Mr. Bill Smith 123 E Fake St. Chicago, IL 10101
'Bob Loblaw and Associates 534 N. Clark Ave. Hammond, IN 32132
'Mrs. Jane Doe 5435 W Wall Blvd. Beach City, CA 90210
'Dr. Dave Johnson PO Box 303 Philadelphia, PA 43453

Dim fullText As String

Dim addressType As String

Dim name As String
Dim streetAddress As String
Dim city As String
Dim state As String
Dim zip As String

Dim length As Integer
Dim tempStart As Integer
Dim tempEnd As Integer

Dim lastRow As Integer

lastRow = ActiveSheet.UsedRange.Rows.Count

For i = 2 To lastRow   'NOTE: This assumes a header row, the data starts on row 2
    'reset the pieces to blank:
    name = ""
    streetAddress = ""
    city = ""
    state = ""
    zip = ""
    
    fullText = Range("A" & i).Value
    length = Len(fullText)

    'First, the easier stuff:
    zip = Right(fullText, 5) 'ZIP code is always the last five characters
    state = Mid(fullText, length - 7, 2) 'State abbreviation is the two chars immediately before the ZIP
    
    'Now, the rest. Identify which type of address it is.
    If InStr(fullText, "PO Box") <> 0 Then
        addressType = "PO Box"
        ElseIf InStr(fullText, "St.") <> 0 Then
            addressType = "Street"
        ElseIf InStr(fullText, "Ave.") <> 0 Then
            addressType = "Avenue"
        ElseIf InStr(fullText, "Blvd.") <> 0 Then
            addressType = "Blvd"
        'ANY NEW STREET TYPES FOLLOW THIS SAME CONVENTION; SEE INSTRUCTIONS AT BOTTOM FOR ADDING NEW CASE
        Else
            addressType = ""
            MsgBox ("Unknown address type")
    End If
    
    'Now, go through and parse based on the address type
    Select Case addressType

        Case "PO Box"  'It's a PO Box. You know it goes "PO Box  #".
            'Figure out the length of the box number by finding the space after it
            tempStart = InStr(fullText, "PO Box") 'finds the starting point of "PO Box" in the original string
            tempEnd = InStr(tempStart + 7, fullText, " ") 'start searching for the space at the 1st digit
            streetAddress = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'You also know everything before "PO Box" is the name
            name = Left(fullText, tempStart - 2)
            
            'You also know everything from the box number first state char is the city
            tempStart = tempEnd + 1 'the starting point of the city is the character after the space after the box #
            tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city
            city = Mid(fullText, tempStart, tempEnd - tempStart)
            
        Case "Street"
            'first get the city name
            tempStart = InStr(fullText, "St.") + 4 'city name starts 4 chars past where "St." starts
            tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city
            city = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Figure out the street address. First, the end of it is what was found before:
            tempEnd = tempStart
            'Based on our assumption there's no numbers in the name, find the first number in the full text.
            'Since this function returns 0 if it's not found, then we have to adjust for that initially
            If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1")
            If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2"))
            If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3"))
            If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4"))
            If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5"))
            If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6"))
            If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7"))
            If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8"))
            If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9"))
            
            If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.")
            
            streetAddress = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Then the name starts at char 1 and goes to the start of the street address
            tempEnd = tempStart - 2
            name = Left(fullText, tempEnd)

        Case "Avenue"
            'first get the city name
            tempStart = InStr(fullText, "Ave.") + 5 'city name starts 5 chars past where "Ave." starts
            tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city
            city = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Figure out the street address. First, the end of it is what was found before:
            tempEnd = tempStart
            'Based on our assumption there's no numbers in the name, find the first number in the full text.
            'Since this function returns 0 if it's not found, then we have to adjust for that initially
            If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1")
            If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2"))
            If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3"))
            If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4"))
            If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5"))
            If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6"))
            If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7"))
            If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8"))
            If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9"))
            
            If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.")
            
            streetAddress = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Then the name starts at char 1 and goes to the start of the street address
            tempEnd = tempStart - 2
            name = Left(fullText, tempEnd)

        Case "Blvd"
            'first get the city name
            tempStart = InStr(fullText, "Blvd.") + 6 'city name starts 6 chars past where "Blvd." starts
            tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city
            city = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Figure out the street address. First, the end of it is what was found before:
            tempEnd = tempStart
            'Based on our assumption there's no numbers in the name, find the first number in the full text.
            'Since this function returns 0 if it's not found, then we have to adjust for that initially
            If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1")
            If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2"))
            If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3"))
            If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4"))
            If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5"))
            If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6"))
            If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7"))
            If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8"))
            If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9"))
            
            If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.")
            
            streetAddress = Mid(fullText, tempStart, tempEnd - tempStart)
            
            'Then the name starts at char 1 and goes to the start of the street address
            tempEnd = tempStart - 2
            name = Left(fullText, tempEnd)

        'PASTE ANY NEW CASE STATEMENTS HERE
        
        Case Else
            MsgBox ("Unkown Address Type")

    End Select

    
    'Finally, fill in the pieces:
    Range("B" & i).Value = name
    Range("C" & i).Value = streetAddress
    Range("D" & i).Value = city
    Range("E" & i).Value = state
    Range("F" & i).Value = zip

Next i

'resize the columns to fit
Cells.Select
Cells.EntireColumn.AutoFit

End Sub

'ASSUMPTIONS:
'-- There will be no numbers in the name
'-- The following rules - note the period with the abbreviations (can be forced by find/replace before running macro):
'   -- Streets are "St."
'   -- Avenues are "Ave."
'   -- Boulevards are "Blvd."
'-- PO Boxes are always represented as "PO Box *#*"
'
'
''ADDITIONAL NOTES:
''To add additional street types, there are two steps.
''First, add the new option, in "If" block starting on line 53.
''Then, add the "Case" statement, using the following template, with two changes:
''-- Where it says "St.", type the new street abbreviation.
''-- The "+ 4" is based on the length of the abbreviation, plus one (so "St." means 4). Change it based on that.
'
'        Case "NewType"
'            'first get the city name
'            tempStart = InStr(fullText, "St.") + 4 'city name starts 4 chars past where "St." starts
'            tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city
'            city = Mid(fullText, tempStart, tempEnd - tempStart)
'
'            'Figure out the street address. First, the end of it is what was found before:
'            tempEnd = tempStart
'            'Based on our assumption there's no numbers in the name, find the first number in the full text.
'            'Since this function returns 0 if it's not found, then we have to adjust for that initially
'            If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1")
'            If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2"))
'            If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3"))
'            If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4"))
'            If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5"))
'            If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6"))
'            If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7"))
'            If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8"))
'            If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9"))
'
'            If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.")
'
'            streetAddress = Mid(fullText, tempStart, tempEnd - tempStart)
'
'            'Then the name starts at char 1 and goes to the start of the street address
'            tempEnd = tempStart - 2
'            name = Left(fullText, tempEnd)



posted by inigo2 at 7:58 PM on May 28, 2010 [5 favorites]


Oh -- to repeat, this assumes the data starts in row 2, and:
Column A = original text
And filled in by macro:
Column B = name
Column C = street address
Column D = City
Column E = State
Column F = ZIP
posted by inigo2 at 8:06 PM on May 28, 2010 [1 favorite]


Very nice. Even if it does not catch all the various iterations of addresses, it would make any parsing job a heck of a lot more pleasant.
posted by lampshade at 8:34 AM on May 30, 2010


« Older News on pacaya damage/spanish translation?   |   Help me find a very specific kind of bag. Newer »
This thread is closed to new comments.