Save me from having to hire a data enterer.
May 28, 2010 1:20 PM Subscribe
Bah! Help me deal with my difficult excel data!
Sorry for the exclamation points.
So I have an excel spreadsheet with 4000ish cells all containing a customers name and address (street number, street, city, state and zip code). All of this data is in one cell and I need to split it up. I need one column with names, one column with their street number and street name, one column with their city, one with their state and one with their zip code.
It looks like this:
Mr. Bill Smith 123 E Fake St. Chicago, IL 10101
Bob Loblaw and Associates 534 N. Clark Ave. Hammond, IN 32132
Mrs. Jane Doe 5435 W Wall Blvd. Beach City, CA 90210
Dr. Dave Johnson PO Box 303 Philadelphia, PA 43453
I've tried copying it into a text file and then importing it again, but because the names and address often have different different lengths it has posed some issues. Having only lightly dabbled in excel, I am clueless.
Sorry for the exclamation points.
So I have an excel spreadsheet with 4000ish cells all containing a customers name and address (street number, street, city, state and zip code). All of this data is in one cell and I need to split it up. I need one column with names, one column with their street number and street name, one column with their city, one with their state and one with their zip code.
It looks like this:
Mr. Bill Smith 123 E Fake St. Chicago, IL 10101
Bob Loblaw and Associates 534 N. Clark Ave. Hammond, IN 32132
Mrs. Jane Doe 5435 W Wall Blvd. Beach City, CA 90210
Dr. Dave Johnson PO Box 303 Philadelphia, PA 43453
I've tried copying it into a text file and then importing it again, but because the names and address often have different different lengths it has posed some issues. Having only lightly dabbled in excel, I am clueless.
Before doing things the difficult way. Maybe try re-importing the text file into excel and playing around with the deliminator settings on the text file while importing.
posted by MechEng at 1:27 PM on May 28, 2010
posted by MechEng at 1:27 PM on May 28, 2010
Ooof. Are you limited to doing this within Excel itself - have you got access to any, say, Unix utilities like cut or sed?
posted by sldownard at 1:28 PM on May 28, 2010
posted by sldownard at 1:28 PM on May 28, 2010
Do the addresses always start with a number or PO? If so you could use regular expressions in excel to separate the name from address (once you've stripped out zip and city).
You can first split the data (using data to columns) with , as the delimiter. Then you get state and zip into a column. You can split that one based on a single space as a delimiter.
posted by special-k at 1:28 PM on May 28, 2010
You can first split the data (using data to columns) with , as the delimiter. Then you get state and zip into a column. You can split that one based on a single space as a delimiter.
posted by special-k at 1:28 PM on May 28, 2010
No need to reimport. Playing around with delimiters like commas, spaces, and periods in the "Text to columns" function will accomplish the same thing. The hard part will be coming up with rules to do that for you. This will do a lot of the work, but you'll have to do a lot of cleanup.
posted by supercres at 1:30 PM on May 28, 2010 [1 favorite]
posted by supercres at 1:30 PM on May 28, 2010 [1 favorite]
Response by poster: Theoretically I could go download an Ubuntu live cd if I had to. I would very much like to avoid it.
posted by masterscruffy at 1:31 PM on May 28, 2010
posted by masterscruffy at 1:31 PM on May 28, 2010
Replace spaces with % symbol then text to columns based on that, then concatenate (=a1&b1") to put them back together semi-meaningfully, then edit based on that?
posted by A Terrible Llama at 2:05 PM on May 28, 2010
posted by A Terrible Llama at 2:05 PM on May 28, 2010
Best answer: A page I have bookmarked for this problem: http://office.microsoft.com/en-us/excel/HA011498501033.aspx
It details how to use all sorts of functions to split text into columns by looking for other things. It's on Microsoft's website so you might have to use IE to get past the windows Live crap.
posted by msbutah at 2:08 PM on May 28, 2010 [1 favorite]
It details how to use all sorts of functions to split text into columns by looking for other things. It's on Microsoft's website so you might have to use IE to get past the windows Live crap.
posted by msbutah at 2:08 PM on May 28, 2010 [1 favorite]
I don't think you can really do this with functions. The text functions within excel can do some things very well but...
Though admittedly you can get the "IL" and "10101" parts separated out using functions.
=MID(A1,LEN(A1)-7,2) = "IL"
=RIGHT(A1,5) = "10101"
I don't think 'text to columns' will be that much use either. Even on your 4 row example it's not clear to me what you would use as a delimiter to get much closer to what you want.
To be honest if you can't reimport the data format with delimiters, I think your stuck doing most of this more or less manually.
posted by selton at 2:46 PM on May 28, 2010
Though admittedly you can get the "IL" and "10101" parts separated out using functions.
=MID(A1,LEN(A1)-7,2) = "IL"
=RIGHT(A1,5) = "10101"
I don't think 'text to columns' will be that much use either. Even on your 4 row example it's not clear to me what you would use as a delimiter to get much closer to what you want.
To be honest if you can't reimport the data format with delimiters, I think your stuck doing most of this more or less manually.
posted by selton at 2:46 PM on May 28, 2010
Best answer: Okay, I'm trying so hard to not go to the bar, I did this instead. The macro below (hopefully I can get it to paste correctly) works on your sample data, anyways. There's a few assumptions that I list at the bottom. This should be pretty straightforward, and I included lots of comments that should make this pretty clear, and also provide you the ability to deal with additional address formats. It's pretty brute force (to say the least), but it works 100% on your sample data, anyways...
posted by inigo2 at 7:58 PM on May 28, 2010 [5 favorites]
Sub parse() ' ' parse Macro ' 'SAMPLE DATA: 'Mr. Bill Smith 123 E Fake St. Chicago, IL 10101 'Bob Loblaw and Associates 534 N. Clark Ave. Hammond, IN 32132 'Mrs. Jane Doe 5435 W Wall Blvd. Beach City, CA 90210 'Dr. Dave Johnson PO Box 303 Philadelphia, PA 43453 Dim fullText As String Dim addressType As String Dim name As String Dim streetAddress As String Dim city As String Dim state As String Dim zip As String Dim length As Integer Dim tempStart As Integer Dim tempEnd As Integer Dim lastRow As Integer lastRow = ActiveSheet.UsedRange.Rows.Count For i = 2 To lastRow 'NOTE: This assumes a header row, the data starts on row 2 'reset the pieces to blank: name = "" streetAddress = "" city = "" state = "" zip = "" fullText = Range("A" & i).Value length = Len(fullText) 'First, the easier stuff: zip = Right(fullText, 5) 'ZIP code is always the last five characters state = Mid(fullText, length - 7, 2) 'State abbreviation is the two chars immediately before the ZIP 'Now, the rest. Identify which type of address it is. If InStr(fullText, "PO Box") <> 0 Then addressType = "PO Box" ElseIf InStr(fullText, "St.") <> 0 Then addressType = "Street" ElseIf InStr(fullText, "Ave.") <> 0 Then addressType = "Avenue" ElseIf InStr(fullText, "Blvd.") <> 0 Then addressType = "Blvd" 'ANY NEW STREET TYPES FOLLOW THIS SAME CONVENTION; SEE INSTRUCTIONS AT BOTTOM FOR ADDING NEW CASE Else addressType = "" MsgBox ("Unknown address type") End If 'Now, go through and parse based on the address type Select Case addressType Case "PO Box" 'It's a PO Box. You know it goes "PO Box#". 'Figure out the length of the box number by finding the space after it tempStart = InStr(fullText, "PO Box") 'finds the starting point of "PO Box" in the original string tempEnd = InStr(tempStart + 7, fullText, " ") 'start searching for the space at the 1st digit streetAddress = Mid(fullText, tempStart, tempEnd - tempStart) 'You also know everything before "PO Box" is the name name = Left(fullText, tempStart - 2) 'You also know everything from the box number first state char is the city tempStart = tempEnd + 1 'the starting point of the city is the character after the space after the box # tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city city = Mid(fullText, tempStart, tempEnd - tempStart) Case "Street" 'first get the city name tempStart = InStr(fullText, "St.") + 4 'city name starts 4 chars past where "St." starts tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city city = Mid(fullText, tempStart, tempEnd - tempStart) 'Figure out the street address. First, the end of it is what was found before: tempEnd = tempStart 'Based on our assumption there's no numbers in the name, find the first number in the full text. 'Since this function returns 0 if it's not found, then we have to adjust for that initially If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1") If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2")) If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3")) If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4")) If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5")) If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6")) If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7")) If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8")) If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9")) If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.") streetAddress = Mid(fullText, tempStart, tempEnd - tempStart) 'Then the name starts at char 1 and goes to the start of the street address tempEnd = tempStart - 2 name = Left(fullText, tempEnd) Case "Avenue" 'first get the city name tempStart = InStr(fullText, "Ave.") + 5 'city name starts 5 chars past where "Ave." starts tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city city = Mid(fullText, tempStart, tempEnd - tempStart) 'Figure out the street address. First, the end of it is what was found before: tempEnd = tempStart 'Based on our assumption there's no numbers in the name, find the first number in the full text. 'Since this function returns 0 if it's not found, then we have to adjust for that initially If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1") If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2")) If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3")) If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4")) If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5")) If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6")) If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7")) If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8")) If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9")) If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.") streetAddress = Mid(fullText, tempStart, tempEnd - tempStart) 'Then the name starts at char 1 and goes to the start of the street address tempEnd = tempStart - 2 name = Left(fullText, tempEnd) Case "Blvd" 'first get the city name tempStart = InStr(fullText, "Blvd.") + 6 'city name starts 6 chars past where "Blvd." starts tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city city = Mid(fullText, tempStart, tempEnd - tempStart) 'Figure out the street address. First, the end of it is what was found before: tempEnd = tempStart 'Based on our assumption there's no numbers in the name, find the first number in the full text. 'Since this function returns 0 if it's not found, then we have to adjust for that initially If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1") If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2")) If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3")) If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4")) If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5")) If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6")) If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7")) If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8")) If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9")) If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.") streetAddress = Mid(fullText, tempStart, tempEnd - tempStart) 'Then the name starts at char 1 and goes to the start of the street address tempEnd = tempStart - 2 name = Left(fullText, tempEnd) 'PASTE ANY NEW CASE STATEMENTS HERE Case Else MsgBox ("Unkown Address Type") End Select 'Finally, fill in the pieces: Range("B" & i).Value = name Range("C" & i).Value = streetAddress Range("D" & i).Value = city Range("E" & i).Value = state Range("F" & i).Value = zip Next i 'resize the columns to fit Cells.Select Cells.EntireColumn.AutoFit End Sub 'ASSUMPTIONS: '-- There will be no numbers in the name '-- The following rules - note the period with the abbreviations (can be forced by find/replace before running macro): ' -- Streets are "St." ' -- Avenues are "Ave." ' -- Boulevards are "Blvd." '-- PO Boxes are always represented as "PO Box *#*" ' ' ''ADDITIONAL NOTES: ''To add additional street types, there are two steps. ''First, add the new option, in "If" block starting on line 53. ''Then, add the "Case" statement, using the following template, with two changes: ''-- Where it says "St.", type the new street abbreviation. ''-- The "+ 4" is based on the length of the abbreviation, plus one (so "St." means 4). Change it based on that. ' ' Case "NewType" ' 'first get the city name ' tempStart = InStr(fullText, "St.") + 4 'city name starts 4 chars past where "St." starts ' tempEnd = length - 5 - 1 - 2 - 1 'total length minus everything after the city ' city = Mid(fullText, tempStart, tempEnd - tempStart) ' ' 'Figure out the street address. First, the end of it is what was found before: ' tempEnd = tempStart ' 'Based on our assumption there's no numbers in the name, find the first number in the full text. ' 'Since this function returns 0 if it's not found, then we have to adjust for that initially ' If InStr(fullText, "1") = 0 Then tempStart = 1000 Else tempStart = InStr(fullText, "1") ' If InStr(fullText, "2") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "2")) ' If InStr(fullText, "3") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "3")) ' If InStr(fullText, "4") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "4")) ' If InStr(fullText, "5") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "5")) ' If InStr(fullText, "6") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "6")) ' If InStr(fullText, "7") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "7")) ' If InStr(fullText, "8") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "8")) ' If InStr(fullText, "9") <> 0 Then tempStart = Application.WorksheetFunction.Min(tempStart, InStr(fullText, "9")) ' ' If tempStart = 1000 Then MsgBox ("Assumptions are suspect. No number in this Street address.") ' ' streetAddress = Mid(fullText, tempStart, tempEnd - tempStart) ' ' 'Then the name starts at char 1 and goes to the start of the street address ' tempEnd = tempStart - 2 ' name = Left(fullText, tempEnd) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
posted by inigo2 at 7:58 PM on May 28, 2010 [5 favorites]
Oh -- to repeat, this assumes the data starts in row 2, and:
Column A = original text
And filled in by macro:
Column B = name
Column C = street address
Column D = City
Column E = State
Column F = ZIP
posted by inigo2 at 8:06 PM on May 28, 2010 [1 favorite]
Column A = original text
And filled in by macro:
Column B = name
Column C = street address
Column D = City
Column E = State
Column F = ZIP
posted by inigo2 at 8:06 PM on May 28, 2010 [1 favorite]
Very nice. Even if it does not catch all the various iterations of addresses, it would make any parsing job a heck of a lot more pleasant.
posted by lampshade at 8:34 AM on May 30, 2010
posted by lampshade at 8:34 AM on May 30, 2010
« Older News on pacaya damage/spanish translation? | Help me find a very specific kind of bag. Newer »
This thread is closed to new comments.
posted by brainmouse at 1:23 PM on May 28, 2010