Help me get data from a word doc into a spreadsheet/database, please. thank you!
September 19, 2007 10:45 AM Subscribe
Help me get Lexis-Nexis search results from an MS word document into a usable spreadsheet/database format. More inside.
I've got an MS word document full of search results from Lexis-Nexis. It is addresses of people and I'm trying to build a list (through reverse directory) of people who live in a particular building.
2 problems.
1. One, it is largely unformatted, just name, followed by some address info, then followed by previous addresses, sometimes a phone number etc.
2. The search doesn't distinguish between people who have that address currently and people who have that address listed as a "previous address". I'm only interested in the ones that list it as a current address.
I need to get this into excel or access, with columns for name, address (which will all be the same, except apt #), phone.
I've tried to import into various formats and take subsets, but no luck so far. Really appreciate any help.
I've got an MS word document full of search results from Lexis-Nexis. It is addresses of people and I'm trying to build a list (through reverse directory) of people who live in a particular building.
2 problems.
1. One, it is largely unformatted, just name, followed by some address info, then followed by previous addresses, sometimes a phone number etc.
2. The search doesn't distinguish between people who have that address currently and people who have that address listed as a "previous address". I'm only interested in the ones that list it as a current address.
I need to get this into excel or access, with columns for name, address (which will all be the same, except apt #), phone.
I've tried to import into various formats and take subsets, but no luck so far. Really appreciate any help.
You can try saving it as a tab-delimited text file, and then opening that with Excel. But only after giving it a consistent format, I'm afraid.
So yeah, I'm with bonaldi. It sounds like it needs the power of a human to get the formatting straightened out.
posted by malaprohibita at 8:35 PM on September 19, 2007
So yeah, I'm with bonaldi. It sounds like it needs the power of a human to get the formatting straightened out.
posted by malaprohibita at 8:35 PM on September 19, 2007
Oh wait, just thought of something that might make your life a bit easier. If you don't already have it, download the text editor TextPad (or the text editor of your choice, really). Save your original file as plain text. Using the ALT+left mouse combo, you can highlight vertical blocks of text and copy/paste them directly into excel. Might be helpful.
posted by malaprohibita at 8:41 PM on September 19, 2007
posted by malaprohibita at 8:41 PM on September 19, 2007
If I understand your description you are pretty much out of luck getting there without a lot of manual prep (I've done many, many, many conversions of the type you're describing).
Basically you can go in, separate each record by a return, and insert a comma between each data element in each record to transform the document into a comma separated value format document (see details on this format here), at which point you can convert it into an Excel file. If you don't need the info after the current address, you can leave that unformatted, it will all just dump into the last field of the spreadsheet. Once you've got it in a spreadsheet format you should be able to sort the records by the current address field which should group the records with the current address you are looking for in one cluster. Since as you describe it the data after the current address may not be consistent (variable number of previous addresses, telephone number may or may not be present?) you will probably have to do some manual pruning and cutting of telephone data to get it into a discrete field.
All this is ASSUMING that each record has all the data points in the same order. If the records don't have the data points in the same order you're totally out of luck for an automated approach. You have to go through every record manually and make the determination, cutting out the relevant records and pasting them into a separate document, then manually entering each element into a spreadsheet or database.
posted by nanojath at 8:46 PM on September 19, 2007
Basically you can go in, separate each record by a return, and insert a comma between each data element in each record to transform the document into a comma separated value format document (see details on this format here), at which point you can convert it into an Excel file. If you don't need the info after the current address, you can leave that unformatted, it will all just dump into the last field of the spreadsheet. Once you've got it in a spreadsheet format you should be able to sort the records by the current address field which should group the records with the current address you are looking for in one cluster. Since as you describe it the data after the current address may not be consistent (variable number of previous addresses, telephone number may or may not be present?) you will probably have to do some manual pruning and cutting of telephone data to get it into a discrete field.
All this is ASSUMING that each record has all the data points in the same order. If the records don't have the data points in the same order you're totally out of luck for an automated approach. You have to go through every record manually and make the determination, cutting out the relevant records and pasting them into a separate document, then manually entering each element into a spreadsheet or database.
posted by nanojath at 8:46 PM on September 19, 2007
Take a look here and see if anything seems useful.
posted by Mr. Gunn at 7:07 PM on September 20, 2007
posted by Mr. Gunn at 7:07 PM on September 20, 2007
« Older Do you have a policy for how you acknowledge... | Pain has a kung fu grip on my shoulders. Newer »
This thread is closed to new comments.
If it is in a reasonably standard format, so that a grep search can tell what is a name and what is an address ... then grep can do it. If not, I don't see how anyone but a human is going to be able to do this. How could the computer tell what's a current address?
posted by bonaldi at 11:11 AM on September 19, 2007