I would like to convert 4000 records
August 27, 2008 8:12 AM Subscribe
I scanned with OmniPage 4,000 records and now have a single column of records saved as a text file each record is separated by a blank line. Each record is composed of 4, 5, 6 or 7 lines with the first 3 or 4 or 5 lines being the address the rest is fax, phone, email..
I would like to convert each record so each line becomes a field (comma delimited) and continue to keep each record separated. Then I can Import this file into MS Access and do what I need to do. I was trying to figure this out in MS Word ‘s search and replace (3 hrs) …no luck for me. Is there a simple program out there that will do this for me or can you tell me how to accomplish this, explained simply. I do not know how to write script. Your time and help is much appreciated. This will really help me out
Much Thanks,
Mauimarty
I would like to convert each record so each line becomes a field (comma delimited) and continue to keep each record separated. Then I can Import this file into MS Access and do what I need to do. I was trying to figure this out in MS Word ‘s search and replace (3 hrs) …no luck for me. Is there a simple program out there that will do this for me or can you tell me how to accomplish this, explained simply. I do not know how to write script. Your time and help is much appreciated. This will really help me out
Much Thanks,
Mauimarty
Convert the blank line to some dummy value, like "xxx---xxxx---xxx".
Convert every line ending* character to double-quote comma double-quote
Convert the dummy value to a line ending character*
* on MS-Windows the line ending character is really two characters. I don't recall how word represents it, but you'll figure it out.
posted by orthogonality at 8:54 AM on August 27, 2008
Convert every line ending* character to double-quote comma double-quote
Convert the dummy value to a line ending character*
* on MS-Windows the line ending character is really two characters. I don't recall how word represents it, but you'll figure it out.
posted by orthogonality at 8:54 AM on August 27, 2008
Oh. If you have any real double quotes in your data, escape them first.
posted by orthogonality at 8:56 AM on August 27, 2008
posted by orthogonality at 8:56 AM on August 27, 2008
Response by poster: Thanks d4nj450n
Your interpretation of the format "right?" is correct.
?... It is 123 mockingbird lane
I need each line to be a field like this "xxxxx", and each record separated by a blank line
Thanks orthogonality
The line ending character is a paragraph mark ^p in MS Word
I will give your suggestion a try and let you know.
No real double quotes to delete
Regards
mauimarty
posted by Mauimarty at 10:39 AM on August 27, 2008
Your interpretation of the format "right?" is correct.
?... It is 123 mockingbird lane
I need each line to be a field like this "xxxxx", and each record separated by a blank line
Thanks orthogonality
The line ending character is a paragraph mark ^p in MS Word
I will give your suggestion a try and let you know.
No real double quotes to delete
Regards
mauimarty
posted by Mauimarty at 10:39 AM on August 27, 2008
So if you want just the fields next to each other in csv format, with out regard to what they are, I can whip up a perl script that does that and post it here. Of course you have to be able to run perl somehow.
If you need a blank field where the fourth address is if there is not a fourth, and the rest of the fields in the right order, you are pretty much sol. Do it your self or hire a temp to do it is about the only way.
posted by d4nj450n at 11:01 AM on August 27, 2008
If you need a blank field where the fourth address is if there is not a fourth, and the rest of the fields in the right order, you are pretty much sol. Do it your self or hire a temp to do it is about the only way.
posted by d4nj450n at 11:01 AM on August 27, 2008
Response by poster:
d4nj450n
I did what you suggested and almost problem solved... but no cigar I end up with 1 huge record with multiple fields instead of multiple records with a few fields..
How do I keep the records separated for importing into MS Access. I appreciate everyone's time.
This is the existing text formatted 1 column
address line ending characters are all the same.... paragraph symbol
address line ending character
address line ending character
fax line ending character
phone line ending character
email line ending character
line ending character Blank Line
address
address
address
address
phone
email
fax
posted by Mauimarty at 12:01 PM on August 27, 2008
d4nj450n
I did what you suggested and almost problem solved... but no cigar I end up with 1 huge record with multiple fields instead of multiple records with a few fields..
How do I keep the records separated for importing into MS Access. I appreciate everyone's time.
This is the existing text formatted 1 column
address line ending characters are all the same.... paragraph symbol
address line ending character
address line ending character
fax line ending character
phone line ending character
email line ending character
line ending character Blank Line
address
address
address
address
phone
fax
posted by Mauimarty at 12:01 PM on August 27, 2008
This works in python:
posted by signal at 1:58 PM on August 27, 2008
data = """ address address address fax phone email address address address address phone email fax address address address address phone email fax foo bar address address address address phone email fax spam eggs """ out = "" line = [] for i in data.splitlines(): if i != "": line.append(i) else: out += ",".join(line) out += "\n" line = [] out += ",".join(line) print outgives:
address,address,address,fax,phone,email address,address,address,address,phone,email,fax address,address,address,address,phone,email,fax,foo,barIf you want, memail me the data and I'll memail you the result.
posted by signal at 1:58 PM on August 27, 2008
Best answer: Woops, it gives:
posted by signal at 1:59 PM on August 27, 2008
address,address,address,fax,phone,email address,address,address,address,phone,email,fax address,address,address,address,phone,email,fax,foo,bar address,address,address,address,phone,email,fax,spam,eggs
posted by signal at 1:59 PM on August 27, 2008
This thread is closed to new comments.
address
address
address
fax
phone
email
address
address
address
address
phone
email
fax
right?
so if it was converted to csv it would look like this :
address,address,address,fax,phone,email
address,address,address,address,phone,email,fax
so things don't line up. Are there any line ids? Like
Address1: 123 mockingbird lane
or is it just
123 mockingbird lane?
posted by d4nj450n at 8:23 AM on August 27, 2008