Help me put the period to this.
September 25, 2010 4:07 PM

I have several lengthy documents I'm editing for a book. The author scanned his hard copy and ran the result through an OCR program. The OCR omitted most of the periods. Is there a way to make Microsoft Word 2000 or OpenOffice Writer 3 put the periods where they go, or am I doomed to putting them in as I edit?
posted by bryon to Computers & Internet (10 answers total)
 
Have you tried turning on invisible characters and seeing if the OCR program left any discernible pattern in place of the missing periods, like maybe two spaces between sentences, that you could use to do a search and replace?
posted by JulianDay at 4:14 PM on September 25, 2010


Ouch. Does the document insert additional spaces between the end of one sentence and the beginning of the next? (Turning on invisible characters will show this.) That might give you the ability to search and replace, but you're still likely to need to proof line-by-line to catch anything that's been missed or anything erroneously inserted.

You might also want to switch on "Check grammar as you type" with the punctuation option checked: it'll be a mass of green at first, and you're reliant upon the MS being written with Word-approved grammar, but it might be worth doing as a second sweep before doing the full line edit.
posted by holgate at 4:17 PM on September 25, 2010


Response by poster: Viewing the invisible characters shows a space and return at the end of a paragraph; I could add a period pretty easily there. Within a paragraph, there's just a single space before the capital letter of the next sentence.
posted by bryon at 4:31 PM on September 25, 2010
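
For the paragraph-end case described above, a minimal sketch outside the word processor might look like the following. It assumes the text has been exported to plain text with one paragraph per line; the function name `add_paragraph_periods` is hypothetical, and the script is an illustration rather than something anyone in the thread used.

```python
def add_paragraph_periods(text: str) -> str:
    """Append a period to each non-empty line that lacks end punctuation.

    Assumption: each paragraph is one line, possibly ending in a stray space.
    """
    fixed = []
    for line in text.split("\n"):
        # Drop the trailing space the OCR left before the return.
        stripped = line.rstrip()
        # Leave blank lines and lines that already end a sentence alone.
        if stripped and not stripped.endswith((".", "!", "?")):
            stripped += "."
        fixed.append(stripped)
    return "\n".join(fixed)
```

Headings and list items would also get a period this way, so the result still needs a read-through.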


Best answer: If you resort to putting a period before every word beginning in a capital letter (excluding instances of " I "), your task would be shortened to removing spurious periods before proper nouns:

Replace all " ([A-Z])" with ". \1", with Use Wildcards selected.
Replace all ". I " with " I ".
posted by IAmBroom at 4:44 PM on September 25, 2010
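
The two replacements above can also be sketched in Python for text exported from the word processor. This is a rough equivalent, not the Word wildcard syntax itself; the function name `insert_periods` is hypothetical, and the lookbehind that skips capitals already preceded by `.`, `!`, or `?` is an added assumption to avoid doubling the periods the OCR did keep.

```python
import re


def insert_periods(text: str) -> str:
    """Put a period before every space-plus-capital, then undo it before ' I '."""
    # Step 1: " X" becomes ". X" unless the space already follows . ! or ?
    # (the lookbehind is an addition, not part of the original suggestion).
    with_periods = re.sub(r"(?<![.!?]) ([A-Z])", r". \1", text)
    # Step 2: remove the spurious period before the pronoun "I".
    return with_periods.replace(". I ", " I ")
```

As with the Word version, proper nouns pick up spurious periods ("New. York"), and step 2 also removes any genuine sentence break that falls right before "I", so both still have to be fixed by hand.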


Sorry: those find/replace options were for MS Word, not OpenOffice. The same can be done in OO, of course.
posted by IAmBroom at 4:45 PM on September 25, 2010


Search for single space followed by a cap? Replace with . space, go one by one to skip "I" and proper nouns?

That's a tough one.

Is there any way to rescan the original?
posted by notyou at 4:47 PM on September 25, 2010


I'm an editor. To be honest, if you're editing these documents for a book, inserting a period after each sentence shouldn't really be that time-consuming, as editing a document means that every word should be read and considered -- and throwing in a period is really just one additional keystroke per sentence.
posted by kate blank at 5:01 PM on September 25, 2010


Response by poster: notyou, given what the author had to work with, I don't think rescanning would help.

kate blank, you're right, of course, that it's only one keystroke and I do have to consider every letter of every word anyway. But I've been cleaning this up with lots of searches (e.g., a-f to of) and was hoping to streamline the period process to concentrate on other matters.

IAmBroom, that's a pretty good option. It puts periods in the middle of sentences with things such as "New. York." Still, there's less to fix that way than to add all the periods manually. Many thanks!
posted by bryon at 5:12 PM on September 25, 2010


Is there a grammar checker? If you add the periods in bulk as described, you'll get a lot of false positives, but running a grammar checker afterward might help you find them.
posted by AmbroseChapel at 6:51 PM on September 25, 2010
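
Short of a real grammar checker, a crude heuristic can at least list the likeliest false positives after a bulk insert. The sketch below is not a grammar check: it only flags a period sitting between two capitalized words (as in "New. York"), on the assumption that such pairs are often proper nouns split in two. The name `suspicious_periods` is hypothetical.

```python
import re
from typing import List

# A capitalized word, a period and space, then another capitalized word.
# The second word sits in a lookahead so overlapping pairs are all reported.
SUSPECT = re.compile(r"\b([A-Z][a-z]+)\. (?=([A-Z][a-z]+))")


def suspicious_periods(text: str) -> List[str]:
    """Return 'Word. Word' pairs that may be proper nouns split by a period."""
    return [f"{m.group(1)}. {m.group(2)}" for m in SUSPECT.finditer(text)]
```

Genuine sentence breaks such as "Paris. She" get flagged too, so the output is a review list, not a list of errors.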


If you have money to spend (and "time is money" remember) - I bet an off-shore writer/authoring house would do it for $low (like <>
Or use Mechanical Turk for similar budget - pay 1c per 25 periods or similar. Have it done twice and compare versions.
posted by Xhris at 9:07 PM on September 25, 2010
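
If the work is done twice, the comparison step could be sketched with Python's standard `difflib` module. This assumes both versions are available as plain text; the function name `differing_spans` is hypothetical.

```python
import difflib
from typing import List, Tuple


def differing_spans(version_a: str, version_b: str) -> List[Tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs wherever the two versions disagree."""
    a = version_a.split()
    b = version_b.split()
    # Compare word by word; a period stays attached to the word before it.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

Only the spots where the two versions disagree need a human decision; places where both made the same mistake will not show up.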

