That's SanDeE* with a star
April 26, 2013 1:55 PM

Sometimes I have to edit a document that contains many instances of the same proper name, such as a story about a person named Scheindler. Is there an easy way to get Word (or another program that I could paste the document into) to indicate any instances that are clearly trying to be Scheindler, but aren't (Schindler or Schiendler or Sheindler)?

One thing that helps sometimes is if Word recognizes the correct name as misspelled. Then I can verify that the name is spelled correctly, hit "ignore all" and see if any words are still underlined as misspelled. But Word doesn't always recognize even clearly non-word names as misspelled. I can set the spellcheck options not to ignore words in all caps, but I don't see a way to set it to ignore or not ignore words with the initial letter capitalized. Anyway, this won't help if the misspelling is also a dictionary word (interestingly, as I type this question, Schindler is not underlined as misspelled, while the other example names are).

I thought about using wildcards in a search, but the name will be different for every document I edit, so I'd have to create a new wildcard search each time. I may be missing something here -- would wildcards work, considering the name will be different each time?

I use both Word 2003 and 2007 (depending on what computer I'm at). I wouldn't mind copying and pasting the text into another program or webform if that would give me more options.
posted by payoto to Computers & Internet (10 answers total) 5 users marked this as a favorite
If you turn on spell checking, it will highlight anything that it can't find in its dictionary. Assuming you have already fixed all misspellings, then proper names will be all that are highlighted.

If you have a bunch of others, you can add them to your dictionary, and then they won't be highlighted any longer.
posted by Chocolate Pickle at 2:06 PM on April 26, 2013 [1 favorite]

If it's a name that wouldn't otherwise be in spell check, you could always add Scheindler to your dictionary. Then spell check ought to do it for you, assuming the name in question isn't likely to be misspelled as an already-recognized word and that, next week, you're not going to have the reverse problem of misspelling Schindler as Scheindler.
posted by Sara C. at 2:07 PM on April 26, 2013

Response by poster: Spell check isn't foolproof because the misspellings may be in the dictionary. (Like if Busch is misspelled Bush.)

I definitely do not want to add anything to the permanent dictionary - when I use this trick, I do "ignore all" instead so that it doesn't make changes beyond the current document. The names will change from day to day and it's quite possible that next week Scheindler could be a misspelling.
posted by payoto at 2:20 PM on April 26, 2013

You want a tool that lets you search by Soundex, which was initially created to match similar-sounding surnames for census results. I did some Googling and didn't find anything easily available, but maybe armed with that term you can find something.
posted by zsazsa at 2:46 PM on April 26, 2013 [3 favorites]

Best answer: Here's a super simple/stupid Soundex searcher that I just whipped up. It's ugly, but it works.
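
For anyone curious how a searcher like this can work, here is a minimal Python sketch of the idea. It is an illustration only, not the code behind the searcher mentioned above, and the function names `soundex` and `find_near_misses` are made up for this example. It computes the standard American Soundex code for the name you give it and lists every differently spelled word in the text that shares that code.

```python
import re

# Standard American Soundex digit groups.
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for word, or "" if it has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate letters with the same code; vowels do.
        if c not in "HW":
            prev = digit
    return (code + "000")[:4]


def find_near_misses(text, correct):
    """Return words that share the Soundex of `correct` but are spelled differently."""
    target = soundex(correct)
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word != correct and soundex(word) == target and word not in hits:
            hits.append(word)
    return hits
```

Calling `find_near_misses(document_text, "Scheindler")` on text containing Schindler, Schiendler, and Sheindler would list all three, since each encodes to S534. Soundex always keeps the first letter, so variants that differ in their first letter get different codes.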
posted by zsazsa at 3:18 PM on April 26, 2013 [11 favorites]

Response by poster: I didn't know Soundex existed and it's exactly what I needed. Thank you!
posted by payoto at 4:31 PM on April 26, 2013

Well, this will not help you now, but if I ever get around to working on my idea I'll come back and find this...

I've thought about writing an extension that would highlight proper nouns, basically anything not in the dictionary, and then add valid spellings to a local dictionary to prune down and help to find the misspellings.

I wonder if there is a way in Word to get a list of every word and then prune it down to find the misspellings. I can see ways of doing it with Unix utilities, but I don't know of an easy Windows tool.
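
One way to sketch the "list every word, then prune" idea outside Word is a few lines of Python, which runs on Windows as well as Unix. This is only a rough illustration: the function names are invented for the example, and it assumes the document has been saved or pasted as plain text. It counts every capitalized word, so a name spelled one way many times and another way once stands out at the bottom of the list.

```python
import re
from collections import Counter


def capitalized_word_counts(text):
    """Count each distinct word that starts with a capital letter."""
    words = re.findall(r"\b[A-Z][A-Za-z]+\b", text)
    return Counter(words)


def print_report(text):
    """Print capitalized words with their counts, most frequent first."""
    # One-off spellings sink to the bottom of the list.
    for word, count in capitalized_word_counts(text).most_common():
        print(f"{count:4d}  {word}")
```

Sentence-initial words such as "Then" show up in the list too, so some manual pruning is still needed.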
posted by sammyo at 7:53 PM on April 26, 2013

Soundex is probably a good starting place, and if you're not technically savvy it's your best bet. If you have any coding skills, what you could do is use a toolkit like Apache OpenNLP to do something more sophisticated like:

1) Turn your document into a list of sentences, using the toolkit's sentence detector.
2) For each sentence, use a part-of-speech (POS) tagger to tag each word in the sentence.
3) For all words tagged as proper nouns (NNP or NNPS), search against your list of canonical proper nouns (e.g., Scheindler and Busch are the OK proper nouns) and indicate any that don't match (or that don't match and are within a certain Levenshtein distance of the canonical proper noun).
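
The steps above can be sketched roughly as follows in Python. Note the assumptions: this does not use OpenNLP. In place of the sentence detector and POS tagger it simply treats every capitalized word as a candidate proper noun, and `levenshtein`, `suspect_names`, and the `max_distance` cutoff of 2 are names and values chosen for illustration.

```python
import re


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def suspect_names(text, canonical, max_distance=2):
    """Return (word, closest canonical name) pairs for likely misspellings."""
    suspects = []
    # Stand-in for steps 1 and 2: every capitalized word is a candidate
    # proper noun; no sentence detection or POS tagging is done here.
    for word in re.findall(r"\b[A-Z][A-Za-z]+\b", text):
        if word in canonical:
            continue
        for name in canonical:
            close = levenshtein(word, name) <= max_distance
            if close and (word, name) not in suspects:
                suspects.append((word, name))
    return suspects
```

The `canonical` list is passed in on every call, so it can be different for each document.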
posted by axiom at 2:25 PM on April 27, 2013 [1 favorite]

Response by poster: The problem that I see with having a master dictionary or canonical list is that whether or not a name is misspelled is context-dependent. Say today I'm editing a story about George Bush and need to correct all instances of "Busch." I can't just add "Bush" to a master dictionary because if tomorrow I get a story about August Busch I need it to ferret out instances of "Bush."

What is so elegant about zsazsa's Soundex searcher is that 1) it lets me define the "correct" spelling anew every time I use it, and 2) I don't need to know the universe of possible misspellings. It isn't foolproof (I just tested it on Catherine/Katherine and it doesn't identify the latter as matching the Soundex code of the former), but it gets me most of the way there.
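
Since Soundex always keeps the first letter, a first-letter swap like Catherine/Katherine has to be caught some other way. One possible supplement, sketched here in Python under the assumption that a plain string-similarity score is good enough, is `difflib.SequenceMatcher` from the standard library. The function name `similar_words` and the 0.8 threshold are arbitrary choices for illustration, not part of zsazsa's tool.

```python
import re
from difflib import SequenceMatcher


def similar_words(text, correct, threshold=0.8):
    """Return words that closely resemble `correct` without matching it exactly."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word == correct or word in hits:
            continue
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the two strings share fewer characters in order.
        ratio = SequenceMatcher(None, word.lower(), correct.lower()).ratio()
        if ratio >= threshold:
            hits.append(word)
    return hits
```

As with Soundex, the "correct" spelling is supplied fresh on each call, and a lower threshold would catch more variants at the cost of more false alarms.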
posted by payoto at 10:27 AM on April 28, 2013

Sure. When I say 'canonical' I just mean 'for the purposes of the current document'. So 'Busch' is the canonical spelling when writing about August Busch, but not about the former President.
posted by axiom at 9:47 AM on April 29, 2013
