Writing MS Word macro to convert glossary formatting to Excel table
January 25, 2016 8:27 AM   Subscribe

I need some guidance in writing a macro that will parse a glossary Word document and reformat it into an Excel spreadsheet with the term in one column and the definition in a second column.

The formatting of each glossary entry thankfully is consistent as such:

Term Definition goes here and is not bold.
posted by beammeup4 to Computers & Internet (5 answers total) 2 users marked this as a favorite
 
I don't think that you really need to write your own macro. Just paste the text into Excel and then you should be able to use the "Text to Columns" function to separate the first word into the first column and the definition into the second column. It might take a couple tries to figure out the best way to delimit the term column, but that's still much easier than writing a macro.
posted by hydropsyche at 9:10 AM on January 25, 2016


Response by poster: That would work if the terms were all a single word. Some of the terms are phrases in my situation.
posted by beammeup4 at 9:25 AM on January 25, 2016


If there is no delimiter between the word and the body of the definition, Text-to-Columns may not work (but kudos to Hydropsyche anyway, that would have been my first stab too). Breaking analysis on "bold" is a weird one. How many multi-word's are there? Maybe handle them in a second manual step?

Is boldness literally the only differentiator between the word and the definition? No tab or colon or anything else useful? Are the multi-words hyphenated?

Last but not least, if you paste into Excel, and text to column with space as a delimiter, does Excel recognize the cells with bolded text as bolded cells or just bolded text? This would be the key to making this a macro (you need to be able to check for FontStyle = bold). If that part goes smoothly, then row-by-row you concatenate the bold pieces back together (replacing the spaces) and the non bolded piece back together (replacing the spaces).

Hopefully, it doesn't come to that!
posted by milqman at 9:46 AM on January 25, 2016


The best approach will depend on how proficient you are in VBA (e.g. working in Excel object model from Word or in Word object model from Excel) and how frequently you need to perform this task.

If less proficient, I would keep it simple and build on hydropsyche's suggestion. You just need to insert a delimiter between Term and Definition, which you can base on the Bold state of the word (or character if you prefer). The below should get you started.
Dim p As Paragraph
Dim iWord As Integer

For Each p In Selection.Paragraphs
For iWord = 1 To p.Range.Words.Count
If Not p.Range.Words(iWord).Bold Then
'insert a delimiter e.g. "|" here
p.Range.Words(iWord).InsertBefore ("|")
Exit For
End If
Next iWord
Next p
Then you can delimit on that pipe character (either in Word or after copying to Excel) to split it into two separate columns.

If you're fairly proficient in VBA and you need to do this regularly then you can write a macro to create a new Excel workbook directly from Word, add the text, format it as you required etc. I have some VBA which does this, dumping a document's comments' text, the text they refer to, section name etc. into Excel. It works very nicely but takes more time to write obviously. If you'd like my code to start you off just shout and I can PM it to you.
posted by NailsTheCat at 9:53 AM on January 25, 2016 [1 favorite]


Best answer: It is possible in Word to set Find and Replace to work on formats only:

I only have Office 2003(!) on my home machine in front of me, but look for something like:

Find >>> Format >>> Font >>> Font Style >>> Bold

in the Find and Replace dialogue box to find all bold text.

It is then also possible to use Find and Replace to actually Find and Append.

Use the string ^& in the Replace with box, and then add a delimiter e.g. ^&_

So in summary Find: Format: Bold combined with Replace with: ^&_ will append an underscore after each section of bold text in each paragraph (and not inbetween individual bold words, unless your text has crazy space formatting).

You should then be able to use text to columns in Excel as described above with e.g. underscore as a delimiter to break the text in two.
posted by protorp at 12:03 PM on January 25, 2016 [2 favorites]


« Older How do I identify a cabinet hinge to order...   |   Sorry I had to shoot your dog Newer »
This thread is closed to new comments.