MS Word formatting mess - fix or start over
April 30, 2024 8:55 AM

I have a 200-page doc that was created over 20 years ago and edited annually by different people with poor knowledge of MS Word. Should I try to fix it or start over?

When I say people with poor MS Word skills, think of folks who use the space bar to move to a new line, enter a random assortment of tabs and indents all over the place, rarely use Styles, enter hard returns instead of page breaks, and change margins for no apparent reason.

There are quite a few images, tables, and footnotes in this doc as well.

I made a copy and tried to just select all and set the margins to "Moderate," but this didn't fix much and made a bunch of other messes.

Importantly, there are tons of active Track Changes that I don't want to just accept before doing this overhaul.

What is my quickest route to fixing this beast?

We are using 365, if that matters.
posted by Frenchy67 to Computers & Internet (16 answers total) 6 users marked this as a favorite
 
Something close to starting over, if it were me. I'd paste it unformatted into a new document, create Styles, and then apply styles using as many keyboard shortcuts as possible. (E.g. the shortcuts for Heading 1, Heading 2, etc.)
posted by snarfois at 9:10 AM on April 30 [13 favorites]


"I'd paste it unformatted into a new document"

And do that by pasting it into a good plain text editor like Notepad++ so you don't accidentally bring over any remaining invisible Word cruft, then paste THAT plaintext into a new Word doc and start formatting.
posted by briank at 9:42 AM on April 30 [8 favorites]


And once you do paste it into a new Word doc, you should use the search or search & replace functions to get rid of double spaces, extra tabs, and other unwanted formatting characters. Use the "Special" menu in the Find dialogue box to help.
posted by Leontine at 9:58 AM on April 30 [4 favorites]


Watch out for how copying out as suggested above impacts the track changes. I think you’ll really want to address those and THEN fix the formatting.
posted by rustcellar at 10:01 AM on April 30 [9 favorites]


Sometimes saving to a different format, like LibreOffice's .odt, can be helpful in clearing crud (sometimes it creates more crud though, ymmv)

The Format Painter tool can be helpful in wiping out hidden format changes inside of text.

You can create versions with changes rejected and accepted and compare them to make a new set of tracked changes (this will lose the info on who made the changes, though, so might not be what you want)
posted by music for skeletons at 10:04 AM on April 30


I will +1 the advice to copy it into a plain text editor to banish all the formatting completely before pasting back into Word—if in fact this needs to be a Word doc.

Also, 200 pages is really big. I would urge you to consider breaking it into multiple smaller files.
posted by adamrice at 10:05 AM on April 30 [4 favorites]


I have done this many times, though not to anything 200 pages long.

You can't just convert to plain text and back because of the images, tables, track changes, and footnotes, though if you find you have some very tangly bits of text that you just can't seem to get cleaned up, consider doing that for some sections. I sometimes recreate tracked changes (if what matters is the change and not who suggested it) by saving the original text with no changes, saving the final text based on accepting all changes, and using Word's document Compare function to redline the changes for me.
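
To make the compare idea concrete outside Word, here is a minimal Python sketch that redlines two plain-text versions word by word using the standard library's difflib. It is not Word's Compare function and knows nothing about .docx files; the function name `redline` and the [-deleted-] / {+inserted+} markers are assumptions chosen for illustration.

```python
import difflib


def redline(original: str, final: str) -> str:
    """Return the text with deletions as [-...-] and insertions as {+...+}."""
    a = original.split()
    b = final.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            # Words present only in the original version
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            # Words present only in the final version
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(redline("the quick brown fox", "the slow brown fox"))
```

Like the Compare approach, this shows what changed but not who changed it.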

Make a copy to work with so you aren't messing up the original document.
Set all images to flow with text rather than be anchored where they are.
Remove all existing text styles by setting *everything* to 'Normal', and define your Normal style as simply as possible -- single spacing, no indent, no extra space between paragraphs. You want to be able to *see* problems.
Find and replace all tabs, double new lines, double spaces, etc. You can do this with regex but I usually don't bother and just run it with, say, find 10 spaces, replace with 1, then find 5 spaces, replace with 1, then find 2 spaces, replace with 1. But I do this to smaller documents than you are working with, so you might find figuring out regex is worth the trouble.
Go through manually and spot other places where there is weird formatting caused by tabs, spaces, or newlines that you might not have caught with your search strings. Temporarily bumping the Normal style up to a larger font size for one scroll through the document can also help spot problems because it creates more automatic line breaks.
Show formatting and remove any section breaks -- this might fix your changing margins problems. I'd probably also remove any manually inserted page breaks and define them back into heading styles later if they're needed.
Once you have the document in as plain a state as you can get it into, go back and apply heading styles. Normally I wouldn't want to strip these all out and start over if I didn't have to, but you said they're barely used anyway, so it's probably just as well to start fresh.
Check your original document for any bulleted or numbered lists and recreate them with a style. Be sure to delete any artifact bullets or numbering. If your document doesn't contain a lot of numbers otherwise, you can sometimes do this with find and replace also.
Check your original document for tabular data created with spaces or tabs, paste the original data into Excel, and use its plain text parsing function to see if you can get it into an actual table format, then paste it back into Word.
Adjust the actual heading / normal / bullet styles to what you want them to be in the final document.
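
For anyone weighing the repeated-passes approach against regex in the find-and-replace step above, here is a minimal Python sketch on plain text. It is only an illustration: Word's own wildcard search uses a different syntax, and the function names and the 10/5/2 pass sizes are assumptions taken from the example numbers in that step.

```python
import re


def collapse_with_regex(text: str) -> str:
    """One pass: any run of two or more spaces becomes a single space."""
    return re.sub(r" {2,}", " ", text)


def collapse_with_passes(text: str, sizes=(10, 5, 2)) -> str:
    """Repeated literal find-and-replace, largest run first."""
    for n in sizes:
        run = " " * n
        # Keep replacing until no run of this size is left
        while run in text:
            text = text.replace(run, " ")
    return text


if __name__ == "__main__":
    messy = "Section 1" + " " * 13 + "Overview  of   the    program"
    print(collapse_with_regex(messy))
    print(collapse_with_passes(messy))
```

Both functions end up with the same text; the regex version just gets there in one pass.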
posted by jacquilynne at 10:19 AM on April 30 [16 favorites]


If the nuclear option of removing all formatting is too severe, there's a less drastic one: Select all except for the last paragraph mark (the ¶ at the end of the paragraph when "show invisibles" is turned on). Copy and paste into a new document. This gets rid of a lot of document cruft, document-specific styles (if any), and some other stuff, but it does not eliminate all formatting.
posted by adamrice at 12:17 PM on April 30 [2 favorites]


jacquilynne has it.

I don't know how you do this efficiently without dealing with the track changes first. In my experience, copying and pasting a doc with track changes into notepad or whatever will pull over the comments/track changes and be worse than just dealing with them in the document.

You will need to do a series of different things, with the order depending on what is in the document and what impacts what, and you'll have to live with the fact that doing one thing might mean you have to fix something different later. So think through the impacts of each change and which is more annoying - a bulk edit that requires going back and fixing some things that the bulk edit screws up, or individually making that edit one instance at a time.

Find and Replace is more useful than you may realize; you can use it to find special characters (tabs, line breaks, etc) and you can replace with nothing to delete things. I've done something like this on a much smaller scale to edit automated transcripts that came with the timestamp codes, which I needed to turn into normal sentences in paragraphs.

I was going to list out a possible order of operations but jacquilynne has it covered. The key is some things might get worse as you try to make them better, and you will need to iterate. Do one thing; if more of the document is made better than worse, keep going; if not, undo and try something else first.
posted by misskaz at 1:24 PM on April 30


n-thing the suggestions to handle the Track Changes stuff first. It'll be far nastier if you do a bunch of reformatting first.

If the previous writers didn't use Styles, the paste-into-plaintext may not help much, especially since you want to preserve footnotes, images, and tables.

Given your description of, e.g., spaces used to move to the next line, I think Advanced Find and Replace is your best friend. To fix that, maybe replace each long run of spaces (not the single spaces between words) with ^p, then replace ^p^p with ^p, repeating these as necessary.

Er, if they also used repeated ^p (end of paragraph) as a page break, first replace those runs of ^p with ^m (manual page break).
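
As a rough illustration of that replace sequence outside Word, here is a minimal Python sketch on plain text. It assumes a run of five or more spaces was used to push text onto a new line (the `min_run` threshold is a made-up parameter, so tune it to the document), and "\n" stands in for Word's ^p.

```python
import re


def spaces_to_paragraphs(text: str, min_run: int = 5) -> str:
    """Turn long runs of spaces into paragraph breaks, then merge repeated breaks."""
    # Step 1: a long run of spaces was standing in for a new line
    text = re.sub(r" {%d,}" % min_run, "\n", text)
    # Step 2: the equivalent of replacing ^p^p with ^p until none remain
    text = re.sub(r"\n{2,}", "\n", text)
    return text


if __name__ == "__main__":
    sample = "First line." + " " * 40 + "Second line.\n\n\nThird line."
    print(spaces_to_paragraphs(sample))
```

Single spaces between words are left alone, which is the main thing to check before running any bulk replace like this.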

It sounds like they used tabs like they were using a typewriter. There's probably no easy fix... so keep Show ¶ on, reduce the tabs to a minimum, and convert to tables.
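
If it helps to see what "convert to tables" means for typewriter-style text, here is a minimal Python sketch that splits tab- or space-aligned lines into cells. It assumes that any tab, or any run of two or more spaces or tabs, marks a column gap; the function names and sample rows are made up for illustration, and real documents will have lines that break that assumption.

```python
import re

# A column gap: two or more spaces/tabs in a row, or a single tab
COLUMN_GAP = re.compile(r"[ \t]{2,}|\t")


def split_columns(line: str) -> list[str]:
    """Split one typewriter-style line into its cells."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip())]


def to_rows(text: str) -> list[list[str]]:
    """Split every non-blank line of the text into a row of cells."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "Name\t\tDept    Phone\nAnn Lee\tSales  555-0100\n"
    for row in to_rows(sample):
        print(row)
```

Rows that come out with a different number of cells than the rest are the ones to fix by hand.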

You should be able to apply a single set of margins to the entire document.
posted by zompist at 8:57 PM on April 30


Response by poster: You folks are amazing! Thank you all for the very detailed (and speedy!) advice. I think I will finish the track changes stuff and when the content is all finalized I'll be able to start over and get the formatting sorted using your recommendations. Feeling very grateful for this community.
posted by Frenchy67 at 6:48 AM on May 1


I've done this many times and have found that carrying over anything other than plain text brings with it formatting hang-overs. This also applies with tables and images. I agree you should deal with the tracked changes first, or you'll be in for a world of hurt. The way I've found the most successful is to:
  • create a new blank document and set up styles, margins etc how you want them
  • set the two documents side-by-side on your screen
  • starting at the beginning, cut (not copy) a paragraph at a time and paste it in as 'text only' (you can set this as the default if that makes it easier). Why cut rather than copy? Because it's much easier to keep track of where you're up to and reduces risk of missing anything
  • use your styles to format headings etc as you go and, keeping paragraph marks displayed, delete everything that isn't a space or hard return where they're needed
  • for tables, recreate the table with the required rows/columns and paste each cell in as text
  • for images, cut and paste or save and import them where and how you want them
It sounds tedious and it is, although you'll quickly get into a rhythm and speed will increase dramatically as you go. It's the only way I've found to be sure of not importing all that hidden shit Word creates every time you look away from the screen and all the ugly formatting that people use because they don't know what they're doing. I miss WordPerfect 5.1 and the 'reveal codes' feature every time I do this.
posted by dg at 6:06 PM on May 1 [2 favorites]


I found that a tedious job like this is an excellent way to really get to know a piece of software, especially all those things that save you time like styles and keyboard shortcuts.
posted by snarfois at 2:43 AM on May 2 [1 favorite]


The Reveal Formatting and Select All Text With Similar Formatting tools may be your friends.

I echo that becoming a find and replace wizard will help you. I believe there's also a way to select all text that matches and perform operations on it.
posted by lookoutbelow at 6:33 AM on May 2


I love doing this stuff. You've received some great advice but if someone on my team asked me my opinion on the question, my answer would be "oooh give it to me and let me clean it up for you." This isn't an offer... or is it?
posted by janey47 at 10:24 PM on May 13


Response by poster: Hey all, just wanted to let you know that it's done and your advice was invaluable.

Lessons learned:

- It's amazing how much crud a straight copy-paste was keeping. Putting into Notepad first made things much better.
- Using Styles was really helpful. I'd somehow never bothered except for Headings, but it works quite well.
- I need to monitor staff doing this type of work much more closely than I would have expected.

Still, we have a beautiful clean doc that's easy to read and easy to edit. In fact, it worked so well that we also tackled another long problematic doc that I thought would have to wait until next year.

Thank you all so much.
posted by Frenchy67 at 10:58 AM on July 5 [1 favorite]

