What is a word?
March 22, 2005 2:11 PM

WordFilter: I'm using Microsoft Word as my word processing software for a very, very large project (my PhD). I have a strict limit of 100,000 words to follow, and although it seems as if that's a lot, I'm feeling the pinch right about now. The issue I'm having is that I get conflicting reports when I do a word count, using 2 different methods...

Using Word's own counting algorithm, I get a significantly higher count than I do when I do an Alt-Enter on the document in Win 2K Pro. At first, I thought that Word's count might be better, but then I read this explanation from Microsoft that seems to indicate that discrete characters, line header numbers, and the like might be counted as words.

So what contributes to the differences in the counts? Is one more accurate than the other? I'm hoping the Win 2K count is more precise, as it gives me a little more wiggle room, but I'm prepared for the answer either way. Also, does the Win 2K method account for footnotes and endnotes? I assume it does, as those are still words within the document, although at this point, anything might be true!
posted by yellowcandy to Writing & Language (13 answers total)
 
Algorithms seem to differ widely.

For example, if you export your Word document to text and use 'wc' on a UNIX platform (Cygwin/Linux/OS X etc.) you will get the same results as in Word's algorithm: the page numbering and other header and footer information will be exported along with the text, and 'wc' will scan them.
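As a rough illustration of why that happens, here is a minimal Python sketch of what `wc -w` does. It treats every maximal run of non-whitespace characters as one word, and `str.split()` with no arguments is a close approximation of that rule. The sample header ("Chapter 3 Methods") and page number ("12") are made up for illustration. Word's own counting rules are not reproduced here.

```python
def wc_words(text: str) -> int:
    """Approximate `wc -w`: count maximal runs of non-whitespace characters."""
    return len(text.split())


body = "The results are summarised in Chapter 4."
# Hypothetical plain-text export in which the running header and the page
# number were written out along with the body text.
exported = "Chapter 3 Methods\n" + body + "\n12\n"

print(wc_words(body))      # 7
print(wc_words(exported))  # 11: the header and page number add 4 "words"
```

On a long document, every exported page contributes its header and footer tokens, so the gap between the two counts grows with the page count.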

On the other hand, Word 2004 for Macintosh will give you the option of enabling footnotes and captions in a word count. By default these are not included in calculations.
posted by AlexReynolds at 2:24 PM on March 22, 2005


I would ask whoever would enforce the limit how they would count the words. That's all that matters.
posted by smackfu at 2:28 PM on March 22, 2005


...in fact, you should probably ask what program they'll be using to count the words, and run it through that program. You don't need to worry about what counts as a word to that program as long as you get the right results.
posted by nebulawindphone at 2:36 PM on March 22, 2005


In my dissertation group, a woman who was quite the Word expert said that Word was not really designed for documents longer than 100 pages. She suggested keeping the chapters in separate files until the very end, then putting them together. Because of my field, my dissertation ended up being only about 140 pages, but some people with dissertations in the 300+ page range ended up doing even the final copy in a couple of files.
posted by abbyladybug at 2:37 PM on March 22, 2005


Yeah, I just played around with Word myself and can confirm that every discrete character is counted as a "word".

At first I thought smackfu's answer was flippant but it is probably spot on. You need to know how this will be checked and then work around the constraints in that method.

Anyway, assuming they will use Word too:

The count includes symbols such as a bullet in a bulleted list. So, use your symbols carefully, and keep characters together.

-Use symbols carefully
-If you are closing a parenthesis, for example, don't leave a space between the last word and the closing parenthesis.
-If a word can optionally be hyphenated, hyphenate it; then it counts as one word instead of two.

Etc.
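To see why these tips help, here is a small Python sketch. It assumes the counter treats any whitespace-delimited run of characters as one word, so a symbol standing alone between spaces is counted on its own. This is an approximation chosen for illustration. Word's exact rules may differ in detail, and the sample phrases are invented.

```python
def count_tokens(text: str) -> int:
    # Assumption: a "word" is any whitespace-delimited run of characters,
    # so a lone symbol surrounded by spaces counts as a word by itself.
    return len(text.split())


# Each pair is (wasteful form, tighter form) of the same text.
cases = {
    "bullet as separate token": ("• First point", "•First point"),
    "space before close paren": ("(see Smith 2001 )", "(see Smith 2001)"),
    "open vs hyphenated compound": ("well known result", "well-known result"),
}

for label, (before, after) in cases.items():
    print(f"{label}: {count_tokens(before)} -> {count_tokens(after)}")
# bullet as separate token: 3 -> 2
# space before close paren: 4 -> 3
# open vs hyphenated compound: 3 -> 2
```

Each fix saves only one word, but across a 100,000-word thesis with many bulleted lists and parenthetical citations the savings add up.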
posted by vacapinta at 2:39 PM on March 22, 2005


I haven't used it for ages, but I believe there is a feature termed 'master documents' or some such that essentially allows you to keep sections, like chapters, in discrete docs.

I would loudly argue that heds, pagination, maybe even a TOC and index do not count toward your total if you feel that you must go long.

Furthermore, go long anyway, but with an eye to the cut. Just think! You can enter postdoc with your first paper already written!

Hope this helps. Good luck!
posted by mwhybark at 5:57 PM on March 22, 2005


I strongly recommend you abandon Word post-haste. It is almost guaranteed that it is going to leap up and bite you irrevocably in the ass.

The Word support groups are full of horrifying stories about large files being irreparably damaged by Word. It is simply incompetent at large-document processing.

This is why most postgraduate theses are typeset using TeX, FrameMaker, Ventura, or other long-document publishing software.

At the very least, you should be making backup copies like freaking mad, and do so in Word, RTF, and plaintext (ASCII) formats.

(Also, you're going to find that it buggers up pagination, figure numbering, the ToC, and other niceties, particularly if you do anything even remotely fancy, like footnotes or cross-references.

You Have Been Warned.)
posted by five fresh fish at 6:16 PM on March 22, 2005


Response by poster: Word has its problems, but I'm not going to suddenly abandon what I've been using for years and migrate to something different in the very last stretch. Word works just fine, thanks, and I'll just have to ignore the Chicken Little warnings. I also don't know a single PhD student who's using those packages to write a dissertation, and most especially not in the UK.

I do appreciate the independent confirmation of the characters counting towards the final word count. That helps me indicate I've done my due diligence if challenged on the length.

I'm not submitting the document electronically; it gets printed, bound, and then sent off to readers. It's only if one of them (or someone in the administrative offices) asks for a word count that things get sticky. My hunch is that this won't happen unless one of the two readers has a real issue with the length. I think the document will end up being just under the limit, so the question may well be moot, but it's just nice to have a way of showing that I didn't just invent a word count value for the document.

Printing the thing on A4 paper while in New York will be another hassle altogether... but that's another question for another time.
posted by yellowcandy at 10:21 PM on March 22, 2005


yellowcandy: How hard is that commitment to 100,000 words? My own (UK) institution (OU) had a 100,000 word limit on PhD theses but mine came in at 115,000 and I got permission to submit an extended version pretty much on the nod from my supervisor.
posted by biffa at 3:13 AM on March 23, 2005


Here (Leeds) the limit is in pages rather than words, but I suspect that in any institution where submission is on paper, they're not going to be that fussy. In fact, I suspect that there's a rule-of-thumb for converting page count to word count used by your research degrees office. And that they don't bother checking unless it looks heavy anyway.

At the risk of going off topic: whilst I realise that "use LaTeX" is not a helpful answer, and that if you're close to the end there's no point in switching, I'm going to have to challenge your assertion that UK PhD students all use Word. Nobody in my lab does. And having taught "dissertation skills with Word" in a previous incarnation, I'd second mwhybark's master document + separate chapters approach. You could then word-count the actual chapters and add them up (that way you'd not be word-counting the TOC or bibliography).
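If each chapter is also saved as plain text, the count-the-chapters-and-add-them-up approach can be scripted. The following is a minimal Python sketch under some stated assumptions. The chapters are exported to UTF-8 `.txt` files whose names match a hypothetical `chapter*.txt` pattern, and a "word" is a whitespace-delimited token (roughly what `wc -w` reports, not necessarily what Word reports). The TOC and bibliography are left out only because they are not saved under names matching the pattern. If your regulations count the bibliography, give it a matching name or widen the pattern.

```python
import tempfile
from pathlib import Path


def count_words(text: str) -> int:
    # Whitespace-delimited count, roughly what `wc -w` reports.
    return len(text.split())


def total_chapter_words(folder: Path, pattern: str = "chapter*.txt") -> dict[str, int]:
    """Count each file in `folder` matching `pattern`, plus a grand total."""
    counts = {
        p.name: count_words(p.read_text(encoding="utf-8"))
        for p in sorted(folder.glob(pattern))
    }
    # The total is computed before the "TOTAL" key is inserted.
    counts["TOTAL"] = sum(counts.values())
    return counts


# Demo with throwaway files; the file names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "chapter1.txt").write_text("Introduction to the problem.", encoding="utf-8")
    (folder / "chapter2.txt").write_text("Methods used in the study here.", encoding="utf-8")
    (folder / "bibliography.txt").write_text("Smith, J. (2001). A Book.", encoding="utf-8")
    print(total_chapter_words(folder))
    # {'chapter1.txt': 4, 'chapter2.txt': 6, 'TOTAL': 10}
```

Keeping a per-chapter breakdown alongside the total also shows where the cuts would do the most good.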
posted by handee at 5:58 AM on March 23, 2005


I used the separate chapters approach as it seemed to make sense to do so, but check your uni regs before employing handee's approach to word-counting; mine made it clear that all examinable material was included in the word count, including the bibliography. (I did use Word myself.)
posted by biffa at 6:38 AM on March 23, 2005


Response by poster: biffa: the limit is set in stone (as is everything at Oxford), and asking for an extension is more hassle than cutting some sections.

I am using a separate chapters approach and may even avoid aggregating the document entirely, after reading this thread.

Thanks to everyone!
posted by yellowcandy at 2:12 PM on March 23, 2005


I have heard horror stories about Word's Master Document feature and dissertations. I'd steer clear. In my experience, any attempts to abandon Word in academia are met with confused and vacant stares. Most profs work in Word, and they want something they can open.
posted by abbyladybug at 6:08 AM on March 24, 2005

