Examining 300 MS Word documents, looking for shortcuts
February 24, 2011 7:03 AM

How would one go about the task of recording the word-counts of hundreds of MS Word documents? Is this something that could be automated?

On Monday, I will be receiving about 300 documents in MS Word format. I am looking for a quick way to produce a list that would contain the filename of each file, followed by the number of words in that file. However, I am unsure about the direction I should take with this. Is this something that can be handled with Word macros, or does it call for VBA or a more specialized solution?

If at all possible, I would like to avoid examining each of the 300 files manually and individually.

I am using Word 2007 on Windows 7.
posted by cac to Computers & Internet (4 answers total)

Are they all .docx files? If so, you can get the word count from right-click > Properties > Details.
A .docx file is really a zip archive containing XML files. One of those files, docProps/app.xml, holds the word count that shows up in the file properties.

I'm guessing you could write a script to read this property from the XML and produce a report.
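
To see what that looks like in practice, here is a minimal Python sketch that opens a .docx as a zip archive, lists its parts, and returns the raw text of docProps/app.xml. The function name `show_app_xml` is made up for illustration, and this assumes the file really is a .docx (an older binary .doc is not a zip archive and will fail here).

```python
import zipfile


def show_app_xml(path):
    """Return the raw docProps/app.xml text from a .docx file.

    Hypothetical helper for illustration; `path` is any .docx file.
    """
    with zipfile.ZipFile(path) as z:
        # namelist() shows every part stored inside the archive.
        print(z.namelist())
        # app.xml holds the document statistics, including <Words>.
        return z.read("docProps/app.xml").decode("utf-8")
```

Printing the result for one of the documents should show a `<Words>` element somewhere in the XML, which is the value a report script would need to pull out.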
posted by fozzie33 at 7:23 AM on February 24, 2011

also found this script if they are docx files...
posted by fozzie33 at 7:38 AM on February 24, 2011

Here's a simple Python script that will open up a file and print the word count.

>>> import zipfile, xml.dom.minidom
>>> z = zipfile.ZipFile("...")
>>> xml.dom.minidom.parseString(z.read('docProps/app.xml')).documentElement.childNodes[3].firstChild.nodeValue

You can easily extend this to read the word counts of all the files in a directory and write the results to a file.

I've hard-coded in the position of the Words element in the app.xml file; this may not work, in which case you will need to bring in a more full-featured XML parser to do the job.
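
One way the directory version might look, as a rough sketch rather than a tested solution: it looks up the Words element by name instead of by position, loops over every .docx file in a folder, and writes a CSV of filename and word count. The names `docx_word_count` and `write_report` are invented for this example, and it assumes every file is a real .docx with a docProps/app.xml part. The count is whatever Word stored the last time the document was saved.

```python
import csv
import glob
import os
import zipfile
import xml.dom.minidom


def docx_word_count(path):
    """Return the word count stored in docProps/app.xml, or None if absent."""
    with zipfile.ZipFile(path) as z:
        dom = xml.dom.minidom.parseString(z.read("docProps/app.xml"))
    # Find the element by tag name rather than relying on its position.
    nodes = dom.getElementsByTagName("Words")
    if not nodes or nodes[0].firstChild is None:
        return None
    return int(nodes[0].firstChild.nodeValue)


def write_report(folder, report_path):
    """Write one 'filename, words' row per .docx file found in folder."""
    with open(report_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "words"])
        for path in sorted(glob.glob(os.path.join(folder, "*.docx"))):
            name = os.path.basename(path)
            # Skip the "~$" lock files Word creates while a document is open.
            if name.startswith("~$"):
                continue
            writer.writerow([name, docx_word_count(path)])
```

Calling `write_report` with the folder that holds the documents and an output path should give a file that opens directly in Excel. A file that is not a valid zip archive will raise `zipfile.BadZipFile`, so old-style .doc files would need converting first.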
posted by katrielalex at 1:53 PM on February 24, 2011
