Examining 300 MS Word documents, looking for shortcuts
February 24, 2011 7:03 AM
How would one go about the task of recording the word-counts of hundreds of MS Word documents? Is this something that could be automated?
On Monday, I will be receiving about 300 documents in MS Word format. I am looking for a quick way to produce a list that would contain the filename of each file, followed by the number of words in that file. However, I am unsure about the direction I should take with this. Is this something that can be handled with Word macros, or does it call for VBA or a more specialized solution?
If at all possible, I would like to avoid examining each of the 300 files manually and individually.
I am using Word 2007 on Windows 7.
Are they all .docx files? If so, you can see the word count by right-clicking a file and looking under Properties > Details...
A .docx file is really a zip archive of XML files. One of those files, app.xml, holds the word count that shows up in the file properties...
I'm guessing you could write a script to read this property from the XML and produce a report...
posted by fozzie33 at 7:23 AM on February 24, 2011
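As a minimal sketch of the script suggested above (Python 3 standard library only), the function below opens one .docx as a zip archive and reads the Words element from docProps/app.xml. It assumes the file is a .docx; old binary .doc files are not zip archives and will raise zipfile.BadZipFile. The name docx_word_count is hypothetical. Note that it returns the count Word stored when the file was saved, not a fresh recount of the text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by docProps/app.xml (Office Open XML extended properties).
EXT_PROPS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def docx_word_count(path):
    """Return the word count stored in a .docx file, or None if it is absent.

    Hypothetical helper: reads the saved <Words> value; it does not recount.
    """
    # A .docx file is a zip archive; summary statistics live in docProps/app.xml.
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("docProps/app.xml"))
    # Look the <Words> element up by name instead of relying on its position.
    words = root.find("{%s}Words" % EXT_PROPS_NS)
    if words is None or words.text is None:
        return None
    return int(words.text.strip())
```

Usage, with a hypothetical filename: `docx_word_count("report.docx")` returns an int such as 178.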
I also found this script, which works if they are .docx files:
http://blog.kiddaland.net/2009/06/office-2007-metadata/
posted by fozzie33 at 7:38 AM on February 24, 2011
Here's a simple Python script that will open up a file and print the word count.
>>> import zipfile, xml.dom.minidom
>>> z = zipfile.ZipFile("...")
>>> xml.dom.minidom.parseString(z.read('docProps/app.xml')).documentElement.childNodes[3].firstChild.nodeValue
u'178'
You can easily extend this to reading the word counts of all the files in a directory and writing the answers to some file or other.
I've hard-coded the position of the Words element in the app.xml file; that position may not be the same in every file, in which case you will need to look the element up by name, perhaps with a more full-featured XML parser.
posted by katrielalex at 1:53 PM on February 24, 2011
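Extending the session above to a whole folder might look like the following minimal sketch (Python 3). It keeps the same minidom module but finds Words with getElementsByTagName, so the element's position no longer matters, and writes a CSV with one filename and word count per row. The function names word_count and write_report, the *.docx pattern, and the two-column CSV layout are assumptions for illustration; any file that cannot be read as a .docx gets a blank count rather than stopping the run.

```python
import csv
import glob
import os
import zipfile
import xml.dom.minidom


def word_count(path):
    """Hypothetical helper: read the stored <Words> value from a .docx file."""
    with zipfile.ZipFile(path) as z:
        dom = xml.dom.minidom.parseString(z.read("docProps/app.xml"))
    # Find <Words> by tag name, so its position among the siblings doesn't matter.
    nodes = dom.getElementsByTagName("Words")
    if not nodes or nodes[0].firstChild is None:
        return None
    return int(nodes[0].firstChild.nodeValue.strip())


def write_report(folder, out_csv):
    """Write a 'filename,words' row for each .docx in folder; return the row count."""
    rows = 0
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "words"])
        for path in sorted(glob.glob(os.path.join(folder, "*.docx"))):
            try:
                count = word_count(path)
            except (zipfile.BadZipFile, KeyError):
                # Not a zip archive, or no docProps/app.xml inside: leave blank.
                count = None
            writer.writerow([os.path.basename(path), "" if count is None else count])
            rows += 1
    return rows
```

With hypothetical paths, `write_report(r"C:\incoming", r"C:\wordcounts.csv")` produces a file that opens directly in Excel. Writing the CSV outside the input folder keeps it out of the way of the files being counted.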
posted by fozzie33 at 7:11 AM on February 24, 2011