Batch convert Word docs to text files?
July 9, 2009 12:29 PM

Does anyone know of a utility to convert a large number (hundreds) of Word documents to text files in Windows XP? They are in Word 2003 (.doc) format. The Word docs are about 1 MB each because they contain images, but the text files they turn into are only about 4 KB. I want to run a text-processing script I wrote in Perl on them, but I'm not sure how to get them into text format in the first place without opening and converting each one by hand. I'm looking for free solutions if possible, just because it's a pain for me to pay for software at work.
posted by yarrow to Computers & Internet (5 answers total) 3 users marked this as a favorite
 
http://wvware.sourceforge.net/
http://directory.fsf.org/project/catdoc/
posted by devbrain at 1:19 PM on July 9, 2009


p.s. for XP, via cygwin -- http://mirror.nyi.net/cygwin/release/catdoc/
posted by devbrain at 1:20 PM on July 9, 2009
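
For the catdoc route suggested above, a batch run might look like the following minimal Python sketch. It assumes catdoc is installed and on the PATH (for example via cygwin) and that it writes the extracted text to standard output. The function names txt_path_for and convert_folder are made up for illustration, and none of this has been tried against real Word 2003 files.

```python
import subprocess
from pathlib import Path


def txt_path_for(doc_path):
    """Return the sibling .txt path for a .doc file (only the suffix changes)."""
    return Path(doc_path).with_suffix(".txt")


def convert_folder(folder, catdoc="catdoc"):
    """Run catdoc on each .doc file in `folder`, saving its stdout as a .txt file.

    `catdoc` is the command name to invoke; it is assumed to be on the PATH.
    Returns the list of .txt paths that were written.
    """
    written = []
    for doc in sorted(Path(folder).glob("*.doc")):
        out = txt_path_for(doc)
        with out.open("wb") as fh:
            # catdoc prints the extracted text to standard output,
            # so redirect that stream straight into the output file.
            subprocess.run([catdoc, str(doc)], stdout=fh, check=True)
        written.append(out)
    return written
```

Calling convert_folder(r"C:\my doc files") would leave one .txt file next to each .doc file. Only the file extension is swapped, so a folder name that happens to contain "doc" is left alone.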


I think this would work, but I'm not sure: I'd try recording a Word macro that opens a doc and exports it as txt (trivially easy to do via "Record Macro"), then edit the macro to parameterize it, and finally run it from outside of Word. That last step is also easy, but the PC with my examples on it is miles away from here, so I can't post any.
posted by anadem at 1:24 PM on July 9, 2009


A Word macro saved in your normal.dot, something like this (untested).
Change the 'PathToUse' line to the folder containing all your .doc files.

Public Sub BatchReplaceAll()

    Dim PathToUse As String
    Dim myFile As String
    Dim myDoc As Document

    'Carry on past files that fail to open or save (errors are silently ignored)
    On Error Resume Next

    'Close all open documents before beginning
    Documents.Close SaveChanges:=wdPromptToSaveChanges

    'Folder containing the .doc files; keep the trailing backslash
    PathToUse = "C:\my doc files\"

    'Set the directory and type of file to batch process
    myFile = Dir$(PathToUse & "*.doc")

    While myFile <> ""

        'Open document
        Set myDoc = Documents.Open(PathToUse & myFile)

        'Save doc as text, swapping only the trailing ".doc" for ".txt"
        '(a plain Replace on "doc" would also mangle "my doc files" in the path)
        myDoc.SaveAs FileName:=PathToUse & Left$(myFile, Len(myFile) - 4) & ".txt", _
            FileFormat:=wdFormatText

        'Close without saving changes to the original
        myDoc.Close SaveChanges:=wdDoNotSaveChanges

        'Next file in folder
        myFile = Dir$()

    Wend

End Sub
posted by Lanark at 2:52 PM on July 9, 2009


Thanks all, I'll try out these options.
posted by yarrow at 6:36 AM on July 10, 2009


This thread is closed to new comments.