Batch convert Word docs to text files?
July 9, 2009 12:29 PM
Does anyone know of a utility to convert a large number of Word documents (hundreds) to text files in Windows XP? They are in Word 2003 (.doc) format. The Word docs are about 1MB each because they contain images, but the text files they turn into are only about 4KB. I want to run a text processing script I wrote in Perl on them, but I'm not sure how to get them into text format in the first place without opening and converting each one by hand. Looking for free solutions if possible, just because it's a pain for me to pay for software at work.
p.s. for XP, via cygwin -- http://mirror.nyi.net/cygwin/release/catdoc/
posted by devbrain at 1:20 PM on July 9, 2009
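For anyone going the catdoc route, here is a minimal sketch of a batch wrapper, assuming Python 3.7+ is available and `catdoc` is on the PATH (for example inside a Cygwin shell). The names `txt_path_for` and `convert_folder` are made up for illustration; the only thing relied on from catdoc itself is that it writes the extracted text to standard output.

```python
import subprocess
from pathlib import Path


def txt_path_for(doc_path: Path, out_dir: Path) -> Path:
    """Map e.g. reports/a.doc to out_dir/a.txt (hypothetical helper)."""
    return out_dir / (doc_path.stem + ".txt")


def convert_folder(src_dir: str, out_dir: str, catdoc_cmd: str = "catdoc") -> list:
    """Run catdoc on every .doc file in src_dir and return the .txt paths written."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src.glob("*.doc")):
        target = txt_path_for(doc, out)
        # catdoc prints the extracted text to stdout; capture it and save it.
        # check=True raises CalledProcessError if catdoc exits with an error.
        result = subprocess.run([catdoc_cmd, str(doc)], capture_output=True, check=True)
        target.write_bytes(result.stdout)
        written.append(target)
    return written
```

Calling `convert_folder("docs", "txt")` would leave one .txt file per .doc in the `txt` folder; both folder names are placeholders. Writing to a separate folder keeps the originals untouched, which makes it easy to rerun if something goes wrong.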
I think this would work, but I'm not sure: I'd try making a Word macro to open a doc and export it as .txt (trivially easy to do via "Record Macro"), then edit the macro to parameterize it, and finally run it from outside of Word (which is also easy, but the PC where I have examples is miles away from here, so I can't post any).
posted by anadem at 1:24 PM on July 9, 2009
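One way to do the "run it from outside of Word" part, sketched under some assumptions: instead of a recorded macro, this drives Word directly over COM from Python. It assumes Word is installed and the third-party pywin32 package is available; `convert_with_word` and `text_path_beside` are hypothetical names. The numeric constants are the values of Word's wdFormatText (2) and wdDoNotSaveChanges (0). Untested against Word 2003, so treat it as a starting point.

```python
from pathlib import Path

WD_FORMAT_TEXT = 2          # value of Word's wdFormatText constant
WD_DO_NOT_SAVE_CHANGES = 0  # value of Word's wdDoNotSaveChanges constant


def text_path_beside(doc_path: Path) -> Path:
    """Return the .txt path next to the given .doc (hypothetical helper)."""
    return doc_path.with_suffix(".txt")


def convert_with_word(folder: str) -> None:
    """Open each .doc in folder with Word and save a plain-text copy beside it."""
    # Imported here so the helper above works without pywin32 installed.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # suppress prompts while running unattended
    try:
        # Word needs absolute paths, hence resolve().
        for doc_path in sorted(Path(folder).resolve().glob("*.doc")):
            doc = word.Documents.Open(str(doc_path))
            doc.SaveAs(str(text_path_beside(doc_path)), FileFormat=WD_FORMAT_TEXT)
            doc.Close(WD_DO_NOT_SAVE_CHANGES)
    finally:
        # Always shut Word down, even if one of the files fails.
        word.Quit()
```

Something like `convert_with_word(r"C:\my doc files")` would then process the whole folder in one go.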
A Word macro saved in your Normal.dot, something like this (untested).
Change the 'PathToUse' line to the folder containing all your .doc files.
Public Sub BatchSaveAsText()
    Dim PathToUse As String
    Dim myFile As String
    Dim txtName As String
    Dim myDoc As Document
    'Carry on with the next file if one fails to open or save
    On Error Resume Next
    'Close all open documents before beginning
    Documents.Close SaveChanges:=wdPromptToSaveChanges
    PathToUse = "C:\my doc files\"
    'Set the directory and type of file to batch process
    myFile = Dir$(PathToUse & "*.doc")
    While myFile <> ""
        'Open document
        Set myDoc = Documents.Open(PathToUse & myFile)
        'Build the output name by swapping only the file extension for .txt
        '(a plain Replace of "doc" would also change "my doc files" in the path)
        txtName = PathToUse & Left$(myFile, InStrRev(myFile, ".") - 1) & ".txt"
        'Save doc as text
        myDoc.SaveAs FileName:=txtName, FileFormat:=wdFormatText
        'Close without saving again
        myDoc.Close SaveChanges:=wdDoNotSaveChanges
        'Next file in folder
        myFile = Dir$()
    Wend
End Sub
posted by Lanark at 2:52 PM on July 9, 2009
Response by poster: Thanks all, I'll try out these options.
posted by yarrow at 6:36 AM on July 10, 2009
http://directory.fsf.org/project/catdoc/
posted by devbrain at 1:19 PM on July 9, 2009