CJK Word translation macro
March 30, 2008 8:43 PM

I need a Word macro that will go through a folder, run word/character counts on each file in the folder, and deliver a total count of Chinese/Japanese/Korean characters, ignoring all English. This seems like it shouldn't be hard to script, but insufficient leetness on my part (and the fact that I've never written a macro before) means I'm more or less clueless here. What am I missing?
posted by bokane to technology (5 comments total)
Actually, that sounds like it would be really difficult, because you'd have to figure out which character set each character belongs to, and that could depend on the document's encoding (Unicode vs. Big5). Unless VBA has some built-in functions for doing that (and it might), it would be a huge problem. Quick googling doesn't turn anything up.

A quick approximation would be to count every character outside the ASCII range, but that would overcount: accented Latin letters and full-width punctuation, for example, would be included too.
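
One way to tighten that approximation is to test each character's code point against a few Unicode blocks instead of just "not ASCII". The following is only a rough sketch: the function names IsCJKChar and CountCJK are made up for illustration, it assumes the text reaches VBA as ordinary (UTF-16) strings, and the block list is deliberately short (it skips the rarer extension blocks, and anything stored as a surrogate pair is not counted).

```vba
' Hypothetical helper: True if ch is a common CJK ideograph, kana, or Hangul syllable.
Function IsCJKChar(ByVal ch As String) As Boolean
    Dim code As Long
    If Len(ch) = 0 Then Exit Function          ' AscW raises an error on ""
    ' AscW returns a signed 16-bit Integer; mask it into the range 0..65535.
    code = AscW(ch) And &HFFFF&
    Select Case code
        Case &H4E00& To &H9FFF&                ' CJK Unified Ideographs
            IsCJKChar = True
        Case &H3040& To &H30FF&                ' Hiragana and Katakana
            IsCJKChar = True
        Case &HAC00& To &HD7A3&                ' Hangul syllables
            IsCJKChar = True
        Case Else                              ' ASCII, full-width punctuation, etc.
            IsCJKChar = False
    End Select
End Function

' Hypothetical helper: number of characters in s that pass IsCJKChar.
Function CountCJK(ByVal s As String) As Long
    Dim i As Long, total As Long
    For i = 1 To Len(s)
        If IsCJKChar(Mid$(s, i, 1)) Then total = total + 1
    Next i
    CountCJK = total
End Function
```

Calling CountCJK(ActiveDocument.Content.Text) on an open document would give that document's count. Because full-width punctuation (U+3000 to U+303F and U+FF00 to U+FFEF) falls outside the listed ranges, it is left out without needing a list of characters to delete.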
posted by delmoi at 9:31 PM on March 30


Yeah - the few times I've used the built-in CJK count in Word, it has seemed to count full-width Chinese punctuation (in GB2312, at least) as CJK characters.
posted by bokane at 9:33 PM on March 30


gak. icky. half-arsed attempt in your inbox.
posted by pompomtom at 1:45 AM on March 31


A friend of mine has put together something like this.
posted by adamrice at 7:26 AM on March 31


Here's my (very, very) slightly modified version of the code pompomtom was kind enough to send me FTW:

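Sub CountNonEnglishCharacters()
' Opens each .doc file in a folder, deletes everything listed in BadChars
' (without saving), and totals the characters that remain.

Dim BadChars As Collection
Dim candidate As Variant
Dim CurrentFile As String
Dim TotalCount As Long, ThisCount As Long

Set BadChars = New Collection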

BadChars.Add Item:=""""
BadChars.Add Item:="("
BadChars.Add Item:=")"
BadChars.Add Item:="1"
BadChars.Add Item:="2"
BadChars.Add Item:="3"
BadChars.Add Item:="4"
BadChars.Add Item:="5"
BadChars.Add Item:="6"
BadChars.Add Item:="7"
BadChars.Add Item:="8"
BadChars.Add Item:="9"
BadChars.Add Item:="0"
BadChars.Add Item:="%"
' Word Find codes: ^# = any digit, ^w = white space, ^$ = any letter
BadChars.Add Item:="^#"
BadChars.Add Item:="^w"
BadChars.Add Item:="^$"
BadChars.Add Item:="&"
BadChars.Add Item:=","
BadChars.Add Item:="/"
BadChars.Add Item:="-"
BadChars.Add Item:="."
BadChars.Add Item:="'"
BadChars.Add Item:=":"
BadChars.Add Item:="。"
BadChars.Add Item:=","
BadChars.Add Item:="("
BadChars.Add Item:=")"
BadChars.Add Item:=" "
BadChars.Add Item:="【"
BadChars.Add Item:="】"
BadChars.Add Item:="["
BadChars.Add Item:="]"
BadChars.Add Item:="《"
BadChars.Add Item:="》"

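' "DIRECTORY PATH" is a placeholder for the folder; it must end with a backslash.
CurrentFile = Dir("DIRECTORY PATH*.doc")
TotalCount = 0

While CurrentFile <> ""
    ' MsgBox ("c:\temp\Marbridge\" & CurrentFile)
    Documents.Open FileName:="DIRECTORY PATH" & CurrentFile, ReadOnly:=True

    ' Delete every unwanted character from the open (read-only) copy.
    For Each candidate In BadChars
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = candidate
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .MatchWildcards = False   ' so ^#, ^w and ^$ are read as Find codes
            .Execute Replace:=wdReplaceAll
        End With
    Next candidate

    ' Characters.Count still includes paragraph marks, so the total runs slightly high.
    ThisCount = ActiveDocument.Characters.Count
    TotalCount = TotalCount + ThisCount

    ActiveDocument.Close SaveChanges:=False

    CurrentFile = Dir
Wend
MsgBox ("There is a total of " & TotalCount & " characters.")
End Sub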

posted by bokane at 11:50 PM on March 31

