CJK Word translation macro
March 30, 2008 8:43 PM

I need a Word macro that will go through a folder, run word/character counts on each file in the folder, and deliver a total count of Chinese/Japanese/Korean characters, ignoring all English. This seems like it shouldn't be hard to script, but insufficient leetness on my part (and the fact that I've never written a macro before) means I'm more or less clueless here. What am I missing?
posted by bokane to Technology (5 answers total)
Actually that sounds like it would be really difficult, because you'd have to figure out what character set each character belonged to, and that could depend on the document's encoding (Unicode vs. Big5). Unless VBA has some built-in functions for doing that (and it might), it would be a huge problem. Quick Googling doesn't turn anything up.

A quick approximation would be to count every character outside of the ASCII range, but that would overcount, since accented letters, curly quotes and full-width punctuation all fall outside ASCII too.
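For what it's worth, that approximation is only a few lines of VBA, since VBA strings are Unicode no matter how the file was encoded on disk. A rough sketch, not tested against real translation files; CountNonAscii is a made-up name, not a built-in:

```vba
' Rough sketch: count the characters in s whose code is above the ASCII range.
' CountNonAscii is a hypothetical helper, not part of Word or VBA.
Function CountNonAscii(ByVal s As String) As Long
    Dim i As Long
    Dim code As Long
    Dim n As Long

    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1))
        ' AscW returns a signed Integer, so codes above &H7FFF come back negative
        If code < 0 Then code = code + 65536
        If code > 127 Then n = n + 1
    Next i

    CountNonAscii = n
End Function
```

With a document open, something like MsgBox CountNonAscii(ActiveDocument.Content.Text) should give the number for that one file. It will still count full-width punctuation and anything else non-ASCII, so treat it as an upper bound.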
posted by delmoi at 9:31 PM on March 30, 2008


Yeah - the few times I've used the built-in CJK count in Word, it has seemed to count full-width Chinese punctuation (in GB2312, at least) as CJK characters.
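One way around the punctuation problem might be to test each character against the Unicode blocks that hold actual CJK letters, and skip the punctuation blocks (U+3000 to U+303F and the full-width forms at U+FF00 to U+FFEF). A sketch under those assumptions; IsCJKChar and CountCJK are made-up names, and the block list covers the main ranges rather than everything:

```vba
' Sketch only: True if the first character of ch falls in one of the main
' CJK letter blocks. The list is an assumption, not an exhaustive definition.
Function IsCJKChar(ByVal ch As String) As Boolean
    Dim code As Long
    code = AscW(ch)
    If code < 0 Then code = code + 65536   ' AscW returns a signed Integer

    ' The trailing & makes each hex literal a Long, so &H9FFF& stays positive
    Select Case code
        Case &H3040& To &H30FF&   ' Hiragana and Katakana
            IsCJKChar = True
        Case &H3400& To &H4DBF&   ' CJK Unified Ideographs Extension A
            IsCJKChar = True
        Case &H4E00& To &H9FFF&   ' CJK Unified Ideographs
            IsCJKChar = True
        Case &HAC00& To &HD7A3&   ' Hangul Syllables
            IsCJKChar = True
        Case Else                 ' ASCII, CJK punctuation, full-width forms, etc.
            IsCJKChar = False
    End Select
End Function

' Count the characters in s that IsCJKChar accepts.
Function CountCJK(ByVal s As String) As Long
    Dim i As Long
    For i = 1 To Len(s)
        If IsCJKChar(Mid$(s, i, 1)) Then CountCJK = CountCJK + 1
    Next i
End Function
```

Calling CountCJK(ActiveDocument.Content.Text) on an open document would then leave out both English text and full-width punctuation. A few marks that live inside the Katakana block, such as the middle dot, would still be counted.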
posted by bokane at 9:33 PM on March 30, 2008


gak. icky. half-arsed attempt in your inbox.
posted by pompomtom at 1:45 AM on March 31, 2008


A friend of mine has put together something like this.
posted by adamrice at 7:26 AM on March 31, 2008


Here's my (very, very) slightly modified version of the code pompomtom was kind enough to send me FTW:
Sub CountNonEnglishCharacters()

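' Strings and Word Find codes to strip from each document before counting
Dim BadChars As Collection
Dim candidate As Variant
Dim CurrentFile As String
Dim TotalCount As Long, ThisCount As Long

Set BadChars = New Collection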

BadChars.Add Item:=""""
BadChars.Add Item:="("
BadChars.Add Item:=")"
BadChars.Add Item:="1"
BadChars.Add Item:="2"
BadChars.Add Item:="3"
BadChars.Add Item:="4"
BadChars.Add Item:="5"
BadChars.Add Item:="6"
BadChars.Add Item:="7"
BadChars.Add Item:="8"
BadChars.Add Item:="9"
BadChars.Add Item:="0"
BadChars.Add Item:="%"
BadChars.Add Item:="^#"   ' Word Find code: any digit
BadChars.Add Item:="^w"   ' Word Find code: white space
BadChars.Add Item:="^$"   ' Word Find code: any letter
BadChars.Add Item:="&"
BadChars.Add Item:=","
BadChars.Add Item:="/"
BadChars.Add Item:="-"
BadChars.Add Item:="."
BadChars.Add Item:="'"
BadChars.Add Item:=":"
BadChars.Add Item:="。"
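' The next three were presumably the full-width forms; the listing as posted repeats the ASCII ones
BadChars.Add Item:=ChrW(&HFF0C&)   ' full-width comma
BadChars.Add Item:=ChrW(&HFF08&)   ' full-width left parenthesis
BadChars.Add Item:=ChrW(&HFF09&)   ' full-width right parenthesis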
BadChars.Add Item:="    "
BadChars.Add Item:="【"
BadChars.Add Item:="】"
BadChars.Add Item:="["
BadChars.Add Item:="]"
BadChars.Add Item:="《"
BadChars.Add Item:="》"

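' Replace DIRECTORY PATH (here and in Documents.Open below) with the folder to scan,
' including the trailing backslash
CurrentFile = Dir("DIRECTORY PATH*.doc")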
TotalCount = 0

While CurrentFile <> ""
  ' MsgBox ("c:\temp\Marbridge\" & CurrentFile)
   Documents.Open FileName:="DIRECTORY PATH" & CurrentFile, ReadOnly:=True

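   For Each candidate In BadChars
       ' Find options persist between searches, so reset the ones that matter
       With Selection.Find
           .ClearFormatting
           .Replacement.ClearFormatting
           .Text = candidate
           .Replacement.Text = ""
           .Forward = True
           .Wrap = wdFindContinue
           .MatchWildcards = False   ' so ^#, ^w and ^$ are read as Find codes
           .Execute Replace:=wdReplaceAll
       End With
   Next candidate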
   ' Characters.Count also counts paragraph marks, so the total runs a little high
   ThisCount = ActiveDocument.Characters.Count
   TotalCount = TotalCount + ThisCount

   ActiveDocument.Close savechanges:=False

   CurrentFile = Dir
Wend
MsgBox ("There is a total of " & TotalCount & " characters.")
End Sub

posted by bokane at 11:50 PM on March 31, 2008

