Change Cyrillic to Arabic, in Kazak
April 18, 2007 12:11 AM

I use Mellel for writing in Kazak on OS X. I have files in Cyrillic which I need to transliterate into the Kazak Arabic alphabet. It isn't a straight letter-for-letter deal, though. How do I find and replace correctly so that, for example, үн becomes ٴۇن, not ۇن? (In other words: "if the word contains ә, ө, ү or і, but no г, к or е, add ٴ to the front of the word.")

In Kazak Arabic, the letters ا، و، ۇ، ى (a, o, u, i) can represent both the hard (а, о, ұ, ы) and the soft (ә, ө, ү, і) vowels of the Cyrillic alphabet. If there is a г, к, е (g, k, e) it is understood that the vowels are soft since hard vowels don't appear with those 3 consonants (only in rare cases). This is why a straightforward letter-to-letter find and replace won't work!
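
To make the collision concrete, here is a minimal sketch in Python (not something Mellel itself runs). The table covers only the vowels listed above plus н, and the exact code points chosen for the Arabic letters are an assumption for illustration, as is the function name.

```python
# Naive letter-for-letter table, limited to the letters mentioned in this
# question. Each hard/soft vowel pair shares one Arabic letter.
# Code points for the Arabic letters are assumed, not taken from a standard.
NAIVE = {
    "\u0430": "\u0627", "\u04d9": "\u0627",  # а, ә -> alef
    "\u043e": "\u0648", "\u04e9": "\u0648",  # о, ө -> waw
    "\u04b1": "\u06c7", "\u04af": "\u06c7",  # ұ, ү -> u
    "\u044b": "\u0649", "\u0456": "\u0649",  # ы, і -> dotless yeh
    "\u043d": "\u0646",                      # н -> noon
}


def naive_transliterate(word):
    """Replace each letter independently; unknown letters pass through."""
    return "".join(NAIVE.get(ch, ch) for ch in word)


hard = naive_transliterate("\u04b1\u043d")  # ұн
soft = naive_transliterate("\u04af\u043d")  # үн
print(hard == soft)  # the two words come out identical
```

Both words end up spelled the same way, which is exactly the information the leading ٴ has to carry.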
posted by steppe to Writing & Language (4 answers total)
 
Mellel supports regular expressions. If it supports Unicode regex, you could look into this Unicode regular expressions overview and this regex conditional overview to find out how to test for a particular combination of characters and replace it with the characters you want.
posted by Blazecock Pileon at 12:23 AM on April 18, 2007


If there is a г, к, е (g, k, e) it is understood that the vowels are soft since hard vowels don't appear with those 3 consonants (only in rare cases).
Do you mean: if the word starts with ә, ө, ү or і and the second letter is not one of г, к, е, add ٴ to the start of the word? If so, you can run the following Perl program on your text file:
perl -CSD -pe 's/\b([\x{04d9}\x{04e9}\x{04af}\x{0456}][^\x{0433}\x{043a}\x{0435}])/\x{0674}$1/g' < original-file-name > modified-file-name
This adds the hamza where appropriate, and you can then do the global replace. Note that any line breaks in the command were added by MetaFilter or your browser, so enter it as a single line in Terminal. You'll also need to change original-file-name and modified-file-name to match the files on your system.
posted by Aidan Kehoe at 2:40 AM on April 18, 2007


Response by poster: Thanks for your feedback. The complexity of the answers, at least to my eye, is what led me to post here!

One clarification: IF those soft vowels (ә, ө, ү, і) appear anywhere in the Cyrillic word, even once, and none of the three letters (г, к, е) appears anywhere in the word, THEN the Arabic word needs a single hamza at the very front (not over or near every vowel in the word; only once, at the very front). If even one of those three letters is present, it already tells the reader that the vowels are soft (ә, ө, ү, і), not hard (а, о, ұ, ы). Otherwise, the hamza is what tells the reader the vowels are soft.

What would that perl program look like now?
posted by steppe at 10:20 AM on May 4, 2007


What would that perl program look like now?
Like this:
perl -CSD -pe 's/\b(?=\w*[\x{04d9}\x{04e9}\x{04af}\x{0456}])(?!\w*[\x{0433}\x{043a}\x{0435}])(\w+)/\x{0674}$1/g' < original-file-name > modified-file-name
Not tested, sorry.
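
If you want something to check the output against, here is a small sketch of the rule as steppe stated it, written in Python rather than Perl purely for readability. The function names are made up for this example, and treating capital letters the same as lower-case ones is my assumption. It marks the Cyrillic text only; the letter-for-letter replace still happens afterwards.

```python
import re

SOFT_VOWELS = set("\u04d9\u04e9\u04af\u0456")  # ә ө ү і
SOFT_MARKERS = set("\u0433\u043a\u0435")       # г к е
HIGH_HAMZA = "\u0674"                          # ٴ


def needs_hamza(word):
    """True if the word has a soft vowel anywhere and none of г, к, е."""
    letters = set(word.lower())  # assumption: capitals follow the same rule
    return bool(letters & SOFT_VOWELS) and not (letters & SOFT_MARKERS)


def mark_soft_words(text):
    """Put a single high hamza in front of every qualifying word."""
    def fix(match):
        word = match.group(0)
        return HIGH_HAMZA + word if needs_hamza(word) else word
    # \w+ matches runs of Unicode letters (and digits) in Python 3
    return re.sub(r"\w+", fix, text)


# үн and түн qualify; күн contains к and ұн has only a hard vowel
for sample in ["\u04af\u043d", "\u0442\u04af\u043d",
               "\u043a\u04af\u043d", "\u04b1\u043d"]:
    print(sample, "->", mark_soft_words(sample))
```

Because the hamza is not a Cyrillic letter, it should survive the later find and replace untouched.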
posted by Aidan Kehoe at 7:36 AM on October 9, 2007

