Change Cyrillic to Arabic, in Kazak
April 18, 2007 12:11 AM
I use Mellel for writing in Kazak on OS X. I have files in Cyrillic which I need to transliterate into the Kazak Arabic alphabet. It isn't a straight letter-for-letter deal, though. How do I find and replace correctly so that, for example, үн becomes ٴۇن, not ۇن? (In other words: "if the word has ә, ө, ү, or і, but no г, к, or е, add ٴ to the front of the word.")
In Kazak Arabic, the letters ا، و، ۇ، ى (a, o, u, i) can represent both the hard (а, о, ұ, ы) and the soft (ә, ө, ү, і) vowels of the Cyrillic alphabet. If a word contains г, к, or е (g, k, e), the vowels are understood to be soft, since hard vowels appear with those three letters only in rare cases. This is why a straightforward letter-to-letter find and replace won't work!
"If there is a г, к, е (g, k, e) it is understood that the vowels are soft since hard vowels don't appear with those 3 consonants (only in rare cases)."
Do you mean that if the word starts with ә, ө, ү, or і and the second letter is not one of г, к, е, then ٴ should be added to the start of the word? If so, you can run the following Perl command on your text file:
perl -CS -e 'while (<>) { s/\b([\x{04d9}\x{04e9}\x{04af}\x{0456}][^\x{0433}\x{043a}\x{0435}])/\x{0674}$1/g; print; }' < original-file-name > modified-file-name
which will add the hamza where appropriate, and you can then do the global replace. Note that any line breaks were added by MetaFilter or your browser, so you shouldn't type them in Terminal, and you'll need to change original-file-name and modified-file-name to reflect what you have on your system.
posted by Aidan Kehoe at 2:40 AM on April 18, 2007
Response by poster: Thanks for your feedback. The complexity of the answers, at least to my eye, is what led me to post here!
One response: if those "soft" vowels (ә, ө, ү, і) appear anywhere in the word, even once, and none of the three letters (г, к, е) appears anywhere in the Cyrillic word, then the Arabic word needs exactly one hamza, at the very front (not over or near every vowel in the word; only at the very front, once). If even one of those three letters is present, it already tells the reader that the vowels are soft (ә, ө, ү, і), not hard (а, о, ұ, ы). Otherwise, the hamza tells the reader that the vowels are soft.
What would that perl program look like now?
posted by steppe at 10:20 AM on May 4, 2007
"What would that perl program look like now?"
Like this:
perl -CS -e 'while (<>) { s/\b([\x{04d9}\x{04e9}\x{04af}\x{0456}][^\x{0433}\x{043a}\x{0435}]+\b)/\x{0674}$1/g; print; }' < original-file-name > modified-file-name
Not tested, sorry.
posted by Aidan Kehoe at 7:36 AM on October 9, 2007
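For the rule as steppe states it (a soft vowel anywhere in the word, none of г, к, е anywhere in the word, a single hamza at the front), checking each word on its own may be easier to follow than one pattern anchored on a word-initial vowel, where a class like [^гке]+ can also run across spaces into the next word. What follows is only a sketch under some assumptions: add_hamza is a made-up helper name, a "word" is taken to be any run of characters from the Unicode Cyrillic block (U+0400 to U+04FF), matching ignores case, and perl is started with -CS so that standard input and output are treated as UTF-8.

```sh
#!/bin/sh
# add_hamza (hypothetical name): read UTF-8 Cyrillic text on stdin and write it
# to stdout, prefixing U+0674 (high hamza) to every word that contains a soft
# vowel (U+04D9, U+04E9, U+04AF, U+0456) and none of the letters
# U+0433, U+043A, U+0435 (the Cyrillic g, k, e).
add_hamza() {
  perl -CS -pe '
    # Assumption: a word is a run of characters from the Cyrillic block.
    # Each word is inspected as a whole, then returned with or without
    # the leading hamza.
    s{([\x{0400}-\x{04FF}]+)}{
      my $w    = $1;
      my $soft = $w =~ /[\x{04D9}\x{04E9}\x{04AF}\x{0456}]/i;
      my $gke  = $w =~ /[\x{0433}\x{043A}\x{0435}]/i;
      ($soft && !$gke) ? "\x{0674}$w" : $w;
    }ge;
  '
}

# Example use (file names are placeholders, as in the commands above):
# add_hamza < original-file-name > modified-file-name
```

With this filter, үн gets a leading ٴ, while күн (contains к) and ата (hard vowels only) pass through unchanged. The letter-for-letter replacement into Arabic script would then be run on the result, as in the earlier answer.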
This thread is closed to new comments.
posted by Blazecock Pileon at 12:23 AM on April 18, 2007