How to deal with Chinese in a Perl program on a Mac
October 14, 2011 2:35 AM

Working with Chinese and English text, in a Perl programme, on a Mac. I feel like an idiot.

I need to write a Perl programme on my Mac to shuffle some Chinese sentences. None of Aquamacs, TextEdit, Pages or Word seems to provide support. Googling Unicode support provides a tsunami of information that doesn't seem to help.

Listing the various combinations I have tried would be a monument only to human doggedness and stupidity. They all result in some combination of files in which the Chinese is corrupt, files which Perl thinks are binary and illegal, files which Terminal thinks are binary, or similar.

To summarize the challenge: if I could do the following, I would be happy:

#!/usr/bin/perl -w

$string = "两个艺术家交换了签名。";

print $string, "\n";

Any suggestions?
posted by stonepharisee to Computers & Internet (5 answers total)
 
Have you asked your question on StackOverflow, or searched there for previous answers? I found this:

How can I output UTF-8 from Perl?
http://stackoverflow.com/questions/627661/how-can-i-output-utf-8-from-perl

I think you want:

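#!/usr/bin/env perl

use strict;
use warnings;   # replaces -w, which env cannot portably pass on the shebang line
use utf8;       # the source file itself is saved as UTF-8

my $str = '两个艺术家交换了签名。';
binmode(STDOUT, ':encoding(UTF-8)');   # encode everything printed to STDOUT as UTF-8
print "$str\n";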

Apparently this answer is also correct, i.e. using the following before the print statement:

use open qw/:std :utf8/;
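
For the actual goal of shuffling sentences, here is a minimal sketch built on the same idea. It assumes the script is saved as UTF-8 and that each sentence ends with the full stop "。". The helper name split_sentences and the two extra sample sentences are made up for illustration; they are not from the question. It uses the stricter :encoding(UTF-8) layer in place of :utf8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                             # the source file itself is saved as UTF-8
use open qw(:std :encoding(UTF-8));   # STDIN/STDOUT/STDERR and default file I/O use UTF-8
use List::Util qw(shuffle);

# Hypothetical helper: split text after each "。", keeping the full stop
# attached to its sentence.
sub split_sentences {
    my ($text) = @_;
    return grep { length } split /(?<=。)/, $text;
}

# Sample data: the first sentence is from the question, the rest are invented.
my $text      = '两个艺术家交换了签名。天气很好。我喜欢读书。';
my @sentences = split_sentences($text);

# Print the sentences in random order, one per line.
print "$_\n" for shuffle(@sentences);
```

To read the sentences from a file instead, the same "use open" line should make a plain open(my $fh, '<', $path) decode the file as UTF-8, so the splitting code would not need to change.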
posted by asymptotic at 2:51 AM on October 14, 2011


Response by poster: By crikey, you nailed it in one! Thank you so much. StackOverflow was unknown to me.
posted by stonepharisee at 2:54 AM on October 14, 2011


Also, I think this SO answer is relevant, but I won't lie: it didn't make much sense to me, as I'm not a Perl-head:

Why does modern Perl avoid UTF-8 by default?
http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default/6163129#6163129
posted by asymptotic at 2:55 AM on October 14, 2011


Response by poster: I spoke too soon. The result seems fine in emacs, but not in anything else I use to view the output. There, it is reduced to garbage again.

A big part of the problem is that I am forced to view the file in one way or another (using this or that program). I'll dig deeper into the StackOverflow suggestions now.
posted by stonepharisee at 2:59 AM on October 14, 2011


Be careful about judging the success of the above by how other applications fail to decode the data. You may need to fiddle with their options to explicitly open files as UTF-8 encoded; if you don't, they could assume some other crazy encoding.
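
One way to take the viewer out of the picture is to check the bytes themselves. The following is a rough sketch, not a definitive test: it strictly decodes a byte string as UTF-8 and lists the code points it contains, dying if the bytes are malformed. The helper name utf8_code_points is made up for illustration, and the $bytes variable stands in for whatever your script actually wrote to its output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the source file itself is saved as UTF-8
use Encode qw(encode decode);

# Hypothetical helper: return the code points in $bytes if it is valid UTF-8;
# die if it is not.
sub utf8_code_points {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input instead of substituting
    # replacement characters; LEAVE_SRC keeps $bytes unmodified.
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    return map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# Stand-in for the bytes a script wrote to a file or pipe.
my $bytes = encode('UTF-8', '两个');

# The output is plain ASCII, so any terminal or editor can display it.
print join(' ', utf8_code_points($bytes)), "\n";
```

If this prints one code point per Chinese character, the data is fine and any remaining garbage is down to how the viewing application is decoding the file.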

If you end up posting an SO question, be sure to enumerate the full list of applications you're attempting to get working with Unicode; hell, the applications might have Unicode bugs in them! This may end up being a SuperUser question. SuperUser is a similar forum to StackOverflow but dedicated to application/server/configuration questions, whereas StackOverflow is dedicated to programming and software engineering questions.
posted by asymptotic at 3:03 AM on October 14, 2011

