Modifying PHP-based web calendar to allow UTF-8 (Japanese) input
March 29, 2006 10:19 PM

Probably an easy question about PHP, Unicode (UTF-8), and regex. I'm trying to modify a PHP web calendar (VTcalendar) to allow Japanese text in calendar postings. I've found all the variables needed to send UTF-8 headers, so Japanese text manually inserted into pages appears fine. But there's an input validation thingy I don't know how to modify. (short snippet inside)

The calendar item input form rejects any Japanese text, and I think I've traced it to the file "inputvalidation.inc.php", which starts with the code below. If I try to delete the part about allowable characters in line 7, I get an error about the '^' in the last line. Can this be modified to allow UTF-8 characters?

if (!defined("ALLOWINCLUDES")) { exit; } // prohibits direct calling of include files
define("constValidTextCharWithoutSpacesRegEx",'\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');
define("constValidTextCharWithSpacesRegEx",'\s'.constValidTextCharWithoutSpacesRegEx);
define("constCalendaridMAXLENGTH",20);
define("constCalendaridVALIDMESSAGE", '1 to '.constCalendaridMAXLENGTH.' characters (A-Z,a-z,0-9,-,.)');
define("constCalendarnameMAXLENGTH",100);
define("constCalendarnameVALIDMESSAGE", '1 to '.constCalendarnameMAXLENGTH.' characters (A-Z,a-z,0-9,-,.,&,\',[space],[comma])');
define("constCalendarTitleMAXLENGTH",50);
define("constKeywordMaxLength",100);
define("constSpecificsponsorMaxLength",100);
define("constPasswordMaxLength",20);
define("constPasswordRegEx", '/^['.constValidTextCharWithoutSpacesRegEx.']{1,'.constPasswordMaxLength.'}$/');
posted by planetkyoto to Computers & Internet (4 answers total)
 
I don't envy you. What exactly are you removing and what kind of error do you get? A parse error?

I'm befuddled by character sets and collations and spend as much time as I can avoiding them, but I remember seeing this modifier in the PHP regex docs:
u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
You might want to poke around there.
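
To make the effect of that modifier concrete, here is a tiny sketch (not from VTcalendar, and assuming a PHP build whose PCRE has UTF-8 support). It checks the same three-byte string, the UTF-8 encoding of one hiragana character, against the same pattern with and without u:

```php
<?php
// "\xE3\x81\x82" is the UTF-8 encoding of the hiragana "a":
// one character, three bytes.
$text = "\xE3\x81\x82";

// Without u the subject is treated as bytes, so "." matches a single byte
// and the anchored pattern cannot cover all three.
var_dump(preg_match('/^.$/', $text));   // int(0)

// With u the subject is treated as UTF-8, so "." matches the whole character.
var_dump(preg_match('/^.$/u', $text));  // int(1)
```

Note that the modifier goes after the closing delimiter of the pattern, not inside it.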
posted by miniape at 5:41 AM on March 30, 2006


After looking at this again, I think what you need to do is add the 'u' modifier after the closing '/' of each pattern that is built from this (constPasswordRegEx, for example):

define("constValidTextCharWithoutSpacesRegEx",'\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');

Then add a range of Unicode characters to it using hex values. Look at the first few comments on: http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php

Note you might have to write everything with hex values. I haven't tried it.
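
For what it's worth, a rough and untested-against-VTcalendar sketch of that idea follows. It uses Unicode property classes (\p{L} for letters, \p{N} for numbers, and so on) instead of hand-written hex ranges, which is a swapped-in technique and assumes PCRE was built with Unicode property support. The constant constUnicodeTextCharRegEx and the function isValidUnicodeText are made-up names for illustration, not part of VTcalendar; the first define is copied from the question.

```php
<?php
// Original ASCII character list, copied from the question.
define("constValidTextCharWithoutSpacesRegEx", '\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');

// Hypothetical addition: Unicode letters, combining marks, numbers,
// punctuation and symbols (covers kanji, kana and Japanese punctuation).
define("constUnicodeTextCharRegEx", '\p{L}\p{M}\p{N}\p{P}\p{S}');

// Hypothetical helper: true if $text consists of 1 to $maxLength allowed
// characters. The length is counted in characters, not bytes, because of u.
function isValidUnicodeText($text, $maxLength)
{
    $pattern = '/^[\s\p{Zs}'
        . constValidTextCharWithoutSpacesRegEx
        . constUnicodeTextCharRegEx
        . ']{1,' . (int) $maxLength . '}$/u';

    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (with u, that includes a subject that is not valid UTF-8).
    return preg_match($pattern, $text) === 1;
}

var_dump(isValidUnicodeText('花見 (お花見) 2006', 50)); // bool(true)
var_dump(isValidUnicodeText("bad\x00byte", 50));        // bool(false)
```

The file has to be saved as UTF-8 for the Japanese literal to work. The u sits after the final '/', so it applies to the whole assembled pattern rather than to the character list.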
posted by miniape at 11:39 AM on March 30, 2006


Thanks, miniape, I'm going to work on this.
posted by planetkyoto at 4:01 PM on March 30, 2006


I realized that only trusted logged-in users are going to be able to submit events to the event cal, so I just went through every text field that was being checked for "constValidTextCharWithSpacesRegEx" and replaced the whole line with return TRUE;

This is not exactly an elegant solution, but I think it's safe under the circumstances, and allows me to keep moving forward on developing the site.
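
If I come back to this, a middle ground between the original check and return TRUE; might look something like the sketch below. It is only an assumption about what would be good enough here: accept any well-formed UTF-8 text up to a maximum length and reject control characters other than tab and line breaks. The function name isPlausibleUtf8Text is hypothetical and not part of VTcalendar.

```php
<?php
// Hypothetical helper, not part of VTcalendar: accept well-formed UTF-8 text
// of 1 to $maxLength characters with no control characters except
// tab, carriage return and line feed.
function isPlausibleUtf8Text($text, $maxLength)
{
    // \p{Cc} is the Unicode "control" category; \z anchors at the very end.
    $pattern = '/^(?:[^\p{Cc}]|[\t\r\n]){1,' . (int) $maxLength . '}\z/u';

    // With the u modifier, preg_match returns false for invalid UTF-8,
    // so malformed input fails the strict comparison as well.
    return preg_match($pattern, $text) === 1;
}
```

This does nothing about escaping on output, so it is not a replacement for that.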

Thanks for your assistance.
posted by planetkyoto at 10:38 AM on April 13, 2006

