Modifying php-based web calendar to allow UTF-8 (Japanese) input
March 29, 2006 10:19 PM

Probably an easy question about PHP, Unicode (UTF-8), and regex. I'm trying to modify a PHP web calendar (VTcalendar) to allow Japanese text in calendar postings. I've found all the variables needed to send UTF-8 headers, so Japanese text manually inserted into pages displays fine. But there's an input validation routine I don't know how to modify. (Short snippet inside.)

The calendar item input form rejects any Japanese text, and I think I've traced it to the file "inputvalidation.inc.php", which starts with the code below. If I try to delete the part about allowable characters in line 7, I get an error about the '^' in the last line. Can this be modified to allow UTF-8 characters?

if (!defined("ALLOWINCLUDES")) { exit; } // prohibits direct calling of include files
define("constValidTextCharWithoutSpacesRegEx",'\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');
define("constValidTextCharWithSpacesRegEx",'\s'.constValidTextCharWithoutSpacesRegEx);
define("constCalendaridMAXLENGTH",20);
define("constCalendaridVALIDMESSAGE", '1 to '.constCalendaridMAXLENGTH.' characters (A-Z,a-z,0-9,-,.)');
define("constCalendarnameMAXLENGTH",100);
define("constCalendarnameVALIDMESSAGE", '1 to '.constCalendarnameMAXLENGTH.' characters (A-Z,a-z,0-9,-,.,&,\',[space],[comma])');
define("constCalendarTitleMAXLENGTH",50);
define("constKeywordMaxLength",100);
define("constSpecificsponsorMaxLength",100);
define("constPasswordMaxLength",20);
define("constPasswordRegEx", '/^['.constValidTextCharWithoutSpacesRegEx.']{1,'.constPasswordMaxLength.'}$/');
posted by planetkyoto to computers & internet (4 comments total)
I don't envy you. What exactly are you removing and what kind of error do you get? A parse error?

I'm befuddled by character sets and collations and spend as much time as I can avoiding them, but I remember seeing this modifier in the php regex docs:

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

You might want to poke around there.
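
To make the effect of that modifier concrete, here is a minimal sketch. The function name exampleFitsThreeChars is made up for illustration and is not part of VTcalendar, and the file is assumed to be saved as UTF-8. Without 'u', PCRE reads the subject byte by byte, so a three-character Japanese string looks like nine "characters"; with 'u', it is read as three UTF-8 code points.

```php
<?php
// Hypothetical helper for illustration only; not part of VTcalendar.
// Returns true when $text is 1 to 3 "characters" long according to the pattern.
function exampleFitsThreeChars($text, $useUtf8)
{
    // The modifier goes after the closing delimiter of the pattern.
    $pattern = $useUtf8 ? '/^.{1,3}$/u' : '/^.{1,3}$/';
    return preg_match($pattern, $text) === 1;
}

$japanese = '日本語'; // 3 characters, 9 bytes in UTF-8

var_dump(exampleFitsThreeChars($japanese, false)); // bool(false): 9 bytes exceed the limit
var_dump(exampleFitsThreeChars($japanese, true));  // bool(true): 3 code points fit
```

The same byte-versus-character difference is presumably why a length-limited ASCII character class rejects Japanese input.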
posted by miniape at 5:41 AM on March 30, 2006


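After looking at this again, I think what you need to do is add the 'u' modifier after the closing delimiter of each pattern that is built from this character list (for example, after the final '/' in constPasswordRegEx):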

define("constValidTextCharWithoutSpacesRegEx",'\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');

Then add a range of Unicode characters to it using hex values. Look at the first few comments on: http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php

Note you might have to write everything with hex values. I haven't tried it.
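
A rough sketch of that idea, under some assumptions: the constant and function names below are invented for the example, the 100-character limit is arbitrary, and how VTcalendar actually wraps its character list into a full pattern may differ. The ASCII list is copied from the question as-is (so it does not all need to be rewritten in hex), and the Japanese blocks are appended as \x{...} ranges, which PCRE only accepts above \x{FF} when the 'u' modifier is present.

```php
<?php
// Hypothetical constant and function names; the VTcalendar originals are left untouched.
// ASCII list copied from constValidTextCharWithoutSpacesRegEx in the question.
define('EXAMPLE_ASCII_CHARS', '\w~!@#\$%^&*\(\)\-+=\{\}\[\]\|\\\:";\'<>?,.\/');

// Japanese ranges written as hex code points (these need the 'u' modifier):
//   U+3000-U+303F  CJK symbols and punctuation
//   U+3040-U+309F  Hiragana
//   U+30A0-U+30FF  Katakana
//   U+4E00-U+9FFF  CJK unified ideographs (common kanji)
//   U+FF00-U+FFEF  Halfwidth and fullwidth forms
define('EXAMPLE_JAPANESE_CHARS',
    '\x{3000}-\x{303F}\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}\x{FF00}-\x{FFEF}');

// Whitespace, the ASCII list, and the Japanese ranges in one character class.
// Note the 'u' after the closing '/' delimiter.
define('EXAMPLE_TEXT_REGEX',
    '/^[\s' . EXAMPLE_ASCII_CHARS . EXAMPLE_JAPANESE_CHARS . ']{1,100}$/u');

function exampleIsValidText($text)
{
    // preg_match() returns false on malformed UTF-8 when 'u' is set,
    // so the strict comparison also rejects broken input.
    return preg_match(EXAMPLE_TEXT_REGEX, $text) === 1;
}

var_dump(exampleIsValidText('日本語のイベント 2006!'));
var_dump(exampleIsValidText("bad\x01byte"));
```

The ranges listed are only the common Japanese blocks; rarer kanji outside U+4E00-U+9FFF would still be rejected unless further ranges are added.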
posted by miniape at 11:39 AM on March 30, 2006


Thanks, miniape, I'm going to work on this.
posted by planetkyoto at 4:01 PM on March 30, 2006


I realized that only trusted logged-in users are going to be able to submit events to the event cal, so I just went through every text field that was being checked for "constValidTextCharWithSpacesRegEx" and replaced the whole line with return TRUE;

This is not exactly an elegant solution, but I think it's safe under the circumstances, and allows me to keep moving forward on developing the site.
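
For anyone who wants something between the original whitelist and a bare return TRUE;, here is one possible middle ground, offered as a sketch rather than as what VTcalendar does. The function name and the default length limit are assumptions. Instead of listing allowed characters, it accepts any well-formed UTF-8 text up to a maximum length and rejects only control characters other than tab and line breaks.

```php
<?php
// Hypothetical stand-in for a bare "return TRUE;" (name and length limit are assumptions).
function exampleIsAcceptableText($text, $maxLength = 100)
{
    // \P{Cc} matches any code point that is not a control character;
    // tab, CR and LF are allowed back in explicitly. With the 'u' modifier,
    // preg_match() returns false on malformed UTF-8, which the strict
    // comparison below treats as a rejection.
    $pattern = '/^[\P{Cc}\t\r\n]{1,' . (int) $maxLength . '}\z/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(exampleIsAcceptableText('日本語のタイトル'));
var_dump(exampleIsAcceptableText("nul\x00byte"));
```

Like the original character list, this still lets < and > through, so escaping on output (for example with htmlspecialchars()) matters either way.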

Thanks for your assistance.
posted by planetkyoto at 10:38 AM on April 13, 2006

