HTML cleanup in comment boxes
October 14, 2004 5:00 PM

FilterFilter: Recent forays into php programming have led me to the conclusion that it's really hard to stop people adding javascript hacks, dodgy html and injected SQL queries into the box which says "post this to my site". Are there any known, good and actively managed libraries which allow me to sanitize any html entered in, for example, comment boxes so that I can stop the hacker ownzering my site?
posted by seanyboy to Computers & Internet (7 answers total)
 
Best answer: I found an interesting code snippet called "sanitize", which an author created for just this purpose. I don't have it any more, but I know I found it by googling "php sanitize". It worked well, and sanitized in several ways (mysql, php, html, etc).
posted by websavvy at 5:09 PM on October 14, 2004


Best answer: There are plenty of things out there, but you might try writing one too. Not because the world needs another attempt at this, but because I found it helped me develop a healthy paranoia about user input. Whatever code you use is better than nothing (usually-- unless you start to think of it as bulletproof); it's the attitude that counts.

There are a lot of good things to do. Only allowing data into the db through stored procedures that are owned by a db user that has only the permissions they need is a good start. Unfortunately the bulk of PHP db programming is done in MySQL which only started supporting stored procs recently.

Sanitize the hell out of user data. Never look for certain things to remove. Get rid of everything but what you expect (have an array of HTML tags you will allow, not an array of tags you pull out). And don't trust PHP's built-in functions to save you. I know some people refuse to use them and shut magic quotes off. If that helps encourage paranoia, so be it.
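
A minimal sketch of "get rid of everything but what you expect" for non-HTML fields. The helper names expect_username() and expect_positive_int() are made up for illustration, and the 3-20 character rule is just an assumed policy; the point is that anything not matching the expected shape is rejected rather than repaired.

```php
<?php
// Hypothetical helpers: accept a value only if it matches the expected shape.

// Returns the username if it is 3-20 ASCII letters, digits, or underscores; otherwise null.
// (The length limits are an assumed policy, not something from this thread.)
function expect_username(string $raw): ?string
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $raw) === 1 ? $raw : null;
}

// Returns a positive integer id, or null if the input is anything else.
function expect_positive_int(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $value === false ? null : $value;
}
```

A caller would treat null as "reject the request" instead of trying to clean the value up.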
posted by yerfatma at 5:29 PM on October 14, 2004


These problems are relatively small and thus have been solved in tons of trivial ways.

The PHP documentation shows how to avoid SQL injection.

This is some code that looks reasonably useful for tag stripping.
posted by majick at 5:33 PM on October 14, 2004


One thing that works pretty well is to do a replace operation on the string that changes all "<" to "&lt;". Then if you want to allow certain tags, have additional replace instructions to turn, say, "&lt;B>" back into "<B>"
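
The escape-then-restore idea above might look roughly like this. It is a sketch, not a vetted sanitizer: escape_then_allow() and its default tag list are hypothetical, and it uses htmlspecialchars() for the escaping step rather than a hand-written replace.

```php
<?php
// Hypothetical helper: escape all markup, then restore a short allowlist of bare tags.
function escape_then_allow(string $input, array $allowedTags = ['b', 'i', 'em', 'strong']): string
{
    // Every <, >, &, and quote becomes an entity, so nothing is markup at this point.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    foreach ($allowedTags as $tag) {
        // Restore only the exact attribute-free forms, e.g. &lt;b&gt; and &lt;/b&gt;.
        // str_ireplace() is case-insensitive, so <B> comes back as <b>.
        $escaped = str_ireplace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $escaped
        );
    }
    return $escaped;
}
```

Because only the bare forms are restored, a tag carrying attributes (say, an onclick) stays escaped. One known gap in this sketch: it does not check that restored tags are balanced, so a stray closing tag can come through on its own.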

For SQL you primarily want to change every ' to '' (that's two apostrophes, not a quote mark) I believe.
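
Doubling apostrophes is escaping by hand. An alternative, assuming a PHP version that ships PDO, is to let the driver keep the data out of the SQL text with a prepared statement. In this sketch the in-memory SQLite database, the comments table, and save_comment() are all stand-ins for illustration, and it needs the pdo_sqlite extension to run.

```php
<?php
// Assumption: the pdo_sqlite extension is available; an in-memory database stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)');

// Hypothetical helper: the SQL text is fixed and user data travels separately as parameters.
function save_comment(PDO $pdo, string $author, string $body): void
{
    $stmt = $pdo->prepare('INSERT INTO comments (author, body) VALUES (:author, :body)');
    $stmt->execute([':author' => $author, ':body' => $body]);
}

// An apostrophe-laden payload is stored as plain text rather than run as SQL.
save_comment($pdo, "O'Brien", "'); DROP TABLE comments; --");
```

The same prepare/execute calls work against MySQL through PDO by changing the connection string.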
posted by kindall at 6:12 PM on October 14, 2004


Best answer: The strip_tags() function culls everything formatted using angled brackets, including <?php> and so forth. There is an optional argument for tags you wish to allow.

From there, a trim() gets rid of extraneous whitespace at the beginning and end of the string.

If you don't plan to allow any HTML, use htmlspecialchars() to escape punctuation and other such dross.
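
A rough sketch of those three calls together. The wrapper names clean_comment() and plain_text() and the choice of <b> and <i> as allowed tags are assumptions for the example, not anything PHP provides.

```php
<?php
// Hypothetical helper for a comment box that permits a few formatting tags.
function clean_comment(string $raw): string
{
    // The second argument lists the tags to keep; other tags are removed,
    // but the text between them is left in place.
    return trim(strip_tags($raw, '<b><i>'));
}

// Hypothetical helper for a field where no HTML is allowed at all.
function plain_text(string $raw): string
{
    // ENT_QUOTES escapes both single and double quotes as well as <, >, and &.
    return htmlspecialchars(trim($raw), ENT_QUOTES, 'UTF-8');
}
```

Note that strip_tags() removes a script tag but keeps whatever text sat inside it, so the leftover shows up as ordinary text in the comment.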
posted by Danelope at 7:17 PM on October 14, 2004


Best answer: strip_tags() doesn't strip malicious code inside the tags you allow, such as an onclick attribute on an a tag. The term for this problem is Cross Site Scripting (XSS for short, due to acronym collisions) if that will help people google.
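
To see the attribute problem concretely, here is a small sketch. The strip_attributes() helper is hypothetical and deliberately blunt: it drops every attribute (href included) with a regex, which is a stopgap for illustration and not a substitute for a real HTML filter.

```php
<?php
$input = '<a href="#" onclick="alert(document.cookie)">click me</a>';

// The <a> tag is on the allowed list, so strip_tags() returns it with every attribute intact.
$output = strip_tags($input, '<a>');

// True: the onclick handler survived the call untouched.
$handlerSurvived = ($output === $input);

// Hypothetical follow-up step: reduce each remaining tag to its bare name.
// Assumes attribute values contain no ">" character; real-world HTML can break this.
function strip_attributes(string $html): string
{
    return preg_replace('/<(\/?)(\w+)[^>]*>/', '<$1$2>', $html);
}

$bare = strip_attributes($output);
```

Running strip_attributes() after strip_tags() leaves only attribute-free versions of the allowed tags, at the cost of losing links.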

OWASP has this PHP filter but it doesn't look like they have attribute exclusion yet (check the CVS though). While you're there you should probably check out their top ten web app vulnerabilities and their guide to web app security.
posted by revgeorge at 6:30 AM on October 15, 2004

