URLs with ampersands (&) screwing up my website database calls.
January 23, 2008 8:38 AM

For the past few days I've been configuring my .htaccess files to support search engine-friendly URLs for a product-based website that I'm building... something like www.xyz.com/toys/Spiderman/ instead of www.xyz.com/main.php?productID=3424wqe533

The keyword in the URL (e.g. "Spiderman") is the product title, and it lets the PHP script look up the record in MySQL. So far so good.

The only problem is that things fall apart if a product has an ampersand in its name, because the browser thinks it's a variable separator. For example, "Superman & Friends" yields a URL like www.xyz.com/toys/Superman & Friends/

What can I do to overcome this? I've spent a lot of time reading up on and experimenting with rawurlencode(), but to no avail.
posted by arrowhead to Computers & Internet (7 answers total)
I had the same problem recently on one of my sites. The only thing I could find that looked like a "real" solution involved editing the httpd.conf file; you might want to look into RewriteMap and the 'escape' map, if you can make changes to httpd.conf (RewriteMap can't be used in a .htaccess file).

The solution I ended up using doesn't work in the case where a user types a URL with an ampersand directly, but I just changed my code to escape the ampersand in links. It's not ideal, but since the particular feature was meant to be followed by links and not direct URLs, it's not likely to affect things too much.

Basically, what I did was to replace ampersands in URLs (in the Perl script that generated the page) with '%2526'. %25 is the URL escape sequence for the % character, so Apache unescapes that and you end up with %26, which is the escape sequence for an ampersand. CGI.pm (in my case) doesn't try to use that as a separator, and I end up with the ampersand in the param correctly.
posted by Godbert at 8:52 AM on January 23, 2008
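
A minimal sketch of the double-escaping trick described above, in Python rather than the Perl the answer mentions. The helper name `double_encode_segment` is made up for illustration, and it escapes the whole title twice, whereas the answer only replaced the ampersand. The point is that one round of decoding (the step the answer attributes to Apache) still leaves `%26` in place, and only the second round yields a literal `&`.

```python
from urllib.parse import quote, unquote


def double_encode_segment(title: str) -> str:
    """Percent-encode a title twice (hypothetical helper, not from the thread)."""
    once = quote(title, safe="")   # "&" becomes "%26", " " becomes "%20"
    return quote(once, safe="")    # "%" becomes "%25", so "%26" becomes "%2526"


link_segment = double_encode_segment("Superman & Friends")

# First decode: stands in for the unescaping done before the script runs.
after_first_decode = unquote(link_segment)

# Second decode: stands in for the parameter parsing in the script itself.
after_second_decode = unquote(after_first_decode)
```

In PHP, calling rawurlencode() twice on the title should give the same kind of result.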

Encode your URL with urlencode to replace & with its corresponding escape sequence (%26).
posted by Blazecock Pileon at 9:00 AM on January 23, 2008
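
As a rough illustration of that advice, here is a Python sketch that percent-encodes each path segment before building the link. The function names and the `/category/title/` layout are assumptions based on the example URL in the question. In PHP, rawurlencode() is the closer match for path segments, since urlencode() turns spaces into `+`.

```python
from urllib.parse import quote, unquote


def encode_path_segment(text: str) -> str:
    """Escape everything except unreserved characters, including "&" and "/"."""
    return quote(text, safe="")


def build_product_url(category: str, title: str) -> str:
    """Build a path like /toys/<title>/ (layout assumed from the question)."""
    return f"/{encode_path_segment(category)}/{encode_path_segment(title)}/"


url = build_product_url("toys", "Superman & Friends")

# The receiving script reverses the escaping to get the title back.
recovered_title = unquote(url.strip("/").split("/")[1])
```

Whether the server passes `%26` through to the script untouched depends on the rewrite rules in use, so this only covers the link-building side.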

You should be escaping these. Put a record in your database that reads "<blink>" and "&mdash;" as a test. You should see exactly those two values in your /rendered/ web page. You must not write from a non-HTML source to a web page without encoding it. That's a big no-no.

(It's also a big no-no to write to the database from the web server without decoding it and re-encoding.)

posted by cmiller at 9:27 AM on January 23, 2008
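
The test suggested above can be sketched in Python with the standard `html.escape` function (PHP's htmlspecialchars() plays the same role). The `render_title` helper and the `<h1>` wrapper are invented for the example. If the escaping is right, the page source contains the escaped forms and the browser displays the original strings literally.

```python
import html


def render_title(title: str) -> str:
    """Wrap a database value in markup, escaping it for HTML first."""
    return "<h1>" + html.escape(title) + "</h1>"


# The two test records suggested in the answer above.
blink_markup = render_title("<blink>")
mdash_markup = render_title("&mdash;")
```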

URL encoding these titles is the simple solution, but your URLs can get ugly doing that. You can solve this by stripping all but alphanumeric characters out of your title, saving that as a separate field in your database, and doing lookups based on that field.

So this:

could become something more like this:

You just have to watch out for duplicate URLs if you're doing that, as it's not a 1-to-1 translation.
posted by scottreynen at 10:09 AM on January 23, 2008
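
One way the slug idea could look, sketched in Python. The names `slugify` and `unique_slug`, the choice of hyphens between words, and the numeric suffix for duplicates are all assumptions; the answer only says to strip non-alphanumeric characters and to watch for collisions.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "item"  # fallback for titles with no alphanumerics at all


def unique_slug(title: str, taken: set) -> str:
    """Return a slug not already in `taken`, appending -2, -3, ... on collision."""
    base = slugify(title)
    candidate = base
    suffix = 2
    while candidate in taken:
        candidate = f"{base}-{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate


taken_slugs = set()
first = unique_slug("Superman & Friends", taken_slugs)
second = unique_slug("Superman + Friends", taken_slugs)  # same base slug as above
```

In a real site the `taken` set would be a uniqueness check against the slug column in the database rather than an in-memory set.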

...and then laugh about it.
posted by rush at 2:30 PM on January 23, 2008

(It's also a big no-no to write to the database from the web server without decoding it and re-encoding.)

No, if you're concerned with SQL injection attacks, it's a big no-no to write to the database from the application server without building prepared statements. Decoding and encoding data is a poor substitute for that.
posted by me & my monkey at 2:38 PM on January 23, 2008
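
A small sketch of what a parameterized lookup looks like, using Python's built-in `sqlite3` as a stand-in for MySQL. The table and column names are invented for the example. The title is passed as a bound parameter, so ampersands and quote characters are treated as data rather than as SQL.

```python
import sqlite3

# In-memory database standing in for the site's MySQL product table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO products (title) VALUES (?)", ("Superman & Friends",))


def find_product(connection, title):
    """Look up a product by title using a bound parameter, never string concatenation."""
    cursor = connection.execute(
        "SELECT id, title FROM products WHERE title = ?", (title,)
    )
    return cursor.fetchone()


hit = find_product(conn, "Superman & Friends")
# A classic injection attempt is compared as a plain string and matches nothing.
miss = find_product(conn, "x' OR '1'='1")
```

In PHP the equivalent would be a prepared statement through mysqli or PDO.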

what's wrong with a url that reads:


(or similar)?

Gives you the keyword in the title but means you can use the actual productID for the database lookup.
posted by garius at 7:47 AM on January 24, 2008
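
The example URL in this answer did not survive, so the layout below, `/toys/<productID>/<title>/`, is purely a hypothetical reading of it. Under that assumption, the script would look up the record by ID and ignore whatever appears in the title segment, so an ampersand there can no longer break the lookup.

```python
import re

# Hypothetical layout: /toys/<productID>/<anything-readable>/
PRODUCT_PATH = re.compile(r"^/toys/(?P<product_id>[A-Za-z0-9]+)/(?P<slug>[^/]*)/?$")


def product_id_from_path(path: str):
    """Return the product ID from the path, or None if the path does not match."""
    match = PRODUCT_PATH.match(path)
    return match.group("product_id") if match else None


clean = product_id_from_path("/toys/3424wqe533/superman-friends/")
messy = product_id_from_path("/toys/3424wqe533/Superman%20%26%20Friends/")
invalid = product_id_from_path("/toys/")
```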
