Why is my character encoding different on 2 pages on the same website?
November 19, 2008 2:49 PM

Why are two different pages on the same website, showing the same data from the same database fields, displaying it differently (a character encoding issue)?

I have a product detail page that uses UTF-8 encoding to show data about a product (there are hundreds of products; we'll use just one as the example here). If I change the encoding to ISO-8859-1, garbled symbols show up wherever there are umlauts, trademark signs, etc. So this page works in UTF-8.

When the user clicks a link on that page that pops up a print version of the same page (the same data, organized differently for a better printable format), the page shows the garbled symbols when set to UTF-8 and shows the text just fine when set to ISO-8859-1. The exact reverse.
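The "exact reverse" behavior can be reproduced in a few lines. A minimal Python sketch, assuming (this is not confirmed anywhere in the thread) that the product page sends the database text as UTF-8 bytes while the print page's code converts it to ISO-8859-1 bytes first; the product name is made up:

```python
# Hypothetical product name containing non-ASCII characters.
# "®" is used instead of "™" because "™" has no ISO-8859-1 code point.
text = "Müller Grill®"

# Assumed product page output: the text sent as UTF-8 bytes.
product_page_bytes = text.encode("utf-8")
# Assumed print page output: the same text converted to ISO-8859-1 bytes.
print_page_bytes = text.encode("iso-8859-1")


def render(data: bytes, charset: str) -> str:
    """Simulate a browser decoding a response with the chosen charset."""
    # errors="replace" substitutes U+FFFD for invalid byte sequences,
    # much like the replacement symbol a browser displays.
    return data.decode(charset, errors="replace")


for page, data in (("product", product_page_bytes), ("print", print_page_bytes)):
    for charset in ("utf-8", "iso-8859-1"):
        print(f"{page:8} decoded as {charset:10}: {render(data, charset)}")
```

With this sample, the product page's bytes read as `MÃ¼ller GrillÂ®` under ISO-8859-1, while the print page's bytes show replacement characters under UTF-8. Each byte stream looks right with only one charset, and the two pages need opposite ones.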

Same website, same server, same database, same table, same fields, why would this be happening?
posted by Chuck Cheeze to Computers & Internet (7 answers total)
 
Is there a meta tag or XML declaration in either of the pages that sets a charset? If so, is it different from the charset sent in the HTTP headers or the charset of the database? Look for one of these:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

<?xml version="1.0" encoding="iso-8859-1"?>

Then inspect the HTTP headers sent for each page (there are Firefox extensions that let you do this) and your database setup, and resolve any conflicts.
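Here is a rough Python sketch of that comparison, assuming you paste in each page's Content-Type response header and its HTML source by hand. The function names and sample page are made up for illustration, and the regexes are deliberately simple rather than a real HTML parser.

```python
import re

# Simplified patterns for a quick check, not a full HTML parser. The meta
# pattern assumes http-equiv appears before content, as in the tag above.
HEADER_RE = re.compile(r"charset=([\w-]+)", re.IGNORECASE)
META_RE = re.compile(
    r'<meta\s+http-equiv=["\']?Content-Type["\']?\s+content=["\'][^"\']*charset=([\w-]+)',
    re.IGNORECASE,
)
XML_DECL_RE = re.compile(r'<\?xml[^>]*encoding=["\']([\w-]+)["\']', re.IGNORECASE)


def declared_charsets(content_type_header: str, html: str) -> dict:
    """Collect each charset a page declares, keyed by where it was declared."""
    found = {}
    for source, pattern, text in (
        ("http-header", HEADER_RE, content_type_header),
        ("meta", META_RE, html),
        ("xml-decl", XML_DECL_RE, html),
    ):
        match = pattern.search(text)
        if match:
            found[source] = match.group(1).lower()
    return found


def has_conflict(found: dict) -> bool:
    """True when the declarations disagree with one another."""
    return len(set(found.values())) > 1


# Hypothetical print page: UTF-8 in the HTTP header, ISO-8859-1 in the markup.
page = """<?xml version="1.0" encoding="iso-8859-1"?>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />"""
found = declared_charsets("text/html; charset=UTF-8", page)
print(found)  # {'http-header': 'utf-8', 'meta': 'iso-8859-1', 'xml-decl': 'iso-8859-1'}
print(has_conflict(found))  # True
```

Any page where `has_conflict` returns True is sending the browser mixed signals, and which declaration wins depends on the browser.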
posted by expialidocious at 3:09 PM on November 19, 2008


It's not the database, given that the data is from the same fields in the same table.
Is there code on the server which is converting the encoding of the text on one of the pages? In PHP you can use iconv to do this (iirc). Otherwise, as expialidocious says, it's most likely in the headers.
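One way to tell whether one page's code converts the text is to look at the raw bytes each page actually sends. A minimal Python sketch (a heuristic only; the function name is made up): non-ASCII text that decodes cleanly as UTF-8 was almost certainly sent as UTF-8, while bytes that fail strict UTF-8 decoding were probably converted to a single-byte charset such as ISO-8859-1 somewhere on the way out.

```python
def guess_output_encoding(body: bytes) -> str:
    """Heuristic guess at how a response body's non-ASCII text was encoded."""
    if body.isascii():
        # Pure ASCII reads the same in UTF-8 and ISO-8859-1, so no clue here.
        return "ascii"
    try:
        body.decode("utf-8")  # strict mode raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Any byte string is valid ISO-8859-1, so failing UTF-8 suggests a
        # single-byte charset, e.g. after a conversion step such as iconv.
        return "single-byte (e.g. iso-8859-1)"


sample = "Müller Grill®"
print(guess_output_encoding(sample.encode("utf-8")))       # utf-8
print(guess_output_encoding(sample.encode("iso-8859-1")))  # single-byte (e.g. iso-8859-1)
```

To use it on the real pages, pass in each page's bytes exactly as the server sent them (for instance, saved with `curl -o`), not a copy the browser has already re-encoded.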
posted by le morte de bea arthur at 3:21 PM on November 19, 2008


Response by poster: The product detail page shares a common header with the rest of the site, which includes this:

[HTML markup stripped in the original post]

The print page has its own header, but with the same declarations, except that the http-equiv meta tag is set to ISO-8859-1 because of this issue.

I installed Live HTTP Headers in Firefox and ran both pages through it. The Accept-Charset line for both of these pages is ISO-8859-1,utf-8...

I think this is a header sent by the browser. Is there a line in the server's response that tells me what the encoding actually is?
posted by Chuck Cheeze at 3:35 PM on November 19, 2008


Different browsers? Different platforms? Different default settings in a browser? Different language settings assumed as default?

We have a Squid cache between our web servers and the public internet, so situations where we'd typically get hammered off the 'net turn into "Oh, hey, lots of traffic today." We once had a problem where, occasionally overnight, someone from Turkey would hit the website with a browser asking specifically for Turkish language encoding. We didn't have the Turkish language pack installed in Plone (or something like that), so it threw an error, and the cache held on to that error: for the next fifteen minutes, anyone who hit the page got the error about the Turkish encoding not working.
posted by SpecialK at 3:39 PM on November 19, 2008


Response by poster: Sorry, the common header above is the doctype, etc. Standard web-standards stuff.

@SpecialK: This is the same browser on the same machine. Look at the product page, it looks good. Click the print link, it looks bad. Interesting about the Turkish website, but this happens on any given day, from the US, on Mac OS X 10.5 with Safari 3.
posted by Chuck Cheeze at 3:45 PM on November 19, 2008


Your page headers did not come through in the post. If you can send a URL, that'd be helpful.

Yes, the Accept-Charset is a request header, sent by the browser as part of your GET or POST request. Using Live HTTP Headers, look for a block of headers that starts with something like

HTTP/1.x 200 OK

That is the beginning of the response headers for one URL. Below it will be a Content-Type header, something like this:

Content-Type: text/html; charset=UTF-8

Check to see that the charset specified for both pages match each other and match the native charset of the database.
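If you copy the response header block for each page out of Live HTTP Headers, a short Python sketch like the following can pull out and compare the two charsets. The captured text below is hypothetical; substitute what the tool actually shows. It leans on the standard library's `email.parser.HeaderParser`, since HTTP header fields use the same "Name: value" syntax, and skips the status line first.

```python
from email.parser import HeaderParser


def response_charset(captured: str):
    """Return the charset of a captured response header block, or None."""
    lines = captured.strip().splitlines()
    # Skip the status line ("HTTP/1.x 200 OK"); the rest are "Name: value" fields.
    headers = HeaderParser().parsestr("\n".join(lines[1:]))
    # get_content_charset() returns the Content-Type charset parameter in
    # lower case, or None if the header or parameter is missing.
    return headers.get_content_charset()


# Hypothetical captures for the two pages; paste in the real ones.
product_capture = """HTTP/1.x 200 OK
Content-Type: text/html; charset=UTF-8"""
print_capture = """HTTP/1.x 200 OK
Content-Type: text/html; charset=ISO-8859-1"""

product_cs = response_charset(product_capture)
print_cs = response_charset(print_capture)
print(product_cs, print_cs)  # utf-8 iso-8859-1
if product_cs != print_cs:
    print("Charsets differ between the two pages")
```

A `None` result means the server sent no charset at all, in which case the browser falls back on the page's meta tag or its own default.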

I hadn't thought of iconv, but it's also a possibility. It converts text from one encoding to another. Maybe there's some processing that converts the character set differently depending on page?
posted by expialidocious at 3:52 PM on November 19, 2008


I've never had these kinds of glitches when absolutely everything is set to UTF-8 throughout, so it sounds like something's missing. You definitely need to set Content-Type as expialidocious suggests, and it's sensible to also include a meta tag reiterating that header.
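A minimal sketch of the "UTF-8 throughout" setup, written with Python's standard-library `http.server` as a stand-in for whatever server-side language the site actually uses (the thread doesn't say). The page is encoded to UTF-8 bytes, and the Content-Type header and the meta tag both say so, so the browser gets one consistent answer. The database side is out of scope; it's assumed to hand back text already decoded correctly. `render_page`, `PageHandler`, `serve` and the sample text are hypothetical names for illustration.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

CHARSET = "utf-8"  # one charset used for the header, the meta tag and the bytes


def render_page(title: str, text: str):
    """Build headers and body so that every charset declaration agrees."""
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={CHARSET}">\n'
        f"<title>{escape(title)}</title>\n"
        "</head><body>\n"
        f"<p>{escape(text)}</p>\n"
        "</body></html>\n"
    )
    body = html.encode(CHARSET)  # the bytes really are what the header claims
    headers = {
        "Content-Type": f"text/html; charset={CHARSET}",
        "Content-Length": str(len(body)),
    }
    return headers, body


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical sample text standing in for a database value.
        headers, body = render_page("Product", "Müller Grill™")
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000):
    """Serve the sample page on localhost until interrupted."""
    HTTPServer(("localhost", port), PageHandler).serve_forever()
```

Calling `serve()` and opening http://localhost:8000/ should show the sample text correctly with the browser's encoding left on automatic. A print version built from the same `render_page` would inherit the same declarations.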
posted by malevolent at 12:42 AM on November 20, 2008

