How to fix an Atom feed?
October 26, 2005 1:58 AM

I'm told that my weblog's Atom feed is broken: according to feedvalidator.org, this is because my feed 'appears to be encoded as "utf-8", but my server is reporting "US-ASCII".' Should I persuade my server to report a 'utf-8' encoding, and, if so, how?

Here's the feed in question. I'm running MT 3.2: it's only since I upgraded to this version of MT that the site has been trying to encode in utf-8. According to the feedvalidator site, I should 'either ensure that the charset parameter of the HTTP Content-Type header matches the encoding declaration,' or 'ensure that the server makes no claims about the encoding.' How do I do this? I have limited (CPanelX) access to my webserver and a decidedly faint and incomplete grasp of the technicalities involved.
posted by misteraitch to Computers & Internet (9 answers total)
 
Your problem is that nothing ever actually specifies the encoding of your feed, so Feed Validator is assuming US-ASCII by default.

The easiest way to fix this is to modify Movable Type so that it outputs <?xml version="1.0" encoding="utf-8"?> as the first line of atom.xml.
posted by nmiell at 2:38 AM on October 26, 2005


Assuming you haven't modified your server since posting your question, the HTTP headers in your feed don't specify any particular character encoding. The Atom feed you're serving does specify utf-8 encoding, as nmiell suggested above. So any reasonable software is going to have the information it needs to know your encoding. I.e., no major problem that I see.

It would be marginally better to convince Apache to serve a character encoding (a charset parameter) in the HTTP headers. But be sure it's the right one! I don't know how to convince Apache to do that, sorry.
posted by Nelson at 3:03 AM on October 26, 2005


Response by poster: Thanks nmiell, Nelson; I've changed the atom feed template to specify UTF-8. Feedvalidator listed some other problems with atom.xml, which I'm working on fixing now - I guess it could have been one of these other issues that led to the feed's being unreadable.
posted by misteraitch at 3:10 AM on October 26, 2005


Response by poster: OK: it looks like I was thrown by the validator displaying the utf-8 message first: the feed passes for valid after I fixed some of the other issues with invalid characters in entry titles, etc.
posted by misteraitch at 4:00 AM on October 26, 2005


Best answer: "Assuming you haven't modified your server since posting your question, the HTTP headers in your feed don't specify any particular character encoding"

Not true. It's currently served as "text/xml". For text/xml, leaving out the charset parameter automatically means the feed is treated as "US-ASCII", not that no encoding is specified.

Three ways to solve this:
1) Send "Content-Type: application/xml" or "application/atom+xml". This results in the feedreader/validator looking inside the document at the <?xml ?> declaration.
2) Send "Content-Type: text/xml; charset=utf-8"
3) Escape all non-ASCII characters so that the feed is valid US-ASCII. This requires replacing every character with a code of 128 or above with a numeric character reference such as &#233; (for é).
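
To make the three cases above concrete, here is a minimal Python sketch of how a strict consumer might decide which encoding to assume. The function name effective_xml_encoding is made up for illustration, and the logic is simplified: it ignores byte-order marks, and it assumes the text/* default of US-ASCII and the XML default of UTF-8 when nothing else is declared.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration.
_ENCODING_DECL = re.compile(r'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def effective_xml_encoding(content_type, xml_prolog):
    """Return the encoding a strict consumer would assume (simplified).

    content_type: value of the HTTP Content-Type header, or None if absent.
    xml_prolog:   first line of the document, e.g. the <?xml ... ?> declaration.
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            # Option 2: an explicit charset parameter wins.
            return charset
        if msg.get_content_maintype() == "text":
            # text/xml with no charset parameter: US-ASCII is assumed.
            return "us-ascii"
    # Option 1 (application/xml, application/atom+xml) or no header at all:
    # fall back to the declaration inside the document.
    match = _ENCODING_DECL.search(xml_prolog)
    return match.group(1).lower() if match else "utf-8"


prolog = '<?xml version="1.0" encoding="utf-8"?>'
print(effective_xml_encoding("text/xml", prolog))              # us-ascii
print(effective_xml_encoding("text/xml; charset=utf-8", prolog))  # utf-8
print(effective_xml_encoding("application/atom+xml", prolog))  # utf-8
```

The first call reproduces the mismatch the validator complained about: the document says utf-8, but the bare text/xml header implies US-ASCII.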
posted by cillit bang at 4:42 AM on October 26, 2005


Response by poster: cillit bang: forgive my ignorance, but in your first two solutions, where would I put those declarations? In my confused muddlings after the MT 3.2 upgrade, I had resorted in effect to your solution #3 to get stuff to display OK, but it seems this caused some Atom validity issues when I included escaped characters in weblog entry titles...
posted by misteraitch at 5:25 AM on October 26, 2005


The headers are sent before the actual document. I don't use Movable Type so I don't know how to do this, but you can't change them by editing the template.

If you go with option 3, you need to remove the "utf-8" from the <?xml ?> declaration.
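
For what option 3 looks like in practice, here is a rough Python sketch (not Movable Type's own mechanism; the helper name to_ascii_xml is hypothetical). It relies on Python's built-in "xmlcharrefreplace" error handler, which swaps each non-ASCII character for a numeric character reference. It does not escape markup characters such as & or <, so it is meant for text that is otherwise already XML-safe.

```python
def to_ascii_xml(text):
    """Encode text as US-ASCII bytes, replacing every non-ASCII character
    with an XML numeric character reference (for example, é becomes &#233;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


title = "Café Müller"
print(to_ascii_xml(title))  # b'Caf&#233; M&#252;ller'
```

Numeric references like these are valid in any XML document, whereas named HTML entities such as &eacute; are not predefined in XML, which may be one source of validity errors when escaping by hand.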
posted by cillit bang at 5:42 AM on October 26, 2005


Best answer: If your web server runs Apache, you can put the following line into a file called ".htaccess" in the directory containing your feed (assumes your web server is configured to allow this):

AddCharset UTF-8 .xml

See the W3C internationalization FAQ for details.
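
To confirm that a change like this took effect, you can look at the Content-Type header the server actually sends. The following is a small Python sketch using only the standard library; the function names are invented for illustration and the URL in the comment is a placeholder, not the feed from this thread.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_of(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def served_content_type(url):
    """Send a HEAD request and return the raw Content-Type header, or None."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get("Content-Type")


# Usage (placeholder URL; substitute the address of your own feed):
#   header = served_content_type("https://example.com/atom.xml")
#   print(header, "->", charset_of(header))
```

If the .htaccess line is being honored, the header should carry a charset parameter and charset_of should report "utf-8"; if it reports None, the server is still sending no charset.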
posted by mbrubeck at 6:50 AM on October 26, 2005


Response by poster: That did the trick, mbrubeck, many thanks. My thanks also to cillit bang, for your answers.
posted by misteraitch at 11:43 AM on October 26, 2005

