In-place replacement of charset barf in static html files?
November 24, 2008 7:39 PM
How do I replace non-printable barf from charset mangling with sed/awk or perl? I have a collection of flat html files which at some point in the past got corrupted charset-wise. You can see an example broken file here. Apache serves them up utf-8 in a clearly broken way, but dropping in a .htaccess to force iso-8859-1 doesn't help (see here) and ditto windows-1252 (see here). When I open the files in vim or less, I see "<89>" as if it were one char for what should be ’ (right curly quotation mark). I don't know how to replace that in a programmatic way since it's not a literal bracket-eight-nine-bracket. Halp?
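For what it's worth, here is a rough sketch of a byte-level replacement, written in Python rather than sed/awk/perl. It rests on an assumption that hasn't been verified against the actual files: that the "<89>" shown by vim and less is the single raw byte 0x89. It could just as well be the two-byte sequence C2 89, so a hex dump of one broken file is worth checking first. The names `fix_bytes`, `fix_file` and `fix_tree` are made up for illustration, and the replacement is the `&rsquo;` entity so the result doesn't depend on which charset Apache declares.

```python
import pathlib

# Assumption: the character displayed as "<89>" is the single raw byte 0x89.
# If a hex dump shows the two bytes C2 89 instead, pass bad=b"\xc2\x89".
BAD = b"\x89"
GOOD = b"&rsquo;"  # HTML entity for the right curly quote, plain ASCII


def fix_bytes(data: bytes, bad: bytes = BAD, good: bytes = GOOD) -> bytes:
    """Return data with every occurrence of the bad byte sequence replaced."""
    # Caveat: 0x89 can also occur inside valid multibyte UTF-8 characters,
    # so this blunt replace is only safe if the files hold nothing like that.
    return data.replace(bad, good)


def fix_file(path, bad: bytes = BAD, good: bytes = GOOD) -> bool:
    """Rewrite one file in place, keeping a .bak copy. True if it changed."""
    p = pathlib.Path(path)
    original = p.read_bytes()  # read as bytes so no decoding is attempted
    fixed = fix_bytes(original, bad, good)
    if fixed == original:
        return False
    p.with_name(p.name + ".bak").write_bytes(original)
    p.write_bytes(fixed)
    return True


def fix_tree(root, pattern: str = "*.html"):
    """Fix every matching file under root; return the paths that changed."""
    return [str(p) for p in sorted(pathlib.Path(root).rglob(pattern))
            if fix_file(p)]
```

Calling `fix_tree("/path/to/site")` (a placeholder path) would walk the directory and rewrite only the files that contain the byte. Working on bytes rather than decoded text is deliberate: the files are not valid in any one charset, so decoding them first would either fail or mangle them further.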