June 4th, 2005


BART and UTF-8

While waiting for a BART train this afternoon, I was looking at the LED reader board to see when the next train was coming. An advertisement for an upcoming concert came on, but where there was supposed to be punctuation (probably an EM DASH, U+2014), there were instead two ISO-8859-1 characters (some vowel with a diacritic and something else).

Theory: somebody wrote up the advertisement in MS Word (which auto-corrected a double hyphen into an em dash), copy/pasted it into a web form (with the paste preserving the Unicode character), and submitted it to a website that lets you add advertisements to BART reader boards. The browser converted the Unicode in the textarea to UTF-8... then the stupid CGI script blindly took it and passed it off to the reader board, which only knows ISO-8859-1/Latin-1, and not UTF-8, because it was built so long ago.
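
For the curious, here is a rough Python sketch of what that theory would look like byte by byte. It is only an illustration under my own assumptions: the function name misread_as_latin1 is made up, and I have no idea what the real script or the reader board firmware actually does.

```python
# Hypothetical reconstruction of the theory above; not BART's actual code.
EM_DASH = "\u2014"  # the character Word substitutes for a double hyphen


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8 (the browser's job), then decode the
    resulting bytes as ISO-8859-1 (what the board is assumed to do)."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("iso-8859-1")


utf8_bytes = EM_DASH.encode("utf-8")
garbled = misread_as_latin1(EM_DASH)

# One character becomes three bytes in UTF-8.
print(utf8_bytes.hex(" "))  # e2 80 94

# Read one byte at a time as Latin-1, those become three code points.
print([f"U+{ord(ch):04X}" for ch in garbled])  # ['U+00E2', 'U+0080', 'U+0094']
```

Byte 0xE2 is a lowercase a with a circumflex in Latin-1, which fits the "vowel with diacritic" on the board. The other two bytes, 0x80 and 0x94, are control codes in strict ISO-8859-1, so whatever the board draws for them probably depends on its own font. Plain ASCII text survives the round trip untouched, which would explain why the rest of the advertisement looked fine.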

But I stilllllll love tech-nol-ogyyyyyyy....
... but noooooooot as much as-you-you-see.....