
One of the issues is actually that web browsers can be inconsistent in how they encode the data they send, and depending on how much cross-browser testing was done, that can yield surprises.

For instance, the "unicode snowman" exists because MSIE 5-8 will refuse to send a form as UTF-8 (completely ignoring `accept-charset`) if it can encode everything to Latin-1. Conversely, most browsers will default to UTF-8 (but I believe normalization may vary). If the system was built in the early 00s and only tested in MSIE, it might well expect all input data as Latin-1 (because that seemed to work at the time) and crap out when UTF-8 comes in.
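To make the failure mode concrete, here is a minimal Python sketch of a backend that assumes every form field is Latin-1. The function name `decode_as_latin1` and the sample string are made up for illustration; this is not any particular site's code, just the general shape of the bug.

```python
def decode_as_latin1(raw: bytes) -> str:
    """Hypothetical legacy handler: assumes all form bytes are Latin-1."""
    return raw.decode("latin-1")


text = "café"

# What a browser sends when it picks Latin-1 for the form body.
sent_as_latin1 = text.encode("latin-1")   # b'caf\xe9'

# What a browser sends when it picks UTF-8 for the same text.
sent_as_utf8 = text.encode("utf-8")       # b'caf\xc3\xa9'

print(decode_as_latin1(sent_as_latin1))   # café
print(decode_as_latin1(sent_as_utf8))     # cafÃ©
```

Note that the Latin-1 decode never raises, since every byte maps to some character, so the mangling is silent unless something downstream checks for it.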



What does "unicode snowman" have to do with this?


Some websites now include a hidden input field in all forms: <input type="hidden" name="snowman" value="&#9731;" />

to convince IE that it's supposed to be sending UTF-8, not Latin-1 (and so the site can recognize if the input was likely mangled).
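The server-side half of that trick could look roughly like the sketch below. It assumes you can get at the raw, percent-decoded bytes of the `snowman` field; the helper name `form_encoding_ok` is hypothetical, and exactly what a mangled value looks like depends on the browser, so the check only asks whether the bytes are the UTF-8 encoding of U+2603.

```python
from urllib.parse import unquote_to_bytes

SNOWMAN = "\u2603"  # the character behind &#9731;


def form_encoding_ok(raw_snowman: bytes) -> bool:
    """Return True if the hidden field arrived as a UTF-8 snowman.

    Hypothetical helper: anything else (invalid UTF-8, or a different
    string) suggests the form was not submitted as UTF-8.
    """
    try:
        return raw_snowman.decode("utf-8") == SNOWMAN
    except UnicodeDecodeError:
        return False


# A UTF-8 submission carries the snowman as %E2%98%83 in the form body.
print(form_encoding_ok(unquote_to_bytes("%E2%98%83")))  # True

# Bytes that are not a UTF-8 snowman are flagged.
print(form_encoding_ok(b"&#9731;"))                     # False
print(form_encoding_ok(b"\xe9"))                        # False
```

If the check fails, the safest reading is "don't trust the encoding of the other fields either", which is all the hidden field is there to tell you.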


It's built into Rails, except they use utf8=✓ now.



