One of the issues is actually that web browsers can be inconsistent in how they encode the data they send, and depending on how much cross-browser testing was done, that can yield surprises.
For instance, the "unicode snowman" trick exists because MSIE 5-8 will refuse to send a form as UTF-8 (completely ignoring `accept-charset`) if it can encode everything as Latin-1. Conversely, most other browsers will default to UTF-8 (though I believe Unicode normalization may vary). If the system was built in the early 00s and only tested in MSIE, it might well expect all input data as Latin-1 (because that seemed to work at the time) and crap out when UTF-8 comes in.
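A rough sketch of how a server might use such a sentinel, assuming a hidden form field named `_snowman` (a hypothetical name for this example) whose value is ☃. Because ☃ cannot be encoded in Latin-1, its presence pushes MSIE into UTF-8, and the server can tell which charset it got by checking that field's raw bytes. This is an illustration under those assumptions, not any particular framework's implementation:

```python
from urllib.parse import parse_qsl

SENTINEL = b"_snowman"                   # hypothetical hidden-field name
SNOWMAN_UTF8 = "\u2603".encode("utf-8")  # b'\xe2\x98\x83'


def decode_form(body: str) -> dict[str, str]:
    """Decode an application/x-www-form-urlencoded body of unknown charset."""
    # Percent-decode as Latin-1: every byte maps to exactly one code point,
    # so .encode("latin-1") recovers the original bytes losslessly.
    raw = [
        (k.encode("latin-1"), v.encode("latin-1"))
        for k, v in parse_qsl(body, keep_blank_values=True, encoding="latin-1")
    ]
    # If the sentinel arrived as the UTF-8 bytes of U+2603, the browser sent
    # UTF-8; otherwise fall back to the legacy assumption of Latin-1.
    charset = "utf-8" if (SENTINEL, SNOWMAN_UTF8) in raw else "latin-1"
    # Re-decode every key and value with the detected charset, dropping the sentinel.
    return {k.decode(charset): v.decode(charset) for k, v in raw if k != SENTINEL}
```

With the sentinel present, `decode_form("_snowman=%E2%98%83&name=Jos%C3%A9")` gives `{"name": "José"}`, and a legacy Latin-1 submission like `decode_form("name=Jos%E9")` gives the same result. A modern browser that sends UTF-8 *without* the sentinel would be misread as Latin-1 here, which is exactly the kind of surprise described above. In practice, browsers treat a "Latin-1" label as Windows-1252, so a real system might prefer that as the fallback.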