I started writing a comment on this, but it got *very* long, so here: http://www....

stevoski · on June 17, 2010

Good article, Patrick. It is so easy to say "oh, those stupid/lazy/naive web-form programmers". But this is actually a fiendish problem to solve correctly, as your article demonstrates.

oconnore · on June 17, 2010

True, but accepting, say, 1-1000 arbitrary Unicode characters is a step in the right direction.
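The rule is easy to state, but "characters" needs a definition. Here is a minimal sketch in Python. It assumes "character" means a Unicode code point after NFC normalization; counting grapheme clusters would be another reasonable reading. The function and constant names are hypothetical.

```python
import unicodedata

# Hypothetical bounds taken from the "1-1000 characters" suggestion.
MIN_LEN = 1
MAX_LEN = 1000

def is_acceptable_name(name: str) -> bool:
    """Accept any string of MIN_LEN..MAX_LEN code points, counted after NFC normalization."""
    # NFC folds e.g. "e" + combining acute accent into a single precomposed "é",
    # so visually identical input is counted the same way.
    normalized = unicodedata.normalize("NFC", name)
    return MIN_LEN <= len(normalized) <= MAX_LEN

print(is_acceptable_name("张伟"))  # True
print(is_acceptable_name(""))      # False
```

No character class is rejected here. Apostrophes, hyphens, spaces, and non-Latin scripts all pass. Only the length is checked.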

pjscott · on June 17, 2010

And that's probably the best trade-off between correctness and complexity.

jheriko · on June 17, 2010

What's wrong with rejecting no characters and using escaping/encoding to avoid injection problems, etc.? That catches every possible edge case, even the ones I don't know yet.

The hard problem is validating the data, but you don't actually have to do it.
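One way to read the "accept everything, encode at the boundaries" suggestion is as a minimal Python sketch. It assumes a SQLite store and HTML output, and the table and function names are hypothetical. The name is passed to SQL as a bound parameter rather than spliced into the query, and it is HTML-escaped only when rendered.

```python
import html
import sqlite3

def store_name(conn: sqlite3.Connection, name: str) -> int:
    # The "?" placeholder sends the name as data, never as SQL text.
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid

def render_greeting(name: str) -> str:
    # Escape at output time, for the HTML context specifically.
    return "<p>Hello, " + html.escape(name) + "!</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

tricky = "Robert'); DROP TABLE users;--"
row_id = store_name(conn, tricky)
stored = conn.execute("SELECT name FROM users WHERE id = ?", (row_id,)).fetchone()[0]
print(stored == tricky)  # True: stored verbatim, table intact
print(render_greeting("<script>alert(1)</script>"))
```

The design choice is that encoding depends on the destination (SQL, HTML, a shell, a CSV file). So the raw value is stored unchanged and each output path applies its own escaping.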