
That's being worked on†. It hasn't made it into the standard library as far as I know.

E.g., there are two ways to write Cañyon City. You can write the ñ as U+00F1 or as an ASCII lower-case n followed by a combining tilde (U+0303). The first case results in a single rune, and the second in two runes. Example††. You need additional logic to normalize both to a canonical representation and recognize that the two strings are actually the same.

Also, if you are displaying the string, you need to account for the fact that, although the two strings have different byte and rune lengths, they take up exactly the same number of pixels on your display medium.

†http://blog.golang.org/normalization

††http://play.golang.org/p/XJPydELZ6s



>E.g., there are two ways to write Cañyon City. You can write the ñ as U+00F1 or as an ASCII lower-case n followed by a combining tilde (U+0303). The first case results in a single rune, and the second in two runes. Example††. You need additional logic in order to normalize to a canonical representation and realize that the two strings are actually the same.

Who thought that having two ways to go about this was a good idea in the first place?



