Hacker News

> Why would the strings match? Aren't they completely different?

In some theoretical sense, yes. In terms of solving users' problems and providing business value, I need to make it possible for users to find the entry for "Łódź" without typing accents.
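One common hack for this is shown below as a Scala sketch. It is an assumption, not something the JDK ships ready-made. It decomposes the text with `java.text.Normalizer` (NFD), drops combining marks, and then patches letters that have no decomposition at all. `Ł` is one of those: NFD alone turns "Łódź" into "Łodz", so the hypothetical `noDecomposition` table maps it by hand. The table is deliberately tiny and illustrative, not complete.

```scala
import java.text.Normalizer
import java.util.Locale

object AccentFold {
  // Hypothetical, deliberately incomplete table for letters that NFD
  // leaves alone (e.g. U+0141 'Ł' has no canonical decomposition).
  private val noDecomposition: Map[Char, String] = Map(
    'Ł' -> "L", 'ł' -> "l",
    'Ø' -> "O", 'ø' -> "o",
    'Đ' -> "D", 'đ' -> "d",
    'ß' -> "ss"
  )

  // Fold text to an accent-free, lowercase search key.
  def fold(s: String): String = {
    // NFD splits 'ó' into 'o' + U+0301 COMBINING ACUTE ACCENT, and so on.
    val decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
    // Remove every code point in the Unicode "Mark" category.
    val stripped = decomposed.replaceAll("\\p{M}", "")
    // Patch letters that came through decomposition unchanged.
    val patched = stripped.map(c => noDecomposition.getOrElse(c, c.toString)).mkString
    patched.toLowerCase(Locale.ROOT)
  }

  // Substring search on folded keys, so the query "lodz" finds "Łódź".
  def matches(query: String, name: String): Boolean =
    fold(name).contains(fold(query))
}
```

The NFD-plus-strip step is the part a library can do for you. The hand-written table is exactly the kind of per-programmer patch work that the rest of this thread is arguing about.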

> (What language are we talking about, anyway?)

We're talking about Scala; it runs on the JVM and so java.lang.String is the string type.



This right here is where we start to mix up two totally different things by using the same words for them.

This thread is talking about two completely different types of Equals; we shouldn't be using the same word for them and certainly not the same function name in code:

- SomeString1.Equals(SomeString2) -> is every code unit (loosely, every byte) of string 1 identical to the corresponding one in string 2?

- HumanSimilarity(t1, t2) -> given text t1 and text t2, give me a number that tells me how similar a person would perceive these strings to be. You could even go further:

SimilarityForReaderInLocale(locale, t1, locale1, t2, locale2) - for a human reader in locale, given text t1 written by a human in locale1 and text t2 written by a human in locale2, how similar would the reader perceive these two pieces of text?
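To make the naming point concrete, here is a Scala sketch with two differently named functions. `sameCodeUnits` is just `String.equals`, which on the JVM compares UTF-16 code units. Because of that, even canonically equivalent spellings of "é" compare unequal. `humanSimilarity` is a hypothetical, deliberately crude stand-in for HumanSimilarity(t1, t2). It scores edit distance after stripping accents. Neither the name nor the scoring rule comes from any library, and a real perceptual measure would need far more than this.

```scala
import java.text.Normalizer
import java.util.Locale

object TwoEquals {
  // Kind 1: exact identity. Precomposed "é" (U+00E9) and "e" + U+0301
  // render the same but are NOT equal here.
  def sameCodeUnits(a: String, b: String): Boolean = a.equals(b)

  // Kind 2 (hypothetical): a score in [0, 1], where 1.0 means identical
  // after folding and lower means more edits are needed.
  def humanSimilarity(t1: String, t2: String): Double = {
    val a = crudeFold(t1)
    val b = crudeFold(t2)
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  // NFD, drop combining marks, lowercase. Letters such as 'Ł' survive as 'ł'.
  private def crudeFold(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)
      .replaceAll("\\p{M}", "")
      .toLowerCase(Locale.ROOT)

  // Classic dynamic-programming edit distance over UTF-16 chars.
  private def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = cur
    }
    prev(b.length)
  }
}
```

With this sketch, `TwoEquals.sameCodeUnits("Łódź", "Lodz")` is false, while `TwoEquals.humanSimilarity("Łódź", "Lodz")` is 0.75, because only the stroked L still differs after folding.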

When we talk about 'least surprise' in text manipulation, what I think we really mean is that text manipulation should be defined by a locale; no, actually, by a superset of a locale: your typical human reader in that locale.
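The JDK already makes a few manipulations locale-dependent, which covers a small piece of this idea. Lowercasing is the standard example. Turkish treats dotted and dotless i as distinct letters, so the same call gives different results depending on the `Locale` passed in. The wrapper names below are illustrative only.

```scala
import java.util.Locale

object LocaleCase {
  private val turkish: Locale = Locale.forLanguageTag("tr")

  // Locale-neutral case mapping: 'I' <-> 'i'.
  def lowerRoot(s: String): String = s.toLowerCase(Locale.ROOT)

  // Turkish case mapping: 'I' lowercases to U+0131 'ı' (dotless),
  // and 'i' uppercases to U+0130 'İ' (dotted capital).
  def lowerTurkish(s: String): String = s.toLowerCase(turkish)
  def upperTurkish(s: String): String = s.toUpperCase(turkish)
}
```

With these wrappers, `LocaleCase.lowerRoot("TITLE")` gives "title", but `LocaleCase.lowerTurkish("TITLE")` gives "tıtle". This is why code that lowercases identifiers usually passes `Locale.ROOT` explicitly.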


Oh, so you're coding around your users' inability to type the letters they actually want? That sounds basically impossible... you'd have to pick letters according to how similar they look instead of their meaning in any context. And I meant human language! :)


"Impossible" isn't good enough. I could write a bunch of hacks that would cover many of the cases, but the programming language (or, ideally, the Unicode standard) should provide a standard answer, rather than each programmer having to implement this themselves.

Human language? English speakers talking about cities in a variety of countries (which should be displayed with their correct names, but be searchable by English speakers).
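For the cases where accents and case are the only differences, the JDK does ship something closer to a standard answer than hand-written hacks. `java.text.Collator` at `PRIMARY` strength compares base letters and ignores accent (secondary) and case (tertiary) differences. This sketch assumes English collation rules. Whether a letter like `Ł` counts as a variant of `L` depends on the collator's rules, so that case is deliberately not relied on here.

```scala
import java.text.Collator
import java.util.Locale

object LooseCompare {
  // A locale-aware collator that only distinguishes base letters.
  private val collator: Collator = {
    val c = Collator.getInstance(Locale.ENGLISH)
    c.setStrength(Collator.PRIMARY)
    // Decompose precomposed characters such as 'é' before comparing.
    c.setDecomposition(Collator.CANONICAL_DECOMPOSITION)
    c
  }

  def looselyEqual(a: String, b: String): Boolean = collator.compare(a, b) == 0
}
```

This answers "are these the same word, ignoring accents?" for a given locale. It does not answer the broader similarity question raised earlier in the thread.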


Impossible does not stop many people from doing exactly this.



