
> You can say that about every single design decision made about every product.

No, that's not true. Many aspects of my computer's UI are user-configurable.



Yes, but not every single one of them.


Obviously. All I'm saying is that this particular decision ought not to have been taken from the user. Real alphabetical order is not an unreasonable thing to want.


“Real alphabetical ordering” is incredibly nonspecific. It’s underspecified even for US-ASCII, but essentially meaningless for those of us in 2025 who need to handle Unicode.

How do capital letters sort relative to lowercase letters? How do letters sort relative to digits? How do you consider code points that can correspond to different letters in different lettering systems with different ordering? How do you handle diacritics? Do you want the behaviour to be stable through Unicode normalization? Should it differ based on the character encoding? Should different representations of the same character, such as blackboard lettering or circled numbers, be sorted with other representations of the same character or grouped separately?

You can come up with answers to these questions, but there’s no unambiguously correct option. The least subjective option is sorting based on encoded byte representation (if that is even specified), but that is not “alphabetical” and would not be intuitive to most users.
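
To make the byte-order point concrete, here is a minimal Python sketch. The sample strings are made up for illustration. It assumes UTF-8 as the encoding, where byte order and code point order agree; other encodings need not behave this way.

```python
import unicodedata

# Made-up sample names: mixed case, a diacritic, and digit strings.
names = ["apple", "Zebra", "\u00e9clair", "10", "2"]

# Python compares str values code point by code point.
by_code_point = sorted(names)

# Sorting the UTF-8 bytes gives the same order, because UTF-8
# preserves code point order.
by_utf8_bytes = sorted(names, key=lambda s: s.encode("utf-8"))

print(by_code_point)  # digits, then "Zebra", then "apple", then "éclair"

# The same visible text in two normalization forms.
composed = unicodedata.normalize("NFC", "\u00e9a")    # U+00E9, "a"
decomposed = unicodedata.normalize("NFD", "\u00e9a")  # "e", U+0301, "a"

# Without normalizing first, the two spellings of "éa" end up on
# opposite sides of "f".
mixed = sorted([composed, "ez", "f", decomposed])
```

Digits before capitals before lowercase before "é" is an accident of code point assignment, and the position of "éa" depends on which normalization form the string happens to be stored in.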


You're focusing on the wrong part of the problem when you say "essentially meaningless". Yes, choices must be made about how you order your "alphabet". But the meat of the request is that sorting goes character by character. That's a clear criterion, even with Unicode involved.

And I would say the reasonable way to define "character" is the grapheme cluster, and yes, you want it stable under normalization and encoding.

How capital letters/diacritics/different representations affect the order of your alphabet, and which ones are considered equivalent, is something without a clear answer. Same for whether letters or numbers come first, and where punctuation goes. But you don't need consensus on that to fix the problem in the post.
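
A rough sketch of "character by character, stable under normalization" in Python. The standard library has no grapheme cluster segmentation, so this uses NFC-normalized code points as a stand-in, which is an assumption and not equivalent for every script. `char_key` and the file names are hypothetical.

```python
import unicodedata


def char_key(name: str) -> str:
    """Hypothetical sort key: normalize to NFC, then let Python compare
    the result code point by code point."""
    return unicodedata.normalize("NFC", name)


files = ["2.jpg", "10.jpg", "e\u0301.jpg", "\u00e9.jpg", "f.jpg"]
ordered = sorted(files, key=char_key)

# Both spellings of "é.jpg" produce the same key, so how the name was
# encoded no longer affects where it lands.
same = char_key("e\u0301.jpg") == char_key("\u00e9.jpg")
```

Here "10.jpg" lands before "2.jpg" because "1" precedes "2" at the first character. "é.jpg" still lands after "f.jpg", which is exactly the alphabet-ordering choice this key leaves open.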


I thought it was pretty well-known that capital letters come before lower-case. I think it's punctuation, then numbers, then capital letters, then lower-case. At any rate, that's what textbook indices do (assuming I remember correctly).


The issue at hand is how numbers are sorted. That has nothing to do with Unicode.


Unicode has many different representations of digits, and I would dispute using the term “alphabetical” to refer to digit ordering in any case.


You are starting to sound like a troll. Yes, Unicode has many representations of digits. That has nothing to do with the question of whether 2.jpg should come before or after 10.jpg.


You think users deserve to have control, but you think that control only needs to extend to the treatment of those ten characters, nothing else?

I guess your position is coherent, but it’s very silly.


Those ten characters are of disproportionately high importance (to put it mildly).


You're wrong about that. See UTS #10 § 1.4.

(I did not downvote you.)


"Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format."

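The preprocessing idea in that passage can be sketched in Python. This is not the UTS #10 algorithm, and it swaps the "textual version of the IEEE numeric format" for a simpler length-prefixed encoding. `numeric_sort_key` is a hypothetical helper. It treats any run of Unicode decimal digits as a non-negative integer and ignores signs, decimal separators, and grouping, which are the ambiguities the passage warns about.

```python
import re

# In str patterns, \d matches any Unicode decimal digit, not only 0-9.
_DIGITS = re.compile(r"\d+")


def _encode(match):
    # int() accepts Unicode decimal digits; str() gives canonical ASCII
    # digits with no leading zeros.
    value = str(int(match.group()))
    # Prefix a two-digit length so shorter numbers compare first.
    # Assumption: no number has more than 99 digits.
    return f"{len(value):02d}{value}"


def numeric_sort_key(text: str) -> str:
    """Hypothetical key: rewrite each digit run so that plain string
    comparison orders the numbers by value."""
    return _DIGITS.sub(_encode, text)


labels = ["A-10", "A-2"]
plain = sorted(labels)                          # character by character
numeric = sorted(labels, key=numeric_sort_key)  # numbers compared by value
```

With this key, "A-2" becomes "A-012" and "A-10" becomes "A-0210", so "A-2" sorts first. Arabic-Indic "١٠" is encoded the same way as ASCII "10".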


