
I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare, and the situation where someone wants "10" to be before "9" is far more common. Moreover, desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criterion), so technically they're not lying.

> I miss the time when computers did what you told them to, instead of trying to read your mind.

You may be looking at that time through rose-tinted glasses. I don't like when computers lie to me either, but "mind-reading" is really helpful in ways we take for granted, like autosave. Desktops can have an option to sort files truly alphabetically, but the more common case should always be the default; that's the definition of "intuitive".

* https://news.ycombinator.com/item?id=45404022#45405279



I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)

---

I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

This works, but it looks kind of ugly and creates extra work—yes I have scripts to automate it, it's still an extra step—and it would be great if I could just trust that every device will understand numbers.
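
The automation step mentioned above might look roughly like this in Python. It is only a sketch: `zero_pad` is a hypothetical helper, it works on name strings rather than touching the filesystem, and it assumes the chapter number is the first run of ASCII digits in each name (so the "3" in ".mp3" is left alone).

```python
import re

FIRST_NUMBER = re.compile(r"[0-9]+")

def zero_pad(names):
    """Pad the first digit run in every name to the width of the longest one."""
    matches = [FIRST_NUMBER.search(n) for n in names]
    width = max((len(m.group()) for m in matches if m), default=0)
    # count=1 so only the first digit run is rewritten
    return [FIRST_NUMBER.sub(lambda m: m.group().zfill(width), n, count=1)
            for n in names]

chapters = ["Chapter 1.mp3", "Chapter 10.mp3", "Chapter 2.mp3"]
padded = zero_pad(chapters)
print(sorted(padded))  # ['Chapter 01.mp3', 'Chapter 02.mp3', 'Chapter 10.mp3']
```

Because the width is derived from the longest number present, rerunning it after the collection grows past 99 chapters would repad everything to three digits.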


> I don't want to put leading zeroes before all the single digit numbers in my file names.

> ... it would be great if I could just trust that every device will understand numbers.

Strings are not numbers, even if some part of their content "looks like a number."

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

So what are programs to do?

Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.


> Display strings in a consistent, documented, manner.

IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.

> this is your preference for a specific situation.

Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!

Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:

    1
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    2
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    3
    30
    31
    32
    4
    5
    6
    7
    8
    9
If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
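
To make the two orderings concrete, here is a minimal Python sketch of the "treat any sequence of digits as a number" rule. It is an illustration under stated assumptions, not any file manager's actual implementation: `natural_key` is a made-up name, only ASCII digits 0-9 count as digits, and a digit run is placed where the character "0" would sort relative to other characters.

```python
import re

def natural_key(name):
    """Sort key: ASCII digit runs compare as base-10 integers,
    every other character compares by its code point."""
    key = []
    for run in re.findall(r"[0-9]+|[^0-9]", name):
        if "0" <= run[0] <= "9":
            key.append((ord("0"), int(run)))   # whole digit run acts as one "character"
        else:
            key.append((ord(run), 0))
    return key

names = [str(n) for n in range(1, 33)]
lexicographic = sorted(names)                  # plain code-point order
natural = sorted(names, key=natural_key)

print(lexicographic[:4])  # ['1', '10', '11', '12']
print(natural[:4])        # ['1', '2', '3', '4']
```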


Do I understand that you want these to be sorted like this?

  1
  2
  9
  10
  11

So I guess you also want things sorted like

  1.1
  1.2
  2
  9
  9.9

And also

  1
  1.1
  1.10
  1.2
  1.10.1

So when you're done defining whatever crazy rules you think up, how do I pause whatever and edit the filenames to get them back into lexicographical order?

You can massage lexicographical to meet your needs. I can't massage your arbitrary rules to meet my needs.


Your examples don’t need any extra rules to be sorted correctly. The basic idea is that any sequence of digits is treated for sorting as if it were a single character. On my iPhone, your examples are sorted as expected.


Would you sort

  1.10
  1.2
or

  1.2
  1.10
?

I would not know how an OS treats those unless we assume mind-reading rather than proper lexicographic order. Why replace precision with vagueness when simply naming files properly would suffice?


Ah yes sorry, 1.10 comes after 1.2 because 10 is bigger than 2 (so in fact different from your example). But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.

If you have non-integer numbers in your filenames then it won’t give the order you want, but there isn’t going to be a rule that works for all cases.


I was with you until this point, but 1.2 is bigger than 1.10, because 1.2 is a shortened version of writing 1.20 _unless_ you explicitly want these to be version numbers or something like that. The normal expectation would be to treat numbers as, well, mathematical numbers, and not SemVer, especially if we only have one decimal point, don't you think?


As I said, the sorting rule won’t always give pleasing results, but it seems to me like a simple and reasonable modification of lexicographic ordering.


It is neither simple, nor reasonable.

1.10, the number, is equivalent to 1.1. It is less than 1.2. You say you want numbers to sort as numbers, but you want 1.10 to be greater than 1.2.

Do you consider '1/4' to be a number? Should it come before or after '1/3'?

I'm guessing that you don't want to sort one character at a time if you encounter one of [0-9]. Instead, you want to group all consecutive [0-9] as a single sortable number. But aren't characters '.', ',', '/', '-' also part of numbers?

What about numbers like ↋, 五, π, B, ⅔, or -1?


It doesn’t work for decimals. It also doesn’t work for pi, or most dates. That’s okay. Supporting those cases would require “reading your mind” / trying to guess what the user wants by applying opaque rules. I certainly don’t want that.

Treating consecutive digits as numbers is a simple modification (I still think it’s quite simple) that is easy to understand and supports 99% of real-world use cases.
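
The disagreement over 1.2 versus 1.10 can be reproduced in a few lines of Python. This is a sketch under the same assumption as the rule itself (ASCII digit runs compared as integers; `natural_key` is a hypothetical helper), set against sorting the same strings by their decimal value.

```python
import re

def natural_key(name):
    """Digit runs compare as integers; other characters by code point."""
    key = []
    for run in re.findall(r"[0-9]+|[^0-9]", name):
        if "0" <= run[0] <= "9":
            key.append((ord("0"), int(run)))
        else:
            key.append((ord(run), 0))
    return key

names = ["1.2", "1.10", "1.1"]

by_digit_runs = sorted(names, key=natural_key)  # version-number style
by_value = sorted(names, key=float)             # decimal-number style

print(by_digit_runs)  # ['1.1', '1.2', '1.10']
print(by_value)       # ['1.10', '1.1', '1.2']  (1.10 and 1.1 are the same value)
```

Neither result is wrong in itself; the digit-run rule simply never looks at the "." and so cannot know which reading was intended.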


> But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.

What level of assumption is expected from the sorting system here? Would it have to process ALL entries of the list to find multiple decimal points and then assume that they are ALL versions and not numbers?

How to treat this on different locales, where the decimal point is a comma and thousands-separator is a dot. Should the locale then also be considered by that system? Also when listing the folder of a remote-system with a different locale?

What about dates, should that system attempt to sort entries with multiple date-formats (yyyy-mm-dd, dd-mm-yyyy, dd-MMM-yyyy,...)?

The topic is far more complex than this narrow example. If we expect such a system to alter its sorting based on some data format interpretation, there is a risk of misinterpretation which might make the whole list unusable...


It has nothing to do with decimal points. It just looks at any contiguous sequence of digits and treats it as a single character for the purposes of sorting. The decimal point could be any other character and the behavior would be the same.


So only whole numbers are sorted as numbers then.

Decimal numbers are treated as strings and will have a completely different order, with digits after the decimal point sorted differently to whole numbers without fractions?

Or do you mean every run of contiguous digits within the same string is treated as an individual whole number?

Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.

--> This could be covered by adjusting the logic based on the amount of decimal points.

And the logic complexity keeps increasing, up to an arbitrary point of "no, this will not be considered", resulting in an unpredictable user-experience of sorting...


>Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.

Yes. I don’t see why this is a big deal.

I didn’t suggest adjusting the logic based on the number of decimal points.


Ah ok.

I understand that you found your perfect trade-off for sorting based on longer considerations. But it will be difficult to communicate such a concept to a user.

Applying partial rules to improve sorting in one direction is not a lossless activity, it makes the UX actually worse in other scenarios as the user is first guided to assume a certain behavior, but then learns that his expectation is broken in adjacent scenarios (Which is more or less the bottom-line of that article to begin with).

In the end it'll be just "another standard" for sorting [0]

[0] https://xkcd.com/927/


> But it will be difficult to communicate such a concept to a user.

This isn't a prerequisite, since the existing naive character sort approach is not communicated either. In fact, it's almost universally unexpected by any user who hasn't written a naive string sort. Apple doesn't do this, and I very much did not need it communicated to me why 10 was coming after 2, because that's what everyone, who's not a programmer, expects.

As a litmus test, go ask some people, who are not programmers, without loading the question beyond "here are some files, how would you expect them to be displayed in a list?". Show the lists side by side. It should not surprise you.


I consider 八 to be a whole number.


There is a rule that works for all cases. It's lexicographical sorting.

Simple. Consistent. Easy to manipulate to get what you want.


We just discussed a situation where lexicographical sorting doesn’t work. Adding in a rule to treat consecutive digits as one number doesn’t significantly complicate the logic and makes sorting work for a major additional use case. It doesn’t magically fix every case but it fixes a common one with minimal downsides.


> IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent.

Are you sure about that?

  So how do you suggest handling hexadecimal numbers?
  Or octal numbers?
  What about binary numbers?
  What about file names with portions of a date and/or time?
  How is a program supposed to know any of the above?

> Let's say I have a directory of 32 numbered files.

Assuming any of the filesystems I am aware of is in use, those names are strings having one or two characters. They are not "numbered files."


Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it. We got to learn this in school in the 80-ies because sorting paper documents would be more logical and easier to find stuff. So way before most people got computerized.

It just happens to be the most logical way to sort for computers too, as long as humans are involved in the usage of the data.
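
A quick Python check of why the YYYY-MM-DD order is convenient: a plain string sort of such dates is already chronological, while the same dates written day-first are not. The sample dates are invented for illustration.

```python
from datetime import datetime

iso = ["2024-01-25", "2023-12-31", "2024-02-03"]
day_first = ["25.01.2024", "31.12.2023", "03.02.2024"]  # same three dates

# Plain string sort versus a sort on the parsed date
iso_sorted = sorted(iso)
iso_chrono = sorted(iso, key=lambda s: datetime.strptime(s, "%Y-%m-%d"))

day_first_sorted = sorted(day_first)
day_first_chrono = sorted(day_first, key=lambda s: datetime.strptime(s, "%d.%m.%Y"))

print(iso_sorted == iso_chrono)              # True
print(day_first_sorted == day_first_chrono)  # False
print(day_first_sorted)  # ['03.02.2024', '25.01.2024', '31.12.2023']
```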


> Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it.

That would be great, but this ISO is just one of the standards, and there are still regional standards as well.

And that's still ignoring the end-user. In Europe for example, humans might create filenames with date in format dd.mm, e.g. "Report 25.01.xls"

A system attempting to sort this intelligently would likely assume this is a decimal number, as it has zero context for it.

It's just slightly worse than the lack of consistent UTC-usage of systems, with the mixed attempts to correct data to local timezone (or not) depending on application...


Okay, I'll refine the rule to "Treat any sequence of digits as a base 10 whole number for the purpose of sorting". I still think this is quite clear. (Frankly, I also think the original definition is quite clear unless you're purposefully trying to misinterpret it.)

> those names are strings having one or two characters. They are not "numbered files."

Yes they are! In this context, a number is an idea, not a data type. Strings are capable of containing numbers.


I generally agree that treating substrings that are numbers as numbers is a good default for most users in most situations.

However, for hex numbers this simply won't give good results because some of them will just happen to not contain any of the digits A to F and be treated as base-10 numbers by the heuristic while others will include these digits and be sorted differently.

(So, having a strict lexicographic mode as an alternative in file managers would be nice.)


Octal or binary numbers are going to be fine, but it'll totally and confusingly mess up hexadecimal numbers.
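
A small Python sketch of the hexadecimal problem. The file names are invented, and `natural_key` is a hypothetical stand-in for the digit-run rule (ASCII digits only, runs compared as base-10 integers), not a real file manager's code.

```python
import re

def natural_key(name):
    """Digit runs compare as base-10 integers; other characters by code point."""
    key = []
    for run in re.findall(r"[0-9]+|[^0-9]", name):
        if "0" <= run[0] <= "9":
            key.append((ord("0"), int(run)))
        else:
            key.append((ord(run), 0))
    return key

# Two-digit uppercase hex suffixes: 0x09, 0x0A, 0x0B, 0x10, 0x1F
names = ["img_0A", "img_09", "img_10", "img_1F", "img_0B"]

by_hex_value = sorted(names, key=lambda n: int(n[4:], 16))
plain = sorted(names)                    # code-point order
natural = sorted(names, key=natural_key)

print(by_hex_value)  # ['img_09', 'img_0A', 'img_0B', 'img_10', 'img_1F']
print(plain)         # ['img_09', 'img_0A', 'img_0B', 'img_10', 'img_1F']
print(natural)       # ['img_0A', 'img_0B', 'img_1F', 'img_09', 'img_10']
```

For fixed-width uppercase hex, the plain sort happens to agree with the numeric order, while the digit-run rule splits "0A" into the number 0 followed by the letter A and scrambles the list.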


I am not sure any of the points you raised change anything about the OP's point, do they?

OP was talking about changing the rule to something more intuitive; in that case it would seem natural that decimal numbers are used.


Your concept appears to have coherence until you consider that numbers are not necessarily expressed in decimal notation. What about hexadecimal numbers in filenames? Should they be sorted your way?

And what about very long strings of digits in the filenames - so long that they are too long for even the longest available numerical representation? In some apps, they are converted to floating point...
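
The floating-point pitfall is easy to demonstrate in Python, along with one way around it. This is a sketch only: `digit_run_key` is a hypothetical helper that compares digit runs without converting them to a fixed-size number, by stripping leading zeros and then ordering by length and text.

```python
a = "12345678901234567890"
b = "12345678901234567891"

# As 64-bit floats the two runs are indistinguishable
print(float(a) == float(b))  # True

def digit_run_key(run):
    """Order digit runs numerically without any numeric conversion."""
    stripped = run.lstrip("0") or "0"   # "000" should still mean zero
    return (len(stripped), stripped)    # longer run = bigger number; ties by text

print(digit_run_key(a) < digit_run_key(b))      # True
print(digit_run_key("9") < digit_run_key("10")) # True
```

Python's own `int` is arbitrary precision, so `int(a) < int(b)` also works there; the length-then-text comparison is for languages where integers have a fixed width.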


> "Treat any sequence of digits as a number for the purpose of sorting" is consistent.

How about decimal numbers, are they strings or still numbers?

How about version numbers with multiple dots?

How about decimal numbers of a different locale, e.g. you list the folder from a remote machine with filenames of a different locale?

The problem with such semi-consistent schemes is that they are still guess-work, they may make some cases better for some people, but other cases practically unusable because the system doesn't have sufficient information to handle all scenarios consistently.


> Strings are not numbers, even if some part of their content "looks like a number."

Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.

> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

> So what are programs to do?

> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.

These do not follow from each other.

First, the assertion that "peoples' preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.

Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.

Either way, this line of reasoning is just wrong.


> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense

Strictly speaking 9, 1 and 0 are not in the alphabet so can't be sorted alphabetically.

And I think most "normal users" wouldn't expect that programmers generalize the alphabet like we do.


> This works, but it looks kind of ugly

Maybe I'm weird but I prefer the way zero padding looks :)

I personally think the misalignment of lines where the numbers have different lengths looks (a lot) uglier than having zero padding. Sometimes it even throws _me_ off because the numbers have different lengths and ... well it just doesn't look sorted to me! :)

So the bonus of zero padding is that it'll be sorted correctly even if the file manager tries to be "smart" and sort incorrectly.


Well, that's not alphabetical order.

It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.


What desktop environment called this alphabetical?


This is a really important point - my file manager just says "Name" with sorting. So while it's not perfectly defined, it doesn't make the promise of saying it's alphabetical.


I mean, nine does come before ten in alphabetical order.


> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Amen.

> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".

> (all?)

Well, I kind of hate to say this, but Apple got this right with the iPods. They even regarded the metadata fields `sort-*` (e.g. sort-album), movement-name (for series) and movement-index (for part). With these fields they really group and sort my audio books as I expect it to be.

I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].

1: https://audiobookshelf.org

2: https://github.com/PaulWoitaschek/Voice

3: https://tinymicros.com/wiki/Apple_iPod_Remote_Protocol


There’s a couple of reasons I don’t use m4b files:

- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.

- Some MP3 players (even those that support AAC) don’t support M4B.

- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)

- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)

I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.


And maybe someone else uses “American” style dates in their file names mm-dd-YYYY, can those also be put in correct order for those users?


That is just silly notation used by a minority in this world ;-)


Perhaps, but if you set your browser language to US English you have dates displayed as MM.DD.YYYY and there's no way to change it to either European or ISO (YYYY-MM-DD) format.


I'm not sure I agree. I think I could be convinced if there was a unique and universal representation for numeric values using characters.

But we have so many textual representations of numeric values that I'm assuming the "mind-reading" goodness only works for a small subset. And the subset will be somewhat intuitive for developers but unlikely to be so for non-technical people.

For example, does the order handle numbers with fractions (decimal points)? If yes, does it require at least one leading digit (zero)? Does a.12345 come before or after a.345?

Does it handle thousand separators? What about international thousand and decimal separators (e.g. Euro-style . for thousand separation and , for decimal separation).

Does it handle scientific notation?

If the answer is no to any of these questions, it's likely to lead to surprise/confusion.

It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.


The sort rules are simple (1). Treat any consecutive sequence of digits as a number when sorting. So for example version numbers (which must be massively more common than decimals in filenames) work correctly, and 5.9 is indeed smaller than 5.10 and the latter is not identical to 5.1 .

Given that this idea goes back more than two decades, has been the default behaviour of the most used OSes for many years, with no major outcry, I think empirically we can be fairly certain that it does not routinely lead to a lot of surprises and confusion.

(1) https://en.m.wikipedia.org/wiki/Natural_sort_order


> The sort rules are simple

In considering the simplicity of the rule, I think you're using a developer's perspective here where we automatically classify numbers and have a clear mental model of the separation between value and representation.

But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.

> with no major outcry

I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.


> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.

I am a highly technical user that works with a lot of people with traditional engineering degrees but little to no software experience (except as frequent users). The answer here is that they've learned that all computer software is arcane and mysterious, and so they just accept that there will be strange patterns they have to pick up on, and that's their role as a user. They don't complain about strange and confusing behavior because they treat all the behavior as strange and confusing.


> traditional engineering degrees

What does that mean? What disciplines? I cannot believe that all junior graduates in engineering disciplines in the 2020s are not doing some programming, even if just writing macros in a CAD program.


Most of the people I work with are 35+, but even the juniors in MechE, Aero, etc. tend to have some scripting experience that doesn't necessarily translate to having a robust intuition about DBs, the relationship between frontend and backend design, etc.


> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.

Because EVERYTHING a computer does to non-developer/technical users is "strange and confusing". With few exceptions, most people have no idea why their computer does something the way it does, or how they could make it do something different even if they wanted it to. And most of the time, when they complain about it to someone knowledgeable the answer will be some variant on "that's just sort of the way it is". Imagine a world where the names are sorting the way that the OP is looking for, you're still having to explain to someone why the first group sorts "out of order" and the second group sorts "in order". And if they complained, they would almost certainly get an answer that is some variant on "that's just sort of the way it is".


And if you explain in detail about how it works, a lot of people (not all, but quite a few of the more obstreperous types who raise these as CRITICAL BUGS with solutions apparently SO SIMPLE MY DOG COULD IMPLEMENT IT) will then say "I don't know why you have to make it all so complicated, things were simpler and better in v(n-12) in 1997".

If you add an option you're making it more complicated, harder to document and less discoverable, if you don't it's "useless", if you use a heuristic it's "too magical". Eventually someone has to be unhappy.


> But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.

You don't have to explain it if the situation never comes up.

I'd bet 99.9% of computer users don't have any files which would trigger this edge case in a situation they would actually notice. Decimals just aren't that commonly used in this context, and even if you do have decimals the sorting will still work a lot of the time. For the remaining 0.1%, chalk it up to a bug.

I literally had to test this on my Mac just now because I never realized it was broken.


> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.

It reminds me of the recent article here titled something like "Altoids by the mouthful". We just get used to eating cat poop and we never realize it is not a good idea to eat cat poop, not that we should make it more palatable by chasing the cat poop by chewing Altoids by the mouthful.

Edit: for today's lucky ten thousand

https://news.ycombinator.com/item?id=45343449


> Treat any consecutive sequence of digits as a number when sorting.

Based on this description, I have no idea how the following would be sorted:

• photo.jpg

• photo1.jpg

• photo01.jpg

• photos.jpg


Does it matter?

There's a user expectation that photo20.jpg comes after photo3.jpg.

There's no user expectation around whether photo1.jpg or photo01.jpg comes first. Just like there's no user expectation around whether photo1.jpg or Photo1.jpg comes first. Users also don't have the slightest idea about what order punctuation gets sorted in.

Just sort the things that matter in the way users expect (natural sort order) and come up with something reasonably consistent for the rest.


> There's a user expectation that photo20.jpg comes after photo3.jpg.

my user expectation is the opposite

i get what you're saying but it's not achievable in practice, at least not consistently


It sounds like a problem with too many expectations therefore someone will be disappointed.


> There's a user expectation that photo20.jpg comes after photo3.jpg

I expect photo20.jpg to come first.

> There's no user expectation around whether photo1.jpg or photo01.jpg comes first.

Clearly photo01.jpg comes first.

> Just like there's no user expectation around whether photo1.jpg or Photo1.jpg comes first.

Of course Photo1.jpg comes first because uppercase comes before lowercase.

It really sounds like you're using the word "user" to mean "dumb" and I wonder, what got you to the point that you started considering yourself an expert on "dumb" and feeling the need to defend "dumb" ?

I'm sorry but it all comes off so condescending, like "users" are a different+lower species or something.


> Does it matter?

Yes. An algorithm must be unambiguously specified for all possible inputs.


> An algorithm must be unambiguously specified for all possible inputs.

And it is. It's just that some outputs may not match what the user expects. TFA's preferred algorithm (simple lexicographic sorting) matches user expectations 90% of the time. The algorithm actually in use on most OSs (simple lexicographic sorting + treat consecutive digits as combined numbers) matches expectations 99% of the time. An algorithm that matches expectations 100% of the time doesn't exist. Shouldn't we pick the 99% algorithm?

(I am admittedly making up the actual percentages, but you get the point.)


I get your point but I still disagree (also about the percentages btw). Can you also get _my_ point?

Well-designed machines quite _often_ operate against "user" expectations when those expectations are wrong.

For instance say if I charge my phone for an hour, it'll last for a day. How long will it last when I charge it for two hours? Because in practice the answer is either "also a day" or it is "the battery catches on fire", this machine acts _against_ user expectations and stops charging the phone after an hour.

Maybe an even better example: coins! I dunno about coins in the US but get this: the 5 eurocent coin is _bigger_ than the 10 eurocent coin! I dunno why, or if there even is a good reason for that, but it doesn't seem to bother "users" of money (e.g. everybody) when they have to sort out cash.

Anyway my point is that even if _some_ (but definitely not all!) people may expect numerical sorting, doesn't mean that they're right ... and it's not like lexicographic sorting is rocket science and zero padding .. well I think you said you don't like the way it looks, but I actually think it looks very neat because things line up and it's actually easier to read for me, as well :)

It's dumbing things down, in a bad way. It's like hiding the inner workings of stuff, and it's a mistake to think that even if somebody is not familiar with computers that they are _stupid_. People might even get curious and figure out that numbers come before uppercase and those come before lowercase. And maybe one day someone comes along and says "you know that's because of ASCII?" and they learn a thing! Which is cool.

Instead it's like you're painting people scratching their heads wondering "why number not go up?"


I just tried it on Mac, it's sorted in the order you listed. Extending it a bit, the order is:

photo1 photo01 photo001 photo0001 photo2

So the shorter representation of the same number comes first. It does make intuitive sense to me.
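
The observed order can be reproduced with a digit-run key plus a tie-break on how long each run is. To be clear, this Python sketch is a guess that happens to match the list above, not the Finder's actual algorithm, and `natural_key_with_tiebreak` is a made-up name.

```python
import re

def natural_key_with_tiebreak(name):
    """Primary: digit runs as integers, other characters by code point.
    Secondary: lengths of the digit runs, so '1' sorts before '01'."""
    primary, lengths = [], []
    for run in re.findall(r"[0-9]+|[^0-9]", name):
        if "0" <= run[0] <= "9":
            primary.append((ord("0"), int(run)))
            lengths.append(len(run))
        else:
            primary.append((ord(run), 0))
    return (primary, lengths)

names = ["photo2", "photo0001", "photo01", "photo1", "photo001"]
print(sorted(names, key=natural_key_with_tiebreak))
# ['photo1', 'photo01', 'photo001', 'photo0001', 'photo2']
```

Without the secondary part, "photo1" and "photo01" would compare as equal and their relative order would depend on the input order.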


But did it show as a list or an ordered collection of folders? And the second time you opened the folder did it rearrange into a haphazard scattering with items off the edge of the window?


> I just tried it on Mac, it's sorted in the order you listed. Extending it a bit, the order is:

> photo1 photo01 photo001 photo0001 photo2

What you enumerated is known as "ascending lexicographical ordering" and has nothing to do with "the shorter representation of the same number", but instead the ASCII[0] character values in each file name.

0 - https://man.freebsd.org/cgi/man.cgi?query=ascii&apropos=0&se...


With ASCII lexicographic ordering, photo01 would come before photo1.


> With ASCII lexicographic ordering, photo01 would come before photo1.

Good catch. I did not closely inspect the ordering initially presented, which was:

  photo1 photo01 photo001 photo0001 photo2

This does not represent a valid sort order based solely on file names produced by `ls`, which would be:

  photo0001 photo001 photo01 photo1 photo2
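
The same code-point ordering can be checked in Python, whose default string comparison goes character by character on code points. This mirrors what `ls` shows in a byte-order locale; the assumption that `ls` is running in such a locale is mine, since other locales may collate differently.

```python
names = ["photo1", "photo01", "photo001", "photo0001", "photo2"]

# Default str comparison: character by character, by code point
print(sorted(names))
# ['photo0001', 'photo001', 'photo01', 'photo1', 'photo2']

# The same rule puts uppercase before lowercase, and "." before digits before letters
print(sorted(["photo1.jpg", "Photo1.jpg"]))               # ['Photo1.jpg', 'photo1.jpg']
print(sorted(["photos.jpg", "photo1.jpg", "photo.jpg"]))  # ['photo.jpg', 'photo1.jpg', 'photos.jpg']
```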


1 and 0 aren't even in the alphabet so in "alphabetical order" I still wouldn't know a priori how that's sorted.

I guess?

  photo.jpg
  photo[nine].jpg
  photos.jpg
  photos[zero][one].jpg


There is a standard algorithm - CLDR collation. There are several options available but, generally speaking, it’s a standard.

The specific option for numeric sorting is “kn”.

As far as I can tell, every operating system and many other interfaces tend to use this standard algorithm.

https://www.unicode.org/reports/tr35/tr35-collation.html#CLD...


> If the answer is no to any of these questions, it's likely to lead to surprise/confusion.

Worse, if the answer is yes to any of these questions, it's also likely to lead to surprise/confusion. The only way to win is not to play.


The entire idea that numbers would be treated on a character by character basis rather than as numbers is somewhat intuitive for developers and not for non-technical people.

The answer to all of those questions is no for lexicographic ordering. Lexicographic ordering leads to surprise and confusion as a result.

> It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.

It's been on Windows and macOS for coming up on 25 years, and is in practically every modern UI. It’s reasonable.


Are filenames likely to include those representations? I feel like probably not (can you even include commas in Windows filenames?)

More to the point of the article--if you want things sorted by date, sort by date. I think most laypeople aren't looking at long CHAR1234_5678 filenames anyway, they're looking at thumbnails and dates.


> can you even include commas in Windows filenames?

Yes.

> Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following: The following reserved characters:

< (less than)

> (greater than)

: (colon)

" (double quote)

/ (forward slash)

\ (backslash)

| (vertical bar or pipe)

? (question mark)

* (asterisk)

https://learn.microsoft.com/en-us/windows/win32/fileio/namin...


> if you want things sorted by date, sort by date

Unfortunately it doesn't work. When I copy the files, they all get new dates in whatever random order they happened to be copied in.


The most common date format used in Europe uses period separators so can often appear in filenames. Commas are probably more rare. Things like versions are often fractional like v1.3 or v1.11 and can appear embedded in filenames.


That's not fractional though.

Proper fractional, 1.11 is smaller than 1.3.

In versions, 1.11 is larger than 1.3
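The difference between the two readings can be shown in a few lines of Python. This is only a sketch: `version_key` is a hypothetical helper that assumes purely numeric, dot-separated components.

```python
def version_key(text):
    # Treat each dot-separated component as its own integer.
    return tuple(int(part) for part in text.split("."))

# Read as decimal fractions, 1.11 is the smaller number.
as_fraction = float("1.11") < float("1.3")

# Read as versions, component 11 is larger than component 3.
as_version = version_key("1.11") > version_key("1.3")

print(as_fraction, as_version)
```

Both comparisons print True, which is exactly why a sorter has to guess which reading the user intended.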


Ah, the classic filenames with decimal points and scientific notation in them, so common...


Here's a different scenario: filenames with dates in them. Consider September Budget and October Budget. September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th.

The problem is that there is no such thing as natural, and it is quite hard to determine what is more common. (Quite often, what is more common is culturally dependent or, worse, context dependent.)


So the argument is that because this doesn't solve the challenge of ordering all possible strings by semantic meaning it should not be used?

Even though it increases the match between semantic meaning and string sorting in many important cases and is a simple and consistent rule?


the one true way: budget_09.csv, budget_10.csv


Then `budget_100.csv` comes by and now you need to rename 99 files.
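A rough Python sketch of that renaming chore, assuming names of the form `budget_NN.csv`; `plan_renames` is a hypothetical helper that only computes the old-to-new mapping and does not touch the disk.

```python
import re

def plan_renames(names, width):
    """Map each name to a copy whose first digit run is zero-padded to `width`."""
    plan = {}
    for name in names:
        padded = re.sub(r"\d+", lambda m: m.group(0).zfill(width), name, count=1)
        if padded != name:
            plan[name] = padded
    return plan

# 99 two-digit files, plus the newcomer that forces a third digit.
names = [f"budget_{n:02d}.csv" for n in range(1, 100)] + ["budget_100.csv"]
plan = plan_renames(names, width=3)

print(len(plan))
print(plan["budget_09.csv"])
```

With these inputs the plan covers all 99 older files, and `budget_09.csv` maps to `budget_009.csv`.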


It’s been about two thousand years since the number of months in a year has been increased. I don’t think we’re getting 88 new ones anytime soon.


Sure, but if in this case the number only indicated the month, you would have an issue way earlier than 100: you already have one at month 13, when you would go back to 01 and overwrite the old file.


Presumably there is a separate directory for every year.


And it's been about 25 years since we had to increase the number of digits for a year.

budget_97.csv, budget_98.csv, budget_99.csv, budget_2000.csv


> It’s been about two thousand years since the number of months in a year has been increased.

What? What are you thinking of? The number of months in a year is always 12 or 13 in any calendar system because they start by reflecting the moon. If you mean the Christian calendar, it was fixed at 12 months to the year well over 2000 years ago. If you mean any calendar, it's probably been more like one year since the number of months in a year has been increased. 12 lunar months falls short of a solar year by about 11 days, so any given lunar calendar will generate an extra month about every three years, and there are lots of different lunar calendars.

(For example, the Chinese calendar occasionally repeats full months in order to keep the month of the year lined up with the season. Whenever this happens, there will be 13 months in the year, of which two share the same name.)


The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is. Either that, or when month 6 got renamed August in honor of Emperor Augustus

[1] https://en.wikipedia.org/wiki/Roman_calendar#Legendary_10-mo...


> The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is.

Well, in the first place (as you note), there is no reason to believe that claim - the ancient Romans never made such a claim, but the classical Romans made that claim about the ancient Romans - but more importantly even if it were true the months would have been added many centuries prior to "about two thousand years" ago. Nothing related to additional months happened two thousand years ago.


Given that 09 and 10 refer to months, that won't ever be a problem. And if you want to differentiate the years too, you can prefix with 2025- or put them in a 2025/, 2026/ etc. folder.


Even better, I'd prefer to have more semantic meaning, and for budget-2025-09.csv, budget-2025-10.csv to work everywhere...


>September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th. The problem is that there is no such thing as natural

Yeah, but there is such a thing as "give a predictable and consistent way I can name the files so that they sort as I want everywhere" which (if different OSes don't try to be "smart") would have been to prefix them with the numeric date zero padded.


Budget 2025-09.ods and Budget 2025-10.ods would sort reliably.

The options explode infinitely if you start trying to guess what people want in terms of semantic grouping. One user might want to see "September Budget" beside "September Sales Projections" and "September Calendar", and another might want to group it with "October Budget" and "November Budget".

If you have simple, stupid, but predictable tools, people can work around that, by picking naming conventions and even directory groupings that achieve what they want.

The worst is when you have an enforced sort that's not what you want. I think in Windows now, even if you say "Sort by name" in the Downloads directory, it insists on sub-grouping by age. I want every version of the Foobaz spec I downloaded, and no, I don't remember if all of them were in the last 3 months!


There is a simple criterion for ordering file names: treat sequences of characters as alphabetical, and sequences of digits as numbers.

It's easy to understand and predictable; it just happens to not be based on ASCII character codes, which is a legacy technology method only ever meaningful to US developers.
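That rule fits in a few lines of Python. This is a minimal sketch, not any file manager's actual implementation; `natural_key` is a hypothetical name, and comparing the text runs case-insensitively is an added assumption.

```python
import re

def natural_key(name):
    # Split into alternating text and digit runs; the capture group keeps
    # the digit runs in the result, always at the odd positions.
    parts = re.split(r"(\d+)", name)
    # Digit runs compare as integers, text runs as case-folded strings.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

files = ["item-10.txt", "item-9.txt", "Item-2.txt", "item-1.txt"]

by_code_point = sorted(files)
by_natural = sorted(files, key=natural_key)

print(by_code_point)
print(by_natural)
```

The plain sort puts `Item-2.txt` first (uppercase sorts before lowercase) and `item-10.txt` before `item-9.txt`; the natural key yields item-1, Item-2, item-9, item-10.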


You can easily disable grouping in Windows Explorer.


Date is already in the metadata, it doesn't need to be in the filename.


Have you ever copied a file?


Yes, have you never edited the metadata? Also most filesystems these days preserve it when copied, e.g. my camera's EXFAT filesystem on an SD card gets the creation date preserved when I copy it to my PC or NAS, or between NAS & laptop later.


> Yes, have you never edited the metadata?

I don't even know what that means.

And just because some OS's copy the creation date doesn't mean all of them do. Specifically, the most popular desktop OS -- Windows -- doesn't.

(And it has nothing to do with your filesystem. It's your OS.)


>I don't even know what that means

Obviously something like:

  touch -t 202309271530 myfile.txt


And I'm supposed to do that manually for each of the couple hundred photos I copy...?

I'm sorry if I have a hard time taking that suggestion seriously.
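For what it's worth, the per-file step can be scripted. The following Python sketch makes several assumptions: the directory and file names are made up, `set_mtime` is a hypothetical stand-in for `touch -t`, and only access and modification times are handled. Creation time is a separate, platform-specific matter that this does not address.

```python
import os
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def set_mtime(path, when):
    """Rough equivalent of `touch -t`: set access and modification time."""
    stamp = when.timestamp()
    os.utime(path, (stamp, stamp))

def copy_preserving_mtime(src_dir, dst_dir):
    """Copy every regular file, asking copy2 to carry the timestamps along."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, dst / src.name)

# Demo in a throwaway directory with a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "camera"
    src_dir.mkdir()
    photo = src_dir / "IMG_0001.jpg"
    photo.write_bytes(b"not really a photo")
    set_mtime(photo, datetime(2023, 9, 27, 15, 30))
    copy_preserving_mtime(src_dir, Path(tmp) / "backup")
    original_mtime = photo.stat().st_mtime
    copied_mtime = (Path(tmp) / "backup" / photo.name).stat().st_mtime
    print(int(original_mtime) == int(copied_mtime))
```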


>Yes, have you never edited the metadata?

Is your suggestion that people edit the metadata to get the sorting they want? madness...


Agreed. What's more, the idea that people learn to put leading zeros is wrong and impractical, unless you know in advance how many digits you need. When you go from version 5.9.17 to 5.10.0 you don't go back and relabel every existing folder as 5.09.17.

Today's standard way of sorting is well defined, unambiguous, and natural. Lexicographic has its place, but user-facing interfaces ain't it.


Had this in the Beat Saber mod manager recently. The game released 1.40.10 and my mod manager suddenly thought that game went backwards from 1.40.9


I had a similar fun problem with a little tool for use with an ATSC TV tuner.

For context, while NTSC program selections were typically indexed by channel ("ABC here is channel 4, NBC is channel 6"), ATSC uses "subchannels" like "12.1" or "21.5". I had assumed these could be safely stored as a decimal type.

Then one of the broadcasters here introduced both "42.1" and "42.10" and it broke the key model in the underlying SQLite database I kept the channel info in.
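The collision is easy to reproduce with Python's `decimal` module. This is only an illustration of the numeric-equality problem, not of the SQLite schema in question; `subchannel_key` is a hypothetical alternative that keeps the two parts as separate integers.

```python
from decimal import Decimal

# As decimal numbers, the two labels are the same value, so they cannot
# both serve as distinct keys.
same_number = Decimal("42.1") == Decimal("42.10")
channels = {Decimal("42.1"): "first", Decimal("42.10"): "second"}

def subchannel_key(label):
    # Keep major and minor as separate integers instead of one decimal.
    major, minor = label.split(".")
    return (int(major), int(minor))

distinct = subchannel_key("42.1") != subchannel_key("42.10")

print(same_number, len(channels), distinct)
```

The dictionary ends up with a single entry, while the tuple keys (42, 1) and (42, 10) stay distinct.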


No

Just no

User interfaces that try to be clever are a pita.

Keep it simple, and avoid the confusion with corner cases that otherwise will baffle users. Like this


Lexicographic order is great when you need an unambiguous criterion that will work the same in every implementation; but you only need that for automated processing, i.e. for coding.

For user-facing presentation, having 5.9.xxx before 5.10.xxx is simpler; the corner case that baffles users is having 5.1 and 5.10 before 5.2.


Some (most) systems will sort 5.9 after 5.10 though, so if the user is baffled they'll need to learn it anyway. Adding a second way to do it kinda makes things worse


LOL I can tell you don't have the experience of designing UI and shipping product to end users

> Keep it simple

What's simple? Good defaults make things simple, which means putting 9 before 10 in this case, for the reason explained by parent.


I think the only problem is that it's a surprise and mystery, particularly because "dumb" alphabetical sort has existed forever. When they "fixed this" for the 99% of regular users' cases, they should have made it a "smart natural sort" option separate from the "strict alphabetical sort" option (next to date, size, etc). Simple and obvious, rather than surprisingly different from the decades of experience that even non-technical users already have.


It's not just the one decision though; there are literally thousands, maybe tens of thousands, of these decisions in most software. You want every single one of them to have an option? You want it to support every single combination? At some point, it is ridiculous. Sometimes you just have to decide how your software is going to work and not leave every single decision to the user.


You don’t leave every decision to the user, you make good defaults, but leave the option to override to the user! And thousands isn’t scary as long as groups/tags/search work, so what’s ridiculous about empowering the user?


Increasing the number of different possible combinations of settings your software can be running with by a factor of one nonillion is not a choice I’d make if I wanted to have any confidence in its reliability and security.


That's why you write small programs. It won't take long for most programs to bloat to the level where they're dealing with nonillions of combinations, whether the user has control over those combinations or not.


How the files sort seems kinda important. It gets at the core behavior of the program. It's not something superficial like a default icon, which the user probably can change.


It may be one of thousands of decisions, but it's one of a handful that are exposed in the user interface as a fundamental action.


In a file manager? Any more than the displayed thumbnails, icon size, whether folders are separated from files, whether images are separated from videos, what video types are supported, what file types are opened inline, what the click and double click behaviours are, etc?

And yeah kde has settings for all these but kde is also known for being too configurable.


There's such thing as too many options, and there's also such thing as too few. This is one of the important ones. I'd say that macOS, Gnome, and Windows have definitely hidden or removed a lot of important options in the past decade, and despite the modern slickness mesmerizing people into thinking they're easier to use, they're actually harder to use as a result.

(I say this as a professional developer and power-user of all 3 desktops over the past 25 ish years, who also helps non-technical family and friends a few times every year. Some people will be like "oh I'm so bad at computers lol" or "oh this is a piece of junk huh" but really the UI just got dumber in the name of "ease of use", and the expert has to be called in to decipher it.)


I might be wrong on this, but I vaguely recall that on macOS back when you could commonly option-click to reveal advanced options, if you held option when clicking a sort it would change how it sorted from alphabetical to lexical or vice versa. I’m not a thousand percent sure of it, though, I think when I needed it I was able to set a directory preference via terminal to change how a specific directory was sorted and it was an option there. MacOS had (or has) a lot of buried options which I presume date back to its origins as a Unix as well as a convenience to its developers. A lot of the command line utilities were hacked calls to graphical settings code though, so it wasn’t very stable version to version as the UI calls changed and nobody prioritized non-UI bug fixes or breaking changes. These days CLI is nearly forgotten or assumed to be an exploit vector - see Screen Time data for example.


But the alternative would be a surprise to people who assume "by name" will order numbers, including those who are new to technology (and I think most non-technical people who sort things manually unknowingly order numbers).

We want to minimize surprises and mysteries, but computers have so much hidden complexity it's impossible to eliminate them. If users were shown a full description of how every feature on their computer worked before using it, they'd quickly start ignoring the descriptions. There should probably be a tooltip or "manual entry" for "by name" for those who are curious, and it should never be labeled "alphabetical" because it's not. But cases like the author's, where he assumes a feature works differently than most people (including the designers) assume, can't be helped.


> and the situation where someone wants "10" to be before "9" is far more common.

I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.

> desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criteria), so technically they're not lying.

FYI the more formal name for the "by name" order is "natural sort order".


> I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.

Depends on which direction you're sorting in, no?


> Depends on which direction you're sorting in, no?

In a vacuum: yes. In this particular case: no, because we have the article's context clarifying that we're talking about ascending order.


It’s more confusing. I thought the article was correct when they said -10 coming before -9. Why? Because they were talking about the strict alphabetical sort. They are already prepending zeroes to force the comparison to be 10 vs 09. So, yes, they were talking about ascending order, but not natural ascending order, but ascii sorting order where 10 is before 9 because the comparison isn’t 9 vs 10, but 1 vs 9.

It was only clear to me because I could guess where they were going. They were complaining about natural sort vs alphabetical sort, which is a case I’ve run into many times, so I could see the argument coming.

The irony to me was that they were already altering how they named files to fit what they thought the computer wanted by prepending a zero to get a proper alphabetic sort. And even after that, some computers didn’t follow their idea of what it should be doing.


You mean file9 before file10?

I have some beef with Microsoft, that you can only change this at the computer level, not per user (see registry key below). Also they call it natural sorting for users, but logical sorting internally. Unify your terminology!

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer] "NoStrCmpLogical"=dword:00000001


To change it per user, set it in the user's hive instead of in the local machine hive (e.g. HKEY_CURRENT_USER instead of HKEY_LOCAL_MACHINE)


TIL they are called "hives". Windows Registry is an interesting thing. Even casual users have to interact with it once or twice w/o fully understanding it.

https://learn.microsoft.com/en-us/windows/win32/sysinfo/regi...


Raymond Chen explained why a registry file is called a “hive”:

Because one of the original developers of Windows NT hated bees. So the developer who was responsible for the registry snuck in as many bee references as he could. A registry file is called a “hive”, and registry data are stored in “cells”, which is what honeycombs are made of.

https://devblogs.microsoft.com/oldnewthing/20030808-00/?p=42...



Thanks for fixing the link!


I mean, if you're running regedit at all you are not a casual user.


> I agree with Microsoft/Google/KDE's order.

I don't. I want string sorting to be string sorting. Filenames are strings.

I wouldn't mind if there was an option to tell the file manager to do this "wrangle numbers out of strings and treat them as numbers" thing--so that I could turn that option off, and others who want that behavior could turn it on.

But for this to be the default, without even a way to change it (except in Dolphin, it looks like)? That seems daft to me.

Btw, I use Trinity Desktop, and I just verified that in TDE's version of Konqueror, the sorting of filenames is the same as for ls on the command line, e.g., 'item-10.txt' comes before 'item-9.txt'. Another good reason for me not to have switched to a more "modern" desktop.

> The author's situation is extremely rare

I don't think it is. But that's really beside the point. The computer is my tool. If it doesn't do what I want or expect it to do, it's a bad tool for me. And designers of tools shouldn't be making assumptions about how I want to use it. They should be giving me ways to tune it to how I want to use it.

> "mind-reading" is really helpful in ways we take for granted, like autosave.

I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.


I generally agree with your points (and love TDE) but

> I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.

That’s why, with auto save systems, you flag/name a version as your canonical save point.

Rather like a video game, I’d rather have the autosaves and not need them, because I generally save the game myself, than not have them at all.

A computer can be helpful and obedient at the same time, when it’s done correctly and puts the user in control.


> with auto save systems, you flag/name a version as your canonical save point.

You mean each saved version is stored separately, like a version control system?

A system like that would be fine (in fact I use version control all the time for this kind of thing). But that's often not how auto save is implemented; the auto save just clobbers the last version you saved. That's the kind I don't use.


I’ve never used an autosave system for general software like that, overwriting the file but keeping no history, before. Which one behaves like that?


How long have you been using computers? Once upon a time, all autosave systems worked like that. That's why they were rarely used.


> How long have you been using computers?

Since the 1990s, and I've never used an autosave system that worked that way.


The file sorting isn’t something relegated to niche users because of the prevalence of tv episode file name sorting (eg S01e01) and it has necessitated the leading zeroes to make it work properly with “alphabetical sorting”.


And that would sort correctly with both methods, though, especially when each "field" is delineated (e.g. Show.S0XE0Y.Episode.Name.HEVC.1080p.mkv)


You’re saying that files with s1e10 and s1e9 would place 9 first?


Both should be supported.

Perhaps put the uncommon (true alphanumerical order) behind a nested menu or something. But the mind-reading-less option should be there.


> Both should be supported.

At least in KDE they are, and you can pick whether you want natural or alphabetical sorting (which has a case sensitive and insensitive variant).


"The author's situation is extremely rare"

People sorting their files for alphabetical order is extremely rare?

And right now I fail to see even one 'case where someone wants "10" to be before "9"'


People sorting their files in alphabetical order but who want numerical values in their files to be sorted digit by digit instead of as numbers is the rare case.

I might go further in my ideal sorting algo which would be normalize capitalization and ignore all non-alphanumeric characters and treat them all as separators.
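One way such a key could look in Python, as a sketch under stated assumptions: `loose_key` is a hypothetical name, every run of non-alphanumeric characters is simply dropped as a separator, and numbers are arbitrarily ranked ahead of letters when the two meet at the same position.

```python
import re

def loose_key(name):
    # Case-fold, then keep only runs of letters and runs of digits;
    # everything else (spaces, dots, underscores, ...) acts as a separator.
    tokens = re.findall(r"[^\W\d_]+|\d+", name.casefold())
    # Tag each token so numbers and text never get compared directly.
    return [(0, int(t)) if t.isdecimal() else (1, t) for t in tokens]

files = ["Show S1E10.mkv", "show.s1e9.mkv", "SHOW_S1E2.mkv"]
ordered = sorted(files, key=loose_key)
print(ordered)
```

Here the three spellings of the show name compare as equal, so the episodes come out in the order 2, 9, 10.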


What you vaguely outline has already been standardised in UTS #10. The algorithm is both based on prevailing user expectations and also has shaped them since the wide-spread adoption of implementations.


> ignore all alphanumeric characters

There's not much left to sort by then, is there?


Good catch! Fixed.


"mind-reading" is really an unfortunate term though. Every algorithm is a strict and consistent set of rules that tries to serve the needs of its users. No magic is ever involved.

It is just that some users have conflicting needs and some sets of rules are more complex than others. So I think what this really is about is 'computer reading', the needs of some users to be able to predict with ease what the computer is going to do. Some people would rather be able to predict the computer doing something that they actually don't really need, and then make up for its shortcomings, than have something they feel they cannot predict and control, but is actually closer to what they want.

This is a bit like the term magic. Any sufficiently complex algorithm may be indistinguishable from mind-reading, but it's still an algorithm. Mind-reading, like magic, depends on us being able to understand or not, which is highly subjective. But both are misleading terms.


> I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare...

Even if that were a valid reason for making it the default behavior, the real issue is they don't even give you the option to have the lexically correct sort order. They just decided to give you something that's not accurate and that's all you get.

A trend which is frustratingly, increasingly common.

It's trivial to allow customization behind menus. But we rarely get that anymore. Especially for sandboxed devices like phones.

It's a giant middle finger to users who want to actually use their devices as a tool, instead of simply a portal for more sales and marketing.


I agree with everything but the definition of intuitive; sometimes, the more common situation is less intuitive. An egregious example of this is "Close ad" buttons, which are intentionally placed unintuitively to direct the user to view the ad.

Your definition of "intuitive" would imply that innovation in intuition is impossible, which is evidently not true.


>You may be looking at that time through rose-tinted glasses.

Nope, regarding what he talks about, the time was rose-tinted itself.


I agree with you, but I also agree with the author: the heuristic used to figure out the "natural" ordering here is broken; if you're going to "guess" at how to order things, you need to be more sophisticated than just "find a suffix that looks like a number and order by it".


What is the reason to append a textual file name with a number? User Experience?

They are magic numbers. Maybe a serial ID, date stamp with more magic, revision, release, ...

Magic Number land has 10 > 9 in the above.

9 > 10 is only possible when removing the Magic Number and morphing it into meaningless text.

At the moment I cannot think of any magic number where 9 > 10.


How is that right, when file explorer picks an arbitrary character in the middle(!) of the filename and sorts by it? Say, I have a file987name.txt and list5.txt, so sorting by name ascending a file explorer would for whatever reason decide to sort by fifth character, so that list5 would be lower than file987name, because 5 is lower than 9, via some twisted logic. How is that normal in any way?

Thankfully I'm using Total Commander and FastStone as an image organizer, neither of which have this bug in the sorting.


... no file explorer behaves as you describe.


That was an analogy, to illustrate how the "intelligent guessing" of sorting looks weird as soon as any other character is ignored.

PS: apparently FastStone also sorts "intelligently" :( , I didn't test it correctly the first time. Only Total Commander does sorting as expected.


Haven't people started calling this "natural" order or something?


Most of the time, as a regular user, I agree with having smarter ordering. And smarter features overall, for what it's worth. Except when it doesn't work because of some corner case. In which case the "smart feature" becomes a kind of a leaky abstraction - now as a user I have to figure out how the machine works, so that I can trick it into doing what I need.

Give the user an option: have both "by name" lexicographic ordering, make it default by all means, but also provide a way to switch to an alphabetical order one for power users. Same applies to other features.

It is disappointing that apps and even some Linux desktops today take the flexibility away from users, in the name of usability. By all means, I like and benefit from all the smart features, and I want them and will keep them on by default, but leave me an option to do the simpler, dumber and more predictable things too, for the case when I need to fall back to it.



