What is the takeaway here? Heuristic exception raising? Exposing error bars and confidence figures? If it were that trivial, would they even have to leave it to the user to make the actual decision?
I agree. With sanity checks, the program would report the size only when it is believable (yet still wrong), and say something like "error calculating size" when the number is bigger than the disk?
This doesn't help at all, and sacrifices consistency (and availability) for no clear benefit. Software that is consistently wrong (or mislabeled) can still be used (or fixed by an erratum). Software that aborts for harmless reasons is useless.
Dunno about you, but my takeaway is that asking "how much space does X occupy" on a modern filesystem is a poorly specified question. Which means that there isn't a single answer. Which means that software should not try to report a single answer, unless there is additional context that fully disambiguates the question. And generally speaking, that last part will not be true, but programmers will think things like "well, they're looking at backups, so we are in this context...", which will sometimes be wrong.
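To make the ambiguity concrete, here is a rough Python sketch that gives three different, all defensible, answers for the same directory. The names `SizeReport` and `measure` are made up for illustration. It assumes a POSIX-style filesystem where `st_blocks` is counted in 512-byte units (as the Python docs describe it), and it ignores snapshots, clones, and compression, which would add even more candidate answers.

```python
import os
from dataclasses import dataclass


@dataclass
class SizeReport:
    apparent_per_path: int  # st_size summed over every path; hard links counted each time
    apparent_unique: int    # st_size summed once per inode
    allocated_unique: int   # allocated bytes, once per inode (0 where st_blocks is missing)


def measure(root: str) -> SizeReport:
    """Walk root and compute three different notions of 'size'."""
    seen = set()
    per_path = unique = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are measured themselves, not their targets
            st = os.lstat(os.path.join(dirpath, name))
            per_path += st.st_size
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to data we already counted
            seen.add(key)
            unique += st.st_size
            # Assumption: st_blocks is in 512-byte units; not available on Windows
            allocated += getattr(st, "st_blocks", 0) * 512
    return SizeReport(per_path, unique, allocated)
```

On a tree full of hard-linked backups, `apparent_per_path` can exceed the capacity of the disk while `apparent_unique` stays well under it. Neither is a bug; they answer different questions.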
Stepping back a bit, this becomes partly a UX problem and partly a user education problem. Software alone is not going to be able to perfectly guess[1] why a human wants to know, so the human needs to learn enough to competently ask. Software can certainly help educate them, and I wish more software tried to level up the machine operator instead of trying to guess the right thing in the face of ignorance.
[1] Don't mention "genai". Even if the robots get to the point that they are better than me at contextualizing this sort of thing, until they assume legal liability for outcomes, humans need to make decisions.
I don't think the issue is with the number, but rather the context. What should the end user think when Finder reports that there is 60TB of data on a 2TB drive? People with above-average file system knowledge can come up with an explanation for it, but what good does that number do?
When programming a feature like this, it can be hard to spot the problem. The calculations for the backup's size can be 100% right, but still not appropriate to show.