>they don’t need to do anything to conform to your arbitrary organization choices
They don't need to. It'd be smart. It's not "arbitrary". It's fucking library science.
>hashes are as long or short as you need them to be
Hashes might uniquely identify a computer file, but they don't uniquely identify an edition/release of a published book. Some jackass on libgen decides to tweak a single byte, now it has a new hash... but it's not a new edition.
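The one-byte point is easy to demonstrate. A cryptographic file hash such as SHA-256 comes out completely different after a trivial edit, even though the book itself hasn't changed. This is a minimal sketch that uses made-up bytes as a stand-in for an ebook file:

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-in for the contents of an ebook file.
original = b"<html><body>Chapter 1. It was a dark and stormy night.</body></html>"

# Change a single byte (the final '>' becomes ' ') to mimic a trivial re-upload edit.
tweaked = original[:-1] + b" "

print(file_digest(original))
print(file_digest(tweaked))
print(file_digest(original) == file_digest(tweaked))  # False: same edition, new hash
```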
>publication timestamp is in every ebook’s metadata
As someone who takes a look at every internal OPF file: no... they're not in every ebook.
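This can be checked file by file. In an EPUB's package (OPF) document, the publication date lives in a Dublin Core `<dc:date>` element inside `<metadata>` when it is present at all. The only Dublin Core elements the EPUB specs require are identifier, title and language. The sketch below is minimal and assumes the OPF text has already been extracted from the EPUB zip. Both sample OPF snippets are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces used in EPUB package (OPF) documents.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def publication_date(opf_xml: str) -> Optional[str]:
    """Return the text of the first <dc:date> in the OPF metadata, or None if absent."""
    root = ET.fromstring(opf_xml)
    date = root.find("opf:metadata/dc:date", NS)
    if date is None or not (date.text or "").strip():
        return None
    return date.text.strip()

# Hypothetical sample OPFs: one with a date, one without.
WITH_DATE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">urn:uuid:0000</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:date>2019-05-01</dc:date>
  </metadata>
</package>"""

WITHOUT_DATE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">urn:uuid:0000</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""

print(publication_date(WITH_DATE))
print(publication_date(WITHOUT_DATE))
```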
You're suggesting I go to the extra trouble of doing a job they could do easily, when I can only do it poorly, and I don't know why... because the first person to respond was a dumbass and thought I was attacking him? I swear, 99% of humans are still monkeys.
You don't need to hash file contents (though that is often a useful thing to do). You can hash e.g. the URL that was earlier claimed to be the canonical identifier. Running it through your favorite hash function fixes your complaints about file names (choose your favorite hash function such that it is not too long and only outputs allowed characters).
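Here is one way to hash a URL into a short name that is safe to use as a filename. It is a minimal sketch under stated assumptions: SHA-256 is the hash, base32 is the encoding because it uses only A-Z and 2-7, and the output is truncated to 10 characters. The length of 10 is an arbitrary choice, and the example URL is hypothetical.

```python
import base64
import hashlib

def short_id(url: str, length: int = 10) -> str:
    """Hash a URL into a short, filename-safe ID.

    SHA-256 gives a fixed 32-byte digest. Base32 encodes it using only A-Z and 2-7,
    which are safe in file names on common filesystems. Truncating keeps it short,
    at the cost of a higher (for a personal library, still small) collision risk.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=")[:length]

print(short_id("https://example.org/ebooks/some-author/some-title"))
```

At 10 base32 characters you get 50 bits. That is plenty for a personal collection, though it is no more human-readable than the URL.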
Ah. The URL, so I can swap one identifier that's hard for humans to read for another one that's hard for humans to read, both of which are excessively long and opaque by design.
>choose your favorite hash function such that it is not too long
An ISBN's 13 digits is about as long as is tolerable. Any time there's an author list six names long (common with academic titles) along with a subtitle, it's very easy to bump up against the maximum filename length.
This isn't a problem I can solve on my own; I'm just trying to bring attention to it. My solution thus far is to avoid publishers who are so unprofessional as to not provide numbers. It's not tough: Project Gutenberg does it. Anyone can do it. If you're some amateur whose entire catalog is 8 published books, you say "this book is 1, and this book is 2," etc., and it's a done deal. Again, I don't expect anyone to use ISBNs (in the US, you have to pay for them unless you're one of the Big 5 publishing houses), but just use your own numbers, for god's sake.
Hashes are not excessively long unless you choose to make them so. They can be opaque/random if you want, or not. "Remove all special characters and keep only the first 5 characters, with space padding" is a string hash function. "Keep only the first 5 vowels, with space padding" is a string hash function.
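Both toy hash functions fit in a couple of lines each. This minimal sketch makes two assumptions: "special characters" means anything that isn't a letter or digit, and only a, e, i, o and u count as vowels.

```python
def first_five_chars(s: str) -> str:
    """Drop non-alphanumeric characters, keep the first 5, pad with spaces to length 5."""
    kept = "".join(ch for ch in s if ch.isalnum())
    return kept[:5].ljust(5)

def first_five_vowels(s: str) -> str:
    """Keep only vowels (a, e, i, o, u, either case), take the first 5, pad with spaces."""
    kept = "".join(ch for ch in s if ch in "aeiouAEIOU")
    return kept[:5].ljust(5)

print(repr(first_five_chars("Moby-Dick; or, The Whale")))   # 'MobyD'
print(repr(first_five_vowels("Moby-Dick; or, The Whale")))  # 'oioea'
print(repr(first_five_vowels("Emma")))                      # 'Ea   '
```

Both always return exactly 5 characters, so their length is fixed. The trade-off is that they collide easily ("Moby-Dick" and "Moby Dick, abridged" get the same first_five_chars value), so they only work as part of a name, not as a unique ID.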
Here's a friendly AI-generated hash function to give you an opaque 13-digit number if you're into that:
It looks like their ebook sources are all published in git repos online, so you could check out the repos, get the timestamp of the initial commits, and do a monotonic ID on that if you wanted. You could also contribute the change back to them if you think it's something others would benefit from.
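The git-based approach from the comment above could look something like this. It is a minimal sketch that assumes local clones of the repos and Python 3.9+. `first_commit_time` and `assign_ids` are hypothetical helpers, not anything the ebook project provides. Numbers that have already been handed out are kept, and only newly seen repos get the next numbers. That way, a repo with an older history showing up later doesn't renumber everything.

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional

def first_commit_time(repo: Path) -> int:
    """Return the Unix timestamp (committer date) of the oldest commit in a local clone."""
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "--format=%ct"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Take the minimum rather than the last line, in case the history has several roots.
    return min(int(line) for line in out.split())

def assign_ids(first_commits: Dict[str, int],
               existing: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Give each repo a monotonic ID: 1, 2, 3, ... in order of first commit.

    IDs in `existing` are never changed. New repos are numbered after the current
    maximum, ordered by first-commit time, with ties broken by name for stability.
    """
    ids = dict(existing or {})
    next_id = max(ids.values(), default=0) + 1
    new = sorted((n for n in first_commits if n not in ids),
                 key=lambda n: (first_commits[n], n))
    for name in new:
        ids[name] = next_id
        next_id += 1
    return ids

# Hypothetical usage, with repo names and timestamps made up:
# times = {p.name: first_commit_time(p) for p in Path("ebooks").iterdir() if p.is_dir()}
# catalog = assign_ids(times)
```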