Hacker News | contact9879's comments

printed documents, images, horribly inaccessible pdfs, horribly inaccessible websites

> Printed documents - Use the original, which is digital.

> Images - Use the original, which is digital.

> horribly inaccessible pdfs - Use the original, which has real text in the PDF

> horribly inaccessible websites - All text on any web site is digital. Nobody uses OCR on a website.

A massive paper producer like the government shouldn't adapt its typesetting to people who are using technology wrongly.



God damn...

Why didn't they fax it back and forth a few times as well, just for good measure?


it's easier to mandate a font than to excise all the processes within the federal bureaucracy that produce these.

Images being digital has no bearing on OCR ability.


Images: use the original, which is a digital text document and not an image.

Unless they are making documents on typewriters. And in those cases neither the Biden nor the Trump font is an option.


I'm not sure how end-of-life it will actually be, because Rosetta is used in apple/container and seems to be a large part of the virtualization stack Apple has built in the last few years.

I would imagine they would disable the user-facing "load x86_64 Mach-O's seamlessly" and other loader magic, and keep around the core for such things.

It looks just like Signal with a purple theme.


thanks sophie. now if only this would get as many eyeballs as the inciting one

sigh


Damn. What’s with all the personal attacks against the author in this comments section?


It’s such a well written (and factually correct) post, I don’t get the hate at all.


American culture still holds puritanical views of sex, and this article crosses the taboo threshold that 15 year olds are having sex.


do you have a citation for M5 having MTE?


I did primary research. I just bought an M5 Mac and confirmed by doing:

  $ sysctl -a | grep MTE4
  hw.optional.arm.FEAT_MTE4: 1


Thank you for posting that. I was pretty sure the M5 was going to ship with MTE, but the last time I checked the documents, they still hadn't updated them (nor any mention of M5 having Apple10 in the Metal feature tables). Some big features there that make me want to upgrade!


It does.


The ebook identifier uniquely identifies every ebook. Standard Ebooks uses the URL as each book's unique identifier.


Those are poor identifiers. A numeric or short alphanumeric identifier that can be part of the filename is important... I have as many as 5 different editions of the same title, so title+author doesn't do the trick. Nor am I putting a URL into the filename; I couldn't if I wanted to, as every URL contains characters disallowed by every filesystem I've ever heard of. How difficult is it to keep an incrementing catalog number like Project Gutenberg does? Anything that doesn't have a proper unique identifier just seems unprofessional.



This isn't a solution either. Not sure why you think it is. Here's how I name files, just as an example:

    Meditationes de Prima Philosophia - GTNB•0023306 (2007) - Descartes, René (aut)

    Meditations on First Philosophy - 9780203417621 (2013) - Descartes, René (aut); Haldane, Elizabeth (trl); Ross, G. R. T. (trl) & Tweyman, Stanley (edt,wfw)
Where and how should I put a URI in there, especially considering that URIs at minimum need a colon (:), which is a problematic character in filenames on NTFS/HFS/APFS/XFS? It's not exactly disallowed, but it creates a resource fork or some such, so the file doesn't behave as you would expect. If Standard Ebooks just started numbering their books, then I'd slap STBK• in front of the number and use that. They're not in WorldCat, or I could use OCLC numbers (but it shouldn't be other people's job to keep the catalog of their own books).


choose your favorite hash

    hash(<dc:identifier>)


Hashes are too long, aren't human-recognizable as to meaning, etc. I don't want half-assed workarounds. They need to uniquely number their books.


- they don’t need to do anything to conform to your arbitrary organization choices

- hashes are as long or short as you need them to be

- publication timestamp is in every ebook’s metadata, is almost guaranteed to be unique, monotonically increases, and has actual semantic meaning compared to an isbn or oclc
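For what it's worth, pulling that timestamp out is a one-liner, since an EPUB is a zip archive and the dc:date element lives in its OPF package document. A rough sketch (the inline OPF snippet here is synthetic for illustration; on a real book you'd read it with `unzip -p book.epub path/to/content.opf`, and real layouts vary):

```shell
# Synthetic OPF metadata fragment standing in for a real book's package file.
opf='<metadata><dc:date>2013-05-07T00:00:00Z</dc:date></metadata>'

# Extract the text content of the dc:date element.
pubdate=$(printf '%s\n' "$opf" | sed -n 's/.*<dc:date>\([^<]*\)<\/dc:date>.*/\1/p')
echo "$pubdate"
```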


>they don’t need to do anything to conform to your arbitrary organization choices

They don't need to. It'd be smart. It's not "arbitrary". It's fucking library science.

>hashes are as long or short as you need them to be

Hashes might uniquely identify a computer file, but they don't uniquely identify an edition/release of a published book. Some jackass on libgen decides to tweak a single byte, now it has a new hash... but it's not a new edition.

>publication timestamp is in every ebook’s metadata

As someone who takes a look at every internal opf file, no... they're not in every ebook.

You're suggesting I go to the extra trouble of doing a job they could do easily, when I can only do it poorly, and I don't know why... because the first person to respond was a dumbass and thought I was attacking him? I swear, 99% of humans are still monkeys.


You don't need to hash file contents (though that is often a useful thing to do). You can hash e.g. the URL that was earlier claimed to be the canonical identifier. Running it through your favorite hash function fixes your complaints about file names (choose your favorite hash function such that it is not too long and only outputs allowed characters).


Ah, the URL: so I can substitute one human-unreadable identifier for another, both of which are excessively long and opaque by design.

>choose your favorite hash function such that it is not too long

ISBN's 13 digits is about as long as is tolerable. Any time there is a list of authors six names long (academic titles) along with a subtitle, it's very easy to bump up against max filename size.

This isn't a problem I can solve on my own. Just trying to bring attention to it. My solution thus far is to just avoid publishers who are so unprofessional as to not provide numbers. It's not tough, Project Gutenberg does it. Anyone can do it. If you're some amateur whose entire catalog is 8 books published, you say "this book is 1, and this book is 2" etc, and it's a done deal. Again, I don't expect anyone to use ISBNs (in the US, you have to pay for them unless you're one of the big 5 publishing houses), but just use your own for god's sake.


Hashes are not excessively long unless you choose to make it so. They might be opaque/random if you want, or they might not. "Remove all special characters and keep only the first 5 characters with space padding" is a string hash function. "Keep only the first 5 vowels with space padding" is a string hash function.

Here's a friendly AI generated hash function to give you an opaque 13 digit number if you're into that:

    echo -n "$URL" | sha1sum | awk '{print $1}' | xxd -r -p | od -An -t u8 | tr -d ' \n' | cut -c1-13

For example, for https://standardebooks.org/ebooks/denis-diderot/the-indiscre... you get the ID 4897562473051.

It looks like their ebook sources are all published in git repos online, so you could check out the repos, get the timestamp of the initial commits, and do a monotonic ID on that if you wanted. You could also contribute the change back to them if you think it's something others would benefit from.
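That first-commit idea can be sketched like so. The throwaway demo repo and the STBK- prefix are assumptions made up here for illustration, not anything Standard Ebooks actually uses; on a real clone you would just run the `git log` line against the repo:

```shell
# Build a throwaway repo so the sketch is self-contained.
demo=$(mktemp -d)
git -C "$demo" init -q
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Unix timestamp of the repo's first commit: short, numeric,
# filename-safe, and monotonic across a publisher's catalog.
ts=$(git -C "$demo" log --reverse --format=%ct | head -n 1)
printf 'STBK-%s\n' "$ts"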


MASQUE[0] is the protocol for this. Cloudflare already uses MASQUE instead of WireGuard in their WARP VPN.

[0]https://datatracker.ietf.org/wg/masque/about/


i was curious about this and did some digging around for an open source implementation. this is what i found: https://github.com/iselt/masque-vpn


Installing third-party OSes on Pixels does not void their warranty. It's one of the biggest reasons why GrapheneOS only supports Pixels and not e.g. Samsung's Galaxy lines.


They note in the post that it took ~2 months.

