
But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

In practice and at scale, the guesses of the LLM are the superior outcome.



> But you aren't aware, because the OCR doesn't know that it failed. You would have to go through the entire text by hand to fix the corruptions, but that's too much work, so you won't, and the corruptions stay in.

Well, if you assume that you're never going to read the book, then sure. But in that case it's even more efficient not to OCR the book at all. You'll never know the difference.

If you do read the book, you'll know where the failures are. And they're easy to correct if you can edit the document. I usually file reports of printing errors in Kindle books when I encounter them.

(Do the errors get corrected? No.)



