I'm playing around with hyperlinks in PDFs to get around how much the WWW sucks for posting serious research with serious working code.
Caveat emptor: I'm first working on getting the basic groundwork out, like a pipeline that shows what you need to do to extract a scanned pdf in a quality that tesseract can actually get text out of.
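For a rough idea of what such a pipeline might look like, here is a minimal sketch. It assumes poppler's `pdftoppm` and the `tesseract` CLI are both installed and on the PATH; the function names (`render_cmd`, `ocr_cmd`, `ocr_pdf`) and the 300 DPI grayscale defaults are illustrative guesses, not the actual setup described above.

```python
import subprocess
import tempfile
from pathlib import Path


def render_cmd(pdf_path, out_prefix, dpi=300):
    # pdftoppm: -r sets the render resolution, -gray drops color,
    # -png selects PNG output. 300 DPI is an assumed starting point.
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png",
            str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path, lang="eng", dpi=300):
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file; --dpi tells it the image resolution.
    return ["tesseract", str(image_path), "stdout",
            "-l", lang, "--dpi", str(dpi)]


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    # Render every page to a temporary PNG, OCR each page in order,
    # and return the concatenated text.
    texts = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(render_cmd(pdf_path, prefix, dpi), check=True)
        for page in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(ocr_cmd(page, lang, dpi), check=True,
                                    capture_output=True, text=True)
            texts.append(result.stdout)
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf")` (a hypothetical file name) would return the recognized text as one string. Real scans usually need more than this, such as deskewing or thresholding, before the OCR output gets usable.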
> I'm first working on getting the basic groundwork out, like a pipeline that shows what you need to do to extract a scanned pdf in a quality that tesseract can actually get text out of.
Sounds like TBL's contribution doesn't suck so much after all.