
How do you use qpdf for extraction when its README states “qpdf does not render PDFs or perform text extraction, and it does not contain higher-level interfaces for working with page contents”?


Not the person you're replying to, but when they said "extraction" I believe they're talking about extracting pages from a PDF (like "splitting" the PDF apart, page-wise), not text. At least that's a thing I've used qpdf for in the past.


Which is also what the "extract" button does in Adobe Acrobat Pro DC for Professional Enterprise Customers or whatever they're calling it now, so it's arguably a term of art for PDFs.
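For anyone who wants to try the page-wise kind of "extraction" described above, here is a minimal Python sketch that shells out to qpdf. It assumes qpdf is installed and on PATH; the helper names and file names are made up for illustration, and the command uses qpdf's `--empty --pages ... --` form, which copies the selected pages into a fresh document.

```python
import subprocess


def qpdf_extract_pages_cmd(src: str, dst: str, page_range: str) -> list[str]:
    """Build a qpdf command that copies `page_range` of `src` into `dst`.

    Hypothetical helper: the command has the shape
        qpdf --empty --pages SRC RANGE -- DST
    where --empty starts from a blank document, so document-level
    metadata from SRC is not carried over.
    """
    return ["qpdf", "--empty", "--pages", src, page_range, "--", dst]


def extract_pages(src: str, dst: str, page_range: str) -> None:
    """Run the command; raises CalledProcessError if qpdf exits non-zero."""
    subprocess.run(qpdf_extract_pages_cmd(src, dst, page_range), check=True)


if __name__ == "__main__":
    # Example (file names are placeholders): pull pages 2 through 4.
    print(" ".join(qpdf_extract_pages_cmd("in.pdf", "out.pdf", "2-4")))
```

Building the argument list separately from running it keeps the command easy to inspect before anything touches a file.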


You can convert the PDF into QDF mode (qpdf --qdf in.pdf out.pdf), which writes the content streams uncompressed, and then it is relatively easy to extract text just by searching for the Tj and TJ operators.
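A rough Python sketch of that Tj/TJ search, run over the text of a QDF-mode file. This is an illustration under several assumptions, not a general text extractor: it only handles literal `(...)` strings with backslash escapes, ignores hex strings, nested unescaped parentheses, and the `'` and `"` operators, and it returns the raw string bytes, which are only readable text when the font uses a simple ASCII-compatible encoding. The function name `extract_shown_text` is invented here.

```python
import re

# A PDF literal string: "(" ... ")" with backslash escapes allowed inside.
# Nested unescaped parentheses are NOT handled in this sketch.
_STRING = r"\((?:\\.|[^\\()])*\)"

# Either "(string) Tj" or "[ (str) num (str) ... ] TJ", in document order.
_SHOW = re.compile(
    rf"(?P<single>{_STRING})\s*Tj\b|\[(?P<array>(?:{_STRING}|[^\]()])*)\]\s*TJ\b",
    re.S,
)

_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f", "\n": ""}


def _unescape(literal: str) -> str:
    """Strip the outer parentheses and resolve backslash escapes."""

    def repl(m: re.Match) -> str:
        ch = m.group(1)
        if ch[0] in "01234567":  # octal character code such as \101
            return chr(int(ch, 8))
        return _ESCAPES.get(ch, ch)  # \( \) \\ fall through unchanged

    return re.sub(r"\\([0-7]{1,3}|.)", repl, literal[1:-1], flags=re.S)


def extract_shown_text(content: str) -> list[str]:
    """Return one string per Tj/TJ operator found in `content`."""
    chunks = []
    for m in _SHOW.finditer(content):
        if m.group("single") is not None:
            chunks.append(_unescape(m.group("single")))
        else:
            # A TJ array mixes strings with kerning numbers; keep the strings.
            parts = re.findall(_STRING, m.group("array"), flags=re.S)
            chunks.append("".join(_unescape(p) for p in parts))
    return chunks


if __name__ == "__main__":
    sample = r"""
    BT
    /F1 12 Tf
    72 720 Td
    (Hello, ) Tj
    [(Wor) -20 (ld) 15 (\(ok\))] TJ
    ET
    """
    print(extract_shown_text(sample))  # ['Hello, ', 'World(ok)']
```

To run it on a real file, read the QDF output as bytes and decode with latin-1 so every byte maps to one character, for example `open("out.pdf", "rb").read().decode("latin-1")`.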



