> The absolute best way of doing this these days is likely through a vision-based machine learning model, but that is an approach that is very far away from scaling to processing hundreds of gigabytes of PDF files off a single server with no GPU.

SmolDocling is pretty fast and the ONNX weights can be scaled to many CPUs: https://huggingface.co/ds4sd/SmolDocling-256M-preview

Not sure what time scale the author had in mind for processing GBs of PDFs, but the future might be closer than “very far away”.
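For a rough sense of the time scale, here is a back-of-the-envelope sketch in Python. It assumes pages are independent and the work splits evenly across CPU workers, which is optimistic. The function name `estimated_hours` and all of its inputs are hypothetical; none of the numbers are measured SmolDocling benchmarks, so plug in your own per-page timing.

```python
def estimated_hours(total_pages: int, seconds_per_page: float, workers: int) -> float:
    """Estimate wall-clock hours for a CPU-only batch job.

    Assumes perfect parallelism: every page takes the same time and
    workers never wait on each other or on disk I/O.
    """
    if total_pages < 0 or seconds_per_page <= 0 or workers <= 0:
        raise ValueError("total_pages must be >= 0; seconds_per_page and workers must be > 0")
    total_cpu_seconds = total_pages * seconds_per_page
    return total_cpu_seconds / workers / 3600


if __name__ == "__main__":
    # Placeholder inputs only: replace with a page count and a per-page
    # latency measured on your own hardware.
    hours = estimated_hours(total_pages=1_000_000, seconds_per_page=1.0, workers=32)
    print(f"~{hours:.1f} hours")
```

Whether the result counts as "very far away" depends almost entirely on the per-page latency you actually measure and how many cores you can throw at it.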