I threw together a toy project to get a handle on the basic concepts. My takeaway: if you can shape your input into something a dedicated classification model (e.g. YOLO for document layout analysis) can work with, you can farm each class out to the most appropriate model.
It turns out that I can run most of the appropriate models on my ancient laptop if I don't mind waiting for the complicated ones to finish. If I do mind, I can just send that part to OpenAI or similar. If your workflow can scale horizontally like my OCR pipeline crap, every box in your shop with RAM >= 16GB might be useful.
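For what it's worth, here is a rough sketch of the "farm each class out" idea. It is not my actual pipeline: the `Region` type, the handler names, and the labels are all made up for illustration, and the handlers are stubs standing in for real local models or an API call. It assumes a layout model has already tagged each region with a class label.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    label: str    # class assigned by the layout model, e.g. "table"
    payload: str  # stand-in for the cropped image data


# Hypothetical handlers: each would wrap whichever model suits that class.
def ocr_text(region: Region) -> str:
    return f"text:{region.payload}"


def parse_table(region: Region) -> str:
    return f"table:{region.payload}"


def remote_fallback(region: Region) -> str:
    # Placeholder for "send that part to OpenAI or similar".
    return f"remote:{region.payload}"


# Map each class label to the handler that should process it.
HANDLERS: Dict[str, Callable[[Region], str]] = {
    "text": ocr_text,
    "table": parse_table,
}


def route(region: Region) -> str:
    # Classes without a local handler go to the remote fallback.
    handler = HANDLERS.get(region.label, remote_fallback)
    return handler(region)


def process_page(regions: List[Region], workers: int = 4) -> List[str]:
    # Regions are independent of each other, so they can run in parallel.
    # pool.map returns results in the same order as the input.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(route, regions))
```

The thread pool here is only a stand-in for the horizontal scaling part. Since the regions don't depend on each other, the same `route` function could just as well sit behind a job queue that hands work to other machines.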
Apologies if this is all stuff you're familiar with.