I threw together a toy project to see if it would help me understand the basic concepts, and my takeaway was that, if you can shape your input into something a dedicated detection/classification model (e.g. YOLO for document layout analysis) can work with, you can farm each detected class out to the most appropriate model.
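Roughly, the "farm each class out" step is just a lookup table from layout class to handler. Here's a minimal sketch of that idea. The class labels ("text", "title", "table", "formula") and the handler functions are hypothetical stand-ins, not the output of any particular YOLO layout model. Real handlers would crop the page image to each bounding box and run an actual OCR/table/formula model on it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]

@dataclass
class Region:
    label: str   # class predicted by the layout model (hypothetical label set)
    bbox: BBox   # (x0, y0, x1, y1) in page pixel coordinates

# Hypothetical stand-ins for the specialised models; a real version would
# crop the page to region.bbox and return recognised text or markup.
def plain_ocr(region: Region) -> str:
    return f"plain-ocr {region.bbox}"

def table_model(region: Region) -> str:
    return f"table-model {region.bbox}"

def formula_model(region: Region) -> str:
    return f"formula-model {region.bbox}"

# Map each layout class to the model best suited to it.
ROUTES: Dict[str, Callable[[Region], str]] = {
    "text": plain_ocr,
    "title": plain_ocr,
    "table": table_model,
    "formula": formula_model,
}

def route(regions: List[Region],
          default: Callable[[Region], str] = plain_ocr) -> List[str]:
    """Send each region to the handler for its class, in the order given.

    Classes with no dedicated handler fall back to `default`.
    """
    return [ROUTES.get(r.label, default)(r) for r in regions]

if __name__ == "__main__":
    page = [
        Region("title", (0, 0, 600, 40)),
        Region("table", (0, 50, 600, 300)),
        Region("caption", (0, 310, 600, 330)),  # no dedicated handler
    ]
    for out in route(page):
        print(out)
```

The fallback handler matters in practice: layout models usually emit more classes than you'll bother writing special handling for, so anything unrecognised just gets plain OCR.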

It turns out that I can run most of the appropriate models on my ancient laptop if I don't mind waiting for the complicated ones to finish. If I do mind, I can just send that part to OpenAI or similar. If your workflow can scale horizontally, like my OCR pipeline crap does, every box in your shop with RAM >= 16GB might be useful.
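To make the "local unless I'm in a hurry" plus "use every box with enough RAM" idea concrete, here's a rough scheduling sketch. Everything in it is an assumption for illustration: the host list, the `HEAVY_CLASSES` set, the `in_a_hurry` flag, and the `"remote-api"` destination, which is just a placeholder label and doesn't call OpenAI or any other service.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Tuple

@dataclass
class Host:
    name: str
    ram_gb: int

# Hypothetical: classes whose local model is slow enough that you might
# rather pay for a hosted API when you don't want to wait.
HEAVY_CLASSES = {"table", "formula"}

def plan(jobs: List[Tuple[str, str]], hosts: List[Host], in_a_hurry: bool,
         min_ram_gb: int = 16) -> Dict[str, List[str]]:
    """Assign (job_id, layout_class) pairs to a destination.

    Heavy classes go to the placeholder "remote-api" when in_a_hurry is True;
    everything else is spread round-robin over local hosts with at least
    min_ram_gb of RAM.
    """
    usable = [h.name for h in hosts if h.ram_gb >= min_ram_gb]
    local = cycle(usable) if usable else None
    assignments: Dict[str, List[str]] = {}
    for job_id, label in jobs:
        if in_a_hurry and label in HEAVY_CLASSES:
            dest = "remote-api"
        elif local is not None:
            dest = next(local)
        else:
            raise ValueError(f"no local host with >= {min_ram_gb} GB RAM for {job_id}")
        assignments.setdefault(dest, []).append(job_id)
    return assignments

if __name__ == "__main__":
    hosts = [Host("old-laptop", 16), Host("nas", 8), Host("desktop", 32)]
    jobs = [("p1-r1", "text"), ("p1-r2", "table"), ("p2-r1", "text"), ("p2-r2", "formula")]
    print(plan(jobs, hosts, in_a_hurry=False))
    print(plan(jobs, hosts, in_a_hurry=True))
```

Round-robin is the dumbest possible scheduler; a real setup would more likely have each box pull jobs off a shared queue so slow machines naturally take fewer.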

Apologies if this is all stuff you're familiar with.