Yep, batching is a feature I really wish the OpenAI API had. That, and the ability to intelligently cache frequently used prompts. Both are much easier to achieve with a self-hosted open-source model, so I guess it's a speed vs. customizability/cost tradeoff for the time being.
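To make the caching point concrete, here's a minimal sketch of what I mean for a self-hosted setup; generate() below is just a placeholder for whatever local inference call you actually use, not a real API:

```python
import hashlib

_cache = {}

def generate(prompt: str) -> str:
    # Stand-in for the actual call to a locally hosted model.
    return f"completion for: {prompt}"

def cached_generate(prompt: str) -> str:
    """Return the stored completion when an identical prompt comes back."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = generate(prompt)
    return _cache[key]

cached_generate("Summarize this ticket: ...")  # computed once
cached_generate("Summarize this ticket: ...")  # served from the cache
```

A real setup would cache at the prefix/KV level rather than whole completions, but the point is the same: when you control the server, you decide what gets reused.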
IMO they don't have batching because they pack sequences before passing them through the model, so a single sequence in a batch on OpenAI's side might contain requests from multiple customers.
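For anyone unfamiliar with packing, here's a rough sketch of the general technique (my assumptions, not OpenAI's actual serving code): several tokenized requests are concatenated into one fixed-length sequence, and a block mask keeps each request from attending to the others.

```python
import numpy as np

def pack_requests(requests, max_len, pad_id=0):
    """Concatenate tokenized requests into one sequence, with a segment id per token."""
    tokens, segments = [], []
    for seg, req in enumerate(requests, start=1):
        if len(tokens) + len(req) > max_len:
            break  # a real scheduler would start a new packed sequence here
        tokens.extend(req)
        segments.extend([seg] * len(req))
    pad = max_len - len(tokens)   # pad out to max_len; segment 0 marks padding
    tokens.extend([pad_id] * pad)
    segments.extend([0] * pad)
    return np.array(tokens), np.array(segments)

def packing_attention_mask(segments):
    """Tokens may only attend to earlier tokens in the *same* request."""
    seg = segments[:, None]
    same_request = (seg == seg.T) & (seg != 0)
    causal = np.tril(np.ones_like(same_request, dtype=bool))
    return same_request & causal

# Example: two customers' requests packed into one 12-token sequence.
reqs = [[101, 7, 8, 9], [101, 42, 43]]
toks, segs = pack_requests(reqs, max_len=12)
mask = packing_attention_mask(segs)
print(toks)        # [101   7   8   9 101  42  43   0   0   0   0   0]
print(mask.shape)  # (12, 12)
```

Under that scheme the batch boundaries aren't tied to any one customer, which would make an explicit per-customer batching API awkward to expose.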