
Yep, batching is a feature I really wish the OpenAI API had, along with the ability to intelligently cache frequently used prompts. Both are much easier to achieve with a self-hosted open-source model, so for the time being it's a speed vs. customizability/cost tradeoff.
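
Roughly what I mean by batching, as a minimal sketch with Hugging Face transformers (the gpt2 checkpoint is just an example; any causal LM works the same way):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Left padding so generation continues from the real end of each prompt.
    tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompts = ["Translate to French: cat", "Translate to French: dog"]

    # One forward pass serves the whole batch instead of N separate calls.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))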


IMO they don't have batching because they pack sequences before passing them through the model, so a single sequence in a batch on OpenAI's side might contain requests from multiple customers.
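
A toy illustration of what packing could look like; all names here are made up for the example, not OpenAI internals. Several short requests get concatenated into one fixed-length row:

    # Pack token-id lists into rows of at most max_len tokens,
    # with an eos token marking request boundaries.
    def pack(requests, max_len, eos_id):
        rows, current = [], []
        for tokens in requests:
            if len(current) + len(tokens) + 1 > max_len:
                rows.append(current)
                current = []
            current += tokens + [eos_id]
        if current:
            rows.append(current)
        return rows

    # Three requests packed into rows of at most 8 tokens:
    print(pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=8, eos_id=0))
    # -> [[1, 2, 3, 0, 4, 5, 0], [6, 7, 8, 9, 0]]

A real implementation would also build an attention mask so tokens can't attend across request boundaries.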


Ah, that would make sense. Similar to vLLM, which does dynamic packing (continuous batching) of in-flight requests.
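
With vLLM the batching happens inside the engine, so callers just submit prompts and the scheduler interleaves them; something like this (model name is just an example):

    from vllm import LLM, SamplingParams

    # The engine batches concurrent requests internally, so no
    # client-side padding or batch assembly is needed.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=32)

    prompts = ["The capital of France is", "The capital of Japan is"]
    outputs = llm.generate(prompts, params)
    for out in outputs:
        print(out.outputs[0].text)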



