
What about constrained decoding (with JSON schemas)? I noticed my vLLM instance is keeping one CPU core at 100%.
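To make the question concrete, here is a toy sketch of what constrained decoding does at each step: compute a mask over the vocabulary from the current prefix, then sample only from allowed tokens. The mask computation is ordinary host-side code that runs once per generated token, which is one plausible source of single-core CPU load. This is an assumption for illustration only, not a description of how vLLM implements it; `VOCAB`, `VALID_OUTPUTS`, and all function names are hypothetical, and the "schema" is reduced to a fixed set of valid strings.

```python
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"a"', ':', '1', '2', ',', 'hello']

# Hypothetical stand-in for a compiled JSON schema: the set of all valid outputs.
VALID_OUTPUTS = {'{"a":1}', '{"a":2}'}


def allowed_mask(prefix):
    """One boolean per vocab entry: can this token extend `prefix` toward a valid output?"""
    return [
        any(valid.startswith(prefix + tok) for valid in VALID_OUTPUTS)
        for tok in VOCAB
    ]


def constrained_step(prefix, logits):
    """Set disallowed logits to -inf, then pick the best remaining token (greedy)."""
    mask = allowed_mask(prefix)  # CPU-side work, repeated for every generated token
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    return VOCAB[best]


def generate(logits_fn, max_steps=16):
    """Greedy constrained generation; `logits_fn` stands in for the model."""
    out = ""
    for _ in range(max_steps):
        if out in VALID_OUTPUTS:
            break
        out += constrained_step(out, logits_fn(out))
    return out


if __name__ == "__main__":
    # A fake "model" that strongly prefers the off-schema token 'hello'.
    fake_logits = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 9.0]
    print(generate(lambda prefix: fake_logits))
```

Even though the fake model always prefers `hello`, the mask forces the output to stay inside the valid set. With a real vocabulary and a real schema, the per-token mask is far more expensive than this toy loop.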

