
I'm not sure how this is different from:

https://github.com/1rgs/jsonformer

or

https://github.com/newhouseb/clownfish

or

https://github.com/mkuchnik/relm

or

https://github.com/ggerganov/llama.cpp/pull/1773

or

https://github.com/Shopify/torch-grammar

Overall, there are a ton of these logit-based guidance systems. The reason they don't get much traction is that the SOTA models sit behind REST APIs that don't allow this kind of fine-grained, per-token control.

Those models perform so much better that people generally settle for just re-requesting until they get the correct format (and with GPT-4, needing to re-request ends up being fairly rare in my experience).
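Roughly, that retry pattern looks like the sketch below, where call_model is a hypothetical stand-in for whatever REST client you use (not a real API):

    import json

    def request_until_valid(prompt, max_retries=3):
        for _ in range(max_retries):
            text = call_model(prompt)  # hypothetical REST call to the model
            try:
                return json.loads(text)  # accept the first well-formed response
            except json.JSONDecodeError:
                continue  # malformed output: just ask again
        raise RuntimeError("no valid JSON after %d attempts" % max_retries)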



Thanks for bringing clownfish and ReLM to my attention! AFAIK the other libraries loop over the entire vocabulary at every step of generation. We, on the other hand, build an index at initialization by looping over the vocabulary once; after that, guided generation is just as fast as standard generation.
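To illustrate the difference, here's a toy sketch of that one-time precomputation, with a hand-written DFA standing in for a compiled regex and a four-token vocabulary (the real index would also record which FSM state each token leads to):

    from collections import defaultdict

    # toy DFA for r"[0-9]+": state 0 = start, state 1 = accepting
    def step(state, char):
        return 1 if char.isdigit() else None  # None = dead end

    vocab = {0: "a", 1: "7", 2: "42", 3: "x9"}  # token id -> string

    # One pass over the vocabulary at initialization: for each DFA
    # state, record which tokens keep the DFA alive.
    index = defaultdict(list)
    for state in (0, 1):
        for token_id, s in vocab.items():
            cur = state
            for ch in s:
                cur = step(cur, ch)
                if cur is None:
                    break
            else:
                index[state].append(token_id)

    print(dict(index))  # {0: [1, 2], 1: [1, 2]}

Each generation step then only has to look up index[current_state] and mask the logits, instead of re-scanning the whole vocabulary.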


torch-grammar generates a mask per PDA stack; we don't try to compute all the possible stacks. I'm sure there's something smarter that could be done here, and you've probably figured it out (though IIRC regular languages don't have the arbitrarily recursive stack problem that you run into with context-free languages?). Anyway, in practice we spend a few milliseconds on the first few requests building caches and then just apply masks from those caches afterwards.
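In pseudocode, the cache is keyed by the stack itself; something like the sketch below, where advances(stack, tok) is an illustrative predicate, not the actual torch-grammar API:

    mask_cache = {}

    def mask_for(stack, vocab, advances):
        key = tuple(stack)  # stacks are mutable; tuples are hashable
        if key not in mask_cache:
            # slow path, taken once per distinct stack: test every
            # token against the grammar from this configuration
            mask_cache[key] = [advances(stack, tok) for tok in vocab]
        return mask_cache[key]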


Sorry for misrepresenting your work, and thank you for the correction and the explanation. I'll take a closer look.


Hi, author of ReLM here. If I understand correctly, we use automata as well, in the way you describe.



