
How does this relate to ggml's BNF sampling?


Two differences:

(1) This feature only requires regex-guided generation; we have a PR for BNF sampling that is about to be merged.

(2) ggml loops over the entire vocabulary (~50k tokens) at each step, which introduces noticeable overhead and makes it unusable for complex grammars. Our method builds an index at initialization and constructs the masks at each step with a dictionary lookup. Once the index is built, generation is just as fast as standard generation; performance doesn't depend on the complexity of the grammar, the size of the LLM, or its vocabulary size.
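Roughly, here's the idea as a toy sketch (not our actual code; the DFA, vocabulary, and helper names are made up for illustration). Compile the regex to a DFA, then walk every vocabulary token from every DFA state once at initialization, keeping the tokens that stay inside the regex's language. At each generation step, the mask for the current state is a single dictionary lookup:

    # Hand-rolled DFA for the regex [0-9]+ (illustrative only):
    # state 0 = start, state 1 = accepting; any digit moves to state 1.
    TRANSITIONS = {
        0: {str(d): 1 for d in range(10)},
        1: {str(d): 1 for d in range(10)},
    }

    # Toy vocabulary: token id -> token string (real ones have ~50k entries).
    VOCAB = {0: "1", 1: "23", 2: "foo", 3: "4a", 4: "7"}

    def walk(state, token):
        """Feed a token's characters through the DFA; None means illegal."""
        for ch in token:
            state = TRANSITIONS.get(state, {}).get(ch)
            if state is None:
                return None
        return state

    # Built ONCE at initialization: state -> {allowed token id: next state}.
    INDEX = {
        s: {tid: dest
            for tid, tok in VOCAB.items()
            if (dest := walk(s, tok)) is not None}
        for s in TRANSITIONS
    }

    # At every generation step the mask is a plain dictionary lookup, so
    # the per-step cost doesn't grow with vocabulary or grammar size.
    state = 0
    allowed = INDEX[state]  # {0: 1, 1: 1, 4: 1} -> tokens "1", "23", "7"
    # logits for every token id not in `allowed` get set to -inf

The one-time index build is O(states x vocabulary size), but that cost is paid once, before generation starts.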


Regex-guided gen is slick… does it handle arbitrary regexes, or are you custom-building it for JSON?

If arbitrary, how are you pre-defining a set of masks? I would expect splitting an arbitrary regex into a set of contexts for a masking dictionary to be non-trivial.


Regex-guided generation is implemented in full generality in the library (minus some constructs we still have to add); JSON is merely one application.

You can read https://blog.normalcomputing.ai/posts/2023-07-27-regex-guide... for a more detailed explanation of how it works. It should answer your question :)
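To make the "arbitrary" part concrete: any regex compiles to a DFA with finitely many states, and those states are exactly the "contexts" the masking dictionary is keyed on. A sketch of that step, assuming the third-party interegular package (treat the exact calls as an assumption, not a reference to the library's internals):

    import interegular

    # An arbitrary regex, compiled to a deterministic finite automaton.
    fsm = interegular.parse_pattern(r"[0-9]{2}:[0-9]{2}").to_fsm()

    print(fsm.initial)      # the start state
    print(fsm.finals)       # the accepting states
    print(len(fsm.states))  # finitely many "contexts", whatever the pattern
    # fsm.map holds each state's outgoing transitions, which is what you'd
    # walk token-by-token when precomputing the masking dictionary.

So pre-defining the masks is mechanical once you have the automaton: one dictionary entry per DFA state.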



