llama.cpp is orders of magnitude easier. Rather than controlling generation token by token, with an imperative statement for each, we create a static grammar that describes, e.g., a JSON schema.
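For reference, here's a trimmed-down, illustrative sketch of what such a grammar looks like in llama.cpp's GBNF format (the grammars/json.gbnf that ships with the repo is more complete; this version skips string escapes and exponents):

    root   ::= object
    value  ::= object | array | string | number | "true" | "false" | "null"
    object ::= "{" ws ( pair ("," ws pair)* )? "}" ws
    pair   ::= string ":" ws value
    array  ::= "[" ws ( value ("," ws value)* )? "]" ws
    string ::= "\"" [^"]* "\"" ws
    number ::= "-"? [0-9]+ ("." [0-9]+)? ws
    ws     ::= [ \t\n]*

You pass that to the sampler and it only ever emits tokens that keep the output inside the grammar.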
I'm honestly unsure what this offers over that, especially because I'm one of 3 groups with a WASM llama.cpp, and you can take it from me: you don't want to use it. (~3 tokens/sec with a 3B model on an M2 Max/Ultra/whatever they call the top of the line for the MBP. That's about 2% of the performance of Metal, and I'd bet 10% of running on the CPU without WASM. And there's no improvement in sight.)
I don't think the key idea here is to run llama.cpp itself in WASM - it's to run LLMs in native code, but have fast custom-written code from end-users that can help pick the next token. WASM is a neat mechanism for that because many different languages can use it as a compile target, and it comes with a robust sandbox by default.
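As a rough illustration of the controller idea (this is not AICI's actual interface, just a toy stand-in with made-up token ids): inference runs natively, the host hands the controller the logits each step, and the controller decides which tokens are acceptable. Something like the following, compiled to WASM, is all that has to run in the sandbox:

    # Toy controller sketch, not a real API: constrain the next token to a
    # hypothetical set of digit token ids.
    import numpy as np

    DIGIT_TOKENS = {15, 16, 17}  # hypothetical token ids for "0", "1", "2"

    def allowed_token_mask(vocab_size: int) -> np.ndarray:
        """Boolean mask over the vocabulary; True = token may be sampled."""
        mask = np.zeros(vocab_size, dtype=bool)
        mask[list(DIGIT_TOKENS)] = True
        return mask

    def pick_next_token(logits: np.ndarray) -> int:
        """Greedy pick among allowed tokens only."""
        masked = np.where(allowed_token_mask(len(logits)), logits, -np.inf)
        return int(np.argmax(masked))

The point is that this per-step logic is arbitrary user code, not a fixed grammar formalism baked into the inference engine.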
It's only the controller that runs in Wasm, not the inference.
The pyctrl is just a sample controller; you can write a controller that takes any kind of grammar (e.g., a yacc grammar [0] - the Python code in that example is only used for gluing).
Llama.cpp grammars were quite slow in my testing (20ms per token or so, compared to 2ms for the yacc grammar referenced above).
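The usual way to get per-token cost down (and, I assume, roughly what a compiled-grammar controller does, though I haven't checked the yacc example's internals) is to do the grammar work up front: compile the grammar to an automaton and precompute, for each state, a boolean mask over the vocabulary. Each decoding step is then a table lookup plus masking instead of re-checking candidate tokens against a parser. A minimal sketch, with all inputs hypothetical:

    import numpy as np

    class CompiledGrammarSampler:
        def __init__(self, transitions, state_masks):
            # transitions: dict mapping (state, token_id) -> next state
            # state_masks[s]: precomputed bool array over the vocab, True where
            # appending that token keeps the parse valid
            self.transitions = transitions
            self.state_masks = state_masks
            self.state = 0

        def constrain(self, logits):
            # Per step: just an array lookup + mask, no parsing.
            return np.where(self.state_masks[self.state], logits, -np.inf)

        def advance(self, token_id):
            self.state = self.transitions[(self.state, token_id)]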