"this can be very easily done fully client-side": maybe, if you have the voice model and an inference engine that runs well on devices. Mozilla doesn't have that yet, so this experiment uses a backend running a Kaldi server and a model that uses too much memory to run locally.
Once DeepSpeech is ready I'm pretty sure they will switch to that, and ultimately to on-device voice recognition with PipSqueak (expected to be an inference engine usable on devices). Unfortunately, none of these projects is far enough along to be usable yet.
Common Voice is mostly related to DeepSpeech, as it will help gather data to train the engine.
> Actually, I think this can be very easily done fully client-side, with good accuracy. Even on Android, the voice recognition can run client-side / offline.
I'm not sure I'd say it's easy; you will certainly trade off accuracy against a state-of-the-art server model. Among other things, Firefox users are not going to download gigabytes of recognition model, so it would have to be a lot smaller than the server ones.
Very possibly it will be slower too: the servers would most likely use GPUs for at least parts of the recognition, and it would not be easy to guarantee the same hardware on the millions of PCs Firefox runs on.
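To make the download-size concern concrete, here is some back-of-envelope arithmetic. The parameter count is an illustrative assumption, not DeepSpeech's actual size:

```javascript
// Rough model-size estimate: parameters × bytes per parameter.
// The 120M-parameter figure below is an illustrative assumption,
// not the real size of any Mozilla model.
function modelSizeMB(numParams, bytesPerParam) {
  return (numParams * bytesPerParam) / (1024 * 1024);
}

console.log(modelSizeMB(120e6, 4)); // 32-bit floats: ~458 MB, far too big to ship with a browser
console.log(modelSizeMB(120e6, 1)); // 8-bit quantized: ~114 MB, closer, but still a hefty download
```

Even aggressive quantization leaves a download orders of magnitude larger than a typical extension, which is why a server-side model is the pragmatic starting point.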
It's interesting that Mozilla is running its own speech recognition system. I wonder whether it would actually be usable in practice. The problem is, I couldn't find any kind of online demo.
I was especially interested in the Voice Fill (speech recognition) technology. Landing page: https://testpilot.firefox.com/experiments/voice-fill
It seems the project is here: https://github.com/mozilla/speaktome/
It seems this actually is a web service. From the code (https://github.com/mozilla/speaktome/blob/master/extension/c...), I see: const STT_SERVER_URL = "https://speaktome.services.mozilla.com";
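For illustration, here is a hedged sketch of how an extension might call such an endpoint. The content type, the sample-rate header, and the response shape are my assumptions, not the actual speaktome protocol:

```javascript
// Hypothetical sketch of talking to the speech-to-text endpoint above.
// Headers and response shape are assumptions; the real protocol may differ.
const STT_SERVER_URL = "https://speaktome.services.mozilla.com";

// Build fetch() options for an audio upload (kept pure so it is easy to test).
function buildSttRequest(audioBytes, sampleRate) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "audio/wav",         // assumed encoding
      "X-Sample-Rate": String(sampleRate), // hypothetical header
    },
    body: audioBytes,
  };
}

// Usage inside the extension (assumed JSON response shape):
//   const resp = await fetch(STT_SERVER_URL, buildSttRequest(wavBytes, 16000));
//   const { text } = await resp.json();
```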
Actually, I think this can be very easily done fully client-side, with good accuracy. Even on Android, the voice recognition can run client-side / offline.
I wonder if the project is in any way related to their DeepSpeech project (https://github.com/mozilla/DeepSpeech). Maybe they use DeepSpeech on the server side? Elsewhere they call it Pipsqueak; I'm not sure whether that is yet another project.
And maybe also related is their common voice project (https://voice.mozilla.org/). Recent discussion here on HN (https://news.ycombinator.com/item?id=14794654).
Some more information also here: https://research.mozilla.org/machine-learning/