Not really, that code only runs while importing the PyTorch format. See the readme for the frontend app: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... When loading the model from *.cgml, the model file already contains compressed tensors. That’s why the file is only 4.55 GB, versus 13.4 GB for the original model.
> was that really needed?
For desktops with many CPU cores, a simpler scalar version would probably work equally well. Still, low-end computers don’t always have many cores to spare for these background encoding tasks. Also, CPU usage on laptops translates into battery drain.
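The 4.55 GB vs. 13.4 GB ratio is consistent with blockwise low-bit quantization of fp16 weights. This is not Cgml’s actual codec (which is GPU-oriented and not shown here); just an illustrative sketch of the scalar encoding technique being discussed, with hypothetical helper names, using symmetric 4-bit absmax quantization per block:

```python
import numpy as np

def quantize_block(x, bits=4):
    # Symmetric absmax quantization: one fp scale per block,
    # integer codes in [-(2^(bits-1) - 1), +(2^(bits-1) - 1)].
    # Hypothetical helper, not the actual Cgml codec.
    max_code = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / max_code
    if scale == 0.0:
        scale = 1.0  # all-zero block, any scale works
    codes = np.round(x / scale).astype(np.int8)
    return codes, np.float32(scale)

def dequantize_block(codes, scale):
    # Reverse: scale integer codes back to approximate floats.
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(32).astype(np.float32)  # one block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = np.abs(weights - restored).max()  # bounded by scale / 2
```

The per-block absmax scan and the round/clamp loop are exactly the parts a SIMD version vectorizes; on a machine with few spare cores, that is where the background encoding time goes.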