Not really, that code only runs while importing the PyTorch format. See the readme for the frontend app: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... When loading the model from *.cgml, the model file already contains compressed tensors. That’s why the file is only 4.55 GB, versus 13.4 GB for the original model.
> was that really needed?
For desktops with many CPU cores, a simpler scalar version would probably work equally well. Still, low-end computers don’t always have many cores to spare for these background encoding tasks. Also, CPU usage on laptops translates into battery drain.
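The 4.55 GB vs. 13.4 GB ratio is consistent with blockwise low-bit quantization of fp16 weights. This is not Cgml’s actual codec (which is GPU-oriented and not shown here); just an illustrative sketch of the scalar encoding technique being discussed, with hypothetical helper names, using symmetric 4-bit absmax quantization per block:

```python
import numpy as np

def quantize_block(x, bits=4):
    # Symmetric absmax quantization: one fp scale per block,
    # integer codes in [-(2^(bits-1) - 1), +(2^(bits-1) - 1)].
    # Hypothetical helper, not the actual Cgml codec.
    max_code = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / max_code
    if scale == 0.0:
        scale = 1.0  # all-zero block, any scale works
    codes = np.round(x / scale).astype(np.int8)
    return codes, np.float32(scale)

def dequantize_block(codes, scale):
    # Reverse: scale integer codes back to approximate floats.
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(32).astype(np.float32)  # one block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = np.abs(weights - restored).max()  # bounded by scale / 2
```

The per-block absmax scan and the round/clamp loop are exactly the parts a SIMD version vectorizes; on a machine with few spare cores, that is where the background encoding time goes.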