
A couple of times in the past I have wanted to port open-source ML models from CUDA/Python to a better technology stack. I have ported Whisper https://github.com/Const-me/Whisper/ and Mistral https://github.com/Const-me/Cgml/ to D3D11. I don’t remember how much time I spent, but given both were unpaid part-time hobby projects, probably under 160 hours each.

These projects were great for validating the technology choices, but note I only did the bare minimum to implement those specific ML models. Implementing a complete PyTorch backend is going to involve dramatically more work. I can’t even estimate how much more because I’m not an expert in Python or these Python-based ML libraries.



Wow, a very nice reimplementation.

To go on a tangent, I notice your custom 'BCML1' 5-bit-per-weight compression codec and your hand-optimised AVX2 code to encode it... was that really needed? Are the weights encoded on every startup? Why not do it once and save the result to disk?


> Are the weights encoded on every startup?

No, that code only runs when importing the PyTorch format. See the readme for the frontend app: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... When loading the model from *.cgml, the file already contains compressed tensors. That’s why that file is only 4.55 GB, versus 13.4 GB for the original model.
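
As a rough sanity check on the ratio, assuming the original weights are mostly FP16 at 16 bits each:

    13.4 GB × 5 / 16 ≈ 4.19 GB

The remaining ~0.35 GB is plausibly the per-block scale metadata, plus any tensors kept at higher precision.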

> was that really needed?

For desktops with many CPU cores, a simpler scalar version would probably work equally well. Still, low-end computers don’t always have spare cores for these background encoding tasks. Also, on laptops, CPU usage translates to battery drain.
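
For a sense of scale, here’s roughly what a scalar block encoder could look like. This is only a sketch in C++: it assumes 32-value blocks with a single float scale each, which may or may not match the actual BCML1 layout.

    #include <cstdint>
    #include <cmath>
    #include <algorithm>

    // Hypothetical block-scaled 5-bit quantizer: 32 floats -> 1 float scale + 20 bytes.
    // The real BCML1 layout may differ; this only illustrates the arithmetic.
    struct Block5
    {
        float scale;        // dequantized value = scale * ( index - 15 )
        uint8_t packed[20]; // 32 values * 5 bits = 160 bits
    };

    static Block5 encodeBlockScalar( const float* src )
    {
        // Find the largest magnitude in the block to derive the scale
        float amax = 0.0f;
        for( int i = 0; i < 32; i++ )
            amax = std::max( amax, std::fabs( src[ i ] ) );

        Block5 res;
        res.scale = amax / 15.0f; // 5-bit indices 0..30 map to -15..+15
        const float inv = ( res.scale > 0 ) ? 1.0f / res.scale : 0.0f;

        // Pack 5-bit indices through a 64-bit accumulator, flushing whole bytes
        uint64_t acc = 0;
        int bits = 0, pos = 0;
        for( int i = 0; i < 32; i++ )
        {
            // Round to the nearest of 31 levels, biased so the index is unsigned
            int idx = (int)std::lround( src[ i ] * inv ) + 15;
            idx = std::clamp( idx, 0, 30 );
            acc |= (uint64_t)idx << bits;
            bits += 5;
            while( bits >= 8 )
            {
                res.packed[ pos++ ] = (uint8_t)acc;
                acc >>= 8;
                bits -= 8;
            }
        }
        return res;
    }

A vectorised encoder probably buys the most on the max-reduction and the rounding; the bit-packing loop is cheap either way.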


This is great!



