Is this the full-weight model or the quantized version? The GGUFs distributed on Hugging Face labeled as MXFP4 quantization have layers quantized to int8 (q8_0) rather than bf16 as suggested by OpenAI.

For example, blk.0.attn_k.weight is q8_0, as are several other layers:

https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/tree/main?s...

The same weight in Ollama's distribution is BF16:

https://ollama.com/library/gpt-oss:20b/blobs/e7b273f96360
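For anyone who wants to check a local file, here's a minimal sketch using the gguf Python package from the llama.cpp repo (pip install gguf); the filename is a placeholder, and whether MXFP4 appears as an enum value depends on your gguf version:

    # Minimal sketch: list each tensor's name and quantization type
    # in a GGUF file. Assumes the gguf package from llama.cpp.
    from gguf import GGUFReader

    reader = GGUFReader("gpt-oss-20b.gguf")  # hypothetical local path
    for t in reader.tensors:
        # tensor_type is a GGMLQuantizationType enum (e.g. Q8_0, BF16)
        print(t.name, t.tensor_type.name)

Running that over the Hugging Face GGUF should show blk.0.attn_k.weight reported as Q8_0, versus BF16 in the Ollama blob linked above.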
