Is this the full-weight model or a quantized version? The GGUFs distributed on Hugging Face labeled as MXFP4 quantization have some layers quantized to int8 (q8_0) instead of the bf16 suggested by OpenAI.
For example, blk.0.attn_k.weight is q8_0, as are several other layers:
https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/tree/main?s...
Looking at the same weight in the Ollama build, it's BF16:
https://ollama.com/library/gpt-oss:20b/blobs/e7b273f96360
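If anyone wants to check this on their own download, here's a minimal sketch using the gguf Python package that ships with llama.cpp (the file path below is just a placeholder for wherever your GGUF lives):

    # List the quantization type of every tensor in a GGUF file.
    # Requires: pip install gguf
    from gguf import GGUFReader

    reader = GGUFReader("gpt-oss-20b-mxfp4.gguf")  # placeholder path
    for t in reader.tensors:
        # tensor_type is a GGMLQuantizationType enum, e.g. Q8_0 or BF16
        print(f"{t.name}: {t.tensor_type.name}")

Grepping that output for blk.0.attn_k.weight should show whether your copy matches the q8_0 layers on Hugging Face or the BF16 ones on Ollama.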