FYI you should have used llama.cpp to run the benchmarks. It's almost 20x faster than Ollama for the gpt-oss-120b model. Here are some sample results on my Spark:
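For anyone wanting to reproduce numbers like these, the measurement tool is llama.cpp's llama-bench; here's a minimal sketch of driving it from Python (that llama-bench is on PATH and the model filename are both assumptions):

```python
# Minimal sketch: run llama.cpp's llama-bench, which by default measures
# prompt-processing and token-generation throughput and prints a table.
# Assumes the llama-bench binary is on PATH; the model filename is
# hypothetical.
import subprocess

result = subprocess.run(
    ["llama-bench", "-m", "gpt-oss-120b-mxfp4.gguf"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```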
Is this the full-weight model or a quantized version? The GGUFs distributed on Hugging Face that are labeled as MXFP4 quantization have some layers quantized to int8 (q8_0) instead of the bf16 suggested by OpenAI.
For example, blk.0.attn_k.weight is q8_0, as are a number of other layers:
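If anyone wants to check their own copy, here's a minimal sketch using the gguf Python package that ships with llama.cpp (pip install gguf) to dump each tensor's quantization type; the model filename is hypothetical:

```python
# Minimal sketch, assuming the gguf Python package bundled with llama.cpp.
from gguf import GGUFReader

reader = GGUFReader("gpt-oss-120b-mxfp4.gguf")  # hypothetical filename
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType member (e.g. Q8_0, BF16),
    # so this prints the per-tensor quantization for the whole file.
    print(f"{tensor.name}: {tensor.tensor_type.name}")
```

The gguf-dump script installed by the same package should show the same per-tensor type information.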