Found the discrepancy: I use single precision in PyTorch, and when I benchmark sgemm, MKL selects the SSE code path.
Conclusion: MKL detects Zen now, but currently only implements a Zen code path for dgemm and not for sgemm. To get good performance for sgemm, you have to fake being an Intel CPU.
Edit: a longer description is in https://github.com/pytorch/builder/issues/504
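
For anyone who wants to try the "fake being an Intel CPU" route, here is a minimal sketch of one way it has commonly been done. It assumes the undocumented `MKL_DEBUG_CPU_TYPE=5` environment variable, which is reported to force the AVX2 code path on MKL 2020.0 and earlier and to be ignored by later releases, so check your MKL version first. The helper names (`env_with_override`, `run_with_override`) are made up for illustration. The variable has to be set before MKL is loaded, which is why the sketch launches a child process instead of setting it in the current one.

```python
import os
import subprocess
import sys

# Assumption: MKL_DEBUG_CPU_TYPE is an undocumented MKL variable; the value 5
# is reported to select the AVX2 code path on MKL 2020.0 and earlier.
MKL_OVERRIDE = {"MKL_DEBUG_CPU_TYPE": "5"}


def env_with_override(base_env=None):
    """Return a copy of the environment with the MKL override added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(MKL_OVERRIDE)
    return env


def run_with_override(cmd):
    """Run cmd in a child process so the variable is set before MKL loads."""
    return subprocess.run(
        cmd,
        env=env_with_override(),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Placeholder child: it only echoes the variable back. Swap in your own
    # sgemm benchmark script to compare timings with and without the override.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MKL_DEBUG_CPU_TYPE'])"]
    print(run_with_override(child).stdout.strip())
```

Exporting the variable in the shell before starting Python should have the same effect. Setting it from inside a script only helps if it happens before the first import that pulls in MKL.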