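I've found that `-march=native` at _any_ optimization level almost always produces faster code. However, that faster code isn't always backwards-compatible with older-generation hardware. It may use instruction-set extensions (AVX2, for example) that the build machine supports but an older CPU lacks, and the program can then crash with an illegal-instruction error. Even where it is backwards-compatible, it can actually run slower on the older hardware. That happens because `-march=native` also implies `-mtune=native`, so instruction scheduling is tuned for the build machine's CPU rather than the one the code eventually runs on.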