
Very useful. SIMD has a much lower barrier to entry than GPU compute (it needs no graphics drivers or GPGPU frameworks, is almost universally available, and fallbacks are easy to implement) and is much easier to target (same language, same toolchain, same memory model).
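To illustrate the "fallbacks are easy to implement" point, here is a minimal sketch (not from the comment) of a multiply-add kernel that uses AVX2/FMA intrinsics when the compiler targets them and falls back to plain scalar C otherwise; the function name `fma_f32` is made up for the example:

```c
#include <stddef.h>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* out[i] = a[i] * b[i] + c[i].
   Vector path when AVX2+FMA are enabled at compile time, scalar fallback
   (which also handles the tail) otherwise. */
void fma_f32(const float *a, const float *b, const float *c,
             float *out, size_t n) {
    size_t i = 0;
#if defined(__AVX2__) && defined(__FMA__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_fmadd_ps(va, vb, vc));
    }
#endif
    for (; i < n; i++)  /* scalar fallback and remainder loop */
        out[i] = a[i] * b[i] + c[i];
}
```

Same source, same toolchain: compiling with `-mavx2 -mfma` takes the vector path, compiling without them silently uses the scalar loop.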

Also note that the execution models of GPUs and CPUs are quite different: you need a far larger "breadth" of execution, i.e. many more independent operations in flight, to use a GPU efficiently than a CPU.



A GPU is much wider than a single CPU core, but only slightly wider than a whole server CPU. For example, a 28-core Xeon has dual-issue FMA with 6-cycle latency and 16-wide packed single-precision registers, so it reaches peak floating-point throughput only with 28 × 2 × 6 × 16 = 5376 independent operations in flight at any instant. The corresponding figure for a V100 is only about 4x higher, and the V100 has a higher TDP.
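The in-flight count above is a latency-bandwidth product, which can be written out as a small helper (the function name and the specific Xeon figures are taken from the comment, not from a spec sheet):

```c
/* Independent operations needed to keep every FMA pipeline full:
   cores x FMA units per core x pipeline latency (cycles) x SIMD lanes.
   Figures assumed from the comment: 28 cores, dual-issue FMA,
   6-cycle latency, 16 x f32 lanes per 512-bit register. */
int fma_in_flight(int cores, int fma_units, int latency_cycles, int lanes) {
    return cores * fma_units * latency_cycles * lanes;
}
```

With the comment's numbers, `fma_in_flight(28, 2, 6, 16)` gives 5376.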



