
Very useful. SIMD has a much lower barrier to entry than GPU compute (it needs no graphics drivers or GPGPU frameworks, is almost universally available, and fallbacks are easy to implement) and is much easier to target (same language, same toolchain, same memory model).
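To illustrate the "fallbacks are easy to implement" point, here is a minimal sketch (not from the comment) of a multiply-add kernel that uses AVX2/FMA intrinsics when the compiler targets them and falls back to plain scalar C otherwise; the function name `fma_f32` is made up for the example:

```c
#include <stddef.h>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* out[i] = a[i] * b[i] + c[i].
   Vector path when AVX2+FMA are enabled at compile time, scalar fallback
   (which also handles the tail) otherwise. */
void fma_f32(const float *a, const float *b, const float *c,
             float *out, size_t n) {
    size_t i = 0;
#if defined(__AVX2__) && defined(__FMA__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_fmadd_ps(va, vb, vc));
    }
#endif
    for (; i < n; i++)  /* scalar fallback and remainder loop */
        out[i] = a[i] * b[i] + c[i];
}
```

Same source, same toolchain: compiling with `-mavx2 -mfma` takes the vector path, compiling without them silently uses the scalar loop.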

Also note that the execution models of GPUs and CPUs are quite different: you need a far larger "breadth" of execution, i.e. many more independent operations in flight, to use a GPU efficiently than a CPU.



A GPU is much wider than a single CPU core, but only slightly wider than a whole server CPU. For example, a 28-core Xeon has dual-issue FMA with 6-cycle latency and 16-wide packed single-precision registers, so it reaches peak floating-point throughput only with 28 × 2 × 6 × 16 = 5376 independent operations in flight at any instant. The corresponding figure for a V100 is only about 4x higher, and the V100 has a higher TDP.
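The in-flight count above is a latency-bandwidth product, which can be written out as a small helper (the function name and the specific Xeon figures are taken from the comment, not from a spec sheet):

```c
/* Independent operations needed to keep every FMA pipeline full:
   cores x FMA units per core x pipeline latency (cycles) x SIMD lanes.
   Figures assumed from the comment: 28 cores, dual-issue FMA,
   6-cycle latency, 16 x f32 lanes per 512-bit register. */
int fma_in_flight(int cores, int fma_units, int latency_cycles, int lanes) {
    return cores * fma_units * latency_cycles * lanes;
}
```

With the comment's numbers, `fma_in_flight(28, 2, 6, 16)` gives 5376.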



