The warp model in GPUs is great at hiding DRAM latency: the GPU is never just sitting idle waiting for DRAM.

Every warp that's stalled on a memory access is parked in a hardware queue; as soon as its data returns from DRAM, the warp becomes eligible again and the scheduler runs it until its next memory access. So you compute at the full throughput of your RAM. Thread scheduling done in software can't match that granularity and near-zero switch overhead, and hyperthreading has far too few threads per core to hide the latency (2 vs. 768).
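
To make it concrete, here's a minimal CUDA sketch (kernel name and launch parameters are just illustrative): a grid-stride SAXPY where each warp stalls on its loads, and the latency stays hidden as long as enough warps are resident on each SM.

    #include <cuda_runtime.h>

    // Grid-stride SAXPY. Each warp issues its loads, then stalls on the
    // dependent FMA; the SM's warp scheduler simply issues from another
    // resident warp, so the DRAM pipeline stays full the whole time.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];  // warp parks here until data arrives
    }

    int main() {
        const int n = 1 << 24;
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));

        // 256 threads/block = 8 warps/block; with many blocks per SM the
        // scheduler always has a ready warp while others wait on DRAM.
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        cudaFree(x);
        cudaFree(y);
        return 0;
    }

Under a profiler this kind of kernel should sit close to peak DRAM bandwidth; shrink the grid (or burn enough registers to cut occupancy) and throughput drops off as the scheduler runs out of ready warps.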