
Raw FLOPs is a completely misleading metric, which is why Nvidia focuses on it. The GPU can't keep those ops in flight, particularly during inference, when most of the data is fresh so caches don't help. It's the roofline model: memory bandwidth, not peak compute, becomes the ceiling.
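
To make the roofline point concrete, here's a toy calculation in Python (all numbers are made-up placeholders, not specs for any real part):

    # Roofline bound: attainable throughput is capped by either peak compute
    # or memory bandwidth times arithmetic intensity.
    def attainable_tflops(peak_tflops, mem_bw_gbs, flops_per_byte):
        bandwidth_bound = mem_bw_gbs * flops_per_byte / 1000.0  # GFLOP/s -> TFLOP/s
        return min(peak_tflops, bandwidth_bound)

    # Hypothetical accelerator: 300 TFLOPs peak, 2000 GB/s of DRAM bandwidth.
    # An inference kernel streaming fresh weights at ~2 FLOPs/byte is bandwidth bound:
    print(attainable_tflops(300, 2000, 2))    # 4.0 TFLOPs
    # A cache-friendly kernel at ~200 FLOPs/byte actually reaches the compute roof:
    print(attainable_tflops(300, 2000, 200))  # 300 TFLOPs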

In my experience FPGA > GPU for inference, if you have people who can implement good FPGA designs. And inference is more common than training. Much of the advantage comes from explicit memory management and the larger on-chip memory on an FPGA.



Well, my primary point is that the earlier assertion, "GPUs are 20x faster than FPGAs", is nowhere close to the theoretical numbers, let alone reality.

An ASIC (in this case, a fully dedicated GPU) obviously wins in the situation it was designed for. The A100, and other GPU designs, will probably have higher FLOPs than any FPGA made on the 7nm node.

But not a "lot" more FLOPs, and the additional flexibility of an FPGA could really help in some problems. It really depends on what you're trying to do.

------

At best, a top-of-the-line 7nm GPU has ~2x the FLOPs of a top-of-the-line 7nm FPGA in today's environment. In practice, it all comes down to how the software was written (and FPGAs could absolutely win in the right situation).
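
A quick back-of-envelope on peak FP32. The GPU side uses Nvidia's published A100 numbers; the FPGA side uses a hypothetical DSP count and clock, so treat it as a sketch rather than a benchmark:

    def peak_tflops(units, flops_per_unit_per_cycle, clock_ghz):
        return units * flops_per_unit_per_cycle * clock_ghz / 1000.0

    # A100: 6912 FP32 CUDA cores, 1 FMA (2 FLOPs) per cycle, ~1.41 GHz boost.
    gpu = peak_tflops(6912, 2, 1.41)    # ~19.5 TFLOPs

    # Hypothetical 7nm FPGA: ~12k hardened FP DSPs, 1 FMA per cycle, ~450 MHz achieved clock.
    fpga = peak_tflops(12000, 2, 0.45)  # ~10.8 TFLOPs

    print(f"GPU ~{gpu:.1f} TFLOPs, FPGA ~{fpga:.1f} TFLOPs")  # roughly a 2x gap at peak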


> GPUs are 20x faster than FPGAs

The original comment by brandmeyer said "ASIC", not "GPU".

Take the same RTL. Synthesize it for an ASIC and for an FPGA. Observe a 20x difference after normalizing for power, area, and clock speed.
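
To sketch what that normalization looks like (all figures invented purely to show the arithmetic, not results from a real synthesis run):

    # Same RTL block, toy numbers for an ASIC flow vs an FPGA flow.
    asic = {"fmax_mhz": 1500, "power_w": 1.0, "area_mm2": 1.0}
    fpga = {"fmax_mhz": 300,  "power_w": 4.0, "area_mm2": 30.0}

    def ops_per_sec(impl, ops_per_cycle=1):
        # Throughput at the achieved clock for a fixed ops/cycle design.
        return impl["fmax_mhz"] * 1e6 * ops_per_cycle

    clock_gap    = ops_per_sec(asic) / ops_per_sec(fpga)
    per_watt_gap = (ops_per_sec(asic) / asic["power_w"]) / (ops_per_sec(fpga) / fpga["power_w"])
    per_area_gap = (ops_per_sec(asic) / asic["area_mm2"]) / (ops_per_sec(fpga) / fpga["area_mm2"])

    print(f"clock: {clock_gap:.0f}x, per watt: {per_watt_gap:.0f}x, per mm^2: {per_area_gap:.0f}x")
    # clock: 5x, per watt: 20x, per mm^2: 150x -- the exact gap depends on the design and process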



