
A 100× transistor count would amount to roughly 6 to 7 doublings under Moore's law (log2(100) ≈ 6.6), or a 10× shrink in linear device dimensions. So once the inherent difficulties of designing a chip for a trailing-edge ASIC process are addressed (with better free EDA tools and such), it seems that FPGA-based commercial products (as opposed to using FPGAs for bespoke prototyping) should become quite uncompetitive. There are also structured ASIC, multi-project wafer, and similar design approaches that sit roughly halfway and might provide an interesting alternative as well. OTOH, FPGAs might also be more easily designed to integrate pre-built components like CPU cores, and the 100× rule wouldn't apply to such parts if used in an FPGA-based design.
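As a quick sanity check of that arithmetic, here is a minimal Python sketch. It assumes the idealized case where transistor density scales with the inverse square of the linear feature size, which real process nodes only approximate; the function names are made up for illustration.

```python
import math

def doublings_for(factor: float) -> float:
    """Number of transistor-count doublings needed to reach `factor` times more."""
    return math.log2(factor)

def linear_shrink_for(factor: float) -> float:
    """Linear feature-size shrink that yields `factor` times more transistors
    per unit area, assuming density scales as 1 / length**2 (an idealization)."""
    return math.sqrt(factor)

print(round(doublings_for(100), 2))  # 6.64
print(linear_shrink_for(100))        # 10.0
```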


A 100× transistor count is a big enough difference that you would want some sort of ALU and branching/FSM/loop unit arranged in an array, with a few FPGA elements on the inputs and outputs.

It sounds to me that the real problem is still that the ideal programming model hasn't been created yet.

What would it look like? Compile a C function into an FPGA pipeline? A dataflow language where you explicitly define processes and their implementation?

I mean, imagine you could take a mathematical formula. How would it translate into a series of additions, subtractions and multiplications? You could write it down in Fortran and then you would have a well-defined and ordered tree of operations. Do we just translate that tree into hardware? Like, you have 100 instructions with no control flow and you just translate them into 100 ALUs? Does it make sense to reuse the ALUs and therefore have 100 instructions map to fewer than 100 ALUs?
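To make the "tree of operations mapped onto ALUs" question concrete, here is a rough Python sketch, not a real HLS flow. It parses a formula with the standard `ast` module, treats every binary operation as one single-cycle ALU op (an assumption), and greedily schedules the ops onto a fixed number of ALUs. `build_ops` and `schedule` are hypothetical helper names.

```python
import ast

def build_ops(formula: str) -> list[list[int]]:
    """Parse a formula; ops[i] lists the indices of the ops that feed op i."""
    ops: list[list[int]] = []

    def visit(node):
        # Returns the index of the op producing this value, or None for a leaf.
        if isinstance(node, ast.BinOp):
            deps = [d for d in (visit(node.left), visit(node.right)) if d is not None]
            ops.append(deps)
            return len(ops) - 1
        if isinstance(node, ast.UnaryOp):
            return visit(node.operand)  # treat a sign change as free
        return None  # variables and constants need no ALU

    visit(ast.parse(formula, mode="eval").body)
    return ops

def schedule(ops: list[list[int]], num_alus: int) -> list[list[int]]:
    """Greedy list scheduling: each cycle runs up to num_alus ops whose inputs are done."""
    if num_alus < 1:
        raise ValueError("need at least one ALU")
    done: set[int] = set()
    cycles: list[list[int]] = []
    while len(done) < len(ops):
        ready = [i for i, deps in enumerate(ops)
                 if i not in done and all(d in done for d in deps)]
        batch = ready[:num_alus]
        cycles.append(batch)
        done.update(batch)
    return cycles

ops = build_ops("a*b + c*d + e*f + g*h")
print(len(ops))               # 7 operations in the tree
print(len(schedule(ops, 7)))  # 4 cycles with one ALU per operation
print(len(schedule(ops, 2)))  # 4 cycles with only two shared ALUs
print(len(schedule(ops, 1)))  # 7 cycles with a single ALU
```

In this toy case two shared ALUs match the latency of seven dedicated ones, because the dependency chain sets the lower bound. The sketch only models the latency of a single evaluation, though: a fully unrolled pipeline could accept a new set of inputs every cycle, which shared ALUs cannot.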

If we assume the above model, then there are specific requirements for the hardware.

What if I need more than one algorithm? Can I switch the implementation fast enough? Can the hardware hold multiple programmed algorithms for the same ALUs?

Programmable ALUs sound awfully close to what a CPU does, but in theory you would just have a register file with, say, 16 different ALU configurations, and the data you send through the ALUs would be prefixed with a 4-bit opcode that tells the ALU which configuration to use. We are getting further and further away from what an FPGA does and closer to how CPUs work, but we still have the concept of programming the hardware for specific algorithms.
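A toy software model of that idea, purely as an illustration: the `ConfigurableALU` class and its methods are hypothetical, and Python functions stand in for whatever a hardware configuration would actually be.

```python
import operator

class ConfigurableALU:
    """Hypothetical ALU holding up to 16 preloaded configurations."""
    SLOTS = 16  # addressable by a 4-bit opcode

    def __init__(self):
        self.configs = [None] * self.SLOTS

    def load(self, slot, fn):
        """'Program' one slot; stands in for loading a hardware configuration."""
        if not 0 <= slot < self.SLOTS:
            raise ValueError("slot must fit in 4 bits")
        self.configs[slot] = fn

    def execute(self, packet):
        """packet = (opcode, a, b); the opcode picks the configuration per datum."""
        opcode, a, b = packet
        fn = self.configs[opcode & 0xF]
        if fn is None:
            raise RuntimeError(f"slot {opcode & 0xF} is not programmed")
        return fn(a, b)

alu = ConfigurableALU()
alu.load(0, operator.add)
alu.load(1, operator.mul)
alu.load(2, lambda a, b: a * b + a)  # an algorithm-specific fused operation

# Each datum carries its own opcode, so the same ALU switches behavior per packet.
stream = [(0, 2, 3), (1, 2, 3), (2, 2, 3)]
print([alu.execute(p) for p in stream])  # [5, 6, 8]
```

Switching algorithms then costs nothing at run time as long as the needed configurations are already loaded; only filling the 16 slots corresponds to "programming the hardware".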

These are just random thoughts but they reveal that the idea of a pure FPGA is clearly not what the accelerator market needs.



