
W.r.t. language models/transformers, the neural engine/NPU is still potentially useful for the pre-processing (prefill) step, which is generally compute-limited. For token generation you need memory bandwidth, so GPU compute with neural/tensor accelerators is preferable.
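A rough back-of-envelope sketch of why the two phases differ (illustrative numbers, not measurements): prefill processes many prompt tokens per pass over the weights, so its arithmetic intensity scales with prompt length, while decode re-reads the full weight set for every single token.

```python
# Back-of-envelope arithmetic intensity (FLOPs per byte of weight traffic)
# for transformer inference. Assumptions: ~2 FLOPs per parameter per token,
# and the full weight set streamed from memory once per forward pass.

def arithmetic_intensity(params: float, tokens: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one forward pass over `tokens` tokens."""
    flops = 2 * params * tokens
    bytes_moved = params * bytes_per_param  # fp16 weights read once per pass
    return flops / bytes_moved

PARAMS = 7e9  # hypothetical 7B-parameter model

# Prefill: hundreds of prompt tokens batched into one pass -> compute-bound.
prefill = arithmetic_intensity(PARAMS, tokens=512)
# Decode: one token at a time, weights re-read every step -> bandwidth-bound.
decode = arithmetic_intensity(PARAMS, tokens=1)

print(f"prefill intensity: {prefill:.0f} FLOPs/byte")
print(f"decode  intensity: {decode:.0f} FLOPs/byte")
```

With these assumptions prefill lands at hundreds of FLOPs per byte (plenty of work for a compute-dense NPU), while decode is around 1 FLOP/byte, so decode throughput is capped by memory bandwidth regardless of how much compute sits next to it.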


I think I'd still rather have that hardware area put into tensor cores for the GPU instead of this unit, which is only programmable via ONNX.



