That seems incredibly prescient for accounts created before even GPT-1. Obviously broad data scraping existed before then, but even amongst this crowd I find it hard to believe that’s the real motivator.
Yes, they do. It's called the Neural Engine, and it's an NPU. It isn't being used for local LLMs on Macs because it's optimized for power-efficient inference on much smaller models.
Meanwhile, the GPU is powerful enough for LLMs but has lacked dedicated matrix multiplication acceleration. This changes that.
From a compute perspective, GPUs are mostly about fast vector arithmetic, from which you can build decently fast matrix multiplication. But starting with NVIDIA's Volta architecture at the end of 2017, GPUs have been gaining dedicated hardware units for matrix multiplication. The main motivation for adding matrix multiplication hardware to GPU architectures is machine learning. These units aren't directly useful for 3D graphics rendering, but their inclusion in consumer GPUs has been justified by ML-based post-processing and upscaling features like NVIDIA's various iterations of DLSS.
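To make the "vector arithmetic implements matmul" point concrete, here's a minimal sketch (in NumPy, purely for illustration): each step of the inner loop is one vectorized scale-and-accumulate over a row, which is the kind of primitive classic GPU shader cores execute fast. Dedicated matmul units instead perform a whole small tile multiply-accumulate as a single hardware operation.

```python
import numpy as np

def matmul_via_vector_ops(A, B):
    """Compute C = A @ B using only vector multiply-adds (axpy-style)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):
        for p in range(k):
            # One fused multiply-add across an entire row of B:
            # scalar A[i, p] times vector B[p, :], accumulated into C[i, :].
            C[i, :] += A[i, p] * B[p, :]
    return C

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
# Matches the reference matmul up to floating-point rounding.
assert np.allclose(matmul_via_vector_ops(A, B), A @ B)
```

The vector version needs m x k separate vector instructions; a tensor-core-style unit collapses a whole 4x4 or 16x16 tile of that work into one instruction, which is where the throughput win for ML workloads comes from.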