You're approaching this from a developer's point of view.
Users absolutely don't care whether their prompt response was generated by a CUDA kernel or by some poorly documented Apple-specific silicon that a poor team in Cupertino almost lost their sanity to while porting the model.
And haven't they already spent quite a bit of money on their PyTorch-like MLX framework?
> Users absolutely don't care whether their prompt response was generated by a CUDA kernel or by some poorly documented Apple-specific silicon
They most certainly will. If you run a GPT-4o-class model on an iPhone with MLX, it will suck. Users will tell you it sucks, and they won't do so in developer-specific terms.
The entire point of this thread is that Apple can't make users happy with their Neural Engine. They require a stopgap cloud solution to make up for the lack of local power on iPhone.
> And haven't they already spent quite a bit of money on their PyTorch-like MLX framework?
As well as the Accelerate framework, Metal Performance Shaders, and previously OpenCL. Apple can't decide where to focus their efforts, least of all in a way that threatens CUDA as a platform.
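To be fair, MLX does earn the "PyTorch-like" label at the API level. A minimal sketch, assuming the mlx package is installed (Apple Silicon only); TinyNet is a made-up toy model, not anything from Apple's examples:

    import mlx.core as mx
    import mlx.nn as nn

    # Toy model; the Module/Linear API deliberately mirrors torch.nn.
    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 2)   # (input_dims, output_dims), as in torch

        def __call__(self, x):          # MLX uses __call__ where torch uses forward()
            return nn.relu(self.fc(x))

    x = mx.random.normal((1, 4))        # mx arrays stand in for torch tensors
    print(TinyNet()(x))                 # unified memory: no .to("cuda") dance

None of which helps when the bottleneck is the silicon itself rather than the software stack.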