You get considerably more ML FLOPS per dollar from a 4090 than from any Mac. The base M2 Max seems to be at roughly the same price point, though it does grant you more RAM.
Quadro and Tesla cards might be a different story. I would still like to see concrete FLOPS/$ numbers.
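For a rough sense of scale, here's a back-of-envelope sketch. The TFLOPS figures are approximate FP32 spec-sheet numbers and the ~$2000 price points are assumptions from this thread; peak FLOPS is a crude proxy for real ML throughput, which depends heavily on precision, memory, and software.

```python
# Back-of-envelope GFLOPS per dollar; spec numbers and prices are assumptions,
# and peak FP32 throughput is a poor proxy for real-world ML performance.
cards = {
    "RTX 4090": (82.6e12, 2000),  # ~82.6 FP32 TFLOPS, ~$2000 street price
    "M2 Max":   (13.6e12, 2000),  # ~13.6 FP32 TFLOPS (GPU), base Mac Studio price
}
for name, (flops, usd) in cards.items():
    print(f"{name}: {flops / usd / 1e9:.1f} GFLOPS/$")
```

By this crude measure the 4090 comes out far ahead, which is the FLOPS/$ gap being discussed.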
They don't need the entire Mac. Their cost per Max chip is probably $200-300, which beats the 4090 by a massive margin, and each chip can do more than a 4090 because it also has a CPU onboard.
The 4090 peaks at around 550W, which means they can run 5+ of their Max chips in the same power budget.
A 4090 is $2000. Apple can probably put 5 chips on a custom motherboard for that cost. They'll use the same amount of power but get a lot more raw compute.
> Their cost per Max chip is probably $200-300 which beats the 4090 by a massive margin...
That's true. I was talking about end user pricing.
> ...each chip can do more than a 4090 because it also has a CPU onboard.
That's a strange thing to say. It has a CPU, correct. That makes the chip more versatile, but for data-center ML tasks it doesn't really matter. A 4090 also has much more ML-relevant compute per chip, so Apple's chips can't really "do more than a 4090" in any relevant way.
Of course Apple pays less for its in-house chips than for external products. That comparison doesn't seem relevant in this context, though; e.g. they're not going to be challenging CUDA with internal chips.
They might get more compute per watt, though. My guess is that Nvidia's datacenter chips are competitive in that space, but that's another story.
You need to consider this in the context of the relevant task. Nvidia GPUs have extremely high peak performance for GEMM, but when working with LLMs, bandwidth (and RAM capacity) becomes the limiting factor. There is a reason why real ML-focused datacenter Nvidia GPUs use much wider RAM interfaces and a much higher price point. The M2 Ultra might not have the raw compute, but it has a lot of RAM and large caches.
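A roofline-style sketch of why bandwidth dominates here: generating each token of an LLM streams roughly the entire set of weights from memory, so single-stream decode speed is capped at about bandwidth / model size. The bandwidth figures below are approximate spec numbers used purely for illustration.

```python
# Decode-speed ceiling for a memory-bandwidth-bound LLM (single stream).
# Bandwidth figures are approximate spec numbers, assumed for illustration.
def tokens_per_sec(bandwidth_gb_s, params_billion, bytes_per_param):
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# A hypothetical 70B-parameter model with 8-bit weights (70 GB):
for name, bw in [("RTX 4090 (~1008 GB/s)", 1008),
                 ("M2 Ultra (~800 GB/s)", 800),
                 ("H100 SXM (~3350 GB/s)", 3350)]:
    print(f"{name}: ~{tokens_per_sec(bw, 70, 1):.0f} tok/s ceiling")
```

Note that a 70 GB model doesn't even fit in a 4090's 24 GB of VRAM, which is exactly the capacity point: on large models the M2 Ultra's big unified RAM matters more than raw GEMM throughput.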
Part of the advantage of using "one 4090" is that the max TDP is only 450W, as opposed to 5 M2 Ultras running at ~150W each. When you scale up to Nvidia's latest Blackwell architecture, I genuinely don't know how Apple could beat them on performance-per-watt. Buying M2 Ultras wholesale is probably cheaper than an NVL72 cluster, but it's certainly not what you'd want for Linux or for maximizing AI performance-per-watt.
You are missing the point. We're discussing if Apple can use their own chips more cheaply than buying Nvidia's chips.
The max TDP is not the actual peak power consumption. Gamers Nexus recorded a 500W peak, and almost 670W overclocked. Most reviews I've looked at put peak power consumption around 550W.
The M2 Ultra wasn't even mentioned, and it uses more than 150W. The right comparison would be the M3 Max, since we have solid numbers on it: the M3 Max uses around 100W when both the GPU and CPU are heavily utilized, and less than that when only the GPU is used.
This means Apple could run 5 of their M3 Max chips in the same peak power as one 4090. And the 4090 doesn't run in a vacuum: it requires a separate CPU setup and a couple hundred more watts. That means they could power 7 or so M3 Max chips with the same amount of power.
Of course, this isn't the whole story. The 4090 isn't a professional chip either (while Apple can bin and certify their own chips and know they're getting a server-grade part), and the 4090 also doesn't have nearly enough RAM. The H100 starts at $25,000 and goes up; Apple could buy 75-100 M3 Max chips for that kind of money. That's certainly a lot more compute than an H100 would offer, and Blackwell will be even more expensive in comparison.
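Spelling out the arithmetic in that argument (the M3 Max power draw, host overhead, and per-chip cost figures are this thread's assumptions, not measured or published numbers):

```python
# Power budget: how many ~100W M3 Max chips fit in a 4090 system's envelope.
peak_4090_w = 550          # measured peak per the reviews cited above
host_overhead_w = 150      # assumed CPU/platform overhead for the 4090 box
m3_max_w = 100             # assumed draw under combined CPU+GPU load

print(peak_4090_w // m3_max_w)                      # chips within the GPU's budget alone
print((peak_4090_w + host_overhead_w) // m3_max_w)  # chips within the whole system's budget

# Cost: M3 Max chips per H100 budget, at an assumed internal cost per chip.
h100_usd = 25_000
m3_max_cost_usd = 300      # assumed; actual internal cost is unknown
print(h100_usd // m3_max_cost_usd)
```

That yields 5 chips in the GPU-only budget, 7 in the system budget, and roughly 83 chips per H100 dollar, matching the 75-100 range above once the cost assumption is varied between $250 and $333.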
The M2 is a chip designed for a laptop (and it is quite powerful given its low power consumption). Presumably they have a different chip, or at least a completely different configuration (RAM, networking, etc.), in their data centers.
The interesting point here is that developers targeting the Mac can safely assume that the users will have a processor capable of significant AI/ML workloads. On the Windows (and Linux) side of things, there's no common platform, no assumption that the users will have an NPU or GPU capable of doing what you want. I think that's also why Microsoft was initially going for the ARM laptops, where they'd be sure that the required processing power is available.
> The interesting point here is that developers targeting the Mac can safely assume that the users will have a processor capable of significant AI/ML workloads
Also that a significant proportion (majority?) of them will have just 8 GB of memory, which is not exactly sufficient to run any complex AI/ML workload.
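For a sense of scale, weight memory alone for a common local-model size (illustrative arithmetic only; real usage adds KV cache, activations, and the OS's own footprint on top):

```python
# GB of weight memory for a model with `params_b` billion parameters.
def weight_gb(params_b, bits_per_param):
    return params_b * bits_per_param / 8

print(weight_gb(7, 16))  # a 7B model at fp16: 14.0 GB, already over 8 GB total RAM
print(weight_gb(7, 4))   # the same model 4-bit quantized: 3.5 GB, fits with room to spare
```

So on an 8 GB machine even a modest 7B model only fits heavily quantized, and the unified memory is shared with everything else running.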
I believe MS is trying to standardize this, in the same way they do with DirectX feature levels, but I agree it's probably going to be inherently a bit less consistent than Apple's offerings.
How does it help me (with a maxed-out M3 Max) right now that Apple might have some chip in the future? I do DL on an A6000 and a 4090; I'm not waiting until Apple someday produces a chip that's faster than a 1650 in ML...
There was a rumor floating around that Apple might try to enter the server chip business with an AI chip, which is an interesting concept. Apple's never really succeeded in the B2B business, but they have proven a lot of competency in the silicon space.
Even their high-end prosumer hardware could be interesting as an AI workstation given the VRAM available if the software support were better.
> Apple's never really succeeded in the B2B business
Idk, every business I've worked at and all the places my friends work seem to be 90% Apple hardware, with a few Lenovos issued for special-case roles in finance or something.
Of course you do; Apple's selling mobile SoCs, not high-end cards. That doesn't mean they're incapable of making them for the right application. You don't seriously think the server farms are running on M4 Pro Max chips, do you...