The unit economics at this point are about utilization. Their cost is well below what they’re charging, but only when there is enough traffic to keep the GPUs busy. So the game is about increasing demand to level the load.
GPT has negligible moat because they gave up on all their integrations. Claude Code is starting to develop one as people start to build things that require Claude Code specifically, not just any LLM.
> Claude Code is starting to develop one as people start to build things that require Claude Code specifically, not just any LLM.
I hate to be a "source?" guy, but I'm curious if you have any examples of this. Skills and MCP are really the only extensions on CC itself I'm aware of, and these are both supported in Codex.
Things like Dispatch / remote sessions is something CC has that Codex does not, but these features are quite easy to replicate (and I expect Codex to do so in short order).
I agree, that’s a great question which I don’t really know the answer to. These tools have been moving in lockstep for some time now: one will innovate, and within 2-3 weeks the others have that feature. From where I sit, though, the mindshare all seems to be going to Claude. The moat develops when people build something that only works on one tool; even if the others have the same features, it doesn’t matter unless they’re literally binary compatible. Skills are just prompts at the end of the day, with nothing more specialized than a file naming convention.
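For reference, a skill in Claude Code is just a markdown file with a little YAML front matter, living at a conventional path (`.claude/skills/<name>/SKILL.md`). The skill name and body below are made up for illustration:

```markdown
---
name: changelog
description: Draft a changelog entry from recent commits
---

Read the git log since the last tag and summarize user-facing
changes as bullets under Added/Changed/Fixed headings.
```

Nothing binds that format to one vendor except the path convention and which harness reads it.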
Having written several orchestrators, I’ll say that the code to invoke the tool is pretty equivalent, but the details matter: exact CLI flags and JSON fields.
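A minimal sketch of what that looks like in practice. The flag sets below reflect my understanding of the current CLIs (Claude Code’s `-p` print mode, Codex’s `exec` subcommand) and may drift between versions; the point is that the orchestrator logic is identical and only these details differ:

```python
# Orchestrator side: same shape of code for either tool, but the argv
# and the output format you must parse are tool-specific.

def build_command(backend: str, prompt: str) -> list[str]:
    """Return argv for a one-shot, machine-readable invocation.
    Flag names are assumptions based on current CLI versions."""
    if backend == "claude":
        # Claude Code: -p runs non-interactively ("print" mode)
        return ["claude", "-p", prompt, "--output-format", "json"]
    if backend == "codex":
        # Codex CLI: exec subcommand with JSON output
        return ["codex", "exec", "--json", prompt]
    raise ValueError(f"unknown backend: {backend}")
```

Swapping backends is a one-line change here, yet every downstream parser that reads the JSON is coupled to one tool’s field names.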
Also not like it’s a particularly good piece of tech. It was the first to show a new category. But jeebus the design and security are a nightmare. Any of the numerous other claws are better choices for anything serious.
Classic SV hubris. Talk to OpenAI people and they’re so convinced they’re untouchable, they don’t bother worrying about things like revenue, or product strategy. All they cared about was being the first to AGI. Well it looks like that isn’t happening soon enough. And now they have zero moat except brand recognition, which is quickly getting eroded.
The idea that they don’t learn from experience might be true in some limited sense, but ignores the reality of how LLMs are used. If you look at any advanced agentic coding system the instructions say to write down intermediate findings in files and refer to them. The LLM doesn’t have to learn. The harness around it allows it to. It’s like complaining that an internal combustion engine doesn’t have wheels to push it around.
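That harness pattern is simple enough to sketch. Here `call_llm` is a stand-in for any chat-completion call, and the `NEW FINDINGS:` marker is an invented convention; the point is that the model stays stateless while the loop around it persists memory to a file:

```python
from pathlib import Path

NOTES = Path("findings.md")

def run_turn(task: str, call_llm) -> str:
    """One agent turn: feed prior notes in, write new findings out."""
    notes = NOTES.read_text() if NOTES.exists() else ""
    reply = call_llm(
        f"Task: {task}\n"
        f"Notes from earlier turns:\n{notes}\n"
        "Append anything worth remembering after the line NEW FINDINGS:"
    )
    # Persist whatever the model asked to remember for the next turn.
    if "NEW FINDINGS:" in reply:
        new = reply.split("NEW FINDINGS:", 1)[1].strip()
        NOTES.write_text(notes + "\n" + new if notes else new)
    return reply
```

Each invocation starts from a blank context, but the file accumulates across turns, which is exactly the "learning" the engine-without-wheels complaint overlooks.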
But how will I make ad-supported YouTube videos about how I automated my life with OpenClaw, using a $10M boutique AI server to make a few thousand in ad revenue while burning tens of thousands per month on API costs?
These specs look enormously cheaper than doing it with Dell servers. The last quote I had for a bog-standard Dell server was $50k, and only if bought in the next few days. The prices are going up weekly.
These are "unsupported" configurations. Nvidia/AMD discourage running multiple gaming/workstation cards and encourage customers to buy $500K SXM/OAM servers.
The DGX Spark is a fantastic option at this price point. You get 128GB of VRAM, which is extremely difficult to find for this money, and it’s a fairly fast GPU. Plus stupidly fast Mellanox networking: 200Gbps, or 400Gbps if you find coin for another one.
Meh. DGX is Arm and CUDA; Strix is x86 and ROCm. CUDA has better support than ROCm, and x86 has better support than Arm.
Nowadays I find most things work fine on Arm. Sometimes something needs to be built from source which is genuinely annoying. But moving from CUDA to ROCm is often more like a rewrite than a recompile.
> But moving from CUDA to ROCm is often more like a rewrite than a recompile.
Isn't everyone* in this segment just using PyTorch for training, or wrappers like Ollama/vllm/llama.cpp for inference? None have a strict dependency on Cuda. PyTorch's AMD backend is solid (for supported platforms, and Strix Halo is supported).
* enthusiasts whose budget is in the $5k range. If you're vendor-locked to CUDA, Mac Mini and Strix Halo are immediately ruled out.
Most everything starts as PyTorch. (Or maybe Jax.) But the inference engines all use hand tuned CUDA kernels - at least the good ones do. You have to do that to optimize things.
I'm certain inference engines don't use hand-tuned CUDA on Radeon or Mac Mini chips. My statement holds: those engines have no strict dependency on CUDA, or they'd be Nvidia-only.
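This is visible in how a typical PyTorch script selects hardware: ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` namespace, and Apple GPUs appear as the `mps` backend, so user code never names a vendor. A sketch with the selection logic pulled out into a pure function (so it’s checkable without a GPU):

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Map backend availability to a PyTorch device string."""
    if cuda_ok:   # true for both CUDA and ROCm builds of PyTorch
        return "cuda"
    if mps_ok:    # Apple Silicon (Metal Performance Shaders)
        return "mps"
    return "cpu"

# In a real script (assumes torch is installed):
#   import torch
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
#   model.to(device)
```

The hand-tuned kernels live below this interface, inside PyTorch or the inference engine, which is why the same user code runs on all three vendors.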
I’m not very well versed in this domain, but I think it’s not going to be “VRAM” (GDDR) memory, but rather “unified memory”, which is essentially RAM (some flavour of DDR5, I assume). These two types of memory have vastly different bandwidth.
I’m pretty curious to see any benchmarks on inference on VRAM vs UM.
I’m using VRAM as shorthand for “memory which the AI chip can use”, which I think is fairly common shorthand these days. For the Spark it is unified, and it has lower bandwidth than almost any modern GPU (about 300 GB/s, comparable to an RTX 3060).
So LLM inference is relatively slow because of that bandwidth, but you can load much bigger, smarter models than you could on any consumer GPU.
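The big-but-slow tradeoff follows from back-of-envelope arithmetic: to decode one token, a dense model streams roughly all of its weights through memory once, so bandwidth divided by model size bounds tokens per second. The numbers below are illustrative:

```python
def rough_decode_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on decode tokens/sec for a dense model:
    each generated token reads approximately all weights once."""
    return bandwidth_gb_s / model_gb

# ~273 GB/s of unified memory vs. a 70B model at 8-bit (~70 GB):
#   rough_decode_tps(273, 70)  -> about 3.9 tokens/sec
# A consumer GPU at ~1000 GB/s would run a 20 GB model at ~50 tok/s,
# but could never fit the 70 GB model in its 24 GB of VRAM.
```

Batching, speculative decoding, and MoE sparsity all bend this bound, but for single-stream dense decoding it’s a decent first estimate.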
> Windows touches more people’s lives than almost any technology on Earth.
Thankfully Ballmer failed and this isn’t even close to true. I, like a lot of highly technical professionals, have been Windows sober for many years now.
Not OP, but it is probably either "Average Hold Time" or "Average Handle Time". I suppose the usage here indicates some call-center metric that management expected in a certain range, but the new tool skewed it in a different direction.