
cmrdporcupine · 2025-12-25T00:05:32 1766621132

depends on the kind of battle, but

see Ukraine drone warfare ... there's a lot going on there which is more than just miniaturized motors, etc. a lot is efficient power use of the semiconductors in those drones, the image processors attached to the cameras, etc. that i suspect relies on newer processes

cmrdporcupine · 2025-12-24T17:57:15 1766599035

90% of those problems affect people like you and me, developers and power users, not "regular" users of machines who are mostly mobile device and occasional laptop/desktop application users.

I suspect we'll see somebody -- a phone manufacturer or similar device -- make a major transition to RISC-V from ARM etc in the next 10 years that we won't even notice.

fulafel · 2025-12-24T19:00:48 1766602848

I agree, some will, but it may not be a more open platform from developer POV.

cmrdporcupine · 2025-12-24T17:12:33 1766596353

The problems with natural gas are definitely not confined to combustion. Methane leakage is a huge problem.

That and if you just encourage more exploration, and it's cheaper to just burn the stuff anyways, guess what happens in the price conscious free market?

cmrdporcupine · 2025-12-24T15:04:47 1766588687

When carbon byproducts are produced from these kinds of reactions, are they "pure" carbon, or will there be residues from the impurities in the methane?

The reason I ask is I wonder if the carbon could be used as a soil amendment to help replenish top soils in agriculture, or as a growing medium generally. But this would only be conceivable if it's just carbon.

estimator7292 · 2025-12-24T15:54:05 1766591645

It depends heavily on the exact reactions. I'm not a chemist, but AFAIK carbon nanotube production doesn't like taking in non-carbon atoms.

Things like crystallization reactions will produce very pure products, some other reactions will absorb more contaminants.

cmrdporcupine · 2025-12-23T23:00:24 1766530824

Pascal. Modula-2. BASIC. Hell, Logo.

Lately, yes, Julia and R.

Lots of systems I grew up with were 1-indexed and there's nothing wrong with it. In the context of history, C is the anomaly.

I learned the Wirth languages first (and then later did a lot of programming in MOO, a prototype OO 1-indexed scripting language). Because of that early experience I still slip up and make off by 1 errors occasionally w/ 0 indexed languages.

(Actually both Modula-2 and Ada aren't strictly 1 indexed since you can redefine the indexing range.)

It's funny how orthodoxies grow.

teo_zero · 2025-12-23T23:39:47 1766533187

In fact zero-based has shown some undeniable advantages over one-based. I couldn't explain it better than Dijkstra's famous essay: http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF
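For readers who don't want to open the PDF, here is a minimal sketch of the half-open, zero-based range convention the essay argues for, using Python slices. The helper names (split_at, slice_inclusive_1based) are made up for illustration and do not come from the essay.

```python
# Half-open, zero-based ranges [lo, hi), illustrated with Python slices.
# Helper names are hypothetical and exist only for this example.

def split_at(items, k):
    """Split a list into items[0:k] and items[k:n] with no +1/-1 adjustments."""
    return items[:k], items[k:]


def slice_inclusive_1based(items, first, last):
    """Return elements first..last (both inclusive, 1-based).

    The same operation under a 1-based inclusive convention needs an
    explicit adjustment on the lower bound.
    """
    return items[first - 1:last]


data = ["a", "b", "c", "d", "e"]
left, right = split_at(data, 2)

# The length of a half-open range is simply hi - lo.
lo, hi = 1, 4
assert len(data[lo:hi]) == hi - lo

# Adjacent ranges share a boundary: [0, k) followed by [k, n) covers everything.
assert left + right == data

# The empty range needs no special case such as hi = lo - 1.
assert data[3:3] == []
```

None of this makes 1-based indexing wrong; it only shows where the bookkeeping differs.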

cmrdporcupine · 2025-12-24T00:14:16 1766535256

It's fine, I can see the advantages. I just think it's a weird level of blindness to act like 1 indexing is some sort of aberration. It's really not. It's actually quite friendly for new or casual programmers, for one.

fc417fc802 · 2025-12-24T06:53:58 1766559238

I think the objection is not so much blindness as the idea that professional tools should not generally be tailored to the needs of new or casual users at the expense of experienced users.

IshKebab · 2025-12-24T09:47:13 1766569633

Is there any actual evidence that new programmers really find this hard? Python is renowned for being beginner friendly and I've never heard of anyone suggesting it was remotely a problem.

There are only a few languages that are purely for beginners (LOGO and BASIC?) so it's a high cost to annoy experienced programmers for something that probably isn't a big deal anyway.

nine_k · 2025-12-23T23:36:25 1766532985

Pascal, frankly, allowed you to index arrays by any enumerable type; you could use Natural (1-based), or you could use 0..whatever. Same with Modula-2; writing it, I freely used 0-based indexing when I wanted to interact with hardware where it made sense, and 1-based indexes when I wanted to implement some math formula.
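To make the "pick your own index range" idea concrete, here is a rough sketch in Python of an array whose lower and upper bounds are chosen at declaration time. OffsetArray is a hypothetical class written for this example, not a standard library type, and it only mimics the integer-subrange case (not indexing by enumerations or characters).

```python
class OffsetArray:
    """Fixed-size array indexed from `low` to `high` inclusive.

    Hypothetical helper for illustration: it mimics a declaration like
    `array [low..high] of T` by storing the lower bound and shifting
    every access by it.
    """

    def __init__(self, low, high, fill=0):
        if high < low:
            raise ValueError("high must be >= low")
        self.low = low
        self.high = high
        self._items = [fill] * (high - low + 1)

    def _offset(self, index):
        # Bounds are checked against the declared range, not against 0.
        if not self.low <= index <= self.high:
            raise IndexError(f"index {index} outside {self.low}..{self.high}")
        return index - self.low

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value

    def __len__(self):
        return len(self._items)


# 0-based where it matches the hardware, 1-based where it matches the math.
registers = OffsetArray(0, 7)
coefficients = OffsetArray(1, 3)
registers[0] = 0xFF
coefficients[1] = 2.5
```

The point of the sketch is that the index base becomes a property of each array rather than of the language.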

fc417fc802 · 2025-12-24T06:56:28 1766559388

As I understand it Julia changed course and is attempting to support arbitrary index ranges, a feature which Fortran enjoys. (I'm not clear on the details as I don't use either of them.)

pklausler · 2025-12-24T16:30:14 1766593814

Let’s hope that they don’t also replicate ISO Fortran’s design flaws with lower array bounds, which contain enough pitfalls and portability problems that I don’t recommend their use.

bsder · 2025-12-24T07:09:23 1766560163

> Lots of systems I grew up with were 1-indexed and there's nothing wrong with it. In the context of history, C is the anomaly.

The problem is that Lua is effectively an embedded language for C.

If Lua never interacted with C, 1-based indexing would merely be a weird quirk. Because you are constantly shifting across the C/Lua barrier, 1-based indices become a disaster.
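A small sketch of what that shifting looks like in a binding layer, written in Python only to keep it short. The function names are hypothetical and this is not the Lua C API; it just shows that every index crossing the boundary needs a conversion, and that forgetting one is a silent off-by-one.

```python
def script_to_host(index):
    """Convert a 1-based script-side index to a 0-based host-side index."""
    if index < 1:
        raise IndexError("script-side indices start at 1")
    return index - 1


def host_to_script(index):
    """Convert a 0-based host-side index to a 1-based script-side index."""
    if index < 0:
        raise IndexError("host-side indices start at 0")
    return index + 1


def find_first(host_items, wanted):
    """Search a host-side list and report the position the script side expects.

    Returns a 1-based position, or None when the item is absent.
    """
    for i, item in enumerate(host_items):
        if item == wanted:
            return host_to_script(i)
    return None


def get_for_script(host_items, script_index):
    """Fetch an element using an index that came from the script side."""
    return host_items[script_to_host(script_index)]


names = ["alpha", "beta", "gamma"]
position = find_first(names, "beta")   # position as the script side sees it
value = get_for_script(names, position)
```

Every function that takes or returns an index needs one of these conversions, which is where the errors creep in.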

cmrdporcupine · 2025-12-23T14:49:29 1766501369

Yes you usually need to compact first before doing this kind of thing because the context windows are different.

cmrdporcupine · 2025-12-23T02:19:32 1766456372

Curious to see how this works out for you. Let us know.

pixelpoet · 2025-12-23T03:01:22 1766458882

Also curious with two Strix Halo machines at the ready for exactly this kind of usage

Tepix · 2025-12-23T20:23:49 1766521429

Don't wait for me. Donato Capitella has done this and created videos on his youtube channel at https://www.youtube.com/@donatocapitella

cmrdporcupine · 2025-12-23T21:20:03 1766524803

That's GLM 4.6 tho, not 4.7?

Still, informative. And stupidly I'd seen this video before. It sounds like the TLDR is: not quite.

cmrdporcupine · 2025-12-22T21:09:13 1766437753

10k wouldn't even get you 1/4 of the way there. You couldn't even run this or DeepSeek 3.2 etc for that.

Esp with RAM prices now spiking.

coder543 · 2025-12-22T21:17:33 1766438253

$10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

The point in this thread is that it would likely be too slow due to prompt processing. (M5 Ultra might fix this with the GPU's new neural accelerators.)

embedding-shape · 2025-12-22T22:53:51 1766444031

> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think again that what I wrote earlier will apply:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it

I'd rather place that 10K on a RTX Pro 6000 if I was choosing between them.

rynn · 2025-12-22T23:28:08 1766446088

> Please do give that a try and report back the prefill and decode speed.

M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.

https://pastebin.com/2wJvWDEH

I weighed getting a DGX Spark but thought the M4 would be competitive with equal RAM. Not so much.

cmrdporcupine · 2025-12-22T23:33:54 1766446434

I think the DGX Spark will likely underperform the M4 from what I've read.

However it will be better for training / fine tuning, etc. type workflows.

rynn · 2025-12-23T00:29:11 1766449751

> I think the DGX Spark will likely underperform the M4 from what I've read.

For the DGX benchmarks I found, the Spark was mostly beating the M4. It wasn't cut and dry.

coder543 · 2025-12-23T00:36:16 1766450176

The Spark has more compute, so it should be faster for prefill (prompt processing).

The M4 Max has double the memory bandwidth, so it should be faster for decode (token generation).
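A back-of-envelope sketch of why the two phases favor different hardware. It assumes the usual rough approximations: decode reads every active weight once per generated token (so it is bound by memory bandwidth), and prefill costs about 2 FLOPs per active weight per prompt token (so it is bound by compute). The numbers in the usage lines are round illustrative values, not the specs of either machine.

```python
def decode_tokens_per_second(bandwidth_gb_s, active_params_billion, bits_per_weight):
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def prefill_seconds(compute_tflops, active_params_billion, prompt_tokens):
    """Lower bound on prefill time at roughly 2 FLOPs per active weight per token."""
    flops_needed = 2 * active_params_billion * 1e9 * prompt_tokens
    return flops_needed / (compute_tflops * 1e12)


# Illustrative values only: a model with 30B active parameters at 4 bits per weight.
slow_memory = decode_tokens_per_second(250, 30, 4)
fast_memory = decode_tokens_per_second(500, 30, 4)

# Illustrative values only: a 10,000-token prompt on two compute budgets.
less_compute = prefill_seconds(50, 30, 10_000)
more_compute = prefill_seconds(100, 30, 10_000)
```

Under these assumptions, doubling bandwidth doubles the decode ceiling and doubling compute halves the prefill floor. Real results land below these bounds.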

coder543 · 2025-12-22T23:10:59 1766445059

> I'd rather place that 10K on a RTX Pro 6000 if I was choosing between them.

One RTX Pro 6000 is not going to be able to run GLM-4.7, so it's not really a choice if that is the goal.

embedding-shape · 2025-12-23T09:05:38 1766480738

No, but the models you will be able to run, will run fast and many of them are Good Enough(tm) for quite a lot of tasks already. I mostly use GPT-OSS-120B and glm-4.5-air currently, both easily fit and run incredibly fast, and the runners haven't even yet been fully optimized for Blackwell so time will tell how fast it can go.

bigyabai · 2025-12-22T23:49:07 1766447347

You definitely could, the RTX Pro 6000 has 96 (!!!) gigs of memory. You could load 2 experts at once at an MXFP4 quant, or one expert at FP8.

coder543 · 2025-12-22T23:55:40 1766447740

No… that’s not how this works. 96GB sounds impressive on paper, but this model is far, far larger than that.

If you are running a REAP model (eliminating experts), then you are not running GLM-4.7 at that point — you’re running some other model which has poorly defined characteristics. If you are running GLM-4.7, you have to have all of the experts accessible. You don’t get to pick and choose.

If you have enough system RAM, you can offload some layers (not experts) to the GPU and keep the rest in system RAM, but the performance is asymptotically close to CPU-only. If you offload more than a handful of layers, then the GPU is mostly sitting around waiting for work. At which point, are you really running it “on” the RTX Pro 6000?

If you want to use RTX Pro 6000s to run GLM-4.7, then you really need 3 or 4 of them, which is a lot more than $10k.

And I don’t consider running a 1-bit superquant to be a valid thing here either. Much better off running a smaller model at that point. Quantization is often better than a smaller model, but only up to a point, and that is beyond it.
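The size argument is simple arithmetic: weight memory is roughly total parameters times bits per weight, divided by 8. A minimal sketch follows, using a hypothetical 350B-parameter MoE model as a stand-in. That parameter count is an assumption for illustration, not a published figure for GLM-4.7, and the estimate ignores the KV cache and activations, which only add to it.

```python
def weights_gb(total_params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bits_per_weight / 8


def fits(total_params_billion, bits_per_weight, device_memory_gb):
    """True if the weights alone fit in one device's memory.

    All experts count toward the total, because any of them may be
    activated on any token.
    """
    return weights_gb(total_params_billion, bits_per_weight) <= device_memory_gb


# Hypothetical 350B-parameter model against a single 96 GB card.
at_4_bit = weights_gb(350, 4)
at_8_bit = weights_gb(350, 8)
fits_single_card = fits(350, 4, 96)
```

With those assumed numbers, even the 4-bit weights are well past 96 GB, which is the gap the comment is pointing at.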

bigyabai · 2025-12-23T00:34:24 1766450064

You don't need a REAP-processed model to offload on a per-expert basis. All MoE models are inherently sparse, so you're only operating on a subset of activated layers when the prompt is being processed. It's more of a PCI bottleneck than a CPU one.

> And I don’t consider running a 1-bit superquant to be a valid thing here either.

I don't either. MXFP4 is scalar.

coder543 · 2025-12-23T00:43:34 1766450614

Yes, you can offload random experts to the GPU, but it will still be activating experts that are on the CPU, completely tanking performance. It won't suddenly make things fast. One of these GPUs is not enough for this model.

You're better off prioritizing the offload of the KV cache and attention layers to the GPU than trying to offload a specific expert or two, but the performance loss I was talking about earlier still means you're not offloading enough for a 96GB GPU to make things how they need to be. You need multiple, or you need a Mac Studio.

If someone buys one of these $8000 GPUs to run GLM-4.7, they're going to be immensely disappointed. This is my point.

embedding-shape · 2025-12-23T09:06:58 1766480818

> If someone buys one of these $8000 GPUs to run GLM-4.7, they're going to be immensely disappointed. This is my point.

Absolutely, same if they get a $10K Mac/Apple computer, immense disappointment ahead.

Best is of course to start looking at models that fit within 96GB, but that'd make too much sense.

virgildotcodes · 2025-12-23T11:07:47 1766488067

$10k is > 4 years of a $200/mo sub to models which are currently far better, continue to get upgraded frequently, and have improved tremendously in the last year alone.

This almost feels more like a retro computing kind of hobby than anything aimed at genuine productivity.
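The break-even figure above is easy to check. The sketch below uses only the two numbers from the comment ($10k of hardware, $200/mo subscription) and deliberately leaves out electricity, resale value, and price changes, none of which are given in the thread.

```python
def breakeven_months(hardware_cost, monthly_subscription):
    """Months of subscription that cost the same as the hardware up front."""
    return hardware_cost / monthly_subscription


months = breakeven_months(10_000, 200)
years = months / 12
```

That works out to 50 months, a little over 4 years.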

embedding-shape · 2025-12-23T11:33:48 1766489628

I don't think the calculation is that simple. With your own hardware, there are literally no limits on runtime, or what models you use, or what tooling you use, or availability; all of those things are up to you.

Maybe I'm old school, but I prefer those benefits over some cost/benefit analysis across 4 years when, by the time we're 20% through it, everything will have changed.

But I also use this hardware for training my own models, not just inference and not just LLMs. I'd agree with you if we were talking about just LLM inference.

naasking · 2025-12-23T14:08:17 1766498897

They are better in some ways, but they're also neutered.

benjiro · 2025-12-22T22:06:35 1766441195

> $10k gets you a Mac Studio with 512GB of RAM

Because Apple has not adjusted their pricing yet for the new RAM pricing reality. The moment they do, it's not going to be a $10k system anymore but in the $15k+...

The amount of wafers going to AI is insane and will influence not just memory prices. Do not forget, the only reason why Apple is currently immune to this is that they tend to make long-term contracts, but the moment those expire ... they will push the costs down to consumers.

tonyhart7 · 2025-12-22T22:20:52 1766442052

generous of you to predict apple will only make it 50% more expensive

cmrdporcupine · 2025-12-22T20:34:51 1766435691

Devstral Small 24b looks promising as something I want to try fine tuning on DSLs, etc. and then embedding in tooling.

hedgehog · 2025-12-22T23:43:04 1766446984

I haven't tried it yet, but yes. Qwen3 Next 80B works decently in my testing, and fast. I had mixed results with the new Nemotron, but it and the new Qwen models are both very fast to run.

mark_l_watson · 2025-12-23T12:51:18 1766494278

Same experience: on my old M2 Mac with just 32GB of memory both Qwen 3 30B and the new Nemotron models are very useful for coding if I prepare a one-shot prompt with directions and relevant code. I don’t like them for agentic coding tools. I have mentioned this elsewhere: it is deeply satisfying to mix local model use with commercial APIs and services.

cmrdporcupine · 2025-12-22T20:27:26 1766435246

Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.

alok-g · 2025-12-23T02:53:02 1766458382

For others like me who did not know about Crush:

https://github.com/charmbracelet/crush

https://news.ycombinator.com/item?id=44736176