Hacker News | timschmidt's comments

Reading weights out of memory is the definition of a large linear read. I'm a bit mystified someone hasn't put an embarrassingly parallel flash storage controller next to some tensor processors on a PCIe card. It could have 4 TB of flash hanging off enough channels to saturate SRAM, skipping DRAM entirely, and could even offload prompt processing to a GPU in the same workstation so long as it got reasonable tokens/s in inference. I'd buy one tomorrow.
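A back-of-the-envelope sketch of why read bandwidth is the number that matters here: if decoding is bound by streaming weights, the token rate is at most the read bandwidth divided by the bytes of active weights touched per token. The bandwidth and model figures below are hypothetical placeholders, not specs of any real card or model.

```python
def decode_tokens_per_second(read_bandwidth_gb_s: float,
                             active_params_billion: float,
                             bytes_per_param: float) -> float:
    """Upper bound on decode speed when each active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return read_bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical card: 1000 GB/s aggregate flash read bandwidth.
# Hypothetical MoE model: 30B active parameters per token, 1 byte per parameter.
print(round(decode_tokens_per_second(1000, 30, 1), 1))  # about 33 tokens/s
```

This ignores compute, KV cache traffic, and any caching of hot weights, so it is only a ceiling under the stated assumptions.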

For the last year, there has been development work at several companies for products including HBF (high-bandwidth flash memory) as a supplement to HBM, in order to enable running inference for big LLMs at a reasonable cost, e.g. on one GPU-like card.

HBF was initially announced by SanDisk early in 2025. Early this year, Hynix announced that it had joined SanDisk in producing HBF, and that the common specification will be standardized under the Open Compute Project.

With HBF, it would be easy to make a GPU card with 4 TB of HBF, which could run the biggest existing open weights LLMs in their native unquantized form.


Exciting news! This is how I see running frontier models at home becoming reasonably affordable. Though it may take a depreciation cycle or two.

For sparse MoE models, the single expert layers that the inference gets sampled from are actually quite small - single-digit megabytes or so.
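As a rough check on the "single-digit megabytes" figure, here is a sketch that sizes one expert, assuming a gated feed-forward block made of three weight matrices of shape d_model x d_ff. The dimensions are hypothetical and chosen only for illustration.

```python
def expert_size_mb(d_model: int, d_ff: int, bytes_per_param: float,
                   n_matrices: int = 3) -> float:
    """Size of one expert's weights in megabytes (1 MB = 1e6 bytes).

    Assumes a gated FFN with n_matrices weight matrices, each d_model x d_ff.
    """
    params = n_matrices * d_model * d_ff
    return params * bytes_per_param / 1e6


# Hypothetical expert: d_model=2048, d_ff=768.
print(expert_size_mb(2048, 768, 1))  # 8-bit weights: about 4.7 MB
print(expert_size_mb(2048, 768, 2))  # 16-bit weights: about 9.4 MB
```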

> Can an N100-class minipc be installed inside of a wall with a touchscreen and serve as a PoE powered Home Assistant interface?

Yes. Generally only requiring a $10 PoE splitter like this: https://www.ebay.com/itm/134500605396

Some N100 class machines draw more power, but many don't, and there are more capable PoE splitters for a few dollars extra.

Use a USB touchscreen.


I might also point out that with Pi/mini PC pricing being the way it is, a used iPad mini mounted to the wall is also in the same price range. As a bonus you could remove it from the wall and walk around with it, and you’ve got way less DIY work to deal with.

Well, nobody's arguing about RPi prices here. I'm just advocating for using the right tool for the job. Lots of people claim that Raspberry Pis have been rendered obsolete by cheap N100 mini PCs but they simply lack the understanding of what an RPi actually is and what its optimal use cases are. Hosting a home server on a 16GB Raspberry Pi is mental illness territory and that's where an x86 mini PC is going to make way more sense. Same with retro gaming (unless you really need Composite out). RPis shine when you need compact size, low power and heat with a great selection of hardware accessories like cameras and other sensors but also want to run full Linux or need that extra performance that a microcontroller just doesn't have.

Edit: Putting a device with a permanently attached battery inside of a wall or even on a mount, always plugged in, gives me the heebie-jeebies.


Basically, my point is that for the use case of “touch screen on a wall,” you can grab something modest like a 4GB Raspberry Pi 5 for over $110 with no screen, power supply, enclosure, etc.

Or you look at a mini PC and you really can’t buy one at all for much less than $200 these days. Again, no screen.

But Apple will sell you a refurbished iPad mini for $379 and you’ve got nothing to set up.

I share your concern about running it with the battery all the time, but I think it’s pretty common. I probably wouldn’t put it in my wall but I know of a business I frequent that has one plugged in 24/7 and nothing has happened.

Apple power manages devices that are plugged in all the time, they’ll likely just park the battery at 80%. They are also about as good as you can get as far as hardware quality: Apple sells a bazillion devices and has definitely thought of fire risk.

The other benefit of the iPad is that the accessory ecosystem is vast.


But that's way bigger and hotter than a RPi4 with an official Touch Display 2. It's technically possible but sounds silly and impractical.

I've been doing some symbolica-like things recently in the https://github.com/timschmidt/hyperreal ecosystem. Not a full CAS, just enough symbolic math to maintain precision through the calculations.

Benchmarks against Symbolica and numerica here: https://github.com/timschmidt/hyperlattice/blob/main/benchma...


Nice, I will check this out in more detail later. I had a quick look at the benchmarks and it looks like you compare f64 hyperreal with numerica's 128-bit implementation, which will fall back to using arb-prec GMP. There is also F64 (simply wrapping around f64), and now DoubleFloat with 106 bits of precision, which should be much faster. There is also the ErrorPropagatingFloat wrapper that may be of interest.

For simple numerical operations, using an entire Symbolica Atom will introduce a large amount of overhead. It should only be used if the expression contains symbols as well. But perhaps I misunderstood the point of the benchmark?


Hyperreal doesn't have any f64 mode. All math done with hyperreals is at infinite precision using a Rational of two BigUInts and a recursive real Computable. Real provides a cohesive interface over both, allowing for easy scalar math. Computables are handled symbolically through a set of deterministic reduction rules until approximation is required, to preserve precision and reduce complexity. Approximation only happens at explicit public API boundaries like .to_f64_lossy(), which is not used except for IO.

Hyperreal gets performance back through caching observed facts about the numbers it's representing at creation, and through operations, and specializing dispatch for predicates and geometric operations. Using this approach throughout the stack allows us to avoid computing on the full representation or collapsing it into an approximation. Instead asking questions like "do we know if it's definitely zero, definitely not, or unknown?" or "is it rational?" or "does it have a known sign, or unknown?" and so on. Each question specializes dispatch further, and some eliminate the need for it entirely.

Asking questions using the cached facts is approximately as fast as computing with f64s. So we do that whenever possible throughout the stack. But then when you actually need to do the exact computation, hyperreal does that too, and can approximate it out to whatever precision you'd like: f32 and f64 are common, but others are supported as well. The downside is that calculating quickly with them requires this sort of specialization, but the work's been done for the geometry functions.
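To make the "definitely zero, definitely not, or unknown" idea concrete, here is a minimal sketch in Python. It is not hyperreal's API (hyperreal is a Rust crate); the names `Tri`, `Facts`, `facts_of_rational`, and `facts_of_product` are made up for illustration. The point is that facts about a product can often be derived from the cached facts of its operands without evaluating anything.

```python
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction
from typing import Optional


class Tri(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Facts:
    """Cheap, cached knowledge about a real number (hypothetical structure)."""
    is_zero: Tri = Tri.UNKNOWN
    sign: Optional[int] = None  # -1, 0, +1, or None when unknown


def facts_of_rational(q: Fraction) -> Facts:
    # Rationals are exact, so every fact is known at creation.
    if q == 0:
        return Facts(Tri.YES, 0)
    return Facts(Tri.NO, 1 if q > 0 else -1)


def facts_of_product(a: Facts, b: Facts) -> Facts:
    # A product is zero if either factor is definitely zero.
    if a.is_zero is Tri.YES or b.is_zero is Tri.YES:
        return Facts(Tri.YES, 0)
    # Over the reals, a product of two nonzero numbers is nonzero.
    if a.is_zero is Tri.NO and b.is_zero is Tri.NO:
        known = a.sign is not None and b.sign is not None
        return Facts(Tri.NO, a.sign * b.sign if known else None)
    return Facts()  # nothing can be concluded without more work


sqrt2 = Facts(Tri.NO, 1)  # facts recorded when sqrt(2) is constructed
print(facts_of_product(sqrt2, facts_of_rational(Fraction(-3))))
print(facts_of_product(Facts(), facts_of_rational(Fraction(0))))
```

Only when the result comes back unknown does a caller need to fall back to exact computation or approximation.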

I'll look into DoubleFloat and ErrorPropagatingFloat for benches. I should mention that numerica@128bit beat the other pure rust bignum crates I tested. The benchmarks are mostly just to give me an understanding of the performance shapes of the implementation choices of high precision numeric libraries alongside hyperreal.


Thanks for the clarification! Hyperreal sounds very useful for zero testing (at the moment I use ErrorPropagatingFloat for this, but it is fickle), I will play around with this in the near future.

Yes, it should be useful for that. Hyperreal's trig and approximate functions performance is also stellar. Perhaps the biggest compromise in terms of the math supported by hyperreals at the moment is that although Rational equality can be exactly tested, Computable equality is currently structural. So it's possible to end up with two mathematically equivalent Computables which aren't structurally equal, because it's not a full CAS.

It's still possible to approximate them both, and test them against each other, but since the whole architecture is built to reduce, avoid, and cache approximations because they're expensive, it's not the default.


In the end the zero test problem is undecidable for reasonably complicated expressions, so sadly there is no guarantee that you can rewrite one Computable into another even if they evaluate to the same value. For polynomials you can do finite field evaluation tests to prove equality with a likelihood bound of your choosing. That may be interesting for hyperreal too.
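The finite field test for polynomials can be sketched as follows, assuming the Schwartz-Zippel style of argument: evaluate both sides at random points modulo a large prime, and if they ever disagree the polynomials differ. If they always agree, two distinct polynomials of total degree at most d could only have fooled every trial with probability at most (d / P) ** trials. The function names and the choice of prime are illustrative, and the sketch assumes integer coefficients smaller than P in absolute value.

```python
import random

P = 2**61 - 1  # a Mersenne prime used as the field modulus


def probably_equal(f, g, n_vars: int, trials: int = 20, rng=None) -> bool:
    """Randomized identity test for polynomials f and g in n_vars variables.

    f and g must be built only from +, -, * and integer constants.
    False means a witness point was found, so they definitely differ.
    True means they agreed at every sampled point modulo P.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(trials):
        point = [rng.randrange(P) for _ in range(n_vars)]
        if (f(*point) - g(*point)) % P != 0:
            return False
    return True


def error_bound(degree_bound: int, trials: int) -> float:
    """Upper bound on the chance that distinct polynomials pass every trial."""
    return (degree_bound / P) ** trials


lhs = lambda x, y: (x + y) ** 2
rhs = lambda x, y: x * x + 2 * x * y + y * y
print(probably_equal(lhs, rhs, n_vars=2))                          # True
print(probably_equal(lhs, lambda x, y: x * x + y * y, n_vars=2))   # almost certainly False
```

Increasing `trials` tightens the bound geometrically, which is what lets you pick the likelihood bound.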

Yes. One of those problems with no neat solution, and worse no performant one. :)

You might find this bit of performance engineering interesting: https://github.com/timschmidt/hyperreal/blob/8a016808f4b0ba3...

The matrix math layer wanted that kind of optimization to avoid worst case operations.


Cars run red lights in real life. Driving defensively requires anticipating it. Anyone expecting them not to is more likely to get in a crash.

The rest I can't speak to.


I'm not op, but he's probably referring to: https://en.wikipedia.org/wiki/Cracking_(chemistry)


Using dashes twice like that is valid. It's a bit like parentheses, to frame a tangential statement between them, but with emphasis instead of quietly. See: https://en.wikipedia.org/wiki/Dash

I use that construction in my totally human writing often enough. Some of us missed a few English classes it seems.


Are you seriously going to claim that this is not LLM generated?

https://alphapixeldev.com/what-is-a-mercenary-programmer-and...

The guy's Twitter account is full of LLM slop: https://x.com/alphapixel

Perhaps all these other posts from the same author in completely different styles are also not LLM generated? https://wildirismarketing.com/articles-and-blog-posts/

>I use that construction in my totally human writing often enough. Some of us missed a few English classes it seems.

I made no comment as to the validity of the construction.


> Are you seriously going to claim that this is not LLM generated?

Do you see me making that claim? My comment seems to be about grammar. Do you always jump to conclusions?

> I made no comment as to the validity of the construction.

See:

> I mean look at this sentence which randomly contains the " - " pattern twice in a row

They're called parenthetical dashes. They're not random. And it's one pattern, not two. You'll find it used with parentheses (obviously) and commas as well as dashes and perhaps even other punctuation[1].

As to whether or not the post was written by AI, I don't care either way. That seems to be something you care about. But you shouldn't base those conclusions on the use of parenthetical dashes.

1: https://editorsmanual.com/articles/commas-vs-parentheses-vs-...


> I’m not a fan of “think of the children“ arguments

Yet you're making one.

> the Internet cannot actually be a complete free for all

Yet in many important ways, it is.

As much as publishers would like to shut down Scihub, it exists. The Pirate Bay famously persists. Nation states with entirely opposed legal systems connect and interoperate to at least some degree.


Scihub and The Pirate Bay are not at all anonymous, aggressively police for CSAM, and rely on reputation systems.


North Korean Internet will solve IPv4 address exhaustion.


Dude it’s CSAM what are we even doing here.


What's CSAM? All of Freenet? Doubtful.

The OP said: "Extremely depraved things are not the only thing to use freedom of speech for, and freely speaking can result in all kinds of repressions."

Which is objectively true.

You're throwing reporters, political dissidents, whistleblowers, minority groups, and just regular people who don't appreciate the Stasi in with the child pornographers which some might take as an insult and offense.

What kind of criminal does Phil Zimmermann look like to you? We had this argument already in the 90s.


People lump them together because of an anti-technology reputation, but I don't think most Amish would have trucked with Luddites. Amish tend to avoid actively participating in popular social movements, and oppose violence and property destruction.


Very Holobiont of you.


An excellent distinction to make. Life however often says "Why not both? And 11 more you'd have never thought of. And one that seems impossible just for fun."

If it's possible, and it can force a function up a gradient, life is almost certainly doing it somewhere.


You might even say it finds a way.

