
It is a hardware RNG they are building. The claim is that their solution will be more computationally efficient than the current state of the art for a narrow class of problems (the de-noising step for diffusion AI models). Maybe.

This is what they are trying to create, more specifically:

https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...


It's not just a "hardware RNG". An RNG outputs a uniform distribution. This hardware outputs randomness with controllable distributions, potentially extremely complex ones, many orders of magnitude more efficiently than doing it the traditional way with ALUs. The class of problems that can be solved by sampling from extremely complex probability distributions is much larger than you might naively expect.

I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.


> The class of problems that can be solved by sampling from extremely complex probability distributions is much larger than you might naively expect.

Could you provide some keywords to read more about this?


An old concept indeed! I think about this Ed Fredkin story a lot... In his words:

"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."

RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin


And still today we spend a great deal of effort trying to make our randomly-sampled LLM outputs reproducibly deterministic:

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...


can't you just save the seed?


My understanding is that because GPUs do operations in a highly parallelized fashion, and because floating-point operations aren't associative, once you're using GPUs the seed isn't enough, no. You'd need the seed plus the specific order in which each of the intermediate steps of the calculation was finished by the various streaming multiprocessors.
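A minimal sketch of the underlying issue (plain Python, no GPU needed, values made up for illustration): floating-point addition is not associative, so accumulating the same numbers in a different order can give a slightly different result even with a fixed seed.

    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

    forward = sum(xs)             # one accumulation order
    backward = sum(reversed(xs))  # same numbers, opposite order

    print(forward == backward)    # usually False
    print(forward - backward)     # tiny but nonzero difference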


It's funny because that did actually reappear at some point with rdrand. But still it's only really used for cryptography, if you just need a random distribution almost everyone just uses a PRNG (a non-cryptographic one is a lot faster still, apart from being deterministic).
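A rough sketch of that speed gap in Python terms, using the OS CSPRNG via os.urandom as a stand-in (not rdrand itself); exact ratios vary by platform, but the non-cryptographic PRNG is typically much faster per value.

    import os
    import random
    import time

    N = 1_000_000

    t0 = time.perf_counter()
    for _ in range(N):
        random.getrandbits(64)                     # deterministic, seedable PRNG
    t1 = time.perf_counter()
    for _ in range(N):
        int.from_bytes(os.urandom(8), "little")    # OS CSPRNG, one request per value
    t2 = time.perf_counter()

    print(f"PRNG: {t1 - t0:.2f}s  CSPRNG: {t2 - t1:.2f}s")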


Generating randomness is not a bottleneck and modern SIMD CPUs should be more than fast enough. I thought they’re building approximate computation where a*b is computed within some error threshold p.


Generating enough random numbers with the right distribution for Gibbs sampling, at incredibly low power is what their hardware does.
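For a sense of what that means in software, here is a rough sketch of Gibbs sampling over a small Ising-style model (the model, size and temperature are made up for illustration, not anything Extropic has published); the point is how many conditional random draws a single sweep consumes, which is what dedicated sampling hardware would amortise.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 32                       # 32x32 grid of +/-1 spins
    beta = 0.4                   # inverse temperature
    spins = rng.choice([-1, 1], size=(N, N))

    def gibbs_sweep(spins):
        # Resample every spin from its conditional distribution given its
        # four neighbours (periodic boundary conditions).
        for i in range(N):
            for j in range(N):
                nb = (spins[(i + 1) % N, j] + spins[(i - 1) % N, j] +
                      spins[i, (j + 1) % N] + spins[i, (j - 1) % N])
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                spins[i, j] = 1 if rng.random() < p_up else -1

    for _ in range(100):         # 100 sweeps = ~100k random draws already
        gibbs_sweep(spins)
    print("mean magnetisation:", spins.mean())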


I think that's underselling it a bit, since there are lots of existing ways to have a hardware RNG. They're trying to use lots and lots of hardware RNGs to solve probabilistic problems a little more probabilistically.


I tried this, but not with the "AI magic" angle. It turns out nobody cares because CSPRNGs are random enough and really fast.



The article you linked uses magnetic tunnel junctions to implement the RNG part.

The Web site of Extropic claims that their hardware devices are made with standard CMOS technology, which cannot make magnetic tunnel junctions.

So it appears that there is no connection between the linked article and what Extropic does.

The idea of stochastic computation is not at all new. I read about such stochastic computers as a young child, more than half a century ago, long before personal computers. The research on them was inspired by hypotheses about how the brain might work.

Along with analog computers, stochastic computers were abandoned due to the fast progress of deterministic digital computers, implemented with logic integrated circuits.

So anything new cannot be about the structure of stochastic computers, which has been well understood for decades, but only about a novel extremely compact hardware RNG device, which could be scaled to a huge number of RNG devices per stochastic computer.

During a brief browse of the Extropic site I could not find any description of the principle behind their hardware RNG, except that it is made with standard CMOS technology. While there are plenty of devices made in standard CMOS that can be used as RNGs, they are not reliable enough for stochastic computation (unless you use complex compensation circuits), so Extropic must have found some neat trick to avoid using complex circuitry, assuming that their claims are correct.

However, I am skeptical about their claims because of the amount of BS words used on their pages, which read like pseudo-scientific Star Trek-style mumbo-jumbo, e.g. "thermodynamic computing", "accelerated intelligence", "Extropic" derived from "entropic", and so on.

To be clear, there is no such thing as "thermodynamic computing", and inventing such meaningless word combinations is insulting to potential customers, as it demonstrates that the Extropic management believes them to be naive morons.

The traditional term for such computing is "stochastic computing". "Stochastics" is an older, and in my opinion better, alternative name for probability theory; in Ancient Greek, "stochastics" means the science of guessing. Instead of "stochastic computing" one can say "probabilistic computing", but not "thermodynamic computing", which makes no sense (unless the Extropic computers are dual use: besides computing, they also provide heating and hot water for a great number of houses!).

Like analog computers, stochastic computers are a good choice only for low-precision computations. With increased precision, the amount of required hardware increases much faster for analog computers and for stochastic computers than for deterministic digital computers.

The only currently important application that is happy even with precisions under 16 bit is AI/ML, so trying to market their product for AI applications is normal for Extropic, but they should provide more meaningful information about what advantages their product might have.


You can absolutely sign the image with the on-camera certificate, for example, but that would be too boring a solution to hype.


See that's what I'm saying.


"CSCI 4020: Writing Fast Code in Slow Languages" does exist, at least in the book form. Teach algorithmic complexity theory in slowest possible language like VB or Ruby. Then demonstrate how O(N) in Ruby trumps O(N^2) in C++.


One of my childhood books compared bubble sort implemented in FORTRAN and running on a Cray-1 with quicksort implemented in BASIC and running on a TRS-80.

The BASIC implementation started to outrun the supercomputer at some surprisingly pedestrian array sizes. I was properly impressed.


To be fair, the standard bubble sort algorithm isn't vectorized, and so can only use about 5% of the power of a Cray-1. Which is good for another factor of about 5 in the array size.


A Cray-1 was still fast at non-vector code when new.


Yes, as I understand it, its 80MHz clock gave it a 12.5ns memory access time, and I think it normally accessed memory four times per instruction, enabling it to do 20 MIPS (of 64-bit ALU ops). But the vector units could deliver 160 megaflops, and usually did. I think a TRS-80 could technically run about half a million instructions per second (depending on what they were) but only about 0.05 Dhrystone MIPS—see the Cromemco Z2 on https://netlib.org/performance/html/dhrystone.data.col0.html for a comparable machine.

So we can estimate the Cray's scalar performance at 400× the TRS-80's. On that assumption, Quicksort on the TRS-80 beats the Cray somewhere between 10000 items and 100_000 items. This probably falsifies the claim—10000 items only fits in the TRS-80's 48KiB maximum memory if the items are 4 bytes or less, and although external sorting is certainly a thing, Quicksort in particular is not well-suited to it.

But wait, BASIC on the TRS-80 was specified. I haven't benchmarked it, but I think that's about another factor of 40 performance loss. In that case the crossover isn't until between 100_000 and 1_000_000 items.

So the claim is probably wrong, but close to correct. It would be correct if you replaced the TRS-80 with a slightly faster microcomputer with more RAM, like the Apple IIGS, the Commodore 128, or the IBM PC-AT.
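A back-of-envelope check of the estimate above, assuming bubble sort costs about n^2 operations, Quicksort about n*log2(n), and BASIC on the TRS-80 roughly 400 × 40 = 16000× slower per operation than the Cray-1's scalar units (both factors taken from the comment):

    import math

    slowdown = 400 * 40          # scalar-speed ratio x interpreted-BASIC penalty
    for n in (10_000, 100_000, 300_000, 1_000_000):
        cray_bubble = n * n                          # relative cost on the Cray-1
        trs80_quick = slowdown * n * math.log2(n)    # relative cost on the TRS-80
        winner = "TRS-80" if trs80_quick < cray_bubble else "Cray-1"
        print(f"n={n:>9,}  winner: {winner}")

On those assumptions the crossover lands between 100_000 and 300_000 items, consistent with the "between 100_000 and 1_000_000" estimate above.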


We had this as a lab in a learning systems course: converting Python loops into NumPy vector manipulation (map/reduce), and then into TensorFlow operations, and measuring the speed.

It gave a good idea of why Python is even remotely useful for AI.
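A minimal example of the first step of that kind of conversion (the sizes are arbitrary): the same reduction written as an interpreted loop and as one NumPy call.

    import time
    import numpy as np

    x = np.random.rand(10_000_000)

    t0 = time.perf_counter()
    total = 0.0
    for v in x:                       # interpreted loop, one boxed element at a time
        total += v * v
    t1 = time.perf_counter()

    vec_total = float(np.dot(x, x))   # vectorised: one call into compiled code
    t2 = time.perf_counter()

    print(f"loop: {t1 - t0:.2f}s  numpy: {t2 - t1:.4f}s")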



I work with Python programmers (engineers/scientists who 'know' Python) daily. Having them understand why their slow code is slow would be amazing.


We are rebuilding a core infrastructure system from unmaintained Python (it's from before our company was bought and everyone left) to Java. It's nothing interesting, standard ML infrastructure fare. A straightforward, uncareful weekend implementation in Java was over ten times faster.

The reason is very simple: Python takes longer for a few function calls than Java takes to do everything. There's nothing I can do to fix that.

I wrote a portion of code that just takes a list of 170-ish simple functions and runs them. They are such that it should be parallelizable, but I was rushing and just slapped the boring serialized version into place to get things working. I'll fix it when we need it to be faster, I thought.

The entire thing runs in a couple nanoseconds.

So much of our industry is writing godawful interpreted code and then having to do crazy engineering to get stupid interpreted languages to go a little faster.

Oh, and this was before I fixed it so the code didn't rebuild a constant regex pattern 100k times per task.
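For anyone wondering what that class of fix looks like, a sketch with a hypothetical pattern and data (not the actual code being described): hoist the constant regex out of the hot path instead of rebuilding it on every call.

    import re

    DATE = re.compile(r"\d{4}-\d{2}-\d{2}")   # compiled once, at import time

    def extract_dates(lines):
        # Calling re.findall(r"\d{4}-...", line) inside the loop would rebuild
        # (or at least re-look-up) the pattern for every line; this reuses it.
        return [m for line in lines for m in DATE.findall(line)]

    print(extract_dates(["shipped 2024-01-15", "no date here"]))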

But our computers are so stupidly fast. It's so refreshing to be able to just write code and it runs as fast as computers run. The naive, trivial to read and understand code just works. I don't need a PhD to write it, understand it, or come up with it.


I’ll take the fight on Algorithmic complexity any day.

There are many cases where O(n^2) will beat O(n).

Utilising the hardware can make a bigger difference than algorithmic complexity in many cases.

Vectorised code on linear memory vs unvectorised code on data scattered around the heap.
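A rough sketch of that contrast (sizes and node layout are made up; timings are machine-dependent, the point is the gap): the same O(N) sum over values stored contiguously versus scattered across the heap as linked nodes.

    import time
    import numpy as np

    N = 1_000_000
    contiguous = np.arange(N, dtype=np.float64)

    class Node:                          # one heap allocation per element
        __slots__ = ("value", "next")
        def __init__(self, value, nxt):
            self.value, self.next = value, nxt

    head = None
    for v in range(N - 1, -1, -1):       # build a linked list of N nodes
        head = Node(float(v), head)

    t0 = time.perf_counter()
    total_vec = contiguous.sum()         # one pass over linear memory
    t1 = time.perf_counter()

    total_list, node = 0.0, head         # one pass chasing pointers
    while node is not None:
        total_list += node.value
        node = node.next
    t2 = time.perf_counter()

    print(f"numpy: {t1 - t0:.4f}s  linked list: {t2 - t1:.4f}s")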


I sincerely hope you are joking...


Big O notation drops the coefficient; sometimes that coefficient is massive enough that O(N) only beats out O(N^2) at billions of iterations.

Premature optimisation is a massive issue; spending days finding a better algorithm is often not worth the time, since the worse algorithm was plenty good enough.

The real world beats algorithmic complexity many, many times: you spend ages building a complex data structure with allocations scattered all over the heap to get O(N), when it's significantly faster to just do the stupid thing that lives in linear memory.


I imagine this is a class specifically about slow languages: writing code that doesn't get garbage collected, using vectorized operations (NumPy), exploiting a JIT to achieve performance greater than normal C, etc.


VB has actually been quite fast since VB 6, but your point stands.


Python has come a long way. It's never gonna win for something like high-frequency trading, but it will be super competitive in areas you wouldn't expect.


It could be much better if most folks used PyPy instead of CPython as their favourite implementation.


The Python interpreter and core library are mostly C code, right? Even a Python library can be coded in C. If you want to sort an array, for example, it will cost more in Python because it's sorting Python objects, but the sort itself is coded in C.
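A small illustration of that point (arbitrary data, timings machine-dependent): sorted() is implemented in C but still compares boxed Python objects, while sorting the same values as a packed NumPy array avoids the per-element overhead.

    import random
    import time
    import numpy as np

    values = [random.random() for _ in range(1_000_000)]
    arr = np.array(values)

    t0 = time.perf_counter()
    sorted_list = sorted(values)     # C sort over Python float objects
    t1 = time.perf_counter()
    sorted_arr = np.sort(arr)        # C sort over a contiguous double buffer
    t2 = time.perf_counter()

    print(f"sorted(): {t1 - t0:.2f}s  np.sort(): {t2 - t1:.2f}s")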


Optimizing at the algorithmic and architectural level rather than relying on language speed


"Depending on future emissions, the IPCC now projects an average sea-level rise of half a meter to 1 meter by 2100"


Someday they will realize that we are still in an ice age.


We've been in an ice age for millions of years (by the standard definition of an ice age: there are persistent ice sheets at the poles, i.e. on Greenland and Antarctica).

Within ice ages, there are glacials and interglacials. Glacial periods are what normal people think of when they say "ice age"; like when the ice sheets covered Canada and extended down to Chicago/NYC areas.

Without manmade climate change, we'd be slowly sliding back to another glacial period. So in that sense, the warming is welcome, but we're overdoing it and shooting too hard in the opposite direction.


A life well lived.


That's why I am never afraid to walk alone at night, in the dark forest.


Must not live in the Sundarbans


Mission success, apparently. The next flight (in 2026) will launch the next generation of Starship.


For quick and dirty Python benchmark, try https://github.com/DarkStar1982/fast_langton_ant/

Run as "python3 server.py -s 10000000 -n"


How is GitLab doing?


A 737 with fly-by-wire avionics would be what the 737 MAX should have been.


No, they need a different landing gear to make room for bigger engines. Larger diameter engines are needed for better fuel efficiency. Fly by wire is nice, but fuel economy is more important.


I don't think it would have sold - it would have been behind the Airbus neos on fuel efficiency. Hence the janky MCAS solution to make the re-engine work.


Cancelled?

