They were not cynical kissing up to previous holders of power; they were desperate attempts to cover their asses against EEOC lawsuits. And they didn't end because of Trump's second victory; they ended because the Supreme Court defanged the EEOC (and half of the other federal regulatory agencies).
The actual reason for "corporate DEI" in tech was that since Griggs v. Duke Power Co. (1971), the EEOC could sue companies whose minority proportion was lower than the population norm for discrimination. It could prove the discrimination in court using nothing but the racial makeup of the employees plus some company policy that could, in theory, have a disparate impact. And under their standards, literally any policy has a disparate impact.
This hit other sectors first, and they responded by hiring more minorities. But tech had a problem: schools were consistently producing fewer minority engineering grads than the population proportion. In a world where approximately every engineer got employed, some US tech companies would have to have lower minority representation than the population no matter what they did. And because the gap between the racial makeup of engineering grads and that of the population was so large, most large companies would in fact fail to meet the necessary minority proportions.
But the EEOC would not instantly file suit against every offender; instead, it would file ~40 such suits per year, targeting large companies it considered particularly bad. So companies that felt they might get hit soon started DEI programs. At first these aimed to attract more minority engineers (from other companies in the same sector, which would then fall under the limit, making it zero-sum). But then they realized that the EEOC didn't really sue the companies that were loudest about touting their DEI credentials, and it all became extremely performative. The goal was no longer to attract minority talent but to be the loudest company talking about the subject. Iterate over that for a few decades and it got really weird.
It ended because Trump named 3 SC justices in his first term, and in a few important cases between 2023 and today, the new SC tore the whole thing down, and suing a company for disparate impact is now considered unconstitutional.
Every step taken by the nonprofit leadership has to be (or at least seem to be at the time) net positive for the stated goal of the nonprofit. To be legal, the IPO needs to be a net gain for the nonprofit.
It can easily be that, if they believe the capital it raises will grow the long-term value of the company by a larger factor than the one by which the nonprofit's stake shrinks when part of the company goes to outside investors.
The primary example of this is Novo Nordisk (the Ozempic company). Its largest shareholder is, through an intermediary, the Novo Nordisk Foundation, which is one of the largest charities in the world. Nordisk used to be a charity that owned 100% of its own labs and facilities. But in 1989 they realized that they were just too small and would get trampled by larger international players unless they greatly increased their scope. So they made their subsidiary go public (through a complex merger, not an IPO), and now own only 28% of it instead of 100%. But in large part because of the capital that going public brought them, and despite constantly distributing money for research and charity, that's 28% of a company that's more than 100x bigger than what they used to be. And they retained 77% voting control.
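The break-even condition is simple arithmetic: the fraction the nonprofit keeps, times the factor by which the company grows, has to exceed 1. Below is a minimal Python sketch using the 28% and ~100x figures from the Novo Nordisk example. The function names are illustrative, and the model deliberately ignores the money the foundation distributes along the way.

```python
def retained_value_multiple(retained_fraction: float, growth_multiple: float) -> float:
    """Value of the nonprofit's remaining stake, relative to owning 100% of the old company."""
    return retained_fraction * growth_multiple


def break_even_growth(retained_fraction: float) -> float:
    """Minimum growth factor at which giving up (1 - retained_fraction) of the company pays off."""
    return 1.0 / retained_fraction


# Figures from the comment: keep 28% of a company that grew ~100x.
print(retained_value_multiple(0.28, 100))  # roughly 28x what the foundation started with
print(break_even_growth(0.28))             # the company only needed to grow ~3.6x to break even
```

Anything above the break-even factor is a net gain for the nonprofit's stake, which is the legal test described above.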
We are very, very far from thermodynamic limits. Lots of people have done the math: current-gen systems use ~1,000,000,000x (10^9) more energy per operation than the Landauer limit, and ~100,000x (10^5) more than an ideal digital implementation on existing CMOS.
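For scale, the Landauer limit is k_B·T·ln 2 joules per irreversibly erased bit. Below is a minimal Python sketch of that number at an assumed room temperature of 300 K. It also shows what the ~10^9x ratio quoted above would imply per bit operation; the ratio is the comment's figure, not something the code derives.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact by the 2019 SI definition


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy needed to erase one bit of information at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


e_min = landauer_limit_joules(300.0)  # about 2.87e-21 J per bit at an assumed 300 K
implied_current = e_min * 1e9         # applying the ~1e9x ratio quoted in the comment
print(f"Landauer limit at 300 K: {e_min:.3e} J/bit")
print(f"Implied current cost:    {implied_current:.3e} J/bit op")
```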
Currently, most AI systems are built with a large pool of memory on one side, compute on the other, and a very fat pipe between them. 90%+ of all energy goes into moving data from one side to the other and selecting the specific element you wish to use from the large pool of RAM. The energy cost of holding that data in memory and reading it out of the memory cells, and the energy cost of doing the actual computation in low-precision FP, are both trivial in comparison.
The systems are built this way because it is the most flexible architecture and can be used for many different kinds of workloads. But the workload of a transformer in no way requires this flexibility: all the data is fairly local to the execution units that consume it. Suppose you design a system as full PIM (processing-in-memory), where each ALU is co-located with a small storage pool that contains only the elements that ALU uses. Then tile that out to implement the full model, and you cut out most of the energy cost of moving data. The cost is that you need much more silicon to implement a working system. But the benefits are not just improved energy efficiency but also token speed and silicon efficiency.
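Below is a toy model of the tiled layout described above, as a minimal Python sketch. It splits a matrix-vector product (the core operation of transformer inference) row-wise across hypothetical tiles, each storing only the weight rows its own ALU consumes. It then counts words crossing tile boundaries against words crossing a single shared memory-to-compute pipe. It models data movement only, not energy, and `Tile`, `build_tiles`, and the counting rules are illustrative assumptions rather than any real chip's design.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Hypothetical PIM tile: one ALU plus the local weight rows only it uses."""
    rows: list[list[float]]

    def matvec(self, x: list[float]) -> list[float]:
        # Weights are read from local storage; only x arrives from outside the tile.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.rows]


def build_tiles(weights: list[list[float]], rows_per_tile: int) -> list[Tile]:
    """Partition the weight matrix row-wise so each tile owns a disjoint slice."""
    return [Tile(weights[i:i + rows_per_tile])
            for i in range(0, len(weights), rows_per_tile)]


def tiled_matvec(tiles: list[Tile], x: list[float]) -> tuple[list[float], int]:
    """Compute y = W @ x on the tiles and count words moved into and out of them."""
    y: list[float] = []
    words_moved = 0
    for tile in tiles:
        words_moved += len(x)      # activations broadcast into the tile
        part = tile.matvec(x)
        words_moved += len(part)   # this tile's outputs leave it
        y.extend(part)
    return y, words_moved


def shared_pool_words_moved(weights: list[list[float]], x: list[float]) -> int:
    """Words crossing the pipe when all weights sit in one big memory pool."""
    weight_words = sum(len(row) for row in weights)  # every weight crosses, every token
    return weight_words + len(x) + len(weights)      # plus inputs in and outputs out


# Usage: a 64x64 layer split into 8 tiles of 8 rows each.
W = [[1.0] * 64 for _ in range(64)]
x = [0.5] * 64
y, tiled_words = tiled_matvec(build_tiles(W, 8), x)
print(tiled_words, shared_pool_words_moved(W, x))  # prints: 576 4224
```

The broadcast term still grows with the number of tiles, so this is only a rough picture. The point is that weight traffic, the dominant term in the shared-pool design, disappears.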
The industry is moving towards such designs, with many startups working towards them with different approaches, Nvidia's recent acquisition* of Groq, etc. There is a well-understood path towards ~1000x higher token speeds at ~1000x better energy efficiency that requires no new innovations, just investment of money into specialization.
There are even more gains if you move the weights into ROM, but that would require you to specialize not just for a specific type of model but also for a specific set of model weights, à la Taalas.
I find the AI discourse diseased. On one side you get people breathlessly overestimating the current state of the industry and the progress that's going to happen in the next ~2 years. On the other side, people assume that the technology as it is now is what it will always be. They completely ignore that the industry is aware of, and actively working towards, many ways to improve hardware. It's just that complex leading-edge chips take years to go from idea to working product, and transformer inference was only very recently proven to be a market large enough to specialize for.
Also in 2021, Nauka (the cursed Russian module) arrived at the station and accidentally fired its thrusters while docked. It fought the station's attitude control, flipped the station around by about 540 degrees, and put a lot of stress on structural parts that weren't designed for it.
It was surreal to follow as it was happening. NASA was seriously underplaying what was going on, and it was up to amateurs with telescopes watching the station to tell the world that the situation was still ongoing.
This is one of the three major mishaps related to Nauka.
Typical desktop GPU RAM does not support being write-back cached by the CPU. With PCIe resizable BAR, you could map that memory into the CPU's address space, so you could technically map all 32GB. But it would have to be uncached (or write-combining), which would make it really, really slow.
There are a bunch of datacenter GPUs that support full cache coherency, but if you used them like that, the VRAM would have very high latency from the CPU's point of view. So it would only be really slow.
I don't think it would help. It's not just a software issue that can be fixed in the kernel, the hardware fundamentally isn't part of the cache coherency system of the CPU.
I get the complaints in that thread, but I still think it is hilarious. That repo is a gong show of random shit and perhaps one of the best worst examples of "opensource" LLM development.
If the latter is 10x faster, the issue is some kind of weird compilation failure for the above version. For one, it only cuts a third of the multiplies.
FP division by a constant is optimized by the compiler into a multiply. Graphics processing typically happens on the GPU these days, and on all recent GPUs FPMUL belongs to the class of lowest-latency operations; that is, no other instructions complete faster.
Only with things like -ffast-math enabled will compilers replace the division with a multiplication by the reciprocal.
It can make a fair difference in some cases, but it's often better to apply it selectively. Do it manually in the code locations where you know the rounding difference is acceptable.
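Compilers are cautious here because x / c and x * (1 / c) can round differently whenever 1 / c is not exactly representable. Below is a minimal Python sketch that finds such numerators and shows the manual, hoisted-reciprocal version described above. Python floats are IEEE 754 binary64, and Python performs no fast-math rewrites, so each expression is evaluated as written. Both function names are illustrative.

```python
def reciprocal_mismatches(c: float, numerators: list[float]) -> list[float]:
    """Numerators x where x / c and x * (1.0 / c) give different binary64 results."""
    inv_c = 1.0 / c  # hoisted once, as a fast-math compiler or a programmer would
    return [x for x in numerators if x / c != x * inv_c]


def scale_all(values: list[float], c: float) -> list[float]:
    """Manual opt-in: divide by c via a reciprocal multiply, accepting small rounding differences."""
    inv_c = 1.0 / c
    return [v * inv_c for v in values]


sample = [float(i) for i in range(1, 101)]
print(reciprocal_mismatches(10.0, sample)[:5])  # nonempty, since 1/10 is inexact in binary
print(reciprocal_mismatches(4.0, sample))       # [] since 1/4 is a power of two
```

For example, 3.0 / 10.0 gives 0.3 while 3.0 * 0.1 gives 0.30000000000000004. A compiler has to preserve that difference unless told it may ignore it.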
That's not totally true. It's sufficient for the reciprocal to be exactly representable, but not necessary. You only need the reciprocal's rounding error to be small enough that the multiplication's rounding step is guaranteed to fix it across the entire range of numerators. For IEEE 754 f16 values, there are 28 such extra values: the positive and negative sides of 1705/x, where x is a power of 2 at least as great as 2048.
Maybe for f16. The compiler's implementation could just check all numerators to see if the transformation is safe. The corner cases are messy, though, and not quickly brute-forceable for f32, or brute-forceable at all for f64. So I doubt they'd bother, especially when I bet those constants have shown up literally zero times across all programs.
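For f16, the exhaustive check mentioned above is small enough to run even in an interpreter. Below is a minimal Python sketch that uses only the standard library. `struct`'s `'e'` format converts between Python floats and IEEE 754 binary16 with round-half-to-even. Binary64 carries more than 2·11+2 significand bits, so computing each quotient or product in binary64 and rounding once to binary16 gives the correctly rounded f16 result. `reciprocal_is_safe` is a hypothetical helper, not any compiler's actual implementation.

```python
import math
import struct


def to_half(v: float) -> float:
    """Round a binary64 value to the nearest binary16 value, ties to even.

    struct's 'e' format raises OverflowError when the rounded value is too
    large for binary16; under round-to-nearest the answer is then +/-inf.
    """
    try:
        return struct.unpack('<e', struct.pack('<e', v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)


def decode_half(bits: int) -> float:
    """Interpret a 16-bit pattern as binary16 and return it as an exact Python float."""
    return struct.unpack('<e', bits.to_bytes(2, 'little'))[0]


# All 63488 finite binary16 values (both zeros, subnormals, normals).
FINITE_HALVES = [v for v in map(decode_half, range(0x10000)) if math.isfinite(v)]


def reciprocal_is_safe(c: float) -> bool:
    """True if x / c == x * (1 / c) in binary16 for every finite binary16 x.

    c must be a nonzero binary16 value. Each operation is done in binary64 and
    rounded once to binary16, which matches a correctly rounded f16 operation.
    """
    r = to_half(1.0 / c)  # the reciprocal as an f16 constant
    for x in FINITE_HALVES:
        if to_half(x / c) != to_half(x * r):
            return False  # this numerator would expose the rewrite
    return True


print(reciprocal_is_safe(2.0))  # True: 1/2 is exact
print(reciprocal_is_safe(3.0))  # False: e.g. x = 5 rounds differently
```

A claim like the 1705/x one above could be spot-checked with `reciprocal_is_safe(1705 / 2048)`. Scanning every divisor this way means roughly four billion (divisor, numerator) pairs, which is slow in pure Python. The f32 equivalent would be 2^32 numerators per divisor.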
> so you can limit yourself to trusted packages, e.g. packages vouched by your preferred set of root signers who publish compilations of trusted publisher keys.
This is really not good enough. The real, gigantic problem with supply chain risk is not that a bad actor tricks you into using a package. It is that when everyone uses a gazillion packages by known-good authors, every one of those authors, with upload rights to their own packages, becomes an exploitable vulnerability for all the software that depends on their libraries. So far, this has mostly looked like someone stealing credentials and sneakily uploading compromised versions. If the situation persists, it will eventually get worse, with organized crime targeting devs and using rubber-hose cryptanalysis on them. There is too much value hanging on too wobbly a foundation here; the situation is not stable, and it needs to change.
The C++ standard library is terrible because it was designed from nothing with no actual real-world testing. IMHO the best path forward for Rust is that when crates become established and "complete", with little further development needed, they should eventually be merged into some large conglomerated library with an actual organization behind it. This doesn't necessarily need to be the standard library that ships with the language. Yes, this would end up like Python, where eventually there would be 4 HTTP clients in there with 3 of them deprecated, but that would still be better than the present state of affairs.
The C++ std lib is no longer terrible. It is really at a usable level these days. Not fun, but totally bearable. The motivation for C++ has never been the quality of the language or the std lib anyways, so it can happily chug along in many places (including the browser I am typing this on).
I disagree about merging existing "done" libraries into a mega library. That can work to some extent, but that approach will not produce something lasting (in the sense that it will remain without the need for changes for a long time). The way to achieve a lasting mega library is to put in all the pieces you need and constantly work to increase the consistency between them. Somewhat like turning a long-winded ramble into a much denser article.
Going from a set of good, working libraries to a large CONSISTENT library would take a substantial amount of labor. Hence the need for someone with deep pockets to take it on. (There are other ways for that to happen, but those are rarer.)
It came pretty close to happening in Rust 2024, but it was determined that there just wasn't enough time left before the end-of-year deadline to roll out such a big change.
When the new edition rolls around that swaps to the new type, I expect a bunch of libraries are going to get really annoying to use. And vice-versa for new libraries with the old editions.
Both types have the `From` conversion traits implemented between each other, so in most cases interoperating with APIs that use the old type should be as simple as doing `(1..4).into()`. And, probably because of the warts of the old types, I haven't seen them used much in APIs, so I don't think even that will be needed very often.
Unlikely, since library interfaces need to use a trait to accept all of the open/closed, inclusive/exclusive syntax variations. If they only accept one specific named range type, they're clunky already.