
If you like this, it's also worth looking at llama2.c[1], an implementation of the Llama 2 architecture in about 1000 lines of plain, dependency-free C, tokenizer and all. The fact that this 960-line file and a somewhat modern C compiler are all you really need to run a state-of-the-art language model surprises many people.

Of course, this is not all there is to a modern LLM: it would probably take another thousand lines or two to implement training, and many more than that to make it fast on all the major CPU and GPU architectures. If you want a flexible framework that lets developers define any model they want and still runs as fast as possible, the complexity spirals.
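
For a sense of scale, the core of a training step is itself small; the extra thousand lines are mostly batching, data loading, optimizer state and checkpointing. Here's a minimal sketch, assuming a toy bigram language model in NumPy (not the llama2.c code itself, just an illustration of the forward/backward/update loop):

    # One training step for a toy bigram language model (illustrative sketch, made-up setup).
    import numpy as np

    vocab_size = 32
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(vocab_size, vocab_size))  # logits for "next token given current token"

    def train_step(W, inputs, targets, lr=0.1):
        # Forward: one row of logits per input token, softmax, mean cross-entropy loss.
        logits = W[inputs]                                   # (batch, vocab)
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(len(targets)), targets]).mean()

        # Backward: d(loss)/d(logits) = (probs - one_hot) / batch, scattered back into W's rows.
        grad_logits = probs
        grad_logits[np.arange(len(targets)), targets] -= 1.0
        grad_logits /= len(targets)
        grad_W = np.zeros_like(W)
        np.add.at(grad_W, inputs, grad_logits)

        # SGD update.
        W -= lr * grad_W
        return loss

    tokens = rng.integers(0, vocab_size, size=1000)
    for step in range(100):
        idx = rng.integers(0, len(tokens) - 1, size=64)
        loss = train_step(W, tokens[idx], tokens[idx + 1])
    print("final loss:", loss)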

Most programmers have an intuition that duplicating a large software project from scratch, like Linux or Chromium for example, would require incredible amounts of expertise, manpower and time. It's not something that a small team can achieve in a few months. You're limited by talent, not hardware.

LLMs are very different. The code isn't that complicated: you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance, as an individual with a background in programming who still remembers their calculus and linear algebra, with a year or so of self-study. What makes LLMs difficult is getting access to all the hardware to train them, getting the data, and being able to preprocess that data.



One other thing to add is large-scale RLHF. Big Tech can pay literally hundreds of technically-sophisticated people throughout the world (e.g. college grads in developing countries) to improve LLM performance on all sorts of specific problems. It is not a viable way to get AGI, but it means your LLM can learn tons of useful tricks that real people might want, and helps avoid embarrassing "mix broken glass into your baby formula" mistakes. (Obviously it is not foolproof.)

I suspect GPT-4's "secret sauce" for edging out competitors is that OpenAI is better at managing data contractors than the other folks. Of course, the specifics are buried under a haze of NDAs, and clearly the contractors are severely underpaid compared to OpenAI employees and executives. But a lone genius with a platinum credit card can't create a new world-class LLM without help from others.


Yes, this is the secret sauce and the moat. Not as easy as buying more compute with unlimited budget.

… built on the back of a disposable workforce…

There is something grim and dystopian, thinking about the countless small hands feeding the machine.


>There is something grim and dystopian, thinking about the countless small hands feeding the machine.

Dystopian indeed. This is pretty much how the Manhattan Project and CERN were done, with many independent contractors working on different parts and only a few people having the overview. It's a page out of the corporate management book, and it very much allows concentration of power in the hands of a few.


Since when is CERN a dystopian project?


Big Government Socialism won't let you build your own 27km-circumference particle accelerator. Bureaucrats make you fill out "permits" and "I-9s" for the construction workers instead of letting you hire undocumented day laborers.

I am wondering if "CERN was pushed on the masses by the few" is an oblique reference to public fears that the LHC would destroy the world.


Very generous to compare to Manhattan Project or CERN.


I don't buy into the hype, but when Facebook has spent around as much on GPUs as the Manhattan Project (though not the Apollo program), the comparison kinda makes itself.

https://twitter.com/emollick/status/1786213463456448900

$22 in 2008 -> $33 today https://data.bls.gov/cgi-bin/cpicalc.pl?cost1=22&year1=20080...
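
The adjustment is just a ratio of consumer price indices; a rough sketch with approximate annual-average CPI-U values (the BLS calculator above uses exact monthly figures, so it lands slightly higher):

    # Rough CPI inflation adjustment; index values are approximate annual averages.
    cpi_2008 = 215.3   # CPI-U, 2008 (approximate)
    cpi_2024 = 313.7   # CPI-U, 2024 (approximate)
    cost_2008 = 22.0   # the figure quoted above, presumably billions of dollars
    print(round(cost_2008 * cpi_2024 / cpi_2008, 1))  # ~32, in the ballpark of the quoted $33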


The Big Dig (Boston highway overhaul) cost $22bn in 2024 dollars. The Three Gorges dam cost $31bn. These are expensive infrastructure projects (including the infrastructure for data centers). It doesn't say anything about how important they are for society.

Comparing LLMs to the Manhattan Project based on budget alone is stupid and arrogant. The comparison only "makes itself" because Ethan Mollick is a childish and unscientific person.


I read this last week and it's terrifying. If the world lets Facebook become an AI leader, it's on us, as we all know how that story will play out.


We must summon a fellowship of the AI ring with one hobbit capable of withstanding the corrupting allure of it all.


Don't torment the hobbits! Send the eagles right away!


>Comparing LLMs to the Manhattan Project based on budget alone is stupid and arrogant

Just want to clarify: the comparison to the Manhattan Project or CERN is referencing "the countless small hands feeding the machine." In projects such as these, roles and jobs are divided into small parts, so the people working on them don't really see the forest for the trees, and only a few have a picture of the whole project.


The big difference is that CERN and the Manhattan Project were done by local contractors, often with more than decent wages, which isn't the case when you pay people from Madagascar a couple of dollars a day.


Maybe it's the only way. Companies that don't have that concentrated power will probably fall apart.


Hard to defend because once your model is out there other companies can train on its output.


Yes, but the output is only one third of the data. You also need the input and the annotations.


OpenAI is heavily relying on Scale AI for training data (contractors).


And if you want to understand how it works, I'd recommend this post (GPT-2 in 60 lines of NumPy) and the post on attention it links to. The concepts are mostly identical to Llama, just with a few minor architectural tweaks. https://jaykmody.com/blog/gpt-from-scratch/
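
If you just want the gist of the attention part before diving into those posts, the core computation is only a few lines. A minimal sketch of single-head causal scaled dot-product attention in NumPy (written for this comment, not copied from the linked post):

    # Single-head causal self-attention (illustrative sketch).
    import numpy as np

    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    def causal_self_attention(x, Wq, Wk, Wv):
        # x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])               # (seq_len, seq_len)
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores[mask] = -1e9                                   # each position only sees the past
        return softmax(scores) @ V                            # (seq_len, d_head)

    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 8, 16, 16
    x = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(causal_self_attention(x, Wq, Wk, Wv).shape)         # (8, 16)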


Thanks for sharing this!


> Most programmers have an intuition that duplicating a large software project from scratch, like Linux or Chromium for example, would require incredible amounts of expertise, manpower and time. It's not something that a small team can achieve in a few months. You're limited by talent, not hardware.

But only because of accumulated breadth and compatibility, not essential complexity. Linux runs on very nearly every piece of hardware ever made. The APIs you have to implement in order to run "Linux programs" are large and full of old complexity that exists for compatibility. Chromium is full of code that tries to make pages render even though they were designed for Internet Explorer 6.

Conversely, some university programs have students create a basic operating system from scratch. It's definitely something a small team can do as long as you don't care about broad hardware support or compatibility with existing applications. In principle a basic web browser is even simpler.



There's also a project where they have GPT-2 running off of an Excel spreadsheet.

https://arstechnica.com/information-technology/2024/03/once-...


I recommend reading https://github.com/bkitano/llama-from-scratch over the article the OP linked.

It actually teaches you how to build Llama iteratively, testing, debugging, and interpreting the training loss, rather than just describing the code.


> The code isn't that complicated: you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance, as an individual with a background in programming who still remembers their calculus and linear algebra, with a year or so of self-study.

Great overview. One gap I've been working on (daily) since October is the math, working towards Math Academy's Mathematics for Machine Learning course (https://mathacademy.com/courses/mathematics-for-machine-lear...).

I wrote about my progress (http://gmays.com/math) if anyone else is interested in a similar path. I recently crossed 200 days of doing math daily (at least a lesson a day). It's definitely taking longer than I want, but I also have limited time (young kids + startup + investing).

The 'year of self study' definitely depends on where you're starting from and how much time you have, but it's very doable if you can dedicate an hour or two a day.


The code is, in principle, much more like a virtual machine. The actual program, the part that contains the logic with the semantics we intend, is in the trained weights, where the level of complexity is much higher and more subtle.


> you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance… with a year or so

I have implemented inference for the Whisper https://github.com/Const-me/Whisper and Mistral https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... models on all GPUs that support the Direct3D 11.0 API. The performance is IMO very reasonable.

A year might be required when the only input is the research articles. In practice, we also have reference Python implementations of these models. It's possible to test individual functions or compute shaders against the corresponding pieces of the reference implementation by comparing saved output tensors between the reference and the newly built implementation. Thanks to that simple trick, I think I spent less than a month of part-time work on each of these two projects.
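
In sketch form, the trick amounts to something like this (not the actual harness; file names and tolerances are made up): run the same input through both implementations, save the intermediate tensors, and compare them within a tolerance.

    # Compare a newly built implementation against a reference, tensor by tensor (sketch).
    import numpy as np

    def compare_tensors(reference_path, candidate_path, atol=1e-3, rtol=1e-3):
        ref = np.load(reference_path)    # saved by the reference Python implementation
        out = np.load(candidate_path)    # saved by the new implementation (e.g. a compute shader)
        assert ref.shape == out.shape, f"shape mismatch: {ref.shape} vs {out.shape}"
        max_abs_diff = np.abs(ref - out).max()
        ok = np.allclose(ref, out, atol=atol, rtol=rtol)
        print(f"{candidate_path}: max abs diff {max_abs_diff:.2e}, {'OK' if ok else 'MISMATCH'}")
        return ok

    # e.g. compare_tensors("reference/layer3_attn_out.npy", "candidate/layer3_attn_out.npy")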


I'd say a year for somebody who doesn't know what a linear layer is and couldn't explain why a GPU might be of any use if you're not playing games, but who knows what the derivative of 3x^2 is.


> What makes LLMs difficult is getting access to all the hardware to train them, getting the data, and being able to preprocess that data.

Yes, that's my opinion too. GAOs (Grassroots AI Organisations) are constrained by access to data and the hardware needed to process the data and train the model on it. I look forward to a future where GAOs will crowdsource their computations in the same way many science labs borrow computing power from people around the world.


This is hard because you need high bandwidth between the GPUs in your cluster, far higher than broadband can provide. I'm not even sure the increase in computational power would outweigh the time spent synchronizing between far-away machines.
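
A back-of-the-envelope sketch of the problem (numbers are assumptions, not measurements): naive data-parallel training exchanges on the order of the full gradient every step, which over home broadband takes orders of magnitude longer than the step itself.

    # Rough cost of synchronizing gradients for a 7B-parameter model (assumed numbers).
    params = 7e9                     # model parameters
    grad_bytes = params * 2          # fp16/bf16 gradients: ~14 GB exchanged per step (naive data parallelism)

    datacenter_bw = 400e9 / 8        # ~400 Gbit/s cluster interconnect, in bytes/s
    broadband_bw = 100e6 / 8         # ~100 Mbit/s home uplink, in bytes/s

    print(f"gradient traffic per step: {grad_bytes / 1e9:.0f} GB")
    print(f"datacenter: {grad_bytes / datacenter_bw:.1f} s   broadband: {grad_bytes / broadband_bw / 60:.0f} min")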


I feel like this ignores the complexity of the distributed training frameworks. The challenge is in making it fast at scale.


>" THe fact that this 960-line file and a somewhat modern C compiler is all you really need to run a state-of-the-art language model is really surprising to many."

"the code for AGI will be simple" - John Deremetrius Carmack


> The code isn't that complicated.

This is an indication that we're in the infancy of this field.



