It is deeply offensive to the serious players of the game to suggest Mornington Crescent is "made up". Yes, to neophytes it can seem random and unstructured, but it is preposterous to suggest a game with such a lineage is fictional.
The Mornington Crescent Players Association (MCPA, often lovingly referred to as The Scottish Father) unanimously voted through the Flodden amendments last year. The Mornington Crescent Rules Committee (not to be confused with the Rules Committee of Mornington Crescent) will be voting on the topic on December 25th, whereupon it will be passed to the International Board for ratification.
The only controversial point is that it will be applied retroactively over the last decade, changing the results of no fewer than three world championship matches.
> There are decisions you didn't realize you needed to make, until you get there.
That is the key insight, and the biggest stumbling block for me at the moment.
At the moment (encouraged by my company) I'm experimenting with agent usage for coding that is as hands-off as possible. And it is _unbelievably_ frustrating to see the agent get 99% of the code right in the first pass, only to misunderstand why a test is now failing and then completely mangle both its own code and the existing tests as it tries to "fix" the "problem". If I'd just given it a better spec to start with, it probably wouldn't have started producing garbage.
But I didn't know that before working with the code! So to develop a good spec I either have to have the agent stop constantly so I can intervene, or dive into the code myself to begin with. And at that point I may as well write the code anyway, since writing the code is not the slow bit.
And my process now (and what we're baking into the product) is:
- Make a prompt
- Run it in a loop over N files. Full agentic toolkit, but don't be wasteful (no "full typecheck, run the test suite" on every file).
- Have an agent check the output. Look for repeated exploration, look for failures. Those imply confusion.
- Iterate the prompt to remove the confusion.
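The loop above can be sketched roughly like this. Everything here is a hypothetical stand-in: `run_agent`, `review_transcript`, and `refine_prompt` are stubs for whatever agent harness and reviewer you actually use, and the "confusion" signal is simulated rather than real.

```python
# Minimal runnable sketch of the prompt-iteration loop described above.
# run_agent / review_transcript / refine_prompt are hypothetical stubs;
# here they simulate "confusion goes away once the prompt covers the
# case the agent kept tripping over".

def run_agent(prompt: str, path: str) -> dict:
    # Stub agent run: "fails" on files the prompt doesn't yet cover.
    return {"path": path, "failed": "edge case" not in prompt}

def review_transcript(transcript: dict) -> dict:
    # Stub reviewer agent: flag failures / repeated exploration as confusion.
    return {"path": transcript["path"], "confused": transcript["failed"]}

def refine_prompt(prompt: str, confused: list[dict]) -> str:
    # Stub: fold whatever confused the agent back into the prompt.
    paths = ", ".join(c["path"] for c in confused)
    return prompt + f" Handle the edge case seen in: {paths}."

def iterate_prompt(prompt: str, files: list[str], rounds: int = 5) -> str:
    for _ in range(rounds):
        # One cheap agent run per file (no full typecheck/test suite each time).
        reviews = [review_transcript(run_agent(prompt, f)) for f in files]
        confused = [r for r in reviews if r["confused"]]
        if not confused:
            break  # no failures or repeated exploration left
        prompt = refine_prompt(prompt, confused)
    return prompt

final = iterate_prompt("Migrate this file to Vue 3.", ["a.vue", "b.vue"])
print(final)
```

The point of the sketch is only the shape of the loop: cheap per-file runs, a second agent scoring the transcripts for confusion, and the prompt absorbing whatever confused the first agent.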
First pass on the current project (a Vue 3 migration) went from 45 min of agentic time on 5 files to 10 min on 50 files, and the latter passed tests/typecheck/my own scrolling through it.
It was not a CPU hog; that's a myth that needs to die. The Flash runtime itself was pretty modest.
Now, the code people wrote in it was a CPU hog, because lots of non-coders were writing code and they would do anything to make it work. The Flash runtime was not what caused Punch the Monkey to peg your CPU; the Punch the Monkey ad was fucking awful code.
All those Flash programmers went on to write the first wave of HTML5 stuff which, shock horror, was vastly CPU-inefficient.
Yeah, that's what I'm trying to explain (maybe unsuccessfully). I do know backprop; I studied and used it back in the early 00s when it was very much not cool. But I don't think that knowledge is especially useful for using LLMs.
We don't even have a complete explanation of how we get from backprop to the emergent abilities we use and love, so who cares (for that purpose) how backprop works? It's not as if we're actually using it to explain anything.
As I say in another comment, I often give talks to laypeople about LLMs and the mental model I present is something like supercharged Markov chain + massive training data + continuous vocabulary space + instruction tuning/RLHF. I think that provides the right abstraction level to reason about what LLMs can do and what their limitations are. It's irrelevant how the supercharged Markov chain works, in fact it's plausible that in the future one could replace backprop with some other learning algorithm and LLMs could still work in essentially the same way.
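To make the "supercharged Markov chain" part of that mental model concrete, here is a toy word-level Markov chain. This is an illustrative sketch only: real LLMs condition on thousands of tokens through learned weights in a continuous vocabulary space, not a bigram lookup table, which is exactly the gap the "supercharged" qualifier hides.

```python
# Toy bigram Markov chain text generator: the un-supercharged baseline
# of the mental model above. The corpus and function names are made up
# for illustration.
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the rat".split()

# Transition table: word -> list of words observed to follow it
# (duplicates preserved, so sampling reflects observed frequencies).
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start: str, length: int = 8, seed: int = 0) -> str:
    """Sample a word sequence by repeatedly picking an observed successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:  # dead end: word never appeared mid-corpus
            break
        out.append(rng.choice(choices))
    return " ".join(out)

print(generate("the"))
```

The useful intuition the toy version carries over: the model only ever produces continuations that are plausible given its training data, and it has no notion of truth, just of what tends to follow what.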
In line with your first paragraph: probably many teens who had a lot of time on their hands when Bing Chat was released, and enough critical spirit not to get misled by the VS, have better intuition about what an LLM can do than many ML experts.
I disagree in the case of LLMs, because they really are an accidental side effect of another tool. Not understanding the inner workings will make users attribute false properties to them. Once you understand how they work (how they generate plausible text), you get a far deeper grasp on their capabilities and how to tweak and prompt them.
And in fact this is true of any tool: you don't have to know exactly how to build one, but any craftsman has a good understanding of how the tool works internally. LLMs are not a screw or a pen; they are more akin to an engine, and you have to know an engine's subtleties if you build a car. Even screws have to be understood structurally in advanced usage. Not needing to understand the tool is maybe true only for hobbyists.
Could you provide an example of an advanced prompt technique or approach that one would be much more likely to employ if they had knowledge of X internal working?
None of this is about an end user in the sense of the user of an LLM. It is aimed at the prospective user of a training framework that implements backpropagation at a high level of abstraction. As such, it draws attention to training problems which arise inside the black box, in order to motivate learning what is inside that box. There aren't any ML engineers who shouldn't know all about single-layer perceptrons, I think, and that makes for a nice analogy to real-life issues in using SGD and backpropagation for ML training.
The post I was replying to was about "colleagues, who are extremely invested in capabilities of LLMs", and then mentions how they are uninterested in how LLMs work and interested only in what they can do and in the societal implications.
It sounds to me very much like end users, not people who are training LLMs.
The analogy is: if you don't understand the limitations of the tool, you may try to make it do something it is bad at, and never understand why it will never do the thing you want, despite looking like it potentially could.
The Bitter Lesson is that, with enough VC-subsidised compute, those things are useful.