I understand the sadness around not understanding it - it's fucking hard. however, there are more and more resources being published on how to get started, ones that teach the math at a level of abstraction that actually helps you learn how to build with it.
I would strongly strongly strongly recommend starting with karpathy's Neural Networks: Zero to Hero youtube course - it starts with building a tiny autograd engine and backpropagation from scratch, explaining it in a way that finally clicked for me https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...
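to give a taste of what that course builds up, here's a rough sketch of the core idea in plain python (illustrative only, not karpathy's actual code): a tiny value object that remembers the operations applied to it and can backpropagate gradients through them.

    # minimal scalar autograd sketch, in the spirit of karpathy's micrograd
    class Value:
        def __init__(self, data, children=()):
            self.data = data
            self.grad = 0.0
            self._children = children
            self._backward = lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad        # d(a+b)/da = 1
                other.grad += out.grad       # d(a+b)/db = 1
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad   # d(a*b)/da = b
                other.grad += self.data * out.grad   # d(a*b)/db = a
            out._backward = _backward
            return out

        def backward(self):
            # topologically sort the graph, then apply the chain rule in reverse
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for c in v._children:
                        visit(c)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    # y = a*b + a, so dy/da = b + 1 = 4 and dy/db = a = 2
    a, b = Value(2.0), Value(3.0)
    y = a * b + a
    y.backward()
    print(a.grad, b.grad)  # 4.0 2.0

once something like that clicks, the rest of the course (building up to a small GPT) is mostly stacking more of the same.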
as i've dug deeper into it, i would strongly recommend trying to run things on your local machine too (llama.cpp, ollama, LM studio). that has helped me fight the feeling of "are we all just going to be OpenAI developers in the end" and made me feel like you _can_ integrate these things into stuff you build yourself. I can't imagine how fucked we'd all be if llama had never been open-sourced. being old does not mean you can't continue to grow, and remember that it's okay to feel overwhelmed by all this - many people are.
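if you go the ollama route, poking at a local model from your own code is genuinely a few lines. rough sketch below, assuming ollama is running locally and you've already pulled a model - the model name here is just an example:

    # minimal sketch: ask a locally-running ollama server for a completion
    # assumes `ollama serve` is up and a model has been pulled (e.g. `ollama pull llama3`)
    import json
    import urllib.request

    payload = {
        "model": "llama3",   # example name - use whatever model you've pulled
        "prompt": "Explain backpropagation in one paragraph.",
        "stream": False,     # one JSON response instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

llama.cpp and LM studio expose similar local HTTP servers, so whatever you build against one is easy to port.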
There are different levels of "understanding how things work", and the author makes it clear what kind of understanding he's going for. If you look at the source code of a program, you should be able to point to any line of code and answer "What does this particular line of code do? Why is it important and how does it relate to the rest of the design?" The same applies to a part on an electronic schematic or a mechanical drawing. There is likely no similarly meaningful answer to those questions if you look at a particular weight in a model.
I’ve seen mentions of researchers being able to point to specific neural pathways in AI models, for simple things and even for more abstract ones like “lying” or “truthfulness”. So it’s not a total lost cause, maybe.
I think what the author is trying to get across, and what I tend to agree with having at least touched on the mathematics behind transformers, is that we don't know how these models actually arrive at the outputs they do.
We know the rules they play by thoroughly - we made those rules ourselves (the math / model structure). But the outputs we're getting were, in many cases, never explicitly outlined in that rule set. We can follow a prompt's processing step by step, but we quickly end up on seemingly nonsensical paths that explode into a web of what appears to be completely unrelated concepts. It could be that our meat brains simply don't have the working memory necessary to track the meta and meta-meta emergent systems at play that arrive at an output.
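To make "we made the rules ourselves" concrete: the core rule of a transformer, the attention step, fits in a few lines of numpy. This is a bare-bones sketch (single head, no masking, random numbers standing in for learned weights) - the opacity isn't in the arithmetic, it's in the billions of learned values that flow through it.

    # bare-bones scaled dot-product attention: the "rule", minus the learned weights
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # each token's query is scored against every key, the scores are
        # normalised, and the values are mixed according to those scores
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        return softmax(scores) @ V

    # toy sizes: 4 tokens, 8-dimensional embeddings, random stand-ins for learned projections
    x = np.random.randn(4, 8)
    Wq, Wk, Wv = (np.random.randn(8, 8) for _ in range(3))
    print(attention(x @ Wq, x @ Wk, x @ Wv).shape)  # (4, 8)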
I am profoundly, profoundly cynical about this particular development in computing culture.
But your last paragraph resonates with me pretty deeply, and suggests to me that there might be a way forward for me when this becomes unavoidable, which it will.
Frankly I would rather direct my energies away from the accelerating pace of dehumanising technologies and towards rehumanising technology through education, but I do recognise I'll eventually have to engage with this just to educate.
I don't think you've grappled with the point the author is making.
>“If we open up ChatGPT or a system like it and look inside, you just see millions of numbers flipping around a few hundred times a second,” says AI scientist Sam Bowman. “And we just have no idea what any of it means.”
>To me as an engineer, that is just incredibly unsatisfying. Without understanding how something works, we are doomed to be just users.
AI models aren't complicated. They aren't sophisticated math that you can poke at and understand.
They're fucking million-dollar spaghetti code that happens to work (for values of 'work').
Those videos are teaching people "This is an if statement! This is a CPU!" And then you can look at 5.8 billion lines of spaghetti code and say "Gee! I understand how this works now! Yay!"
Asking because I literally do not know: can you step through AI the way you can step through C++ code in a debugger? Like, if you type in a prompt "Draw me a picture of a cat wearing a blue hat", could you (if you wanted to) step through every piece of the AI's process of generating that picture the way you'd step through code? If I wanted to understand how a Diffie–Hellman key exchange function worked, I could step through it line by line; it would be deterministic, and I could do the exact same thing again and see the exact same steps.
> And then you can look at 5.8 billion lines of spaghetti code
LLMs don't have anywhere near that much code. The algorithms for training and inference are not that complicated; the "intelligent" behavior is entirely due to the weights.
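To the stepping-through question above: yes, in principle. The whole inference loop is tiny, and you could walk through it in a debugger; with greedy decoding and fixed inputs it's even deterministic. A rough sketch, where `model` and `tokenizer` are stand-ins for the learned function (the billions of weights) and its text encoder:

    # sketch of an LLM inference loop: repeatedly pick the next token
    # `model` and `tokenizer` are hypothetical stand-ins, not a real library API
    def generate(model, tokenizer, prompt, max_new_tokens=100):
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model(tokens)   # one score per vocabulary entry - this call is where the weights live
            next_token = max(range(len(logits)), key=lambda i: logits[i])  # greedy: highest score wins
            tokens.append(next_token)
            if next_token == tokenizer.eos_token_id:
                break
        return tokenizer.decode(tokens)

The catch is that stepping through it tells you almost nothing: every interesting decision happens inside that one `model(tokens)` call, which is just millions of multiply-adds over the weights.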
OP clearly means that the weights are spaghetti code, technically they may be data but if they encode all of the actual functionality of the system then they are effectively bytecode which is interpreted by a runtime. You can understand how the runtime works if you care to learn, but you will never understand what's happening below that, nor will anyone else.
Aside from annoying people who want to understand how things work, it also means you can never know whether you have an optimal or even correct solution; all you can do is keep throwing money into the training furnace and hope a better one falls out next time. The whole nature of it gatekeeps out anyone who doesn't have enormous amounts of money to burn.
I can see that, although to me there's a difference between weights and something like bytecode. The weights don't encode any sort of logical operations, they're just numbers that get multiplied and added according to relatively simple algorithms.
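To put "just numbers that get multiplied and added" in concrete terms, a single layer of a network is essentially this (numpy sketch, made-up sizes):

    # one "layer": multiply by learned numbers, add learned numbers, squash
    import numpy as np

    x = np.random.randn(512)           # incoming activations
    W = np.random.randn(1024, 512)     # learned weights (in a real model, loaded from a checkpoint)
    b = np.random.randn(1024)          # learned biases
    y = np.maximum(0, W @ x + b)       # multiply, add, ReLU - no branching, no logic, just arithmetic

Stack enough of those (plus attention) and you have a model; none of the individual numbers means anything on its own.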
Totally agreed that the process of generating and evaluating weights is opaque and not very accessible.
But that's exactly the point. The code you are talking about is more like an interpreter for a virtual machine, which then runs a program made up of billions of numbers that wasn't designed by a human (or any sort of intelligence - you can argue about the end product, but the training process certainly isn't intelligent)
jeremy howard also has a fantastic video that is more about how to actually use LLMs, called A Hacker's Guide to Language Models - https://www.youtube.com/watch?v=jkrNMKz9pWU&t=607s