My opinion: it quickly gets into "the math behind LLMs", which makes no sense to me.
Words I understand but don't really get: weights, feed forward, layers, tensors, embeddings, normalization, transformers, attention, positioning, vector.
There's "programming" in the plumbing sense, where you move data around through files/sockets, and then there's this. For somebody without a math background/education, it's very unlikely you'll understand it; you end up just skimming Python without understanding the math behind the library calls it makes.
Yeah, there are concepts in programming and math that are mostly self-teachable from first principles, but then there's what looks like gibberish because it's too new to have been distilled into something tractable yet. I'd say arrays and matrices are straightforward to understand, while tensors are not, so I'm disappointed that so much of the current literature revolves around tensors. Same for saying "embedding" instead of just "vector representation", etc.
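For what it's worth, the scary words often map onto plain ideas: in libraries like PyTorch, a "tensor" is just an n-dimensional array, and an "embedding" is just a table of vectors you look rows up in. A rough sketch (assumes PyTorch is installed; all shapes and sizes here are made up for illustration):

```python
import torch

# A "tensor" is just an n-dimensional array.
scalar = torch.tensor(3.0)                   # 0-d, shape ()
vector = torch.tensor([1.0, 2.0, 3.0])       # 1-d, shape (3,)
matrix = torch.zeros(2, 3)                   # 2-d, shape (2, 3)
images = torch.zeros(8, 3, 32, 32)           # 4-d: (batch, channels, height, width)

# An "embedding" is a lookup table of vector representations, one row per token id.
vocab_size, embed_dim = 10_000, 64           # made-up sizes
table = torch.nn.Embedding(vocab_size, embed_dim)
token_ids = torch.tensor([42, 7, 1999])
vectors = table(token_ids)                   # shape (3, 64): one 64-d vector per token
print(vectors.shape)
```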
It helps me to think in terms of levels of abstraction rather than complexity. My education stopped at a 4-year degree, but AI is still mostly postgraduate material, so I have to translate it into what I know because I haven't internalized the lingo.
Here's the most approachable teaching of neural nets (NNs) and large language models (LLMs) that I've seen so far:
II A strange land
7 Convolutional layers
..
7.1.3 Translational equivariant layers
..
9 Scaling up the models
..
9.3 Dropout and normalization
9.3.1 Regularization via dropout
9.3.2 Batch (and layer) normalization
III Down the rabbit-hole
10 Transformer models
10.1 Introduction
10.1.1 Handling long-range and sparse dependencies
10.1.2 The attention layer
10.1.3 Multi-head attention
10.2 Positional embeddings
10.2.1 Permutation equivariance of the MHA layer
10.2.2 Absolute positional embeddings
10.2.3 Relative positional embeddings
10.3 Building the transformer model
10.3.1 The transformer block and model
10.3.2 Class tokens and register tokens
11 Transformers in practice
11.1 Encoder-decoder transformers
11.1.1 Causal multi-head attention
11.1.2 Cross-attention
11.1.3 The complete encoder-decoder transformer
11.2 Computational considerations
11.2.1 Time complexity and linear-time transformers
11.2.2 Memory complexity and the online softmax
11.2.3 The KV cache
11.2.4 Transformers for images and audio
11.3 Variants of the transformer block
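To make one of those chapter headings concrete: the attention layer (10.1.2) is, at its core, a few matrix multiplications and a softmax. This isn't taken from the book, just a minimal single-head sketch in PyTorch with made-up shapes:

```python
import math
import torch

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # how much each query "attends" to each key
    weights = torch.softmax(scores, dim=-1)                    # each row sums to 1
    return weights @ v                                         # weighted average of the value vectors

seq_len, head_dim = 5, 16                                      # made-up sizes
q, k, v = (torch.randn(seq_len, head_dim) for _ in range(3))
out = attention(q, k, v)                                       # shape (5, 16)
print(out.shape)
```

Multi-head attention (10.1.3) is essentially this repeated several times in parallel with different learned projections, then concatenated.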
I recommend _Deep Learning with Python_ by François Chollet (the creator of Keras). It’s very clear and approachable, explains all of these concepts, and doesn’t try to “impress” you with unnecessary mathematical notation. Excellent introductory book.
The only downside is that in 2024 you are probably going to use PyTorch rather than Keras + TensorFlow as shown in the book.
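The translation is usually mechanical, though. As a rough sketch (not from the book; input/output sizes are just for illustration), here is the same tiny classifier in Keras and then in PyTorch, assuming flattened 28x28 inputs and 10 classes:

```python
# Keras (as used in the book)
from tensorflow import keras

keras_model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Roughly equivalent PyTorch model
import torch.nn as nn

torch_model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),   # softmax is usually folded into the loss (nn.CrossEntropyLoss)
)
```

The concepts (layers, activations, losses) carry over one-to-one; only the API names change.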
If you want to gain familiarity with the kind of terminology you mentioned here, but don't have a background in graduate-level mathematics (or even undergrad really), I highly recommend Andrew Ng's "Deep Learning Specialization" course on Coursera. It was made a few years ago but all of the fundamental concepts are still relevant today.
Fei-Fei Li and Andrej Karpathy's Stanford CS231N course is also a great intro to the basics of the math from an engineering-forward perspective. I'm pretty sure all the materials are online. You build up from the basic components to an image-focused CNN.
That's exactly where I'm at. Despite watching Karpathy's tutorial videos, I quickly got lost. My highest level of math education is Calculus 3, which I barely passed. This probably means I will only ever understand LLMs at a high level.
Signals and Systems is worth it in part just to get the notation and explanations. MIT has the course online for free (though it's probably a little more general than what you need, since the class is also used to prep electrical engineers for robotics and radio communication).