This explanation walks you through the math and the corresponding code, but (at least in my case, maybe I'm dumb) it failed to help me understand why these steps are necessary or how the math relates to the intended outcome. As a result, I don't feel that I'm any closer to really understanding the heart of self-attention.
At the end of last year I put together a repository to try to show what self-attention achieves on a toy example: detecting whether a sequence of characters contains both "a" and "b".
The toy problem is useful because the model dimensionality is low enough to make visualization straightforward. The walkthrough also covers how things can go wrong and how the model can be improved.
It's not terse like nanoGPT or similar because the goal is a bit different. In particular, to build intuition about the attention computation, the intermediate tensors are named and persisted so they can be compared and visualized after the fact. Everything should be exactly reproducible locally too!
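To give a rough sense of what that setup looks like, here is a minimal sketch, not the repository's actual code: every name in it (ToyAttention, make_batch, VOCAB, and so on) is hypothetical. It trains a single-head self-attention layer on the "contains both 'a' and 'b'" task while stashing the intermediate tensors in a dict for later inspection.

    # Minimal sketch (hypothetical names, not the repo's code): single-head
    # self-attention on the toy task, with intermediates kept for plotting.
    import torch
    import torch.nn as nn

    VOCAB = "ab-"   # "a", "b", and a filler character
    D = 8           # dimensionality small enough to visualize easily
    SEQ_LEN = 6

    class ToyAttention(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(len(VOCAB), D)
            self.q = nn.Linear(D, D, bias=False)
            self.k = nn.Linear(D, D, bias=False)
            self.v = nn.Linear(D, D, bias=False)
            self.head = nn.Linear(D, 1)   # sequence-level yes/no readout
            self.trace = {}               # named, persisted intermediates

        def forward(self, ids):           # ids: (batch, seq)
            x = self.embed(ids)
            q, k, v = self.q(x), self.k(x), self.v(x)
            att = torch.softmax(q @ k.transpose(-2, -1) / D ** 0.5, dim=-1)
            out = att @ v
            # Keep the intermediates so they can be compared/visualized later.
            self.trace = {"q": q.detach(), "k": k.detach(),
                          "v": v.detach(), "att": att.detach()}
            return self.head(out.mean(dim=1)).squeeze(-1)   # pooled logit

    def make_batch(n=32):
        """Random sequences, labelled 1 iff both 'a' and 'b' appear."""
        ids = torch.randint(len(VOCAB), (n, SEQ_LEN))
        has_a = (ids == VOCAB.index("a")).any(dim=1)
        has_b = (ids == VOCAB.index("b")).any(dim=1)
        return ids, (has_a & has_b).float()

    torch.manual_seed(0)   # fixed seed for exact reproducibility
    model = ToyAttention()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(500):
        ids, y = make_batch()
        loss = nn.functional.binary_cross_entropy_with_logits(model(ids), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(loss.item(), model.trace["att"].shape)

After training, model.trace["att"] holds a (batch, seq, seq) attention map that can be plotted directly, which is the kind of post-hoc inspection the persisted tensors are meant to enable.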
I agree. It seems like the target audience is the experienced deep learning practitioner, which makes me wonder why such an audience would need this treatment. Why not just read the original paper?