We introduce a geometric framework for understanding Transformer language models through an analogy with General Relativity. In this view, keys and queries define a curved “space of meaning,” and attention acts like gravity, moving information across it. Layers represent discrete time steps where token representations evolve along curved—not straight—paths shaped by context. Through visualization and simulation experiments, we show that these trajectories indeed bend and reorient, confirming the presence of attention-induced curvature in embedding space.
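One way to make the "curved trajectory" claim concrete is to track a single token's hidden state through the layers of a small pretrained Transformer and measure how much each layer-to-layer step changes direction. The sketch below is only an illustration of that idea, not the authors' experiment: the choice of GPT-2, the example sentence, and the turning-angle metric are all assumptions.

```python
# Sketch: measure how a token's trajectory "bends" across Transformer layers.
# Model (GPT-2), input text, and the turning-angle metric are illustrative choices.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

text = "Attention moves meaning through a curved space."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states   # tuple of (layers + 1) tensors [1, seq, dim]

states = torch.stack(hidden).squeeze(1).numpy()   # [layers + 1, seq, dim]
token_idx = 2
path = states[:, token_idx, :]                    # one token's path through the layers
steps = np.diff(path, axis=0)                     # layer-to-layer displacement vectors

def turning_angle(u, v):
    """Angle in degrees between consecutive displacements; 0 means a straight path."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

angles = [turning_angle(steps[i], steps[i + 1]) for i in range(len(steps) - 1)]
print("turning angles per layer (degrees):", np.round(angles, 1))
```

Consistently nonzero turning angles are what the abstract refers to as attention-induced bending: the token does not move in a straight line through embedding space.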
The KL divergence of distributions P and Q measures how different P is from Q.
However, the KL divergence of P and Q is not the same as the KL divergence of Q and P.
Why?
Learn the intuition behind this in this friendly video.
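A quick numerical check makes the asymmetry visible. The two distributions below are made up purely for illustration; any pair that is not a simple relabeling of the other will do.

```python
# Worked example of the asymmetry: KL(P || Q) != KL(Q || P).
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

P = [0.8, 0.1, 0.1]
Q = [0.4, 0.3, 0.3]

print("KL(P || Q) =", round(kl(P, Q), 3))  # ~0.335
print("KL(Q || P) =", round(kl(Q, P), 3))  # ~0.382
```

The two numbers differ because KL weights the log-ratio by the first distribution: events that P considers likely but Q does not are penalized differently than the reverse.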
Imagine you have a red glove. Could you change its color to blue just by looking at it? In the real world you can't, but in the quantum world this kind of phenomenon is possible! Learn about it in this friendly video!
Quantum computers are fast. But is there a limit to how fast they can be? Learn about the speed of information, Shannon entropy, and information gain in this video!
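As a tiny taste of the entropy part: a fair coin carries more uncertainty than a biased one. The numbers below are toy values for illustration only.

```python
# Shannon entropy of a discrete distribution, in bits.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

print("fair coin:  ", entropy([0.5, 0.5]), "bits")           # 1.0
print("biased coin:", round(entropy([0.9, 0.1]), 3), "bits")  # ~0.469
```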
I like to see word embeddings as a universe where words fly through space like planets. The attention mechanism acts like the laws of gravity that govern this universe. Learn all about it in this video!
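The "gravity" in this analogy is scaled dot-product attention: each word is pulled toward a weighted mix of the words it attends to. Here is a minimal sketch of that computation, using random vectors as stand-ins for real embeddings.

```python
# Scaled dot-product attention with random stand-in vectors.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 5, 8
Q = rng.normal(size=(seq_len, dim))   # queries
K = rng.normal(size=(seq_len, dim))   # keys
V = rng.normal(size=(seq_len, dim))   # values

scores = Q @ K.T / np.sqrt(dim)                    # pairwise "pull" strengths
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)      # softmax over keys
output = weights @ V                               # each word drifts toward a weighted mix of the others

print(weights.round(2))
```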