Bookmarks
Street Fighting Transformers
Sasha Rush presents practical back-of-the-envelope estimation techniques for Transformer/LLM models, useful for ML researchers and practitioners.
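For a flavor of the estimates involved, a quick sketch using two standard rules of thumb (not formulas quoted from the talk): roughly 12·L·d² non-embedding parameters, and about 6 training FLOPs per parameter per token.

```python
n_layers, d_model = 12, 768             # illustrative GPT-2-small-scale config
params = 12 * n_layers * d_model**2     # ~4d^2 attention + ~8d^2 MLP per block
tokens = 10e9                           # hypothetical training-token count
train_flops = 6 * params * tokens       # standard ~6ND training-FLOPs rule
print(f"params ~ {params/1e6:.0f}M, training FLOPs ~ {train_flops:.2e}")
```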
Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...
Academic talk from the Simons Institute presenting a unified framework for efficient linear layers in Transformers; relevant to deep-learning researchers and practitioners.
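As a minimal illustration of what "structured" can mean here (one simple point in the space, not the talk's actual framework), a low-rank factorized linear layer:

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Dense layer W ~ U·V, cutting parameters from d_in*d_out to r*(d_in+d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.V = nn.Linear(d_in, rank, bias=False)   # project down to rank r
        self.U = nn.Linear(rank, d_out, bias=False)  # project back up

    def forward(self, x):
        return self.U(self.V(x))
```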
How DeepSeek Rewrote the Transformer [MLA]
In-depth analysis of DeepSeek's Multi-head Latent Attention (MLA), covering the architecture, its performance characteristics, and the underlying equations.
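The core trick, sketched in PyTorch with illustrative dimensions (real MLA also decouples RoPE and compresses the query path): cache one small latent per token instead of full per-head keys and values.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 128    # illustrative sizes
W_dkv = nn.Linear(d_model, d_latent, bias=False)          # down-projection
W_uk = nn.Linear(d_latent, n_heads * d_head, bias=False)  # latent -> keys
W_uv = nn.Linear(d_latent, n_heads * d_head, bias=False)  # latent -> values

x = torch.randn(2, 16, d_model)                  # (batch, seq, d_model)
c_kv = W_dkv(x)                                  # cache this: 128 dims/token
k = W_uk(c_kv).view(2, 16, n_heads, d_head)      # reconstruct keys on the fly
v = W_uv(c_kv).view(2, 16, n_heads, d_head)      # reconstruct values on the fly
```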
What is a Transformer? (Transformer Walkthrough Part 1/2)
In-depth technical walkthrough of the Transformer architecture by an AI researcher; the first half of a two-part series.
The Attention Mechanism in Large Language Models
Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
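The formula the video builds up to, as a minimal PyTorch sketch of standard scaled dot-product attention:

```python
import math
import torch

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v
```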
Large Language Models in Five Formulas
Tutorial that distills LLM behavior into five key formulas: perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
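For reference, the first of the five in its standard form: perplexity as the exponentiated average negative log-likelihood.

```latex
\mathrm{PPL}(x_{1:N}) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right)
```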
Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy
Andrej Karpathy kicks off Stanford CS25 with a primer on Transformer architecture, its history, and cross-domain applications.
Transformer Neural Network: Visually Explained
Step-by-step visual walkthrough and PyTorch implementation of the Transformer, covering self-attention, positional encoding, and multi-head attention.
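The positional-encoding piece, in the standard sinusoidal form from the original Transformer paper (a self-contained sketch, not code from the video):

```python
import math
import torch

def sinusoidal_pe(seq_len: int, d_model: int) -> torch.Tensor:
    # PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same)
    pos = torch.arange(seq_len).unsqueeze(1)                      # (seq, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                                     # (seq, d_model)
```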