Bookmarks

AI Predictions With ex-Applied AI engineer at Stripe!

Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.

It's Not About Scale, It's About Abstraction

François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose gaps in compositional reasoning, and proposes integrating transformer models with program synthesis to achieve more abstract, generalizable intelligence.

Street Fighting Transformers

Sasha Rush presents practical back-of-envelope estimation techniques for Transformer-based language models, useful for ML researchers and practitioners.
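
To give the flavor of such estimates, here is a back-of-envelope parameter count for a decoder-only Transformer; this is a sketch of the common 12·L·d² rule of thumb, not code from the talk, and the example shape is illustrative:

    def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
        """Back-of-envelope parameter count for a decoder-only Transformer.

        Roughly 4*d^2 for the attention projections (Q, K, V, output) plus
        8*d^2 for a 4x-wide MLP gives ~12*d^2 per layer; embeddings add
        vocab_size * d_model (assuming tied input/output embeddings).
        """
        per_layer = 12 * d_model ** 2
        embeddings = vocab_size * d_model
        return n_layers * per_layer + embeddings

    # Illustrative GPT-2-small-like shape: 12 layers, d_model=768, ~50k vocab.
    print(f"{approx_params(12, 768, 50_257):,}")  # roughly 124M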

How might LLMs store facts | Deep Learning Chapter 7

High-quality educational lecture on how transformers store factual information, directly relevant to AI interpretability.

What Matters for Model Merging at Scale?

Technical summary of a recent arXiv paper on large-scale model merging, providing up-to-date insights for ML practitioners.
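
For context, the simplest merging baseline is a (weighted) parameter average across checkpoints; the toy sketch below only illustrates that idea and is not the paper's method (names and shapes are made up):

    import numpy as np

    def merge_checkpoints(state_dicts, weights=None):
        """Simplest merging baseline: a (weighted) average of parameters,
        assuming all checkpoints share the same architecture and keys."""
        weights = weights or [1 / len(state_dicts)] * len(state_dicts)
        merged = {}
        for key in state_dicts[0]:
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        return merged

    # Toy "checkpoints" holding a single 2x2 weight each.
    a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
    b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]])}
    print(merge_checkpoints([a, b])["layer.weight"])  # element-wise mean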

François Chollet on OpenAI o-models and ARC

Building Anthropic | A conversation with our co-founders

The ARC Prize 2024 Winning Algorithm

LSTM: The Comeback Story?

How DeepSeek Rewrote the Transformer [MLA]

In-depth analysis of DeepSeek's Multi-head Latent Attention (MLA), covering the architecture, performance, and underlying equations; highly relevant deep-learning material.

Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months

Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications—useful insight for AI/ML practitioners and researchers.

Neel Does Research (Vibe Coding Edition)

DeepMind’s AlphaEvolve AI: History In The Making!

CUDA Mode Keynote | Andrej Karpathy | Eureka Labs

Agentic Engineering in Action with Mitchell Hashimoto

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.

The Attention Mechanism in Large Language Models

Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
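
For reference, a minimal NumPy sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for a single head with no masking (the shapes below are illustrative):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (seq_q, seq_k) similarity logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # weighted sum of value vectors

    # Toy example: 4 query/key positions, d_k = 8.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)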

Large Language Models in Five Formulas

Tutorial that distills LLM behavior into five key formulas: perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
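
As an example of the first formula, perplexity is just the exponentiated average negative log-likelihood; a small sketch with made-up token probabilities:

    import math

    def perplexity(token_probs):
        """PPL = exp(-1/N * sum(log p_i)): the exponentiated average
        negative log-likelihood the model assigns to the observed tokens."""
        nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(nll)

    # Hypothetical per-token probabilities from a language model.
    print(perplexity([0.25, 0.10, 0.50, 0.05]))  # higher when the model is more "surprised"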

Jeff Dean (Google): Exciting Trends in Machine Learning

Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.

Let's build the GPT Tokenizer

Andrej Karpathy codes a GPT byte-pair encoding (BPE) tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
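
The heart of BPE training is a loop that repeatedly merges the most frequent adjacent token pair; below is a minimal sketch over raw UTF-8 bytes, illustrating the idea rather than reproducing Karpathy's implementation:

    from collections import Counter

    def most_frequent_pair(ids):
        """Count adjacent pairs and return the most common one."""
        return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

    def merge(ids, pair, new_id):
        """Replace every occurrence of `pair` in `ids` with `new_id`."""
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        return out

    ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
    for step in range(3):                      # three illustrative merges
        pair = most_frequent_pair(ids)
        ids = merge(ids, pair, 256 + step)     # new token IDs start above the byte range
    print(ids)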

LoRA explained (and a bit about precision and quantization)

Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
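
The core idea in one equation: the frozen weight W is augmented with a low-rank update, h = W x + (alpha/r) * B A x, with rank r much smaller than the layer width; a toy NumPy sketch with made-up dimensions:

    import numpy as np

    d, k, r, alpha = 512, 512, 8, 16        # illustrative sizes; r << d
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d, k))             # frozen pretrained weight
    A = rng.normal(size=(r, k)) * 0.01      # trainable, small random init
    B = np.zeros((d, r))                    # trainable, zero init so the update starts at 0

    def lora_forward(x):
        """h = W x + (alpha / r) * B (A x); only A and B are trained."""
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = rng.normal(size=(k,))
    print(lora_forward(x).shape)            # (512,)
    # Trainable params: d*r + r*k versus d*k for full fine-tuning.
    print(d * r + r * k, "vs", d * k)       # 8192 vs 262144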

Fine-tune LLMs - Line by line code example

Hands-on Jupyter notebook that walks line by line through LoRA fine-tuning of a large language model with Hugging Face PEFT.
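
A hedged sketch of the usual PEFT recipe referenced here; the checkpoint name and hyperparameters below are placeholders, not values from the notebook:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model_name = "some/causal-lm"  # placeholder checkpoint, not from the notebook
    model = AutoModelForCausalLM.from_pretrained(model_name)

    config = LoraConfig(
        r=8,                                  # rank of the low-rank update
        lora_alpha=16,                        # scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # module names vary by architecture
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()        # only the adapter weights are trainable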

Tutorial | LLMs in 5 Formulas (360°)

Anthropic's Meta Prompt: A Must-try!

Sholto Douglas & Trenton Bricken - How LLMs Actually Think

What's next for AI agentic workflows ft. Andrew Ng of AI Fund

How I don't use LLMs

a Hugging Face Space by nanotron

cs ai

Contextualization Machines

Unveiling_DeepSeek.pdf

The MiniPile Challenge for Data-Efficient Language Models

[2305.13009] Textually Pretrained Speech Language Models

The Annotated Transformer

gemini_v1_5_report

2309.10668

mlx-examples/lora at main · ml-explore/mlx-examples · GitHub

Mixtral of Experts

An Intuition for Attention

Tensor2Tensor Intro

The Random Transformer

MotionGPT: Human Motion as a Foreign Language
