Ludwig - ai

Live coding session where George Hotz designs and trains a simple neural-network chess engine, examining model architecture, training loop, and gameplay integration.

AI Predictions With ex-Applied AI engineer at Stripe!

Added on July 24, 2025 · 39:58 · 43.0K views

Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.

Simple Artificial Neural Network entirely in assembly language

Added on July 24, 2025 · 30:54 · 10.7K views

Demonstrates building and training a single-layer neural network entirely in x86-64 assembly language, covering forward pass, MSE loss, back-propagation, and low-level numeric routines.
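The video writes everything in assembly. For comparison, the same forward pass, MSE loss, and gradient-descent update can be sketched in plain Python. The sketch assumes the simplest case: one linear neuron fitting the toy data y = 2x + 1. The data, learning rate, and epoch count are illustrative choices and are not taken from the video.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit one linear neuron y_hat = w * x + b by gradient descent on MSE (illustrative sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: predictions for every sample
        preds = [w * x + b for x in xs]
        # Backward pass: gradients of MSE = mean((pred - y)^2) w.r.t. w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient-descent update
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the neuron on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]  # hypothetical toy target
w, b = train(xs, ys)
print(w, b, mse(xs, ys, w, b))
```

The learned w and b should approach 2 and 1. An assembly version has to spell out each of these loops and floating-point operations by hand.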

Why Does Diffusion Work Better than Auto-Regression?

Added on July 24, 2025 · 20:18 · 635.9K views

Explains the mechanics and trade-offs of modern generative models, contrasting autoregressive transformer pipelines with denoising diffusion processes and detailing why diffusion excels at image generation while transformers dominate text.

It's Not About Scale, It's About Abstraction

Added on July 24, 2025 · 46:22 · 118.5K views

François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose gaps in compositional reasoning, and proposes integrating transformer models with program synthesis to achieve more abstract, generalizable intelligence.

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

Added on July 23, 2025 · 1 min read

Street Fighting Transformers

Added on July 22, 2025 · 25:13 · 8.2K views

Sasha Rush presents practical back-of-the-envelope estimation techniques for Transformer language models, aimed at ML researchers and practitioners.

How might LLMs store facts | Deep Learning Chapter 7

Added on July 22, 2025 · 22:42 · 1.5M views

Animated lecture on how the MLP blocks of a transformer might store factual knowledge; relevant to AI interpretability.

What Matters for Model Merging at Scale?

Added on July 22, 2025 · 24:47 · 145 views

Technical summary of an arXiv paper on model merging at large scale, aimed at ML practitioners.

Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...

Added on July 22, 2025 · 41:35 · 8.0K views

Academic talk at the Simons Institute presenting a unified framework for searching over efficient linear layers in Transformers; relevant to deep-learning researchers and practitioners.

Joscha Bach - Why Your Thoughts Aren't Yours.

Added on July 22, 2025 · 112:46 · 127.3K views

Long-form interview with AI researcher Joscha Bach covering AI architectures, cognition, and regulatory issues, aimed at AI and cognitive-science audiences.

AI for science with Sir Paul Nurse, Demis Hassabis, Jennifer Doudna, and John Jumper

Added on July 22, 2025 · 54:23 · 107.2K views

Panel discussion with leading scientists on how AI accelerates scientific discovery; offers strategic and technical perspectives on AI applications in research.

Cyber Animism by Joscha Bach

Added on July 22, 2025 · 105:37 · 28.0K views

In-depth lecture by AI researcher Joscha Bach on philosophical and cognitive aspects of AI, valuable for understanding conceptual foundations and ethics.

Normalization models of attention

Added on July 22, 2025 · 70:34 · 1.0K views

Academic tutorial on computational models of visual attention with hands-on MATLAB code, aimed at researchers in computational neuroscience and AI.

François Chollet on OpenAI o-models and ARC

Added on July 22, 2025 · 86:47 · 85.8K views

How difficult is AI alignment? | Anthropic Research Salon

Added on July 22, 2025 · 28:05 · 28.1K views

Building Anthropic | A conversation with our co-founders

Added on July 22, 2025 · 51:49 · 101.0K views

(Ep.73) DeepSeek CEO interview in English.

Added on July 22, 2025 · 27:10 · 78.4K views

Can Latent Program Networks Solve Abstract Reasoning?

Added on July 22, 2025 · 51:26 · 14.8K views

The ARC Prize 2024 Winning Algorithm

Added on July 22, 2025 · 69:05 · 17.1K views

LSTM: The Comeback Story?

Added on July 22, 2025 · 67:02 · 25.3K views

DeepMind x UCL RL Lecture Series - Policy-Gradient and Actor-Critic methods [9/13]

Added on July 22, 2025 · 98:49 · 42.5K views

How DeepSeek Rewrote the Transformer [MLA]

Added on July 22, 2025 · 18:09 · 680.7K views

In-depth analysis of DeepSeek's Multi-head Latent Attention (MLA), a variant of transformer attention, covering its architecture, equations, and performance.

Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI"

Added on July 22, 2025 · 56:23 · 117.5K views

ARC-AGI-2 Overview With Francois Chollet

Added on July 22, 2025 · 21:31 · 16.0K views

Richard S. Sutton, Turing Award Winner | Approximately Correct

Added on July 22, 2025 · 32:51 · 4.5K views

Causal Representation Learning: A Natural Fit for Mechanistic Interpretability

Added on July 22, 2025 · 59:25 · 1.9K views

Advancing AI Reasoning - From Games to Complex Problem Solving | NVIDIA GTC 2025 Session

Added on July 22, 2025 · 40:05 · 8.7K views

SemiAnalysis Founder Dylan Patel on New AI Regulations, Chinese AI & xAI's Surge to Hyperscale

Added on July 22, 2025 · 84:57 · 29.3K views

How To Think About Thinking Models

Added on July 22, 2025 · 94:22 · 7.5K views

Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months

Added on July 22, 2025 · 75:49 · 312.5K views

Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications.

On the Biology of a Large Language Model (Part 2)

Added on July 22, 2025 · 56:26 · 14.6K views

V.O. Complete. A masterclass from the pioneer of artificial intelligence. Jürgen Schmidhuber

Added on July 22, 2025 · 61:56 · 18.9K views

Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11

Added on July 22, 2025 · 79:58 · 137.6K views

ORIGINAL FATHER OF AI ON DANGERS! (Prof. Jürgen Schmidhuber)

Added on July 22, 2025

Neel Does Research (Vibe Coding Edition)

Added on July 22, 2025 · 154:23 · 4.1K views

Kevin Ellis - Probabilistic Thinking in Language and Code - IPAM at UCLA

Added on July 22, 2025

What is the Transformers’ Context Window in Deep Learning? (and how to make it LONG)

Added on July 22, 2025 · 27:03 · 3.0K views

Autoencoders | Deep Learning Animated

Added on July 22, 2025 · 11:40 · 71.7K views

AI Olympics (multi-agent reinforcement learning)

Added on July 22, 2025 · 11:13 · 5.1M views

DeepMind’s AlphaEvolve AI: History In The Making!

Added on July 22, 2025

DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs

Added on July 22, 2025 · 23:16 · 18.5K views

Diffusion Models: DDPM | Generative AI Animated

Added on July 22, 2025 · 32:05 · 28.2K views

What is a Transformer? (Transformer Walkthrough Part 1/2)

Added on July 22, 2025 · 63:00 · 26.2K views

In-depth technical walkthrough of the Transformer architecture by an AI researcher.

Diffusion Models From Scratch | Score-Based Generative Models Explained | Math Explained

Added on July 22, 2025 · 38:11 · 49.4K views

The Unreasonable Effectiveness of JPEG: A Signal Processing Approach

Added on July 22, 2025 · 34:48 · 1.2M views

Dylan Patel (SemiAnalysis) on Multi-Datacenter Training @ Decentralized AI Day 2025

Added on July 22, 2025

CUDA Mode Keynote | Andrej Karpathy | Eureka Labs

Added on July 22, 2025 · 23:21 · 37.7K views

More Than Image Generators: A Science of Problem-Solving using Probability | Diffusion Models

Added on July 22, 2025 · 52:28 · 27.7K views

Agentic Engineering in Action with Mitchell Hashimoto

Added on July 22, 2025 · 61:04 · 29.8K views

ICML 2024 Tutorial "Machine Learning on Function spaces #NeuralOperators"

Added on July 22, 2025

The Breakthrough Behind Modern AI Image Generators | Diffusion Models Part 1

Added on July 22, 2025 · 24:23 · 56.6K views

Scaling Computing Performance Beyond the End of Moore’s Law: Song Han

Added on July 22, 2025 · 31:52 · 3.3K views

17.12.2024: Flow-based Models (Part 2)

Added on July 22, 2025

When AI Is Designed Like A Biological Brain

Added on July 22, 2025 · 9:58 · 65.0K views

Fireside Chat With Ilya Sutskever and Jensen Huang AI Today and Vision of the Future March 2023

Added on July 22, 2025 · 53:06 · 27.1K views

Information Theory for Language Models: Jack Morris

Added on July 22, 2025 · 78:13 · 7.6K views

Dylan Patel: GPT4.5's Flop, Grok 4, Meta's Poaching Spree, Apple's Failure, and Super Intelligence

Added on July 22, 2025 · 62:17 · 214.2K views

Mind from Matter (Lecture By Joscha Bach)

Added on July 22, 2025 · 109:02 · 7.8K views

University lecture by cognitive scientist Joscha Bach examining AI architecture and machine consciousness, from both a cognitive-science and a philosophical angle.

All the neurons of a neural network learning the sine function : network with 1 neuron per layer

Added on July 22, 2025

Zed Inferred: Diffusion Language Models

Added on July 22, 2025 · 62:36 · 4.3K views

Matt Squire - Diving into Transformer Model Internals | PyData London 25

Added on July 22, 2025 · 34:01 · 2.8K views

Luminal - Search-Based Deep Learning Compilers

Added on July 22, 2025 · 69:29 · 1.0K views

I Visualised Attention in Transformers

Added on July 22, 2025

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Added on July 22, 2025 · 47:51 · 14.6K views

In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.

The Attention Mechanism in Large Language Models

Added on July 22, 2025 · 21:01 · 126.5K views

Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
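The mechanism this video explains computes softmax(Q Kᵀ / √d_k) V. A minimal plain-Python sketch follows. It assumes queries, keys, and values are given as lists of row vectors and omits masking, batching, and multiple heads. The function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: each query gets a weighted average of the value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of the value rows.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
```

Every query can attend directly to every key regardless of distance. This is the property the video links to capturing long-range dependencies.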

OpenAI's Stable Code 3B: A Game-Changer for Coding & Programming Tasks

Added on July 22, 2025 · · views

Andrew Ng: Opportunities in AI - 2023

Added on July 22, 2025 · 36:55 · 2.0M views

Andrew Ng outlines current AI trends, enterprise adoption patterns, and startup opportunities with an emphasis on data-centric supervised learning.

Large Language Models in Five Formulas

Added on July 22, 2025 · 58:01 · 38.7K views

Tutorial distills LLM behavior into five key formulas—perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
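Perplexity is the simplest of the five formulas to show in code. It is commonly defined as PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The sketch below assumes the per-token probabilities that the model assigned to the observed tokens are already available. The tutorial's exact notation may differ, and the function name is illustrative.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# A model that spreads probability uniformly over 4 tokens has perplexity 4.
print(round(perplexity([0.25] * 10), 6))
```

One reading of the number: a perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens.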

Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy

Added on July 22, 2025 · 71:40 · 874.4K views

Andrej Karpathy kicks off Stanford CS25 with a primer on Transformer architecture, its history, and cross-domain applications.

Dense Associative Memory in Machine Learning

Added on July 22, 2025 · 56:26 · 5.5K views

Research talk on Dense Associative Memory networks, exploring high-capacity energy-based models for pattern storage and retrieval.

Navigating Progress in AI and Neuroscience

Added on July 22, 2025 · 59:05 · 3.9K views

Talk explores reciprocal advances between neuroscience and AI, highlighting how brain insights inform interpretable machine-learning models.

An overview of Generative AI: music, video and image creation

Added on July 22, 2025 · 60:42 · 2.7K views

Google DeepMind’s Douglas Eck surveys state-of-the-art generative AI systems for music, video, and images, detailing model architectures and datasets.

1 - Introduction

Added on July 22, 2025

Marc Raibert: Boston Dynamics and the Future of Robotics | Lex Fridman Podcast #412

Added on July 22, 2025

Jeff Dean (Google): Exciting Trends in Machine Learning

Added on July 22, 2025 · 72:30 · 180.4K views

Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.

V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video (Explained)

Added on July 22, 2025 · 50:03 · 47.5K views

Paper walk-through of V-JEPA, detailing a predictive video representation model trained without labels for downstream vision tasks.

Let's build the GPT Tokenizer

Added on July 22, 2025 · 133:34 · 860.3K views

Andrej Karpathy codes a GPT Byte-Pair-Encoding tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
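At the core of the video is a training loop that repeatedly merges the most frequent adjacent pair. It can be sketched in plain Python under simplifying assumptions: the sketch works on raw UTF-8 bytes, skips the regex pre-splitting that GPT tokenizers apply, and breaks frequency ties by first occurrence. The function names are illustrative and are not taken from the video.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` (left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; new token ids start after the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for k in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent pair, first seen wins ties
        new_id = 256 + k
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Map token ids back to text by expanding merges in creation order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


ids, merges = train_bpe("aaabdaaabac", 1)
print(ids, merges)  # the pair (97, 97), i.e. "aa", becomes token 256
```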

LoRA explained (and a bit about precision and quantization)

Added on July 22, 2025 · 17:06 · 100.2K views

Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
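A minimal plain-Python sketch of the LoRA idea. It assumes the common formulation y = W x + (alpha / r) · B A x. Only A and B are trained, and B starts at zero, so training begins from the frozen model's output. The class `LoRALinear` and its methods are hypothetical illustrations, not the Hugging Face PEFT API. Quantization (the "Q" in QLoRA) is not shown.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical helper: frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init => no change at start
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: x -> r dims -> d_out dims
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the adapter into W for inference: W + scale * (B @ A)."""
        BA = [[sum(b * a for b, a in zip(b_row, a_col)) for a_col in zip(*self.A)]
              for b_row in self.B]
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, BA)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# A 64 x 64 layer with rank 4 trains 4*64 + 64*4 = 512 values instead of 4096.
layer = LoRALinear([[0.0] * 64 for _ in range(64)], r=4, alpha=8)
print(layer.trainable_params())
```

Because the update is a plain matrix product, it can be folded into W after training with `merged_weight`. This means inference pays no extra cost.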

How to Build an LLM from Scratch | An Overview

Added on July 22, 2025

Transformer Neural Network: Visually Explained

Added on July 22, 2025 · 10:50 · 10.1K views

Step-by-step visual and PyTorch implementation of the Transformer—covering self-attention, positional encoding, and multi-head mechanisms.

Floating Points are no more, Changes everything for LLMs!!!

Added on July 22, 2025

Sitan Chen - Provably learning a multi-head attention layer - IPAM at UCLA

Added on July 22, 2025

Fine-tune LLMs - Line by line code example

Added on July 22, 2025 · 8:20 · 4.7K views

Hands-on Jupyter notebook that walks line by line through LoRA fine-tuning of a large language model with Hugging Face PEFT.

AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

Added on July 22, 2025 · 28:46 · 41.0K views

Illustrated guide to Stable Diffusion explaining latent-diffusion training, CLIP text encoders, and reverse-diffusion image generation.

Stable Diffusion in Code (AI Image Generation) - Computerphile

Added on July 22, 2025 · 16:56 · 309.4K views

Computerphile coding session that runs and tweaks a Stable Diffusion pipeline in Python on Colab, clarifying sampler parameters and latent spaces.

Tutorial | LLMs in 5 Formulas (360°)

Added on July 22, 2025

Let's build GPT: from scratch, in code, spelled out.

Added on July 22, 2025 · 116:20 · 6.0M views

End-to-end coding tutorial that builds a minimal GPT Transformer from scratch, including the dataset, a character-level tokenizer, self-attention, and the training loop.

George Hotz | Programming | rewriting linearizer (tinygrad) | Day In The Life Of A Software Engineer

Added on July 22, 2025 · 106:04 · 32.7K views

George Hotz refactors tinygrad’s linearizer, exposing low-level tensor compiler optimizations that map high-level ops to efficient GPU kernels.

Anthropic's Meta Prompt: A Must-try!

Added on July 22, 2025 · 12:34 · 97.3K views

Sholto Douglas & Trenton Bricken - How LLMs Actually Think

Added on July 22, 2025 · 193:13 · 181.2K views

What's next for AI agentic workflows ft. Andrew Ng of AI Fund

Added on July 22, 2025 · 13:40 · 387.6K views

How fly neurons compute the direction of visual motion

Added on July 22, 2025 · 56:32 · 93.3K views

The Most Important Algorithm in Machine Learning

Added on July 22, 2025 · 40:08 · 767.5K views

If we don’t get AGI by GPT-7 (~$1T), will we just never get it? – Sholto Douglas & Trenton Bricken

Added on July 22, 2025

How Diffusion Works for Text

Added on July 22, 2025 · 42:24 · 6.6K views

H-Nets - the Past

Added on July 21, 2025 · 28 min read

H-Nets - the Future

Added on July 21, 2025 · 17 min read

Neural Scaling Laws by Data Manifold Dimensions

Added on July 21, 2025 · 14 min read

Adam with Aggressive Gradient Clipping ≈ Smoothed SignSGD/NormSGD

Added on July 21, 2025 · 6 min read

The State of Generative Models

Added on July 15, 2025 · 16 min read

DeepSeek Debrief: >128 Days Later – SemiAnalysis

Added on July 9, 2025 · 13 min read

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

Added on July 2, 2025 · 19 min read

The Illustrated AlphaFold

Added on June 28, 2025 · 51 min read

Muon and a Selective Survey on Steepest Descent in Riemannian and Non-Riemannian Manifolds

Added on June 27, 2025 · 41 min read

How I don't use LLMs

Added on June 27, 2025 · 10 min read

Continuous Thought Machines

Added on June 27, 2025 · 24 min read

How we built our multi-agent research system

Added on June 26, 2025 · 17 min read

Activation Atlas

Added on June 26, 2025 · 17 min read

World Models

Added on June 26, 2025 · 35 min read

DeepSeek-V3 Explained 1: Multi-head Latent Attention

Added on May 29, 2025 · 9 min read

Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT

Added on May 16, 2025 · 6 min read

You could have designed state of the art positional encoding

Added on May 16, 2025 · 13 min read

attention is logarithmic, actually

Added on May 16, 2025 · 11 min read

AI Arrives In The Middle East: US Strikes A Deal with UAE and KSA – SemiAnalysis

Added on May 16, 2025 · 14 min read

Transformers Represent Belief State Geometry in their Residual Stream

Added on May 16, 2025 · 19 min read

Llama from scratch (or how to implement a paper without crying)

Added on May 16, 2025 · 11 min read

The MAP-Elites Algorithm: Finding Optimality Through Diversity

Added on May 16, 2025 · 1 min read

How To Scale

Added on May 13, 2025 · 53 min read

Deep Dive into Yann LeCun’s JEPA

Added on May 6, 2025 · 29 min read

Are Transformers universal approximators of sequence-to-sequence functions?

Added on May 3, 2025 · 1 min read

a Hugging Face Space by nanotron

Added on May 3, 2025 · 1 min read

Training Large Language Models to Reason in a Continuous Latent Space

Added on April 24, 2025 · 50 min read

Training Large Language Models to Reason in a Continuous Latent Space

Added on April 22, 2025 · 1 min read

On the Biology of a Large Language Model

Added on April 22, 2025 · 3h 34m read

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

Added on April 22, 2025 · 1 min read

"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"

Added on April 22, 2025 · 3 min read

Multi-layer language heads: the output latent is for text (and nothing else)

Added on April 19, 2025 · 4 min read

CS336: Language Modeling from Scratch

Added on April 19, 2025 · 4 min read

Contextualization Machines

Added on April 17, 2025 · 17 min read

What Is ChatGPT Doing … and Why Does It Work?

Added on April 15, 2025 · 1h 33m read

Position: Model Collapse Does Not Mean What You Think

Added on April 10, 2025 · 1 min read

RWKV Language Model

Added on April 7, 2025 · 1 min read

diffusion transformers

Added on April 5, 2025 · 1 min read

Circuit Tracing: Revealing Computational Graphs in Language Models

Added on March 29, 2025 · 3h 43m read

Advanced Performance Optimizations for Models

Added on March 29, 2025 · 26 min read

Softmax Attention is a Fluke

Added on March 24, 2025 · 13 min read

Transformers Laid Out

Added on March 23, 2025 · 38 min read

A friendly introduction to machine learning compilers and optimizers

Added on March 18, 2025 · 21 min read

Neural Networks, Manifolds, and Topology

Added on March 9, 2025 · 15 min read

Attention from Beginners Point of View

Added on March 9, 2025 · 2 min read

(How) Do Language Models Track State?

Added on March 9, 2025 · 1 min read

Why Attention Is All You Need

Added on March 9, 2025 · 10 min read

Crossing the uncanny valley of conversational voice

Added on March 1, 2025 · 11 min read

neural video codecs: the future of video compression

Added on February 17, 2025 · 23 min read

Unnamed Document

Added on January 25, 2025 · 1 min read

Unveiling_DeepSeek.pdf

Added on January 22, 2025 · 8 min read

DeepSeek-V3 Explained: A Deep Dive into the Next-Generation AI Model

Added on January 18, 2025 · 9 min read

by Marcus Hutter and David Quarel and Elliot Catt

Added on December 24, 2024 · 4 min read

Towards a Categorical Foundation of Deep Learning: A Survey

Added on December 22, 2024 · 1 min read

Soft question: Deep learning and higher categories

Added on December 22, 2024 · 1 min read

BLT: Patches Scale Better Than Tokens

Added on December 17, 2024 · 1h 3m read

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

Added on December 17, 2024 · 1h 26m read

Fundamental Components of Deep Learning: A category-theoretic approach

Added on December 17, 2024 · 6h 12m read

Gemini: A Family of Highly Capable Multimodal Models

Added on December 17, 2024 · 2h 29m read

Flow Matching Guide and Code

Added on December 17, 2024 · 2h 56m read

Mastering Board Games by External and Internal Planning with Language Models

Added on December 17, 2024 · 2h 43m read

Fundamental Components of Deep Learning: A category-theoretic approach

Added on December 16, 2024 · 6h 12m read

Genie 2: A large-scale foundation world model

Added on December 10, 2024 · 7 min read

WilliamYi96/Awesome-Energy-Based-Models: A curated list of resources on energy-based models.

Added on December 9, 2024 · 7 min read

"CBLL, Research Projects, Computational and Biological Learning Lab, Courant Institute, NYU"

Added on December 9, 2024 · 6 min read

yataobian/awesome-ebm: Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning)

Added on December 9, 2024 · 16 min read

Greg Yang

Added on December 5, 2024 · 7 min read

TS_Tutorial

Added on December 3, 2024 · 2h 13m read

How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.

Added on November 18, 2024 · 20 min read

ageron/handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Added on January 26, 2024 · 2 min read
