Bookmarks

George Hotz | Programming | twitchchess | a simple neural chess AI | Part1

Live coding session where George Hotz designs and trains a simple neural-network chess engine, examining model architecture, training loop, and gameplay integration.

AI Predictions With ex-Applied AI engineer at Stripe!

Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.

Simple Artificial Neural Network entirely in assembly language

Demonstrates building and training a single-layer neural network entirely in x86-64 assembly language, covering forward pass, MSE loss, back-propagation, and low-level numeric routines.
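
To make the pipeline concrete, here is a minimal NumPy sketch of the same steps the video implements in assembly (forward pass, MSE loss, back-propagated update); the toy data and learning rate are illustrative, not from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy inputs
y = X @ np.array([1.5, -2.0, 0.5]) + 1.0 # toy linear target

W = rng.normal(size=3)
b = 0.0
lr = 0.05

for _ in range(500):
    pred = X @ W + b                     # forward pass
    err = pred - y
    loss = np.mean(err ** 2)             # MSE loss
    grad_W = 2 * X.T @ err / len(y)      # back-propagated gradient w.r.t. W
    grad_b = 2 * err.mean()              # gradient w.r.t. b
    W -= lr * grad_W                     # gradient-descent update
    b -= lr * grad_b
```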

Why Does Diffusion Work Better than Auto-Regression?

Explains the mechanics and trade-offs of modern generative models, contrasting autoregressive transformer pipelines with denoising diffusion processes and detailing why diffusion excels at image generation while transformers dominate text.
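
For reference, these are the two standard formulations the video contrasts, written in the usual DDPM notation (not copied verbatim from the video):

```latex
% Autoregressive factorization (one token at a time):
p_\theta(x) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})

% Diffusion forward (noising) process:
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar\alpha_t}\, x_0,\; (1 - \bar\alpha_t) I\right)

% Simplified denoising training objective:
\mathcal{L} = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[\,\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\,\right]
```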

It's Not About Scale, It's About Abstraction

François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose compositional reasoning gaps, and proposes integrating transformer models with program synthesis to achieve more abstract, generalizable intelligence.


Stephen Wolfram - Where the Computational Paradigm Leads (in Physics, Tech, AI, Biology, Math, ...)

Stephen Wolfram’s keynote explores the broad “computational paradigm” as a unifying lens across physics, technology, AI, biology and mathematics—an ideas-driven talk without focused technical implementation details.

Street Fighting Transformers

Sasha Rush delivers practical estimation techniques for Transformer/LLM models, beneficial for ML researchers and practitioners.
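
The flavor of estimation involved looks like the back-of-the-envelope sketch below; the rules of thumb (roughly 12·L·d² non-embedding parameters and 6·N·D training FLOPs) are standard approximations, and the model shape and token count are illustrative rather than taken from the talk.

```python
# Hypothetical model shape; the formulas are common rules of thumb.
n_layers, d_model = 32, 4096
params = 12 * n_layers * d_model ** 2     # ignores embeddings and biases
tokens = 1e12                             # 1T training tokens (illustrative)
train_flops = 6 * params * tokens         # forward + backward rule of thumb
print(f"params ~ {params / 1e9:.1f}B, training FLOPs ~ {train_flops:.2e}")
```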

How might LLMs store facts | Deep Learning Chapter 7

High-quality educational lecture on how transformers store factual information, directly relevant to AI interpretability.

What Matters for Model Merging at Scale?

Technical summary of a current arXiv paper on large-scale model merging, providing up-to-date insights for ML practitioners.

Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...

Academic talk from the Simons Institute presenting a unified framework for efficient linear layers in Transformers—highly relevant to deep-learning researchers and practitioners.

Joscha Bach - Why Your Thoughts Aren't Yours.

Extended, in-depth interview with AI researcher Joscha Bach covering advanced AI architectures, cognition, and regulatory issues—valuable for AI and cognitive science audiences.

Dylan Patel - Inference Math, Simulation, and AI Megaclusters - Stanford CS 229S - Autumn 2024

Stanford CS 229S lecture on large-scale inference math and AI megaclusters—direct, advanced technical content useful to ML researchers and engineers.
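
As a taste of the inference math, here is the standard KV-cache size calculation for multi-head attention (two cached tensors per layer: keys and values); the model shape below is hypothetical, not the lecture's example.

```python
# Illustrative KV-cache arithmetic for a decoder-only model.
n_layers, n_kv_heads, head_dim = 32, 8, 128
seq_len, batch, bytes_per_elem = 8192, 1, 2   # fp16/bf16
kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
print(f"KV cache ~ {kv_cache_bytes / 2**30:.2f} GiB per sequence")
```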

AI for science with Sir Paul Nurse, Demis Hassabis, Jennifer Doudna, and John Jumper

Panel discussion with leading scientists on how AI accelerates scientific discovery; offers strategic and technical perspectives on AI applications in research.

Cyber Animism by Joscha Bach

In-depth lecture by AI researcher Joscha Bach on philosophical and cognitive aspects of AI, valuable for understanding conceptual foundations and ethics.

Normalization models of attention

Academic tutorial on computational models of visual attention with hands-on MATLAB code; directly relevant for researchers in computational neuroscience and AI.

François Chollet on OpenAI o-models and ARC

Into the Realm Categorical

(Ep.73) DeepSeek CEO interview in English.

Can Latent Program Networks Solve Abstract Reasoning?

The ARC Prize 2024 Winning Algorithm

LSTM: The Comeback Story?

How DeepSeek Rewrote the Transformer [MLA]

In-depth analysis of a transformer variant (DeepSeek MLA) covering architecture, performance, and equations—highly relevant deep-learning material.

ARC-AGI-2 Overview With Francois Chollet

How To Think About Thinking Models

Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months

Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications—useful insight for AI/ML practitioners and researchers.

On the Biology of a Large Language Model (Part 2)

Autoencoders | Deep Learning Animated

AI Olympics (multi-agent reinforcement learning)

What is a Transformer? (Transformer Walkthrough Part 1/2)

In-depth technical walkthrough of Transformer architecture by an AI researcher, directly aligned with deep-learning educational content.

Agentic Engineering in Action with Mitchell Hashimoto

When AI Is Designed Like A Biological Brain

Mind from Matter (Lecture By Joscha Bach)

University lecture by cognitive scientist Joscha Bach examining AI architecture and machine consciousness; fits educational and technical focus on cognition and AI philosophy.

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.

The Attention Mechanism in Large Language Models

Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
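
A minimal NumPy sketch of the scaled dot-product attention the video explains; the shapes (seq_len=4, d_k=8) are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarities, scaled
    weights = softmax(scores)         # each row sums to 1
    return weights @ V                # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)              # shape (4, 8)
```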

Andrew Ng: Opportunities in AI - 2023

Andrew Ng outlines current AI trends, enterprise adoption patterns, and startup opportunities with an emphasis on data-centric supervised learning.

Large Language Models in Five Formulas

Tutorial distills LLM behavior into five key formulas—perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
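
For orientation, the first of those quantities, perplexity, in its usual form (the tutorial's exact notation may differ):

```latex
\mathrm{PPL}(x_{1:T}) = \exp\!\left(-\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})\right)
```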

Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy

Andrej Karpathy kicks off Stanford CS25 with a primer on Transformer architecture, its history, and cross-domain applications.

Dense Associative Memory in Machine Learning

Research talk on Dense Associative Memory networks, exploring high-capacity energy-based models for pattern storage and retrieval.
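
One common formulation of the Dense Associative Memory energy (Krotov and Hopfield), given here only for orientation; the talk's exact notation and variant may differ.

```latex
E(\sigma) = -\sum_{\mu=1}^{K} F\!\left(\xi^{\mu} \cdot \sigma\right),
\qquad F(x) = x^{n}
% Larger n gives a sharper energy landscape and higher storage capacity
% than the classical (n = 2) Hopfield network.
```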

Navigating Progress in AI and Neuroscience

Talk explores reciprocal advances between neuroscience and AI, highlighting how brain insights inform interpretable machine-learning models.

An overview of Generative AI: music, video and image creation

Google DeepMind’s Douglas Eck surveys state-of-the-art generative AI systems for music, video, and images, detailing model architectures and datasets.

1 - Introduction

Jeff Dean (Google): Exciting Trends in Machine Learning

Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.

V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video (Explained)

Paper walk-through of V-JEPA, detailing a predictive video representation model trained without labels for downstream vision tasks.

Let's build the GPT Tokenizer

Andrej Karpathy codes a GPT Byte-Pair-Encoding tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
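
A toy sketch of the frequency-based merge step at the heart of BPE, in the same spirit as the video (which works over UTF-8 byte sequences); the example string and new token id are illustrative.

```python
from collections import Counter

def most_common_pair(ids):
    # Most frequent adjacent pair of token ids.
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` with the new token id.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id); i += 2
        else:
            out.append(ids[i]); i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
pair = most_common_pair(ids)   # most frequent adjacent byte pair
ids = merge(ids, pair, 256)    # 256 = first id beyond the raw byte range
```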

LoRA explained (and a bit about precision and quantization)

Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
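
A minimal NumPy sketch of the LoRA idea the video covers: keep W frozen and learn a low-rank update B·A scaled by alpha/r. Dimensions and hyperparameters are illustrative.

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, low rank
B = np.zeros((d_out, r))               # trainable, initialized to zero

def forward(x):
    # Original path plus the low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = forward(x)   # identical to W @ x until B is trained away from zero
```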

Transformer Neural Network: Visually Explained

Step-by-step visual and PyTorch implementation of the Transformer—covering self-attention, positional encoding, and multi-head mechanisms.
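
A short sketch of the sinusoidal positional encoding described in the walkthrough (sequence length and model width are illustrative); even dimensions use sine, odd dimensions cosine.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                         # (seq_len, 1)
    i = np.arange(d_model)[None, :]                           # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

pe = positional_encoding(seq_len=16, d_model=64)              # (16, 64)
```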

Fine-tune LLMs - Line by line code example

Hands-on Jupyter notebook walks line-by-line through performing LoRA fine-tuning of a large language model using HuggingFace PEFT.
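
A rough outline of a LoRA setup with HuggingFace PEFT, included as a reference point; the base model ("gpt2"), target modules, and hyperparameters here are placeholders rather than the notebook's, and argument names can shift between library versions.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(
    r=8,                         # adapter rank
    lora_alpha=16,               # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection(s) to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()   # only the adapter weights are trainable
```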

AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

Illustrated guide to Stable Diffusion explaining latent-diffusion training, CLIP text encoders, and reverse-diffusion image generation.

Stable Diffusion in Code (AI Image Generation) - Computerphile

Computerphile coding session builds and tweaks Stable Diffusion models in Python/Colab, clarifying sampler parameters and latent spaces.
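
The video works at a lower level, swapping samplers and inspecting latents directly; as a starting point, this is roughly what the equivalent high-level diffusers pipeline looks like. The model id, step count, and guidance scale are illustrative choices, not the video's.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse",
    num_inference_steps=30,   # sampler steps
    guidance_scale=7.5,       # classifier-free guidance strength
).images[0]
image.save("lighthouse.png")
```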

Tutorial | LLMs in 5 Formulas (360°)

Let's build GPT: from scratch, in code, spelled out.

End-to-end coding tutorial constructs a minimal GPT Transformer—including dataset, BPE tokenizer, self-attention, and training loop—from scratch.

George Hotz | Programming | rewriting linearizer (tinygrad) | Day In The Life Of A Software Engineer

George Hotz refactors tinygrad’s linearizer, exposing low-level tensor compiler optimizations that map high-level ops to efficient GPU kernels.

Anthropic's Meta Prompt: A Must-try!

How fly neurons compute the direction of visual motion

How Diffusion Works for Text

H-Nets - the Past

H-Nets - the Future

The State of Generative Models

How I don't use LLMs

Continuous Thought Machines

Activation Atlas

World Models

DeepSeek-V3 Explained 1: Multi-head Latent Attention

How To Scale

Deep Dive into Yann LeCun’s JEPA

a Hugging Face Space by nanotron

On the Biology of a Large Language Model

Contextualization Machines

RWKV Language Model

Neural Networks, Manifolds, and Topology

(How) Do Language Models Track State?

neural video codecs: the future of video compression

Unnamed Document

Unveiling_DeepSeek.pdf

Flow Matching Guide and Code

Greg Yang

TS_Tutorial

2305.20091

Chess-GPT's Internal World Model

Twitter's Recommendation Algorithm

Recommender Systems: A Primer

Reader

Speech-to-text models

A Recipe for Training Neural Networks

1-bit Model

Heatmaps and CNNs Using Fast.ai

KAN: Kolmogorov–Arnold Networks

Root Mean Square Layer Normalization
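
The paper's normalization in brief: rescale by the root-mean-square statistic only, with no mean subtraction, then apply a learned gain g (a small epsilon is added in practice for numerical stability).

```latex
\mathrm{RMS}(x) = \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^{2}},
\qquad
\bar{x}_i = \frac{x_i}{\mathrm{RMS}(x)}\, g_i
```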

Pattern Recognition and Machine Learning

The Little Book of Deep Learning

gemini_v1_5_report

How to Use t-SNE Effectively

Deep Learning Course

2309.10668

Turing (1951): Intelligent Machinery, a Heretical Theory

Self-Rewarding Language Models

Pruning vs Quantization: Which is Better?

Mixtral of Experts

Practical Deep Learning for Coders 2022

Attention? Attention!

Pen and Paper Exercises in Machine Learning

Tensor2Tensor Intro

Generative Agents: Interactive Simulacra of Human Behavior
