Bookmarks
George Hotz | Programming | twitchchess | a simple neural chess AI | Part1
Live coding session where George Hotz designs and trains a simple neural-network chess engine, examining model architecture, training loop, and gameplay integration.
AI Predictions With ex-Applied AI engineer at Stripe!
Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.
Simple Artificial Neural Network entirely in assembly language
Demonstrates building and training a single-layer neural network entirely in x86-64 assembly language, covering forward pass, MSE loss, back-propagation, and low-level numeric routines.
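For reference, the same pipeline in a few lines of NumPy; this is a sketch of the algorithm the assembly version implements, not the assembly itself, and the sizes, seed, and learning rate are arbitrary.

```python
import numpy as np

# Single layer, sigmoid activation, MSE loss, gradient-descent update.
rng = np.random.default_rng(0)
X = rng.random((4, 3))                 # toy inputs
y = rng.random((4, 1))                 # toy targets
W = rng.standard_normal((3, 1)) * 0.1
b = np.zeros((1,))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    out = sigmoid(X @ W + b)                       # forward pass
    err = out - y
    loss = (err ** 2).mean()                       # MSE loss
    grad_z = 2 * err * out * (1 - out) / len(X)    # chain rule through sigmoid
    W -= 0.5 * X.T @ grad_z                        # back-propagation step
    b -= 0.5 * grad_z.sum(axis=0)
```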
Why Does Diffusion Work Better than Auto-Regression?
Explains the mechanics and trade-offs of modern generative models, contrasting autoregressive transformer pipelines with denoising diffusion processes and detailing why diffusion excels at image generation while transformers dominate text.
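A toy contrast of the two sampling regimes the video compares; `model` and `denoiser` are hypothetical stand-ins, and the denoising update is a deliberately crude Euler-style step rather than a faithful DDPM sampler.

```python
import numpy as np

def sample_autoregressive(model, prompt, n_new):
    """One token at a time, each conditioned on everything generated so far."""
    tokens = list(prompt)
    for _ in range(n_new):
        logits = model(tokens)                # hypothetical next-token predictor
        tokens.append(int(np.argmax(logits)))
    return tokens

def sample_diffusion(denoiser, shape, n_steps):
    """Start from pure noise and refine the whole sample in parallel at every step."""
    x = np.random.randn(*shape)
    for t in reversed(range(n_steps)):
        eps_hat = denoiser(x, t)              # hypothetical noise predictor
        x = x - eps_hat / n_steps             # crude step toward the data
    return x
```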
It's Not About Scale, It's About Abstraction
François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose compositional reasoning gaps, and proposes combining deep learning with program synthesis to achieve more abstract, generalizable intelligence.
Stephen Wolfram - Where the Computational Paradigm Leads (in Physics, Tech, AI, Biology, Math, ...)
Stephen Wolfram’s keynote explores the broad “computational paradigm” as a unifying lens across physics, technology, AI, biology and mathematics—an ideas-driven talk without focused technical implementation details.
Street Fighting Transformers
Sasha Rush delivers practical estimation techniques for Transformer/LLM models, beneficial for ML researchers and practitioners.
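In the same back-of-envelope spirit, two standard rules of thumb from the scaling-law literature (assumed here, not necessarily the exact formulas used in the talk): non-embedding parameters scale roughly as 12 · n_layer · d_model², and training compute as roughly 6 · N · D FLOPs for N parameters and D tokens.

```python
n_layer, d_model = 32, 4096      # hypothetical GPT-style configuration
tokens = 1.0e12                  # hypothetical training-set size in tokens

params = 12 * n_layer * d_model ** 2           # rough non-embedding parameter count
train_flops = 6 * params * tokens              # ~6N FLOPs per training token
print(f"~{params / 1e9:.1f}B parameters, ~{train_flops:.2e} training FLOPs")
```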
How might LLMs store facts | Deep Learning Chapter 7
High-quality educational lecture on how transformers store factual information, directly relevant to AI interpretability.
What Matters for Model Merging at Scale?
Technical summary of a current arXiv paper on large-scale model merging, providing up-to-date insights for ML practitioners.
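The simplest baseline such merging studies compare against is a weighted average of parameter tensors; a minimal sketch of that baseline (not the paper's own recipe), assuming the checkpoints share an architecture and key set.

```python
import torch.nn as nn

def average_state_dicts(state_dicts, weights=None):
    """Weighted average of checkpoints with identical architectures and keys."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Toy usage: merge two copies of the same small model.
a, b = nn.Linear(4, 4), nn.Linear(4, 4)
merged = average_state_dicts([a.state_dict(), b.state_dict()])
```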
Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...
Academic talk from the Simons Institute presenting a unified framework for efficient linear layers in Transformers—highly relevant to deep-learning researchers and practitioners.
Joscha Bach - Why Your Thoughts Aren't Yours.
Extended, in-depth interview with AI researcher Joscha Bach covering advanced AI architectures, cognition, and regulatory issues—valuable for AI and cognitive science audiences.
Dylan Patel - Inference Math, Simulation, and AI Megaclusters - Stanford CS 229S - Autumn 2024
Stanford CS 229S lecture on large-scale inference math and AI megaclusters—direct, advanced technical content useful to ML researchers and engineers.
AI for science with Sir Paul Nurse, Demis Hassabis, Jennifer Doudna, and John Jumper
Panel discussion with leading scientists on how AI accelerates scientific discovery; offers strategic and technical perspectives on AI applications in research.
Cyber Animism by Joscha Bach
In-depth lecture by AI researcher Joscha Bach on philosophical and cognitive aspects of AI, valuable for understanding conceptual foundations and ethics.
Normalization models of attention
Academic tutorial on computational models of visual attention with hands-on MATLAB code; directly relevant for researchers in computational neuroscience and AI.
How DeepSeek Rewrote the Transformer [MLA]
In-depth analysis of a transformer variant (DeepSeek MLA) covering architecture, performance, and equations—highly relevant deep-learning material.
Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months
Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications—useful insight for AI/ML practitioners and researchers.
What is a Transformer? (Transformer Walkthrough Part 1/2)
In-depth technical walkthrough of Transformer architecture by an AI researcher, directly aligned with deep-learning educational content.
Mind from Matter (Lecture By Joscha Bach)
University lecture by cognitive scientist Joscha Bach examining AI architecture and machine consciousness; fits educational and technical focus on cognition and AI philosophy.
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.
The Attention Mechanism in Large Language Models
Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
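A minimal NumPy rendering of the formula the video visualizes, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, for a single unmasked head with 2-D inputs.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays for one head; returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values
```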
Andrew Ng: Opportunities in AI - 2023
Andrew Ng outlines current AI trends, enterprise adoption patterns, and startup opportunities with an emphasis on data-centric supervised learning.
Large Language Models in Five Formulas
Tutorial distills LLM behavior into five key formulas—perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
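The perplexity formula, for instance, is just the exponential of the average negative log-likelihood per token; a small sketch with made-up probabilities.

```python
import math

def perplexity(token_log_probs):
    """token_log_probs: natural-log probabilities the model assigned to each observed token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

print(perplexity([math.log(0.25)] * 8))  # uniform over 4 choices -> perplexity 4.0
```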
Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy
Andrej Karpathy kicks off Stanford CS25 with a primer on Transformer architecture, its history, and cross-domain applications.
Dense Associative Memory in Machine Learning
Research talk on Dense Associative Memory networks, exploring high-capacity energy-based models for pattern storage and retrieval.
Navigating Progress in AI and Neuroscience
Talk explores reciprocal advances between neuroscience and AI, highlighting how brain insights inform interpretable machine-learning models.
An overview of Generative AI: music, video and image creation
Google DeepMind’s Douglas Eck surveys state-of-the-art generative AI systems for music, video, and images, detailing model architectures and datasets.
Jeff Dean (Google): Exciting Trends in Machine Learning
Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.
V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video (Explained)
Paper walk-through of V-JEPA, detailing a predictive video representation model trained without labels for downstream vision tasks.
Let's build the GPT Tokenizer
Andrej Karpathy codes a GPT Byte-Pair-Encoding tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
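The core of the BPE training loop is tiny: count adjacent byte pairs, merge the most frequent pair into a new token id, repeat. A condensed sketch in the spirit of the video, not Karpathy's exact code.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent id pairs and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id); i += 2
        else:
            out.append(ids[i]); i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))   # start from raw bytes
pair = most_frequent_pair(ids)              # most frequent adjacent byte pair
ids = merge(ids, pair, 256)                 # 256 = first id beyond the byte range
```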
LoRA explained (and a bit about precision and quantization)
Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
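A minimal sketch of the idea: keep the pretrained weight frozen and learn only a rank-r update scaled by alpha/r. Initialization and shapes follow the LoRA paper; the hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```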
Transformer Neural Network: Visually Explained
Step-by-step visual and PyTorch implementation of the Transformer—covering self-attention, positional encoding, and multi-head mechanisms.
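One piece named here that is easy to show in isolation is the fixed sinusoidal positional encoding from the original Transformer paper; a small NumPy sketch, assuming an even d_model.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...);
    added to token embeddings before the first layer."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```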
Fine-tune LLMs - Line by line code example
Hands-on Jupyter notebook walks line-by-line through performing LoRA fine-tuning of a large language model using HuggingFace PEFT.
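With HuggingFace PEFT the setup is only a few lines; the checkpoint and target modules below are illustrative (GPT-2's fused attention projection), not necessarily the notebook's choices.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder checkpoint

config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],        # GPT-2's fused Q/K/V projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()    # only the adapter weights are trainable
```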
AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)
Illustrated guide to Stable Diffusion explaining latent-diffusion training, CLIP text encoders, and reverse-diffusion image generation.
Stable Diffusion in Code (AI Image Generation) - Computerphile
Computerphile coding session builds and tweaks Stable Diffusion models in Python/Colab, clarifying sampler parameters and latent spaces.
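A minimal diffusers sketch of the kind of pipeline the session works with; the checkpoint name and sampler settings are illustrative, not taken from the video.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "an astronaut riding a horse, oil painting",
    num_inference_steps=30,      # fewer sampler steps -> faster, coarser images
    guidance_scale=7.5,          # classifier-free guidance strength
).images[0]
image.save("out.png")
```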
Let's build GPT: from scratch, in code, spelled out.
End-to-end coding tutorial constructs a minimal GPT Transformer from scratch, including dataset preparation, a character-level tokenizer, causal self-attention, and the training loop.
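The heart of the lecture is a single causal self-attention head; a condensed PyTorch sketch of that step, with illustrative dimensions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C, head_size = 4, 8, 32, 16          # batch, time, channels, head width
x = torch.randn(B, T, C)

key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
k, q, v = key(x), query(x), value(x)       # each (B, T, head_size)

wei = q @ k.transpose(-2, -1) * head_size ** -0.5    # scaled affinities, (B, T, T)
mask = torch.tril(torch.ones(T, T))                  # causal mask: no attending to the future
wei = wei.masked_fill(mask == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)
out = wei @ v                                        # (B, T, head_size)
```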
George Hotz | Programming | rewriting linearizer (tinygrad) | Day In The Life Of A Software Engineer
George Hotz refactors tinygrad’s linearizer, exposing low-level tensor compiler optimizations that map high-level ops to efficient GPU kernels.
Subcategories
- architectures (51)
- compression (12)
- computer_vision (11)
- ethics (12)
- game_ai (4)
- generative_models (36)
- interpretability (33)
- language_models (72)
- natural_language_processing (16)
- neural_networks (26)
- optimization (19)
- reinforcement_learning (16)