Bookmarks
AI Predictions With ex-Applied AI engineer at Stripe!
Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.
It's Not About Scale, It's About Abstraction
François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose compositional reasoning gaps, and proposes integrating transformer models with program synthesis to achieve more abstract, generalizable language intelligence.
Street Fighting Transformers
Sasha Rush presents practical back-of-the-envelope estimation techniques for Transformer LLMs, useful for ML researchers and practitioners.
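A minimal sketch in that spirit, using the standard approximations (params ≈ 12 · n_layers · d_model², training FLOPs ≈ 6 · params · tokens, 2 bytes per fp16 weight) rather than anything taken verbatim from the talk:

```python
# Back-of-the-envelope Transformer estimates (standard approximations,
# not taken from the talk): params ~ 12 * n_layers * d_model^2,
# training FLOPs ~ 6 * params * tokens, fp16 weights ~ 2 bytes/param.

def estimate(n_layers: int, d_model: int, tokens: float) -> None:
    params = 12 * n_layers * d_model**2       # ignores embeddings
    train_flops = 6 * params * tokens         # forward + backward passes
    weight_gb = 2 * params / 1e9              # fp16/bf16 storage
    print(f"params ~ {params/1e9:.1f}B, "
          f"weights ~ {weight_gb:.0f} GB (fp16), "
          f"train FLOPs ~ {train_flops:.2e}")

# Example: a GPT-3-scale configuration trained on 300B tokens.
estimate(n_layers=96, d_model=12288, tokens=300e9)
```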
How might LLMs store facts | Deep Learning Chapter 7
High-quality educational lecture on how transformers store factual information, directly relevant to AI interpretability.
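A toy numpy sketch of the "MLP as key-value memory" picture the chapter builds on; the weights, dimensions, and ReLU nonlinearity here are illustrative choices, not the video's exact setup:

```python
import numpy as np

# Toy "MLP as key-value memory" view: each row of W_in acts as a key that a
# residual-stream vector can match, and the corresponding row of W_out is the
# value written back when that key fires. All numbers are illustrative.
d_model, d_mlp = 8, 32
rng = np.random.default_rng(0)
W_in = rng.normal(size=(d_mlp, d_model))    # keys
W_out = rng.normal(size=(d_mlp, d_model))   # values

def mlp(x):
    gates = np.maximum(W_in @ x, 0.0)       # how strongly each key matches
    return gates @ W_out                    # weighted sum of value vectors

x = rng.normal(size=d_model)
print(mlp(x).shape)  # (8,) -- a vector added back into the residual stream
```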
What Matters for Model Merging at Scale?
Technical summary of a recent arXiv paper on large-scale model merging, providing up-to-date insights for ML practitioners.
How DeepSeek Rewrote the Transformer [MLA]
In-depth analysis of DeepSeek's Multi-head Latent Attention (MLA) variant of the transformer, covering its architecture, performance, and the underlying equations; highly relevant deep-learning material.
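A toy sketch of the central idea only, caching a shared low-rank latent instead of full per-head keys and values; the projections and dimensions are made up, and DeepSeek's actual formulation (e.g. its separate handling of rotary embeddings) is more involved:

```python
import numpy as np

# Toy sketch of the latent-KV idea behind MLA (not DeepSeek's exact math):
# instead of caching full per-head K and V, cache a small latent c = x @ W_dkv
# and reconstruct K/V from it with per-head up-projections.
d_model, d_latent, n_heads, d_head = 64, 8, 4, 16
rng = np.random.default_rng(0)
W_dkv = rng.normal(size=(d_model, d_latent)) * 0.1         # down-projection (its output is cached)
W_uk = rng.normal(size=(n_heads, d_latent, d_head)) * 0.1  # up-project latent -> keys
W_uv = rng.normal(size=(n_heads, d_latent, d_head)) * 0.1  # up-project latent -> values

x = rng.normal(size=(10, d_model))        # 10 cached token positions
c_kv = x @ W_dkv                          # (10, d_latent): this is all we store
k = np.einsum("tl,hld->htd", c_kv, W_uk)  # (n_heads, 10, d_head)
v = np.einsum("tl,hld->htd", c_kv, W_uv)
print(c_kv.nbytes, "bytes cached vs", k.nbytes + v.nbytes, "for full K/V")
```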
Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months
Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications—useful insight for AI/ML practitioners and researchers.
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.
The Attention Mechanism in Large Language Models
Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
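A minimal single-head sketch of the formula the video visualizes, softmax(QKᵀ/√d_k)·V, in plain numpy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_q, seq_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # each output mixes all values

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))
K = rng.normal(size=(5, 16))
V = rng.normal(size=(5, 16))
print(scaled_dot_product_attention(Q, K, V).shape)      # (5, 16)
```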
Large Language Models in Five Formulas
Tutorial distills LLM behavior into five key formulas—perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
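One of the five, perplexity, is compact enough to sketch directly; the per-token probabilities below are made up for illustration:

```python
import math

# Perplexity from the probabilities a model assigns to the true next tokens:
# PPL = exp( -(1/N) * sum(log p_i) ). Probabilities here are invented.
token_probs = [0.25, 0.10, 0.60, 0.05, 0.30]
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"perplexity = {math.exp(nll):.2f}")
```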
Jeff Dean (Google): Exciting Trends in Machine Learning
Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.
Let's build the GPT Tokenizer
Andrej Karpathy codes a GPT Byte-Pair-Encoding tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
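A stripped-down version of the same training loop (a single byte sequence, no regex pre-splitting, so much simpler than the video's final tokenizer):

```python
from collections import Counter

# Minimal byte-pair-encoding training loop in the spirit of the video.
def train_bpe(text: str, num_merges: int):
    ids = list(text.encode("utf-8"))           # start from raw bytes (0..255)
    merges = {}                                # (a, b) -> new token id
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))     # count adjacent pairs
        if not pairs:
            break
        top = pairs.most_common(1)[0][0]       # most frequent pair
        merges[top] = new_id
        out, i = [], 0
        while i < len(ids):                    # replace every occurrence of the pair
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == top:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
    return merges, ids

merges, ids = train_bpe("low lower lowest low low", num_merges=5)
print(len(merges), "merges learned; sequence length:", len(ids))
```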
LoRA explained (and a bit about precision and quantization)
Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
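A toy numpy sketch of the LoRA update itself, y = xW + (α/r)·xAB with W frozen and only A, B trainable; shapes and initializations are illustrative:

```python
import numpy as np

# LoRA in one line of math: the frozen weight W is augmented by a low-rank
# update (alpha / r) * A @ B, where only A and B are trained. Shapes are toy.
d_in, d_out, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))        # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01     # trainable down-projection
B = np.zeros((r, d_out))                  # trainable up-projection, zero-init

x = rng.normal(size=(2, d_in))
y = x @ W + (alpha / r) * (x @ A @ B)     # adapter adds nothing until B is trained
print(y.shape, "extra trainable params:", A.size + B.size, "vs frozen:", W.size)
```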
Fine-tune LLMs - Line by line code example
Hands-on Jupyter notebook that walks line by line through LoRA fine-tuning of a large language model with HuggingFace PEFT.
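A minimal PEFT setup in the same vein, not the notebook's exact code; it assumes the small public gpt2 checkpoint and its fused c_attn projection as the LoRA target:

```python
# Minimal HuggingFace PEFT LoRA setup (illustrative, not the notebook's code).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor (alpha / r)
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# From here, train with the usual HuggingFace Trainer on your dataset.
```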