Bookmarks

AI Predictions With ex-Applied AI engineer at Stripe!

Fireside-style discussion with a former Stripe applied-AI engineer about the technical evolution of GPT-3/4 and other generative language models, and the broader ethical, workforce, and AGI implications of such transformer-based systems.

It's Not About Scale, It's About Abstraction

François Chollet’s AGI-24 keynote critiques current LLM capabilities, uses ARC benchmark results to expose gaps in compositional reasoning, and proposes integrating transformer models with program synthesis to achieve more abstract, generalizable intelligence.

Street Fighting Transformers

Sasha Rush presents practical back-of-envelope estimation techniques for Transformer-based language models, useful for ML researchers and practitioners.
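
To give the flavor of such estimates, here is a back-of-envelope parameter count for a decoder-only Transformer; this is a sketch of the common 12·L·d² rule of thumb, not code from the talk, and the example shape is illustrative:

    def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
        """Back-of-envelope parameter count for a decoder-only Transformer.

        Roughly 4*d^2 for the attention projections (Q, K, V, output) plus
        8*d^2 for a 4x-wide MLP gives ~12*d^2 per layer; embeddings add
        vocab_size * d_model (assuming tied input/output embeddings).
        """
        per_layer = 12 * d_model ** 2
        embeddings = vocab_size * d_model
        return n_layers * per_layer + embeddings

    # Illustrative GPT-2-small-like shape: 12 layers, d_model=768, ~50k vocab.
    print(f"{approx_params(12, 768, 50_257):,}")  # roughly 124M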

How might LLMs store facts | Deep Learning Chapter 7

High-quality educational lecture on how transformers store factual information, directly relevant to AI interpretability.

What Matters for Model Merging at Scale?

Technical summary of a recent arXiv paper on large-scale model merging, providing up-to-date insights for ML practitioners.
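
For context, the simplest merging baseline is a (weighted) parameter average across checkpoints; the toy sketch below only illustrates that idea and is not the paper's method (names and shapes are made up):

    import numpy as np

    def merge_checkpoints(state_dicts, weights=None):
        """Simplest merging baseline: a (weighted) average of parameters,
        assuming all checkpoints share the same architecture and keys."""
        weights = weights or [1 / len(state_dicts)] * len(state_dicts)
        merged = {}
        for key in state_dicts[0]:
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        return merged

    # Toy "checkpoints" holding a single 2x2 weight each.
    a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
    b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]])}
    print(merge_checkpoints([a, b])["layer.weight"])  # element-wise mean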

François Chollet on OpenAI o-models and ARC

Building Anthropic | A conversation with our co-founders

The ARC Prize 2024 Winning Algorithm

LSTM: The Comeback Story?

How DeepSeek Rewrote the Transformer [MLA]

In-depth analysis of DeepSeek's Multi-head Latent Attention (MLA), covering the architecture, performance, and underlying equations; highly relevant deep-learning material.

Mark Zuckerberg – AI Will Write Most Meta Code in 18 Months

Long-form interview with Mark Zuckerberg discussing Llama 4, productivity gains from AI coding tools, and broader AGI implications—useful insight for AI/ML practitioners and researchers.

Neel Does Research (Vibe Coding Edition)

DeepMind’s AlphaEvolve AI: History In The Making!

CUDA Mode Keynote | Andrej Karpathy | Eureka Labs

Agentic Engineering in Action with Mitchell Hashimoto

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

In-depth review of a recent research paper on Energy-Based Transformers, offering technical insights into advanced deep-learning architectures.

The Attention Mechanism in Large Language Models

Visual, high-level explanation of scaled dot-product attention and why it enables large language models to capture long-range dependencies.
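
For reference, a minimal NumPy sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for a single head with no masking (the shapes below are illustrative):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (seq_q, seq_k) similarity logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # weighted sum of value vectors

    # Toy example: 4 query/key positions, d_k = 8.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)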

Large Language Models in Five Formulas

Tutorial that distills LLM behavior into five key formulas: perplexity, attention, GEMM efficiency, scaling laws, and RASP reasoning.
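
As an example of the first formula, perplexity is just the exponentiated average negative log-likelihood; a small sketch with made-up token probabilities:

    import math

    def perplexity(token_probs):
        """PPL = exp(-1/N * sum(log p_i)): the exponentiated average
        negative log-likelihood the model assigns to the observed tokens."""
        nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(nll)

    # Hypothetical per-token probabilities from a language model.
    print(perplexity([0.25, 0.10, 0.50, 0.05]))  # higher when the model is more "surprised"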

Jeff Dean (Google): Exciting Trends in Machine Learning

Jeff Dean reviews recent algorithmic and hardware advances enabling Gemini-class multimodal LLMs and highlights scientific applications.

Let's build the GPT Tokenizer

Andrej Karpathy codes a GPT byte-pair encoding (BPE) tokenizer from scratch, dissecting Unicode handling and frequency-based merges.
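
The heart of BPE training is a loop that repeatedly merges the most frequent adjacent token pair; below is a minimal sketch over raw UTF-8 bytes, illustrating the idea rather than reproducing Karpathy's implementation:

    from collections import Counter

    def most_frequent_pair(ids):
        """Count adjacent pairs and return the most common one."""
        return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

    def merge(ids, pair, new_id):
        """Replace every occurrence of `pair` in `ids` with `new_id`."""
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        return out

    ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
    for step in range(3):                      # three illustrative merges
        pair = most_frequent_pair(ids)
        ids = merge(ids, pair, 256 + step)     # new token IDs start above the byte range
    print(ids)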

LoRA explained (and a bit about precision and quantization)

Concise primer on LoRA and QLoRA, showing how low-rank adapters enable parameter-efficient fine-tuning of Transformer models under quantization.
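
The core idea in one equation: the frozen weight W is augmented with a low-rank update, h = W x + (alpha/r) * B A x, with rank r much smaller than the layer width; a toy NumPy sketch with made-up dimensions:

    import numpy as np

    d, k, r, alpha = 512, 512, 8, 16        # illustrative sizes; r << d
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d, k))             # frozen pretrained weight
    A = rng.normal(size=(r, k)) * 0.01      # trainable, small random init
    B = np.zeros((d, r))                    # trainable, zero init so the update starts at 0

    def lora_forward(x):
        """h = W x + (alpha / r) * B (A x); only A and B are trained."""
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = rng.normal(size=(k,))
    print(lora_forward(x).shape)            # (512,)
    # Trainable params: d*r + r*k versus d*k for full fine-tuning.
    print(d * r + r * k, "vs", d * k)       # 8192 vs 262144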

Fine-tune LLMs - Line by line code example

Hands-on Jupyter notebook that walks line by line through LoRA fine-tuning of a large language model with Hugging Face PEFT.
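
A hedged sketch of the usual PEFT recipe referenced here; the checkpoint name and hyperparameters below are placeholders, not values from the notebook:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model_name = "some/causal-lm"  # placeholder checkpoint, not from the notebook
    model = AutoModelForCausalLM.from_pretrained(model_name)

    config = LoraConfig(
        r=8,                                  # rank of the low-rank update
        lora_alpha=16,                        # scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # module names vary by architecture
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()        # only the adapter weights are trainable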

Tutorial | LLMs in 5 Formulas (360°)

Anthropic's Meta Prompt: A Must-try!

Sholto Douglas & Trenton Bricken - How LLMs Actually Think

What's next for AI agentic workflows ft. Andrew Ng of AI Fund

How I don't use LLMs

a Hugging Face Space by nanotron

cs ai

Contextualization Machines

Unveiling_DeepSeek.pdf

The MiniPile Challenge for Data-Efficient Language Models

[2305.13009] Textually Pretrained Speech Language Models

The Annotated Transformer

gemini_v1_5_report

2309.10668

mlx-examples/lora at main · ml-explore/mlx-examples · GitHub

Mixtral of Experts

An Intuition for Attention

Tensor2Tensor Intro

The Random Transformer

MotionGPT: Human Motion as a Foreign Language
