2024-01-01–ongoing
resources: 1656
tags: 202
last sync: 2025-09-01 20:31

This is a curated collection of browser bookmarks and YouTube videos I've gathered since 1st January 2024. Each resource is classified within a Wikipedia-inspired hierarchical tag system in an attempt to capture and organize my primary areas of interest.

The tag tree evolves continuously with the help of OpenAI's o3, which tends to its health and suggests refinements such as eliminating semantic duplicates, further subdividing overgrown nodes, and ensuring that going down a level always means zooming in on a narrower domain within a given field. This should help keep the whole thing cohesive.

Quality varies, semi-intentionally. There are many high-quality resources, but you'll also find more casual reads. That's because this is not a curated list in the strict sense: the only moment of curation is me deciding something was good or interesting enough to bookmark.

To help with discoverability, there is a fuzzy search bar, and each tag in the tree has a dedicated HTML page. Do note that some bookmarks are only tagged up to intermediate nodes!

Recent Bookmarks

Visualizing 6D Mesh Parallelism

Plus some lore

Inside vLLM: Anatomy of a High-Throughput LLM Inference System

From paged attention, continuous batching, prefix caching, and speculative decoding, to multi-GPU, multi-node dynamic serving at scale.

The Parallelism Mesh Zoo

When training large-scale LLMs, there is a large assortment of parallelization strategies you can employ to scale your training runs across more GPUs.
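
A toy illustration of two of these strategies (the sizes and the two-way split are invented for the example): the same matmul can be sharded along the batch axis (data parallelism) or along the weight's output columns (tensor parallelism), and both reproduce the dense result.

    import numpy as np

    x = np.random.randn(8, 16)   # a batch of 8 activation vectors
    w = np.random.randn(16, 32)  # a weight matrix

    # Data parallelism: each of two "devices" processes half the batch.
    y_dp = np.concatenate([x[:4] @ w, x[4:] @ w], axis=0)

    # Tensor parallelism: each "device" holds half of w's output columns.
    y_tp = np.concatenate([x @ w[:, :16], x @ w[:, 16:]], axis=1)

    assert np.allclose(y_dp, x @ w) and np.allclose(y_tp, x @ w)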

POMDPs for Dummies

A tutorial on solving partially observable Markov decision processes (POMDPs).

How Attention Sinks Keep Language Models Stable

We discovered why language models catastrophically fail on long conversations: when old tokens are removed to save memory, models produce complete gibberish. We found models dump massive attention onto the first few tokens as "attention sinks"—places to park unused attention since softmax requires weights to sum to 1. Our solution, StreamingLLM, simply keeps these first 4 tokens permanently while sliding the window for everything else, enabling stable processing of 4 million+ tokens instead of just thousands. This mechanism is now in HuggingFace, NVIDIA TensorRT-LLM, and OpenAI's latest models.
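A minimal sketch of that eviction policy, assuming a toy cache that stores one entry per token (an illustration of the idea, not the actual StreamingLLM code):

    from collections import deque

    class SinkedCache:
        """Keep the first n_sink tokens forever; slide a fixed window over the rest."""

        def __init__(self, n_sink=4, window=1024):
            self.n_sink = n_sink
            self.sinks = []                     # attention-sink tokens, never evicted
            self.recent = deque(maxlen=window)  # oldest entry drops out when full

        def append(self, kv):
            if len(self.sinks) < self.n_sink:
                self.sinks.append(kv)
            else:
                self.recent.append(kv)

        def visible(self):
            # Attention only ever sees the sinks plus the recent window,
            # so memory stays bounded no matter how long the stream runs.
            return self.sinks + list(self.recent)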

Statistics behind Block Sparse Attention

How can a language model comprehend a million-token document without drowning in O(N²) attention cost? A statistical model that explains the success of block sparse attention through learned similarity gaps.
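
For intuition, here is a toy dense-NumPy version of block sparse attention (the block size, the mean-pooled block summaries, and the top-k selection rule are all assumptions made for illustration): each query block attends only within a few selected key blocks, so cost grows with the number of kept blocks rather than with N².

    import numpy as np

    def block_sparse_attention(q, k, v, block=64, keep=4):
        # q, k, v: arrays of shape (n, d), with n divisible by `block`.
        n, d = q.shape
        nb = n // block
        out = np.zeros_like(q)
        # Cheap block summaries: the mean key vector of each block.
        k_blocks = k.reshape(nb, block, d).mean(axis=1)
        for i in range(nb):
            qi = q[i*block:(i+1)*block]
            # Score whole key blocks against this query block's summary...
            block_scores = qi.mean(axis=0) @ k_blocks.T
            # ...and keep only the `keep` best-matching blocks.
            top = np.argsort(block_scores)[-keep:]
            cols = np.concatenate([np.arange(j*block, (j+1)*block) for j in top])
            s = qi @ k[cols].T / np.sqrt(d)  # dense attention, but only inside kept blocks
            w = np.exp(s - s.max(axis=1, keepdims=True))
            w /= w.sum(axis=1, keepdims=True)
            out[i*block:(i+1)*block] = w @ v[cols]
        return out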

Gödel’s Incompleteness Theorems

Gödel’s two incompleteness theorems are among the most important results in modern logic, with deep implications for the limits of provability in formal axiomatic theories.

What Alan T. did for his PhD

We’ve all been there before: by the time you start graduate school in Princeton, you’ve already invented the Turing machine, pioneered the concept of computational universality, and pro…

Lecture 3: Gödel, Turing, and Friends

On Thursday, I probably should've told you explicitly that I was compressing a whole math course into one lecture. On the one hand, that means I don't really expect you to have understood everything.

The Complete Idiot's Guide to the Independence of the Continuum Hypothesis: Part 1 of <=Aleph_0

A global pandemic, apocalyptic fires, and the possible descent of the US into violent anarchy three days from now can do strange things to the soul. Bertrand Russell—and if he’d done no…

See all bookmarks →

Recent Videos

What P vs NP is actually about

Explains the P vs NP problem by reducing arbitrary algorithms to SAT circuits, illustrating NP-completeness, reversibility, and implications for cryptography.
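
The asymmetry at the heart of the video is easy to demonstrate: verifying a proposed satisfying assignment for a CNF formula takes polynomial time, while finding one is the (presumably) hard part. A small sketch, using a common integer encoding of literals:

    def check_sat(clauses, assignment):
        """Verify a CNF assignment in polynomial time.
        clauses: list of clauses, each a list of ints (positive = variable,
        negative = its negation); assignment: dict mapping variable -> bool."""
        return all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        )

    # (x1 OR NOT x2) AND (x2 OR x3)
    clauses = [[1, -2], [2, 3]]
    print(check_sat(clauses, {1: True, 2: True, 3: False}))    # True: a valid certificate
    print(check_sat(clauses, {1: False, 2: False, 3: False}))  # False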

What Is an Interactive Theorem Prover? | Kevin Buzzard

Live demonstration of the Lean interactive theorem prover, showing how formal logic rules are encoded, manipulated, and verified, and discussing its role in mathematical research and future software tooling.
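
For a taste of what such a system looks like, here is a one-line Lean 4 proof (it leans on the standard-library lemma Nat.add_comm; this example is mine, not taken from the talk):

    -- Commutativity of addition on the naturals, discharged by a library lemma.
    theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b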

A Swift Introduction to Geometric Algebra

Provides a rapid, physics-motivated introduction to geometric algebra, covering multivectors, grades, geometric products, and rotors as an extension of linear-algebraic concepts.
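
For reference, the identity the whole subject builds on: for vectors u and v, the geometric product splits into a symmetric inner part and an antisymmetric outer part,

    uv = u \cdot v + u \wedge v, \qquad
    u \cdot v = \tfrac{1}{2}(uv + vu), \qquad
    u \wedge v = \tfrac{1}{2}(uv - vu)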

A Brief Overview of Sheaf Theory - Part 1

The first lecture in a sheaf-theory series, defining presheaf stalks, sheafification, and exactness concepts such as kernels and images within a categorical framework.
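
For reference, the stalk mentioned here is the direct limit (colimit) of sections over ever-smaller open neighbourhoods of a point x:

    \mathcal{F}_x = \varinjlim_{U \ni x} \mathcal{F}(U)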

Kan Academy: Introduction to Limits

An example-driven primer on categorical limits, building from sets and vector spaces to equalisers, fibre products, cones, and universal properties, aimed at newcomers to abstract category theory.

See all videos →