Bookmarks

PyTorch is dead. Long live JAX.

DragonflyDB Architecture Overview, Internals, and Trade-offs - hitting 6.43 million ops/sec

Technical deep dive into DragonflyDB’s architecture, concurrency model, memory layout, and the trade-offs that enable more than 6 million ops/sec in a Redis-compatible distributed datastore.

Failure & Change: Principles of Reliable Systems • Mark Hibberd • YOW! 2018

Conference talk distilling principles and practices for designing, operating and evolving large-scale, failure-tolerant software systems, with emphasis on complexity, change management and reliability patterns.

Database Scalability

Episode 048: Why TigerBeetle Is So Slow, With Tobi!

A live coding and debugging session with the TigerBeetle team that uses tracing and measurement to diagnose latency bottlenecks in their distributed financial-ledger database and examines the architectural choices behind them.

Dylan Patel - Inference Math, Simulation, and AI Megaclusters - Stanford CS 229S - Autumn 2024

Stanford CS 229S lecture on large-scale inference math and AI megaclusters: advanced technical material for ML researchers and engineers.

You Don't Know Network Programming

Hands-on coding session on low-level TCP and network programming, recorded as a live Twitch stream, with references to code and technical articles.

Designing Distributed Systems with TLA+ • Hillel Wayne • YOW! 2019

System Design for Next-Gen Frontier Models — Dylan Patel, SemiAnalysis

A distributed systems reliability glossary

Why Resonate

Consistency Models

a Hugging Face Space by nanotron

6.824 Schedule: Spring 2022

Causal ordering

Internal consistency in streaming systems

You own your data, in spite of the cloud

A Distributed Systems Reading List

Subcategories