Bookmarks

Who needs malloc anyways?

Live-coded session that implements a bump allocator from scratch, demonstrates eliminating dynamic malloc calls during module initialization, and refactors dependent modules to use the custom allocator.

Proficient Parallel Programming - King Butcher - Software You Can Love VC 2023

Practical talk on parallel programming that outlines why naïve multithreading can degrade performance and details debugging techniques, cache effects, and CPU utilization strategies for writing efficient concurrent code.

Improving Learn OpenGL's Text Rendering Example | Adventures in Coding

Walk-through of refactoring LearnOpenGL’s font-rendering sample—profiling GPU/CPU bottlenecks, redesigning glyph batching, and achieving a 10× frame-rate improvement.

Simple Code, High Performance

A case study demonstrating how refactoring large, complex codebases into simpler designs can yield order-of-magnitude speedups, with detailed profiling and optimization techniques.

Understanding Compiler Optimization - Chandler Carruth - Opening Keynote Meeting C++ 2015

Chandler Carruth explains how modern C++ compilers perform optimization passes, inlining, and code generation, helping developers write code that the optimizer can better transform into efficient machine instructions.

CppCon 2016: Timur Doumler “Want fast C++? Know your hardware!”

CppCon talk illustrating how cache hierarchies, branch prediction, alignment, and SIMD influence C++ performance and providing guidelines for writing hardware-conscious, high-speed code.

Casey Muratori on his work experience

A wide-ranging interview with Casey Muratori focusing on lessons learned from decades of writing highly optimized code and cultivating a performance-aware programming mindset.

C++ cache locality and branch predictability

Practical C++ demonstration of how cache locality and branch prediction affect real-world runtime, showcasing code patterns and optimizations to exploit modern CPU behaviour for faster programs.

Episode 048: Why TigerBeetle Is So Slow, With Tobi!

A live coding / debugging session with the TigerBeetle team that digs into tracing, measurement and architectural choices to diagnose the latency bottlenecks inside their distributed financial-ledger database.

04 CUDA Fundamental Optimization Part 2

Lecture on fundamental CUDA optimizations, offering specialized technical guidance for high-performance GPU computing.

Enter The Arena: Simplifying Memory Management (2023)

In-depth talk on arena allocators and lifetime management, offering practical memory-management strategies for systems programmers.

CppCon 2018: Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”

CppCon lecture presenting data-oriented design versus OOP with concrete performance case studies—highly relevant to C++ practitioners.

Linking can be fast (if you cheat): Roc's Surgical Linker - Brendan Hansknecht

Conference talk detailing a novel fast linking approach for the Roc language, directly relevant to compilers/linkers and build performance.

CppCon 2017: Carl Cook “When a Microsecond Is an Eternity: High Performance Trading Systems in C++”

CppCon talk presenting in-depth techniques for ultra-low-latency C++ systems, directly relevant to performance-critical software engineering.

CppCon 2018: Alan Talbot “Moving Faster: Everyday efficiency in modern C++”

CppCon talk focused on everyday performance techniques in modern C++, directly useful for software engineers concerned with optimization.

RollerCoaster Tycoon was the last of its kind.

BLAZINGLY FAST C++ Optimizations

Video focused on techniques for writing high-performance C++ code.

AI Coding: Compute Shaders vs CUDA vs openCL - Dr. Fuhua (Frank) Cheng

Dennis Gustafsson – Parallelizing the physics solver – BSC 2025

Conference talk detailing techniques for parallelizing a physics solver; highly relevant to concurrency and performance optimization in software engineering.

WHY IS THE HEAP SO SLOW?

Parsing Protobuf Like Never Before

On Bloat

Why is Yazi fast?

Data-Oriented Design

On Competing with C Using Haskell

Performance

Daniel Lemire's blog

Program tuning as a resource allocation problem

How web bloat impacts users with slow connections

applicative-mental-models

Cache-Oblivious Algorithms

Unnamed Document

When Network is Faster than Cache

resume.txt

Text Buffer Reimplementation

Your ABI is Probably Wrong

Why is Python slow

John Carmack on Inlined Code

zackoverflow

Subcategories