Bookmarks
Who needs malloc anyways?
Live-coded session that implements a bump allocator from scratch, demonstrates eliminating dynamic malloc calls during module initialization, and refactors dependent modules to use the custom allocator.
Proficient Parallel Programming - King Butcher - Software You Can Love VC 2023
Practical talk on parallel programming that outlines why naïve multithreading can degrade performance and details debugging techniques, cache effects, and CPU utilization strategies for writing efficient concurrent code.
Improving Learn OpenGL's Text Rendering Example | Adventures in Coding
Walk-through of refactoring LearnOpenGL’s font-rendering sample: profiling GPU/CPU bottlenecks, redesigning glyph batching, and achieving a 10× frame-rate improvement.
Simple Code, High Performance
A case study demonstrating how refactoring large, complex codebases into simpler designs can yield order-of-magnitude speedups, with detailed profiling and optimization techniques.
Understanding Compiler Optimization - Chandler Carruth - Opening Keynote Meeting C++ 2015
Chandler Carruth explains how modern C++ compilers perform optimization passes, inlining, and code generation, helping developers write code that the optimizer can better transform into efficient machine instructions.
CppCon 2016: Timur Doumler “Want fast C++? Know your hardware!”
CppCon talk illustrating how cache hierarchies, branch prediction, alignment, and SIMD influence C++ performance and providing guidelines for writing hardware-conscious, high-speed code.
Casey Muratori on his work experience
A wide-ranging interview with Casey Muratori focusing on lessons learned from decades of writing highly optimized code and cultivating a performance-aware programming mindset.
C++ cache locality and branch predictability
Practical C++ demonstration of how cache locality and branch prediction affect real-world runtime, showcasing code patterns and optimizations to exploit modern CPU behaviour for faster programs.
Episode 048: Why TigerBeetle Is So Slow, With Tobi!
A live coding / debugging session with the TigerBeetle team that digs into tracing, measurement and architectural choices to diagnose the latency bottlenecks inside their distributed financial-ledger database.
04 CUDA Fundamental Optimization Part 2
Lecture on fundamental CUDA optimizations, offering specialized technical guidance for high-performance GPU computing.
Enter The Arena: Simplifying Memory Management (2023)
In-depth talk on arena allocators and lifetime management, offering practical memory-management strategies for systems programmers.
CppCon 2018: Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”
CppCon lecture comparing data-oriented design with OOP through concrete performance case studies; highly relevant to C++ practitioners.
Linking can be fast (if you cheat): Roc's Surgical Linker - Brendan Hansknecht
Conference talk detailing a novel fast linking approach for the Roc language, directly relevant to compilers/linkers and build performance.
CppCon 2017: Carl Cook “When a Microsecond Is an Eternity: High Performance Trading Systems in C++”
CppCon talk presenting in-depth techniques for ultra-low-latency C++ systems, directly relevant to performance-critical software engineering.
CppCon 2018: Alan Talbot “Moving Faster: Everyday efficiency in modern C++”
CppCon talk focused on everyday performance techniques in modern C++, directly useful for software engineers concerned with optimization.
BLAZINGLY FAST C++ Optimizations
Video focused on techniques for writing high-performance C++ code, with an emphasis on software optimization and best practices.
Dennis Gustafsson – Parallelizing the physics solver – BSC 2025
Conference talk detailing techniques for parallelizing a physics solver; highly relevant to concurrency and performance optimization in software engineering.
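
Two bookmarks above ("Who needs malloc anyways?" and "Enter The Arena") deal with bump and arena allocators. For readers who have not met the idea, here is a minimal sketch in C++. It is not taken from either talk: the class name `BumpAllocator` and its methods are hypothetical, and the design assumes a fixed, caller-provided buffer with no per-allocation free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a caller-provided buffer.
// Allocation only moves an offset forward; memory is released
// all at once by calling reset().
class BumpAllocator {
public:
    BumpAllocator(void* buffer, std::size_t capacity)
        : base_(static_cast<std::uint8_t*>(buffer)),
          capacity_(capacity),
          offset_(0) {}

    // Returns `size` bytes aligned to `align` (must be a power of two),
    // or nullptr when the buffer cannot satisfy the request.
    void* alloc(std::size_t size,
                std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);

        std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(base_) + offset_;
        // Round the current address up to the next multiple of `align`.
        std::uintptr_t aligned =
            (current + (align - 1)) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t padding = static_cast<std::size_t>(aligned - current);

        std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;  // out of space
        }
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Discards every allocation at once by rewinding the offset.
    void reset() { offset_ = 0; }

    // Bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::uint8_t* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

A typical use is to back the allocator with a static or stack buffer, hand out memory during an initialization phase, and call `reset()` when that phase's data is no longer needed. The trade-off is that individual allocations cannot be freed, and objects with non-trivial destructors must be destroyed by the caller.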