Ludwig - cs/computer_architecture/hardware/optimization

Advanced Performance Optimizations for Models

Added on March 29, 2025

TT-NN operator library and TT-Metalium low-level kernel programming model (tenstorrent/tt-metal).

ai/deep_learning

Algorithms for Modern Hardware

Added on November 19, 2024

The book's intended audience is everyone from performance engineers and practical algorithm researchers to undergraduate computer science students who have just finished an advanced algorithms course and want to learn more practical ways to speed up a program than going from O(n log n) to O(n log log n).

cs/software_development/performance_optimization

Optimizing subroutines in assembly language

Added on July 29, 2024

Optimizing subroutines in assembly language covers techniques such as using inline assembly in a C++ compiler, keeping code that uses MMX registers separate from code that uses x87 ST registers, and understanding register sizes and memory operands. It also discusses instruction prefixes, intrinsic functions for vector operations, and efficient access to class and structure members. Preventing false dependences, aligning loop and subroutine entries, and choosing instruction sizes carefully can further improve performance. These optimizations are processor-specific, so their benefit depends on the target platform.

cs/computer_architecture/low_level

Unknown

Added on July 9, 2024

Hardware prefetching in multicore processors can be too aggressive, wasting shared resources and hurting the performance of co-running threads. Combining hardware and software prefetching can improve performance by handling irregular memory accesses more efficiently. A method described in Paper II offers a low-overhead framework for accurate software prefetching in applications with irregular access patterns.
