Bookmarks

A Beginner's Guide to Vectorization By Hand: Part 3

We're continuing our expedition into the world of manual vectorization. In this part, we will explain the most common technique for vectorizing conditional code (usually referred to as if-conversion).

A Beginner's Guide to Vectorization By Hand: Part 1

CPU vendors have long been trying to exploit as much parallelism as they can, and the introduction of vector instructions is one way to do so.

Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)

General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x

Fast Multidimensional Matrix Multiplication on CPU from Scratch

NumPy can multiply two 1024x1024 matrices on a 4-core Intel CPU in ~8 ms. This is incredibly fast, considering this boils down to 18 FLOPs / core / cycle, with...

Comparing SIMD on x86-64 and arm64

The text compares SIMD implementations using SSE on x86-64 and Neon on arm64 processors, including emulating SSE on arm64 with Neon. It explores vectorized code performance using intrinsics, auto-vectorization, and ISPC, highlighting the efficiency of SSE and Neon implementations. The study shows how optimizing for SIMD instructions significantly boosts performance over scalar implementations in ray-box intersection tests.

Chapter 2 Basics of SIMD Programming

The text explains how to organize data for SIMD operations, with examples of SIMD-ready vectors, and discusses the relationship between vectors and scalars in SIMD programming. It also outlines the built-in functions for VMX instructions and the principles of SIMD operation.

Subcategories