Mar 21, 2026
#pragma unroll
Notes on #pragma unroll: what latency-bound means on a GPU, why warps alone can't always fix it, and how the compiler uses unrolling to hide SFU and memory latency in SASS.
Notes on machine learning systems, GPU performance, and things I had to learn the hard way.
Jul 5, 2024
A rebuilt and expanded version of my transformer positional encoding essay, now with cleaner math, figures, and implementation notes.