Nikhil Paleti

Blog

Notes on machine learning systems, GPU performance, and things I had to learn the hard way.

#pragma unroll

Notes on #pragma unroll: what latency-bound means on a GPU, why more warps alone can't always fix it, and how compiler unrolling, as seen in SASS, hides SFU and memory latency.

Positional Encoding Explained

A rebuilt and expanded version of my transformer positional encoding essay, now with cleaner math, figures, and implementation notes.