Posts Tagged "cuda"

Twelve Attempts at an FP4 Kernel

A worklog of NVFP4 kernels, failed experiments, and one stubborn memory bus

Honey, I Tiled the Tensors

Shapes, Strides, Swizzles and Suffering! - An intro to Layout Algebra
