meshy

Notes & experiments

Working notes on GPU kernels, training systems, and related experiments.

Entries
ForgeTrain vs FlashAttention-3 / 4
Forward-pass benchmark of three attention kernels on an H100 PCIe, with methodology and per-shape results.
/fa-bench · bf16 · causal · D=128