meshy
Notes & experiments
Working notes on GPU kernels, training systems, and related experiments.
Entries
ForgeTrain vs FlashAttention-3 / 4
Forward-pass benchmark of three attention kernels on an H100 PCIe, with methodology and per-shape results.
/fa-bench · bf16 · causal · D=128