- My 2 cents on Fusing GEMM + Top-K + Softmax on SM100
- Optimizing 3D Square Convolution for cuDNN-like Performance - A Worklog
- Preconditioned SGD can level up your training game
- Understanding Lightning Attention: A Breakthrough in Linear Attention Efficiency
- From 10 to 1000 Tokens/Second: Cursor AI's Secret Weapon Revealed