mixed-mla: replace AITER reference with pure pytorch baseline by theodorechapman · Pull Request #113 · gpu-mode/reference-kernels

theodorechapman · 2026-03-07T19:48:43Z

This replaces the mixed-mla reference implementation with a pure PyTorch baseline, so the task no longer depends on AITER compilation, which has been slowing down submission time for the AMD competition.

- Kept the same task-facing structure:
  - generate_input(...)
  - ref_kernel(...)
  - check_implementation
- Preserved the fp8 Q / fp8 KV decode path in pure PyTorch
- Kept kv_data generation for bf16, fp8, and mxfp4

This submission is not bitwise identical to the old mla_decode_fwd path.

However, when tested on an MI355X, the code passed comprehensive tests against the current AITER reference, with max error in the 3.8e-06 to 1.5e-05 range. I got the same result when I submitted it through popcorn as a custom kernel.
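For readers who want to reproduce this kind of comparison, here is a minimal sketch of a max-error check between two output tensors. The helper names `max_abs_error` and `within_tolerance` and the `atol=1e-4` default are illustrative assumptions; they are not taken from this PR or from `check_implementation`, whose actual tolerances are not stated here.

```python
import torch


def max_abs_error(candidate: torch.Tensor, reference: torch.Tensor) -> float:
    """Largest element-wise absolute difference, computed in float32.

    Both tensors are upcast first so that low-precision outputs
    (for example bf16) are compared without extra rounding.
    """
    return (candidate.float() - reference.float()).abs().max().item()


def within_tolerance(candidate: torch.Tensor,
                     reference: torch.Tensor,
                     atol: float = 1e-4) -> bool:
    """Hypothetical pass/fail rule: the atol value is an assumption."""
    return max_abs_error(candidate, reference) <= atol
```

A call such as `max_abs_error(pytorch_out, aiter_out)` would then yield a single number comparable to the 3.8e-06 to 1.5e-05 range reported above, assuming both outputs have the same shape.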

The only caveat is that this implementation is much slower than the AITER reference. The AITER reference sits around 200 µs, whereas this PyTorch implementation sits around 4000 µs.
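A rough way to measure latencies like these is a small timing harness with warmup runs and a median over repeated calls. The sketch below is an assumption about how one might do it, not the method used for the numbers above; `time_us` and its parameters are hypothetical names.

```python
import statistics
import time


def time_us(fn, warmup=3, iters=20, sync=None):
    """Median wall-clock time of fn() in microseconds.

    sync is an optional callable run after each call so that
    asynchronous GPU work has finished before the clock is read.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)
```

On a GPU, passing `sync=torch.cuda.synchronize` (which also applies to ROCm builds of PyTorch) keeps the measurement from stopping before the kernel completes. Without it, only the launch overhead is timed.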


danielhua23 · 2026-03-08T03:55:41Z

problems/amd_202602/mixed-mla/reference.py

-The reference kernel quantizes Q to fp8 on-the-fly and uses fp8 KV (a8w8 kernel),
-which is ~2-3x faster than bf16 on MI355X with negligible accuracy loss.
-
-Decode only — persistent mode with get_mla_metadata_v1.


Keep this introduction.

Had this fixed locally haha, just pushed

danielhua23 · 2026-03-08T03:58:42Z

Initially OK for me, but we need to check whether the two implementations are exactly the same. Since this is the weekend, we will answer you in two days.

mixed-mla: replace AITER reference with pure PyTorch baseline

7df7dcd

danielhua23 reviewed Mar 8, 2026


fixed top comment

79d10eb

danielhua23 mentioned this pull request Mar 8, 2026

Pytorch only reference for mxfp4 to avoid aiter compile on each GH actions job #114

Open

theodorechapman wants to merge 2 commits into gpu-mode:main from theodorechapman:mixed-mla-pytorch-reference
