
mixed-mla: replace AITER reference with pure pytorch baseline#113

Open
theodorechapman wants to merge 2 commits into gpu-mode:main from theodorechapman:mixed-mla-pytorch-reference

Conversation

@theodorechapman

This replaces the mixed-mla reference implementation with a pure PyTorch baseline, so the task no longer depends on AITER compilation, which has been slowing down submission times for the AMD competition.

  • kept the same task-facing structure:
    • generate_input(...)
    • ref_kernel(...)
    • check_implementation
  • preserved the fp8 Q / fp8 KV decode path in pure PyTorch
  • kept kv_data generation for bf16, fp8, and mxfp4
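For readers unfamiliar with the task harness, the structure above can be sketched roughly as follows. This is an illustrative sketch only: the function names come from the PR, but the shapes, dtypes, and tolerance are assumptions, not the actual task config.

```python
import torch

def generate_input(batch=4, seq_len=256, head_dim=128, seed=0):
    # Illustrative shapes: decode attends one query token against a KV cache.
    g = torch.Generator().manual_seed(seed)
    q = torch.randn(batch, 1, head_dim, generator=g)
    kv = torch.randn(batch, seq_len, head_dim, generator=g)
    return q, kv

def ref_kernel(q, kv):
    # Plain attention decode in pure PyTorch as a stand-in for the baseline.
    scores = (q @ kv.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ kv

def check_implementation(out, ref, atol=1e-4):
    # Max absolute error against the reference output.
    max_err = (out - ref).abs().max().item()
    return max_err <= atol, max_err
```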

This submission is not bitwise identical to the old mla_decode_fwd path.

However, when tested on an MI355X, the code passed comprehensive tests against the current AITER reference, with max error in the 3.8e-06 to 1.5e-05 range. I got the same results when I submitted it through popcorn as a custom kernel.
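The max-error comparison quoted above can be reproduced with a simple helper like the one below. This is a sketch; `out_pytorch` and `out_aiter` are hypothetical names standing in for the two implementations' outputs.

```python
import torch

def max_abs_error(a: torch.Tensor, b: torch.Tensor) -> float:
    # Compare in fp32 so the error measurement itself doesn't round.
    return (a.float() - b.float()).abs().max().item()

# Hypothetical usage against the two kernel outputs:
# assert max_abs_error(out_pytorch, out_aiter) < 2e-5
```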

The only caveat is that this implementation is much slower than the AITER reference: the AITER reference sits around 200µs, whereas this PyTorch implementation sits around 4000µs.

The reference kernel quantizes Q to fp8 on-the-fly and uses fp8 KV (a8w8 kernel),
which is ~2-3x faster than bf16 on MI355X with negligible accuracy loss.

Decode only — persistent mode with get_mla_metadata_v1.
Contributor


keep this introduction

Author


Had this fixed locally haha, just pushed

@danielhua23
Contributor

Initial OK for me, but we need to check whether the two impls are exactly the same. Since this is the weekend, we will answer you in two days.
