Add evaluator security note and remediation plan by josusanmartin · Pull Request #115 · gpu-mode/reference-kernels

josusanmartin · 2026-03-08T02:23:07Z

Summary

This PR adds a repository-level security note describing the in-process evaluator trust-boundary issue that affects multiple challenge families, along with an immediate/short-term/long-term remediation plan.

What this adds

- EVALUATOR_SECURITY.md, with:
  - a concise description of the evaluator issue
  - the evaluator families that share the pattern
  - a conservative record of what was directly verified on the live service
  - a quantitative note on implausible public timings such as matmul_v2 at 0.001 µs
  - a staged remediation proposal
- a short pointer from README.md
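
One way to make the "implausible timing" argument concrete is a roofline-style lower bound: a kernel cannot finish faster than its arithmetic at peak compute throughput, nor faster than its minimum memory traffic at peak bandwidth. The sketch below is illustrative only. The helper names are hypothetical, and the problem size (1024 x 1024 x 1024, fp16) and peak figures (1e15 FLOP/s, 5e12 B/s) are placeholder assumptions, not the actual matmul_v2 shapes or the leaderboard hardware. It also ignores kernel-launch and synchronization overhead, which would only raise the real floor.

```python
def min_plausible_time_us(flops: float, bytes_moved: float,
                          peak_flops_per_s: float, peak_bytes_per_s: float) -> float:
    """Roofline-style lower bound in microseconds: a kernel cannot finish faster
    than its arithmetic at peak throughput or its memory traffic at peak bandwidth."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / peak_bytes_per_s
    return max(compute_s, memory_s) * 1e6


def matmul_lower_bound_us(m: int, n: int, k: int, bytes_per_elem: int,
                          peak_flops_per_s: float, peak_bytes_per_s: float) -> float:
    """Lower bound for C = A @ B with A (m x k), B (k x n), C (m x n)."""
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    # Minimum traffic: read A and B once, write C once.
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return min_plausible_time_us(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s)


if __name__ == "__main__":
    # HYPOTHETICAL inputs: not the real matmul_v2 shapes or leaderboard GPU.
    bound = matmul_lower_bound_us(1024, 1024, 1024, bytes_per_elem=2,
                                  peak_flops_per_s=1e15, peak_bytes_per_s=5e12)
    reported = 0.001  # microseconds, the public matmul_v2 timing
    print(f"lower bound ~{bound:.3f} us; reported is {bound / reported:.0f}x below it")
```

Under these placeholder numbers the compute term dominates (about 2.1 µs), roughly three orders of magnitude above a 0.001 µs report. Any realistic substitution of shapes and hardware peaks can be plugged into the same two functions.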

Scope

This PR is documentation-first. It does not include exploit payloads or attempt to land a large evaluator refactor in one change.

Why docs first

The current issue is architectural and spans multiple evaluator families (amd_202602, pmpp_v2, nvidia, amd, amd_distributed, helion, bioml, pmpp). A docs-first PR creates a clear remediation target without mixing disclosure with a partial code fix.

Proposed follow-up work

- add evaluator self-integrity checks as a short-term mitigation
- split trusted evaluator logic from untrusted submission.py execution in a follow-up refactor
- re-run affected leaderboards after patching
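
As a rough illustration of what a short-term self-integrity check might look like, the sketch below records SHA-256 digests of trusted evaluator files before untrusted code runs and reports any file that was modified or deleted afterwards. The helper names `snapshot` and `changed_files` are hypothetical, not part of this repository. Note its limits: it only detects on-disk changes, and if the check itself runs in the same interpreter as submission.py, the submission can patch already-imported modules or the checker in memory. That is why the split between trusted evaluator logic and untrusted execution remains the longer-term fix.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths: list[Path]) -> dict[Path, str]:
    """Record a SHA-256 digest of each trusted evaluator file (hypothetical helper)."""
    return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def changed_files(baseline: dict[Path, str]) -> list[Path]:
    """Return files that were deleted or whose contents no longer match the baseline."""
    changed = []
    for path, digest in baseline.items():
        if not path.exists():
            changed.append(path)  # deleted after the snapshot
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            changed.append(path)  # modified after the snapshot
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        evaluator = Path(d) / "eval.py"  # stand-in for a trusted evaluator file
        evaluator.write_text("print('trusted')\n")
        baseline = snapshot([evaluator])
        # The untrusted submission would run here, ideally in a separate process.
        evaluator.write_text("print('tampered')\n")
        print([p.name for p in changed_files(baseline)])  # ['eval.py']
```

In a split design, the snapshot and verification would live in a trusted parent process, and submission.py would run in a child process that cannot reach the parent's memory.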

Commit 2b859bb: Add evaluator security note and remediation plan
josusanmartin wants to merge 1 commit into gpu-mode:main from josusanmartin:evaluator-security-report