Lock B200 GPU clocks during benchmarks for reproducible results #118

Closed
msaroufim wants to merge 8 commits into main from feat/gpu-clock-lockdown

msaroufim (Member) commented Mar 10, 2026

No description provided.

Add gpu_lockdown() context manager to utils.py that locks SM/memory
clocks and power cap via nvidia-smi before benchmark and leaderboard
runs. Clocks are queried dynamically from the GPU and reset on exit.
Adapted from tritonbench/.ci/gpu/tune-b200.sh.

Controlled by POPCORN_GPU_LOCKDOWN env var (default: enabled).

Always lock clocks; no need for a toggle. Each _sudo_nvsmi call
already logs its own warning on failure, so the ok accumulator was
redundant.

Use check_call/check_output instead of swallowing errors. If we
can't lock clocks the benchmark numbers are unreliable, so we should
not silently continue.

No need to unlock clocks; the OS resets them when the process exits.
Just lock once at the start of benchmark/leaderboard mode.

No reason to hardcode 750W when we can query power.max_limit the
same way we query max clocks.

Locking SM and memory clocks to fixed frequencies already prevents
drift. Setting power limit to max is a no-op.

You can't actually lock to max frequency: the GPU will throttle if
it gets hot or draws too much power. Cap power to 750W (below B200's
1kW TDP) so clocks can be sustained. This matches tritonbench's
tune-b200.sh approach.

Hardcode 1965 MHz SM clock and 750W power cap. These are the
validated settings from NVIDIA's own B200 systems. Drop the dynamic
max clock queries and extra nvidia-smi calls (persistence mode,
memory lock, app clocks) in favor of the minimal proven setup.
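
For readers without access to the diff, here is a minimal sketch of what the final commit describes: a fixed 1965 MHz SM clock and a 750 W power cap, applied once, with failures raised via check_call. It is not the PR's actual code. The function names `lockdown_commands` and `lock_gpu`, the injectable `run` parameter, and the choice of the `--lock-gpu-clocks` and `--power-limit` flags are assumptions; the PR's own `_sudo_nvsmi` helper may be shaped differently.

```python
import subprocess
from typing import Callable, List, Sequence

# Values taken from the final commit message above.
SM_CLOCK_MHZ = 1965
POWER_CAP_W = 750


def lockdown_commands(sm_clock_mhz: int = SM_CLOCK_MHZ,
                      power_cap_w: int = POWER_CAP_W) -> List[List[str]]:
    """Build the nvidia-smi invocations (hypothetical helper, not from the PR)."""
    return [
        # Pin the minimum and maximum SM clock to the same frequency.
        ["sudo", "nvidia-smi", f"--lock-gpu-clocks={sm_clock_mhz},{sm_clock_mhz}"],
        # Cap board power so the pinned clock can be sustained.
        ["sudo", "nvidia-smi", f"--power-limit={power_cap_w}"],
    ]


def lock_gpu(run: Callable[[Sequence[str]], int] = subprocess.check_call) -> None:
    """Apply each command once; with the default runner, a non-zero exit
    status raises CalledProcessError instead of being swallowed."""
    for cmd in lockdown_commands():
        run(cmd)
```

Calling `lock_gpu()` once at the start of benchmark or leaderboard mode would correspond to the "lock once, never unlock" commit. Passing a different `run` callable makes it possible to inspect the commands without touching the GPU. Both commands need root, hence the `sudo` prefix.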

ngc92 (Collaborator) commented Mar 10, 2026

LGTM, assuming the numbers match

msaroufim (Member, Author)

We decided against doing this because we don't want to give user submissions sudo access by proxy

@msaroufim msaroufim closed this Mar 11, 2026