Lock B200 GPU clocks during benchmarks for reproducible results #118
Closed
Add gpu_lockdown() context manager to utils.py that locks SM/memory clocks and power cap via nvidia-smi before benchmark and leaderboard runs. Clocks are queried dynamically from the GPU and reset on exit. Adapted from tritonbench/.ci/gpu/tune-b200.sh. Controlled by POPCORN_GPU_LOCKDOWN env var (default: enabled).
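A minimal sketch of what such a context manager could look like, not the code from this PR. The names `gpu_lockdown` and `_sudo_nvsmi` come from the description, but their signatures, the injectable `run` parameter, and the rule that `POPCORN_GPU_LOCKDOWN=0` disables the lock are assumptions made for illustration. The clock and power values are passed in by the caller; in the PR they were queried from the GPU.

```python
import os
import subprocess
from contextlib import contextmanager


def _sudo_nvsmi(args, run=subprocess.run):
    """Run `sudo nvidia-smi <args>`; warn and return False on failure."""
    cmd = ["sudo", "nvidia-smi", *args]
    try:
        ok = run(cmd, capture_output=True, text=True).returncode == 0
    except OSError:  # e.g. sudo or nvidia-smi is not installed
        ok = False
    if not ok:
        print(f"warning: command failed: {' '.join(cmd)}")
    return ok


@contextmanager
def gpu_lockdown(sm_mhz, mem_mhz, power_w, gpu_index=0, run=subprocess.run):
    """Lock SM/memory clocks and the power cap for the `with` block."""
    # Assumed convention: only the value "0" disables the lockdown.
    if os.environ.get("POPCORN_GPU_LOCKDOWN", "1") == "0":
        yield
        return
    gpu = ["-i", str(gpu_index)]
    _sudo_nvsmi([*gpu, f"--lock-gpu-clocks={sm_mhz},{sm_mhz}"], run)
    _sudo_nvsmi([*gpu, f"--lock-memory-clocks={mem_mhz},{mem_mhz}"], run)
    _sudo_nvsmi([*gpu, f"--power-limit={power_w}"], run)
    try:
        yield
    finally:
        # Release the clock locks even if the benchmark raised.
        # The power cap is left as set; restoring it would need the old value.
        _sudo_nvsmi([*gpu, "--reset-gpu-clocks"], run)
        _sudo_nvsmi([*gpu, "--reset-memory-clocks"], run)
```

Wrapping the benchmark call in `with gpu_lockdown(...):` keeps the lock and the reset paired, so an exception in the benchmark does not leave the clocks pinned.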
Always lock clocks — no need for a toggle. Each _sudo_nvsmi call already logs its own warning on failure, so the ok accumulator was redundant.
Use check_call/check_output instead of swallowing errors — if we can't lock clocks the benchmark numbers are unreliable so we should not silently continue.
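The fail-fast behavior described above can be sketched as follows. The helper name `lock_sm_clock_strict` and its injectable `check_call` parameter are hypothetical and not taken from the PR. The point is only that `subprocess.check_call` raises `CalledProcessError` on a non-zero exit, so a failed lock aborts the run instead of producing unreliable numbers.

```python
import subprocess


def lock_sm_clock_strict(sm_mhz, gpu_index=0, check_call=subprocess.check_call):
    """Lock the SM clock; let CalledProcessError propagate on failure."""
    check_call([
        "sudo", "nvidia-smi", "-i", str(gpu_index),
        f"--lock-gpu-clocks={sm_mhz},{sm_mhz}",
    ])
```

A caller that does not catch the exception stops before any benchmark timing is taken, which is the intent of this commit.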
No need to unlock clocks — the OS resets them when the process exits. Just lock once at the start of benchmark/leaderboard mode.
No reason to hardcode 750W when we can query power.max_limit the same way we query max clocks.
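One way the dynamic query might look, as a sketch under stated assumptions: the function names and the injectable `check_output` parameter are hypothetical, and the parsing assumes nvidia-smi returns one comma-separated line of plain numbers per GPU when called with `--format=csv,noheader,nounits`. A field reported as `[N/A]` would make the conversion raise `ValueError`.

```python
import subprocess

# Query fields listed by `nvidia-smi --help-query-gpu`.
QUERY_FIELDS = "clocks.max.sm,clocks.max.mem,power.max_limit"


def parse_gpu_limits(csv_line):
    """Parse 'sm, mem, power' into (int MHz, int MHz, float watts)."""
    sm, mem, power = (field.strip() for field in csv_line.split(","))
    return int(sm), int(mem), float(power)


def query_gpu_limits(gpu_index=0, check_output=subprocess.check_output):
    """Return (max SM MHz, max memory MHz, max power limit W) for one GPU."""
    out = check_output(
        [
            "nvidia-smi", "-i", str(gpu_index),
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_limits(out.strip().splitlines()[0])
```

Querying needs no `sudo`; only the commands that change clocks or the power limit do.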
Locking SM and memory clocks to fixed frequencies already prevents drift. Setting power limit to max is a no-op.
You can't actually lock to max frequency — the GPU will throttle if it gets hot or draws too much power. Cap power to 750W (below B200's 1kW TDP) so clocks can be sustained. This matches tritonbench's tune-b200.sh approach.
Hardcode 1965 MHz SM clock and 750W power cap — these are the validated settings from NVIDIA's own B200 systems. Drop the dynamic max clock queries and extra nvidia-smi calls (persistence mode, memory lock, app clocks) in favor of the minimal proven setup.
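The final, minimal setup described in this commit could be sketched like this. The 1965 MHz and 750 W values come from the commit message; the constant names, the function names, the `check_call` parameter, and the order of the two commands are assumptions for illustration.

```python
import subprocess

B200_SM_CLOCK_MHZ = 1965  # SM clock from the commit message
B200_POWER_CAP_W = 750    # power cap from the commit message


def b200_lock_commands(gpu_index=0):
    """Build the two nvidia-smi commands for the minimal B200 lockdown."""
    gpu = ["sudo", "nvidia-smi", "-i", str(gpu_index)]
    return [
        [*gpu, f"--power-limit={B200_POWER_CAP_W}"],
        [*gpu, f"--lock-gpu-clocks={B200_SM_CLOCK_MHZ},{B200_SM_CLOCK_MHZ}"],
    ]


def lock_b200(gpu_index=0, check_call=subprocess.check_call):
    """Apply both settings, raising CalledProcessError if either fails."""
    for cmd in b200_lock_commands(gpu_index):
        check_call(cmd)
```

Keeping the command list separate from the code that runs it makes the exact settings easy to log or inspect before anything is executed with `sudo`.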
Collaborator
LGTM, assuming the numbers match
Member
Author
We decided against doing this because we don't want to give user submissions sudo access by proxy.