Enforce auction timeout in orchestrator wait logic#469
Enforce auction timeout in orchestrator wait logic#469ChristianPavilonis wants to merge 4 commits intomainfrom
Conversation
- Pass remaining time budget to each provider instead of full timeout, so backend first_byte_timeout cannot exceed the auction deadline - Extract remaining_budget_ms helper and add unit tests for it - Simplify from_url_with_first_byte_timeout to take Duration (not Option) - Fix misleading doc on ensure() to note timeout is not in backend name - Update TODO comment with specific untested timeout paths
aram356
left a comment
There was a problem hiding this comment.
Staff-level review of auction timeout enforcement
Good work overall. The approach of threading remaining_budget_ms through the orchestrator and into backend first_byte_timeout is sound. The doc comments are thorough and the compute_name / ensure split in BackendConfig is a clean refactor. A few issues to address before merging, ranging from a semantic correctness concern to testing gaps.
Summary of findings
| Priority | Issue |
|---|---|
| P0 | Backend timeout is first-registration-wins; later providers in the same auction may inherit a stale timeout |
| P1 | Requests still launch when remaining budget is already 0 ms |
| P1 | select() blocking means wall-clock can exceed timeout_ms (documented, but consider mitigation) |
| P2 | Per-provider timeout_ms() trait method is ignored at the transport layer |
| P2 | Critical timeout paths remain untested (acknowledged in TODO, but risky to merge without) |
| P3 | as u32 truncation in remaining_budget_ms |
| P3 | URL parsing logic triplicated across three methods |
aram356
left a comment
There was a problem hiding this comment.
Summary
This PR adds auction deadline enforcement to the orchestrator — providers now receive a shrinking time budget via remaining_budget_ms(), backends get a matching first_byte_timeout, and the select() loop checks the deadline after each response. The mediator is skipped when the budget is exhausted. The backend_name() trait method no longer registers backends as a side-effect.
Blocking
🔧 wrench
- 0ms budget guard missing in provider loop: requests still launch when remaining budget is 0ms — the mediator path checks for this but the provider loop does not (
orchestrator.rs:284) - Backend timeout poisoning:
first_byte_timeoutis not part of the backend name, so first-registration-wins — if bidders and mediators share an origin, the mediator's tighter timeout is silently ignored (backend.rs:128)
❓ question
- Can bidders and mediators share the same origin? This determines whether the timeout poisoning is a real issue in the current deployment topology (
backend.rs:128)
Non-blocking
🤔 thinking
as u32truncation:remaining_budget_mscastsu128tou32— safe in practice but technically unsound (orchestrator.rs:21)- Provider
timeout_ms()unused: per-provider timeout config is ignored at the transport layer (orchestrator.rs:289)
♻️ refactor
- Triplicated URL parsing: three methods have identical parsing logic — extract a helper (
backend.rs:222)
🌱 seedling
- Untested timeout paths: core behavioral change has no integration test coverage (
orchestrator.rs:691)
🏕 camp site
- Log ordering: "Running auction with strategy" log on line 83 fires after the auction completes — reads as though the auction is starting. Consider moving it before the
if/elseor changing to "Auction completed with strategy".
👍 praise
backend_name()side-effect removed: no longer registers a backend on read (prebid.rs:955)- Remaining-budget mediator design: clean early-return and fallback (
orchestrator.rs:121)
…ate URL parsing - Include first_byte_timeout in backend name to prevent first-registration-wins poisoning where later requests silently inherit an earlier timeout - Add 0ms budget guard in provider loop (consistent with mediator path) - Use min(provider.timeout_ms(), remaining_budget) to respect per-provider caps - Fix as u32 truncation in remaining_budget_ms with try_from/unwrap_or - Extract parse_origin helper to deduplicate URL parsing across 3 methods - Update backend_name trait method to accept timeout_ms for correct name prediction - Add comment documenting select() blocking dependency on first_byte_timeout - Expand TODO with follow-up guidance for testability improvements
Review feedback addressed (48fb712)Thanks for the thorough review! All findings have been addressed in the latest push. Here's a breakdown: Blocking — resolved🔧 0ms budget guard in provider loop ( 🔧 Backend timeout poisoning ( The Non-blocking — resolved🤔 🤔 Provider ♻️ Triplicated URL parsing ( 🌱 Untested timeout paths ( Created issue #473 Comment about All changes pass |
Summary
settings.auction.timeout_ms) in the orchestrator'sselect()loop — previously, waits could extend to the backend's hardcoded 15sfirst_byte_timeout, ignoring the auction deadline.first_byte_timeoutis set totimeout_msinstead of 15s, and (2) the orchestrator checks elapsed time after eachselect()return, dropping remaining requests when the deadline is exceeded.Changes
crates/common/src/backend.rsfirst_byte_timeoutfield toBackendConfig(default 15s), builder method, timeout-aware backend naming, andfrom_url_with_first_byte_timeout()convenience methodcrates/common/src/auction/orchestrator.rsselect()loop — trackauction_start, drop remaining requests when timeout exceeded, pass only remaining time to mediatorcrates/common/src/integrations/prebid.rsfrom_url_with_first_byte_timeout()withcontext.timeout_msfor bid requestscrates/common/src/integrations/aps.rsfrom_url_with_first_byte_timeout()withcontext.timeout_msfor bid requests; rename_context→contextcrates/common/src/integrations/adserver_mock.rsfrom_url_with_first_byte_timeout()withcontext.timeout_msfor mediation requestsCloses
Closes #405
Test plan
cargo test --workspacecargo clippy --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --bin trusted-server-fastly --release --target wasm32-wasip1fastly compute serveChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)