
Add SharedInformer implementation to python-client #2515

Open
Copilot wants to merge 10 commits into master from copilot/implement-informer-in-python-client

Conversation


Copilot AI commented Feb 20, 2026

Python clients wanting a local cache of Kubernetes resources had to implement their own watch loops, reconnection logic, and thread management. This adds a SharedInformer analogous to the Java and JavaScript client implementations.

New package: kubernetes.informer

  • ObjectCache (cache.py) — thread-safe in-memory store keyed by namespace/name; exposes list(), get(), get_by_key(), list_keys()
  • SharedInformer (informer.py) — daemon thread running list-then-watch loop with:
    • Automatic reconnect on ApiException or other errors
    • Event handler callbacks for ADDED, MODIFIED, DELETED, BOOKMARK, and ERROR
    • BOOKMARK event support: handlers receive the raw bookmark object (containing the updated resourceVersion); the cache is not modified
    • resourceVersion tracking: the most recent resourceVersion (synced from the Watch before each handler fires) is stored and reused on reconnect, avoiding unnecessary full re-lists; a 410 Gone response resets the stored version and triggers a fresh re-list
    • Periodic resync: every resync_period seconds a full list_func call is made to the API server; the result is diffed against the current cache and ADDED/MODIFIED/DELETED events are fired for any changes
    • Cache diff on every list: initial list and re-lists (after 410 or resync) fire ADDED for new items, MODIFIED for existing items, and DELETED for items that have disappeared — matching the Java and JavaScript informer behaviour
    • Namespace, label selector, and field selector pass-through

Usage

from kubernetes import config
from kubernetes.client import CoreV1Api
from kubernetes.informer import ADDED, BOOKMARK, DELETED, MODIFIED, SharedInformer

config.load_kube_config()
v1 = CoreV1Api()

informer = SharedInformer(
    list_func=v1.list_namespaced_pod,
    namespace="default",
    resync_period=60,
)
informer.add_event_handler(ADDED, lambda pod: print("added", pod.metadata.name))
informer.add_event_handler(DELETED, lambda pod: print("deleted", pod.metadata.name))
informer.add_event_handler(BOOKMARK, lambda obj: print("bookmark rv", obj["metadata"]["resourceVersion"]))

informer.start()

# Main thread can query the cache at any time without blocking
pods = informer.cache.list()

SharedInformer is now exposed via the top-level kubernetes package (kubernetes.informer).

Testing

Unit tests are in kubernetes/test/test_informer.py (38 tests) and cover the cache, key function, event handlers, watch loop behaviour, BOOKMARK handling, resourceVersion tracking including 410 Gone recovery, periodic resync triggering a full List call, and scenarios analogous to the JavaScript cache_test.ts, Java DefaultSharedIndexInformerWireMockTest, and Go shared_informer_test.go:

  • Multiple handlers registered for the same event type all fire
  • namespace, label_selector, and field_selector are forwarded to both the list func and Watch kwargs
  • Watch is started with the list's resourceVersion kwarg after the initial list
  • Non-410 ApiException fires the ERROR handler and reconnects without re-listing
  • List func failure fires the ERROR handler
  • Initial list items each fire an ADDED event
  • After a 410-triggered re-list, items absent from the new list fire DELETED
  • The same handler callable registered twice fires twice per event; removing one registration leaves the other active (analogous to Go TestSharedInformerMultipleRegistration)
  • Removing a handler while the informer is running stops it receiving subsequent events (analogous to Go TestRemoveWhileActive)
  • Adding a handler while the informer is running fires it for subsequent events only (analogous to Go TestAddWhileActive)
  • Concurrent add/remove of handlers from many threads causes no data races (analogous to Go TestSharedInformerHandlerAbuse)
  • After a watch disruption and re-list, items present in both old and new lists fire MODIFIED (analogous to Go TestSharedInformerWatchDisruption)

End-to-end tests are in kubernetes/e2e_test/test_informer.py and run against a real cluster (automatically skipped when no cluster is available). They cover:

  • Pre-existing objects appearing in the cache after start()
  • ADDED event firing and cache population when a ConfigMap is created
  • MODIFIED event firing when a ConfigMap is patched
  • DELETED event firing and cache removal when a ConfigMap is deleted
  • resourceVersion advancing after real watch events are received
Original prompt

This section details on the original issue you should resolve

<issue_title>Implement an Informer in python-client</issue_title>
<issue_description>https://github.com/kubernetes/client-go has an Informer implementation. Internally it leverages a watcher of some collection of resources, continually streams changes (add/modify/delete events), reflects the resources into a downstream store (cache), handles connectivity drops, and periodically does a full resync. This all happens on some background thread (goroutine). A client of the informer is free to iterate over that stored cache without concern for how it's populated, and immediately get (possibly outdated) state.

Applications using https://github.com/kubernetes-client/python that want a local store of resources reflecting some in-cluster state need to concern themselves with those lower-level details. There's a lot of room for error.

On 2019-06-25, go#28 added a simple informer implementation to the openapi-generated client for Golang. It defines a Cache struct, with both a list of all objects and event handler callbacks that a consumer could register.

https://github.com/kubernetes-client/python should contain a similar implementation.

People have been talking about this a bit in various places.

<agent_instructions>Please use the Java informer code as a reference for the implementation you can find it in this directory https://github.com/kubernetes-client/java/tree/master/util/src/main/java/io/kubernetes/client/informer</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@roycaihw Kopf has a [watch implementation](https://github.com/nolar/kopf/blob/main/kopf/clients/watching.py) that is claimed to be informer-equivalent, which may be interesting to look into. @brendandburns fwiw (since I just noticed this issue) there are Informers for both the Java (https://github.com/kubernetes-client/java/tree/master/util/src/main/java/io/kubernetes/client/informer) and JavaScript (https://github.com/kubernetes-client/javascript/blob/master/src/informer.ts) client libraries.


@k8s-ci-robot

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 20, 2026
…example

Co-authored-by: brendandburns <5751682+brendandburns@users.noreply.github.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Copilot
Once this PR has been reviewed and has the lgtm label, please assign yliaog for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 20, 2026
Copilot AI changed the title [WIP] Add informer implementation to python-client Add SharedInformer implementation to python-client Feb 20, 2026
Copilot AI requested a review from brendandburns February 20, 2026 18:42
@brendandburns

@copilot can you add support for BOOKMARK events?


Copilot AI commented Feb 20, 2026

@copilot can you add support for BOOKMARK events?

Done in 9c708ad. Added a BOOKMARK constant and registered it in the handlers dict so consumers can call add_event_handler(BOOKMARK, fn). When a BOOKMARK event arrives in the watch stream the informer fires the handler with the raw BOOKMARK object (which carries the updated resourceVersion). The cache is not modified — BOOKMARK events carry no object state change. Two new tests cover this: one verifies the handler is called with the bookmark object, the other confirms the cache is unaffected.
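A sketch of that dispatch, assuming a plain dict of handler lists and a dict-backed cache (names here are illustrative, not the PR's actual internals):

```python
# Event-type constants mirroring kubernetes.informer's exports.
ADDED, MODIFIED, DELETED, BOOKMARK, ERROR = (
    "ADDED", "MODIFIED", "DELETED", "BOOKMARK", "ERROR",
)


def dispatch_event(handlers, cache, event):
    """Route one watch event: BOOKMARKs fire handlers but never touch the cache."""
    etype, obj = event["type"], event["object"]
    if etype == BOOKMARK:
        # A bookmark carries only an updated resourceVersion -- no state change.
        for fn in handlers.get(BOOKMARK, []):
            fn(obj)
        return
    key = f'{obj["metadata"]["namespace"]}/{obj["metadata"]["name"]}'
    if etype == DELETED:
        cache.pop(key, None)
    else:
        cache[key] = obj
    for fn in handlers.get(etype, []):
        fn(obj)
```

The early return for BOOKMARK is what the two new tests pin down: the handler fires with the raw bookmark object, and the cache stays untouched.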

@brendandburns

@copilot can you implement it so that the most recent resourceVersion is stored and used for the re-list operation?

… on 410 Gone


Copilot AI commented Feb 20, 2026

@copilot can you implement it so that the most recent resourceVersion is stored and used for the re-list operation?

Done in d21aa1b. SharedInformer now tracks _resource_version internally. After the initial list the RV is stored, and after each watch stream ends the Watch's latest RV (updated by every event including BOOKMARKs) is captured in finally. On reconnect the stored RV is passed directly to the next watch call — no full re-list is performed. The only exception is a 410 Gone response, which resets _resource_version to None so the next loop iteration triggers a fresh re-list. Three new tests cover: RV stored from Watch, reconnect skips re-list when RV is known, and 410 triggers re-list.
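The reconnect behaviour described here can be sketched as follows (GoneError stands in for an ApiException with status 410; all names are illustrative, not the PR's code):

```python
class GoneError(Exception):
    """Stand-in for an ApiException with status 410 (Gone)."""


def run_watch_cycle(list_func, watch_func, state):
    """One iteration of the reconnect loop: re-list only when no RV is stored.

    state["resource_version"] holds the most recent resourceVersion;
    watch_func(rv) returns the RV reached before the stream ended.
    """
    if state["resource_version"] is None:
        _items, state["resource_version"] = list_func()  # fresh (re-)list
    try:
        state["resource_version"] = watch_func(state["resource_version"])
    except GoneError:
        # 410 Gone: the stored RV is too old, force a full re-list next cycle.
        state["resource_version"] = None
```

So long as the watch ends cleanly, the next cycle resumes from the stored RV without contacting list_func at all; only a 410 pays the cost of a full re-list.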

@brendandburns

@copilot can you add end-to-end tests against a real cluster

@aojea

aojea commented Feb 24, 2026

@brendandburns are you experimenting or you really trying to get this working ... very interested in this 😄

@brendandburns

@aojea I'm definitely really trying to get it working, but via copilot.

@brendandburns brendandburns marked this pull request as ready for review February 24, 2026 23:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2026
@aojea

aojea commented Feb 25, 2026

@aojea I'm definitely really trying to get it working, but via copilot.

well, that is a good kubecon talk 😄 we can always try to give him the last push 👍

@brendandburns

@copilot e2e tests are timing out. Looks like the test that the resource version advances isn't working right. Please fix.


Copilot AI commented Feb 25, 2026

@copilot e2e tests are timing out. Looks like the test that the resource version advances isn't working right. Please fix.

Fixed in 90c2b02. The root cause was a race: _resource_version was only updated in the finally block (when the watch stream ends), but the e2e test checks it immediately after the ADDED event handler fires — at that point the stream is still running so finally hasn't executed yet, leaving _resource_version at the initial LIST value.

The fix syncs _resource_version from Watch.resource_version at the top of each loop iteration, before firing handlers. The Watch class already updates its own resource_version during unmarshal_event (before yielding), so the value is ready. This means when the ADDED handler calls seen.set() and unblocks the test thread, _resource_version is already advanced.


Copilot AI commented Feb 25, 2026

@copilot e2e tests are timing out. Looks like the test that the resource version advances isn't working right. Please fix.

Fixed in 90c2b02. The root cause was that _resource_version was only updated in the finally block (when the Watch.stream() generator ends), but Watch.stream() has its own internal reconnect loop and runs indefinitely. During normal operation, _resource_version was never updated mid-stream — so when test_resource_version_advances called assertGreater(int(inf._resource_version), rv_before) immediately after the ADDED event fired, it was comparing the initial-list RV against itself.

The fix: _resource_version is now synced from self._watch.resource_version before firing handlers on each event (ADDED, MODIFIED, DELETED, BOOKMARK). This ensures that by the time any handler is called, the informer already reflects the resource version carried by that event.
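The fix can be illustrated with a stand-in Watch (FakeWatch and consume are illustrative; the real kubernetes.watch.Watch updates its resource_version in unmarshal_event before yielding, as described above):

```python
class FakeWatch:
    """Stand-in for kubernetes.watch.Watch: updates resource_version
    before yielding each event, mimicking unmarshal_event."""

    def __init__(self, events):
        self.resource_version = "0"
        self._events = events

    def stream(self):
        for ev in self._events:
            self.resource_version = ev["object"]["metadata"]["resourceVersion"]
            yield ev


def consume(watch, informer_state, handler):
    """Drain a watch stream, syncing the informer's RV before firing handlers."""
    for ev in watch.stream():
        # Sync from the Watch *before* the handler runs, so a handler
        # (or a test it unblocks) already sees the advanced value.
        informer_state["resource_version"] = watch.resource_version
        handler(ev)
```

Because the sync happens ahead of the callback, a test that checks the RV from inside a handler no longer races the end of the stream.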

@brendandburns

@copilot can you add a test to validate that a full List call is made to the Kubernetes API server every resyncPeriod?

…list


Copilot AI commented Feb 27, 2026

@copilot can you add a test to validate that a full List call is made to the Kubernetes API server every resyncPeriod?

Done in 5a8fef1. Two changes were needed:

Implementation (informer.py): The resync block now calls self._initial_list() (a real list_func call to the API server) before firing MODIFIED for each cached object. Previously it only replayed locally-cached objects without contacting the server.

Test (test_informer.py, test_resync_period_triggers_full_list): Patches time.monotonic to make the elapsed time jump past resync_period=60 after the first watch event, then asserts list_func.call_count == 2 (once for the initial list, once for the resync).
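The pattern of driving the clock instead of sleeping looks roughly like this (maybe_resync is a hypothetical helper, not the PR's code):

```python
import time
import unittest.mock as mock


def maybe_resync(state, resync_period, list_func, now):
    """Trigger a full list_func call once resync_period seconds have elapsed
    since the last sync (illustrative; the informer's internals may differ)."""
    if now - state["last_sync"] >= resync_period:
        list_func()
        state["last_sync"] = now


def test_resync_period_triggers_full_list():
    # Patch time.monotonic so the second check sees >60s elapsed,
    # without the test actually sleeping for a minute.
    list_func = mock.Mock()
    state = {"last_sync": 0.0}
    with mock.patch("time.monotonic", side_effect=[30.0, 61.0]):
        maybe_resync(state, 60, list_func, time.monotonic())  # 30s: no resync
        maybe_resync(state, 60, list_func, time.monotonic())  # 61s: resync fires
    assert list_func.call_count == 1
```

Patching the clock keeps the test deterministic and fast while still exercising the real elapsed-time comparison.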

@brendandburns

…ires ADDED/MODIFIED/DELETED

Copy link
Contributor Author

Copilot AI commented Feb 28, 2026

@copilot can you look at the tests here: https://github.com/kubernetes-client/javascript/blob/main/src/cache_test.ts and here: https://github.com/kubernetes-client/java/blob/master/u...

Done in b91f278. Reviewed both reference test files and added 7 new tests plus two implementation changes:

Implementation changes (informer.py):

  • _initial_list now diffs the old cache against the new list: fires DELETED for items that disappeared, MODIFIED for items present in both, and ADDED for genuinely new items — matching the JS "initial list and delete after" and Java resync behavior.
  • Removed the now-redundant explicit MODIFIED loop from the resync block (the diff in _initial_list handles it).
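The diff in _initial_list amounts to something like this sketch (key_for and the return shape are assumptions, not the PR's exact signature):

```python
def diff_list(old_keys, new_items, key_for):
    """Diff a fresh list against the previous cache keys.

    Returns (added, modified, deleted_keys): items not previously cached,
    items present in both old cache and new list, and keys that vanished.
    key_for maps an item to its cache key.
    """
    new_by_key = {key_for(item): item for item in new_items}
    added = [item for k, item in new_by_key.items() if k not in old_keys]
    modified = [item for k, item in new_by_key.items() if k in old_keys]
    deleted_keys = [k for k in old_keys if k not in new_by_key]
    return added, modified, deleted_keys
```

The informer would then fire ADDED, MODIFIED, and DELETED for the three groups respectively, which is exactly what the "initial list and delete after" scenario checks.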

New tests (analogous to reference tests):

  • test_multiple_handlers_all_fire (JS "multiple handlers")
  • test_selectors_and_namespace_forwarded (JS "namespace caching" / Java namespace pod informer)
  • test_watch_resource_version_passed_after_initial_list (Java lastSyncResourceVersion / RV tracking)
  • test_non_410_api_exception_reconnects_without_relist (Java informerReListWatchOnWatchConflict)
  • test_list_func_error_fires_error_handler (Java informerReListingOnListForbidden)
  • test_initial_list_fires_added_for_each_item (JS "informer with initial list")
  • test_relist_after_410_fires_delete_for_removed_items (JS "initial list and delete after")

All 33 tests pass.

@brendandburns

@copilot please examine the tests here: https://github.com/kubernetes/client-go/blob/c86837f7edbe99923f3296ca925eb1adc7189b39/tools/cache/shared_informer_test.go and find any tests that are missing from this client's informer unit tests and add them. Fix the code if they don't pass.


Copilot AI commented Feb 28, 2026

@copilot please examine the tests here: https://github.com/kubernetes/client-go/blob/c86837f7edbe99923f3296ca925eb1adc7189b39/tools/cache/shared_informer_test.go and find any tests t...

Done in 25d1f12. Reviewed the client-go test file and added 5 new tests covering the scenarios we were missing:

  • test_same_handler_registered_twice_fires_twice (TestSharedInformerMultipleRegistration)
  • test_remove_handler_while_running_stops_events (TestRemoveWhileActive)
  • test_add_handler_while_running_receives_subsequent_events (TestAddWhileActive)
  • test_concurrent_handler_registration_is_thread_safe (TestSharedInformerHandlerAbuse)
  • test_watch_disruption_existing_items_fire_modified_after_relist (TestSharedInformerWatchDisruption)

The other client-go tests (TestIndexer, TestListenerResyncPeriods, TestResyncCheckPeriod, TestSharedInformerTransformer, TestStateSharedInformer) cover features that don't have a direct equivalent in this implementation (per-listener resync periods, index support, transform functions, IsStopped/HasSynced state methods). All 38 unit tests pass.
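The thread-safety these tests exercise typically comes from a lock-protected registry that fires handlers over a snapshot; a sketch (not the PR's actual class):

```python
import threading


class HandlerRegistry:
    """Lock-protected handler list. Firing iterates a snapshot, so concurrent
    add/remove from other threads never mutates the list mid-iteration."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handlers = []

    def add(self, fn):
        with self._lock:
            self._handlers.append(fn)

    def remove(self, fn):
        with self._lock:
            self._handlers.remove(fn)  # removes a single registration only

    def fire(self, obj):
        with self._lock:
            snapshot = list(self._handlers)
        for fn in snapshot:
            fn(obj)
```

list.remove drops only the first matching entry, which is why registering the same callable twice fires it twice and removing one registration leaves the other active.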
