Add operator support for datastore migration #4553
Open
caseydavenport wants to merge 17 commits into tigera:master
Three changes to support the v1-to-v3 CRD migration controller:

1. UseV3CRDS() now checks for a DatastoreMigration CR in Converged/Complete phase before falling through to API discovery. This handles operator restarts mid-migration correctly. Signature changed to take rest.Config instead of kubernetes.Interface since it creates clients internally now.

2. APIServer controller goes hands-off during migration. When a DatastoreMigration CR is in Migrating phase, reconciliation is skipped so the migration controller can own the APIService. When the CR reaches Converged, the controller patches the operator's own deployment with CALICO_API_GROUP=projectcalico.org/v3 to trigger a rolling restart into v3 CRD mode.

3. ComponentHandler generically injects CALICO_API_GROUP into all containers of every Deployment/DaemonSet/StatefulSet it reconciles. The env var is auto-detected from the operator's own environment in the constructor, so no per-component plumbing is needed.
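The detection order described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `useV3` is a hypothetical stand-in for `UseV3CRDS()`, the phase strings are assumed spellings, and the inputs that the real function would obtain from the environment, a dynamic client, and API discovery are passed in as plain values. The behaviour when the env var is set to something other than the v3 value is also an assumption.

```go
package main

import "fmt"

// Assumed spellings of the DatastoreMigration phases named in the commit message.
const (
	phaseMigrating = "Migrating"
	phaseConverged = "Converged"
	phaseComplete  = "Complete"
)

// v3APIGroup is the CALICO_API_GROUP value quoted in the commit message.
const v3APIGroup = "projectcalico.org/v3"

// useV3 is a hypothetical stand-in for UseV3CRDS(). It applies the checks in
// the order the PR describes: explicit env var, then migration phase, then
// API discovery as the fallback.
func useV3(envAPIGroup, migrationPhase string, discoveryHasV3 bool) bool {
	// 1. An explicit env var wins (assumption: any non-empty value is decisive).
	if envAPIGroup != "" {
		return envAPIGroup == v3APIGroup
	}
	// 2. A migration that has converged or completed implies v3 mode, even if
	//    the operator restarted before the env var was patched in.
	if migrationPhase == phaseConverged || migrationPhase == phaseComplete {
		return true
	}
	// 3. Otherwise fall through to whatever API discovery reports.
	return discoveryHasV3
}

func main() {
	fmt.Println(useV3("", phaseConverged, false))
	fmt.Println(useV3("", phaseMigrating, false))
	fmt.Println(useV3("", "", true))
}
```

Passing the inputs as values keeps the decision logic testable without a cluster; the real function takes a rest.Config and builds its clients internally.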
Grant calico-kube-controllers permissions for the datastore migration controller: DatastoreMigration CRs, APIService access for removal during migration, and CRD deletion for v1 cleanup.
When UseV3CRDS detects v3 mode via DatastoreMigration CR, the CALICO_API_GROUP env var isn't set on the process. Rather than relying on the env var, main.go now calls SetCalicoAPIGroup() based on the detection result, and componentHandler picks it up from the package-level variable.
Add a MigrationRBACComponent in the kubecontrollers render package that creates a ClusterRole/ClusterRoleBinding granting kube-controllers broad access to both projectcalico.org and crd.projectcalico.org API groups. The installation controller checks for the DatastoreMigration CR on each reconcile and creates or deletes the RBAC accordingly.
The apiservices and CRD permissions are only needed during migration, so move them from the static kube-controllers ClusterRole into the dynamic migration ClusterRole that gets created/deleted based on DatastoreMigration CR existence. Also add 'create' verb for apiservices (needed for abort/restore).
The installation controller needs to re-reconcile promptly when the DatastoreMigration phase changes so it can inject CALICO_API_GROUP into components. Without this watch, it only picks up changes on the 5-minute periodic reconcile. Uses a deferred watch via WaitToAddResourceWatch since the DatastoreMigration CRD may not be installed.
Move the MigrationRBACComponent reconciliation to the top of the Reconcile function, before namespace migration and calico-node readiness checks. This ensures the migration ClusterRole is created promptly when a DatastoreMigration CR appears, even if the rest of the reconcile is blocked on other conditions.
caseydavenport
commented
Mar 17, 2026
- Move DatastoreMigration API check out of render package into the installation controller; MigrationRBACComponent now takes a bool
- Rename triggerOperatorRestart to setAPIGroupEnvVar
- Add Job and CronJob to injectAPIGroupEnv
- Add debug logging to MigrationRBACComponent path
Replace WaitToAddResourceWatch (which panics on unstructured objects due to ContextLoggerForResource casting to ObjectMetaAccessor) with a custom watch loop that polls the discovery API and calls WatchObject directly with the unstructured object.
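A poll-then-watch loop of the kind described here might look like the sketch below. It is illustrative only: `waitForKindThenWatch`, `kindServed`, and `addWatch` are hypothetical names, with `kindServed` standing in for a discovery API lookup and `addWatch` for the direct WatchObject call. Treating a discovery error as "not served yet" is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForKindThenWatch polls kindServed at the given interval until the kind
// is available, then registers the watch exactly once. It returns early with
// the context's error if ctx is cancelled first.
func waitForKindThenWatch(
	ctx context.Context,
	interval time.Duration,
	kindServed func() (bool, error), // hypothetical discovery probe
	addWatch func() error, // hypothetical wrapper around the watch call
) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// Assumption: a probe error means the CRD is not installed yet, so retry.
		if ok, err := kindServed(); err == nil && ok {
			return addWatch()
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// simulate runs the loop against a fake probe that reports the kind as served
// on the readyAfter-th call, and counts probe and watch invocations.
func simulate(readyAfter int) (probes, watches int, err error) {
	err = waitForKindThenWatch(
		context.Background(),
		time.Millisecond,
		func() (bool, error) { probes++; return probes >= readyAfter, nil },
		func() error { watches++; return nil },
	)
	return probes, watches, err
}

func main() {
	probes, watches, err := simulate(3)
	fmt.Println(probes, watches, err)
}
```

In a controller this would typically run in its own goroutine so that startup is not blocked while the DatastoreMigration CRD is absent.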
UseV3CRDS only runs at startup, so if the operator boots before the migration reaches Converged, the component handler has no API group to inject. The installation controller now checks the migration phase on each reconcile and calls SetCalicoAPIGroup when it sees Converged or Complete. This closes the gap between migration completing and the operator reacting. Also add a mutex to SetCalicoAPIGroup since it's now called from both main() and the installation controller reconcile goroutine.
Move SetCalicoAPIGroup / getCalicoAPIGroupEnvs into a standalone package with typed constants (V1, V3) instead of passing strings. The mutex and env var construction are encapsulated — callers use apigroup.Set(apigroup.V3) and apigroup.EnvVars().
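As a rough picture of the shape this package takes, here is a minimal sketch written as a single file. Only the names `V1`, `V3`, `Set`, `Get`, `EnvVars`, and the `CALICO_API_GROUP` value come from this PR; the `APIGroup` type, the `Unknown` zero value, the local `EnvVar` struct (standing in for the Kubernetes env var type), and the choice to emit no env var outside v3 mode are all assumptions. It uses sync.RWMutex, which a later commit in this PR adopts for the read-heavy accessors.

```go
package main

import (
	"fmt"
	"sync"
)

// APIGroup is a hypothetical typed constant for the CRD API group in use.
type APIGroup int

const (
	Unknown APIGroup = iota // assumed zero value: nothing detected yet
	V1
	V3
)

// EnvVar is a local stand-in for the Kubernetes container env var type.
type EnvVar struct {
	Name  string
	Value string
}

var (
	mu      sync.RWMutex
	current APIGroup
)

// Set records the detected API group. It is safe to call from main() and
// from controller reconcile goroutines concurrently.
func Set(g APIGroup) {
	mu.Lock()
	defer mu.Unlock()
	current = g
}

// Get returns the currently recorded API group.
func Get() APIGroup {
	mu.RLock()
	defer mu.RUnlock()
	return current
}

// EnvVars returns the env vars to inject into managed containers.
// Assumption: only v3 mode needs an explicit CALICO_API_GROUP.
func EnvVars() []EnvVar {
	if Get() == V3 {
		return []EnvVar{{Name: "CALICO_API_GROUP", Value: "projectcalico.org/v3"}}
	}
	return nil
}

func main() {
	Set(V3)
	fmt.Println(EnvVars())
}
```

Keeping the lock and the env var construction behind three functions means callers never handle the raw string, which is the encapsulation the commit message describes.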
- Extract shared datastoremigration package with GetPhase, Exists, WaitForWatchAndAdd, GVR, and phase constants. Removes duplication between apiserver and installation controllers.
- Add deferred DatastoreMigration watch to apiserver controller
- Remove dynamicClient from apiserver controller (no longer needed)
- Update env var injection to merge in place (update existing env var instead of appending a duplicate)
- Add comment explaining why installation controller sets apigroup directly in addition to the apiserver controller restart path
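The merge-in-place behaviour for env var injection can be illustrated with a small sketch. The `Container` and `EnvVar` structs below are simplified local stand-ins for the Kubernetes types, and `mergeEnv` is a hypothetical name rather than the function in the PR.

```go
package main

import "fmt"

// EnvVar and Container are simplified stand-ins for the Kubernetes types.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// mergeEnv sets want on every container. If a variable with the same name is
// already present its value is updated in place; otherwise want is appended.
// This avoids duplicate entries across repeated reconciles.
func mergeEnv(containers []Container, want EnvVar) {
	for i := range containers {
		found := false
		for j := range containers[i].Env {
			if containers[i].Env[j].Name == want.Name {
				containers[i].Env[j].Value = want.Value
				found = true
				break
			}
		}
		if !found {
			containers[i].Env = append(containers[i].Env, want)
		}
	}
}

func main() {
	containers := []Container{
		{Name: "a", Env: []EnvVar{{Name: "CALICO_API_GROUP", Value: "stale"}}},
		{Name: "b"},
	}
	mergeEnv(containers, EnvVar{Name: "CALICO_API_GROUP", Value: "projectcalico.org/v3"})
	fmt.Println(containers)
}
```

Because the function is idempotent, calling it on every reconcile leaves each container with exactly one CALICO_API_GROUP entry.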
- Consolidate DatastoreMigration GVR into apis.DatastoreMigrationGVR
- Construct dynamic client once per controller instead of per-reconcile
- Use sync.RWMutex for read-heavy apigroup accessors
- Remove redundant apigroup.Set from installation controller
- Add unit tests for apigroup and datastoremigration packages
Companion PR to projectcalico/calico#12012 — adds the operator-side changes needed to support v1-to-v3 CRD datastore migration.
- API group detection (pkg/apis/version.go): UseV3CRDS() now takes a rest.Config instead of a clientset. After checking the CALICO_API_GROUP env var, it uses a dynamic client to check for a DatastoreMigration CR in Converged/Complete phase before falling through to API discovery.
- API group tracking (pkg/apigroup/): New package with typed constants (V1, V3) and mutex-protected Set()/Get()/EnvVars(). Called from main.go at startup and from the installation controller when migration reaches Converged.
- Shared migration utilities (pkg/controller/migration/datastoremigration/): Package with GetPhase(), Exists(), WaitForWatchAndAdd(), GVR, and phase constants. Used by both the apiserver and installation controllers to avoid duplication.
- Component env injection (pkg/controller/utils/component.go): NewComponentHandler() reads apigroup.EnvVars() and injects CALICO_API_GROUP into all containers of Deployments, DaemonSets, StatefulSets, Jobs, and CronJobs. Merges in place if the env var already exists.
- APIServer controller (pkg/controller/apiserver/): Checks DatastoreMigration phase on each reconcile. Defers reconciliation during Migrating phase. Patches operator deployment with CALICO_API_GROUP env var on Converged. Deferred watch on DatastoreMigration CRs.
- Installation controller (pkg/controller/installation/): apigroup.Set(V3) when Converged/Complete for immediate env injection
- Dynamic migration RBAC (pkg/render/kubecontrollers/migration_rbac.go): Creates/deletes a ClusterRole+ClusterRoleBinding granting kube-controllers broad access to both API groups, apiservices, and CRDs. Only present when a DatastoreMigration CR exists.
- Static kube-controllers RBAC: Grants read access to migration.projectcalico.org DatastoreMigration CRs for the migration controller's informer.

Companion PRs: