Add operator support for datastore migration #4553

Open

caseydavenport wants to merge 17 commits into tigera:master from caseydavenport:casey-migration-operator-support


@caseydavenport (Member) commented Mar 16, 2026

Companion PR to projectcalico/calico#12012 — adds the operator-side changes needed to support v1-to-v3 CRD datastore migration.

API group detection (pkg/apis/version.go): UseV3CRDS() now takes a rest.Config instead of a clientset. After checking the CALICO_API_GROUP env var, it uses a dynamic client to check for a DatastoreMigration CR in Converged/Complete phase before falling through to API discovery.
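
The precedence can be sketched as a small decision function. This is a minimal sketch, not the operator's code: the real UseV3CRDS() takes a rest.Config and builds its own clients, whereas here the env lookup, the DatastoreMigration lookup and API discovery are passed in as hypothetical function parameters so the ordering runs without a cluster. Treating any env value other than projectcalico.org/v3 as v1, and propagating lookup errors, are assumptions.

```go
package main

import "fmt"

// Phase names from the PR description; their exact string values are assumed.
const (
	PhaseMigrating = "Migrating"
	PhaseConverged = "Converged"
	PhaseComplete  = "Complete"
)

// useV3CRDs sketches the precedence described for UseV3CRDS():
//  1. an explicit CALICO_API_GROUP env var wins;
//  2. otherwise, a DatastoreMigration CR in Converged or Complete selects v3;
//  3. otherwise, fall back to API discovery.
//
// getenv, migrationPhase and hasV3API are hypothetical stand-ins for the
// process environment, the dynamic-client lookup and the discovery call.
func useV3CRDs(
	getenv func(string) string,
	migrationPhase func() (phase string, found bool, err error),
	hasV3API func() (bool, error),
) (bool, error) {
	if v := getenv("CALICO_API_GROUP"); v != "" {
		return v == "projectcalico.org/v3", nil
	}
	phase, found, err := migrationPhase()
	if err != nil {
		return false, err
	}
	if found && (phase == PhaseConverged || phase == PhaseComplete) {
		return true, nil
	}
	return hasV3API()
}

func main() {
	emptyEnv := func(string) string { return "" }
	noCR := func() (string, bool, error) { return "", false, nil }
	converged := func() (string, bool, error) { return PhaseConverged, true, nil }
	noV3API := func() (bool, error) { return false, nil }

	v3, _ := useV3CRDs(emptyEnv, noCR, noV3API)
	fmt.Println("no migration CR:", v3)
	v3, _ = useV3CRDs(emptyEnv, converged, noV3API)
	fmt.Println("migration Converged:", v3)
}
```

Injecting the lookups makes the restart-mid-migration case easy to unit test: a Converged CR selects v3 even when discovery alone would not.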

API group tracking (pkg/apigroup/): New package with typed constants (V1, V3) and mutex-protected Set()/Get()/EnvVars(). Called from main.go at startup and from the installation controller when migration reaches Converged.
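
A minimal sketch of such a package, assuming the constant values shown (only projectcalico.org/v3 appears in the PR text) and using a local EnvVar type in place of corev1.EnvVar. The Unset constant is a hypothetical addition for the "not yet detected" state. A sync.RWMutex lets frequent EnvVars() reads from component handlers share the lock, while Set() from main() or a reconcile goroutine takes it exclusively.

```go
package main

import (
	"fmt"
	"sync"
)

// APIGroup mirrors the typed constants described for pkg/apigroup.
// The string values are assumptions.
type APIGroup string

const (
	Unset APIGroup = "" // hypothetical: nothing detected yet
	V1    APIGroup = "crd.projectcalico.org/v1"
	V3    APIGroup = "projectcalico.org/v3"
)

// EnvVar is a local stand-in for corev1.EnvVar to keep the sketch
// dependency-free.
type EnvVar struct {
	Name  string
	Value string
}

var (
	mu      sync.RWMutex
	current APIGroup
)

// Set records the detected API group. Safe to call from multiple goroutines.
func Set(g APIGroup) {
	mu.Lock()
	defer mu.Unlock()
	current = g
}

// Get returns the current API group under a shared read lock.
func Get() APIGroup {
	mu.RLock()
	defer mu.RUnlock()
	return current
}

// EnvVars returns the env vars to inject into components, or nil when no
// API group has been detected yet.
func EnvVars() []EnvVar {
	g := Get()
	if g == Unset {
		return nil
	}
	return []EnvVar{{Name: "CALICO_API_GROUP", Value: string(g)}}
}

func main() {
	fmt.Println("before detection:", EnvVars())
	Set(V3)
	fmt.Println("after Set(V3):", EnvVars())
}
```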

Shared migration utilities (pkg/controller/migration/datastoremigration/): Package with GetPhase(), Exists(), WaitForWatchAndAdd(), GVR, and phase constants. Used by both the apiserver and installation controllers to avoid duplication.
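
GetPhase() presumably reads status.phase from the CR fetched through the dynamic client. The sketch below assumes that field path and works on the plain map that backs an unstructured object (client-go's unstructured.NestedString offers the same lookup); getPhase and isV3Ready are hypothetical names, not the package's API, and the apiVersion in the example object is assumed.

```go
package main

import "fmt"

// Phase constants; the spellings follow the PR text.
const (
	PhaseMigrating = "Migrating"
	PhaseConverged = "Converged"
	PhaseComplete  = "Complete"
)

// getPhase reads .status.phase from the map that backs an unstructured
// object. It returns found=false when status or phase is missing or has
// the wrong type.
func getPhase(obj map[string]interface{}) (phase string, found bool) {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return "", false
	}
	phase, ok = status["phase"].(string)
	return phase, ok
}

// isV3Ready reports whether a phase means components should run in v3 mode.
func isV3Ready(phase string) bool {
	return phase == PhaseConverged || phase == PhaseComplete
}

func main() {
	cr := map[string]interface{}{
		"apiVersion": "migration.projectcalico.org/v1beta1", // version assumed
		"kind":       "DatastoreMigration",
		"status":     map[string]interface{}{"phase": PhaseConverged},
	}
	phase, found := getPhase(cr)
	fmt.Println(phase, found, isV3Ready(phase))
}
```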

Component env injection (pkg/controller/utils/component.go): NewComponentHandler() reads apigroup.EnvVars() and injects CALICO_API_GROUP into all containers of Deployments, DaemonSets, StatefulSets, Jobs, and CronJobs. Merges in place if the env var already exists.
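
The merge-in-place rule can be illustrated on bare container specs. This sketch uses local Container and EnvVar types instead of the corev1 ones, and injectEnv is a hypothetical name. In the real workloads the containers live under spec.template.spec for Deployments, DaemonSets, StatefulSets and Jobs, and under spec.jobTemplate.spec.template.spec for CronJobs.

```go
package main

import "fmt"

// Minimal stand-ins for corev1.EnvVar and corev1.Container.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// injectEnv merges each desired env var into every container: an existing
// entry with the same name is updated in place, otherwise it is appended.
func injectEnv(containers []Container, desired []EnvVar) {
	for i := range containers {
		c := &containers[i]
		for _, want := range desired {
			updated := false
			for j := range c.Env {
				if c.Env[j].Name == want.Name {
					c.Env[j].Value = want.Value
					updated = true
					break
				}
			}
			if !updated {
				c.Env = append(c.Env, want)
			}
		}
	}
}

func main() {
	cs := []Container{
		{Name: "calico-node"},
		{Name: "calico-kube-controllers", Env: []EnvVar{{Name: "CALICO_API_GROUP", Value: "crd.projectcalico.org/v1"}}},
	}
	injectEnv(cs, []EnvVar{{Name: "CALICO_API_GROUP", Value: "projectcalico.org/v3"}})
	for _, c := range cs {
		fmt.Println(c.Name, c.Env)
	}
}
```

Updating the existing entry rather than appending keeps repeated reconciles idempotent, so the pod template does not change on every pass.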

APIServer controller (pkg/controller/apiserver/): Checks DatastoreMigration phase on each reconcile. Defers reconciliation during Migrating phase. Patches operator deployment with CALICO_API_GROUP env var on Converged. Deferred watch on DatastoreMigration CRs.
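
The phase handling in the APIServer controller reduces to a mapping from migration state to action. In this sketch the action names are hypothetical, and returning to normal reconciliation in the Complete phase (and for any other phase) is an assumption; the PR text only specifies the Migrating and Converged behavior.

```go
package main

import "fmt"

const (
	PhaseMigrating = "Migrating"
	PhaseConverged = "Converged"
	PhaseComplete  = "Complete"
)

// action labels the branches described in the PR; not real operator types.
type action int

const (
	reconcileNormally action = iota
	skipForMigration          // Migrating: leave the APIService to the migration controller
	patchOperatorEnv          // Converged: set CALICO_API_GROUP on the operator Deployment
)

func (a action) String() string {
	switch a {
	case skipForMigration:
		return "skip"
	case patchOperatorEnv:
		return "patch-operator-env"
	default:
		return "reconcile"
	}
}

// decide maps the DatastoreMigration state to an action. found is false
// when no DatastoreMigration CR exists.
func decide(phase string, found bool) action {
	if !found {
		return reconcileNormally
	}
	switch phase {
	case PhaseMigrating:
		return skipForMigration
	case PhaseConverged:
		return patchOperatorEnv
	default:
		return reconcileNormally
	}
}

func main() {
	for _, p := range []string{PhaseMigrating, PhaseConverged, PhaseComplete} {
		fmt.Println(p, decide(p, true))
	}
}
```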

Installation controller (pkg/controller/installation/):

  • Reconciles migration RBAC early (before blocking operations like namespace migration)
  • Checks migration phase and calls apigroup.Set(V3) when Converged/Complete for immediate env injection
  • Deferred watch on DatastoreMigration CRs for prompt reconciliation on phase changes
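
The ordering constraint from the first bullet can be sketched as follows. installReconciler and its function fields are hypothetical stand-ins for the installation controller's dependencies; the point is only that the migration RBAC is applied before the first step that can return early.

```go
package main

import (
	"errors"
	"fmt"
)

var errBlocked = errors.New("waiting for namespace migration")

// installReconciler is a hypothetical stand-in for the installation
// controller; each dependency is a function so the ordering can be
// exercised without a cluster.
type installReconciler struct {
	migrationCRExists      func() bool
	setMigrationRBAC       func(present bool) error
	namespaceMigrationDone func() bool
	renderComponents       func() error
}

// Reconcile applies the migration RBAC before any step that can block, so
// the ClusterRole tracks the DatastoreMigration CR even while the rest of
// the reconcile is stuck.
func (r *installReconciler) Reconcile() error {
	if err := r.setMigrationRBAC(r.migrationCRExists()); err != nil {
		return err
	}
	if !r.namespaceMigrationDone() {
		return errBlocked
	}
	return r.renderComponents()
}

func main() {
	rbacPresent := false
	r := &installReconciler{
		migrationCRExists:      func() bool { return true },
		setMigrationRBAC:       func(p bool) error { rbacPresent = p; return nil },
		namespaceMigrationDone: func() bool { return false },
		renderComponents:       func() error { return nil },
	}
	err := r.Reconcile()
	fmt.Println("err:", err, "| migration RBAC present:", rbacPresent)
}
```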

Dynamic migration RBAC (pkg/render/kubecontrollers/migration_rbac.go): Creates/deletes a ClusterRole+ClusterRoleBinding granting kube-controllers broad access to both API groups, apiservices, and CRDs. Only present when a DatastoreMigration CR exists.
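
A sketch of a bool-driven RBAC component, assuming it returns the same ClusterRole and ClusterRoleBinding either for creation or for deletion. The object name and the exact verbs and resources are assumptions drawn from the commit messages (APIService removal plus create for abort/restore, CRD deletion for v1 cleanup); PolicyRule and object are trimmed local types, not rbacv1.

```go
package main

import "fmt"

// Trimmed stand-ins for rbacv1.PolicyRule and a Kubernetes object.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

type object struct {
	Kind  string
	Name  string
	Rules []PolicyRule
}

// Hypothetical name; the real object name is not given in the PR text.
const migrationRBACName = "calico-kube-controllers-migration"

// migrationRBAC returns the RBAC objects to create when a DatastoreMigration
// CR exists, and the same objects to delete when it does not.
func migrationRBAC(migrationExists bool) (toCreate, toDelete []object) {
	role := object{
		Kind: "ClusterRole",
		Name: migrationRBACName,
		Rules: []PolicyRule{
			// Broad access to both Calico API groups (scope assumed).
			{APIGroups: []string{"projectcalico.org", "crd.projectcalico.org"}, Resources: []string{"*"}, Verbs: []string{"*"}},
			// APIService removal during migration, create for abort/restore.
			{APIGroups: []string{"apiregistration.k8s.io"}, Resources: []string{"apiservices"}, Verbs: []string{"get", "create", "delete"}},
			// CRD deletion for v1 cleanup.
			{APIGroups: []string{"apiextensions.k8s.io"}, Resources: []string{"customresourcedefinitions"}, Verbs: []string{"get", "list", "delete"}},
		},
	}
	binding := object{Kind: "ClusterRoleBinding", Name: migrationRBACName}
	objs := []object{role, binding}
	if migrationExists {
		return objs, nil
	}
	return nil, objs
}

func main() {
	create, remove := migrationRBAC(true)
	fmt.Println("CR present:", len(create), "to create,", len(remove), "to delete")
	create, remove = migrationRBAC(false)
	fmt.Println("CR absent:", len(create), "to create,", len(remove), "to delete")
}
```

Returning the objects for deletion when the CR is absent means a ClusterRole left over from an earlier migration is cleaned up on the next reconcile.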

Static kube-controllers RBAC: Grants read access to migration.projectcalico.org DatastoreMigration CRs for the migration controller's informer.

Companion PRs:

Three changes to support the v1-to-v3 CRD migration controller:

1. UseV3CRDS() now checks for a DatastoreMigration CR in Converged/Complete
   phase before falling through to API discovery. This handles operator
   restarts mid-migration correctly. Signature changed to take rest.Config
   instead of kubernetes.Interface since it creates clients internally now.

2. APIServer controller goes hands-off during migration. When a
   DatastoreMigration CR is in Migrating phase, reconciliation is skipped
   so the migration controller can own the APIService. When the CR reaches
   Converged, the controller patches the operator's own deployment with
   CALICO_API_GROUP=projectcalico.org/v3 to trigger a rolling restart
   into v3 CRD mode.

3. ComponentHandler generically injects CALICO_API_GROUP into all
   containers of every Deployment/DaemonSet/StatefulSet it reconciles.
   The env var is auto-detected from the operator's own environment in
   the constructor, so no per-component plumbing is needed.

Grant calico-kube-controllers permissions for the datastore migration
controller: DatastoreMigration CRs, APIService access for removal
during migration, and CRD deletion for v1 cleanup.

When UseV3CRDS detects v3 mode via the DatastoreMigration CR, the
CALICO_API_GROUP env var isn't set on the process. Rather than
relying on the env var, main.go now calls SetCalicoAPIGroup()
based on the detection result, and componentHandler picks it up
from the package-level variable.

Add a MigrationRBACComponent in the kubecontrollers render package
that creates a ClusterRole/ClusterRoleBinding granting kube-controllers
broad access to both projectcalico.org and crd.projectcalico.org API
groups. The installation controller checks for the DatastoreMigration
CR on each reconcile and creates or deletes the RBAC accordingly.

The apiservices and CRD permissions are only needed during migration,
so move them from the static kube-controllers ClusterRole into the
dynamic migration ClusterRole that gets created/deleted based on
DatastoreMigration CR existence. Also add the 'create' verb for
apiservices (needed for abort/restore).

The installation controller needs to re-reconcile promptly when the
DatastoreMigration phase changes so it can inject CALICO_API_GROUP
into components. Without this watch, it only picks up changes on the
5-minute periodic reconcile.

Uses a deferred watch via WaitToAddResourceWatch since the
DatastoreMigration CRD may not be installed.

Move the MigrationRBACComponent reconciliation to the top of the
Reconcile function, before namespace migration and calico-node
readiness checks. This ensures the migration ClusterRole is created
promptly when a DatastoreMigration CR appears, even if the rest of
the reconcile is blocked on other conditions.

@caseydavenport (Member Author)

Companion PRs:

- Move DatastoreMigration API check out of render package into the
  installation controller; MigrationRBACComponent now takes a bool
- Rename triggerOperatorRestart to setAPIGroupEnvVar
- Add Job and CronJob to injectAPIGroupEnv
- Add debug logging to MigrationRBACComponent path

Replace WaitToAddResourceWatch (which panics on unstructured objects
due to ContextLoggerForResource casting to ObjectMetaAccessor) with
a custom watch loop that polls the discovery API and calls WatchObject
directly with the unstructured object.
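
The polling approach can be sketched with the standard library alone. crdInstalled and addWatch are hypothetical stand-ins for the discovery check and the WatchObject call; the poll interval, and treating discovery errors as "not installed yet", are assumptions. The loop adds the watch at most once and stops when the context is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForWatchAndAdd polls until crdInstalled reports true, then calls
// addWatch once. It returns addWatch's error, or ctx.Err() if the context
// ends first.
func waitForWatchAndAdd(ctx context.Context, interval time.Duration,
	crdInstalled func() (bool, error), addWatch func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := crdInstalled()
		if err == nil && ok {
			return addWatch()
		}
		// Discovery errors are treated as "not installed yet"; retry on the next tick.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo runs the loop against a fake that reports the CRD as installed on
// poll number readyAt, with an overall timeout in milliseconds.
func demo(readyAt, timeoutMs int) (polls int, added bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err = waitForWatchAndAdd(ctx, 5*time.Millisecond,
		func() (bool, error) { polls++; return polls >= readyAt, nil },
		func() error { added = true; return nil })
	return polls, added, err
}

func main() {
	polls, added, err := demo(3, 1000)
	fmt.Println("err:", err, "| watch added:", added, "| polls:", polls)
}
```
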

UseV3CRDS only runs at startup, so if the operator boots before the
migration reaches Converged, the component handler has no API group
to inject. The installation controller now checks the migration phase
on each reconcile and calls SetCalicoAPIGroup when it sees Converged
or Complete. This closes the gap between migration completing and
the operator reacting.

Also add a mutex to SetCalicoAPIGroup since it's now called from
both main() and the installation controller reconcile goroutine.

Move SetCalicoAPIGroup / getCalicoAPIGroupEnvs into a standalone
package with typed constants (V1, V3) instead of passing strings.
The mutex and env var construction are encapsulated — callers use
apigroup.Set(apigroup.V3) and apigroup.EnvVars().

- Extract shared datastoremigration package with GetPhase, Exists,
  WaitForWatchAndAdd, GVR, and phase constants. Removes duplication
  between apiserver and installation controllers.
- Add deferred DatastoreMigration watch to apiserver controller
- Remove dynamicClient from apiserver controller (no longer needed)
- Update env var injection to merge in place (update existing env var
  instead of appending a duplicate)
- Add comment explaining why installation controller sets apigroup
  directly in addition to the apiserver controller restart path

caseydavenport marked this pull request as ready for review March 18, 2026 18:33
caseydavenport requested a review from a team as a code owner March 18, 2026 18:33

- Consolidate DatastoreMigration GVR into apis.DatastoreMigrationGVR
- Construct dynamic client once per controller instead of per-reconcile
- Use sync.RWMutex for read-heavy apigroup accessors
- Remove redundant apigroup.Set from installation controller
- Add unit tests for apigroup and datastoremigration packages