Add startup probe to calico-node for faster rolling updates #4562
Closed
caseydavenport wants to merge 1 commit into tigera:master
The calico-node readiness probe checks Felix health and BIRD status via cheap local calls (HTTP to localhost and a unix socket command). Previously the readiness probe used the Kubernetes default 10s period, which meant each node took 10-30s to be marked ready during rollouts.

Add a startup probe with a 5s period that runs the same check. Kubernetes doesn't start the readiness/liveness probes until the startup probe succeeds, so this gives fast initial detection during rolling updates while keeping steady-state probes at the default interval. The startup probe allows up to 2 minutes for initial startup (failureThreshold=24 x periodSeconds=5 = 120s).

On a 4-node cluster this reduces DaemonSet rollout from ~5 minutes to ~2 minutes. On larger clusters the improvement scales linearly with node count.
caseydavenport (Member, Author) commented:

Closing: the startup probe adds risk of restart loops on slow-starting nodes for marginal rollout speed improvement (~5-10s per node). The readiness and startup checks are identical, so the startup probe doesn't buy us much here.
The calico-node readiness probe checks Felix health and BIRD status via cheap local calls (HTTP to localhost, birdcl on a unix socket). These complete in milliseconds.

This adds a startup probe with `periodSeconds: 5` and `failureThreshold: 24` (2 minute startup budget). Kubernetes doesn't start readiness/liveness probes until the startup probe succeeds, so this gives fast initial ready detection during pod startup while keeping the steady-state readiness check at the default 10s interval.

The main benefit is decoupling startup from steady-state: if we ever want to relax the readiness probe period for large clusters, the startup probe ensures rollout speed isn't affected. The immediate improvement is modest (~5-10s per node during rolling updates).
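For readers who want to see what the described configuration could look like, here is a minimal sketch of the calico-node container's probes. It is not the diff from this PR. The `periodSeconds` and `failureThreshold` values come from the commit message; the `exec` command is an assumption, based on the readiness command commonly seen in calico-node manifests, and the surrounding container spec is abbreviated.

```yaml
# Illustrative fragment only; not the actual change from this PR.
containers:
  - name: calico-node
    # Runs first. Readiness and liveness probes are held back
    # until this probe succeeds once.
    startupProbe:
      exec:
        command:              # assumed: same check as the readiness probe
          - /bin/calico-node
          - -felix-ready
          - -bird-ready
      periodSeconds: 5        # from the commit message
      failureThreshold: 24    # 24 x 5s = 120s startup budget
    # Steady-state check, left at the Kubernetes default period.
    readinessProbe:
      exec:
        command:              # assumed command, as above
          - /bin/calico-node
          - -felix-ready
          - -bird-ready
      periodSeconds: 10       # Kubernetes default, written out for clarity
```

One design point worth noting: unlike a failing readiness probe, which only marks the pod not ready, a startup probe that exhausts its `failureThreshold` causes the kubelet to kill and restart the container. That is the restart-loop risk on slow-starting nodes raised in the closing comment.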