
Make Retry Period configurable#1083

Open
viragvoros wants to merge 2 commits into gardener:master from viragvoros:configurable-retry-period

Conversation

@viragvoros

@viragvoros viragvoros commented Mar 5, 2026

This PR makes the retry period used when machine creation fails with a codes.ResourceExhausted error configurable.

PR #981 introduced the LongRetry retry period (10 minutes) to handle situations where machine creation fails due to exhausted resources. However, in certain environments this retry duration may still be insufficient.
To improve flexibility, this PR allows operators to adjust the retry period used specifically for the ResourceExhausted error case. The default behavior stays unchanged: if no value is provided via the `--resource-exhausted-retry` command-line flag of the machine-controller-manager-provider, the controller continues to use machineutils.LongRetry.

Which issue(s) this PR fixes:
Fixes #977
Extends logic introduced in PR #981 to make the retry duration configurable.

Special notes for your reviewer:
The implementation keeps the existing default behavior unchanged:

  • The ResourceExhausted retry period defaults to machineutils.LongRetry.
  • The value can optionally be overridden with the --resource-exhausted-retry CLI flag in machine-controller-manager-provider.
  • The override is applied once during startup before the controller begins running.

No changes are required for existing deployments unless operators want to configure the retry duration.

Release note:

other operator
machine-controller-manager now allows configuring the retry duration for ResourceExhausted errors during machine creation with the `--resource-exhausted-retry` CLI flag in machine-controller-manager-provider. If not specified, the default `machineutils.LongRetry` stays unchanged.

@Kumm-Kai @hasit97

@viragvoros viragvoros requested a review from a team as a code owner March 5, 2026 10:28
@gardener-prow

gardener-prow bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign elankath for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@CLAassistant

CLAassistant commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@gardener-prow gardener-prow bot added the do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Mar 5, 2026
@gardener-prow

gardener-prow bot commented Mar 5, 2026

Welcome @viragvoros!

It looks like this is your first PR to gardener/machine-controller-manager 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if gardener/machine-controller-manager has its own contribution guidelines.

Thank you, and welcome to Gardener. 😃

@gardener-prow gardener-prow bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Mar 5, 2026
@Kumm-Kai
Contributor

Kumm-Kai commented Mar 5, 2026

Thanks @viragvoros 🙂
I'm not 100% certain that changing the retry period for all cases where LongRetry is used doesn't cause any side effects 🤔 At least the SafetyOptions are not affected by it, but a lot of stuff is using LongRetry.

@viragvoros
Author

viragvoros commented Mar 5, 2026

@Kumm-Kai So an entirely new retry period variable would be better maybe?

@Kumm-Kai
Contributor

Kumm-Kai commented Mar 5, 2026

@Kumm-Kai So an entirely new retry period variable would be better maybe?

I would say that both approaches are valid, but at least from our POV, as we are only interested in the ResourceExhausted retry period, a specific setting would introduce only a single timing change.


Labels

cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Machine creation retry too frequently for machines with ResourceExhausted

3 participants