CORS-4336: Add CI jobs for AWS European Sovereign Cloud (EUSC) #75568
liweinan wants to merge 16 commits into openshift:main
@liweinan: This pull request references CORS-4336, which is a valid Jira issue.
@liweinan as we discussed offline, for the new partition we need three types of cluster:
@yunjiang29 Thanks for the review! I'll refactor this PR today.
Force-pushed from 24fed80 to de00d69.
@yunjiang29 Thanks for the detailed review! I'll update the PR accordingly.
Address yunfei's review comments on PR openshift#75568:
1. Job naming convention:
- Rename jobs from -f60 to -f7 suffix (non-destructive tests)
- Update cron schedule to standard f7 pattern: 7,14,23,30
2. Private cluster configuration:
- Add complete private cluster setup with bastion host
- Add VPC, security groups, and proxy configuration
- Set PUBLISH=Internal for private cluster access
- Add minimal IAM permission provisioning
- Follow pattern from cucushift-installer-rehearse-aws-ipi-private-provision
3. AMI configuration fix:
- Replace deprecated compute.platform.aws.amiID field
- Use platform.aws.defaultMachinePlatform.amiID instead
Force-pushed from 4b73bfe to 7f83d83.
1. Job naming convention:
- Rename jobs from -f60 to -f7 suffix (non-destructive tests)
- Update cron schedule to standard f7 pattern: 7,14,23,30
2. Private cluster configuration:
- Add complete private cluster setup with bastion host
- Add VPC, security groups, and proxy configuration
- Set PUBLISH=Internal for private cluster access
- Add minimal IAM permission provisioning
- Follow pattern from cucushift-installer-rehearse-aws-ipi-private-provision
3. AMI configuration fix:
- Replace deprecated compute.platform.aws.amiID field
- Use platform.aws.defaultMachinePlatform.amiID instead
4. Generalize step registry components for reusability:
- Enhance ipi-conf-aws-custom-endpoints to support multiple AWS partitions
* Add AWS_DOMAIN_SUFFIX env var (defaults to amazonaws.com)
* Support amazonaws.eu for EUSC, amazonaws.com.cn for China
* Allow full URLs for maximum flexibility
- Make ipi-conf-aws-eusc-ami more generic
* Support AWS_CUSTOM_AMI_ID for general use
* Maintain AWS_EUSC_AMI_ID for backward compatibility
* Can be used for EUSC, China, GovCloud, or custom AMI scenarios
- Use generic steps in EUSC provision chain with partition-specific config
- Remove obsolete ipi-conf-aws-eusc-endpoints (replaced by generic version)
Force-pushed from 7f83d83 to 55daf83.
1. Job naming convention:
- Rename jobs from -f60 to -f7 suffix (non-destructive tests)
- Update cron schedule to standard f7 pattern: 7,14,23,30
2. Private cluster configuration:
- Add complete private cluster setup with bastion host
- Add VPC, security groups, and proxy configuration
- Set PUBLISH=Internal for private cluster access
- Add minimal IAM permission provisioning
- Follow pattern from cucushift-installer-rehearse-aws-ipi-private-provision
3. Generalize step registry components for maximum reusability:
a) Enhance ipi-conf-aws-custom-endpoints for all AWS partitions:
- Add AWS_DOMAIN_SUFFIX env var (defaults to amazonaws.com)
- Support amazonaws.eu (EUSC), amazonaws.com.cn (China)
- Allow full URLs for maximum flexibility
- Remove obsolete ipi-conf-aws-eusc-endpoints step
b) Extend ipi-conf-aws to support custom AMI configuration:
- Add AWS_AMI_ID env var for custom RHCOS AMI
- Useful for EUSC, China, GovCloud, or any partition without public AMIs
- Fix deprecated amiID field -> defaultMachinePlatform.amiID
- Auto-detection still works for C2S/SC2S
- Remove obsolete ipi-conf-aws-eusc-ami step
c) EUSC provision chain now uses only generic steps with env config
This refactoring reduces code duplication (net -59 lines) and makes step
components reusable across all AWS partitions.
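The domain-suffix handling described in 3a can be sketched roughly as follows. This is an illustration only, not the contents of ipi-conf-aws-custom-endpoints-commands.sh: the helper name `endpoint_url` and the `https://<service>.<region>.<suffix>` layout are assumptions, while the `AWS_DOMAIN_SUFFIX` variable and its `amazonaws.com` default come from the commit message.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not from the PR): build a regional endpoint URL.
# AWS_DOMAIN_SUFFIX falls back to amazonaws.com when unset or empty.
endpoint_url() {
  local service="$1"
  local region="$2"
  local suffix="${AWS_DOMAIN_SUFFIX:-amazonaws.com}"
  # Assumed URL layout: https://<service>.<region>.<suffix>
  printf 'https://%s.%s.%s\n' "${service}" "${region}" "${suffix}"
}

# Usage, each in a subshell so the override does not leak:
( unset AWS_DOMAIN_SUFFIX; endpoint_url ec2 us-east-1 )
( AWS_DOMAIN_SUFFIX=amazonaws.eu; endpoint_url ec2 eusc-de-east-1 )
( AWS_DOMAIN_SUFFIX=amazonaws.com.cn; endpoint_url s3 cn-north-1 )
```

Keeping the suffix in a single variable means a partition-specific job only has to set one value instead of carrying its own copy of the step.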
Force-pushed from 55daf83 to c6c4827.
Related PRs merged: #75441 / openshift/ci-tools#4973
We can see
After discussing with Yunfei offline, we decided to use the AMI noted here: #75568 (comment)
Implement comprehensive CI infrastructure for AWS EUSC partition in eusc-de-east-1 region.
Job coverage (9 jobs):
- Common IPI: aws-eusc-ipi-f7, aws-eusc-ipi-f28-destructive, aws-eusc-ipi-fips-f7
- Private: aws-eusc-ipi-private-f7, aws-eusc-ipi-private-f28-destructive, aws-eusc-ipi-private-fips-f7
- Disconnected: aws-eusc-ipi-disconnected-private-f7
- STS: aws-eusc-ipi-sts-f7
- KMS: aws-eusc-ipi-byo-kms-f7
Key features:
- Dynamic service endpoint auto-detection from AWS API
- Split AMI variables (CONTROL_PLANE_AMI, COMPUTE_AMI) for flexible configuration
- Complete private cluster deprovision cleanup (bastion, security groups, stacks, S3)
- Support for FIPS-enabled clusters
- Disconnected (air-gapped) private cluster support
- STS (Security Token Service) authentication with OIDC
- Custom KMS key encryption for etcd
- Both non-destructive (f7) and destructive (f28) test variants
Technical implementation:
- Cluster profile: aws-eusc with automatic region detection
- Custom RHCOS AMI support for control plane and compute nodes separately
- Endpoint auto-detection from AWS API (no hardcoded values)
- Manual credentials mode for CCO
- Minimal IAM permissions
- Mirror registry for disconnected environments
- Backward compatible with existing AWS partitions
Workflows created:
- cucushift-installer-rehearse-aws-eusc-ipi (common IPI)
- cucushift-installer-rehearse-aws-eusc-ipi-private (private cluster)
- cucushift-installer-rehearse-aws-eusc-ipi-disconnected-private (disconnected)
- cucushift-installer-rehearse-aws-eusc-ipi-sts (STS authentication)
- cucushift-installer-rehearse-aws-eusc-ipi-byo-kms (custom KMS key)
Signed-off-by: Wei Li <weli@redhat.com>
This refactors the AWS European Sovereign Cloud (EUSC) CI configuration to maximize reuse of standard AWS workflows and reduce maintenance burden.
Changes based on @yunjiang29's review feedback:
- Reduced from 9 to 6 jobs following the pattern: 3 cluster types × 2 test types
- Improved FIPS coverage from 1/9 (11%) to 2/6 (33%) jobs:
  * aws-eusc-ipi-fips-f7 (IPI + FIPS)
  * aws-eusc-ipi-private-sts-fips-f7 (Private + STS + FIPS)
- Combined features across jobs:
  * aws-eusc-ipi-f28-destructive (destructive testing)
  * aws-eusc-ipi-private-mini-perm-f28 (Private + minimal permissions)
  * aws-eusc-ipi-disc-priv-kms-f7 (Disconnected + KMS)
  * aws-eusc-ipi-disc-priv-f28 (Disconnected destructive)
- All jobs cover: FIPS, STS, KMS, minimal permissions across 3 cluster types
- Deleted 15 EUSC-specific files, created 8 new ones (net reduction: -7 files)
- Maximized reuse of standard AWS workflows:
  * Basic IPI: reuses cucushift-installer-rehearse-aws-ipi-deprovision
  * Private: reuses cucushift-installer-rehearse-aws-ipi-private-deprovision
  * Disconnected: reuses cucushift-installer-rehearse-aws-ipi-disconnected-private-provision
  * Private-STS: reuses cucushift-installer-rehearse-aws-ipi-private-cco-manual-security-token-service
- EUSC-specific changes limited to:
  * Inserting ipi-conf-aws-custom-endpoints ref for service endpoint configuration
  * Custom provision chain for disconnected-private-kms (combines disconnected + KMS)
- Deleted all EUSC-specific deprovision chains (reuse standard chains)
- Removed unnecessary byo-kms and STS specific directory structures

1. Custom endpoints (ipi-conf-aws-custom-endpoints-commands.sh):
- Removed auto-detection logic for AWS_DOMAIN_SUFFIX
- Simplified to use environment variable or default to "amazonaws.com"
- Removed Route53 endpoint configuration (global service)
- Designed for easy removal when installer adds native EUSC support
2. AMI configuration (ipi-conf-aws-commands.sh):
- Simplified from split variables (CONTROL_PLANE_AMI/COMPUTE_AMI) to single CONTROL_PLANE_AMI
- Preserved C2S/SC2S auto-detection logic
- Removed complex heredoc patching, kept simple approach
- Updated documentation for clarity

1. **Minimize EUSC-specific code**: Only 8 workflow files vs 15 previously
2. **Maximize standard workflow reuse**: Follows USGov pattern, not C2S pattern
3. **Prepare for future evolution**: Custom endpoints easy to remove when installer supports EUSC natively
4. **FIPS coverage aligned with USGov**: 33% vs USGov's 18%, not C2S's 100%

- make update completed successfully
- All 6 jobs generated in ci-operator/jobs/.../periodics.yaml
- Step registry validation passed

Addresses: openshift#75568
Address review feedback to simplify configuration scripts:
- Remove AWS_DOMAIN_SUFFIX from step parameters (use cluster profile)
- Support separate CONTROL_PLANE_AMI and COMPUTE_AMI configuration
- Replace yq-go with yq v4 for YAML manipulation
- Eliminate unnecessary fallback logic, rely on correct parameter passing
- Remove intermediate variables (RHCOS_AMI) in C2S auto-detection
These changes follow existing script patterns and maintain compatibility with C2S/SC2S auto-detection while enabling flexible AMI configuration for partitions like EUSC.
Fixes CI check error that enforces OWNERS files for all component configuration directories.
- Update BASE_DOMAIN from qe.devcluster.openshift.com to ci-eusc.devcluster.openshift.com for all AWS EUSC CI jobs to use the dedicated delegated subdomain for CI/QE account
- Add 8 multi-arch EUSC CI jobs in openshift-tests-private release-4.22 multi-nightly:
  * BYO KMS encryption with FIPS (ARM f7, AMD f28-destructive)
  * Disconnected private (ARM f7, AMD f28-destructive)
  * Private STS (ARM f7, AMD f28-destructive)
  * Custom DNS with minimal permissions (ARM f7, AMD f28-destructive)
- Add e2e-aws-eusc-techpreview jobs to openshift/installer configs:
  * release-4.22, release-4.23, release-5.0, and main
- Add installer repo to aws-eusc cluster profile owners
- Restore version info comments in ipi-conf-aws-commands.sh
All jobs use cluster_profile: aws-eusc with BASE_DOMAIN: ci-eusc.devcluster.openshift.com and FEATURE_SET: TechPreviewNoUpgrade.
The installer now configures service endpoints implicitly for EUSC partition, so manual endpoint configuration via ipi-conf-aws-custom-endpoints is no longer needed.
Changes:
- Remove ipi-conf-aws-custom-endpoints from all 5 EUSC workflow files
- Update documentation to reflect implicit endpoint configuration
- Simplify workflow by relying on installer's built-in EUSC support
This addresses review feedback from yunjiang29 that the installer handles endpoints automatically for special AWS partitions like EUSC.
Update generated Prow job configurations after rebasing to the latest origin/main.
Changes include:
- Updated cluster assignments to match current build cluster distribution
- EUSC jobs properly integrated with latest job generation logic
Delete 4 EUSC-specific workflows and 2 provision chains, replacing them with standard AWS workflows. This reduces maintenance burden and ensures consistency with standard AWS job configurations.
Changes:
- Delete cucushift-installer-rehearse-aws-eusc-ipi workflow
- Delete cucushift-installer-rehearse-aws-eusc-ipi-private workflow
- Delete cucushift-installer-rehearse-aws-eusc-ipi-private-sts workflow
- Delete cucushift-installer-rehearse-aws-eusc-ipi-disconnected-private workflow
- Delete cucushift-installer-rehearse-aws-eusc-ipi provision chain
- Delete cucushift-installer-rehearse-aws-eusc-ipi-private provision chain
Modified 5 jobs to use standard AWS workflows:
- aws-eusc-ipi-fips-f7 → cucushift-installer-rehearse-aws-ipi
- aws-eusc-ipi-f28-destructive → cucushift-installer-rehearse-aws-ipi
- aws-eusc-ipi-private-sts-fips-f7 → aws-ipi-private-cco-manual-security-token-service
- aws-eusc-ipi-private-mini-perm-f28 → cucushift-installer-rehearse-aws-ipi-private
- aws-eusc-ipi-disc-priv-f28 → cucushift-installer-rehearse-aws-ipi-disconnected-private
All modified jobs now include:
- cluster_profile: aws-eusc (handles region and AMI configuration)
- COMPUTE_NODE_TYPE: m5.xlarge
- CONTROL_PLANE_INSTANCE_TYPE: m6i.xlarge
Preserved for further discussion:
- cucushift-installer-rehearse-aws-eusc-ipi-disconnected-private-kms (unique combination not available in standard AWS workflows)
Result: -300 lines, 100% workflow reuse for modified jobs
- Delete last EUSC-specific workflow: cucushift-installer-rehearse-aws-eusc-ipi-disconnected-private-kms
- Delete associated provision chain
- Update aws-eusc-ipi-disc-priv-kms-f7 job to use standard cucushift-installer-rehearse-aws-ipi-disconnected-private workflow
- Add COMPUTE_NODE_TYPE and CONTROL_PLANE_INSTANCE_TYPE env vars to the job
All EUSC jobs now use standard AWS workflows with cluster_profile: aws-eusc. This completes the refactoring based on review feedback.
Two critical bug fixes in ipi-conf-aws-commands.sh:
1. Fix CONTROL_PLANE_AMI being unconditionally overwritten
- Before: Always fetched from GitHub in C2S/SC2S environments
- After: Only auto-detect if user hasn't provided CONTROL_PLANE_AMI
- Impact: Users can now override AMI for control plane nodes
2. Fix COMPUTE_AMI being unconditionally overwritten
- Before: COMPUTE_AMI="${CONTROL_PLANE_AMI}" (always overwrites)
- After: COMPUTE_AMI="${COMPUTE_AMI:-${CONTROL_PLANE_AMI}}" (respects user value)
- Impact: Users can now specify different AMIs for compute nodes
Both fixes are 100% backward compatible with existing jobs.
All current C2S/SC2S jobs don't set these env vars, so behavior unchanged.
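The two fixes above boil down to "only fill in a value the caller left empty". A minimal sketch of that pattern follows; it is not the actual ipi-conf-aws-commands.sh. `detect_rhcos_ami` is a hypothetical stand-in for the real C2S/SC2S auto-detection, and `resolve_amis` is an illustrative wrapper; only the `COMPUTE_AMI="${COMPUTE_AMI:-${CONTROL_PLANE_AMI}}"` expression is taken from the commit message.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for the real auto-detection; returns a fake AMI ID.
detect_rhcos_ami() {
  echo "ami-autodetected"
}

resolve_amis() {
  # Fix 1: auto-detect only when the caller did not provide CONTROL_PLANE_AMI.
  if [ -z "${CONTROL_PLANE_AMI:-}" ]; then
    CONTROL_PLANE_AMI="$(detect_rhcos_ami)"
  fi
  # Fix 2: fall back to the control-plane AMI only when COMPUTE_AMI is
  # unset or empty, instead of overwriting it unconditionally.
  COMPUTE_AMI="${COMPUTE_AMI:-${CONTROL_PLANE_AMI}}"
  echo "control-plane=${CONTROL_PLANE_AMI} compute=${COMPUTE_AMI}"
}

# Usage in a subshell: a caller-provided pair is left untouched.
( CONTROL_PLANE_AMI=ami-cp; COMPUTE_AMI=ami-worker; resolve_amis )
```

With neither variable set, both end up as the auto-detected value, which is why jobs that never set them should see no behavior change.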
Changes per yunjiang29's review comments:
1. Remove all 6 EUSC jobs from amd64-nightly.yaml
- All EUSC jobs now run against multi-nightly payload only
- ARM for non-destructive (f7), AMD for destructive (f28)
2. Fix ipi-conf-aws-commands.sh for C2S/SC2S:
- Restore version info comment: "# custom rhcos ami for non-public regions"
- Restore inline comments: "# 4.9 and below" and "# 4.10 and above"
- Add COMPUTE_AMI and echo in C2S block
- Remove unreasonable default COMPUTE_AMI logic outside C2S block
3. Fix multi-nightly.yaml jobs:
a) Rename KMS job to include "etcd" and meet 61-char limit:
aws-eusc-ipi-byo-kms-encryption-fips-tp-amd-f28-destructive
→ aws-eusc-ipi-byo-kms-etcd-encryption-fips-tp-f28-destructive
b) Fix KMS config for destructive job:
ENABLE_AWS_KMS_KEY_COMPUTE/CONTROL_PLANE: yes → no
ENABLE_AWS_KMS_KEY_DEFAULT_MACHINE: no → yes
c) Add -mini-perm to STS job names (they use AWS_INSTALL_USE_MINIMAL_PERMISSIONS):
aws-eusc-ipi-private-sts-tp-arm-f7
→ aws-eusc-ipi-private-sts-mini-perm-tp-arm-f7
aws-eusc-ipi-private-sts-tp-amd-f28-destructive
→ aws-eusc-ipi-private-sts-mini-perm-tp-amd-f28-destructive
Result:
- 8 EUSC jobs in multi-nightly (4 ARM f7 + 4 AMD f28-destructive)
- 4 installer presubmit jobs (unchanged)
- 0 EUSC jobs in amd64-nightly
- Total: 12 EUSC jobs (down from 18)
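The rename in 3a is constrained by a length limit on job names. A check along these lines could flag over-long names before jobs are regenerated. It is a sketch: the helper name `check_job_name` is hypothetical, and treating 61 as an inclusive maximum is an assumption, since the commit message only says "meet 61-char limit".

```shell
#!/usr/bin/env bash
set -eu

# Assumption: 61 is the maximum allowed length, inclusive.
MAX_JOB_NAME_LEN=61

# Hypothetical helper: succeed if the name fits, fail (non-zero) otherwise.
check_job_name() {
  name="$1"
  if [ "${#name}" -gt "${MAX_JOB_NAME_LEN}" ]; then
    echo "too long (${#name}): ${name}" >&2
    return 1
  fi
  echo "ok (${#name}): ${name}"
}

# The renamed KMS job from 3a fits within the limit.
check_job_name "aws-eusc-ipi-byo-kms-etcd-encryption-fips-tp-f28-destructive"
```

For comparison, adding "etcd" while keeping the "-amd" segment would give a 64-character name, which this check rejects.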
Fix the AMI configuration condition to check that both CONTROL_PLANE_AMI and COMPUTE_AMI are empty before auto-fetching RHCOS AMIs for C2S/SC2S regions.
Add AWS EUSC CI jobs to release-4.23 and release-5.0 multi-nightly configs.
Regenerate jobs after rebase to latest main.
… config
1. Configure Custom DNS jobs to use GCP Cloud DNS domain
- Changed BASE_DOMAIN to qe.gcp.devcluster.openshift.com
- Modified 6 custom DNS jobs across 4.22, 4.23, 5.0
- Custom DNS workflow requires external GCP Cloud DNS for testing
2. Add aws-eusc cluster type support to install scripts
- Modified ipi-install-install-aws-commands.sh (1 location)
- Modified ipi-install-install-commands.sh (2 locations)
- Fixes "Unsupported cluster type 'aws-eusc'" error
3. Configure verified AMI for all EUSC jobs
- Use ami-0b78302f83217d149 (120GB gp3 volume, verified working)
- Added COMPUTE_AMI and CONTROL_PLANE_AMI to 28 jobs:
- 24 openshift-tests-private multi-nightly jobs
- 4 installer presubmit jobs
- Do not use ami-0c5e051fb7dc39f8d (has 2GB volume issue)
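Item 2 concerns install scripts that dispatch on the cluster type and reject values they do not recognize. The general shape of such a fix is sketched below. The function name, the other cluster types in the pattern, and the echoed platform string are illustrative assumptions rather than the contents of ipi-install-install-commands.sh; only the `aws-eusc` value and the "Unsupported cluster type" error text come from the commit message.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical dispatcher: map a cluster type to a platform family.
platform_for_cluster_type() {
  case "$1" in
    # Adding aws-eusc to the existing AWS branch is the whole fix.
    aws|aws-arm64|aws-usgov|aws-eusc)
      echo "aws"
      ;;
    *)
      echo "Unsupported cluster type '$1'" >&2
      return 1
      ;;
  esac
}

platform_for_cluster_type aws-eusc
```

Because every such `case` needs the new value, the commit touches one location in ipi-install-install-aws-commands.sh and two in ipi-install-install-commands.sh.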
Force-pushed from 66c8136 to 2faef2b.
The rebase accidentally reverted two custom DNS jobs back to ci-eusc domain. This commit fixes them to use qe.gcp.devcluster.openshift.com as required by the custom DNS workflow.
[REHEARSALNOTIFIER]
A total of 31942 jobs have been affected by this change.
@yunjiang29 Thanks for the detailed review! I have updated the PR accordingly. @patrickdillon @tthvo openshift/cluster-ingress-operator#1360 has been verified locally (#75568 (comment)); once that PR is merged, I think we can use the jobs here for testing.
@@ -625,7 +625,7 @@ periodics:
   secret:
     secretName: result-aggregator
 - agent: kubernetes
-  cluster: build09
+  cluster: build10
@liweinan you can try to do a rebase to avoid such updates in your PR.
Rehearsal run shows that the credentials are not being found:
PR needs rebase.
Implement continuous integration support for AWS EUSC partition (aws-eusc) in eusc-de-east-1 region. Includes cluster profile definition, service endpoints configuration, custom AMI handling, and periodic test jobs.
This enables OpenShift testing on AWS's new European Sovereign Cloud infrastructure, which requires explicit endpoint configuration and custom RHCOS AMIs not available in public regions.