
Add test sharding, proactive clean, and retry logic for self-hosted CI #1171

Open
sbryngelson wants to merge 17 commits into MFlowCode:master from sbryngelson:ci-test

Conversation

@sbryngelson
Member

@sbryngelson sbryngelson commented Feb 19, 2026

Summary

Hardens self-hosted CI with test sharding, retry logic, and script deduplication.

Test sharding & retry

  • Add --shard i/n flag to ./mfc.sh test — splits tests via modular arithmetic for even distribution
  • Frontier GPU matrix now runs 2 shards per interface (acc/omp), halving wall-clock time
  • Zero-test guard on both --only and --shard — empty results raise an error instead of silent green CI
  • GitHub runner tests retry sporadic failures (at most 5 failed tests) by rerunning the UUIDs recorded in tests/failed_uuids.txt
  • Abort path cleans failed_uuids.txt to prevent stale retries
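
As a rough illustration of the sharding bullets above, the following Python sketch deals cases out round-robin by position. It is not the code in toolchain/mfc/test/test.py: the function name `select_shard`, the 1-based reading of `i`, and the use of `RuntimeError` in place of the toolchain's own exception are all assumptions made for this example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_shard(cases: Sequence[T], shard: str) -> List[T]:
    """Return the cases that belong to shard "i/n" (hypothetical helper, 1-based i)."""
    index_text, _, count_text = shard.partition("/")
    index, count = int(index_text), int(count_text)
    if not 1 <= index <= count:
        raise ValueError(f"invalid shard {shard!r}: expected i/n with 1 <= i <= n")

    # The case at 0-based position k goes to shard (k mod n) + 1, so cases are
    # dealt out round-robin and shard sizes differ by at most one.
    selected = [case for k, case in enumerate(cases) if k % count == index - 1]

    # Zero-test guard: an empty shard is an error, not a silent pass.
    if not selected:
        raise RuntimeError(f"shard {shard} selected zero tests")
    return selected


cases = ["A1", "B2", "C3", "D4", "E5"]
print(select_shard(cases, "1/2"))
print(select_shard(cases, "2/2"))
```

Dealing by position rather than splitting the list into contiguous halves tends to spread neighbouring (and often similarly expensive) cases across shards, which is presumably what "even distribution" refers to.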

--only filter improvements

  • UUIDs use OR logic (match any), labels use AND logic (match all)
  • --only matching zero tests now raises an error instead of silently passing
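
The filter semantics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PR's implementation: cases are modelled as `(trace, uuid)` tuples, `keep_case` and `filter_cases` are hypothetical names, UUIDs are compared case-insensitively, and `RuntimeError` stands in for the toolchain's exception type.

```python
import re
from typing import Iterable, List, Tuple

_UUID_RE = re.compile(r"[0-9A-Fa-f]{8}")

Case = Tuple[str, str]  # (trace, uuid); a simplified stand-in for a test case


def is_uuid(term: str) -> bool:
    """A term counts as a UUID when it is exactly 8 hex characters."""
    return _UUID_RE.fullmatch(term) is not None


def keep_case(trace: str, uuid: str, only: Iterable[str]) -> bool:
    terms = list(only)
    if not terms:
        return True  # no filter given: keep everything
    uuids = {t.lower() for t in terms if is_uuid(t)}
    labels = [t for t in terms if not is_uuid(t)]

    trace_labels = set(trace.split(" -> "))
    # Labels use AND logic: every requested label must appear in the trace.
    label_ok = bool(labels) and all(label in trace_labels for label in labels)
    # UUIDs use OR logic: matching any one requested UUID is enough.
    uuid_ok = uuid.lower() in uuids
    # Mixed input keeps a case if either condition holds.
    return label_ok or uuid_ok


def filter_cases(cases: List[Case], only: Iterable[str]) -> List[Case]:
    terms = list(only)
    kept = [c for c in cases if keep_case(c[0], c[1], terms)]
    if not kept:
        # Zero-match guard: fail loudly instead of exiting successfully.
        raise RuntimeError(f"--only {' '.join(terms)} matched zero tests")
    return kept


cases = [
    ("2D -> Bubbles -> WENO5", "AB12CD34"),
    ("2D -> Viscous", "0F0F0F0F"),
    ("3D -> Bubbles", "12345678"),
]
print(filter_cases(cases, ["2D", "Bubbles"]))
print(filter_cases(cases, ["ab12cd34", "12345678"]))
```

One consequence of classifying terms by shape is that a label that happens to be 8 hex characters would be treated as a UUID in this sketch.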

CI script consolidation

  • Merge submit-bench.sh into submit.sh for all 3 clusters (frontier, frontier_amd, phoenix) — submit.sh auto-detects bench vs test mode from the submitted script's basename
  • Unify frontier/ and frontier_amd/ scripts via directory-name detection — build.sh, bench.sh, submit.sh, and test.sh are now byte-identical across both directories
  • Net deletion of 3 files and ~120 lines of duplicated shell code
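
The real scripts are shell, but the two detection ideas can be shown compactly in Python. The sketch below assumes that a submitted script whose basename starts with "bench" selects bench mode and that anything else is a test job; that rule, and both function names, are illustrative guesses rather than the logic in submit.sh.

```python
from pathlib import PurePosixPath


def detect_job_type(script_path: str) -> str:
    """Guess bench vs test mode from the submitted script's basename (assumed rule)."""
    name = PurePosixPath(script_path).name
    return "bench" if name.startswith("bench") else "test"


def detect_cluster(script_path: str) -> str:
    """Use the name of the directory containing the script as the cluster name."""
    return PurePosixPath(script_path).parent.name


for path in (
    ".github/workflows/frontier/test.sh",
    ".github/workflows/frontier_amd/bench.sh",
    ".github/workflows/phoenix/bench.sh",
):
    print(path, detect_cluster(path), detect_job_type(path))
```

Because both values come from the script's own path, the same file can live unchanged in frontier/ and frontier_amd/, which is what allows the two directories to hold byte-identical copies.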

Other

  • Frontier test jobs use --qos=normal on batch partition (1h59m, CFD154 account)
  • --requeue on Phoenix SLURM jobs for preemption recovery
  • Build retry wrapper (3 attempts with clean between)
  • Pin nick-fields/retry to commit SHA for security on self-hosted runners
  • Lint-gate must pass before self-hosted tests run
  • Skip benchmark workflow for bot review events
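
The build retry wrapper in the list above follows a common pattern: try, clean on failure, try again, give up after a fixed number of attempts. A minimal Python sketch of that pattern is shown here; the callables `build` and `clean` are placeholders for whatever commands the CI scripts actually run, and the function name is hypothetical.

```python
from typing import Callable, List


def build_with_retry(
    build: Callable[[], bool],
    clean: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Run build() up to `attempts` times, calling clean() between failed attempts."""
    for attempt in range(1, attempts + 1):
        if build():
            return True
        # Clean only when another attempt will follow.
        if attempt < attempts:
            clean()
    return False


# Simulated build that fails twice and then succeeds.
attempts_seen: List[int] = []
cleans: List[str] = []


def flaky_build() -> bool:
    attempts_seen.append(len(attempts_seen) + 1)
    return len(attempts_seen) == 3


ok = build_with_retry(flaky_build, lambda: cleans.append("clean"))
print(ok, attempts_seen, cleans)
```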

Depends on: #1170

Test plan

  • Frontier GPU tests run in 2 shards per interface and complete within 2h
  • Phoenix tests pass with --requeue and preemption recovery
  • Lint-gate blocks self-hosted tests on lint failure
  • GitHub runner retry logic fires on ≤5 test failures
  • Benchmark jobs submit correctly via merged submit.sh (bench mode auto-detected)
  • frontier/ and frontier_amd/ scripts are identical and detect cluster correctly
  • --shard with zero resulting tests raises an error (not silent pass)

Copilot AI review requested due to automatic review settings February 19, 2026 20:01
@codeant-ai codeant-ai bot added the size:M This PR changes 30-99 lines, ignoring generated files label Feb 19, 2026

@codeant-ai codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:M This PR changes 30-99 lines, ignoring generated files labels Feb 20, 2026

@sbryngelson sbryngelson marked this pull request as draft February 25, 2026 01:04
sbryngelson and others added 3 commits February 25, 2026 21:23
The -s check already guarantees the file is non-empty, so
NUM_FAILED > 0 is always true in that branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
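
For readers unfamiliar with the retry step this commit touches, here is a hedged Python sketch of the decision it describes: a non-empty tests/failed_uuids.txt already implies at least one failure, so only the upper bound needs checking. The helper name, the limit constant, and the exact command line are assumptions; the real workflow step is shell and likely passes additional options.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

MAX_RETRYABLE_FAILURES = 5  # taken from the "up to 5" wording in the summary


def retry_command(failed_uuids_path: Path) -> Optional[List[str]]:
    """Return a rerun command for the failed UUIDs, or None if no retry applies."""
    # Equivalent of the shell `-s` test: the file exists and is non-empty.
    if not failed_uuids_path.is_file() or failed_uuids_path.stat().st_size == 0:
        return None
    uuids = failed_uuids_path.read_text().split()
    # Too many failures suggests a real regression rather than sporadic noise.
    if not uuids or len(uuids) > MAX_RETRYABLE_FAILURES:
        return None
    return ["./mfc.sh", "test", "--only", *uuids]


with tempfile.TemporaryDirectory() as tmp:
    failed = Path(tmp) / "failed_uuids.txt"
    failed.write_text("AB12CD34\n0F0F0F0F\n")
    print(retry_command(failed))
```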
…zero-match guard

- Include shard in SLURM job_slug to prevent output file collisions
  between parallel shards (e.g., test-gpu-acc-1-of-2.out)
- Consolidate frontier/ and frontier_amd/ submit.sh and test.sh into
  identical scripts that derive compiler flag and config from directory
- Add $shard_opts to CPU test branch for future-proofing
- Add zero-match guard for --only filter to fail loudly instead of
  silently exiting 0 when no tests match
- Hoist failed_uuids_path to single definition at top of test()
- Compute log slug dynamically in test.yml for shard-aware filenames
- Remove unnecessary shard: '' from non-sharded matrix entries
- Replace useless cat|tr pipeline with tr < file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The --only filter now detects whether each term is a UUID (8-char hex)
or a trace label and applies appropriate matching:
  - Labels: AND logic (--only 2D Bubbles matches tests with both)
  - UUIDs: OR logic (--only UUID1 UUID2 matches tests with either)
  - Mixed: keep case if all labels match OR any UUID matches

This preserves the documented behavior for label filtering while
correctly supporting the CI retry path that passes multiple UUIDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
submit.sh now auto-detects job type (bench vs test) from the submitted
script's basename, selecting the appropriate SBATCH account, time limit,
and partition. This eliminates three submit-bench.sh files and makes
frontier/ and frontier_amd/ scripts byte-identical via directory-name
detection for compiler flags and cluster-specific options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sbryngelson and others added 2 commits February 26, 2026 09:40
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise MFCException when --shard produces zero cases (prevents
  silent green CI with nothing executed)
- Pin nick-fields/retry to commit SHA for security on self-hosted
  runners with cluster credentials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sbryngelson sbryngelson marked this pull request as ready for review February 26, 2026 22:07
sbryngelson and others added 2 commits February 26, 2026 18:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace per-case case-optimized builds with one generic build, reducing
build time from ~34 min to ~5-10 min. Halve benchmark timesteps to
compensate for slower non-optimized runtime. Reduce GPU --mem from 12
to 4 GB. Lower test build retry timeout from 480 to 60 minutes.

Closes MFlowCode#1275

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
.github/workflows/phoenix/bench.sh (1)

37-37: Quote $(nproc) to prevent word splitting.

The static analysis tool correctly identifies that $(nproc) should be quoted to prevent potential word splitting issues.

Proposed fix
-    if ./mfc.sh build -j $(nproc) $build_opts; then
+    if ./mfc.sh build -j "$(nproc)" $build_opts; then

Also apply to line 53:

-./mfc.sh bench $bench_opts -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
+./mfc.sh bench $bench_opts -j "$(nproc)" -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
.github/workflows/frontier_amd/build.sh (1)

46-46: Quote the command substitution to prevent word splitting.

Per shellcheck SC2046, the unquoted $(...) can cause unexpected word splitting. While the current output is a single flag, quoting improves robustness.

Suggested fix
-        if ./mfc.sh test -v -a --dry-run $([ "$cluster_name" = "frontier" ] && echo "--rdma-mpi") -j 8 $build_opts; then
+        if ./mfc.sh test -v -a --dry-run "$([ "$cluster_name" = "frontier" ] && echo "--rdma-mpi")" -j 8 $build_opts; then

Note: Quoting the empty string when the condition is false will pass an empty argument. If mfc.sh cannot handle empty arguments gracefully, use a conditional approach instead:

rdma_opt=""
[ "$cluster_name" = "frontier" ] && rdma_opt="--rdma-mpi"
if ./mfc.sh test -v -a --dry-run $rdma_opt -j 8 $build_opts; then
.github/workflows/frontier_amd/submit.sh (1)

21-22: Quote $1 to handle filenames with spaces or special characters.

While unlikely in this CI context, quoting the variable is a shell best practice.

Suggested fix
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents="$(cat "$1")"
.github/workflows/phoenix/submit.sh (1)

12-13: Quote $1 for robustness.

Same recommendation as the frontier submit scripts.

Suggested fix
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents="$(cat "$1")"

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1412eb2 and 73fd804.

📒 Files selected for processing (24)
  • .github/scripts/submit_and_monitor_bench.sh
  • .github/workflows/bench.yml
  • .github/workflows/frontier/bench.sh
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit-bench.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/bench.sh
  • .github/workflows/frontier_amd/build.sh
  • .github/workflows/frontier_amd/submit-bench.sh
  • .github/workflows/frontier_amd/submit.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit-bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/test.yml
  • benchmarks/5eq_rk3_weno3_hllc/case.py
  • benchmarks/hypo_hll/case.py
  • benchmarks/ibm/case.py
  • benchmarks/igr/case.py
  • benchmarks/viscous_weno5_sgb_acoustic/case.py
  • toolchain/mfc/bench.py
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/test.py
💤 Files with no reviewable changes (3)
  • .github/workflows/frontier/submit-bench.sh
  • .github/workflows/frontier_amd/submit-bench.sh
  • .github/workflows/phoenix/submit-bench.sh

Comment on lines +59 to +63
shard_suffix=""
if [ -n "$4" ]; then
    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
fi
job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3${shard_suffix}"
⚠️ Potential issue | 🟡 Minor

Validate shard format before deriving job_slug.

An invalid shard value currently gets queued and only fails later in test argument parsing. Early validation here avoids wasting scheduler time.

🔧 Proposed fix
 shard_suffix=""
 if [ -n "$4" ]; then
-    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
+    if [[ ! "$4" =~ ^[1-9][0-9]*/[1-9][0-9]*$ ]]; then
+        echo "ERROR: Invalid shard '$4'. Expected i/n (e.g., 1/2)."
+        exit 1
+    fi
+    shard_i="${4%/*}"
+    shard_n="${4#*/}"
+    if [ "$shard_i" -gt "$shard_n" ]; then
+        echo "ERROR: Invalid shard '$4'. Expected i<=n."
+        exit 1
+    fi
+    shard_suffix="-${shard_i}-of-${shard_n}"
 fi
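
To make the validation rules in this suggestion easy to check outside of a SLURM submission, here is an equivalent sketch in Python. It mirrors the regex and the i<=n check from the proposed shell fix; the function names are hypothetical, and reading `$2` and `$3` as device and interface is an assumption based on the example filename test-gpu-acc-1-of-2.out in the commit message.

```python
import re
from pathlib import PurePosixPath

_SHARD_RE = re.compile(r"([1-9][0-9]*)/([1-9][0-9]*)")


def shard_suffix(shard: str) -> str:
    """Turn "i/n" into "-i-of-n", rejecting malformed or out-of-range values."""
    if not shard:
        return ""
    match = _SHARD_RE.fullmatch(shard)
    if match is None:
        raise ValueError(f"Invalid shard {shard!r}. Expected i/n (e.g., 1/2).")
    shard_i, shard_n = int(match.group(1)), int(match.group(2))
    if shard_i > shard_n:
        raise ValueError(f"Invalid shard {shard!r}. Expected i<=n.")
    return f"-{shard_i}-of-{shard_n}"


def job_slug(script: str, device: str, interface: str, shard: str = "") -> str:
    """Build the slug as the shell pipeline does: basename, drop .sh, sanitize."""
    stem = re.sub(r"\.sh$", "", PurePosixPath(script).name)
    stem = re.sub(r"[^a-zA-Z0-9]", "-", stem)
    return f"{stem}-{device}-{interface}{shard_suffix(shard)}"


print(job_slug(".github/workflows/frontier/test.sh", "gpu", "acc", "1/2"))
print(job_slug(".github/workflows/frontier/test.sh", "cpu", "none"))
```

Validating before the slug is built means a typo such as `3/2` or `1-2` is rejected at submission time rather than after the job has waited in the queue.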

Comment on lines +55 to +63
uuids = [t for t in ARG("only") if is_uuid(t)]
labels = [t for t in ARG("only") if not is_uuid(t)]

for case in cases[:]:
    check = set(case.trace.split(" -> "))
    check.add(case.get_uuid())

    label_ok = all(label in check for label in labels) if labels else True
    uuid_ok = any(u in check for u in uuids) if uuids else True
⚠️ Potential issue | 🟡 Minor

Normalize --only UUID terms before matching.

Line 53 accepts uppercase UUID input, but Lines 60 and 63 perform case-sensitive membership checks. That can miss valid UUID filters like AB12CD34.

🔧 Proposed fix
-    uuids  = [t for t in ARG("only") if is_uuid(t)]
+    uuids  = [t.lower() for t in ARG("only") if is_uuid(t)]
@@
-        check.add(case.get_uuid())
+        check.add(case.get_uuid().lower())

@codecov

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (1412eb2) to head (73fd804).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
- Coverage   44.05%   44.04%   -0.02%     
==========================================
  Files          70       70              
  Lines       20496    20499       +3     
  Branches     1991     1993       +2     
==========================================
- Hits         9029     9028       -1     
- Misses      10328    10330       +2     
- Partials     1139     1141       +2     

