Add test sharding, proactive clean, and retry logic for self-hosted CI by sbryngelson · Pull Request #1171 · MFlowCode/MFC

sbryngelson · 2026-02-19T20:01:54Z

Summary

Hardens self-hosted CI with test sharding, retry logic, and script deduplication.

Test sharding & retry

Add --shard i/n flag to ./mfc.sh test — splits tests via modular arithmetic for even distribution
Frontier GPU matrix now runs 2 shards per interface (acc/omp), halving wall-clock time
Zero-test guard on both --only and --shard — empty results raise an error instead of silent green CI
GitHub runner tests retry up to 5 sporadic failures, re-running only the UUIDs recorded in tests/failed_uuids.txt
Abort path cleans failed_uuids.txt to prevent stale retries
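
To make the modular-arithmetic split and the zero-test guard concrete, here is a minimal Python sketch. The function name `select_shard`, the 1-based shard index, and the use of `ValueError` are assumptions for illustration only; the real logic lives in toolchain/mfc/test/test.py and may differ (the PR itself raises MFCException).

```python
def select_shard(cases: list, i: int, n: int) -> list:
    """Return the cases belonging to shard i of n (1-based, round-robin)."""
    if n < 1 or not 1 <= i <= n:
        raise ValueError(f"Invalid shard {i}/{n}. Expected 1 <= i <= n.")
    # Case k goes to shard (k mod n) + 1, so shard sizes differ by at most one.
    selected = [case for k, case in enumerate(cases) if k % n == i - 1]
    if not selected:
        # Fail loudly: an empty shard would otherwise look like a green run.
        raise ValueError(f"Shard {i}/{n} selected zero tests.")
    return selected


cases = [f"case{k}" for k in range(5)]
print(select_shard(cases, 1, 2))  # ['case0', 'case2', 'case4']
print(select_shard(cases, 2, 2))  # ['case1', 'case3']
```

Interleaving by index rather than slicing into contiguous blocks tends to spread expensive neighbouring cases across shards, which is presumably why the summary describes the split as even.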

`--only` filter improvements

UUIDs use OR logic (match any), labels use AND logic (match all)
--only matching zero tests now raises an error instead of silently passing
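
The matching rules above can be sketched roughly as follows. This is an illustrative reading of the PR description, not the code in test.py: the `(uuid, labels)` tuple representation, the function name `filter_only`, and the `ValueError` are assumptions, and the mixed case follows the commit message ("all labels match OR any UUID matches").

```python
import re


def is_uuid(term: str) -> bool:
    """Treat any 8-character hexadecimal string as a test UUID."""
    return re.fullmatch(r"[0-9a-fA-F]{8}", term) is not None


def filter_only(cases, only):
    """Filter (uuid, labels) pairs by a list of --only terms."""
    uuids = {t for t in only if is_uuid(t)}
    labels = [t for t in only if not is_uuid(t)]
    kept = []
    for uuid, case_labels in cases:
        # Labels narrow the selection: every requested label must be present.
        label_ok = bool(labels) and all(label in case_labels for label in labels)
        # UUIDs widen it: any requested UUID is enough.
        uuid_ok = uuid in uuids
        if label_ok or uuid_ok:
            kept.append((uuid, case_labels))
    if not kept:
        # Zero matches is an error, not a silent pass.
        raise ValueError(f"--only {' '.join(only)} matched zero tests.")
    return kept


cases = [
    ("aaaaaaaa", {"2D", "Bubbles"}),
    ("bbbbbbbb", {"2D"}),
    ("cccccccc", {"3D"}),
]
print([u for u, _ in filter_only(cases, ["2D", "Bubbles"])])        # ['aaaaaaaa']
print([u for u, _ in filter_only(cases, ["bbbbbbbb", "cccccccc"])])  # ['bbbbbbbb', 'cccccccc']
```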

CI script consolidation

Merge submit-bench.sh into submit.sh for all 3 clusters (frontier, frontier_amd, phoenix) — submit.sh auto-detects bench vs test mode from the submitted script's basename
Unify frontier/ and frontier_amd/ scripts via directory-name detection — build.sh, bench.sh, submit.sh, and test.sh are now byte-identical across both directories
Net deletion of 3 files and ~120 lines of duplicated shell code
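
The detection idea can be illustrated with a short Python sketch. The actual scripts are shell and their exact rules are not shown in this PR description, so everything here is hypothetical: the function name `detect_job`, the rule that a script named `bench.sh` means bench mode, and the returned dictionary shape.

```python
from pathlib import Path


def detect_job(script_path: str, script_dir: str) -> dict:
    """Infer job mode from the script basename and cluster from the directory name."""
    stem = Path(script_path).stem      # "bench.sh" -> "bench"
    cluster = Path(script_dir).name    # ".github/workflows/frontier_amd" -> "frontier_amd"
    mode = "bench" if stem == "bench" else "test"  # assumed rule, for illustration
    return {"mode": mode, "cluster": cluster}


print(detect_job(".github/workflows/frontier_amd/bench.sh", ".github/workflows/frontier_amd"))
# {'mode': 'bench', 'cluster': 'frontier_amd'}
```

Because nothing cluster-specific is hard-coded in the file itself, the same script can sit unchanged in both frontier/ and frontier_amd/, which is what makes the byte-identical copies possible.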

Other

Frontier test jobs use --qos=normal on batch partition (1h59m, CFD154 account)
--requeue on Phoenix SLURM jobs for preemption recovery
Build retry wrapper (3 attempts with clean between)
Pin nick-fields/retry to commit SHA for security on self-hosted runners
Lint-gate must pass before self-hosted tests run
Skip benchmark workflow for bot review events
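
The build retry wrapper mentioned in this list (3 attempts with a clean between them) follows a common pattern, sketched here in Python. The real wrapper is a shell script; `build_with_retry` and the `build`/`clean` callables are placeholder names assumed for this example.

```python
from typing import Callable


def build_with_retry(
    build: Callable[[], bool],
    clean: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Run build() up to `attempts` times, calling clean() between failed tries."""
    for attempt in range(1, attempts + 1):
        if build():
            return True
        if attempt < attempts:
            # Start the next attempt from a clean tree in case stale
            # artifacts caused the failure.
            clean()
    return False
```

Note that the sketch skips the clean after the final attempt, since there is no further build to prepare for.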

Depends on: #1170

Test plan

Frontier GPU tests run in 2 shards per interface and complete within 2h
Phoenix tests pass with --requeue and preemption recovery
Lint-gate blocks self-hosted tests on lint failure
GitHub runner retry logic fires on ≤5 test failures
Benchmark jobs submit correctly via merged submit.sh (bench mode auto-detected)
frontier/ and frontier_amd/ scripts are identical and detect cluster correctly
--shard with zero resulting tests raises an error (not silent pass)
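
For the retry item in the plan above, the decision of whether to retry can be sketched as below. This is only an approximation of the workflow logic in test.yml: the function name `uuids_to_retry` is invented, and the assumption that the file holds whitespace-separated UUIDs is not confirmed by the PR text.

```python
from pathlib import Path

MAX_RETRIED_FAILURES = 5  # threshold quoted in the PR summary


def uuids_to_retry(failed_file: Path, limit: int = MAX_RETRIED_FAILURES) -> list:
    """Return the UUIDs worth retrying, or [] when no retry should happen."""
    # Equivalent of the shell `-s` test: the file must exist and be non-empty.
    if not failed_file.is_file() or failed_file.stat().st_size == 0:
        return []
    uuids = failed_file.read_text().split()
    # Many failures suggest a real regression rather than sporadic noise.
    if len(uuids) > limit:
        return []
    return uuids
```

A caller would then pass the returned UUIDs back through `./mfc.sh test --only`, which is the retry path that the UUID OR-matching in this PR is meant to support.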

The -s check already guarantees the file is non-empty, so NUM_FAILED > 0 is always true in that branch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…zero-match guard
- Include shard in SLURM job_slug to prevent output file collisions between parallel shards (e.g., test-gpu-acc-1-of-2.out)
- Consolidate frontier/ and frontier_amd/ submit.sh and test.sh into identical scripts that derive compiler flag and config from directory
- Add $shard_opts to CPU test branch for future-proofing
- Add zero-match guard for --only filter to fail loudly instead of silently exiting 0 when no tests match
- Hoist failed_uuids_path to single definition at top of test()
- Compute log slug dynamically in test.yml for shard-aware filenames
- Remove unnecessary shard: '' from non-sharded matrix entries
- Replace useless cat|tr pipeline with tr < file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The --only filter now detects whether each term is a UUID (8-char hex) or a trace label and applies appropriate matching:
- Labels: AND logic (--only 2D Bubbles matches tests with both)
- UUIDs: OR logic (--only UUID1 UUID2 matches tests with either)
- Mixed: keep case if all labels match OR any UUID matches
This preserves the documented behavior for label filtering while correctly supporting the CI retry path that passes multiple UUIDs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

submit.sh now auto-detects job type (bench vs test) from the submitted script's basename, selecting the appropriate SBATCH account, time limit, and partition. This eliminates three submit-bench.sh files and makes frontier/ and frontier_amd/ scripts byte-identical via directory-name detection for compiler flags and cluster-specific options.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Raise MFCException when --shard produces zero cases (prevents silent green CI with nothing executed)
- Pin nick-fields/retry to commit SHA for security on self-hosted runners with cluster credentials
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace per-case case-optimized builds with one generic build, reducing build time from ~34 min to ~5-10 min. Halve benchmark timesteps to compensate for slower non-optimized runtime. Reduce GPU --mem from 12 to 4 GB. Lower test build retry timeout from 480 to 60 minutes.
Closes MFlowCode#1275
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (4)

.github/workflows/phoenix/bench.sh (1)
37-37: Quote $(nproc) to prevent word splitting.

The static analysis tool correctly identifies that $(nproc) should be quoted to prevent potential word splitting issues.
Proposed fix
-    if ./mfc.sh build -j $(nproc) $build_opts; then
+    if ./mfc.sh build -j "$(nproc)" $build_opts; then
Also apply to line 53:
-./mfc.sh bench $bench_opts -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
+./mfc.sh bench $bench_opts -j "$(nproc)" -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
.github/workflows/frontier_amd/build.sh (1)
46-46: Quote the command substitution to prevent word splitting.

Per shellcheck SC2046, the unquoted $(...) can cause unexpected word splitting. While the current output is a single flag, quoting improves robustness.
Suggested fix
-        if ./mfc.sh test -v -a --dry-run $([ "$cluster_name" = "frontier" ] && echo "--rdma-mpi") -j 8 $build_opts; then
+        if ./mfc.sh test -v -a --dry-run "$([ "$cluster_name" = "frontier" ] && echo "--rdma-mpi")" -j 8 $build_opts; then
Note: Quoting the empty string when the condition is false will pass an empty argument. If mfc.sh cannot handle empty arguments gracefully, use a conditional approach instead:
rdma_opt=""
[ "$cluster_name" = "frontier" ] && rdma_opt="--rdma-mpi"
if ./mfc.sh test -v -a --dry-run $rdma_opt -j 8 $build_opts; then
.github/workflows/frontier_amd/submit.sh (1)
21-22: Quote $1 to handle filenames with spaces or special characters.

While unlikely in this CI context, quoting the variable is a shell best practice.
Suggested fix
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents="$(cat "$1")"
.github/workflows/phoenix/submit.sh (1)
12-13: Quote $1 for robustness.

Same recommendation as the frontier submit scripts.
Suggested fix
 if [ ! -z "$1" ]; then
-    sbatch_script_contents=`cat $1`
+    sbatch_script_contents="$(cat "$1")"

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1412eb2 and 73fd804.

📒 Files selected for processing (24)

.github/scripts/submit_and_monitor_bench.sh
.github/workflows/bench.yml
.github/workflows/frontier/bench.sh
.github/workflows/frontier/build.sh
.github/workflows/frontier/submit-bench.sh
.github/workflows/frontier/submit.sh
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/bench.sh
.github/workflows/frontier_amd/build.sh
.github/workflows/frontier_amd/submit-bench.sh
.github/workflows/frontier_amd/submit.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/bench.sh
.github/workflows/phoenix/submit-bench.sh
.github/workflows/phoenix/submit.sh
.github/workflows/test.yml
benchmarks/5eq_rk3_weno3_hllc/case.py
benchmarks/hypo_hll/case.py
benchmarks/ibm/case.py
benchmarks/igr/case.py
benchmarks/viscous_weno5_sgb_acoustic/case.py
toolchain/mfc/bench.py
toolchain/mfc/cli/commands.py
toolchain/mfc/test/test.py

💤 Files with no reviewable changes (3)

.github/workflows/frontier/submit-bench.sh
.github/workflows/frontier_amd/submit-bench.sh
.github/workflows/phoenix/submit-bench.sh

coderabbitai · 2026-02-27T17:41:17Z

.github/workflows/frontier/submit.sh

+shard_suffix=""
+if [ -n "$4" ]; then
+    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
+fi
+job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3${shard_suffix}"


⚠️ Potential issue | 🟡 Minor

Validate shard format before deriving job_slug.

An invalid shard value currently gets queued and only fails later in test argument parsing. Early validation here avoids wasting scheduler time.

🔧 Proposed fix

 shard_suffix=""
 if [ -n "$4" ]; then
-    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
+    if [[ ! "$4" =~ ^[1-9][0-9]*/[1-9][0-9]*$ ]]; then
+        echo "ERROR: Invalid shard '$4'. Expected i/n (e.g., 1/2)."
+        exit 1
+    fi
+    shard_i="${4%/*}"
+    shard_n="${4#*/}"
+    if [ "$shard_i" -gt "$shard_n" ]; then
+        echo "ERROR: Invalid shard '$4'. Expected i<=n."
+        exit 1
+    fi
+    shard_suffix="-${shard_i}-of-${shard_n}"
 fi
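
For readers who prefer to reason about the proposed validation outside of shell, the same checks and slug derivation can be expressed in Python. This is a translation for illustration, not code from the PR; the parameter names `device` and `interface` are guesses at what `$2` and `$3` carry, based on the example filename test-gpu-acc-1-of-2.out quoted in the commit message.

```python
import re
from pathlib import PurePosixPath


def job_slug(script: str, device: str, interface: str, shard: str = "") -> str:
    """Build a job slug such as 'test-gpu-acc-1-of-2', validating the shard first."""
    base = PurePosixPath(script).name
    base = re.sub(r"\.sh$", "", base)          # strip the .sh extension
    base = re.sub(r"[^a-zA-Z0-9]", "-", base)  # sanitize remaining characters
    suffix = ""
    if shard:
        match = re.fullmatch(r"([1-9][0-9]*)/([1-9][0-9]*)", shard)
        if match is None:
            raise ValueError(f"Invalid shard '{shard}'. Expected i/n (e.g., 1/2).")
        i, n = int(match.group(1)), int(match.group(2))
        if i > n:
            raise ValueError(f"Invalid shard '{shard}'. Expected i<=n.")
        suffix = f"-{i}-of-{n}"
    return f"{base}-{device}-{interface}{suffix}"


print(job_slug(".github/workflows/frontier/test.sh", "gpu", "acc", "1/2"))
# test-gpu-acc-1-of-2
```

Rejecting a malformed shard before submission keeps the failure on the submitting host, rather than after the job has waited in the scheduler queue.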

coderabbitai · 2026-02-27T17:41:17Z

toolchain/mfc/test/test.py

+    uuids  = [t for t in ARG("only") if is_uuid(t)]
+    labels = [t for t in ARG("only") if not is_uuid(t)]
+
+    for case in cases[:]:
+        check = set(case.trace.split(" -> "))
+        check.add(case.get_uuid())
+
+        label_ok = all(label in check for label in labels) if labels else True
+        uuid_ok  = any(u in check for u in uuids)  if uuids  else True


⚠️ Potential issue | 🟡 Minor

Normalize --only UUID terms before matching.

Line 53 accepts uppercase UUID input, but Lines 60 and 63 perform case-sensitive membership checks. That can miss valid UUID filters like AB12CD34.

🔧 Proposed fix

-    uuids  = [t for t in ARG("only") if is_uuid(t)]
+    uuids  = [t.lower() for t in ARG("only") if is_uuid(t)]
@@
-        check.add(case.get_uuid())
+        check.add(case.get_uuid().lower())
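
A small Python illustration of the mismatch the reviewer describes, and of the effect of lowercasing both sides. The helper `uuid_matches` is hypothetical, and which case `get_uuid()` actually returns in MFC is not stated here, so the sketch normalizes both the filter terms and the case UUID.

```python
def uuid_matches(requested: list, case_uuid: str) -> bool:
    """Case-insensitive membership test for hex UUID filters."""
    wanted = {u.lower() for u in requested}
    return case_uuid.lower() in wanted


# A plain membership check is case-sensitive and misses the match:
print("ab12cd34" in {"AB12CD34"})                 # False
# Normalizing both sides finds it:
print(uuid_matches(["AB12CD34"], "ab12cd34"))     # True
```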

codecov · 2026-02-27T20:43:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (1412eb2) to head (73fd804).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
- Coverage   44.05%   44.04%   -0.02%     
==========================================
  Files          70       70              
  Lines       20496    20499       +3     
  Branches     1991     1993       +2     
==========================================
- Hits         9029     9028       -1     
- Misses      10328    10330       +2     
- Partials     1139     1141       +2

Copilot AI review requested due to automatic review settings February 19, 2026 20:01

Copilot started reviewing on behalf of sbryngelson February 19, 2026 20:02

codeant-ai bot added the size:M This PR changes 30-99 lines, ignoring generated files label Feb 19, 2026

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:M This PR changes 30-99 lines, ignoring generated files labels Feb 20, 2026

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 21, 2026

sbryngelson force-pushed the ci-test branch from 55b68e5 to 491b27b February 23, 2026 14:50

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 23, 2026

MFlowCode deleted a comment from github-actions bot Feb 23, 2026

sbryngelson force-pushed the ci-test branch from 3ce4f39 to f3bab46 February 24, 2026 16:03

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 24, 2026

sbryngelson force-pushed the ci-test branch from 749eb67 to a9b1e40 February 24, 2026 16:49

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 24, 2026

sbryngelson marked this pull request as draft February 25, 2026 01:04

MFlowCode deleted a comment from codeant-ai bot Feb 26, 2026

sbryngelson and others added 3 commits February 25, 2026 21:23

Remove redundant NUM_FAILED > 0 guard in test retry logic

b5c095f

The -s check already guarantees the file is non-empty, so NUM_FAILED > 0 is always true in that branch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MFlowCode deleted a comment from github-actions bot Feb 26, 2026

sbryngelson and others added 2 commits February 26, 2026 09:40

Use normal QOS instead of hackathon for Frontier test jobs

a1c55ed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MFlowCode deleted a comment from github-actions bot Feb 26, 2026

Merge branch 'master' into ci-test

8b48e30

sbryngelson marked this pull request as ready for review February 26, 2026 22:07

MFlowCode deleted a comment from github-actions bot Feb 26, 2026

sbryngelson and others added 2 commits February 26, 2026 18:22

Trigger CI

46dcd73

Rename ambiguous single-letter variable l to label in _filter_only

a2431bf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sbryngelson wants to merge 17 commits into MFlowCode:master from sbryngelson:ci-test
