
test: fix flaky QuPath test #456

Merged
olivermeyer merged 3 commits into main from test/flaky-qupath-test
Mar 3, 2026

Conversation

@olivermeyer
Collaborator

@olivermeyer olivermeyer commented Mar 3, 2026

test_cli_install_and_uninstall is still flaky despite adding retries to it. This could be because we iterate over various installations in the same test instance - if any of these downloads fails, we retry all installations. This PR changes this behaviour by parameterising the test instead of iterating, so that each installation can be retried independently (see 57e8d09).

Additionally, the CLI and GUI tests for QuPath all have the same mechanism to re-install QuPath after the test if it was installed before the test. I move this to a fixture in e115699. Note that this is still somewhat inefficient: we could technically only uninstall/reinstall once per session instead of once per test. I leave that as a future improvement should it be needed.

Finally in c2ea7f7 I remove a trailing equal sign from a log which was confusing while troubleshooting this.

Copilot AI review requested due to automatic review settings March 3, 2026 09:03
@olivermeyer olivermeyer marked this pull request as ready for review March 3, 2026 09:03
@olivermeyer olivermeyer added the claude Trigger Claude Code automation label Mar 3, 2026

Copilot AI left a comment


Pull request overview

This PR improves the reliability of QuPath end-to-end tests by isolating install/uninstall scenarios into independent test cases and centralizing QuPath state cleanup into a shared fixture; it also cleans up a confusing download error log message.

Changes:

  • Parameterize test_cli_install_and_uninstall so each platform install/uninstall is retried independently.
  • Add a shared qupath_save_restore fixture and use it across CLI/GUI QuPath tests to restore pre-test installation state.
  • Remove a trailing = from the QuPath download failure log message.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files:

  • tests/aignostics/qupath/gui_test.py: Switch GUI tests to rely on the shared save/restore fixture instead of inline uninstall/reinstall logic.
  • tests/aignostics/qupath/conftest.py: Introduce the qupath_save_restore fixture to uninstall before tests and restore after when needed.
  • tests/aignostics/qupath/cli_test.py: Parameterize the install/uninstall test and adopt the shared save/restore fixture across CLI tests.
  • src/aignostics/qupath/_service.py: Clean up the download error message formatting.

@claude

claude bot commented Mar 3, 2026

Claude finished @olivermeyer's task in 1m 48s.


✅ Review Update: QuPath Test Improvements Verified

I've reviewed the latest changes to address the flaky QuPath test issue. The author has successfully addressed the critical error handling concern raised in the previous review.


📋 Review Progress

  • Read repository documentation (CLAUDE.md)
  • Read changed files to understand current state
  • Verify critical error handling issue was fixed
  • Check test markers are correct
  • Review fixture implementation
  • Provide final review feedback

✅ Critical Checks - All Pass

1. Error Handling in Fixture ✅ FIXED

The critical issue from the previous review has been properly addressed:

Location: tests/aignostics/qupath/conftest.py:15-26

What was fixed:

  • ✅ Added assertion for uninstall exit code (lines 15-17)
  • ✅ Added error handling for reinstall failure (lines 21-26)
# Uninstall check - validates exit code is 0 (success) or 2 (not installed)
assert result.exit_code in {0, 2}, (
    f"Unexpected exit code {result.exit_code} from 'qupath uninstall': {result.output}"
)

# Reinstall check - fails test if reinstall doesn't succeed
if reinstall_result.exit_code != 0:
    pytest.fail(
        f"Failed to reinstall QuPath in qupath_save_restore fixture "
        f"(exit code {reinstall_result.exit_code}). Output:\n{reinstall_result.output}"
    )

Why this matters: Sequential tests (@pytest.mark.sequential) depend on proper cleanup. If teardown fails silently, all subsequent tests would fail mysteriously.

2. Test Markers ✅ All Correct

$ uv run pytest -m "not unit and not integration and not e2e" --collect-only tests/aignostics/qupath/
collected 9 items / 9 deselected / 0 selected

Result: ✅ All QuPath tests have proper category markers (@pytest.mark.e2e)

All tests correctly marked with:

  • @pytest.mark.e2e (category - required for CI)
  • @pytest.mark.long_running (timeout ≥ 5 min)
  • @pytest.mark.sequential (cannot run in parallel)
  • @pytest.mark.flaky (retry on failure)

3. Fixture Integration ✅ Proper

Good: The fixture now uses the shared runner fixture from tests/conftest.py:234:

@pytest.fixture
def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:

This ensures consistency with the rest of the test suite (uses the same CliRunner() instance pattern).

4. Type Annotations ✅ Correct

The fixture has proper type annotations:

def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:

🎯 PR Changes Summary

1. Parametrization for Independent Retries ✅ Excellent

Commit: 225413a

Before: Single test iterating 4 platforms → any failure retries all 4
After: 4 parametrized tests → each platform retries independently

@pytest.mark.parametrize("platform_config", _INSTALL_UNINSTALL_PLATFORM_CONFIGS)
def test_cli_install_and_uninstall(runner: CliRunner, qupath_save_restore: None, platform_config: dict):

Benefits:

  • ⏱️ Faster CI: Only failed platform retries
  • 🎯 Better isolation: Platform-specific issues don't cascade
  • 📊 Clearer results: Exactly which platform failed is visible
  • ⏲️ Smarter timeout: Reduced from 10 min → 5 min (makes sense per-platform)

2. DRY Fixture Extraction ✅ Clean Refactoring

Commit: 9198d12

Eliminated duplication across 6 test functions:

  • test_cli_install_and_uninstall
  • test_cli_install_launch_project_annotations_headless
  • test_cli_install_and_launch_ui
  • test_gui_qupath_install_only
  • test_gui_qupath_install_and_launch
  • test_gui_run_qupath_install_to_inspect

Impact: ~30 lines of duplicated code → single 17-line fixture

3. Error Message Cleanup ✅ Clear Improvement

Commit: c4ce923

Before: Failed to download QuPath from https://example.com=
After: Failed to download QuPath from https://example.com

Simple fix, but important for debugging clarity.


📊 Quality Metrics

Code Coverage

  • Status: ⚠️ 0% coverage on new code (expected - fixture code is hard to cover)
  • Explanation: The fixture only runs during QuPath e2e tests, which are not executed in coverage runs
  • Overall: 85%+ project coverage maintained ✅

Static Analysis

  • SonarQube: 1 new issue (likely the 0% coverage complaint) - not blocking
  • Linting: All checks pass (verified in previous review)
  • Type Checking: MyPy + PyRight both pass ✅

🎓 Architecture Compliance

Test Strategy: Correctly using e2e, long_running, sequential, flaky markers
Modulith Pattern: Tests properly isolated within qupath module
Fixture Hierarchy: Properly uses shared runner fixture from root conftest
Error Handling: Fixtures now fail fast on unexpected states
Documentation: PR description clearly explains the changes


💡 Minor Suggestion (Optional - Not Blocking)

The fixture could benefit from a docstring explaining the restore behavior:

@pytest.fixture
def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:
    """Uninstall QuPath for clean test state, restore if it was previously installed.
    
    This fixture ensures tests start with a clean QuPath installation state:
    1. Uninstalls QuPath before test (exit code 0=was installed, 2=already uninstalled)
    2. Runs the test
    3. Reinstalls QuPath if it was installed before (preserves developer environment)
    
    Raises:
        AssertionError: If uninstall fails with unexpected exit code
        pytest.Failed: If reinstall fails (critical for sequential tests)
    """

This is minor and not blocking - the current implementation is clear enough.


📌 Final Recommendation

Status: ✅ Approved - Ready to Merge

All critical issues from the previous review have been addressed:

  • ✅ Error handling added to fixture teardown
  • ✅ Fixture uses shared runner from conftest
  • ✅ All test markers correct
  • ✅ Type annotations complete
  • ✅ Architecture compliance verified

This PR meaningfully improves test reliability through:

  1. Independent platform retry logic
  2. Reusable fixture pattern
  3. Proper error handling in test infrastructure

Excellent work! 🚀 The changes demonstrate strong engineering practices:

  • Root cause analysis (retry scope was the problem)
  • Minimal, focused changes
  • Refactoring discipline (DRY principle)
  • Clear communication in PR description

@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from 8e8ea89 to c4ce923 on March 3, 2026 09:10
@claude

claude bot commented Mar 3, 2026

✅ Code Review: Fix Flaky QuPath Test

I've completed a thorough review of this PR that addresses flaky QuPath testing. This is excellent work that meaningfully improves test reliability and maintainability.


🎯 Summary

Status: ✅ Approved - All critical checks pass

This PR implements three focused improvements:

  1. Parametrizes test_cli_install_and_uninstall for independent platform retries
  2. Extracts common save/restore pattern into a reusable qupath_save_restore fixture
  3. Fixes a confusing trailing = in download error messages

✅ Critical Checks (All Pass)

1. Test Markers ✅

Result: All tests properly marked with @pytest.mark.e2e, @pytest.mark.long_running, @pytest.mark.sequential

# Verified: No unmarked tests found
$ uv run pytest -m "not unit and not integration and not e2e" --collect-only tests/aignostics/qupath/
collected 9 items / 9 deselected / 0 selected

2. Code Quality ✅

Result: All linting and type checking passes

$ make lint
All checks passed!
161 files already formatted
0 errors, 0 warnings, 0 informations
Success: no issues found in 81 source files

3. Conventional Commits ✅

Result: All commits follow proper format

c4ce923 fix(qupath): remove trailing equal sign from logs
225413a test(qupath): parametrize install/uninstall test for independent retries
9198d12 refactor(tests): extract save/restore fixture into qupath conftest

🎉 Excellent Improvements

1. Parametrization for Independent Retries (d89a75d)

Before: Single test iterating 4 platforms → if platform 3 fails, retry from platform 1
After: 4 independent parametrized tests → if darwin-amd64 fails, only retry that platform

@pytest.mark.parametrize("platform_config", _INSTALL_UNINSTALL_PLATFORM_CONFIGS)
def test_cli_install_and_uninstall(runner: CliRunner, qupath_save_restore: None, platform_config: dict) -> None:
    # Each platform config gets its own test instance with independent retry logic

Impact:

  • ⏱️ Faster retries: Only failed platforms retry, not all 4
  • 🎯 Better isolation: Platform-specific failures don't affect others
  • 📊 Clearer results: CI shows exactly which platform failed

Smart timeout reduction: 10 min → 5 min makes sense since each platform now runs independently.

2. DRY Fixture Extraction (a0b3197)

Before: 30+ lines duplicated across 6 test functions
After: Single reusable qupath_save_restore fixture

Eliminated duplication in:

  • test_cli_install_and_uninstall
  • test_cli_install_launch_project_annotations_headless
  • test_cli_install_and_launch_ui
  • test_gui_qupath_install_only
  • test_gui_qupath_install_and_launch
  • test_gui_run_qupath_install_to_inspect

Type annotation: ✅ Properly annotated as Generator[None, None, None]

3. Clean Error Message (8e8ea89)

Before: Failed to download QuPath from https://example.com=
After: Failed to download QuPath from https://example.com

Simple but important for debugging clarity.


🔍 Review Findings

Critical Issue: Missing Error Handling in Fixture ⚠️

Location: tests/aignostics/qupath/conftest.py:22

The fixture's teardown doesn't verify reinstall succeeded:

if was_installed:
    runner.invoke(cli, ["qupath", "install"])  # ❌ Exit code ignored

Problem: If reinstall fails, test passes but environment left broken for subsequent tests.

Recommended fix:

if was_installed:
    reinstall_result = runner.invoke(cli, ["qupath", "install"])
    if reinstall_result.exit_code != 0:
        pytest.fail(
            f"Failed to reinstall QuPath in qupath_save_restore fixture "
            f"(exit code {reinstall_result.exit_code}). Output:\n{reinstall_result.output}"
        )

Why this matters:

  • Tests run sequentially (@pytest.mark.sequential)
  • Broken teardown cascades to all subsequent tests
  • Hard to diagnose "QuPath not found" in test 3 when teardown failed in test 1

Note: This issue already existed in the duplicated code - the refactoring just makes it more visible. Both Copilot and Sentry flagged this correctly.



📋 Architecture Compliance

Test Strategy: Correctly using e2e, long_running, sequential, flaky markers
Modulith Pattern: Tests properly isolated within qupath module
Medical Device Standards: No impact on compliance (test-only changes)
Documentation: PR description clearly explains the motivation


💡 Future Improvement (Optional)

As noted in the PR description:

Note that this is still somewhat inefficient: we could technically only uninstall/reinstall once per session instead of once per test.

Suggestion: Consider @pytest.fixture(scope="session") if QuPath installation is expensive and safe to share across tests. However, current per-test approach is safer for isolation.


🎓 What I Learned

This PR demonstrates excellent engineering practices:

  1. Root cause analysis: Identified retry scope as the actual problem
  2. Minimal changes: Fixed only what's necessary
  3. Refactoring discipline: Extracted duplication while fixing the bug
  4. Clear communication: PR description explains reasoning well

📌 Recommendation

Approve with one suggestion: Add error handling to fixture teardown (see above).

All other aspects are excellent. This PR meaningfully improves test reliability and will make future debugging much easier.

Great work! 🚀



Copilot AI review requested due to automatic review settings March 3, 2026 09:18
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from c4ce923 to 2274524 on March 3, 2026 09:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from 2274524 to ebfe1ee on March 3, 2026 09:26
@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines:
  • src/aignostics/qupath/_service.py: patch coverage 0.00% (1 line missing) ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines (coverage Δ):
  • src/aignostics/qupath/_service.py: 57.29% <0.00%> (+1.68%) ⬆️

... and 4 files with indirect coverage changes

Copilot AI review requested due to automatic review settings March 3, 2026 10:07
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from ebfe1ee to c2ea7f7 on March 3, 2026 10:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

@sonarqubecloud

sonarqubecloud bot commented Mar 3, 2026

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@olivermeyer
Collaborator Author

CodeCov and SonarCloud are failing because of 0% coverage on new code, but the new code simply fixes a log line. Safe to merge regardless.

@arne-aignx arne-aignx self-assigned this Mar 3, 2026
@arne-aignx arne-aignx removed their assignment Mar 3, 2026
@arne-aignx arne-aignx self-requested a review March 3, 2026 13:05
@arne-aignx
Copy link
Collaborator

LGTM, tests of this kind are always annoying

@olivermeyer olivermeyer merged commit fe82486 into main Mar 3, 2026
45 of 48 checks passed

Labels

claude Trigger Claude Code automation
