
test: fix flaky QuPath test #456

Merged
olivermeyer merged 3 commits into main from test/flaky-qupath-test
Mar 3, 2026

Conversation

@olivermeyer
Collaborator

@olivermeyer olivermeyer commented Mar 3, 2026

test_cli_install_and_uninstall is still flaky despite adding retries to it. This could be because we iterate over various installations in the same test instance - if any of these downloads fails, we retry all installations. This PR changes this behaviour by parameterising the test instead of iterating, so that each installation can be retried independently (see 57e8d09).

Additionally, the CLI and GUI tests for QuPath all have the same mechanism to re-install QuPath after the test if it was installed before the test. I move this to a fixture in e115699. Note that this is still somewhat inefficient: we could technically only uninstall/reinstall once per session instead of once per test. I leave that as a future improvement should it be needed.

Finally in c2ea7f7 I remove a trailing equal sign from a log which was confusing while troubleshooting this.

Copilot AI review requested due to automatic review settings March 3, 2026 09:03
@olivermeyer olivermeyer marked this pull request as ready for review March 3, 2026 09:03
@olivermeyer olivermeyer added the claude Trigger Claude Code automation label Mar 3, 2026

Copilot AI left a comment


Pull request overview

This PR improves the reliability of QuPath end-to-end tests by isolating install/uninstall scenarios into independent test cases and centralizing QuPath state cleanup into a shared fixture; it also cleans up a confusing download error log message.

Changes:

  • Parameterize test_cli_install_and_uninstall so each platform install/uninstall is retried independently.
  • Add a shared qupath_save_restore fixture and use it across CLI/GUI QuPath tests to restore pre-test installation state.
  • Remove a trailing = from the QuPath download failure log message.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files:

  • tests/aignostics/qupath/gui_test.py: Switch GUI tests to rely on the shared save/restore fixture instead of inline uninstall/reinstall logic.
  • tests/aignostics/qupath/conftest.py: Introduce the qupath_save_restore fixture to uninstall before tests and restore after when needed.
  • tests/aignostics/qupath/cli_test.py: Parameterize the install/uninstall test and adopt the shared save/restore fixture across CLI tests.
  • src/aignostics/qupath/_service.py: Clean up the download error message formatting.

@claude

claude bot commented Mar 3, 2026

Claude finished @olivermeyer's task in 1m 48s.


✅ Review Update: QuPath Test Improvements Verified

I've reviewed the latest changes to address the flaky QuPath test issue. The author has successfully addressed the critical error handling concern raised in the previous review.


📋 Review Progress

  • Read repository documentation (CLAUDE.md)
  • Read changed files to understand current state
  • Verify critical error handling issue was fixed
  • Check test markers are correct
  • Review fixture implementation
  • Provide final review feedback

✅ Critical Checks - All Pass

1. Error Handling in Fixture ✅ FIXED

The critical issue from the previous review has been properly addressed:

Location: tests/aignostics/qupath/conftest.py:15-26

What was fixed:

  • ✅ Added assertion for uninstall exit code (lines 15-17)
  • ✅ Added error handling for reinstall failure (lines 21-26)
# Uninstall check - validates exit code is 0 (success) or 2 (not installed)
assert result.exit_code in {0, 2}, (
    f"Unexpected exit code {result.exit_code} from 'qupath uninstall': {result.output}"
)

# Reinstall check - fails test if reinstall doesn't succeed
if reinstall_result.exit_code != 0:
    pytest.fail(
        f"Failed to reinstall QuPath in qupath_save_restore fixture "
        f"(exit code {reinstall_result.exit_code}). Output:\n{reinstall_result.output}"
    )

Why this matters: Sequential tests (@pytest.mark.sequential) depend on proper cleanup. If teardown fails silently, all subsequent tests would fail mysteriously.

2. Test Markers ✅ All Correct

$ uv run pytest -m "not unit and not integration and not e2e" --collect-only tests/aignostics/qupath/
collected 9 items / 9 deselected / 0 selected

Result: ✅ All QuPath tests have proper category markers (@pytest.mark.e2e)

All tests correctly marked with:

  • @pytest.mark.e2e (category - required for CI)
  • @pytest.mark.long_running (timeout ≥ 5 min)
  • @pytest.mark.sequential (cannot run in parallel)
  • @pytest.mark.flaky (retry on failure)

3. Fixture Integration ✅ Proper

Good: The fixture now uses the shared runner fixture from tests/conftest.py:234:

@pytest.fixture
def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:

This ensures consistency with the rest of the test suite (uses the same CliRunner() instance pattern).

4. Type Annotations ✅ Correct

The fixture has proper type annotations:

def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:

🎯 PR Changes Summary

1. Parametrization for Independent Retries ✅ Excellent

Commit: 225413a

Before: Single test iterating 4 platforms → any failure retries all 4
After: 4 parametrized tests → each platform retries independently

@pytest.mark.parametrize("platform_config", _INSTALL_UNINSTALL_PLATFORM_CONFIGS)
def test_cli_install_and_uninstall(runner: CliRunner, qupath_save_restore: None, platform_config: dict):

Benefits:

  • ⏱️ Faster CI: Only failed platform retries
  • 🎯 Better isolation: Platform-specific issues don't cascade
  • 📊 Clearer results: Exactly which platform failed is visible
  • ⏲️ Smarter timeout: Reduced from 10 min → 5 min (makes sense per-platform)

2. DRY Fixture Extraction ✅ Clean Refactoring

Commit: 9198d12

Eliminated duplication across 6 test functions:

  • test_cli_install_and_uninstall
  • test_cli_install_launch_project_annotations_headless
  • test_cli_install_and_launch_ui
  • test_gui_qupath_install_only
  • test_gui_qupath_install_and_launch
  • test_gui_run_qupath_install_to_inspect

Impact: ~30 lines of duplicated code → single 17-line fixture

3. Error Message Cleanup ✅ Clear Improvement

Commit: c4ce923

Before: Failed to download QuPath from https://example.com=
After: Failed to download QuPath from https://example.com

Simple fix, but important for debugging clarity.


📊 Quality Metrics

Code Coverage

  • Status: ⚠️ 0% coverage on new code (expected - fixture code is hard to cover)
  • Explanation: The fixture only runs during QuPath e2e tests, which are not executed in coverage runs
  • Overall: 85%+ project coverage maintained ✅

Static Analysis

  • SonarQube: 1 new issue (likely the 0% coverage complaint) - not blocking
  • Linting: All checks pass (verified in previous review)
  • Type Checking: MyPy + PyRight both pass ✅

🎓 Architecture Compliance

Test Strategy: Correctly using e2e, long_running, sequential, flaky markers
Modulith Pattern: Tests properly isolated within qupath module
Fixture Hierarchy: Properly uses shared runner fixture from root conftest
Error Handling: Fixtures now fail fast on unexpected states
Documentation: PR description clearly explains the changes


💡 Minor Suggestion (Optional - Not Blocking)

The fixture could benefit from a docstring explaining the restore behavior:

@pytest.fixture
def qupath_save_restore(runner: CliRunner) -> Generator[None, None, None]:
    """Uninstall QuPath for clean test state, restore if it was previously installed.
    
    This fixture ensures tests start with a clean QuPath installation state:
    1. Uninstalls QuPath before test (exit code 0=was installed, 2=already uninstalled)
    2. Runs the test
    3. Reinstalls QuPath if it was installed before (preserves developer environment)
    
    Raises:
        AssertionError: If uninstall fails with unexpected exit code
        pytest.Failed: If reinstall fails (critical for sequential tests)
    """

This is minor and not blocking - the current implementation is clear enough.


📌 Final Recommendation

Status: ✅ Approved - Ready to Merge

All critical issues from the previous review have been addressed:

  • ✅ Error handling added to fixture teardown
  • ✅ Fixture uses shared runner from conftest
  • ✅ All test markers correct
  • ✅ Type annotations complete
  • ✅ Architecture compliance verified

This PR meaningfully improves test reliability through:

  1. Independent platform retry logic
  2. Reusable fixture pattern
  3. Proper error handling in test infrastructure

Excellent work! 🚀 The changes demonstrate strong engineering practices:

  • Root cause analysis (retry scope was the problem)
  • Minimal, focused changes
  • Refactoring discipline (DRY principle)
  • Clear communication in PR description

@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from 8e8ea89 to c4ce923 on March 3, 2026 09:10
@claude

claude bot commented Mar 3, 2026

✅ Code Review: Fix Flaky QuPath Test

I've completed a thorough review of this PR that addresses flaky QuPath testing. This is excellent work that meaningfully improves test reliability and maintainability.


🎯 Summary

Status: ✅ Approved - All critical checks pass

This PR implements three focused improvements:

  1. Parametrizes test_cli_install_and_uninstall for independent platform retries
  2. Extracts common save/restore pattern into a reusable qupath_save_restore fixture
  3. Fixes a confusing trailing = in download error messages

✅ Critical Checks (All Pass)

1. Test Markers ✅

Result: All tests properly marked with @pytest.mark.e2e, @pytest.mark.long_running, @pytest.mark.sequential

# Verified: No unmarked tests found
$ uv run pytest -m "not unit and not integration and not e2e" --collect-only tests/aignostics/qupath/
collected 9 items / 9 deselected / 0 selected

2. Code Quality ✅

Result: All linting and type checking passes

$ make lint
All checks passed!
161 files already formatted
0 errors, 0 warnings, 0 informations
Success: no issues found in 81 source files

3. Conventional Commits ✅

Result: All commits follow proper format

c4ce923 fix(qupath): remove trailing equal sign from logs
225413a test(qupath): parametrize install/uninstall test for independent retries
9198d12 refactor(tests): extract save/restore fixture into qupath conftest

🎉 Excellent Improvements

1. Parametrization for Independent Retries (d89a75d)

Before: Single test iterating 4 platforms → if platform 3 fails, retry from platform 1
After: 4 independent parametrized tests → if darwin-amd64 fails, only retry that platform

@pytest.mark.parametrize("platform_config", _INSTALL_UNINSTALL_PLATFORM_CONFIGS)
def test_cli_install_and_uninstall(runner: CliRunner, qupath_save_restore: None, platform_config: dict) -> None:
    # Each platform config gets its own test instance with independent retry logic

Impact:

  • ⏱️ Faster retries: Only failed platforms retry, not all 4
  • 🎯 Better isolation: Platform-specific failures don't affect others
  • 📊 Clearer results: CI shows exactly which platform failed

Smart timeout reduction: 10 min → 5 min makes sense since each platform now runs independently.

2. DRY Fixture Extraction (a0b3197)

Before: 30+ lines duplicated across 6 test functions
After: Single reusable qupath_save_restore fixture

Eliminated duplication in:

  • test_cli_install_and_uninstall
  • test_cli_install_launch_project_annotations_headless
  • test_cli_install_and_launch_ui
  • test_gui_qupath_install_only
  • test_gui_qupath_install_and_launch
  • test_gui_run_qupath_install_to_inspect

Type annotation: ✅ Properly annotated as Generator[None, None, None]

3. Clean Error Message (8e8ea89)

Before: Failed to download QuPath from https://example.com=
After: Failed to download QuPath from https://example.com

Simple but important for debugging clarity.


🔍 Review Findings

Critical Issue: Missing Error Handling in Fixture ⚠️

Location: tests/aignostics/qupath/conftest.py:22

The fixture's teardown doesn't verify reinstall succeeded:

if was_installed:
    runner.invoke(cli, ["qupath", "install"])  # ❌ Exit code ignored

Problem: If reinstall fails, test passes but environment left broken for subsequent tests.

Recommended fix:

if was_installed:
    reinstall_result = runner.invoke(cli, ["qupath", "install"])
    if reinstall_result.exit_code != 0:
        pytest.fail(
            f"Failed to reinstall QuPath in qupath_save_restore fixture "
            f"(exit code {reinstall_result.exit_code}). Output:\n{reinstall_result.output}"
        )

Why this matters:

  • Tests run sequentially (@pytest.mark.sequential)
  • Broken teardown cascades to all subsequent tests
  • Hard to diagnose "QuPath not found" in test 3 when teardown failed in test 1

Note: This issue already existed in the duplicated code - the refactoring just makes it more visible. Both Copilot and Sentry flagged this correctly.



📋 Architecture Compliance

Test Strategy: Correctly using e2e, long_running, sequential, flaky markers
Modulith Pattern: Tests properly isolated within qupath module
Medical Device Standards: No impact on compliance (test-only changes)
Documentation: PR description clearly explains the motivation


💡 Future Improvement (Optional)

As noted in the PR description:

Note that this is still somewhat inefficient: we could technically only uninstall/reinstall once per session instead of once per test.

Suggestion: Consider @pytest.fixture(scope="session") if QuPath installation is expensive and safe to share across tests. However, current per-test approach is safer for isolation.


🎓 What I Learned

This PR demonstrates excellent engineering practices:

  1. Root cause analysis: Identified retry scope as the actual problem
  2. Minimal changes: Fixed only what's necessary
  3. Refactoring discipline: Extracted duplication while fixing the bug
  4. Clear communication: PR description explains reasoning well

📌 Recommendation

Approve with one suggestion: Add error handling to fixture teardown (see above).

All other aspects are excellent. This PR meaningfully improves test reliability and will make future debugging much easier.

Great work! 🚀



Copilot AI review requested due to automatic review settings March 3, 2026 09:18
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from c4ce923 to 2274524 on March 3, 2026 09:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from 2274524 to ebfe1ee on March 3, 2026 09:26
@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines:
  • src/aignostics/qupath/_service.py: patch coverage 0.00% (1 line missing) ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines (coverage Δ):
  • src/aignostics/qupath/_service.py: 57.29% <0.00%> (+1.68%) ⬆️

... and 4 files with indirect coverage changes

Copilot AI review requested due to automatic review settings March 3, 2026 10:07
@olivermeyer olivermeyer force-pushed the test/flaky-qupath-test branch from ebfe1ee to c2ea7f7 on March 3, 2026 10:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

@sonarqubecloud

sonarqubecloud bot commented Mar 3, 2026

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@olivermeyer
Collaborator Author

CodeCov and SonarCloud are failing because of 0% coverage on new code, but the new code simply fixes a log line. Safe to merge regardless.

@arne-aignx arne-aignx self-assigned this Mar 3, 2026
@arne-aignx arne-aignx removed their assignment Mar 3, 2026
@arne-aignx arne-aignx self-requested a review March 3, 2026 13:05
@arne-aignx
Copy link
Collaborator

LGTM, tests of this kind are always annoying

@olivermeyer olivermeyer merged commit fe82486 into main Mar 3, 2026
45 of 48 checks passed

Labels

claude Trigger Claude Code automation
