
Add replay-contract test for compacted vs raw event prompt equivalence #4668

Open
davidahmann wants to merge 1 commit into google:main from davidahmann:codex/issue-4667-compaction-replay-contract

@davidahmann

Problem

Compaction behavior lacked a direct regression assertion that replay prompt semantics remain equivalent between raw and compacted event streams.

Why now

Replay determinism needs a clear contract test so compaction changes cannot silently alter effective prompt reconstruction.

What changed

  • Added test_replay_contract_compacted_and_raw_events_match_effective_prompt in tests/unittests/apps/test_compaction.py.
  • Test reconstructs the effective prompt via contents._get_contents for:
    • raw event stream
    • equivalent compacted summary + trailing events
  • Asserts semantic equality of resulting prompt content.
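As a minimal sketch of the contract under test, assuming a much simpler event model than ADK's: `Event`, `CompactedEvent`, and `effective_prompt` below are hypothetical stand-ins, not ADK's `Event` class or `contents._get_contents`. The assumed rule is that a compaction summary takes the place of every raw event whose timestamp falls inside its `[start_ts, end_ts]` range, so replaying either stream should yield the same text.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Event:
  """Hypothetical raw event: a timestamp and the text it adds to the prompt."""
  timestamp: float
  text: str


@dataclass
class CompactedEvent:
  """Hypothetical summary standing in for raw events in [start_ts, end_ts]."""
  start_ts: float
  end_ts: float
  summary: str


def effective_prompt(events: List[Union[Event, CompactedEvent]]) -> str:
  """Rebuilds prompt text; each summary replaces the raw events it covers."""
  summaries = [e for e in events if isinstance(e, CompactedEvent)]
  pieces = []
  for event in events:
    if isinstance(event, CompactedEvent):
      pieces.append(event.summary)
    elif not any(s.start_ts <= event.timestamp <= s.end_ts for s in summaries):
      # Only raw events outside every compacted range are replayed verbatim.
      pieces.append(event.text)
  return '\n'.join(pieces)


# Same conversation as the PR's test, expressed with the stand-in types.
raw = [
    Event(1.0, 'User asks about weather'),
    Event(2.0, 'Agent asks clarifying question'),
    Event(3.0, 'User clarifies location'),
    Event(4.0, 'Agent proposes plan'),
    Event(5.0, 'User asks for final answer'),
]
summary = CompactedEvent(1.0, 4.0, '\n'.join(e.text for e in raw[:4]))

# Summary plus trailing event, like the compacted stream in the test.
assert effective_prompt([summary, raw[4]]) == effective_prompt(raw)
# Covered raw events left in the stream are skipped, not replayed twice.
assert effective_prompt(raw[:4] + [summary, raw[4]]) == effective_prompt(raw)
```

The second assertion models a session that still holds the covered raw events next to the summary; under the assumed rule they are dropped, so the reconstructed text is unchanged.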

Validation

  • uv run pyink --check --diff tests/unittests/apps/test_compaction.py
  • uv run pytest -q tests/unittests/apps/test_compaction.py -k replay_contract_compacted_and_raw_events_match_effective_prompt

Refs #4667

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical need for regression testing around event compaction behavior. It introduces a new contract test to rigorously verify that the process of compacting event streams does not alter the effective prompt semantics when reconstructing prompts. This ensures replay determinism and prevents silent changes to prompt reconstruction logic due to compaction.

Highlights

  • New Replay Contract Test: Introduced test_replay_contract_compacted_and_raw_events_match_effective_prompt to verify that prompt reconstruction from compacted event streams is semantically identical to reconstruction from raw event streams.
  • Semantic Equivalence Assertion: The new test explicitly asserts that the effective prompt generated by _get_contents from a compacted summary plus trailing events matches the prompt generated from the full raw event stream.

Changelog
  • tests/unittests/apps/test_compaction.py
    • Added a new contract test for replay prompt equivalence between compacted and raw event streams.

@adk-bot added the core label ([Component] This issue is related to the core interface and implementation) on Mar 1, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds a valuable contract test to ensure that the effective prompt remains equivalent for both raw and compacted event streams. The implementation is correct and achieves its goal. I have one suggestion to improve the maintainability of the new test by reducing some code duplication.

Comment on lines +816 to +859
  def test_replay_contract_compacted_and_raw_events_match_effective_prompt(
      self,
  ):
    raw_events = [
        self._create_event(1.0, 'inv1', 'User asks about weather'),
        self._create_event(2.0, 'inv1', 'Agent asks clarifying question'),
        self._create_event(3.0, 'inv2', 'User clarifies location'),
        self._create_event(4.0, 'inv2', 'Agent proposes plan'),
        self._create_event(5.0, 'inv3', 'User asks for final answer'),
    ]

    compacted_events = [
        self._create_compacted_event(
            1.0,
            4.0,
            (
                'User asks about weather\n'
                'Agent asks clarifying question\n'
                'User clarifies location\n'
                'Agent proposes plan'
            ),
            appended_ts=4.5,
        ),
        self._create_event(5.0, 'inv3', 'User asks for final answer'),
    ]

    raw_prompt = '\n'.join(
        part.text
        for content in contents._get_contents(None, raw_events)
        for part in content.parts
        if part.text
    )
    compacted_prompt = '\n'.join(
        part.text
        for content in contents._get_contents(None, compacted_events)
        for part in content.parts
        if part.text
    )

    self.assertEqual(
        compacted_prompt,
        raw_prompt,
        'Compaction should preserve deterministic replay prompt semantics.',
    )

medium

To improve maintainability and reduce code duplication, you can extract the logic for reconstructing the prompt from a list of events into a local helper function. This makes the test cleaner and avoids repeating the same generator expression.

  def test_replay_contract_compacted_and_raw_events_match_effective_prompt(
      self,
  ):
    raw_events = [
        self._create_event(1.0, 'inv1', 'User asks about weather'),
        self._create_event(2.0, 'inv1', 'Agent asks clarifying question'),
        self._create_event(3.0, 'inv2', 'User clarifies location'),
        self._create_event(4.0, 'inv2', 'Agent proposes plan'),
        self._create_event(5.0, 'inv3', 'User asks for final answer'),
    ]

    compacted_events = [
        self._create_compacted_event(
            1.0,
            4.0,
            (
                'User asks about weather\n'
                'Agent asks clarifying question\n'
                'User clarifies location\n'
                'Agent proposes plan'
            ),
            appended_ts=4.5,
        ),
        self._create_event(5.0, 'inv3', 'User asks for final answer'),
    ]

    def _reconstruct_prompt(events_list):
      return '\n'.join(
          part.text
          for content in contents._get_contents(None, events_list)
          for part in content.parts
          if part.text
      )

    raw_prompt = _reconstruct_prompt(raw_events)
    compacted_prompt = _reconstruct_prompt(compacted_events)

    self.assertEqual(
        compacted_prompt,
        raw_prompt,
        'Compaction should preserve deterministic replay prompt semantics.',
    )
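The suggested `_reconstruct_prompt` helper reduces each event list to one newline-joined string of text parts. The sketch below isolates that flattening so its behavior is visible without ADK. `Part`, `Content`, and `reconstruct_prompt` are hypothetical stand-ins for the `google.genai` content types and for the helper, and the sketch takes contents directly instead of calling `contents._get_contents`. Parts with no text are skipped, and roles never reach the string.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Part:
  """Stand-in for a content part; text is None for non-text parts."""
  text: Optional[str] = None


@dataclass
class Content:
  """Stand-in for one prompt entry: a role plus its parts."""
  role: str
  parts: List[Part] = field(default_factory=list)


def reconstruct_prompt(contents: Iterable[Content]) -> str:
  """Same flattening as the suggested helper, applied to contents directly."""
  return '\n'.join(
      part.text
      for content in contents
      for part in content.parts
      if part.text  # Skips parts whose text is None or ''.
  )


user_then_model = [
    Content('user', [Part('Hi')]),
    Content('model', [Part('Hello'), Part()]),  # Part() has no text.
]
swapped_roles = [
    Content('model', [Part('Hi')]),
    Content('user', [Part('Hello')]),
]

assert reconstruct_prompt(user_then_model) == 'Hi\nHello'
# Roles are not part of the flattened string, so these compare equal too.
assert reconstruct_prompt(swapped_roles) == reconstruct_prompt(user_then_model)
```

Because roles are dropped, an `assertEqual` on these flattened strings checks text-level equivalence only.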

@davidahmann
Author

Implemented the scoped fix for #4667 with a replay-contract test that validates effective prompt equivalence between raw and compacted events.

Validation run:

  • (pass)
  • (pass)

Current CI snapshot: triage bot, CLA, and header checks are green.

This contribution was informed by patterns from Wrkr. Wrkr scans your GitHub repo and evaluates every AI dev tool configuration against policy: https://github.com/Clyra-AI/wrkr

