Research Finding
Paper: Test-Driven AI Agent Definition (TDAD) (arXiv:2603.08806, 2026)
Treats agent system prompts and skill definitions as compiled artifacts: behavioral specs → executable tests (generated by a coding agent) → iterative prompt refinement until the tests pass. Adds semantic mutation testing (deliberately faulty prompt variants) to measure test-suite robustness. Reports a 92% compilation success rate and 86-100% mutation scores.
Applicability to Zeph
Directly applicable to Zeph's continuous improvement protocol and self-learning pipeline:
1. Skill behavioral specs
Each SKILL.md could have a companion SKILL_TESTS.md with expected input/output behavior pairs. After self-learning mutates a skill, run the behavioral tests to validate that the mutation didn't cause a regression.
2. System prompt block testing
Zeph's system prompt has stable blocks (Block 1: base identity, Block 2: volatile env). TDAD mutation testing could verify that removing or altering a block causes measurable behavior change — confirming the block is actually load-bearing.
3. Two-agent loop integration
The TDAD two-agent loop (test writer + prompt refiner) maps naturally onto Zeph's orchestration: spawn a test-writer sub-agent to generate behavioral tests for a skill, then a skill-refiner to improve the skill until the tests pass. Use the existing AgentTestHarness (ARCH-08) as the test executor.
References
- arXiv:2603.08806
- Zeph crates:
  - zeph-skills (learning.rs, registry.rs)
  - zeph-core (agent/)
  - AgentTestHarness (ARCH-08)