
research(testing): TDAD behavioral spec testing for skills and system prompt blocks #1842

@bug-ops

Description


Research Finding

Paper: Test-Driven AI Agent Definition (TDAD) (arXiv:2603.08806, 2026)

Treats agent system prompts and skill definitions as compiled artifacts: behavioral specs → executable tests (generated by a coding agent) → iterative prompt refinement until the tests pass. Adds semantic mutation testing (deliberately faulty prompt variants) to measure test-suite robustness. Reports 92% compilation success and 86–100% mutation scores.

Applicability to Zeph

Directly applicable to Zeph's continuous improvement protocol and self-learning pipeline:

1. Skill behavioral specs

Each SKILL.md could have a companion SKILL_TESTS.md containing expected input/output behavior pairs. After self-learning mutates a skill, run the behavioral tests to validate that the mutation didn't regress behavior.
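A minimal sketch of that regression gate, assuming a spec format of input/expected-substring pairs. `BehaviorSpec` and `failing_specs` are hypothetical names, not existing Zeph API; the skill itself is stubbed as a closure.

```rust
/// One input/expected-behavior pair parsed from a SKILL_TESTS.md companion file.
/// (Hypothetical structure — the on-disk format is not specified by the issue.)
struct BehaviorSpec {
    input: &'static str,
    /// Substring the skill's output must contain for the spec to pass.
    expect_contains: &'static str,
}

/// Run every spec against a skill (stubbed here as a closure over its input)
/// and return the inputs of the failing specs.
fn failing_specs<'a>(
    skill: impl Fn(&str) -> String,
    specs: &'a [BehaviorSpec],
) -> Vec<&'a str> {
    specs
        .iter()
        .filter(|s| !skill(s.input).contains(s.expect_contains))
        .map(|s| s.input)
        .collect()
}

fn main() {
    let specs = [
        BehaviorSpec { input: "summarize: hello world", expect_contains: "hello" },
        BehaviorSpec { input: "summarize: rust", expect_contains: "rust" },
    ];
    // Stub skill: echoes its input; a real run would invoke the mutated skill.
    let skill = |input: &str| input.to_string();
    // Gate: reject the self-learning mutation if any spec fails.
    let failures = failing_specs(skill, &specs);
    println!("regressions: {}", failures.len());
}
```

The failure list, rather than a bare pass/fail, would give the self-learning pipeline something concrete to feed back into the next refinement round.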

2. System prompt block testing

Zeph's system prompt is structured into blocks (Block 1: base identity, Block 2: volatile env). TDAD mutation testing could verify that removing or altering a block causes a measurable behavior change — confirming the block is actually load-bearing.
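A sketch of that block-level mutation check: drop each block in turn and test whether an observable behavior signal changes. The harness names are hypothetical; the behavior signal is stubbed as prompt length, where a real harness would diff agent outputs on a fixed eval set.

```rust
/// Assemble prompt blocks into a single system prompt (sketch).
fn assemble(blocks: &[&str]) -> String {
    blocks.join("\n\n")
}

/// A block is "load-bearing" if removing it changes some observable behavior
/// signal. `behavior` is a stand-in for running the agent and scoring output.
fn load_bearing(blocks: &[&str], idx: usize, behavior: impl Fn(&str) -> usize) -> bool {
    let baseline = behavior(&assemble(blocks));
    let mutated: Vec<&str> = blocks
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != idx)
        .map(|(_, b)| *b)
        .collect();
    behavior(&assemble(&mutated)) != baseline
}

fn main() {
    let blocks = ["Block 1: base identity", "Block 2: volatile env"];
    // Stub behavior signal; a real check would compare eval-set outputs.
    let signal = |prompt: &str| prompt.len();
    for i in 0..blocks.len() {
        println!("block {} load-bearing: {}", i + 1, load_bearing(&blocks, i, signal));
    }
}
```

A block whose removal leaves the signal unchanged on every eval case would be a candidate for pruning — the same conclusion TDAD's mutation scores are designed to surface.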

3. Two-agent loop integration

The TDAD two-agent loop (test writer + prompt refiner) maps naturally onto Zeph's orchestration: spawn a test-writer sub-agent to generate behavioral tests for a skill, then a skill-refiner sub-agent to improve the skill until the tests pass. The existing AgentTestHarness (ARCH-08) can serve as the test executor.
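The loop's control flow can be sketched as below. The sub-agents and the harness are injected as closures (`write_tests`, `run_tests`, `refine` are hypothetical stand-ins; AgentTestHarness would sit where `run_tests` is stubbed).

```rust
/// A skill definition under refinement (sketch).
struct Skill { body: String }

/// TDAD loop: generate tests once, then refine the skill until all tests
/// pass or the iteration budget is exhausted. Returns the final skill and
/// whether it converged.
fn refine_until_green(
    mut skill: Skill,
    write_tests: impl Fn(&Skill) -> Vec<String>,      // test-writer sub-agent
    run_tests: impl Fn(&Skill, &[String]) -> usize,   // test executor; returns failure count
    refine: impl Fn(Skill, usize) -> Skill,           // skill-refiner sub-agent
    max_iters: usize,
) -> (Skill, bool) {
    let tests = write_tests(&skill);
    for _ in 0..max_iters {
        let failures = run_tests(&skill, &tests);
        if failures == 0 {
            return (skill, true);
        }
        skill = refine(skill, failures);
    }
    (skill, false)
}

fn main() {
    let skill = Skill { body: String::new() };
    // Stubs: three tests; each refinement round fixes one failure.
    let write_tests = |_: &Skill| vec!["t1".to_string(), "t2".to_string(), "t3".to_string()];
    let run_tests = |s: &Skill, t: &[String]| t.len().saturating_sub(s.body.len());
    let refine = |mut s: Skill, _| { s.body.push('x'); s };
    let (_, green) = refine_until_green(skill, write_tests, run_tests, refine, 5);
    println!("converged: {green}");
}
```

The iteration budget matters: without it, a skill that can never satisfy its tests would spin the refiner sub-agent indefinitely.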

References

  • arXiv:2603.08806
  • Zeph crates: zeph-skills (learning.rs, registry.rs), zeph-core (agent/), AgentTestHarness (ARCH-08)

Metadata


Labels

  • P4 — Long-term / exploratory
  • research — Research-driven improvement
  • skills — zeph-skills crate
