CAUST is an autonomous multi-agent system for red-teaming machine unlearning methods. It automatically generates hypotheses, designs experiments, and evaluates unlearning techniques to discover vulnerabilities in data-based and concept-erasure approaches.
CAUST uses a dual-loop architecture:
- Inner Loop: Generates hypotheses, designs experiments, executes them on H200 GPUs, and evaluates results
- Outer Loop: Synthesizes findings across iterations, judges novelty and impact, and produces comprehensive reports
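The dual-loop control flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CAUST's actual orchestrator: `IterationResult`, `OuterLoopReport`, `run_inner_loop`, `run_outer_loop`, and the `success_threshold` parameter are hypothetical names, and the outer loop's novelty and impact judging is replaced here by simply picking the highest-scoring result.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IterationResult:
    """Outcome of one inner-loop pass (hypothetical structure)."""
    hypothesis: str
    score: float  # evaluation score; higher means a more effective attack


@dataclass
class OuterLoopReport:
    """Cross-iteration summary (hypothetical structure)."""
    results: List[IterationResult] = field(default_factory=list)
    best: Optional[IterationResult] = None


def run_inner_loop(
    generate: Callable[[List[IterationResult]], str],
    execute_and_evaluate: Callable[[str], float],
    max_iterations: int,
    success_threshold: float,
) -> List[IterationResult]:
    """Generate a hypothesis, run it, score it; stop early on success."""
    history: List[IterationResult] = []
    for _ in range(max_iterations):
        hypothesis = generate(history)  # may condition on earlier attempts
        score = execute_and_evaluate(hypothesis)
        history.append(IterationResult(hypothesis, score))
        if score >= success_threshold:
            break
    return history


def run_outer_loop(history: List[IterationResult]) -> OuterLoopReport:
    """Stand-in for synthesis: keep all results and pick the top score."""
    best = max(history, key=lambda r: r.score, default=None)
    return OuterLoopReport(results=list(history), best=best)


if __name__ == "__main__":
    # Stub agents: scores are canned values, not real experiment output.
    canned_scores = iter([0.2, 0.5, 0.9, 0.95])
    history = run_inner_loop(
        generate=lambda past: f"hypothesis-{len(past) + 1}",
        execute_and_evaluate=lambda hypothesis: next(canned_scores),
        max_iterations=4,
        success_threshold=0.8,
    )
    report = run_outer_loop(history)
    print(len(report.results), report.best.hypothesis)
```

Passing the agents in as callables keeps the loop logic testable without an LLM or a GPU; in the real system those callables would presumably be backed by the agents and the Kubernetes job runner.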
The system leverages:
- Multi-agent orchestration via CAMEL-AI
- RAG (Retrieval-Augmented Generation) for research paper knowledge
- GPU-accelerated experiment execution on Kubernetes
- Persistent memory for successful attack discoveries
CAUST/
├── aust/ # Application package
│ ├── configs/ # Prompts, personas, thresholds, task templates
│ ├── experiments/ # Placeholder for experiment artifacts
│ ├── logs/ # Runtime logs
│ ├── outputs/ # Persistent inner loop results
│ ├── rag_paper_db/ # Vector store for paper RAG
│ ├── scripts/ # CLI entry points for inner loop tooling
│ ├── src/ # Source code
│ │ ├── agents/ # LLM-powered agent implementations
│ │ ├── loop/ # Orchestration and state management
│ │ ├── memory/ # Long-term memory system
│ │ ├── rag/ # Research paper retrieval subsystem
│ │ ├── toolkits/ # Integrations with external unlearning toolchains
│ │ └── logging_config.py # Project-wide logging setup
│ ├── tests/ # Test suite (unit + integration)
│ └── utils/ # Helper scripts (e.g., paper downloads)
├── docker/ # Docker and Kubernetes configs
│ ├── Dockerfile # Container image definition
│ ├── job.yaml # Kubernetes GPU job template
│ └── pvc.yaml # Persistent volume claims
├── external/ # Third-party submodules (DeepUnlearn, CAMEL)
├── logs/ # Legacy log location (top-level)
├── requirements.txt # Python dependencies
└── docs/ # Project documentation
- Python 3.11.5+
- Docker 24.0.7+
- Kubernetes 1.28.x with NVIDIA GPU support (H200)
- CUDA 12.1+ (for H200 GPUs)
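A quick way to sanity-check part of this list before installing is a small script like the one below. It is only a sketch: `python_ok` and `missing_tools` are hypothetical helper names, the check for `kubectl` assumes that is how you talk to the cluster, and the script confirms only that the tools are on `PATH`, not that Docker, Kubernetes, or CUDA meet the versions listed above.

```python
import shutil
import sys
from typing import List, Optional, Sequence, Tuple

MIN_PYTHON: Tuple[int, int, int] = (3, 11, 5)   # from the prerequisites list
REQUIRED_TOOLS: Tuple[str, ...] = ("docker", "kubectl")  # presence check only


def python_ok(
    version: Optional[Sequence[int]] = None,
    minimum: Tuple[int, int, int] = MIN_PYTHON,
) -> bool:
    """Return True if the given (or running) interpreter meets the minimum."""
    current = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return current >= minimum


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python is older than", ".".join(map(str, MIN_PYTHON)))
    for tool in missing_tools():
        print("Not found on PATH:", tool)
```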
- Clone the repository:
  git clone https://github.com/vios-s/CAUST.git
  cd CAUST
- Create a Python virtual environment:
  python3.11 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install --upgrade pip==23.3.1
  pip install -r requirements.txt
- Install CAMEL-AI in dev mode (Story 1.2):
  # Will be added in Story 1.2
- Set up environment variables:
  cp .env.example .env
  # Edit .env with your API keys (OpenRouter, etc.)
- Run tests:
  pytest tests/
Build the Docker image:
docker build -t caust:latest -f docker/Dockerfile .
Test the container:
docker run --rm caust:latest
- Create persistent volumes:
  kubectl apply -f docker/pvc.yaml
- Submit GPU job:
  # Edit docker/job.yaml to set TASK_ID and TASK_TYPE
  kubectl apply -f docker/job.yaml
- Monitor job:
  kubectl get jobs
  kubectl logs job/caust-experiment-job
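For scripted monitoring, the job status can also be read as JSON and classified in Python. The sketch below assumes the job is named `caust-experiment-job` (taken from the `kubectl logs` command above) and that `kubectl` is configured for the right cluster and namespace; `classify_job` and `fetch_job` are hypothetical helper names, not part of the repository.

```python
import json
import subprocess
from typing import Any, Dict


def classify_job(job: Dict[str, Any]) -> str:
    """Map a Kubernetes Job object (parsed JSON) to a coarse state string."""
    status = job.get("status", {})
    # A finished Job carries a condition of type Complete or Failed
    # whose status field is the string "True".
    for condition in status.get("conditions") or []:
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "succeeded"
        if condition.get("type") == "Failed":
            return "failed"
    # status.active counts pods that are currently running or pending.
    if status.get("active", 0):
        return "running"
    return "pending"


def fetch_job(name: str) -> Dict[str, Any]:
    """Fetch the Job object via kubectl; raises if kubectl exits non-zero."""
    completed = subprocess.run(
        ["kubectl", "get", "job", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

A caller could poll `classify_job(fetch_job("caust-experiment-job"))` on an interval until it returns `"succeeded"` or `"failed"`.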
All project documentation is organized in the docs/ directory:
- docs/architecture.md - System architecture overview
- docs/prd.md - Product requirements document
- docs/brief.md - Project brief
- docs/stories/ - Implementation stories and documentation
  - Story 1.5 - Hypothesis Refinement Workforce
  - Story 1.0-1.8 - Integration summaries
  - Test results and implementation details
  - Inner loop orchestrator documentation
Complete documentation for using CAMEL-AI patterns:
- docs/camel-resources/README_CAMEL_RESOURCES.md - Master index and overview
- docs/camel-resources/CAMEL_QUICK_REFERENCE.md - Quick lookup while coding
- docs/camel-resources/CAMEL_PATTERNS_GUIDE.md - Complete technical reference
- docs/camel-resources/CAMEL_INDEX.md - File index and navigation
- docs/config-guides/CONCEPT_ERASURE_CONFIG_SUMMARY.md - Configuration reference
- docs/MAIN_PY_SUMMARY.md - Code structure summary
- docs/architecture/ - Comprehensive architecture details
  - Components, workflows, data models
  - Tech stack and external APIs
  - Test strategy and security considerations
- docs/epics/ - Epic-level planning and requirements
- docs/prds/ - Product requirement details
- Formatter: black (line length 100)
- Linter: ruff
- Type Checking: mypy (strict mode)
Run code quality checks:
black aust/ tests/
ruff check aust/ tests/
mypy aust/
- Unit tests: tests/unit/test_{module}.py
- Integration tests: tests/integration/test_{workflow}.py
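To make the naming convention concrete, the helpers below derive the expected test file path from a module path or a workflow name. This is an illustrative sketch only: `unit_test_path` and `integration_test_path` are hypothetical names, and the workflow name `inner_loop` is an example, not a confirmed test file in the repository.

```python
from pathlib import PurePosixPath


def unit_test_path(module_path: str) -> str:
    """Map a source module to tests/unit/test_{module}.py."""
    module = PurePosixPath(module_path).stem  # file name without ".py"
    return f"tests/unit/test_{module}.py"


def integration_test_path(workflow: str) -> str:
    """Map a workflow name to tests/integration/test_{workflow}.py."""
    return f"tests/integration/test_{workflow}.py"


if __name__ == "__main__":
    print(unit_test_path("aust/src/logging_config.py"))
    print(integration_test_path("inner_loop"))
```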
Run tests with coverage:
pytest tests/ --cov=aust --cov-report=html
All production code uses the logging framework (no print() statements):
from aust.logging_config import get_logger, set_correlation_id
logger = get_logger(__name__)
# Set correlation ID for request tracing
set_correlation_id("task_123")
# Log messages
logger.info("Starting experiment", extra={"experiment_id": "exp_001"})
try:
    run_experiment()  # hypothetical placeholder for the experiment call
except Exception as e:
    logger.error("Experiment failed", extra={"error": str(e)})
Configuration files are located in aust/configs/:
- prompts/: Agent prompt templates
- thresholds/: Evaluation threshold configurations
- tasks/: Task-specific configurations
- personas/: Judge persona definitions
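To see what is actually present under each of these subdirectories, a short listing script can help. The sketch below assumes only the four directory names given above; the file formats inside them are not stated here, so it lists file names without parsing them. `list_config_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Dict, List, Union

CONFIG_SUBDIRS = ("prompts", "thresholds", "tasks", "personas")


def list_config_files(config_root: Union[str, Path]) -> Dict[str, List[str]]:
    """Return the sorted file names found in each known config subdirectory.

    A missing subdirectory maps to an empty list rather than raising.
    """
    root = Path(config_root)
    listing: Dict[str, List[str]] = {}
    for subdir in CONFIG_SUBDIRS:
        directory = root / subdir
        if directory.is_dir():
            listing[subdir] = sorted(
                entry.name for entry in directory.iterdir() if entry.is_file()
            )
        else:
            listing[subdir] = []
    return listing


if __name__ == "__main__":
    for name, files in list_config_files("aust/configs").items():
        print(f"{name}/: {len(files)} file(s)")
```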
- Ensure NVIDIA Docker runtime is installed: nvidia-docker --version
- Verify GPU access: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
- Check GPU node labels: kubectl get nodes --show-labels | grep nvidia
- Verify PVC is bound: kubectl get pvc
- Ensure PYTHONPATH includes project root: export PYTHONPATH=/path/to/CAUST:$PYTHONPATH
- Check virtual environment is activated
- Verify logs/ directory exists and is writable
- Check log level in configuration (default: INFO)
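The two logging checks above can be automated with a small diagnostic. This sketch assumes the project's loggers are built on Python's standard `logging` module, which is not confirmed here; `diagnose_logging` is a hypothetical helper, and the level it inspects is the effective level of the named standard-library logger, not a value read from CAUST's configuration files.

```python
import logging
import os
from pathlib import Path
from typing import List, Optional, Union


def diagnose_logging(
    log_dir: Union[str, Path], logger_name: Optional[str] = None
) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems: List[str] = []
    path = Path(log_dir)
    if not path.is_dir():
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")

    # None selects the root logger, whose default level is WARNING.
    level = logging.getLogger(logger_name).getEffectiveLevel()
    if level > logging.INFO:
        problems.append(
            f"effective log level is {logging.getLevelName(level)}; "
            "INFO messages are suppressed"
        )
    return problems


if __name__ == "__main__":
    for problem in diagnose_logging("logs"):
        print("Problem:", problem)
```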
Please follow the coding standards and ensure all tests pass before submitting changes.
TBD
Project repository: https://github.com/vios-s/CAUST