Deeptrace is a multimodal sensitive-data leak simulation engine that generates realistic synthetic leak artifacts across image, video, audio, and document channels.
The project is designed for security testing, DLP model training, red-team exercises, and awareness programs where real leaked data is either unavailable or too risky to use.
Modern leak incidents are no longer text-only. In practice, secrets leak through screenshots, recordings, meetings, and office documents. Traditional DLP coverage is often insufficient for these channels.
Deeptrace simulates realistic leak trajectories end to end. Instead of randomly inserting secret strings, it uses LLM-driven contextual generation to place secrets into coherent host environments, then renders them into multimodal artifacts.
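As a hedged sketch of that contextual-generation step (the real prompt templates live in llm_utils.py and are not reproduced here; `build_context_prompt` is a hypothetical helper, not Deeptrace's actual API):

```python
# Illustrative only: build_context_prompt is a hypothetical helper, not
# Deeptrace's actual API; the real prompts live in llm_utils.py.
def build_context_prompt(secret: str, secret_type: str, scene: str) -> str:
    """Ask an LLM for a coherent host context with the secret embedded once."""
    return (
        f"Write a realistic {scene} artifact that a developer might produce.\n"
        f"Embed this {secret_type} naturally, exactly once: {secret}\n"
        "Surround it with plausible, semantically coherent context."
    )
```

The point of prompting for a whole host artifact, rather than splicing the secret into random text, is that the surrounding context (imports, config keys, chat turns) stays consistent with where such a secret would actually appear.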
- DLP cold start: generate large synthetic datasets without exposing production secrets.
- Red/blue exercises: produce believable honey artifacts for adversary engagement.
- Security awareness: create immersive leak examples for internal training.
Simulating the unseen, to trace the unknown.
- Multimodal coverage instead of text-only assumptions. Traditional DLP workflows mostly focus on text logs, while real incidents often happen through screenshots, recordings, shared slides, and meeting audio. Deeptrace fills that practical gap.
- Safe synthetic data for model training. High-quality leak datasets are hard to collect because real secrets are regulated and dangerous to circulate. Deeptrace produces high-fidelity synthetic samples without exposing production credentials.
- Better realism for offensive and defensive drills. Security exercises become more actionable when bait files and leak traces look operationally authentic.
- LLM-based contextual generation. Deeptrace does not merely paste a secret string into random text; it generates semantically coherent host context such as code snippets, config fragments, chat logs, and business documents.
- Adaptive scene routing. For visual modalities, the system can infer where a secret is most likely to appear (IDE, CLI, chat, config, UI), reducing manual rule authoring.
- End-to-end multimodal rendering pipeline. The same scenario can be rendered into static images, scrolling videos, speech, PDF, Word, and PPT outputs to emulate real-world leakage surfaces.
- Robustness-oriented augmentation. Optional noise and compression-like perturbations help evaluate OCR and vision models under imperfect capture conditions.
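As a minimal sketch of the augmentation idea above (assuming images as uint8 numpy arrays; the helper names are illustrative, not Deeptrace's actual API):

```python
# Illustrative augmentation helpers; names are assumptions, not Deeptrace's API.
import numpy as np

def add_capture_noise(img: np.ndarray, sigma: float = 8.0, seed: int = 0) -> np.ndarray:
    """Simulate sensor noise: add Gaussian noise, then clip to the valid pixel range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def downsample_upsample(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Approximate compression/rescaling loss by dropping and repeating pixels."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
```

Running OCR or detection models on both the clean and the perturbed renderings gives a rough measure of how robust they are to imperfect capture conditions.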
Deeptrace was intentionally built in layers, and this evolution is important for understanding the current architecture:
- Phase 1: Scenario generation and visual rendering. The initial version focused on LLM-driven scene synthesis and image/video outputs.
- Phase 2: Audio and document modalities. Audio, PDF, Word, and PPT generators were added to cover broader enterprise leakage channels.
- Phase 3: Secret generator module (added later). The key and network-string generator was introduced as a later enhancement to support scalable synthetic secret creation. This module is an incremental extension, not the original core, and is now integrated as an optional upstream source for scenario payloads.
This staged growth explains why some modules look more mature than others and why interfaces evolved over time.
- Image: IDE/CLI/chat/config/UI screenshots.
- Video: Pan/scroll videos generated from visual scenes.
- Audio: Secret mention simulation via TTS.
- PDF: Formal leak-like document artifacts.
- Word: Internal memo and handoff style documents.
- PPT: Presentation-style sensitive content artifacts.
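The modality list above maps naturally onto a dispatch table. The sketch below is hypothetical: the modules under generators/ are real, but the entry-point signatures shown here are assumptions.

```python
# Hypothetical modality dispatch; generator entry points are assumptions.
from typing import Callable, Dict

def _stub(name: str) -> Callable[[dict], str]:
    # A real generator would render an artifact file; stubs just report the route.
    return lambda item: f"{name}:{item['secret_type']}"

RENDERERS: Dict[str, Callable[[dict], str]] = {
    "image": _stub("image"),
    "video": _stub("video"),
    "audio": _stub("audio"),
    "pdf": _stub("pdf"),
    "word": _stub("word"),
    "ppt": _stub("ppt"),
}

def render(item: dict) -> str:
    """Route a generation item to its modality-specific renderer."""
    fn = RENDERERS.get(item["modality"])
    if fn is None:
        raise ValueError(f"unsupported modality: {item['modality']}")
    return fn(item)
```

Keeping the per-modality logic behind a single dispatch point is what lets the same scenario be rendered across all six output channels.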
```
.
├── main.py
├── llm_utils.py
├── config/
│   ├── config.json
│   └── config_sample.json
├── generators/
│   ├── secret_generators/
│   │   ├── Key_Gen.py
│   │   ├── NetworkStr_Gen.py
│   │   └── secret_gen.py
│   ├── audio_gen.py
│   ├── chat_gen.py
│   ├── cli_gen.py
│   ├── config_gen.py
│   ├── pdf_gen.py
│   ├── ppt_gen.py
│   ├── ui_gen.py
│   ├── video_utils.py
│   ├── vscode_gen.py
│   └── word_gen.py
├── output/
└── output_sample/
```
- Python 3.8+
- Node.js 18+
- LLM API key
```shell
pip install -r requirements.txt
sudo apt-get install -y espeak-ng
```

Create or edit config/config.json based on config/config_sample.json.
Key fields:
- `api_key`: LLM API key
- `base_url`: LLM API endpoint
- `output_dir`: output folder
- `add_noise`: whether to add visual noise
- `items`: generation list; each entry specifies `secret`, `secret_type`, `modality`, and an optional `scene`
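As a hedged illustration, a minimal config/config.json might look like the fragment below. All values are placeholders, and the exact accepted values for `secret_type` and `scene` are assumptions; the authoritative enums live in the generator sources.

```json
{
  "api_key": "YOUR_LLM_API_KEY",
  "base_url": "https://api.example.com/v1",
  "output_dir": "output",
  "add_noise": true,
  "items": [
    {
      "secret": "AKIAEXAMPLEKEY123456",
      "secret_type": "aws_access_key",
      "modality": "image",
      "scene": "ide"
    },
    {
      "secret": "placeholder-token",
      "secret_type": "api_token",
      "modality": "pdf"
    }
  ]
}
```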
```shell
python main.py
python main.py --config config/config_sample.json
```

To run the backend:

```shell
cd backend
pip install -r requirements.txt
python main.py
```

Backend endpoints:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
```shell
cd frontend
npm install
npm run dev
```

Frontend: http://localhost:3000
Typical web flow:
- Register and log in.
- Create a task from dashboard.
- Configure secret type, modality, and scene.
- Submit task and review outputs.
Internal generation pipeline:

- Analyze `secret_type` and `modality`.
- Select or infer a scene for image/video tasks.
- Generate context-rich content with the embedded secret.
- Render the result with the modality-specific generator.
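The first two steps above can be sketched as a small routing function. This is illustrative only: the real routing logic lives in main.py, and the hint table below is an assumption, not Deeptrace's actual mapping.

```python
# Illustrative scene inference; the mapping is a hypothetical example,
# not Deeptrace's actual routing table (which lives in main.py).
from typing import Optional

SCENE_HINTS = {
    "ssh_key": "cli",
    "aws_access_key": "config",
    "db_password": "config",
    "chat_token": "chat",
}

def infer_scene(secret_type: str, modality: str,
                scene: Optional[str] = None) -> Optional[str]:
    """Scenes apply only to image/video; an explicit user choice wins."""
    if modality not in ("image", "video"):
        return None
    if scene:
        return scene
    return SCENE_HINTS.get(secret_type, "ide")
```

An explicit `scene` in the task config always overrides inference, so adaptive routing only kicks in when the user leaves the field empty.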
The secret generator was added after the initial multimodal pipeline and is now available as a reusable component.
- Key generator APIs: `Keygen(modeset: int) -> str`, `RandKeygen() -> str`
- Network string APIs: `Strgen(modeset: int) -> str`, `RandStrgen() -> str`
- Unified wrapper: `AllRandgen() -> str`
See source files under generators/secret_generators/ for supported enum types and format constraints.
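To illustrate what format-constrained synthetic secret generation looks like, here is a minimal standalone sketch. It is not the Key_Gen.py implementation; the function names and the two formats shown are assumptions for demonstration.

```python
# Standalone sketch of format-constrained fake secrets; not Key_Gen.py's API.
import secrets
import string

def fake_aws_access_key() -> str:
    """AWS access key IDs are 'AKIA' plus 16 uppercase alphanumerics (20 chars total)."""
    alphabet = string.ascii_uppercase + string.digits
    return "AKIA" + "".join(secrets.choice(alphabet) for _ in range(16))

def fake_hex_token(nbytes: int = 20) -> str:
    """Generic hex API token for bearer-style secrets (2 hex chars per byte)."""
    return secrets.token_hex(nbytes)
```

Because the outputs match real key formats character-for-character, they exercise the same regex and entropy heuristics a DLP detector would apply, without ever touching a production credential.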
To keep docs maintainable, previous standalone pitch and usage docs are merged into this README.