SEAT — Self-Extending Agent Toolkit

SEAT is a proof-of-concept HTTP service that lets agents build and execute their own tools at runtime. Given a plain-language description of a tool, SEAT prompts a language model to generate a Python function, validates the generated code against a strict AST-based security policy, persists the approved function in PostgreSQL, and executes it on demand in a separate subprocess. No human is involved between description and first run.

How It Works

Describe — a caller sends a plain-language description of the tool they need (e.g. "convert Celsius to Fahrenheit").
Generate — SEAT sends the description to a configured LLM (Ollama, OpenAI, or any LLMWire-compatible provider) requesting a GeneratedCode structured response that includes the function name, source code, input schema, and output description.
Validate — the generated code is parsed to an AST and checked against a list of forbidden module imports (os, sys, subprocess, socket, http, etc.) and forbidden builtin calls (exec, eval, open, __import__, etc.). Code that does not pass this check is rejected with HTTP 422.
Register — the validated function is wrapped in a thin stdin/stdout harness via a Jinja2 template and persisted in the tools table with its status set to active.
Execute — the caller posts input data to the tool's execute endpoint; SEAT writes the wrapper script to a temporary file, spawns it as a subprocess with the input piped as JSON to stdin, collects the JSON result from stdout, updates rolling execution statistics, and returns the result.
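
The stdin/stdout harness from the Register and Execute steps can be pictured with a small sketch. This is not SEAT's actual template: it uses `string.Template` from the standard library as a stand-in for Jinja2, and the names `WRAPPER_TEMPLATE` and `render_wrapper`, as well as the `{"output": ...}` result shape, are assumptions made for illustration.

```python
from string import Template

# Hypothetical harness; SEAT renders its real wrapper with Jinja2.
WRAPPER_TEMPLATE = Template('''\
import json
import sys

$source

if __name__ == "__main__":
    kwargs = json.load(sys.stdin)               # input arrives as JSON on stdin
    result = $function_name(**kwargs)           # call the generated function
    json.dump({"output": result}, sys.stdout)   # result leaves as JSON on stdout
''')


def render_wrapper(function_name: str, source: str) -> str:
    """Embed generated source code in the stdin/stdout harness."""
    return WRAPPER_TEMPLATE.substitute(function_name=function_name, source=source)


# Example of what a generated function might look like.
GENERATED_SOURCE = '''\
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32
'''

script = render_wrapper("celsius_to_fahrenheit", GENERATED_SOURCE)
```

The rendered `script` is ordinary Python source, so it can be stored as text and later written to a temporary file for execution.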

Quick Start

Start the database

docker compose up -d

This starts PostgreSQL 17 on port 5433 with the seat user and database.

Install and migrate

pip install -e ".[dev]"
cp .env.example .env          # edit LLM settings as needed
alembic upgrade head

Start the API server

uvicorn seat.api.app:build_app --factory --reload

The API is now available at http://localhost:8000. A built-in test GUI is served at the root URL — open http://localhost:8000/ in a browser. The OpenAPI interactive docs are at http://localhost:8000/docs.

Create a tool

curl -s -X POST http://localhost:8000/tools \
  -H "Content-Type: application/json" \
  -d '{"description": "Convert a temperature from Celsius to Fahrenheit"}' \
  | jq .

The response contains the tool's id, generated code, input_schema, and initial statistics.

Execute the tool

TOOL_ID="<id from previous response>"

curl -s -X POST "http://localhost:8000/tools/$TOOL_ID/execute" \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"celsius": 100}}' \
  | jq .

A successful response looks like:

{
  "success": true,
  "output": "212.0",
  "error": null,
  "execution_time": 0.031
}
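
The same create-and-execute flow can be driven from Python. The sketch below uses only the standard library and assumes the server address from this Quick Start; the helper names `build_create_request`, `build_execute_request`, and `send` are illustrative and not part of SEAT.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used throughout the Quick Start


def _post_json(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_create_request(description: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for POST /tools with a plain-language description."""
    return _post_json(f"{base_url}/tools", {"description": description})


def build_execute_request(tool_id: str, input_data: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for POST /tools/{id}/execute with the tool's input data."""
    return _post_json(f"{base_url}/tools/{tool_id}/execute", {"input_data": input_data})


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(build_create_request("Convert a temperature from Celsius to Fahrenheit"))` returns the tool record, and its `id` can be passed to `build_execute_request` together with `{"celsius": 100}`.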

List and search tools

# List all active tools
curl -s "http://localhost:8000/tools?status=active" | jq .

# Search by keyword
curl -s "http://localhost:8000/tools/search?q=temperature" | jq .

Test GUI

Open http://localhost:8000/ in a browser. The dark-mode single-page interface provides four tabs:

Create Tool — enter a description (and optional name), generate a tool via the LLM.
Tools List — browse, search, filter by status, and delete tools.
Tool Detail — inspect a tool's code, input schema, statistics, and metadata.
Execute — select a tool, provide JSON input, run it, and view the result.

The GUI is a single HTML file (src/seat/static/index.html) with no external dependencies. It consumes the same REST API documented below.

API Reference

Method	Path	Description
`GET`	`/`	Serves the built-in test GUI (HTML)
`GET`	`/health`	Liveness check — returns `{"status": "ok"}`
`POST`	`/tools`	Generate, validate, and register a new tool
`GET`	`/tools`	List all tools; optional `?status=active|deprecated` filter
`GET`	`/tools/search?q=<query>`	Case-insensitive substring search on name and description
`GET`	`/tools/{id}`	Retrieve a single tool by UUID
`POST`	`/tools/{id}/execute`	Execute a tool with JSON input data
`DELETE`	`/tools/{id}`	Remove a tool from the registry

All endpoints except GET / return JSON. POST /tools returns 201 on success and 422 when generated code fails security validation. DELETE /tools/{id} returns 204 with no body. Requests that address a tool by {id} return 404 when the tool does not exist.
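
The matching rule listed for GET /tools/search (case-insensitive substring match on name and description) can be illustrated in memory. This is only a sketch of the semantics: `ToolSummary` and `search_tools` are hypothetical names, and the service itself queries PostgreSQL rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class ToolSummary:
    """Hypothetical stand-in for a registered tool record."""
    name: str
    description: str


def search_tools(tools: list, query: str) -> list:
    """Return tools whose name or description contains the query, ignoring case."""
    needle = query.casefold()
    return [
        tool
        for tool in tools
        if needle in tool.name.casefold() or needle in tool.description.casefold()
    ]
```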

Architecture

HTTP client
     │
     ▼
┌────────────────────────────────────────────────────────┐
│  FastAPI application  (seat.api.app)                   │
│                                                        │
│  POST /tools                POST /tools/{id}/execute   │
│       │                              │                 │
│       ▼                              ▼                 │
│  ToolGenerator              ToolExecutor               │
│  ┌──────────────┐           ┌───────────────────────┐  │
│  │ LLMClient    │           │ tempfile + subprocess │  │
│  │ (LLMWire)    │           │ JSON stdin / stdout   │  │
│  │ Jinja2 wrap  │           │ configurable timeout  │  │
│  └──────────────┘           └───────────────────────┘  │
│       │                              │                 │
│       ▼                              ▼                 │
│  CodeValidator              ToolRegistry               │
│  ┌──────────────┐           ┌───────────────────────┐  │
│  │ AST parser   │           │ SQLAlchemy async ORM  │  │
│  │ import check │           │ PostgreSQL (asyncpg)  │  │
│  │ builtin check│           │ Alembic migrations    │  │
│  └──────────────┘           └───────────────────────┘  │
└────────────────────────────────────────────────────────┘

The lifespan hook wires the engine, session factory, ToolGenerator, and ToolExecutor into app.state once at startup. FastAPI's dependency injection pulls them into each request handler without any global state.

Security Model

SEAT applies a defence-in-depth approach to code that it did not write.

AST validation (pre-persistence). Before any code reaches the database it is parsed to a Python AST. The validator walks every node and rejects the submission if it finds:

An import or from ... import statement whose top-level module is one of: os, sys, subprocess, shutil, socket, http, urllib, requests, httpx, ctypes, multiprocessing, threading, signal, importlib, pickle, shelve, or sqlite3.
A function call whose name matches any of: exec, eval, compile, __import__, open, globals, locals, getattr, setattr, or delattr.
No function definition: code that contains no def is not a valid tool.

Validation is purely static; no code is executed during this phase.
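
The three rules above can be expressed as one pass over the AST. The following is a minimal sketch using the standard `ast` module, not SEAT's actual validator: the function name `find_violations`, the message strings, and the handling of syntax errors are assumptions, while the two sets are copied from the lists above.

```python
import ast

FORBIDDEN_MODULES = {
    "os", "sys", "subprocess", "shutil", "socket", "http", "urllib",
    "requests", "httpx", "ctypes", "multiprocessing", "threading",
    "signal", "importlib", "pickle", "shelve", "sqlite3",
}
FORBIDDEN_CALLS = {
    "exec", "eval", "compile", "__import__", "open",
    "globals", "locals", "getattr", "setattr", "delattr",
}


def find_violations(source: str) -> list[str]:
    """Return policy violations; an empty list means the code passes."""
    try:
        tree = ast.parse(source)  # parsing only, nothing is executed
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]  # assumption: unparsable code is rejected

    violations = []
    has_function = False
    for node in ast.walk(tree):
        # Collect the module names named by import statements.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            top_level = module.split(".")[0]  # "os.path" counts as "os"
            if top_level in FORBIDDEN_MODULES:
                violations.append(f"forbidden import: {top_level}")

        # Direct calls by bare name, e.g. eval(...) or open(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}")

        if isinstance(node, ast.FunctionDef):
            has_function = True

    if not has_function:
        violations.append("no function definition found")
    return violations
```

In this sketch a non-empty result would correspond to the HTTP 422 rejection described earlier. Like the rule it mirrors, it matches calls by name only.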

Subprocess sandbox (execution). Approved tools run as a separate python3 process. SEAT passes input as JSON to the process's stdin and reads the result from stdout. The subprocess inherits no database credentials or application secrets. A configurable timeout (default 30 s, controlled by EXECUTOR_TIMEOUT) kills the process if it runs too long. The temporary script file is always deleted after execution, even on failure.
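
A minimal sketch of this execution path, using only the standard library, is shown below. It is an illustration under stated assumptions rather than SEAT's executor: the function name `run_script`, the returned dictionary (a subset of the execute response fields), and the way secrets are withheld (dropping `DATABASE_URL` and `LLM_API_KEY` from the child's environment) are all hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile

# Assumption: these variables are what must not reach the child process.
WITHHELD_VARIABLES = ("DATABASE_URL", "LLM_API_KEY")


def run_script(script: str, input_data: dict, timeout: float = 30.0) -> dict:
    """Run a wrapper script in a child Python process and report the outcome."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(script)
        path = handle.name

    child_env = {k: v for k, v in os.environ.items() if k not in WITHHELD_VARIABLES}
    try:
        completed = subprocess.run(
            [sys.executable, path],
            input=json.dumps(input_data),  # JSON piped to the child's stdin
            capture_output=True,
            text=True,
            timeout=timeout,               # the child is killed when this expires
            env=child_env,
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "output": None, "error": f"timed out after {timeout} s"}
    finally:
        os.unlink(path)                    # remove the script even on failure

    if completed.returncode != 0:
        return {"success": False, "output": None, "error": completed.stderr.strip()}
    return {"success": True, "output": completed.stdout, "error": None}
```

The `finally` clause is what guarantees cleanup on every path, including the timeout branch.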

Configuration

All settings are read from environment variables (or a .env file via pydantic-settings):

Variable	Default	Description
`DATABASE_URL`	`postgresql+asyncpg://seat:seat@localhost:5433/seat`	Async-compatible PostgreSQL connection URL
`LLM_PROVIDER`	`ollama`	LLMWire provider name (`ollama`, `openai`, ...)
`LLM_MODEL`	`llama3`	Model identifier passed to the LLM provider
`LLM_API_KEY`	(empty)	API key for hosted providers; leave empty for Ollama
`EXECUTOR_TIMEOUT`	`30`	Maximum seconds to allow a tool subprocess to run
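
For readers unfamiliar with environment-driven configuration, the table can be read as the following sketch. It is a standard-library stand-in, not SEAT's pydantic-settings class: the names `Settings` and `load_settings` are hypothetical, and unlike pydantic-settings this version does not read a .env file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Illustrative mirror of the variables in the table above."""
    database_url: str
    llm_provider: str
    llm_model: str
    llm_api_key: str
    executor_timeout: int


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read each variable from the environment, falling back to its default."""
    env = os.environ if environ is None else environ
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgresql+asyncpg://seat:seat@localhost:5433/seat"
        ),
        llm_provider=env.get("LLM_PROVIDER", "ollama"),
        llm_model=env.get("LLM_MODEL", "llama3"),
        llm_api_key=env.get("LLM_API_KEY", ""),
        executor_timeout=int(env.get("EXECUTOR_TIMEOUT", "30")),
    )
```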

Roadmap

v0.2.0

MCP server generation. Each registered tool will be automatically exposed as a Model Context Protocol (MCP) tool descriptor, allowing any MCP-compatible agent (Claude Desktop, Continue, etc.) to discover and invoke SEAT-generated tools without custom API integration.

Docker tool containers. The subprocess executor will gain an optional Docker backend. When enabled, each tool execution runs inside a disposable container with a read-only filesystem, no network access, and strict resource limits — providing OS-level isolation in addition to the existing AST and timeout safeguards.

References and Papers

MCP Specification: https://spec.modelcontextprotocol.io/
Toolformer — Language Models Can Teach Themselves to Use Tools: https://arxiv.org/abs/2302.04761
ART — Automatic multi-step Reasoning and Tool-use for large language models: https://arxiv.org/abs/2303.09014

License

MIT — see LICENSE.
