A simple AI node orchestration and management library, like Ray or Kubernetes, but lighter.
PyLet is a lightweight and reliable system for launching and orchestrating AI workloads across laptops, edge devices, and GPU servers. It manages one thing: instances (processes with resource allocation).
PyLet is built for modern AI applications that need simple orchestration, strong execution environment isolation, and efficient sharing of GPUs and nodes.
- Simple: No containers, no complex configs, just instances. Just `pylet start` and `pylet submit`.
- Pythonic orchestration: Launch and coordinate distributed AI workloads with simple APIs.
- Execution isolation: Run heterogeneous processes with different software stacks on the same node, useful for agentic workflows, multi-agent orchestration, and complex generator-verifier patterns.
- Fine-grained, flexible AI node resource sharing: Request specific GPUs by index, share GPUs across processes, and allocate partial GPUs, which is essential for improving GPU utilization.
- Easy logging and debugging: Stream logs from running instances in real time for easy debugging.
- Service discovery: Each instance gets a `PORT` env var; its endpoint is available via `pylet get-endpoint`.
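For example, a submitted server process can read the injected `PORT` variable to decide where to listen. A minimal stdlib sketch (the fallback of `0`, meaning "let the OS pick", is an assumption for running the script outside PyLet):

```python
import os
import socket

# PyLet injects PORT into each instance's environment.
# Fall back to 0 (an OS-assigned port) when running outside PyLet.
port = int(os.environ.get("PORT", "0"))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", port))
sock.listen()
bound_port = sock.getsockname()[1]
print(f"listening on 0.0.0.0:{bound_port}")
sock.close()
```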
With PyLet, you can launch multiple AI nodes across laptops and GPU servers, and orchestrate agentic workflows and other distributed AI applications with much less system complexity.
See docs/architecture.md for design details. See CONTRIBUTING.md for contribution philosophy.
Typical use cases:

- Agentic workflow, swarm, and team applications
- Serverless AI applications
- Heterogeneous multi-process AI applications
- Cloud-edge, multi-region AI deployment and orchestration
Requirements:

- Python 3.9+
- Linux (tested on Ubuntu)
Install from PyPI:

```shell
pip install pylet
```

For development:

```shell
git clone https://github.com/ServerlessLLM/pylet.git
cd pylet
pip install -e ".[dev]"
```

```shell
# Terminal 1: Start head node
pylet start

# Terminal 2: Start worker node with GPUs
pylet start --head localhost:8000 --gpu-units 4

# Terminal 3: Submit an instance
pylet submit 'vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT' \
  --gpu-units 1 --name my-vllm

# Check status
pylet get-instance --name my-vllm

# Get endpoint for inference
pylet get-endpoint --name my-vllm
# Output: 192.168.1.10:15600

# View logs
pylet logs <instance-id>

# Cancel
pylet cancel <instance-id>
```

The same workflow is available from Python:

```python
import pylet

# Connect to head node
pylet.init()  # or pylet.init("http://head:8000")

# Submit an instance
instance = pylet.submit(
    "vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT",
    name="my-vllm",
    gpu=1,
    memory=4096,
)

# Wait for it to start
instance.wait_running()
print(f"Endpoint: {instance.endpoint}")

# Get logs
print(instance.logs())

# Cancel when done
instance.cancel()
instance.wait()
```

For local testing:

```python
import pylet

with pylet.local_cluster(workers=2, gpu_per_worker=1) as cluster:
    instance = pylet.submit("nvidia-smi", gpu=1)
    instance.wait()
    print(instance.logs())
```

An async API is available via `import pylet.aio as pylet`.
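Once you have the `host:port` string from `pylet get-endpoint` (or `instance.endpoint`), you can call the instance directly; a vLLM instance serves the OpenAI-compatible HTTP API. A minimal stdlib sketch (the `complete` helper and the example endpoint value are illustrative, not part of PyLet):

```python
import json
import urllib.request

def complete(endpoint: str, prompt: str, model: str) -> str:
    """Send a completion request to a vLLM instance launched via PyLet.

    `endpoint` is the host:port string returned by `pylet get-endpoint`
    (or `instance.endpoint` in the Python API).
    """
    req = urllib.request.Request(
        f"http://{endpoint}/v1/completions",
        data=json.dumps(
            {"model": model, "prompt": prompt, "max_tokens": 32}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# Example (assumes the instance from the quick start is running):
# print(complete("192.168.1.10:15600", "Hello", "Qwen/Qwen2.5-1.5B-Instruct"))
```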
See examples/README.md for more detailed examples including vLLM and SGLang.
| Command | Description |
|---|---|
| `pylet start` | Start head node |
| `pylet start --head <ip:port> --gpu-units N` | Start worker with N GPUs |
| `pylet submit <cmd> --gpu-units N --name <name>` | Submit instance |
| `pylet get-instance --name <name>` | Get instance status |
| `pylet get-endpoint --name <name>` | Get instance endpoint (host:port) |
| `pylet logs <id>` | View instance logs |
| `pylet logs <id> --follow` | Follow logs in real time |
| `pylet cancel <id>` | Cancel instance |
| `pylet list-workers` | List registered workers |
License: Apache 2.0