# PyLet


A simple AI node orchestration and management library, in the spirit of Ray and Kubernetes but simpler and lighter.

## Why PyLet?

PyLet is a lightweight and reliable system for launching and orchestrating AI workloads across laptops, edge devices, and GPU servers. It manages one thing: instances (processes with resource allocation).

PyLet is built for modern AI applications that need simple orchestration, strong execution environment isolation, and efficient sharing of GPUs and nodes.

- **Simple**: No containers, no complex configs, just instances: `pylet start` and `pylet submit`.
- **Pythonic orchestration**: Launch and coordinate distributed AI workloads with simple APIs.
- **Execution isolation**: Run heterogeneous processes with different software stacks on the same node, useful for agentic workflows, multi-agent orchestration, and complex generator-verifier patterns.
- **Fine-grained, flexible AI node resource sharing**: Request specific GPUs by index, share GPUs across processes, and allocate partial GPUs, essential for improving GPU utilization.
- **Easy logging and debugging**: Stream logs from running instances to debug them in real time.
- **Service discovery**: Each instance gets a `PORT` environment variable; its endpoint is available via `pylet get-endpoint`.

With PyLet, you can launch multiple AI nodes across laptops and GPU servers, and orchestrate agentic workflows and other distributed AI applications with much less system complexity.
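As a small illustration of the service-discovery point above, an instance's entrypoint can read the `PORT` variable that PyLet injects. The helper and its fallback default below are illustrative assumptions for running the same script outside PyLet, not part of the PyLet API:

```python
import os

def instance_port(default: int = 8080) -> int:
    """Return the port this instance should bind to.

    PyLet injects a PORT environment variable into each instance;
    the `default` is only a fallback so the same script also runs
    outside PyLet.
    """
    return int(os.environ.get("PORT", str(default)))
```

A server launched via `pylet submit 'python server.py'` would bind to `instance_port()` and then be reachable through `pylet get-endpoint`.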

See docs/architecture.md for design details. See CONTRIBUTING.md for contribution philosophy.

## Good Fit For

- Agentic workflow, swarm, and team applications
- Serverless AI applications
- Heterogeneous multi-process AI applications
- Cloud-edge and multi-region AI deployment and orchestration

## Requirements

- Python 3.9+
- Linux (tested on Ubuntu)

## Install

```bash
pip install pylet
```

For development:

```bash
git clone https://github.com/ServerlessLLM/pylet.git
cd pylet
pip install -e ".[dev]"
```

## Quick Start

### CLI

```bash
# Terminal 1: Start head node
pylet start

# Terminal 2: Start worker node with GPUs
pylet start --head localhost:8000 --gpu-units 4

# Terminal 3: Submit an instance
pylet submit 'vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT' \
    --gpu-units 1 --name my-vllm

# Check status
pylet get-instance --name my-vllm

# Get endpoint for inference
pylet get-endpoint --name my-vllm
# Output: 192.168.1.10:15600

# View logs
pylet logs <instance-id>

# Cancel
pylet cancel <instance-id>
```
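`pylet get-endpoint` prints a plain `host:port` string. A tiny helper like the hypothetical `split_endpoint` below (not part of PyLet) turns it into pieces you can use to build request URLs:

```python
def split_endpoint(endpoint: str) -> tuple[str, int]:
    """Split an endpoint string such as '192.168.1.10:15600'
    into a (host, port) pair."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)

host, port = split_endpoint("192.168.1.10:15600")
base_url = f"http://{host}:{port}"  # e.g. the base URL of a served model
```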

### Python API

```python
import pylet

# Connect to head node
pylet.init()  # or pylet.init("http://head:8000")

# Submit an instance
instance = pylet.submit(
    "vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT",
    name="my-vllm",
    gpu=1,
    memory=4096,
)

# Wait for it to start
instance.wait_running()
print(f"Endpoint: {instance.endpoint}")

# Get logs
print(instance.logs())

# Cancel when done
instance.cancel()
instance.wait()
```
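Once the instance is running, its endpoint can be queried like any HTTP service. The sketch below assumes a vLLM instance like the one above (vLLM exposes an OpenAI-compatible API) at the `host:port` string from `instance.endpoint`; the `build_chat_request` helper is illustrative, not part of PyLet:

```python
import json
import urllib.request

def build_chat_request(endpoint: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request for a vLLM instance reachable at
    `endpoint`, a host:port string from `instance.endpoint` or
    `pylet get-endpoint`."""
    body = json.dumps({
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"http://{endpoint}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With a live instance you would then do something like:
# with urllib.request.urlopen(build_chat_request(instance.endpoint, "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```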

For local testing:

```python
import pylet

with pylet.local_cluster(workers=2, gpu_per_worker=1) as cluster:
    instance = pylet.submit("nvidia-smi", gpu=1)
    instance.wait()
    print(instance.logs())
```

An async API is available via `import pylet.aio as pylet`.

See examples/README.md for more detailed examples, including vLLM and SGLang.

## Commands

| Command | Description |
| --- | --- |
| `pylet start` | Start head node |
| `pylet start --head <ip:port> --gpu-units N` | Start worker with N GPUs |
| `pylet submit <cmd> --gpu-units N --name <name>` | Submit instance |
| `pylet get-instance --name <name>` | Get instance status |
| `pylet get-endpoint --name <name>` | Get instance endpoint (host:port) |
| `pylet logs <id>` | View instance logs |
| `pylet logs <id> --follow` | Follow logs in real time |
| `pylet cancel <id>` | Cancel instance |
| `pylet list-workers` | List registered workers |

## License

Apache 2.0
