Puma aims to be a lightweight, high-performance inference engine for heterogeneous devices. It is currently under active development.
Run `make build` to build the `puma` binary.
Run `./puma help` to see all available commands.
For example, `./puma version` prints the version of the binary.
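Put together, a typical first session from the repository root might look like this (a sketch; it assumes `make` and a working build toolchain are available on your system):

```shell
# Build the puma binary (run from the repository root).
make build

# List all available subcommands.
./puma help

# Check which version you just built.
./puma version
```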
Puma uses llama.cpp as its default backend for quick prototyping; we plan to implement our own backend in the future.