# HiveNode

HiveNode is the worker component of the Hive system. It connects to a central HiveCore proxy and runs local inference using Ollama. By running HiveNode on any machine (on-premise, cloud, or behind firewalls), you can join that machine’s compute resources to the HiveCore network and serve requests routed by the central proxy.
## Table of Contents

- Overview
- Key Features
- Installation & Setup
- Configuration
- Running
- How it Works
- Logging & Monitoring
- Contributing
- License
## Overview

In the Hive architecture:

- HiveCore serves as the central proxy and gateway, managing queues and distributing inference requests.
- HiveNode runs on worker machines. It connects out to HiveCore (so the worker does not need to be publicly accessible). Once connected, HiveNode polls inference jobs from HiveCore and uses its local Ollama server to process them.

This design allows multiple machines, possibly spread across different networks, to operate as a single, unified inference cluster.
## Key Features

- **Docker-First Ollama Runtime**: HiveNode primarily manages an `ollama/ollama` Docker container itself, including startup and in-place upgrades.
- **Bring Your Own Ollama**: If you already run Ollama yourself, HiveNode can target an external Ollama URL instead of managing Docker.
- **Multiple Concurrent Connections**: Each HiveNode can open several parallel connections to HiveCore, letting you scale inference throughput per worker.
- **Centralized Configuration & Scaling**: Workers require minimal configuration: just point them at HiveCore and set a valid key.
- **Extensible Logging**: Built-in InfluxDB logging for GPU usage (via NVML) and system metrics, enabled when the relevant environment variables are set.
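The multiple-connections feature can be sketched as a pool of worker threads, one per connection. This is an illustrative toy, not HiveNode's actual code; `connect_and_serve` is a hypothetical stand-in for the real authenticate-and-poll logic.

```rust
use std::thread;

// Hypothetical stand-in for a worker connection: the real node would
// authenticate with HIVE_KEY and start polling jobs; here we only
// report which worker bound to which HiveCore address.
fn connect_and_serve(worker_id: usize, hive_core_url: &str) -> String {
    format!("worker {worker_id} -> {hive_core_url}")
}

// Open CONCURRENT_REQUESTS parallel connections, one thread each.
fn spawn_workers(concurrent_requests: usize, hive_core_url: &str) -> Vec<String> {
    let handles: Vec<_> = (0..concurrent_requests)
        .map(|id| {
            let url = hive_core_url.to_string();
            thread::spawn(move || connect_and_serve(id, &url))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for line in spawn_workers(4, "hivecore.example.com:7777") {
        println!("{line}");
    }
}
```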
## Installation & Setup

1. **Prerequisites**

2. **Clone the repository**

   ```bash
   git clone https://github.com/VakeDomen/HiveNode.git
   cd HiveNode
   ```

3. **Configure**

   - rename `.env.sample` to `.env`:

     ```bash
     mv .env.sample .env
     ```

   - configure the `.env` environment as defined in **Configuration** below

4. **Run** (see **Running** below)

   ```bash
   cargo run --release
   ```
## Configuration

A sample `.env` for the default Docker-managed mode might look like:

```env
# The address and port of HiveCore’s node connection server (default 7777 in HiveCore).
HIVE_CORE_URL=hivecore.example.com:7777

# Worker key provided by HiveCore admin. Must have "Worker" role.
HIVE_KEY=my-secret-key

# docker (default) or external
OLLAMA_MODE=docker

# Docker-managed Ollama settings
OLLAMA_PORT=11434
HIVE_OLLAMA_MODELS=/usr/share/ollama/.ollama/
GPU_PASSTHROUGH=-1

# Number of parallel connections to open to HiveCore. Best to match Ollama configuration.
CONCURRENT_REQUESTS=4

# (Optional) InfluxDB settings for logging
INFLUX_HOST=http://localhost:8086
INFLUX_ORG=MY_ORG
INFLUX_TOKEN=my-token
```

A sample `.env` for bring-your-own Ollama mode:

```env
HIVE_CORE_URL=hivecore.example.com:7777
HIVE_KEY=my-secret-key
OLLAMA_MODE=external
OLLAMA_URL=http://localhost:11434
CONCURRENT_REQUESTS=4
```

- `HIVE_CORE_URL`: Where HiveNode connects to HiveCore (must match HiveCore’s `NODE_CONNECTION_PORT`, by default `7777`).
- `HIVE_KEY`: The Worker key from HiveCore’s admin interface. Required for authentication.
- `OLLAMA_MODE`: `docker` by default. Set `external` to use an existing Ollama instance instead of Docker-managed Ollama.
- `OLLAMA_PORT`: Host port for the Docker-managed Ollama container. Required in `docker` mode.
- `HIVE_OLLAMA_MODELS`: Host directory mounted into the Docker-managed Ollama container for model storage. Required in `docker` mode.
- `GPU_PASSTHROUGH`: Optional GPU selection for Docker mode. Use `-1` for all GPUs, a comma-separated list such as `0,1` for specific GPUs, or leave unset for CPU mode.
- `OLLAMA_URL`: Required only in `external` mode. The local or remote address of the Ollama service.
- `CONCURRENT_REQUESTS`: Sets how many parallel connections (and thus concurrent tasks) this HiveNode should proxy. Adjust based on your hardware resources and Ollama configuration.
- `INFLUX_*`: (Optional) If configured, HiveNode will record logs and GPU usage metrics to InfluxDB. If not provided, it simply won’t log to Influx.
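The three `GPU_PASSTHROUGH` cases map naturally onto Docker's `--gpus` flag. The sketch below is illustrative only, assuming a standard `docker run --gpus` invocation; it is not HiveNode's actual implementation.

```rust
// Hypothetical mapping from GPU_PASSTHROUGH to `docker run` arguments.
fn gpu_args(gpu_passthrough: Option<&str>) -> Vec<String> {
    match gpu_passthrough {
        // -1 requests every GPU on the host
        Some("-1") => vec!["--gpus".into(), "all".into()],
        // a comma-separated list such as "0,1" selects specific devices
        Some(list) => vec!["--gpus".into(), format!("\"device={list}\"")],
        // unset means CPU mode: no GPU flags at all
        None => vec![],
    }
}

fn main() {
    println!("{:?}", gpu_args(Some("0,1")));
}
```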
Docker-managed mode is the primary path. In this mode HiveNode will pull or reuse `ollama/ollama`, bind it to `OLLAMA_PORT`, mount `HIVE_OLLAMA_MODELS`, and internally set `OLLAMA_URL` to that local container.

If you prefer to manage Ollama yourself, set `OLLAMA_MODE=external` and provide `OLLAMA_URL`.
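The two modes boil down to how the effective Ollama URL is resolved. A minimal sketch of that decision, assuming the variable names from the section above (the function itself is illustrative, not HiveNode's real code):

```rust
// Resolve the effective Ollama URL from OLLAMA_MODE and friends.
fn resolve_ollama_url(
    mode: &str,
    ollama_port: u16,
    external_url: Option<&str>,
) -> Result<String, String> {
    match mode {
        // docker mode: HiveNode manages the container and targets it locally
        "docker" => Ok(format!("http://localhost:{ollama_port}")),
        // external mode: the user-provided OLLAMA_URL is required
        "external" => external_url
            .map(|s| s.to_string())
            .ok_or_else(|| "OLLAMA_MODE=external requires OLLAMA_URL".to_string()),
        other => Err(format!("unknown OLLAMA_MODE: {other}")),
    }
}

fn main() {
    println!("{:?}", resolve_ollama_url("docker", 11434, None));
}
```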
If you want to prepare an Ollama host manually, or experiment with multiple instances and GPU layouts, you can use the provided `setup_ollama.sh` helper script. The script:

- Checks whether `netstat` and `curl` are installed (and installs them if not).
- Installs Ollama if it is not already present.
- Lets you pick how many Ollama instances to run and how to assign GPUs to each instance.
- Finds a free port for each Ollama instance, runs it, and logs its output.

```bash
chmod +x setup_ollama.sh
./setup_ollama.sh
```

## Running

After configuring the `.env` file, run:
```bash
cargo run --release
```

Or compile and execute the binary directly:

```bash
cargo build --release
./target/release/hive_node
```

## How it Works

1. **Authentication**
   - On startup, HiveNode initializes Ollama according to `OLLAMA_MODE`.
   - Each of the `CONCURRENT_REQUESTS` worker threads tries to authenticate to HiveCore using the key in `HIVE_KEY`.
   - Upon successful auth, HiveNode advertises its version and its supported models to HiveCore.
2. **Polling & Proxying**
   - HiveNode periodically polls HiveCore for incoming tasks. If HiveCore’s queue has work for a given model, it dispatches it to the node.
   - HiveNode forwards the request to `OLLAMA_URL` for local inference (via the Ollama HTTP API), then streams the response back to HiveCore.
3. **Reconnection & Control**
   - If the connection drops or an error occurs, HiveNode waits briefly, then reconnects.
   - HiveCore can issue commands such as `REBOOT` or `SHUTDOWN`, which HiveNode listens for in the incoming messages. `UPDATE` is supported in Docker-managed mode and causes HiveNode to refresh the Docker image and reconnect.
4. **Scaling**
   - To add more capacity on the same machine, increase the `CONCURRENT_REQUESTS` count.
   - To add more workers across multiple machines, simply run additional HiveNode instances (each with its own `.env` and valid Worker key).
   - Hive supports multiple workers on the same machine, but each worker should have its own token.
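The section above only says HiveNode "waits briefly" before reconnecting; a common policy for this kind of reconnect loop is capped exponential backoff, sketched here as an illustration rather than HiveNode's actual behavior.

```rust
// Illustrative capped exponential backoff for reconnect attempts;
// the actual delay policy HiveNode uses is not specified in this README.
fn reconnect_delay_secs(attempt: u32) -> u64 {
    // 1, 2, 4, 8, 16, then capped at 32 seconds
    1u64 << attempt.min(5)
}

fn main() {
    for attempt in 0..7 {
        println!("attempt {attempt}: wait {}s", reconnect_delay_secs(attempt));
    }
}
```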
## Logging & Monitoring

HiveNode can log system metrics like GPU usage, memory, and proxied requests to InfluxDB:

- **Enable Influx Logging**: Provide `INFLUX_HOST`, `INFLUX_ORG`, and `INFLUX_TOKEN` in the `.env`.
- **GPU Metrics**: HiveNode uses NVML to gather GPU info. This is only collected if an NVIDIA GPU is present and NVML is available on the system.
- **Request Streaming**: All inference requests and responses can be logged with success/error tags.

These metrics are pushed to Influx in the background. If any of the Influx environment variables are missing or invalid, HiveNode simply skips that monitoring.
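For intuition, a GPU sample written to InfluxDB takes the line-protocol shape `measurement,tags fields`. The measurement, tag, and field names below are hypothetical examples, not the ones HiveNode actually emits.

```rust
// Hypothetical InfluxDB line-protocol formatting for one GPU sample.
// Line protocol shape: measurement,tag=v,tag=v field=v,field=v
fn gpu_metric_line(node: &str, gpu_index: u32, util_pct: f64, mem_used_mb: u64) -> String {
    // the trailing "i" marks an integer field in line protocol
    format!("gpu_usage,node={node},gpu={gpu_index} utilization={util_pct},memory_used={mem_used_mb}i")
}

fn main() {
    println!("{}", gpu_metric_line("worker-1", 0, 87.5, 10240));
}
```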
## Contributing

We welcome pull requests! Before submitting, please open an issue to discuss your proposed changes. Make sure to:

- Keep the code style consistent.
- Update documentation when adding or changing features.
## License

HiveNode is distributed under the MIT License, just like HiveCore. See the LICENSE file in this repository for details.
