A Jarvis style AI that lives in your computer, ready to serve your requests.
Its an Ai that lives in your computer, and can do commands like make new projects, open apps and more to come!
Anything related to downloading files from the web, or running terminal commands are strictly limited. It can at most access the web and make repos/vs code projects. It is also blocked from the network (unless you disable it and have a good internet provider).
graph TD
A(Mic/STT) --> B[LLM/Agent]
B --> C1(Piper/TTS)
B --> C2(Actions/Keyboard input)
Input: Could be Mic or a wakeup call "Wake up daddys home" for example.
Ouptut: a pipeline of actions, creating media, even asking clarification questions as well.
So far we are using:
Agent: Qwen3.5 2b. This is big enough to have great agentic capability, whilst being able to fit on a laptop
STT: We are using faster-whisper because of its compatibility across platforms/code, there are probably better options though
TTS: Piper TTS is what we are using. Despite its quality, it can get a near 200x real time audio speed with compatibility across devices of all sorts. We are looking to finetune piper to "sound" more like jarvis using huggingface datasets, but it has not been done yet.
Enviroment: So far, the idea is that it can input preset keyboard commands, though we are looking to use the Agents multimodel capabilities to control its actions (with safeguards).
-
Install the requirements:
pip install -r requirements.txt -
Install torch with the right CUDA version for your system (see pytorch.org), e.g. for CUDA 13.0:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
other gpus may work better with different versions, so be sure to check the compatibility. If you don't have a compatible GPU, you can still run the model on CPU, but it will be much slower.
CPU only (if you dont have a gpu): pip install torch torchvision torchaudio
-
Download the Piper TTS binary and put it somewhere on your PATH (or next to
Piper_tts.py) -
Install Ollama and pull the model:
ollama pull qwen3.5:2b
qwen3.5:27bis an amazing option if you have 16 GB+ VRAM (will be a tight fit though) -
Run the chatbot:
- Basic mode (local transformers, no tools):
python mainchat.py - Agent mode (Ollama + tool/MCP support):
USE_QWEN_AGENT=1 python mainchat.py
- Basic mode (local transformers, no tools):
To fine-tune Piper on a custom voice, install the extra deps first:
sudo apt install espeak-ng
pip install piper-phonemize --extra-index-url https://rhasspy.github.io/piper-phonemize/
pip install "piper-train @ git+https://github.com/rhasspy/piper.git#subdirectory=src/python"Then run python training/piperCustomVoice.py
You can choose any dataset youd like, but be sure to not steal other peoples data (very bad). we use open sourced datasets on huggingface.