Talk To Me is a Spring Boot application that converts text into an audio file. It uses the ONNX Runtime with a text-to-speech model and provides a very simple REST API.
The project is a proof of concept and in early development. Its main goal is to try out the ONNX Runtime and explore how to implement support for TTS engines. It is not performance-optimized; the focus is on getting a working prototype up and running for experiments.
- Text-to-speech conversion using ONNX Runtime with GPU (CUDA) and CPU support. GPU is only supported on Windows x64 and Linux x64. See the documentation
- Base structure to support multiple TTS Engines. Currently Pocket TTS and Chatterbox are implemented
- Most application settings are configurable. See application.properties for details
- Generated audio files are saved to local storage (default path: storage/)
Supported TTS engines:
Pocket TTS:
- ONNX files: INT8 or FP32 versions of flow_lm_flow and flow_lm_main, plus mimi_decoder.onnx, mimi_encoder.onnx and text_conditioner.onnx
- Tokenizer files: tokenizer.model file for the SentencePiece tokenizer
- WAV file for the voice cloning. This should be a WAV file with a sample rate of 24000 Hz, mono channel, 16-bit depth, and a duration of around 10 seconds. In my tests, the reference_sample.wav (16000 Hz, mono) provided in the Hugging Face repository wasn't ideal, so I recommend using a custom file that meets these specifications.
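To check whether a custom voice sample matches the recommendation above, something like the following could be used. This is a minimal sketch using only the JDK's javax.sound.sampled API. The class VoiceSampleCheck is a hypothetical helper and not part of the project, and the 5 to 15 second window is my own reading of "around 10 seconds".

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project.
final class VoiceSampleCheck {

    /** Returns a list of problems; an empty list means the file matches the recommendation. */
    static List<String> check(File wav) throws Exception {
        List<String> problems = new ArrayList<>();
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = in.getFormat();
            if (format.getSampleRate() != 24000f) {
                problems.add("sample rate is " + format.getSampleRate() + " Hz, expected 24000 Hz");
            }
            if (format.getChannels() != 1) {
                problems.add("found " + format.getChannels() + " channels, expected mono");
            }
            if (format.getSampleSizeInBits() != 16) {
                problems.add("bit depth is " + format.getSampleSizeInBits() + ", expected 16");
            }
            long frames = in.getFrameLength();
            if (frames != AudioSystem.NOT_SPECIFIED) {
                double seconds = frames / (double) format.getFrameRate();
                // ASSUMPTION: "around 10 seconds" is interpreted as 5 to 15 seconds.
                if (seconds < 5 || seconds > 15) {
                    problems.add("duration is " + seconds + " s, expected around 10 s");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java VoiceSampleCheck.java path/to/voice.wav
        List<String> problems = check(new File(args[0]));
        System.out.println(problems.isEmpty() ? "Sample looks fine" : String.join("\n", problems));
    }
}
```

A 16000 Hz file such as the one mentioned above would be reported with a sample-rate problem; resampling it is outside the scope of this sketch.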
Chatterbox:
- ONNX files: speech_encoder.onnx, speech_encoder.onnx_data, embed_tokens.onnx, embed_tokens.onnx_data, language_model.onnx, language_model.onnx_data, conditional_decoder.onnx, conditional_decoder.onnx_data
- Tokenizer files: tokenizer.json (required) and tokenizer_config.json (recommended)
- WAV file for the voice cloning (voice.wav by default)
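Since Chatterbox needs quite a few files, a quick pre-flight check can save some debugging. The sketch below lists which of the files named above are missing from a directory. ChatterboxFiles is a hypothetical helper and not part of the project, and it assumes that all files sit in one directory and that the default voice.wav name is used, which may not match your configuration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project.
final class ChatterboxFiles {

    // File names taken from the list above. voice.wav is only the default name.
    static final List<String> REQUIRED = List.of(
            "speech_encoder.onnx", "speech_encoder.onnx_data",
            "embed_tokens.onnx", "embed_tokens.onnx_data",
            "language_model.onnx", "language_model.onnx_data",
            "conditional_decoder.onnx", "conditional_decoder.onnx_data",
            "tokenizer.json", "voice.wav");

    static final List<String> RECOMMENDED = List.of("tokenizer_config.json");

    /** Returns the names from the given list that are not regular files in modelDir. */
    static List<String> missing(Path modelDir, List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            // ASSUMPTION: all files live directly in a single model directory.
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Usage: java ChatterboxFiles.java path/to/model/dir
        Path dir = Path.of(args[0]);
        System.out.println("Missing required files: " + missing(dir, REQUIRED));
        System.out.println("Missing recommended files: " + missing(dir, RECOMMENDED));
    }
}
```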
- Settings like address/port, basic auth, model path, model filenames or GPU support can be configured in application.properties
- Execute ./gradlew bootRun to start the application. The API will be available at http://127.0.0.1:8080/api/tts
- The included Bruno collection has example API calls. See the Bruno documentation for how to install it.
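If you prefer calling the API from Java instead of Bruno, a request could be built roughly like this. Treat it as a sketch under assumptions: only the URL comes from this README. The POST method, the JSON body with a text field, and the user name and password are hypothetical placeholders, so check the Bruno collection for the actual request shape. TtsRequestSketch is not part of the project.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client-side helper, not part of the project.
final class TtsRequestSketch {

    static HttpRequest build(String text, String user, String password) {
        // Basic auth header, relevant only if basic auth is enabled in application.properties.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // ASSUMPTION: the endpoint accepts POST with a JSON body of the form {"text": "..."}.
        String json = "{\"text\":\"" + escape(text) + "\"}";
        return HttpRequest.newBuilder(URI.create("http://127.0.0.1:8080/api/tts"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Minimal escaping of backslashes and quotes only; use a JSON library for real input.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The request can then be sent with java.net.http.HttpClient, for example `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())`. How the response looks (audio bytes, a file reference, or something else) is not covered here.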
The project is licensed under the Apache License 2.0.