Talk To Me

Talk To Me is a Spring Boot application that converts text into an audio file. It uses the ONNX Runtime with a text-to-speech model and provides a simple REST API.

The project is a proof of concept in early development. Its main goal is to try out the ONNX Runtime and to explore how to implement support for TTS engines. It is not performance-optimized; the focus is on getting a working prototype up and running for experiments.

Features

Text-to-speech conversion using ONNX Runtime with GPU (CUDA) and CPU support. GPU support is limited to Windows x64 and Linux x64. See the documentation.
Base structure to support multiple TTS engines. Pocket TTS and Chatterbox are currently implemented.
Most application settings are configurable. See application.properties for details.
Generated audio files are saved to local storage (default path: storage/).
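Because CUDA is only available on Windows x64 and Linux x64, a caller has to fall back to the CPU on every other platform. The following is a minimal sketch of that decision, assuming the platform is detected from the JVM's os.name and os.arch properties. GpuSupport and its methods are hypothetical names and are not part of Talk To Me. The "amd64" and "x86_64" strings are the values JVMs commonly report for x64.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Talk To Me): decides whether the CUDA
 * execution provider should be attempted, based on the rule that GPU
 * support exists only for Windows x64 and Linux x64.
 */
final class GpuSupport {

    private GpuSupport() {}

    /** Returns true only for Windows x64 and Linux x64. */
    static boolean cudaPlatformSupported(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // JVMs usually report x64 as "amd64"; some report "x86_64".
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        boolean windowsOrLinux = os.startsWith("windows") || os.startsWith("linux");
        return x64 && windowsOrLinux;
    }

    /** Same check, applied to the JVM that is currently running. */
    static boolean cudaPlatformSupported() {
        return cudaPlatformSupported(System.getProperty("os.name"), System.getProperty("os.arch"));
    }

    /** Returns "CUDA" only when a GPU is requested and the platform allows it, otherwise "CPU". */
    static String chooseProvider(boolean gpuRequested, String osName, String osArch) {
        return gpuRequested && cudaPlatformSupported(osName, osArch) ? "CUDA" : "CPU";
    }
}
```

Checking the platform up front produces a clear CPU fallback instead of a failure when a CUDA session is created on an unsupported system. The actual CUDA libraries and drivers still have to be present.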

Getting started

Supported TTS engines:

Pocket TTS with these ONNX models
Chatterbox with onnx-community/chatterbox-onnx

Required files

Pocket TTS:

ONNX files: INT8 or FP32 versions of flow_lm_flow and flow_lm_main, plus mimi_decoder.onnx, mimi_encoder.onnx and text_conditioner.onnx
Tokenizer files: tokenizer.model file for the SentencePiece tokenizer
WAV file for voice cloning: 24000 Hz sample rate, mono, 16-bit, about 10 seconds long. In my tests, the reference_sample.wav (16000 Hz, mono) provided in the HuggingFace repository did not give ideal results, so I recommend using a custom file that meets these specifications.
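The Pocket TTS reference-WAV requirements can be checked before the file is used. The following is a minimal sketch based on the standard javax.sound.sampled API, assuming the file is uncompressed PCM WAV and that "around 10 seconds" is treated as a configurable range. VoiceSampleCheck is a hypothetical name and is not part of Talk To Me.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/**
 * Hypothetical pre-flight check (not part of Talk To Me) for a Pocket TTS
 * voice-cloning reference: 24000 Hz, mono, 16-bit signed PCM, about 10 s.
 */
final class VoiceSampleCheck {

    private VoiceSampleCheck() {}

    /** Returns human-readable problems; an empty list means the file matches. */
    static List<String> check(File wav, double minSeconds, double maxSeconds) {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            List<String> problems = new ArrayList<>();
            if (f.getSampleRate() != 24000f) {
                problems.add("sample rate is " + f.getSampleRate() + " Hz, expected 24000 Hz");
            }
            if (f.getChannels() != 1) {
                problems.add("has " + f.getChannels() + " channels, expected mono");
            }
            if (f.getSampleSizeInBits() != 16 || !AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
                problems.add("expected 16-bit signed PCM, got " + f.getSampleSizeInBits() + "-bit " + f.getEncoding());
            }
            // Duration = number of frames / frames per second (skipped if the length is unknown).
            long frames = in.getFrameLength();
            if (frames != AudioSystem.NOT_SPECIFIED) {
                double seconds = frames / f.getFrameRate();
                if (seconds < minSeconds || seconds > maxSeconds) {
                    problems.add(String.format("duration is %.1f s, expected %.0f to %.0f s", seconds, minSeconds, maxSeconds));
                }
            }
            return problems;
        } catch (UnsupportedAudioFileException e) {
            return List.of("not a readable audio file: " + e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, VoiceSampleCheck.check(new File("voice.wav"), 8, 12) would flag a 16000 Hz file such as the HuggingFace reference_sample.wav with a sample-rate problem. The 8 to 12 second window is an illustrative choice, not a limit defined by the project.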

Chatterbox:

ONNX files: speech_encoder.onnx, speech_encoder.onnx_data, embed_tokens.onnx, embed_tokens.onnx_data, language_model.onnx, language_model.onnx_data, conditional_decoder.onnx, conditional_decoder.onnx_data
Tokenizer files: tokenizer.json (required) and tokenizer_config.json (recommended)
WAV file for voice cloning (voice.wav by default)
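A missing Chatterbox file usually surfaces only when a model is loaded. The following is a minimal sketch of a start-up check that lists absent files, assuming all of them sit in a single model directory. ChatterboxFiles is a hypothetical name and is not part of Talk To Me. The voice WAV is left out because its location is configured separately. Each .onnx_data file holds the external tensor data of the .onnx graph with the same name. ONNX Runtime generally resolves that data relative to the model file, so each pair should stay together.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical start-up check (not part of Talk To Me): reports which of the
 * Chatterbox files listed in the README are missing from a model directory.
 */
final class ChatterboxFiles {

    private ChatterboxFiles() {}

    /** Required files: each graph plus its external-data file, and the tokenizer. */
    static final List<String> REQUIRED = List.of(
            "speech_encoder.onnx", "speech_encoder.onnx_data",
            "embed_tokens.onnx", "embed_tokens.onnx_data",
            "language_model.onnx", "language_model.onnx_data",
            "conditional_decoder.onnx", "conditional_decoder.onnx_data",
            "tokenizer.json");

    /** Recommended, but not strictly required. */
    static final List<String> RECOMMENDED = List.of("tokenizer_config.json");

    /** Returns the names from {@code names} that are not regular files in {@code modelDir}. */
    static List<String> missing(Path modelDir, List<String> names) {
        return names.stream()
                .filter(name -> !Files.isRegularFile(modelDir.resolve(name)))
                .collect(Collectors.toList());
    }
}
```

A caller could fail fast when missing(dir, REQUIRED) is non-empty and only log a warning for missing(dir, RECOMMENDED).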

Configuration and running

Settings such as address/port, basic auth, model path, model filenames and GPU support can be configured in application.properties.
Run ./gradlew bootRun to start the application. The API will be available at http://127.0.0.1:8080/api/tts.
The included Bruno collection contains example API calls. See the Bruno documentation for installation instructions.
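The API can also be called from Java without Bruno. The following is a minimal sketch using the JDK's java.net.http client, targeting the documented endpoint http://127.0.0.1:8080/api/tts with basic auth enabled. The request shape is an assumption: the POST method, the JSON body {"text": "..."} and the TtsRequests helper are hypothetical. Check the Bruno collection for the real request format.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical client sketch (not part of Talk To Me). The POST method and
 * the {"text": ...} JSON body are assumptions; see the Bruno collection.
 */
final class TtsRequests {

    private TtsRequests() {}

    static final URI ENDPOINT = URI.create("http://127.0.0.1:8080/api/tts");

    /** HTTP Basic credentials: "Basic " + base64("user:password"). */
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the assumed JSON body, escaping quotes, backslashes and control characters. */
    static String jsonBody(String text) {
        StringBuilder sb = new StringBuilder("{\"text\":\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append("\"}").toString();
    }

    /** Builds, but does not send, an authenticated POST request. */
    static HttpRequest build(String text, String user, String password) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth(user, password))
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(text)))
                .build();
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Whether the response contains audio bytes or a reference to the file saved under storage/ is not specified here, so choose the body handler accordingly.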

License

The project is licensed under the Apache License 2.0.
