# Chatterbox TTS API Service

A high-performance Text-to-Speech (TTS) service built on Chatterbox-TTS. It provides an OpenAI-compatible TTS API, an enhanced endpoint that supports voice cloning, and a clean web user interface.

This project aims to give developers and content creators a privately deployable, powerful, and easy-to-integrate TTS solution.

Project repository: https://github.com/jianchang512/chatterbox-api

## Usage in pyVideoTrans
This project can serve as a powerful TTS backend to provide high-quality English dubbing for pyVideoTrans.
1. Start this project: Ensure the Chatterbox TTS API service is running locally at `http://127.0.0.1:5093`.
2. Update pyVideoTrans: Make sure your pyVideoTrans version is `v3.73` or higher.
3. Configure pyVideoTrans:
   - In the pyVideoTrans menu, go to `TTS Settings` -> `Chatterbox TTS`.
   - API Address: Enter the address of this service; the default is `http://127.0.0.1:5093`.
   - Reference Audio (optional): To use voice cloning, enter the filename of the reference audio here (e.g., `my_voice.wav`). Make sure this audio file is placed in the `chatterbox` folder inside the pyVideoTrans root directory.
   - Adjust Parameters: Tune `cfg_weight` and `exaggeration` as needed for the best results.
Parameter tuning suggestions:
- General scenarios (TTS, voice assistant): The defaults (`cfg_weight=0.5`, `exaggeration=0.5`) work well for most cases.
- Fast-paced reference audio: If the reference audio has a fast speaking rate, try lowering `cfg_weight` to around `0.3` to improve the rhythm of the generated speech.
- Expressive/dramatic speech: Try a lower `cfg_weight` (e.g., `0.3`) and a higher `exaggeration` (e.g., `0.7` or above). Increasing `exaggeration` often speeds up the speech, while lowering `cfg_weight` helps balance it toward a more deliberate, clearer pace.
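These suggestions can be summarized in a small helper. This is only an illustrative sketch: the scenario names below are hypothetical labels for the cases above, not part of the service's API, which only ever sees the numeric values.

```python
# Hypothetical presets summarizing the tuning suggestions above.
TUNING_PRESETS = {
    "general":        {"cfg_weight": 0.5, "exaggeration": 0.5},
    "fast_reference": {"cfg_weight": 0.3, "exaggeration": 0.5},
    "dramatic":       {"cfg_weight": 0.3, "exaggeration": 0.7},
}

def suggested_params(scenario: str) -> dict:
    """Return suggested cfg_weight/exaggeration for a scenario,
    falling back to the general defaults."""
    return TUNING_PRESETS.get(scenario, TUNING_PRESETS["general"])

print(suggested_params("dramatic"))
```

The returned values can be passed directly as the `cfg_weight` and `exaggeration` fields in the API calls described later.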
## Quick Start

### Method 1: For Windows Users
We provide a portable package `win.7z` for Windows users that bundles all dependencies, greatly simplifying installation.

Download and extract:

Baidu Netdisk download link (built-in model, ~4 GB; CPU version, GPU upgrade steps below): https://pan.baidu.com/s/1zXzRAQ0P7X8LJp4OrCvw7w?pwd=1234

Start the service:

Double-click the `启动服务.bat` ("Start Service") script in the root directory. When you see output similar to the following in the command window, the service has started successfully:

```
✅ Model loaded successfully. Service started successfully, HTTP address: http://127.0.0.1:5093
```
### Method 2: For macOS, Linux, and Manual Installation

For macOS or Linux users, or Windows users who prefer a manual setup, follow these steps.
#### 1. Prerequisites

- Python: Ensure Python 3.9 or higher is installed.
- ffmpeg: A required audio/video processing tool.
  - macOS (using Homebrew): `brew install ffmpeg`
  - Debian/Ubuntu: `sudo apt-get update && sudo apt-get install ffmpeg`
  - Windows (manual): Download ffmpeg and add it to your system's `PATH` environment variable.
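Before installing, the prerequisites can be sanity-checked with a short standard-library sketch (the version floor matches the requirement above; `check_prerequisites` is a name introduced here for illustration):

```python
import shutil
import sys

def check_prerequisites(min_python=(3, 9)) -> dict:
    """Report whether the basics described above are present."""
    return {
        # True if the running interpreter meets the minimum version
        "python_ok": sys.version_info >= min_python,
        # True if an ffmpeg executable is on PATH
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
    }

print(check_prerequisites())
```

If either value is `False`, install the missing prerequisite before continuing.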
#### 2. Installation Steps

```shell
# 1. Clone the repository
git clone https://github.com/jianchang512/chatterbox-api.git
cd chatterbox-api

# 2. Create and activate a Python virtual environment (recommended)
python3 -m venv venv
# On Windows:
# venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the service
python app.py
```

Once the service starts successfully, you will see the service address http://127.0.0.1:5093 in the terminal.
## ⚡ Upgrade to GPU Version (Optional)
If your computer has an NVIDIA GPU with CUDA support and you have correctly installed the NVIDIA driver and CUDA Toolkit, you can upgrade to the GPU version for significant performance gains.
### Windows Users (One-Click Upgrade)

- First, make sure you have run `启动服务.bat` successfully at least once to complete the basic environment setup.
- Double-click the `安装N卡GPU支持.bat` ("Install NVIDIA GPU Support") script.
- The script automatically uninstalls the CPU build of PyTorch and installs a GPU build compatible with CUDA 12.6.
### Linux Manual Upgrade

After activating the virtual environment, run:

```shell
# 1. Uninstall the existing CPU build of PyTorch
pip uninstall -y torch torchaudio

# 2. Install PyTorch matching your CUDA version
# The following command targets CUDA 12.6; get the command for your
# CUDA version from the PyTorch website.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
```

You can visit the PyTorch website to get the installation command suitable for your system.

After upgrading, restart the service. You should see `Using device: cuda` in the startup logs.
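To confirm the GPU build is actually picked up, a quick check mirroring the service's `Using device: ...` log can be run in the same virtual environment (this sketch degrades gracefully if PyTorch is not importable):

```python
# Report which device PyTorch would use. Falls back if PyTorch is absent.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu (PyTorch not installed)"
print("Using device:", device)
```

If this prints `cpu` despite an NVIDIA GPU being present, re-check your driver and CUDA Toolkit installation.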
## 📖 User Guide

### 1. Web Interface

After starting the service, open http://127.0.0.1:5093 in your browser to access the web UI.
- Input text: Enter the text you want to convert in the text box.
- Adjust parameters:
  - `cfg_weight` (range 0.0 - 1.0): Controls speech rhythm. Lower values produce slower, more deliberate speech. For fast-paced reference audio, lower this value (e.g., 0.3).
  - `exaggeration` (range 0.25 - 2.0): Controls the emotional and intonational exaggeration of the speech. Higher values produce more expressive speech and may increase its speed.
- Voice cloning: Click "Choose File" to upload a reference audio file (e.g., `.mp3`, `.wav`). If a reference audio is provided, the service uses the cloning endpoint.
- Generate speech: Click the "Generate Speech" button, wait a moment, then preview and download the generated MP3 file in the browser.
### 2. API Calls

#### Interface 1: OpenAI-Compatible Interface (`/v1/audio/speech`)

This interface does not require reference audio and can be called directly with the OpenAI SDK.

Python example (`openai` SDK):
```python
from openai import OpenAI

# Point the client at the local service
client = OpenAI(
    base_url="http://127.0.0.1:5093/v1",
    api_key="not-needed",  # no API key is required, but the SDK expects one
)

response = client.audio.speech.create(
    model="chatterbox-tts",  # this parameter is ignored
    voice="en",
    speed=0.5,               # corresponds to the cfg_weight parameter
    input="Hello, this is a test from the OpenAI compatible API.",
    instructions="0.5",      # (optional) corresponds to exaggeration; must be a string
    response_format="mp3",   # 'mp3' or 'wav'
)

# Save the audio stream to a file
response.stream_to_file("output_api1.mp3")
print("Audio saved to output_api1.mp3")
```

#### Interface 2: Voice Cloning Interface (`/v2/audio/speech_with_prompt`)
This interface requires uploading both the text and a reference audio file as `multipart/form-data`.

Python example (`requests` library):
```python
import requests

API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
REFERENCE_AUDIO = "path/to/your/reference.mp3"  # replace with your reference audio path

form_data = {
    'input': 'This voice should sound like the reference audio.',
    'cfg_weight': '0.5',
    'exaggeration': '0.5',
    'response_format': 'mp3',  # 'mp3' or 'wav'
}

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    files = {'audio_prompt': audio_file}
    response = requests.post(API_URL, data=form_data, files=files)

if response.ok:
    with open("output_api2.mp3", "wb") as f:
        f.write(response.content)
    print("Cloned audio saved to output_api2.mp3")
else:
    print("Request failed:", response.text)
```
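For quick testing without Python, the same multipart request can be sketched with curl. The field names mirror the `requests` example above, the reference path is a placeholder, and the service must already be running locally:

```shell
curl -X POST "http://127.0.0.1:5093/v2/audio/speech_with_prompt" \
  -F "input=This voice should sound like the reference audio." \
  -F "cfg_weight=0.5" \
  -F "exaggeration=0.5" \
  -F "response_format=mp3" \
  -F "audio_prompt=@path/to/your/reference.mp3" \
  -o output_api2.mp3
```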