GPT-SoVITS is an excellent open-source text-to-speech (TTS) project that supports multiple languages, including Chinese, English, Japanese, and Korean. Its main features include:
- Zero-Shot TTS: Generate speech quickly from only a 5-second audio sample.
- Few-Shot TTS: Fine-tune the model with only 1 minute of training data to improve voice similarity and naturalness.
- Cross-lingual Support: Synthesize speech in languages different from the training dataset. English, Japanese, Korean, Cantonese, and Chinese are currently supported.
GPT-SoVITS has since been upgraded to v2, which brings the following changes:
- Added support for Korean and Cantonese.
- Optimized text frontend processing.
- Expanded the underlying model training data to 5000 hours.
- Can generate higher-quality synthesized audio from low-quality reference audio (such as network audio with missing high frequencies or muffled sound).
GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e
The video translation software now integrates GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integrated package and use it from the video translation software.
Download the Integrated Package
Download the official GPT-SoVITS integrated package to ensure compatibility. The APIs of third-party integrated packages are incompatible with the official version and may cause errors in the video translation software.
Download link: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4
Start the API Service
In the address bar of the GPT-SoVITS folder, type cmd and press Enter. In the terminal window that opens, enter .\runtime\python api_v2.py to start the API service.
The default port is 9880, so the address to enter in the video translation software is http://127.0.0.1:9880.
The API service must be running whenever GPT-SoVITS is used from the translation software.
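Before switching to the translation software, it can help to confirm that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the function name is_api_listening is hypothetical, and the check only proves that a TCP connection is accepted, not that the process behind it is GPT-SoVITS.

```python
import socket


def is_api_listening(host: str = "127.0.0.1", port: int = 9880,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        # create_connection raises OSError (for example
        # ConnectionRefusedError or a timeout) when nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_api_listening() with no arguments probes the default address 127.0.0.1:9880. A False result usually means api_v2.py has not been started or is using a different port.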
Configure in the Video Translation Dubbing Software
1. Fill in the API Address
Start the software, click Menu -> TTS Settings -> GPT-SoVITS, and enter http://127.0.0.1:9880 in the API text box.
Note: The default port is 9880. If you change the port, change the API address to match. For a local deployment, the address must use 127.0.0.1, not 0.0.0.0.
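The address rules in the note can be expressed as a small normalization step. This is only an illustrative sketch: the helper normalize_api_address is hypothetical and not part of either program, and it handles plain IPv4 addresses and host names only.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_api_address(address: str, default_port: int = 9880) -> str:
    """Return an address that a client on the same machine can connect to."""
    address = address.strip()
    if "://" not in address:
        # Assume plain HTTP when no scheme was typed.
        address = "http://" + address
    parts = urlsplit(address)
    host = parts.hostname or "127.0.0.1"
    if host == "0.0.0.0":
        # 0.0.0.0 is a listen address for servers, not one clients connect to.
        host = "127.0.0.1"
    port = parts.port or default_port
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, f"{host}:{port}", path, "", ""))
```

For example, normalize_api_address("http://0.0.0.0:9880") yields http://127.0.0.1:9880, and an address typed without a port falls back to 9880.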
2. Fill in the Reference Audio
The reference audio is the audio whose voice GPT-SoVITS imitates during speech synthesis. Suppose you have an audio file 1.wav (5 seconds long, containing "今天是个好天气,瓢泼大雨倾盆下"). Copy this file to the GPT-SoVITS folder, next to the api_v2.py file, and enter the audio path, the text spoken in the audio, and the language code, separated by #, in the software's Reference Audio text box, for example 1.wav#今天是个好天气,瓢泼大雨倾盆下#zh.
Language codes: zh represents Chinese, en represents English, ja represents Japanese, and ko represents Korean.
If you keep all reference audio files in the wavs folder inside the GPT-SoVITS directory, the reference audio path should be wavs/1.wav#今天是个好天气,瓢泼大雨倾盆下#zh.
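The path#text#language layout of a reference audio entry is easy to get wrong by hand, so a small validator can be useful. The sketch below is an assumption about how such an entry might be split and checked; the names ReferenceAudio, parse_reference, and format_reference are hypothetical, and the software's own parsing may differ.

```python
from typing import NamedTuple

# Language codes listed in this article.
LANGUAGE_CODES = {"zh": "Chinese", "en": "English", "ja": "Japanese", "ko": "Korean"}


class ReferenceAudio(NamedTuple):
    path: str  # audio path relative to the GPT-SoVITS folder
    text: str  # the words spoken in the reference audio
    lang: str  # language code of that text


def parse_reference(entry: str) -> ReferenceAudio:
    """Split 'path#text#lang' into its three fields and validate them."""
    # Split at the first and the last '#' so the text itself may contain '#'.
    path, sep1, rest = entry.strip().partition("#")
    text, sep2, lang = rest.rpartition("#")
    path, text, lang = path.strip(), text.strip(), lang.strip()
    if not (sep1 and sep2 and path and text and lang):
        raise ValueError(f"expected path#text#lang, got: {entry!r}")
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    return ReferenceAudio(path, text, lang)


def format_reference(path: str, text: str, lang: str) -> str:
    """Build the entry to paste into the Reference Audio text box."""
    return f"{path}#{text}#{lang}"
```

Parsing wavs/1.wav#今天是个好天气,瓢泼大雨倾盆下#zh gives the path wavs/1.wav, the spoken text, and the code zh. An entry with a missing field or an unlisted language code raises ValueError.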
3. Check api_v2?
If you started the service with the api_v2.py file, make sure to select the api_v2? option.
4. Test Connection
Click Test. If there is no error, the configuration is successful.
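A similar check can be made by hand, outside the software, by sending one short synthesis request to the running service. The sketch below rests on assumptions that should be verified against the usage notes at the top of your copy of api_v2.py: that the service exposes a /tts endpoint, and that it accepts a JSON body with the fields text, text_lang, ref_audio_path, prompt_text, and prompt_lang. The function names are hypothetical, and this is not necessarily what the Test button does.

```python
import json
import urllib.request


def build_tts_request(api_address: str, text: str, text_lang: str,
                      ref_audio_path: str, prompt_text: str,
                      prompt_lang: str) -> urllib.request.Request:
    """Prepare a POST request for the (assumed) /tts endpoint of api_v2.py."""
    payload = {
        "text": text,                      # sentence to synthesize
        "text_lang": text_lang,            # language code of that sentence
        "ref_audio_path": ref_audio_path,  # path relative to the GPT-SoVITS folder
        "prompt_text": prompt_text,        # words spoken in the reference audio
        "prompt_lang": prompt_lang,        # language code of the reference audio
    }
    return urllib.request.Request(
        api_address.rstrip("/") + "/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request,
                       out_path: str = "test.wav",
                       timeout: float = 60.0) -> int:
    """Send the request and save the returned audio; return its size in bytes.

    urlopen raises urllib.error.HTTPError for HTTP errors such as 404 and
    urllib.error.URLError when the connection is refused.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the service running, passing the result of build_tts_request("http://127.0.0.1:9880", "你好", "zh", "wavs/1.wav", "今天是个好天气,瓢泼大雨倾盆下", "zh") to synthesize_to_file should write a playable test.wav if the assumptions above hold.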
Frequently Asked Questions
404 error during testing
This happens when a third-party integrated package is used, because its API is incompatible with the official version. Please download and use the official package.
"The remote computer actively refused" or "Please check if the api service is started"
The API service may not be started, or it may be blocked by the firewall. Please ensure that the API is started, or close the