GPT-SoVITS is an excellent open-source multilingual text-to-speech (TTS) project, supporting languages such as Chinese, English, Japanese, Korean, and more. Its main features include:
- Zero-shot Text-to-Speech (TTS): Quickly generate speech using just a 5-second voice sample.
- Few-shot TTS: Fine-tune the model with just 1 minute of training data to improve voice similarity and naturalness.
- Cross-language Support: Supports synthesis in languages different from the training dataset, currently including English, Japanese, Korean, Cantonese, and Chinese.
GPT-SoVITS has now been upgraded to v2, with the following new features:
- Added support for Korean and Cantonese.
- Optimized text front-end processing.
- Expanded the underlying model training data to 5000 hours.
- Generates higher-quality synthetic audio from low-quality reference audio (e.g., internet audio with missing high frequencies or muffled sound).
GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e
The video translation software has integrated GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integration package and use it within the video translation software.
Download the Integration Package
Download the official GPT-SoVITS integration package to ensure compatibility. Third-party packages expose APIs that are not compatible with the official version and may cause errors in the video translation software.
Download link: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4

Start the API Service
Open the GPT-SoVITS folder, type cmd in the File Explorer address bar, and press Enter. In the terminal window that opens, run .\runtime\python api_v2.py to start the API service.

The default port is 9880. In the video translation software, you need to enter http://127.0.0.1:9880.
The API service must be running whenever GPT-SoVITS is used from the translation software.
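Before opening the translation software, it can help to confirm that something is actually listening on the API port. The following is a minimal sketch using only the Python standard library; the function name is_port_open is hypothetical, and the check only shows that a TCP connection succeeds, not that the listener is really GPT-SoVITS.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9880, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("A service is listening on 127.0.0.1:9880")
    else:
        print("Nothing is listening on 127.0.0.1:9880; start api_v2.py first")
```

If the service was started on a different port, pass that port instead of the default 9880.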
Configure in the Video Translation Dubbing Software
1. Enter the API Address
Start the software, click Menu -> TTS Settings -> GPT-SoVITS, and enter http://127.0.0.1:9880 in the API Text Box.

Note: The default port is 9880. If you change the port, change the API address to match. For a local deployment, the address must use 127.0.0.1, not 0.0.0.0.
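The address rule in the note above can be sketched as a small helper that rewrites a 0.0.0.0 host to 127.0.0.1 while keeping the scheme and port. This is an illustration only: the function name local_api_address is hypothetical, and it assumes the address includes the http:// scheme, as in the examples in this article.

```python
from urllib.parse import urlsplit, urlunsplit


def local_api_address(address: str) -> str:
    """Return the address with a 0.0.0.0 host replaced by 127.0.0.1."""
    parts = urlsplit(address)
    if parts.hostname == "0.0.0.0":
        # Keep any custom port; only the host part changes.
        port = f":{parts.port}" if parts.port else ""
        parts = parts._replace(netloc=f"127.0.0.1{port}")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(local_api_address("http://0.0.0.0:9880"))
```

Addresses that already use 127.0.0.1 are returned unchanged.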
2. Enter the Reference Audio
Note: The reference audio must be in WAV format and have a duration of 5-10 seconds; otherwise, a 400 Client error will occur.
The reference audio is the audio whose voice timbre GPT-SoVITS will use for speech synthesis. Suppose you have an audio file 1.wav (5 seconds long, with the content "Today is a good weather, pouring rain is falling heavily"). Copy this file to the GPT-SoVITS folder, next to the api_v2.py file, and enter the corresponding entry in the software's Reference Audio Text Box in the form audio path#spoken text#language code, for example 1.wav#Today is a good weather, pouring rain is falling heavily#zh.
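The WAV format and 5-10 second duration requirements noted above can be checked before the file is used. The sketch below relies on Python's standard wave module, which reads uncompressed PCM WAV files, so other WAV encodings may be rejected by this check even if GPT-SoVITS accepts them. The function name check_reference_wav is hypothetical.

```python
import wave


def check_reference_wav(path: str, min_seconds: float = 5.0, max_seconds: float = 10.0) -> float:
    """Return the duration in seconds, or raise ValueError if the file is unsuitable."""
    try:
        with wave.open(path, "rb") as wav:
            # Duration is the frame count divided by the sample rate.
            duration = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"{path} is not a readable PCM WAV file: {exc}") from exc
    if not min_seconds <= duration <= max_seconds:
        raise ValueError(
            f"{path} is {duration:.2f} s long; expected {min_seconds:g}-{max_seconds:g} s"
        )
    return duration
```

Calling check_reference_wav("1.wav") from the GPT-SoVITS folder returns the duration when the file is acceptable and raises ValueError with a short explanation otherwise.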

Language codes:
zh for Chinese, en for English, ja for Japanese, ko for Korean.
If you store all reference audio files in the wavs folder within the GPT-SoVITS directory, the reference audio path should be wavs/1.wav#Today is a good weather, pouring rain is falling heavily#zh.
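The reference entry shown above has three parts separated by #: the audio path, the spoken text, and the language code. The sketch below builds and splits such an entry. The function names are hypothetical, and rejecting # inside the path or text is an assumption made here to keep the entry unambiguous; how the software itself parses the field is not described in this article.

```python
# Language codes listed in this article.
LANGUAGE_CODES = {"zh": "Chinese", "en": "English", "ja": "Japanese", "ko": "Korean"}


def build_reference(path: str, text: str, lang: str) -> str:
    """Join path, spoken text, and language code into one entry."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    if "#" in path or "#" in text:
        # Assumption: '#' is reserved as the separator.
        raise ValueError("path and text must not contain '#'")
    return f"{path}#{text}#{lang}"


def parse_reference(entry: str) -> tuple[str, str, str]:
    """Split an entry back into (path, text, lang)."""
    parts = entry.strip().split("#")
    if len(parts) != 3:
        raise ValueError("expected the form path#text#lang")
    path, text, lang = parts
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    return path, text, lang


if __name__ == "__main__":
    print(build_reference(
        "wavs/1.wav",
        "Today is a good weather, pouring rain is falling heavily",
        "zh",
    ))
```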

3. Check the api_v2? Option
If you started the API service with api_v2.py, make sure the api_v2? option is selected.
4. Test the Connection
Click the test button. If no error is reported, the configuration is successful.
Common Issues
404 error during testing
This is caused by using a third-party integration package, as its API is not compatible with the official version. Please download and use the official package.
Prompt: "The remote computer actively refused" or "Please check if the API service is started"
Either the API service is not running or the firewall is blocking it. Make sure the API service is started; if it is, allow it through the firewall or disable the firewall.
