> CosyVoice open-source repository: https://github.com/FunAudioLLM/CosyVoice
>
> CosyVoice-api open-source repository: https://github.com/jianchang512/cosyvoice-api
>
> Supports Chinese, English, Japanese, Korean, and Cantonese, with the corresponding language codes zh|en|jp|ko|yue

How to Use

On Windows 10/11, you can download the integrated package directly. After extracting it, double-click run-api.bat to start the API service, or double-click run-webui.bat to open the web interface.

Windows integrated package download address:

https://www.123684.com/s/03Sxjv-IljB3

2024-11-26 Patch Download:

> After downloading, unzip the archive into the directory that contains api.py, overwriting the existing files, to complete the upgrade.
>
> https://github.com/jianchang512/cosyvoice-api/releases/download/0.2/1126-buding.zip

Usage in Video Translation Software

  1. First, upgrade the software to 2.08+.
  2. Ensure that the CosyVoice project has been deployed, that api.py from CosyVoice-api has been placed in the project, and that api.py has been successfully started (the API service must be started to be used in the translation software).
  3. Open the video translation software and go to Settings > CosyVoice in the top-left menu. Fill in the API address; the default is http://127.0.0.1:9233
  4. Fill in the reference audio and its corresponding text.

Reference audio format:

Each line has two parts separated by the '#' symbol: first the path to the wav audio file, then the text spoken in that audio. You can enter multiple lines.

The wav audio should ideally be 5-15 seconds long. If the audio file is in the root directory of the CosyVoice project (the same directory as webui.py), enter only the file name. If it is in the wavs directory under the root directory, enter wavs/ followed by the file name, for example wavs/2.wav.

Reference audio example:

1.wav#Hello dear friend
wavs/2.wav#Hello friends

  5. After filling these in, select CosyVoice as the dubbing channel in the main interface, then select the corresponding role. The clone role copies the timbre from the original video.
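
To make the reference audio format above concrete, here is a minimal Python sketch that splits such lines into a wav path and its text. It is not the parser used by the video translation software; the function name `parse_reference_audio` and its validation rules are assumptions made for illustration.

```python
from typing import List, Tuple


def parse_reference_audio(text: str) -> List[Tuple[str, str]]:
    """Parse 'path#text' lines into (wav_path, transcript) pairs.

    Hypothetical helper for illustration; not part of CosyVoice or
    the video translation software.
    """
    entries = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first '#' only, so the transcript may itself contain '#'.
        path, sep, transcript = line.partition("#")
        path, transcript = path.strip(), transcript.strip()
        if not sep or not path or not transcript:
            raise ValueError(
                f"line {line_number}: expected 'audio.wav#text', got {line!r}"
            )
        if not path.lower().endswith(".wav"):
            raise ValueError(f"line {line_number}: {path!r} is not a .wav file")
        entries.append((path, transcript))
    return entries


if __name__ == "__main__":
    sample = "1.wav#Hello dear friend\nwavs/2.wav#Hello friends"
    for wav_path, transcript in parse_reference_audio(sample):
        print(wav_path, "->", transcript)
```

Checking entries this way before pasting them into the settings dialog can catch a missing '#' or a non-wav path early.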

On other systems, deploy CosyVoice first. The deployment steps are as follows.

Deploying the Official CosyVoice Project from Source

> This deployment uses conda, which is strongly recommended; without it, installation may fail and you are likely to run into many problems. Some dependencies, such as pynini, cannot be installed with pip on Windows.

1. Download and install miniconda

Miniconda is a minimal installer for conda. On Windows it installs like any ordinary program: click Next through the installer.

Download address https://docs.anaconda.com/miniconda/

After downloading, double-click the exe file to start the installer.

The only thing to watch for is the options screen with checkboxes: select the top two, otherwise later steps become more troublesome. The second checkbox adds the conda command to the system environment variables. If it is not selected, you cannot run the short conda command directly.

Then click "Install" and wait for the installation to finish before closing the installer.

2. Download CosyVoice source code

First create an empty directory, for example D:/py on the D drive. The rest of this guide uses that path as the example.

Open the CosyVoice repository at https://github.com/FunAudioLLM/CosyVoice and download the source code as a zip archive.

After downloading and extracting it, copy all files in the CosyVoice-main directory to D:/py.

3. Create a virtual environment and activate it

Open the D:/py folder, type cmd in the address bar, and press Enter. A black cmd window opens.

In this window, enter the command conda create -n cosyvoice python=3.10 and press Enter. This creates a virtual environment named "cosyvoice" with Python 3.10.

Next, enter the command conda activate cosyvoice and press Enter to activate this virtual environment. Installation, startup, and all other operations must be run with the environment activated; otherwise they will fail.

When the environment is active, "(cosyvoice)" appears at the beginning of the command prompt.
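
Besides looking at the prompt, the activation can be checked from Python. The sketch below assumes the environment was activated with conda activate, which sets the CONDA_DEFAULT_ENV environment variable to the active environment's name. The function name `environment_problems` is hypothetical.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def environment_problems(
    expected_env: str = "cosyvoice",
    expected_python: Tuple[int, int] = (3, 10),
    environ: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper for illustration. `environ` and `version` can be
    passed explicitly so the check is easy to test.
    """
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)

    problems = []
    # conda activate exports the active environment's name in this variable.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            f"active conda environment is {active!r}, expected {expected_env!r}"
        )
    if version != tuple(expected_python):
        problems.append(
            f"Python is {version[0]}.{version[1]}, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    return problems


if __name__ == "__main__":
    found = environment_problems()
    print("environment looks fine" if not found else "\n".join(found))
```

Run it in the same cmd window used for the following steps, since the check reflects only the window it runs in.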

4. Install the pynini module

On Windows, this module can only be installed with the conda command. This is why conda was recommended at the beginning.

In the cmd window opened and activated above, enter the command conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3 and press Enter.

Note: The -y flag normally answers the confirmation prompt automatically. If a prompt still appears during installation, enter y and press Enter.

5. Install the remaining dependencies using the Alibaba Cloud mirror

> Open the requirements.txt file and delete the last line, WeTextProcessing==1.0.3. Otherwise the installation will fail, because this module depends on pynini, and pynini cannot be installed with pip on Windows.
>
> Then add three lines to requirements.txt: Matcha-TTS, flask, and waitress.
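
The same edit to requirements.txt can be sketched in Python for anyone who prefers not to do it by hand. This is only an illustration of the two changes described above; the function name `patch_requirements` is hypothetical, and the sample input in the demo is a placeholder rather than the real file content.

```python
import re
from typing import List

# The three packages the guide says to add.
EXTRA_PACKAGES = ["Matcha-TTS", "flask", "waitress"]


def requirement_name(line: str) -> str:
    """Return the lowercased package name at the start of a requirement line."""
    return re.split(r"[=<>!~\[;\s]", line.strip(), maxsplit=1)[0].lower()


def patch_requirements(text: str) -> str:
    """Drop the WeTextProcessing line and append the extra packages.

    Hypothetical helper for illustration; running it twice gives the
    same result as running it once.
    """
    kept: List[str] = []
    for line in text.splitlines():
        # WeTextProcessing was already installed through conda, so skip it.
        if requirement_name(line) == "wetextprocessing":
            continue
        kept.append(line)

    existing = {requirement_name(line) for line in kept}
    for package in EXTRA_PACKAGES:
        if package.lower() not in existing:
            kept.append(package)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Placeholder content, not the real requirements.txt.
    sample = "example-package==1.0\nWeTextProcessing==1.0.3\n"
    print(patch_requirements(sample), end="")
```

To apply it to the real file, read requirements.txt as text, pass the content through `patch_requirements`, and write the result back. Keep a copy of the original first.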

Next, enter the command

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

and press Enter. The installation takes a long time; if nothing goes wrong, it then completes successfully.
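
As a quick sanity check after the installation, the sketch below reports which modules Python cannot find in the active environment. The import names pynini, flask, and waitress are assumed to match the package names, and the function name `missing_modules` is hypothetical. The check only confirms that the modules can be located, not that they work.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for packages installed in steps 4 and 5.
    missing = missing_modules(["pynini", "flask", "waitress"])
    if missing:
        print("not found:", ", ".join(missing))
    else:
        print("all checked modules were found")
```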

6. Download the api.py file and place it in the project.

Download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the same directory as webui.py.

Start the API Service

> The API address is: http://127.0.0.1:9233

In the activated cmd window, enter the command python api.py and press Enter to start the service.
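
Before pointing the video translation software at the service, it can help to confirm that something is listening on the API address. The following sketch only tests whether a TCP connection to the host and port succeeds. It does not call any endpoint of api.py, and the function name `api_is_listening` is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def api_is_listening(api_url: str = "http://127.0.0.1:9233", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it checks reachability only,
    not that the service behind the port is the CosyVoice API.
    """
    parts = urlsplit(api_url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host found in {api_url!r}")
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if api_is_listening():
        print("something is listening on http://127.0.0.1:9233")
    else:
        print("nothing is listening; check that python api.py is still running")
```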