
clone-voice Voice Cloning Tool

The model used in this project is xtts_v2 produced by coqui.ai, released under the Coqui Public Model License 1.0.0. Please follow this license when using this project. The full license text is available at https://coqui.ai/cpml.txt.

This is a voice cloning tool: given a sample of any human voice, it can synthesize speech from text in that voice, or convert one voice recording into another using that tone.

It's very easy to use and works without an NVIDIA GPU. Download the pre-compiled version, double-click app.exe to open a web interface, and use it with just a few clicks.

It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian, and allows recording a voice online from a microphone.

For optimal synthesis results, it is recommended to record a clear, accurate voice clip lasting 5 to 20 seconds without background noise.
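A quick way to sanity-check a reference clip before using it is to verify its duration. Below is a minimal sketch using Python's standard wave module; the helper names are illustrative, not part of this project, and it only handles uncompressed WAV files:

```python
import wave

def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def clip_is_suitable(path: str) -> bool:
    """True if the clip falls in the recommended 5-20 second window."""
    return 5.0 <= clip_duration_seconds(path) <= 20.0
```

Background noise is harder to check automatically; re-recording in a quiet room is usually the simplest fix.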

English synthesis quality is excellent, while Chinese quality is acceptable.

How to Use the Windows Pre-compiled Version (Other systems can be deployed from source)

  1. Click here to open the Releases download page, then download the pre-compiled main file (~1.7 GB) and the model file (~3 GB).

  2. After downloading, extract the files to a location, e.g., E:/clone-voice.

  3. Double-click app.exe and wait for the web window to open automatically. Please read the text prompts in the cmd window carefully, as any errors will be displayed there.

  4. After downloading the model, extract it into the tts folder within the software directory. The result after extraction should look like this:

[screenshot: tts folder containing the extracted model subfolders]

  5. Conversion Steps

    • Select the 【Text -> Voice】 button, enter text in the text box or click "Import SRT subtitle file", then click "Start Now".

    • Select the 【Voice -> Voice】 button, click or drag in the audio file to convert (mp3/wav/flac), then pick the voice tone to clone from the "Voice File to Use" dropdown. If none is satisfactory, click the "Local Upload" button to choose a pre-recorded 5-20s wav/mp3/flac voice file, or click the "Start Recording" button to record your own voice online for 5-20s and click "Use" when finished. Finally, click the "Start Now" button.

  6. If the machine has an NVIDIA GPU and the CUDA environment is correctly configured, CUDA acceleration will be used automatically.
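For the "Import SRT subtitle file" option above, an SRT file consists of numbered blocks with a timestamp line followed by the spoken text. As a rough, hypothetical sketch (not the tool's actual parser), extracting just the text lines looks like this:

```python
import re

def parse_srt(text: str) -> list[str]:
    """Extract the spoken lines from an SRT subtitle file,
    dropping block index numbers and timestamp lines."""
    lines = []
    for block in re.split(r"\n\s*\n", text.strip()):
        for line in block.splitlines():
            line = line.strip()
            if line.isdigit():        # block index, e.g. "1"
                continue
            if "-->" in line:         # timestamp line
                continue
            if line:
                lines.append(line)
    return lines
```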

Source Code Deployment (Linux, Mac, Windows)

The source code version requires setting a proxy in the .env file (e.g., HTTP_PROXY=http://127.0.0.1:7890). It needs to download models from https://huggingface.co and https://github.com, which are inaccessible from within China. You must ensure the proxy is stable and reliable, otherwise, large model downloads may fail midway.
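The HTTP_PROXY value must be a full URL with scheme, host, and port. As an illustrative check (the helper below is hypothetical and uses only the standard library), a .env line can be validated like this:

```python
from typing import Optional
from urllib.parse import urlparse

def parse_env_proxy(line: str) -> Optional[str]:
    """Parse an HTTP_PROXY assignment from a .env line.
    Return the proxy URL if it is usable, else None."""
    key, sep, value = line.strip().partition("=")
    if key != "HTTP_PROXY" or not sep:
        return None
    url = urlparse(value)
    # A usable proxy needs a scheme, a host, and a port,
    # e.g. http://127.0.0.1:7890
    if url.scheme in ("http", "https") and url.hostname and url.port:
        return value
    return None
```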

  1. Requirements: Python 3.9 to 3.11, with the Git-CMD tool installed beforehand. Download Link.

  2. Create an empty directory, e.g., E:/clone-voice. Open a cmd window in this directory by typing cmd in the address bar and pressing Enter. Use git to pull the source code to the current directory: git clone [email protected]:jianchang512/clone-voice.git .

  3. Create a virtual environment: python -m venv venv

  4. Activate the environment. On Windows: E:/clone-voice/venv/scripts/activate.

  5. Install dependencies: pip install -r requirements.txt --no-deps. For Windows and Linux to enable CUDA acceleration, continue by executing pip uninstall -y torch to uninstall, then execute pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121. (Requires an NVIDIA GPU and a properly configured CUDA environment.)

  6. On Windows, extract ffmpeg.7z and place ffmpeg.exe in the same directory as app.py. For Linux and Mac, download the corresponding version of ffmpeg from the ffmpeg official website, extract the ffmpeg binary, and place the executable ffmpeg in the same directory as app.py.

    [screenshot: ffmpeg placed in the same directory as app.py]

  7. First, run python code_dev.py. When prompted to agree to the license, enter y, then wait for the model download to complete.

    Downloading the model requires a global proxy. The model is very large. If the proxy is not stable and reliable, you may encounter many errors; most errors are caused by proxy issues.

    If it shows that multiple models downloaded successfully but it finally reports a "Downloading WavLM model" error, modify the library file \venv\Lib\site-packages\aiohttp\client.py: around line 535, just above the line if proxy is not None:, set your proxy address, e.g., proxy = "http://127.0.0.1:10809".

  8. After the download is complete, start the application with python app.py.

  9. 【Training Instructions】 To train, adjust the training parameters in param.json, then execute python train.py.

  10. Each startup will connect to external servers to check for or update models. Please wait patiently. If you don't want this check/update on every startup, you need to manually modify a file in the dependency package. Open \venv\Lib\site-packages\TTS\utils\manage.py, around line 389, inside the def download_model method, comment out the following code:

if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
  11. Note: The source code version may frequently encounter errors during startup, mostly because a proxy problem prevented models from being downloaded from external sources or left the download incomplete. Use a stable proxy and enable it globally. If the download cannot be completed, use the pre-compiled version instead.
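After finishing the steps above, the two setup points most likely to go wrong, ffmpeg placement (step 6) and CUDA visibility (step 5), can be sanity-checked with a short script. This is a hedged sketch; the function names are hypothetical:

```python
import os
import shutil
from typing import Optional

def find_ffmpeg(app_dir: str) -> Optional[str]:
    """Look for the ffmpeg binary next to app.py, falling back to the system PATH."""
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = os.path.join(app_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("ffmpeg")

def cuda_status() -> str:
    """Report whether torch is installed and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "CUDA available" if torch.cuda.is_available() else "CPU only"
```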

Frequently Asked Questions

The xtts model is for learning and research purposes only, not for commercial use.

  1. The source code version requires setting a proxy in the .env file (e.g., HTTP_PROXY=http://127.0.0.1:7890), because models must be downloaded from https://huggingface.co and https://github.com, which are inaccessible from within China. Ensure the proxy is stable and reliable, or large model downloads may fail midway.

  2. After startup, the model needs to be cold-loaded, which takes some time. Please wait patiently until http://127.0.0.1:9988 is displayed and the browser page opens automatically. Wait another two or three minutes before starting conversion.

  3. Features include:

     • Text-to-Speech: input text and generate speech in the selected voice tone.

     • Voice-to-Voice: select a local audio file and generate another audio file in the selected voice tone.
    
  4. If the opened cmd window remains inactive for a long time and requires pressing Enter to continue output, click the icon in the top-left corner of the cmd window, select "Properties", then uncheck the boxes for "QuickEdit Mode" and "Insert Mode".

  5. Pre-compiled version: Voice-to-Voice thread fails to start.

    First, confirm the model is correctly downloaded and placed: the tts folder should contain 3 subfolders. [screenshot: expected tts folder contents]

    If it's correctly placed but the error persists, click to download extra-to-tts_cache.zip. Extract the downloaded file and copy the resulting 2 files into the tts_cache folder in the software's root directory.

    If the above method doesn't work, fill in the proxy address in the .env file after HTTP_PROXY, e.g., HTTP_PROXY=http://127.0.0.1:7890. This can resolve the issue. Ensure the proxy is stable and the port is correct.

  6. Prompt: "The text length exceeds the character limit of 182/82 for language"

    This happens when a single sentence (the text between periods) is too long. Use periods to break very long statements instead of stringing many commas together. Alternatively, open the clone/character.json file and raise the limit manually.

  7. Prompt: "symbol not found __svml_cosf8_ha"

    Open the webpage https://www.dll-files.com/svml_dispmd.dll.html, click the red "Download" text. After downloading, extract the file and copy the DLL inside to C:\Windows\System32.
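Relating to item 6, the character-limit error, long input can also be pre-split on sentence-ending periods before synthesis. A minimal, hypothetical sketch (it cannot shorten a single sentence that already exceeds the limit):

```python
import re

def split_text(text: str, limit: int = 182) -> list[str]:
    """Group sentences (split on '.' or Chinese full stop) into
    chunks that stay under `limit` characters."""
    sentences = re.split(r"(?<=[.\u3002])\s*", text)
    chunks, current = [], ""
    for s in sentences:
        if not s:
            continue
        if current and len(current) + len(s) > limit:
            chunks.append(current)
            current = s
        else:
            current += s
    if current:
        chunks.append(current)
    return chunks
```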

CUDA Acceleration Support

To install the CUDA tools, see the Detailed Installation Guide.

If your computer has an NVIDIA graphics card, first update the graphics driver to the latest version. Then install the corresponding CUDA Toolkit 11.8 and cudnn for CUDA 11.X.

After installation, press Win + R, type cmd, and press Enter. In the window that opens, type nvcc --version and confirm that version information is displayed.

Then type nvidia-smi and confirm that output including the CUDA version number is shown.

This indicates CUDA is correctly installed and acceleration is ready. Otherwise, reinstall.
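As a small illustrative helper (hypothetical, assuming nvcc's usual "release X.Y" wording), the CUDA release number can be pulled out of the nvcc --version output programmatically:

```python
import re
from typing import Optional

def parse_cuda_version(nvcc_output: str) -> Optional[str]:
    """Extract the CUDA release number (e.g. '11.8') from `nvcc --version` output."""
    m = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None
```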