
Voice Cloning Tool

clone-voice voice cloning tool: open-source project repository

This project uses models from https://github.com/coqui-ai/TTS under the CPML license, which restricts them to learning and research purposes; commercial use is not permitted.

This is a voice cloning tool: given a sample of any human voice, it can synthesize text into speech in that voice, or convert an existing recording so that it sounds like that voice.

It's very easy to use, and you don't need an NVIDIA GPU. Download the pre-compiled version, double-click app.exe to open a web interface, and you can use it with a few clicks.

It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian, and can record audio online from a microphone.

To ensure good synthesis results, it is recommended to record for 5 to 20 seconds, with clear and accurate pronunciation and no background noise.
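
If you want to sanity-check a recording before uploading it, a minimal sketch like the following reports the duration of a wav file so you can confirm it falls in the recommended 5 to 20 second range. The file name sample.wav is only a placeholder, and the snippet handles plain wav files, not mp3 or flac:

import wave

# Placeholder file name; replace it with your own recording.
WAV_PATH = "sample.wav"

with wave.open(WAV_PATH, "rb") as wav:
    rate = wav.getframerate()
    duration = wav.getnframes() / float(rate)

print(f"{WAV_PATH}: {duration:.1f} seconds at {rate} Hz")
if not 5 <= duration <= 20:
    print("Warning: the recommended length is 5 to 20 seconds.")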

Results for English are very good; results for Chinese are decent.

How to Use the Pre-compiled Windows Version (Source code deployment available for other systems)

  1. Click here to open the Releases download page, then download the pre-compiled main file (1.7 GB) and the model (3 GB).

  2. After downloading, extract it to a location, such as E:/clone-voice.

  3. Double-click app.exe and wait for the web window to open automatically. Read the text prompts in the cmd window carefully; any errors will be displayed there.

  4. Extract the downloaded model archive into the tts folder inside the software directory.

  5. Conversion steps:

    • Select the 【Text->Voice】 button, enter text in the text box or click to import an SRT subtitle file (a small sketch for extracting plain text from an SRT follows this list), then click "Start Now".

    • Select the 【Voice->Voice】 button, then click or drag in the audio file (mp3/wav/flac) to be converted. In the "Voice to use" drop-down box, select the voice to clone; if none of the listed voices suits you, click the "Local Upload" button to select a recorded 5-20 s wav/mp3/flac file, or click the "Start Recording" button to record your own voice online for 5 to 20 seconds and click "Use" when the recording is done. Finally, click the "Start Now" button.

  6. If the machine has an NVIDIA GPU and CUDA is correctly configured, CUDA acceleration will be used automatically.
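
If you prefer to paste plain text instead of importing the subtitle file directly, a minimal sketch like this strips the index and timestamp lines from a standard SRT file and prints only the spoken text. The file name subtitles.srt is a placeholder:

import re

# Placeholder file name; replace it with your own subtitle file.
SRT_PATH = "subtitles.srt"

with open(SRT_PATH, encoding="utf-8") as f:
    blocks = re.split(r"\n\s*\n", f.read().strip())

text_lines = []
for block in blocks:
    # A standard SRT block is: index line, timestamp line, then the text lines.
    parts = block.splitlines()
    if len(parts) >= 3:
        text_lines.append(" ".join(parts[2:]))

print("\n".join(text_lines))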

Source Code Deployment (Linux, Mac, Windows)

The source code version requires a global proxy because it needs to download models from https://huggingface.co, which is inaccessible in China.

  1. Requires Python 3.9 to 3.11.

  2. Create an empty directory, such as E:/clone-voice, and open a cmd window in that directory (type cmd in the Explorer address bar and press Enter). Then pull the source code into the current directory with git: git clone git@github.com:jianchang512/clone-voice.git .

  3. Create a virtual environment: python -m venv venv

  4. Activate the environment. On Windows: E:/clone-voice/venv/scripts/activate; on Linux and Mac: source ./venv/bin/activate.

  5. Install dependencies: pip install -r requirements.txt

  6. On Windows, extract ffmpeg.7z and place ffmpeg.exe in the same directory as app.py. On Linux and Mac, download the corresponding build from the ffmpeg website and extract it into the software root directory: the ffmpeg executable must sit in the same directory as app.py.

    First run python code_dev.py, enter y when prompted to accept the license agreement, then wait for the model download to complete. Downloading the models requires a global proxy; they are very large, and if the proxy is unstable you may see many errors, most of which are caused by proxy problems (a small pre-flight sketch for setting the proxy is shown after this section).

    If several models are reported as downloaded successfully but it still fails with a "Downloading WavLM model" error, edit the library file \venv\Lib\site-packages\aiohttp\client.py: around line 535, just above the line if proxy is not None:, set your proxy address, for example proxy="http://127.0.0.1:10809".

  7. After the downloads are complete, start the service with python app.py.

  8. Each startup connects to the network to check for model updates; please wait patiently. If you don't want this check on every start, manually modify the dependency: open \venv\Lib\site-packages\TTS\utils\manage.py and, around line 389 inside the def download_model method, comment out the following code:

if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
Note: The source code version may frequently fail during startup, mainly because proxy problems prevent the models from being downloaded from outside the Great Firewall, or because downloads are interrupted and left incomplete. Use a stable proxy and enable it globally; if you still cannot complete the download, use the pre-compiled version.
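
Before running code_dev.py for the first time, a small pre-flight sketch like the one below can rule out the two most common problems: the proxy and the missing ffmpeg executable from step 6. The proxy address http://127.0.0.1:10809 is only the example used in the WavLM note above; use your own. Note that HTTP_PROXY/HTTPS_PROXY helps downloads made through requests-style clients, while aiohttp ignores these variables by default, which is why the client.py edit above may still be needed:

import os
import shutil
import subprocess
import sys

# Example proxy address from the WavLM note above; replace it with your own.
PROXY = "http://127.0.0.1:10809"

# Child processes inherit these variables, so downloads can go through the proxy.
env = dict(os.environ, HTTP_PROXY=PROXY, HTTPS_PROXY=PROXY)

# Step 6: the ffmpeg executable must sit next to app.py (being on PATH also works).
if not (shutil.which("ffmpeg") or os.path.exists("ffmpeg.exe") or os.path.exists("ffmpeg")):
    print("Warning: ffmpeg not found; place it in the same directory as app.py.")

# Run the first-time model download with the proxy applied.
subprocess.run([sys.executable, "code_dev.py"], env=env)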

CUDA Acceleration Support

See the detailed installation instructions for the CUDA tools.
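
To confirm that CUDA will actually be picked up (see step 6 of the pre-compiled instructions), a quick diagnostic with PyTorch, which the TTS dependency already pulls in, is enough. This is only a check, not part of the tool itself:

import torch

# True means the tool can use CUDA acceleration; otherwise it falls back to the CPU.
if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; running on CPU.")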

Notes

The xtts model is for learning and research purposes only and cannot be used commercially.

  1. The source code version requires a stable global proxy in order to download models from https://huggingface.co, which is inaccessible in China. Most startup errors are caused by proxy problems: the models cannot be downloaded from outside the Great Firewall, or the download is interrupted and left incomplete. Use a stable proxy and enable it globally; if you still cannot complete the download, use the pre-compiled version.

  2. After startup, the model is cold-loaded, which takes some time. Wait patiently until http://127.0.0.1:9988 is displayed and the browser page opens automatically, then allow two or three minutes before converting (a small readiness-check sketch follows these notes).

  3. Functions include:

     Text-to-speech: Enter text and generate speech using the selected voice.
    
     Voice-to-voice: Select an audio file locally and generate another audio file using the selected voice.
    
  4. If the cmd window has been idle for a long time and seems stuck, press Enter in it to resume output. To prevent this, click the icon in the upper-left corner of the cmd window, select "Properties", and uncheck the "Quick Edit" and "Insert Mode" checkboxes.
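
If you script around the tool and want to know when the cold load mentioned in note 2 has finished, a minimal sketch like this polls http://127.0.0.1:9988 until the web interface responds. It only checks that the page is reachable, nothing more:

import time
import urllib.error
import urllib.request

URL = "http://127.0.0.1:9988"

# Poll the web interface until it answers, or give up after about five minutes.
for _ in range(60):
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            if resp.status == 200:
                print("Web interface is up:", URL)
                break
    except (urllib.error.URLError, OSError):
        pass
    time.sleep(5)
else:
    print("Timed out waiting for", URL)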