Original Voice Cloning and Multi-Character Dubbing
Part 1: Video-Based Voice Cloning
Voice cloning means dubbing a video in the original speaker's own voice. For example, when Chinese speech is translated into English, the result sounds as if the same person is now speaking English instead of Chinese.
Among the software's dubbing channels, any channel that has a "clone" option in its dubbing character list supports voice cloning. Selecting "clone" performs voice cloning during dubbing.
Cloning principle: The program extracts the subtitles to be dubbed and loops through them. For each subtitle, it uses that subtitle's start time to clip the corresponding audio segment from the original video as reference audio, then sends the reference audio and the subtitle text together to the dubbing channel for dubbing.
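The loop described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual implementation: the function names, the `ref_N.wav` file naming, and the choice to clip from each subtitle's start time to its end time with ffmpeg are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Subtitle:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:01,000 to seconds (3-digit ms assumed)."""
    h, m, s, ms = _TIME.fullmatch(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Subtitle]:
    """Parse SRT text into Subtitle records; blocks with fewer than 3 lines are skipped."""
    subs = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        subs.append(Subtitle(int(lines[0]), to_seconds(start),
                             to_seconds(end), " ".join(lines[2:])))
    return subs


def reference_clip_command(video: str, sub: Subtitle, out_wav: str) -> list[str]:
    """Build an ffmpeg command that cuts the subtitle's time range as mono audio."""
    return ["ffmpeg", "-y",
            "-ss", f"{sub.start:.3f}",            # seek to the subtitle start
            "-t", f"{sub.end - sub.start:.3f}",   # keep only the subtitle's duration
            "-i", video, "-vn", "-ac", "1", out_wav]


def build_jobs(video: str, srt_text: str) -> list[dict]:
    """Pair each subtitle's text with the reference clip that would be sent for dubbing."""
    jobs = []
    for sub in parse_srt(srt_text):
        ref = f"ref_{sub.index}.wav"  # hypothetical naming scheme
        jobs.append({"command": reference_clip_command(video, sub, ref),
                     "reference_audio": ref,
                     "text": sub.text})
    return jobs
```

Each job's `command` could be run with `subprocess.run(job["command"], check=True)` if ffmpeg is installed; the resulting clip and text would then be passed to whichever dubbing channel is configured.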
Channels Supporting Voice Cloning

- OmniVoice (local API): Supports all languages (recommended)
- Qwen-TTS (built-in locally): Supports over 10 common languages like Chinese, English, Japanese, Korean (recommended)
- GPT-SoVITS (local API): Supports Chinese, English, Japanese, Korean (recommended)
- F5-TTS (local API): Supports Chinese and English (recommended)
- VoxCPM-TTS (local API): Supports over 10 languages (recommended)
- Chatterbox (built-in locally): Supports over 10 languages (recommended)
- Index-TTS (local API): Supports Chinese and English (recommended)
- CosyVoice (local API): Supports over 10 common languages like Chinese, English, Japanese, Korean
- Spark-TTS (local API): Supports English
- Dia-TTS (local API): Supports English
- clone-voice (local API): Supports over 10 languages (no longer maintained, not recommended)
How to Use
Because the original video is needed, this feature is only available in the Translate Video and Audio function.
- First, select the target language for dubbing from the Target Language dropdown.
- Choose a dubbing channel from the Dubbing Channel list. For channels marked (local API), you must deploy the corresponding service locally on your computer. Refer to the respective documentation for deployment methods. After deployment, enter the API or WebUI address in Software - TTS Settings - Corresponding Channel Settings - URL.
- Then, select the "clone" option from the Dubbing Character dropdown.
Optimal Cloning Configuration
To ensure the best cloning results, it is recommended to follow these settings:
- Avoid using LLM Re-segmentation, as it re-divides the timeline, which causes confusion when clipping reference audio from the original video.
- Ensure each subtitle lasts between 3 and 10s. Reference audio that is too short (e.g., less than 3s) may result in noise, while audio that is too long (e.g., more than 10s) may cause errors in some channels. Open Menu - Tools/Options - Advanced Options - Speech Recognition Parameters, then set Maximum Voice Duration to 6-10 and Minimum Voice Duration (in milliseconds) to 3000-4000 to define the subtitle range. Also select Merge Overly Short Subtitles so the program automatically merges them with adjacent ones.
- Use an AI engine for translation, such as DeepSeek or OpenAI ChatGPT, and select Send Complete Subtitles.
- For speech recognition of Chinese, use Qwen-ASR, Doubao Voice Large Model - Speed Version, Ali Bailian, etc. For English, use Faster-whisper with the large-v3 model.
- Click Set More Parameters and select Separate Vocals and Background Noise to obtain clean vocals without background noise, which improves cloning quality.
If many of your subtitles are shorter than 3s, it is recommended to use the OmniVoice-TTS dubbing channel, which avoids errors with short reference audio.
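To check a subtitle file against the 3-10s guideline before dubbing, something like the following sketch may help. It is only an illustration of the idea: the merge rule (fold a too-short subtitle into the one that follows it, as long as the combined span stays within the maximum) is an assumption and is not necessarily how the program's Merge Overly Short Subtitles option works.

```python
MIN_S, MAX_S = 3.0, 10.0  # duration guideline from the text, in seconds


def merge_short(subs, min_s=MIN_S, max_s=MAX_S):
    """Merge a too-short subtitle with the next one when the result fits in max_s.

    subs is a list of (start_seconds, end_seconds, text) tuples in time order.
    """
    merged = []
    for start, end, text in subs:
        if merged:
            p_start, p_end, p_text = merged[-1]
            # Previous entry is too short and the combined span is still acceptable.
            if (p_end - p_start) < min_s and (end - p_start) <= max_s:
                merged[-1] = (p_start, end, f"{p_text} {text}")
                continue
        merged.append((start, end, text))
    return merged


def out_of_range(subs, min_s=MIN_S, max_s=MAX_S):
    """Return the subtitles whose duration still falls outside [min_s, max_s]."""
    return [s for s in subs if not (min_s <= s[1] - s[0] <= max_s)]
```

Anything still reported by `out_of_range` after merging is a candidate for manual adjustment, or for a channel that tolerates short reference audio.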
Using Reference Audio
Sometimes you may not want to clone the original video's voice but use a voice from a local audio file or even your own voice.
- First, record or otherwise obtain a 5-10s audio file in WAV format. Make sure it contains a single, clear voice with no background noise and no extra silence at the beginning or end. For example, you can use a tool like CapCut to extract a 10s speech segment from a longer audio or video file as reference audio.
- Make sure the audio is in WAV format, give it a short name like myaudio1.wav, and copy it to the software/f5-tts folder. Then open Software Menu - TTS Settings - Set Reference Audio, start a new line in the text box, enter myaudio1.wav#the text spoken in this audio, and save. For example:
  myaudio1.wav#You say all is empty, yet you keep your eyes closed. If you opened them to look at me, I don't believe you would see nothing.
  Note: For GPT-SoVITS dubbing, reference audio should be placed in the root directory of the GPT-SoVITS software, not in the f5-tts folder.
- After saving, return to the main interface and select myaudio1.wav from the dubbing character dropdown to use it.
WAV format audio files have the suffix .wav. If you cannot see it, open any folder, click View - File Name Extensions in the folder's navigation bar, and check it. In Windows 11, it is View - Show - File Name Extensions.
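Before adding an entry under Set Reference Audio, it can be worth checking the line format and the clip length. The sketch below uses only Python's standard `wave` module; the helper names are hypothetical, and the 5-10s bounds simply mirror the recommendation above.

```python
import wave


def parse_reference_line(line: str) -> tuple[str, str]:
    """Split a 'filename.wav#spoken text' entry into (filename, text)."""
    name, sep, text = line.strip().partition("#")
    name, text = name.strip(), text.strip()
    if not sep or not name.lower().endswith(".wav") or not text:
        raise ValueError("expected the form: myaudio1.wav#the text spoken in this audio")
    return name, text


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_good_length(path: str, lo: float = 5.0, hi: float = 10.0) -> bool:
    """Check that the clip length falls inside the recommended range."""
    return lo <= wav_duration_seconds(path) <= hi
```

For instance, `parse_reference_line("myaudio1.wav#Hello world")` yields the file name and the transcript as separate values, and `is_good_length` can be called on the file before copying it into place.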
Part 2: Multi-Character Subtitle-Based Dubbing
The "Multi-Character Subtitle Dubbing" feature is available since v3.74. Click the Multi-Character Subtitle Dubbing button on the left toolbar. In the pop-up window, import the SRT subtitle file to be dubbed, then assign a character to each subtitle to achieve multi-voice dubbing.
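Conceptually, the assignment made in that window amounts to a mapping from subtitle number to character. The sketch below only illustrates that idea; the function names and the character names ("narrator", "voice_a") are hypothetical, and the program's internal data format is not documented here.

```python
from collections import defaultdict


def assign_characters(indices, overrides, default):
    """Map each subtitle number to a character, using default where none is chosen."""
    return {i: overrides.get(i, default) for i in indices}


def group_by_character(assignment):
    """Group subtitle numbers by character so each voice can be dubbed in one batch."""
    groups = defaultdict(list)
    for index, character in sorted(assignment.items()):
        groups[character].append(index)
    return dict(groups)


# Subtitles 2 and 4 get a second voice; everything else uses the default.
assignment = assign_characters(range(1, 6), {2: "voice_a", 4: "voice_a"}, "narrator")
batches = group_by_character(assignment)
```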

