Supports Two Forms of Qwen3-TTS
Qwen-TTS is a speech synthesis technology that converts text into realistic, natural-sounding human speech. One of its standout features is that it automatically adjusts speech rhythm and emotion to match the text content. Two forms are supported:
- The first is the built-in local Qwen3-TTS.
- The second is Alibaba Bailian's online API.
Qwen3-TTS Local Built-in (Offline Version)
Please ensure you have upgraded to version 3.97 or higher. After upgrading, it can be used directly and uses the 1.7B model by default.
On first use, it automatically downloads two sets of models (base and custom), totaling approximately 8 GB, so please be patient. You can also download them manually as follows:
- Open the models folder within the software directory and create two new folders: models--Qwen--Qwen3-TTS-12Hz-1.7B-Base and models--Qwen--Qwen3-TTS-12Hz-1.7B-CustomVoice.
- First, open this link: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base/tree/main. Download all the files and folders and place them into the models/models--Qwen--Qwen3-TTS-12Hz-1.7B-Base folder.
- Next, open this link: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice/tree/main. Similarly, download all files and folders into the models/models--Qwen--Qwen3-TTS-12Hz-1.7B-CustomVoice folder.
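The manual download steps can also be scripted. The following is a minimal sketch, not part of the software itself: it assumes the third-party huggingface_hub package is installed and that placing each repository's files directly inside its folder matches what the software expects. The helper names target_dir, prepare_dirs, and download_all are hypothetical.

```python
from pathlib import Path

# Repository IDs taken from the Hugging Face links above.
REPOS = [
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
]


def target_dir(models_root: Path, repo_id: str) -> Path:
    """Map 'Qwen/Name' to '<models_root>/models--Qwen--Name'."""
    return models_root / ("models--" + repo_id.replace("/", "--"))


def prepare_dirs(models_root: Path) -> list[Path]:
    """Create both target folders and return their paths."""
    dirs = [target_dir(models_root, repo_id) for repo_id in REPOS]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


def download_all(models_root: Path) -> None:
    """Download every file of each repository into its folder (needs network)."""
    # Imported here so the path helpers work without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, d in zip(REPOS, prepare_dirs(models_root)):
        snapshot_download(repo_id=repo_id, local_dir=str(d))
```

To use it, run download_all(Path("models")) from the software directory. The download is roughly 8 GB in total, so it may take a while.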
Reference Audio:
Used for cloning a voice based on a reference audio clip of 3-10 seconds.
Go to Menu -- Tools -- TTS Settings -- Qwen-tts (Local). Fill in the reference audio file path and the corresponding text for that audio. One entry per line. You can then select this reference audio for voice cloning in the dubbing character options.
Example:
n10.wav#You say all is vanity, yet why do you keep your eyes shut? If you were to open them and look at me, I don't believe you would see nothing.
Place the n10.wav audio file in the f5-tts folder within the software directory. Then, enter the text spoken in the audio after the # symbol.
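To check a list of entries before pasting it into the settings box, the one-entry-per-line "audio#text" format can be validated with a small script. This is a sketch under assumptions: how the software itself parses the box is not documented here, so splitting on the first # is a guess, and RefEntry and parse_ref_entries are hypothetical names.

```python
from typing import NamedTuple


class RefEntry(NamedTuple):
    audio: str  # file name of the reference clip, e.g. "n10.wav"
    text: str   # transcript of what is spoken in the clip


def parse_ref_entries(block: str) -> list[RefEntry]:
    """Parse 'audio#text' lines, one entry per line, skipping blank lines."""
    entries = []
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '#' only, so the transcript may contain '#'.
        audio, sep, text = line.partition("#")
        if not sep or not audio.strip() or not text.strip():
            raise ValueError(f"expected 'audio#text', got: {line!r}")
        entries.append(RefEntry(audio.strip(), text.strip()))
    return entries
```

A line without a # symbol, or with an empty file name or transcript, raises ValueError, which makes malformed entries easy to spot.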


Voice Style Guidance Prompt
When using built-in preset voices such as Vivian, Uncle_fu, or Sohee with the Qwen-TTS model, you can enter a guidance prompt.
Go to Menu -- Tools -- TTS Settings -- Qwen-tts (Local). Enter a short prompt in the prompt text box (e.g., use an angry and frantic tone). This prompt will be automatically applied when using the built-in voices.
Qwen3-TTS Alibaba Bailian API (Online Version)
The qwen3-tts model supports 10 languages and various Chinese dialects. Model name: qwen3-tts-flash
Specific voices and supported languages for qwen3-tts:
{
"Qian Yue (Cherry)": "Cherry",
"Su Yao (Serena)": "Serena",
"Chen Xu (Ethan)": "Ethan",
"Qian Xue (Chelsie)": "Chelsie",
"Mo Tu (Momo)": "Momo",
"Shi San (Vivian)": "Vivian",
"Yue Bai (Moon)": "Moon",
"Si Yue (Maia)": "Maia",
"Kai (Kai)": "Kai",
"Bu Chi Yu (Nofish)": "Nofish",
"Meng Bao (Bella)": "Bella",
"Zhan Ni Fu (Jennifer)": "Jennifer",
"Tian Cha (Ryan)": "Ryan",
"Ka Jie Lin Na (Katerina)": "Katerina",
"Ai Deng (Aiden)": "Aiden",
"Cang Ming Zi (Eldric Sage)": "Eldric Sage",
"Guai Xiao Mei (Mia)": "Mia",
"Sha Xiao Mi (Mochi)": "Mochi",
"Yan Zheng Ying (Bellona)": "Bellona",
"Tian Shu (Vincent)": "Vincent",
"Meng Xiao Ji (Bunny)": "Bunny",
"A Wen (Neil)": "Neil",
"Mo Jiang Shi (Elias)": "Elias",
"Xu Da Ye (Arthur)": "Arthur",
"Lin Jia Mei Mei (Nini)": "Nini",
"Gui Po Po (Ebona)": "Ebona",
"Xiao Wan (Seren)": "Seren",
"Wan Pi Xiao Hai (Pip)": "Pip",
"Shao Nu A Yue (Stella)": "Stella",
"Bo De Jia (Bodega)": "Bodega",
"Suo Ni Sha (Sonrisa)": "Sonrisa",
"A Lie Ke (Alek)": "Alek",
"Duo Er Qie (Dolce)": "Dolce",
"Su Xi (Sohee)": "Sohee",
"Xiao Ye Xing (Ono Anna)": "Ono Anna",
"Lai En (Lenn)": "Lenn",
"Ai Mi Er An (Emilien)": "Emilien",
"An De Lei (Andre)": "Andre",
"La Di Ao · Ge Er (Radio Gol)": "Radio Gol",
"Shanghai-A Zhen (Jada)": "Jada",
"Beijing-Xiao Dong (Dylan)": "Dylan",
"Nanjing-Old Li (Li)": "Li",
"Shaanxi-Qin Chuan (Marcus)": "Marcus",
"Minnan-A Jie (Roy)": "Roy",
"Tianjin-Li Bide (Peter)": "Peter",
"Sichuan-Qing Er (Sunny)": "Sunny",
"Sichuan-Cheng Chuan (Eric)": "Eric",
"Cantonese-A Qiang (Rocky)": "Rocky",
"Cantonese-A Qing (Kiki)": "Kiki"
}
Step 1: Obtain and Configure Your API KEY
- Click this link to visit the Alibaba Cloud Bailian platform: https://bailian.console.aliyun.com/?tab=model#/api-key
- Log in to your Alibaba Cloud account (if you don't have one, just follow the prompts to register).
- On the API-KEY management page, click "Create API-KEY". The system will automatically generate a string starting with "sk-". This is your API KEY. Copy this string.
- Go back to the pyVideoTrans software. In the top menu bar, find TTS Settings, click on it, and select Qwen TTS from the dropdown menu.
- In the Qwen3 TTS configuration window that pops up, paste the API KEY you just copied into the "API KEY" input box. You can click the "Test" button to listen to an example. If you hear sound, the configuration is successful. Finally, click Save.
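If you keep the key outside the software (for example in an environment variable), a quick format check can catch copy-and-paste mistakes before you paste it in. This is only a sketch: it checks the "sk-" prefix described above and nothing else, it cannot tell whether the key is actually valid, and the environment variable name DASHSCOPE_API_KEY and both function names are assumptions made for illustration.

```python
import os


def looks_like_api_key(value: str) -> bool:
    """Return True if the value has the 'sk-' prefix and no inner whitespace."""
    key = value.strip()
    has_body = key.startswith("sk-") and len(key) > len("sk-")
    return has_body and not any(c.isspace() for c in key)


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Read the key from an environment variable and check its format."""
    key = os.environ.get(env_var, "").strip()
    if not looks_like_api_key(key):
        raise ValueError(f"{env_var} is missing or does not look like an 'sk-' key")
    return key
```

The "Test" button in the configuration window remains the real check, since only a request to the service can confirm that the key works.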

Step 2: Use Qwen3-TTS in Video Translation
After configuration, you can enable Qwen3-TTS when processing a single video.
- On the main interface of pyVideoTrans, find the dropdown menu for "Dubbing Channel", click on it, and select "Qwen3 TTS".
- In the "Dubbing Character" menu next to it, choose your preferred voice. For example, select "Cherry" for a standard female voice, or "Sunny" for Sichuan-dialect dubbing.
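When looking for a dialect voice, the display names in the voice table can be filtered by their region prefix. The sketch below uses a small subset of that table and assumes the "Region-" prefix in a display name marks the dialect. That is a reading of the table, not a documented rule, and voices_for_dialect is a hypothetical helper.

```python
# A subset of the voice table above: display name -> voice ID.
VOICES = {
    "Qian Yue (Cherry)": "Cherry",
    "Shanghai-A Zhen (Jada)": "Jada",
    "Sichuan-Qing Er (Sunny)": "Sunny",
    "Sichuan-Cheng Chuan (Eric)": "Eric",
    "Cantonese-A Qiang (Rocky)": "Rocky",
    "Cantonese-A Qing (Kiki)": "Kiki",
}


def voices_for_dialect(dialect: str, voices: dict[str, str] = VOICES) -> list[str]:
    """Return voice IDs whose display name starts with '<dialect>-'."""
    prefix = dialect + "-"
    return [voice_id for name, voice_id in voices.items() if name.startswith(prefix)]


print(voices_for_dialect("Sichuan"))  # ['Sunny', 'Eric']
```

Voices without a region prefix, such as Cherry, are never returned by this filter.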

Step 3: Use in Batch Dubbing and Multi-Character Dubbing
Qwen-TTS is also available for batch processing tasks, which saves time when you have many files to dub.
- Batch Dubbing for Subtitles: If you have multiple SRT subtitle files to dub, switch to the "Batch Dubbing for Subtitles" interface. Similarly, select "Qwen TTS" and your desired character in the "Dubbing Channel" section below.
- Multi-Character Dubbing for Subtitles: This feature is also applicable when processing dialogues involving multiple characters. You can assign different Qwen-TTS voices to different characters in the "Multi-Character Dubbing for Subtitles" section.




