
Starting with version v3.74-0720, the pyVideoTrans video translation software integrates Alibaba's Qwen3-TTS speech synthesis service.

In simple terms, Qwen-TTS is an advanced speech synthesis technology that converts text into highly realistic and natural-sounding human voices. A key highlight is its ability to automatically adjust the rhythm and emotion of the speech based on the text content.

pyVideoTrans supports two forms of Qwen3-TTS:

  • One is the online API via Alibaba Bailian.
  • The other is locally deployed Qwen3-TTS.

1: Qwen3-TTS Alibaba Bailian API (Online Version)

The qwen3-tts model supports 10 languages and multiple Chinese dialects. Model name: qwen3-tts-flash

Click here to view detailed voice descriptions and supported languages for qwen3-tts

Voice list (JSON, display name followed by voice name):
{
  "芊悦(Cherry)": "Cherry",
  "苏瑶(Serena)": "Serena",
  "晨煦(Ethan)": "Ethan",
  "千雪(Chelsie)": "Chelsie",
  "茉兔(Momo)": "Momo",
  "十三(Vivian)": "Vivian",
  "月白(Moon)": "Moon",
  "四月(Maia)": "Maia",
  "凯(Kai)": "Kai",
  "不吃鱼(Nofish)": "Nofish",
  "萌宝(Bella)": "Bella",
  "詹妮弗(Jennifer)": "Jennifer",
  "甜茶(Ryan)": "Ryan",
  "卡捷琳娜(Katerina)": "Katerina",
  "艾登(Aiden)": "Aiden",
  "沧明子(Eldric Sage)": "Eldric Sage",
  "乖小妹(Mia)": "Mia",
  "沙小弥(Mochi)": "Mochi",
  "燕铮莺(Bellona)": "Bellona",
  "田叔(Vincent)": "Vincent",
  "萌小姬(Bunny)": "Bunny",
  "阿闻(Neil)": "Neil",
  "墨讲师(Elias)": "Elias",
  "徐大爷(Arthur)": "Arthur",
  "邻家妹妹(Nini)": "Nini",
  "诡婆婆(Ebona)": "Ebona",
  "小婉(Seren)": "Seren",
  "顽屁小孩(Pip)": "Pip",
  "少女阿月(Stella)": "Stella",
  "博德加(Bodega)": "Bodega",
  "索尼莎(Sonrisa)": "Sonrisa",
  "阿列克(Alek)": "Alek",
  "多尔切(Dolce)": "Dolce",
  "素熙(Sohee)": "Sohee",
  "小野杏(Ono Anna)": "Ono Anna",
  "莱恩(Lenn)": "Lenn",
  "埃米尔安(Emilien)": "Emilien",
  "安德雷(Andre)": "Andre",
  "拉迪奥·戈尔(Radio Gol)": "Radio Gol",
  "上海-阿珍(Jada)": "Jada",
  "北京-晓东(Dylan)": "Dylan",
  "南京-老李(Li)": "Li",
  "陕西-秦川(Marcus)": "Marcus",
  "闽南-阿杰(Roy)": "Roy",
  "天津-李彼得(Peter)": "Peter",
  "四川-晴儿(Sunny)": "Sunny",
  "四川-程川(Eric)": "Eric",
  "粤语-阿强(Rocky)": "Rocky",
  "粤语-阿清(Kiki)": "Kiki"
}
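The table above can be loaded and grouped programmatically. The sketch below uses a four-entry excerpt and assumes that a prefix before a hyphen in the display name (for example "四川-" or "粤语-") marks a regional or dialect voice. That reading of the labels is an assumption drawn from the list itself, and `split_label` is a hypothetical helper, not part of pyVideoTrans.

```python
import json
from typing import Optional, Tuple

# Excerpt of the voice table above: display name -> voice name.
VOICES_JSON = """
{
  "芊悦(Cherry)": "Cherry",
  "四川-晴儿(Sunny)": "Sunny",
  "粤语-阿强(Rocky)": "Rocky",
  "粤语-阿清(Kiki)": "Kiki"
}
"""


def split_label(label: str) -> Tuple[Optional[str], str]:
    """Split a display name into (prefix, rest).

    Assumption: a hyphen separates a region/dialect prefix from the name.
    Labels without a hyphen get a prefix of None.
    """
    if "-" in label:
        prefix, rest = label.split("-", 1)
        return prefix, rest
    return None, label


voices = json.loads(VOICES_JSON)

# Group voice names by their (assumed) dialect prefix.
by_dialect = {}
for label, voice in voices.items():
    prefix, _ = split_label(label)
    by_dialect.setdefault(prefix, []).append(voice)
```

Here `by_dialect["粤语"]` holds the two Cantonese voices from the excerpt, and `by_dialect[None]` holds the voices without a prefix.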

Step 1: Obtain and Configure Your API KEY

  1. Visit the Alibaba Cloud Bailian platform: https://bailian.console.aliyun.com/?tab=model#/api-key

  2. Log in to your Alibaba Cloud account (if you don't have one, register as prompted).

  3. On the API-KEY management page, click "Create API-KEY". The system automatically generates a string starting with "sk-". This is your API KEY. Copy this string.

  4. Return to pyVideoTrans, find TTS Settings in the top menu bar, click it, and select Qwen TTS from the dropdown menu.

  5. In the pop-up Qwen3 TTS configuration window, paste the API KEY you just copied into the "API KEY" input box. You can click the "Test" button to listen to a sample. If you hear sound, the configuration works. Finally, click Save.
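Before pasting the key, a quick format check can catch copy errors such as stray whitespace. The sketch below only tests the one property stated above, that the key starts with "sk-". It does not contact Alibaba Cloud, so a key that passes may still be invalid. The function name is hypothetical.

```python
def looks_like_api_key(key: str) -> bool:
    """Return True if the text has the shape described above.

    Checks only that the trimmed text starts with "sk-", has something
    after the prefix, and contains no whitespace. This is a format check,
    not a validity check.
    """
    key = key.strip()
    if not key.startswith("sk-") or len(key) <= len("sk-"):
        return False
    return not any(ch.isspace() for ch in key)
```

For example, `looks_like_api_key(" sk-abc123\n")` accepts a key copied with surrounding whitespace, while `looks_like_api_key("abc123")` rejects a string without the prefix.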

Step 2: Using Qwen3-TTS in Video Translation

Once configured, you can enable Qwen3-TTS when processing a single video.

  • In the main interface of pyVideoTrans, find the dropdown menu for "Dubbing Channel", click it, and select "Qwen3 TTS".
  • In the adjacent "Voice Role" menu, you can choose your preferred voice. For example, select "Cherry" for a standard female voice, or "Sunny" for a fun Sichuan dialect dubbing.

Step 3: Using in Batch Dubbing and Multi-Role Dubbing

Qwen-TTS can also be used in batch processing tasks.

  • Batch Dubbing for Subtitles: If you have multiple SRT subtitle files that need dubbing, you can switch to the "Batch Dubbing for Subtitles" interface. Similarly, select "Qwen TTS" as the "Dubbing Channel" and choose your desired voice role.
  • Multi-Role Dubbing for Subtitles: This feature also applies when processing dialogue involving multiple characters. You can assign different Qwen-TTS voices to different characters in the "Multi-Role Dubbing for Subtitles" function area.


2: Qwen3-TTS Local Deployment (Offline Version)

Version 3.95 and later support locally deployed Qwen3-TTS. If you are familiar with deployment, you can deploy it yourself and start the corresponding model. Source code deployment reference

To lower the deployment barrier for regular users, we have created a one-click integrated package specifically for Windows 10/11.

  • ✅ No need to manually install Python
  • ✅ No need to configure complex environment variables
  • ✅ Built-in environment management tool, ready to use after extraction
  • ✅ Automatically downloads models (configured with domestic download acceleration)

Step 1: Download and Extract the Integrated Package

  1. 【Important】 Please extract the compressed package to a path without Chinese characters or spaces (e.g., D:\AI\QwenTTS).
    • ❌ Bad example: C:\Users\张三\桌面\新建文件夹
    • ✅ Good example: D:\Tools\Qwen-TTS
  2. Open the folder. It contains 5 startup scripts (.bat files), which are described in Step 2 below.
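The path rule from step 1 can be checked with a few lines of Python. This is a minimal sketch: it rejects any non-ASCII character, which is stricter than "no Chinese characters", and the function name is hypothetical.

```python
def is_safe_extract_path(path: str) -> bool:
    """Return True if the path has no spaces and only ASCII characters.

    Assumption: rejecting every non-ASCII character is a safe
    over-approximation of the "no Chinese characters" rule.
    """
    return path.isascii() and " " not in path


# The two examples from the list above.
good = is_safe_extract_path(r"D:\Tools\Qwen-TTS")
bad = is_safe_extract_path(r"C:\Users\张三\桌面\新建文件夹")
```

With the examples from the list, `good` is `True` and `bad` is `False`.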

Step 2: Start the Service

Double-click the .bat file that matches your needs. The first run automatically downloads the model, which may take some time. When the line "* To create a public link, set share=True in launch()." appears, the service has started successfully.

Keep the black command-line window open while you use the service.

1. Voice Cloning Mode (Requires Reference Audio)

Suitable for cloning a voice based on a 3-10 second reference audio clip.

  • 🎧 Start Voice Cloning-0.6B Model.bat: Fast speed, low configuration requirements.
  • 🎧 Start Voice Cloning-1.7B Model.bat: More realistic effect, but slightly slower.
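To check that a reference clip falls in the 3-10 second range before using it, the duration can be read with Python's standard `wave` module. This sketch assumes the clip is an uncompressed WAV file. The source does not say which audio formats are accepted, and the function names are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """Return True if the clip length is within the 3-10 second range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Other formats such as MP3 would need a different reader, since `wave` handles WAV files only.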

2. Voice Design Mode

Suitable for creating voices through text descriptions.

  • 🎨 Start Voice Design.bat: Input a Prompt (e.g., "a deep-voiced middle-aged male") to design the voice.

3. Custom Voice Mode (Preset Characters)

Includes high-quality preset voices such as Vivian, Uncle_fu, and Sohee. This mode cannot use reference audio.

  • 👤 Start Custom Voice-0.6B Model.bat
  • 👤 Start Custom Voice-1.7B Model.bat

Step 3: Configure in pyVideoTrans

Ensure the command-line window from the previous step remains open, then open the pyVideoTrans software to connect.

  1. Go to Menu -> TTS Settings -> Qwen3 TTS (Local).
  2. In the WebUI URL field, enter: http://127.0.0.1:8000.
  3. Click "Test" and save.
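If the "Test" button fails, it can help to confirm that the local service answers at all. The sketch below sends a plain HTTP request to the WebUI URL from step 2 using only the standard library. It checks reachability only, not whether the model has finished loading, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def webui_is_reachable(url: str = "http://127.0.0.1:8000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    # Bypass any system proxy so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even though the status was an error.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False
```

A result of `False` usually means the command-line window was closed or the service is still starting.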

⚠️ Important Notes:

  • If you started a "Custom Voice" model (preset characters), in the pyVideoTrans dubbing settings, you must clear/delete the "Reference Audio" path, otherwise an error will occur.
  • If you started a "Voice Cloning" model, you must specify a reference audio clip in pyVideoTrans.
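The two rules above can be written as a small pre-flight check. This is an illustrative sketch only: the mode names `"custom_voice"` and `"voice_cloning"` and the function itself are hypothetical and do not come from pyVideoTrans. The source states no rule for Voice Design, so other modes pass unchecked.

```python
from typing import Optional


def check_reference_audio(mode: str, reference_audio: Optional[str]) -> None:
    """Raise ValueError if the reference audio setting conflicts with the mode."""
    has_reference = bool(reference_audio and reference_audio.strip())
    if mode == "custom_voice" and has_reference:
        # Preset characters cannot use reference audio.
        raise ValueError("Custom Voice mode: clear the Reference Audio path.")
    if mode == "voice_cloning" and not has_reference:
        # Cloning needs a clip to clone from.
        raise ValueError("Voice Cloning mode: specify a reference audio clip.")
```

Calling `check_reference_audio("voice_cloning", None)` raises an error, while `check_reference_audio("custom_voice", "")` passes.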

Appendix: Enabling GPU Acceleration (Optional)

The default configuration is CPU mode for compatibility with all computers. If you have an NVIDIA GPU with CUDA installed, you can enable acceleration (inference speed increases by about 10x) with the following steps:

  1. Right-click the corresponding .bat file and select "Edit".
  2. Delete the arguments --device cpu --dtype float32 at the end of the file.
  3. Save and run it again.
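The edit in step 2 can also be scripted. The sketch below removes the two arguments from the text of a .bat file. The sample command line is hypothetical, since the actual contents of the startup scripts are not shown here; only the `--device cpu --dtype float32` arguments come from the steps above.

```python
import re

# Matches the CPU-only arguments plus any spaces or tabs in front of them.
CPU_FLAGS = re.compile(r"[ \t]*--device[ \t]+cpu[ \t]+--dtype[ \t]+float32")


def remove_cpu_flags(bat_text: str) -> str:
    """Return the script text with the CPU-only arguments removed."""
    return CPU_FLAGS.sub("", bat_text)


# Hypothetical script line, for illustration only.
sample = "python app.py --port 8000 --device cpu --dtype float32\r\n"
edited = remove_cpu_flags(sample)
```

To apply this to a real script, read the file, pass its text through `remove_cpu_flags`, and write the result back. Keeping a backup copy of the original .bat file makes it easy to return to CPU mode.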