Zero Experience Needed, One-Click Run! Qwen3-TTS Speech Synthesis/Cloning All-in-One Package for Windows: Tutorial
Preface
Qwen3-TTS is a powerful text-to-speech (TTS) model. It can not only generate speech from text but also clone your voice, and it can even design a brand-new voice from a text description!
Deploying an open-source large model like this usually means setting up a Python environment and installing many dependency libraries, which is a high barrier for non-technical users.
I have built a one-click all-in-one package specifically for Windows 10/11:
- No need to install Python manually
- No need to configure complicated environment variables
- Built-in environment manager (uv.exe)
- Models download automatically (through a download mirror for faster speeds in mainland China)
Just download the package, extract it, and double-click a script to start using it!
Step 1: Download and Extract
- Download the all-in-one package archive I provided.
- Important: extract the package to a path that contains no Chinese characters or spaces (e.g., D:\AI\QwenTTS).
  - Wrong example: C:\Users\张三\Desktop\New Folder
  - Correct example: D:\Tools\Qwen-TTS
- Open the folder, and you should see the file structure shown in the image below:

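If you want to check a folder path before extracting, the path rule above can be written as a small Python check. This is a sketch, not part of the package: `is_safe_install_path` is a hypothetical helper, and it rejects every non-ASCII character, which is stricter than "no Chinese characters" but always catches them.

```python
def is_safe_install_path(path: str) -> bool:
    """True if `path` contains no whitespace and only ASCII characters.

    Hypothetical helper: rejecting every non-ASCII character is a stricter
    stand-in for the tutorial's "no Chinese characters" rule.
    """
    has_whitespace = any(ch.isspace() for ch in path)
    return path.isascii() and not has_whitespace


if __name__ == "__main__":
    for candidate in (r"C:\Users\张三\Desktop\New Folder", r"D:\Tools\Qwen-TTS"):
        verdict = "OK" if is_safe_install_path(candidate) else "move it to a simpler path"
        print(f"{candidate} -> {verdict}")
```

Run against the two examples from the list, the wrong path fails (it has Chinese characters and a space) and the correct one passes.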
Step 2: Choose the Function You Need (5 Startup Scripts)
The folder contains 5 .bat files, each for a different function and model size. Double-click the one that matches your needs and your computer's hardware.
1️⃣ If You Want to "Clone" a Voice (Based on Reference Audio)
In this mode, you upload a 3-10 second reference audio clip and the AI speaks in that voice.
- Start Voice Cloning-0.6B Model.bat
  - Features: fast, with low hardware requirements; good for trying it out.
- Start Voice Cloning-1.7B Model.bat
  - Features: better results and a more realistic voice, but slower generation and higher hardware requirements.
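To check that a reference recording falls in the 3-10 second range before uploading it, a minimal sketch using Python's standard `wave` module follows. It assumes the clip is an uncompressed PCM .wav file (MP3 and other formats need a different tool), and the function names are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed PCM .wav file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_clip(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """True if the clip falls inside the 3-10 second range recommended for cloning."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

For example, `is_good_reference_clip("my_voice.wav")` (a placeholder file name) returns False for a 1-second or 12-second recording, so you know to trim or re-record it first.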
2️⃣ If You Want to "Design" a Voice (Voice Design)
This mode needs no reference audio. Instead, describe the voice characteristics you want in text, e.g., "a deep, magnetic middle-aged male voice".
- Start Voice Design.bat (uses the 1.7B model)
  - How to use: enter a prompt describing the voice to create a unique one.
3️⃣ If You Want to Use "Preset" Characters (Custom Voice Tones)
This mode provides preset high-quality voices such as Vivian, Uncle_fu, and Sohee, which sound stable and pleasant.
- Start Custom Voice Tone-0.6B Model.bat
- Start Custom Voice Tone-1.7B Model.bat
- Note: In this mode you cannot use reference audio; you can only choose a character from the dropdown menu.
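The choice above comes down to two questions: which mode, and which model size. The sketch below maps those answers to the script names as they appear in this tutorial; the file names in your copy of the package may differ (for example, if they are in Chinese), and `choose_script` is a hypothetical helper.

```python
# Script names as listed in this tutorial (your package's names may differ).
SCRIPTS = {
    ("clone", "0.6B"): "Start Voice Cloning-0.6B Model.bat",
    ("clone", "1.7B"): "Start Voice Cloning-1.7B Model.bat",
    ("design", "1.7B"): "Start Voice Design.bat",
    ("custom", "0.6B"): "Start Custom Voice Tone-0.6B Model.bat",
    ("custom", "1.7B"): "Start Custom Voice Tone-1.7B Model.bat",
}


def choose_script(mode: str, size: str = "0.6B") -> str:
    """Return the .bat file for a mode ("clone", "design", "custom") and model size.

    Voice Design only exists as a 1.7B script, so `size` is ignored for it.
    """
    if mode == "design":
        size = "1.7B"
    try:
        return SCRIPTS[(mode, size)]
    except KeyError:
        raise ValueError(f"no script for mode={mode!r}, size={size!r}") from None
```

Defaulting to 0.6B follows the advice above: start with the faster, lighter model and move to 1.7B when quality matters more than speed.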
Step 3: Launch and Automatic Configuration
- Double-click the .bat file you selected.
- A black command-line window will appear. Do not close it!
- On the first run, the tool automatically sets up the environment and downloads the model files for you.
  - It comes with a built-in download mirror (hf-mirror.com) for faster downloads in mainland China.
  - Depending on your internet speed, this can take anywhere from a few minutes to more than ten minutes. Please be patient.
- The launch has succeeded when the following text appears in the black window:

```text
* To create a public link, set `share=True` in `launch()`.
```
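The tutorial does not show how the scripts configure hf-mirror.com. A common way to send Hugging Face downloads through a mirror is the `HF_ENDPOINT` environment variable, which the `huggingface_hub` library reads when it is imported. The sketch below assumes the launcher works this way and builds such an environment for a child process; `mirror_env` is a hypothetical helper, and the commented launch command is a placeholder, not the package's real command.

```python
import os


def mirror_env(endpoint: str = "https://hf-mirror.com") -> dict:
    """Copy the current environment and point Hugging Face downloads at a mirror.

    huggingface_hub reads HF_ENDPOINT at import time, so the variable must be
    set before the child process imports the library. os.environ itself is
    left unchanged.
    """
    env = os.environ.copy()
    env["HF_ENDPOINT"] = endpoint
    return env


# Placeholder usage (hypothetical command; use whatever your .bat file runs):
# subprocess.run(["uv.exe", "run", "app.py"], env=mirror_env(), check=True)
```

Passing the environment to the child process, rather than editing `os.environ`, keeps the mirror setting scoped to the one program that downloads models.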

Step 4: Start Using
- Open your browser (Chrome or Edge recommended).
- Type http://127.0.0.1:8000 into the address bar and press Enter.
- You will see the Qwen3-TTS interface:
  - Input box: enter the text you want the AI to read.
  - Reference Audio / Prompt: upload audio or enter a voice description, depending on which mode you launched.
  - Generate: click the button and witness the magic!
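If you script around the package (for example, before starting other software that uses it), you can wait for the web UI to respond instead of watching the black window. A minimal sketch using only the Python standard library; it only checks that something at the URL answers over HTTP, and `wait_for_webui` is a hypothetical name.

```python
import time
import urllib.error
import urllib.request

# The web UI runs locally, so bypass any system proxy that might intercept 127.0.0.1.
_LOCAL_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_webui(url: str = "http://127.0.0.1:8000", timeout: float = 600.0,
                   interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _LOCAL_OPENER.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status code
        except OSError:
            pass  # connection refused or timed out: the UI is not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_for_webui()` returns True once http://127.0.0.1:8000 responds. The 600-second default is an arbitrary allowance for the first-run model download, not a figure from the package.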

Advanced: Using with pyVideoTrans
If you use pyVideoTrans, this all-in-one package works directly with that video translation software:

- Ensure the black window remains open.
- Open the pyVideoTrans software.
- Go to Menu -> TTS Settings -> Qwen3 TTS(Local).
- In the WebUI URL field, enter http://127.0.0.1:8000
- Note: If you launched a "Custom Voice Tone" script, clear the reference audio settings in pyVideoTrans; otherwise it will cause an error.
Expert Advanced: How to Enable Graphics Card (GPU) Acceleration
By default, I set the scripts to CPU mode so that every computer, including laptops without a dedicated graphics card, can run them.
If you have an NVIDIA graphics card and have already installed the CUDA environment, a simple modification can make inference more than 10x faster!
- Right-click the .bat file you want to modify and select "Edit" (or open it with Notepad).
- Near the end of the file, find the following code:

```batch
--device cpu --dtype float32
```

- Delete this code (that is, remove --device cpu --dtype float32).
- Save the file and double-click it to run it again. The program will automatically use your GPU for acceleration.
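If you want to apply the edit described above to several scripts, a small Python sketch can do it. It assumes the flags appear on one line exactly as shown, and it assumes UTF-8 encoding (scripts saved by Chinese-locale Windows editors may be GBK, so pass `encoding="gbk"` if decoding fails). `remove_cpu_flags` and `enable_gpu` are hypothetical helpers.

```python
import re

# The CPU-only flags described above, plus any spaces or tabs before them.
# Only spaces/tabs are consumed, so line breaks (and ^ continuations) stay intact.
CPU_FLAGS = re.compile(r"[ \t]*--device[ \t]+cpu[ \t]+--dtype[ \t]+float32")


def remove_cpu_flags(bat_text: str) -> str:
    """Return the script text with '--device cpu --dtype float32' removed."""
    return CPU_FLAGS.sub("", bat_text)


def enable_gpu(bat_path: str, encoding: str = "utf-8") -> bool:
    """Edit a .bat file in place; return True if the CPU flags were found.

    newline="" preserves the file's original CRLF line endings.
    """
    with open(bat_path, encoding=encoding, newline="") as f:
        original = f.read()
    updated = remove_cpu_flags(original)
    if updated == original:
        return False
    with open(bat_path, "w", encoding=encoding, newline="") as f:
        f.write(updated)
    return True
```

Keep a copy of the original file first, so you can switch back to CPU mode if GPU mode does not work on your machine.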
Frequently Asked Questions
- Q: The window flashes and closes right after I double-click. What should I do?
  - A: Check whether the extraction path contains Chinese characters or spaces. Also make sure the Microsoft Visual C++ (VC++) runtime libraries are installed (computers used for gaming usually already have them).
- Q: Why is generation so slow?
  - A: The default CPU mode is indeed slower than GPU mode. If you have an NVIDIA card, enable acceleration as described in the "Expert Advanced" section. It is also normal for the 1.7B model to be slower than the 0.6B model.
- Q: Why does it seem stuck on the first launch?
  - A: It is downloading the model files, which are large (several GB). Check the black window for a progress bar or download messages. As long as no error appears, please wait patiently.
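Because the first launch downloads several GB, it can also help to confirm that the drive you extracted to has room. A minimal sketch with Python's `shutil.disk_usage`; `enough_space_for_models` is a hypothetical helper, and the 10 GB default is an assumed safety margin, since the tutorial only says "several GB".

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space, in gigabytes (10**9 bytes), on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 10**9


def enough_space_for_models(path: str = ".", required_gb: float = 10.0) -> bool:
    """Hypothetical pre-check; 10 GB is an assumed margin, not an official figure."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free space here: {free_gb():.1f} GB")
```

Run it from the extracted folder (for example D:\Tools\Qwen-TTS) so the check measures the drive the models will be saved to.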
