pyVideoTrans FAQ & Solutions
To help you use pyVideoTrans more effectively, we've compiled the following common issues and their solutions.
The menu bar -> Help/About section contains many helpful links, such as model download addresses and CUDA configuration. If you encounter problems, try clicking on these links.

Part 1: Installation & Startup Issues
1. After double-clicking sp.exe, the software doesn't open or doesn't respond for a long time?
This is usually normal behavior. Please don't worry.
- Reason: This software is built on PySide6, and the main interface contains many components. The first load requires initialization, which takes some time. Depending on your computer's performance, startup can take anywhere from 5 seconds to 2 minutes.
- Solutions:
- Wait Patiently: After double-clicking, give the program up to two minutes to appear.
- Check Security Software: Some antivirus or security software may block the program startup. Try temporarily disabling them or adding this software to your trusted/whitelist.
- Check File Path: Ensure the software installation path contains only English letters and numbers. It must not contain Chinese characters, spaces, or special symbols. For example, D:\pyVideoTrans is a good path, while D:\program file\video tool might cause problems.
- Upgrade Package Issues: If the software fails to start after applying an upgrade patch, the patch was most likely applied incorrectly. Re-download the full software package, extract it, and then extract the new upgrade patch over it.
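As a rough illustration of the path rule above, the following Python sketch checks whether every folder name in a Windows path uses only English letters, digits, and a few punctuation characters. The function name is_safe_install_path and the exact allowed character set (letters, digits, underscore, hyphen, dot) are assumptions made for this example; this is not the software's own validation logic.

```python
import re
from pathlib import PureWindowsPath

# Assumed rule: each folder name may contain only ASCII letters, digits,
# underscore, hyphen, or dot. Illustrative only, not pyVideoTrans code.
_SAFE_PART = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_install_path(path: str) -> bool:
    """Return True if every folder name in a Windows path looks safe."""
    win_path = PureWindowsPath(path)
    parts = win_path.parts
    # The first part is the drive/root (e.g. "D:\\"); skip it when present.
    folders = parts[1:] if win_path.anchor else parts
    return bool(folders) and all(_SAFE_PART.fullmatch(p) for p in folders)


print(is_safe_install_path(r"D:\pyVideoTrans"))             # True
print(is_safe_install_path(r"D:\program file\video tool"))  # False (spaces)
```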
2. What should I do if startup reports a missing python310.dll file?
This indicates you only downloaded the upgrade patch, not the main program.
- Solution:
- First, go to the official website and download the complete software package.
- Extract the complete package to a specified directory.
- Then download the latest upgrade patch and extract it into the complete package's directory, overwriting files.
3. Does the software require installation?
This software is a portable version and does not require installation. Download the complete package, extract it, and double-click sp.exe to run it directly.
4. Why does antivirus software flag it as a virus or block it?
- Reason: This software is packaged using the PyInstaller tool and lacks a commercial digital signature. Some security software may raise risk warnings based on this; this is a common false positive.
- Solutions:
- Add to Trusted List: Add this software to your antivirus software's trusted zone or whitelist.
- Run from Source Code: If you are a developer, you can choose to deploy and run it directly from the source code to completely avoid this issue.
5. Does the software support Windows 7?
No, it does not. Many core components this software relies on no longer support Windows 7.
Part 2: Core Functions & Settings
6. How to improve speech recognition accuracy?
Recognition accuracy mainly depends on the model size you choose.
- Model Selection: In "faster" or "openai" mode, the larger the model, the higher the accuracy, but the slower the processing speed and the greater the resource consumption.
- tiny: Smallest size, fastest speed, but lower accuracy.
- base/small/medium: Balanced performance and resource consumption, a common choice.
- large-v3: Largest size, best performance, but highest hardware requirements.
- Optimization Settings: Click Menu -> Tools -> Advanced Options, find the faster/openai speech recognition adjustment section, and make the following changes:
- Voice threshold: set to 0.5
- Minimum duration / milliseconds: set to 3000
- Maximum voice duration / seconds: set to 6
- Silence separation milliseconds: set to 140
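For readers running from source, the four values above resemble the voice activity detection (VAD) options that the faster-whisper library accepts through its vad_parameters argument. The sketch below shows how such values could be passed. The option names exist in faster-whisper, but the mapping from the UI labels to those names is an assumption, and transcribe_with_vad is a hypothetical helper, not part of pyVideoTrans.

```python
# Assumed mapping from the UI labels above to faster-whisper VAD option names.
# The option names are real; which label maps to which option is a guess.
VAD_SETTINGS = {
    "threshold": 0.5,                # "Voice threshold"
    "min_speech_duration_ms": 3000,  # "Minimum duration / milliseconds"
    "max_speech_duration_s": 6,      # "Maximum voice duration / seconds"
    "min_silence_duration_ms": 140,  # "Silence separation milliseconds"
}


def transcribe_with_vad(audio_path: str, model_size: str = "small"):
    """Transcribe one file with VAD filtering enabled (needs faster-whisper)."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(
        audio_path, vad_filter=True, vad_parameters=VAD_SETTINGS
    )
    # Segments are produced lazily; collecting them runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```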
7. Why did the video quality/definition decrease after processing?
Any operation involving re-encoding will inevitably lead to some video quality loss. To maximize original quality preservation, ensure all the following conditions are met:
- Original Video Format: Use the most compatible H.264 (libx264) encoded MP4 file.
- Disable Slow Processing: In the function options, do NOT check "Video Auto Slow Mode".
- Do Not Burn Hard Subtitles: Choose not to embed subtitles, or only embed soft subtitles. Hard subtitles force the entire video to be re-encoded.
- Advanced Options - Video Output Quality Control: The default value is 23. You can lower it to 18 or lower (minimum 0). The lower the value, the higher the output video quality, but the larger the file size.
- Advanced Options - Output Video Compression Rate: The default is fast. You can choose slow or slower for higher quality, but processing time will increase.
- Advanced Options - 264/265 Encoding: The default is 264. You can choose 265 for higher output video quality at the same bitrate.
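The quality options above have the same value ranges as ffmpeg's standard encoder settings: a 0-51 quality scale with 23 as default (CRF), named speed presets such as fast and slow, and a choice between libx264 and libx265. Assuming that correspondence, which the FAQ does not state explicitly, the sketch below builds an equivalent ffmpeg command line. The function build_encode_command is hypothetical, and the software's actual command may differ.

```python
def build_encode_command(src: str, dst: str, crf: int = 23,
                         preset: str = "fast", codec: str = "264") -> list[str]:
    """Build an ffmpeg argument list for a quality-controlled re-encode."""
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    if codec not in ("264", "265"):
        raise ValueError("codec must be '264' or '265'")
    encoder = "libx264" if codec == "264" else "libx265"
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", encoder,
        "-crf", str(crf),   # lower value = higher quality, larger file
        "-preset", preset,  # slower preset = better compression, longer run
        "-c:a", "copy",     # leave the audio stream untouched
        dst,
    ]


# Higher-quality settings than the defaults: CRF 18 with the "slow" preset.
print(" ".join(build_encode_command("in.mp4", "out.mp4", crf=18, preset="slow")))
```

Raising crf toward 51 in the same call shrinks the output file at the cost of quality.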
8. Why is the output video extremely large?
- Modify Advanced Options - Video Output Quality Control to 25-51. A larger number results in a smaller output video size, but also lower quality.
- Advanced Options - 264/265 Encoding: Choose 265. Under the same quality, 265 produces a smaller file size.
9. How to configure a network proxy?
Some translation or TTS services (e.g., Google, OpenAI, Gemini) are not directly accessible from certain regions and require a network proxy.
- Configuration Method: Enter your proxy service address in the "Network Proxy Address" text box on the main interface.
- Format: Usually in the format http://127.0.0.1:10808 (the port number should match your proxy client settings).
- Important: If you are unfamiliar with proxies or don't have one available, leave this field empty. Incorrect settings will cause errors.
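Before pasting an address into the proxy field, it can help to confirm that it has a scheme, a host, and a numeric port. The sketch below is one way to do that with Python's standard library. The function check_proxy_address and the list of accepted schemes (http, https, socks5) are assumptions for illustration; the software may accept a different set.

```python
from urllib.parse import urlsplit


def check_proxy_address(address: str) -> str:
    """Return an empty string if the address looks usable, else a short reason."""
    address = address.strip()
    if not address:
        return ""  # empty is fine: no proxy will be used
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https", "socks5"):  # assumed scheme list
        return "address should start with http://, https://, or socks5://"
    if not parts.hostname:
        return "missing host name or IP address"
    try:
        port = parts.port  # raises ValueError when the port is not a number
    except ValueError:
        return "port is not a valid number"
    if port is None:
        return "missing port number"
    return ""


print(repr(check_proxy_address("http://127.0.0.1:10808")))
print(repr(check_proxy_address("127.0.0.1:10808")))
```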
10. How to customize subtitle font, color, and style?
Click on the main interface -> Set more parameters -> Modify hard subtitles.
Part 3: Common Issues & Troubleshooting
10. When batch translating videos (e.g., 30-50-100 videos), it always gets stuck?
By default, batch tasks divide each job into multiple phases and process them concurrently. Too many tasks can exhaust system resources. You can select Advanced Options -> Force serial batch translation to switch to serial execution, where each video starts only after the previous one is fully processed.
11. Why are the audio, subtitles, and video out of sync after processing?
This is a normal phenomenon in language translation.
- Reason: When expressing the same idea in different languages, sentence length and syllable count differ, causing pronunciation duration to change. For example, a 2-second Chinese sentence, when translated into English, might result in a 4-second audio clip. This duration change makes it impossible to perfectly align the dubbed audio with the original video's lip movements and timeline.
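To make the arithmetic in that example concrete, the sketch below computes how much faster a dubbed clip would have to play to fit into the original time slot. It only illustrates the size of the mismatch; required_speedup is a hypothetical helper, and the software's own alignment strategy is not described here.

```python
def required_speedup(original_seconds: float, dubbed_seconds: float) -> float:
    """Playback-rate factor needed for the dubbed clip to fit the original slot."""
    if original_seconds <= 0 or dubbed_seconds <= 0:
        raise ValueError("durations must be positive")
    # A factor of 1.0 means the dubbed clip already fits without speeding up.
    return max(1.0, dubbed_seconds / original_seconds)


# The example from the text: a 2-second line becomes a 4-second dubbed clip.
print(required_speedup(2.0, 4.0))  # 2.0, i.e. the audio would play twice as fast
```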
12. Constantly getting "Out of VRAM" errors (e.g., Unable to allocate)?
This error means your graphics card doesn't have enough VRAM (or system RAM) to perform the current task, often due to using a large model or processing long videos.
- Solutions (try in recommended order):
- Use a Smaller Model: Switch the recognition model from large-v3 to medium, small, or base. The large-v3 model requires at least 8GB of VRAM, but other programs also consume VRAM during runtime.
- Adjust Advanced Settings: In the menu bar Tools/Options -> Advanced Options, make the following changes to trade some accuracy for lower VRAM usage:
- CUDA data type: Change float32 to float16 or int8.
- beam_size: Change 5 to 1.
- best_of: Change 5 to 1.
- Context: Change true to false.
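For readers running from source, the low-VRAM settings listed above look like standard faster-whisper options. The sketch below shows how they could be applied when calling that library directly. The mapping is an assumption: in particular, "Context" is guessed to correspond to condition_on_previous_text, and transcribe_low_vram is a hypothetical helper.

```python
# Assumed faster-whisper equivalents of the settings listed above.
LOW_VRAM_MODEL_OPTIONS = {
    "device": "cuda",
    "compute_type": "int8",  # "CUDA data type": float32 -> float16 or int8
}
LOW_VRAM_DECODE_OPTIONS = {
    "beam_size": 1,
    "best_of": 1,
    "condition_on_previous_text": False,  # assumed meaning of "Context"
}


def transcribe_low_vram(audio_path: str, model_size: str = "medium"):
    """Transcribe with memory-saving options (needs faster-whisper and a GPU)."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel(model_size, **LOW_VRAM_MODEL_OPTIONS)
    segments, _info = model.transcribe(audio_path, **LOW_VRAM_DECODE_OPTIONS)
    return [seg.text for seg in segments]
```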
13. I have CUDA installed, but the software still can't use GPU acceleration?
Please check the following possible causes:
- CUDA Version Mismatch: The CUDA support version built into this software is 12.8. If your CUDA version is too old, it won't be utilized.
- Outdated Graphics Driver: Please update your NVIDIA graphics driver to the latest version.
- Missing cuDNN: Ensure you have correctly installed cuDNN matching your CUDA version.
- Hardware Incompatibility: GPU acceleration only supports NVIDIA graphics cards. AMD or Intel graphics cards cannot use CUDA.
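A quick way to check the driver and hardware items above is to ask the NVIDIA driver's own nvidia-smi tool what it can see. The sketch below wraps that call in Python. It assumes nvidia-smi is on the PATH, which is normally the case once the NVIDIA driver is installed; the helper names are hypothetical, and an empty result only suggests that no usable NVIDIA GPU or driver was found.

```python
import shutil
import subprocess


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of the form 'name, driver_version, memory_total_mib'."""
    name, driver, memory = [field.strip() for field in line.rsplit(",", 2)]
    return {"name": name, "driver": driver, "memory_mib": int(memory)}


def list_nvidia_gpus() -> list[dict]:
    """Return detected NVIDIA GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            continue  # skip blank lines or fields reported as "[N/A]"
    return gpus


print(list_nvidia_gpus())
```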
14. Getting an error during execution containing "ffprobe exec error" or ffmpeg?
This error is usually related to file paths being too long or containing special symbols.
- Reason: Windows has a maximum path length limit (usually 260 characters). If your video file name is very long (e.g., downloaded from YouTube) and stored in deeply nested folders, the total path easily exceeds this limit.
- Solution: Move the video file to a shallower directory (e.g., D:\videos) and rename it to a short English or numeric name.
15. The software says the video "contains no audio track"?
- Possible Reason 1: The video genuinely has no sound. For example, videos downloaded from YouTube or other sites might have video and audio streams separate, and an error during merging could cause audio loss.
- Possible Reason 2: Excessive background noise. If the video environment is very noisy (e.g., street, concert), the human voice might be masked, preventing the model from recognizing valid speech.
- Possible Reason 3: Incorrect language selection. Ensure the language you selected in the "Original Language" option matches the language actually spoken in the video. For example, if the video has English dialogue, you must select "English" for correct recognition.
16. GPU usage is very low. Is this normal?
Yes, this is normal. The software workflow is: Speech Recognition -> Text Translation -> Text-to-Speech -> Video Composition.
Only during the first step, "Speech Recognition", does the software heavily utilize the GPU for computation. Other steps (like translation, composition) primarily rely on the CPU. Therefore, the GPU being in a low-load state most of the time is expected behavior.
17. I keep processing the same video, but the recognition results and subtitles never change.
- Reason: To save time and computational resources, the software enables a caching mechanism by default. If it detects that subtitle files have already been generated for a video, it uses the cached result without reprocessing.
- Solution: If you want to force re-recognition and translation, check the Clean generated checkbox in the upper left corner of the software's main interface.

18. After processing a few videos, my hard drive space is full?
This is often due to enabling the "Video Slow Mode" function, which generates many temporary files.
- Reason: This function splits the video into many small segments based on subtitles and processes each segment, creating cache files much larger than the original video size.
- Solution:
- Manual Cleanup: After processing, you can manually delete the contents of the tmp folder in the software's root directory.
- Auto Cleanup: The program will automatically clean these caches when you close the software normally.
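The manual cleanup step can also be scripted. The sketch below measures a folder and then deletes everything inside it while keeping the folder itself. The helper names are hypothetical, and the path you pass in is your own responsibility: point it at the tmp folder inside your pyVideoTrans directory, and only while the software is closed.

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Total size of all regular files under a folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def clear_folder(folder: Path) -> int:
    """Delete everything inside a folder (not the folder itself); return bytes freed."""
    if not folder.is_dir():
        return 0
    freed = folder_size_bytes(folder)
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)
    return freed


# Example call (commented out so nothing is deleted by accident):
# print(clear_folder(Path(r"D:\pyVideoTrans") / "tmp"))
```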
Part 4: Comprehensive Information
18. Does the software support Docker deployment?
Currently, no.
19. Can it recognize hard-coded subtitles in the video (OCR function)?
No, it cannot. The software works by analyzing the audio track in the video, recognizing human speech, and converting it to text. It does not have Optical Character Recognition (OCR) capabilities. If needed, you can check out another project for recognizing hard subtitles in videos based on the Zhipu AI model.
20. Can I add support for new languages?
Yes, you can add new target languages. Click here for details
21. Is the software paid? Can I use it commercially?
- Cost: This project is free and open-source software. You can use all features for free. Please note that if you use third-party translation, TTS (Text-to-Speech), or STT (Speech-to-Text) APIs, those service providers may charge fees, but this is unrelated to the software itself.
- Commercial Use: Individuals and companies are free to use this software. However, if you wish to integrate this project's code into your own commercial products, you must comply with the GPL-v3 open-source license. Additionally, some models or online APIs used might have their own license agreements. Whether commercial use is permitted depends on the specific platform you are using (e.g., consult Microsoft for the Edge-TTS channel, consult https://github.com/2noise/ChatTTS for the ChatTTS TTS channel).
22. Is human customer support provided?
No. This project is a free, open-source software developed by an individual with no revenue, so dedicated human support staff cannot be provided. If you encounter problems, please read this FAQ carefully first. You can also choose to support the project by scanning the WeChat QR code in the bottom right corner of the software, leaving your WeChat ID for paid technical support.
23. Where can I download the software and models?
- Software Download: pyvideotrans.com/downpackage
- Source Code Repository: github.com/jianchang512/pyvideotrans
24. Errors & Logs
- Log Location: The logs folder in the software's root directory contains log files named with the current year, month, and day.
- Feedback Method: When an error occurs, click the "Report Error" button in the popup window to automatically submit it to the official forum. Or, copy the last 30 lines of the log and ask an AI.
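Copying the last 30 lines by hand is easy to get wrong with large log files, so a small script can help. The sketch below picks the most recently modified file in a logs folder and returns its final lines. Because the exact log file name pattern is not given here, it selects by modification time instead of by name; tail_latest_log is a hypothetical helper.

```python
from pathlib import Path


def tail_latest_log(logs_dir: Path, lines: int = 30) -> list[str]:
    """Return the last `lines` lines of the most recently modified file in logs_dir."""
    if not logs_dir.is_dir():
        return []
    candidates = [p for p in logs_dir.iterdir() if p.is_file()]
    if not candidates:
        return []
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    # errors="replace" keeps the script working if the log has odd bytes in it.
    text = latest.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]


# Example: print the tail of the newest log next to this script.
print("\n".join(tail_latest_log(Path("logs"))))
```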
25. Speech Recognition Issues
- Problem: The recognition result is empty or garbled.
- Solutions:
- Check if the "Original Language" is selected correctly (avoid over-reliance on Auto detection).
- Check if the video has background music interference (try enabling noise reduction).
- Insufficient VRAM: Lower beam_size, switch to int8 quantization, or use the small model.
- Problem: Prompt says VRAM or RAM is insufficient.
- Solutions:
- If using the large-v3 model, it might genuinely be insufficient. Try using a smaller model.
- If already using a smaller model, check if you have multiple available graphics cards. The first card might have too little available VRAM, causing this prompt. The software defaults to the first available GPU. Try upgrading to v3.98-317, which defaults to the card with the most available VRAM when multiple GPUs are present.
26. Translation Issues
- Problem: Translation results contain blank lines or include prompt words.
- Solutions:
- Local small models (e.g., 7B) may lack intelligence. Switch to online models like DeepSeek/GPT-4.
- Disable the "Send complete subtitles" option, and switch to line-by-line translation.
- Set trans_thread=1 to reduce concurrency.
- Click here for detailed principles and solutions
27. TTS Issues
- Problem: Edge-TTS reports error 403 or generates silence.
- Solutions: Microsoft is rate-limiting. In "Advanced Options", set "Concurrent TTS threads" to 1, and "Pause seconds after TTS" to 5-10 seconds.
- Problem: Cannot connect to F5-TTS/CosyVoice.
- Solutions: Ensure the terminal window for the external TTS service is not closed, and the API address is configured correctly (pay attention to the port number).
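The Edge-TTS advice above (one thread, a pause after each request) amounts to sending requests strictly one at a time with a delay between them. The sketch below shows that pattern in isolation. The synthesize argument is a hypothetical stand-in for whatever function performs a single TTS request; this is not the software's internal code.

```python
import time
from typing import Callable, Iterable


def synthesize_serially(texts: Iterable[str],
                        synthesize: Callable[[str], bytes],
                        pause_seconds: float = 5.0) -> list[bytes]:
    """Call `synthesize` one text at a time, sleeping between requests."""
    results = []
    for index, text in enumerate(texts):
        if index > 0:
            time.sleep(pause_seconds)  # the "Pause seconds after TTS" idea
        results.append(synthesize(text))
    return results


# Demo with a fake synthesizer and no pause, so it finishes immediately.
print(synthesize_serially(["hello", "world"], lambda t: t.encode(), pause_seconds=0))
```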
28. Environment & Network
- Problem: Model download fails.
- Solutions: HuggingFace may be inaccessible from your region. Configure a "Network Proxy", or manually download models from a mirror site and place them in the models directory.
- Problem: GPU is not being utilized.
- Solutions: Confirm that CUDA 12.8 and cuDNN 9.x are installed. AMD graphics cards do not support CUDA.
29. Suspected bugs in versions below v3.98
- Solution: Upgrading to v3.98 may resolve the issue.
30. Error when using "Batch dubbing for subtitles" in v3.98
- Problem: An SRT or TXT file is already imported, but the prompt still says "Import SRT or Fill Text" or "Must import srt file or fill in text in the text box".
- Solutions: This is a bug. Please download sp.exe and overwrite the existing one. Download link
31. Experiencing "dll not loaded or not found" error with AzureTTS
- Problem: Could not find module ... Microsoft.CognitiveServices.Speech.core.dll' (or one of its dependencies)
- Solutions: If you downloaded a patch package, please re-download the complete package. If you already have the complete version, your operating system might be missing VC++ runtime components. Try installing Microsoft's VC++ components and then restart the software. Microsoft VC++ components download link
32. Why was the "Auto Detect" option removed from the pronunciation language list in the new version?
- Answer: You can still choose "Auto Detect" in the "Batch Audio-to-Text" function panel. "Auto Detect" was removed from the "Transcribe Video or Audio" function because subsequent steps in video translation (like subtitle translation, TTS involving reference audio) require a specific original language for certain channels. Otherwise, errors occur. Also, some speech recognition channels do not return the detected language code, making mid-process updates difficult. Therefore, after careful consideration, "Auto Detect" was removed from video translation. Please specify the language explicitly. If you only want to transcribe speech to text, you can use the "Batch Audio-to-Text" function in the left panel separately.
33. File Path Issues
- Issues with the path of the input audio, video, or subtitle file
- The software internally relies on the command line to use ffmpeg, and command line length is limited, especially on Windows. If the full file path (from drive letter to the end of the file name) is very long, for example exceeding 200 characters, errors are highly likely on Windows. The solution is to rename the file to a very short name and move it from a deeply nested directory to a shallow one.
- If the input file path or name contains special symbols such as ?* or emojis, errors are very likely, especially on Windows. This is particularly common for videos downloaded from YouTube, which often have long titles containing various emojis and special symbols. Using such paths directly in the Windows terminal without processing will likely cause errors. The solution is the same: remove special symbols and rename to a short name.
- If the software is installed in a deeply nested directory, problems are more noticeable on Windows due to its default command line length restrictions. The solution is to move it to a shallow directory that does not require administrator privileges.
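The renaming advice above can be partly automated. The sketch below flags paths that are long or contain risky characters and suggests a short replacement file name. The 200-character threshold comes from the text above; the conservative rule of keeping only English letters, digits, underscore, and hyphen is an assumption, as are the helper names.

```python
import re
from pathlib import PureWindowsPath

MAX_PATH_CHARS = 200  # threshold mentioned above; not an exact Windows limit


def suggest_safe_name(filename: str, max_stem: int = 40) -> str:
    """Keep only ASCII letters, digits, '_' and '-' in the stem, then shorten it."""
    path = PureWindowsPath(filename)
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return (stem[:max_stem] or "video") + path.suffix.lower()


def path_problems(full_path: str) -> list[str]:
    """List likely problems with an input file path."""
    problems = []
    if len(full_path) > MAX_PATH_CHARS:
        problems.append(f"path is longer than {MAX_PATH_CHARS} characters")
    name = PureWindowsPath(full_path).name
    # Conservative check: anything beyond letters, digits, space, '_', '.', '-'.
    if re.search(r"[^A-Za-z0-9_.\- ]", name):
        problems.append("file name contains special symbols or non-ASCII characters")
    return problems


print(path_problems(r"D:\videos\clip01.mp4"))
print(suggest_safe_name("My Video \U0001F3AC (2024)?.MP4"))
```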
