An ideal translated video should have the following characteristics: accurate subtitles of appropriate length, voiceover timbre matching the original, and perfect synchronization between subtitles, audio, and visuals.
This guide details the four steps of video translation and provides optimal configuration recommendations for each step.
Step 1: Speech Recognition
Goal: Convert the speech in the video into a subtitle file in the corresponding language.
Corresponding Control Element: The "Speech Recognition" row

Best Configuration for Non-Chinese:
- Free: faster-whisper (local) or openai-whisper (local), with the large-v3 model selected
- Paid: OpenAI API interface
Best Configuration for Chinese:
- Free: Alibaba FunASR
- Paid: Alibaba Bailian ASR, Doubao Speech Recognition Large Model
Best Configuration for Japanese:
- Free: Huggingface_ASR -> kotoba-tech/kotoba-whisper-v2.0 or reazon-research/japanese-wav2vec2-large-rs35kh
Best Configuration for Other Languages:
- Paid: Gemini Large Model Recognition
- Paid: openai-api
Note: Without an Nvidia GPU and a properly configured CUDA environment, local models run extremely slowly, and they may crash if VRAM is insufficient.
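To illustrate what Step 1 produces, the sketch below converts recognized speech segments into an SRT subtitle file. The segment fields (start, end, text) match what faster-whisper yields; the file names are placeholders, and the model call under the `__main__` guard requires `pip install faster-whisper` plus a model download on first run.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) segments as SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

if __name__ == "__main__":
    # Heavy step kept out of import time: loads large-v3 on an Nvidia GPU.
    from faster_whisper import WhisperModel
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segs, _info = model.transcribe("video.mp4")
    with open("video.srt", "w", encoding="utf-8") as f:
        f.write(segments_to_srt((s.start, s.end, s.text) for s in segs))
```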
Step 2: Subtitle Translation
Goal: Translate the subtitle file generated in Step 1 into the target language.
Corresponding Control Element: The "Translation Channel" row

Best Configuration:
- Preferred AI Channels (Paid): DeepSeek / Gemini / OpenAI ChatGPT / Alibaba Bailian
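As a rough sketch of how subtitle translation through an AI channel works, the code below numbers a batch of subtitle lines, asks the model to translate them one-per-line, and checks that the line count survives the round trip. DeepSeek exposes an OpenAI-compatible endpoint; the endpoint, model name, and prompt wording here are illustrative assumptions, not pyvideotrans's exact internals.

```python
def build_prompt(lines, target_lang="English"):
    """Number the subtitle lines so the reply can be matched back up."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(lines, 1))
    return (f"Translate the following numbered subtitle lines into {target_lang}. "
            f"Keep the numbering and return exactly one line per number:\n{numbered}")

def parse_reply(reply: str, expected: int):
    """Strip the numbering; fail loudly if the model merged or split lines."""
    out = []
    for line in reply.strip().splitlines():
        line = line.strip()
        if line and line[0].isdigit():
            out.append(line.split(".", 1)[1].strip() if "." in line else line)
    if len(out) != expected:
        raise ValueError("line count mismatch; retry with a smaller batch")
    return out

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai
    client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")
    prompt = build_prompt(["你好", "再见"])
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}])
    print(parse_reply(resp.choices[0].message.content, expected=2))
```

Batching lines (rather than translating one at a time) gives the model sentence-level context, which noticeably improves subtitle quality.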
Step 3: Dubbing
Goal: Generate voiceover based on the translated subtitle file.
Corresponding Control Element: The "Dubbing Channel" row

Best Configuration:
- Free: Edge-TTS: Free and supports all languages.
- Free (Chinese, English, Japanese, Korean):
F5-TTS/Index-TTS/GPT-SOVITS/CosyVoice(local) - Paid: Doubao TTS / Qwen-TTS / 302.AI / Minimaxi / OpenAI-TTS
Additional installation of corresponding
F5-TTS/CosyVoice/clone-voice/GPT-SOVITSintegration packages is required. See documentation: https://pyvideotrans.com/f5tts.html
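A minimal Edge-TTS dubbing sketch (`pip install edge-tts`) is shown below. The voice names are a small sample of real Edge voices chosen here for illustration; pyvideotrans exposes the full voice list in its UI.

```python
import asyncio

# A few known Edge voices, keyed by language code (illustrative subset).
SAMPLE_VOICES = {
    "zh": "zh-CN-XiaoxiaoNeural",
    "en": "en-US-AriaNeural",
    "ja": "ja-JP-NanamiNeural",
}

def pick_voice(lang: str) -> str:
    """Return a default voice for a language code, falling back to English."""
    return SAMPLE_VOICES.get(lang, SAMPLE_VOICES["en"])

async def dub_line(text: str, lang: str, out_path: str) -> None:
    """Synthesize one subtitle line to an audio file via Edge-TTS (needs network)."""
    import edge_tts
    await edge_tts.Communicate(text, pick_voice(lang)).save(out_path)

if __name__ == "__main__":
    asyncio.run(dub_line("Hello, world.", "en", "line_001.mp3"))
```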
Step 4: Synchronization of Subtitles, Dubbing, and Video
- Goal: Synchronize subtitles, dubbing, and video.
- Corresponding Control Element: The "Sync & Align" row
- Best Configuration:
  - Enable "Secondary Recognition". This performs speech recognition on the final voiceover file to generate subtitles with a more precise timeline.
  - When translating Chinese to English, set the "Dubbing Speed" value (e.g., 10 or 15) to speed up the voiceover, since English sentences are often longer.
  - Enable both the "Speed Up Dubbing" and "Slow Down Video" options to force alignment of subtitles, audio, and video.
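The trade-off behind "Speed Up Dubbing" and "Slow Down Video" can be sketched as follows: when a dubbed clip overruns its subtitle slot, part of the correction comes from speeding up the audio and the remainder from stretching the video. The 1.5x audio cap below is an assumption for illustration, not pyvideotrans's actual limit.

```python
def alignment_factors(audio_dur: float, slot_dur: float, max_atempo: float = 1.5):
    """Return (atempo, setpts) so that audio_dur / atempo == slot_dur * setpts.

    atempo > 1 speeds up the dubbing; setpts > 1 slows down (stretches) the video.
    """
    if audio_dur <= slot_dur:
        return 1.0, 1.0  # dubbing already fits its subtitle slot
    atempo = min(audio_dur / slot_dur, max_atempo)   # cap the audio speed-up
    setpts = (audio_dur / atempo) / slot_dur         # video absorbs the rest
    return atempo, setpts

# The factors map directly onto ffmpeg filters, e.g. for atempo=1.5, setpts=1.33:
#   ffmpeg -i dub.wav -filter:a atempo=1.5 dub_fast.wav
#   ffmpeg -i clip.mp4 -filter:v setpts=1.33*PTS clip_slow.mp4
```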
Output Video Quality Control
- The default output uses lossy compression. If you need lossless output, go to Menu -> Tools -> Advanced Options -> Video Output Control and set "Video Transcoding Loss Control" to 0.
- Note: If the original video is not in mp4 format, or hardcoded (burned-in) subtitles are used, re-encoding the video will introduce some loss, though it is usually minimal. Raising video quality significantly slows processing and increases the output file size.
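To illustrate what the loss-control value maps to: with the common libx264 encoder, CRF 0 is lossless and higher CRF values trade quality for smaller files. The command built below is a generic ffmpeg invocation for illustration, not pyvideotrans's exact one.

```python
def encode_cmd(src: str, dst: str, crf: int = 0):
    """Build an ffmpeg command; crf=0 is lossless for libx264 (larger file, slower)."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-crf", str(crf),
            "-c:a", "copy",  # leave the audio stream untouched
            dst]

# Run it with: subprocess.run(encode_cmd("in.mp4", "out.mp4"), check=True)
```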
