An ideal translated video should have the following characteristics: accurate subtitles, appropriate length, voice-over tone consistent with the original audio, and perfect synchronization of subtitles, audio, and video.
This guide will detail the four steps of video translation and provide optimal configuration recommendations for each step.
Step 1: Speech Recognition
Goal: Convert the speech in the video into a subtitle file in the corresponding language.
Corresponding Control Element: "Speech Recognition" line
Optimal Configuration:
- Select "faster-whisper(local)".
- Model selection: "large-v2", "large-v3", or "large-v3-turbo".
- Speech segmentation mode: select "Whole recognition".
- Select "Noise reduction" (time-consuming).
- Select "Retain original background sound" (time-consuming).
- If the video is in Chinese, also select "Chinese re-segmentation".
- Select "CUDA acceleration" if you have an NVIDIA GPU and a properly configured CUDA environment.
Note: Without an NVIDIA GPU, or if the CUDA environment is not configured and CUDA acceleration is not enabled, processing will be extremely slow. Insufficient VRAM may cause crashes.
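For context, the "faster-whisper(local)" channel is backed by the open-source faster-whisper library. The snippet below is a minimal sketch of a comparable local transcription, not pyvideotrans's actual code; the model name, device, and audio path are illustrative assumptions.

```python
# Minimal sketch: local speech recognition with the faster-whisper library.
# Model size, device, and the audio path are illustrative assumptions.
from faster_whisper import WhisperModel

# "large-v3" needs several GB of VRAM on CUDA; without an NVIDIA GPU,
# fall back to device="cpu", compute_type="int8" (much slower).
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("video_audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")

# Each segment carries start/end timestamps, which map directly to subtitle lines.
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text.strip()}")
```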
Step 2: Subtitle Translation
Goal: Translate the subtitle file generated in step one into the target language.
Corresponding Control Element: "Translation Channel" line
Optimal Configuration:
- Priority Choice: If you have a VPN and know how to configure a proxy, use the "gemini-1.5-flash" model (Gemini AI channel) in Menu - Translation Settings - Gemini Pro.
- Suboptimal Choice: If you don't have a VPN or don't know how to configure a proxy, select "OpenAI ChatGPT" in "Translation Channel" and use a "gpt-4o" series model in Menu - Translation Settings - OpenAI ChatGPT (requires a third-party relay).
- Alternative: If you can't find a suitable third-party relay, you can use a domestic AI service such as Moonshot AI or DeepSeek.
- In Menu - Tools/Options - Advanced Options, select the two items shown in the following figure.
Gemini AI usage instructions: https://pyvideotrans.com/gemini.html
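Whichever channel you pick, subtitle translation ultimately means sending the recognized subtitle text to a chat-completion API and asking for the target language back. The sketch below illustrates that with the official openai Python client; the relay base_url, API key, prompt wording, and model name are assumptions to replace with your own settings, not pyvideotrans's internal code.

```python
# Minimal sketch: translating subtitle lines through an OpenAI-compatible API.
# The base_url, key, prompt, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # key issued by your provider or relay
    base_url="https://your-relay.example/v1",  # third-party relay endpoint (assumed)
)

def translate_lines(lines, target_lang="English"):
    """Translate a batch of subtitle lines, keeping one output line per input line."""
    prompt = (
        f"Translate the following subtitle lines into {target_lang}. "
        "Return the same number of lines, in the same order, with no extra text:\n"
        + "\n".join(lines)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any gpt-4o series model offered by your relay
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content.splitlines()

print(translate_lines(["你好，欢迎观看本视频。", "今天介绍视频翻译的四个步骤。"]))
```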
Step 3: Voice-over
Goal: Generate a voice-over based on the translated subtitle file.
Corresponding Control Element: "Voice-over Channel" line
Optimal Configuration:
- Chinese or English: "F5-TTS(local)", with voice role set to "clone".
- Japanese or Korean: "CosyVoice(local)", with voice role set to "clone".
- Other languages: "clone-voice(local)", with voice role set to "clone".
- All three channels preserve the original video's emotional tone as much as possible, with "F5-TTS" giving the best results (a sketch of preparing a reference voice clip for cloning follows this list).
- Each requires installing the corresponding F5-TTS/CosyVoice/clone-voice integration package; see the documentation at https://pyvideotrans.com/f5tts.html
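All three channels clone the original speaker's voice, so they need a clean sample of that voice to work from. As a hedged illustration of that preparation step (not the channels' own Python APIs, which are not shown here), the sketch below cuts a short mono reference clip out of the original audio with ffmpeg; paths, timestamps, and the sample rate are assumptions.

```python
# Minimal sketch: extracting a short, clean reference clip of the original
# speaker, the kind of sample a "clone" voice role conditions on.
# Paths, timestamps, and the sample rate are illustrative assumptions.
import subprocess

def extract_reference_clip(video_path, start, duration, out_wav="reference.wav"):
    """Cut a mono 16 kHz WAV segment of the original speech with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start), "-t", str(duration),  # where the clip starts / how long
            "-i", video_path,
            "-vn",                # drop the video stream
            "-ac", "1",           # mono
            "-ar", "16000",       # 16 kHz, a typical rate for speech models
            out_wav,
        ],
        check=True,
    )
    return out_wav

# Example: take 8 seconds of clear speech starting 12 seconds into the video.
extract_reference_clip("input.mp4", start=12, duration=8)
```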
Step 4: Subtitle, Voice-over, and Video Synchronization
- Goal: Synchronize the subtitles, voice-over, and video.
- Corresponding Control Element: "Synchronization" line
- Optimal Configuration:
- When translating Chinese into English, you can set the "Voice-over speed" value (e.g., 10 or 15) to speed up the voice-over, since English sentences are usually longer.
- Select the "Video extension", "Voice-over acceleration", and "Video slowdown" options to force alignment of subtitles, audio, and video.
- In Menu - Tools/Options - Advanced Options - Subtitle Audio Video Alignment Area, the "Maximum audio acceleration factor" and "Video slowdown factor" can be adjusted according to the actual situation (the default is 3); a sketch of how the acceleration cap plays out follows this list.
- It is recommended to fine-tune whether each option is selected, and its value, based on the actual speaking speed in the video.
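To make the alignment trade-off concrete, the sketch below speeds up a dubbed clip that overruns its subtitle slot, capping the speed-up at a maximum acceleration factor of 3 as in the Advanced Options; anything beyond the cap would have to be absorbed by video extension or video slowdown instead. Paths, durations, and the helper name are illustrative assumptions, not pyvideotrans's internal code.

```python
# Minimal sketch of the alignment logic: if a dubbed clip is longer than its
# subtitle slot, speed it up, but never beyond the maximum acceleration factor.
import subprocess

MAX_AUDIO_SPEEDUP = 3.0  # corresponds to "Maximum audio acceleration factor"

def fit_clip_to_slot(clip_wav, clip_len, slot_len, out_wav="fitted.wav"):
    """Speed up a dubbed clip so it fits its subtitle slot, capped at the max factor."""
    speedup = min(max(clip_len / slot_len, 1.0), MAX_AUDIO_SPEEDUP)
    # ffmpeg's atempo filter changes tempo without changing pitch.
    # Note: older ffmpeg builds limit atempo to [0.5, 2.0]; chain two atempo
    # filters (e.g. "atempo=1.5,atempo=2.0") if your factor exceeds 2.
    subprocess.run(
        ["ffmpeg", "-y", "-i", clip_wav,
         "-filter:a", f"atempo={speedup:.3f}", out_wav],
        check=True,
    )
    # If the cap is hit and the clip still overruns, the remaining gap
    # has to come from "Video extension" / "Video slowdown".
    return speedup

# Example: a 6.2 s dubbed clip must fit a 4.0 s subtitle slot -> speedup = 1.55.
print(fit_clip_to_slot("clip_0001.wav", clip_len=6.2, slot_len=4.0))
```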
Output Video Quality Control
- The default output is lossy compression. For lossless output, in Menu - Tools - Advanced Options - Video Output Control Area, set "Video transcoding loss control" to 0.
- Note: If the original video is not in mp4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but it is usually negligible. Raising video quality will significantly reduce processing speed and increase the size of the output video.