
Video translation, which involves dubbing, subtitles, and on-screen synchronization, has always been technically challenging, because languages differ greatly in grammatical structure and speaking speed. When a sentence is translated into another language, its character count and speaking rate change, so the duration of the dubbed audio no longer matches the original audio and the subtitles, audio, and video drift out of sync.

In practice this shows up in two ways: the speaker on screen has finished talking while the dubbing is only halfway through, or the next sentence has already begun in the original video while the dubbing is still reading the previous one.

Translation Changes the Character Count

For example, when the following Chinese sentences are translated into English, their length and syllable count change significantly, and the corresponding audio duration changes with them:

  • Chinese: 得国最正莫过于明

  • English: There is no country more upright than the Ming Dynasty

  • Chinese: 我一生都在研究宇宙

  • English: I have been studying the universe all my life

  • Chinese: 北京圆明园四只黑天鹅疑被流浪狗咬死

  • English: Four black swans in Beijing's Yuanmingyuan Garden suspected of being bitten to death by stray dogs

As can be seen, when Chinese subtitles are translated into English and dubbed, the English dubbing usually runs longer than the original Chinese audio.
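The size of the mismatch can be previewed before any audio is generated by estimating duration from text length at typical speaking rates. The rates below are rough illustrative assumptions (about 5 Chinese characters and 2.5 English words per second), not measurements:

    # Rough duration estimates from text length at assumed speaking rates.
    ZH_CHARS_PER_SEC = 5.0   # illustrative Mandarin rate
    EN_WORDS_PER_SEC = 2.5   # illustrative English rate

    def estimate_seconds_zh(text: str) -> float:
        return len(text) / ZH_CHARS_PER_SEC

    def estimate_seconds_en(text: str) -> float:
        return len(text.split()) / EN_WORDS_PER_SEC

    zh = "我一生都在研究宇宙"                               # 9 characters -> ~1.8 s
    en = "I have been studying the universe all my life"   # 9 words -> ~3.6 s
    print(estimate_seconds_zh(zh), estimate_seconds_en(en))

Under these assumptions the English dub needs roughly twice the original duration. To close this kind of gap, the following strategies are usually adopted: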

Several Coping Strategies

  1. Increasing Dubbing Speed: In theory, as long as there is no upper limit on speaking speed, the dubbed audio can always be compressed to fit the subtitle duration. For example, if the original audio lasts 1 second and the dubbing lasts 3 seconds, playing the dubbing at 300% speed synchronizes the two. In practice, however, heavy acceleration makes the audio sound rushed and unnatural, and because each sentence needs a different factor, the speed fluctuates from line to line, giving a poor overall result (see the speed-splitting sketch after this list).

  2. Simplifying the Translation: Shorten the translation so the dubbing takes less time. For example, "我一生都在研究宇宙" can be rendered as the more compact "Cosmology is my life's work". This is often the most effective approach, but it requires editing the subtitles sentence by sentence, which is very inefficient.

  3. Adjusting Silent Intervals Between Subtitles: If the original audio contains silence between two subtitles, part of that silence can be removed to absorb the duration gap. For example, if there is a 2-second silent interval between two subtitles and the dubbed first subtitle runs 1.5 seconds longer than the original, shortening the gap to 0.5 seconds lets the second subtitle's dubbing still start on time. However, not every subtitle has enough surrounding silence, so this method has limited applicability (see the gap-borrowing sketch after this list).

  4. Removing Silence at the Beginning and End of Dubbing: Generated dubbing usually carries short stretches of silence at its start and end; trimming them shortens the dubbed clip without affecting the speech itself (see the trimming sketch after this list).

  5. Slowing Down Video Playback: If speeding up the dubbing alone does not produce good results, it can be combined with slowing down the video. For example, if a subtitle's original audio lasts 1 second and its dubbing lasts 3 seconds, the dubbing can be shortened to 2 seconds (a 1.5x speed-up) while the corresponding video segment is played at half speed (stretching it to 2 seconds), achieving synchronization. The speed-splitting sketch after this list shows one way to divide the factor between the two sides.
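As a rough illustration of strategies 1 and 5 combined, the sketch below splits the required compression between an audio speed-up and a video slow-down, then emits the corresponding ffmpeg filter expressions (atempo for audio tempo, setpts for video timing). The even geometric split, the 3x audio cap, and the function names are illustrative assumptions, not the software's actual algorithm:

    import math

    MAX_AUDIO_SPEED = 3.0  # mirrors the "Maximum Audio Acceleration Factor" default

    def atempo_chain(speed: float) -> str:
        """Build an ffmpeg atempo filter chain; older atempo builds only accept
        factors in [0.5, 2.0], so larger speed-ups are expressed as a chain."""
        stages = []
        while speed > 2.0:
            stages.append("atempo=2.0")
            speed /= 2.0
        stages.append(f"atempo={speed:.4f}")
        return ",".join(stages)

    def plan_sync(orig_dur: float, dub_dur: float) -> tuple[float, float]:
        """Split the required compression evenly (geometric mean) between an
        audio speed-up and a video slow-down, clamping the audio side."""
        total = dub_dur / orig_dur            # e.g. 3.0 when 1 s becomes 3 s
        if total <= 1.0:
            return 1.0, 1.0                   # dubbing already fits the slot
        audio_speed = min(math.sqrt(total), MAX_AUDIO_SPEED)
        video_slow = (dub_dur / audio_speed) / orig_dur  # setpts multiplier
        return audio_speed, video_slow

    audio_speed, video_slow = plan_sync(orig_dur=1.0, dub_dur=3.0)
    print(f'audio: -af "{atempo_chain(audio_speed)}"')   # ~1.73x speed-up
    print(f'video: -vf "setpts={video_slow:.4f}*PTS"')   # ~1.73x slow-down

Splitting the factor keeps each side mild: a 3x mismatch becomes a 1.73x audio speed-up plus a 1.73x video slow-down instead of a jarring 3x change on either side alone.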
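The gap-borrowing idea in strategy 3 can be sketched as a single pass over the subtitle cues: each dubbed clip is placed as close to its original start as possible, and an overlong clip simply eats into the silent gap before the next cue. The Cue fields and the minimum-gap constant are illustrative assumptions:

    from dataclasses import dataclass

    @dataclass
    class Cue:
        start: float    # cue start on the original timeline, in seconds
        end: float      # cue end on the original timeline
        dub_dur: float  # duration of the dubbed audio for this cue

    MIN_GAP = 0.1  # assumed minimum silence to keep between dubbed clips

    def place_dubs(cues: list[Cue]) -> list[tuple[float, float]]:
        """Place each dubbed clip at its original start when possible,
        letting an overrun shrink the silent gap before the next cue."""
        placed: list[tuple[float, float]] = []
        prev_end = float("-inf")
        for cue in cues:
            start = max(cue.start, prev_end + MIN_GAP)
            placed.append((start, start + cue.dub_dur))
            prev_end = start + cue.dub_dur
        return placed

    # Two cues separated by a 2 s silent gap; the first dub runs 1.5 s
    # long, so the gap absorbs the overrun and the second dub is on time.
    cues = [Cue(start=0.0, end=2.0, dub_dur=3.5),
            Cue(start=4.0, end=6.0, dub_dur=2.0)]
    print(place_dubs(cues))  # [(0.0, 3.5), (4.0, 6.0)]

This reproduces the worked example in strategy 3: the 2-second gap shrinks to 0.5 seconds and the second subtitle's dubbing still starts at its original time.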
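Strategy 4 amounts to trimming the silent padding a TTS engine typically leaves around each clip. A minimal sketch using the pydub library (the library choice, threshold, and file name are assumptions for illustration, not the software's actual implementation):

    from pydub import AudioSegment
    from pydub.silence import detect_leading_silence

    def trim_silence(clip: AudioSegment, threshold_db: float = -50.0) -> AudioSegment:
        """Drop audio quieter than threshold_db at the head and tail of a clip."""
        head = detect_leading_silence(clip, silence_threshold=threshold_db)
        tail = detect_leading_silence(clip.reverse(), silence_threshold=threshold_db)
        return clip[head:len(clip) - tail]

    dub = AudioSegment.from_file("dub_0001.mp3")  # hypothetical dubbed clip
    trimmed = trim_silence(dub)
    print(len(dub) - len(trimmed), "ms of padding removed")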

Each of the above methods has its advantages and disadvantages, and none of them solves every case perfectly. Achieving optimal synchronization usually requires manual fine-tuning, which contradicts the goal of software automation, so video translation software typically combines the strategies above to strive for the best result.

Implementation in Video Translation Software

In the software, these strategies are usually controlled through the following settings:

  • Main Interface Settings:

    The "Dubbing Speed" setting applies an overall speed-up to the dubbing;

    The "Dubbing Acceleration" setting automatically speeds up the dubbing so that its duration matches the subtitle duration;

    The "Video Slowdown" setting automatically reduces the video playback speed to match the dubbing duration;

    The "Video Extension" setting freezes the last frame of the video until the dubbing finishes.

  • Advanced Options Settings (Menu Bar -- Tools/Options -- Advanced Options -- Subtitle Audio Video Alignment):

    Options such as "Remove Blank Space at the End of Dubbing", "Remove Silent Interval Between Two Subtitles", and "Remove Subtitle Duration Greater Than Dubbing Duration" give users finer control over how subtitles and dubbing are synchronized.

    In addition, "Maximum Audio Acceleration Factor" (default 3x) and "Video Slowdown Factor" (default 20x) limit the degree of acceleration and deceleration to prevent audio distortion or excessively slow video playback.

  • Audio Compensation Left Shift: Because of precision limits in the underlying tool (ffmpeg), the dubbing can gradually run longer than the subtitles over the course of a video even when the beginning is perfectly synchronized. The "Audio Compensation Left Shift" setting shifts the entire subtitle timeline to the left, which effectively mitigates this drift, for example by eliminating one blank gap between subtitles every 3 minutes.
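One simple way to picture this compensation: subtract from each cue's start time an offset that grows with its position on the timeline, so that roughly one inter-subtitle gap is reclaimed every few minutes. The drift rate below is an illustrative assumption, not the software's actual value:

    # Assumed drift rate: reclaim ~0.3 s of timeline per 180 s of video,
    # countering dubbing that slowly runs longer than the subtitles.
    DRIFT_PER_SECOND = 0.3 / 180.0

    def shift_left(cue_start: float) -> float:
        """Shift a cue's start time left in proportion to its position."""
        return max(0.0, cue_start * (1.0 - DRIFT_PER_SECOND))

    for start in (60.0, 180.0, 600.0):
        print(f"{start:6.1f} s -> {shift_left(start):7.3f} s")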

By flexibly using the above settings, video translation software can automate the synchronization of subtitles and dubbing as much as possible, improving translation efficiency.