
Why Audio, Subtitles, and Video Are Out of Sync

Translation changes the length of a sentence, and usually the time needed to speak it as well. For example, a Chinese sentence and its English translation rarely have the same length, and the time it takes to pronounce the Chinese sentence generally differs from the time it takes to pronounce the English one.

Chinese: 有多远滚多远

English: Get out of here as far as you can!

Chinese: 滚远点

Japanese: ここから出て行け。

The original Chinese audio in the video lasts 2 seconds. After translation into English and dubbing, the duration may be 4 seconds, which inevitably leads to desynchronization.

How to Force Synchronization, Regardless of Quality

As mentioned above, the duration before translation is 2 seconds, and the duration after translation is 4 seconds. If you only need them to be synchronized and don't care about the speed of speech or the speed of the video, you can speed up the audio by a factor of 2, shortening the 4-second dubbing to 2 seconds. Alternatively, you can slow down the video, stretching the original 2-second segment to 4 seconds. Either approach achieves alignment.
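The arithmetic behind both options is the same ratio. The following is a minimal sketch; the function name `stretch_ratio` is hypothetical and is not part of the software.

```python
def stretch_ratio(original_s: float, dubbed_s: float) -> float:
    """Return dubbed duration divided by original duration.

    Read as an audio factor: play the dubbing this many times faster.
    Read as a video factor: stretch the video segment this many times longer.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return dubbed_s / original_s


# The example from the text: 2 s of original speech, 4 s of dubbing.
ratio = stretch_ratio(2.0, 4.0)
audio_after_speedup = 4.0 / ratio   # dubbing length once sped up
video_after_slowdown = 2.0 * ratio  # video length once slowed down
```

With the 2-second and 4-second durations above, the ratio is 2: the dubbing played at 2x lasts 2 seconds, or the video slowed by 2x lasts 4 seconds.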

Audio Acceleration for Alignment:

  1. Select "Auto Audio Acceleration" in the software interface, and uncheck "Auto Video Slowdown".
  2. Open Menu - Tools - Options, and set the maximum audio acceleration factor to 100.

Synchronization can be achieved, but the drawback is obvious: the speech speed is erratic.

Video Slowdown for Alignment:

  1. Uncheck "Auto Audio Acceleration" in the software interface, and select "Auto Video Slowdown".
  2. Open Menu - Tools - Options, and set the maximum video slowdown factor to 20.

This also achieves alignment. The speech speed remains unchanged, but the video is slowed down by a different amount in each segment, so the picture becomes erratic instead.

If you just want simple alignment and don't care about the effect, you can use these two methods.
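For readers who want to try the same two operations outside the software, the sketch below builds filter strings for FFmpeg's `atempo` (audio tempo) and `setpts` (video timestamps) filters. It illustrates the general technique and is not necessarily what the software runs internally. The helper names are hypothetical, and chaining `atempo` in steps of at most 2.0 is a conservative assumption that also works with older FFmpeg builds.

```python
def atempo_chain(factor: float) -> str:
    """Build an FFmpeg audio filter string that speeds audio up by `factor`.

    The factor is split into chained atempo stages of at most 2.0 each.
    """
    if factor < 1.0:
        raise ValueError("this sketch only handles speed-up (factor >= 1)")
    parts = []
    remaining = factor
    while remaining > 2.0:
        parts.append(f"atempo={2.0:.4f}")
        remaining /= 2.0
    parts.append(f"atempo={remaining:.4f}")
    return ",".join(parts)


def setpts_slowdown(factor: float) -> str:
    """Build an FFmpeg video filter string that slows video down by `factor`."""
    if factor < 1.0:
        raise ValueError("this sketch only handles slow-down (factor >= 1)")
    return f"setpts={factor:.4f}*PTS"


# Filter strings for the 2 s -> 4 s example (factor 2 either way).
audio_filter = atempo_chain(2.0)
video_filter = setpts_slowdown(2.0)
```

The resulting strings would be passed to `ffmpeg` via `-filter:a` and `-filter:v` respectively, applied to the relevant audio or video segment.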

A Better, More Acceptable Synchronization Method

Obviously, the synchronization methods above are not practical. Audio that is too fast or video that is too slow makes for a poor viewing experience. For better results, enable both "Auto Audio Acceleration" and "Auto Video Slowdown", so that each one only has to absorb part of the difference.

Specific Operation:

  1. When selecting faster mode or openai mode, try to use medium or larger models and select "Overall Recognition".
  2. Select "Auto Audio Acceleration" and "Auto Video Slowdown" in the software interface, and set a small overall acceleration value, such as 10%.
  3. Open Menu - Tools - Options, and set the maximum audio acceleration factor to 1.8, i.e., speech is accelerated to at most 1.8 times the normal speed. You can change it to 2, 1.5, or any other value greater than 1.
  4. Open Menu - Tools - Options, and set the maximum video slowdown factor to 2, i.e., the video is slowed down to no less than 0.5 times the normal speed. You can change it to 3, 5, or any other value greater than 1.
  5. After steps 1-4 above, the result may still not be aligned, because both maximum values are limited. When a maximum is reached and the segment is still not aligned, the program gives up on further adjustment and simply delays what follows. You can then continue to adjust the picture-related and subtitle-related options in Menu - Tools - Options.
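The capped strategy described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software's actual algorithm: it assumes the audio is sped up first, the video is slowed down second, and whatever still does not fit is reported as leftover delay. The names `SegmentPlan` and `plan_segment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentPlan:
    audio_speed: float     # factor applied to the dubbed audio (>= 1)
    video_slowdown: float  # factor applied to the video segment (>= 1)
    overflow_s: float      # seconds of dubbing that still do not fit


def plan_segment(original_s: float, dubbed_s: float,
                 max_audio_speed: float = 1.8,
                 max_video_slowdown: float = 2.0) -> SegmentPlan:
    """Split the overrun between capped audio speed-up and capped video slowdown."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if dubbed_s <= original_s:
        # The dubbing already fits; nothing needs to change.
        return SegmentPlan(1.0, 1.0, 0.0)
    # Assumed step 1: speed the audio up, but never beyond the cap.
    audio_speed = min(dubbed_s / original_s, max_audio_speed)
    audio_len = dubbed_s / audio_speed
    # Assumed step 2: slow the video down to cover what remains, up to its cap.
    video_slowdown = min(max(audio_len / original_s, 1.0), max_video_slowdown)
    video_len = original_s * video_slowdown
    # Anything left over cannot be aligned and has to be handled as delay.
    overflow_s = max(audio_len - video_len, 0.0)
    return SegmentPlan(audio_speed, video_slowdown, overflow_s)


fits = plan_segment(2.0, 4.0)       # both caps together are enough
too_long = plan_segment(2.0, 10.0)  # caps are reached, some overrun remains
```

For the 2-second and 4-second example, the audio reaches the 1.8 cap and the video only needs a slowdown of roughly 1.11. For a 10-second dubbing over the same 2 seconds, both caps are hit and the leftover has to be delayed, which is the situation step 5 describes.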

Is There a Perfect Synchronization Method?

Apart from manual processing, such as simplifying the translation or adding transition frames, no perfect method has been found that a program can carry out automatically.

For videos of any length and for translation and dubbing in any language, the program would have to satisfy three goals at once: an acceptable audio acceleration range, an acceptable video slowdown range, and mouth movements that coincide with the start of speech. At present this appears to be impossible to do automatically, and there is no perfect method other than manual adjustment.