
Software Working Principle

This software translates and processes videos by recognizing the spoken voice within them, independently of any existing subtitles. As long as there is human speech in the video, it can be processed, regardless of whether the video contains subtitles.

Please Note:

  • If the video only contains subtitles without spoken voice, this software cannot process it.
  • This software cannot directly extract or recognize existing subtitle files in the video.

Downloading the Software

The download and extraction method is only applicable to Windows systems. For Mac and Linux, please install from source code.

  1. Open the software website: https://pyvideotrans.com/
  2. Click the download button to go to the download page: https://pyvideotrans.com/downpackage.html
  3. Select the Baidu Netdisk download link to download the complete installation package and the latest patch package.

For the first use, you must download the complete installation package. After downloading the patch package, extract it and overwrite the contents to the directory where the complete installation package was extracted.

Extracting the Installation Package


The downloaded complete package and patch package are both 7z compressed files. You can extract them using 7-Zip or other decompression software.

360 Compression is the recommended tool.

Extraction Precautions:

  1. Avoid permission issues: Do not extract the software to the desktop or to folders on the C drive, such as Program Files, that require administrator privileges.
  2. Avoid path errors: The extraction path should not contain Chinese characters, spaces, or special symbols.

Strongly recommended: Create a new folder with an English or numerical name on the D drive or E drive (non-system drive), and extract the software to this folder. For example: D:/videotrans.
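
The extraction precautions above can be checked before extracting. The following is a minimal sketch, not part of the software: the function name and the exact set of allowed characters (ASCII letters, digits, dot, underscore, hyphen) are assumptions chosen to match the rule "no Chinese characters, spaces, or special symbols".

```python
import re
from pathlib import PureWindowsPath

# Assumption: a "safe" folder name uses only ASCII letters, digits, dot, underscore, hyphen.
_SAFE_PART = re.compile(r"^[A-Za-z0-9._-]+$")


def is_safe_extract_path(path: str) -> bool:
    """Return True if every folder name in a Windows path uses only safe characters."""
    p = PureWindowsPath(path)
    # Skip the drive anchor (for example "D:\\") and check the remaining components.
    parts = p.parts[1:] if p.anchor else p.parts
    return bool(parts) and all(_SAFE_PART.match(part) for part in parts)


print(is_safe_extract_path("D:/videotrans"))                # safe
print(is_safe_extract_path("C:/Program Files/videotrans"))  # contains a space
```

This check covers only the path characters; it does not test whether the folder needs administrator privileges.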


After extraction, find the sp.exe file and double-click it to start the software.

Starting the Software

The sp.exe startup method is only applicable to Windows systems. For Mac and Linux, please install from source code.

Double-click sp.exe to start the software. Because the software uses PySide6 to build the interface and includes many functional modules, startup may take some time. Please wait patiently.


After successful startup, the main interface of the software is displayed.

Interface Description:

  • Top-left title bar: Displays the software version number.
  • Bottom-left: Clicking opens the software documentation website.
  • Menu bar: Contains settings for translation, dubbing channels, and help and about information.
  • Left buttons: Various functional modules. For video translation, mainly use the Default Configuration Translation and Custom Video Translation buttons. Default Configuration Translation is simple but the translation quality is generally lower; Custom Video Translation provides more customization options for better translation results. It is recommended to use Custom Video Translation.

Video Translation Operation Steps

The software defaults to opening the Custom Video Translation module, and the right side is the operation area.


The operation area contains the following 6 parts:

1. Select the Original Video to be Translated


  • Select the video to process: Click the button to select one or more video files from your computer (hold down the Ctrl key for multiple selections).
  • Folder: Check this box to select a folder; the software will batch translate all video files in the folder.
  • Clean generated: When you process the same video again, the software reuses the cached data from the previous run by default. Check this box to regenerate all files.
  • Save to...: Click the button to select the save location for the translated files. It defaults to saving to the _video_out folder in the original video directory.
  • Only save video: The translation process generates intermediate files such as subtitle files and audio files. If you only need the final translated video, check this box.
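
To illustrate the default save location described above, here is a small sketch. Only the `_video_out` folder name comes from this guide; the helper names and the list of video extensions are assumptions for illustration.

```python
from pathlib import Path

# Assumption: an illustrative set of extensions, not the software's actual list.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi"}


def default_output_dir(video_path: str) -> Path:
    """Default save location: a _video_out folder next to the original video."""
    return Path(video_path).parent / "_video_out"


def is_video(path: str) -> bool:
    """Return True if the file extension looks like a video file."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def list_videos(folder: str) -> list[Path]:
    """Collect the video files directly inside a folder, as batch mode would need."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and is_video(str(p)))


print(default_output_dir("videos/talk.mp4"))
```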

2. Select Translation Channel


This software will first convert the video speech into subtitles, and then translate the subtitles into the target language. The translation channel is used to complete the subtitle translation work.

  • Translation Channel: Select the subtitle translation channel.

    • Microsoft Translator: Free, no VPN required, generally acceptable translation quality. (Default option)
    • Google: Better translation quality, VPN required.
    • OpenAI ChatGPT: Best translation quality; requires a VPN and a paid account. The GPT-4o model or a newer one is recommended.
    • Baidu Translate/Tencent Translate: Domestic translation channels, no VPN required, moderate translation quality.
  • Pronunciation Language: Select the speech language of the original video.

  • Target Language: Select the target language to translate to.

  • Network Proxy: If using a translation channel that requires a VPN (e.g., Google, OpenAI), enter the proxy IP and port here.
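
As a sketch of what "proxy IP and port" means in practice, the helper below splits such an entry into its two parts. The accepted input forms (`127.0.0.1:7890`, optionally prefixed with a scheme such as `http://`) and the function name are assumptions; the exact format the software expects is not specified here.

```python
def parse_proxy(value: str) -> tuple[str, int]:
    """Split a proxy entry such as '127.0.0.1:7890' into (host, port)."""
    text = value.strip().rstrip("/")
    # Drop an optional scheme such as "http://" (assumed to be allowed).
    if "://" in text:
        text = text.split("://", 1)[1]
    host, sep, port_text = text.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected IP:port, got {value!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


print(parse_proxy("http://127.0.0.1:10809"))
```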

3. Select Dubbing Channel

The translated subtitle file will use the selected dubbing channel to generate the audio file.


  • Dubbing Channel: Select the dubbing engine.

    • EdgeTTS: Based on the Microsoft Edge browser's text-to-speech function; free, no proxy required. (Default option)
    • Local channel: Requires additional installation and configuration, can be used offline locally.
    • Third-party paid API: Usually has a free trial quota.
  • Dubbing Role: Select the dubbing role (e.g., male voice, female voice). You must select the target language before you can select the dubbing role.

  • Try Dubbing: Preview how the selected dubbing role sounds.

  • Dubbing Speed/Volume/Pitch: Adjust the dubbing speed, volume, and pitch. Speed and volume settings represent the percentage increase or decrease relative to the default value. For example, a speed of 15 means 15% faster than the normal speed (1.15 times speed); a volume of 90 means 90% higher than the normal volume (1.9 times volume).
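
The percentage arithmetic above can be sketched as follows. The function names are hypothetical, and the signed string form (for example "+15%") is shown only as one common way such values are written for text-to-speech engines; whether the software passes them in that form is an assumption.

```python
def percent_to_multiplier(percent: int) -> float:
    """Convert a percentage change into a multiplier, e.g. 15 -> 1.15, -10 -> 0.9."""
    return (100 + percent) / 100


def signed_percent(percent: int) -> str:
    """Format a percentage change with an explicit sign, e.g. 15 -> '+15%'."""
    return f"{percent:+d}%"


for value in (15, 90, -10):
    print(value, percent_to_multiplier(value), signed_percent(value))
```

A value of 0 leaves the default speed or volume unchanged (multiplier 1.0).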

4. Select Speech Recognition Engine

This is the most important step: it converts the speech in the video into text and generates SRT subtitles.


  • Speech Recognition: Select the speech recognition engine used to convert video speech into subtitles. The default is faster-whisper, which is free and can run locally.
  • Select Model: If using faster-whisper or openai-whisper, you can choose different models. The larger the model, the higher the accuracy, but the slower the running speed and the more resources consumed. The software only includes tiny and medium models by default; other models need to be downloaded separately. It is recommended to use the large-v2 or large-v3-turbo model for the best results (requires an NVIDIA graphics card and CUDA/cuDNN support).
  • Speech Segmentation Mode: Select the speech segmentation method. It is recommended to use the default Whole Recognition mode for better results. The Equal Segmentation mode will divide the speech into segments of equal duration and is only available when using faster-whisper/openai-whisper.
  • Chinese Repunctuation: Check this option to use Alibaba Cloud's punctuation model to repunctuate Chinese, improving subtitle quality.
  • Speech Denoising: Check this option to use Alibaba Cloud's speech denoising model to denoise the speech, improving recognition accuracy.
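
To show what the generated SRT subtitles look like, here is a minimal sketch that turns recognized segments into SRT text. The (start, end, text) tuple layout and the function names are assumptions for illustration; this is not the software's own code, and no recognition engine is called.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) segments."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    # Subtitle entries are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```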

5. Set Synchronization Alignment


Because the same sentence takes a different amount of time to speak in different languages, the duration of the translated dubbing may not match the original video. This section is used to adjust the synchronization between subtitles, dubbing, and video.

  • Video Extension: If the dubbing duration exceeds the original video duration, checking this option will add a still image at the end of the video to match the video duration with the dubbing duration.

  • Dubbing Acceleration: If the dubbing duration exceeds the original video duration, checking this option will speed up the dubbing to match the video duration. (Maximum acceleration factor is 3 times, which can be modified in the menu Tools -> Advanced Options)

  • Video Slowdown: If the dubbing duration exceeds the original video duration, checking this option will slow down the video playback speed to match the dubbing duration. (Maximum slowdown factor is 20 times, which can be modified in the menu Tools -> Advanced Options)

  • Subtitle Embedding: Select the subtitle embedding method.

    • No subtitle embedding: Do not embed subtitles in the video.
    • Embed hard subtitles: Permanently embed subtitles into the video, visible in any player.
    • Embed soft subtitles: Save subtitles as a separate file with the video; requires player support to display.
    • Embed hard subtitles (double): Embed both original and target language hard subtitles.
    • Embed soft subtitles (double): Embed both original and target language soft subtitles.
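
The three duration options above (Video Extension, Dubbing Acceleration, Video Slowdown) can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not the software's actual algorithm: the function names are hypothetical, and only the default caps of 3 and 20 come from this guide.

```python
def stretch_factor(dub_seconds: float, original_seconds: float, max_factor: float) -> float:
    """Ratio needed to make the durations match, capped at max_factor."""
    if dub_seconds <= original_seconds:
        return 1.0  # dubbing already fits, nothing to adjust
    return min(dub_seconds / original_seconds, max_factor)


def extension_seconds(dub_seconds: float, original_seconds: float) -> float:
    """Length of still image to append when extending the video instead."""
    return max(0.0, dub_seconds - original_seconds)


# A 6-second dubbed sentence that must fit a 4-second slot:
print(stretch_factor(6.0, 4.0, max_factor=3.0))   # dubbing acceleration factor
print(stretch_factor(6.0, 4.0, max_factor=20.0))  # video slowdown factor
print(extension_seconds(6.0, 4.0))                # seconds of still image to add
```

When the required ratio exceeds the cap, the capped factor alone cannot close the gap, which is one reason to combine the options.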


  • Chinese, Japanese, Korean single-line characters: When embedding hard subtitles, set the maximum number of characters per line for Chinese, Japanese, and Korean languages (default 20).
  • Other languages: When embedding hard subtitles, set the maximum number of characters per line for other languages (default 60).
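
The per-line character limits can be pictured with the sketch below. It assumes CJK text is split at a fixed character count and other languages are wrapped at word boundaries; the software's actual line-breaking rules may differ, and the function name is hypothetical.

```python
import textwrap


def wrap_subtitle(text: str, is_cjk: bool, cjk_width: int = 20, other_width: int = 60) -> str:
    """Break one subtitle into lines no longer than the configured width."""
    if is_cjk:
        # CJK text has no spaces between words, so split at a fixed character count.
        lines = [text[i:i + cjk_width] for i in range(0, len(text), cjk_width)]
    else:
        # Other languages are wrapped at word boundaries.
        lines = textwrap.wrap(text, width=other_width)
    return "\n".join(lines)


print(wrap_subtitle("The quick brown fox jumps over the lazy dog. " * 3, is_cjk=False))
```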

6. Process Background Sound


  • Keep original background sound: Check this option to keep the original background music in the translated video. Note: This option will significantly increase processing time and system resource consumption, and improve the accuracy of subtitle generation.
  • Add extra background audio: Click the button to select an audio file as new background music.
  • Loop background sound: If the new background music is shorter than the video, check this option to loop the background music.
  • Background volume: Adjust the volume of the background music. Values less than 1 reduce the volume, and values greater than 1 increase the volume.
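
The loop and volume options can be sketched numerically. The helper names are hypothetical and the audio is represented as a plain list of sample values; this illustrates the idea only and is not how the software mixes audio.

```python
import math


def loops_needed(video_seconds: float, music_seconds: float) -> int:
    """How many times the background music must play to cover the whole video."""
    if music_seconds <= 0:
        raise ValueError("music duration must be positive")
    return max(1, math.ceil(video_seconds / music_seconds))


def apply_volume(samples: list[float], volume: float) -> list[float]:
    """Scale samples: values below 1 make the music quieter, above 1 louder."""
    return [sample * volume for sample in samples]


print(loops_needed(95, 30))             # a 30-second track under a 95-second video
print(apply_volume([0.5, -0.2], 0.5))   # halve the volume
```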

Start Execution


  • CUDA Acceleration: If you have an NVIDIA graphics card and have installed CUDA/cuDNN, checking this option can significantly improve translation speed.

Click the Start Execution button, and the software will start translating the video.


  • If only one video is translated, the software pauses after generating and translating the subtitles so that you can correct them manually (e.g., fix typos).

  • If multiple videos are selected, the translation process does not pause. The subtitles of all videos are displayed together in the subtitle area on the right, which may look messy but does not affect the final translation result.

View Results

After the translation is complete, click the progress bar to open the folder containing the results. The translated video file is in MP4 format; other files are intermediate generated material files (e.g., SRT subtitle files, audio files).


Many Other Functions Are Available

For example:

  • Transcribing audio and video into subtitles
  • Batch dubbing SRT subtitle files into audio
  • Translating SRT subtitles into SRT subtitles in another language


Use as needed.

Open Source Statement

This software is open source. Open source address: https://github.com/jianchang512/pyvideotrans

Open source license (GPLv3): https://www.gnu.org/licenses/gpl-3.0.txt

Software website: https://pyvideotrans.com

This software is free to download, free to use, and requires no login or registration. The developer has not sold it on any platform or authorized anyone to sell it on any platform.

The software incorporates various free and open-source solutions, including online and local options, available for free use.

The software also supports some commercial third-party API solutions, such as ChatGPT/Tencent Translate/ByteDance Volcano. To use them, prepare your own account and key, which you must obtain or purchase from the corresponding third-party platform. Any cost is unrelated to this software; the software only provides the technical implementation of the interface with third-party APIs.