Quick Start Guide - pyVideoTrans: Open Source Video Translation Tool (pyvideotrans.com, github.com/jianchang512/pyvideotrans)

pyVideoTrans is an open-source video translation tool that converts the speech and subtitles of a video from one language to another. Whether you are a content creator, educator, or language learner, it gives you a one-stop solution for breaking down language barriers.

Core Features at a Glance

Fully Automatic Video Translation: Recognizes the speech in the video, generates source-language subtitles, translates them into the target language, produces the voice-over, and merges the new audio and subtitles back into the original video, all in a single run.
Speech Recognition and Transcription: Batch-transcribes human speech in video or audio files into timestamped SRT subtitle files.
SRT Subtitle File Translation: Batch-translates SRT subtitle files while retaining the original timecodes and format, and offers several bilingual subtitle styles.
Text-to-Speech (TTS): Uses a range of TTS channels to generate natural, fluent voice-overs from plain text or SRT subtitle files.
Practical Toolkit: Built-in auxiliary tools such as video/audio/subtitle merging and voice/background sound separation for finer-grained video processing.

How the Software Works

Before you begin, be sure to understand how this software works:

pyVideoTrans works by identifying and processing the human speech in the video's audio. Whether the video picture already contains subtitles (hard subtitles) makes no difference.

Can Process: Any video containing human speech, whether it has embedded subtitles or not.
Cannot Process: Videos with only background music and hard subtitles but without any human speech. This software also cannot directly extract hard subtitles from the video screen.

Download and Installation

1.1 Windows Users (Pre-packaged Version)

We provide a ready-to-use pre-packaged version for Windows 10/11 users, eliminating the need for complicated configuration.

Download the Windows pre-packaged version, unzip it, and run it.

Unzipping Precautions

Incorrectly unzipping the package is the most common reason for the software failing to launch. Please strictly follow these rules:

Avoid Administrator-Privilege Paths: Do not unzip to system folders such as C:/Program Files or C:/Windows, or to the Desktop.
Path Must Be Pure English: The unzip path must not contain Chinese characters, spaces, or special symbols.
Recommended Practice: On a non-system drive such as D: or E:, create a folder whose name uses only English letters or digits (e.g., D:/videotrans), then unzip the package into it.
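
The rules above can be checked before unzipping. The following is a minimal sketch, not part of pyVideoTrans: the function name check_unzip_path, the list of restricted folder names, and the set of permitted punctuation are all assumptions made for illustration.

```python
from pathlib import PureWindowsPath

# Folder names the guide warns against (assumed, non-exhaustive list).
RESTRICTED_PARTS = {"program files", "program files (x86)", "windows", "desktop"}

# Punctuation tolerated in a path: drive colon, separators, underscore, hyphen (assumed).
ALLOWED_PUNCTUATION = set(":/\\_-")


def check_unzip_path(path: str) -> list[str]:
    """Return a list of problems with an unzip path; an empty list means none were found."""
    problems = []
    if not path.isascii():
        problems.append("contains non-ASCII characters")
    if " " in path:
        problems.append("contains spaces")
    # Spaces and non-ASCII characters are reported above, so skip them here.
    if any(
        not (ch.isalnum() or ch in ALLOWED_PUNCTUATION)
        for ch in path
        if ch != " " and ch.isascii()
    ):
        problems.append("contains special symbols")
    parts = {part.lower() for part in PureWindowsPath(path).parts}
    if parts & RESTRICTED_PARTS:
        problems.append("is inside a system or protected folder")
    return problems


if __name__ == "__main__":
    for candidate in ("D:/videotrans", "C:/Program Files/videotrans"):
        print(candidate, "->", check_unzip_path(candidate) or "looks fine")
```

An empty result only means the path passes these three rules; it does not guarantee the software will launch.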


Launching the Software

After unzipping, open the folder, find the sp.exe file, and double-click it to run.

The first launch loads many modules and may take tens of seconds. Please be patient.

1.2 macOS / Linux Users (Source Code Deployment)

macOS and Linux users must deploy the software from source code.

Source Code Repository Address: https://github.com/jianchang512/pyvideotrans
Detailed Deployment Tutorial:
- Detailed Tutorial for macOS System Source Code Deployment
- Detailed Tutorial for Linux System Source Code Deployment

Software Interface and Core Functions

After the software is launched, the main interface is divided into the following areas.

Left Function Area: Switch the main function modules of the software, such as Custom Video Translation, Audio and Video to Subtitles, etc.
Top Menu Bar: Perform global configuration.
- Translation Settings: Configure the API keys and related parameters of each translation channel (such as OpenAI, Azure).
- TTS Settings: Configure the API keys and related parameters of each voice-over channel (such as OpenAI TTS, Azure TTS).
- Speech Recognition Settings: Configure the API keys and parameters of the speech recognition channel (such as OpenAI API, Alibaba ASR).
- Tools/Options: Contains various advanced options and auxiliary tools, such as subtitle format adjustment, video merging, voice separation, etc.
- Help/About: View software version information, documentation, and community links.
Right Workspace: The specific operation area of the current function module.

Quick Start - Video Translation Full Process

This is the core function of the software. This section walks you step by step through a complete video translation task. The Custom Video Translation module is opened by default.

Step 1: Select Video and Output Settings

Select the video to process: Click the button to select one or more video files (hold Ctrl to select multiple).
Folder: Check this item to process all videos in the entire folder in batch.
Save to..: Set the output directory for the translated video. The default is the _video_out folder under the original video directory.
Clean generated: If you need to reprocess the same video (instead of using the cache), check this item.
Save video only: After checking this item, only the final MP4 video will be retained after processing, and intermediate files such as subtitles and audio will be automatically deleted.
Transfer subtitle position: If the original video has hard subtitles, check this item to have the software try placing the new subtitles in a different position so that they do not overlap.
Shutdown after finished: Automatically shut down the computer after processing all tasks, suitable for large-scale, long-term tasks.
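
To make the Folder and Save to.. options concrete, here is a small sketch of how the default output location and a folder batch could be computed. It only mirrors the description above; the function names and the extension set are hypothetical, and the actual software may lay out its output differently.

```python
from pathlib import Path

# Extensions treated as video files in this sketch (an assumed set).
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi"}


def default_output_dir(video_path: str) -> Path:
    """Return the _video_out folder located in the video's own directory."""
    return Path(video_path).parent / "_video_out"


def collect_videos(folder: str) -> list[Path]:
    """List the video files directly inside a folder, sorted by path."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


if __name__ == "__main__":
    print(default_output_dir("D:/clips/talk.mp4"))
```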

Step 2: Configure Translation and Voice-over

Translation channel: Select the engine used to translate subtitles.
- Free: Google(Free) (requires proxy), Microsoft Translate (no proxy required).
- High Quality (API Key Required): OpenAI, Gemini, DeepL, etc. Set the API key in the corresponding position in the top menu bar.
Source language: Select the language actually spoken in the original video. This must be accurate.
Target language: The language you want to translate into.
Glossary: When checked, translation uses the preset glossary to keep specialized terms accurate.
Network proxy: If you use a channel that requires a proxy (such as Google or OpenAI), enter your proxy address and port here (for example, http://127.0.0.1:10808).
Voiceover channel: Select the engine that generates the voice-over. Edge-TTS is the default; it is free and gives excellent results.
Voiceover role: Select the target language first; the voices available for that language (male, female, etc.) are then loaded for you to choose from.
Listen to voiceover: Click to preview the current role's voice.
Voiceover speed/volume/pitch: Adjust as needed. Each value is a percentage increase or decrease relative to the default.
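
As a worked example of the percentage values for speed, volume, and pitch: a setting of +20 means 20% above the default, and -10 means 10% below it. The sketch below shows that arithmetic. The helper names are hypothetical, and the signed string form such as "+20%" is an assumption about how a value might be passed on to a TTS engine, not a documented pyVideoTrans format.

```python
def apply_percent(base: float, percent: float) -> float:
    """Scale a default value by a signed percentage (+20 means 20% above the default)."""
    return base * (1 + percent / 100)


def signed_percent(percent: int) -> str:
    """Format a signed percentage string such as '+20%' or '-10%'."""
    return f"{percent:+d}%"


if __name__ == "__main__":
    # A speed setting of +20 turns a default rate of 1.0 into roughly 1.2.
    print(apply_percent(1.0, 20), signed_percent(20))
    # A volume setting of -10 turns a default level of 1.0 into roughly 0.9.
    print(apply_percent(1.0, -10), signed_percent(-10))
```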

Step 3: Configure Speech Recognition

This is a key step in converting video speech into text subtitles, which directly affects the quality of all subsequent processes.

Speech recognition: The default, faster-whisper(local), is recommended. It is free, runs locally, and gives excellent results.
Select model: Larger models recognize speech more accurately but run more slowly and consume more resources.
- Entry-level: tiny / medium
- Recommended: large-v3-turbo (accurate and fast; best used with an NVIDIA graphics card and CUDA acceleration).
Speech cutting mode: The default, overall recognition, is recommended.
LLM re-segmentation: When checked, a large language model re-segments and punctuates the recognized text, which significantly improves subtitle readability.
Noise reduction: When checked, the audio is denoised before recognition, which improves accuracy for noisy recordings.

Step 4: Set Synchronization and Subtitles

Since different languages have different speech speeds, the translated voice-over duration may not match the original video. You can adjust it here.

Sync alignment:
- Voiceover acceleration: When the voice-over is longer than the video, accelerate the voice-over to match the video duration (commonly used).
- Video slow down: When the voice-over is longer than the video, slow down the video to match the voice-over duration.
- Video extension: When the voice-over is longer than the video, add still frames at the end of the video to match the voice-over duration.
Subtitle embedding:
- Do not embed subtitles: Only replace the sound, do not add any subtitles.
- Embed hard subtitles: Permanently "burn" the subtitles into the screen, which cannot be turned off.
- Embed soft subtitles: Package the subtitles as an independent track into the video, and the player can choose to turn it on or off.
- (Dual): Embed bilingual subtitles in both source and target languages at the same time.
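
The three sync alignment options come down to simple arithmetic on two durations. The sketch below illustrates that arithmetic under the assumption that each option is applied on its own; the function names are hypothetical, and the software's real alignment logic may combine or cap these adjustments.

```python
def stretch_factor(dub_seconds: float, slot_seconds: float) -> float:
    """Ratio of voice-over length to available video time, never below 1.0.

    Voiceover acceleration: play the voice-over at this multiple of normal speed.
    Video slow down: stretch the video's duration by this multiple instead.
    """
    return max(1.0, dub_seconds / slot_seconds)


def freeze_frame_seconds(dub_seconds: float, video_seconds: float) -> float:
    """Video extension: seconds of still frames to append at the end of the video."""
    return max(0.0, dub_seconds - video_seconds)


if __name__ == "__main__":
    # A 6-second voice-over for a 4-second stretch of video:
    print(stretch_factor(6.0, 4.0))        # speed the voice-over up 1.5x, or stretch the video 1.5x
    print(freeze_frame_seconds(6.0, 4.0))  # or append 2.0 seconds of still frames
```

When the voice-over is shorter than the video, the factor stays at 1.0 and no still frames are added, so nothing is changed.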

Step 5: Process Background Sound

Retain original background sound: When checked, the software tries to separate the human voice from the background sound of the original video and keeps the background sound in the final video. Note: This significantly increases processing time, but it can greatly improve the quality of the finished product.
Add additional background audio: You can also choose your own audio file as new background music.
Background volume: Adjust the volume of the background sound. Values below 1 lower it; values above 1 raise it.
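
One way to read the background volume value is as a linear gain applied to each audio sample. That interpretation is an assumption; the guide only states that values below 1 reduce the volume and values above 1 increase it. A minimal sketch on normalized samples, with a hypothetical function name:

```python
def scale_volume(samples: list[float], factor: float) -> list[float]:
    """Multiply normalized samples (-1.0 to 1.0) by factor, clipping to the valid range."""
    return [max(-1.0, min(1.0, sample * factor)) for sample in samples]


if __name__ == "__main__":
    quiet = scale_volume([0.5, -0.25, 0.8], 0.5)  # factor below 1 lowers the volume
    loud = scale_volume([0.5, -0.8], 2.0)         # factor above 1 raises it, with clipping
    print(quiet, loud)
```

The clipping step shows why very large factors can distort the background track.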

Step 6: Start Execution

CUDA acceleration: If you have an NVIDIA graphics card and a correctly installed CUDA environment, be sure to check this item. It can make speech recognition several times, or even dozens of times, faster.
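
Before checking the CUDA option, it can help to confirm that the system sees an NVIDIA GPU at all. The sketch below is a rough heuristic and not how pyVideoTrans itself detects CUDA: it only checks that the nvidia-smi driver utility is on PATH and runs successfully, which says nothing about whether the CUDA libraries are installed correctly. The function name is hypothetical.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Heuristic: True if nvidia-smi is on PATH and lists GPUs without error."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` lists the GPUs known to the driver.
        result = subprocess.run([exe, "-L"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA GPU visible:", nvidia_gpu_visible())
```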

After all settings are completed, click the [Start] button.

Executing

The software will start working. If only one video is processed, it will pause after the subtitles are generated and translated, allowing you to proofread and modify the subtitles in the text box on the right. Click Execute again to continue after confirming that there is no error.

Step 7: View Results

After the task is completed, click the progress bar area at the bottom to open the output folder. You will see the final MP4 file and the materials generated during the process, such as SRT subtitles and voice-over files.

Explore Other Practical Functions

In addition to the core video translation, pyVideoTrans also provides several independent and powerful functions.

4.1 Audio and Video to Subtitles/Speech Transcription/Speech Recognition

Transcribe video or audio files into SRT subtitles in batch. Just drag in the file, set the original language and recognition model, and you can start. Supports advanced functions such as LLM re-segmentation and noise reduction.
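
For readers unfamiliar with the SRT format that this function produces, the sketch below turns a list of recognized segments into SRT text. The input shape, (start_seconds, end_seconds, text) tuples, and the function names are assumptions for illustration and do not reflect the software's internal data structures.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```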

4.2 Batch Translate SRT Subtitles

If you already have SRT subtitle files, this function can help you quickly translate them into other languages and keep the timeline unchanged. It also supports selecting multiple output formats such as Single language subtitles, Target language at the top (dual), Target language at the bottom (dual).
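
The dual output formats can be pictured as the source and translated text sharing one unchanged timecode line. The following sketch merges two SRT texts that have identical cue timing; the parser is deliberately simple, the function names are hypothetical, and the software's own bilingual styles may differ in detail.

```python
import re


def parse_srt(text: str) -> list[tuple[str, str, str]]:
    """Parse SRT text into (index, timecode, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3:
            cues.append((lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def merge_dual(source_srt: str, target_srt: str, target_on_top: bool = True) -> str:
    """Combine two SRT texts with identical cue timing into bilingual SRT text."""
    blocks = []
    for (index, timecode, src), (_, _, tgt) in zip(
        parse_srt(source_srt), parse_srt(target_srt)
    ):
        first, second = (tgt, src) if target_on_top else (src, tgt)
        # The original index and timecode are kept unchanged.
        blocks.append(f"{index}\n{timecode}\n{first}\n{second}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    english = "1\n00:00:00,000 --> 00:00:01,500\nHello\n"
    french = "1\n00:00:00,000 --> 00:00:01,500\nBonjour\n"
    print(merge_dual(english, french, target_on_top=True))
```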

4.3 Batch Voice-over for Subtitles

Synthesize your SRT files or plain text into voice-over files (such as WAV or MP3) in batch through the selected TTS engine. Supports fine-tuning of speech speed, volume, and pitch.

4.4 Audio and Video Subtitle Merging

This is a practical post-production tool. When you have separate video, voice-over, and subtitle files, you can use it to merge the three into a final video file. It also supports custom subtitle styles.

Chapter 5: Function Overview and Support List

The strength of pyVideoTrans lies in its extensibility and its support for many services.

Speech Recognition (STT) Support:
- Local Offline: faster-whisper, openai-whisper
- Online API: OpenAI SpeechToText, GoogleSpeech, Alibaba FunASR, Doubao Model, and custom API.
Subtitle Translation Support:
- Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, DeepL, DeepLX, ByteDance Volcano
- Large Language Model: ChatGPT, AzureAI, Gemini, other OpenAI-compatible AI large models and local large models
- Offline Translation: OTT
Speech Synthesis (TTS) Support:
- Microsoft Edge TTS, Google TTS, Azure AI TTS, OpenAI TTS, Elevenlabs
- Voice Cloning/Local: GPT-SoVITS, clone-voice, ChatTTS, Fish TTS, CosyVoice, F5-TTS, KokoroTTS
- Custom TTS server API
Supported Languages:
- Simplified and Traditional Chinese, English, Korean, Japanese, Russian, French, German, Italian, Spanish, Portuguese, Vietnamese, Thai, Arabic, Turkish, Hungarian, Hindi, Ukrainian, Kazakh, Indonesian, Malay, Czech, Polish, Dutch, Swedish, Filipino, Finnish, Persian, and others. Automatic language detection is also supported.

Thank you for choosing pyVideoTrans. We hope this software can be your powerful assistant in bridging the language barrier!
