pyVideoTrans Command Line Tool User Manual
This document details the command-line (CLI) usage of the pyVideoTrans video translation tool. The tool supports four core functions: Speech-to-Text (STT), Text-to-Speech (TTS), Subtitle Translation (STS), and full Video-to-Video (VTV) translation.
⚠️ Important Notes: Before You Start
- Execution Method: This document is based on running the tool via uv run cli.py.
- File Paths: The --name parameter must use the absolute path to the file.
- Path Quoting: If the path contains spaces, you must wrap the path in straight double quotes "".
  - ✅ Correct: --name "C:\My Videos\test file.mp4"
  - ❌ Incorrect: --name C:\My Videos\test file.mp4
- Getting Voice Roles: Please select the corresponding TTS channel and target language in the software UI, then view the available voice roles there. Due to space and readability constraints, they are not listed exhaustively here.
1. Speech-to-Text (STT)
Extract speech from a video or audio file and generate an SRT subtitle file.
Basic Command Format
```
uv run cli.py --task stt --name "Absolute file path" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | stt | Task type identifier |
| --name | str | Yes | - | Absolute path to the audio/video file |
| --recogn_type | int | No | 0 | Speech recognition channel ID (see Appendix 1) |
| --model_name | str | No | tiny | Model size (tiny, small, base, medium, large-v2, etc.; check the specific model names available in the software UI for the selected recognition channel) |
| --detect_language | str | No | auto | Source audio language code, defaults to auto-detection |
| --cuda | bool | No | False | Add this flag to enable GPU (CUDA) acceleration |
| --remove_noise | bool | No | False | Add this flag to enable audio noise reduction |
| --enable_diariz | bool | No | False | Add this flag to enable speaker diarization (distinguish different speakers) |
| --nums_diariz | int | No | -1 | Specify the number of speakers (only valid when speaker diarization is enabled) |
| --fix_punc | bool | No | False | Add this flag to attempt punctuation restoration |
Usage Examples
Transcribe using Faster-Whisper (tiny model):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name tiny
```
Use GPU acceleration and specify source language as Chinese:
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --detect_language zh-cn --cuda
```
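Transcribe a recording with noise reduction and speaker diarization enabled (a hedged sketch that combines the flags documented in the table above; the medium model and the speaker count of 2 are illustrative choices):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name medium --remove_noise --enable_diariz --nums_diariz 2 --cuda
```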
2. Text-to-Speech (TTS)
Convert an SRT subtitle file or text into speech audio.
Basic Command Format
```
uv run cli.py --task tts --name "Absolute SRT file path" --voice_role "Voice Name" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | tts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --tts_type | int | No | 0 | Dubbing channel ID (see Appendix 2) |
| --voice_role | str | Yes | - | Voice role name (check the available role names in the software UI for the selected TTS channel) |
| --voice_rate | str | No | +0% | Speech rate adjustment (e.g., +10%, -10%) |
| --volume | str | No | +0% | Volume adjustment |
| --pitch | str | No | +0Hz | Pitch adjustment |
| --target_language_code | str | No | - | Target language code (required for some TTS engines) |
| --voice_autorate | bool | No | False | Automatically speed up audio to align with subtitle timestamps |
| --align_sub_audio | bool | No | False | Force modification of subtitle timestamps to fit audio length |
Usage Examples
Dubbing using Edge-TTS (Chinese male voice):
```
uv run cli.py --task tts --name "C:\subs\movie.srt" --tts_type 0 --voice_role "zh-CN-YunyangNeural" --target_language_code zh-cn
```
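Adjust the delivery of the same dubbing job (a hedged sketch combining the rate, pitch, and auto-rate parameters documented in the table above; the +10% and +2Hz values are illustrative and follow the formats shown there):
```
uv run cli.py --task tts --name "C:\subs\movie.srt" --tts_type 0 --voice_role "zh-CN-YunyangNeural" --voice_rate +10% --pitch +2Hz --voice_autorate --target_language_code zh-cn
```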
3. Subtitle Translation (STS)
Translate an SRT subtitle file into another language.
Basic Command Format
```
uv run cli.py --task sts --name "Absolute SRT file path" --target_language_code "Target Language" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | sts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --translate_type | int | No | 0 | Translation channel ID (see Appendix 3) |
| --target_language_code | str | Yes | - | Target language code (see Appendix 4) |
| --source_language_code | str | No | auto | Original language code |
Usage Examples
Translate subtitles to English (using Google Translate):
```
uv run cli.py --task sts --name "D:\subs\source.srt" --target_language_code en --translate_type 0
```
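Translate subtitles with the source language stated explicitly (a hedged sketch assuming a Simplified Chinese source file; channel 1 is the Microsoft channel from Appendix 3, and the language codes follow Appendix 4):
```
uv run cli.py --task sts --name "D:\subs\source.srt" --source_language_code zh-cn --target_language_code ja --translate_type 1
```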
4. Video-to-Video Translation (VTV)
Full pipeline processing: Recognition -> Translation -> Dubbing -> Synthesis, directly producing a translated video.
Basic Command Format
```
uv run cli.py --task vtv --name "Video path" --source_language_code "Source Language" --target_language_code "Target Language" [Optional Parameters]
```
Parameter Details
VTV mode integrates parameters from all the above functions. Listed below are additional parameters not covered above.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | vtv | Task type identifier |
| --name | str | Yes | - | Absolute path to the video file |
| --source_language_code | str | Yes | - | Spoken language (cannot be set to auto in VTV mode) |
| --target_language_code | str | Yes | - | Target language |
| --subtitle_type | int | No | 1 | Subtitle embedding method (see explanation below) |
| --video_autorate | bool | No | False | Automatically slow down video frames to align with dubbed audio |
| --is_separate | bool | No | False | Whether to separate vocals from background sound (preserve background audio) |
| --recogn2pass | bool | No | False | Whether to perform a second pass of speech recognition (improves accuracy) |
| --clear_cache | bool | No | True | Whether to clean temporary files (default: clean) |
| --no-clear-cache | flag | No | - | Add this flag to NOT clean the cache |
Regarding values for --subtitle_type:
- 0: Do not embed subtitles
- 1: Hard subtitles (default)
- 2: Soft subtitles
- 3: Hard bilingual subtitles
- 4: Soft bilingual subtitles
Usage Examples
Translate Chinese video to English, preserve background audio, embed hard subtitles, use GPU acceleration:
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural" --is_separate --cuda --subtitle_type 1
```
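Run the same pipeline but embed soft bilingual subtitles and keep the temporary files (a hedged sketch combining the --subtitle_type values and the --no-clear-cache flag documented above):
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural" --subtitle_type 4 --no-clear-cache --cuda
```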
Appendix: Channel and Code Reference Tables
Appendix 1: Speech Recognition Channel List (--recogn_type)
The ID corresponds to the position of the speech recognition channel in the software UI, numbered from 0.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | faster-whisper | Local (Recommended, fast) |
| 1 | openai-whisper | Local (Official version) |
| 2 | Alibaba FunASR | Local |
| 3 | Huggingface_ASR | |
| 4 | OpenAI Speech Recognition API | Requires API Key |
| 5 | Gemini Large Model Recognition | |
| 6 | Alibaba Bailian Qwen3-ASR | |
| 7 | ByteDance Speech Large Model (Fast Version) | |
| 8 | Zhipu AI GLM-ASR | |
| 9 | Deepgram.com | |
| 10 | ByteDance Audio/Video Subtitle Generation | |
| 11 | Parakeet-tdt | |
| 12 | Whisper.cpp | |
| 13 | Faster-Whisper-XXL.exe | |
| 14 | WhisperX | |
| 15 | 302.AI | |
| 16 | ElevenLabs.io | |
| 17 | Google Recognition API | Free |
| 18 | STT Speech Recognition | Local |
| 19 | Custom Recognition API | |
Supported --model_name values (applicable only to the faster-whisper and openai-whisper channels; for other channels, check the software UI): tiny, small, base, medium, large-v1, large-v2, large-v3, large-v3-turbo
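For reference, a larger model is selected the same way (a hedged sketch; large-v3 is one of the names listed above and generally needs more disk space and VRAM than tiny):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name large-v3 --cuda
```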
Appendix 2: Text-to-Speech Channel List (--tts_type)
The ID corresponds to the position of the TTS channel in the software UI, numbered from 0.
After selecting the target language and TTS channel, the specific available role names will be displayed in the software UI.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | Edge-TTS | Free (Recommended) |
| 1 | piper TTS | Local |
| 2 | VITS | Local |
| 3 | Qwen3 TTS | |
| 4 | Doubao TTS Model 2.0 | |
| 5 | ByteDance TTS | |
| 6 | Zhipu AI GLM-TTS | |
| 7 | GPT-SoVITS | Local |
| 8 | F5-TTS | Local |
| 9 | Index TTS | Local |
| 10 | CosyVoice | Local |
| 11 | Supertonic | Local |
| 12 | Minimaxi TTS | |
| 13 | OpenAI TTS | |
| 14 | 302.AI | |
| 15 | Elevenlabs.io | |
| 16 | Azure-TTS | |
| 17 | Gemini TTS | |
| 18 | VoxCPM TTS | Local |
| 19 | ChatterBox TTS | Local |
| 20 | ChatTTS | Local |
| 21 | Spark TTS | Local |
| 22 | Dia TTS | Local |
| 23 | kokoro TTS | Local |
| 24 | clone-voice | Local |
| 25 | Fish TTS | Local |
| 26 | Google TTS | Free |
| 27 | Custom TTS API | |
Appendix 3: Translation Channel List (--translate_type)
The ID corresponds to the position of the translation channel in the software UI, numbered from 0.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | Google | Free (Default) |
| 1 | Microsoft | Free |
| 2 | M2M100 | Local |
| 3 | OpenAI ChatGPT | |
| 4 | DeepSeek | |
| 5 | Gemini AI | |
| 6 | Zhipu AI | |
| 7 | AzureGPT AI | |
| 8 | Compatible AI/Local Model | |
| 9 | OpenRouter | |
| 10 | SillyTavern | |
| 11 | 302.AI | |
| 12 | Alibaba Bailian | |
| 13 | ByteDance Large Model | |
| 14 | Tencent Translation | |
| 15 | Baidu Translation | |
| 16 | DeepL | |
| 17 | DeepLx | |
| 18 | Alibaba Machine Translation | |
| 19 | OTT | Local |
| 20 | LibreTranslate | Local |
| 21 | MyMemory API | Free |
| 22 | Custom Translation API | |
Appendix 4: Language Code List
Applicable to --source_language_code and --target_language_code
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | zh-cn | Simplified Chinese | zh-tw | Traditional Chinese |
| fr | French | de | German | ja | Japanese |
| ko | Korean | ru | Russian | es | Spanish |
| th | Thai | it | Italian | pt | Portuguese |
| vi | Vietnamese | ar | Arabic | tr | Turkish |
| hi | Hindi | hu | Hungarian | uk | Ukrainian |
| id | Indonesian | ms | Malay | kk | Kazakh |
| cs | Czech | pl | Polish | nl | Dutch |
| sv | Swedish | he | Hebrew | bn | Bengali |
| fa | Persian | fil | Filipino | ur | Urdu |
| yue | Cantonese | | | | |
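As a combined illustration of these codes, a hedged VTV sketch translating a Japanese video into Simplified Chinese (the file path and the zh-CN voice role are reused from the examples above):
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code ja --target_language_code zh-cn --voice_role "zh-CN-YunyangNeural" --subtitle_type 1
```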
