pyVideoTrans Command Line Tool User Manual

This document details the command-line (CLI) usage of the pyVideoTrans video translation tool. The tool supports four core functions: Speech-to-Text (STT), Text-to-Speech (TTS), Subtitle Translation (STS), and full Video-to-Video (VTV) translation.


⚠️ Important Notes: Before You Start

  1. Execution Method: This document is based on running the tool via uv run cli.py.
  2. File Paths: The --name parameter must use the absolute path to the file.
  3. Path Quoting: If the path contains spaces, you must wrap it in straight (ASCII) double quotes "", not curly or full-width quotes.
    • ✅ Correct: --name "C:\My Videos\test file.mp4"
    • ❌ Incorrect: --name C:\My Videos\test file.mp4
  4. Getting Voice Roles: Please select the corresponding TTS channel and target language in the software UI, then view the available voice roles there. Due to space and readability constraints, they are not listed exhaustively here.
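
The path rules above can also be handled from a script. The following is a minimal sketch, assuming the tool is launched from Python with the standard subprocess module; build_argv and run_cli are hypothetical helper names, not part of pyVideoTrans. Passing the arguments as a list means a path with spaces stays a single argument, so no manual quoting is needed.

```python
from __future__ import annotations

import os
import subprocess


def build_argv(task: str, media_path: str, *extra: str) -> list[str]:
    """Build an argument list for `uv run cli.py` (hypothetical helper)."""
    # --name must be an absolute path, so convert relative paths first.
    absolute = os.path.abspath(os.path.expanduser(media_path))
    return ["uv", "run", "cli.py", "--task", task, "--name", absolute, *extra]


def run_cli(argv: list[str]) -> int:
    """Run the command; assumes the working directory contains cli.py."""
    # A list with the default shell=False is passed to the program as-is,
    # so spaces inside the path do not split the argument.
    return subprocess.run(argv, check=False).returncode


argv = build_argv("stt", "My Videos/test file.mp4", "--cuda")
print(argv)
```

Calling run_cli(argv) would then start the task; it is left out here so the sketch can be read without the tool installed.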

1. Speech-to-Text (STT)

Extract speech from a video or audio file and generate an SRT subtitle file.

Basic Command Format

```bash
uv run cli.py --task stt --name "Absolute file path" [Optional Parameters]
```

Parameter Details

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --task | str | Yes | stt | Task type identifier |
| --name | str | Yes | - | Absolute path to the audio/video file |
| --recogn_type | int | No | 0 | Speech recognition channel ID (see Appendix 1) |
| --model_name | str | No | tiny | Model size (tiny, small, base, medium, large-v2, etc. Please check the specific model names available in the software UI based on the selected recognition channel) |
| --detect_language | str | No | auto | Source audio language code, defaults to auto-detection |
| --cuda | bool | No | False | Add this flag to enable GPU (CUDA) acceleration |
| --remove_noise | bool | No | False | Add this flag to enable audio noise reduction |
| --enable_diariz | bool | No | False | Add this flag to enable speaker diarization (distinguish different speakers) |
| --nums_diariz | int | No | -1 | Specify the number of speakers (only valid when speaker diarization is enabled) |
| --fix_punc | bool | No | False | Add this flag to attempt punctuation restoration |

Usage Examples

Transcribe using Faster-Whisper (tiny model):

```bash
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name tiny
```

Use GPU acceleration and specify source language as Chinese:

```bash
uv run cli.py --task stt --name "D:\videos\demo.mp4" --detect_language zh-cn --cuda
```
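
When STT commands are generated from a script, the boolean options need slightly different handling from the value options: they are bare flags that are either present or absent. The sketch below shows one way to assemble the argument list; stt_args is a hypothetical helper, and it uses only the parameters and defaults listed in the table above.

```python
from __future__ import annotations


def stt_args(
    name: str,
    recogn_type: int = 0,
    model_name: str = "tiny",
    detect_language: str = "auto",
    cuda: bool = False,
    remove_noise: bool = False,
    enable_diariz: bool = False,
    nums_diariz: int = -1,
    fix_punc: bool = False,
) -> list[str]:
    """Assemble an STT argument list (hypothetical helper)."""
    argv = [
        "uv", "run", "cli.py",
        "--task", "stt",
        "--name", name,
        "--recogn_type", str(recogn_type),
        "--model_name", model_name,
        "--detect_language", detect_language,
    ]
    # Boolean options are bare flags: present means enabled.
    for flag, enabled in (
        ("--cuda", cuda),
        ("--remove_noise", remove_noise),
        ("--enable_diariz", enable_diariz),
        ("--fix_punc", fix_punc),
    ):
        if enabled:
            argv.append(flag)
    # The speaker count only applies when diarization is enabled.
    if enable_diariz and nums_diariz > 0:
        argv += ["--nums_diariz", str(nums_diariz)]
    return argv


# Mirrors the GPU example above, with the defaults written out explicitly.
argv = stt_args(r"D:\videos\demo.mp4", detect_language="zh-cn", cuda=True)
print(" ".join(argv))
```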

2. Text-to-Speech (TTS)

Convert an SRT subtitle file or text into speech audio.

Basic Command Format

```bash
uv run cli.py --task tts --name "Absolute SRT file path" --voice_role "Voice Name" [Optional Parameters]
```

Parameter Details

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --task | str | Yes | tts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --tts_type | int | No | 0 | Dubbing channel ID (see Appendix 2) |
| --voice_role | str | Yes | - | Voice role name (Please check the specific available role names in the software UI based on the selected TTS channel) |
| --voice_rate | str | No | +0% | Speech rate adjustment (e.g., +10%, -10%) |
| --volume | str | No | +0% | Volume adjustment |
| --pitch | str | No | +0Hz | Pitch adjustment |
| --target_language_code | str | No | - | Target language code (required for some TTS engines) |
| --voice_autorate | bool | No | False | Automatically speed up audio to align with subtitle timestamps |
| --align_sub_audio | bool | No | False | Force modification of subtitle timestamps to fit audio length |

Usage Examples

Dubbing using Edge-TTS (Chinese male voice):

```bash
uv run cli.py --task tts --name "C:\subs\movie.srt" --tts_type 0 --voice_role "zh-CN-YunyangNeural" --target_language_code zh-cn
```
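
The rate, volume, and pitch values are strings with an explicit sign and a unit (+10%, -10%, +0Hz). A small formatter avoids hand-written mistakes such as a missing plus sign. This is a sketch only; signed and tts_args are hypothetical helper names, and the flags used are the ones from the table above.

```python
from __future__ import annotations


def signed(value: int, unit: str) -> str:
    """Format an adjustment with an explicit sign, e.g. 10 becomes '+10%'."""
    return f"{value:+d}{unit}"


def tts_args(
    name: str,
    voice_role: str,
    tts_type: int = 0,
    rate: int = 0,
    volume: int = 0,
    pitch: int = 0,
    target_language_code: str | None = None,
    voice_autorate: bool = False,
) -> list[str]:
    """Assemble a TTS argument list (hypothetical helper)."""
    argv = [
        "uv", "run", "cli.py",
        "--task", "tts",
        "--name", name,
        "--tts_type", str(tts_type),
        "--voice_role", voice_role,
        "--voice_rate", signed(rate, "%"),
        "--volume", signed(volume, "%"),
        "--pitch", signed(pitch, "Hz"),
    ]
    # Only some TTS engines need the target language code.
    if target_language_code:
        argv += ["--target_language_code", target_language_code]
    if voice_autorate:
        argv.append("--voice_autorate")
    return argv


argv = tts_args(
    r"C:\subs\movie.srt",
    "zh-CN-YunyangNeural",
    rate=10,
    target_language_code="zh-cn",
)
print(argv)
```

Whether a negative value such as -10% is accepted when written as a separate argument has not been checked against cli.py. If the parser mistakes it for an option, the --voice_rate=-10% form may be worth trying.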

3. Subtitle Translation (STS)

Translate an SRT subtitle file into another language.

Basic Command Format

```bash
uv run cli.py --task sts --name "Absolute SRT file path" --target_language_code "Target Language" [Optional Parameters]
```

Parameter Details

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --task | str | Yes | sts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --translate_type | int | No | 0 | Translation channel ID (see Appendix 3) |
| --target_language_code | str | Yes | - | Target language code (see Appendix 4) |
| --source_language_code | str | No | auto | Original language code |

Usage Examples

Translate subtitles to English (using Google Translate):

```bash
uv run cli.py --task sts --name "D:\subs\source.srt" --target_language_code en --translate_type 0
```
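
Because each invocation takes a single file through --name, translating a whole folder means issuing one command per subtitle file. The sketch below builds those commands; sts_commands is a hypothetical helper, and running the files one by one in a loop is an assumption about usage, not a documented batch mode.

```python
from __future__ import annotations

from pathlib import Path


def sts_commands(
    folder: str, target_language_code: str, translate_type: int = 0
) -> list[list[str]]:
    """Return one STS argument list per .srt file in `folder` (hypothetical helper)."""
    commands = []
    # resolve() makes every path absolute, as --name requires.
    for srt in sorted(Path(folder).resolve().glob("*.srt")):
        commands.append([
            "uv", "run", "cli.py",
            "--task", "sts",
            "--name", str(srt),
            "--target_language_code", target_language_code,
            "--translate_type", str(translate_type),
        ])
    return commands


# Each entry can be passed to subprocess.run(cmd) in turn.
for cmd in sts_commands(".", "en"):
    print(cmd)
```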

4. Video-to-Video Translation (VTV)

Full pipeline processing: Recognition -> Translation -> Dubbing -> Synthesis, directly producing a translated video.

Basic Command Format

```bash
uv run cli.py --task vtv --name "Video path" --source_language_code "Source Language" --target_language_code "Target Language" [Optional Parameters]
```

Parameter Details

VTV mode integrates parameters from all the above functions. Listed below are additional parameters not covered above.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --task | str | Yes | vtv | Task type identifier |
| --name | str | Yes | - | Absolute path to the video file |
| --source_language_code | str | Yes | - | Spoken language (Cannot be set to auto in VTV mode) |
| --target_language_code | str | Yes | - | Target language |
| --subtitle_type | int | No | 1 | Subtitle embedding method (see explanation below) |
| --video_autorate | bool | No | False | Automatically slow down video frames to align with dubbed audio |
| --is_separate | bool | No | False | Whether to separate vocals from background sound (preserve background audio) |
| --recogn2pass | bool | No | False | Whether to perform a second pass of speech recognition (improves accuracy) |
| --clear_cache | bool | No | True | Whether to clean temporary files (default: clean) |
| --no-clear-cache | flag | No | - | Add this flag to NOT clean the cache |

Regarding values for --subtitle_type:

  • 0: Do not embed subtitles
  • 1: Hard subtitles (default)
  • 2: Soft subtitles
  • 3: Hard bilingual subtitles
  • 4: Soft bilingual subtitles
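
In scripts, the numeric values above are easier to read when given names. The following sketch maps them to an enumeration; SubtitleType and subtitle_args are hypothetical names chosen for illustration, while the numbers are taken from the list above.

```python
from __future__ import annotations

from enum import IntEnum


class SubtitleType(IntEnum):
    """Named values for --subtitle_type (names are illustrative)."""
    NONE = 0
    HARD = 1            # the default
    SOFT = 2
    HARD_BILINGUAL = 3
    SOFT_BILINGUAL = 4


def subtitle_args(kind: SubtitleType) -> list[str]:
    """Return the flag and value to append to a VTV argument list."""
    return ["--subtitle_type", str(int(kind))]


print(subtitle_args(SubtitleType.HARD_BILINGUAL))
```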

Usage Examples

Translate Chinese video to English, preserve background audio, embed hard subtitles, use GPU acceleration:

```bash
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural" --is_separate --cuda --subtitle_type 1
```
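
Two VTV rules are easy to miss when generating commands: the source language cannot be auto, and the cache is cleaned unless --no-clear-cache is passed. The sketch below encodes both; vtv_args is a hypothetical helper, and checking the source language before launch is a convenience added here, not behavior of the tool itself.

```python
from __future__ import annotations


def vtv_args(
    name: str,
    source_language_code: str,
    target_language_code: str,
    voice_role: str | None = None,
    subtitle_type: int = 1,
    is_separate: bool = False,
    cuda: bool = False,
    clear_cache: bool = True,
) -> list[str]:
    """Assemble a VTV argument list (hypothetical helper)."""
    # VTV mode requires an explicit source language.
    if source_language_code.lower() == "auto":
        raise ValueError("--source_language_code cannot be 'auto' in VTV mode")
    argv = [
        "uv", "run", "cli.py",
        "--task", "vtv",
        "--name", name,
        "--source_language_code", source_language_code,
        "--target_language_code", target_language_code,
    ]
    if voice_role:
        argv += ["--voice_role", voice_role]
    if is_separate:
        argv.append("--is_separate")
    if cuda:
        argv.append("--cuda")
    argv += ["--subtitle_type", str(subtitle_type)]
    # Cleaning is the default, so only the opt-out flag is ever emitted.
    if not clear_cache:
        argv.append("--no-clear-cache")
    return argv


argv = vtv_args(
    r"E:\movies\clip.mp4", "zh-cn", "en",
    voice_role="en-US-GuyNeural", is_separate=True, cuda=True,
)
print(" ".join(argv))
```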

Appendix: Channel and Code Reference Tables

Appendix 1: Speech Recognition Channel List (--recogn_type)

Each ID is the position of the channel in the software UI's speech recognition channel list, counting from 0.

| ID | Channel Name | Notes |
| --- | --- | --- |
| 0 | faster-whisper | Local (Recommended, fast) |
| 1 | openai-whisper | Local (Official version) |
| 2 | Alibaba FunASR | Local |
| 3 | Huggingface_ASR | |
| 4 | OpenAI Speech Recognition API | Requires API Key |
| 5 | Gemini Large Model Recognition | |
| 6 | Alibaba Bailian Qwen3-ASR | |
| 7 | ByteDance Speech Large Model (Fast Version) | |
| 8 | Zhipu AI GLM-ASR | |
| 9 | Deepgram.com | |
| 10 | ByteDance Audio/Video Subtitle Generation | |
| 11 | Parakeet-tdt | |
| 12 | Whisper.cpp | |
| 13 | Faster-Whisper-XXL.exe | |
| 14 | WhisperX | |
| 15 | 302.AI | |
| 16 | ElevenLabs.io | |
| 17 | Google Recognition API | Free |
| 18 | STT Speech Recognition | Local |
| 19 | Custom Recognition API | |

Supported values for --model_name (these apply only to the faster-whisper and openai-whisper channels; for other channels, check the model names shown in the software UI): tiny, small, base, medium, large-v3-turbo, large-v1, large-v2, large-v3
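
A script can check the model name before launching a long transcription. The sketch below validates only channels 0 and 1, since the list above covers just those two; check_model is a hypothetical helper, and names for other channels are passed through unchecked.

```python
from __future__ import annotations

# Model names listed above for faster-whisper (0) and openai-whisper (1).
WHISPER_MODELS = (
    "tiny", "small", "base", "medium",
    "large-v3-turbo", "large-v1", "large-v2", "large-v3",
)
WHISPER_CHANNELS = {0, 1}


def check_model(recogn_type: int, model_name: str) -> str:
    """Return model_name, raising ValueError if it is known to be invalid."""
    if recogn_type in WHISPER_CHANNELS and model_name not in WHISPER_MODELS:
        raise ValueError(
            f"{model_name!r} is not a listed model for channel {recogn_type}; "
            f"expected one of {', '.join(WHISPER_MODELS)}"
        )
    # Other channels define their own model names, so nothing is checked here.
    return model_name


print(check_model(0, "large-v3"))
```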

Appendix 2: Text-to-Speech Channel List (--tts_type)

Each ID is the position of the channel in the software UI's TTS channel list, counting from 0.

After selecting the target language and TTS channel, the specific available role names will be displayed in the software UI.

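| ID | Channel Name | Notes |
| --- | --- | --- |
| 0 | Edge-TTS | Free (Recommended) |
| 1 | piper TTS | Local |
| 2 | VITS | Local |
| 3 | Qwen3 TTS | |
| 4 | Doubao TTS Model 2.0 | |
| 5 | ByteDance TTS | |
| 6 | Zhipu AI GLM-TTS | |
| 7 | GPT-SoVITS | Local |
| 8 | F5-TTS | Local |
| 9 | Index TTS | Local |
| 10 | CosyVoice | Local |
| 11 | Supertonic | Local |
| 12 | Minimaxi TTS | |
| 13 | OpenAI TTS | |
| 14 | 302.AI | |
| 15 | Elevenlabs.io | |
| 16 | Azure-TTS | |
| 17 | Gemini TTS | |
| 18 | VoxCPM TTS | Local |
| 19 | ChatterBox TTS | Local |
| 20 | ChatTTS | Local |
| 21 | Spark TTS | Local |
| 22 | Dia TTS | Local |
| 23 | kokoro TTS | Local |
| 24 | clone-voice | Local |
| 25 | Fish TTS | Local |
| 26 | Google TTS | Free |
| 27 | Custom TTS API | |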

Appendix 3: Translation Channel List (--translate_type)

Each ID is the position of the channel in the software UI's translation channel list, counting from 0.

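| ID | Channel Name | Notes |
| --- | --- | --- |
| 0 | Google | Free (Default) |
| 1 | Microsoft | Free |
| 2 | M2M100 | Local |
| 3 | OpenAI ChatGPT | |
| 4 | DeepSeek | |
| 5 | Gemini AI | |
| 6 | Zhipu AI | |
| 7 | AzureGPT AI | |
| 8 | Compatible AI/Local Model | |
| 9 | OpenRouter | |
| 10 | SillyTavern | |
| 11 | 302.AI | |
| 12 | Alibaba Bailian | |
| 13 | ByteDance Large Model | |
| 14 | Tencent Translation | |
| 15 | Baidu Translation | |
| 16 | DeepL | |
| 17 | DeepLx | |
| 18 | Alibaba Machine Translation | |
| 19 | OTT | Local |
| 20 | LibreTranslate | Local |
| 21 | MyMemory API | Free |
| 22 | Custom Translation API | |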

Appendix 4: Language Code List

Applicable to --source_language_code and --target_language_code

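| Code | Language | Code | Language | Code | Language |
| --- | --- | --- | --- | --- | --- |
| en | English | zh-cn | Simplified Chinese | zh-tw | Traditional Chinese |
| fr | French | de | German | ja | Japanese |
| ko | Korean | ru | Russian | es | Spanish |
| th | Thai | it | Italian | pt | Portuguese |
| vi | Vietnamese | ar | Arabic | tr | Turkish |
| hi | Hindi | hu | Hungarian | uk | Ukrainian |
| id | Indonesian | ms | Malay | kk | Kazakh |
| cs | Czech | pl | Polish | nl | Dutch |
| sv | Swedish | he | Hebrew | bn | Bengali |
| fa | Persian | fil | Filipino | ur | Urdu |
| yue | Cantonese | | | | |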
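
For scripts that accept a language from user input, the table above can be turned into a lookup that takes either a code or an English name. This is a sketch under that assumption; LANGUAGES and language_code are hypothetical names, and the entries are copied from the table.

```python
from __future__ import annotations

# Codes and names copied from the table above.
LANGUAGES = {
    "en": "English", "zh-cn": "Simplified Chinese", "zh-tw": "Traditional Chinese",
    "fr": "French", "de": "German", "ja": "Japanese",
    "ko": "Korean", "ru": "Russian", "es": "Spanish",
    "th": "Thai", "it": "Italian", "pt": "Portuguese",
    "vi": "Vietnamese", "ar": "Arabic", "tr": "Turkish",
    "hi": "Hindi", "hu": "Hungarian", "uk": "Ukrainian",
    "id": "Indonesian", "ms": "Malay", "kk": "Kazakh",
    "cs": "Czech", "pl": "Polish", "nl": "Dutch",
    "sv": "Swedish", "he": "Hebrew", "bn": "Bengali",
    "fa": "Persian", "fil": "Filipino", "ur": "Urdu",
    "yue": "Cantonese",
}
_BY_NAME = {name.lower(): code for code, name in LANGUAGES.items()}


def language_code(value: str) -> str:
    """Resolve a code or English name (case-insensitive) to a listed code."""
    key = value.strip().lower()
    if key in LANGUAGES:
        return key
    if key in _BY_NAME:
        return _BY_NAME[key]
    raise ValueError(f"unknown language: {value!r}")


print(language_code("Japanese"), language_code("ZH-CN"))
```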