pyVideoTrans Command Line Tool User Manual
This document details the command-line (CLI) usage of the pyVideoTrans video translation tool. The tool supports four core functions: Speech-to-Text (STT), Text-to-Speech (TTS), Subtitle Translation (STS), and full Video-to-Video (VTV) translation.
⚠️ Important Notes: Before You Start
- Execution Method: This document is based on running the tool via uv run cli.py.
- File Paths: The --name parameter must use the absolute path to the file.
- Path Quoting: If the path contains spaces, you must wrap the path in straight double quotes "".
  - ✅ Correct: --name "C:\My Videos\test file.mp4"
  - ❌ Incorrect: --name C:\My Videos\test file.mp4
- Getting Voice Roles: Please select the corresponding TTS channel and target language in the software UI, then view the available voice roles there. Due to space and readability constraints, they are not listed exhaustively here.
1. Speech-to-Text (STT)
Extract speech from a video or audio file and generate an SRT subtitle file.
Basic Command Format
```
uv run cli.py --task stt --name "Absolute file path" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | stt | Task type identifier |
| --name | str | Yes | - | Absolute path to the audio/video file |
| --recogn_type | int | No | 0 | Speech recognition channel ID (see Appendix 1) |
| --model_name | str | No | tiny | Model size (tiny, small, base, medium, large-v2, etc.; check the specific model names available in the software UI for the selected recognition channel) |
| --detect_language | str | No | auto | Source audio language code, defaults to auto-detection |
| --cuda | bool | No | False | Add this flag to enable GPU (CUDA) acceleration |
| --remove_noise | bool | No | False | Add this flag to enable audio noise reduction |
| --enable_diariz | bool | No | False | Add this flag to enable speaker diarization (distinguish different speakers) |
| --nums_diariz | int | No | -1 | Specify the number of speakers (only valid when speaker diarization is enabled) |
| --fix_punc | bool | No | False | Add this flag to attempt punctuation restoration |
Usage Examples
Transcribe using Faster-Whisper (tiny model):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name tiny
```
Use GPU acceleration and specify source language as Chinese:
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --detect_language zh-cn --cuda
```
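Transcribe a recording with noise reduction and speaker diarization enabled (a hedged sketch that combines the flags documented in the table above; the medium model and the speaker count of 2 are illustrative choices):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name medium --remove_noise --enable_diariz --nums_diariz 2 --cuda
```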
2. Text-to-Speech (TTS)
Convert an SRT subtitle file or text into speech audio.
Basic Command Format
```
uv run cli.py --task tts --name "Absolute SRT file path" --voice_role "Voice Name" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | tts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --tts_type | int | No | 0 | Dubbing channel ID (see Appendix 2) |
| --voice_role | str | Yes | - | Voice role name (check the available role names in the software UI for the selected TTS channel) |
| --voice_rate | str | No | +0% | Speech rate adjustment (e.g., +10%, -10%) |
| --volume | str | No | +0% | Volume adjustment |
| --pitch | str | No | +0Hz | Pitch adjustment |
| --target_language_code | str | No | - | Target language code (required for some TTS engines) |
| --voice_autorate | bool | No | False | Automatically speed up audio to align with subtitle timestamps |
| --align_sub_audio | bool | No | False | Force modification of subtitle timestamps to fit audio length |
Usage Examples
Dubbing using Edge-TTS (Chinese male voice):
```
uv run cli.py --task tts --name "C:\subs\movie.srt" --tts_type 0 --voice_role "zh-CN-YunyangNeural" --target_language_code zh-cn
```
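Adjust the delivery of the same dubbing job (a hedged sketch combining the rate, pitch, and auto-rate parameters documented in the table above; the +10% and +2Hz values are illustrative and follow the formats shown there):
```
uv run cli.py --task tts --name "C:\subs\movie.srt" --tts_type 0 --voice_role "zh-CN-YunyangNeural" --voice_rate +10% --pitch +2Hz --voice_autorate --target_language_code zh-cn
```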
3. Subtitle Translation (STS)
Translate an SRT subtitle file into another language.
Basic Command Format
```
uv run cli.py --task sts --name "Absolute SRT file path" --target_language_code "Target Language" [Optional Parameters]
```
Parameter Details
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | sts | Task type identifier |
| --name | str | Yes | - | Absolute path to the SRT subtitle file |
| --translate_type | int | No | 0 | Translation channel ID (see Appendix 3) |
| --target_language_code | str | Yes | - | Target language code (see Appendix 4) |
| --source_language_code | str | No | auto | Original language code |
Usage Examples
Translate subtitles to English (using Google Translate):
```
uv run cli.py --task sts --name "D:\subs\source.srt" --target_language_code en --translate_type 0
```
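Translate subtitles with the source language stated explicitly (a hedged sketch assuming a Simplified Chinese source file; channel 1 is the Microsoft channel from Appendix 3, and the language codes follow Appendix 4):
```
uv run cli.py --task sts --name "D:\subs\source.srt" --source_language_code zh-cn --target_language_code ja --translate_type 1
```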
4. Video-to-Video Translation (VTV)
Full pipeline processing: Recognition -> Translation -> Dubbing -> Synthesis, directly producing a translated video.
Basic Command Format
```
uv run cli.py --task vtv --name "Video path" --source_language_code "Source Language" --target_language_code "Target Language" [Optional Parameters]
```
Parameter Details
VTV mode integrates parameters from all the above functions. Listed below are additional parameters not covered above.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --task | str | Yes | vtv | Task type identifier |
| --name | str | Yes | - | Absolute path to the video file |
| --source_language_code | str | Yes | - | Spoken language (cannot be set to auto in VTV mode) |
| --target_language_code | str | Yes | - | Target language |
| --subtitle_type | int | No | 1 | Subtitle embedding method (see explanation below) |
| --video_autorate | bool | No | False | Automatically slow down video frames to align with dubbed audio |
| --is_separate | bool | No | False | Whether to separate vocals from background sound (preserve background audio) |
| --recogn2pass | bool | No | False | Whether to perform a second pass of speech recognition (improves accuracy) |
| --clear_cache | bool | No | True | Whether to clean temporary files (default: clean) |
| --no-clear-cache | flag | No | - | Add this flag to NOT clean the cache |
Regarding values for --subtitle_type:
- 0: Do not embed subtitles
- 1: Hard subtitles (default)
- 2: Soft subtitles
- 3: Hard bilingual subtitles
- 4: Soft bilingual subtitles
Usage Examples
Translate Chinese video to English, preserve background audio, embed hard subtitles, use GPU acceleration:
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural" --is_separate --cuda --subtitle_type 1
```
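Run the same pipeline but embed soft bilingual subtitles and keep the temporary files (a hedged sketch combining the --subtitle_type values and the --no-clear-cache flag documented above):
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural" --subtitle_type 4 --no-clear-cache --cuda
```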
Appendix: Channel and Code Reference Tables
Appendix 1: Speech Recognition Channel List (--recogn_type)
The ID corresponds to the position of the speech recognition channel in the software UI, numbered from 0.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | faster-whisper | Local (Recommended, fast) |
| 1 | openai-whisper | Local (Official version) |
| 2 | Alibaba FunASR | Local |
| 3 | Huggingface_ASR | |
| 4 | OpenAI Speech Recognition API | Requires API Key |
| 5 | Gemini Large Model Recognition | |
| 6 | Alibaba Bailian Qwen3-ASR | |
| 7 | ByteDance Speech Large Model (Fast Version) | |
| 8 | Zhipu AI GLM-ASR | |
| 9 | Deepgram.com | |
| 10 | ByteDance Audio/Video Subtitle Generation | |
| 11 | Parakeet-tdt | |
| 12 | Whisper.cpp | |
| 13 | Faster-Whisper-XXL.exe | |
| 14 | WhisperX | |
| 15 | 302.AI | |
| 16 | ElevenLabs.io | |
| 17 | Google Recognition API | Free |
| 18 | STT Speech Recognition | Local |
| 19 | Custom Recognition API | |
Supported --model_name values (applicable only to the faster-whisper and openai-whisper channels; for other channels, check the software UI): tiny, small, base, medium, large-v1, large-v2, large-v3, large-v3-turbo
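For reference, a larger model is selected the same way (a hedged sketch; large-v3 is one of the names listed above and generally needs more disk space and VRAM than tiny):
```
uv run cli.py --task stt --name "D:\videos\demo.mp4" --recogn_type 0 --model_name large-v3 --cuda
```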
Appendix 2: Text-to-Speech Channel List (--tts_type)
The ID corresponds to the position of the TTS channel in the software UI, numbered from 0.
After selecting the target language and TTS channel, the specific available role names will be displayed in the software UI.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | Edge-TTS | Free (Recommended) |
| 1 | piper TTS | Local |
| 2 | VITS | Local |
| 3 | Qwen3 TTS | |
| 4 | Doubao TTS Model 2.0 | |
| 5 | ByteDance TTS | |
| 6 | Zhipu AI GLM-TTS | |
| 7 | GPT-SoVITS | Local |
| 8 | F5-TTS | Local |
| 9 | Index TTS | Local |
| 10 | CosyVoice | Local |
| 11 | Supertonic | Local |
| 12 | Minimaxi TTS | |
| 13 | OpenAI TTS | |
| 14 | 302.AI | |
| 15 | Elevenlabs.io | |
| 16 | Azure-TTS | |
| 17 | Gemini TTS | |
| 18 | VoxCPM TTS | Local |
| 19 | ChatterBox TTS | Local |
| 20 | ChatTTS | Local |
| 21 | Spark TTS | Local |
| 22 | Dia TTS | Local |
| 23 | kokoro TTS | Local |
| 24 | clone-voice | Local |
| 25 | Fish TTS | Local |
| 26 | Google TTS | Free |
| 27 | Custom TTS API | |
Appendix 3: Translation Channel List (--translate_type)
The ID corresponds to the position of the translation channel in the software UI, numbered from 0.
| ID | Channel Name | Notes |
|---|---|---|
| 0 | Google | Free (Default) |
| 1 | Microsoft | Free |
| 2 | M2M100 | Local |
| 3 | OpenAI ChatGPT | |
| 4 | DeepSeek | |
| 5 | Gemini AI | |
| 6 | Zhipu AI | |
| 7 | AzureGPT AI | |
| 8 | Compatible AI/Local Model | |
| 9 | OpenRouter | |
| 10 | SillyTavern | |
| 11 | 302.AI | |
| 12 | Alibaba Bailian | |
| 13 | ByteDance Large Model | |
| 14 | Tencent Translation | |
| 15 | Baidu Translation | |
| 16 | DeepL | |
| 17 | DeepLx | |
| 18 | Alibaba Machine Translation | |
| 19 | OTT | Local |
| 20 | LibreTranslate | Local |
| 21 | MyMemory API | Free |
| 22 | Custom Translation API | |
Appendix 4: Language Code List
Applicable to --source_language_code and --target_language_code
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | zh-cn | Simplified Chinese | zh-tw | Traditional Chinese |
| fr | French | de | German | ja | Japanese |
| ko | Korean | ru | Russian | es | Spanish |
| th | Thai | it | Italian | pt | Portuguese |
| vi | Vietnamese | ar | Arabic | tr | Turkish |
| hi | Hindi | hu | Hungarian | uk | Ukrainian |
| id | Indonesian | ms | Malay | kk | Kazakh |
| cs | Czech | pl | Polish | nl | Dutch |
| sv | Swedish | he | Hebrew | bn | Bengali |
| fa | Persian | fil | Filipino | ur | Urdu |
| yue | Cantonese | | | | |
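As a combined illustration of these codes, a hedged VTV sketch translating a Japanese video into Simplified Chinese (the file path and the zh-CN voice role are reused from the examples above):
```
uv run cli.py --task vtv --name "E:\movies\clip.mp4" --source_language_code ja --target_language_code zh-cn --voice_role "zh-CN-YunyangNeural" --subtitle_type 1
```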
