Explanation of Advanced Settings Options

From the top menu, open Tools/Options, then Advanced Options, to customize parameters for finer control.

【General Settings】

  • Software Interface Language: Set the language of the software interface. The software needs to be restarted after modification.

  • Single Video Interactive Translation Pause Countdown: How long, in seconds, the countdown runs at each pause during single video interactive translation. Set to 0 to skip the editing window.

  • Independent Function Output Directory: Used to set the output location for features like batch speech transcription, batch subtitle dubbing, and batch SRT subtitle translation. This is not the location for video translation results. The default is the output folder under the software installation directory.

  • Retry Count on Failure: Number of retry attempts after a failure (for errors that are potentially recoverable through retries).

  • LLM Re-segmentation Batch Subtitle Lines: The number of subtitle lines sent per batch when re-segmenting with a Large Language Model (LLM). Larger batches generally give better segmentation, and sending all subtitles at once is ideal, but each batch must fit within the model's context window and maximum output tokens (max_token). Input that is too long may exceed these limits and fail. Default is 20 subtitle lines.

  • AI Channel for LLM Re-segmentation: The AI channel used for LLM re-segmentation. Currently supports OpenAI-ChatGPT or DeepSeek channels.

  • Disable Desktop Notifications: Disables desktop notifications upon task completion or failure.

  • Batch Video Translation Batch Size: When translating videos in batch, set how many videos are processed simultaneously per batch. The default is 0, meaning no limit.

  • Show All Parameters on Main Interface?: To avoid clutter, the main interface hides most parameters by default. Check this to switch to displaying all parameters.

  • Simultaneous CPU Tasks [Restart Required]: Maximum number of simultaneous CPU tasks. A higher number is faster but may consume more memory. The maximum should not exceed the number of CPU cores. Changes take effect after saving and restarting.

  • Simultaneous GPU Tasks [Restart Required]: Number of simultaneous GPU tasks. Set to 1 unless you have multiple GPUs or a single GPU with >24GB VRAM. Changes take effect after saving and restarting.

  • Multi-GPU Mode [Restart Required]: Enable this if you have multiple GPUs. You can simultaneously increase the above Simultaneous GPU Tasks option to 2 or your number of GPUs. Changes take effect after saving and restarting.
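
As an illustration of the LLM Re-segmentation Batch Subtitle Lines setting, the following minimal Python sketch splits a list of subtitle lines into batches of at most 20. The function name batch_subtitles and the sample data are hypothetical and are not taken from the software itself.

```python
from typing import Iterator, List


def batch_subtitles(lines: List[str], batch_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` subtitle lines.

    Hypothetical helper: it only illustrates the batching idea and is
    not the software's own implementation.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


# 45 subtitle lines with the default batch size of 20.
subtitles = [f"line {i}" for i in range(1, 46)]
batches = list(batch_subtitles(subtitles))
print([len(batch) for batch in batches])  # [20, 20, 5]
```

A larger batch_size means fewer requests and more context per request, at the cost of a longer prompt and a longer expected reply.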

【Video Output Control】

  • Video Output Quality Control: Controls quality loss during video transcoding, on a scale from 0 to 51. 0 is lossless but produces a very large file; 51 gives the poorest quality and the smallest file.

  • Output Video Compression Ratio: Selects the encoder preset, which balances encoding speed against compression. Options include: ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow. The presets run from fastest to slowest encoding; slower presets compress more and produce smaller files.

  • 264/265 Encoding: Choose between libx264 and libx265 encoding. libx264 offers better compatibility; libx265 compresses more efficiently, giving smaller files at comparable quality.

  • Output Video Format (mp4/mkv): The output video format (mp4 or mkv).

  • Force Software Encoding for Video?: Force ffmpeg to use software encoding/decoding? (Slower but more compatible and less prone to errors; hardware encoding is preferred by default).

  • Video Composition CUDA Hardware Decoding: Force the use of CUDA for video decoding during the final video composition step. Faster but more prone to errors.

  • Custom ffmpeg Command Arguments: Custom ffmpeg command arguments. These will be added before the output file position. For example: -bf 7 -b_ref_mode middle
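
To show how the settings in this section could fit together, here is a minimal Python sketch that assembles an ffmpeg argument list. The function name build_ffmpeg_args and the default values (23, medium, libx264) are illustrative assumptions, not the software's actual code or defaults. The sketch only builds the list and does not run ffmpeg.

```python
import shlex
from typing import List

PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow"]


def build_ffmpeg_args(src: str, dst: str, crf: int = 23, preset: str = "medium",
                      codec: str = "libx264", extra: str = "") -> List[str]:
    """Assemble an ffmpeg command line (hypothetical helper, assumed defaults)."""
    if not 0 <= crf <= 51:
        raise ValueError("quality value must be between 0 and 51")
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    if codec not in ("libx264", "libx265"):
        raise ValueError(f"unsupported codec: {codec}")
    args = ["ffmpeg", "-y", "-i", src,
            "-c:v", codec, "-crf", str(crf), "-preset", preset]
    # Custom arguments are placed immediately before the output file.
    args += shlex.split(extra)
    args.append(dst)
    return args


print(build_ffmpeg_args("in.mp4", "out.mp4", extra="-bf 7 -b_ref_mode middle"))
```

Whether ffmpeg accepts a particular custom argument depends on the encoder that ends up being used, so custom arguments are worth testing on a short clip first.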

【Speech Recognition Parameters】

  • Select VAD: Select the Voice Activity Detection (VAD) method to use.

  • Speech Threshold: The minimum probability for an audio segment to be considered speech. VAD calculates a speech probability for each segment. Segments exceeding this threshold are treated as speech; others are treated as silence or noise. A lower value is more sensitive but may mistake noise for speech.

  • Non-Speech Threshold: Decreasing this can reduce hallucinations but may also miss text.

  • Maximum Speech Duration (Seconds): Limits the maximum length of a single speech segment. Forces splitting if this duration is exceeded. Enter a number in seconds.

  • Minimum Speech Duration (Milliseconds): The minimum duration of a speech segment, in milliseconds. If a subtitle segment is shorter than this value, an attempt is made to merge it with an adjacent subtitle.

  • Silence Partition Duration (Milliseconds): The duration of silence required after speech ends to trigger splitting into a new segment. Enter a number in ms. Splitting only occurs at silent segments longer than this value.

  • Merge Short Subtitles with Neighbors: Short subtitle merging is enabled only when this option is selected.

  • Pre-segment Audio for Whisper?: Whether to pre-segment the audio into sentence fragments before sending it to the whisper model for recognition. Check this if you use the clone dubbing feature, and set Minimum Speech Duration to 3000 (milliseconds) and Maximum Speech Duration to 10 (seconds) to improve voice cloning reliability.

  • Speaker Diarization Model: The model used for speaker diarization. The default built-in model supports Chinese and English. Selecting pyannote requires a token from https://huggingface.co and acceptance of the pyannote organization's license agreement.

    Tutorial: https://pvt9.com/shuohuaren

  • Huggingface Token: Fill in your token from huggingface.co. Required for using pyannote. See tutorial at: https://pvt9.com/shuohuaren

  • CUDA Data Type: The CUDA data type in faster mode. int8 consumes fewer resources and is faster but has lower precision; float32 consumes more resources, is slower, but has higher precision; float16 is suitable for GPU acceleration. default uses the system default.

  • Recognition Accuracy (beam_size): Adjusts accuracy during subtitle recognition (range 1-5). 1 uses the least VRAM, 5 uses the most.

  • Recognition Accuracy (best_of): Adjusts accuracy during subtitle recognition (range 1-5). 1 uses the least VRAM, 5 uses the most.

  • Enable Contextual Awareness: Enabling this may consume more GPU resources but yields better results, though it can increase the chance of repetitions or hallucinations.

  • Repetition Penalty: Increasing this value helps reduce repetitions.

  • Text Compression Rate: Decreasing this value helps reduce repetitions.

  • Sampling Temperature: The sampling temperature used during decoding. Lower values make the output more deterministic.

  • Hotwords: Tell the model which words might appear. Use commas to separate multiple words.

  • Faster-Whisper Model: List of model names for faster-whisper, separated by English (half-width) commas.

  • Whisper.cpp Model: List of model names for whisper.cpp, separated by English (half-width) commas.

  • Gemini Speech Recognition Batch Slices: The number of audio slices sent per batch when using Gemini for speech recognition. Larger values yield better results but increase the failure rate.

  • Convert Traditional Chinese to Simplified Chinese Subtitles: Force conversion of recognized traditional Chinese subtitles to simplified Chinese.

  • Delete Trailing Punctuation in Subtitles?: Delete trailing punctuation marks in subtitle text.
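
The interaction between Speech Threshold, Silence Partition Duration, and Maximum Speech Duration can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the input is a list of per-frame speech probabilities with a fixed frame length, and the function name split_speech and its defaults are hypothetical. Real VAD implementations add padding, smoothing, and other refinements that are omitted here.

```python
from typing import List, Tuple


def split_speech(probs: List[float], frame_ms: int = 100, threshold: float = 0.5,
                 min_silence_ms: int = 200, max_speech_s: float = 10.0
                 ) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame probabilities."""
    max_ms = max_speech_s * 1000
    segments: List[Tuple[int, int]] = []
    start = None      # start time of the currently open segment, if any
    last_end = 0      # end time of the most recent speech frame
    silence_ms = 0    # silence accumulated since the last speech frame
    for i, prob in enumerate(probs):
        frame_start = i * frame_ms
        if prob > threshold:                      # frame counts as speech
            if start is None:
                start = frame_start
            silence_ms = 0
            last_end = frame_start + frame_ms
            if last_end - start >= max_ms:        # forced split at maximum length
                segments.append((start, last_end))
                start = None
        elif start is not None:                   # silence inside an open segment
            silence_ms += frame_ms
            if silence_ms > min_silence_ms:       # silence long enough to split
                segments.append((start, last_end))
                start = None
                silence_ms = 0
    if start is not None:                         # close the trailing segment
        segments.append((start, last_end))
    return segments


probs = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9]
print(split_speech(probs))  # [(0, 400), (700, 800)]
```

In the sample, the single silent frame at 200 ms is too short to trigger a split, while the 300 ms of silence starting at 400 ms is long enough, so a new segment begins at 700 ms.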

【Subtitle Translation Adjustment】

  • Traditional Translation Channel Batch Subtitle Lines: The number of subtitle lines sent per batch on traditional translation channels.

  • AI Translation Channel Batch Subtitle Lines: The number of subtitle lines sent per batch on AI translation channels.

  • Translate All Subtitle Lines at Once via AI: Translate all subtitle lines in one batch using the AI translation channel for best quality. Important Notes:

    1. Must use advanced models that support very long context (e.g., top-tier online AI models).
    2. Set the max token value high enough in the corresponding AI channel settings to avoid truncation errors on long outputs.
    3. Response times may be slow, resulting in delayed data return.

  • Pause Seconds After Translation: Pause for a set number of seconds after each translation to limit request frequency.

  • Send Complete Subtitle: Whether to send subtitles in their complete subtitle format, rather than the text alone, when using the AI translation channel.

  • AI Translation Model Temperature: The temperature value for the AI translation model. Default is 0.2.
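
The batching and pause settings in this section can be pictured with the minimal Python sketch below. The function translate_in_batches and the stand-in translator fake_translate are hypothetical; a real channel would call a translation service instead. As a simplification, the sketch pauses only between batches, not after the final one.

```python
import time
from typing import Callable, List


def translate_in_batches(lines: List[str],
                         translate_batch: Callable[[List[str]], List[str]],
                         batch_size: int = 10, pause_seconds: float = 0.0,
                         all_at_once: bool = False) -> List[str]:
    """Translate subtitle lines batch by batch (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # "All at once" is treated as one batch that covers every line.
    size = max(len(lines), 1) if all_at_once else batch_size
    results: List[str] = []
    for start in range(0, len(lines), size):
        batch = lines[start:start + size]
        translated = translate_batch(batch)
        if len(translated) != len(batch):
            raise ValueError("translator returned a different number of lines")
        results.extend(translated)
        if pause_seconds > 0 and start + size < len(lines):
            time.sleep(pause_seconds)  # limit request frequency
    return results


def fake_translate(batch: List[str]) -> List[str]:
    """Stand-in for a real translation call: just upper-cases the text."""
    return [text.upper() for text in batch]


print(translate_in_batches(["hello", "world", "again"], fake_translate, batch_size=2))
# ['HELLO', 'WORLD', 'AGAIN']
```

The line-count check matters most for AI channels, where a truncated reply would otherwise misalign translations with their subtitles.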

【Subtitle Dubbing Adjustment】

  • Concurrent Dubbing Threads: The number of concurrent threads used for dubbing.

  • Pause Seconds After Dubbing: Pause for a set number of seconds after each dubbing process to limit request frequency.

  • Remove Silence Buffers Before/After Dubbed Segments: Removes silence buffers before and after each dubbed subtitle segment. Improves audio-video synchronization but might cause abrupt endings.

  • Keep Dubbing File for Each Subtitle: Retain the dubbing result file for each individual subtitle line.

  • Text Normalization: Normalize the text before dubbing.

  • ChatTTS Timbre Value: The timbre value for ChatTTS.

  • EdgeTTS Dubbing Channel Concurrency: The concurrency level for dubbing on the EdgeTTS channel. Higher values are faster but risk rate limiting and failure.

  • EdgeTTS Dubbing Channel Retry Count on Failure: The number of retry attempts after failure on the EdgeTTS channel. Some failures are not recoverable by retries; setting this too high only increases processing time.

  • Vocal/Background Separation Threads: The number of threads used for separating vocals from background music/noise. More threads are faster but consume more resources.

  • Background Sound Separation Model: Select the model used for separating background sound.
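
The concurrency and retry settings in this section can be pictured with the following minimal Python sketch. The names dub_all, with_retries, and flaky_synthesize are hypothetical, and the stand-in synthesizer only simulates a temporary failure; it is not how any particular dubbing channel behaves.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def with_retries(func: Callable[[str], str], retries: int) -> Callable[[str], str]:
    """Wrap `func` so it is attempted up to `retries` extra times on failure."""
    def wrapper(text: str) -> str:
        last_error = None
        for _ in range(retries + 1):
            try:
                return func(text)
            except Exception as exc:  # a real channel would catch narrower errors
                last_error = exc
        raise last_error
    return wrapper


def dub_all(lines: List[str], synthesize: Callable[[str], str],
            threads: int = 2, retries: int = 2) -> List[str]:
    """Dub every line using `threads` workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(with_retries(synthesize, retries), lines))


attempts = Counter()
lock = threading.Lock()


def flaky_synthesize(text: str) -> str:
    """Stand-in synthesizer that fails on the first attempt for each line."""
    with lock:
        attempts[text] += 1
        first_attempt = attempts[text] == 1
    if first_attempt:
        raise RuntimeError("temporary failure")
    return f"{text}.wav"


print(dub_all(["a", "b", "c"], flaky_synthesize, threads=2, retries=1))
# ['a.wav', 'b.wav', 'c.wav']
```

Retries only help with temporary failures. A permanent error is still raised after the last attempt, which is why a very high retry count mostly adds processing time.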

【Subtitle/Sound/Video Alignment】

  • Maximum Audio Speed-up Multiplier: The maximum multiplier for speeding up audio. Default is 100.

  • Maximum Video Slow-down Multiplier: The maximum multiplier for slowing down video. Default is 10, which is also the largest value allowed.

  • CJK Subtitle Characters Per Line: The maximum number of Chinese, Japanese, or Korean characters per line for subtitles. Text exceeding this limit will wrap to the next line. Applies only to target subtitles in video translation or standalone speech transcription output.

  • Other Language Subtitle Characters Per Line: The maximum number of characters per line for subtitles in languages other than CJK. Text exceeding this limit will wrap. Applies only to target subtitles in video translation or standalone speech transcription output.
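
The two characters-per-line settings can be illustrated with a minimal Python sketch. The function wrap_subtitle, its default limits, and the character ranges used to detect CJK text are assumptions made for illustration; the software may detect languages and break lines differently.

```python
import textwrap
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough CJK check covering only a few common Unicode ranges (assumption)."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF       # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF    # Hiragana and Katakana
            or 0xAC00 <= code <= 0xD7AF)   # Hangul syllables


def wrap_subtitle(text: str, cjk_chars: int = 20, other_chars: int = 60) -> List[str]:
    """Split subtitle text into lines no longer than the configured limit."""
    if any(is_cjk(ch) for ch in text):
        # CJK text has no spaces to break on, so cut at a fixed character count.
        return [text[i:i + cjk_chars] for i in range(0, len(text), cjk_chars)]
    # Other languages wrap at word boundaries.
    return textwrap.wrap(text, width=other_chars)


print(wrap_subtitle("你好世界你好", cjk_chars=4))         # ['你好世界', '你好']
print(wrap_subtitle("hello world foo", other_chars=11))  # ['hello world', 'foo']
```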

【Whisper Model Prompts】

  • Whisper Model Chinese Simplified Prompt: Prompt sent to the whisper model when the spoken language is Simplified Chinese.

  • Whisper Model Chinese Traditional Prompt: Prompt sent when the spoken language is Traditional Chinese.

  • Whisper Model English Prompt: Prompt sent when the spoken language is English.

  • Whisper Model French Prompt: Prompt sent when the spoken language is French.

  • Whisper Model German Prompt: Prompt sent when the spoken language is German.

  • Whisper Model Japanese Prompt: Prompt sent when the spoken language is Japanese.

  • Whisper Model Korean Prompt: Prompt sent when the spoken language is Korean.

  • Whisper Model Russian Prompt: Prompt sent when the spoken language is Russian.

  • Whisper Model Spanish Prompt: Prompt sent when the spoken language is Spanish.

  • Whisper Model Thai Prompt: Prompt sent when the spoken language is Thai.

  • Whisper Model Italian Prompt: Prompt sent when the spoken language is Italian.

  • Whisper Model Greek Prompt: Prompt sent when the spoken language is Greek.

  • Whisper Model Khmer Prompt: Prompt sent when the spoken language is Khmer.

  • Whisper Model Norwegian Prompt: Prompt sent when the spoken language is Norwegian.

  • Whisper Model Portuguese Prompt: Prompt sent when the spoken language is Portuguese.

  • Whisper Model Vietnamese Prompt: Prompt sent when the spoken language is Vietnamese.

  • Whisper Model Arabic Prompt: Prompt sent when the spoken language is Arabic.

  • Whisper Model Turkish Prompt: Prompt sent when the spoken language is Turkish.

  • Whisper Model Hindi Prompt: Prompt sent when the spoken language is Hindi.

  • Whisper Model Hungarian Prompt: Prompt sent when the spoken language is Hungarian.

  • Whisper Model Ukrainian Prompt: Prompt sent when the spoken language is Ukrainian.

  • Whisper Model Indonesian Prompt: Prompt sent when the spoken language is Indonesian.

  • Whisper Model Malay Prompt: Prompt sent when the spoken language is Malay.

  • Whisper Model Kazakh Prompt: Prompt sent when the spoken language is Kazakh.

  • Whisper Model Czech Prompt: Prompt sent when the spoken language is Czech.

  • Whisper Model Polish Prompt: Prompt sent when the spoken language is Polish.

  • Whisper Model Dutch Prompt: Prompt sent when the spoken language is Dutch.

  • Whisper Model Swedish Prompt: Prompt sent when the spoken language is Swedish.

  • Whisper Model Hebrew Prompt: Prompt sent when the spoken language is Hebrew.

  • Whisper Model Bengali Prompt: Prompt sent when the spoken language is Bengali.

  • Whisper Model Persian Prompt: Prompt sent when the spoken language is Persian.

  • Whisper Model Urdu Prompt: Prompt sent when the spoken language is Urdu.

  • Whisper Model Cantonese Prompt: Prompt sent when the spoken language is Cantonese.

  • Whisper Model Romanian Prompt: Prompt sent when the spoken language is Romanian.

  • Whisper Model Filipino Prompt: Prompt sent when the spoken language is Filipino.