
How to achieve better sentence segmentation results

  1. Select the optimal model

    • For videos whose spoken language is Chinese, prefer ByteDance Speech Large Model Speedy, Qwen-ASR (Local), or Alibaba FunASR (Local) with the paraformer-zh model
    • For videos in other languages, prefer openai-whisper (Local) with the large-v3 model, faster-whisper (Local) with the large-v3 model, or the OpenAI Online Recognition API
  2. Set appropriate segmentation parameters:

    • In the Speech Recognition Parameters area under Menu--Tools--Advanced Options:

      • Set Minimum Duration / Millisecond to 1000 (the shortest allowed subtitle, in milliseconds)
      • Set Maximum Voice Duration Seconds to a value from 3 to 5 (the longest allowed subtitle, in seconds)
      • Set Silence Segmentation Duration Milliseconds to a value between 140 and 600 (smaller values split sentences more finely, larger values produce longer sentences)
    • If you are not using voice-clone dubbing (the dubbing role on the main interface is not set to clone), you can uncheck Merge Short Subtitles to Adjacent in the Speech Recognition Parameters area

    • Under Select VAD, the default sentence segmentation model is ten-vad; you can try switching to the silero model and then fine-tune the values in the Speech Recognition Parameters area

  3. Second recognition: If dubbing is enabled, you can check Second Recognition in the top-right corner of the main interface. This transcribes the dubbed audio again to produce shorter subtitles; for this pass, Minimum Duration / Millisecond and Maximum Voice Duration Seconds are automatically set to half of their configured values

  4. Select Noise Reduction or Separate Voice and Background Sound: If the audio has background noise, you can check Noise Reduction (very slow) in the top-right corner of the main interface, or check Separate Voice and Background under Set More Parameters. If both are selected, only Separate Voice and Background is performed

  5. If you use faster-whisper (Local), you can also try unchecking Pre-segment audio with Whisper? in the Speech Recognition Parameters area under Menu--Tools--Advanced Options. This may improve sentence segmentation, but it can also produce longer subtitles

  6. Translate only one video at a time: when a single video is processed, an editing window pops up after speech recognition finishes, letting you adjust the recognized subtitles
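The three values in step 2 can be pictured roughly like this: a pause at least as long as the silence setting ends a sentence, anything longer than the maximum is split, and a fragment shorter than the minimum is folded into the previous subtitle (loosely mirroring Merge Short Subtitles to Adjacent, though the real merge rules may differ). The Python sketch below illustrates that logic on a list of per-frame voice-activity flags. It is not the software's actual implementation: the 10 ms frame size, the equal-length split of long segments, and the names `segment_speech` and `Segment` are assumptions made for illustration, and real VAD models such as ten-vad or silero compute speech detection in their own way.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_MS = 10  # assumption: one voice-activity flag per 10 ms of audio


@dataclass
class Segment:
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def segment_speech(voiced: List[bool], min_ms: int = 1000,
                   max_s: float = 5.0, silence_ms: int = 300) -> List[Segment]:
    """Hypothetical sketch: cut on pauses, split long runs, merge short ones."""
    # 1. End a segment once a pause lasts at least silence_ms.
    runs: List[Segment] = []
    start: Optional[int] = None  # frame where the current speech run began
    silent = 0                   # silent frames since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent * FRAME_MS >= silence_ms:
                end = i - silent + 1  # first frame of the pause
                runs.append(Segment(start * FRAME_MS, end * FRAME_MS))
                start, silent = None, 0
    if start is not None:  # audio ended during (or shortly after) speech
        runs.append(Segment(start * FRAME_MS, (len(voiced) - silent) * FRAME_MS))

    # 2. Split runs longer than max_s into equal pieces no longer than max_s.
    max_ms = int(max_s * 1000)
    pieces: List[Segment] = []
    for run in runs:
        count = -(-run.duration_ms // max_ms)  # ceiling division
        step = run.duration_ms / count
        for k in range(count):
            pieces.append(Segment(run.start_ms + round(k * step),
                                  run.start_ms + round((k + 1) * step)))

    # 3. Merge pieces shorter than min_ms into the previous one, unless the
    #    combined span (pause included) would exceed max_s.
    merged: List[Segment] = []
    for piece in pieces:
        prev = merged[-1] if merged else None
        if (prev is not None and piece.duration_ms < min_ms
                and piece.end_ms - prev.start_ms <= max_ms):
            merged[-1] = Segment(prev.start_ms, piece.end_ms)
        else:
            merged.append(piece)
    return merged


# Example: 1.5 s speech, 0.4 s pause, 0.3 s speech, 0.4 s pause, 7 s speech
voiced = [True] * 150 + [False] * 40 + [True] * 30 + [False] * 40 + [True] * 700
for seg in segment_speech(voiced, min_ms=1000, max_s=5.0, silence_ms=300):
    print(seg.start_ms, seg.end_ms)
```

With `silence_ms=300` the example prints three segments (0 2200, 2600 6100, 6100 9600): the 0.3 s fragment is merged into the first sentence and the 7 s run is split in two. Raising `silence_ms` to 600 keeps the 0.4 s pauses inside a single run, so the result becomes two 4.8 s segments, which is why larger silence values give longer sentences.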
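The halving rule in step 3 is simple arithmetic: with the values suggested in step 2 (1000 ms minimum, 5 s maximum), the second recognition pass works with 500 ms and 2.5 s. A minimal Python sketch follows. The `RecognitionSettings` class and `second_pass_settings` function are hypothetical names, and leaving the silence setting unchanged (and rounding odd millisecond values down) are assumptions, since step 3 only mentions the two duration values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecognitionSettings:
    min_duration_ms: int         # Minimum Duration / Millisecond
    max_voice_duration_s: float  # Maximum Voice Duration Seconds
    silence_ms: int              # Silence Segmentation Duration Milliseconds


def second_pass_settings(first: RecognitionSettings) -> RecognitionSettings:
    """Halve the two duration limits for the Second Recognition pass."""
    # Assumption: the silence setting is left unchanged, and odd millisecond
    # values are rounded down.
    return replace(first,
                   min_duration_ms=first.min_duration_ms // 2,
                   max_voice_duration_s=first.max_voice_duration_s / 2)


print(second_pass_settings(RecognitionSettings(1000, 5.0, 300)))
```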
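The precedence rule in step 4 can be written as a small decision function. This is an illustrative sketch of the documented behavior, not code from the application; the `Preprocessing` enum and `effective_preprocessing` function are hypothetical names.

```python
from enum import Enum


class Preprocessing(Enum):
    NONE = "none"
    NOISE_REDUCTION = "noise reduction"
    SEPARATE_VOICE = "separate voice and background"


def effective_preprocessing(noise_reduction: bool,
                            separate_voice: bool) -> Preprocessing:
    """Return which preprocessing step runs for the two checkbox states."""
    if separate_voice:  # separation wins when both boxes are checked
        return Preprocessing.SEPARATE_VOICE
    if noise_reduction:
        return Preprocessing.NOISE_REDUCTION
    return Preprocessing.NONE


for nr in (False, True):
    for sep in (False, True):
        print(f"noise_reduction={nr}, separate={sep} -> "
              f"{effective_preprocessing(nr, sep).value}")
```

Checking both boxes therefore costs nothing extra: only the separation step runs.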