
Recommended settings for the best sentence segmentation with the faster-whisper speech recognition channel

Speech recognition works by splitting the entire audio into short segments at silent intervals. A segment may be 1, 5, 10, or 20 seconds long. Each segment is transcribed into text, and the results are combined into subtitles.
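The splitting step can be illustrated with a minimal sketch. It assumes the audio has already been reduced to one speech/silence flag per fixed-length frame; the function name `split_on_silence` and the 100 ms frame length are hypothetical, and real recognizers such as faster-whisper use a trained voice activity detector rather than this simplified logic.

```python
from typing import List, Tuple


def split_on_silence(
    speech_flags: List[bool],
    frame_ms: int = 100,
    min_silence_ms: int = 200,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments.

    A segment is closed once silence has lasted at least min_silence_ms.
    Shorter silent gaps stay inside the current segment.
    """
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    segments: List[Tuple[int, int]] = []
    start = None      # index of the first frame of the open segment
    silent_run = 0    # consecutive silent frames seen inside the open segment

    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # index of the first silent frame
                segments.append((start * frame_ms, end * frame_ms))
                start = None
                silent_run = 0

    # Close a segment that is still open at the end of the audio.
    if start is not None:
        end = len(speech_flags) - silent_run
        segments.append((start * frame_ms, end * frame_ms))
    return segments


# A single silent frame (100 ms) does not split; two silent frames (200 ms) do.
flags = [True, True, False, True, True, False, False, True]
print(split_on_silence(flags))
```

A smaller `min_silence_ms` produces more, shorter segments, which is why the silence setting has such a direct effect on sentence segmentation.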

When using faster-whisper mode or GeminiAI as the speech recognition channel, the following settings can achieve relatively good recognition results.

  1. Use a Larger Model: First and foremost, use a larger model. The tiny model, for example, is too small to give good results, while the large-v2 model is considerably more accurate.

  2. Optimize Settings: Click Menu → Tools → Advanced Options

Find the faster/openai Speech Recognition Adjustment section and make the following modifications:

  • Voice Activity Threshold: Set to 0.5
  • Minimum Duration (ms): Set to 1000
  • Maximum Speech Duration (seconds): Set to 5
  • Silence Split (ms): Set to 200

Of course, you can also test other values according to your needs.
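For readers who call faster-whisper directly from Python, the values above can be sketched as VAD parameters. The mapping from the option labels in the settings dialog to the faster-whisper parameter names (`threshold`, `min_speech_duration_ms`, `max_speech_duration_s`, `min_silence_duration_ms`) is an assumption based on the similar names, and the helper names `build_vad_parameters` and `transcribe` are hypothetical.

```python
def build_vad_parameters(
    threshold: float = 0.5,
    min_speech_duration_ms: int = 1000,
    max_speech_duration_s: float = 5,
    min_silence_duration_ms: int = 200,
) -> dict:
    """Collect the values recommended above under faster-whisper's VAD option names.

    Assumed mapping from the settings dialog:
      Voice Activity Threshold          -> threshold
      Minimum Duration (ms)             -> min_speech_duration_ms
      Maximum Speech Duration (seconds) -> max_speech_duration_s
      Silence Split (ms)                -> min_silence_duration_ms
    """
    return {
        "threshold": threshold,
        "min_speech_duration_ms": min_speech_duration_ms,
        "max_speech_duration_s": max_speech_duration_s,
        "min_silence_duration_ms": min_silence_duration_ms,
    }


def transcribe(audio_path: str, model_size: str = "large-v2"):
    """Transcribe one file with VAD filtering enabled.

    Requires the third-party faster-whisper package; the import is kept
    local so the settings helper above works without it.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(
        audio_path,
        vad_filter=True,
        vad_parameters=build_vad_parameters(),
    )
    return [(seg.start, seg.end, seg.text) for seg in segments]


print(build_vad_parameters())
```

Passing different arguments to `build_vad_parameters` is a quick way to try other values, for example a longer maximum speech duration for slow, formal speech.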

Reduce 403 Error Rate for edge-tts (Also Applicable to Other Dubbing Channels)

Since dubbing requires connecting to Microsoft's API, which has rate limiting measures, 403 errors cannot be completely avoided. However, you can reduce their occurrence with the following adjustments:

Go to Menu → Tools/Options → Advanced Options → Dubbing Adjustment and change the following settings:

  1. Number of Subtitles to Dub Simultaneously: The recommended value is 1. Dubbing fewer subtitles at the same time lowers the request frequency and therefore the number of errors it causes. This setting also applies to other dubbing channels.
  2. Pause Time After Dubbing (seconds): For example, a value of 5 pauses for 5 seconds after one subtitle has been dubbed before starting the next. A value of 5 or higher is recommended, since a longer interval between requests reduces the error rate.
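The effect of these two settings can be sketched as a loop that dubs one subtitle at a time and pauses between requests. This is only an illustration of the idea, not the application's actual implementation: `dub_sequentially` and its `synthesize` callback are hypothetical names, and the callback stands in for whatever function sends one subtitle to the dubbing service.

```python
import time
from typing import Callable, List


def dub_sequentially(
    subtitles: List[str],
    synthesize: Callable[[str], object],
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Dub subtitles one at a time, pausing between consecutive requests.

    Processing a single subtitle at a time corresponds to a simultaneous
    count of 1; pause_seconds corresponds to the pause after each dubbing.
    """
    results = []
    for index, text in enumerate(subtitles):
        if index > 0:
            sleep(pause_seconds)  # wait before sending the next request
        results.append(synthesize(text))
    return results


# Demonstration with a stand-in synthesizer and a recorded (not real) pause.
recorded_pauses = []
audio = dub_sequentially(
    ["Hello", "World"],
    synthesize=lambda text: f"<audio for {text!r}>",
    pause_seconds=5,
    sleep=recorded_pauses.append,
)
print(audio, recorded_pauses)
```

The trade-off is total run time: with a 5 second pause, 100 subtitles add a little over 8 minutes of waiting, in exchange for fewer rejected requests.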