
Using the Chinese Sentence Segmentation Feature

Whisper is currently a mainstream speech recognition model, but it still has significant shortcomings in Chinese recognition. Its Chinese results are noticeably worse than its English results: it frequently outputs traditional Chinese characters and omits punctuation marks, which leads to poor sentence segmentation in the generated subtitles. Re-segmenting with the returned character-level timestamps does not fully solve the problem, because the results remain unsatisfactory when the audio lacks clear silent gaps between sentences.

By comparison, Alibaba's FunASR series of models excels at Chinese recognition, but it supports only Chinese and cannot handle other languages.

To address this, v2.92 introduced Alibaba's Chinese punctuation restoration model. The model restores punctuation marks in the Chinese recognition results, and the sentences are then re-segmented based on punctuation and silent intervals. Bundling the punctuation restoration model increases the software size by approximately 400MB.

Enabling Chinese Sentence Segmentation

The Alibaba Chinese punctuation model is used automatically to re-segment the results when all of the following conditions are met:

  1. The “Chinese Sentence Segmentation” option is checked in the main interface or the audio/video to subtitle interface;
  2. The spoken language of the audio/video is Chinese;
  3. The speech recognition engine is set to “faster-whisper”, “openai-whisper”, or “deepgram.com”;
  4. The segmentation mode is set to Overall Recognition.
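
The four conditions above combine as a simple logical AND. The following Python sketch only illustrates that logic; the class, the field names, and the values "zh" and "overall" are hypothetical and are not taken from the software's source code.

```python
from dataclasses import dataclass

# Engine names as listed in condition 3 above.
SUPPORTED_ENGINES = {"faster-whisper", "openai-whisper", "deepgram.com"}


@dataclass
class RecognitionSettings:
    """Hypothetical container for the relevant UI settings."""
    chinese_segmentation_checked: bool  # condition 1: the checkbox state
    spoken_language: str                # condition 2: "zh" is an assumed code for Chinese
    engine: str                         # condition 3: selected recognition engine
    segmentation_mode: str              # condition 4: "overall" is an assumed value


def should_resegment(settings: RecognitionSettings) -> bool:
    """Return True only when all four conditions hold."""
    return (
        settings.chinese_segmentation_checked
        and settings.spoken_language == "zh"
        and settings.engine in SUPPORTED_ENGINES
        and settings.segmentation_mode == "overall"
    )


if __name__ == "__main__":
    settings = RecognitionSettings(True, "zh", "faster-whisper", "overall")
    print(should_resegment(settings))
```

If any single condition fails, for example when the spoken language is not Chinese, the function returns False and the recognition results would be left as they are.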

When these conditions are met, the system waits for speech recognition to complete, restores the punctuation marks, and then re-segments the sentences based on punctuation marks and silent intervals, which improves the accuracy and readability of the subtitles.
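
As a rough illustration of re-segmenting by punctuation and silent intervals, the Python sketch below splits a list of timestamped tokens whenever a token ends with sentence-ending punctuation or is followed by a long pause. It assumes that punctuation has already been restored and attached to the preceding character; the token format, the punctuation set, and the 0.5-second threshold are assumptions made for this example, not the software's actual implementation.

```python
from typing import List, Tuple

Token = Tuple[str, float, float]    # (text, start_sec, end_sec)
Segment = Tuple[str, float, float]  # (sentence, start_sec, end_sec)

# Assumed set of punctuation marks that close a subtitle line.
BREAK_PUNCT = set("。！？；")


def resegment(tokens: List[Token], max_gap: float = 0.5) -> List[Segment]:
    """Split tokens at sentence-ending punctuation or at silent gaps longer than max_gap."""
    segments: List[Segment] = []
    current: List[Token] = []
    for i, (text, start, end) in enumerate(tokens):
        current.append((text, start, end))
        is_last = i == len(tokens) - 1
        ends_sentence = bool(text) and text[-1] in BREAK_PUNCT
        # Silent interval between this token and the next one.
        long_pause = (not is_last) and (tokens[i + 1][1] - end > max_gap)
        if is_last or ends_sentence or long_pause:
            sentence = "".join(t for t, _, _ in current)
            segments.append((sentence, current[0][1], current[-1][2]))
            current = []
    return segments


if __name__ == "__main__":
    tokens = [
        ("你", 0.0, 0.2), ("好。", 0.2, 0.4),   # split after the full stop
        ("今", 0.45, 0.6), ("天", 0.6, 0.8),    # split at the 0.8 s pause that follows
        ("很", 1.6, 1.8), ("好", 1.8, 2.0),
    ]
    for sentence, start, end in resegment(tokens):
        print(f"{start:.2f} -> {end:.2f}  {sentence}")
```

In this example the first split comes from the restored punctuation and the second from the silent interval, so the two signals complement each other when one of them is missing.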