
Speech Recognition Channels recognize human speech in a video and convert it into text, producing subtitles with precise timestamps.
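As a rough illustration of what "subtitles with precise timestamps" means in practice, the sketch below turns a list of recognized segments into SRT subtitle text. It is not the software's own code: the `(start, end, text)` segment shape and the helper names `format_timestamp` and `segments_to_srt` are assumptions made for this example.

```python
from typing import Iterable, Tuple

# Assumed segment shape for this sketch: (start seconds, end seconds, text).
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Join timestamped segments into numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_line = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_line}\n{text.strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

Each channel listed on this page returns its results in its own way; the point here is only that every subtitle line pairs a start time and an end time with a piece of recognized text.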


Tip: Some channels, such as OpenAI and ByteDance Volcano Engine, require you to set up an API address or key (SK) beforehand. Don't worry, it's simple! Just click the "Speech Recognition Settings" menu at the top of the software and fill in the required information.

Configure API keys in 'Speech Recognition Settings'

Over 10 Supported Speech Recognition Channels

To meet diverse user needs, we offer a variety of options, covering local offline models and online cloud services.

Click on a channel's name to see detailed usage instructions.

💻 Local Offline Recognition (No Internet Needed, Protects Privacy)

These channels require downloading model files to your computer upon first use, after which they can run completely offline.

  • faster-whisper (Local): A very popular local recognition solution. Known for its speed and low resource usage, it supports dozens of languages and is one of the top choices for local recognition.
  • openai-whisper (Local): The open-source model from OpenAI, offering high accuracy and support for a wide range of languages.
  • Qwen3-ASR (Local): A local ASR model from Alibaba, delivering excellent results for Chinese.
  • Alibaba FunASR (Chinese Recognition): An open-source model from Alibaba DAMO Academy, specifically optimized for Chinese. It recognizes Chinese pronunciation and sentence boundaries with high accuracy.
  • Huggingface_ASR: Supports several models from Hugging Face and one English model from NVIDIA.
  • faster-whisper-xxl.exe: A super-large model version designed for Windows users, providing better recognition results. You need to download the faster-whisper-xxl.exe file separately.
  • whisper.cpp: A recognition channel that uses whisper.cpp as the backend. You need to deploy whisper.cpp yourself.
  • Parakeet-tdt Recognition: An open-source recognition model from NVIDIA. You need to deploy the service yourself and enter your API address in the software settings menu.
  • STT Recognition API: An open-source project that requires self-deployment. Once deployed, fill in the API address in the software.

☁️ Online Recognition (Cloud-Based, Powerful)

These channels upload audio files to cloud servers for processing, typically offering excellent results, though some services require payment or have usage limits.

Free or with Free Tiers:

  • Alibaba Bailian Qwen3-ASR: Based on Alibaba's "Tongyi Qianwen" large model. You need to visit the Alibaba Bailian platform to activate the service and create an API Key.
  • Elevenlabs.io Recognition: A service from a company specializing in AI audio technology. Register on their website for a free API Key, but note the free tier has limits.
  • deepgram.com Recognition: A well-known speech recognition service with high accuracy and speed. Register on deepgram.com for an API Key.
  • Gemini Large Model Recognition: A powerful model from Google that excels at recognizing less common languages. Requires a Gemini API Key, and accessing it from China may require a VPN.
  • Google Speech Recognition: A free online recognition service from Google. It performs adequately but requires a VPN for use in China.

Requires Payment or API Key Application:

🔧 Advanced Custom Options (For Developers)

If you have some technical background, you can also explore these more flexible solutions:

  • Custom Speech Recognition API: If you can code, you can build your own speech recognition API according to our data format standards, achieving maximum customization.
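To give a feel for what a self-built recognition API might look like, here is a minimal sketch using only the Python standard library. Everything about the request and response shape is hypothetical: the raw-audio POST body, the `code`/`msg`/`data` fields, and the `recognize` placeholder are assumptions for illustration, not the software's actual data format. Check the Custom Speech Recognition API instructions for the real format before building against it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognize(audio_bytes: bytes) -> list:
    """Placeholder recognizer; swap in a real model call here."""
    return [{"start": 0.0, "end": 1.0, "text": f"received {len(audio_bytes)} bytes"}]


def build_response(segments: list) -> bytes:
    """Wrap segments in a JSON envelope (hypothetical field names)."""
    payload = {"code": 0, "msg": "ok", "data": segments}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


class RecognitionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the client sends the raw audio as the request body.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = build_response(recognize(audio))
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the server; call this explicitly, it blocks until interrupted."""
    HTTPServer((host, port), RecognitionHandler).serve_forever()
```

Once such a service is running, its address would be the API address you enter in the software, in the same way as for the other self-deployed channels.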