Video translation software works in four steps: transcribe the speech in the video into text, translate the text into the target language, dub the translated text, and embed the dubbing and subtitles into the video.

The first step is speech-to-text transcription, and its accuracy directly affects the translation and dubbing that follow.

faster-whisper (Local) Speech Recognition Channel

This channel uses faster-whisper, a reimplementation of OpenAI's open-source Whisper model on the CTranslate2 inference engine. As the name suggests, it recognizes speech faster than the original implementation without compromising accuracy.

After selecting faster mode, choose the model to use from the list on the right. The first time you use a model, it is downloaded automatically. Speech recognition then runs locally, and your files are not uploaded to the internet.
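For readers curious what local recognition looks like in code, here is a minimal sketch that calls the faster-whisper Python library directly. It is not taken from this software's source: the function name transcribe_locally, the models download folder, and the choice of compute types are assumptions made for illustration.

```python
def transcribe_locally(audio_path, model_name="large-v3", language="en", device="cpu"):
    """Transcribe one audio file locally and return (start, end, text) tuples.

    Assumption: the faster-whisper package is installed. The import is done
    inside the function so this file can be loaded without it.
    """
    from faster_whisper import WhisperModel

    # float16 is a common choice on NVIDIA GPUs, int8 on CPU (assumed defaults).
    compute_type = "float16" if device == "cuda" else "int8"

    # The model is fetched on first use and cached under download_root.
    model = WhisperModel(
        model_name,
        device=device,
        compute_type=compute_type,
        download_root="models",  # assumed folder name
    )

    # Passing the spoken language explicitly skips automatic detection.
    segments, _info = model.transcribe(audio_path, language=language)

    # segments is a generator; the audio is decoded as it is consumed.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

Calling transcribe_locally("speech.wav") would download the model once and then run entirely on the local machine.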

Model Selection

tiny --> base --> small --> medium --> large-v3-turbo --> large-v1 --> large-v2 --> large-v3

From left to right, model size, recognition accuracy, and required RAM and VRAM all increase.

Choose large-v3-turbo or a larger model whenever possible. The large-v3 model gives the best results.

Models ending with .en (tiny.en, base.en, small.en, medium.en) and models starting with distil (such as distil-large-v3 and distil-large-v3.5) can only be used for videos with English speech.
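The naming rules above can be expressed as a small check. This is an illustrative sketch only: the helper names and the use of "en" as the language code are assumptions, not part of the software.

```python
# Model order as listed above, smallest to largest.
MODEL_ORDER = [
    "tiny", "base", "small", "medium",
    "large-v3-turbo", "large-v1", "large-v2", "large-v3",
]


def is_english_only(model_name: str) -> bool:
    """True for models ending in .en or starting with distil."""
    return model_name.endswith(".en") or model_name.startswith("distil")


def can_use(model_name: str, spoken_language: str) -> bool:
    """English-only models are valid only when the speech is English."""
    return spoken_language == "en" or not is_english_only(model_name)


def at_least_recommended(model_name: str) -> bool:
    """True if the model is large-v3-turbo or later in MODEL_ORDER."""
    if model_name not in MODEL_ORDER:
        return False
    return MODEL_ORDER.index(model_name) >= MODEL_ORDER.index("large-v3-turbo")
```

For example, can_use("medium.en", "zh") is False, while can_use("large-v3", "zh") is True.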

Best Configuration for This Channel

To achieve the best speech recognition results, please refer to the following settings:

  1. Select the large-v3 model. It needs at least 16GB of RAM or, with CUDA acceleration, more than 10GB of VRAM. If your computer has less, try the large-v1 or large-v3-turbo model.
  2. Explicitly specify the spoken language, ensuring it matches the language used in the video's audio.
  3. Under Menu - Tools - Advanced Options - Speech Recognition Parameters, set Minimum speech duration (ms) to 1000, set Maximum speech duration (s) to 5 or more, and leave Whisper pre-segment audio unselected.

Note: If you dub with the clone role (cloning the original speaker's voice), it is strongly recommended to set Minimum speech duration (ms) to 3000 and Maximum speech duration (s) to 10. Voice cloning automatically uses the original audio segment that matches each subtitle's duration as its reference audio, and most dubbing channels require that reference to be 3 to 10 s long; otherwise, dubbing is likely to fail. You should also select both Whisper pre-segment audio and Merge short subtitles with adjacent so that subtitle durations fall within the 3 to 10 s range.

  4. If the original audio is unclear or has background noise, select Denoise.
  5. If you are not using the clone role and want the recognized subtitles to be as short as possible (for example, for vertical videos), lower the Maximum speech duration (s), for example, to 3 or 2. If dubbing is involved, you can also select Secondary recognition.

Secondary recognition: This applies when dubbing is selected and single subtitle embedding is chosen. After dubbing is complete, the dubbed audio file is run through speech recognition again to generate shorter subtitles for embedding in the video, so that subtitles and dubbing align precisely.
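The recommended values above can be summarized in code. This is a sketch under stated assumptions: the RecognitionSettings class and its field names are hypothetical and do not reflect how the software stores its options, and leaving Merge short subtitles with adjacent off in the non-clone case is an assumption, since the text only specifies it for the clone role.

```python
from dataclasses import dataclass


@dataclass
class RecognitionSettings:
    min_speech_duration_ms: int
    max_speech_duration_s: int
    whisper_pre_segment: bool
    merge_short_subtitles: bool


def recommended_settings(clone_voice: bool) -> RecognitionSettings:
    """Return the values recommended above for clone and non-clone dubbing."""
    if clone_voice:
        # Keep each subtitle within the 3 to 10 s reference-audio window.
        return RecognitionSettings(3000, 10, True, True)
    # merge_short_subtitles=False is an assumption for the non-clone case.
    return RecognitionSettings(1000, 5, False, False)


def out_of_range_subtitles(subtitles, min_s=3.0, max_s=10.0):
    """Return the (start, end) pairs whose duration is outside [min_s, max_s]."""
    return [(s, e) for s, e in subtitles if not (min_s <= e - s <= max_s)]
```

Running out_of_range_subtitles on the recognized subtitle timings before dubbing with the clone role shows which segments are too short or too long to serve as reference audio.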

CUDA Acceleration

On Windows and Linux machines with an NVIDIA graphics card, you can install and configure the CUDA and cuDNN environment, then enable CUDA Acceleration. This significantly speeds up recognition.

View the CUDA and cuDNN installation tutorial
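Before enabling CUDA Acceleration, you can check whether a GPU is visible to CTranslate2, the inference engine that faster-whisper runs on. The sketch below is illustrative; pick_device is a hypothetical helper, and whether the software performs a similar check internally is not stated here.

```python
def pick_device() -> str:
    """Return "cuda" if CTranslate2 reports at least one CUDA device, else "cpu"."""
    try:
        import ctranslate2  # installed as a dependency of faster-whisper
    except ImportError:
        # Without CTranslate2 there is nothing to accelerate; fall back to CPU.
        return "cpu"
    return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"


print(pick_device())
```

A result of "cuda" only means a GPU was detected. It does not prove that cuDNN is installed correctly, so follow the installation tutorial if recognition still fails with CUDA enabled.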

Manually Download Models for the faster-whisper (Local) Channel

By default, a model is downloaded automatically the first time you use it. The original models are hosted on huggingface.co (international) or the Chinese mirror hf-mirror.com. Because the models are large and network connections can be unstable, downloads may fail or end up incomplete. In that case, download the model manually as described below.

Please choose the model you wish to use. The best performing model is large-v3. Without CUDA acceleration, ensure RAM is at least 16GB. With CUDA acceleration, ensure VRAM is greater than 10GB.
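The memory guidance above can be turned into a simple rule of thumb. This is a sketch, not the software's logic: choose_model is a hypothetical helper, and falling back to large-v3-turbo is just one of the alternatives suggested earlier.

```python
def choose_model(use_cuda: bool, ram_gb: float, vram_gb: float = 0.0) -> str:
    """Pick large-v3 when memory allows, otherwise fall back to large-v3-turbo."""
    # With CUDA the limit is VRAM (more than 10GB); without it, RAM (at least 16GB).
    enough_memory = vram_gb > 10 if use_cuda else ram_gb >= 16
    return "large-v3" if enough_memory else "large-v3-turbo"
```

For instance, choose_model(use_cuda=True, ram_gb=32, vram_gb=8) returns "large-v3-turbo" because 8GB of VRAM is below the stated threshold.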

  • Download the tiny model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-tiny inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-tiny/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the tiny.en model (English speech only):

    1. Create a folder: Create a folder named models--Systran--faster-whisper-tiny.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-tiny.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the base model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-base inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-base/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the base.en model (English speech only):

    1. Create a folder: Create a folder named models--Systran--faster-whisper-base.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-base.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the small model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-small inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-small/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the small.en model (English speech only):

    1. Create a folder: Create a folder named models--Systran--faster-whisper-small.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-small.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the medium model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-medium inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-medium/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the medium.en model (English speech only):

    1. Create a folder: Create a folder named models--Systran--faster-whisper-medium.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-medium.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the large-v3-turbo model:

    1. Create a folder: Create a folder named models--mobiuslabsgmbh--faster-whisper-large-v3-turbo inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/mobiuslabsgmbh/faster-whisper-large-v3-turbo/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the large-v1 model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-large-v1 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-large-v1/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the large-v2 model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-large-v2 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-large-v2/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the large-v3 model:

    1. Create a folder: Create a folder named models--Systran--faster-whisper-large-v3 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-whisper-large-v3/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.

The following distilled models can only be used for transcribing English speech from audio/video.

  • Download the distil-large-v3 model:

    1. Create a folder: Create a folder named models--Systran--faster-distil-whisper-large-v3 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-distil-whisper-large-v3/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the distil-large-v3.5 model:

    1. Create a folder: Create a folder named models--distil-whisper--distil-large-v3.5-ct2 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/distil-whisper/distil-large-v3.5-ct2/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the distil-small.en model:

    1. Create a folder: Create a folder named models--Systran--faster-distil-whisper-small.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-distil-whisper-small.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the distil-medium.en model:

    1. Create a folder: Create a folder named models--Systran--faster-distil-whisper-medium.en inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-distil-whisper-medium.en/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
  • Download the distil-large-v2 model:

    1. Create a folder: Create a folder named models--Systran--faster-distil-whisper-large-v2 inside the models folder in the same directory as sp.exe (sp.py).
    2. Open the model download link: https://huggingface.co/Systran/faster-distil-whisper-large-v2/tree/main
    3. Download all .json/.bin/.txt files from that page and copy them into the folder you created. Overwrite existing files if necessary.
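Every entry above follows the same pattern: the folder name is models-- followed by the Hugging Face organization and repository name joined with --, and the download page is the repository's tree/main URL. The sketch below reproduces that pattern and checks a manually filled folder. The helper names are hypothetical, and the check only confirms that at least one file of each type is present, not that the download is complete.

```python
from pathlib import Path

# File types the steps above say to download.
REQUIRED_SUFFIXES = {".json", ".bin", ".txt"}


def model_folder_name(repo_id: str) -> str:
    """Map e.g. "Systran/faster-whisper-tiny" to "models--Systran--faster-whisper-tiny"."""
    org, name = repo_id.split("/", 1)
    return f"models--{org}--{name}"


def download_page(repo_id: str) -> str:
    """Return the Hugging Face page that lists the model files."""
    return f"https://huggingface.co/{repo_id}/tree/main"


def missing_file_types(models_dir: Path, repo_id: str) -> set[str]:
    """Return the required suffixes that have no file in the model folder."""
    folder = models_dir / model_folder_name(repo_id)
    if not folder.is_dir():
        return set(REQUIRED_SUFFIXES)
    present = {p.suffix for p in folder.iterdir() if p.is_file()}
    return REQUIRED_SUFFIXES - present
```

For example, missing_file_types(Path("models"), "Systran/faster-whisper-large-v3") returns an empty set once files of all three types have been copied into the folder.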