
There are 14 models in total, divided into three categories. All of them transcribe the speech in a video into subtitles.

To reduce the download size, the software ships with only the smallest model, tiny, by default. This model has the lowest recognition accuracy; for better results, download one of the larger models.

Models usable in both openai and faster modes

tiny: Smallest model, with the fastest speed, lowest resource consumption and lowest accuracy

tiny.en: For English-only videos

base

base.en: For English-only videos

small

small.en: For English-only videos

medium

medium.en: For English-only videos

large-v1

large-v2

large-v3: Largest model with the highest accuracy; requires 8 GB to 12 GB or more of available video memory

Models usable only in faster mode

distil-whisper-small.en: For English-only videos

distil-whisper-medium.en: For English-only videos

distil-whisper-large-v2: Requires 8 GB or more of video memory; currently works well with English videos but performs poorly with other languages
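The compatibility rules in the two lists above can be captured in a small lookup table. The following is a minimal Python sketch for illustration only, not part of the software: the mode strings "openai" and "faster", the language code "en" and the function names `is_english_only` and `check_model` are all assumptions.

```python
# Hypothetical lookup table built from the lists above: model name -> modes it runs in.
BOTH_MODES = frozenset({"openai", "faster"})
FASTER_ONLY = frozenset({"faster"})

MODEL_MODES = {
    "tiny": BOTH_MODES, "tiny.en": BOTH_MODES,
    "base": BOTH_MODES, "base.en": BOTH_MODES,
    "small": BOTH_MODES, "small.en": BOTH_MODES,
    "medium": BOTH_MODES, "medium.en": BOTH_MODES,
    "large-v1": BOTH_MODES, "large-v2": BOTH_MODES, "large-v3": BOTH_MODES,
    # distil models run only in faster mode
    "distil-whisper-small.en": FASTER_ONLY,
    "distil-whisper-medium.en": FASTER_ONLY,
    "distil-whisper-large-v2": FASTER_ONLY,
}


def is_english_only(model: str) -> bool:
    """Treat .en models and all distil models as English-only, per the notes on this page."""
    return model.endswith(".en") or model.startswith("distil-")


def check_model(model: str, mode: str, language: str) -> list[str]:
    """Return a list of problems with this combination (empty if it looks fine)."""
    if model not in MODEL_MODES:
        return [f"unknown model: {model}"]
    problems = []
    if mode not in MODEL_MODES[model]:
        problems.append(f"{model} cannot be used in {mode} mode")
    if language != "en" and is_english_only(model):
        problems.append(f"{model} is intended for English audio only")
    return problems
```

For example, `check_model("distil-whisper-large-v2", "openai", "en")` returns `["distil-whisper-large-v2 cannot be used in openai mode"]`.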

The first category: models with the .en suffix

Examples include tiny.en, base.en and medium.en. As the name suggests, these models are intended only for videos whose original language is English. In other words, if the speech in your video is English, a model with the .en suffix will give better results than the same-size model without .en.
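The rule above amounts to "use the .en variant for English audio whenever one exists". A minimal Python sketch of that choice follows; the function name `preferred_model` and the language code "en" are hypothetical, and the set of sizes with a .en variant is taken from the model list on this page (there is no large .en model in it).

```python
# Sizes that have an English-only .en variant, per the model list on this page.
SIZES_WITH_EN = {"tiny", "base", "small", "medium"}


def preferred_model(size: str, language: str) -> str:
    """Pick the .en variant for English audio when one exists, else the multilingual model."""
    if language == "en" and size in SIZES_WITH_EN:
        return f"{size}.en"
    return size
```

With this sketch, English audio with size "medium" maps to medium.en, while "large-v3" stays large-v3 because no English-only large model is listed.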

The second category: models without .en

These models, such as tiny and large-v1, can be used with all supported languages.

The third category: models whose names start with distil

There are currently only three models of this type, and they are meant only for videos whose original language is English. Although distil-whisper-large-v2 has no .en suffix, it too should be used only for English audio; on videos in other languages its results are very poor.

Their main advantage is speed. Note that distil models can only be used in faster mode, not in openai mode.

distil-whisper-small.en

distil-whisper-medium.en

distil-whisper-large-v2

faster model download

All models are downloaded from https://github.com/jianchang512/stt/releases/tag/0.0

On the release page, pick the files for the mode you want to use. The faster models are recommended because they run faster.

Each faster model is downloaded as a compressed package that contains a folder. Extract the package and copy that folder into the models folder in the software directory.

For example, after downloading the medium model, open the compressed package and you will see the model folder inside it. Copy this folder to the models directory, as shown in the image above.
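If you prefer to script the copy step, the following minimal Python sketch extracts a package and copies each top-level folder into the models directory. It assumes the package is a .zip file (this page only says "compressed package", so other formats would need a different extractor), and the function name `install_faster_model` is hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_faster_model(archive: Path, models_dir: Path) -> list[Path]:
    """Extract a model archive and copy each top-level folder into models_dir.

    Assumes a .zip archive. Returns the paths of the installed folders.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    with tempfile.TemporaryDirectory() as tmp:
        # Unpack into a scratch directory first, then copy only the folders.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        for entry in Path(tmp).iterdir():
            if entry.is_dir():
                target = models_dir / entry.name
                # dirs_exist_ok (Python 3.8+) lets a re-download overwrite files in place
                shutil.copytree(entry, target, dirs_exist_ok=True)
                installed.append(target)
    return installed
```

A call would look like `install_faster_model(Path("medium.zip"), Path("models"))`, where both paths are placeholders for your downloaded package and the software's models folder.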

openai model download

The address is the same: https://github.com/jianchang512/stt/releases/tag/0.0

Scroll down to the openai models and download the one you need; each is a single file with a .pt extension. Copy this file directly into the models directory; no extraction is needed.
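Because an openai model is a single .pt file, installing it is a plain file copy. A minimal Python sketch of that step, with the function name `install_openai_model` and the paths you pass in being hypothetical:

```python
import shutil
from pathlib import Path


def install_openai_model(pt_file: Path, models_dir: Path) -> Path:
    """Copy a downloaded .pt model file into models_dir and return its new path."""
    if pt_file.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got {pt_file.name}")
    models_dir.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and also preserves the file's timestamps
    return Path(shutil.copy2(pt_file, models_dir / pt_file.name))
```

The suffix check simply guards against copying the wrong download; the software itself may not require it.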