There are 14 models in total, divided into 3 categories. All of them are used to transcribe human speech in videos into subtitles.
To keep the download small, the software ships with only the smallest model, tiny. This model has the lowest recognition accuracy; for better results, download one of the larger models.
Models usable in both openai and faster modes
tiny: Smallest model, fastest speed, lowest resource consumption, and lowest accuracy
tiny.en: For English-only audio and video
base
base.en: For English-only audio and video
small
small.en: For English-only audio and video
medium
medium.en: For English-only audio and video
large-v1
large-v2
large-v3: Largest model, highest accuracy; requires at least 8 GB of available video memory, ideally 12 GB or more
Models usable only in faster mode
distil-whisper-small.en: For English-only videos
distil-whisper-medium.en: For English-only videos
distil-whisper-large-v2: Requires 8 GB or more of video memory; currently works well on English videos and poorly on other languages
The first category is models with the suffix .en
For example, tiny.en, base.en, and medium.en. As the name suggests, these models are intended only for videos where the original language is English. If the speech in the video you are processing is English, a model with the .en suffix will give better results than the equivalent model without it.
The second category is models without .en
These models, such as tiny and large-v1, can be used for any supported language.
The third category is models starting with distil
There are currently only three models of this type, and they are suited only to videos where the original language is English. This also applies to distil-whisper-large-v2, which lacks the .en suffix: use it only for English audio and video, because results in other languages are very poor.
The main advantage of these models is speed. Note that distil models can only be used in faster mode, not in openai mode.
distil-whisper-small.en
distil-whisper-medium.en
distil-whisper-large-v2
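The three categories above can be summarized as a small lookup. The following is only an illustrative sketch in Python, not code from the software: the function name describe_model and the returned keys are hypothetical, and the model names are simply those listed in this document.

```python
# Model names exactly as listed in this document.
STANDARD_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
}
DISTIL_MODELS = {
    "distil-whisper-small.en",
    "distil-whisper-medium.en",
    "distil-whisper-large-v2",
}


def describe_model(name: str) -> dict:
    """Return category, usable modes, and English-only flag (hypothetical helper)."""
    if name in DISTIL_MODELS:
        # Distil models: faster mode only, recommended for English only,
        # even when the name has no .en suffix.
        return {"category": "distil", "modes": ("faster",), "english_only": True}
    if name in STANDARD_MODELS:
        if name.endswith(".en"):
            # .en models: English-only, usable in both modes.
            return {"category": "en", "modes": ("openai", "faster"), "english_only": True}
        # Models without .en: all supported languages, usable in both modes.
        return {"category": "multilingual", "modes": ("openai", "faster"), "english_only": False}
    raise ValueError(f"unknown model: {name}")
```

For instance, describe_model("distil-whisper-large-v2") reports faster as the only usable mode, while describe_model("large-v3") reports both modes and no English-only restriction.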
faster model download
All models are downloaded from https://github.com/jianchang512/stt/releases/tag/0.0
After opening the page, pick the download for the mode you want to use. The faster models are recommended for their speed.
The compressed package for a faster model contains a folder. Copy that folder into the models folder in the software directory.
For example, after downloading the medium model, open the compressed package and you will see the folder
Copy this folder to the models directory
As shown in the image above.
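The manual copy step can also be scripted. The sketch below is a rough illustration under stated assumptions: it assumes the downloaded package is a .zip archive (if the release uses another format such as .7z, a different tool is needed), and the function name install_faster_model and both path arguments are hypothetical.

```python
import zipfile
from pathlib import Path


def install_faster_model(archive: Path, models_dir: Path) -> list:
    """Extract a downloaded faster-model archive into models_dir.

    Assumes a .zip archive (an assumption, not confirmed by this document).
    Returns the sorted names of the top-level entries that were extracted.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Archive member names always use forward slashes.
        top_level = sorted({name.split("/")[0] for name in zf.namelist() if name})
        zf.extractall(models_dir)
    return top_level
```

After it runs, the folder from the package should sit directly inside the models directory, which is the same result as copying it by hand.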
openai model download
The address is the same: https://github.com/jianchang512/stt/releases/tag/0.0
Scroll down and download the model you want; you will get a file with a .pt suffix. Copy this file into the models directory.
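For openai mode the step is a plain file copy. As a minimal sketch, assuming the .pt file has already been downloaded (the function name install_openai_model and both paths are hypothetical and not part of the software):

```python
import shutil
from pathlib import Path


def install_openai_model(pt_file: Path, models_dir: Path) -> Path:
    """Copy a downloaded .pt model file into models_dir and return its new path."""
    if pt_file.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got: {pt_file.name}")
    models_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the original file name when the destination is a directory.
    return Path(shutil.copy2(pt_file, models_dir))
```

The suffix check is only a guard against accidentally copying a faster-model package into place of a .pt file.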