Video translation software works in four stages: recognize the speech in the video as text, translate that text into the target language, generate dubbing from the translated text, and finally embed the dubbing and subtitles back into the video.
The first step, recognizing text from the speech in the video, matters most: its accuracy directly determines the quality of the subsequent translation and dubbing.
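The four stages above can be sketched in Python. Every function here is a hypothetical placeholder to show the data flow, not an actual pyvideotrans API:

```python
# A minimal sketch of the four-stage pipeline described above.
# All stage functions are hypothetical stand-ins, not pyvideotrans APIs.

def recognize_speech(video_path: str) -> str:
    """Stage 1: speech-to-text (e.g. a whisper model)."""
    return "hello world"                      # placeholder transcript

def translate_text(text: str, target_lang: str) -> str:
    """Stage 2: machine translation into the target language."""
    return f"[{target_lang}] {text}"          # placeholder translation

def synthesize_dubbing(text: str) -> bytes:
    """Stage 3: text-to-speech dubbing of the translated text."""
    return text.encode()                      # placeholder audio bytes

def embed_into_video(video_path: str, audio: bytes, subtitles: str) -> str:
    """Stage 4: mux the dubbing and subtitles back into the video."""
    return video_path.replace(".mp4", "_translated.mp4")

def translate_video(video_path: str, target_lang: str) -> str:
    text = recognize_speech(video_path)
    translated = translate_text(text, target_lang)
    audio = synthesize_dubbing(translated)
    return embed_into_video(video_path, audio, translated)
```

Each later stage consumes the previous stage's output, which is why recognition errors in stage 1 propagate all the way into the final video.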
openai Mode
This mode uses the whisper model officially open-sourced by OpenAI. Compared with faster mode, it is slower but achieves similar accuracy.
The model selection on the right works the same way: from tiny to large-v3, each step up consumes more computing resources and delivers higher accuracy.
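As a rough guide, the parameter counts and VRAM needs listed in the openai/whisper README can be encoded like this (the figures are approximate, and the helper function is an illustrative sketch, not part of the software):

```python
# Approximate parameter counts and VRAM needs from the openai/whisper
# README; treat them as rough guidance, not exact measurements.
WHISPER_MODELS = [
    # (name, parameters, approx. VRAM in GB)
    ("tiny",       39_000_000,    1),
    ("base",       74_000_000,    1),
    ("small",     244_000_000,    2),
    ("medium",    769_000_000,    5),
    ("large-v3", 1_550_000_000,  10),
]

def largest_model_for(vram_gb: float) -> str:
    """Pick the most accurate model that fits the available VRAM."""
    fitting = [name for name, _, need in WHISPER_MODELS if need <= vram_gb]
    return fitting[-1] if fitting else "tiny"
```

For example, a GPU with 4 GB of VRAM would be limited to small, while large-v3 needs roughly 10 GB.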
Note: although most model names are the same in faster mode and openai mode, the model files are not interchangeable. Please download the openai-mode models from https://pyvideotrans.com/model.html
large-v3-turbo Model
OpenAI recently released large-v3-turbo, a whisper model optimized from large-v3. Its recognition accuracy is close to large-v3's, while its size and resource consumption are much lower, making it a good substitute for large-v3.
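For a sense of scale, the parameter counts listed in the openai/whisper README (which I take as approximate) put the reduction at roughly half:

```python
# Parameter counts as listed in the openai/whisper README (approximate).
LARGE_V3_PARAMS = 1_550_000_000
TURBO_PARAMS = 809_000_000  # large-v3-turbo

reduction = 1 - TURBO_PARAMS / LARGE_V3_PARAMS
print(f"large-v3-turbo is ~{reduction:.0%} smaller than large-v3")
```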
How to Use
Upgrade the software to version v2.67: https://pyvideotrans.com
In the drop-down box next to speech recognition, select openai-whisper local.
Select large-v3-turbo in the model drop-down box.
Download the large-v3-turbo.pt file to the models folder in the software directory.
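After completing the steps above, a quick way to confirm the model file landed in the right place is a check like this (a stdlib-only sketch; the models-folder layout is taken from the last step):

```python
from pathlib import Path
from typing import Optional

def find_model(models_dir: str, name: str = "large-v3-turbo.pt") -> Optional[Path]:
    """Return the model file's path if it is in place, else None.

    `models_dir` is the models folder under the software directory,
    as described in the download step above.
    """
    path = Path(models_dir) / name
    return path if path.is_file() else None
```

If this returns None, the software will not find large-v3-turbo in the model drop-down at run time.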