Video translation software usually offers multiple speech recognition channels for transcribing the human speech in audio and video into subtitle files. For Chinese and English these channels work acceptably well, but for less common languages such as Japanese, Korean, and Indonesian, the results are far from ideal.
This is because models from abroad are trained mainly on English material (their Chinese performance is also unsatisfactory), while the training data of domestic models is concentrated on Chinese and English, with Chinese taking the larger share.
This shortage of training data leads to poor performance on other languages. Fortunately, the Hugging Face website https://huggingface.co gathers a large number of fine-tuned models, including many built specifically for less common languages, and their results are quite good.
Taking Japanese as an example, this article shows how to use Hugging Face models in video translation software to recognize less common languages.
1. Bypassing Network Restrictions
Due to network restrictions, https://huggingface.co cannot be accessed directly from mainland China, so you first need to configure your network environment to make the site reachable.
Once it is reachable, you will see the Hugging Face homepage.
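To confirm that the site is actually reachable, here is a quick check from Python. It assumes a local proxy at 127.0.0.1:7890; the address and port are placeholders for whatever your own proxy uses:

```python
# Quick sanity check that huggingface.co is reachable through a proxy.
# 127.0.0.1:7890 is a placeholder -- substitute your proxy's address/port.
import requests

proxies = {
    "http": "http://127.0.0.1:7890",
    "https": "http://127.0.0.1:7890",
}

try:
    resp = requests.get("https://huggingface.co", proxies=proxies, timeout=10)
    print("Reachable, HTTP status:", resp.status_code)
except requests.RequestException as exc:
    print("Cannot reach huggingface.co:", exc)
```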
2. Accessing the Models Directory
Click "Automatic Speech Recognition" in the left navigation bar, and all speech recognition models will be displayed on the right.
3. Finding Models Compatible with faster-whisper
At the time of writing, Hugging Face hosts 20,384 speech recognition models, but not all of them are suitable for video translation software: different models return data in different formats, and the software only supports faster-whisper-style models.
- Enter "faster-whisper" in the search box to search.
The search results are essentially all models that can be used in the video translation software; the sketch below shows how such a model ID is ultimately consumed.
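Internally the software presumably loads these models through the faster-whisper library, so a model ID found this way can be loaded directly with it. A minimal sketch, where the repo ID is a hypothetical placeholder to be replaced with the model you found:

```python
# Minimal sketch: load a Hugging Face model by its repo ID with faster-whisper.
# "someuser/faster-whisper-japanese" is a hypothetical placeholder ID.
from faster_whisper import WhisperModel

model = WhisperModel(
    "someuser/faster-whisper-japanese",  # the model ID copied from Hugging Face
    device="cpu",          # or "cuda" if a GPU is available
    compute_type="int8",   # quantized inference to reduce memory use on CPU
)
```

The first call downloads the model files from the Hub and caches them locally, which is why the proxy setup described earlier matters.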
Of course, some models are compatible with faster-whisper even though their names do not contain "faster-whisper". How do you find these?
- Search for the language name, such as "japanese", then click to enter the model details page and check the model description to see if it is compatible with faster-whisper.
If neither the model name nor the description mentions faster-whisper, the model is not usable. Even a name containing "whisper" or "whisper-large" does not help: those indicate checkpoints in the openai-whisper format, a mode the video translation software does not currently support (whether it will be supported later remains to be seen). A programmatic way to inspect a repo is sketched below.
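As a shortcut, you can look at the repo's file list: faster-whisper models are converted with CTranslate2 and normally ship a `model.bin`, whereas plain openai-whisper checkpoints ship `pytorch_model.bin` or `model.safetensors` instead. A rough check, with the repo ID again a hypothetical placeholder:

```python
# Rough compatibility check based on the files a repo contains.
# CTranslate2/faster-whisper conversions normally include "model.bin".
from huggingface_hub import list_repo_files

repo_id = "someuser/whisper-japanese"  # hypothetical ID of the model to inspect
files = list_repo_files(repo_id)

if "model.bin" in files:
    print("Likely a CTranslate2 / faster-whisper model")
else:
    print("Likely a plain openai-whisper checkpoint -- not usable here")
```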
4. Copying the Model ID to the Video Translation Software
After finding a suitable model, copy its model ID and paste it into "Menu" -> "Tools" -> "Advanced Options" -> "faster and openai model list" in the video translation software.
Copy the model ID.
Paste into the video translation software.
Save the settings.
5. Selecting the faster-whisper Mode
In the speech recognition channel, select the model you just added. If it does not appear, restart the software.
After selecting the model and the spoken language, you can start recognition.
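What happens at this point is, in essence, a faster-whisper transcription call. A minimal standalone sketch; the model ID and "audio.wav" are placeholders, and this is an assumed illustration rather than the software's actual code:

```python
# Minimal sketch: transcribe a Japanese audio file with a Hub model.
from faster_whisper import WhisperModel

model = WhisperModel("someuser/faster-whisper-japanese",  # hypothetical ID
                     device="cpu", compute_type="int8")

# language="ja" matches the spoken language selected in the software
segments, info = model.transcribe("audio.wav", language="ja")

for seg in segments:
    # Each segment carries start/end timestamps, ready for subtitle output
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```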
Note: a proxy must be set, otherwise the software cannot connect and will report an error. Try setting a global or system proxy on your computer; if the error persists, enter the proxy IP and port in the "Network Proxy" text box on the main interface.
For an explanation of network proxies, see https://pyvideotrans.com/proxy
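If you are scripting the download yourself outside the software, one common approach is to point the standard proxy environment variables at your local proxy before any Hub request is made; the address and port below are placeholders for your own setup:

```python
# Route Hugging Face downloads through a local proxy via standard env vars.
# 127.0.0.1:7890 is a placeholder -- use your own proxy's address and port.
import os

os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

# Any subsequent huggingface_hub / faster-whisper download in this process
# will now go through the proxy.
```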
Depending on network conditions, the download may take quite a while. As long as no red error message appears, please wait patiently.