Two key factors determine translation quality:
The first is the accuracy of the recognized text.
The second is the quality of the translation of that text.
Recognition accuracy directly limits translation quality, because a translation can be no better than the text it starts from. To improve translation quality, both aspects need attention.
I: Improving Text Recognition Accuracy
1. Use the large-v3 model.
Recognition accuracy improves step by step from the base model through the small and medium models to the large-v3 model, but so does the demand on computer resources. If your computer has a high-performance NVIDIA graphics card with at least 8 GB of VRAM, and the CUDA and cuDNN environments are properly configured, you can try the large-v3 model, which can significantly improve subtitle recognition accuracy.
[View CUDA and cuDNN environment installation methods](https://juejin.cn/post/7318704408727519270)
2. Separate background sounds from the video.
If the video contains a lot of background music or noise, it will interfere with text recognition. You can try selecting "Retain background sound," which separates the background sound before recognition and recognizes only the human speech, giving much better results.
Alternatively, you can use a third-party separation tool, or the "Separate human voice from background" function on the left side of the software, to split the video's audio into human voice and background sound.
Next, run the "Audio and Video to Text" function on the human voice alone to obtain the text subtitles.
Then, under "Text Subtitle Translation," translate the subtitles into the target language.
Finally, in the "Standard Function Mode," import the subtitles, add the background music, and embed the dubbing and subtitles into the video. The steps are somewhat cumbersome, but they can significantly improve the translation result.
3. Manual modification and adjustment
After subtitle recognition and translation are completed, the complete text is displayed in the subtitle area on the right side of the software. You can click the "Pause" button to pause the process and edit the text by hand. However accurate machine recognition and translation become, manual proofreading still gives better results.
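The model choice in step 1 can be sketched roughly in Python. This is only an illustration, not part of the software: the function names `detect_nvidia_vram_gb` and `choose_model` are hypothetical, the fallback to the medium model is an assumption, and the check covers only the 8 GB VRAM condition mentioned above. It does not verify that CUDA and cuDNN are correctly configured. It relies on the `nvidia-smi` command that ships with NVIDIA drivers.

```python
import shutil
import subprocess

# Threshold taken from the text above: at least 8 GB of VRAM for large-v3.
MIN_VRAM_GB_FOR_LARGE_V3 = 8


def detect_nvidia_vram_gb():
    """Return the total VRAM of the first NVIDIA GPU in GB, or None if unknown."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on PATH
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return None
    lines = out.strip().splitlines()
    if not lines:
        return None
    try:
        # nvidia-smi reports MiB; round to one decimal so that a card
        # reported as e.g. 8188 MiB still counts as 8.0 GB.
        return round(float(lines[0]) / 1024, 1)
    except ValueError:
        return None


def choose_model(vram_gb, fallback="medium"):
    """Pick large-v3 when enough VRAM is available, else the fallback model.

    The fallback value is an assumption for this sketch; pick whichever
    smaller model (base, small, medium) your machine handles comfortably.
    """
    if vram_gb is not None and vram_gb >= MIN_VRAM_GB_FOR_LARGE_V3:
        return "large-v3"
    return fallback


if __name__ == "__main__":
    print(choose_model(detect_nvidia_vram_gb()))
```

On a machine without an NVIDIA card the detection returns `None`, so the sketch falls back to the smaller model rather than failing.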
II: Improving Text Subtitle Translation Quality
ChatGPT, DeepL, and Azure provide the best translation quality. All three require paid accounts, none of them accept payments from users in mainland China, and ChatGPT and Azure also require a proxy, which makes them harder to use.
If you have a paid account and know how to configure a proxy, you can use these three translation channels to improve translation quality (many intermediary proxy services for ChatGPT are available in China).
The next best options are Google/Gemini/Microsoft, all of which are free. Google and Gemini require proxy configuration, while Microsoft does not.
However, Gemini applies stricter safety restrictions. If the dialogue in your video contains restricted (adult-rated) content, Gemini may refuse to translate it.
Next, you can choose Baidu Translate and Tencent Translate. You need to apply for free keys and app IDs on their respective websites. Tencent has a higher free quota, while Baidu's free quota is very low.
In summary, if conditions permit, prioritize ChatGPT/DeepL, then Google, then Microsoft, and finally Tencent Translate and Baidu Translate.
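The priority order above can be expressed as a small lookup. The following is a minimal sketch rather than anything the software itself exposes: the `Channel` class, the `usable_channels` function, and the three boolean flags are hypothetical names, and each channel's requirements are simply copied from the descriptions in this section. Treating "has a paid account" as one flag for every paid service is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    paid: bool      # needs a paid account
    proxy: bool     # needs a proxy (per the text, for mainland China users)
    api_key: bool   # needs a free key / app ID from the provider


# Listed in the priority order given in the summary above.
CHANNELS = [
    Channel("ChatGPT", paid=True, proxy=True, api_key=False),
    Channel("DeepL", paid=True, proxy=False, api_key=False),
    Channel("Google", paid=False, proxy=True, api_key=False),
    Channel("Microsoft", paid=False, proxy=False, api_key=False),
    Channel("Tencent Translate", paid=False, proxy=False, api_key=True),
    Channel("Baidu Translate", paid=False, proxy=False, api_key=True),
]


def usable_channels(has_paid_account, has_proxy, has_api_key):
    """Return the names of the channels you can use, best first."""
    return [
        c.name
        for c in CHANNELS
        if (has_paid_account or not c.paid)
        and (has_proxy or not c.proxy)
        and (has_api_key or not c.api_key)
    ]


if __name__ == "__main__":
    # Example: no paid account, but a working proxy.
    print(usable_channels(has_paid_account=False, has_proxy=True, has_api_key=False))
```

The first name in the returned list is the channel to try first; the remaining names serve as fallbacks if that channel is unavailable.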
You can also use DeepLx to access DeepL for free, but it is unstable and your IP address can easily get blocked.
As with recognition, a pause button appears after translation finishes. Click it, and you can check and correct the translation results by hand in the subtitle area on the right.