Two Key Factors Determining Quality:
The first is the accuracy of the recognized text.
The second is the quality of the translation of that text.
Recognition accuracy sets the ceiling for translation quality, because a translation engine cannot recover words that were recognized incorrectly. To improve the final result, work on both aspects.
1. Improving Text Recognition Accuracy:
Use the large-v3 model.
From the base, small, and medium models to the large-v3 model, recognition accuracy improves, but so does the consumption of computer resources. If your computer has a high-performance NVIDIA graphics card with at least 8GB of VRAM, and you have configured the CUDA and cuDNN environments, you can try using the large-v3 model, which will significantly improve the accuracy of subtitle text recognition.
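
The rule of thumb above can be written down as a small check. This is only a sketch: `choose_whisper_model` is a hypothetical helper, not part of the software. The 8 GB threshold and the model names come from the text, while falling back to `medium` is an assumption, since the text does not name a fallback.

```python
# Model names listed in the text, ordered from least to most accurate
# (and from least to most resource-hungry).
MODELS_BY_ACCURACY = ["base", "small", "medium", "large-v3"]


def choose_whisper_model(vram_gb: float, has_cuda: bool) -> str:
    """Return a model name following the rule of thumb in the text."""
    # The text suggests large-v3 only with an NVIDIA card that has at least
    # 8 GB of VRAM and a configured CUDA/cuDNN environment.
    if has_cuda and vram_gb >= 8:
        return "large-v3"
    # Assumption: otherwise fall back to the next model down.
    return "medium"


if __name__ == "__main__":
    print(choose_whisper_model(vram_gb=12, has_cuda=True))
    print(choose_whisper_model(vram_gb=4, has_cuda=True))
```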

Separate background noise from the video.
If the video contains a lot of background music or noise, it will interfere with text recognition. Try selecting "Keep Background Sound." The software then separates the background sound before recognition and uses only the human speech, which usually gives noticeably better results.

Of course, you can also use other third-party separation tools or the "Separate Voice and Background" function on the left side of the software to separately extract the human voice and background sound from the video.

Then, use the "Audio/Video to Text" function to perform subtitle recognition on the human voice alone to obtain the text subtitles.

Next, under "Text Subtitle Translation," translate these subtitles into the target language.

Then, in the "Standard Function Mode," import these subtitles, add the background music, and finally embed the voiceover and subtitles into the video. Although this route takes a few more steps, it can significantly improve the final result.
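
The manual route above is a fixed four-step pipeline: separate, recognize, translate, merge. The sketch below only illustrates that ordering. `manual_workflow` and its four callables are hypothetical placeholders standing in for the corresponding functions of the software or a third-party tool; they are not an actual API.

```python
from typing import Callable, Tuple


def manual_workflow(
    video: str,
    separate: Callable[[str], Tuple[str, str]],  # video -> (voice, background)
    recognize: Callable[[str], str],             # voice -> source subtitles
    translate: Callable[[str], str],             # source -> target subtitles
    merge: Callable[[str, str, str], str],       # (video, subtitles, background) -> output
) -> str:
    """Run the four manual steps in the order described in the text."""
    # 1. Split the human voice from the background sound.
    voice, background = separate(video)
    # 2. Recognize subtitles from the voice track alone.
    source_subtitles = recognize(voice)
    # 3. Translate the recognized subtitles into the target language.
    target_subtitles = translate(source_subtitles)
    # 4. Re-import the subtitles and add the background sound back.
    return merge(video, target_subtitles, background)
```

The key point is that recognition runs on the isolated voice track, while the background track is set aside and only rejoined in the last step.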

Manual modification and adjustment.
After subtitle recognition completes, and again after translation completes, the full text is displayed in the subtitle area on the right side of the software. Click the "Pause" button, then manually correct and adjust the text. However accurate machine recognition and translation become, manual proofreading still gives the most reliable result.

2. Improving Text Subtitle Translation Quality

The best translation quality comes from ChatGPT/DeepL/Azure. All three require paid accounts, and none of them accept payment from users within China. In addition, ChatGPT/Azure require proxy configuration, which raises the barrier to entry.
If you have a paid account and know how to configure a proxy, you can use these three translation channels to improve translation quality. (Note: many relay proxy services for ChatGPT are available within China.)
The next best options are Google/Gemini/Microsoft. All three are free. Among them, Google and Gemini require proxy configuration, while Microsoft does not.
However, note that Gemini applies strict content safety restrictions. If the dialogue in your video contains adult or otherwise restricted content, Gemini may refuse to translate it.
Next, you can choose Baidu Translate and Tencent Translate. You need to apply for free keys and appids on their respective websites. Tencent offers a higher free quota, while Baidu's free quota is very low.
In summary, if conditions allow, the first choice is ChatGPT/DeepL, followed by Google, then Microsoft, and finally Tencent Translate and Baidu Translate.
Of course, you can also use DeepLx to access DeepL for free, but it's unstable and your IP might easily get blocked.
As with recognition, a "Pause" button appears after translation completes. Click it, and you can manually check and correct the translation results in the subtitle area on the right.
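
The channel preferences in this section can be sketched as a small lookup. The order follows the summary above (ChatGPT/DeepL, then Google, Microsoft, Tencent, Baidu), and the requirements are the ones stated in the text. The `Channel` class and `usable_channels` function are hypothetical names used only for illustration, and treating DeepL as not needing a proxy is an inference from the text naming only ChatGPT/Azure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    name: str
    needs_paid_account: bool
    needs_proxy: bool
    needs_free_key: bool  # free key and appid requested from the provider


# Ordered from most to least preferred, following the summary in the text.
CHANNELS = [
    Channel("ChatGPT", True, True, False),
    Channel("DeepL", True, False, False),  # proxy need inferred, see lead-in
    Channel("Google", False, True, False),
    Channel("Microsoft", False, False, False),
    Channel("Tencent", False, False, True),
    Channel("Baidu", False, False, True),
]


def usable_channels(has_paid_account: bool, has_proxy: bool,
                    has_free_key: bool) -> List[str]:
    """List the channels a user can access, best first."""
    result = []
    for ch in CHANNELS:
        if ch.needs_paid_account and not has_paid_account:
            continue
        if ch.needs_proxy and not has_proxy:
            continue
        if ch.needs_free_key and not has_free_key:
            continue
        result.append(ch.name)
    return result


if __name__ == "__main__":
    # A user with a proxy but no paid account and no Tencent/Baidu keys.
    print(usable_channels(has_paid_account=False, has_proxy=True,
                          has_free_key=False))
```

The first entry of the returned list is the channel to try first; with no paid account, no proxy, and no keys, only Microsoft remains.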
