Improving the Quality of AI-Translated Subtitles
When using AI to translate SRT subtitles, there are generally two methods.
Method One: Send the Complete, Formatted Subtitles, Including the "Line Numbers" and "Timestamp Lines" That Need No Translation.
The following example sends the complete subtitles, formatting included:
1
00:00:01,950 --> 00:00:04,950
Organic molecules have been discovered in the Pentastar system.
2
00:00:04,950 --> 00:00:07,902
We are still a long way from third contact.
3
00:00:07,902 --> 00:00:11,958
The microwave has been deployed for the filming mission for a year now.
Advantages: The model sees the surrounding context, so translation quality is better.
Disadvantages: Besides wasting tokens, the model may corrupt the subtitle format so that the output is no longer valid SRT. For example, the English comma and colon in the timestamps may be replaced with Chinese full-width characters (， and ：), or a line number and its timestamp line may be merged into a single line.
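Because a model can break the format in exactly these ways, it helps to check a reply before using it. The following is a minimal Python sketch, not the software's actual implementation; the name `find_srt_errors` is hypothetical, and it assumes subtitle blocks are separated by blank lines, as in standard SRT files. It flags index lines that are not plain ASCII numbers, timestamp lines that use anything other than ASCII digits, colons and commas, and blocks that have too few lines because the line number and timestamp were merged.

```python
import re

# Index line: ASCII digits only.
INDEX_RE = re.compile(r"[0-9]+")
# Timestamp line: ASCII digits, ":" and "," only (HH:MM:SS,mmm --> HH:MM:SS,mmm).
TIMESTAMP_RE = re.compile(
    r"[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"
)


def find_srt_errors(srt_text: str) -> list[str]:
    """Return a description of each structural problem; an empty list means none were found."""
    errors = []
    # Standard SRT separates subtitle blocks with blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    for n, block in enumerate(blocks, start=1):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 3:
            # Too few lines, e.g. the index and timestamp were merged into one line.
            errors.append(f"block {n}: expected index, timestamp and text lines")
            continue
        if not INDEX_RE.fullmatch(lines[0]):
            errors.append(f"block {n}: bad index line {lines[0]!r}")
        if not TIMESTAMP_RE.fullmatch(lines[1]):
            errors.append(f"block {n}: bad timestamp line {lines[1]!r}")
    return errors


good = """1
00:00:01,950 --> 00:00:04,950
Organic molecules have been discovered in the Pentastar system.

2
00:00:04,950 --> 00:00:07,902
We are still a long way from third contact.
"""
print(find_srt_errors(good))  # []
```

A block whose timestamp reads 00：00：01，950 --> 00：00：04，950 would be reported as a bad timestamp line, so a reply like that can be rejected before it replaces the original subtitles.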
Method Two: Send Only the Subtitle Text, Then Replace the Corresponding Text in the Original Subtitles with the Translation.
In this format, only the subtitle text is sent:
Organic molecules have been discovered in the Pentastar system.
We are still a long way from third contact.
The microwave has been deployed for the filming mission for a year now.
Advantages: The translated result is always valid SRT, because the line numbers and timestamps never pass through the model.
Disadvantages: Equally obvious. Translating subtitle text line by line cannot take context into account, which greatly reduces translation quality.
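Method Two can be sketched as follows. This is an illustration under assumptions, not the software's code: `translate_line` is a hypothetical stand-in for the call to the AI model, and each block's text is sent on its own, which is exactly why context is lost while the SRT structure stays intact.

```python
import re
from typing import Callable


def translate_text_only(srt_text: str, translate_line: Callable[[str], str]) -> str:
    """Translate only the subtitle text; index and timestamp lines are copied unchanged."""
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    result = []
    for block in blocks:
        lines = block.strip().splitlines()
        header, text = lines[:2], "\n".join(lines[2:])
        # Only the text reaches the model, one subtitle at a time (no context).
        result.append("\n".join(header + [translate_line(text)]))
    return "\n\n".join(result) + "\n"


source = """1
00:00:01,950 --> 00:00:04,950
Organic molecules have been discovered in the Pentastar system.

2
00:00:04,950 --> 00:00:07,902
We are still a long way from third contact.
"""
# str.upper stands in for a real translation call in this demo.
print(translate_text_only(source, str.upper))
```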
To mitigate this, the software can translate several lines in one request (15 subtitle lines by default), which partially restores context.
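One way to form such batches is sketched below. It assumes, hypothetically, that each subtitle's text is flattened to a single line so that the reply can be split back into one line per subtitle. The batch size of 15 mirrors the default mentioned above; the function names are illustrative only.

```python
def make_batches(lines: list[str], batch_size: int = 15) -> list[list[str]]:
    """Split subtitle texts into consecutive batches of at most batch_size lines."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


def batch_to_request(batch: list[str]) -> str:
    """Join one subtitle per line; multi-line subtitles are flattened with spaces."""
    return "\n".join(text.replace("\n", " ") for text in batch)


texts = [f"subtitle {i}" for i in range(1, 41)]
batches = make_batches(texts)
print([len(b) for b in batches])  # [15, 15, 10]
```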
However, this introduces a new problem. Languages differ in grammar and sentence structure, so 15 source lines may be translated into 14, 13, or some other number of lines, especially when adjacent lines belong to the same sentence.
If a batch of 15 source lines no longer comes back as 15 lines, the translations can no longer be matched one-to-one with their timestamps, and the subtitles become misaligned. To prevent this, whenever the number of translated lines differs from the number of source lines, those lines are translated again one at a time. This guarantees exactly the same number of lines, at the cost of giving up context.
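The line-count check and its fallback can be sketched like this. It is a hypothetical illustration, not the software's implementation: `translate` stands in for a model call that accepts a newline-separated batch, and blank lines in the reply are ignored before counting.

```python
from typing import Callable


def translate_batch(batch: list[str], translate: Callable[[str], str]) -> list[str]:
    """Translate a batch in one request, falling back to line-by-line if the count changes."""
    reply = translate("\n".join(batch))
    translated = [line.strip() for line in reply.splitlines() if line.strip()]
    if len(translated) == len(batch):
        # Same number of lines: keep the context-aware batch result.
        return translated
    # Count mismatch: translate each line alone so every subtitle gets exactly one line.
    return [" ".join(translate(line).split()) for line in batch]


def merging_model(text: str) -> str:
    """Fake model that joins all input lines into one sentence, then 'translates' by upper-casing."""
    return " ".join(text.splitlines()).upper()


print(translate_batch(["We are still", "a long way from third contact."], merging_model))
# ['WE ARE STILL', 'A LONG WAY FROM THIRD CONTACT.']
```

Here the fake model merges two lines into one sentence, so the batch reply has one line instead of two and both lines are retranslated individually.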
The software uses the second method by default, because subtitles that always work matter more than subtitles that read best.
Starting with v2.52, the first method is also supported. It is disabled by default and must be turned on manually. Once it is enabled, translation with ChatGPT, Gemini, AzureGPT, 302.AI, ByteDance Volcano, or LocalLLM sends the complete, formatted SRT subtitles, so the model can take context into account and translation quality improves.
Note, however, that the problems described under Method One can then occur. The result may not be valid SRT, which can cause parsing errors or the loss of all content after the point of the error. Use this method only with sufficiently capable models, such as GPT-4 or larger. It is not recommended for locally deployed models: because of hardware limits, these are usually small and less capable, so formatting errors are more likely.
Enabling the First Translation Method:
Menu -- Tools/Options -- Advanced Options -- Subtitle Translation Area -- Send complete subtitles when performing AI intelligent translation
Adding a Glossary
A custom glossary can be added to each prompt, for example:
**During translation, be sure to use** the glossary I provide for term translation to maintain consistency. The specific glossary is as follows:
* Transformer -> Transformer
* Token -> Token
* LLM/Large Language Model -> Large Language Model
* Generative AI -> Generative AI
* One Health -> One Health
* Radiomics -> Radiomics
* OHHLEP -> OHHLEP
* STEM -> STEM
* SHAPE -> SHAPE
* Single-cell transcriptomics -> Single-cell transcriptomics
* Spatial transcriptomics -> Spatial transcriptomics
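If you keep your terms in a file or a table, the glossary text can also be generated instead of typed by hand. The helper below is a hypothetical sketch, not a feature of the software; it simply renders a term mapping in the same bullet format as the example above, ready to paste into a prompt.

```python
def glossary_prompt(terms: dict[str, str]) -> str:
    """Render term -> translation pairs in the bullet format used in the glossary example."""
    header = (
        "**During translation, be sure to use** the glossary I provide for term "
        "translation to maintain consistency. The specific glossary is as follows:"
    )
    bullets = [f"* {source} -> {target}" for source, target in terms.items()]
    return "\n".join([header, *bullets])


print(glossary_prompt({"Token": "Token", "LLM/Large Language Model": "Large Language Model"}))
```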