This article briefly introduces the principle, features, and usage of the "video translation and dubbing software". The main contents include:
- What is this thing, and what is its use
- How to download, install, and update
- Where to download the model
- How to choose the translation channel
- What is a proxy, is it necessary
- How to use it specifically
- How to use CUDA acceleration
- How to use the original video's voice for dubbing
- How to use GPT-SoVIT dubbing
- What to do if you encounter problems
- Is it chargeable, are there any limitations
- Will the project die
- Is it possible to modify the source code
What is this thing, and what is its use
This is an open-source video translation and dubbing tool (open-source license GPL-v3). It can translate a video spoken in one language into a video dubbed in another language, with subtitles in that language embedded. For example, take an English movie with English audio and no English or Chinese subtitles; after processing with this tool, it becomes a movie with Chinese subtitles and Chinese dubbing.
Open-source address https://github.com/jianchang512/pyvideotrans
In addition to this core function, it also comes with some other tools:
- Speech-to-text: Transcribes the speech in a video or audio file into text and exports it as a subtitle file.
- Audio and video separation: Splits a video into a silent video file and an audio file.
- Text and subtitle translation: Translates text or srt subtitle files into text or subtitles in other languages.
- Video and subtitle merging: Embeds subtitle files into videos.
- Audio, video, and subtitle merging: Merges a video file, an audio file, and a subtitle file into one file.
- Text-to-speech: Synthesizes any text or srt file into an audio file.
- Human voice and background separation: Separates the human voice and the other sounds in a video into two audio files.
- Download YouTube videos: Downloads YouTube videos online.
What is the principle of this tool?
First, ffmpeg is used to split the original video into an audio file and a silent mp4. Then the openai-whisper/faster-whisper model recognizes the human speech in the audio and saves it as srt subtitles. Those subtitles are translated into the target language and saved as an srt subtitle file, and the translated text is synthesized into a dubbing audio file.
Finally, the dubbing audio, the srt subtitle file, and the original silent mp4 are merged into one video file, which completes the translation.
Of course, the intermediate steps are more complex, such as separating background music from the human voice, aligning subtitles with the audio and picture, voice cloning, CUDA acceleration, and so on.
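The following is a highly simplified sketch of that pipeline, not the software's actual code: it assumes ffmpeg is available on PATH and leaves speech recognition, translation, and dubbing as placeholders.

```python
# Highly simplified sketch of the pipeline described above, not the software's actual code.
# Assumes ffmpeg is on PATH; speech recognition, translation, and TTS are left as placeholders.
import subprocess

def split_av(video: str) -> None:
    # 1. split the original video into an audio track and a silent mp4
    subprocess.run(["ffmpeg", "-y", "-i", video, "-vn", "audio.wav"], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", video, "-an", "-c:v", "copy", "silent.mp4"], check=True)

def merge(silent_video: str, dubbed_audio: str, srt: str, out: str) -> None:
    # 4. burn the translated srt into the picture and add the dubbed audio track
    subprocess.run([
        "ffmpeg", "-y", "-i", silent_video, "-i", dubbed_audio,
        "-vf", f"subtitles={srt}",
        "-map", "0:v", "-map", "1:a", "-c:a", "aac",
        out,
    ], check=True)

split_av("input.mp4")
# 2. recognize speech from audio.wav -> source.srt (openai-whisper / faster-whisper)
# 3. translate source.srt -> target.srt, then synthesize target.srt -> dubbed.wav
merge("silent.mp4", "dubbed.wav", "target.srt", "translated.mp4")
```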
Is it possible to deploy from source code?
Yes. In fact, for macOS and Linux no pre-packaged version is provided; source-code deployment is the only option there. Please refer to the repository page for details: https://github.com/jianchang512/pyvideotrans
How to download, install, and update
Download from GitHub
This is an open-source project hosted on GitHub, so the preferred download source is naturally GitHub: https://github.com/jianchang512/pyvideotrans/releases. Open the page and download the package at the top.
If you arrived at the repository homepage, i.e. https://github.com/jianchang512/pyvideotrans, click "Releases" on the middle right of the page to reach the download page described above.
Updating is simple: go back to the download page and check whether the latest version is newer than the one you are using. If it is, download it again, then unzip it and overwrite your existing files.
Download and install from the documentation site
Of course, an easier way is to go to the documentation site and click to download: https://pyvideotrans.com
Unzip the package and double-click sp.exe to open and use the software.
Unzip it to a directory whose path contains only English letters or digits; it is best to avoid Chinese characters or spaces, otherwise strange problems may occur.
The file list after unzipping is as follows
Where to download the model
The tiny model is built-in by default. This is the smallest and fastest model, and it is also the least accurate model. If you need other models, please download them from this page: https://github.com/jianchang512/stt/releases/tag/0.0
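For reference, here is a minimal sketch of how a downloaded faster-whisper model transcribes audio. It is illustrative only and assumes the faster-whisper Python package; the software loads models itself, and the model path below is a hypothetical example.

```python
# Illustrative only: how a downloaded faster-whisper model transcribes audio.
# The software loads models itself; the model path below is a hypothetical example.
from faster_whisper import WhisperModel

# "tiny" is the smallest and fastest but least accurate; larger models trade speed for accuracy.
model = WhisperModel("/path/to/models/faster-whisper-base", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```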
How to choose the translation channel
After recognizing the subtitles, if you need to convert the subtitles to another language, such as an English video, and you want to embed Chinese subtitles after processing, then you need to use a translation channel.
Currently supported translation channels:
- Microsoft Translator
- Google Translate
- Baidu Translate
- Tencent Translate
- DeepL Translate
- ChatGPT Translate
- AzureGPT Translate
- Gemini Pro Translate
- DeepLx Translate
- OTT Offline Translate
- FreeGoogle Translate
- FreeChatGPT Translate
FreeChatGPT Translate: This is a free ChatGPT translation interface sponsored by apiskey.top. No sk (API key) is required and no configuration is needed; just select it and use it. It is based on the 3.5 model.
FreeGoogle Translate: This is a reverse proxy for Google Translate that can be used without a proxy, but it has a request limit. It is recommended for novice users who cannot set up a proxy. Everyone else who wants to use Google Translate should fill in a network proxy address.
DeepL Translate: This should be the best translation, even better than ChatGPT. Unfortunately, the paid version cannot be purchased in China, and the free version is difficult to call via the API. DeepLx is a tool for using DeepL for free, but a local deployment is basically unusable: because there are many subtitle lines and translation runs in multiple threads at once, the IP is easily blocked and rate-limited. Consider deploying it on Tencent Cloud to reduce the error rate.
Related posts: https://juejin.cn/user/4441682704623992/posts
Microsoft Translator: Completely free and requires no proxy, but frequent use may still cause IP restrictions.
Google Translate: If you have a proxy and know what a proxy is and how to fill it in, Google Translate is the recommended first choice. Using it for free also works very well; you only need to fill in the proxy address in the text box.
Also check this method: a small tool that lets you use Google Translate directly without a proxy.
Tencent Translate: If you know nothing about proxies, don't bother with them; apply for the free Tencent Translate instead (click to view the Tencent Translate API application guide). The first 5 million characters per month are free.
Baidu Translate: You can also apply for the Baidu Translate API (click to view the Baidu Translate API application guide). Unverified users get 50,000 free characters per month, while users who have completed personal verification get 1 million free characters per month.
Using OTT Offline Translation: If you are willing to tinker, you can choose to deploy the free OTT offline translation. The download address is https://github.com/jianchang512/ott. After deployment, fill in the address in the software menu - settings - OTT offline translation.
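For reference, here is a hypothetical request to a locally deployed OTT service, assuming it exposes a LibreTranslate-style /translate endpoint at the address you fill into the settings. The address and route here are assumptions; check the OTT README for your own deployment.

```python
# Hypothetical request to a locally deployed OTT service.
# The address and the LibreTranslate-style /translate route are assumptions;
# verify both against the OTT README for your deployment.
import requests

OTT_URL = "http://127.0.0.1:9911"  # example only; use the address your deployment prints

resp = requests.post(
    f"{OTT_URL}/translate",
    json={"q": "Hello world", "source": "en", "target": "zh", "format": "text"},
    timeout=30,
)
print(resp.json().get("translatedText"))
```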
Using AI translation ChatGPT / Azure / Gemini:
ChatGPT and AzureGPT require paid accounts; free accounts are not usable. Once you have an account, open menu - settings - OpenAI/ChatGPT key and fill in your ChatGPT sk value. AzureGPT and Gemini keys are likewise filled in under menu - settings.
Note here that if you are using the official ChatGPT API, you do not need to fill in "API URL". If it is a third-party API, fill in the API address provided by the third party.
ChatGPT access guide: Quickly obtain and configure API keys and fill them into software/tools for use https://juejin.cn/post/7342327642852999168
OpenAI's official ChatGPT, as well as Gemini/AzureGPT, requires a proxy to be filled in, otherwise it cannot be accessed.
AzureGPT is configured in the same settings area.
Gemini is currently free, and it can be used after filling in the API key and setting the proxy correctly.
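As a purely illustrative sketch of how the sk key and an optional third-party "API URL" fit together (this is not the software's internal code, and it assumes the official openai Python package):

```python
# Illustrative sketch of the sk key and optional third-party "API URL", not the software's code.
# Assumes the official openai Python package.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # the sk value you fill into menu - settings
    # base_url="https://your-third-party-api.example.com/v1",  # only needed for third-party APIs;
    # leave unset when using the official ChatGPT API
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Translate to Chinese: Hello, world."}],
)
print(resp.choices[0].message.content)
```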
What is a proxy, is it necessary
If you want to use Google Translate, the official ChatGPT API, or Gemini/AzureGPT, then a proxy is necessary. Fill in the proxy address box using this format: http://127.0.0.1:port number. Please note that the port must be the http-type port, not the socks port.
For example, with a certain software you would fill in http://127.0.0.1:10809; with a certain other software you would fill in http://127.0.0.1:7890. If you use a proxy but don't know what to fill in, look carefully in the proxy software's lower-left or upper-right corner (or elsewhere) for "http" followed by a 4-5 digit number, then fill in http://127.0.0.1: followed by that port number.
If you don't understand what a proxy is, for reasons you know, I won't say more. Please search on Baidu yourself.
Please note: The proxy address can be left blank if not needed, but do not fill it in randomly, especially do not fill in the API address here.
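If you are unsure whether the address you plan to fill in is a working HTTP proxy, a small test like the following can help (illustrative only; the proxy URL below is an example):

```python
# Illustrative only: test that a local HTTP proxy works before filling it into the settings.
# The proxy URL below is an example; replace the port with your own http port.
import requests

proxy = "http://127.0.0.1:7890"
proxies = {"http": proxy, "https": proxy}

try:
    r = requests.get("https://www.google.com/generate_204", proxies=proxies, timeout=10)
    print("proxy works, status:", r.status_code)
except Exception as e:
    print("proxy not usable:", e)
```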
How to use it specifically
Double-click sp.exe to open the software. The default interface is as follows
By default, the first item selected on the left is the simple novice mode, which lets new users get a quick feel for the tool; most options are already preset.
Of course, you can choose the standard function mode for full customization and complete the entire video translation + dubbing + subtitle embedding workflow. The other buttons on the left are essentially pieces of this workflow broken out, or other simple auxiliary functions. The demonstration below uses the simple novice mode.
How to use CUDA acceleration
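CUDA acceleration requires an NVIDIA GPU with a properly installed driver and CUDA/cuDNN environment. As a rough diagnostic (a minimal sketch assuming PyTorch is installed; the software itself may check differently), you can confirm whether CUDA is visible before enabling the option:

```python
# Hypothetical diagnostic: confirm an NVIDIA GPU and CUDA runtime are visible to Python.
# Assumes PyTorch is installed in the environment; not part of sp.exe itself.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available - check the NVIDIA driver and CUDA/cuDNN installation")
```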
How to use the original video's voice for dubbing
First, you need another open-source project, clone-voice: https://github.com/jianchang512/clone-voice. After installing and deploying it and configuring its model, fill in its address in the software menu - settings - original voice cloning API.
Then select "clone-voice" as the TTS and "clone" as the dubbing role, and it is ready to use.
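As a purely illustrative check (the address below is a hypothetical example; see the clone-voice README for the real API routes and parameters), you can verify that the service address is reachable before filling it into the settings:

```python
# Purely illustrative: verify the clone-voice service address is reachable
# before filling it into menu - settings - original voice cloning API.
# The address below is a hypothetical example; use the one your deployment actually prints.
import requests

CLONE_VOICE_URL = "http://127.0.0.1:9988"

try:
    r = requests.get(CLONE_VOICE_URL, timeout=5)
    print("clone-voice service reachable, status:", r.status_code)
except Exception as e:
    print("service not reachable:", e)
```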
How to use GPT-SoVIT dubbing
The software currently supports using GPT-SoVITS for dubbing. After deploying GPT-SoVITS, start the API service, and then fill in the address in the video translation software settings menu - GPT-SOVITS.
You can refer to these 2 articles:
Calling GPT-SoVITS in other software to synthesize speech from text https://juejin.cn/post/7341401110631350324
API improvement and use of GPT-SoVITS project https://juejin.cn/post/7343138052973297702
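For reference, here is a hypothetical example of calling a GPT-SoVITS API service directly, assuming the stock api.py listening on its default port 9880; parameter names and routes may differ between GPT-SoVITS versions and the improved API described in the articles above.

```python
# Hypothetical call to a GPT-SoVITS API service; assumes the stock api.py on its
# default port 9880. Parameter names and routes may differ across versions.
import requests

payload = {
    "refer_wav_path": "ref.wav",      # reference audio of the voice to imitate
    "prompt_text": "reference text",  # transcript of the reference audio
    "prompt_language": "zh",
    "text": "Text to be synthesized into speech",
    "text_language": "zh",
}

resp = requests.post("http://127.0.0.1:9880", json=payload, timeout=120)
with open("out.wav", "wb") as f:
    f.write(resp.content)  # on success the service returns wav audio bytes
```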
What to do if you encounter problems
First, carefully read the project homepage: https://github.com/jianchang512/pyvideotrans. Most problems are explained there.
Second, you can visit the documentation website: https://pyvideotrans.com
Third, if you still cannot solve the problem, submit an Issue here: https://github.com/jianchang512/pyvideotrans/issues. Of course, there is also a QQ group on the project homepage: https://github.com/jianchang512/pyvideotrans, you can join the group.
It is recommended to follow my WeChat official account (pyvideotrans), which contains original tutorials, frequently asked questions, and related tips for this software. Since my time is limited, tutorials for this project are published only on this Juejin blog and the WeChat official account; GitHub and the documentation website will not be updated as frequently.
Search for the official account "pyvideotrans" in WeChat search.
Is it chargeable, are there any limitations
The project is open-source under the GPL-v3 license and free to use, with no built-in paid features and no restrictions (it must be used in compliance with Chinese law). Of course, Tencent Translate, Baidu Translate, DeepL, ChatGPT, and AzureGPT themselves charge for usage, but that has nothing to do with me, and they don't give me any money.
Will the project die
No project lives forever; there are only long-lived projects and short-lived ones. Projects that run purely on enthusiasm tend to die sooner. If you want this one to live longer and receive effective, continuous maintenance and optimization while it lasts, you can consider making a donation to help it live a few more days.
Is it possible to modify the source code
The source code is completely open: it can be deployed locally and modified for your own use. Note, however, that it is released under the GPL-v3 license; if you integrate the source code into your own project, your project must also be open-sourced to comply with that license.