Google Colab is a free cloud-based programming environment. You can think of it as a computer in the cloud that can run code, process data, and even perform complex AI calculations, such as quickly and accurately converting your audio and video files into subtitles using a large model.

This article will guide you step-by-step on how to use pyVideoTrans on Colab to transcribe audio and video into subtitles, even if you have no programming experience. We will provide a pre-configured Colab notebook; you only need to click a few buttons to complete the process.

Preparation: Internet Access and Google Account

Before you begin, you need two things:

  1. Internet Access: Google services are not directly accessible from mainland China, so you will need a way to reach Google's websites (for example, a proxy).
  2. Google Account: You need a Google account to log in to Colab. Registration is completely free.

Make sure you can open Google: https://google.com

Open the Colab Notebook

After ensuring you can access Google's website and log in to your Google account, click the following link to open the Colab notebook we have prepared for you:

https://colab.research.google.com/drive/1kPTeAMz3LnWRnGmabcz4AWW42hiehmfm?usp=sharing

You will see an interface similar to the image below. Since this is a shared notebook, you need to copy it to your own Google Drive before you can modify and run it. Click "Copy to Drive" in the upper left corner. Colab will automatically create a copy for you and open it.

image.png

The following image shows the created page.

image.png

Connect to GPU/TPU

Colab uses the CPU to run code by default. To speed up the transcription process, we need to use a GPU or TPU.

Click "Runtime" -> "Change runtime type" in the menu bar, and then select "GPU" or "TPU" in the "Hardware accelerator" drop-down menu. Click "Save".

image.png

image.png

Once saved, if any dialog boxes appear, choose "Allow", "Agree", or the equivalent.

The process is simple and consists of three steps:

1. Pull the Source Code and Install the Environment

Find the first code block (the gray area with the play button), and click the play button to execute the code. This code will automatically download and install pyvideotrans and other necessary software.

Wait for the code to finish executing. You will see the play button change to a checkmark. Red error messages may appear during this process; you can ignore them.

image.png

2. Check if GPU/TPU is Available

Run the second code block to confirm whether the GPU/TPU is connected successfully. If the output shows CUDA support, it means the connection is successful. If not, please go back and check again whether you are connected to the GPU/TPU.

image.png

image.png
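For reference, the check performed by the second code block is typically equivalent to the following sketch (it assumes PyTorch, which Colab preinstalls; the helper name cuda_status is illustrative, not part of pyVideoTrans):

```python
# Illustrative GPU availability check (assumes PyTorch is installed,
# as it is on Colab; cuda_status is a hypothetical helper name).
def cuda_status():
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if torch.cuda.is_available():
        # Report the attached GPU, e.g. a Tesla T4 on a free Colab runtime
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available; go back and check the runtime type"

print(cuda_status())
```

If the printed line says CUDA is not available, change the runtime type as described above and reconnect.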

3. Upload Audio/Video and Perform Transcription

  • Upload Files: Click the file icon on the left side of the Colab interface to open the file browser.

image.png

Drag and drop your audio/video file from your computer to the blank area of the file browser to upload it.

image.png
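As an alternative to drag-and-drop, Colab can also open an upload dialog from a code cell via its own google.colab.files helper. A hedged sketch (the wrapper function is illustrative; outside Colab it simply does nothing):

```python
def upload_via_widget():
    """Open Colab's file-upload dialog; uploaded files land in /content.

    Returns a dict mapping file names to their bytes, or None when not
    running inside Colab (google.colab is only available there).
    """
    try:
        from google.colab import files  # Colab-only helper module
    except ImportError:
        return None
    return files.upload()
```

Running upload_via_widget() in a cell shows a "Choose Files" button; the uploaded file then appears in the file browser just like a drag-and-drop upload.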

  • Copy File Path: After uploading, right-click on the file name and select "Copy path" to obtain the complete path of the file (e.g., /content/your_file_name.mp4).

image.png

  • Execute Command

image.png

For example, the following command:

!python3 cli.py --model large-v2 --language zh --file "/content/1.mp4"

!python3 cli.py is the fixed beginning of every command; the exclamation mark is essential, as it tells Colab to run the line as a shell command.

cli.py can be followed by control parameters, such as which model to use, the language of the audio/video, whether to use the GPU or CPU, and where to find the file to transcribe. Only the file path is required; the others can be omitted, in which case default values are used.

If your video name is 1.mp4, upload it, copy the path, and paste the path, ensuring it's enclosed in double quotes to prevent errors caused by spaces in the name.

!python3 cli.py --file "Paste the copied path here"

After pasting in the path, it becomes:

!python3 cli.py --file "/content/1.mp4"

Then click the execution button and wait for it to finish. The required model will be automatically loaded, and the download speed is fast.

image.png

The default model is large-v2. If you want to change it to the large-v3 model, execute the following command:

!python3 cli.py --model large-v3 --file "Paste the copied path"

If you also want to set the language to Chinese:

!python3 cli.py --model large-v3 --language zh --file "Paste the copied path"
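If you ever want to assemble such a command from a code cell instead of typing it, Python's standard shlex.quote handles paths with spaces automatically. A small sketch (build_command is a hypothetical helper, not part of pyVideoTrans; the parameter names mirror the flags shown above):

```python
import shlex

def build_command(file_path, model="large-v2", language=None):
    """Assemble a cli.py invocation; shlex.quote guards paths with spaces."""
    parts = ["python3", "cli.py", "--model", model]
    if language:
        parts += ["--language", language]
    parts += ["--file", file_path]
    return " ".join(shlex.quote(p) for p in parts)

print(build_command("/content/1.mp4", language="zh"))
```

In a Colab cell, prefix the printed string with "!" to actually run it.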

Where to Find the Transcription Results

After execution starts, you will find an output folder in the left-side file list. All transcription results are located here, named after the original audio/video file.

image.png

Click on the output name to view all the files inside. Right-click on a file and click "Download" to download it to your local computer.

image.png
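If you prefer to collect the results from a code cell, for example to bundle everything into a single download before the session expires, the following sketch may help (the folder name output matches the tool's default shown above; both helper names are illustrative):

```python
import shutil
from pathlib import Path

def list_results(output_dir="output"):
    """Return all generated .srt subtitle files under the output folder."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob("*.srt"))

def archive_output(output_dir="output", archive_base="results"):
    """Zip the whole output folder so it can be downloaded in one click."""
    if not Path(output_dir).is_dir():
        raise FileNotFoundError("No such folder: " + output_dir)
    return shutil.make_archive(archive_base, "zip", output_dir)
```

Running archive_output() in a cell produces results.zip next to the notebook's files, which you can then download from the file browser in one click.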

Notes

  1. Internet Access: you must be able to reach Google services for the entire session.
  2. The uploaded files and generated SRT files are only temporarily stored in Colab. When the connection is disconnected or the free Colab time limit is reached, the files will be automatically deleted, including all downloaded source code and installed dependencies. Therefore, please download the generated results in time.
  3. When you open Colab again or reconnect after a disconnection, you need to start again from step one.

image.png

  4. If you close the browser, how do you find the notebook next time?

Open this address: https://colab.research.google.com/

Click on the name you used last time.

image.png

  5. As shown in the image above, the name is hard to remember. How do you change it?

image.png