
In audio-to-text tasks, background noise or accompaniment can also reduce recognition accuracy. To get more accurate results, it helps to remove the background accompaniment from the audio beforehand. Two tools for this are described below.

The first is vocal-separate, a local, offline tool based on Spleeter that separates vocals from background sound. A pre-packaged version is available for Windows: just unzip it and double-click to run. On Mac/Linux, it must be deployed from source. It has a Chinese interface, is very easy to use, can process video files directly, and is relatively fast.

The second is Ultimate Vocal Remover, the desktop GUI version of UVR5. On Windows, install it on the C: drive, because installing it elsewhere may cause problems. It has an English interface and more options, so it is somewhat more complex to operate. In exchange, it is more capable and produces better results.


vocal-separate Installation and Usage

1. On Windows, first download the pre-packaged version. On other systems, pull the source code and deploy it. Both are available here: https://github.com/jianchang512/vocal-separate/releases

2. After downloading, unzip the archive and double-click start.exe. Then wait for the browser page to open automatically. If an error similar to the one in the figure below appears, don't worry. It only means that GPU acceleration is unavailable, and it does not affect usage.

After successful startup, the following browser page will open.

3. As shown in the figure above, drag and drop or click to upload the audio or video file you want to separate. Uploaded videos are automatically converted to audio before processing.
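vocal-separate performs the video-to-audio conversion itself. You may prefer to extract the audio track yourself before uploading, for example to upload a smaller file. The sketch below builds an ffmpeg command for that step. It assumes ffmpeg is installed and on PATH. The helper names and the 44.1 kHz, 16-bit stereo WAV settings are illustrative choices, not necessarily what vocal-separate uses internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_audio_cmd(video: str, out_wav: str, sample_rate: int = 44100) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes a WAV file.

    Assumption: 16-bit PCM stereo at 44.1 kHz is a reasonable input for
    Spleeter-style models; vocal-separate's own conversion settings may differ.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video,              # input video file
        "-vn",                    # ignore the video stream
        "-ac", "2",               # two audio channels
        "-ar", str(sample_rate),  # resample to the given rate
        "-c:a", "pcm_s16le",      # uncompressed 16-bit PCM (standard WAV)
        out_wav,
    ]


def extract_audio(video: str, out_wav: str) -> Path:
    """Run the command built above; requires ffmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_audio_cmd(video, out_wav), check=True)
    return Path(out_wav)
```

For example, `extract_audio("talk.mp4", "talk.wav")` writes talk.wav, which can then be uploaded in place of the video.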

For the model, select "2stems". It splits the uploaded file into two files: one with the vocals and one with all other sounds.

You can also choose the 4stems or 5stems model. Besides separating the vocals, these models subdivide the remaining sound into further files such as "drums" and "bass". In most cases, 2stems is all you need.
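For reference, the stem names below come from Spleeter's pretrained 2stems, 4stems, and 5stems configurations. The sketch assumes vocal-separate names each output file after its stem, as it does for 2stems (vocals.wav and accompaniment.wav). `expected_files` is a hypothetical helper and is not part of either tool.

```python
# Stems produced by Spleeter's pretrained configurations.
# Assumption: vocal-separate writes one <stem>.wav per stem, following
# Spleeter's naming, as it does for 2stems.
SPLEETER_STEMS = {
    "2stems": ["vocals", "accompaniment"],
    "4stems": ["vocals", "drums", "bass", "other"],
    "5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def expected_files(model: str) -> list[str]:
    """Return the WAV file names expected for a given model choice."""
    try:
        stems = SPLEETER_STEMS[model]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(SPLEETER_STEMS)}"
        ) from None
    return [f"{stem}.wav" for stem in stems]
```

This makes the trade-off concrete. Only 2stems keeps all non-vocal sound in a single file, which is what speech recognition needs.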

You can listen to the results on the web page, click to download them, or open the separation result directory shown on the page. The vocal file is named vocals.wav, and the file with the other sounds is named accompaniment.wav.
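You may want to script the next step, for example passing the vocals to a speech recognition tool. In that case, a small helper can locate the two 2stems outputs and fail early if one is missing. This is a hypothetical sketch: `find_2stems_outputs` is not part of vocal-separate, and `result_dir` is whatever separation result directory the page displays.

```python
from pathlib import Path


def find_2stems_outputs(result_dir: str) -> dict[str, Path]:
    """Return paths to vocals.wav and accompaniment.wav in result_dir.

    Raises FileNotFoundError if either file is missing.
    """
    base = Path(result_dir)
    outputs: dict[str, Path] = {}
    for stem in ("vocals", "accompaniment"):
        path = base / f"{stem}.wav"
        if not path.is_file():
            raise FileNotFoundError(f"expected {path} after separation")
        outputs[stem] = path
    return outputs
```

`find_2stems_outputs(result_dir)["vocals"]` then gives the path of the file to use for transcription.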

It's that simple.


Ultimate Vocal Remover Installation and Usage

1. First, download from here: https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6

Windows users can also download the installer directly from the link below. After downloading, double-click the .exe file and click "Next" at each step to complete the installation. https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe

2. After installation, double-click the desktop icon to launch.

3. As shown in the figure below, select the audio file to process and set the output directory. Then choose the processing model, bit rate, and any other options. Only "Select Input" and "Select Output" are required. Everything else is optional and can be left at its default setting.

"Select Input": Click to select the audio file to process.

"Select Output": Click to select where to save the processed files.

"CHOOSE PROCESS METHOD": Select the processing method. The default, MDX-Net, should give the best results, so keep the default setting.

"CHOOSE MDX-NET MODEL": The model to use with the method selected above. If you choose a method other than MDX-Net, you need to download its model separately.

"Start Processing": After making your selections, click this button to start the separation. Then wait for the prompt indicating that processing is complete.