Speech-to-Text Tool
Speech-to-Text Tool Open Source Address
This is an offline speech-to-text tool that runs locally and is based on the open-source openai-whisper model. It recognizes human speech in video or audio files and converts it to text, which it can output as JSON, as SRT subtitles with timestamps, or as plain text. Once self-deployed, it can replace the OpenAI speech recognition API or Baidu speech recognition, with accuracy roughly equivalent to the official OpenAI API.
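To make the SRT output format concrete, here is a minimal sketch that turns timed segments into SRT text. It assumes segments shaped like the `start`/`end`/`text` entries that openai-whisper returns from transcription. The function names `srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from this tool's source code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of dicts with start, end and text keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written demo segments, not real recognition output.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.04, "text": " This is a test."},
    ]
    print(segments_to_srt(demo))
```

Each entry consists of a running index, a time range, and the text, which is the layout subtitle players expect from an .srt file.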
After deployment or download, double-click start.exe to automatically open the local webpage in your default browser.
Drag or click to select the audio or video file to be recognized, then select the spoken language, the text output format, and the model to use (the base model is built in). Click "Start Recognition"; when it finishes, the results appear on the same page in the selected format.
The entire process requires no internet connection and runs completely locally. It can be deployed on an intranet.
The open-source openai-whisper model comes in base, small, medium, large, and large-v3 sizes. The base model is built in. Recognition quality improves from base to large-v3, but so do the required computing resources. You can download other models as needed and place them in the models directory.
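As a rough illustration of checking which models are installed, the sketch below lists the .pt files in a models directory. It assumes each model is stored as a single file named after the model, for example base.pt. The helper name `available_models` is hypothetical and not part of the tool.

```python
from pathlib import Path


def available_models(models_dir="models"):
    """Return the names (without .pt) of the model files found in models_dir."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.pt"))


if __name__ == "__main__":
    # Run from the project root so that "models" resolves correctly.
    print(available_models())
```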
Pre-compiled Win Version Usage / Linux and Mac Source Code Deployment
Open the Releases page and download the pre-compiled files.
After downloading, extract the archive to a location such as E:/stt.
Double-click start.exe and wait for the browser window to open automatically.
Click the upload area on the page and choose the audio or video file you want to recognize in the file dialog, or drag the file directly onto the upload area. Then select the spoken language, the text output format, and the model to use. Click "Start Recognition." After a short while, the recognition results are displayed in the selected format in the text box at the bottom of the page.
If your machine has an NVIDIA GPU and CUDA is configured correctly, CUDA acceleration will be used automatically.
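To check in advance whether CUDA acceleration is likely to be picked up, a small sketch such as the following can help. It relies on PyTorch, which openai-whisper depends on, and its `torch.cuda.is_available()` call. The function name `pick_device` is hypothetical, and this is not necessarily how the tool itself performs the detection.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a dependency of openai-whisper
    except ImportError:
        # Without PyTorch there is no GPU path to speak of.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Recognition would run on: {pick_device()}")
```

If this reports cpu on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver and CUDA version mismatch.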
Source Code Deployment (Linux/Mac/Windows)
Requires Python 3.9 to 3.11.
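A quick way to confirm the interpreter is in range before installing anything is sketched below. It reads the requirement as 3.9 through 3.11 inclusive, which is an assumption. The helper name `python_version_supported` is hypothetical.

```python
import sys


def python_version_supported(version=None) -> bool:
    """Check whether a (major, minor, ...) version lies between 3.9 and 3.11."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return (3, 9) <= major_minor <= (3, 11)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if python_version_supported() else "outside the supported range"
    print(f"Python {current} is {verdict}")
```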
Create an empty directory, such as E:/stt. Open a cmd window in this directory by typing cmd in the address bar and pressing Enter.
Use git to pull the source code into the current directory:
git clone git@github.com:jianchang512/stt.git .
Create a virtual environment:
python -m venv venv
Activate the environment. On Windows, use the command
%cd%/venv/scripts/activate
On Linux and Mac, use the command
source ./venv/bin/activate
Install dependencies:
pip install -r requirements.txt
If you encounter version conflict errors, run
pip install -r requirements.txt --no-deps
On Windows, extract ffmpeg.7z and place ffmpeg.exe and ffprobe.exe in the project directory. On Linux and Mac, download the corresponding version of ffmpeg from the ffmpeg website, extract the ffmpeg and ffprobe binaries, and place them in the project root directory.
Download the model archive for whichever models you need. After downloading, place the xx.pt file in the models folder in the project root directory.
Execute
python start.py
and wait for the local browser window to open automatically.
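Before running start.py, it can be useful to confirm that the files from the last steps are where they are expected. The sketch below checks the project root for the ffmpeg and ffprobe binaries and for one model file. The helper name `missing_files` is hypothetical, and the assumption that a model is stored as models/<name>.pt (for example models/base.pt) is inferred from the instructions above, not taken from the tool's code.

```python
import sys
from pathlib import Path


def missing_files(project_root=".", model_name="base", windows=None):
    """Return the expected files that are absent from project_root."""
    if windows is None:
        windows = sys.platform.startswith("win")
    suffix = ".exe" if windows else ""
    expected = [
        f"ffmpeg{suffix}",
        f"ffprobe{suffix}",
        f"models/{model_name}.pt",  # assumed naming scheme
    ]
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing before start:", ", ".join(missing))
    else:
        print("All expected files found; run: python start.py")
```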