Parakeet-API: High-Performance Local Speech Transcription Service

The parakeet-api project is a local speech transcription service built on the NVIDIA Parakeet-tdt-0.6b model. It provides an OpenAI-compatible API and a simple web interface for converting audio or video files into accurate SRT subtitles, and it works with pyVideoTrans v3.72 and above.

Project Open Source Address: https://github.com/jianchang512/parakeet-api

Windows Integration Package Download:

Integration Package Download Link 1: Download from Baidu Netdisk

Integration Package Download Link 2: Download from HuggingFace.co

How to use: After unzipping, double-click 启动.bat. When the startup interface appears and the browser opens automatically, the service has started successfully.

Usage in pyVideoTrans

Parakeet-API can be seamlessly integrated with the video translation tool pyVideoTrans (versions v3.72 and above).

  1. Ensure your parakeet-api service is running locally.
  2. Open the pyVideoTrans software.
  3. In the menu bar, select Speech Recognition(R) -> Nvidia parakeet-tdt.
  4. In the pop-up configuration window, set the "http address" to: http://127.0.0.1:5092/v1
  5. Click "Save" to start using it.
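Step 1 above can be checked from a script before opening pyVideoTrans. The following is a minimal sketch that only tests whether something is listening on the port used in this guide (5092). The helper name is_service_listening is hypothetical and not part of parakeet-api, and an open port does not prove that the model has finished loading.

```python
import socket


def is_service_listening(host: str = "127.0.0.1", port: int = 5092, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_service_listening():
        print("Something is listening on 127.0.0.1:5092; the service appears to be up.")
    else:
        print("Nothing is listening on 127.0.0.1:5092; start parakeet-api first.")
```

If the check fails, start the service first and run the check again before saving the address in pyVideoTrans.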


Source Code Deployment Guide

🛠️ Installation and Configuration Guide

This project supports Windows, macOS, and Linux. Please follow the steps below for installation and configuration.

Step 0: Configure Python 3.10 Environment

If you don't have Python 3 installed on your machine, please follow this tutorial: https://pvt9.com/_posts/pythoninstall

Step 1: Prepare FFmpeg

This project uses ffmpeg for audio and video format preprocessing.

  • Windows (Recommended):

    1. Download a build from the FFmpeg GitHub repository and unzip it to get ffmpeg.exe.
    2. Place ffmpeg.exe directly in the project root directory (the same folder as app.py). The program detects and uses it automatically, so no environment variables need to be configured.
  • macOS (Using Homebrew):

    bash
    brew install ffmpeg
  • Linux (Debian/Ubuntu):

    bash
    sudo apt update && sudo apt install ffmpeg
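To confirm that FFmpeg can be found before starting the service, a small check along the following lines may help. It is only a sketch of the lookup order described above (project root first, then the system PATH). The function find_ffmpeg is a hypothetical helper and is not taken from the project's source, whose actual detection logic may differ.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_root: str = ".") -> Optional[str]:
    """Look for ffmpeg in the project root first, then on the system PATH."""
    root = Path(project_root)
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = root / name
        if candidate.is_file():
            return str(candidate.resolve())
    # Fall back to whatever the shell would find (Homebrew, apt, etc.).
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    location = find_ffmpeg()
    print(f"ffmpeg found at: {location}" if location else "ffmpeg not found")
```

Run it from the project root directory. If it reports that ffmpeg was not found, revisit the step for your operating system.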

Step 2: Create Python Virtual Environment and Install Dependencies

  1. Download or clone the project code to your local computer. A folder path that contains only English letters or digits, on a non-system drive, is recommended.

  2. Open a terminal or command prompt and navigate to the project root directory (on Windows, you can type cmd in the folder address bar and press Enter).

  3. Create virtual environment: python -m venv venv

  4. Activate the virtual environment:

    • Windows (CMD/PowerShell): .\venv\Scripts\activate
    • macOS / Linux (Bash/Zsh): source venv/bin/activate
  5. Install dependency libraries:

    • If you do not have an NVIDIA graphics card (CPU only):

      bash
      pip install -r requirements.txt
    • If you have an NVIDIA graphics card (GPU acceleration):

      a. Ensure you have installed the latest NVIDIA driver and the corresponding CUDA Toolkit.
      b. Uninstall any potentially old PyTorch version: pip uninstall -y torch
      c. Install PyTorch matching your CUDA version (using CUDA 12.6 as an example):

      bash
      pip install torch --index-url https://download.pytorch.org/whl/cu126
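After installing the dependencies, it can be useful to confirm that the virtual environment is active and that PyTorch sees the GPU. The sketch below uses only the standard library plus torch.cuda.is_available() and torch.cuda.get_device_name(). The helper names in_virtualenv and describe_torch are hypothetical and are not part of this project.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv created by `python -m venv`."""
    return sys.prefix != sys.base_prefix


def describe_torch() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.cuda.is_available():
        return f"torch {torch.__version__} with CUDA device: {torch.cuda.get_device_name(0)}"
    return f"torch {torch.__version__} (CPU only)"


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Inside virtual environment:", in_virtualenv())
    print("PyTorch:", describe_torch())
```

If the last line reports "CPU only" on a machine with an NVIDIA card, the installed PyTorch build probably does not match the installed CUDA version.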

Step 3: Start the Service

In the terminal with the activated virtual environment, run the following command:

bash
python app.py

You will see the service start-up prompts. The first run will download the model (approximately 1.2GB). Please be patient.

The console may print a large number of log messages during startup. This is expected and can be ignored.

🚀 Usage Guide

Method 1: Using the Web Interface

  1. Open in your browser: http://127.0.0.1:5092
  2. Drag and drop or click to upload your audio/video file.
  3. Click "Start Transcription", wait for the process to complete, then view and download the SRT subtitles below.

Method 2: API Call (Python Example)

You can easily call this service using the openai library.

python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",
)

with open("your_audio.mp3", "rb") as audio_file:
    srt_result = client.audio.transcriptions.create(
        model="parakeet",
        file=audio_file,
        response_format="srt"
    )
print(srt_result)
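Once the SRT text has been returned, it is often convenient to work with individual subtitle entries. The following is a minimal sketch of an SRT parser, assuming the service returns plain text in the standard SRT layout (index line, timing line, then text). The names Cue and parse_srt are illustrative and are not provided by parakeet-api or the openai library.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a valid timing line are skipped."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().split("\n")
        if len(lines) < 2:
            continue
        match = _TIMING.search(lines[1])
        if not lines[0].strip().isdigit() or match is None:
            continue
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), "\n".join(lines[2:])))
    return cues


# Hand-written sample text for illustration only, not real service output.
sample = "1\n00:00:00,000 --> 00:00:02,500\nHello world\n\n2\n00:00:02,500 --> 00:00:04,000\nSecond line\n"
for cue in parse_srt(sample):
    print(cue.index, cue.start, cue.end, cue.text)
```

In practice, pass the transcription result to parse_srt in place of the sample string, for example to filter short cues or to write the text out without timings.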