Xiaohongshu has open-sourced FireRedASR, an automatic speech recognition (ASR) project that excels at Chinese. At first only the smaller AED model was open-sourced; the larger LLM model, which further improves recognition accuracy, has since been released as well.
This ASR model has been integrated into a bundle and can be conveniently used within the video translation software (pyVideoTrans).
Bundle Download and Model Description
Model Size:
- AED Model (model.pth.tar): 4.35GB
- LLM Model, consisting of two parts:
  - Xiaohongshu recognition model (model.pth.tar): 3.37GB
  - Qwen2-7B model (4 files): 17GB in total
Together the models come to roughly 25GB, and even compressed as a 7z archive they still exceed 10GB. These sizes rule out hosting on GitHub or typical file-sharing services, so the bundle includes only the core program and does not contain any model files.
After downloading the bundle, please follow the steps below to download the model files separately and place them in the specified location.
Note: The model files are hosted on huggingface.co, which may not be directly accessible in some regions. You may need a VPN to download them.
Bundle Core Download
The bundle core is relatively small, at 1.7GB. You can directly open the following address in your browser to download it:
https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
After downloading and extracting the archive, you should see a file structure similar to the following:
Download AED Model
Downloading the AED model is relatively simple, requiring only one model file to be downloaded.
Download the `model.pth.tar` file from:
https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true
Place the downloaded `model.pth.tar` file in the `pretrained_models/FireRedASR-AED-L` folder within the bundle directory.
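If you prefer to script the download, the `huggingface_hub` Python package can fetch the file and place it in the right folder. A minimal sketch, assuming `pip install huggingface_hub` and that it is run from the bundle's root directory:

```python
# Sketch: fetch the AED model into the folder the bundle expects.
# Assumes huggingface_hub is installed and huggingface.co is reachable.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-AED-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-AED-L",
)
```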
After downloading, the file location should look like this:
Download LLM Model
Downloading the LLM model is slightly more involved: a total of 5 files are needed (1 Xiaohongshu model file + 4 Qwen2 weight files).
1. Download the Xiaohongshu Model (model.pth.tar):
Download address: https://huggingface.co/FireRedTeam/FireRedASR-LLM-L/resolve/main/model.pth.tar?download=true
Place the downloaded `model.pth.tar` file in the `pretrained_models/FireRedASR-LLM-L` folder within the bundle. Make sure the folder name includes `LLM`; do not place the file in the wrong location.
The file location should look like this:
2. Download the Qwen2 Model (4 files):
Download the following 4 files and place them in the `pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct` folder within the bundle (or use the script shown after the list):
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00001-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00002-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00003-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00004-of-00004.safetensors?download=true
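All five LLM-related files can also be fetched with one script via the `huggingface_hub` package. A minimal sketch, assuming `pip install huggingface_hub` and that it is run from the bundle's root directory:

```python
# Sketch: fetch the Xiaohongshu LLM checkpoint and the 4 Qwen2 weight shards.
from huggingface_hub import hf_hub_download

# Xiaohongshu recognition model -> pretrained_models/FireRedASR-LLM-L
hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-LLM-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-LLM-L",
)

# Qwen2 weight shards -> pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct
for i in range(1, 5):
    hf_hub_download(
        repo_id="Qwen/Qwen2-7B-Instruct",
        filename=f"model-{i:05d}-of-00004.safetensors",
        local_dir="pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct",
    )
```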
After downloading, the `Qwen2-7B-Instruct` folder should contain 4 files, as shown in the following image:
Start the Bundle
Once all model files have been downloaded and placed correctly, double-click the `启动.bat` file in the bundle directory to start the program.
After the program starts, it will automatically open http://127.0.0.1:5078 in your browser. If you see the following interface, the program has started successfully and you can begin using it.
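If the browser does not open on its own, you can confirm the service is listening from Python. A quick check using the `requests` package (an extra dependency, not part of the bundle):

```python
# Sketch: verify the bundle's web service is reachable on its default port.
import requests

resp = requests.get("http://127.0.0.1:5078", timeout=5)
print(resp.status_code)  # 200 means the interface is being served
```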
Using in Video Translation Software
If you want to use the FireRedASR model in the video translation software pyVideoTrans, follow these steps:
1. Make sure you have downloaded and placed the model files as described above, and have successfully started the bundle.
2. Open the pyVideoTrans software.
3. In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition and Compatible AI.
4. In the settings interface, fill in the relevant information as shown in the following image.
5. Click Save.
6. In the speech recognition channel selection, choose OpenAI Speech Recognition.
API Address:
Default address: http://127.0.0.1:5078/v1
Using with OpenAI SDK
```python
from openai import OpenAI

# Point the SDK at the local bundle's OpenAI-compatible endpoint.
client = OpenAI(api_key='123456', base_url='http://127.0.0.1:5078/v1')

# Open the audio file in binary mode and request a transcription.
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
        timeout=86400,  # allow up to 24 hours for very long recordings
    )

print(transcript.text)
```
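The `timeout=86400` argument gives long recordings up to 24 hours to finish; adjust it to suit your files. The `api_key` and `model` values appear to act only as placeholders required by the SDK, since the local bundle serves FireRedASR regardless of the model name passed.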