
Custom Speech Recognition API

If the built-in speech recognition methods do not meet your needs, you can connect your own speech recognition API. Fill in the relevant information under Menu - Speech Recognition Settings - Custom Speech Recognition API.


Fill in your API address, which must start with "http". The audio is sent to that address in a POST request as a file field named "audio": WAV format, 16 kHz sample rate, single channel (mono). If your API requires key verification, enter the key in the key box. It is appended to the API address as a query parameter before sending, in the form api_url?sk=password.

requests.post(api_url, files={"audio": open(audio_file, 'rb')})
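On the receiving side, an API implementation will probably want to check the key and the audio format before running recognition. The following is a minimal sketch using only the Python standard library. The helper names (`extract_sk`, `key_is_valid`, `describe_wav`, `wav_matches_expected`) are hypothetical and not part of the application; how you obtain the request path and the uploaded bytes depends on your web framework.

```python
import hmac
import io
import wave
from urllib.parse import parse_qs, urlsplit

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as described above
EXPECTED_CHANNELS = 1         # single channel (mono)


def extract_sk(request_target: str):
    """Return the sk query parameter from a request path or URL, or None if absent."""
    values = parse_qs(urlsplit(request_target).query).get("sk")
    return values[0] if values else None


def key_is_valid(request_target: str, expected_key: str) -> bool:
    """Compare the received sk value with the configured key in constant time."""
    received = extract_sk(request_target)
    if received is None:
        return False
    return hmac.compare_digest(received.encode("utf-8"), expected_key.encode("utf-8"))


def describe_wav(audio_bytes: bytes):
    """Return (sample_rate, channels) read from the WAV header."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def wav_matches_expected(audio_bytes: bytes) -> bool:
    """True if the upload is 16 kHz mono, the format the sender is documented to use."""
    return describe_wav(audio_bytes) == (EXPECTED_SAMPLE_RATE, EXPECTED_CHANNELS)
```

For example, `key_is_valid("/asr?sk=secret", "secret")` is true for a request whose path carries the matching key; the `/asr` path is only an illustration.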

Your API must return JSON. On failure, set code to 1 and msg to the reason for the failure. On success, set code to 0 and data to a list of subtitle entries, each with a text and a time range in SRT format.


Failure response:

res={
    "code":1,
    "msg":"Error reason"
}

Success response:

res={
    "code":0,
    "data":[
        {
            "text":"Subtitle text",
            "time":"00:00:01,000 --> 00:00:06,500"
        },
        {
            "text":"Subtitle text",
            "time":"00:00:06,900 --> 00:00:12,200"
        },
        ...multiple
    ]
}
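If your recognizer produces segments as start and end times in seconds, the responses above could be assembled roughly as follows. This is a sketch under that assumption: the function names (`srt_timestamp`, `success_response`, `failure_response`) and the `(start, end, text)` segment shape are hypothetical, chosen only for illustration.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def success_response(segments):
    """Build the success payload from (start_seconds, end_seconds, text) tuples."""
    return {
        "code": 0,
        "data": [
            {"text": text, "time": f"{srt_timestamp(start)} --> {srt_timestamp(end)}"}
            for start, end, text in segments
        ],
    }


def failure_response(reason: str):
    """Build the failure payload with the reason in msg."""
    return {"code": 1, "msg": reason}


# Serialize to JSON before returning it from your handler.
body = json.dumps(
    success_response([(1.0, 6.5, "Subtitle text"), (6.9, 12.2, "Subtitle text")]),
    ensure_ascii=False,
)
```

Serializing with `json.dumps` also ensures the time values are emitted with double quotes, which JSON requires.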
