The method for integrating F5-TTS with pyVideoTrans described on this page applies only to pyVideoTrans v3.66 and later. Please ensure you use the webui.py file from the corresponding official open-source project.
Starting from version v3.68, the same interface can also be used for F5-TTS, Spark-TTS, index-TTS, Dia-TTS, and VoxCPM. You only need to enter the correct URL (usually http://127.0.0.1:7860 on your local machine) and select the corresponding service from the dropdown list.
F5-TTS Windows Integrated Package:
- Download Link (Baidu Netdisk): https://pan.baidu.com/s/1A6jBECIQ41OZaa8yTDCgjA?pwd=1234
- huggingface.co: https://huggingface.co/mortimerme/repocollect/resolve/main/f5-tts0528.7z?download=true
For source code deployment, please refer to the official project documentation at https://github.com/SWivid/F5-TTS
Configuration
To use TTS in the video translation software, you first need to launch the corresponding TTS webui interface and keep the terminal window open.
Then, fill in the URL address on the configuration page. The default is http://127.0.0.1:7860. If your launch address is not the default, please enter the actual address.
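Before entering the address, it can help to confirm that something actually answers at it. The following is a minimal sketch using only the Python standard library. The function name `webui_is_reachable` is hypothetical and not part of pyVideoTrans, and the check only confirms that an HTTP server responds, not that it is the expected TTS service. System proxy settings are bypassed on purpose, since the WebUI normally runs on the local machine.

```python
import urllib.error
import urllib.request


def webui_is_reachable(url: str = "http://127.0.0.1:7860", timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    # An empty ProxyHandler makes the request ignore any configured proxy,
    # so a local address is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, only with an error status code.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a malformed URL.
        return False
```

Calling `webui_is_reachable()` with no arguments checks the default address; pass the actual launch address if it differs.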
In the "Reference Audio" field, fill in the following content:
Audio file name you want to use#Corresponding text in that audio file
Note: Please place the reference audio file in the f5-tts folder located in the root directory of the pyVideoTrans project. If the folder doesn't exist, create it manually. For example, you can name your reference audio file nverguo.wav.

Example entry:
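As a rough sketch of this format, the entry can be assembled and checked in Python. The helper names and the sample transcript are hypothetical; only the `file name#text` layout, the `f5-tts` folder, and the `nverguo.wav` name come from the description above. How pyVideoTrans itself parses the field is not shown here.

```python
from pathlib import Path


def build_reference_entry(audio_name: str, transcript: str) -> str:
    """Join the file name and its spoken text with '#', as described above."""
    audio_name = audio_name.strip()
    transcript = transcript.strip()
    if not audio_name or not transcript:
        raise ValueError("both the audio file name and its text are required")
    # Assumption: a '#' inside the file name would make the entry ambiguous.
    if "#" in audio_name:
        raise ValueError("the file name must not contain '#'")
    return f"{audio_name}#{transcript}"


def reference_audio_exists(project_root: str, audio_name: str) -> bool:
    """Check that the file sits in the f5-tts folder under the project root."""
    return (Path(project_root) / "f5-tts" / audio_name).is_file()
```

For instance, `build_reference_entry("nverguo.wav", "the text spoken in that file")` yields the string to paste into the "Reference Audio" field, with the placeholder replaced by the real transcript.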

- Spark-TTS Source Code Deployment Guide
- index-TTS Source Code Deployment Guide
- Dia-1.6b Source Code Deployment Guide
- VoxCPM Integrated Package
Adding Other Languages
If you need to use models for other languages, you also need to modify the src/f5_tts/infer/infer_gradio.py file in the F5-TTS Project Directory.
Find the code around line 59:
DEFAULT_TTS_MODEL_CFG = [
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]
Code location diagram:

By default, the official Chinese and English models are configured here. If you need to use models for other languages, modify it according to the instructions below. After modification, you need to restart F5-TTS and ensure a stable internet connection is configured (for accessing huggingface.co) so the program can download the new language model online. After a successful download, first test by cloning a voice in the WebUI, then use it through pyVideoTrans.
Important Note: Before use, ensure the dubbing text language in pyVideoTrans matches the model language selected in F5-TTS.
Here are the configurations for various language models:
French:
DEFAULT_TTS_MODEL_CFG = [
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Hindi:
DEFAULT_TTS_MODEL_CFG = [
    "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
    "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
    json.dumps({"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Italian:
DEFAULT_TTS_MODEL_CFG = [
    "hf://alien79/F5-TTS-italian/model_159600.safetensors",
    "hf://alien79/F5-TTS-italian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Japanese:
DEFAULT_TTS_MODEL_CFG = [
    "hf://Jmica/F5TTS/JA_25498980/model_25498980.pt",
    "hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Russian:
DEFAULT_TTS_MODEL_CFG = [
    "hf://hotstone228/F5-TTS-Russian/model_last.safetensors",
    "hf://hotstone228/F5-TTS-Russian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Spanish:
DEFAULT_TTS_MODEL_CFG = [
    "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
    "hf://jpgallegoar/F5-Spanish/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}),
]

Finnish:
DEFAULT_TTS_MODEL_CFG = [
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors",
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Other languages can be configured in the same way. For the current list of shared models, check the official project: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md
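Because a typo in this list only surfaces when F5-TTS restarts and tries to download the model, a quick sanity check before pasting can save a restart cycle. The sketch below is an illustration, not part of F5-TTS: `check_model_cfg` and `REQUIRED_KEYS` are hypothetical names, and the set of required keys is inferred from the configurations listed on this page, not from an official schema.

```python
import json

# Keys that appear in every configuration listed on this page (an inference,
# not an official F5-TTS schema).
REQUIRED_KEYS = {"dim", "depth", "heads", "ff_mult", "text_dim", "conv_layers"}


def check_model_cfg(cfg: list) -> list:
    """Return a list of problems found in a DEFAULT_TTS_MODEL_CFG-style list."""
    if len(cfg) != 3:
        return [f"expected 3 entries (checkpoint, vocab, JSON config), got {len(cfg)}"]
    ckpt, vocab, arch_json = cfg
    problems = []
    for label, path in (("checkpoint", ckpt), ("vocab", vocab)):
        if not str(path).startswith("hf://"):
            problems.append(f"{label} path does not start with hf://: {path}")
    if not str(ckpt).endswith((".safetensors", ".pt")):
        problems.append(f"unexpected checkpoint extension: {ckpt}")
    try:
        arch = json.loads(arch_json)
    except (TypeError, json.JSONDecodeError) as exc:
        return problems + [f"third entry is not valid JSON: {exc}"]
    if not isinstance(arch, dict):
        return problems + ["third entry must decode to a JSON object"]
    missing = sorted(REQUIRED_KEYS - set(arch))
    if missing:
        problems.append(f"missing keys: {', '.join(missing)}")
    return problems
```

An empty result means the list has the same shape as the examples above; it does not confirm that the model files exist on huggingface.co.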
Common Errors and Cautions
During API usage, you can close the WebUI interface in your browser, but do not close the terminal window that launched F5-TTS.

Can I dynamically switch models in F5-TTS? No. You need to manually modify the code as described above and then restart the WebUI.
A frequently encountered error:
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/vocab.txt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002174796DF60>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0458b571-90ab-4edd-ae59-b93bd603cdd0)')
This is a network access problem: the program cannot reach huggingface.co to download the model. Please ensure that a stable and fast proxy or VPN connection is configured so that huggingface.co is accessible. Refer to the configuration instructions above.
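One common way to route such downloads through a local proxy is the standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables, which the `requests` library shown in the traceback honors by default. The sketch below builds an environment that could be passed to a child process that launches the WebUI. The helper name `env_with_proxy` is hypothetical, and the address `http://127.0.0.1:7890` in the usage note is only a placeholder for whatever address your own proxy listens on.

```python
import os


def env_with_proxy(proxy_url: str, base_env: dict = None) -> dict:
    """Return a copy of the environment with proxy variables set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        # Drop lowercase variants so an older value cannot take precedence.
        env.pop(name.lower(), None)
        env[name] = proxy_url
    # Keep traffic to the local WebUI off the proxy.
    existing = env.pop("no_proxy", "") or env.get("NO_PROXY", "")
    hosts = [h for h in existing.split(",") if h]
    for local in ("127.0.0.1", "localhost"):
        if local not in hosts:
            hosts.append(local)
    env["NO_PROXY"] = ",".join(hosts)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` or `subprocess.run` when starting F5-TTS from a script, for example `env_with_proxy("http://127.0.0.1:7890")`.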
