F5-TTS
Demo
Official code for "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Installation
pip install -r requirements.txt
Dataset
# prepare a custom dataset according to your needs
# download the corresponding dataset first, and fill in its path in the scripts
python scripts/prepare_emilia.py
python scripts/prepare_wenetspeech4tts.py
Training
# set up the accelerate config, e.g. multi-GPU DDP, fp16
# the config will be saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
accelerate launch test_train.py
Inference
Pretrained model ckpts: https://huggingface.co/SWivid/F5-TTS
# single test inference
python test_infer_single.py
Evaluation
download the Seed-TTS test set: https://github.com/BytedanceSpeech/seed-tts-eval
download LibriSpeech test-clean: http://www.openslr.org/12/
unzip and place under data/, and fill in the path of test-clean in test_infer_batch.py
our LibriSpeech-PC 4-10s subset is already under data/ in this repo
zh ASR model ckpt: https://huggingface.co/funasr/paraformer-zh
en ASR model ckpt: https://huggingface.co/Systran/faster-whisper-large-v3
WavLM model ckpt: https://drive.google.com/file/d/1-aE1NfzpRCLxA4GUxX9ITI3F9LlbtEGP/view
fill in the path of ckpts in test_infer_batch.py
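Before launching batch inference, it can help to confirm that every path filled in above actually exists. The snippet below is a minimal sketch and not part of this repo: `CKPT_PATHS` and `find_missing` are hypothetical names, and the paths are placeholders to replace with whatever you set in test_infer_batch.py.

```python
from pathlib import Path

# Hypothetical placeholder paths (assumptions, not the repo's defaults).
# Replace them with the values you filled in inside test_infer_batch.py.
CKPT_PATHS = {
    "zh_asr": "ckpts/paraformer-zh",
    "en_asr": "ckpts/faster-whisper-large-v3",
    "wavlm": "ckpts/wavlm_large_finetune.pth",
    "test_clean": "data/LibriSpeech/test-clean",
}


def find_missing(paths):
    """Return the sorted names whose path does not exist on disk."""
    return sorted(name for name, p in paths.items() if not Path(p).exists())


if __name__ == "__main__":
    missing = find_missing(CKPT_PATHS)
    if missing:
        print("missing paths for:", ", ".join(missing))
    else:
        print("all paths found")
```

Running this once before `bash test_infer_batch.sh` surfaces a mistyped path immediately instead of partway through a multi-GPU run.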
# batch inference for evaluations
accelerate config # if not set before
bash test_infer_batch.sh
Installation notes for faster-whisper with CUDA 11:
pip install --force-reinstall ctranslate2==3.24.0
(recommended) pip install faster-whisper==0.10.1
Otherwise ASR may fail (abnormally repetitive output).
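To check whether the current environment matches the versions named above, a small script along these lines could be used. It is a sketch under assumptions: `PINNED` and `mismatches` are hypothetical names, and it relies only on the standard-library `importlib.metadata` (Python 3.8+), not on anything shipped with this repo.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the note above; only relevant when running on CUDA 11.
PINNED = {"ctranslate2": "3.24.0", "faster-whisper": "0.10.1"}


def mismatches(pinned, get_version=version):
    """Map each package whose installed version differs from the pin
    to the version actually found (None if the package is not installed)."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in mismatches(PINNED).items():
        print(f"{name}: found {found}, expected {PINNED[name]}")
```

An empty result means both packages match the pinned versions. Anything printed points at the package to reinstall before running the evaluation scripts.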
# evaluation for Seed-TTS test set
python scripts/eval_seedtts_testset.py
# evaluation for LibriSpeech-PC test-clean (cross-sentence)
python scripts/eval_librispeech_test_clean.py
Appreciation
- E2-TTS brilliant work, simple and effective
- Emilia, WenetSpeech4TTS valuable datasets
- lucidrains for the initial CFM structure, and bfs18 for discussion
- SD3 & Huggingface diffusers DiT and MMDiT code structure
- FunASR, faster-whisper & UniSpeech for evaluation tools
- torchdiffeq as ODE solver, Vocos as vocoder