Commit Graph

66 Commits

Author SHA1 Message Date
Zhikang Niu
2d26fba7bf Update requirements.txt add tomli 2024-10-15 16:00:25 +08:00
SWivid
49b465f5d8 add credit of gradio multiple speech-type gen to jpgallegoar 2024-10-15 12:32:04 +08:00
Yushen CHEN
524eaba3a8 Merge pull request #90 from SWivid/fakerybakery-patch-1
Fix unexpected indent issue
2024-10-15 12:20:58 +08:00
mrfakename
923e95cadb Fix unexpected indent issue 2024-10-14 21:07:39 -07:00
Yushen CHEN
3acf3e2a9b Update dataset.py, fix typo 2024-10-15 12:05:15 +08:00
Yushen CHEN
658cfa5f31 Merge pull request #88 from JarodMica/main
Update to make passing in custom paths easier for finetuning/training
2024-10-15 12:03:39 +08:00
Yushen CHEN
4cdcccf7a3 Update dataset.py 2024-10-15 12:03:16 +08:00
Yushen CHEN
03048d6142 Merge pull request #86 from chigkim/cli
Added -f option to read text from a file.
2024-10-15 11:58:14 +08:00
Yushen CHEN
a8b455f44c Merge pull request #80 from fakerybakery/sync-hf
Sync to Hugging Face Space
2024-10-15 11:51:03 +08:00
Jarod Mica
6fda7e5f6f Update to make passing in custom paths easier for finetuning/training 2024-10-14 20:13:07 -07:00
Chi Kim
ca8b596976 Added -f to read text from a file. 2024-10-14 22:57:00 -04:00
mrfakename
0297be2541 Add GPU decorator to add compatibility for ZeroGPU 2024-10-14 15:00:19 -07:00
mrfakename
f534c93e9b Sync to Hugging Face space 2024-10-14 14:38:58 -07:00
Yushen CHEN
3287ba5f18 Merge pull request #79 from fakerybakery/patch-1
Reorganize Gradio app
2024-10-15 04:34:33 +08:00
mrfakename
f6b1de2251 Reorganize Gradio app 2024-10-14 13:30:25 -07:00
SWivid
53c772584e Update README.md 2024-10-15 04:08:58 +08:00
SWivid
12ef9d23f3 address #76 #29 2024-10-15 03:45:14 +08:00
SWivid
2cd03c9134 Update README.md. #76 2024-10-15 03:27:25 +08:00
Yushen CHEN
2b37f5d4ff Merge pull request #77 from AWAS666/main
Speed slider in gradio app
2024-10-15 03:24:41 +08:00
AWAS666
664533a0b3 Merge branch 'main' of https://github.com/SWivid/F5-TTS 2024-10-14 21:11:51 +02:00
AWAS666
ff4e797aab feat: speed slider in gradio 2024-10-14 21:11:50 +02:00
SWivid
f2b892a61b address #75 2024-10-15 02:37:39 +08:00
SWivid
e54fee3b7f redirect to split hf ckpt repos 2024-10-15 02:12:20 +08:00
SWivid
372f6ab44e address #74 2024-10-15 01:53:35 +08:00
Yushen CHEN
40687a54a6 Merge pull request #73 from jpgallegoar/main
Added back parse_emotional_text
2024-10-15 01:39:51 +08:00
jpgallegoar
894acd3c43 Added back parse_emotional_text 2024-10-14 19:33:04 +02:00
SWivid
1cec6ddf34 minor fix 2024-10-15 01:28:11 +08:00
Yushen CHEN
b648e8b04a Merge pull request #71 from jpgallegoar/main
Added multiple speech types generation
2024-10-15 00:20:23 +08:00
Yushen CHEN
18736b7de3 Merge pull request #72 from chigkim/cli
Fixed not saving output file when remove_silence is false.
2024-10-15 00:16:10 +08:00
jpgallegoar
3d2e8fd2d1 Added multiple speech types generation 2024-10-14 18:04:51 +02:00
SWivid
ac672f363d Update README.md 2024-10-15 00:01:10 +08:00
Chi Kim
0f5fd5e13d Fixed not saving output file when remove_silence is false. 2024-10-14 11:55:16 -04:00
SWivid
9d2b8cb3da fix inference-cli; clean-up 2024-10-14 23:40:31 +08:00
Yushen CHEN
9ec24868a9 Merge pull request #67 from chigkim/cli
Command Line Interface for Inference
2024-10-14 22:37:48 +08:00
Chi Kim
d393827475 Inference commandline interface. 2024-10-14 10:16:36 -04:00
SWivid
408075fa58 Update README.md; add python version, numpy<2.x instruct. 2024-10-14 12:37:22 +08:00
SWivid
ac36558bfd Update README.md 2024-10-14 10:42:56 +08:00
SWivid
d3e15e3fd4 Update README.md 2024-10-14 10:35:19 +08:00
SWivid
e938b40bee add more detailed instruct. on inference. address #49 #50 2024-10-14 10:15:40 +08:00
SWivid
ddb68eea89 minor fix 2024-10-14 01:18:46 +08:00
SWivid
49706f2ebc address #43 #45 2024-10-14 01:10:54 +08:00
SWivid
615d183a0d add code-switch friendly synth. and a smoother silence remover 2024-10-14 00:29:30 +08:00
Yushen CHEN
56222196b7 Merge pull request #38 from RootingInLoad/Batch-Inference&Podcast-Generation
Batch Inference & Podcast Generation
2024-10-13 23:20:39 +08:00
RootingInLoad
30d2f0be16 Batch Inference & Podcast Generation
Here's what the Batch Inference part does:

- Try to put as many characters as possible into one batch (200 max)
- If that's not possible, it'll try to cut wherever there's a semicolon character
- If that's not possible, it'll try to cut wherever there's a comma character
- If that's not possible, it'll try to cut after the most logical word (thus, therefore, etc.). There's a list of these words at the top of the Gradio script, and it can be modified in Advanced Settings
- If none of the above works, it'll just go past the 200-character limit (realistically, if your text isn't gibberish, this shouldn't happen :D)
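
The fallback order in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the Gradio script: the function name `split_into_batches`, its parameters, and the choice to cut at the *last* matching character inside the 200-character window are all assumptions.

```python
import re


def split_into_batches(text, max_chars=200, split_words=("thus", "therefore")):
    """Cut `text` into batches of at most `max_chars` characters where possible.

    Hypothetical helper: names and defaults are illustrative only.
    """
    batches = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # 1) Prefer the last semicolon inside the window (rfind gives -1 if absent).
        cut = window.rfind(";") + 1
        # 2) Otherwise fall back to the last comma.
        if cut == 0:
            cut = window.rfind(",") + 1
        # 3) Otherwise cut right after the last "logical" word that fits entirely.
        if cut == 0 and split_words:
            pattern = r"\b(?:" + "|".join(map(re.escape, split_words)) + r")\b"
            ends = [m.end() for m in re.finditer(pattern, rest, flags=re.IGNORECASE)
                    if m.end() <= max_chars]
            if ends:
                cut = ends[-1]
        # 4) Nothing worked: run past the limit up to the next whitespace.
        if cut == 0:
            m = re.search(r"\s", rest[max_chars:])
            cut = max_chars + m.start() if m else len(rest)
        batches.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        batches.append(rest)
    return batches


if __name__ == "__main__":
    sample = "a" * 150 + "; " + "b" * 100
    for i, batch in enumerate(split_into_batches(sample)):
        print(i, len(batch))
```

Cutting at the last separator inside the window keeps each batch as full as possible, which matches the "as many characters as possible" goal; a real implementation may well prefer sentence boundaries first.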

The Podcast Generation feature works as follows:
- It takes two reference speeches and two reference texts (a text can be left empty, in which case it is transcribed automatically)
- You have to give a name to each of the two speakers
- You can then paste the podcast script, with one speaker's name followed by a semicolon and then their text. You can do the same with the other speaker, and the script can be as long as you want (because it uses the same batch inference as before)
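
As an illustration of the script format described above, here is a small sketch of how such a script could be split into speaker turns. The function name `parse_podcast_script`, the `sep` parameter, and the rule that unlabelled lines continue the previous speaker's turn are assumptions made for this example, not the PR's actual code.

```python
def parse_podcast_script(script, speakers, sep=";"):
    """Split a script into (speaker, text) turns.

    Hypothetical helper: a line starts a new turn only if the text before the
    first `sep` is one of the known speaker names.
    """
    turns = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        name, found, text = line.partition(sep)
        if found and name.strip() in speakers:
            turns.append((name.strip(), text.strip()))
        elif turns:
            # Assumption: an unlabelled line continues the previous turn.
            speaker, previous = turns[-1]
            turns[-1] = (speaker, f"{previous} {line}".strip())
        else:
            raise ValueError(f"script must start with a speaker name: {line!r}")
    return turns


if __name__ == "__main__":
    demo = "Alice; Hello there.\nBob; Hi, Alice; how are you?\nStill Bob talking.\n"
    for speaker, text in parse_podcast_script(demo, {"Alice", "Bob"}):
        print(speaker, "->", text)
```

Each turn's text could then be passed through the same batching step as ordinary input, with the matching speaker's reference audio and text.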

All in all, the batch inference feature allows for slightly faster than real-time inference. (I might do another pull request with real-time streaming.)

Immense thanks to all of those who worked on this project; it's really great. There's of course still room for improvement, but I think this is a step forward in terms of OSS TTS, so thanks!
2024-10-13 16:35:27 +02:00
SWivid
46d391a876 fix replacement of ckpt keys when do finetune training 2024-10-13 17:20:18 +08:00
SWivid
0d7b47bc3b enable correct ckpt loading for finetune 2024-10-13 14:41:08 +08:00
SWivid
83fbd34dc8 convert all input audio to mono 2024-10-13 13:39:16 +08:00
SWivid
68b4ce0f2b minor fix 2024-10-13 12:58:42 +08:00
SWivid
9395289d7a add ckpt load opt. for .safetensor 2024-10-13 10:55:18 +08:00
Zhikang Niu
edc189fa96 Update trainer.py 2024-10-13 10:04:13 +08:00