Commit Graph

26 Commits

Author SHA1 Message Date
SWivid
21900ba97d Add zh and code-switch support for gradio 2024-10-16 03:34:15 +08:00
jpgallegoar
e8d16450bc Added audio crossfading 2024-10-15 20:39:19 +02:00
jpgallegoar
028421eafd Improved batching and added reference text ending 2024-10-15 19:09:07 +02:00
SWivid
bc6331529a split pkgs only for eval usage address #97; clean-up 2024-10-15 21:14:44 +08:00
SWivid
49b465f5d8 add credit of gradio multiple speech-type gen to jpgallegoar 2024-10-15 12:32:04 +08:00
mrfakename
0297be2541 Add GPU decorator to add compatibility for ZeroGPU 2024-10-14 15:00:19 -07:00
mrfakename
f6b1de2251 Reorganize Gradio app 2024-10-14 13:30:25 -07:00
SWivid
12ef9d23f3 address #76 #29 2024-10-15 03:45:14 +08:00
AWAS666
664533a0b3 Merge branch 'main' of https://github.com/SWivid/F5-TTS 2024-10-14 21:11:51 +02:00
AWAS666
ff4e797aab feat: speed slider in gradio 2024-10-14 21:11:50 +02:00
SWivid
f2b892a61b address #75 2024-10-15 02:37:39 +08:00
SWivid
e54fee3b7f redirect to split hf ckpt repos 2024-10-15 02:12:20 +08:00
SWivid
372f6ab44e address #74 2024-10-15 01:53:35 +08:00
Yushen CHEN
40687a54a6 Merge pull request #73 from jpgallegoar/main
Added back parse_emotional_text
2024-10-15 01:39:51 +08:00
jpgallegoar
894acd3c43 Added back parse_emotional_text 2024-10-14 19:33:04 +02:00
SWivid
1cec6ddf34 minor fix 2024-10-15 01:28:11 +08:00
jpgallegoar
3d2e8fd2d1 Added multiple speech types generation 2024-10-14 18:04:51 +02:00
SWivid
ac36558bfd Update README.md 2024-10-14 10:42:56 +08:00
SWivid
e938b40bee add more detailed instruct. on inference. address #49 #50 2024-10-14 10:15:40 +08:00
SWivid
ddb68eea89 minor fix 2024-10-14 01:18:46 +08:00
SWivid
49706f2ebc address #43 #45 2024-10-14 01:10:54 +08:00
SWivid
615d183a0d add code-switch friendly synth. and a smoother silence remover 2024-10-14 00:29:30 +08:00
RootingInLoad
30d2f0be16 Batch Inference & Podcast Generation
Here's what the Batch Inference part does:

- Try to fit as many characters as possible into one batch (200 max)
- If that isn't possible, it tries to cut at a semicolon character
- If that isn't possible, it tries to cut at a comma character
- If that isn't possible, it tries to cut after the most logical word (thus, therefore, etc.). The word list sits at the top of the Gradio script and can be modified in Advanced Settings
- If nothing above worked, it just goes past the 200-character limit (realistically, if your text isn't gibberish, this shouldn't happen :D)
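
The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `split_into_batches`, `find_cut`, and `SPLIT_WORDS` are hypothetical, the word list is reduced to the two examples given, and running on to the next whitespace when nothing else works is an assumption about what "going past the limit" means.

```python
import re

# Hypothetical defaults: the commit only mentions words like "thus" and "therefore".
SPLIT_WORDS = ["thus", "therefore"]
MAX_CHARS = 200


def find_cut(text, limit, split_words):
    """Return the index just past the best cut point within text[:limit], or -1."""
    # Semicolons are preferred over commas, as in the list above.
    for sep in (";", ","):
        idx = text.rfind(sep, 0, limit)
        if idx != -1:
            return idx + 1
    # Otherwise cut after the last whole-word match of any split word.
    best = -1
    for word in split_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            if match.end() <= limit:
                best = max(best, match.end())
    return best


def split_into_batches(text, max_chars=MAX_CHARS, split_words=SPLIT_WORDS):
    """Greedily split text into batches of at most max_chars where possible."""
    batches = []
    rest = text.strip()
    while rest:
        if len(rest) <= max_chars:
            batches.append(rest)
            break
        cut = find_cut(rest, max_chars, split_words)
        if cut == -1:
            # Assumption: nothing worked, so run past the limit to the next whitespace.
            gap = re.search(r"\s", rest[max_chars:])
            cut = max_chars + gap.start() if gap else len(rest)
        batches.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    return batches
```

With this sketch, a 252-character input containing a single semicolon at position 150 comes back as two batches, split right after the semicolon.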

The Podcast Generation feature works as follows:
- It takes two reference speeches and two reference texts (a reference text left empty is transcribed automatically)
- You have to give a name to each of the two speakers
- You can then paste the podcast script: one speaker's name followed by a semicolon and then their text, then the same for the other speaker, for as long as you want (it uses the same batch inference as above)
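
One way to read such a script into speaker turns is sketched below. It is only an illustration under stated assumptions: the function name `parse_podcast_script` is hypothetical, the sketch assumes one turn per line in the form `<name>; <text>`, and it treats any line that does not start with a known speaker name as a continuation of the previous turn. None of these details are confirmed by the commit.

```python
def parse_podcast_script(script, speakers, sep=";"):
    """Split a script into (speaker, text) turns.

    Assumes one turn per line, written as `<name><sep> <text>`. Lines that do
    not start with a known speaker name are appended to the previous turn.
    """
    turns = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, found, text = line.partition(sep)
        if found and name.strip() in speakers:
            turns.append((name.strip(), text.strip()))
        elif turns:
            prev_name, prev_text = turns[-1]
            turns[-1] = (prev_name, f"{prev_text} {line}".strip())
        else:
            raise ValueError(f"script must start with one of {sorted(speakers)}")
    return turns
```

Each resulting turn could then be passed through batch inference with the reference audio of the matching speaker.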

All in all, the batch inference feature allows for a little bit more than real-time inference. (I might do another pull request with real-time streaming)

Immense thanks to all of those who worked on this project; it's really great. There's of course still room for improvement, but I think this is a step forward in terms of OSS TTS, so thanks!
2024-10-13 16:35:27 +02:00
SWivid
83fbd34dc8 convert all input audio to mono 2024-10-13 13:39:16 +08:00
SWivid
68b4ce0f2b minor fix 2024-10-13 12:58:42 +08:00
mrfakename
3365e96075 Add Gradio app, MPS support 2024-10-12 14:36:15 -07:00