F5-TTS

mirror of https://github.com/SWivid/F5-TTS.git synced 2026-01-27 15:24:16 -08:00

Author	SHA1	Message	Date
SWivid	21900ba97d	Add zh and code-switch support for gradio	2024-10-16 03:34:15 +08:00
jpgallegoar	e8d16450bc	Added audio crossfading	2024-10-15 20:39:19 +02:00
jpgallegoar	028421eafd	Improved batching and added reference text ending	2024-10-15 19:09:07 +02:00
SWivid	bc6331529a	split pkgs only for eval usage address #97 ; clean-up	2024-10-15 21:14:44 +08:00
SWivid	49b465f5d8	add credit of gradio multiple speech-type gen to jpgallegoar	2024-10-15 12:32:04 +08:00
mrfakename	0297be2541	Add GPU decorator to add compatibility for ZeroGPU	2024-10-14 15:00:19 -07:00
mrfakename	f6b1de2251	Reorganize Gradio app	2024-10-14 13:30:25 -07:00
SWivid	12ef9d23f3	address #76 #29	2024-10-15 03:45:14 +08:00
AWAS666	664533a0b3	Merge branch 'main' of https://github.com/SWivid/F5-TTS	2024-10-14 21:11:51 +02:00
AWAS666	ff4e797aab	feat: speed slider in gradio	2024-10-14 21:11:50 +02:00
SWivid	f2b892a61b	address #75	2024-10-15 02:37:39 +08:00
SWivid	e54fee3b7f	redirect to split hf ckpt repos	2024-10-15 02:12:20 +08:00
SWivid	372f6ab44e	address #74	2024-10-15 01:53:35 +08:00
Yushen CHEN	40687a54a6	Merge pull request #73 from jpgallegoar/main Added back parse_emotional_text	2024-10-15 01:39:51 +08:00
jpgallegoar	894acd3c43	Added back parse_emotional_text	2024-10-14 19:33:04 +02:00
SWivid	1cec6ddf34	minor fix	2024-10-15 01:28:11 +08:00
jpgallegoar	3d2e8fd2d1	Added multiple speech types generation	2024-10-14 18:04:51 +02:00
SWivid	ac36558bfd	Update README.md	2024-10-14 10:42:56 +08:00
SWivid	e938b40bee	add more detailed instruct. on inference. address #49 #50	2024-10-14 10:15:40 +08:00
SWivid	ddb68eea89	minor fix	2024-10-14 01:18:46 +08:00
SWivid	49706f2ebc	address #43 #45	2024-10-14 01:10:54 +08:00
SWivid	615d183a0d	add code-switch friendly synth. and a smoother silence remover	2024-10-14 00:29:30 +08:00
RootingInLoad	30d2f0be16	Batch Inference & Podcast Generation Here's what the Batch Inference part does: - Try to put as many characters as possible into one batch (200 max) - If it's not possible, it'll try to do a cut whenever there's a semicolon character - If it's not possible, it'll try to do a cut whenever there's a comma character - If it's not possible, it'll try to do a cut after the most logical word (thus, therefore etc.) --> There's a list at the top of the Gradio script, and it's possible to modify it in Advanced Settings - If nothing above worked, it's just going to go past that 200 line (realistically, if your text isn't gibberish, this shouldn't happen :D) The Podcast Generation feature has these features built in: - Takes two reference speeches and two reference texts (or empty and then transcribed automatically) - You have to give a name to each of the two speakers - You can then paste the podcast script, with one speaker's name followed by a semicolon and then their text, you can do the same with the other speaker, all as long as you want (because it's using the same batch inference as before) All in all, the batch inference feature allows for a little bit more than real-time inference. (I might do another pull request with real-time streaming) Immense thanks to all of those who worked on this project, it's really great. There's of course still room for improvement, but I think this is a step forward in terms of OSS TTS, so thanks!	2024-10-13 16:35:27 +02:00
SWivid	83fbd34dc8	convert all input audio to mono	2024-10-13 13:39:16 +08:00
SWivid	68b4ce0f2b	minor fix	2024-10-13 12:58:42 +08:00
mrfakename	3365e96075	Add Gradio app, MPS support	2024-10-12 14:36:15 -07:00

26 Commits
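
The batch-splitting rules described in commit 30d2f0be16 can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the names `split_text`, `_last_cut`, `MAX_CHARS`, and `SPLIT_WORDS` are hypothetical, the word list contains only the two examples the commit message names, and the behaviour when no break point exists inside the limit (run on to the next semicolon or comma, or to the end of the text) is an assumption about what "go past that 200 line" means.

```python
import re

# Hypothetical defaults: the commit message only names "thus, therefore etc."
SPLIT_WORDS = ("thus", "therefore")
MAX_CHARS = 200


def _last_cut(window, split_words):
    """Return the index just after the preferred cut point in `window`, or 0."""
    # Priority order from the commit message: semicolon, then comma.
    for sep in (";", ","):
        pos = window.rfind(sep)
        if pos != -1:
            return pos + 1
    # Otherwise cut after the last "logical" word, if any is present.
    if not split_words:
        return 0
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in split_words) + r")\b"
    matches = list(re.finditer(pattern, window, flags=re.IGNORECASE))
    return matches[-1].end() if matches else 0


def split_text(text, max_chars=MAX_CHARS, split_words=SPLIT_WORDS):
    """Split `text` into chunks of at most `max_chars` where a cut point exists."""
    chunks = []
    rest = text.strip()
    while len(rest) > max_chars:
        cut = _last_cut(rest[:max_chars], split_words)
        if cut == 0:
            # Assumption: with no cut point inside the limit, run past it
            # to the next semicolon or comma, or to the end of the text.
            m = re.search(r"[;,]", rest[max_chars:])
            cut = max_chars + m.end() if m else len(rest)
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Cutting at the last separator inside the window, instead of the first, keeps each batch as full as possible, which matches the stated goal of packing as many characters as possible into one batch.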