F5-TTS

mirror of https://github.com/SWivid/F5-TTS.git synced 2026-01-15 14:34:22 -08:00

Author	SHA1	Message	Date
SWivid	2cd03c9134	Update README.md. #76	2024-10-15 03:27:25 +08:00
Yushen CHEN	2b37f5d4ff	Merge pull request #77 from AWAS666/main Speed slider in gradio app	2024-10-15 03:24:41 +08:00
AWAS666	664533a0b3	Merge branch 'main' of https://github.com/SWivid/F5-TTS	2024-10-14 21:11:51 +02:00
AWAS666	ff4e797aab	feat: speed slider in gradio	2024-10-14 21:11:50 +02:00
SWivid	f2b892a61b	address #75	2024-10-15 02:37:39 +08:00
SWivid	e54fee3b7f	redirect to split hf ckpt repos	2024-10-15 02:12:20 +08:00
SWivid	372f6ab44e	address #74	2024-10-15 01:53:35 +08:00
Yushen CHEN	40687a54a6	Merge pull request #73 from jpgallegoar/main Added back parse_emotional_text	2024-10-15 01:39:51 +08:00
jpgallegoar	894acd3c43	Added back parse_emotional_text	2024-10-14 19:33:04 +02:00
SWivid	1cec6ddf34	minor fix	2024-10-15 01:28:11 +08:00
Yushen CHEN	b648e8b04a	Merge pull request #71 from jpgallegoar/main Added multiple speech types generation	2024-10-15 00:20:23 +08:00
Yushen CHEN	18736b7de3	Merge pull request #72 from chigkim/cli Fixed not saving output file when remove_silence is false.	2024-10-15 00:16:10 +08:00
jpgallegoar	3d2e8fd2d1	Added multiple speech types generation	2024-10-14 18:04:51 +02:00
SWivid	ac672f363d	Update README.md	2024-10-15 00:01:10 +08:00
Chi Kim	0f5fd5e13d	Fixed not saving output file when remove_silence is false.	2024-10-14 11:55:16 -04:00
SWivid	9d2b8cb3da	fix inference-cli; clean-up	2024-10-14 23:40:31 +08:00
Yushen CHEN	9ec24868a9	Merge pull request #67 from chigkim/cli Command Line Interface for Inference	2024-10-14 22:37:48 +08:00
Chi Kim	d393827475	Inference commandline interface.	2024-10-14 10:16:36 -04:00
SWivid	408075fa58	Update README.md; add python version, numpy<2.x instruct.	2024-10-14 12:37:22 +08:00
SWivid	ac36558bfd	Update README.md	2024-10-14 10:42:56 +08:00
SWivid	d3e15e3fd4	Update README.md	2024-10-14 10:35:19 +08:00
SWivid	e938b40bee	add more detailed instruct. on inference. address #49 #50	2024-10-14 10:15:40 +08:00
SWivid	ddb68eea89	minor fix	2024-10-14 01:18:46 +08:00
SWivid	49706f2ebc	address #43 #45	2024-10-14 01:10:54 +08:00
SWivid	615d183a0d	add code-switch friendly synth. and a smoother silence remover	2024-10-14 00:29:30 +08:00
Yushen CHEN	56222196b7	Merge pull request #38 from RootingInLoad/Batch-Inference&Podcast-Generation Batch Inference & Podcast Generation	2024-10-13 23:20:39 +08:00
RootingInLoad	30d2f0be16	Batch Inference & Podcast Generation Here's what the Batch Inference part does: - Tries to put as many characters as possible into one batch (200 max) - If that's not possible, it tries to cut at a semicolon character - If that's not possible, it tries to cut at a comma character - If that's not possible, it tries to cut after the most logical word (thus, therefore, etc.) --> There's a list at the top of the Gradio script, and it can be modified in Advanced Settings - If none of the above works, it just goes past the 200-character limit (realistically, if your text isn't gibberish, this shouldn't happen :D) The Podcast Generation feature includes the following: - Takes two reference speeches and two reference texts (or leave them empty and they are transcribed automatically) - You have to give a name to each of the two speakers - You can then paste the podcast script, with one speaker's name followed by a semicolon and then their text; you can do the same with the other speaker, for as long as you want (because it uses the same batch inference as before) All in all, the batch inference feature allows for a little bit more than real-time inference. (I might do another pull request with real-time streaming) Immense thanks to all of those who worked on this project, it's really great. There's of course still room for improvement, but I think this is a step forward in terms of OSS TTS, so thanks!	2024-10-13 16:35:27 +02:00
SWivid	46d391a876	fix replacement of ckpt keys when do finetune training	2024-10-13 17:20:18 +08:00
SWivid	0d7b47bc3b	enable correct ckpt loading for finetune	2024-10-13 14:41:08 +08:00
SWivid	83fbd34dc8	convert all input audio to mono	2024-10-13 13:39:16 +08:00
SWivid	68b4ce0f2b	minor fix	2024-10-13 12:58:42 +08:00
SWivid	9395289d7a	add ckpt load opt. for .safetensor	2024-10-13 10:55:18 +08:00
Zhikang Niu	edc189fa96	Update trainer.py	2024-10-13 10:04:13 +08:00
SWivid	77abf4e98a	separate torch pkg install	2024-10-13 09:46:03 +08:00
Yushen CHEN	0e2d4e866b	Merge pull request #19 from fakerybakery/main Add Gradio app, MPS support	2024-10-13 09:19:22 +08:00
Yushen CHEN	3180d25452	Merge pull request #17 from Mateleo/patch-1 Use https instead of ssh	2024-10-13 09:19:09 +08:00
mrfakename	3365e96075	Add Gradio app, MPS support	2024-10-12 14:36:15 -07:00
Mateleo	93af1f939c	use https instead of ssh ssh git cloning will trigger an error	2024-10-12 18:24:30 +02:00
SWivid	ed0b71aa70	Update README.md	2024-10-11 22:08:53 +08:00
Zhikang Niu	09e398ff4e	Update README.md	2024-10-11 21:32:18 +08:00
Zhikang Niu	c01b988360	Update README.md	2024-10-11 21:26:51 +08:00
SWivid	a621c223ec	add speech edit test script	2024-10-11 00:41:23 +08:00
SWivid	39ce201c4e	disable mask for single infer to save mem; add custom trans for vocab to address oov	2024-10-10 17:05:39 +08:00
Yushen CHEN	f6e3b782c4	Update README.md, add paper link	2024-10-10 12:11:14 +08:00
Yushen CHEN	2fae7c0b13	Update README.md for some instruct. on single inference	2024-10-10 11:29:58 +08:00
Yushen CHEN	b22fe71ef1	Update LICENSE, switch to MIT	2024-10-10 09:45:55 +08:00
SWivid	a6938d56c6	add demo page link	2024-10-08 22:39:18 +08:00
Yushen CHEN	406a7923d9	add suppl.	2024-10-08 22:07:39 +08:00
SWivid	074881635d	basic	2024-10-08 21:56:51 +08:00


249 Commits
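
The batch-splitting rules described in commit 30d2f0be16 can be sketched roughly as follows. This is an illustration only, not the code from the Gradio script: the function names, the `SPLIT_WORDS` default, and the choice to cut at the last matching separator inside the 200-character window are all assumptions made for the example.

```python
import re

MAX_CHARS = 200                      # limit stated in the commit message
SPLIT_WORDS = ("thus", "therefore")  # hypothetical default; the real list lives in the Gradio script


def find_cut(text, max_chars=MAX_CHARS, split_words=SPLIT_WORDS):
    """Return the index (exclusive) at which to cut `text` for the next batch."""
    if len(text) <= max_chars:
        return len(text)  # everything fits into one batch
    window = text[:max_chars]
    # Prefer a semicolon, then a comma; cut just after the last one in the window.
    for sep in (";", ","):
        pos = window.rfind(sep)
        if pos != -1:
            return pos + 1
    # Otherwise cut after the last "logical" word found in the window.
    if split_words:
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in split_words) + r")\b"
        matches = list(re.finditer(pattern, window, flags=re.IGNORECASE))
        if matches:
            return matches[-1].end()
    # Nothing worked: go past the limit, up to the next space (or the end of the text).
    nxt = text.find(" ", max_chars)
    return len(text) if nxt == -1 else nxt


def split_into_batches(text, max_chars=MAX_CHARS, split_words=SPLIT_WORDS):
    """Split `text` into batches following the priority order sketched above."""
    batches = []
    rest = text.strip()
    while rest:
        cut = find_cut(rest, max_chars, split_words)
        batches.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    return batches


if __name__ == "__main__":
    sample = "a" * 150 + "; " + "b" * 100
    for batch in split_into_batches(sample):
        print(len(batch), batch[:20])
```

In this sketch a semicolon anywhere in the window wins over a later comma, matching the order in which the commit message lists the fallbacks. Whether the actual implementation picks the last or the first separator is not stated there.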