SWivid
2cd03c9134
Update README.md. #76
2024-10-15 03:27:25 +08:00
Yushen CHEN
2b37f5d4ff
Merge pull request #77 from AWAS666/main
...
Speed slider in gradio app
2024-10-15 03:24:41 +08:00
AWAS666
664533a0b3
Merge branch 'main' of https://github.com/SWivid/F5-TTS
2024-10-14 21:11:51 +02:00
AWAS666
ff4e797aab
feat: speed slider in gradio
2024-10-14 21:11:50 +02:00
SWivid
f2b892a61b
address #75
2024-10-15 02:37:39 +08:00
SWivid
e54fee3b7f
redirect to split hf ckpt repos
2024-10-15 02:12:20 +08:00
SWivid
372f6ab44e
address #74
2024-10-15 01:53:35 +08:00
Yushen CHEN
40687a54a6
Merge pull request #73 from jpgallegoar/main
...
Added back parse_emotional_text
2024-10-15 01:39:51 +08:00
jpgallegoar
894acd3c43
Added back parse_emotional_text
2024-10-14 19:33:04 +02:00
SWivid
1cec6ddf34
minor fix
2024-10-15 01:28:11 +08:00
Yushen CHEN
b648e8b04a
Merge pull request #71 from jpgallegoar/main
...
Added multiple speech types generation
2024-10-15 00:20:23 +08:00
Yushen CHEN
18736b7de3
Merge pull request #72 from chigkim/cli
...
Fixed not saving output file when remove_silence is false.
2024-10-15 00:16:10 +08:00
jpgallegoar
3d2e8fd2d1
Added multiple speech types generation
2024-10-14 18:04:51 +02:00
SWivid
ac672f363d
Update README.md
2024-10-15 00:01:10 +08:00
Chi Kim
0f5fd5e13d
Fixed not saving output file when remove_silence is false.
2024-10-14 11:55:16 -04:00
SWivid
9d2b8cb3da
fix inference-cli; clean-up
2024-10-14 23:40:31 +08:00
Yushen CHEN
9ec24868a9
Merge pull request #67 from chigkim/cli
...
Command Line Interface for Inference
2024-10-14 22:37:48 +08:00
Chi Kim
d393827475
Inference commandline interface.
2024-10-14 10:16:36 -04:00
SWivid
408075fa58
Update README.md; add python version, numpy<2.x instruct.
2024-10-14 12:37:22 +08:00
SWivid
ac36558bfd
Update README.md
2024-10-14 10:42:56 +08:00
SWivid
d3e15e3fd4
Update README.md
2024-10-14 10:35:19 +08:00
SWivid
e938b40bee
add more detailed instruct. on inference. address #49 #50
2024-10-14 10:15:40 +08:00
SWivid
ddb68eea89
minor fix
2024-10-14 01:18:46 +08:00
SWivid
49706f2ebc
address #43 #45
2024-10-14 01:10:54 +08:00
SWivid
615d183a0d
add code-switch friendly synth. and a smoother silence remover
2024-10-14 00:29:30 +08:00
Yushen CHEN
56222196b7
Merge pull request #38 from RootingInLoad/Batch-Inference&Podcast-Generation
...
Batch Inference & Podcast Generation
2024-10-13 23:20:39 +08:00
RootingInLoad
30d2f0be16
Batch Inference & Podcast Generation
...
Here's what the Batch Inference part does:
- Try to put as many characters as possible into one batch (200 max)
- If that's not possible, it'll try to cut wherever there's a semicolon
- If that's not possible, it'll try to cut wherever there's a comma
- If that's not possible, it'll try to cut after the most logical word (thus, therefore etc.) --> There's a list at the top of the Gradio script, and it can be modified in Advanced Settings
- If none of the above worked, it just goes past the 200-character limit (realistically, if your text isn't gibberish, this shouldn't happen :D)
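A minimal sketch of the fallback order described in the list above, written in Python. This is not the code from the pull request: the names `split_text`, `find_cut` and `SPLIT_WORDS` are hypothetical, the word list beyond "thus" and "therefore" is an assumption, and picking the last separator inside the 200-character window is one possible reading of the description.

```python
import re

# Hypothetical word list; the commit only mentions "thus, therefore etc."
SPLIT_WORDS = ("thus", "therefore", "however", "and", "but")


def find_cut(window: str, split_words=SPLIT_WORDS) -> int:
    """Return the index just past the preferred cut point in window, or -1."""
    # Prefer a semicolon, then a comma (last occurrence inside the window).
    for sep in (";", ","):
        pos = window.rfind(sep)
        if pos != -1:
            return pos + 1
    # Otherwise cut right after the last listed word that is followed by whitespace.
    best = -1
    for word in split_words:
        pattern = rf"\b{re.escape(word)}\b(?=\s)"
        for match in re.finditer(pattern, window, flags=re.IGNORECASE):
            best = max(best, match.end())
    return best


def split_text(text: str, max_chars: int = 200, split_words=SPLIT_WORDS) -> list[str]:
    """Greedily split text into chunks of at most max_chars where a cut point exists."""
    chunks = []
    rest = text.strip()
    while len(rest) > max_chars:
        cut = find_cut(rest[:max_chars], split_words)
        if cut == -1:
            # No usable cut point: go past the limit up to the next space,
            # or take everything that is left.
            space = rest.find(" ", max_chars)
            cut = space if space != -1 else len(rest)
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Calling `split_text(long_text)` returns a list of strings that could be synthesized one after another; a chunk is longer than `max_chars` only when no semicolon, comma or listed word is found inside the window.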
The Podcast Generation part has the following built in:
- Takes two reference speeches and two reference texts (or leave the texts empty and they are transcribed automatically)
- You have to give a name to each of the two speakers
- You can then paste the podcast script: one speaker's name followed by a semicolon and then their text, and the same for the other speaker. The script can be as long as you want (because it's using the same batch inference as before)
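To illustrate the script format described above, here is a hedged Python sketch that splits a pasted script into (speaker, text) turns. The function name `parse_script` and its parameters are hypothetical and not taken from the pull request; the separator defaults to a semicolon because that is what the commit message states, and it is left configurable in case the app uses a different character.

```python
import re


def parse_script(script: str, speakers: tuple[str, str], sep: str = ";") -> list[tuple[str, str]]:
    """Split a script into (speaker, text) turns.

    A turn starts wherever one of the two speaker names is directly
    followed by the separator (optional whitespace in between).
    """
    names = "|".join(re.escape(name) for name in speakers)
    pattern = re.compile(rf"(?<!\w)({names})\s*{re.escape(sep)}")
    # With one capturing group, split() yields:
    # [text before first turn, name, text, name, text, ...]
    parts = pattern.split(script)
    turns = []
    for name, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((name, text))
    return turns
```

For example, `parse_script("Alice; Hello there. Bob; Hi, Alice!", ("Alice", "Bob"))` gives one turn per speaker; each turn's text could then go through the same batch splitting as ordinary input.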
All in all, the batch inference feature allows for a little bit more than real-time inference. (I might do another pull request with real-time streaming)
Immense thanks to all of those who worked on this project, it's really great. There's of course still room for improvement, but I think this is a step forward in terms of OSS TTS, so thanks!
2024-10-13 16:35:27 +02:00
SWivid
46d391a876
fix replacement of ckpt keys when do finetune training
2024-10-13 17:20:18 +08:00
SWivid
0d7b47bc3b
enable correct ckpt loading for finetune
2024-10-13 14:41:08 +08:00
SWivid
83fbd34dc8
convert all input audio to mono
2024-10-13 13:39:16 +08:00
SWivid
68b4ce0f2b
minor fix
2024-10-13 12:58:42 +08:00
SWivid
9395289d7a
add ckpt load opt. for .safetensor
2024-10-13 10:55:18 +08:00
Zhikang Niu
edc189fa96
Update trainer.py
2024-10-13 10:04:13 +08:00
SWivid
77abf4e98a
separate torch pkg install
2024-10-13 09:46:03 +08:00
Yushen CHEN
0e2d4e866b
Merge pull request #19 from fakerybakery/main
...
Add Gradio app, MPS support
2024-10-13 09:19:22 +08:00
Yushen CHEN
3180d25452
Merge pull request #17 from Mateleo/patch-1
...
Use https instead of ssh
2024-10-13 09:19:09 +08:00
mrfakename
3365e96075
Add Gradio app, MPS support
2024-10-12 14:36:15 -07:00
Mateleo
93af1f939c
use https instead of ssh
...
ssh git cloning will trigger an error
2024-10-12 18:24:30 +02:00
SWivid
ed0b71aa70
Update README.md
2024-10-11 22:08:53 +08:00
Zhikang Niu
09e398ff4e
Update README.md
2024-10-11 21:32:18 +08:00
Zhikang Niu
c01b988360
Update README.md
2024-10-11 21:26:51 +08:00
SWivid
a621c223ec
add speech edit test script
2024-10-11 00:41:23 +08:00
SWivid
39ce201c4e
disable mask for single infer to save mem; add custom trans for vocab to address oov
2024-10-10 17:05:39 +08:00
Yushen CHEN
f6e3b782c4
Update README.md, add paper link
2024-10-10 12:11:14 +08:00
Yushen CHEN
2fae7c0b13
Update README.md for some instruct. on single inference
2024-10-10 11:29:58 +08:00
Yushen CHEN
b22fe71ef1
Update LICENSE, switch to MIT
2024-10-10 09:45:55 +08:00
SWivid
a6938d56c6
add demo page link
2024-10-08 22:39:18 +08:00
Yushen CHEN
406a7923d9
add suppl.
2024-10-08 22:07:39 +08:00
SWivid
074881635d
basic
2024-10-08 21:56:51 +08:00