643 Commits

Author SHA1 Message Date
SWivid
372f6ab44e address #74 2024-10-15 01:53:35 +08:00
Yushen CHEN
40687a54a6 Merge pull request #73 from jpgallegoar/main
Added back parse_emotional_text
2024-10-15 01:39:51 +08:00
jpgallegoar
894acd3c43 Added back parse_emotional_text 2024-10-14 19:33:04 +02:00
SWivid
1cec6ddf34 minor fix 2024-10-15 01:28:11 +08:00
Yushen CHEN
b648e8b04a Merge pull request #71 from jpgallegoar/main
Added multiple speech types generation
2024-10-15 00:20:23 +08:00
Yushen CHEN
18736b7de3 Merge pull request #72 from chigkim/cli
Fixed not saving output file when remove_silence is false.
2024-10-15 00:16:10 +08:00
jpgallegoar
3d2e8fd2d1 Added multiple speech types generation 2024-10-14 18:04:51 +02:00
SWivid
ac672f363d Update README.md 2024-10-15 00:01:10 +08:00
Chi Kim
0f5fd5e13d Fixed not saving output file when remove_silence is false. 2024-10-14 11:55:16 -04:00
SWivid
9d2b8cb3da fix inference-cli; clean-up 2024-10-14 23:40:31 +08:00
Yushen CHEN
9ec24868a9 Merge pull request #67 from chigkim/cli
Command Line Interface for Inference
2024-10-14 22:37:48 +08:00
Chi Kim
d393827475 Inference commandline interface. 2024-10-14 10:16:36 -04:00
SWivid
408075fa58 Update README.md; add python version, numpy<2.x instruct. 2024-10-14 12:37:22 +08:00
SWivid
ac36558bfd Update README.md 2024-10-14 10:42:56 +08:00
SWivid
d3e15e3fd4 Update README.md 2024-10-14 10:35:19 +08:00
SWivid
e938b40bee add more detailed instruct. on inference. address #49 #50 2024-10-14 10:15:40 +08:00
SWivid
ddb68eea89 minor fix 2024-10-14 01:18:46 +08:00
SWivid
49706f2ebc address #43 #45 2024-10-14 01:10:54 +08:00
SWivid
615d183a0d add code-switch friendly synth. and a smoother silence remover 2024-10-14 00:29:30 +08:00
Yushen CHEN
56222196b7 Merge pull request #38 from RootingInLoad/Batch-Inference&Podcast-Generation
Batch Inference & Podcast Generation
2024-10-13 23:20:39 +08:00
RootingInLoad
30d2f0be16 Batch Inference & Podcast Generation
Here's what the Batch Inference part does:

- Tries to fit as many characters as possible into one batch (200 max)
- If that's not possible, it tries to cut at a semicolon
- If that's not possible, it tries to cut at a comma
- If that's not possible, it tries to cut after the most logical word (thus, therefore etc.) --> There's a list at the top of the Gradio script, and it can be modified in Advanced Settings
- If none of the above works, it simply goes past the 200-character limit (realistically, if your text isn't gibberish, this shouldn't happen :D)
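
The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the names `find_cut`, `split_into_batches`, and `SPLIT_WORDS` are hypothetical, the word list is a made-up stand-in for the one in the Gradio script, and running on to the next space is only one way to "go past" the limit.

```python
import re

# Hypothetical stand-in for the word list kept at the top of the Gradio script.
SPLIT_WORDS = ("thus", "therefore", "however", "but", "and")


def find_cut(window, split_words=SPLIT_WORDS):
    """Return the index just past the preferred cut point in `window`, or -1."""
    # Semicolons are preferred over commas, as in the list above.
    for mark in (";", ","):
        pos = window.rfind(mark)
        if pos != -1:
            return pos + 1
    # Otherwise cut after the last occurrence of any "logical" word.
    best = -1
    for word in split_words:
        pattern = rf"\b{re.escape(word)}\b"
        for match in re.finditer(pattern, window, flags=re.IGNORECASE):
            best = max(best, match.end())
    return best


def split_into_batches(text, max_chars=200, split_words=SPLIT_WORDS):
    """Greedily split `text` into chunks of at most `max_chars` where possible."""
    batches = []
    rest = text.strip()
    while len(rest) > max_chars:
        cut = find_cut(rest[:max_chars], split_words)
        if cut <= 0:
            # No cut point inside the limit: go past it to the next space
            # (assumption), or take the whole remainder if there is none.
            next_space = rest.find(" ", max_chars)
            cut = len(rest) if next_space == -1 else next_space
        batches.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        batches.append(rest)
    return batches
```

For example, `split_into_batches("aaaaaaaaaa; bbbbbbbbbb, cccccccccc", max_chars=15)` cuts first at the semicolon and then at the comma, giving three chunks.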

The Podcast Generation feature includes the following:
- Takes two reference speeches and two reference texts (the texts can be left empty, in which case they are transcribed automatically)
- You have to give a name to each of the two speakers
- You can then paste the podcast script: one speaker's name followed by a semicolon and then their text, then the same for the other speaker. The script can be as long as you want (because it uses the same batch inference as above)
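
One way to read the script format above is sketched below. It is an illustration under assumptions, not the parser from the pull request: `parse_podcast_script` is a hypothetical name, and the separator is a parameter that defaults to the semicolon mentioned in this message.

```python
import re


def parse_podcast_script(script, speaker1, speaker2, sep=";"):
    """Split a two-speaker script into (speaker, text) turns, in order."""
    names = "|".join(re.escape(name) for name in (speaker1, speaker2))
    # A turn starts wherever a speaker name is directly followed by `sep`.
    pattern = re.compile(rf"({names})\s*{re.escape(sep)}\s*")
    # With a capturing group, re.split keeps the names:
    # [preamble, name, text, name, text, ...]
    parts = pattern.split(script)
    turns = []
    for name, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((name, text))
    return turns
```

Each turn could then be synthesized with the matching speaker's reference audio. Note that this sketch misreads any line of dialogue in which a speaker's name happens to be followed by the separator.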

All in all, the batch inference feature allows for a little bit more than real-time inference. (I might open another pull request with real-time streaming.)

Immense thanks to everyone who worked on this project; it's really great. There's of course still room for improvement, but I think this is a step forward for OSS TTS, so thanks!
2024-10-13 16:35:27 +02:00
SWivid
46d391a876 fix replacement of ckpt keys when do finetune training 2024-10-13 17:20:18 +08:00
SWivid
0d7b47bc3b enable correct ckpt loading for finetune 2024-10-13 14:41:08 +08:00
SWivid
83fbd34dc8 convert all input audio to mono 2024-10-13 13:39:16 +08:00
SWivid
68b4ce0f2b minor fix 2024-10-13 12:58:42 +08:00
SWivid
9395289d7a add ckpt load opt. for .safetensor 2024-10-13 10:55:18 +08:00
Zhikang Niu
edc189fa96 Update trainer.py 2024-10-13 10:04:13 +08:00
SWivid
77abf4e98a separate torch pkg install 2024-10-13 09:46:03 +08:00
Yushen CHEN
0e2d4e866b Merge pull request #19 from fakerybakery/main
Add Gradio app, MPS support
2024-10-13 09:19:22 +08:00
Yushen CHEN
3180d25452 Merge pull request #17 from Mateleo/patch-1
Use https instead of ssh
2024-10-13 09:19:09 +08:00
mrfakename
3365e96075 Add Gradio app, MPS support 2024-10-12 14:36:15 -07:00
Mateleo
93af1f939c use https instead of ssh
ssh git cloning will trigger an error
2024-10-12 18:24:30 +02:00
SWivid
ed0b71aa70 Update README.md 2024-10-11 22:08:53 +08:00
Zhikang Niu
09e398ff4e Update README.md 2024-10-11 21:32:18 +08:00
Zhikang Niu
c01b988360 Update README.md 2024-10-11 21:26:51 +08:00
SWivid
a621c223ec add speech edit test script 2024-10-11 00:41:23 +08:00
SWivid
39ce201c4e disable mask for single infer to save mem; add custom trans for vocab to address oov 2024-10-10 17:05:39 +08:00
Yushen CHEN
f6e3b782c4 Update README.md, add paper link 2024-10-10 12:11:14 +08:00
Yushen CHEN
2fae7c0b13 Update README.md for some instruct. on single inference 2024-10-10 11:29:58 +08:00
Yushen CHEN
b22fe71ef1 Update LICENSE, switch to MIT 2024-10-10 09:45:55 +08:00
SWivid
a6938d56c6 add demo page link 2024-10-08 22:39:18 +08:00
Yushen CHEN
406a7923d9 add suppl. 2024-10-08 22:07:39 +08:00
SWivid
074881635d basic 2024-10-08 21:56:51 +08:00