Yushen CHEN
c279a2b7d5
Merge pull request #1256 from ZhikangNiu/main
...
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 20:00:54 +08:00
ZhikangNiu
5d473e980c
add tqdm in convert text to pinyin
2026-01-22 16:34:39 +08:00
ZhikangNiu
2aefa7c5f7
fix multiple tensorboard writers and only log in main_process
2026-01-22 13:36:09 +08:00
ZhikangNiu
97fdc7fbb4
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 12:27:23 +08:00
SWivid
1d2f7c5389
Formatting
2026-01-21 13:42:19 +00:00
Yushen CHEN
37a2633d35
Fix epoch updates count logic as drop_last
2026-01-21 21:36:57 +08:00
Raivis Dejus
bc1df7a4fa
Adding support for hf:// links on CLI (#1252)
2026-01-21 16:38:30 +08:00
Raivis Dejus
eca786ee0c
Merge pull request #1250 from raivisdejus/add-latvian-community-model
...
Adding Latvian model to shared community models list
2026-01-17 19:29:23 +08:00
Yushen CHEN
27e20fcf39
Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain
...
Fix speech editing boundary artifacts by working in mel domain
2025-12-26 17:35:57 +08:00
acadarmeria
dff57ebd2a
Fix speech editing boundary artifacts by working in mel domain
...
Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio.
This commit refactors the approach to work in mel domain:
- Compute mel spectrogram on the clean original audio first
- Insert zero frames in mel domain instead of zero samples in wav domain
- Use frame-level granularity throughout for consistency
Benefits:
- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries
Changes to speech_edit.py (lines 148-220):
- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
2025-12-26 08:49:57 +00:00
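For illustration, the mel-domain approach described in commit dff57ebd2a can be sketched roughly as follows. This is not the code from speech_edit.py: the function name build_mel_cond, its parameter names, the (frames, n_mels) array layout, and the default sample rate and hop length are all assumptions, and a plain NumPy array stands in for the output of model.mel_spec().

```python
import numpy as np


def build_mel_cond(mel, parts_to_edit, fix_durations, sr=24000, hop=256):
    """Hypothetical sketch: splice zero frames into a mel spectrogram.

    mel            -- array of shape (frames, n_mels), computed on the clean audio
    parts_to_edit  -- sorted, non-overlapping (start_sec, end_sec) spans to replace
    fix_durations  -- target duration in seconds for each replaced span
    sr, hop        -- assumed sample rate and hop length (not taken from the source)

    Returns (mel_cond, keep_mask), where keep_mask is True for original frames
    and False for the inserted zero frames.
    """
    total, n_mels = mel.shape
    pieces, mask = [], []
    cursor = 0  # next original frame that has not been copied yet

    def to_frame(seconds):
        # Work at frame granularity first; clamp to the valid frame range.
        return min(max(int(round(seconds * sr / hop)), 0), total)

    for (start_s, end_s), dur_s in zip(parts_to_edit, fix_durations):
        start_f, end_f = to_frame(start_s), to_frame(end_s)
        gap_f = int(round(dur_s * sr / hop))

        # Keep the untouched original frames before the edited span.
        kept = mel[cursor:start_f]
        pieces.append(kept)
        mask.append(np.ones(len(kept), dtype=bool))

        # Insert zero frames in the mel domain instead of zero samples in the wav.
        pieces.append(np.zeros((gap_f, n_mels), dtype=mel.dtype))
        mask.append(np.zeros(gap_f, dtype=bool))

        cursor = max(end_f, cursor)

    # Keep whatever follows the last edited span.
    tail = mel[cursor:]
    pieces.append(tail)
    mask.append(np.ones(len(tail), dtype=bool))

    return np.concatenate(pieces, axis=0), np.concatenate(mask)
```

Because every boundary is rounded to a frame index once, small float differences in the input times map to the same frames, which is the consistency benefit the commit message mentions. Sample positions, where needed, would be derived from the frame indices by multiplying by the hop length.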
SWivid
46ccc575c5
v1.1.15 workaround for gr.Accordion default open=False bug (#1239)
1.1.15
2025-12-21 15:06:44 +08:00
SWivid
39617fcf7a
v1.1.12 bump gradio from 5.0 to 6.0, several fixes to ensure compatibility with new gradio version
1.1.12
2025-12-20 18:44:43 +08:00
Yushen Chen
5b82f97c26
fix #1239, use gradio>=6.0; add clearer instructions for ffmpeg installation (#1234)
2025-12-20 16:08:13 +08:00
SWivid
9ae46c8360
Replace jieba pkg with rjieba - a jieba-rs Python binding
1.1.10
2025-11-28 13:08:07 +00:00
SWivid
3eecd94baa
support back avg upsampling for batch, cover up non-mask case
2025-11-09 11:56:03 +00:00
SWivid
d9a69452ce
formatting
2025-11-09 18:25:30 +08:00
Yushen CHEN
bc15df2b57
Merge pull request #1212 from QingyuLiu0521/fix/AverageUpsampling
...
Fix Average Upsampling conflict logic, introduced from the previous batch inference fix.
2025-11-09 18:23:38 +08:00
QingyuLiu0521
9b2357a1b9
Fix Average Upsampling
2025-11-08 18:39:06 -05:00
Yushen CHEN
1dcb4e10f7
Add torchcodec dependency to pyproject.toml
2025-11-03 16:44:11 +08:00
SWivid
529d856133
clean-up eval scripts
2025-10-27 14:38:57 +00:00
SWivid
7abadc4c72
fix typo in eval scripts
2025-10-26 14:28:17 +00:00
SWivid
e67d50841e
runtime trtllm: fix batch inference skipping last words in shorter sentences #1039 #1179
2025-10-24 09:12:08 +00:00
SWivid
6b07fb03b2
clean-up ruff lint
2025-10-24 08:30:55 +00:00
SWivid
a051a68552
pytorch imple. fix batch inference skipping last words in shorter sentences issue #1039 #1179
2025-10-24 05:50:25 +00:00
Yushen CHEN
f2a4f8581f
Update runtime README
2025-10-22 08:37:32 +08:00
SWivid
a17c5ae435
pytorch imple.: fix batch 1 inference from last commit
2025-10-22 00:31:56 +00:00
SWivid
a0b8fb5df2
runtime trtllm: minor fixes. pytorch: update text_embedding logic to correct v0 batching.
2025-10-22 00:19:45 +00:00
SWivid
c8bfc3aa3d
runtime trtllm: support v1 and custom
2025-10-21 22:02:25 +00:00
SWivid
8d3ec72159
runtime trtllm: clean-up v0 code, several fixes.
2025-10-20 10:30:58 +00:00
SWivid
65ada48a62
set attn related default value for unet-t backbone: #1192
2025-10-09 06:51:25 +00:00
SWivid
77d3ec623b
v1.1.9
1.1.9
2025-09-13 13:42:33 +08:00
SWivid
186799d6dc
remove numpy<=1.26.4 for python_version>=3.11 #1162; update links
2025-09-13 13:40:55 +08:00
Yushen CHEN
31bb78f2ab
Update badge links
2025-09-03 15:12:24 +08:00
SWivid
e61824009a
v1.1.8
1.1.8
2025-08-28 12:33:37 +00:00
SWivid
06a74910bd
add option for text embedding late average upsampling
2025-08-28 11:46:11 +00:00
Yushen CHEN
ac3c43595c
delete .github/workflows/sync-hf.yaml for online space stability
2025-08-27 06:52:18 +08:00
Jim
605fa13b42
Fix raw.arrow missing rows (#1145)
...
* fix raw.arrow missing rows
---------
Co-authored-by: SWivid <swivid@qq.com>
2025-07-22 19:38:44 +08:00
Yushen CHEN
5f35f27230
update pyproject.toml
2025-07-15 17:28:41 +08:00
Yushen CHEN
c96c3aeed8
Update pyproject.toml
1.1.7
2025-07-14 14:36:26 +08:00
Yushen CHEN
9b60fe6a34
update pyproject.toml, set gradio<=5.35.0 until fix #1126
2025-07-14 14:29:19 +08:00
SWivid
a275798a2f
last fix patch-1
2025-07-08 18:44:47 +08:00
SWivid
efc7a7498b
fix #1111 #1037 remove redundant unwrap_model for AcceleratedOptimizer; which has no attribute '_modules' thus conflict with has_compiled_regions check introduced in accelerate v1.7.0
2025-07-08 18:39:43 +08:00
SWivid
9842314127
update slicer in finetune_gradio, legacy min_length 2s changed to 20s
2025-07-08 16:59:46 +08:00
SWivid
69b0e0110e
v1.1.6 fla support, several changes for finetune and infer-cli
1.1.6
2025-07-03 00:08:42 +08:00
SWivid
52c84776e5
fine-grained speed control for infer-cli. #1112
2025-07-02 23:41:55 +08:00
Danh Tran
ebbd7bd91f
Update WAV File Naming and Dependencies 📝 🔊 (#1091)
...
* Update infer_cli.py
* Update pyproject.toml
* formalized
---------
Co-authored-by: SWivid <swivid@qq.com>
2025-06-24 23:23:00 +08:00
Yushen CHEN
ac42286d04
update finetune_gradio.py, not to force lower case
...
Do not force lower case; otherwise the training text mismatches the main inference code
2025-06-23 16:37:51 +08:00
Yushen CHEN
d937efa6f3
fix finetune_gradio.py, not to force lower case
2025-06-23 16:22:33 +08:00
Yushen CHEN
8975fca803
Merge pull request #1084 from starkwj/main
...
Speedup inference by batching CFG in DiT
2025-06-12 03:54:04 +08:00
SWivid
8b0053ad0c
backward compatibility
2025-06-12 03:52:12 +08:00