Commit Graph

665 Commits

Author SHA1 Message Date
Yushen CHEN
04459f71e6 Merge pull request #1266 from ZhikangNiu/main
Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
2026-02-16 11:10:17 +08:00
ZhikangNiu
6768b1bcff Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-16 10:05:02 +08:00
Yushen CHEN
ecfdccb890 Merge pull request #1265 from ZhikangNiu/main
Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.
2026-02-14 11:14:57 +08:00
ZhikangNiu
bb5526fc5b Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage. 2026-02-14 11:05:08 +08:00
SWivid
655fbca552 Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0 2026-02-02 17:48:16 +08:00
Yushen CHEN
fc0fa67a03 Update eval README with ctranslate2 installation instructions
Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.
2026-01-28 19:15:57 +08:00
Yushen CHEN
a3c2ea9784 Merge pull request #1261 from ZhikangNiu/main
Ignore padding at the end of the GT mel spectrogram when training sample
2026-01-26 18:58:47 +08:00
ZhikangNiu
d71a69d528 Ignore padding at the end of the ground truth mel spectrogram when training sample 2026-01-26 09:40:37 +08:00
Yushen CHEN
b9d923088c Increase default max_duration from 4096 to 65536 in cfm.py (#1260) 2026-01-25 23:58:21 +08:00
Yushen CHEN
c279a2b7d5 Merge pull request #1256 from ZhikangNiu/main
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 20:00:54 +08:00
ZhikangNiu
5d473e980c add tqdm in convert text to pinyin 2026-01-22 16:34:39 +08:00
ZhikangNiu
2aefa7c5f7 fix many tensorboard writer and only log in main_process 2026-01-22 13:36:09 +08:00
ZhikangNiu
97fdc7fbb4 change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio 2026-01-22 12:27:23 +08:00
SWivid
1d2f7c5389 Formatting 2026-01-21 13:42:19 +00:00
Yushen CHEN
37a2633d35 Fix epoch updates count logic as drop_last 2026-01-21 21:36:57 +08:00
Raivis Dejus
bc1df7a4fa Adding support for hf:// links on CLI (#1252) 2026-01-21 16:38:30 +08:00
Raivis Dejus
eca786ee0c Merge pull request #1250 from raivisdejus/add-latvian-community-model
Adding Latvian model to shared community models list
2026-01-17 19:29:23 +08:00
Yushen CHEN
27e20fcf39 Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain
Fix speech editing boundary artifacts by working in mel domain
2025-12-26 17:35:57 +08:00
acadarmeria
dff57ebd2a Fix speech editing boundary artifacts by working in mel domain
Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio.

This commit refactors the approach to work in mel domain:
- Compute mel spectrogram on the clean original audio first
- Insert zero frames in mel domain instead of zero samples in wav domain
- Use frame-level granularity throughout for consistency

Benefits:
- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries

Changes to speech_edit.py (lines 148-220):
- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
2025-12-26 08:49:57 +00:00
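A minimal sketch of the mel-domain approach described in the commit above, not the actual speech_edit.py code. It uses NumPy arrays in place of model tensors, and the function name `build_mel_cond`, its parameters, and the default `sample_rate`/`hop_length` values are assumptions for illustration only. It assumes the edit spans are sorted and non-overlapping.

```python
import numpy as np


def build_mel_cond(mel, parts_to_edit, fix_durations, sample_rate=24000, hop_length=256):
    """Replace edited spans of a mel spectrogram with zero frames.

    mel            : array of shape (n_frames, n_mels), computed on the clean audio
    parts_to_edit  : list of (start_sec, end_sec) spans to replace (sorted, non-overlapping)
    fix_durations  : list of replacement durations in seconds, one per span
    Returns (mel_cond, keep_mask); keep_mask is True where original frames are kept.
    """
    n_frames, n_mels = mel.shape
    frames_per_sec = sample_rate / hop_length
    pieces, masks = [], []
    offset = 0  # first original frame not yet copied

    for (start_s, end_s), dur_s in zip(parts_to_edit, fix_durations):
        # Convert every time value to a frame index first (frame-level granularity).
        start_f = min(round(start_s * frames_per_sec), n_frames)
        end_f = min(round(end_s * frames_per_sec), n_frames)
        gap_f = round(dur_s * frames_per_sec)

        # Keep the original frames before the edited span.
        pieces.append(mel[offset:start_f])
        masks.append(np.ones(start_f - offset, dtype=bool))

        # Insert zero frames in the mel domain instead of zero samples in the waveform.
        pieces.append(np.zeros((gap_f, n_mels), dtype=mel.dtype))
        masks.append(np.zeros(gap_f, dtype=bool))

        offset = end_f

    # Keep whatever original audio follows the last edited span.
    pieces.append(mel[offset:])
    masks.append(np.ones(n_frames - offset, dtype=bool))

    return np.concatenate(pieces, axis=0), np.concatenate(masks)
```

Because the zero frames are inserted after the mel spectrogram is computed, no analysis window ever straddles real audio and inserted zeros, which is the source of the boundary artifacts the commit message describes.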
SWivid
46ccc575c5 v1.1.15 workaround for gr.Accordion default open=False bug (#1239) 1.1.15 2025-12-21 15:06:44 +08:00
SWivid
39617fcf7a v1.1.12 bump gradio from 5.0 to 6.0, several fixes to ensure compatibility with new gradio version 1.1.12 2025-12-20 18:44:43 +08:00
Yushen Chen
5b82f97c26 fix #1239, use gradio>=6.0; add more clear instruction for ffmpeg installation (#1234) 2025-12-20 16:08:13 +08:00
SWivid
9ae46c8360 Replace jieba pkg with rjieba - a jieba-rs Python binding 1.1.10 2025-11-28 13:08:07 +00:00
SWivid
3eecd94baa support back avg upsampling for batch, cover up non-mask case 2025-11-09 11:56:03 +00:00
SWivid
d9a69452ce formatting 2025-11-09 18:25:30 +08:00
Yushen CHEN
bc15df2b57 Merge pull request #1212 from QingyuLiu0521/fix/AverageUpsampling
Fix Average Upsampling conflict logic, introduced from the previous batch inference fix.
2025-11-09 18:23:38 +08:00
QingyuLiu0521
9b2357a1b9 Fix Average Upsampling 2025-11-08 18:39:06 -05:00
Yushen CHEN
1dcb4e10f7 Add torchcodec dependency to pyproject.toml 2025-11-03 16:44:11 +08:00
SWivid
529d856133 clean-up eval scripts 2025-10-27 14:38:57 +00:00
SWivid
7abadc4c72 fix typo in eval scripts 2025-10-26 14:28:17 +00:00
SWivid
e67d50841e runtime trtllm: fix batch inference skipping last words in shorter sentences #1039 #1179 2025-10-24 09:12:08 +00:00
SWivid
6b07fb03b2 clean-up ruff lint 2025-10-24 08:30:55 +00:00
SWivid
a051a68552 pytorch imple. fix batch inference skipping last words in shorter sentences issue #1039 #1179 2025-10-24 05:50:25 +00:00
Yushen CHEN
f2a4f8581f Update runtime README 2025-10-22 08:37:32 +08:00
SWivid
a17c5ae435 pytorch imple.: fix batch 1 inference from last commit 2025-10-22 00:31:56 +00:00
SWivid
a0b8fb5df2 runtime trtllm: minor fixes. pytorch: update text_embedding logic to correct v0 batching. 2025-10-22 00:19:45 +00:00
SWivid
c8bfc3aa3d runtime trtllm: support v1 and custom 2025-10-21 22:02:25 +00:00
SWivid
8d3ec72159 runtime trtllm: clean-up v0 code, several fixes. 2025-10-20 10:30:58 +00:00
SWivid
65ada48a62 set attn related default value for unet-t backbone: #1192 2025-10-09 06:51:25 +00:00
SWivid
77d3ec623b v1.1.9 1.1.9 2025-09-13 13:42:33 +08:00
SWivid
186799d6dc remove numpy<=1.26.4 for python_version>=3.11 #1162; update links 2025-09-13 13:40:55 +08:00
Yushen CHEN
31bb78f2ab Update badge links 2025-09-03 15:12:24 +08:00
SWivid
e61824009a v1.1.8 1.1.8 2025-08-28 12:33:37 +00:00
SWivid
06a74910bd add option for text embedding late average upsampling 2025-08-28 11:46:11 +00:00
Yushen CHEN
ac3c43595c delete .github/workflows/sync-hf.yaml for online space stability 2025-08-27 06:52:18 +08:00
Jim
605fa13b42 Fix raw.arrow missing rows (#1145)
Co-authored-by: SWivid <swivid@qq.com>
2025-07-22 19:38:44 +08:00
Yushen CHEN
5f35f27230 update pyproject.toml 2025-07-15 17:28:41 +08:00
Yushen CHEN
c96c3aeed8 Update pyproject.toml 1.1.7 2025-07-14 14:36:26 +08:00
Yushen CHEN
9b60fe6a34 update pyproject.toml, set gradio<=5.35.0 until fix #1126 2025-07-14 14:29:19 +08:00
SWivid
a275798a2f last fix patch-1 2025-07-08 18:44:47 +08:00