678 Commits

Author SHA1 Message Date
SWivid
5005714c4c Remove pydantic<=2.10.6 restriction to fit latest gradio version 2026-03-07 19:37:59 +08:00
Yushen CHEN
4533426c72 Bump version from 1.1.16 to 1.1.17 1.1.17 2026-03-04 19:34:03 +08:00
Zhikang Niu-SII
b5ab1afa16 Merge pull request #1270 from ZhikangNiu/main
- Use fused=True for AdamW by default

- Warn on torch attention mask memory usage `if attn_backend == "torch" and attn_mask_enabled`

---------

Co-authored-by: SWivid <swivid@qq.com>
2026-03-04 19:31:52 +08:00
Yushen CHEN
ab75dc2837 Merge pull request #1271 from mlxu995/patch-1
Add show_info parameter to preprocess_ref_audio_text
2026-03-04 18:56:46 +08:00
Menglong Xu
4361b0b94f Add show_info parameter to preprocess_ref_audio_text 2026-03-04 17:04:20 +08:00
Yushen CHEN
097772c917 Merge pull request #1269 from ZhikangNiu/main
feat:add mmdit flash attn support
fix: autocast when use flash_attn to enable log_sample
2026-02-27 01:20:09 +08:00
ZhikangNiu
76c00b127e when use flash_attn, log_sample should under autocast context 2026-02-25 08:10:00 +08:00
ZhikangNiu
d7c7a117fa feat:add mmdit flash attn support 2026-02-23 20:39:35 +08:00
Yushen CHEN
54c50eb8f6 Bump version from 1.1.15 to 1.1.16 1.1.16 2026-02-16 12:37:19 +08:00
Yushen CHEN
65250152da Merge pull request #1267 from QingyuLiu0521/qyl/pr-dit-only
Optimize DiT text embedding with batched per-sample seq handling
2026-02-16 12:28:17 +08:00
QingyuLiu0521
c817d6a21d Unify seq_len naming in DiT get_input_embed 2026-02-15 23:24:11 -05:00
Yushen CHEN
04459f71e6 Merge pull request #1266 from ZhikangNiu/main
Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
2026-02-16 11:10:17 +08:00
QingyuLiu0521
57dc698c16 Apply ruff formatting 2026-02-15 21:41:17 -05:00
QingyuLiu0521
6b6ce47d2e Optimize DiT text embedding with batched per-sample seq handling 2026-02-15 21:31:19 -05:00
ZhikangNiu
6768b1bcff Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-16 10:05:02 +08:00
Yushen CHEN
ecfdccb890 Merge pull request #1265 from ZhikangNiu/main
Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.
2026-02-14 11:14:57 +08:00
ZhikangNiu
bb5526fc5b Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage. 2026-02-14 11:05:08 +08:00
SWivid
655fbca552 Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0 2026-02-02 17:48:16 +08:00
Yushen CHEN
fc0fa67a03 Update eval README with ctranslate2 installation instructions
Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.
2026-01-28 19:15:57 +08:00
Yushen CHEN
a3c2ea9784 Merge pull request #1261 from ZhikangNiu/main
Ignore padding at the end of the GT mel spectrogram when training sample
2026-01-26 18:58:47 +08:00
ZhikangNiu
d71a69d528 Ignore padding at the end of the ground truth mel spectrogram when training sample 2026-01-26 09:40:37 +08:00
Yushen CHEN
b9d923088c Increase default max_duration from 4096 to 65536 in cfm.py (#1260) 2026-01-25 23:58:21 +08:00
Yushen CHEN
c279a2b7d5 Merge pull request #1256 from ZhikangNiu/main
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 20:00:54 +08:00
ZhikangNiu
5d473e980c add tqdm in convert text to pinyin 2026-01-22 16:34:39 +08:00
ZhikangNiu
2aefa7c5f7 fix many tensorboard writer and only log in main_process 2026-01-22 13:36:09 +08:00
ZhikangNiu
97fdc7fbb4 change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio 2026-01-22 12:27:23 +08:00
SWivid
1d2f7c5389 Formatting 2026-01-21 13:42:19 +00:00
Yushen CHEN
37a2633d35 Fix epoch updates count logic as drop_last 2026-01-21 21:36:57 +08:00
Raivis Dejus
bc1df7a4fa Adding support for hf:// links on CLI (#1252) 2026-01-21 16:38:30 +08:00
Raivis Dejus
eca786ee0c Merge pull request #1250 from raivisdejus/add-latvian-community-model
Adding Latvian model to shared community models list
2026-01-17 19:29:23 +08:00
Yushen CHEN
27e20fcf39 Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain
Fix speech editing boundary artifacts by working in mel domain
2025-12-26 17:35:57 +08:00
acadarmeria
dff57ebd2a Fix speech editing boundary artifacts by working in mel domain
Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio.

This commit refactors the approach to work in mel domain:
- Compute mel spectrogram on the clean original audio first
- Insert zero frames in mel domain instead of zero samples in wav domain
- Use frame-level granularity throughout for consistency

Benefits:
- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries

Changes to speech_edit.py (lines 148-220):
- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
2025-12-26 08:49:57 +00:00
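
The mel-domain approach described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in speech_edit.py: the function name `build_mel_cond`, the `(start_s, end_s, new_dur_s)` edit tuples, the mask convention (True = keep original frame) and the default `sample_rate` / `hop_length` values are all assumptions made for the example, and plain Python lists stand in for tensors.

```python
def build_mel_cond(mel, edits, sample_rate=24000, hop_length=256):
    """Hypothetical helper: insert zero frames in the mel domain.

    mel   : list of frames, each frame a list of n_mels floats,
            computed from the clean, unedited audio.
    edits : sorted, non-overlapping (start_s, end_s, new_dur_s) tuples;
            the span [start_s, end_s) is replaced by new_dur_s seconds
            of zero frames.
    Returns (mel_cond, edit_mask); edit_mask[i] is True where the frame
    is kept from the original and False where it is to be regenerated.
    """
    n_mels = len(mel[0]) if mel else 0
    frames_per_sec = sample_rate / hop_length

    mel_cond, edit_mask = [], []
    cursor = 0  # next original frame that has not been copied yet
    for start_s, end_s, new_dur_s in edits:
        # Convert every time value to a frame index first, so all
        # later arithmetic happens at frame-level granularity.
        start_f = round(start_s * frames_per_sec)
        end_f = round(end_s * frames_per_sec)
        gap_f = round(new_dur_s * frames_per_sec)

        # Copy the untouched original frames before the edit.
        kept = mel[cursor:start_f]
        mel_cond.extend(list(frame) for frame in kept)
        edit_mask.extend([True] * len(kept))

        # Insert zero frames (not zero samples) for the edited span.
        mel_cond.extend([0.0] * n_mels for _ in range(gap_f))
        edit_mask.extend([False] * gap_f)

        cursor = end_f

    # Copy whatever original audio remains after the last edit.
    tail = mel[cursor:]
    mel_cond.extend(list(frame) for frame in tail)
    edit_mask.extend([True] * len(tail))
    return mel_cond, edit_mask


# Toy usage: 10 frames, 10 frames per second, replace 0.3s-0.5s with 0.4s.
toy_mel = [[float(i), float(i)] for i in range(10)]
cond, mask = build_mel_cond(
    toy_mel, [(0.3, 0.5, 0.4)], sample_rate=100, hop_length=10
)
```

Because the mel spectrogram is computed before any zeros are introduced, no analysis window ever straddles real audio and inserted silence, which is the boundary artifact the commit message describes.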
SWivid
46ccc575c5 v1.1.15 workaround for gr.Accordion default open=False bug (#1239) 1.1.15 2025-12-21 15:06:44 +08:00
SWivid
39617fcf7a v1.1.12 bump gradio from 5.0 to 6.0, several fixes to ensure compatibility with new gradio version 1.1.12 2025-12-20 18:44:43 +08:00
Yushen Chen
5b82f97c26 fix #1239, use gradio>=6.0; add more clear instruction for ffmpeg installation (#1234) 2025-12-20 16:08:13 +08:00
SWivid
9ae46c8360 Replace jieba pkg with rjieba - a jieba-rs Python binding 1.1.10 2025-11-28 13:08:07 +00:00
SWivid
3eecd94baa support back avg upsampling for batch, cover up non-mask case 2025-11-09 11:56:03 +00:00
SWivid
d9a69452ce formatting 2025-11-09 18:25:30 +08:00
Yushen CHEN
bc15df2b57 Merge pull request #1212 from QingyuLiu0521/fix/AverageUpsampling
Fix Average Upsampling conflict logic, introduced from the previous batch inference fix.
2025-11-09 18:23:38 +08:00
QingyuLiu0521
9b2357a1b9 Fix Average Upsampling 2025-11-08 18:39:06 -05:00
Yushen CHEN
1dcb4e10f7 Add torchcodec dependency to pyproject.toml 2025-11-03 16:44:11 +08:00
SWivid
529d856133 clean-up eval scripts 2025-10-27 14:38:57 +00:00
SWivid
7abadc4c72 fix typo in eval scripts 2025-10-26 14:28:17 +00:00
SWivid
e67d50841e runtime trtllm: fix batch inference skipping last words in shorter sentences #1039 #1179 2025-10-24 09:12:08 +00:00
SWivid
6b07fb03b2 clean-up ruff lint 2025-10-24 08:30:55 +00:00
SWivid
a051a68552 pytorch imple. fix batch inference skipping last words in shorter sentences issue #1039 #1179 2025-10-24 05:50:25 +00:00
Yushen CHEN
f2a4f8581f Update runtime README 2025-10-22 08:37:32 +08:00
SWivid
a17c5ae435 pytorch imple.: fix batch 1 inference from last commit 2025-10-22 00:31:56 +00:00
SWivid
a0b8fb5df2 runtime trtllm: minor fixes. pytorch: update text_embedding logic to correct v0 batching. 2025-10-22 00:19:45 +00:00
SWivid
c8bfc3aa3d runtime trtllm: support v1 and custom 2025-10-21 22:02:25 +00:00