SWivid
5005714c4c
Remove pydantic<=2.10.6 restriction to fit latest gradio version
2026-03-07 19:37:59 +08:00
Yushen CHEN
4533426c72
Bump version from 1.1.16 to 1.1.17
1.1.17
2026-03-04 19:34:03 +08:00
Zhikang Niu-SII
b5ab1afa16
Merge pull request #1270 from ZhikangNiu/main
- Use fused=True for AdamW by default
- Warn on torch attention mask memory usage `if attn_backend == "torch" and attn_mask_enabled`
---------
Co-authored-by: SWivid <swivid@qq.com>
2026-03-04 19:31:52 +08:00
Yushen CHEN
ab75dc2837
Merge pull request #1271 from mlxu995/patch-1
Add show_info parameter to preprocess_ref_audio_text
2026-03-04 18:56:46 +08:00
Menglong Xu
4361b0b94f
Add show_info parameter to preprocess_ref_audio_text
2026-03-04 17:04:20 +08:00
Yushen CHEN
097772c917
Merge pull request #1269 from ZhikangNiu/main
feat: add mmdit flash attn support
fix: use autocast when flash_attn is enabled so that log_sample works
2026-02-27 01:20:09 +08:00
ZhikangNiu
76c00b127e
When using flash_attn, log_sample should run under an autocast context
2026-02-25 08:10:00 +08:00
ZhikangNiu
d7c7a117fa
feat: add mmdit flash attn support
2026-02-23 20:39:35 +08:00
Yushen CHEN
54c50eb8f6
Bump version from 1.1.15 to 1.1.16
1.1.16
2026-02-16 12:37:19 +08:00
Yushen CHEN
65250152da
Merge pull request #1267 from QingyuLiu0521/qyl/pr-dit-only
Optimize DiT text embedding with batched per-sample seq handling
2026-02-16 12:28:17 +08:00
QingyuLiu0521
c817d6a21d
Unify seq_len naming in DiT get_input_embed
2026-02-15 23:24:11 -05:00
Yushen CHEN
04459f71e6
Merge pull request #1266 from ZhikangNiu/main
Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
2026-02-16 11:10:17 +08:00
QingyuLiu0521
57dc698c16
Apply ruff formatting
2026-02-15 21:41:17 -05:00
QingyuLiu0521
6b6ce47d2e
Optimize DiT text embedding with batched per-sample seq handling
2026-02-15 21:31:19 -05:00
ZhikangNiu
6768b1bcff
Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-16 10:05:02 +08:00
Yushen CHEN
ecfdccb890
Merge pull request #1265 from ZhikangNiu/main
Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.
2026-02-14 11:14:57 +08:00
ZhikangNiu
bb5526fc5b
Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.
2026-02-14 11:05:08 +08:00
SWivid
655fbca552
Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0
2026-02-02 17:48:16 +08:00
Yushen CHEN
fc0fa67a03
Update eval README with ctranslate2 installation instructions
Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.
2026-01-28 19:15:57 +08:00
Yushen CHEN
a3c2ea9784
Merge pull request #1261 from ZhikangNiu/main
Ignore padding at the end of the GT mel spectrogram when training sample
2026-01-26 18:58:47 +08:00
ZhikangNiu
d71a69d528
Ignore padding at the end of the ground truth mel spectrogram when training sample
2026-01-26 09:40:37 +08:00
Yushen CHEN
b9d923088c
Increase default max_duration from 4096 to 65536 in cfm.py (#1260)
2026-01-25 23:58:21 +08:00
Yushen CHEN
c279a2b7d5
Merge pull request #1256 from ZhikangNiu/main
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 20:00:54 +08:00
ZhikangNiu
5d473e980c
add tqdm in convert text to pinyin
2026-01-22 16:34:39 +08:00
ZhikangNiu
2aefa7c5f7
fix multiple tensorboard writers and only log in main_process
2026-01-22 13:36:09 +08:00
ZhikangNiu
97fdc7fbb4
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 12:27:23 +08:00
SWivid
1d2f7c5389
Formatting
2026-01-21 13:42:19 +00:00
Yushen CHEN
37a2633d35
Fix epoch updates count logic as drop_last
2026-01-21 21:36:57 +08:00
Raivis Dejus
bc1df7a4fa
Adding support for hf:// links on CLI (#1252)
2026-01-21 16:38:30 +08:00
Raivis Dejus
eca786ee0c
Merge pull request #1250 from raivisdejus/add-latvian-community-model
Adding Latvian model to shared community models list
2026-01-17 19:29:23 +08:00
Yushen CHEN
27e20fcf39
Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain
Fix speech editing boundary artifacts by working in mel domain
2025-12-26 17:35:57 +08:00
acadarmeria
dff57ebd2a
Fix speech editing boundary artifacts by working in mel domain
Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio.
This commit refactors the approach to work in mel domain:
- Compute mel spectrogram on the clean original audio first
- Insert zero frames in mel domain instead of zero samples in wav domain
- Use frame-level granularity throughout for consistency
Benefits:
- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries
Changes to speech_edit.py (lines 148-220):
- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
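The mel-domain approach described above can be sketched roughly as follows. This is not the code from speech_edit.py: the function name `build_mel_cond`, the `keep_mask` return value, and the default `sample_rate`/`hop_length` values are assumptions made for illustration, and plain Python lists stand in for the tensors the real script would use.

```python
def build_mel_cond(mel, parts_to_edit, fix_durations,
                   sample_rate=24000, hop_length=256):
    """Hypothetical helper: splice zero frames into a mel spectrogram.

    mel           : list of frames, each a list of n_mels floats,
                    computed on the clean, unmodified audio
    parts_to_edit : list of (start_sec, end_sec) spans to regenerate
    fix_durations : target length in seconds for each regenerated span
    Returns (mel_cond, keep_mask); keep_mask[i] is True for frames copied
    from the original and False for zero frames to be generated.
    """
    n_mels = len(mel[0])
    frames_per_sec = sample_rate / hop_length
    mel_cond, keep_mask = [], []
    cursor = 0  # next original frame that has not been consumed yet

    for (start, end), fixed in zip(parts_to_edit, fix_durations):
        # All timing is resolved at frame granularity first.
        start_f = round(start * frames_per_sec)
        end_f = round(end * frames_per_sec)
        gap_f = round(fixed * frames_per_sec)

        kept = mel[cursor:start_f]                 # untouched original frames
        mel_cond.extend(kept)
        keep_mask.extend([True] * len(kept))

        mel_cond.extend([0.0] * n_mels for _ in range(gap_f))  # zero frames
        keep_mask.extend([False] * gap_f)
        cursor = end_f                             # skip the replaced span

    tail = mel[cursor:]
    mel_cond.extend(tail)
    keep_mask.extend([True] * len(tail))
    return mel_cond, keep_mask
```

Because every boundary is a whole frame index, any sample count that is needed later can be derived as `len(mel_cond) * hop_length`, so no analysis window ever straddles inserted zeros and real audio.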
2025-12-26 08:49:57 +00:00
SWivid
46ccc575c5
v1.1.15 workaround for gr.Accordion default open=False bug (#1239)
1.1.15
2025-12-21 15:06:44 +08:00
SWivid
39617fcf7a
v1.1.12 bump gradio from 5.0 to 6.0, several fixes to ensure compatibility with new gradio version
1.1.12
2025-12-20 18:44:43 +08:00
Yushen Chen
5b82f97c26
fix #1239, use gradio>=6.0; add clearer instructions for ffmpeg installation (#1234)
2025-12-20 16:08:13 +08:00
SWivid
9ae46c8360
Replace jieba pkg with rjieba - a jieba-rs Python binding
1.1.10
2025-11-28 13:08:07 +00:00
SWivid
3eecd94baa
support back avg upsampling for batch, cover up non-mask case
2025-11-09 11:56:03 +00:00
SWivid
d9a69452ce
formatting
2025-11-09 18:25:30 +08:00
Yushen CHEN
bc15df2b57
Merge pull request #1212 from QingyuLiu0521/fix/AverageUpsampling
Fix Average Upsampling conflict logic, introduced from the previous batch inference fix.
2025-11-09 18:23:38 +08:00
QingyuLiu0521
9b2357a1b9
Fix Average Upsampling
2025-11-08 18:39:06 -05:00
Yushen CHEN
1dcb4e10f7
Add torchcodec dependency to pyproject.toml
2025-11-03 16:44:11 +08:00
SWivid
529d856133
clean-up eval scripts
2025-10-27 14:38:57 +00:00
SWivid
7abadc4c72
fix typo in eval scripts
2025-10-26 14:28:17 +00:00
SWivid
e67d50841e
runtime trtllm: fix batch inference skipping last words in shorter sentences #1039 #1179
2025-10-24 09:12:08 +00:00
SWivid
6b07fb03b2
clean-up ruff lint
2025-10-24 08:30:55 +00:00
SWivid
a051a68552
pytorch imple. fix batch inference skipping last words in shorter sentences issue #1039 #1179
2025-10-24 05:50:25 +00:00
Yushen CHEN
f2a4f8581f
Update runtime README
2025-10-22 08:37:32 +08:00
SWivid
a17c5ae435
pytorch imple.: fix batch 1 inference from last commit
2025-10-22 00:31:56 +00:00
SWivid
a0b8fb5df2
runtime trtllm: minor fixes. pytorch: update text_embedding logic to correct v0 batching.
2025-10-22 00:19:45 +00:00
SWivid
c8bfc3aa3d
runtime trtllm: support v1 and custom
2025-10-21 22:02:25 +00:00