696 Commits

Author SHA1 Message Date
Yushen CHEN 2ae2c9bd9b Merge pull request #1298 from Kaihui-AMD/docs/update-amd-rocm-install
Update AMD ROCm install to 7.2 for RDNA 3.5/4 support
2026-05-18 20:46:11 +08:00
Kaihui-AMD 0556b2c1a5 update AMD ROCm install to 7.2 for RDNA 3.5/4 support
Signed-off-by: Kaihui-AMD <Kaihui.Tang@amd.com>
2026-05-18 16:39:43 +08:00
Yushen CHEN 2f53ded68e Merge pull request #1294 from AAtomical/fix/path-traversal-finetune-gradio
fix: path traversal in finetune Gradio handlers
2026-05-13 10:06:36 +08:00
sysy 25dc4e8668 fix: path traversal in finetune Gradio handlers (closes #1293)
Add _safe_project_path() helper that rejects absolute paths, null bytes,
and path separators, then verifies the resolved path stays within the
intended base directory via realpath + startswith check.

Apply to all 10 sinks in save_settings, load_settings,
create_data_project, vocab_extend, transcribe_all, create_metadata,
and related functions.
2026-05-12 16:04:28 -04:00
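The validation described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's actual helper: the function name `safe_project_path`, its signature, and the error messages are assumptions, and the trailing-separator prefix is an added precaution so that a sibling directory such as `projects_evil` does not pass a plain `startswith` check against `projects`.

```python
import os


def safe_project_path(base_dir: str, name: str) -> str:
    """Hypothetical sketch of the described checks; not the repository's code."""
    # Reject empty names and embedded null bytes up front.
    if not name or "\x00" in name:
        raise ValueError("empty name or null byte in project name")
    # Reject absolute paths.
    if os.path.isabs(name):
        raise ValueError("absolute paths are not allowed")
    # Reject any path separator (both styles, as a conservative assumption).
    if "/" in name or "\\" in name or os.sep in name:
        raise ValueError("path separators are not allowed")

    # Resolve symlinks and '..', then confirm the result stays inside base_dir.
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    prefix = base if base.endswith(os.sep) else base + os.sep
    if not candidate.startswith(prefix):
        raise ValueError("resolved path escapes the base directory")
    return candidate
```

A name such as `my_project` resolves to a path under the base directory, while `..`, `../etc`, `/etc/passwd`, and `a/b` all raise `ValueError`.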
SWivid 6f91022519 v1.1.20: refactor cache handling in DiT, MMDiT, and UNetT classes (lazyinit), to fix training bug (EMA deepcopy failure) 1.1.20 2026-04-20 15:28:10 +08:00
Yushen CHEN 650c177b14 Merge pull request #1290 from will422-l/fix/gradio-version-cap
fix: cap Gradio version to <6.11 to prevent UI freeze
Fixes #1289 thanks to @gmtrnv
2026-04-17 12:01:39 +08:00
will422-l 761643b1ff fix: cap Gradio version to <6.11 to prevent UI freeze
Gradio 6.11.0 has a bug in gr.Tabs that causes the UI to completely
freeze when switching tabs. This caps the version to prevent users
from hitting this issue until Gradio fixes it upstream.

Ref: #1289
2026-04-17 09:11:15 +08:00
Yushen CHEN 25874ca255 Bump version from 1.1.18 to 1.1.19 1.1.19 2026-04-16 10:56:40 +08:00
SWivid 428050aa80 Fixes #1287 #1288 2026-04-16 10:49:31 +08:00
Yushen CHEN 22299b38f7 Merge pull request #1285 from ZhikangNiu/main
reuse resamplers and cache vocos MelSpectrogram instances
2026-04-04 16:50:43 +08:00
ZhikangNiu 5486a158d4 Reuse resamplers and cache vocos MelSpectrogram instances to reduce training cost 2026-04-04 14:31:00 +08:00
Yushen CHEN 82fc4fe622 Bump version from 1.1.17 to 1.1.18 1.1.18 2026-03-24 20:20:15 +08:00
SWivid 1a63dda3df Several fixes for utils_infer.py; separate streaming and non-streaming func and add back parallelism 2026-03-24 20:03:01 +08:00
Yushen CHEN 2414e3d492 Merge pull request #1281 from zhuxiaoxuhit/fix/remove-ineffective-threadpoolexecutor
remove ineffective ThreadPoolExecutor in infer_batch_process
2026-03-24 19:31:29 +08:00
zhuxiaoxu 543fe4facf remove ineffective ThreadPoolExecutor in infer_batch_process
process_batch is a generator function, so submitting it to a thread pool
only creates a generator object without running any inference code.
All actual work happens sequentially in the main thread when next() is called.
Also removes the always-true 'if result:' guard on a generator object.
2026-03-24 16:10:46 +08:00
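The behaviour described in the commit above can be reproduced with a small stand-alone sketch. The `process_batch` below is a hypothetical stand-in generator, not the project's function; it only records which thread runs its body.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

ran_in = []  # thread ids recorded when the generator body actually starts


def process_batch(items):
    """Stand-in generator (hypothetical); the real one would run inference."""
    ran_in.append(threading.get_ident())
    for item in items:
        yield item * 2


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(process_batch, [1, 2, 3])
    result = future.result()  # returns at once with a generator object

# Nothing has executed yet: calling a generator function only builds the object.
started_in_pool = list(ran_in)

# A generator object is always truthy, so an 'if result:' guard never filters.
guard_passes = bool(result)

# The body runs only now, during iteration, in the consuming thread.
outputs = list(result)
ran_in_caller = ran_in == [threading.get_ident()]
```

Here `started_in_pool` stays empty and the recorded thread id matches the caller's, which is why the pool added no parallelism.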
Yushen CHEN deb5540edb Merge pull request #1280 from ZhikangNiu/main
Add F5TTS v1 Small + LibriTTS training config
2026-03-23 20:14:40 +08:00
ZhikangNiu a25de67cbd F5TTS v1 Small + LibriTTS training config 2026-03-23 16:16:52 +08:00
Karim Ouda 623c96c294 Add Arabic model details to SHARED.md (#1279)
Added Arabic section with details for F5-TTS Small model.
2026-03-16 16:13:45 +08:00
SWivid 5005714c4c Remove pydantic<=2.10.6 restriction to fit latest gradio version 2026-03-07 19:37:59 +08:00
Yushen CHEN 4533426c72 Bump version from 1.1.16 to 1.1.17 1.1.17 2026-03-04 19:34:03 +08:00
Zhikang Niu-SII b5ab1afa16 Merge pull request #1270 from ZhikangNiu/main
- Use fused=True for AdamW by default

- Warn on torch attention mask memory usage `if attn_backend == "torch" and attn_mask_enabled`

---------

Co-authored-by: SWivid <swivid@qq.com>
2026-03-04 19:31:52 +08:00
Yushen CHEN ab75dc2837 Merge pull request #1271 from mlxu995/patch-1
Add show_info parameter to preprocess_ref_audio_text
2026-03-04 18:56:46 +08:00
Menglong Xu 4361b0b94f Add show_info parameter to preprocess_ref_audio_text 2026-03-04 17:04:20 +08:00
Yushen CHEN 097772c917 Merge pull request #1269 from ZhikangNiu/main
feat: add mmdit flash attn support
fix: use autocast with flash_attn to enable log_sample
2026-02-27 01:20:09 +08:00
ZhikangNiu 76c00b127e when using flash_attn, log_sample should run under autocast context 2026-02-25 08:10:00 +08:00
ZhikangNiu d7c7a117fa feat: add mmdit flash attn support 2026-02-23 20:39:35 +08:00
Yushen CHEN 54c50eb8f6 Bump version from 1.1.15 to 1.1.16 1.1.16 2026-02-16 12:37:19 +08:00
Yushen CHEN 65250152da Merge pull request #1267 from QingyuLiu0521/qyl/pr-dit-only
Optimize DiT text embedding with batched per-sample seq handling
2026-02-16 12:28:17 +08:00
QingyuLiu0521 c817d6a21d Unify seq_len naming in DiT get_input_embed 2026-02-15 23:24:11 -05:00
Yushen CHEN 04459f71e6 Merge pull request #1266 from ZhikangNiu/main
Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
2026-02-16 11:10:17 +08:00
QingyuLiu0521 57dc698c16 Apply ruff formatting 2026-02-15 21:41:17 -05:00
QingyuLiu0521 6b6ce47d2e Optimize DiT text embedding with batched per-sample seq handling 2026-02-15 21:31:19 -05:00
ZhikangNiu 6768b1bcff Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-16 10:05:02 +08:00
Yushen CHEN ecfdccb890 Merge pull request #1265 from ZhikangNiu/main
Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.
2026-02-14 11:14:57 +08:00
ZhikangNiu bb5526fc5b Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage. 2026-02-14 11:05:08 +08:00
SWivid 655fbca552 Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0 2026-02-02 17:48:16 +08:00
Yushen CHEN fc0fa67a03 Update eval README with ctranslate2 installation instructions
Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.
2026-01-28 19:15:57 +08:00
Yushen CHEN a3c2ea9784 Merge pull request #1261 from ZhikangNiu/main
Ignore padding at the end of the GT mel spectrogram when training sample
2026-01-26 18:58:47 +08:00
ZhikangNiu d71a69d528 Ignore padding at the end of the ground truth mel spectrogram when training sample 2026-01-26 09:40:37 +08:00
Yushen CHEN b9d923088c Increase default max_duration from 4096 to 65536 in cfm.py (#1260) 2026-01-25 23:58:21 +08:00
Yushen CHEN c279a2b7d5 Merge pull request #1256 from ZhikangNiu/main
change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio
2026-01-22 20:00:54 +08:00
ZhikangNiu 5d473e980c add tqdm progress bar when converting text to pinyin 2026-01-22 16:34:39 +08:00
ZhikangNiu 2aefa7c5f7 fix creation of multiple tensorboard writers and only log in main_process 2026-01-22 13:36:09 +08:00
ZhikangNiu 97fdc7fbb4 change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio 2026-01-22 12:27:23 +08:00
SWivid 1d2f7c5389 Formatting 2026-01-21 13:42:19 +00:00
Yushen CHEN 37a2633d35 Fix epoch updates count logic as drop_last 2026-01-21 21:36:57 +08:00
Raivis Dejus bc1df7a4fa Adding support for hf:// links on CLI (#1252) 2026-01-21 16:38:30 +08:00
Raivis Dejus eca786ee0c Merge pull request #1250 from raivisdejus/add-latvian-community-model
Adding Latvian model to shared community models list
2026-01-17 19:29:23 +08:00
Yushen CHEN 27e20fcf39 Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain
Fix speech editing boundary artifacts by working in mel domain
2025-12-26 17:35:57 +08:00
acadarmeria dff57ebd2a Fix speech editing boundary artifacts by working in mel domain
Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio.

This commit refactors the approach to work in mel domain:
- Compute mel spectrogram on the clean original audio first
- Insert zero frames in mel domain instead of zero samples in wav domain
- Use frame-level granularity throughout for consistency

Benefits:
- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries

Changes to speech_edit.py (lines 148-220):
- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
2025-12-26 08:49:57 +00:00
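The mel-domain approach described in the commit above can be sketched in plain Python, with a mel spectrogram modelled as a list of frames. This is an illustration under stated assumptions, not the code in speech_edit.py: the function names, the `(start_sec, end_sec, new_duration_sec)` edit format, the keep mask, and the default sample rate and hop length are all hypothetical.

```python
def time_to_frame(seconds, sample_rate=24000, hop_length=256):
    """Convert a time in seconds to a mel frame index (assumed defaults)."""
    return round(seconds * sample_rate / hop_length)


def frames_to_samples(frames, hop_length=256):
    """Convert a frame count back to a waveform sample count."""
    return frames * hop_length


def build_mel_cond(mel, edits, sample_rate=24000, hop_length=256):
    """Replace each edited region of `mel` with zero frames.

    mel:   list of frames, each a list of per-bin floats, from the clean audio
    edits: list of (start_sec, end_sec, new_duration_sec), sorted by start
    Returns (mel_cond, keep_mask); keep_mask is False where audio is regenerated.
    """
    n_mels = len(mel[0])
    mel_cond, keep_mask = [], []
    cursor = 0  # next original frame not yet copied
    for start, end, new_duration in edits:
        # All boundaries are computed at frame level first.
        start_f = time_to_frame(start, sample_rate, hop_length)
        end_f = time_to_frame(end, sample_rate, hop_length)
        gap_f = time_to_frame(new_duration, sample_rate, hop_length)

        kept = mel[cursor:start_f]  # untouched original frames
        mel_cond.extend(kept)
        keep_mask.extend([True] * len(kept))

        # Zero frames in the mel domain, instead of zero samples in the waveform.
        mel_cond.extend([[0.0] * n_mels for _ in range(gap_f)])
        keep_mask.extend([False] * gap_f)
        cursor = end_f

    tail = mel[cursor:]
    mel_cond.extend(tail)
    keep_mask.extend([True] * len(tail))
    return mel_cond, keep_mask
```

Because the original frames are computed before any zeros are inserted, no analysis window ever straddles real audio and silence, which is the source of the boundary artifacts the commit describes.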