F5-TTS

mirror of https://github.com/SWivid/F5-TTS.git synced 2026-06-12 10:01:15 -07:00

Author	SHA1	Message	Date
Yushen CHEN	2ae2c9bd9b	Merge pull request #1298 from Kaihui-AMD/docs/update-amd-rocm-install Update AMD ROCm install to 7.2 for RDNA 3.5/4 support	2026-05-18 20:46:11 +08:00
Kaihui-AMD	0556b2c1a5	update AMD ROCm install to 7.2 for RDNA 3.5/4 support Signed-off-by: Kaihui-AMD <Kaihui.Tang@amd.com>	2026-05-18 16:39:43 +08:00
Yushen CHEN	2f53ded68e	Merge pull request #1294 from AAtomical/fix/path-traversal-finetune-gradio fix: path traversal in finetune Gradio handlers	2026-05-13 10:06:36 +08:00
sysy	25dc4e8668	fix: path traversal in finetune Gradio handlers (closes #1293). Add _safe_project_path() helper that rejects absolute paths, null bytes, and path separators, then verifies the resolved path stays within the intended base directory via realpath + startswith check. Apply to all 10 sinks in save_settings, load_settings, create_data_project, vocab_extend, transcribe_all, create_metadata, and related functions.	2026-05-12 16:04:28 -04:00
SWivid	6f91022519	v1.1.20: refactor cache handling in DiT, MMDiT, and UNetT classes (lazyinit), to fix training bug (EMA deepcopy failure) 1.1.20	2026-04-20 15:28:10 +08:00
Yushen CHEN	650c177b14	Merge pull request #1290 from will422-l/fix/gradio-version-cap fix: cap Gradio version to <6.11 to prevent UI freeze Fixes #1289 thanks to @gmtrnv	2026-04-17 12:01:39 +08:00
will422-l	761643b1ff	fix: cap Gradio version to <6.11 to prevent UI freeze Gradio 6.11.0 has a bug in gr.Tabs that causes the UI to completely freeze when switching tabs. This caps the version to prevent users from hitting this issue until Gradio fixes it upstream. Ref: #1289	2026-04-17 09:11:15 +08:00
Yushen CHEN	25874ca255	Bump version from 1.1.18 to 1.1.19 1.1.19	2026-04-16 10:56:40 +08:00
SWivid	428050aa80	Fixes #1287 #1288	2026-04-16 10:49:31 +08:00
Yushen CHEN	22299b38f7	Merge pull request #1285 from ZhikangNiu/main reuse resamplers and cache vocos MelSpectrogram instances	2026-04-04 16:50:43 +08:00
ZhikangNiu	5486a158d4	reuse resamplers and cache vocos MelSpectrogram instances, it will reduce some training cost	2026-04-04 14:31:00 +08:00
Yushen CHEN	82fc4fe622	Bump version from 1.1.17 to 1.1.18 1.1.18	2026-03-24 20:20:15 +08:00
SWivid	1a63dda3df	Several fixes for utils_infer.py; separate streaming and non-streaming func and add back parallelism	2026-03-24 20:03:01 +08:00
Yushen CHEN	2414e3d492	Merge pull request #1281 from zhuxiaoxuhit/fix/remove-ineffective-threadpoolexecutor remove ineffective ThreadPoolExecutor in infer_batch_process	2026-03-24 19:31:29 +08:00
zhuxiaoxu	543fe4facf	remove ineffective ThreadPoolExecutor in infer_batch_process process_batch is a generator function, so submitting it to a thread pool only creates a generator object without running any inference code. all actual work happens sequentially in the main thread when next() is called. also removes the always-true 'if result:' guard on a generator object.	2026-03-24 16:10:46 +08:00
Yushen CHEN	deb5540edb	Merge pull request #1280 from ZhikangNiu/main Add F5TTS v1 Small + LibriTTS training config	2026-03-23 20:14:40 +08:00
ZhikangNiu	a25de67cbd	F5TTS v1 Small + LibriTTS training config	2026-03-23 16:16:52 +08:00
Karim Ouda	623c96c294	Add Arabic model details to SHARED.md (#1279). Added Arabic section with details for F5-TTS Small model.	2026-03-16 16:13:45 +08:00
SWivid	5005714c4c	Remove pydantic<=2.10.6 restriction to fit latest gradio version	2026-03-07 19:37:59 +08:00
Yushen CHEN	4533426c72	Bump version from 1.1.16 to 1.1.17 1.1.17	2026-03-04 19:34:03 +08:00
Zhikang Niu-SII	b5ab1afa16	Merge pull request #1270 from ZhikangNiu/main: use fused=True for AdamW by default; warn on torch attention mask memory usage `if attn_backend == "torch" and attn_mask_enabled`. Co-authored-by: SWivid <swivid@qq.com>	2026-03-04 19:31:52 +08:00
Yushen CHEN	ab75dc2837	Merge pull request #1271 from mlxu995/patch-1 Add show_info parameter to preprocess_ref_audio_text	2026-03-04 18:56:46 +08:00
Menglong Xu	4361b0b94f	Add show_info parameter to preprocess_ref_audio_text	2026-03-04 17:04:20 +08:00
Yushen CHEN	097772c917	Merge pull request #1269 from ZhikangNiu/main feat:add mmdit flash attn support fix: autocast when use flash_attn to enable log_sample	2026-02-27 01:20:09 +08:00
ZhikangNiu	76c00b127e	when use flash_attn, log_sample should under autocast context	2026-02-25 08:10:00 +08:00
ZhikangNiu	d7c7a117fa	feat:add mmdit flash attn support	2026-02-23 20:39:35 +08:00
Yushen CHEN	54c50eb8f6	Bump version from 1.1.15 to 1.1.16 1.1.16	2026-02-16 12:37:19 +08:00
Yushen CHEN	65250152da	Merge pull request #1267 from QingyuLiu0521/qyl/pr-dit-only Optimize DiT text embedding with batched per-sample seq handling	2026-02-16 12:28:17 +08:00
QingyuLiu0521	c817d6a21d	Unify seq_len naming in DiT get_input_embed	2026-02-15 23:24:11 -05:00
Yushen CHEN	04459f71e6	Merge pull request #1266 from ZhikangNiu/main Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults	2026-02-16 11:10:17 +08:00
QingyuLiu0521	57dc698c16	Apply ruff formatting	2026-02-15 21:41:17 -05:00
QingyuLiu0521	6b6ce47d2e	Optimize DiT text embedding with batched per-sample seq handling	2026-02-15 21:31:19 -05:00
ZhikangNiu	6768b1bcff	Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 10:05:02 +08:00
Yushen CHEN	ecfdccb890	Merge pull request #1265 from ZhikangNiu/main Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.	2026-02-14 11:14:57 +08:00
ZhikangNiu	bb5526fc5b	Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.	2026-02-14 11:05:08 +08:00
SWivid	655fbca552	Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0	2026-02-02 17:48:16 +08:00
Yushen CHEN	fc0fa67a03	Update eval README with ctranslate2 installation instructions Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.	2026-01-28 19:15:57 +08:00
Yushen CHEN	a3c2ea9784	Merge pull request #1261 from ZhikangNiu/main Ignore padding at the end of the GT mel spectrogram when training sample	2026-01-26 18:58:47 +08:00
ZhikangNiu	d71a69d528	Ignore padding at the end of the ground truth mel spectrogram when training sample	2026-01-26 09:40:37 +08:00
Yushen CHEN	b9d923088c	Increase default max_duration from 4096 to 65536 in cfm.py (#1260)	2026-01-25 23:58:21 +08:00
Yushen CHEN	c279a2b7d5	Merge pull request #1256 from ZhikangNiu/main change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio	2026-01-22 20:00:54 +08:00
ZhikangNiu	5d473e980c	add tqdm in convert text to pinyin	2026-01-22 16:34:39 +08:00
ZhikangNiu	2aefa7c5f7	fix many tensorboard writer and only log in main_process	2026-01-22 13:36:09 +08:00
ZhikangNiu	97fdc7fbb4	change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio	2026-01-22 12:27:23 +08:00
SWivid	1d2f7c5389	Formatting	2026-01-21 13:42:19 +00:00
Yushen CHEN	37a2633d35	Fix epoch updates count logic as drop_last	2026-01-21 21:36:57 +08:00
Raivis Dejus	bc1df7a4fa	Adding support for hf:// links on CLI (#1252)	2026-01-21 16:38:30 +08:00
Raivis Dejus	eca786ee0c	Merge pull request #1250 from raivisdejus/add-latvian-community-model Adding Latvian model to shared community models list	2026-01-17 19:29:23 +08:00
Yushen CHEN	27e20fcf39	Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain Fix speech editing boundary artifacts by working in mel domain	2025-12-26 17:35:57 +08:00
acadarmeria	dff57ebd2a	Fix speech editing boundary artifacts by working in mel domain. Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio. This commit refactors the approach to work in mel domain: (1) compute mel spectrogram on the clean original audio first; (2) insert zero frames in mel domain instead of zero samples in wav domain; (3) use frame-level granularity throughout for consistency. Benefits: eliminates boundary artifacts; more consistent behavior regardless of small float variations in input times; cleaner edit boundaries. Changes to speech_edit.py (lines 148-220): convert audio to mel using model.mel_spec() before editing; build mel_cond by concatenating original mel frames + zero frames; calculate all time-based values at frame level first, then convert to samples; pass mel_cond directly to model.sample() instead of raw audio.	2025-12-26 08:49:57 +00:00
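
Note on commit 543fe4facf: the message explains that submitting a generator function to a thread pool only builds a generator object, and that the body runs later wherever the generator is iterated. The sketch below illustrates that behavior with a toy `process_batch`; it is a stand-in written for this note, not the project's function, and the `calls` list exists only to record when the body executes.

```python
from concurrent.futures import ThreadPoolExecutor

calls = []  # records each item as the generator body actually processes it


def process_batch(items):
    """Toy generator function (hypothetical stand-in, not the F5-TTS code)."""
    for item in items:
        calls.append(item)  # the "work" happens here
        yield item * 2


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(process_batch, [1, 2, 3])
    gen = future.result()  # returns immediately with a generator object

# Nothing has run yet: the pool only created the generator.
ran_in_pool = list(calls)

# A generator object is always truthy, so a guard like `if result:` never filters.
always_true = bool(gen)

# The body executes only now, sequentially, in the thread that iterates.
results = list(gen)
```

Under these assumptions the pool adds thread overhead without any parallelism, which is consistent with the commit's reason for removing it.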


696 Commits
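
Note on commit 25dc4e8668: the message describes a helper that rejects absolute paths, null bytes, and path separators, then confirms with realpath plus a startswith check that the resolved path stays inside the base directory. The following is a minimal sketch of that kind of check, written from the commit message alone. The function name `safe_project_path`, its signature, and its error messages are assumptions for illustration and may differ from the repository's `_safe_project_path()`.

```python
import os


def safe_project_path(base_dir: str, name: str) -> str:
    """Hypothetical sketch: resolve `name` under `base_dir` or raise ValueError."""
    if not name or "\x00" in name:
        raise ValueError("empty name or null byte")
    if os.path.isabs(name):
        raise ValueError("absolute paths are not allowed")
    if "/" in name or "\\" in name:
        raise ValueError("path separators are not allowed")

    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, name))

    # Compare against base + separator so that a sibling such as
    # "/data/projects_evil" does not pass a check for "/data/projects".
    if not target.startswith(base + os.sep):
        raise ValueError("path escapes the base directory")
    return target
```

A name such as `..` contains no separator and is not absolute, so it passes the early checks; it is the final containment check on the resolved path that rejects it, which is why both layers are useful.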