F5-TTS

mirror of https://github.com/SWivid/F5-TTS.git synced 2026-03-12 21:02:50 -07:00

Author	SHA1	Message	Date
Yushen CHEN	04459f71e6	Merge pull request #1266 from ZhikangNiu/main: Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults	2026-02-16 11:10:17 +08:00
ZhikangNiu	6768b1bcff	Make wandb project/run_name/resume_id configurable via Hydra yaml, backward compatible with defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 10:05:02 +08:00
Yushen CHEN	ecfdccb890	Merge pull request #1265 from ZhikangNiu/main: Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.	2026-02-14 11:14:57 +08:00
ZhikangNiu	bb5526fc5b	Use torch.utils.checkpoint in mmdit forward loop when enabled to reduce memory usage.	2026-02-14 11:05:08 +08:00
SWivid	655fbca552	Update run_asr_wer method in utils_eval.py for compat with jiwer>=4.0.0	2026-02-02 17:48:16 +08:00
Yushen CHEN	fc0fa67a03	Update eval README with ctranslate2 installation instructions. Added installation instructions for ctranslate2 based on CUDA and cuDNN versions.	2026-01-28 19:15:57 +08:00
Yushen CHEN	a3c2ea9784	Merge pull request #1261 from ZhikangNiu/main: Ignore padding at the end of the GT mel spectrogram when training sample	2026-01-26 18:58:47 +08:00
ZhikangNiu	d71a69d528	Ignore padding at the end of the ground truth mel spectrogram when training sample	2026-01-26 09:40:37 +08:00
Yushen CHEN	b9d923088c	Increase default max_duration from 4096 to 65536 in cfm.py (#1260)	2026-01-25 23:58:21 +08:00
Yushen CHEN	c279a2b7d5	Merge pull request #1256 from ZhikangNiu/main: change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio	2026-01-22 20:00:54 +08:00
ZhikangNiu	5d473e980c	add tqdm in convert text to pinyin	2026-01-22 16:34:39 +08:00
ZhikangNiu	2aefa7c5f7	fix many tensorboard writer and only log in main_process	2026-01-22 13:36:09 +08:00
ZhikangNiu	97fdc7fbb4	change prepare_csv_wavs from relative path to absolute path and get duration info with soundfile and torchaudio	2026-01-22 12:27:23 +08:00
SWivid	1d2f7c5389	Formatting	2026-01-21 13:42:19 +00:00
Yushen CHEN	37a2633d35	Fix epoch updates count logic as drop_last	2026-01-21 21:36:57 +08:00
Raivis Dejus	bc1df7a4fa	Adding support for hf:// links on CLI (#1252)	2026-01-21 16:38:30 +08:00
Raivis Dejus	eca786ee0c	Merge pull request #1250 from raivisdejus/add-latvian-community-model: Adding Latvian model to shared community models list	2026-01-17 19:29:23 +08:00
Yushen CHEN	27e20fcf39	Merge pull request #1242 from acadarmeria/fix-speech-edit-mel-domain: Fix speech editing boundary artifacts by working in mel domain	2025-12-26 17:35:57 +08:00
acadarmeria	dff57ebd2a	Fix speech editing boundary artifacts by working in mel domain. Previously, speech_edit.py worked in wav domain (inserting zeros into the waveform before computing mel spectrogram), which caused boundary artifacts when mel spectrogram windows straddled zeros and real audio. This commit refactors the approach to work in mel domain: compute mel spectrogram on the clean original audio first; insert zero frames in mel domain instead of zero samples in wav domain; use frame-level granularity throughout for consistency. Benefits: eliminates boundary artifacts; more consistent behavior regardless of small float variations in input times; cleaner edit boundaries. Changes to speech_edit.py (lines 148-220): convert audio to mel using model.mel_spec() before editing; build mel_cond by concatenating original mel frames + zero frames; calculate all time-based values at frame level first, then convert to samples; pass mel_cond directly to model.sample() instead of raw audio.	2025-12-26 08:49:57 +00:00
SWivid	46ccc575c5	v1.1.15 workaround for gr.Accordion default open=False bug (#1239) 1.1.15	2025-12-21 15:06:44 +08:00
SWivid	39617fcf7a	v1.1.12 bump gradio from 5.0 to 6.0, several fixes to ensure compatibility with new gradio version 1.1.12	2025-12-20 18:44:43 +08:00
Yushen Chen	5b82f97c26	fix #1239, use gradio>=6.0; add more clear instruction for ffmpeg installation (#1234)	2025-12-20 16:08:13 +08:00
SWivid	9ae46c8360	Replace jieba pkg with rjieba - a jieba-rs Python binding 1.1.10	2025-11-28 13:08:07 +00:00
SWivid	3eecd94baa	support back avg upsampling for batch, cover up non-mask case	2025-11-09 11:56:03 +00:00
SWivid	d9a69452ce	formatting	2025-11-09 18:25:30 +08:00
Yushen CHEN	bc15df2b57	Merge pull request #1212 from QingyuLiu0521/fix/AverageUpsampling: Fix Average Upsampling conflict logic, introduced from the previous batch inference fix.	2025-11-09 18:23:38 +08:00
QingyuLiu0521	9b2357a1b9	Fix Average Upsampling	2025-11-08 18:39:06 -05:00
Yushen CHEN	1dcb4e10f7	Add torchcodec dependency to pyproject.toml	2025-11-03 16:44:11 +08:00
SWivid	529d856133	clean-up eval scripts	2025-10-27 14:38:57 +00:00
SWivid	7abadc4c72	fix typo in eval scripts	2025-10-26 14:28:17 +00:00
SWivid	e67d50841e	runtime trtllm: fix batch inference skipping last words in shorter sentences #1039 #1179	2025-10-24 09:12:08 +00:00
SWivid	6b07fb03b2	clean-up ruff lint	2025-10-24 08:30:55 +00:00
SWivid	a051a68552	pytorch imple. fix batch inference skipping last words in shorter sentences issue #1039 #1179	2025-10-24 05:50:25 +00:00
Yushen CHEN	f2a4f8581f	Update runtime README	2025-10-22 08:37:32 +08:00
SWivid	a17c5ae435	pytorch imple.: fix batch 1 inference from last commit	2025-10-22 00:31:56 +00:00
SWivid	a0b8fb5df2	runtime trtllm: minor fixes. pytorch: update text_embedding logic to correct v0 batching.	2025-10-22 00:19:45 +00:00
SWivid	c8bfc3aa3d	runtime trtllm: support v1 and custom	2025-10-21 22:02:25 +00:00
SWivid	8d3ec72159	runtime trtllm: clean-up v0 code, several fixes.	2025-10-20 10:30:58 +00:00
SWivid	65ada48a62	set attn related default value for unet-t backbone: #1192	2025-10-09 06:51:25 +00:00
SWivid	77d3ec623b	v1.1.9 1.1.9	2025-09-13 13:42:33 +08:00
SWivid	186799d6dc	remove numpy<=1.26.4 for python_version>=3.11 #1162; update links	2025-09-13 13:40:55 +08:00
Yushen CHEN	31bb78f2ab	Update badge links	2025-09-03 15:12:24 +08:00
SWivid	e61824009a	v1.1.8 1.1.8	2025-08-28 12:33:37 +00:00
SWivid	06a74910bd	add option for text embedding late average upsampling	2025-08-28 11:46:11 +00:00
Yushen CHEN	ac3c43595c	delete .github/workflows/sync-hf.yaml for online space stability	2025-08-27 06:52:18 +08:00
Jim	605fa13b42	Fix raw.arrow missing rows (#1145). Co-authored-by: SWivid <swivid@qq.com>	2025-07-22 19:38:44 +08:00
Yushen CHEN	5f35f27230	update pyproject.toml	2025-07-15 17:28:41 +08:00
Yushen CHEN	c96c3aeed8	Update pyproject.toml 1.1.7	2025-07-14 14:36:26 +08:00
Yushen CHEN	9b60fe6a34	update pyproject.toml, set gradio<=5.35.0 until fix #1126	2025-07-14 14:29:19 +08:00
SWivid	a275798a2f	last fix patch-1	2025-07-08 18:44:47 +08:00

665 Commits