public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()
@ 2026-04-26  9:32 ruipengqi
  2026-04-27  8:38 ` Chao Yu
  0 siblings, 1 reply; 5+ messages in thread
From: ruipengqi @ 2026-04-26  9:32 UTC (permalink / raw)
  To: jaegeuk; +Cc: chao, linux-f2fs-devel, linux-kernel, Ruipeng Qi

From: Ruipeng Qi <ruipengqi3@gmail.com>

When f2fs filesystem space is nearly exhausted, we encounter the
following deadlock:

INFO: task A:1890 blocked for more than 120 seconds.
      Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153   flags:0x00000204
Call trace:
 __switch_to+0xf4/0x158
 __schedule+0x27c/0x908
 schedule+0x3c/0x118
 io_schedule+0x44/0x68
 folio_wait_bit_common+0x174/0x370
 folio_wait_bit+0x20/0x38
 folio_wait_writeback+0x54/0xc8
 truncate_inode_partial_folio+0x70/0x1e0
 truncate_inode_pages_range+0x1b0/0x450
 truncate_pagecache+0x54/0x88
 f2fs_file_write_iter+0x3e8/0xb80
 do_iter_readv_writev+0xf0/0x1e0
 vfs_writev+0x138/0x2c8
 do_writev+0x88/0x130
 __arm64_sys_writev+0x28/0x40
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0xc8/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x30/0xf8
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x190/0x198

INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
      Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
 __switch_to+0xf4/0x158
 __schedule+0x27c/0x908
 schedule+0x3c/0x118
 io_schedule+0x44/0x68
 folio_wait_bit_common+0x174/0x370
 __filemap_get_folio+0x214/0x348
 pagecache_get_page+0x20/0x70
 f2fs_get_read_data_page+0x150/0x3e8
 f2fs_get_lock_data_page+0x2c/0x160
 move_data_page+0x50/0x478
 do_garbage_collect+0xd38/0x1528
 f2fs_gc+0x240/0x7e0
 f2fs_balance_fs+0x1a0/0x208
 f2fs_write_single_data_page+0x6e4/0x730  //0xfffffe0d6ca08300
 f2fs_write_cache_pages+0x378/0x9b0
 f2fs_write_data_pages+0x2e4/0x388
 do_writepages+0x8c/0x2c8
 __writeback_single_inode+0x4c/0x498
 writeback_sb_inodes+0x234/0x4a8
 __writeback_inodes_wb+0x58/0x118
 wb_writeback+0x2f8/0x3c0
 wb_workfn+0x2c4/0x508
 process_one_work+0x180/0x408
 worker_thread+0x258/0x368
 kthread+0x118/0x128
 ret_from_fork+0x10/0x200

INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
      Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
 __switch_to+0xf4/0x158
 __schedule+0x27c/0x908
 rt_mutex_schedule+0x30/0x60
 __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
 rwbase_write_lock+0x24c/0x378
 down_write+0x1c/0x30
 f2fs_balance_fs+0x184/0x208
 f2fs_write_inode+0xf4/0x328
 __writeback_single_inode+0x370/0x498
 writeback_sb_inodes+0x234/0x4a8
 __writeback_inodes_wb+0x58/0x118
 wb_writeback+0x2f8/0x3c0
 wb_workfn+0x2c4/0x508
 process_one_work+0x180/0x408
 worker_thread+0x258/0x368
 kthread+0x118/0x128
 ret_from_fork+0x10/0x20

INFO: task B:1902 blocked for more than 120 seconds.
      Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:B     state:D stack:0     pid:1902  tgid:1626  ppid:1153   flags:0x0000020c
Call trace:
 __switch_to+0xf4/0x158
 __schedule+0x27c/0x908
 rt_mutex_schedule+0x30/0x60
 __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
 rwbase_write_lock+0x24c/0x378
 down_write+0x1c/0x30
 f2fs_balance_fs+0x184/0x208
 f2fs_map_blocks+0x94c/0x1110
 f2fs_file_write_iter+0x228/0xb80
 do_iter_readv_writev+0xf0/0x1e0
 vfs_writev+0x138/0x2c8
 do_writev+0x88/0x130
 __arm64_sys_writev+0x28/0x40
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0xc8/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x30/0xf8
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x190/0x198

INFO: task sync:2769849 blocked for more than 120 seconds.
      Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
Call trace:
 __switch_to+0xf4/0x158
 __schedule+0x27c/0x908
 schedule+0x3c/0x118
 wb_wait_for_completion+0xb0/0xe8
 sync_inodes_sb+0xc8/0x2b0
 sync_inodes_one_sb+0x24/0x38
 iterate_supers+0xa8/0x138
 ksys_sync+0x54/0xc8
 __arm64_sys_sync+0x18/0x30
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0xc8/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x30/0xf8
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x190/0x198

The root cause is a potential deadlock between the following tasks:

kworker/u8:11				Thread A
- f2fs_write_single_data_page
 - f2fs_do_write_data_page
  - folio_start_writeback(X)
  - f2fs_outplace_write_data
   - bio_add_folio(X)
 - folio_unlock(X)
					- truncate_inode_pages_range
					 - __filemap_get_folio(X, FGP_LOCK)
					 - truncate_inode_partial_folio(X)
					  - folio_wait_writeback(X)
 - f2fs_balance_fs
  - f2fs_gc
   - do_garbage_collect
    - move_data_page
     - f2fs_get_lock_data_page
      - __filemap_get_folio(X, FGP_LOCK)

Both tasks operate on folio X. Thread A holds the folio lock while
waiting for writeback to complete, but the writeback bio is still cached
in the kworker's bio list; the kworker, in turn, blocks on the folio
lock in GC before it can ever submit that bio. Neither side can make
progress, so they deadlock.

Other threads also enter D state, waiting for locks such as gc_lock and
writepages.

Both OPU and IPU DATA folios are affected by this issue. To avoid such
potential deadlocks, always submit the cached bios for these folios
before triggering f2fs_gc() in f2fs_balance_fs().

v2:
- Commit cached OPU/IPU folios, not just OPU folios as in v1.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com>
---
 fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
 fs/f2fs/f2fs.h    |  1 +
 fs/f2fs/segment.c |  9 +++++++++
 3 files changed, 36 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 338df7a2aea6..fd03366b3228 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
 	}
 }
 
+void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
+{
+	struct bio_entry *be, *tmp;
+	struct f2fs_bio_info *io;
+	enum temp_type temp;
+	LIST_HEAD(list);
+
+	for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
+		io = sbi->write_io[DATA] + temp;
+
+		if (list_empty(&io->bio_list))
+			continue;
+
+		f2fs_down_write(&io->bio_list_lock);
+		list_splice_init(&io->bio_list, &list);
+		f2fs_up_write(&io->bio_list_lock);
+
+		list_for_each_entry_safe(be, tmp, &list, list) {
+			f2fs_submit_write_bio(sbi, be->bio, DATA);
+			del_bio_entry(be);
+		}
+
+	}
+
+}
+
 int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 {
 	struct bio *bio = *fio->bio;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index bb34e864d0ef..e9038ab1b2bd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
 				struct folio *folio, enum page_type type);
 void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
 					struct bio **bio, struct folio *folio);
+void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
 void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
 int f2fs_submit_page_bio(struct f2fs_io_info *fio);
 int f2fs_merge_page_bio(struct f2fs_io_info *fio);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6a97fe76712b..856ffe91b94f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
 		io_schedule();
 		finish_wait(&sbi->gc_thread->fggc_wq, &wait);
 	} else {
+
+		/*
+		 * Submit all cached OPU/IPU DATA bios before triggering
+		 * foreground GC to avoid potential deadlocks.
+		 */
+
+		f2fs_submit_merged_write(sbi, DATA);
+		f2fs_submit_all_merged_ipu_writes(sbi);
+
 		struct f2fs_gc_control gc_control = {
 			.victim_segno = NULL_SEGNO,
 			.init_gc_type = f2fs_sb_has_blkzoned(sbi) ?
-- 
2.25.1



* Re: [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()
  2026-04-26  9:32 [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs() ruipengqi
@ 2026-04-27  8:38 ` Chao Yu
  2026-04-29  3:39   ` Ruipeng Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Chao Yu @ 2026-04-27  8:38 UTC (permalink / raw)
  To: ruipengqi, jaegeuk; +Cc: chao, linux-f2fs-devel, linux-kernel

On 4/26/26 17:32, ruipengqi wrote:
> From: Ruipeng Qi <ruipengqi3@gmail.com>
> 
> When the f2fs filesystem space is nearly exhausted, we encounter deadlock
> issues as below:
> 
> INFO: task A:1890 blocked for more than 120 seconds.
>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153   flags:0x00000204
> Call trace:
>   __switch_to+0xf4/0x158
>   __schedule+0x27c/0x908
>   schedule+0x3c/0x118
>   io_schedule+0x44/0x68
>   folio_wait_bit_common+0x174/0x370
>   folio_wait_bit+0x20/0x38
>   folio_wait_writeback+0x54/0xc8
>   truncate_inode_partial_folio+0x70/0x1e0
>   truncate_inode_pages_range+0x1b0/0x450
>   truncate_pagecache+0x54/0x88
>   f2fs_file_write_iter+0x3e8/0xb80
>   do_iter_readv_writev+0xf0/0x1e0
>   vfs_writev+0x138/0x2c8
>   do_writev+0x88/0x130
>   __arm64_sys_writev+0x28/0x40
>   invoke_syscall+0x50/0x120
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x30/0xf8
>   el0t_64_sync_handler+0x120/0x130
>   el0t_64_sync+0x190/0x198
> 
> INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
> Workqueue: writeback wb_workfn (flush-254:0)
> Call trace:
>   __switch_to+0xf4/0x158
>   __schedule+0x27c/0x908
>   schedule+0x3c/0x118
>   io_schedule+0x44/0x68
>   folio_wait_bit_common+0x174/0x370
>   __filemap_get_folio+0x214/0x348
>   pagecache_get_page+0x20/0x70
>   f2fs_get_read_data_page+0x150/0x3e8
>   f2fs_get_lock_data_page+0x2c/0x160
>   move_data_page+0x50/0x478
>   do_garbage_collect+0xd38/0x1528
>   f2fs_gc+0x240/0x7e0
>   f2fs_balance_fs+0x1a0/0x208
>   f2fs_write_single_data_page+0x6e4/0x730  //0xfffffe0d6ca08300
>   f2fs_write_cache_pages+0x378/0x9b0
>   f2fs_write_data_pages+0x2e4/0x388
>   do_writepages+0x8c/0x2c8
>   __writeback_single_inode+0x4c/0x498
>   writeback_sb_inodes+0x234/0x4a8
>   __writeback_inodes_wb+0x58/0x118
>   wb_writeback+0x2f8/0x3c0
>   wb_workfn+0x2c4/0x508
>   process_one_work+0x180/0x408
>   worker_thread+0x258/0x368
>   kthread+0x118/0x128
>   ret_from_fork+0x10/0x200
> 
> INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
> Workqueue: writeback wb_workfn (flush-254:0)
> Call trace:
>   __switch_to+0xf4/0x158
>   __schedule+0x27c/0x908
>   rt_mutex_schedule+0x30/0x60
>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>   rwbase_write_lock+0x24c/0x378
>   down_write+0x1c/0x30
>   f2fs_balance_fs+0x184/0x208
>   f2fs_write_inode+0xf4/0x328
>   __writeback_single_inode+0x370/0x498
>   writeback_sb_inodes+0x234/0x4a8
>   __writeback_inodes_wb+0x58/0x118
>   wb_writeback+0x2f8/0x3c0
>   wb_workfn+0x2c4/0x508
>   process_one_work+0x180/0x408
>   worker_thread+0x258/0x368
>   kthread+0x118/0x128
>   ret_from_fork+0x10/0x20
> 
> INFO: task B:1902 blocked for more than 120 seconds.
>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:B     state:D stack:0     pid:1902  tgid:1626  ppid:1153   flags:0x0000020c
> Call trace:
>   __switch_to+0xf4/0x158
>   __schedule+0x27c/0x908
>   rt_mutex_schedule+0x30/0x60
>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>   rwbase_write_lock+0x24c/0x378
>   down_write+0x1c/0x30
>   f2fs_balance_fs+0x184/0x208
>   f2fs_map_blocks+0x94c/0x1110
>   f2fs_file_write_iter+0x228/0xb80
>   do_iter_readv_writev+0xf0/0x1e0
>   vfs_writev+0x138/0x2c8
>   do_writev+0x88/0x130
>   __arm64_sys_writev+0x28/0x40
>   invoke_syscall+0x50/0x120
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x30/0xf8
>   el0t_64_sync_handler+0x120/0x130
>   el0t_64_sync+0x190/0x198
> 
> INFO: task sync:2769849 blocked for more than 120 seconds.
>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
> Call trace:
>   __switch_to+0xf4/0x158
>   __schedule+0x27c/0x908
>   schedule+0x3c/0x118
>   wb_wait_for_completion+0xb0/0xe8
>   sync_inodes_sb+0xc8/0x2b0
>   sync_inodes_one_sb+0x24/0x38
>   iterate_supers+0xa8/0x138
>   ksys_sync+0x54/0xc8
>   __arm64_sys_sync+0x18/0x30
>   invoke_syscall+0x50/0x120
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x30/0xf8
>   el0t_64_sync_handler+0x120/0x130
>   el0t_64_sync+0x190/0x198
> 
> The root cause is a potential deadlock between the following tasks:
> 
> kworker/u8:11				Thread A
> - f2fs_write_single_data_page
>   - f2fs_do_write_data_page
>    - folio_start_writeback(X)
>    - f2fs_outplace_write_data
>     - bio_add_folio(X)
>   - folio_unlock(X)
> 					- truncate_inode_pages_range
> 					 - __filemap_get_folio(X, FGP_LOCK)
> 					 - truncate_inode_partial_folio(X)
> 					  - folio_wait_writeback(X)
>   - f2fs_balance_fs
>    - f2fs_gc
>     - do_garbage_collect
>      - move_data_page
>       - f2fs_get_lock_data_page
>        - __filemap_get_folio(X, FGP_LOCK)
> 
> Both threads try to access folio X. Thread A holds the lock but waits
> for writeback, while kworker waits for the lock. This causes a deadlock.
> 
> Other threads also enter D state, waiting for locks such as gc_lock and
> writepages.
> 
> OPU/IPU DATA folio are all affected by this issue. To avoid such
> potential deadlocks, always commit these cached folios before
> triggering f2fs_gc() in f2fs_balance_fs().
> 
> v2:
> - Commit cached OPU/IPU folios, not just OPU folios as in v1.
> 
> Suggested-by: Chao <chao@kernel.org>
> Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com>
> ---
>   fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
>   fs/f2fs/f2fs.h    |  1 +
>   fs/f2fs/segment.c |  9 +++++++++
>   3 files changed, 36 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 338df7a2aea6..fd03366b3228 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>   	}
>   }
>   
> +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
> +{
> +	struct bio_entry *be, *tmp;
> +	struct f2fs_bio_info *io;
> +	enum temp_type temp;
> +	LIST_HEAD(list);
> +
> +	for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
> +		io = sbi->write_io[DATA] + temp;
> +
> +		if (list_empty(&io->bio_list))
> +			continue;

Needs to be covered w/ bio_list_lock to avoid race condition.

> +
> +		f2fs_down_write(&io->bio_list_lock);
> +		list_splice_init(&io->bio_list, &list);
> +		f2fs_up_write(&io->bio_list_lock);
> +
> +		list_for_each_entry_safe(be, tmp, &list, list) {
> +			f2fs_submit_write_bio(sbi, be->bio, DATA);
> +			del_bio_entry(be);
> +		}
> +

Unnecessary blank line.

Thanks,

> +	}
> +
> +}
> +
>   int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>   {
>   	struct bio *bio = *fio->bio;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index bb34e864d0ef..e9038ab1b2bd 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
>   				struct folio *folio, enum page_type type);
>   void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>   					struct bio **bio, struct folio *folio);
> +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
>   void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
>   int f2fs_submit_page_bio(struct f2fs_io_info *fio);
>   int f2fs_merge_page_bio(struct f2fs_io_info *fio);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 6a97fe76712b..856ffe91b94f 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
>   		io_schedule();
>   		finish_wait(&sbi->gc_thread->fggc_wq, &wait);
>   	} else {
> +
> +		/*
> +		 * Submit all cached OPU/IPU DATA bios before triggering
> +		 * foreground GC to avoid potential deadlocks.
> +		 */
> +
> +		f2fs_submit_merged_write(sbi, DATA);
> +		f2fs_submit_all_merged_ipu_writes(sbi);
> +
>   		struct f2fs_gc_control gc_control = {
>   			.victim_segno = NULL_SEGNO,
>   			.init_gc_type = f2fs_sb_has_blkzoned(sbi) ?



* Re: [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()
  2026-04-27  8:38 ` Chao Yu
@ 2026-04-29  3:39   ` Ruipeng Qi
  2026-04-29  7:59     ` Chao Yu
  0 siblings, 1 reply; 5+ messages in thread
From: Ruipeng Qi @ 2026-04-29  3:39 UTC (permalink / raw)
  To: Chao Yu, jaegeuk; +Cc: linux-f2fs-devel, linux-kernel


On 2026/4/27 16:38, Chao Yu wrote:
> On 4/26/26 17:32, ruipengqi wrote:
>> From: Ruipeng Qi <ruipengqi3@gmail.com>
>>
>> When the f2fs filesystem space is nearly exhausted, we encounter 
>> deadlock
>> issues as below:
>>
>> INFO: task A:1890 blocked for more than 120 seconds.
>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> message.
>> task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153 
>> flags:0x00000204
>> Call trace:
>>   __switch_to+0xf4/0x158
>>   __schedule+0x27c/0x908
>>   schedule+0x3c/0x118
>>   io_schedule+0x44/0x68
>>   folio_wait_bit_common+0x174/0x370
>>   folio_wait_bit+0x20/0x38
>>   folio_wait_writeback+0x54/0xc8
>>   truncate_inode_partial_folio+0x70/0x1e0
>>   truncate_inode_pages_range+0x1b0/0x450
>>   truncate_pagecache+0x54/0x88
>>   f2fs_file_write_iter+0x3e8/0xb80
>>   do_iter_readv_writev+0xf0/0x1e0
>>   vfs_writev+0x138/0x2c8
>>   do_writev+0x88/0x130
>>   __arm64_sys_writev+0x28/0x40
>>   invoke_syscall+0x50/0x120
>>   el0_svc_common.constprop.0+0xc8/0xf0
>>   do_el0_svc+0x24/0x38
>>   el0_svc+0x30/0xf8
>>   el0t_64_sync_handler+0x120/0x130
>>   el0t_64_sync+0x190/0x198
>>
>> INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> message.
>> task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 
>> ppid:2      flags:0x00000208
>> Workqueue: writeback wb_workfn (flush-254:0)
>> Call trace:
>>   __switch_to+0xf4/0x158
>>   __schedule+0x27c/0x908
>>   schedule+0x3c/0x118
>>   io_schedule+0x44/0x68
>>   folio_wait_bit_common+0x174/0x370
>>   __filemap_get_folio+0x214/0x348
>>   pagecache_get_page+0x20/0x70
>>   f2fs_get_read_data_page+0x150/0x3e8
>>   f2fs_get_lock_data_page+0x2c/0x160
>>   move_data_page+0x50/0x478
>>   do_garbage_collect+0xd38/0x1528
>>   f2fs_gc+0x240/0x7e0
>>   f2fs_balance_fs+0x1a0/0x208
>>   f2fs_write_single_data_page+0x6e4/0x730  //0xfffffe0d6ca08300
>>   f2fs_write_cache_pages+0x378/0x9b0
>>   f2fs_write_data_pages+0x2e4/0x388
>>   do_writepages+0x8c/0x2c8
>>   __writeback_single_inode+0x4c/0x498
>>   writeback_sb_inodes+0x234/0x4a8
>>   __writeback_inodes_wb+0x58/0x118
>>   wb_writeback+0x2f8/0x3c0
>>   wb_workfn+0x2c4/0x508
>>   process_one_work+0x180/0x408
>>   worker_thread+0x258/0x368
>>   kthread+0x118/0x128
>>   ret_from_fork+0x10/0x200
>>
>> INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> message.
>> task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 
>> ppid:2      flags:0x00000208
>> Workqueue: writeback wb_workfn (flush-254:0)
>> Call trace:
>>   __switch_to+0xf4/0x158
>>   __schedule+0x27c/0x908
>>   rt_mutex_schedule+0x30/0x60
>>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>>   rwbase_write_lock+0x24c/0x378
>>   down_write+0x1c/0x30
>>   f2fs_balance_fs+0x184/0x208
>>   f2fs_write_inode+0xf4/0x328
>>   __writeback_single_inode+0x370/0x498
>>   writeback_sb_inodes+0x234/0x4a8
>>   __writeback_inodes_wb+0x58/0x118
>>   wb_writeback+0x2f8/0x3c0
>>   wb_workfn+0x2c4/0x508
>>   process_one_work+0x180/0x408
>>   worker_thread+0x258/0x368
>>   kthread+0x118/0x128
>>   ret_from_fork+0x10/0x20
>>
>> INFO: task B:1902 blocked for more than 120 seconds.
>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> message.
>> task:B     state:D stack:0     pid:1902  tgid:1626  ppid:1153 
>> flags:0x0000020c
>> Call trace:
>>   __switch_to+0xf4/0x158
>>   __schedule+0x27c/0x908
>>   rt_mutex_schedule+0x30/0x60
>>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>>   rwbase_write_lock+0x24c/0x378
>>   down_write+0x1c/0x30
>>   f2fs_balance_fs+0x184/0x208
>>   f2fs_map_blocks+0x94c/0x1110
>>   f2fs_file_write_iter+0x228/0xb80
>>   do_iter_readv_writev+0xf0/0x1e0
>>   vfs_writev+0x138/0x2c8
>>   do_writev+0x88/0x130
>>   __arm64_sys_writev+0x28/0x40
>>   invoke_syscall+0x50/0x120
>>   el0_svc_common.constprop.0+0xc8/0xf0
>>   do_el0_svc+0x24/0x38
>>   el0_svc+0x30/0xf8
>>   el0t_64_sync_handler+0x120/0x130
>>   el0t_64_sync+0x190/0x198
>>
>> INFO: task sync:2769849 blocked for more than 120 seconds.
>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> message.
>> task:sync            state:D stack:0     pid:2769849 tgid:2769849 
>> ppid:736    flags:0x0000020c
>> Call trace:
>>   __switch_to+0xf4/0x158
>>   __schedule+0x27c/0x908
>>   schedule+0x3c/0x118
>>   wb_wait_for_completion+0xb0/0xe8
>>   sync_inodes_sb+0xc8/0x2b0
>>   sync_inodes_one_sb+0x24/0x38
>>   iterate_supers+0xa8/0x138
>>   ksys_sync+0x54/0xc8
>>   __arm64_sys_sync+0x18/0x30
>>   invoke_syscall+0x50/0x120
>>   el0_svc_common.constprop.0+0xc8/0xf0
>>   do_el0_svc+0x24/0x38
>>   el0_svc+0x30/0xf8
>>   el0t_64_sync_handler+0x120/0x130
>>   el0t_64_sync+0x190/0x198
>>
>> The root cause is a potential deadlock between the following tasks:
>>
>> kworker/u8:11                Thread A
>> - f2fs_write_single_data_page
>>   - f2fs_do_write_data_page
>>    - folio_start_writeback(X)
>>    - f2fs_outplace_write_data
>>     - bio_add_folio(X)
>>   - folio_unlock(X)
>>                     - truncate_inode_pages_range
>>                      - __filemap_get_folio(X, FGP_LOCK)
>>                      - truncate_inode_partial_folio(X)
>>                       - folio_wait_writeback(X)
>>   - f2fs_balance_fs
>>    - f2fs_gc
>>     - do_garbage_collect
>>      - move_data_page
>>       - f2fs_get_lock_data_page
>>        - __filemap_get_folio(X, FGP_LOCK)
>>
>> Both threads try to access folio X. Thread A holds the lock but waits
>> for writeback, while kworker waits for the lock. This causes a deadlock.
>>
>> Other threads also enter D state, waiting for locks such as gc_lock and
>> writepages.
>>
>> OPU/IPU DATA folio are all affected by this issue. To avoid such
>> potential deadlocks, always commit these cached folios before
>> triggering f2fs_gc() in f2fs_balance_fs().
>>
>> v2:
>> - Commit cached OPU/IPU folios, not just OPU folios as in v1.
>>
>> Suggested-by: Chao <chao@kernel.org>
>> Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com>
>> ---
>>   fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
>>   fs/f2fs/f2fs.h    |  1 +
>>   fs/f2fs/segment.c |  9 +++++++++
>>   3 files changed, 36 insertions(+)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 338df7a2aea6..fd03366b3228 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct 
>> f2fs_sb_info *sbi,
>>       }
>>   }
>>   +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
>> +{
>> +    struct bio_entry *be, *tmp;
>> +    struct f2fs_bio_info *io;
>> +    enum temp_type temp;
>> +    LIST_HEAD(list);
>> +
>> +    for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
>> +        io = sbi->write_io[DATA] + temp;
>> +
>> +        if (list_empty(&io->bio_list))
>> +            continue;
>
> Needs to be covered w/ bio_list_lock to avoid race condition.

Hi, Chao

The lockless list_empty() here is intentional and acceptable.

If list_empty() returns true but the list becomes non-empty afterwards
(due to a race), the newly added bio will be submitted by the subsequent
write path, so no bio will be lost.

Similar patterns exist in the kernel, e.g.:
   net/rfkill/core.c: rfkill_fop_read()
     /* since we re-check and it just compares pointers,
      * using !list_empty() without locking isn't a problem
      */
   fs/f2fs/data.c: f2fs_submit_merged_ipu_write()
     list_empty() is also used without holding bio_list_lock
     as a lockless pre-check

If you'd prefer, we can add a comment to make the intent clear:

     /* list_empty() without lock is safe here - READ_ONCE()
      * ensures pointer read atomicity. A false negative is
      * acceptable since any bio added concurrently will be
      * submitted by the next write path.
      */
     if (list_empty(&io->bio_list))
         continue;
>
>> +
>> +        f2fs_down_write(&io->bio_list_lock);
>> +        list_splice_init(&io->bio_list, &list);
>> +        f2fs_up_write(&io->bio_list_lock);
>> +
>> +        list_for_each_entry_safe(be, tmp, &list, list) {
>> +            f2fs_submit_write_bio(sbi, be->bio, DATA);
>> +            del_bio_entry(be);
>> +        }
>> +
>
> Unnecessary blank line.
>
> Thanks,

Thanks for your correction. Will fix in v3.

    v3:
    - Fix minor grammatical issues
    - Add a comment on the lockless list_empty() to explain why it is
      safe without holding bio_list_lock


Thanks,

>
>> +    }
>> +
>> +}
>> +
>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>>   {
>>       struct bio *bio = *fio->bio;
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index bb34e864d0ef..e9038ab1b2bd 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct 
>> f2fs_sb_info *sbi,
>>                   struct folio *folio, enum page_type type);
>>   void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>>                       struct bio **bio, struct folio *folio);
>> +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
>>   void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
>>   int f2fs_submit_page_bio(struct f2fs_io_info *fio);
>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio);
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 6a97fe76712b..856ffe91b94f 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, 
>> bool need)
>>           io_schedule();
>>           finish_wait(&sbi->gc_thread->fggc_wq, &wait);
>>       } else {
>> +
>> +        /*
>> +         * Submit all cached OPU/IPU DATA bios before triggering
>> +         * foreground GC to avoid potential deadlocks.
>> +         */
>> +
>> +        f2fs_submit_merged_write(sbi, DATA);
>> +        f2fs_submit_all_merged_ipu_writes(sbi);
>> +
>>           struct f2fs_gc_control gc_control = {
>>               .victim_segno = NULL_SEGNO,
>>               .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?
>


* Re: [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()
  2026-04-29  3:39   ` Ruipeng Qi
@ 2026-04-29  7:59     ` Chao Yu
  2026-05-02 12:41       ` Ruipeng Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Chao Yu @ 2026-04-29  7:59 UTC (permalink / raw)
  To: Ruipeng Qi, jaegeuk; +Cc: chao, linux-f2fs-devel, linux-kernel

On 4/29/26 11:39, Ruipeng Qi wrote:
> 
> On 2026/4/27 16:38, Chao Yu wrote:
>> On 4/26/26 17:32, ruipengqi wrote:
>>> From: Ruipeng Qi <ruipengqi3@gmail.com>
>>>
>>> When the f2fs filesystem space is nearly exhausted, we encounter deadlock
>>> issues as below:
>>>
>>> INFO: task A:1890 blocked for more than 120 seconds.
>>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153 flags:0x00000204
>>> Call trace:
>>>   __switch_to+0xf4/0x158
>>>   __schedule+0x27c/0x908
>>>   schedule+0x3c/0x118
>>>   io_schedule+0x44/0x68
>>>   folio_wait_bit_common+0x174/0x370
>>>   folio_wait_bit+0x20/0x38
>>>   folio_wait_writeback+0x54/0xc8
>>>   truncate_inode_partial_folio+0x70/0x1e0
>>>   truncate_inode_pages_range+0x1b0/0x450
>>>   truncate_pagecache+0x54/0x88
>>>   f2fs_file_write_iter+0x3e8/0xb80
>>>   do_iter_readv_writev+0xf0/0x1e0
>>>   vfs_writev+0x138/0x2c8
>>>   do_writev+0x88/0x130
>>>   __arm64_sys_writev+0x28/0x40
>>>   invoke_syscall+0x50/0x120
>>>   el0_svc_common.constprop.0+0xc8/0xf0
>>>   do_el0_svc+0x24/0x38
>>>   el0_svc+0x30/0xf8
>>>   el0t_64_sync_handler+0x120/0x130
>>>   el0t_64_sync+0x190/0x198
>>>
>>> INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
>>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
>>> Workqueue: writeback wb_workfn (flush-254:0)
>>> Call trace:
>>>   __switch_to+0xf4/0x158
>>>   __schedule+0x27c/0x908
>>>   schedule+0x3c/0x118
>>>   io_schedule+0x44/0x68
>>>   folio_wait_bit_common+0x174/0x370
>>>   __filemap_get_folio+0x214/0x348
>>>   pagecache_get_page+0x20/0x70
>>>   f2fs_get_read_data_page+0x150/0x3e8
>>>   f2fs_get_lock_data_page+0x2c/0x160
>>>   move_data_page+0x50/0x478
>>>   do_garbage_collect+0xd38/0x1528
>>>   f2fs_gc+0x240/0x7e0
>>>   f2fs_balance_fs+0x1a0/0x208
>>>   f2fs_write_single_data_page+0x6e4/0x730  //0xfffffe0d6ca08300
>>>   f2fs_write_cache_pages+0x378/0x9b0
>>>   f2fs_write_data_pages+0x2e4/0x388
>>>   do_writepages+0x8c/0x2c8
>>>   __writeback_single_inode+0x4c/0x498
>>>   writeback_sb_inodes+0x234/0x4a8
>>>   __writeback_inodes_wb+0x58/0x118
>>>   wb_writeback+0x2f8/0x3c0
>>>   wb_workfn+0x2c4/0x508
>>>   process_one_work+0x180/0x408
>>>   worker_thread+0x258/0x368
>>>   kthread+0x118/0x128
>>>   ret_from_fork+0x10/0x200
>>>
>>> INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
>>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
>>> Workqueue: writeback wb_workfn (flush-254:0)
>>> Call trace:
>>>   __switch_to+0xf4/0x158
>>>   __schedule+0x27c/0x908
>>>   rt_mutex_schedule+0x30/0x60
>>>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>>>   rwbase_write_lock+0x24c/0x378
>>>   down_write+0x1c/0x30
>>>   f2fs_balance_fs+0x184/0x208
>>>   f2fs_write_inode+0xf4/0x328
>>>   __writeback_single_inode+0x370/0x498
>>>   writeback_sb_inodes+0x234/0x4a8
>>>   __writeback_inodes_wb+0x58/0x118
>>>   wb_writeback+0x2f8/0x3c0
>>>   wb_workfn+0x2c4/0x508
>>>   process_one_work+0x180/0x408
>>>   worker_thread+0x258/0x368
>>>   kthread+0x118/0x128
>>>   ret_from_fork+0x10/0x20
>>>
>>> INFO: task B:1902 blocked for more than 120 seconds.
>>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:B     state:D stack:0     pid:1902  tgid:1626  ppid:1153 flags:0x0000020c
>>> Call trace:
>>>   __switch_to+0xf4/0x158
>>>   __schedule+0x27c/0x908
>>>   rt_mutex_schedule+0x30/0x60
>>>   __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
>>>   rwbase_write_lock+0x24c/0x378
>>>   down_write+0x1c/0x30
>>>   f2fs_balance_fs+0x184/0x208
>>>   f2fs_map_blocks+0x94c/0x1110
>>>   f2fs_file_write_iter+0x228/0xb80
>>>   do_iter_readv_writev+0xf0/0x1e0
>>>   vfs_writev+0x138/0x2c8
>>>   do_writev+0x88/0x130
>>>   __arm64_sys_writev+0x28/0x40
>>>   invoke_syscall+0x50/0x120
>>>   el0_svc_common.constprop.0+0xc8/0xf0
>>>   do_el0_svc+0x24/0x38
>>>   el0_svc+0x30/0xf8
>>>   el0t_64_sync_handler+0x120/0x130
>>>   el0t_64_sync+0x190/0x198
>>>
>>> INFO: task sync:2769849 blocked for more than 120 seconds.
>>>        Tainted: G           O       6.12.41-g3fe07ddf05ab #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
>>> Call trace:
>>>   __switch_to+0xf4/0x158
>>>   __schedule+0x27c/0x908
>>>   schedule+0x3c/0x118
>>>   wb_wait_for_completion+0xb0/0xe8
>>>   sync_inodes_sb+0xc8/0x2b0
>>>   sync_inodes_one_sb+0x24/0x38
>>>   iterate_supers+0xa8/0x138
>>>   ksys_sync+0x54/0xc8
>>>   __arm64_sys_sync+0x18/0x30
>>>   invoke_syscall+0x50/0x120
>>>   el0_svc_common.constprop.0+0xc8/0xf0
>>>   do_el0_svc+0x24/0x38
>>>   el0_svc+0x30/0xf8
>>>   el0t_64_sync_handler+0x120/0x130
>>>   el0t_64_sync+0x190/0x198
>>>
>>> The root cause is a potential deadlock between the following tasks:
>>>
>>> kworker/u8:11                Thread A
>>> - f2fs_write_single_data_page
>>>   - f2fs_do_write_data_page
>>>    - folio_start_writeback(X)
>>>    - f2fs_outplace_write_data
>>>     - bio_add_folio(X)
>>>   - folio_unlock(X)
>>>                     - truncate_inode_pages_range
>>>                      - __filemap_get_folio(X, FGP_LOCK)
>>>                      - truncate_inode_partial_folio(X)
>>>                       - folio_wait_writeback(X)
>>>   - f2fs_balance_fs
>>>    - f2fs_gc
>>>     - do_garbage_collect
>>>      - move_data_page
>>>       - f2fs_get_lock_data_page
>>>        - __filemap_get_folio(X, FGP_LOCK)
>>>
>>> Both threads try to access folio X. Thread A holds the folio lock but
>>> waits for writeback to complete, while the kworker waits for the folio
>>> lock; the bio carrying X is still cached in the kworker's bio list and
>>> is never submitted, so neither side can make progress.
>>>
>>> Other threads also enter D state, waiting for locks such as gc_lock and
>>> writepages.
>>>
>>> Both OPU and IPU DATA folios are affected by this issue. To avoid such
>>> potential deadlocks, always commit these cached folios before
>>> triggering f2fs_gc() in f2fs_balance_fs().
>>>
>>> v2:
>>> - Commit cached OPU/IPU folios, not just OPU folios as in v1.
>>>
>>> Suggested-by: Chao Yu <chao@kernel.org>
>>> Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com>
>>> ---
>>>   fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
>>>   fs/f2fs/f2fs.h    |  1 +
>>>   fs/f2fs/segment.c |  9 +++++++++
>>>   3 files changed, 36 insertions(+)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 338df7a2aea6..fd03366b3228 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>>>       }
>>>   }
>>>   +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
>>> +{
>>> +    struct bio_entry *be, *tmp;
>>> +    struct f2fs_bio_info *io;
>>> +    enum temp_type temp;
>>> +    LIST_HEAD(list);
>>> +
>>> +    for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
>>> +        io = sbi->write_io[DATA] + temp;
>>> +
>>> +        if (list_empty(&io->bio_list))
>>> +            continue;
>>
>> Needs to be covered w/ bio_list_lock to avoid race condition.
> 
> Hi, Chao
> 
> The lockless list_empty() here is intentional and acceptable.
> 
> 
> If list_empty() returns true but the list becomes non-empty
> afterwards (due to a race), the newly added bio will be submitted
> by the subsequent write path, so no bio will be lost.

Ah, right, we only need to submit the folios cached by the local thread.

> 
> 
> Similar patterns exist in the kernel, e.g.:
>    net/rfkill/core.c: rfkill_fop_read()
>      /* since we re-check and it just compares pointers,
>       * using !list_empty() without locking isn't a problem
>       */
>    fs/f2fs/data.c: f2fs_submit_merged_ipu_write()
>      list_empty() is also used without holding bio_list_lock
>      as a lockless pre-check
> 
> 
> If you'd prefer, we can add a comment to make the intent clear:
> 
>      /* list_empty() without lock is safe here - READ_ONCE()
>       * ensures pointer read atomicity. A false negative is
>       * acceptable since any bio added concurrently will be
>       * submitted by the next write path.
>       */
>      if (list_empty(&io->bio_list))
>          continue;
>>
>>> +
>>> +        f2fs_down_write(&io->bio_list_lock);
>>> +        list_splice_init(&io->bio_list, &list);
>>> +        f2fs_up_write(&io->bio_list_lock);
>>> +
>>> +        list_for_each_entry_safe(be, tmp, &list, list) {
>>> +            f2fs_submit_write_bio(sbi, be->bio, DATA);
>>> +            del_bio_entry(be);
>>> +        }
>>> +
>>
>> Unnecessary blank line.
>>
>> Thanks,
> 
> Thanks for your correction. Will fix in v3.
>      v3:
>      - Fix minor grammatical issues
>      - Add a comment on the lockless list_empty() to explain why it is
>        safe without holding bio_list_lock

Seems fine.

> 
> 
> Thanks,
> 
>>
>>> +    }
>>> +
>>> +}
>>> +
>>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>>>   {
>>>       struct bio *bio = *fio->bio;
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index bb34e864d0ef..e9038ab1b2bd 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
>>>                   struct folio *folio, enum page_type type);
>>>   void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>>>                       struct bio **bio, struct folio *folio);
>>> +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
>>>   void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
>>>   int f2fs_submit_page_bio(struct f2fs_io_info *fio);
>>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio);
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 6a97fe76712b..856ffe91b94f 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
>>>           io_schedule();
>>>           finish_wait(&sbi->gc_thread->fggc_wq, &wait);
>>>       } else {
>>> +
>>> +        /*
>>> +         * Submit all cached OPU/IPU DATA bios before triggering
>>> +         * foreground GC to avoid potential deadlocks.
>>> +         */
>>> +
>>> +        f2fs_submit_merged_write(sbi, DATA);
>>> +        f2fs_submit_all_merged_ipu_writes(sbi);

Can we relocate above code to below the variable definitions?

Thanks,

>>> +
>>>           struct f2fs_gc_control gc_control = {
>>>               .victim_segno = NULL_SEGNO,
>>>               .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?
>>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()
  2026-04-29  7:59     ` Chao Yu
@ 2026-05-02 12:41       ` Ruipeng Qi
  0 siblings, 0 replies; 5+ messages in thread
From: Ruipeng Qi @ 2026-05-02 12:41 UTC (permalink / raw)
  To: Chao Yu, jaegeuk; +Cc: linux-f2fs-devel, linux-kernel


On 2026/4/29 15:59, Chao Yu wrote:
> On 4/29/26 11:39, Ruipeng Qi wrote:
>>
>> On 2026/4/27 16:38, Chao Yu wrote:
>>> On 4/26/26 17:32, ruipengqi wrote:
>>>> From: Ruipeng Qi <ruipengqi3@gmail.com>
>>>>
>>>> When the f2fs filesystem space is nearly exhausted, we encounter deadlock
>>>> issues as below:
>>>>
>>>> [call traces snipped]
>>>>
>>>> The root cause is a potential deadlock between the following tasks:
>>>>
>>>> kworker/u8:11                Thread A
>>>> - f2fs_write_single_data_page
>>>>   - f2fs_do_write_data_page
>>>>    - folio_start_writeback(X)
>>>>    - f2fs_outplace_write_data
>>>>     - bio_add_folio(X)
>>>>   - folio_unlock(X)
>>>>                     - truncate_inode_pages_range
>>>>                      - __filemap_get_folio(X, FGP_LOCK)
>>>>                      - truncate_inode_partial_folio(X)
>>>>                       - folio_wait_writeback(X)
>>>>   - f2fs_balance_fs
>>>>    - f2fs_gc
>>>>     - do_garbage_collect
>>>>      - move_data_page
>>>>       - f2fs_get_lock_data_page
>>>>        - __filemap_get_folio(X, FGP_LOCK)
>>>>
>>>> Both threads try to access folio X. Thread A holds the folio lock but
>>>> waits for writeback to complete, while the kworker waits for the folio
>>>> lock; the bio carrying X is still cached in the kworker's bio list and
>>>> is never submitted, so neither side can make progress.
>>>>
>>>> Other threads also enter D state, waiting for locks such as gc_lock and
>>>> writepages.
>>>>
>>>> Both OPU and IPU DATA folios are affected by this issue. To avoid such
>>>> potential deadlocks, always commit these cached folios before
>>>> triggering f2fs_gc() in f2fs_balance_fs().
>>>>
>>>> v2:
>>>> - Commit cached OPU/IPU folios, not just OPU folios as in v1.
>>>>
>>>> Suggested-by: Chao Yu <chao@kernel.org>
>>>> Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com>
>>>> ---
>>>>   fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
>>>>   fs/f2fs/f2fs.h    |  1 +
>>>>   fs/f2fs/segment.c |  9 +++++++++
>>>>   3 files changed, 36 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>> index 338df7a2aea6..fd03366b3228 100644
>>>> --- a/fs/f2fs/data.c
>>>> +++ b/fs/f2fs/data.c
>>>> @@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>>>>       }
>>>>   }
>>>>   +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +    struct bio_entry *be, *tmp;
>>>> +    struct f2fs_bio_info *io;
>>>> +    enum temp_type temp;
>>>> +    LIST_HEAD(list);
>>>> +
>>>> +    for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
>>>> +        io = sbi->write_io[DATA] + temp;
>>>> +
>>>> +        if (list_empty(&io->bio_list))
>>>> +            continue;
>>>
>>> Needs to be covered w/ bio_list_lock to avoid race condition.
>>
>> Hi, Chao
>>
>> The lockless list_empty() here is intentional and acceptable.
>>
>>
>> If list_empty() returns true but the list becomes non-empty
>> afterwards (due to a race), the newly added bio will be submitted
>> by the subsequent write path, so no bio will be lost.
>
> Ah, right, we only need to submit the folios cached by the local thread.
>
>>
>>
>> Similar patterns exist in the kernel, e.g.:
>>    net/rfkill/core.c: rfkill_fop_read()
>>      /* since we re-check and it just compares pointers,
>>       * using !list_empty() without locking isn't a problem
>>       */
>>    fs/f2fs/data.c: f2fs_submit_merged_ipu_write()
>>      list_empty() is also used without holding bio_list_lock
>>      as a lockless pre-check
>>
>>
>> If you'd prefer, we can add a comment to make the intent clear:
>>
>>      /* list_empty() without lock is safe here - READ_ONCE()
>>       * ensures pointer read atomicity. A false negative is
>>       * acceptable since any bio added concurrently will be
>>       * submitted by the next write path.
>>       */
>>      if (list_empty(&io->bio_list))
>>          continue;
>>>
>>>> +
>>>> +        f2fs_down_write(&io->bio_list_lock);
>>>> +        list_splice_init(&io->bio_list, &list);
>>>> +        f2fs_up_write(&io->bio_list_lock);
>>>> +
>>>> +        list_for_each_entry_safe(be, tmp, &list, list) {
>>>> +            f2fs_submit_write_bio(sbi, be->bio, DATA);
>>>> +            del_bio_entry(be);
>>>> +        }
>>>> +
>>>
>>> Unnecessary blank line.
>>>
>>> Thanks,
>>
>> Thanks for your correction. Will fix in v3.
>>      v3:
>>      - Fix minor grammatical issues
>>      - Add a comment on the lockless list_empty() to explain why it is
>>        safe without holding bio_list_lock
>
> Seems fine.
>
>>
>>
>> Thanks,
>>
>>>
>>>> +    }
>>>> +
>>>> +}
>>>> +
>>>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>>>>   {
>>>>       struct bio *bio = *fio->bio;
>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>> index bb34e864d0ef..e9038ab1b2bd 100644
>>>> --- a/fs/f2fs/f2fs.h
>>>> +++ b/fs/f2fs/f2fs.h
>>>> @@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
>>>>                   struct folio *folio, enum page_type type);
>>>>   void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
>>>>                       struct bio **bio, struct folio *folio);
>>>> +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
>>>>   void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
>>>>   int f2fs_submit_page_bio(struct f2fs_io_info *fio);
>>>>   int f2fs_merge_page_bio(struct f2fs_io_info *fio);
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index 6a97fe76712b..856ffe91b94f 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
>>>>           io_schedule();
>>>>           finish_wait(&sbi->gc_thread->fggc_wq, &wait);
>>>>       } else {
>>>> +
>>>> +        /*
>>>> +         * Submit all cached OPU/IPU DATA bios before triggering
>>>> +         * foreground GC to avoid potential deadlocks.
>>>> +         */
>>>> +
>>>> +        f2fs_submit_merged_write(sbi, DATA);
>>>> +        f2fs_submit_all_merged_ipu_writes(sbi);
>
> Can we relocate above code to below the variable definitions?
>
> Thanks,
>
Hi, Chao

Sure, will fix it in V3.

BTW, to avoid potential deadlocks, this patch submits cached OPU/IPU folios
before triggering f2fs_gc() in f2fs_balance_fs(), which changes the
existing IPU/OPU BIO lifecycle.

For OPU, io->io_rwsem provides the necessary synchronization.
For IPU, io->bio_list_lock ensures race-free submission.
In both cases, new BIOs will be allocated as needed after submission.

I may have missed something in the current implementation.
Your professional review would be much appreciated.

Thanks,

>>>> +
>>>>           struct f2fs_gc_control gc_control = {
>>>>               .victim_segno = NULL_SEGNO,
>>>>               .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?
>>>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-02 12:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-26  9:32 [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs() ruipengqi
2026-04-27  8:38 ` Chao Yu
2026-04-29  3:39   ` Ruipeng Qi
2026-04-29  7:59     ` Chao Yu
2026-05-02 12:41       ` Ruipeng Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox