[f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock

linux-f2fs-devel.lists.sourceforge.net archive mirror

* [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
@ 2022-12-16 15:50 Chao Yu
  2022-12-16 20:42 ` Eric Biggers
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-12-16 15:50 UTC (permalink / raw)
  To: jaegeuk; +Cc: syzbot+4793f6096d174c90b4f7, linux-kernel, linux-f2fs-devel

syzbot reported a potential deadlock, shown below:

F2FS-fs (loop2): invalid crc value
F2FS-fs (loop2): Found nat_bits in checkpoint
F2FS-fs (loop2): Mounted with checkpoint version = 48b305e4
======================================================
WARNING: possible circular locking dependency detected
6.1.0-rc8-syzkaller-33330-ga5541c0811a0 #0 Not tainted
------------------------------------------------------
syz-executor.2/32123 is trying to acquire lock:
ffff0000c0e1a608 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x54/0xb4 mm/memory.c:5644

but task is already holding lock:
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2205 [inline]
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_ioc_get_encryption_pwsalt fs/f2fs/file.c:2334 [inline]
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: __f2fs_ioctl+0x1370/0x3318 fs/f2fs/file.c:4151

which lock already depends on the new lock.

Chain exists of:
  &mm->mmap_lock --> &nm_i->nat_tree_lock --> &sbi->sb_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->sb_lock);
                               lock(&nm_i->nat_tree_lock);
                               lock(&sbi->sb_lock);
  lock(&mm->mmap_lock);

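Avoid the deadlock above by moving copy_to_user(), which may fault and take
mm->mmap_lock (lockdep annotates this in __might_fault()), out of the
sbi->sb_lock critical section: copy the salt into a local buffer under the
lock, and copy it to userspace after the lock is released.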
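A minimal userspace sketch of the pattern the patch applies: copy the data into a local buffer while holding the lock, release the lock, then perform the copy that may fault. It assumes a pthread mutex as a stand-in for sbi->sb_lock. fake_copy_to_user(), get_pwsalt(), raw_pw_salt and sb_lock_held are hypothetical names, and a NULL destination simulates a faulting user pointer.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define SALT_LEN 16

/* Hypothetical stand-ins for sbi->sb_lock and raw_super->encrypt_pw_salt. */
static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char raw_pw_salt[SALT_LEN];
static int sb_lock_held; /* lets the copy helper check the lock ordering rule */

/*
 * Hypothetical copy_to_user() analogue. In the kernel this may fault and
 * take mm->mmap_lock, so it must never run while sb_lock is held.
 */
static int fake_copy_to_user(unsigned char *dst, const unsigned char *src,
			     size_t len)
{
	assert(!sb_lock_held);
	if (!dst)
		return -EFAULT; /* simulate a faulting user pointer */
	memcpy(dst, src, len);
	return 0;
}

static int get_pwsalt(unsigned char *user_buf)
{
	unsigned char salt[SALT_LEN];

	pthread_mutex_lock(&sb_lock);
	sb_lock_held = 1;
	/* Only kernel-to-kernel copies happen under the lock. */
	memcpy(salt, raw_pw_salt, SALT_LEN);
	sb_lock_held = 0;
	pthread_mutex_unlock(&sb_lock);

	/* The potentially faulting copy happens after the unlock. */
	if (fake_copy_to_user(user_buf, salt, SALT_LEN))
		return -EFAULT;
	return 0;
}
```

The local buffer is only 16 bytes, so keeping it on the stack is cheap; the real function additionally has error paths that skip the final copy, which is why the patch tests `!err` before calling copy_to_user().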

Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Link: https://lore.kernel.org/linux-f2fs-devel/000000000000cd5fe305ef617fe2@google.com/T/#u
Reported-by: syzbot+4793f6096d174c90b4f7@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
---
 fs/f2fs/file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index cad4bdd6f097..4bc98dbe8292 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2336,6 +2336,7 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	u8 encrypt_pw_salt[16];
 	int err;
 
 	if (!f2fs_sb_has_encrypt(sbi))
@@ -2360,12 +2361,14 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg)
 		goto out_err;
 	}
 got_it:
-	if (copy_to_user((__u8 __user *)arg, sbi->raw_super->encrypt_pw_salt,
-									16))
-		err = -EFAULT;
+	memcpy(encrypt_pw_salt, sbi->raw_super->encrypt_pw_salt, 16);
 out_err:
 	f2fs_up_write(&sbi->sb_lock);
 	mnt_drop_write_file(filp);
+
+	if (!err && copy_to_user((__u8 __user *)arg, encrypt_pw_salt, 16))
+		err = -EFAULT;
+
 	return err;
 }
 
-- 
2.36.1



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-12-16 15:50 [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock Chao Yu
@ 2022-12-16 20:42 ` Eric Biggers
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Biggers @ 2022-12-16 20:42 UTC (permalink / raw)
  To: Chao Yu
  Cc: jaegeuk, syzbot+4793f6096d174c90b4f7, linux-kernel,
	linux-f2fs-devel

On Fri, Dec 16, 2022 at 11:50:00PM +0800, Chao Yu wrote:
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index cad4bdd6f097..4bc98dbe8292 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2336,6 +2336,7 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg)
>  {
>  	struct inode *inode = file_inode(filp);
>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	u8 encrypt_pw_salt[16];
>  	int err;
>  
>  	if (!f2fs_sb_has_encrypt(sbi))
> @@ -2360,12 +2361,14 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg)
>  		goto out_err;
>  	}
>  got_it:
> -	if (copy_to_user((__u8 __user *)arg, sbi->raw_super->encrypt_pw_salt,
> -									16))
> -		err = -EFAULT;
> +	memcpy(encrypt_pw_salt, sbi->raw_super->encrypt_pw_salt, 16);
>  out_err:
>  	f2fs_up_write(&sbi->sb_lock);
>  	mnt_drop_write_file(filp);
> +
> +	if (!err && copy_to_user((__u8 __user *)arg, encrypt_pw_salt, 16))
> +		err = -EFAULT;
> +
>  	return err;

Reviewed-by: Eric Biggers <ebiggers@google.com>

- Eric


* [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
@ 2025-10-14 11:47 Chao Yu via Linux-f2fs-devel
  0 siblings, 0 replies; 17+ messages in thread
From: Chao Yu via Linux-f2fs-devel @ 2025-10-14 11:47 UTC (permalink / raw)
  To: jaegeuk
  Cc: syzbot+14b90e1156b9f6fc1266, linux-kernel, linux-f2fs-devel,
	Jiaming Zhang, stable

As reported by Jiaming Zhang and syzbot, there is a potential deadlock in
f2fs, shown below:

Chain exists of:
  &sbi->cp_rwsem --> fs_reclaim --> sb_internal#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(sb_internal#2);
                               lock(fs_reclaim);
                               lock(sb_internal#2);
  rlock(&sbi->cp_rwsem);

 *** DEADLOCK ***

3 locks held by kswapd0/73:
 #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
 #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
 #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
 #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
 #2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890

stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
 check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
 __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
 lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
 down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
 f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
 f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
 f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
 f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
 f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
 f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
 evict+0x504/0x9c0 fs/inode.c:810
 f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
 evict+0x504/0x9c0 fs/inode.c:810
 dispose_list fs/inode.c:852 [inline]
 prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
 super_cache_scan+0x39b/0x4b0 fs/super.c:224
 do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
 shrink_slab_memcg mm/shrinker.c:550 [inline]
 shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
 shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
 shrink_many mm/vmscan.c:5016 [inline]
 lru_gen_shrink_node mm/vmscan.c:5094 [inline]
 shrink_node+0x315d/0x3780 mm/vmscan.c:6081
 kswapd_shrink_node mm/vmscan.c:6941 [inline]
 balance_pgdat mm/vmscan.c:7124 [inline]
 kswapd+0x147c/0x2800 mm/vmscan.c:7389
 kthread+0x70e/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

The root cause is a circular dependency among the four locks below:

kswapd
- fs_reclaim				--- Lock A
 - shrink_one
  - evict
   - f2fs_evict_inode
    - sb_start_intwrite			--- Lock B

- iput
 - evict
  - f2fs_evict_inode
   - sb_start_intwrite			--- Lock B
   - f2fs_truncate
    - f2fs_truncate_blocks
     - f2fs_do_truncate_blocks
      - f2fs_lock_op			--- Lock C

ioctl
- f2fs_ioc_commit_atomic_write
 - f2fs_lock_op				--- Lock C
  - __f2fs_commit_atomic_write
   - __replace_atomic_write_block
    - f2fs_get_dnode_of_data
     - __get_node_folio
      - f2fs_check_nid_range
       - f2fs_handle_error
        - f2fs_record_errors
         - f2fs_down_write		--- Lock D

open
- do_open
 - do_truncate
  - security_inode_need_killpriv
   - f2fs_getxattr
    - lookup_all_xattrs
     - f2fs_handle_error
      - f2fs_record_errors
       - f2fs_down_write		--- Lock D
        - f2fs_commit_super
         - read_mapping_folio
          - filemap_alloc_folio_noprof
           - prepare_alloc_pages
            - fs_reclaim_acquire	--- Lock A
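To see why lockdep flags this chain, the acquisitions can be modelled as a directed graph in which an edge X --> Y means "Y was taken while X was held"; a deadlock becomes possible once a new edge closes a cycle. The sketch below is a heavily simplified, hypothetical model (real lockdep also tracks read/write modes, IRQ state and lock classes). The mapping of fs_reclaim, sb_internal, cp_rwsem and sb_lock to Locks A to D follows the commit message; add_dep(), reachable() and would_deadlock() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Locks from the report, in the order A, B, C, D used above. */
enum lock_id { FS_RECLAIM, SB_INTERNAL, CP_RWSEM, SB_LOCK, NR_LOCKS };

/* dep[a][b] == true means "b was acquired while a was held" (a --> b). */
static bool dep[NR_LOCKS][NR_LOCKS];

static void add_dep(enum lock_id held, enum lock_id acquired)
{
	dep[held][acquired] = true;
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Taking 'acquired' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'acquired'.
 */
static bool would_deadlock(enum lock_id held, enum lock_id acquired)
{
	bool seen[NR_LOCKS] = { false };

	return reachable(acquired, held, seen);
}
```

With the kswapd, iput and ioctl edges recorded (A --> B --> C --> D), the open path's D --> A acquisition is reported as a cycle. Removing any single edge breaks it, which is what the patch does for C --> D by no longer taking sb_lock inside f2fs_handle_error().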

To avoid this deadlock, f2fs_handle_error() must not grab sb_lock, so switch
it to the asynchronous method:
- remove the synchronous f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- use f2fs_handle_error() at all call sites
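The asynchronous scheme can be sketched as follows, assuming the error bits are saved under a small lock and the superblock commit is deferred to a worker that runs without other f2fs locks held. This is a hypothetical userspace model, not the f2fs implementation: pthread mutexes replace the error_lock spinlock and the sb_lock rwsem, a work_pending flag replaces a real work item, and a counter replaces f2fs_commit_super(). The update logic mirrors the f2fs_update_errors() code removed by this patch; struct sb_info, handle_error() and error_work() are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ERRORS 16

/* Hypothetical, simplified stand-ins for the fields used by the patch. */
struct sb_info {
	pthread_mutex_t error_lock;           /* stands in for the spinlock */
	pthread_mutex_t sb_lock;              /* stands in for the rwsem */
	unsigned char errors[MAX_ERRORS];     /* in-memory error flags */
	unsigned char raw_errors[MAX_ERRORS]; /* superblock copy */
	bool error_dirty;
	bool work_pending;                    /* stands in for queued work */
	int commits;                          /* superblock commits issued */
};

static void sb_info_init(struct sb_info *sbi)
{
	memset(sbi, 0, sizeof(*sbi));
	pthread_mutex_init(&sbi->error_lock, NULL);
	pthread_mutex_init(&sbi->sb_lock, NULL);
}

/* Fast path: safe with other locks held because it never takes sb_lock. */
static void handle_error(struct sb_info *sbi, unsigned char error)
{
	assert(error < MAX_ERRORS);
	pthread_mutex_lock(&sbi->error_lock);
	if (!sbi->errors[error]) {
		sbi->errors[error] = 1;
		sbi->error_dirty = true;
	}
	if (sbi->error_dirty)
		sbi->work_pending = true; /* schedule the deferred commit */
	pthread_mutex_unlock(&sbi->error_lock);
}

/* Deferred path: takes sb_lock first, then error_lock, and commits. */
static void error_work(struct sb_info *sbi)
{
	bool need_update = false;

	pthread_mutex_lock(&sbi->sb_lock);
	pthread_mutex_lock(&sbi->error_lock);
	sbi->work_pending = false;
	if (sbi->error_dirty) {
		memcpy(sbi->raw_errors, sbi->errors, MAX_ERRORS);
		sbi->error_dirty = false;
		need_update = true;
	}
	pthread_mutex_unlock(&sbi->error_lock);
	if (need_update)
		sbi->commits++; /* the superblock write would happen here */
	pthread_mutex_unlock(&sbi->sb_lock);
}
```

The trade-off is that the on-disk error record lags behind the in-memory state until the worker runs, in exchange for keeping sb_lock (and the memory allocation inside the superblock commit) out of every caller's lock context.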

Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
---
Please note that this patch is based on the following patch:
f2fs: fix to do sanity check on node footer in {read,write}_end_io

 fs/f2fs/compress.c |  5 +----
 fs/f2fs/f2fs.h     |  1 -
 fs/f2fs/node.c     |  6 ++----
 fs/f2fs/super.c    | 41 -----------------------------------------
 4 files changed, 3 insertions(+), 50 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 6ad8d3bc6df7..811bfe38e5c0 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -759,10 +759,7 @@ void f2fs_decompress_cluster(struct decompress_io_ctx *dic, bool in_task)
 		ret = -EFSCORRUPTED;
 
 		/* Avoid f2fs_commit_super in irq context */
-		if (!in_task)
-			f2fs_handle_error_async(sbi, ERROR_FAIL_DECOMPRESSION);
-		else
-			f2fs_handle_error(sbi, ERROR_FAIL_DECOMPRESSION);
+		f2fs_handle_error(sbi, ERROR_FAIL_DECOMPRESSION);
 		goto out_release;
 	}
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c589aed069d9..c3e968787d7e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3815,7 +3815,6 @@ void f2fs_quota_off_umount(struct super_block *sb);
 void f2fs_save_errors(struct f2fs_sb_info *sbi, unsigned char flag);
 void f2fs_handle_critical_error(struct f2fs_sb_info *sbi, unsigned char reason);
 void f2fs_handle_error(struct f2fs_sb_info *sbi, unsigned char error);
-void f2fs_handle_error_async(struct f2fs_sb_info *sbi, unsigned char error);
 int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover);
 int f2fs_sync_fs(struct super_block *sb, int sync);
 int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e70c970a3047..ce471e033774 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1550,10 +1550,8 @@ int f2fs_sanity_check_node_footer(struct f2fs_sb_info *sbi,
 		ntype, nid, nid_of_node(folio), ino_of_node(folio),
 		ofs_of_node(folio), cpver_of_node(folio),
 		next_blkaddr_of_node(folio));
-	if (in_irq)
-		f2fs_handle_error_async(sbi, ERROR_INCONSISTENT_FOOTER);
-	else
-		f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
+
+	f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
 	return -EFSCORRUPTED;
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2ae341768a39..1dd4a93ba4bb 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4544,48 +4544,7 @@ void f2fs_save_errors(struct f2fs_sb_info *sbi, unsigned char flag)
 	spin_unlock_irqrestore(&sbi->error_lock, flags);
 }
 
-static bool f2fs_update_errors(struct f2fs_sb_info *sbi)
-{
-	unsigned long flags;
-	bool need_update = false;
-
-	spin_lock_irqsave(&sbi->error_lock, flags);
-	if (sbi->error_dirty) {
-		memcpy(F2FS_RAW_SUPER(sbi)->s_errors, sbi->errors,
-							MAX_F2FS_ERRORS);
-		sbi->error_dirty = false;
-		need_update = true;
-	}
-	spin_unlock_irqrestore(&sbi->error_lock, flags);
-
-	return need_update;
-}
-
-static void f2fs_record_errors(struct f2fs_sb_info *sbi, unsigned char error)
-{
-	int err;
-
-	f2fs_down_write(&sbi->sb_lock);
-
-	if (!f2fs_update_errors(sbi))
-		goto out_unlock;
-
-	err = f2fs_commit_super(sbi, false);
-	if (err)
-		f2fs_err_ratelimited(sbi,
-			"f2fs_commit_super fails to record errors:%u, err:%d",
-			error, err);
-out_unlock:
-	f2fs_up_write(&sbi->sb_lock);
-}
-
 void f2fs_handle_error(struct f2fs_sb_info *sbi, unsigned char error)
-{
-	f2fs_save_errors(sbi, error);
-	f2fs_record_errors(sbi, error);
-}
-
-void f2fs_handle_error_async(struct f2fs_sb_info *sbi, unsigned char error)
 {
 	f2fs_save_errors(sbi, error);
 
-- 
2.49.0



* [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
@ 2022-01-27  5:44 Chao Yu
  2022-01-27 21:59 ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-01-27  5:44 UTC (permalink / raw)
  To: jaegeuk; +Cc: Zhiguo Niu, Jing Xia, linux-kernel, linux-f2fs-devel

Quoting Jing Xia's report, a deadlock may happen between the writeback
kworker and checkpoint as shown below:

[T:writeback]				[T:checkpoint]
- wb_writeback
 - blk_start_plug
a bio containing NodeA is plugged in the writeback thread
					- do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
					 - f2fs_write_data_pages
					  - f2fs_write_single_data_page -- write last dirty page
					   - f2fs_do_write_data_page
					    - set_page_writeback  -- clear page dirty flag and
					    PAGECACHE_TAG_DIRTY tag in radix tree
					    - f2fs_outplace_write_data
					     - f2fs_update_data_blkaddr
					      - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
					   - inode_dec_dirty_pages
 - writeback_sb_inodes
  - writeback_single_inode
   - do_writepages
    - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
     - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
  - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero pages_skipped
 - blk_finish_plug

Avoid this deadlock by forcing submission of the previously plugged bio via
blk_finish_plug(current->plug) once writepages() skips writeback because
sbi->wb_sync_req[DATA/NODE] is non-zero.
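The effect of the fix can be modelled with a per-task plug that holds bios back until it is flushed. In this hypothetical sketch, flush_plug() plays the role of blk_finish_plug() and writepages_nonsync() mirrors the else-branch the patch adds to __f2fs_write_data_pages() and f2fs_write_node_pages(): before skipping writeback because a WB_SYNC_ALL writer is active, it submits whatever the current task has plugged, so a checkpoint waiting on NodeA can make progress. struct plug, plug_add(), is_submitted() and the integer bio IDs are illustrative assumptions, and the rest of the block layer's plugging logic is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PLUGGED 8
#define MAX_SUBMITTED 64

/* Hypothetical per-task plug: queued bios are not yet visible to the device. */
struct plug {
	int pending[MAX_PLUGGED];
	int nr_pending;
};

/* Bios that have actually been submitted. */
static int submitted[MAX_SUBMITTED];
static int nr_submitted;

static void plug_add(struct plug *plug, int bio_id)
{
	assert(plug->nr_pending < MAX_PLUGGED);
	plug->pending[plug->nr_pending++] = bio_id;
}

/* Stand-in for blk_finish_plug(): submit everything held in the plug. */
static void flush_plug(struct plug *plug)
{
	for (int i = 0; i < plug->nr_pending; i++) {
		assert(nr_submitted < MAX_SUBMITTED);
		submitted[nr_submitted++] = plug->pending[i];
	}
	plug->nr_pending = 0;
}

static bool is_submitted(int bio_id)
{
	for (int i = 0; i < nr_submitted; i++)
		if (submitted[i] == bio_id)
			return true;
	return false;
}

/*
 * Mirrors the patched non-WB_SYNC_ALL branch: when a sync writer is active
 * (wb_sync_req != 0), flush the caller's plug before skipping, so a page
 * the sync writer waits on cannot stay stuck in this task's plug.
 * Returns true if writeback was skipped.
 */
static bool writepages_nonsync(struct plug *current_plug, int wb_sync_req)
{
	if (wb_sync_req) {
		if (current_plug)
			flush_plug(current_plug);
		return true;
	}
	/* Normal (non-skipped) writeback is not modelled here. */
	return false;
}
```

Without the flush, NodeA (ID 42 in the usage below) would remain in the kworker's plug while the inode is requeued, and the checkpoint thread waiting on its writeback would never be woken.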

Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
---
 fs/f2fs/data.c | 6 +++++-
 fs/f2fs/node.c | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 76d6fe7b0c8f..932a4c81acaf 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
 	/* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		atomic_inc(&sbi->wb_sync_req[DATA]);
-	else if (atomic_read(&sbi->wb_sync_req[DATA]))
+	else if (atomic_read(&sbi->wb_sync_req[DATA])) {
+		/* to avoid potential deadlock */
+		if (current->plug)
+			blk_finish_plug(current->plug);
 		goto skip_write;
+	}
 
 	if (__should_serialize_io(inode, wbc)) {
 		mutex_lock(&sbi->writepages);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 556fcd8457f3..69c6bcaf5aae 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
 
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		atomic_inc(&sbi->wb_sync_req[NODE]);
-	else if (atomic_read(&sbi->wb_sync_req[NODE]))
+	else if (atomic_read(&sbi->wb_sync_req[NODE])) {
+		/* to avoid potential deadlock */
+		if (current->plug)
+			blk_finish_plug(current->plug);
 		goto skip_write;
+	}
 
 	trace_f2fs_writepages(mapping->host, wbc, NODE);
 
-- 
2.32.0



* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-01-27  5:44 Chao Yu
@ 2022-01-27 21:59 ` Jaegeuk Kim
  2022-01-28  1:43   ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2022-01-27 21:59 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 01/27, Chao Yu wrote:
> Quoted from Jing Xia's report, there is a potential deadlock may happen
> between kworker and checkpoint as below:
> 
> [T:writeback]				[T:checkpoint]
> - wb_writeback
>  - blk_start_plug
> bio contains NodeA was plugged in writeback threads

I'm still trying to understand more precisely. So, how is it possible to
have bio having node write in this current context?



* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-01-27 21:59 ` Jaegeuk Kim
@ 2022-01-28  1:43   ` Chao Yu
  2022-01-29  0:37     ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-01-28  1:43 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 2022/1/28 5:59, Jaegeuk Kim wrote:
> On 01/27, Chao Yu wrote:
>> Quoted from Jing Xia's report, there is a potential deadlock may happen
>> between kworker and checkpoint as below:
>>
>> [T:writeback]				[T:checkpoint]
>> - wb_writeback
>>   - blk_start_plug
>> bio contains NodeA was plugged in writeback threads
> 
> I'm still trying to understand more precisely. So, how is it possible to
> have bio having node write in this current context?

IMO, after the blk_start_plug() above, the kworker may plug some inode's node
page while writing back node_inode's data pages (which are node pages)?

Thanks,



* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-01-28  1:43   ` Chao Yu
@ 2022-01-29  0:37     ` Jaegeuk Kim
  2022-01-29  1:48       ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2022-01-29  0:37 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 01/28, Chao Yu wrote:
> On 2022/1/28 5:59, Jaegeuk Kim wrote:
> > On 01/27, Chao Yu wrote:
> > > Quoted from Jing Xia's report, there is a potential deadlock may happen
> > > between kworker and checkpoint as below:
> > > 
> > > [T:writeback]				[T:checkpoint]
> > > - wb_writeback
> > >   - blk_start_plug
> > > bio contains NodeA was plugged in writeback threads
> > 
> > I'm still trying to understand more precisely. So, how is it possible to
> > have bio having node write in this current context?
> 
> IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
> during writebacking node_inode's data page (which should be node page)?

Wasn't that added into a different task->plug?



* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-01-29  0:37     ` Jaegeuk Kim
@ 2022-01-29  1:48       ` Chao Yu
  2022-02-03  1:51         ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-01-29  1:48 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 2022/1/29 8:37, Jaegeuk Kim wrote:
> On 01/28, Chao Yu wrote:
>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>> On 01/27, Chao Yu wrote:
>>>> Quoted from Jing Xia's report, there is a potential deadlock may happen
>>>> between kworker and checkpoint as below:
>>>>
>>>> [T:writeback]				[T:checkpoint]
>>>> - wb_writeback
>>>>    - blk_start_plug
>>>> bio contains NodeA was plugged in writeback threads
>>>
>>> I'm still trying to understand more precisely. So, how is it possible to
>>> have bio having node write in this current context?
>>
>> IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
>> during writebacking node_inode's data page (which should be node page)?
> 
> Wasn't that added into a different task->plug?

I'm not sure I understand your concern correctly...

Do you mean NodeA and other IOs from do_writepages() were plugged in
different local plug variables?

Thanks,



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-01-29  1:48       ` Chao Yu
@ 2022-02-03  1:51         ` Jaegeuk Kim
  2022-02-03 14:57           ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2022-02-03  1:51 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 01/29, Chao Yu wrote:
> On 2022/1/29 8:37, Jaegeuk Kim wrote:
> > On 01/28, Chao Yu wrote:
> > > On 2022/1/28 5:59, Jaegeuk Kim wrote:
> > > > On 01/27, Chao Yu wrote:
> > > > > According to Jing Xia's report, a potential deadlock may happen
> > > > > between kworker and checkpoint as below:
> > > > > 
> > > > > [T:writeback]				[T:checkpoint]
> > > > > - wb_writeback
> > > > >    - blk_start_plug
> > > > > the bio containing NodeA was plugged in the writeback thread
> > > > 
> > > > I'm still trying to understand this more precisely. So, how is it possible to
> > > > have a bio carrying a node write in this context?
> > > 
> > > IMO, after the above blk_start_plug(), the kworker may plug some inode's node page
> > > while writing back node_inode's data pages (which are node pages)?
> > 
> > Wasn't that added into a different task->plug?
> 
> I'm not sure I've got your concern correctly...
> 
> Do you mean NodeA and other IOs from do_writepages() were plugged in
> different local plug variables?

I think so.

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 					- do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
> > > > > 					 - f2fs_write_data_pages
> > > > > 					  - f2fs_write_single_data_page -- write last dirty page
> > > > > 					   - f2fs_do_write_data_page
> > > > > 					    - set_page_writeback  -- clear page dirty flag and
> > > > > 					    PAGECACHE_TAG_DIRTY tag in radix tree
> > > > > 					    - f2fs_outplace_write_data
> > > > > 					     - f2fs_update_data_blkaddr
> > > > > 					      - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
> > > > > 					   - inode_dec_dirty_pages
> > > > >    - writeback_sb_inodes
> > > > >     - writeback_single_inode
> > > > >      - do_writepages
> > > > >       - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
> > > > >        - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
> > > > >     - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
> > > > >    - blk_finish_plug
> > > > > 
> > > > > Let's try to avoid the deadlock by forcibly unplugging the previous bio via
> > > > > blk_finish_plug(current->plug) once we've skipped writeback in writepages()
> > > > > due to a non-zero sbi->wb_sync_req[DATA/NODE].
> > > > > 
> > > > > Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
> > > > > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > > > > Signed-off-by: Jing Xia <jing.xia@unisoc.com>
> > > > > Signed-off-by: Chao Yu <chao@kernel.org>
> > > > > ---
> > > > >    fs/f2fs/data.c | 6 +++++-
> > > > >    fs/f2fs/node.c | 6 +++++-
> > > > >    2 files changed, 10 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > index 76d6fe7b0c8f..932a4c81acaf 100644
> > > > > --- a/fs/f2fs/data.c
> > > > > +++ b/fs/f2fs/data.c
> > > > > @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> > > > >    	/* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
> > > > >    	if (wbc->sync_mode == WB_SYNC_ALL)
> > > > >    		atomic_inc(&sbi->wb_sync_req[DATA]);
> > > > > -	else if (atomic_read(&sbi->wb_sync_req[DATA]))
> > > > > +	else if (atomic_read(&sbi->wb_sync_req[DATA])) {
> > > > > +		/* to avoid potential deadlock */
> > > > > +		if (current->plug)
> > > > > +			blk_finish_plug(current->plug);
> > > > >    		goto skip_write;
> > > > > +	}
> > > > >    	if (__should_serialize_io(inode, wbc)) {
> > > > >    		mutex_lock(&sbi->writepages);
> > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > index 556fcd8457f3..69c6bcaf5aae 100644
> > > > > --- a/fs/f2fs/node.c
> > > > > +++ b/fs/f2fs/node.c
> > > > > @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
> > > > >    	if (wbc->sync_mode == WB_SYNC_ALL)
> > > > >    		atomic_inc(&sbi->wb_sync_req[NODE]);
> > > > > -	else if (atomic_read(&sbi->wb_sync_req[NODE]))
> > > > > +	else if (atomic_read(&sbi->wb_sync_req[NODE])) {
> > > > > +		/* to avoid potential deadlock */
> > > > > +		if (current->plug)
> > > > > +			blk_finish_plug(current->plug);
> > > > >    		goto skip_write;
> > > > > +	}
> > > > >    	trace_f2fs_writepages(mapping->host, wbc, NODE);
> > > > > -- 
> > > > > 2.32.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-02-03  1:51         ` Jaegeuk Kim
@ 2022-02-03 14:57           ` Chao Yu
  2022-02-25  3:02             ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-02-03 14:57 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Jing Xia, Zhiguo Niu, linux-kernel, linux-f2fs-devel

On 2022/2/3 9:51, Jaegeuk Kim wrote:
> On 01/29, Chao Yu wrote:
>> On 2022/1/29 8:37, Jaegeuk Kim wrote:
>>> On 01/28, Chao Yu wrote:
>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>>>> On 01/27, Chao Yu wrote:
>>>>>> According to Jing Xia's report, a potential deadlock may happen
>>>>>> between kworker and checkpoint as below:
>>>>>>
>>>>>> [T:writeback]				[T:checkpoint]
>>>>>> - wb_writeback
>>>>>>     - blk_start_plug
>>>>>> the bio containing NodeA was plugged in the writeback thread
>>>>>
>>>>> I'm still trying to understand this more precisely. So, how is it possible to
>>>>> have a bio carrying a node write in this context?
>>>>
>>>> IMO, after the above blk_start_plug(), the kworker may plug some inode's node page
>>>> while writing back node_inode's data pages (which are node pages)?
>>>
>>> Wasn't that added into a different task->plug?
>>
>> I'm not sure I've got your concern correctly...
>>
>> Do you mean NodeA and other IOs from do_writepages() were plugged in
>> different local plug variables?
> 
> I think so.

I guess the block plug helper doesn't allow nested plugs, so there
is only one plug in the kworker thread?

void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
{
	struct task_struct *tsk = current;

	/*
	 * If this is a nested plug, don't actually assign it.
	 */
	if (tsk->plug)
		return;
...
}

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>> 					- do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
>>>>>> 					 - f2fs_write_data_pages
>>>>>> 					  - f2fs_write_single_data_page -- write last dirty page
>>>>>> 					   - f2fs_do_write_data_page
>>>>>> 					    - set_page_writeback  -- clear page dirty flag and
>>>>>> 					    PAGECACHE_TAG_DIRTY tag in radix tree
>>>>>> 					    - f2fs_outplace_write_data
>>>>>> 					     - f2fs_update_data_blkaddr
>>>>>> 					      - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
>>>>>> 					   - inode_dec_dirty_pages
>>>>>>     - writeback_sb_inodes
>>>>>>      - writeback_single_inode
>>>>>>       - do_writepages
>>>>>>        - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
>>>>>>         - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
>>>>>>      - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
>>>>>>     - blk_finish_plug
>>>>>>
>>>>>> Let's try to avoid the deadlock by forcibly unplugging the previous bio via
>>>>>> blk_finish_plug(current->plug) once we've skipped writeback in writepages()
>>>>>> due to a non-zero sbi->wb_sync_req[DATA/NODE].
>>>>>>
>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
>>>>>> Signed-off-by: Jing Xia <jing.xia@unisoc.com>
>>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>>> ---
>>>>>>     fs/f2fs/data.c | 6 +++++-
>>>>>>     fs/f2fs/node.c | 6 +++++-
>>>>>>     2 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644
>>>>>> --- a/fs/f2fs/data.c
>>>>>> +++ b/fs/f2fs/data.c
>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>     	/* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
>>>>>>     	if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>     		atomic_inc(&sbi->wb_sync_req[DATA]);
>>>>>> -	else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>> +	else if (atomic_read(&sbi->wb_sync_req[DATA])) {
>>>>>> +		/* to avoid potential deadlock */
>>>>>> +		if (current->plug)
>>>>>> +			blk_finish_plug(current->plug);
>>>>>>     		goto skip_write;
>>>>>> +	}
>>>>>>     	if (__should_serialize_io(inode, wbc)) {
>>>>>>     		mutex_lock(&sbi->writepages);
>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644
>>>>>> --- a/fs/f2fs/node.c
>>>>>> +++ b/fs/f2fs/node.c
>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>>>>>>     	if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>     		atomic_inc(&sbi->wb_sync_req[NODE]);
>>>>>> -	else if (atomic_read(&sbi->wb_sync_req[NODE]))
>>>>>> +	else if (atomic_read(&sbi->wb_sync_req[NODE])) {
>>>>>> +		/* to avoid potential deadlock */
>>>>>> +		if (current->plug)
>>>>>> +			blk_finish_plug(current->plug);
>>>>>>     		goto skip_write;
>>>>>> +	}
>>>>>>     	trace_f2fs_writepages(mapping->host, wbc, NODE);
>>>>>> -- 
>>>>>> 2.32.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-02-03 14:57           ` Chao Yu
@ 2022-02-25  3:02             ` Chao Yu
  2022-03-02  3:32               ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-02-25  3:02 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Jing Xia, linux-f2fs-devel, Zhiguo Niu, linux-kernel

On 2022/2/3 22:57, Chao Yu wrote:
> On 2022/2/3 9:51, Jaegeuk Kim wrote:
>> On 01/29, Chao Yu wrote:
>>> On 2022/1/29 8:37, Jaegeuk Kim wrote:
>>>> On 01/28, Chao Yu wrote:
>>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>>>>> On 01/27, Chao Yu wrote:
>>>>>>> According to Jing Xia's report, a potential deadlock may happen
>>>>>>> between kworker and checkpoint as below:
>>>>>>>
>>>>>>> [T:writeback]                [T:checkpoint]
>>>>>>> - wb_writeback
>>>>>>>     - blk_start_plug
>>>>>>> the bio containing NodeA was plugged in the writeback thread
>>>>>>
>>>>>> I'm still trying to understand this more precisely. So, how is it possible to
>>>>>> have a bio carrying a node write in this context?
>>>>>
>>>>> IMO, after the above blk_start_plug(), the kworker may plug some inode's node page
>>>>> while writing back node_inode's data pages (which are node pages)?
>>>>
>>>> Wasn't that added into a different task->plug?
>>>
>>> I'm not sure I've got your concern correctly...
>>>
>>> Do you mean NodeA and other IOs from do_writepages() were plugged in
>>> different local plug variables?
>>
>> I think so.
> 
> I guess the block plug helper doesn't allow nested plugs, so there
> is only one plug in the kworker thread?
> 
> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> {
>      struct task_struct *tsk = current;
> 
>      /*
>       * If this is a nested plug, don't actually assign it.
>       */
>      if (tsk->plug)
>          return;
> ...
> }

Any further comments?

Thanks,

> 
> Thanks,
> 
>>
>>>
>>> Thanks,
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>>>                     - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
>>>>>>>                      - f2fs_write_data_pages
>>>>>>>                       - f2fs_write_single_data_page -- write last dirty page
>>>>>>>                        - f2fs_do_write_data_page
>>>>>>>                         - set_page_writeback  -- clear page dirty flag and
>>>>>>>                         PAGECACHE_TAG_DIRTY tag in radix tree
>>>>>>>                         - f2fs_outplace_write_data
>>>>>>>                          - f2fs_update_data_blkaddr
>>>>>>>                           - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
>>>>>>>                        - inode_dec_dirty_pages
>>>>>>>     - writeback_sb_inodes
>>>>>>>      - writeback_single_inode
>>>>>>>       - do_writepages
>>>>>>>        - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
>>>>>>>         - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
>>>>>>>      - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
>>>>>>>     - blk_finish_plug
>>>>>>>
>>>>>>> Let's try to avoid the deadlock by forcibly unplugging the previous bio via
>>>>>>> blk_finish_plug(current->plug) once we've skipped writeback in writepages()
>>>>>>> due to a non-zero sbi->wb_sync_req[DATA/NODE].
>>>>>>>
>>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
>>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
>>>>>>> Signed-off-by: Jing Xia <jing.xia@unisoc.com>
>>>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>>>> ---
>>>>>>>     fs/f2fs/data.c | 6 +++++-
>>>>>>>     fs/f2fs/node.c | 6 +++++-
>>>>>>>     2 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644
>>>>>>> --- a/fs/f2fs/data.c
>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>         /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
>>>>>>>         if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>             atomic_inc(&sbi->wb_sync_req[DATA]);
>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
>>>>>>> +        /* to avoid potential deadlock */
>>>>>>> +        if (current->plug)
>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>             goto skip_write;
>>>>>>> +    }
>>>>>>>         if (__should_serialize_io(inode, wbc)) {
>>>>>>>             mutex_lock(&sbi->writepages);
>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644
>>>>>>> --- a/fs/f2fs/node.c
>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>>>>>>>         if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>             atomic_inc(&sbi->wb_sync_req[NODE]);
>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
>>>>>>> +        /* to avoid potential deadlock */
>>>>>>> +        if (current->plug)
>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>             goto skip_write;
>>>>>>> +    }
>>>>>>>         trace_f2fs_writepages(mapping->host, wbc, NODE);
>>>>>>> -- 
>>>>>>> 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-02-25  3:02             ` Chao Yu
@ 2022-03-02  3:32               ` Chao Yu
  2022-03-02  5:26                 ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-03-02  3:32 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

ping,

On 2022/2/25 11:02, Chao Yu wrote:
> On 2022/2/3 22:57, Chao Yu wrote:
>> On 2022/2/3 9:51, Jaegeuk Kim wrote:
>>> On 01/29, Chao Yu wrote:
>>>> On 2022/1/29 8:37, Jaegeuk Kim wrote:
>>>>> On 01/28, Chao Yu wrote:
>>>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>>>>>> On 01/27, Chao Yu wrote:
>>>>>>>> According to Jing Xia's report, a potential deadlock may happen
>>>>>>>> between kworker and checkpoint as below:
>>>>>>>>
>>>>>>>> [T:writeback]                [T:checkpoint]
>>>>>>>> - wb_writeback
>>>>>>>>     - blk_start_plug
>>>>>>>> the bio containing NodeA was plugged in the writeback thread
>>>>>>>
>>>>>>> I'm still trying to understand this more precisely. So, how is it possible to
>>>>>>> have a bio carrying a node write in this context?
>>>>>>
>>>>>> IMO, after the above blk_start_plug(), the kworker may plug some inode's node page
>>>>>> while writing back node_inode's data pages (which are node pages)?
>>>>>
>>>>> Wasn't that added into a different task->plug?
>>>>
>>>> I'm not sure I've got your concern correctly...
>>>>
>>>> Do you mean NodeA and other IOs from do_writepages() were plugged in
>>>> different local plug variables?
>>>
>>> I think so.
>>
>> I guess the block plug helper doesn't allow nested plugs, so there
>> is only one plug in the kworker thread?
>>
>> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>> {
>>      struct task_struct *tsk = current;
>>
>>      /*
>>       * If this is a nested plug, don't actually assign it.
>>       */
>>      if (tsk->plug)
>>          return;
>> ...
>> }
> 
> Any further comments?
> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>                     - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
>>>>>>>>                      - f2fs_write_data_pages
>>>>>>>>                       - f2fs_write_single_data_page -- write last dirty page
>>>>>>>>                        - f2fs_do_write_data_page
>>>>>>>>                         - set_page_writeback  -- clear page dirty flag and
>>>>>>>>                         PAGECACHE_TAG_DIRTY tag in radix tree
>>>>>>>>                         - f2fs_outplace_write_data
>>>>>>>>                          - f2fs_update_data_blkaddr
>>>>>>>>                           - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
>>>>>>>>                        - inode_dec_dirty_pages
>>>>>>>>     - writeback_sb_inodes
>>>>>>>>      - writeback_single_inode
>>>>>>>>       - do_writepages
>>>>>>>>        - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
>>>>>>>>         - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
>>>>>>>>      - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
>>>>>>>>     - blk_finish_plug
>>>>>>>>
>>>>>>>> Let's try to avoid the deadlock by forcibly unplugging the previous bio via
>>>>>>>> blk_finish_plug(current->plug) once we've skipped writeback in writepages()
>>>>>>>> due to a non-zero sbi->wb_sync_req[DATA/NODE].
>>>>>>>>
>>>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
>>>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
>>>>>>>> Signed-off-by: Jing Xia <jing.xia@unisoc.com>
>>>>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>>>>> ---
>>>>>>>>     fs/f2fs/data.c | 6 +++++-
>>>>>>>>     fs/f2fs/node.c | 6 +++++-
>>>>>>>>     2 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644
>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>>         /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
>>>>>>>>         if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>             atomic_inc(&sbi->wb_sync_req[DATA]);
>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>> +        if (current->plug)
>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>             goto skip_write;
>>>>>>>> +    }
>>>>>>>>         if (__should_serialize_io(inode, wbc)) {
>>>>>>>>             mutex_lock(&sbi->writepages);
>>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644
>>>>>>>> --- a/fs/f2fs/node.c
>>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>>>>>>>>         if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>             atomic_inc(&sbi->wb_sync_req[NODE]);
>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>> +        if (current->plug)
>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>             goto skip_write;
>>>>>>>> +    }
>>>>>>>>         trace_f2fs_writepages(mapping->host, wbc, NODE);
>>>>>>>> -- 
>>>>>>>> 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-03-02  3:32               ` Chao Yu
@ 2022-03-02  5:26                 ` Jaegeuk Kim
  2022-03-02  8:14                   ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2022-03-02  5:26 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

On 03/02, Chao Yu wrote:
> ping,
> 
> On 2022/2/25 11:02, Chao Yu wrote:
> > On 2022/2/3 22:57, Chao Yu wrote:
> > > On 2022/2/3 9:51, Jaegeuk Kim wrote:
> > > > On 01/29, Chao Yu wrote:
> > > > > On 2022/1/29 8:37, Jaegeuk Kim wrote:
> > > > > > On 01/28, Chao Yu wrote:
> > > > > > > On 2022/1/28 5:59, Jaegeuk Kim wrote:
> > > > > > > > On 01/27, Chao Yu wrote:
> > > > > > > > > According to Jing Xia's report, a potential deadlock may happen
> > > > > > > > > between kworker and checkpoint as below:
> > > > > > > > > 
> > > > > > > > > [T:writeback]                [T:checkpoint]
> > > > > > > > > - wb_writeback
> > > > > > > > >     - blk_start_plug
> > > > > > > > > the bio containing NodeA was plugged in the writeback thread
> > > > > > > > 
> > > > > > > > I'm still trying to understand this more precisely. So, how is it possible to
> > > > > > > > have a bio carrying a node write in this context?
> > > > > > > 
> > > > > > > IMO, after the above blk_start_plug(), the kworker may plug some inode's node page
> > > > > > > while writing back node_inode's data pages (which are node pages)?
> > > > > > 
> > > > > > Wasn't that added into a different task->plug?
> > > > > 
> > > > > I'm not sure I've got your concern correctly...
> > > > > 
> > > > > Do you mean NodeA and other IOs from do_writepages() were plugged in
> > > > > different local plug variables?
> > > > 
> > > > I think so.
> > > 
> > > I guess the block plug helper doesn't allow nested plugs, so there
> > > is only one plug in the kworker thread?

Is there only one kworker thread that flushes node and inode pages?

> > > 
> > > void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> > > {
> > >      struct task_struct *tsk = current;
> > > 
> > >      /*
> > >       * If this is a nested plug, don't actually assign it.
> > >       */
> > >      if (tsk->plug)
> > >          return;
> > > ...
> > > }
> > 
> > Any further comments?
> > 
> > Thanks,
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > > 
> > > > > > > > >                     - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
> > > > > > > > >                      - f2fs_write_data_pages
> > > > > > > > >                       - f2fs_write_single_data_page -- write last dirty page
> > > > > > > > >                        - f2fs_do_write_data_page
> > > > > > > > >                         - set_page_writeback  -- clear page dirty flag and
> > > > > > > > >                         PAGECACHE_TAG_DIRTY tag in radix tree
> > > > > > > > >                         - f2fs_outplace_write_data
> > > > > > > > >                          - f2fs_update_data_blkaddr
> > > > > > > > >                           - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
> > > > > > > > >                        - inode_dec_dirty_pages
> > > > > > > > >     - writeback_sb_inodes
> > > > > > > > >      - writeback_single_inode
> > > > > > > > >       - do_writepages
> > > > > > > > >        - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
> > > > > > > > >         - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
> > > > > > > > >      - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
> > > > > > > > >     - blk_finish_plug
> > > > > > > > > 
> > > > > > > > > Let's try to avoid the deadlock by forcibly unplugging the previous bio via
> > > > > > > > > blk_finish_plug(current->plug) once we've skipped writeback in writepages()
> > > > > > > > > due to a non-zero sbi->wb_sync_req[DATA/NODE].
> > > > > > > > > 
> > > > > > > > > Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
> > > > > > > > > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > > > > > > > > Signed-off-by: Jing Xia <jing.xia@unisoc.com>
> > > > > > > > > Signed-off-by: Chao Yu <chao@kernel.org>
> > > > > > > > > ---
> > > > > > > > >     fs/f2fs/data.c | 6 +++++-
> > > > > > > > >     fs/f2fs/node.c | 6 +++++-
> > > > > > > > >     2 files changed, 10 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > > > > > index 76d6fe7b0c8f..932a4c81acaf 100644
> > > > > > > > > --- a/fs/f2fs/data.c
> > > > > > > > > +++ b/fs/f2fs/data.c
> > > > > > > > > @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> > > > > > > > >         /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
> > > > > > > > >         if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > >             atomic_inc(&sbi->wb_sync_req[DATA]);
> > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
> > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
> > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > +        if (current->plug)
> > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > >             goto skip_write;
> > > > > > > > > +    }
> > > > > > > > >         if (__should_serialize_io(inode, wbc)) {
> > > > > > > > >             mutex_lock(&sbi->writepages);
> > > > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > > > > > index 556fcd8457f3..69c6bcaf5aae 100644
> > > > > > > > > --- a/fs/f2fs/node.c
> > > > > > > > > +++ b/fs/f2fs/node.c
> > > > > > > > > @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
> > > > > > > > >         if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > >             atomic_inc(&sbi->wb_sync_req[NODE]);
> > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
> > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
> > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > +        if (current->plug)
> > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > >             goto skip_write;
> > > > > > > > > +    }
> > > > > > > > >         trace_f2fs_writepages(mapping->host, wbc, NODE);
> > > > > > > > > -- 
> > > > > > > > > 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-03-02  5:26                 ` Jaegeuk Kim
@ 2022-03-02  8:14                   ` Chao Yu
  2022-03-02 19:45                     ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-03-02  8:14 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

On 2022/3/2 13:26, Jaegeuk Kim wrote:
> On 03/02, Chao Yu wrote:
>> ping,
>>
>> On 2022/2/25 11:02, Chao Yu wrote:
>>> On 2022/2/3 22:57, Chao Yu wrote:
>>>> On 2022/2/3 9:51, Jaegeuk Kim wrote:
>>>>> On 01/29, Chao Yu wrote:
>>>>>> On 2022/1/29 8:37, Jaegeuk Kim wrote:
>>>>>>> On 01/28, Chao Yu wrote:
>>>>>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>>>>>>>> On 01/27, Chao Yu wrote:
>>>>>>>>>> According to Jing Xia's report, a potential deadlock may happen
>>>>>>>>>> between kworker and checkpoint as below:
>>>>>>>>>>
>>>>>>>>>> [T:writeback]                [T:checkpoint]
>>>>>>>>>> - wb_writeback
>>>>>>>>>>      - blk_start_plug
>>>>>>>>>> bio contains NodeA was plugged in writeback threads
>>>>>>>>>
>>>>>>>>> I'm still trying to understand more precisely. So, how is it possible to
>>>>>>>>> have bio having node write in this current context?
>>>>>>>>
>>>>>>>> IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
>>>>>>>> during writeback of node_inode's data page (which should be a node page)?
>>>>>>>
>>>>>>> Wasn't that added into a different task->plug?
>>>>>>
>>>>>> I'm not sure I've got your concern correctly...
>>>>>>
>>>>>> Do you mean NodeA and other IOs from do_writepages() were plugged in
>>>>>> different local plug variables?
>>>>>
>>>>> I think so.
>>>>
>>>> I guess block plug helper says it doesn't allow to use nested plug, so there
>>>> is only one plug in kworker thread?
> 
> Is there only one kworker thread that flushes node and inode pages?

IIRC, one kworker per block device?

Thanks,

> 
>>>>
>>>> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>>>> {
>>>>       struct task_struct *tsk = current;
>>>>
>>>>       /*
>>>>        * If this is a nested plug, don't actually assign it.
>>>>        */
>>>>       if (tsk->plug)
>>>>           return;
>>>> ...
>>>> }
>>>
>>> Any further comments?
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>                      - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
>>>>>>>>>>                       - f2fs_write_data_pages
>>>>>>>>>>                        - f2fs_write_single_data_page -- write last dirty page
>>>>>>>>>>                         - f2fs_do_write_data_page
>>>>>>>>>>                          - set_page_writeback  -- clear page dirty flag and
>>>>>>>>>>                          PAGECACHE_TAG_DIRTY tag in radix tree
>>>>>>>>>>                          - f2fs_outplace_write_data
>>>>>>>>>>                           - f2fs_update_data_blkaddr
>>>>>>>>>>                            - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
>>>>>>>>>>                         - inode_dec_dirty_pages
>>>>>>>>>>      - writeback_sb_inodes
>>>>>>>>>>       - writeback_single_inode
>>>>>>>>>>        - do_writepages
>>>>>>>>>>         - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
>>>>>>>>>>          - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
>>>>>>>>>>       - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
>>>>>>>>>>      - blk_finish_plug
>>>>>>>>>>
>>>>>>>>>> Let's try to avoid deadlock condition by forcing unplugging previous bio via
>>>>>>>>>> blk_finish_plug(current->plug) once we've skipped writeback in writepages()
>>>>>>>>>> due to valid sbi->wb_sync_req[DATA/NODE].
>>>>>>>>>>
>>>>>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
>>>>>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
>>>>>>>>>> Signed-off-by: Jing Xia <jing.xia@unisoc.com>
>>>>>>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>>>>>>> ---
>>>>>>>>>>      fs/f2fs/data.c | 6 +++++-
>>>>>>>>>>      fs/f2fs/node.c | 6 +++++-
>>>>>>>>>>      2 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644
>>>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>>>>          /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
>>>>>>>>>>          if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>>>              atomic_inc(&sbi->wb_sync_req[DATA]);
>>>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
>>>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>>>> +        if (current->plug)
>>>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>>>              goto skip_write;
>>>>>>>>>> +    }
>>>>>>>>>>          if (__should_serialize_io(inode, wbc)) {
>>>>>>>>>>              mutex_lock(&sbi->writepages);
>>>>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644
>>>>>>>>>> --- a/fs/f2fs/node.c
>>>>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>>>>>>>>>>          if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>>>              atomic_inc(&sbi->wb_sync_req[NODE]);
>>>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
>>>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
>>>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>>>> +        if (current->plug)
>>>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>>>              goto skip_write;
>>>>>>>>>> +    }
>>>>>>>>>>          trace_f2fs_writepages(mapping->host, wbc, NODE);
>>>>>>>>>> -- 
>>>>>>>>>> 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-03-02  8:14                   ` Chao Yu
@ 2022-03-02 19:45                     ` Jaegeuk Kim
  2022-03-03  2:32                       ` Chao Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Jaegeuk Kim @ 2022-03-02 19:45 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

On 03/02, Chao Yu wrote:
> On 2022/3/2 13:26, Jaegeuk Kim wrote:
> > On 03/02, Chao Yu wrote:
> > > ping,
> > > 
> > > On 2022/2/25 11:02, Chao Yu wrote:
> > > > On 2022/2/3 22:57, Chao Yu wrote:
> > > > > On 2022/2/3 9:51, Jaegeuk Kim wrote:
> > > > > > On 01/29, Chao Yu wrote:
> > > > > > > On 2022/1/29 8:37, Jaegeuk Kim wrote:
> > > > > > > > On 01/28, Chao Yu wrote:
> > > > > > > > > On 2022/1/28 5:59, Jaegeuk Kim wrote:
> > > > > > > > > > On 01/27, Chao Yu wrote:
> > > > > > > > > > > Quoted from Jing Xia's report, there is a potential deadlock may happen
> > > > > > > > > > > between kworker and checkpoint as below:
> > > > > > > > > > > 
> > > > > > > > > > > [T:writeback]                [T:checkpoint]
> > > > > > > > > > > - wb_writeback
> > > > > > > > > > >      - blk_start_plug
> > > > > > > > > > > bio contains NodeA was plugged in writeback threads
> > > > > > > > > > 
> > > > > > > > > > I'm still trying to understand more precisely. So, how is it possible to
> > > > > > > > > > have bio having node write in this current context?
> > > > > > > > > 
> > > > > > > > > IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
> > > > > > > > > during writeback of node_inode's data page (which should be a node page)?
> > > > > > > > 
> > > > > > > > Wasn't that added into a different task->plug?
> > > > > > > 
> > > > > > > I'm not sure I've got your concern correctly...
> > > > > > > 
> > > > > > > Do you mean NodeA and other IOs from do_writepages() were plugged in
> > > > > > > different local plug variables?
> > > > > > 
> > > > > > I think so.
> > > > > 
> > > > > I guess block plug helper says it doesn't allow to use nested plug, so there
> > > > > is only one plug in kworker thread?
> > 
> > Is there only one kworker thread that flushes node and inode pages?
> 
> IIRC, one kworker per block device?

If there's one kworker only, f2fs_write_node_pages() should have flushed its
plug?

> 
> Thanks,
> 
> > 
> > > > > 
> > > > > void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> > > > > {
> > > > >       struct task_struct *tsk = current;
> > > > > 
> > > > >       /*
> > > > >        * If this is a nested plug, don't actually assign it.
> > > > >        */
> > > > >       if (tsk->plug)
> > > > >           return;
> > > > > ...
> > > > > }
> > > > 
> > > > Any further comments?
> > > > 
> > > > Thanks,
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > >                      - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
> > > > > > > > > > >                       - f2fs_write_data_pages
> > > > > > > > > > >                        - f2fs_write_single_data_page -- write last dirty page
> > > > > > > > > > >                         - f2fs_do_write_data_page
> > > > > > > > > > >                          - set_page_writeback  -- clear page dirty flag and
> > > > > > > > > > >                          PAGECACHE_TAG_DIRTY tag in radix tree
> > > > > > > > > > >                          - f2fs_outplace_write_data
> > > > > > > > > > >                           - f2fs_update_data_blkaddr
> > > > > > > > > > >                            - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
> > > > > > > > > > >                         - inode_dec_dirty_pages
> > > > > > > > > > >      - writeback_sb_inodes
> > > > > > > > > > >       - writeback_single_inode
> > > > > > > > > > >        - do_writepages
> > > > > > > > > > >         - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
> > > > > > > > > > >          - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
> > > > > > > > > > >       - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
> > > > > > > > > > >      - blk_finish_plug
> > > > > > > > > > > 
> > > > > > > > > > > Let's try to avoid deadlock condition by forcing unplugging previous bio via
> > > > > > > > > > > blk_finish_plug(current->plug) once we've skipped writeback in writepages()
> > > > > > > > > > > due to valid sbi->wb_sync_req[DATA/NODE].
> > > > > > > > > > > 
> > > > > > > > > > > Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
> > > > > > > > > > > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > > > > > > > > > > Signed-off-by: Jing Xia <jing.xia@unisoc.com>
> > > > > > > > > > > Signed-off-by: Chao Yu <chao@kernel.org>
> > > > > > > > > > > ---
> > > > > > > > > > >      fs/f2fs/data.c | 6 +++++-
> > > > > > > > > > >      fs/f2fs/node.c | 6 +++++-
> > > > > > > > > > >      2 files changed, 10 insertions(+), 2 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > > > > > > > index 76d6fe7b0c8f..932a4c81acaf 100644
> > > > > > > > > > > --- a/fs/f2fs/data.c
> > > > > > > > > > > +++ b/fs/f2fs/data.c
> > > > > > > > > > > @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> > > > > > > > > > >          /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
> > > > > > > > > > >          if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > > > >              atomic_inc(&sbi->wb_sync_req[DATA]);
> > > > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
> > > > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
> > > > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > > > +        if (current->plug)
> > > > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > > > >              goto skip_write;
> > > > > > > > > > > +    }
> > > > > > > > > > >          if (__should_serialize_io(inode, wbc)) {
> > > > > > > > > > >              mutex_lock(&sbi->writepages);
> > > > > > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > > > > > > > index 556fcd8457f3..69c6bcaf5aae 100644
> > > > > > > > > > > --- a/fs/f2fs/node.c
> > > > > > > > > > > +++ b/fs/f2fs/node.c
> > > > > > > > > > > @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
> > > > > > > > > > >          if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > > > >              atomic_inc(&sbi->wb_sync_req[NODE]);
> > > > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
> > > > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
> > > > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > > > +        if (current->plug)
> > > > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > > > >              goto skip_write;
> > > > > > > > > > > +    }
> > > > > > > > > > >          trace_f2fs_writepages(mapping->host, wbc, NODE);
> > > > > > > > > > > -- 
> > > > > > > > > > > 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-03-02 19:45                     ` Jaegeuk Kim
@ 2022-03-03  2:32                       ` Chao Yu
  2022-03-03 21:30                         ` Jaegeuk Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Chao Yu @ 2022-03-03  2:32 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

On 2022/3/3 3:45, Jaegeuk Kim wrote:
> On 03/02, Chao Yu wrote:
>> On 2022/3/2 13:26, Jaegeuk Kim wrote:
>>> On 03/02, Chao Yu wrote:
>>>> ping,
>>>>
>>>> On 2022/2/25 11:02, Chao Yu wrote:
>>>>> On 2022/2/3 22:57, Chao Yu wrote:
>>>>>> On 2022/2/3 9:51, Jaegeuk Kim wrote:
>>>>>>> On 01/29, Chao Yu wrote:
>>>>>>>> On 2022/1/29 8:37, Jaegeuk Kim wrote:
>>>>>>>>> On 01/28, Chao Yu wrote:
>>>>>>>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote:
>>>>>>>>>>> On 01/27, Chao Yu wrote:
>>>>>>>>>>>> Quoted from Jing Xia's report, there is a potential deadlock may happen
>>>>>>>>>>>> between kworker and checkpoint as below:
>>>>>>>>>>>>
>>>>>>>>>>>> [T:writeback]                [T:checkpoint]
>>>>>>>>>>>> - wb_writeback
>>>>>>>>>>>>       - blk_start_plug
>>>>>>>>>>>> bio contains NodeA was plugged in writeback threads
>>>>>>>>>>>
>>>>>>>>>>> I'm still trying to understand more precisely. So, how is it possible to
>>>>>>>>>>> have bio having node write in this current context?
>>>>>>>>>>
>>>>>>>>>> IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
>>>>>>>>>> during writeback of node_inode's data page (which should be a node page)?
>>>>>>>>>
>>>>>>>>> Wasn't that added into a different task->plug?
>>>>>>>>
>>>>>>>> I'm not sure I've got your concern correctly...
>>>>>>>>
>>>>>>>> Do you mean NodeA and other IOs from do_writepages() were plugged in
>>>>>>>> different local plug variables?
>>>>>>>
>>>>>>> I think so.
>>>>>>
>>>>>> I guess block plug helper says it doesn't allow to use nested plug, so there
>>>>>> is only one plug in kworker thread?
>>>
>>> Is there only one kworker thread that flushes node and inode pages?
>>
>> IIRC, one kworker per block device?
> 
> If there's one kworker only, f2fs_write_node_pages() should have flushed its
> plug?

No. f2fs_write_node_pages() fails to attach its local plug to current->plug because
current already has a plug attached by wb_writeback(); likewise, f2fs_write_node_pages()
fails to flush current->plug because its local plug doesn't match current->plug.

void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
{
	struct task_struct *tsk = current;

	if (tsk->plug)
		return;
...
}

void blk_finish_plug(struct blk_plug *plug)
{
	if (plug == current->plug) {
		__blk_flush_plug(plug, false);
		current->plug = NULL;
	}
}

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>>>
>>>>>> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>>>>>> {
>>>>>>        struct task_struct *tsk = current;
>>>>>>
>>>>>>        /*
>>>>>>         * If this is a nested plug, don't actually assign it.
>>>>>>         */
>>>>>>        if (tsk->plug)
>>>>>>            return;
>>>>>> ...
>>>>>> }
>>>>>
>>>>> Any further comments?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>                       - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
>>>>>>>>>>>>                        - f2fs_write_data_pages
>>>>>>>>>>>>                         - f2fs_write_single_data_page -- write last dirty page
>>>>>>>>>>>>                          - f2fs_do_write_data_page
>>>>>>>>>>>>                           - set_page_writeback  -- clear page dirty flag and
>>>>>>>>>>>>                           PAGECACHE_TAG_DIRTY tag in radix tree
>>>>>>>>>>>>                           - f2fs_outplace_write_data
>>>>>>>>>>>>                            - f2fs_update_data_blkaddr
>>>>>>>>>>>>                             - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
>>>>>>>>>>>>                          - inode_dec_dirty_pages
>>>>>>>>>>>>       - writeback_sb_inodes
>>>>>>>>>>>>        - writeback_single_inode
>>>>>>>>>>>>         - do_writepages
>>>>>>>>>>>>          - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
>>>>>>>>>>>>           - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
>>>>>>>>>>>>        - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
>>>>>>>>>>>>       - blk_finish_plug
>>>>>>>>>>>>
>>>>>>>>>>>> Let's try to avoid deadlock condition by forcing unplugging previous bio via
>>>>>>>>>>>> blk_finish_plug(current->plug) once we've skipped writeback in writepages()
>>>>>>>>>>>> due to valid sbi->wb_sync_req[DATA/NODE].
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
>>>>>>>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
>>>>>>>>>>>> Signed-off-by: Jing Xia <jing.xia@unisoc.com>
>>>>>>>>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>>       fs/f2fs/data.c | 6 +++++-
>>>>>>>>>>>>       fs/f2fs/node.c | 6 +++++-
>>>>>>>>>>>>       2 files changed, 10 insertions(+), 2 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644
>>>>>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>>>>>>           /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
>>>>>>>>>>>>           if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>>>>>               atomic_inc(&sbi->wb_sync_req[DATA]);
>>>>>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
>>>>>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>>>>>> +        if (current->plug)
>>>>>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>>>>>               goto skip_write;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>           if (__should_serialize_io(inode, wbc)) {
>>>>>>>>>>>>               mutex_lock(&sbi->writepages);
>>>>>>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644
>>>>>>>>>>>> --- a/fs/f2fs/node.c
>>>>>>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>>>>>>>>>>>>           if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>>>>>>               atomic_inc(&sbi->wb_sync_req[NODE]);
>>>>>>>>>>>> -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
>>>>>>>>>>>> +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
>>>>>>>>>>>> +        /* to avoid potential deadlock */
>>>>>>>>>>>> +        if (current->plug)
>>>>>>>>>>>> +            blk_finish_plug(current->plug);
>>>>>>>>>>>>               goto skip_write;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>           trace_f2fs_writepages(mapping->host, wbc, NODE);
>>>>>>>>>>>> -- 
>>>>>>>>>>>> 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock
  2022-03-03  2:32                       ` Chao Yu
@ 2022-03-03 21:30                         ` Jaegeuk Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Jaegeuk Kim @ 2022-03-03 21:30 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, Jing Xia, Zhiguo Niu, linux-f2fs-devel

On 03/03, Chao Yu wrote:
> On 2022/3/3 3:45, Jaegeuk Kim wrote:
> > On 03/02, Chao Yu wrote:
> > > On 2022/3/2 13:26, Jaegeuk Kim wrote:
> > > > On 03/02, Chao Yu wrote:
> > > > > ping,
> > > > > 
> > > > > On 2022/2/25 11:02, Chao Yu wrote:
> > > > > > On 2022/2/3 22:57, Chao Yu wrote:
> > > > > > > On 2022/2/3 9:51, Jaegeuk Kim wrote:
> > > > > > > > On 01/29, Chao Yu wrote:
> > > > > > > > > On 2022/1/29 8:37, Jaegeuk Kim wrote:
> > > > > > > > > > On 01/28, Chao Yu wrote:
> > > > > > > > > > > On 2022/1/28 5:59, Jaegeuk Kim wrote:
> > > > > > > > > > > > On 01/27, Chao Yu wrote:
> > > > > > > > > > > > > Quoted from Jing Xia's report, there is a potential deadlock may happen
> > > > > > > > > > > > > between kworker and checkpoint as below:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > [T:writeback]                [T:checkpoint]
> > > > > > > > > > > > > - wb_writeback
> > > > > > > > > > > > >       - blk_start_plug
> > > > > > > > > > > > > bio contains NodeA was plugged in writeback threads
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm still trying to understand more precisely. So, how is it possible to
> > > > > > > > > > > > have bio having node write in this current context?
> > > > > > > > > > > 
> > > > > > > > > > > IMO, after above blk_start_plug(), it may plug some inode's node page in kworker
> > > > > > > > > > > during writeback of node_inode's data page (which should be a node page)?
> > > > > > > > > > 
> > > > > > > > > > Wasn't that added into a different task->plug?
> > > > > > > > > 
> > > > > > > > > I'm not sure I've got your concern correctly...
> > > > > > > > > 
> > > > > > > > > Do you mean NodeA and other IOs from do_writepages() were plugged in
> > > > > > > > > different local plug variables?
> > > > > > > > 
> > > > > > > > I think so.
> > > > > > > 
> > > > > > > I guess block plug helper says it doesn't allow to use nested plug, so there
> > > > > > > is only one plug in kworker thread?
> > > > 
> > > > Is there only one kworker thread that flushes node and inode pages?
> > > 
> > > IIRC, one kworker per block device?
> > 
> > If there's one kworker only, f2fs_write_node_pages() should have flushed its
> > plug?
> 
> No. f2fs_write_node_pages() fails to attach its local plug to current->plug because
> current already has a plug attached by wb_writeback(); likewise, f2fs_write_node_pages()
> fails to flush current->plug because its local plug doesn't match current->plug.
> 
> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> {
> 	struct task_struct *tsk = current;
> 
> 	if (tsk->plug)
> 		return;
> ...
> }
> 
> void blk_finish_plug(struct blk_plug *plug)
> {
> 	if (plug == current->plug) {
> 		__blk_flush_plug(plug, false);
> 		current->plug = NULL;
> 	}
> }

Ah, okay. Now I see. Thanks for chasing it down.

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > > > 
> > > > > > > void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> > > > > > > {
> > > > > > >        struct task_struct *tsk = current;
> > > > > > > 
> > > > > > >        /*
> > > > > > >         * If this is a nested plug, don't actually assign it.
> > > > > > >         */
> > > > > > >        if (tsk->plug)
> > > > > > >            return;
> > > > > > > ...
> > > > > > > }
> > > > > > 
> > > > > > Any further comments?
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > >                       - do_writepages  -- sync write inodeB, inc wb_sync_req[DATA]
> > > > > > > > > > > > >                        - f2fs_write_data_pages
> > > > > > > > > > > > >                         - f2fs_write_single_data_page -- write last dirty page
> > > > > > > > > > > > >                          - f2fs_do_write_data_page
> > > > > > > > > > > > >                           - set_page_writeback  -- clear page dirty flag and
> > > > > > > > > > > > >                           PAGECACHE_TAG_DIRTY tag in radix tree
> > > > > > > > > > > > >                           - f2fs_outplace_write_data
> > > > > > > > > > > > >                            - f2fs_update_data_blkaddr
> > > > > > > > > > > > >                             - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
> > > > > > > > > > > > >                          - inode_dec_dirty_pages
> > > > > > > > > > > > >       - writeback_sb_inodes
> > > > > > > > > > > > >        - writeback_single_inode
> > > > > > > > > > > > >         - do_writepages
> > > > > > > > > > > > >          - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
> > > > > > > > > > > > >           - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
> > > > > > > > > > > > >        - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
> > > > > > > > > > > > >       - blk_finish_plug
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Let's try to avoid deadlock condition by forcing unplugging previous bio via
> > > > > > > > > > > > > blk_finish_plug(current->plug) once we've skipped writeback in writepages()
> > > > > > > > > > > > > due to valid sbi->wb_sync_req[DATA/NODE].
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
> > > > > > > > > > > > > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > > > > > > > > > > > > Signed-off-by: Jing Xia <jing.xia@unisoc.com>
> > > > > > > > > > > > > Signed-off-by: Chao Yu <chao@kernel.org>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >       fs/f2fs/data.c | 6 +++++-
> > > > > > > > > > > > >       fs/f2fs/node.c | 6 +++++-
> > > > > > > > > > > > >       2 files changed, 10 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > > > > > > > > > index 76d6fe7b0c8f..932a4c81acaf 100644
> > > > > > > > > > > > > --- a/fs/f2fs/data.c
> > > > > > > > > > > > > +++ b/fs/f2fs/data.c
> > > > > > > > > > > > > @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> > > > > > > > > > > > >           /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
> > > > > > > > > > > > >           if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > > > > > >               atomic_inc(&sbi->wb_sync_req[DATA]);
> > > > > > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[DATA]))
> > > > > > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[DATA])) {
> > > > > > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > > > > > +        if (current->plug)
> > > > > > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > > > > > >               goto skip_write;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > >           if (__should_serialize_io(inode, wbc)) {
> > > > > > > > > > > > >               mutex_lock(&sbi->writepages);
> > > > > > > > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > > > > > > > > > index 556fcd8457f3..69c6bcaf5aae 100644
> > > > > > > > > > > > > --- a/fs/f2fs/node.c
> > > > > > > > > > > > > +++ b/fs/f2fs/node.c
> > > > > > > > > > > > > @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping,
> > > > > > > > > > > > >           if (wbc->sync_mode == WB_SYNC_ALL)
> > > > > > > > > > > > >               atomic_inc(&sbi->wb_sync_req[NODE]);
> > > > > > > > > > > > > -    else if (atomic_read(&sbi->wb_sync_req[NODE]))
> > > > > > > > > > > > > +    else if (atomic_read(&sbi->wb_sync_req[NODE])) {
> > > > > > > > > > > > > +        /* to avoid potential deadlock */
> > > > > > > > > > > > > +        if (current->plug)
> > > > > > > > > > > > > +            blk_finish_plug(current->plug);
> > > > > > > > > > > > >               goto skip_write;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > >           trace_f2fs_writepages(mapping->host, wbc, NODE);
> > > > > > > > > > > > > -- 
> > > > > > > > > > > > > 2.32.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-10-14 11:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-16 15:50 [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock Chao Yu
2022-12-16 20:42 ` Eric Biggers
  -- strict thread matches above, loose matches on Subject: below --
2025-10-14 11:47 Chao Yu via Linux-f2fs-devel
2022-01-27  5:44 Chao Yu
2022-01-27 21:59 ` Jaegeuk Kim
2022-01-28  1:43   ` Chao Yu
2022-01-29  0:37     ` Jaegeuk Kim
2022-01-29  1:48       ` Chao Yu
2022-02-03  1:51         ` Jaegeuk Kim
2022-02-03 14:57           ` Chao Yu
2022-02-25  3:02             ` Chao Yu
2022-03-02  3:32               ` Chao Yu
2022-03-02  5:26                 ` Jaegeuk Kim
2022-03-02  8:14                   ` Chao Yu
2022-03-02 19:45                     ` Jaegeuk Kim
2022-03-03  2:32                       ` Chao Yu
2022-03-03 21:30                         ` Jaegeuk Kim
