linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] btrfs: fix race between balance and cancel/pause
@ 2023-06-23  5:05 Josef Bacik
  2023-06-29 15:01 ` David Sterba
  2023-08-08  2:47 ` xiaoshoukui
  0 siblings, 2 replies; 5+ messages in thread
From: Josef Bacik @ 2023-06-23  5:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: syzbot+c0f3acf145cb465426d5

Syzbot reported a panic that looks like this

assertion failed: fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE_PAUSED, in fs/btrfs/ioctl.c:465
------------[ cut here ]------------
kernel BUG at fs/btrfs/messages.c:259!
RIP: 0010:btrfs_assertfail+0x2c/0x30 fs/btrfs/messages.c:259
Call Trace:
 <TASK>
 btrfs_exclop_balance fs/btrfs/ioctl.c:465 [inline]
 btrfs_ioctl_balance fs/btrfs/ioctl.c:3564 [inline]
 btrfs_ioctl+0x531e/0x5b30 fs/btrfs/ioctl.c:4632
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The reproducer is running a balance and a cancel or pause in parallel.  The way
balance finishes is a bit wonky, if we were paused we need to save the
balance_ctl in the fs_info, but clear it otherwise and cleanup.  However
we rely on the return values being specific errors, or having a cancel
request or no pause request.  If balance completes and returns 0, but we
have a pause or cancel request we won't do the appropriate cleanup, and
then the next time we try to start a balance we'll trip this ASSERT.

The error handling is just wrong here, we always want to clean up,
unless we got -ECANCELLED and we set the appropriate pause flag in the
exclusive op.  With this patch the reproducer ran for an hour without
tripping, previously it would trip in less than a few minutes.

Reported-by: syzbot+c0f3acf145cb465426d5@syzkaller.appspotmail.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a536d0e0e055..90d25b2a6fd1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4081,14 +4081,6 @@ static int alloc_profile_is_valid(u64 flags, int extended)
 	return has_single_bit_set(flags);
 }
 
-static inline int balance_need_close(struct btrfs_fs_info *fs_info)
-{
-	/* cancel requested || normal exit path */
-	return atomic_read(&fs_info->balance_cancel_req) ||
-		(atomic_read(&fs_info->balance_pause_req) == 0 &&
-		 atomic_read(&fs_info->balance_cancel_req) == 0);
-}
-
 /*
  * Validate target profile against allowed profiles and return true if it's OK.
  * Otherwise print the error message and return false.
@@ -4278,6 +4270,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 	u64 num_devices;
 	unsigned seq;
 	bool reducing_redundancy;
+	bool paused = false;
 	int i;
 
 	if (btrfs_fs_closing(fs_info) ||
@@ -4408,6 +4401,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 	if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
 		btrfs_info(fs_info, "balance: paused");
 		btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
+		paused = true;
 	}
 	/*
 	 * Balance can be canceled by:
@@ -4436,8 +4430,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 		btrfs_update_ioctl_balance_args(fs_info, bargs);
 	}
 
-	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
-	    balance_need_close(fs_info)) {
+	/* We didn't pause, we can clean everything up. */
+	if (!paused) {
 		reset_balance_state(fs_info);
 		btrfs_exclop_finish(fs_info);
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: fix race between balance and cancel/pause
  2023-06-23  5:05 [PATCH] btrfs: fix race between balance and cancel/pause Josef Bacik
@ 2023-06-29 15:01 ` David Sterba
  2023-08-08  2:47 ` xiaoshoukui
  1 sibling, 0 replies; 5+ messages in thread
From: David Sterba @ 2023-06-29 15:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, syzbot+c0f3acf145cb465426d5

On Fri, Jun 23, 2023 at 01:05:41AM -0400, Josef Bacik wrote:
> Syzbot reported a panic that looks like this
> 
> assertion failed: fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE_PAUSED, in fs/btrfs/ioctl.c:465
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/messages.c:259!
> RIP: 0010:btrfs_assertfail+0x2c/0x30 fs/btrfs/messages.c:259
> Call Trace:
>  <TASK>
>  btrfs_exclop_balance fs/btrfs/ioctl.c:465 [inline]
>  btrfs_ioctl_balance fs/btrfs/ioctl.c:3564 [inline]
>  btrfs_ioctl+0x531e/0x5b30 fs/btrfs/ioctl.c:4632
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:870 [inline]
>  __se_sys_ioctl fs/ioctl.c:856 [inline]
>  __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> The reproducer is running a balance and a cancel or pause in parallel.  The way
> balance finishes is a bit wonky, if we were paused we need to save the
> balance_ctl in the fs_info, but clear it otherwise and cleanup.  However
> we rely on the return values being specific errors, or having a cancel
> request or no pause request.  If balance completes and returns 0, but we
> have a pause or cancel request we won't do the appropriate cleanup, and
> then the next time we try to start a balance we'll trip this ASSERT.
> 
> The error handling is just wrong here, we always want to clean up,
> unless we got -ECANCELLED and we set the appropriate pause flag in the
> exclusive op.  With this patch the reproducer ran for an hour without
> tripping, previously it would trip in less than a few minutes.
> 
> Reported-by: syzbot+c0f3acf145cb465426d5@syzkaller.appspotmail.com
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: fix race between balance and cancel/pause
  2023-06-23  5:05 [PATCH] btrfs: fix race between balance and cancel/pause Josef Bacik
  2023-06-29 15:01 ` David Sterba
@ 2023-08-08  2:47 ` xiaoshoukui
  2023-08-09 13:16   ` Josef Bacik
  1 sibling, 1 reply; 5+ messages in thread
From: xiaoshoukui @ 2023-08-08  2:47 UTC (permalink / raw)
  To: josef, dsterba; +Cc: clm, dsterba, linux-btrfs, linux-kernel, xiaoshoukui

I think this patch does not fully fix the issue.

This patch just fix assertion panic, but in the race situation, the ioctl pause 
request still returns an incorrect value 0 to the user which mislead the user the
pause request finished successfully. In fact, the balance request has not been paused.

Test results and analysis are as follows:
https://lore.kernel.org/linux-btrfs/20230726030617.109018-1-xiaoshoukui@gmail.com/T/#me125d17fa59e9e671149cc76d410ced747f488b1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: fix race between balance and cancel/pause
  2023-08-08  2:47 ` xiaoshoukui
@ 2023-08-09 13:16   ` Josef Bacik
  2023-08-10  5:45     ` xiaoshoukui
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2023-08-09 13:16 UTC (permalink / raw)
  To: xiaoshoukui; +Cc: dsterba, clm, dsterba, linux-btrfs, linux-kernel, xiaoshoukui

On Mon, Aug 07, 2023 at 10:47:48PM -0400, xiaoshoukui wrote:
> I think this patch does not fully fix the issue.
> 
> This patch just fix assertion panic, but in the race situation, the ioctl pause 
> request still returns an incorrect value 0 to the user which mislead the user the
> pause request finished successfully. In fact, the balance request has not been paused.
> 
> Test results and analysis are as follows:
> https://lore.kernel.org/linux-btrfs/20230726030617.109018-1-xiaoshoukui@gmail.com/T/#me125d17fa59e9e671149cc76d410ced747f488b1

They're just two different issues.  My patch is concerned with the panic, yours
is concerned with getting the correct return value out to the user.

Rebase your patch ontop of Sterba's tree with my fix and send it along, getting
an accurate errno out to the user is a reasonable goal.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: fix race between balance and cancel/pause
  2023-08-09 13:16   ` Josef Bacik
@ 2023-08-10  5:45     ` xiaoshoukui
  0 siblings, 0 replies; 5+ messages in thread
From: xiaoshoukui @ 2023-08-10  5:45 UTC (permalink / raw)
  To: josef
  Cc: clm, dsterba, dsterba, linux-btrfs, linux-kernel, xiaoshoukui,
	xiaoshoukui

> They're just two different issues.  My patch is concerned with the panic, yours
> is concerned with getting the correct return value out to the user.

Agreed.

> Rebase your patch ontop of Sterba's tree with my fix and send it along, getting
> an accurate errno out to the user is a reasonable goal.  Thanks,

Send the patch through below thread, pls review. Thanks.
https://lore.kernel.org/linux-btrfs/20230810034810.23934-1-xiaoshoukui@gmail.com/T/#u

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-10  5:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-23  5:05 [PATCH] btrfs: fix race between balance and cancel/pause Josef Bacik
2023-06-29 15:01 ` David Sterba
2023-08-08  2:47 ` xiaoshoukui
2023-08-09 13:16   ` Josef Bacik
2023-08-10  5:45     ` xiaoshoukui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).