* [PATCH 0/2] fix issues during suspended replace operation
@ 2022-08-12 10:32 Anand Jain
2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
To: linux-btrfs
While trying to reproduce the issue reported in the ML. Found few
issues as in the individual independent patches below.
Anand Jain (2):
btrfs: fix assert during replace-cancel of suspended replace-operation
btrfs: add info when mount fails due to stale replace target
fs/btrfs/dev-replace.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
2.33.1
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain @ 2022-08-12 10:32 ` Anand Jain 2022-09-22 10:00 ` Filipe Manana 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba 2 siblings, 1 reply; 7+ messages in thread From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw) To: linux-btrfs If the filesystem mounts with the replace-operation in a suspended state and try to cancel the suspended replace-operation, we hit the assert. The assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state") that was actually not required. So just remove it. $ mount /dev/sda5 /btrfs BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' $ mount -o degraded /dev/sda5 /btrfs <-- success. $ btrfs replace cancel /btrfs kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 kernel: ------------[ cut here ]------------ kernel: kernel BUG at fs/btrfs/ctree.h:3750! After the patch: $ btrfs replace cancel /btrfs BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/dev-replace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 488f2105c5d0..9d46a702bc11 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) up_write(&dev_replace->rwsem); /* Scrub for replace must not be running in suspended state */ - ret = btrfs_scrub_cancel(fs_info); - ASSERT(ret != -ENOTCONN); + btrfs_scrub_cancel(fs_info); trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { -- 2.33.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain @ 2022-09-22 10:00 ` Filipe Manana 2022-09-22 11:28 ` Anand Jain 0 siblings, 1 reply; 7+ messages in thread From: Filipe Manana @ 2022-09-22 10:00 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote: > > If the filesystem mounts with the replace-operation in a suspended state > and try to cancel the suspended replace-operation, we hit the assert. The > assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's > scrub must not be running in suspended state") that was actually not > required. So just remove it. > > $ mount /dev/sda5 /btrfs > > BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing > BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' > > $ mount -o degraded /dev/sda5 /btrfs <-- success. > > $ btrfs replace cancel /btrfs > > kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 > kernel: ------------[ cut here ]------------ > kernel: kernel BUG at fs/btrfs/ctree.h:3750! > > After the patch: > > $ btrfs replace cancel /btrfs > > BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled Anand, can you please add a test case to fstests? This is a scenario with no coverage at all in fstests, therefore specially useful to have it there. Thanks. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/dev-replace.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c > index 488f2105c5d0..9d46a702bc11 100644 > --- a/fs/btrfs/dev-replace.c > +++ b/fs/btrfs/dev-replace.c > @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) > up_write(&dev_replace->rwsem); > > /* Scrub for replace must not be running in suspended state */ > - ret = btrfs_scrub_cancel(fs_info); > - ASSERT(ret != -ENOTCONN); > + btrfs_scrub_cancel(fs_info); > > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > -- > 2.33.1 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-09-22 10:00 ` Filipe Manana @ 2022-09-22 11:28 ` Anand Jain 0 siblings, 0 replies; 7+ messages in thread From: Anand Jain @ 2022-09-22 11:28 UTC (permalink / raw) To: fdmanana; +Cc: linux-btrfs On 22/09/2022 18:00, Filipe Manana wrote: > On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote: >> >> If the filesystem mounts with the replace-operation in a suspended state >> and try to cancel the suspended replace-operation, we hit the assert. The >> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's >> scrub must not be running in suspended state") that was actually not >> required. So just remove it. >> >> $ mount /dev/sda5 /btrfs >> >> BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing >> BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' >> >> $ mount -o degraded /dev/sda5 /btrfs <-- success. >> >> $ btrfs replace cancel /btrfs >> >> kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 >> kernel: ------------[ cut here ]------------ >> kernel: kernel BUG at fs/btrfs/ctree.h:3750! >> >> After the patch: >> >> $ btrfs replace cancel /btrfs >> >> BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled > > Anand, can you please add a test case to fstests? > This is a scenario with no coverage at all in fstests, therefore > specially useful to have it there. > I thought about it before and found that unless we implement the replace-pause sub-command, we can't get a pending replace item in an unmounted btrfs using a script. However, to test it manually, I did an abrupt reboot (or power-off, I think). Thanks. > Thanks. > >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> fs/btrfs/dev-replace.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c >> index 488f2105c5d0..9d46a702bc11 100644 >> --- a/fs/btrfs/dev-replace.c >> +++ b/fs/btrfs/dev-replace.c >> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) >> up_write(&dev_replace->rwsem); >> >> /* Scrub for replace must not be running in suspended state */ >> - ret = btrfs_scrub_cancel(fs_info); >> - ASSERT(ret != -ENOTCONN); >> + btrfs_scrub_cancel(fs_info); >> >> trans = btrfs_start_transaction(root, 0); >> if (IS_ERR(trans)) { >> -- >> 2.33.1 >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] btrfs: add info when mount fails due to stale replace target 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain @ 2022-08-12 10:32 ` Anand Jain 2022-08-22 19:36 ` David Sterba 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba 2 siblings, 1 reply; 7+ messages in thread From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw) To: linux-btrfs; +Cc: Samuel Greiner If the replace-target device re-appears after the suspended replace is cancelled, it blocks the mount operation as it can't find the matching replace-item in the metadata. As shown below, BTRFS error (device sda5): replace devid present without an active replace item To overcome this situation, the user can run the command btrfs device scan --forget <device-path-to-devid=0> and try the mount command again. And also, to avoid repeating the issue, superblock on the devid=0 must be wiped. wipefs -a device-path-to-devid=0. This patch adds some info when this situation occurs. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Samuel Greiner <samuel@balkonien.org> Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/ --- fs/btrfs/dev-replace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 9d46a702bc11..7202b76ce59f 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) if (btrfs_find_device(fs_info->fs_devices, &args)) { btrfs_err(fs_info, "replace devid present without an active replace item"); + btrfs_info(fs_info, + "mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'"); ret = -EUCLEAN; } else { dev_replace->srcdev = NULL; -- 2.33.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] btrfs: add info when mount fails due to stale replace target 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain @ 2022-08-22 19:36 ` David Sterba 0 siblings, 0 replies; 7+ messages in thread From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs, Samuel Greiner On Fri, Aug 12, 2022 at 06:32:19PM +0800, Anand Jain wrote: > If the replace-target device re-appears after the suspended replace is > cancelled, it blocks the mount operation as it can't find the matching > replace-item in the metadata. As shown below, > > BTRFS error (device sda5): replace devid present without an active replace item > > To overcome this situation, the user can run the command > > btrfs device scan --forget <device-path-to-devid=0> > > and try the mount command again. And also, to avoid repeating the issue, > superblock on the devid=0 must be wiped. > > wipefs -a device-path-to-devid=0. > > This patch adds some info when this situation occurs. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > Reported-by: Samuel Greiner <samuel@balkonien.org> > Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/ > --- > fs/btrfs/dev-replace.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c > index 9d46a702bc11..7202b76ce59f 100644 > --- a/fs/btrfs/dev-replace.c > +++ b/fs/btrfs/dev-replace.c > @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) > if (btrfs_find_device(fs_info->fs_devices, &args)) { > btrfs_err(fs_info, > "replace devid present without an active replace item"); > + btrfs_info(fs_info, > + "mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'"); The messages should be on the same level and in one message, I've reprhrased it a bit. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] fix issues during suspended replace operation 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain @ 2022-08-22 19:36 ` David Sterba 2 siblings, 0 replies; 7+ messages in thread From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs On Fri, Aug 12, 2022 at 06:32:17PM +0800, Anand Jain wrote: > While trying to reproduce the issue reported in the ML. Found few > issues as in the individual independent patches below. > > Anand Jain (2): > btrfs: fix assert during replace-cancel of suspended replace-operation > btrfs: add info when mount fails due to stale replace target Added to misc-next, thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-09-22 11:28 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain 2022-09-22 10:00 ` Filipe Manana 2022-09-22 11:28 ` Anand Jain 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain 2022-08-22 19:36 ` David Sterba 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.