* [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Benjamin Marzinski @ 2026-04-13 22:45 UTC
To: Yu Kuai, Song Liu, Li Nan
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

When make_stripe_request() encounters a clone bio that crosses the
reshape position while the reshape cannot make progress, it was setting
bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
This will update the original bio's bi_status in md_end_clone_io().
Afterwards, md_handle_request() will wait for the device to become
unsuspended and submit a new cloned bio. However, even if that clone
completes successfully, it will not clear the original bio's bi_status.

There's no need to set bi_status when retrying the bio. md will already
error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 drivers/md/raid5.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc0c680ca199..690c65cd1e29 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	raid5_release_stripe(sh);
 out:
 	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
-		bi->bi_status = BLK_STS_RESOURCE;
 		ret = STRIPE_WAIT_RESHAPE;
 		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
 	}
--
2.53.0
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Li Nan @ 2026-04-14 1:25 UTC
To: Benjamin Marzinski, Yu Kuai, Song Liu
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

On 2026/4/14 6:45, Benjamin Marzinski wrote:
> When make_stripe_request() encounters a clone bio that crosses the
> reshape position while the reshape cannot make progress, it was setting
> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
> This will update the original bio's bi_status in md_end_clone_io().
> Afterwards, md_handle_request() will wait for the device to become
> unsuspended and submit a new cloned bio. However, even if that clone
> completes successfully, it will not clear the original bio's bi_status.
>
> There's no need to set bi_status when retrying the bio. md will already
> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  drivers/md/raid5.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc0c680ca199..690c65cd1e29 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>  	raid5_release_stripe(sh);
>  out:
>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> -		bi->bi_status = BLK_STS_RESOURCE;
>  		ret = STRIPE_WAIT_RESHAPE;
>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>  	}

The link below leads to the same patch, which Kuai has already replied to.

https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/

--
Thanks,
Nan
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Yu Kuai @ 2026-04-14 6:20 UTC
To: Li Nan, Benjamin Marzinski, Yu Kuai, Song Liu, yukuai
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

Hi,

On 2026/4/14 9:25, Li Nan wrote:
>
> On 2026/4/14 6:45, Benjamin Marzinski wrote:
>> When make_stripe_request() encounters a clone bio that crosses the
>> reshape position while the reshape cannot make progress, it was setting
>> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
>> This will update the original bio's bi_status in md_end_clone_io().
>> Afterwards, md_handle_request() will wait for the device to become
>> unsuspended and submit a new cloned bio. However, even if that clone
>> completes successfully, it will not clear the original bio's bi_status.
>>
>> There's no need to set bi_status when retrying the bio. md will already
>> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
>> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
>>
>> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>> ---
>>  drivers/md/raid5.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index dc0c680ca199..690c65cd1e29 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>>  	raid5_release_stripe(sh);
>>  out:
>>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
>> -		bi->bi_status = BLK_STS_RESOURCE;
>>  		ret = STRIPE_WAIT_RESHAPE;
>>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>>  	}
>
> The link below leads to the same patch, which Kuai has already replied to.
>
> https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/

Perhaps instead of clearing the error code from the error path, this
problem can be fixed by resetting the error code from the issue path if
the original bio is resubmitted.

>
> --
> Thanks,
> Nan

--
Thanks,
Kuai
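To make the suggestion above concrete, the issue-path reset Kuai describes
might look roughly like the sketch below. It is only an illustration of the
idea, modelled on the md_handle_request() retry flow discussed later in this
thread rather than on the actual kernel source; device_is_suspended(),
wait_until_unsuspended() and submit_and_map_bio() are placeholder names, not
real md helpers, and the usual md/bio kernel headers are assumed.

/*
 * Illustrative sketch only: clear any status left over from a failed
 * attempt before the original bio is resubmitted.
 */
static void resubmit_with_reset_sketch(struct mddev *mddev, struct bio *bio)
{
	for (;;) {
		if (device_is_suspended(mddev, bio)) {
			if (bio->bi_opf & REQ_NOWAIT) {
				/* nowait callers get BLK_STS_AGAIN and are done */
				bio_wouldblock_error(bio);
				return;
			}
			wait_until_unsuspended(mddev, bio);
		}

		/* the reset Kuai is suggesting: drop any stale BLK_STS_RESOURCE */
		bio->bi_status = BLK_STS_OK;

		if (submit_and_map_bio(mddev, bio))
			return;	/* submitted; the completion path ends the bio */

		/* the personality asked for a retry; loop back and wait again */
	}
}

With a reset like this on the issue path, a successful retry could no longer
complete the original bio with the BLK_STS_RESOURCE left behind by an earlier
attempt.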
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Benjamin Marzinski @ 2026-04-14 18:19 UTC
To: Yu Kuai
Cc: Li Nan, Yu Kuai, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon, Yang Xiuwei

On Tue, Apr 14, 2026 at 02:20:40PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2026/4/14 9:25, Li Nan wrote:
> >
> > On 2026/4/14 6:45, Benjamin Marzinski wrote:
> >> When make_stripe_request() encounters a clone bio that crosses the
> >> reshape position while the reshape cannot make progress, it was setting
> >> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
> >> This will update the original bio's bi_status in md_end_clone_io().
> >> Afterwards, md_handle_request() will wait for the device to become
> >> unsuspended and submit a new cloned bio. However, even if that clone
> >> completes successfully, it will not clear the original bio's bi_status.
> >>
> >> There's no need to set bi_status when retrying the bio. md will already
> >> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
> >> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
> >>
> >> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >> ---
> >>  drivers/md/raid5.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index dc0c680ca199..690c65cd1e29 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
> >>  	raid5_release_stripe(sh);
> >>  out:
> >>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> >> -		bi->bi_status = BLK_STS_RESOURCE;
> >>  		ret = STRIPE_WAIT_RESHAPE;
> >>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
> >>  	}
> >
> > The link below leads to the same patch, which Kuai has already replied to.
> >
> > https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/
>
> Perhaps instead of clearing the error code from the error path, this
> problem can be fixed by resetting the error code from the issue path if
> the original bio is resubmitted.

I saw your comments at
https://lore.kernel.org/all/71e50b0e-0669-4a40-84d5-3c3061dfb229@fnnas.com/
and I'm a little confused.

The only code path where STRIPE_WAIT_RESHAPE is returned and bi->bi_status
is currently set to BLK_STS_RESOURCE is:

  md_handle_request -> raid5_make_request -> make_stripe_request()

make_stripe_request() returning STRIPE_WAIT_RESHAPE means that
raid5_make_request() will return false (this is the only situation where
raid5_make_request() returns false). This causes the cloned bio to be freed
without completing the original bio.

raid5_make_request() returning false will cause md_handle_request() to do
different things, depending on whether the device is a dm device or an md
device.

For dm devices, md_handle_request() will return false, causing
dm-raid.c:raid_map() to return DM_MAPIO_REQUEUE. This will either requeue
dm's original bio (md's original bio is itself a clone of dm's original bio)
if the device is currently in a noflush suspend, or complete dm's original
bio with BLK_STS_IOERR if the device is not. Since DM_MAPIO_REQUEUE
overrides any error for bios that should be requeued, removing
"bi->bi_status = BLK_STS_RESOURCE" doesn't actually seem important for DM.

But for md devices, md_handle_request() will loop back to check_suspend,
which will complete the bio with BLK_STS_AGAIN if it's a REQ_NOWAIT bio, and
will otherwise wait until the device is no longer suspended to call
raid5_make_request() again. If that later call to raid5_make_request()
completes successfully, the original bio will retain the BLK_STS_RESOURCE
status from the earlier failed call, instead of completing successfully like
it should.

I don't see where a bio could get completed without bio->bi_status getting
set to an appropriate error here. Am I missing something?

Obviously clearing the error when you resubmit would fix the issue as well.
It just seems odd to set it and then clear it when AFAICT nothing requires
it to be set in the first place. But perhaps I'm overlooking something.

Yang Xiuwei, have you verified that this fix actually solves your problems?
If a dm map() function completes with DM_MAPIO_REQUEUE, and the device is in
a noflush suspend, it shouldn't set the error on the original bio,
regardless of the clone bio. It should requeue the bio. If a dm map()
function completes with DM_MAPIO_REQUEUE, and the device isn't in a noflush
suspend, the original bio will always be completed with an error.

To me, it seems more likely that what you are seeing is
make_stripe_request() returning STRIPE_WAIT_RESHAPE when the dm device isn't
actually in a noflush suspend. I have seen this myself.

-Ben

> >
> > --
> > Thanks,
> > Nan
> >
>
> --
> Thanks,
> Kuai
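To summarize the dm side of the walkthrough above: when raid_map() returns
DM_MAPIO_REQUEUE, dm core either requeues or fails dm's original bio
depending on whether a noflush suspend is in progress. The sketch below is a
condensed illustration of that behaviour as described in this thread, not
the actual dm.c code; requeue_original_bio() is a placeholder helper, and
dm_noflush_suspending() is used here purely for illustration.

/* Condensed sketch of the DM_MAPIO_REQUEUE outcomes described above. */
static void dm_requeue_outcome_sketch(struct dm_target *ti, struct bio *orig_bio)
{
	if (dm_noflush_suspending(ti)) {
		/* noflush suspend: park the bio and retry it after resume */
		requeue_original_bio(orig_bio);		/* placeholder helper */
	} else {
		/* no noflush suspend: the requeue becomes a hard error */
		orig_bio->bi_status = BLK_STS_IOERR;
		bio_endio(orig_bio);
	}
}

This is why Ben suspects the reported errors come from STRIPE_WAIT_RESHAPE
being returned while the dm device is not actually in a noflush suspend: in
that case the requeue branch above is never taken and the bio is failed
outright.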
* [RFC PATCH] dm-raid: only requeue bios when dm is suspending.
From: Benjamin Marzinski @ 2026-04-14 19:03 UTC
To: Yang Xiuwei
Cc: Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon

Returning DM_MAPIO_REQUEUE from the target map() function only requeues
the bio during noflush suspends. During regular operations or during
flushing suspends, it fails the bio. Failing the bio during flushing
suspends is the correct behavior here. We cannot handle the bio, and we
cannot suspend while it is outstanding. But during normal operations, we
should not push the bio back to dm. Instead, wait for the reshape to be
resumed.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---

Yang Xiuwei, if you are still able to see I/O errors during LVM testing,
does this patch fix them?

 drivers/md/dm-raid.c | 7 +++++++
 drivers/md/md.h      | 1 +
 drivers/md/raid5.c   | 6 ++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 4bacdc499984..cac61d57e7e2 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti)
 	 * resume, raid_postsuspend() is too late.
 	 */
 	set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+	WRITE_ONCE(mddev->dm_suspending, 1);
 
 	if (!reshape_interrupted(mddev))
 		return;
@@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti)
 static void raid_presuspend_undo(struct dm_target *ti)
 {
 	struct raid_set *rs = ti->private;
+	struct mddev *mddev = &rs->md;
+
+	WRITE_ONCE(mddev->dm_suspending, 0);
 
 	clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
 }
@@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti)
 static void raid_postsuspend(struct dm_target *ti)
 {
 	struct raid_set *rs = ti->private;
+	struct mddev *mddev = &rs->md;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
 		/*
@@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti)
 		mddev_suspend(&rs->md, false);
 		rs->md.ro = MD_RDONLY;
 	}
+	WRITE_ONCE(mddev->dm_suspending, 0);
+
 }
 
 static void attempt_restore_of_faulty_devices(struct raid_set *rs)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..e8d7332c5cb9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -463,6 +463,7 @@ struct mddev {
 	int				delta_disks, new_level, new_layout;
 	int				new_chunk_sectors;
 	int				reshape_backwards;
+	int				dm_suspending;
 
 	struct md_thread __rcu		*thread;	/* management thread */
 	struct md_thread __rcu		*sync_thread;	/* doing resync or reconstruct */
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8854e024f311..d528263f92a3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	raid5_release_stripe(sh);
 out:
 	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
-		bi->bi_status = BLK_STS_RESOURCE;
-		ret = STRIPE_WAIT_RESHAPE;
+		if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) {
+			bi->bi_status = BLK_STS_RESOURCE;
+			ret = STRIPE_WAIT_RESHAPE;
+		}
 		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
 	}
 	return ret;
--
2.50.1
* Re: [RFC PATCH] dm-raid: only requeue bios when dm is suspending. 2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending Benjamin Marzinski @ 2026-04-22 9:58 ` Xiao Ni 2026-04-28 8:35 ` Yu Kuai 1 sibling, 0 replies; 8+ messages in thread From: Xiao Ni @ 2026-04-22 9:58 UTC (permalink / raw) To: Benjamin Marzinski Cc: Yang Xiuwei, Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Nigel Croxon On Wed, Apr 15, 2026 at 3:03 AM Benjamin Marzinski <bmarzins@redhat.com> wrote: > > returning DM_MAPIO_REQUEUE from the target map() function only requeues > the bio during noflush suspends. During regular operations or during > flushing suspends, it fails the bio. Failing the bio during flushing > suspends is the correct behavior here. We cannot handle the bio, and we > cannot suspends while it is outstanding. But during normal operations, > we should not push the bio back to do. Instead, wait for the reshape > to be resumed. > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > --- > > Yang Xiuwei, if you are still able to see I/O errors during LVM testing, > does this patch fix them? > > drivers/md/dm-raid.c | 7 +++++++ > drivers/md/md.h | 1 + > drivers/md/raid5.c | 6 ++++-- > 3 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c > index 4bacdc499984..cac61d57e7e2 100644 > --- a/drivers/md/dm-raid.c > +++ b/drivers/md/dm-raid.c > @@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti) > * resume, raid_postsuspend() is too late. > */ > set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags); > + WRITE_ONCE(mddev->dm_suspending, 1); > > if (!reshape_interrupted(mddev)) > return; > @@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti) > static void raid_presuspend_undo(struct dm_target *ti) > { > struct raid_set *rs = ti->private; > + struct mddev *mddev = &rs->md; > + > + WRITE_ONCE(mddev->dm_suspending, 0); > > clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags); > } > @@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti) > static void raid_postsuspend(struct dm_target *ti) > { > struct raid_set *rs = ti->private; > + struct mddev *mddev = &rs->md; > > if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { > /* > @@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti) > mddev_suspend(&rs->md, false); > rs->md.ro = MD_RDONLY; > } > + WRITE_ONCE(mddev->dm_suspending, 0); > + > } > > static void attempt_restore_of_faulty_devices(struct raid_set *rs) > diff --git a/drivers/md/md.h b/drivers/md/md.h > index ac84289664cd..e8d7332c5cb9 100644 > --- a/drivers/md/md.h > +++ b/drivers/md/md.h > @@ -463,6 +463,7 @@ struct mddev { > int delta_disks, new_level, new_layout; > int new_chunk_sectors; > int reshape_backwards; > + int dm_suspending; > > struct md_thread __rcu *thread; /* management thread */ > struct md_thread __rcu *sync_thread; /* doing resync or reconstruct */ > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 8854e024f311..d528263f92a3 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, > raid5_release_stripe(sh); > out: > if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) { > - bi->bi_status = BLK_STS_RESOURCE; > - ret = STRIPE_WAIT_RESHAPE; > + if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) { > + bi->bi_status = BLK_STS_RESOURCE; > + ret = STRIPE_WAIT_RESHAPE; > + } > 
pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress"); > } > return ret; > -- > 2.50.1 > Looks good to me. Reviewed-by: Xiao Ni <xni@redhat.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] dm-raid: only requeue bios when dm is suspending.
From: Yu Kuai @ 2026-04-28 8:35 UTC
To: Benjamin Marzinski, Yang Xiuwei
Cc: Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon, yukuai

Hi,

On 2026/4/15 3:03, Benjamin Marzinski wrote:
> Returning DM_MAPIO_REQUEUE from the target map() function only requeues
> the bio during noflush suspends. During regular operations or during
> flushing suspends, it fails the bio. Failing the bio during flushing
> suspends is the correct behavior here. We cannot handle the bio, and we
> cannot suspend while it is outstanding. But during normal operations, we
> should not push the bio back to dm. Instead, wait for the reshape to be
> resumed.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>
> Yang Xiuwei, if you are still able to see I/O errors during LVM testing,
> does this patch fix them?
>
>  drivers/md/dm-raid.c | 7 +++++++
>  drivers/md/md.h      | 1 +
>  drivers/md/raid5.c   | 6 ++++--
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 4bacdc499984..cac61d57e7e2 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti)
>  	 * resume, raid_postsuspend() is too late.
>  	 */
>  	set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
> +	WRITE_ONCE(mddev->dm_suspending, 1);
>
>  	if (!reshape_interrupted(mddev))
>  		return;
> @@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti)
>  static void raid_presuspend_undo(struct dm_target *ti)
>  {
>  	struct raid_set *rs = ti->private;
> +	struct mddev *mddev = &rs->md;
> +
> +	WRITE_ONCE(mddev->dm_suspending, 0);
>
>  	clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
>  }
> @@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti)
>  static void raid_postsuspend(struct dm_target *ti)
>  {
>  	struct raid_set *rs = ti->private;
> +	struct mddev *mddev = &rs->md;
>
>  	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>  		/*
> @@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti)
>  		mddev_suspend(&rs->md, false);
>  		rs->md.ro = MD_RDONLY;
>  	}
> +	WRITE_ONCE(mddev->dm_suspending, 0);
> +
>  }
>
>  static void attempt_restore_of_faulty_devices(struct raid_set *rs)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ac84289664cd..e8d7332c5cb9 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -463,6 +463,7 @@ struct mddev {
>  	int				delta_disks, new_level, new_layout;
>  	int				new_chunk_sectors;
>  	int				reshape_backwards;
> +	int				dm_suspending;

This patch looks fine. However, can you also optimize it by using a new
flag instead of a new int field?

>
>  	struct md_thread __rcu		*thread;	/* management thread */
>  	struct md_thread __rcu		*sync_thread;	/* doing resync or reconstruct */
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8854e024f311..d528263f92a3 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>  	raid5_release_stripe(sh);
>  out:
>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> -		bi->bi_status = BLK_STS_RESOURCE;
> -		ret = STRIPE_WAIT_RESHAPE;
> +		if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) {
> +			bi->bi_status = BLK_STS_RESOURCE;
> +			ret = STRIPE_WAIT_RESHAPE;
> +		}
>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>  	}
>  	return ret;

--
Thanks,
Kuai
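For reference, the flag-based variant Kuai is asking about would presumably
look something like the sketch below: a bit toggled with
set_bit()/clear_bit()/test_bit() instead of a new int member. The bit name
MD_DM_SUSPENDING and the choice of mddev->flags as its home are assumptions
for illustration, not part of any posted patch.

/* Hypothetical bit; a real patch would pick a free bit in mddev->flags
 * (or another existing bitmap) rather than add a new int field. */
enum {
	MD_DM_SUSPENDING,
};

/* dm-raid's presuspend/postsuspend hooks would toggle the bit ... */
static void raid_presuspend_sketch(struct mddev *mddev)
{
	set_bit(MD_DM_SUSPENDING, &mddev->flags);
}

static void raid_postsuspend_sketch(struct mddev *mddev)
{
	clear_bit(MD_DM_SUSPENDING, &mddev->flags);
}

/* ... and raid5 would test it where the RFC patch reads dm_suspending: */
static bool dm_suspending_sketch(struct mddev *mddev)
{
	return test_bit(MD_DM_SUSPENDING, &mddev->flags);
}

Functionally this is equivalent to the READ_ONCE()/WRITE_ONCE() int used in
the RFC patch; the flag form just avoids growing struct mddev by another
field.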
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE 2026-04-14 18:19 ` Benjamin Marzinski 2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending Benjamin Marzinski @ 2026-04-15 1:28 ` Yang Xiuwei 1 sibling, 0 replies; 8+ messages in thread From: Yang Xiuwei @ 2026-04-15 1:28 UTC (permalink / raw) To: Benjamin Marzinski Cc: Yu Kuai, Li Nan, Song Liu, Xiao Ni, Nigel Croxon, linux-raid, dm-devel Hi Ben, On Tue, Apr 14, 2026 at 02:19:33PM -0400, Benjamin Marzinski wrote: > Yang Xiuwei, have you verified that this fix actually solves your > problems? If a dm map() function completes with DM_MAPIO_REQUEUE, and > the device is in a noflush suspend, it shouldn't set the error on the > original bio, regardless of the clone bio. It should requeue the bio. If > a dm map() function completes with DM_MAPIO_REQUEUE, and the device > isn't in a noflush suspend, the original bio will always be completed > with an error. > > To me, it seems more likely that what you are seeing is > make_stripe_request() returning STRIPE_WAIT_RESHAPE when the dm device > isn't actually in a noflush suspend. I have seen this myself. > > -Ben I tested the version that removes setting bi->bi_status to BLK_STS_RESOURCE in the STRIPE_WAIT_RESHAPE path you described. In my environment it did not fix the failure below. Sorry for the slow response. The earlier fix still did not solve the problem in my testing. I am not very familiar with this area yet and wanted to learn more before continuing the analysis, but other work meant I have not had time to pick it up again until now. I have not yet tested the dm-raid RFC patch from your follow-up message, but I plan to try it when I have time. The failure was observed while running the LVM2 shell test lvconvert-raid-reshape-stripes-load-fail.sh. Below is the test log (kernel messages and harness output), followed by the script contents. Test log: | [ 0:10.630] WARNING: This metadata update is NOT backed up. | [ 0:10.632] aux disable_dev $dev1 | [ 0:10.748] #lvconvert-raid-reshape-stripes-load-fail.sh:68+ aux disable_dev /dev/mapper/LVMTEST1351568pv1 | [ 0:10.748] Disabling device /dev/mapper/LVMTEST1351568pv1 (252:5) | [ 0:10.868] [73439.222696] <6> 2026-01-20 13:59:47 md: reshape of RAID array mdX | [ 0:10.868] aux delay_dev "$dev2" 0 50 | [ 0:10.871] #lvconvert-raid-reshape-stripes-load-fail.sh:69+ aux delay_dev /dev/mapper/LVMTEST1351568pv2 0 50 | [ 0:10.871] check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" | [ 0:10.886] [73439.231558] <3> 2026-01-20 13:59:47 Buffer I/O error on dev dm-5, logical block 0, async page read | [ 0:10.886] #lvconvert-raid-reshape-stripes-load-fail.sh:70+ check lv_first_seg_field LVMTEST1351568vg/LV1 segtype raid5_ls | [ 0:10.886] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.910] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.910] check lv_first_seg_field $vg/$lv1 stripesize "64.00k" | [ 0:10.912] #lvconvert-raid-reshape-stripes-load-fail.sh:71+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripesize 64.00k | [ 0:10.912] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. 
| [ 0:10.933] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.933] check lv_first_seg_field $vg/$lv1 data_stripes 15 | [ 0:10.935] #lvconvert-raid-reshape-stripes-load-fail.sh:72+ check lv_first_seg_field LVMTEST1351568vg/LV1 data_stripes 15 | [ 0:10.935] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.956] [73439.292632] <3> 2026-01-20 13:59:47 md: super_written gets error=-5 | [ 0:10.956] [73439.297679] <2> 2026-01-20 13:59:47 md/raid:mdX: Disk failure on dm-22, disabling device. | [ 0:10.956] [73439.304626] <2> 2026-01-20 13:59:47 md/raid:mdX: Operation continuing on 15 devices. | [ 0:10.956] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.956] check lv_first_seg_field $vg/$lv1 stripes 16 | [ 0:10.958] #lvconvert-raid-reshape-stripes-load-fail.sh:73+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripes 16 | [ 0:10.958] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.979] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.979] | [ 0:10.981] kill -9 %% | [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:75+ kill -9 %% | [ 0:10.981] wait | [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:76+ wait | [ 0:10.981] rm -fr "$mount_dir/[12]" | [ 0:11.787] [73439.674065] <4> 2026-01-20 13:59:48 make_stripe_request: 24 callbacks suppressed | [ 0:11.787] [73439.674074] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.674086] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1074, lost sync page write | [ 0:11.787] [73439.681096] <6> 2026-01-20 13:59:48 md: mdX: reshape interrupted. 
| [ 0:11.787] [73439.682723] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.691180] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.699766] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.708347] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.716934] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.725519] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.734099] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.734574] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_check_bdev_write_error:225: comm kworker/u388:2: Error while async write back metadata | [ 0:11.787] [73439.742682] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.764081] <3> 2026-01-20 13:59:48 Aborting journal on device dm-43-8. | [ 0:11.787] [73439.778040] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.788] [73439.778043] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 740, lost sync page write | [ 0:11.788] [73439.795025] <3> 2026-01-20 13:59:48 JBD2: I/O error when updating journal superblock for dm-43-8. | [ 0:11.788] [73439.802674] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal | [ 0:11.788] [73439.802673] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal | [ 0:11.788] [73440.032568] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write | [ 0:11.788] [73440.040800] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.788] [73440.040813] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): previous I/O error to superblock detected | [ 0:11.788] [73440.047569] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only | [ 0:11.788] [73440.054948] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write | [ 0:11.788] [73440.069663] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.788] [73440.076428] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only | [ 0:11.788] #lvconvert-raid-reshape-stripes-load-fail.sh:77+ rm -fr 'mnt/[12]' | [ 0:11.788] | [ 0:11.789] sync | [ 0:11.789] #lvconvert-raid-reshape-stripes-load-fail.sh:79+ sync | [ 0:11.789] umount "$mount_dir" | [ 0:11.798] [73440.145596] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 82, lost async page write | [ 0:11.798] #lvconvert-raid-reshape-stripes-load-fail.sh:80+ umount mnt | [ 0:11.798] | [ 0:11.814] fsck -fn "$DM_DEV_DIR/$vg/$lv1" | [ 0:11.814] [73440.162114] <6> 2026-01-20 13:59:48 EXT4-fs (dm-43): unmounting filesystem 86548d8e-e409-4ae8-b7d5-8b78a9b5fb50. 
| [ 0:11.814] [73440.162336] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.814] #lvconvert-raid-reshape-stripes-load-fail.sh:82+ fsck -fn /dev/LVMTEST1351568vg/LV1 | [ 0:11.814] fsck from util-linux 2.39.1 | [ 0:11.816] e2fsck 1.47.0 (5-Feb-2023) | [ 0:11.821] fsck.ext2: Input/output error while trying to open /dev/mapper/LVMTEST1351568vg-LV1 | [ 0:11.821] | [ 0:11.821] The superblock could not be read or does not describe a valid ext2/ext3/ext4 | [ 0:11.821] filesystem. If the device is valid and it really contains an ext2/ext3/ext4 | [ 0:11.821] filesystem (and not swap or ufs or something else), then the superblock | [ 0:11.821] is corrupt, and you might try running e2fsck with an alternate superblock: | [ 0:11.821] e2fsck -b 8193 <device> | [ 0:11.821] or | [ 0:11.821] e2fsck -b 32768 <device> | [ 0:11.821] | [ 0:11.821] set +vx; STACKTRACE; set -vx | [ 0:11.822] ##lvconvert-raid-reshape-stripes-load-fail.sh:82+ set +vx | [ 0:11.822] ## - /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82 | [ 0:11.822] ## 1 STACKTRACE() called from /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82 lvconvert-raid-reshape-stripes-load-fail.sh: #!/usr/bin/env bash # Copyright (C) 2017 Red Hat, Inc. All rights reserved. # # This copyrighted material is made available to anyone wishing to use, # modify, copy, or redistribute it subject to the terms and conditions # of the GNU General Public License v.2. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software Foundation, # Inc., 51 Franklin Street, Fifth Floor, Boston, MA2110-1301 USA SKIP_WITH_LVMPOLLD=1 . lib/inittest # Test reshaping under io load case "$(uname -r)" in 3.10.0-862*) skip "Cannot run this test on unfixed kernel." ;; esac which mkfs.ext4 || skip aux have_raid 1 13 2 || skip mount_dir="mnt" cleanup_mounted_and_teardown() { umount "$mount_dir" || true aux teardown } aux prepare_pvs 16 32 get_devs vgcreate $SHARED -s 1M "$vg" "${DEVICES[@]}" trap 'cleanup_mounted_and_teardown' EXIT # Create 10-way striped raid5 (11 legs total) lvcreate --yes --type raid5_ls --stripesize 64K --stripes 10 -L4 -n$lv1 $vg check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" check lv_first_seg_field $vg/$lv1 stripesize "64.00k" check lv_first_seg_field $vg/$lv1 data_stripes 10 check lv_first_seg_field $vg/$lv1 stripes 11 wipefs -a "$DM_DEV_DIR/$vg/$lv1" mkfs -t ext4 "$DM_DEV_DIR/$vg/$lv1" fsck -fn "$DM_DEV_DIR/$vg/$lv1" mkdir -p "$mount_dir" mount "$DM_DEV_DIR/$vg/$lv1" "$mount_dir" mkdir -p "$mount_dir/1" "$mount_dir/2" echo 3 >/proc/sys/vm/drop_caches cp -r /usr/bin "$mount_dir/1" &>/dev/null & cp -r /usr/bin "$mount_dir/2" &>/dev/null & sync & aux wait_for_sync $vg $lv1 aux delay_dev "$dev2" 0 100 # Reshape it to 15 data stripes lvconvert --yes --stripes 15 $vg/$lv1 aux disable_dev $dev1 aux delay_dev "$dev2" 0 50 check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" check lv_first_seg_field $vg/$lv1 stripesize "64.00k" check lv_first_seg_field $vg/$lv1 data_stripes 15 check lv_first_seg_field $vg/$lv1 stripes 16 kill -9 %% wait rm -fr "$mount_dir/[12]" sync umount "$mount_dir" fsck -fn "$DM_DEV_DIR/$vg/$lv1" vgremove -ff $vg Thanks, Yang Xiuwei ^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview (8 messages):
2026-04-13 22:45 [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE  Benjamin Marzinski
2026-04-14  1:25 ` Li Nan
2026-04-14  6:20 ` Yu Kuai
2026-04-14 18:19 ` Benjamin Marzinski
2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending  Benjamin Marzinski
2026-04-22  9:58 ` Xiao Ni
2026-04-28  8:35 ` Yu Kuai
2026-04-15  1:28 ` [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE  Yang Xiuwei