* [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Benjamin Marzinski @ 2026-04-13 22:45 UTC
To: Yu Kuai, Song Liu, Li Nan
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

When make_stripe_request() encounters a clone bio that crosses the
reshape position while the reshape cannot make progress, it was setting
bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
This will update the original bio's bi_status in md_end_clone_io().
Afterwards, md_handle_request() will wait for the device to become
unsuspended and submit a new cloned bio. However, even if that clone
completes successfully, it will not clear the original bio's bi_status.

There's no need to set bi_status when retrying the bio. md will already
error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 drivers/md/raid5.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc0c680ca199..690c65cd1e29 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	raid5_release_stripe(sh);
 out:
 	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
-		bi->bi_status = BLK_STS_RESOURCE;
 		ret = STRIPE_WAIT_RESHAPE;
 		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
 	}
--
2.53.0
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Li Nan @ 2026-04-14 1:25 UTC
To: Benjamin Marzinski, Yu Kuai, Song Liu
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

On 2026/4/14 6:45, Benjamin Marzinski wrote:
> When make_stripe_request() encounters a clone bio that crosses the
> reshape position while the reshape cannot make progress, it was setting
> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
> This will update the original bio's bi_status in md_end_clone_io().
> Afterwards, md_handle_request() will wait for the device to become
> unsuspended and submit a new cloned bio. However, even if that clone
> completes successfully, it will not clear the original bio's bi_status.
>
> There's no need to set bi_status when retrying the bio. md will already
> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  drivers/md/raid5.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc0c680ca199..690c65cd1e29 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>  	raid5_release_stripe(sh);
>  out:
>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> -		bi->bi_status = BLK_STS_RESOURCE;
>  		ret = STRIPE_WAIT_RESHAPE;
>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>  	}

The link below leads to the same patch, which Kuai has already replied to.

https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/

--
Thanks,
Nan
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Yu Kuai @ 2026-04-14 6:20 UTC
To: Li Nan, Benjamin Marzinski, Yu Kuai, Song Liu, yukuai
Cc: linux-raid, dm-devel, Xiao Ni, Nigel Croxon

Hi,

On 2026/4/14 9:25, Li Nan wrote:
>
> On 2026/4/14 6:45, Benjamin Marzinski wrote:
>> When make_stripe_request() encounters a clone bio that crosses the
>> reshape position while the reshape cannot make progress, it was setting
>> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
>> This will update the original bio's bi_status in md_end_clone_io().
>> Afterwards, md_handle_request() will wait for the device to become
>> unsuspended and submit a new cloned bio. However, even if that clone
>> completes successfully, it will not clear the original bio's bi_status.
>>
>> There's no need to set bi_status when retrying the bio. md will already
>> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
>> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
>>
>> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>> ---
>>  drivers/md/raid5.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index dc0c680ca199..690c65cd1e29 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>>  	raid5_release_stripe(sh);
>>  out:
>>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
>> -		bi->bi_status = BLK_STS_RESOURCE;
>>  		ret = STRIPE_WAIT_RESHAPE;
>>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>>  	}
>
> The link below leads to the same patch, which Kuai has already replied to.
>
> https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/

Perhaps instead of clearing the error code from the error path, this
problem can be fixed by resetting the error code from the issue path if
the original bio is resubmitted.

>
> --
> Thanks,
> Nan

--
Thanks,
Kuai
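To make the suggestion above concrete, the issue-path reset Kuai describes
might look roughly like the sketch below. It is only an illustration of the
idea, modelled on the md_handle_request() retry flow discussed later in this
thread rather than on the actual kernel source; device_is_suspended(),
wait_until_unsuspended() and submit_and_map_bio() are placeholder names, not
real md helpers, and the usual md/bio kernel headers are assumed.

/*
 * Illustrative sketch only: clear any status left over from a failed
 * attempt before the original bio is resubmitted.
 */
static void resubmit_with_reset_sketch(struct mddev *mddev, struct bio *bio)
{
	for (;;) {
		if (device_is_suspended(mddev, bio)) {
			if (bio->bi_opf & REQ_NOWAIT) {
				/* nowait callers get BLK_STS_AGAIN and are done */
				bio_wouldblock_error(bio);
				return;
			}
			wait_until_unsuspended(mddev, bio);
		}

		/* the reset Kuai is suggesting: drop any stale BLK_STS_RESOURCE */
		bio->bi_status = BLK_STS_OK;

		if (submit_and_map_bio(mddev, bio))
			return;	/* submitted; the completion path ends the bio */

		/* the personality asked for a retry; loop back and wait again */
	}
}

With a reset like this on the issue path, a successful retry could no longer
complete the original bio with the BLK_STS_RESOURCE left behind by an earlier
attempt.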
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
From: Benjamin Marzinski @ 2026-04-14 18:19 UTC
To: Yu Kuai
Cc: Li Nan, Yu Kuai, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon, Yang Xiuwei

On Tue, Apr 14, 2026 at 02:20:40PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2026/4/14 9:25, Li Nan wrote:
> >
> > On 2026/4/14 6:45, Benjamin Marzinski wrote:
> >> When make_stripe_request() encounters a clone bio that crosses the
> >> reshape position while the reshape cannot make progress, it was setting
> >> bi->bi_status to BLK_STS_RESOURCE when returning STRIPE_WAIT_RESHAPE.
> >> This will update the original bio's bi_status in md_end_clone_io().
> >> Afterwards, md_handle_request() will wait for the device to become
> >> unsuspended and submit a new cloned bio. However, even if that clone
> >> completes successfully, it will not clear the original bio's bi_status.
> >>
> >> There's no need to set bi_status when retrying the bio. md will already
> >> error out the bio correctly if REQ_NOWAIT is set. Otherwise it will be
> >> retried. dm-raid will already end the bio with DM_MAPIO_REQUEUE.
> >>
> >> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >> ---
> >>  drivers/md/raid5.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index dc0c680ca199..690c65cd1e29 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -6042,7 +6042,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
> >>  	raid5_release_stripe(sh);
> >>  out:
> >>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> >> -		bi->bi_status = BLK_STS_RESOURCE;
> >>  		ret = STRIPE_WAIT_RESHAPE;
> >>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
> >>  	}
> >
> > The link below leads to the same patch, which Kuai has already replied to.
> >
> > https://lore.kernel.org/all/20260203095156.2349174-1-yangxiuwei@kylinos.cn/
>
> Perhaps instead of clearing the error code from the error path, this
> problem can be fixed by resetting the error code from the issue path if
> the original bio is resubmitted.

I saw your comments at
https://lore.kernel.org/all/71e50b0e-0669-4a40-84d5-3c3061dfb229@fnnas.com/
and I'm a little confused.

The only code path where STRIPE_WAIT_RESHAPE is returned and bi->bi_status
is currently set to BLK_STS_RESOURCE is:

  md_handle_request -> raid5_make_request -> make_stripe_request()

make_stripe_request() returning STRIPE_WAIT_RESHAPE means that
raid5_make_request() will return false (this is the only situation where
raid5_make_request() returns false). This causes the cloned bio to be freed
without completing the original bio.

raid5_make_request() returning false will cause md_handle_request() to do
different things, depending on whether the device is a dm device or an md
device.

For dm devices, md_handle_request() will return false, causing
dm-raid.c:raid_map() to return DM_MAPIO_REQUEUE. This will either requeue
dm's original bio (md's original bio is itself a clone of dm's original bio)
if the device is currently in a noflush suspend, or complete dm's original
bio with BLK_STS_IOERR if the device is not. Since DM_MAPIO_REQUEUE
overrides any error for bios that should be requeued, removing
"bi->bi_status = BLK_STS_RESOURCE" doesn't actually seem important for DM.

But for md devices, md_handle_request() will loop back to check_suspend,
which will complete the bio with BLK_STS_AGAIN if it's a REQ_NOWAIT bio, and
will otherwise wait until the device is no longer suspended to call
raid5_make_request() again. If that later call to raid5_make_request()
completes successfully, the original bio will retain the BLK_STS_RESOURCE
status from the earlier failed call, instead of completing successfully like
it should.

I don't see where a bio could get completed without bio->bi_status getting
set to an appropriate error here. Am I missing something?

Obviously clearing the error when you resubmit would fix the issue as well.
It just seems odd to set it and then clear it when AFAICT nothing requires
it to be set in the first place. But perhaps I'm overlooking something.

Yang Xiuwei, have you verified that this fix actually solves your problems?
If a dm map() function completes with DM_MAPIO_REQUEUE, and the device is in
a noflush suspend, it shouldn't set the error on the original bio,
regardless of the clone bio. It should requeue the bio. If a dm map()
function completes with DM_MAPIO_REQUEUE, and the device isn't in a noflush
suspend, the original bio will always be completed with an error.

To me, it seems more likely that what you are seeing is
make_stripe_request() returning STRIPE_WAIT_RESHAPE when the dm device isn't
actually in a noflush suspend. I have seen this myself.

-Ben

> >
> > --
> > Thanks,
> > Nan
> >
>
> --
> Thanks,
> Kuai
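To summarize the dm side of the walkthrough above: when raid_map() returns
DM_MAPIO_REQUEUE, dm core either requeues or fails dm's original bio
depending on whether a noflush suspend is in progress. The sketch below is a
condensed illustration of that behaviour as described in this thread, not
the actual dm.c code; requeue_original_bio() is a placeholder helper, and
dm_noflush_suspending() is used here purely for illustration.

/* Condensed sketch of the DM_MAPIO_REQUEUE outcomes described above. */
static void dm_requeue_outcome_sketch(struct dm_target *ti, struct bio *orig_bio)
{
	if (dm_noflush_suspending(ti)) {
		/* noflush suspend: park the bio and retry it after resume */
		requeue_original_bio(orig_bio);		/* placeholder helper */
	} else {
		/* no noflush suspend: the requeue becomes a hard error */
		orig_bio->bi_status = BLK_STS_IOERR;
		bio_endio(orig_bio);
	}
}

This is why Ben suspects the reported errors come from STRIPE_WAIT_RESHAPE
being returned while the dm device is not actually in a noflush suspend: in
that case the requeue branch above is never taken and the bio is failed
outright.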
* [RFC PATCH] dm-raid: only requeue bios when dm is suspending.
From: Benjamin Marzinski @ 2026-04-14 19:03 UTC
To: Yang Xiuwei
Cc: Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon

Returning DM_MAPIO_REQUEUE from the target map() function only requeues
the bio during noflush suspends. During regular operations or during
flushing suspends, it fails the bio. Failing the bio during flushing
suspends is the correct behavior here. We cannot handle the bio, and we
cannot suspend while it is outstanding. But during normal operations, we
should not push the bio back to dm. Instead, wait for the reshape to be
resumed.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---

Yang Xiuwei, if you are still able to see I/O errors during LVM testing,
does this patch fix them?

 drivers/md/dm-raid.c | 7 +++++++
 drivers/md/md.h      | 1 +
 drivers/md/raid5.c   | 6 ++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 4bacdc499984..cac61d57e7e2 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti)
 	 * resume, raid_postsuspend() is too late.
 	 */
 	set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+	WRITE_ONCE(mddev->dm_suspending, 1);
 
 	if (!reshape_interrupted(mddev))
 		return;
@@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti)
 static void raid_presuspend_undo(struct dm_target *ti)
 {
 	struct raid_set *rs = ti->private;
+	struct mddev *mddev = &rs->md;
+
+	WRITE_ONCE(mddev->dm_suspending, 0);
 
 	clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
 }
@@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti)
 static void raid_postsuspend(struct dm_target *ti)
 {
 	struct raid_set *rs = ti->private;
+	struct mddev *mddev = &rs->md;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
 		/*
@@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti)
 		mddev_suspend(&rs->md, false);
 		rs->md.ro = MD_RDONLY;
 	}
+	WRITE_ONCE(mddev->dm_suspending, 0);
+
 }
 
 static void attempt_restore_of_faulty_devices(struct raid_set *rs)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..e8d7332c5cb9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -463,6 +463,7 @@ struct mddev {
 	int				delta_disks, new_level, new_layout;
 	int				new_chunk_sectors;
 	int				reshape_backwards;
+	int				dm_suspending;
 
 	struct md_thread __rcu		*thread;	/* management thread */
 	struct md_thread __rcu		*sync_thread;	/* doing resync or reconstruct */
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8854e024f311..d528263f92a3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	raid5_release_stripe(sh);
 out:
 	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
-		bi->bi_status = BLK_STS_RESOURCE;
-		ret = STRIPE_WAIT_RESHAPE;
+		if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) {
+			bi->bi_status = BLK_STS_RESOURCE;
+			ret = STRIPE_WAIT_RESHAPE;
+		}
 		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
 	}
 	return ret;
--
2.50.1
* Re: [RFC PATCH] dm-raid: only requeue bios when dm is suspending. 2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending Benjamin Marzinski @ 2026-04-22 9:58 ` Xiao Ni 2026-04-28 8:35 ` Yu Kuai 1 sibling, 0 replies; 8+ messages in thread From: Xiao Ni @ 2026-04-22 9:58 UTC (permalink / raw) To: Benjamin Marzinski Cc: Yang Xiuwei, Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Nigel Croxon On Wed, Apr 15, 2026 at 3:03 AM Benjamin Marzinski <bmarzins@redhat.com> wrote: > > returning DM_MAPIO_REQUEUE from the target map() function only requeues > the bio during noflush suspends. During regular operations or during > flushing suspends, it fails the bio. Failing the bio during flushing > suspends is the correct behavior here. We cannot handle the bio, and we > cannot suspends while it is outstanding. But during normal operations, > we should not push the bio back to do. Instead, wait for the reshape > to be resumed. > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > --- > > Yang Xiuwei, if you are still able to see I/O errors during LVM testing, > does this patch fix them? > > drivers/md/dm-raid.c | 7 +++++++ > drivers/md/md.h | 1 + > drivers/md/raid5.c | 6 ++++-- > 3 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c > index 4bacdc499984..cac61d57e7e2 100644 > --- a/drivers/md/dm-raid.c > +++ b/drivers/md/dm-raid.c > @@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti) > * resume, raid_postsuspend() is too late. > */ > set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags); > + WRITE_ONCE(mddev->dm_suspending, 1); > > if (!reshape_interrupted(mddev)) > return; > @@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti) > static void raid_presuspend_undo(struct dm_target *ti) > { > struct raid_set *rs = ti->private; > + struct mddev *mddev = &rs->md; > + > + WRITE_ONCE(mddev->dm_suspending, 0); > > clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags); > } > @@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti) > static void raid_postsuspend(struct dm_target *ti) > { > struct raid_set *rs = ti->private; > + struct mddev *mddev = &rs->md; > > if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { > /* > @@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti) > mddev_suspend(&rs->md, false); > rs->md.ro = MD_RDONLY; > } > + WRITE_ONCE(mddev->dm_suspending, 0); > + > } > > static void attempt_restore_of_faulty_devices(struct raid_set *rs) > diff --git a/drivers/md/md.h b/drivers/md/md.h > index ac84289664cd..e8d7332c5cb9 100644 > --- a/drivers/md/md.h > +++ b/drivers/md/md.h > @@ -463,6 +463,7 @@ struct mddev { > int delta_disks, new_level, new_layout; > int new_chunk_sectors; > int reshape_backwards; > + int dm_suspending; > > struct md_thread __rcu *thread; /* management thread */ > struct md_thread __rcu *sync_thread; /* doing resync or reconstruct */ > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 8854e024f311..d528263f92a3 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, > raid5_release_stripe(sh); > out: > if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) { > - bi->bi_status = BLK_STS_RESOURCE; > - ret = STRIPE_WAIT_RESHAPE; > + if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) { > + bi->bi_status = BLK_STS_RESOURCE; > + ret = STRIPE_WAIT_RESHAPE; > + } > 
pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress"); > } > return ret; > -- > 2.50.1 > Looks good to me. Reviewed-by: Xiao Ni <xni@redhat.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] dm-raid: only requeue bios when dm is suspending.
From: Yu Kuai @ 2026-04-28 8:35 UTC
To: Benjamin Marzinski, Yang Xiuwei
Cc: Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Xiao Ni, Nigel Croxon, yukuai

Hi,

On 2026/4/15 3:03, Benjamin Marzinski wrote:
> Returning DM_MAPIO_REQUEUE from the target map() function only requeues
> the bio during noflush suspends. During regular operations or during
> flushing suspends, it fails the bio. Failing the bio during flushing
> suspends is the correct behavior here. We cannot handle the bio, and we
> cannot suspend while it is outstanding. But during normal operations, we
> should not push the bio back to dm. Instead, wait for the reshape to be
> resumed.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>
> Yang Xiuwei, if you are still able to see I/O errors during LVM testing,
> does this patch fix them?
>
>  drivers/md/dm-raid.c | 7 +++++++
>  drivers/md/md.h      | 1 +
>  drivers/md/raid5.c   | 6 ++++--
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 4bacdc499984..cac61d57e7e2 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti)
>  	 * resume, raid_postsuspend() is too late.
>  	 */
>  	set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
> +	WRITE_ONCE(mddev->dm_suspending, 1);
>
>  	if (!reshape_interrupted(mddev))
>  		return;
> @@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti)
>  static void raid_presuspend_undo(struct dm_target *ti)
>  {
>  	struct raid_set *rs = ti->private;
> +	struct mddev *mddev = &rs->md;
> +
> +	WRITE_ONCE(mddev->dm_suspending, 0);
>
>  	clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
>  }
> @@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti)
>  static void raid_postsuspend(struct dm_target *ti)
>  {
>  	struct raid_set *rs = ti->private;
> +	struct mddev *mddev = &rs->md;
>
>  	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>  		/*
> @@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti)
>  		mddev_suspend(&rs->md, false);
>  		rs->md.ro = MD_RDONLY;
>  	}
> +	WRITE_ONCE(mddev->dm_suspending, 0);
> +
>  }
>
>  static void attempt_restore_of_faulty_devices(struct raid_set *rs)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ac84289664cd..e8d7332c5cb9 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -463,6 +463,7 @@ struct mddev {
>  	int				delta_disks, new_level, new_layout;
>  	int				new_chunk_sectors;
>  	int				reshape_backwards;
> +	int				dm_suspending;

This patch looks fine. However, can you also optimize it by using a new
flag instead of a new int field?

>
>  	struct md_thread __rcu		*thread;	/* management thread */
>  	struct md_thread __rcu		*sync_thread;	/* doing resync or reconstruct */
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8854e024f311..d528263f92a3 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>  	raid5_release_stripe(sh);
>  out:
>  	if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
> -		bi->bi_status = BLK_STS_RESOURCE;
> -		ret = STRIPE_WAIT_RESHAPE;
> +		if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) {
> +			bi->bi_status = BLK_STS_RESOURCE;
> +			ret = STRIPE_WAIT_RESHAPE;
> +		}
>  		pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
>  	}
>  	return ret;

--
Thanks,
Kuai
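For reference, the flag-based variant Kuai is asking about would presumably
look something like the sketch below: a bit toggled with
set_bit()/clear_bit()/test_bit() instead of a new int member. The bit name
MD_DM_SUSPENDING and the choice of mddev->flags as its home are assumptions
for illustration, not part of any posted patch.

/* Hypothetical bit; a real patch would pick a free bit in mddev->flags
 * (or another existing bitmap) rather than add a new int field. */
enum {
	MD_DM_SUSPENDING,
};

/* dm-raid's presuspend/postsuspend hooks would toggle the bit ... */
static void raid_presuspend_sketch(struct mddev *mddev)
{
	set_bit(MD_DM_SUSPENDING, &mddev->flags);
}

static void raid_postsuspend_sketch(struct mddev *mddev)
{
	clear_bit(MD_DM_SUSPENDING, &mddev->flags);
}

/* ... and raid5 would test it where the RFC patch reads dm_suspending: */
static bool dm_suspending_sketch(struct mddev *mddev)
{
	return test_bit(MD_DM_SUSPENDING, &mddev->flags);
}

Functionally this is equivalent to the READ_ONCE()/WRITE_ONCE() int used in
the RFC patch; the flag form just avoids growing struct mddev by another
field.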
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE 2026-04-14 18:19 ` Benjamin Marzinski 2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending Benjamin Marzinski @ 2026-04-15 1:28 ` Yang Xiuwei 1 sibling, 0 replies; 8+ messages in thread From: Yang Xiuwei @ 2026-04-15 1:28 UTC (permalink / raw) To: Benjamin Marzinski Cc: Yu Kuai, Li Nan, Song Liu, Xiao Ni, Nigel Croxon, linux-raid, dm-devel Hi Ben, On Tue, Apr 14, 2026 at 02:19:33PM -0400, Benjamin Marzinski wrote: > Yang Xiuwei, have you verified that this fix actually solves your > problems? If a dm map() function completes with DM_MAPIO_REQUEUE, and > the device is in a noflush suspend, it shouldn't set the error on the > original bio, regardless of the clone bio. It should requeue the bio. If > a dm map() function completes with DM_MAPIO_REQUEUE, and the device > isn't in a noflush suspend, the original bio will always be completed > with an error. > > To me, it seems more likely that what you are seeing is > make_stripe_request() returning STRIPE_WAIT_RESHAPE when the dm device > isn't actually in a noflush suspend. I have seen this myself. > > -Ben I tested the version that removes setting bi->bi_status to BLK_STS_RESOURCE in the STRIPE_WAIT_RESHAPE path you described. In my environment it did not fix the failure below. Sorry for the slow response. The earlier fix still did not solve the problem in my testing. I am not very familiar with this area yet and wanted to learn more before continuing the analysis, but other work meant I have not had time to pick it up again until now. I have not yet tested the dm-raid RFC patch from your follow-up message, but I plan to try it when I have time. The failure was observed while running the LVM2 shell test lvconvert-raid-reshape-stripes-load-fail.sh. Below is the test log (kernel messages and harness output), followed by the script contents. Test log: | [ 0:10.630] WARNING: This metadata update is NOT backed up. | [ 0:10.632] aux disable_dev $dev1 | [ 0:10.748] #lvconvert-raid-reshape-stripes-load-fail.sh:68+ aux disable_dev /dev/mapper/LVMTEST1351568pv1 | [ 0:10.748] Disabling device /dev/mapper/LVMTEST1351568pv1 (252:5) | [ 0:10.868] [73439.222696] <6> 2026-01-20 13:59:47 md: reshape of RAID array mdX | [ 0:10.868] aux delay_dev "$dev2" 0 50 | [ 0:10.871] #lvconvert-raid-reshape-stripes-load-fail.sh:69+ aux delay_dev /dev/mapper/LVMTEST1351568pv2 0 50 | [ 0:10.871] check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" | [ 0:10.886] [73439.231558] <3> 2026-01-20 13:59:47 Buffer I/O error on dev dm-5, logical block 0, async page read | [ 0:10.886] #lvconvert-raid-reshape-stripes-load-fail.sh:70+ check lv_first_seg_field LVMTEST1351568vg/LV1 segtype raid5_ls | [ 0:10.886] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.910] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.910] check lv_first_seg_field $vg/$lv1 stripesize "64.00k" | [ 0:10.912] #lvconvert-raid-reshape-stripes-load-fail.sh:71+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripesize 64.00k | [ 0:10.912] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. 
| [ 0:10.933] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.933] check lv_first_seg_field $vg/$lv1 data_stripes 15 | [ 0:10.935] #lvconvert-raid-reshape-stripes-load-fail.sh:72+ check lv_first_seg_field LVMTEST1351568vg/LV1 data_stripes 15 | [ 0:10.935] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.956] [73439.292632] <3> 2026-01-20 13:59:47 md: super_written gets error=-5 | [ 0:10.956] [73439.297679] <2> 2026-01-20 13:59:47 md/raid:mdX: Disk failure on dm-22, disabling device. | [ 0:10.956] [73439.304626] <2> 2026-01-20 13:59:47 md/raid:mdX: Operation continuing on 15 devices. | [ 0:10.956] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.956] check lv_first_seg_field $vg/$lv1 stripes 16 | [ 0:10.958] #lvconvert-raid-reshape-stripes-load-fail.sh:73+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripes 16 | [ 0:10.958] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ. | [ 0:10.979] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1). | [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices. | [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices. | [ 0:10.979] | [ 0:10.981] kill -9 %% | [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:75+ kill -9 %% | [ 0:10.981] wait | [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:76+ wait | [ 0:10.981] rm -fr "$mount_dir/[12]" | [ 0:11.787] [73439.674065] <4> 2026-01-20 13:59:48 make_stripe_request: 24 callbacks suppressed | [ 0:11.787] [73439.674074] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.674086] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1074, lost sync page write | [ 0:11.787] [73439.681096] <6> 2026-01-20 13:59:48 md: mdX: reshape interrupted. 
| [ 0:11.787] [73439.682723] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.691180] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.699766] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.708347] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.716934] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.725519] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.734099] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.734574] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_check_bdev_write_error:225: comm kworker/u388:2: Error while async write back metadata | [ 0:11.787] [73439.742682] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.787] [73439.764081] <3> 2026-01-20 13:59:48 Aborting journal on device dm-43-8. | [ 0:11.787] [73439.778040] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress | [ 0:11.788] [73439.778043] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 740, lost sync page write | [ 0:11.788] [73439.795025] <3> 2026-01-20 13:59:48 JBD2: I/O error when updating journal superblock for dm-43-8. | [ 0:11.788] [73439.802674] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal | [ 0:11.788] [73439.802673] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal | [ 0:11.788] [73440.032568] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write | [ 0:11.788] [73440.040800] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.788] [73440.040813] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): previous I/O error to superblock detected | [ 0:11.788] [73440.047569] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only | [ 0:11.788] [73440.054948] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write | [ 0:11.788] [73440.069663] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.788] [73440.076428] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only | [ 0:11.788] #lvconvert-raid-reshape-stripes-load-fail.sh:77+ rm -fr 'mnt/[12]' | [ 0:11.788] | [ 0:11.789] sync | [ 0:11.789] #lvconvert-raid-reshape-stripes-load-fail.sh:79+ sync | [ 0:11.789] umount "$mount_dir" | [ 0:11.798] [73440.145596] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 82, lost async page write | [ 0:11.798] #lvconvert-raid-reshape-stripes-load-fail.sh:80+ umount mnt | [ 0:11.798] | [ 0:11.814] fsck -fn "$DM_DEV_DIR/$vg/$lv1" | [ 0:11.814] [73440.162114] <6> 2026-01-20 13:59:48 EXT4-fs (dm-43): unmounting filesystem 86548d8e-e409-4ae8-b7d5-8b78a9b5fb50. 
| [ 0:11.814] [73440.162336] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock | [ 0:11.814] #lvconvert-raid-reshape-stripes-load-fail.sh:82+ fsck -fn /dev/LVMTEST1351568vg/LV1 | [ 0:11.814] fsck from util-linux 2.39.1 | [ 0:11.816] e2fsck 1.47.0 (5-Feb-2023) | [ 0:11.821] fsck.ext2: Input/output error while trying to open /dev/mapper/LVMTEST1351568vg-LV1 | [ 0:11.821] | [ 0:11.821] The superblock could not be read or does not describe a valid ext2/ext3/ext4 | [ 0:11.821] filesystem. If the device is valid and it really contains an ext2/ext3/ext4 | [ 0:11.821] filesystem (and not swap or ufs or something else), then the superblock | [ 0:11.821] is corrupt, and you might try running e2fsck with an alternate superblock: | [ 0:11.821] e2fsck -b 8193 <device> | [ 0:11.821] or | [ 0:11.821] e2fsck -b 32768 <device> | [ 0:11.821] | [ 0:11.821] set +vx; STACKTRACE; set -vx | [ 0:11.822] ##lvconvert-raid-reshape-stripes-load-fail.sh:82+ set +vx | [ 0:11.822] ## - /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82 | [ 0:11.822] ## 1 STACKTRACE() called from /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82 lvconvert-raid-reshape-stripes-load-fail.sh: #!/usr/bin/env bash # Copyright (C) 2017 Red Hat, Inc. All rights reserved. # # This copyrighted material is made available to anyone wishing to use, # modify, copy, or redistribute it subject to the terms and conditions # of the GNU General Public License v.2. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software Foundation, # Inc., 51 Franklin Street, Fifth Floor, Boston, MA2110-1301 USA SKIP_WITH_LVMPOLLD=1 . lib/inittest # Test reshaping under io load case "$(uname -r)" in 3.10.0-862*) skip "Cannot run this test on unfixed kernel." ;; esac which mkfs.ext4 || skip aux have_raid 1 13 2 || skip mount_dir="mnt" cleanup_mounted_and_teardown() { umount "$mount_dir" || true aux teardown } aux prepare_pvs 16 32 get_devs vgcreate $SHARED -s 1M "$vg" "${DEVICES[@]}" trap 'cleanup_mounted_and_teardown' EXIT # Create 10-way striped raid5 (11 legs total) lvcreate --yes --type raid5_ls --stripesize 64K --stripes 10 -L4 -n$lv1 $vg check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" check lv_first_seg_field $vg/$lv1 stripesize "64.00k" check lv_first_seg_field $vg/$lv1 data_stripes 10 check lv_first_seg_field $vg/$lv1 stripes 11 wipefs -a "$DM_DEV_DIR/$vg/$lv1" mkfs -t ext4 "$DM_DEV_DIR/$vg/$lv1" fsck -fn "$DM_DEV_DIR/$vg/$lv1" mkdir -p "$mount_dir" mount "$DM_DEV_DIR/$vg/$lv1" "$mount_dir" mkdir -p "$mount_dir/1" "$mount_dir/2" echo 3 >/proc/sys/vm/drop_caches cp -r /usr/bin "$mount_dir/1" &>/dev/null & cp -r /usr/bin "$mount_dir/2" &>/dev/null & sync & aux wait_for_sync $vg $lv1 aux delay_dev "$dev2" 0 100 # Reshape it to 15 data stripes lvconvert --yes --stripes 15 $vg/$lv1 aux disable_dev $dev1 aux delay_dev "$dev2" 0 50 check lv_first_seg_field $vg/$lv1 segtype "raid5_ls" check lv_first_seg_field $vg/$lv1 stripesize "64.00k" check lv_first_seg_field $vg/$lv1 data_stripes 15 check lv_first_seg_field $vg/$lv1 stripes 16 kill -9 %% wait rm -fr "$mount_dir/[12]" sync umount "$mount_dir" fsck -fn "$DM_DEV_DIR/$vg/$lv1" vgremove -ff $vg Thanks, Yang Xiuwei ^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview (8 messages):
2026-04-13 22:45 [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE  Benjamin Marzinski
2026-04-14  1:25 ` Li Nan
2026-04-14  6:20 ` Yu Kuai
2026-04-14 18:19 ` Benjamin Marzinski
2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending  Benjamin Marzinski
2026-04-22  9:58 ` Xiao Ni
2026-04-28  8:35 ` Yu Kuai
2026-04-15  1:28 ` [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE  Yang Xiuwei