* [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
@ 2026-04-08 4:35 Benjamin Marzinski
2026-04-08 11:22 ` Xiao Ni
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Benjamin Marzinski @ 2026-04-08 4:35 UTC (permalink / raw)
To: Yu Kuai, Song Liu, Li Nan, Xiao Ni; +Cc: linux-raid, dm-devel, Nigel Croxon
If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
raid5_make_request() will free the cloned bio. But raid5_make_request()
can call make_stripe_request() multiple times, writing to the various
stripes. If that bio got added to the toread or towrite lists of a
stripe disk in an earlier call to make_stripe_request(), then it's not
safe to just free the bio if a later part of it is found to cross the
reshape position. Doing so can lead to a UAF error, when bio_endio()
is called on the bio for the earlier stripes.

Instead, raid5_make_request() needs to wait until all parts of the bio
have called bio_endio(). To do this, bios that cross the reshape
position while the reshape can't make progress are flagged as needing to
wait for all parts to complete. When raid5_make_request() has a bio that
failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
bi->bi_private to a completion struct and waits for completion after
ending the bio. When the bio_endio() is called for the last time on a
clone bio with bi->bi_private set, it wakes up the waiter. This
guarantees that raid5_make_request() doesn't return until the cloned bio
needing a retry for io across the reshape boundary is safely cleaned up.

There is a simple reproducer available at [1]. Compile the kernel with
KASAN for more useful reporting when the error is triggered (this is not
necessary to see the bug).

[1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
Changes from v1:
- Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
md_io_clone->must_retry. Instead, use a completion struct
pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
drivers/md/md.c | 31 ++++++++-----------------------
drivers/md/md.h | 1 -
drivers/md/raid5.c | 7 ++++++-
3 files changed, 14 insertions(+), 25 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3ce6f9e9d38e..4318d875a5f6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
static void md_end_clone_io(struct bio *bio)
{
- struct md_io_clone *md_io_clone = bio->bi_private;
+ struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
+ bio_clone);
struct bio *orig_bio = md_io_clone->orig_bio;
struct mddev *mddev = md_io_clone->mddev;
+ struct completion *reshape_completion = bio->bi_private;
if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
md_bitmap_end(mddev, md_io_clone);
@@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
bio_end_io_acct(orig_bio, md_io_clone->start_time);
bio_put(bio);
- bio_endio(orig_bio);
+ if (unlikely(reshape_completion))
+ complete(reshape_completion);
+ else
+ bio_endio(orig_bio);
percpu_ref_put(&mddev->active_io);
}
@@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
}
clone->bi_end_io = md_end_clone_io;
- clone->bi_private = md_io_clone;
+ clone->bi_private = NULL;
*bio = clone;
}
@@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
}
EXPORT_SYMBOL_GPL(md_account_bio);
-void md_free_cloned_bio(struct bio *bio)
-{
- struct md_io_clone *md_io_clone = bio->bi_private;
- struct bio *orig_bio = md_io_clone->orig_bio;
- struct mddev *mddev = md_io_clone->mddev;
-
- if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
- md_bitmap_end(mddev, md_io_clone);
-
- if (bio->bi_status && !orig_bio->bi_status)
- orig_bio->bi_status = bio->bi_status;
-
- if (md_io_clone->start_time)
- bio_end_io_acct(orig_bio, md_io_clone->start_time);
-
- bio_put(bio);
- percpu_ref_put(&mddev->active_io);
-}
-EXPORT_SYMBOL_GPL(md_free_cloned_bio);
-
/* md_allow_write(mddev)
* Calling this ensures that the array is marked 'active' so that writes
* may proceed without blocking. It is important to call this before
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..5d57fee22901 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
struct bio *bio, sector_t start, sector_t size);
void md_account_bio(struct mddev *mddev, struct bio **bio);
-void md_free_cloned_bio(struct bio *bio);
extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a8e8d431071b..dc0c680ca199 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
mempool_free(ctx, conf->ctx_pool);
if (res == STRIPE_WAIT_RESHAPE) {
- md_free_cloned_bio(bi);
+ DECLARE_COMPLETION_ONSTACK(done);
+ WRITE_ONCE(bi->bi_private, &done);
+
+ bio_endio(bi);
+
+ wait_for_completion(&done);
return false;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
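For readers who want to see the handshake from the commit message in isolation, here is a rough user-space sketch. It is not the kernel code: struct model_bio, struct model_completion and every model_* function are hypothetical stand-ins built on pthreads and C11 atomics, and the "bio" lives on the submitter's stack instead of being freed by bio_put(). It only shows the shape of the idea: the submitter stores a completion in bi_private, ends its own part, and does not return until the last part has run the end_io handler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical stand-in for the cloned bio. */
struct model_bio {
	atomic_int remaining;	/* parts that still have to end the bio */
	void *bi_private;	/* NULL, or the submitter's completion */
	bool orig_ended;	/* true if the original bio was ended */
};

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Runs once, for the last part; modeled on the patched md_end_clone_io(). */
static void model_end_clone_io(struct model_bio *bio)
{
	struct model_completion *c = bio->bi_private;

	if (c)
		model_complete(c);	/* wake the submitter, leave the original alone */
	else
		bio->orig_ended = true;	/* normal path: end the original bio */
}

static void model_bio_endio(struct model_bio *bio)
{
	/* Only the caller that drops the count to zero runs the handler. */
	if (atomic_fetch_sub(&bio->remaining, 1) == 1)
		model_end_clone_io(bio);
}

/* An earlier stripe finishing its part of the bio on another thread. */
static void *earlier_stripe(void *arg)
{
	model_bio_endio(arg);
	return NULL;
}

/* Models the STRIPE_WAIT_RESHAPE branch; true if the bio was fully ended
 * before returning and the original bio was left untouched for a retry. */
static bool model_wait_reshape(void)
{
	struct model_completion done;
	struct model_bio bio;
	pthread_t stripe;

	pthread_mutex_init(&done.lock, NULL);
	pthread_cond_init(&done.cond, NULL);
	done.done = false;
	atomic_init(&bio.remaining, 2);	/* one queued stripe + the submitter */
	bio.bi_private = NULL;
	bio.orig_ended = false;
	if (pthread_create(&stripe, NULL, earlier_stripe, &bio))
		return false;

	bio.bi_private = &done;		/* set before ending our own part */
	model_bio_endio(&bio);
	model_wait_for_completion(&done);	/* bio stays valid until here */
	pthread_join(stripe, NULL);
	return !bio.orig_ended && atomic_load(&bio.remaining) == 0;
}
```

Calling model_wait_reshape() should return true whichever thread ends up dropping the last reference. Returning from it without the wait would let the stripe thread touch a dead stack frame, which is the user-space analogue of the UAF described in the patch.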
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-08 4:35 [PATCH v2] md/raid5: Fix UAF on IO across the reshape position Benjamin Marzinski
@ 2026-04-08 11:22 ` Xiao Ni
2026-04-08 19:57 ` Benjamin Marzinski
2026-04-13 2:07 ` Xiao Ni
2026-04-19 3:51 ` Yu Kuai
2 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2026-04-08 11:22 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Yu Kuai, Song Liu, Li Nan, linux-raid, dm-devel, Nigel Croxon
On Wed, Apr 8, 2026 at 12:35 PM Benjamin Marzinski <bmarzins@redhat.com> wrote:
>
> If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> raid5_make_request() will free the cloned bio. But raid5_make_request()
> can call make_stripe_request() multiple times, writing to the various
> stripes. If that bio got added to the toread or towrite lists of a
> stripe disk in an earlier call to make_stripe_request(), then it's not
> safe to just free the bio if a later part of it is found to cross the
> reshape position. Doing so can lead to a UAF error, when bio_endio()
> is called on the bio for the earlier stripes.
>
> Instead, raid5_make_request() needs to wait until all parts of the bio
> have called bio_endio(). To do this, bios that cross the reshape
> position while the reshape can't make progress are flagged as needing to
> wait for all parts to complete. When raid5_make_request() has a bio that
> failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> bi->bi_private to a completion struct and waits for completion after
> ending the bio. When the bio_endio() is called for the last time on a
> clone bio with bi->bi_private set, it wakes up the waiter. This
> guarantees that raid5_make_request() doesn't return until the cloned bio
> needing a retry for io across the reshape boundary is safely cleaned up.
>
> There is a simple reproducer available at [1]. Compile the kernel with
> KASAN for more useful reporting when the error is triggered (this is not
> necessary to see the bug).
>
> [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>
> Changes from v1:
> - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> md_io_clone->must_retry. Instead, use a completion struct
> pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
>
> drivers/md/md.c | 31 ++++++++-----------------------
> drivers/md/md.h | 1 -
> drivers/md/raid5.c | 7 ++++++-
> 3 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3ce6f9e9d38e..4318d875a5f6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
>
> static void md_end_clone_io(struct bio *bio)
> {
> - struct md_io_clone *md_io_clone = bio->bi_private;
> + struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
> + bio_clone);
> struct bio *orig_bio = md_io_clone->orig_bio;
> struct mddev *mddev = md_io_clone->mddev;
> + struct completion *reshape_completion = bio->bi_private;
>
> if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> md_bitmap_end(mddev, md_io_clone);
> @@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
> bio_end_io_acct(orig_bio, md_io_clone->start_time);
>
> bio_put(bio);
> - bio_endio(orig_bio);
> + if (unlikely(reshape_completion))
> + complete(reshape_completion);
> + else
> + bio_endio(orig_bio);
> percpu_ref_put(&mddev->active_io);
> }
>
> @@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
> }
>
> clone->bi_end_io = md_end_clone_io;
> - clone->bi_private = md_io_clone;
> + clone->bi_private = NULL;
> *bio = clone;
> }
>
> @@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> }
> EXPORT_SYMBOL_GPL(md_account_bio);
>
> -void md_free_cloned_bio(struct bio *bio)
> -{
> - struct md_io_clone *md_io_clone = bio->bi_private;
> - struct bio *orig_bio = md_io_clone->orig_bio;
> - struct mddev *mddev = md_io_clone->mddev;
> -
> - if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> - md_bitmap_end(mddev, md_io_clone);
> -
> - if (bio->bi_status && !orig_bio->bi_status)
> - orig_bio->bi_status = bio->bi_status;
> -
> - if (md_io_clone->start_time)
> - bio_end_io_acct(orig_bio, md_io_clone->start_time);
> -
> - bio_put(bio);
> - percpu_ref_put(&mddev->active_io);
> -}
> -EXPORT_SYMBOL_GPL(md_free_cloned_bio);
> -
> /* md_allow_write(mddev)
> * Calling this ensures that the array is marked 'active' so that writes
> * may proceed without blocking. It is important to call this before
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ac84289664cd..5d57fee22901 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
> void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
> struct bio *bio, sector_t start, sector_t size);
> void md_account_bio(struct mddev *mddev, struct bio **bio);
> -void md_free_cloned_bio(struct bio *bio);
>
> extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
> void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index a8e8d431071b..dc0c680ca199 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>
> mempool_free(ctx, conf->ctx_pool);
> if (res == STRIPE_WAIT_RESHAPE) {
> - md_free_cloned_bio(bi);
> + DECLARE_COMPLETION_ONSTACK(done);
> + WRITE_ONCE(bi->bi_private, &done);
> +
> + bio_endio(bi);
Hi Ben
You gave an explanation why it doesn't need WRITE_ONCE. As you said,
bio_endio uses atomic_dec_and_test, so it guarantees a full memory
barrier. Why do you use WRITE_ONCE here?
Regards
Xiao
> +
> + wait_for_completion(&done);
> return false;
> }
>
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-08 11:22 ` Xiao Ni
@ 2026-04-08 19:57 ` Benjamin Marzinski
2026-04-09 2:31 ` Xiao Ni
0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Marzinski @ 2026-04-08 19:57 UTC (permalink / raw)
To: Xiao Ni; +Cc: Yu Kuai, Song Liu, Li Nan, linux-raid, dm-devel, Nigel Croxon
On Wed, Apr 08, 2026 at 07:22:38PM +0800, Xiao Ni wrote:
> On Wed, Apr 8, 2026 at 12:35 PM Benjamin Marzinski <bmarzins@redhat.com> wrote:
> >
> > If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> > raid5_make_request() will free the cloned bio. But raid5_make_request()
> > can call make_stripe_request() multiple times, writing to the various
> > stripes. If that bio got added to the toread or towrite lists of a
> > stripe disk in an earlier call to make_stripe_request(), then it's not
> > safe to just free the bio if a later part of it is found to cross the
> > reshape position. Doing so can lead to a UAF error, when bio_endio()
> > is called on the bio for the earlier stripes.
> >
> > Instead, raid5_make_request() needs to wait until all parts of the bio
> > have called bio_endio(). To do this, bios that cross the reshape
> > position while the reshape can't make progress are flagged as needing to
> > wait for all parts to complete. When raid5_make_request() has a bio that
> > failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> > bi->bi_private to a completion struct and waits for completion after
> > ending the bio. When the bio_endio() is called for the last time on a
> > clone bio with bi->bi_private set, it wakes up the waiter. This
> > guarantees that raid5_make_request() doesn't return until the cloned bio
> > needing a retry for io across the reshape boundary is safely cleaned up.
> >
> > There is a simple reproducer available at [1]. Compile the kernel with
> > KASAN for more useful reporting when the error is triggered (this is not
> > necessary to see the bug).
> >
> > [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >
> > Changes from v1:
> > - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> > md_io_clone->must_retry. Instead, use a completion struct
> > pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
> >
> > drivers/md/md.c | 31 ++++++++-----------------------
> > drivers/md/md.h | 1 -
> > drivers/md/raid5.c | 7 ++++++-
> > 3 files changed, 14 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 3ce6f9e9d38e..4318d875a5f6 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
> >
> > static void md_end_clone_io(struct bio *bio)
> > {
> > - struct md_io_clone *md_io_clone = bio->bi_private;
> > + struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
> > + bio_clone);
> > struct bio *orig_bio = md_io_clone->orig_bio;
> > struct mddev *mddev = md_io_clone->mddev;
> > + struct completion *reshape_completion = bio->bi_private;
> >
> > if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > md_bitmap_end(mddev, md_io_clone);
> > @@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
> > bio_end_io_acct(orig_bio, md_io_clone->start_time);
> >
> > bio_put(bio);
> > - bio_endio(orig_bio);
> > + if (unlikely(reshape_completion))
> > + complete(reshape_completion);
> > + else
> > + bio_endio(orig_bio);
> > percpu_ref_put(&mddev->active_io);
> > }
> >
> > @@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
> > }
> >
> > clone->bi_end_io = md_end_clone_io;
> > - clone->bi_private = md_io_clone;
> > + clone->bi_private = NULL;
> > *bio = clone;
> > }
> >
> > @@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> > }
> > EXPORT_SYMBOL_GPL(md_account_bio);
> >
> > -void md_free_cloned_bio(struct bio *bio)
> > -{
> > - struct md_io_clone *md_io_clone = bio->bi_private;
> > - struct bio *orig_bio = md_io_clone->orig_bio;
> > - struct mddev *mddev = md_io_clone->mddev;
> > -
> > - if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > - md_bitmap_end(mddev, md_io_clone);
> > -
> > - if (bio->bi_status && !orig_bio->bi_status)
> > - orig_bio->bi_status = bio->bi_status;
> > -
> > - if (md_io_clone->start_time)
> > - bio_end_io_acct(orig_bio, md_io_clone->start_time);
> > -
> > - bio_put(bio);
> > - percpu_ref_put(&mddev->active_io);
> > -}
> > -EXPORT_SYMBOL_GPL(md_free_cloned_bio);
> > -
> > /* md_allow_write(mddev)
> > * Calling this ensures that the array is marked 'active' so that writes
> > * may proceed without blocking. It is important to call this before
> > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > index ac84289664cd..5d57fee22901 100644
> > --- a/drivers/md/md.h
> > +++ b/drivers/md/md.h
> > @@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
> > void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
> > struct bio *bio, sector_t start, sector_t size);
> > void md_account_bio(struct mddev *mddev, struct bio **bio);
> > -void md_free_cloned_bio(struct bio *bio);
> >
> > extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
> > void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index a8e8d431071b..dc0c680ca199 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> >
> > mempool_free(ctx, conf->ctx_pool);
> > if (res == STRIPE_WAIT_RESHAPE) {
> > - md_free_cloned_bio(bi);
> > + DECLARE_COMPLETION_ONSTACK(done);
> > + WRITE_ONCE(bi->bi_private, &done);
> > +
> > + bio_endio(bi);
>
> Hi Ben
>
> You gave an explanation why it doesn't need WRITE_ONCE. As you said,
> bio_endio uses atomic_dec_and_test, so it guarantees a full memory
> barrier. Why do you use WRITE_ONCE here?
You're correct. I don't believe it's necessary. The compiler has to
update bi->bi_private before calling bio_endio(bi), which can free bi,
and either the bio was never chained, and bi->bi_private will be read by
the same process that set it, or it was chained, and the
atomic_dec_and_test() in bio_remaining_done() will guarantee a memory
barrier.

I just patterned my updated fix off your idea, and left the WRITE_ONCE
there because it doesn't really hurt anything, since this is already the
slow (and unlikely) path. I can pull it out if you'd like.

I actually have another question about this code. My patch doesn't mess
with the code at the end of make_stripe_request() to return
STRIPE_WAIT_RESHAPE, but I'm not sure that it's right. That code
includes:

	bi->bi_status = BLK_STS_RESOURCE;

This will update the orig_bio's bi_status in md_end_clone_io():

	if (bio->bi_status && !orig_bio->bi_status)
		orig_bio->bi_status = bio->bi_status;

For dm-raid, that orig_bio is itself a clone, and will eventually
get ended with DM_MAPIO_REQUEUE, which will requeue the actual
original bio (assuming that this happens when the device is
in a noflush suspend).

But for md, md_handle_request() can just loop and retry it. If the
mddev->pers->make_request() call succeeds on retry, the orig_bio will
still have the BLK_STS_RESOURCE status that got set when the earlier
call to make_stripe_request() returned STRIPE_WAIT_RESHAPE.

Perhaps make_stripe_request() shouldn't set bi->bi_status if
it's going to return STRIPE_WAIT_RESHAPE. The only thing I can see that
it does is set orig_bio->bi_status, which I don't think we want it to
do. Am I missing something here?

-Ben
>
> Regards
> Xiao
>
>
> > +
> > + wait_for_completion(&done);
> > return false;
> > }
> >
> > --
> > 2.53.0
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
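The bi_status question raised in the message above can be illustrated with a small single-threaded model. This is only a sketch: struct status_bio, MODEL_STS_OK, MODEL_STS_RESOURCE (with an arbitrary value) and both functions are made-up names, and the only thing taken from the thread is the propagation rule quoted from md_end_clone_io(). Whether the real md retry path behaves exactly this way depends on code not shown in this thread.

```c
#include <assert.h>

/* Hypothetical status codes; the value of MODEL_STS_RESOURCE is arbitrary. */
typedef unsigned char model_status_t;
#define MODEL_STS_OK		0
#define MODEL_STS_RESOURCE	9

/* Hypothetical stand-in for a bio, reduced to its status field. */
struct status_bio {
	model_status_t bi_status;
};

/* Same rule as the check quoted from md_end_clone_io(): the first
 * non-zero clone status sticks to the original bio. */
static void propagate_status(const struct status_bio *clone,
			     struct status_bio *orig)
{
	if (clone->bi_status && !orig->bi_status)
		orig->bi_status = clone->bi_status;
}

/* First attempt waits for the reshape, the retry succeeds.
 * Returns the status left on the original bio afterwards. */
static model_status_t retry_after_reshape_wait(int set_status_on_wait)
{
	struct status_bio orig = { MODEL_STS_OK };
	struct status_bio first = {
		set_status_on_wait ? MODEL_STS_RESOURCE : MODEL_STS_OK
	};
	struct status_bio retry = { MODEL_STS_OK };

	propagate_status(&first, &orig);	/* clone ended on the wait path */
	propagate_status(&retry, &orig);	/* clone ended after a clean retry */
	return orig.bi_status;
}
```

In this model, retry_after_reshape_wait(1) leaves MODEL_STS_RESOURCE on the original even though the retry was clean, while retry_after_reshape_wait(0) leaves it at MODEL_STS_OK. That is the asymmetry behind the suggestion to stop setting bi_status when returning STRIPE_WAIT_RESHAPE.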
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-08 19:57 ` Benjamin Marzinski
@ 2026-04-09 2:31 ` Xiao Ni
2026-04-13 2:08 ` Xiao Ni
0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2026-04-09 2:31 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Yu Kuai, Song Liu, Li Nan, linux-raid, dm-devel, Nigel Croxon
On Thu, Apr 9, 2026 at 3:58 AM Benjamin Marzinski <bmarzins@redhat.com> wrote:
>
> On Wed, Apr 08, 2026 at 07:22:38PM +0800, Xiao Ni wrote:
> > On Wed, Apr 8, 2026 at 12:35 PM Benjamin Marzinski <bmarzins@redhat.com> wrote:
> > >
> > > If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> > > raid5_make_request() will free the cloned bio. But raid5_make_request()
> > > can call make_stripe_request() multiple times, writing to the various
> > > stripes. If that bio got added to the toread or towrite lists of a
> > > stripe disk in an earlier call to make_stripe_request(), then it's not
> > > safe to just free the bio if a later part of it is found to cross the
> > > reshape position. Doing so can lead to a UAF error, when bio_endio()
> > > is called on the bio for the earlier stripes.
> > >
> > > Instead, raid5_make_request() needs to wait until all parts of the bio
> > > have called bio_endio(). To do this, bios that cross the reshape
> > > position while the reshape can't make progress are flagged as needing to
> > > wait for all parts to complete. When raid5_make_request() has a bio that
> > > failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> > > bi->bi_private to a completion struct and waits for completion after
> > > ending the bio. When the bio_endio() is called for the last time on a
> > > clone bio with bi->bi_private set, it wakes up the waiter. This
> > > guarantees that raid5_make_request() doesn't return until the cloned bio
> > > needing a retry for io across the reshape boundary is safely cleaned up.
> > >
> > > There is a simple reproducer available at [1]. Compile the kernel with
> > > KASAN for more useful reporting when the error is triggered (this is not
> > > necessary to see the bug).
> > >
> > > [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >
> > > Changes from v1:
> > > - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> > > md_io_clone->must_retry. Instead, use a completion struct
> > > pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
> > >
> > > drivers/md/md.c | 31 ++++++++-----------------------
> > > drivers/md/md.h | 1 -
> > > drivers/md/raid5.c | 7 ++++++-
> > > 3 files changed, 14 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 3ce6f9e9d38e..4318d875a5f6 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
> > >
> > > static void md_end_clone_io(struct bio *bio)
> > > {
> > > - struct md_io_clone *md_io_clone = bio->bi_private;
> > > + struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
> > > + bio_clone);
> > > struct bio *orig_bio = md_io_clone->orig_bio;
> > > struct mddev *mddev = md_io_clone->mddev;
> > > + struct completion *reshape_completion = bio->bi_private;
> > >
> > > if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > > md_bitmap_end(mddev, md_io_clone);
> > > @@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
> > > bio_end_io_acct(orig_bio, md_io_clone->start_time);
> > >
> > > bio_put(bio);
> > > - bio_endio(orig_bio);
> > > + if (unlikely(reshape_completion))
> > > + complete(reshape_completion);
> > > + else
> > > + bio_endio(orig_bio);
> > > percpu_ref_put(&mddev->active_io);
> > > }
> > >
> > > @@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
> > > }
> > >
> > > clone->bi_end_io = md_end_clone_io;
> > > - clone->bi_private = md_io_clone;
> > > + clone->bi_private = NULL;
> > > *bio = clone;
> > > }
> > >
> > > @@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> > > }
> > > EXPORT_SYMBOL_GPL(md_account_bio);
> > >
> > > -void md_free_cloned_bio(struct bio *bio)
> > > -{
> > > - struct md_io_clone *md_io_clone = bio->bi_private;
> > > - struct bio *orig_bio = md_io_clone->orig_bio;
> > > - struct mddev *mddev = md_io_clone->mddev;
> > > -
> > > - if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > > - md_bitmap_end(mddev, md_io_clone);
> > > -
> > > - if (bio->bi_status && !orig_bio->bi_status)
> > > - orig_bio->bi_status = bio->bi_status;
> > > -
> > > - if (md_io_clone->start_time)
> > > - bio_end_io_acct(orig_bio, md_io_clone->start_time);
> > > -
> > > - bio_put(bio);
> > > - percpu_ref_put(&mddev->active_io);
> > > -}
> > > -EXPORT_SYMBOL_GPL(md_free_cloned_bio);
> > > -
> > > /* md_allow_write(mddev)
> > > * Calling this ensures that the array is marked 'active' so that writes
> > > * may proceed without blocking. It is important to call this before
> > > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > > index ac84289664cd..5d57fee22901 100644
> > > --- a/drivers/md/md.h
> > > +++ b/drivers/md/md.h
> > > @@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
> > > void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
> > > struct bio *bio, sector_t start, sector_t size);
> > > void md_account_bio(struct mddev *mddev, struct bio **bio);
> > > -void md_free_cloned_bio(struct bio *bio);
> > >
> > > extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
> > > void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index a8e8d431071b..dc0c680ca199 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> > >
> > > mempool_free(ctx, conf->ctx_pool);
> > > if (res == STRIPE_WAIT_RESHAPE) {
> > > - md_free_cloned_bio(bi);
> > > + DECLARE_COMPLETION_ONSTACK(done);
> > > + WRITE_ONCE(bi->bi_private, &done);
> > > +
> > > + bio_endio(bi);
> >
> > Hi Ben
> >
> > You gave an explanation why it doesn't need WRITE_ONCE. As you said,
> > bio_endio uses atomic_dec_and_test, so it guarantees a full memory
> > barrier. Why do you use WRITE_ONCE here?
>
> You're correct. I don't believe it's necessary. The compiler has to
> update bi->bi_private before calling bio_endio(bi), which can free bi,
> and either the bio was never chained, and bi->bi_private will be read by
> the same process that set it, or it was chained, and the
> atomic_dec_and_test() in bio_remaining_done() will guarantee a memory
> barrier.
>
> I just patterned my updated fix off your idea, and left the WRITE_ONCE
> there because it doesn't really hurt anything, since this is already the
> slow (and unlikely) path. I can pull it out if you'd like.
Thanks for the explanation. I'm fine with keeping it.
>
> I actually have another question about this code. My patch doesn't mess
> with the code at the end of make_stripe_request() to return
> STRIPE_WAIT_RESHAPE, but I'm not sure that it's right. That code
> includes:
>
> bi->bi_status = BLK_STS_RESOURCE;
>
> This will update the orig_bio's bi_status in md_end_clone_io():
>
> if (bio->bi_status && !orig_bio->bi_status)
> orig_bio->bi_status = bio->bi_status;
>
> For dm-raid, that orig_bio is itself a clone, and will eventually
> get ended with DM_MAPIO_REQUEUE, which will requeue the actual
> original bio (assuming that this happens when the device is
> in a noflush suspend).
>
> But for md, md_handle_request() can just loop and retry it. If the
> mddev->pers->make_request() call succeeds on retry, the orig_bio will
> still have the BLK_STS_RESOURCE status that got set when the earlier
> call to make_stripe_request() returned STRIPE_WAIT_RESHAPE.
>
> Perhaps make_stripe_request() shouldn't set bi->bi_status if
> it's going to return STRIPE_WAIT_RESHAPE. The only thing I can see that
> it does is set orig_bio->bi_status, which I don't think we want it to
> do. Am I missing something here?
I'm fine with removing the line that sets BLK_STS_RESOURCE.
Regards
Xiao
>
> -Ben
>
> >
> > Regards
> > Xiao
> >
> >
> > > +
> > > + wait_for_completion(&done);
> > > return false;
> > > }
> > >
> > > --
> > > 2.53.0
> > >
>
^ permalink raw reply [flat|nested] 7+ messages in thread
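The ordering argument in this subthread (a plain store to bi_private followed by a fully ordered decrement is enough for whoever drops the count to zero) can be approximated in user space with C11 atomics. This is an analogy under stated assumptions, not kernel code: struct ord_bio and the ord_* functions are hypothetical, and a sequentially consistent atomic_fetch_sub() stands in for the kernel's atomic_dec_and_test().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio with two outstanding parts. */
struct ord_bio {
	atomic_int remaining;
	void *bi_private;	/* plain field, stored without any annotation */
	void *seen;		/* what the last finisher read from bi_private */
};

/* Stand-in for atomic_dec_and_test(): true for the caller that hits zero. */
static int ord_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

static void ord_bio_endio(struct ord_bio *bio)
{
	if (ord_dec_and_test(&bio->remaining))
		bio->seen = bio->bi_private;	/* only the last finisher reads it */
}

static void *ord_other_part(void *arg)
{
	ord_bio_endio(arg);
	return NULL;
}

/* Returns 1 if, in every round, the last finisher saw the stored pointer. */
static int ord_last_finisher_saw_token(int rounds)
{
	static int token;
	int i;

	for (i = 0; i < rounds; i++) {
		struct ord_bio bio;
		pthread_t t;

		atomic_init(&bio.remaining, 2);
		bio.bi_private = NULL;
		bio.seen = NULL;
		if (pthread_create(&t, NULL, ord_other_part, &bio))
			return 0;

		bio.bi_private = &token;	/* plain store ... */
		ord_bio_endio(&bio);		/* ... ordered before the decrement */
		pthread_join(t, NULL);
		if (bio.seen != &token)
			return 0;
	}
	return 1;
}
```

If the submitting thread is the last finisher, it reads its own store. If the other thread is last, its decrement observed the submitter's decrement, so the earlier store is visible to it. Either way the read of bi_private never races with the write, which matches the reasoning given for why WRITE_ONCE is harmless but not required.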
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-08 4:35 [PATCH v2] md/raid5: Fix UAF on IO across the reshape position Benjamin Marzinski
2026-04-08 11:22 ` Xiao Ni
@ 2026-04-13 2:07 ` Xiao Ni
2026-04-19 3:51 ` Yu Kuai
2 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-04-13 2:07 UTC (permalink / raw)
To: Benjamin Marzinski, Yu Kuai, Song Liu, Li Nan
Cc: linux-raid, dm-devel, Nigel Croxon
On 2026/4/8 12:35, Benjamin Marzinski wrote:
> If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> raid5_make_request() will free the cloned bio. But raid5_make_request()
> can call make_stripe_request() multiple times, writing to the various
> stripes. If that bio got added to the toread or towrite lists of a
> stripe disk in an earlier call to make_stripe_request(), then it's not
> safe to just free the bio if a later part of it is found to cross the
> reshape position. Doing so can lead to a UAF error, when bio_endio()
> is called on the bio for the earlier stripes.
>
> Instead, raid5_make_request() needs to wait until all parts of the bio
> have called bio_endio(). To do this, bios that cross the reshape
> position while the reshape can't make progress are flagged as needing to
> wait for all parts to complete. When raid5_make_request() has a bio that
> failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> bi->bi_private to a completion struct and waits for completion after
> ending the bio. When the bio_endio() is called for the last time on a
> clone bio with bi->bi_private set, it wakes up the waiter. This
> guarantees that raid5_make_request() doesn't return until the cloned bio
> needing a retry for io across the reshape boundary is safely cleaned up.
>
> There is a simple reproducer available at [1]. Compile the kernel with
> KASAN for more useful reporting when the error is triggered (this is not
> necessary to see the bug).
>
> [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>
> Changes from v1:
> - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> md_io_clone->must_retry. Instead, use a completion struct
> pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
>
> drivers/md/md.c | 31 ++++++++-----------------------
> drivers/md/md.h | 1 -
> drivers/md/raid5.c | 7 ++++++-
> 3 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3ce6f9e9d38e..4318d875a5f6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
>
> static void md_end_clone_io(struct bio *bio)
> {
> - struct md_io_clone *md_io_clone = bio->bi_private;
> + struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
> + bio_clone);
> struct bio *orig_bio = md_io_clone->orig_bio;
> struct mddev *mddev = md_io_clone->mddev;
> + struct completion *reshape_completion = bio->bi_private;
>
> if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> md_bitmap_end(mddev, md_io_clone);
> @@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
> bio_end_io_acct(orig_bio, md_io_clone->start_time);
>
> bio_put(bio);
> - bio_endio(orig_bio);
> + if (unlikely(reshape_completion))
> + complete(reshape_completion);
> + else
> + bio_endio(orig_bio);
> percpu_ref_put(&mddev->active_io);
> }
>
> @@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
> }
>
> clone->bi_end_io = md_end_clone_io;
> - clone->bi_private = md_io_clone;
> + clone->bi_private = NULL;
> *bio = clone;
> }
>
> @@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> }
> EXPORT_SYMBOL_GPL(md_account_bio);
>
> -void md_free_cloned_bio(struct bio *bio)
> -{
> - struct md_io_clone *md_io_clone = bio->bi_private;
> - struct bio *orig_bio = md_io_clone->orig_bio;
> - struct mddev *mddev = md_io_clone->mddev;
> -
> - if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> - md_bitmap_end(mddev, md_io_clone);
> -
> - if (bio->bi_status && !orig_bio->bi_status)
> - orig_bio->bi_status = bio->bi_status;
> -
> - if (md_io_clone->start_time)
> - bio_end_io_acct(orig_bio, md_io_clone->start_time);
> -
> - bio_put(bio);
> - percpu_ref_put(&mddev->active_io);
> -}
> -EXPORT_SYMBOL_GPL(md_free_cloned_bio);
> -
> /* md_allow_write(mddev)
> * Calling this ensures that the array is marked 'active' so that writes
> * may proceed without blocking. It is important to call this before
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ac84289664cd..5d57fee22901 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
> void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
> struct bio *bio, sector_t start, sector_t size);
> void md_account_bio(struct mddev *mddev, struct bio **bio);
> -void md_free_cloned_bio(struct bio *bio);
>
> extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
> void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index a8e8d431071b..dc0c680ca199 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>
> mempool_free(ctx, conf->ctx_pool);
> if (res == STRIPE_WAIT_RESHAPE) {
> - md_free_cloned_bio(bi);
> + DECLARE_COMPLETION_ONSTACK(done);
> + WRITE_ONCE(bi->bi_private, &done);
> +
> + bio_endio(bi);
> +
> + wait_for_completion(&done);
> return false;
> }
>
The patch looks good to me.
Reviewed-by: Xiao Ni <xni@redhat.com>
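
For readers following along, here is a minimal userspace sketch of the handoff the patch description explains. It is not the kernel code. It assumes pthreads and C11 atomics as stand-ins, and struct orig_bio, struct io_clone, stripe_done() and make_request() are hypothetical names invented for the illustration. The private pointer is stored before the submitter drops its own reference, and whichever caller drops the last reference runs the end_io callback, which either wakes the waiter or ends the original request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Userspace stand-in for struct completion (mutex + condvar + flag). */
struct completion { pthread_mutex_t lock; pthread_cond_t cond; bool done; };

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct orig_bio { bool ended; };	/* hypothetical: records "bio_endio(orig_bio)" */

struct bio {
	atomic_int remaining;		/* outstanding parts of this bio */
	void *bi_private;		/* NULL, or the waiter's completion */
	void (*bi_end_io)(struct bio *bio);
};

struct io_clone { struct orig_bio *orig; struct bio bio_clone; };

/* Runs once, from whichever caller drops the last reference. */
static void end_clone_io(struct bio *bio)
{
	struct io_clone *clone = container_of(bio, struct io_clone, bio_clone);
	struct orig_bio *orig = clone->orig;
	struct completion *reshape_completion = bio->bi_private;

	assert(atomic_load(&bio->remaining) == 0);
	free(clone);			/* the clone is gone from here on */
	if (reshape_completion)
		complete(reshape_completion);	/* retry path: wake the submitter */
	else
		orig->ended = true;		/* normal path: end the original */
}

static void bio_endio(struct bio *bio)
{
	/* Sequentially consistent RMW: orders the earlier bi_private store. */
	if (atomic_fetch_sub(&bio->remaining, 1) == 1)
		bio->bi_end_io(bio);
}

/* Simulates an earlier stripe finishing its part of the bio. */
static void *stripe_done(void *arg)
{
	bio_endio(arg);
	return NULL;
}

/* Returns false when the caller has to retry the original request. */
static bool make_request(struct orig_bio *orig, bool hit_reshape)
{
	struct io_clone *clone = calloc(1, sizeof(*clone));
	struct completion done;
	pthread_t stripe;

	if (!clone)
		abort();
	clone->orig = orig;
	clone->bio_clone.bi_end_io = end_clone_io;
	atomic_init(&clone->bio_clone.remaining, 2);	/* submitter + one stripe */
	if (pthread_create(&stripe, NULL, stripe_done, &clone->bio_clone) != 0)
		abort();

	if (hit_reshape) {
		init_completion(&done);
		clone->bio_clone.bi_private = &done;	/* set before dropping our ref */
		bio_endio(&clone->bio_clone);
		wait_for_completion(&done);	/* clone is freed once this returns */
	} else {
		bio_endio(&clone->bio_clone);
	}
	pthread_join(stripe, NULL);
	return !hit_reshape;
}
```

Calling make_request(&orig, true) returns false and leaves orig.ended unset, so a caller can resubmit the original. The sketch joins the worker thread before returning only so that the on-stack completion cannot go out of scope while complete() is still running. That is a property of this userspace stand-in, not a claim about the kernel implementation.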
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-09 2:31 ` Xiao Ni
@ 2026-04-13 2:08 ` Xiao Ni
0 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-04-13 2:08 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Yu Kuai, Song Liu, Li Nan, linux-raid, dm-devel, Nigel Croxon
On Thu, Apr 9, 2026 at 10:31 AM Xiao Ni <xni@redhat.com> wrote:
>
> On Thu, Apr 9, 2026 at 3:58 AM Benjamin Marzinski <bmarzins@redhat.com> wrote:
> >
> > On Wed, Apr 08, 2026 at 07:22:38PM +0800, Xiao Ni wrote:
> > > On Wed, Apr 8, 2026 at 12:35 PM Benjamin Marzinski <bmarzins@redhat.com> wrote:
> > > >
> > > > If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> > > > raid5_make_request() will free the cloned bio. But raid5_make_request()
> > > > can call make_stripe_request() multiple times, writing to the various
> > > > stripes. If that bio got added to the toread or towrite lists of a
> > > > stripe disk in an earlier call to make_stripe_request(), then it's not
> > > > safe to just free the bio if a later part of it is found to cross the
> > > > reshape position. Doing so can lead to a UAF error, when bio_endio()
> > > > is called on the bio for the earlier stripes.
> > > >
> > > > Instead, raid5_make_request() needs to wait until all parts of the bio
> > > > have called bio_endio(). To do this, bios that cross the reshape
> > > > position while the reshape can't make progress are flagged as needing to
> > > > wait for all parts to complete. When raid5_make_request() has a bio that
> > > > failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> > > > bi->bi_private to a completion struct and waits for completion after
> > > > ending the bio. When the bio_endio() is called for the last time on a
> > > > clone bio with bi->bi_private set, it wakes up the waiter. This
> > > > guarantees that raid5_make_request() doesn't return until the cloned bio
> > > > needing a retry for io across the reshape boundary is safely cleaned up.
> > > >
> > > > There is a simple reproducer available at [1]. Compile the kernel with
> > > > KASAN for more useful reporting when the error is triggered (this is not
> > > > necessary to see the bug).
> > > >
> > > > [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
> > > >
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > ---
> > > >
> > > > Changes from v1:
> > > > - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> > > > md_io_clone->must_retry. Instead, use a completion struct
> > > > pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
> > > >
> > > > drivers/md/md.c | 31 ++++++++-----------------------
> > > > drivers/md/md.h | 1 -
> > > > drivers/md/raid5.c | 7 ++++++-
> > > > 3 files changed, 14 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > > index 3ce6f9e9d38e..4318d875a5f6 100644
> > > > --- a/drivers/md/md.c
> > > > +++ b/drivers/md/md.c
> > > > @@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
> > > >
> > > > static void md_end_clone_io(struct bio *bio)
> > > > {
> > > > - struct md_io_clone *md_io_clone = bio->bi_private;
> > > > + struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
> > > > + bio_clone);
> > > > struct bio *orig_bio = md_io_clone->orig_bio;
> > > > struct mddev *mddev = md_io_clone->mddev;
> > > > + struct completion *reshape_completion = bio->bi_private;
> > > >
> > > > if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > > > md_bitmap_end(mddev, md_io_clone);
> > > > @@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
> > > > bio_end_io_acct(orig_bio, md_io_clone->start_time);
> > > >
> > > > bio_put(bio);
> > > > - bio_endio(orig_bio);
> > > > + if (unlikely(reshape_completion))
> > > > + complete(reshape_completion);
> > > > + else
> > > > + bio_endio(orig_bio);
> > > > percpu_ref_put(&mddev->active_io);
> > > > }
> > > >
> > > > @@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
> > > > }
> > > >
> > > > clone->bi_end_io = md_end_clone_io;
> > > > - clone->bi_private = md_io_clone;
> > > > + clone->bi_private = NULL;
> > > > *bio = clone;
> > > > }
> > > >
> > > > @@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> > > > }
> > > > EXPORT_SYMBOL_GPL(md_account_bio);
> > > >
> > > > -void md_free_cloned_bio(struct bio *bio)
> > > > -{
> > > > - struct md_io_clone *md_io_clone = bio->bi_private;
> > > > - struct bio *orig_bio = md_io_clone->orig_bio;
> > > > - struct mddev *mddev = md_io_clone->mddev;
> > > > -
> > > > - if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
> > > > - md_bitmap_end(mddev, md_io_clone);
> > > > -
> > > > - if (bio->bi_status && !orig_bio->bi_status)
> > > > - orig_bio->bi_status = bio->bi_status;
> > > > -
> > > > - if (md_io_clone->start_time)
> > > > - bio_end_io_acct(orig_bio, md_io_clone->start_time);
> > > > -
> > > > - bio_put(bio);
> > > > - percpu_ref_put(&mddev->active_io);
> > > > -}
> > > > -EXPORT_SYMBOL_GPL(md_free_cloned_bio);
> > > > -
> > > > /* md_allow_write(mddev)
> > > > * Calling this ensures that the array is marked 'active' so that writes
> > > > * may proceed without blocking. It is important to call this before
> > > > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > > > index ac84289664cd..5d57fee22901 100644
> > > > --- a/drivers/md/md.h
> > > > +++ b/drivers/md/md.h
> > > > @@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
> > > > void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
> > > > struct bio *bio, sector_t start, sector_t size);
> > > > void md_account_bio(struct mddev *mddev, struct bio **bio);
> > > > -void md_free_cloned_bio(struct bio *bio);
> > > >
> > > > extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
> > > > void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
> > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > index a8e8d431071b..dc0c680ca199 100644
> > > > --- a/drivers/md/raid5.c
> > > > +++ b/drivers/md/raid5.c
> > > > @@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> > > >
> > > > mempool_free(ctx, conf->ctx_pool);
> > > > if (res == STRIPE_WAIT_RESHAPE) {
> > > > - md_free_cloned_bio(bi);
> > > > + DECLARE_COMPLETION_ONSTACK(done);
> > > > + WRITE_ONCE(bi->bi_private, &done);
> > > > +
> > > > + bio_endio(bi);
> > >
> > > Hi Ben
> > >
> > > You gave an explanation why it doesn't need WRITE_ONCE. As you said,
> > > bio_endio uses atomic_dec_and_test, so it guarantees a full memory
> > > barrier. Why do you use WRITE_ONCE here?
> >
> > You're correct. I don't believe it's necessary. The compiler has to
> > update bi->bi_private before calling bio_endio(bi), which can free bi,
> > and either the bio was never chained, and bi->bi_private will be read by
> > the same process that set it, or it was chained, and the
> > atomic_dec_and_test() in bio_remaining_done() will guarantee a memory
> > barrier.
> >
> > I just patterned my updated fix off your idea, and left the WRITE_ONCE
> > there because it doesn't really hurt anything, since this is already the
> > slow (and unlikely) path. I can pull it out if you'd like.
>
> Thanks for the explanation. It's good to me to keep it.
>
> >
> > I actually have another question about this code. My patch doesn't mess
> > with the code at the end of make_stripe_request() to return
> > STRIPE_WAIT_RESHAPE, but I'm not sure that it's right. That code
> > includes:
> >
> > bi->bi_status = BLK_STS_RESOURCE;
> >
> > This will update the orig_bio's bi_status in md_end_clone_io():
> >
> > if (bio->bi_status && !orig_bio->bi_status)
> > orig_bio->bi_status = bio->bi_status;
> >
> > For dm-raid, that orig_bio is itself a clone, and will eventually
> > get ended with DM_MAPIO_REQUEUE, which will requeue the actual
> > original bio (assuming that this happens when the device is
> > in a noflush suspend).
> >
> > But for md, md_handle_request() can just loop and retry it. If the
> > mddev->pers->make_request() call succeeds on retry, the orig_bio will
> > still have the BLK_STS_RESOURCE status that got set when the earlier
> > call to make_stripe_request() returned STRIPE_WAIT_RESHAPE.
> >
> > Perhaps make_stripe_request() shouldn't set bi->bi_status if
> > it's going to return STRIPE_WAIT_RESHAPE. The only thing I can see that
> > it does is set orig_bio->bi_status, which I don't think we want it to
> > do. Am I missing something here?
>
> It's good to me to remove the line of setting BLK_STS_RESOURCE.
Hi Ben
Could you send this in a separate patch?
Best Regards
Xiao
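
A small userspace sketch of the bi_status concern raised in the quoted text above may help. It only models the merge rule quoted from md_end_clone_io(). STS_OK, STS_RESOURCE, attempt() and submit_with_retry() are hypothetical stand-ins, and the retry loop is a simplification of what Ben describes for md_handle_request(), not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BLK_STS_OK and BLK_STS_RESOURCE. */
enum { STS_OK = 0, STS_RESOURCE = 1 };

struct bio { int bi_status; };

/* Same merge rule as quoted from md_end_clone_io(): the first error sticks. */
static void end_clone(struct bio *clone, struct bio *orig)
{
	if (clone->bi_status && !orig->bi_status)
		orig->bi_status = clone->bi_status;
}

/*
 * One submission attempt. When wait_reshape is true the attempt fails and,
 * if set_status is true, the clone is marked STS_RESOURCE before it is
 * ended, mirroring the assignment under discussion.
 */
static bool attempt(struct bio *orig, bool wait_reshape, bool set_status)
{
	struct bio clone = { .bi_status = STS_OK };

	if (wait_reshape && set_status)
		clone.bi_status = STS_RESOURCE;
	end_clone(&clone, orig);
	return !wait_reshape;
}

/* The first attempt hits the reshape position, the retry succeeds. */
static int submit_with_retry(bool set_status)
{
	struct bio orig = { .bi_status = STS_OK };
	bool wait_reshape = true;

	while (!attempt(&orig, wait_reshape, set_status))
		wait_reshape = false;
	return orig.bi_status;
}
```

In this model, submit_with_retry(true) ends with the stale STS_RESOURCE status even though the retry succeeded, while submit_with_retry(false) ends with STS_OK. That difference is what the proposed separate patch would address.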
* Re: [PATCH v2] md/raid5: Fix UAF on IO across the reshape position
2026-04-08 4:35 [PATCH v2] md/raid5: Fix UAF on IO across the reshape position Benjamin Marzinski
2026-04-08 11:22 ` Xiao Ni
2026-04-13 2:07 ` Xiao Ni
@ 2026-04-19 3:51 ` Yu Kuai
2 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2026-04-19 3:51 UTC (permalink / raw)
To: Benjamin Marzinski, Song Liu, Li Nan, Xiao Ni, yukuai
Cc: linux-raid, dm-devel, Nigel Croxon
On 2026/4/8 12:35, Benjamin Marzinski wrote:
> If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
> raid5_make_request() will free the cloned bio. But raid5_make_request()
> can call make_stripe_request() multiple times, writing to the various
> stripes. If that bio got added to the toread or towrite lists of a
> stripe disk in an earlier call to make_stripe_request(), then it's not
> safe to just free the bio if a later part of it is found to cross the
> reshape position. Doing so can lead to a UAF error, when bio_endio()
> is called on the bio for the earlier stripes.
>
> Instead, raid5_make_request() needs to wait until all parts of the bio
> have called bio_endio(). To do this, bios that cross the reshape
> position while the reshape can't make progress are flagged as needing to
> wait for all parts to complete. When raid5_make_request() has a bio that
> failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
> bi->bi_private to a completion struct and waits for completion after
> ending the bio. When the bio_endio() is called for the last time on a
> clone bio with bi->bi_private set, it wakes up the waiter. This
> guarantees that raid5_make_request() doesn't return until the cloned bio
> needing a retry for io across the reshape boundary is safely cleaned up.
>
> There is a simple reproducer available at [1]. Compile the kernel with
> KASAN for more useful reporting when the error is triggered (this is not
> necessary to see the bug).
>
> [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>
> Changes from v1:
> - Removed mddev->pending_retry_bios, mddev->retry_bios_wait, and
> md_io_clone->must_retry. Instead, use a completion struct
> pointed to by bi->bi_private, as suggested by Xiao Ni and Yu Kuai.
>
> drivers/md/md.c | 31 ++++++++-----------------------
> drivers/md/md.h | 1 -
> drivers/md/raid5.c | 7 ++++++-
> 3 files changed, 14 insertions(+), 25 deletions(-)
Applied
--
Thanks,
Kuai