two small RAID1 cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* two small RAID1 cleanups
@ 2026-05-29  5:42 Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

Hi all,

this series has two little cleanups that helped me understand the
code when reading through it.

Diffstat:
 raid1.c |   44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] md/raid1: cleanup handle_read_error
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
@ 2026-05-29  5:42 ` Christoph Hellwig
  2026-06-01  5:51   ` Xiao Ni
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
  2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

Unwind the main conditional with duplicate conditions and initialize
variables at initialization time where possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 64d970e2ef50..8fad1692cf66 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
 
 static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 {
+	struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
+	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
 	struct mddev *mddev = conf->mddev;
-	struct bio *bio;
-	struct md_rdev *rdev;
 	sector_t sector;
 
 	clear_bit(R1BIO_ReadError, &r1_bio->state);
-	/* we got a read error. Maybe the drive is bad.  Maybe just
-	 * the block and we can fix it.
-	 * We freeze all other IO, and try reading the block from
-	 * other devices.  When we find one, we re-write
-	 * and check it that fixes the read error.
-	 * This is all done synchronously while the array is
-	 * frozen
-	 */
 
-	bio = r1_bio->bios[r1_bio->read_disk];
 	bio_put(bio);
 	r1_bio->bios[r1_bio->read_disk] = NULL;
 
-	rdev = conf->mirrors[r1_bio->read_disk].rdev;
-	if (mddev->ro == 0
-	    && !test_bit(FailFast, &rdev->flags)) {
+	/*
+	 * We got a read error. Maybe the drive is bad.  Maybe just the block
+	 * and we can fix it.
+	 *
+	 * If allowed, freeze all other IO, and try reading the block from other
+	 * devices.  If we find one, we re-write and check it that fixes the
+	 * read error.  This is all done synchronously while the array is
+	 * frozen.
+	 */
+	if (mddev->ro) {
+		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+	} else if (test_bit(FailFast, &rdev->flags)) {
+		md_error(mddev, rdev);
+	} else {
 		freeze_array(conf, 1);
 		fix_read_error(conf, r1_bio);
 		unfreeze_array(conf);
-	} else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
-		md_error(mddev, rdev);
-	} else {
-		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
 	}
 
 	rdev_dec_pending(rdev, conf->mddev);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
@ 2026-05-29  5:43 ` Christoph Hellwig
  2026-06-01  5:53   ` Xiao Ni
  2026-06-08 17:40   ` sashiko-bot
  2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
  2 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:43 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

This condition much better fits into the only caller, limiting
fix_read_error to actually fix up data devices after a read error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8fad1692cf66..e510ad7eef32 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	struct mddev *mddev = conf->mddev;
 	struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
 
-	if (exceed_read_errors(mddev, rdev)) {
-		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
-		return;
-	}
-
 	while(sectors) {
 		int s = sectors;
 		int d = read_disk;
@@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 		md_error(mddev, rdev);
 	} else {
 		freeze_array(conf, 1);
-		fix_read_error(conf, r1_bio);
+		if (exceed_read_errors(mddev, rdev))
+			r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+		else
+			fix_read_error(conf, r1_bio);
 		unfreeze_array(conf);
 	}
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: two small RAID1 cleanups
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
@ 2026-05-31 10:19 ` Yu Kuai
  2 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2026-05-31 10:19 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu, yukuai; +Cc: Li Nan, Xiao Ni, linux-raid

在 2026/5/29 13:42, Christoph Hellwig 写道:

> Hi all,
>
> this series has two little cleanups that helped me understand the
> code when reading through it.
>
> Diffstat:
>   raid1.c |   44 ++++++++++++++++++++------------------------
>   1 file changed, 20 insertions(+), 24 deletions(-)
Applied to md-7.2

-- 
Thansk,
Kuai

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] md/raid1: cleanup handle_read_error
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
@ 2026-06-01  5:51   ` Xiao Ni
  0 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-06-01  5:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid

On Fri, May 29, 2026 at 1:46 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Unwind the main conditional with duplicate conditions and initialize
> variables at initialization time where possible.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/raid1.c | 34 ++++++++++++++++------------------
>  1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 64d970e2ef50..8fad1692cf66 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>
>  static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  {
> +       struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
> +       struct bio *bio = r1_bio->bios[r1_bio->read_disk];
>         struct mddev *mddev = conf->mddev;
> -       struct bio *bio;
> -       struct md_rdev *rdev;
>         sector_t sector;
>
>         clear_bit(R1BIO_ReadError, &r1_bio->state);
> -       /* we got a read error. Maybe the drive is bad.  Maybe just
> -        * the block and we can fix it.
> -        * We freeze all other IO, and try reading the block from
> -        * other devices.  When we find one, we re-write
> -        * and check it that fixes the read error.
> -        * This is all done synchronously while the array is
> -        * frozen
> -        */
>
> -       bio = r1_bio->bios[r1_bio->read_disk];
>         bio_put(bio);
>         r1_bio->bios[r1_bio->read_disk] = NULL;
>
> -       rdev = conf->mirrors[r1_bio->read_disk].rdev;
> -       if (mddev->ro == 0
> -           && !test_bit(FailFast, &rdev->flags)) {
> +       /*
> +        * We got a read error. Maybe the drive is bad.  Maybe just the block
> +        * and we can fix it.
> +        *
> +        * If allowed, freeze all other IO, and try reading the block from other
> +        * devices.  If we find one, we re-write and check it that fixes the
> +        * read error.  This is all done synchronously while the array is
> +        * frozen.
> +        */
> +       if (mddev->ro) {
> +               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +       } else if (test_bit(FailFast, &rdev->flags)) {
> +               md_error(mddev, rdev);
> +       } else {
>                 freeze_array(conf, 1);
>                 fix_read_error(conf, r1_bio);
>                 unfreeze_array(conf);
> -       } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> -               md_error(mddev, rdev);
> -       } else {
> -               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
>         }
>
>         rdev_dec_pending(rdev, conf->mddev);
> --
> 2.53.0
>
>

Yes, it's more readable. This patch is good to me.
Reviewed-by: Xiao Ni <xiao@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
@ 2026-06-01  5:53   ` Xiao Ni
  2026-06-08 17:40   ` sashiko-bot
  1 sibling, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-06-01  5:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid

On Fri, May 29, 2026 at 1:45 PM Christoph Hellwig <hch@lst.de> wrote:
>
> This condition much better fits into the only caller, limiting
> fix_read_error to actually fix up data devices after a read error.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/raid1.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 8fad1692cf66..e510ad7eef32 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>         struct mddev *mddev = conf->mddev;
>         struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
>
> -       if (exceed_read_errors(mddev, rdev)) {
> -               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> -               return;
> -       }
> -
>         while(sectors) {
>                 int s = sectors;
>                 int d = read_disk;
> @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>                 md_error(mddev, rdev);
>         } else {
>                 freeze_array(conf, 1);
> -               fix_read_error(conf, r1_bio);
> +               if (exceed_read_errors(mddev, rdev))
> +                       r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +               else
> +                       fix_read_error(conf, r1_bio);
>                 unfreeze_array(conf);
>         }
>
> --
> 2.53.0
>
>
This patch looks good to me.
Reviewed-by: Xiao Ni <xiao@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
  2026-06-01  5:53   ` Xiao Ni
@ 2026-06-08 17:40   ` sashiko-bot
  1 sibling, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-08 17:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Low] The initialization of `rdev` at the beginning of `fix_read_error()` is a dead store.

Pre-existing issues:
- [High] Memory leak of `r1_bio` when retrying `REQ_NOWAIT` reads during an array freeze or resync barrier.
- [High] Infinite loop of read retries on a `FailFast` device when it is the last good device.
- [High] Lockless access to `conf->mirrors[d].rdev` in `fix_read_error` without RCU protection or `reconfig_mutex` leads to a Use-After-Free race condition.
--

commit 8f0f2f192596802d5deb747f4831c98556fe83cc
Author: Christoph Hellwig <hch@lst.de>

md/raid1: move the exceed_read_errors condition out of fix_read_error

This commit moves the exceed_read_errors() check from fix_read_error() to
its caller, handle_read_error(). This restricts fix_read_error() to only
fixing up data devices after a read error.

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  	struct mddev *mddev = conf->mddev;
>  	struct md_rdev *rdev = conf->mirrors[read_disk].rdev;

[Severity: Low]
Is this initialization of rdev a dead store? Since the exceed_read_errors()
check was moved out of fix_read_error(), this initial assignment of rdev
is no longer used before being unconditionally reassigned inside the while
loop.

> -	if (exceed_read_errors(mddev, rdev)) {
> -		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> -		return;
> -	}
> -
>  	while(sectors) {
>  		int s = sectors;
>  		int d = read_disk;

[Severity: High]
This isn't a bug introduced by this patch, but does the lockless access to
conf->mirrors[d].rdev inside the while(sectors) loop lead to a
Use-After-Free race condition?

Looking at fix_read_error(), the code iterates over mirror indices and
dereferences rdev without holding rcu_read_lock():

    do {
        rdev = conf->mirrors[d].rdev;
        if (rdev &&
            (test_bit(In_sync, &rdev->flags) ||
             (!test_bit(Faulty, &rdev->flags) &&
              rdev->recovery_offset >= sect + s)) &&
            rdev_has_badblock(rdev, sect, s) == 0) {
            atomic_inc(&rdev->nr_pending);

If a concurrent management thread executes hot_remove_disk ->
raid1_remove_disk() -> raid1_remove_conf(), it can check nr_pending
(seeing 0 for mirrors where d != read_disk), set the pointer to NULL, and
kick the device. Since md_kick_rdev_from_array uses synchronize_rcu() and
fix_read_error() holds no RCU lock, can rdev be freed while
fix_read_error() continues to access it?

[ ... ]

> @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  		md_error(mddev, rdev);
>  	} else {

[Severity: High]
This isn't a bug introduced by this patch, but can the FailFast branch
cause an infinite loop of read retries when it is the last good device?

In handle_read_error(), if a read error occurs on an rdev with the FailFast
flag set, md_error(mddev, rdev) is called and the else branch is completely
skipped. If this rdev happens to be the last good device, raid1_error()
refuses to set the Faulty bit.

Because the else branch is skipped, r1_bio->bios[r1_bio->read_disk] is not
set to IO_BLOCKED. Since the device is neither Faulty nor IO_BLOCKED,
raid1_read_request() considers the same device readable and selects it
again, causing an immediate failure and re-queuing that hangs the thread.

>  		freeze_array(conf, 1);
> -		fix_read_error(conf, r1_bio);
> +		if (exceed_read_errors(mddev, rdev))
> +			r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +		else
> +			fix_read_error(conf, r1_bio);
>  		unfreeze_array(conf);
>  	}

[Severity: High]
This is a pre-existing issue, but does retrying a REQ_NOWAIT read during an
array freeze or resync barrier leak r1_bio?

When handle_read_error() queues a retry for a read request with REQ_NOWAIT,
it calls raid1_read_request() and passes the existing r1_bio:

drivers/md/raid1.c:handle_read_error() {
    ...
    r1_bio->state = 0;
    raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
    allow_barrier(conf, sector);
}

If there is an active barrier, wait_read_barrier() correctly identifies the
nowait flag and returns false. raid1_read_request() handles this by calling
bio_wouldblock_error() and returning early:

drivers/md/raid1.c:raid1_read_request() {
    ...
    if (!wait_read_barrier(conf, bio->bi_iter.bi_sector,
                           bio->bi_opf & REQ_NOWAIT)) {
        bio_wouldblock_error(bio);
        return;
    }
    ...
}

Does this early return bypass the cleanup, permanently leaking the mempool
allocated r1_bio and potentially causing a system deadlock?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529054308.2720300-1-hch@lst.de?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-08 17:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
2026-06-01  5:51   ` Xiao Ni
2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
2026-06-01  5:53   ` Xiao Ni
2026-06-08 17:40   ` sashiko-bot
2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.