two small RAID1 cleanups

Linux RAID subsystem development
 help / color / mirror / Atom feed

* two small RAID1 cleanups
@ 2026-05-29  5:42 Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

Hi all,

this series has two little cleanups that helped me understand the
code when reading through it.

Diffstat:
 raid1.c |   44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] md/raid1: cleanup handle_read_error
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
@ 2026-05-29  5:42 ` Christoph Hellwig
  2026-06-01  5:51   ` Xiao Ni
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
  2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

Unwind the main conditional with duplicate conditions and initialize
variables at initialization time where possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 64d970e2ef50..8fad1692cf66 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
 
 static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 {
+	struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
+	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
 	struct mddev *mddev = conf->mddev;
-	struct bio *bio;
-	struct md_rdev *rdev;
 	sector_t sector;
 
 	clear_bit(R1BIO_ReadError, &r1_bio->state);
-	/* we got a read error. Maybe the drive is bad.  Maybe just
-	 * the block and we can fix it.
-	 * We freeze all other IO, and try reading the block from
-	 * other devices.  When we find one, we re-write
-	 * and check it that fixes the read error.
-	 * This is all done synchronously while the array is
-	 * frozen
-	 */
 
-	bio = r1_bio->bios[r1_bio->read_disk];
 	bio_put(bio);
 	r1_bio->bios[r1_bio->read_disk] = NULL;
 
-	rdev = conf->mirrors[r1_bio->read_disk].rdev;
-	if (mddev->ro == 0
-	    && !test_bit(FailFast, &rdev->flags)) {
+	/*
+	 * We got a read error. Maybe the drive is bad.  Maybe just the block
+	 * and we can fix it.
+	 *
+	 * If allowed, freeze all other IO, and try reading the block from other
+	 * devices.  If we find one, we re-write and check it that fixes the
+	 * read error.  This is all done synchronously while the array is
+	 * frozen.
+	 */
+	if (mddev->ro) {
+		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+	} else if (test_bit(FailFast, &rdev->flags)) {
+		md_error(mddev, rdev);
+	} else {
 		freeze_array(conf, 1);
 		fix_read_error(conf, r1_bio);
 		unfreeze_array(conf);
-	} else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
-		md_error(mddev, rdev);
-	} else {
-		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
 	}
 
 	rdev_dec_pending(rdev, conf->mddev);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] md/raid1: cleanup handle_read_error
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
@ 2026-06-01  5:51   ` Xiao Ni
  0 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-06-01  5:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid

On Fri, May 29, 2026 at 1:46 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Unwind the main conditional with duplicate conditions and initialize
> variables at initialization time where possible.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/raid1.c | 34 ++++++++++++++++------------------
>  1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 64d970e2ef50..8fad1692cf66 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>
>  static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  {
> +       struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
> +       struct bio *bio = r1_bio->bios[r1_bio->read_disk];
>         struct mddev *mddev = conf->mddev;
> -       struct bio *bio;
> -       struct md_rdev *rdev;
>         sector_t sector;
>
>         clear_bit(R1BIO_ReadError, &r1_bio->state);
> -       /* we got a read error. Maybe the drive is bad.  Maybe just
> -        * the block and we can fix it.
> -        * We freeze all other IO, and try reading the block from
> -        * other devices.  When we find one, we re-write
> -        * and check it that fixes the read error.
> -        * This is all done synchronously while the array is
> -        * frozen
> -        */
>
> -       bio = r1_bio->bios[r1_bio->read_disk];
>         bio_put(bio);
>         r1_bio->bios[r1_bio->read_disk] = NULL;
>
> -       rdev = conf->mirrors[r1_bio->read_disk].rdev;
> -       if (mddev->ro == 0
> -           && !test_bit(FailFast, &rdev->flags)) {
> +       /*
> +        * We got a read error. Maybe the drive is bad.  Maybe just the block
> +        * and we can fix it.
> +        *
> +        * If allowed, freeze all other IO, and try reading the block from other
> +        * devices.  If we find one, we re-write and check it that fixes the
> +        * read error.  This is all done synchronously while the array is
> +        * frozen.
> +        */
> +       if (mddev->ro) {
> +               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +       } else if (test_bit(FailFast, &rdev->flags)) {
> +               md_error(mddev, rdev);
> +       } else {
>                 freeze_array(conf, 1);
>                 fix_read_error(conf, r1_bio);
>                 unfreeze_array(conf);
> -       } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> -               md_error(mddev, rdev);
> -       } else {
> -               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
>         }
>
>         rdev_dec_pending(rdev, conf->mddev);
> --
> 2.53.0
>
>

Yes, it's more readable. This patch is good to me.
Reviewed-by: Xiao Ni <xiao@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
@ 2026-05-29  5:43 ` Christoph Hellwig
  2026-06-01  5:53   ` Xiao Ni
  2026-06-08 17:40   ` sashiko-bot
  2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
  2 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29  5:43 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid

This condition much better fits into the only caller, limiting
fix_read_error to actually fix up data devices after a read error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8fad1692cf66..e510ad7eef32 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	struct mddev *mddev = conf->mddev;
 	struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
 
-	if (exceed_read_errors(mddev, rdev)) {
-		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
-		return;
-	}
-
 	while(sectors) {
 		int s = sectors;
 		int d = read_disk;
@@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 		md_error(mddev, rdev);
 	} else {
 		freeze_array(conf, 1);
-		fix_read_error(conf, r1_bio);
+		if (exceed_read_errors(mddev, rdev))
+			r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+		else
+			fix_read_error(conf, r1_bio);
 		unfreeze_array(conf);
 	}
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
@ 2026-06-01  5:53   ` Xiao Ni
  2026-06-08 17:40   ` sashiko-bot
  1 sibling, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2026-06-01  5:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid

On Fri, May 29, 2026 at 1:45 PM Christoph Hellwig <hch@lst.de> wrote:
>
> This condition much better fits into the only caller, limiting
> fix_read_error to actually fix up data devices after a read error.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/raid1.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 8fad1692cf66..e510ad7eef32 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>         struct mddev *mddev = conf->mddev;
>         struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
>
> -       if (exceed_read_errors(mddev, rdev)) {
> -               r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> -               return;
> -       }
> -
>         while(sectors) {
>                 int s = sectors;
>                 int d = read_disk;
> @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>                 md_error(mddev, rdev);
>         } else {
>                 freeze_array(conf, 1);
> -               fix_read_error(conf, r1_bio);
> +               if (exceed_read_errors(mddev, rdev))
> +                       r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +               else
> +                       fix_read_error(conf, r1_bio);
>                 unfreeze_array(conf);
>         }
>
> --
> 2.53.0
>
>
This patch looks good to me.
Reviewed-by: Xiao Ni <xiao@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
  2026-06-01  5:53   ` Xiao Ni
@ 2026-06-08 17:40   ` sashiko-bot
  1 sibling, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-08 17:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Low] The initialization of `rdev` at the beginning of `fix_read_error()` is a dead store.

Pre-existing issues:
- [High] Memory leak of `r1_bio` when retrying `REQ_NOWAIT` reads during an array freeze or resync barrier.
- [High] Infinite loop of read retries on a `FailFast` device when it is the last good device.
- [High] Lockless access to `conf->mirrors[d].rdev` in `fix_read_error` without RCU protection or `reconfig_mutex` leads to a Use-After-Free race condition.
--

commit 8f0f2f192596802d5deb747f4831c98556fe83cc
Author: Christoph Hellwig <hch@lst.de>

md/raid1: move the exceed_read_errors condition out of fix_read_error

This commit moves the exceed_read_errors() check from fix_read_error() to
its caller, handle_read_error(). This restricts fix_read_error() to only
fixing up data devices after a read error.

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  	struct mddev *mddev = conf->mddev;
>  	struct md_rdev *rdev = conf->mirrors[read_disk].rdev;

[Severity: Low]
Is this initialization of rdev a dead store? Since the exceed_read_errors()
check was moved out of fix_read_error(), this initial assignment of rdev
is no longer used before being unconditionally reassigned inside the while
loop.

> -	if (exceed_read_errors(mddev, rdev)) {
> -		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> -		return;
> -	}
> -
>  	while(sectors) {
>  		int s = sectors;
>  		int d = read_disk;

[Severity: High]
This isn't a bug introduced by this patch, but does the lockless access to
conf->mirrors[d].rdev inside the while(sectors) loop lead to a
Use-After-Free race condition?

Looking at fix_read_error(), the code iterates over mirror indices and
dereferences rdev without holding rcu_read_lock():

    do {
        rdev = conf->mirrors[d].rdev;
        if (rdev &&
            (test_bit(In_sync, &rdev->flags) ||
             (!test_bit(Faulty, &rdev->flags) &&
              rdev->recovery_offset >= sect + s)) &&
            rdev_has_badblock(rdev, sect, s) == 0) {
            atomic_inc(&rdev->nr_pending);

If a concurrent management thread executes hot_remove_disk ->
raid1_remove_disk() -> raid1_remove_conf(), it can check nr_pending
(seeing 0 for mirrors where d != read_disk), set the pointer to NULL, and
kick the device. Since md_kick_rdev_from_array uses synchronize_rcu() and
fix_read_error() holds no RCU lock, can rdev be freed while
fix_read_error() continues to access it?

[ ... ]

> @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>  		md_error(mddev, rdev);
>  	} else {

[Severity: High]
This isn't a bug introduced by this patch, but can the FailFast branch
cause an infinite loop of read retries when it is the last good device?

In handle_read_error(), if a read error occurs on an rdev with the FailFast
flag set, md_error(mddev, rdev) is called and the else branch is completely
skipped. If this rdev happens to be the last good device, raid1_error()
refuses to set the Faulty bit.

Because the else branch is skipped, r1_bio->bios[r1_bio->read_disk] is not
set to IO_BLOCKED. Since the device is neither Faulty nor IO_BLOCKED,
raid1_read_request() considers the same device readable and selects it
again, causing an immediate failure and re-queuing that hangs the thread.

>  		freeze_array(conf, 1);
> -		fix_read_error(conf, r1_bio);
> +		if (exceed_read_errors(mddev, rdev))
> +			r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +		else
> +			fix_read_error(conf, r1_bio);
>  		unfreeze_array(conf);
>  	}

[Severity: High]
This is a pre-existing issue, but does retrying a REQ_NOWAIT read during an
array freeze or resync barrier leak r1_bio?

When handle_read_error() queues a retry for a read request with REQ_NOWAIT,
it calls raid1_read_request() and passes the existing r1_bio:

drivers/md/raid1.c:handle_read_error() {
    ...
    r1_bio->state = 0;
    raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
    allow_barrier(conf, sector);
}

If there is an active barrier, wait_read_barrier() correctly identifies the
nowait flag and returns false. raid1_read_request() handles this by calling
bio_wouldblock_error() and returning early:

drivers/md/raid1.c:raid1_read_request() {
    ...
    if (!wait_read_barrier(conf, bio->bi_iter.bi_sector,
                           bio->bi_opf & REQ_NOWAIT)) {
        bio_wouldblock_error(bio);
        return;
    }
    ...
}

Does this early return bypass the cleanup, permanently leaking the mempool
allocated r1_bio and potentially causing a system deadlock?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529054308.2720300-1-hch@lst.de?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: two small RAID1 cleanups
  2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
  2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
  2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
@ 2026-05-31 10:19 ` Yu Kuai
  2 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2026-05-31 10:19 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu, yukuai; +Cc: Li Nan, Xiao Ni, linux-raid

在 2026/5/29 13:42, Christoph Hellwig 写道:

> Hi all,
>
> this series has two little cleanups that helped me understand the
> code when reading through it.
>
> Diffstat:
>   raid1.c |   44 ++++++++++++++++++++------------------------
>   1 file changed, 20 insertions(+), 24 deletions(-)
Applied to md-7.2

-- 
Thansk,
Kuai

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-08 17:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29  5:42 two small RAID1 cleanups Christoph Hellwig
2026-05-29  5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
2026-06-01  5:51   ` Xiao Ni
2026-05-29  5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
2026-06-01  5:53   ` Xiao Ni
2026-06-08 17:40   ` sashiko-bot
2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox