* two small RAID1 cleanups
@ 2026-05-29 5:42 Christoph Hellwig
2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-05-29 5:42 UTC (permalink / raw)
To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid
Hi all,
this series has two little cleanups that helped me understand the
code when reading through it.
Diffstat:
raid1.c | 44 ++++++++++++++++++++------------------------
1 file changed, 20 insertions(+), 24 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/2] md/raid1: cleanup handle_read_error 2026-05-29 5:42 two small RAID1 cleanups Christoph Hellwig @ 2026-05-29 5:42 ` Christoph Hellwig 2026-06-01 5:51 ` Xiao Ni 2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig 2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai 2 siblings, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2026-05-29 5:42 UTC (permalink / raw) To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid Unwind the main conditional with duplicate conditions and initialize variables at initialization time where possible. Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/md/raid1.c | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 64d970e2ef50..8fad1692cf66 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio) static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) { + struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev; + struct bio *bio = r1_bio->bios[r1_bio->read_disk]; struct mddev *mddev = conf->mddev; - struct bio *bio; - struct md_rdev *rdev; sector_t sector; clear_bit(R1BIO_ReadError, &r1_bio->state); - /* we got a read error. Maybe the drive is bad. Maybe just - * the block and we can fix it. - * We freeze all other IO, and try reading the block from - * other devices. When we find one, we re-write - * and check it that fixes the read error. - * This is all done synchronously while the array is - * frozen - */ - bio = r1_bio->bios[r1_bio->read_disk]; bio_put(bio); r1_bio->bios[r1_bio->read_disk] = NULL; - rdev = conf->mirrors[r1_bio->read_disk].rdev; - if (mddev->ro == 0 - && !test_bit(FailFast, &rdev->flags)) { + /* + * We got a read error. Maybe the drive is bad. Maybe just the block + * and we can fix it. + * + * If allowed, freeze all other IO, and try reading the block from other + * devices. If we find one, we re-write and check it that fixes the + * read error. This is all done synchronously while the array is + * frozen. + */ + if (mddev->ro) { + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; + } else if (test_bit(FailFast, &rdev->flags)) { + md_error(mddev, rdev); + } else { freeze_array(conf, 1); fix_read_error(conf, r1_bio); unfreeze_array(conf); - } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) { - md_error(mddev, rdev); - } else { - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; } rdev_dec_pending(rdev, conf->mddev); -- 2.53.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] md/raid1: cleanup handle_read_error 2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig @ 2026-06-01 5:51 ` Xiao Ni 0 siblings, 0 replies; 7+ messages in thread From: Xiao Ni @ 2026-06-01 5:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid On Fri, May 29, 2026 at 1:46 PM Christoph Hellwig <hch@lst.de> wrote: > > Unwind the main conditional with duplicate conditions and initialize > variables at initialization time where possible. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/md/raid1.c | 34 ++++++++++++++++------------------ > 1 file changed, 16 insertions(+), 18 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 64d970e2ef50..8fad1692cf66 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2627,35 +2627,33 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio) > > static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) > { > + struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev; > + struct bio *bio = r1_bio->bios[r1_bio->read_disk]; > struct mddev *mddev = conf->mddev; > - struct bio *bio; > - struct md_rdev *rdev; > sector_t sector; > > clear_bit(R1BIO_ReadError, &r1_bio->state); > - /* we got a read error. Maybe the drive is bad. Maybe just > - * the block and we can fix it. > - * We freeze all other IO, and try reading the block from > - * other devices. When we find one, we re-write > - * and check it that fixes the read error. > - * This is all done synchronously while the array is > - * frozen > - */ > > - bio = r1_bio->bios[r1_bio->read_disk]; > bio_put(bio); > r1_bio->bios[r1_bio->read_disk] = NULL; > > - rdev = conf->mirrors[r1_bio->read_disk].rdev; > - if (mddev->ro == 0 > - && !test_bit(FailFast, &rdev->flags)) { > + /* > + * We got a read error. Maybe the drive is bad. Maybe just the block > + * and we can fix it. > + * > + * If allowed, freeze all other IO, and try reading the block from other > + * devices. If we find one, we re-write and check it that fixes the > + * read error. This is all done synchronously while the array is > + * frozen. > + */ > + if (mddev->ro) { > + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > + } else if (test_bit(FailFast, &rdev->flags)) { > + md_error(mddev, rdev); > + } else { > freeze_array(conf, 1); > fix_read_error(conf, r1_bio); > unfreeze_array(conf); > - } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) { > - md_error(mddev, rdev); > - } else { > - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > } > > rdev_dec_pending(rdev, conf->mddev); > -- > 2.53.0 > > Yes, it's more readable. This patch is good to me. Reviewed-by: Xiao Ni <xiao@kernel.org> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error 2026-05-29 5:42 two small RAID1 cleanups Christoph Hellwig 2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig @ 2026-05-29 5:43 ` Christoph Hellwig 2026-06-01 5:53 ` Xiao Ni 2026-06-08 17:40 ` sashiko-bot 2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai 2 siblings, 2 replies; 7+ messages in thread From: Christoph Hellwig @ 2026-05-29 5:43 UTC (permalink / raw) To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid This condition much better fits into the only caller, limiting fix_read_error to actually fix up data devices after a read error. Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/md/raid1.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 8fad1692cf66..e510ad7eef32 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) struct mddev *mddev = conf->mddev; struct md_rdev *rdev = conf->mirrors[read_disk].rdev; - if (exceed_read_errors(mddev, rdev)) { - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; - return; - } - while(sectors) { int s = sectors; int d = read_disk; @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) md_error(mddev, rdev); } else { freeze_array(conf, 1); - fix_read_error(conf, r1_bio); + if (exceed_read_errors(mddev, rdev)) + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; + else + fix_read_error(conf, r1_bio); unfreeze_array(conf); } -- 2.53.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error 2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig @ 2026-06-01 5:53 ` Xiao Ni 2026-06-08 17:40 ` sashiko-bot 1 sibling, 0 replies; 7+ messages in thread From: Xiao Ni @ 2026-06-01 5:53 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid On Fri, May 29, 2026 at 1:45 PM Christoph Hellwig <hch@lst.de> wrote: > > This condition much better fits into the only caller, limiting > fix_read_error to actually fix up data devices after a read error. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/md/raid1.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 8fad1692cf66..e510ad7eef32 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > struct mddev *mddev = conf->mddev; > struct md_rdev *rdev = conf->mirrors[read_disk].rdev; > > - if (exceed_read_errors(mddev, rdev)) { > - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > - return; > - } > - > while(sectors) { > int s = sectors; > int d = read_disk; > @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) > md_error(mddev, rdev); > } else { > freeze_array(conf, 1); > - fix_read_error(conf, r1_bio); > + if (exceed_read_errors(mddev, rdev)) > + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > + else > + fix_read_error(conf, r1_bio); > unfreeze_array(conf); > } > > -- > 2.53.0 > > This patch looks good to me. Reviewed-by: Xiao Ni <xiao@kernel.org> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error 2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig 2026-06-01 5:53 ` Xiao Ni @ 2026-06-08 17:40 ` sashiko-bot 1 sibling, 0 replies; 7+ messages in thread From: sashiko-bot @ 2026-06-08 17:40 UTC (permalink / raw) To: Christoph Hellwig; +Cc: yukuai, linux-raid Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider: New issues: - [Low] The initialization of `rdev` at the beginning of `fix_read_error()` is a dead store. Pre-existing issues: - [High] Memory leak of `r1_bio` when retrying `REQ_NOWAIT` reads during an array freeze or resync barrier. - [High] Infinite loop of read retries on a `FailFast` device when it is the last good device. - [High] Lockless access to `conf->mirrors[d].rdev` in `fix_read_error` without RCU protection or `reconfig_mutex` leads to a Use-After-Free race condition. -- commit 8f0f2f192596802d5deb747f4831c98556fe83cc Author: Christoph Hellwig <hch@lst.de> md/raid1: move the exceed_read_errors condition out of fix_read_error This commit moves the exceed_read_errors() check from fix_read_error() to its caller, handle_read_error(). This restricts fix_read_error() to only fixing up data devices after a read error. > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > struct mddev *mddev = conf->mddev; > struct md_rdev *rdev = conf->mirrors[read_disk].rdev; [Severity: Low] Is this initialization of rdev a dead store? Since the exceed_read_errors() check was moved out of fix_read_error(), this initial assignment of rdev is no longer used before being unconditionally reassigned inside the while loop. > - if (exceed_read_errors(mddev, rdev)) { > - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > - return; > - } > - > while(sectors) { > int s = sectors; > int d = read_disk; [Severity: High] This isn't a bug introduced by this patch, but does the lockless access to conf->mirrors[d].rdev inside the while(sectors) loop lead to a Use-After-Free race condition? Looking at fix_read_error(), the code iterates over mirror indices and dereferences rdev without holding rcu_read_lock(): do { rdev = conf->mirrors[d].rdev; if (rdev && (test_bit(In_sync, &rdev->flags) || (!test_bit(Faulty, &rdev->flags) && rdev->recovery_offset >= sect + s)) && rdev_has_badblock(rdev, sect, s) == 0) { atomic_inc(&rdev->nr_pending); If a concurrent management thread executes hot_remove_disk -> raid1_remove_disk() -> raid1_remove_conf(), it can check nr_pending (seeing 0 for mirrors where d != read_disk), set the pointer to NULL, and kick the device. Since md_kick_rdev_from_array uses synchronize_rcu() and fix_read_error() holds no RCU lock, can rdev be freed while fix_read_error() continues to access it? [ ... ] > @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) > md_error(mddev, rdev); > } else { [Severity: High] This isn't a bug introduced by this patch, but can the FailFast branch cause an infinite loop of read retries when it is the last good device? In handle_read_error(), if a read error occurs on an rdev with the FailFast flag set, md_error(mddev, rdev) is called and the else branch is completely skipped. If this rdev happens to be the last good device, raid1_error() refuses to set the Faulty bit. Because the else branch is skipped, r1_bio->bios[r1_bio->read_disk] is not set to IO_BLOCKED. Since the device is neither Faulty nor IO_BLOCKED, raid1_read_request() considers the same device readable and selects it again, causing an immediate failure and re-queuing that hangs the thread. > freeze_array(conf, 1); > - fix_read_error(conf, r1_bio); > + if (exceed_read_errors(mddev, rdev)) > + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED; > + else > + fix_read_error(conf, r1_bio); > unfreeze_array(conf); > } [Severity: High] This is a pre-existing issue, but does retrying a REQ_NOWAIT read during an array freeze or resync barrier leak r1_bio? When handle_read_error() queues a retry for a read request with REQ_NOWAIT, it calls raid1_read_request() and passes the existing r1_bio: drivers/md/raid1.c:handle_read_error() { ... r1_bio->state = 0; raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio); allow_barrier(conf, sector); } If there is an active barrier, wait_read_barrier() correctly identifies the nowait flag and returns false. raid1_read_request() handles this by calling bio_wouldblock_error() and returning early: drivers/md/raid1.c:raid1_read_request() { ... if (!wait_read_barrier(conf, bio->bi_iter.bi_sector, bio->bi_opf & REQ_NOWAIT)) { bio_wouldblock_error(bio); return; } ... } Does this early return bypass the cleanup, permanently leaking the mempool allocated r1_bio and potentially causing a system deadlock? -- Sashiko AI review · https://sashiko.dev/#/patchset/20260529054308.2720300-1-hch@lst.de?part=2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: two small RAID1 cleanups 2026-05-29 5:42 two small RAID1 cleanups Christoph Hellwig 2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig 2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig @ 2026-05-31 10:19 ` Yu Kuai 2 siblings, 0 replies; 7+ messages in thread From: Yu Kuai @ 2026-05-31 10:19 UTC (permalink / raw) To: Christoph Hellwig, Song Liu, yukuai; +Cc: Li Nan, Xiao Ni, linux-raid 在 2026/5/29 13:42, Christoph Hellwig 写道: > Hi all, > > this series has two little cleanups that helped me understand the > code when reading through it. > > Diffstat: > raid1.c | 44 ++++++++++++++++++++------------------------ > 1 file changed, 20 insertions(+), 24 deletions(-) Applied to md-7.2 -- Thansk, Kuai ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-08 17:40 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-29 5:42 two small RAID1 cleanups Christoph Hellwig 2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig 2026-06-01 5:51 ` Xiao Ni 2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig 2026-06-01 5:53 ` Xiao Ni 2026-06-08 17:40 ` sashiko-bot 2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox