From: sashiko-bot@kernel.org
To: "Christoph Hellwig" <hch@lst.de>
Cc: yukuai@fygo.io, linux-raid@vger.kernel.org
Subject: Re: [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error
Date: Mon, 08 Jun 2026 17:40:13 +0000 [thread overview]
Message-ID: <20260608174013.79EB41F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260529054308.2720300-3-hch@lst.de>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Low] The initialization of `rdev` at the beginning of `fix_read_error()` is a dead store.
Pre-existing issues:
- [High] Memory leak of `r1_bio` when retrying `REQ_NOWAIT` reads during an array freeze or resync barrier.
- [High] Infinite loop of read retries on a `FailFast` device when it is the last good device.
- [High] Lockless access to `conf->mirrors[d].rdev` in `fix_read_error` without RCU protection or `reconfig_mutex` leads to a Use-After-Free race condition.
--
commit 8f0f2f192596802d5deb747f4831c98556fe83cc
Author: Christoph Hellwig <hch@lst.de>
md/raid1: move the exceed_read_errors condition out of fix_read_error
This commit moves the exceed_read_errors() check from fix_read_error() to
its caller, handle_read_error(). This restricts fix_read_error() to only
fixing up data devices after a read error.
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2411,11 +2411,6 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> struct mddev *mddev = conf->mddev;
> struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
[Severity: Low]
Is this initialization of rdev a dead store? Since the exceed_read_errors()
check was moved out of fix_read_error(), this initial assignment of rdev
is no longer used before being unconditionally reassigned inside the while
loop.
> - if (exceed_read_errors(mddev, rdev)) {
> - r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> - return;
> - }
> -
> while(sectors) {
> int s = sectors;
> int d = read_disk;
[Severity: High]
This isn't a bug introduced by this patch, but does the lockless access to
conf->mirrors[d].rdev inside the while(sectors) loop lead to a
Use-After-Free race condition?
Looking at fix_read_error(), the code iterates over mirror indices and
dereferences rdev without holding rcu_read_lock():
do {
rdev = conf->mirrors[d].rdev;
if (rdev &&
(test_bit(In_sync, &rdev->flags) ||
(!test_bit(Faulty, &rdev->flags) &&
rdev->recovery_offset >= sect + s)) &&
rdev_has_badblock(rdev, sect, s) == 0) {
atomic_inc(&rdev->nr_pending);
If a concurrent management thread executes hot_remove_disk ->
raid1_remove_disk() -> raid1_remove_conf(), it can check nr_pending
(seeing 0 for mirrors where d != read_disk), set the pointer to NULL, and
kick the device. Since md_kick_rdev_from_array uses synchronize_rcu() and
fix_read_error() holds no RCU lock, can rdev be freed while
fix_read_error() continues to access it?
[ ... ]
> @@ -2652,7 +2647,10 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> md_error(mddev, rdev);
> } else {
[Severity: High]
This isn't a bug introduced by this patch, but can the FailFast branch
cause an infinite loop of read retries when it is the last good device?
In handle_read_error(), if a read error occurs on an rdev with the FailFast
flag set, md_error(mddev, rdev) is called and the else branch is completely
skipped. If this rdev happens to be the last good device, raid1_error()
refuses to set the Faulty bit.
Because the else branch is skipped, r1_bio->bios[r1_bio->read_disk] is not
set to IO_BLOCKED. Since the device is neither Faulty nor IO_BLOCKED,
raid1_read_request() considers the same device readable and selects it
again, causing an immediate failure and re-queuing that hangs the thread.
> freeze_array(conf, 1);
> - fix_read_error(conf, r1_bio);
> + if (exceed_read_errors(mddev, rdev))
> + r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> + else
> + fix_read_error(conf, r1_bio);
> unfreeze_array(conf);
> }
[Severity: High]
This is a pre-existing issue, but does retrying a REQ_NOWAIT read during an
array freeze or resync barrier leak r1_bio?
When handle_read_error() queues a retry for a read request with REQ_NOWAIT,
it calls raid1_read_request() and passes the existing r1_bio:
drivers/md/raid1.c:handle_read_error() {
...
r1_bio->state = 0;
raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
allow_barrier(conf, sector);
}
If there is an active barrier, wait_read_barrier() correctly identifies the
nowait flag and returns false. raid1_read_request() handles this by calling
bio_wouldblock_error() and returning early:
drivers/md/raid1.c:raid1_read_request() {
...
if (!wait_read_barrier(conf, bio->bi_iter.bi_sector,
bio->bi_opf & REQ_NOWAIT)) {
bio_wouldblock_error(bio);
return;
}
...
}
Does this early return bypass the cleanup, permanently leaking the mempool
allocated r1_bio and potentially causing a system deadlock?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260529054308.2720300-1-hch@lst.de?part=2
next prev parent reply other threads:[~2026-06-08 17:40 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-29 5:42 two small RAID1 cleanups Christoph Hellwig
2026-05-29 5:42 ` [PATCH 1/2] md/raid1: cleanup handle_read_error Christoph Hellwig
2026-06-01 5:51 ` Xiao Ni
2026-05-29 5:43 ` [PATCH 2/2] md/raid1: move the exceed_read_errors condition out of fix_read_error Christoph Hellwig
2026-06-01 5:53 ` Xiao Ni
2026-06-08 17:40 ` sashiko-bot [this message]
2026-05-31 10:19 ` two small RAID1 cleanups Yu Kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260608174013.79EB41F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=hch@lst.de \
--cc=linux-raid@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=yukuai@fygo.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox