Re: [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling

Linux RAID subsystem development
 help / color / mirror / Atom feed

From: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
To: sashiko-reviews@lists.linux.dev
Cc: linux-raid@vger.kernel.org, yukuai@fygo.io
Subject: Re: [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling
Date: Mon, 15 Jun 2026 10:11:24 +0200	[thread overview]
Message-ID: <m21pe8xlxv.fsf@gmail.com> (raw)
In-Reply-To: <20260613184757.57BC51F000E9@smtp.kernel.org>


Hi Kuai,

On Sat, Jun 13, 2026 at 18:47 +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> New issues:
> - [Critical] Moving `atomic_inc(&rdev->nr_pending)` down opens a race window with `hot_remove_disk`, leading to a NULL pointer dereference during error handling or a Use-After-Free in clustered configurations.
>

I don't think this can happen. the array must be suspended before any
disk can be removed.

> Pre-existing issues:
> - [High] A `REQ_ATOMIC` write can be silently split if a bad block overlaps the start of the write, violating atomicity guarantees.

I don't think this is possible. bio_submit_split_bioset() will fail
rather than split the bio.

> --
>
> commit c334f57d91677f32131f21e50bb32b83136e5a60
> Author: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
>
> md/raid1: simplify raid1_write_request() error handling
>
> This commit delays the rdev->nr_pending increment until after badblocks
> are checked to avoid incrementing and immediately decrementing on skipped
> devices. It also consolidates error handling paths and modifies REQ_ATOMIC
> failures to complete with BLK_STS_NOTSUPP instead of I/O errors.
>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 632d72607e11e..86d4f224ffb11 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
> [ ... ]
>> @@ -1576,23 +1573,21 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>>  		if (!rdev || test_bit(Faulty, &rdev->flags))
>>  			continue;
>>  
>> -		atomic_inc(&rdev->nr_pending);
>>  		if (test_bit(WriteErrorSeen, &rdev->flags)) {
>>  			sector_t first_bad;
>>  			sector_t bad_sectors;
>>  			int is_bad;
>>  
>> -			is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
>> +			is_bad = is_badblock(rdev, sector, max_sectors,
>>  					     &first_bad, &bad_sectors);
>> -			if (is_bad && first_bad <= r1_bio->sector) {
>> +			if (is_bad && first_bad <= sector) {
>>  				/* Cannot write here at all */
>> -				bad_sectors -= (r1_bio->sector - first_bad);
>> +				bad_sectors -= (sector - first_bad);
>>  				if (bad_sectors < max_sectors)
>>  					/* mustn't write more than bad_sectors
>>  					 * to other devices yet
>>  					 */
>>  					max_sectors = bad_sectors;
>> -				rdev_dec_pending(rdev, mddev);
>>  				continue;
>
> [Severity: High]
> This is a pre-existing issue, but does this logic skip the rejection of
> REQ_ATOMIC bios if a bad block overlaps the start of the write?
>
> When is_badblock() finds a bad block at or before the write sector, the
> code shrinks max_sectors to force a split at the boundary and then calls
> continue. Because of the continue, the REQ_ATOMIC bio skips the rejection
> logic below.
>
> Could this silently split an atomic write into smaller pieces and submit
> them to the healthy mirrors via bio_submit_split_bioset(), violating
> block-level atomicity guarantees?
>
>>  			}
>>  			if (is_bad) {
> [ ... ]
>>  				if (bio->bi_opf & REQ_ATOMIC) {
>> -					rdev_dec_pending(rdev, mddev);
>> -					goto err_handle;
>> +					bio->bi_status = BLK_STS_NOTSUPP;
>> +					bio_endio(bio);

I misunderstood the comment. it should call bio_io_error(bio) instead.
I'll fix this in v3.

>> +					goto err_dec_pending;
>>  				}
>>  
>> -				good_sectors = first_bad - r1_bio->sector;
>> +				good_sectors = first_bad - sector;
>>  				if (good_sectors < max_sectors)
>>  					max_sectors = good_sectors;
>>  			}
>>  		}
>> +
>> +		atomic_inc(&rdev->nr_pending);
>
> [Severity: Critical]
> Does moving atomic_inc(&rdev->nr_pending) down open a race window with
> hot_remove_disk()?
>
> Because nr_pending acts as an existence lock that prevents the device from
> being removed from the array configuration by raid1_remove_conf(),
> incrementing it after the is_badblock() and test_bit(WriteErrorSeen, ...)
> checks means md_check_recovery() could evaluate rdev_removeable() while
> nr_pending is 0 and call hot_remove_disk().
>
> If raid1_remove_conf() unlinks the device by setting
> conf->mirrors[i].rdev = NULL, when this thread resumes and the I/O
> eventually fails, would the error handling path (handle_write_finished()
> calling narrow_write_error()) read conf->mirrors[i].rdev and dereference a
> NULL pointer?
>
> Additionally, in clustered configurations, if the unlinked rdev is
> immediately freed by md_kick_rdev_from_array(), could this lead to a
> use-after-free directly in raid1_write_request() when evaluating
> test_bit(WriteErrorSeen, &rdev->flags) or is_badblock()?
>
>>  		r1_bio->bios[i] = bio;
>>  	}
>
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260613182810.1317258-1-abd.masalkhi@gmail.com?part=4

-- 
Best Regards,
Abd-Alrhman

     prev parent reply	other threads:[~2026-06-15  8:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures Abd-Alrhman Masalkhi
2026-06-13 18:40   ` sashiko-bot
2026-06-13 18:28 ` [PATCH v2 3/4] md/raid10: fix writes_pending and barrier reference leaks on discard failures Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling Abd-Alrhman Masalkhi
2026-06-13 18:47   ` sashiko-bot
2026-06-15  8:11     ` Abd-Alrhman Masalkhi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m21pe8xlxv.fsf@gmail.com \
    --to=abd.masalkhi@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=yukuai@fygo.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox