Re: [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
To: sashiko-reviews@lists.linux.dev
Cc: linux-raid@vger.kernel.org, yukuai@fygo.io
Subject: Re: [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling
Date: Mon, 15 Jun 2026 10:11:24 +0200	[thread overview]
Message-ID: <m21pe8xlxv.fsf@gmail.com> (raw)
In-Reply-To: <20260613184757.57BC51F000E9@smtp.kernel.org>


Hi Kuai,

On Sat, Jun 13, 2026 at 18:47 +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> New issues:
> - [Critical] Moving `atomic_inc(&rdev->nr_pending)` down opens a race window with `hot_remove_disk`, leading to a NULL pointer dereference during error handling or a Use-After-Free in clustered configurations.
>

I don't think this can happen. the array must be suspended before any
disk can be removed.

> Pre-existing issues:
> - [High] A `REQ_ATOMIC` write can be silently split if a bad block overlaps the start of the write, violating atomicity guarantees.

I don't think this is possible. bio_submit_split_bioset() will fail
rather than split the bio.

> --
>
> commit c334f57d91677f32131f21e50bb32b83136e5a60
> Author: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
>
> md/raid1: simplify raid1_write_request() error handling
>
> This commit delays the rdev->nr_pending increment until after badblocks
> are checked to avoid incrementing and immediately decrementing on skipped
> devices. It also consolidates error handling paths and modifies REQ_ATOMIC
> failures to complete with BLK_STS_NOTSUPP instead of I/O errors.
>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 632d72607e11e..86d4f224ffb11 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
> [ ... ]
>> @@ -1576,23 +1573,21 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>>  		if (!rdev || test_bit(Faulty, &rdev->flags))
>>  			continue;
>>  
>> -		atomic_inc(&rdev->nr_pending);
>>  		if (test_bit(WriteErrorSeen, &rdev->flags)) {
>>  			sector_t first_bad;
>>  			sector_t bad_sectors;
>>  			int is_bad;
>>  
>> -			is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
>> +			is_bad = is_badblock(rdev, sector, max_sectors,
>>  					     &first_bad, &bad_sectors);
>> -			if (is_bad && first_bad <= r1_bio->sector) {
>> +			if (is_bad && first_bad <= sector) {
>>  				/* Cannot write here at all */
>> -				bad_sectors -= (r1_bio->sector - first_bad);
>> +				bad_sectors -= (sector - first_bad);
>>  				if (bad_sectors < max_sectors)
>>  					/* mustn't write more than bad_sectors
>>  					 * to other devices yet
>>  					 */
>>  					max_sectors = bad_sectors;
>> -				rdev_dec_pending(rdev, mddev);
>>  				continue;
>
> [Severity: High]
> This is a pre-existing issue, but does this logic skip the rejection of
> REQ_ATOMIC bios if a bad block overlaps the start of the write?
>
> When is_badblock() finds a bad block at or before the write sector, the
> code shrinks max_sectors to force a split at the boundary and then calls
> continue. Because of the continue, the REQ_ATOMIC bio skips the rejection
> logic below.
>
> Could this silently split an atomic write into smaller pieces and submit
> them to the healthy mirrors via bio_submit_split_bioset(), violating
> block-level atomicity guarantees?
>
>>  			}
>>  			if (is_bad) {
> [ ... ]
>>  				if (bio->bi_opf & REQ_ATOMIC) {
>> -					rdev_dec_pending(rdev, mddev);
>> -					goto err_handle;
>> +					bio->bi_status = BLK_STS_NOTSUPP;
>> +					bio_endio(bio);

I misunderstood the comment. it should call bio_io_error(bio) instead.
I'll fix this in v3.

>> +					goto err_dec_pending;
>>  				}
>>  
>> -				good_sectors = first_bad - r1_bio->sector;
>> +				good_sectors = first_bad - sector;
>>  				if (good_sectors < max_sectors)
>>  					max_sectors = good_sectors;
>>  			}
>>  		}
>> +
>> +		atomic_inc(&rdev->nr_pending);
>
> [Severity: Critical]
> Does moving atomic_inc(&rdev->nr_pending) down open a race window with
> hot_remove_disk()?
>
> Because nr_pending acts as an existence lock that prevents the device from
> being removed from the array configuration by raid1_remove_conf(),
> incrementing it after the is_badblock() and test_bit(WriteErrorSeen, ...)
> checks means md_check_recovery() could evaluate rdev_removeable() while
> nr_pending is 0 and call hot_remove_disk().
>
> If raid1_remove_conf() unlinks the device by setting
> conf->mirrors[i].rdev = NULL, when this thread resumes and the I/O
> eventually fails, would the error handling path (handle_write_finished()
> calling narrow_write_error()) read conf->mirrors[i].rdev and dereference a
> NULL pointer?
>
> Additionally, in clustered configurations, if the unlinked rdev is
> immediately freed by md_kick_rdev_from_array(), could this lead to a
> use-after-free directly in raid1_write_request() when evaluating
> test_bit(WriteErrorSeen, &rdev->flags) or is_badblock()?
>
>>  		r1_bio->bios[i] = bio;
>>  	}
>
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260613182810.1317258-1-abd.masalkhi@gmail.com?part=4

-- 
Best Regards,
Abd-Alrhman

     prev parent reply	other threads:[~2026-06-15  8:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures Abd-Alrhman Masalkhi
2026-06-13 18:40   ` sashiko-bot
2026-06-13 18:28 ` [PATCH v2 3/4] md/raid10: fix writes_pending and barrier reference leaks on discard failures Abd-Alrhman Masalkhi
2026-06-13 18:28 ` [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling Abd-Alrhman Masalkhi
2026-06-13 18:47   ` sashiko-bot
2026-06-15  8:11     ` Abd-Alrhman Masalkhi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m21pe8xlxv.fsf@gmail.com \
    --to=abd.masalkhi@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=yukuai@fygo.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.