public inbox for linux-raid@vger.kernel.org
From: "Yu Kuai" <yukuai@fnnas.com>
To: "Abd-Alrhman Masalkhi" <abd.masalkhi@gmail.com>,
	<song@kernel.org>,  <shli@fb.com>, <neilb@suse.com>,
	<linux-raid@vger.kernel.org>,  <linux-kernel@vger.kernel.org>,
	<yukuai@fnnas.com>
Subject: Re: [PATCH] md/raid1: fix bio splitting in raid1 thread to avoid recursion and deadlock
Date: Tue, 28 Apr 2026 16:54:54 +0800	[thread overview]
Message-ID: <2cf6f585-a0de-4c84-9cfc-05e1f6fde549@fnnas.com> (raw)
In-Reply-To: <20260427103446.300378-1-abd.masalkhi@gmail.com>

Hi,

On 2026/4/27 18:34, Abd-Alrhman Masalkhi wrote:
> Splitting a bio while executing in the raid1 thread can lead to
> recursion, as task->bio_list is NULL in this context.
>
> In addition, resubmitting an md_cloned_bio after splitting may lead to
> a deadlock if the array is suspended before the md driver calls
> percpu_ref_tryget_live(&mddev->active_io) on its path to
> pers->make_request().

I don't understand. I agree this is problematic in the suspend case, but
what's wrong with task->bio_list being NULL? It can only cause reversed
submission order, because the split bio will be submitted first. However,
this is not a big deal, as this is the slow error path.
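
For reference, the deferral that current->bio_list provides in
submit_bio_noacct() works roughly like the sketch below (simplified, not
the exact code in block/blk-core.c): with a NULL bio_list the submission
proceeds immediately instead of being queued for the outer loop.

/* Simplified sketch of the submit_bio_noacct() deferral; not the real code. */
void submit_bio_noacct_sketch(struct bio *bio)
{
	struct bio_list on_stack;

	if (current->bio_list) {
		/* re-entered from a ->submit_bio(): just queue, no recursion */
		bio_list_add(current->bio_list, bio);
		return;
	}

	bio_list_init(&on_stack);
	current->bio_list = &on_stack;	/* mark that submission is in progress */
	do {
		/* drives ->submit_bio(); any split re-enters the branch above */
		__submit_bio(bio);
	} while ((bio = bio_list_pop(&on_stack)) != NULL);
	current->bio_list = NULL;
}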

If suspend is the only problem here, the simple fix is to add a check
in md_handle_request().
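
For illustration, the suspend handling in md_handle_request() looks roughly
like the sketch below (simplified; bio_is_md_clone() is a hypothetical test
that only marks where such a check could sit, and failing the bio is just
one possible action):

/* Rough sketch of md_handle_request() (drivers/md/md.c), simplified. */
void md_handle_request(struct mddev *mddev, struct bio *bio)
{
check_suspended:
	if (is_suspended(mddev, bio)) {
		if (bio->bi_opf & REQ_NOWAIT) {
			bio_wouldblock_error(bio);
			return;
		}
		/*
		 * Hypothetical check: a bio resubmitted by the raid1 thread
		 * (an md-cloned bio) must not sleep here waiting for resume,
		 * otherwise it can deadlock against the suspend.
		 */
		if (bio_is_md_clone(bio)) {	/* hypothetical helper */
			bio_io_error(bio);	/* exact action to be decided */
			return;
		}
		/* existing behaviour: wait until the array is resumed */
		wait_event(mddev->sb_wait, !is_suspended(mddev, bio));
	}
	if (!percpu_ref_tryget_live(&mddev->active_io))
		goto check_suspended;

	if (!mddev->pers->make_request(mddev, bio)) {
		percpu_ref_put(&mddev->active_io);
		goto check_suspended;
	}
	percpu_ref_put(&mddev->active_io);
}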

>
> Avoid splitting the bio in this context and require that the bio be
> either read in full or not at all.
>
> This prevents recursion and avoids potential deadlocks during array
> suspension.
>
> Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
> ---
> I sent an email about this issue two days ago, but at the time I was not
> sure whether it was a real problem or a misunderstanding on my part.
>
> After further analysis, it appears that this issue can occur.
>
> Apologies for the earlier confusion, and thank you for your time.
>
> Abd-Alrhman
> ---
>   drivers/md/raid1.c | 33 ++++++++++++++++++++++++---------
>   1 file changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index cc9914bd15c1..14f6d6625811 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -607,7 +607,7 @@ static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
>   
>   		/* choose the first disk even if it has some bad blocks. */
>   		read_len = raid1_check_read_range(rdev, this_sector, &len);
> -		if (read_len > 0) {
> +		if (read_len > 0 && (!*max_sectors || read_len == r1_bio->sectors)) {
>   			update_read_sectors(conf, disk, this_sector, read_len);
>   			*max_sectors = read_len;
>   			return disk;
> @@ -704,8 +704,13 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
>   	}
>   
>   	if (bb_disk != -1) {
> -		*max_sectors = bb_read_len;
> -		update_read_sectors(conf, bb_disk, this_sector, bb_read_len);
> +		if (!*max_sectors || bb_read_len == r1_bio->sectors) {
> +			*max_sectors = bb_read_len;
> +			update_read_sectors(conf, bb_disk, this_sector,
> +					    bb_read_len);
> +		} else {
> +			bb_disk = -1;
> +		}
>   	}
>   
>   	return bb_disk;
> @@ -852,8 +857,9 @@ static int choose_best_rdev(struct r1conf *conf, struct r1bio *r1_bio)
>    * disks and disks with bad blocks for now. Only pay attention to key disk
>    * choice.
>    *
> - * 3) If we've made it this far, now look for disks with bad blocks and choose
> - * the one with most number of sectors.
> + * 3) If we've made it this far and *max_sectors is 0 (i.e., we are tolerant
> + * of bad blocks), look for disks with bad blocks and choose the one with
> + * the most sectors.
>    *
>    * 4) If we are all the way at the end, we have no choice but to use a disk even
>    * if it is write mostly.
> @@ -882,11 +888,13 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio,
>   	/*
>   	 * If we are here it means we didn't find a perfectly good disk so
>   	 * now spend a bit more time trying to find one with the most good
> -	 * sectors.
> +	 * sectors, but only if we are tolerant of bad blocks.
>   	 */
> -	disk = choose_bb_rdev(conf, r1_bio, max_sectors);
> -	if (disk >= 0)
> -		return disk;
> +	if (!*max_sectors) {
> +		disk = choose_bb_rdev(conf, r1_bio, max_sectors);
> +		if (disk >= 0)
> +			return disk;
> +	}
>   
>   	return choose_slow_rdev(conf, r1_bio, max_sectors);
>   }
> @@ -1346,7 +1354,14 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>   	/*
>   	 * make_request() can abort the operation when read-ahead is being
>   	 * used and no empty request is available.
> +	 *
> +	 * If we allow splitting the bio while executing in the raid1 thread,
> +	 * we may end up recursing (current->bio_list is NULL), and we might
> +	 * also deadlock if we try to suspend the array, since we are
> +	 * resubmitting an md_cloned_bio. Therefore, we must either read
> +	 * all the sectors or none.
>   	 */
> +	max_sectors = r1bio_existed;
>   	rdisk = read_balance(conf, r1_bio, &max_sectors);
>   	if (rdisk < 0) {
>   		/* couldn't find anywhere to read from */

-- 
Thanks,
Kuai

Thread overview:
2026-04-27 10:34 [PATCH] md/raid1: fix bio splitting in raid1 thread to avoid recursion and deadlock Abd-Alrhman Masalkhi
2026-04-27 14:49 ` Paul Menzel
2026-04-27 17:44   ` Abd-Alrhman Masalkhi
2026-04-28  8:16   ` Abd-Alrhman Masalkhi
2026-04-28  8:54 ` Yu Kuai [this message]
2026-04-28  9:46   ` Abd-Alrhman Masalkhi
