Linux RAID subsystem development
From: "chencheng" <chencheng@fnnas.com>
To: <sashiko-reviews@lists.linux.dev>
Cc: <yukuai@fygo.io>, <linux-raid@vger.kernel.org>
Subject: Re: [PATCH v2] md/raid5: fix reshape deadlock while failed devices more than max degraded
Date: Tue, 16 Jun 2026 12:04:30 +0800
Message-ID: <ddaaa79b-6171-40a5-a004-0178ecc0a44f@fnnas.com>
In-Reply-To: <20260615114929.E5BAD1F00A3A@smtp.kernel.org>

On 2026/6/15 19:49, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Critical] Clearing STRIPE_EXPANDING prematurely in handle_failed_reshape() allows a destination stripe to be freed and reused while asynchronous DMA is still writing to its pages, leading to DMA-after-free memory corruption.
> --
>
> commit ec55b78c975b4f9f35e287b83bee72d76b92b33a
> Author: Chen Cheng <chencheng@fnnas.com>
>
> md/raid5: fix reshape deadlock while failed devices more than max degraded
>      
> This patch fixes a deadlock in raid5 reshape by cleaning up stripe states
> (including destination and source stripes) when failed devices exceed the
> max degraded count.
>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index d0d7f96fd7cd9..2f4d82bd7c2e6 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3730,6 +3730,78 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
>>   		md_sync_error(conf->mddev);
>>   }
>>   
>> +/*
>> + * handle_failed_reshape - handl failed stripes when reshape failed and
>> + *			   degraded devices >= max_degraded
> [ ... ]
>
>> +static void handle_failed_reshape(struct r5conf *conf, struct stripe_head *sh,
>> +				  struct stripe_head_state *s)
>> +{
>> +	int i;
> [ ... ]
>
>> +	/* release the destination stripes that are waiting to be filled */
>> +	if (test_and_clear_bit(STRIPE_EXPAND_SOURCE, &sh->state)) {
>> +		for (i = 0; i < sh->disks; i++) {
>> +			int dd_idx;
>> +			struct stripe_head *sh2;
>> +			sector_t bn, sec;
>> +
>> +			if (i == sh->pd_idx)
>> +				continue;
>> +			if (conf->level == 6 && i == sh->qd_idx)
>> +				continue;
>> +
>> +			bn = raid5_compute_blocknr(sh, i, 1);
>> +			sec = raid5_compute_sector(conf, bn, 0, &dd_idx, NULL);
>> +			sh2 = raid5_get_active_stripe(conf, NULL, sec,
>> +					R5_GAS_NOBLOCK | R5_GAS_NOQUIESCE);
>> +			if (!sh2)
>> +				continue;
>> +
>> +			if (test_and_clear_bit(STRIPE_EXPANDING, &sh2->state)) {
> [Severity: Critical]
> Could clearing STRIPE_EXPANDING prematurely here lead to a DMA-after-free?
>
> During a reshape, a destination stripe (sh2) might require data from multiple
> source stripes. STRIPE_EXPANDING normally prevents the destination stripe from
> being returned to the free list before all data is copied.
>
> If Worker 1 is handling a source stripe via handle_stripe_expansion(), it
> submits an asynchronous DMA copy to sh2's pages and then calls
> raid5_release_stripe(sh2) to drop its reference BEFORE waiting for the DMA to
> complete:
>
> drivers/md/raid5.c:handle_stripe_expansion() {
>      ...
>              raid5_release_stripe(sh2);
>          }
>      /* done submitting copies, wait for them to complete */
>      async_tx_quiesce(&tx);
>      ...
> }
>
> If the array concurrently fails, Worker 2 could enter handle_failed_reshape()
> for a different source stripe, look up sh2, clear its STRIPE_EXPANDING bit,
> and release it.
>
> If sh2's refcount drops to 0 without STRIPE_EXPANDING set, could it be
> added to the inactive_list and reallocated (or have its pages freed) while
> the DMA from Worker 1 is still actively writing to it?


That makes sense. However, I can't find a card or processor that supports
offloaded memcpy, nor a way to emulate one, so I can't reproduce this scenario.


In theory, the fix I can think of is to use a refcount that tracks the number
of offloaded memcpy tasks still in flight. The destination stripe is cleaned up
only once the count reaches zero; otherwise, handling of the source stripe is
delayed.
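
To make the refcount idea concrete, here is a rough user-space C sketch. It is
only a model under stated assumptions, not a proposed patch: struct dst_stripe,
DST_FAILED, and the dst_*() helpers are all hypothetical names that do not
exist in drivers/md/raid5.c, and "cleanup" merely stands in for clearing
STRIPE_EXPANDING and releasing the destination stripe. The count of in-flight
copies and a "failed" flag share one atomic word, so exactly one caller ends up
doing the cleanup: either the failure path (no copies pending) or the last copy
completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, NOT kernel code. dst_stripe stands in for
 * the destination stripe_head; "cleaned" stands in for clearing
 * STRIPE_EXPANDING and releasing the stripe.
 */
#define DST_FAILED 0x40000000u /* assumed flag bit: failure cleanup requested */

struct dst_stripe {
	atomic_uint state;   /* DST_FAILED flag | number of in-flight copies */
	atomic_bool cleaned; /* set once, by whichever caller wins the cleanup */
};

static void dst_init(struct dst_stripe *s)
{
	atomic_init(&s->state, 0);
	atomic_init(&s->cleaned, false);
}

static void dst_cleanup(struct dst_stripe *s)
{
	bool was_cleaned = atomic_exchange(&s->cleaned, true);

	assert(!was_cleaned); /* cleanup must run exactly once */
}

/* Take a reference before submitting an offloaded copy; false once failed. */
static bool dst_copy_begin(struct dst_stripe *s)
{
	unsigned int old = atomic_load(&s->state);

	do {
		if (old & DST_FAILED)
			return false; /* no new copies into a failed stripe */
	} while (!atomic_compare_exchange_weak(&s->state, &old, old + 1u));
	return true;
}

/* Copy completion: the last finished copy performs the deferred cleanup. */
static void dst_copy_end(struct dst_stripe *s)
{
	unsigned int old = atomic_fetch_sub(&s->state, 1u);

	if (old == (DST_FAILED | 1u))
		dst_cleanup(s);
}

/* Failure path: true if cleanup ran now, false if it was deferred. */
static bool dst_fail(struct dst_stripe *s)
{
	unsigned int old = atomic_fetch_or(&s->state, DST_FAILED);

	if (old == 0) {
		dst_cleanup(s); /* nothing in flight: safe to clean up */
		return true;
	}
	return false; /* copies pending, or failure already requested */
}
```

In this model, a worker would call dst_copy_begin() before submitting each copy
and dst_copy_end() from the completion path, while the failure handler calls
dst_fail() and delays the source stripe when it returns false. How this would
map onto the real stripe state bits and locking is left open.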



>> +				atomic_dec(&conf->reshape_stripes);
>> +				wake_up(&conf->wait_for_reshape);
>> +				md_done_sync(conf->mddev,
>> +					     RAID5_STRIPE_SECTORS(conf));
>> +			}
>> +
>> +			clear_bit(STRIPE_EXPAND_READY, &sh2->state);
>> +
>> +			raid5_release_stripe(sh2);
>> +		}
>> +	}
>> +}


Thread overview: 3+ messages
2026-06-15 11:34 [PATCH v2] md/raid5: fix reshape deadlock while failed devices more than max degraded Chen Cheng
2026-06-15 11:49 ` sashiko-bot
2026-06-16  4:04   ` chencheng [this message]
