From: "chencheng" <chencheng@fnnas.com>
To: <sashiko-reviews@lists.linux.dev>
Cc: <yukuai@fygo.io>, <linux-raid@vger.kernel.org>
Subject: Re: [PATCH v2] md/raid5: fix reshape deadlock while failed devices more than max degraded
Date: Tue, 16 Jun 2026 12:04:30 +0800
Message-ID: <ddaaa79b-6171-40a5-a004-0178ecc0a44f@fnnas.com>
In-Reply-To: <20260615114929.E5BAD1F00A3A@smtp.kernel.org>
On 2026/6/15 19:49, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Critical] Clearing STRIPE_EXPANDING prematurely in handle_failed_reshape() allows a destination stripe to be freed and reused while asynchronous DMA is still writing to its pages, leading to DMA-after-free memory corruption.
> --
>
> commit ec55b78c975b4f9f35e287b83bee72d76b92b33a
> Author: Chen Cheng <chencheng@fnnas.com>
>
> md/raid5: fix reshape deadlock while failed devices more than max degraded
>
> This patch fixes a deadlock in raid5 reshape by cleaning up stripe states
> (including destination and source stripes) when failed devices exceed the
> max degraded count.
>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index d0d7f96fd7cd9..2f4d82bd7c2e6 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3730,6 +3730,78 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
>> md_sync_error(conf->mddev);
>> }
>>
>> +/*
>> + * handle_failed_reshape - handl failed stripes when reshape failed and
>> + * degraded devices >= max_degraded
> [ ... ]
>
>> +static void handle_failed_reshape(struct r5conf *conf, struct stripe_head *sh,
>> + struct stripe_head_state *s)
>> +{
>> + int i;
> [ ... ]
>
>> + /* release the destination stripes that are waiting to be filled */
>> + if (test_and_clear_bit(STRIPE_EXPAND_SOURCE, &sh->state)) {
>> + for (i = 0; i < sh->disks; i++) {
>> + int dd_idx;
>> + struct stripe_head *sh2;
>> + sector_t bn, sec;
>> +
>> + if (i == sh->pd_idx)
>> + continue;
>> + if (conf->level == 6 && i == sh->qd_idx)
>> + continue;
>> +
>> + bn = raid5_compute_blocknr(sh, i, 1);
>> + sec = raid5_compute_sector(conf, bn, 0, &dd_idx, NULL);
>> + sh2 = raid5_get_active_stripe(conf, NULL, sec,
>> + R5_GAS_NOBLOCK | R5_GAS_NOQUIESCE);
>> + if (!sh2)
>> + continue;
>> +
>> + if (test_and_clear_bit(STRIPE_EXPANDING, &sh2->state)) {
> [Severity: Critical]
> Could clearing STRIPE_EXPANDING prematurely here lead to a DMA-after-free?
>
> During a reshape, a destination stripe (sh2) might require data from multiple
> source stripes. STRIPE_EXPANDING normally prevents the destination stripe from
> being returned to the free list before all data is copied.
>
> If Worker 1 is handling a source stripe via handle_stripe_expansion(), it
> submits an asynchronous DMA copy to sh2's pages and then calls
> raid5_release_stripe(sh2) to drop its reference BEFORE waiting for the DMA to
> complete:
>
> drivers/md/raid5.c:handle_stripe_expansion() {
> ...
> raid5_release_stripe(sh2);
> }
> /* done submitting copies, wait for them to complete */
> async_tx_quiesce(&tx);
> ...
> }
>
> If the array concurrently fails, Worker 2 could enter handle_failed_reshape()
> for a different source stripe, look up sh2, clear its STRIPE_EXPANDING bit,
> and release it.
>
> If sh2's refcount drops to 0 without STRIPE_EXPANDING set, could it be
> added to the inactive_list and reallocated (or have its pages freed) while
> the DMA from Worker 1 is still actively writing to it?
That makes sense. However, I can't find a card or processor that supports
offloaded memcpy, nor a way to emulate one, so I can't reproduce this scenario.
In theory, the fix I can think of is to use a refcount that tracks the number
of in-flight offloaded memcpy tasks: once it reaches zero, the destination
stripe does its cleanup; until then, handling of the source stripe is delayed.
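
To make the refcount idea above concrete, here is a minimal userspace sketch
using C11 atomics. It is not kernel code and has not been tested against real
offload hardware. Every name in it (dst_stripe, pending_copies,
cleanup_wanted, dst_copy_submit, dst_copy_complete, dst_fail_reshape) is
hypothetical and does not exist in raid5.c. It also assumes that submitting a
copy and failing the reshape are serialized against each other by some lock,
which the sketch does not model.

```c
/* Userspace model only; every name here is hypothetical, not raid5.c code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct dst_stripe {
	atomic_int  pending_copies;  /* offloaded copies submitted, not yet completed */
	atomic_bool cleanup_wanted;  /* reshape failed; teardown has been requested */
	atomic_bool cleaned;         /* teardown already ran (prevents running it twice) */
};

/* Run the teardown exactly once, whichever path gets here first. */
static bool dst_try_cleanup(struct dst_stripe *d)
{
	if (atomic_exchange(&d->cleaned, true))
		return false;
	/* Real code would clear the expanding state and release the stripe here. */
	return true;
}

/* Source-stripe path: account for a copy before handing it to the engine. */
static bool dst_copy_submit(struct dst_stripe *d)
{
	if (atomic_load(&d->cleanup_wanted))
		return false;        /* reshape already failed: start no new copies */
	atomic_fetch_add(&d->pending_copies, 1);
	return true;
}

/* Completion callback: the last copy to finish performs a deferred teardown. */
static bool dst_copy_complete(struct dst_stripe *d)
{
	int before = atomic_fetch_sub(&d->pending_copies, 1);

	assert(before > 0);
	if (before == 1 && atomic_load(&d->cleanup_wanted))
		return dst_try_cleanup(d);
	return false;
}

/* Failure path: request teardown; run it now only if no copy is in flight. */
static bool dst_fail_reshape(struct dst_stripe *d)
{
	atomic_store(&d->cleanup_wanted, true);
	if (atomic_load(&d->pending_copies) == 0)
		return dst_try_cleanup(d);
	return false;                /* deferred to the last dst_copy_complete() */
}
```

With two copies in flight, dst_fail_reshape() returns false and only the
second dst_copy_complete() performs the teardown, so the destination pages
stay owned until the last copy has finished.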
>> + atomic_dec(&conf->reshape_stripes);
>> + wake_up(&conf->wait_for_reshape);
>> + md_done_sync(conf->mddev,
>> + RAID5_STRIPE_SECTORS(conf));
>> + }
>> +
>> + clear_bit(STRIPE_EXPAND_READY, &sh2->state);
>> +
>> + raid5_release_stripe(sh2);
>> + }
>> + }
>> +}
Thread overview (3+ messages):
2026-06-15 11:34 [PATCH v2] md/raid5: fix reshape deadlock while failed devices more than max degraded Chen Cheng
2026-06-15 11:49 ` sashiko-bot
2026-06-16 4:04 ` chencheng [this message]