* [PATCH] md/raid5: protect batch_head->bm_seq updates
@ 2026-06-18 6:55 Chen Cheng
2026-06-18 10:36 ` Paul Menzel
0 siblings, 1 reply; 5+ messages in thread
From: Chen Cheng @ 2026-06-18 6:55 UTC (permalink / raw)
To: linux-raid, yukuai, yukuai; +Cc: chencheng, chenchneg33, linux-kernel
From: Chen Cheng <chencheng@fnnas.com>
bm_seq means "stripe delay to flush until bm_seq <= seq_write".
do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
when bm_seq >= seq_write.
after raid5d() flushes bitmap update and ++seq_write, and
active_bit_delay() retry to release delayed stripes.
the stripe batch head must carry the newest bm_seq among all
member stripes, because the whole batch later released according
to the batch head state and bm_seq.
race scenario:
===================
1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
3. cpu1 write 102, and cpu0 overwrite with 101;
the point is, if the head has a lower bm_seq than one of its
members, the whole batch could be released before that
member's bitmap is flushed.
and the on-disk bitmap not record sh1's changes.
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
drivers/md/raid5.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a08230aac711..ee145a7bf9e8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct r5conf *conf,
/*
* at this point, head's BATCH_READY could be cleared, but we
* can still add the stripe to batch list
*/
list_add(&sh->batch_list, &head->batch_list);
- spin_unlock(&head->batch_head->batch_lock);
} else {
head->batch_head = head;
sh->batch_head = head->batch_head;
spin_lock(&head->batch_lock);
list_add_tail(&sh->batch_list, &head->batch_list);
- spin_unlock(&head->batch_lock);
}
- if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
- if (atomic_dec_return(&conf->preread_active_stripes)
- < IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
-
if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
int seq = sh->bm_seq;
if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
sh->batch_head->bm_seq - seq > 0)
seq = sh->batch_head->bm_seq;
set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
sh->batch_head->bm_seq = seq;
}
+ spin_unlock(&head->batch_head->batch_lock);
+
+ if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+ if (atomic_dec_return(&conf->preread_active_stripes)
+ < IO_THRESHOLD)
+ md_wakeup_thread(conf->mddev->thread);
atomic_inc(&sh->count);
unlock_out:
unlock_two_stripes(head, sh);
out:
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
2026-06-18 6:55 [PATCH] md/raid5: protect batch_head->bm_seq updates Chen Cheng
@ 2026-06-18 10:36 ` Paul Menzel
2026-06-18 11:26 ` Chen Cheng
0 siblings, 1 reply; 5+ messages in thread
From: Paul Menzel @ 2026-06-18 10:36 UTC (permalink / raw)
To: Chen Cheng; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel
Dear Chen,
Thank you for very much.
Am 18.06.26 um 08:55 schrieb Chen Cheng:
> From: Chen Cheng <chencheng@fnnas.com>
>
> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>
> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
> when bm_seq >= seq_write.
>
> after raid5d() flushes bitmap update and ++seq_write, and
> active_bit_delay() retry to release delayed stripes.
>
> the stripe batch head must carry the newest bm_seq among all
> member stripes, because the whole batch later released according
> to the batch head state and bm_seq.
>
> race scenario:
> ===================
> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
> 3. cpu1 write 102, and cpu0 overwrite with 101;
>
> the point is, if the head has a lower bm_seq than one of its
> members, the whole batch could be released before that
> member's bitmap is flushed.
> and the on-disk bitmap not record sh1's changes.
It’s a little hard to read. Could you please improve the wording of the
last paragraph, and maybe also start each sentence with a capital
letter. Maybe also use 75 characters per line.
Do you have a reproducer by any chance?
> Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Also add a Fixes: tag?
> ---
> drivers/md/raid5.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index a08230aac711..ee145a7bf9e8 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct r5conf *conf,
> /*
> * at this point, head's BATCH_READY could be cleared, but we
> * can still add the stripe to batch list
> */
> list_add(&sh->batch_list, &head->batch_list);
> - spin_unlock(&head->batch_head->batch_lock);
> } else {
> head->batch_head = head;
> sh->batch_head = head->batch_head;
> spin_lock(&head->batch_lock);
> list_add_tail(&sh->batch_list, &head->batch_list);
> - spin_unlock(&head->batch_lock);
> }
>
> - if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> - if (atomic_dec_return(&conf->preread_active_stripes)
> - < IO_THRESHOLD)
> - md_wakeup_thread(conf->mddev->thread);
> -
> if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
> int seq = sh->bm_seq;
> if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
> sh->batch_head->bm_seq - seq > 0)
> seq = sh->batch_head->bm_seq;
> set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
> sh->batch_head->bm_seq = seq;
> }
> + spin_unlock(&head->batch_head->batch_lock);
> +
> + if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> + if (atomic_dec_return(&conf->preread_active_stripes)
> + < IO_THRESHOLD)
> + md_wakeup_thread(conf->mddev->thread);
>
> atomic_inc(&sh->count);
> unlock_out:
> unlock_two_stripes(head, sh);
> out:
Kind regards,
Paul
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
2026-06-18 10:36 ` Paul Menzel
@ 2026-06-18 11:26 ` Chen Cheng
2026-06-18 12:15 ` Paul Menzel
0 siblings, 1 reply; 5+ messages in thread
From: Chen Cheng @ 2026-06-18 11:26 UTC (permalink / raw)
To: Paul Menzel; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel
在 2026/6/18 18:36, Paul Menzel 写道:
> Dear Chen,
>
>
> Thank you for very much.
>
> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>> From: Chen Cheng <chencheng@fnnas.com>
>>
>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>
>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>> when bm_seq >= seq_write.
>>
>> after raid5d() flushes bitmap update and ++seq_write, and
>> active_bit_delay() retry to release delayed stripes.
>>
>> the stripe batch head must carry the newest bm_seq among all
>> member stripes, because the whole batch later released according
>> to the batch head state and bm_seq.
>>
>> race scenario:
>> ===================
>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>
>> the point is, if the head has a lower bm_seq than one of its
>> members, the whole batch could be released before that
>> member's bitmap is flushed.
>> and the on-disk bitmap not record sh1's changes.
>
> It’s a little hard to read. Could you please improve the wording of the
> last paragraph, and maybe also start each sentence with a capital
> letter. Maybe also use 75 characters per line.
>
> Do you have a reproducer by any chance?
Hi Paul,
Thanks to review, and , I will follow your advise.
Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not
for this one. Because it's reported by sashiko-review bot, and , I think
it's a true risk.
I will try to make a reproducer for this case later , after I figure-out
the other KCSAN reports.
>
>> Signed-off-by: Chen Cheng <chencheng@fnnas.com>
>
> Also add a Fixes: tag?
>
>> ---
>> drivers/md/raid5.c | 13 ++++++-------
>> 1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index a08230aac711..ee145a7bf9e8 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct
>> r5conf *conf,
>> /*
>> * at this point, head's BATCH_READY could be cleared, but we
>> * can still add the stripe to batch list
>> */
>> list_add(&sh->batch_list, &head->batch_list);
>> - spin_unlock(&head->batch_head->batch_lock);
>> } else {
>> head->batch_head = head;
>> sh->batch_head = head->batch_head;
>> spin_lock(&head->batch_lock);
>> list_add_tail(&sh->batch_list, &head->batch_list);
>> - spin_unlock(&head->batch_lock);
>> }
>> - if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
>> - if (atomic_dec_return(&conf->preread_active_stripes)
>> - < IO_THRESHOLD)
>> - md_wakeup_thread(conf->mddev->thread);
>> -
>> if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
>> int seq = sh->bm_seq;
>> if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
>> sh->batch_head->bm_seq - seq > 0)
>> seq = sh->batch_head->bm_seq;
>> set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
>> sh->batch_head->bm_seq = seq;
>> }
>> + spin_unlock(&head->batch_head->batch_lock);
>> +
>> + if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
>> + if (atomic_dec_return(&conf->preread_active_stripes)
>> + < IO_THRESHOLD)
>> + md_wakeup_thread(conf->mddev->thread);
>> atomic_inc(&sh->count);
>> unlock_out:
>> unlock_two_stripes(head, sh);
>> out:
>
>
> Kind regards,
>
> Paul
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
2026-06-18 11:26 ` Chen Cheng
@ 2026-06-18 12:15 ` Paul Menzel
2026-06-18 12:30 ` Chen Cheng
0 siblings, 1 reply; 5+ messages in thread
From: Paul Menzel @ 2026-06-18 12:15 UTC (permalink / raw)
To: Chen Cheng; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel
Dear Chen,
Am 18.06.26 um 13:26 schrieb Chen Cheng:
> 在 2026/6/18 18:36, Paul Menzel 写道:
>> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>>> From: Chen Cheng <chencheng@fnnas.com>
>>>
>>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>>
>>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>>> when bm_seq >= seq_write.
>>>
>>> after raid5d() flushes bitmap update and ++seq_write, and
>>> active_bit_delay() retry to release delayed stripes.
>>>
>>> the stripe batch head must carry the newest bm_seq among all
>>> member stripes, because the whole batch later released according
>>> to the batch head state and bm_seq.
>>>
>>> race scenario:
>>> ===================
>>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>>
>>> the point is, if the head has a lower bm_seq than one of its
>>> members, the whole batch could be released before that
>>> member's bitmap is flushed.
>>> and the on-disk bitmap not record sh1's changes.
>>
>> It’s a little hard to read. Could you please improve the wording of the
>> last paragraph, and maybe also start each sentence with a capital
>> letter. Maybe also use 75 characters per line.
>>
>> Do you have a reproducer by any chance?
> Thanks to review, and , I will follow your advise.
Thank you for reading my comments.
> Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not
> for this one. Because it's reported by sashiko-review bot, and, I think
> it's a true risk.
Maybe also mention sashiko-review.
> I will try to make a reproducer for this case later, after I figure-out
> the other KCSAN reports.
A reproducer is not required, I was just interested, how the issue was
found. So don’t spend too much on it or at all.
[…]
Kind regards,
Paul
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
2026-06-18 12:15 ` Paul Menzel
@ 2026-06-18 12:30 ` Chen Cheng
0 siblings, 0 replies; 5+ messages in thread
From: Chen Cheng @ 2026-06-18 12:30 UTC (permalink / raw)
To: Paul Menzel, Chen Cheng; +Cc: linux-raid, yukuai, yukuai, linux-kernel
在 2026/6/18 20:15, Paul Menzel 写道:
> Dear Chen,
>
>
> Am 18.06.26 um 13:26 schrieb Chen Cheng:
>> 在 2026/6/18 18:36, Paul Menzel 写道:
>
>>> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>>>> From: Chen Cheng <chencheng@fnnas.com>
>>>>
>>>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>>>
>>>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>>>> when bm_seq >= seq_write.
>>>>
>>>> after raid5d() flushes bitmap update and ++seq_write, and
>>>> active_bit_delay() retry to release delayed stripes.
>>>>
>>>> the stripe batch head must carry the newest bm_seq among all
>>>> member stripes, because the whole batch later released according
>>>> to the batch head state and bm_seq.
>>>>
>>>> race scenario:
>>>> ===================
>>>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>>>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>>>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>>>
>>>> the point is, if the head has a lower bm_seq than one of its
>>>> members, the whole batch could be released before that
>>>> member's bitmap is flushed.
>>>> and the on-disk bitmap not record sh1's changes.
>>>
>>> It’s a little hard to read. Could you please improve the wording of the
>>> last paragraph, and maybe also start each sentence with a capital
>>> letter. Maybe also use 75 characters per line.
>>>
>>> Do you have a reproducer by any chance?
>
>> Thanks to review, and , I will follow your advise.
>
> Thank you for reading my comments.
>
>> Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not
>> for this one. Because it's reported by sashiko-review bot, and, I think
>> it's a true risk.
>
> Maybe also mention sashiko-review.
>
>> I will try to make a reproducer for this case later, after I figure-out
>> the other KCSAN reports.
>
> A reproducer is not required, I was just interested, how the issue was
> found. So don’t spend too much on it or at all.
Well , I think a good reproducer has to be :
- Concurrency as much path as possible ,
e.g. reshape concurrency with normal I/O, etc...
Optionally ,
- Run some Sanitizer,
- Inject some fault, e.g. bad block, write error, disk drop...
- Provide some background I/O pressure,
- Do some combinations describes above
>
> […]
>
>
> Kind regards,
>
> Paul
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-18 12:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18 6:55 [PATCH] md/raid5: protect batch_head->bm_seq updates Chen Cheng
2026-06-18 10:36 ` Paul Menzel
2026-06-18 11:26 ` Chen Cheng
2026-06-18 12:15 ` Paul Menzel
2026-06-18 12:30 ` Chen Cheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox