Linux RAID subsystem development
 help / color / mirror / Atom feed
* [PATCH] md/raid5: protect batch_head->bm_seq updates
@ 2026-06-18  6:55 Chen Cheng
  2026-06-18 10:36 ` Paul Menzel
  0 siblings, 1 reply; 5+ messages in thread
From: Chen Cheng @ 2026-06-18  6:55 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, chenchneg33, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

bm_seq means "stripe delay to flush until bm_seq <= seq_write".

do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
when bm_seq >= seq_write.

after raid5d() flushes bitmap update and ++seq_write, and
active_bit_delay() retry to release delayed stripes.

the stripe batch head must carry the newest bm_seq among all
member stripes, because the whole batch later released according
to the batch head state and bm_seq.

race scenario:
===================
1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
3. cpu1 write 102, and cpu0 overwrite with 101;

the point is, if the head has a lower bm_seq than one of its
members, the whole batch could be released before that
member's bitmap is flushed.
and the on-disk bitmap not record sh1's changes.

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/raid5.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a08230aac711..ee145a7bf9e8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct r5conf *conf,
 		/*
 		 * at this point, head's BATCH_READY could be cleared, but we
 		 * can still add the stripe to batch list
 		 */
 		list_add(&sh->batch_list, &head->batch_list);
-		spin_unlock(&head->batch_head->batch_lock);
 	} else {
 		head->batch_head = head;
 		sh->batch_head = head->batch_head;
 		spin_lock(&head->batch_lock);
 		list_add_tail(&sh->batch_list, &head->batch_list);
-		spin_unlock(&head->batch_lock);
 	}
 
-	if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
-		if (atomic_dec_return(&conf->preread_active_stripes)
-		    < IO_THRESHOLD)
-			md_wakeup_thread(conf->mddev->thread);
-
 	if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
 		int seq = sh->bm_seq;
 		if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
 		    sh->batch_head->bm_seq - seq > 0)
 			seq = sh->batch_head->bm_seq;
 		set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
 		sh->batch_head->bm_seq = seq;
 	}
+	spin_unlock(&head->batch_head->batch_lock);
+
+	if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+		if (atomic_dec_return(&conf->preread_active_stripes)
+		    < IO_THRESHOLD)
+			md_wakeup_thread(conf->mddev->thread);
 
 	atomic_inc(&sh->count);
 unlock_out:
 	unlock_two_stripes(head, sh);
 out:
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
  2026-06-18  6:55 [PATCH] md/raid5: protect batch_head->bm_seq updates Chen Cheng
@ 2026-06-18 10:36 ` Paul Menzel
  2026-06-18 11:26   ` Chen Cheng
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Menzel @ 2026-06-18 10:36 UTC (permalink / raw)
  To: Chen Cheng; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel

Dear Chen,


Thank you for very much.

Am 18.06.26 um 08:55 schrieb Chen Cheng:
> From: Chen Cheng <chencheng@fnnas.com>
> 
> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
> 
> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
> when bm_seq >= seq_write.
> 
> after raid5d() flushes bitmap update and ++seq_write, and
> active_bit_delay() retry to release delayed stripes.
> 
> the stripe batch head must carry the newest bm_seq among all
> member stripes, because the whole batch later released according
> to the batch head state and bm_seq.
> 
> race scenario:
> ===================
> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
> 3. cpu1 write 102, and cpu0 overwrite with 101;
> 
> the point is, if the head has a lower bm_seq than one of its
> members, the whole batch could be released before that
> member's bitmap is flushed.
> and the on-disk bitmap not record sh1's changes.

It’s a little hard to read. Could you please improve the wording of the 
last paragraph, and maybe also start each sentence with a capital 
letter. Maybe also use 75 characters per line.

Do you have a reproducer by any chance?

> Signed-off-by: Chen Cheng <chencheng@fnnas.com>

Also add a Fixes: tag?

> ---
>   drivers/md/raid5.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index a08230aac711..ee145a7bf9e8 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct r5conf *conf,
>   		/*
>   		 * at this point, head's BATCH_READY could be cleared, but we
>   		 * can still add the stripe to batch list
>   		 */
>   		list_add(&sh->batch_list, &head->batch_list);
> -		spin_unlock(&head->batch_head->batch_lock);
>   	} else {
>   		head->batch_head = head;
>   		sh->batch_head = head->batch_head;
>   		spin_lock(&head->batch_lock);
>   		list_add_tail(&sh->batch_list, &head->batch_list);
> -		spin_unlock(&head->batch_lock);
>   	}
>   
> -	if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> -		if (atomic_dec_return(&conf->preread_active_stripes)
> -		    < IO_THRESHOLD)
> -			md_wakeup_thread(conf->mddev->thread);
> -
>   	if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
>   		int seq = sh->bm_seq;
>   		if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
>   		    sh->batch_head->bm_seq - seq > 0)
>   			seq = sh->batch_head->bm_seq;
>   		set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
>   		sh->batch_head->bm_seq = seq;
>   	}
> +	spin_unlock(&head->batch_head->batch_lock);
> +
> +	if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> +		if (atomic_dec_return(&conf->preread_active_stripes)
> +		    < IO_THRESHOLD)
> +			md_wakeup_thread(conf->mddev->thread);
>   
>   	atomic_inc(&sh->count);
>   unlock_out:
>   	unlock_two_stripes(head, sh);
>   out:


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
  2026-06-18 10:36 ` Paul Menzel
@ 2026-06-18 11:26   ` Chen Cheng
  2026-06-18 12:15     ` Paul Menzel
  0 siblings, 1 reply; 5+ messages in thread
From: Chen Cheng @ 2026-06-18 11:26 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel

在 2026/6/18 18:36, Paul Menzel 写道:
> Dear Chen,
> 
> 
> Thank you for very much.
> 
> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>> From: Chen Cheng <chencheng@fnnas.com>
>>
>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>
>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>> when bm_seq >= seq_write.
>>
>> after raid5d() flushes bitmap update and ++seq_write, and
>> active_bit_delay() retry to release delayed stripes.
>>
>> the stripe batch head must carry the newest bm_seq among all
>> member stripes, because the whole batch later released according
>> to the batch head state and bm_seq.
>>
>> race scenario:
>> ===================
>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>
>> the point is, if the head has a lower bm_seq than one of its
>> members, the whole batch could be released before that
>> member's bitmap is flushed.
>> and the on-disk bitmap not record sh1's changes.
> 
> It’s a little hard to read. Could you please improve the wording of the 
> last paragraph, and maybe also start each sentence with a capital 
> letter. Maybe also use 75 characters per line.
> 
> Do you have a reproducer by any chance?

Hi Paul,

Thanks to review, and , I will follow your advise.

Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not 
for this one. Because it's reported by sashiko-review bot, and , I think 
it's a true risk.

I will try to make a reproducer for this case later , after I figure-out 
the other KCSAN reports.


> 
>> Signed-off-by: Chen Cheng <chencheng@fnnas.com>
> 
> Also add a Fixes: tag?
> 
>> ---
>>   drivers/md/raid5.c | 13 ++++++-------
>>   1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index a08230aac711..ee145a7bf9e8 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -980,32 +980,31 @@ static void stripe_add_to_batch_list(struct 
>> r5conf *conf,
>>           /*
>>            * at this point, head's BATCH_READY could be cleared, but we
>>            * can still add the stripe to batch list
>>            */
>>           list_add(&sh->batch_list, &head->batch_list);
>> -        spin_unlock(&head->batch_head->batch_lock);
>>       } else {
>>           head->batch_head = head;
>>           sh->batch_head = head->batch_head;
>>           spin_lock(&head->batch_lock);
>>           list_add_tail(&sh->batch_list, &head->batch_list);
>> -        spin_unlock(&head->batch_lock);
>>       }
>> -    if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
>> -        if (atomic_dec_return(&conf->preread_active_stripes)
>> -            < IO_THRESHOLD)
>> -            md_wakeup_thread(conf->mddev->thread);
>> -
>>       if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
>>           int seq = sh->bm_seq;
>>           if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
>>               sh->batch_head->bm_seq - seq > 0)
>>               seq = sh->batch_head->bm_seq;
>>           set_bit(STRIPE_BIT_DELAY, &sh->batch_head->state);
>>           sh->batch_head->bm_seq = seq;
>>       }
>> +    spin_unlock(&head->batch_head->batch_lock);
>> +
>> +    if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
>> +        if (atomic_dec_return(&conf->preread_active_stripes)
>> +            < IO_THRESHOLD)
>> +            md_wakeup_thread(conf->mddev->thread);
>>       atomic_inc(&sh->count);
>>   unlock_out:
>>       unlock_two_stripes(head, sh);
>>   out:
> 
> 
> Kind regards,
> 
> Paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
  2026-06-18 11:26   ` Chen Cheng
@ 2026-06-18 12:15     ` Paul Menzel
  2026-06-18 12:30       ` Chen Cheng
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Menzel @ 2026-06-18 12:15 UTC (permalink / raw)
  To: Chen Cheng; +Cc: linux-raid, yukuai, yukuai, chenchneg33, linux-kernel

Dear Chen,


Am 18.06.26 um 13:26 schrieb Chen Cheng:
> 在 2026/6/18 18:36, Paul Menzel 写道:

>> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>>> From: Chen Cheng <chencheng@fnnas.com>
>>>
>>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>>
>>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>>> when bm_seq >= seq_write.
>>>
>>> after raid5d() flushes bitmap update and ++seq_write, and
>>> active_bit_delay() retry to release delayed stripes.
>>>
>>> the stripe batch head must carry the newest bm_seq among all
>>> member stripes, because the whole batch later released according
>>> to the batch head state and bm_seq.
>>>
>>> race scenario:
>>> ===================
>>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>>
>>> the point is, if the head has a lower bm_seq than one of its
>>> members, the whole batch could be released before that
>>> member's bitmap is flushed.
>>> and the on-disk bitmap not record sh1's changes.
>>
>> It’s a little hard to read. Could you please improve the wording of the
>> last paragraph, and maybe also start each sentence with a capital
>> letter. Maybe also use 75 characters per line.
>>
>> Do you have a reproducer by any chance?

> Thanks to review, and , I will follow your advise.

Thank you for reading my comments.

> Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not
> for this one. Because it's reported by sashiko-review bot, and, I think
> it's a true risk.

Maybe also mention sashiko-review.

> I will try to make a reproducer for this case later, after I figure-out
> the other KCSAN reports.

A reproducer is not required, I was just interested, how the issue was 
found. So don’t spend too much on it or at all.

[…]


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
  2026-06-18 12:15     ` Paul Menzel
@ 2026-06-18 12:30       ` Chen Cheng
  0 siblings, 0 replies; 5+ messages in thread
From: Chen Cheng @ 2026-06-18 12:30 UTC (permalink / raw)
  To: Paul Menzel, Chen Cheng; +Cc: linux-raid, yukuai, yukuai, linux-kernel



在 2026/6/18 20:15, Paul Menzel 写道:
> Dear Chen,
> 
> 
> Am 18.06.26 um 13:26 schrieb Chen Cheng:
>> 在 2026/6/18 18:36, Paul Menzel 写道:
> 
>>> Am 18.06.26 um 08:55 schrieb Chen Cheng:
>>>> From: Chen Cheng <chencheng@fnnas.com>
>>>>
>>>> bm_seq means "stripe delay to flush until bm_seq <= seq_write".
>>>>
>>>> do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
>>>> when bm_seq >= seq_write.
>>>>
>>>> after raid5d() flushes bitmap update and ++seq_write, and
>>>> active_bit_delay() retry to release delayed stripes.
>>>>
>>>> the stripe batch head must carry the newest bm_seq among all
>>>> member stripes, because the whole batch later released according
>>>> to the batch head state and bm_seq.
>>>>
>>>> race scenario:
>>>> ===================
>>>> 1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
>>>> 2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
>>>> 3. cpu1 write 102, and cpu0 overwrite with 101;
>>>>
>>>> the point is, if the head has a lower bm_seq than one of its
>>>> members, the whole batch could be released before that
>>>> member's bitmap is flushed.
>>>> and the on-disk bitmap not record sh1's changes.
>>>
>>> It’s a little hard to read. Could you please improve the wording of the
>>> last paragraph, and maybe also start each sentence with a capital
>>> letter. Maybe also use 75 characters per line.
>>>
>>> Do you have a reproducer by any chance?
> 
>> Thanks to review, and , I will follow your advise.
> 
> Thank you for reading my comments.
> 
>> Actually, I have some reproducer to hit KCSAN reports in RAID-5, but not
>> for this one. Because it's reported by sashiko-review bot, and, I think
>> it's a true risk.
> 
> Maybe also mention sashiko-review.
> 
>> I will try to make a reproducer for this case later, after I figure-out
>> the other KCSAN reports.
> 
> A reproducer is not required, I was just interested, how the issue was 
> found. So don’t spend too much on it or at all.


Well , I think a good reproducer has to be :
- Concurrency as much path as possible ,
	e.g. reshape concurrency with normal I/O, etc...

Optionally ,
- Run some Sanitizer,
- Inject some fault, e.g. bad block, write error, disk drop...
- Provide some background I/O pressure,
- Do some combinations describes above

> 
> […]
> 
> 
> Kind regards,
> 
> Paul


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-18 12:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18  6:55 [PATCH] md/raid5: protect batch_head->bm_seq updates Chen Cheng
2026-06-18 10:36 ` Paul Menzel
2026-06-18 11:26   ` Chen Cheng
2026-06-18 12:15     ` Paul Menzel
2026-06-18 12:30       ` Chen Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox