Linux RAID subsystem development
 help / color / mirror / Atom feed
* [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write
@ 2026-06-16 13:13 Chen Cheng
  2026-06-16 13:30 ` sashiko-bot
  0 siblings, 1 reply; 3+ messages in thread
From: Chen Cheng @ 2026-06-16 13:13 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

kcsan detect race :
- raid5d() closes the current bitmap batch by updating
	conf->seq_flush under conf->device_lock.
- __add_stripe_bio() read conf->seq_flush without that
	lock when assigning sh->bm_seq.

so, protect seq_flush/seq_write by READ_ONCE()/WRITE_ONCE().

re-explain the stripe batch sequence number update flow:
1. sh->bm_seq declare which batch number the stripe belongs to
   when perform bitmap-related write.
	==> bm_seq = seq_flush+1

2. stripe be handled,
	* if sh->bm_seq - conf->seq_write > 0, means the
	  batch stripes **newer than** the last written
	  batch, it cannot proceed yet, queued on bitmap_list.
	* otherwise , has already proceed.

3. raid5d() `++seq_flush` to closes the current batch, means
	* no more stripes join that old batch
	* just-closed batch ready to write-out to disk

4. raid5d() calls bitmap hooks unplug() or writeout, then,
   `++seq_write` to the same as bm_seq.

- seq_flush - for producer, to close batches.
- seq_write - for consumer, the checkpoint number.

the report:
====================================
BUG: KCSAN: data-race in __add_stripe_bio / raid5d

write to 0xffff88ba5625d470 of 4 bytes by task 82401 on cpu 0:
 raid5d+0x1d9/0xba0
 [.....]

read to 0xffff88ba5625d470 of 4 bytes by task 82421 on cpu 8:
 __add_stripe_bio+0x332/0x400
 raid5_make_request+0x6ac/0x2930
 md_handle_request+0x4a2/0xa40
 md_submit_bio+0x109/0x1a0
 __submit_bio+0x2ec/0x390
 [.....]

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/raid5.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a320b71d7117..f3c2959b5606 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3536,11 +3536,11 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d, logical %llu\n",
 		 (*bip)->bi_iter.bi_sector, sh->sector, dd_idx,
 		 sh->dev[dd_idx].sector);
 
 	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
-		sh->bm_seq = conf->seq_flush+1;
+		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
 		set_bit(STRIPE_BIT_DELAY, &sh->state);
 	}
 }
 
 /*
@@ -5827,11 +5827,11 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
 			md_write_inc(mddev, bi);
 			sh->overwrite_disks++;
 		}
 		spin_unlock_irq(&sh->stripe_lock);
 		if (conf->mddev->bitmap) {
-			sh->bm_seq = conf->seq_flush + 1;
+			sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
 			set_bit(STRIPE_BIT_DELAY, &sh->state);
 		}
 
 		set_bit(STRIPE_HANDLE, &sh->state);
 		clear_bit(STRIPE_DELAYED, &sh->state);
@@ -6877,16 +6877,17 @@ static void raid5d(struct md_thread *thread)
 			clear_bit(R5_DID_ALLOC, &conf->cache_state);
 
 		if (
 		    !list_empty(&conf->bitmap_list)) {
 			/* Now is a good time to flush some bitmap updates */
-			conf->seq_flush++;
+			int seq = READ_ONCE(conf->seq_flush) + 1;
+			WRITE_ONCE(conf->seq_flush, seq);
 			spin_unlock_irq(&conf->device_lock);
 			if (md_bitmap_enabled(mddev, true))
 				mddev->bitmap_ops->unplug(mddev, true);
 			spin_lock_irq(&conf->device_lock);
-			conf->seq_write = conf->seq_flush;
+			WRITE_ONCE(conf->seq_write, seq);
 			activate_bit_delay(conf, conf->temp_inactive_list);
 		}
 		raid5_activate_delayed(conf);
 
 		while ((bio = remove_bio_from_retry(conf, &offset))) {
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write
  2026-06-16 13:13 [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write Chen Cheng
@ 2026-06-16 13:30 ` sashiko-bot
  2026-06-17 13:55   ` Chen Cheng
  0 siblings, 1 reply; 3+ messages in thread
From: sashiko-bot @ 2026-06-16 13:30 UTC (permalink / raw)
  To: Chen Cheng; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Missing full memory barriers between the bitmap update/scan and `seq_flush` access allows torn writes.
--

commit 45ef15a300eccd14518bc90e254146a73a49bf51
Author: Chen Cheng <chencheng@fnnas.com>

md/raid5: protect bitmap batch counters aka seq_flush/seq_write

This commit attempts to fix a KCSAN data-race report on conf->seq_flush
in the raid5 code by introducing READ_ONCE() and WRITE_ONCE().

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index d0d7f96fd7cd9..ffc89ebbc4857 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3538,7 +3538,7 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>  		 sh->dev[dd_idx].sector);
>  
>  	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
> -		sh->bm_seq = conf->seq_flush+1;
> +		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;

[Severity: Critical]
This is a pre-existing issue, but does this lockless synchronization require
a full memory barrier (smp_mb()) before reading seq_flush?

The producer sequence updates the in-memory bitmap (which drops a lock,
providing only a Store-Release barrier) and then reads seq_flush. Because
there is no full memory barrier between the store to the bitmap and the load
of seq_flush, weakly-ordered CPUs (and even x86, which allows Store-Load
reordering) can reorder the read of seq_flush before the store to the
bitmap becomes globally visible.

Could this lead to silent data corruption on a crash if the consumer sees
the old bitmap state but the producer assigns the stripe to the current
batch?

>  		set_bit(STRIPE_BIT_DELAY, &sh->state);
>  	}
>  }

[ ... ]

> @@ -5755,7 +5755,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
>  		}
>  		spin_unlock_irq(&sh->stripe_lock);
>  		if (conf->mddev->bitmap) {
> -			sh->bm_seq = conf->seq_flush + 1;
> +			sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;

[Severity: Critical]
This is a pre-existing issue, but the same missing full memory barrier
pattern appears here in the discard path as well.

>  			set_bit(STRIPE_BIT_DELAY, &sh->state);
>  		}
>  

[ ... ]

> @@ -6805,12 +6805,13 @@ static void raid5d(struct md_thread *thread)
>  		if (
>  		    !list_empty(&conf->bitmap_list)) {
>  			/* Now is a good time to flush some bitmap updates */
> -			conf->seq_flush++;
> +			int seq = READ_ONCE(conf->seq_flush) + 1;
> +			WRITE_ONCE(conf->seq_flush, seq);

[Severity: Critical]
This isn't a bug introduced by this patch, but does the consumer side
also rely dangerously on implicit barriers?

The consumer (raid5d) writes seq_flush and relies on an implicit full
barrier deep inside unplug() (via test_and_clear_bit) to order the
subsequent bitmap scan.

If the producer's missing barrier allows the read of seq_flush to be
reordered, the consumer might flush the old batch and advance seq_write,
unblocking the producer's stripe to be written to disk without its
write-intent bitmap bit being flushed.

>  			spin_unlock_irq(&conf->device_lock);
>  			if (md_bitmap_enabled(mddev, true))
>  				mddev->bitmap_ops->unplug(mddev, true);
>  			spin_lock_irq(&conf->device_lock);
> -			conf->seq_write = conf->seq_flush;
> +			WRITE_ONCE(conf->seq_write, seq);
>  			activate_bit_delay(conf, conf->temp_inactive_list);
>  		}
>  		raid5_activate_delayed(conf);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616131310.3790554-1-chencheng@fnnas.com?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write
  2026-06-16 13:30 ` sashiko-bot
@ 2026-06-17 13:55   ` Chen Cheng
  0 siblings, 0 replies; 3+ messages in thread
From: Chen Cheng @ 2026-06-17 13:55 UTC (permalink / raw)
  To: sashiko-reviews, Chen Cheng; +Cc: yukuai, linux-raid


在 2026/6/16 21:30, sashiko-bot@kernel.org 写道:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [Critical] Missing full memory barriers between the bitmap update/scan and `seq_flush` access allows torn writes.
> --
>
> commit 45ef15a300eccd14518bc90e254146a73a49bf51
> Author: Chen Cheng <chencheng@fnnas.com>
>
> md/raid5: protect bitmap batch counters aka seq_flush/seq_write
>
> This commit attempts to fix a KCSAN data-race report on conf->seq_flush
> in the raid5 code by introducing READ_ONCE() and WRITE_ONCE().
>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index d0d7f96fd7cd9..ffc89ebbc4857 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3538,7 +3538,7 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>>   		 sh->dev[dd_idx].sector);
>>   
>>   	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
>> -		sh->bm_seq = conf->seq_flush+1;
>> +		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
> [Severity: Critical]
> This is a pre-existing issue, but does this lockless synchronization require
> a full memory barrier (smp_mb()) before reading seq_flush?
>
> The producer sequence updates the in-memory bitmap (which drops a lock,
> providing only a Store-Release barrier) and then reads seq_flush. Because
> there is no full memory barrier between the store to the bitmap and the load
> of seq_flush, weakly-ordered CPUs (and even x86, which allows Store-Load
> reordering) can reorder the read of seq_flush before the store to the
> bitmap becomes globally visible.
>
> Could this lead to silent data corruption on a crash if the consumer sees
> the old bitmap state but the producer assigns the stripe to the current
> batch?


barrier provide a tool to promise "the things before this statement all 
done", and

I don't think I need such strong promise.


What I need is, "all cpu cache coherence for accesses variable 
seq_flush", so

READ_ONCE/WRITE_ONCE is suitable for this scenario.


>>   		set_bit(STRIPE_BIT_DELAY, &sh->state);
>>   	}
>>   }
> [ ... ]
>
>> @@ -5755,7 +5755,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
>>   		}
>>   		spin_unlock_irq(&sh->stripe_lock);
>>   		if (conf->mddev->bitmap) {
>> -			sh->bm_seq = conf->seq_flush + 1;
>> +			sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
> [Severity: Critical]
> This is a pre-existing issue, but the same missing full memory barrier
> pattern appears here in the discard path as well.
>
>>   			set_bit(STRIPE_BIT_DELAY, &sh->state);
>>   		}
>>   
> [ ... ]
>
>> @@ -6805,12 +6805,13 @@ static void raid5d(struct md_thread *thread)
>>   		if (
>>   		    !list_empty(&conf->bitmap_list)) {
>>   			/* Now is a good time to flush some bitmap updates */
>> -			conf->seq_flush++;
>> +			int seq = READ_ONCE(conf->seq_flush) + 1;
>> +			WRITE_ONCE(conf->seq_flush, seq);
> [Severity: Critical]
> This isn't a bug introduced by this patch, but does the consumer side
> also rely dangerously on implicit barriers?
>
> The consumer (raid5d) writes seq_flush and relies on an implicit full
> barrier deep inside unplug() (via test_and_clear_bit) to order the
> subsequent bitmap scan.
>
> If the producer's missing barrier allows the read of seq_flush to be
> reordered, the consumer might flush the old batch and advance seq_write,
> unblocking the producer's stripe to be written to disk without its
> write-intent bitmap bit being flushed.
>
>>   			spin_unlock_irq(&conf->device_lock);
>>   			if (md_bitmap_enabled(mddev, true))
>>   				mddev->bitmap_ops->unplug(mddev, true);
>>   			spin_lock_irq(&conf->device_lock);
>> -			conf->seq_write = conf->seq_flush;
>> +			WRITE_ONCE(conf->seq_write, seq);
>>   			activate_bit_delay(conf, conf->temp_inactive_list);
>>   		}
>>   		raid5_activate_delayed(conf);

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-17 13:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 13:13 [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write Chen Cheng
2026-06-16 13:30 ` sashiko-bot
2026-06-17 13:55   ` Chen Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox