Linux RAID subsystem development
 help / color / mirror / Atom feed
* [PATCH v2] md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency
@ 2026-06-17 14:28 Chen Cheng
  2026-06-17 14:43 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: Chen Cheng @ 2026-06-17 14:28 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, chenchneg33, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

kcsan detect race :
- raid5d() closes the current bitmap batch by updating
	conf->seq_flush under conf->device_lock.
- __add_stripe_bio() read conf->seq_flush without that
	lock when assigning sh->bm_seq.

so, protect seq_flush/seq_write consistency for multiple CPUs by
READ_ONCE()/WRITE_ONCE() under the path without held device_lock.

re-explain the stripe batch sequence number update flow:
1. sh->bm_seq declare which batch number the stripe belongs to
   when perform bitmap-related write.
	==> bm_seq = seq_flush+1

2. stripe be handled,
	* if sh->bm_seq - conf->seq_write > 0, means the
	  batch stripes **newer than** the last written
	  batch, it cannot proceed yet, queued on bitmap_list.
	* otherwise , has already proceed.

3. raid5d() `++seq_flush` to closes the current batch, means
	* no more stripes join that old batch
	* just-closed batch ready to write-out to disk

4. raid5d() calls bitmap hooks unplug() or writeout, then,
   `++seq_write` to the same as bm_seq.

- seq_flush - for producer, to close batches.
- seq_write - for consumer, the checkpoint number.

the report:
====================================
BUG: KCSAN: data-race in __add_stripe_bio / raid5d

write to 0xffff88ba5625d470 of 4 bytes by task 82401 on cpu 0:
 raid5d+0x1d9/0xba0
 [.....]

read to 0xffff88ba5625d470 of 4 bytes by task 82421 on cpu 8:
 __add_stripe_bio+0x332/0x400
 raid5_make_request+0x6ac/0x2930
 md_handle_request+0x4a2/0xa40
 md_submit_bio+0x109/0x1a0
 __submit_bio+0x2ec/0x390
 [.....]

v1 -> v2:
- remove WRITE_ONCE(conf->seq_write) in held device_lock path.
- remove READ_ONCE(conf->seq_flush) in held device_lock path.

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/raid5.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a320b71d7117..b2c5a1930841 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3536,11 +3536,11 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d, logical %llu\n",
 		 (*bip)->bi_iter.bi_sector, sh->sector, dd_idx,
 		 sh->dev[dd_idx].sector);
 
 	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
-		sh->bm_seq = conf->seq_flush+1;
+		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
 		set_bit(STRIPE_BIT_DELAY, &sh->state);
 	}
 }
 
 /*
@@ -5827,11 +5827,11 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
 			md_write_inc(mddev, bi);
 			sh->overwrite_disks++;
 		}
 		spin_unlock_irq(&sh->stripe_lock);
 		if (conf->mddev->bitmap) {
-			sh->bm_seq = conf->seq_flush + 1;
+			sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
 			set_bit(STRIPE_BIT_DELAY, &sh->state);
 		}
 
 		set_bit(STRIPE_HANDLE, &sh->state);
 		clear_bit(STRIPE_DELAYED, &sh->state);
@@ -6877,16 +6877,17 @@ static void raid5d(struct md_thread *thread)
 			clear_bit(R5_DID_ALLOC, &conf->cache_state);
 
 		if (
 		    !list_empty(&conf->bitmap_list)) {
 			/* Now is a good time to flush some bitmap updates */
-			conf->seq_flush++;
+			int seq = conf->seq_flush + 1;
+			WRITE_ONCE(conf->seq_flush, seq);
 			spin_unlock_irq(&conf->device_lock);
 			if (md_bitmap_enabled(mddev, true))
 				mddev->bitmap_ops->unplug(mddev, true);
 			spin_lock_irq(&conf->device_lock);
-			conf->seq_write = conf->seq_flush;
+			conf->seq_write = seq;
 			activate_bit_delay(conf, conf->temp_inactive_list);
 		}
 		raid5_activate_delayed(conf);
 
 		while ((bio = remove_bio_from_retry(conf, &offset))) {
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH v2] md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency
  2026-06-17 14:28 [PATCH v2] md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency Chen Cheng
@ 2026-06-17 14:43 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-17 14:43 UTC (permalink / raw)
  To: Chen Cheng; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Direct integer comparison of sequence numbers `bm_seq` in `stripe_add_to_batch_list` fails to handle integer wrap-around.
--

commit d138461da7f53106a83faa762ae81dbb822cbfca
Author: Chen Cheng <chencheng@fnnas.com>

md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency

This patch protects seq_flush and seq_write consistency across multiple CPUs.
It uses READ_ONCE() and WRITE_ONCE() to prevent data races detected by KCSAN
when assigning sh->bm_seq without holding device_lock.

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index d0d7f96fd7cd9..891c2cb605d30 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3538,7 +3538,7 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>  		 sh->dev[dd_idx].sector);
>  
>  	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
> -		sh->bm_seq = conf->seq_flush+1;
> +		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
>  		set_bit(STRIPE_BIT_DELAY, &sh->state);
>  	}
>  }

[Severity: Critical]
This is a pre-existing issue, but does stripe_add_to_batch_list() handle
32-bit integer wrap-around correctly when comparing sequence numbers?

Since seq_flush, seq_write, and sh->bm_seq increment for each batch, they
will eventually overflow and wrap around to negative values. In other parts
of the code, this is handled using signed subtraction (e.g.,
sh->bm_seq - conf->seq_write > 0).

However, in stripe_add_to_batch_list(), there is a direct comparison:

drivers/md/raid5.c:stripe_add_to_batch_list() {
    ...
    if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
        int seq = sh->bm_seq;
        if (test_bit(STRIPE_BIT_DELAY, &sh->batch_head->state) &&
            sh->batch_head->bm_seq > seq)
            seq = sh->batch_head->bm_seq;
    ...
}

At the wrap-around boundary, a newer sequence number is numerically less than
an older sequence number. Does this direct comparison incorrectly evaluate to
false, assigning the batch head an older sequence number?

Could this cause the batch to be released early when conf->seq_write catches
up, completing before its newer bitmap updates are fully flushed? Should this
comparison use subtraction instead, such as sh->batch_head->bm_seq - seq > 0?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617142839.882378-1-chencheng@fnnas.com?part=1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-17 14:43 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-17 14:28 [PATCH v2] md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency Chen Cheng
2026-06-17 14:43 ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox