* [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request
@ 2026-06-19 8:11 Chen Cheng
2026-06-20 22:01 ` yu kuai
0 siblings, 1 reply; 2+ messages in thread
From: Chen Cheng @ 2026-06-19 8:11 UTC (permalink / raw)
To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel
From: Chen Cheng <chencheng@fnnas.com>
KCSAN reports race in raid5_make_request() vs. stripe_add_to_batch_list()
Writer flow (stripe_add_to_batch_list):
1. grab `head` stripe;
2. lock_two_stripes(head, sh);
3. re-check stripe_can_batch() for both head and sh, which requires
STRIPE_BATCH_READY set on both;
4. write head->batch_head = head and sh->batch_head = head;
5. unlock_two_stripes.
STRIPE_BATCH_READY is cleared in two places:
- clear_batch_ready(), at the entry of handle_stripe();
- __add_stripe_bio(), for non-batchable bios.
And, both need to acquire `stripe_lock`.
Under stripe_lock, if STRIPE_BATCH_READY is clear, then:
- New writers cannot install a batch_head;
- Existing writers have already finished.
So .. handle_stripe() readers can ready `batch_head` locklessly.
Fix way:
Writer side make_stripe_request() under STRIPE_BATCH_READY, so , need
to be protected by stripe_lock when read something..
v1 -> v2:
- re-expalin how stripe_lock and batch_head work in commit message , and ,
- modify comment in raid5.h.
Fixs: f4aec6a097387
KCSAN report:
======================================
BUG: KCSAN: data-race in raid5_make_request / raid5_make_request
write to 0xffff8f03062432d8 of 8 bytes by task 210246 on cpu 6:
raid5_make_request+0x175e/0x2ab0
md_handle_request+0x2c5/0x700
md_submit_bio+0x126/0x320
[.........]
btrfs_sync_file+0x181/0x970
vfs_fsync_range+0x71/0x110
do_fsync+0x46/0xa0
__x64_sys_fsync+0x20/0x30
read to 0xffff8f03062432d8 of 8 bytes by task 210251 on cpu 0:
raid5_make_request+0x7c7/0x2ab0
md_handle_request+0x2c5/0x700
md_submit_bio+0x126/0x320
[.........]
btrfs_remap_file_range+0x266/0x980
vfs_clone_file_range+0x16d/0x610
ioctl_file_clone+0x64/0xd0
do_vfs_ioctl+0x87f/0xbc0
__x64_sys_ioctl+0xb8/0x130
value changed: 0x0000000000000000 -> 0xffff8f0307798728
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
drivers/md/raid5.c | 2 ++
drivers/md/raid5.h | 8 +++++++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5521051a9425..efc63740f867 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6108,14 +6108,16 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
ctx->do_flush = false;
}
set_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);
+ spin_lock_irq(&sh->stripe_lock);
if ((!sh->batch_head || sh == sh->batch_head) &&
(bi->bi_opf & REQ_SYNC) &&
!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
atomic_inc(&conf->preread_active_stripes);
+ spin_unlock_irq(&sh->stripe_lock);
release_stripe_plug(mddev, sh);
return STRIPE_SUCCESS;
out_release:
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 1c7b710fc9c1..9ff825697ba3 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -221,11 +221,17 @@ struct stripe_head {
enum reconstruct_states reconstruct_state;
spinlock_t stripe_lock;
int cpu;
struct r5worker_group *group;
- struct stripe_head *batch_head; /* protected by stripe lock */
+ /*
+ * Writer protected by stripe_lock.
+ * Reader hold stripe_lock when STRIPE_BATCH_READY is set.
+ * Without STRIPE_BATCH_READY means no concurrent write,
+ * lockless read is ok.
+ */
+ struct stripe_head *batch_head;
spinlock_t batch_lock; /* only header's lock is useful */
struct list_head batch_list; /* protected by head's batch lock*/
union {
struct r5l_io_unit *log_io;
--
2.54.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* Re: [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request
2026-06-19 8:11 [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request Chen Cheng
@ 2026-06-20 22:01 ` yu kuai
0 siblings, 0 replies; 2+ messages in thread
From: yu kuai @ 2026-06-20 22:01 UTC (permalink / raw)
To: Chen Cheng, linux-raid, yukuai; +Cc: linux-kernel
Hi,
在 2026/6/19 16:11, Chen Cheng 写道:
> From: Chen Cheng <chencheng@fnnas.com>
>
> KCSAN reports race in raid5_make_request() vs. stripe_add_to_batch_list()
>
> Writer flow (stripe_add_to_batch_list):
> 1. grab `head` stripe;
> 2. lock_two_stripes(head, sh);
> 3. re-check stripe_can_batch() for both head and sh, which requires
> STRIPE_BATCH_READY set on both;
> 4. write head->batch_head = head and sh->batch_head = head;
> 5. unlock_two_stripes.
>
> STRIPE_BATCH_READY is cleared in two places:
> - clear_batch_ready(), at the entry of handle_stripe();
> - __add_stripe_bio(), for non-batchable bios.
> And, both need to acquire `stripe_lock`.
>
> Under stripe_lock, if STRIPE_BATCH_READY is clear, then:
> - New writers cannot install a batch_head;
> - Existing writers have already finished.
> So .. handle_stripe() readers can ready `batch_head` locklessly.
This does not explain the race clearly, I still have no clue yet.
>
> Fix way:
> Writer side make_stripe_request() under STRIPE_BATCH_READY, so , need
> to be protected by stripe_lock when read something..
>
> v1 -> v2:
> - re-expalin how stripe_lock and batch_head work in commit message , and ,
> - modify comment in raid5.h.
>
> Fixs: f4aec6a097387
Weird fix tag again.
>
>
> KCSAN report:
> ======================================
> BUG: KCSAN: data-race in raid5_make_request / raid5_make_request
>
> write to 0xffff8f03062432d8 of 8 bytes by task 210246 on cpu 6:
> raid5_make_request+0x175e/0x2ab0
> md_handle_request+0x2c5/0x700
> md_submit_bio+0x126/0x320
> [.........]
> btrfs_sync_file+0x181/0x970
> vfs_fsync_range+0x71/0x110
> do_fsync+0x46/0xa0
> __x64_sys_fsync+0x20/0x30
>
> read to 0xffff8f03062432d8 of 8 bytes by task 210251 on cpu 0:
> raid5_make_request+0x7c7/0x2ab0
> md_handle_request+0x2c5/0x700
> md_submit_bio+0x126/0x320
> [.........]
> btrfs_remap_file_range+0x266/0x980
> vfs_clone_file_range+0x16d/0x610
> ioctl_file_clone+0x64/0xd0
> do_vfs_ioctl+0x87f/0xbc0
> __x64_sys_ioctl+0xb8/0x130
>
> value changed: 0x0000000000000000 -> 0xffff8f0307798728
Is this a mismatch report?
>
>
> Signed-off-by: Chen Cheng <chencheng@fnnas.com>
> ---
> drivers/md/raid5.c | 2 ++
> drivers/md/raid5.h | 8 +++++++-
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5521051a9425..efc63740f867 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6108,14 +6108,16 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
> ctx->do_flush = false;
> }
>
> set_bit(STRIPE_HANDLE, &sh->state);
> clear_bit(STRIPE_DELAYED, &sh->state);
> + spin_lock_irq(&sh->stripe_lock);
> if ((!sh->batch_head || sh == sh->batch_head) &&
> (bi->bi_opf & REQ_SYNC) &&
> !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> atomic_inc(&conf->preread_active_stripes);
> + spin_unlock_irq(&sh->stripe_lock);
>
> release_stripe_plug(mddev, sh);
> return STRIPE_SUCCESS;
>
> out_release:
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index 1c7b710fc9c1..9ff825697ba3 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -221,11 +221,17 @@ struct stripe_head {
> enum reconstruct_states reconstruct_state;
> spinlock_t stripe_lock;
> int cpu;
> struct r5worker_group *group;
>
> - struct stripe_head *batch_head; /* protected by stripe lock */
> + /*
> + * Writer protected by stripe_lock.
> + * Reader hold stripe_lock when STRIPE_BATCH_READY is set.
> + * Without STRIPE_BATCH_READY means no concurrent write,
> + * lockless read is ok.
> + */
> + struct stripe_head *batch_head;
> spinlock_t batch_lock; /* only header's lock is useful */
> struct list_head batch_list; /* protected by head's batch lock*/
>
> union {
> struct r5l_io_unit *log_io;
--
Thanks,
Kuai
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-06-20 22:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-19 8:11 [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request Chen Cheng
2026-06-20 22:01 ` yu kuai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox