From: "Chen Cheng" <chencheng@fnnas.com>
To: <yukuai@fygo.io>, <linux-raid@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request
Date: Mon, 22 Jun 2026 11:08:00 +0800
Message-ID: <727bd8e8-cbbe-4bbb-a239-41c9fa18dc93@fnnas.com>
In-Reply-To: <e958cb97-556a-4320-8e76-9aed18a435bc@fygo.io>
On 2026/6/21 06:01, yu kuai wrote:
> Hi,
>
> On 2026/6/19 16:11, Chen Cheng wrote:
>> From: Chen Cheng <chencheng@fnnas.com>
>>
>> KCSAN reports a race between raid5_make_request() and stripe_add_to_batch_list().
>>
>> Writer flow (stripe_add_to_batch_list):
>> 1. grab `head` stripe;
>> 2. lock_two_stripes(head, sh);
>> 3. re-check stripe_can_batch() for both head and sh, which requires
>> STRIPE_BATCH_READY set on both;
>> 4. write head->batch_head = head and sh->batch_head = head;
>> 5. unlock_two_stripes.
>>
>> STRIPE_BATCH_READY is cleared in two places:
>> - clear_batch_ready(), at the entry of handle_stripe();
>> - __add_stripe_bio(), for non-batchable bios.
>> Both of them acquire `stripe_lock`.
>>
>> Under stripe_lock, if STRIPE_BATCH_READY is clear, then:
>> - New writers cannot install a batch_head;
>> - Existing writers have already finished.
>> So handle_stripe() readers can read `batch_head` locklessly.
>
> This does not explain the race clearly; I still have no clue.
From a semantic-correctness point of view, I think the lock is needed.
As for the consequences of the race, the worst one I can see is that a
batch member stripe gets charged to `conf->preread_active_stripes`,
which should only count batch heads and lone stripes.
The scenario:
=========================
sh1 and sh2 are neighbors, i.e. if sh1 starts at sector X, then sh2
starts at sector X + STRIPE_SECTORS.
CPU0                                CPU1
make_stripe_request(sh2)
-> add_all_stripe_bios(sh2)         make_stripe_request(sh2)
                                    -> add_all_stripe_bios(sh2)
                                    -> stripe_add_to_batch_list(sh2)
                                       -> lock_two_stripes(sh1, sh2)
                                       -> sh1->batch_head = sh1
                                       -> sh2->batch_head = sh1
                                       -> test_and_clear_bit(
                                              STRIPE_PREREAD_ACTIVE,
                                              &sh2->state)
                                       -> unlock_two_stripes(sh1, sh2)
-> if ((!sh2->batch_head ||
        sh2 == sh2->batch_head) &&
       REQ_SYNC &&
       !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh2->state))
           atomic_inc(&conf->preread_active_stripes)
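CPU0 evaluates this condition without holding stripe_lock, so it can
load sh2->batch_head while it is still NULL (before CPU1's store) and
then win test_and_set_bit() after CPU1's test_and_clear_bit(). So even
after CPU1 has batched sh2, CPU0 still treats it as a lone stripe and
charges preread_active_stripes. Since CPU1 has already run the
follower-side compensation, this later increment has no matching
decrement.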
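A single-threaded C sketch that replays the interleaving above, to make the leaked charge concrete. It is a toy model under stated assumptions, not the kernel code: `toy_stripe`, `toy_add_to_batch()` and `replay_lockless_race()` are hypothetical names, the state bit is a plain bool, the atomic counter is a plain int, and all locking is omitted because the replay is sequential. The race is forced by splitting CPU0's condition into "load batch_head" and "test_and_set", with CPU1's batching placed between them. The real stripe_add_to_batch_list() does considerably more (checks, existing batches) than this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the fields involved; not the kernel's struct stripe_head. */
struct toy_stripe {
	struct toy_stripe *batch_head;
	bool preread_active;           /* models STRIPE_PREREAD_ACTIVE */
};

static int preread_active_stripes; /* models conf->preread_active_stripes */

/* Models test_and_set_bit()/test_and_clear_bit() on a single flag. */
static bool toy_test_and_set(bool *flag)
{
	bool old = *flag;
	*flag = true;
	return old;
}

static bool toy_test_and_clear(bool *flag)
{
	bool old = *flag;
	*flag = false;
	return old;
}

/* CPU1: batching side (locks omitted, this is a sequential replay). */
static void toy_add_to_batch(struct toy_stripe *head, struct toy_stripe *sh)
{
	head->batch_head = head;
	sh->batch_head = head;
	/* Follower-side compensation: drop any charge the follower held. */
	if (toy_test_and_clear(&sh->preread_active))
		preread_active_stripes--;
}

/*
 * Replays the bad interleaving: CPU0 loads sh2->batch_head before CPU1
 * stores it, but its test_and_set runs after CPU1's compensation.
 * Returns the final counter; non-zero for a batched follower is a leak.
 */
static int replay_lockless_race(void)
{
	struct toy_stripe sh1 = { NULL, false }, sh2 = { NULL, false };
	struct toy_stripe *seen;
	bool lone;

	preread_active_stripes = 0;

	/* CPU0: first half of the lockless check. */
	seen = sh2.batch_head;
	lone = (seen == NULL || seen == &sh2);

	/* CPU1: batches sh2 behind sh1 in the middle of CPU0's check. */
	toy_add_to_batch(&sh1, &sh2);

	/* CPU0: second half of the check, acting on the stale result. */
	if (lone && !toy_test_and_set(&sh2.preread_active))
		preread_active_stripes++;

	/* sh2 really is a batch follower at this point. */
	assert(sh2.batch_head == &sh1);
	return preread_active_stripes;
}
```

In this model replay_lockless_race() ends with the counter at 1 even though sh2 is a follower, which is the unmatched increment described above.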
>
>>
>> Fix way:
>> Writer side make_stripe_request() under STRIPE_BATCH_READY, so , need
>> to be protected by stripe_lock when read something..
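A userspace sketch of the fixed check, under assumptions: pthread mutexes stand in for stripe_lock, a plain bool for STRIPE_PREREAD_ACTIVE, and an int only touched under sh2's lock for the atomic preread_active_stripes counter. REQ_SYNC is assumed set. All names (`toy_stripe`, `toy_batch_writer()`, `toy_sync_reader()`, `run_locked_round()`) are hypothetical. With the check inside sh2's lock, the reader either runs first (and its charge is dropped by the writer's compensation) or sees sh2->batch_head == &sh1 and skips, so the counter ends at zero in either order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; not the kernel's struct stripe_head. */
struct toy_stripe {
	pthread_mutex_t stripe_lock;
	struct toy_stripe *batch_head;
	bool preread_active;            /* models STRIPE_PREREAD_ACTIVE */
};

static struct toy_stripe sh1 = { PTHREAD_MUTEX_INITIALIZER, NULL, false };
static struct toy_stripe sh2 = { PTHREAD_MUTEX_INITIALIZER, NULL, false };
static int preread_active_stripes;  /* only modified under sh2.stripe_lock */

/* Writer: models stripe_add_to_batch_list() holding both stripe locks. */
static void *toy_batch_writer(void *arg)
{
	(void)arg;
	/* Fixed lock order, in the spirit of lock_two_stripes(). */
	pthread_mutex_lock(&sh1.stripe_lock);
	pthread_mutex_lock(&sh2.stripe_lock);
	sh1.batch_head = &sh1;
	sh2.batch_head = &sh1;
	if (sh2.preread_active) {       /* follower-side compensation */
		sh2.preread_active = false;
		preread_active_stripes--;
	}
	pthread_mutex_unlock(&sh2.stripe_lock);
	pthread_mutex_unlock(&sh1.stripe_lock);
	return NULL;
}

/* Reader: the REQ_SYNC check, evaluated entirely under sh2's lock. */
static void *toy_sync_reader(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sh2.stripe_lock);
	if ((sh2.batch_head == NULL || sh2.batch_head == &sh2) &&
	    !sh2.preread_active) {
		sh2.preread_active = true;
		preread_active_stripes++;
	}
	pthread_mutex_unlock(&sh2.stripe_lock);
	return NULL;
}

/* Runs one reader/writer pair concurrently; returns the final counter. */
static int run_locked_round(void)
{
	pthread_t r, w;

	/* Reset state while no threads are running. */
	sh1.batch_head = NULL;
	sh2.batch_head = NULL;
	sh2.preread_active = false;
	preread_active_stripes = 0;

	if (pthread_create(&r, NULL, toy_sync_reader, NULL) != 0)
		return -1;
	if (pthread_create(&w, NULL, toy_batch_writer, NULL) != 0) {
		pthread_join(r, NULL);
		return -1;
	}
	pthread_join(r, NULL);
	pthread_join(w, NULL);

	/* sh2 always ends up batched, so any remaining charge is a leak. */
	assert(sh2.batch_head == &sh1);
	return preread_active_stripes;
}
```

Calling run_locked_round() repeatedly should always return 0 in this model; it only exercises the toy, not the kernel code.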
>>
>> v1 -> v2:
>> - re-explain how stripe_lock and batch_head work in the commit message;
>> - update the batch_head comment in raid5.h.
>>
>> Fixs: f4aec6a097387
>
> Weird fix tag again.
>
>>
>>
>> KCSAN report:
>> ======================================
>> BUG: KCSAN: data-race in raid5_make_request / raid5_make_request
>>
>> write to 0xffff8f03062432d8 of 8 bytes by task 210246 on cpu 6:
>> raid5_make_request+0x175e/0x2ab0
>> md_handle_request+0x2c5/0x700
>> md_submit_bio+0x126/0x320
>> [.........]
>> btrfs_sync_file+0x181/0x970
>> vfs_fsync_range+0x71/0x110
>> do_fsync+0x46/0xa0
>> __x64_sys_fsync+0x20/0x30
>>
>> read to 0xffff8f03062432d8 of 8 bytes by task 210251 on cpu 0:
>> raid5_make_request+0x7c7/0x2ab0
>> md_handle_request+0x2c5/0x700
>> md_submit_bio+0x126/0x320
>> [.........]
>> btrfs_remap_file_range+0x266/0x980
>> vfs_clone_file_range+0x16d/0x610
>> ioctl_file_clone+0x64/0xd0
>> do_vfs_ioctl+0x87f/0xbc0
>> __x64_sys_ioctl+0xb8/0x130
>>
>> value changed: 0x0000000000000000 -> 0xffff8f0307798728
>
> Is this a mismatch report?
>
>>
>>
>> Signed-off-by: Chen Cheng <chencheng@fnnas.com>
>> ---
>> drivers/md/raid5.c | 2 ++
>> drivers/md/raid5.h | 8 +++++++-
>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 5521051a9425..efc63740f867 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6108,14 +6108,16 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>> ctx->do_flush = false;
>> }
>>
>> set_bit(STRIPE_HANDLE, &sh->state);
>> clear_bit(STRIPE_DELAYED, &sh->state);
>> + spin_lock_irq(&sh->stripe_lock);
>> if ((!sh->batch_head || sh == sh->batch_head) &&
>> (bi->bi_opf & REQ_SYNC) &&
>> !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
>> atomic_inc(&conf->preread_active_stripes);
>> + spin_unlock_irq(&sh->stripe_lock);
>>
>> release_stripe_plug(mddev, sh);
>> return STRIPE_SUCCESS;
>>
>> out_release:
>> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index 1c7b710fc9c1..9ff825697ba3 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -221,11 +221,17 @@ struct stripe_head {
>> enum reconstruct_states reconstruct_state;
>> spinlock_t stripe_lock;
>> int cpu;
>> struct r5worker_group *group;
>>
>> - struct stripe_head *batch_head; /* protected by stripe lock */
>> + /*
>> + * Writer protected by stripe_lock.
>> + * Reader hold stripe_lock when STRIPE_BATCH_READY is set.
>> + * Without STRIPE_BATCH_READY means no concurrent write,
>> + * lockless read is ok.
>> + */
>> + struct stripe_head *batch_head;
>> spinlock_t batch_lock; /* only header's lock is useful */
>> struct list_head batch_list; /* protected by head's batch lock*/
>>
>> union {
>> struct r5l_io_unit *log_io;
>
Thread overview: 3+ messages
2026-06-19 8:11 [PATCH v2] md/raid5: read batch_head under stripe_lock in make_stripe_request Chen Cheng
2026-06-20 22:01 ` yu kuai
2026-06-22 3:08 ` Chen Cheng [this message]