public inbox for linux-raid@vger.kernel.org
From: FengWei Shih <dannyshih@synology.com>
To: yukuai@fnnas.com, song@kernel.org
Cc: linan122@huawei.com, linux-raid@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] md/raid5: fix race between reshape and chunk-aligned read
Date: Tue, 21 Apr 2026 15:09:22 +0800	[thread overview]
Message-ID: <d298e159-1e47-4c97-80cd-41c4c1c74a04@synology.com> (raw)
In-Reply-To: <68b61dc3-f7fa-49ba-9f26-f07b7fa4768d@fnnas.com>

Hi Kuai,

On 2026/4/19 1:14 PM, Yu Kuai wrote:
> Hi,
>
> On 2026/4/9 13:17, FengWei Shih wrote:
>> raid5_make_request() checks mddev->reshape_position to decide whether
>> to allow chunk-aligned reads. However in raid5_start_reshape(), the
>> layout configuration (raid_disks, algorithm, etc.) is updated before
>> mddev->reshape_position is set:
>>
>>     reshape (raid5_start_reshape)        read (raid5_make_request)
>>     ==============================       ===========================
>>     write_seqcount_begin
>>     update raid_disks, algorithm...
>>     set conf->reshape_progress
>>     write_seqcount_end
>>                                           check mddev->reshape_position
>>                                             * still MaxSector, allow
>>                                           raid5_read_one_chunk()
>>                                             * use new layout
>>     raid5_quiesce()
>>     set mddev->reshape_position
> While we're here, I think it's pretty ugly to disable raid5_read_one_chunk()
> whenever a reshape is not fully done. A better solution would be:
>
> - data behind the reshape: read with the new layout
> - data ahead of the reshape: read with the old layout; the reshape must also
> wait for this IO to complete before it can make progress.
> - data intersecting the active reshape window: wait for the reshape to make
> progress.

Thanks for the feedback. I agree that using reshape_progress to distinguish
the three cases (ahead of, behind, or inside the reshape window) would be a
more refined approach.
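
For my own understanding, that classification might look roughly like the
sketch below (hypothetical helper and enum names, forward reshape assumed;
real code would also need to honour mddev->reshape_backwards and sample
conf->reshape_progress under conf->gen_lock):

    enum read_path { READ_NEW_LAYOUT, READ_OLD_LAYOUT, WAIT_FOR_RESHAPE };

    static enum read_path classify_aligned_read(struct r5conf *conf,
                                                sector_t start, sector_t end)
    {
        if (end <= conf->reshape_progress)
            return READ_NEW_LAYOUT;   /* fully behind the reshape */
        if (start >= conf->reshape_progress)
            return READ_OLD_LAYOUT;   /* not yet touched by the reshape */
        return WAIT_FOR_RESHAPE;      /* overlaps the active window */
    }

In the READ_OLD_LAYOUT case the reshape would additionally have to wait for
the in-flight read (e.g. via conf->active_aligned_reads) before moving past
that range.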

But disabling chunk-aligned reads during reshape was already the existing
design. In the original code, the check is at the caller level:

     if (rw == READ && mddev->degraded == 0 &&
         mddev->reshape_position == MaxSector) {
         bi = chunk_aligned_read(mddev, bi);
     }

My patch focuses on fixing the race in the existing lockless check of whether
a reshape is in progress. So just to confirm: are you suggesting we add a
mechanism that allows chunk-aligned reads during reshape (based on reshape
progress), rather than simply disabling them?
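
For reference, the canonical seqcount read pattern that the fix builds on
looks roughly like this (sketch only):

    unsigned int seq;
    sector_t progress;

    do {
        seq = read_seqcount_begin(&conf->gen_lock);
        progress = READ_ONCE(conf->reshape_progress);
    } while (read_seqcount_retry(&conf->gen_lock, seq));

except that raid5_read_one_chunk() bails out to the stripe path instead of
retrying, since there is no benefit in retrying once a reshape has started.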

>> Since reshape_position is not yet updated, raid5_make_request()
>> considers no reshape is in progress and proceeds with the
>> chunk-aligned path, but the layout has already changed, causing
>> raid5_compute_sector() to return an incorrect physical address.
>>
>> Fix this by reading conf->reshape_progress under gen_lock in
>> raid5_read_one_chunk() and falling back to the stripe path if a
>> reshape is in progress.
>>
>> Signed-off-by: FengWei Shih <dannyshih@synology.com>
>> ---
>>    drivers/md/raid5.c | 8 ++++++++
>>    1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index a8e8d431071b..bded2b86f0ef 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -5421,6 +5421,11 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>>    	sector_t sector, end_sector;
>>    	int dd_idx;
>>    	bool did_inc;
>> +	int seq;
>> +
>> +	seq = read_seqcount_begin(&conf->gen_lock);
>> +	if (unlikely(conf->reshape_progress != MaxSector))
>> +		return 0;
>>    
>>    	if (!in_chunk_boundary(mddev, raid_bio)) {
>>    		pr_debug("%s: non aligned\n", __func__);
>> @@ -5431,6 +5436,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>>    				      &dd_idx, NULL);
>>    	end_sector = sector + bio_sectors(raid_bio);
>>    
>> +	if (read_seqcount_retry(&conf->gen_lock, seq))
>> +		return 0;
>> +
>>    	if (r5c_big_stripe_cached(conf, sector))
>>    		return 0;
>>    
Thanks,
FengWei Shih

Thread overview: 6+ messages
2026-04-09  5:17 [PATCH] md/raid5: fix race between reshape and chunk-aligned read FengWei Shih
2026-04-13  7:19 ` Li Nan
2026-04-14  8:27   ` FengWei Shih
2026-04-19  5:14 ` Yu Kuai
2026-04-21  7:09   ` FengWei Shih [this message]
2026-04-28  8:29     ` Yu Kuai
