* [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
@ 2023-05-22 11:54 ` linan666
2023-05-22 13:01 ` Yu Kuai
2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
To: song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
will be deref later. However, the latter check of mreplace might set
mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
Fix it by merging two checks into one. And replace 'need_replace' with
'mreplace' because their values are always the same.
Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/raid10.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4fcfcb350d2b..e21502c03b45 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int must_sync;
int any_working;
int need_recover = 0;
- int need_replace = 0;
struct raid10_info *mirror = &conf->mirrors[i];
struct md_rdev *mrdev, *mreplace;
@@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
!test_bit(In_sync, &mrdev->flags))
need_recover = 1;
if (mreplace != NULL &&
- !test_bit(Faulty, &mreplace->flags))
- need_replace = 1;
+ test_bit(Faulty, &mreplace->flags))
+ mreplace = NULL;
- if (!need_recover && !need_replace) {
+ if (!need_recover && !mreplace) {
rcu_read_unlock();
continue;
}
@@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
rcu_read_unlock();
continue;
}
- if (mreplace && test_bit(Faulty, &mreplace->flags))
- mreplace = NULL;
/* Unless we are doing a full sync, or a replacement
* we only need to recover the block if it is set in
* the bitmap
@@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
bio = r10_bio->devs[1].repl_bio;
if (bio)
bio->bi_end_io = NULL;
- /* Note: if need_replace, then bio
+ /* Note: if replace is not NULL, then bio
* cannot be NULL as r10buf_pool_alloc will
* have allocated it.
*/
- if (!need_replace)
+ if (!mreplace)
break;
bio->bi_next = biolist;
biolist = bio;
--
2.31.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
@ 2023-05-22 13:01 ` Yu Kuai
2023-05-25 13:55 ` Li Nan
0 siblings, 1 reply; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:01 UTC (permalink / raw)
To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
yukuai (C)
Hi,
在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
>
> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
> will be deref later. However, the latter check of mreplace might set
> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
>
> Fix it by merging two checks into one. And replace 'need_replace' with
> 'mreplace' because their values are always the same.
>
> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
> drivers/md/raid10.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..e21502c03b45 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> int must_sync;
> int any_working;
> int need_recover = 0;
need_recover can be removed as well. Otherwise, this patch looks good to
me.
Thanks,
Kuai
> - int need_replace = 0;
> struct raid10_info *mirror = &conf->mirrors[i];
> struct md_rdev *mrdev, *mreplace;
>
> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> !test_bit(In_sync, &mrdev->flags))
> need_recover = 1;
> if (mreplace != NULL &&
> - !test_bit(Faulty, &mreplace->flags))
> - need_replace = 1;
> + test_bit(Faulty, &mreplace->flags))
> + mreplace = NULL;
>
> - if (!need_recover && !need_replace) {
> + if (!need_recover && !mreplace) {
> rcu_read_unlock();
> continue;
> }
> @@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> rcu_read_unlock();
> continue;
> }
> - if (mreplace && test_bit(Faulty, &mreplace->flags))
> - mreplace = NULL;
> /* Unless we are doing a full sync, or a replacement
> * we only need to recover the block if it is set in
> * the bitmap
> @@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> bio = r10_bio->devs[1].repl_bio;
> if (bio)
> bio->bi_end_io = NULL;
> - /* Note: if need_replace, then bio
> + /* Note: if replace is not NULL, then bio
> * cannot be NULL as r10buf_pool_alloc will
> * have allocated it.
> */
> - if (!need_replace)
> + if (!mreplace)
> break;
> bio->bi_next = biolist;
> biolist = bio;
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
2023-05-22 13:01 ` Yu Kuai
@ 2023-05-25 13:55 ` Li Nan
0 siblings, 0 replies; 14+ messages in thread
From: Li Nan @ 2023-05-25 13:55 UTC (permalink / raw)
To: Yu Kuai, linan666, song, shli, allenpeng, alexwu, bingjingc,
neilb
Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
yukuai (C)
在 2023/5/22 21:01, Yu Kuai 写道:
> Hi,
>
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>> From: Li Nan <linan122@huawei.com>
>>
>> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
>> will be deref later. However, the latter check of mreplace might set
>> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this
>> time.
>>
>> Fix it by merging two checks into one. And replace 'need_replace' with
>> 'mreplace' because their values are always the same.
>>
>> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new
>> added disk faulty")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>> drivers/md/raid10.c | 13 +++++--------
>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 4fcfcb350d2b..e21502c03b45 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev
>> *mddev, sector_t sector_nr,
>> int must_sync;
>> int any_working;
>> int need_recover = 0;
>
> need_recover can be removed as well. Otherwise, this patch looks good to
> me.
>
I agree. Let me improve this in v2.
--
Thanks,
Nan
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] md/raid10: fix incorrect done of recovery
2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
@ 2023-05-22 11:54 ` linan666
2023-05-22 13:54 ` Yu Kuai
2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
To: song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
Recovery will go to giveup and let chunks_skipped++ in
raid10_sync_request() if there are some bad_blocks, and it will return
max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
data is inconsistent but user think recovery is done, it is wrong.
Fix it by set mirror's recovery_disabled and spare device shouln't be
added to here.
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/raid10.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e21502c03b45..70cc87c7ee57 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int chunks_skipped = 0;
sector_t chunk_mask = conf->geo.chunk_mask;
int page_idx = 0;
+ int error_disk = -1;
/*
* Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
return reshape_request(mddev, sector_nr, skipped);
if (chunks_skipped >= conf->geo.raid_disks) {
- /* if there has been nothing to do on any drive,
+ pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+ test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
+ if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+ /*
+ * recovery fail, set mirrors.recovory_disabled,
+ * device shouldn't be added to there.
+ */
+ conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
+ return 0;
+ }
+ /*
+ * if there has been nothing to do on any drive,
* then there is nothing to do at all..
*/
*skipped = 1;
@@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
mdname(mddev));
mirror->recovery_disabled
= mddev->recovery_disabled;
+ } else {
+ error_disk = i;
}
put_buf(r10_bio);
if (rb2)
--
2.31.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
@ 2023-05-22 13:54 ` Yu Kuai
2023-05-22 16:03 ` Song Liu
2023-05-25 14:00 ` Li Nan
0 siblings, 2 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:54 UTC (permalink / raw)
To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
yukuai (C)
Hi,
在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
>
> Recovery will go to giveup and let chunks_skipped++ in
> raid10_sync_request() if there are some bad_blocks, and it will return
> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> data is inconsistent but user think recovery is done, it is wrong.
>
> Fix it by set mirror's recovery_disabled and spare device shouln't be
> added to here.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
> drivers/md/raid10.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e21502c03b45..70cc87c7ee57 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> int chunks_skipped = 0;
> sector_t chunk_mask = conf->geo.chunk_mask;
> int page_idx = 0;
> + int error_disk = -1;
>
> /*
> * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> return reshape_request(mddev, sector_nr, skipped);
>
> if (chunks_skipped >= conf->geo.raid_disks) {
> - /* if there has been nothing to do on any drive,
> + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
Line exceed 80 columns, and following.
> + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
Resync has the same problem, right?
Thanks,
Kuai
> + /*
> + * recovery fail, set mirrors.recovory_disabled,
> + * device shouldn't be added to there.
> + */
> + conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
> + return 0;
> + }
> + /*
> + * if there has been nothing to do on any drive,
> * then there is nothing to do at all..
> */
> *skipped = 1;
> @@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> mdname(mddev));
> mirror->recovery_disabled
> = mddev->recovery_disabled;
> + } else {
> + error_disk = i;
> }
> put_buf(r10_bio);
> if (rb2)
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
2023-05-22 13:54 ` Yu Kuai
@ 2023-05-22 16:03 ` Song Liu
2023-05-25 14:00 ` Li Nan
1 sibling, 0 replies; 14+ messages in thread
From: Song Liu @ 2023-05-22 16:03 UTC (permalink / raw)
To: Yu Kuai
Cc: linan666, shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
linux-kernel, linan122, yi.zhang, houtao1, yangerkun, yukuai (C)
On Mon, May 22, 2023 at 6:54 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> > From: Li Nan <linan122@huawei.com>
> >
> > Recovery will go to giveup and let chunks_skipped++ in
> > raid10_sync_request() if there are some bad_blocks, and it will return
> > max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> > data is inconsistent but user think recovery is done, it is wrong.
> >
> > Fix it by set mirror's recovery_disabled and spare device shouln't be
> > added to here.
> >
> > Signed-off-by: Li Nan <linan122@huawei.com>
> > ---
> > drivers/md/raid10.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index e21502c03b45..70cc87c7ee57 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> > int chunks_skipped = 0;
> > sector_t chunk_mask = conf->geo.chunk_mask;
> > int page_idx = 0;
> > + int error_disk = -1;
> >
> > /*
> > * Allow skipping a full rebuild for incremental assembly
> > @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> > return reshape_request(mddev, sector_nr, skipped);
> >
> > if (chunks_skipped >= conf->geo.raid_disks) {
> > - /* if there has been nothing to do on any drive,
> > + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> > + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
>
> Line exceed 80 columns, and following.
If it makes the code look better, such as in this case, it is OK to have
lines longer than 80 columns.
Thanks,
Song
> > + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
2023-05-22 13:54 ` Yu Kuai
2023-05-22 16:03 ` Song Liu
@ 2023-05-25 14:00 ` Li Nan
2023-05-26 2:55 ` Yu Kuai
1 sibling, 1 reply; 14+ messages in thread
From: Li Nan @ 2023-05-25 14:00 UTC (permalink / raw)
To: Yu Kuai, linan666, song, shli, allenpeng, alexwu, bingjingc,
neilb
Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
yukuai (C)
在 2023/5/22 21:54, Yu Kuai 写道:
> Hi,
>
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>> From: Li Nan <linan122@huawei.com>
>>
>> Recovery will go to giveup and let chunks_skipped++ in
>> raid10_sync_request() if there are some bad_blocks, and it will return
>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>> data is inconsistent but user think recovery is done, it is wrong.
>>
>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>> added to here.
>>
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>> drivers/md/raid10.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e21502c03b45..70cc87c7ee57 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev
>> *mddev, sector_t sector_nr,
>> int chunks_skipped = 0;
>> sector_t chunk_mask = conf->geo.chunk_mask;
>> int page_idx = 0;
>> + int error_disk = -1;
>> /*
>> * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct
>> mddev *mddev, sector_t sector_nr,
>> return reshape_request(mddev, sector_nr, skipped);
>> if (chunks_skipped >= conf->geo.raid_disks) {
>> - /* if there has been nothing to do on any drive,
>> + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync"
>> : "recovery");
>
> Line exceed 80 columns, and following.
>> + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC,
>> &mddev->recovery)) {
>
> Resync has the same problem, right?
>
Yes. But I have no idea to fix it. md_error disk nor set
recovery_disabled is a good solution. So, just print error message now.
Do you have any ideas?
--
Thanks,
Nan
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
2023-05-25 14:00 ` Li Nan
@ 2023-05-26 2:55 ` Yu Kuai
0 siblings, 0 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-26 2:55 UTC (permalink / raw)
To: Li Nan, Yu Kuai, song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
yukuai (C)
Hi,
在 2023/05/25 22:00, Li Nan 写道:
>
>
> 在 2023/5/22 21:54, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>>> From: Li Nan <linan122@huawei.com>
>>>
>>> Recovery will go to giveup and let chunks_skipped++ in
>>> raid10_sync_request() if there are some bad_blocks, and it will return
>>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>>> data is inconsistent but user think recovery is done, it is wrong.
>>>
>>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>>> added to here.
>>>
>>> Signed-off-by: Li Nan <linan122@huawei.com>
>>> ---
>>> drivers/md/raid10.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index e21502c03b45..70cc87c7ee57 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct
>>> mddev *mddev, sector_t sector_nr,
>>> int chunks_skipped = 0;
>>> sector_t chunk_mask = conf->geo.chunk_mask;
>>> int page_idx = 0;
>>> + int error_disk = -1;
>>> /*
>>> * Allow skipping a full rebuild for incremental assembly
>>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct
>>> mddev *mddev, sector_t sector_nr,
>>> return reshape_request(mddev, sector_nr, skipped);
>>> if (chunks_skipped >= conf->geo.raid_disks) {
>>> - /* if there has been nothing to do on any drive,
>>> + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>>> + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync"
>>> : "recovery");
>>
>> Line exceed 80 columns, and following.
>>> + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC,
>>> &mddev->recovery)) {
>>
>> Resync has the same problem, right?
>>
>
> Yes. But I have no idea to fix it. md_error disk nor set
> recovery_disabled is a good solution. So, just print error message now.
> Do you have any ideas?
I'll look into this, in the meadtime, I don't suggest to apply this
patch because this is just temporary solution that only fix half of
the problem.
Thanks,
Kuai
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev
2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
@ 2023-05-22 11:54 ` linan666
2023-05-22 13:03 ` Yu Kuai
2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
To: song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
When we remove a disk which has replacement, first set rdev to NULL
and then set replacement to rdev, finally set replacement to NULL (see
raid10_remove_disk()). If io is submitted during the same time, it might
read both rdev and replacement as NULL, and io will not be submitted.
rdev -> NULL
read rdev
replacement -> NULL
read replacement
Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.
Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/raid10.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 70cc87c7ee57..25a5a7b1e95c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
disk = r10_bio->devs[slot].devnum;
rdev = rcu_dereference(conf->mirrors[disk].replacement);
if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
- r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+ r10_bio->devs[slot].addr + sectors >
+ rdev->recovery_offset) {
+ /*
+ * Read replacement first to prevent reading both rdev
+ * and replacement as NULL during replacement replace
+ * rdev.
+ */
+ smp_mb();
rdev = rcu_dereference(conf->mirrors[disk].rdev);
+ }
if (rdev == NULL ||
test_bit(Faulty, &rdev->flags))
continue;
@@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
for (i = 0; i < conf->copies; i++) {
int d = r10_bio->devs[i].devnum;
- struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
- struct md_rdev *rrdev = rcu_dereference(
- conf->mirrors[d].replacement);
+ struct md_rdev *rdev, *rrdev;
+
+ rrdev = rcu_dereference(conf->mirrors[d].replacement);
+ /*
+ * Read replacement first to Prevent reading both rdev and
+ * replacement as NULL during replacement replace rdev.
+ */
+ smp_mb();
+ rdev = rcu_dereference(conf->mirrors[d].rdev);
if (rdev == rrdev)
rrdev = NULL;
if (rdev && (test_bit(Faulty, &rdev->flags)))
--
2.31.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev
2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-05-22 13:03 ` Yu Kuai
0 siblings, 0 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:03 UTC (permalink / raw)
To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
yukuai (C)
Hi,
在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
>
> When we remove a disk which has replacement, first set rdev to NULL
> and then set replacement to rdev, finally set replacement to NULL (see
> raid10_remove_disk()). If io is submitted during the same time, it might
> read both rdev and replacement as NULL, and io will not be submitted.
>
> rdev -> NULL
> read rdev
> replacement -> NULL
> read replacement
>
> Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
> to prevent memory reordering.
Looks good, feel free to add:
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>
> Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
> drivers/md/raid10.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 70cc87c7ee57..25a5a7b1e95c 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
> disk = r10_bio->devs[slot].devnum;
> rdev = rcu_dereference(conf->mirrors[disk].replacement);
> if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
> - r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
> + r10_bio->devs[slot].addr + sectors >
> + rdev->recovery_offset) {
> + /*
> + * Read replacement first to prevent reading both rdev
> + * and replacement as NULL during replacement replace
> + * rdev.
> + */
> + smp_mb();
> rdev = rcu_dereference(conf->mirrors[disk].rdev);
> + }
> if (rdev == NULL ||
> test_bit(Faulty, &rdev->flags))
> continue;
> @@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
>
> for (i = 0; i < conf->copies; i++) {
> int d = r10_bio->devs[i].devnum;
> - struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
> - struct md_rdev *rrdev = rcu_dereference(
> - conf->mirrors[d].replacement);
> + struct md_rdev *rdev, *rrdev;
> +
> + rrdev = rcu_dereference(conf->mirrors[d].replacement);
> + /*
> + * Read replacement first to Prevent reading both rdev and
> + * replacement as NULL during replacement replace rdev.
> + */
> + smp_mb();
> + rdev = rcu_dereference(conf->mirrors[d].rdev);
> if (rdev == rrdev)
> rrdev = NULL;
> if (rdev && (test_bit(Faulty, &rdev->flags)))
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] raid10 bugfix
2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
` (2 preceding siblings ...)
2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-05-23 18:44 ` Song Liu
2023-05-25 14:01 ` Li Nan
3 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2023-05-23 18:44 UTC (permalink / raw)
To: linan666
Cc: shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
linux-kernel, linan122, yukuai3, yi.zhang, houtao1, yangerkun
On Mon, May 22, 2023 at 4:56 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Li Nan (3):
> md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
> md/raid10: fix incorrect done of recovery
> md/raid10: fix io loss while replacement replace rdev
Please address feedback and send v2.
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] raid10 bugfix
2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
@ 2023-05-25 14:01 ` Li Nan
0 siblings, 0 replies; 14+ messages in thread
From: Li Nan @ 2023-05-25 14:01 UTC (permalink / raw)
To: Song Liu, linan666
Cc: shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
linux-kernel, yukuai3, yi.zhang, houtao1, yangerkun
在 2023/5/24 2:44, Song Liu 写道:
> On Mon, May 22, 2023 at 4:56 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> Li Nan (3):
>> md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
>> md/raid10: fix incorrect done of recovery
>> md/raid10: fix io loss while replacement replace rdev
>
> Please address feedback and send v2.
>
> Thanks,
> Song
> .
Thanks for review, I will send v2 later.
--
Thanks,
Nan
^ permalink raw reply [flat|nested] 14+ messages in thread