[PATCH 0/3] raid10 bugfix

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] raid10 bugfix
@ 2023-05-22 11:54 linan666
  2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
  To: song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Li Nan (3):
  md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
  md/raid10: fix incorrect done of recovery
  md/raid10: fix io loss while replacement replace rdev

 drivers/md/raid10.c | 51 +++++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 13 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
  2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
@ 2023-05-22 11:54 ` linan666
  2023-05-22 13:01   ` Yu Kuai
  2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
  To: song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
will be deref later. However, the latter check of mreplace might set
mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.

Fix it by merging two checks into one. And replace 'need_replace' with
'mreplace' because their values are always the same.

Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4fcfcb350d2b..e21502c03b45 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			int must_sync;
 			int any_working;
 			int need_recover = 0;
-			int need_replace = 0;
 			struct raid10_info *mirror = &conf->mirrors[i];
 			struct md_rdev *mrdev, *mreplace;
 
@@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			    !test_bit(In_sync, &mrdev->flags))
 				need_recover = 1;
 			if (mreplace != NULL &&
-			    !test_bit(Faulty, &mreplace->flags))
-				need_replace = 1;
+			    test_bit(Faulty, &mreplace->flags))
+				mreplace = NULL;
 
-			if (!need_recover && !need_replace) {
+			if (!need_recover && !mreplace) {
 				rcu_read_unlock();
 				continue;
 			}
@@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				rcu_read_unlock();
 				continue;
 			}
-			if (mreplace && test_bit(Faulty, &mreplace->flags))
-				mreplace = NULL;
 			/* Unless we are doing a full sync, or a replacement
 			 * we only need to recover the block if it is set in
 			 * the bitmap
@@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				bio = r10_bio->devs[1].repl_bio;
 				if (bio)
 					bio->bi_end_io = NULL;
-				/* Note: if need_replace, then bio
+				/* Note: if replace is not NULL, then bio
 				 * cannot be NULL as r10buf_pool_alloc will
 				 * have allocated it.
 				 */
-				if (!need_replace)
+				if (!mreplace)
 					break;
 				bio->bi_next = biolist;
 				biolist = bio;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
  2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
@ 2023-05-22 13:01   ` Yu Kuai
  2023-05-25 13:55     ` Li Nan
  0 siblings, 1 reply; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:01 UTC (permalink / raw)
  To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
> will be deref later. However, the latter check of mreplace might set
> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this time.
> 
> Fix it by merging two checks into one. And replace 'need_replace' with
> 'mreplace' because their values are always the same.
> 
> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..e21502c03b45 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   			int must_sync;
>   			int any_working;
>   			int need_recover = 0;

need_recover can be removed as well. Otherwise, this patch looks good to
me.

Thanks,
Kuai
> -			int need_replace = 0;
>   			struct raid10_info *mirror = &conf->mirrors[i];
>   			struct md_rdev *mrdev, *mreplace;
>   
> @@ -3451,10 +3450,10 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   			    !test_bit(In_sync, &mrdev->flags))
>   				need_recover = 1;
>   			if (mreplace != NULL &&
> -			    !test_bit(Faulty, &mreplace->flags))
> -				need_replace = 1;
> +			    test_bit(Faulty, &mreplace->flags))
> +				mreplace = NULL;
>   
> -			if (!need_recover && !need_replace) {
> +			if (!need_recover && !mreplace) {
>   				rcu_read_unlock();
>   				continue;
>   			}
> @@ -3470,8 +3469,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   				rcu_read_unlock();
>   				continue;
>   			}
> -			if (mreplace && test_bit(Faulty, &mreplace->flags))
> -				mreplace = NULL;
>   			/* Unless we are doing a full sync, or a replacement
>   			 * we only need to recover the block if it is set in
>   			 * the bitmap
> @@ -3594,11 +3591,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   				bio = r10_bio->devs[1].repl_bio;
>   				if (bio)
>   					bio->bi_end_io = NULL;
> -				/* Note: if need_replace, then bio
> +				/* Note: if replace is not NULL, then bio
>   				 * cannot be NULL as r10buf_pool_alloc will
>   				 * have allocated it.
>   				 */
> -				if (!need_replace)
> +				if (!mreplace)
>   					break;
>   				bio->bi_next = biolist;
>   				biolist = bio;
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
  2023-05-22 13:01   ` Yu Kuai
@ 2023-05-25 13:55     ` Li Nan
  0 siblings, 0 replies; 14+ messages in thread
From: Li Nan @ 2023-05-25 13:55 UTC (permalink / raw)
  To: Yu Kuai, linan666, song, shli, allenpeng, alexwu, bingjingc,
	neilb
  Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
	yukuai (C)



在 2023/5/22 21:01, Yu Kuai 写道:
> Hi,
> 
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>> From: Li Nan <linan122@huawei.com>
>>
>> need_replace will be set to 1 if no-Faulty mreplace exists, and mreplace
>> will be deref later. However, the latter check of mreplace might set
>> mreplace to NULL, null-ptr-deref occurs if need_replace is 1 at this 
>> time.
>>
>> Fix it by merging two checks into one. And replace 'need_replace' with
>> 'mreplace' because their values are always the same.
>>
>> Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new 
>> added disk faulty")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>   drivers/md/raid10.c | 13 +++++--------
>>   1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 4fcfcb350d2b..e21502c03b45 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3438,7 +3438,6 @@ static sector_t raid10_sync_request(struct mddev 
>> *mddev, sector_t sector_nr,
>>               int must_sync;
>>               int any_working;
>>               int need_recover = 0;
> 
> need_recover can be removed as well. Otherwise, this patch looks good to
> me.
> 

I agree. Let me improve this in v2.
-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/3] md/raid10: fix incorrect done of recovery
  2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
  2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
@ 2023-05-22 11:54 ` linan666
  2023-05-22 13:54   ` Yu Kuai
  2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
  2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
  3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
  To: song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Recovery will go to giveup and let chunks_skipped++ in
raid10_sync_request() if there are some bad_blocks, and it will return
max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
data is inconsistent but user think recovery is done, it is wrong.

Fix it by set mirror's recovery_disabled and spare device shouln't be
added to here.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e21502c03b45..70cc87c7ee57 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int chunks_skipped = 0;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
+	int error_disk = -1;
 
 	/*
 	 * Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		return reshape_request(mddev, sector_nr, skipped);
 
 	if (chunks_skipped >= conf->geo.raid_disks) {
-		/* if there has been nothing to do on any drive,
+		pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
+		if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+			/*
+			 * recovery fail, set mirrors.recovory_disabled,
+			 * device shouldn't be added to there.
+			 */
+			conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
+			return 0;
+		}
+		/*
+		 * if there has been nothing to do on any drive,
 		 * then there is nothing to do at all..
 		 */
 		*skipped = 1;
@@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 						       mdname(mddev));
 					mirror->recovery_disabled
 						= mddev->recovery_disabled;
+				} else {
+					error_disk = i;
 				}
 				put_buf(r10_bio);
 				if (rb2)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
  2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
@ 2023-05-22 13:54   ` Yu Kuai
  2023-05-22 16:03     ` Song Liu
  2023-05-25 14:00     ` Li Nan
  0 siblings, 2 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:54 UTC (permalink / raw)
  To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> Recovery will go to giveup and let chunks_skipped++ in
> raid10_sync_request() if there are some bad_blocks, and it will return
> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> data is inconsistent but user think recovery is done, it is wrong.
> 
> Fix it by set mirror's recovery_disabled and spare device shouln't be
> added to here.
> 
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e21502c03b45..70cc87c7ee57 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   	int chunks_skipped = 0;
>   	sector_t chunk_mask = conf->geo.chunk_mask;
>   	int page_idx = 0;
> +	int error_disk = -1;
>   
>   	/*
>   	 * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   		return reshape_request(mddev, sector_nr, skipped);
>   
>   	if (chunks_skipped >= conf->geo.raid_disks) {
> -		/* if there has been nothing to do on any drive,
> +		pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> +			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");

Line exceed 80 columns, and following.
> +		if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {

Resync has the same problem, right?

Thanks,
Kuai
> +			/*
> +			 * recovery fail, set mirrors.recovory_disabled,
> +			 * device shouldn't be added to there.
> +			 */
> +			conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
> +			return 0;
> +		}
> +		/*
> +		 * if there has been nothing to do on any drive,
>   		 * then there is nothing to do at all..
>   		 */
>   		*skipped = 1;
> @@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   						       mdname(mddev));
>   					mirror->recovery_disabled
>   						= mddev->recovery_disabled;
> +				} else {
> +					error_disk = i;
>   				}
>   				put_buf(r10_bio);
>   				if (rb2)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
  2023-05-22 13:54   ` Yu Kuai
@ 2023-05-22 16:03     ` Song Liu
  2023-05-25 14:00     ` Li Nan
  1 sibling, 0 replies; 14+ messages in thread
From: Song Liu @ 2023-05-22 16:03 UTC (permalink / raw)
  To: Yu Kuai
  Cc: linan666, shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
	linux-kernel, linan122, yi.zhang, houtao1, yangerkun, yukuai (C)

On Mon, May 22, 2023 at 6:54 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> > From: Li Nan <linan122@huawei.com>
> >
> > Recovery will go to giveup and let chunks_skipped++ in
> > raid10_sync_request() if there are some bad_blocks, and it will return
> > max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> > data is inconsistent but user think recovery is done, it is wrong.
> >
> > Fix it by set mirror's recovery_disabled and spare device shouln't be
> > added to here.
> >
> > Signed-off-by: Li Nan <linan122@huawei.com>
> > ---
> >   drivers/md/raid10.c | 16 +++++++++++++++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index e21502c03b45..70cc87c7ee57 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >       int chunks_skipped = 0;
> >       sector_t chunk_mask = conf->geo.chunk_mask;
> >       int page_idx = 0;
> > +     int error_disk = -1;
> >
> >       /*
> >        * Allow skipping a full rebuild for incremental assembly
> > @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >               return reshape_request(mddev, sector_nr, skipped);
> >
> >       if (chunks_skipped >= conf->geo.raid_disks) {
> > -             /* if there has been nothing to do on any drive,
> > +             pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> > +                     test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
>
> Line exceed 80 columns, and following.

If it makes the code look better, such as in this case, it is OK to have
lines longer than 80 columns.

Thanks,
Song

> > +             if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
  2023-05-22 13:54   ` Yu Kuai
  2023-05-22 16:03     ` Song Liu
@ 2023-05-25 14:00     ` Li Nan
  2023-05-26  2:55       ` Yu Kuai
  1 sibling, 1 reply; 14+ messages in thread
From: Li Nan @ 2023-05-25 14:00 UTC (permalink / raw)
  To: Yu Kuai, linan666, song, shli, allenpeng, alexwu, bingjingc,
	neilb
  Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
	yukuai (C)



在 2023/5/22 21:54, Yu Kuai 写道:
> Hi,
> 
> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>> From: Li Nan <linan122@huawei.com>
>>
>> Recovery will go to giveup and let chunks_skipped++ in
>> raid10_sync_request() if there are some bad_blocks, and it will return
>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>> data is inconsistent but user think recovery is done, it is wrong.
>>
>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>> added to here.
>>
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>   drivers/md/raid10.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e21502c03b45..70cc87c7ee57 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev 
>> *mddev, sector_t sector_nr,
>>       int chunks_skipped = 0;
>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>       int page_idx = 0;
>> +    int error_disk = -1;
>>       /*
>>        * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct 
>> mddev *mddev, sector_t sector_nr,
>>           return reshape_request(mddev, sector_nr, skipped);
>>       if (chunks_skipped >= conf->geo.raid_disks) {
>> -        /* if there has been nothing to do on any drive,
>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
>> : "recovery");
> 
> Line exceed 80 columns, and following.
>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, 
>> &mddev->recovery)) {
> 
> Resync has the same problem, right?
> 

Yes. But I have no idea to fix it. md_error disk nor set 
recovery_disabled is a good solution. So, just print error message now.
Do you have any ideas?
-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
  2023-05-25 14:00     ` Li Nan
@ 2023-05-26  2:55       ` Yu Kuai
  0 siblings, 0 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-26  2:55 UTC (permalink / raw)
  To: Li Nan, Yu Kuai, song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/05/25 22:00, Li Nan 写道:
> 
> 
> 在 2023/5/22 21:54, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
>>> From: Li Nan <linan122@huawei.com>
>>>
>>> Recovery will go to giveup and let chunks_skipped++ in
>>> raid10_sync_request() if there are some bad_blocks, and it will return
>>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>>> data is inconsistent but user think recovery is done, it is wrong.
>>>
>>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>>> added to here.
>>>
>>> Signed-off-by: Li Nan <linan122@huawei.com>
>>> ---
>>>   drivers/md/raid10.c | 16 +++++++++++++++-
>>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index e21502c03b45..70cc87c7ee57 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct 
>>> mddev *mddev, sector_t sector_nr,
>>>       int chunks_skipped = 0;
>>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>>       int page_idx = 0;
>>> +    int error_disk = -1;
>>>       /*
>>>        * Allow skipping a full rebuild for incremental assembly
>>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct 
>>> mddev *mddev, sector_t sector_nr,
>>>           return reshape_request(mddev, sector_nr, skipped);
>>>       if (chunks_skipped >= conf->geo.raid_disks) {
>>> -        /* if there has been nothing to do on any drive,
>>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
>>> : "recovery");
>>
>> Line exceed 80 columns, and following.
>>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, 
>>> &mddev->recovery)) {
>>
>> Resync has the same problem, right?
>>
> 
> Yes. But I have no idea to fix it. md_error disk nor set 
> recovery_disabled is a good solution. So, just print error message now.
> Do you have any ideas?

I'll look into this, in the meadtime, I don't suggest to apply this
patch because this is just temporary solution that only fix half of
the problem.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev
  2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
  2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
  2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
@ 2023-05-22 11:54 ` linan666
  2023-05-22 13:03   ` Yu Kuai
  2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
  3 siblings, 1 reply; 14+ messages in thread
From: linan666 @ 2023-05-22 11:54 UTC (permalink / raw)
  To: song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

When we remove a disk which has replacement, first set rdev to NULL
and then set replacement to rdev, finally set replacement to NULL (see
raid10_remove_disk()). If io is submitted during the same time, it might
read both rdev and replacement as NULL, and io will not be submitted.

  rdev -> NULL
			read rdev
  replacement -> NULL
			read replacement

Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.

Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 70cc87c7ee57..25a5a7b1e95c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		disk = r10_bio->devs[slot].devnum;
 		rdev = rcu_dereference(conf->mirrors[disk].replacement);
 		if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
-		    r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+		    r10_bio->devs[slot].addr + sectors >
+		    rdev->recovery_offset) {
+			/*
+			 * Read replacement first to prevent reading both rdev
+			 * and replacement as NULL during replacement replace
+			 * rdev.
+			 */
+			smp_mb();
 			rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		    }
 		if (rdev == NULL ||
 		    test_bit(Faulty, &rdev->flags))
 			continue;
@@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 	for (i = 0;  i < conf->copies; i++) {
 		int d = r10_bio->devs[i].devnum;
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
-		struct md_rdev *rrdev = rcu_dereference(
-			conf->mirrors[d].replacement);
+		struct md_rdev *rdev, *rrdev;
+
+		rrdev = rcu_dereference(conf->mirrors[d].replacement);
+		/*
+		 * Read replacement first to Prevent reading both rdev and
+		 * replacement as NULL during replacement replace rdev.
+		 */
+		smp_mb();
+		rdev = rcu_dereference(conf->mirrors[d].rdev);
 		if (rdev == rrdev)
 			rrdev = NULL;
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev
  2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-05-22 13:03   ` Yu Kuai
  0 siblings, 0 replies; 14+ messages in thread
From: Yu Kuai @ 2023-05-22 13:03 UTC (permalink / raw)
  To: linan666, song, shli, allenpeng, alexwu, bingjingc, neilb
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/05/22 19:54, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> When we remove a disk which has replacement, first set rdev to NULL
> and then set replacement to rdev, finally set replacement to NULL (see
> raid10_remove_disk()). If io is submitted during the same time, it might
> read both rdev and replacement as NULL, and io will not be submitted.
> 
>    rdev -> NULL
> 			read rdev
>    replacement -> NULL
> 			read replacement
> 
> Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
> to prevent memory reordering.

Looks good, feel free to add:

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> 
> Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 22 ++++++++++++++++++----
>   1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 70cc87c7ee57..25a5a7b1e95c 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
>   		disk = r10_bio->devs[slot].devnum;
>   		rdev = rcu_dereference(conf->mirrors[disk].replacement);
>   		if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
> -		    r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
> +		    r10_bio->devs[slot].addr + sectors >
> +		    rdev->recovery_offset) {
> +			/*
> +			 * Read replacement first to prevent reading both rdev
> +			 * and replacement as NULL during replacement replace
> +			 * rdev.
> +			 */
> +			smp_mb();
>   			rdev = rcu_dereference(conf->mirrors[disk].rdev);
> +		    }
>   		if (rdev == NULL ||
>   		    test_bit(Faulty, &rdev->flags))
>   			continue;
> @@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
>   
>   	for (i = 0;  i < conf->copies; i++) {
>   		int d = r10_bio->devs[i].devnum;
> -		struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
> -		struct md_rdev *rrdev = rcu_dereference(
> -			conf->mirrors[d].replacement);
> +		struct md_rdev *rdev, *rrdev;
> +
> +		rrdev = rcu_dereference(conf->mirrors[d].replacement);
> +		/*
> +		 * Read replacement first to Prevent reading both rdev and
> +		 * replacement as NULL during replacement replace rdev.
> +		 */
> +		smp_mb();
> +		rdev = rcu_dereference(conf->mirrors[d].rdev);
>   		if (rdev == rrdev)
>   			rrdev = NULL;
>   		if (rdev && (test_bit(Faulty, &rdev->flags)))
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] raid10 bugfix
  2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
                   ` (2 preceding siblings ...)
  2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-05-23 18:44 ` Song Liu
  2023-05-25 14:01   ` Li Nan
  3 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2023-05-23 18:44 UTC (permalink / raw)
  To: linan666
  Cc: shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
	linux-kernel, linan122, yukuai3, yi.zhang, houtao1, yangerkun

On Mon, May 22, 2023 at 4:56 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Li Nan (3):
>   md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
>   md/raid10: fix incorrect done of recovery
>   md/raid10: fix io loss while replacement replace rdev

Please address feedback and send v2.

Thanks,
Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] raid10 bugfix
  2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
@ 2023-05-25 14:01   ` Li Nan
  0 siblings, 0 replies; 14+ messages in thread
From: Li Nan @ 2023-05-25 14:01 UTC (permalink / raw)
  To: Song Liu, linan666
  Cc: shli, allenpeng, alexwu, bingjingc, neilb, linux-raid,
	linux-kernel, yukuai3, yi.zhang, houtao1, yangerkun



在 2023/5/24 2:44, Song Liu 写道:
> On Mon, May 22, 2023 at 4:56 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> Li Nan (3):
>>    md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
>>    md/raid10: fix incorrect done of recovery
>>    md/raid10: fix io loss while replacement replace rdev
> 
> Please address feedback and send v2.
> 
> Thanks,
> Song
> .

Thanks for review, I will send v2 later.
-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 0/3] raid10 bugfix
@ 2023-06-28  1:57 linan666
  0 siblings, 0 replies; 14+ messages in thread
From: linan666 @ 2023-06-28  1:57 UTC (permalink / raw)
  To: song, guoqing.jiang, colyli, xni
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Li Nan (3):
  md/raid10: check replacement and rdev to prevent submit the same io
    twice
  md/raid10: factor out get_rdev_repl_from_mirror()
  md/raid10: use get_rdev_repl_from_mirror() to get devices

 drivers/md/raid10.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-28  1:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-22 11:54 [PATCH 0/3] raid10 bugfix linan666
2023-05-22 11:54 ` [PATCH 1/3] md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request linan666
2023-05-22 13:01   ` Yu Kuai
2023-05-25 13:55     ` Li Nan
2023-05-22 11:54 ` [PATCH 2/3] md/raid10: fix incorrect done of recovery linan666
2023-05-22 13:54   ` Yu Kuai
2023-05-22 16:03     ` Song Liu
2023-05-25 14:00     ` Li Nan
2023-05-26  2:55       ` Yu Kuai
2023-05-22 11:54 ` [PATCH 3/3] md/raid10: fix io loss while replacement replace rdev linan666
2023-05-22 13:03   ` Yu Kuai
2023-05-23 18:44 ` [PATCH 0/3] raid10 bugfix Song Liu
2023-05-25 14:01   ` Li Nan
  -- strict thread matches above, loose matches on Subject: below --
2023-06-28  1:57 linan666

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).