linux-raid.vger.kernel.org archive mirror
* [PATCH v3 0/3] don't change mempool if in-flight r1bio exists
@ 2023-07-19  7:09 Xueshi Hu
  2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19  7:09 UTC (permalink / raw)
  To: song; +Cc: linux-raid, yukuai1, Xueshi Hu

All r1bio structures should be freed before raid1_reshape() changes the
mempool. However, freeze_array() doesn't guarantee there is no r1bio left,
as some r1bio may still be queued. Also, in raid1_write_request(),
handle_read_error() and raid_end_bio_io(), the kernel must not call
allow_barrier() before the r1bio has been properly handled.
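
To illustrate the ordering these patches enforce, here is a simplified
sketch of raid_end_bio_io() after patch 2 (details trimmed, see the patch
itself for the real code):

static void raid_end_bio_io(struct r1bio *r1_bio)
{
	struct r1conf *conf = r1_bio->mddev->private;
	sector_t sector = r1_bio->sector;	/* save before the r1bio is freed */

	/* ... complete the master bio ... */

	free_r1bio(r1_bio);		/* return the r1bio while the barrier is still held */
	allow_barrier(conf, sector);	/* only now may a reshape drain the barrier and swap the mempool */
}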

-> v2:
	- fix the problems one by one instead of calling
	  blk_mq_freeze_queue(), as suggested by Yu Kuai
-> v3:
	- add freeze_array_totally() to replace freeze_array() instead
	  of giving up in raid1_reshape()
	- add a missing fix in raid_end_bio_io()
	- add a small check at the start of raid1_reshape()


Xueshi Hu (3):
  md/raid1: freeze array more strictly when reshape
  md/raid1: don't allow_barrier() before r1bio got freed
  md/raid1: check array size before reshape

 drivers/md/raid1.c | 59 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 11 deletions(-)

-- 
2.40.1



* [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
  2023-07-19  7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
@ 2023-07-19  7:09 ` Xueshi Hu
  2023-07-20  1:36   ` Yu Kuai
  2023-07-19  7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
  2023-07-19  7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
  2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19  7:09 UTC (permalink / raw)
  To: song; +Cc: linux-raid, yukuai1, Xueshi Hu

When an IO error happens, reschedule_retry() will increase
r1conf::nr_queued, which unblocks freeze_array(). However, the memory
pool must not be modified before all r1bio allocated from it have been
released. Introduce freeze_array_totally() to solve the problem. Compared
to freeze_array(), it is stricter because all in-flight IO, including
queued IO, must complete.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index dd25832eb045..5605c9680818 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
 	/* Stop sync I/O and normal I/O and wait for everything to
 	 * go quiet.
 	 * This is called in two situations:
-	 * 1) management command handlers (reshape, remove disk, quiesce).
+	 * 1) management command handlers (remove disk, quiesce).
 	 * 2) one normal I/O request failed.
 
 	 * After array_frozen is set to 1, new sync IO will be blocked at
@@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
+/* conf->resync_lock should be held */
+static int get_pending(struct r1conf *conf)
+{
+	int idx, ret;
+
+	ret = atomic_read(&conf->nr_sync_pending);
+	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+		ret += atomic_read(&conf->nr_pending[idx]);
+
+	return ret;
+}
+
+static void freeze_array_totally(struct r1conf *conf)
+{
+	/*
+	 * freeze_array_totally() is almost the same with freeze_array() except
+	 * it requires there's no queued io. Raid1's reshape will destroy the
+	 * old mempool and change r1conf::raid_disks, which are necessary when
+	 * freeing the queued io.
+	 */
+	spin_lock_irq(&conf->resync_lock);
+	conf->array_frozen = 1;
+	raid1_log(conf->mddev, "freeze totally");
+	wait_event_lock_irq_cmd(
+			conf->wait_barrier,
+			get_pending(conf) == 0,
+			conf->resync_lock,
+			md_wakeup_thread(conf->mddev->thread));
+	spin_unlock_irq(&conf->resync_lock);
+}
+
 static void alloc_behind_master_bio(struct r1bio *r1_bio,
 					   struct bio *bio)
 {
@@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
 		return -ENOMEM;
 	}
 
-	freeze_array(conf, 0);
+	freeze_array_totally(conf);
 
 	/* ok, everything is stopped */
 	oldpool = conf->r1bio_pool;
-- 
2.40.1



* [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed
  2023-07-19  7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
  2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
@ 2023-07-19  7:09 ` Xueshi Hu
  2023-07-20  1:47   ` Yu Kuai
  2023-07-19  7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
  2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19  7:09 UTC (permalink / raw)
  To: song; +Cc: linux-raid, yukuai1, Xueshi Hu

allow_barrier() makes reshape possible. A raid1 reshape changes
r1conf::raid_disks and the mempool. Free the r1bio first, and only then
call allow_barrier().

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5605c9680818..62e86b7d1561 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
 {
 	struct bio *bio = r1_bio->master_bio;
 	struct r1conf *conf = r1_bio->mddev->private;
+	sector_t sector = r1_bio->sector;
 
 	/* if nobody has done the final endio yet, do it now */
 	if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
@@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
 
 		call_bio_endio(r1_bio);
 	}
+
+	free_r1bio(r1_bio);
 	/*
 	 * Wake up any possible resync thread that waits for the device
 	 * to go idle.  All I/Os, even write-behind writes, are done.
 	 */
-	allow_barrier(conf, r1_bio->sector);
-
-	free_r1bio(r1_bio);
+	allow_barrier(conf, sector);
 }
 
 /*
@@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		return;
 	}
 
+ retry_write:
 	r1_bio = alloc_r1bio(mddev, bio);
 	r1_bio->sectors = max_write_sectors;
 
@@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	 */
 
 	disks = conf->raid_disks * 2;
- retry_write:
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
-		r1_bio->state = 0;
+		free_r1bio(r1_bio);
 		allow_barrier(conf, bio->bi_iter.bi_sector);
 
 		if (bio->bi_opf & REQ_NOWAIT) {
@@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	struct mddev *mddev = conf->mddev;
 	struct bio *bio;
 	struct md_rdev *rdev;
+	sector_t sector;
 
 	clear_bit(R1BIO_ReadError, &r1_bio->state);
 	/* we got a read error. Maybe the drive is bad.  Maybe just
@@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	}
 
 	rdev_dec_pending(rdev, conf->mddev);
-	allow_barrier(conf, r1_bio->sector);
+	sector = r1_bio->sector;
 	bio = r1_bio->master_bio;
 
 	/* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
 	r1_bio->state = 0;
 	raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
+	allow_barrier(conf, sector);
 }
 
 static void raid1d(struct md_thread *thread)
-- 
2.40.1



* [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-19  7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
  2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
  2023-07-19  7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
@ 2023-07-19  7:09 ` Xueshi Hu
  2023-07-19  7:38   ` Paul Menzel
  2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19  7:09 UTC (permalink / raw)
  To: song; +Cc: linux-raid, yukuai1, Xueshi Hu

If array size doesn't changed, nothing need to do.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 drivers/md/raid1.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 62e86b7d1561..5840b8b0f9b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
 	int d, d2;
 	int ret;
 
-	memset(&newpool, 0, sizeof(newpool));
-	memset(&oldpool, 0, sizeof(oldpool));
-
 	/* Cannot change chunk_size, layout, or level */
 	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
 	    mddev->layout != mddev->new_layout ||
@@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
 		return -EINVAL;
 	}
 
+	if (mddev->delta_disks == 0)
+		return 0; /* nothing to do */
+
+	memset(&newpool, 0, sizeof(newpool));
+	memset(&oldpool, 0, sizeof(oldpool));
+
 	if (!mddev_is_clustered(mddev))
 		md_allow_write(mddev);
 
-- 
2.40.1



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-19  7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
@ 2023-07-19  7:38   ` Paul Menzel
  2023-07-19 11:51     ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Menzel @ 2023-07-19  7:38 UTC (permalink / raw)
  To: Xueshi Hu; +Cc: song, linux-raid, yukuai1

Dear Xueshi,


Thank you for your patches.

On 19.07.23 at 09:09, Xueshi Hu wrote:
> If array size doesn't changed, nothing need to do.

Maybe: … nothing needs to be done.

Do you have a test case to reproduce it?


Kind regards,

Paul


> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 62e86b7d1561..5840b8b0f9b7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>   	int d, d2;
>   	int ret;
>   
> -	memset(&newpool, 0, sizeof(newpool));
> -	memset(&oldpool, 0, sizeof(oldpool));
> -
>   	/* Cannot change chunk_size, layout, or level */
>   	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>   	    mddev->layout != mddev->new_layout ||
> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>   		return -EINVAL;
>   	}
>   
> +	if (mddev->delta_disks == 0)
> +		return 0; /* nothing to do */
> +
> +	memset(&newpool, 0, sizeof(newpool));
> +	memset(&oldpool, 0, sizeof(oldpool));
> +
>   	if (!mddev_is_clustered(mddev))
>   		md_allow_write(mddev);
>   


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-19  7:38   ` Paul Menzel
@ 2023-07-19 11:51     ` Xueshi Hu
  2023-07-20  1:28       ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 11:51 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Song Liu, linux-raid, Yu Kuai

On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> Dear Xueshi,
> 
> 
> Thank you for your patches.
> 
> On 19.07.23 at 09:09, Xueshi Hu wrote:
> > If array size doesn't changed, nothing need to do.
> 
> Maybe: … nothing needs to be done.
What a hurry, I'll be cautious next time and check the sentences with
tools.
> 
> Do you have a test case to reproduce it?
> 
Userspace command:

	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks

Kernel function calling flow:

md_attr_store()
	raid_disks_store()
		update_raid_disks()
			raid1_reshape()
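
A minimal way to exercise this path (a sketch; the array name and loop
devices are only an example setup):

	mdadm --create /dev/md10 --level=1 --raid-devices=4 \
		/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
	# raid_disks is already 4, so the write below reaches
	# raid1_reshape() with delta_disks == 0
	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks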

Maybe I shall provide more information when submit patches, thank you for
reminding me.
> 
> Kind regards,
> 
> Paul
> 
> 
> > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > ---
> >   drivers/md/raid1.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 62e86b7d1561..5840b8b0f9b7 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> >   	int d, d2;
> >   	int ret;
> > -	memset(&newpool, 0, sizeof(newpool));
> > -	memset(&oldpool, 0, sizeof(oldpool));
> > -
> >   	/* Cannot change chunk_size, layout, or level */
> >   	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> >   	    mddev->layout != mddev->new_layout ||
> > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> >   		return -EINVAL;
> >   	}
> > +	if (mddev->delta_disks == 0)
> > +		return 0; /* nothing to do */
> > +
> > +	memset(&newpool, 0, sizeof(newpool));
> > +	memset(&oldpool, 0, sizeof(oldpool));
> > +
> >   	if (!mddev_is_clustered(mddev))
> >   		md_allow_write(mddev);


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-19 11:51     ` Xueshi Hu
@ 2023-07-20  1:28       ` Yu Kuai
  2023-07-28 14:42         ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20  1:28 UTC (permalink / raw)
  To: Xueshi Hu, Paul Menzel; +Cc: Song Liu, linux-raid, yukuai (C)

Hi,

On 2023/07/19 19:51, Xueshi Hu wrote:
> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>> Dear Xueshi,
>>
>>
>> Thank you for your patches.
>>
>> On 19.07.23 at 09:09, Xueshi Hu wrote:
>>> If array size doesn't changed, nothing need to do.
>>
>> Maybe: … nothing needs to be done.
> What a hurry, I'll be cautious next time and check the sentences with
> tools.
>>
>> Do you have a test case to reproduce it?
>>
> Userspace command:
> 
> 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> 
> Kernel function calling flow:
> 
> md_attr_store()
> 	raid_disks_store()
> 		update_raid_disks()
> 			raid1_reshape()
> 
> Maybe I shall provide more information when submit patches, thank you for
> reminding me.
>>
>> Kind regards,
>>
>> Paul
>>
>>
>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>> ---
>>>    drivers/md/raid1.c | 9 ++++++---
>>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>>    	int d, d2;
>>>    	int ret;
>>> -	memset(&newpool, 0, sizeof(newpool));
>>> -	memset(&oldpool, 0, sizeof(oldpool));
>>> -
>>>    	/* Cannot change chunk_size, layout, or level */
>>>    	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>>    	    mddev->layout != mddev->new_layout ||
>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>    		return -EINVAL;
>>>    	}
>>> +	if (mddev->delta_disks == 0)
>>> +		return 0; /* nothing to do */

I think this is wrong, you should at least keep following:

         set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
         set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
         md_wakeup_thread(mddev->thread);

Thanks,
Kuai

>>> +
>>> +	memset(&newpool, 0, sizeof(newpool));
>>> +	memset(&oldpool, 0, sizeof(oldpool));
>>> +
>>>    	if (!mddev_is_clustered(mddev))
>>>    		md_allow_write(mddev);
> .
> 



* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
  2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
@ 2023-07-20  1:36   ` Yu Kuai
  2023-07-20  1:37     ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20  1:36 UTC (permalink / raw)
  To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C)

Hi,

On 2023/07/19 15:09, Xueshi Hu wrote:
> When an IO error happens, reschedule_retry() will increase
> r1conf::nr_queued, which unblocks freeze_array(). However, the memory
> pool must not be modified before all r1bio allocated from it have been
> released. Introduce freeze_array_totally() to solve the problem. Compared
> to freeze_array(), it is stricter because all in-flight IO, including
> queued IO, must complete.
> 
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>   1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index dd25832eb045..5605c9680818 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
>   	/* Stop sync I/O and normal I/O and wait for everything to
>   	 * go quiet.
>   	 * This is called in two situations:
> -	 * 1) management command handlers (reshape, remove disk, quiesce).
> +	 * 1) management command handlers (remove disk, quiesce).
>   	 * 2) one normal I/O request failed.
>   
>   	 * After array_frozen is set to 1, new sync IO will be blocked at
> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>   	wake_up(&conf->wait_barrier);
>   }
>   
> +/* conf->resync_lock should be held */
> +static int get_pending(struct r1conf *conf)
> +{
> +	int idx, ret;
> +
> +	ret = atomic_read(&conf->nr_sync_pending);
> +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +		ret += atomic_read(&conf->nr_pending[idx]);
> +
> +	return ret;
> +}
> +
> +static void freeze_array_totally(struct r1conf *conf)
> +{
> +	/*
> +	 * freeze_array_totally() is almost the same with freeze_array() except
> +	 * it requires there's no queued io. Raid1's reshape will destroy the
> +	 * old mempool and change r1conf::raid_disks, which are necessary when
> +	 * freeing the queued io.
> +	 */
> +	spin_lock_irq(&conf->resync_lock);
> +	conf->array_frozen = 1;
> +	raid1_log(conf->mddev, "freeze totally");
> +	wait_event_lock_irq_cmd(
> +			conf->wait_barrier,
> +			get_pending(conf) == 0,
> +			conf->resync_lock,
> +			md_wakeup_thread(conf->mddev->thread));
> +	spin_unlock_irq(&conf->resync_lock);
> +}
> +
>   static void alloc_behind_master_bio(struct r1bio *r1_bio,
>   					   struct bio *bio)
>   {
> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>   		return -ENOMEM;
>   	}
>   
> -	freeze_array(conf, 0);
> +	freeze_array_totally(conf);

I think this is wrong, raid1_reshape() can't be called with
'reconfig_mutex' grabbed, and this will deadlock because failed io need
this lock to be handled by daemon thread.(see details in [1]).

Be aware that never hold 'reconfig_mutex' to wait for io.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>   
>   	/* ok, everything is stopped */
>   	oldpool = conf->r1bio_pool;
> 



* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
  2023-07-20  1:36   ` Yu Kuai
@ 2023-07-20  1:37     ` Yu Kuai
  2023-07-31 14:02       ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20  1:37 UTC (permalink / raw)
  To: Yu Kuai, Xueshi Hu, song; +Cc: linux-raid, yukuai (C)

Hi,

On 2023/07/20 9:36, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/19 15:09, Xueshi Hu wrote:
>> When an IO error happens, reschedule_retry() will increase
>> r1conf::nr_queued, which unblocks freeze_array(). However, the memory
>> pool must not be modified before all r1bio allocated from it have been
>> released. Introduce freeze_array_totally() to solve the problem. Compared
>> to freeze_array(), it is stricter because all in-flight IO, including
>> queued IO, must complete.
>>
>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>> ---
>>   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>>   1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index dd25832eb045..5605c9680818 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, 
>> int extra)
>>       /* Stop sync I/O and normal I/O and wait for everything to
>>        * go quiet.
>>        * This is called in two situations:
>> -     * 1) management command handlers (reshape, remove disk, quiesce).
>> +     * 1) management command handlers (remove disk, quiesce).
>>        * 2) one normal I/O request failed.
>>        * After array_frozen is set to 1, new sync IO will be blocked at
>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>>       wake_up(&conf->wait_barrier);
>>   }
>> +/* conf->resync_lock should be held */
>> +static int get_pending(struct r1conf *conf)
>> +{
>> +    int idx, ret;
>> +
>> +    ret = atomic_read(&conf->nr_sync_pending);
>> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>> +        ret += atomic_read(&conf->nr_pending[idx]);
>> +
>> +    return ret;
>> +}
>> +
>> +static void freeze_array_totally(struct r1conf *conf)
>> +{
>> +    /*
>> +     * freeze_array_totally() is almost the same with freeze_array() 
>> except
>> +     * it requires there's no queued io. Raid1's reshape will destroy 
>> the
>> +     * old mempool and change r1conf::raid_disks, which are necessary 
>> when
>> +     * freeing the queued io.
>> +     */
>> +    spin_lock_irq(&conf->resync_lock);
>> +    conf->array_frozen = 1;
>> +    raid1_log(conf->mddev, "freeze totally");
>> +    wait_event_lock_irq_cmd(
>> +            conf->wait_barrier,
>> +            get_pending(conf) == 0,
>> +            conf->resync_lock,
>> +            md_wakeup_thread(conf->mddev->thread));
>> +    spin_unlock_irq(&conf->resync_lock);
>> +}
>> +
>>   static void alloc_behind_master_bio(struct r1bio *r1_bio,
>>                          struct bio *bio)
>>   {
>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>>           return -ENOMEM;
>>       }
>> -    freeze_array(conf, 0);
>> +    freeze_array_totally(conf);
> 
> I think this is wrong, raid1_reshape() can't be called with
Sorry about this typo, I mean raid1_reshape() can be called with ...

Thanks,
Kuai
> 'reconfig_mutex' grabbed, and this will deadlock because failed io need
> this lock to be handled by daemon thread.(see details in [1]).
> 
> Be aware that never hold 'reconfig_mutex' to wait for io.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 
> 
>>       /* ok, everything is stopped */
>>       oldpool = conf->r1bio_pool;
>>
> 
> .
> 



* Re: [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed
  2023-07-19  7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
@ 2023-07-20  1:47   ` Yu Kuai
  0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2023-07-20  1:47 UTC (permalink / raw)
  To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C)

Hi,

On 2023/07/19 15:09, Xueshi Hu wrote:
> allow_barrier() makes reshape possible. A raid1 reshape changes
> r1conf::raid_disks and the mempool. Free the r1bio first, and only then
> call allow_barrier().
> 

After adding fixes tags, feel free to add:

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 5605c9680818..62e86b7d1561 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
>   {
>   	struct bio *bio = r1_bio->master_bio;
>   	struct r1conf *conf = r1_bio->mddev->private;
> +	sector_t sector = r1_bio->sector;
>   
>   	/* if nobody has done the final endio yet, do it now */
>   	if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
> @@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
>   
>   		call_bio_endio(r1_bio);
>   	}
> +
> +	free_r1bio(r1_bio);
>   	/*
>   	 * Wake up any possible resync thread that waits for the device
>   	 * to go idle.  All I/Os, even write-behind writes, are done.
>   	 */
> -	allow_barrier(conf, r1_bio->sector);
> -
> -	free_r1bio(r1_bio);
> +	allow_barrier(conf, sector);
>   }
>   
>   /*
> @@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		return;
>   	}
>   
> + retry_write:
>   	r1_bio = alloc_r1bio(mddev, bio);
>   	r1_bio->sectors = max_write_sectors;
>   
> @@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   	 */
>   
>   	disks = conf->raid_disks * 2;
> - retry_write:
>   	blocked_rdev = NULL;
>   	rcu_read_lock();
>   	max_sectors = r1_bio->sectors;
> @@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		for (j = 0; j < i; j++)
>   			if (r1_bio->bios[j])
>   				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> -		r1_bio->state = 0;
> +		free_r1bio(r1_bio);
>   		allow_barrier(conf, bio->bi_iter.bi_sector);
>   
>   		if (bio->bi_opf & REQ_NOWAIT) {
> @@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>   	struct mddev *mddev = conf->mddev;
>   	struct bio *bio;
>   	struct md_rdev *rdev;
> +	sector_t sector;
>   
>   	clear_bit(R1BIO_ReadError, &r1_bio->state);
>   	/* we got a read error. Maybe the drive is bad.  Maybe just
> @@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>   	}
>   
>   	rdev_dec_pending(rdev, conf->mddev);
> -	allow_barrier(conf, r1_bio->sector);
> +	sector = r1_bio->sector;
>   	bio = r1_bio->master_bio;
>   
>   	/* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
>   	r1_bio->state = 0;
>   	raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
> +	allow_barrier(conf, sector);
>   }
>   
>   static void raid1d(struct md_thread *thread)
> 



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-20  1:28       ` Yu Kuai
@ 2023-07-28 14:42         ` Xueshi Hu
  2023-07-29  0:58           ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-28 14:42 UTC (permalink / raw)
  To: Yu Kuai; +Cc: song, linux-raid, pmenzel

On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/19 19:51, Xueshi Hu wrote:
> > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > Dear Xueshi,
> > > 
> > > 
> > > Thank you for your patches.
> > > 
> > > On 19.07.23 at 09:09, Xueshi Hu wrote:
> > > > If array size doesn't changed, nothing need to do.
> > > 
> > > Maybe: … nothing needs to be done.
> > What a hurry, I'll be cautious next time and check the sentences with
> > tools.
> > > 
> > > Do you have a test case to reproduce it?
> > > 
> > Userspace command:
> > 
> > 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> > 
> > Kernel function calling flow:
> > 
> > md_attr_store()
> > 	raid_disks_store()
> > 		update_raid_disks()
> > 			raid1_reshape()
> > 
> > Maybe I shall provide more information when submit patches, thank you for
> > reminding me.
> > > 
> > > Kind regards,
> > > 
> > > Paul
> > > 
> > > 
> > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > ---
> > > >    drivers/md/raid1.c | 9 ++++++---
> > > >    1 file changed, 6 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > --- a/drivers/md/raid1.c
> > > > +++ b/drivers/md/raid1.c
> > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > >    	int d, d2;
> > > >    	int ret;
> > > > -	memset(&newpool, 0, sizeof(newpool));
> > > > -	memset(&oldpool, 0, sizeof(oldpool));
> > > > -
> > > >    	/* Cannot change chunk_size, layout, or level */
> > > >    	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > >    	    mddev->layout != mddev->new_layout ||
> > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > >    		return -EINVAL;
> > > >    	}
> > > > +	if (mddev->delta_disks == 0)
> > > > +		return 0; /* nothing to do */
> 
> I think this is wrong, you should at least keep following:
> 
>         set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>         set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>         md_wakeup_thread(mddev->thread);
> 
I fail to comprehend the rationale behind the kernel's need to invoke 
raid1d() repeatedly despite the absence of any modifications.
It appears that raid1d() is responsible for invoking md_check_recovery() 
and resubmit the failed io.

Evidently, it is not within the scope of raid1_reshape() to resubmit
failed io.

Moreover, upon reviewing the Documentation[1], I could not find any
indications that reshape must undertake the actions as specified in
md_check_recovery()'s documentation, such as eliminating faulty devices.

[1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
> Thanks,
> Kuai
> 
> > > > +
> > > > +	memset(&newpool, 0, sizeof(newpool));
> > > > +	memset(&oldpool, 0, sizeof(oldpool));
> > > > +
> > > >    	if (!mddev_is_clustered(mddev))
> > > >    		md_allow_write(mddev);
> > .
> > 
> 
Thanks,
Hu


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-28 14:42         ` Xueshi Hu
@ 2023-07-29  0:58           ` Yu Kuai
  2023-07-29  3:29             ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29  0:58 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: song, linux-raid, pmenzel, yukuai (C)

Hi,

On 2023/07/28 22:42, Xueshi Hu wrote:
> On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/07/19 19:51, Xueshi Hu wrote:
>>> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>>>> Dear Xueshi,
>>>>
>>>>
>>>> Thank you for your patches.
>>>>
>>>> On 19.07.23 at 09:09, Xueshi Hu wrote:
>>>>> If array size doesn't changed, nothing need to do.
>>>>
>>>> Maybe: … nothing needs to be done.
>>> What a hurry, I'll be cautious next time and check the sentences with
>>> tools.
>>>>
>>>> Do you have a test case to reproduce it?
>>>>
>>> Userspace command:
>>>
>>> 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
>>>
>>> Kernel function calling flow:
>>>
>>> md_attr_store()
>>> 	raid_disks_store()
>>> 		update_raid_disks()
>>> 			raid1_reshape()
>>>
>>> Maybe I shall provide more information when submit patches, thank you for
>>> reminding me.
>>>>
>>>> Kind regards,
>>>>
>>>> Paul
>>>>
>>>>
>>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>>> ---
>>>>>     drivers/md/raid1.c | 9 ++++++---
>>>>>     1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>>>> --- a/drivers/md/raid1.c
>>>>> +++ b/drivers/md/raid1.c
>>>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>>>>     	int d, d2;
>>>>>     	int ret;
>>>>> -	memset(&newpool, 0, sizeof(newpool));
>>>>> -	memset(&oldpool, 0, sizeof(oldpool));
>>>>> -
>>>>>     	/* Cannot change chunk_size, layout, or level */
>>>>>     	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>>>>     	    mddev->layout != mddev->new_layout ||
>>>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>>>     		return -EINVAL;
>>>>>     	}
>>>>> +	if (mddev->delta_disks == 0)
>>>>> +		return 0; /* nothing to do */
>>
>> I think this is wrong, you should at least keep following:
>>
>>          set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>          set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>          md_wakeup_thread(mddev->thread);
>>
> I fail to comprehend the rationale behind the kernel's need to invoke
> raid1d() repeatedly despite the absence of any modifications.
> It appears that raid1d() is responsible for invoking md_check_recovery()
> and resubmit the failed io.

No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
so that md_check_recovery() can start to reshape.

Thanks,
Kuai
> 
> Evidently, it is not within the scope of raid1_reshape() to resubmit
> failed io.
> 
> Moreover, upon reviewing the Documentation[1], I could not find any
> indications that reshape must undertake the actions as specified in
> md_check_recovery()'s documentation, such as eliminating faulty devices.
> 
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
>> Thanks,
>> Kuai
>>
>>>>> +
>>>>> +	memset(&newpool, 0, sizeof(newpool));
>>>>> +	memset(&oldpool, 0, sizeof(oldpool));
>>>>> +
>>>>>     	if (!mddev_is_clustered(mddev))
>>>>>     		md_allow_write(mddev);
>>> .
>>>
>>
> Thanks,
> Hu
> .
> 



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  0:58           ` Yu Kuai
@ 2023-07-29  3:29             ` Xueshi Hu
  2023-07-29  3:36               ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29  3:29 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Sat, Jul 29, 2023 at 08:58:45AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/28 22:42, Xueshi Hu wrote:
> > On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > On 2023/07/19 19:51, Xueshi Hu wrote:
> > > > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > > > Dear Xueshi,
> > > > > 
> > > > > 
> > > > > Thank you for your patches.
> > > > > 
> > > > > On 19.07.23 at 09:09, Xueshi Hu wrote:
> > > > > > If array size doesn't changed, nothing need to do.
> > > > > 
> > > > > Maybe: … nothing needs to be done.
> > > > What a hurry, I'll be cautious next time and check the sentences with
> > > > tools.
> > > > > 
> > > > > Do you have a test case to reproduce it?
> > > > > 
> > > > Userspace command:
> > > > 
> > > > 	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> > > > 
> > > > Kernel function calling flow:
> > > > 
> > > > md_attr_store()
> > > > 	raid_disks_store()
> > > > 		update_raid_disks()
> > > > 			raid1_reshape()
> > > > 
> > > > Maybe I shall provide more information when submit patches, thank you for
> > > > reminding me.
> > > > > 
> > > > > Kind regards,
> > > > > 
> > > > > Paul
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > > > ---
> > > > > >     drivers/md/raid1.c | 9 ++++++---
> > > > > >     1 file changed, 6 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > > > --- a/drivers/md/raid1.c
> > > > > > +++ b/drivers/md/raid1.c
> > > > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > >     	int d, d2;
> > > > > >     	int ret;
> > > > > > -	memset(&newpool, 0, sizeof(newpool));
> > > > > > -	memset(&oldpool, 0, sizeof(oldpool));
> > > > > > -
> > > > > >     	/* Cannot change chunk_size, layout, or level */
> > > > > >     	if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > > > >     	    mddev->layout != mddev->new_layout ||
> > > > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > >     		return -EINVAL;
> > > > > >     	}
> > > > > > +	if (mddev->delta_disks == 0)
> > > > > > +		return 0; /* nothing to do */
> > > 
> > > I think this is wrong, you should at least keep following:
> > > 
> > >          set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > >          set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > >          md_wakeup_thread(mddev->thread);
> > > 
> > I fail to comprehend the rationale behind the kernel's need to invoke
> > raid1d() repeatedly despite the absence of any modifications.
> > It appears that raid1d() is responsible for invoking md_check_recovery()
> > and resubmit the failed io.
> 
> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> so that md_check_recovery() can start to reshape.
I apologize, but I am still unable to comprehend your idea.
If mmdev->delta_disks == 0 , what is the purpose of invoking
md_check_recovery() again?
> 
> Thanks,
> Kuai
> > 
> > Evidently, it is not within the scope of raid1_reshape() to resubmit
> > failed io.
> > 
> > Moreover, upon reviewing the Documentation[1], I could not find any
> > indications that reshape must undertake the actions as specified in
> > md_check_recovery()'s documentation, such as eliminating faulty devices.
> > 
> > [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
> > > Thanks,
> > > Kuai
> > > 
> > > > > > +
> > > > > > +	memset(&newpool, 0, sizeof(newpool));
> > > > > > +	memset(&oldpool, 0, sizeof(oldpool));
> > > > > > +
> > > > > >     	if (!mddev_is_clustered(mddev))
> > > > > >     		md_allow_write(mddev);
> > > > .
> > > > 
> > > 
> > Thanks,
> > Hu
> > .
> > 
> 


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  3:29             ` Xueshi Hu
@ 2023-07-29  3:36               ` Yu Kuai
  2023-07-29  3:51                 ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29  3:36 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/29 11:29, Xueshi Hu wrote:
>>>> I think this is wrong, you should at least keep following:
>>>>
>>>>           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>           md_wakeup_thread(mddev->thread);
>>>>
>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>> raid1d() repeatedly despite the absence of any modifications.
>>> It appears that raid1d() is responsible for invoking md_check_recovery()
>>> and resubmit the failed io.
>>
>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>> so that md_check_recovery() can start to reshape.
> I apologize, but I am still unable to comprehend your idea.
> If mmdev->delta_disks == 0 , what is the purpose of invoking
> md_check_recovery() again?

Sounds like you think raid1_reshape() can only be called from
md_check_recovery(), please take a look at other callers.

Thanks,
Kuai



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  3:36               ` Yu Kuai
@ 2023-07-29  3:51                 ` Yu Kuai
  2023-07-29  6:16                   ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29  3:51 UTC (permalink / raw)
  To: Yu Kuai, Xueshi Hu; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/29 11:36, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/29 11:29, Xueshi Hu wrote:
>>>>> I think this is wrong, you should at least keep following:
>>>>>
>>>>>           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>           md_wakeup_thread(mddev->thread);
>>>>>
>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>> raid1d() repeatedly despite the absence of any modifications.
>>>> It appears that raid1d() is responsible for invoking 
>>>> md_check_recovery()
>>>> and resubmit the failed io.
>>>
>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>> so that md_check_recovery() can start to reshape.
>> I apologize, but I am still unable to comprehend your idea.
>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>> md_check_recovery() again?
> 
> Sounds like you think raid1_reshape() can only be called from
> md_check_recovery(), please take a look at other callers.

And even if raid1_reshape() is called from md_check_recovery(),
which means reshape is interupted, then MD_RECOVERY_RECOVER
and MD_RECOVERY_RECOVER need to be set, so that reshape can
continue.

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
> .
> 



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  3:51                 ` Yu Kuai
@ 2023-07-29  6:16                   ` Xueshi Hu
  2023-07-29  7:37                     ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29  6:16 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/29 11:36, Yu Kuai wrote:
> > Hi,
> > 
> > On 2023/07/29 11:29, Xueshi Hu wrote:
> > > > > > I think this is wrong, you should at least keep following:
> > > > > > 
> > > > > >           set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > >           set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > >           md_wakeup_thread(mddev->thread);
> > > > > > 
> > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > It appears that raid1d() is responsible for invoking
> > > > > md_check_recovery()
> > > > > and resubmit the failed io.
> > > > 
> > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > so that md_check_recovery() can start to reshape.
> > > I apologize, but I am still unable to comprehend your idea.
> > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > md_check_recovery() again?
> > 
> > Sounds like you think raid1_reshape() can only be called from
> > md_check_recovery(), please take a look at other callers.
Thank you for the quick reply and patience.

Of course, I have checked all the caller of md_personality::check_reshape.

- layout_store
- action_store
- chunk_size_store
- md_ioctl
	__md_set_array_info 
		update_array_info 
- md_check_recovery
- md_ioctl
	__md_set_array_info
		update_array_info
			update_raid_disks
- process_metadata_update
	md_reload_sb
		check_sb_changes
			update_raid_disks
- raid_disks_store
	update_raid_disks

There are three categories of callers except md_check_recovery().
1. write to sysfs
2. ioctl
3. revice instructions from md cluster peer

Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
example, if the mddev::raid_disks is already 4, I don't think 
raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
bit, then wake up the mddev::thread to call md_check_recovery().

Is there any counterexample to demonstrate the issues that may arise if
md_check_recovery() is not called out of raid1_reshape() ?

> 
> And even if raid1_reshape() is called from md_check_recovery(),
> which means reshape is interupted, then MD_RECOVERY_RECOVER
> and MD_RECOVERY_RECOVER need to be set, so that reshape can
> continue.

raid1 only register md_personality::check_reshape, all the work related
with reshape are handle in md_personality::check_reshape.
What's the meaning of "reshape can continue" ? In another word, what's
the additional work need to do if mddev::raid_disks doesn't change ?

Thanks,
Hu

> 
> Thanks,
> Kuai
> > 
> > Thanks,
> > Kuai
> > 
> > .
> > 
> 


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  6:16                   ` Xueshi Hu
@ 2023-07-29  7:37                     ` Yu Kuai
  2023-07-29 12:23                       ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29  7:37 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/29 14:16, Xueshi Hu wrote:
> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/07/29 11:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/07/29 11:29, Xueshi Hu wrote:
>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>
>>>>>>>            set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>>            set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>>            md_wakeup_thread(mddev->thread);
>>>>>>>
>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>> It appears that raid1d() is responsible for invoking
>>>>>> md_check_recovery()
>>>>>> and resubmit the failed io.
>>>>>
>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>> so that md_check_recovery() can start to reshape.
>>>> I apologize, but I am still unable to comprehend your idea.
>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>> md_check_recovery() again?
>>>
>>> Sounds like you think raid1_reshape() can only be called from
>>> md_check_recovery(), please take a look at other callers.
> Thank you for the quick reply and patience.
> 
> Of course, I have checked all the caller of md_personality::check_reshape.
> 
> - layout_store
> - action_store
> - chunk_size_store
> - md_ioctl
> 	__md_set_array_info
> 		update_array_info
> - md_check_recovery
> - md_ioctl
> 	__md_set_array_info
> 		update_array_info
> 			update_raid_disks
> - process_metadata_update
> 	md_reload_sb
> 		check_sb_changes
> 			update_raid_disks
> - raid_disks_store
> 	update_raid_disks
> 
> There are three categories of callers except md_check_recovery().
> 1. write to sysfs
> 2. ioctl
> 3. revice instructions from md cluster peer
> 
> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> example, if the mddev::raid_disks is already 4, I don't think
> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> bit, then wake up the mddev::thread to call md_check_recovery().
> 
> Is there any counterexample to demonstrate the issues that may arise if
> md_check_recovery() is not called out of raid1_reshape() ?
> 
>>
>> And even if raid1_reshape() is called from md_check_recovery(),
>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>> continue.
> 
> raid1 only register md_personality::check_reshape, all the work related
> with reshape are handle in md_personality::check_reshape.
> What's the meaning of "reshape can continue" ? In another word, what's
> the additional work need to do if mddev::raid_disks doesn't change ?

Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
really means recovery, synchronize data to new disks. Never mind what
I said "reshape can continue", it's not possible for raid1.

Then the problem is the same from 'recovery' point of view:
if the 'recovery' is interrupted, before this patch, even if raid_disks
is the same, raid1_reshape() will still set the flag and then
md_check_recovery() will try to continue the recovery. I'd like to
keep this behaviour untill it can be sure that no user will be
affected.
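
Something like the following would keep that behaviour while still
returning early (just a sketch combining patch 3's early return with the
flags quoted above, not a tested patch):

	if (mddev->delta_disks == 0) {
		/*
		 * No raid_disks/mempool change, but still let the daemon
		 * thread resume an interrupted recovery.
		 */
		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return 0;
	}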

Thanks,
Kuai

> 
> Thanks,
> Hu
> 
>>
>> Thanks,
>> Kuai
>>>
>>> Thanks,
>>> Kuai
>>>
>>> .
>>>
>>
> .
> 



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29  7:37                     ` Yu Kuai
@ 2023-07-29 12:23                       ` Xueshi Hu
  2023-07-31  1:03                         ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29 12:23 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/29 14:16, Xueshi Hu wrote:
> > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > On 2023/07/29 11:36, Yu Kuai wrote:
> > > > Hi,
> > > > 
> > > > On 2023/07/29 11:29, Xueshi Hu wrote:
> > > > > > > > I think this is wrong, you should at least keep following:
> > > > > > > > 
> > > > > > > >            set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > >            set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > >            md_wakeup_thread(mddev->thread);
> > > > > > > > 
> > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > md_check_recovery()
> > > > > > > and resubmit the failed io.
> > > > > > 
> > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > so that md_check_recovery() can start to reshape.
> > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > > > md_check_recovery() again?
> > > > 
> > > > Sounds like you think raid1_reshape() can only be called from
> > > > md_check_recovery(), please take a look at other callers.
> > Thank you for the quick reply and patience.
> > 
> > Of course, I have checked all the caller of md_personality::check_reshape.
> > 
> > - layout_store
> > - action_store
> > - chunk_size_store
> > - md_ioctl
> > 	__md_set_array_info
> > 		update_array_info
> > - md_check_recovery
> > - md_ioctl
> > 	__md_set_array_info
> > 		update_array_info
> > 			update_raid_disks
> > - process_metadata_update
> > 	md_reload_sb
> > 		check_sb_changes
> > 			update_raid_disks
> > - raid_disks_store
> > 	update_raid_disks
> > 
> > There are three categories of callers except md_check_recovery().
> > 1. write to sysfs
> > 2. ioctl
> > 3. revice instructions from md cluster peer
> > 
> > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > example, if the mddev::raid_disks is already 4, I don't think
> > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > bit, then wake up the mddev::thread to call md_check_recovery().
> > 
> > Is there any counterexample to demonstrate the issues that may arise if
> > md_check_recovery() is not called out of raid1_reshape() ?
> > 
> > > 
> > > And even if raid1_reshape() is called from md_check_recovery(),
> > > which means reshape is interupted, then MD_RECOVERY_RECOVER
> > > and MD_RECOVERY_RECOVER need to be set, so that reshape can
> > > continue.
> > 
> > raid1 only register md_personality::check_reshape, all the work related
> > with reshape are handle in md_personality::check_reshape.
> > What's the meaning of "reshape can continue" ? In another word, what's
> > the additional work need to do if mddev::raid_disks doesn't change ?
> 
> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
> really means recovery, synchronize data to new disks. Never mind what
> I said "reshape can continue", it's not possible for raid1.
> 
> Then the problem is the same from 'recovery' point of view:
> if the 'recovery' is interrupted, before this patch, even if raid_disks
> is the same, raid1_reshape() will still set the flag and then
> md_check_recovery() will try to continue the recovery. I'd like to
> keep this behaviour untill it can be sure that no user will be
> affected.

But md_check_recovery() will never call raid1_reshape().

md_personality::check_reshape() is called from md_check_recovery() only
when a reshape is in progress. But raid1 is special, as
mddev::reshape_position always equals MaxSector in raid1.

By the way, the concern in the v2 patch[1] is unnecessary for the same
reason.

[1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

> 
> Thanks,
> Kuai
> 
> > 
> > Thanks,
> > Hu
> > 
> > > 
> > > Thanks,
> > > Kuai
> > > > 
> > > > Thanks,
> > > > Kuai
> > > > 
> > > > .
> > > > 
> > > 
> > .
> > 
> 


* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29 12:23                       ` Xueshi Hu
@ 2023-07-31  1:03                         ` Yu Kuai
  2023-07-31  3:48                           ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31  1:03 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/29 20:23, Xueshi Hu wrote:
> On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/07/29 14:16, Xueshi Hu wrote:
>>> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2023/07/29 11:36, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2023/07/29 11:29, Xueshi Hu wrote:
>>>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>>>
>>>>>>>>>             set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>>>>             set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>>>>             md_wakeup_thread(mddev->thread);
>>>>>>>>>
>>>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>>>> It appears that raid1d() is responsible for invoking
>>>>>>>> md_check_recovery()
>>>>>>>> and resubmit the failed io.
>>>>>>>
>>>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>>>> so that md_check_recovery() can start to reshape.
>>>>>> I apologize, but I am still unable to comprehend your idea.
>>>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>>>> md_check_recovery() again?
>>>>>
>>>>> Sounds like you think raid1_reshape() can only be called from
>>>>> md_check_recovery(), please take a look at other callers.
>>> Thank you for the quick reply and patience.
>>>
>>> Of course, I have checked all the caller of md_personality::check_reshape.
>>>
>>> - layout_store
>>> - action_store
>>> - chunk_size_store
>>> - md_ioctl
>>> 	__md_set_array_info
>>> 		update_array_info
>>> - md_check_recovery
>>> - md_ioctl
>>> 	__md_set_array_info
>>> 		update_array_info
>>> 			update_raid_disks
>>> - process_metadata_update
>>> 	md_reload_sb
>>> 		check_sb_changes
>>> 			update_raid_disks
>>> - raid_disks_store
>>> 	update_raid_disks
>>>
>>> There are three categories of callers except md_check_recovery().
>>> 1. write to sysfs
>>> 2. ioctl
>>> 3. revice instructions from md cluster peer
>>>
>>> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
>>> example, if the mddev::raid_disks is already 4, I don't think
>>> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
>>> bit, then wake up the mddev::thread to call md_check_recovery().
>>>
>>> Is there any counterexample to demonstrate the issues that may arise if
>>> md_check_recovery() is not called out of raid1_reshape() ?
>>>
>>>>
>>>> And even if raid1_reshape() is called from md_check_recovery(),
>>>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>>>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>>>> continue.
>>>
>>> raid1 only register md_personality::check_reshape, all the work related
>>> with reshape are handle in md_personality::check_reshape.
>>> What's the meaning of "reshape can continue" ? In another word, what's
>>> the additional work need to do if mddev::raid_disks doesn't change ?
>>
>> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
>> really means recovery, synchronize data to new disks. Never mind what
>> I said "reshape can continue", it's not possible for raid1.
>>
>> Then the problem is the same from 'recovery' point of view:
>> if the 'recovery' is interrupted, before this patch, even if raid_disks
>> is the same, raid1_reshape() will still set the flag and then
>> md_check_recovery() will try to continue the recovery. I'd like to
>> keep this behaviour untill it can be sure that no user will be
>> affected.
> 
> But md_check_recovery() will never call raid1_reshape().
> 
> md_personality::check_reshape() is called from md_check_recovery() only
> when a reshape is in progress. But raid1 is special, as
> mddev::reshape_position always equals MaxSector in raid1.
> 
> By the way, the concern in the v2 patch[1] is unnecessary for the same
> reason.
> 

Well... I just said that 'reshape can continue' is not possible for raid1,
and this patch will cause recovery to be unable to continue in some cases.

Thanks,
Kuai

> [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
> 
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Thanks,
>>> Hu
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>> .
>>>>>
>>>>
>>> .
>>>
>>
> .
> 



* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31  1:03                         ` Yu Kuai
@ 2023-07-31  3:48                           ` Xueshi Hu
  2023-07-31  6:22                             ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31  3:48 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Mon, Jul 31, 2023 at 09:03:52AM +0800, Yu Kuai wrote:
> Hi,
> 
> > On 2023/07/29 20:23, Xueshi Hu wrote:
> > On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > On 2023/07/29 14:16, Xueshi Hu wrote:
> > > > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > > > Hi,
> > > > > 
> > > > > On 2023/07/29 11:36, Yu Kuai wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On 2023/07/29 11:29, Xueshi Hu wrote:
> > > > > > > > > > I think this is wrong, you should at least keep the following:
> > > > > > > > > > 
> > > > > > > > > >             set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > > > >             set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > > > >             md_wakeup_thread(mddev->thread);
> > > > > > > > > > 
> > > > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > > > md_check_recovery()
> > > > > > > > > and resubmitting the failed io.
> > > > > > > > 
> > > > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > > > so that md_check_recovery() can start to reshape.
> > > > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > > > If mddev->delta_disks == 0, what is the purpose of invoking
> > > > > > > md_check_recovery() again?
> > > > > > 
> > > > > > Sounds like you think raid1_reshape() can only be called from
> > > > > > md_check_recovery(), please take a look at other callers.
> > > > Thank you for the quick reply and patience.
> > > > 
> > > > Of course, I have checked all the callers of md_personality::check_reshape.
> > > > 
> > > > - layout_store
> > > > - action_store
> > > > - chunk_size_store
> > > > - md_ioctl
> > > > 	__md_set_array_info
> > > > 		update_array_info
> > > > - md_check_recovery
> > > > - md_ioctl
> > > > 	__md_set_array_info
> > > > 		update_array_info
> > > > 			update_raid_disks
> > > > - process_metadata_update
> > > > 	md_reload_sb
> > > > 		check_sb_changes
> > > > 			update_raid_disks
> > > > - raid_disks_store
> > > > 	update_raid_disks
> > > > 
> > > > There are three categories of callers other than md_check_recovery():
> > > > 1. writes to sysfs
> > > > 2. ioctls
> > > > 3. instructions received from an md cluster peer
> > > > 
> > > > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > > > example, if the mddev::raid_disks is already 4, I don't think
> > > > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > > > bit, then wake up the mddev::thread to call md_check_recovery().
> > > > 
> > > > Is there any counterexample to demonstrate the issues that may arise if
> > > > md_check_recovery() is not called out of raid1_reshape() ?
> > > > 
> > > > > 
> > > > > And even if raid1_reshape() is called from md_check_recovery(),
> > > > > which means the reshape was interrupted, then MD_RECOVERY_RECOVER
> > > > > and MD_RECOVERY_NEEDED need to be set, so that the reshape can
> > > > > continue.
> > > > 
> > > > raid1 only registers md_personality::check_reshape; all the work related
> > > > to reshape is handled in md_personality::check_reshape.
> > > > What does "reshape can continue" mean? In other words, what
> > > > additional work needs to be done if mddev::raid_disks doesn't change?
> > > 
> > > Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
> > > really means recovery, i.e. synchronizing data to new disks. Never mind
> > > what I said about "reshape can continue", it's not possible for raid1.
> > > 
> > > Then the problem is the same from the 'recovery' point of view:
> > > if the 'recovery' is interrupted, before this patch, even if raid_disks
> > > is the same, raid1_reshape() will still set the flags and then
> > > md_check_recovery() will try to continue the recovery. I'd like to
> > > keep this behaviour until it is certain that no user will be
> > > affected.
> > 
> > But md_check_recovery() will never call raid1_reshape().
> > 
> > md_personality::check_reshape() is called from md_check_recovery() only when
> > a reshape is in progress. But raid1 is special, as
> > mddev::reshape_position always equals MaxSector in raid1.
> > 
> > By the way, the concern about the V2 patch[1] is unnecessary for the same
> > reason.
> > 
> 
> Well... I just said that "reshape can continue" is not possible for raid1, and
> this patch will cause recovery to be unable to continue in some cases.
I see. I will reread the relevant code to gain a better understanding of
"some cases".

Thanks,
Hu
> 
> Thanks,
> Kuai
> 
> > [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
> > 
> > > 
> > > Thanks,
> > > Kuai
> > > 
> > > > 
> > > > Thanks,
> > > > Hu
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Kuai
> > > > > > 
> > > > > > Thanks,
> > > > > > Kuai
> > > > > > 
> > > > > > .
> > > > > > 
> > > > > 
> > > > .
> > > > 
> > > 
> > .
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31  3:48                           ` Xueshi Hu
@ 2023-07-31  6:22                             ` Yu Kuai
  2023-07-31 14:12                               ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31  6:22 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/31 11:48, Xueshi Hu wrote:
>> Well... I just said that "reshape can continue" is not possible for raid1, and
>> this patch will cause recovery to be unable to continue in some cases.

> I see. I will reread the relevant code to gain a better understanding of
> "some cases".

It's not that complex at all; the key point is whether your changes
introduce functional changes and, if so, whether the effect has been fully evaluated.

In this case, raid1_reshape() (for all the callers) will set
MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
thread must check whether recovery is needed; if so, it will start recovery.

I agree that recovery is probably not needed in the case where
raid_disks stays the same. However, note that recovery can be interrupted
by the user, and raid1_reshape() can be triggered by the user as well, so
the following procedure will end up behaving differently:

1. trigger a recovery;
2. interrupt the recovery;
3. trigger a raid1_reshape();

Before this patch, the interrupted recovery will continue.

This is minor, but I can't say that no user will be affected, and I
really prefer to keep this behaviour, which means this patch can just do
some cleanup without introducing functional changes.
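A minimal sketch of the behaviour to be kept (hypothetical, not the posted
patch): even if raid1_reshape() returns early when raid_disks is unchanged,
it would still kick the daemon thread so an interrupted recovery can resume:

	/* hypothetical early-return path that preserves the recovery kick */
	if (mddev->delta_disks == 0) {
		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return 0;
	}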

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
  2023-07-20  1:37     ` Yu Kuai
@ 2023-07-31 14:02       ` Xueshi Hu
  2023-08-01  1:24         ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 14:02 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
> Hi,
> 
> > On 2023/07/20 9:36, Yu Kuai wrote:
> > Hi,
> > 
> > > On 2023/07/19 15:09, Xueshi Hu wrote:
> > > When an IO error happens, reschedule_retry() will increase
> > > r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> > > all r1bio in the memory pool are released, the memory pool should not be
> > > modified. Introduce freeze_array_totally() to solve the problem. Compared
> > > to freeze_array(), it's more strict because any in-flight io needs to
> > > complete including queued io.
> > > 
> > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > ---
> > >   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
> > >   1 file changed, 33 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index dd25832eb045..5605c9680818 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
> > > int extra)
> > >       /* Stop sync I/O and normal I/O and wait for everything to
> > >        * go quiet.
> > >        * This is called in two situations:
> > > -     * 1) management command handlers (reshape, remove disk, quiesce).
> > > +     * 1) management command handlers (remove disk, quiesce).
> > >        * 2) one normal I/O request failed.
> > >        * After array_frozen is set to 1, new sync IO will be blocked at
> > > @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
> > >       wake_up(&conf->wait_barrier);
> > >   }
> > > +/* conf->resync_lock should be held */
> > > +static int get_pending(struct r1conf *conf)
> > > +{
> > > +    int idx, ret;
> > > +
> > > +    ret = atomic_read(&conf->nr_sync_pending);
> > > +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> > > +        ret += atomic_read(&conf->nr_pending[idx]);
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +static void freeze_array_totally(struct r1conf *conf)
> > > +{
> > > +    /*
> > > +     * freeze_array_totally() is almost the same with
> > > freeze_array() except
> > > +     * it requires there's no queued io. Raid1's reshape will
> > > destroy the
> > > +     * old mempool and change r1conf::raid_disks, which are
> > > necessary when
> > > +     * freeing the queued io.
> > > +     */
> > > +    spin_lock_irq(&conf->resync_lock);
> > > +    conf->array_frozen = 1;
> > > +    raid1_log(conf->mddev, "freeze totally");
> > > +    wait_event_lock_irq_cmd(
> > > +            conf->wait_barrier,
> > > +            get_pending(conf) == 0,
> > > +            conf->resync_lock,
> > > +            md_wakeup_thread(conf->mddev->thread));
> > > +    spin_unlock_irq(&conf->resync_lock);
> > > +}
> > > +
> > >   static void alloc_behind_master_bio(struct r1bio *r1_bio,
> > >                          struct bio *bio)
> > >   {
> > > @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
> > >           return -ENOMEM;
> > >       }
> > > -    freeze_array(conf, 0);
> > > +    freeze_array_totally(conf);
> > 
> > I think this is wrong, raid1_reshape() can't be called with
> > Sorry about this typo, I meant raid1_reshape() can be called with ...
You're right, this is indeed a deadlock.

I am wondering whether this approach is viable:

	if (unlikely(atomic_read(conf->nr_queued))) {
		kfree(newpoolinfo);
		mempool_exit(&newpool);
		unfreeze_array(conf);

		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return -EBUSY;
	}

Thanks,
Hu

> 
> Thanks,
> Kuai
> > 'reconfig_mutex' grabbed, and this will deadlock because failed io needs
> > this lock to be handled by the daemon thread (see details in [1]).
> > 
> > Be aware to never hold 'reconfig_mutex' while waiting for io.
> > 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
> > 
> > >       /* ok, everything is stopped */
> > >       oldpool = conf->r1bio_pool;
> > > 
> > 
> > .
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31  6:22                             ` Yu Kuai
@ 2023-07-31 14:12                               ` Xueshi Hu
  0 siblings, 0 replies; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 14:12 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Mon, Jul 31, 2023 at 02:22:31PM +0800, Yu Kuai wrote:
> Hi,
> 
On 2023/07/31 11:48, Xueshi Hu wrote:
> > > Well... I just said that "reshape can continue" is not possible for raid1, and
> > > this patch will cause recovery to be unable to continue in some cases.
> 
> > I see. I will reread the relevant code to gain a better understanding of
> > "some cases".
> 
> It's not that complex at all; the key point is whether your changes
> introduce functional changes and, if so, whether the effect has been fully evaluated.
> 
> In this case, raid1_reshape() (for all the callers) will set
> MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
> thread must check whether recovery is needed; if so, it will start recovery.
> 
> I agree that recovery is probably not needed in the case where
> raid_disks stays the same. However, note that recovery can be interrupted
> by the user, and raid1_reshape() can be triggered by the user as well, so
> the following procedure will end up behaving differently:
> 
> 1. trigger a recovery;
> 2. interrupt the recovery;
> 3. trigger a raid1_reshape();
Sincere gratitude for your detailed response. I have been consistently
misunderstanding the concept of "interrupted recovery." I now understand
your concerns.

Thanks,
Hu
> 
> Before this patch, the interrupted recovery will continue.
> 
> This is minor, but I can't say that no user will be affected, and I
> really prefer to keep this behaviour, which means this patch can just do
> some cleanup without introducing functional changes.
> 
> Thanks,
> Kuai
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
  2023-07-31 14:02       ` Xueshi Hu
@ 2023-08-01  1:24         ` Yu Kuai
  0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2023-08-01  1:24 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/31 22:02, Xueshi Hu wrote:
> On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/07/20 9:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/07/19 15:09, Xueshi Hu wrote:
>>>> When an IO error happens, reschedule_retry() will increase
>>>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
>>>> all r1bio in the memory pool are released, the memory pool should not be
>>>> modified. Introduce freeze_array_totally() to solve the problem. Compared
>>>> to freeze_array(), it's more strict because any in-flight io needs to
>>>> complete including queued io.
>>>>
>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>> ---
>>>>    drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 33 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index dd25832eb045..5605c9680818 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
>>>> int extra)
>>>>        /* Stop sync I/O and normal I/O and wait for everything to
>>>>         * go quiet.
>>>>         * This is called in two situations:
>>>> -     * 1) management command handlers (reshape, remove disk, quiesce).
>>>> +     * 1) management command handlers (remove disk, quiesce).
>>>>         * 2) one normal I/O request failed.
>>>>         * After array_frozen is set to 1, new sync IO will be blocked at
>>>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>>>>        wake_up(&conf->wait_barrier);
>>>>    }
>>>> +/* conf->resync_lock should be held */
>>>> +static int get_pending(struct r1conf *conf)
>>>> +{
>>>> +    int idx, ret;
>>>> +
>>>> +    ret = atomic_read(&conf->nr_sync_pending);
>>>> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>>>> +        ret += atomic_read(&conf->nr_pending[idx]);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void freeze_array_totally(struct r1conf *conf)
>>>> +{
>>>> +    /*
>>>> +     * freeze_array_totally() is almost the same with
>>>> freeze_array() except
>>>> +     * it requires there's no queued io. Raid1's reshape will
>>>> destroy the
>>>> +     * old mempool and change r1conf::raid_disks, which are
>>>> necessary when
>>>> +     * freeing the queued io.
>>>> +     */
>>>> +    spin_lock_irq(&conf->resync_lock);
>>>> +    conf->array_frozen = 1;
>>>> +    raid1_log(conf->mddev, "freeze totally");
>>>> +    wait_event_lock_irq_cmd(
>>>> +            conf->wait_barrier,
>>>> +            get_pending(conf) == 0,
>>>> +            conf->resync_lock,
>>>> +            md_wakeup_thread(conf->mddev->thread));
>>>> +    spin_unlock_irq(&conf->resync_lock);
>>>> +}
>>>> +
>>>>    static void alloc_behind_master_bio(struct r1bio *r1_bio,
>>>>                           struct bio *bio)
>>>>    {
>>>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>>>>            return -ENOMEM;
>>>>        }
>>>> -    freeze_array(conf, 0);
>>>> +    freeze_array_totally(conf);
>>>
>>> I think this is wrong, raid1_reshape() can't be called with
>> Sorry about this typo, I meant raid1_reshape() can be called with ...
> You're right, this is indeed a deadlock.
> 
> I am wondering whether this approach is viable:
> 
> 	if (unlikely(atomic_read(conf->nr_queued))) {
> 		kfree(newpoolinfo);
> 		mempool_exit(&newpool);
> 		unfreeze_array(conf);
> 
> 		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> 		md_wakeup_thread(mddev->thread);
> 		return -EBUSY;
> 	}

This is not okay: 'nr_queued' can be increased at any time when normal io
fails, so reading it once doesn't mean anything, and you need to
freeze_array() before reading it:

freeze_array
// guarantee new io won't be dispatched
if (atomic_read(conf->nr_queued))
  ...
  unfreeze_array
  return -EBUSY;

Fortunately, I'm working on another patchset to synchronize io with
array configuration, which means all the callers of raid1_reshape() will
suspend the array and no normal io will be in progress, hence this
problem won't exist either.
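A concrete sketch of that ordering (hypothetical helper, not taken from the
patchset mentioned above; note that conf->nr_queued is a per-bucket array, so
the counters have to be summed):

/*
 * Hypothetical helper: freeze first so no new failed io can be queued,
 * then read the queued counters and bail out with -EBUSY if the daemon
 * thread still holds any r1bio.
 */
static int raid1_try_freeze_for_reshape(struct r1conf *conf)
{
	int idx, nr_queued = 0;

	freeze_array(conf, 0);
	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
		nr_queued += atomic_read(&conf->nr_queued[idx]);

	if (nr_queued) {
		unfreeze_array(conf);
		return -EBUSY;
	}
	return 0;	/* frozen, and no queued r1bio left */
}

raid1_reshape() could then call such a helper before swapping the mempool and
simply bail out on -EBUSY, instead of freezing unconditionally.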

Thanks,
Kuai

> 
> Thanks,
> Hu
> 
>>
>> Thanks,
>> Kuai
>>> 'reconfig_mutex' grabbed, and this will deadlock because failed io needs
>>> this lock to be handled by the daemon thread (see details in [1]).
>>>
>>> Be aware to never hold 'reconfig_mutex' while waiting for io.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>>>
>>>>        /* ok, everything is stopped */
>>>>        oldpool = conf->r1bio_pool;
>>>>
>>>
>>> .
>>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-08-01  1:24 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-19  7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
2023-07-20  1:36   ` Yu Kuai
2023-07-20  1:37     ` Yu Kuai
2023-07-31 14:02       ` Xueshi Hu
2023-08-01  1:24         ` Yu Kuai
2023-07-19  7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
2023-07-20  1:47   ` Yu Kuai
2023-07-19  7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2023-07-19  7:38   ` Paul Menzel
2023-07-19 11:51     ` Xueshi Hu
2023-07-20  1:28       ` Yu Kuai
2023-07-28 14:42         ` Xueshi Hu
2023-07-29  0:58           ` Yu Kuai
2023-07-29  3:29             ` Xueshi Hu
2023-07-29  3:36               ` Yu Kuai
2023-07-29  3:51                 ` Yu Kuai
2023-07-29  6:16                   ` Xueshi Hu
2023-07-29  7:37                     ` Yu Kuai
2023-07-29 12:23                       ` Xueshi Hu
2023-07-31  1:03                         ` Yu Kuai
2023-07-31  3:48                           ` Xueshi Hu
2023-07-31  6:22                             ` Yu Kuai
2023-07-31 14:12                               ` Xueshi Hu
