* [PATCH v3 0/3] don't change mempool if in-flight r1bio exists
@ 2023-07-19 7:09 Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw)
To: song; +Cc: linux-raid, yukuai1, Xueshi Hu
All r1bios should be freed before raid1_reshape() changes the
mempool. However, freeze_array() doesn't guarantee there is no r1bio
left, as some r1bios may still be queued. Also, in raid1_write_request(),
handle_read_error() and raid_end_bio_io(), the kernel must not call
allow_barrier() before the r1bio is properly handled.
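As a minimal sketch of the ordering this series enforces (the helpers are
the existing raid1.c ones; the surrounding code is elided, so this is
illustrative rather than a literal excerpt):

	/* before: once the barrier drops, reshape may swap conf->r1bio_pool */
	allow_barrier(conf, r1_bio->sector);
	free_r1bio(r1_bio);			/* may free into the wrong pool */

	/* after: return the r1bio to the old pool first, then drop the barrier */
	sector_t sector = r1_bio->sector;

	free_r1bio(r1_bio);
	allow_barrier(conf, sector);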
-> v2:
- fix the problems one by one instead of calling
blk_mq_freeze_queue(), as suggested by Yu Kuai
-> v3:
- add freeze_array_totally() to replace freeze_array() instead
of giving up in raid1_reshape()
- add a missing fix in raid_end_bio_io()
- add a small check at the start of raid1_reshape()
Xueshi Hu (3):
md/raid1: freeze array more strictly when reshape
md/raid1: don't allow_barrier() before r1bio got freed
md/raid1: check array size before reshape
drivers/md/raid1.c | 59 +++++++++++++++++++++++++++++++++++++---------
1 file changed, 48 insertions(+), 11 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
@ 2023-07-19 7:09 ` Xueshi Hu
2023-07-20 1:36 ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw)
To: song; +Cc: linux-raid, yukuai1, Xueshi Hu
When an IO error happens, reschedule_retry() will increase
r1conf::nr_queued, which lets freeze_array() return. However, the memory
pool should not be modified before all r1bios allocated from it are
released. Introduce freeze_array_totally() to solve the problem. Compared
to freeze_array(), it is stricter because all in-flight io, including
queued io, must complete.
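For comparison, the existing helpers in the raid1.c this series is based on
look roughly like the following (lightly trimmed here); because
get_unqueued_pending() subtracts nr_queued, queued r1bios do not block
freeze_array(), which is exactly the gap freeze_array_totally() closes:

static int get_unqueued_pending(struct r1conf *conf)
{
	int idx, ret;

	ret = atomic_read(&conf->nr_sync_pending);
	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
		ret += atomic_read(&conf->nr_pending[idx]) -
			atomic_read(&conf->nr_queued[idx]);

	return ret;
}

static void freeze_array(struct r1conf *conf, int extra)
{
	spin_lock_irq(&conf->resync_lock);
	conf->array_frozen = 1;
	raid1_log(conf->mddev, "wait freeze");
	wait_event_lock_irq_cmd(conf->wait_barrier,
			get_unqueued_pending(conf) == extra,
			conf->resync_lock,
			flush_pending_writes(conf));
	spin_unlock_irq(&conf->resync_lock);
}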
Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index dd25832eb045..5605c9680818 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
/* Stop sync I/O and normal I/O and wait for everything to
* go quiet.
* This is called in two situations:
- * 1) management command handlers (reshape, remove disk, quiesce).
+ * 1) management command handlers (remove disk, quiesce).
* 2) one normal I/O request failed.
* After array_frozen is set to 1, new sync IO will be blocked at
@@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
wake_up(&conf->wait_barrier);
}
+/* conf->resync_lock should be held */
+static int get_pending(struct r1conf *conf)
+{
+ int idx, ret;
+
+ ret = atomic_read(&conf->nr_sync_pending);
+ for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+ ret += atomic_read(&conf->nr_pending[idx]);
+
+ return ret;
+}
+
+static void freeze_array_totally(struct r1conf *conf)
+{
+ /*
+ * freeze_array_totally() is almost the same with freeze_array() except
+ * it requires there's no queued io. Raid1's reshape will destroy the
+ * old mempool and change r1conf::raid_disks, which are necessary when
+ * freeing the queued io.
+ */
+ spin_lock_irq(&conf->resync_lock);
+ conf->array_frozen = 1;
+ raid1_log(conf->mddev, "freeze totally");
+ wait_event_lock_irq_cmd(
+ conf->wait_barrier,
+ get_pending(conf) == 0,
+ conf->resync_lock,
+ md_wakeup_thread(conf->mddev->thread));
+ spin_unlock_irq(&conf->resync_lock);
+}
+
static void alloc_behind_master_bio(struct r1bio *r1_bio,
struct bio *bio)
{
@@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
return -ENOMEM;
}
- freeze_array(conf, 0);
+ freeze_array_totally(conf);
/* ok, everything is stopped */
oldpool = conf->r1bio_pool;
--
2.40.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed
2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
@ 2023-07-19 7:09 ` Xueshi Hu
2023-07-20 1:47 ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw)
To: song; +Cc: linux-raid, yukuai1, Xueshi Hu
allow_barrier() makes reshape possible. A raid1 reshape changes
r1conf::raid_disks and the mempool, so free the r1bio first and only
then call allow_barrier().
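To make the hazard concrete, raid1_reshape() swaps the pool roughly as
follows once the barrier is raised (trimmed and paraphrased from raid1.c;
details omitted), so an r1bio freed only after allow_barrier() can race
with the pool swap and the raid_disks update:

	freeze_array(conf, 0);
	/* ok, everything is stopped */
	oldpool = conf->r1bio_pool;
	conf->r1bio_pool = newpool;
	/* ... move rdevs into newmirrors, install newpoolinfo ... */
	conf->raid_disks = mddev->raid_disks = raid_disks;
	mddev->delta_disks = 0;

	unfreeze_array(conf);
	/* ... */
	mempool_exit(&oldpool);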
Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
drivers/md/raid1.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5605c9680818..62e86b7d1561 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
{
struct bio *bio = r1_bio->master_bio;
struct r1conf *conf = r1_bio->mddev->private;
+ sector_t sector = r1_bio->sector;
/* if nobody has done the final endio yet, do it now */
if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
@@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
call_bio_endio(r1_bio);
}
+
+ free_r1bio(r1_bio);
/*
* Wake up any possible resync thread that waits for the device
* to go idle. All I/Os, even write-behind writes, are done.
*/
- allow_barrier(conf, r1_bio->sector);
-
- free_r1bio(r1_bio);
+ allow_barrier(conf, sector);
}
/*
@@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
return;
}
+ retry_write:
r1_bio = alloc_r1bio(mddev, bio);
r1_bio->sectors = max_write_sectors;
@@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
*/
disks = conf->raid_disks * 2;
- retry_write:
blocked_rdev = NULL;
rcu_read_lock();
max_sectors = r1_bio->sectors;
@@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
for (j = 0; j < i; j++)
if (r1_bio->bios[j])
rdev_dec_pending(conf->mirrors[j].rdev, mddev);
- r1_bio->state = 0;
+ free_r1bio(r1_bio);
allow_barrier(conf, bio->bi_iter.bi_sector);
if (bio->bi_opf & REQ_NOWAIT) {
@@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
struct mddev *mddev = conf->mddev;
struct bio *bio;
struct md_rdev *rdev;
+ sector_t sector;
clear_bit(R1BIO_ReadError, &r1_bio->state);
/* we got a read error. Maybe the drive is bad. Maybe just
@@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
}
rdev_dec_pending(rdev, conf->mddev);
- allow_barrier(conf, r1_bio->sector);
+ sector = r1_bio->sector;
bio = r1_bio->master_bio;
/* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
r1_bio->state = 0;
raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
+ allow_barrier(conf, sector);
}
static void raid1d(struct md_thread *thread)
--
2.40.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
@ 2023-07-19 7:09 ` Xueshi Hu
2023-07-19 7:38 ` Paul Menzel
2 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw)
To: song; +Cc: linux-raid, yukuai1, Xueshi Hu
If array size doesn't changed, nothing need to do.
Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
drivers/md/raid1.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 62e86b7d1561..5840b8b0f9b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
int d, d2;
int ret;
- memset(&newpool, 0, sizeof(newpool));
- memset(&oldpool, 0, sizeof(oldpool));
-
/* Cannot change chunk_size, layout, or level */
if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
mddev->layout != mddev->new_layout ||
@@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
return -EINVAL;
}
+ if (mddev->delta_disks == 0)
+ return 0; /* nothing to do */
+
+ memset(&newpool, 0, sizeof(newpool));
+ memset(&oldpool, 0, sizeof(oldpool));
+
if (!mddev_is_clustered(mddev))
md_allow_write(mddev);
--
2.40.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
@ 2023-07-19 7:38 ` Paul Menzel
2023-07-19 11:51 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Paul Menzel @ 2023-07-19 7:38 UTC (permalink / raw)
To: Xueshi Hu; +Cc: song, linux-raid, yukuai1
Dear Xueshi,
Thank you for your patches.
Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> If array size doesn't changed, nothing need to do.
Maybe: … nothing needs to be done.
Do you have a test case to reproduce it?
Kind regards,
Paul
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
> drivers/md/raid1.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 62e86b7d1561..5840b8b0f9b7 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> int d, d2;
> int ret;
>
> - memset(&newpool, 0, sizeof(newpool));
> - memset(&oldpool, 0, sizeof(oldpool));
> -
> /* Cannot change chunk_size, layout, or level */
> if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> mddev->layout != mddev->new_layout ||
> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> return -EINVAL;
> }
>
> + if (mddev->delta_disks == 0)
> + return 0; /* nothing to do */
> +
> + memset(&newpool, 0, sizeof(newpool));
> + memset(&oldpool, 0, sizeof(oldpool));
> +
> if (!mddev_is_clustered(mddev))
> md_allow_write(mddev);
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-19 7:38 ` Paul Menzel
@ 2023-07-19 11:51 ` Xueshi Hu
2023-07-20 1:28 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-19 11:51 UTC (permalink / raw)
To: Paul Menzel; +Cc: Song Liu, linux-raid, Yu Kuai
On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> Dear Xueshi,
>
>
> Thank you for your patches.
>
> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > If array size doesn't changed, nothing need to do.
>
> Maybe: … nothing needs to be done.
I was in too much of a hurry; I'll be more careful next time and check the
sentences with a tool.
>
> Do you have a test case to reproduce it?
>
Userspace command:
echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
Kernel function calling flow:
md_attr_store()
  raid_disks_store()
    update_raid_disks()
      raid1_reshape()
Maybe I should provide more information when submitting patches; thank you
for reminding me.
>
> Kind regards,
>
> Paul
>
>
> > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > ---
> > drivers/md/raid1.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 62e86b7d1561..5840b8b0f9b7 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > int d, d2;
> > int ret;
> > - memset(&newpool, 0, sizeof(newpool));
> > - memset(&oldpool, 0, sizeof(oldpool));
> > -
> > /* Cannot change chunk_size, layout, or level */
> > if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > mddev->layout != mddev->new_layout ||
> > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > return -EINVAL;
> > }
> > + if (mddev->delta_disks == 0)
> > + return 0; /* nothing to do */
> > +
> > + memset(&newpool, 0, sizeof(newpool));
> > + memset(&oldpool, 0, sizeof(oldpool));
> > +
> > if (!mddev_is_clustered(mddev))
> > md_allow_write(mddev);
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-19 11:51 ` Xueshi Hu
@ 2023-07-20 1:28 ` Yu Kuai
2023-07-28 14:42 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20 1:28 UTC (permalink / raw)
To: Xueshi Hu, Paul Menzel; +Cc: Song Liu, linux-raid, yukuai (C)
Hi,
在 2023/07/19 19:51, Xueshi Hu 写道:
> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>> Dear Xueshi,
>>
>>
>> Thank you for your patches.
>>
>> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
>>> If array size doesn't changed, nothing need to do.
>>
>> Maybe: … nothing needs to be done.
> What a hurry, I'll be cautious next time and check the sentences with
> tools.
>>
>> Do you have a test case to reproduce it?
>>
> Userspace command:
>
> echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
>
> Kernel function calling flow:
>
> md_attr_store()
> raid_disks_store()
> update_raid_disks()
> raid1_reshape()
>
> Maybe I shall provide more information when submit patches, thank you for
> reminding me.
>>
>> Kind regards,
>>
>> Paul
>>
>>
>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>> ---
>>> drivers/md/raid1.c | 9 ++++++---
>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>> int d, d2;
>>> int ret;
>>> - memset(&newpool, 0, sizeof(newpool));
>>> - memset(&oldpool, 0, sizeof(oldpool));
>>> -
>>> /* Cannot change chunk_size, layout, or level */
>>> if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>> mddev->layout != mddev->new_layout ||
>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>> return -EINVAL;
>>> }
>>> + if (mddev->delta_disks == 0)
>>> + return 0; /* nothing to do */
I think this is wrong; you should at least keep the following:
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
Thanks,
Kuai
>>> +
>>> + memset(&newpool, 0, sizeof(newpool));
>>> + memset(&oldpool, 0, sizeof(oldpool));
>>> +
>>> if (!mddev_is_clustered(mddev))
>>> md_allow_write(mddev);
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
@ 2023-07-20 1:36 ` Yu Kuai
2023-07-20 1:37 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20 1:36 UTC (permalink / raw)
To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C)
Hi,
在 2023/07/19 15:09, Xueshi Hu 写道:
> When an IO error happens, reschedule_retry() will increase
> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> all r1bio in the memory pool are released, the memory pool should not be
> modified. Introduce freeze_array_totally() to solve the problem. Compared
> to freeze_array(), it's more strict because any in-flight io needs to
> complete including queued io.
>
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index dd25832eb045..5605c9680818 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
> /* Stop sync I/O and normal I/O and wait for everything to
> * go quiet.
> * This is called in two situations:
> - * 1) management command handlers (reshape, remove disk, quiesce).
> + * 1) management command handlers (remove disk, quiesce).
> * 2) one normal I/O request failed.
>
> * After array_frozen is set to 1, new sync IO will be blocked at
> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
> wake_up(&conf->wait_barrier);
> }
>
> +/* conf->resync_lock should be held */
> +static int get_pending(struct r1conf *conf)
> +{
> + int idx, ret;
> +
> + ret = atomic_read(&conf->nr_sync_pending);
> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> + ret += atomic_read(&conf->nr_pending[idx]);
> +
> + return ret;
> +}
> +
> +static void freeze_array_totally(struct r1conf *conf)
> +{
> + /*
> + * freeze_array_totally() is almost the same with freeze_array() except
> + * it requires there's no queued io. Raid1's reshape will destroy the
> + * old mempool and change r1conf::raid_disks, which are necessary when
> + * freeing the queued io.
> + */
> + spin_lock_irq(&conf->resync_lock);
> + conf->array_frozen = 1;
> + raid1_log(conf->mddev, "freeze totally");
> + wait_event_lock_irq_cmd(
> + conf->wait_barrier,
> + get_pending(conf) == 0,
> + conf->resync_lock,
> + md_wakeup_thread(conf->mddev->thread));
> + spin_unlock_irq(&conf->resync_lock);
> +}
> +
> static void alloc_behind_master_bio(struct r1bio *r1_bio,
> struct bio *bio)
> {
> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
> return -ENOMEM;
> }
>
> - freeze_array(conf, 0);
> + freeze_array_totally(conf);
I think this is wrong, raid1_reshape() can't be called with
'reconfig_mutex' grabbed, and this will deadlock because failed io needs
this lock to be handled by the daemon thread (see details in [1]).
Be aware: never hold 'reconfig_mutex' while waiting for io.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
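As a standalone illustration of the deadlock shape described above (plain
pthreads, not md code; the names only mirror the md roles, and the work the
daemon does under the lock is an assumed analogue of handling failed io),
the following program hangs by design:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t reconfig_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int queued_io = 1;			/* one failed bio waiting to be handled */

/* stands in for the daemon thread that retires failed io */
static void *daemon_thread(void *arg)
{
	pthread_mutex_lock(&reconfig_mutex);	/* blocks forever: "reshape" holds it */
	pthread_mutex_lock(&state_lock);
	queued_io = 0;
	pthread_cond_signal(&drained);
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&reconfig_mutex);
	return NULL;
}

/* stands in for raid1_reshape() waiting in freeze_array_totally() */
int main(void)
{
	pthread_t t;

	pthread_mutex_lock(&reconfig_mutex);	/* mddev_lock() analogue */
	pthread_create(&t, NULL, daemon_thread, NULL);

	pthread_mutex_lock(&state_lock);
	while (queued_io)			/* never satisfied: deadlock */
		pthread_cond_wait(&drained, &state_lock);
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&reconfig_mutex);

	pthread_join(t, NULL);
	puts("never reached");
	return 0;
}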
>
> /* ok, everything is stopped */
> oldpool = conf->r1bio_pool;
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
2023-07-20 1:36 ` Yu Kuai
@ 2023-07-20 1:37 ` Yu Kuai
2023-07-31 14:02 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-20 1:37 UTC (permalink / raw)
To: Yu Kuai, Xueshi Hu, song; +Cc: linux-raid, yukuai (C)
Hi,
在 2023/07/20 9:36, Yu Kuai 写道:
> Hi,
>
> 在 2023/07/19 15:09, Xueshi Hu 写道:
>> When an IO error happens, reschedule_retry() will increase
>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
>> all r1bio in the memory pool are released, the memory pool should not be
>> modified. Introduce freeze_array_totally() to solve the problem. Compared
>> to freeze_array(), it's more strict because any in-flight io needs to
>> complete including queued io.
>>
>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>> ---
>> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>> 1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index dd25832eb045..5605c9680818 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
>> int extra)
>> /* Stop sync I/O and normal I/O and wait for everything to
>> * go quiet.
>> * This is called in two situations:
>> - * 1) management command handlers (reshape, remove disk, quiesce).
>> + * 1) management command handlers (remove disk, quiesce).
>> * 2) one normal I/O request failed.
>> * After array_frozen is set to 1, new sync IO will be blocked at
>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>> wake_up(&conf->wait_barrier);
>> }
>> +/* conf->resync_lock should be held */
>> +static int get_pending(struct r1conf *conf)
>> +{
>> + int idx, ret;
>> +
>> + ret = atomic_read(&conf->nr_sync_pending);
>> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>> + ret += atomic_read(&conf->nr_pending[idx]);
>> +
>> + return ret;
>> +}
>> +
>> +static void freeze_array_totally(struct r1conf *conf)
>> +{
>> + /*
>> + * freeze_array_totally() is almost the same with freeze_array()
>> except
>> + * it requires there's no queued io. Raid1's reshape will destroy
>> the
>> + * old mempool and change r1conf::raid_disks, which are necessary
>> when
>> + * freeing the queued io.
>> + */
>> + spin_lock_irq(&conf->resync_lock);
>> + conf->array_frozen = 1;
>> + raid1_log(conf->mddev, "freeze totally");
>> + wait_event_lock_irq_cmd(
>> + conf->wait_barrier,
>> + get_pending(conf) == 0,
>> + conf->resync_lock,
>> + md_wakeup_thread(conf->mddev->thread));
>> + spin_unlock_irq(&conf->resync_lock);
>> +}
>> +
>> static void alloc_behind_master_bio(struct r1bio *r1_bio,
>> struct bio *bio)
>> {
>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>> return -ENOMEM;
>> }
>> - freeze_array(conf, 0);
>> + freeze_array_totally(conf);
>
> I think this is wrong, raid1_reshape() can't be called with
Sorry about this typo; I meant raid1_reshape() can be called with ...
Thanks,
Kuai
> 'reconfig_mutex' grabbed, and this will deadlock because failed io need
> this lock to be handled by daemon thread.(see details in [1]).
>
> Be aware that never hold 'reconfig_mutex' to wait for io.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>
>> /* ok, everything is stopped */
>> oldpool = conf->r1bio_pool;
>>
>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed
2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
@ 2023-07-20 1:47 ` Yu Kuai
0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2023-07-20 1:47 UTC (permalink / raw)
To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C)
Hi,
在 2023/07/19 15:09, Xueshi Hu 写道:
> allow_barrier() make reshape possible. Raid reshape changes the
> r1conf::raid_disks and mempool. Free the r1bio firstly and then call
> allow_barrier()
>
After adding Fixes tags, feel free to add:
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
> drivers/md/raid1.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 5605c9680818..62e86b7d1561 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
> {
> struct bio *bio = r1_bio->master_bio;
> struct r1conf *conf = r1_bio->mddev->private;
> + sector_t sector = r1_bio->sector;
>
> /* if nobody has done the final endio yet, do it now */
> if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
> @@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
>
> call_bio_endio(r1_bio);
> }
> +
> + free_r1bio(r1_bio);
> /*
> * Wake up any possible resync thread that waits for the device
> * to go idle. All I/Os, even write-behind writes, are done.
> */
> - allow_barrier(conf, r1_bio->sector);
> -
> - free_r1bio(r1_bio);
> + allow_barrier(conf, sector);
> }
>
> /*
> @@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> return;
> }
>
> + retry_write:
> r1_bio = alloc_r1bio(mddev, bio);
> r1_bio->sectors = max_write_sectors;
>
> @@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> */
>
> disks = conf->raid_disks * 2;
> - retry_write:
> blocked_rdev = NULL;
> rcu_read_lock();
> max_sectors = r1_bio->sectors;
> @@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> for (j = 0; j < i; j++)
> if (r1_bio->bios[j])
> rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> - r1_bio->state = 0;
> + free_r1bio(r1_bio);
> allow_barrier(conf, bio->bi_iter.bi_sector);
>
> if (bio->bi_opf & REQ_NOWAIT) {
> @@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> struct mddev *mddev = conf->mddev;
> struct bio *bio;
> struct md_rdev *rdev;
> + sector_t sector;
>
> clear_bit(R1BIO_ReadError, &r1_bio->state);
> /* we got a read error. Maybe the drive is bad. Maybe just
> @@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> }
>
> rdev_dec_pending(rdev, conf->mddev);
> - allow_barrier(conf, r1_bio->sector);
> + sector = r1_bio->sector;
> bio = r1_bio->master_bio;
>
> /* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
> r1_bio->state = 0;
> raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
> + allow_barrier(conf, sector);
> }
>
> static void raid1d(struct md_thread *thread)
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-20 1:28 ` Yu Kuai
@ 2023-07-28 14:42 ` Xueshi Hu
2023-07-29 0:58 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-28 14:42 UTC (permalink / raw)
To: Yu Kuai; +Cc: song, linux-raid, pmenzel
On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/19 19:51, Xueshi Hu 写道:
> > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > Dear Xueshi,
> > >
> > >
> > > Thank you for your patches.
> > >
> > > Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > > > If array size doesn't changed, nothing need to do.
> > >
> > > Maybe: … nothing needs to be done.
> > What a hurry, I'll be cautious next time and check the sentences with
> > tools.
> > >
> > > Do you have a test case to reproduce it?
> > >
> > Userspace command:
> >
> > echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> >
> > Kernel function calling flow:
> >
> > md_attr_store()
> > raid_disks_store()
> > update_raid_disks()
> > raid1_reshape()
> >
> > Maybe I shall provide more information when submit patches, thank you for
> > reminding me.
> > >
> > > Kind regards,
> > >
> > > Paul
> > >
> > >
> > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > ---
> > > > drivers/md/raid1.c | 9 ++++++---
> > > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > --- a/drivers/md/raid1.c
> > > > +++ b/drivers/md/raid1.c
> > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > > int d, d2;
> > > > int ret;
> > > > - memset(&newpool, 0, sizeof(newpool));
> > > > - memset(&oldpool, 0, sizeof(oldpool));
> > > > -
> > > > /* Cannot change chunk_size, layout, or level */
> > > > if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > > mddev->layout != mddev->new_layout ||
> > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > > return -EINVAL;
> > > > }
> > > > + if (mddev->delta_disks == 0)
> > > > + return 0; /* nothing to do */
>
> I think this is wrong, you should at least keep following:
>
> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> md_wakeup_thread(mddev->thread);
>
I fail to comprehend why the kernel would need to invoke raid1d()
again when nothing has been modified.
It appears that raid1d() is responsible for invoking md_check_recovery()
and resubmitting the failed io.
Evidently, it is not within the scope of raid1_reshape() to resubmit
failed io.
Moreover, upon reviewing the documentation [1], I could not find any
indication that reshape must undertake the actions described in
md_check_recovery()'s documentation, such as removing faulty devices.
[1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
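For reference, raid1d() in the raid1.c of this era is shaped roughly as
below (heavily trimmed; only the parts relevant to the argument are kept),
which is why I read "resubmitting failed io" as raid1d()'s job rather than
raid1_reshape()'s:

static void raid1d(struct md_thread *thread)
{
	struct mddev *mddev = thread->mddev;
	struct r1conf *conf = mddev->private;
	struct r1bio *r1_bio;

	md_check_recovery(mddev);

	for (;;) {
		flush_pending_writes(conf);
		/* pop the next r1bio from conf->retry_list, drop nr_queued */

		if (test_bit(R1BIO_IsSync, &r1_bio->state))
			sync_request_write(mddev, r1_bio);
		else if (test_bit(R1BIO_MadeGood, &r1_bio->state) ||
			 test_bit(R1BIO_WriteError, &r1_bio->state))
			handle_write_finished(conf, r1_bio);
		else if (test_bit(R1BIO_ReadError, &r1_bio->state))
			handle_read_error(conf, r1_bio);
	}
}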
> Thanks,
> Kuai
>
> > > > +
> > > > + memset(&newpool, 0, sizeof(newpool));
> > > > + memset(&oldpool, 0, sizeof(oldpool));
> > > > +
> > > > if (!mddev_is_clustered(mddev))
> > > > md_allow_write(mddev);
> > .
> >
>
Thanks,
Hu
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-28 14:42 ` Xueshi Hu
@ 2023-07-29 0:58 ` Yu Kuai
2023-07-29 3:29 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29 0:58 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: song, linux-raid, pmenzel, yukuai (C)
Hi,
在 2023/07/28 22:42, Xueshi Hu 写道:
> On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/19 19:51, Xueshi Hu 写道:
>>> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
>>>> Dear Xueshi,
>>>>
>>>>
>>>> Thank you for your patches.
>>>>
>>>> Am 19.07.23 um 09:09 schrieb Xueshi Hu:
>>>>> If array size doesn't changed, nothing need to do.
>>>>
>>>> Maybe: … nothing needs to be done.
>>> What a hurry, I'll be cautious next time and check the sentences with
>>> tools.
>>>>
>>>> Do you have a test case to reproduce it?
>>>>
>>> Userspace command:
>>>
>>> echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
>>>
>>> Kernel function calling flow:
>>>
>>> md_attr_store()
>>> raid_disks_store()
>>> update_raid_disks()
>>> raid1_reshape()
>>>
>>> Maybe I shall provide more information when submit patches, thank you for
>>> reminding me.
>>>>
>>>> Kind regards,
>>>>
>>>> Paul
>>>>
>>>>
>>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>>> ---
>>>>> drivers/md/raid1.c | 9 ++++++---
>>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>>> index 62e86b7d1561..5840b8b0f9b7 100644
>>>>> --- a/drivers/md/raid1.c
>>>>> +++ b/drivers/md/raid1.c
>>>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
>>>>> int d, d2;
>>>>> int ret;
>>>>> - memset(&newpool, 0, sizeof(newpool));
>>>>> - memset(&oldpool, 0, sizeof(oldpool));
>>>>> -
>>>>> /* Cannot change chunk_size, layout, or level */
>>>>> if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
>>>>> mddev->layout != mddev->new_layout ||
>>>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
>>>>> return -EINVAL;
>>>>> }
>>>>> + if (mddev->delta_disks == 0)
>>>>> + return 0; /* nothing to do */
>>
>> I think this is wrong, you should at least keep following:
>>
>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>> md_wakeup_thread(mddev->thread);
>>
> I fail to comprehend the rationale behind the kernel's need to invoke
> raid1d() repeatedly despite the absence of any modifications.
> It appears that raid1d() is responsible for invoking md_check_recovery()
> and resubmit the failed io.
No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
so that md_check_recovery() can start to reshape.
Thanks,
Kuai
>
> Evidently, it is not within the scope of raid1_reshape() to resubmit
> failed io.
>
> Moreover, upon reviewing the Documentation[1], I could not find any
> indications that reshape must undertake the actions as specified in
> md_check_recovery()'s documentation, such as eliminating faulty devices.
>
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
>> Thanks,
>> Kuai
>>
>>>>> +
>>>>> + memset(&newpool, 0, sizeof(newpool));
>>>>> + memset(&oldpool, 0, sizeof(oldpool));
>>>>> +
>>>>> if (!mddev_is_clustered(mddev))
>>>>> md_allow_write(mddev);
>>> .
>>>
>>
> Thanks,
> Hu
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 0:58 ` Yu Kuai
@ 2023-07-29 3:29 ` Xueshi Hu
2023-07-29 3:36 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29 3:29 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Sat, Jul 29, 2023 at 08:58:45AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/28 22:42, Xueshi Hu 写道:
> > On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote:
> > > Hi,
> > >
> > > 在 2023/07/19 19:51, Xueshi Hu 写道:
> > > > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote:
> > > > > Dear Xueshi,
> > > > >
> > > > >
> > > > > Thank you for your patches.
> > > > >
> > > > > Am 19.07.23 um 09:09 schrieb Xueshi Hu:
> > > > > > If array size doesn't changed, nothing need to do.
> > > > >
> > > > > Maybe: … nothing needs to be done.
> > > > What a hurry, I'll be cautious next time and check the sentences with
> > > > tools.
> > > > >
> > > > > Do you have a test case to reproduce it?
> > > > >
> > > > Userspace command:
> > > >
> > > > echo 4 > /sys/devices/virtual/block/md10/md/raid_disks
> > > >
> > > > Kernel function calling flow:
> > > >
> > > > md_attr_store()
> > > > raid_disks_store()
> > > > update_raid_disks()
> > > > raid1_reshape()
> > > >
> > > > Maybe I shall provide more information when submit patches, thank you for
> > > > reminding me.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Paul
> > > > >
> > > > >
> > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > > > > ---
> > > > > > drivers/md/raid1.c | 9 ++++++---
> > > > > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > > > > index 62e86b7d1561..5840b8b0f9b7 100644
> > > > > > --- a/drivers/md/raid1.c
> > > > > > +++ b/drivers/md/raid1.c
> > > > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > > int d, d2;
> > > > > > int ret;
> > > > > > - memset(&newpool, 0, sizeof(newpool));
> > > > > > - memset(&oldpool, 0, sizeof(oldpool));
> > > > > > -
> > > > > > /* Cannot change chunk_size, layout, or level */
> > > > > > if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
> > > > > > mddev->layout != mddev->new_layout ||
> > > > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev)
> > > > > > return -EINVAL;
> > > > > > }
> > > > > > + if (mddev->delta_disks == 0)
> > > > > > + return 0; /* nothing to do */
> > >
> > > I think this is wrong, you should at least keep following:
> > >
> > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > md_wakeup_thread(mddev->thread);
> > >
> > I fail to comprehend the rationale behind the kernel's need to invoke
> > raid1d() repeatedly despite the absence of any modifications.
> > It appears that raid1d() is responsible for invoking md_check_recovery()
> > and resubmit the failed io.
>
> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> so that md_check_recovery() can start to reshape.
I apologize, but I am still unable to comprehend your idea.
If mddev->delta_disks == 0, what is the purpose of invoking
md_check_recovery() again?
>
> Thanks,
> Kuai
> >
> > Evidently, it is not within the scope of raid1_reshape() to resubmit
> > failed io.
> >
> > Moreover, upon reviewing the Documentation[1], I could not find any
> > indications that reshape must undertake the actions as specified in
> > md_check_recovery()'s documentation, such as eliminating faulty devices.
> >
> > [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html
> > > Thanks,
> > > Kuai
> > >
> > > > > > +
> > > > > > + memset(&newpool, 0, sizeof(newpool));
> > > > > > + memset(&oldpool, 0, sizeof(oldpool));
> > > > > > +
> > > > > > if (!mddev_is_clustered(mddev))
> > > > > > md_allow_write(mddev);
> > > > .
> > > >
> > >
> > Thanks,
> > Hu
> > .
> >
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 3:29 ` Xueshi Hu
@ 2023-07-29 3:36 ` Yu Kuai
2023-07-29 3:51 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29 3:36 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
在 2023/07/29 11:29, Xueshi Hu 写道:
>>>> I think this is wrong, you should at least keep following:
>>>>
>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>> md_wakeup_thread(mddev->thread);
>>>>
>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>> raid1d() repeatedly despite the absence of any modifications.
>>> It appears that raid1d() is responsible for invoking md_check_recovery()
>>> and resubmit the failed io.
>>
>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>> so that md_check_recovery() can start to reshape.
> I apologize, but I am still unable to comprehend your idea.
> If mmdev->delta_disks == 0 , what is the purpose of invoking
> md_check_recovery() again?
Sounds like you think raid1_reshape() can only be called from
md_check_recovery(); please take a look at the other callers.
Thanks,
Kuai
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 3:36 ` Yu Kuai
@ 2023-07-29 3:51 ` Yu Kuai
2023-07-29 6:16 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29 3:51 UTC (permalink / raw)
To: Yu Kuai, Xueshi Hu; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
在 2023/07/29 11:36, Yu Kuai 写道:
> Hi,
>
> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>> I think this is wrong, you should at least keep following:
>>>>>
>>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>> md_wakeup_thread(mddev->thread);
>>>>>
>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>> raid1d() repeatedly despite the absence of any modifications.
>>>> It appears that raid1d() is responsible for invoking
>>>> md_check_recovery()
>>>> and resubmit the failed io.
>>>
>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>> so that md_check_recovery() can start to reshape.
>> I apologize, but I am still unable to comprehend your idea.
>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>> md_check_recovery() again?
>
> Sounds like you think raid1_reshape() can only be called from
> md_check_recovery(), please take a look at other callers.
And even if raid1_reshape() is called from md_check_recovery(),
which means reshape is interrupted, then MD_RECOVERY_RECOVER
and MD_RECOVERY_NEEDED need to be set, so that reshape can
continue.
Thanks,
Kuai
>
> Thanks,
> Kuai
>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 3:51 ` Yu Kuai
@ 2023-07-29 6:16 ` Xueshi Hu
2023-07-29 7:37 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29 6:16 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 11:36, Yu Kuai 写道:
> > Hi,
> >
> > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > I think this is wrong, you should at least keep following:
> > > > > >
> > > > > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > md_wakeup_thread(mddev->thread);
> > > > > >
> > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > It appears that raid1d() is responsible for invoking
> > > > > md_check_recovery()
> > > > > and resubmit the failed io.
> > > >
> > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > so that md_check_recovery() can start to reshape.
> > > I apologize, but I am still unable to comprehend your idea.
> > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > md_check_recovery() again?
> >
> > Sounds like you think raid1_reshape() can only be called from
> > md_check_recovery(), please take a look at other callers.
Thank you for the quick reply and your patience.
Of course, I have checked all the callers of md_personality::check_reshape.
- layout_store
- action_store
- chunk_size_store
- md_ioctl
    __md_set_array_info
      update_array_info
- md_check_recovery
- md_ioctl
    __md_set_array_info
      update_array_info
        update_raid_disks
- process_metadata_update
    md_reload_sb
      check_sb_changes
        update_raid_disks
- raid_disks_store
    update_raid_disks
There are three categories of callers besides md_check_recovery():
1. writes to sysfs
2. ioctl
3. instructions received from an md cluster peer
Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
example, if the mddev::raid_disks is already 4, I don't think
raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
bit, then wake up the mddev::thread to call md_check_recovery().
Is there any counterexample to demonstrate the issues that may arise if
md_check_recovery() is not called out of raid1_reshape() ?
>
> And even if raid1_reshape() is called from md_check_recovery(),
> which means reshape is interupted, then MD_RECOVERY_RECOVER
> and MD_RECOVERY_RECOVER need to be set, so that reshape can
> continue.
raid1 only registers md_personality::check_reshape; all the work related
to reshape is handled in md_personality::check_reshape.
What is the meaning of "reshape can continue"? In other words, what
additional work needs to be done if mddev::raid_disks doesn't change?
Thanks,
Hu
>
> Thanks,
> Kuai
> >
> > Thanks,
> > Kuai
> >
> > .
> >
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 6:16 ` Xueshi Hu
@ 2023-07-29 7:37 ` Yu Kuai
2023-07-29 12:23 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-29 7:37 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
在 2023/07/29 14:16, Xueshi Hu 写道:
> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/29 11:36, Yu Kuai 写道:
>>> Hi,
>>>
>>> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>
>>>>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>> md_wakeup_thread(mddev->thread);
>>>>>>>
>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>> It appears that raid1d() is responsible for invoking
>>>>>> md_check_recovery()
>>>>>> and resubmit the failed io.
>>>>>
>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>> so that md_check_recovery() can start to reshape.
>>>> I apologize, but I am still unable to comprehend your idea.
>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>> md_check_recovery() again?
>>>
>>> Sounds like you think raid1_reshape() can only be called from
>>> md_check_recovery(), please take a look at other callers.
> Thank you for the quick reply and patience.
>
> Of course, I have checked all the caller of md_personality::check_reshape.
>
> - layout_store
> - action_store
> - chunk_size_store
> - md_ioctl
> __md_set_array_info
> update_array_info
> - md_check_recovery
> - md_ioctl
> __md_set_array_info
> update_array_info
> update_raid_disks
> - process_metadata_update
> md_reload_sb
> check_sb_changes
> update_raid_disks
> - raid_disks_store
> update_raid_disks
>
> There are three categories of callers except md_check_recovery().
> 1. write to sysfs
> 2. ioctl
> 3. revice instructions from md cluster peer
>
> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> example, if the mddev::raid_disks is already 4, I don't think
> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> bit, then wake up the mddev::thread to call md_check_recovery().
>
> Is there any counterexample to demonstrate the issues that may arise if
> md_check_recovery() is not called out of raid1_reshape() ?
>
>>
>> And even if raid1_reshape() is called from md_check_recovery(),
>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>> continue.
>
> raid1 only register md_personality::check_reshape, all the work related
> with reshape are handle in md_personality::check_reshape.
> What's the meaning of "reshape can continue" ? In another word, what's
> the additional work need to do if mddev::raid_disks doesn't change ?
Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
really means recovery, synchronizing data to new disks. Never mind what
I said about "reshape can continue", it's not possible for raid1.
Then the problem is the same from the 'recovery' point of view:
if the 'recovery' is interrupted, then before this patch, even if raid_disks
is the same, raid1_reshape() will still set the flags and then
md_check_recovery() will try to continue the recovery. I'd like to
keep this behaviour until we can be sure that no user will be
affected.
Thanks,
Kuai
>
> Thanks,
> Hu
>
>>
>> Thanks,
>> Kuai
>>>
>>> Thanks,
>>> Kuai
>>>
>>> .
>>>
>>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 7:37 ` Yu Kuai
@ 2023-07-29 12:23 ` Xueshi Hu
2023-07-31 1:03 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29 12:23 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 14:16, Xueshi Hu 写道:
> > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > Hi,
> > >
> > > 在 2023/07/29 11:36, Yu Kuai 写道:
> > > > Hi,
> > > >
> > > > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > > > I think this is wrong, you should at least keep following:
> > > > > > > >
> > > > > > > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > > md_wakeup_thread(mddev->thread);
> > > > > > > >
> > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > md_check_recovery()
> > > > > > > and resubmit the failed io.
> > > > > >
> > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > so that md_check_recovery() can start to reshape.
> > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > > > md_check_recovery() again?
> > > >
> > > > Sounds like you think raid1_reshape() can only be called from
> > > > md_check_recovery(), please take a look at other callers.
> > Thank you for the quick reply and patience.
> >
> > Of course, I have checked all the caller of md_personality::check_reshape.
> >
> > - layout_store
> > - action_store
> > - chunk_size_store
> > - md_ioctl
> > __md_set_array_info
> > update_array_info
> > - md_check_recovery
> > - md_ioctl
> > __md_set_array_info
> > update_array_info
> > update_raid_disks
> > - process_metadata_update
> > md_reload_sb
> > check_sb_changes
> > update_raid_disks
> > - raid_disks_store
> > update_raid_disks
> >
> > There are three categories of callers except md_check_recovery().
> > 1. write to sysfs
> > 2. ioctl
> > 3. revice instructions from md cluster peer
> >
> > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > example, if the mddev::raid_disks is already 4, I don't think
> > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > bit, then wake up the mddev::thread to call md_check_recovery().
> >
> > Is there any counterexample to demonstrate the issues that may arise if
> > md_check_recovery() is not called out of raid1_reshape() ?
> >
> > >
> > > And even if raid1_reshape() is called from md_check_recovery(),
> > > which means reshape is interupted, then MD_RECOVERY_RECOVER
> > > and MD_RECOVERY_RECOVER need to be set, so that reshape can
> > > continue.
> >
> > raid1 only register md_personality::check_reshape, all the work related
> > with reshape are handle in md_personality::check_reshape.
> > What's the meaning of "reshape can continue" ? In another word, what's
> > the additional work need to do if mddev::raid_disks doesn't change ?
>
> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
> really means recovery, synchronize data to new disks. Never mind what
> I said "reshape can continue", it's not possible for raid1.
>
> Then the problem is the same from 'recovery' point of view:
> if the 'recovery' is interrupted, before this patch, even if raid_disks
> is the same, raid1_reshape() will still set the flag and then
> md_check_recovery() will try to continue the recovery. I'd like to
> keep this behaviour untill it can be sure that no user will be
> affected.
But md_check_recovery() will never call raid1_reshape().
md_personality::check_reshape() is called from md_check_recovery() only when
a reshape is in progress, and raid1 is special in that
mddev::reshape_position always equals MaxSector.
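The gate in md_check_recovery() backs this up; paraphrased from the md.c of
this era (the exact code differs a little between versions, so treat this
as a sketch), pers->check_reshape() is only consulted there when
reshape_position records an in-progress reshape, which raid1 never sets:

	/* inside md_check_recovery(), when choosing the next sync action */
	if (mddev->reshape_position != MaxSector) {
		/* raid1 never takes this branch: reshape_position stays MaxSector */
		if (mddev->pers->check_reshape == NULL ||
		    mddev->pers->check_reshape(mddev) != 0)
			/* cannot proceed */
			goto not_running;
		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
	} else {
		/* recovery / resync decision paths */
	}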
By the way, the concern in the V2 patch [1] is unnecessary for the same
reason.
[1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
>
> Thanks,
> Kuai
>
> >
> > Thanks,
> > Hu
> >
> > >
> > > Thanks,
> > > Kuai
> > > >
> > > > Thanks,
> > > > Kuai
> > > >
> > > > .
> > > >
> > >
> > .
> >
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-29 12:23 ` Xueshi Hu
@ 2023-07-31 1:03 ` Yu Kuai
2023-07-31 3:48 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31 1:03 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
在 2023/07/29 20:23, Xueshi Hu 写道:
> On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2023/07/29 14:16, Xueshi Hu 写道:
>>> On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> 在 2023/07/29 11:36, Yu Kuai 写道:
>>>>> Hi,
>>>>>
>>>>> 在 2023/07/29 11:29, Xueshi Hu 写道:
>>>>>>>>> I think this is wrong, you should at least keep following:
>>>>>>>>>
>>>>>>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>>>>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>>>>>>> md_wakeup_thread(mddev->thread);
>>>>>>>>>
>>>>>>>> I fail to comprehend the rationale behind the kernel's need to invoke
>>>>>>>> raid1d() repeatedly despite the absence of any modifications.
>>>>>>>> It appears that raid1d() is responsible for invoking
>>>>>>>> md_check_recovery()
>>>>>>>> and resubmit the failed io.
>>>>>>>
>>>>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
>>>>>>> so that md_check_recovery() can start to reshape.
>>>>>> I apologize, but I am still unable to comprehend your idea.
>>>>>> If mmdev->delta_disks == 0 , what is the purpose of invoking
>>>>>> md_check_recovery() again?
>>>>>
>>>>> Sounds like you think raid1_reshape() can only be called from
>>>>> md_check_recovery(), please take a look at other callers.
>>> Thank you for the quick reply and patience.
>>>
>>> Of course, I have checked all the caller of md_personality::check_reshape.
>>>
>>> - layout_store
>>> - action_store
>>> - chunk_size_store
>>> - md_ioctl
>>> __md_set_array_info
>>> update_array_info
>>> - md_check_recovery
>>> - md_ioctl
>>> __md_set_array_info
>>> update_array_info
>>> update_raid_disks
>>> - process_metadata_update
>>> md_reload_sb
>>> check_sb_changes
>>> update_raid_disks
>>> - raid_disks_store
>>> update_raid_disks
>>>
>>> There are three categories of callers except md_check_recovery().
>>> 1. write to sysfs
>>> 2. ioctl
>>> 3. revice instructions from md cluster peer
>>>
>>> Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
>>> example, if the mddev::raid_disks is already 4, I don't think
>>> raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
>>> bit, then wake up the mddev::thread to call md_check_recovery().
>>>
>>> Is there any counterexample to demonstrate the issues that may arise if
>>> md_check_recovery() is not called out of raid1_reshape() ?
>>>
>>>>
>>>> And even if raid1_reshape() is called from md_check_recovery(),
>>>> which means reshape is interupted, then MD_RECOVERY_RECOVER
>>>> and MD_RECOVERY_RECOVER need to be set, so that reshape can
>>>> continue.
>>>
>>> raid1 only register md_personality::check_reshape, all the work related
>>> with reshape are handle in md_personality::check_reshape.
>>> What's the meaning of "reshape can continue" ? In another word, what's
>>> the additional work need to do if mddev::raid_disks doesn't change ?
>>
>> Okay, I missed that raid1 doesn't have 'start_reshape', reshape here
>> really means recovery, synchronize data to new disks. Never mind what
>> I said "reshape can continue", it's not possible for raid1.
>>
>> Then the problem is the same from 'recovery' point of view:
>> if the 'recovery' is interrupted, before this patch, even if raid_disks
>> is the same, raid1_reshape() will still set the flag and then
>> md_check_recovery() will try to continue the recovery. I'd like to
>> keep this behaviour untill it can be sure that no user will be
>> affected.
>
> But md_check_recovery() will never call raid1_reshape().
>
> md_personality::check_reshape() is called in md_check_recovery() when
> the reshape is in process. But, raid1 is speicial as
> mddev::reshape_position always equals to MaxSector in raid1.
>
> By the way, the concern in V2 patch[1] is unnecessary out of the same
> reason.
>
Well... I just said that "reshape can continue" is not possible for raid1,
and this patch will cause recovery to be unable to continue in some cases.
Thanks,
Kuai
> [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
>
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Thanks,
>>> Hu
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>> .
>>>>>
>>>>
>>> .
>>>
>>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-31 1:03 ` Yu Kuai
@ 2023-07-31 3:48 ` Xueshi Hu
2023-07-31 6:22 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 3:48 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Mon, Jul 31, 2023 at 09:03:52AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2023/07/29 20:23, Xueshi Hu 写道:
> > On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> > > Hi,
> > >
> > > 在 2023/07/29 14:16, Xueshi Hu 写道:
> > > > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > > > Hi,
> > > > >
> > > > > 在 2023/07/29 11:36, Yu Kuai 写道:
> > > > > > Hi,
> > > > > >
> > > > > > 在 2023/07/29 11:29, Xueshi Hu 写道:
> > > > > > > > > > I think this is wrong, you should at least keep following:
> > > > > > > > > >
> > > > > > > > > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > > > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > > > > md_wakeup_thread(mddev->thread);
> > > > > > > > > >
> > > > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke
> > > > > > > > > raid1d() repeatedly despite the absence of any modifications.
> > > > > > > > > It appears that raid1d() is responsible for invoking
> > > > > > > > > md_check_recovery()
> > > > > > > > > and resubmit the failed io.
> > > > > > > >
> > > > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER,
> > > > > > > > so that md_check_recovery() can start to reshape.
> > > > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > > > If mmdev->delta_disks == 0 , what is the purpose of invoking
> > > > > > > md_check_recovery() again?
> > > > > >
> > > > > > Sounds like you think raid1_reshape() can only be called from
> > > > > > md_check_recovery(), please take a look at other callers.
> > > > Thank you for the quick reply and patience.
> > > >
> > > > Of course, I have checked all the caller of md_personality::check_reshape.
> > > >
> > > > - layout_store
> > > > - action_store
> > > > - chunk_size_store
> > > > - md_ioctl
> > > > __md_set_array_info
> > > > update_array_info
> > > > - md_check_recovery
> > > > - md_ioctl
> > > > __md_set_array_info
> > > > update_array_info
> > > > update_raid_disks
> > > > - process_metadata_update
> > > > md_reload_sb
> > > > check_sb_changes
> > > > update_raid_disks
> > > > - raid_disks_store
> > > > update_raid_disks
> > > >
> > > > There are three categories of callers except md_check_recovery().
> > > > 1. write to sysfs
> > > > 2. ioctl
> > > > 3. revice instructions from md cluster peer
> > > >
> > > > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > > > example, if the mddev::raid_disks is already 4, I don't think
> > > > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED
> > > > bit, then wake up the mddev::thread to call md_check_recovery().
> > > >
> > > > Is there any counterexample to demonstrate the issues that may arise if
> > > > md_check_recovery() is not called from raid1_reshape()?
> > > >
> > > > >
> > > > > And even if raid1_reshape() is called from md_check_recovery(),
> > > > > which means the reshape is interrupted, then MD_RECOVERY_RECOVER
> > > > > and MD_RECOVERY_NEEDED need to be set, so that the reshape can
> > > > > continue.
> > > >
> > > > raid1 only registers md_personality::check_reshape; all the work related
> > > > to reshape is handled in md_personality::check_reshape.
> > > > What does "reshape can continue" mean? In other words, what
> > > > additional work needs to be done if mddev::raid_disks doesn't change?
> > >
> > > Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
> > > really means recovery, i.e. synchronizing data to new disks. Never mind
> > > what I said about "reshape can continue", it's not possible for raid1.
> > >
> > > Then the problem is the same from the 'recovery' point of view:
> > > if the 'recovery' is interrupted, before this patch, even if raid_disks
> > > is the same, raid1_reshape() will still set the flags and then
> > > md_check_recovery() will try to continue the recovery. I'd like to
> > > keep this behaviour until it is certain that no user will be
> > > affected.
> >
> > But md_check_recovery() will never call raid1_reshape().
> >
> > md_personality::check_reshape() is called in md_check_recovery() when
> > the reshape is in progress. But raid1 is special, as
> > mddev::reshape_position always equals MaxSector in raid1.
> >
> > By the way, the concern in the v2 patch[1] is unnecessary for the same
> > reason.
> >
>
> Well... I just said that "reshape can continue" is not possible for raid1, and
> this patch will cause recovery to be unable to continue in some cases.
I see. I will reread the relevant code to gain a better understanding of
"some cases".
Thanks,
Hu
>
> Thanks,
> Kuai
>
> > [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t
> >
> > >
> > > Thanks,
> > > Kuai
> > >
> > > >
> > > > Thanks,
> > > > Hu
> > > >
> > > > >
> > > > > Thanks,
> > > > > Kuai
> > > > > >
> > > > > > Thanks,
> > > > > > Kuai
> > > > > >
> > > > > > .
> > > > > >
> > > > >
> > > > .
> > > >
> > >
> > .
> >
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-31 3:48 ` Xueshi Hu
@ 2023-07-31 6:22 ` Yu Kuai
2023-07-31 14:12 ` Xueshi Hu
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31 6:22 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
On 2023/07/31 11:48, Xueshi Hu wrote:
>> Well... I just said that "reshape can continue" is not possible for raid1, and
>> this patch will cause recovery to be unable to continue in some cases.
> I see. I will reread the relevant code to gain a better understanding of
> "some cases".
It's not that complex at all; the key point is whether your changes
introduce functional changes and, if so, whether their effect has been fully evaluated.
In this case, raid1_reshape() (for all of its callers) will set
MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
thread must check whether recovery is needed; if so, it will start recovery.
I agree that recovery is probably not needed in this case where
raid_disks is the same. However, note that recovery can be interrupted
by the user, and raid1_reshape() can be triggered by the user as well, which
means the following procedure will end up behaving differently:
1. trigger a recovery;
2. interrupt the recovery;
3. trigger a raid1_reshape();
Before this patch, the interrupted recovery will continue;
This is minor, but I can't say that no user will be affected, and I
really prefer to keep this behaviour, which means this patch can just do
some cleanup without introducing functional changes.
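For illustration only, a minimal sketch (not the actual patch; the placement
inside raid1_reshape() and the early return itself are assumptions) of how the
unchanged-raid_disks case could skip the mempool swap while keeping the old
side effects:

        /* Sketch only, inside raid1_reshape(): skip the mempool swap when
         * raid_disks is unchanged, but still let the daemon thread resume
         * an interrupted recovery. */
        int raid_disks = mddev->raid_disks + mddev->delta_disks;

        if (raid_disks == conf->raid_disks) {
                set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
                set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
                md_wakeup_thread(mddev->thread);
                return 0;
        }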
Thanks,
Kuai
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
2023-07-20 1:37 ` Yu Kuai
@ 2023-07-31 14:02 ` Xueshi Hu
2023-08-01 1:24 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 14:02 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/07/20 9:36, Yu Kuai wrote:
> > Hi,
> >
> > On 2023/07/19 15:09, Xueshi Hu wrote:
> > > When an IO error happens, reschedule_retry() will increase
> > > r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> > > all r1bio in the memory pool are released, the memory pool should not be
> > > modified. Introduce freeze_array_totally() to solve the problem. Compared
> > > to freeze_array(), it's more strict because any in-flight io needs to
> > > complete including queued io.
> > >
> > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> > > ---
> > > drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
> > > 1 file changed, 33 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index dd25832eb045..5605c9680818 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
> > > int extra)
> > > /* Stop sync I/O and normal I/O and wait for everything to
> > > * go quiet.
> > > * This is called in two situations:
> > > - * 1) management command handlers (reshape, remove disk, quiesce).
> > > + * 1) management command handlers (remove disk, quiesce).
> > > * 2) one normal I/O request failed.
> > > * After array_frozen is set to 1, new sync IO will be blocked at
> > > @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
> > > wake_up(&conf->wait_barrier);
> > > }
> > > +/* conf->resync_lock should be held */
> > > +static int get_pending(struct r1conf *conf)
> > > +{
> > > + int idx, ret;
> > > +
> > > + ret = atomic_read(&conf->nr_sync_pending);
> > > + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> > > + ret += atomic_read(&conf->nr_pending[idx]);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static void freeze_array_totally(struct r1conf *conf)
> > > +{
> > > + /*
> > > + * freeze_array_totally() is almost the same with
> > > freeze_array() except
> > > + * it requires there's no queued io. Raid1's reshape will
> > > destroy the
> > > + * old mempool and change r1conf::raid_disks, which are
> > > necessary when
> > > + * freeing the queued io.
> > > + */
> > > + spin_lock_irq(&conf->resync_lock);
> > > + conf->array_frozen = 1;
> > > + raid1_log(conf->mddev, "freeze totally");
> > > + wait_event_lock_irq_cmd(
> > > + conf->wait_barrier,
> > > + get_pending(conf) == 0,
> > > + conf->resync_lock,
> > > + md_wakeup_thread(conf->mddev->thread));
> > > + spin_unlock_irq(&conf->resync_lock);
> > > +}
> > > +
> > > static void alloc_behind_master_bio(struct r1bio *r1_bio,
> > > struct bio *bio)
> > > {
> > > @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
> > > return -ENOMEM;
> > > }
> > > - freeze_array(conf, 0);
> > > + freeze_array_totally(conf);
> >
> > I think this is wrong, raid1_reshape() can't be called with
> Sorry about this typo, I mean raid1_reshape() can be called with ...
You're right, this is indeed a deadlock.
I am wondering whether this approach is viable:

        if (unlikely(atomic_read(conf->nr_queued))) {
                kfree(newpoolinfo);
                mempool_exit(&newpool);
                unfreeze_array(conf);

                set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
                set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
                md_wakeup_thread(mddev->thread);
                return -EBUSY;
        }
Thanks,
Hu
>
> Thanks,
> Kuai
> > 'reconfig_mutex' grabbed, and this will deadlock because failed io needs
> > this lock to be handled by the daemon thread (see details in [1]).
> >
> > Be aware to never hold 'reconfig_mutex' while waiting for io.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
> >
> > > /* ok, everything is stopped */
> > > oldpool = conf->r1bio_pool;
> > >
> >
> > .
> >
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
2023-07-31 6:22 ` Yu Kuai
@ 2023-07-31 14:12 ` Xueshi Hu
0 siblings, 0 replies; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 14:12 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, pmenzel, song
On Mon, Jul 31, 2023 at 02:22:31PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/07/31 11:48, Xueshi Hu wrote:
> > > Well... I just said that "reshape can continue" is not possible for raid1, and
> > > this patch will cause recovery to be unable to continue in some cases.
>
> > I see. I will reread the relevant code to gain a better understanding of
> > "some cases".
>
> It's not that complex at all; the key point is whether your changes
> introduce functional changes and, if so, whether their effect has been fully evaluated.
>
> In this case, raid1_reshape() (for all of its callers) will set
> MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
> thread must check whether recovery is needed; if so, it will start recovery.
>
> I agree that recovery is probably not needed in this case where
> raid_disks is the same. However, note that recovery can be interrupted
> by the user, and raid1_reshape() can be triggered by the user as well, which
> means the following procedure will end up behaving differently:
>
> 1. trigger a recovery;
> 2. interrupt the recovery;
> 3. trigger a raid1_reshape();
Sincere gratitude for your detailed response. I have been consistently
misunderstanding the concept of "interrupted recovery." I now understand
your concerns.
Thanks,
Hu
>
> Before this patch, the interrupted recovery will continue;
>
> This is minor, but I can't say that no user will be affected, and I
> really prefer to keep this behaviour, which means this patch can just do
> some cleanup without introducing functional changes.
>
> Thanks,
> Kuai
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
2023-07-31 14:02 ` Xueshi Hu
@ 2023-08-01 1:24 ` Yu Kuai
0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2023-08-01 1:24 UTC (permalink / raw)
To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)
Hi,
On 2023/07/31 22:02, Xueshi Hu wrote:
> On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/07/20 9:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/07/19 15:09, Xueshi Hu wrote:
>>>> When an IO error happens, reschedule_retry() will increase
>>>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
>>>> all r1bio in the memory pool are released, the memory pool should not be
>>>> modified. Introduce freeze_array_totally() to solve the problem. Compared
>>>> to freeze_array(), it's more strict because any in-flight io needs to
>>>> complete including queued io.
>>>>
>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
>>>> ---
>>>> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 33 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index dd25832eb045..5605c9680818 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf,
>>>> int extra)
>>>> /* Stop sync I/O and normal I/O and wait for everything to
>>>> * go quiet.
>>>> * This is called in two situations:
>>>> - * 1) management command handlers (reshape, remove disk, quiesce).
>>>> + * 1) management command handlers (remove disk, quiesce).
>>>> * 2) one normal I/O request failed.
>>>> * After array_frozen is set to 1, new sync IO will be blocked at
>>>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>>>> wake_up(&conf->wait_barrier);
>>>> }
>>>> +/* conf->resync_lock should be held */
>>>> +static int get_pending(struct r1conf *conf)
>>>> +{
>>>> + int idx, ret;
>>>> +
>>>> + ret = atomic_read(&conf->nr_sync_pending);
>>>> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
>>>> + ret += atomic_read(&conf->nr_pending[idx]);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static void freeze_array_totally(struct r1conf *conf)
>>>> +{
>>>> + /*
>>>> + * freeze_array_totally() is almost the same with
>>>> freeze_array() except
>>>> + * it requires there's no queued io. Raid1's reshape will
>>>> destroy the
>>>> + * old mempool and change r1conf::raid_disks, which are
>>>> necessary when
>>>> + * freeing the queued io.
>>>> + */
>>>> + spin_lock_irq(&conf->resync_lock);
>>>> + conf->array_frozen = 1;
>>>> + raid1_log(conf->mddev, "freeze totally");
>>>> + wait_event_lock_irq_cmd(
>>>> + conf->wait_barrier,
>>>> + get_pending(conf) == 0,
>>>> + conf->resync_lock,
>>>> + md_wakeup_thread(conf->mddev->thread));
>>>> + spin_unlock_irq(&conf->resync_lock);
>>>> +}
>>>> +
>>>> static void alloc_behind_master_bio(struct r1bio *r1_bio,
>>>> struct bio *bio)
>>>> {
>>>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>>>> return -ENOMEM;
>>>> }
>>>> - freeze_array(conf, 0);
>>>> + freeze_array_totally(conf);
>>>
>>> I think this is wrong, raid1_reshape() can't be called with
>> Sorry about this typo, I mean raid1_reshape() can be called with ...
> You're right, this is indeed a deadlock.
>
> I am wondering whether this approach is viable:
>
> 	if (unlikely(atomic_read(conf->nr_queued))) {
> 		kfree(newpoolinfo);
> 		mempool_exit(&newpool);
> 		unfreeze_array(conf);
>
> 		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> 		md_wakeup_thread(mddev->thread);
> 		return -EBUSY;
> 	}
This is not okay: 'nr_queued' can be increased at any time when normal io
fails, so reading it once doesn't mean anything, and you need to
freeze_array() before reading it:
freeze_array
// guarantee new io won't be dispatched
if (atomic_read(conf->nr_queued))
...
unfreeze_array
return -EBUSY;
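Spelled out a little more concretely (a sketch only; the placement inside
raid1_reshape() is assumed, and the loop simply sums the per-bucket counters,
mirroring the shape of get_pending() in patch 1/3):

        /* Sketch of the ordering above: freeze first, then look at the
         * queued io, and bail out if any is still pending. */
        int idx, queued = 0;

        freeze_array(conf, 0);          /* no new normal io can be dispatched */
        for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
                queued += atomic_read(&conf->nr_queued[idx]);
        if (queued) {
                unfreeze_array(conf);
                return -EBUSY;
        }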
Fortunately, I'm working on another patchset to synchronize io with
array configuration, which means all the callers of raid1_reshape() will
suspend the array, and no normal io will be in progress, hence this
problem won't exist either.
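The expected caller-side shape would then be roughly the following (a sketch
under that assumption, not code from that patchset; mddev_suspend() and
mddev_resume() are the existing md helpers, the rest is illustrative):

        /* Sketch: the configuration change runs with the array suspended,
         * so no normal io is in flight while raid1_reshape() runs. */
        int err;

        mddev_suspend(mddev);
        err = mddev->pers->check_reshape(mddev);        /* raid1_reshape() for raid1 */
        mddev_resume(mddev);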
Thanks,
Kuai
>
> Thanks,
> Hu
>
>>
>> Thanks,
>> Kuai
>>> 'reconfig_mutex' grabbed, and this will deadlock because failed io needs
>>> this lock to be handled by the daemon thread (see details in [1]).
>>>
>>> Be aware to never hold 'reconfig_mutex' while waiting for io.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>>>
>>>> /* ok, everything is stopped */
>>>> oldpool = conf->r1bio_pool;
>>>>
>>>
>>> .
>>>
>>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2023-08-01 1:24 UTC | newest]
Thread overview: 24+ messages -- links below jump to the message on this page --
2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
2023-07-20 1:36 ` Yu Kuai
2023-07-20 1:37 ` Yu Kuai
2023-07-31 14:02 ` Xueshi Hu
2023-08-01 1:24 ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
2023-07-20 1:47 ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2023-07-19 7:38 ` Paul Menzel
2023-07-19 11:51 ` Xueshi Hu
2023-07-20 1:28 ` Yu Kuai
2023-07-28 14:42 ` Xueshi Hu
2023-07-29 0:58 ` Yu Kuai
2023-07-29 3:29 ` Xueshi Hu
2023-07-29 3:36 ` Yu Kuai
2023-07-29 3:51 ` Yu Kuai
2023-07-29 6:16 ` Xueshi Hu
2023-07-29 7:37 ` Yu Kuai
2023-07-29 12:23 ` Xueshi Hu
2023-07-31 1:03 ` Yu Kuai
2023-07-31 3:48 ` Xueshi Hu
2023-07-31 6:22 ` Yu Kuai
2023-07-31 14:12 ` Xueshi Hu