* [PATCH v3 0/3] don't change mempool if in-flight r1bio exists @ 2023-07-19 7:09 Xueshi Hu 2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw) To: song; +Cc: linux-raid, yukuai1, Xueshi Hu All r1bios should be freed before raid1_reshape() changes the mempool. However, freeze_array() doesn't guarantee there is no r1bio left, because some r1bios may still be queued. Also, in raid1_write_request(), handle_read_error() and raid_end_bio_io(), the kernel must not call allow_barrier() before the r1bio has been properly handled. -> v2: - fix the problems one by one instead of calling blk_mq_freeze_queue(), as suggested by Yu Kuai -> v3: - add freeze_array_totally() to replace freeze_array() instead of giving up in raid1_reshape() - add a previously missed fix in raid_end_bio_io() - add a small check at the start of raid1_reshape() Xueshi Hu (3): md/raid1: freeze array more strictly when reshape md/raid1: don't allow_barrier() before r1bio got freed md/raid1: check array size before reshape drivers/md/raid1.c | 59 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 48 insertions(+), 11 deletions(-) -- 2.40.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape 2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu @ 2023-07-19 7:09 ` Xueshi Hu 2023-07-20 1:36 ` Yu Kuai 2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu 2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu 2 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw) To: song; +Cc: linux-raid, yukuai1, Xueshi Hu When an IO error happens, reschedule_retry() will increase r1conf::nr_queued, which makes freeze_array() unblocked. However, before all r1bio in the memory pool are released, the memory pool should not be modified. Introduce freeze_array_totally() to solve the problem. Compared to freeze_array(), it's more strict because any in-flight io needs to complete including queued io. Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> --- drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index dd25832eb045..5605c9680818 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra) /* Stop sync I/O and normal I/O and wait for everything to * go quiet. * This is called in two situations: - * 1) management command handlers (reshape, remove disk, quiesce). + * 1) management command handlers (remove disk, quiesce). * 2) one normal I/O request failed. * After array_frozen is set to 1, new sync IO will be blocked at @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf) wake_up(&conf->wait_barrier); } +/* conf->resync_lock should be held */ +static int get_pending(struct r1conf *conf) +{ + int idx, ret; + + ret = atomic_read(&conf->nr_sync_pending); + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) + ret += atomic_read(&conf->nr_pending[idx]); + + return ret; +} + +static void freeze_array_totally(struct r1conf *conf) +{ + /* + * freeze_array_totally() is almost the same with freeze_array() except + * it requires there's no queued io. Raid1's reshape will destroy the + * old mempool and change r1conf::raid_disks, which are necessary when + * freeing the queued io. + */ + spin_lock_irq(&conf->resync_lock); + conf->array_frozen = 1; + raid1_log(conf->mddev, "freeze totally"); + wait_event_lock_irq_cmd( + conf->wait_barrier, + get_pending(conf) == 0, + conf->resync_lock, + md_wakeup_thread(conf->mddev->thread)); + spin_unlock_irq(&conf->resync_lock); +} + static void alloc_behind_master_bio(struct r1bio *r1_bio, struct bio *bio) { @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev) return -ENOMEM; } - freeze_array(conf, 0); + freeze_array_totally(conf); /* ok, everything is stopped */ oldpool = conf->r1bio_pool; -- 2.40.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
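For context on why queued r1bios slip past the existing barrier: freeze_array() waits on a count of pending minus queued I/O. A rough reconstruction of the mainline helper it relies on, from memory of drivers/md/raid1.c of this period (names and details may differ between kernel versions), is shown below; it is not part of the patch.

	/* reconstructed for reference only */
	static int get_unqueued_pending(struct r1conf *conf)
	{
		int idx, ret;

		ret = atomic_read(&conf->nr_sync_pending);
		for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
			ret += atomic_read(&conf->nr_pending[idx]) -
				atomic_read(&conf->nr_queued[idx]);

		return ret;
	}

Because freeze_array(conf, extra) only waits for this value to drop to "extra", r1bios parked on conf->retry_list by reschedule_retry() (which are counted in nr_queued) do not keep it blocked. The new get_pending() in the patch deliberately stops subtracting nr_queued, so freeze_array_totally() also waits for the queued ones.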
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape 2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu @ 2023-07-20 1:36 ` Yu Kuai 2023-07-20 1:37 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-20 1:36 UTC (permalink / raw) To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C) Hi, 在 2023/07/19 15:09, Xueshi Hu 写道: > When an IO error happens, reschedule_retry() will increase > r1conf::nr_queued, which makes freeze_array() unblocked. However, before > all r1bio in the memory pool are released, the memory pool should not be > modified. Introduce freeze_array_totally() to solve the problem. Compared > to freeze_array(), it's more strict because any in-flight io needs to > complete including queued io. > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > --- > drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++-- > 1 file changed, 33 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index dd25832eb045..5605c9680818 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra) > /* Stop sync I/O and normal I/O and wait for everything to > * go quiet. > * This is called in two situations: > - * 1) management command handlers (reshape, remove disk, quiesce). > + * 1) management command handlers (remove disk, quiesce). > * 2) one normal I/O request failed. > > * After array_frozen is set to 1, new sync IO will be blocked at > @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf) > wake_up(&conf->wait_barrier); > } > > +/* conf->resync_lock should be held */ > +static int get_pending(struct r1conf *conf) > +{ > + int idx, ret; > + > + ret = atomic_read(&conf->nr_sync_pending); > + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) > + ret += atomic_read(&conf->nr_pending[idx]); > + > + return ret; > +} > + > +static void freeze_array_totally(struct r1conf *conf) > +{ > + /* > + * freeze_array_totally() is almost the same with freeze_array() except > + * it requires there's no queued io. Raid1's reshape will destroy the > + * old mempool and change r1conf::raid_disks, which are necessary when > + * freeing the queued io. > + */ > + spin_lock_irq(&conf->resync_lock); > + conf->array_frozen = 1; > + raid1_log(conf->mddev, "freeze totally"); > + wait_event_lock_irq_cmd( > + conf->wait_barrier, > + get_pending(conf) == 0, > + conf->resync_lock, > + md_wakeup_thread(conf->mddev->thread)); > + spin_unlock_irq(&conf->resync_lock); > +} > + > static void alloc_behind_master_bio(struct r1bio *r1_bio, > struct bio *bio) > { > @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev) > return -ENOMEM; > } > > - freeze_array(conf, 0); > + freeze_array_totally(conf); I think this is wrong, raid1_reshape() can't be called with 'reconfig_mutex' grabbed, and this will deadlock because failed io need this lock to be handled by daemon thread.(see details in [1]). Be aware that never hold 'reconfig_mutex' to wait for io. [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 > > /* ok, everything is stopped */ > oldpool = conf->r1bio_pool; > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape 2023-07-20 1:36 ` Yu Kuai @ 2023-07-20 1:37 ` Yu Kuai 2023-07-31 14:02 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-20 1:37 UTC (permalink / raw) To: Yu Kuai, Xueshi Hu, song; +Cc: linux-raid, yukuai (C) Hi, 在 2023/07/20 9:36, Yu Kuai 写道: > Hi, > > 在 2023/07/19 15:09, Xueshi Hu 写道: >> When an IO error happens, reschedule_retry() will increase >> r1conf::nr_queued, which makes freeze_array() unblocked. However, before >> all r1bio in the memory pool are released, the memory pool should not be >> modified. Introduce freeze_array_totally() to solve the problem. Compared >> to freeze_array(), it's more strict because any in-flight io needs to >> complete including queued io. >> >> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> >> --- >> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++-- >> 1 file changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index dd25832eb045..5605c9680818 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, >> int extra) >> /* Stop sync I/O and normal I/O and wait for everything to >> * go quiet. >> * This is called in two situations: >> - * 1) management command handlers (reshape, remove disk, quiesce). >> + * 1) management command handlers (remove disk, quiesce). >> * 2) one normal I/O request failed. >> * After array_frozen is set to 1, new sync IO will be blocked at >> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf) >> wake_up(&conf->wait_barrier); >> } >> +/* conf->resync_lock should be held */ >> +static int get_pending(struct r1conf *conf) >> +{ >> + int idx, ret; >> + >> + ret = atomic_read(&conf->nr_sync_pending); >> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) >> + ret += atomic_read(&conf->nr_pending[idx]); >> + >> + return ret; >> +} >> + >> +static void freeze_array_totally(struct r1conf *conf) >> +{ >> + /* >> + * freeze_array_totally() is almost the same with freeze_array() >> except >> + * it requires there's no queued io. Raid1's reshape will destroy >> the >> + * old mempool and change r1conf::raid_disks, which are necessary >> when >> + * freeing the queued io. >> + */ >> + spin_lock_irq(&conf->resync_lock); >> + conf->array_frozen = 1; >> + raid1_log(conf->mddev, "freeze totally"); >> + wait_event_lock_irq_cmd( >> + conf->wait_barrier, >> + get_pending(conf) == 0, >> + conf->resync_lock, >> + md_wakeup_thread(conf->mddev->thread)); >> + spin_unlock_irq(&conf->resync_lock); >> +} >> + >> static void alloc_behind_master_bio(struct r1bio *r1_bio, >> struct bio *bio) >> { >> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev) >> return -ENOMEM; >> } >> - freeze_array(conf, 0); >> + freeze_array_totally(conf); > > I think this is wrong, raid1_reshape() can't be called with Sorry about thi typo, I mean raid1_reshape() can be called with ... Thanks, Kuai > 'reconfig_mutex' grabbed, and this will deadlock because failed io need > this lock to be handled by daemon thread.(see details in [1]). > > Be aware that never hold 'reconfig_mutex' to wait for io. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 > >> /* ok, everything is stopped */ >> oldpool = conf->r1bio_pool; >> > > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape 2023-07-20 1:37 ` Yu Kuai @ 2023-07-31 14:02 ` Xueshi Hu 2023-08-01 1:24 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-31 14:02 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, pmenzel, song On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote: > Hi, > > 在 2023/07/20 9:36, Yu Kuai 写道: > > Hi, > > > > 在 2023/07/19 15:09, Xueshi Hu 写道: > > > When an IO error happens, reschedule_retry() will increase > > > r1conf::nr_queued, which makes freeze_array() unblocked. However, before > > > all r1bio in the memory pool are released, the memory pool should not be > > > modified. Introduce freeze_array_totally() to solve the problem. Compared > > > to freeze_array(), it's more strict because any in-flight io needs to > > > complete including queued io. > > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > > > --- > > > drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++-- > > > 1 file changed, 33 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > > index dd25832eb045..5605c9680818 100644 > > > --- a/drivers/md/raid1.c > > > +++ b/drivers/md/raid1.c > > > @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, > > > int extra) > > > /* Stop sync I/O and normal I/O and wait for everything to > > > * go quiet. > > > * This is called in two situations: > > > - * 1) management command handlers (reshape, remove disk, quiesce). > > > + * 1) management command handlers (remove disk, quiesce). > > > * 2) one normal I/O request failed. > > > * After array_frozen is set to 1, new sync IO will be blocked at > > > @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf) > > > wake_up(&conf->wait_barrier); > > > } > > > +/* conf->resync_lock should be held */ > > > +static int get_pending(struct r1conf *conf) > > > +{ > > > + int idx, ret; > > > + > > > + ret = atomic_read(&conf->nr_sync_pending); > > > + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) > > > + ret += atomic_read(&conf->nr_pending[idx]); > > > + > > > + return ret; > > > +} > > > + > > > +static void freeze_array_totally(struct r1conf *conf) > > > +{ > > > + /* > > > + * freeze_array_totally() is almost the same with > > > freeze_array() except > > > + * it requires there's no queued io. Raid1's reshape will > > > destroy the > > > + * old mempool and change r1conf::raid_disks, which are > > > necessary when > > > + * freeing the queued io. > > > + */ > > > + spin_lock_irq(&conf->resync_lock); > > > + conf->array_frozen = 1; > > > + raid1_log(conf->mddev, "freeze totally"); > > > + wait_event_lock_irq_cmd( > > > + conf->wait_barrier, > > > + get_pending(conf) == 0, > > > + conf->resync_lock, > > > + md_wakeup_thread(conf->mddev->thread)); > > > + spin_unlock_irq(&conf->resync_lock); > > > +} > > > + > > > static void alloc_behind_master_bio(struct r1bio *r1_bio, > > > struct bio *bio) > > > { > > > @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev) > > > return -ENOMEM; > > > } > > > - freeze_array(conf, 0); > > > + freeze_array_totally(conf); > > > > I think this is wrong, raid1_reshape() can't be called with > Sorry about thi typo, I mean raid1_reshape() can be called with ... You're right, this is indeed a deadlock. 
I am wondering whether this approach is viable

	if (unlikely(atomic_read(conf->nr_queued))) {
		kfree(newpoolinfo);
		mempool_exit(&newpool);
		unfreeze_array(conf);

		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return -EBUSY;
	}

Thanks, Hu

> > Thanks, > Kuai > > 'reconfig_mutex' grabbed, and this will deadlock because failed io need > this lock to be handled by daemon thread.(see details in [1]). > > Be aware that never hold 'reconfig_mutex' to wait for io. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 > > > /* ok, everything is stopped */ > > > oldpool = conf->r1bio_pool; > > > > > > . > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape 2023-07-31 14:02 ` Xueshi Hu @ 2023-08-01 1:24 ` Yu Kuai 0 siblings, 0 replies; 24+ messages in thread From: Yu Kuai @ 2023-08-01 1:24 UTC (permalink / raw) To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C) Hi, 在 2023/07/31 22:02, Xueshi Hu 写道: > On Thu, Jul 20, 2023 at 09:37:38AM +0800, Yu Kuai wrote: >> Hi, >> >> 在 2023/07/20 9:36, Yu Kuai 写道: >>> Hi, >>> >>> 在 2023/07/19 15:09, Xueshi Hu 写道: >>>> When an IO error happens, reschedule_retry() will increase >>>> r1conf::nr_queued, which makes freeze_array() unblocked. However, before >>>> all r1bio in the memory pool are released, the memory pool should not be >>>> modified. Introduce freeze_array_totally() to solve the problem. Compared >>>> to freeze_array(), it's more strict because any in-flight io needs to >>>> complete including queued io. >>>> >>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> >>>> --- >>>> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 33 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>>> index dd25832eb045..5605c9680818 100644 >>>> --- a/drivers/md/raid1.c >>>> +++ b/drivers/md/raid1.c >>>> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, >>>> int extra) >>>> /* Stop sync I/O and normal I/O and wait for everything to >>>> * go quiet. >>>> * This is called in two situations: >>>> - * 1) management command handlers (reshape, remove disk, quiesce). >>>> + * 1) management command handlers (remove disk, quiesce). >>>> * 2) one normal I/O request failed. >>>> * After array_frozen is set to 1, new sync IO will be blocked at >>>> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf) >>>> wake_up(&conf->wait_barrier); >>>> } >>>> +/* conf->resync_lock should be held */ >>>> +static int get_pending(struct r1conf *conf) >>>> +{ >>>> + int idx, ret; >>>> + >>>> + ret = atomic_read(&conf->nr_sync_pending); >>>> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) >>>> + ret += atomic_read(&conf->nr_pending[idx]); >>>> + >>>> + return ret; >>>> +} >>>> + >>>> +static void freeze_array_totally(struct r1conf *conf) >>>> +{ >>>> + /* >>>> + * freeze_array_totally() is almost the same with >>>> freeze_array() except >>>> + * it requires there's no queued io. Raid1's reshape will >>>> destroy the >>>> + * old mempool and change r1conf::raid_disks, which are >>>> necessary when >>>> + * freeing the queued io. >>>> + */ >>>> + spin_lock_irq(&conf->resync_lock); >>>> + conf->array_frozen = 1; >>>> + raid1_log(conf->mddev, "freeze totally"); >>>> + wait_event_lock_irq_cmd( >>>> + conf->wait_barrier, >>>> + get_pending(conf) == 0, >>>> + conf->resync_lock, >>>> + md_wakeup_thread(conf->mddev->thread)); >>>> + spin_unlock_irq(&conf->resync_lock); >>>> +} >>>> + >>>> static void alloc_behind_master_bio(struct r1bio *r1_bio, >>>> struct bio *bio) >>>> { >>>> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev) >>>> return -ENOMEM; >>>> } >>>> - freeze_array(conf, 0); >>>> + freeze_array_totally(conf); >>> >>> I think this is wrong, raid1_reshape() can't be called with >> Sorry about thi typo, I mean raid1_reshape() can be called with ... > You're right, this is indeed a deadlock. 
> > I am wondering whether this approach is viable > > if (unlikely(atomic_read(conf->nr_queued))) { > kfree(newpoolinfo); > mempool_exit(&newpool); > unfreeze_array(conf); > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > md_wakeup_thread(mddev->thread); > return -EBUSY; > } This is not okay, 'nr_queued' can be incresed at any time when normal io failed, read it once doesn't mean anything, and you need to freeze_array() before reading it: freeze_array // guarantee new io won't be dispatched if (atomic_read(conf->nr_queued)) ... unfreeze_array return -EBUSY; Fortunately, I'm working on another patchset to synchronize io with array configuration, which means all the callers of raid1_reshape() will suspend the array, and no normal io will be in progress, hence this problem won't exist as well. Thanks, Kuai > > Thanks, > Hu > >> >> Thanks, >> Kuai >>> 'reconfig_mutex' grabbed, and this will deadlock because failed io need >>> this lock to be handled by daemon thread.(see details in [1]). >>> >>> Be aware that never hold 'reconfig_mutex' to wait for io. >>> >>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463 >>> >>>> /* ok, everything is stopped */ >>>> oldpool = conf->r1bio_pool; >>>> >>> >>> . >>> >> > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
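To make the ordering Yu Kuai describes concrete: the nr_queued test only means something once array_frozen is set, because after that no new failed io can be queued behind it. Below is a rough, untested sketch of the -EBUSY variant under that rule; raid1_queued_total() is a hypothetical helper invented for this illustration (summing the per-bucket nr_queued counters, mirroring the patch's get_pending()), not an existing raid1 function.

/* hypothetical helper: r1bios parked on conf->retry_list for raid1d() */
static int raid1_queued_total(struct r1conf *conf)
{
	int idx, ret = 0;

	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
		ret += atomic_read(&conf->nr_queued[idx]);

	return ret;
}

	/* ...then, in raid1_reshape(), after the new pool has been set up: */
	freeze_array(conf, 0);	/* no new normal or sync io from here on */
	if (raid1_queued_total(conf)) {
		/*
		 * Failed io is still queued for the daemon thread and we hold
		 * reconfig_mutex, so waiting for it here could deadlock;
		 * give up instead of waiting.
		 */
		kfree(newpoolinfo);
		mempool_exit(&newpool);
		unfreeze_array(conf);
		return -EBUSY;
	}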
* [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed 2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu 2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu @ 2023-07-19 7:09 ` Xueshi Hu 2023-07-20 1:47 ` Yu Kuai 2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu 2 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw) To: song; +Cc: linux-raid, yukuai1, Xueshi Hu allow_barrier() make reshape possible. Raid reshape changes the r1conf::raid_disks and mempool. Free the r1bio firstly and then call allow_barrier() Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> --- drivers/md/raid1.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 5605c9680818..62e86b7d1561 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio) { struct bio *bio = r1_bio->master_bio; struct r1conf *conf = r1_bio->mddev->private; + sector_t sector = r1_bio->sector; /* if nobody has done the final endio yet, do it now */ if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { @@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio) call_bio_endio(r1_bio); } + + free_r1bio(r1_bio); /* * Wake up any possible resync thread that waits for the device * to go idle. All I/Os, even write-behind writes, are done. */ - allow_barrier(conf, r1_bio->sector); - - free_r1bio(r1_bio); + allow_barrier(conf, sector); } /* @@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, return; } + retry_write: r1_bio = alloc_r1bio(mddev, bio); r1_bio->sectors = max_write_sectors; @@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, */ disks = conf->raid_disks * 2; - retry_write: blocked_rdev = NULL; rcu_read_lock(); max_sectors = r1_bio->sectors; @@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, for (j = 0; j < i; j++) if (r1_bio->bios[j]) rdev_dec_pending(conf->mirrors[j].rdev, mddev); - r1_bio->state = 0; + free_r1bio(r1_bio); allow_barrier(conf, bio->bi_iter.bi_sector); if (bio->bi_opf & REQ_NOWAIT) { @@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) struct mddev *mddev = conf->mddev; struct bio *bio; struct md_rdev *rdev; + sector_t sector; clear_bit(R1BIO_ReadError, &r1_bio->state); /* we got a read error. Maybe the drive is bad. Maybe just @@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) } rdev_dec_pending(rdev, conf->mddev); - allow_barrier(conf, r1_bio->sector); + sector = r1_bio->sector; bio = r1_bio->master_bio; /* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */ r1_bio->state = 0; raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio); + allow_barrier(conf, sector); } static void raid1d(struct md_thread *thread) -- 2.40.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed 2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu @ 2023-07-20 1:47 ` Yu Kuai 0 siblings, 0 replies; 24+ messages in thread From: Yu Kuai @ 2023-07-20 1:47 UTC (permalink / raw) To: Xueshi Hu, song; +Cc: linux-raid, yukuai1, yukuai (C) Hi, 在 2023/07/19 15:09, Xueshi Hu 写道: > allow_barrier() make reshape possible. Raid reshape changes the > r1conf::raid_disks and mempool. Free the r1bio firstly and then call > allow_barrier() > After adding fixes tags, feel free to add: Reviewed-by: Yu Kuai <yukuai3@huawei.com> > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > --- > drivers/md/raid1.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 5605c9680818..62e86b7d1561 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -313,6 +313,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio) > { > struct bio *bio = r1_bio->master_bio; > struct r1conf *conf = r1_bio->mddev->private; > + sector_t sector = r1_bio->sector; > > /* if nobody has done the final endio yet, do it now */ > if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { > @@ -323,13 +324,13 @@ static void raid_end_bio_io(struct r1bio *r1_bio) > > call_bio_endio(r1_bio); > } > + > + free_r1bio(r1_bio); > /* > * Wake up any possible resync thread that waits for the device > * to go idle. All I/Os, even write-behind writes, are done. > */ > - allow_barrier(conf, r1_bio->sector); > - > - free_r1bio(r1_bio); > + allow_barrier(conf, sector); > } > > /* > @@ -1404,6 +1405,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, > return; > } > > + retry_write: > r1_bio = alloc_r1bio(mddev, bio); > r1_bio->sectors = max_write_sectors; > > @@ -1419,7 +1421,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, > */ > > disks = conf->raid_disks * 2; > - retry_write: > blocked_rdev = NULL; > rcu_read_lock(); > max_sectors = r1_bio->sectors; > @@ -1499,7 +1500,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, > for (j = 0; j < i; j++) > if (r1_bio->bios[j]) > rdev_dec_pending(conf->mirrors[j].rdev, mddev); > - r1_bio->state = 0; > + free_r1bio(r1_bio); > allow_barrier(conf, bio->bi_iter.bi_sector); > > if (bio->bi_opf & REQ_NOWAIT) { > @@ -2529,6 +2530,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) > struct mddev *mddev = conf->mddev; > struct bio *bio; > struct md_rdev *rdev; > + sector_t sector; > > clear_bit(R1BIO_ReadError, &r1_bio->state); > /* we got a read error. Maybe the drive is bad. Maybe just > @@ -2558,12 +2560,13 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) > } > > rdev_dec_pending(rdev, conf->mddev); > - allow_barrier(conf, r1_bio->sector); > + sector = r1_bio->sector; > bio = r1_bio->master_bio; > > /* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */ > r1_bio->state = 0; > raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio); > + allow_barrier(conf, sector); > } > > static void raid1d(struct md_thread *thread) > ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu 2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu 2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu @ 2023-07-19 7:09 ` Xueshi Hu 2023-07-19 7:38 ` Paul Menzel 2 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-19 7:09 UTC (permalink / raw) To: song; +Cc: linux-raid, yukuai1, Xueshi Hu If array size doesn't changed, nothing need to do. Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> --- drivers/md/raid1.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 62e86b7d1561..5840b8b0f9b7 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) int d, d2; int ret; - memset(&newpool, 0, sizeof(newpool)); - memset(&oldpool, 0, sizeof(oldpool)); - /* Cannot change chunk_size, layout, or level */ if (mddev->chunk_sectors != mddev->new_chunk_sectors || mddev->layout != mddev->new_layout || @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) return -EINVAL; } + if (mddev->delta_disks == 0) + return 0; /* nothing to do */ + + memset(&newpool, 0, sizeof(newpool)); + memset(&oldpool, 0, sizeof(oldpool)); + if (!mddev_is_clustered(mddev)) md_allow_write(mddev); -- 2.40.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu @ 2023-07-19 7:38 ` Paul Menzel 2023-07-19 11:51 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Paul Menzel @ 2023-07-19 7:38 UTC (permalink / raw) To: Xueshi Hu; +Cc: song, linux-raid, yukuai1 Dear Xueshi, Thank you for your patches. Am 19.07.23 um 09:09 schrieb Xueshi Hu: > If array size doesn't changed, nothing need to do. Maybe: … nothing needs to be done. Do you have a test case to reproduce it? Kind regards, Paul > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > --- > drivers/md/raid1.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 62e86b7d1561..5840b8b0f9b7 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) > int d, d2; > int ret; > > - memset(&newpool, 0, sizeof(newpool)); > - memset(&oldpool, 0, sizeof(oldpool)); > - > /* Cannot change chunk_size, layout, or level */ > if (mddev->chunk_sectors != mddev->new_chunk_sectors || > mddev->layout != mddev->new_layout || > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) > return -EINVAL; > } > > + if (mddev->delta_disks == 0) > + return 0; /* nothing to do */ > + > + memset(&newpool, 0, sizeof(newpool)); > + memset(&oldpool, 0, sizeof(oldpool)); > + > if (!mddev_is_clustered(mddev)) > md_allow_write(mddev); > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-19 7:38 ` Paul Menzel @ 2023-07-19 11:51 ` Xueshi Hu 2023-07-20 1:28 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-19 11:51 UTC (permalink / raw) To: Paul Menzel; +Cc: Song Liu, linux-raid, Yu Kuai On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote: > Dear Xueshi, > > > Thank you for your patches. > > On 19.07.23 at 09:09, Xueshi Hu wrote: > > If array size doesn't changed, nothing need to do. > > Maybe: … nothing needs to be done. I was in too much of a hurry; I'll be more careful next time and check the sentences with a tool. > > Do you have a test case to reproduce it? >

Userspace command:

	echo 4 > /sys/devices/virtual/block/md10/md/raid_disks

Kernel function calling flow:

	md_attr_store()
	  raid_disks_store()
	    update_raid_disks()
	      raid1_reshape()

Maybe I should provide more information when submitting patches; thank you for reminding me.

> > Kind regards, > > Paul > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > > --- > > drivers/md/raid1.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > index 62e86b7d1561..5840b8b0f9b7 100644 > > --- a/drivers/md/raid1.c > > +++ b/drivers/md/raid1.c > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) > > int d, d2; > > int ret; > > - memset(&newpool, 0, sizeof(newpool)); > > - memset(&oldpool, 0, sizeof(oldpool)); > > - > > /* Cannot change chunk_size, layout, or level */ > > if (mddev->chunk_sectors != mddev->new_chunk_sectors || > > mddev->layout != mddev->new_layout || > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) > > return -EINVAL; > > } > > + if (mddev->delta_disks == 0) > > + return 0; /* nothing to do */ > > + > > + memset(&newpool, 0, sizeof(newpool)); > > + memset(&oldpool, 0, sizeof(oldpool)); > > + > > if (!mddev_is_clustered(mddev)) > > md_allow_write(mddev); ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-19 11:51 ` Xueshi Hu @ 2023-07-20 1:28 ` Yu Kuai 2023-07-28 14:42 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-20 1:28 UTC (permalink / raw) To: Xueshi Hu, Paul Menzel; +Cc: Song Liu, linux-raid, yukuai (C) Hi, 在 2023/07/19 19:51, Xueshi Hu 写道: > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote: >> Dear Xueshi, >> >> >> Thank you for your patches. >> >> Am 19.07.23 um 09:09 schrieb Xueshi Hu: >>> If array size doesn't changed, nothing need to do. >> >> Maybe: … nothing needs to be done. > What a hurry, I'll be cautious next time and check the sentences with > tools. >> >> Do you have a test case to reproduce it? >> > Userspace command: > > echo 4 > /sys/devices/virtual/block/md10/md/raid_disks > > Kernel function calling flow: > > md_attr_store() > raid_disks_store() > update_raid_disks() > raid1_reshape() > > Maybe I shall provide more information when submit patches, thank you for > reminding me. >> >> Kind regards, >> >> Paul >> >> >>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> >>> --- >>> drivers/md/raid1.c | 9 ++++++--- >>> 1 file changed, 6 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>> index 62e86b7d1561..5840b8b0f9b7 100644 >>> --- a/drivers/md/raid1.c >>> +++ b/drivers/md/raid1.c >>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) >>> int d, d2; >>> int ret; >>> - memset(&newpool, 0, sizeof(newpool)); >>> - memset(&oldpool, 0, sizeof(oldpool)); >>> - >>> /* Cannot change chunk_size, layout, or level */ >>> if (mddev->chunk_sectors != mddev->new_chunk_sectors || >>> mddev->layout != mddev->new_layout || >>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) >>> return -EINVAL; >>> } >>> + if (mddev->delta_disks == 0) >>> + return 0; /* nothing to do */ I think this is wrong, you should at least keep following: set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); Thanks, Kuai >>> + >>> + memset(&newpool, 0, sizeof(newpool)); >>> + memset(&oldpool, 0, sizeof(oldpool)); >>> + >>> if (!mddev_is_clustered(mddev)) >>> md_allow_write(mddev); > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
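Read literally, Yu Kuai's request would turn the early return in this patch into something like the following rough, untested sketch, which bails out when raid_disks is unchanged but still kicks the recovery machinery the way the tail of raid1_reshape() does today:

	if (mddev->delta_disks == 0) {
		/* raid_disks unchanged: nothing to reallocate, but keep the
		 * existing "wake up recovery" behaviour of raid1_reshape() */
		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		return 0;
	}

Whether those flags are actually needed when raid_disks is unchanged is what the rest of this sub-thread goes on to debate.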
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-20 1:28 ` Yu Kuai @ 2023-07-28 14:42 ` Xueshi Hu 2023-07-29 0:58 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-28 14:42 UTC (permalink / raw) To: Yu Kuai; +Cc: song, linux-raid, pmenzel On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote: > Hi, > > 在 2023/07/19 19:51, Xueshi Hu 写道: > > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote: > > > Dear Xueshi, > > > > > > > > > Thank you for your patches. > > > > > > Am 19.07.23 um 09:09 schrieb Xueshi Hu: > > > > If array size doesn't changed, nothing need to do. > > > > > > Maybe: … nothing needs to be done. > > What a hurry, I'll be cautious next time and check the sentences with > > tools. > > > > > > Do you have a test case to reproduce it? > > > > > Userspace command: > > > > echo 4 > /sys/devices/virtual/block/md10/md/raid_disks > > > > Kernel function calling flow: > > > > md_attr_store() > > raid_disks_store() > > update_raid_disks() > > raid1_reshape() > > > > Maybe I shall provide more information when submit patches, thank you for > > reminding me. > > > > > > Kind regards, > > > > > > Paul > > > > > > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > > > > --- > > > > drivers/md/raid1.c | 9 ++++++--- > > > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > > > index 62e86b7d1561..5840b8b0f9b7 100644 > > > > --- a/drivers/md/raid1.c > > > > +++ b/drivers/md/raid1.c > > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) > > > > int d, d2; > > > > int ret; > > > > - memset(&newpool, 0, sizeof(newpool)); > > > > - memset(&oldpool, 0, sizeof(oldpool)); > > > > - > > > > /* Cannot change chunk_size, layout, or level */ > > > > if (mddev->chunk_sectors != mddev->new_chunk_sectors || > > > > mddev->layout != mddev->new_layout || > > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) > > > > return -EINVAL; > > > > } > > > > + if (mddev->delta_disks == 0) > > > > + return 0; /* nothing to do */ > > I think this is wrong, you should at least keep following: > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > md_wakeup_thread(mddev->thread); > I fail to comprehend the rationale behind the kernel's need to invoke raid1d() repeatedly despite the absence of any modifications. It appears that raid1d() is responsible for invoking md_check_recovery() and resubmit the failed io. Evidently, it is not within the scope of raid1_reshape() to resubmit failed io. Moreover, upon reviewing the Documentation[1], I could not find any indications that reshape must undertake the actions as specified in md_check_recovery()'s documentation, such as eliminating faulty devices. [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html > Thanks, > Kuai > > > > > + > > > > + memset(&newpool, 0, sizeof(newpool)); > > > > + memset(&oldpool, 0, sizeof(oldpool)); > > > > + > > > > if (!mddev_is_clustered(mddev)) > > > > md_allow_write(mddev); > > . > > > Thanks, Hu ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-28 14:42 ` Xueshi Hu @ 2023-07-29 0:58 ` Yu Kuai 2023-07-29 3:29 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-29 0:58 UTC (permalink / raw) To: Xueshi Hu, Yu Kuai; +Cc: song, linux-raid, pmenzel, yukuai (C) Hi, 在 2023/07/28 22:42, Xueshi Hu 写道: > On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote: >> Hi, >> >> 在 2023/07/19 19:51, Xueshi Hu 写道: >>> On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote: >>>> Dear Xueshi, >>>> >>>> >>>> Thank you for your patches. >>>> >>>> Am 19.07.23 um 09:09 schrieb Xueshi Hu: >>>>> If array size doesn't changed, nothing need to do. >>>> >>>> Maybe: … nothing needs to be done. >>> What a hurry, I'll be cautious next time and check the sentences with >>> tools. >>>> >>>> Do you have a test case to reproduce it? >>>> >>> Userspace command: >>> >>> echo 4 > /sys/devices/virtual/block/md10/md/raid_disks >>> >>> Kernel function calling flow: >>> >>> md_attr_store() >>> raid_disks_store() >>> update_raid_disks() >>> raid1_reshape() >>> >>> Maybe I shall provide more information when submit patches, thank you for >>> reminding me. >>>> >>>> Kind regards, >>>> >>>> Paul >>>> >>>> >>>>> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> >>>>> --- >>>>> drivers/md/raid1.c | 9 ++++++--- >>>>> 1 file changed, 6 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>>>> index 62e86b7d1561..5840b8b0f9b7 100644 >>>>> --- a/drivers/md/raid1.c >>>>> +++ b/drivers/md/raid1.c >>>>> @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) >>>>> int d, d2; >>>>> int ret; >>>>> - memset(&newpool, 0, sizeof(newpool)); >>>>> - memset(&oldpool, 0, sizeof(oldpool)); >>>>> - >>>>> /* Cannot change chunk_size, layout, or level */ >>>>> if (mddev->chunk_sectors != mddev->new_chunk_sectors || >>>>> mddev->layout != mddev->new_layout || >>>>> @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) >>>>> return -EINVAL; >>>>> } >>>>> + if (mddev->delta_disks == 0) >>>>> + return 0; /* nothing to do */ >> >> I think this is wrong, you should at least keep following: >> >> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >> md_wakeup_thread(mddev->thread); >> > I fail to comprehend the rationale behind the kernel's need to invoke > raid1d() repeatedly despite the absence of any modifications. > It appears that raid1d() is responsible for invoking md_check_recovery() > and resubmit the failed io. No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, so that md_check_recovery() can start to reshape. Thanks, Kuai > > Evidently, it is not within the scope of raid1_reshape() to resubmit > failed io. > > Moreover, upon reviewing the Documentation[1], I could not find any > indications that reshape must undertake the actions as specified in > md_check_recovery()'s documentation, such as eliminating faulty devices. > > [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html >> Thanks, >> Kuai >> >>>>> + >>>>> + memset(&newpool, 0, sizeof(newpool)); >>>>> + memset(&oldpool, 0, sizeof(oldpool)); >>>>> + >>>>> if (!mddev_is_clustered(mddev)) >>>>> md_allow_write(mddev); >>> . >>> >> > Thanks, > Hu > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-29 0:58 ` Yu Kuai @ 2023-07-29 3:29 ` Xueshi Hu 2023-07-29 3:36 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-29 3:29 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, pmenzel, song On Sat, Jul 29, 2023 at 08:58:45AM +0800, Yu Kuai wrote: > Hi, > > 在 2023/07/28 22:42, Xueshi Hu 写道: > > On Thu, Jul 20, 2023 at 09:28:34AM +0800, Yu Kuai wrote: > > > Hi, > > > > > > 在 2023/07/19 19:51, Xueshi Hu 写道: > > > > On Wed, Jul 19, 2023 at 09:38:26AM +0200, Paul Menzel wrote: > > > > > Dear Xueshi, > > > > > > > > > > > > > > > Thank you for your patches. > > > > > > > > > > Am 19.07.23 um 09:09 schrieb Xueshi Hu: > > > > > > If array size doesn't changed, nothing need to do. > > > > > > > > > > Maybe: … nothing needs to be done. > > > > What a hurry, I'll be cautious next time and check the sentences with > > > > tools. > > > > > > > > > > Do you have a test case to reproduce it? > > > > > > > > > Userspace command: > > > > > > > > echo 4 > /sys/devices/virtual/block/md10/md/raid_disks > > > > > > > > Kernel function calling flow: > > > > > > > > md_attr_store() > > > > raid_disks_store() > > > > update_raid_disks() > > > > raid1_reshape() > > > > > > > > Maybe I shall provide more information when submit patches, thank you for > > > > reminding me. > > > > > > > > > > Kind regards, > > > > > > > > > > Paul > > > > > > > > > > > > > > > > Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com> > > > > > > --- > > > > > > drivers/md/raid1.c | 9 ++++++--- > > > > > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > > > > > index 62e86b7d1561..5840b8b0f9b7 100644 > > > > > > --- a/drivers/md/raid1.c > > > > > > +++ b/drivers/md/raid1.c > > > > > > @@ -3282,9 +3282,6 @@ static int raid1_reshape(struct mddev *mddev) > > > > > > int d, d2; > > > > > > int ret; > > > > > > - memset(&newpool, 0, sizeof(newpool)); > > > > > > - memset(&oldpool, 0, sizeof(oldpool)); > > > > > > - > > > > > > /* Cannot change chunk_size, layout, or level */ > > > > > > if (mddev->chunk_sectors != mddev->new_chunk_sectors || > > > > > > mddev->layout != mddev->new_layout || > > > > > > @@ -3295,6 +3292,12 @@ static int raid1_reshape(struct mddev *mddev) > > > > > > return -EINVAL; > > > > > > } > > > > > > + if (mddev->delta_disks == 0) > > > > > > + return 0; /* nothing to do */ > > > > > > I think this is wrong, you should at least keep following: > > > > > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > > > md_wakeup_thread(mddev->thread); > > > > > I fail to comprehend the rationale behind the kernel's need to invoke > > raid1d() repeatedly despite the absence of any modifications. > > It appears that raid1d() is responsible for invoking md_check_recovery() > > and resubmit the failed io. > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, > so that md_check_recovery() can start to reshape. I apologize, but I am still unable to comprehend your idea. If mmdev->delta_disks == 0 , what is the purpose of invoking md_check_recovery() again? > > Thanks, > Kuai > > > > Evidently, it is not within the scope of raid1_reshape() to resubmit > > failed io. > > > > Moreover, upon reviewing the Documentation[1], I could not find any > > indications that reshape must undertake the actions as specified in > > md_check_recovery()'s documentation, such as eliminating faulty devices. 
> > > > [1]: https://www.kernel.org/doc/html/latest/admin-guide/md.html > > > Thanks, > > > Kuai > > > > > > > > > + > > > > > > + memset(&newpool, 0, sizeof(newpool)); > > > > > > + memset(&oldpool, 0, sizeof(oldpool)); > > > > > > + > > > > > > if (!mddev_is_clustered(mddev)) > > > > > > md_allow_write(mddev); > > > > . > > > > > > > > > Thanks, > > Hu > > . > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-29 3:29 ` Xueshi Hu @ 2023-07-29 3:36 ` Yu Kuai 2023-07-29 3:51 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-29 3:36 UTC (permalink / raw) To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C) Hi, 在 2023/07/29 11:29, Xueshi Hu 写道: >>>> I think this is wrong, you should at least keep following: >>>> >>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >>>> md_wakeup_thread(mddev->thread); >>>> >>> I fail to comprehend the rationale behind the kernel's need to invoke >>> raid1d() repeatedly despite the absence of any modifications. >>> It appears that raid1d() is responsible for invoking md_check_recovery() >>> and resubmit the failed io. >> >> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, >> so that md_check_recovery() can start to reshape. > I apologize, but I am still unable to comprehend your idea. > If mmdev->delta_disks == 0 , what is the purpose of invoking > md_check_recovery() again? Sounds like you think raid1_reshape() can only be called from md_check_recovery(), please take a look at other callers. Thanks, Kuai ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-29 3:36 ` Yu Kuai @ 2023-07-29 3:51 ` Yu Kuai 2023-07-29 6:16 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-29 3:51 UTC (permalink / raw) To: Yu Kuai, Xueshi Hu; +Cc: linux-raid, pmenzel, song, yukuai (C) Hi, On 2023/07/29 11:36, Yu Kuai wrote: > Hi, > > On 2023/07/29 11:29, Xueshi Hu wrote: >>>>> I think this is wrong, you should at least keep following: >>>>> >>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >>>>> md_wakeup_thread(mddev->thread); >>>>> >>>> I fail to comprehend the rationale behind the kernel's need to invoke >>>> raid1d() repeatedly despite the absence of any modifications. >>>> It appears that raid1d() is responsible for invoking >>>> md_check_recovery() >>>> and resubmit the failed io. >>> >>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, >>> so that md_check_recovery() can start to reshape. >> I apologize, but I am still unable to comprehend your idea. >> If mmdev->delta_disks == 0 , what is the purpose of invoking >> md_check_recovery() again? > > Sounds like you think raid1_reshape() can only be called from > md_check_recovery(), please take a look at other callers. And even if raid1_reshape() is called from md_check_recovery(), which means reshape is interrupted, then MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED need to be set, so that reshape can continue. Thanks, Kuai > > Thanks, > Kuai > > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-29 3:51 ` Yu Kuai @ 2023-07-29 6:16 ` Xueshi Hu 2023-07-29 7:37 ` Yu Kuai 0 siblings, 1 reply; 24+ messages in thread From: Xueshi Hu @ 2023-07-29 6:16 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, pmenzel, song On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote: > Hi, > > On 2023/07/29 11:36, Yu Kuai wrote: > > Hi, > > > > On 2023/07/29 11:29, Xueshi Hu wrote: > > > > > > I think this is wrong, you should at least keep following: > > > > > > > > > > > > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > > > > > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > > > > > > md_wakeup_thread(mddev->thread); > > > > > > > > > > > I fail to comprehend the rationale behind the kernel's need to invoke > > > > > raid1d() repeatedly despite the absence of any modifications. > > > > > It appears that raid1d() is responsible for invoking > > > > > md_check_recovery() > > > > > and resubmit the failed io. > > > > > > > > No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, > > > > so that md_check_recovery() can start to reshape. > > > I apologize, but I am still unable to comprehend your idea. > > > If mmdev->delta_disks == 0 , what is the purpose of invoking > > > md_check_recovery() again? > > > > Sounds like you think raid1_reshape() can only be called from > > md_check_recovery(), please take a look at other callers. Thank you for the quick reply and patience.

Of course, I have checked all the callers of md_personality::check_reshape.

	- layout_store
	- action_store
	- chunk_size_store
	- md_ioctl
	    __md_set_array_info
	      update_array_info
	- md_check_recovery
	- md_ioctl
	    __md_set_array_info
	      update_array_info
	        update_raid_disks
	- process_metadata_update
	    md_reload_sb
	      check_sb_changes
	        update_raid_disks
	- raid_disks_store
	    update_raid_disks

There are three categories of callers other than md_check_recovery():

	1. writes to sysfs
	2. ioctls
	3. instructions received from an md cluster peer

Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an example: if mddev::raid_disks is already 4, I don't think raid1_reshape() should set the MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED bits and then wake up mddev::thread to call md_check_recovery().

Is there any counterexample to demonstrate the issues that may arise if md_check_recovery() is not called from raid1_reshape()?

> > And even if raid1_reshape() is called from md_check_recovery(), > which means reshape is interrupted, then MD_RECOVERY_RECOVER > and MD_RECOVERY_NEEDED need to be set, so that reshape can > continue.

raid1 only registers md_personality::check_reshape; all the work related to reshape is handled in md_personality::check_reshape. What is the meaning of "reshape can continue"? In other words, what additional work needs to be done if mddev::raid_disks doesn't change?

Thanks, Hu

> > Thanks, > Kuai > > > > Thanks, > > Kuai > > > > . > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape 2023-07-29 6:16 ` Xueshi Hu @ 2023-07-29 7:37 ` Yu Kuai 2023-07-29 12:23 ` Xueshi Hu 0 siblings, 1 reply; 24+ messages in thread From: Yu Kuai @ 2023-07-29 7:37 UTC (permalink / raw) To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C) Hi, 在 2023/07/29 14:16, Xueshi Hu 写道: > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote: >> Hi, >> >> 在 2023/07/29 11:36, Yu Kuai 写道: >>> Hi, >>> >>> 在 2023/07/29 11:29, Xueshi Hu 写道: >>>>>>> I think this is wrong, you should at least keep following: >>>>>>> >>>>>>> set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >>>>>>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >>>>>>> md_wakeup_thread(mddev->thread); >>>>>>> >>>>>> I fail to comprehend the rationale behind the kernel's need to invoke >>>>>> raid1d() repeatedly despite the absence of any modifications. >>>>>> It appears that raid1d() is responsible for invoking >>>>>> md_check_recovery() >>>>>> and resubmit the failed io. >>>>> >>>>> No, the point here is to set MD_RECOVERY_NEEDED and MD_RECOVERY_RECOVER, >>>>> so that md_check_recovery() can start to reshape. >>>> I apologize, but I am still unable to comprehend your idea. >>>> If mmdev->delta_disks == 0 , what is the purpose of invoking >>>> md_check_recovery() again? >>> >>> Sounds like you think raid1_reshape() can only be called from >>> md_check_recovery(), please take a look at other callers. > Thank you for the quick reply and patience. > > Of course, I have checked all the caller of md_personality::check_reshape. > > - layout_store > - action_store > - chunk_size_store > - md_ioctl > __md_set_array_info > update_array_info > - md_check_recovery > - md_ioctl > __md_set_array_info > update_array_info > update_raid_disks > - process_metadata_update > md_reload_sb > check_sb_changes > update_raid_disks > - raid_disks_store > update_raid_disks > > There are three categories of callers except md_check_recovery(). > 1. write to sysfs > 2. ioctl > 3. revice instructions from md cluster peer > > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an > example, if the mddev::raid_disks is already 4, I don't think > raid1_reshape() should set MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED > bit, then wake up the mddev::thread to call md_check_recovery(). > > Is there any counterexample to demonstrate the issues that may arise if > md_check_recovery() is not called out of raid1_reshape() ? > >> >> And even if raid1_reshape() is called from md_check_recovery(), >> which means reshape is interupted, then MD_RECOVERY_RECOVER >> and MD_RECOVERY_RECOVER need to be set, so that reshape can >> continue. > > raid1 only register md_personality::check_reshape, all the work related > with reshape are handle in md_personality::check_reshape. > What's the meaning of "reshape can continue" ? In another word, what's > the additional work need to do if mddev::raid_disks doesn't change ? Okay, I missed that raid1 doesn't have 'start_reshape', reshape here really means recovery, synchronize data to new disks. Never mind what I said "reshape can continue", it's not possible for raid1. Then the problem is the same from 'recovery' point of view: if the 'recovery' is interrupted, before this patch, even if raid_disks is the same, raid1_reshape() will still set the flag and then md_check_recovery() will try to continue the recovery. I'd like to keep this behaviour untill it can be sure that no user will be affected. 
Thanks, Kuai > > Thanks, > Hu > >> >> Thanks, >> Kuai >>> >>> Thanks, >>> Kuai >>> >>> . >>> >> > . > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29 7:37 ` Yu Kuai
@ 2023-07-29 12:23 ` Xueshi Hu
  2023-07-31 1:03 ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-29 12:23 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/07/29 14:16, Xueshi Hu wrote:
> > On Sat, Jul 29, 2023 at 11:51:27AM +0800, Yu Kuai wrote:
> > > Hi,
> > >
> > > On 2023/07/29 11:36, Yu Kuai wrote:
> > > > Hi,
> > > >
> > > > On 2023/07/29 11:29, Xueshi Hu wrote:
> > > > > > > > I think this is wrong, you should at least keep the following:
> > > > > > > >
> > > > > > > > 	set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > > > > > > > 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > > > > > > 	md_wakeup_thread(mddev->thread);
> > > > > > > >
> > > > > > > I fail to comprehend the rationale behind the kernel's need to
> > > > > > > invoke raid1d() repeatedly despite the absence of any
> > > > > > > modifications. It appears that raid1d() is responsible for
> > > > > > > invoking md_check_recovery() and resubmitting the failed io.
> > > > > >
> > > > > > No, the point here is to set MD_RECOVERY_NEEDED and
> > > > > > MD_RECOVERY_RECOVER, so that md_check_recovery() can start the
> > > > > > reshape.
> > > > > I apologize, but I am still unable to comprehend your idea.
> > > > > If mddev->delta_disks == 0, what is the purpose of invoking
> > > > > md_check_recovery() again?
> > > >
> > > > Sounds like you think raid1_reshape() can only be called from
> > > > md_check_recovery(), please take a look at the other callers.
> > Thank you for the quick reply and patience.
> >
> > Of course, I have checked all the callers of md_personality::check_reshape:
> >
> > - layout_store
> > - action_store
> > - chunk_size_store
> > - md_ioctl
> >     __md_set_array_info
> >       update_array_info
> > - md_check_recovery
> > - md_ioctl
> >     __md_set_array_info
> >       update_array_info
> >         update_raid_disks
> > - process_metadata_update
> >     md_reload_sb
> >       check_sb_changes
> >         update_raid_disks
> > - raid_disks_store
> >     update_raid_disks
> >
> > There are three categories of callers apart from md_check_recovery():
> > 1. write to sysfs
> > 2. ioctl
> > 3. receive instructions from an md cluster peer
> >
> > Using "echo 4 > /sys/devices/virtual/block/md10/md/raid_disks" as an
> > example: if mddev::raid_disks is already 4, I don't think
> > raid1_reshape() should set the MD_RECOVERY_RECOVER and
> > MD_RECOVERY_NEEDED bits and then wake up the mddev::thread to call
> > md_check_recovery().
> >
> > Is there any counterexample to demonstrate the issues that may arise if
> > md_check_recovery() is not called from raid1_reshape()?
> >
> > >
> > > And even if raid1_reshape() is called from md_check_recovery(),
> > > which means the reshape was interrupted, then MD_RECOVERY_RECOVER
> > > and MD_RECOVERY_NEEDED need to be set, so that the reshape can
> > > continue.
> >
> > raid1 only registers md_personality::check_reshape; all the work
> > related to reshape is handled in md_personality::check_reshape.
> > What's the meaning of "reshape can continue"? In other words, what
> > additional work needs to be done if mddev::raid_disks doesn't change?
>
> Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
> really means recovery, synchronizing data to new disks. Never mind what
> I said about "reshape can continue", it's not possible for raid1.
>
> Then the problem is the same from the 'recovery' point of view:
> if the 'recovery' is interrupted, before this patch, even if raid_disks
> is the same, raid1_reshape() will still set the flag and then
> md_check_recovery() will try to continue the recovery. I'd like to
> keep this behaviour until it can be sure that no user will be
> affected.

But md_check_recovery() will never call raid1_reshape().

md_personality::check_reshape() is only called from md_check_recovery()
when a reshape is in progress. But raid1 is special, as
mddev::reshape_position always equals MaxSector in raid1.

By the way, the concern in the v2 patch[1] is unnecessary for the same
reason.

[1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

> Thanks,
> Kuai

^ permalink raw reply [flat|nested] 24+ messages in thread
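For context, the gating Hu is pointing at looks roughly like the sketch below. This is a paraphrase of the md_check_recovery() logic of that era, not a verbatim excerpt from md.c, and the exact structure differs between kernel versions; the point is only that the daemon thread reaches the personality's check_reshape() through the reshape_position test, which raid1 never satisfies.

	/*
	 * Rough paraphrase of the md_check_recovery() path under
	 * discussion (not verbatim md.c; details vary by version).
	 * The daemon thread only asks the personality to continue a
	 * reshape when a reshape position has been recorded.  raid1
	 * never records one -- mddev->reshape_position stays at
	 * MaxSector -- so this branch never reaches raid1_reshape().
	 */
	if (mddev->reshape_position != MaxSector) {
		if (mddev->pers->check_reshape == NULL ||
		    mddev->pers->check_reshape(mddev) != 0)
			/* cannot continue the reshape */
			goto not_running;
		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
	}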
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-29 12:23 ` Xueshi Hu
@ 2023-07-31 1:03 ` Yu Kuai
  2023-07-31 3:48 ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31 1:03 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/29 20:23, Xueshi Hu wrote:
> On Sat, Jul 29, 2023 at 03:37:41PM +0800, Yu Kuai wrote:
[...]
>> Okay, I missed that raid1 doesn't have 'start_reshape'; reshape here
>> really means recovery, synchronizing data to new disks. Never mind what
>> I said about "reshape can continue", it's not possible for raid1.
>>
>> Then the problem is the same from the 'recovery' point of view:
>> if the 'recovery' is interrupted, before this patch, even if raid_disks
>> is the same, raid1_reshape() will still set the flag and then
>> md_check_recovery() will try to continue the recovery. I'd like to
>> keep this behaviour until it can be sure that no user will be
>> affected.
>
> But md_check_recovery() will never call raid1_reshape().
>
> md_personality::check_reshape() is only called from md_check_recovery()
> when a reshape is in progress. But raid1 is special, as
> mddev::reshape_position always equals MaxSector in raid1.
>
> By the way, the concern in the v2 patch[1] is unnecessary for the same
> reason.

Well... I just said that "reshape can continue" is not possible for
raid1, and this patch will cause recovery to be unable to continue in
some cases.

Thanks,
Kuai

> [1]: https://lore.kernel.org/linux-raid/ff93bc7a-5ae2-7a85-91c9-9662d3c5a442@huaweicloud.com/#t

^ permalink raw reply [flat|nested] 24+ messages in thread
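The disagreement is easier to see against the shape of the change itself. The fragment below is a hypothetical reconstruction of the kind of early return patch 3/3 proposes; the actual diff is not quoted in this part of the thread, so the exact condition may differ, though the mddev fields used here are standard ones.

	/*
	 * Hypothetical reconstruction of the early return debated in
	 * patch 3/3 (the real diff may differ): bail out of
	 * raid1_reshape() when the geometry is unchanged.
	 */
	if (mddev->delta_disks == 0 &&
	    mddev->new_layout == mddev->layout &&
	    mddev->new_chunk_sectors == mddev->chunk_sectors &&
	    mddev->new_level == mddev->level)
		return 0;
	/*
	 * Kuai's objection: returning here also skips the
	 * MD_RECOVERY_RECOVER / MD_RECOVERY_NEEDED kick at the end of
	 * raid1_reshape(), so a previously interrupted recovery is no
	 * longer restarted by the daemon thread.
	 */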
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31 1:03 ` Yu Kuai
@ 2023-07-31 3:48 ` Xueshi Hu
  2023-07-31 6:22 ` Yu Kuai
  0 siblings, 1 reply; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 3:48 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Mon, Jul 31, 2023 at 09:03:52AM +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/07/29 20:23, Xueshi Hu wrote:
[...]
> Well... I just said that "reshape can continue" is not possible for
> raid1, and this patch will cause recovery to be unable to continue in
> some cases.

I see. I will reread the relevant code to gain a better understanding
of "some cases".

Thanks,
Hu

> Thanks,
> Kuai

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31 3:48 ` Xueshi Hu
@ 2023-07-31 6:22 ` Yu Kuai
  2023-07-31 14:12 ` Xueshi Hu
  0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2023-07-31 6:22 UTC (permalink / raw)
  To: Xueshi Hu, Yu Kuai; +Cc: linux-raid, pmenzel, song, yukuai (C)

Hi,

On 2023/07/31 11:48, Xueshi Hu wrote:
>> Well... I just said that "reshape can continue" is not possible for
>> raid1, and this patch will cause recovery to be unable to continue in
>> some cases.
> I see. I will reread the relevant code to gain a better understanding
> of "some cases".

It's not that complex at all. The key point is whether your changes
introduce functional changes, and if so, whether the effect has been
fully evaluated.

In this case, raid1_reshape() (for all of its callers) will set
MD_RECOVERY_RECOVER and MD_RECOVERY_NEEDED to indicate that the daemon
thread must check whether recovery is needed, and if so it will start
recovery.

I agree that recovery is probably not needed in this case where
raid_disks stays the same. However, note that recovery can be
interrupted by the user, and raid1_reshape() can be triggered by the
user as well, which means the following procedure will end up behaving
differently:

1. trigger a recovery;
2. interrupt the recovery;
3. trigger a raid1_reshape();

Before this patch, the interrupted recovery will continue.

This is minor, but I can't say that no user will be affected, and I
really prefer to keep this behaviour, which means this patch should
just do some cleanup without introducing functional changes.

Thanks,
Kuai

^ permalink raw reply [flat|nested] 24+ messages in thread
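Concretely, the behaviour Kuai wants to keep comes from the tail of raid1_reshape() on its success path, the same three lines quoted earlier in the thread; they run even when raid_disks is unchanged.

	/*
	 * Tail of raid1_reshape() (success path): ask the daemon thread
	 * to re-check whether recovery is needed.  This is what lets a
	 * recovery interrupted in step 2 be picked up again after the
	 * reshape triggered in step 3.
	 */
	set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
	md_wakeup_thread(mddev->thread);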
* Re: [PATCH v3 3/3] md/raid1: check array size before reshape
  2023-07-31 6:22 ` Yu Kuai
@ 2023-07-31 14:12 ` Xueshi Hu
  0 siblings, 0 replies; 24+ messages in thread
From: Xueshi Hu @ 2023-07-31 14:12 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, pmenzel, song

On Mon, Jul 31, 2023 at 02:22:31PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/07/31 11:48, Xueshi Hu wrote:
[...]
> I agree that recovery is probably not needed in this case where
> raid_disks stays the same. However, note that recovery can be
> interrupted by the user, and raid1_reshape() can be triggered by the
> user as well, which means the following procedure will end up behaving
> differently:
>
> 1. trigger a recovery;
> 2. interrupt the recovery;
> 3. trigger a raid1_reshape();

Sincere gratitude for your detailed response. I had been consistently
misunderstanding the concept of "interrupted recovery". I now
understand your concerns.

Thanks,
Hu

> Before this patch, the interrupted recovery will continue.
>
> This is minor, but I can't say that no user will be affected, and I
> really prefer to keep this behaviour, which means this patch should
> just do some cleanup without introducing functional changes.
>
> Thanks,
> Kuai

^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2023-08-01 1:24 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-07-19 7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19 7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
2023-07-20 1:36   ` Yu Kuai
2023-07-20 1:37     ` Yu Kuai
2023-07-31 14:02       ` Xueshi Hu
2023-08-01 1:24         ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
2023-07-20 1:47   ` Yu Kuai
2023-07-19 7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2023-07-19 7:38   ` Paul Menzel
2023-07-19 11:51     ` Xueshi Hu
2023-07-20 1:28   ` Yu Kuai
2023-07-28 14:42     ` Xueshi Hu
2023-07-29 0:58       ` Yu Kuai
2023-07-29 3:29         ` Xueshi Hu
2023-07-29 3:36           ` Yu Kuai
2023-07-29 3:51             ` Yu Kuai
2023-07-29 6:16               ` Xueshi Hu
2023-07-29 7:37                 ` Yu Kuai
2023-07-29 12:23                   ` Xueshi Hu
2023-07-31 1:03                     ` Yu Kuai
2023-07-31 3:48                       ` Xueshi Hu
2023-07-31 6:22                         ` Yu Kuai
2023-07-31 14:12                           ` Xueshi Hu