* [PATCH RFC] md: call md_cluster_stop() in __md_stop()
@ 2022-08-12 2:16 NeilBrown
2022-08-15 6:19 ` Guoqing Jiang
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2022-08-12 2:16 UTC (permalink / raw)
To: Song Liu, Guoqing Jiang; +Cc: linux-raid
[[ I noticed the e151 patch recently which seems to admit that it broke
something. So I looked into it and came up with this.
It seems to make sense, but I'm not in a position to test it.
Guoqing: does it look OK to you?
- NeilBrown
]]
As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
beginning of __md_stop") md_cluster_stop() needs to run before the
mdddev->thread is stopped.
The change to make this happen was reverted in Commit: e151db8ecfb0
("md-raid: destroy the bitmap after destroying the thread") due to
problems it caused.
To restore correct behaviour, make md_cluster_stop() reentrant and
explicitly call it at the start of __md_stop(), after first calling
md_bitmap_wait_behind_writes().
Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: NeilBrown <neilb@suse.de>
---
drivers/md/md-cluster.c | 1 +
drivers/md/md.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 742b2349fea3..37bf0aa4ed71 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
test_bit(MD_CLOSING, &mddev->flags)))
resync_bitmap(mddev);
+ mddev->cluster_info = NULL;
set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
md_unregister_thread(&cinfo->recovery_thread);
md_unregister_thread(&cinfo->recv_thread);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..a57b2dff64dd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
+
+ md_bitmap_wait_behind_writes(mddev);
+ md_cluster_stop(mddev);
mddev_detach(mddev);
/* Ensure ->event_work is done */
if (mddev->event_work.func)
--
2.36.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop() 2022-08-12 2:16 [PATCH RFC] md: call md_cluster_stop() in __md_stop() NeilBrown @ 2022-08-15 6:19 ` Guoqing Jiang 2022-08-16 0:58 ` NeilBrown 0 siblings, 1 reply; 4+ messages in thread From: Guoqing Jiang @ 2022-08-15 6:19 UTC (permalink / raw) To: NeilBrown, Song Liu; +Cc: linux-raid Hi Neil, Sorry for later reply since I was on vacation last week. On 8/12/22 10:16 AM, NeilBrown wrote: > [[ I noticed the e151 patch recently which seems to admit that it broke > something. So I looked into it and came up with this. I just noticed it ... > It seems to make sense, but I'm not in a position to test it. > Guoqing: does it look OK to you? > - NeilBrown > ]] > > As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the > beginning of __md_stop") md_cluster_stop() needs to run before the > mdddev->thread is stopped. > The change to make this happen was reverted in Commit: e151db8ecfb0 > ("md-raid: destroy the bitmap after destroying the thread") due to > problems it caused. > > To restore correct behaviour, make md_cluster_stop() reentrant and > explicitly call it at the start of __md_stop(), after first calling > md_bitmap_wait_behind_writes(). > > Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") > Signed-off-by: NeilBrown <neilb@suse.de> > --- > drivers/md/md-cluster.c | 1 + > drivers/md/md.c | 3 +++ > 2 files changed, 4 insertions(+) > > diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c > index 742b2349fea3..37bf0aa4ed71 100644 > --- a/drivers/md/md-cluster.c > +++ b/drivers/md/md-cluster.c > @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev) > test_bit(MD_CLOSING, &mddev->flags))) > resync_bitmap(mddev); > > + mddev->cluster_info = NULL; The above makes sense. > set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state); > md_unregister_thread(&cinfo->recovery_thread); > md_unregister_thread(&cinfo->recv_thread); > diff --git a/drivers/md/md.c b/drivers/md/md.c > index afaf36b2f6ab..a57b2dff64dd 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev) > static void __md_stop(struct mddev *mddev) > { > struct md_personality *pers = mddev->pers; > + > + md_bitmap_wait_behind_writes(mddev); > + md_cluster_stop(mddev); > mddev_detach(mddev); > /* Ensure ->event_work is done */ > if (mddev->event_work.func) The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0, and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by md_bitmap_free. So the above is sort of redundant to me. For the issue described in e151db8ecfb, looks like raid1d was running after __md_stop, I am wondering if dm-raid should stop write first just like normal md-raid. diff --git a/drivers/md/md.c b/drivers/md/md.c index afaf36b2f6ab..afc8d638eba0 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6260,6 +6260,7 @@ void md_stop(struct mddev *mddev) /* stop the array and free an attached data structures. * This is called from dm-raid */ + __md_stop_writes(mddev); __md_stop(mddev); bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set); Thanks, Guoqing ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop() 2022-08-15 6:19 ` Guoqing Jiang @ 2022-08-16 0:58 ` NeilBrown 2022-08-16 13:53 ` Guoqing Jiang 0 siblings, 1 reply; 4+ messages in thread From: NeilBrown @ 2022-08-16 0:58 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Song Liu, linux-raid On Mon, 15 Aug 2022, Guoqing Jiang wrote: > Hi Neil, > > Sorry for later reply since I was on vacation last week. > > On 8/12/22 10:16 AM, NeilBrown wrote: > > [[ I noticed the e151 patch recently which seems to admit that it broke > > something. So I looked into it and came up with this. > > I just noticed it ... > > > It seems to make sense, but I'm not in a position to test it. > > Guoqing: does it look OK to you? > > - NeilBrown > > ]] > > > > As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the > > beginning of __md_stop") md_cluster_stop() needs to run before the > > mdddev->thread is stopped. > > The change to make this happen was reverted in Commit: e151db8ecfb0 > > ("md-raid: destroy the bitmap after destroying the thread") due to > > problems it caused. > > > > To restore correct behaviour, make md_cluster_stop() reentrant and > > explicitly call it at the start of __md_stop(), after first calling > > md_bitmap_wait_behind_writes(). > > > > Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > drivers/md/md-cluster.c | 1 + > > drivers/md/md.c | 3 +++ > > 2 files changed, 4 insertions(+) > > > > diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c > > index 742b2349fea3..37bf0aa4ed71 100644 > > --- a/drivers/md/md-cluster.c > > +++ b/drivers/md/md-cluster.c > > @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev) > > test_bit(MD_CLOSING, &mddev->flags))) > > resync_bitmap(mddev); > > > > + mddev->cluster_info = NULL; > > The above makes sense. Thanks. > > > set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state); > > md_unregister_thread(&cinfo->recovery_thread); > > md_unregister_thread(&cinfo->recv_thread); > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index afaf36b2f6ab..a57b2dff64dd 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev) > > static void __md_stop(struct mddev *mddev) > > { > > struct md_personality *pers = mddev->pers; > > + > > + md_bitmap_wait_behind_writes(mddev); > > + md_cluster_stop(mddev); > > mddev_detach(mddev); > > /* Ensure ->event_work is done */ > > if (mddev->event_work.func) > > The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0, > and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by > md_bitmap_free. So the above is sort of redundant to me. The point is that md_cluster_stop() needs to run before mddev_detach() shuts down the thread. If we don't call all of md_bitmap_destroy() before mddev_detach() we need to at least run md_cluster_stop(), and I think we need to run md_bitmap_wait_behind_writes() before that. > > For the issue described in e151db8ecfb, looks like raid1d was running after > __md_stop, I am wondering if dm-raid should stop write first just like > normal > md-raid. That looks like a really good idea, that should make it safe to move md_bitmap_destroy() back before mddev_detach(). Would you like to post a patch to make those two changes, and include Mikulas Patocka, or should I? Thanks, NeilBrown > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index afaf36b2f6ab..afc8d638eba0 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -6260,6 +6260,7 @@ void md_stop(struct mddev *mddev) > /* stop the array and free an attached data structures. > * This is called from dm-raid > */ > + __md_stop_writes(mddev); > __md_stop(mddev); > bioset_exit(&mddev->bio_set); > bioset_exit(&mddev->sync_set); > > Thanks, > Guoqing > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop() 2022-08-16 0:58 ` NeilBrown @ 2022-08-16 13:53 ` Guoqing Jiang 0 siblings, 0 replies; 4+ messages in thread From: Guoqing Jiang @ 2022-08-16 13:53 UTC (permalink / raw) To: NeilBrown; +Cc: Song Liu, linux-raid On 8/16/22 8:58 AM, NeilBrown wrote: > On Mon, 15 Aug 2022, Guoqing Jiang wrote: >> Hi Neil, >> >> Sorry for later reply since I was on vacation last week. >> >> On 8/12/22 10:16 AM, NeilBrown wrote: >>> [[ I noticed the e151 patch recently which seems to admit that it broke >>> something. So I looked into it and came up with this. >> I just noticed it ... >> >>> It seems to make sense, but I'm not in a position to test it. >>> Guoqing: does it look OK to you? >>> - NeilBrown >>> ]] >>> >>> As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the >>> beginning of __md_stop") md_cluster_stop() needs to run before the >>> mdddev->thread is stopped. >>> The change to make this happen was reverted in Commit: e151db8ecfb0 >>> ("md-raid: destroy the bitmap after destroying the thread") due to >>> problems it caused. >>> >>> To restore correct behaviour, make md_cluster_stop() reentrant and >>> explicitly call it at the start of __md_stop(), after first calling >>> md_bitmap_wait_behind_writes(). >>> >>> Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") >>> Signed-off-by: NeilBrown <neilb@suse.de> >>> --- >>> drivers/md/md-cluster.c | 1 + >>> drivers/md/md.c | 3 +++ >>> 2 files changed, 4 insertions(+) >>> >>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c >>> index 742b2349fea3..37bf0aa4ed71 100644 >>> --- a/drivers/md/md-cluster.c >>> +++ b/drivers/md/md-cluster.c >>> @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev) >>> test_bit(MD_CLOSING, &mddev->flags))) >>> resync_bitmap(mddev); >>> >>> + mddev->cluster_info = NULL; >> The above makes sense. > Thanks. > >>> set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state); >>> md_unregister_thread(&cinfo->recovery_thread); >>> md_unregister_thread(&cinfo->recv_thread); >>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>> index afaf36b2f6ab..a57b2dff64dd 100644 >>> --- a/drivers/md/md.c >>> +++ b/drivers/md/md.c >>> @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev) >>> static void __md_stop(struct mddev *mddev) >>> { >>> struct md_personality *pers = mddev->pers; >>> + >>> + md_bitmap_wait_behind_writes(mddev); >>> + md_cluster_stop(mddev); >>> mddev_detach(mddev); >>> /* Ensure ->event_work is done */ >>> if (mddev->event_work.func) >> The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0, >> and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by >> md_bitmap_free. So the above is sort of redundant to me. > The point is that md_cluster_stop() needs to run before mddev_detach() > shuts down the thread. If we don't call all of md_bitmap_destroy() > before mddev_detach() we need to at least run md_cluster_stop(), and I > think we need to run md_bitmap_wait_behind_writes() before that. > > >> For the issue described in e151db8ecfb, looks like raid1d was running after >> __md_stop, I am wondering if dm-raid should stop write first just like >> normal md-raid. > That looks like a really good idea, that should make it safe to move > md_bitmap_destroy() back before mddev_detach(). > Would you like to post a patch to make those two changes, and include > Mikulas Patocka, or should I? NP, will send it later, thanks for your review. BRs, Guoqing ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-08-16 13:54 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-12 2:16 [PATCH RFC] md: call md_cluster_stop() in __md_stop() NeilBrown 2022-08-15 6:19 ` Guoqing Jiang 2022-08-16 0:58 ` NeilBrown 2022-08-16 13:53 ` Guoqing Jiang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).