Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: NeilBrown <neilb@suse.de>
Cc: Song Liu <song@kernel.org>, linux-raid@vger.kernel.org
Subject: Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()
Date: Tue, 16 Aug 2022 21:53:53 +0800	[thread overview]
Message-ID: <a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev> (raw)
In-Reply-To: <166061152702.5425.9507699881285566239@noble.neil.brown.name>



On 8/16/22 8:58 AM, NeilBrown wrote:
> On Mon, 15 Aug 2022, Guoqing Jiang wrote:
>> Hi Neil,
>>
>> Sorry for later reply since I was on vacation last week.
>>
>> On 8/12/22 10:16 AM, NeilBrown wrote:
>>> [[ I noticed the e151 patch recently which seems to admit that it broke
>>>      something.  So I looked into it and came up with this.
>> I just noticed it ...
>>
>>>      It seems to make sense, but I'm not in a position to test it.
>>>      Guoqing: does it look OK to you?
>>>      - NeilBrown
>>> ]]
>>>
>>> As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
>>> beginning of __md_stop") md_cluster_stop() needs to run before the
>>> mdddev->thread is stopped.
>>> The change to make this happen was reverted in Commit: e151db8ecfb0
>>> ("md-raid: destroy the bitmap after destroying the thread") due to
>>> problems it caused.
>>>
>>> To restore correct behaviour, make md_cluster_stop() reentrant and
>>> explicitly call it at the start of __md_stop(), after first calling
>>> md_bitmap_wait_behind_writes().
>>>
>>> Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>>    drivers/md/md-cluster.c | 1 +
>>>    drivers/md/md.c         | 3 +++
>>>    2 files changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>>> index 742b2349fea3..37bf0aa4ed71 100644
>>> --- a/drivers/md/md-cluster.c
>>> +++ b/drivers/md/md-cluster.c
>>> @@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
>>>    	     test_bit(MD_CLOSING, &mddev->flags)))
>>>    		resync_bitmap(mddev);
>>>    
>>> +	mddev->cluster_info = NULL;
>> The above makes sense.
> Thanks.
>
>>>    	set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
>>>    	md_unregister_thread(&cinfo->recovery_thread);
>>>    	md_unregister_thread(&cinfo->recv_thread);
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index afaf36b2f6ab..a57b2dff64dd 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
>>>    static void __md_stop(struct mddev *mddev)
>>>    {
>>>    	struct md_personality *pers = mddev->pers;
>>> +
>>> +	md_bitmap_wait_behind_writes(mddev);
>>> +	md_cluster_stop(mddev);
>>>    	mddev_detach(mddev);
>>>    	/* Ensure ->event_work is done */
>>>    	if (mddev->event_work.func)
>> The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0,
>> and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by
>> md_bitmap_free. So the above is sort of redundant to me.
> The point is that md_cluster_stop() needs to run before mddev_detach()
> shuts down the thread.  If we don't call all of md_bitmap_destroy()
> before mddev_detach() we need to at least run md_cluster_stop(), and I
> think we need to run md_bitmap_wait_behind_writes() before that.
>
>
>> For the issue described in e151db8ecfb, looks like raid1d was running after
>> __md_stop, I am wondering if dm-raid should stop write first just like
>> normal md-raid.
> That looks like a really good idea, that should make it safe to move
> md_bitmap_destroy() back before mddev_detach().
> Would you like to post a patch to make those two changes, and include
> Mikulas Patocka, or should I?

NP, will send it later, thanks for your review.

BRs,
Guoqing

     prev parent reply	other threads:[~2022-08-16 13:54 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-12  2:16 [PATCH RFC] md: call md_cluster_stop() in __md_stop() NeilBrown
2022-08-15  6:19 ` Guoqing Jiang
2022-08-16  0:58   ` NeilBrown
2022-08-16 13:53     ` Guoqing Jiang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev \
    --to=guoqing.jiang@linux.dev \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).