All of lore.kernel.org
 help / color / mirror / Atom feed
From: piaojun <piaojun@huawei.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lockres
Date: Thu, 2 Nov 2017 09:42:47 +0800	[thread overview]
Message-ID: <59FA7817.6010200@huawei.com> (raw)
In-Reply-To: <63ADC13FD55D6546B7DECE290D39E373CED736CA@H3CMLB14-EX.srv.huawei-3com.com>

Hi Changwei,

I had tried a solution like yours before, but failed to prevent the
race just by 'dlm_state' and the existed variable as
'DLM_CTXT_IN_SHUTDOWN' contains many status. so I think we need
introduce a 'migrate_done' to solve that problem.

thanks,
Jun

On 2017/11/1 17:00, Changwei Ge wrote:
> Hi Jun,
> 
> On 2017/11/1 16:46, piaojun wrote:
>> Hi Changwei,
>>
>> I do think we need follow the principle that use 'dlm_domain_lock' to
>> protect 'dlm_state' as the NOTE says in 'dlm_ctxt':
>> /* NOTE: Next three are protected by dlm_domain_lock */
>>
>> deadnode won't be cleared from 'dlm->domain_map' if return from
>> __dlm_hb_node_down(), and NodeA will retry migrating to NodeB forever
>> if only NodeA and NodeB in domain. I suggest more testing needed in
>> your solution.
> 
> I agree, however, my patch is just a draft to indicate my comments.
> 
> Perhaps we can figure out a better way to solve this, as your patch 
> can't clear DLM RECOVERING flag on lock resource. I am not sure if it is 
> reasonable, I suppose this may violate ocfs2/dlm design philosophy.
> 
> Thanks,
> Changwei
> 

if res is marked as DLM RECOVERING, migrating process will wait for
recoverying done. and DLM RECOVERING will be cleared after recoverying.

>>
>> thanks,
>> Jun
>>
>> On 2017/11/1 16:11, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> Thanks for reviewing.
>>> I don't think we have to worry about misusing *dlm_domain_lock* and
>>> *dlm::spinlock*. I admit my change may look a little different from most
>>> of other code snippets where using these two spin locks. But our purpose
>>> is to close the race between __dlm_hb_node_down and
>>> dlm_unregister_domain, right?  And my change meets that. :-)
>>>
>>> I suppose we can do it in a flexible way.
>>>
>>> Thanks,
>>> Changwei
>>>
>>>
>>> On 2017/11/1 15:57, piaojun wrote:
>>>> Hi Changwei,
>>>>
>>>> thanks for reviewing, and I think waiting for recoverying done before
>>>> migrating seems another solution, but I wonder if new problems will be
>>>> invoked as following comments.
>>>>
>>>> thanks,
>>>> Jun
>>>>
>>>> On 2017/11/1 15:13, Changwei Ge wrote:
>>>>> Hi Jun,
>>>>>
>>>>> I probably get your point.
>>>>>
>>>>> You mean that dlm finds no lock resource to be migrated and no more lock
>>>>> resource is managed by its hash table. After that a node dies all of a
>>>>> sudden and the dead node is put into dlm's recovery map, right?
>>>> that is it.
>>>>> Furthermore, a lock resource is migrated from other nodes and local node
>>>>> has already asserted master to them.
>>>>>
>>>>> If so, I want to suggest a easier way to solve it.
>>>>> We don't have to add a new flag to dlm structure, just leverage existed
>>>>> dlm status and bitmap.
>>>>> It will bring a bonus - no lock resource will be marked with RECOVERING,
>>>>> it's a safer way, I suppose.
>>>>>
>>>>> Please take a review.
>>>>>
>>>>> Thanks,
>>>>> Changwei
>>>>>
>>>>>
>>>>> Subject: [PATCH] ocfs2/dlm: a node can't be involved in recovery if it
>>>>> is being shutdown
>>>>>
>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>> ---
>>>>>     fs/ocfs2/dlm/dlmdomain.c   | 4 ++++
>>>>>     fs/ocfs2/dlm/dlmrecovery.c | 3 +++
>>>>>     2 files changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>>>>> index a2b19fbdcf46..5e9283e509a4 100644
>>>>> --- a/fs/ocfs2/dlm/dlmdomain.c
>>>>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>>>>> @@ -707,11 +707,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>>>>     		 * want new domain joins to communicate with us at
>>>>>     		 * least until we've completed migration of our
>>>>>     		 * resources. */
>>>>> +		spin_lock(&dlm->spinlock);
>>>>>     		dlm->dlm_state = DLM_CTXT_IN_SHUTDOWN;
>>>>> +		spin_unlock(&dlm->spinlock);
>>>> I guess there will be misuse of 'dlm->spinlock' and dlm_domain_lock.
>>>>>     		leave = 1;
>>>>>     	}
>>>>>     	spin_unlock(&dlm_domain_lock);
>>>>>
>>>>> +	dlm_wait_for_recovery(dlm);
>>>>> +
>>>>>     	if (leave) {
>>>>>     		mlog(0, "shutting down domain %s\n", dlm->name);
>>>>>     		dlm_begin_exit_domain(dlm);
>>>>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>>>>> index 74407c6dd592..764c95b2b35c 100644
>>>>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>>>>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>>>> @@ -2441,6 +2441,9 @@ static void __dlm_hb_node_down(struct dlm_ctxt
>>>>> *dlm, int idx)
>>>>>     {
>>>>>     	assert_spin_locked(&dlm->spinlock);
>>>>>
>>>>> +	if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN)
>>>>> +		return;
>>>>> +
>>>> 'dlm->dlm_state' probably need be to protected by 'dlm_domain_lock'.
>>>> and I wander if there is more work to be done when in
>>>> 'DLM_CTXT_IN_SHUTDOWN'?
>>>>>     	if (dlm->reco.new_master == idx) {
>>>>>     		mlog(0, "%s: recovery master %d just died\n",
>>>>>     		     dlm->name, idx);
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 

  reply	other threads:[~2017-11-02  1:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-01  1:14 [Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lockres piaojun
2017-11-01  2:47 ` Changwei Ge
2017-11-01  5:52   ` piaojun
2017-11-01  7:13     ` Changwei Ge
2017-11-01  7:56       ` piaojun
2017-11-01  8:11         ` Changwei Ge
2017-11-01  8:45           ` piaojun
2017-11-01  9:00             ` Changwei Ge
2017-11-02  1:42               ` piaojun [this message]
2017-11-02  1:56                 ` Changwei Ge
2017-11-03  1:01                   ` piaojun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59FA7817.6010200@huawei.com \
    --to=piaojun@huawei.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.