ocfs2-devel.oss.oracle.com archive mirror
From: Ashish Samant <ashish.samant@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: Fix locking for res->tracking and dlm->tracking_list
Date: Fri, 22 Jun 2018 16:33:06 -0700
Message-ID: <5B2D8732.5090101@oracle.com>
In-Reply-To: <HK0PR06MB2532956AE27D9E01083E6E92D5750@HK0PR06MB2532.apcprd06.prod.outlook.com>



On 06/22/2018 02:25 AM, Changwei Ge wrote:
>
> On 2018/6/22 16:55, Joseph Qi wrote:
>> On 18/6/22 16:50, Changwei Ge wrote:
>>> On 2018/6/22 16:32, Joseph Qi wrote:
>>>> On 18/6/22 07:57, Ashish Samant wrote:
>>>>> In dlm_init_lockres() and dlm_unregister_domain() we access and modify
>>>>> res->tracking and dlm->tracking_list without holding dlm->track_lock.
>>>>> This can cause list corruptions and can end up in kernel panic.
>>>>>
>>>>> Fix this by locking res->tracking and dlm->tracking_list with
>>>>> dlm->track_lock at all places.
>>>>>
>>>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
>>>>> ---
>>>>>     fs/ocfs2/dlm/dlmdomain.c | 2 ++
>>>>>     fs/ocfs2/dlm/dlmmaster.c | 4 ++--
>>>>>     2 files changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>>>>> index 2acd58b..cfb1edd 100644
>>>>> --- a/fs/ocfs2/dlm/dlmdomain.c
>>>>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>>>>> @@ -723,6 +723,7 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>>>>     			mlog(0, "%s: more migration to do\n", dlm->name);
>>>>>     		}
>>>>>     
>>>>> +		spin_lock(&dlm->track_lock);
>>>>>     		/* This list should be empty. If not, print remaining lockres */
>>>>>     		if (!list_empty(&dlm->tracking_list)) {
>>>>>     			mlog(ML_ERROR, "Following lockres' are still on the "
>>>>> @@ -730,6 +731,7 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>>>>     			list_for_each_entry(res, &dlm->tracking_list, tracking)
>>>>>     				dlm_print_one_lock_resource(res);
>>>>>     		}
>>>>> +		spin_unlock(&dlm->track_lock);
>>>>>     
>>>> The locking order should be res->spinlock > dlm->track_lock.
>>>> Since this code only prints an error message for issue tracking, I'm
>>>> wondering if we can copy the tracking list to a local list first.

Right, for some reason I was thinking the call was to
__dlm_print_lock_resource() and not dlm_print_one_lock_resource(). The
latter takes res->spinlock, so calling it with track_lock held could
deadlock.

>>> That won't be easy, since I think the copying would also need to take
>>> the resource lock.
>> Copying the tracking list only requires taking track_lock.
>> To access the local copy we no longer have to hold it,
>> so we can then call dlm_print_one_lock_resource(), which will take
>> res->spinlock.
> I thought you wanted to copy the lock resources as well.
> Um, is it possible that the copied tracking list points to some stale
> lock resources which are released after the copy?
Yes, dropping the track_lock can still cause the same problem. However,
I am wondering: since this happens during dlm unregister domain /
cluster disconnect, after the dlm_thread has run, under what conditions
would a concurrent access to the tracking_list occur at this point?
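
To make the copy-then-print idea and the stale-pointer concern concrete,
here is a rough userspace sketch, not kernel code and not a proposed
patch. Everything in it is an assumption for illustration: pthread
mutexes stand in for the spinlocks, the struct and function names
(lockres, ctxt, track_add, snapshot_tracking, print_snapshot) are
hypothetical, and the reference count used to pin entries is my
addition, not something the patch does.

```c
/* Userspace sketch only: pthread mutexes stand in for kernel spinlocks. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_SNAPSHOT 16

struct lockres {                  /* hypothetical stand-in for a lock resource */
	pthread_mutex_t lock;     /* plays the role of res->spinlock */
	atomic_int refs;          /* pin count; assumed, see lead-in */
	const char *name;
	struct lockres *next;     /* plays the role of the res->tracking linkage */
};

struct ctxt {                     /* hypothetical stand-in for the dlm context */
	pthread_mutex_t track_lock;
	struct lockres *tracking_list;
};

/* Add an entry to the tracking list under track_lock. */
static void track_add(struct ctxt *c, struct lockres *res)
{
	pthread_mutex_lock(&c->track_lock);
	res->next = c->tracking_list;
	c->tracking_list = res;
	pthread_mutex_unlock(&c->track_lock);
}

/*
 * Phase 1: copy pointers out while holding only track_lock, and pin each
 * entry so it cannot go stale once track_lock is dropped.
 */
static size_t snapshot_tracking(struct ctxt *c, struct lockres **out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL);
	pthread_mutex_lock(&c->track_lock);
	for (struct lockres *r = c->tracking_list; r && n < cap; r = r->next) {
		atomic_fetch_add(&r->refs, 1);
		out[n++] = r;
	}
	pthread_mutex_unlock(&c->track_lock);
	return n;
}

/*
 * Phase 2: track_lock is no longer held here, so taking the per-resource
 * lock cannot invert the res-lock-then-track_lock order.
 */
static size_t print_snapshot(struct lockres **snap, size_t n)
{
	size_t printed = 0;

	for (size_t i = 0; i < n; i++) {
		pthread_mutex_lock(&snap[i]->lock);
		printf("lockres %s is still tracked\n", snap[i]->name);
		pthread_mutex_unlock(&snap[i]->lock);
		atomic_fetch_sub(&snap[i]->refs, 1);  /* drop the pin */
		printed++;
	}
	return printed;
}
```

A caller would fill a local array with snapshot_tracking() and then pass
it to print_snapshot(). Without the pin taken in phase 1, an entry could
be released between the two phases, which is the problem Changwei
points out.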

Thanks,
Ashish

>
> Thanks,
> Changwei
>
>> Thanks,
>> Joseph
>>
>>> Perhaps we can remove a lock resource from dlm->tracking_list only
>>> when the lock resource is released.
>>> That brings another benefit: we can easily find which lock resource
>>> has leaked.
>>>
>>> Thanks,
>>> Changwei
>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>>     		dlm_mark_domain_leaving(dlm);
>>>>>     		dlm_leave_domain(dlm);
>>>>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>>>>> index aaca094..826f056 100644
>>>>> --- a/fs/ocfs2/dlm/dlmmaster.c
>>>>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>>>>> @@ -584,9 +584,9 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
>>>>>     
>>>>>     	res->last_used = 0;
>>>>>     
>>>>> -	spin_lock(&dlm->spinlock);
>>>>> +	spin_lock(&dlm->track_lock);
>>>>>     	list_add_tail(&res->tracking, &dlm->tracking_list);
>>>>> -	spin_unlock(&dlm->spinlock);
>>>>> +	spin_unlock(&dlm->track_lock);
Maybe this hunk alone is enough to fix the issue.

Thanks,
Ashish


>>>>>     
>>>>>     	memset(res->lvb, 0, DLM_LVB_LEN);
>>>>>     	memset(res->refmap, 0, sizeof(res->refmap));
>>>>>
