From: Ashish Samant <ashish.samant@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: Fix locking for res->tracking and dlm->tracking_list
Date: Mon, 25 Jun 2018 11:28:29 -0700
Message-ID: <5B31344D.4040406@oracle.com>
In-Reply-To: <HK0PR06MB2532496BA15D73BBE5EA3B23D54A0@HK0PR06MB2532.apcprd06.prod.outlook.com>
On 06/24/2018 06:07 PM, Changwei Ge wrote:
>
> On 2018/6/23 7:33, Ashish Samant wrote:
>>
>> On 06/22/2018 02:25 AM, Changwei Ge wrote:
>>> On 2018/6/22 16:55, Joseph Qi wrote:
>>>> On 18/6/22 16:50, Changwei Ge wrote:
>>>>> On 2018/6/22 16:32, Joseph Qi wrote:
>>>>>> On 18/6/22 07:57, Ashish Samant wrote:
>>>>>>> In dlm_init_lockres() and dlm_unregister_domain() we access and modify
>>>>>>> res->tracking and dlm->tracking_list without holding dlm->track_lock.
>>>>>>> This can cause list corruption and can end in a kernel panic.
>>>>>>>
>>>>>>> Fix this by locking res->tracking and dlm->tracking_list with
>>>>>>> dlm->track_lock at all places.
>>>>>>>
>>>>>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
>>>>>>> ---
>>>>>>> fs/ocfs2/dlm/dlmdomain.c | 2 ++
>>>>>>> fs/ocfs2/dlm/dlmmaster.c | 4 ++--
>>>>>>> 2 files changed, 4 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>>>>>>> index 2acd58b..cfb1edd 100644
>>>>>>> --- a/fs/ocfs2/dlm/dlmdomain.c
>>>>>>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>>>>>>> @@ -723,6 +723,7 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>>>>>> mlog(0, "%s: more migration to do\n", dlm->name);
>>>>>>> }
>>>>>>> + spin_lock(&dlm->track_lock);
>>>>>>> /* This list should be empty. If not, print remaining lockres */
>>>>>>> if (!list_empty(&dlm->tracking_list)) {
>>>>>>> mlog(ML_ERROR, "Following lockres' are still on the "
>>>>>>> @@ -730,6 +731,7 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>>>>>> list_for_each_entry(res, &dlm->tracking_list, tracking)
>>>>>>> dlm_print_one_lock_resource(res);
>>>>>>> }
>>>>>>> + spin_unlock(&dlm->track_lock);
>>>>>> The locking order should be res->spinlock > dlm->track_lock.
>>>>>> Since we only want to print an error message here for issue tracking,
>>>>>> I'm wondering if we can copy the tracking list to a local list first.
>> Right, for some reason, I was thinking the call is to
>> __dlm_print_lock_resource() and not dlm_print_one_lock_resource(). So
>> this could deadlock.
>>
>>>>> That won't be easy, since I think the copying should also take the
>>>>> resource lock.
>>>> Copying the tracking list only requires taking track_lock.
>>>> To access the local tracking list we don't have to take it any more,
>>>> and then we can call dlm_print_one_lock_resource(), which will take
>>>> res->spinlock.
>>> I thought you wanted to copy the lock resources as well.
>>> Um, is it possible that the copied tracking list points to some stale lock
>>> resources which are released after the copy?
>> Yes, dropping the track_lock can still cause the same problem. However,
>> I am wondering, since this is during dlm unregister domain/cluster
>> disconnect after the dlm_thread has run, under what conditions would a
>> concurrent access to the tracking_list occur at this point?
> I think your assumption stands; we don't have to worry much about
> concurrent access to the ::tracking_list. DLM should make sure that
> after migrating all lock resources, no new lock resources are created.
>
>> Thanks,
>> Ashish
>>
>>> Thanks,
>>> Changwei
>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>> Perhaps we can remove a lock resource from dlm->track_list only when
>>>>> the lock resource is released.
>>>>> That brings another benefit: we can easily find which lock resource
>>>>> is leaked.
>>>>>
>>>>> Thanks,
>>>>> Changwei
>>>>>
>>>>>> Thanks,
>>>>>> Joseph
>>>>>>
>>>>>>> dlm_mark_domain_leaving(dlm);
>>>>>>> dlm_leave_domain(dlm);
>>>>>>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>>>>>>> index aaca094..826f056 100644
>>>>>>> --- a/fs/ocfs2/dlm/dlmmaster.c
>>>>>>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>>>>>>> @@ -584,9 +584,9 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
>>>>>>> res->last_used = 0;
>>>>>>> - spin_lock(&dlm->spinlock);
>>>>>>> + spin_lock(&dlm->track_lock);
>>>>>>> list_add_tail(&res->tracking, &dlm->tracking_list);
>>>>>>> - spin_unlock(&dlm->spinlock);
>>>>>>> + spin_unlock(&dlm->track_lock);
>> Maybe we only need this to fix the issue.
> Agree. Could you resend your patch?
Sent V2.
Thanks,
Ashish
>
> Thanks,
> Changwei
>
>> Thanks,
>> Ashish
>>
>>
>>>>>>> memset(res->lvb, 0, DLM_LVB_LEN);
>>>>>>> memset(res->refmap, 0, sizeof(res->refmap));
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Ocfs2-devel mailing list
>>>>>> Ocfs2-devel at oss.oracle.com
>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
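
For readers following the locking discussion above, here is a minimal userspace sketch of the two ideas in the thread: adding to the tracking list only under track_lock (the dlm_init_lockres() hunk), and Joseph's suggestion of copying the list under track_lock and then visiting each entry with only its own lock held. It is not the kernel code. `struct res`, `struct ctxt`, `track_add()`, `visit_tracked()` and `MAX_RES` are hypothetical stand-ins, and pthread mutexes stand in for the kernel spinlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_RES 8   /* hypothetical snapshot capacity, for illustration only */

/* Hypothetical stand-in for struct dlm_lock_resource. */
struct res {
    pthread_mutex_t spinlock;   /* models res->spinlock */
    int id;
    struct res *next;           /* models the res->tracking linkage */
};

/* Hypothetical stand-in for struct dlm_ctxt. */
struct ctxt {
    pthread_mutex_t track_lock; /* models dlm->track_lock */
    struct res *head, *tail;    /* models dlm->tracking_list */
};

static void ctxt_init(struct ctxt *c)
{
    pthread_mutex_init(&c->track_lock, NULL);
    c->head = c->tail = NULL;
}

/* Like the dlm_init_lockres() hunk: modify the list only under track_lock. */
static void track_add(struct ctxt *c, struct res *r, int id)
{
    assert(c != NULL && r != NULL);
    pthread_mutex_init(&r->spinlock, NULL);
    r->id = id;
    r->next = NULL;

    pthread_mutex_lock(&c->track_lock);
    if (c->tail)
        c->tail->next = r;
    else
        c->head = r;
    c->tail = r;
    pthread_mutex_unlock(&c->track_lock);
}

/*
 * Copy the entry pointers under track_lock, drop it, then take each
 * entry's own lock. The two locks are never held together, so the
 * res->spinlock > track_lock ordering cannot be inverted here.
 * Returns the number of entries visited (at most MAX_RES and max).
 */
static int visit_tracked(struct ctxt *c, int *ids_out, int max)
{
    struct res *snap[MAX_RES];
    int n = 0;

    pthread_mutex_lock(&c->track_lock);
    for (struct res *r = c->head; r && n < MAX_RES && n < max; r = r->next)
        snap[n++] = r;
    pthread_mutex_unlock(&c->track_lock);

    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&snap[i]->spinlock);
        ids_out[i] = snap[i]->id;
        pthread_mutex_unlock(&snap[i]->spinlock);
    }
    return n;
}
```

Note that the snapshot does not pin the entries. As Changwei pointed out, a resource released after the copy would leave a stale pointer, so this pattern is only safe if nothing can free entries concurrently or if each copied entry is reference-counted first.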