From: Junxiao Bi <junxiao.bi@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash
Date: Tue, 25 Feb 2014 10:14:29 +0800 [thread overview]
Message-ID: <530BFC85.5030409@oracle.com> (raw)
In-Reply-To: <530BD609.8080104@oracle.com>
On 02/25/2014 07:30 AM, Srinivas Eeda wrote:
> Junxiao, thanks for looking into this issue. Please see my comment below
>
> On 02/24/2014 01:07 AM, Junxiao Bi wrote:
>> Hi,
>>
>> On 07/19/2012 09:59 AM, Sunil Mushran wrote:
>>> Different issues.
>>>
>>> On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao.bi@oracle.com
>>> <mailto:junxiao.bi@oracle.com>> wrote:
>>>
>>> On 07/19/2012 12:36 AM, Sunil Mushran wrote:
>>>> This bug was detected during code audit. Never seen a crash. If
>>>> it does hit,
>>>> then we have bigger problems. So no point posting to stable.
>>>
>> I read a lot of dlm recovery code recently, I found this bug could
>> happen at the following scenario.
>>
>> node 1: migrate target
>> node x:
>> dlm_unregister_domain()
>> dlm_migrate_all_locks()
>> dlm_empty_lockres()
>> select node x as migrate target node
>> since there is a node x lock on the granted list.
>> dlm_migrate_lockres()
>> dlm_mark_lockres_migrating() {
>> wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
>> <<< node x unlock may happen here, res->granted list can be empty.
> If the unlock request got sent at this point, and if the request was
> *processed*, lock must have been removed from the granted_list. If the
> request was *not yet processed*, then the DLM_LOCK_RES_MIGRATING set
> in dlm_lockres_release_ast would make dlm_unlock handler to return
> DLM_MIGRATING to the caller (in this case node x). So I don't see how
> granted_list could have stale lock. Am I missing something ?
>
> I do think there is such race that you pointed below exist, but I am
> not sure if it was due to the above race described.
Outside the windows from set RES_BLOCK_DIRTY flag and wait_event() to
dlm_lockres_release_ast(), granted_list can not be empty, since
wait_event will wait until dlm_thread clear the dirty flag where shuffle
list will pick another lock to the granted list. After the window,
DLM_MIGRATING flag will stop other node unlock to the granted list. So I
think this cause the empty granted list and cause the migrate target
panic. I didn't see any other harm of this since the migrate target node
will shuffle the list and send the ast message later.
Thanks,
Junxiao.
>
>> dlm_lockres_release_ast(dlm, res);
>> }
>> dlm_send_one_lockres()
>>
>> dlm_process_recovery_data() {
>> tmpq is
>> res->granted list and is empty.
>>
>> list_for_each_entry(lock, tmpq, list) {
>> if
>> (lock->ml.cookie != ml->cookie)
>> lock = NULL;
>> else
>> break;
>> }
>> lock will be
>> invalid here.
>> if (lock->ml.node
>> != ml->node)
>> BUG() -->
>> crash here.
>> }
>>
>> Thanks,
>> Junxiao.
>>>
>>> Our customer can reproduce it. Also I saw you were assigned a
>>> similar bug before, see
>>> https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220, is it the
>>> same BUG?
>>>>
>>>> On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi
>>>> <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>>
>>>> Hi Sunil,
>>>>
>>>> On 07/18/2012 03:49 AM, Sunil Mushran wrote:
>>>>> On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi
>>>>> <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>>>
>>>>> In the target node of the dlm lock migration, the
>>>>> logic to find
>>>>> the local dlm lock is wrong, it shouldn't change the
>>>>> loop variable
>>>>> "lock" in the list_for_each_entry loop. This will
>>>>> cause a NULL-pointer
>>>>> accessing crash.
>>>>>
>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com
>>>>> <mailto:junxiao.bi@oracle.com>>
>>>>> Cc: stable at vger.kernel.org <mailto:stable@vger.kernel.org>
>>>>> ---
>>>>> fs/ocfs2/dlm/dlmrecovery.c | 12 +++++++-----
>>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c
>>>>> b/fs/ocfs2/dlm/dlmrecovery.c
>>>>> index 01ebfd0..0b9cc88 100644
>>>>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>>>>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>>>> @@ -1762,6 +1762,7 @@ static int
>>>>> dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>> u8 from = O2NM_MAX_NODES;
>>>>> unsigned int added = 0;
>>>>> __be64 c;
>>>>> + int found;
>>>>>
>>>>> mlog(0, "running %d locks for this lockres\n",
>>>>> mres->num_locks);
>>>>> for (i=0; i<mres->num_locks; i++) {
>>>>> @@ -1793,22 +1794,23 @@ static int
>>>>> dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>> /* MIGRATION ONLY! */
>>>>> BUG_ON(!(mres->flags &
>>>>> DLM_MRES_MIGRATION));
>>>>>
>>>>> + found = 0;
>>>>> spin_lock(&res->spinlock);
>>>>> for (j = DLM_GRANTED_LIST; j
>>>>> <= DLM_BLOCKED_LIST; j++) {
>>>>> tmpq =
>>>>> dlm_list_idx_to_ptr(res, j);
>>>>>
>>>>> list_for_each_entry(lock, tmpq, list) {
>>>>> - if
>>>>> (lock->ml.cookie != ml->cookie)
>>>>> - lock =
>>>>> NULL;
>>>>> - else
>>>>> + if
>>>>> (lock->ml.cookie == ml->cookie) {
>>>>> + found = 1;
>>>>> break;
>>>>> + }
>>>>> }
>>>>> - if (lock)
>>>>> + if (found)
>>>>> break;
>>>>> }
>>>>>
>>>>> /* lock is always created
>>>>> locally first, and
>>>>> * destroyed locally last. it
>>>>> must be on the list */
>>>>> - if (!lock) {
>>>>> + if (!found) {
>>>>> c = ml->cookie;
>>>>> mlog(ML_ERROR, "Could
>>>>> not find local lock "
>>>>> "with
>>>>> cookie %u:%llu, node %u, "
>>>>>
>>>>>
>>>>>
>>>>> https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47
>>>>>
>>>>>
>>>>> We had decided to go back to list_for_each().
>>>>
>>>> OK, thank you. It's OK to revert it back for a introduced
>>>> bug. But I think you'd better cc stable branch.
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20140225/4bf61a97/attachment-0001.html
prev parent reply other threads:[~2014-02-25 2:14 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-17 7:10 [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash Junxiao Bi
2012-07-17 19:49 ` Sunil Mushran
2012-07-18 1:36 ` Junxiao Bi
[not found] ` <CAEeiSHXpcU6xXeDzP3nA8jGDnoit-NtZHM2A73hya_9c01Y_mg@mail.gmail.com>
[not found] ` <50076428.2040908@oracle.com>
[not found] ` <CAEeiSHV+TVsnwqnsi0u4r=ucBoddo8wD8DcqbsCn1UoA3xjtdg@mail.gmail.com>
2014-02-24 9:07 ` Junxiao Bi
2014-02-24 23:30 ` Srinivas Eeda
2014-02-25 1:54 ` Junxiao Bi
2014-02-25 2:14 ` Junxiao Bi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=530BFC85.5030409@oracle.com \
--to=junxiao.bi@oracle.com \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.