From: Srinivas Eeda <srinivas.eeda@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash
Date: Mon, 24 Feb 2014 15:30:17 -0800
Message-ID: <530BD609.8080104@oracle.com>
In-Reply-To: <530B0BB5.90600@oracle.com>
Junxiao, thanks for looking into this issue. Please see my comment below.
On 02/24/2014 01:07 AM, Junxiao Bi wrote:
> Hi,
>
> On 07/19/2012 09:59 AM, Sunil Mushran wrote:
>> Different issues.
>>
>> On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao.bi@oracle.com
>> <mailto:junxiao.bi@oracle.com>> wrote:
>>
>> On 07/19/2012 12:36 AM, Sunil Mushran wrote:
>>> This bug was detected during code audit. Never seen a crash. If it
>>> does hit, then we have bigger problems. So no point posting to stable.
>>
> I have read a lot of the dlm recovery code recently, and I found that
> this bug can happen in the following scenario.
>
> node 1: migrate target node x:
> dlm_unregister_domain()
> dlm_migrate_all_locks()
> dlm_empty_lockres()
> select node x as migrate target node
> since there is a node x lock on the granted list.
> dlm_migrate_lockres()
> dlm_mark_lockres_migrating() {
> wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
> <<< node x unlock may happen here, res->granted list can be empty.
If the unlock request was sent at this point and had already been
*processed*, the lock must have been removed from the granted list. If
the request had *not yet been processed*, then the DLM_LOCK_RES_MIGRATING
flag set in dlm_lockres_release_ast would make the dlm_unlock handler
return DLM_MIGRATING to the caller (in this case node x). So I don't see
how the granted list could hold a stale lock. Am I missing something?
I do think the race you point out below exists, but I am not sure it is
caused by the scenario described above.
> dlm_lockres_release_ast(dlm, res);
> }
> dlm_send_one_lockres()
> dlm_process_recovery_data() {
>     tmpq is res->granted list and is empty.
>     list_for_each_entry(lock, tmpq, list) {
>         if (lock->ml.cookie != ml->cookie)
>             lock = NULL;
>         else
>             break;
>     }
>     lock will be invalid here.
>     if (lock->ml.node != ml->node)
>         BUG() --> crash here.
> }
>
> Thanks,
> Junxiao.
>>
>> Our customer can reproduce it. Also I saw you were assigned a
>> similar bug before, see
>> https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220, is it the
>> same BUG?
>>>
>>> On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi
>>> <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>
>>> Hi Sunil,
>>>
>>> On 07/18/2012 03:49 AM, Sunil Mushran wrote:
>>>> On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi
>>>> <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>>
>>>> In the target node of the dlm lock migration, the logic to find
>>>> the local dlm lock is wrong, it shouldn't change the loop variable
>>>> "lock" in the list_for_each_entry loop. This will cause a
>>>> NULL-pointer accessing crash.
>>>>
>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com
>>>> <mailto:junxiao.bi@oracle.com>>
>>>> Cc: stable at vger.kernel.org <mailto:stable@vger.kernel.org>
>>>> ---
>>>> fs/ocfs2/dlm/dlmrecovery.c | 12 +++++++-----
>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>>>> index 01ebfd0..0b9cc88 100644
>>>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>>>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>>> @@ -1762,6 +1762,7 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>      u8 from = O2NM_MAX_NODES;
>>>>      unsigned int added = 0;
>>>>      __be64 c;
>>>> +    int found;
>>>>
>>>>      mlog(0, "running %d locks for this lockres\n", mres->num_locks);
>>>>      for (i=0; i<mres->num_locks; i++) {
>>>> @@ -1793,22 +1794,23 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>              /* MIGRATION ONLY! */
>>>>              BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));
>>>>
>>>> +            found = 0;
>>>>              spin_lock(&res->spinlock);
>>>>              for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
>>>>                  tmpq = dlm_list_idx_to_ptr(res, j);
>>>>                  list_for_each_entry(lock, tmpq, list) {
>>>> -                    if (lock->ml.cookie != ml->cookie)
>>>> -                        lock = NULL;
>>>> -                    else
>>>> +                    if (lock->ml.cookie == ml->cookie) {
>>>> +                        found = 1;
>>>>                          break;
>>>> +                    }
>>>>                  }
>>>> -                if (lock)
>>>> +                if (found)
>>>>                      break;
>>>>              }
>>>>
>>>>              /* lock is always created locally first, and
>>>>               * destroyed locally last. it must be on the list */
>>>> -            if (!lock) {
>>>> +            if (!found) {
>>>>                  c = ml->cookie;
>>>>                  mlog(ML_ERROR, "Could not find local lock "
>>>>                       "with cookie %u:%llu, node %u, "
>>>>
>>>>
>>>>
>>>> https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47
>>>>
>>>>
>>>> We had decided to go back to list_for_each().
>>>
>>> OK, thank you. It's OK to revert it back for an introduced
>>> bug. But I think you'd better cc the stable branch.
>>>
>>>
>>
>>
>>
>
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
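
For reference, the loop problem discussed in the quoted patch can be
reproduced outside the kernel. The following is only a minimal user-space
sketch in C, not the actual dlm code: struct toy_lock, find_lock() and the
small list helpers are hypothetical stand-ins for struct dlm_lock and the
<linux/list.h> macros, written under the assumption that the list is the
usual circular, head-anchored kind. It illustrates why the loop cursor
cannot double as the "found" indicator, and how a separate flag (as in the
quoted patch) avoids that.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal user-space stand-ins for the kernel's circular list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel macro: the cursor is advanced through
 * pos->member.next, so the body must not overwrite pos. */
#define list_for_each_entry(pos, head, member) \
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, simplified lock record; not the real struct dlm_lock. */
struct toy_lock {
	uint64_t cookie;
	struct list_head list;
};

/* Return the queued lock with a matching cookie, or NULL if there is none. */
static struct toy_lock *find_lock(struct list_head *queue, uint64_t cookie)
{
	struct toy_lock *lock;
	int found = 0;

	list_for_each_entry(lock, queue, list) {
		if (lock->cookie == cookie) {
			found = 1;
			break;
		}
	}

	/* If the loop ran off the end (including the empty-list case),
	 * "lock" points at the list head reinterpreted as an entry. It is
	 * neither NULL nor valid, so only "found" can be trusted. */
	return found ? lock : NULL;
}
```

With an empty queue, or a queue that holds no matching cookie, find_lock()
returns NULL instead of handing back a cursor that is non-NULL but does not
point at a real lock.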