All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srinivas Eeda <srinivas.eeda@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash
Date: Mon, 24 Feb 2014 15:30:17 -0800	[thread overview]
Message-ID: <530BD609.8080104@oracle.com> (raw)
In-Reply-To: <530B0BB5.90600@oracle.com>

Junxiao, thanks for looking into this issue. Please see my comment below

On 02/24/2014 01:07 AM, Junxiao Bi wrote:
> Hi,
>
> On 07/19/2012 09:59 AM, Sunil Mushran wrote:
>> Different issues.
>>
>> On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao.bi@oracle.com 
>> <mailto:junxiao.bi@oracle.com>> wrote:
>>
>>     On 07/19/2012 12:36 AM, Sunil Mushran wrote:
>>>     This bug was detected during code audit. Never seen a crash. If
>>>     it does hit,
>>>     then we have bigger problems. So no point posting to stable.
>>
> I read a lot of dlm recovery code recently, I found this bug could 
> happen at the following scenario.
>
> node 1:                                             migrate target node x:
> dlm_unregister_domain()
>  dlm_migrate_all_locks()
>   dlm_empty_lockres()
>    select node x as migrate target node
>    since there is a node x lock on the granted list.
>    dlm_migrate_lockres()
>     dlm_mark_lockres_migrating() {
>      wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
> <<< node x unlock may happen here, res->granted list can be empty.
If the unlock request got sent at this point, and if the request was 
*processed*, lock must have been removed from the granted_list. If the 
request was *not yet processed*, then the DLM_LOCK_RES_MIGRATING set in 
dlm_lockres_release_ast would make dlm_unlock handler to return 
DLM_MIGRATING to the caller (in this case node x). So I don't see how 
granted_list could have stale lock. Am I missing something ?

I do think there is such race that you pointed below exist, but I am not 
sure if it was due to the above race described.

> dlm_lockres_release_ast(dlm, res);
>     }
>     dlm_send_one_lockres()
> dlm_process_recovery_data() {
>                                                     tmpq is 
> res->granted list and is empty.
> list_for_each_entry(lock, tmpq, list) {
>                                                      if 
> (lock->ml.cookie != ml->cookie)
>                                                       lock = NULL;
>                                                      else
>                                                       break;
>                                                     }
>                                                     lock will be 
> invalid here.
>                                                     if (lock->ml.node 
> != ml->node)
>                                                         BUG() --> 
> crash here.
>                                                    }
>
> Thanks,
> Junxiao.
>>
>>     Our customer can reproduce it. Also I saw you were assigned a
>>     similar bug before, see
>>     https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220, is it the
>>     same BUG?
>>>
>>>     On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi
>>>     <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>
>>>         Hi Sunil,
>>>
>>>         On 07/18/2012 03:49 AM, Sunil Mushran wrote:
>>>>         On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi
>>>>         <junxiao.bi at oracle.com <mailto:junxiao.bi@oracle.com>> wrote:
>>>>
>>>>             In the target node of the dlm lock migration, the logic
>>>>             to find
>>>>             the local dlm lock is wrong, it shouldn't change the
>>>>             loop variable
>>>>             "lock" in the list_for_each_entry loop. This will cause
>>>>             a NULL-pointer
>>>>             accessing crash.
>>>>
>>>>             Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com
>>>>             <mailto:junxiao.bi@oracle.com>>
>>>>             Cc: stable at vger.kernel.org <mailto:stable@vger.kernel.org>
>>>>             ---
>>>>              fs/ocfs2/dlm/dlmrecovery.c |   12 +++++++-----
>>>>              1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>>             diff --git a/fs/ocfs2/dlm/dlmrecovery.c
>>>>             b/fs/ocfs2/dlm/dlmrecovery.c
>>>>             index 01ebfd0..0b9cc88 100644
>>>>             --- a/fs/ocfs2/dlm/dlmrecovery.c
>>>>             +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>>>             @@ -1762,6 +1762,7 @@ static int
>>>>             dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>                     u8 from = O2NM_MAX_NODES;
>>>>                     unsigned int added = 0;
>>>>                     __be64 c;
>>>>             +       int found;
>>>>
>>>>                     mlog(0, "running %d locks for this lockres\n",
>>>>             mres->num_locks);
>>>>                     for (i=0; i<mres->num_locks; i++) {
>>>>             @@ -1793,22 +1794,23 @@ static int
>>>>             dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>>>                                     /* MIGRATION ONLY! */
>>>>             BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));
>>>>
>>>>             +                       found = 0;
>>>>             spin_lock(&res->spinlock);
>>>>                                     for (j = DLM_GRANTED_LIST; j <=
>>>>             DLM_BLOCKED_LIST; j++) {
>>>>                                             tmpq =
>>>>             dlm_list_idx_to_ptr(res, j);
>>>>             list_for_each_entry(lock, tmpq, list) {
>>>>             -                                       if
>>>>             (lock->ml.cookie != ml->cookie)
>>>>             -       lock = NULL;
>>>>             - else
>>>>             +                                       if
>>>>             (lock->ml.cookie == ml->cookie) {
>>>>             +       found = 1;
>>>>                   break;
>>>>             +                                       }
>>>>                                             }
>>>>             -                               if (lock)
>>>>             +                               if (found)
>>>>             break;
>>>>                                     }
>>>>
>>>>                                     /* lock is always created
>>>>             locally first, and
>>>>                                      * destroyed locally last.  it
>>>>             must be on the list */
>>>>             -                       if (!lock) {
>>>>             +                       if (!found) {
>>>>                                             c = ml->cookie;
>>>>             mlog(ML_ERROR, "Could not find local lock "
>>>>                  "with cookie %u:%llu, node %u, "
>>>>
>>>>
>>>>
>>>>         https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47
>>>>
>>>>
>>>>         We had decided to go back to list_for_each().
>>>
>>>         OK, thank you. It's OK to revert it back for a introduced
>>>         bug. But I think you'd better cc stable branch.
>>>
>>>
>>
>>
>>
>
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20140224/b704f9e0/attachment-0001.html 

  reply	other threads:[~2014-02-24 23:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-17  7:10 [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash Junxiao Bi
2012-07-17 19:49 ` Sunil Mushran
2012-07-18  1:36   ` Junxiao Bi
     [not found]     ` <CAEeiSHXpcU6xXeDzP3nA8jGDnoit-NtZHM2A73hya_9c01Y_mg@mail.gmail.com>
     [not found]       ` <50076428.2040908@oracle.com>
     [not found]         ` <CAEeiSHV+TVsnwqnsi0u4r=ucBoddo8wD8DcqbsCn1UoA3xjtdg@mail.gmail.com>
2014-02-24  9:07           ` Junxiao Bi
2014-02-24 23:30             ` Srinivas Eeda [this message]
2014-02-25  1:54               ` Junxiao Bi
2014-02-25  2:14               ` Junxiao Bi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=530BD609.8080104@oracle.com \
    --to=srinivas.eeda@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.