From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] dlm stress test hangs OCFS2
Date: Mon, 21 Sep 2009 10:31:45 -0700
Message-ID: <4AB7B881.2040608@oracle.com>
In-Reply-To: <4AB7B710.3040801@oracle.com>

Could you please file a bug in bugzilla (oss.oracle.com/bugzilla) and
attach the logs to it?

Sunil Mushran wrote:
> The patch does not contain a fix, only tracing. We may have to disable
> a printk for the 2-node cluster to reproduce the hang.
>
> For the BUG, can I have the full logs: the oops trace and the tracing
> from all nodes?
>
> Thanks
> Sunil
>
> Coly Li wrote:
>
>> Hi Sunil,
>>
>> I tried this patch on a 2-node cluster and it works; no blocking observed so far.
>> Then I ran it on a 4-node cluster, running make_panic on every node simultaneously,
>> and the BUG() inside ocfs2_prepare_downconvert() triggered (at line 3224) on one of
>> the nodes (I observed the oops on node x4):
>>
>> 3214 static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
>> 3215                                               int new_level)
>> 3216 {
>> 3217         assert_spin_locked(&lockres->l_lock);
>> 3218
>> 3219         BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
>> 3220
>> 3221         if (lockres->l_level <= new_level) {
>> 3222                 mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
>> 3223                      lockres->l_level, new_level);
>> 3224                 BUG();
>> 3225         }
>> 3226
>> 3227         mlog(ML_NOTICE, "lock %s, new_level = %d, l_blocking = %d\n",
>> 3228              lockres->l_name, new_level, lockres->l_blocking);
>> 3229
>> 3230         lockres->l_action = OCFS2_AST_DOWNCONVERT;
>> 3231         lockres->l_requested = new_level;
>> 3232         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
>> 3233         return lockres_set_pending(lockres);
>> 3234 }
>>
>> I am trying to understand what you did now :-)
>>
>> Sunil Mushran wrote:
>>
>>> So originally my thinking was that the dc thread was not getting kicked.
>>> That is not the case. The lock is getting downconverted, but it is getting
>>> upconverted shortly thereafter. This could just be the case in which dlmglue
>>> is slow to increment the holders to block the dc thread from downconverting
>>> the lock. The snippet shows that the BAST is received 16 usecs after the
>>> upconvert.
>>>
>>> Coly, I have another patch. Pop out the older patch before applying this one.
>>> http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch
>>>
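
The two checks in the quoted ocfs2_prepare_downconvert() can be modeled in
user space. The sketch below is not kernel code: struct lockres_model and
both helper names are hypothetical, and the DLM_LOCK_* values are assumed to
match include/linux/dlmconstants.h. It only illustrates that a downconvert
request is valid when the target level is strictly lower than the level
currently held, so a request that arrives when the lock is already at (or
below) that level would reach the BUG() at line 3224. Whether that is the
sequence that occurred on node x4 is not established in this thread.

```c
#include <stdbool.h>

/* Lock levels; values assumed to match include/linux/dlmconstants.h. */
enum { DLM_LOCK_NL = 0, DLM_LOCK_PR = 3, DLM_LOCK_EX = 5 };

/* Hypothetical user-space stand-in for the two fields the checks read. */
struct lockres_model {
	int l_level;    /* level currently held by this node */
	int l_blocking; /* level requested by the node being blocked */
};

/* Mirrors BUG_ON(lockres->l_blocking <= DLM_LOCK_NL):
 * returns true when some request above NL is actually being blocked. */
static bool has_blocker(const struct lockres_model *res)
{
	return res->l_blocking > DLM_LOCK_NL;
}

/* Mirrors the "l_level <= new_level" test that leads to BUG():
 * returns true only when new_level is strictly lower than l_level. */
static bool downconvert_is_valid(const struct lockres_model *res, int new_level)
{
	return res->l_level > new_level;
}
```

For example, a lock held at DLM_LOCK_EX passes the check for a downconvert
to DLM_LOCK_PR, while a lock already at DLM_LOCK_PR fails it for a
"downconvert" to DLM_LOCK_PR; the latter is the condition the BUG() guards
against.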

