From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] dlm stress test hangs OCFS2
Date: Tue, 15 Sep 2009 17:49:15 -0700
Message-ID: <4AB0360B.4050602@oracle.com>
In-Reply-To: <4AAF3E24.9050207@suse.de>

So originally my thinking was that the dc thread was not getting kicked.
That is not the case. The lock is getting downconverted, but it is getting
upconverted again shortly thereafter. This could simply be a case where dlmglue
is too slow to increment the holders count that would keep the dc thread from
downconverting the lock. The snippet shows that the BAST is received 16 usecs
after the upconvert.

Coly, I have another patch. Pop out the older patch before applying this 
one.
http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch

BAST:
[368.807757] (2572,dlm_astd,0):ocfs2_blocking_ast:1025 BAST fired for 
lockres M0000000000000000085e0200000000, blocking 5, level 3 type Meta
[368.807767] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3839 lockres 
M0000000000000000085e0200000000 blocked.
[368.807774] (2571,ocfs2dc,0):ocfs2_prepare_downconvert:3232 lock 
M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[368.807779] (2571,ocfs2dc,0):ocfs2_downconvert_lock:3252 lock 
M0000000000000000085e0200000000, level 3 => 0
[368.807799] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3863 lockres 
M0000000000000000085e0200000000, requeue = no.

Downconvert AST:
[368.807806] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock 
M0000000000000000085e0200000000, action 3, unlock 0

Upconvert AST:
[369.007930] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock 
M0000000000000000085e0200000000, action 2, unlock 0

BAST:
[369.007946] (2572,dlm_astd,0):ocfs2_blocking_ast:1025 BAST fired for 
lockres M0000000000000000085e0200000000, blocking 5, level 3 type Meta
[369.007956] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3839 lockres 
M0000000000000000085e0200000000 blocked.
[369.007962] (2571,ocfs2dc,0):ocfs2_prepare_downconvert:3232 lock 
M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[369.007967] (2571,ocfs2dc,0):ocfs2_downconvert_lock:3252 lock 
M0000000000000000085e0200000000, level 3 => 0
[369.007987] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3863 lockres 
M0000000000000000085e0200000000, requeue = no.

Downconvert AST:
[369.007994] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock 
M0000000000000000085e0200000000, action 3, unlock 0

Upconvert AST:
[369.208048] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock 
M0000000000000000085e0200000000, action 2, unlock 0


Coly Li wrote:
>
> Sunil Mushran Wrote:
>> The full trace is available here.
>> http://oss.oracle.com/~smushran/calltrace_x1
>>
>> So one sees the following block repeated. It shows that the lock is
>> being downconverted from EX to NL but also upconverted presumably to EX.
>>
>>
>> Coly, can you map the pids to the process names?
>>
>
> Hi Sunil,
>
> In the attached trace info, I added current->comm after the pid.
>
> Here are the steps I used to reproduce the hang:
> - This time I ran only 1 make_panic process on each node.
> - I ran make_panic on x1 first, then on x2.
> - On node x1, after 3-4 files had been created, I started the make_panic script on node x2.
> - On node x2, make_panic blocked immediately without creating any files. On node x1,
> make_panic blocked too, after creating 13 files.
> - After several minutes it was still blocked, so I stopped gathering trace info.
>
> Please check the attachment. Thanks.
>

Thread overview: 24+ messages
2009-08-18 19:26 [Ocfs2-devel] dlm stress test hangs OCFS2 Coly Li
2009-08-18 19:34 ` David Teigland
2009-08-19  3:06 ` Sunil Mushran
2009-09-02 17:11   ` Coly Li
2009-09-02 22:01     ` Sunil Mushran
2009-09-03 16:24       ` Coly Li
2009-09-03 16:24         ` Sunil Mushran
2009-09-09 20:07           ` Coly Li
2009-09-09 21:42             ` Sunil Mushran
2009-09-10  5:38               ` Coly Li
2009-09-11 22:57                 ` Sunil Mushran
2009-09-13 14:08                   ` Coly Li
2009-09-14 19:30                     ` Sunil Mushran
2009-09-14 20:23                       ` Coly Li
2009-09-14 23:57                         ` Sunil Mushran
2009-09-15  7:11                           ` Coly Li
2009-09-16  0:49                             ` Sunil Mushran [this message]
2009-09-21 17:25                               ` Coly Li
2009-09-21 17:25                                 ` Sunil Mushran
2009-09-21 17:31                                   ` Sunil Mushran
2009-09-21 17:43                                     ` Coly Li
2009-09-21 19:03                                     ` Coly Li
2009-09-23  6:32                               ` [Ocfs2-devel] questions of AST and BAST (was Re: dlm stress test hangs OCFS2) Coly Li
2009-09-23 18:21                                 ` Sunil Mushran