All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] dlm stress test hangs OCFS2
Date: Wed, 09 Sep 2009 14:42:14 -0700	[thread overview]
Message-ID: <4AA82136.9000403@oracle.com> (raw)
In-Reply-To: <4AA80AE4.9090105@suse.de>

Can you send me the raw trace output.

Coly Li wrote:
> Sunil Mushran Wrote:
>   
>> You will have to trace thru process_blocked_lock() to make sense of this.
>>
>>     
> I try to trace process_blocked_lock(), the result is quite complex. I attach the
> modified fs/ocfs2 code (ocfs2-trace.tar.bz2) in this email, since it's not
> latest upstream ocfs2 code.
>
> Then I run the bash script to create zero byte file, when the blocking happens,
> I dump the dmesg output from all nodes.
>
> The printk message is quite long, therefore, I try to divide the printk messages
> into several mod (up to 11, see printk-mods.txt), and try to replace the printk
> messages by a symbol (M1, M2, ... M11). If there is a symbol continuous
> repeated, I only keep one and followed with its repeat number.
>
> Then I get a much short printk message dump file, I past the content here:
> --------------------------------------------
> M1
> M2 (1505)
> M3
> M4
> M2 (1848)
> w_level: 0x0
> ocfs2_unblock_lock:3281 ctl->unblock_action: 0.
> ocfs2_unblock_lock:3297 ctl->requeue = 0
> ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
> ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
> lockres->l_level:3, new_level: 0
> ocfs2_unblock_lock:3320 gen: 0x7a4ec, ret: 0x0
> ocfs2_unblock_lock:3322 lockres->l_flags: 0x157
> 	<<<exit ocfs2_unblock_lock after leave.
> ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
> 0x157, ctl{unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M2 (286)
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b653c0, 33188, 0, 'x1_2872_722')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M6
> M11
> M5 (276)
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a8a0, 33188, 0, 'x1_2860_733')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M7
> M1
> M2 (274)
> M3
> M4
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a2f0, 33188, 0, 'x1_2872_725')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M1
> M5 (295)
> M6
> M7
> M1
> M2
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a700, 33188, 0, 'x1_2860_734')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M1
> M5 (455)
> M11
> M5 (275)
> M11
> M5 (275)
> M6
> M7
> M1
> M2
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65080, 33188, 0, 'x1_2860_735')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M1
> M5 (204)
> M11
> M5 (696)
> {unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M5 (229)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (275)
> b=1.
> ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
> lockres->l_level:5, new_level: 0
> ocfs2_unblock_lock:3320 gen: 0x8945b, ret: 0x0
> ocfs2_unblock_lock:3322 lockres->l_flags: 0x157
> 	<<<exit ocfs2_unblock_lock after leave.
> ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
> 0x157, ctl{unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M5 (272)
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65490, 33188, 0, 'x1_2860_738')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M6
> M7
> M1
> M2
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b652f0, 33188, 0, 'x1_2872_730')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M1
> M8
> ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65630, 33188, 0, 'x1_2860_739')
> ocfs2_mknod:252 call ocfs2_inode_lock
> M9 (2)
> M10 (110)
> nblock_lock:3297 ctl->requeue = 0
> ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
> ocfs2_unblock_lock:3301 set_lvb=1.
> ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
> ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
> lockres->l_level:5, new_level: 3
> ocfs2_unblock_lock:3320 gen: 0x8a092, ret: 0x0
> ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
> 	<<<exit ocfs2_unblock_lock after leave.
> ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
> 0x147, ctl{unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M10 (260)
> nblock_lock:3297 ctl->requeue = 0
> ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
> ocfs2_unblock_lock:3301 set_lvb=1.
> ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
> ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
> lockres->l_level:5, new_level: 3
> ocfs2_unblock_lock:3320 gen: 0x8a6c6, ret: 0x0
> ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
> 	<<<exit ocfs2_unblock_lock after leave.
> ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
> 0x147, ctl{unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M10 (259)
> nblock_lock:3297 ctl->requeue = 0
> ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
> ocfs2_unblock_lock:3301 set_lvb=1.
> ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
> ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
> lockres->l_level:5, new_level: 3
> ocfs2_unblock_lock:3320 gen: 0x8bdf8, ret: 0x0
> ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
> 	<<<exit ocfs2_unblock_lock after leave.
> ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
> 0x147, ctl{unblock_action:0, requeue:0})
> <<<exit ocfs2_process_blocked_lock
> M10 (269)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (589)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (795)
> M11
> M5 (247)
> M11
> M5 (275)
> M11
> M5 (275)
> M11
> M5 (463)
> --------------------------------------------
>
> There are some incomplete printk messages in the dump file, I don't have idea
> yet why part of them are missed. If ignoring the incomplete printk messages, we
> can find ocfs2_precess_blocked_lock() execution in mod 2 and 5 happens >99%.
> This is an interested result, I can not explain yet.
>
> Another notable thing is, I find in only in mod 1,2,4,5,7,9,10,11, l_ex_holders
> and l_ro_holders are all zero. IMHO, if current lockres is PR, there should be
> at least 1 l_ro_holder; if current lockres is EX, there should be at least 1
> l_ex_holders. Still I can not see my observation is a result of the blocking
> issue, or a source of the blocking issue.
>
> I update my current progress, and continue to work on it. If there are any
> suggestion from anyone on the list, I am glad to know :-)
>
> Thanks.
>   

  reply	other threads:[~2009-09-09 21:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-18 19:26 [Ocfs2-devel] dlm stress test hangs OCFS2 Coly Li
2009-08-18 19:34 ` David Teigland
2009-08-19  3:06 ` Sunil Mushran
2009-09-02 17:11   ` Coly Li
2009-09-02 22:01     ` Sunil Mushran
2009-09-03 16:24       ` Coly Li
2009-09-03 16:24         ` Sunil Mushran
2009-09-09 20:07           ` Coly Li
2009-09-09 21:42             ` Sunil Mushran [this message]
2009-09-10  5:38               ` Coly Li
2009-09-11 22:57                 ` Sunil Mushran
2009-09-13 14:08                   ` Coly Li
2009-09-14 19:30                     ` Sunil Mushran
2009-09-14 20:23                       ` Coly Li
2009-09-14 23:57                         ` Sunil Mushran
2009-09-15  7:11                           ` Coly Li
2009-09-16  0:49                             ` Sunil Mushran
2009-09-21 17:25                               ` Coly Li
2009-09-21 17:25                                 ` Sunil Mushran
2009-09-21 17:31                                   ` Sunil Mushran
2009-09-21 17:43                                     ` Coly Li
2009-09-21 19:03                                     ` Coly Li
2009-09-23  6:32                               ` [Ocfs2-devel] questions of AST and BAST (was Re: dlm stress test hangs OCFS2) Coly Li
2009-09-23 18:21                                 ` Sunil Mushran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AA82136.9000403@oracle.com \
    --to=sunil.mushran@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.