From: Coly Li <coly.li@suse.de>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] dlm stress test hangs OCFS2
Date: Thu, 10 Sep 2009 04:07:00 +0800 [thread overview]
Message-ID: <4AA80AE4.9090105@suse.de> (raw)
In-Reply-To: <4A9FEDAC.50704@oracle.com>
Sunil Mushran Wrote:
> You will have to trace thru process_blocked_lock() to make sense of this.
>
I try to trace process_blocked_lock(), the result is quite complex. I attach the
modified fs/ocfs2 code (ocfs2-trace.tar.bz2) in this email, since it's not
latest upstream ocfs2 code.
Then I run the bash script to create zero byte file, when the blocking happens,
I dump the dmesg output from all nodes.
The printk message is quite long, therefore, I try to divide the printk messages
into several mod (up to 11, see printk-mods.txt), and try to replace the printk
messages by a symbol (M1, M2, ... M11). If there is a symbol continuous
repeated, I only keep one and followed with its repeat number.
Then I get a much short printk message dump file, I past the content here:
--------------------------------------------
M1
M2 (1505)
M3
M4
M2 (1848)
w_level: 0x0
ocfs2_unblock_lock:3281 ctl->unblock_action: 0.
ocfs2_unblock_lock:3297 ctl->requeue = 0
ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
lockres->l_level:3, new_level: 0
ocfs2_unblock_lock:3320 gen: 0x7a4ec, ret: 0x0
ocfs2_unblock_lock:3322 lockres->l_flags: 0x157
<<<exit ocfs2_unblock_lock after leave.
ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
0x157, ctl{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M2 (286)
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b653c0, 33188, 0, 'x1_2872_722')
ocfs2_mknod:252 call ocfs2_inode_lock
M6
M11
M5 (276)
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a8a0, 33188, 0, 'x1_2860_733')
ocfs2_mknod:252 call ocfs2_inode_lock
M7
M1
M2 (274)
M3
M4
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a2f0, 33188, 0, 'x1_2872_725')
ocfs2_mknod:252 call ocfs2_inode_lock
M1
M5 (295)
M6
M7
M1
M2
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff88000781a700, 33188, 0, 'x1_2860_734')
ocfs2_mknod:252 call ocfs2_inode_lock
M1
M5 (455)
M11
M5 (275)
M11
M5 (275)
M6
M7
M1
M2
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65080, 33188, 0, 'x1_2860_735')
ocfs2_mknod:252 call ocfs2_inode_lock
M1
M5 (204)
M11
M5 (696)
{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M5 (229)
M11
M5 (275)
M11
M5 (275)
M11
M5 (275)
M11
M5 (275)
b=1.
ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
lockres->l_level:5, new_level: 0
ocfs2_unblock_lock:3320 gen: 0x8945b, ret: 0x0
ocfs2_unblock_lock:3322 lockres->l_flags: 0x157
<<<exit ocfs2_unblock_lock after leave.
ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
0x157, ctl{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M5 (272)
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65490, 33188, 0, 'x1_2860_738')
ocfs2_mknod:252 call ocfs2_inode_lock
M6
M7
M1
M2
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b652f0, 33188, 0, 'x1_2872_730')
ocfs2_mknod:252 call ocfs2_inode_lock
M1
M8
ocfs2_mknod:246 (0xffff880017b835e8, 0xffff880017b65630, 33188, 0, 'x1_2860_739')
ocfs2_mknod:252 call ocfs2_inode_lock
M9 (2)
M10 (110)
nblock_lock:3297 ctl->requeue = 0
ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
ocfs2_unblock_lock:3301 set_lvb=1.
ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
lockres->l_level:5, new_level: 3
ocfs2_unblock_lock:3320 gen: 0x8a092, ret: 0x0
ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
<<<exit ocfs2_unblock_lock after leave.
ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
0x147, ctl{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M10 (260)
nblock_lock:3297 ctl->requeue = 0
ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
ocfs2_unblock_lock:3301 set_lvb=1.
ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
lockres->l_level:5, new_level: 3
ocfs2_unblock_lock:3320 gen: 0x8a6c6, ret: 0x0
ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
<<<exit ocfs2_unblock_lock after leave.
ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
0x147, ctl{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M10 (259)
nblock_lock:3297 ctl->requeue = 0
ocfs2_unblock_lock:3299 LOCK_TYPE_USES_LVB.
ocfs2_unblock_lock:3301 set_lvb=1.
ocfs2_unblock_lock:3311 call lockres->l_ops->set_lvb
ocfs2_prepare_downconvert:3069 lock M0000000000000000085e0200000000,
lockres->l_level:5, new_level: 3
ocfs2_unblock_lock:3320 gen: 0x8bdf8, ret: 0x0
ocfs2_unblock_lock:3322 lockres->l_flags: 0x147
<<<exit ocfs2_unblock_lock after leave.
ocfs2_process_blocked_lock:3707 call lockres_clear_flags LOCK_QEUED(l_flags:
0x147, ctl{unblock_action:0, requeue:0})
<<<exit ocfs2_process_blocked_lock
M10 (269)
M11
M5 (275)
M11
M5 (275)
M11
M5 (589)
M11
M5 (275)
M11
M5 (275)
M11
M5 (275)
M11
M5 (795)
M11
M5 (247)
M11
M5 (275)
M11
M5 (275)
M11
M5 (463)
--------------------------------------------
There are some incomplete printk messages in the dump file, I don't have idea
yet why part of them are missed. If ignoring the incomplete printk messages, we
can find ocfs2_precess_blocked_lock() execution in mod 2 and 5 happens >99%.
This is an interested result, I can not explain yet.
Another notable thing is, I find in only in mod 1,2,4,5,7,9,10,11, l_ex_holders
and l_ro_holders are all zero. IMHO, if current lockres is PR, there should be
at least 1 l_ro_holder; if current lockres is EX, there should be at least 1
l_ex_holders. Still I can not see my observation is a result of the blocking
issue, or a source of the blocking issue.
I update my current progress, and continue to work on it. If there are any
suggestion from anyone on the list, I am glad to know :-)
Thanks.
--
Coly Li
SuSE Labs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ocfs2-trace.tar.bz2
Type: application/x-bzip
Size: 329239 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20090910/169a7f3d/attachment-0001.bin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: printk-mods.txt
Url: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20090910/169a7f3d/attachment-0001.txt
next prev parent reply other threads:[~2009-09-09 20:07 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-18 19:26 [Ocfs2-devel] dlm stress test hangs OCFS2 Coly Li
2009-08-18 19:34 ` David Teigland
2009-08-19 3:06 ` Sunil Mushran
2009-09-02 17:11 ` Coly Li
2009-09-02 22:01 ` Sunil Mushran
2009-09-03 16:24 ` Coly Li
2009-09-03 16:24 ` Sunil Mushran
2009-09-09 20:07 ` Coly Li [this message]
2009-09-09 21:42 ` Sunil Mushran
2009-09-10 5:38 ` Coly Li
2009-09-11 22:57 ` Sunil Mushran
2009-09-13 14:08 ` Coly Li
2009-09-14 19:30 ` Sunil Mushran
2009-09-14 20:23 ` Coly Li
2009-09-14 23:57 ` Sunil Mushran
2009-09-15 7:11 ` Coly Li
2009-09-16 0:49 ` Sunil Mushran
2009-09-21 17:25 ` Coly Li
2009-09-21 17:25 ` Sunil Mushran
2009-09-21 17:31 ` Sunil Mushran
2009-09-21 17:43 ` Coly Li
2009-09-21 19:03 ` Coly Li
2009-09-23 6:32 ` [Ocfs2-devel] questions of AST and BAST (was Re: dlm stress test hangs OCFS2) Coly Li
2009-09-23 18:21 ` Sunil Mushran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AA80AE4.9090105@suse.de \
--to=coly.li@suse.de \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.