From: eric <zren@suse.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] (no subject)
Date: Tue, 13 Oct 2015 18:07:14 +0800
Message-ID: <561CD7D2.7070107@suse.com>
Hi David and list,
I'm working on ocfs2 and have encountered a problem with DLM POSIX file locks.
After some investigation, I'd like to share what I found and get some
hints from you.
Environment:
kernel: 3.12.47
FS: OCFS2
stack: pacemaker
cluster: 2 testing nodes, node1, node2
Issue description:
The ocfs2 test suite contains a deadlock test case for file locks. The
test first prepares a test file, file1, on the shared disk. Then node1 calls
"fcntl(file1, F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0})", and node2 sets
alarm(10s) and issues the same "fcntl(file1, F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0})".
The test expects the alarm to expire, deliver SIGALRM, and wake the sleeping
process, as "man fcntl" says: "If a signal is caught while waiting, then the call is
interrupted and (after the signal handler has returned)
returns immediately (with return value -1 and errno set to EINTR)".
However, ps shows the process on node2 in the "Dl" state, and the signal
does not interrupt the wait, so the test case hangs forever.
Investigations:
* Key debug info:
process stack on node1:
n1:/opt/ocfs2-test/bin # cat /proc/22677/stack
[<ffffffff8104250b>] kvm_clock_get_cycles+0x1b/0x20
[<ffffffff810ba924>] __getnstimeofday+0x34/0xc0
[<ffffffff810ba9ba>] getnstimeofday+0xa/0x30
[<ffffffff811bb30d>] SyS_poll+0x5d/0xf0
[<ffffffff81529809>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
process stack on node2:
n2:~ # cat /proc/1534/stack
[<ffffffffa050fa65>] dlm_posix_lock+0x185/0x380 [dlm]
[<ffffffff811f39ce>] fcntl_setlk+0x12e/0x2d0
[<ffffffff811b8231>] SyS_fcntl+0x261/0x510
[<ffffffff81529809>] system_call_fastpath+0x16/0x1b
[<00007f3f5721eb42>] 0x7f3f5721eb42
[<ffffffffffffffff>] 0xffffffffffffffff
* dlm_posix_lock
By adding printk calls and recompiling the dlm kernel module, I located where n2
hangs:
dlm_posix_lock -> wait_event_killable
wait_event_killable puts the process into the TASK_KILLABLE state, which is like
TASK_UNINTERRUPTIBLE except that fatal signals can wake it. I ran some tests: SIGTERM
wakes the process, but SIGALRM does not.
Does this go against POSIX file lock semantics? Any hints would be much appreciated!
I can provide more information if needed ;-)
Thanks,
Eric