From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] "->ls_in_recovery" not released
Date: Mon, 22 Nov 2010 12:34:42 -0500
Message-ID: <20101122173442.GA21879@redhat.com>
In-Reply-To: <4CEA9ADD.2050109@bull.net>
On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> We have got a two-node OCFS2 file system controlled by the pacemaker.
Are you using dlm_controld.pcmk? If so, please try the latest versions of
pacemaker that use the standard dlm_controld. The problem may be related
to the lockspace membership events that are passed to the kernel from
dlm_controld. 'dlm_tool dump' from each node, correlated with the
corosync membership events, will probably reveal the problem. Start by
looking at the sequence of confchg log messages,
e.g. "dlm:ls:g conf 3 1 0 memb 1 2 4 join 4 left"
"conf 3 1 0" - membership counts
  3 = number of members
  1 = number of members that joined
  0 = number of members that left
"memb 1 2 4" - nodeids of members
"join 4" - nodeids of members that joined
"left" - nodeids of members that left (none in this example)
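To compare the confchg sequence between nodes, it can help to pull these lines out of each node's 'dlm_tool dump' output mechanically. The following is only a rough sketch: it assumes the line layout is exactly as in the example above and that the token after "dlm:ls:" is the lockspace name. The names ConfChg, parse_confchg and is_consistent are made up for this illustration and are not part of any dlm tool.

```python
import re
from typing import List, NamedTuple, Optional


class ConfChg(NamedTuple):
    """One parsed confchg log message (hypothetical helper type)."""
    lockspace: str
    member_count: int
    joined_count: int
    left_count: int
    members: List[int]
    joined: List[int]
    left: List[int]


# Assumed layout, taken from the example line:
#   dlm:ls:<name> conf <members> <joined> <left> memb <ids> join <ids> left <ids>
_CONF_RE = re.compile(
    r"dlm:ls:(?P<ls>\S+) conf (?P<nm>\d+) (?P<nj>\d+) (?P<nl>\d+)"
    r" memb(?P<memb>(?: \d+)*) join(?P<join>(?: \d+)*) left(?P<left>(?: \d+)*)"
)


def _ids(text: str) -> List[int]:
    """Turn a space-separated run of nodeids into a list of ints."""
    return [int(tok) for tok in text.split()]


def parse_confchg(line: str) -> Optional[ConfChg]:
    """Return the parsed message, or None if the line is not a confchg line."""
    m = _CONF_RE.search(line)
    if m is None:
        return None
    return ConfChg(
        lockspace=m.group("ls"),
        member_count=int(m.group("nm")),
        joined_count=int(m.group("nj")),
        left_count=int(m.group("nl")),
        members=_ids(m.group("memb")),
        joined=_ids(m.group("join")),
        left=_ids(m.group("left")),
    )


def is_consistent(c: ConfChg) -> bool:
    """Check that the three counts agree with the nodeid lists."""
    return (
        len(c.members) == c.member_count
        and len(c.joined) == c.joined_count
        and len(c.left) == c.left_count
    )
```

Running parse_confchg() over the dump from each node and comparing the resulting sequences side by side should show where the nodes' views of the membership diverge.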
> "ls_recover()" includes several other cases when it simply goes
> to the "fail:" branch without setting free "->ls_in_recovery" and
> without cleaning up the inconsistent data left behind.
>
> I think some error handling code is missing in "ls_recover()".
> Have you modified the DLM since the RHEL 6.0?
No, in_recovery is supposed to remain locked until recovery completes.
Any number of ls_recover() calls can fail due to more member changes
during recovery, but one of them should eventually succeed (complete
recovery), once the membership stops changing. Then in_recovery will be
unlocked.
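The intended behaviour can be pictured with a toy model. This is not the kernel code: ToyLockspace, stop(), recover() and run() are hypothetical names used only for this sketch, and a plain boolean stands in for the real lock. Each attempt either aborts because the membership changed again, leaving in_recovery held, or completes and releases it.

```python
from typing import Iterable


class ToyLockspace:
    """Toy model of the retry behaviour; not the kernel's data structures."""

    def __init__(self) -> None:
        self.in_recovery = False
        self.attempts = 0

    def stop(self) -> None:
        # A membership change arrived: normal locking is blocked.
        self.in_recovery = True

    def recover(self, membership_changed_again: bool) -> bool:
        """Run one recovery attempt; return True if it completed."""
        self.attempts += 1
        if membership_changed_again:
            # Abort this attempt. in_recovery deliberately stays held,
            # because a later attempt is expected to finish the job.
            return False
        # Recovery completed: normal locking may resume.
        self.in_recovery = False
        return True


def run(changes_during_attempts: Iterable[bool]) -> ToyLockspace:
    """Drive attempts until one succeeds or the input runs out."""
    ls = ToyLockspace()
    ls.stop()
    for changed in changes_during_attempts:
        if ls.recover(changed):
            break
    return ls
```

In this model, run([True, True, False]) ends with in_recovery released after three attempts, while run([True, True]) ends with it still held because the membership never settled. The second case is what to look for in the logs.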
Look at the specific errors causing ls_recover() to fail, and check if
it's a confchg-related failure (like above), or another kind of error.
Dave