From: Menyhart Zoltan <Zoltan.Menyhart@bull.net>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] "->ls_in_recovery" not released
Date: Tue, 23 Nov 2010 15:58:42 +0100
Message-ID: <4CEBD6A2.8090005@bull.net>
In-Reply-To: <20101122173442.GA21879@redhat.com>
David Teigland wrote:
> On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
>> We have got a two-node OCFS2 file system controlled by the pacemaker.
>
> Are you using dlm_controld.pcmk?
Yes.
> If so, please try the latest versions of
> pacemaker that use the standard dlm_controld.
We currently have dlm-pcmk-3.0.12-23.el6.x86_64.
I have downloaded git://git.fedorahosted.org/dlm.git
and we shall try it soon.
>> "ls_recover()" includes several other cases when it simply goes
>> to the "fail:" branch without setting free "->ls_in_recovery" and
>> without cleaning up the inconsistent data left behind.
>>
>> I think some error handling code is missing in "ls_recover()".
>> Have you modified the DLM since the RHEL 6.0?
>
> No, in_recovery is supposed to remain locked until recovery completes.
> Any number of ls_recover() calls can fail due to more member changes
> during recovery, but one of them should eventually succeed (complete
> recovery), once the membership stops changing. Then in_recovery will be
> unlocked.
>
> Look at the specific errors causing ls_recover() to fail, and check if
> it's a confchg-related failure (like above), or another kind of error.
Assume the "other" node is lost, possibly forever.
"dlm_wait_function()" can return only if "dlm_ls_stop()" gets called
in the meantime.
I suppose user-land can do something like this:
    echo 0 > /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
Actually, I tried it by hand: it did not unblock the situation.
I guess that on the next attempt it was "ping_members()" that returned
with error==1. The dead "other" node was still on the list.
Again, "ls_recover()" returned without releasing "->ls_in_recovery".
How can "ls_recover()" be re-entered so that it can carry out the
recovery and release "->ls_in_recovery"?
(Again assuming the "other" node is lost, possibly forever.)
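To make the question concrete, here is a toy user-space model of the
behaviour discussed above. It is only a sketch, not the DLM code: every
name in it ("toy_ls", "toy_ls_recover()", "toy_remove_dead_members()",
"toy_run()") is hypothetical. It assumes that a recovery attempt fails
while a dead node is still on the member list, and that something outside
"ls_recover()" eventually drops that node. Whether and how that happens
in the real DLM is exactly what is being asked.

```c
#include <stdbool.h>

/* Toy model, NOT kernel code: all names below are hypothetical. */
struct toy_ls {
    bool in_recovery;    /* stands in for "->ls_in_recovery" being held */
    int  members_alive;  /* nodes that would answer a ping */
    int  members_listed; /* nodes still on the member list */
};

/* One recovery attempt: returns 0 on success, non-zero on failure. */
static int toy_ls_recover(struct toy_ls *ls)
{
    /* A listed node that does not answer makes the attempt fail;
     * the lock is deliberately left held on this path. */
    if (ls->members_alive < ls->members_listed)
        return 1;

    /* Only a completed recovery releases the lock. */
    ls->in_recovery = false;
    return 0;
}

/* Assumption: some outside agent removes dead nodes from the list. */
static void toy_remove_dead_members(struct toy_ls *ls)
{
    ls->members_listed = ls->members_alive;
}

/* Returns the number of attempts needed, or -1 if still locked. */
int toy_run(void)
{
    struct toy_ls ls = {
        .in_recovery = true,
        .members_alive = 1,   /* this node only */
        .members_listed = 2,  /* the dead "other" node is still listed */
    };
    int attempts = 1;

    if (toy_ls_recover(&ls) != 0) {
        /* First attempt failed: fix the membership, then retry. */
        toy_remove_dead_members(&ls);
        attempts++;
        toy_ls_recover(&ls);
    }
    return ls.in_recovery ? -1 : attempts;
}
```

In this model "toy_run()" needs two attempts: the first fails and leaves
the lock held, the second succeeds once the dead node is off the list.
Without the call to "toy_remove_dead_members()" the lock would never be
released, which matches the hang described above.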
Thanks for your response.
Zoltan Menyhart