From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
Date: Mon, 16 May 2016 12:02:46 -0500
Message-ID: <20160516170246.GB20979@redhat.com>
In-Reply-To: <57397A5B.3070302@suse.com>
On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote:
> Thanks! Hmm, according to the long comments, you've handled the 2/2
> even split by way of the low nodeid killing stateful merged
> members.
Interesting, I'd forgotten about that bit of code, so I was wrong to say
that we do nothing after a 2/2 partition merge.
> We can do that because the "clean nodes" == "stateful nodes", right? I
> guess you're describing the case where there are 3 or more partitions
> that merge, and none can see enough clean nodes?
That sounds about right; this is a fairly narrow special case that mainly
helps when two partitions are evenly split. There could be some
cases with more nodes and 3+ partitions where it might help.
> Now DLM outputs a log message like:
> "fence work wait to clear merge $merge_count clean $clean_count part
> $part_count gone $gone_count". I'm wondering if we can provide this
> info, and how long DLM has been stuck, via "dlm_tool $some_cmd"?
Yes, that's a good idea. dlm_tool should clearly report if the dlm is
blocked because of a stateful partition merge. Perhaps a new global
variable printed by 'dlm_tool status -v'?
I'm not exactly sure of the condition under which we'd set this new
variable; could you try it out? For a start, maybe something similar to this:
static int stateful_merge_wait;
...
	if (!cluster_two_node && merge_count) {
		log_retry(retry_fencing, "fence work wait to clear merge %d clean %d part %d gone %d",
			  merge_count, clean_count, part_count, gone_count);

		if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
			kick_stateful_merge_members();
+		if ((clean_count < merge_count) && !part_count)
+			stateful_merge_wait = 1;

		retry = 1;
		goto out;
	}
(and added to print_state_daemon)
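
To make that condition easier to experiment with outside the daemon, here is a
minimal standalone sketch of the same decision, plus the "how long has it been
stuck" part of the question. Everything in it is hypothetical: the enum, the
struct, the three function names and the output format are illustration only,
not dlm_controld code or existing dlm_tool output. Clearing the flag once the
wait condition no longer holds is also an assumption; the fragment above only
ever sets it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout: a standalone model, not dlm_controld code. */
enum merge_action {
	MERGE_NONE,	/* two-node mode, or no stateful merged nodes */
	MERGE_KICK,	/* enough clean nodes and we are the low nodeid */
	MERGE_RETRY,	/* merge pending, nothing special to report */
	MERGE_WAIT	/* too few clean nodes, no partitioned nodes: blocked */
};

struct merge_state {
	int stateful_merge_wait;	/* the proposed new flag */
	uint64_t wait_since;		/* time the flag was first set, 0 if unset */
};

/* Mirrors the two conditions in the fragment above. */
static enum merge_action classify_merge(int cluster_two_node, int merge_count,
					int clean_count, int part_count,
					int low, int our_nodeid)
{
	if (cluster_two_node || !merge_count)
		return MERGE_NONE;
	if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
		return MERGE_KICK;
	if ((clean_count < merge_count) && !part_count)
		return MERGE_WAIT;
	return MERGE_RETRY;
}

/*
 * Record when the wait began so a status command could report its duration.
 * Assumption: the flag is cleared as soon as the wait condition stops holding.
 */
static void update_merge_state(struct merge_state *st, enum merge_action act,
			       uint64_t now)
{
	if (act == MERGE_WAIT) {
		if (!st->stateful_merge_wait) {
			st->stateful_merge_wait = 1;
			st->wait_since = now;
		}
	} else {
		st->stateful_merge_wait = 0;
		st->wait_since = 0;
	}
}

/* Format the flag and elapsed seconds as one status line (made-up format). */
static int format_merge_state(const struct merge_state *st, uint64_t now,
			      char *buf, size_t len)
{
	uint64_t secs = st->stateful_merge_wait ? now - st->wait_since : 0;

	return snprintf(buf, len,
			"stateful_merge_wait=%d stateful_merge_wait_secs=%llu",
			st->stateful_merge_wait, (unsigned long long)secs);
}
```

The caller supplies the time, so the logic can be checked with fixed values
before deciding where the flag should really be set and cleared in the daemon.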
Dave