From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
Date: Mon, 16 May 2016 12:02:46 -0500
Message-ID: <20160516170246.GB20979@redhat.com>
In-Reply-To: <57397A5B.3070302@suse.com>
On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote:
> Thanks! Hmm, according to the long comments, you've handled the 2/2
> even split by having the low nodeid kill the stateful merged
> members.
Interesting, I'd forgotten about that bit of code, so I was wrong to say
that we do nothing after a 2/2 partition merge.
> We can do that because the "clean nodes" == "stateful nodes", right? I
> guess you're describing the case where 3 or more partitions merge
> and none of them can see enough clean nodes?
That sounds about right. This is a fairly narrow special case that mainly
helps when there are two evenly split partitions. It might also help in
some cases with more nodes and 3+ partitions.
> Now DLM outputs a log message like:
> "fence work wait to clear merge $merge_count clean $clean_count part
> $part_count gone $gone_count". I'm wondering if we can provide this
> info, and how long DLM has been stuck, via "dlm_tool $some_cmd"?
Yes, that's a good idea. dlm_tool should clearly report if the dlm is
blocked because of a stateful partition merge. Perhaps a new global
variable printed by 'dlm_tool status -v'?
I'm not sure exactly which condition should set this new variable;
could you try it out? For a start, maybe something similar to this:
static int stateful_merge_wait;
...
        if (!cluster_two_node && merge_count) {
                log_retry(retry_fencing, "fence work wait to clear merge %d clean %d part %d gone %d",
                          merge_count, clean_count, part_count, gone_count);

                if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
                        kick_stateful_merge_members();
+               if ((clean_count < merge_count) && !part_count)
+                       stateful_merge_wait = 1;

                retry = 1;
                goto out;
        }
(and stateful_merge_wait added to the output of print_state_daemon)
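
To make the idea a bit more concrete, here is a rough, standalone sketch
of the decision and the reporting side. It is not dlm_controld code: the
names classify_merge, update_merge_state, format_merge_state and struct
merge_state are made up for illustration, and the name=value output format
is an assumption, not what print_state_daemon actually emits. It also
makes two choices the snippet above does not: the flag is cleared again
once the wait condition no longer holds, and the time the wait began is
recorded so "how long have we been stuck" can be reported. The caller is
assumed to pass in a monotonic time in seconds (for example from
clock_gettime(CLOCK_MONOTONIC)).

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the actual dlm_controld code. */

enum merge_action {
	MERGE_NONE,   /* two-node mode, or no stateful merge pending */
	MERGE_KICK,   /* enough clean nodes and we are the low nodeid */
	MERGE_WAIT,   /* too few clean nodes to resolve the merge */
	MERGE_RETRY,  /* merge pending, but nothing for us to do yet */
};

struct merge_state {
	int waiting;          /* plays the role of stateful_merge_wait */
	uint64_t wait_start;  /* monotonic seconds when the wait began */
};

/* Mirrors the conditions in the fencing snippet, as a pure function. */
static enum merge_action classify_merge(int two_node, int merge_count,
					int clean_count, int part_count,
					int low, int our_nodeid)
{
	if (two_node || !merge_count)
		return MERGE_NONE;
	if (clean_count >= merge_count && !part_count && low == our_nodeid)
		return MERGE_KICK;
	if (clean_count < merge_count && !part_count)
		return MERGE_WAIT;
	return MERGE_RETRY;
}

/*
 * Assumption: the flag is cleared whenever the wait condition stops
 * holding. The start time is only set on the first transition into
 * the waiting state, so repeated retries do not reset it.
 */
static void update_merge_state(struct merge_state *ms, enum merge_action act,
			       uint64_t now)
{
	if (act == MERGE_WAIT) {
		if (!ms->waiting) {
			ms->waiting = 1;
			ms->wait_start = now;
		}
	} else {
		ms->waiting = 0;
		ms->wait_start = 0;
	}
}

/* Assumed output format: one name=value pair per line. */
static int format_merge_state(const struct merge_state *ms, uint64_t now,
			      char *buf, size_t len)
{
	uint64_t secs = ms->waiting ? now - ms->wait_start : 0;

	return snprintf(buf, len,
			"stateful_merge_wait=%d\n"
			"stateful_merge_wait_seconds=%llu\n",
			ms->waiting, (unsigned long long)secs);
}
```

The fencing loop would call classify_merge and update_merge_state on each
retry, and the status path would call format_merge_state. Whether clearing
the flag on every non-waiting pass is the right behavior is exactly the
part that needs testing.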
Dave