From: Eric Ren <zren@suse.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
Date: Fri, 13 May 2016 13:45:47 +0800
Message-ID: <57356A0B.60100@suse.com>
In-Reply-To: <20160512165114.GB13651@redhat.com>

Hi David,

Thanks very much for explaining this to me in such a nice way ;-)

On 05/13/2016 12:51 AM, David Teigland wrote:
>
> T = time in seconds, A,B,C = cluster nodes.
>
> At T=1 A,B,C become members and have quorum.
> At T=10 a partition creates A,B | C.
> At T=11 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=11
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (11) > A->daemon_rem_time (10),
> which is what we want.
>
> It's the simpler case, but does that sound right so far?

Sure;-)
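
Just to double-check my understanding, here's a toy model of that
check in receive_protocol() (simplified names, not the real
dlm_controld code), fed with the numbers from your first timeline:

#include <stdio.h>
#include <stdint.h>

/* a node kicks a merged peer only if it gained quorum strictly
 * before it saw that peer leave */
static int should_kick(uint64_t quorate_monotime, uint64_t rem_time)
{
	return quorate_monotime < rem_time;
}

int main(void)
{
	/* A,B: quorate since T=1, saw C leave at T=10 -> kick C */
	printf("A,B kick C:  %d\n", should_kick(1, 10));

	/* C: quorate again at T=11, saw A,B leave at T=10 -> no kick */
	printf("C kicks A,B: %d\n", should_kick(11, 10));
	return 0;
}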

> ...
>
> If the partition and merge occur within the same second, then:
>
> At T=1 A,B,C become members and get quorum.
> At T=10 a partition creates A,B | C.
> At T=10 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=10
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (10) = A->daemon_rem_time (10),
> which is what we want.
>
> If that's correct, there doesn't seem to be a problem so far.
> If we apply your patch, won't it allow C to kick A,B from the
> cluster since cluster_quorate_monotime = A->daemon_rem_time?

Aha, yes! Also, this patch doesn't make sense, because it still fails 
when cluster_quorate_monotime (the time the network reconnected) > 
A->daemon_rem_time (the time the network disconnected), which can 
happen when three or more partitions merge.
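
Continuing that toy model, the same-second case shows the problem
with my "<=" change at a glance:

	/* partition and merge both at T=10, so C's timestamps tie */
	should_kick(10, 10); /* original "<":  0, C leaves A,B alone */
	                     /* patched  "<=": 1, C would kick A,B   */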

> ...
>
> If you're looking at a cluster with an equal partition, e.g. A,B | C,D,
> then it becomes messy because cluster_quorate_monotime = daemon_rem_time
> everywhere after the merge.  In this case, no nodes will kick others from
> the cluster, but with your patch, each side will kick the other side from
> the cluster.  Neither option is good.  In the past we decided to let the
> cluster sit in this state so an admin could choose which nodes to remove.
> Do you prefer the alternative of kicking nodes in this case (with somewhat
> unpredictable results)?  If so, we could make that an optional setting,
> but we'd want to keep the existing behavior for non-even partitions in the
> example above.
>

Gotcha, thanks! But could you please elaborate a bit more on what 
"with somewhat unpredictable results" means? Do you mean that some 
consistency problems may happen, or are you just worried that all 
services would be interrupted because all nodes would be fenced?

Actually, the reason I'm working on this problem is that a patch for 
this issue has already been merged into pacemaker:

        [1] https://github.com/ClusterLabs/pacemaker/pull/839

Personally, I think it's a thoroughly bad fix that throws away all the 
careful thought and effort in dlm_controld: it makes pacemaker fence 
this node whenever the dlm resource agent notices fencing in progress 
via "dlm_tool ls | grep -q "wait fencing"", right?

This fix is for:
        [2] https://bugzilla.redhat.com/show_bug.cgi?id=1268313

We've seen unnecessary fencing happen in a two-node cluster with this 
fix: e.g. both nodes get fenced when we kill corosync on one of them.

Thanks for your suggestion. So far, an optional setting looks much 
better than this fix. I'd like to give it a try ;-)
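
For reference, here's the rough shape I have in mind, against the
same hunk as in the patch below. opt_kick_equal_merge is a made-up
name for the new setting; it would default to off so the existing
behavior is kept unless an admin opts in. (This doesn't yet
distinguish the even-partition case from the same-second merge, so
it's only a starting point.)

 		if (cluster_quorate && node->daemon_rem_time &&
-		    cluster_quorate_monotime < node->daemon_rem_time) {
+		    (cluster_quorate_monotime < node->daemon_rem_time ||
+		     (opt_kick_equal_merge &&
+		      cluster_quorate_monotime == node->daemon_rem_time))) {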

Thanks,
Eric

>
>> diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
>> index 356e80d..cd8a4e2 100644
>> --- a/dlm_controld/daemon_cpg.c
>> +++ b/dlm_controld/daemon_cpg.c
>> @@ -1695,7 +1695,7 @@ static void receive_protocol(struct dlm_header *hd, int len)
>>   		node->stateful_merge = 1;
>>
>>   		if (cluster_quorate && node->daemon_rem_time &&
>> -		    cluster_quorate_monotime < node->daemon_rem_time) {
>> +		    cluster_quorate_monotime <= node->daemon_rem_time) {
>>   			if (!node->killed) {
>>   				if (cluster_two_node) {
>>   					/*
>> --
>> 2.6.6


