From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Ren
Date: Fri, 13 May 2016 13:45:47 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
In-Reply-To: <20160512165114.GB13651@redhat.com>
References: <1463044568-19583-1-git-send-email-zren@suse.com> <20160512165114.GB13651@redhat.com>
Message-ID: <57356A0B.60100@suse.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

Thanks very much for explaining this to me so nicely ;-)

On 05/13/2016 12:51 AM, David Teigland wrote:
>
> T = time in seconds, A,B,C = cluster nodes.
>
> At T=1 A,B,C become members and have quorum.
> At T=10 a partition creates A,B | C.
> At T=11 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=11
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (11) > A->daemon_rem_time (10),
> which is what we want.
>
> It's the simpler case, but does that sound right so far?

Sure ;-)

> ...
>
> If the partition and merge occur within the same second, then:
>
> At T=1 A,B,C become members and get quorum.
> At T=10 a partition creates A,B | C.
> At T=10 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=10
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (10) = A->daemon_rem_time (10),
> which is what we want.
>
> If that's correct, there doesn't seem to be a problem so far.
> If we apply your patch, won't it allow C to kick A,B from the
> cluster since cluster_quorate_monotime = A->daemon_rem_time?

Aha, yes! Also, this patch doesn't really make sense: when three or
more partitions merge, cluster_quorate_monotime (the time the network
reconnected) can be greater than A->daemon_rem_time (the time the
network disconnected), and the patch doesn't help in that case.

> ...
>
> If you're looking at a cluster with an equal partition, e.g. A,B | C,D,
> then it becomes messy because cluster_quorate_monotime = daemon_rem_time
> everywhere after the merge. In this case, no nodes will kick others from
> the cluster, but with your patch, each side will kick the other side from
> the cluster. Neither option is good. In the past we decided to let the
> cluster sit in this state so an admin could choose which nodes to remove.
> Do you prefer the alternative of kicking nodes in this case (with somewhat
> unpredictable results)? If so, we could make that an optional setting,
> but we'd want to keep the existing behavior for non-even partitions in the
> example above.
>

Gotcha, thanks! But could you please elaborate a bit on what "with
somewhat unpredictable results" means here? Do you mean that some
inconsistency could result, or is the worry simply that all services
would be interrupted because every node ends up fenced?
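By the way, just to check that I'm reading the comparison the same way
you are, here is a tiny standalone sketch of the kick decision from
your walkthrough. should_kick() is only an illustration helper I made
up; the real check is the one in receive_protocol() quoted at the
bottom, and I'm assuming cluster_quorate=1 in every case:

#include <stdio.h>
#include <stdint.h>

/* Illustration only: the same comparison receive_protocol() makes when
 * deciding whether to kick a node after a stateful merge. */
static int should_kick(uint64_t quorate_monotime, uint64_t daemon_rem_time)
{
	return daemon_rem_time && (quorate_monotime < daemon_rem_time);
}

int main(void)
{
	/* partition at T=10, merge at T=11 */
	printf("A,B kick C: %d\n", should_kick(1, 10));  /* 1: kick, as wanted */
	printf("C kick A,B: %d\n", should_kick(11, 10)); /* 0: no kick, as wanted */

	/* partition and merge within the same second, T=10 */
	printf("C kick A,B: %d\n", should_kick(10, 10)); /* 0 with '<', but 1 with the '<=' from my patch */
	return 0;
}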
Actually, the reason I'm working on this problem is that a patch for
this issue has already been merged into pacemaker:

[1] https://github.com/ClusterLabs/pacemaker/pull/839

Personally, I think it's a bad fix that throws away the careful work
dlm_controld does here, because it makes pacemaker fence a node
whenever the dlm resource agent notices fencing in progress via
'dlm_tool ls | grep -q "wait fencing"', right?

That fix was made for:

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1268313

With that fix we've seen unnecessary fencing in a two-node cluster,
e.g. both nodes get fenced when we kill corosync on one of them.

Thanks for your suggestion. So far, an optional setting looks much
better than that fix, and I'd like to give it a try ;-) A rough sketch
of what I have in mind is below, after the quoted patch.

Thanks,
Eric

>
>> diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
>> index 356e80d..cd8a4e2 100644
>> --- a/dlm_controld/daemon_cpg.c
>> +++ b/dlm_controld/daemon_cpg.c
>> @@ -1695,7 +1695,7 @@ static void receive_protocol(struct dlm_header *hd, int len)
>>  		node->stateful_merge = 1;
>>
>>  		if (cluster_quorate && node->daemon_rem_time &&
>> -		    cluster_quorate_monotime < node->daemon_rem_time) {
>> +		    cluster_quorate_monotime <= node->daemon_rem_time) {
>>  			if (!node->killed) {
>>  				if (cluster_two_node) {
>>  					/*
>> --
>> 2.6.6
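To be concrete, the sketch below is roughly how I imagine the optional
setting. kick_on_equal_merge_time is just a placeholder name I made up
(it is not an existing dlm_controld option), and a real patch would of
course wire it into the normal option/config handling instead of a
static variable:

#include <stdint.h>

/* Placeholder for a new, not-yet-existing option; default off keeps
 * today's behavior of sitting still after an even-partition merge. */
static int kick_on_equal_merge_time = 0;

/* Sketch of the kick decision with the extra, opt-in equal-time case. */
static int should_kick_merged_node(int cluster_quorate,
				   uint64_t cluster_quorate_monotime,
				   uint64_t daemon_rem_time)
{
	if (!cluster_quorate || !daemon_rem_time)
		return 0;

	/* existing behavior: we regained quorum strictly before the
	 * remote daemon was removed */
	if (cluster_quorate_monotime < daemon_rem_time)
		return 1;

	/* even partition or same-second merge: only kick when the
	 * admin explicitly asked for it */
	if (kick_on_equal_merge_time &&
	    cluster_quorate_monotime == daemon_rem_time)
		return 1;

	return 0;
}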