From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Ren
Date: Fri, 13 May 2016 13:45:47 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
In-Reply-To: <20160512165114.GB13651@redhat.com>
References: <1463044568-19583-1-git-send-email-zren@suse.com> <20160512165114.GB13651@redhat.com>
Message-ID: <57356A0B.60100@suse.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

Thanks very much for explaining this to me so nicely ;-)

On 05/13/2016 12:51 AM, David Teigland wrote:
>
> T = time in seconds, A,B,C = cluster nodes.
>
> At T=1 A,B,C become members and have quorum.
> At T=10 a partition creates A,B | C.
> At T=11 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=11
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (11) > A->daemon_rem_time (10),
> which is what we want.
>
> It's the simpler case, but does that sound right so far?

Sure ;-)

> ...
>
> If the partition and merge occur within the same second, then:
>
> At T=1 A,B,C become members and get quorum.
> At T=10 a partition creates A,B | C.
> At T=10 it merges and creates A,B,C.
>
> At T=12, A,B will have:
> cluster_quorate=1
> cluster_quorate_monotime=1
> C->daemon_rem_time=10
>
> At T=12, C will have:
> cluster_quorate=1
> cluster_quorate_monotime=10
> A->daemon_rem_time=10
> B->daemon_rem_time=10
>
> Result:
>
> A,B will kick C from the cluster because
> cluster_quorate_monotime (1) < C->daemon_rem_time (10),
> which is what we want.
>
> C will not kick A,B from the cluster because
> cluster_quorate_monotime (10) = A->daemon_rem_time (10),
> which is what we want.
>
> If that's correct, there doesn't seem to be a problem so far.
> If we apply your patch, won't it allow C to kick A,B from the
> cluster since cluster_quorate_monotime = A->daemon_rem_time?

Aha, yes! Also, this patch doesn't really make sense: when three or
more partitions merge, cluster_quorate_monotime (the time the network
reconnected) can be greater than A->daemon_rem_time (the time the
network disconnected), and the patch doesn't help in that case.

> ...
>
> If you're looking at a cluster with an equal partition, e.g. A,B | C,D,
> then it becomes messy because cluster_quorate_monotime = daemon_rem_time
> everywhere after the merge. In this case, no nodes will kick others from
> the cluster, but with your patch, each side will kick the other side from
> the cluster. Neither option is good. In the past we decided to let the
> cluster sit in this state so an admin could choose which nodes to remove.
> Do you prefer the alternative of kicking nodes in this case (with somewhat
> unpredictable results)? If so, we could make that an optional setting,
> but we'd want to keep the existing behavior for non-even partitions in the
> example above.
>

Gotcha, thanks! But could you please elaborate a bit on what "with
somewhat unpredictable results" means here? Do you mean that some
inconsistency could result, or is the worry simply that all services
would be interrupted because every node ends up fenced?
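By the way, just to check that I'm reading the comparison the same way
you are, here is a tiny standalone sketch of the kick decision from
your walkthrough. should_kick() is only an illustration helper I made
up; the real check is the one in receive_protocol() quoted at the
bottom, and I'm assuming cluster_quorate=1 in every case:

#include <stdio.h>
#include <stdint.h>

/* Illustration only: the same comparison receive_protocol() makes when
 * deciding whether to kick a node after a stateful merge. */
static int should_kick(uint64_t quorate_monotime, uint64_t daemon_rem_time)
{
	return daemon_rem_time && (quorate_monotime < daemon_rem_time);
}

int main(void)
{
	/* partition at T=10, merge at T=11 */
	printf("A,B kick C: %d\n", should_kick(1, 10));  /* 1: kick, as wanted */
	printf("C kick A,B: %d\n", should_kick(11, 10)); /* 0: no kick, as wanted */

	/* partition and merge within the same second, T=10 */
	printf("C kick A,B: %d\n", should_kick(10, 10)); /* 0 with '<', but 1 with the '<=' from my patch */
	return 0;
}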
Actually, the reason I'm working on this problem is that a patch for
this issue has already been merged into pacemaker:

[1] https://github.com/ClusterLabs/pacemaker/pull/839

Personally, I think it's a bad fix that throws away the careful work
dlm_controld does here, because it makes pacemaker fence a node
whenever the dlm resource agent notices fencing in progress via
'dlm_tool ls | grep -q "wait fencing"', right?

That fix was made for:

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1268313

With that fix we've seen unnecessary fencing in a two-node cluster,
e.g. both nodes get fenced when we kill corosync on one of them.

Thanks for your suggestion. So far, an optional setting looks much
better than that fix, and I'd like to give it a try ;-) A rough sketch
of what I have in mind is below, after the quoted patch.

Thanks,
Eric

>
>> diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
>> index 356e80d..cd8a4e2 100644
>> --- a/dlm_controld/daemon_cpg.c
>> +++ b/dlm_controld/daemon_cpg.c
>> @@ -1695,7 +1695,7 @@ static void receive_protocol(struct dlm_header *hd, int len)
>>  		node->stateful_merge = 1;
>>
>>  		if (cluster_quorate && node->daemon_rem_time &&
>> -		    cluster_quorate_monotime < node->daemon_rem_time) {
>> +		    cluster_quorate_monotime <= node->daemon_rem_time) {
>>  			if (!node->killed) {
>>  				if (cluster_two_node) {
>>  					/*
>> --
>> 2.6.6
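To be concrete, the sketch below is roughly how I imagine the optional
setting. kick_on_equal_merge_time is just a placeholder name I made up
(it is not an existing dlm_controld option), and a real patch would of
course wire it into the normal option/config handling instead of a
static variable:

#include <stdint.h>

/* Placeholder for a new, not-yet-existing option; default off keeps
 * today's behavior of sitting still after an even-partition merge. */
static int kick_on_equal_merge_time = 0;

/* Sketch of the kick decision with the extra, opt-in equal-time case. */
static int should_kick_merged_node(int cluster_quorate,
				   uint64_t cluster_quorate_monotime,
				   uint64_t daemon_rem_time)
{
	if (!cluster_quorate || !daemon_rem_time)
		return 0;

	/* existing behavior: we regained quorum strictly before the
	 * remote daemon was removed */
	if (cluster_quorate_monotime < daemon_rem_time)
		return 1;

	/* even partition or same-second merge: only kick when the
	 * admin explicitly asked for it */
	if (kick_on_equal_merge_time &&
	    cluster_quorate_monotime == daemon_rem_time)
		return 1;

	return 0;
}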