From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Ren
Date: Thu, 12 May 2016 17:16:08 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
Message-ID: <1463044568-19583-1-git-send-email-zren@suse.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

DLM can get stuck in the "need fencing" state even though the cluster
regains quorum very quickly after a transient network disconnection.

The whole sequence can happen within a single monotonic clock tick,
which means "cluster_quorate_monotime" can equal
"node->daemon_rem_time". With the strict "<" comparison we then skip
the chance to ask corosync for the kill on stateful merge, and as a
result fencing cannot proceed any further.

Use "<=" so that the equal case is handled as well.

Signed-off-by: Eric Ren
---
 dlm_controld/daemon_cpg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
index 356e80d..cd8a4e2 100644
--- a/dlm_controld/daemon_cpg.c
+++ b/dlm_controld/daemon_cpg.c
@@ -1695,7 +1695,7 @@ static void receive_protocol(struct dlm_header *hd, int len)
 		node->stateful_merge = 1;
 
 	if (cluster_quorate && node->daemon_rem_time &&
-	    cluster_quorate_monotime < node->daemon_rem_time) {
+	    cluster_quorate_monotime <= node->daemon_rem_time) {
 		if (!node->killed) {
 			if (cluster_two_node) {
 				/*
-- 
2.6.6
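
For illustration only, here is a minimal standalone sketch of the boundary
case this patch addresses. It is not the dlm_controld code: the helper names
should_kill_old() and should_kill_new() are hypothetical, and it assumes both
timestamps are taken from the same coarse monotonic clock, so two events close
together can carry the same value. Only the names cluster_quorate,
cluster_quorate_monotime and daemon_rem_time come from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the condition before the patch:
 * the kill path is taken only if quorum was regained strictly
 * before the remote daemon was removed.
 */
int should_kill_old(int cluster_quorate,
		    uint64_t cluster_quorate_monotime,
		    uint64_t daemon_rem_time)
{
	return cluster_quorate && daemon_rem_time &&
	       cluster_quorate_monotime < daemon_rem_time;
}

/*
 * Hypothetical helper mirroring the condition after the patch:
 * equal timestamps (both events within one clock tick) also
 * take the kill path.
 */
int should_kill_new(int cluster_quorate,
		    uint64_t cluster_quorate_monotime,
		    uint64_t daemon_rem_time)
{
	return cluster_quorate && daemon_rem_time &&
	       cluster_quorate_monotime <= daemon_rem_time;
}
```

With both timestamps set to the same value, for example 100,
should_kill_old(1, 100, 100) returns 0 while should_kill_new(1, 100, 100)
returns 1. For any pair of distinct timestamps the two helpers agree.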