From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Bogdanov
Date: Fri, 19 Aug 2011 23:32:17 +0300
Subject: [Cluster-devel] (Repost from linux-cluster) Handling of CPG_REASON_NODEDOWN in daemons
Message-ID: <4E4EC851.9030608@hoster-ok.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi all,

I originally posted the same content to the linux-cluster list, but there
was no answer there, so I suspect that this list is more suitable.

Several days ago I found that the clvmd CPG in my cluster had gone into the
kern_stop state after some problems on the corosync ring caused by high
load.

The cluster currently contains three nodes: two bare-metal and one VM. The
VM was not scheduled often enough because of host load, so the cluster went
split-brain for one second and then quickly recovered.

CPG issued a CPG_REASON_NODEDOWN event, after which clvmd went to kern_stop
on the two bare-metal nodes and to kern_stop,fencing on the VM (naturally,
since it did not have quorum).

I would have expected the VM to be fenced, but no fencing actually happened.
The clvmd CPG stayed stuck in kern_stop even after the VM was fenced
manually, so I had to take the whole cluster down to recover.

I found the reason why the node was not fenced on the CPG_REASON_NODEDOWN
event.
Here is what I see in dlm_tool dump:

1313579105 Processing membership 80592
1313579105 Skipped active node 939787530: born-on=80580, last-seen=80592, this-event=80592, last-event=80580
1313579105 Skipped active node 956564746: born-on=80564, last-seen=80592, this-event=80592, last-event=80580
1313579105 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1543767306"
1313579105 Removed inactive node 1543767306: born-on=80572, last-seen=80580, this-event=80592, last-event=80580
1313579105 dlm:controld conf 2 0 1 memb 939787530 956564746 join left 1543767306
1313579105 dlm:ls:clvmd conf 2 0 1 memb 939787530 956564746 join left 1543767306
1313579105 clvmd add_change cg 4 remove nodeid 1543767306 reason 3
1313579105 clvmd add_change cg 4 counts member 2 joined 0 remove 1 failed 1
1313579105 clvmd stop_kernel cg 4
1313579105 write "0" to "/sys/kernel/dlm/clvmd/control"
1313579105 Node 1543767306/mgmt01 has not been shot yet
1313579105 clvmd check_fencing 1543767306 wait add 1313562825 fail 1313579105 last 0
1313579107 Node 1543767306/mgmt01 was last shot 'now'
1313579107 clvmd check_fencing 1543767306 done add 1313562825 fail 1313579105 last 1313579107
1313579107 clvmd check_fencing done

That means that dlm_controld received the CPG_REASON_NODEDOWN event for the
clvmd CPG and did not call kick_node_from_cluster(), so pacemaker did not
fence the node on behalf of the clvmd CPG.

Please correct me if I'm wrong:
* Requesting fencing of a node on a CPG_REASON_NODEDOWN event was
  historically left to groupd.
* That is why all daemons (fenced, dlm_controld, gfs2_controld) call
  kick_node_from_cluster() only on a CPG_REASON_PROCDOWN event, not on
  CPG_REASON_NODEDOWN.
* groupd is obsolete in 3.x.

Shouldn't the daemons request fencing on CPG_REASON_NODEDOWN too? Currently
they only mark the node as failed and increase the cg failcount.

I use a pacemaker-based setup and actually use only the (obsolete)
dlm_controld.pcmk, but the problem seems to be somewhat wider than that one
daemon.
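To make the question concrete, here is a minimal sketch of the two
behaviours being compared. It is not the actual dlm_controld code: the
struct, the sketch_* names and the kick_requested flag (standing in for a
call to kick_node_from_cluster()) are hypothetical, and the "current"
policy only encodes my understanding described above, which may be wrong.
The enum values mirror cpg_reason_t from corosync's cpg.h and are redefined
locally so the fragment compiles without corosync headers.

```c
#include <stdbool.h>

/* Local copy of the cpg_reason_t values from <corosync/cpg.h>. */
enum sketch_cpg_reason {
    SKETCH_REASON_JOIN     = 1,
    SKETCH_REASON_LEAVE    = 2,
    SKETCH_REASON_NODEDOWN = 3,   /* "reason 3" in the dump above */
    SKETCH_REASON_NODEUP   = 4,
    SKETCH_REASON_PROCDOWN = 5
};

/* Hypothetical per-node bookkeeping, not the real daemon structures. */
struct sketch_node {
    unsigned int nodeid;
    bool failed;          /* node counted as failed in the change */
    bool kick_requested;  /* stands in for kick_node_from_cluster() */
};

/* Assumed current policy: only PROCDOWN leads to a fencing request. */
bool sketch_current_wants_kick(int reason)
{
    return reason == SKETCH_REASON_PROCDOWN;
}

/* Policy asked about in this mail: NODEDOWN requests fencing as well. */
bool sketch_proposed_wants_kick(int reason)
{
    return reason == SKETCH_REASON_PROCDOWN ||
           reason == SKETCH_REASON_NODEDOWN;
}

/* Handle a node that appears in the "left" list of a CPG confchg. */
void sketch_handle_left(struct sketch_node *node, int reason,
                        bool (*wants_kick)(int))
{
    /* Both failure reasons mark the node as failed ... */
    if (reason == SKETCH_REASON_NODEDOWN ||
        reason == SKETCH_REASON_PROCDOWN)
        node->failed = true;

    /* ... but whether fencing is requested depends on the policy. */
    if (wants_kick(reason))
        node->kick_requested = true;
}
```

With the first policy, a NODEDOWN for node 1543767306 leaves failed set and
kick_requested clear, which matches what the dump shows. With the second
policy the same event would also set kick_requested.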
Setup is:
corosync-1.4.1
openais-1.1.4
pacemaker-tip
clusterlib-3.1.1
dlm_controld.pcmk from 3.0.17
lvm2-cluster-2.0.85

Best,
Vladislav