From: Vladislav Bogdanov
Date: Thu, 01 Sep 2011 16:02:30 +0300
Subject: [Cluster-devel] (Repost from linux-cluster) Handling of CPG_REASON_NODEDOWN in daemons
In-Reply-To: <4E4EC851.9030608@hoster-ok.com>
References: <4E4EC851.9030608@hoster-ok.com>
Message-ID: <4E5F8266.3020907@hoster-ok.com>
To: cluster-devel.redhat.com

No reply... Did I ask something extremely stupid?

One more addition: pacemaker saw a transitional membership change event
at the same time.

On 19.08.2011 23:32, Vladislav Bogdanov wrote:
> Hi all,
>
> I originally posted this to the linux-cluster list, but got no answer
> there, so I suspect this list is more suitable.
>
> Several days ago I found that the clvmd CPG in my cluster had gone into
> the kern_stop state after some problems on the corosync ring caused by
> high load.
>
> The cluster currently contains three nodes: two bare-metal and one VM.
> The VM was starved of CPU time because of host load, so the cluster
> went split-brain for about one second and then quickly recovered.
>
> CPG issued a CPG_REASON_NODEDOWN event, after which clvmd went to
> kern_stop on the two bare-metal nodes and to kern_stop,fencing on the
> VM (naturally, since it did not have quorum).
>
> I expected the VM to be fenced, but no fencing actually happened. The
> clvmd CPG stayed stuck in kern_stop even after I fenced the VM
> manually, so I had to take the whole cluster down to recover.
>
> I found the reason why the node was not fenced on the
> CPG_REASON_NODEDOWN event.
>
> Here is what I see in the dlm_tool dump output:
> 1313579105 Processing membership 80592
> 1313579105 Skipped active node 939787530: born-on=80580, last-seen=80592, this-event=80592, last-event=80580
> 1313579105 Skipped active node 956564746: born-on=80564, last-seen=80592, this-event=80592, last-event=80580
> 1313579105 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1543767306"
> 1313579105 Removed inactive node 1543767306: born-on=80572, last-seen=80580, this-event=80592, last-event=80580
> 1313579105 dlm:controld conf 2 0 1 memb 939787530 956564746 join left 1543767306
> 1313579105 dlm:ls:clvmd conf 2 0 1 memb 939787530 956564746 join left 1543767306
> 1313579105 clvmd add_change cg 4 remove nodeid 1543767306 reason 3
> 1313579105 clvmd add_change cg 4 counts member 2 joined 0 remove 1 failed 1
> 1313579105 clvmd stop_kernel cg 4
> 1313579105 write "0" to "/sys/kernel/dlm/clvmd/control"
> 1313579105 Node 1543767306/mgmt01 has not been shot yet
> 1313579105 clvmd check_fencing 1543767306 wait add 1313562825 fail 1313579105 last 0
> 1313579107 Node 1543767306/mgmt01 was last shot 'now'
> 1313579107 clvmd check_fencing 1543767306 done add 1313562825 fail 1313579105 last 1313579107
> 1313579107 clvmd check_fencing done
>
> This means that dlm_controld received a CPG_REASON_NODEDOWN event
> (reason 3) for the clvmd CPG but did not call kick_node_from_cluster(),
> so pacemaker did not fence the node on behalf of the clvmd CPG.
>
> Please correct me if I'm wrong:
> * Requesting fencing of a node on a CPG_REASON_NODEDOWN event was
>   historically left to groupd.
> * That's why all the daemons (fenced, dlm_controld, gfs2_controld) call
>   kick_node_from_cluster() only on a CPG_REASON_PROCDOWN event, not on
>   CPG_REASON_NODEDOWN.
> * groupd is obsolete in 3.x.
>
> Shouldn't the daemons request fencing on CPG_REASON_NODEDOWN too?
> Currently they only mark the node as failed and increment the cg's
> failed count.
>
> I use a pacemaker-based setup and actually run only the (obsolete)
> dlm_controld.pcmk, but the problem seems to be a bit wider than that
> one daemon.
>
> Setup is:
> corosync-1.4.1
> openais-1.1.4
> pacemaker-tip
> clusterlib-3.1.1
> dlm_controld.pcmk from 3.0.17
> lvm2-cluster-2.02.85
>
> Best,
> Vladislav
>
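To make the question concrete, here is a minimal, self-contained C model of the decision being discussed: which CPG left-list reasons lead to a kick_node_from_cluster() request. It is a simplified sketch of the behaviour described in the quoted message, not code taken from fenced, dlm_controld or gfs2_controld. The reason values mirror corosync's cpg.h ("reason 3" in the dump is CPG_REASON_NODEDOWN). The struct left_member type, the should_kick_current()/should_kick_proposed()/handle_left_list() helpers and the printing kick_node_from_cluster() stub are all hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reason codes with the values used in corosync's cpg.h, redeclared here
 * so the sketch builds without corosync headers. */
enum {
    CPG_REASON_JOIN     = 1,
    CPG_REASON_LEAVE    = 2,
    CPG_REASON_NODEDOWN = 3,
    CPG_REASON_NODEUP   = 4,
    CPG_REASON_PROCDOWN = 5
};

/* Hypothetical, simplified stand-in for corosync's struct cpg_address
 * (one entry of the left list delivered to the confchg callback). */
struct left_member {
    uint32_t nodeid;
    uint32_t pid;
    uint32_t reason;
};

/* Hypothetical stub for the daemons' kick_node_from_cluster(): it only
 * prints a line instead of asking the cluster stack to fence the node. */
static void kick_node_from_cluster(uint32_t nodeid)
{
    printf("kick/fence request for node %u\n", (unsigned)nodeid);
}

/* Behaviour as described in the message: only PROCDOWN triggers a kick. */
static int should_kick_current(uint32_t reason)
{
    return reason == CPG_REASON_PROCDOWN;
}

/* Proposed behaviour: a node that vanished entirely (NODEDOWN) is kicked
 * too, so fencing is requested even without groupd. */
static int should_kick_proposed(uint32_t reason)
{
    return reason == CPG_REASON_PROCDOWN || reason == CPG_REASON_NODEDOWN;
}

/* Walk a left list and request a kick for every entry the policy selects.
 * Returns the number of kick requests issued. */
static int handle_left_list(const struct left_member *left, size_t n,
                            int (*should_kick)(uint32_t reason))
{
    int issued = 0;

    assert(left != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (should_kick(left[i].reason)) {
            kick_node_from_cluster(left[i].nodeid);
            issued++;
        }
    }
    return issued;
}
```

For the removal in the dump (node 1543767306 leaving with reason 3), handle_left_list() issues no kick request under the current policy and one under the proposed policy, which is exactly the difference the question is about.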