From: David Teigland
Date: Wed, 3 Oct 2012 12:24:11 -0400
Subject: [Cluster-devel] fence daemon problems
In-Reply-To: <24E144B8C0207547AD09C467A8259F75576A155B@lisa.maurer-it.com>
References: <24E144B8C0207547AD09C467A8259F755768AE73@lisa.maurer-it.com>
 <24E144B8C0207547AD09C467A8259F755769CF56@lisa.maurer-it.com>
 <20121003144614.GB12614@redhat.com>
 <24E144B8C0207547AD09C467A8259F75576A155B@lisa.maurer-it.com>
Message-ID: <20121003162411.GC12614@redhat.com>
To: cluster-devel.redhat.com

On Wed, Oct 03, 2012 at 04:12:10PM +0000, Dietmar Maurer wrote:
> > Yes, it's a stateful partition merge, and I think /var/log/messages should have
> > mentioned something about that. When a node is partitioned from the
> > others (e.g. network disconnected), it has to be cleanly reset before it's
> > allowed back. "cleanly reset" typically means rebooted. If it comes back
> > without being reset (e.g. network reconnected), then the others ignore it,
> > which is what you saw.
>
> What message should I look for?

I was wrong; I was thinking of the "daemon node %d stateful merge"
messages, which are debug messages but should probably be changed to
errors.

> I don't really understand why 'dlm_controld' initiates fencing, although
> the node does not have quorum?
>
> I thought 'dlm_controld' should wait until the cluster is quorate before
> starting fence actions?

I guess you're talking about the dlm_tool ls output? The "fencing" there
means it is waiting for fenced to finish fencing before it starts dlm
recovery. fenced waits for quorum.

hp2:~# dlm_tool ls
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
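A minimal Python sketch of how one might spot lockspaces in this state, assuming the `dlm_tool ls` layout shown above stays one "label value" field per line with each lockspace starting at a "name" line (other dlm_tool versions may format it differently). The helper names `parse_lockspaces` and `waiting_for_fencing` are hypothetical, and `SAMPLE` is the output quoted above. The check only looks for the word "fencing" after the `wait_condition` number and does not interpret the number itself.

```python
"""Hypothetical helper: list lockspaces whose dlm recovery is held for fencing.

Assumes the plain-text `dlm_tool ls` layout quoted above; this is an
illustration, not part of dlm_tool or dlm_controld.
"""

# Output copied from the message above (node hp2).
SAMPLE = """dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
"""

# Field labels as they appear in the output above.
LABELS = ("new change", "new status", "new members",
          "name", "id", "flags", "change", "members")


def parse_lockspaces(text: str) -> list[dict[str, str]]:
    """Split `dlm_tool ls` text into one {label: value} dict per lockspace."""
    spaces: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        for label in LABELS:
            if line == label or line.startswith(label + " "):
                value = line[len(label):].strip()
                if label == "name":
                    # A "name" line begins a new lockspace record.
                    spaces.append({})
                if spaces:
                    spaces[-1][label] = value
                break
    return spaces


def waiting_for_fencing(spaces: list[dict[str, str]]) -> list[str]:
    """Names of lockspaces whose pending change waits on fenced."""
    blocked = []
    for ls in spaces:
        status = ls.get("new status", "").split()
        if "wait_condition" in status:
            i = status.index("wait_condition")
            # The word after the wait_condition number names the condition.
            if status[i + 2:i + 3] == ["fencing"]:
                blocked.append(ls["name"])
    return blocked


if __name__ == "__main__":
    print(waiting_for_fencing(parse_lockspaces(SAMPLE)))
```

On a live node the text could come from `subprocess.run(["dlm_tool", "ls"], capture_output=True, text=True).stdout`. A lockspace reported here is not stuck inside dlm itself: it starts recovery once fenced finishes fencing, and fenced first waits for quorum.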