From: Chrissie Caulfield
Date: Mon, 02 Mar 2009 07:59:58 +0000
Subject: [Cluster-devel] unfencing (cman startup)
In-Reply-To: <1235756800.27848.211.camel@cerberus.int.fabbione.net>
Message-ID: <49AB91FE.6000207@redhat.com>
To: cluster-devel.redhat.com

Fabio M. Di Nitto wrote:
> On Fri, 2009-02-27 at 09:52 -0600, David Teigland wrote:
>> On Fri, Feb 27, 2009 at 12:54:20PM +0000, Chrissie Caulfield wrote:
>>>>>> Given the time at which fence_node -U will fire, you probably want to
>>>>>> add a cman_init + cman_is_active + cman_finish loop in fence_node to
>>>>>> make sure cman is ready to reply to our ccs queries, otherwise we might
>>>>>> have a race condition at boot time (it might be already there.. didn't
>>>>>> really check the code). All our daemons do that to give cman time to
>>>>>> bootstrap.
>>>>> Yes, good point. I wonder if we'd be better off having cman_tool join
>>>>> effectively do an is_active wait before exiting? Then we could probably
>>>>> avoid doing it many other places. (It's also annoying when corosync crashes
>>>>> after is_active completes, but before I've read what I need from cman/ccs.)
>>> Err, cman_tool already does this with the -w switch, and the init script
>>> uses it.
>> Great, so the constant flogging to add cman_is_active checks everywhere will
>> end!?
>> Can I remove all my cman_is_active loops?
>
> This works fine via init script. We could theoretically kill all those
> loops but at least for us developers, that start stuff by hand, they
> could still be useful.. and maybe a good failsafe if we ask users to run
> something manually for debugging.. dunno.. just a thought. I don't have
> a strong opinion on this matter.
>

You might as well take them out, to be honest. Those loops are mostly
left over from the RHEL4 cman, where the daemon started up but could then
take 20-30 seconds to form or join a cluster. With openais/corosync, once
the daemon is up you can talk to it. It might not be quorate ... but that
IS your problem :-)

Chrissie
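For anyone who still starts daemons by hand (the case Fabio raises), the startup wait being discussed amounts to a bounded poll. The following C code is a minimal sketch under stated assumptions, not code from the cluster tree. `wait_until_ready` and `ready_probe_fn` are hypothetical names. The probe stands in for a cman_init + cman_is_active + cman_finish check, and the comment shows that check only as assumed usage of libcman. On a normal boot, `cman_tool join -w` in the init script already performs this wait.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical probe type: returns nonzero once the cluster manager answers.
 * In a real daemon this might wrap libcman roughly like so (assumed usage,
 * check libcman.h for the exact semantics):
 *   cman_handle_t ch = cman_init(NULL);
 *   int up = ch ? cman_is_active(ch) : 0;
 *   if (ch) cman_finish(ch);
 *   return up;
 */
typedef int (*ready_probe_fn)(void *ctx);

/* Hypothetical helper: call probe() up to max_tries times, sleeping
 * delay_s seconds between failed attempts (not after the last one).
 * Returns 0 as soon as the probe reports ready, -1 if it never does. */
static int wait_until_ready(ready_probe_fn probe, void *ctx,
                            int max_tries, unsigned int delay_s)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (probe(ctx))
            return 0;
        if (attempt + 1 < max_tries)
            sleep(delay_s);
    }
    return -1;
}
```

Bounding the retries, instead of looping forever, means a hand-started tool reports an error if corosync never comes up rather than hanging silently.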