From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabio M. Di Nitto
Date: Mon, 29 Aug 2011 09:38:38 +0200
Subject: [Cluster-devel] Question about /etc/init.d/cman start
In-Reply-To: <24E144B8C0207547AD09C467A8259F75377CB397@lisa.maurer-it.com>
References: <24E144B8C0207547AD09C467A8259F75377CB339@lisa.maurer-it.com> <4E5B336F.4020908@redhat.com> <24E144B8C0207547AD09C467A8259F75377CB397@lisa.maurer-it.com>
Message-ID: <4E5B41FE.1090309@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 08/29/2011 09:01 AM, Dietmar Maurer wrote:
>> It is actually configurable via /etc/sysconfig/cman (or /etc/defaults/cman on
>> debian based systems)
>>
>> # CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
>> # startup quorum is needed by many other applications, so we may as
>> # well wait here. If CMAN_QUORUM_TIMEOUT is zero, quorum will
>> # be ignored.
>> [ -z "$CMAN_QUORUM_TIMEOUT" ] && CMAN_QUORUM_TIMEOUT=45
>>
>> Setting CMAN_QUORUM_TIMEOUT=0 will simply stop waiting for quorum and
>> continue the execution of the init script.
>
> Sure, but I want to wait for quorum.
>
>> Assuming you want to retain the default behavior, once quorum is gained, it is
>> enough to execute /etc/init.d/cman start again. The script is clever enough to
>> start only what is necessary.
>> You have a good point regarding cmannotifyd. In theory it could be used to
>> trigger a "/etc/init.d/cman start" once quorum is achieved and notification
>> dispatched. I can fix this upstream, but for any RHEL6 changes, I'll need you to
>
> I compile my own packages for debian, so a fix for upstream would be great. I am just unsure
> if we should call unfence_self() from cmannotifyd. I guess it is OK if we check that
> we got quorum for the first time?

No you can't call unfencing from cmannotifyd.
I honestly don't recall all the details on why, but one of the reasons
is (for example):

- node 1 and node 2
- node 1 starts experiencing network problems
- node 2 fence-scsi node 1
- node 1 unfences itself in a non-clean state due to cman notifications
  of up/downs.
- cluster goes kaboom.

> Besides, why do you want that extra complexity running 'cman start' from
> cmannotifyd? Especially error handling is somehow unclear to me (what if
> cman start fails there?).

Well, it's one way to do it. cmannotifyd (as documented) does not provide
error handling itself. The reason is that you can't really halt all
cluster operations because a bad script is activated by a "random" user
via cmannotifyd.

> So can't we simple make those daemon smart enough so that we can start
> them at boot time (always)?

They are smart enough. You are misreading the comments about waiting for
quorum in the cman init script.

The daemons can be safely started at boot time, even without quorum, but
they can't do anything useful until quorum is achieved. That is why it is
possible to override the wait for quorum.

Most users have requested, and want, to wait for quorum and fail if there
is no quorum, since it really doesn't help to have more daemons running
on top of cman.

So maybe what you want is an option to: wait for quorum, and if there is
no quorum after the timeout, still allow everything else to start?

Fabio
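
For illustration only, the option proposed above (wait for quorum, but
carry on after the timeout instead of failing) could be sketched roughly
as below. This is not the actual cman init script. The function names
wait_for_quorum_soft and start_after_quorum are hypothetical, and the
quorum check is passed in as an arbitrary command because the sketch
makes no assumption about which tool reports quorum on a given system.

```shell
#!/bin/sh
# Hypothetical sketch; none of these function names exist in the real
# cman init script.

# wait_for_quorum_soft TIMEOUT CHECK_CMD [ARGS...]
# Runs CHECK_CMD once per second for up to TIMEOUT seconds.
# Returns 0 as soon as CHECK_CMD succeeds (quorum reached),
# or 1 if the timeout expires first (including TIMEOUT=0).
wait_for_quorum_soft() {
    timeout=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# start_after_quorum TIMEOUT CHECK_CMD [ARGS...]
# Waits for quorum, but does not treat a timeout as a failure:
# the remaining daemons are started either way.
start_after_quorum() {
    if wait_for_quorum_soft "$@"; then
        echo "quorum reached, starting daemons"
    else
        echo "no quorum after timeout, starting daemons anyway"
    fi
    # The real daemon start-up steps would go here (assumption).
    return 0
}
```

Calling "start_after_quorum 45 some_quorum_check" would then wait up to
45 seconds and continue regardless of the result, where
some_quorum_check stands in for whatever command exits 0 once the
cluster is quorate. Note that this only covers the start-up ordering; it
says nothing about unfencing, which still must not be driven from
cmannotifyd for the reason given above.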