From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Peterson
Date: Mon, 27 Nov 2006 10:22:25 -0600
Subject: [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
In-Reply-To: <4566D102.3060707@ubuntu.com>
References: <4566D102.3060707@ubuntu.com>
Message-ID: <456B10C1.8090604@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Fabio Massimo Di Nitto wrote:
> Hi guys,
>
> I found a corner case where calling fence_tools -w leave will/might hang.
> in my setup where i have 2 nodes cluster:
>
> - both nodes are up
> - poweroff the first one -> OK
> - reboot the second one -> OK
> - the second node comes up again:
>
> cman_tools services will show:
> fence 0 default 00040001 JOIN_START_WAIT
>
> since the first node is "dead" there is never a complete switch to state = none.
>
> if you call fence_tools -w leave it will hang there forever.
>
> in my init scripts I just changed the fence_stop() to use the usual wait 10
> seconds or die kind of loop:
>
> fence_tool -w leave &
> for sec in $(seq 1 10); do
>     if pidof fence_tool &> /dev/null; then
>         if [ "$sec" = 10 ]; then
>             kill $(pidof fence_tool) > /dev/null 2>&1
>         else
>             sleep 1
>         fi
>     fi
> done
>
> Regards
> Fabio
>
> PS I spotted this problem when updating the Ubuntu init scripts, but the code
> used in upstream init script seems to suffer the exact same problem. You also
> want to note that i am not checking for fenced to exit, but for the tools to return.
>

Hi Fabio,

You should be able to do the same thing by specifying -t 10 for a
ten-second timeout on fence_tool. For example:

fence_tool -t 10 -w leave

The default timeout value is five minutes, which means the hang
shouldn't last forever at any rate.

Regards,

Bob Peterson
Red Hat Cluster Suite
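
If a shell-level timeout is still wanted in the init script, the "wait 10 seconds or die" loop from the quoted message could be written roughly as below. This is only a sketch under assumptions: run_with_timeout is a hypothetical helper name, not something in the upstream init script, and the fence_stop() shape in the trailing comment is assumed. It tracks the background PID through $! rather than pidof, so it only ever signals the process it started, and it stops looping as soon as the command returns.

```shell
#!/bin/sh
# Hypothetical helper (not from the upstream init script): run a command in
# the background and give it at most $1 seconds to finish.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124   # same "timed out" status that coreutils timeout(1) uses
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished by itself: pass its exit status through.
    wait "$pid"
}

# Assumed usage inside an init script:
# fence_stop() { run_with_timeout 10 fence_tool -w leave; }
```

Where fence_tool supports it, the -t 10 option shown above remains the simpler choice, since the tool then handles its own timeout.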