From: Fabio Massimo Di Nitto
Date: Fri, 24 Nov 2006 12:01:22 +0100
Subject: [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
Message-ID: <4566D102.3060707@ubuntu.com>
To: cluster-devel.redhat.com

Hi guys,

I found a corner case where calling fence_tool -w leave will (or at least might) hang.

In my setup I have a 2-node cluster:

- both nodes are up
- power off the first one -> OK
- reboot the second one -> OK
- the second node comes up again, and cman_tool services shows:

	fence 0 default 00040001 JOIN_START_WAIT

Since the first node is "dead", the fence service never completes the switch to state = none. If you call fence_tool -w leave at that point, it hangs there forever.

In my init scripts I simply changed fence_stop() to use the usual "wait 10 seconds or kill it" loop:

	fence_tool -w leave &
	for sec in $(seq 1 10); do
		if pidof fence_tool > /dev/null 2>&1; then
			if [ "$sec" = 10 ]; then
				# still running on the last pass: give up and kill it
				kill $(pidof fence_tool) > /dev/null 2>&1
			else
				sleep 1
			fi
		else
			# fence_tool has returned, stop polling
			break
		fi
	done

Regards
Fabio

PS I spotted this problem while updating the Ubuntu init scripts, but the upstream init script seems to suffer from exactly the same problem. Note also that I am not checking for fenced to exit, only for the tool to return.

-- 
I'm going to make him an offer he can't refuse.
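The same "wait N seconds or kill it" idea can be sketched as a reusable POSIX sh function. This is a hypothetical helper, not part of cman or its init scripts. One difference from the loop above is that it tracks the exact child via $! instead of pidof, so it cannot accidentally kill an unrelated fence_tool instance. The exit code 124 on timeout is an assumption borrowed from the GNU coreutils timeout convention.

```shell
#!/bin/sh
# wait_or_kill TIMEOUT CMD [ARGS...]
# Hypothetical helper: run CMD in the background, poll once per second,
# and send SIGTERM if it is still running after TIMEOUT seconds.
# Returns CMD's own exit status, or 124 if it had to be killed.
wait_or_kill() {
	timeout_secs=$1
	shift
	"$@" &
	pid=$!          # this exact child, unlike pidof which matches any instance
	elapsed=0
	# kill -0 sends no signal; it only tests whether the process still exists
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$elapsed" -ge "$timeout_secs" ]; then
			kill "$pid" 2>/dev/null
			wait "$pid" 2>/dev/null    # reap the killed child
			return 124
		fi
		sleep 1
		elapsed=$((elapsed + 1))
	done
	# the child finished on its own: propagate its exit status
	wait "$pid"
}

# Possible use in fence_stop() (sketch):
#   wait_or_kill 10 fence_tool -w leave
```

A caller can then distinguish "fence_tool returned" from "fence_tool was stuck and got killed" by checking for status 124, rather than inferring it from pidof.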