* [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
@ 2006-11-24 11:01 Fabio Massimo Di Nitto
  2006-11-27 16:22 ` Robert Peterson
  0 siblings, 1 reply; 4+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-11-24 11:01 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi guys,

I found a corner case where calling fence_tool -w leave might hang.
In my two-node cluster setup:

- both nodes are up
- poweroff the first one -> OK
- reboot the second one -> OK
- the second node comes up again:

cman_tool services will show:
fence            0     default  00040001 JOIN_START_WAIT

Since the first node is "dead", the state never fully switches to "none".

If you call fence_tool -w leave at this point, it will hang there forever.
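
To make the stuck state easier to spot from a script, here is a small sketch of a hypothetical helper, fence_join_pending (not part of cman), that scans cman_tool services output for the default fence domain and reports whether its state still ends in _WAIT, as JOIN_START_WAIT does. It assumes the five-column layout of the line quoted above (type, level, name, id, state) and ignores every other line; check the real output of your cman version before relying on it.

```shell
# Hypothetical helper: read "cman_tool services" output on stdin and succeed
# (exit status 0) if the "default" fence domain is in a *_WAIT state.
# Assumes the column layout shown above: type, level, name, id, state.
fence_join_pending() {
    awk '$1 == "fence" && $3 == "default" && $5 ~ /_WAIT$/ { found = 1 }
         END { exit !found }'
}
```

For example, cman_tool services | fence_join_pending && echo "fence join still pending" would print the message in the situation described above.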

In my init scripts I just changed fence_stop() to use the usual "wait 10
seconds or die" kind of loop:

         fence_tool -w leave &
         pid=$!
         for sec in $(seq 1 10); do
                 # stop waiting as soon as this fence_tool has returned
                 kill -0 "$pid" 2>/dev/null || break
                 if [ "$sec" = 10 ]; then
                         # still running after ~9 seconds: give up on it
                         kill "$pid" > /dev/null 2>&1
                 else
                         sleep 1
                 fi
         done

Regards
Fabio

PS: I spotted this problem while updating the Ubuntu init scripts, but the code
in the upstream init script seems to suffer from exactly the same problem. Also
note that I am not checking for fenced to exit, only for the tool to return.

-- 
I'm going to make him an offer he can't refuse.




* [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
  2006-11-24 11:01 [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related) Fabio Massimo Di Nitto
@ 2006-11-27 16:22 ` Robert Peterson
  2006-11-27 16:33   ` Fabio Massimo Di Nitto
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Peterson @ 2006-11-27 16:22 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Fabio Massimo Di Nitto wrote:
> in my init scripts I just changed the fence_stop() to use the usual wait 10
> seconds or die kind of loop:
>
>          fence_tool -w leave &
>          for sec in $(seq 1 10); do
>                  if pidof fence_tool &> /dev/null; then
>                          if [ "$sec" = 10 ]; then
>                                  kill $(pidof fence_tool) > /dev/null 2>&1
>                          else
>                                  sleep 1
>                          fi
>                  fi
>          done
>
>   
Hi Fabio,

You should be able to do the same thing by specifying -t 10 on fence_tool
for a ten-second timeout. For example:

fence_tool -t 10 -w leave

The default timeout value is five minutes, which means the hang shouldn't
last forever at any rate.
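
A minimal sketch of a fence_stop() that relies on the -t option instead of a background-and-kill loop. The function is hypothetical (the real init script may be structured differently), and it assumes fence_tool exits with a non-zero status when the -t timeout expires; since the option is not in the man page, check the fence_tool source for your version before relying on that.

```shell
# Hypothetical fence_stop(): bound the wait with fence_tool's own -t timeout
# (in seconds) instead of backgrounding fence_tool and killing it later.
fence_stop() {
    # Assumption: fence_tool returns non-zero if "leave" does not finish in time.
    if fence_tool -t 10 -w leave; then
        return 0
    fi
    echo "fence_tool: leave did not complete within 10 seconds" >&2
    return 1
}
```

Keeping the timeout inside the tool means the script never has to locate or kill the fence_tool process itself.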

Regards,

Bob Peterson
Red Hat Cluster Suite




* [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
  2006-11-27 16:22 ` Robert Peterson
@ 2006-11-27 16:33   ` Fabio Massimo Di Nitto
  2006-11-27 23:38     ` Robert Peterson
  0 siblings, 1 reply; 4+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-11-27 16:33 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Robert Peterson wrote:
> Fabio Massimo Di Nitto wrote:

>>   
> Hi Fabio,
> 
> You should be able to do the same thing by specifying -t 10 for a
> ten-second timeout on fence_tool.  For example:
> 
> fence_tool -t 10 -w leave
> 
> The default timeout value is five minutes, which means the hang
> shouldn't last forever at any rate.

OK!

Would it be possible to update the man page to document this option?
It's not mentioned there (and, my bad, I didn't look at the code), which is
why I had been looking for a workaround.

Thanks
Fabio





* [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)
  2006-11-27 16:33   ` Fabio Massimo Di Nitto
@ 2006-11-27 23:38     ` Robert Peterson
  0 siblings, 0 replies; 4+ messages in thread
From: Robert Peterson @ 2006-11-27 23:38 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Fabio Massimo Di Nitto wrote:
> Ok!
>
> would it be possible to update the man page to reflect this option?
> It's not mentioned there (and well my bad i didn't look at the code) and that's
> why i have been looking for a workaround.
>
> Thanks
> Fabio
Hi Fabio,

I opened a Bugzilla bug to fix this problem and put you on the cc
list too. See:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217460

Regards,

Bob Peterson
Red Hat Cluster Suite



