[Cluster-devel] unfence during startup


* [Cluster-devel] unfence during startup
@ 2009-11-06 17:27 David Teigland
  2009-11-12 17:50 ` Lon H. Hohberger
From: David Teigland @ 2009-11-06 17:27 UTC
  To: cluster-devel.redhat.com

The current init.d/cman startup sequence is:

start_cman
unfence_self
start_qdiskd
wait_for_quorum
start_fenced
start_dlm_controld
start_gfs_controld
join_fence_domain

I believe the reason we put unfence between cman and qdisk was in case the
qdisk was on a fenced device.  But, I'd forgotten about the more critical
case where someone runs 'service cman start' on a node after it has been
kicked out of the cluster and has been fenced (via fence_scsi).  This is
not too uncommon for someone to try -- they think they can just restart
the cluster on the node without first rebooting.  We go to a lot of
trouble in fenced and other daemons to recognize when someone does that
and shut things down again before getting far enough to corrupt storage.

Obviously, unfencing right at the beginning undercuts all those checks and
precautions, and could easily lead to corrupt storage.  So, we need to
move unfence to just before the join_fence_domain step.  Requiring a qdisk
to use a disk not subject to fencing shouldn't be too onerous?
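
A minimal sketch of the reordered sequence proposed above, with
unfence_self moved to just before join_fence_domain.  The step names are
the ones listed earlier in this message; STEPS, run_step, and start are
hypothetical names used only for illustration, and the step bodies are
placeholders rather than the real init.d/cman functions.

```shell
#!/bin/sh
# Sketch only: the proposed order, with unfence_self moved from second
# position to just before join_fence_domain.
STEPS="start_cman start_qdiskd wait_for_quorum start_fenced start_dlm_controld start_gfs_controld unfence_self join_fence_domain"

run_step() {
    # Placeholder: print the step name instead of doing the work.
    # A real script would call the step's function here and return
    # its exit status.
    echo "$1"
}

start() {
    # Run the steps in order and stop at the first failure, so a node
    # that the daemons refuse to readmit never reaches unfence_self.
    for step in $STEPS; do
        run_step "$step" || return 1
    done
}

start
```

Stopping at the first failed step is the point of the reordering: the
checks in fenced and the other daemons get a chance to reject the node
before its access to storage is restored.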

Dave


* [Cluster-devel] unfence during startup
  2009-11-06 17:27 [Cluster-devel] unfence during startup David Teigland
@ 2009-11-12 17:50 ` Lon H. Hohberger
From: Lon H. Hohberger @ 2009-11-12 17:50 UTC
  To: cluster-devel.redhat.com

On Fri, 2009-11-06 at 11:27 -0600, David Teigland wrote:
> The current init.d/cman startup sequence is:
> 
> start_cman
> unfence_self
> start_qdiskd
> wait_for_quorum
> start_fenced
> start_dlm_controld
> start_gfs_controld
> join_fence_domain
> 
> I believe the reason we put unfence between cman and qdisk was in case the
> qdisk was on a fenced device.  But, I'd forgotten about the more critical
> case where someone runs 'service cman start' on a node after it has been
> kicked out of the cluster and has been fenced (via fence_scsi).  This is
> not too uncommon for someone to try -- they think they can just restart
> the cluster on the node without first rebooting.  We go to a lot of
> trouble in fenced and other daemons to recognize when someone does that
> and shut things down again before getting far enough to corrupt storage.
> 
> Obviously, unfencing right at the beginning undercuts all those checks and
> precautions, and could easily lead to corrupt storage.  So, we need to
> move unfence to just before the join_fence_domain step.  Requiring a qdisk
> to use a disk not subject to fencing shouldn't be too onerous?

It shouldn't matter -- it's what we require today with fence_scsi.

Alternatively, we can make qdiskd check for this sort of thing as well.
It might be more trouble than it's worth, but qdiskd already has a
'stop_cman' flag which will kill cman if qdiskd detects a critical error
(e.g. trying to rejoin a cluster).

-- Lon


