From mboxrd@z Thu Jan  1 00:00:00 1970
From: Robert Peterson <rpeterso@redhat.com>
Date: Tue, 20 Jun 2006 14:43:50 -0500
Subject: [Cluster-devel] cluster/group/daemon cman.c cpg.c gd_internal. ...
In-Reply-To: <20060620191951.GA12160@redhat.com>
References: <20060620180914.11020.qmail@sourceware.org>
	<449844DB.8040700@redhat.com> <20060620191951.GA12160@redhat.com>
Message-ID: <44984FF6.9010406@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

David Teigland wrote:
> Might be a good idea, I don't really know.  I'm not even sure we'd need to
> save much or any additional state that couldn't be pulled from the gfs/dlm
> instances themselves.  It seems to me the challenge would be writing the
> daemons so they could put all the pieces and interconnections back
> together again.
>
> If this ends up being a big enough problem to get more attention, I think
> the first practical improvement we could make is something like
> blocking/clearing i/o from the residual fs's (like we do in withdraw) and
> adding the ability to fully purge instances of gfs/dlm from the kernel
> without rebooting the node.  Then the machines could all start from
> scratch without rebooting or fencing

Here's another idea that came to me:

For critical cluster processes like cman and fenced, maybe we could use
init's ability to restart processes, i.e. the "respawn" option in
/etc/inittab.  Maybe we can use "respawn" or something similar to ensure
that if a critical process like fenced dies, it gets restarted
automatically and immediately.  Of course, that might cause problems for
shutdown, etc., and it would probably make it harder to test certain
things...
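
To make that concrete, here is a rough, untested sketch of what such an
inittab entry might look like.  The "fncd" id, the runlevels and the
/sbin/fenced path are just placeholders I picked for illustration, and I'm
assuming fenced's -D option is what keeps it in the foreground; that should
be checked against the fenced man page before anyone relies on it.

```
# /etc/inittab fragment (illustrative only, not tested)
# Fields are id:runlevels:action:process
# "respawn" tells init to restart the process whenever it exits.
# The id "fncd", runlevels 2345 and the path are assumptions.
# The process must not fork into the background, otherwise init sees it
# exit right away and respawns it in a loop; -D is assumed to prevent that.
fncd:2345:respawn:/sbin/fenced -D
```

After editing inittab, "telinit q" makes init re-read the file.  Note that
init would restart fenced only in the listed runlevels, so a normal
shutdown (runlevel 0 or 6) should stop the respawning, but anything that
kills fenced on purpose during testing would have to remove or comment out
the entry first.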

Bob Peterson
Red Hat Cluster Suite