From: Robert Peterson
Date: Tue, 20 Jun 2006 14:43:50 -0500
Subject: [Cluster-devel] cluster/group/daemon cman.c cpg.c gd_internal. ...
In-Reply-To: <20060620191951.GA12160@redhat.com>
References: <20060620180914.11020.qmail@sourceware.org> <449844DB.8040700@redhat.com> <20060620191951.GA12160@redhat.com>
Message-ID: <44984FF6.9010406@redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

David Teigland wrote:
> Might be a good idea, I don't really know. I'm not even sure we'd need to
> save much or any additional state that couldn't be pulled from the gfs/dlm
> instances themselves. It seems to me the challenge would be writing the
> daemons so they could put all the pieces and interconnections back
> together again.
>
> If this ends up being a big enough problem to get more attention, I think
> the first practical improvement we could make is something like
> blocking/clearing i/o from the residual fs's (like we do in withdraw) and
> adding the ability to fully purge instances of gfs/dlm from the kernel
> without rebooting the node. Then the machines could all start from
> scratch without rebooting or fencing

Here's another idea: for critical cluster processes like cman and fenced,
maybe we could use init's ability to restart processes, i.e. the "respawn"
action in /etc/inittab (or something similar), so that if a critical
process like fenced dies, it gets restarted automatically and immediately.

Of course, that might cause problems for shutdown, etc., and it would
probably make it harder to test certain things...

Bob Peterson
Red Hat Cluster Suite
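Here is a rough sketch of the respawn idea. A SysV-style /etc/inittab entry has the form id:runlevels:action:process. The entry below is hypothetical. The id "fd", the runlevels, and the /sbin/fenced path are assumptions for illustration and are not taken from any shipped configuration.

```
# Hypothetical entry: restart fenced whenever it exits in runlevels 2-5.
# The id "fd" and the path /sbin/fenced are assumptions.
fd:2345:respawn:/sbin/fenced
```

One caveat applies. init only notices when the process it started exits, so the daemon would have to stay in the foreground instead of forking into the background. If a daemon detaches right away, init sees a process that died. sysvinit will keep restarting it until it disables the entry for "respawning too fast." After editing /etc/inittab, `telinit q` tells init to re-read the file.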