From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Tue, 20 Jun 2006 14:19:51 -0500
Subject: [Cluster-devel] cluster/group/daemon cman.c cpg.c gd_internal. ...
In-Reply-To: <449844DB.8040700@redhat.com>
References: <20060620180914.11020.qmail@sourceware.org> <449844DB.8040700@redhat.com>
Message-ID: <20060620191951.GA12160@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Tue, Jun 20, 2006 at 01:56:27PM -0500, Robert Peterson wrote:
> teigland at sourceware.org wrote:
> > Moving the cluster infrastructure to userland introduced a new
> > problem that we didn't need to worry about before. All cluster
> > state now exists in userland processes which can go away and then
> > come back like new, i.e. unaware of the previous state.
> Hi Dave,
>
> You know this new development cman stuff and I really don't, but I was
> just thinking:
>
> If we used a shared memory segment, we could hold state information
> there and then cman would remember the cluster state after process
> termination and restart, possibly making this whole thing unnecessary.
> Just a thought. Of course, one could also argue that if the process
> terminated, can we really trust the state information it had at the
> time?

Might be a good idea, I don't really know. I'm not even sure we'd need to
save much or any additional state that couldn't be pulled from the gfs/dlm
instances themselves. It seems to me the challenge would be writing the
daemons so they could put all the pieces and interconnections back
together again.

If this ends up being a big enough problem to get more attention, I think
the first practical improvement we could make is something like
blocking/clearing i/o from the residual fs's (like we do in withdraw) and
adding the ability to fully purge instances of gfs/dlm from the kernel
without rebooting the node.
Then the machines could all start from scratch without rebooting or fencing.