From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Ren
Date: Fri, 20 May 2016 17:03:00 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful
In-Reply-To: <20160518185029.GA5193@redhat.com>
References: <1463487013-15512-1-git-send-email-zren@suse.com> <573C114C.8000909@suse.com> <20160518185029.GA5193@redhat.com>
Message-ID: <573ED2C4.8010805@suse.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

On 05/19/2016 02:50 AM, David Teigland wrote:
> On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote:
>> Q1: what's stateful merged node?
>
>> Q2: what if we add the stateful merged nodes to dlm_controld daemon
>> cpg instead of fencing them?
>
> The details here are fundamental to the way dlm works because the dlm
> depends on the properties of Virtual Synchrony.  Partitions obviously
> violate VS.  ("Extended" forms of virtual synchrony deal with partitions,
> but they are not very practical.  Unfortunately, corosync implements one
> of these extended forms of VS, which means any application that requires
> strict VS has to implement an equivalent of this "stateful merging"
> detection that's in the dlm.)
>
> With VS, message/membership events change the state being kept consistent
> among nodes.  When a partition occurs, nodes have divergent events and
> inconsistent state.  The partition is simple to understand, because
> partitioned nodes are indistinguishable from failed nodes and are treated
> as such.  But, if partitioned nodes merge, the inconsistent state has to
> be made consistent.  This must be done in the same way a new node is added
> to an existing node, which means doing "state transfer" from the existing
> node to the new node to make the state consistent between them.
>
> If the "new" node previously had state because of partition/merge, it must
> drop that old state and replace it with the state being transferred to it.
> After this, they will be consistent and can continue.  With a simple
> process, you might just kill it, restart it and add the transferred state.
> But the dlm isn't a process that can simply be restarted, the state is
> spread through applications using it, and through the kernel.  The only
> mechanism for resetting the dlm state is resetting the kernel, which is
> resetting/rebooting the machine.
>
>> if so, CPG $uuid now, e.g. from the perspective of A, may has only one
>> memeber - A itself, it can perform lockspace now because cluster is
>> quorate now (and if we skip fencing); B and C do likewise; then for each
>> node, it looks like every node own this volume; so corruption may happen?
>
> When the nodes are partitioned, the situation is fairly straight forward
> -- each node thinks the others are failed, and normal operation is blocked
> until recovery happens for the failed nodes.
>
> The harder problem is what to do when they merge.  The dlm effectively
> ignores the invalid addition of the merged nodes and calls it a "stateful
> merge".  The merged nodes continue to be considered failed (from the
> partition) and require a full restart before being added.
>

Thanks a lot for explaining this in such detail! I've also shared it
with the pacemaker guys. They'll make the corresponding changes on the
pcmk side once the dlm_controld patch is merged.

I've sent the patch to you. Please take a look ;-)

With best regards,
Eric
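
P.S. To check my understanding of the merge handling you describe, here
is a toy Python model of it. It is only a sketch and not dlm_controld
code: the names Member, Group, partition and reboot are made up for
illustration, and "state" is reduced to a single flag. The assumption is
that a node asking to join while it still holds state from before a
partition is a stateful merge, so it stays failed until it has rebooted.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical stand-in for a cluster node."""
    name: str
    has_state: bool = False      # True once the node has taken part in the group
    needs_restart: bool = False  # True once a stateful merge has been detected


@dataclass
class Group:
    """Hypothetical view of the group as seen by the surviving members."""
    members: dict = field(default_factory=dict)  # name -> accepted Member
    failed: set = field(default_factory=set)     # names still treated as failed

    def add(self, node: Member) -> str:
        """Handle a join request and report what was done with it."""
        if node.has_state:
            # The node kept its old state across the partition, so that state
            # has diverged from ours. Ignore the join and keep it failed.
            self.failed.add(node.name)
            node.needs_restart = True
            return "stateful merge: ignored"
        # A clean node is made consistent by state transfer from a member.
        self.failed.discard(node.name)
        node.has_state = True
        self.members[node.name] = node
        return "added via state transfer"

    def partition(self, name: str) -> None:
        """A partitioned node is indistinguishable from a failed node."""
        self.members.pop(name, None)
        self.failed.add(name)


def reboot(node: Member) -> Member:
    """In this model, a reboot is the only way to drop the old state."""
    return Member(node.name)


if __name__ == "__main__":
    group = Group()
    a, b = Member("A"), Member("B")
    print(group.add(a))
    print(group.add(b))
    group.partition("B")
    print(group.add(b))          # B merges back while still holding old state
    print(group.add(reboot(b)))  # B joins again after a full restart
```

If this matches what you meant, then the message printed by the patch
corresponds to the "stateful merge: ignored" branch above.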