From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Ren <zren@suse.com>
Date: Fri, 20 May 2016 17:03:00 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info
	about stateful
In-Reply-To: <20160518185029.GA5193@redhat.com>
References: <1463487013-15512-1-git-send-email-zren@suse.com>
	<573C114C.8000909@suse.com> <20160518185029.GA5193@redhat.com>
Message-ID: <573ED2C4.8010805@suse.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

On 05/19/2016 02:50 AM, David Teigland wrote:
> On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote:
>> Q1: what's stateful merged node?
>
>> Q2: what if we add the stateful merged nodes to dlm_controld daemon
>> cpg instead of fencing them?
>
> The details here are fundamental to the way dlm works because the dlm
> depends on the properties of Virtual Synchrony.  Partitions obviously
> violate VS.  ("Extended" forms of virtual synchrony deal with partitions,
> but they are not very practical.  Unfortunately, corosync implements one
> of these extended forms of VS, which means any application that requires
> strict VS has to implement an equivalent of this "stateful merging"
> detection that's in the dlm.)
>
> With VS, message/membership events change the state being kept consistent
> among nodes.  When a partition occurs, nodes have divergent events and
> inconsistent state.  The partition is simple to understand, because
> partitioned nodes are indistinguishable from failed nodes and are treated
> as such.  But, if partitioned nodes merge, the inconsistent state has to
> be made consistent.  This must be done in the same way a new node is added
> to an existing node, which means doing "state transfer" from the existing
> node to the new node to make the state consistent between them.
>
> If the "new" node previously had state because of partition/merge, it must
> drop that old state and replace it with the state being transferred to it.
> After this, they will be consistent and can continue.  With a simple
> process, you might just kill it, restart it and add the transferred state.
> But the dlm isn't a process that can simply be restarted; the state is
> spread through applications using it, and through the kernel.  The only
> mechanism for resetting the dlm state is resetting the kernel, which is
> resetting/rebooting the machine.
>
>> if so, CPG $uuid now, e.g. from the perspective of A, may have only one
>> member - A itself; it can perform lockspace operations now because the
>> cluster is quorate now (and if we skip fencing); B and C do likewise; then
>> to each node, it looks like it owns this volume; so corruption may happen?
>
> When the nodes are partitioned, the situation is fairly straightforward
> -- each node thinks the others are failed, and normal operation is blocked
> until recovery happens for the failed nodes.
>
> The harder problem is what to do when they merge.  The dlm effectively
> ignores the invalid addition of the merged nodes and calls it a "stateful
> merge".  The merged nodes continue to be considered failed (from the
> partition) and require a full restart before being added.
>
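
To check my understanding, here is a minimal Python sketch of the rule
described above. It is only an illustration, not the dlm_controld code:
the Group class, its method names and the has_state flag are all
hypothetical, and I assume a joining node can somehow report whether it
still carries old state.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical membership view kept by one surviving node."""
    members: set = field(default_factory=set)
    needs_restart: set = field(default_factory=set)

    def node_lost(self, node: str) -> None:
        # A partitioned node is indistinguishable from a failed one.
        self.members.discard(node)
        self.needs_restart.add(node)

    def node_restarted(self, node: str) -> None:
        # Assumption: we learn somehow (e.g. after fencing) that the
        # node was rebooted, which is the only way to clear its state.
        self.needs_restart.discard(node)

    def node_joined(self, node: str, has_state: bool) -> str:
        if has_state or node in self.needs_restart:
            # Stateful merge: ignore the addition and keep treating
            # the node as failed until it has been fully restarted.
            self.needs_restart.add(node)
            return "stateful merge"
        # Clean node: state transfer from existing members goes here.
        self.members.add(node)
        return "added"


group = Group(members={"A", "B", "C"})
group.node_lost("B")                            # partition: B looks failed
print(group.node_joined("B", has_state=True))   # stateful merge
group.node_restarted("B")
print(group.node_joined("B", has_state=False))  # added
```

The point of the sketch is that a merge never brings a node back by
itself; only a restart followed by a clean join does.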

Thanks a lot for explaining this in such detail! I've also shared it
with the pacemaker guys. They'll make the corresponding changes on the
pcmk side once the dlm_controld patch is merged. I've sent the patch to
you. Please take a look ;-)

With best regards,
Eric