From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Ren
Date: Wed, 18 May 2016 14:53:00 +0800
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful
In-Reply-To: <1463487013-15512-1-git-send-email-zren@suse.com>
References: <1463487013-15512-1-git-send-email-zren@suse.com>
Message-ID: <573C114C.8000909@suse.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

Ken Gaillot asked me this question: since corosync/pcmk can recover from
such a case, why can't DLM? Please see the detailed discussion here:

[1] https://github.com/ClusterLabs/pacemaker/pull/839

Here are my thoughts, but I'm not sure, CMIIW please:

time: T; cluster: A, B, C. Suppose we have a lockspace named after $uuid
for a shared disk volume, and a CPG for lockspace $uuid. The $uuid CPG has
members A, B and C when things are OK, but:

T: quorum lost; the cluster partitions into 3 parts; lockspace $uuid
cannot perform any lockspace operations because the cluster is not quorate;

T+1: quorum regained; the dlm_controld daemon CPG has not done its
merging/fencing stuff;

So here are 2 questions:

Q1: What is a stateful merged node? I've seen the comments in the code;-)
Does it mean a lockspace has been on the node before it sends the protocol
message?

Q2: What if we add the stateful merged nodes to the dlm_controld daemon CPG
instead of fencing them? If so, CPG $uuid, e.g. from the perspective of
A, may now have only one member: A itself. A can perform lockspace
operations because the cluster is quorate again (and if we skip fencing);
B and C do likewise. Then, for each node, it looks like it owns this
volume; so corruption may happen?

Thanks a lot,
Eric

On 05/17/2016 08:10 PM, Eric Ren wrote:
> Hi David,
>
> This is just a draft patch for you to review;-) There's an issue I'm
> not sure: where should we clear "stateful_merge_wait"?
>
> And I need more communications with pacemaker guys and more time for testing.
> I will send you the formal patch if things get done;-)
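
To make the Q2 worry concrete, here is a toy model of it. This is only a sketch, not dlm_controld code: grant_exclusive(), stale_views and clean_views are made-up names, and the locking rule is a deliberately simplified assumption (a node gets the exclusive lock unless a peer it can see in its own lockspace CPG view already holds it).

```python
# Toy model only; none of these names exist in dlm_controld.
def grant_exclusive(views, requesters):
    """Return the set of nodes that end up believing they hold the EX lock.

    views maps each node to the lockspace CPG members it can see.
    A request is blocked only by a visible peer that already holds the lock.
    """
    holders = set()
    for node in requesters:
        peers = views[node]
        if not any(p in holders for p in peers if p != node):
            holders.add(node)
    return holders


# Assumed state after the partition heals WITHOUT fencing:
# each node's lockspace CPG view still contains only itself.
stale_views = {"A": {"A"}, "B": {"B"}, "C": {"C"}}

# Assumed state after B and C are fenced and rejoin cleanly:
# every node sees the same full membership.
clean_views = {n: {"A", "B", "C"} for n in "ABC"}

unsafe = grant_exclusive(stale_views, ["A", "B", "C"])
safe = grant_exclusive(clean_views, ["A", "B", "C"])

print(sorted(unsafe))  # ['A', 'B', 'C'] -> three "exclusive" holders
print(sorted(safe))    # ['A']
```

With the stale views all three nodes believe they hold the same exclusive lock, which is the possible corruption case asked about in Q2; whether the real daemon can actually reach that state is exactly the open question.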