* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
@ 2010-05-13 8:49 Jiaju Zhang
[not found] ` <20100513095117.GM20952@suse.de>
0 siblings, 1 reply; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-13 8:49 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
This is a fix for the membership change judging issue in dlm_controld.pcmk.
Currently, dlm_controld.pcmk gets the membership change information from
Pacemaker, and Pacemaker gets that information from Corosync, which is
fine. But when Pacemaker itself gets the membership change info, it
does some internal processing, such as aligning the node membership
with other node info in the cluster. Until Pacemaker has finished, it
won't treat the node in question as an _active_ member.
At just that moment, dlm_controld.pcmk also learns about the membership
change and goes to read the membership info from Pacemaker. This is a
race condition: Pacemaker hasn't finished all the work for this
membership change -- that is, it hasn't finished updating all the info
in crm_peer_id_cache -- when dlm_controld.pcmk reads it. So if the node
in question is a joining node, it should be regarded as an "Added"
node, but with the current logic it is not.
Because all the components eventually get the membership info from
Corosync, IMO there is no need for dlm_controld.pcmk to wait for
Pacemaker/crmd to finish all the information processing related to a
membership change.
Patch attached below; any review and comments are highly
appreciated!
Thanks,
Jiaju
Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Andrew Beekhof <andrew@beekhof.net>
---
group/dlm_controld/pacemaker.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/group/dlm_controld/pacemaker.c b/group/dlm_controld/pacemaker.c
index 3150a1f..9f90d48 100644
--- a/group/dlm_controld/pacemaker.c
+++ b/group/dlm_controld/pacemaker.c
@@ -81,6 +81,7 @@ int setup_cluster(void)
void update_cluster(void)
{
static uint64_t last_membership = 0;
+ ais_dispatch(ais_fd_async, NULL);
cluster_quorate = crm_have_quorum;
if(last_membership < crm_peer_seq) {
log_debug("Processing membership %llu", crm_peer_seq);
@@ -91,7 +92,6 @@ void update_cluster(void)
void process_cluster(int ci)
{
- ais_dispatch(ais_fd_async, NULL);
update_cluster();
}
@@ -102,6 +102,14 @@ void close_cluster(void) {
#include <arpa/inet.h>
#include <corosync/totem/totemip.h>
+static gboolean is_member(const crm_node_t *node)
+{
+ if(node && safe_str_eq(node->state, CRM_NODE_MEMBER))
+ return TRUE;
+
+ return FALSE;
+}
+
void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
{
int rc = 0;
@@ -119,7 +127,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
snprintf(path, PATH_MAX, "%s/%d", COMMS_DIR, node->id);
rc = stat(path, &tmp);
- is_active = crm_is_member_active(node);
+ is_active = is_member(node);
if(rc == 0 && is_active) {
/* nothing to do?
@@ -212,7 +220,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
}
log_debug("%s %sctive node %u '%s': born-on=%llu, last-seen=%llu, this-event=%llu, last-event=%llu",
- action, crm_is_member_active(value)?"a":"ina",
+ action, is_member(value)?"a":"ina",
node->id, node->uname, node->born, node->last_seen,
crm_peer_seq, (unsigned long long)*last);
}
@@ -220,7 +228,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
int is_cluster_member(uint32_t nodeid)
{
crm_node_t *node = crm_get_peer(nodeid, NULL);
- return crm_is_member_active(node);
+ return is_member(node);
}
char *nodeid2name(int nodeid) {
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
[not found] ` <20100513095117.GM20952@suse.de>
@ 2010-05-13 10:18 ` Jiaju Zhang
2010-05-13 16:25 ` Andrew Beekhof
1 sibling, 0 replies; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-13 10:18 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Thu, May 13, 2010 at 5:51 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>
> Hi Jiaju,
>
> thanks for the patch and the technical explanation!
>
>> Because all the components get the membership info eventually from
>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>> Pacemaker/crmd to finish all the information processing related to
>> membership change.
>>
>> Patched attached below, any review and comments are highly
>> appreciated!
>
> I'd only like to add the user-visible impact of this - namely, when a
> previously unknown node joins a pcmk cluster (with running DLM etc),
> nodes will hang.
>
> So it is fairly important to get fixed.
>
>
> So, with my limited knowledge of the code, I think the fix is sound. I
> have just one clerical comment:
>
>>  void process_cluster(int ci)
>>  {
>> -    ais_dispatch(ais_fd_async, NULL);
>>      update_cluster();
>>  }
>
> Can this function be removed? It doesn't do anything any more. Or is
> there something that should be done with the "ci" argument?
There is something that should be done with the "ci" argument. In
fact, "ci" refers to a slot which holds a socket (in most cases) and
an associated processing function, and which is monitored via poll()
in dlm_controld.
update_cluster() is on a different calling path from process_cluster().
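As a rough sketch of what I mean (names like client_add, workfn and
MAX_CLIENTS here are only illustrative assumptions, not the actual
dlm_controld symbols):

#include <poll.h>

#define MAX_CLIENTS 32

struct client {
    int fd;
    void (*workfn)(int ci);     /* e.g. process_cluster for the Pacemaker fd */
};

static struct client clients[MAX_CLIENTS];
static struct pollfd pollfds[MAX_CLIENTS];
static int client_count;

/* Each registered fd gets a slot; the slot index is the "ci" that the
 * poll loop later passes back to the handler. */
static int client_add(int fd, void (*workfn)(int ci))
{
    int ci;

    if (client_count >= MAX_CLIENTS)
        return -1;
    ci = client_count++;
    clients[ci].fd = fd;
    clients[ci].workfn = workfn;
    pollfds[ci].fd = fd;
    pollfds[ci].events = POLLIN;
    return ci;
}

static void main_loop(void)
{
    int i;

    for (;;) {
        if (poll(pollfds, client_count, -1) <= 0)
            continue;
        for (i = 0; i < client_count; i++)
            if (pollfds[i].revents & POLLIN)
                clients[i].workfn(i);   /* "ci" is this slot index */
    }
}

So even though process_cluster() doesn't use "ci" in the snippet quoted
above, its signature has to match what the poll loop dispatches.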
Many thanks for the review ;)
Jiaju
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
[not found] ` <20100513095117.GM20952@suse.de>
2010-05-13 10:18 ` Jiaju Zhang
@ 2010-05-13 16:25 ` Andrew Beekhof
[not found] ` <20100513183215.GP20952@suse.de>
2010-05-14 4:08 ` Jiaju Zhang
1 sibling, 2 replies; 11+ messages in thread
From: Andrew Beekhof @ 2010-05-13 16:25 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Thu, May 13, 2010 at 11:51 AM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>
> Hi Jiaju,
>
> thanks for the patch and the technical explanation!
>
>> Because all the components get the membership info eventually from
>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>> Pacemaker/crmd to finish all the information processing related to
>> membership change.
>>
>> Patched attached below, any review and comments are highly
>> appreciated!
>
> I'd only like to add the user-visible impact of this - namely, when a
> previously unknown node joins a pcmk cluster (with running DLM etc),
> nodes will hang.
>
> So it is fairly important to get fixed.
>
>
> So, with my limited knowledge of the code, I think the fix is sound. I
> have just one clerical comment:
>
>>  void process_cluster(int ci)
>>  {
>> -    ais_dispatch(ais_fd_async, NULL);
>>      update_cluster();
>>  }
>
> Can this function be removed?
No, it can't.
Remove that and the membership (and crm_peer_id_cache) stops being updated.
Which makes me very suspicious of the whole patch because it clearly
wasn't tested very well.
> It doesn't do anything any more. Or is
> there something that should be done with the "ci" argument?
>
>
> Regards,
>    Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
[not found] ` <20100513183215.GP20952@suse.de>
@ 2010-05-13 20:19 ` Andrew Beekhof
2010-05-14 3:04 ` Tim Serong
[not found] ` <20100513203604.GQ20952@suse.de>
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Beekhof @ 2010-05-13 20:19 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Thu, May 13, 2010 at 8:32 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T18:25:42, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> >>  void process_cluster(int ci)
>> >>  {
>> >> -    ais_dispatch(ais_fd_async, NULL);
>> >>      update_cluster();
>> >>  }
>> >
>> > Can this function be removed?
>>
>> No, it can't.
>> Remove that and the membership (and crm_peer_id_cache) stops being updated.
>>
>> Which makes me very suspicious of the whole patch because it clearly
>> wasn't tested very well.
>
> ais_dispatch() was moved to update_cluster(). That's safe, it seems.
But pointless and unrelated to the problem, so why include it?
> My question related to the detail that now process_cluster() is
> identical to update_cluster(), at initial reading suggesting the
> function might be redundant now.
process_cluster() and update_cluster() are both API entry points.
Does the behavior still occur with pacemaker 1.1.2?
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-13 20:19 ` Andrew Beekhof
@ 2010-05-14 3:04 ` Tim Serong
2010-05-14 10:15 ` Andrew Beekhof
[not found] ` <20100513203604.GQ20952@suse.de>
1 sibling, 1 reply; 11+ messages in thread
From: Tim Serong @ 2010-05-14 3:04 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
>
> Does the behavior still occur with pacemaker 1.1.2?
>
Yes.
For the record, the most minimal testcase I've managed for this
so far is as follows (substitute "/etc/init.d/corosync start" or
whatever for "rcopenais start" if you're not on something SUSE-based):
1) Configure corosync/openais on two nodes.
Do not start the cluster yet.
2) On one node:
# rm /var/lib/heartbeat/crm/*
# rcopenais start
# while ! crm_mon -1 | grep -qi online; do \
echo -n "." ; sleep 5 ; done
3) Now we have one node online, configure Pacemaker:
# cat <<CONF | crm configure
primitive dlm ocf:pacemaker:controld
primitive clvm ocf:lvm2:clvmd
group g dlm clvm
clone c g meta interleave="true"
property stonith-enabled="false"
property no-quorum-policy="ignore"
commit
CONF
Watch "crm_mon -r" until that clone comes online.
Should only take a few seconds.
4) On the other node:
# rm /var/lib/heartbeat/crm/*
# rcopenais start
The first node will now either wedge up spectacularly, and/or
dlm_recoverd and clvmd will be stuck in D state on both nodes.
Regards,
Tim
--
Tim Serong <tserong@novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-13 16:25 ` Andrew Beekhof
[not found] ` <20100513183215.GP20952@suse.de>
@ 2010-05-14 4:08 ` Jiaju Zhang
2010-05-14 5:33 ` Jiaju Zhang
1 sibling, 1 reply; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-14 4:08 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, May 14, 2010 at 12:25 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Thu, May 13, 2010 at 11:51 AM, Lars Marowsky-Bree <lmb@novell.com> wrote:
>> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>>
>> Hi Jiaju,
>>
>> thanks for the patch and the technical explanation!
>>
>>> Because all the components get the membership info eventually from
>>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>>> Pacemaker/crmd to finish all the information processing related to
>>> membership change.
>>>
>>> Patched attached below, any review and comments are highly
>>> appreciated!
>>
>> I'd only like to add the user-visible impact of this - namely, when a
>> previously unknown node joins a pcmk cluster (with running DLM etc),
>> nodes will hang.
>>
>> So it is fairly important to get fixed.
>>
>>
>> So, with my limited knowledge of the code, I think the fix is sound. I
>> have just one clerical comment:
>>
>>>  void process_cluster(int ci)
>>>  {
>>> -    ais_dispatch(ais_fd_async, NULL);
>>>      update_cluster();
>>>  }
>>
>> Can this function be removed?
>
> No, it can't.
> Remove that and the membership (and crm_peer_id_cache) stops being updated.
Agreed.
Neither update_cluster nor process_cluster can be removed. Yes, moving
ais_dispatch should have nothing to do with this issue. I added it
during debugging and forgot to remove it in the final version, so I
ended up sending that working-but-unclean version to the list, sorry.
I'll post an updated patch soon ;-)
> Which makes me very suspicious of the whole patch because it clearly
> wasn't tested very well.
I have tested it in my environment and it works for me.
Thanks,
Jiaju
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-14 4:08 ` Jiaju Zhang
@ 2010-05-14 5:33 ` Jiaju Zhang
0 siblings, 0 replies; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-14 5:33 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, May 14, 2010 at 12:08:25PM +0800, Jiaju Zhang wrote:
> Both update_cluster and process_cluster can't be removed. Yes, moving
> the place of ais_dispatch should have nothing to do with this issue.
> Oh, this is because I added it in the debugging stage and forgot to
> remove it in the final version, just sending the working version to
> the list, sorry, I'll posted an updated patch soon ;-)
>
This is the updated version of this patch, thanks a lot for your
review and comments ;-)
Thanks,
Jiaju
Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
---
group/dlm_controld/pacemaker.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/group/dlm_controld/pacemaker.c b/group/dlm_controld/pacemaker.c
index 3150a1f..0c30c5a 100644
--- a/group/dlm_controld/pacemaker.c
+++ b/group/dlm_controld/pacemaker.c
@@ -102,6 +102,14 @@ void close_cluster(void) {
#include <arpa/inet.h>
#include <corosync/totem/totemip.h>
+static gboolean is_member(const crm_node_t *node)
+{
+ if(node && safe_str_eq(node->state, CRM_NODE_MEMBER))
+ return TRUE;
+
+ return FALSE;
+}
+
void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
{
int rc = 0;
@@ -119,7 +127,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
snprintf(path, PATH_MAX, "%s/%d", COMMS_DIR, node->id);
rc = stat(path, &tmp);
- is_active = crm_is_member_active(node);
+ is_active = is_member(node);
if(rc == 0 && is_active) {
/* nothing to do?
@@ -212,7 +220,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
}
log_debug("%s %sctive node %u '%s': born-on=%llu, last-seen=%llu, this-event=%llu, last-event=%llu",
- action, crm_is_member_active(value)?"a":"ina",
+ action, is_member(value)?"a":"ina",
node->id, node->uname, node->born, node->last_seen,
crm_peer_seq, (unsigned long long)*last);
}
@@ -220,7 +228,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
int is_cluster_member(uint32_t nodeid)
{
crm_node_t *node = crm_get_peer(nodeid, NULL);
- return crm_is_member_active(node);
+ return is_member(node);
}
char *nodeid2name(int nodeid) {
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
[not found] ` <20100513203604.GQ20952@suse.de>
@ 2010-05-14 9:52 ` Andrew Beekhof
2010-05-14 11:25 ` Jiaju Zhang
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Beekhof @ 2010-05-14 9:52 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Thu, May 13, 2010 at 10:36 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T22:19:58, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> > ais_dispatch() was moved to update_cluster(). That's safe, it seems.
>> But pointless and unrelated to the problem, so why include it?
>
> I took it that JJ thought this was part of the solution.
ci == ais_async_fd, if there was something to read we should have
already had a callback.
>
>> Does the behavior still occur with pacemaker 1.1.2?
>
> Yes. Which fix do you think would change this?
These two that I mentioned last week:
http://hg.clusterlabs.org/pacemaker/1.1/rev/34ae9f7b4675
http://hg.clusterlabs.org/pacemaker/1.1/rev/19fdc0a3885a
I really doubt the patch is the right fix: conceptually, the node
can't be a member until the crmd process is running and has done a
whole heap of processing.
So CRM_NODE_MEMBER without crm_proc_crmd makes no sense.
More likely, as in the patches above, the process list wasn't being
updated correctly, and that's what needs to get fixed.
It just never mattered before.
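To make the two positions concrete, the difference under discussion is
roughly the following. This is a sketch only; the exact
crm_is_member_active() implementation in that Pacemaker version, and
field names such as node->processes, are assumptions here.

/* Roughly what dlm_controld relied on before the patch: the node must
 * be a corosync member AND its crmd must be known to be running. */
static gboolean member_with_crmd(const crm_node_t *node)
{
    return node != NULL
        && safe_str_eq(node->state, CRM_NODE_MEMBER)
        && (node->processes & crm_proc_crmd);
}

/* Roughly what the proposed patch checks: corosync membership alone,
 * without waiting for the peer's process list to be updated. */
static gboolean member_only(const crm_node_t *node)
{
    return node != NULL && safe_str_eq(node->state, CRM_NODE_MEMBER);
}

The disagreement is whether the second check is acceptable for the
DLM's purposes, or whether the process-list updates simply need to be
fixed so that the first check works.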
>
> JJ's patch has been tested and seems to work in general as well as
> address the problem.
>
>
> Regards,
>    Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-14 3:04 ` Tim Serong
@ 2010-05-14 10:15 ` Andrew Beekhof
2010-05-14 11:28 ` Jiaju Zhang
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Beekhof @ 2010-05-14 10:15 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong@novell.com> wrote:
> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
>>
>> Does the behavior still occur with pacemaker 1.1.2?
>>
>
> Yes.
>
> For the record, the most minimal testcase I've managed for this
> so far is as follows (substitute "/etc/init.d/corosync start" or
> whatever for "rcopenais start" if you're not on something SUSE-based):
>
> 1) Configure corosync/openais on two nodes.
>   Do not start the cluster yet.
>
> 2) On one node:
>
>     # rm /var/lib/heartbeat/crm/*
>     # rcopenais start
>     # while ! crm_mon -1 | grep -qi online; do \
>         echo -n "." ; sleep 5 ; done
>
> 3) Now we have one node online, configure Pacemaker:
>
>     # cat <<CONF | crm configure
>     primitive dlm ocf:pacemaker:controld
>     primitive clvm ocf:lvm2:clvmd
>     group g dlm clvm
>     clone c g meta interleave="true"
>     property stonith-enabled="false"
>     property no-quorum-policy="ignore"
>     commit
>     CONF
>
>   Watch "crm_mon -r" until that clone comes online.
>   Should only take a few seconds.
>
> 4) On the other node:
>
>     # rm /var/lib/heartbeat/crm/*
>     # rcopenais start
>
> The first node will now either wedge up spectacularly, and/or
> dlm_recoverd and clvmd will be stuck in D state on both nodes.
Presumably each thinks the other node isn't a member?
Perhaps something like this will help:
diff -r b59c27dc114a lib/ais/plugin.c
--- a/lib/ais/plugin.c Wed May 12 10:51:56 2010 +0200
+++ b/lib/ais/plugin.c Fri May 14 12:12:33 2010 +0200
@@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
ais_notice("Respawning failed child process: %s",
pcmk_children[lpc].name);
spawn_child(&(pcmk_children[lpc]));
- } else {
- send_cluster_id();
}
+ send_cluster_id();
}
}
sched_yield ();
@@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
}
}
}
+ send_cluster_id();
return 0;
}
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-14 9:52 ` Andrew Beekhof
@ 2010-05-14 11:25 ` Jiaju Zhang
0 siblings, 0 replies; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-14 11:25 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, May 14, 2010 at 5:52 PM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Thu, May 13, 2010 at 10:36 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
>> On 2010-05-13T22:19:58, Andrew Beekhof <andrew@beekhof.net> wrote:
>>
>>> > ais_dispatch() was moved to update_cluster(). That's safe, it seems.
>>> But pointless and unrelated to the problem, so why include it?
>>
>> I took it that JJ thought this was part of the solution.
>
> ci == ais_async_fd, if there was something to read we should have
> already had a callback.
>
>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>
>> Yes. Which fix do you think would change this?
>
> These two that I mentioned last week:
>  http://hg.clusterlabs.org/pacemaker/1.1/rev/34ae9f7b4675
>  http://hg.clusterlabs.org/pacemaker/1.1/rev/19fdc0a3885a
I've confirmed that my pacemaker has the above patches, and the problem still occurs.
>
> I really doubt the patch is the right fix, conceptually, the node
> can't be a member until the crmd process is running and done a whole
> heap of processing.
> So CRM_NODE_MEMBER without crm_proc_crmd makes no sense.
>
> More likely, as-in the patches above, the process list wasn't being
> updated correctly and thats what needs to get fixed.
> It just never mattered before.
This is where the race condition happens. CRM_NODE_MEMBER is set from
the local node, but crm_proc_crmd is updated only when the peer node
sends the local node that info. When CRM_NODE_MEMBER has been updated
but crm_proc_crmd has not been updated yet, and dlm_controld goes to
read crm_peer_id_cache at that moment, this bug happens.
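Roughly, the interleaving is as follows (the update sites shown are an
approximation of the code paths described above, not the exact
Pacemaker internals):

/*
 *  Pacemaker (on an existing node)          dlm_controld.pcmk
 *  -------------------------------          -----------------
 *  corosync membership event arrives
 *  peer->state = CRM_NODE_MEMBER
 *                                           update_cluster()
 *                                             dlm_process_node(peer)
 *                                               crm_is_member_active(peer)
 *                                                 -> FALSE, because
 *                                                    crm_proc_crmd is not
 *                                                    set yet, so the
 *                                                    joining node is never
 *                                                    treated as "Added"
 *  message from the peer's crmd arrives
 *  peer->processes |= crm_proc_crmd
 */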
Yes, updating CRM_NODE_MEMBER and updating crm_proc_crmd should be an
_atomic_ operation; before that has finished, dlm_controld should not
read the membership info. So, in fact, I have been considering three
solutions to this problem:
1) Notify dlm_controld only after updating (CRM_NODE_MEMBER &&
crm_proc_crmd) has finished. But I'm not sure whether we can achieve
this by doing some work around ais_async_fd; it may need digging into
the code further.
2) Add another flag to do some syncing between Pacemaker and
dlm_controld, a bit like what dlm_controld and fs_controld have done,
but that needs much more work and is a little complicated.
3) As the patch does, check only CRM_NODE_MEMBER and ignore
crm_proc_crmd. That may seem risky (open for discussion), but after
some consideration I think it should be OK. We eventually get the
membership info from Corosync, so even if we haven't received the
crm_proc_crmd info at this moment, we will receive it very soon. If
something bad happens and the peer node unfortunately goes down, we
will still receive that info from Corosync. Also, what we feed to the
DLM at that point is just node membership info; we don't need to care
whether crmd has really started at that moment. The membership info is
also updated in many other places when we do work on lockspaces. So I
think it should be safe.
Thanks,
Jiaju
* [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
2010-05-14 10:15 ` Andrew Beekhof
@ 2010-05-14 11:28 ` Jiaju Zhang
0 siblings, 0 replies; 11+ messages in thread
From: Jiaju Zhang @ 2010-05-14 11:28 UTC (permalink / raw)
To: cluster-devel.redhat.com
I'll test the patch later on, but I may not finish this testing today
because I have a problem accessing the hardware at the moment.
Thanks a lot ;-)
Jiaju
On Fri, May 14, 2010 at 6:15 PM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong@novell.com> wrote:
>> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
>>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>>
>>
>> Yes.
>>
>> For the record, the most minimal testcase I've managed for this
>> so far is as follows (substitute "/etc/init.d/corosync start" or
>> whatever for "rcopenais start" if you're not on something SUSE-based):
>>
>> 1) Configure corosync/openais on two nodes.
>>   Do not start the cluster yet.
>>
>> 2) On one node:
>>
>>     # rm /var/lib/heartbeat/crm/*
>>     # rcopenais start
>>     # while ! crm_mon -1 | grep -qi online; do \
>>         echo -n "." ; sleep 5 ; done
>>
>> 3) Now we have one node online, configure Pacemaker:
>>
>>     # cat <<CONF | crm configure
>>     primitive dlm ocf:pacemaker:controld
>>     primitive clvm ocf:lvm2:clvmd
>>     group g dlm clvm
>>     clone c g meta interleave="true"
>>     property stonith-enabled="false"
>>     property no-quorum-policy="ignore"
>>     commit
>>     CONF
>>
>>   Watch "crm_mon -r" until that clone comes online.
>>   Should only take a few seconds.
>>
>> 4) On the other node:
>>
>>     # rm /var/lib/heartbeat/crm/*
>>     # rcopenais start
>>
>> The first node will now either wedge up spectacularly, and/or
>> dlm_recoverd and clvmd will be stuck in D state on both nodes.
>
> Presumably each thinks the other node isn't a member?
> Perhaps something like this will help:
>
> diff -r b59c27dc114a lib/ais/plugin.c
> --- a/lib/ais/plugin.c  Wed May 12 10:51:56 2010 +0200
> +++ b/lib/ais/plugin.c  Fri May 14 12:12:33 2010 +0200
> @@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
>                     ais_notice("Respawning failed child process: %s",
>                                pcmk_children[lpc].name);
>                     spawn_child(&(pcmk_children[lpc]));
> -                } else {
> -                    send_cluster_id();
>                  }
> +                send_cluster_id();
>              }
>          }
>          sched_yield ();
> @@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
>              }
>          }
>      }
> +    send_cluster_id();
>
>      return 0;
>  }
>
>