From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiaju Zhang
Date: Thu, 13 May 2010 16:49:27 +0800
Subject: [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
Message-ID: <20100513084926.GA30727@linux-jjzhang>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

This is a fix for the membership judging issue in dlm_controld.pcmk.

Currently, dlm_controld.pcmk gets its membership change information from
Pacemaker, and Pacemaker gets that information from Corosync, which is
good. But when Pacemaker itself receives the membership change info, it
does some internal processing, such as aligning the node membership as
well as some other node info in the cluster. Until Pacemaker has
finished, it won't treat the node in question as an _active_ member.

Just at that moment, dlm_controld.pcmk also learns of the membership
change and goes to read the membership info from Pacemaker. This is a
race condition: dlm_controld.pcmk reads crm_peer_id_cache before
Pacemaker has finished all the work for that membership change, that is,
before it has finished updating all the info in crm_peer_id_cache. So if
the node in question is a joining node, it should be regarded as an
"Added" node, but with the current logic it is not!

Because all the components ultimately get their membership info from
Corosync, IMO there is no need for dlm_controld.pcmk to wait for
Pacemaker/crmd to finish all the information processing related to a
membership change.

The patch is attached below; any review and comments are highly
appreciated!
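
To make the race window concrete, here is a minimal standalone sketch. It
does not use the Pacemaker API: struct peer, bookkeeping_done,
PEER_MEMBER, peer_is_active(), peer_is_member() and check_join_window()
are all hypothetical names made up for illustration, and the "strict"
check is only an assumption about what an active-member test might
additionally require. It shows why a state-only check already sees a
joining node as a member while a stricter check does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached peer entry; NOT Pacemaker's crm_node_t. */
struct peer {
    const char *state;     /* membership state as reported by the cluster layer */
    bool bookkeeping_done; /* assumed flag: manager finished its own processing */
};

#define PEER_MEMBER "member" /* assumed state string, for this sketch only */

/* State-only check: true as soon as the cluster layer reports membership. */
bool peer_is_member(const struct peer *p)
{
    return p != NULL && p->state != NULL && strcmp(p->state, PEER_MEMBER) == 0;
}

/* Stricter check (assumption): also requires the manager's bookkeeping. */
bool peer_is_active(const struct peer *p)
{
    return peer_is_member(p) && p->bookkeeping_done;
}

/* Walk one joining node through the window described in the mail. */
void check_join_window(void)
{
    struct peer joining = { PEER_MEMBER, false };

    /* Inside the window: the state is already "member", bookkeeping is not done. */
    assert(peer_is_member(&joining));  /* state-only check sees the join */
    assert(!peer_is_active(&joining)); /* stricter check misses it */

    /* After the manager catches up, both checks agree. */
    joining.bookkeeping_done = true;
    assert(peer_is_member(&joining));
    assert(peer_is_active(&joining));
}
```

A reader that samples the cache inside the window with the stricter check
would classify the joining node as inactive, which matches the symptom
described above; the state-only check does not depend on that timing.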

Thanks,
Jiaju

Signed-off-by: Jiaju Zhang
Cc: David Teigland
Cc: Andrew Beekhof
---
 group/dlm_controld/pacemaker.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/group/dlm_controld/pacemaker.c b/group/dlm_controld/pacemaker.c
index 3150a1f..9f90d48 100644
--- a/group/dlm_controld/pacemaker.c
+++ b/group/dlm_controld/pacemaker.c
@@ -81,6 +81,7 @@ int setup_cluster(void)
 void update_cluster(void)
 {
     static uint64_t last_membership = 0;
+    ais_dispatch(ais_fd_async, NULL);
     cluster_quorate = crm_have_quorum;
     if(last_membership < crm_peer_seq) {
         log_debug("Processing membership %llu", crm_peer_seq);
@@ -91,7 +92,6 @@ void update_cluster(void)
 
 void process_cluster(int ci)
 {
-    ais_dispatch(ais_fd_async, NULL);
     update_cluster();
 }
 
@@ -102,6 +102,14 @@ void close_cluster(void) {
 #include
 #include
 
+static gboolean is_member(const crm_node_t *node)
+{
+    if(node && safe_str_eq(node->state, CRM_NODE_MEMBER))
+        return TRUE;
+
+    return FALSE;
+}
+
 void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
 {
     int rc = 0;
@@ -119,7 +127,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
     snprintf(path, PATH_MAX, "%s/%d", COMMS_DIR, node->id);
 
     rc = stat(path, &tmp);
-    is_active = crm_is_member_active(node);
+    is_active = is_member(node);
 
     if(rc == 0 && is_active) {
         /* nothing to do?
@@ -212,7 +220,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
     }
 
     log_debug("%s %sctive node %u '%s': born-on=%llu, last-seen=%llu, this-event=%llu, last-event=%llu",
-              action, crm_is_member_active(value)?"a":"ina",
+              action, is_member(value)?"a":"ina",
               node->id, node->uname, node->born, node->last_seen,
               crm_peer_seq, (unsigned long long)*last);
 }
@@ -220,7 +228,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
 int is_cluster_member(uint32_t nodeid)
 {
     crm_node_t *node = crm_get_peer(nodeid, NULL);
-    return crm_is_member_active(node);
+    return is_member(node);
 }
 
 char *nodeid2name(int nodeid) {