From: Jiaju Zhang <jjzhang.linux@gmail.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
Date: Thu, 13 May 2010 16:49:27 +0800
Message-ID: <20100513084926.GA30727@linux-jjzhang>
Hi,
This patch fixes a membership judging issue in dlm_controld.pcmk.
Currently, dlm_controld.pcmk gets membership change information from
Pacemaker, and Pacemaker gets that information from Corosync, which
is good. But when Pacemaker receives a membership change, it first
does some internal processing, such as aligning the node membership
as well as some other node info in the cluster. Until that processing
has finished, Pacemaker does not treat the node in question as an
_active_ member. At that same moment, dlm_controld.pcmk also learns
of the membership change and reads the membership info from
Pacemaker. This is a race condition: dlm_controld.pcmk reads
crm_peer_id_cache before Pacemaker has finished updating it for that
membership change. So if the node in question is a joining node, it
should be regarded as an "Added" node, but with the current logic it
is not!
Because all the components ultimately get the membership info from
Corosync, IMO there is no need for dlm_controld.pcmk to wait for
Pacemaker/crmd to finish all of its processing related to a
membership change.
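To make the race concrete, here is a toy model in C. It is only a sketch and does not use Pacemaker's real types. `struct toy_node`, the `peer_info_synced` flag, and `toy_classify()` are hypothetical stand-ins. In particular, treating the "active" test as "member plus bookkeeping finished" is an assumption made for illustration, not a statement of what crm_is_member_active() actually checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NODE_MEMBER "member"

/* Toy stand-in for crm_node_t; NOT Pacemaker's real structure. */
struct toy_node {
    const char *state;     /* membership state, as derived from Corosync */
    bool peer_info_synced; /* hypothetical: Pacemaker's own bookkeeping is done */
};

/* Membership-only check, in the spirit of the is_member() helper in the patch. */
static bool toy_is_member(const struct toy_node *node)
{
    return node != NULL && node->state != NULL
        && strcmp(node->state, TOY_NODE_MEMBER) == 0;
}

/* Assumed model of a stricter "active" check that also waits for bookkeeping. */
static bool toy_is_member_active(const struct toy_node *node)
{
    return toy_is_member(node) && node->peer_info_synced;
}

/* Hypothetical decision table: does a comms entry exist, and is the node active? */
static const char *toy_classify(bool comms_dir_exists, bool is_active)
{
    if (comms_dir_exists && is_active)
        return "Unchanged";
    if (is_active)
        return "Added";
    if (comms_dir_exists)
        return "Removed";
    return "Skipped";
}
```

In this model, a joining node that Corosync already reports as a member, but whose `peer_info_synced` is still false, is classified as "Added" by the membership-only check and is skipped by the stricter one, which is the symptom described above.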
The patch is attached below; any review and comments are highly
appreciated!
Thanks,
Jiaju
Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Andrew Beekhof <andrew@beekhof.net>
---
group/dlm_controld/pacemaker.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/group/dlm_controld/pacemaker.c b/group/dlm_controld/pacemaker.c
index 3150a1f..9f90d48 100644
--- a/group/dlm_controld/pacemaker.c
+++ b/group/dlm_controld/pacemaker.c
@@ -81,6 +81,7 @@ int setup_cluster(void)
void update_cluster(void)
{
static uint64_t last_membership = 0;
+ ais_dispatch(ais_fd_async, NULL);
cluster_quorate = crm_have_quorum;
if(last_membership < crm_peer_seq) {
log_debug("Processing membership %llu", crm_peer_seq);
@@ -91,7 +92,6 @@ void update_cluster(void)
void process_cluster(int ci)
{
- ais_dispatch(ais_fd_async, NULL);
update_cluster();
}
@@ -102,6 +102,14 @@ void close_cluster(void) {
#include <arpa/inet.h>
#include <corosync/totem/totemip.h>
+static gboolean is_member(const crm_node_t *node)
+{
+ if(node && safe_str_eq(node->state, CRM_NODE_MEMBER))
+ return TRUE;
+
+ return FALSE;
+}
+
void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
{
int rc = 0;
@@ -119,7 +127,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
snprintf(path, PATH_MAX, "%s/%d", COMMS_DIR, node->id);
rc = stat(path, &tmp);
- is_active = crm_is_member_active(node);
+ is_active = is_member(node);
if(rc == 0 && is_active) {
/* nothing to do?
@@ -212,7 +220,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
}
log_debug("%s %sctive node %u '%s': born-on=%llu, last-seen=%llu, this-event=%llu, last-event=%llu",
- action, crm_is_member_active(value)?"a":"ina",
+ action, is_member(value)?"a":"ina",
node->id, node->uname, node->born, node->last_seen,
crm_peer_seq, (unsigned long long)*last);
}
@@ -220,7 +228,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
int is_cluster_member(uint32_t nodeid)
{
crm_node_t *node = crm_get_peer(nodeid, NULL);
- return crm_is_member_active(node);
+ return is_member(node);
}
char *nodeid2name(int nodeid) {