From: Jiaju Zhang <jjzhang.linux@gmail.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC][PATCH] dlm: Reset fs_notified when check_fs_done
Date: Mon, 1 Mar 2010 06:06:32 +0800
Message-ID: <20100228220632.GA5791@linux-jjzhang>

Hi,

About the issue where dlm_controld and fs_controld sit spinning,
retrying and replying for the fs_notified check, I suspect that
another scenario may also hit that logic:

If node->fs_notified was set to 1 by a previous change and has never
been reset to 0, then when a new change comes along and checks
node->fs_notified, check_fs_done will succeed even though
dlm_controld has not yet received the notification from fs_controld
for this change.
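
To make the suspected flaw concrete, here is a minimal C sketch of the
per-node check. It assumes a hypothetical, simplified struct
node_state that models only check_fs and fs_notified (it is not the
real struct node from cpg.c), and the helpers node_fs_done() and
second_change_passes_early() are illustrative names. With
reset_notified false it mimics the pre-patch logic, where a
notification delivered for an earlier change also satisfies a later
one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-node state; only the two flags
 * discussed in this mail are modeled. */
struct node_state {
	bool check_fs;    /* the current change waits for an fs notify */
	bool fs_notified; /* fs_controld has reported the node down */
};

/* Sketch of the per-node test in check_fs_done(). reset_notified
 * selects the proposed fix: the notification is consumed so it cannot
 * satisfy a later change. Returns true when this node no longer
 * blocks the change. */
static bool node_fs_done(struct node_state *n, bool reset_notified)
{
	if (!n->check_fs)
		return true;
	if (!n->fs_notified)
		return false; /* still needs fs notify */
	n->check_fs = false;
	if (reset_notified)
		n->fs_notified = false;
	return true;
}

/* Two nodedown changes for the same node, with a notification
 * delivered only for the first one. Returns true if the second change
 * passes the fs check without its own notification. */
static bool second_change_passes_early(bool reset_notified)
{
	struct node_state y = { false, false };

	/* First change: node down, notification arrives, check passes. */
	y.check_fs = true;
	y.fs_notified = true;
	bool first_done = node_fs_done(&y, reset_notified);
	assert(first_done);
	(void)first_done;

	/* Later change: node down again, no notification yet. */
	y.check_fs = true;
	return node_fs_done(&y, reset_notified);
}
```

Here second_change_passes_early(false) returns true because the stale
flag lets the second change through, while
second_change_passes_early(true), which models the patch, returns
false until a fresh notification arrives.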

For example, given membership changes n, n+1 and n+2, consider what
happens on node X:
Step 1: cg n: node Y leaves with reason CPG_REASON_NODEDOWN;
        eventually, in node X's ls->node_history, node Y's
        fs_notified = 1.
Step 2: cg n+1: node Y joins ...
Step 3: cg n+2: node Y leaves with reason CPG_REASON_NODEDOWN. One
        possible scenario: before fs_controld's notification
        arrives, dlm_controld has learned from the CPG message that
        node Y is down and has done a lot of work. It sees node Y's
        fs_notified = 1 (set in Step 1) and wrongly passes the fs
        check, so node Y's check_fs is reset to 0.
Step 4: fs_controld's notification arrives and sees node Y's
        check_fs = 0, so it assumes dlm_controld does not yet know
        that node Y is down and retries sending the notification.
        In fact, dlm_controld already knows this and has finished
        all the work, which results in the spinning ...
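
The four steps can be replayed in a small C model of node X's view of
node Y. This is a sketch under assumptions: struct node_state,
receive_fs_notify(), check_node() and replay_steps() are hypothetical
names, and the rule that a notification is accepted only while
check_fs is set is inferred from Step 4 rather than taken from the
daemons' code. The retry loop is capped so that the spin becomes
observable instead of running forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state for node Y as seen on node X. */
struct node_state {
	bool check_fs;
	bool fs_notified;
};

/* Assumed model of dlm_controld receiving fs_controld's notification:
 * accepted only while a change waits on the node; otherwise the
 * sender is told to retry. */
static bool receive_fs_notify(struct node_state *n)
{
	if (!n->check_fs)
		return false;
	n->fs_notified = true;
	return true;
}

/* Per-node part of check_fs_done(); reset_notified selects the patch. */
static bool check_node(struct node_state *n, bool reset_notified)
{
	if (!n->check_fs)
		return true;
	if (!n->fs_notified)
		return false; /* needs fs notify */
	n->check_fs = false;
	if (reset_notified)
		n->fs_notified = false;
	return true;
}

/* Replays Steps 1-4 and returns how many notification attempts
 * fs_controld makes for cg n+2, capped at max_attempts. */
static int replay_steps(bool reset_notified, int max_attempts)
{
	struct node_state y = { false, false };
	int attempts = 0;

	/* Step 1: cg n, Y down; notification accepted, check completes. */
	y.check_fs = true;
	bool accepted = receive_fs_notify(&y);
	assert(accepted);
	(void)accepted;
	check_node(&y, reset_notified);

	/* Step 2: cg n+1, Y joins; no fs check is modeled. */

	/* Step 3: cg n+2, Y down again, seen via CPG first. Without the
	 * reset, the stale fs_notified passes the check and clears
	 * check_fs before the notification arrives. */
	y.check_fs = true;
	check_node(&y, reset_notified);

	/* Step 4: fs_controld retries until its notification is accepted. */
	while (attempts < max_attempts) {
		attempts++;
		if (receive_fs_notify(&y))
			break;
	}
	return attempts;
}
```

With this model, replay_steps(false, 5) uses all 5 attempts, mirroring
the spin, while replay_steps(true, 5) is accepted on the first attempt,
because Step 3 now waits for the notification instead of passing on
the stale flag.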

I'm not sure I read the code correctly :-) Below is a patch that
resets node->fs_notified. Reviews and comments are highly
appreciated!

Thanks,
Jiaju
Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
---
group/dlm_controld/cpg.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index d5245ce..b257595 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -636,6 +636,7 @@ static int check_fs_done(struct lockspace *ls)
 
 		if (node->fs_notified) {
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);

Thread overview: 5+ messages
2010-02-28 22:06 Jiaju Zhang [this message]
2010-11-08 15:05 [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done, Jiaju Zhang
2010-11-08 22:06 David Teigland
2011-02-22 08:35 Jiaju Zhang
2011-02-22 17:11 David Teigland