Cluster-Devel Archive on lore.kernel.org
From: Jiaju Zhang <jjzhang.linux@gmail.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC][PATCH] dlm: Reset fs_notified when check_fs_done
Date: Mon, 1 Mar 2010 06:06:32 +0800
Message-ID: <20100228220632.GA5791@linux-jjzhang>

Hi,

About the issue where dlm_controld and fs_controld sit spinning,
retrying and replying for the fs_notified check, I have a suspicion
that another scenario may also hit that logic:

If node->fs_notified was set to 1 by a previous change, it is still 1
when a new change comes and needs to check node->fs_notified, because
nothing has reset it to 0. check_fs_done will therefore succeed even
if dlm_controld has not received the notification from fs_controld
this time.
For example, given the membership changes n, n+1 and n+2, here is
what happens on node X:
Step 1: cg n: node Y leaves with reason CPG_REASON_NODEDOWN;
        eventually, in node X's ls->node_history, node Y's
        fs_notified = 1.
Step 2: cg n+1: node Y joins ...
Step 3: cg n+2: node Y leaves with reason CPG_REASON_NODEDOWN. One
        possible scenario: before fs_controld's notification
        arrives, dlm_controld already knows from the CPG message
        that node Y is down and has done a lot of work. It sees node
        Y's fs_notified = 1 (set in Step 1) and wrongly passes the fs
        check, so node Y's check_fs is reset to 0.
Step 4: fs_controld's notification arrives; it sees node Y's check_fs
        = 0, assumes dlm_controld does not yet know node Y is down,
        and retries sending the notification. But in fact
        dlm_controld already knows this and has finished all the
        work, which results in the spinning ...

I'm not sure if I read the code correctly :-) Below is the patch, which
resets node->fs_notified. Review and comments are highly
appreciated!

Thanks,
Jiaju

Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
---
 group/dlm_controld/cpg.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index d5245ce..b257595 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -636,6 +636,7 @@ static int check_fs_done(struct lockspace *ls)
 
 		if (node->fs_notified) {
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);




Thread overview: 5+ messages
2010-02-28 22:06 Jiaju Zhang [this message]
2010-11-08 15:05 ` [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done Jiaju Zhang
2010-11-08 22:06   ` David Teigland
2011-02-22  8:35     ` Jiaju Zhang
2011-02-22 17:11       ` David Teigland
