cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Jiaju Zhang <jjzhang.linux@gmail.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done
Date: Mon, 8 Nov 2010 23:05:49 +0800	[thread overview]
Message-ID: <20101108150549.GC11294@linux-jjzhang> (raw)
In-Reply-To: <20100228220632.GA5791@linux-jjzhang>

Hi list,

Sorry for reposting this mail if you have received it, since it seems
to me that I haven't successfully sent it out, maybe the attachment is
too large. I'll send the attachment separately if anoyone is
interested in it :)

Hi David,

This patch was made about eight months ago, however, at that time I
have no way to regularly reproduce it and haven't verified if this
patch really works, so I set it aside for so long ...

These months, I also heard of many users reporting this issue but
unfortunately when I requested them to test the patch, most of them
even didn't try it at all :(

Luckily, things have changed now. One user met this issue two months
ago and he's also very kindly to test the patch. The result is the
patch really works.

Attached is the log before they apply the patch. This time the log
has already included the debugging messages which were added by the
commit 27b09badd40a2d1500500fa6945aeb532f75bd13 , so we can see what
really happens on the user's site.
(The log is a bit large when it was uncompressed, this is because the
spinning would print many messages to the log.)

I rebased the patch against current upstream code now. Thank you for
your review in advance :)

Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
---
 group/dlm_controld/cpg.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index 9b0d223..12cb202 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -651,6 +651,7 @@ static int check_fs_done(struct lockspace *ls)
 		if (node->fs_notified) {
 			log_group(ls, "check_fs nodeid %d clear", node->nodeid);
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);


On Mon, Mar 01, 2010 at 06:06:32AM +0800, Jiaju Zhang wrote:
> Hi,
> 
> About the issue that dlm_controld and fs_controld sit spinning,
> retrying and replying for the fs_notified check, I have a suspision
> that another scenario may also hit that logic:
> 
> If the node->fs_notified has been set to 1 by previous change, when a
> new change comes and needs to check the node->fs_notified, because it
> has not been reset to 0, so check_fs_done will succeed even if
> dlm_controld has not received the notification from fs_controld this
> time.
> For example, given that the following membership changes n, n+1, n+2,
> we see what happens on node X:
> Step 1: cg n: node Y leaves with CPG_REASON_NODEDOWN reason,
>         eventually in node X's ls->node_history, node Y's fs_notified
>         = 1
> Step 2: cg n+1: node Y joins ...
> Step 3: cg n+2: node Y leaves with CPG_REASON_NODEDOWN reason, one
>         possible scenario is: before fs_controld's notification
>         arrives, dlm_controld has known node Y is down from CPG
>         message and done a lot of work, and it saw node Y's
>         fs_notified = 1 (been set in Step 1) then passed the fs check
>         wrongly. So node Y's check_fs reset to 0.
> Step 4: fs_controld's notification arrives, it sees node Y's check_fs
>         = 0 and assumes dlm_controld has not known node Y is down and
>         retries to send the notification. But in fact, dlm_controld
>         has already known this and finished all the work, which will
>         result in the spinning ... 
> 
> I'm not sure if I read the code correctly :-) Below is the patch which
> reset the node->fs_notified. Review and comments are highly
> appreciated!
> 
> Thanks,
> Jiaju
> 
> Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
> ---
>  group/dlm_controld/cpg.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
> index d5245ce..b257595 100644
> --- a/group/dlm_controld/cpg.c
> +++ b/group/dlm_controld/cpg.c
> @@ -636,6 +636,7 @@ static int check_fs_done(struct lockspace *ls)
>  
>  		if (node->fs_notified) {
>  			node->check_fs = 0;
> +			node->fs_notified = 0;
>  		} else {
>  			log_group(ls, "check_fs nodeid %d needs fs notify",
>  				  node->nodeid);



  reply	other threads:[~2010-11-08 15:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-28 22:06 [Cluster-devel] [RFC][PATCH] dlm: Reset fs_notified when check_fs_done Jiaju Zhang
2010-11-08 15:05 ` Jiaju Zhang [this message]
2010-11-08 22:06   ` [Cluster-devel] [PATCH] " David Teigland
2011-02-22  8:35     ` Jiaju Zhang
2011-02-22 17:11       ` David Teigland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101108150549.GC11294@linux-jjzhang \
    --to=jjzhang.linux@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).