cluster-devel.redhat.com archive mirror
From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection
Date: Mon, 16 May 2016 12:02:46 -0500
Message-ID: <20160516170246.GB20979@redhat.com>
In-Reply-To: <57397A5B.3070302@suse.com>

On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote:
> Thanks! Hum, according to the long comments, you've handled the 2/2
> even split by way of the low nodeid killing stateful merged
> numbers.

Interesting, I'd forgotten about that bit of code, so I was wrong to say
that we do nothing after a 2/2 partition merge.

> We can do that because the "clean nodes" == "stateful nodes", right? I
> guess you're describing the case where there are 3 or more partitions
> that merge, and none could see enough clean nodes?

That sounds about right; this is a fairly narrow special case that mainly
helps in the case of two evenly split partitions.  There could be some
cases with more nodes and 3+ partitions where it might help.

> Now DLM outputs a log message like:
> "fence work wait to clear merge $merge_count clean $clean_count part
> $part_count gone $gone_count". I'm wondering if we can provide this
> info, and how long DLM has been stuck, via "dlm_tool $some_cmd"?

Yes, that's a good idea.  dlm_tool should clearly report if the dlm is
blocked because of a stateful partition merge.  Perhaps a new global
variable printed by 'dlm_tool status -v'?

I'm not exactly sure of the condition when we'd set this new variable,
could you try it out?  For a start, maybe something similar to this:

static int stateful_merge_wait;
...

        if (!cluster_two_node && merge_count) {
                log_retry(retry_fencing, "fence work wait to clear merge %d clean %d part %d gone %d",
                          merge_count, clean_count, part_count, gone_count);

                if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
                        kick_stateful_merge_members();
+               if ((clean_count < merge_count) && !part_count)
+                       stateful_merge_wait = 1;

                retry = 1;
                goto out;
        }

(and added to print_state_daemon)
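
To make the idea easier to experiment with, here is a rough standalone
sketch of the same decision, separate from the real dlm_controld code.
Everything in it beyond the names in the fragment above is an assumption
made for illustration: the merge_action enum, the merge_counts struct,
check_stateful_merge(), format_merge_status(), the
stateful_merge_wait_since timestamp, and the key=value output format.
It also assumes the flag should be cleared whenever the wait condition
no longer holds, which is exactly the part that still needs testing.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical outcome names for illustration; not from dlm_controld. */
enum merge_action {
	MERGE_NONE,	/* no stateful merge pending */
	MERGE_RETRY,	/* merge pending, retry fencing work later */
	MERGE_KICK,	/* enough clean nodes and we are the low nodeid */
	MERGE_WAIT	/* too few clean nodes, blocked on the merge */
};

/* Hypothetical container for the counters used in the log message. */
struct merge_counts {
	int merge;
	int clean;
	int part;
	int gone;
};

static int stateful_merge_wait;
static time_t stateful_merge_wait_since;	/* assumption: when the wait began */

static enum merge_action check_stateful_merge(const struct merge_counts *c,
					      int two_node, int low,
					      int our_nodeid, time_t now)
{
	if (two_node || !c->merge) {
		stateful_merge_wait = 0;	/* assumption: clear when no merge is pending */
		return MERGE_NONE;
	}

	if ((c->clean < c->merge) && !c->part) {
		/* Record the start time only on the first transition. */
		if (!stateful_merge_wait) {
			stateful_merge_wait = 1;
			stateful_merge_wait_since = now;
		}
		return MERGE_WAIT;
	}

	stateful_merge_wait = 0;	/* assumption: wait condition no longer holds */

	if ((c->clean >= c->merge) && !c->part && (low == our_nodeid))
		return MERGE_KICK;

	return MERGE_RETRY;
}

/* Writes a key=value status line; the format here is made up. */
static int format_merge_status(char *buf, size_t len, time_t now)
{
	long secs = stateful_merge_wait ?
		    (long)(now - stateful_merge_wait_since) : 0;

	return snprintf(buf, len,
			"stateful_merge_wait=%d stateful_merge_wait_secs=%ld\n",
			stateful_merge_wait, secs);
}
```

A caller would run check_stateful_merge() wherever the fencing work
evaluates the merge counters, and call format_merge_status() from the
status-reporting path so the flag and elapsed time are visible to the
admin.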

Dave


