Re: [Drbd-dev] [RFC] Handling of internal split-brain in multiple state resources


From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: linux-ha-dev@lists.linux-ha.org
Subject: Re: [Drbd-dev] [RFC] Handling of internal split-brain in multiple state resources
Date: Mon, 20 Sep 2004 17:09:36 +0200
Message-ID: <200409201709.36849.philipp.reisner@linbit.com>
In-Reply-To: <20040910185553.GU7359@marowsky-bree.de>

[ I am not subscribed to linux-ha-dev ]

Hi Lars,

[...]
> If we notice that N1 is crashed first, that's fine. Everything will
> happen just as always, and N2 can proceed as soon as it sees the
> post-fence/stop notification, which it will see before being promoted to
> master or even being asked about it.
>
> But, from the point of view of the replicated resource on N2, this is
> indistinguishable from the split-brain; all it knows is that it lost
> connection to its peer. So it goes on to report this.
>
> If this event occurs before we have noticed a monitoring failure or full
> node failure on N1 and were using the recovery method explained so far,
> we are going to assume an internal split-brain, and tell N2 to mark
> itself outdated, and then try to tell N1 to resume.  Oops. No more
> talky-talky to N1, and we just told N2 it's supposed to refuse to become
> master.

So the algorithm in HB/CRM seems to be:

If I see that the resource (drbd) got disconnected from its peer, then {
 If the resource is a replica (secondary) then {
  tell it that it should mark itself as "desync". 
 } else /* Resource is master (primary) */ {
  Wait for the post fence event and thaw the resource.
 }
}

> So, this requires special logic - whenever one incarnation reports an
> internal split-brain, we actively need to go and verify the status of
> the other incarnations first.
>
> In which case we'd notice that, ah, N1 is down or experiencing a local
> resource failure, and instead of outdating N2, would fence / stop N1 and
> then promote N2.
>
> This is the special logic I don't much like. As Rusty put it in his
> keynote, "Fear of complexity" is good for programmers. And this reeks of
> it - extending the monitor semantics, needing an additional command on
> the secondary, _and_ needing to talk to all incarnations and then
> figuring out what to do. (I don't want to think much about partitions
> with >2 resources involved.) Alas, the problem seems to be real.
>

What about:

If I see that the resource (drbd) got disconnected from its peer, then {
 If the resource is a replica (secondary) then {
  /* do nothing */
 } else /* Resource is master (primary) */ {
  Ask the other node to do the fencing.
 }
}

If I see a fence ack then {
 Thaw the resource.
}

There is no special case in there...
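
To make the difference between the two pseudocode variants concrete, here is
a minimal Python sketch. It is only an illustration: the names Role, Action,
on_disconnect_first, on_disconnect_proposed and on_fence_ack are hypothetical
and do not exist in heartbeat, the CRM or DRBD.

```python
from enum import Enum, auto


class Role(Enum):
    PRIMARY = auto()    # master
    SECONDARY = auto()  # replica


class Action(Enum):
    MARK_OUTDATED = auto()    # tell the replica to mark itself "desync"
    WAIT_FOR_FENCE = auto()   # keep the master frozen until the post-fence event
    REQUEST_FENCING = auto()  # ask the other node to do the fencing
    NOTHING = auto()
    THAW = auto()


def on_disconnect_first(role: Role) -> Action:
    """First variant: a disconnected replica is outdated right away."""
    if role is Role.SECONDARY:
        return Action.MARK_OUTDATED
    return Action.WAIT_FOR_FENCE


def on_disconnect_proposed(role: Role) -> Action:
    """Proposed variant: only a disconnected master triggers anything."""
    if role is Role.SECONDARY:
        return Action.NOTHING
    return Action.REQUEST_FENCING


def on_fence_ack() -> Action:
    """Proposed variant: a fence ack thaws the resource."""
    return Action.THAW
```

In the problematic ordering from the quoted text (N1 is already dead and only
the replica on N2 reports the connection loss), on_disconnect_first returns
MARK_OUTDATED, so N2 would refuse promotion, while on_disconnect_proposed
returns NOTHING and N2 stays promotable.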

> Here's some other alternatives I've thought about which seem simpler,
> but which I then noticed don't solve the problem completely.
>
>
> A) Rely on the internal split-brain timeout being larger than our
> deadtime of N1 and the resource monitoring interval.
>
> This _seems_ to solve it - because then the problematic ordering does
> not occur, but relies quite a bit on timing. And if the resource on N2
> notices, for example, a connection loss immediately, this basically
> can't be made to work. Oh yeah, it can be worked around by adding delays
> etc, but that smells a bit dung-ish, too.
>

Right, ...
Currently drbd's timeout needs to be smaller than heartbeat's deadtime;
turning this the other way round is asking for trouble, I think...
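
The ordering constraint stated above can be written down as a small sanity
check. This is a sketch only: the function and parameter names are
hypothetical, both values are assumed to be already converted to seconds,
and nothing here reads a real drbd or heartbeat configuration file.

```python
def timeouts_consistent(drbd_timeout_s: float, heartbeat_deadtime_s: float) -> bool:
    """Return True if drbd would notice a lost peer before heartbeat
    declares the node dead, i.e. drbd's timeout is strictly smaller."""
    return drbd_timeout_s < heartbeat_deadtime_s


def check_timeouts(drbd_timeout_s: float, heartbeat_deadtime_s: float) -> None:
    """Raise ValueError when the ordering described above is violated."""
    if not timeouts_consistent(drbd_timeout_s, heartbeat_deadtime_s):
        raise ValueError(
            f"drbd timeout ({drbd_timeout_s}s) must be smaller than "
            f"heartbeat deadtime ({heartbeat_deadtime_s}s)"
        )
```

Alternative A from the quoted text would require the opposite relation,
which is why the two requirements cannot both hold for the same pair of
values.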

[...]

BTW, from the text I realized that heartbeat will monitor the resource (drbd),
probably by calling the resource script with a new method. Basically,
heartbeat polls DRBD for a change in the connection state.

Would you like to have an active notification from DRBD?
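
To illustrate the difference between the two approaches, a rough Python
sketch follows. Everything in it is hypothetical: ConnectionPoller,
ConnectionNotifier and the read_state callable are made-up names, and the
state strings are just example values, not an interface that DRBD or
heartbeat provides.

```python
from typing import Callable, List, Optional


class ConnectionPoller:
    """Polling: the cluster manager asks for the state at every monitor run."""

    def __init__(self, read_state: Callable[[], str]) -> None:
        self._read_state = read_state
        self._last: Optional[str] = None

    def poll(self) -> Optional[str]:
        """Return the new state if it changed since the last poll, else None."""
        state = self._read_state()
        changed = state != self._last
        self._last = state
        return state if changed else None


class ConnectionNotifier:
    """Active notification: the resource pushes every change to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def state_changed(self, new_state: str) -> None:
        """Called by the resource itself as soon as its state changes."""
        for callback in self._subscribers:
            callback(new_state)
```

With polling, a connection loss is seen only at the next poll() call, so the
detection delay is bounded by the monitor interval. With notification, the
subscriber hears about it as soon as state_changed() is called.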

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

