Re: [Drbd-dev] [RFC] Handling of internal split-brain in multiple state resources

Distributed Replicated Block Device (DRBD) development

From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: linux-ha-dev@lists.linux-ha.org
Subject: Re: [Drbd-dev] [RFC] Handling of internal split-brain in multiple state resources
Date: Mon, 20 Sep 2004 17:09:36 +0200
Message-ID: <200409201709.36849.philipp.reisner@linbit.com>
In-Reply-To: <20040910185553.GU7359@marowsky-bree.de>

[ I am not subscribed to linux-ha-dev ]

Hi Lars,

[...]
> If we notice that N1 is crashed first, that's fine. Everything will
> happen just as always, and N2 can proceed as soon as it sees the
> post-fence/stop notification, which it will see before being promoted to
> master or even being asked about it.
>
> But, from the point of view of the replicated resource on N2, this is
> indistinguishable from the split-brain; all it knows is that it lost
> connection to its peer. So it goes on to report this.
>
> If this event occurs before we have noticed a monitoring failure or full
> node failure on N1 and were using the recovery method explained so far,
> we are going to assume an internal split-brain, and tell N2 to mark
> itself outdated, and then try to tell N1 to resume.  Oops. No more
> talky-talky to N1, and we just told N2 it's supposed to refuse to become
> master.

So the algorithm in HB/CRM seems to be:

If I see that the resource (drbd) got disconnected from its peer, then {
 If the resource is a replica (secondary) then {
  Tell it that it should mark itself as "desync".
 } else /* Resource is master (primary) */ {
  Wait for the post-fence event and thaw the resource.
 }
}

> So, this requires special logic - whenever one incarnation reports an
> internal split-brain, we actively need to go and verify the status of
> the other incarnations first.
>
> In which case we'd notice that, ah, N1 is down or experiencing a local
> resource failure, and instead of outdating N2, would fence / stop N1 and
> then promote N2.
>
> This is the special logic I don't much like. As Rusty put it in his
> keynote, "Fear of complexity" is good for programmers. And this reeks of
> it - extending the monitor semantics, needing an additional command on
> the secondary, _and_ needing to talk to all incarnations and then
> figuring out what to do. (I don't want to think much about partitions
> with >2 resources involved.) Alas, the problem seems to be real.
>

What about:

If I see that the resource (drbd) got disconnected from its peer, then {
 If the resource is a replica (secondary) then {
  /* do nothing */
 } else /* Resource is master (primary) */ {
  Ask the other node to do the fencing.
 }
}

If I see a fence ack then {
 Thaw the resource.
}

There is no special case in there...
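
A minimal sketch of the two handlers above, in Python, to show that the
crash case and the split-brain case go through the same code path. The
names Resource, Manager, on_disconnect and on_fence_ack are hypothetical
and not part of drbd or heartbeat. The sketch also assumes that a primary
freezes I/O when it loses its peer, and it only records the fence request
instead of carrying it out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"      # master
    SECONDARY = "secondary"  # replica


@dataclass
class Resource:
    """Hypothetical stand-in for one drbd incarnation as seen by HB/CRM."""
    node: str
    role: Role
    frozen: bool = False  # assumption: a primary freezes I/O on connection loss


@dataclass
class Manager:
    """Hypothetical HB/CRM-side handler for the two events."""
    fence_requests: list = field(default_factory=list)

    def on_disconnect(self, res: Resource, peer_node: str) -> None:
        if res.role is Role.SECONDARY:
            return  # do nothing
        res.frozen = True
        # Ask the other node to do the fencing; here we only record the request
        # as (requesting node, other node).
        self.fence_requests.append((res.node, peer_node))

    def on_fence_ack(self, res: Resource) -> None:
        res.frozen = False  # thaw the resource


if __name__ == "__main__":
    crm = Manager()
    n1 = Resource("N1", Role.PRIMARY)
    n2 = Resource("N2", Role.SECONDARY)

    crm.on_disconnect(n2, peer_node="N1")  # secondary: nothing happens
    crm.on_disconnect(n1, peer_node="N2")  # primary: freeze, request fencing
    print(n1.frozen, crm.fence_requests)
    crm.on_fence_ack(n1)                   # fence ack arrives: thaw
    print(n1.frozen)
```

If N1 has crashed, only the first call ever happens: N2 is left untouched
and can still be promoted once the post-fence/stop notification for N1
has been seen.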

> Here's some other alternatives I've thought about which seem simpler,
> but which I then noticed don't solve the problem completely.
>
>
> A) Rely on the internal split-brain timeout being larger than our
> deadtime of N1 and the resource monitoring interval.
>
> This _seems_ to solve it - because then the problematic ordering does
> not occur, but relies quite a bit on timing. And if the resource on N2
> notices, for example, a connection loss immediately, this basically
> can't be made to work. Oh yeah, it can be worked around by adding delays
> etc, but that smells a bit dung-ish, too.
>

Right.
Currently drbd's timeout needs to be smaller than heartbeat's deadtime;
turning this the other way round is asking for trouble, I think.
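
To make the ordering argument concrete, here is a small Python sketch.
It assumes, as a simplification, that both timers start at the moment N1
fails and that nothing else delays the reports; the function name and the
example values are hypothetical, and the resource monitoring interval is
left out.

```python
def first_event(drbd_timeout_s: float, hb_deadtime_s: float) -> str:
    """Return which event the cluster sees first after N1 fails at t=0.

    Assumption: both timers start at the moment of failure. A tie is
    reported as "disconnect", since that is the problematic ordering.
    """
    if hb_deadtime_s < drbd_timeout_s:
        return "node-dead"
    return "disconnect"


if __name__ == "__main__":
    # Today's requirement: drbd's timeout is smaller than the deadtime.
    print(first_event(drbd_timeout_s=6.0, hb_deadtime_s=10.0))
    # Alternative A would need the inverse relation.
    print(first_event(drbd_timeout_s=30.0, hb_deadtime_s=10.0))
```

With the current relation between the two values the disconnect is
reported first, so the recovery logic has to cope with exactly the
ordering that alternative A tries to rule out.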

[...]

BTW, from the text I realized that heartbeat will monitor the resource (drbd),
probably by calling the resource script with a new method. Basically,
heartbeat polls DRBD for a change in the connection state.

Would you like to have an active notification from DRBD?

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
