Distributed Replicated Block Device (DRBD) development
From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Another drbd race
Date: Tue, 7 Sep 2004 11:39:29 +0200
Message-ID: <200409071139.29609.philipp.reisner@linbit.com>
In-Reply-To: <20040904100008.GA14645@nudl>

On Saturday 04 September 2004 12:00, Lars Ellenberg wrote:
> On Sat, Sep 04, 2004 at 11:48:14AM +0200, Lars Marowsky-Bree wrote:
> > Hi,
> >
> > lge and I discussed a 'new' drbd race condition yesterday and also
> > touched on its resolution.
> >
> > Scope: in a split-brain, drbd might confirm writes to the clients and
> > might, on a subsequent failover, lose transactions which _have been
> > confirmed_. This is not acceptable.
> >
> > Sequence:
> >
> > Step  N1  Link    N2
> > 1     P   ok      S
> > 2     P   breaks  S   node1 notices, goes into stand alone,
> >                       stops waiting for N2 to confirm.
> > 3     P   broken  S   S notices, initiates fencing
> > 4     x   broken  P   N2 becomes primary
> >
> > Writes which have been done between steps 2 and 4 will have been
> > confirmed to the higher layers, but are not actually available on N2.
> > This is data loss; N2 is still consistent, but has lost confirmed
> > transactions.
> >
> > Partially, this is solved by the Oracle-requested "only ever confirm if
> > committed to both nodes", but of course then if it's not a broken link,
> > but N2 really went down, we'd be blocking on N1 forever, which we don't
> > want to do for HA.
> >
> > So, here's the new sequence to solve this:
> >
> > Step  N1      Link  N2
> > 1     P       ok    S
> > 2     P(blk)  ok    X       P blocks waiting for acks; heartbeat
> >                             notices that it has lost N2, and initiates
> >                             fencing.
> > 3     P(blk)  ok    fenced  heartbeat tells drbd on N1 that yes, we
> >                             know it's dead, we fenced it, no point
> >                             waiting.
> > 4     P       ok    fenced  Cluster proceeds to run.
> >
> > Now, in this super-safe mode, if N1 also fails after step 3 but
> > before N2 comes back up and is resynced, we need to make sure that N2
> > refuses to become primary by itself. This will probably require
> > additional magic in the cluster manager to handle correctly, but N2
> > needs an additional flag to prevent this from happening by accident.
> >
> > Lars?
>
> I think we can already do this detection with the combination of the
> Consistent, Connected, and HaveBeenPrimary flags. Only the logic
> needs to be built in.
>

I do not want to "misuse" the Consistent Bit for this.

!Consistent  .... means that we are in the middle of a sync.
                   = data is not usable at all.
 Fenced      .... our data is 100% okay, but not the latest copy.
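
The distinction above can be sketched as two independent bits. This is
only an illustration, not drbd code: the names DiskFlags, data_usable
and may_become_primary are hypothetical, and the promotion rule assumes
that a fenced node must not be promoted without further action.

```python
from enum import Flag


class DiskFlags(Flag):
    """Hypothetical per-node state bits; names are illustrative only."""
    CONSISTENT = 1  # cleared while the node is in the middle of a sync
    FENCED = 2      # data is 100% okay, but not the latest copy


def data_usable(flags: DiskFlags) -> bool:
    # Data is unusable only while a sync is in progress (!Consistent).
    return bool(flags & DiskFlags.CONSISTENT)


def may_become_primary(flags: DiskFlags) -> bool:
    # A fenced node holds usable data but must not promote itself.
    return data_usable(flags) and not (flags & DiskFlags.FENCED)
```

Keeping the bits separate means a fenced node can still report its data
as usable, which overloading Consistent could not express.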


> Most likely, right after connection loss the Primary should block for a
> configurable (default: infinity?) amount of time before giving end_io
> events back to the upper layer.
> We then need to be able to tell it to resume operation (we can do this
> as soon as we have taken precautions to prevent the Secondary from
> becoming Primary without being forced or resynced first).
>
> Or, if the cluster decides to do so, the Secondary has time to STONITH
> the Primary (while that is still blocking) and take over.
>
> I want to include a timeout, so the cluster manager doesn't need to
> know about a "peer is dead" notification; it only needs to know about
> STONITH.

I see. Makes sense, but on the other hand STONITH (more general:
FENCING) might fail, as LMB points out in one of the other mails.

-> We should probably _not_ offer a timeout here: as soon as
   "on-disconnect freeze_io;" is set, I/O stays frozen forever,
   or until it gets a "drbdadm resume-io r0" from the cluster manager.
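
A toy model of that behaviour, as a sketch only: the class
FreezeIoDevice and its methods are hypothetical and merely mirror the
proposed "freeze_io" option and "resume-io" command. Completions are
held without any timer and released only by an explicit resume.

```python
from typing import Callable, List


class FreezeIoDevice:
    """Toy model of a primary with a hypothetical freeze_io policy."""

    def __init__(self) -> None:
        self.frozen = False
        self._held: List[Callable[[], None]] = []

    def on_disconnect(self) -> None:
        # Peer connection lost: stop confirming writes to upper layers.
        self.frozen = True

    def write(self, end_io: Callable[[], None]) -> None:
        if self.frozen:
            # Hold the completion; no timeout ever releases it.
            self._held.append(end_io)
        else:
            end_io()

    def resume_io(self) -> None:
        # Invoked by the cluster manager once the peer is known fenced.
        self.frozen = False
        held, self._held = self._held, []
        for end_io in held:
            end_io()
```

With this shape the only way out of the frozen state is resume_io(), so
a failed fencing attempt leaves writes unconfirmed instead of silently
confirming them.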

> Maybe we want to introduce this functionality as a new wire protocol,
> or only in proto C.
>

I see it controlled by the

"on-disconnect freeze_io;" option.

For N2 we need a "drbdadm fence-off r0" command and for N1 we need 
a "drbdadm resume-io r0".

* The fenced bit gets cleared when the resync is finished.
* A node refuses to become primary when the fenced bit is set.
* "drbdadm -- --do-what-I-say primary r0" overrules (and clears?)
  the fenced bit.
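
The three rules above can be sketched as a small state model. This is
an illustration under assumptions, not drbd code: Node, FencedError and
the method names are hypothetical, and whether a forced promotion also
clears the bit is still open, so it is left as a parameter.

```python
class FencedError(Exception):
    """Raised when a fenced node is asked to become primary."""


class Node:
    def __init__(self) -> None:
        self.role = "secondary"
        self.fenced = False

    def fence_off(self) -> None:
        # Mirrors the proposed "drbdadm fence-off r0".
        self.fenced = True

    def resync_finished(self) -> None:
        # Rule 1: a completed resync clears the fenced bit.
        self.fenced = False

    def become_primary(self, force: bool = False,
                       clear_on_force: bool = True) -> None:
        # Rule 2: refuse promotion while fenced, unless forced.
        if self.fenced and not force:
            raise FencedError("fenced bit set; resync or force needed")
        # Rule 3: force overrules; clearing the bit is an assumption.
        if force and clear_on_force:
            self.fenced = False
        self.role = "primary"
```

For example, after fence_off() a plain become_primary() raises
FencedError, while become_primary(force=True) succeeds.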

To be defined: What should we do at node startup with the fenced bit?
               (At least display it in the user dialog.)

-philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
