Re: [Drbd-dev] Another drbd race


From: Lars Marowsky-Bree <lmb@suse.de>
To: Lars Ellenberg <lars.ellenberg@linbit.com>, drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Another drbd race
Date: Sat, 4 Sep 2004 12:18:14 +0200
Message-ID: <20040904101813.GF11820@marowsky-bree.de>
In-Reply-To: <20040904100008.GA14645@nudl>

On 2004-09-04T12:00:08,
   Lars Ellenberg <lars.ellenberg@linbit.com> said:

Yep, that should be enough to detect this on the secondary. But:

> Most likely, right after connection loss the Primary should block for a
> configurable (default: infinity?) amount of time before giving end_io
> events back to the upper layer.
> We then need to be able to tell it to resume operation (we can do this
> as soon as we have taken precautions to prevent the Secondary from
> becoming Primary unless it is forced or has been resynced).
> 
> Or, if the cluster decides to do so, the Secondary has time to STONITH
> the Primary (while that is still blocking) and take over.
> 
> I want to include a timeout, so the cluster manager doesn't need to
> know about a "peer is dead" notification; it only needs to know about
> STONITH.

If it defaults to an 'infinite' timeout, which is safe, we need the
resume operation. (Or rather, notification about the successful "peer is
dead now" event.) This is easy to add.

And it is needed, because 

a) if the fencing _failed_, the primary needs to stay blocked until it
eventually succeeds. This is a correctness issue.

b) otherwise drbd would _always_ block for at least that amount of time
after losing the secondary, even if the secondary had already been fenced
seconds earlier (we may even have fenced it before drbd's internal peer
timeout hits, in which case drbd would never need to block at all). This
is a performance issue.

The combination of a+b gives a very good argument for having a resume
operation, which the new CRM will be able to drive in a couple of weeks
;-)

> Maybe we want to introduce this functionality as a new wire protocol,
> or only in proto C.

It doesn't actually need to be a new wire protocol; it just needs an
additional option set (i.e., the Oracle mode) and the 'resume' operation
on the primary. Actually, that could even be mapped to an explicit switch
from WFConnection to StandAlone.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	   \\\  /// 
SUSE Labs, Research and Development \honk/ 
SUSE LINUX AG - A Novell company     \\//
