From: Eric Barton <eeb@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
Date: Mon, 22 Jun 2009 18:53:01 +0100
Message-ID: <06b201c9f362$49015b20$db041160$@com>
In-Reply-To: <4A3C0CF2.1080809@cray.com>

Chris,

Comment inline...

> -----Original Message-----
> From: lustre-devel-bounces at lists.lustre.org [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf Of Chris Horn
> Sent: 19 June 2009 11:11 PM
> To: lustre-devel at lists.lustre.org
> Subject: Re: [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
> 
> Oops, I forgot to cc lustre-devel.
> 
> Johann Lombardi wrote:
> 
> > On Jun 19, 2009, at 1:10 AM, Chris Horn wrote:
> >
> > > It seems as though an ability to short circuit is only going to be
> > > useful if we can distinguish between the case where we only need a short
> > > recovery window vs. the case where we need that extra time.  My question
> > > is, what are the use cases where this applies?
> > >
> > > My intuition is the following:
> > > Case 1:  x/y clients which are dead, (y-x)/y clients connected to the
> > > backup server (all clients that can connect have done so).  We want to
> > > go ahead and short circuit.
> >
> > That's the 2nd aspect of imperative recovery. We want to notify the
> > server when all clients that were supposed to reconnect should
> > have done so already. Basically, the idea is to tell the server that
> > no new clients will reconnect now and that it is not needed to wait
> > any longer for new clients to join (the x clients).
>
> I just want to verify that in order to use this 2nd aspect of imperative
> recovery we need some method of determining client health, yes?

Yes.  

Consider a utility that runs on a client to notify it to reconnect to a
failover server, and which completes with a success status only when the
client has reconnected successfully.

If you run this utility on all clients after starting a failover server,
you can notify the server to close the recovery window once all instances have
completed successfully, since that tells you that all clients are healthy and
ready to participate in recovery.

Of course, you can decide to stop waiting and proceed with the server
notification at any time you like.  You can base this decision on a timeout,
on how many clients have reconnected successfully, or on any other criterion
you choose - i.e. you are now the effective arbiter of client health.
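
To make the procedure concrete, here is a minimal Python sketch of a driver
for it.  Nothing in it is an existing Lustre interface: notify_client and
notify_server are hypothetical callables standing in for the per-client
utility and the server notification described above, and timeout_s and
min_reconnected are just two example stopping criteria.

```python
import concurrent.futures as cf


def close_recovery_window(clients, notify_client, notify_server,
                          timeout_s=None, min_reconnected=None):
    """Tell every client to reconnect, then tell the server to stop waiting.

    notify_client(client) -> bool  (hypothetical) returns True only once the
                                   client has reconnected to the failover server.
    notify_server()                (hypothetical) closes the recovery window.
    Returns the set of clients known to have reconnected.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(clients)))
    futures = {pool.submit(notify_client, c): c for c in clients}
    reconnected = set()
    try:
        # Collect results as the per-client utilities finish.
        for fut in cf.as_completed(futures, timeout=timeout_s):
            try:
                ok = fut.result()
            except Exception:
                ok = False  # a failed utility run counts as "not reconnected"
            if ok:
                reconnected.add(futures[fut])
            # Optional criterion: enough clients are back, stop waiting.
            if min_reconnected is not None and len(reconnected) >= min_reconnected:
                break
    except cf.TimeoutError:
        pass  # Optional criterion: we have waited long enough.
    finally:
        # Do not block on stragglers (cancel_futures needs Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    # The caller is the arbiter of client health: notify the server now.
    notify_server()
    return reconnected
```

With neither criterion set, the driver waits for every instance to finish,
which is the "all clients are healthy" case; setting either one lets you
proceed early on your own judgement.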

    Cheers,
              Eric
