From: Nicholas Henke <nic@cray.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] imperative recovery
Date: Fri, 09 Jan 2009 09:27:53 -0600
Message-ID: <49676CF9.5050805@cray.com>
In-Reply-To: <494AAF4A.4030304@sun.com>

Nathaniel Rutman wrote:
> Eric Barton wrote:
>>> Other options I've thought of to explore this idea:
>>>
>>> - MGS notifies clients (somehow) after a server has restarted.
>>>     
> This seems like a no-brainer easy win today, and doesn't depend on any 
> advanced features like message priority.  The only scalability issue 
> would seem to be the broadcast of the message to all clients, but this 
> is no different than the current broadcast mechanism the MGS employs to 
> update client configs.  The message from the MGS would be taken as a 
> suggestion, "Why don't y'all time out all your current RPCs since I 
> noticed OST0004 restarted.  Oh, and use failover nid #2."  Current 
> replay/recovery need not be touched.

This would be a great enhancement for OSS failover or reboot; it is really the 
only way we'll get recovery times under ~2.5 x obd_timeout. Adaptive Timeouts 
really aren't buying us much here: at scale and under load, we are seeing the 
timeouts approach the usual static obd_timeout of 300s. It only takes one client 
with a higher timeout to push the recovery time out.
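
To put rough numbers on that, here is a minimal sketch. It assumes the recovery 
window scales as about 2.5 x the largest timeout held by any single client. The 
function name and the sample timeouts are made up for illustration and are not 
Lustre code.

```python
# Illustrative only: a back-of-the-envelope model, not Lustre code.
RECOVERY_FACTOR = 2.5  # the "~2.5 x obd_timeout" figure quoted above


def estimated_recovery_seconds(client_timeouts):
    """Estimate the recovery window from per-client timeouts (in seconds).

    Assumption: recovery must wait on the slowest client, so the window
    is driven by the maximum timeout, not the average.
    """
    return RECOVERY_FACTOR * max(client_timeouts)


# Every client at the static obd_timeout of 300s.
print(estimated_recovery_seconds([300]))  # 750.0

# 999 clients whose adaptive timeout settled at 50s, plus one at 300s:
# the single slow client sets the window for everyone.
print(estimated_recovery_seconds([50] * 999 + [300]))  # 750.0
```

Under this model, lowering the timeout on most clients does not help unless it 
is lowered on all of them.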

I do think this will miss a significant case: a combined MGS+MDS node, 
presumably because the MGS goes down together with the MDS and cannot send the 
notification. A majority of our customers are deploying with this configuration. 
Perhaps exposing this mechanism on the clients via a /proc file would be 
enough; that way a failover framework could manually trigger the timeout and/or 
nid switching.
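
A minimal sketch of what the failover-framework side might look like. No such 
/proc file exists in what is described here, so everything below is 
hypothetical: the control-file path is passed in by the caller, and the 
"<target> [<nid>]" line format, the function name, and the sample nid are 
invented for illustration.

```python
import os
import tempfile


def request_imperative_recovery(control_path, target, failover_nid=None):
    """Write a recovery hint for `target` to a client-side control file.

    Hypothetical interface: the "<target> [<nid>]" line format is an
    assumption of this sketch, not an existing Lustre /proc format.
    """
    line = target if failover_nid is None else f"{target} {failover_nid}"
    with open(control_path, "w") as f:
        f.write(line + "\n")
    return line


# Demo against a temporary file standing in for the hypothetical /proc entry.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "imperative_recovery")
    request_imperative_recovery(path, "OST0004", "10.0.0.2@tcp")
    with open(path) as f:
        print(f.read().strip())  # OST0004 10.0.0.2@tcp
```

The point of the sketch is only that the trigger would come from outside the 
MGS, so it would still work when the MGS itself is the node being failed over.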

Nic
