From: Nicholas Henke <nic@cray.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] imperative recovery
Date: Fri, 09 Jan 2009 09:27:53 -0600
Message-ID: <49676CF9.5050805@cray.com>
In-Reply-To: <494AAF4A.4030304@sun.com>

Nathaniel Rutman wrote:
> Eric Barton wrote:
>>> Other options I've thought of to explore this idea:
>>>
>>> - MGS notifies clients (somehow) after a server has restarted.
>>>
> This seems like a no-brainer easy win today, and doesn't depend on any
> advanced features like message priority. The only scalability issue
> would seem to be the broadcast of the message to all clients, but this
> is no different than the current broadcast mechanism the MGS employs to
> update client configs. The message from the MGS would be taken as a
> suggestion, "Why don't y'all time out all your current RPCs since I
> noticed OST0004 restarted. Oh, and use failover nid #2." Current
> replay/recovery need not be touched.

This would be a great enhancement for OSS failover or reboot; it is really the
only way we'll get recovery times under ~2.5 x obd_timeout. Adaptive Timeouts
really aren't buying us much here: at scale and under load we are seeing the
timeouts approach the usual static obd_timeout of 300s. It only takes one client
with a higher timeout to push the recovery time out.
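
To make the arithmetic concrete, here is a rough sketch, not actual Lustre
code. It assumes the recovery window is simply the ~2.5x factor above applied
to the largest timeout held by any client; the function name and the client
counts are made up for illustration.

```python
def recovery_window(client_timeouts_s, factor=2.5):
    """Estimate the recovery window in seconds.

    Illustrative model only (an assumption, not Lustre's real logic):
    the server waits factor x the largest timeout any client holds.
    """
    if not client_timeouts_s:
        raise ValueError("need at least one client timeout")
    return factor * max(client_timeouts_s)


# 1000 clients whose adaptive timeouts all settled at 100s
uniform = recovery_window([100] * 1000)

# the same population, but one client has drifted up to the static 300s
skewed = recovery_window([100] * 999 + [300])

print(uniform, skewed)  # 250.0 750.0
```

Under this model a single client at 300s takes the window from 250s to 750s,
no matter how well the other 999 are behaving.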

I do think this will miss a significant case: a combined MGS+MDS, where the MGS
goes down with the MDS and so cannot send the notification. A majority of our
customers deploy this configuration. Perhaps exposing this mechanism on the
clients via a /proc file would be enough; that way a failover framework could
manually trigger the timeout and/or nid switching.

Nic