From: Peter Braam <Peter.Braam@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Failover & Force export for the DMU
Date: Wed, 16 Apr 2008 08:37:04 -0700
Message-ID: <C42B6B30.37C3%peter.braam@sun.com>
"Force export" for the DMU serves a purpose similar to a feature we added
for block devices in Linux in relation to exports. When failover is
initiated, the OSS/MDS servers stop sending replies, and requests that are
still being processed interact with the block devices in a model where the
devices discard write commands WITHOUT returning errors. This is different
from merely declaring the device READONLY, in which case errors are returned.
The latter is a default feature in the Linux kernel; what we did is a patch
(but it could be a mapper module).
The thinking behind this approach was (many years ago) that this avoids
exposing the server layers to errors (caused by writes to read only devices)
from the block devices which might cause the server to panic, thereby taking
out other targets inadvertently.
However, the approach is flawed. It is (theoretically, though not very
likely) possible for the server to write something, believe the write has
completed, read it back, get the wrong data (because it wasn't written),
and still panic.
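To make the difference between the two device models concrete, here is a
minimal sketch. It is a toy in-memory model, not the actual kernel patch:
the BlockDevice class, its "mode" field and the write_and_verify helper are
all hypothetical names invented for illustration.

```python
import errno


class BlockDevice:
    """Toy in-memory block device (hypothetical, for illustration only)."""

    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.mode = "rw"  # one of "rw", "ro", "discard"

    def write(self, idx, data):
        if self.mode == "ro":
            # Normal read-only device: the write fails visibly.
            raise OSError(errno.EROFS, "Read-only file system")
        if self.mode == "discard":
            # Force-export style: drop the write but report success.
            return
        self.blocks[idx] = data

    def read(self, idx):
        return self.blocks[idx]


def write_and_verify(dev, idx, data):
    """Write a block, then read it back and compare."""
    dev.write(idx, data)
    return dev.read(idx) == data
```

In "discard" mode write_and_verify() returns False with no error raised
anywhere, which is the silent inconsistency described above. In "ro" mode
the same call raises OSError with errno EROFS at the point of the write, so
the caller at least knows the write did not happen.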
So I would like to suggest that for the DMU we do this differently and rely
on a normal read only device. So, the server, during recovery, will be
using standard read only devices (and similar under the DMU). If the file
system or DMU returns errors because writes cannot be performed for requests
that are in progress during the failover event, then these errors should be
handled gracefully (without panics). Note that the errors will never reach
the client, not over the network and not through reply reconstruction,
because failover was initiated before they happened.
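As a rough sketch of what "handled gracefully" could look like, the
following toy model drops a failed in-flight write instead of panicking and
never produces a reply for it. Everything here is an assumption made for
illustration: Target, Store, start_failover and handle_write are
hypothetical names, not Lustre or DMU interfaces.

```python
import errno


class Store:
    """Toy backing store that can be switched to read-only (hypothetical)."""

    def __init__(self):
        self.read_only = False
        self.data = []

    def set_read_only(self):
        self.read_only = True

    def write(self, data):
        if self.read_only:
            raise OSError(errno.EROFS, "Read-only file system")
        self.data.append(data)


class Target:
    """Toy server target (hypothetical, not a Lustre API)."""

    def __init__(self, store):
        self.store = store
        self.failing_over = False
        self.replies = []  # replies that would be sent to clients
        self.dropped = 0   # requests abandoned during failover

    def start_failover(self):
        # Stop replying first, then make the device read-only.
        self.failing_over = True
        self.store.set_read_only()

    def handle_write(self, req_id, data):
        try:
            self.store.write(data)
        except OSError as e:
            if e.errno == errno.EROFS and self.failing_over:
                # Expected during failover: drop the request quietly.
                self.dropped += 1
                return
            raise  # any other error is still a real error
        if not self.failing_over:
            self.replies.append((req_id, "ok"))
```

A write handled before start_failover() is stored and answered; a write
handled afterwards hits EROFS, is counted as dropped, and produces no
reply, so the error never reaches a client.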
The hacked feature retains value because it can generate an artificially
large amount of rollback data, which is useful for testing the replay
recovery mechanisms in Lustre. However, with DMU snapshots this can easily
be simulated in a different manner.
Nikita, Alex - I think the key issue here is that the error handling in the
new servers that you have written needs to be resilient enough to handle
this. Can you think about it?
Ricardo - for the DMU all you need to do is make sure you can quickly turn a
device read only below the DMU and that the DMU can handle that (it's like
doing "mount -o remount,ro").
Regards
Peter