From: Andreas Dilger <adilger@clusterfs.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>,
Kalpak Shah <kalpak@clusterfs.com>,
linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [RFC][PATCH] Multiple mount protection
Date: Fri, 1 Jun 2007 12:00:04 -0600
Message-ID: <20070601180003.GX5181@schatzie.adilger.int>
In-Reply-To: <20070601135241.GB28663@thunk.org>
On Jun 01, 2007 09:52 -0400, Theodore Tso wrote:
> On Fri, Jun 01, 2007 at 02:13:39PM +0200, Andi Kleen wrote:
> > Clusters usually have other ways to do this, haven't they?
> > Typically they have STONITH too. It's probably too simple minded
> > to just replace a real cluster setup which also handles split
> > brain and other conditions. So it's purely against mistakes.
>
> Yes, its only real value is to protect against Cluster-HA
> malfunctions or misconfiguration.
While I agree that HA systems _should_ be enough for this, in our
experience some people get it wrong even with an HA system (e.g.
manually mounting and bypassing HA, a broken HA setup, a comms failure,
a STONITH failure, etc).
I agree it is not intended to replace an HA/STONITH solution. It is
just belt & suspenders, but it would have saved hundreds of TB of user
data in several cases had it been available. We will enable it by
default on all of our filesystems, and of course I'd advise anyone in
a SAN environment (whether they _intend_ to have shared disk access or
not) to enable it as well.
> > Besides relying on it would seem dangerous because it is not synchronous
> > and you could do a lot of damage in 5 seconds.
>
> Well, the MMP feature is assigned an incompatible feature bit, so a
> kernel that doesn't know about MMP will refuse to touch it; and a
> kernel which does follow the MMP protocol will check the MMP block
> (delaying the mount by 10 seconds) to make sure no other system is
> using the block.
Correct. There is a "fast path" where it will wait a shorter time
during mount if the fs is reported cleanly unmounted. We can't skip
the check entirely, because 2 systems might be mounting at the same
time.
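To make the mount-time sequence concrete, here is a rough userspace model in Python. It is only a sketch under stated assumptions, not the code from the patch: the MMPBlock layout, FakeDevice, the bit value, the SEQ_CLEAN marker, the 1s fast-path delay and the second "claim and re-check" wait are all hypothetical. Only the 10 second delay, the shorter wait after a clean unmount, and the refusal on an unknown incompat bit come from the discussion above.

```python
import time
from dataclasses import dataclass

# Every name and value here is an assumption made for illustration.
FEATURE_INCOMPAT_MMP = 0x0100  # hypothetical bit value
SEQ_CLEAN = 0                  # hypothetical "cleanly unmounted" marker
FULL_WAIT = 10.0               # seconds; the mount delay quoted above
FAST_WAIT = 1.0                # seconds; hypothetical fast-path delay


@dataclass(frozen=True)
class MMPBlock:
    seq: int        # bumped periodically by whoever has the fs mounted
    node: str = ""  # who wrote the block last


class FakeDevice:
    """Stand-in for the shared disk: holds only the MMP block."""

    def __init__(self, block):
        self.block = block

    def read(self):
        return self.block

    def write(self, block):
        self.block = block


def may_touch(fs_incompat, supported):
    """Refuse any filesystem carrying an incompat bit we do not understand."""
    return (fs_incompat & ~supported) == 0


def mmp_mount_check(dev, node, sleep=time.sleep):
    """Return True if it looks safe for `node` to mount, False otherwise."""
    seen = dev.read()
    # Fast path: a clean unmount shortens the wait but never removes it.
    sleep(FAST_WAIT if seen.seq == SEQ_CLEAN else FULL_WAIT)
    if dev.read() != seen:
        return False  # another system updated the block while we watched
    mine = MMPBlock(seq=1, node=node)
    dev.write(mine)
    # Second wait catches a peer that passed the first check alongside us.
    sleep(FAST_WAIT)
    return dev.read() == mine
```

With nobody else touching the block, mmp_mount_check(FakeDevice(MMPBlock(SEQ_CLEAN)), "node-a", sleep=lambda s: None) returns True. The second wait in this model is why the check cannot be skipped even after a clean unmount: two nodes mounting at once would both see an unchanged block.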
> So aside from being !@#!@ annoying (which is why it will never be the
> default), it does work, modulo the problem that without STONITH or any
> kind of I/O fencing, we do risk the other system coming back to life
> and then modifying the filesystem in parallel. So as everyone has
> said, this is not a solution that works in isolation, but is really only
> a backup.
If kmmpd has not been scheduled for more than 10s, it will re-read the
block to ensure that the local system is still the one in control. If
not, it will call ext3_error() and (in our case at least) this will make
the client fs read-only. Even if some IO leaks from the local client,
this is far better than continuing to run with 2 systems writing to the
same disk.
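As a rough illustration of that stall check, here is a small userspace model in Python. It is a sketch, not the patch: MMPState, kmmpd_tick() and the (seq, node) tuple stored in the block are hypothetical names and layout, and setting read_only stands in for ext3_error() on a filesystem configured to remount read-only on errors. Only the 10s threshold and the re-read-after-stall behaviour are taken from the description above.

```python
from dataclasses import dataclass

STALL_LIMIT = 10.0  # seconds; threshold from the text above


@dataclass
class MMPState:
    node: str            # this system's identity (hypothetical field)
    seq: int             # last sequence number we wrote
    last_run: float      # timestamp of the previous successful tick
    read_only: bool = False


def kmmpd_tick(state, read_block, write_block, now):
    """One iteration of a kmmpd-like loop.

    read_block() returns the (seq, node) tuple currently on disk and
    write_block() stores a new one. Returns False once ownership is lost.
    """
    stalled = (now - state.last_run) > STALL_LIMIT
    if stalled and read_block() != (state.seq, state.node):
        # Someone else wrote the block while we were not running.
        # Stand-in for ext3_error() making the fs read-only.
        state.read_only = True
        return False
    # Still in control (or never stalled): publish a fresh sequence number.
    state.seq += 1
    write_block((state.seq, state.node))
    state.last_run = now
    return True
```

Note that this model, like the description, only re-reads the block after a stall, so it narrows the window for two writers rather than closing it.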
Ideally there would also be block-layer functionality to fence the IO
on the local system (e.g. plugging the elevator output; I don't think
anything can be done about IO already submitted to the device).
However, the function I thought did this (set_device_rdonly()) is only
checked at mount time, so it is useless for this purpose.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.