From: Lars Marowsky-Bree <lmb@suse.de>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [RFC] Integration with external clustering
Date: Thu Oct 20 05:58:09 2005
Message-ID: <20051020105755.GE11726@marowsky-bree.de>
In-Reply-To: <20051019224221.GB4305@redhat.com>
On 2005-10-19T17:42:21, David Teigland <teigland@redhat.com> wrote:
> Just catching up on this after being away for a while. Not only has cman
> moved entirely to user space, but a large portion of gfs (everything
> related to cman and clustering) has also moved to user space. So, a user
> space gfs daemon (call it gfs_clusterd) interacts with the other user
> space clustering systems and drives the bits of gfs in the kernel.

Morning David, thanks for your insights!

> Here are the main "knobs" gfs_clusterd uses to control a specific fs:
>
> /sys/fs/gfs2/<fs_name>/lock_module/
>     block
>     mounted
>     jid
>     recover
>
> When a gfs fs is mounted on a node:
>
> . the mount process enters gfs-kernel
> . the mount process sends a simple uevent to gfs_clusterd
> . the mount process waits for gfs_clusterd to write 1 to /sys/.../mounted
>
> . gfs_clusterd gets the mount uevent from gfs-kernel
> . gfs_clusterd joins the cluster-wide "group" that represents the
>   specific fs being mounted [1]
> . gfs_clusterd tells gfs-kernel which journal the local node will use by
>   writing the journal id to /sys/.../jid
> . gfs_clusterd tells the mount process it can continue by writing 1
>   to /sys/.../mounted
> . the local node now has the fs mounted
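
For illustration only, the daemon-side half of the handshake quoted above might be sketched in Python roughly as follows. This is not gfs_clusterd code: the function name `complete_mount`, the `sysfs_dir` parameter, and the use of an ordinary directory in place of /sys/fs/gfs2/<fs_name>/lock_module/ are assumptions made so the example runs anywhere. Only the file names `jid` and `mounted` and the order of the two writes are taken from the description.

```python
import os
import tempfile


def complete_mount(sysfs_dir, journal_id):
    """Hypothetical daemon-side step: assign a journal, then release the mounter.

    sysfs_dir stands in for /sys/fs/gfs2/<fs_name>/lock_module/ (assumption).
    """
    # Tell the kernel side which journal this node will use.
    with open(os.path.join(sysfs_dir, "jid"), "w") as f:
        f.write(str(journal_id))
    # Only afterwards let the waiting mount process continue.
    with open(os.path.join(sysfs_dir, "mounted"), "w") as f:
        f.write("1")


def demo():
    # A temporary directory plays the role of the sysfs directory.
    with tempfile.TemporaryDirectory() as d:
        complete_mount(d, 3)
        with open(os.path.join(d, "jid")) as f:
            jid = f.read()
        with open(os.path.join(d, "mounted")) as f:
            mounted = f.read()
    return jid, mounted
```

The point of the ordering is that the mount process is blocked until `mounted` is written, so everything it needs (here just the journal id) has to be in place before that final write.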

The /sys/.../mounted flag seems to be exactly the thing I don't like.
Sigh. ;-) It seems, however, that there's actual demand for this
functionality.

OK. I'll now make a 180-degree turn and say that we need to do this and
agree to figure out how ;-)

Ignoring the specific steps gfs_clusterd performs (which would be
different on our stack, of course), the main reason I don't like this
much is the detour through kernel space for the uevent and the
notification.

(Also, your outline doesn't cover the possibility that the cluster
says "No, you CAN'T mount this. Rejected!" - did you leave that out to
keep the description simple, or how is it implemented? By writing "2"
to the .../mounted flag or something?)

I'd much rather have all of this done in user-space prior to the actual
mount syscall being issued.

"mount" would need a generic hook by which it could call into the
cluster stack (whatever it is) to a) have it authorize the mount, b)
let it _know_ about the mount, and c) prepare the mount if needed, by
bringing online all prerequisites on that node et cetera.

Actually this is quite powerful. This hook could also be used for
_non-cluster filesystems_ - the cluster could deny mounting of
filesystems on shared storage which are active on another node.
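
To make the idea a bit more concrete, here is a minimal Python sketch of such a user-space hook. Everything in it is hypothetical: the `ClusterHook` class, its method names, and the `do_mount` callback (a stand-in for the real mount syscall) are made up for illustration and do not describe any existing mount(8) or cluster API.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


class ClusterHook:
    """Hypothetical interface between mount and the cluster stack."""

    def __init__(self, active_elsewhere):
        # Devices the cluster knows to be active on another node.
        self.active_elsewhere = set(active_elsewhere)
        self.known_mounts = []

    def authorize(self, device, mountpoint):
        # a) authorize the mount
        if device in self.active_elsewhere:
            return Verdict.DENY
        return Verdict.ALLOW

    def prepare(self, device, mountpoint):
        # c) bring prerequisites online on this node (no-op in this sketch)
        pass

    def record(self, device, mountpoint):
        # b) let the cluster know about the mount
        self.known_mounts.append((device, mountpoint))


def hooked_mount(hook, device, mountpoint, do_mount):
    """Consult the hook in user space before the mount syscall is issued."""
    if hook.authorize(device, mountpoint) is Verdict.DENY:
        return False
    hook.prepare(device, mountpoint)
    do_mount(device, mountpoint)  # stand-in for the real mount call
    hook.record(device, mountpoint)
    return True
```

In this sketch the cluster is told about the mount only after it succeeded; a real stack might well want to know earlier, so treat the ordering as one possible choice.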

Same for umount. A nice side effect of hooking umount would be that it
could actually ask the cluster "hey, admin wants this unmounted, stop
everything which depends on it on that node too! Migrate!".
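
As a rough sketch of what "stop everything which depends on it" could mean in practice, the following Python function computes a stop order from a dependency map. The function name `stop_order`, the map format, and the resource names in the usage note are all invented for illustration; a real cluster manager keeps this information in its own configuration.

```python
def stop_order(dependents, resource):
    """Return the resources to stop, dependents first, the resource last.

    dependents maps a resource name to the names that depend directly on it.
    Assumes the dependency graph has no cycles.
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for child in dependents.get(name, []):
            visit(child)
        # Appended only after everything that depends on it.
        order.append(name)

    visit(resource)
    return order
```

Given a hypothetical map such as `{"fs:/srv": ["nfs-export", "database"], "database": ["webapp"]}`, the umount hook would stop (or migrate) the returned resources in order and unmount the filesystem last.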

Two issues:

- This is a special case for filesystems. It'd be nice if we had a
  generic mechanism by which this also worked for all kinds of
  resources; as I've said, CIM seems to be going in that direction.
  This could then also be unified with the mechanism that, for example,
  the clustered LVMs use; C-LVM2 already has such a mechanism
  internally.

  However, filesystems are a fairly important case, and once we have
  more than one implementation of this mechanism (LVM + filesystem)
  we'll have a better idea of what such a generic mechanism would look
  like.

- Trapping this in user-space of course isn't as powerful as
  intercepting each and every mount syscall; somebody calling it
  directly would get a reject. This, however, seems acceptable to me?

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
    -- Charles Darwin