From: David Teigland <teigland@redhat.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [RFC] Integration with external clustering
Date: Wed Oct 19 17:42:09 2005
Message-ID: <20051019224221.GB4305@redhat.com>
In-Reply-To: <20051019195654.GQ24589@marowsky-bree.de>

On Wed, Oct 19, 2005 at 09:56:54PM +0200, Lars Marowsky-Bree wrote:
> On 2005-10-18T19:24:18, Jeff Mahoney <jeffm@suse.com> wrote:
> 
> > > 	Have you also considered what this will or won't do to possible
> > > interaction with the CMan stack?  We'd love OCFS2 to handle both stacks.
> > I'm not really familiar with the CMan stack, but I was hoping that the
> > configuration I described would be easy enough for any userspace cluster
> > manager to handle. Lars and Andrew Beekhof are working with me on the
> > cluster side of things, so they'd be more familiar with the details here.
> 
> David, what are your thoughts? ;-)

Just catching up on this after being away for a while.  Not only has cman
moved entirely to user space, but a large portion of gfs (everything
related to cman and clustering) has also moved to user space.  So, a user
space gfs daemon (call it gfs_clusterd) interacts with the other user
space clustering systems and drives the bits of gfs in the kernel. 

Here are the main "knobs" gfs_clusterd uses to control a specific fs:

/sys/fs/gfs2/<fs_name>/lock_module/
                                   block
                                   mounted
                                   jid
                                   recover
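
As a rough illustration of how a user space daemon could drive these
knobs, here is a minimal sketch that writes a value to one of the four
attribute files.  The helper names (knob_path, write_knob) and the
sysfs_root parameter are hypothetical and not taken from gfs_clusterd;
the root is a parameter only so the sketch can be tried against a
scratch directory instead of a real /sys.

```python
import os

# The four attribute files listed above.
KNOBS = ("block", "mounted", "jid", "recover")


def knob_path(fs_name, knob, sysfs_root="/sys/fs/gfs2"):
    """Build the path to one lock_module knob for a given fs."""
    if knob not in KNOBS:
        raise ValueError("unknown knob: %r" % (knob,))
    return os.path.join(sysfs_root, fs_name, "lock_module", knob)


def write_knob(fs_name, knob, value, sysfs_root="/sys/fs/gfs2"):
    """Write a value (e.g. 0, 1, or a journal id) to a knob as text."""
    with open(knob_path(fs_name, knob, sysfs_root), "w") as f:
        f.write(str(value))
```

For example, write_knob("myfs", "block", 1) would write "1" to
/sys/fs/gfs2/myfs/lock_module/block, where "myfs" is a made-up fs name.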

When a gfs fs is mounted on a node:

. the mount process enters gfs-kernel
. the mount process sends a simple uevent to gfs_clusterd
. the mount process waits for gfs_clusterd to write 1 to /sys/.../mounted

. gfs_clusterd gets the mount uevent from gfs-kernel
. gfs_clusterd joins the cluster-wide "group" that represents the
  specific fs being mounted [1]
. gfs_clusterd tells gfs-kernel which journal the local node will use by
  writing the journal id to /sys/.../jid
. gfs_clusterd tells the mount process it can continue by writing 1
  to /sys/.../mounted
. the local node now has the fs mounted

[1] As part of the node being added to the group, gfs_clusterd on the
nodes that already have the fs mounted is notified of the new mounter for
the fs.
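
The daemon side of the mount steps above can be sketched roughly as
follows.  This is not the gfs_clusterd code: handle_mount_uevent, the
join_group callback and the lock_dir argument are hypothetical names,
and how the journal id is chosen is left outside the sketch.  lock_dir
stands for the fs's lock_module directory in sysfs.

```python
import os


def handle_mount_uevent(lock_dir, fs_name, journal_id, join_group):
    """Daemon-side steps once the mount uevent has been received."""
    # Join the cluster-wide group for this fs; existing mounters are
    # notified of the new mounter as part of this step.
    join_group(fs_name)

    # Tell the kernel which journal the local node will use.
    with open(os.path.join(lock_dir, "jid"), "w") as f:
        f.write(str(journal_id))

    # Release the waiting mount process.
    with open(os.path.join(lock_dir, "mounted"), "w") as f:
        f.write("1")
```

The order matters: the journal id is written before "mounted", because
the mount process continues as soon as it sees the 1.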

When a node that has a gfs file system mounted fails:

. the cluster infrastructure notifies gfs_clusterd that a node failed
. gfs_clusterd writes 1 to /sys/.../block to block new lock requests from gfs
. the infrastructure notifies gfs_clusterd that gfs_clusterd is "stopped"
  (and therefore blocked) on all mounters
. gfs_clusterd tells gfs-kernel to recover the journal of the failed
  node by writing the journal id of the failed node to /sys/.../recover
. when journal recovery is done, gfs-kernel sends a uevent to gfs_clusterd
. gfs_clusterd tells gfs-kernel to continue normal operation by
  writing 0 to /sys/.../block
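
The recovery steps above, again seen from the daemon, might look
something like the sketch below.  The function and callback names
(recover_failed_node, wait_all_stopped, wait_recovery_done) are
hypothetical; the two callbacks stand in for the notification from the
cluster infrastructure and for the recovery-done uevent from the
kernel, neither of which is modelled here.  lock_dir stands for the
fs's lock_module directory in sysfs.

```python
import os


def _write(lock_dir, knob, value):
    """Write a value to one knob file as text."""
    with open(os.path.join(lock_dir, knob), "w") as f:
        f.write(str(value))


def recover_failed_node(lock_dir, failed_jid,
                        wait_all_stopped, wait_recovery_done):
    """Block locking, recover the failed node's journal, then unblock."""
    # Block new lock requests from gfs on this node.
    _write(lock_dir, "block", 1)

    # Wait until the infrastructure reports every mounter as stopped.
    wait_all_stopped()

    # Ask the kernel to recover the failed node's journal.
    _write(lock_dir, "recover", failed_jid)

    # Wait for the kernel's uevent saying journal recovery is done.
    wait_recovery_done()

    # Resume normal operation.
    _write(lock_dir, "block", 0)
```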

That's a simplified example of how we control gfs from user space.  Our
dlm is controlled in a similar way by the dlm_controld daemon.  Think of
the user daemon (gfs_clusterd) and kernel module (gfs.ko) as two parts of
a single system and sysfs/configfs as more of an internal communication
path between the two parts, not so much an external API.

It's the interfaces the two user daemons have with the cluster
infrastructure (membership/group manager, crm, etc) that would need to be
studied to use gfs in other environments.  None of this is easy, but
there's far more flexibility working it out in user space than in the
kernel.  The same may be the case for ocfs.

Dave
