From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 0/5] nfsd: overhaul the client name tracking code (RFC)
Date: Wed, 14 Dec 2011 15:00:29 -0500
Message-ID: <20111214200029.GA7623@fieldses.org>
In-Reply-To: <20111214094920.6d3fafa8@tlielax.poochiereds.net>
On Wed, Dec 14, 2011 at 09:49:20AM -0500, Jeff Layton wrote:
> On Wed, 14 Dec 2011 09:35:57 -0500
> Chuck Lever <chuck.lever@oracle.com> wrote:
>
> >
> > On Dec 14, 2011, at 8:54 AM, Jeff Layton wrote:
> >
> > > First, a little background: I've recently been tasked with a project
> > > to make active/active serving of NFSv4 from clustered filesystems work.
> > > This is a large-scale, long-term project, but there are pieces of the
> > > existing code that are clearly unsuitable in such a configuration...
> > >
> > > One of the things that Bruce has long had on his wishlist is to replace
> > > the client name tracking code that the kernel uses:
> > >
> > > http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
> > >
> > > The existing code manipulates the filesystem directly to track this
> > > info. Not only is that something that makes the VFS maintainers look
> > > askance at knfsd, but it also is unsuitable in a clustered
> > > configuration.
> > >
> > > Typically we think of the grace period as a property of the server, but
> > > with a clustered filesystem, we need to consider it as a property of the
> > > cluster as a whole. On a cold startup of the cluster, once any node
> > > grants a non-reclaim lock, then no more reclaim can be allowed on any
> > > node. Grace periods must be coordinated amongst all cluster nodes.
> >
> > Agreed, but as you go forward with this effort, you should consider that NFSv4 migration allows individual file systems to be in grace.

From the point of view of the protocol--I think all that means is that a
client should be prepared to handle GRACE errors at any time, and should
treat them more or less the same as it would a DELAY error?

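To make that concrete, here is a minimal client-side sketch, assuming a simple bounded retry loop. The status values are the ones the NFSv4 protocol assigns; the helper names (nfs4_status_is_retryable, nfs4_call_with_retry) and the callback type are made up for illustration and are not taken from any existing client. It also glosses over the "more or less": reclaim operations are still expected to succeed during grace.

```c
#include <stdbool.h>

/* Status codes as assigned by the NFSv4 protocol. */
enum {
	NFS4_OK       = 0,
	NFS4ERR_DELAY = 10008,
	NFS4ERR_GRACE = 10013,
};

/* Hypothetical helper: true if the client should back off and resend. */
bool nfs4_status_is_retryable(int status)
{
	return status == NFS4ERR_DELAY || status == NFS4ERR_GRACE;
}

/* Hypothetical callback type standing in for "send one NFSv4 request". */
typedef int (*nfs4_op_fn)(void *arg);

/*
 * Hypothetical wrapper: resend the request while the server answers
 * DELAY or GRACE, up to max_tries attempts. Returns the last status seen
 * (NFS4ERR_DELAY if max_tries is not positive and nothing was sent).
 */
int nfs4_call_with_retry(nfs4_op_fn op, void *arg, int max_tries)
{
	int status = NFS4ERR_DELAY;

	for (int i = 0; i < max_tries; i++) {
		status = op(arg);
		if (!nfs4_status_is_retryable(status))
			break;
		/* A real client would sleep with some backoff here. */
	}
	return status;
}
```

The point is only that GRACE takes the same path as DELAY, so a GRACE error arriving long after mount (say, after a migration) needs no special casing.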
> Yes. The eventual goal is to eliminate the grace period on failovers once
> the cluster fs is up and running, and out of its initial grace period.
>
> In order to do that, we'll need to push grace period handling into the
> VFS layer to some degree, probably by providing a standard set of grace
> period handling ops and allowing the filesystems to override them in
> some fashion (maybe a new set of export ops?).
That's what I've always imagined we'd do.
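
A rough sketch of the shape this could take, assuming a per-export ops table with a server-wide default. Everything here (struct grace_ops, struct fs_export, export_in_grace, the cluster example) is hypothetical; none of it exists in the kernel or in this patchset:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table a filesystem could supply to override grace handling. */
struct grace_ops {
	bool (*in_grace)(void *fs_private);
};

/* Stand-in for today's single, server-wide grace period state. */
static bool local_grace_active = true;

/* Default: ask the local server, ignoring any per-filesystem state. */
static bool default_in_grace(void *fs_private)
{
	(void)fs_private;
	return local_grace_active;
}

static const struct grace_ops default_grace_ops = {
	.in_grace = default_in_grace,
};

/* Hypothetical export: gops == NULL means "use the default ops". */
struct fs_export {
	const struct grace_ops *gops;
	void *fs_private;
};

/* Pick the filesystem's override if it has one, else the default. */
bool export_in_grace(const struct fs_export *exp)
{
	const struct grace_ops *ops = exp->gops ? exp->gops : &default_grace_ops;

	return ops->in_grace(exp->fs_private);
}

/* Example override: a clustered fs consulting cluster-wide state. */
struct cluster_state {
	bool cluster_in_grace;
};

static bool cluster_fs_in_grace(void *fs_private)
{
	const struct cluster_state *cs = fs_private;

	return cs->cluster_in_grace;
}

const struct grace_ops cluster_grace_ops = {
	.in_grace = cluster_fs_in_grace,
};
```

With something like this, an export of a local filesystem keeps the current behaviour, while a clustered filesystem answers from whatever cluster-wide state it maintains.
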
Long-term it would be nice if even local filesystems could respect the
grace period: local applications really shouldn't be grabbing new locks
then either, and currently the only way to prevent that is to delay
starting them until a grace period has passed.
--b.
> In any case, design of that is a later phase of this project once I get
> this part settled...
>
> > > In order to achieve that goal, we need to first allow the client name
> > > reclaim to be cluster aware as well. This patchset is a move toward that
> > > goal and covers the initial kernel part of such a change. A patchset to
> > > add a daemon to handle the upcalls will follow.
> > >
> > > Note that this patchset is still a little rough, so consider this an
> > > RFC for the overall design. We'll also need to consider a plan to
> > > deprecate the old client tracking code.
> > >
> > > The goal with this patchset is to replace the existing functionality,
> > > without disturbing the existing code too much. There's some room for
> > > more cleanup and reorganization once the old tracker is gone.
> > >
> > > Jeff Layton (5):
> > > nfsd: add nfsd4_client_tracking_ops struct and a way to set it
> > > sunrpc: create nfsd dir in rpc_pipefs
> > > nfsd: add a header describing upcall for clname tracking daemon
> > > nfsd: add a cl_daddr field and a generic flags field to nfs4_client
> > > nfsd: add the infrastructure to handle the clstate upcall
> > >
> > > fs/nfsd/nfs4recover.c | 442 +++++++++++++++++++++++++++++++++++++++++-
> > > fs/nfsd/nfs4state.c | 49 ++---
> > > fs/nfsd/state.h | 16 +-
> > > include/linux/nfsd/clstate.h | 59 ++++++
> > > net/sunrpc/rpc_pipe.c | 5 +
> > > 5 files changed, 526 insertions(+), 45 deletions(-)
> > > create mode 100644 include/linux/nfsd/clstate.h
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
>
> --
> Jeff Layton <jlayton@redhat.com>