From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from fieldses.org ([174.143.236.118]:51055 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755196Ab1LNUAc (ORCPT ); Wed, 14 Dec 2011 15:00:32 -0500
Date: Wed, 14 Dec 2011 15:00:29 -0500
To: Jeff Layton
Cc: Chuck Lever, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 0/5] nfsd: overhaul the client name tracking code (RFC)
Message-ID: <20111214200029.GA7623@fieldses.org>
References: <1323870891-3124-1-git-send-email-jlayton@redhat.com>
	<18F62B11-1541-4B87-95A7-8106459903DF@oracle.com>
	<20111214094920.6d3fafa8@tlielax.poochiereds.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20111214094920.6d3fafa8@tlielax.poochiereds.net>
From: "J. Bruce Fields"
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Wed, Dec 14, 2011 at 09:49:20AM -0500, Jeff Layton wrote:
> On Wed, 14 Dec 2011 09:35:57 -0500
> Chuck Lever wrote:
>
> >
> > On Dec 14, 2011, at 8:54 AM, Jeff Layton wrote:
> >
> > > First, a little background: I've recently been tasked with a project
> > > to make active/active serving of NFSv4 from clustered filesystems work.
> > > This is a large-scale, long-term project, but there are pieces of the
> > > existing code that are clearly unsuitable in such a configuration...
> > >
> > > One of the things that Bruce has long had on his wishlist is to replace
> > > the client name tracking code that the kernel uses:
> > >
> > > http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
> > >
> > > The existing code manipulates the filesystem directly to track this
> > > info. Not only is that something that makes the VFS maintainers look
> > > askance at knfsd, but it also is unsuitable in a clustered
> > > configuration.
> > >
> > > Typically we think of the grace period as a property of the server, but
> > > with a clustered filesystem, we need to consider it as a property of the
> > > cluster as a whole. On a cold startup of the cluster, once any node
> > > grants a non-reclaim lock, then no more reclaim can be allowed on any
> > > node. Grace periods must be coordinated amongst all cluster nodes.
> >
> > Agreed, but as you go forward with this effort, you should consider that NFSv4 migration allows individual file systems to be in grace.

From the point of view of the protocol--I think all that means is that a
client should be prepared to handle GRACE errors at any time, and should
treat them more or less the same as they would a DELAY error?

> Yes. The eventual goal is eliminate the grace period on failovers once
> the cluster fs is up and running, and out of its initial grace period.
>
> In order to do that, we'll need to push grace period handling into the
> VFS layer to some degree, probably by providing a standard set of grace
> period handling ops and allowing the filesystems to override them in
> some fashion (maybe a new set of export ops?).

That's what I've always imagined we'd do.

Long-term it would be nice if even local filesystems could respect the
grace period: local applications really shouldn't be grabbing new locks
then either, and currently the only way to prevent that is to delay
starting them until a grace period has passed.

--b.

> In any case, design of that is a later phase of this project once I get
> this part settled...
>
> > > In order to achieve that goal, we need to first allow the client name
> > > reclaim to be cluster aware as well. This patchset is a move toward that
> > > goal and covers the initial kernel part of such a change. A patchset to
> > > add a daemon to handle the upcalls will follow.
> > >
> > > Note that this patchset is still a little rough, so consider this an
> > > RFC for the overall design. We'll also need to consider a plan to
> > > deprecate the old client tracking code.
> > >
> > > The goal with this patchset is to replace the existing functionality,
> > > without disturbing the existing code too much. There's some room for
> > > more cleanup and reorganization once the old tracker is gone.
> > >
> > > Jeff Layton (5):
> > >   nfsd: add nfsd4_client_tracking_ops struct and a way to set it
> > >   sunrpc: create nfsd dir in rpc_pipefs
> > >   nfsd: add a header describing upcall for clname tracking daemon
> > >   nfsd: add a cl_daddr field and a generic flags field to nfs4_client
> > >   nfsd: add the infrastructure to handle the clstate upcall
> > >
> > >  fs/nfsd/nfs4recover.c        |  442 +++++++++++++++++++++++++++++++++++++++++-
> > >  fs/nfsd/nfs4state.c          |   49 ++---
> > >  fs/nfsd/state.h              |   16 +-
> > >  include/linux/nfsd/clstate.h |   59 ++++++
> > >  net/sunrpc/rpc_pipe.c        |    5 +
> > >  5 files changed, 526 insertions(+), 45 deletions(-)
> > >  create mode 100644 include/linux/nfsd/clstate.h
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> Jeff Layton
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html