From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] NFSD: Return full WCC data for NFSv3 metadata operations
Date: Thu, 8 Jul 2010 15:53:35 -0400
Message-ID: <20100708195335.GB31173@fieldses.org>
In-Reply-To: <20100708154037.2820.51702.stgit-ewv44WTpT0t9HhUboXbp9zCvJB+x5qRC@public.gmane.org>

On Thu, Jul 08, 2010 at 11:43:46AM -0400, Chuck Lever wrote:
> NFSv3 WCC data, or "weak cache consistency" data, is an attempt to
> reduce the number of on-the-wire transactions needed by NFS clients
> to keep their caches up to date.  WCC data consists of two parts:
> 
>   o  pre-op data, which is a subset of file metadata as it was before
>      a procedure starts, and
> 
>   o  post-op data, which is a full set of NFSv3 file attribute data as
>      it is after a procedure finishes.
> 
> For an NFSv3 procedure that returns wcc_data, an NFSv3 server is free
> to return either, both, or neither of these.  Define "full WCC data"
> as a reply that contains both pre-op and post-op data.
> 
> To make this data useful, a server must ensure that file metadata is
> captured atomically around the requested operation.  If the pre-op
> data in a reply matches the file metadata that the client already has
> cached, the client can assume that no other operation on that file
> occurred while the server was fulfilling the current request, and that
> therefore the post-op metadata is the latest version, and can be
> cached.
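
A rough sketch of the client-side check described above, not code from the patch or from any real client. The three pre-op fields (size, mtime, ctime) follow the wcc_attr definition in RFC 1813; the struct layout, the names cached_attrs and wcc_update_cache(), and the reduction of post-op data to the same three fields are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-op subset of file metadata (RFC 1813 wcc_attr); nanoseconds omitted. */
struct wcc_attr {
    uint64_t size;
    int64_t  mtime_sec;
    int64_t  ctime_sec;
};

/* Simplified reply: real post-op data is a full fattr3, not just these fields. */
struct wcc_data {
    bool has_pre;
    struct wcc_attr pre;
    bool has_post;
    struct wcc_attr post;
};

/* Hypothetical client cache entry for one file or directory. */
struct cached_attrs {
    bool valid;
    struct wcc_attr attr;
};

static bool wcc_attr_equal(const struct wcc_attr *a, const struct wcc_attr *b)
{
    return a->size == b->size &&
           a->mtime_sec == b->mtime_sec &&
           a->ctime_sec == b->ctime_sec;
}

/*
 * Returns true if the cache was updated from full WCC data.
 * Returns false if the cache had to be invalidated, in which case the
 * caller would need extra requests (e.g. GETATTR) to revalidate.
 */
static bool wcc_update_cache(struct cached_attrs *c, const struct wcc_data *w)
{
    assert(c != NULL && w != NULL);

    if (c->valid && w->has_pre && w->has_post &&
        wcc_attr_equal(&c->attr, &w->pre)) {
        /* Nothing else touched the object: post-op data is the latest. */
        c->attr = w->post;
        return true;
    }

    /* Missing or mismatched pre-op data: cached metadata cannot be trusted. */
    c->valid = false;
    return false;
}
```

With a reply that carries only post-op data, wcc_update_cache() always takes the invalidation path, which is the case the patch description is concerned with.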
> 
> Conversely, NFSv3 clients invalidate their metadata caches when they
> receive replies to metadata-altering operations that do not contain
> full WCC data.  When a server presents a reply that does not have both
> pre-op and post-op WCC data, clients must employ extra LOOKUP and
> GETATTR requests to ensure their metadata caches are up to date,
> causing performance to suffer needlessly.  For example, untarring a
> large tar file can take almost an order of magnitude longer in this
> case, depending on the client implementation.
> 
> In the Linux NFS server implementation, to ensure that WCC data
> reflects only changes made during the current file system operation,
> the file's inode mutex is held in order to serialize metadata-altering
> operations on that inode.  Our server saves pre-op data for a file
> handle just after the target inode's mutex is taken, and saves post-op
> data just before the inode's mutex is dropped (see fh_lock() and
> fh_unlock()).
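
The lock/unlock ordering described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel's fh_lock() and fh_unlock(): the model_* names, the struct layouts, and the use of a pthread mutex in place of the inode mutex are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an inode: a mutex plus the metadata it protects. */
struct model_inode {
    pthread_mutex_t mutex;
    uint64_t size;
    int64_t  mtime;
    int64_t  ctime;
};

struct model_attrs {
    uint64_t size;
    int64_t  mtime;
    int64_t  ctime;
};

/* Stand-in for a file handle carrying saved WCC snapshots. */
struct model_fh {
    struct model_inode *inode;
    bool locked;
    bool pre_saved;
    bool post_saved;
    struct model_attrs pre;
    struct model_attrs post;
};

static struct model_attrs model_snapshot(const struct model_inode *ino)
{
    struct model_attrs a = { ino->size, ino->mtime, ino->ctime };
    return a;
}

/* Take the inode mutex, then save pre-op data while holding it. */
static void model_fh_lock(struct model_fh *fh)
{
    assert(!fh->locked);
    pthread_mutex_lock(&fh->inode->mutex);
    fh->locked = true;
    fh->pre = model_snapshot(fh->inode);
    fh->pre_saved = true;
}

/* Save post-op data while still holding the mutex, then drop it. */
static void model_fh_unlock(struct model_fh *fh)
{
    if (!fh->locked)
        return;
    fh->post = model_snapshot(fh->inode);
    fh->post_saved = true;
    pthread_mutex_unlock(&fh->inode->mutex);
    fh->locked = false;
}
```

Because both snapshots are taken under the same mutex hold, any difference between pre and post in this model can only come from work done between model_fh_lock() and model_fh_unlock().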
> 
> In order to return full WCC data to clients, our server must have both
> the saved pre-op and the saved post-op attribute data for a file
> handle filled in before it starts XDR encoding the reply.
> Unfortunately, for the NFSv3 MKDIR, MKNOD, REMOVE, and RMDIR procedures,
> our server does not unlock the parent directory inode until well after
> the reply has been XDR encoded.
> 
> In these cases, encode_wcc_data() does have saved pre-op WCC data
> available, since the fh is locked, but does not have saved post-op WCC
> data for the parent directory, since it hasn't yet been unlocked.  In
> this situation, encode_wcc_data() simply grabs the parent's current
> metadata, uses that as the post-op WCC data, and returns no pre-op
> WCC data to the client.
> 
> By instead unlocking the parent directory file handle immediately
> after the internal operations for each of these NFS procedures are
> complete, saved post-op WCC data for the file handle is filled in
> before XDR encoding starts, so full WCC data for that procedure can
> be returned to clients.
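
To make the ordering issue concrete, here is a small model of the two call sequences. It is only a sketch: fh_state, encode_model(), and the reply_* functions are hypothetical names, and the fallback branch ignores the case where fetching current attributes fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal file handle state: only what the encoder looks at. */
struct fh_state {
    bool locked;
    bool pre_saved;
    bool post_saved;
};

enum wcc_reply { WCC_POST_ONLY, WCC_FULL };

static void do_lock(struct fh_state *fh)
{
    fh->locked = true;
    fh->pre_saved = true;       /* pre-op data saved when the lock is taken */
}

static void do_unlock(struct fh_state *fh)
{
    if (!fh->locked)
        return;
    fh->post_saved = true;      /* post-op data saved when the lock is dropped */
    fh->locked = false;
}

/* Models the decision described for encode_wcc_data(). */
static enum wcc_reply encode_model(const struct fh_state *fh)
{
    assert(fh != NULL);
    if (fh->pre_saved && fh->post_saved)
        return WCC_FULL;
    /* No saved post-op data: use current attributes, send no pre-op data. */
    return WCC_POST_ONLY;
}

/* Old ordering: the reply is encoded while the parent is still locked. */
static enum wcc_reply reply_unlock_late(void)
{
    struct fh_state parent = { false, false, false };
    enum wcc_reply r;

    do_lock(&parent);
    /* ... directory operation (e.g. MKDIR) would run here ... */
    r = encode_model(&parent);
    do_unlock(&parent);
    return r;
}

/* Patched ordering: unlock right after the operation, then encode. */
static enum wcc_reply reply_unlock_early(void)
{
    struct fh_state parent = { false, false, false };

    do_lock(&parent);
    /* ... directory operation (e.g. MKDIR) would run here ... */
    do_unlock(&parent);
    return encode_model(&parent);
}
```

In this model reply_unlock_late() yields WCC_POST_ONLY and reply_unlock_early() yields WCC_FULL, which mirrors the behavior change the description claims for MKDIR, MKNOD, REMOVE, and RMDIR.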
> 
> Note that the NFSv4 CREATE and REMOVE procedures already invoke
> fh_unlock() explicitly on the parent directory in order to fill in the
> NFSv4 post change attribute.
> 
> Note also that NFSv3 CREATE, RENAME, SETATTR, and SYMLINK already
> perform explicit file handle unlocking.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
> Bruce-
> 
> This patch is mechanically the same as the previous one, but the patch
> description has a more accurate and clearly stated rationale for the
> change.
> 
> Please use this one instead of the previous one.

Thanks.  The first is already committed, though.  I'm not sure if
there's a good place for the longer explanation in the docs, so it may
just have to live on in the list archives.  (It's the test case, untarring
a Linux kernel from an OS X client, that I was mainly curious about.)

--b.

Thread overview: 4+ messages
2010-07-08 15:43 [PATCH] NFSD: Return full WCC data for NFSv3 metadata operations Chuck Lever
     [not found] ` <20100708154037.2820.51702.stgit-ewv44WTpT0t9HhUboXbp9zCvJB+x5qRC@public.gmane.org>
2010-07-08 19:53   ` J. Bruce Fields [this message]
2010-07-08 19:59     ` Chuck Lever
2010-07-08 20:10       ` J. Bruce Fields
