From: Chuck Lever <chuck.lever@oracle.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] NFSD: Return full WCC data for NFSv3 metadata operations
Date: Thu, 08 Jul 2010 15:59:15 -0400
Message-ID: <4C362E13.4080509@oracle.com>
In-Reply-To: <20100708195335.GB31173@fieldses.org>
On 07/08/10 03:53 PM, J. Bruce Fields wrote:
> On Thu, Jul 08, 2010 at 11:43:46AM -0400, Chuck Lever wrote:
>> NFSv3 WCC data, or "weak cache consistency" data, is an attempt to
>> reduce the number of on-the-wire transactions needed by NFS clients
>> to keep their caches up to date. WCC data consists of two parts:
>>
>> o pre-op data, which is a subset of file metadata as it was before
>> a procedure starts, and
>>
>> o post-op data, which is a full set of NFSv3 file attribute data as
>> it is after a procedure finishes.
>>
>> For an NFSv3 procedure that returns wcc_data, an NFSv3 server is free
>> to return either, both, or neither of these. Define "full WCC data"
>> as a reply that contains both pre-op and post-op data.
>>
>> To make this data useful, a server must ensure that file metadata is
>> captured atomically around the requested operation. If the pre-op
>> data in a reply matches the file metadata that the client already has
>> cached, the client can assume that no other operation on that file
>> occurred while the server was fulfilling the current request, and that
>> therefore the post-op metadata is the latest version, and can be
>> cached.
>>
>> Conversely, NFSv3 clients invalidate their metadata caches when they
>> receive replies to metadata altering operations that do not contain
>> full WCC data. When a server presents a reply that does not have both
>> pre-op and post-op WCC data, clients must employ extra LOOKUP and
>> GETATTR requests to ensure their metadata caches are up to date,
>> causing performance to suffer needlessly. For example, untarring a
>> large tar file can take almost an order of magnitude longer in this
>> case, depending on the client implementation.
>>
>> In the Linux NFS server implementation, to ensure that WCC data
>> reflects only changes made during the current file system operation,
>> the file's inode mutex is held in order to serialize metadata altering
>> operations on that inode. Our server saves pre-op data for a file
>> handle just after the target inode's mutex is taken, and saves post-op
>> data just before the inode's mutex is dropped (see fh_lock() and
>> fh_unlock()).
>>
>> In order to return full WCC data to clients, our server must have both
>> the saved pre-op and the saved post-op attribute data for a file
>> handle filled in before it starts XDR encoding the reply.
>> Unfortunately, for the NFSv3 MKDIR, MKNOD, REMOVE, and RMDIR procedures,
>> our server does not unlock the parent directory inode until well after
>> the reply has been XDR encoded.
>>
>> In these cases, encode_wcc_data() does have saved pre-op WCC data
>> available, since the fh is locked, but does not have saved post-op WCC
>> data for the parent directory, since it hasn't yet been unlocked. In
>> this situation, encode_wcc_data() simply grabs the parent's current
>> metadata, uses that as the post-op WCC data, and returns no pre-op
>> WCC data to the client.
>>
>> By instead unlocking the parent directory file handle immediately
>> after the internal operations for each of these NFS procedures are
>> complete, saved post-op WCC data for the file handle is filled in
>> before XDR encoding starts, so full WCC data for that procedure can
>> be returned to clients.
>>
>> Note that the NFSv4 CREATE and REMOVE procedures already invoke
>> fh_unlock() explicitly on the parent directory in order to fill in the
>> NFSv4 post change attribute.
>>
>> Note also that NFSv3 CREATE, RENAME, SETATTR, and SYMLINK already
>> perform explicit file handle unlocking.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> Bruce-
>>
>> This patch is mechanically the same as the previous one, but the patch
>> description has a more accurate and clearly stated rationale for the
>> change.
>>
>> Please use this one instead of the previous one.
>
> Thanks. The first is already committed, though.
And you can't use "git reset --soft ..." to fix the commit message
because....?
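The lock/encode ordering described in the quoted patch description can be modelled in a few lines of user-space C. This is a simplified sketch, not the nfsd code: struct attrs, model_fh_lock(), model_fh_unlock(), model_encode_wcc(), and client_apply_wcc() are hypothetical names standing in for fh_lock(), fh_unlock(), encode_wcc_data(), and a client's attribute cache. The attribute set is trimmed to size, mtime, and ctime, and a plain bool stands in for the inode mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed attribute set: NFSv3 pre-op data (wcc_attr)
 * carries size, mtime, and ctime; post-op data is a full fattr3,
 * reduced here to the same three fields. */
struct attrs {
	uint64_t size;
	int64_t mtime;
	int64_t ctime;
};

/* Hypothetical server-side inode; a bool models the inode mutex. */
struct model_inode {
	bool locked;
	struct attrs cur;
};

/* Hypothetical file handle holding the saved pre-op/post-op attributes. */
struct model_fh {
	struct model_inode *inode;
	bool have_pre, have_post;
	struct attrs pre, post;
};

/* The wcc_data that ends up in the reply. */
struct wcc_reply {
	bool pre_present, post_present;
	struct attrs pre, post;
};

/* Modelled on fh_lock(): take the lock, then save pre-op attributes. */
static void model_fh_lock(struct model_fh *fh)
{
	assert(!fh->inode->locked);
	fh->inode->locked = true;
	fh->pre = fh->inode->cur;
	fh->have_pre = true;
	fh->have_post = false;
}

/* Modelled on fh_unlock(): save post-op attributes, then drop the lock. */
static void model_fh_unlock(struct model_fh *fh)
{
	assert(fh->inode->locked);
	fh->post = fh->inode->cur;
	fh->have_post = true;
	fh->inode->locked = false;
}

/* Follows the encode_wcc_data() behaviour described above: full WCC
 * data only when both halves were saved; otherwise the current
 * attributes are sent as post-op data and no pre-op data is sent. */
static struct wcc_reply model_encode_wcc(const struct model_fh *fh)
{
	struct wcc_reply r = {0};

	if (fh->have_pre && fh->have_post) {
		r.pre_present = true;
		r.pre = fh->pre;
		r.post = fh->post;
	} else {
		r.post = fh->inode->cur;
	}
	r.post_present = true;
	return r;
}

static bool attrs_equal(const struct attrs *a, const struct attrs *b)
{
	return a->size == b->size && a->mtime == b->mtime &&
	       a->ctime == b->ctime;
}

/* Simplified client: adopt post-op attributes only when the reply has
 * full WCC data and the pre-op attributes match the cache; otherwise
 * invalidate, forcing a later GETATTR/LOOKUP. Returns true if kept. */
static bool client_apply_wcc(struct attrs *cache, bool *cache_valid,
			     const struct wcc_reply *r)
{
	if (*cache_valid && r->pre_present && r->post_present &&
	    attrs_equal(cache, &r->pre)) {
		*cache = r->post;
		return true;
	}
	*cache_valid = false;
	return false;
}
```

Calling model_encode_wcc() between model_fh_lock() and model_fh_unlock() yields a reply with no pre-op data, which client_apply_wcc() treats as grounds to invalidate. Moving the unlock ahead of encoding, as the patch does for MKDIR, MKNOD, REMOVE, and RMDIR, yields full WCC data that the client can keep when the pre-op attributes match its cache.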