* nfs_file_flush() question
@ 2008-08-17 0:23 Quentin Barnes
2008-08-17 17:04 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Quentin Barnes @ 2008-08-17 0:23 UTC (permalink / raw)
To: linux-nfs
I've been coming up to speed on the NFS protocol and its NFS client
support in Linux. I've been comparing performance of NFS on RHEL4
and RHEL5 vs. FreeBSD 6.2. (Okay, we're on an old base, but I don't
think it matters here for this question.)
In watching the NFS protocols fly back and forth between BSD
and Linux clients to an NFS server, I noticed that Linux is
doing an extra GETATTR over FreeBSD when closing a read-write
file. I tracked this back to nfs_file_flush() which is
doing a __nfs_revalidate_inode() (or in current kernels
nfs_revalidate_inode()). Why do we want nfs_file_flush() to force
a revalidate of an inode we're closing? Why not instead just
invalidate the inode's attribute?
I looked at the FreeBSD 6.2 code. In its nfs_close(), it does an
"np->n_attrstamp = 0;" to invalidate the inode's attribute cache.
The current Linux kernel code in question in nfs_file_flush() is:
==========
/* Ensure that data+attribute caches are up to date after close() */
status = nfs_do_fsync(ctx, inode);
if (!status)
nfs_revalidate_inode(NFS_SERVER(inode), inode);
==========
I would imagine this better as:
==========
/* Ensure that data+attribute caches are up to date after close() */
status = nfs_do_fsync(ctx, inode);
if (!status && !(NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))
NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATTR;
==========
Is there a reason I'm missing that the revalidate and GETATTR are
required?
Quentin
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: nfs_file_flush() question 2008-08-17 0:23 nfs_file_flush() question Quentin Barnes @ 2008-08-17 17:04 ` Trond Myklebust 2008-08-18 16:04 ` Chuck Lever 2008-08-19 20:17 ` Quentin Barnes 0 siblings, 2 replies; 7+ messages in thread From: Trond Myklebust @ 2008-08-17 17:04 UTC (permalink / raw) To: Quentin Barnes; +Cc: linux-nfs On Sat, 2008-08-16 at 19:23 -0500, Quentin Barnes wrote: > I've been coming up to speed on the NFS protocol and its NFS client > support in Linux. I've been comparing performance of NFS on RHEL4 > and RHEL5 vs. FreeBSD 6.2. (Okay, we're on an old base, but I don't > think it matters here for this question.) > > In watching the NFS protocols fly back and forth between BSD > and Linux clients to an NFS server, I noticed that Linux is > doing an extra GETATTR over FreeBSD when closing a read-write > file. I tracked this back to nfs_file_flush() which is > doing a __nfs_revalidate_inode() (or in current kernels > nfs_revalidate_inode()). Why do we want nfs_file_flush() to force > a revalidate of an inode we're closing? Why not instead just > invalidate the inode's attribute? > > I looked at the FreeBSD 6.2 code. In its nfs_close(), it does an > "np->n_attrstamp = 0;" to invalidate the inode's attribute cache. > > The current Linux kernel code in question in nfs_file_flush() is: > ========== > /* Ensure that data+attribute caches are up to date after close() */ > status = nfs_do_fsync(ctx, inode); > if (!status) > nfs_revalidate_inode(NFS_SERVER(inode), inode); > ========== > > I would imagine this better as: > ========== > /* Ensure that data+attribute caches are up to date after close() */ > status = nfs_do_fsync(ctx, inode); > if (!status && !(NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO)) > NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATTR; > ========== > > Is there a reason I'm missing that the revalidate and GETATTR are > required? Yes: It is required for correct close-to-open cache consistency semantics. If I don't know the correct mtime attribute of the file when I close it, then I can't compare it with the mtime of the file when I open it again. If so, close-to-open semantics forbid me from assuming that my cached data is still valid, and so I have to throw out the entire page cache contents for that file. Cheers Trond ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs_file_flush() question 2008-08-17 17:04 ` Trond Myklebust @ 2008-08-18 16:04 ` Chuck Lever 2008-08-18 16:53 ` Trond Myklebust 2008-08-19 20:17 ` Quentin Barnes 1 sibling, 1 reply; 7+ messages in thread From: Chuck Lever @ 2008-08-18 16:04 UTC (permalink / raw) To: Trond Myklebust; +Cc: Quentin Barnes, linux-nfs On Aug 17, 2008, at 1:04 PM, Trond Myklebust wrote: > On Sat, 2008-08-16 at 19:23 -0500, Quentin Barnes wrote: >> I've been coming up to speed on the NFS protocol and its NFS client >> support in Linux. I've been comparing performance of NFS on RHEL4 >> and RHEL5 vs. FreeBSD 6.2. (Okay, we're on an old base, but I don't >> think it matters here for this question.) >> >> In watching the NFS protocols fly back and forth between BSD >> and Linux clients to an NFS server, I noticed that Linux is >> doing an extra GETATTR over FreeBSD when closing a read-write >> file. I tracked this back to nfs_file_flush() which is >> doing a __nfs_revalidate_inode() (or in current kernels >> nfs_revalidate_inode()). Why do we want nfs_file_flush() to force >> a revalidate of an inode we're closing? Why not instead just >> invalidate the inode's attribute? >> >> I looked at the FreeBSD 6.2 code. In its nfs_close(), it does an >> "np->n_attrstamp = 0;" to invalidate the inode's attribute cache. >> >> The current Linux kernel code in question in nfs_file_flush() is: >> ========== >> /* Ensure that data+attribute caches are up to date after close() >> */ >> status = nfs_do_fsync(ctx, inode); >> if (!status) >> nfs_revalidate_inode(NFS_SERVER(inode), inode); >> ========== >> >> I would imagine this better as: >> ========== >> /* Ensure that data+attribute caches are up to date after close() >> */ >> status = nfs_do_fsync(ctx, inode); >> if (!status && !(NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO)) >> NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATTR; >> ========== >> >> Is there a reason I'm missing that the revalidate and GETATTR are >> required? > > Yes: It is required for correct close-to-open cache consistency > semantics. > > If I don't know the correct mtime attribute of the file when I close > it, > then I can't compare it with the mtime of the file when I open it > again. > If so, close-to-open semantics forbid me from assuming that my cached > data is still valid, and so I have to throw out the entire page cache > contents for that file. For NFSv3, a WRITE operation can return post-op attributes which reflect the updated mtime on the server. In that case, we get the updated mtime for free, and don't need the additional GETATTR. However, there are corner cases: 1. The client had to send multiple WRITEs to finish flushing the file's dirty data, and the replies returned out of order. In this case, the client can't know which mtime is the final value. 2. The server chose not to return post-op attributes. Does the Linux NFS client optimize away the GETATTR when it has sent only a single WRITE and the server has returned post-op attributes? Using a large wsize with a modern server implementation might make this a fairly common scenario. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs_file_flush() question 2008-08-18 16:04 ` Chuck Lever @ 2008-08-18 16:53 ` Trond Myklebust 0 siblings, 0 replies; 7+ messages in thread From: Trond Myklebust @ 2008-08-18 16:53 UTC (permalink / raw) To: Chuck Lever; +Cc: Quentin Barnes, linux-nfs On Mon, 2008-08-18 at 12:04 -0400, Chuck Lever wrote: > Does the Linux NFS client optimize away the GETATTR when it has sent > only a single WRITE and the server has returned post-op attributes? > Using a large wsize with a modern server implementation might make > this a fairly common scenario. Yes: please see the code. We use a standard nfs_revalidate_inode() which will be optimised away if the inode metadata is known to be up to date. Trond ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs_file_flush() question 2008-08-17 17:04 ` Trond Myklebust 2008-08-18 16:04 ` Chuck Lever @ 2008-08-19 20:17 ` Quentin Barnes 2008-08-20 0:30 ` Trond Myklebust 1 sibling, 1 reply; 7+ messages in thread From: Quentin Barnes @ 2008-08-19 20:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Sun, Aug 17, 2008 at 10:04:01AM -0700, Trond Myklebust wrote: > On Sat, 2008-08-16 at 19:23 -0500, Quentin Barnes wrote: > > I've been coming up to speed on the NFS protocol and its NFS client > > support in Linux. I've been comparing performance of NFS on RHEL4 > > and RHEL5 vs. FreeBSD 6.2. (Okay, we're on an old base, but I don't > > think it matters here for this question.) Oops. I goofed in my assumption. There is a notable difference between the older kernels and the newer kernels in this regards. [...] > > Is there a reason I'm missing that the revalidate and GETATTR are > > required? > > Yes: It is required for correct close-to-open cache consistency > semantics. > > If I don't know the correct mtime attribute of the file when I close it, If I follow the code, you do know the mtime when closing the file. With V3, from the WRITE and COMMIT, you're given weak cache consistency data containing the the updated mtimes, correct? I'm still learning the NFS protocol, so I know I don't fully understand when and how the WCC is utilized in the kernel, so I probably have something wrong. > then I can't compare it with the mtime of the file when I open it again. > If so, close-to-open semantics forbid me from assuming that my cached > data is still valid, and so I have to throw out the entire page cache > contents for that file. I watched a 2.6.24 kernel I had lying around. It never does a GETATTR anymore during closing a file. The older kernels invalidated the attribute cache as part of nfs_file_flush()'s write/commit step. The newer kernels still leave the attribute or data cache marked as valid post-write and commit. (More below). If the nocto mount flag is used, the only difference is the GETATTR on open(2) is avoided. On Mon, Aug 18, 2008 at 09:53:11AM -0700, Trond Myklebust wrote: > On Mon, 2008-08-18 at 12:04 -0400, Chuck Lever wrote: > > Does the Linux NFS client optimize away the GETATTR when it has sent > > only a single WRITE and the server has returned post-op attributes? > > Using a large wsize with a modern server implementation might make > > this a fairly common scenario. > > Yes: please see the code. We use a standard nfs_revalidate_inode() which > will be optimised away if the inode metadata is known to be up to date. That's what I found. There was two pieces to that change. The first was in nfs_file_flush() changing the call from __nfs_revalidate_inode() to nfs_revalidate_inode() in 2.6.15 so the always forced GETATTR could be optimized out when the attribute cache was still valid and hadn't timed out. But making the change to nfs_revalidate_inode() by itself only helps in the case where the file was open O_RDWR and no write(2) was done. The code still needed to be updated to use the WCC data at the right time. In the older kernels when nfs_wb_all() ended up calling nfs_update_inode() which was clearing the cache when it saw the mtime change from the WRITE. I tracked down why. Newer kernels (2.6.24 and later) had nfs_post_op_update_inode_force_wcc() call added to nfs3_write_done() which updated the inode with the WCC data from the WRITE so the later call to nfs_update_inode() didn't see an unexpected mtime change flagging the attribute and data cache as invalid. At least that's my current understanding from reading through the code for the last couple of days and comparing older and newer kernels. Please correct me where I'm wrong. > Cheers > Trond Quentin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs_file_flush() question 2008-08-19 20:17 ` Quentin Barnes @ 2008-08-20 0:30 ` Trond Myklebust 2008-08-20 3:38 ` Quentin Barnes 0 siblings, 1 reply; 7+ messages in thread From: Trond Myklebust @ 2008-08-20 0:30 UTC (permalink / raw) To: Quentin Barnes; +Cc: linux-nfs@vger.kernel.org On Tue, 2008-08-19 at 15:17 -0500, Quentin Barnes wrote: > > If I don't know the correct mtime attribute of the file when I close it, > > If I follow the code, you do know the mtime when closing the > file. With V3, from the WRITE and COMMIT, you're given weak cache > consistency data containing the the updated mtimes, correct? No. Please note the difference between a call to nfs_update_inode(), and a call to nfs_refresh_inode(). The latter tries to be more careful about updating the inode attributes if there is a chance that we may have raced with another RPC call to the same inode, and hence that the attributes returned may be stale. > But making the change to nfs_revalidate_inode() by itself only > helps in the case where the file was open O_RDWR and no write(2) > was done. The code still needed to be updated to use the WCC data > at the right time. In the older kernels when nfs_wb_all() ended up > calling nfs_update_inode() which was clearing the cache when it saw > the mtime change from the WRITE. I tracked down why. Newer kernels > (2.6.24 and later) had nfs_post_op_update_inode_force_wcc() call > added to nfs3_write_done() which updated the inode with the WCC data > from the WRITE so the later call to nfs_update_inode() didn't see > an unexpected mtime change flagging the attribute and data cache as > invalid. See above. Cheers Trond ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs_file_flush() question 2008-08-20 0:30 ` Trond Myklebust @ 2008-08-20 3:38 ` Quentin Barnes 0 siblings, 0 replies; 7+ messages in thread From: Quentin Barnes @ 2008-08-20 3:38 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Tue, Aug 19, 2008 at 05:30:52PM -0700, Trond Myklebust wrote: > On Tue, 2008-08-19 at 15:17 -0500, Quentin Barnes wrote: > > > If I don't know the correct mtime attribute of the file when I close it, > > > > If I follow the code, you do know the mtime when closing the > > file. With V3, from the WRITE and COMMIT, you're given weak cache > > consistency data containing the the updated mtimes, correct? > > No. Please note the difference between a call to nfs_update_inode(), and > a call to nfs_refresh_inode(). The latter tries to be more careful about > updating the inode attributes if there is a chance that we may have > raced with another RPC call to the same inode, and hence that the > attributes returned may be stale. > > > But making the change to nfs_revalidate_inode() by itself only > > helps in the case where the file was open O_RDWR and no write(2) > > was done. The code still needed to be updated to use the WCC data > > at the right time. In the older kernels when nfs_wb_all() ended up > > calling nfs_update_inode() which was clearing the cache when it saw > > the mtime change from the WRITE. I tracked down why. Newer kernels > > (2.6.24 and later) had nfs_post_op_update_inode_force_wcc() call > > added to nfs3_write_done() which updated the inode with the WCC data > > from the WRITE so the later call to nfs_update_inode() didn't see > > an unexpected mtime change flagging the attribute and data cache as > > invalid. > > See above. I'm not sure I'm following what you mean. What I wrote in the second part is what's happening from watching the traces. I may not have gotten the exact call chain right, but the end result is what's going on. I'll go into more detail. But what I'm wondering is what exactly is wrong. Is it my understanding or something in my analysis? Are you indicating that the 2.6.24 kernel and later are misbehaving since they don't invalidate the caches and do a GETATTR? Here's a debug trace on a 2.6.24 kernel of a process doing an open(2), write(2), and close(2): ====== NFS: permission(0:14/33685649), mask=0x1, res=0 NFS: nfs_lookup_revalidate(/.xtest1) is valid NFS: permission(0:14/33686480), mask=0x6, res=0 nfs: write(/.xtest1(33686480), 23@0) NFS: nfs_updatepage(/.xtest1 23@0) NFS: nfs_updatepage returns 0 (isize 23) nfs: flush(0:14/33686480) *nfs: flush pre-nfs_do_fsync cache_validity = 0x00000000 NFS: 0 initiated write call (req 0:14/33686480, 23 bytes @ offset 0) NFS: 38 nfs_writeback_done (status 23) NFS: nfs_update_inode(0:14/33686480 ct=2 info=0x7) NFS: write (0:14/33686480 23@0) marked for commit NFS: 0 initiated commit call NFS: 39 nfs_commit_done (status 0) NFS: nfs_update_inode(0:14/33686480 ct=2 info=0x6) NFS: commit (0:14/33686480 23@0) OK *nfs: flush post-nfs_do_fsync cache_validity = 0x00000000 NFS: dentry_delete(/.xtest1, 8) ====== I added some extra debug of my own just before and after the call to nfs_do_fsync() in nfs_file_flush(). I noted them with "*"s. Note that after the commit, the attribute and data cache are still valid with 2.6.24 (cache_validity is still 0x0, so nothing's invalidated). When that's the case, there is no GETATTR call. Now the same thing on 2.6.9: ====== nfs: flush(0:13/33686480) *nfs: flush pre-nfs_wb_all cache_validity = 0x00000000 NFS: 103 initiated write call (req 0:13/33686480, 23 bytes @ offset 0) NFS: nfs_update_inode(0:13/33686480 ct=2 info=0x6) NFS: mtime change on server for file 0:13/33686480 NFS: 103 nfs_writeback_done (status 23) NFS: write (0:13/33686480 23@0) marked for commit NFS: 104 initiated commit call NFS: nfs_update_inode(0:13/33686480 ct=2 info=0x6) NFS: 104 nfs_commit_done (status 0) NFS: commit (0:13/33686480 23@0) OK *nfs: flush post-nfs_wb_all cache_validity = 0x0000001b NFS: revalidating (0:13/33686480) NFS call getattr NFS reply getattr NFS: nfs_update_inode(0:13/33686480 ct=1 info=0x6) NFS: (0:13/33686480) data cache invalidated NFS: nfs3_forget_cached_acls(0:13/33686480) NFS: (0:13/33686480) revalidation complete NFS: dentry_delete(//.xtest1, 0) ====== The extra debug this time is around the nfs_wb_all() call. As you can see during the flush the attribute and data caches get marked invalid (cache_validity goes from 0x0 to 0x1b) so a GETATTR call is made. In nfs_update_inode(), it tells us why it happened, "mtime change on server for file" which does not happen on 2.6.24. I ported over the nfs_post_op_update_inode_force_wcc() function to 2.6.9 and hooked it into the *write_done() functions and that got rid of the "mtime change on server for file" message from nfs_update_inode() on that kernel. Is 2.6.24 doing the right thing? > Cheers > Trond Quentin ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-08-20 3:39 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-17 0:23 nfs_file_flush() question Quentin Barnes 2008-08-17 17:04 ` Trond Myklebust 2008-08-18 16:04 ` Chuck Lever 2008-08-18 16:53 ` Trond Myklebust 2008-08-19 20:17 ` Quentin Barnes 2008-08-20 0:30 ` Trond Myklebust 2008-08-20 3:38 ` Quentin Barnes
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.