From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Andy Chittenden <andyc@bluearc.com>
Cc: linux-kernel@vger.kernel.org
Subject: RE: nfs client readdir caching issue?
Date: Mon, 07 Jul 2008 13:20:07 -0400 [thread overview]
Message-ID: <1215451207.19512.16.camel@localhost> (raw)
In-Reply-To: <0F10A59FDFFDFD4E9BEBD7365DE6725501F64C4B@uk-email.terastack.bluearc.com>
On Thu, 2008-07-03 at 09:47 +0100, Andy Chittenden wrote:
> > If so, then invalidate_inode_pages2_range() would have to be broken:
> we
> > always clear the readdir cache immediately after reading in the page
> > with index 0 (i.e. the first readdir page).
>
> I'm confused by the call to invalidate_inode_pages2_range:
>
> if (page->index == 0)
> invalidate_inode_pages2_range(inode->i_mapping,
> PAGE_CACHE_SIZE, -1);
>
> That's passing in a pgoff_t of 4096 as the start page offset from which
> to invalidate. And an enormous number for the end page to invalidate to.
> So it looks like the nfs client is trying to invalidate from a *byte*
> offset of 4096 (but if that's true, the first page could contain less
> than 4096 bytes depending on the size of the readdir response it
> received but I'll leave that to one side for the moment).
>
> What's confusing me is that when I look at the implementation of
> invalidate_inode_pages2_range, I see this call to pagevec_lookup:
>
> pagevec_lookup(&pvec, mapping, next,
> min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1))
> {
>
> looking at the pagevec_lookup comments, it claims the fourth parameter
> is the number of pages:
>
> * @nr_pages: The maximum number of pages
>
> So how can (end - next) be a number of pages? (next will be 4096 in the
> call from the nfs client). IE it looks like
> invalidate_inode_pages2_range is expecting a page range (as the name
> suggests). IE I'm wondering whether the call to
> invalidate_inode_pages2_range should be:
>
> if (page->index == 0)
> invalidate_inode_pages2_range(inode->i_mapping, 1, -1);
>
> FWIW We've also seen this on larger directories so I'm wondering what
> would happen if a readdir part way down the cookie chain returned more
> data (or less) than it did last time. IE if the above is correct, then
> replace the two lines with:
>
> invalidate_inode_pages2_range(inode->i_mapping, page->index + 1,
> -1);
>
> IE purge the rest of the pages for the inode.
Hi Andy,
Apologies for the late response. I was travelling on Thursday, and got
caught up with the long weekend due to the July 4th holiday in the US.
Yes, I agree with your comment above: the
invalidate_inode_pages2_range() call is specifying a byte range instead
of a page index range, and is therefore wrong.
Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes).
Cheers
Trond
-------------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix readdir cache invalidation
invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/dir.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 58d43da..982a206 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -204,7 +204,7 @@ int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page *page)
* Note: assumes we have exclusive access to this mapping either
* through inode->i_mutex or some other mechanism.
*/
- if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) {
+ if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) {
/* Should never happen */
nfs_zap_mapping(inode, inode->i_mapping);
}
prev parent reply other threads:[~2008-07-07 17:20 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-02 11:03 nfs client readdir caching issue? Andy Chittenden
2008-07-02 17:37 ` Trond Myklebust
2008-07-03 8:47 ` Andy Chittenden
2008-07-07 17:20 ` Trond Myklebust [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1215451207.19512.16.camel@localhost \
--to=trond.myklebust@fys.uio.no \
--cc=andyc@bluearc.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox