All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Andy Chittenden <andyc@bluearc.com>
Cc: linux-kernel@vger.kernel.org
Subject: RE: nfs client readdir caching issue?
Date: Mon, 07 Jul 2008 13:20:07 -0400	[thread overview]
Message-ID: <1215451207.19512.16.camel@localhost> (raw)
In-Reply-To: <0F10A59FDFFDFD4E9BEBD7365DE6725501F64C4B@uk-email.terastack.bluearc.com>

On Thu, 2008-07-03 at 09:47 +0100, Andy Chittenden wrote:
> > If so, then invalidate_inode_pages2_range() would have to be broken:
> we
> > always clear the readdir cache immediately after reading in the page
> > with index 0 (i.e. the first readdir page).
> 
> I'm confused by the call to invalidate_inode_pages2_range:
> 
>         if (page->index == 0)
>                 invalidate_inode_pages2_range(inode->i_mapping,
> PAGE_CACHE_SIZE, -1);
> 
> That's passing in a pgoff_t of 4096 as the start page offset from which
> to invalidate. And an enormous number for the end page to invalidate to.
> So it looks like the nfs client is trying to invalidate from a *byte*
> offset of 4096 (but if that's true, the first page could contain less
> than 4096 bytes depending on the size of the readdir response it
> received but I'll leave that to one side for the moment).
> 
> What's confusing me is that when I look at the implementation of
> invalidate_inode_pages2_range, I see this call to pagevec_lookup:
> 
>                 pagevec_lookup(&pvec, mapping, next,
>                         min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1))
> {
> 
> looking at the pagevec_lookup comments, it claims the fourth parameter
> is the number of pages:
> 
> * @nr_pages:   The maximum number of pages
> 
> So how can (end - next) be a number of pages? (next will be 4096 in the
> call from the nfs client). IE it looks like
> invalidate_inode_pages2_range is expecting a page range (as the name
> suggests). IE I'm wondering whether the call to
> invalidate_inode_pages2_range should be:
> 
>         if (page->index == 0)
>                 invalidate_inode_pages2_range(inode->i_mapping, 1, -1);
> 
> FWIW We've also seen this on larger directories so I'm wondering what
> would happen if a readdir part way down the cookie chain returned more
> data (or less) than it did last time. IE if the above is correct, then
> replace the two lines with:
> 
> 	invalidate_inode_pages2_range(inode->i_mapping, page->index + 1,
> -1);
> 
> IE purge the rest of the pages for the inode.

Hi Andy,

Apologies for the late response. I was travelling on Thursday, and got
caught up with the long weekend due to the July 4th holiday in the US.


Yes, I agree with your comment above: the
invalidate_inode_pages2_range() call is specifying a byte range instead
of a page index range, and is therefore wrong.

Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes).

Cheers
  Trond
-------------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix readdir cache invalidation

invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/dir.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 58d43da..982a206 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -204,7 +204,7 @@ int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page *page)
 	 * Note: assumes we have exclusive access to this mapping either
 	 *	 through inode->i_mutex or some other mechanism.
 	 */
-	if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) {
+	if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) {
 		/* Should never happen */
 		nfs_zap_mapping(inode, inode->i_mapping);
 	}



      reply	other threads:[~2008-07-07 17:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-02 11:03 nfs client readdir caching issue? Andy Chittenden
2008-07-02 17:37 ` Trond Myklebust
2008-07-03  8:47   ` Andy Chittenden
2008-07-07 17:20     ` Trond Myklebust [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1215451207.19512.16.camel@localhost \
    --to=trond.myklebust@fys.uio.no \
    --cc=andyc@bluearc.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.