From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jeff.layton@primarydata.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
NFS <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFS: nfs4_lookup_revalidate need to report STALE inodes.
Date: Thu, 17 Jul 2014 11:50:24 +1000
Message-ID: <20140717115024.1eb7433d@notabene.brown>
In-Reply-To: <20140714194738.5aafaf25@tlielax.poochiereds.net>
On Mon, 14 Jul 2014 19:47:38 -0400 Jeff Layton <jeff.layton@primarydata.com>
wrote:
> On Tue, 15 Jul 2014 08:57:27 +1000
> NeilBrown <neilb@suse.de> wrote:
>
> > On Mon, 14 Jul 2014 09:00:28 -0400 Jeff Layton <jeff.layton@primarydata.com>
> > wrote:
> >
> > > On Mon, 14 Jul 2014 22:35:13 +1000
> > > NeilBrown <neilb@suse.de> wrote:
> > >
> > > > On Mon, 14 Jul 2014 08:14:55 -0400 Jeff Layton <jeff.layton@primarydata.com>
> > > > wrote:
> > > >
> > > > > On Mon, 14 Jul 2014 15:14:05 +1000
> > > > > NeilBrown <neilb@suse.de> wrote:
> > > > >
> > > > > >
> > > > > > If an 'open' of a file in an NFSv4 filesystem finds that the dentry is
> > > > > > in cache, but the inode is stale (on the server), the dentry will not
> > > > > > be re-validated immediately and may cause ESTALE to be returned to
> > > > > > user-space.
> > > > > >
> > > > > > For a non-create 'open', do_last() calls lookup_fast() and on success
> > > > > > will eventually call may_open() which calls into nfs_permission().
> > > > > > If nfs_permission() makes the ACCESS call to the server it will get
> > > > > > NFS4ERR_STALE, resulting in ESTALE from may_open() and thence from
> > > > > > do_last().
> > > > > > The retry-on-ESTALE in filename_lookup() will repeat exactly the same
> > > > > > process because nothing in this path will invalidate the dentry due to
> > > > > > the inode being stale, so the ESTALE will be returned.
> > > > > >
> > > > > > lookup_fast() calls ->d_revalidate(), but for an OPEN on an NFSv4
> > > > > > filesystem, that will succeed for regular files:
> > > > > > /* Let f_op->open() actually open (and revalidate) the file */
> > > > > >
> > > > > > Unfortunately in the case of a STALE inode, f_op->open() never gets
> > > > > > called. If we teach nfs4_lookup_revalidate() to report a failure on
> > > > > > NFS_STALE() inodes, then the dentry will be invalidated and a full
> > > > > > lookup will be attempted. The ESTALE errors go away.
> > > > > >
> > > > > >
> > > > > > While I think this fix is correct, I'm not convinced that it is
> > > > > > sufficient, particularly if lookupcache=none.
> > > > > > The current code will fail an "open" if nfs_permission() fails,
> > > > > > without having performed a LOOKUP. i.e. it will use the cache.
> > > > > > nfs_lookup_revalidate will force a lookup before the permission check
> > > > > > if NFS_MOUNT_LOOKUP_CACHE_NONE, but nfs4_lookup_revalidate will not.
> > > > > >
> > > > >
> > > > > This patch should make the code fall through to nfs_lookup_revalidate,
> > > > > which would then force the lookup, right?
> > > >
> > > > Yes ... though maybe that's not what I really want to do. I really wanted to
> > > > just return '0', though I would need to check that is right in all cases.
> > > >
> > > > >
> > > > > Also, I'm a little unclear...
> > > > >
> > > > > Why would may_open fail with ESTALE after the v4 OPEN succeeds? The
> > > > > OPEN should be returning a filehandle and attributes for the inode
> > > > > actually opened. It seems like we ought to be doing any permission
> > > > > checks vs. that inode, not anything we had in cache. Presumably the
> > > > > server is then holding it open so it shouldn't be stale.
> > > >
> > > > may_open is called *before* any v4 OPEN.
> > > >
> > > > In do_last, if the inode is already in cache, then
> > > > lookup_fast is called, which calls d_revalidate
> > > > then may_open (calls ->permission)
> > > > then finish_open which calls f_op->open
> > > >
> > > > Yes, we should be doing permission checking against whatever 'open' finds.
> > > > But the VFS is structured to do the permission check after d_revalidate and
> > > > before ->open. So maybe d_revalidate needs to do the NFS open??
> > > >
> > >
> > > Ok, I see. Ugh, having the revalidate do the open sounds...messy.
> >
> > Having the VFS call into the file system in dribs and drabs, rather than just
> > asking the filesystem to "open" and letting it call back to VFS libraries
> > for name lookup etc is what is really messy (IMO).
> >
> > So yes - definite mess. Not entirely sure where the mess is.
> >
>
> Yeah, that might have been cleaner overall. I'm not sure how we can get
> there from where the code is today though...
>
> > >
> > > A simpler fix might be to fix it so that an -ESTALE return from
> > > may_open triggers a retry. Something like this maybe (probably
> > > whitespace damaged, so just for discussion purposes):
> >
> > Nice idea but doesn't work.
> > We get back to retry_lookup and call lookup_open().
> > lookup_dcache calls d_revalidate which reports that everything is fine, so it
> > tells lookup_open which jumps to out_no_open and does nothing useful.
> > So we end up in may_open() again which returns ESTALE again but now we've
> > used up all our extra lives...
> >
>
> Ahh right, so you'd probably need to pair that with the patch you
> already have. Regardless, it seems like getting back an ESTALE from
> may_open should trigger a retry rather than just erroring out.
>
> >
> > One thing I noticed while exploring this is that do_last calls "may_open"
> > *before* finish_open() while atomic_open() calls "may_open" *after*
> > finish_open() (which it calls by virtue of the fact that all ->atomic_open
> > methods call finish_open()).
> >
> > I was very tempted to just move the 'may_open' call in 'do_last' to after the
> > 'finish_open' call. That fixed the problem, but I'm not sure it is "right".
> >
> > I think the real core messiness here is that permission checking should be
> > neither before nor after finish_open, but should be an integral part of
> > finish_open with the filesystem doing the permission check in f_op->open().
> >
> > I'm currently thinking this is the best patch for now:
> >
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 4f7414afca27..5c40cfd3ae29 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -1563,9 +1563,10 @@ static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> >  	/* We cannot do exclusive creation on a positive dentry */
> >  	if (flags & LOOKUP_EXCL)
> >  		goto no_open_dput;
> >
> > -	/* Let f_op->open() actually open (and revalidate) the file */
> > -	ret = 1;
> > +	if (!NFS_STALE(inode))
> > +		/* Let f_op->open() actually open (and revalidate) the file */
> > +		ret = 1;
> >
> >  out:
> >  	dput(parent);
> >
> >
> > Thanks,
> > NeilBrown
> >
>
> That looks fine too, but I think you probably will also want to pair it
> with making may_open retry the open on an ESTALE return.
>
> The problem with the above check alone is that it's only going to fire
> if you previously found the inode to be stale. It may be stale on the
> server, but the client doesn't realize it yet, or could go stale after
> this check and before the ACCESS call. In that case, you'll still end
> up getting back an ESTALE once you hit may_open (unless I'm missing
> something) and that won't trigger a reattempt either.

I must admit to being a bit confused by your position here.
You are the one who introduced the high-level retry-on-ESTALE functionality
into namei.c. So you presumably know that an ESTALE will already be
retried. Yet you are suggesting that we add another retry here??

The way I understand it, ESTALE should only be retried if it was a cached
inode that was found to be STALE. When that happens, the dentry needs to be
invalidated and then the whole path retried again from the top with
LOOKUP_REVAL. This time we won't trust anything that is cached so any ESTALE
we find is a real ESTALE that must be returned to the caller.

From this perspective, the problem is either that something is seeing a STALE
inode in the first pass and not invalidating the dentry, or that something is
not revalidating the dentry on the second pass despite LOOKUP_REVAL being set.

I'm assuming that nfs4_lookup_revalidate should be invalidating the dentry on
the first pass (by returning 0). Other fixes might be possible, but further
retries should be pointless - we already have the required retry in place
thanks to you!
Thanks,
NeilBrown