* Re: [PATCH] NFS: nfs4_lookup_revalidate need to report STALE inodes.
[not found] ` <20140717115024.1eb7433d-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2014-07-17 11:22 ` Jeff Layton
[not found] ` <20140717072241.2c1549a3-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Jeff Layton @ 2014-07-17 11:22 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Trond Myklebust, Alexander Viro, NFS, Miklos Szeredi,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 11983 bytes --]
On Thu, 17 Jul 2014 11:50:24 +1000
NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
> On Mon, 14 Jul 2014 19:47:38 -0400 Jeff Layton <jeff.layton@primarydata.com>
> wrote:
>
> > On Tue, 15 Jul 2014 08:57:27 +1000
> > NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
> >
> > > On Mon, 14 Jul 2014 09:00:28 -0400 Jeff Layton <jeff.layton@primarydata.com>
> > > wrote:
> > >
> > > > On Mon, 14 Jul 2014 22:35:13 +1000
> > > > NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
> > > >
> > > > > On Mon, 14 Jul 2014 08:14:55 -0400 Jeff Layton <jeff.layton@primarydata.com>
> > > > > wrote:
> > > > >
> > > > > > On Mon, 14 Jul 2014 15:14:05 +1000
> > > > > > NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
> > > > > >
> > > > > > >
> > > > > > > If an 'open' of a file in an NFSv4 filesystem finds that the dentry is
> > > > > > > in cache, but the inode is stale (on the server), the dentry will not
> > > > > > > be re-validated immediately and may cause ESTALE to be returned to
> > > > > > > user-space.
> > > > > > >
> > > > > > > For a non-create 'open', do_last() calls lookup_fast() and on success
> > > > > > > will eventually call may_open() which calls into nfs_permission().
> > > > > > > If nfs_permission() makes the ACCESS call to the server it will get
> > > > > > > NFS4ERR_STALE, resulting in ESTALE from may_open() and thence from
> > > > > > > do_last().
> > > > > > > The retry-on-ESTALE in filename_lookup() will repeat exactly the same
> > > > > > > process because nothing in this path will invalidate the dentry due to
> > > > > > > the inode being stale, so the ESTALE will be returned.
> > > > > > >
> > > > > > > lookup_fast() calls ->d_revalidate(), but for an OPEN on an NFSv4
> > > > > > > filesystem, that will succeed for regular files:
> > > > > > > /* Let f_op->open() actually open (and revalidate) the file */
> > > > > > >
> > > > > > > Unfortunately in the case of a STALE inode, f_op->open() never gets
> > > > > > > called. If we teach nfs4_lookup_revalidate() to report a failure on
> > > > > > > NFS_STALE() inodes, then the dentry will be invalidated and a full
> > > > > > > lookup will be attempted. The ESTALE errors go away.
> > > > > > >
> > > > > > >
> > > > > > > While I think this fix is correct, I'm not convinced that it is
> > > > > > > sufficient, particularly if lookupcache=none.
> > > > > > > The current code will fail an "open" is nfs_permission() fails,
> > > > > > > without having performed a LOOKUP. i.e. it will use the cache.
> > > > > > > nfs_lookup_revalidate will force a lookup before the permission check
> > > > > > > if NFS_MOUNT_LOOKUP_CACHE_NONE, but nfs4_lookup_revalidate will not.
> > > > > > >
> > > > > >
> > > > > > This patch should make the code fall through to nfs_lookup_revalidate,
> > > > > > which would then force the lookup, right?
> > > > >
> > > > > Yes ... though maybe that's not what I really want to do. I really wanted to
> > > > > just return '0', though I would need to check that is right in all cases.
> > > > >
> > > > > >
> > > > > > Also, I'm a little unclear...
> > > > > >
> > > > > > Why would may_open fail with ESTALE after the v4 OPEN succeeds? The
> > > > > > OPEN should be returning a filehandle and attributes for the inode
> > > > > > actually opened. It seems like we ought to be doing any permission
> > > > > > checks vs. that inode, not anything we had in cache. Presumably the
> > > > > > server is then holding it open so it shouldn't be stale.
> > > > >
> > > > > may_open is called *before* and v4 OPEN.
> > > > >
> > > > > In do_last, if the inode is already in cache, then
> > > > > lookup_fast is called, which calls d_revalidate
> > > > > then may_open (calls ->permission)
> > > > > then finish_open which calls f_op->open
> > > > >
> > > > > Yes, we should be doing permission checking against whatever 'open' finds.
> > > > > But the VFS is structured to the the permission check after d_revalidate and
> > > > > before ->open. So maybe d_revalidate needs to do the NFS open??
> > > > >
> > > >
> > > > Ok, I see. Ugh, having the revalidate do the open sounds...messy.
> > >
> > > Having the VFS call into the file system in dribs and drabs, rather than just
> > > asking the filesystem to "open" and letting it call back to VFS libraries
> > > for name lookup etc it what is really messy (IMO).
> > >
> > > So yes - definite mess. Not entirely sure where the mess is.
> > >
> >
> > Yeah, that might have been cleaner overall. I'm not sure how we can get
> > there from where the code is today though...
> >
> > > >
> > > > A simpler fix might be to fix it so that an -ESTALE return from
> > > > may_open triggers a retry. Something like this maybe (probably
> > > > whitespace damaged, so just for discussion purposes):
> > >
> > > Nice idea but doesn't work.
> > > We get back to retry_lookup and call lookup_open().
> > > lookup_dcache calls d_revalidate which reports that everything is fine, so it
> > > tells lookup_open which jumps to out_no_open and does nothing useful.
> > > So we end up in may_open() again which returns ESTALE again but now we've
> > > used up all our extra lives...
> > >
> >
> > Ahh right, so you'd probably need to pair that with the patch you
> > already have. Regardless, it seems like getting back an ESTALE from
> > may_open should trigger a retry rather than just erroring out.
> >
> > >
> > > One thing I noticed while exploring this is that do_last calls "may_open"
> > > *before* finish_open() while atomic_open() calls "may_open" *after*
> > > finish_open() (which it calls by virtual of the fact that all ->atomic_open
> > > methods call finish_open()).
> > >
> > > I was very tempted to just move the 'may_open' call in 'do_last' to after the
> > > 'finish_open' call. That fixed the problem, but I'm not sure it is "right".
> > >
> > > I think the real core messiness here is that permission checking should be
> > > neither before nor after finish_open, but should be an integral part of
> > > finish_open with the filesystem doing the permission check in f_op->open().
> > >
> > > I'm currently thinking this is the best patch for now:
> > >
> > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > index 4f7414afca27..5c40cfd3ae29 100644
> > > --- a/fs/nfs/dir.c
> > > +++ b/fs/nfs/dir.c
> > > @@ -1563,9 +1563,10 @@ static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> > > /* We cannot do exclusive creation on a positive dentry */
> > > if (flags & LOOKUP_EXCL)
> > > goto no_open_dput;
> > >
> > > - /* Let f_op->open() actually open (and revalidate) the file */
> > > - ret = 1;
> > > + if (!NFS_STALE(inode))
> > > + /* Let f_op->open() actually open (and revalidate) the file */
> > > + ret = 1;
> > >
> > > out:
> > > dput(parent);
> > >
> > >
> > > Thanks,
> > > NeilBrown
> > >
> >
> > That looks fine too, but I think you probably will also want to pair it
> > with making may_open retry the open on an ESTALE return.
> >
> > The problem with the above check alone is that it's only going to fire
> > if you previously found the inode to be stale. It may be stale on the
> > server, but the client doesn't realize it yet, or could go stale after
> > this check and before the ACCESS call. In that case, you'll still end
> > up getting back an ESTALE once you hit may_open (unless I'm missing
> > something) and that won't trigger a reattempt either.
>
> I must admit to being a bit confused by your position here.
>
> You are the one who introduced the high-level retry-on-ESTALE functionality
> into namei.c. So you presumably know that an ESTALE will already be
> retried. Yet you are suggesting to that we add another retry here??
>
> The way I understanding it, ESTALE should only be retried if it was a cached
> inode that was found to be STALE. When that happens, the dentry needs to be
> invalidated and then the whole path retried again from the top with
> LOOKUP_REVAL. This time we won't trust anything that is cached so any ESTALE
> we find is a real ESTALE that must be returned to the caller.
>
> From this perspective, the problem is either something is seeing a STALE
> inode in the first pass and not invalidating the dentry, or that something is
> not revalidating the dentry on the second pass despite LOOKUP_REVAL being set.
> I'm assuming that nfs4_look_revalidate should be invalidating the dentry on
> the first pass (by returning 0). Other fixes might be possible, but further
> retries should be pointless - we already have the required retry in place
> thanks to you!
>
> Thanks,
> NeilBrown
(cc'ing Miklos)
<facepalm>
You're totally correct. I had forgotten that we do retries on ESTALE at
a higher level. I got confused by the EOPENSTALE handling there after
finish_open.
</facepalm>
So, on your patch:
Acked-by: Jeff Layton <jlayton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
That said, it does sort of bring up an unrelated question:
What's so special about an EOPENSTALE return from finish_open that we
need to handle retries in do_last? It seems like we could get rid of the
stale_open label and just let do_filp_open handle it like we would
an ESTALE return from any other spot in the function.
Just for giggles, here's an RFC patch. It builds but I haven't tested
it. It might also be possible to do some cleanup around saved_parent
with this.
Thoughts?
-------------------------[snip]-------------------
[PATCH] vfs: don't handle EOPENSTALE retries in do_last
We already handle ESTALE retries at higher levels. Retrying the lookup
and open in do_last is somewhat redundant. Remove the logic that
for that from do_last and just let the upper layers handle it.
Cc: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
---
fs/namei.c | 26 +-------------------------
1 file changed, 1 insertion(+), 25 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 985c6f368485..34c6d008d0e5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2882,7 +2882,6 @@ static int do_last(struct nameidata *nd, struct path *path,
struct inode *inode;
bool symlink_ok = false;
struct path save_parent = { .dentry = NULL, .mnt = NULL };
- bool retried = false;
int error;
nd->flags &= ~LOOKUP_PARENT;
@@ -2927,7 +2926,6 @@ static int do_last(struct nameidata *nd, struct path *path,
goto out;
}
-retry_lookup:
if (op->open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
error = mnt_want_write(nd->path.mnt);
if (!error)
@@ -3017,7 +3015,6 @@ finish_lookup:
save_parent.dentry = nd->path.dentry;
save_parent.mnt = mntget(path->mnt);
nd->path.dentry = path->dentry;
-
}
nd->inode = inode;
/* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */
@@ -3049,11 +3046,8 @@ finish_open_created:
goto out;
file->f_path.mnt = nd->path.mnt;
error = finish_open(file, nd->path.dentry, NULL, opened);
- if (error) {
- if (error == -EOPENSTALE)
- goto stale_open;
+ if (error)
goto out;
- }
opened:
error = open_check_o_direct(file);
if (error)
@@ -3080,24 +3074,6 @@ exit_dput:
exit_fput:
fput(file);
goto out;
-
-stale_open:
- /* If no saved parent or already retried then can't retry */
- if (!save_parent.dentry || retried)
- goto out;
-
- BUG_ON(save_parent.dentry != dir);
- path_put(&nd->path);
- nd->path = save_parent;
- nd->inode = dir->d_inode;
- save_parent.mnt = NULL;
- save_parent.dentry = NULL;
- if (got_write) {
- mnt_drop_write(nd->path.mnt);
- got_write = false;
- }
- retried = true;
- goto retry_lookup;
}
static int do_tmpfile(int dfd, struct filename *pathname,
--
1.9.3
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] NFS: nfs4_lookup_revalidate need to report STALE inodes.
[not found] ` <20140717072241.2c1549a3-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2014-07-17 12:52 ` Miklos Szeredi
2014-07-17 14:41 ` Jeff Layton
0 siblings, 1 reply; 3+ messages in thread
From: Miklos Szeredi @ 2014-07-17 12:52 UTC (permalink / raw)
To: Jeff Layton
Cc: NeilBrown, Trond Myklebust, Alexander Viro, NFS, Linux-Fsdevel
On Thu, Jul 17, 2014 at 1:22 PM, Jeff Layton
<jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org> wrote:
> What's so special about an EOPENSTALE return from finish_open that we
> need to handle retries in do_last? It seems like we could get rid of the
> stale_open label and just let do_filp_open handle it like we would
> an ESTALE return from any other spot in the function.
>
> Just for giggles, here's an RFC patch. It builds but I haven't tested
> it. It might also be possible to do some cleanup around saved_parent
> with this.
>
> Thoughts?
EOPENSTALE is an optimization for the redoing only the last component.
It's the analogue of ->d_revalidate() failure, in which case lookup of
that component only is retried path components before that are not.
I'm not sure if it's a valid optimization, but if not, then we should
also consider doing LOOKUP_REVAL on the whole path on any
d_revalidate() failure as well.
Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] NFS: nfs4_lookup_revalidate need to report STALE inodes.
2014-07-17 12:52 ` Miklos Szeredi
@ 2014-07-17 14:41 ` Jeff Layton
0 siblings, 0 replies; 3+ messages in thread
From: Jeff Layton @ 2014-07-17 14:41 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jeff Layton, NeilBrown, Trond Myklebust, Alexander Viro, NFS,
Linux-Fsdevel
On Thu, 17 Jul 2014 14:52:00 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jul 17, 2014 at 1:22 PM, Jeff Layton
> <jeff.layton@primarydata.com> wrote:
>
> > What's so special about an EOPENSTALE return from finish_open that we
> > need to handle retries in do_last? It seems like we could get rid of the
> > stale_open label and just let do_filp_open handle it like we would
> > an ESTALE return from any other spot in the function.
> >
> > Just for giggles, here's an RFC patch. It builds but I haven't tested
> > it. It might also be possible to do some cleanup around saved_parent
> > with this.
> >
> > Thoughts?
>
> EOPENSTALE is an optimization for the redoing only the last component.
> It's the analogue of ->d_revalidate() failure, in which case lookup of
> that component only is retried path components before that are not.
>
> I'm not sure if it's a valid optimization, but if not, then we should
> also consider doing LOOKUP_REVAL on the whole path on any
> d_revalidate() failure as well.
>
> Thanks,
> Miklos
Ok, that makes sense and it's seems like good enough reason to keep it
as is for now. We can just drop my RFC patch...
Thanks,
--
Jeff Layton <jlayton@primarydata.com>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-07-17 14:42 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20140714151405.2fa06dd7@notabene.brown>
[not found] ` <20140714081455.69f55224@tlielax.poochiereds.net>
[not found] ` <20140714223513.47807c98@notabene.brown>
[not found] ` <20140714090028.6f04fd2c@tlielax.poochiereds.net>
[not found] ` <20140715085727.6fa12272@notabene.brown>
[not found] ` <20140714194738.5aafaf25@tlielax.poochiereds.net>
[not found] ` <20140717115024.1eb7433d@notabene.brown>
[not found] ` <20140717115024.1eb7433d-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2014-07-17 11:22 ` [PATCH] NFS: nfs4_lookup_revalidate need to report STALE inodes Jeff Layton
[not found] ` <20140717072241.2c1549a3-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2014-07-17 12:52 ` Miklos Szeredi
2014-07-17 14:41 ` Jeff Layton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).