From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jlayton@redhat.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
NFS <linux-nfs@vger.kernel.org>
Subject: Re: More fun with unmounting ESTALE directories.
Date: Mon, 18 Feb 2013 13:25:09 +1100 [thread overview]
Message-ID: <20130218132509.0ce779de@notabene.brown> (raw)
In-Reply-To: <20130214104230.013b7f71@tlielax.poochiereds.net>
[-- Attachment #1: Type: text/plain, Size: 7243 bytes --]
On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:
> On Tue, 12 Feb 2013 11:38:13 +1100
> NeilBrown <neilb@suse.de> wrote:
>
> >
> > I've been exploring difficulties with unmounting stale directories and
> > discovered another bug.
> >
> > If I:
> >
> > SERVER: mkdir /foo/bar #and make sure it is exported
> > CLIENT: mount -o vers=4 server:/foo/bar /mnt
> > SERVER: rm -r /foo
> > CLIENT: > /mnt/baz # gets an error of course
> > CLIENT: ls -l /mnt # error again
> > CLIENT: umount /mnt
> >
> > The result of that last command is:
> >
> > /mnt was not found in /proc/mounts
> > /mnt was not found in /proc/mounts
> >
> > Strange?
> >
> > cat /proc/mounts
> >
> > .....
> > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> > ....
> >
> > Notice the "\040(deleted)".
> >
> > NFS has unhashed that directory because it is obviously bad, and d_path()
> > notices and adds " (deleted)".
> >
> > Now I might be able to argue that NFS shouldn't be unhashing a directory that
> > is a mountpoint - it certainly seems strange behaviour.
> >
> > But I think I can more strongly argue that /proc/mounts shouldn't be showing
> > the mounted directory, but instead the directory that it is mounted on.
> > Obviously these both have the same name so it shouldn't matter ... except
> > that here is a case where it does.
> >
> > I "fixed" it with
> >
> > --- a/fs/proc_namespace.c
> > +++ b/fs/proc_namespace.c
> > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
> > {
> > struct mount *r = real_mount(mnt);
> > int err = 0;
> > - struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> > + struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
> > struct super_block *sb = mnt_path.dentry->d_sb;
> >
> > if (sb->s_op->show_devname) {
> >
> > though I suspect that isn't safe and needs some locking.
> >
> > Probably both should be fixed: NFS should not invalidate any mounted
> > directory, and show_vfsmnt() should report the mointpoint, not the mounted
> > directory.
> >
> > I can't figure out any way to get NFS to not invalidate the mounted directory.
> > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> > don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> >
> > Suggestions? Thoughts?
> >
> > Thanks,
> > NeilBrown
> >
>
> I've also been looking at some weird ESTALE problems. Here's another
> fun one that doesn't involve mountpoints. Assume here that we're
> working in the same exported directory on client and server:
>
> server# mkdir a
> client# cd a
> server# mv a a.bak
> client# sleep 30 # (or whatever the dir attrcache timeout is)
> client# stat .
> stat: cannot stat ‘.’: Stale NFS file handle
>
> Obviously, "." should not be stale. It got renamed, but the inode still
> exists on the server.
>
> If you sniff on the wire, you'll see that the server doesn't ever send
> an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
> end up trying to revalidate the dentry that "." refers to. We find that
> the parent changed (obviously) and then try to redo the lookup of "a".
> At that point we notice that it doesn't exist and turn it into ESTALE.
>
> I don't really understand the point of FS_REVAL_DOT. What does that
> actually buy us? I wonder if removing it would also help your testcase?
>
I think that is a slightly different issue, but certainly related.
I have hit your problem before and have the following patch in SLES. I think
I tried pushing it upstream, but didn't get much in the way of a useful
response.
(patch is space-damaged - don't try to apply with 'patch').
BTW I have another problem, related to this one and which could be fixed by
removing FS_REVAL_DOT.
If you
mount -o vers=4,noac server:/some/path /mnt
then stop nfsd on the server and
umount /mnt
it hangs.
Partly it hangs because 'mount' tries to do a 'readlink' on the mountpoint.
I can probably get it to not do that (or use --no-canonicalize).
But then sys_umount hangs because it tries to check with the server that the
thing being unmounted really is still a directory...
I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
last step and returns the mounted-on directory, not the mountpoint that is
mounted there - or at least makes sure not revalidate happens on that final
mounted directory.
I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
attributes from the server if the cache is old. However it seems to do a
whole lot more than that, including "lookup" calls which it I'm sure is wrong.
NeilBrown
Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.
When d_revalidate is called on a dentry because FS_REVAL_DOT is set
it isn't really appropriate to revalidate the name.
If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.
If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular
need to do it again.
If we change nd->last_type to refer to "the last component looked at"
rather than just "the last component", then these cases can be
detected by "nd->last_type != LAST_NORM".
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/namei.c | 2 +-
fs/nfs/dir.c | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--- linux-3.0-SLE11-SP2.orig/fs/namei.c
+++ linux-3.0-SLE11-SP2/fs/namei.c
@@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
}
}
+ nd->last_type = type;
/* remove trailing slashes? */
if (!c)
goto last_component;
@@ -1486,7 +1487,6 @@ last_component:
/* Clear LOOKUP_CONTINUE iff it was previously unset */
nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
nd->last = this;
- nd->last_type = type;
return 0;
}
terminate_walk(nd);
--- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
+++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
@@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
if (NFS_STALE(inode))
goto out_bad;
+ if (nd->last_type != LAST_NORM) {
+ /* name not relevant, just inode */
+ error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (error)
+ goto out_bad;
+ else
+ goto out_valid;
+ }
+
error = -ENOMEM;
fhandle = nfs_alloc_fhandle();
fattr = nfs_alloc_fattr();
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-02-18 2:25 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-12 0:38 More fun with unmounting ESTALE directories NeilBrown
2013-02-14 15:42 ` Jeff Layton
2013-02-18 2:25 ` NeilBrown [this message]
2013-02-18 12:41 ` Jeff Layton
2013-02-18 15:36 ` Chuck Lever
2013-02-18 21:58 ` J. Bruce Fields
2013-02-18 22:05 ` Jeff Layton
2013-02-18 22:16 ` Chuck Lever
2013-02-18 18:46 ` Al Viro
2013-02-18 19:46 ` Jeff Layton
2013-02-18 20:15 ` Al Viro
2013-02-18 23:14 ` NeilBrown
2013-02-19 12:33 ` Jeff Layton
2013-02-18 23:10 ` NeilBrown
2013-02-18 23:17 ` Myklebust, Trond
2013-02-18 23:31 ` NeilBrown
2013-02-19 14:27 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130218132509.0ce779de@notabene.brown \
--to=neilb@suse.de \
--cc=Trond.Myklebust@netapp.com \
--cc=jlayton@redhat.com \
--cc=linux-nfs@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.