Re: More fun with unmounting ESTALE directories.

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Jeff Layton <jlayton@redhat.com>
To: NeilBrown <neilb@suse.de>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	NFS <linux-nfs@vger.kernel.org>,
	steved@redhat.com
Subject: Re: More fun with unmounting ESTALE directories.
Date: Mon, 18 Feb 2013 07:41:05 -0500	[thread overview]
Message-ID: <20130218074105.38cf49ea@tlielax.poochiereds.net> (raw)
In-Reply-To: <20130218132509.0ce779de@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 8741 bytes --]

On Mon, 18 Feb 2013 13:25:09 +1100
NeilBrown <neilb@suse.de> wrote:

> On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Tue, 12 Feb 2013 11:38:13 +1100
> > NeilBrown <neilb@suse.de> wrote:
> > 
> > > 
> > > I've been exploring difficulties with unmounting stale directories and
> > > discovered another bug.
> > > 
> > > If I:
> > > 
> > > SERVER:  mkdir /foo/bar  #and make sure it is exported
> > > CLIENT:  mount -o vers=4 server:/foo/bar /mnt
> > > SERVER:  rm -r /foo
> > > CLIENT:  > /mnt/baz # gets an error of course
> > > CLIENT:  ls -l /mnt # error again
> > > CLIENT:  umount /mnt
> > > 
> > > The result of that last command is:
> > > 
> > > /mnt was not found in /proc/mounts
> > > /mnt was not found in /proc/mounts
> > > 
> > > Strange?
> > > 
> > > cat /proc/mounts
> > > 
> > > .....
> > > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> > > ....
> > > 
> > > Notice the "\040(deleted)".
> > > 
> > > NFS has unhashed that directory because it is obviously bad, and d_path()
> > > notices and adds " (deleted)".
> > > 
> > > Now I might be able to argue that NFS shouldn't be unhashing a directory that
> > > is a mountpoint - it certainly seems strange behaviour.
> > > 
> > > But I think I can more strongly argue that /proc/mounts shouldn't be showing
> > > the mounted directory, but instead the directory that it is mounted on.
> > > Obviously these both have the same name so it shouldn't matter ... except
> > > that here is a case where it does.
> > > 
> > > I "fixed" it with
> > > 
> > > --- a/fs/proc_namespace.c
> > > +++ b/fs/proc_namespace.c
> > > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
> > >  {
> > >  	struct mount *r = real_mount(mnt);
> > >  	int err = 0;
> > > -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> > > +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
> > >  	struct super_block *sb = mnt_path.dentry->d_sb;
> > >  
> > >  	if (sb->s_op->show_devname) {
> > > 
> > > though I suspect that isn't safe and needs some locking.
> > > 
> > > Probably both should be fixed:  NFS should not invalidate any mounted
> > > directory, and show_vfsmnt() should report the mointpoint, not the mounted
> > > directory.
> > > 
> > > I can't figure out any way to get NFS to not invalidate the mounted directory.
> > > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> > > don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> > > 
> > > Suggestions?  Thoughts?
> > > 
> > > Thanks,
> > > NeilBrown
> > > 
> > 
> > I've also been looking at some weird ESTALE problems. Here's another
> > fun one that doesn't involve mountpoints. Assume here that we're
> > working in the same exported directory on client and server:
> > 
> >     server# mkdir a
> >     client# cd a
> >     server# mv a a.bak
> >     client# sleep 30  # (or whatever the dir attrcache timeout is)
> >     client# stat .
> >     stat: cannot stat ‘.’: Stale NFS file handle
> > 
> > Obviously, "." should not be stale. It got renamed, but the inode still
> > exists on the server.
> > 
> > If you sniff on the wire, you'll see that the server doesn't ever send
> > an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
> > end up trying to revalidate the dentry that "." refers to. We find that
> > the parent changed (obviously) and then try to redo the lookup of "a".
> > At that point we notice that it doesn't exist and turn it into ESTALE.
> > 
> > I don't really understand the point of FS_REVAL_DOT. What does that
> > actually buy us? I wonder if removing it would also help your testcase?
> > 
> 
> I think that is a slightly different issue, but certainly related. 
> I have hit your problem before and have the following patch in SLES.  I think
> I tried pushing it upstream, but didn't get much in the way of a useful
> response.
> (patch is space-damaged - don't try to apply with 'patch').
> 
> BTW I have another problem, related to this one and which could be fixed by
> removing FS_REVAL_DOT.
> 
> If you
>    mount -o vers=4,noac server:/some/path /mnt
> then stop nfsd on the server and
>    umount /mnt
> 
> it hangs.
> Partly it hangs because  'mount' tries to do a 'readlink' on the mountpoint.
> I can probably get it to not do that (or use --no-canonicalize).
> But then sys_umount hangs because it tries to check with the server that the
> thing being unmounted really is still a directory...
> 
> I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> last step and returns the mounted-on directory, not the mountpoint that is
> mounted there - or at least makes sure not revalidate happens on that final
> mounted directory.
> 
> 
> I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
> attributes from the server if the cache is old.  However it seems to do a
> whole lot more than that, including "lookup" calls which it I'm sure is wrong.
> 
> 
> 
> Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.
> 
> When d_revalidate is called on a dentry because FS_REVAL_DOT is set
> it isn't really appropriate to revalidate the name.
> 
> If the path was simply ".", then the current-working-directory could
> have been renamed on the server and should still be accessible as "."
> even if it has a new name.
> 
> If the path was "/some/long/path/.", then the final component ("path" in
> this case) has already been revalidated and there is no particular
> need to do it again.
> 
> If we change nd->last_type to refer to "the last component looked at"
> rather than just "the last component", then these cases can be
> detected by "nd->last_type != LAST_NORM".
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> ---
>  fs/namei.c   |    2 +-
>  fs/nfs/dir.c |    9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> --- linux-3.0-SLE11-SP2.orig/fs/namei.c
> +++ linux-3.0-SLE11-SP2/fs/namei.c
> @@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
>                         }
>                 }
>  
> +               nd->last_type = type;
>                 /* remove trailing slashes? */
>                 if (!c)
>                         goto last_component;
> @@ -1486,7 +1487,6 @@ last_component:
>                 /* Clear LOOKUP_CONTINUE iff it was previously unset */
>                 nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
>                 nd->last = this;
> -               nd->last_type = type;
>                 return 0;
>         }
>         terminate_walk(nd);
> --- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
> +++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
> @@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
>         if (NFS_STALE(inode))
>                 goto out_bad;
>  
> +       if (nd->last_type != LAST_NORM) {
> +               /* name not relevant, just inode */
> +               error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +               if (error)
> +                       goto out_bad;
> +               else
> +                       goto out_valid;
> +       }
> +
>         error = -ENOMEM;
>         fhandle = nfs_alloc_fhandle();
>         fattr = nfs_alloc_fattr();

Ahh thanks -- that is the same problem exactly. I'll have to look over
your patch and see whether and how it could be applied to current
mainline code.

I think the umount thing may be the same problem that steved was
talking about the other day (V4 unmount causes a GETATTR). I hadn't put
the two together, but you're probably right.

LOOKUP_MOUNTPOINT is a very interesting idea and might even be
reasonable in conjunction with removing FS_REVAL_DOT as it would make
the needs of umount more explicit.

As far as FS_REVAL_DOT goes though, in the current mainline code, the
only place that looks at it is complete_walk(), and it just means that
we won't skip doing a d_revalidate on the final component (which will
only have not been done if it's a '.', right?).

If we remove FS_REVAL_DOT altogether, then we can chop off the bottom
half of that function, which has a certain appeal. I guess the question
we have to answer is -- if we remove FS_REVAL_DOT, what (if anything)
will break?


-- 
Jeff Layton <jlayton@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

next prev parent reply	other threads:[~2013-02-18 12:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-12  0:38 More fun with unmounting ESTALE directories NeilBrown
2013-02-14 15:42 ` Jeff Layton
2013-02-18  2:25   ` NeilBrown
2013-02-18 12:41     ` Jeff Layton [this message]
2013-02-18 15:36       ` Chuck Lever
2013-02-18 21:58         ` J. Bruce Fields
2013-02-18 22:05           ` Jeff Layton
2013-02-18 22:16           ` Chuck Lever
2013-02-18 18:46     ` Al Viro
2013-02-18 19:46       ` Jeff Layton
2013-02-18 20:15         ` Al Viro
2013-02-18 23:14           ` NeilBrown
2013-02-19 12:33             ` Jeff Layton
2013-02-18 23:10       ` NeilBrown
2013-02-18 23:17         ` Myklebust, Trond
2013-02-18 23:31           ` NeilBrown
2013-02-19 14:27         ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130218074105.38cf49ea@tlielax.poochiereds.net \
    --to=jlayton@redhat.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=steved@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox