[PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6)
@ 2009-12-02 19:59 Jeff Layton
  2009-12-02 19:59 ` [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks Jeff Layton
  2009-12-02 19:59 ` [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems Jeff Layton
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Layton @ 2009-12-02 19:59 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: jamie, pavel, miklos, viro, duaneg, ebiederm

This patchset is another attempt to add missing dentry revalidations to
certain places in the lookup codepath. The main difference from the last
patchset is:

* I've dropped the patch that added permissions checks when chasing
  LAST_BIND symlinks. There's a lot disagreement about whether the
  current behavior is even a bug. I'd prefer to see more concensus on
  that point before we do anything here.

* I've rejiggered the error handling in this codepath to ensure that the
  bind mounts could still be unmounted. Testing showed that returning an
  error in do_lookup when a bind-mounted dentry went stale made it
  un-unmountable.

Comments and suggestions appreciated...

Jeff Layton (2):
  vfs: force reval of target when following LAST_BIND symlinks
  vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT
    filesystems

 fs/namei.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 55 insertions(+), 1 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks
  2009-12-02 19:59 [PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6) Jeff Layton
@ 2009-12-02 19:59 ` Jeff Layton
  2009-12-02 23:53   ` Eric W. Biederman
  2009-12-03 10:39   ` Miklos Szeredi
  2009-12-02 19:59 ` [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems Jeff Layton
  1 sibling, 2 replies; 15+ messages in thread
From: Jeff Layton @ 2009-12-02 19:59 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: jamie, pavel, miklos, viro, duaneg, ebiederm

procfs-style symlinks return a last_type of LAST_BIND without an actual
path string. This causes __follow_link to skip calling __vfs_follow_link
and so the dentry isn't revalidated.

This is a problem when the link target sits on NFSv4 as it depends on
the VFS to revalidate the dentry before using it on an open call. Ensure
that this occurs by forcing a revalidation of the target dentry of
LAST_BIND symlinks.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/namei.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3374917..339789e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
 }
 
 /*
+ * force_reval_path - force revalidation of a dentry
+ *
+ * In some situations the path walking code will trust dentries without
+ * revalidating them. This causes problems for filesystems that depend on
+ * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
+ * (which indicates that it's possible for the dentry to go stale), force
+ * a d_revalidate call before proceeding.
+ *
+ * Returns 0 if the revalidation was successful. If the revalidation fails,
+ * either return the error returned by d_revalidate or -ESTALE if the
+ * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
+ * invalidate the dentry. It's up to the caller to handle putting references
+ * to the path if necessary.
+ */
+static int
+force_reval_path(struct path *path, struct nameidata *nd)
+{
+	int status;
+	struct dentry *dentry = path->dentry;
+
+	/*
+	 * only check on filesystems where it's possible for the dentry to
+	 * become stale. It's assumed that if this flag is set then the
+	 * d_revalidate op will also be defined.
+	 */
+	if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
+		return 0;
+
+	status = dentry->d_op->d_revalidate(dentry, nd);
+	if (status > 0)
+		return 0;
+
+	if (!status) {
+		d_invalidate(dentry);
+		status = -ESTALE;
+	}
+	return status;
+}
+
+/*
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
@@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
 		error = 0;
 		if (s)
 			error = __vfs_follow_link(nd, s);
+		else if (nd->last_type == LAST_BIND) {
+			error = force_reval_path(&nd->path, nd);
+			if (error)
+				path_put(&nd->path);
+		}
 		if (dentry->d_inode->i_op->put_link)
 			dentry->d_inode->i_op->put_link(dentry, nd, cookie);
 	}
-- 
1.5.5.6

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-02 19:59 [PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6) Jeff Layton
  2009-12-02 19:59 ` [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks Jeff Layton
@ 2009-12-02 19:59 ` Jeff Layton
  2009-12-03  0:01   ` Eric W. Biederman
  2009-12-03 10:58   ` Miklos Szeredi
  1 sibling, 2 replies; 15+ messages in thread
From: Jeff Layton @ 2009-12-02 19:59 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: jamie, pavel, miklos, viro, duaneg, ebiederm

In the case of a bind mounted file, the path walking code will assume
that the cached dentry that was bind mounted is valid. This is a problem
problem for NFSv4 in a way that's similar to LAST_BIND symlinks.

Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
__follow_mount returns true.

Note that in the non-open codepath, we cannot return an error to the
lookup if the revalidation fails. Doing so will leave a bind mount in
a state such that we can't unmount it. In that case we'll just have to
settle for d_invalidating it (which should mostly turn out to be a
d_drop in this case) and returning success.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/namei.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 339789e..0d55b6f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 done:
 	path->mnt = mnt;
 	path->dentry = dentry;
-	__follow_mount(path);
+
+	/*
+	 * cannot return the error returned by force_reval_path as that can
+	 * make it impossible to unmount a bind mounted dentry if it's stale.
+	 */
+	if (__follow_mount(path))
+		force_reval_path(path, nd);
 	return 0;
 
 need_lookup:
@@ -1840,6 +1846,9 @@ do_last:
 		error = -ELOOP;
 		if (flag & O_NOFOLLOW)
 			goto exit_dput;
+		error = force_reval_path(&path, &nd);
+		if (error)
+			goto exit_dput;
 	}
 
 	error = -ENOENT;
-- 
1.5.5.6


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks
  2009-12-02 19:59 ` [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks Jeff Layton
@ 2009-12-02 23:53   ` Eric W. Biederman
  2009-12-03 10:39   ` Miklos Szeredi
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2009-12-02 23:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg

Jeff Layton <jlayton@redhat.com> writes:

> procfs-style symlinks return a last_type of LAST_BIND without an actual
> path string. This causes __follow_link to skip calling __vfs_follow_link
> and so the dentry isn't revalidated.
>
> This is a problem when the link target sits on NFSv4 as it depends on
> the VFS to revalidate the dentry before using it on an open call. Ensure
> that this occurs by forcing a revalidation of the target dentry of
> LAST_BIND symlinks.

This seems reasonable to me, and in a quick run through I can't see any
problems.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/namei.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 45 insertions(+), 0 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 3374917..339789e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
>  }
>  
>  /*
> + * force_reval_path - force revalidation of a dentry
> + *
> + * In some situations the path walking code will trust dentries without
> + * revalidating them. This causes problems for filesystems that depend on
> + * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
> + * (which indicates that it's possible for the dentry to go stale), force
> + * a d_revalidate call before proceeding.
> + *
> + * Returns 0 if the revalidation was successful. If the revalidation fails,
> + * either return the error returned by d_revalidate or -ESTALE if the
> + * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
> + * invalidate the dentry. It's up to the caller to handle putting references
> + * to the path if necessary.
> + */
> +static int
> +force_reval_path(struct path *path, struct nameidata *nd)
> +{
> +	int status;
> +	struct dentry *dentry = path->dentry;
> +
> +	/*
> +	 * only check on filesystems where it's possible for the dentry to
> +	 * become stale. It's assumed that if this flag is set then the
> +	 * d_revalidate op will also be defined.
> +	 */
> +	if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
> +		return 0;
> +
> +	status = dentry->d_op->d_revalidate(dentry, nd);
> +	if (status > 0)
> +		return 0;
> +
> +	if (!status) {
> +		d_invalidate(dentry);
> +		status = -ESTALE;
> +	}
> +	return status;
> +}
> +
> +/*
>   * Internal lookup() using the new generic dcache.
>   * SMP-safe
>   */
> @@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
>  		error = 0;
>  		if (s)
>  			error = __vfs_follow_link(nd, s);
> +		else if (nd->last_type == LAST_BIND) {
> +			error = force_reval_path(&nd->path, nd);
> +			if (error)
> +				path_put(&nd->path);
> +		}
>  		if (dentry->d_inode->i_op->put_link)
>  			dentry->d_inode->i_op->put_link(dentry, nd, cookie);
>  	}
> -- 
> 1.5.5.6

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-02 19:59 ` [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems Jeff Layton
@ 2009-12-03  0:01   ` Eric W. Biederman
  2009-12-03  1:23     ` Jeff Layton
  2009-12-03 10:58   ` Miklos Szeredi
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2009-12-03  0:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg

Jeff Layton <jlayton@redhat.com> writes:

> In the case of a bind mounted file, the path walking code will assume
> that the cached dentry that was bind mounted is valid. This is a problem
> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>
> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> __follow_mount returns true.
>
> Note that in the non-open codepath, we cannot return an error to the
> lookup if the revalidation fails. Doing so will leave a bind mount in
> a state such that we can't unmount it. In that case we'll just have to
> settle for d_invalidating it (which should mostly turn out to be a
> d_drop in this case) and returning success.

Looks reasonable to me.  I wonder a little if we care about follow_mount
as well as __follow_mount.

This seems reasonable to me.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/namei.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 339789e..0d55b6f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
>  done:
>  	path->mnt = mnt;
>  	path->dentry = dentry;
> -	__follow_mount(path);
> +
> +	/*
> +	 * cannot return the error returned by force_reval_path as that can
> +	 * make it impossible to unmount a bind mounted dentry if it's stale.
> +	 */
> +	if (__follow_mount(path))
> +		force_reval_path(path, nd);
>  	return 0;
>  
>  need_lookup:
> @@ -1840,6 +1846,9 @@ do_last:
>  		error = -ELOOP;
>  		if (flag & O_NOFOLLOW)
>  			goto exit_dput;
> +		error = force_reval_path(&path, &nd);
> +		if (error)
> +			goto exit_dput;
>  	}
>  
>  	error = -ENOENT;
> -- 
> 1.5.5.6

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03  0:01   ` Eric W. Biederman
@ 2009-12-03  1:23     ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2009-12-03  1:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg

On Wed, 02 Dec 2009 16:01:59 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Jeff Layton <jlayton@redhat.com> writes:
> 
> > In the case of a bind mounted file, the path walking code will assume
> > that the cached dentry that was bind mounted is valid. This is a problem
> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> >
> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> > __follow_mount returns true.
> >
> > Note that in the non-open codepath, we cannot return an error to the
> > lookup if the revalidation fails. Doing so will leave a bind mount in
> > a state such that we can't unmount it. In that case we'll just have to
> > settle for d_invalidating it (which should mostly turn out to be a
> > d_drop in this case) and returning success.
> 
> Looks reasonable to me.  I wonder a little if we care about follow_mount
> as well as __follow_mount.
> 

Good question.

It does sort of seem like it. There's also follow_down too...

OTOH, we have a reproducible testcases that the two patches in this
series fix. I'm not aware of anything that's technically "broken" by
the lack of revalidations in the other codepaths so I'm hesitant to add
them in unless and until we do.

> This seems reasonable to me.
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> >
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/namei.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 339789e..0d55b6f 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> >  done:
> >  	path->mnt = mnt;
> >  	path->dentry = dentry;
> > -	__follow_mount(path);
> > +
> > +	/*
> > +	 * cannot return the error returned by force_reval_path as that can
> > +	 * make it impossible to unmount a bind mounted dentry if it's stale.
> > +	 */
> > +	if (__follow_mount(path))
> > +		force_reval_path(path, nd);
> >  	return 0;
> >  
> >  need_lookup:
> > @@ -1840,6 +1846,9 @@ do_last:
> >  		error = -ELOOP;
> >  		if (flag & O_NOFOLLOW)
> >  			goto exit_dput;
> > +		error = force_reval_path(&path, &nd);
> > +		if (error)
> > +			goto exit_dput;
> >  	}
> >  
> >  	error = -ENOENT;
> > -- 
> > 1.5.5.6


-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks
  2009-12-02 19:59 ` [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks Jeff Layton
  2009-12-02 23:53   ` Eric W. Biederman
@ 2009-12-03 10:39   ` Miklos Szeredi
  1 sibling, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2009-12-03 10:39 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg,
	ebiederm

On Wed,  2 Dec 2009, Jeff Layton wrote:
> procfs-style symlinks return a last_type of LAST_BIND without an actual
> path string. This causes __follow_link to skip calling __vfs_follow_link
> and so the dentry isn't revalidated.
> 
> This is a problem when the link target sits on NFSv4 as it depends on
> the VFS to revalidate the dentry before using it on an open call. Ensure
> that this occurs by forcing a revalidation of the target dentry of
> LAST_BIND symlinks.

An added advantage is that the symlinks in /proc will show the
invalidated path as "(deleted)".  Maybe proc_pid_readlink() should
also call force_reval_path() to get an up-to-date state of the path?

Acked-by: Miklos Szeredi <mszeredi@suse.cz>

> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/namei.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 3374917..339789e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
>  }
>  
>  /*
> + * force_reval_path - force revalidation of a dentry
> + *
> + * In some situations the path walking code will trust dentries without
> + * revalidating them. This causes problems for filesystems that depend on
> + * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
> + * (which indicates that it's possible for the dentry to go stale), force
> + * a d_revalidate call before proceeding.
> + *
> + * Returns 0 if the revalidation was successful. If the revalidation fails,
> + * either return the error returned by d_revalidate or -ESTALE if the
> + * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
> + * invalidate the dentry. It's up to the caller to handle putting references
> + * to the path if necessary.
> + */
> +static int
> +force_reval_path(struct path *path, struct nameidata *nd)
> +{
> +	int status;
> +	struct dentry *dentry = path->dentry;
> +
> +	/*
> +	 * only check on filesystems where it's possible for the dentry to
> +	 * become stale. It's assumed that if this flag is set then the
> +	 * d_revalidate op will also be defined.
> +	 */
> +	if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
> +		return 0;
> +
> +	status = dentry->d_op->d_revalidate(dentry, nd);
> +	if (status > 0)
> +		return 0;
> +
> +	if (!status) {
> +		d_invalidate(dentry);
> +		status = -ESTALE;
> +	}
> +	return status;
> +}
> +
> +/*
>   * Internal lookup() using the new generic dcache.
>   * SMP-safe
>   */
> @@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
>  		error = 0;
>  		if (s)
>  			error = __vfs_follow_link(nd, s);
> +		else if (nd->last_type == LAST_BIND) {
> +			error = force_reval_path(&nd->path, nd);
> +			if (error)
> +				path_put(&nd->path);
> +		}
>  		if (dentry->d_inode->i_op->put_link)
>  			dentry->d_inode->i_op->put_link(dentry, nd, cookie);
>  	}
> -- 
> 1.5.5.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-02 19:59 ` [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems Jeff Layton
  2009-12-03  0:01   ` Eric W. Biederman
@ 2009-12-03 10:58   ` Miklos Szeredi
  2009-12-03 11:11     ` Eric W. Biederman
  2009-12-03 15:20     ` Jeff Layton
  1 sibling, 2 replies; 15+ messages in thread
From: Miklos Szeredi @ 2009-12-03 10:58 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg,
	ebiederm

On Wed,  2 Dec 2009, Jeff Layton wrote:
> In the case of a bind mounted file, the path walking code will assume
> that the cached dentry that was bind mounted is valid. This is a problem
> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> 
> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> __follow_mount returns true.
> 
> Note that in the non-open codepath, we cannot return an error to the
> lookup if the revalidation fails. Doing so will leave a bind mount in
> a state such that we can't unmount it. In that case we'll just have to
> settle for d_invalidating it (which should mostly turn out to be a
> d_drop in this case) and returning success.

The only worry I have is that this adds an extra branch in a very hot
codepath (do_lookup).  An error can't be returned, as you note, and
for bind mounted directories d_invalidate() will not succeed: the
directory is busy, it's referenced by the mount.  So basically the
only thing this does is working around the NFSv4 issue.  But Trond has
a proper solution to that, and a temporary solution could be added to
do_filp_open() rather than burdening do_lookup() with it, no?

Thanks,
Miklos

> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/namei.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 339789e..0d55b6f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
>  done:
>  	path->mnt = mnt;
>  	path->dentry = dentry;
> -	__follow_mount(path);
> +
> +	/*
> +	 * cannot return the error returned by force_reval_path as that can
> +	 * make it impossible to unmount a bind mounted dentry if it's stale.
> +	 */
> +	if (__follow_mount(path))
> +		force_reval_path(path, nd);
>  	return 0;
>  
>  need_lookup:
> @@ -1840,6 +1846,9 @@ do_last:
>  		error = -ELOOP;
>  		if (flag & O_NOFOLLOW)
>  			goto exit_dput;
> +		error = force_reval_path(&path, &nd);
> +		if (error)
> +			goto exit_dput;
>  	}
>  
>  	error = -ENOENT;
> -- 
> 1.5.5.6
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 10:58   ` Miklos Szeredi
@ 2009-12-03 11:11     ` Eric W. Biederman
  2009-12-03 11:19       ` Miklos Szeredi
  2009-12-03 15:20     ` Jeff Layton
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2009-12-03 11:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, jamie, pavel, viro,
	duaneg

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Wed,  2 Dec 2009, Jeff Layton wrote:
>> In the case of a bind mounted file, the path walking code will assume
>> that the cached dentry that was bind mounted is valid. This is a problem
>> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>> 
>> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
>> __follow_mount returns true.
>> 
>> Note that in the non-open codepath, we cannot return an error to the
>> lookup if the revalidation fails. Doing so will leave a bind mount in
>> a state such that we can't unmount it. In that case we'll just have to
>> settle for d_invalidating it (which should mostly turn out to be a
>> d_drop in this case) and returning success.
>
> The only worry I have is that this adds an extra branch in a very hot
> codepath (do_lookup).  An error can't be returned, as you note, and
> for bind mounted directories d_invalidate() will not succeed: the
> directory is busy, it's referenced by the mount. 

Not true.  d_mountpoint is false, so d_invalidate can succeed.

> So basically the
> only thing this does is working around the NFSv4 issue. 

No, this should catch other cases where we have a dentry goes
stale as well, and lets the distributed filesystem handle it.

It is probably worth a benchmark to ease the concerns about the hotpath.
I expect the cpu will predict the branch as unlikely and we won't see
any difference.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 11:11     ` Eric W. Biederman
@ 2009-12-03 11:19       ` Miklos Szeredi
  2009-12-03 11:56         ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2009-12-03 11:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: miklos, jlayton, linux-fsdevel, linux-kernel, jamie, pavel, viro,
	duaneg

On Thu, 03 Dec 2009, Eric W. Biederman wrote:
> > The only worry I have is that this adds an extra branch in a very hot
> > codepath (do_lookup).  An error can't be returned, as you note, and
> > for bind mounted directories d_invalidate() will not succeed: the
> > directory is busy, it's referenced by the mount. 
> 
> Not true.  d_mountpoint is false, so d_invalidate can succeed.

Have a look at the code.  d_invalidate() doesn't check for a
mountpoint, it checks the refcount.  It needs to keep the directory
dentry hashed if it's in any way reachable other than from the cache
(file descriptor, cwd, mount, etc).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 11:19       ` Miklos Szeredi
@ 2009-12-03 11:56         ` Eric W. Biederman
  2009-12-03 15:21           ` Miklos Szeredi
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2009-12-03 11:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jlayton, linux-fsdevel, linux-kernel, jamie, pavel, viro, duaneg

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Thu, 03 Dec 2009, Eric W. Biederman wrote:
>> > The only worry I have is that this adds an extra branch in a very hot
>> > codepath (do_lookup).  An error can't be returned, as you note, and
>> > for bind mounted directories d_invalidate() will not succeed: the
>> > directory is busy, it's referenced by the mount. 
>> 
>> Not true.  d_mountpoint is false, so d_invalidate can succeed.
>
> Have a look at the code.  d_invalidate() doesn't check for a
> mountpoint, it checks the refcount.  It needs to keep the directory
> dentry hashed if it's in any way reachable other than from the cache
> (file descriptor, cwd, mount, etc).

Ah.  I thought you were thinking about the mandatory have_submounts()
check in dentry->d_op->d_revalidate().

I expect the generic d_invalidate will simply hit the:
	spin_lock(&dcache_lock);
	if (d_unhashed(dentry)) {
		spin_unlock(&dcache_lock);
		return 0;
	}

After the distributed filesystem has called d_drop in
dentry->d_op->d_revalidate (when appropriate.

Eric




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 10:58   ` Miklos Szeredi
  2009-12-03 11:11     ` Eric W. Biederman
@ 2009-12-03 15:20     ` Jeff Layton
  2009-12-03 18:35       ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2009-12-03 15:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, jamie, pavel, miklos, viro, duaneg,
	ebiederm, Trond Myklebust

On Thu, 03 Dec 2009 11:58:43 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Wed,  2 Dec 2009, Jeff Layton wrote:
> > In the case of a bind mounted file, the path walking code will assume
> > that the cached dentry that was bind mounted is valid. This is a problem
> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> > 
> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> > __follow_mount returns true.
> > 
> > Note that in the non-open codepath, we cannot return an error to the
> > lookup if the revalidation fails. Doing so will leave a bind mount in
> > a state such that we can't unmount it. In that case we'll just have to
> > settle for d_invalidating it (which should mostly turn out to be a
> > d_drop in this case) and returning success.
> 
> The only worry I have is that this adds an extra branch in a very hot
> codepath (do_lookup).  An error can't be returned, as you note, and
> for bind mounted directories d_invalidate() will not succeed: the
> directory is busy, it's referenced by the mount.  So basically the
> only thing this does is working around the NFSv4 issue.  But Trond has
> a proper solution to that, and a temporary solution could be added to
> do_filp_open() rather than burdening do_lookup() with it, no?
> 

(re-adding Trond. I forgot to cc him on this latest set)

Self-NAK on this patch...

That's my main worry too, and sadly it doesn't seem to be unfounded.
This patch adds a lot of extra d_revalidate calls here. I think it's
going to be too expensive to do this.

Unfortunately, adding the code to do_filp_open alone doesn't really
help. The path walking code in there is just for the O_CREAT case. If
we're not creating the file, we call into path_lookup_open which
eventually calls do_lookup.

What we probably need to do is only make it d_revalidate when it looks
like the final dentry is bind mounted. I'm not sure how to tell that
though and even if I could, I'm still leery of adding this to such a
hot codepath.

The only problem I've identified that this fixes is with file bind
mounts and I don't get the impression they're that common. Maybe the
best thing is to just fix the LAST_BIND symlink case for now and wait
for Trond or Al's overhaul of this code.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 11:56         ` Eric W. Biederman
@ 2009-12-03 15:21           ` Miklos Szeredi
  0 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2009-12-03 15:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: miklos, jlayton, linux-fsdevel, linux-kernel, jamie, pavel, viro,
	duaneg

On Thu, 03 Dec 2009, Eric W. Biederman wrote:
> Ah.  I thought you were thinking about the mandatory have_submounts()
> check in dentry->d_op->d_revalidate().
> 
> I expect the generic d_invalidate will simply hit the:
> 	spin_lock(&dcache_lock);
> 	if (d_unhashed(dentry)) {
> 		spin_unlock(&dcache_lock);
> 		return 0;
> 	}
> 
> After the distributed filesystem has called d_drop in
> dentry->d_op->d_revalidate (when appropriate.

Ah, right, NFS's d_revalidate does do d_drop().  Okay, I have no
problem with the patch, as long as path lookup performance doesn't
show a regression.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 15:20     ` Jeff Layton
@ 2009-12-03 18:35       ` Eric W. Biederman
  2009-12-03 19:15         ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2009-12-03 18:35 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, jamie, pavel, viro,
	duaneg, Trond Myklebust

Jeff Layton <jlayton@redhat.com> writes:

> On Thu, 03 Dec 2009 11:58:43 +0100
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> On Wed,  2 Dec 2009, Jeff Layton wrote:
>> > In the case of a bind mounted file, the path walking code will assume
>> > that the cached dentry that was bind mounted is valid. This is a problem
>> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>> > 
>> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
>> > __follow_mount returns true.
>> > 
>> > Note that in the non-open codepath, we cannot return an error to the
>> > lookup if the revalidation fails. Doing so will leave a bind mount in
>> > a state such that we can't unmount it. In that case we'll just have to
>> > settle for d_invalidating it (which should mostly turn out to be a
>> > d_drop in this case) and returning success.
>> 
>> The only worry I have is that this adds an extra branch in a very hot
>> codepath (do_lookup).  An error can't be returned, as you note, and
>> for bind mounted directories d_invalidate() will not succeed: the
>> directory is busy, it's referenced by the mount.  So basically the
>> only thing this does is working around the NFSv4 issue.  But Trond has
>> a proper solution to that, and a temporary solution could be added to
>> do_filp_open() rather than burdening do_lookup() with it, no?
>> 
>
> (re-adding Trond. I forgot to cc him on this latest set)
>
> Self-NAK on this patch...
>
> That's my main worry too, and sadly it doesn't seem to be unfounded.
> This patch adds a lot of extra d_revalidate calls here. I think it's
> going to be too expensive to do this.

How so?  We should only see extra calls if we follow a mount point.
Currently we call d_revalidate on every path component.

> The only problem I've identified that this fixes is with file bind
> mounts and I don't get the impression they're that common. Maybe the
> best thing is to just fix the LAST_BIND symlink case for now and wait
> for Trond or Al's overhaul of this code.

Well right now following mount points breaks the VFS contract that we
will revalidate all dentries before we use them.  That breaking of the
contract breaks NFS.

I don't know what else d_revalidate is good for.  On the sysfs side
I only use it to unhash the dentry.  Something we don't care about
from the do_lookup side of things if we have a bind mount.

I'm not clear what kind of changes revalidating a deleted but open
file will give you on NFS.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems
  2009-12-03 18:35       ` Eric W. Biederman
@ 2009-12-03 19:15         ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2009-12-03 19:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, jamie, pavel, viro,
	duaneg, Trond Myklebust

On Thu, 03 Dec 2009 10:35:21 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Jeff Layton <jlayton@redhat.com> writes:
> 
> > On Thu, 03 Dec 2009 11:58:43 +0100
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> >> On Wed,  2 Dec 2009, Jeff Layton wrote:
> >> > In the case of a bind mounted file, the path walking code will assume
> >> > that the cached dentry that was bind mounted is valid. This is a problem
> >> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> >> > 
> >> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> >> > __follow_mount returns true.
> >> > 
> >> > Note that in the non-open codepath, we cannot return an error to the
> >> > lookup if the revalidation fails. Doing so will leave a bind mount in
> >> > a state such that we can't unmount it. In that case we'll just have to
> >> > settle for d_invalidating it (which should mostly turn out to be a
> >> > d_drop in this case) and returning success.
> >> 
> >> The only worry I have is that this adds an extra branch in a very hot
> >> codepath (do_lookup).  An error can't be returned, as you note, and
> >> for bind mounted directories d_invalidate() will not succeed: the
> >> directory is busy, it's referenced by the mount.  So basically the
> >> only thing this does is working around the NFSv4 issue.  But Trond has
> >> a proper solution to that, and a temporary solution could be added to
> >> do_filp_open() rather than burdening do_lookup() with it, no?
> >> 
> >
> > (re-adding Trond. I forgot to cc him on this latest set)
> >
> > Self-NAK on this patch...
> >
> > That's my main worry too, and sadly it doesn't seem to be unfounded.
> > This patch adds a lot of extra d_revalidate calls here. I think it's
> > going to be too expensive to do this.
> 
> How so?  We should only see extra calls if we follow a mount point.
> Currently we call d_revalidate on every path component.
> 
> > The only problem I've identified that this fixes is with file bind
> > mounts and I don't get the impression they're that common. Maybe the
> > best thing is to just fix the LAST_BIND symlink case for now and wait
> > for Trond or Al's overhaul of this code.
> 
> Well right now following mount points breaks the VFS contract that we
> will revalidate all dentries before we use them.  That breaking of the
> contract breaks NFS.
> 
> I don't know what else d_revalidate is good for.  On the sysfs side
> I only use it to unhash the dentry.  Something we don't care about
> from the do_lookup side of things if we have a bind mount.
> 
> I'm not clear what kind of changes revalidating a deleted but open
> file will give you on NFS.
> 

My concern here is based mainly on a simple experiment I did to fire off
a printk and do a dump_stack every time we call d_revalidate from
force_reval_path. Even a very small set of operations caused that to
get called many, many times, mostly from do_lookup.

You're correct that the current behavior breaks the d_revalidate
"contract". What I'm not sure of is whether that truly breaks anything
beyond file bind mounts.

If it doesn't then we have to ask ourselves -- is it worth a potential
performance hit in a very hot codepath to fix bind mounted files that
live on NFSv4? My current thinking is "no".

Much of that decision is based upon an assumption that file bind mounts
are rarely ever used. If that's the case, then it's probably more
prudent to wait until the VFS has been fixed so that NFSv4 no longer
needs to depend on d_revalidate this way.

The LAST_BIND symlink case is a little more straightforward since the
fix is more targeted. I think that should be reasonable for 2.6.33.

Cheers,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-12-03 19:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-02 19:59 [PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6) Jeff Layton
2009-12-02 19:59 ` [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks Jeff Layton
2009-12-02 23:53   ` Eric W. Biederman
2009-12-03 10:39   ` Miklos Szeredi
2009-12-02 19:59 ` [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems Jeff Layton
2009-12-03  0:01   ` Eric W. Biederman
2009-12-03  1:23     ` Jeff Layton
2009-12-03 10:58   ` Miklos Szeredi
2009-12-03 11:11     ` Eric W. Biederman
2009-12-03 11:19       ` Miklos Szeredi
2009-12-03 11:56         ` Eric W. Biederman
2009-12-03 15:21           ` Miklos Szeredi
2009-12-03 15:20     ` Jeff Layton
2009-12-03 18:35       ` Eric W. Biederman
2009-12-03 19:15         ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).