linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] VFS: minor improvements to a couple of interfaces
@ 2025-02-07  3:36 NeilBrown
  2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  3:36 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro, Jan Kara
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon,
	Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore,
	Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel,
	linux-nfs, linux-cifs, audit

Hi,
 I found these opportunities for simplification as part of my work to
 enhance filesystem directory operations to not require an exclusive
 lock on the directory.
 There are quite a collection of users of these interfaces incluing NFS,
 smb/server, bcachefs, devtmpfs, and audit.  Hence the long Cc line.

NeilBrown

 [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at()
 [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown
@ 2025-02-07  3:36 ` NeilBrown
  2025-02-07  3:46   ` Kent Overstreet
  2025-02-07 19:09   ` Paul Moore
  2025-02-07  3:36 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
  2025-02-10  8:25 ` [PATCH 0/2] VFS: minor improvements to a couple of interfaces Christian Brauner
  2 siblings, 2 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  3:36 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro, Jan Kara
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon,
	Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore,
	Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel,
	linux-nfs, linux-cifs, audit

No callers of kern_path_locked() or user_path_locked_at() want a
negative dentry.  So change them to return -ENOENT instead.  This
simplifies callers.

This results in a subtle change to bcachefs in that an ioctl will now
return -ENOENT in preference to -EXDEV.  I believe this restores the
behaviour to what it was prior to
 Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/base/devtmpfs.c | 65 +++++++++++++++++++----------------------
 fs/bcachefs/fs-ioctl.c  |  4 ---
 fs/namei.c              |  4 +++
 kernel/audit_watch.c    | 12 ++++----
 4 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index b848764ef018..c9e34842139f 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -245,15 +245,12 @@ static int dev_rmdir(const char *name)
 	dentry = kern_path_locked(name, &parent);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
-	if (d_really_is_positive(dentry)) {
-		if (d_inode(dentry)->i_private == &thread)
-			err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
-					dentry);
-		else
-			err = -EPERM;
-	} else {
-		err = -ENOENT;
-	}
+	if (d_inode(dentry)->i_private == &thread)
+		err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
+				dentry);
+	else
+		err = -EPERM;
+
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
 	path_put(&parent);
@@ -310,6 +307,8 @@ static int handle_remove(const char *nodename, struct device *dev)
 {
 	struct path parent;
 	struct dentry *dentry;
+	struct kstat stat;
+	struct path p;
 	int deleted = 0;
 	int err;
 
@@ -317,32 +316,28 @@ static int handle_remove(const char *nodename, struct device *dev)
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	if (d_really_is_positive(dentry)) {
-		struct kstat stat;
-		struct path p = {.mnt = parent.mnt, .dentry = dentry};
-		err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
-				  AT_STATX_SYNC_AS_STAT);
-		if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
-			struct iattr newattrs;
-			/*
-			 * before unlinking this node, reset permissions
-			 * of possible references like hardlinks
-			 */
-			newattrs.ia_uid = GLOBAL_ROOT_UID;
-			newattrs.ia_gid = GLOBAL_ROOT_GID;
-			newattrs.ia_mode = stat.mode & ~0777;
-			newattrs.ia_valid =
-				ATTR_UID|ATTR_GID|ATTR_MODE;
-			inode_lock(d_inode(dentry));
-			notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
-			inode_unlock(d_inode(dentry));
-			err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
-					 dentry, NULL);
-			if (!err || err == -ENOENT)
-				deleted = 1;
-		}
-	} else {
-		err = -ENOENT;
+	p.mnt = parent.mnt;
+	p.dentry = dentry;
+	err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
+			  AT_STATX_SYNC_AS_STAT);
+	if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
+		struct iattr newattrs;
+		/*
+		 * before unlinking this node, reset permissions
+		 * of possible references like hardlinks
+		 */
+		newattrs.ia_uid = GLOBAL_ROOT_UID;
+		newattrs.ia_gid = GLOBAL_ROOT_GID;
+		newattrs.ia_mode = stat.mode & ~0777;
+		newattrs.ia_valid =
+			ATTR_UID|ATTR_GID|ATTR_MODE;
+		inode_lock(d_inode(dentry));
+		notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
+		inode_unlock(d_inode(dentry));
+		err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
+				 dentry, NULL);
+		if (!err || err == -ENOENT)
+			deleted = 1;
 	}
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 15725b4ce393..595b57fabc9a 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
 		ret = -EXDEV;
 		goto err;
 	}
-	if (!d_is_positive(victim)) {
-		ret = -ENOENT;
-		goto err;
-	}
 	ret = __bch2_unlink(dir, victim, true);
 	if (!ret) {
 		fsnotify_rmdir(dir, victim);
diff --git a/fs/namei.c b/fs/namei.c
index 3a4039acdb3f..e3047db7b2b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2736,6 +2736,10 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	d = lookup_one_qstr_excl(&last, path->dentry, 0);
+	if (!IS_ERR(d) && d_is_negative(d)) {
+		dput(d);
+		d = ERR_PTR(-ENOENT);
+	}
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 7f358740e958..e3130675ee6b 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
 	struct dentry *d = kern_path_locked(watch->path, parent);
 	if (IS_ERR(d))
 		return PTR_ERR(d);
-	if (d_is_positive(d)) {
-		/* update watch filter fields */
-		watch->dev = d->d_sb->s_dev;
-		watch->ino = d_backing_inode(d)->i_ino;
-	}
+	/* update watch filter fields */
+	watch->dev = d->d_sb->s_dev;
+	watch->ino = d_backing_inode(d)->i_ino;
+
 	inode_unlock(d_backing_inode(parent->dentry));
 	dput(d);
 	return 0;
@@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 	/* caller expects mutex locked */
 	mutex_lock(&audit_filter_mutex);
 
-	if (ret) {
+	if (ret && ret != -ENOENT) {
 		audit_put_watch(watch);
 		return ret;
 	}
@@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 
 	h = audit_hash_ino((u32)watch->ino);
 	*list = &audit_inode_hash[h];
+	ret = 0;
 error:
 	path_put(&parent_path);
 	audit_put_watch(watch);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-07  3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown
  2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
@ 2025-02-07  3:36 ` NeilBrown
  2025-02-12  2:50   ` kernel test robot
  2025-02-12  3:16   ` Al Viro
  2025-02-10  8:25 ` [PATCH 0/2] VFS: minor improvements to a couple of interfaces Christian Brauner
  2 siblings, 2 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  3:36 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro, Jan Kara
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon,
	Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore,
	Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel,
	linux-nfs, linux-cifs, audit

Callers of lookup_one_qstr_excl() often check if the result is negative or
positive.
These changes can easily be moved into lookup_one_qstr_excl() by checking the
lookup flags:
LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
LOOKUP_EXCL means it IS an error if the name DOES exist.

This patch adds these checks, then removes error checks from callers,
and ensures that appropriate flags are passed.

This subtly changes the meaning of LOOKUP_EXCL.  Previously it could
only accompany LOOKUP_CREATE.  Now it can accompany LOOKUP_RENAME_TARGET
as well.  A couple of small changes are needed to accommodate this.  The
NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
exactly what the name says.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c          | 59 +++++++++++++++------------------------------
 fs/nfs/dir.c        |  3 ++-
 fs/smb/server/vfs.c | 26 ++++++++------------
 3 files changed, 31 insertions(+), 57 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e3047db7b2b4..e0527f4b0586 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1665,6 +1665,8 @@ static struct dentry *lookup_dcache(const struct qstr *name,
  * dentries - as the matter of fact, this only gets called
  * when directory is guaranteed to have no in-lookup children
  * at all.
+ * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
+ * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
  */
 struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 				    struct dentry *base,
@@ -1675,7 +1677,7 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 	struct inode *dir = base->d_inode;
 
 	if (dentry)
-		return dentry;
+		goto found;
 
 	/* Don't create child dentry for a dead directory. */
 	if (unlikely(IS_DEADDIR(dir)))
@@ -1690,6 +1692,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 		dput(dentry);
 		dentry = old;
 	}
+found:
+	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
+		dput(dentry);
+		return ERR_PTR(-ENOENT);
+	}
+	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
+		dput(dentry);
+		return ERR_PTR(-EEXIST);
+	}
 	return dentry;
 }
 EXPORT_SYMBOL(lookup_one_qstr_excl);
@@ -2736,10 +2747,6 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	d = lookup_one_qstr_excl(&last, path->dentry, 0);
-	if (!IS_ERR(d) && d_is_negative(d)) {
-		dput(d);
-		d = ERR_PTR(-ENOENT);
-	}
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
@@ -4077,27 +4084,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	 * '/', and a directory wasn't requested.
 	 */
 	if (last.name[last.len] && !want_dir)
-		create_flags = 0;
+		create_flags &= ~LOOKUP_CREATE;
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	dentry = lookup_one_qstr_excl(&last, path->dentry,
 				      reval_flag | create_flags);
 	if (IS_ERR(dentry))
 		goto unlock;
 
-	error = -EEXIST;
-	if (d_is_positive(dentry))
-		goto fail;
-
-	/*
-	 * Special case - lookup gave negative, but... we had foo/bar/
-	 * From the vfs_mknod() POV we just have a negative dentry -
-	 * all is fine. Let's be bastards - you had / on the end, you've
-	 * been asking for (non-existent) directory. -ENOENT for you.
-	 */
-	if (unlikely(!create_flags)) {
-		error = -ENOENT;
-		goto fail;
-	}
 	if (unlikely(err2)) {
 		error = err2;
 		goto fail;
@@ -4444,10 +4437,6 @@ int do_rmdir(int dfd, struct filename *name)
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit3;
-	if (!dentry->d_inode) {
-		error = -ENOENT;
-		goto exit4;
-	}
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit4;
@@ -4578,7 +4567,7 @@ int do_unlinkat(int dfd, struct filename *name)
 	if (!IS_ERR(dentry)) {
 
 		/* Why not before? Because we want correct error value */
-		if (last.name[last.len] || d_is_negative(dentry))
+		if (last.name[last.len])
 			goto slashes;
 		inode = dentry->d_inode;
 		ihold(inode);
@@ -4612,9 +4601,7 @@ int do_unlinkat(int dfd, struct filename *name)
 	return error;
 
 slashes:
-	if (d_is_negative(dentry))
-		error = -ENOENT;
-	else if (d_is_dir(dentry))
+	if (d_is_dir(dentry))
 		error = -EISDIR;
 	else
 		error = -ENOTDIR;
@@ -5114,7 +5101,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	struct qstr old_last, new_last;
 	int old_type, new_type;
 	struct inode *delegated_inode = NULL;
-	unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
+	unsigned int lookup_flags = 0, target_flags =
+		LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
 	bool should_retry = false;
 	int error = -EINVAL;
 
@@ -5127,6 +5115,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 
 	if (flags & RENAME_EXCHANGE)
 		target_flags = 0;
+	if (flags & RENAME_NOREPLACE)
+		target_flags |= LOOKUP_EXCL;
 
 retry:
 	error = filename_parentat(olddfd, from, lookup_flags, &old_path,
@@ -5168,23 +5158,12 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	error = PTR_ERR(old_dentry);
 	if (IS_ERR(old_dentry))
 		goto exit3;
-	/* source must exist */
-	error = -ENOENT;
-	if (d_is_negative(old_dentry))
-		goto exit4;
 	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
 					  lookup_flags | target_flags);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto exit4;
-	error = -EEXIST;
-	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
-		goto exit5;
 	if (flags & RENAME_EXCHANGE) {
-		error = -ENOENT;
-		if (d_is_negative(new_dentry))
-			goto exit5;
-
 		if (!d_is_dir(new_dentry)) {
 			error = -ENOTDIR;
 			if (new_last.name[new_last.len])
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2b04038b0e40..56cf16a72334 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1532,7 +1532,8 @@ static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
 {
 	if (NFS_PROTO(dir)->version == 2)
 		return 0;
-	return flags & LOOKUP_EXCL;
+	return (flags & (LOOKUP_CREATE | LOOKUP_EXCL)) ==
+		(LOOKUP_CREATE | LOOKUP_EXCL);
 }
 
 /*
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 6890016e1923..fe29acef5872 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -113,11 +113,6 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf,
 	if (IS_ERR(d))
 		goto err_out;
 
-	if (d_is_negative(d)) {
-		dput(d);
-		goto err_out;
-	}
-
 	path->dentry = d;
 	path->mnt = mntget(parent_path->mnt);
 
@@ -693,6 +688,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 	struct ksmbd_file *parent_fp;
 	int new_type;
 	int err, lookup_flags = LOOKUP_NO_SYMLINKS;
+	int target_lookup_flags = LOOKUP_RENAME_TARGET;
 
 	if (ksmbd_override_fsids(work))
 		return -ENOMEM;
@@ -703,6 +699,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 		goto revert_fsids;
 	}
 
+	/*
+	 * explicitly handle file overwrite case, for compatibility with
+	 * filesystems that may not support rename flags (e.g: fuse)
+	 */
+	if (flags & RENAME_NOREPLACE)
+		target_lookup_flags |= LOOKUP_EXCL;
+	flags &= ~(RENAME_NOREPLACE);
+
 retry:
 	err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
 				     &new_path, &new_last, &new_type,
@@ -743,7 +747,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 	}
 
 	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
-					  lookup_flags | LOOKUP_RENAME_TARGET);
+					  lookup_flags | target_lookup_flags);
 	if (IS_ERR(new_dentry)) {
 		err = PTR_ERR(new_dentry);
 		goto out3;
@@ -754,16 +758,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 		goto out4;
 	}
 
-	/*
-	 * explicitly handle file overwrite case, for compatibility with
-	 * filesystems that may not support rename flags (e.g: fuse)
-	 */
-	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) {
-		err = -EEXIST;
-		goto out4;
-	}
-	flags &= ~(RENAME_NOREPLACE);
-
 	if (old_child == trap) {
 		err = -EINVAL;
 		goto out4;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
@ 2025-02-07  3:46   ` Kent Overstreet
  2025-02-07  4:53     ` NeilBrown
  2025-02-07 19:09   ` Paul Moore
  1 sibling, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2025-02-07  3:46 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> No callers of kern_path_locked() or user_path_locked_at() want a
> negative dentry.  So change them to return -ENOENT instead.  This
> simplifies callers.
> 
> This results in a subtle change to bcachefs in that an ioctl will now
> return -ENOENT in preference to -EXDEV.  I believe this restores the
> behaviour to what it was prior to

I'm not following how the code change matches the commit message?

>  Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index 15725b4ce393..595b57fabc9a 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
>  		ret = -EXDEV;
>  		goto err;
>  	}
> -	if (!d_is_positive(victim)) {
> -		ret = -ENOENT;
> -		goto err;
> -	}
>  	ret = __bch2_unlink(dir, victim, true);
>  	if (!ret) {
>  		fsnotify_rmdir(dir, victim);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  3:46   ` Kent Overstreet
@ 2025-02-07  4:53     ` NeilBrown
  2025-02-07  6:13       ` Kent Overstreet
  2025-02-07  6:53       ` [PATCH 1/2] " NeilBrown
  0 siblings, 2 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  4:53 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > No callers of kern_path_locked() or user_path_locked_at() want a
> > negative dentry.  So change them to return -ENOENT instead.  This
> > simplifies callers.
> > 
> > This results in a subtle change to bcachefs in that an ioctl will now
> > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > behaviour to what it was prior to
> 
> I'm not following how the code change matches the commit message?

Maybe it doesn't.  Let me checked.

Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
-EXDEV.

-ENOENT is returned if the path named in arg.dst_ptr cannot be found.
-EXDEV is returned if the filesystem on which that path exists is not
 the one that the ioctl is called on.

If the target filesystem is "/foo" and the path given is "/bar/baz" and
/bar exists but /bar/baz does not, then user_path_locked_at or
user_path_at will return a negative dentry corresponding to the
(non-existent) name "baz" in /bar.

In this case the dentry exists so the filesystem on which it was found
can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
are credible return values.


- before bbe6a7c899e7 the -EXDEV is tested immediately after the call
  to user_path_att() so there is no chance that ENOENT will be returned.
  I cannot actually find where ENOENT could be returned ...  but that
  doesn't really matter now.

- after that patch .... again the -EXDEV test comes first. That isn't
  what I remember.  I must have misread it.

- after my patch user_path_locked_at() will return -ENOENT if the whole
  name cannot be found.  So now you get -ENOENT instead of -EXDEV.

So with my patch, ENOENT always wins, and it was never like that before.
Thanks for challenging me!

Do you think there could be a problem with changing the error returned
in this circumstance? i.e. if you try to destroy a subvolume with a
non-existant name on a different filesystem could getting -ENOENT
instead of -EXDEV be noticed?

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  4:53     ` NeilBrown
@ 2025-02-07  6:13       ` Kent Overstreet
  2025-02-07  6:34         ` NeilBrown
  2025-02-07  6:51         ` [PATCH 1/2 v2] " NeilBrown
  2025-02-07  6:53       ` [PATCH 1/2] " NeilBrown
  1 sibling, 2 replies; 27+ messages in thread
From: Kent Overstreet @ 2025-02-07  6:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > negative dentry.  So change them to return -ENOENT instead.  This
> > > simplifies callers.
> > > 
> > > This results in a subtle change to bcachefs in that an ioctl will now
> > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > behaviour to what it was prior to
> > 
> > I'm not following how the code change matches the commit message?
> 
> Maybe it doesn't.  Let me checked.
> 
> Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> -EXDEV.
> 
> -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> -EXDEV is returned if the filesystem on which that path exists is not
>  the one that the ioctl is called on.
> 
> If the target filesystem is "/foo" and the path given is "/bar/baz" and
> /bar exists but /bar/baz does not, then user_path_locked_at or
> user_path_at will return a negative dentry corresponding to the
> (non-existent) name "baz" in /bar.
> 
> In this case the dentry exists so the filesystem on which it was found
> can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> are credible return values.
> 
> 
> - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
>   to user_path_att() so there is no chance that ENOENT will be returned.
>   I cannot actually find where ENOENT could be returned ...  but that
>   doesn't really matter now.
> 
> - after that patch .... again the -EXDEV test comes first. That isn't
>   what I remember.  I must have misread it.
> 
> - after my patch user_path_locked_at() will return -ENOENT if the whole
>   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> 
> So with my patch, ENOENT always wins, and it was never like that before.
> Thanks for challenging me!

How do you always manage to be unfailingly polite? :)

> 
> Do you think there could be a problem with changing the error returned
> in this circumstance? i.e. if you try to destroy a subvolume with a
> non-existant name on a different filesystem could getting -ENOENT
> instead of -EXDEV be noticed?

-EXDEV is the standard error code for "we're crossing a filesystem
boundary and we can't or aren't supposed to be", so no, let's not change
that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  6:13       ` Kent Overstreet
@ 2025-02-07  6:34         ` NeilBrown
  2025-02-07  6:51           ` Kent Overstreet
  2025-02-07  6:51         ` [PATCH 1/2 v2] " NeilBrown
  1 sibling, 1 reply; 27+ messages in thread
From: NeilBrown @ 2025-02-07  6:34 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > > negative dentry.  So change them to return -ENOENT instead.  This
> > > > simplifies callers.
> > > > 
> > > > This results in a subtle change to bcachefs in that an ioctl will now
> > > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > > behaviour to what it was prior to
> > > 
> > > I'm not following how the code change matches the commit message?
> > 
> > Maybe it doesn't.  Let me checked.
> > 
> > Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> > which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> > -EXDEV.
> > 
> > -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> > -EXDEV is returned if the filesystem on which that path exists is not
> >  the one that the ioctl is called on.
> > 
> > If the target filesystem is "/foo" and the path given is "/bar/baz" and
> > /bar exists but /bar/baz does not, then user_path_locked_at or
> > user_path_at will return a negative dentry corresponding to the
> > (non-existent) name "baz" in /bar.
> > 
> > In this case the dentry exists so the filesystem on which it was found
> > can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> > are credible return values.
> > 
> > 
> > - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
> >   to user_path_att() so there is no chance that ENOENT will be returned.
> >   I cannot actually find where ENOENT could be returned ...  but that
> >   doesn't really matter now.
> > 
> > - after that patch .... again the -EXDEV test comes first. That isn't
> >   what I remember.  I must have misread it.
> > 
> > - after my patch user_path_locked_at() will return -ENOENT if the whole
> >   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> > 
> > So with my patch, ENOENT always wins, and it was never like that before.
> > Thanks for challenging me!
> 
> How do you always manage to be unfailingly polite? :)

My dad impressed the value of politeness on me at an early age.  It is
an effective way of manipulating people!

> 
> > 
> > Do you think there could be a problem with changing the error returned
> > in this circumstance? i.e. if you try to destroy a subvolume with a
> > non-existant name on a different filesystem could getting -ENOENT
> > instead of -EXDEV be noticed?
> 
> -EXDEV is the standard error code for "we're crossing a filesystem
> boundary and we can't or aren't supposed to be", so no, let's not change
> that.
> 

OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
be too hard.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  6:34         ` NeilBrown
@ 2025-02-07  6:51           ` Kent Overstreet
  2025-02-07  7:30             ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2025-02-07  6:51 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > Do you think there could be a problem with changing the error returned
> > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > non-existant name on a different filesystem could getting -ENOENT
> > > instead of -EXDEV be noticed?
> > 
> > -EXDEV is the standard error code for "we're crossing a filesystem
> > boundary and we can't or aren't supposed to be", so no, let's not change
> > that.
> > 
> 
> OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> be too hard.

Hang on, why does that require keeping user_path_locked_at()? Just
compare i_sb...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2 v2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  6:13       ` Kent Overstreet
  2025-02-07  6:34         ` NeilBrown
@ 2025-02-07  6:51         ` NeilBrown
  1 sibling, 0 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  6:51 UTC (permalink / raw)
  To: Kent Overstreet, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-fsdevel


No callers of kern_path_locked() want a negative dentry.  So change it
to return -ENOENT instead.  This simplifies callers.

user_path_locked() needs to still return a negative dentry.  The only
caller of user_path_locked() wants a negative dentry so it can give the
correct -EXDEV error when given a path on a different filesystem even
when the final component of the path doesn't exist.

Signed-off-by: NeilBrown <neilb@suse.de>
---

This is an alternate version of 1/2 which doesn't change
user_path_locked_at() or bcachefs.  I'm only sending it to VFS emails
and Kent.
Please let me know if it is OK but you would rather I resent the whole
series. 

Thanks,
NeilBrown


 drivers/base/devtmpfs.c | 65 +++++++++++++++++++----------------------
 fs/namei.c              | 13 +++++++--
 kernel/audit_watch.c    | 12 ++++----
 3 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index b848764ef018..c9e34842139f 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -245,15 +245,12 @@ static int dev_rmdir(const char *name)
 	dentry = kern_path_locked(name, &parent);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
-	if (d_really_is_positive(dentry)) {
-		if (d_inode(dentry)->i_private == &thread)
-			err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
-					dentry);
-		else
-			err = -EPERM;
-	} else {
-		err = -ENOENT;
-	}
+	if (d_inode(dentry)->i_private == &thread)
+		err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
+				dentry);
+	else
+		err = -EPERM;
+
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
 	path_put(&parent);
@@ -310,6 +307,8 @@ static int handle_remove(const char *nodename, struct device *dev)
 {
 	struct path parent;
 	struct dentry *dentry;
+	struct kstat stat;
+	struct path p;
 	int deleted = 0;
 	int err;
 
@@ -317,32 +316,28 @@ static int handle_remove(const char *nodename, struct device *dev)
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	if (d_really_is_positive(dentry)) {
-		struct kstat stat;
-		struct path p = {.mnt = parent.mnt, .dentry = dentry};
-		err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
-				  AT_STATX_SYNC_AS_STAT);
-		if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
-			struct iattr newattrs;
-			/*
-			 * before unlinking this node, reset permissions
-			 * of possible references like hardlinks
-			 */
-			newattrs.ia_uid = GLOBAL_ROOT_UID;
-			newattrs.ia_gid = GLOBAL_ROOT_GID;
-			newattrs.ia_mode = stat.mode & ~0777;
-			newattrs.ia_valid =
-				ATTR_UID|ATTR_GID|ATTR_MODE;
-			inode_lock(d_inode(dentry));
-			notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
-			inode_unlock(d_inode(dentry));
-			err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
-					 dentry, NULL);
-			if (!err || err == -ENOENT)
-				deleted = 1;
-		}
-	} else {
-		err = -ENOENT;
+	p.mnt = parent.mnt;
+	p.dentry = dentry;
+	err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
+			  AT_STATX_SYNC_AS_STAT);
+	if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
+		struct iattr newattrs;
+		/*
+		 * before unlinking this node, reset permissions
+		 * of possible references like hardlinks
+		 */
+		newattrs.ia_uid = GLOBAL_ROOT_UID;
+		newattrs.ia_gid = GLOBAL_ROOT_GID;
+		newattrs.ia_mode = stat.mode & ~0777;
+		newattrs.ia_valid =
+			ATTR_UID|ATTR_GID|ATTR_MODE;
+		inode_lock(d_inode(dentry));
+		notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
+		inode_unlock(d_inode(dentry));
+		err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
+				 dentry, NULL);
+		if (!err || err == -ENOENT)
+			deleted = 1;
 	}
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
diff --git a/fs/namei.c b/fs/namei.c
index 3a4039acdb3f..e53427083342 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2721,7 +2721,8 @@ static int filename_parentat(int dfd, struct filename *name,
 }
 
 /* does lookup, returns the object with parent locked */
-static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
+static struct dentry *__kern_path_locked(int dfd, struct filename *name,
+					 struct path *path, bool require_positive)
 {
 	struct dentry *d;
 	struct qstr last;
@@ -2736,6 +2737,10 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	d = lookup_one_qstr_excl(&last, path->dentry, 0);
+	if (!IS_ERR(d) && d_is_negative(d) && !require_positive) {
+		dput(d);
+		d = ERR_PTR(-ENOENT);
+	}
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
@@ -2743,19 +2748,21 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	return d;
 }
 
+/* kern_path_locked() always returns a positive dentry or an error */
 struct dentry *kern_path_locked(const char *name, struct path *path)
 {
 	struct filename *filename = getname_kernel(name);
-	struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
+	struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path, true);
 
 	putname(filename);
 	return res;
 }
 
+/* user_path_locks() may return a negative dentry */
 struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
 {
 	struct filename *filename = getname(name);
-	struct dentry *res = __kern_path_locked(dfd, filename, path);
+	struct dentry *res = __kern_path_locked(dfd, filename, path, false);
 
 	putname(filename);
 	return res;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 7f358740e958..e3130675ee6b 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
 	struct dentry *d = kern_path_locked(watch->path, parent);
 	if (IS_ERR(d))
 		return PTR_ERR(d);
-	if (d_is_positive(d)) {
-		/* update watch filter fields */
-		watch->dev = d->d_sb->s_dev;
-		watch->ino = d_backing_inode(d)->i_ino;
-	}
+	/* update watch filter fields */
+	watch->dev = d->d_sb->s_dev;
+	watch->ino = d_backing_inode(d)->i_ino;
+
 	inode_unlock(d_backing_inode(parent->dentry));
 	dput(d);
 	return 0;
@@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 	/* caller expects mutex locked */
 	mutex_lock(&audit_filter_mutex);
 
-	if (ret) {
+	if (ret && ret != -ENOENT) {
 		audit_put_watch(watch);
 		return ret;
 	}
@@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 
 	h = audit_hash_ino((u32)watch->ino);
 	*list = &audit_inode_hash[h];
+	ret = 0;
 error:
 	path_put(&parent_path);
 	audit_put_watch(watch);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  4:53     ` NeilBrown
  2025-02-07  6:13       ` Kent Overstreet
@ 2025-02-07  6:53       ` NeilBrown
  1 sibling, 0 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-07  6:53 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, 07 Feb 2025, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > negative dentry.  So change them to return -ENOENT instead.  This
> > > simplifies callers.
> > > 
> > > This results in a subtle change to bcachefs in that an ioctl will now
> > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > behaviour to what it was prior to
> > 
> > I'm not following how the code change matches the commit message?
> 
> Maybe it doesn't.  Let me checked.
> 
> Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> -EXDEV.
> 
> -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> -EXDEV is returned if the filesystem on which that path exists is not
>  the one that the ioctl is called on.
> 
> If the target filesystem is "/foo" and the path given is "/bar/baz" and
> /bar exists but /bar/baz does not, then user_path_locked_at or
> user_path_at will return a negative dentry corresponding to the
> (non-existent) name "baz" in /bar.
> 
> In this case the dentry exists so the filesystem on which it was found
> can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> are credible return values.
> 
> 
> - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
>   to user_path_at() so there is no chance that ENOENT will be returned.
>   I cannot actually find where ENOENT could be returned ...  but that
>   doesn't really matter now.

No, that's not right.  user_path_at() never returns a negative dentry.
It isn't documented and I tried reading the code instead of looking at
callers.

So before that commit it DID return ENOENT rather than -EXDEV where as
after that commit it returns -EXDEV rather than -ENOENT.

So my patch IS restoring the old behaviour.  Are you *sure* you want
-EXDEV when given a non-existent path?

Thanks,
NeilBrown


> 
> - after that patch .... again the -EXDEV test comes first. That isn't
>   what I remember.  I must have misread it.
> 
> - after my patch user_path_locked_at() will return -ENOENT if the whole
>   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> 
> So with my patch, ENOENT always wins, and it was never like that before.
> Thanks for challenging me!
> 
> Do you think there could be a problem with changing the error returned
> in this circumstance? i.e. if you try to destroy a subvolume with a
> non-existant name on a different filesystem could getting -ENOENT
> instead of -EXDEV be noticed?
> 
> Thanks,
> NeilBrown
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  6:51           ` Kent Overstreet
@ 2025-02-07  7:30             ` NeilBrown
  2025-02-07 13:35               ` Kent Overstreet
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2025-02-07  7:30 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > Do you think there could be a problem with changing the error returned
> > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > non-existant name on a different filesystem could getting -ENOENT
> > > > instead of -EXDEV be noticed?
> > > 
> > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > that.
> > > 
> > 
> > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > be too hard.
> 
> Hang on, why does that require keeping user_path_locked_at()? Just
> compare i_sb...
> 

I changed user_path_locked_at() to not return a dentry at all when the
full path couldn't be found.  If there is no dentry, then there is no
->d_sb.
(if there was an ->i_sb, there would be an inode and this all wouldn't
be an issue).

To recap: the difference happens if the path DOESN'T exist but the
parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
case and the error code shouldn't matter.  But I had to ask...

NeilBrown


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  7:30             ` NeilBrown
@ 2025-02-07 13:35               ` Kent Overstreet
  2025-02-10  1:20                 ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2025-02-07 13:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > Do you think there could be a problem with changing the error returned
> > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > instead of -EXDEV be noticed?
> > > > 
> > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > that.
> > > > 
> > > 
> > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > be too hard.
> > 
> > Hang on, why does that require keeping user_path_locked_at()? Just
> > compare i_sb...
> > 
> 
> I changed user_path_locked_at() to not return a dentry at all when the
> full path couldn't be found.  If there is no dentry, then there is no
> ->d_sb.
> (if there was an ->i_sb, there would be an inode and this all wouldn't
> be an issue).
> 
> To recap: the difference happens if the path DOESN'T exist but the
> parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> case and the error code shouldn't matter.  But I had to ask...

Ahh...

Well, if I've scanned the series correctly (sorry, we're on different
timezones and I haven't had much caffeine yet) I hope you don't have to
keep that function just for bcachefs - but I do think the error code is
important.

Userspace getting -ENOENT and reporting -ENOENT to the user will
inevitably lead to head banging frustration by someone, somewhere, when
they're trying to delete something and the system is tell them it
doesn't exist when they can see it very much does exist, right there :)
the more precise error code is a very helpful cue...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
  2025-02-07  3:46   ` Kent Overstreet
@ 2025-02-07 19:09   ` Paul Moore
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Moore @ 2025-02-07 19:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Eric Paris, linux-kernel,
	linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit

On Thu, Feb 6, 2025 at 10:41 PM NeilBrown <neilb@suse.de> wrote:
>
> No callers of kern_path_locked() or user_path_locked_at() want a
> negative dentry.  So change them to return -ENOENT instead.  This
> simplifies callers.
>
> This results in a subtle change to bcachefs in that an ioctl will now
> return -ENOENT in preference to -EXDEV.  I believe this restores the
> behaviour to what it was prior to
>  Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/base/devtmpfs.c | 65 +++++++++++++++++++----------------------
>  fs/bcachefs/fs-ioctl.c  |  4 ---
>  fs/namei.c              |  4 +++
>  kernel/audit_watch.c    | 12 ++++----
>  4 files changed, 40 insertions(+), 45 deletions(-)

...

> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 7f358740e958..e3130675ee6b 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
>         struct dentry *d = kern_path_locked(watch->path, parent);
>         if (IS_ERR(d))
>                 return PTR_ERR(d);
> -       if (d_is_positive(d)) {
> -               /* update watch filter fields */
> -               watch->dev = d->d_sb->s_dev;
> -               watch->ino = d_backing_inode(d)->i_ino;
> -       }
> +       /* update watch filter fields */
> +       watch->dev = d->d_sb->s_dev;
> +       watch->ino = d_backing_inode(d)->i_ino;
> +
>         inode_unlock(d_backing_inode(parent->dentry));
>         dput(d);
>         return 0;
> @@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>         /* caller expects mutex locked */
>         mutex_lock(&audit_filter_mutex);
>
> -       if (ret) {
> +       if (ret && ret != -ENOENT) {
>                 audit_put_watch(watch);
>                 return ret;
>         }
> @@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>
>         h = audit_hash_ino((u32)watch->ino);
>         *list = &audit_inode_hash[h];
> +       ret = 0;

If you have to respin this patch for any reason I'd prefer to move the
'ret = 0' up to just after the if block you're modifying in the chunk
above, but that's a trivial nitpick so please don't respin only for
that.  Otherwise it looks fine to me from an audit perspective.

Acked-by: Paul Moore <paul@paul-moore.com> (Audit)

>  error:
>         path_put(&parent_path);
>         audit_put_watch(watch);
> --
> 2.47.1

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-07 13:35               ` Kent Overstreet
@ 2025-02-10  1:20                 ` NeilBrown
  2025-02-10 16:33                   ` Kent Overstreet
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2025-02-10  1:20 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Sat, 08 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > Do you think there could be a problem with changing the error returned
> > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > instead of -EXDEV be noticed?
> > > > > 
> > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > that.
> > > > > 
> > > > 
> > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > be too hard.
> > > 
> > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > compare i_sb...
> > > 
> > 
> > I changed user_path_locked_at() to not return a dentry at all when the
> > full path couldn't be found.  If there is no dentry, then there is no
> > ->d_sb.
> > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > be an issue).
> > 
> > To recap: the difference happens if the path DOESN'T exist but the
> > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > case and the error code shouldn't matter.  But I had to ask...
> 
> Ahh...
> 
> Well, if I've scanned the series correctly (sorry, we're on different
> timezones and I haven't had much caffeine yet) I hope you don't have to
> keep that function just for bcachefs - but I do think the error code is
> important.
> 
> Userspace getting -ENOENT and reporting -ENOENT to the user will
> inevitably lead to head banging frustration by someone, somewhere, when
> they're trying to delete something and the system is tell them it
> doesn't exist when they can see it very much does exist, right there :)
> the more precise error code is a very helpful cue...
> 

???
You will only get -ENOENT if there is no ent.  There is no question of a
confusing error message.
If you ask for a non-exist name on the correct filesystem, you get -ENOENT
If you ask for an existing name of the wrong filesystem, you get -EXDEV
That all works as expected and always has.

But what if you ask for a non-existing name in a directory on the
wrong filesystem?  
The code you originally wrote in 42d237320e9817a9 would return
-ENOENT because that it what user_path_at() would return.
But using user_path_at() is "wrong" because it doesn't lock the directory
so ->d_parent is not guaranteed to be stable.
Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
that doesn't check for a negative dentry so Al added a check to return
-ENOENT, but that was added *after* the test that returns -EXDEV.

So now if you call subvolume_destroy on a non-existing name in a
directory on the wrong filesystem, you get -EXDEV.  I think that is
a bit weird but not a lot weird.
My patch will change it back to -ENOENT - the way you originally wrote
it.

I hope you are ok with that.

NeilBrown


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] VFS: minor improvements to a couple of interfaces
  2025-02-07  3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown
  2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
  2025-02-07  3:36 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
@ 2025-02-10  8:25 ` Christian Brauner
  2025-02-10  8:42   ` Al Viro
  2 siblings, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2025-02-10  8:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Kent Overstreet, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit, Alexander Viro,
	Jan Kara

On Fri, 07 Feb 2025 14:36:46 +1100, NeilBrown wrote:
>  I found these opportunities for simplification as part of my work to
>  enhance filesystem directory operations to not require an exclusive
>  lock on the directory.
>  There are quite a collection of users of these interfaces incluing NFS,
>  smb/server, bcachefs, devtmpfs, and audit.  Hence the long Cc line.
> 
> NeilBrown
> 
> [...]

I've taken your first cleanup. Thanks for splitting those out of the
other series.

---

Applied to the vfs-6.15.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.misc

[1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
      https://git.kernel.org/vfs/vfs/c/2ebf0c6b48d8
[2/2] VFS: add common error checks to lookup_one_qstr_excl()
      https://git.kernel.org/vfs/vfs/c/4b3c043c69bc

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] VFS: minor improvements to a couple of interfaces
  2025-02-10  8:25 ` [PATCH 0/2] VFS: minor improvements to a couple of interfaces Christian Brauner
@ 2025-02-10  8:42   ` Al Viro
  2025-02-10  9:41     ` Christian Brauner
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2025-02-10  8:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: NeilBrown, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Kent Overstreet, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit, Jan Kara

On Mon, Feb 10, 2025 at 09:25:26AM +0100, Christian Brauner wrote:
> On Fri, 07 Feb 2025 14:36:46 +1100, NeilBrown wrote:
> >  I found these opportunities for simplification as part of my work to
> >  enhance filesystem directory operations to not require an exclusive
> >  lock on the directory.
> >  There are quite a collection of users of these interfaces incluing NFS,
> >  smb/server, bcachefs, devtmpfs, and audit.  Hence the long Cc line.
> > 
> > NeilBrown
> > 
> > [...]
> 
> I've taken your first cleanup. Thanks for splitting those out of the
> other series.
> 
> ---
> 
> Applied to the vfs-6.15.misc branch of the vfs/vfs.git tree.
> Patches in the vfs-6.15.misc branch should appear in linux-next soon.

Might be better to put it into a separate branch, so that further
work in that direction wouldn't be mixed with other stuff...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] VFS: minor improvements to a couple of interfaces
  2025-02-10  8:42   ` Al Viro
@ 2025-02-10  9:41     ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2025-02-10  9:41 UTC (permalink / raw)
  To: Al Viro
  Cc: NeilBrown, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Kent Overstreet, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit, Jan Kara

On Mon, Feb 10, 2025 at 08:42:46AM +0000, Al Viro wrote:
> On Mon, Feb 10, 2025 at 09:25:26AM +0100, Christian Brauner wrote:
> > On Fri, 07 Feb 2025 14:36:46 +1100, NeilBrown wrote:
> > >  I found these opportunities for simplification as part of my work to
> > >  enhance filesystem directory operations to not require an exclusive
> > >  lock on the directory.
> > >  There are quite a collection of users of these interfaces incluing NFS,
> > >  smb/server, bcachefs, devtmpfs, and audit.  Hence the long Cc line.
> > > 
> > > NeilBrown
> > > 
> > > [...]
> > 
> > I've taken your first cleanup. Thanks for splitting those out of the
> > other series.
> > 
> > ---
> > 
> > Applied to the vfs-6.15.misc branch of the vfs/vfs.git tree.
> > Patches in the vfs-6.15.misc branch should appear in linux-next soon.
> 
> Might be better to put it into a separate branch, so that further
> work in that direction wouldn't be mixed with other stuff...

Yep, good call:

vfs-6.15.async.dir

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-10  1:20                 ` NeilBrown
@ 2025-02-10 16:33                   ` Kent Overstreet
  2025-02-12  3:24                     ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2025-02-10 16:33 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote:
> On Sat, 08 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > > Do you think there could be a problem with changing the error returned
> > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > > instead of -EXDEV be noticed?
> > > > > > 
> > > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > > that.
> > > > > > 
> > > > > 
> > > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > > be too hard.
> > > > 
> > > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > > compare i_sb...
> > > > 
> > > 
> > > I changed user_path_locked_at() to not return a dentry at all when the
> > > full path couldn't be found.  If there is no dentry, then there is no
> > > ->d_sb.
> > > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > > be an issue).
> > > 
> > > To recap: the difference happens if the path DOESN'T exist but the
> > > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > > case and the error code shouldn't matter.  But I had to ask...
> > 
> > Ahh...
> > 
> > Well, if I've scanned the series correctly (sorry, we're on different
> > timezones and I haven't had much caffeine yet) I hope you don't have to
> > keep that function just for bcachefs - but I do think the error code is
> > important.
> > 
> > Userspace getting -ENOENT and reporting -ENOENT to the user will
> > inevitably lead to head banging frustration by someone, somewhere, when
> > they're trying to delete something and the system is tell them it
> > doesn't exist when they can see it very much does exist, right there :)
> > the more precise error code is a very helpful cue...
> > 
> 
> ???
> You will only get -ENOENT if there is no ent.  There is no question of a
> confusing error message.
> If you ask for a non-exist name on the correct filesystem, you get -ENOENT
> If you ask for an existing name of the wrong filesystem, you get -EXDEV
> That all works as expected and always has.
> 
> But what if you ask for a non-existing name in a directory on the
> wrong filesystem?  
> The code you originally wrote in 42d237320e9817a9 would return
> -ENOENT because that it what user_path_at() would return.

Ahh - ok, I think I see where I misread before

> But using user_path_at() is "wrong" because it doesn't lock the directory
> so ->d_parent is not guaranteed to be stable.
> Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
> that doesn't check for a negative dentry so Al added a check to return
> -ENOENT, but that was added *after* the test that returns -EXDEV.
> 
> So now if you call subvolume_destroy on a non-existing name in a
> directory on the wrong filesystem, you get -EXDEV.  I think that is
> a bit weird but not a lot weird.

Yeah, we don't need to preserve that. As long as calling it on a name
that _does_ exist on a different filesystem returns -EXDEV, that's all I
care about.

So assuming that's the case you can go ahead and add my acked-by...

Nit: I would go back and stare at the patch some more, but threading got
completely fubar so I can't find anything. Doh.

> My patch will change it back to -ENOENT - the way you originally wrote
> it.
> 
> I hope you are ok with that.

Yes, sounds good.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-07  3:36 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
@ 2025-02-12  2:50   ` kernel test robot
  2025-02-12  3:16   ` Al Viro
  1 sibling, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-02-12  2:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: oe-lkp, lkp, linux-fsdevel, linux-nfs, linux-cifs,
	Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, audit, oliver.sang



Hello,

kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:

commit: 9a292bc4cbb25ca84f90dbacdf3064a9d6e7804f ("[PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/VFS-change-kern_path_locked-and-user_path_locked_at-to-never-return-negative-dentry/20250207-185417
base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/all/20250207034040.3402438-3-neilb@suse.de/
patch subject: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()

in testcase: trinity
version: 
with following parameters:

	runtime: 300s
	group: group-01
	nr_groups: 5



config: i386-randconfig-053-20250208
compiler: clang-19
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502121009.f7bc8e67-lkp@intel.com


[  163.118842][ T4188] BUG: unable to handle page fault for address: fffffffe
[  163.119485][ T4188] #PF: supervisor read access in kernel mode
[  163.120015][ T4188] #PF: error_code(0x0000) - not-present page
[  163.120523][ T4188] *pde = 026d3067 *pte = 00000000
[  163.120971][ T4188] Oops: Oops: 0000 [#1]
[  163.121339][ T4188] CPU: 0 UID: 65534 PID: 4188 Comm: trinity-c3 Tainted: G S                 6.14.0-rc1-00084-g9a292bc4cbb2 #1
[  163.122321][ T4188] Tainted: [S]=CPU_OUT_OF_SPEC
[  163.122717][ T4188] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 163.123520][ T4188] EIP: lookup_one_qstr_excl (include/linux/dcache.h:416 include/linux/dcache.h:421 include/linux/dcache.h:467 fs/namei.c:1696) 
[ 163.123973][ T4188] Code: 5e 89 d8 89 fa e8 62 ed 00 00 85 c0 74 58 8b 7e 18 89 c2 89 f0 89 d6 8b 4d f0 ff 17 85 c0 75 4d 89 f0 8b 75 f0 b9 00 00 38 00 <23> 08 89 f2 81 e2 00 00 02 00 09 ca 74 13 f7 c6 00 00 04 00 74 17
All code
========
   0:	5e                   	pop    %rsi
   1:	89 d8                	mov    %ebx,%eax
   3:	89 fa                	mov    %edi,%edx
   5:	e8 62 ed 00 00       	call   0xed6c
   a:	85 c0                	test   %eax,%eax
   c:	74 58                	je     0x66
   e:	8b 7e 18             	mov    0x18(%rsi),%edi
  11:	89 c2                	mov    %eax,%edx
  13:	89 f0                	mov    %esi,%eax
  15:	89 d6                	mov    %edx,%esi
  17:	8b 4d f0             	mov    -0x10(%rbp),%ecx
  1a:	ff 17                	call   *(%rdi)
  1c:	85 c0                	test   %eax,%eax
  1e:	75 4d                	jne    0x6d
  20:	89 f0                	mov    %esi,%eax
  22:	8b 75 f0             	mov    -0x10(%rbp),%esi
  25:	b9 00 00 38 00       	mov    $0x380000,%ecx
  2a:*	23 08                	and    (%rax),%ecx		<-- trapping instruction
  2c:	89 f2                	mov    %esi,%edx
  2e:	81 e2 00 00 02 00    	and    $0x20000,%edx
  34:	09 ca                	or     %ecx,%edx
  36:	74 13                	je     0x4b
  38:	f7 c6 00 00 04 00    	test   $0x40000,%esi
  3e:	74 17                	je     0x57

Code starting with the faulting instruction
===========================================
   0:	23 08                	and    (%rax),%ecx
   2:	89 f2                	mov    %esi,%edx
   4:	81 e2 00 00 02 00    	and    $0x20000,%edx
   a:	09 ca                	or     %ecx,%edx
   c:	74 13                	je     0x21
   e:	f7 c6 00 00 04 00    	test   $0x40000,%esi
  14:	74 17                	je     0x2d
[  163.125537][ T4188] EAX: fffffffe EBX: c3e7e540 ECX: 00380000 EDX: 273874b2
[  163.126145][ T4188] ESI: 00060000 EDI: fffffffe EBP: ebef9ef4 ESP: ebef9edc
[  163.126716][ T4188] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010246
[  163.127324][ T4188] CR0: 80050033 CR2: fffffffe CR3: 2b2df000 CR4: 00040690
[  163.127872][ T4188] Call Trace:
[ 163.128178][ T4188] ? __die_body (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) 
[ 163.128552][ T4188] ? __die (arch/x86/kernel/dumpstack.c:434) 
[ 163.128898][ T4188] ? page_fault_oops (arch/x86/mm/fault.c:710) 
[ 163.129329][ T4188] ? lock_acquire (kernel/locking/lockdep.c:5851) 
[ 163.129723][ T4188] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:737) 
[ 163.130222][ T4188] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:784) 
[ 163.130687][ T4188] ? bad_area_nosemaphore (arch/x86/mm/fault.c:833) 
[ 163.131129][ T4188] ? do_kern_addr_fault (arch/x86/mm/fault.c:1197) 
[ 163.131569][ T4188] ? exc_page_fault (arch/x86/mm/fault.c:1479) 
[ 163.132004][ T4188] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:? kernel/locking/lockdep.c:4408) 
[ 163.132482][ T4188] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1493) 
[ 163.132999][ T4188] ? handle_exception (arch/x86/entry/entry_32.S:1048) 
[ 163.133429][ T4188] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1493) 
[ 163.133946][ T4188] ? lookup_one_qstr_excl (include/linux/dcache.h:416 include/linux/dcache.h:421 include/linux/dcache.h:467 fs/namei.c:1696) 
[ 163.134390][ T4188] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1493) 
[ 163.134919][ T4188] ? lookup_one_qstr_excl (include/linux/dcache.h:416 include/linux/dcache.h:421 include/linux/dcache.h:467 fs/namei.c:1696) 
[ 163.135354][ T4188] filename_create (include/linux/err.h:67 fs/namei.c:4091) 
[ 163.135763][ T4188] do_symlinkat (fs/namei.c:4676) 
[ 163.136166][ T4188] __ia32_sys_symlinkat (fs/namei.c:4696) 
[ 163.136593][ T4188] ia32_sys_call (arch/x86/entry/syscall_32.c:44) 
[ 163.136967][ T4188] __do_fast_syscall_32 (arch/x86/entry/common.c:?) 
[ 163.137377][ T4188] do_fast_syscall_32 (arch/x86/entry/common.c:411) 
[ 163.137774][ T4188] do_SYSENTER_32 (arch/x86/entry/common.c:449) 
[ 163.138146][ T4188] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836) 
[  163.138542][ T4188] EIP: 0xb7f61539
[ 163.138845][ T4188] Code: 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 0f 1f 00 58 b8 77 00 00 00 cd 80 90 0f 1f
All code
========
   0:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
   4:	10 07                	adc    %al,(%rdi)
   6:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
   a:	10 08                	adc    %cl,(%rax)
   c:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
	...
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:*	89 e5                	mov    %esp,%ebp		<-- trapping instruction
  26:	0f 34                	sysenter
  28:	cd 80                	int    $0x80
  2a:	5d                   	pop    %rbp
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	ret
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	0f 1f 00             	nopl   (%rax)
  35:	58                   	pop    %rax
  36:	b8 77 00 00 00       	mov    $0x77,%eax
  3b:	cd 80                	int    $0x80
  3d:	90                   	nop
  3e:	0f                   	.byte 0xf
  3f:	1f                   	(bad)

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	ret
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	0f 1f 00             	nopl   (%rax)
   b:	58                   	pop    %rax
   c:	b8 77 00 00 00       	mov    $0x77,%eax
  11:	cd 80                	int    $0x80
  13:	90                   	nop
  14:	0f                   	.byte 0xf
  15:	1f                   	(bad)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250212/202502121009.f7bc8e67-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-07  3:36 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
  2025-02-12  2:50   ` kernel test robot
@ 2025-02-12  3:16   ` Al Viro
  2025-02-12  3:25     ` Al Viro
  1 sibling, 1 reply; 27+ messages in thread
From: Al Viro @ 2025-02-12  3:16 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs,
	linux-cifs, audit

On Fri, Feb 07, 2025 at 02:36:48PM +1100, NeilBrown wrote:
> @@ -1690,6 +1692,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  		dput(dentry);
>  		dentry = old;
>  	}
> +found:

... and if ->lookup() returns an error, this will blow up (as bot has just
reported).

> +	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
> +		dput(dentry);
> +		return ERR_PTR(-ENOENT);
> +	}
> +	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
> +		dput(dentry);
> +		return ERR_PTR(-EEXIST);
> +	}


> @@ -4077,27 +4084,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
>  	 * '/', and a directory wasn't requested.
>  	 */
>  	if (last.name[last.len] && !want_dir)
> -		create_flags = 0;
> +		create_flags &= ~LOOKUP_CREATE;

See the patch I've posted in earlier thread; the entire "strip LOOKUP_CREATE"
thing is wrong.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  2025-02-10 16:33                   ` Kent Overstreet
@ 2025-02-12  3:24                     ` NeilBrown
  0 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-12  3:24 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs,
	linux-fsdevel, linux-nfs, linux-cifs, audit

On Tue, 11 Feb 2025, Kent Overstreet wrote:
> On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote:
> > On Sat, 08 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > > > Do you think there could be a problem with changing the error returned
> > > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > > > instead of -EXDEV be noticed?
> > > > > > > 
> > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > > > that.
> > > > > > > 
> > > > > > 
> > > > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > > > be too hard.
> > > > > 
> > > > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > > > compare i_sb...
> > > > > 
> > > > 
> > > > I changed user_path_locked_at() to not return a dentry at all when the
> > > > full path couldn't be found.  If there is no dentry, then there is no
> > > > ->d_sb.
> > > > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > > > be an issue).
> > > > 
> > > > To recap: the difference happens if the path DOESN'T exist but the
> > > > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > > > case and the error code shouldn't matter.  But I had to ask...
> > > 
> > > Ahh...
> > > 
> > > Well, if I've scanned the series correctly (sorry, we're on different
> > > timezones and I haven't had much caffeine yet) I hope you don't have to
> > > keep that function just for bcachefs - but I do think the error code is
> > > important.
> > > 
> > > Userspace getting -ENOENT and reporting -ENOENT to the user will
> > > inevitably lead to head banging frustration by someone, somewhere, when
> > > they're trying to delete something and the system is tell them it
> > > doesn't exist when they can see it very much does exist, right there :)
> > > the more precise error code is a very helpful cue...
> > > 
> > 
> > ???
> > You will only get -ENOENT if there is no ent.  There is no question of a
> > confusing error message.
> > If you ask for a non-exist name on the correct filesystem, you get -ENOENT
> > If you ask for an existing name of the wrong filesystem, you get -EXDEV
> > That all works as expected and always has.
> > 
> > But what if you ask for a non-existing name in a directory on the
> > wrong filesystem?  
> > The code you originally wrote in 42d237320e9817a9 would return
> > -ENOENT because that it what user_path_at() would return.
> 
> Ahh - ok, I think I see where I misread before
> 
> > But using user_path_at() is "wrong" because it doesn't lock the directory
> > so ->d_parent is not guaranteed to be stable.
> > Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
> > that doesn't check for a negative dentry so Al added a check to return
> > -ENOENT, but that was added *after* the test that returns -EXDEV.
> > 
> > So now if you call subvolume_destroy on a non-existing name in a
> > directory on the wrong filesystem, you get -EXDEV.  I think that is
> > a bit weird but not a lot weird.
> 
> Yeah, we don't need to preserve that. As long as calling it on a name
> that _does_ exist on a different filesystem returns -EXDEV, that's all I
> care about.
> 
> So assuming that's the case you can go ahead and add my acked-by...

Cool - thanks.

> 
> Nit: I would go back and stare at the patch some more, but threading got
> completely fubar so I can't find anything. Doh.
> 

:-)

NeilBrown



> > My patch will change it back to -ENOENT - the way you originally wrote
> > it.
> > 
> > I hope you are ok with that.
> 
> Yes, sounds good.
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-12  3:16   ` Al Viro
@ 2025-02-12  3:25     ` Al Viro
  2025-02-12  3:45       ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2025-02-12  3:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs,
	linux-cifs, audit

On Wed, Feb 12, 2025 at 03:16:08AM +0000, Al Viro wrote:
> On Fri, Feb 07, 2025 at 02:36:48PM +1100, NeilBrown wrote:
> > @@ -1690,6 +1692,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> >  		dput(dentry);
> >  		dentry = old;
> >  	}
> > +found:
> 
> ... and if ->lookup() returns an error, this will blow up (as bot has just
> reported).
> 
> > +	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
> > +		dput(dentry);
> > +		return ERR_PTR(-ENOENT);
> > +	}
> > +	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
> > +		dput(dentry);
> > +		return ERR_PTR(-EEXIST);
> > +	}
> 
> 
> > @@ -4077,27 +4084,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> >  	 * '/', and a directory wasn't requested.
> >  	 */
> >  	if (last.name[last.len] && !want_dir)
> > -		create_flags = 0;
> > +		create_flags &= ~LOOKUP_CREATE;
> 
> See the patch I've posted in earlier thread; the entire "strip LOOKUP_CREATE"
> thing is wrong.

On top of mainline that's

filename_create(): don't force handling trailing slashes into the common path

Only mkdir accepts pathnames that end with / - anything like mknod() (symlink(),
etc.) always fails on those.  Don't try to force that the common codepath -
all we are doing is a lookup and check for existence to determine which
error should it be.  Do that before bothering with mnt_want_write(), etc.;
as far as underlying filesystem is concerned it's just a lookup.  Simplifies
the normal codepath and kills the lookup intent dependency on more than
the call site.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namei.c b/fs/namei.c
index 3ab9440c5b93..6189e54f767a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4054,13 +4054,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	struct dentry *dentry = ERR_PTR(-EEXIST);
 	struct qstr last;
 	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
-	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
-	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 	int type;
 	int err2;
 	int error;
 
-	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
+	lookup_flags &= LOOKUP_REVAL;
+
+	error = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
 	if (error)
 		return ERR_PTR(error);
 
@@ -4070,18 +4070,28 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	 */
 	if (unlikely(type != LAST_NORM))
 		goto out;
+	/*
+	 * mkdir foo/bar/ is OK, but for anything else a slash in the end
+	 * is always an error; the only question is which one.
+	 */
+	if (unlikely(last.name[last.len] && !want_dir)) {
+		dentry = lookup_dcache(&last, path->dentry, lookup_flags);
+		if (!dentry)
+			dentry = lookup_slow(&last, path->dentry, lookup_flags);
+		if (!IS_ERR(dentry)) {
+			error = d_is_positive(dentry) ? -EEXIST : -ENOENT;
+			dput(dentry);
+			dentry = ERR_PTR(error);
+		}
+		goto out;
+	}
 
 	/* don't fail immediately if it's r/o, at least try to report other errors */
 	err2 = mnt_want_write(path->mnt);
-	/*
-	 * Do the final lookup.  Suppress 'create' if there is a trailing
-	 * '/', and a directory wasn't requested.
-	 */
-	if (last.name[last.len] && !want_dir)
-		create_flags = 0;
+	/* do the final lookup */
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	dentry = lookup_one_qstr_excl(&last, path->dentry,
-				      reval_flag | create_flags);
+				lookup_flags | LOOKUP_CREATE | LOOKUP_EXCL);
 	if (IS_ERR(dentry))
 		goto unlock;
 
@@ -4089,16 +4099,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	if (d_is_positive(dentry))
 		goto fail;
 
-	/*
-	 * Special case - lookup gave negative, but... we had foo/bar/
-	 * From the vfs_mknod() POV we just have a negative dentry -
-	 * all is fine. Let's be bastards - you had / on the end, you've
-	 * been asking for (non-existent) directory. -ENOENT for you.
-	 */
-	if (unlikely(!create_flags)) {
-		error = -ENOENT;
-		goto fail;
-	}
 	if (unlikely(err2)) {
 		error = err2;
 		goto fail;

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-12  3:25     ` Al Viro
@ 2025-02-12  3:45       ` NeilBrown
  2025-02-12  4:06         ` Al Viro
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2025-02-12  3:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs,
	linux-cifs, audit

On Wed, 12 Feb 2025, Al Viro wrote:
> On Wed, Feb 12, 2025 at 03:16:08AM +0000, Al Viro wrote:
> > On Fri, Feb 07, 2025 at 02:36:48PM +1100, NeilBrown wrote:
> > > @@ -1690,6 +1692,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> > >  		dput(dentry);
> > >  		dentry = old;
> > >  	}
> > > +found:
> > 
> > ... and if ->lookup() returns an error, this will blow up (as bot has just
> > reported).

Yes, I need an early exit if (IS_ERR(dentry)).  Thanks.

> > 
> > > +	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
> > > +		dput(dentry);
> > > +		return ERR_PTR(-ENOENT);
> > > +	}
> > > +	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
> > > +		dput(dentry);
> > > +		return ERR_PTR(-EEXIST);
> > > +	}
> > 
> > 
> > > @@ -4077,27 +4084,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> > >  	 * '/', and a directory wasn't requested.
> > >  	 */
> > >  	if (last.name[last.len] && !want_dir)
> > > -		create_flags = 0;
> > > +		create_flags &= ~LOOKUP_CREATE;
> > 
> > See the patch I've posted in earlier thread; the entire "strip LOOKUP_CREATE"
> > thing is wrong.
> 
> On top of mainline that's
> 
> filename_create(): don't force handling trailing slashes into the common path
> 
> Only mkdir accepts pathnames that end with / - anything like mknod() (symlink(),
> etc.) always fails on those.  Don't try to force that the common codepath -
> all we are doing is a lookup and check for existence to determine which
> error should it be.  Do that before bothering with mnt_want_write(), etc.;
> as far as underlying filesystem is concerned it's just a lookup.  Simplifies
> the normal codepath and kills the lookup intent dependency on more than
> the call site.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/namei.c b/fs/namei.c
> index 3ab9440c5b93..6189e54f767a 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4054,13 +4054,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
>  	struct dentry *dentry = ERR_PTR(-EEXIST);
>  	struct qstr last;
>  	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
> -	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
> -	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
>  	int type;
>  	int err2;
>  	int error;
>  
> -	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
> +	lookup_flags &= LOOKUP_REVAL;
> +
> +	error = filename_parentat(dfd, name, lookup_flags, path, &last, &type);
>  	if (error)
>  		return ERR_PTR(error);
>  
> @@ -4070,18 +4070,28 @@ static struct dentry *filename_create(int dfd, struct filename *name,
>  	 */
>  	if (unlikely(type != LAST_NORM))
>  		goto out;
> +	/*
> +	 * mkdir foo/bar/ is OK, but for anything else a slash in the end
> +	 * is always an error; the only question is which one.
> +	 */
> +	if (unlikely(last.name[last.len] && !want_dir)) {
> +		dentry = lookup_dcache(&last, path->dentry, lookup_flags);
> +		if (!dentry)
> +			dentry = lookup_slow(&last, path->dentry, lookup_flags);

I do see some value in the simplicity of this approach, though maybe not
as much value as you see.  But the above uses inode_lock_share(), rather
than the nested version, so lockdep will complain.
If you open-code a nested lock, or write a new helper, you get very
close to the sequence for calling lookup_one_qstr_excl() below.  So
it isn't clear to me that the benefit is worth the cost.

This current code in filename_create isn't actually wrong is it?

Thanks,
NeilBrown



> +		if (!IS_ERR(dentry)) {
> +			error = d_is_positive(dentry) ? -EEXIST : -ENOENT;
> +			dput(dentry);
> +			dentry = ERR_PTR(error);
> +		}
> +		goto out;
> +	}
>  
>  	/* don't fail immediately if it's r/o, at least try to report other errors */
>  	err2 = mnt_want_write(path->mnt);
> -	/*
> -	 * Do the final lookup.  Suppress 'create' if there is a trailing
> -	 * '/', and a directory wasn't requested.
> -	 */
> -	if (last.name[last.len] && !want_dir)
> -		create_flags = 0;
> +	/* do the final lookup */
>  	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
>  	dentry = lookup_one_qstr_excl(&last, path->dentry,
> -				      reval_flag | create_flags);
> +				lookup_flags | LOOKUP_CREATE | LOOKUP_EXCL);
>  	if (IS_ERR(dentry))
>  		goto unlock;
>  
> @@ -4089,16 +4099,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
>  	if (d_is_positive(dentry))
>  		goto fail;
>  
> -	/*
> -	 * Special case - lookup gave negative, but... we had foo/bar/
> -	 * From the vfs_mknod() POV we just have a negative dentry -
> -	 * all is fine. Let's be bastards - you had / on the end, you've
> -	 * been asking for (non-existent) directory. -ENOENT for you.
> -	 */
> -	if (unlikely(!create_flags)) {
> -		error = -ENOENT;
> -		goto fail;
> -	}
>  	if (unlikely(err2)) {
>  		error = err2;
>  		goto fail;
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-12  3:45       ` NeilBrown
@ 2025-02-12  4:06         ` Al Viro
  2025-02-12  4:40           ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2025-02-12  4:06 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christian Brauner, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs,
	linux-cifs, audit

On Wed, Feb 12, 2025 at 02:45:04PM +1100, NeilBrown wrote:
> On Wed, 12 Feb 2025, Al Viro wrote:

> I do see some value in the simplicity of this approach, though maybe not
> as much value as you see.  But the above uses inode_lock_share(), rather
> than the nested version, so lockdep will complain.

IDGI...  It doesn't grab any ->i_rwsem inside that one, so what's there to
complain about?  And in that case it returns with no locks held, so...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-12  4:06         ` Al Viro
@ 2025-02-12  4:40           ` NeilBrown
  0 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2025-02-12  4:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Jan Kara, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris,
	linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs,
	linux-cifs, audit

On Wed, 12 Feb 2025, Al Viro wrote:
> On Wed, Feb 12, 2025 at 02:45:04PM +1100, NeilBrown wrote:
> > On Wed, 12 Feb 2025, Al Viro wrote:
> 
> > I do see some value in the simplicity of this approach, though maybe not
> > as much value as you see.  But the above uses inode_lock_share(), rather
> > than the nested version, so lockdep will complain.
> 
> IDGI...  It doesn't grab any ->i_rwsem inside that one, so what's there to
> complain about?  And in that case it returns with no locks held, so...
> 

Sorry - my bad.  I saw the difference in nesting and jumped the wrong
way.

NeilBrown

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-17  0:27 [PATCH 0/2 v2] " NeilBrown
@ 2025-02-17  0:27 ` NeilBrown
  2025-02-17 13:46   ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2025-02-17  0:27 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro, Jan Kara; +Cc: linux-fsdevel, linux-nfs

Callers of lookup_one_qstr_excl() often check if the result is negative or
positive.
These changes can easily be moved into lookup_one_qstr_excl() by checking the
lookup flags:
LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
LOOKUP_EXCL means it IS an error if the name DOES exist.

This patch adds these checks, then removes error checks from callers,
and ensures that appropriate flags are passed.

This subtly changes the meaning of LOOKUP_EXCL.  Previously it could
only accompany LOOKUP_CREATE.  Now it can accompany LOOKUP_RENAME_TARGET
as well.  A couple of small changes are needed to accommodate this.  The
NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
exactly what the name says.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250207034040.3402438-3-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 Documentation/filesystems/porting.rst | 12 ++++++
 fs/namei.c                            | 61 +++++++++------------------
 fs/nfs/dir.c                          |  3 +-
 fs/smb/server/vfs.c                   | 26 +++++-------
 4 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 2ead47e20677..3b6622fbd66b 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1165,3 +1165,15 @@ magic.
 kern_path_locked() and user_path_locked() no longer return a negative
 dentry so this doesn't need to be checked.  If the name cannot be found,
 ERR_PTR(-ENOENT) is returned.
+
+** recommend**
+
+lookup_one_qstr_excl() is changed to return errors in more cases, so
+these conditions don't require explicit checks.
+ - if LOOKUP_CREATE is NOT given, then the dentry won't be negative,
+   ERR_PTR(-ENOENT) is returned instead
+ - if LOOKUP_EXCL IS given, then the dentry won't be positive,
+   ERR_PTR(-EEXIST) is rreturned instread
+
+LOOKUP_EXCL now means "target must not exist".  It can be combined with 
+LOOK_CREATE or LOOKUP_RENAME_TARGET.
diff --git a/fs/namei.c b/fs/namei.c
index fb6da3ca0ca5..b7cdca902803 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1670,6 +1670,8 @@ static struct dentry *lookup_dcache(const struct qstr *name,
  * dentries - as the matter of fact, this only gets called
  * when directory is guaranteed to have no in-lookup children
  * at all.
+ * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
+ * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
  */
 struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 				    struct dentry *base,
@@ -1680,7 +1682,7 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 	struct inode *dir = base->d_inode;
 
 	if (dentry)
-		return dentry;
+		goto found;
 
 	/* Don't create child dentry for a dead directory. */
 	if (unlikely(IS_DEADDIR(dir)))
@@ -1695,6 +1697,17 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 		dput(dentry);
 		dentry = old;
 	}
+found:
+	if (IS_ERR(dentry))
+		return dentry;
+	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
+		dput(dentry);
+		return ERR_PTR(-ENOENT);
+	}
+	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
+		dput(dentry);
+		return ERR_PTR(-EEXIST);
+	}
 	return dentry;
 }
 EXPORT_SYMBOL(lookup_one_qstr_excl);
@@ -2741,10 +2754,6 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	d = lookup_one_qstr_excl(&last, path->dentry, 0);
-	if (!IS_ERR(d) && d_is_negative(d)) {
-		dput(d);
-		d = ERR_PTR(-ENOENT);
-	}
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
@@ -4082,27 +4091,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	 * '/', and a directory wasn't requested.
 	 */
 	if (last.name[last.len] && !want_dir)
-		create_flags = 0;
+		create_flags &= ~LOOKUP_CREATE;
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	dentry = lookup_one_qstr_excl(&last, path->dentry,
 				      reval_flag | create_flags);
 	if (IS_ERR(dentry))
 		goto unlock;
 
-	error = -EEXIST;
-	if (d_is_positive(dentry))
-		goto fail;
-
-	/*
-	 * Special case - lookup gave negative, but... we had foo/bar/
-	 * From the vfs_mknod() POV we just have a negative dentry -
-	 * all is fine. Let's be bastards - you had / on the end, you've
-	 * been asking for (non-existent) directory. -ENOENT for you.
-	 */
-	if (unlikely(!create_flags)) {
-		error = -ENOENT;
-		goto fail;
-	}
 	if (unlikely(err2)) {
 		error = err2;
 		goto fail;
@@ -4449,10 +4444,6 @@ int do_rmdir(int dfd, struct filename *name)
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit3;
-	if (!dentry->d_inode) {
-		error = -ENOENT;
-		goto exit4;
-	}
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit4;
@@ -4583,7 +4574,7 @@ int do_unlinkat(int dfd, struct filename *name)
 	if (!IS_ERR(dentry)) {
 
 		/* Why not before? Because we want correct error value */
-		if (last.name[last.len] || d_is_negative(dentry))
+		if (last.name[last.len])
 			goto slashes;
 		inode = dentry->d_inode;
 		ihold(inode);
@@ -4617,9 +4608,7 @@ int do_unlinkat(int dfd, struct filename *name)
 	return error;
 
 slashes:
-	if (d_is_negative(dentry))
-		error = -ENOENT;
-	else if (d_is_dir(dentry))
+	if (d_is_dir(dentry))
 		error = -EISDIR;
 	else
 		error = -ENOTDIR;
@@ -5119,7 +5108,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	struct qstr old_last, new_last;
 	int old_type, new_type;
 	struct inode *delegated_inode = NULL;
-	unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
+	unsigned int lookup_flags = 0, target_flags =
+		LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
 	bool should_retry = false;
 	int error = -EINVAL;
 
@@ -5132,6 +5122,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 
 	if (flags & RENAME_EXCHANGE)
 		target_flags = 0;
+	if (flags & RENAME_NOREPLACE)
+		target_flags |= LOOKUP_EXCL;
 
 retry:
 	error = filename_parentat(olddfd, from, lookup_flags, &old_path,
@@ -5173,23 +5165,12 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	error = PTR_ERR(old_dentry);
 	if (IS_ERR(old_dentry))
 		goto exit3;
-	/* source must exist */
-	error = -ENOENT;
-	if (d_is_negative(old_dentry))
-		goto exit4;
 	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
 					  lookup_flags | target_flags);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto exit4;
-	error = -EEXIST;
-	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
-		goto exit5;
 	if (flags & RENAME_EXCHANGE) {
-		error = -ENOENT;
-		if (d_is_negative(new_dentry))
-			goto exit5;
-
 		if (!d_is_dir(new_dentry)) {
 			error = -ENOTDIR;
 			if (new_last.name[new_last.len])
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2b04038b0e40..56cf16a72334 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1532,7 +1532,8 @@ static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
 {
 	if (NFS_PROTO(dir)->version == 2)
 		return 0;
-	return flags & LOOKUP_EXCL;
+	return (flags & (LOOKUP_CREATE | LOOKUP_EXCL)) ==
+		(LOOKUP_CREATE | LOOKUP_EXCL);
 }
 
 /*
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 6890016e1923..fe29acef5872 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -113,11 +113,6 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf,
 	if (IS_ERR(d))
 		goto err_out;
 
-	if (d_is_negative(d)) {
-		dput(d);
-		goto err_out;
-	}
-
 	path->dentry = d;
 	path->mnt = mntget(parent_path->mnt);
 
@@ -693,6 +688,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 	struct ksmbd_file *parent_fp;
 	int new_type;
 	int err, lookup_flags = LOOKUP_NO_SYMLINKS;
+	int target_lookup_flags = LOOKUP_RENAME_TARGET;
 
 	if (ksmbd_override_fsids(work))
 		return -ENOMEM;
@@ -703,6 +699,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 		goto revert_fsids;
 	}
 
+	/*
+	 * explicitly handle file overwrite case, for compatibility with
+	 * filesystems that may not support rename flags (e.g: fuse)
+	 */
+	if (flags & RENAME_NOREPLACE)
+		target_lookup_flags |= LOOKUP_EXCL;
+	flags &= ~(RENAME_NOREPLACE);
+
 retry:
 	err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
 				     &new_path, &new_last, &new_type,
@@ -743,7 +747,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 	}
 
 	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
-					  lookup_flags | LOOKUP_RENAME_TARGET);
+					  lookup_flags | target_lookup_flags);
 	if (IS_ERR(new_dentry)) {
 		err = PTR_ERR(new_dentry);
 		goto out3;
@@ -754,16 +758,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 		goto out4;
 	}
 
-	/*
-	 * explicitly handle file overwrite case, for compatibility with
-	 * filesystems that may not support rename flags (e.g: fuse)
-	 */
-	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) {
-		err = -EEXIST;
-		goto out4;
-	}
-	flags &= ~(RENAME_NOREPLACE);
-
 	if (old_child == trap) {
 		err = -EINVAL;
 		goto out4;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl()
  2025-02-17  0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
@ 2025-02-17 13:46   ` Jeff Layton
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Layton @ 2025-02-17 13:46 UTC (permalink / raw)
  To: NeilBrown, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-fsdevel, linux-nfs

On Mon, 2025-02-17 at 11:27 +1100, NeilBrown wrote:
> Callers of lookup_one_qstr_excl() often check if the result is negative or
> positive.
> These changes can easily be moved into lookup_one_qstr_excl() by checking the
> lookup flags:
> LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
> LOOKUP_EXCL means it IS an error if the name DOES exist.
> 
> This patch adds these checks, then removes error checks from callers,
> and ensures that appropriate flags are passed.
> 
> This subtly changes the meaning of LOOKUP_EXCL.  Previously it could
> only accompany LOOKUP_CREATE.  Now it can accompany LOOKUP_RENAME_TARGET
> as well.  A couple of small changes are needed to accommodate this.  The
> NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
> exactly what the name says.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> Link: https://lore.kernel.org/r/20250207034040.3402438-3-neilb@suse.de
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  Documentation/filesystems/porting.rst | 12 ++++++
>  fs/namei.c                            | 61 +++++++++------------------
>  fs/nfs/dir.c                          |  3 +-
>  fs/smb/server/vfs.c                   | 26 +++++-------
>  4 files changed, 45 insertions(+), 57 deletions(-)
> 
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 2ead47e20677..3b6622fbd66b 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1165,3 +1165,15 @@ magic.
>  kern_path_locked() and user_path_locked() no longer return a negative
>  dentry so this doesn't need to be checked.  If the name cannot be found,
>  ERR_PTR(-ENOENT) is returned.
> +
> +** recommend**
> +
> +lookup_one_qstr_excl() is changed to return errors in more cases, so
> +these conditions don't require explicit checks.
> + - if LOOKUP_CREATE is NOT given, then the dentry won't be negative,
> +   ERR_PTR(-ENOENT) is returned instead
> + - if LOOKUP_EXCL IS given, then the dentry won't be positive,
> +   ERR_PTR(-EEXIST) is rreturned instread
> +
> +LOOKUP_EXCL now means "target must not exist".  It can be combined with 
> +LOOK_CREATE or LOOKUP_RENAME_TARGET.
> diff --git a/fs/namei.c b/fs/namei.c
> index fb6da3ca0ca5..b7cdca902803 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1670,6 +1670,8 @@ static struct dentry *lookup_dcache(const struct qstr *name,
>   * dentries - as the matter of fact, this only gets called
>   * when directory is guaranteed to have no in-lookup children
>   * at all.
> + * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
> + * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
>   */
>  struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  				    struct dentry *base,
> @@ -1680,7 +1682,7 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  	struct inode *dir = base->d_inode;
>  
>  	if (dentry)
> -		return dentry;
> +		goto found;
>  
>  	/* Don't create child dentry for a dead directory. */
>  	if (unlikely(IS_DEADDIR(dir)))
> @@ -1695,6 +1697,17 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  		dput(dentry);
>  		dentry = old;
>  	}
> +found:
> +	if (IS_ERR(dentry))
> +		return dentry;
> +	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
> +		dput(dentry);
> +		return ERR_PTR(-ENOENT);
> +	}
> +	if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) {
> +		dput(dentry);
> +		return ERR_PTR(-EEXIST);
> +	}
>  	return dentry;
>  }
>  EXPORT_SYMBOL(lookup_one_qstr_excl);
> @@ -2741,10 +2754,6 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
>  	}
>  	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
>  	d = lookup_one_qstr_excl(&last, path->dentry, 0);
> -	if (!IS_ERR(d) && d_is_negative(d)) {
> -		dput(d);
> -		d = ERR_PTR(-ENOENT);
> -	}
>  	if (IS_ERR(d)) {
>  		inode_unlock(path->dentry->d_inode);
>  		path_put(path);
> @@ -4082,27 +4091,13 @@ static struct dentry *filename_create(int dfd, struct filename *name,
>  	 * '/', and a directory wasn't requested.
>  	 */
>  	if (last.name[last.len] && !want_dir)
> -		create_flags = 0;
> +		create_flags &= ~LOOKUP_CREATE;
>  	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
>  	dentry = lookup_one_qstr_excl(&last, path->dentry,
>  				      reval_flag | create_flags);
>  	if (IS_ERR(dentry))
>  		goto unlock;
>  
> -	error = -EEXIST;
> -	if (d_is_positive(dentry))
> -		goto fail;
> -
> -	/*
> -	 * Special case - lookup gave negative, but... we had foo/bar/
> -	 * From the vfs_mknod() POV we just have a negative dentry -
> -	 * all is fine. Let's be bastards - you had / on the end, you've
> -	 * been asking for (non-existent) directory. -ENOENT for you.
> -	 */
> -	if (unlikely(!create_flags)) {
> -		error = -ENOENT;
> -		goto fail;
> -	}
>  	if (unlikely(err2)) {
>  		error = err2;
>  		goto fail;
> @@ -4449,10 +4444,6 @@ int do_rmdir(int dfd, struct filename *name)
>  	error = PTR_ERR(dentry);
>  	if (IS_ERR(dentry))
>  		goto exit3;
> -	if (!dentry->d_inode) {
> -		error = -ENOENT;
> -		goto exit4;
> -	}
>  	error = security_path_rmdir(&path, dentry);
>  	if (error)
>  		goto exit4;
> @@ -4583,7 +4574,7 @@ int do_unlinkat(int dfd, struct filename *name)
>  	if (!IS_ERR(dentry)) {
>  
>  		/* Why not before? Because we want correct error value */
> -		if (last.name[last.len] || d_is_negative(dentry))
> +		if (last.name[last.len])
>  			goto slashes;
>  		inode = dentry->d_inode;
>  		ihold(inode);
> @@ -4617,9 +4608,7 @@ int do_unlinkat(int dfd, struct filename *name)
>  	return error;
>  
>  slashes:
> -	if (d_is_negative(dentry))
> -		error = -ENOENT;
> -	else if (d_is_dir(dentry))
> +	if (d_is_dir(dentry))
>  		error = -EISDIR;
>  	else
>  		error = -ENOTDIR;
> @@ -5119,7 +5108,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
>  	struct qstr old_last, new_last;
>  	int old_type, new_type;
>  	struct inode *delegated_inode = NULL;
> -	unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
> +	unsigned int lookup_flags = 0, target_flags =
> +		LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
>  	bool should_retry = false;
>  	int error = -EINVAL;
>  
> @@ -5132,6 +5122,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
>  
>  	if (flags & RENAME_EXCHANGE)
>  		target_flags = 0;
> +	if (flags & RENAME_NOREPLACE)
> +		target_flags |= LOOKUP_EXCL;
>  
>  retry:
>  	error = filename_parentat(olddfd, from, lookup_flags, &old_path,
> @@ -5173,23 +5165,12 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
>  	error = PTR_ERR(old_dentry);
>  	if (IS_ERR(old_dentry))
>  		goto exit3;
> -	/* source must exist */
> -	error = -ENOENT;
> -	if (d_is_negative(old_dentry))
> -		goto exit4;
>  	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
>  					  lookup_flags | target_flags);
>  	error = PTR_ERR(new_dentry);
>  	if (IS_ERR(new_dentry))
>  		goto exit4;
> -	error = -EEXIST;
> -	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
> -		goto exit5;
>  	if (flags & RENAME_EXCHANGE) {
> -		error = -ENOENT;
> -		if (d_is_negative(new_dentry))
> -			goto exit5;
> -
>  		if (!d_is_dir(new_dentry)) {
>  			error = -ENOTDIR;
>  			if (new_last.name[new_last.len])
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 2b04038b0e40..56cf16a72334 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1532,7 +1532,8 @@ static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
>  {
>  	if (NFS_PROTO(dir)->version == 2)
>  		return 0;
> -	return flags & LOOKUP_EXCL;
> +	return (flags & (LOOKUP_CREATE | LOOKUP_EXCL)) ==
> +		(LOOKUP_CREATE | LOOKUP_EXCL);
>  }
>  
>  /*
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 6890016e1923..fe29acef5872 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -113,11 +113,6 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf,
>  	if (IS_ERR(d))
>  		goto err_out;
>  
> -	if (d_is_negative(d)) {
> -		dput(d);
> -		goto err_out;
> -	}
> -
>  	path->dentry = d;
>  	path->mnt = mntget(parent_path->mnt);
>  
> @@ -693,6 +688,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
>  	struct ksmbd_file *parent_fp;
>  	int new_type;
>  	int err, lookup_flags = LOOKUP_NO_SYMLINKS;
> +	int target_lookup_flags = LOOKUP_RENAME_TARGET;
>  
>  	if (ksmbd_override_fsids(work))
>  		return -ENOMEM;
> @@ -703,6 +699,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
>  		goto revert_fsids;
>  	}
>  
> +	/*
> +	 * explicitly handle file overwrite case, for compatibility with
> +	 * filesystems that may not support rename flags (e.g: fuse)
> +	 */
> +	if (flags & RENAME_NOREPLACE)
> +		target_lookup_flags |= LOOKUP_EXCL;
> +	flags &= ~(RENAME_NOREPLACE);
> +
>  retry:
>  	err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
>  				     &new_path, &new_last, &new_type,
> @@ -743,7 +747,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
>  	}
>  
>  	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
> -					  lookup_flags | LOOKUP_RENAME_TARGET);
> +					  lookup_flags | target_lookup_flags);
>  	if (IS_ERR(new_dentry)) {
>  		err = PTR_ERR(new_dentry);
>  		goto out3;
> @@ -754,16 +758,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
>  		goto out4;
>  	}
>  
> -	/*
> -	 * explicitly handle file overwrite case, for compatibility with
> -	 * filesystems that may not support rename flags (e.g: fuse)
> -	 */
> -	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) {
> -		err = -EEXIST;
> -		goto out4;
> -	}
> -	flags &= ~(RENAME_NOREPLACE);
> -
>  	if (old_child == trap) {
>  		err = -EINVAL;
>  		goto out4;

Nice simplification.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-02-17 13:46 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07  3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown
2025-02-07  3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown
2025-02-07  3:46   ` Kent Overstreet
2025-02-07  4:53     ` NeilBrown
2025-02-07  6:13       ` Kent Overstreet
2025-02-07  6:34         ` NeilBrown
2025-02-07  6:51           ` Kent Overstreet
2025-02-07  7:30             ` NeilBrown
2025-02-07 13:35               ` Kent Overstreet
2025-02-10  1:20                 ` NeilBrown
2025-02-10 16:33                   ` Kent Overstreet
2025-02-12  3:24                     ` NeilBrown
2025-02-07  6:51         ` [PATCH 1/2 v2] " NeilBrown
2025-02-07  6:53       ` [PATCH 1/2] " NeilBrown
2025-02-07 19:09   ` Paul Moore
2025-02-07  3:36 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
2025-02-12  2:50   ` kernel test robot
2025-02-12  3:16   ` Al Viro
2025-02-12  3:25     ` Al Viro
2025-02-12  3:45       ` NeilBrown
2025-02-12  4:06         ` Al Viro
2025-02-12  4:40           ` NeilBrown
2025-02-10  8:25 ` [PATCH 0/2] VFS: minor improvements to a couple of interfaces Christian Brauner
2025-02-10  8:42   ` Al Viro
2025-02-10  9:41     ` Christian Brauner
  -- strict thread matches above, loose matches on Subject: below --
2025-02-17  0:27 [PATCH 0/2 v2] " NeilBrown
2025-02-17  0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown
2025-02-17 13:46   ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).