[PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations
@ 2025-07-21  7:59 NeilBrown
  2025-07-21  7:59 ` [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21  7:59 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

Hi,

 these patches (against vfs.all) primarily introduce new APIs for
 preparing dentries for create, remove, rename.  The goal is to
 centralise knowledge of how we do locking (currently by locking the
 directory) so that we can eventually change the mechanism (e.g.  to
 locking just the dentry).

 Naming is difficult and I've changed my mind several times. :-)

 The basic approach is to return a dentry which can be passed to
 vfs_create(), vfs_unlink() etc, and subsequently to release that
 dentry.  The closest analogue to this in the VFS is kern_path_create()
 which is paired with done_path_create(), though there is also
 kern_path_locked() which is paired with explicit inode_unlock() and
 dput().  So my current approach uses "done_" for finishing up.

 I have:
   dentry_lookup() dentry_lookup_noperm() dentry_lookup_hashed()
   dentry_lookup_killable()
 paired with
   done_dentry_lookup()

 and also
   rename_lookup() rename_lookup_noperm() rename_lookup_hashed()
 paired with
   done_rename_lookup()
 (these take a "struct renamedata *" to which some qstrs are added.

 There is also "dentry_lock_in()" which is used instead of
 dentry_lookup() when you already have the dentry and want to lock it.
 So you "lock" it "in" a given parent.  I'm not very proud of this name,
 but I don't want to use "dentry_lock" as I want to save that for
 low-level locking primitives.

 There is also done_dentry_lookup_return() which doesn't dput() the
 dentry but returns it instread.  In about 1/6 of places where I need
 done_dentry_lookup() the code makes use of the dentry afterwards.  Only
 in half the places where done_dentry_lookup_return() is used is the
 returned value immediately returned by the calling function.  I could
 do a dget() before done_dentry_lookup(), but that looks awkward and I
 think having the _return version is justified.  I'm happy to hear other
 opinions.

 In order for this dentry-focussed API to work we need to have the
 dentry to unlock.  vfs_rmdir() currently consumes the dentry on
 failure, so we don't have it unless we clumsily keep a copy.  So an
 early patch changes vfs_rmdir() to both consume the dentry and drop the
 lock on failure.

 After these new APIs are refined, agreed, and applied I will have a
 collection of patches to roll them out throughout the kernel.  Then we
 can start/continue discussing a new approach to locking which allows
 directory operations to proceed in parallel.

 If you want a sneak peek at some of this future work - for context
 mostly - my current devel code is at https://github.com/neilbrown/linux.git
 in a branch "pdirops".  Be warned that a lot of the later code is under
 development, is known to be wrong, and doesn't even compile.  Not today
 anyway.  The rolling out of the new APIs is fairly mature though.

 Please review and suggest better names, or tell me that my choices are adequate.
 And find the bugs in the code too :-)

 I haven't cc:ed the maintains of the non-VFS code that the patches
 touch.  I can do that once the approach and names have been approved.

Thanks,
NeilBrown

 [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
 [PATCH 2/7] VFS: introduce done_dentry_lookup()
 [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
 [PATCH 4/7] VFS: introduce dentry_lookup() and friends
 [PATCH 5/7] VFS: add dentry_lookup_killable()
 [PATCH 6/7] VFS: add rename_lookup()
 [PATCH 7/7] VFS: introduce dentry_lock_in()

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
@ 2025-07-21  7:59 ` NeilBrown
  2025-07-21 13:06   ` Jeff Layton
  2025-07-21  7:59 ` [PATCH 2/7] VFS: introduce done_dentry_lookup() NeilBrown
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2025-07-21  7:59 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

A rename can only rename with a single mount.  Callers of vfs_rename()
must and do ensure this is the case.

So there is no point in having two mnt_idmaps in renamedata as they are
always the same.  Only of of them is passed to ->rename in any case.

This patch replaces both with a single "mnt_idmap" and changes all
callers.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/cachefiles/namei.c    |  3 +--
 fs/ecryptfs/inode.c      |  3 +--
 fs/namei.c               | 17 ++++++++---------
 fs/nfsd/vfs.c            |  3 +--
 fs/overlayfs/overlayfs.h |  3 +--
 fs/smb/server/vfs.c      |  3 +--
 include/linux/fs.h       |  6 ++----
 7 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 91dfd0231877..d1edb2ac3837 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -387,10 +387,9 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
 		cachefiles_io_error(cache, "Rename security error %d", ret);
 	} else {
 		struct renamedata rd = {
-			.old_mnt_idmap	= &nop_mnt_idmap,
+			.mnt_idmap	= &nop_mnt_idmap,
 			.old_parent	= dir,
 			.old_dentry	= rep,
-			.new_mnt_idmap	= &nop_mnt_idmap,
 			.new_parent	= cache->graveyard,
 			.new_dentry	= grave,
 		};
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..abd954c6a14e 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -634,10 +634,9 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		goto out_lock;
 	}
 
-	rd.old_mnt_idmap	= &nop_mnt_idmap;
+	rd.mnt_idmap		= &nop_mnt_idmap;
 	rd.old_parent		= lower_old_dir_dentry;
 	rd.old_dentry		= lower_old_dentry;
-	rd.new_mnt_idmap	= &nop_mnt_idmap;
 	rd.new_parent		= lower_new_dir_dentry;
 	rd.new_dentry		= lower_new_dentry;
 	rc = vfs_rename(&rd);
diff --git a/fs/namei.c b/fs/namei.c
index cd43ff89fbaa..1c80445693d4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5024,20 +5024,20 @@ int vfs_rename(struct renamedata *rd)
 	if (source == target)
 		return 0;
 
-	error = may_delete(rd->old_mnt_idmap, old_dir, old_dentry, is_dir);
+	error = may_delete(rd->mnt_idmap, old_dir, old_dentry, is_dir);
 	if (error)
 		return error;
 
 	if (!target) {
-		error = may_create(rd->new_mnt_idmap, new_dir, new_dentry);
+		error = may_create(rd->mnt_idmap, new_dir, new_dentry);
 	} else {
 		new_is_dir = d_is_dir(new_dentry);
 
 		if (!(flags & RENAME_EXCHANGE))
-			error = may_delete(rd->new_mnt_idmap, new_dir,
+			error = may_delete(rd->mnt_idmap, new_dir,
 					   new_dentry, is_dir);
 		else
-			error = may_delete(rd->new_mnt_idmap, new_dir,
+			error = may_delete(rd->mnt_idmap, new_dir,
 					   new_dentry, new_is_dir);
 	}
 	if (error)
@@ -5052,13 +5052,13 @@ int vfs_rename(struct renamedata *rd)
 	 */
 	if (new_dir != old_dir) {
 		if (is_dir) {
-			error = inode_permission(rd->old_mnt_idmap, source,
+			error = inode_permission(rd->mnt_idmap, source,
 						 MAY_WRITE);
 			if (error)
 				return error;
 		}
 		if ((flags & RENAME_EXCHANGE) && new_is_dir) {
-			error = inode_permission(rd->new_mnt_idmap, target,
+			error = inode_permission(rd->mnt_idmap, target,
 						 MAY_WRITE);
 			if (error)
 				return error;
@@ -5126,7 +5126,7 @@ int vfs_rename(struct renamedata *rd)
 		if (error)
 			goto out;
 	}
-	error = old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry,
+	error = old_dir->i_op->rename(rd->mnt_idmap, old_dir, old_dentry,
 				      new_dir, new_dentry, flags);
 	if (error)
 		goto out;
@@ -5269,10 +5269,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 
 	rd.old_parent	   = old_path.dentry;
 	rd.old_dentry	   = old_dentry;
-	rd.old_mnt_idmap   = mnt_idmap(old_path.mnt);
+	rd.mnt_idmap	   = mnt_idmap(old_path.mnt);
 	rd.new_parent	   = new_path.dentry;
 	rd.new_dentry	   = new_dentry;
-	rd.new_mnt_idmap   = mnt_idmap(new_path.mnt);
 	rd.delegated_inode = &delegated_inode;
 	rd.flags	   = flags;
 	error = vfs_rename(&rd);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7d522e426b2d..a21940cadede 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1940,10 +1940,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		goto out_dput_old;
 	} else {
 		struct renamedata rd = {
-			.old_mnt_idmap	= &nop_mnt_idmap,
+			.mnt_idmap	= &nop_mnt_idmap,
 			.old_parent	= fdentry,
 			.old_dentry	= odentry,
-			.new_mnt_idmap	= &nop_mnt_idmap,
 			.new_parent	= tdentry,
 			.new_dentry	= ndentry,
 		};
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index bb0d7ded8e76..4f84abaa0d68 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -361,10 +361,9 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
 {
 	int err;
 	struct renamedata rd = {
-		.old_mnt_idmap	= ovl_upper_mnt_idmap(ofs),
+		.mnt_idmap	= ovl_upper_mnt_idmap(ofs),
 		.old_parent	= olddir,
 		.old_dentry	= olddentry,
-		.new_mnt_idmap	= ovl_upper_mnt_idmap(ofs),
 		.new_parent	= newdir,
 		.new_dentry	= newdentry,
 		.flags		= flags,
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 49e731dd0529..bfd62a21e75c 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -770,10 +770,9 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
 		goto out4;
 	}
 
-	rd.old_mnt_idmap	= mnt_idmap(old_path->mnt),
+	rd.mnt_idmap		= mnt_idmap(old_path->mnt),
 	rd.old_parent		= old_parent,
 	rd.old_dentry		= old_child,
-	rd.new_mnt_idmap	= mnt_idmap(new_path.mnt),
 	rd.new_parent		= new_path.dentry,
 	rd.new_dentry		= new_dentry,
 	rd.flags		= flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1948b2c828d3..d3e27da8a6aa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2005,20 +2005,18 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
 
 /**
  * struct renamedata - contains all information required for renaming
- * @old_mnt_idmap:     idmap of the old mount the inode was found from
+ * @mnt_idmap:     idmap of the mount in which the rename is happening.
  * @old_parent:        parent of source
  * @old_dentry:                source
- * @new_mnt_idmap:     idmap of the new mount the inode was found from
  * @new_parent:        parent of destination
  * @new_dentry:                destination
  * @delegated_inode:   returns an inode needing a delegation break
  * @flags:             rename flags
  */
 struct renamedata {
-	struct mnt_idmap *old_mnt_idmap;
+	struct mnt_idmap *mnt_idmap;
 	struct dentry *old_parent;
 	struct dentry *old_dentry;
-	struct mnt_idmap *new_mnt_idmap;
 	struct dentry *new_parent;
 	struct dentry *new_dentry;
 	struct inode **delegated_inode;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] VFS: introduce done_dentry_lookup()
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
  2025-07-21  7:59 ` [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
@ 2025-07-21  7:59 ` NeilBrown
  2025-07-21 13:39   ` Jeff Layton
  2025-07-21  7:59 ` [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure NeilBrown
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2025-07-21  7:59 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

done_dentry_lookup() is the first step in introducing a new API for
locked operation on names in directories - those that create or remove
names.  Rename operations will also be part of this API but will
use separate interfaces.

The plan is to lock just the dentry (or dentries), not the whole
directory.  A "dentry_lookup()" operation will perform the locking and
lookup with a corresponding "done_dentry_lookup()" releasing the
resulting dentry and dropping any locks.

This done_dentry_lookup() can immediately be used to complete updates
started with kern_path_locked() (much as done_path_create() already
completes operations started with kern_path_create()).

So this patch adds done_dentry_lookup() and uses it where
kern_path_locked() is used.  It also adds done_dentry_lookup_return()
which returns a reference to the dentry rather than dropping it.  This
is a less common need in existing code, but still worth its own interface.

Signed-off-by: NeilBrown <neil@brown.name>
---
 drivers/base/devtmpfs.c |  7 ++-----
 fs/bcachefs/fs-ioctl.c  |  3 +--
 fs/namei.c              | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/namei.h   |  3 +++
 kernel/audit_fsnotify.c |  9 ++++-----
 kernel/audit_watch.c    |  3 +--
 6 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 31bfb3194b4c..47bee8166c8d 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -265,8 +265,7 @@ static int dev_rmdir(const char *name)
 	else
 		err = -EPERM;
 
-	dput(dentry);
-	inode_unlock(d_inode(parent.dentry));
+	done_dentry_lookup(dentry);
 	path_put(&parent);
 	return err;
 }
@@ -349,9 +348,7 @@ static int handle_remove(const char *nodename, struct device *dev)
 		if (!err || err == -ENOENT)
 			deleted = 1;
 	}
-	dput(dentry);
-	inode_unlock(d_inode(parent.dentry));
-
+	done_dentry_lookup(dentry);
 	path_put(&parent);
 	if (deleted && strchr(nodename, '/'))
 		delete_path(nodename);
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 4e72e654da96..8077ddf4ddc4 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -351,8 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
 		d_invalidate(victim);
 	}
 err:
-	inode_unlock(dir);
-	dput(victim);
+	done_dentry_lookup(victim);
 	path_put(&path);
 	return ret;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 1c80445693d4..da160a01e23d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1714,6 +1714,44 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 }
 EXPORT_SYMBOL(lookup_one_qstr_excl);
 
+/**
+ * done_dentry_lookup - finish a lookup used for create/delete
+ * @dentry:  the target dentry
+ *
+ * After locking the directory and lookup or validating a dentry
+ * an attempt can be made to create (including link) or remove (including
+ * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
+ * unlock the parent directory and dput() the dentry.
+ *
+ * This interface allows a smooth transition from parent-dir based
+ * locking to dentry based locking.
+ */
+void done_dentry_lookup(struct dentry *dentry)
+{
+	inode_unlock(dentry->d_parent->d_inode);
+	dput(dentry);
+}
+EXPORT_SYMBOL(done_dentry_lookup);
+
+/**
+ * done_dentry_lookup_return - finish a lookup sequence, returning the dentry
+ * @dentry:  the target dentry
+ *
+ * After locking the directory and lookup or validating a dentry
+ * an attempt can be made to create (including link) or remove (including
+ * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
+ * unlock the parent directory.  The dentry is returned for further use.
+ *
+ * This interface allows a smooth transition from parent-dir based
+ * locking to dentry based locking.
+ */
+struct dentry *done_dentry_lookup_return(struct dentry *dentry)
+{
+	inode_unlock(dentry->d_parent->d_inode);
+	return dentry;
+}
+EXPORT_SYMBOL(done_dentry_lookup_return);
+
 /**
  * lookup_fast - do fast lockless (but racy) lookup of a dentry
  * @nd: current nameidata
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5d085428e471..e097f11587c9 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -81,6 +81,9 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
 					    struct qstr *name,
 					    struct dentry *base);
 
+void done_dentry_lookup(struct dentry *dentry);
+struct dentry *done_dentry_lookup_return(struct dentry *dentry);
+
 extern int follow_down_one(struct path *);
 extern int follow_down(struct path *path, unsigned int flags);
 extern int follow_up(struct path *);
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index c565fbf66ac8..170836c3520f 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -85,8 +85,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 	dentry = kern_path_locked(pathname, &path);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry); /* returning an error */
-	inode = path.dentry->d_inode;
-	inode_unlock(inode);
+	inode = igrab(dentry->d_inode);
+	done_dentry_lookup(dentry);
 
 	audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
 	if (unlikely(!audit_mark)) {
@@ -97,17 +97,16 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group);
 	audit_mark->mark.mask = AUDIT_FS_EVENTS;
 	audit_mark->path = pathname;
-	audit_update_mark(audit_mark, dentry->d_inode);
+	audit_update_mark(audit_mark, inode);
 	audit_mark->rule = krule;
 
-	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
+	ret = fsnotify_add_inode_mark(&audit_mark->mark, path.dentry->d_inode, 0);
 	if (ret < 0) {
 		audit_mark->path = NULL;
 		fsnotify_put_mark(&audit_mark->mark);
 		audit_mark = ERR_PTR(ret);
 	}
 out:
-	dput(dentry);
 	path_put(&path);
 	return audit_mark;
 }
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 0ebbbe37a60f..f6ecac2109d4 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -359,8 +359,7 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
 		watch->ino = d_backing_inode(d)->i_ino;
 	}
 
-	inode_unlock(d_backing_inode(parent->dentry));
-	dput(d);
+	done_dentry_lookup(d);
 	return 0;
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
  2025-07-21  7:59 ` [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
  2025-07-21  7:59 ` [PATCH 2/7] VFS: introduce done_dentry_lookup() NeilBrown
@ 2025-07-21  7:59 ` NeilBrown
  2025-07-21 14:46   ` Jeff Layton
  2025-07-21  8:00 ` [PATCH 4/7] VFS: introduce dentry_lookup() and friends NeilBrown
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2025-07-21  7:59 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

Proposed changes to directory-op locking will lock the dentry rather
than the whole directory.  So the dentry will need to be unlocked.

vfs_mkdir() consumes the dentry on error, so there will be no dentry to
be unlocked.

So this patch changes vfs_mkdir() to unlock on error as well as
releasing the dentry.  This requires various other functions in various
callers to also unlock on error.

At present this results in some clumsy code.  Once the transition to
dentry locking is complete the clumsiness will be gone.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/cachefiles/namei.c    |  9 +++++----
 fs/ecryptfs/inode.c      |  3 ++-
 fs/namei.c               | 24 ++++++++++++++++--------
 fs/nfsd/nfs4recover.c    | 12 +++++-------
 fs/nfsd/vfs.c            | 12 ++++++++++--
 fs/overlayfs/dir.c       | 13 +++++++------
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/super.c     |  5 +++--
 fs/xfs/scrub/orphanage.c |  2 +-
 9 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d1edb2ac3837..732d78911bed 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -131,8 +131,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 		ret = cachefiles_inject_write_error();
 		if (ret == 0)
 			subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
-		else
+		else {
+			/* vfs_mkdir() unlocks on failure so we must too */
+			inode_unlock(d_inode(dir));
 			subdir = ERR_PTR(ret);
+		}
 		if (IS_ERR(subdir)) {
 			trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
 						   cachefiles_trace_mkdir_error);
@@ -196,9 +199,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 	return ERR_PTR(-EBUSY);
 
 mkdir_error:
-	inode_unlock(d_inode(dir));
-	if (!IS_ERR(subdir))
-		dput(subdir);
+	done_dentry_lookup(subdir);
 	pr_err("mkdir %s failed with error %d\n", dirname, ret);
 	return ERR_PTR(ret);
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index abd954c6a14e..5d8cb042aa57 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -520,7 +520,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 				 lower_dentry, mode);
 	rc = PTR_ERR(lower_dentry);
 	if (IS_ERR(lower_dentry))
-		goto out;
+		goto out_unlocked;
 	rc = 0;
 	if (d_unhashed(lower_dentry))
 		goto out;
@@ -532,6 +532,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 	set_nlink(dir, lower_dir->i_nlink);
 out:
 	inode_unlock(lower_dir);
+out_unlocked:
 	if (d_really_is_negative(dentry))
 		d_drop(dentry);
 	return ERR_PTR(rc);
diff --git a/fs/namei.c b/fs/namei.c
index da160a01e23d..950a0d0d54da 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1723,13 +1723,18 @@ EXPORT_SYMBOL(lookup_one_qstr_excl);
  * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
  * unlock the parent directory and dput() the dentry.
  *
+ * If the dentry is an error - as can happen after vfs_mkdir() -
+ * the unlock is skipped as unneeded.
+ *
  * This interface allows a smooth transition from parent-dir based
  * locking to dentry based locking.
  */
 void done_dentry_lookup(struct dentry *dentry)
 {
-	inode_unlock(dentry->d_parent->d_inode);
-	dput(dentry);
+	if (!IS_ERR(dentry)) {
+		inode_unlock(dentry->d_parent->d_inode);
+		dput(dentry);
+	}
 }
 EXPORT_SYMBOL(done_dentry_lookup);
 
@@ -1742,12 +1747,16 @@ EXPORT_SYMBOL(done_dentry_lookup);
  * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
  * unlock the parent directory.  The dentry is returned for further use.
  *
+ * If the dentry is an error - as can happen after vfs_mkdir() -
+ * the unlock is skipped as unneeded.
+ *
  * This interface allows a smooth transition from parent-dir based
  * locking to dentry based locking.
  */
 struct dentry *done_dentry_lookup_return(struct dentry *dentry)
 {
-	inode_unlock(dentry->d_parent->d_inode);
+	if (!IS_ERR(dentry))
+		inode_unlock(dentry->d_parent->d_inode);
 	return dentry;
 }
 EXPORT_SYMBOL(done_dentry_lookup_return);
@@ -4210,9 +4219,7 @@ EXPORT_SYMBOL(kern_path_create);
 
 void done_path_create(struct path *path, struct dentry *dentry)
 {
-	if (!IS_ERR(dentry))
-		dput(dentry);
-	inode_unlock(path->dentry->d_inode);
+	done_dentry_lookup(dentry);
 	mnt_drop_write(path->mnt);
 	path_put(path);
 }
@@ -4375,7 +4382,8 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
  * negative or unhashes it and possibly splices a different one returning it,
  * the original dentry is dput() and the alternate is returned.
  *
- * In case of an error the dentry is dput() and an ERR_PTR() is returned.
+ * In case of an error the dentry is dput(), the parent is unlocked, and
+ * an ERR_PTR() is returned.
  */
 struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 			 struct dentry *dentry, umode_t mode)
@@ -4413,7 +4421,7 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 	return dentry;
 
 err:
-	dput(dentry);
+	done_dentry_lookup(dentry);
 	return ERR_PTR(error);
 }
 EXPORT_SYMBOL(vfs_mkdir);
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 82785db730d9..693fa95fa678 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -222,7 +222,8 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 	dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
 	if (IS_ERR(dentry)) {
 		status = PTR_ERR(dentry);
-		goto out_unlock;
+		inode_unlock(d_inode(dir));
+		goto out;
 	}
 	if (d_really_is_positive(dentry))
 		/*
@@ -233,15 +234,12 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 		 * In the 4.0 case, we should never get here; but we may
 		 * as well be forgiving and just succeed silently.
 		 */
-		goto out_put;
+		goto out;
 	dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
 	if (IS_ERR(dentry))
 		status = PTR_ERR(dentry);
-out_put:
-	if (!status)
-		dput(dentry);
-out_unlock:
-	inode_unlock(d_inode(dir));
+out:
+	done_dentry_lookup(dentry);
 	if (status == 0) {
 		if (nn->in_grace)
 			__nfsd4_create_reclaim_record_grace(clp, dname,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index a21940cadede..e85195e858a2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1489,7 +1489,9 @@ nfsd_check_ignore_resizing(struct iattr *iap)
 		iap->ia_valid &= ~ATTR_SIZE;
 }
 
-/* The parent directory should already be locked: */
+/* The parent directory should already be locked.  The lock
+ * will be dropped on error.
+ */
 __be32
 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   struct nfsd_attrs *attrs,
@@ -1555,8 +1557,11 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
 
 out:
-	if (!IS_ERR(dchild))
+	if (!IS_ERR(dchild)) {
+		if (err)
+			inode_unlock(dirp);
 		dput(dchild);
+	}
 	return err;
 
 out_nfserr:
@@ -1613,6 +1618,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err != nfs_ok)
 		goto out_unlock;
 	err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
+	if (err)
+		/* lock will have been dropped */
+		return err;
 	fh_fill_post_attrs(fhp);
 out_unlock:
 	inode_unlock(dentry->d_inode);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 30619777f0f6..74b52595ea0e 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -161,14 +161,17 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
 	goto out;
 }
 
+/* dir will be unlocked on return */
 struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
 			       struct dentry *newdentry, struct ovl_cattr *attr)
 {
 	struct inode *dir = parent->d_inode;
 	int err;
 
-	if (IS_ERR(newdentry))
+	if (IS_ERR(newdentry)) {
+		inode_unlock(dir);
 		return newdentry;
+	}
 
 	err = -ESTALE;
 	if (newdentry->d_inode)
@@ -213,11 +216,11 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
 	}
 out:
 	if (err) {
-		if (!IS_ERR(newdentry))
-			dput(newdentry);
+		done_dentry_lookup(newdentry);
 		return ERR_PTR(err);
+	} else {
+		return done_dentry_lookup_return(newdentry);
 	}
-	return newdentry;
 }
 
 struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
@@ -227,7 +230,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
 	inode_lock(workdir->d_inode);
 	ret = ovl_create_real(ofs, workdir,
 			      ovl_lookup_temp(ofs, workdir), attr);
-	inode_unlock(workdir->d_inode);
 	return ret;
 }
 
@@ -335,7 +337,6 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 				    ovl_lookup_upper(ofs, dentry->d_name.name,
 						     upperdir, dentry->d_name.len),
 				    attr);
-	inode_unlock(udir);
 	if (IS_ERR(newdentry))
 		return PTR_ERR(newdentry);
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 4f84abaa0d68..238c26142318 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -250,6 +250,7 @@ static inline struct dentry *ovl_do_mkdir(struct ovl_fs *ofs,
 
 	ret = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode);
 	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(ret));
+	/* Note: dir will have been unlocked on failure */
 	return ret;
 }
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4afa91882075..df99a6fa17ef 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -328,11 +328,11 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 		}
 
 		work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
-		inode_unlock(dir);
 		err = PTR_ERR(work);
 		if (IS_ERR(work))
 			goto out_err;
 
+		done_dentry_lookup_return(work);
 		/* Weird filesystem returning with hashed negative (kernfs)? */
 		err = -EINVAL;
 		if (d_really_is_negative(work))
@@ -623,7 +623,8 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
 	child = ovl_lookup_upper(ofs, name, parent, len);
 	if (!IS_ERR(child) && !child->d_inode)
 		child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
-	inode_unlock(parent->d_inode);
+	else
+		inode_unlock(parent->d_inode);
 	dput(parent);
 
 	return child;
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 9c12cb844231..c95bded4e8a7 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -170,7 +170,7 @@ xrep_orphanage_create(
 					     orphanage_dentry, 0750);
 		error = PTR_ERR(orphanage_dentry);
 		if (IS_ERR(orphanage_dentry))
-			goto out_unlock_root;
+			goto out_dput_root;
 	}
 
 	/* Not a directory? Bail out. */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] VFS: introduce dentry_lookup() and friends
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
                   ` (2 preceding siblings ...)
  2025-07-21  7:59 ` [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure NeilBrown
@ 2025-07-21  8:00 ` NeilBrown
  2025-07-21 10:20   ` Amir Goldstein
  2025-07-21  8:00 ` [PATCH 5/7] VFS: add dentry_lookup_killable() NeilBrown
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2025-07-21  8:00 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

dentry_lookup() combines locking the directory and performing a lookup
prior to a change to the directory.
Abstracting this prepares for changing the locking requirements.

dentry_lookup_noperm() does the same without needing a mnt_idmap and
without checking permissions.  This is useful for internal filesystem
management (e.g.  creating virtual files in response to events) and in
other cases similar to lookup_noperm().

dentry_lookup_hashed() also does no permissions checking and assumes
that the hash of the name has already been stored in the qstr.  This is
useful following filename_parentat().

These are intended to be paired with done_dentry_lookup() which provides
the inverse of putting the dentry and unlocking.

Like lookup_one_qstr_excl(), dentry_lookup() returns -ENOENT if
LOOKUP_CREATE was NOT given and the name cannot be found,, and returns
-EEXIST if LOOKUP_EXCL WAS given and the name CAN be found.

These functions replace all uses of lookup_one_qstr_excl() in namei.c
except for those used for rename.

Some of the variants should possibly be inlines in a header.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 158 ++++++++++++++++++++++++++++++------------
 include/linux/namei.h |   8 ++-
 2 files changed, 119 insertions(+), 47 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 950a0d0d54da..f292df61565a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1714,17 +1714,98 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 }
 EXPORT_SYMBOL(lookup_one_qstr_excl);
 
+/**
+ * dentry_lookup_hashed - lookup and lock a name prior to dir ops
+ * @last: the name in the given directory
+ * @base: the directory in which the name is to be found
+ * @lookup_flags: %LOOKUP_xxx flags
+ *
+ * The name is looked up and necessary locks are taken so that
+ * the name can be created or removed.
+ * The "necessary locks" are currently the inode node lock on @base.
+ * The name @last is expected to already have the hash calculated.
+ * No permission checks are performed.
+ * Returns: the dentry, suitably locked, or an ERR_PTR().
+ */
+struct dentry *dentry_lookup_hashed(struct qstr *last,
+				    struct dentry *base,
+				    unsigned int lookup_flags)
+{
+	struct dentry *dentry;
+
+	inode_lock_nested(base->d_inode, I_MUTEX_PARENT);
+
+	dentry = lookup_one_qstr_excl(last, base, lookup_flags);
+	if (IS_ERR(dentry))
+		inode_unlock(base->d_inode);
+	return dentry;
+}
+EXPORT_SYMBOL(dentry_lookup_hashed);
+
+static int lookup_noperm_common(struct qstr *qname, struct dentry *base);
+static int lookup_one_common(struct mnt_idmap *idmap,
+			     struct qstr *qname, struct dentry *base);
+
+/**
+ * dentry_lookup_noperm - lookup and lock a name prior to dir ops
+ * @last: the name in the given directory
+ * @base: the directory in which the name is to be found
+ * @lookup_flags: %LOOKUP_xxx flags
+ *
+ * The name is looked up and necessary locks are taken so that
+ * the name can be created or removed.
+ * The "necessary locks" are currently the inode node lock on @base.
+ * The name @last is NOT expected to have the hash calculated.
+ * No permission checks are performed.
+ * Returns: the dentry, suitably locked, or an ERR_PTR().
+ */
+struct dentry *dentry_lookup_noperm(struct qstr *last,
+				    struct dentry *base,
+				    unsigned int lookup_flags)
+{
+	int err;
+
+	err = lookup_noperm_common(last, base);
+	if (err < 0)
+		return ERR_PTR(err);
+	return dentry_lookup_hashed(last, base, lookup_flags);
+}
+EXPORT_SYMBOL(dentry_lookup_noperm);
+
+/**
+ * dentry_lookup - lookup and lock a name prior to dir ops
+ * @last: the name in the given directory
+ * @base: the directory in which the name is to be found
+ * @lookup_flags: %LOOKUP_xxx flags
+ *
+ * The name is looked up and necessary locks are taken so that
+ * the name can be created or removed.
+ * The "necessary locks" are currently the inode node lock on @base.
+ * The name @last is NOT expected to already have the hash calculated.
+ * Permission checks are performed to ensure %MAY_EXEC access to @base.
+ * Returns: the dentry, suitably locked, or an ERR_PTR().
+ */
+struct dentry *dentry_lookup(struct mnt_idmap *idmap,
+			     struct qstr *last,
+			     struct dentry *base,
+			     unsigned int lookup_flags)
+{
+	int err;
+
+	err = lookup_one_common(idmap, last, base);
+	if (err < 0)
+		return ERR_PTR(err);
+	return dentry_lookup_hashed(last, base, lookup_flags);
+}
+EXPORT_SYMBOL(dentry_lookup);
+
 /**
  * done_dentry_lookup - finish a lookup used for create/delete
  * @dentry:  the target dentry
  *
- * After locking the directory and lookup or validating a dentry
- * an attempt can be made to create (including link) or remove (including
- * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
- * unlock the parent directory and dput() the dentry.
- *
- * If the dentry is an error - as can happen after vfs_mkdir() -
- * the unlock is skipped as unneeded.
+ * Reverse the effects of dentry_lookup() or similar.  If the
+ * @dentry is not an error, the lock and the reference to the dentry
+ * are dropped.
  *
  * This interface allows a smooth transition from parent-dir based
  * locking to dentry based locking.
@@ -1733,6 +1814,7 @@ void done_dentry_lookup(struct dentry *dentry)
 {
 	if (!IS_ERR(dentry)) {
 		inode_unlock(dentry->d_parent->d_inode);
+		d_lookup_done(dentry);
 		dput(dentry);
 	}
 }
@@ -1742,16 +1824,16 @@ EXPORT_SYMBOL(done_dentry_lookup);
  * done_dentry_lookup_return - finish a lookup sequence, returning the dentry
  * @dentry:  the target dentry
  *
- * After locking the directory and lookup or validating a dentry
- * an attempt can be made to create (including link) or remove (including
- * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
- * unlock the parent directory.  The dentry is returned for further use.
+ * Reverse the effects of dentry_lookup() or similar, but keep the dentry.
+ * If @dentry is not an error, the lock is dropped.
  *
  * If the dentry is an error - as can happen after vfs_mkdir() -
  * the unlock is skipped as unneeded.
  *
  * This interface allows a smooth transition from parent-dir based
  * locking to dentry based locking.
+ *
+ * Returns: the dentry.
  */
 struct dentry *done_dentry_lookup_return(struct dentry *dentry)
 {
@@ -2803,12 +2885,9 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 		return ERR_PTR(error);
 	if (unlikely(type != LAST_NORM))
 		return ERR_PTR(-EINVAL);
-	inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
-	d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
-	if (IS_ERR(d)) {
-		inode_unlock(parent_path.dentry->d_inode);
+	d = dentry_lookup_hashed(&last, parent_path.dentry, 0);
+	if (IS_ERR(d))
 		return d;
-	}
 	path->dentry = no_free_ptr(parent_path.dentry);
 	path->mnt = no_free_ptr(parent_path.mnt);
 	return d;
@@ -2827,12 +2906,9 @@ struct dentry *kern_path_locked_negative(const char *name, struct path *path)
 		return ERR_PTR(error);
 	if (unlikely(type != LAST_NORM))
 		return ERR_PTR(-EINVAL);
-	inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
-	d = lookup_one_qstr_excl(&last, parent_path.dentry, LOOKUP_CREATE);
-	if (IS_ERR(d)) {
-		inode_unlock(parent_path.dentry->d_inode);
+	d = dentry_lookup_hashed(&last, parent_path.dentry, LOOKUP_CREATE);
+	if (IS_ERR(d))
 		return d;
-	}
 	path->dentry = no_free_ptr(parent_path.dentry);
 	path->mnt = no_free_ptr(parent_path.mnt);
 	return d;
@@ -4161,7 +4237,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
 	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 	int type;
-	int err2;
 	int error;
 
 	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
@@ -4173,35 +4248,30 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	 * (foo/., foo/.., /////)
 	 */
 	if (unlikely(type != LAST_NORM))
-		goto out;
+		goto put;
 
 	/* don't fail immediately if it's r/o, at least try to report other errors */
-	err2 = mnt_want_write(path->mnt);
+	error = mnt_want_write(path->mnt);
 	/*
 	 * Do the final lookup.  Suppress 'create' if there is a trailing
 	 * '/', and a directory wasn't requested.
 	 */
 	if (last.name[last.len] && !want_dir)
 		create_flags &= ~LOOKUP_CREATE;
-	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
-	dentry = lookup_one_qstr_excl(&last, path->dentry,
-				      reval_flag | create_flags);
+	dentry = dentry_lookup_hashed(&last, path->dentry, reval_flag | create_flags);
 	if (IS_ERR(dentry))
-		goto unlock;
+		goto drop;
 
-	if (unlikely(err2)) {
-		error = err2;
+	if (unlikely(error))
 		goto fail;
-	}
 	return dentry;
 fail:
-	dput(dentry);
+	done_dentry_lookup(dentry);
 	dentry = ERR_PTR(error);
-unlock:
-	inode_unlock(path->dentry->d_inode);
-	if (!err2)
+drop:
+	if (!error)
 		mnt_drop_write(path->mnt);
-out:
+put:
 	path_put(path);
 	return dentry;
 }
@@ -4225,7 +4295,7 @@ void done_path_create(struct path *path, struct dentry *dentry)
 }
 EXPORT_SYMBOL(done_path_create);
 
-inline struct dentry *user_path_create(int dfd, const char __user *pathname,
+struct dentry *user_path_create(int dfd, const char __user *pathname,
 				struct path *path, unsigned int lookup_flags)
 {
 	struct filename *filename = getname(pathname);
@@ -4551,19 +4621,18 @@ int do_rmdir(int dfd, struct filename *name)
 	if (error)
 		goto exit2;
 
-	inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
-	dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+	dentry = dentry_lookup_hashed(&last, path.dentry, lookup_flags);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit3;
+
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit4;
 	error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
 exit4:
-	dput(dentry);
+	done_dentry_lookup(dentry);
 exit3:
-	inode_unlock(path.dentry->d_inode);
 	mnt_drop_write(path.mnt);
 exit2:
 	path_put(&path);
@@ -4680,11 +4749,9 @@ int do_unlinkat(int dfd, struct filename *name)
 	if (error)
 		goto exit2;
 retry_deleg:
-	inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
-	dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+	dentry = dentry_lookup_hashed(&last, path.dentry, lookup_flags);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
-
 		/* Why not before? Because we want correct error value */
 		if (last.name[last.len])
 			goto slashes;
@@ -4696,9 +4763,8 @@ int do_unlinkat(int dfd, struct filename *name)
 		error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
 				   dentry, &delegated_inode);
 exit3:
-		dput(dentry);
+		done_dentry_lookup(dentry);
 	}
-	inode_unlock(path.dentry->d_inode);
 	if (inode)
 		iput(inode);	/* truncate the inode here */
 	inode = NULL;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index e097f11587c9..6d2cf4af5f16 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -80,7 +80,13 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
 struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
 					    struct qstr *name,
 					    struct dentry *base);
-
+struct dentry *dentry_lookup(struct mnt_idmap *idmap,
+			       struct qstr *last, struct dentry *base,
+			       unsigned int lookup_flags);
+struct dentry *dentry_lookup_noperm(struct qstr *name, struct dentry *base,
+				      unsigned int lookup_flags);
+struct dentry *dentry_lookup_hashed(struct qstr *last, struct dentry *base,
+				      unsigned int lookup_flags);
 void done_dentry_lookup(struct dentry *dentry);
 struct dentry *done_dentry_lookup_return(struct dentry *dentry);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] VFS: add dentry_lookup_killable()
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
                   ` (3 preceding siblings ...)
  2025-07-21  8:00 ` [PATCH 4/7] VFS: introduce dentry_lookup() and friends NeilBrown
@ 2025-07-21  8:00 ` NeilBrown
  2025-07-21  8:00 ` [PATCH 6/7] VFS: add rename_lookup() NeilBrown
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21  8:00 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

btrfs/ioctl.c uses a "killable" lock on the directory when creating an
destroying subvols.  overlayfs also does this.

This patch adds dentry_lookup_killable() for these users.

Possibly all dentry_lookup should be killable as there is no down-side,
but that can come in a later patch.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/namei.h |  3 +++
 2 files changed, 40 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index f292df61565a..46657736c7a0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1799,6 +1799,43 @@ struct dentry *dentry_lookup(struct mnt_idmap *idmap,
 }
 EXPORT_SYMBOL(dentry_lookup);
 
+/**
+ * dentry_lookup_killable - lookup and lock a name prior to dir ops
+ * @last: the name in the given directory
+ * @base: the directory in which the name is to be found
+ * @lookup_flags: %LOOKUP_xxx flags
+ *
+ * The name is looked up and necessary locks are taken so that
+ * the name can be created or removed.
+ * The "necessary locks" are currently the inode node lock on @base.
+ * If a fatal signal arrives or is already pending the operation is aborted.
+ * The name @last is NOT expected to already have the hash calculated.
+ * Permission checks are performed to ensure %MAY_EXEC access to @base.
+ * Returns: the dentry, suitably locked, or an ERR_PTR().
+ */
+struct dentry *dentry_lookup_killable(struct mnt_idmap *idmap,
+				      struct qstr *last,
+				      struct dentry *base,
+				      unsigned int lookup_flags)
+{
+	struct dentry *dentry;
+	int err;
+
+	err = lookup_one_common(idmap, last, base);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	err = down_write_killable_nested(&base->d_inode->i_rwsem, I_MUTEX_PARENT);
+	if (err)
+		return ERR_PTR(err);
+
+	dentry = lookup_one_qstr_excl(last, base, lookup_flags);
+	if (IS_ERR(dentry))
+		inode_unlock(base->d_inode);
+	return dentry;
+}
+EXPORT_SYMBOL(dentry_lookup_killable);
+
 /**
  * done_dentry_lookup - finish a lookup used for create/delete
  * @dentry:  the target dentry
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 6d2cf4af5f16..8eade32623d8 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -83,6 +83,9 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
 struct dentry *dentry_lookup(struct mnt_idmap *idmap,
 			       struct qstr *last, struct dentry *base,
 			       unsigned int lookup_flags);
+struct dentry *dentry_lookup_killable(struct mnt_idmap *idmap,
+				      struct qstr *last, struct dentry *base,
+				      unsigned int lookup_flags);
 struct dentry *dentry_lookup_noperm(struct qstr *name, struct dentry *base,
 				      unsigned int lookup_flags);
 struct dentry *dentry_lookup_hashed(struct qstr *last, struct dentry *base,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] VFS: add rename_lookup()
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
                   ` (4 preceding siblings ...)
  2025-07-21  8:00 ` [PATCH 5/7] VFS: add dentry_lookup_killable() NeilBrown
@ 2025-07-21  8:00 ` NeilBrown
  2025-07-21  8:00 ` [PATCH 7/7] VFS: introduce dentry_lock_in() NeilBrown
  2025-07-21 11:32 ` [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations Amir Goldstein
  7 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21  8:00 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

rename_lookup() combines lookup and locking for a rename.

Two names - new_last and old_last - are added to struct renamedata so it
can be passed to rename_lookup() to have the old and new dentries filled
in.

rename_lookup_hashed() assumes that the names are already hashed
and skips permission checking.  This is appropriate for use after
filename_parentat().

rename_lookup_noperm() does hash the name but avoids permission
checking.  This will be used by debugfs.

If either old_dentry or new_dentry are not NULL, the corresponding
"last" is ignored and the dentry is used as-is.  After locks are
obtained we check that the parent is still correct.  If old_parent
was not given, then it is set to the parent of old_dentry which
was locked.  new_parent must never be NULL.

done_rename_lookup() unlocks and drops any dentry references taken.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 316 ++++++++++++++++++++++++++++++++++--------
 include/linux/fs.h    |   4 +
 include/linux/namei.h |   4 +
 3 files changed, 263 insertions(+), 61 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 46657736c7a0..ae8079916ac6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3490,6 +3490,233 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 }
 EXPORT_SYMBOL(unlock_rename);
 
+/**
+ * rename_lookup_hashed - lookup and lock names for rename
+ * @rd:           rename data containing relevant details
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ *                LOOKUP_NO_SYMLINKS etc).
+ *
+ * Optionally look up two names and ensure locks are in place for
+ * rename.
+ * Normally @rd.old_dentry and @rd.new_dentry are %NULL and the
+ * old and new directories and last names are given in @rd.  In this
+ * case the names are looked up with appropriate locking and the
+ * results stored in @rd.old_dentry and @rd.new_dentry.
+ *
+ * If either are not NULL, then the corresponding lookup is avoided
+ * but the required locks are still taken.  In this case @rd.old_parent
+ * may be %NULL, otherwise @rd.old_dentry must have that as its d_parent
+ * pointer after the locks are obtained.  @rd.new_parent must always
+ * be non-NULL, and must always be the correct parent after locking.
+ *
+ * On success a reference is held on @rd.old_dentry, @rd.new_dentry,
+ * and @rd.old_parent whether they were originally %NULL or not.  These
+ * references are dropped by done_rename_lookup().  @rd.new_parent
+ * must always be non-NULL and no extra reference is taken.
+ *
+ * The passed in qstrs must have the hash calculated, and no permission
+ * checking is performed.
+ *
+ * Returns: zero or an error.
+ */
+int
+rename_lookup_hashed(struct renamedata *rd, int lookup_flags)
+{
+	struct dentry *p;
+	struct dentry *d1, *d2;
+	int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+	int err;
+
+	if (rd->flags & RENAME_EXCHANGE)
+		target_flags = 0;
+	if (rd->flags & RENAME_NOREPLACE)
+		target_flags |= LOOKUP_EXCL;
+
+	if (rd->old_dentry) {
+		/* Already have the dentry - need to be sure to lock the correct parent */
+		p = lock_rename_child(rd->old_dentry, rd->new_parent);
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+		if (d_unhashed(rd->old_dentry) ||
+		    (rd->old_parent && rd->old_parent != rd->old_dentry->d_parent)) {
+			/* dentry was removed, or moved and explicit parent requested */
+			unlock_rename(rd->old_dentry->d_parent, rd->new_parent);
+			return -EINVAL;
+		}
+		rd->old_parent = dget(rd->old_dentry->d_parent);
+		d1 = dget(rd->old_dentry);
+	} else {
+		p = lock_rename(rd->old_parent, rd->new_parent);
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+		dget(rd->old_parent);
+
+		d1 = lookup_one_qstr_excl(&rd->old_last, rd->old_parent,
+					  lookup_flags);
+		if (IS_ERR(d1))
+			goto out_unlock_1;
+	}
+	if (rd->new_dentry) {
+		if (d_unhashed(rd->new_dentry) ||
+		    rd->new_dentry->d_parent != rd->new_parent) {
+			/* new_dentry was moved or removed! */
+			goto out_unlock_2;
+		}
+		d2 = dget(rd->new_dentry);
+	} else {
+		d2 = lookup_one_qstr_excl(&rd->new_last, rd->new_parent,
+					  lookup_flags | target_flags);
+		if (IS_ERR(d2))
+			goto out_unlock_2;
+	}
+
+	if (d1 == p) {
+		/* source is an ancestor of target */
+		err = -EINVAL;
+		goto out_unlock_3;
+	}
+
+	if (d2 == p) {
+		/* target is an ancestor of source */
+		if (rd->flags & RENAME_EXCHANGE)
+			err = -EINVAL;
+		else
+			err = -ENOTEMPTY;
+		goto out_unlock_3;
+	}
+
+	rd->old_dentry = d1;
+	rd->new_dentry = d2;
+	return 0;
+
+out_unlock_3:
+	d_lookup_done(d2);
+	dput(d2);
+	d2 = ERR_PTR(err);
+out_unlock_2:
+	d_lookup_done(d1);
+	dput(d1);
+	d1 = d2;
+out_unlock_1:
+	unlock_rename(rd->old_parent, rd->new_parent);
+	dput(rd->old_parent);
+	return PTR_ERR(d1);
+}
+EXPORT_SYMBOL(rename_lookup_hashed);
+
+/**
+ * rename_lookup_noperm - lookup and lock names for rename
+ * @rd:           rename data containing relevant details
+ * @lookup_flags: optional LOOKUP_REVAL to pass to ->lookup
+ *
+ * Optionally look up two names and ensure locks are in place for
+ * rename.
+ * Normally @rd.old_dentry and @rd.new_dentry are %NULL and the
+ * old and new directories and last names are given in @rd.  In this
+ * case the names are looked up with appropriate locking and the
+ * results stored in @rd.old_dentry and @rd.new_dentry.
+ *
+ * If either are not NULL, then the corresponding lookup is avoided
+ * but the required locks are still taken.  In this case @rd.old_parent
+ * may be %NULL, otherwise @rd.old_dentry must have that as its d_parent
+ * pointer after the locks are obtained.  @rd.new_parent must always
+ * be non-NULL, and must always be the correct parent after locking.
+ *
+ * On success a reference is held on @rd.old_dentry, @rd.new_dentry,
+ * and @rd.old_parent whether they were originally %NULL or not.  These
+ * references are dropped by done_rename_lookup().  @rd.new_parent
+ * must always be non-NULL and no extra reference is taken.
+ *
+ * The passed in qstrs need not have the hash calculated, and no
+ * permission checking is performed.
+ *
+ * Returns: zero or an error.
+ */
+int rename_lookup_noperm(struct renamedata *rd, int lookup_flags)
+{
+	int err;
+
+	if (!rd->old_dentry) {
+		err = lookup_noperm_common(&rd->old_last, rd->old_parent);
+		if (err)
+			return err;
+	}
+	if (!rd->new_dentry) {
+		err = lookup_noperm_common(&rd->new_last, rd->new_parent);
+		if (err)
+			return err;
+	}
+	return rename_lookup_hashed(rd, lookup_flags);
+}
+EXPORT_SYMBOL(rename_lookup_noperm);
+
+/**
+ * rename_lookup - lookup and lock names for rename
+ * @rd:           rename data containing relevant details
+ * @lookup_flags: optional LOOKUP_REVAL to pass to ->lookup
+ *
+ * Optionally look up two names and ensure locks are in place for
+ * rename.
+ * Normally @rd.old_dentry and @rd.new_dentry are %NULL and the
+ * old and new directories and last names are given in @rd.  In this
+ * case the names are looked up with appropriate locking and the
+ * results stored in @rd.old_dentry and @rd.new_dentry.
+ *
+ * If either are not NULL, then the corresponding lookup is avoided
+ * but the required locks are still taken.  In this case @rd.old_parent
+ * may be %NULL, otherwise @rd.old_dentry must have that as its d_parent
+ * pointer after the locks are obtained.  @rd.new_parent must always
+ * be non-NULL, and must always be the correct parent after locking.
+ *
+ * On success a reference is held on @rd.old_dentry, @rd.new_dentry,
+ * and @rd.old_parent whether they were originally %NULL or not.  These
+ * references are dropped by done_rename_lookup().  @rd.new_parent
+ * must always be non-NULL and no extra reference is taken.
+ *
+ * The passed in qstrs need not have the hash calculated, and normal
+ * permission checking for MAY_EXEC is performed.
+ *
+ * Returns: zero or an error.
+ */
+int rename_lookup(struct renamedata *rd, int lookup_flags)
+{
+	int err;
+
+	if (!rd->old_dentry) {
+		err = lookup_one_common(rd->mnt_idmap, &rd->old_last,
+					rd->old_parent);
+		if (err)
+			return err;
+	}
+	if (!rd->new_dentry) {
+		err = lookup_one_common(rd->mnt_idmap, &rd->new_last,
+					rd->new_parent);
+		if (err)
+			return err;
+	}
+	return rename_lookup_hashed(rd, lookup_flags);
+}
+EXPORT_SYMBOL(rename_lookup);
+
+/**
+ * done_rename_lookup - unlock dentries after rename
+ * @rd: the struct renamedata that was passed to rename_lookup()
+ *
+ * After a successful rename_lookup() (or similar) call, and after
+ * any required renaming is performed, done_rename_rename() must be called
+ * to drop any locks and references that were obtained by the earlier function.
+ */
+void done_rename_lookup(struct renamedata *rd)
+{
+	d_lookup_done(rd->old_dentry);
+	d_lookup_done(rd->new_dentry);
+	unlock_rename(rd->old_parent, rd->new_parent);
+	dput(rd->old_parent);
+	dput(rd->old_dentry);
+	dput(rd->new_dentry);
+}
+EXPORT_SYMBOL(done_rename_lookup);
+
 /**
  * vfs_prepare_mode - prepare the mode to be used for a new inode
  * @idmap:	idmap of the mount the inode was found from
@@ -5318,14 +5545,10 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 		 struct filename *to, unsigned int flags)
 {
 	struct renamedata rd;
-	struct dentry *old_dentry, *new_dentry;
-	struct dentry *trap;
 	struct path old_path, new_path;
-	struct qstr old_last, new_last;
 	int old_type, new_type;
 	struct inode *delegated_inode = NULL;
-	unsigned int lookup_flags = 0, target_flags =
-		LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+	unsigned int lookup_flags = 0;
 	bool should_retry = false;
 	int error = -EINVAL;
 
@@ -5336,19 +5559,14 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	    (flags & RENAME_EXCHANGE))
 		goto put_names;
 
-	if (flags & RENAME_EXCHANGE)
-		target_flags = 0;
-	if (flags & RENAME_NOREPLACE)
-		target_flags |= LOOKUP_EXCL;
-
 retry:
 	error = filename_parentat(olddfd, from, lookup_flags, &old_path,
-				  &old_last, &old_type);
+				  &rd.old_last, &old_type);
 	if (error)
 		goto put_names;
 
-	error = filename_parentat(newdfd, to, lookup_flags, &new_path, &new_last,
-				  &new_type);
+	error = filename_parentat(newdfd, to, lookup_flags, &new_path,
+				  &rd.new_last, &new_type);
 	if (error)
 		goto exit1;
 
@@ -5370,66 +5588,42 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 		goto exit2;
 
 retry_deleg:
-	trap = lock_rename(new_path.dentry, old_path.dentry);
-	if (IS_ERR(trap)) {
-		error = PTR_ERR(trap);
+	rd.old_parent	   = old_path.dentry;
+	rd.mnt_idmap	   = mnt_idmap(old_path.mnt);
+	rd.old_dentry	   = NULL;
+	rd.new_parent	   = new_path.dentry;
+	rd.new_dentry	   = NULL;
+	rd.delegated_inode = &delegated_inode;
+	rd.flags	   = flags;
+
+	error = rename_lookup_hashed(&rd, lookup_flags);
+	if (error)
 		goto exit_lock_rename;
-	}
 
-	old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
-					  lookup_flags);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
-		goto exit3;
-	new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
-					  lookup_flags | target_flags);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto exit4;
 	if (flags & RENAME_EXCHANGE) {
-		if (!d_is_dir(new_dentry)) {
+		if (!d_is_dir(rd.new_dentry)) {
 			error = -ENOTDIR;
-			if (new_last.name[new_last.len])
-				goto exit5;
+			if (rd.new_last.name[rd.new_last.len])
+				goto exit_unlock;
 		}
 	}
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
-	if (!d_is_dir(old_dentry)) {
+	if (!d_is_dir(rd.old_dentry)) {
 		error = -ENOTDIR;
-		if (old_last.name[old_last.len])
-			goto exit5;
-		if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
-			goto exit5;
-	}
-	/* source should not be ancestor of target */
-	error = -EINVAL;
-	if (old_dentry == trap)
-		goto exit5;
-	/* target should not be an ancestor of source */
-	if (!(flags & RENAME_EXCHANGE))
-		error = -ENOTEMPTY;
-	if (new_dentry == trap)
-		goto exit5;
+		if (rd.old_last.name[rd.old_last.len])
+			goto exit_unlock;
+		if (!(flags & RENAME_EXCHANGE) && rd.new_last.name[rd.new_last.len])
+			goto exit_unlock;
+	}
 
-	error = security_path_rename(&old_path, old_dentry,
-				     &new_path, new_dentry, flags);
+	error = security_path_rename(&old_path, rd.old_dentry,
+				     &new_path, rd.new_dentry, flags);
 	if (error)
-		goto exit5;
+		goto exit_unlock;
 
-	rd.old_parent	   = old_path.dentry;
-	rd.old_dentry	   = old_dentry;
-	rd.mnt_idmap	   = mnt_idmap(old_path.mnt);
-	rd.new_parent	   = new_path.dentry;
-	rd.new_dentry	   = new_dentry;
-	rd.delegated_inode = &delegated_inode;
-	rd.flags	   = flags;
 	error = vfs_rename(&rd);
-exit5:
-	dput(new_dentry);
-exit4:
-	dput(old_dentry);
-exit3:
-	unlock_rename(new_path.dentry, old_path.dentry);
+exit_unlock:
+	done_rename_lookup(&rd);
 exit_lock_rename:
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d3e27da8a6aa..1ae7328749a0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2008,8 +2008,10 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
  * @mnt_idmap:     idmap of the mount in which the rename is happening.
  * @old_parent:        parent of source
  * @old_dentry:                source
+ * @old_last:          name for old_dentry in old_dir, if old_dentry not given
  * @new_parent:        parent of destination
  * @new_dentry:                destination
+ * @new_last:          name for new_dentry in new_dir, if new_dentry not given
  * @delegated_inode:   returns an inode needing a delegation break
  * @flags:             rename flags
  */
@@ -2017,8 +2019,10 @@ struct renamedata {
 	struct mnt_idmap *mnt_idmap;
 	struct dentry *old_parent;
 	struct dentry *old_dentry;
+	struct qstr old_last;
 	struct dentry *new_parent;
 	struct dentry *new_dentry;
+	struct qstr new_last;
 	struct inode **delegated_inode;
 	unsigned int flags;
 } __randomize_layout;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 8eade32623d8..c86d9683563c 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -100,6 +100,10 @@ extern int follow_up(struct path *);
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
+int rename_lookup(struct renamedata *rd, int lookup_flags);
+int rename_lookup_noperm(struct renamedata *rd, int lookup_flags);
+int rename_lookup_hashed(struct renamedata *rd, int lookup_flags);
+void done_rename_lookup(struct renamedata *rd);
 
 /**
  * mode_strip_umask - handle vfs umask stripping
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] VFS: introduce dentry_lock_in()
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
                   ` (5 preceding siblings ...)
  2025-07-21  8:00 ` [PATCH 6/7] VFS: add rename_lookup() NeilBrown
@ 2025-07-21  8:00 ` NeilBrown
  2025-07-21 11:32 ` [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations Amir Goldstein
  7 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21  8:00 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel

A few callers operate on a dentry which they already have - unlike the
normal case where a lookup proceeds an operation.

For these callers dentry_lock_in() is provided where other callers would
use dentry_lookup().  The call will fail if, after the lock was
gained, the child is no longer a child of the given parent.

When the operation completes done_dentry_lookup() must be called.  An
extra reference is taken when the dentry_lock_in() call succeeds
and will be dropped by done_dentry_lookup().

This will be used in smb/server, ecryptfs, and overlayfs, each of which
have their own lock_parent() or parent_lock() or similar; and a few
other places which lock the parent but don't check if the parent is
still correct (often because rename isn't supported so parent cannot be
incorrect).

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 26 ++++++++++++++++++++++++++
 include/linux/namei.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index ae8079916ac6..ed656a1e458c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1836,6 +1836,32 @@ struct dentry *dentry_lookup_killable(struct mnt_idmap *idmap,
 }
 EXPORT_SYMBOL(dentry_lookup_killable);
 
+/**
+ * dentry_lock_in: lock a dentry in given parent prior to dir ops
+ * @child: the dentry to lock
+ * @parent: the dentry of the assumed parent
+ *
+ * The child is locked - currently by taking i_rwsem on the parent - to
+ * prepare for create/remove operations.  If the given parent is no longer
+ * the parent of the dentry after the lock is gained, the lock is released
+ * and the call fails (returns %false).
+ *
+ * A reference is taken to the child on success.  The lock and reference
+ * must both be dropped by done_dentry_lookup() after the operation completes.
+ */
+bool dentry_lock_in(struct dentry *child, struct dentry *parent)
+{
+	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
+	if (child->d_parent == parent) {
+		/* get the child to balance with done_dentry_lookup() which puts it. */
+		dget(child);
+		return true;
+	}
+	inode_unlock(d_inode(parent));
+	return false;
+}
+EXPORT_SYMBOL(dentry_lock_in);
+
 /**
  * done_dentry_lookup - finish a lookup used for create/delete
  * @dentry:  the target dentry
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c86d9683563c..61ab251237e4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -104,6 +104,7 @@ int rename_lookup(struct renamedata *rd, int lookup_flags);
 int rename_lookup_noperm(struct renamedata *rd, int lookup_flags);
 int rename_lookup_hashed(struct renamedata *rd, int lookup_flags);
 void done_rename_lookup(struct renamedata *rd);
+bool dentry_lock_in(struct dentry *child, struct dentry *parent);
 
 /**
  * mode_strip_umask - handle vfs umask stripping
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] VFS: introduce dentry_lookup() and friends
  2025-07-21  8:00 ` [PATCH 4/7] VFS: introduce dentry_lookup() and friends NeilBrown
@ 2025-07-21 10:20   ` Amir Goldstein
  2025-07-21 23:27     ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2025-07-21 10:20 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Mon, Jul 21, 2025 at 10:55 AM NeilBrown <neil@brown.name> wrote:
>
> dentry_lookup() combines locking the directory and performing a lookup
> prior to a change to the directory.
> Abstracting this prepares for changing the locking requirements.
>
> dentry_lookup_noperm() does the same without needing a mnt_idmap and
> without checking permissions.  This is useful for internal filesystem
> management (e.g.  creating virtual files in response to events) and in
> other cases similar to lookup_noperm().
>
> dentry_lookup_hashed() also does no permissions checking and assumes
> that the hash of the name has already been stored in the qstr.

That's a very confusing choice of name because _hashed() (to me) sounds
like the opposite of d_unhashed() which is not at all the case.

> This is useful following filename_parentat().
>
> These are intended to be paired with done_dentry_lookup() which provides
> the inverse of putting the dentry and unlocking.
>
> Like lookup_one_qstr_excl(), dentry_lookup() returns -ENOENT if
> LOOKUP_CREATE was NOT given and the name cannot be found,, and returns
> -EEXIST if LOOKUP_EXCL WAS given and the name CAN be found.
>
> These functions replace all uses of lookup_one_qstr_excl() in namei.c
> except for those used for rename.
>
> Some of the variants should possibly be inlines in a header.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/namei.c            | 158 ++++++++++++++++++++++++++++++------------
>  include/linux/namei.h |   8 ++-
>  2 files changed, 119 insertions(+), 47 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 950a0d0d54da..f292df61565a 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1714,17 +1714,98 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  }
>  EXPORT_SYMBOL(lookup_one_qstr_excl);
>
> +/**
> + * dentry_lookup_hashed - lookup and lock a name prior to dir ops
> + * @last: the name in the given directory
> + * @base: the directory in which the name is to be found
> + * @lookup_flags: %LOOKUP_xxx flags
> + *
> + * The name is looked up and necessary locks are taken so that
> + * the name can be created or removed.
> + * The "necessary locks" are currently the inode node lock on @base.
> + * The name @last is expected to already have the hash calculated.
> + * No permission checks are performed.
> + * Returns: the dentry, suitably locked, or an ERR_PTR().
> + */
> +struct dentry *dentry_lookup_hashed(struct qstr *last,
> +                                   struct dentry *base,
> +                                   unsigned int lookup_flags)
> +{
> +       struct dentry *dentry;
> +
> +       inode_lock_nested(base->d_inode, I_MUTEX_PARENT);
> +
> +       dentry = lookup_one_qstr_excl(last, base, lookup_flags);
> +       if (IS_ERR(dentry))
> +               inode_unlock(base->d_inode);
> +       return dentry;
> +}
> +EXPORT_SYMBOL(dentry_lookup_hashed);

Observation:

This part could be factored out of
__kern_path_locked()/kern_path_locked_negative()

If you do that in patch 2 while introducing done_dentry_lookup() then
it also makes
a lot of sense to balance the introduced done_dentry_lookup() with the
factored out
helper __dentry_lookup_locked() or whatever its name is.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations
  2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
                   ` (6 preceding siblings ...)
  2025-07-21  8:00 ` [PATCH 7/7] VFS: introduce dentry_lock_in() NeilBrown
@ 2025-07-21 11:32 ` Amir Goldstein
  2025-07-21 23:48   ` NeilBrown
  7 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2025-07-21 11:32 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Mon, Jul 21, 2025 at 10:46 AM NeilBrown <neil@brown.name> wrote:
>
> Hi,
>
>  these patches (against vfs.all) primarily introduce new APIs for
>  preparing dentries for create, remove, rename.  The goal is to
>  centralise knowledge of how we do locking (currently by locking the
>  directory) so that we can eventually change the mechanism (e.g.  to
>  locking just the dentry).
>
>  Naming is difficult and I've changed my mind several times. :-)

Indeed it is.
I generally like the done_ approach that you took.
Few minor naming comments follow.

>
>  The basic approach is to return a dentry which can be passed to
>  vfs_create(), vfs_unlink() etc, and subsequently to release that
>  dentry.  The closest analogue to this in the VFS is kern_path_create()
>  which is paired with done_path_create(), though there is also
>  kern_path_locked() which is paired with explicit inode_unlock() and
>  dput().  So my current approach uses "done_" for finishing up.
>
>  I have:
>    dentry_lookup() dentry_lookup_noperm() dentry_lookup_hashed()

As I wrote on the patch that introduces them I find dentry_lookup_hashed()
confusing because the dentry is not hashed (only the hash is calculated).

Looking at another precedent of _noperm() vfs API we have:

vfs_setxattr()
  __vfs_setxattr_locked()
    __vfs_setxattr_noperm()
      __vfs_setxattr()

Do I'd say for lack of better naming __dentry_lookup() could makes sense
for the bare lock&dget and it could also be introduced earlier along with
introducing done_dentry_lookup()

>    dentry_lookup_killable()
>  paired with
>    done_dentry_lookup()
>
>  and also
>    rename_lookup() rename_lookup_noperm() rename_lookup_hashed()
>  paired with
>    done_rename_lookup()
>  (these take a "struct renamedata *" to which some qstrs are added.
>
>  There is also "dentry_lock_in()" which is used instead of
>  dentry_lookup() when you already have the dentry and want to lock it.
>  So you "lock" it "in" a given parent.  I'm not very proud of this name,
>  but I don't want to use "dentry_lock" as I want to save that for
>  low-level locking primitives.

Very strange name :)

What's wrong with dentry_lock_parent()?

Although I think that using the verb _lock_ for locking and dget is
actively confusing, so something along the lines of
resume_dentry_lookup()/dentry_lookup_reacquire() might serve the
readers of the code better.

>
>  There is also done_dentry_lookup_return() which doesn't dput() the
>  dentry but returns it instread.  In about 1/6 of places where I need
>  done_dentry_lookup() the code makes use of the dentry afterwards.  Only
>  in half the places where done_dentry_lookup_return() is used is the
>  returned value immediately returned by the calling function.  I could
>  do a dget() before done_dentry_lookup(), but that looks awkward and I
>  think having the _return version is justified.  I'm happy to hear other
>  opinions.

The name is not very descriptive IMO, but I do not have a better suggestion.
Unless you can describe it for the purpose that it is used for, e.g.
yeild_dentry_lookup() that can be followed with resume_dentry_lookup(),
but I do not know if those are your intentions for the return API.

Thanks,
Amir.

>
>  In order for this dentry-focussed API to work we need to have the
>  dentry to unlock.  vfs_rmdir() currently consumes the dentry on
>  failure, so we don't have it unless we clumsily keep a copy.  So an
>  early patch changes vfs_rmdir() to both consume the dentry and drop the
>  lock on failure.
>
>  After these new APIs are refined, agreed, and applied I will have a
>  collection of patches to roll them out throughout the kernel.  Then we
>  can start/continue discussing a new approach to locking which allows
>  directory operations to proceed in parallel.
>
>  If you want a sneak peek at some of this future work - for context
>  mostly - my current devel code is at https://github.com/neilbrown/linux.git
>  in a branch "pdirops".  Be warned that a lot of the later code is under
>  development, is known to be wrong, and doesn't even compile.  Not today
>  anyway.  The rolling out of the new APIs is fairly mature though.
>
>  Please review and suggest better names, or tell me that my choices are adequate.
>  And find the bugs in the code too :-)
>
>  I haven't cc:ed the maintains of the non-VFS code that the patches
>  touch.  I can do that once the approach and names have been approved.
>
> Thanks,
> NeilBrown
>
>
>  [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
>  [PATCH 2/7] VFS: introduce done_dentry_lookup()
>  [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
>  [PATCH 4/7] VFS: introduce dentry_lookup() and friends
>  [PATCH 5/7] VFS: add dentry_lookup_killable()
>  [PATCH 6/7] VFS: add rename_lookup()
>  [PATCH 7/7] VFS: introduce dentry_lock_in()
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
  2025-07-21  7:59 ` [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
@ 2025-07-21 13:06   ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2025-07-21 13:06 UTC (permalink / raw)
  To: NeilBrown, Linus Torvalds, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-fsdevel, linux-kernel

On Mon, 2025-07-21 at 17:59 +1000, NeilBrown wrote:
> A rename can only rename with a single mount.  Callers of vfs_rename()
> must and do ensure this is the case.
> 
> So there is no point in having two mnt_idmaps in renamedata as they are
> always the same.  Only of of them is passed to ->rename in any case.
> 
> This patch replaces both with a single "mnt_idmap" and changes all
> callers.
> 
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/cachefiles/namei.c    |  3 +--
>  fs/ecryptfs/inode.c      |  3 +--
>  fs/namei.c               | 17 ++++++++---------
>  fs/nfsd/vfs.c            |  3 +--
>  fs/overlayfs/overlayfs.h |  3 +--
>  fs/smb/server/vfs.c      |  3 +--
>  include/linux/fs.h       |  6 ++----
>  7 files changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 91dfd0231877..d1edb2ac3837 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -387,10 +387,9 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
>  		cachefiles_io_error(cache, "Rename security error %d", ret);
>  	} else {
>  		struct renamedata rd = {
> -			.old_mnt_idmap	= &nop_mnt_idmap,
> +			.mnt_idmap	= &nop_mnt_idmap,
>  			.old_parent	= dir,
>  			.old_dentry	= rep,
> -			.new_mnt_idmap	= &nop_mnt_idmap,
>  			.new_parent	= cache->graveyard,
>  			.new_dentry	= grave,
>  		};
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 72fbe1316ab8..abd954c6a14e 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -634,10 +634,9 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
>  		goto out_lock;
>  	}
>  
> -	rd.old_mnt_idmap	= &nop_mnt_idmap;
> +	rd.mnt_idmap		= &nop_mnt_idmap;
>  	rd.old_parent		= lower_old_dir_dentry;
>  	rd.old_dentry		= lower_old_dentry;
> -	rd.new_mnt_idmap	= &nop_mnt_idmap;
>  	rd.new_parent		= lower_new_dir_dentry;
>  	rd.new_dentry		= lower_new_dentry;
>  	rc = vfs_rename(&rd);
> diff --git a/fs/namei.c b/fs/namei.c
> index cd43ff89fbaa..1c80445693d4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5024,20 +5024,20 @@ int vfs_rename(struct renamedata *rd)
>  	if (source == target)
>  		return 0;
>  
> -	error = may_delete(rd->old_mnt_idmap, old_dir, old_dentry, is_dir);
> +	error = may_delete(rd->mnt_idmap, old_dir, old_dentry, is_dir);
>  	if (error)
>  		return error;
>  
>  	if (!target) {
> -		error = may_create(rd->new_mnt_idmap, new_dir, new_dentry);
> +		error = may_create(rd->mnt_idmap, new_dir, new_dentry);
>  	} else {
>  		new_is_dir = d_is_dir(new_dentry);
>  
>  		if (!(flags & RENAME_EXCHANGE))
> -			error = may_delete(rd->new_mnt_idmap, new_dir,
> +			error = may_delete(rd->mnt_idmap, new_dir,
>  					   new_dentry, is_dir);
>  		else
> -			error = may_delete(rd->new_mnt_idmap, new_dir,
> +			error = may_delete(rd->mnt_idmap, new_dir,
>  					   new_dentry, new_is_dir);
>  	}
>  	if (error)
> @@ -5052,13 +5052,13 @@ int vfs_rename(struct renamedata *rd)
>  	 */
>  	if (new_dir != old_dir) {
>  		if (is_dir) {
> -			error = inode_permission(rd->old_mnt_idmap, source,
> +			error = inode_permission(rd->mnt_idmap, source,
>  						 MAY_WRITE);
>  			if (error)
>  				return error;
>  		}
>  		if ((flags & RENAME_EXCHANGE) && new_is_dir) {
> -			error = inode_permission(rd->new_mnt_idmap, target,
> +			error = inode_permission(rd->mnt_idmap, target,
>  						 MAY_WRITE);
>  			if (error)
>  				return error;
> @@ -5126,7 +5126,7 @@ int vfs_rename(struct renamedata *rd)
>  		if (error)
>  			goto out;
>  	}
> -	error = old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry,
> +	error = old_dir->i_op->rename(rd->mnt_idmap, old_dir, old_dentry,
>  				      new_dir, new_dentry, flags);
>  	if (error)
>  		goto out;
> @@ -5269,10 +5269,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
>  
>  	rd.old_parent	   = old_path.dentry;
>  	rd.old_dentry	   = old_dentry;
> -	rd.old_mnt_idmap   = mnt_idmap(old_path.mnt);
> +	rd.mnt_idmap	   = mnt_idmap(old_path.mnt);
>  	rd.new_parent	   = new_path.dentry;
>  	rd.new_dentry	   = new_dentry;
> -	rd.new_mnt_idmap   = mnt_idmap(new_path.mnt);
>  	rd.delegated_inode = &delegated_inode;
>  	rd.flags	   = flags;
>  	error = vfs_rename(&rd);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 7d522e426b2d..a21940cadede 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1940,10 +1940,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  		goto out_dput_old;
>  	} else {
>  		struct renamedata rd = {
> -			.old_mnt_idmap	= &nop_mnt_idmap,
> +			.mnt_idmap	= &nop_mnt_idmap,
>  			.old_parent	= fdentry,
>  			.old_dentry	= odentry,
> -			.new_mnt_idmap	= &nop_mnt_idmap,
>  			.new_parent	= tdentry,
>  			.new_dentry	= ndentry,
>  		};
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index bb0d7ded8e76..4f84abaa0d68 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -361,10 +361,9 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
>  {
>  	int err;
>  	struct renamedata rd = {
> -		.old_mnt_idmap	= ovl_upper_mnt_idmap(ofs),
> +		.mnt_idmap	= ovl_upper_mnt_idmap(ofs),
>  		.old_parent	= olddir,
>  		.old_dentry	= olddentry,
> -		.new_mnt_idmap	= ovl_upper_mnt_idmap(ofs),
>  		.new_parent	= newdir,
>  		.new_dentry	= newdentry,
>  		.flags		= flags,
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 49e731dd0529..bfd62a21e75c 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -770,10 +770,9 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
>  		goto out4;
>  	}
>  
> -	rd.old_mnt_idmap	= mnt_idmap(old_path->mnt),
> +	rd.mnt_idmap		= mnt_idmap(old_path->mnt),
>  	rd.old_parent		= old_parent,
>  	rd.old_dentry		= old_child,
> -	rd.new_mnt_idmap	= mnt_idmap(new_path.mnt),
>  	rd.new_parent		= new_path.dentry,
>  	rd.new_dentry		= new_dentry,
>  	rd.flags		= flags,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1948b2c828d3..d3e27da8a6aa 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2005,20 +2005,18 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
>  
>  /**
>   * struct renamedata - contains all information required for renaming
> - * @old_mnt_idmap:     idmap of the old mount the inode was found from
> + * @mnt_idmap:     idmap of the mount in which the rename is happening.
>   * @old_parent:        parent of source
>   * @old_dentry:                source
> - * @new_mnt_idmap:     idmap of the new mount the inode was found from
>   * @new_parent:        parent of destination
>   * @new_dentry:                destination
>   * @delegated_inode:   returns an inode needing a delegation break
>   * @flags:             rename flags
>   */
>  struct renamedata {
> -	struct mnt_idmap *old_mnt_idmap;
> +	struct mnt_idmap *mnt_idmap;
>  	struct dentry *old_parent;
>  	struct dentry *old_dentry;
> -	struct mnt_idmap *new_mnt_idmap;
>  	struct dentry *new_parent;
>  	struct dentry *new_dentry;
>  	struct inode **delegated_inode;

Nice cleanup.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] VFS: introduce done_dentry_lookup()
  2025-07-21  7:59 ` [PATCH 2/7] VFS: introduce done_dentry_lookup() NeilBrown
@ 2025-07-21 13:39   ` Jeff Layton
  2025-07-21 23:04     ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2025-07-21 13:39 UTC (permalink / raw)
  To: NeilBrown, Linus Torvalds, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-fsdevel, linux-kernel

On Mon, 2025-07-21 at 17:59 +1000, NeilBrown wrote:
> done_dentry_lookup() is the first step in introducing a new API for
> locked operation on names in directories - those that create or remove
> names.  Rename operations will also be part of this API but will
> use separate interfaces.
> 
> The plan is to lock just the dentry (or dentries), not the whole
> directory.  A "dentry_lookup()" operation will perform the locking and
> lookup with a corresponding "done_dentry_lookup()" releasing the
> resulting dentry and dropping any locks.
> 
> This done_dentry_lookup() can immediately be used to complete updates
> started with kern_path_locked() (much as done_path_create() already
> completes operations started with kern_path_create()).
> 
> So this patch adds done_dentry_lookup() and uses it where
> kern_path_locked() is used.  It also adds done_dentry_lookup_return()
> which returns a reference to the dentry rather than dropping it.  This
> is a less common need in existing code, but still worth its own interface.
> 
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  drivers/base/devtmpfs.c |  7 ++-----
>  fs/bcachefs/fs-ioctl.c  |  3 +--
>  fs/namei.c              | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/namei.h   |  3 +++
>  kernel/audit_fsnotify.c |  9 ++++-----
>  kernel/audit_watch.c    |  3 +--
>  6 files changed, 49 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 31bfb3194b4c..47bee8166c8d 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -265,8 +265,7 @@ static int dev_rmdir(const char *name)
>  	else
>  		err = -EPERM;
>  
> -	dput(dentry);
> -	inode_unlock(d_inode(parent.dentry));
> +	done_dentry_lookup(dentry);
>  	path_put(&parent);
>  	return err;
>  }
> @@ -349,9 +348,7 @@ static int handle_remove(const char *nodename, struct device *dev)
>  		if (!err || err == -ENOENT)
>  			deleted = 1;
>  	}
> -	dput(dentry);
> -	inode_unlock(d_inode(parent.dentry));
> -
> +	done_dentry_lookup(dentry);
>  	path_put(&parent);
>  	if (deleted && strchr(nodename, '/'))
>  		delete_path(nodename);
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index 4e72e654da96..8077ddf4ddc4 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -351,8 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
>  		d_invalidate(victim);
>  	}
>  err:
> -	inode_unlock(dir);
> -	dput(victim);
> +	done_dentry_lookup(victim);
>  	path_put(&path);
>  	return ret;
>  }
> diff --git a/fs/namei.c b/fs/namei.c
> index 1c80445693d4..da160a01e23d 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1714,6 +1714,44 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  }
>  EXPORT_SYMBOL(lookup_one_qstr_excl);
>  
> +/**
> + * done_dentry_lookup - finish a lookup used for create/delete
> + * @dentry:  the target dentry
> + *
> + * After locking the directory and lookup or validating a dentry
> + * an attempt can be made to create (including link) or remove (including
> + * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
> + * unlock the parent directory and dput() the dentry.
> + *
> + * This interface allows a smooth transition from parent-dir based
> + * locking to dentry based locking.
> + */
> +void done_dentry_lookup(struct dentry *dentry)
> +{
> +	inode_unlock(dentry->d_parent->d_inode);
> +	dput(dentry);
> +}
> +EXPORT_SYMBOL(done_dentry_lookup);
> +
> +/**
> + * done_dentry_lookup_return - finish a lookup sequence, returning the dentry
> + * @dentry:  the target dentry
> + *
> + * After locking the directory and lookup or validating a dentry
> + * an attempt can be made to create (including link) or remove (including
> + * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
> + * unlock the parent directory.  The dentry is returned for further use.
> + *
> + * This interface allows a smooth transition from parent-dir based
> + * locking to dentry based locking.
> + */
> +struct dentry *done_dentry_lookup_return(struct dentry *dentry)
> +{
> +	inode_unlock(dentry->d_parent->d_inode);
> +	return dentry;
> +}
> +EXPORT_SYMBOL(done_dentry_lookup_return);
> +
>  /**
>   * lookup_fast - do fast lockless (but racy) lookup of a dentry
>   * @nd: current nameidata
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 5d085428e471..e097f11587c9 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -81,6 +81,9 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
>  					    struct qstr *name,
>  					    struct dentry *base);
>  
> +void done_dentry_lookup(struct dentry *dentry);
> +struct dentry *done_dentry_lookup_return(struct dentry *dentry);
> +
>  extern int follow_down_one(struct path *);
>  extern int follow_down(struct path *path, unsigned int flags);
>  extern int follow_up(struct path *);
> diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> index c565fbf66ac8..170836c3520f 100644
> --- a/kernel/audit_fsnotify.c
> +++ b/kernel/audit_fsnotify.c
> @@ -85,8 +85,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
>  	dentry = kern_path_locked(pathname, &path);
>  	if (IS_ERR(dentry))
>  		return ERR_CAST(dentry); /* returning an error */
> -	inode = path.dentry->d_inode;
> -	inode_unlock(inode);
> +	inode = igrab(dentry->d_inode);

This is a little confusing. This patch changes "inode" from pointing to
the parent inode to point to the child inode instead. That actually
makes a bit more sense given the naming, but might best be done in a
separate patch.

> +	done_dentry_lookup(dentry);
>  
>  	audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
>  	if (unlikely(!audit_mark)) {
> @@ -97,17 +97,16 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
>  	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group);
>  	audit_mark->mark.mask = AUDIT_FS_EVENTS;
>  	audit_mark->path = pathname;
> -	audit_update_mark(audit_mark, dentry->d_inode);
> +	audit_update_mark(audit_mark, inode);
>  	audit_mark->rule = krule;
>  
> -	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
> +	ret = fsnotify_add_inode_mark(&audit_mark->mark, path.dentry->d_inode, 0);
>  	if (ret < 0) {
>  		audit_mark->path = NULL;
>  		fsnotify_put_mark(&audit_mark->mark);
>  		audit_mark = ERR_PTR(ret);
>  	}
>  out:
> -	dput(dentry);
>  	path_put(&path);
>  	return audit_mark;
>  }
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 0ebbbe37a60f..f6ecac2109d4 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -359,8 +359,7 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
>  		watch->ino = d_backing_inode(d)->i_ino;
>  	}
>  
> -	inode_unlock(d_backing_inode(parent->dentry));
> -	dput(d);
> +	done_dentry_lookup(d);
>  	return 0;
>  }
>  

It looks right to me though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
  2025-07-21  7:59 ` [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure NeilBrown
@ 2025-07-21 14:46   ` Jeff Layton
  2025-07-21 23:08     ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2025-07-21 14:46 UTC (permalink / raw)
  To: NeilBrown, Linus Torvalds, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-fsdevel, linux-kernel

On Mon, 2025-07-21 at 17:59 +1000, NeilBrown wrote:
> Proposed changes to directory-op locking will lock the dentry rather
> than the whole directory.  So the dentry will need to be unlocked.
> 
> vfs_mkdir() consumes the dentry on error, so there will be no dentry to
> be unlocked.
> 
> So this patch changes vfs_mkdir() to unlock on error as well as
> releasing the dentry.  This requires various other functions in various
> callers to also unlock on error.
> 
> At present this results in some clumsy code.  Once the transition to
> dentry locking is complete the clumsiness will be gone.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/cachefiles/namei.c    |  9 +++++----
>  fs/ecryptfs/inode.c      |  3 ++-
>  fs/namei.c               | 24 ++++++++++++++++--------
>  fs/nfsd/nfs4recover.c    | 12 +++++-------
>  fs/nfsd/vfs.c            | 12 ++++++++++--
>  fs/overlayfs/dir.c       | 13 +++++++------
>  fs/overlayfs/overlayfs.h |  1 +
>  fs/overlayfs/super.c     |  5 +++--
>  fs/xfs/scrub/orphanage.c |  2 +-
>  9 files changed, 50 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d1edb2ac3837..732d78911bed 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -131,8 +131,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>  		ret = cachefiles_inject_write_error();
>  		if (ret == 0)
>  			subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
> -		else
> +		else {
> +			/* vfs_mkdir() unlocks on failure so we must too */
> +			inode_unlock(d_inode(dir));
>  			subdir = ERR_PTR(ret);
> +		}
>  		if (IS_ERR(subdir)) {
>  			trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
>  						   cachefiles_trace_mkdir_error);
> @@ -196,9 +199,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>  	return ERR_PTR(-EBUSY);
>  
>  mkdir_error:
> -	inode_unlock(d_inode(dir));
> -	if (!IS_ERR(subdir))
> -		dput(subdir);
> +	done_dentry_lookup(subdir);
>  	pr_err("mkdir %s failed with error %d\n", dirname, ret);
>  	return ERR_PTR(ret);
>  
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index abd954c6a14e..5d8cb042aa57 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -520,7 +520,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  				 lower_dentry, mode);
>  	rc = PTR_ERR(lower_dentry);
>  	if (IS_ERR(lower_dentry))
> -		goto out;
> +		goto out_unlocked;
>  	rc = 0;
>  	if (d_unhashed(lower_dentry))
>  		goto out;
> @@ -532,6 +532,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	set_nlink(dir, lower_dir->i_nlink);
>  out:
>  	inode_unlock(lower_dir);
> +out_unlocked:
>  	if (d_really_is_negative(dentry))
>  		d_drop(dentry);
>  	return ERR_PTR(rc);
> diff --git a/fs/namei.c b/fs/namei.c
> index da160a01e23d..950a0d0d54da 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1723,13 +1723,18 @@ EXPORT_SYMBOL(lookup_one_qstr_excl);
>   * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
>   * unlock the parent directory and dput() the dentry.
>   *
> + * If the dentry is an error - as can happen after vfs_mkdir() -
> + * the unlock is skipped as unneeded.
> + *
>   * This interface allows a smooth transition from parent-dir based
>   * locking to dentry based locking.
>   */
>  void done_dentry_lookup(struct dentry *dentry)
>  {
> -	inode_unlock(dentry->d_parent->d_inode);
> -	dput(dentry);
> +	if (!IS_ERR(dentry)) {
> +		inode_unlock(dentry->d_parent->d_inode);
> +		dput(dentry);
> +	}

nit: could you introduce these versions of done_dentry_lookup() and
done_dentry_lookup_return() in patch #2, even if not strictly needed
yet? Better to introduce it as ERR_PTR-safe from the start. 

>  }
>  EXPORT_SYMBOL(done_dentry_lookup);
>  
> @@ -1742,12 +1747,16 @@ EXPORT_SYMBOL(done_dentry_lookup);
>   * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
>   * unlock the parent directory.  The dentry is returned for further use.
>   *
> + * If the dentry is an error - as can happen after vfs_mkdir() -
> + * the unlock is skipped as unneeded.
> + *
>   * This interface allows a smooth transition from parent-dir based
>   * locking to dentry based locking.
>   */
>  struct dentry *done_dentry_lookup_return(struct dentry *dentry)
>  {
> -	inode_unlock(dentry->d_parent->d_inode);
> +	if (!IS_ERR(dentry))
> +		inode_unlock(dentry->d_parent->d_inode);
>  	return dentry;
>  }
>  EXPORT_SYMBOL(done_dentry_lookup_return);
> @@ -4210,9 +4219,7 @@ EXPORT_SYMBOL(kern_path_create);
>  
>  void done_path_create(struct path *path, struct dentry *dentry)
>  {
> -	if (!IS_ERR(dentry))
> -		dput(dentry);
> -	inode_unlock(path->dentry->d_inode);
> +	done_dentry_lookup(dentry);
>  	mnt_drop_write(path->mnt);
>  	path_put(path);
>  }
> @@ -4375,7 +4382,8 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
>   * negative or unhashes it and possibly splices a different one returning it,
>   * the original dentry is dput() and the alternate is returned.
>   *
> - * In case of an error the dentry is dput() and an ERR_PTR() is returned.
> + * In case of an error the dentry is dput(), the parent is unlocked, and
> + * an ERR_PTR() is returned.
>   */
>  struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  			 struct dentry *dentry, umode_t mode)
> @@ -4413,7 +4421,7 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	return dentry;
>  
>  err:
> -	dput(dentry);
> +	done_dentry_lookup(dentry);
>  	return ERR_PTR(error);
>  }
>  EXPORT_SYMBOL(vfs_mkdir);
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 82785db730d9..693fa95fa678 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -222,7 +222,8 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
>  	dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
>  	if (IS_ERR(dentry)) {
>  		status = PTR_ERR(dentry);
> -		goto out_unlock;
> +		inode_unlock(d_inode(dir));
> +		goto out;
>  	}
>  	if (d_really_is_positive(dentry))
>  		/*
> @@ -233,15 +234,12 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
>  		 * In the 4.0 case, we should never get here; but we may
>  		 * as well be forgiving and just succeed silently.
>  		 */
> -		goto out_put;
> +		goto out;
>  	dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
>  	if (IS_ERR(dentry))
>  		status = PTR_ERR(dentry);
> -out_put:
> -	if (!status)
> -		dput(dentry);
> -out_unlock:
> -	inode_unlock(d_inode(dir));
> +out:
> +	done_dentry_lookup(dentry);
>  	if (status == 0) {
>  		if (nn->in_grace)
>  			__nfsd4_create_reclaim_record_grace(clp, dname,
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index a21940cadede..e85195e858a2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1489,7 +1489,9 @@ nfsd_check_ignore_resizing(struct iattr *iap)
>  		iap->ia_valid &= ~ATTR_SIZE;
>  }
>  
> -/* The parent directory should already be locked: */
> +/* The parent directory should already be locked.  The lock
> + * will be dropped on error.
> + */
>  __be32
>  nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		   struct nfsd_attrs *attrs,
> @@ -1555,8 +1557,11 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
>  
>  out:
> -	if (!IS_ERR(dchild))
> +	if (!IS_ERR(dchild)) {
> +		if (err)
> +			inode_unlock(dirp);
>  		dput(dchild);
> +	}
>  	return err;
>  
>  out_nfserr:
> @@ -1613,6 +1618,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (err != nfs_ok)
>  		goto out_unlock;
>  	err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> +	if (err)
> +		/* lock will have been dropped */
> +		return err;
>  	fh_fill_post_attrs(fhp);
>  out_unlock:
>  	inode_unlock(dentry->d_inode);
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 30619777f0f6..74b52595ea0e 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -161,14 +161,17 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
>  	goto out;
>  }
>  
> +/* dir will be unlocked on return */
>  struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
>  			       struct dentry *newdentry, struct ovl_cattr *attr)
>  {
>  	struct inode *dir = parent->d_inode;
>  	int err;
>  
> -	if (IS_ERR(newdentry))
> +	if (IS_ERR(newdentry)) {
> +		inode_unlock(dir);
>  		return newdentry;
> +	}
>  
>  	err = -ESTALE;
>  	if (newdentry->d_inode)
> @@ -213,11 +216,11 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
>  	}
>  out:
>  	if (err) {
> -		if (!IS_ERR(newdentry))
> -			dput(newdentry);
> +		done_dentry_lookup(newdentry);
>  		return ERR_PTR(err);
> +	} else {
> +		return done_dentry_lookup_return(newdentry);
>  	}
> -	return newdentry;
>  }
>  
>  struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> @@ -227,7 +230,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
>  	inode_lock(workdir->d_inode);
>  	ret = ovl_create_real(ofs, workdir,
>  			      ovl_lookup_temp(ofs, workdir), attr);
> -	inode_unlock(workdir->d_inode);
>  	return ret;
>  }
>  
> @@ -335,7 +337,6 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
>  				    ovl_lookup_upper(ofs, dentry->d_name.name,
>  						     upperdir, dentry->d_name.len),
>  				    attr);
> -	inode_unlock(udir);
>  	if (IS_ERR(newdentry))
>  		return PTR_ERR(newdentry);
>  
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 4f84abaa0d68..238c26142318 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -250,6 +250,7 @@ static inline struct dentry *ovl_do_mkdir(struct ovl_fs *ofs,
>  
>  	ret = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode);
>  	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(ret));
> +	/* Note: dir will have been unlocked on failure */
>  	return ret;
>  }
>  
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 4afa91882075..df99a6fa17ef 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -328,11 +328,11 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
>  		}
>  
>  		work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> -		inode_unlock(dir);
>  		err = PTR_ERR(work);
>  		if (IS_ERR(work))
>  			goto out_err;
>  
> +		done_dentry_lookup_return(work);
>  		/* Weird filesystem returning with hashed negative (kernfs)? */
>  		err = -EINVAL;
>  		if (d_really_is_negative(work))
> @@ -623,7 +623,8 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
>  	child = ovl_lookup_upper(ofs, name, parent, len);
>  	if (!IS_ERR(child) && !child->d_inode)
>  		child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> -	inode_unlock(parent->d_inode);
> +	else
> +		inode_unlock(parent->d_inode);
>  	dput(parent);
>  
>  	return child;
> diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> index 9c12cb844231..c95bded4e8a7 100644
> --- a/fs/xfs/scrub/orphanage.c
> +++ b/fs/xfs/scrub/orphanage.c
> @@ -170,7 +170,7 @@ xrep_orphanage_create(
>  					     orphanage_dentry, 0750);
>  		error = PTR_ERR(orphanage_dentry);
>  		if (IS_ERR(orphanage_dentry))
> -			goto out_unlock_root;
> +			goto out_dput_root;
>  	}
>  
>  	/* Not a directory? Bail out. */

This does make for some awkward code. Fortunately there aren't that
many vfs_mkdir() callers.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] VFS: introduce done_dentry_lookup()
  2025-07-21 13:39   ` Jeff Layton
@ 2025-07-21 23:04     ` NeilBrown
  0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21 23:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Mon, 21 Jul 2025, Jeff Layton wrote:
> On Mon, 2025-07-21 at 17:59 +1000, NeilBrown wrote:
> > done_dentry_lookup() is the first step in introducing a new API for
> > locked operation on names in directories - those that create or remove
> > names.  Rename operations will also be part of this API but will
> > use separate interfaces.
> > 
> > The plan is to lock just the dentry (or dentries), not the whole
> > directory.  A "dentry_lookup()" operation will perform the locking and
> > lookup with a corresponding "done_dentry_lookup()" releasing the
> > resulting dentry and dropping any locks.
> > 
> > This done_dentry_lookup() can immediately be used to complete updates
> > started with kern_path_locked() (much as done_path_create() already
> > completes operations started with kern_path_create()).
> > 
> > So this patch adds done_dentry_lookup() and uses it where
> > kern_path_locked() is used.  It also adds done_dentry_lookup_return()
> > which returns a reference to the dentry rather than dropping it.  This
> > is a less common need in existing code, but still worth its own interface.
> > 
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  drivers/base/devtmpfs.c |  7 ++-----
> >  fs/bcachefs/fs-ioctl.c  |  3 +--
> >  fs/namei.c              | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/namei.h   |  3 +++
> >  kernel/audit_fsnotify.c |  9 ++++-----
> >  kernel/audit_watch.c    |  3 +--
> >  6 files changed, 49 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 31bfb3194b4c..47bee8166c8d 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -265,8 +265,7 @@ static int dev_rmdir(const char *name)
> >  	else
> >  		err = -EPERM;
> >  
> > -	dput(dentry);
> > -	inode_unlock(d_inode(parent.dentry));
> > +	done_dentry_lookup(dentry);
> >  	path_put(&parent);
> >  	return err;
> >  }
> > @@ -349,9 +348,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> >  		if (!err || err == -ENOENT)
> >  			deleted = 1;
> >  	}
> > -	dput(dentry);
> > -	inode_unlock(d_inode(parent.dentry));
> > -
> > +	done_dentry_lookup(dentry);
> >  	path_put(&parent);
> >  	if (deleted && strchr(nodename, '/'))
> >  		delete_path(nodename);
> > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> > index 4e72e654da96..8077ddf4ddc4 100644
> > --- a/fs/bcachefs/fs-ioctl.c
> > +++ b/fs/bcachefs/fs-ioctl.c
> > @@ -351,8 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> >  		d_invalidate(victim);
> >  	}
> >  err:
> > -	inode_unlock(dir);
> > -	dput(victim);
> > +	done_dentry_lookup(victim);
> >  	path_put(&path);
> >  	return ret;
> >  }
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 1c80445693d4..da160a01e23d 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1714,6 +1714,44 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> >  }
> >  EXPORT_SYMBOL(lookup_one_qstr_excl);
> >  
> > +/**
> > + * done_dentry_lookup - finish a lookup used for create/delete
> > + * @dentry:  the target dentry
> > + *
> > + * After locking the directory and lookup or validating a dentry
> > + * an attempt can be made to create (including link) or remove (including
> > + * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
> > + * unlock the parent directory and dput() the dentry.
> > + *
> > + * This interface allows a smooth transition from parent-dir based
> > + * locking to dentry based locking.
> > + */
> > +void done_dentry_lookup(struct dentry *dentry)
> > +{
> > +	inode_unlock(dentry->d_parent->d_inode);
> > +	dput(dentry);
> > +}
> > +EXPORT_SYMBOL(done_dentry_lookup);
> > +
> > +/**
> > + * done_dentry_lookup_return - finish a lookup sequence, returning the dentry
> > + * @dentry:  the target dentry
> > + *
> > + * After locking the directory and lookup or validating a dentry
> > + * an attempt can be made to create (including link) or remove (including
> > + * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
> > + * unlock the parent directory.  The dentry is returned for further use.
> > + *
> > + * This interface allows a smooth transition from parent-dir based
> > + * locking to dentry based locking.
> > + */
> > +struct dentry *done_dentry_lookup_return(struct dentry *dentry)
> > +{
> > +	inode_unlock(dentry->d_parent->d_inode);
> > +	return dentry;
> > +}
> > +EXPORT_SYMBOL(done_dentry_lookup_return);
> > +
> >  /**
> >   * lookup_fast - do fast lockless (but racy) lookup of a dentry
> >   * @nd: current nameidata
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 5d085428e471..e097f11587c9 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -81,6 +81,9 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
> >  					    struct qstr *name,
> >  					    struct dentry *base);
> >  
> > +void done_dentry_lookup(struct dentry *dentry);
> > +struct dentry *done_dentry_lookup_return(struct dentry *dentry);
> > +
> >  extern int follow_down_one(struct path *);
> >  extern int follow_down(struct path *path, unsigned int flags);
> >  extern int follow_up(struct path *);
> > diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> > index c565fbf66ac8..170836c3520f 100644
> > --- a/kernel/audit_fsnotify.c
> > +++ b/kernel/audit_fsnotify.c
> > @@ -85,8 +85,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
> >  	dentry = kern_path_locked(pathname, &path);
> >  	if (IS_ERR(dentry))
> >  		return ERR_CAST(dentry); /* returning an error */
> > -	inode = path.dentry->d_inode;
> > -	inode_unlock(inode);
> > +	inode = igrab(dentry->d_inode);
> 
> This is a little confusing. This patch changes "inode" from pointing to
> the parent inode to point to the child inode instead. That actually
> makes a bit more sense given the naming, but might best be done in a
> separate patch.

I could at least explain it properly the commit description.
This is part of my struggle with done_dentry_lookup_return().  Maybe I
should just use that and preserve the current use of dentry here....

Or maybe a separate patch as you say.  I'll ponder it.

> 
> > +	done_dentry_lookup(dentry);
> >  
> >  	audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
> >  	if (unlikely(!audit_mark)) {
> > @@ -97,17 +97,16 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
> >  	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group);
> >  	audit_mark->mark.mask = AUDIT_FS_EVENTS;
> >  	audit_mark->path = pathname;
> > -	audit_update_mark(audit_mark, dentry->d_inode);
> > +	audit_update_mark(audit_mark, inode);
> >  	audit_mark->rule = krule;
> >  
> > -	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
> > +	ret = fsnotify_add_inode_mark(&audit_mark->mark, path.dentry->d_inode, 0);
> >  	if (ret < 0) {
> >  		audit_mark->path = NULL;
> >  		fsnotify_put_mark(&audit_mark->mark);
> >  		audit_mark = ERR_PTR(ret);
> >  	}
> >  out:
> > -	dput(dentry);
> >  	path_put(&path);
> >  	return audit_mark;
> >  }
> > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> > index 0ebbbe37a60f..f6ecac2109d4 100644
> > --- a/kernel/audit_watch.c
> > +++ b/kernel/audit_watch.c
> > @@ -359,8 +359,7 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
> >  		watch->ino = d_backing_inode(d)->i_ino;
> >  	}
> >  
> > -	inode_unlock(d_backing_inode(parent->dentry));
> > -	dput(d);
> > +	done_dentry_lookup(d);
> >  	return 0;
> >  }
> >  
> 
> It looks right to me though.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
  2025-07-21 14:46   ` Jeff Layton
@ 2025-07-21 23:08     ` NeilBrown
  0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21 23:08 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Tue, 22 Jul 2025, Jeff Layton wrote:
> On Mon, 2025-07-21 at 17:59 +1000, NeilBrown wrote:
> > Proposed changes to directory-op locking will lock the dentry rather
> > than the whole directory.  So the dentry will need to be unlocked.
> > 
> > vfs_mkdir() consumes the dentry on error, so there will be no dentry to
> > be unlocked.
> > 
> > So this patch changes vfs_mkdir() to unlock on error as well as
> > releasing the dentry.  This requires various other functions in various
> > callers to also unlock on error.
> > 
> > At present this results in some clumsy code.  Once the transition to
> > dentry locking is complete the clumsiness will be gone.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  fs/cachefiles/namei.c    |  9 +++++----
> >  fs/ecryptfs/inode.c      |  3 ++-
> >  fs/namei.c               | 24 ++++++++++++++++--------
> >  fs/nfsd/nfs4recover.c    | 12 +++++-------
> >  fs/nfsd/vfs.c            | 12 ++++++++++--
> >  fs/overlayfs/dir.c       | 13 +++++++------
> >  fs/overlayfs/overlayfs.h |  1 +
> >  fs/overlayfs/super.c     |  5 +++--
> >  fs/xfs/scrub/orphanage.c |  2 +-
> >  9 files changed, 50 insertions(+), 31 deletions(-)
> > 
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index d1edb2ac3837..732d78911bed 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -131,8 +131,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> >  		ret = cachefiles_inject_write_error();
> >  		if (ret == 0)
> >  			subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
> > -		else
> > +		else {
> > +			/* vfs_mkdir() unlocks on failure so we must too */
> > +			inode_unlock(d_inode(dir));
> >  			subdir = ERR_PTR(ret);
> > +		}
> >  		if (IS_ERR(subdir)) {
> >  			trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
> >  						   cachefiles_trace_mkdir_error);
> > @@ -196,9 +199,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> >  	return ERR_PTR(-EBUSY);
> >  
> >  mkdir_error:
> > -	inode_unlock(d_inode(dir));
> > -	if (!IS_ERR(subdir))
> > -		dput(subdir);
> > +	done_dentry_lookup(subdir);
> >  	pr_err("mkdir %s failed with error %d\n", dirname, ret);
> >  	return ERR_PTR(ret);
> >  
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index abd954c6a14e..5d8cb042aa57 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -520,7 +520,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> >  				 lower_dentry, mode);
> >  	rc = PTR_ERR(lower_dentry);
> >  	if (IS_ERR(lower_dentry))
> > -		goto out;
> > +		goto out_unlocked;
> >  	rc = 0;
> >  	if (d_unhashed(lower_dentry))
> >  		goto out;
> > @@ -532,6 +532,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> >  	set_nlink(dir, lower_dir->i_nlink);
> >  out:
> >  	inode_unlock(lower_dir);
> > +out_unlocked:
> >  	if (d_really_is_negative(dentry))
> >  		d_drop(dentry);
> >  	return ERR_PTR(rc);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index da160a01e23d..950a0d0d54da 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1723,13 +1723,18 @@ EXPORT_SYMBOL(lookup_one_qstr_excl);
> >   * rmdir) a dentry.  After this, done_dentry_lookup() can be used to both
> >   * unlock the parent directory and dput() the dentry.
> >   *
> > + * If the dentry is an error - as can happen after vfs_mkdir() -
> > + * the unlock is skipped as unneeded.
> > + *
> >   * This interface allows a smooth transition from parent-dir based
> >   * locking to dentry based locking.
> >   */
> >  void done_dentry_lookup(struct dentry *dentry)
> >  {
> > -	inode_unlock(dentry->d_parent->d_inode);
> > -	dput(dentry);
> > +	if (!IS_ERR(dentry)) {
> > +		inode_unlock(dentry->d_parent->d_inode);
> > +		dput(dentry);
> > +	}
> 
> nit: could you introduce these versions of done_dentry_lookup() and
> done_dentry_lookup_return() in patch #2, even if not strictly needed
> yet? Better to introduce it as ERR_PTR-safe from the start. 

Maybe... an ERR_PTR-safe done_dentry_lookup() implies that you generally
don't need to unlock if you have an error.  But that isn't the case
until this patch.
But it is only a one-patch gap if I introduce that a bit earlier.  Maybe
it doesn't matter.

I'll try it and see what I think.

Thanks,
NeilBrown


> 
> >  }
> >  EXPORT_SYMBOL(done_dentry_lookup);
> >  
> > @@ -1742,12 +1747,16 @@ EXPORT_SYMBOL(done_dentry_lookup);
> >   * rmdir) a dentry.  After this, done_dentry_lookup_return() can be used to
> >   * unlock the parent directory.  The dentry is returned for further use.
> >   *
> > + * If the dentry is an error - as can happen after vfs_mkdir() -
> > + * the unlock is skipped as unneeded.
> > + *
> >   * This interface allows a smooth transition from parent-dir based
> >   * locking to dentry based locking.
> >   */
> >  struct dentry *done_dentry_lookup_return(struct dentry *dentry)
> >  {
> > -	inode_unlock(dentry->d_parent->d_inode);
> > +	if (!IS_ERR(dentry))
> > +		inode_unlock(dentry->d_parent->d_inode);
> >  	return dentry;
> >  }
> >  EXPORT_SYMBOL(done_dentry_lookup_return);
> > @@ -4210,9 +4219,7 @@ EXPORT_SYMBOL(kern_path_create);
> >  
> >  void done_path_create(struct path *path, struct dentry *dentry)
> >  {
> > -	if (!IS_ERR(dentry))
> > -		dput(dentry);
> > -	inode_unlock(path->dentry->d_inode);
> > +	done_dentry_lookup(dentry);
> >  	mnt_drop_write(path->mnt);
> >  	path_put(path);
> >  }
> > @@ -4375,7 +4382,8 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
> >   * negative or unhashes it and possibly splices a different one returning it,
> >   * the original dentry is dput() and the alternate is returned.
> >   *
> > - * In case of an error the dentry is dput() and an ERR_PTR() is returned.
> > + * In case of an error the dentry is dput(), the parent is unlocked, and
> > + * an ERR_PTR() is returned.
> >   */
> >  struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> >  			 struct dentry *dentry, umode_t mode)
> > @@ -4413,7 +4421,7 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> >  	return dentry;
> >  
> >  err:
> > -	dput(dentry);
> > +	done_dentry_lookup(dentry);
> >  	return ERR_PTR(error);
> >  }
> >  EXPORT_SYMBOL(vfs_mkdir);
> > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > index 82785db730d9..693fa95fa678 100644
> > --- a/fs/nfsd/nfs4recover.c
> > +++ b/fs/nfsd/nfs4recover.c
> > @@ -222,7 +222,8 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> >  	dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> >  	if (IS_ERR(dentry)) {
> >  		status = PTR_ERR(dentry);
> > -		goto out_unlock;
> > +		inode_unlock(d_inode(dir));
> > +		goto out;
> >  	}
> >  	if (d_really_is_positive(dentry))
> >  		/*
> > @@ -233,15 +234,12 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> >  		 * In the 4.0 case, we should never get here; but we may
> >  		 * as well be forgiving and just succeed silently.
> >  		 */
> > -		goto out_put;
> > +		goto out;
> >  	dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> >  	if (IS_ERR(dentry))
> >  		status = PTR_ERR(dentry);
> > -out_put:
> > -	if (!status)
> > -		dput(dentry);
> > -out_unlock:
> > -	inode_unlock(d_inode(dir));
> > +out:
> > +	done_dentry_lookup(dentry);
> >  	if (status == 0) {
> >  		if (nn->in_grace)
> >  			__nfsd4_create_reclaim_record_grace(clp, dname,
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index a21940cadede..e85195e858a2 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1489,7 +1489,9 @@ nfsd_check_ignore_resizing(struct iattr *iap)
> >  		iap->ia_valid &= ~ATTR_SIZE;
> >  }
> >  
> > -/* The parent directory should already be locked: */
> > +/* The parent directory should already be locked.  The lock
> > + * will be dropped on error.
> > + */
> >  __be32
> >  nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		   struct nfsd_attrs *attrs,
> > @@ -1555,8 +1557,11 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> >  
> >  out:
> > -	if (!IS_ERR(dchild))
> > +	if (!IS_ERR(dchild)) {
> > +		if (err)
> > +			inode_unlock(dirp);
> >  		dput(dchild);
> > +	}
> >  	return err;
> >  
> >  out_nfserr:
> > @@ -1613,6 +1618,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (err != nfs_ok)
> >  		goto out_unlock;
> >  	err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> > +	if (err)
> > +		/* lock will have been dropped */
> > +		return err;
> >  	fh_fill_post_attrs(fhp);
> >  out_unlock:
> >  	inode_unlock(dentry->d_inode);
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 30619777f0f6..74b52595ea0e 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -161,14 +161,17 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
> >  	goto out;
> >  }
> >  
> > +/* dir will be unlocked on return */
> >  struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
> >  			       struct dentry *newdentry, struct ovl_cattr *attr)
> >  {
> >  	struct inode *dir = parent->d_inode;
> >  	int err;
> >  
> > -	if (IS_ERR(newdentry))
> > +	if (IS_ERR(newdentry)) {
> > +		inode_unlock(dir);
> >  		return newdentry;
> > +	}
> >  
> >  	err = -ESTALE;
> >  	if (newdentry->d_inode)
> > @@ -213,11 +216,11 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
> >  	}
> >  out:
> >  	if (err) {
> > -		if (!IS_ERR(newdentry))
> > -			dput(newdentry);
> > +		done_dentry_lookup(newdentry);
> >  		return ERR_PTR(err);
> > +	} else {
> > +		return done_dentry_lookup_return(newdentry);
> >  	}
> > -	return newdentry;
> >  }
> >  
> >  struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> > @@ -227,7 +230,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> >  	inode_lock(workdir->d_inode);
> >  	ret = ovl_create_real(ofs, workdir,
> >  			      ovl_lookup_temp(ofs, workdir), attr);
> > -	inode_unlock(workdir->d_inode);
> >  	return ret;
> >  }
> >  
> > @@ -335,7 +337,6 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> >  				    ovl_lookup_upper(ofs, dentry->d_name.name,
> >  						     upperdir, dentry->d_name.len),
> >  				    attr);
> > -	inode_unlock(udir);
> >  	if (IS_ERR(newdentry))
> >  		return PTR_ERR(newdentry);
> >  
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 4f84abaa0d68..238c26142318 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -250,6 +250,7 @@ static inline struct dentry *ovl_do_mkdir(struct ovl_fs *ofs,
> >  
> >  	ret = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode);
> >  	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(ret));
> > +	/* Note: dir will have been unlocked on failure */
> >  	return ret;
> >  }
> >  
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 4afa91882075..df99a6fa17ef 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -328,11 +328,11 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> >  		}
> >  
> >  		work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> > -		inode_unlock(dir);
> >  		err = PTR_ERR(work);
> >  		if (IS_ERR(work))
> >  			goto out_err;
> >  
> > +		done_dentry_lookup_return(work);
> >  		/* Weird filesystem returning with hashed negative (kernfs)? */
> >  		err = -EINVAL;
> >  		if (d_really_is_negative(work))
> > @@ -623,7 +623,8 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> >  	child = ovl_lookup_upper(ofs, name, parent, len);
> >  	if (!IS_ERR(child) && !child->d_inode)
> >  		child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> > -	inode_unlock(parent->d_inode);
> > +	else
> > +		inode_unlock(parent->d_inode);
> >  	dput(parent);
> >  
> >  	return child;
> > diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> > index 9c12cb844231..c95bded4e8a7 100644
> > --- a/fs/xfs/scrub/orphanage.c
> > +++ b/fs/xfs/scrub/orphanage.c
> > @@ -170,7 +170,7 @@ xrep_orphanage_create(
> >  					     orphanage_dentry, 0750);
> >  		error = PTR_ERR(orphanage_dentry);
> >  		if (IS_ERR(orphanage_dentry))
> > -			goto out_unlock_root;
> > +			goto out_dput_root;
> >  	}
> >  
> >  	/* Not a directory? Bail out. */
> 
> This does make for some awkward code. Fortunately there aren't that
> many vfs_mkdir() callers.
> -- 
> Jeff Layton <jlayton@kernel.org>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] VFS: introduce dentry_lookup() and friends
  2025-07-21 10:20   ` Amir Goldstein
@ 2025-07-21 23:27     ` NeilBrown
  2025-07-23 15:17       ` Amir Goldstein
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2025-07-21 23:27 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Mon, 21 Jul 2025, Amir Goldstein wrote:
> On Mon, Jul 21, 2025 at 10:55 AM NeilBrown <neil@brown.name> wrote:
> >
> > dentry_lookup() combines locking the directory and performing a lookup
> > prior to a change to the directory.
> > Abstracting this prepares for changing the locking requirements.
> >
> > dentry_lookup_noperm() does the same without needing a mnt_idmap and
> > without checking permissions.  This is useful for internal filesystem
> > management (e.g.  creating virtual files in response to events) and in
> > other cases similar to lookup_noperm().
> >
> > dentry_lookup_hashed() also does no permissions checking and assumes
> > that the hash of the name has already been stored in the qstr.
> 
> That's a very confusing choice of name because _hashed() (to me) sounds
> like the opposite of d_unhashed() which is not at all the case.

True.  But maybe the confusion what already there.
You can "d_add()" a dentry and later "d_drop()" the dentry and if the
dentry isn't between those two operations, then it is "d_unhashed()"
which leaks out the implementation detail (hash table) for dentry
lookup. Maybe d_unhashed() should be d_added() with inverted meaning?

There is only one user of this interface outside of namei.c so I could
unexported to keep the confusion local.  That would mean
ksmbd_vfs_path_lookup() would hav to use dentry_lookup_noperm() which
would recalculate the hash which vfs_path_parent_lookup() already
calculated (and we cannot simply tell it not to bother calculating).
Actually it already uses lookup_noperm_unlocked() in the
don't-need-a-lock-branch which recalculates the hash.....

Would making that name static ease your concern?

> 
> > This is useful following filename_parentat().
> >
> > These are intended to be paired with done_dentry_lookup() which provides
> > the inverse of putting the dentry and unlocking.
> >
> > Like lookup_one_qstr_excl(), dentry_lookup() returns -ENOENT if
> > LOOKUP_CREATE was NOT given and the name cannot be found,, and returns
> > -EEXIST if LOOKUP_EXCL WAS given and the name CAN be found.
> >
> > These functions replace all uses of lookup_one_qstr_excl() in namei.c
> > except for those used for rename.
> >
> > Some of the variants should possibly be inlines in a header.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  fs/namei.c            | 158 ++++++++++++++++++++++++++++++------------
> >  include/linux/namei.h |   8 ++-
> >  2 files changed, 119 insertions(+), 47 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 950a0d0d54da..f292df61565a 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1714,17 +1714,98 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> >  }
> >  EXPORT_SYMBOL(lookup_one_qstr_excl);
> >
> > +/**
> > + * dentry_lookup_hashed - lookup and lock a name prior to dir ops
> > + * @last: the name in the given directory
> > + * @base: the directory in which the name is to be found
> > + * @lookup_flags: %LOOKUP_xxx flags
> > + *
> > + * The name is looked up and necessary locks are taken so that
> > + * the name can be created or removed.
> > + * The "necessary locks" are currently the inode node lock on @base.
> > + * The name @last is expected to already have the hash calculated.
> > + * No permission checks are performed.
> > + * Returns: the dentry, suitably locked, or an ERR_PTR().
> > + */
> > +struct dentry *dentry_lookup_hashed(struct qstr *last,
> > +                                   struct dentry *base,
> > +                                   unsigned int lookup_flags)
> > +{
> > +       struct dentry *dentry;
> > +
> > +       inode_lock_nested(base->d_inode, I_MUTEX_PARENT);
> > +
> > +       dentry = lookup_one_qstr_excl(last, base, lookup_flags);
> > +       if (IS_ERR(dentry))
> > +               inode_unlock(base->d_inode);
> > +       return dentry;
> > +}
> > +EXPORT_SYMBOL(dentry_lookup_hashed);
> 
> Observation:
> 
> This part could be factored out of
> __kern_path_locked()/kern_path_locked_negative()

This patch does exactly that....

> 
> If you do that in patch 2 while introducing done_dentry_lookup() then
> it also makes
> a lot of sense to balance the introduced done_dentry_lookup() with the
> factored out
> helper __dentry_lookup_locked() or whatever its name is.

I don't think I want a __dentry_lookup_locked().  The lock and the
lookup need to be tightly connected.
But maybe I cold introduce dentry_lookup_hashed() in patch 2 ...
Or maybe call it __dentry_lookup() ??

Thanks,
NeilBrown


> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations
  2025-07-21 11:32 ` [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations Amir Goldstein
@ 2025-07-21 23:48   ` NeilBrown
  0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2025-07-21 23:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Mon, 21 Jul 2025, Amir Goldstein wrote:
> On Mon, Jul 21, 2025 at 10:46 AM NeilBrown <neil@brown.name> wrote:
> >
> > Hi,
> >
> >  these patches (against vfs.all) primarily introduce new APIs for
> >  preparing dentries for create, remove, rename.  The goal is to
> >  centralise knowledge of how we do locking (currently by locking the
> >  directory) so that we can eventually change the mechanism (e.g.  to
> >  locking just the dentry).
> >
> >  Naming is difficult and I've changed my mind several times. :-)
> 
> Indeed it is.
> I generally like the done_ approach that you took.
> Few minor naming comments follow.
> 
> >
> >  The basic approach is to return a dentry which can be passed to
> >  vfs_create(), vfs_unlink() etc, and subsequently to release that
> >  dentry.  The closest analogue to this in the VFS is kern_path_create()
> >  which is paired with done_path_create(), though there is also
> >  kern_path_locked() which is paired with explicit inode_unlock() and
> >  dput().  So my current approach uses "done_" for finishing up.
> >
> >  I have:
> >    dentry_lookup() dentry_lookup_noperm() dentry_lookup_hashed()
> 
> As I wrote on the patch that introduces them I find dentry_lookup_hashed()
> confusing because the dentry is not hashed (only the hash is calculated).
> 
> Looking at another precedent of _noperm() vfs API we have:
> 
> vfs_setxattr()
>   __vfs_setxattr_locked()
>     __vfs_setxattr_noperm()
>       __vfs_setxattr()
> 
> Do I'd say for lack of better naming __dentry_lookup() could makes sense
> for the bare lock&dget and it could also be introduced earlier along with
> introducing done_dentry_lookup()
> 
> >    dentry_lookup_killable()
> >  paired with
> >    done_dentry_lookup()
> >
> >  and also
> >    rename_lookup() rename_lookup_noperm() rename_lookup_hashed()
> >  paired with
> >    done_rename_lookup()
> >  (these take a "struct renamedata *" to which some qstrs are added.
> >
> >  There is also "dentry_lock_in()" which is used instead of
> >  dentry_lookup() when you already have the dentry and want to lock it.
> >  So you "lock" it "in" a given parent.  I'm not very proud of this name,
> >  but I don't want to use "dentry_lock" as I want to save that for
> >  low-level locking primitives.
> 
> Very strange name :)

I wanted to encourage people to comment !!

> 
> What's wrong with dentry_lock_parent()?

That sounds like we are locking the parent, but we are conceptually locking
the dentry (Though at first we do that by locking the whole directory).

> 
> Although I think that using the verb _lock_ for locking and dget is
> actively confusing, so something along the lines of
> resume_dentry_lookup()/dentry_lookup_reacquire() might serve the
> readers of the code better.

Hmmm....  there is certainly potential there.  dentry_lookup_continue()
???

> 
> >
> >  There is also done_dentry_lookup_return() which doesn't dput() the
> >  dentry but returns it instread.  In about 1/6 of places where I need
> >  done_dentry_lookup() the code makes use of the dentry afterwards.  Only
> >  in half the places where done_dentry_lookup_return() is used is the
> >  returned value immediately returned by the calling function.  I could
> >  do a dget() before done_dentry_lookup(), but that looks awkward and I
> >  think having the _return version is justified.  I'm happy to hear other
> >  opinions.
> 
> The name is not very descriptive IMO, but I do not have a better suggestion.
> Unless you can describe it for the purpose that it is used for, e.g.
> yeild_dentry_lookup() that can be followed with resume_dentry_lookup(),
> but I do not know if those are your intentions for the return API.

The intention is that a few places (maybe a dozen) need to keep a
reference to the dentry after the operation completes.  Typically this
is a mkdir or a create.  The caller might then want to create another
file in the directory so it holds on to the dentry.
A fairly common case is that virtual filesystems hold an extra reference
to everything they create so the dcache becomes the primary store.

I suspect that in the latter case it is better to have an explicit
dget() with a comment saying "preserve in cache until explicitly
removed" or similar.
For the former case it might be better to introduce a new dentry
variable.

 dentry = dentry_lookup(....);
 dentry = vfs_mkdir(dir, dentry);
 if (!IS_ERR(dentry))
        returnval = dget(dentry);
 done_dentry_lookup(dentry)

 return returnval;

Maybe I should enhance dget() to pass through errors as well as NULL.

Maybe the above would look even better as

 struct dentry *dentry __free(lookup) = dentry_lookup(.....);
 /* handle error */
 dentry = vfs_mkdir(dir, dentry);
 return dget(dentry);

So I'll try dropping the _return version, enhancing dget, and adding a
DEFINE_FREE.

Thanks,
NeilBrown



> 
> Thanks,
> Amir.
> 
> >
> >  In order for this dentry-focussed API to work we need to have the
> >  dentry to unlock.  vfs_rmdir() currently consumes the dentry on
> >  failure, so we don't have it unless we clumsily keep a copy.  So an
> >  early patch changes vfs_rmdir() to both consume the dentry and drop the
> >  lock on failure.
> >
> >  After these new APIs are refined, agreed, and applied I will have a
> >  collection of patches to roll them out throughout the kernel.  Then we
> >  can start/continue discussing a new approach to locking which allows
> >  directory operations to proceed in parallel.
> >
> >  If you want a sneak peek at some of this future work - for context
> >  mostly - my current devel code is at https://github.com/neilbrown/linux.git
> >  in a branch "pdirops".  Be warned that a lot of the later code is under
> >  development, is known to be wrong, and doesn't even compile.  Not today
> >  anyway.  The rolling out of the new APIs is fairly mature though.
> >
> >  Please review and suggest better names, or tell me that my choices are adequate.
> >  And find the bugs in the code too :-)
> >
> >  I haven't cc:ed the maintains of the non-VFS code that the patches
> >  touch.  I can do that once the approach and names have been approved.
> >
> > Thanks,
> > NeilBrown
> >
> >
> >  [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
> >  [PATCH 2/7] VFS: introduce done_dentry_lookup()
> >  [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure.
> >  [PATCH 4/7] VFS: introduce dentry_lookup() and friends
> >  [PATCH 5/7] VFS: add dentry_lookup_killable()
> >  [PATCH 6/7] VFS: add rename_lookup()
> >  [PATCH 7/7] VFS: introduce dentry_lock_in()
> >
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] VFS: introduce dentry_lookup() and friends
  2025-07-21 23:27     ` NeilBrown
@ 2025-07-23 15:17       ` Amir Goldstein
  0 siblings, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2025-07-23 15:17 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-kernel

On Tue, Jul 22, 2025 at 1:27 AM NeilBrown <neil@brown.name> wrote:
>
> On Mon, 21 Jul 2025, Amir Goldstein wrote:
> > On Mon, Jul 21, 2025 at 10:55 AM NeilBrown <neil@brown.name> wrote:
> > >
> > > dentry_lookup() combines locking the directory and performing a lookup
> > > prior to a change to the directory.
> > > Abstracting this prepares for changing the locking requirements.
> > >
> > > dentry_lookup_noperm() does the same without needing a mnt_idmap and
> > > without checking permissions.  This is useful for internal filesystem
> > > management (e.g.  creating virtual files in response to events) and in
> > > other cases similar to lookup_noperm().
> > >
> > > dentry_lookup_hashed() also does no permissions checking and assumes
> > > that the hash of the name has already been stored in the qstr.
> >
> > That's a very confusing choice of name because _hashed() (to me) sounds
> > like the opposite of d_unhashed() which is not at all the case.
>
> True.  But maybe the confusion what already there.
> You can "d_add()" a dentry and later "d_drop()" the dentry and if the
> dentry isn't between those two operations, then it is "d_unhashed()"
> which leaks out the implementation detail (hash table) for dentry
> lookup. Maybe d_unhashed() should be d_added() with inverted meaning?
>
> There is only one user of this interface outside of namei.c so I could
> unexported to keep the confusion local.  That would mean
> ksmbd_vfs_path_lookup() would hav to use dentry_lookup_noperm() which
> would recalculate the hash which vfs_path_parent_lookup() already
> calculated (and we cannot simply tell it not to bother calculating).
> Actually it already uses lookup_noperm_unlocked() in the
> don't-need-a-lock-branch which recalculates the hash.....
>
> Would making that name static ease your concern?
>
> >
> > > This is useful following filename_parentat().
> > >
> > > These are intended to be paired with done_dentry_lookup() which provides
> > > the inverse of putting the dentry and unlocking.
> > >
> > > Like lookup_one_qstr_excl(), dentry_lookup() returns -ENOENT if
> > > LOOKUP_CREATE was NOT given and the name cannot be found,, and returns
> > > -EEXIST if LOOKUP_EXCL WAS given and the name CAN be found.
> > >
> > > These functions replace all uses of lookup_one_qstr_excl() in namei.c
> > > except for those used for rename.
> > >
> > > Some of the variants should possibly be inlines in a header.
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> > >  fs/namei.c            | 158 ++++++++++++++++++++++++++++++------------
> > >  include/linux/namei.h |   8 ++-
> > >  2 files changed, 119 insertions(+), 47 deletions(-)
> > >
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index 950a0d0d54da..f292df61565a 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -1714,17 +1714,98 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> > >  }
> > >  EXPORT_SYMBOL(lookup_one_qstr_excl);
> > >
> > > +/**
> > > + * dentry_lookup_hashed - lookup and lock a name prior to dir ops
> > > + * @last: the name in the given directory
> > > + * @base: the directory in which the name is to be found
> > > + * @lookup_flags: %LOOKUP_xxx flags
> > > + *
> > > + * The name is looked up and necessary locks are taken so that
> > > + * the name can be created or removed.
> > > + * The "necessary locks" are currently the inode node lock on @base.
> > > + * The name @last is expected to already have the hash calculated.
> > > + * No permission checks are performed.
> > > + * Returns: the dentry, suitably locked, or an ERR_PTR().
> > > + */
> > > +struct dentry *dentry_lookup_hashed(struct qstr *last,
> > > +                                   struct dentry *base,
> > > +                                   unsigned int lookup_flags)
> > > +{
> > > +       struct dentry *dentry;
> > > +
> > > +       inode_lock_nested(base->d_inode, I_MUTEX_PARENT);
> > > +
> > > +       dentry = lookup_one_qstr_excl(last, base, lookup_flags);
> > > +       if (IS_ERR(dentry))
> > > +               inode_unlock(base->d_inode);
> > > +       return dentry;
> > > +}
> > > +EXPORT_SYMBOL(dentry_lookup_hashed);
> >
> > Observation:
> >
> > This part could be factored out of
> > __kern_path_locked()/kern_path_locked_negative()
>
> This patch does exactly that....
>
> >
> > If you do that in patch 2 while introducing done_dentry_lookup() then
> > it also makes
> > a lot of sense to balance the introduced done_dentry_lookup() with the
> > factored out
> > helper __dentry_lookup_locked() or whatever its name is.
>
> I don't think I want a __dentry_lookup_locked().  The lock and the
> lookup need to be tightly connected.
> But maybe I cold introduce dentry_lookup_hashed() in patch 2 ...
> Or maybe call it __dentry_lookup() ??

__dentry_lookup() sounds good to me.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-07-23 15:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-21  7:59 [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations NeilBrown
2025-07-21  7:59 ` [PATCH 1/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
2025-07-21 13:06   ` Jeff Layton
2025-07-21  7:59 ` [PATCH 2/7] VFS: introduce done_dentry_lookup() NeilBrown
2025-07-21 13:39   ` Jeff Layton
2025-07-21 23:04     ` NeilBrown
2025-07-21  7:59 ` [PATCH 3/7] VFS: Change vfs_mkdir() to unlock on failure NeilBrown
2025-07-21 14:46   ` Jeff Layton
2025-07-21 23:08     ` NeilBrown
2025-07-21  8:00 ` [PATCH 4/7] VFS: introduce dentry_lookup() and friends NeilBrown
2025-07-21 10:20   ` Amir Goldstein
2025-07-21 23:27     ` NeilBrown
2025-07-23 15:17       ` Amir Goldstein
2025-07-21  8:00 ` [PATCH 5/7] VFS: add dentry_lookup_killable() NeilBrown
2025-07-21  8:00 ` [PATCH 6/7] VFS: add rename_lookup() NeilBrown
2025-07-21  8:00 ` [PATCH 7/7] VFS: introduce dentry_lock_in() NeilBrown
2025-07-21 11:32 ` [PATCH 0/7 RFC] New APIs for name lookup and lock for directory operations Amir Goldstein
2025-07-21 23:48   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).