[PATCH RFC 00/53] lift lookup out of exclusive lock for dir ops

From: NeilBrown @ 2026-03-12 21:11 UTC
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates
in a given directory.

There are a bunch of VFS patches which introduce some new APIs and
improve existing ones, then a bunch of per-filesystem changes which
adjust to meet new needs, often using the new APIs, then a final bunch
of VFS patches which discard some APIs that are no longer wanted, and
one (the second last) which makes the big change.  Some of the fs
patches don't depend on any preceding patch and if maintainers wanted
to take those early I certainly wouldn't object!  I've put a '*' next
to patches which I think can be taken at any time.

My longer term goal involves pushing the parent-directory locking down
into filesystems (which can then discard it if it isn't needed) and using
exclusive dentry locking in the VFS for all directory operations other
than readdir - which by its nature needs shared locking and will
continue to use the directory lock.

The VFS already has exclusive dentry locking for the limited case of
lookup.  Newly created dentries (when created by d_alloc_parallel()) are
exclusively locked using the DCACHE_PAR_LOOKUP bit.  They remain
exclusively locked until they are hashed as negative or positive dentries,
or they are discarded.

DCACHE_PAR_LOOKUP currently depends on a shared parent lock to exclude
directory modifying operations.  This patch set removes this dependency
so that d_alloc_parallel() can be called without locking and all
directory modifying operations receive either a hashed dentry or an
in-lookup dentry (they currently receive either a hashed or an unhashed
dentry, or sometimes an in-lookup dentry, for atomic_open only).

The cases where a filesystem can receive an in-lookup dentry are:
 - lookup.  Currently can receive in-lookup or unhashed.  After this patch
    set it always receives in-lookup.
 - atomic_open.  Currently can receive in-lookup or hashed-negative.
    This doesn't change with this patch set.
 - rename.  Currently can receive hashed or unhashed.  After this patch set
    it can also receive in-lookup where previously it would receive unhashed.
    This is only for the target of a rename over NFS.
 - link, mknod, mkdir, symlink.  Currently receive hashed-negative except for
    NFS, which notices the implied exclusive create and skips the lookup so
    the filesystem can receive unhashed-negative for the operation.

There are two particular needs to be addressed before we can use d_alloc_parallel()
outside of the directory lock.

1/ d_alloc_parallel() effects a blocking lock so lock ordering is important.
  If we are to take the directory lock *after* calling d_alloc_parallel() (and 
  still holding an in-lookup dentry, as happens at least when ->atomic_open
  is called) then we must never call d_alloc_parallel() while holding the
  directory lock, even a shared lock.
  This particularly affects readdir as several filesystems prime the dcache
  with readdir results and so use d_alloc_parallel() in the ->iterate_shared
  handler, which will now have deadlock potential.  To address this we
  introduce d_alloc_noblock() which fails rather than blocking.

  A few other cases of potential lock inversion exist.  These are
  addressed by dropping the directory lock when it is safe to do so
  before calling d_alloc_parallel().  This requires the addition of
  LOOKUP_SHARED so that ->lookup knows how the parent is locked.  This
  is ugly but is gone by the end of the series.  After the locking is
  rearranged in the second last patch, ->lookup is only ever called
  with a shared lock.
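
  The difference between a blocking and a non-blocking allocation can be
  sketched with a userspace toy model.  This is not kernel code: the
  nb_* names are hypothetical, and the -EWOULDBLOCK return value is an
  assumption made for illustration, since the actual return convention of
  d_alloc_noblock() is defined in patch 05 and not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: one name in one directory, tracked as a state. */
enum nb_state { NB_ABSENT, NB_IN_LOOKUP, NB_HASHED };

struct nb_slot {
	enum nb_state state;
};

/*
 * Hypothetical non-blocking allocation.
 * Returns 0 if the caller now owns a fresh in-lookup entry,
 * 1 if an already-hashed entry was found, and -EWOULDBLOCK
 * (an assumed error code) if someone else holds the name
 * in-lookup.  A blocking variant would sleep in that last
 * case, which is what risks deadlock under the directory lock.
 */
static int nb_alloc_noblock(struct nb_slot *slot)
{
	assert(slot != NULL);

	switch (slot->state) {
	case NB_ABSENT:
		slot->state = NB_IN_LOOKUP;
		return 0;
	case NB_HASHED:
		return 1;
	case NB_IN_LOOKUP:
		return -EWOULDBLOCK;
	}
	return -EINVAL;
}
```

  A readdir-style caller in this model would simply skip cache priming
  for a name when the call fails, rather than waiting for the other
  lookup to finish.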

2/ As d_alloc_parallel() will be able to run without the directory lock,
  holding that lock exclusively is not enough to protect some dcache
  manipulations.  In particular, several filesystems d_drop() a dentry
  and (possibly) re-hash it.  This will no longer be safe as
  d_alloc_parallel() could run while the dentry was dropped, would find
  that name doesn't exist in the dcache, and would create a new dentry
  leading to two uncoordinated dentries with the same name.

  It will still be safe to d_drop() a dentry after the operation has
  completed, whether in success or failure.  But d_drop()ing before that
  is best avoided.  An early d_drop() that isn't followed by a rehash is
  not clearly problematic for a filesystem which still uses parent locking
  (as all do at present) but it is good to discourage that pattern now.

  This is addressed, in part, by changing d_splice_alias() to be able to
  instantiate any negative dentry, whether hashed, unhashed, or
  in-lookup.  This removes the need for d_drop() in most cases.
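
  The drop-then-rehash problem described above can be replayed
  deterministically in a small userspace model.  This is only a sketch:
  the race_* names are hypothetical, the "directory" is a fixed array, and
  the interleaving is written out by hand rather than produced by real
  threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RACE_MAX 4

struct race_dentry {
	const char *name;
	bool hashed;
};

struct race_dir {
	struct race_dentry d[RACE_MAX];
	int n;
};

/* Find a hashed entry by name; unhashed entries are invisible to lookup. */
static struct race_dentry *race_find(struct race_dir *dir, const char *name)
{
	for (int i = 0; i < dir->n; i++)
		if (dir->d[i].hashed && strcmp(dir->d[i].name, name) == 0)
			return &dir->d[i];
	return NULL;
}

/* Stand-in for a lookup that allocates a new entry when none is found. */
static struct race_dentry *race_lookup_or_create(struct race_dir *dir,
						 const char *name)
{
	struct race_dentry *de = race_find(dir, name);

	if (de)
		return de;
	assert(dir->n < RACE_MAX);
	de = &dir->d[dir->n++];
	de->name = name;
	de->hashed = true;
	return de;
}

/* Count hashed entries carrying the given name. */
static int race_count(struct race_dir *dir, const char *name)
{
	int count = 0;

	for (int i = 0; i < dir->n; i++)
		if (dir->d[i].hashed && strcmp(dir->d[i].name, name) == 0)
			count++;
	return count;
}
```

  Clearing ->hashed on an entry (the model's d_drop()), running
  race_lookup_or_create() for the same name, and then setting ->hashed
  again (the rehash) leaves race_count() at 2, i.e. two uncoordinated
  entries for one name.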

New APIs added are:

 - d_alloc_noblock - see patch 05 for details
 - d_duplicate - patch 06

Removed APIs:

 - d_alloc (made local to the VFS, see patch 47)
 - d_rehash
 - d_add
 - lookup_one
 - lookup_noperm

Changed APIs:

 - d_alloc_parallel - no longer requires a wait_queue_head_t
 - d_splice_alias - now works with in-lookup dentry
 - d_alloc_name - now works with ->d_hash

d_alloc_name() should be used with d_make_persistent().  These don't require
VFS locking as the filesystem doesn't permit create/remove via VFS calls,
and provides its own locking to avoid duplicate names.

d_splice_alias() should *always* be used:
  in ->lookup
  in ->iterate_shared for cache priming
  in ->atomic_open, possibly via a call to ->lookup
  in ->mkdir unless d_instantiate_new() can be used
  in ->link ->symlink ->mknod if ->lookup skips LOOKUP_CREATE|LOOKUP_EXCL

Thanks for reading this far!  I've been testing NFS but haven't tried
anything else yet.  As well as the normal review of details I'd love to
know if I've missed any important consequences of the locking change.
It is a big conceptual change and there could easily be surprising
implications.

Thanks,
NeilBrown

 [PATCH 01/53] VFS: fix various typos in documentation for
 [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup
 [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
 [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
 [PATCH 05/53] VFS: introduce d_alloc_noblock()
 [PATCH 06/53] VFS: add d_duplicate()
 [PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
 [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in
*[PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from
 [PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
 [PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
 [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in
 [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
 [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
 [PATCH 15/53] nfs: use d_duplicate()
*[PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
*[PATCH 17/53] coda: don't d_drop() early.
 [PATCH 18/53] shmem: use d_duplicate()
*[PATCH 19/53] afs: use d_time instead of d_fsdata
*[PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
 [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
 [PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename()
 [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
 [PATCH 24/53] afs: use d_duplicate()
*[PATCH 25/53] smb/client: use d_time to store a timestamp in dentry,
*[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
*[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
 [PATCH 28/53] smb/client: Use d_alloc_noblock() in
*[PATCH 29/53] exfat: simplify exfat_lookup()
*[PATCH 30/53] configfs: remove d_add() calls before
 [PATCH 31/53] configfs: stop using d_add().
*[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
*[PATCH 33/53] ext4: use on-stack dentries in
 [PATCH 34/53] tracefs: stop using d_add().
 [PATCH 35/53] cephfs: stop using d_add().
*[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
 [PATCH 37/53] cephfs: Use d_alloc_noblock() in
 [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
 [PATCH 39/53] ecryptfs: stop using d_add().
 [PATCH 40/53] gfs2: stop using d_add().
 [PATCH 41/53] libfs: stop using d_add().
 [PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
 [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
 [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in
 [PATCH 45/53] efivarfs: use d_alloc_name()
 [PATCH 46/53] Remove references to d_add() in documentation and
 [PATCH 47/53] VFS: make d_alloc() local to VFS.
 [PATCH 48/53] VFS: remove d_add()
 [PATCH 49/53] VFS: remove d_rehash()
 [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
 [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
 [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
 [PATCH 53/53] VFS: remove LOOKUP_SHARED


* [PATCH 01/53] VFS: fix various typos in documentation for start_creating start_removing etc
From: NeilBrown @ 2026-03-12 21:11 UTC

From: NeilBrown <neil@brown.name>

Various typo fixes.
start_creating_dentry() now documented as *creating*, not *removing* the
entry.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  8 +++----
 fs/namei.c                            | 30 +++++++++++++--------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index d02aa57e4477..560b473e02d0 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1203,7 +1203,7 @@ will fail-safe.
 
 ---
 
-** mandatory**
+**mandatory**
 
 lookup_one(), lookup_one_unlocked(), lookup_one_positive_unlocked() now
 take a qstr instead of a name and len.  These, not the "one_len"
@@ -1212,7 +1212,7 @@ that filesysmtem, through a mount point - which will have a mnt_idmap.
 
 ---
 
-** mandatory**
+**mandatory**
 
 Functions try_lookup_one_len(), lookup_one_len(),
 lookup_one_len_unlocked() and lookup_positive_unlocked() have been
@@ -1229,7 +1229,7 @@ already been performed such as after vfs_path_parent_lookup()
 
 ---
 
-** mandatory**
+**mandatory**
 
 d_hash_and_lookup() is no longer exported or available outside the VFS.
 Use try_lookup_noperm() instead.  This adds name validation and takes
@@ -1370,7 +1370,7 @@ lookup_one_qstr_excl() is no longer exported - use start_creating() or
 similar.
 ---
 
-** mandatory**
+**mandatory**
 
 lock_rename(), lock_rename_child(), unlock_rename() are no
 longer available.  Use start_renaming() or similar.
diff --git a/fs/namei.c b/fs/namei.c
index 77189335bbcc..6ffb8367b1cf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2936,8 +2936,8 @@ struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
  * end_dirop - signal completion of a dirop
  * @de: the dentry which was returned by start_dirop or similar.
  *
- * If the de is an error, nothing happens. Otherwise any lock taken to
- * protect the dentry is dropped and the dentry itself is release (dput()).
+ * If the @de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
  */
 void end_dirop(struct dentry *de)
 {
@@ -3254,7 +3254,7 @@ EXPORT_SYMBOL(lookup_one_unlocked);
  * the i_rwsem itself if necessary.  If a fatal signal is pending or
  * delivered, it will return %-EINTR if the lock is needed.
  *
- * Returns: A dentry, possibly negative, or
+ * Returns: A positive dentry, or
  *	   - same errors as lookup_one_unlocked() or
  *	   - ERR_PTR(-EINTR) if a fatal signal is pending.
  */
@@ -3376,7 +3376,7 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
 EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
 
 /**
- * start_creating - prepare to create a given name with permission checking
+ * start_creating - prepare to access or create a given name with permission checking
  * @idmap:  idmap of the mount
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
@@ -3408,8 +3408,8 @@ EXPORT_SYMBOL(start_creating);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
- * an object from a directory.  Permission checking (MAY_EXEC) is performed
+ * Locks are taken and a lookup is performed prior to removing an object
+ * from a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
  * If the name doesn't exist, an error is returned.
@@ -3435,7 +3435,7 @@ EXPORT_SYMBOL(start_removing);
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
  *
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
  * an object in a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
@@ -3464,7 +3464,7 @@ EXPORT_SYMBOL(start_creating_killable);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
  * an object from a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
@@ -3494,7 +3494,7 @@ EXPORT_SYMBOL(start_removing_killable);
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
  *
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
  * an object in a directory.
  *
  * If the name already exists, a positive dentry is returned.
@@ -3517,7 +3517,7 @@ EXPORT_SYMBOL(start_creating_noperm);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
  * an object from a directory.
  *
  * If the name doesn't exist, an error is returned.
@@ -3538,11 +3538,11 @@ struct dentry *start_removing_noperm(struct dentry *parent,
 EXPORT_SYMBOL(start_removing_noperm);
 
 /**
- * start_creating_dentry - prepare to create a given dentry
- * @parent: directory from which dentry should be removed
- * @child:  the dentry to be removed
+ * start_creating_dentry - prepare to access or create a given dentry
+ * @parent: directory of dentry
+ * @child:  the dentry to be prepared
  *
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
  * the validity of the dentry is checked: correct parent and still hashed.
  *
  * If the dentry is valid and negative a reference is taken and
@@ -3575,7 +3575,7 @@ EXPORT_SYMBOL(start_creating_dentry);
  * @parent: directory from which dentry should be removed
  * @child:  the dentry to be removed
  *
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
  * the validity of the dentry is checked: correct parent and still hashed.
  *
  * If the dentry is valid and positive, a reference is taken and
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup dentries
From: NeilBrown @ 2026-03-12 21:11 UTC

From: NeilBrown <neil@brown.name>

We currently have three interfaces for attaching existing inodes to
normal filesystems(*).
- d_add() requires an unhashed or in-lookup dentry and doesn't handle
  splicing in case a directory already has a dentry.
- d_instantiate() requires a hashed dentry, and also doesn't handle
  splicing.
- d_splice_alias() requires unhashed or in-lookup and does handle
  splicing, and can return an alternate dentry.

So there is no interface that supports both hashed and in-lookup, which
is what ->atomic_open needs to deal with.

Some filesystems check for in-lookup in their atomic_open and if found,
perform a ->lookup and can subsequently use d_instantiate() if the
dentry is still negative.  Others d_drop() the dentry so they can use
d_splice_alias().

This last will cause a problem for proposed changes to locking which
require the dentry to remain hashed while an operation proceeds on it.

There is also no interface which splices a directory (which might
already have a dentry) to a hashed dentry.  Filesystems which need to do
this d_drop() first.

So with this patch d_splice_alias() can handle hashed, unhashed, or
in-lookup dentries.  This makes it suitable for ->lookup, ->atomic_open,
and ->mkdir.
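
The state handling that makes this possible can be summarised with a
userspace toy model.  It only mirrors the branch structure of the
__d_add() hunk below (in-lookup, else unhashed, else already hashed); the
sp_* names are hypothetical and none of the real locking, sequence
counting or alias splicing is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the three states a negative dentry may arrive in. */
enum sp_state { SP_IN_LOOKUP, SP_UNHASHED, SP_HASHED };

struct sp_dentry {
	enum sp_state state;
	const void *inode;	/* NULL means the dentry is negative */
};

/*
 * Attach an (optional) inode to a negative dentry, whatever state it
 * arrived in.  Afterwards the dentry is always hashed.
 */
static void sp_attach(struct sp_dentry *de, const void *inode)
{
	assert(de != NULL);
	assert(de->inode == NULL);	/* only negative dentries accepted */

	if (de->state == SP_IN_LOOKUP) {
		/* leave the in-lookup state, then hash */
		de->state = SP_HASHED;
	} else if (de->state == SP_UNHASHED) {
		de->state = SP_HASHED;
	}
	/* an already hashed dentry simply stays hashed */

	if (inode)
		de->inode = inode;
}
```

The point of the sketch is that no caller needs to d_drop() first just
to satisfy a precondition on the dentry's state.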

As a side effect d_add() will also now handle hashed dentries, but
future patches will remove d_add() as there is no benefit in having it as
well as the others.

__d_add() currently contains code that is identical to
__d_instantiate(), so the former is changed to call the latter, and both
d_add() and d_instantiate() call __d_add().

* There is also d_make_persistent() for filesystems which are
  dcache-based and don't support mkdir, create etc, and
  d_instantiate_new() for newly created inodes that are still locked.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/vfs.rst |  4 ++--
 fs/dcache.c                       | 31 ++++++++++++-------------------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..d8df0a84cdba 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -507,8 +507,8 @@ otherwise noted.
 	dentry before the first mkdir returns.
 
 	If there is any chance this could happen, then the new inode
-	should be d_drop()ed and attached with d_splice_alias().  The
-	returned dentry (if any) should be returned by ->mkdir().
+	should be attached with d_splice_alias().  The returned
+	dentry (if any) should be returned by ->mkdir().
 
 ``rmdir``
 	called by the rmdir(2) system call.  Only required if you want
diff --git a/fs/dcache.c b/fs/dcache.c
index 7ba1801d8132..2a100c616576 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2001,7 +2001,6 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
  * (or otherwise set) by the caller to indicate that it is now
  * in use by the dcache.
  */
- 
 void d_instantiate(struct dentry *entry, struct inode * inode)
 {
 	BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
@@ -2755,18 +2754,14 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
 		d_wait = __d_lookup_unhash(dentry);
+		__d_rehash(dentry);
+	} else if (d_unhashed(dentry)) {
+		__d_rehash(dentry);
 	}
 	if (unlikely(ops))
 		d_set_d_op(dentry, ops);
-	if (inode) {
-		unsigned add_flags = d_flags_for_inode(inode);
-		hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
-		raw_write_seqcount_begin(&dentry->d_seq);
-		__d_set_inode_and_type(dentry, inode, add_flags);
-		raw_write_seqcount_end(&dentry->d_seq);
-		fsnotify_update_flags(dentry);
-	}
-	__d_rehash(dentry);
+	if (inode)
+		__d_instantiate(dentry, inode);
 	if (dir)
 		end_dir_add(dir, n, d_wait);
 	spin_unlock(&dentry->d_lock);
@@ -3066,8 +3061,6 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 
-	BUG_ON(!d_unhashed(dentry));
-
 	if (!inode)
 		goto out;
 
@@ -3116,6 +3109,8 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
  * @inode:  the inode which may have a disconnected dentry
  * @dentry: a negative dentry which we want to point to the inode.
  *
+ * @dentry must be negative and may be in-lookup or unhashed or hashed.
+ *
  * If inode is a directory and has an IS_ROOT alias, then d_move that in
  * place of the given dentry and return it, else simply d_add the inode
  * to the dentry and return NULL.
@@ -3123,16 +3118,14 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
  * If a non-IS_ROOT directory is found, the filesystem is corrupt, and
  * we should error out: directories can't have multiple aliases.
  *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
+ * This should be used to return the result of ->lookup() and to
+ * instantiate the result of ->mkdir(), is often useful for
+ * ->atomic_open, and may be used to instantiate other objects.
  *
  * If a dentry was found and moved, then it is returned.  Otherwise NULL
- * is returned.  This matches the expected return value of ->lookup.
+ * is returned.  This matches the expected return value of ->lookup and
+ * ->mkdir.
  *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
  */
 struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
 {
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
From: NeilBrown @ 2026-03-12 21:11 UTC

From: NeilBrown <neil@brown.name>

efivarfs is similar to other filesystems which use d_alloc_name(), but
it cannot use d_alloc_name() as it has a ->d_hash function.

The only problem with using ->d_hash if available is that it can return
an error, but d_alloc_name() cannot.  If we document that d_alloc_name()
cannot be used when ->d_hash returns an error, then any filesystem which
has a safe ->d_hash can safely use d_alloc_name().

So enhance d_alloc_name() to check for a ->d_hash function
and document that this is not permitted if the ->d_hash function can
fail (which efivarfs_d_hash() cannot).

Also document locking requirements for use.
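
The contract can be illustrated with a userspace sketch.  Everything
here is hypothetical (the nm_* names, the string hash, the two example
callbacks); it only shows the shape of "compute a default hash, let an
optional filesystem callback override it, and give up if that callback
reports an error".  The real patch additionally warns in that case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

struct nm_qstr {
	const char *name;
	unsigned long hash;
};

/* Optional filesystem hash callback: returns 0 or a negative error. */
typedef int (*nm_hash_fn)(struct nm_qstr *q);

/* Default hash (a simple multiplicative string hash, chosen arbitrarily). */
static unsigned long nm_default_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Example callback that cannot fail: case-insensitive hashing. */
static int nm_hash_nocase(struct nm_qstr *q)
{
	unsigned long h = 5381;

	for (const char *s = q->name; *s; s++)
		h = h * 33 + (unsigned long)tolower((unsigned char)*s);
	q->hash = h;
	return 0;
}

/* Example callback that always fails: unsuitable for this allocator. */
static int nm_hash_failing(struct nm_qstr *q)
{
	(void)q;
	return -1;
}

/* Allocate a name; returns NULL if allocation or the callback fails. */
static struct nm_qstr *nm_alloc_name(const char *name, nm_hash_fn fs_hash)
{
	struct nm_qstr *q;

	assert(name != NULL);
	q = malloc(sizeof(*q));
	if (!q)
		return NULL;
	q->name = name;
	q->hash = nm_default_hash(name);
	if (fs_hash && fs_hash(q) != 0) {
		free(q);
		return NULL;
	}
	return q;
}
```

With nm_hash_nocase, "ABC" and "abc" hash identically; with
nm_hash_failing the allocator returns NULL, which is why a filesystem
with a fallible hash should not use this style of helper.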

This is a step towards eventually deprecating d_alloc().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 2a100c616576..6dfc2c7110ba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1878,12 +1878,29 @@ struct dentry *d_alloc_pseudo(struct super_block *sb, const struct qstr *name)
 	return dentry;
 }
 
+/**
+ * d_alloc_name: allocate a dentry for use in a dcache-based filesystem.
+ * @parent: dentry of the parent for the dentry
+ * @name: name of the dentry
+ *
+ * d_alloc_name() allocates a dentry without any protection against races.
+ * It should only be used in directories that do not support create/rename/link
+ * inode operations.  The result is typically passed to d_make_persistent().
+ *
+ * This must NOT be used by filesystems which provide a d_hash() function
+ * which can return an error.
+ */
 struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 {
 	struct qstr q;
 
 	q.name = name;
 	q.hash_len = hashlen_string(parent, name);
+	if (parent->d_flags & DCACHE_OP_HASH) {
+		int err = parent->d_op->d_hash(parent, &q);
+		if (WARN_ON_ONCE(err))
+			return NULL;
+	}
 	return d_alloc(parent, &q);
 }
 EXPORT_SYMBOL(d_alloc_name);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
From: NeilBrown @ 2026-03-12 21:11 UTC

From: NeilBrown <neil@brown.name>

d_alloc_parallel() currently requires a wait_queue_head to be passed in.
This must have a lifetime which extends until the lookup is completed.

Future patches will make more use of d_alloc_parallel() and use it in
contexts where having an on-stack wait_queue_head is not convenient.
For example lookup_one_qstr_excl() can only use d_alloc_parallel() if it
accepts a wait_queue_head passed in by the caller.

The interface would be easier to use if this need were removed.  Rather
than passing a wait_queue_head into lookup_one_qstr_excl() we can let
d_alloc_parallel() manage the wait_queue_head entirely itself.

This patch replaces the on-stack wqs with a global array of wqs which
are used as needed.  A wq is NOT assigned when a dentry is first
created but only when a second thread attempts to use the same name and
so is forced to wait.  At this moment a wq is chosen using a hash of the
dentry pointer and that wq is assigned to ->d_wait.  The ->d_lock is
then dropped and the task waits.

When the dentry is finally moved out of "in_lookup" a wake up is only
sent if ->d_wait is not NULL.  This avoids an (uncontended) spin
lock/unlock which saves a couple of atomic operations in a common case.

The wake up passes the dentry that the wake up is for as the "key" and
the wake function will only wake tasks waiting on the same key.  This means
that when these global waitqueues are shared (which is inevitable
though unlikely to be frequent), a task will not be woken prematurely.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  6 +++
 fs/afs/dir_silly.c                    |  4 +-
 fs/dcache.c                           | 78 ++++++++++++++++++++++-----
 fs/fuse/readdir.c                     |  3 +-
 fs/namei.c                            |  6 +--
 fs/nfs/dir.c                          |  6 +--
 fs/nfs/unlink.c                       |  3 +-
 fs/proc/base.c                        |  3 +-
 fs/proc/proc_sysctl.c                 |  3 +-
 fs/smb/client/readdir.c               |  3 +-
 include/linux/dcache.h                |  3 +-
 include/linux/nfs_xdr.h               |  1 -
 12 files changed, 81 insertions(+), 38 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 560b473e02d0..6a507c508ccf 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1375,3 +1375,9 @@ similar.
 lock_rename(), lock_rename_child(), unlock_rename() are no
 longer available.  Use start_renaming() or similar.
 
+---
+
+**mandatory**
+
+d_alloc_parallel() no longer requires a waitqueue_head.  It uses one
+from an internal table when needed.
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index a748fd133faf..982bb6ec15f0 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -248,13 +248,11 @@ int afs_silly_iput(struct dentry *dentry, struct inode *inode)
 	struct dentry *alias;
 	int ret;
 
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
 	_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
 
 	down_read(&dvnode->rmdir_lock);
 
-	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq);
+	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name);
 	if (IS_ERR(alias)) {
 		up_read(&dvnode->rmdir_lock);
 		return 0;
diff --git a/fs/dcache.c b/fs/dcache.c
index 6dfc2c7110ba..c80406bfa0d8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2199,8 +2199,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		return found;
 	}
 	if (d_in_lookup(dentry)) {
-		found = d_alloc_parallel(dentry->d_parent, name,
-					dentry->d_wait);
+		found = d_alloc_parallel(dentry->d_parent, name);
 		if (IS_ERR(found) || !d_in_lookup(found)) {
 			iput(inode);
 			return found;
@@ -2210,7 +2209,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		if (!found) {
 			iput(inode);
 			return ERR_PTR(-ENOMEM);
-		} 
+		}
 	}
 	res = d_splice_alias(inode, found);
 	if (res) {
@@ -2576,6 +2575,46 @@ void d_rehash(struct dentry * entry)
 }
 EXPORT_SYMBOL(d_rehash);
 
+#define PAR_LOOKUP_WQ_BITS	8
+#define PAR_LOOKUP_WQS (1 << PAR_LOOKUP_WQ_BITS)
+static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS] __cacheline_aligned;
+
+static int __init par_wait_init(void)
+{
+	int i;
+
+	for (i = 0; i < PAR_LOOKUP_WQS; i++)
+		init_waitqueue_head(&par_wait_table[i]);
+	return 0;
+}
+fs_initcall(par_wait_init);
+
+struct par_wait_key {
+	struct dentry *de;
+	struct wait_queue_entry wqe;
+};
+
+static int d_wait_wake_fn(struct wait_queue_entry *wq_entry,
+			  unsigned mode, int sync, void *key)
+{
+	struct par_wait_key *pwk = container_of(wq_entry,
+						 struct par_wait_key, wqe);
+	if (pwk->de == key)
+		return default_wake_function(wq_entry, mode, sync, key);
+	return 0;
+}
+
+static inline void d_wake_waiters(struct wait_queue_head *d_wait,
+				  struct dentry *dentry)
+{
+	/* ->d_wait is only set if some thread is actually waiting.
+	 * If we find it is NULL - the common case - then there was no
+	 * contention and there are no waiters to be woken.
+	 */
+	if (d_wait)
+		__wake_up(d_wait, TASK_NORMAL, 0, dentry);
+}
+
 static inline unsigned start_dir_add(struct inode *dir)
 {
 	preempt_disable_nested();
@@ -2588,31 +2627,42 @@ static inline unsigned start_dir_add(struct inode *dir)
 }
 
 static inline void end_dir_add(struct inode *dir, unsigned int n,
-			       wait_queue_head_t *d_wait)
+			       wait_queue_head_t *d_wait, struct dentry *de)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
 	preempt_enable_nested();
-	if (wq_has_sleeper(d_wait))
-		wake_up_all(d_wait);
+	d_wake_waiters(d_wait, de);
 }
 
 static void d_wait_lookup(struct dentry *dentry)
 {
 	if (d_in_lookup(dentry)) {
-		DECLARE_WAITQUEUE(wait, current);
-		add_wait_queue(dentry->d_wait, &wait);
+		struct par_wait_key wk = {
+			.de = dentry,
+			.wqe = {
+				.private = current,
+				.func = d_wait_wake_fn,
+			},
+		};
+		struct wait_queue_head *wq;
+
+		if (!dentry->d_wait)
+			dentry->d_wait = &par_wait_table[hash_ptr(dentry,
+								  PAR_LOOKUP_WQ_BITS)];
+		wq = dentry->d_wait;
+		add_wait_queue(wq, &wk.wqe);
 		do {
 			set_current_state(TASK_UNINTERRUPTIBLE);
 			spin_unlock(&dentry->d_lock);
 			schedule();
 			spin_lock(&dentry->d_lock);
 		} while (d_in_lookup(dentry));
+		remove_wait_queue(wq, &wk.wqe);
 	}
 }
 
 struct dentry *d_alloc_parallel(struct dentry *parent,
-				const struct qstr *name,
-				wait_queue_head_t *wq)
+				const struct qstr *name)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2625,6 +2675,7 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		return ERR_PTR(-ENOMEM);
 
 	new->d_flags |= DCACHE_PAR_LOOKUP;
+	new->d_wait = NULL;
 	spin_lock(&parent->d_lock);
 	new->d_parent = dget_dlock(parent);
 	hlist_add_head(&new->d_sib, &parent->d_children);
@@ -2715,7 +2766,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		return dentry;
 	}
 	rcu_read_unlock();
-	new->d_wait = wq;
 	hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
 	hlist_bl_unlock(b);
 	return new;
@@ -2753,7 +2803,7 @@ static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
 void __d_lookup_unhash_wake(struct dentry *dentry)
 {
 	spin_lock(&dentry->d_lock);
-	wake_up_all(__d_lookup_unhash(dentry));
+	d_wake_waiters(__d_lookup_unhash(dentry), dentry);
 	spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL(__d_lookup_unhash_wake);
@@ -2780,7 +2830,7 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
 	if (inode)
 		__d_instantiate(dentry, inode);
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, d_wait, dentry);
 	spin_unlock(&dentry->d_lock);
 	if (inode)
 		spin_unlock(&inode->i_lock);
@@ -2964,7 +3014,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_end(&dentry->d_seq);
 
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, d_wait, target);
 
 	if (dentry->d_parent != old_parent)
 		spin_unlock(&dentry->d_parent->d_lock);
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index c2aae2eef086..f588252891af 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -160,7 +160,6 @@ static int fuse_direntplus_link(struct file *file,
 	struct inode *dir = d_inode(parent);
 	struct fuse_conn *fc;
 	struct inode *inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	int epoch;
 
 	if (!o->nodeid) {
@@ -197,7 +196,7 @@ static int fuse_direntplus_link(struct file *file,
 	dentry = d_lookup(parent, &name);
 	if (!dentry) {
 retry:
-		dentry = d_alloc_parallel(parent, &name, &wq);
+		dentry = d_alloc_parallel(parent, &name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
diff --git a/fs/namei.c b/fs/namei.c
index 6ffb8367b1cf..d31c3db7eb5e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1891,13 +1891,12 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 {
 	struct dentry *dentry, *old;
 	struct inode *inode = dir->d_inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	/* Don't go there if it's already dead */
 	if (unlikely(IS_DEADDIR(inode)))
 		return ERR_PTR(-ENOENT);
 again:
-	dentry = d_alloc_parallel(dir, name, &wq);
+	dentry = d_alloc_parallel(dir, name);
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
@@ -4408,7 +4407,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -4417,7 +4415,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	dentry = d_lookup(dir, &nd->last);
 	for (;;) {
 		if (!dentry) {
-			dentry = d_alloc_parallel(dir, &nd->last, &wq);
+			dentry = d_alloc_parallel(dir, &nd->last);
 			if (IS_ERR(dentry))
 				return dentry;
 		}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2402f57c8e7d..52e7656195ec 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -727,7 +727,6 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		unsigned long dir_verifier)
 {
 	struct qstr filename = QSTR_INIT(entry->name, entry->len);
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct dentry *dentry;
 	struct dentry *alias;
 	struct inode *inode;
@@ -756,7 +755,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 	dentry = d_lookup(parent, &filename);
 again:
 	if (!dentry) {
-		dentry = d_alloc_parallel(parent, &filename, &wq);
+		dentry = d_alloc_parallel(parent, &filename);
 		if (IS_ERR(dentry))
 			return;
 	}
@@ -2107,7 +2106,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		    struct file *file, unsigned open_flags,
 		    umode_t mode)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct nfs_open_context *ctx;
 	struct dentry *res;
 	struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -2163,7 +2161,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		d_drop(dentry);
 		switched = true;
 		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name, &wq);
+					  &dentry->d_name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 		if (unlikely(!d_in_lookup(dentry)))
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index df3ca4669df6..43ea897943c0 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -124,7 +124,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
 	struct dentry *alias;
 
 	down_read_non_owner(&NFS_I(dir)->rmdir_sem);
-	alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
+	alias = d_alloc_parallel(dentry->d_parent, &data->args.name);
 	if (IS_ERR(alias)) {
 		up_read_non_owner(&NFS_I(dir)->rmdir_sem);
 		return 0;
@@ -185,7 +185,6 @@ nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
 
 	data->cred = get_current_cred();
 	data->res.dir_attr = &data->dir_attr;
-	init_waitqueue_head(&data->wq);
 
 	status = -EBUSY;
 	spin_lock(&dentry->d_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4eec684baca9..070c0d58b2da 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2129,8 +2129,7 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx,
 
 	child = try_lookup_noperm(&qname, dir);
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			goto end_instantiate;
 		if (d_in_lookup(child)) {
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 49ab74e0bfde..04a382178c65 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -692,8 +692,7 @@ static bool proc_sys_fill_cache(struct file *file,
 
 	child = d_lookup(dir, &qname);
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			return false;
 		if (d_in_lookup(child)) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 8615a8747b7f..47f5d620b750 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -73,7 +73,6 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
 	bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions;
 	bool reparse_need_reval = false;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	int rc;
 
 	cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
@@ -105,7 +104,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 		    (fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
 			return;
 
-		dentry = d_alloc_parallel(parent, name, &wq);
+		dentry = d_alloc_parallel(parent, name);
 	}
 	if (IS_ERR(dentry))
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 898c60d21c92..c6440c626a0f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -244,8 +244,7 @@ extern void d_delete(struct dentry *);
 /* allocate/de-allocate */
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
-extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
-					wait_queue_head_t *);
+extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..1acc2479cb38 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1740,7 +1740,6 @@ struct nfs_unlinkdata {
 	struct nfs_removeargs args;
 	struct nfs_removeres res;
 	struct dentry *dentry;
-	wait_queue_head_t wq;
 	const struct cred *cred;
 	struct nfs_fattr dir_attr;
 	long timeout;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 05/53] VFS: introduce d_alloc_noblock()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (3 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 06/53] VFS: add d_duplicate() NeilBrown
                   ` (49 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Several filesystems use the results of readdir to prime the dcache.
These filesystems use d_alloc_parallel() which can block if there is a
concurrent lookup.  Blocking in that case is pointless as the lookup
will add info to the dcache and there is no value in the readdir waiting
to see if it should add the info too.

Also these calls to d_alloc_parallel() are made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So this patch introduces d_alloc_noblock() which doesn't block but
instead returns ERR_PTR(-EWOULDBLOCK).  Filesystems that prime the
dcache (smb/client, nfs, fuse, cephfs) can now use that and ignore
-EWOULDBLOCK errors as harmless.
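
A minimal userspace model of the intended caller pattern, assuming
nothing beyond what is described above: the model_* helpers are
hypothetical stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR() and for
d_alloc_noblock(), and the dentry is reduced to a single state field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095

/* Stand-ins for the kernel's error-pointer helpers. */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

enum model_state { MODEL_HASHED, MODEL_IN_LOOKUP };

struct model_dentry {
	enum model_state state;
};

/*
 * Model of the non-blocking variant: if another lookup is still in
 * progress on the name, fail at once instead of waiting for it.
 */
static struct model_dentry *model_alloc_noblock(struct model_dentry *existing)
{
	if (existing->state == MODEL_IN_LOOKUP)
		return model_err_ptr(-EWOULDBLOCK);
	return existing;
}

/*
 * Model of readdir priming one name.
 * Returns 1 if a dentry was obtained, 0 if the name was skipped because
 * a concurrent lookup will fill it in, or a negative errno otherwise.
 */
static int model_prime_one(struct model_dentry *existing)
{
	struct model_dentry *d = model_alloc_noblock(existing);

	if (model_is_err(d)) {
		if (model_ptr_err(d) == -EWOULDBLOCK)
			return 0;	/* harmless: skip this entry */
		return (int)model_ptr_err(d);
	}
	return 1;
}
```

Skipping the entry loses nothing in this model, since the concurrent
lookup is the one that completes the dentry.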

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 82 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h |  1 +
 2 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c80406bfa0d8..f4d7d200bc46 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2661,8 +2661,16 @@ static void d_wait_lookup(struct dentry *dentry)
 	}
 }
 
-struct dentry *d_alloc_parallel(struct dentry *parent,
-				const struct qstr *name)
+/* What to do when __d_alloc_parallel finds a d_in_lookup dentry */
+enum alloc_para {
+	ALLOC_PARA_WAIT,
+	ALLOC_PARA_FAIL,
+};
+
+static inline
+struct dentry *__d_alloc_parallel(struct dentry *parent,
+				  const struct qstr *name,
+				  enum alloc_para how)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2745,7 +2753,18 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		 * wait for them to finish
 		 */
 		spin_lock(&dentry->d_lock);
-		d_wait_lookup(dentry);
+		if (d_in_lookup(dentry))
+			switch (how) {
+			case ALLOC_PARA_FAIL:
+				spin_unlock(&dentry->d_lock);
+				dput(new);
+				dput(dentry);
+				return ERR_PTR(-EWOULDBLOCK);
+			case ALLOC_PARA_WAIT:
+				d_wait_lookup(dentry);
+				/* ... and continue */
+			}
+
 		/*
 		 * it's not in-lookup anymore; in principle we should repeat
 		 * everything from dcache lookup, but it's likely to be what
@@ -2774,8 +2793,65 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 	dput(dentry);
 	goto retry;
 }
+
+/**
+ * d_alloc_parallel() - allocate a new dentry and ensure uniqueness
+ * @parent: dentry of the parent
+ * @name:   name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup(), then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_parallel() waits for
+ * that lookup to complete before returning the dentry and then ensures the
+ * match is still valid.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error %-ENOMEM.
+ */
+struct dentry *d_alloc_parallel(struct dentry *parent,
+				const struct qstr *name)
+{
+	return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
+}
 EXPORT_SYMBOL(d_alloc_parallel);
 
+/**
+ * d_alloc_noblock() - find or allocate a new dentry
+ * @parent: dentry of the parent
+ * @name:   name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup() then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_noblock()
+ * returns with error %-EWOULDBLOCK.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error
+ *    %-ENOMEM or %-EWOULDBLOCK.
+ */
+struct dentry *d_alloc_noblock(struct dentry *parent,
+					struct qstr *name)
+{
+	return __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
+}
+EXPORT_SYMBOL(d_alloc_noblock);
+
 /*
  * - Unhash the dentry
  * - Retrieve and clear the waitqueue head in dentry
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c6440c626a0f..3cb70b3398f0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -245,6 +245,7 @@ extern void d_delete(struct dentry *);
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
+extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 06/53] VFS: add d_duplicate()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (4 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 05/53] VFS: introduce d_alloc_noblock() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 07/53] VFS: Add LOOKUP_SHARED flag NeilBrown
                   ` (48 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Occasionally a single operation can require two sub-operations on the
same name, and it is important that a d_alloc_parallel() (once that can
be run unlocked) does not create another dentry with the same name
between the operations.

Two examples:
1/ rename where the target name (a positive dentry) needs to be
  "silly-renamed" to a temporary name so it will remain available on the
  server (NFS and AFS).  Here the same name needs to be the subject
  of one rename, and the target of another.
2/ rename where the subject needs to be replaced with a white-out
  (shmemfs).  Here the same name needs to be the target of a rename
  and the target of a mknod()

In both cases the original dentry is renamed to something else, and a
replacement is instantiated, possibly as the target of d_move(), possibly
by d_instantiate().

Currently d_alloc() is used to create the dentry and the exclusive lock
on the parent ensures no other dentry is created.  When
d_alloc_parallel() is moved out of the parent lock, this will no longer
be sufficient.  In particular if the original is renamed away before the
new is instantiated, there is a window where d_alloc_parallel() could
create another name.  "silly-rename" does work in this order.  shmemfs
whiteout doesn't open this hole but is essentially the same pattern and
should use the same approach.

The new d_duplicate() creates an in-lookup dentry with the same name as
the original dentry, which must be hashed.  There is no need to check if
an in-lookup dentry exists with the same name as d_alloc_parallel() will
never try to add one while the hashed dentry exists.  Once the new
in-lookup is created, d_alloc_parallel() will find it and wait for it to
complete, then use it.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 52 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index f4d7d200bc46..c12319097d6e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1832,6 +1832,58 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 }
 EXPORT_SYMBOL(d_alloc);
 
+/**
+ * d_duplicate() - duplicate a dentry for combined atomic operation
+ * @dentry: the dentry to duplicate
+ *
+ * Some rename operations need to be combined with another operation
+ * inside the filesystem.
+ * 1/ A cluster filesystem when renaming to an in-use file might need to
+ *   first "silly-rename" that target out of the way before the main rename
+ * 2/ A filesystem that supports white-out might want to create a whiteout
+ *   in place of the file being moved.
+ *
+ * For this they need two dentries which temporarily have the same name,
+ * before one is renamed.  d_duplicate() provides for this.  Given a
+ * positive hashed dentry, it creates a second in-lookup dentry.
+ * Because the original dentry exists, no other thread will try to
+ * create an in-lookup dentry, so there can be no race in this create.
+ *
+ * The caller should d_move() the original to a new name, often via a
+ * rename request, and should call d_lookup_done() on the newly created
+ * dentry.  If the new dentry is instantiated, the old one MUST either
+ * be moved or dropped.
+ *
+ * Parent must be locked.
+ *
+ * Returns: an in-lookup dentry, or an error.
+ */
+struct dentry *d_duplicate(struct dentry *dentry)
+{
+	unsigned int hash = dentry->d_name.hash;
+	struct dentry *parent = dentry->d_parent;
+	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+	struct dentry *new = __d_alloc(parent->d_sb, &dentry->d_name);
+
+	if (unlikely(!new))
+		return ERR_PTR(-ENOMEM);
+
+	new->d_flags |= DCACHE_PAR_LOOKUP;
+	new->d_wait = NULL;
+	spin_lock(&parent->d_lock);
+	new->d_parent = dget_dlock(parent);
+	hlist_add_head(&new->d_sib, &parent->d_children);
+	if (parent->d_flags & DCACHE_DISCONNECTED)
+		new->d_flags |= DCACHE_DISCONNECTED;
+	spin_unlock(&parent->d_lock);
+
+	hlist_bl_lock(b);
+	hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
+	hlist_bl_unlock(b);
+	return new;
+}
+EXPORT_SYMBOL(d_duplicate);
+
 struct dentry *d_alloc_anon(struct super_block *sb)
 {
 	return __d_alloc(sb, NULL);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3cb70b3398f0..2a3ebd368ed9 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -247,6 +247,7 @@ extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
+struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
 					  const struct dentry_operations *);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (5 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 06/53] VFS: add d_duplicate() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
                   ` (47 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Some ->lookup handlers will need to drop and retake the parent lock, so
they can safely use d_alloc_parallel().

->lookup can be called with the parent lock either exclusive or shared.

A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.

This is rather ugly, but will be gone by the end of the series when
->lookup is *always* called with a shared lock on the parent.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 7 ++++---
 include/linux/namei.h | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d31c3db7eb5e..eed388ee8a30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1928,7 +1928,7 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
 	struct inode *inode = dir->d_inode;
 	struct dentry *res;
 	inode_lock_shared(inode);
-	res = __lookup_slow(name, dir, flags);
+	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
 	inode_unlock_shared(inode);
 	return res;
 }
@@ -1942,7 +1942,7 @@ static struct dentry *lookup_slow_killable(const struct qstr *name,
 
 	if (inode_lock_shared_killable(inode))
 		return ERR_PTR(-EINTR);
-	res = __lookup_slow(name, dir, flags);
+	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
 	inode_unlock_shared(inode);
 	return res;
 }
@@ -4407,6 +4407,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
+	unsigned int shared_flag = (op->open_flag & O_CREAT) ? 0 : LOOKUP_SHARED;
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -4474,7 +4475,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 
 	if (d_in_lookup(dentry)) {
 		struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
-							     nd->flags);
+							     nd->flags | shared_flag);
 		d_lookup_done(dentry);
 		if (unlikely(res)) {
 			if (IS_ERR(res)) {
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ad6dd9987b9..b3346a513d8f 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -37,8 +37,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_CREATE		BIT(17)	/* ... in object creation */
 #define LOOKUP_EXCL		BIT(18)	/* ... in target must not exist */
 #define LOOKUP_RENAME_TARGET	BIT(19)	/* ... in destination of rename() */
+#define LOOKUP_SHARED		BIT(20) /* Parent lock is held shared */
 
-/* 4 spare bits for intent */
+/* 3 spare bits for intent */
 
 /* Scoping flags for lookup. */
 #define LOOKUP_NO_SYMLINKS	BIT(24) /* No symlink crossing. */
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (6 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 07/53] VFS: Add LOOKUP_SHARED flag NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
                   ` (46 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.

We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as the dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.

So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.
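
The drop/retake pattern can be sketched with a toy lock in userspace.
This is only a model under stated assumptions: model_dir_lock and the
model_* functions are hypothetical, there are no threads involved, and
MODEL_LOOKUP_SHARED merely mirrors the flag bit used in this series.

```c
#include <assert.h>

#define MODEL_LOOKUP_SHARED	(1u << 20)

/* Toy directory lock: counts shared holders and one exclusive holder. */
struct model_dir_lock {
	int readers;
	int writer;
};

/* Take the lock in the mode that @flags says the caller uses. */
static void model_lock_dir(struct model_dir_lock *l, unsigned int flags)
{
	if (flags & MODEL_LOOKUP_SHARED) {
		assert(!l->writer);
		l->readers++;
	} else {
		assert(!l->writer && !l->readers);
		l->writer = 1;
	}
}

/* Release the lock in the mode that @flags says the caller holds. */
static void model_unlock_dir(struct model_dir_lock *l, unsigned int flags)
{
	if (flags & MODEL_LOOKUP_SHARED) {
		assert(l->readers > 0);
		l->readers--;
	} else {
		assert(l->writer);
		l->writer = 0;
	}
}

/*
 * Drop the lock around a step that may block, then retake it in the
 * same mode.  Returns the number of holders seen while it was dropped.
 */
static int model_blocking_step(struct model_dir_lock *l, unsigned int flags)
{
	int held;

	model_unlock_dir(l, flags);
	held = l->readers + l->writer;	/* the blocking work would go here */
	model_lock_dir(l, flags);
	return held;
}
```

In a model like this, any return path taken between the drop and the
retake has to restore the lock too, because the caller expects to get
the lock back in the mode it held on entry.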

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 18 +++++++++++++++++-
 fs/xfs/xfs_iops.c      |  3 ++-
 include/linux/dcache.h |  3 ++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c12319097d6e..a1219b446b74 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2225,6 +2225,7 @@ EXPORT_SYMBOL(d_obtain_root);
  * @dentry: the negative dentry that was passed to the parent's lookup func
  * @inode:  the inode case-insensitive lookup has found
  * @name:   the case-exact name to be associated with the returned dentry
+ * @shared: %true if lookup was performed with LOOKUP_SHARED
  *
  * This is to avoid filling the dcache with case-insensitive names to the
  * same inode, only the actual correct case is stored in the dcache for
@@ -2237,7 +2238,7 @@ EXPORT_SYMBOL(d_obtain_root);
  * the exact case, and return the spliced entry.
  */
 struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
-			struct qstr *name)
+			struct qstr *name, bool shared)
 {
 	struct dentry *found, *res;
 
@@ -2250,6 +2251,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		iput(inode);
 		return found;
 	}
+	/*
+	 * We are holding parent lock and so don't want to wait for a
+	 * d_in_lookup() dentry.  We can safely drop the parent lock and
+	 * reclaim it as we have exclusive access to dentry as it is
+	 * d_in_lookup() (so ->d_parent is stable) and we are near the
+	 * end of ->lookup() and will shortly drop the lock anyway.
+	 */
+	if (shared)
+		inode_unlock_shared(d_inode(dentry->d_parent));
+	else
+		inode_unlock(d_inode(dentry->d_parent));
 	if (d_in_lookup(dentry)) {
 		found = d_alloc_parallel(dentry->d_parent, name);
 		if (IS_ERR(found) || !d_in_lookup(found)) {
@@ -2263,6 +2275,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 			return ERR_PTR(-ENOMEM);
 		}
 	}
+	if (shared)
+		inode_lock_shared(d_inode(dentry->d_parent));
+	else
+		inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
 	res = d_splice_alias(inode, found);
 	if (res) {
 		d_lookup_done(found);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 208543e57eda..ec19d3ec7cf0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -35,6 +35,7 @@
 #include <linux/security.h>
 #include <linux/iversion.h>
 #include <linux/fiemap.h>
+#include <linux/namei.h> // for LOOKUP_SHARED
 
 /*
  * Directories have different lock order w.r.t. mmap_lock compared to regular
@@ -369,7 +370,7 @@ xfs_vn_ci_lookup(
 	/* else case-insensitive match... */
 	dname.name = ci_name.name;
 	dname.len = ci_name.len;
-	dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+	dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));
 	kfree(ci_name.name);
 	return dentry;
 }
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 2a3ebd368ed9..a97eb151d9db 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -251,7 +251,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
 					  const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+				bool);
 extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
 			const struct qstr *name);
 extern struct dentry *d_find_any_alias(struct inode *inode);
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (7 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 10/53] nfs: use d_splice_alias() in nfs_link() NeilBrown
                   ` (45 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

It is important that two non-create NFS "open"s of a negative dentry
don't race.  They both have only a shared lock on i_rwsem and so could
run concurrently, but they might both try to call d_splice_alias() at
the same time which is confusing at best.

nfs_atomic_open() currently avoids this by discarding the negative
dentry and creating a new one using d_alloc_parallel().  Only one thread
can successfully get the d_in_lookup() dentry, the other will wait for
the first to finish, and can use the result of that first lookup.

A proposed locking change inverts the order between i_rwsem and
d_alloc_parallel() so it will not be safe to call d_alloc_parallel()
while holding i_rwsem - even shared.

We can achieve the same effect by causing ->d_revalidate to invalidate a
negative dentry when LOOKUP_OPEN is set.  Doing this is consistent with
the "close to open" caching semantics of NFS which requires the server
to be queried whenever opening a file - cached information must not be
trusted.

With this change to ->d_revalidate (implemented in nfs_neg_need_reval) we
can be sure that we have exclusive access to any dentry that reaches
nfs_atomic_open().  Either O_CREAT was requested and so the parent is
locked exclusively, or the dentry will have DCACHE_PAR_LOOKUP set.

This means that the d_drop() and d_alloc_parallel() calls in
nfs_atomic_open() are no longer needed to provide exclusion.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 30 +++++++-----------------------
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 52e7656195ec..3033cc5ce12f 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1657,6 +1657,13 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 {
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
+	if (flags & LOOKUP_OPEN)
+		/* close-to-open semantics require we go to server
+		 * on each open.  By invalidating the dentry we
+		 * also ensure nfs_atomic_open() always has exclusive
+		 * access to the dentry.
+		 */
+		return 0;
 	if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONEG)
 		return 1;
 	/* Case insensitive server? Revalidate negative dentries */
@@ -2112,7 +2119,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct inode *inode;
 	unsigned int lookup_flags = 0;
 	unsigned long dir_verifier;
-	bool switched = false;
 	int created = 0;
 	int err;
 
@@ -2157,17 +2163,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		attr.ia_size = 0;
 	}
 
-	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
-		d_drop(dentry);
-		switched = true;
-		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name);
-		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
-		if (unlikely(!d_in_lookup(dentry)))
-			return finish_no_open(file, dentry);
-	}
-
 	ctx = create_nfs_open_context(dentry, open_flags, file);
 	err = PTR_ERR(ctx);
 	if (IS_ERR(ctx))
@@ -2210,10 +2205,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 	put_nfs_open_context(ctx);
 out:
-	if (unlikely(switched)) {
-		d_lookup_done(dentry);
-		dput(dentry);
-	}
 	return err;
 
 no_open:
@@ -2236,13 +2227,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 			res = ERR_PTR(-EOPENSTALE);
 		}
 	}
-	if (switched) {
-		d_lookup_done(dentry);
-		if (!res)
-			res = dentry;
-		else
-			dput(dentry);
-	}
 	return finish_no_open(file, res);
 }
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (8 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 11/53] nfs: don't d_drop() before d_splice_alias() NeilBrown
                   ` (44 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

When filename_linkat() calls filename_create() which ultimately
calls ->lookup, the flags LOOKUP_CREATE|LOOKUP_EXCL are passed.
nfs_lookup() treats this as an exclusive create (which it is) and
skips the lookup on the server, leaving the dentry unchanged.

Currently that means nfs_link() can get a hashed dentry (if the name was
already in the cache) or an unhashed dentry (if it wasn't). As none of
d_add(), d_instantiate(), d_splice_alias() could handle both of these,
nfs_link() calls d_drop() and then d_add().

Recent changes to d_splice_alias() mean that it *can* work with either
hashed or unhashed dentries.  Future changes to locking mean that it
will be unsafe to d_drop() a dentry while an operation (in this case
"link()") is still ongoing.

So change to use d_splice_alias(), and not to d_drop() until an error is
detected (as in that case we can't be sure what is actually on the server).

Also update the comment for nfs_is_exclusive_create() to note that
link(), mkdir(), mknod(), symlink() all appear as exclusive creates.
Those other than link() already used d_splice_alias() via
nfs_add_or_obtain().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3033cc5ce12f..a188b09c9a54 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1571,6 +1571,9 @@ static int nfs_check_verifier(struct inode *dir, struct dentry *dentry,
 /*
  * Use intent information to check whether or not we're going to do
  * an O_EXCL create using this path component.
+ * Note that link(), mkdir(), mknod(), symlink() all appear as
+ * exclusive creation.  Regular file creation could be distinguished
+ * with LOOKUP_OPEN.
  */
 static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
 {
@@ -2677,14 +2680,15 @@ nfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
 		old_dentry, dentry);
 
 	trace_nfs_link_enter(inode, dir, dentry);
-	d_drop(dentry);
 	if (S_ISREG(inode->i_mode))
 		nfs_sync_inode(inode);
 	error = NFS_PROTO(dir)->link(inode, dir, &dentry->d_name);
 	if (error == 0) {
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 		ihold(inode);
-		d_add(dentry, inode);
+		d_splice_alias(inode, dentry);
+	} else {
+		d_drop(dentry);
 	}
 	trace_nfs_link_exit(inode, dir, dentry, error);
 	return error;
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (9 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 10/53] nfs: use d_splice_alias() in nfs_link() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:11 ` [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
                   ` (43 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

nfs_add_or_obtain() is used, often via nfs_instantiate(), to attach a
newly created inode to the appropriate dentry - or to provide an
alternate dentry.
It has to drop the dentry first, which is problematic for proposed
locking changes.

As d_splice_alias() now works with hashed dentries, the d_drop() is no
longer needed.

However we still d_drop() on error as the status of the name is
uncertain.

nfs_open_and_get_state() is only used for files so we should be able to
use d_instantiate().  However as that depends on the server for
correctness, it is safer to stay with the current code pattern and use
d_splice_alias() there too.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c      | 3 +--
 fs/nfs/nfs4proc.c | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a188b09c9a54..f92ea11aea44 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2330,8 +2330,6 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	struct dentry *d;
 	int error;
 
-	d_drop(dentry);
-
 	if (fhandle->size == 0) {
 		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
 					       fhandle, fattr);
@@ -2352,6 +2350,7 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	dput(parent);
 	return d;
 out_error:
+	d_drop(dentry);
 	d = ERR_PTR(error);
 	goto out;
 }
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 91bcf67bd743..a4ee0c0b4567 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3099,7 +3099,6 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
 	nfs_set_verifier(dentry, dir_verifier);
 	if (d_really_is_negative(dentry)) {
 		struct dentry *alias;
-		d_drop(dentry);
 		alias = d_splice_alias(igrab(state->inode), dentry);
 		/* d_splice_alias() can't fail here - it's a non-directory */
 		if (alias) {
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (10 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 11/53] nfs: don't d_drop() before d_splice_alias() NeilBrown
@ 2026-03-12 21:11 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
                   ` (42 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

When the open attempted by nfs_atomic_open() fails with -ENOENT we
currently d_drop() the dentry and then re-add it (d_splice_alias()) with
a NULL inode.
This drop-and-re-add will not work with proposed locking changes.

As d_splice_alias() now supports hashed dentries, we don't need the
d_drop() until it is determined that some other error has occurred.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index f92ea11aea44..ffba4de3df01 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2179,7 +2179,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
-		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
 			if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
@@ -2188,7 +2187,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 				dir_verifier = nfs_save_change_attribute(dir);
 			nfs_set_verifier(dentry, dir_verifier);
 			d_splice_alias(NULL, dentry);
-			break;
+			goto out;
 		case -EISDIR:
 		case -ENOTDIR:
 			goto no_open;
@@ -2200,6 +2199,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		default:
 			break;
 		}
+		d_drop(dentry);
 		goto out;
 	}
 	file->f_mode |= FMODE_CAN_ODIRECT;
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (11 preceding siblings ...)
  2026-03-12 21:11 ` [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename NeilBrown
                   ` (41 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

NFS uses the results of readdir to prime the dcache.  Using
d_alloc_parallel() can block if there is a concurrent lookup.  Blocking
in that case is pointless as the lookup will add info to the dcache and
there is no value in the readdir waiting to see if it should add the
info too.

Also this call to d_alloc_parallel() is made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So change to use d_alloc_noblock(), and use try_lookup_noperm() rather
than full_name_hash() and d_lookup().

Signed-off-by: NeilBrown <neil@brown.name>
---
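For illustration only, a user-space sketch of the "look up, else allocate
without blocking, else skip" flow described above.  Everything named
toy_* is hypothetical, and the behaviour modelled for d_alloc_noblock()
(returning -EWOULDBLOCK when another lookup of the same name is in
flight) is an assumption based on how it is used in this series, not a
statement of its real contract.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 8

enum toy_state { TOY_IN_LOOKUP, TOY_HASHED };

struct toy_dentry {
	const char *name;
	enum toy_state state;
};

struct toy_dir {
	struct toy_dentry slots[TOY_SLOTS];
	size_t used;
};

/* Find the entry for @name in any state, or NULL. */
struct toy_dentry *toy_find(struct toy_dir *dir, const char *name)
{
	size_t i;

	for (i = 0; i < dir->used; i++)
		if (strcmp(dir->slots[i].name, name) == 0)
			return &dir->slots[i];
	return NULL;
}

/* Stand-in for try_lookup_noperm(): only fully hashed entries are visible. */
struct toy_dentry *toy_try_lookup(struct toy_dir *dir, const char *name)
{
	struct toy_dentry *d = toy_find(dir, name);

	return (d && d->state == TOY_HASHED) ? d : NULL;
}

/*
 * Stand-in for d_alloc_noblock() (assumed behaviour): never sleeps.
 * Returns 0 with *out set to a new in-lookup entry or an existing hashed
 * one, -EWOULDBLOCK if another lookup of @name is in flight, or -ENOMEM
 * if the table is full.
 */
int toy_alloc_noblock(struct toy_dir *dir, const char *name,
		      struct toy_dentry **out)
{
	struct toy_dentry *d = toy_find(dir, name);

	if (d) {
		if (d->state == TOY_IN_LOOKUP)
			return -EWOULDBLOCK;
		*out = d;
		return 0;
	}
	if (dir->used == TOY_SLOTS)
		return -ENOMEM;
	d = &dir->slots[dir->used++];
	d->name = name;
	d->state = TOY_IN_LOOKUP;
	*out = d;
	return 0;
}

/* Prime the cache from a readdir result; give up rather than wait. */
int toy_prime(struct toy_dir *dir, const char *name)
{
	struct toy_dentry *d = toy_try_lookup(dir, name);
	int err;

	if (!d) {
		err = toy_alloc_noblock(dir, name, &d);
		if (err)
			return err;	/* e.g. -EWOULDBLOCK: let the lookup win */
	}
	assert(d != NULL);
	if (d->state == TOY_IN_LOOKUP)
		d->state = TOY_HASHED;	/* filled in from readdir data */
	return 0;
}
```

In this model, priming a name that is already being looked up returns
-EWOULDBLOCK immediately and leaves the entry for the concurrent lookup
to complete.
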
 fs/nfs/dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ffba4de3df01..4b73ec59bbcc 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -750,15 +750,14 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		if (filename.len == 2 && filename.name[1] == '.')
 			return;
 	}
-	filename.hash = full_name_hash(parent, filename.name, filename.len);
 
-	dentry = d_lookup(parent, &filename);
+	dentry = try_lookup_noperm(&filename, parent);
 again:
-	if (!dentry) {
-		dentry = d_alloc_parallel(parent, &filename);
-		if (IS_ERR(dentry))
-			return;
-	}
+	if (!dentry)
+		dentry = d_alloc_noblock(parent, &filename);
+	if (IS_ERR(dentry))
+		return;
+
 	if (!d_in_lookup(dentry)) {
 		/* Is there a mountpoint here? If so, just exit */
 		if (!nfs_fsid_equal(&NFS_SB(dentry->d_sb)->fsid,
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (12 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 15/53] nfs: use d_duplicate() NeilBrown
                   ` (40 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Rather than performing a normal lookup (which will be awkward with
future locking changes) use d_alloc_noblock() to find a dentry for an
unused name, and then open-code the rest of lookup_slow() to see if it
is free on the server.

Signed-off-by: NeilBrown <neil@brown.name>
---
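For illustration only, a user-space sketch of the "keep generating names
until one is unused" loop.  The SILLYNAME_* values below are assumed to
match those in fs/nfs/unlink.c and should be checked against the tree;
toy_format_silly(), toy_pick_silly() and the in_use() callback are
hypothetical.  In the patch the "in use" test is the local
try_lookup_noperm()/d_alloc_noblock() check followed by ->lookup on the
server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror fs/nfs/unlink.c; verify against the kernel tree. */
#define SILLYNAME_PREFIX	".nfs"
#define SILLYNAME_PREFIX_LEN	((unsigned int)sizeof(SILLYNAME_PREFIX) - 1)
#define SILLYNAME_FILEID_LEN	((unsigned int)sizeof(unsigned long long) << 1)
#define SILLYNAME_COUNTER_LEN	((unsigned int)sizeof(unsigned int) << 1)
#define SILLYNAME_LEN		(SILLYNAME_PREFIX_LEN + \
				 SILLYNAME_FILEID_LEN + \
				 SILLYNAME_COUNTER_LEN)

/* Format one candidate name: prefix, zero-padded fileid, zero-padded counter. */
void toy_format_silly(char *buf, size_t len, unsigned long long fileid,
		      unsigned int counter)
{
	int n = snprintf(buf, len, SILLYNAME_PREFIX "%0*llx%0*x",
			 (int)SILLYNAME_FILEID_LEN, fileid,
			 (int)SILLYNAME_COUNTER_LEN, counter);

	assert(n >= 0 && (size_t)n < len);	/* name must not be truncated */
}

/*
 * Bump the counter and retry until in_use() reports the candidate as
 * free.  Returns the counter value that produced the name left in buf.
 */
unsigned int toy_pick_silly(char *buf, size_t len, unsigned long long fileid,
			    unsigned int counter,
			    bool (*in_use)(const char *name))
{
	do {
		counter++;
		toy_format_silly(buf, len, fileid, counter);
	} while (in_use(buf));
	return counter;
}

/* Example predicate: pretend the first candidate for fileid 0x1234 is taken. */
bool toy_first_taken(const char *name)
{
	return strcmp(name, ".nfs000000000000123400000001") == 0;
}
```

With an 8-byte unsigned long long and a 4-byte unsigned int, a buffer of
SILLYNAME_LEN + 1 bytes holds any generated name.
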
 fs/nfs/unlink.c | 56 +++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 43ea897943c0..f112c13d97a1 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -445,7 +445,8 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 	static unsigned int sillycounter;
 	unsigned char silly[SILLYNAME_LEN + 1];
 	unsigned long long fileid;
-	struct dentry *sdentry;
+	struct dentry *sdentry, *old;
+	struct qstr qsilly;
 	struct inode *inode = d_inode(dentry);
 	struct rpc_task *task;
 	int            error = -EBUSY;
@@ -462,26 +463,41 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 
 	fileid = NFS_FILEID(d_inode(dentry));
 
-	sdentry = NULL;
-	do {
+newname:
+	sillycounter++;
+	scnprintf(silly, sizeof(silly),
+		  SILLYNAME_PREFIX "%0*llx%0*x",
+		  SILLYNAME_FILEID_LEN, fileid,
+		  SILLYNAME_COUNTER_LEN, sillycounter);
+
+	dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
+		 dentry, silly);
+	qsilly = QSTR(silly);
+	sdentry = try_lookup_noperm(&qsilly, dentry->d_parent);
+	if (!sdentry)
+		sdentry = d_alloc_noblock(dentry->d_parent, &qsilly);
+	if (sdentry == ERR_PTR(-EWOULDBLOCK))
+		/* Name currently being looked up */
+		goto newname;
+	/*
+	 * N.B. Better to return EBUSY here ... it could be
+	 * dangerous to delete the file while it's in use.
+	 */
+	if (IS_ERR(sdentry))
+		goto out;
+	if (d_is_positive(sdentry)) {
 		dput(sdentry);
-		sillycounter++;
-		scnprintf(silly, sizeof(silly),
-			  SILLYNAME_PREFIX "%0*llx%0*x",
-			  SILLYNAME_FILEID_LEN, fileid,
-			  SILLYNAME_COUNTER_LEN, sillycounter);
-
-		dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
-				dentry, silly);
-
-		sdentry = lookup_noperm(&QSTR(silly), dentry->d_parent);
-		/*
-		 * N.B. Better to return EBUSY here ... it could be
-		 * dangerous to delete the file while it's in use.
-		 */
-		if (IS_ERR(sdentry))
-			goto out;
-	} while (d_inode(sdentry) != NULL); /* need negative lookup */
+		goto newname;
+	}
+	/* This name isn't known locally - check on server */
+	old = dir->i_op->lookup(dir, sdentry, 0);
+	d_lookup_done(sdentry);
+	if (old || d_is_positive(sdentry)) {
+		if (!IS_ERR(old))
+			dput(old);
+		dput(sdentry);
+		goto newname;
+	}
 
 	ihold(inode);
 
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 15/53] nfs: use d_duplicate()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (13 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
                   ` (39 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

As preparation for d_alloc_parallel() being allowed without the
directory locked, use d_duplicate() to duplicate a dentry for silly
rename.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 4b73ec59bbcc..655a1e8467f7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2781,11 +2781,9 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			spin_unlock(&new_dentry->d_lock);
 
 			/* copy the target dentry's name */
-			dentry = d_alloc(new_dentry->d_parent,
-					 &new_dentry->d_name);
+			dentry = d_duplicate(new_dentry);
 			if (!dentry)
 				goto out;
-
 			/* silly-rename the existing target ... */
 			err = nfs_sillyrename(new_dir, new_dentry);
 			if (err)
@@ -2850,8 +2848,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		nfs_dentry_handle_enoent(old_dentry);
 
 	/* new dentry created? */
-	if (dentry)
+	if (dentry) {
+		d_lookup_done(dentry);
 		dput(dentry);
+	}
 	return error;
 }
 EXPORT_SYMBOL_GPL(nfs_rename);
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (14 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 15/53] nfs: use d_duplicate() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-15 13:51   ` Amir Goldstein
  2026-03-12 21:12 ` [PATCH 17/53] coda: don't d_drop() early NeilBrown
                   ` (38 subsequent siblings)
  54 siblings, 1 reply; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

When performing an "impure" readdir, ovl needs to perform a lookup on some
of the names that it found.
With proposed locking changes it will not be possible to perform this
lookup (in particular, not safe to wait for d_alloc_parallel()) while
holding a lock on the directory.

ovl doesn't really need the lock at this point.  It has already iterated
the directory and has cached a list of the contents.  It now needs to
gather extra information about some contents.  It can do this without
the lock.

After gathering that info it needs to retake the lock for API
correctness.  After doing this it must check IS_DEADDIR() again to
ensure readdir always returns -ENOENT on a removed directory.

Note that while ->iterate_shared is called with a shared lock, ovl uses
WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
that exclusive lock.

As the directory is no longer locked in ovl_cache_update() we need
dget_parent() to get a reference to the parent.

Signed-off-by: NeilBrown <neil@brown.name>
---
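For illustration only, a user-space sketch of the control flow described
above: drop the lock, do the per-entry work, always retake the lock, then
re-check whether the directory died in the meantime.  struct toy_dir and
the toy_*() functions are hypothetical; the two booleans merely model the
exclusive i_rwsem hold and IS_DEADDIR().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dir {
	bool locked;	/* models holding the directory lock exclusively */
	bool dead;	/* models IS_DEADDIR() */
};

/* Per-entry work done without the lock; non-zero aborts the walk. */
typedef int (*toy_update_fn)(struct toy_dir *dir, size_t idx);

int toy_read_impure(struct toy_dir *dir, size_t nr, toy_update_fn update)
{
	int err = 0;
	size_t i;

	assert(dir->locked);
	dir->locked = false;		/* inode_unlock() in the patch */
	for (i = 0; i < nr; i++) {
		err = update(dir, i);
		if (err)
			break;		/* not return: the lock must be retaken */
	}
	dir->locked = true;		/* inode_lock() in the patch */
	if (dir->dead)
		err = -ENOENT;		/* directory was removed meanwhile */
	return err;
}

/* Example callbacks. */
int toy_update_ok(struct toy_dir *dir, size_t idx)
{
	(void)dir;
	(void)idx;
	return 0;
}

int toy_update_fail(struct toy_dir *dir, size_t idx)
{
	(void)dir;
	return idx == 0 ? -EIO : 0;
}

/* Simulates a concurrent rmdir while the lock is dropped. */
int toy_update_rmdir(struct toy_dir *dir, size_t idx)
{
	if (idx == 1)
		dir->dead = true;
	return 0;
}
```

Whatever the callback returns, the caller gets the directory back locked,
which is why the error paths in the patch change from "return" to
"break".
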
 fs/overlayfs/readdir.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1dcc75b3a90f..d5123b37921c 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 			goto get;
 		}
 		if (p->len == 2) {
-			/* we shall not be moved */
-			this = dget(dir->d_parent);
+			this = dget_parent(dir);
 			goto get;
 		}
 	}
 	/* This checks also for xwhiteouts */
-	this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
+	this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
 	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
 		/* Mark a stale entry */
 		p->is_whiteout = true;
@@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
 	if (err)
 		return err;
 
+	inode_unlock(path->dentry->d_inode);
 	list_for_each_entry_safe(p, n, list, l_node) {
 		if (!name_is_dot_dotdot(p->name, p->len)) {
 			err = ovl_cache_update(path, p, true);
 			if (err)
-				return err;
+				break;
 		}
 		if (p->ino == p->real_ino) {
 			list_del(&p->l_node);
@@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
 			struct rb_node *parent = NULL;
 
 			if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
-							      &newp, &parent)))
-				return -EIO;
+							      &newp, &parent))) {
+				err = -EIO;
+				break;
+			}
 
 			rb_link_node(&p->node, parent, newp);
 			rb_insert_color(&p->node, root);
 		}
 	}
-	return 0;
+	inode_lock(path->dentry->d_inode);
+	if (IS_DEADDIR(path->dentry->d_inode))
+		err = -ENOENT;
+	return err;
 }
 
 static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
-- 
2.50.0.107.gf914562f5916.dirty


* [PATCH 17/53] coda: don't d_drop() early.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (15 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 18/53] shmem: use d_duplicate() NeilBrown
                   ` (37 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Proposed locking changes will mean that calling d_drop() could
effectively unlock the name, allowing a parallel lookup to proceed.
For this reason it must only be called *after* the attempt to create a
symlink (in this case) has completed (whether successfully or not).

So move the d_drop() to after the venus_symlink() call.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/coda/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index c64b8cd81568..70eb6042fdaa 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -244,13 +244,13 @@ static int coda_symlink(struct mnt_idmap *idmap,
 	if (symlen > CODA_MAXPATHLEN)
 		return -ENAMETOOLONG;
 
+	error = venus_symlink(dir_inode->i_sb, coda_i2f(dir_inode), name, len,
+			      symname, symlen);
 	/*
-	 * This entry is now negative. Since we do not create
+	 * This entry is still negative. Since we did not create
 	 * an inode for the entry we have to drop it.
 	 */
 	d_drop(de);
-	error = venus_symlink(dir_inode->i_sb, coda_i2f(dir_inode), name, len,
-			      symname, symlen);
 
 	/* mtime is no good anymore */
 	if (!error)
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 18/53] shmem: use d_duplicate()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (16 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 17/53] coda: don't d_drop() early NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 19/53] afs: use d_time instead of d_fsdata NeilBrown
                   ` (36 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

To prepare for d_alloc_parallel() being permitted without a directory
lock, use d_duplicate() when duplicating a dentry in order to create a
whiteout.

Signed-off-by: NeilBrown <neil@brown.name>
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b40f3cd48961..6b39a59355d7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4030,11 +4030,12 @@ static int shmem_whiteout(struct mnt_idmap *idmap,
 	struct dentry *whiteout;
 	int error;
 
-	whiteout = d_alloc(old_dentry->d_parent, &old_dentry->d_name);
+	whiteout = d_duplicate(old_dentry);
 	if (!whiteout)
 		return -ENOMEM;
 	error = shmem_mknod(idmap, old_dir, whiteout,
 			    S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
+	d_lookup_done(whiteout);
 	dput(whiteout);
 	return error;
 }
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 19/53] afs: use d_time instead of d_fsdata
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (17 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 18/53] shmem: use d_duplicate() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename NeilBrown
                   ` (35 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

afs uses ->d_fsdata to store version information for the parent
directory.  ->d_time is arguably a better field to store this
information as the version is like a time stamp, and ->d_time is an
unsigned long, while ->d_fsdata is a void *.

This will leave ->d_fsdata free for a different use ...  which
admittedly is also not a void*, but is certainly not at all a time.

Interestingly, the value stored in ->d_time or ->d_fsdata is a u64, which
is a different size on 32-bit hosts.  Maybe that doesn't matter.
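
As a rough illustration of the size question (not part of the patch): the
sketch below models a 32-bit host by using uint32_t in place of
unsigned long.  The helper names are made up for this example and do not
exist in afs.  It shows that only the low 32 bits of the data version
survive the cast, and that a signed-difference comparison in the style of
afs_d_revalidate_rcu() still orders values sensibly across a wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "(unsigned long)data_version" on a 32-bit host:
 * only the low 32 bits of the 64-bit version are kept.
 */
static uint32_t dv_to_d_time32(uint64_t data_version)
{
	return (uint32_t)data_version;
}

/* Wrap-tolerant "a is older than b" test on truncated values,
 * using a signed difference as the revalidate code does.
 */
static bool dv32_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Two distinct 64-bit versions look identical after truncation
 * when they differ by a multiple of 2^32.
 */
static bool dv_collide32(uint64_t x, uint64_t y)
{
	assert(x != y);
	return dv_to_d_time32(x) == dv_to_d_time32(y);
}
```

So a false match needs the directory version to advance by an exact
multiple of 2^32 between checks, which is presumably why it "maybe
doesn't matter".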

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir.c      | 18 +++++++++---------
 fs/afs/internal.h |  8 ++++----
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 78caef3f1338..a0417292314c 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -808,7 +808,7 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
 		afs_dir_iterate(dir, &cookie->ctx, NULL, &data_version);
 	}
 
-	dentry->d_fsdata = (void *)(unsigned long)data_version;
+	dentry->d_time = (unsigned long)data_version;
 
 	/* Check to see if we already have an inode for the primary fid. */
 	inode = ilookup5(dir->i_sb, cookie->fids[1].vnode,
@@ -895,9 +895,9 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
 	}
 
 	if (op->file[0].scb.have_status)
-		dentry->d_fsdata = (void *)(unsigned long)op->file[0].scb.status.data_version;
+		dentry->d_time = (unsigned long)op->file[0].scb.status.data_version;
 	else
-		dentry->d_fsdata = (void *)(unsigned long)op->file[0].dv_before;
+		dentry->d_time = (unsigned long)op->file[0].dv_before;
 	ret = afs_put_operation(op);
 out:
 	kfree(cookie);
@@ -1010,7 +1010,7 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 	_debug("splice %p", dentry->d_inode);
 	d = d_splice_alias(inode, dentry);
 	if (!IS_ERR_OR_NULL(d)) {
-		d->d_fsdata = dentry->d_fsdata;
+		d->d_time = dentry->d_time;
 		trace_afs_lookup(dvnode, &d->d_name, &fid);
 	} else {
 		trace_afs_lookup(dvnode, &dentry->d_name, &fid);
@@ -1040,7 +1040,7 @@ static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 	 * version.
 	 */
 	dir_version = (long)READ_ONCE(dvnode->status.data_version);
-	de_version = (long)READ_ONCE(dentry->d_fsdata);
+	de_version = (long)READ_ONCE(dentry->d_time);
 	if (de_version != dir_version) {
 		dir_version = (long)READ_ONCE(dvnode->invalid_before);
 		if (de_version - dir_version < 0)
@@ -1100,7 +1100,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	 * version.
 	 */
 	dir_version = dir->status.data_version;
-	de_version = (long)dentry->d_fsdata;
+	de_version = (long)dentry->d_time;
 	if (de_version == (long)dir_version)
 		goto out_valid_noupdate;
 
@@ -1161,7 +1161,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	}
 
 out_valid:
-	dentry->d_fsdata = (void *)(unsigned long)dir_version;
+	dentry->d_time = (unsigned long)dir_version;
 out_valid_noupdate:
 	key_put(key);
 	_leave(" = 1 [valid]");
@@ -1931,7 +1931,7 @@ static void afs_rename_edit_dir(struct afs_operation *op)
 		spin_unlock(&new_inode->i_lock);
 	}
 
-	/* Now we can update d_fsdata on the dentries to reflect their
+	/* Now we can update d_time on the dentries to reflect their
 	 * new parent's data_version.
 	 */
 	afs_update_dentry_version(op, new_dvp, op->dentry);
@@ -2167,7 +2167,7 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	}
 
 	/* This bit is potentially nasty as there's a potential race with
-	 * afs_d_revalidate{,_rcu}().  We have to change d_fsdata on the dentry
+	 * afs_d_revalidate{,_rcu}().  We have to change d_time on the dentry
 	 * to reflect it's new parent's new data_version after the op, but
 	 * d_revalidate may see old_dentry between the op having taken place
 	 * and the version being updated.
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 009064b8d661..106a7fe06b56 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1746,17 +1746,17 @@ static inline struct inode *AFS_VNODE_TO_I(struct afs_vnode *vnode)
 }
 
 /*
- * Note that a dentry got changed.  We need to set d_fsdata to the data version
+ * Note that a dentry got changed.  We need to set d_time to the data version
  * number derived from the result of the operation.  It doesn't matter if
- * d_fsdata goes backwards as we'll just revalidate.
+ * d_time goes backwards as we'll just revalidate.
  */
 static inline void afs_update_dentry_version(struct afs_operation *op,
 					     struct afs_vnode_param *dir_vp,
 					     struct dentry *dentry)
 {
 	if (!op->cumul_error.error)
-		dentry->d_fsdata =
-			(void *)(unsigned long)dir_vp->scb.status.data_version;
+		dentry->d_time =
+			(unsigned long)dir_vp->scb.status.data_version;
 }
 
 /*
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (18 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 19/53] afs: use d_time instead of d_fsdata NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode() NeilBrown
                   ` (34 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

afs needs to block lookup of dentries during unlink and rename.
There are two reasons:
1/ If the target is to be removed, not silly-renamed, the subsequent
   opens cannot be allowed as the file won't exist on the server.
2/ If the rename source is being moved between directories a lookup,
   particularly d_revalidate, might change ->d_time concurrently
   with rename changing ->d_time, with possibly incorrect results.

afs currently unhashes the dentry to force a lookup which will wait on the
directory lock, and rehashes afterwards.  This is incompatible with
proposed changes to directory locking which will require a dentry to
remain hashed throughout rename/unlink/etc operations.

This patch copies a mechanism developed for NFS.  ->d_fsdata which is
currently unused is now set to a non-NULL value when lookups must be
blocked.  d_revalidate checks for this value, and waits for it to become
NULL.

->d_lock is used to ensure d_revalidate never updates ->d_time while
->d_fsdata is set.
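
For readers who want the shape of the mechanism without the afs details,
here is a simplified user-space sketch using pthreads.  Everything in it
(struct toy_dentry, the toy_* helpers, TOY_BLOCKED) is hypothetical and
only models the idea: a mutex stands in for ->d_lock and a condition
variable stands in for wait_var_event()/store_release_wake_up().  Unlike
the patch, the wait and the version update happen under one lock hold.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_BLOCKED ((void *)1)	/* same idea as AFS_FSDATA_BLOCKED */

/* Hypothetical user-space stand-in for the dentry fields involved. */
struct toy_dentry {
	pthread_mutex_t lock;	/* plays the role of ->d_lock */
	pthread_cond_t wake;	/* plays the role of the wait queue */
	void *fsdata;		/* non-NULL while revalidation must wait */
	unsigned long time;	/* cached parent data version */
};

static void toy_init(struct toy_dentry *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->wake, NULL);
	d->fsdata = NULL;
	d->time = 0;
}

/* rename/unlink side: mark the dentry so revalidation waits. */
static void toy_block(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->fsdata == NULL);
	d->fsdata = TOY_BLOCKED;
	pthread_mutex_unlock(&d->lock);
}

/* rename/unlink side: clear the marker and wake any waiters. */
static void toy_unblock(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	d->fsdata = NULL;
	pthread_cond_broadcast(&d->wake);
	pthread_mutex_unlock(&d->lock);
}

/* revalidate side: sleep while blocked, then record the version.
 * The store happens under the lock, so never while fsdata is set.
 */
static unsigned long toy_revalidate(struct toy_dentry *d,
				    unsigned long dir_version)
{
	unsigned long seen;

	pthread_mutex_lock(&d->lock);
	while (d->fsdata != NULL)
		pthread_cond_wait(&d->wake, &d->lock);
	d->time = dir_version;
	seen = d->time;
	pthread_mutex_unlock(&d->lock);
	return seen;
}
```

A rename-like caller would call toy_block(), do its work, then
toy_unblock(); any toy_revalidate() arriving in between sleeps until the
marker is cleared.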

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/afs.h      |  7 ++++++
 fs/afs/dir.c      | 64 +++++++++++++++++++++++++++++------------------
 fs/afs/internal.h |  5 +---
 3 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index ec3db00bd081..019e77b08458 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -26,6 +26,13 @@ typedef u64			afs_volid_t;
 typedef u64			afs_vnodeid_t;
 typedef u64			afs_dataversion_t;
 
+/* This is stored in ->d_fsdata to stop d_revalidate looking at,
+ * and possibly changing, ->d_time on a dentry which is being moved
+ * between directories, and to block lookup for a dentry that is
+ * being removed without silly-rename.
+ */
+#define AFS_FSDATA_BLOCKED ((void*)1)
+
 typedef enum {
 	AFSVL_RWVOL,			/* read/write volume */
 	AFSVL_ROVOL,			/* read-only volume */
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index a0417292314c..9c57614feccf 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1034,6 +1034,10 @@ static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 	if (!afs_check_validity(dvnode))
 		return -ECHILD;
 
+	/* A rename/unlink is pending */
+	if (dentry->d_fsdata)
+		return -ECHILD;
+
 	/* We only need to invalidate a dentry if the server's copy changed
 	 * behind our back.  If we made the change, it's no problem.  Note that
 	 * on a 32-bit system, we only have 32 bits in the dentry to store the
@@ -1069,6 +1073,10 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	if (flags & LOOKUP_RCU)
 		return afs_d_revalidate_rcu(dir, dentry);
 
+	/* Wait for rename/unlink to complete */
+wait_for_rename:
+	wait_var_event(&dentry->d_fsdata, dentry->d_fsdata == NULL);
+
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
 		_enter("{v={%llx:%llu} n=%pd fl=%lx},",
@@ -1161,7 +1169,13 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	}
 
 out_valid:
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_fsdata) {
+		spin_unlock(&dentry->d_lock);
+		goto wait_for_rename;
+	}
 	dentry->d_time = (unsigned long)dir_version;
+	spin_unlock(&dentry->d_lock);
 out_valid_noupdate:
 	key_put(key);
 	_leave(" = 1 [valid]");
@@ -1536,8 +1550,7 @@ static void afs_unlink_edit_dir(struct afs_operation *op)
 static void afs_unlink_put(struct afs_operation *op)
 {
 	_enter("op=%08x", op->debug_id);
-	if (op->unlink.need_rehash && afs_op_error(op) < 0 && afs_op_error(op) != -ENOENT)
-		d_rehash(op->dentry);
+	store_release_wake_up(&op->dentry->d_fsdata, NULL);
 }
 
 static const struct afs_operation_ops afs_unlink_operation = {
@@ -1591,11 +1604,7 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry)
 		afs_op_set_error(op, afs_sillyrename(dvnode, vnode, dentry, op->key));
 		goto error;
 	}
-	if (!d_unhashed(dentry)) {
-		/* Prevent a race with RCU lookup. */
-		__d_drop(dentry);
-		op->unlink.need_rehash = true;
-	}
+	dentry->d_fsdata = AFS_FSDATA_BLOCKED;
 	spin_unlock(&dentry->d_lock);
 
 	op->file[1].vnode = vnode;
@@ -1885,9 +1894,10 @@ static void afs_rename_edit_dir(struct afs_operation *op)
 
 	_enter("op=%08x", op->debug_id);
 
-	if (op->rename.rehash) {
-		d_rehash(op->rename.rehash);
-		op->rename.rehash = NULL;
+	if (op->rename.unblock) {
+		/* Rename has finished, so unblock lookups of the target */
+		store_release_wake_up(&op->rename.unblock->d_fsdata, NULL);
+		op->rename.unblock = NULL;
 	}
 
 	fscache_begin_write_operation(&orig_cres, afs_vnode_cache(orig_dvnode));
@@ -1970,6 +1980,9 @@ static void afs_rename_exchange_edit_dir(struct afs_operation *op)
 
 		d_exchange(old_dentry, new_dentry);
 		up_write(&orig_dvnode->validate_lock);
+		/* dentry has been moved, so d_revalidate can safely proceed */
+		store_release_wake_up(&old_dentry->d_fsdata, NULL);
+
 	} else {
 		down_write(&orig_dvnode->validate_lock);
 		if (test_bit(AFS_VNODE_DIR_VALID, &orig_dvnode->flags) &&
@@ -2009,11 +2022,10 @@ static void afs_rename_exchange_edit_dir(struct afs_operation *op)
 static void afs_rename_put(struct afs_operation *op)
 {
 	_enter("op=%08x", op->debug_id);
-	if (op->rename.rehash)
-		d_rehash(op->rename.rehash);
+	if (op->rename.unblock)
+		store_release_wake_up(&op->rename.unblock->d_fsdata, NULL);
+	store_release_wake_up(&op->dentry->d_fsdata, NULL);
 	dput(op->rename.tmp);
-	if (afs_op_error(op))
-		d_rehash(op->dentry);
 }
 
 static const struct afs_operation_ops afs_rename_operation = {
@@ -2121,7 +2133,6 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		op->ops		= &afs_rename_noreplace_operation;
 	} else if (flags & RENAME_EXCHANGE) {
 		op->ops		= &afs_rename_exchange_operation;
-		d_drop(new_dentry);
 	} else {
 		/* If we might displace the target, we might need to do silly
 		 * rename.
@@ -2135,14 +2146,12 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		 */
 		if (d_is_positive(new_dentry) && !d_is_dir(new_dentry)) {
 			/* To prevent any new references to the target during
-			 * the rename, we unhash the dentry in advance.
+			 * the rename, we set d_fsdata which afs_d_revalidate will wait for.
+			 * d_lock ensures d_count() and ->d_fsdata are consistent.
 			 */
-			if (!d_unhashed(new_dentry)) {
-				d_drop(new_dentry);
-				op->rename.rehash = new_dentry;
-			}
-
+			spin_lock(&new_dentry->d_lock);
 			if (d_count(new_dentry) > 2) {
+				spin_unlock(&new_dentry->d_lock);
 				/* copy the target dentry's name */
 				op->rename.tmp = d_alloc(new_dentry->d_parent,
 							 &new_dentry->d_name);
@@ -2160,8 +2169,12 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 				}
 
 				op->dentry_2 = op->rename.tmp;
-				op->rename.rehash = NULL;
 				op->rename.new_negative = true;
+			} else {
+				/* Block any lookups to target until the rename completes */
+				new_dentry->d_fsdata = AFS_FSDATA_BLOCKED;
+				op->rename.unblock = new_dentry;
+				spin_unlock(&new_dentry->d_lock);
 			}
 		}
 	}
@@ -2172,10 +2185,11 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	 * d_revalidate may see old_dentry between the op having taken place
 	 * and the version being updated.
 	 *
-	 * So drop the old_dentry for now to make other threads go through
-	 * lookup instead - which we hold a lock against.
+	 * So block revalidate on the old_dentry until the rename completes.
 	 */
-	d_drop(old_dentry);
+	spin_lock(&old_dentry->d_lock);
+	old_dentry->d_fsdata = AFS_FSDATA_BLOCKED;
+	spin_unlock(&old_dentry->d_lock);
 
 	ret = afs_do_sync_operation(op);
 	if (ret == -ENOTSUPP)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 106a7fe06b56..f2898ce9c0e6 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -891,10 +891,7 @@ struct afs_operation {
 			const char *symlink;
 		} create;
 		struct {
-			bool	need_rehash;
-		} unlink;
-		struct {
-			struct dentry	*rehash;
+			struct dentry	*unblock;
 			struct dentry	*tmp;
 			unsigned int	rename_flags;
 			bool		new_negative;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (19 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 22/53] afs: use d_alloc_noblock in afs_sillyrename() NeilBrown
                   ` (33 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

As afs supports the fhandle interfaces there is a theoretical possibility
that the inode created for mkdir could be found by open_by_handle_at()
and given a dentry before d_instantiate() is called.  This would result
in two dentries for the one directory inode, which is not permitted.

So this patch changes afs_mkdir() to use d_splice_alias() and to
return the alternate dentry from ->mkdir() if appropriate.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir.c      | 14 ++++++++++----
 fs/afs/internal.h |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9c57614feccf..1e472768e1f1 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1248,7 +1248,7 @@ void afs_check_for_remote_deletion(struct afs_operation *op)
 /*
  * Create a new inode for create/mkdir/symlink
  */
-static void afs_vnode_new_inode(struct afs_operation *op)
+static struct dentry *afs_vnode_new_inode(struct afs_operation *op)
 {
 	struct afs_vnode_param *dvp = &op->file[0];
 	struct afs_vnode_param *vp = &op->file[1];
@@ -1265,7 +1265,7 @@ static void afs_vnode_new_inode(struct afs_operation *op)
 		 * the new directory on the server.
 		 */
 		afs_op_accumulate_error(op, PTR_ERR(inode), 0);
-		return;
+		return NULL;
 	}
 
 	vnode = AFS_FS_I(inode);
@@ -1276,7 +1276,7 @@ static void afs_vnode_new_inode(struct afs_operation *op)
 		afs_init_new_symlink(vnode, op);
 	if (!afs_op_error(op))
 		afs_cache_permit(vnode, op->key, vnode->cb_break, &vp->scb);
-	d_instantiate(op->dentry, inode);
+	return d_splice_alias(inode, op->dentry);
 }
 
 static void afs_create_success(struct afs_operation *op)
@@ -1285,7 +1285,7 @@ static void afs_create_success(struct afs_operation *op)
 	op->ctime = op->file[0].scb.status.mtime_client;
 	afs_vnode_commit_status(op, &op->file[0]);
 	afs_update_dentry_version(op, &op->file[0], op->dentry);
-	afs_vnode_new_inode(op);
+	op->create.ret = afs_vnode_new_inode(op);
 }
 
 static void afs_create_edit_dir(struct afs_operation *op)
@@ -1356,6 +1356,12 @@ static struct dentry *afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 	op->ops		= &afs_mkdir_operation;
 	ret = afs_do_sync_operation(op);
 	afs_dir_unuse_cookie(dvnode, ret);
+	if (op->create.ret) {
+		/* Alternate dentry */
+		if (ret == 0)
+			return op->create.ret;
+		dput(op->create.ret);
+	}
 	return ERR_PTR(ret);
 }
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f2898ce9c0e6..ce94f10a14c0 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -889,6 +889,7 @@ struct afs_operation {
 			int	reason;		/* enum afs_edit_dir_reason */
 			mode_t	mode;
 			const char *symlink;
+			struct dentry *ret;
 		} create;
 		struct {
 			struct dentry	*unblock;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 22/53] afs: use d_alloc_noblock in afs_sillyrename()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (20 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock NeilBrown
                   ` (32 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

Rather than performing a normal lookup (which will be awkward with future
locking changes) use d_alloc_noblock() to find a dentry for an
unused name, and use an open-coded lookup_slow() to see if it is free on
the server.
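
The retry structure can be sketched in plain user-space C as follows.
This is only an illustration: pick_silly_name(), the name_in_use_fn
callback and the max_tries bound are assumptions made for the example
(the kernel loop has no such bound), and the callback stands in for both
the local dcache check and the server lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical predicate: true if the name is already taken,
 * either locally or on the server.
 */
typedef bool (*name_in_use_fn)(const char *name, void *ctx);

/* Example predicate: names whose counter is below *ctx are taken. */
static bool in_use_below(const char *name, void *ctx)
{
	unsigned int n = 0;

	if (sscanf(name, ".__afs%X", &n) != 1)
		return false;
	return n < *(unsigned int *)ctx;
}

/* Generate ".__afsXXXX" candidates from *counter until one is free.
 * Returns 0 with the name in buf, or -1 after max_tries attempts.
 */
static int pick_silly_name(char *buf, size_t buflen, unsigned int *counter,
			   name_in_use_fn in_use, void *ctx,
			   unsigned int max_tries)
{
	assert(buflen >= 16);
	while (max_tries--) {
		(*counter)++;
		snprintf(buf, buflen, ".__afs%04X", *counter);
		if (!in_use(buf, ctx))
			return 0;	/* free name found */
		/* name is taken: try another one */
	}
	return -1;
}
```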

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir_silly.c | 51 ++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index 982bb6ec15f0..699143b21cdd 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -112,7 +112,9 @@ int afs_sillyrename(struct afs_vnode *dvnode, struct afs_vnode *vnode,
 		    struct dentry *dentry, struct key *key)
 {
 	static unsigned int sillycounter;
-	struct dentry *sdentry = NULL;
+	struct dentry *sdentry = NULL, *old;
+	struct inode *dir = dentry->d_parent->d_inode;
+	struct qstr qsilly;
 	unsigned char silly[16];
 	int ret = -EBUSY;
 
@@ -122,23 +124,38 @@ int afs_sillyrename(struct afs_vnode *dvnode, struct afs_vnode *vnode,
 	if (dentry->d_flags & DCACHE_NFSFS_RENAMED)
 		return -EBUSY;
 
-	sdentry = NULL;
-	do {
-		dput(sdentry);
-		sillycounter++;
-
-		/* Create a silly name.  Note that the ".__afs" prefix is
-		 * understood by the salvager and must not be changed.
-		 */
-		scnprintf(silly, sizeof(silly), ".__afs%04X", sillycounter);
-		sdentry = lookup_noperm(&QSTR(silly), dentry->d_parent);
+newname:
+	sillycounter++;
 
-		/* N.B. Better to return EBUSY here ... it could be dangerous
-		 * to delete the file while it's in use.
-		 */
-		if (IS_ERR(sdentry))
-			goto out;
-	} while (!d_is_negative(sdentry));
+	/* Create a silly name.  Note that the ".__afs" prefix is
+	 * understood by the salvager and must not be changed.
+	 */
+	scnprintf(silly, sizeof(silly), ".__afs%04X", sillycounter);
+	qsilly = QSTR(silly);
+	sdentry = try_lookup_noperm(&qsilly, dentry->d_parent);
+	if (!sdentry)
+		sdentry = d_alloc_noblock(dentry->d_parent, &qsilly);
+	if (sdentry == ERR_PTR(-EWOULDBLOCK))
+		/* try another name */
+		goto newname;
+	/* N.B. Better to return EBUSY here ... it could be dangerous
+	 * to delete the file while it's in use.
+	 */
+	if (IS_ERR(sdentry))
+		goto out;
+	if (d_is_positive(sdentry)) {
+		dput(sdentry);
+		goto newname;
+	}
+	/* This name isn't known locally - check on server */
+	old = dir->i_op->lookup(dir, sdentry, 0);
+	d_lookup_done(sdentry);
+	if (old || d_is_positive(sdentry)) {
+		if (!IS_ERR(old))
+			dput(old);
+		dput(sdentry);
+		goto newname;
+	}
 
 	ihold(&vnode->netfs.inode);
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (21 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 22/53] afs: use d_alloc_noblock in afs_sillyrename() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 24/53] afs: use d_duplicate() NeilBrown
                   ` (31 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

If afs is asked to look up a name ending with @sys, it needs to look up
a different name, for which it allocates a dentry with
d_alloc_parallel().
This is done while the parent lock is held which will be a problem in
a future patch where the ordering of the parent lock and
d_alloc_parallel() locking is reversed.

There is no actual need to hold the lock while d_alloc_parallel() is
called, so with this patch we drop the lock and reclaim it.
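
For context, the name that gets looked up is the original with the
trailing "@sys" replaced by each configured sysname in turn, so its
length is len - 4 + strlen(name) as in the loop below.  A small
user-space sketch of that substitution, with a hypothetical helper
atsys_substitute() that is not part of afs:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replace a trailing "@sys" in name with sub.
 * Returns a malloc()ed string which the caller must free(), or NULL
 * if name does not end in "@sys" or the allocation fails.
 */
static char *atsys_substitute(const char *name, const char *sub)
{
	size_t nlen, slen;
	char *out;

	assert(name != NULL && sub != NULL);
	nlen = strlen(name);
	slen = strlen(sub);

	if (nlen < 4 || memcmp(name + nlen - 4, "@sys", 4) != 0)
		return NULL;

	out = malloc(nlen - 4 + slen + 1);
	if (!out)
		return NULL;

	memcpy(out, name, nlen - 4);		/* prefix before "@sys" */
	memcpy(out + nlen - 4, sub, slen + 1);	/* sysname plus NUL */
	return out;
}
```

With an illustrative sysname of "amd64_linux", "bin.@sys" becomes
"bin.amd64_linux".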

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 1e472768e1f1..c195ee851191 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -908,12 +908,14 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
 /*
  * Look up an entry in a directory with @sys substitution.
  */
-static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry)
+static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry,
+				       unsigned int flags)
 {
 	struct afs_sysnames *subs;
 	struct afs_net *net = afs_i2net(dir);
 	struct dentry *ret;
 	char *buf, *p, *name;
+	struct qstr nm;
 	int len, i;
 
 	_enter("");
@@ -933,6 +935,13 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry)
 	refcount_inc(&subs->usage);
 	read_unlock(&net->sysnames_lock);
 
+	/* Calling d_alloc_parallel() while holding parent locked is undesirable.
+	 * We don't really need the lock any more.
+	 */
+	if (flags & LOOKUP_SHARED)
+		inode_unlock_shared(dir);
+	else
+		inode_unlock(dir);
 	for (i = 0; i < subs->nr; i++) {
 		name = subs->subs[i];
 		len = dentry->d_name.len - 4 + strlen(name);
@@ -942,7 +951,10 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry)
 		}
 
 		strcpy(p, name);
-		ret = lookup_noperm(&QSTR(buf), dentry->d_parent);
+		nm = QSTR(buf);
+		ret = try_lookup_noperm(&nm, dentry->d_parent);
+		if (!ret)
+			ret = d_alloc_parallel(dentry->d_parent, &nm);
 		if (IS_ERR(ret) || d_is_positive(ret))
 			goto out_s;
 		dput(ret);
@@ -953,6 +965,10 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry)
 	 */
 	ret = NULL;
 out_s:
+	if (flags & LOOKUP_SHARED)
+		inode_lock_shared(dir);
+	else
+		inode_lock_nested(dir, I_MUTEX_PARENT);
 	afs_put_sysnames(subs);
 	kfree(buf);
 out_p:
@@ -998,7 +1014,7 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 	    dentry->d_name.name[dentry->d_name.len - 3] == 's' &&
 	    dentry->d_name.name[dentry->d_name.len - 2] == 'y' &&
 	    dentry->d_name.name[dentry->d_name.len - 1] == 's')
-		return afs_lookup_atsys(dir, dentry);
+		return afs_lookup_atsys(dir, dentry, flags);
 
 	afs_stat_v(dvnode, n_lookup);
 	inode = afs_do_lookup(dir, dentry);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 24/53] afs: use d_duplicate()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (22 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata NeilBrown
                   ` (30 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

To prepare for d_alloc_parallel() being permitted without a directory
lock, use d_duplicate() when duplicating a dentry in order to create a
whiteout.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index c195ee851191..b5c593f50079 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -2047,6 +2047,8 @@ static void afs_rename_put(struct afs_operation *op)
 	if (op->rename.unblock)
 		store_release_wake_up(&op->rename.unblock->d_fsdata, NULL);
 	store_release_wake_up(&op->dentry->d_fsdata, NULL);
+	if (op->rename.tmp)
+		d_lookup_done(op->rename.tmp);
 	dput(op->rename.tmp);
 }
 
@@ -2175,8 +2177,7 @@ static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			if (d_count(new_dentry) > 2) {
 				spin_unlock(&new_dentry->d_lock);
 				/* copy the target dentry's name */
-				op->rename.tmp = d_alloc(new_dentry->d_parent,
-							 &new_dentry->d_name);
+				op->rename.tmp = d_duplicate(new_dentry);
 				if (!op->rename.tmp) {
 					afs_op_nomem(op);
 					goto error;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (23 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 24/53] afs: use d_duplicate() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens NeilBrown
                   ` (29 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

smb/client uses d_fsdata in exactly the way that d_time is intended to
be used.  It previously used d_time but this was changed in
  Commit: a00be0e31f8d ("cifs: don't use ->d_time")
without any reason being given.

This patch effectively reverts that patch (though it doesn't remove the
helpers) so that d_fsdata can be used for something more generic.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/smb/client/cifsfs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index e320d39b01f5..5153e811c50b 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -30,12 +30,12 @@ cifs_uniqueid_to_ino_t(u64 fileid)
 
 static inline void cifs_set_time(struct dentry *dentry, unsigned long time)
 {
-	dentry->d_fsdata = (void *) time;
+	dentry->d_time = time;
 }
 
 static inline unsigned long cifs_get_time(struct dentry *dentry)
 {
-	return (unsigned long) dentry->d_fsdata;
+	return dentry->d_time;
 }
 
 extern struct file_system_type cifs_fs_type, smb3_fs_type;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (24 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open NeilBrown
                   ` (28 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

smb/client needs to block new opens of the target of unlink and rename
while the operation is progressing.  This stabilises d_count() and allows
a determination of whether a "silly-rename" is required.

It currently unhashes the dentry, which causes lookup to block on
the parent directory i_rwsem.  Proposed changes to locking will cause
this approach to stop working, as exclusivity will be provided for
the dentry only, and only while it is hashed.

So we introduce a new mechanism similar to that used by nfs and afs.
->d_fsdata (currently unused by smb/client) is set to a non-NULL
value when lookups need to be blocked.  ->d_revalidate checks for this
and blocks.  This might still allow d_count() to increment, but once it
has been tested as 1, there can be no new opens completed.
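
As a rough user-space illustration of this blocking scheme (not kernel
code and not part of the patch), the sketch below models ->d_fsdata as
a flag that an "opener" thread waits on until the "remover" clears it.
All toy_* names are hypothetical, and a pthread mutex plus condition
variable stand in for wait_var_event() and store_release_wake_up().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a dentry; only what we model. */
struct toy_dentry {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	void *d_fsdata;		/* non-NULL means "opens are blocked" */
	int op_finished;	/* set by the remover before it unblocks */
};

#define TOY_FSDATA_BLOCKED ((void *)1)

/* Mark the dentry so that later openers will wait. */
static void toy_block(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	d->d_fsdata = TOY_FSDATA_BLOCKED;
	pthread_mutex_unlock(&d->lock);
}

/* Rough analogue of store_release_wake_up(&d->d_fsdata, NULL). */
static void toy_unblock(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->d_fsdata == TOY_FSDATA_BLOCKED);
	d->d_fsdata = NULL;
	pthread_cond_broadcast(&d->wake);
	pthread_mutex_unlock(&d->lock);
}

/* Rough analogue of wait_var_event(&d->d_fsdata, d->d_fsdata == NULL). */
static void toy_wait_unblocked(struct toy_dentry *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->d_fsdata != NULL)
		pthread_cond_wait(&d->wake, &d->lock);
	pthread_mutex_unlock(&d->lock);
}

/* The "opener": plays the role of ->d_revalidate waiting for the remover. */
static void *toy_opener(void *arg)
{
	struct toy_dentry *d = arg;

	toy_wait_unblocked(d);
	/* We can only get here after the remover has unblocked. */
	return d->op_finished ? (void *)1 : NULL;
}

/* Returns 1 if the opener proceeded only after the operation finished. */
static int toy_demo(void)
{
	struct toy_dentry d = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.wake = PTHREAD_COND_INITIALIZER,
	};
	pthread_t t;
	void *seen = NULL;

	toy_block(&d);
	if (pthread_create(&t, NULL, toy_opener, &d) != 0)
		return -1;
	d.op_finished = 1;	/* stands in for the unlink/rename work */
	toy_unblock(&d);
	if (pthread_join(t, &seen) != 0)
		return -1;
	return seen != NULL;
}
```

The flag is set before the opener can run and cleared only after the
work is done, so the opener can never observe a half-finished operation.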

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/smb/client/dir.c   |  3 +++
 fs/smb/client/inode.c | 48 +++++++++++++++++--------------------------
 2 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index cb10088197d2..cecbc0cce5c5 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -790,6 +790,9 @@ cifs_d_revalidate(struct inode *dir, const struct qstr *name,
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
+	/* Wait for pending rename/unlink */
+	wait_var_event(&direntry->d_fsdata, direntry->d_fsdata == NULL);
+
 	if (d_really_is_positive(direntry)) {
 		int rc;
 		struct inode *inode = d_inode(direntry);
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index d4d3cfeb6c90..3549605fa9c2 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -28,6 +28,13 @@
 #include "cached_dir.h"
 #include "reparse.h"
 
+/* This is stored in ->d_fsdata to block d_revalidate on a
+ * file dentry that is being removed - unlink or rename target.
+ * This causes any open attempt to block.  There may be existing opens
+ * but they can be detected by checking d_count() under ->d_lock.
+ */
+#define CIFS_FSDATA_BLOCKED ((void *)1)
+
 /*
  * Set parameters for the netfs library
  */
@@ -1946,27 +1953,21 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren
 	__u32 dosattr = 0, origattr = 0;
 	struct TCP_Server_Info *server;
 	struct iattr *attrs = NULL;
-	bool rehash = false;
 
 	cifs_dbg(FYI, "cifs_unlink, dir=0x%p, dentry=0x%p\n", dir, dentry);
 
 	if (unlikely(cifs_forced_shutdown(cifs_sb)))
 		return smb_EIO(smb_eio_trace_forced_shutdown);
 
-	/* Unhash dentry in advance to prevent any concurrent opens */
-	spin_lock(&dentry->d_lock);
-	if (!d_unhashed(dentry)) {
-		__d_drop(dentry);
-		rehash = true;
-	}
-	spin_unlock(&dentry->d_lock);
-
 	tlink = cifs_sb_tlink(cifs_sb);
 	if (IS_ERR(tlink))
 		return PTR_ERR(tlink);
 	tcon = tlink_tcon(tlink);
 	server = tcon->ses->server;
 
+	/* Set d_fsdata to prevent any concurrent opens */
+	dentry->d_fsdata = CIFS_FSDATA_BLOCKED;
+
 	xid = get_xid();
 	page = alloc_dentry_path();
 
@@ -2083,8 +2084,9 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren
 	kfree(attrs);
 	free_xid(xid);
 	cifs_put_tlink(tlink);
-	if (rehash)
-		d_rehash(dentry);
+
+	/* Allow lookups */
+	store_release_wake_up(&dentry->d_fsdata, NULL);
 	return rc;
 }
 
@@ -2501,7 +2503,6 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	struct cifs_sb_info *cifs_sb;
 	struct tcon_link *tlink;
 	struct cifs_tcon *tcon;
-	bool rehash = false;
 	unsigned int xid;
 	int rc, tmprc;
 	int retry_count = 0;
@@ -2517,23 +2518,15 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	if (unlikely(cifs_forced_shutdown(cifs_sb)))
 		return smb_EIO(smb_eio_trace_forced_shutdown);
 
-	/*
-	 * Prevent any concurrent opens on the target by unhashing the dentry.
-	 * VFS already unhashes the target when renaming directories.
-	 */
-	if (d_is_positive(target_dentry) && !d_is_dir(target_dentry)) {
-		if (!d_unhashed(target_dentry)) {
-			d_drop(target_dentry);
-			rehash = true;
-		}
-	}
-
 	tlink = cifs_sb_tlink(cifs_sb);
 	if (IS_ERR(tlink))
 		return PTR_ERR(tlink);
 	tcon = tlink_tcon(tlink);
 	server = tcon->ses->server;
 
+	/* Set d_fsdata to prevent any concurrent opens */
+	target_dentry->d_fsdata = CIFS_FSDATA_BLOCKED;
+
 	page1 = alloc_dentry_path();
 	page2 = alloc_dentry_path();
 	xid = get_xid();
@@ -2570,8 +2563,6 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 		}
 	}
 
-	if (!rc)
-		rehash = false;
 	/*
 	 * No-replace is the natural behavior for CIFS, so skip unlink hacks.
 	 */
@@ -2662,8 +2653,6 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 			}
 			rc = cifs_do_rename(xid, source_dentry, from_name,
 					    target_dentry, to_name);
-			if (!rc)
-				rehash = false;
 		}
 	}
 
@@ -2671,8 +2660,9 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	CIFS_I(source_dir)->time = CIFS_I(target_dir)->time = 0;
 
 cifs_rename_exit:
-	if (rehash)
-		d_rehash(target_dentry);
+	/* Allow lookups */
+	store_release_wake_up(&target_dentry->d_fsdata, NULL);
+
 	kfree(info_buf_source);
 	free_dentry_path(page2);
 	free_dentry_path(page1);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (25 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 28/53] smb/client: Use d_alloc_noblock() in cifs_prime_dcache() NeilBrown
                   ` (27 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

atomic_open can be called with a hashed-negative dentry or an in-lookup
dentry.  Rather than d_drop() and d_add() we can use d_splice_alias()
which keeps the dentry hashed - important for proposed locking changes.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/smb/client/dir.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index cecbc0cce5c5..361a20987927 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -439,8 +439,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 			goto out_err;
 		}
 
-	d_drop(direntry);
-	d_add(direntry, newinode);
+	d_splice_alias(newinode, direntry);
 
 out:
 	free_dentry_path(page);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 28/53] smb/client: Use d_alloc_noblock() in cifs_prime_dcache()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (26 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 29/53] exfat: simplify exfat_lookup() NeilBrown
                   ` (26 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

cifs uses the results of readdir to prime the dcache.  Using
d_alloc_parallel() can block if there is a concurrent lookup.  Blocking
in that case is pointless as the lookup will add info to the dcache and
there is no value in the readdir waiting to see if it should add the
info too.

Also this call to d_alloc_parallel() is made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So change to use d_alloc_noblock().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/smb/client/readdir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 47f5d620b750..dabf9507bc40 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -104,7 +104,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 		    (fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
 			return;
 
-		dentry = d_alloc_parallel(parent, name);
+		dentry = d_alloc_noblock(parent, name);
 	}
 	if (IS_ERR(dentry))
 		return;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 29/53] exfat: simplify exfat_lookup()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (27 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 28/53] smb/client: Use d_alloc_noblock() in cifs_prime_dcache() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group() NeilBrown
                   ` (25 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

1/ exfat_d_anon_disconn() serves no purpose.
  It is only called (on alias) when
         alias->d_parent == dentry->d_parent
  and in that case IS_ROOT() on that alias will return false, so the whole
  function will return false.
  So we can remove it.

2/ When an alias for the inode is found in the same parent
  it is always sufficient to d_move() the alias to the new
  name.  This will keep just one dentry around when there are multiple
  effective names, and it will always show the most recently used name,
  which appears to be the intention.
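
The argument in point 1 can be checked with a small user-space model
(a sketch only; the toy_* names are hypothetical and the struct is not
the kernel's struct dentry).  It assumes, as the lookup path does, that
the alias is a child of the directory rather than the directory itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct dentry. */
struct toy_dentry {
	struct toy_dentry *d_parent;	/* a root dentry is its own parent */
	bool disconnected;		/* stands in for DCACHE_DISCONNECTED */
};

/* Same idea as IS_ROOT(): a dentry that is its own parent. */
static bool toy_is_root(const struct toy_dentry *d)
{
	return d == d->d_parent;
}

/* Same shape as the removed exfat_d_anon_disconn(). */
static bool toy_anon_disconn(const struct toy_dentry *d)
{
	return toy_is_root(d) && d->disconnected;
}

/*
 * Build a directory with two children (the dentry being looked up and
 * an existing alias) and report what the helper says about the alias.
 */
static bool toy_check_same_parent(bool alias_disconnected)
{
	struct toy_dentry dir = { .d_parent = &dir };
	struct toy_dentry dentry = { .d_parent = &dir };
	struct toy_dentry alias = {
		.d_parent = &dir,
		.disconnected = alias_disconnected,
	};

	/* This is the condition under which the helper was called. */
	assert(alias.d_parent == dentry.d_parent);
	return toy_anon_disconn(&alias);
}
```

Whatever the disconnected flag says, the helper returns false for an
alias that shares the parent, so the extra test never changes the
outcome in this model.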

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/exfat/namei.c | 36 +++++++-----------------------------
 1 file changed, 7 insertions(+), 29 deletions(-)

diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 670116ae9ec8..e04cda7425da 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -711,11 +711,6 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
 	return 0;
 }
 
-static int exfat_d_anon_disconn(struct dentry *dentry)
-{
-	return IS_ROOT(dentry) && (dentry->d_flags & DCACHE_DISCONNECTED);
-}
-
 static struct dentry *exfat_lookup(struct inode *dir, struct dentry *dentry,
 		unsigned int flags)
 {
@@ -750,32 +745,15 @@ static struct dentry *exfat_lookup(struct inode *dir, struct dentry *dentry,
 	 * Checking "alias->d_parent == dentry->d_parent" to make sure
 	 * FS is not corrupted (especially double linked dir).
 	 */
-	if (alias && alias->d_parent == dentry->d_parent &&
-			!exfat_d_anon_disconn(alias)) {
-
+	if (alias && alias->d_parent == dentry->d_parent) {
 		/*
-		 * Unhashed alias is able to exist because of revalidate()
-		 * called by lookup_fast. You can easily make this status
-		 * by calling create and lookup concurrently
-		 * In such case, we reuse an alias instead of new dentry
+		 * As EXFAT does not support hard-links this must
+		 * be an alternate name for the same file,
+		 * possibly longname vs 8.3 alias.
+		 * Rather than allocating a new dentry, use the old
+		 * one but keep the most recently used name.
 		 */
-		if (d_unhashed(alias)) {
-			WARN_ON(alias->d_name.hash_len !=
-				dentry->d_name.hash_len);
-			exfat_info(sb, "rehashed a dentry(%p) in read lookup",
-				   alias);
-			d_drop(dentry);
-			d_rehash(alias);
-		} else if (!S_ISDIR(i_mode)) {
-			/*
-			 * This inode has non anonymous-DCACHE_DISCONNECTED
-			 * dentry. This means, the user did ->lookup() by an
-			 * another name (longname vs 8.3 alias of it) in past.
-			 *
-			 * Switch to new one for reason of locality if possible.
-			 */
-			d_move(alias, dentry);
-		}
+		d_move(alias, dentry);
 		iput(inode);
 		mutex_unlock(&EXFAT_SB(sb)->s_lock);
 		return alias;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (28 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 29/53] exfat: simplify exfat_lookup() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 31/53] configfs: stop using d_add() NeilBrown
                   ` (24 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

These d_add() calls cannot be necessary.  The inode given is NULL so all
they do is attach the dentry to the hash table.

If configfs_attach_group() fails, then d_drop() is called so the dentry
will be detached.
If configfs_attach_group() succeeds, then
 configfs_attach_group -> configfs_attach_item -> configfs_create_dir
must have succeeded, so d_instantiate() will have been called and the
dentry hashed there.

So the only effect is that the dentry will be hashed-negative for a
short period which will allow a lookup to find nothing without waiting
for the directory i_rwsem.  I can find no indication that this might be
important.

Adding a dentry as negative, and then later making it positive is an
unusual pattern and appears to be unnecessary, so it is best avoided.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/configfs/dir.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 362b6ff9b908..c82eca0b5d73 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -706,8 +706,6 @@ static int create_default_group(struct config_group *parent_group,
 	ret = -ENOMEM;
 	child = d_alloc_name(parent, group->cg_item.ci_name);
 	if (child) {
-		d_add(child, NULL);
-
 		ret = configfs_attach_group(&parent_group->cg_item,
 					    &group->cg_item, child, frag);
 		if (!ret) {
@@ -1904,8 +1902,6 @@ int configfs_register_subsystem(struct configfs_subsystem *subsys)
 	err = -ENOMEM;
 	dentry = d_alloc_name(root, group->cg_item.ci_name);
 	if (dentry) {
-		d_add(dentry, NULL);
-
 		err = configfs_dirent_exists(dentry);
 		if (!err)
 			err = configfs_attach_group(sd->s_element,
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 31/53] configfs: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (29 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
                   ` (23 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in configfs, but as
it is planned to remove d_add(), change to use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/configfs/dir.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index c82eca0b5d73..6ec589b6b8a4 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -501,8 +501,7 @@ static struct dentry * configfs_lookup(struct inode *dir,
 	}
 	spin_unlock(&configfs_dirent_lock);
 done:
-	d_add(dentry, inode);
-	return NULL;
+	return d_splice_alias(inode, dentry);
 }
 
 /*
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (30 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 31/53] configfs: stop using d_add() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-17 10:00   ` Jan Kara
  2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
                   ` (22 subsequent siblings)
  54 siblings, 1 reply; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)

From: NeilBrown <neil@brown.name>

__ext4_link() is separate from ext4_link() so that it can be used for
replaying a "fast_commit" which records a link operation.
Replaying the fast_commit does not require any interaction with the
dcache - it is purely ext4-local - but it uses a dentry to simplify code
reuse.

An interface it uses - d_alloc() - is not generally useful and will soon
be removed.  This patch prepares ext4 for that removal.  Specifically it
removes all dcache-modification code from __ext4_link().  This will mean
that __ext4_link() treats the dentry as read-only so fast_commit replay
can simply provide an on-stack dentry.

Various "const" markers are sprinkled around to confirm that the dentry
is treated read-only.

This patch only rearranges code and slightly re-orders it, so those
changes can be reviewed separately.  The next patch will remove the use
of d_alloc().
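
The resulting division of labour can be sketched in user space as
follows (an illustration only; the toy_* types and functions are
hypothetical and merely mimic the shape of the change).  The caller
takes the reference and bumps the link count up front, the inner helper
sees a const dentry, and the caller either rolls back or instantiates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_inode {
	int i_count;	/* reference count, as adjusted by ihold()/iput() */
	int i_nlink;	/* link count */
};

struct toy_dentry {
	struct toy_inode *d_inode;
};

/* Plays the role of __ext4_link(): the dentry is read-only here. */
static int toy_add_link(struct toy_inode *inode,
			const struct toy_dentry *dentry, int fail)
{
	(void)inode;
	(void)dentry;
	return fail ? -ENOSPC : 0;
}

/* Plays the role of ext4_link(): owns all the dcache-visible changes. */
static int toy_link(struct toy_inode *inode, struct toy_dentry *dentry,
		    int fail)
{
	int err;

	inode->i_count++;		/* like ihold() */
	inode->i_nlink++;		/* like ext4_inc_count() */
	err = toy_add_link(inode, dentry, fail);
	if (err) {
		assert(inode->i_nlink > 0);
		inode->i_nlink--;	/* like drop_nlink() */
		inode->i_count--;	/* like iput() */
	} else {
		dentry->d_inode = inode;	/* like d_instantiate() */
	}
	return err;
}

/* Returns 1 if the counts and the dentry end up as expected. */
static int toy_link_demo(int fail)
{
	struct toy_inode inode = { .i_count = 1, .i_nlink = 1 };
	struct toy_dentry dentry = { .d_inode = NULL };
	int err = toy_link(&inode, &dentry, fail);

	if (fail)
		return err == -ENOSPC && inode.i_count == 1 &&
		       inode.i_nlink == 1 && dentry.d_inode == NULL;
	return err == 0 && inode.i_count == 2 &&
	       inode.i_nlink == 2 && dentry.d_inode == &inode;
}
```

Because the inner helper never writes to the dentry, a replay path
could hand it a temporary dentry without touching any shared cache.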

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c                 |  2 +-
 fs/ext4/ext4.h              |  4 ++--
 fs/ext4/fast_commit.c       | 14 +++++++++++---
 fs/ext4/namei.c             | 23 +++++++++++++----------
 include/linux/dcache.h      |  2 +-
 include/trace/events/ext4.h |  4 ++--
 6 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a1219b446b74..c48337d95f9a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -358,7 +358,7 @@ static inline int dname_external(const struct dentry *dentry)
 	return dentry->d_name.name != dentry->d_shortname.string;
 }
 
-void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
+void take_dentry_name_snapshot(struct name_snapshot *name, const struct dentry *dentry)
 {
 	unsigned seq;
 	const unsigned char *s;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 293f698b7042..1794407652ff 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2972,7 +2972,7 @@ void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t star
 void __ext4_fc_track_unlink(handle_t *handle, struct inode *inode,
 	struct dentry *dentry);
 void __ext4_fc_track_link(handle_t *handle, struct inode *inode,
-	struct dentry *dentry);
+	const struct dentry *dentry);
 void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry);
 void ext4_fc_track_link(handle_t *handle, struct dentry *dentry);
 void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
@@ -3719,7 +3719,7 @@ extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode,
 extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
 			 struct inode *inode, struct dentry *dentry);
 extern int __ext4_link(struct inode *dir, struct inode *inode,
-		       struct dentry *dentry);
+		       const struct dentry *dentry);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index f575751f1cae..2a5daf1d9667 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -388,7 +388,7 @@ static int ext4_fc_track_template(
 }
 
 struct __track_dentry_update_args {
-	struct dentry *dentry;
+	const struct dentry *dentry;
 	int op;
 };
 
@@ -400,7 +400,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct __track_dentry_update_args *dentry_update =
 		(struct __track_dentry_update_args *)arg;
-	struct dentry *dentry = dentry_update->dentry;
+	const struct dentry *dentry = dentry_update->dentry;
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct super_block *sb = inode->i_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -483,7 +483,7 @@ void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry)
 }
 
 void __ext4_fc_track_link(handle_t *handle,
-	struct inode *inode, struct dentry *dentry)
+	struct inode *inode, const struct dentry *dentry)
 {
 	struct __track_dentry_update_args args;
 	int ret;
@@ -1471,7 +1471,15 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
 		goto out;
 	}
 
+	ihold(inode);
+	inc_nlink(inode);
 	ret = __ext4_link(dir, inode, dentry_inode);
+	if (ret) {
+		drop_nlink(inode);
+		iput(inode);
+	} else {
+		d_instantiate(dentry_inode, inode);
+	}
 	/*
 	 * It's possible that link already existed since data blocks
 	 * for the dir in question got persisted before we crashed OR
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c4b5e252af0e..80e1051cab44 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2353,7 +2353,7 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
  * may not sleep between calling this and putting something into
  * the entry, as someone else might have used it while you slept.
  */
-static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
+static int ext4_add_entry(handle_t *handle, const struct dentry *dentry,
 			  struct inode *inode)
 {
 	struct inode *dir = d_inode(dentry->d_parent);
@@ -3445,7 +3445,7 @@ static int ext4_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	return err;
 }
 
-int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
+int __ext4_link(struct inode *dir, struct inode *inode, const struct dentry *dentry)
 {
 	handle_t *handle;
 	int err, retries = 0;
@@ -3460,8 +3460,6 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
 		ext4_handle_sync(handle);
 
 	inode_set_ctime_current(inode);
-	ext4_inc_count(inode);
-	ihold(inode);
 
 	err = ext4_add_entry(handle, dentry, inode);
 	if (!err) {
@@ -3471,11 +3469,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
 		 */
 		if (inode->i_nlink == 1)
 			ext4_orphan_del(handle, inode);
-		d_instantiate(dentry, inode);
-		ext4_fc_track_link(handle, dentry);
-	} else {
-		drop_nlink(inode);
-		iput(inode);
+		__ext4_fc_track_link(handle, inode, dentry);
 	}
 	ext4_journal_stop(handle);
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
@@ -3504,7 +3498,16 @@ static int ext4_link(struct dentry *old_dentry,
 	err = dquot_initialize(dir);
 	if (err)
 		return err;
-	return __ext4_link(dir, inode, dentry);
+	ihold(inode);
+	ext4_inc_count(inode);
+	err = __ext4_link(dir, inode, dentry);
+	if (err) {
+		drop_nlink(inode);
+		iput(inode);
+	} else {
+		d_instantiate(dentry, inode);
+	}
+	return err;
 }
 
 /*
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a97eb151d9db..3b12577ddfbb 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -600,7 +600,7 @@ struct name_snapshot {
 	struct qstr name;
 	union shortname_store inline_name;
 };
-void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
+void take_dentry_name_snapshot(struct name_snapshot *, const struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
 
 static inline struct dentry *d_first_child(const struct dentry *dentry)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index a3e8fe414df8..efcf1018c208 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2870,7 +2870,7 @@ TRACE_EVENT(ext4_fc_stats,
 DECLARE_EVENT_CLASS(ext4_fc_track_dentry,
 
 	TP_PROTO(handle_t *handle, struct inode *inode,
-		 struct dentry *dentry, int ret),
+		 const struct dentry *dentry, int ret),
 
 	TP_ARGS(handle, inode, dentry, ret),
 
@@ -2902,7 +2902,7 @@ DECLARE_EVENT_CLASS(ext4_fc_track_dentry,
 #define DEFINE_EVENT_CLASS_DENTRY(__type)				\
 DEFINE_EVENT(ext4_fc_track_dentry, ext4_fc_track_##__type,		\
 	TP_PROTO(handle_t *handle, struct inode *inode,			\
-		 struct dentry *dentry, int ret),			\
+		 const struct dentry *dentry, int ret),			\
 	TP_ARGS(handle, inode, dentry, ret)				\
 )
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (31 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-17  9:37   ` Jan Kara
  2026-03-12 21:12 ` [PATCH 34/53] tracefs: stop using d_add() NeilBrown
                   ` (21 subsequent siblings)
  54 siblings, 1 reply; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

ext4_fc_replay_link_internal() uses two dentries to simplify code reuse
when replaying a "link" operation.  It does not need to interact with
the dcache and removes the dentries shortly after adding them.

They are passed to __ext4_link() which only performs read accesses on
these dentries and only uses the name and parent of dentry_inode (plus
checking a flag is unset) and only uses the inode of the parent.

So instead of allocating dentries and adding them to the dcache, allocate
two dentries on the stack, set up the required fields, and pass these to
__ext4_link().

This substantially simplifies the code and removes one of the few uses of
d_alloc() - preparing for its removal.
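As a rough illustration of the on-stack approach described above, here is a
minimal userspace sketch.  The mini_* types and both functions are
hypothetical stand-ins invented for this example; they are not the kernel's
struct dentry or __ext4_link().  It only shows the shape of the idea: fill in
the few fields the callee reads and pass a const pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct mini_inode { unsigned long ino; };
struct mini_qstr { const char *name; size_t len; };
struct mini_dentry {
	const struct mini_dentry *parent;
	struct mini_qstr name;
	struct mini_inode *inode;
};

/*
 * Read-only consumer: copies the name into buf and returns the inode
 * number of the parent.  It never writes to the dentry.
 */
static unsigned long describe_link(const struct mini_dentry *d,
				   char *buf, size_t buflen)
{
	size_t n;

	assert(buflen > 0);
	assert(d->parent && d->parent->inode);
	n = d->name.len < buflen - 1 ? d->name.len : buflen - 1;
	memcpy(buf, d->name.name, n);
	buf[n] = '\0';
	return d->parent->inode->ino;
}

/* Build both "dentries" on the stack; nothing is allocated or cached. */
static unsigned long replay_link(struct mini_inode *dir, const char *name,
				 size_t len, char *buf, size_t buflen)
{
	struct mini_dentry dentry_dir = { .inode = dir };
	const struct mini_dentry dentry_inode = {
		.parent = &dentry_dir,
		.name = { .name = name, .len = len },
	};

	return describe_link(&dentry_inode, buf, buflen);
}
```

Because the objects live on the stack there is no allocation-failure path and
nothing to drop or put afterwards, which is where the deleted cleanup code in
the patch comes from.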

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ext4/fast_commit.c | 40 ++++++++--------------------------------
 1 file changed, 8 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 2a5daf1d9667..e3593bb90a62 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1446,8 +1446,6 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
 				struct inode *inode)
 {
 	struct inode *dir = NULL;
-	struct dentry *dentry_dir = NULL, *dentry_inode = NULL;
-	struct qstr qstr_dname = QSTR_INIT(darg->dname, darg->dname_len);
 	int ret = 0;
 
 	dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL);
@@ -1457,28 +1455,14 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
 		goto out;
 	}
 
-	dentry_dir = d_obtain_alias(dir);
-	if (IS_ERR(dentry_dir)) {
-		ext4_debug("Failed to obtain dentry");
-		dentry_dir = NULL;
-		goto out;
-	}
+	{
+		struct dentry dentry_dir = { .d_inode = dir };
+		const struct dentry dentry_inode = {
+			.d_parent = &dentry_dir,
+			.d_name = QSTR_LEN(darg->dname, darg->dname_len),
+		};
 
-	dentry_inode = d_alloc(dentry_dir, &qstr_dname);
-	if (!dentry_inode) {
-		ext4_debug("Inode dentry not created.");
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	ihold(inode);
-	inc_nlink(inode);
-	ret = __ext4_link(dir, inode, dentry_inode);
-	if (ret) {
-		drop_nlink(inode);
-		iput(inode);
-	} else {
-		d_instantiate(dentry_inode, inode);
+		ret = __ext4_link(dir, inode, &dentry_inode);
 	}
 	/*
 	 * It's possible that link already existed since data blocks
@@ -1493,16 +1477,8 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
 
 	ret = 0;
 out:
-	if (dentry_dir) {
-		d_drop(dentry_dir);
-		dput(dentry_dir);
-	} else if (dir) {
+	if (dir)
 		iput(dir);
-	}
-	if (dentry_inode) {
-		d_drop(dentry_inode);
-		dput(dentry_inode);
-	}
 
 	return ret;
 }
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 34/53] tracefs: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (32 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 35/53] cephfs: " NeilBrown
                   ` (20 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in tracefs, but as
it is planned to remove d_add(), change to use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/tracefs/event_inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8e5ac464b328..c30567b5331e 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -393,8 +393,7 @@ static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
 	// Files have their parent's ei as their fsdata
 	dentry->d_fsdata = get_ei(parent_ei);
 
-	d_add(dentry, inode);
-	return NULL;
+	return d_splice_alias(inode, dentry);
 };
 
 /**
@@ -424,8 +423,7 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
 
 	dentry->d_fsdata = get_ei(ei);
 
-	d_add(dentry, inode);
-	return NULL;
+	return d_splice_alias(inode, dentry);
 }
 
 static inline struct eventfs_inode *init_ei(struct eventfs_inode *ei, const char *name)
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 35/53] cephfs: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (33 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 34/53] tracefs: stop using d_add() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace() NeilBrown
                   ` (19 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in cephfs, as the
inode is always NULL, but as it is planned to remove d_add(), change to
use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ceph/dir.c   | 5 ++---
 fs/ceph/inode.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 86d7aa594ea9..c7dac71b55bd 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -769,7 +769,7 @@ struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
 				d_drop(dentry);
 				err = -ENOENT;
 			} else {
-				d_add(dentry, NULL);
+				d_splice_alias(NULL, dentry);
 			}
 		}
 	}
@@ -840,9 +840,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 			spin_unlock(&ci->i_ceph_lock);
 			doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
 			      ceph_vinop(dir));
-			d_add(dentry, NULL);
 			di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
-			return NULL;
+			return d_splice_alias(NULL, dentry);
 		}
 		spin_unlock(&ci->i_ceph_lock);
 	}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index d76f9a79dc0c..59f9f6948bb2 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1773,7 +1773,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				d_delete(dn);
 			} else if (have_lease) {
 				if (d_unhashed(dn))
-					d_add(dn, NULL);
+					d_splice_alias(NULL, dn);
 			}
 
 			if (!d_unhashed(dn) && have_lease)
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (34 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 35/53] cephfs: " NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate() NeilBrown
                   ` (18 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

When performing a get_name export_operation, ceph sends a LOOKUPNAME op
to the server.  When it gets a reply it tries to look up the name
locally and if the name exists in the dcache with the wrong inode, it
discards the result and tries again.

If it doesn't find the name in the dcache it will allocate a new dentry
and never make any use of it.  The dentry is never instantiated and is
assigned to ->r_dentry which is then freed by post-op cleanup.

As this is a waste, and as there is a plan to remove d_alloc(), this
code is discarded.

Also try_lookup_noperm() is used in place of full_name_hash() and
d_lookup(), and QSTR_LEN() is used to initialise dname.
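A small userspace sketch of the pattern, assuming the lookup helper computes
the name hash itself so the caller only supplies a pointer and a length.
NAME_REF_LEN(), struct name_ref, toy_hash() and lookup_by_name() are all
hypothetical names made up for this example; they only mimic the shape of
QSTR_LEN() and a hashing lookup and are not the kernel implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of a (hash, len, name) triple. */
struct name_ref {
	unsigned int hash;
	unsigned int len;
	const char *name;
};

/* Compound literal: unnamed members (here, hash) are zero-initialised. */
#define NAME_REF_LEN(n, l) ((struct name_ref){ .len = (l), .name = (n) })

/* Toy hash for illustration only. */
static unsigned int toy_hash(const char *s, unsigned int len)
{
	unsigned int h = 0;

	for (unsigned int i = 0; i < len; i++)
		h = h * 31 + (unsigned char)s[i];
	return h;
}

/*
 * The lookup fills in ref->hash itself, so callers do not need a
 * separate hashing step.  Returns the table index, or -1 if absent.
 */
static int lookup_by_name(struct name_ref *ref,
			  const char *const *table, size_t count)
{
	assert(ref && ref->name);
	ref->hash = toy_hash(ref->name, ref->len);
	for (size_t i = 0; i < count; i++) {
		if (strlen(table[i]) == ref->len &&
		    memcmp(table[i], ref->name, ref->len) == 0)
			return (int)i;
	}
	return -1;
}
```

The caller side then shrinks from three assignments to one, which is the
simplification the diff below makes for dname.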

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ceph/inode.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 59f9f6948bb2..0982fbda2a82 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -15,6 +15,7 @@
 #include <linux/sort.h>
 #include <linux/iversion.h>
 #include <linux/fscrypt.h>
+#include <linux/namei.h>
 
 #include "super.h"
 #include "mds_client.h"
@@ -1623,33 +1624,17 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				ceph_fname_free_buffer(parent_dir, &oname);
 				goto done;
 			}
-			dname.name = oname.name;
-			dname.len = oname.len;
-			dname.hash = full_name_hash(parent, dname.name, dname.len);
+			dname = QSTR_LEN(oname.name, oname.len);
 			tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
 			tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
 retry_lookup:
-			dn = d_lookup(parent, &dname);
+			dn = try_lookup_noperm(&dname, parent);
 			doutc(cl, "d_lookup on parent=%p name=%.*s got %p\n",
 			      parent, dname.len, dname.name, dn);
-
-			if (!dn) {
-				dn = d_alloc(parent, &dname);
-				doutc(cl, "d_alloc %p '%.*s' = %p\n", parent,
-				      dname.len, dname.name, dn);
-				if (!dn) {
-					dput(parent);
-					ceph_fname_free_buffer(parent_dir, &oname);
-					err = -ENOMEM;
-					goto done;
-				}
-				if (is_nokey) {
-					spin_lock(&dn->d_lock);
-					dn->d_flags |= DCACHE_NOKEY_NAME;
-					spin_unlock(&dn->d_lock);
-				}
-				err = 0;
-			} else if (d_really_is_positive(dn) &&
+			if (IS_ERR(dn))
+				/* should be impossible */
+				dn = NULL;
+			if (dn && d_really_is_positive(dn) &&
 				   (ceph_ino(d_inode(dn)) != tvino.ino ||
 				    ceph_snap(d_inode(dn)) != tvino.snap)) {
 				doutc(cl, " dn %p points to wrong inode %p\n",
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (35 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias() NeilBrown
                   ` (17 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

cephfs uses the results of readdir to prime the dcache.  Using d_alloc()
is no longer safe, even with an exclusive lock on the parent, as
d_alloc_parallel() will be allowed to run unlocked.  The safe interface
is d_alloc_noblock().  In the rare case that this blocks because there
is a concurrent lookup for the same name there is little cost in not
completing the allocation in the directory code.

It is still possible to create an inode at this point so we do that even
when there is no dentry.

So change to use d_alloc_noblock() and handle -EWOULDBLOCK.  Also use
QSTR_LEN() to initialise dname, and try_lookup_noperm() instead of
full_name_hash() and d_lookup().
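The three-way handling of the allocator's result can be sketched in userspace
as follows.  This is only a model under stated assumptions: ERR_PTR(),
PTR_ERR() and IS_ERR() are re-implemented here for userspace, and
alloc_noblock() and prepopulate_one() are hypothetical stand-ins for
d_alloc_noblock() and one iteration of the readdir loop, not the real
functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct entry { int id; };
enum alloc_mode { ALLOC_OK, ALLOC_BUSY, ALLOC_NOMEM };

static struct entry the_entry = { .id = 1 };

/* Hypothetical non-blocking allocator: may refuse instead of waiting. */
static struct entry *alloc_noblock(enum alloc_mode mode)
{
	switch (mode) {
	case ALLOC_BUSY:
		return ERR_PTR(-EWOULDBLOCK);
	case ALLOC_NOMEM:
		return ERR_PTR(-ENOMEM);
	default:
		return &the_entry;
	}
}

/*
 * Returns 0 on success with *out set (NULL means "carry on without an
 * entry"), or a negative errno for a real failure.
 */
static int prepopulate_one(enum alloc_mode mode, struct entry **out)
{
	struct entry *e = alloc_noblock(mode);

	assert(out);
	if (e == ERR_PTR(-EWOULDBLOCK)) {
		/* Someone else is looking this name up: skip the entry. */
		*out = NULL;
		return 0;
	}
	if (IS_ERR(e))
		return (int)PTR_ERR(e);
	*out = e;
	return 0;
}
```

The point of the shape is that "would block" is treated as a benign outcome
that leaves the entry pointer NULL, so every later use of that pointer needs
a NULL check, as the hunks below add for dn.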

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ceph/inode.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 0982fbda2a82..8557b207d337 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2011,9 +2011,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
 		struct ceph_vino tvino;
 
-		dname.name = rde->name;
-		dname.len = rde->name_len;
-		dname.hash = full_name_hash(parent, dname.name, dname.len);
+		dname = QSTR_LEN(rde->name, rde->name_len);
 
 		tvino.ino = le64_to_cpu(rde->inode.in->ino);
 		tvino.snap = le64_to_cpu(rde->inode.in->snapid);
@@ -2029,20 +2027,24 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		}
 
 retry_lookup:
-		dn = d_lookup(parent, &dname);
+		dn = try_lookup_noperm(&dname, parent);
 		doutc(cl, "d_lookup on parent=%p name=%.*s got %p\n",
 		      parent, dname.len, dname.name, dn);
-
-		if (!dn) {
-			dn = d_alloc(parent, &dname);
-			doutc(cl, "d_alloc %p '%.*s' = %p\n", parent,
+		if (IS_ERR(dn)) {
+			err = PTR_ERR(dn);
+			goto out;
+		} else if (!dn) {
+			dn = d_alloc_noblock(parent, &dname);
+			doutc(cl, "d_alloc_noblock %p '%.*s' = %p\n", parent,
 			      dname.len, dname.name, dn);
-			if (!dn) {
-				doutc(cl, "d_alloc badness\n");
-				err = -ENOMEM;
+			if (dn == ERR_PTR(-EWOULDBLOCK)) {
+				/* Just handle the inode info */
+				dn = NULL;
+			} else if (IS_ERR(dn)) {
+				doutc(cl, "d_alloc_noblock badness\n");
+				err = PTR_ERR(dn);
 				goto out;
-			}
-			if (rde->is_nokey) {
+			} else if (rde->is_nokey) {
 				spin_lock(&dn->d_lock);
 				dn->d_flags |= DCACHE_NOKEY_NAME;
 				spin_unlock(&dn->d_lock);
@@ -2069,7 +2071,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		}
 
 		/* inode */
-		if (d_really_is_positive(dn)) {
+		if (dn && d_really_is_positive(dn)) {
 			in = d_inode(dn);
 		} else {
 			in = ceph_get_inode(parent->d_sb, tvino, NULL);
@@ -2087,21 +2089,22 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		if (ret < 0) {
 			pr_err_client(cl, "badness on %p %llx.%llx\n", in,
 				      ceph_vinop(in));
-			if (d_really_is_negative(dn)) {
+			if (!dn || d_really_is_negative(dn)) {
 				if (inode_state_read_once(in) & I_NEW) {
 					ihold(in);
 					discard_new_inode(in);
 				}
 				iput(in);
 			}
-			d_drop(dn);
+			if (dn)
+				d_drop(dn);
 			err = ret;
 			goto next_item;
 		}
 		if (inode_state_read_once(in) & I_NEW)
 			unlock_new_inode(in);
 
-		if (d_really_is_negative(dn)) {
+		if (d_in_lookup(dn) || d_really_is_negative(dn)) {
 			if (ceph_security_xattr_deadlock(in)) {
 				doutc(cl, " skip splicing dn %p to inode %p"
 				      " (security xattr deadlock)\n", dn, in);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (36 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 39/53] ecryptfs: stop using d_add() NeilBrown
                   ` (16 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

In two places ceph drops a dentry and then calls d_splice_alias().
The d_drop() is no longer needed before d_splice_alias() and will
cause problems for proposed changes to locking.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ceph/file.c  | 2 --
 fs/ceph/inode.c | 3 ---
 2 files changed, 5 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 66bbf6d517a9..c40d129bbd03 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -751,8 +751,6 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
 			unlock_new_inode(inode);
 		}
 		if (d_in_lookup(dentry) || d_really_is_negative(dentry)) {
-			if (!d_unhashed(dentry))
-				d_drop(dentry);
 			dn = d_splice_alias(inode, dentry);
 			WARN_ON_ONCE(dn && dn != dentry);
 		}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8557b207d337..32bac5cac8c4 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1517,9 +1517,6 @@ static int splice_dentry(struct dentry **pdn, struct inode *in)
 		}
 	}
 
-	/* dn must be unhashed */
-	if (!d_unhashed(dn))
-		d_drop(dn);
 	realdn = d_splice_alias(in, dn);
 	if (IS_ERR(realdn)) {
 		pr_err_client(cl, "error %ld %p inode %p ino %llx.%llx\n",
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 39/53] ecryptfs: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (37 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 40/53] gfs2: " NeilBrown
                   ` (15 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in ecryptfs, but as
it is planned to remove d_add(), change to use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/ecryptfs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 8ab014db3e03..beb9e2c8b8b3 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -352,8 +352,7 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 
 	if (!lower_inode) {
 		/* We want to add because we couldn't find in lower */
-		d_add(dentry, NULL);
-		return NULL;
+		return d_splice_alias(NULL, dentry);
 	}
 	inode = __ecryptfs_get_inode(lower_inode, dentry->d_sb);
 	if (IS_ERR(inode)) {
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 40/53] gfs2: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (38 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 39/53] ecryptfs: stop using d_add() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 41/53] libfs: " NeilBrown
                   ` (14 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in gfs2, as the
inode is always NULL, but as it is planned to remove d_add(), change to
use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/gfs2/inode.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 8344040ecaf7..9997fbc1084c 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -988,10 +988,9 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
 	int error;
 
 	inode = gfs2_lookupi(dir, &dentry->d_name, 0);
-	if (inode == NULL) {
-		d_add(dentry, NULL);
-		return NULL;
-	}
+	if (inode == NULL)
+		return d_splice_alias(NULL, dentry);
+
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 41/53] libfs: stop using d_add().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (39 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 40/53] gfs2: " NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 42/53] fuse: don't d_drop() before d_splice_alias() NeilBrown
                   ` (13 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

"Best practice" is to use d_splice_alias() at the end of a ->lookup
function.  d_add() often works and is not incorrect in libfs, as the
inode is always NULL, but as it is planned to remove d_add(), change to
use d_splice_alias().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/libfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 63b4fb082435..75f44341f98b 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -79,8 +79,7 @@ struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, unsigned
 	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
 		return NULL;
 
-	d_add(dentry, NULL);
-	return NULL;
+	return d_splice_alias(NULL, dentry);
 }
 EXPORT_SYMBOL(simple_lookup);
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (40 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 41/53] libfs: " NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link() NeilBrown
                   ` (12 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

create_new_entry() is used to finalise the creation of various
objects (mknod, mkdir, symlink etc).

It currently uses d_drop() which will be a problem for a proposed
new locking scheme.

d_splice_alias() now works on hashed dentries so the d_drop() isn't
needed.  Drop it.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/fuse/dir.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 7ac6b232ef12..a659877b520a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1020,7 +1020,6 @@ static struct dentry *create_new_entry(struct mnt_idmap *idmap, struct fuse_moun
 	}
 	kfree(forget);
 
-	d_drop(entry);
 	d = d_splice_alias(inode, entry);
 	if (IS_ERR(d))
 		return d;
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (41 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 42/53] fuse: don't d_drop() before d_splice_alias() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir() NeilBrown
                   ` (11 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

fuse uses the results of readdir to prime the dcache.  Using
d_alloc_parallel() can block if there is a concurrent lookup.  Blocking
in that case is pointless as the lookup will add info to the dcache and
there is no value in the readdir waiting to see if it should add the
info too.

Also this call to d_alloc_parallel() is made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So change to use d_alloc_noblock(), and use try_lookup_noperm() rather
than full_name_hash() and d_lookup().

Signed-off-by: NeilBrown <neil@brown.name>
---
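A rough userspace model of the control flow described above may help
reviewers.  It is a sketch under stated assumptions: every toy_* name
is hypothetical, and the rule that an allocation attempt reports
-EWOULDBLOCK when the same name is already in-lookup is an assumption
about d_alloc_noblock() inferred from the error handling in this patch.
The try_lookup_noperm() step is folded into the allocation helper.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum toy_state { TOY_FREE = 0, TOY_IN_LOOKUP, TOY_HASHED };

struct toy_dentry {
	char name[32];
	enum toy_state state;
};

#define TOY_SLOTS 8
struct toy_dir {
	struct toy_dentry slot[TOY_SLOTS];
};

/* Return the entry for @name in any non-free state, or NULL. */
static struct toy_dentry *toy_find(struct toy_dir *dir, const char *name)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		struct toy_dentry *d = &dir->slot[i];

		if (d->state != TOY_FREE && strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

/*
 * Hypothetical stand-in for d_alloc_noblock(): never waits.  Returns 0
 * and sets *out, or -EWOULDBLOCK if another lookup of @name is pending,
 * or -ENOMEM if the toy table is full.
 */
static int toy_alloc_noblock(struct toy_dir *dir, const char *name,
			     struct toy_dentry **out)
{
	struct toy_dentry *d = toy_find(dir, name);

	if (d) {
		if (d->state == TOY_IN_LOOKUP)
			return -EWOULDBLOCK;
		*out = d;	/* already cached */
		return 0;
	}
	for (int i = 0; i < TOY_SLOTS; i++) {
		d = &dir->slot[i];
		if (d->state != TOY_FREE)
			continue;
		strncpy(d->name, name, sizeof(d->name) - 1);
		d->name[sizeof(d->name) - 1] = '\0';
		d->state = TOY_IN_LOOKUP;	/* we now own this lookup */
		*out = d;
		return 0;
	}
	return -ENOMEM;
}

/* Readdir-side cache priming, modelled on fuse_direntplus_link(). */
static int toy_prime_from_readdir(struct toy_dir *dir, const char *name)
{
	struct toy_dentry *d;
	int err = toy_alloc_noblock(dir, name, &d);

	if (err == -EWOULDBLOCK)
		return 0;	/* harmless: the other lookup fills the cache */
	if (err)
		return err;
	if (d->state == TOY_IN_LOOKUP)
		d->state = TOY_HASHED;	/* publish the entry we allocated */
	return 0;
}
```

In the model, readdir simply skips a name that some other thread is
already looking up, rather than waiting for that lookup to finish.
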
 fs/fuse/readdir.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index f588252891af..400a1a24f659 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -12,6 +12,7 @@
 #include <linux/posix_acl.h>
 #include <linux/pagemap.h>
 #include <linux/highmem.h>
+#include <linux/namei.h>
 
 static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
 {
@@ -192,14 +193,18 @@ static int fuse_direntplus_link(struct file *file,
 	fc = get_fuse_conn(dir);
 	epoch = atomic_read(&fc->epoch);
 
-	name.hash = full_name_hash(parent, name.name, name.len);
-	dentry = d_lookup(parent, &name);
+	dentry = try_lookup_noperm(&name, parent);
 	if (!dentry) {
 retry:
-		dentry = d_alloc_parallel(parent, &name);
-		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+		dentry = d_alloc_noblock(parent, &name);
+	}
+	if (IS_ERR(dentry)) {
+		if (PTR_ERR(dentry) == -EWOULDBLOCK)
+			/* harmless */
+			return 0;
+		return PTR_ERR(dentry);
 	}
+
 	if (!d_in_lookup(dentry)) {
 		struct fuse_inode *fi;
 		inode = d_inode(dentry);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (42 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 45/53] efivarfs: use d_alloc_name() NeilBrown
                   ` (10 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

hostfs_mkdir() uses d_drop() and d_splice_alias() to ensure it has the
right dentry after a mkdir.  d_drop() is no longer needed here and will
cause problems for future changes to directory locking.  So remove the
d_drop().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/hostfs/hostfs_kern.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index abe86d72d9ef..f737f99710d5 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -700,7 +700,6 @@ static struct dentry *hostfs_mkdir(struct mnt_idmap *idmap, struct inode *ino,
 		dentry = ERR_PTR(err);
 	} else {
 		inode = hostfs_iget(dentry->d_sb, file);
-		d_drop(dentry);
 		dentry = d_splice_alias(inode, dentry);
 	}
 	__putname(file);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 45/53] efivarfs: use d_alloc_name()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (43 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 46/53] Remove references to d_add() in documentation and comments NeilBrown
                   ` (9 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

efivarfs is one of the few remaining users of d_alloc().
Other similar filesystems use d_alloc_name() in the same circumstances.
Now that d_alloc_name() supports ->d_hash (providing that it never
fails), change efivarfs to use that.

Signed-off-by: NeilBrown <neil@brown.name>
---
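The shape of the simplification can be sketched in userspace.  This is
a toy model only: all toy_* names are hypothetical, the case-folding
hash is an invented stand-in and not efivarfs_d_hash(), and the model
assumes, as the commit message requires, that the ->d_hash hook never
fails.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_qstr {
	const char *name;
	unsigned int len;
	unsigned int hash;
};

struct toy_dentry_ops {
	int (*d_hash)(struct toy_qstr *q);
};

struct toy_dentry {
	struct toy_qstr q;
};

/* Simple multiplicative string hash; @fold selects case folding. */
static unsigned int toy_hash(const char *s, unsigned int len, int fold)
{
	unsigned int h = 5381;

	for (unsigned int i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		h = h * 33 + (fold ? (unsigned char)tolower(c) : c);
	}
	return h;
}

/* Hypothetical filesystem hook: case-insensitive, never fails. */
static int toy_ci_hash(struct toy_qstr *q)
{
	q->hash = toy_hash(q->name, q->len, 1);
	return 0;
}

/*
 * Toy model of an allocator that applies the ->d_hash hook itself, so
 * callers no longer need a private "hash, then allocate" helper.
 * Returns NULL only on allocation failure.
 */
static struct toy_dentry *toy_alloc_name(const struct toy_dentry_ops *ops,
					 const char *name)
{
	struct toy_dentry *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->q.name = name;
	d->q.len = (unsigned int)strlen(name);
	d->q.hash = toy_hash(name, d->q.len, 0);
	if (ops && ops->d_hash)
		ops->d_hash(&d->q);	/* assumed infallible in this model */
	return d;
}

/* Caller side: a NULL result maps to -ENOMEM, as in the patch. */
static int toy_create(const struct toy_dentry_ops *ops, const char *name,
		      struct toy_dentry **out)
{
	struct toy_dentry *d = toy_alloc_name(ops, name);

	if (!d)
		return -ENOMEM;
	*out = d;
	return 0;
}
```

Since the allocator has a single failure mode in this model, the caller
needs only a NULL check, not IS_ERR()/PTR_ERR() handling.
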
 fs/efivarfs/super.c | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 1c5224cf183e..232d9757804c 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -189,26 +189,6 @@ static const struct dentry_operations efivarfs_d_ops = {
 	.d_hash = efivarfs_d_hash,
 };
 
-static struct dentry *efivarfs_alloc_dentry(struct dentry *parent, char *name)
-{
-	struct dentry *d;
-	struct qstr q;
-	int err;
-
-	q.name = name;
-	q.len = strlen(name);
-
-	err = efivarfs_d_hash(parent, &q);
-	if (err)
-		return ERR_PTR(err);
-
-	d = d_alloc(parent, &q);
-	if (d)
-		return d;
-
-	return ERR_PTR(-ENOMEM);
-}
-
 bool efivarfs_variable_is_present(efi_char16_t *variable_name,
 				  efi_guid_t *vendor, void *data)
 {
@@ -263,9 +243,9 @@ static int efivarfs_create_dentry(struct super_block *sb, efi_char16_t *name16,
 	memcpy(entry->var.VariableName, name16, name_size);
 	memcpy(&(entry->var.VendorGuid), &vendor, sizeof(efi_guid_t));
 
-	dentry = efivarfs_alloc_dentry(root, name);
-	if (IS_ERR(dentry)) {
-		err = PTR_ERR(dentry);
+	dentry = d_alloc_name(root, name);
+	if (!dentry) {
+		err = -ENOMEM;
 		goto fail_inode;
 	}
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 46/53] Remove references to d_add() in documentation and comments.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (44 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 45/53] efivarfs: use d_alloc_name() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 47/53] VFS: make d_alloc() local to VFS NeilBrown
                   ` (8 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

It is planned to remove d_add(), so remove all references in
documentation and comments.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/nfs/exporting.rst | 10 ++--------
 Documentation/filesystems/vfs.rst           |  4 ++--
 fs/afs/dir.c                                |  5 +++--
 fs/dcache.c                                 |  2 +-
 fs/ocfs2/namei.c                            |  2 +-
 fs/xfs/xfs_iops.c                           |  6 +++---
 6 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index a01d9b9b5bc3..ccaacdc72576 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -101,14 +101,8 @@ Filesystem Issues
 For a filesystem to be exportable it must:
 
    1. provide the filehandle fragment routines described below.
-   2. make sure that d_splice_alias is used rather than d_add
-      when ->lookup finds an inode for a given parent and name.
-
-      If inode is NULL, d_splice_alias(inode, dentry) is equivalent to::
-
-		d_add(dentry, inode), NULL
-
-      Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)
+   2. Use d_splice_alias() when ->lookup finds an inode for a given
+      parent and name.
 
       Typically the ->lookup routine will simply end with a::
 
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index d8df0a84cdba..26dec777ca5c 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -460,7 +460,7 @@ otherwise noted.
 ``lookup``
 	called when the VFS needs to look up an inode in a parent
 	directory.  The name to look for is found in the dentry.  This
-	method must call d_add() to insert the found inode into the
+	method must call d_splice_alias() to insert the found inode into the
 	dentry.  The "i_count" field in the inode structure should be
 	incremented.  If the named inode does not exist a NULL inode
 	should be inserted into the dentry (this is called a negative
@@ -1433,7 +1433,7 @@ manipulate dentries:
 	d_iput() method is called).  If there are other references, then
 	d_drop() is called instead
 
-``d_add``
+``d_splice_alias``
 	add a dentry to its parents hash list and then calls
 	d_instantiate()
 
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index b5c593f50079..f259ca2da383 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -960,8 +960,9 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry,
 		dput(ret);
 	}
 
-	/* We don't want to d_add() the @sys dentry here as we don't want to
-	 * the cached dentry to hide changes to the sysnames list.
+	/* We don't want to d_splice_alias() the @sys dentry here as we
+	 * don't want the cached dentry to hide changes to the
+	 * sysnames list.
 	 */
 	ret = NULL;
 out_s:
diff --git a/fs/dcache.c b/fs/dcache.c
index c48337d95f9a..9a6139013367 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3323,7 +3323,7 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
  * @dentry must be negative and may be in-lookup or unhashed or hashed.
  *
  * If inode is a directory and has an IS_ROOT alias, then d_move that in
- * place of the given dentry and return it, else simply d_add the inode
+ * place of the given dentry and return it, else simply __d_add the inode
  * to the dentry and return NULL.
  *
  * If a non-IS_ROOT directory is found, the filesystem is corrupt, and
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 268b79339a51..0d3116142bd7 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -172,7 +172,7 @@ static struct dentry *ocfs2_lookup(struct inode *dir, struct dentry *dentry,
 		ocfs2_dentry_attach_gen(dentry);
 
 bail_unlock:
-	/* Don't drop the cluster lock until *after* the d_add --
+	/* Don't drop the cluster lock until *after* the d_splice_alias --
 	 * unlink on another node will message us to remove that
 	 * dentry under this lock so otherwise we can race this with
 	 * the downconvert thread and have a stale dentry. */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ec19d3ec7cf0..2641061ba1db 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -356,9 +356,9 @@ xfs_vn_ci_lookup(
 		if (unlikely(error != -ENOENT))
 			return ERR_PTR(error);
 		/*
-		 * call d_add(dentry, NULL) here when d_drop_negative_children
-		 * is called in xfs_vn_mknod (ie. allow negative dentries
-		 * with CI filesystems).
+		 * call d_splice_alias(NULL, dentry) here when
+		 * d_drop_negative_children is called in xfs_vn_mknod
+		 * (ie.  allow negative dentries with CI filesystems).
 		 */
 		return NULL;
 	}
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 47/53] VFS: make d_alloc() local to VFS.
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (45 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 46/53] Remove references to d_add() in documentation and comments NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 48/53] VFS: remove d_add() NeilBrown
                   ` (7 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

d_alloc() is not useful.  d_alloc_name() is a better interface for
those cases where it is safe to allocate a dentry without
synchronisation with the VFS, and d_alloc_parallel() or
d_alloc_noblock() should be used when synchronisation is needed.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst | 8 ++++++++
 fs/dcache.c                           | 1 -
 fs/internal.h                         | 1 +
 include/linux/dcache.h                | 1 -
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 6a507c508ccf..4712403fd98e 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1381,3 +1381,11 @@ longer available.  Use start_renaming() or similar.
 
 d_alloc_parallel() no longer requires a waitqueue_head.  It uses one
 from an internal table when needed.
+
+---
+
+**mandatory**
+
+d_alloc() is no longer exported as its use can be racy.  Use d_alloc_name()
+when object creation is controlled separately from standard filesystem interface,
+and d_alloc_parallel() or d_alloc_noblock() when standard interfaces can be used.
diff --git a/fs/dcache.c b/fs/dcache.c
index 9a6139013367..23f04fa05778 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1830,7 +1830,6 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 
 	return dentry;
 }
-EXPORT_SYMBOL(d_alloc);
 
 /**
  * d_duplicate: duplicate a dentry for combined atomic operation
diff --git a/fs/internal.h b/fs/internal.h
index cbc384a1aa09..9c637e2d18ef 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -223,6 +223,7 @@ bool sync_lazytime(struct inode *inode);
 /*
  * dcache.c
  */
+struct dentry *d_alloc(struct dentry * parent, const struct qstr *name);
 extern int d_set_mounted(struct dentry *dentry);
 extern long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc);
 extern struct dentry *d_alloc_cursor(struct dentry *);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3b12577ddfbb..18242f9598dc 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -242,7 +242,6 @@ extern void d_drop(struct dentry *dentry);
 extern void d_delete(struct dentry *);
 
 /* allocate/de-allocate */
-extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 48/53] VFS: remove d_add()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (46 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 47/53] VFS: make d_alloc() local to VFS NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 49/53] VFS: remove d_rehash() NeilBrown
                   ` (6 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

d_add() has been supplanted by d_splice_alias(), d_make_persistent() and
others.  It is no longer used and can be discarded.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 19 -------------------
 include/linux/dcache.h |  2 --
 2 files changed, 21 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 23f04fa05778..4ebbbcc5aec4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2979,25 +2979,6 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
 		spin_unlock(&inode->i_lock);
 }
 
-/**
- * d_add - add dentry to hash queues
- * @entry: dentry to add
- * @inode: The inode to attach to this dentry
- *
- * This adds the entry to the hash queues and initializes @inode.
- * The entry was actually filled in earlier during d_alloc().
- */
-
-void d_add(struct dentry *entry, struct inode *inode)
-{
-	if (inode) {
-		security_d_instantiate(entry, inode);
-		spin_lock(&inode->i_lock);
-	}
-	__d_add(entry, inode, NULL);
-}
-EXPORT_SYMBOL(d_add);
-
 struct dentry *d_make_persistent(struct dentry *dentry, struct inode *inode)
 {
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 18242f9598dc..31b4a831ecdb 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -282,8 +282,6 @@ extern int path_has_submounts(const struct path *);
  */
 extern void d_rehash(struct dentry *);
  
-extern void d_add(struct dentry *, struct inode *);
-
 /* used for rename() and baskets */
 extern void d_move(struct dentry *, struct dentry *);
 extern void d_exchange(struct dentry *, struct dentry *);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 49/53] VFS: remove d_rehash()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (47 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 48/53] VFS: remove d_add() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm() NeilBrown
                   ` (5 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

d_rehash() is no longer used.  Its existence implies that it might be
safe to unhash and rehash a dentry, and with proposed locking changes
that will no longer be the case.
So remove it.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  7 +++++++
 fs/dcache.c                           | 15 ---------------
 include/linux/dcache.h                |  5 -----
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 4712403fd98e..154a38cd7801 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1389,3 +1389,10 @@ from an internal table when needed.
 d_alloc() is no longer exported as its use can be racy.  Use d_alloc_name()
 when object creation is controlled separately from standard filesystem interface,
 and d_alloc_parallel() or d_alloc_noblock() when standard interfaces can be used.
+
+---
+**mandatory**
+
+d_rehash() is gone. It should never be needed.  Only unhash a dentry if
+you really don't want it.
+
diff --git a/fs/dcache.c b/fs/dcache.c
index 4ebbbcc5aec4..abb96ad8e015 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2627,21 +2627,6 @@ static void __d_rehash(struct dentry *entry)
 	hlist_bl_unlock(b);
 }
 
-/**
- * d_rehash - add an entry back to the hash
- * @entry: dentry to add to the hash
- *
- * Adds a dentry to the hash according to its name.
- */
- 
-void d_rehash(struct dentry * entry)
-{
-	spin_lock(&entry->d_lock);
-	__d_rehash(entry);
-	spin_unlock(&entry->d_lock);
-}
-EXPORT_SYMBOL(d_rehash);
-
 #define PAR_LOOKUP_WQ_BITS	8
 #define PAR_LOOKUP_WQS (1 << PAR_LOOKUP_WQ_BITS)
 static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS] __cacheline_aligned;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 31b4a831ecdb..eb1a59b6fca7 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -277,11 +277,6 @@ extern struct dentry *d_find_alias_rcu(struct inode *);
 /* test whether we have any submounts in a subdir tree */
 extern int path_has_submounts(const struct path *);
 
-/*
- * This adds the entry to the hash queues.
- */
-extern void d_rehash(struct dentry *);
- 
 /* used for rename() and baskets */
 extern void d_move(struct dentry *, struct dentry *);
 extern void d_exchange(struct dentry *, struct dentry *);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (48 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 49/53] VFS: remove d_rehash() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl() NeilBrown
                   ` (4 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC

From: NeilBrown <neil@brown.name>

These are no longer used, so remove them.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  7 +++
 fs/ecryptfs/inode.c                   |  2 +-
 fs/namei.c                            | 61 ++-------------------------
 include/linux/namei.h                 |  2 -
 4 files changed, 12 insertions(+), 60 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 154a38cd7801..7e83bd3c5a12 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1396,3 +1396,10 @@ and d_alloc_parallel() or d_alloc_noblock() when standard interfaces can be used
 d_rehash() is gone. It should never be needed.  Only unhash a dentry if
 you really don't want it.
 
+---
+
+**mandatory**
+
+lookup_one() and lookup_noperm() are no longer available.  Use
+start_creating() or similar instead.
+
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index beb9e2c8b8b3..a7a596d51d67 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -414,7 +414,7 @@ static struct dentry *ecryptfs_lookup(struct inode *ecryptfs_dir_inode,
 
 	lower_dentry = lookup_noperm_unlocked(&qname, lower_dir_dentry);
 	if (IS_ERR(lower_dentry)) {
-		ecryptfs_printk(KERN_DEBUG, "%s: lookup_noperm() returned "
+		ecryptfs_printk(KERN_DEBUG, "%s: lookup_noperm_unlocked() returned "
 				"[%ld] on lower_dentry = [%s]\n", __func__,
 				PTR_ERR(lower_dentry),
 				qname.name);
diff --git a/fs/namei.c b/fs/namei.c
index eed388ee8a30..cb80490a869f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3148,59 +3148,6 @@ struct dentry *try_lookup_noperm(struct qstr *name, struct dentry *base)
 }
 EXPORT_SYMBOL(try_lookup_noperm);
 
-/**
- * lookup_noperm - filesystem helper to lookup single pathname component
- * @name:	qstr storing pathname component to lookup
- * @base:	base directory to lookup from
- *
- * Note that this routine is purely a helper for filesystem usage and should
- * not be called by generic code.  It does no permission checking.
- *
- * The caller must hold base->i_rwsem.
- */
-struct dentry *lookup_noperm(struct qstr *name, struct dentry *base)
-{
-	struct dentry *dentry;
-	int err;
-
-	WARN_ON_ONCE(!inode_is_locked(base->d_inode));
-
-	err = lookup_noperm_common(name, base);
-	if (err)
-		return ERR_PTR(err);
-
-	dentry = lookup_dcache(name, base, 0);
-	return dentry ? dentry : __lookup_slow(name, base, 0);
-}
-EXPORT_SYMBOL(lookup_noperm);
-
-/**
- * lookup_one - lookup single pathname component
- * @idmap:	idmap of the mount the lookup is performed from
- * @name:	qstr holding pathname component to lookup
- * @base:	base directory to lookup from
- *
- * This can be used for in-kernel filesystem clients such as file servers.
- *
- * The caller must hold base->i_rwsem.
- */
-struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr *name,
-			  struct dentry *base)
-{
-	struct dentry *dentry;
-	int err;
-
-	WARN_ON_ONCE(!inode_is_locked(base->d_inode));
-
-	err = lookup_one_common(idmap, name, base);
-	if (err)
-		return ERR_PTR(err);
-
-	dentry = lookup_dcache(name, base, 0);
-	return dentry ? dentry : __lookup_slow(name, base, 0);
-}
-EXPORT_SYMBOL(lookup_one);
-
 /**
  * lookup_one_unlocked - lookup single pathname component
  * @idmap:	idmap of the mount the lookup is performed from
@@ -3209,8 +3156,8 @@ EXPORT_SYMBOL(lookup_one);
  *
  * This can be used for in-kernel filesystem clients such as file servers.
  *
- * Unlike lookup_one, it should be called without the parent
- * i_rwsem held, and will take the i_rwsem itself if necessary.
+ * It should be called without the parent i_rwsem held, and will take
+ * the i_rwsem itself if necessary.
  *
  * Returns: - A dentry, possibly negative, or
  *	    - same errors as try_lookup_noperm() or
@@ -3322,8 +3269,8 @@ EXPORT_SYMBOL(lookup_one_positive_unlocked);
  * Note that this routine is purely a helper for filesystem usage and should
  * not be called by generic code. It does no permission checking.
  *
- * Unlike lookup_noperm(), it should be called without the parent
- * i_rwsem held, and will take the i_rwsem itself if necessary.
+ * This should be called without the parent i_rwsem held, and will take
+ * the i_rwsem itself if necessary.
  *
  * Unlike try_lookup_noperm() it *does* revalidate the dentry if it already
  * existed.
diff --git a/include/linux/namei.h b/include/linux/namei.h
index b3346a513d8f..cb79e84c718d 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -74,10 +74,8 @@ int vfs_path_lookup(struct dentry *, struct vfsmount *, const char *,
 		    unsigned int, struct path *);
 
 extern struct dentry *try_lookup_noperm(struct qstr *, struct dentry *);
-extern struct dentry *lookup_noperm(struct qstr *, struct dentry *);
 extern struct dentry *lookup_noperm_unlocked(struct qstr *, struct dentry *);
 extern struct dentry *lookup_noperm_positive_unlocked(struct qstr *, struct dentry *);
-struct dentry *lookup_one(struct mnt_idmap *, struct qstr *, struct dentry *);
 struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
 				   struct qstr *name, struct dentry *base);
 struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (49 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock NeilBrown
                   ` (3 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

lookup_one_qstr_excl() is used for lookups prior to directory
modifications (other than open(O_CREAT)), whether create, remove,
or rename.

A future patch will lift lookup out of the i_rwsem lock that protects
the directory during these operations (only taking a shared lock if the
target name is not yet in the dcache).

To prepare for this change and particularly to allow lookup to
always be done outside the parent i_rwsem, change lookup_one_qstr_excl()
to use d_alloc_parallel().

For the target of create and rename some filesystems skip the
preliminary lookup and combine it with the main operation.  This is only
safe if the operation has exclusive access to the dentry.  Currently
this is guaranteed by an exclusive lock on the directory.
d_alloc_parallel() provides alternate exclusive access (in the case
where the name isn't in the dcache and ->lookup will be called).

As a result of this change, ->lookup is now only ever called with a
d_in_lookup() dentry.  Consequently we can remove the d_in_lookup()
check from d_add_ci() which is only used in ->lookup.

If LOOKUP_EXCL or LOOKUP_RENAME_TARGET is passed, the caller must ensure
d_lookup_done() is called at an appropriate time, and must not assume
that it can test for positive or negative dentries without confirming
that the dentry is no longer d_in_lookup() - unless it is filesystem
code acting on itself and *knows* that ->lookup() always completes the
lookup (currently true for all filesystems other than NFS).
This is all handled in start_creating() and end_dirop() and friends.

Note that as lookup_one_qstr_excl() is called with an exclusive lock on
the directory, d_alloc_parallel() cannot race with another thread and
cannot return a non-in-lookup dentry.  However that is expected to
change so that case is handled with this patch.
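
As a rough user-space illustration of the rule above (not kernel code:
struct model_dentry, check_lookup_result() and the MODEL_* flags are
hypothetical stand-ins for the dentry state and the LOOKUP_* flags), the
checks that can be applied after the lookup depend on whether the lookup
has actually completed:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for LOOKUP_CREATE and LOOKUP_EXCL. */
#define MODEL_CREATE 0x1u
#define MODEL_EXCL   0x2u

/* Hypothetical model of the dentry state that matters here. */
struct model_dentry {
	bool in_lookup;	/* models d_in_lookup(): ->lookup not completed */
	bool positive;	/* models a dentry with an inode attached */
};

/*
 * Returns 0 if the dentry may be handed to the caller, or a negative
 * errno.  While the dentry is still in-lookup nothing can be concluded,
 * so the -ENOENT/-EEXIST decisions are left to the filesystem when it
 * carries out the create or rename.
 */
static int check_lookup_result(const struct model_dentry *d,
			       unsigned int flags)
{
	assert(d != NULL);

	if (d->in_lookup)
		return 0;
	if (!d->positive && !(flags & MODEL_CREATE))
		return -ENOENT;
	if (d->positive && (flags & MODEL_EXCL))
		return -EEXIST;
	return 0;
}
```

A caller that receives an in-lookup dentry from this model would still
have to arrange for the equivalent of d_lookup_done() before dropping it.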

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst | 14 ++++++++++
 fs/dcache.c                           | 16 +++--------
 fs/namei.c                            | 38 ++++++++++++++++++++-------
 3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 7e83bd3c5a12..5ddc5ecfcc64 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1403,3 +1403,17 @@ you really don't want it.
 lookup_one() and lookup_noperm() are no longer available.  Use
 start_creating() or similar instead.
 
+
+---
+
+**mandatory**
+
+All start_creating and start_renaming functions may return a
+d_in_lookup() dentry if they look up with LOOKUP_EXCL or LOOKUP_RENAME_TARGET.
+end_dirop() calls the necessary d_lookup_done().  If the caller
+*knows* which filesystem is being used, it may know that this is not
+possible.  Otherwise it must be careful testing if the dentry is
+positive or negative as the lookup may not have been performed yet.
+
+inode_operations.lookup() is now only ever called with a d_in_lookup()
+dentry.
diff --git a/fs/dcache.c b/fs/dcache.c
index abb96ad8e015..f573716d1a04 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2261,18 +2261,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		inode_unlock_shared(d_inode(dentry->d_parent));
 	else
 		inode_unlock(d_inode(dentry->d_parent));
-	if (d_in_lookup(dentry)) {
-		found = d_alloc_parallel(dentry->d_parent, name);
-		if (IS_ERR(found) || !d_in_lookup(found)) {
-			iput(inode);
-			return found;
-		}
-	} else {
-		found = d_alloc(dentry->d_parent, name);
-		if (!found) {
-			iput(inode);
-			return ERR_PTR(-ENOMEM);
-		}
+	found = d_alloc_parallel(dentry->d_parent, name);
+	if (IS_ERR(found) || !d_in_lookup(found)) {
+		iput(inode);
+		return found;
 	}
 	if (shared)
 		inode_lock_shared(d_inode(dentry->d_parent));
diff --git a/fs/namei.c b/fs/namei.c
index cb80490a869f..bba419f2fc53 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1774,13 +1774,14 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 }
 
 /*
- * Parent directory has inode locked exclusive.  This is one
- * and only case when ->lookup() gets called on non in-lookup
- * dentries - as the matter of fact, this only gets called
- * when directory is guaranteed to have no in-lookup children
- * at all.
- * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
- * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
+ * Parent directory has inode locked.
+ * If LOOKUP_EXCL or LOOKUP_RENAME_TARGET is set
+ * d_lookup_done() must be called before the dentry is dput()
+ * If the dentry is not d_in_lookup():
+ *   Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
+ *   Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
+ * If it is d_in_lookup() then these conditions can only be checked by the
+ * file system when carrying out the intent (create or rename).
  */
 static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 					   struct dentry *base, unsigned int flags)
@@ -1798,18 +1799,27 @@ static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 	if (unlikely(IS_DEADDIR(dir)))
 		return ERR_PTR(-ENOENT);
 
-	dentry = d_alloc(base, name);
-	if (unlikely(!dentry))
-		return ERR_PTR(-ENOMEM);
+	dentry = d_alloc_parallel(base, name);
+	if (unlikely(IS_ERR(dentry)))
+		return dentry;
+	if (unlikely(!d_in_lookup(dentry)))
+		/* Raced with another thread which did the lookup */
+		goto found;
 
 	old = dir->i_op->lookup(dir, dentry, flags);
 	if (unlikely(old)) {
+		d_lookup_done(dentry);
 		dput(dentry);
 		dentry = old;
 	}
 found:
 	if (IS_ERR(dentry))
 		return dentry;
+	if (d_in_lookup(dentry))
+		/* We cannot check for errors - the caller will have to
+		 * wait for any create-etc attempt to get relevant errors.
+		 */
+		return dentry;
 	if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
 		dput(dentry);
 		return ERR_PTR(-ENOENT);
@@ -2921,6 +2931,8 @@ static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
  * The lookup is performed and necessary locks are taken so that, on success,
  * the returned dentry can be operated on safely.
  * The qstr must already have the hash value calculated.
+ * The dentry may be d_in_lookup() if %LOOKUP_EXCL or %LOOKUP_RENAME_TARGET
+ * is given, depending on the filesystem.
  *
  * Returns: a locked dentry, or an error.
  *
@@ -2942,6 +2954,7 @@ void end_dirop(struct dentry *de)
 {
 	if (!IS_ERR(de)) {
 		inode_unlock(de->d_parent->d_inode);
+		d_lookup_done(de);
 		dput(de);
 	}
 }
@@ -3854,8 +3867,10 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
 	return 0;
 
 out_dput_d2:
+	d_lookup_done(d2);
 	dput(d2);
 out_dput_d1:
+	d_lookup_done(d1);
 	dput(d1);
 out_unlock:
 	unlock_rename(rd->old_parent, rd->new_parent);
@@ -3950,6 +3965,7 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
 	return 0;
 
 out_dput_d2:
+	d_lookup_done(d2);
 	dput(d2);
 out_unlock:
 	unlock_rename(old_dentry->d_parent, rd->new_parent);
@@ -4059,6 +4075,8 @@ EXPORT_SYMBOL(start_renaming_two_dentries);
 
 void end_renaming(struct renamedata *rd)
 {
+	d_lookup_done(rd->old_dentry);
+	d_lookup_done(rd->new_dentry);
 	unlock_rename(rd->old_parent, rd->new_parent);
 	dput(rd->old_dentry);
 	dput(rd->new_dentry);
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (50 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl() NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 21:12 ` [PATCH 53/53] VFS: remove LOOKUP_SHARED NeilBrown
                   ` (2 subsequent siblings)
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

d_alloc_parallel() can block waiting on a d_in_lookup() dentry
so it is important to order it consistently with other blocking locks
such as inode_lock().

Currently d_alloc_parallel() is ordered after inode_lock(): it can be
called while the inode is locked, and so the inode cannot be locked
while a d_in_lookup() dentry is held.

This patch reverses that order.  d_alloc_parallel() must now be called
*before* locking the directory, and must not be called afterwards.  This
allows directory locking to be moved closer to the filesystem
operations, and ultimately into those operations.

lookup_one_qstr_excl() is now called without a lock held, exclusive or
otherwise, so the "_excl" is dropped - it is now lookup_one_qstr().

As the lock is taken *after* lookup, start_dirop() and start_renaming()
must check, once the lock is held, that a dentry which isn't
d_in_lookup() still has the correct parent and is still hashed.
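
The post-lock check can be pictured with a small user-space model.  This
is only a sketch: struct model_dentry and still_valid_after_lock() are
hypothetical names, and the fields merely stand in for d_in_lookup(),
d_unhashed() and ->d_parent.  It follows the form of the test used in
__start_dirop(); when the predicate is false the caller drops the lock
and the dentry and repeats the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the dentry fields the check looks at. */
struct model_dentry {
	bool in_lookup;				/* models d_in_lookup() */
	bool unhashed;				/* models d_unhashed() */
	const struct model_dentry *parent;	/* models ->d_parent */
};

/*
 * Models the test made after the directory lock is taken: an in-lookup
 * dentry is always acceptable; otherwise it must still be hashed and
 * still be a child of the directory that was locked.
 */
static bool still_valid_after_lock(const struct model_dentry *d,
				   const struct model_dentry *locked_parent)
{
	assert(d != NULL);

	if (d->in_lookup)
		return true;
	return !d->unhashed && d->parent == locked_parent;
}
```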

lookup_one_qstr() and lookup_slow() don't need to re-check the parent as
the dentry is always d_in_lookup() so its parent cannot change.

The locking in lookup_slow() is moved into __lookup_slow() immediately
before/after ->lookup, and lookup_slow() just sets the task state for
waiting.

Parent locking is removed from open_last_lookups() and performed in
lookup_open().  A shared lock is taken if ->lookup() needs to be called.
An exclusive lock is taken separately if ->create() needs to be called -
with checks that the dentry hasn't become positive.
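
The re-check made under the exclusive lock before ->create() can be
summarised by the following user-space sketch.  The names
(struct model_state, recheck_before_create(), the MODEL_* actions) are
hypothetical; the two booleans stand in for "dentry->d_inode is set" and
IS_DEADDIR() on the parent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_action { MODEL_DO_CREATE, MODEL_USE_EXISTING, MODEL_FAIL };

/* Hypothetical snapshot of what is seen once the lock is held. */
struct model_state {
	bool positive;	/* dentry gained an inode while we were unlocked */
	bool dir_dead;	/* parent directory was removed in the meantime */
};

struct model_result {
	enum model_action action;
	int error;
};

/*
 * Still negative and the directory is alive: go ahead and create.
 * Still negative but the directory is dead: fail with -ENOENT.
 * Became positive: nothing to create, carry on with the existing object.
 */
static struct model_result recheck_before_create(const struct model_state *s)
{
	struct model_result r = { MODEL_USE_EXISTING, 0 };

	assert(s != NULL);

	if (!s->positive && !s->dir_dead) {
		r.action = MODEL_DO_CREATE;
	} else if (!s->positive) {
		r.action = MODEL_FAIL;
		r.error = -ENOENT;
	}
	return r;
}
```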

If ->atomic_open is needed we take exclusive or shared parent lock as
appropriate and check for a positive dentry or DEAD parent.

The fsnotify_create() call is kept inside the locked region in
lookup_open().  I don't know if this is important.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c | 239 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 154 insertions(+), 85 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bba419f2fc53..3d213070a515 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1773,8 +1773,19 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 	return dentry;
 }
 
+static inline bool inode_lock_shared_state(struct inode *inode, unsigned int state)
+{
+	if (state == TASK_KILLABLE) {
+		if (down_read_killable(&inode->i_rwsem) != 0) {
+			return false;
+		}
+	} else {
+		inode_lock_shared(inode);
+	}
+	return true;
+}
+
 /*
- * Parent directory has inode locked.
  * If LOOKUP_EXCL or LOOKUP_RENAME_TARGET is set
  * d_lookup_done() must be called before the dentry is dput()
  * If the dentry is not d_in_lookup():
@@ -1783,8 +1794,9 @@ static struct dentry *lookup_dcache(const struct qstr *name,
  * If it is d_in_lookup() then these conditions can only be checked by the
  * file system when carrying out the intent (create or rename).
  */
-static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
-					   struct dentry *base, unsigned int flags)
+static struct dentry *lookup_one_qstr(const struct qstr *name,
+				      struct dentry *base, unsigned int flags,
+				      unsigned int state)
 {
 	struct dentry *dentry;
 	struct dentry *old;
@@ -1806,7 +1818,16 @@ static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 		/* Raced with another thread which did the lookup */
 		goto found;
 
-	old = dir->i_op->lookup(dir, dentry, flags);
+	if (!inode_lock_shared_state(dir, state)) {
+		d_lookup_done(dentry);
+		dput(dentry);
+		return ERR_PTR(-EINTR);
+	}
+	if (unlikely(IS_DEADDIR(dir)))
+		old = ERR_PTR(-ENOENT);
+	else
+		old = dir->i_op->lookup(dir, dentry, flags | LOOKUP_SHARED);
+	inode_unlock_shared(dir);
 	if (unlikely(old)) {
 		d_lookup_done(dentry);
 		dput(dentry);
@@ -1897,7 +1918,8 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 /* Fast lookup failed, do it the slow way */
 static struct dentry *__lookup_slow(const struct qstr *name,
 				    struct dentry *dir,
-				    unsigned int flags)
+				    unsigned int flags,
+				    unsigned int state)
 {
 	struct dentry *dentry, *old;
 	struct inode *inode = dir->d_inode;
@@ -1920,8 +1942,17 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 			dput(dentry);
 			dentry = ERR_PTR(error);
 		}
+	} else if (!inode_lock_shared_state(inode, state)) {
+		d_lookup_done(dentry);
+		dput(dentry);
+		return ERR_PTR(-EINTR);
 	} else {
-		old = inode->i_op->lookup(inode, dentry, flags);
+		if (unlikely(IS_DEADDIR(inode)))
+			old = ERR_PTR(-ENOENT);
+		else
+			old = inode->i_op->lookup(inode, dentry,
+						  flags | LOOKUP_SHARED);
+		inode_unlock_shared(inode);
 		d_lookup_done(dentry);
 		if (unlikely(old)) {
 			dput(dentry);
@@ -1935,26 +1966,14 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
 				  struct dentry *dir,
 				  unsigned int flags)
 {
-	struct inode *inode = dir->d_inode;
-	struct dentry *res;
-	inode_lock_shared(inode);
-	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
-	inode_unlock_shared(inode);
-	return res;
+	return __lookup_slow(name, dir, flags | LOOKUP_SHARED, TASK_NORMAL);
 }
 
 static struct dentry *lookup_slow_killable(const struct qstr *name,
 					   struct dentry *dir,
 					   unsigned int flags)
 {
-	struct inode *inode = dir->d_inode;
-	struct dentry *res;
-
-	if (inode_lock_shared_killable(inode))
-		return ERR_PTR(-EINTR);
-	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
-	inode_unlock_shared(inode);
-	return res;
+	return __lookup_slow(name, dir, flags | LOOKUP_SHARED, TASK_KILLABLE);
 }
 
 static inline int may_lookup(struct mnt_idmap *idmap,
@@ -2908,18 +2927,26 @@ static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
 	struct dentry *dentry;
 	struct inode *dir = d_inode(parent);
 
-	if (state == TASK_KILLABLE) {
-		int ret = down_write_killable_nested(&dir->i_rwsem,
-						     I_MUTEX_PARENT);
-		if (ret)
-			return ERR_PTR(ret);
-	} else {
-		inode_lock_nested(dir, I_MUTEX_PARENT);
-	}
-	dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
-	if (IS_ERR(dentry))
+	while (1) {
+		dentry = lookup_one_qstr(name, parent, lookup_flags, state);
+		if (IS_ERR(dentry))
+			return dentry;
+		if (state == TASK_KILLABLE) {
+			if (down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT) != 0) {
+				d_lookup_done(dentry);
+				dput(dentry);
+				return ERR_PTR(-EINTR);
+			}
+		} else {
+			inode_lock_nested(dir, I_MUTEX_PARENT);
+		}
+		if (d_in_lookup(dentry) ||
+		    (!d_unhashed(dentry) && dentry->d_parent == parent))
+			return dentry;
 		inode_unlock(dir);
-	return dentry;
+		d_lookup_done(dentry);
+		dput(dentry);
+	}
 }
 
 /**
@@ -3830,26 +3857,37 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
 	if (rd->flags & RENAME_NOREPLACE)
 		target_flags |= LOOKUP_EXCL;
 
-	trap = lock_rename(rd->old_parent, rd->new_parent);
-	if (IS_ERR(trap))
-		return PTR_ERR(trap);
-
-	d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
-				  lookup_flags);
+retry:
+	d1 = lookup_one_qstr(old_last, rd->old_parent,
+			     lookup_flags, TASK_NORMAL);
 	err = PTR_ERR(d1);
 	if (IS_ERR(d1))
-		goto out_unlock;
+		goto out_err;
 
-	d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
-				  lookup_flags | target_flags);
+	d2 = lookup_one_qstr(new_last, rd->new_parent,
+			     lookup_flags | target_flags, TASK_NORMAL);
 	err = PTR_ERR(d2);
 	if (IS_ERR(d2))
 		goto out_dput_d1;
 
+	trap = lock_rename(rd->old_parent, rd->new_parent);
+	err = PTR_ERR(trap);
+	if (IS_ERR(trap))
+		goto out_unlock;
+
+	if (unlikely((!d_in_lookup(d1) && d_unhashed(d1)) || d1->d_parent != rd->old_parent ||
+		     (!d_in_lookup(d2) && d_unhashed(d2)) || d2->d_parent != rd->new_parent)) {
+		unlock_rename(rd->old_parent, rd->new_parent);
+		d_lookup_done(d1); dput(d1);
+		d_lookup_done(d2); dput(d2);
+		dput(trap);
+		goto retry;
+	}
+
 	if (d1 == trap) {
 		/* source is an ancestor of target */
 		err = -EINVAL;
-		goto out_dput_d2;
+		goto out_unlock;
 	}
 
 	if (d2 == trap) {
@@ -3858,7 +3896,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
 			err = -EINVAL;
 		else
 			err = -ENOTEMPTY;
-		goto out_dput_d2;
+		goto out_unlock;
 	}
 
 	rd->old_dentry = d1;
@@ -3866,14 +3904,14 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
 	dget(rd->old_parent);
 	return 0;
 
-out_dput_d2:
+out_unlock:
+	unlock_rename(rd->old_parent, rd->new_parent);
 	d_lookup_done(d2);
 	dput(d2);
 out_dput_d1:
 	d_lookup_done(d1);
 	dput(d1);
-out_unlock:
-	unlock_rename(rd->old_parent, rd->new_parent);
+out_err:
 	return err;
 }
 
@@ -3927,10 +3965,22 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
 	if (rd->flags & RENAME_NOREPLACE)
 		target_flags |= LOOKUP_EXCL;
 
-	/* Already have the dentry - need to be sure to lock the correct parent */
+retry:
+	d2 = lookup_one_qstr(new_last, rd->new_parent,
+			     lookup_flags | target_flags, TASK_NORMAL);
+	err = PTR_ERR(d2);
+	if (IS_ERR(d2))
+		goto out_unlock;
+
+	/*
+	 * Already have the old_dentry - need to be sure to lock
+	 * the correct parent
+	 */
 	trap = lock_rename_child(old_dentry, rd->new_parent);
+	err = PTR_ERR(trap);
 	if (IS_ERR(trap))
-		return PTR_ERR(trap);
+		goto out_dput_d2;
+
 	if (d_unhashed(old_dentry) ||
 	    (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
 		/* dentry was removed, or moved and explicit parent requested */
@@ -3938,16 +3988,19 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
 		goto out_unlock;
 	}
 
-	d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
-				  lookup_flags | target_flags);
-	err = PTR_ERR(d2);
-	if (IS_ERR(d2))
-		goto out_unlock;
+	if (unlikely((!d_in_lookup(d2) && d_unhashed(d2)) ||
+		     d2->d_parent != rd->new_parent)) {
+		/* d2 was moved/removed before lock - repeat lookup */
+		unlock_rename(old_dentry->d_parent, rd->new_parent);
+		d_lookup_done(d2); dput(d2);
+		dput(trap);
+		goto retry;
+	}
 
 	if (old_dentry == trap) {
 		/* source is an ancestor of target */
 		err = -EINVAL;
-		goto out_dput_d2;
+		goto out_unlock;
 	}
 
 	if (d2 == trap) {
@@ -3956,7 +4009,7 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
 			err = -EINVAL;
 		else
 			err = -ENOTEMPTY;
-		goto out_dput_d2;
+		goto out_unlock;
 	}
 
 	rd->old_dentry = dget(old_dentry);
@@ -3964,11 +4017,11 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
 	rd->old_parent = dget(old_dentry->d_parent);
 	return 0;
 
+out_unlock:
+	unlock_rename(old_dentry->d_parent, rd->new_parent);
 out_dput_d2:
 	d_lookup_done(d2);
 	dput(d2);
-out_unlock:
-	unlock_rename(old_dentry->d_parent, rd->new_parent);
 	return err;
 }
 
@@ -4319,8 +4372,19 @@ static struct dentry *atomic_open(const struct path *path, struct dentry *dentry
 
 	file->__f_path.dentry = DENTRY_NOT_SET;
 	file->__f_path.mnt = path->mnt;
-	error = dir->i_op->atomic_open(dir, dentry, file,
-				       open_to_namei_flags(open_flag), mode);
+
+	if (open_flag & O_CREAT)
+		inode_lock(dir);
+	else
+		inode_lock_shared(dir);
+	if (dentry->d_inode)
+		error = finish_no_open(file, NULL);
+	else if (unlikely(IS_DEADDIR(dir)))
+		error = -ENOENT;
+	else
+		error = dir->i_op->atomic_open(dir, dentry, file,
+					       open_to_namei_flags(open_flag),
+					       mode);
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
@@ -4339,6 +4403,13 @@ static struct dentry *atomic_open(const struct path *path, struct dentry *dentry
 				error = -ENOENT;
 		}
 	}
+	if (!error && (file->f_mode & FMODE_CREATED))
+		fsnotify_create(dir, dentry);
+	if (open_flag & O_CREAT)
+		inode_unlock(dir);
+	else
+		inode_unlock_shared(dir);
+
 	if (error) {
 		dput(dentry);
 		dentry = ERR_PTR(error);
@@ -4372,10 +4443,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
-	unsigned int shared_flag = (op->open_flag & O_CREAT) ? 0 : LOOKUP_SHARED;
-
-	if (unlikely(IS_DEADDIR(dir_inode)))
-		return ERR_PTR(-ENOENT);
 
 	file->f_mode &= ~FMODE_CREATED;
 	dentry = d_lookup(dir, &nd->last);
@@ -4420,7 +4487,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	if (open_flag & O_CREAT) {
 		if (open_flag & O_EXCL)
 			open_flag &= ~O_TRUNC;
-		mode = vfs_prepare_mode(idmap, dir->d_inode, mode, mode, mode);
+		mode = vfs_prepare_mode(idmap, dir_inode, mode, mode, mode);
 		if (likely(got_write))
 			create_error = may_o_create(idmap, &nd->path,
 						    dentry, mode);
@@ -4439,8 +4506,15 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	}
 
 	if (d_in_lookup(dentry)) {
-		struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
-							     nd->flags | shared_flag);
+		struct dentry *res;
+
+		inode_lock_shared(dir_inode);
+		if (IS_DEADDIR(dir_inode))
+			res = ERR_PTR(-ENOENT);
+		else
+			res = dir_inode->i_op->lookup(dir_inode, dentry,
+						      nd->flags | LOOKUP_SHARED);
+		inode_unlock_shared(dir_inode);
 		d_lookup_done(dentry);
 		if (unlikely(res)) {
 			if (IS_ERR(res)) {
@@ -4459,15 +4533,22 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (error)
 			goto out_dput;
 
-		file->f_mode |= FMODE_CREATED;
-		audit_inode_child(dir_inode, dentry, AUDIT_TYPE_CHILD_CREATE);
-		if (!dir_inode->i_op->create) {
-			error = -EACCES;
-			goto out_dput;
-		}
+		inode_lock(dir_inode);
+		if (!dentry->d_inode && !unlikely(IS_DEADDIR(dir_inode))) {
+			file->f_mode |= FMODE_CREATED;
+			audit_inode_child(dir_inode, dentry, AUDIT_TYPE_CHILD_CREATE);
+			if (!dir_inode->i_op->create) {
+				error = -EACCES;
+				goto out_dput;
+			}
 
-		error = dir_inode->i_op->create(idmap, dir_inode, dentry,
-						mode, open_flag & O_EXCL);
+			error = dir_inode->i_op->create(idmap, dir_inode, dentry,
+							mode, open_flag & O_EXCL);
+			if (!error)
+				fsnotify_create(dir_inode, dentry);
+		} else if (!dentry->d_inode)
+			error = -ENOENT;
+		inode_unlock(dir_inode);
 		if (error)
 			goto out_dput;
 	}
@@ -4522,7 +4603,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct delegated_inode delegated_inode = { };
-	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
 	struct dentry *dentry;
@@ -4562,22 +4642,11 @@ static const char *open_last_lookups(struct nameidata *nd,
 		 * dropping this one anyway.
 		 */
 	}
-	if (open_flag & O_CREAT)
-		inode_lock(dir->d_inode);
-	else
-		inode_lock_shared(dir->d_inode);
 	dentry = lookup_open(nd, file, op, got_write, &delegated_inode);
 	if (!IS_ERR(dentry)) {
-		if (file->f_mode & FMODE_CREATED)
-			fsnotify_create(dir->d_inode, dentry);
 		if (file->f_mode & FMODE_OPENED)
 			fsnotify_open(file);
 	}
-	if (open_flag & O_CREAT)
-		inode_unlock(dir->d_inode);
-	else
-		inode_unlock_shared(dir->d_inode);
-
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
 
-- 
2.50.0.107.gf914562f5916.dirty



* [PATCH 53/53] VFS: remove LOOKUP_SHARED
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (51 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock NeilBrown
@ 2026-03-12 21:12 ` NeilBrown
  2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
  2026-03-12 23:46 ` Linus Torvalds
  54 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

From: NeilBrown <neil@brown.name>

->lookup is now always called with a shared lock and LOOKUP_SHARED set,
so we can discard that flag and remove the code for when it wasn't set.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/afs/dir.c           | 10 ++--------
 fs/dcache.c            | 13 +++----------
 fs/namei.c             | 10 +++++-----
 fs/xfs/xfs_iops.c      |  3 +--
 include/linux/dcache.h |  3 +--
 include/linux/namei.h  |  3 +--
 6 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index f259ca2da383..29e39aeaf654 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -938,10 +938,7 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry,
 	/* Calling d_alloc_parallel() while holding parent locked is undesirable.
 	 * We don't really need the lock any more.
 	 */
-	if (flags & LOOKUP_SHARED)
-		inode_unlock_shared(dir);
-	else
-		inode_unlock(dir);
+	inode_unlock_shared(dir);
 	for (i = 0; i < subs->nr; i++) {
 		name = subs->subs[i];
 		len = dentry->d_name.len - 4 + strlen(name);
@@ -966,10 +963,7 @@ static struct dentry *afs_lookup_atsys(struct inode *dir, struct dentry *dentry,
 	 */
 	ret = NULL;
 out_s:
-	if (flags & LOOKUP_SHARED)
-		inode_lock_shared(dir);
-	else
-		inode_lock_nested(dir, I_MUTEX_PARENT);
+	inode_lock_shared(dir);
 	afs_put_sysnames(subs);
 	kfree(buf);
 out_p:
diff --git a/fs/dcache.c b/fs/dcache.c
index f573716d1a04..2d694e14bd22 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2224,7 +2224,6 @@ EXPORT_SYMBOL(d_obtain_root);
  * @dentry: the negative dentry that was passed to the parent's lookup func
  * @inode:  the inode case-insensitive lookup has found
  * @name:   the case-exact name to be associated with the returned dentry
- * @bool:   %true if lookup was performed with LOOKUP_SHARED
  *
  * This is to avoid filling the dcache with case-insensitive names to the
  * same inode, only the actual correct case is stored in the dcache for
@@ -2237,7 +2236,7 @@ EXPORT_SYMBOL(d_obtain_root);
  * the exact case, and return the spliced entry.
  */
 struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
-			struct qstr *name, bool shared)
+			struct qstr *name)
 {
 	struct dentry *found, *res;
 
@@ -2257,19 +2256,13 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 	 * d_in_lookup() (so ->d_parent is stable) and we are near the
 	 * end ->lookup() and will shortly drop the lock anyway.
 	 */
-	if (shared)
-		inode_unlock_shared(d_inode(dentry->d_parent));
-	else
-		inode_unlock(d_inode(dentry->d_parent));
+	inode_unlock_shared(d_inode(dentry->d_parent));
 	found = d_alloc_parallel(dentry->d_parent, name);
 	if (IS_ERR(found) || !d_in_lookup(found)) {
 		iput(inode);
 		return found;
 	}
-	if (shared)
-		inode_lock_shared(d_inode(dentry->d_parent));
-	else
-		inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
+	inode_lock_shared(d_inode(dentry->d_parent));
 	res = d_splice_alias(inode, found);
 	if (res) {
 		d_lookup_done(found);
diff --git a/fs/namei.c b/fs/namei.c
index 3d213070a515..9e2ac3077f72 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1826,7 +1826,7 @@ static struct dentry *lookup_one_qstr(const struct qstr *name,
 	if (unlikely(IS_DEADDIR(dir)))
 		old = ERR_PTR(-ENOENT);
 	else
-		old = dir->i_op->lookup(dir, dentry, flags | LOOKUP_SHARED);
+		old = dir->i_op->lookup(dir, dentry, flags);
 	inode_unlock_shared(dir);
 	if (unlikely(old)) {
 		d_lookup_done(dentry);
@@ -1951,7 +1951,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 			old = ERR_PTR(-ENOENT);
 		else
 			old = inode->i_op->lookup(inode, dentry,
-						  flags | LOOKUP_SHARED);
+						  flags);
 		inode_unlock_shared(inode);
 		d_lookup_done(dentry);
 		if (unlikely(old)) {
@@ -1966,14 +1966,14 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
 				  struct dentry *dir,
 				  unsigned int flags)
 {
-	return __lookup_slow(name, dir, flags | LOOKUP_SHARED, TASK_NORMAL);
+	return __lookup_slow(name, dir, flags, TASK_NORMAL);
 }
 
 static struct dentry *lookup_slow_killable(const struct qstr *name,
 					   struct dentry *dir,
 					   unsigned int flags)
 {
-	return __lookup_slow(name, dir, flags | LOOKUP_SHARED, TASK_KILLABLE);
+	return __lookup_slow(name, dir, flags, TASK_KILLABLE);
 }
 
 static inline int may_lookup(struct mnt_idmap *idmap,
@@ -4513,7 +4513,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 			res = ERR_PTR(-ENOENT);
 		else
 			res = dir_inode->i_op->lookup(dir_inode, dentry,
-						      nd->flags | LOOKUP_SHARED);
+						      nd->flags);
 		inode_unlock_shared(dir_inode);
 		d_lookup_done(dentry);
 		if (unlikely(res)) {
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 2641061ba1db..cfd1cb42a29f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -35,7 +35,6 @@
 #include <linux/security.h>
 #include <linux/iversion.h>
 #include <linux/fiemap.h>
-#include <linux/namei.h> // for LOOKUP_SHARED
 
 /*
  * Directories have different lock order w.r.t. mmap_lock compared to regular
@@ -370,7 +369,7 @@ xfs_vn_ci_lookup(
 	/* else case-insensitive match... */
 	dname.name = ci_name.name;
 	dname.len = ci_name.len;
-	dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));
+	dentry = d_add_ci(dentry, VFS_I(ip), &dname);
 	kfree(ci_name.name);
 	return dentry;
 }
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index eb1a59b6fca7..74607dbcb7f0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -250,8 +250,7 @@ struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
 					  const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
-				bool);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
 extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
 			const struct qstr *name);
 extern struct dentry *d_find_any_alias(struct inode *inode);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index cb79e84c718d..643d862a7fda 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -37,9 +37,8 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_CREATE		BIT(17)	/* ... in object creation */
 #define LOOKUP_EXCL		BIT(18)	/* ... in target must not exist */
 #define LOOKUP_RENAME_TARGET	BIT(19)	/* ... in destination of rename() */
-#define LOOKUP_SHARED		BIT(20) /* Parent lock is held shared */
 
-/* 3 spare bits for intent */
+/* 4 spare bits for intent */
 
 /* Scoping flags for lookup. */
 #define LOOKUP_NO_SYMLINKS	BIT(24) /* No symlink crossing. */
-- 
2.50.0.107.gf914562f5916.dirty



* Re: [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (52 preceding siblings ...)
  2026-03-12 21:12 ` [PATCH 53/53] VFS: remove LOOKUP_SHARED NeilBrown
@ 2026-03-12 23:38 ` Steven Rostedt
  2026-03-13  0:18   ` NeilBrown
  2026-03-12 23:46 ` Linus Torvalds
  54 siblings, 1 reply; 65+ messages in thread
From: Steven Rostedt @ 2026-03-12 23:38 UTC (permalink / raw)
  To: NeilBrown
  Cc: NeilBrown, Linus Torvalds, Alexander Viro, Christian Brauner,
	Jan Kara, Jeff Layton, Trond Myklebust, Anna Schumaker,
	Carlos Maiolino, Miklos Szeredi, Amir Goldstein, Jan Harkes,
	Hugh Dickins, Baolin Wang, David Howells, Marc Dionne,
	Steve French, Namjae Jeon, Sungjong Seo, Yuezhang Mo,
	Andreas Hindborg, Breno Leitao, Theodore Ts'o, Andreas Dilger,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Fri, 13 Mar 2026 08:11:47 +1100
NeilBrown <neilb@ownmail.net> wrote:

> *[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
> *[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
>  [PATCH 28/53] smb/client: Use d_alloc_noblock() in
> *[PATCH 29/53] exfat: simplify exfat_lookup()
> *[PATCH 30/53] configfs: remove d_add() calls before
>  [PATCH 31/53] configfs: stop using d_add().
> *[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
> *[PATCH 33/53] ext4: use on-stack dentries in

>  [PATCH 34/53] tracefs: stop using d_add().

Hmm, another reason I hate being Cc'd on every patch of a patch bomb where
I only need to look at one (and maybe the first) patch.

For some reason, I'm missing several patches, and this is one of them :-p

-- Steve


>  [PATCH 35/53] cephfs: stop using d_add().
> *[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
>  [PATCH 37/53] cephfs: Use d_alloc_noblock() in
>  [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
>  [PATCH 39/53] ecryptfs: stop using d_add().
>  [PATCH 40/53] gfs2: stop using d_add().


* Re: [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops
  2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
                   ` (53 preceding siblings ...)
  2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
@ 2026-03-12 23:46 ` Linus Torvalds
  2026-03-13  0:09   ` NeilBrown
  54 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2026-03-12 23:46 UTC (permalink / raw)
  To: NeilBrown
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Jeff Layton,
	Trond Myklebust, Anna Schumaker, Carlos Maiolino, Miklos Szeredi,
	Amir Goldstein, Jan Harkes, Hugh Dickins, Baolin Wang,
	David Howells, Marc Dionne, Steve French, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Andreas Hindborg, Breno Leitao,
	Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Thu, 12 Mar 2026 at 14:44, NeilBrown <neilb@ownmail.net> wrote:
>
> This patch set progresses my effort to improve concurrency of
> directory operations and specifically to allow concurrent updates
> in a given directory.

I only got about half the patches, but the ones I did get didn't raise
my hackles.

HOWEVER.

This is very much an "absolutely requires ACKs from Al" series. Al?

Also, because I only got about half the patches, and there's 53 of
them total, I'd really like to see a git branch for something like
this. It makes it easier to review for me, and I suspect it makes it
easier for some of the test robots too.

But again - this needs Al to look at it. Iirc he had some fundamental
concern with the last version - hopefully now fixed, but ...

                 Linus


* Re: [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops
  2026-03-12 23:46 ` Linus Torvalds
@ 2026-03-13  0:09   ` NeilBrown
  0 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-13  0:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Jeff Layton,
	Trond Myklebust, Anna Schumaker, Carlos Maiolino, Miklos Szeredi,
	Amir Goldstein, Jan Harkes, Hugh Dickins, Baolin Wang,
	David Howells, Marc Dionne, Steve French, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Andreas Hindborg, Breno Leitao,
	Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Fri, 13 Mar 2026, Linus Torvalds wrote:
> On Thu, 12 Mar 2026 at 14:44, NeilBrown <neilb@ownmail.net> wrote:
> >
> > This patch set progresses my effort to improve concurrency of
> > directory operations and specifically to allow concurrent updates
> > in a given directory.
> 
> I only got about half the patches, but the ones I did get didn't raise
> my hackles.
> 
> HOWEVER.
> 
> This is very much an "absolutely requires ACKs from Al" series. Al?

Yes, I'm looking forward to Al's thoughts

> 
> Also, because I only got about half the patches, and there's 53 of
> them total, I'd really like to see a git branch for something like
> this. It makes it easier to review for me, and I suspect it makes it
> easier for some of the test robots too.

github.com/neilbrown/linux.git branch pdirops

But if you have only time for one patch, 52/53 is the one to look at.

Thanks,
NeilBrown

> 
> But again - this needs Al to look at it. Iirc he had some fundamental
> concern with the last version - hopefully now fixed, but ...
> 
>                  Linus
> 
> 



* Re: [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops
  2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
@ 2026-03-13  0:18   ` NeilBrown
  0 siblings, 0 replies; 65+ messages in thread
From: NeilBrown @ 2026-03-13  0:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Masami Hiramatsu,
	Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, Tyler Hicks,
	Andreas Gruenbacher, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Jeremy Kerr, Ard Biesheuvel, linux-fsdevel,
	linux-nfs, linux-xfs, linux-unionfs, coda, linux-mm, linux-afs,
	linux-cifs, linux-ext4, linux-kernel, linux-trace-kernel,
	ceph-devel, ecryptfs, gfs2, linux-um, linux-efi

On Fri, 13 Mar 2026, Steven Rostedt wrote:
> On Fri, 13 Mar 2026 08:11:47 +1100
> NeilBrown <neilb@ownmail.net> wrote:
> 
> > *[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
> > *[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
> >  [PATCH 28/53] smb/client: Use d_alloc_noblock() in
> > *[PATCH 29/53] exfat: simplify exfat_lookup()
> > *[PATCH 30/53] configfs: remove d_add() calls before
> >  [PATCH 31/53] configfs: stop using d_add().
> > *[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
> > *[PATCH 33/53] ext4: use on-stack dentries in
> 
> >  [PATCH 34/53] tracefs: stop using d_add().
> 
> Hmm, another reason I hate being Cc'd on every patch of a patch bomb where
> I only need to look at one (and maybe the first) patch.

I could try to refine my tooling, but you can't please all the people
all the time...  I wonder how many people would be bothered if only the
cover-letter was sent to everyone, and the patches only went to lkml -
to be fetched from lore if not subscribed.

You would probably need to look at 02/53

https://github.com/neilbrown/linux/commit/aebdc6545eb18e5b6a7d41320f30d752996b3c6c

to have the context to understand 34/53

> 
> For some reason, I'm missing several patches, and this is one of them :-p

They don't seem to have made it to lore.kernel.org either.  Maybe I'm
being rate-limited somewhere.

https://github.com/neilbrown/linux/commit/77074c04a94176d6b2b2caf44dd84f0788a420c4

Thanks,
NeilBrown

> 
> -- Steve
> 
> 
> >  [PATCH 35/53] cephfs: stop using d_add().
> > *[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
> >  [PATCH 37/53] cephfs: Use d_alloc_noblock() in
> >  [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
> >  [PATCH 39/53] ecryptfs: stop using d_add().
> >  [PATCH 40/53] gfs2: stop using d_add().
> 



* Re: [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
  2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
@ 2026-03-15 13:51   ` Amir Goldstein
  2026-03-18 21:10     ` NeilBrown
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2026-03-15 13:51 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Jan Harkes, Hugh Dickins, Baolin Wang,
	David Howells, Marc Dionne, Steve French, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Andreas Hindborg, Breno Leitao,
	Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Thu, Mar 12, 2026 at 10:49 PM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> When performing an "impure" readdir, ovl needs to perform a lookup on some
> of the names that it found.
> With proposed locking changes it will not be possible to perform this
> lookup (in particular, not safe to wait for d_alloc_parallel()) while
> holding a lock on the directory.
>
> ovl doesn't really need the lock at this point.

Not exactly. See below.

> It has already iterated
> the directory and has cached a list of the contents.  It now needs to
> gather extra information about some contents.  It can do this without
> the lock.
>
> After gathering that info it needs to retake the lock for API
> correctness.  After doing this it must check IS_DEADDIR() again to
> ensure readdir always returns -ENOENT on a removed directory.
>
> Note that while ->iterate_shared is called with a shared lock, ovl uses
> WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
> that exclusive lock.
>
> As the directory is no longer locked in ovl_cache_update() we need
> dget_parent() to get a reference to the parent.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/overlayfs/readdir.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 1dcc75b3a90f..d5123b37921c 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>                         goto get;
>                 }
>                 if (p->len == 2) {
> -                       /* we shall not be moved */
> -                       this = dget(dir->d_parent);
> +                       this = dget_parent(dir);
>                         goto get;
>                 }
>         }
>         /* This checks also for xwhiteouts */
> -       this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> +       this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);

ovl_cache_update() is also called from ovl_iterate_merged() where inode
is locked.

>         if (IS_ERR_OR_NULL(this) || !this->d_inode) {
>                 /* Mark a stale entry */
>                 p->is_whiteout = true;
> @@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
>         if (err)
>                 return err;
>
> +       inode_unlock(path->dentry->d_inode);
>         list_for_each_entry_safe(p, n, list, l_node) {
>                 if (!name_is_dot_dotdot(p->name, p->len)) {
>                         err = ovl_cache_update(path, p, true);
>                         if (err)
> -                               return err;
> +                               break;
>                 }
>                 if (p->ino == p->real_ino) {
>                         list_del(&p->l_node);
> @@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
>                         struct rb_node *parent = NULL;
>
>                         if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
> -                                                             &newp, &parent)))
> -                               return -EIO;
> +                                                             &newp, &parent))) {
> +                               err = -EIO;
> +                               break;
> +                       }
>
>                         rb_link_node(&p->node, parent, newp);
>                         rb_insert_color(&p->node, root);
>                 }
>         }
> -       return 0;
> +       inode_lock(path->dentry->d_inode);
> +       if (IS_DEADDIR(path->dentry->d_inode))
> +               err = -ENOENT;
> +       return err;
>  }
>
>  static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
> --

You missed the fact that overlayfs uses the dir inode lock
to protect the readdir inode cache, so your patch introduces
a risk of storing a stale readdir cache when dir modify operations
invalidate the readdir cache version while the lock is dropped.
It also introduces a memory leak when the cache is stomped
without freeing a cache created by a competing thread.
I think something like the untested patch below should fix this.

I did not look into ovl_iterate_merged() to see if it has a simple
fix and I am not 100% sure that this fix for impure dir is enough.
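
The snapshot idea behind the untested patch can be illustrated in isolation
with a small userspace sketch. Everything in it is an assumption made for
illustration: struct dir_state, struct dir_cache, cache_get() and the 'race'
parameter are hypothetical stand-ins, not overlayfs code, and the lock is only
marked by comments. It shows why tagging the new cache with the version read
before the lock is dropped makes a raced cache look stale on the next check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are overlayfs structures. */
struct dir_cache {
	uint64_t version;	/* directory version the cache was built for */
};

struct dir_state {
	uint64_t version;	/* bumped by every directory modification */
	struct dir_cache *cache; /* protected by the (imaginary) dir lock */
};

/* Models a modification that slips in while the lock is dropped. */
static void dir_modify(struct dir_state *dir)
{
	dir->version++;
}

/*
 * Return a cache for the directory, building one if needed.  'race' models
 * a concurrent modification during the window where the lock is dropped.
 */
static struct dir_cache *cache_get(struct dir_state *dir, int race)
{
	uint64_t version;
	struct dir_cache *cache;

	assert(dir != NULL);

	/* Snapshot the version before the lock would be dropped. */
	version = dir->version;

	if (dir->cache && dir->cache->version == version)
		return dir->cache;

	cache = calloc(1, sizeof(*cache));
	if (!cache)
		return NULL;

	/* --- lock dropped here: slow lookups run, others may modify --- */
	if (race)
		dir_modify(dir);
	/* --- lock retaken here --- */

	/* Free whatever is stored now, including a competitor's cache. */
	free(dir->cache);

	/* Tag with the snapshot so a raced cache is detected as stale. */
	cache->version = version;
	dir->cache = cache;
	return cache;
}

/* True when the stored cache still matches the directory version. */
static int cache_is_fresh(const struct dir_state *dir)
{
	return dir->cache && dir->cache->version == dir->version;
}
```

If the version were instead read after the lock is retaken, the raced cache in
this model would be tagged with the newer version and be treated as valid even
though it was built from older directory contents.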

Thanks,
Amir.

diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index d5123b37921c8..9e90064b252ce 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -702,15 +702,13 @@ static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
        struct inode *inode = d_inode(dentry);
        struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
        struct ovl_dir_cache *cache;
+       /* Snapshot version before ovl_dir_read_impure() drops i_rwsem */
+       u64 version = ovl_inode_version_get(inode);

        cache = ovl_dir_cache(inode);
-       if (cache && ovl_inode_version_get(inode) == cache->version)
+       if (cache && version == cache->version)
                return cache;

-       /* Impure cache is not refcounted, free it here */
-       ovl_dir_cache_free(inode);
-       ovl_set_dir_cache(inode, NULL);
-
        cache = kzalloc_obj(struct ovl_dir_cache);
        if (!cache)
                return ERR_PTR(-ENOMEM);
@@ -721,6 +719,14 @@ static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
                kfree(cache);
                return ERR_PTR(res);
        }
+
+       /*
+        * Impure cache is not refcounted, free it here.
+        * Also frees cache stored by concurrent readdir during i_rwsem drop.
+        */
+       ovl_dir_cache_free(inode);
+       ovl_set_dir_cache(inode, NULL);
+
        if (list_empty(&cache->entries)) {
                /*
                 * A good opportunity to get rid of an unneeded "impure" flag.
@@ -736,7 +742,7 @@ static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
                return NULL;
        }

-       cache->version = ovl_inode_version_get(inode);
+       cache->version = version;
        ovl_set_dir_cache(inode, cache);

        return cache;


* Re: [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal()
  2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
@ 2026-03-17  9:37   ` Jan Kara
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Kara @ 2026-03-17  9:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Fri 13-03-26 08:12:20, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
> 
> ext4_fc_replay_link_internal() uses two dentries to simplify code reuse
> when replaying a "link" operation.  It does not need to interact with
> the dcache and removes the dentries shortly after adding them.
> 
> They are passed to __ext4_link() which only performs read accesses on
> these dentries and only uses the name and parent of dentry_inode (plus
> checking a flag is unset) and only uses the inode of the parent.
> 
> So instead of allocating dentries and adding them to the dcache, allocate
> two dentries on the stack, set up the required fields, and pass these to
> __ext4_link().
> 
> This substantially simplifies the code and removes one of the few uses of
> d_alloc() - preparing for its removal.
> 
> Signed-off-by: NeilBrown <neil@brown.name>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fast_commit.c | 40 ++++++++--------------------------------
>  1 file changed, 8 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 2a5daf1d9667..e3593bb90a62 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1446,8 +1446,6 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
>  				struct inode *inode)
>  {
>  	struct inode *dir = NULL;
> -	struct dentry *dentry_dir = NULL, *dentry_inode = NULL;
> -	struct qstr qstr_dname = QSTR_INIT(darg->dname, darg->dname_len);
>  	int ret = 0;
>  
>  	dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL);
> @@ -1457,28 +1455,14 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
>  		goto out;
>  	}
>  
> -	dentry_dir = d_obtain_alias(dir);
> -	if (IS_ERR(dentry_dir)) {
> -		ext4_debug("Failed to obtain dentry");
> -		dentry_dir = NULL;
> -		goto out;
> -	}
> +	{
> +		struct dentry dentry_dir = { .d_inode = dir };
> +		const struct dentry dentry_inode = {
> +			.d_parent = &dentry_dir,
> +			.d_name = QSTR_LEN(darg->dname, darg->dname_len),
> +		};
>  
> -	dentry_inode = d_alloc(dentry_dir, &qstr_dname);
> -	if (!dentry_inode) {
> -		ext4_debug("Inode dentry not created.");
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> -
> -	ihold(inode);
> -	inc_nlink(inode);
> -	ret = __ext4_link(dir, inode, dentry_inode);
> -	if (ret) {
> -		drop_nlink(inode);
> -		iput(inode);
> -	} else {
> -		d_instantiate(dentry_inode, inode);
> +		ret = __ext4_link(dir, inode, &dentry_inode);
>  	}
>  	/*
>  	 * It's possible that link already existed since data blocks
> @@ -1493,16 +1477,8 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
>  
>  	ret = 0;
>  out:
> -	if (dentry_dir) {
> -		d_drop(dentry_dir);
> -		dput(dentry_dir);
> -	} else if (dir) {
> +	if (dir)
>  		iput(dir);
> -	}
> -	if (dentry_inode) {
> -		d_drop(dentry_inode);
> -		dput(dentry_inode);
> -	}
>  
>  	return ret;
>  }
> -- 
> 2.50.0.107.gf914562f5916.dirty
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
  2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
@ 2026-03-17 10:00   ` Jan Kara
  2026-03-17 20:27     ` [PATCH 32/53f] " NeilBrown
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2026-03-17 10:00 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
	Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
	Baolin Wang, David Howells, Marc Dionne, Steve French,
	Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
	Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
	Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel,
	linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
	linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
	linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
	linux-efi

On Fri 13-03-26 08:12:19, NeilBrown wrote:
...
> diff --git a/fs/dcache.c b/fs/dcache.c
> index a1219b446b74..c48337d95f9a 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -358,7 +358,7 @@ static inline int dname_external(const struct dentry *dentry)
>  	return dentry->d_name.name != dentry->d_shortname.string;
>  }
>  
> -void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
> +void take_dentry_name_snapshot(struct name_snapshot *name, const struct dentry *dentry)
>  {
>  	unsigned seq;
>  	const unsigned char *s;

The constification of take_dentry_name_snapshot() should probably be a
separate patch? Also I'd note that this constification (and the
constification of __ext4_fc_track_link()) isn't really needed here because
ext4_fc_track_link() will immediately bail through ext4_fc_disabled() when
fast commit replay is happening so __ext4_fc_track_link() never gets called
in that case - more about that below.

> @@ -1471,7 +1471,15 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
>  		goto out;
>  	}
>  
> +	ihold(inode);
> +	inc_nlink(inode);
>  	ret = __ext4_link(dir, inode, dentry_inode);
> +	if (ret) {
> +		drop_nlink(inode);
> +		iput(inode);
> +	} else {
> +		d_instantiate(dentry_inode, inode);
> +	}
>  	/*
>  	 * It's possible that link already existed since data blocks
>  	 * for the dir in question got persisted before we crashed OR
...
> @@ -3460,8 +3460,6 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
>  		ext4_handle_sync(handle);
>  
>  	inode_set_ctime_current(inode);
> -	ext4_inc_count(inode);
> -	ihold(inode);
>  
>  	err = ext4_add_entry(handle, dentry, inode);
>  	if (!err) {
> @@ -3471,11 +3469,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
>  		 */
>  		if (inode->i_nlink == 1)
>  			ext4_orphan_del(handle, inode);
> -		d_instantiate(dentry, inode);
> -		ext4_fc_track_link(handle, dentry);
> -	} else {
> -		drop_nlink(inode);
> -		iput(inode);
> +		__ext4_fc_track_link(handle, inode, dentry);

This looks wrong. If fastcommit replay is running, we must skip calling
__ext4_fc_track_link(). Similarly if the filesystem is currently
ineligible for fastcommit (due to some complex unsupported operations
running in parallel). Why did you change ext4_fc_track_link() to
__ext4_fc_track_link()?

> @@ -3504,7 +3498,16 @@ static int ext4_link(struct dentry *old_dentry,
>  	err = dquot_initialize(dir);
>  	if (err)
>  		return err;
> -	return __ext4_link(dir, inode, dentry);
> +	ihold(inode);
> +	ext4_inc_count(inode);

I'd put inc_nlink() here instead. We are guaranteed to have a regular file
anyway and it matches what we do in ext4_fc_replay_link_internal().
Alternatively we could consistently use ext4_inc_count() &
ext4_dec_count() in these functions.

> +	err = __ext4_link(dir, inode, dentry);
> +	if (err) {
> +		drop_nlink(inode);
> +		iput(inode);
> +	} else {
> +		d_instantiate(dentry, inode);
> +	}
> +	return err;
>  }

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 32/53f] ext4: move dcache modifying code out of __ext4_link()
  2026-03-17 10:00   ` Jan Kara
@ 2026-03-17 20:27     ` NeilBrown
  2026-03-18 17:47       ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: NeilBrown @ 2026-03-17 20:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Jeff Layton,
	Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
	linux-kernel


(cc trimmed...)

On Tue, 17 Mar 2026, Jan Kara wrote:
> On Fri 13-03-26 08:12:19, NeilBrown wrote:
> ...
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index a1219b446b74..c48337d95f9a 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -358,7 +358,7 @@ static inline int dname_external(const struct dentry *dentry)
> >  	return dentry->d_name.name != dentry->d_shortname.string;
> >  }
> >  
> > -void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
> > +void take_dentry_name_snapshot(struct name_snapshot *name, const struct dentry *dentry)
> >  {
> >  	unsigned seq;
> >  	const unsigned char *s;
> 
> The constification of take_dentry_name_snapshot() should probably be a
> separate patch? Also I'd note that this constification (and the
> constification of __ext4_fc_track_link()) isn't really needed here because
> ext4_fc_track_link() will immediately bail through ext4_fc_disabled() when
> fast commit replay is happening so __ext4_fc_track_link() never gets called
> in that case - more about that below.

I thought I might have overdone the constification, but I didn't want
to under-do it, at least for my compile tests.


> 
> > @@ -1471,7 +1471,15 @@ static int ext4_fc_replay_link_internal(struct super_block *sb,
> >  		goto out;
> >  	}
> >  
> > +	ihold(inode);
> > +	inc_nlink(inode);
> >  	ret = __ext4_link(dir, inode, dentry_inode);
> > +	if (ret) {
> > +		drop_nlink(inode);
> > +		iput(inode);
> > +	} else {
> > +		d_instantiate(dentry_inode, inode);
> > +	}
> >  	/*
> >  	 * It's possible that link already existed since data blocks
> >  	 * for the dir in question got persisted before we crashed OR
> ...
> > @@ -3460,8 +3460,6 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
> >  		ext4_handle_sync(handle);
> >  
> >  	inode_set_ctime_current(inode);
> > -	ext4_inc_count(inode);
> > -	ihold(inode);
> >  
> >  	err = ext4_add_entry(handle, dentry, inode);
> >  	if (!err) {
> > @@ -3471,11 +3469,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
> >  		 */
> >  		if (inode->i_nlink == 1)
> >  			ext4_orphan_del(handle, inode);
> > -		d_instantiate(dentry, inode);
> > -		ext4_fc_track_link(handle, dentry);
> > -	} else {
> > -		drop_nlink(inode);
> > -		iput(inode);
> > +		__ext4_fc_track_link(handle, inode, dentry);
> 
> This looks wrong. If fastcommit replay is running, we must skip calling
> __ext4_fc_track_link(). Similarly if the filesystem is currently
> ineligible for fastcommit (due to some complex unsupported operations
> running in parallel). Why did you change ext4_fc_track_link() to
> __ext4_fc_track_link()?

I changed to __ext4_fc_track_link() because I needed something that
accepted the inode separately from the dentry.  As you point out, that
means we lose some important code which makes the decision misguided.

I'm wondering about taking a different approach - not using a dentry at
all and not constifying anything.
I could split __ext4_add_entry() out of ext4_add_entry() and, instead of
passing the dentry-to-add, I could pass the dir inode, the qstr name, and
d_flags.

Then ext4_link could be passed the same plus a "do fast commit" flag.

The result would be more verbose, but also hopefully more clear.

> 
> > @@ -3504,7 +3498,16 @@ static int ext4_link(struct dentry *old_dentry,
> >  	err = dquot_initialize(dir);
> >  	if (err)
> >  		return err;
> > -	return __ext4_link(dir, inode, dentry);
> > +	ihold(inode);
> > +	ext4_inc_count(inode);
> 
> I'd put inc_nlink() here instead. We are guaranteed to have a regular file
> anyway and it matches what we do in ext4_fc_replay_link_internal().
> Alternatively we could consistently use ext4_inc_count() &
> ext4_dec_count() in these functions.

Current __ext4_link() has ext4_inc_count().
But it is only usable in namei.c.  So I didn't use it in
ext4_fc_replay_link_internal() as that is in a different file.

I'll revise and resend these patches just to the ext4 team.

Thanks,
NeilBrown


> 
> > +	err = __ext4_link(dir, inode, dentry);
> > +	if (err) {
> > +		drop_nlink(inode);
> > +		iput(inode);
> > +	} else {
> > +		d_instantiate(dentry, inode);
> > +	}
> > +	return err;
> >  }
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 



* Re: [PATCH 32/53f] ext4: move dcache modifying code out of __ext4_link()
  2026-03-17 20:27     ` [PATCH 32/53f] " NeilBrown
@ 2026-03-18 17:47       ` Jan Kara
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Kara @ 2026-03-18 17:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jan Kara, Alexander Viro, Christian Brauner, Jeff Layton,
	Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
	linux-kernel

On Wed 18-03-26 07:27:37, NeilBrown wrote:
> > > @@ -3460,8 +3460,6 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
> > >  		ext4_handle_sync(handle);
> > >  
> > >  	inode_set_ctime_current(inode);
> > > -	ext4_inc_count(inode);
> > > -	ihold(inode);
> > >  
> > >  	err = ext4_add_entry(handle, dentry, inode);
> > >  	if (!err) {
> > > @@ -3471,11 +3469,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
> > >  		 */
> > >  		if (inode->i_nlink == 1)
> > >  			ext4_orphan_del(handle, inode);
> > > -		d_instantiate(dentry, inode);
> > > -		ext4_fc_track_link(handle, dentry);
> > > -	} else {
> > > -		drop_nlink(inode);
> > > -		iput(inode);
> > > +		__ext4_fc_track_link(handle, inode, dentry);
> > 
> > This looks wrong. If fastcommit replay is running, we must skip calling
> > __ext4_fc_track_link(). Similarly if the filesystem is currently
> > ineligible for fastcommit (due to some complex unsupported operations
> > running in parallel). Why did you change ext4_fc_track_link() to
> > __ext4_fc_track_link()?
> 
> I changed to __ext4_fc_track_link() because I needed something that
> accepted the inode separately from the dentry.  As you point out, that
> means we lose some important code which makes the decision misguided.
> 
> I'm wondering about taking a different approach - not using a dentry at
> all and not constifying anything.
> I could split __ext4_add_entry() out of ext4_add_entry() and instead of
> passing the dentry-to-add I could pass the dir inode, the qstr name, and
> d_flags.
> 
> Then ext4_link could be passed the same plus a "do fast commit" flag.
> 
> The result would be more verbose, but also hopefully more clear.

Yeah, I was considering that option as well when looking at this. Let's see
what the code will look like.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
  2026-03-15 13:51   ` Amir Goldstein
@ 2026-03-18 21:10     ` NeilBrown
  2026-03-20 14:47       ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: NeilBrown @ 2026-03-18 21:10 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Miklos Szeredi,
	linux-fsdevel, linux-unionfs, linux-kernel

[[ CC list trimmed ]]

On Mon, 16 Mar 2026, Amir Goldstein wrote:
> On Thu, Mar 12, 2026 at 10:49 PM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > When performing an "impure" readdir, ovl needs to perform a lookup on some
> > of the names that it found.
> > With proposed locking changes it will not be possible to perform this
> > lookup (in particular, not safe to wait for d_alloc_parallel()) while
> > holding a lock on the directory.
> >
> > ovl doesn't really need the lock at this point.
> 
> Not exactly. see below.
> 
> > It has already iterated
> > the directory and has cached a list of the contents.  It now needs to
> > gather extra information about some contents.  It can do this without
> > the lock.
> >
> > After gathering that info it needs to retake the lock for API
> > correctness.  After doing this it must check IS_DEADDIR() again to
> > ensure readdir always returns -ENOENT on a removed directory.
> >
> > Note that while ->iterate_shared is called with a shared lock, ovl uses
> > WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
> > that exclusive lock.
> >
> > As the directory is no longer locked in ovl_cache_update() we need
> > dget_parent() to get a reference to the parent.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  fs/overlayfs/readdir.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 1dcc75b3a90f..d5123b37921c 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
> >                         goto get;
> >                 }
> >                 if (p->len == 2) {
> > -                       /* we shall not be moved */
> > -                       this = dget(dir->d_parent);
> > +                       this = dget_parent(dir);
> >                         goto get;
> >                 }
> >         }
> >         /* This checks also for xwhiteouts */
> > -       this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> > +       this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> 
> ovl_cache_update() is also called from ovl_iterate_merged() where inode
> is locked.
> 
> >         if (IS_ERR_OR_NULL(this) || !this->d_inode) {
> >                 /* Mark a stale entry */
> >                 p->is_whiteout = true;
> > @@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
> >         if (err)
> >                 return err;
> >
> > +       inode_unlock(path->dentry->d_inode);
> >         list_for_each_entry_safe(p, n, list, l_node) {
> >                 if (!name_is_dot_dotdot(p->name, p->len)) {
> >                         err = ovl_cache_update(path, p, true);
> >                         if (err)
> > -                               return err;
> > +                               break;
> >                 }
> >                 if (p->ino == p->real_ino) {
> >                         list_del(&p->l_node);
> > @@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
> >                         struct rb_node *parent = NULL;
> >
> >                         if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
> > -                                                             &newp, &parent)))
> > -                               return -EIO;
> > +                                                             &newp, &parent))) {
> > +                               err = -EIO;
> > +                               break;
> > +                       }
> >
> >                         rb_link_node(&p->node, parent, newp);
> >                         rb_insert_color(&p->node, root);
> >                 }
> >         }
> > -       return 0;
> > +       inode_lock(path->dentry->d_inode);
> > +       if (IS_DEADDIR(path->dentry->d_inode))
> > +               err = -ENOENT;
> > +       return err;
> >  }
> >
> >  static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
> > --
> 
> You missed the fact that overlayfs uses the dir inode lock
> to protect the readdir inode cache, so your patch introduces
> a risk for storing a stale readdir cache when dir modify operations
> invalidate the readdir cache version while lock is dropped
> and also introduces memory leak when cache is stomped
> without freeing cache created by a competing thread.
> I think something like the untested patch below should fix this.

Yes, I did miss that - thanks. I think I missed a few other details too.
I no longer think it can be safe to drop the lock without substantial
rewrites - and even then maybe not.

So I'm considering a different approach.
This patch demonstrates what I'm thinking, though it still needs work I
think.

Thanks,
NeilBrown

From: NeilBrown <neil@brown.name>
Subject: [PATCH] ovl: stop using lookup_one() in iterate_shared() handling.

lookup_one() is expected to be removed as it does not fit well with
proposed changes to directory locking.
Specifically d_alloc_parallel() will be ordered outside of i_rwsem
and as iterate_shared() is called with i_rwsem held it is not safe
to call d_alloc_parallel().

We can instead call d_alloc_noblock() and then call the ->lookup, but
that can fail if there is a lookup attempt concurrent with the
readdir().

ovl cannot afford for the lookup to fail as that could produce incorrect
results, and it cannot safely drop i_rwsem temporarily as that could
introduce races with handling of the directory cache.

Instead we rely on the fact that ovl_iterate() has an exclusive lock on
the directory, so any concurrent lookup will wait for the ovl_iterate()
call to complete.  We allocate a separate dentry and if the lookup is
successful, it is hashed with the result.

When the concurrent lookup gets i_rwsem it mustn't do its own lookup -
it must use the existing dentry.  This is done using
try_lookup_noperm().  To manage overheads we keep a counter of the
number of "stray dentries" there might be on each directory and only
check for one when this count is non-zero.

If a 'stray dentry' were discarded for any reason before the concurrent
lookup completed, the count would never reach zero.  That might be a problem.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/overlayfs/namei.c     | 12 ++++++++++++
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/readdir.c   | 26 ++++++++++++++++++++++++--
 fs/overlayfs/super.c     |  1 +
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index d8dd4b052984..c3ff57047712 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1399,6 +1399,18 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > ofs->namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	if (atomic_read(&OVL_I(dir)->stray_dentries) && d_in_lookup(dentry)) {
+		/* This dentry might have forced readdir to do the lookup */
+		struct dentry *alias =
+			try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
+						    dentry->d_name.len),
+					  dentry->d_parent);
+		if (alias && !IS_ERR(alias)) {
+			atomic_dec(&OVL_I(dir)->stray_dentries);
+			return alias;
+		}
+	}
+
 	with_ovl_creds(dentry->d_sb)
 		err = ovl_lookup_layers(&ctx, &d);
 
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1d4828dbcf7a..0e7751d5dfca 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -172,6 +172,7 @@ struct ovl_inode {
 	struct inode vfs_inode;
 	struct dentry *__upperdentry;
 	struct ovl_entry *oe;
+	atomic_t stray_dentries; /* directory */
 
 	/* synchronize copy up and more */
 	struct mutex lock;
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1dcc75b3a90f..add556a0a2b6 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -557,6 +557,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 	enum ovl_path_type type;
 	u64 ino = p->real_ino;
 	int xinobits = ovl_xino_bits(ofs);
+	bool did_alloc = false;
 	int err = 0;
 
 	if (!ovl_same_dev(ofs) && !p->check_xwhiteout)
@@ -574,8 +575,29 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 		}
 	}
 	/* This checks also for xwhiteouts */
-	this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
-	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
+	this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
+	if (this == ERR_PTR(-EWOULDBLOCK)) {
+		/*
+		 * Some other thread is looking up this name and will block
+		 * on i_rwsem before they can complete the lookup.
+		 * We will do the lookup and when that lookup gets a turn it
+		 * will return this dentry.
+		 */
+		this = d_alloc_name(dir, p->name);
+		did_alloc = true;
+	}
+	if (!IS_ERR(this) && !d_unhashed(this)) {
+		/* Either we got in-lookup or we made our own unhashed */
+		struct dentry *alias = ovl_lookup(dir->d_inode, this, 0);
+		if (alias) {
+			d_lookup_done(this);
+			dput(this);
+			this = alias;
+		} else if (did_alloc) {
+			atomic_inc(&OVL_I(dir->d_inode)->stray_dentries);
+		}
+	}
+	if (IS_ERR(this) || !this->d_inode) {
 		/* Mark a stale entry */
 		p->is_whiteout = true;
 		if (IS_ERR(this)) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d4c12feec039..172d3ac7d3e2 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -195,6 +195,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
 	oi->__upperdentry = NULL;
 	oi->lowerdata_redirect = NULL;
 	oi->oe = NULL;
+	atomic_set(&oi->stray_dentries, 0);
 	mutex_init(&oi->lock);
 
 	return &oi->vfs_inode;
-- 
2.50.0.107.gf914562f5916.dirty



* Re: [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
  2026-03-18 21:10     ` NeilBrown
@ 2026-03-20 14:47       ` Amir Goldstein
  0 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2026-03-20 14:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Miklos Szeredi,
	linux-fsdevel, linux-unionfs, linux-kernel

On Wed, Mar 18, 2026 at 10:10 PM NeilBrown <neilb@ownmail.net> wrote:
>
> [[ CC list trimmed ]]
>
> On Mon, 16 Mar 2026, Amir Goldstein wrote:
> > On Thu, Mar 12, 2026 at 10:49 PM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > When performing an "impure" readdir, ovl needs to perform a lookup on some
> > > of the names that it found.
> > > With proposed locking changes it will not be possible to perform this
> > > lookup (in particular, not safe to wait for d_alloc_parallel()) while
> > > holding a lock on the directory.
> > >
> > > ovl doesn't really need the lock at this point.
> >
> > Not exactly. see below.
> >
> > > It has already iterated
> > > the directory and has cached a list of the contents.  It now needs to
> > > gather extra information about some contents.  It can do this without
> > > the lock.
> > >
> > > After gathering that info it needs to retake the lock for API
> > > correctness.  After doing this it must check IS_DEADDIR() again to
> > > ensure readdir always returns -ENOENT on a removed directory.
> > >
> > > Note that while ->iterate_shared is called with a shared lock, ovl uses
> > > WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
> > > that exclusive lock.
> > >
> > > As the directory is no longer locked in ovl_cache_update() we need
> > > dget_parent() to get a reference to the parent.
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> > >  fs/overlayfs/readdir.c | 19 ++++++++++++-------
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > > index 1dcc75b3a90f..d5123b37921c 100644
> > > --- a/fs/overlayfs/readdir.c
> > > +++ b/fs/overlayfs/readdir.c
> > > @@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
> > >                         goto get;
> > >                 }
> > >                 if (p->len == 2) {
> > > -                       /* we shall not be moved */
> > > -                       this = dget(dir->d_parent);
> > > +                       this = dget_parent(dir);
> > >                         goto get;
> > >                 }
> > >         }
> > >         /* This checks also for xwhiteouts */
> > > -       this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> > > +       this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> >
> > ovl_cache_update() is also called from ovl_iterate_merged() where inode
> > is locked.
> >
> > >         if (IS_ERR_OR_NULL(this) || !this->d_inode) {
> > >                 /* Mark a stale entry */
> > >                 p->is_whiteout = true;
> > > @@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
> > >         if (err)
> > >                 return err;
> > >
> > > +       inode_unlock(path->dentry->d_inode);
> > >         list_for_each_entry_safe(p, n, list, l_node) {
> > >                 if (!name_is_dot_dotdot(p->name, p->len)) {
> > >                         err = ovl_cache_update(path, p, true);
> > >                         if (err)
> > > -                               return err;
> > > +                               break;
> > >                 }
> > >                 if (p->ino == p->real_ino) {
> > >                         list_del(&p->l_node);
> > > @@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path,  struct list_head *list,
> > >                         struct rb_node *parent = NULL;
> > >
> > >                         if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
> > > -                                                             &newp, &parent)))
> > > -                               return -EIO;
> > > +                                                             &newp, &parent))) {
> > > +                               err = -EIO;
> > > +                               break;
> > > +                       }
> > >
> > >                         rb_link_node(&p->node, parent, newp);
> > >                         rb_insert_color(&p->node, root);
> > >                 }
> > >         }
> > > -       return 0;
> > > +       inode_lock(path->dentry->d_inode);
> > > +       if (IS_DEADDIR(path->dentry->d_inode))
> > > +               err = -ENOENT;
> > > +       return err;
> > >  }
> > >
> > >  static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
> > > --
> >
> > You missed the fact that overlayfs uses the dir inode lock
> > to protect the readdir inode cache, so your patch introduces
> > a risk for storing a stale readdir cache when dir modify operations
> > invalidate the readdir cache version while lock is dropped
> > and also introduces memory leak when cache is stomped
> > without freeing cache created by a competing thread.
> > I think something like the untested patch below should fix this.
>
> Yes, I did miss that - thanks. I think I missed a few other details too.
> I no longer think it can be safe to drop the lock without substantial
> rewrites - and even then maybe not.
>
> So I'm considering a different approach.
> This patch demonstrates what I'm thinking, though it still needs work I
> think.

I like this direction.

I always thought that we need to get rid of this vfs lookup
inside readdir but I thought it would be a lot of work.

Your suggestion works around this in an elegant way.

>
> Thanks,
> NeilBrown
>
> From: NeilBrown <neil@brown.name>
> Subject: [PATCH] ovl: stop using lookup_one() in iterate_shared() handling.
>
> lookup_one() is expected to be removed as it does not fit well with
> proposed changes to directory locking.
> Specifically d_alloc_parallel() will be ordered outside of i_rwsem
> and as iterate_shared() is called with i_rwsem held it is not safe
> to call d_alloc_parallel().
>
> We can instead call d_alloc_noblock() and then call the ->lookup, but
> that can fail if there is a lookup attempt concurrent with the
> readdir().
>
> ovl cannot afford for the lookup to fail as that could produce incorrect
> results, and it cannot safely drop i_rwsem temporarily as that could
> introduce races with handling of the directory cache.
>
> Instead we rely on the fact that ovl_iterate() has an exclusive lock on
> the directory, so any concurrent lookup will wait for the ovl_iterate()
> call to complete.  We allocate a separate dentry and if the lookup is
> successful, it is hashed with the result.
>
> When the concurrent lookup gets i_rwsem it mustn't do its own lookup -
> it must use the existing dentry.  This is done using
> try_lookup_noperm().  To manage overheads we keep a counter of the
> number of "stray dentries" there might be on each directory and only
> check for one when this count is non-zero.
>
> If a 'stray dentry' were discarded for any reason before the concurrent
> lookup completed, the count would never reach zero.  That might be a problem.

Can we deal with the discarded dentries by using OVL_E_FLAGS() to mark
a stray ovl dentry and implementing the relevant ovl_dentry_operations
to decrement the stray counter?
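
A minimal userspace sketch of the accounting asked about above, under
stated assumptions: all model_* names are hypothetical, the "stray" flag
stands in for a per-dentry flag bit, and model_release() stands in for
whichever dentry operation would run when the dentry is discarded.
None of this is overlayfs code.  The point is only that the counter is
dropped exactly once, whether the stray dentry is adopted by the
waiting lookup or discarded first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-directory state holding the counter. */
struct model_dir {
	atomic_int stray_dentries;
};

/* Hypothetical stand-in for a dentry carrying a "stray" flag. */
struct model_dentry {
	struct model_dir *parent;
	bool stray;	/* a real implementation would need an atomic test-and-clear */
};

/* readdir side: mark the dentry it allocated itself and count it. */
static void model_mark_stray(struct model_dentry *d)
{
	d->stray = true;
	atomic_fetch_add(&d->parent->stray_dentries, 1);
}

/* Clear the flag and drop the count, at most once per dentry. */
static bool model_clear_stray(struct model_dentry *d)
{
	if (!d->stray)
		return false;
	d->stray = false;
	atomic_fetch_sub(&d->parent->stray_dentries, 1);
	return true;
}

/* lookup side: adopting the stray dentry consumes its count. */
static bool model_adopt(struct model_dentry *d)
{
	return model_clear_stray(d);
}

/* discard side: drops the count only if nobody adopted the dentry. */
static void model_release(struct model_dentry *d)
{
	(void)model_clear_stray(d);
}
```

With both paths funnelled through one clear-once helper, the count in
this model returns to zero regardless of which path runs first.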

Thanks,
Amir.

>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/overlayfs/namei.c     | 12 ++++++++++++
>  fs/overlayfs/ovl_entry.h |  1 +
>  fs/overlayfs/readdir.c   | 26 ++++++++++++++++++++++++--
>  fs/overlayfs/super.c     |  1 +
>  4 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index d8dd4b052984..c3ff57047712 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -1399,6 +1399,18 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         if (dentry->d_name.len > ofs->namelen)
>                 return ERR_PTR(-ENAMETOOLONG);
>
> +       if (atomic_read(&OVL_I(dir)->stray_dentries) && d_in_lookup(dentry)) {
> +               /* This dentry might have forced readdir to do the lookup */
> +               struct dentry *alias =
> +                       try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
> +                                                   dentry->d_name.len),
> +                                         dentry->d_parent);
> +               if (alias && !IS_ERR(alias)) {
> +                       atomic_dec(&OVL_I(dir)->stray_dentries);
> +                       return alias;
> +               }
> +       }
> +
>         with_ovl_creds(dentry->d_sb)
>                 err = ovl_lookup_layers(&ctx, &d);
>
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 1d4828dbcf7a..0e7751d5dfca 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -172,6 +172,7 @@ struct ovl_inode {
>         struct inode vfs_inode;
>         struct dentry *__upperdentry;
>         struct ovl_entry *oe;
> +       atomic_t stray_dentries; /* directory */
>
>         /* synchronize copy up and more */
>         struct mutex lock;
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 1dcc75b3a90f..add556a0a2b6 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -557,6 +557,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>         enum ovl_path_type type;
>         u64 ino = p->real_ino;
>         int xinobits = ovl_xino_bits(ofs);
> +       bool did_alloc = false;
>         int err = 0;
>
>         if (!ovl_same_dev(ofs) && !p->check_xwhiteout)
> @@ -574,8 +575,29 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>                 }
>         }
>         /* This checks also for xwhiteouts */
> -       this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> -       if (IS_ERR_OR_NULL(this) || !this->d_inode) {
> +       this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
> +       if (this == ERR_PTR(-EWOULDBLOCK)) {
> +               /*
> +                * Some other thread is looking up this name and will block
> +                * on i_rwsem before they can complete the lookup.
> +                * We will do the lookup and when that lookup gets a turn it
> +                * will return this dentry.
> +                */
> +               this = d_alloc_name(dir, p->name);
> +               did_alloc = true;
> +       }
> +       if (!IS_ERR(this) && !d_unhashed(this)) {
> +               /* Either we got in-lookup or we made our own unhashed */
> +               struct dentry *alias = ovl_lookup(dir->d_inode, this, 0);
> +               if (alias) {
> +                       d_lookup_done(this);
> +                       dput(this);
> +                       this = alias;
> +               } else if (did_alloc) {
> +                       atomic_inc(&OVL_I(dir->d_inode)->stray_dentries);
> +               }
> +       }
> +       if (IS_ERR(this) || !this->d_inode) {
>                 /* Mark a stale entry */
>                 p->is_whiteout = true;
>                 if (IS_ERR(this)) {
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index d4c12feec039..172d3ac7d3e2 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -195,6 +195,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
>         oi->__upperdentry = NULL;
>         oi->lowerdata_redirect = NULL;
>         oi->oe = NULL;
> +       atomic_set(&oi->stray_dentries, 0);
>         mutex_init(&oi->lock);
>
>         return &oi->vfs_inode;
> --
> 2.50.0.107.gf914562f5916.dirty
>


end of thread, other threads:[~2026-03-20 14:47 UTC | newest]

Thread overview: 65+ messages
2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
2026-03-12 21:11 ` [PATCH 01/53] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
2026-03-12 21:11 ` [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
2026-03-12 21:11 ` [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
2026-03-12 21:11 ` [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel() NeilBrown
2026-03-12 21:11 ` [PATCH 05/53] VFS: introduce d_alloc_noblock() NeilBrown
2026-03-12 21:11 ` [PATCH 06/53] VFS: add d_duplicate() NeilBrown
2026-03-12 21:11 ` [PATCH 07/53] VFS: Add LOOKUP_SHARED flag NeilBrown
2026-03-12 21:11 ` [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
2026-03-12 21:11 ` [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
2026-03-12 21:11 ` [PATCH 10/53] nfs: use d_splice_alias() in nfs_link() NeilBrown
2026-03-12 21:11 ` [PATCH 11/53] nfs: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:11 ` [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
2026-03-12 21:12 ` [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
2026-03-12 21:12 ` [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename NeilBrown
2026-03-12 21:12 ` [PATCH 15/53] nfs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
2026-03-15 13:51   ` Amir Goldstein
2026-03-18 21:10     ` NeilBrown
2026-03-20 14:47       ` Amir Goldstein
2026-03-12 21:12 ` [PATCH 17/53] coda: don't d_drop() early NeilBrown
2026-03-12 21:12 ` [PATCH 18/53] shmem: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 19/53] afs: use d_time instead of d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename NeilBrown
2026-03-12 21:12 ` [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode() NeilBrown
2026-03-12 21:12 ` [PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename() NeilBrown
2026-03-12 21:12 ` [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock NeilBrown
2026-03-12 21:12 ` [PATCH 24/53] afs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens NeilBrown
2026-03-12 21:12 ` [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open NeilBrown
2026-03-12 21:12 ` [PATCH 28/53] smb/client: Use d_alloc_noblock() in cifs_prime_dcache() NeilBrown
2026-03-12 21:12 ` [PATCH 29/53] exfat: simplify exfat_lookup() NeilBrown
2026-03-12 21:12 ` [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group() NeilBrown
2026-03-12 21:12 ` [PATCH 31/53] configfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
2026-03-17 10:00   ` Jan Kara
2026-03-17 20:27     ` [PATCH 32/53f] " NeilBrown
2026-03-18 17:47       ` Jan Kara
2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
2026-03-17  9:37   ` Jan Kara
2026-03-12 21:12 ` [PATCH 34/53] tracefs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 35/53] cephfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace() NeilBrown
2026-03-12 21:12 ` [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate() NeilBrown
2026-03-12 21:12 ` [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 39/53] ecryptfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 40/53] gfs2: " NeilBrown
2026-03-12 21:12 ` [PATCH 41/53] libfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 42/53] fuse: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link() NeilBrown
2026-03-12 21:12 ` [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir() NeilBrown
2026-03-12 21:12 ` [PATCH 45/53] efivarfs: use d_alloc_name() NeilBrown
2026-03-12 21:12 ` [PATCH 46/53] Remove references to d_add() in documentation and comments NeilBrown
2026-03-12 21:12 ` [PATCH 47/53] VFS: make d_alloc() local to VFS NeilBrown
2026-03-12 21:12 ` [PATCH 48/53] VFS: remove d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 49/53] VFS: remove d_rehash() NeilBrown
2026-03-12 21:12 ` [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm() NeilBrown
2026-03-12 21:12 ` [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl() NeilBrown
2026-03-12 21:12 ` [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock NeilBrown
2026-03-12 21:12 ` [PATCH 53/53] VFS: remove LOOKUP_SHARED NeilBrown
2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
2026-03-13  0:18   ` NeilBrown
2026-03-12 23:46 ` Linus Torvalds
2026-03-13  0:09   ` NeilBrown
