public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops
@ 2026-04-27  4:01 NeilBrown
  2026-04-27  4:01 ` [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
                   ` (19 more replies)
  0 siblings, 20 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

[[ sorry - this v3 is identical to v2 except that the linux-kernel@vger
   address is actually correct.  Please reply to this one so that you don't
   get bounces like I did - NB ]]

This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates
in a given directory.

It is a selection of patches from the 53-patch set I posted in March
which got relatively little response.  Maybe a shorter set will be more
approachable.

This set:
 - prepares the VFS in various ways
 - make use of these preparations in ovl and NFS (the most challenging
   filesystems for lookup as they do the most interesting things)
 - make use in efivars and shmem which for different reasons need a small 
   change that seemed worth including here.

The goal that these patch work towards is moving lookup out of i_rwsem
on the directory - except for the actual ->lookup call.  This is itself
a step towards allowing broad concurrency of operations in a given
directory.

There are two particular requirements before lookup can move outside the lock:
1/ d_drop() mustn't be used before an operation completes: the dentry being present
   in the dcache becomes part of the locking protocol.  This in turn requires
   d_splice_alias() to work with hashed negative dentries.
2/ d_alloc_parallel() mustn't be called while i_rw_sem is held, as this would
   result in a lock inversion.  So d_alloc_noblock and others are introduced
   to handle the various cases.
   In a few cases we need to drop and re-take i_rw_sem inside ->lookup.
   As lookup might be called with a shared or exclusive lock this requires
   a new LOOKUP_SHARED flag which is ugly but can be removed after the
   lookup is moved out of the lock (then ->lookup will only ever be called
   with a shared lock).

The full set of patches including these 19 and the rest to complete the
lifting of lookup out of the exclusive lock can be found at
   github/neilbrown/linux in branch pdirops

Significant changes since last time are:
 - use wait_var_event for d_alloc_parallel() rather than effectively
   duplicating that infrastructure - as suggested by Christop
 - changes to ovl_readdir handling as discussed with Amir.

Thanks,
NeilBrown

 [PATCH v3 01/19] VFS: fix various typos in documentation for
 [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup
 [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash
 [PATCH v3 04/19] VFS: use wait_var_event for waiting in
 [PATCH v3 05/19] VFS: introduce d_alloc_noblock()
 [PATCH v3 06/19] VFS: add d_duplicate()
 [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
 [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across
 [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared()
 [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return()
 [PATCH v3 11/19] efivarfs: use d_alloc_name()
 [PATCH v3 12/19] shmem: use d_duplicate()
 [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from
 [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link()
 [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias()
 [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in
 [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
 [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename
 [PATCH v3 19/19] nfs: use d_duplicate()

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

Various typos fixes.
start_creating_dentry() now documented as *creating*, not *removing* the
entry.
Unwanted spaces in Documentation/filesystems/porting.rst removed.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  8 +++----
 fs/namei.c                            | 30 +++++++++++++--------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index fdf074429cd3..bfdff4608028 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1203,7 +1203,7 @@ will fail-safe.
 
 ---
 
-** mandatory**
+**mandatory**
 
 lookup_one(), lookup_one_unlocked(), lookup_one_positive_unlocked() now
 take a qstr instead of a name and len.  These, not the "one_len"
@@ -1212,7 +1212,7 @@ that filesysmtem, through a mount point - which will have a mnt_idmap.
 
 ---
 
-** mandatory**
+**mandatory**
 
 Functions try_lookup_one_len(), lookup_one_len(),
 lookup_one_len_unlocked() and lookup_positive_unlocked() have been
@@ -1229,7 +1229,7 @@ already been performed such as after vfs_path_parent_lookup()
 
 ---
 
-** mandatory**
+**mandatory**
 
 d_hash_and_lookup() is no longer exported or available outside the VFS.
 Use try_lookup_noperm() instead.  This adds name validation and takes
@@ -1371,7 +1371,7 @@ similar.
 
 ---
 
-** mandatory**
+**mandatory**
 
 lock_rename(), lock_rename_child(), unlock_rename() are no
 longer available.  Use start_renaming() or similar.
diff --git a/fs/namei.c b/fs/namei.c
index c7fac83c9a85..65e60536a6d1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2942,8 +2942,8 @@ struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
  * end_dirop - signal completion of a dirop
  * @de: the dentry which was returned by start_dirop or similar.
  *
- * If the de is an error, nothing happens. Otherwise any lock taken to
- * protect the dentry is dropped and the dentry itself is release (dput()).
+ * If the @de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
  */
 void end_dirop(struct dentry *de)
 {
@@ -3260,7 +3260,7 @@ EXPORT_SYMBOL(lookup_one_unlocked);
  * the i_rwsem itself if necessary.  If a fatal signal is pending or
  * delivered, it will return %-EINTR if the lock is needed.
  *
- * Returns: A dentry, possibly negative, or
+ * Returns: A positive dentry, or
  *	   - same errors as lookup_one_unlocked() or
  *	   - ERR_PTR(-EINTR) if a fatal signal is pending.
  */
@@ -3382,7 +3382,7 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
 EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
 
 /**
- * start_creating - prepare to create a given name with permission checking
+ * start_creating - prepare to access or create a given name with permission checking
  * @idmap:  idmap of the mount
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
@@ -3414,8 +3414,8 @@ EXPORT_SYMBOL(start_creating);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
- * an object from a directory.  Permission checking (MAY_EXEC) is performed
+ * Locks are taken and a lookup is performed prior to removing an object
+ * from a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
  * If the name doesn't exist, an error is returned.
@@ -3441,7 +3441,7 @@ EXPORT_SYMBOL(start_removing);
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
  *
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
  * an object in a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
@@ -3470,7 +3470,7 @@ EXPORT_SYMBOL(start_creating_killable);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
  * an object from a directory.  Permission checking (MAY_EXEC) is performed
  * against @idmap.
  *
@@ -3500,7 +3500,7 @@ EXPORT_SYMBOL(start_removing_killable);
  * @parent: directory in which to prepare to create the name
  * @name:   the name to be created
  *
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
  * an object in a directory.
  *
  * If the name already exists, a positive dentry is returned.
@@ -3523,7 +3523,7 @@ EXPORT_SYMBOL(start_creating_noperm);
  * @parent: directory in which to find the name
  * @name:   the name to be removed
  *
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
  * an object from a directory.
  *
  * If the name doesn't exist, an error is returned.
@@ -3544,11 +3544,11 @@ struct dentry *start_removing_noperm(struct dentry *parent,
 EXPORT_SYMBOL(start_removing_noperm);
 
 /**
- * start_creating_dentry - prepare to create a given dentry
- * @parent: directory from which dentry should be removed
- * @child:  the dentry to be removed
+ * start_creating_dentry - prepare to access or create a given dentry
+ * @parent: directory of dentry
+ * @child:  the dentry to be prepared
  *
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
  * the validity of the dentry is checked: correct parent and still hashed.
  *
  * If the dentry is valid and negative a reference is taken and
@@ -3581,7 +3581,7 @@ EXPORT_SYMBOL(start_creating_dentry);
  * @parent: directory from which dentry should be removed
  * @child:  the dentry to be removed
  *
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
  * the validity of the dentry is checked: correct parent and still hashed.
  *
  * If the dentry is valid and positive, a reference is taken and
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup dentries
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
  2026-04-27  4:01 ` [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

We currently have three interfaces for attaching existing inodes to
normal filesystems(*).
- d_add() requires an unhashed or in-lookup dentry and doesn't handle
  splicing in case a directory already has dentry
- d_instantiate() requires a hashed dentry, and also doesn't handle
  splicing.
- d_splice_alias() requires unhashed or in-lookup and does handle
  splicing, and can return an alternate dentry.

So there is no interface that supports both hashed and in-lookup, which
is what ->atomic_open needs to deal with.

Some filesystems check for in-lookup in their atomic_open and if found,
perform a ->lookup and can subsequently use d_instantiate() if the
dentry is still negative.  Others d_drop() the dentry so they can use
d_splice_alias().

This last will cause a problem for proposed changes to locking which
require the dentry to remain hashed while and operation proceeds on it.

There is also no interface which splices a directory (which might
already have a dentry) to a hashed dentry.  Filesystems which need to do
this d_drop() first.

Some filesystemss (NFS) skip ->lookup processes for
  LOOKUP_CREATE|LOOKUP_EXCL
which includes mknod, link, symlink etc.  So these inode operations
might get an unhashed or a hashed-negative dentry.  There is no
interface for instantiating these so against they need to unhash first.

So with this patch d_splice_alias() can handle hashed, unhashed, or
in-lookup dentries.  This makes it suitable for ->lookup, ->atomic_open,
and ->mkdir and others.

As a side effect d_add() will also now handle hashed dentries, but
I have plans to remove d_add() as there is no benefit having it as
well as the others.

__d_add() currently contains code that is identical to
__d_instantiate(), so the former is changed to call the later, and both
d_add() and d_instantiate() call __d_add().

* There is also d_make_persistent() for filesystems which are
  dcache-based and don't support mkdir, create etc, and
  d_instantiate_new() for newly created inodes that are still locked.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/vfs.rst |  4 ++--
 fs/dcache.c                       | 31 ++++++++++++-------------------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..d8df0a84cdba 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -507,8 +507,8 @@ otherwise noted.
 	dentry before the first mkdir returns.
 
 	If there is any chance this could happen, then the new inode
-	should be d_drop()ed and attached with d_splice_alias().  The
-	returned dentry (if any) should be returned by ->mkdir().
+	should be attached with d_splice_alias().  The returned
+	dentry (if any) should be returned by ->mkdir().
 
 ``rmdir``
 	called by the rmdir(2) system call.  Only required if you want
diff --git a/fs/dcache.c b/fs/dcache.c
index 2c61aeea41f4..789544525c56 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2068,7 +2068,6 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
  * (or otherwise set) by the caller to indicate that it is now
  * in use by the dcache.
  */
- 
 void d_instantiate(struct dentry *entry, struct inode * inode)
 {
 	BUG_ON(d_really_is_positive(entry));
@@ -2822,18 +2821,14 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
 		d_wait = __d_lookup_unhash(dentry);
+		__d_rehash(dentry);
+	} else if (d_unhashed(dentry)) {
+		__d_rehash(dentry);
 	}
 	if (unlikely(ops))
 		d_set_d_op(dentry, ops);
-	if (inode) {
-		unsigned add_flags = d_flags_for_inode(inode);
-		hlist_add_head(&dentry->d_alias, &inode->i_dentry);
-		raw_write_seqcount_begin(&dentry->d_seq);
-		__d_set_inode_and_type(dentry, inode, add_flags);
-		raw_write_seqcount_end(&dentry->d_seq);
-		fsnotify_update_flags(dentry);
-	}
-	__d_rehash(dentry);
+	if (inode)
+		__d_instantiate(dentry, inode);
 	if (dir)
 		end_dir_add(dir, n, d_wait);
 	spin_unlock(&dentry->d_lock);
@@ -3133,8 +3128,6 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 
-	BUG_ON(!d_unhashed(dentry));
-
 	if (!inode)
 		goto out;
 
@@ -3183,6 +3176,8 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
  * @inode:  the inode which may have a disconnected dentry
  * @dentry: a negative dentry which we want to point to the inode.
  *
+ * @dentry must be negative and may be in-lookup or unhashed or hashed.
+ *
  * If inode is a directory and has an IS_ROOT alias, then d_move that in
  * place of the given dentry and return it, else simply d_add the inode
  * to the dentry and return NULL.
@@ -3190,16 +3185,14 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
  * If a non-IS_ROOT directory is found, the filesystem is corrupt, and
  * we should error out: directories can't have multiple aliases.
  *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
+ * This should be used to return the result of ->lookup() and to
+ * instantiate the result of ->mkdir(), is often useful for
+ * ->atomic_open, and may be used to instantiate other objects.
  *
  * If a dentry was found and moved, then it is returned.  Otherwise NULL
- * is returned.  This matches the expected return value of ->lookup.
+ * is returned.  This matches the expected return value of ->lookup and
+ * ->mkdir.
  *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
  */
 struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
 {
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
  2026-04-27  4:01 ` [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
  2026-04-27  4:01 ` [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel() NeilBrown
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

efivarfs() is similar to other filesystems which use d_alloc_name(), but
it cannot use d_alloc_name() as it has a ->d_hash function.

The only problem with using ->d_hash if available is that it can return
an error, but d_alloc_name() cannot.  If we document that d_alloc_name()
cannot be used when ->d_hash returns an error, then any filesystem which
has a safe ->d_hash can safely use d_alloc_name().

So enhance d_alloc_name() to check for a ->d_hash function
and document that this is not permitted if the ->d_hash function can
fail( which efivarfs_d_hash() cannot).

Also document locking requirements for use.

This is a step towards eventually deprecating d_alloc().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 789544525c56..d0a504fc62e5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1945,12 +1945,31 @@ struct dentry *d_alloc_pseudo(struct super_block *sb, const struct qstr *name)
 	return dentry;
 }
 
+/**
+ * d_alloc_name: allocate a dentry for use in a dcache-based filesystem.
+ * @parent: dentry of the parent for the dentry
+ * @name: name of the dentry
+ *
+ * d_alloc_name() allocates a dentry without any protection against
+ * races.  It should only be used in directories that do not support
+ * create/rename/link inode operations and so is particularly suited for
+ * use with simple_dir_inode_operations.  The result is typically passed
+ * to d_make_persistent().
+ *
+ * This must NOT be used by filesystems which provide a d_hash() function
+ * that can return an error.
+ */
 struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 {
 	struct qstr q;
 
 	q.name = name;
 	q.hash_len = hashlen_string(parent, name);
+	if (parent->d_flags & DCACHE_OP_HASH) {
+		int err = parent->d_op->d_hash(parent, &q);
+		if (WARN_ON_ONCE(err))
+			return NULL;
+	}
 	return d_alloc(parent, &q);
 }
 EXPORT_SYMBOL(d_alloc_name);
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (2 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 05/19] VFS: introduce d_alloc_noblock() NeilBrown
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

d_alloc_parallel() currently requires a wait_queue_head to be passed in.
This must have a life time which extends until the lookup is completed.

This makes it awkward to use and particularly make it hard to use in
lookup_one_qstr_excl() which I hope to do in the future.

This patch changes d_alloc_parallel() to use wake_up_var_locked() to
wake up waiters, and wait_var_event_spinlock() to wait.  dentry->d_lock
is used for synchronisation as it is already held and the relevant
times.

In most cases there will be no waiters so the wake_up_var_locked()
call is a complete waste.  To optimise this a new ->d_flags flag is
added: DCACHE_LOCK_WAITERS.  This is set whenever any thread prepares to
wait for the dentry, and if it isn't set when DCACHE_PAR_LOOKUP is
cleared, no wakeup is sent.
(The name is deliberately generic as I plan to replace DCACHE_PAR_LOOKUP
with more generic per-dentry locking in the future).

__d_lookup_unhash() now returns a bool rather than a wq.  This is true
if DCACHE_LOCK_WAITERS was sent and is used to decide to send the wake
up.  It would be easier to send the wakeup immediately when clearing
DCACHE_LOCK_WAITERS, but then the waiter could wake a bit earlier and
then spend time spinning on ->d_lock.  I don't know if that cost is
interesting.

__d_lookup_unhash() no longer needs to re-init ->d_lru.  That was
previously shared (in a union) with ->d_wait but ->d_wait is now gone
so it no longer corrupts ->d_lru.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  7 +++
 fs/afs/dir_silly.c                    |  4 +-
 fs/dcache.c                           | 64 ++++++++++++---------------
 fs/fuse/readdir.c                     |  3 +-
 fs/namei.c                            |  6 +--
 fs/nfs/dir.c                          |  6 +--
 fs/nfs/unlink.c                       |  3 +-
 fs/proc/base.c                        |  3 +-
 fs/proc/proc_sysctl.c                 |  3 +-
 fs/smb/client/readdir.c               |  3 +-
 include/linux/dcache.h                | 11 +++--
 include/linux/nfs_xdr.h               |  1 -
 12 files changed, 51 insertions(+), 63 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index bfdff4608028..85c7b2007f8c 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1385,3 +1385,10 @@ for_each_alias(dentry, inode) instead of hlist_for_each_entry; better
 yet, see if any of the exported primitives could be used instead of
 the entire loop.  You still need to hold ->i_lock of the inode over
 either form of manual loop.
+
+---
+
+**mandatory**
+
+d_alloc_parallel() no longer requires a waitqueue_head.
+
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index a748fd133faf..982bb6ec15f0 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -248,13 +248,11 @@ int afs_silly_iput(struct dentry *dentry, struct inode *inode)
 	struct dentry *alias;
 	int ret;
 
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
 	_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
 
 	down_read(&dvnode->rmdir_lock);
 
-	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq);
+	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name);
 	if (IS_ERR(alias)) {
 		up_read(&dvnode->rmdir_lock);
 		return 0;
diff --git a/fs/dcache.c b/fs/dcache.c
index d0a504fc62e5..2dcefa60db32 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2268,8 +2268,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		return found;
 	}
 	if (d_in_lookup(dentry)) {
-		found = d_alloc_parallel(dentry->d_parent, name,
-					dentry->d_wait);
+		found = d_alloc_parallel(dentry->d_parent, name);
 		if (IS_ERR(found) || !d_in_lookup(found)) {
 			iput(inode);
 			return found;
@@ -2279,7 +2278,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		if (!found) {
 			iput(inode);
 			return ERR_PTR(-ENOMEM);
-		} 
+		}
 	}
 	res = d_splice_alias(inode, found);
 	if (res) {
@@ -2657,31 +2656,24 @@ static inline unsigned start_dir_add(struct inode *dir)
 }
 
 static inline void end_dir_add(struct inode *dir, unsigned int n,
-			       wait_queue_head_t *d_wait)
+			       bool do_wake, struct dentry *de)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
 	preempt_enable_nested();
-	if (wq_has_sleeper(d_wait))
-		wake_up_all(d_wait);
+	if (do_wake)
+		wake_up_var_locked(&de->d_flags, &de->d_lock);
 }
 
-static void d_wait_lookup(struct dentry *dentry)
+static inline bool d_must_wait(struct dentry *dentry)
 {
-	if (d_in_lookup(dentry)) {
-		DECLARE_WAITQUEUE(wait, current);
-		add_wait_queue(dentry->d_wait, &wait);
-		do {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			spin_unlock(&dentry->d_lock);
-			schedule();
-			spin_lock(&dentry->d_lock);
-		} while (d_in_lookup(dentry));
-	}
+	if (!d_in_lookup(dentry))
+		return false;
+	dentry->d_flags |= DCACHE_LOCK_WAITER;
+	return true;
 }
 
 struct dentry *d_alloc_parallel(struct dentry *parent,
-				const struct qstr *name,
-				wait_queue_head_t *wq)
+				const struct qstr *name)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2763,7 +2755,9 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		 * wait for them to finish
 		 */
 		spin_lock(&dentry->d_lock);
-		d_wait_lookup(dentry);
+		wait_var_event_spinlock(&dentry->d_flags,
+					!d_must_wait(dentry),
+					&dentry->d_lock);
 		/*
 		 * it's not in-lookup anymore; in principle we should repeat
 		 * everything from dcache lookup, but it's likely to be what
@@ -2784,7 +2778,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		return dentry;
 	}
 	rcu_read_unlock();
-	new->d_wait = wq;
 	hlist_bl_add_head(&new->d_in_lookup_hash, b);
 	hlist_bl_unlock(b);
 	return new;
@@ -2800,29 +2793,29 @@ EXPORT_SYMBOL(d_alloc_parallel);
  * - Retrieve and clear the waitqueue head in dentry
  * - Return the waitqueue head
  */
-static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
+static bool __d_lookup_unhash(struct dentry *dentry)
 {
-	wait_queue_head_t *d_wait;
 	struct hlist_bl_head *b;
+	bool do_wake;
 
 	lockdep_assert_held(&dentry->d_lock);
 
 	b = in_lookup_hash(dentry->d_parent, dentry->d_name.hash);
 	hlist_bl_lock(b);
-	dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
 	__hlist_bl_del(&dentry->d_in_lookup_hash);
-	d_wait = dentry->d_wait;
-	dentry->d_wait = NULL;
+	do_wake = !!(dentry->d_flags & DCACHE_LOCK_WAITER);
+	dentry->d_flags &= ~(DCACHE_PAR_LOOKUP|DCACHE_LOCK_WAITER);
+
 	hlist_bl_unlock(b);
 	dentry->waiters = NULL;
-	INIT_LIST_HEAD(&dentry->d_lru);
-	return d_wait;
+	return do_wake;
 }
 
 void __d_lookup_unhash_wake(struct dentry *dentry)
 {
 	spin_lock(&dentry->d_lock);
-	wake_up_all(__d_lookup_unhash(dentry));
+	if (__d_lookup_unhash(dentry))
+		wake_up_var_locked(&dentry->d_flags, &dentry->d_lock);
 	spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL(__d_lookup_unhash_wake);
@@ -2832,14 +2825,15 @@ EXPORT_SYMBOL(__d_lookup_unhash_wake);
 static inline void __d_add(struct dentry *dentry, struct inode *inode,
 			   const struct dentry_operations *ops)
 {
-	wait_queue_head_t *d_wait;
+	bool do_wake = false;
 	struct inode *dir = NULL;
 	unsigned n;
+
 	spin_lock(&dentry->d_lock);
 	if (unlikely(d_in_lookup(dentry))) {
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
-		d_wait = __d_lookup_unhash(dentry);
+		do_wake = __d_lookup_unhash(dentry);
 		__d_rehash(dentry);
 	} else if (d_unhashed(dentry)) {
 		__d_rehash(dentry);
@@ -2849,7 +2843,7 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
 	if (inode)
 		__d_instantiate(dentry, inode);
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, do_wake, dentry);
 	spin_unlock(&dentry->d_lock);
 	if (inode)
 		spin_unlock(&inode->i_lock);
@@ -2962,7 +2956,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 		     bool exchange)
 {
 	struct dentry *old_parent, *p;
-	wait_queue_head_t *d_wait;
+	bool do_wake = false;
 	struct inode *dir = NULL;
 	unsigned n;
 
@@ -2993,7 +2987,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	if (unlikely(d_in_lookup(target))) {
 		dir = target->d_parent->d_inode;
 		n = start_dir_add(dir);
-		d_wait = __d_lookup_unhash(target);
+		do_wake = __d_lookup_unhash(target);
 	}
 
 	write_seqcount_begin(&dentry->d_seq);
@@ -3033,7 +3027,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_end(&dentry->d_seq);
 
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, do_wake, target);
 
 	if (dentry->d_parent != old_parent)
 		spin_unlock(&dentry->d_parent->d_lock);
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index db5ae8ec1030..a2361f1d9905 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -164,7 +164,6 @@ static int fuse_direntplus_link(struct file *file,
 	struct inode *dir = d_inode(parent);
 	struct fuse_conn *fc;
 	struct inode *inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	int epoch;
 
 	if (!o->nodeid) {
@@ -201,7 +200,7 @@ static int fuse_direntplus_link(struct file *file,
 	dentry = d_lookup(parent, &name);
 	if (!dentry) {
 retry:
-		dentry = d_alloc_parallel(parent, &name, &wq);
+		dentry = d_alloc_parallel(parent, &name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
diff --git a/fs/namei.c b/fs/namei.c
index 65e60536a6d1..a6349b31fdb6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1891,13 +1891,12 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 {
 	struct dentry *dentry, *old;
 	struct inode *inode = dir->d_inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	/* Don't go there if it's already dead */
 	if (unlikely(IS_DEADDIR(inode)))
 		return ERR_PTR(-ENOENT);
 again:
-	dentry = d_alloc_parallel(dir, name, &wq);
+	dentry = d_alloc_parallel(dir, name);
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
@@ -4414,7 +4413,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -4423,7 +4421,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	dentry = d_lookup(dir, &nd->last);
 	for (;;) {
 		if (!dentry) {
-			dentry = d_alloc_parallel(dir, &nd->last, &wq);
+			dentry = d_alloc_parallel(dir, &nd->last);
 			if (IS_ERR(dentry))
 				return dentry;
 		}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e9ce1883288c..9580af999d70 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -726,7 +726,6 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		unsigned long dir_verifier)
 {
 	struct qstr filename = QSTR_INIT(entry->name, entry->len);
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct dentry *dentry;
 	struct dentry *alias;
 	struct inode *inode;
@@ -755,7 +754,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 	dentry = d_lookup(parent, &filename);
 again:
 	if (!dentry) {
-		dentry = d_alloc_parallel(parent, &filename, &wq);
+		dentry = d_alloc_parallel(parent, &filename);
 		if (IS_ERR(dentry))
 			return;
 	}
@@ -2106,7 +2105,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		    struct file *file, unsigned open_flags,
 		    umode_t mode)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct nfs_open_context *ctx;
 	struct dentry *res;
 	struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -2162,7 +2160,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		d_drop(dentry);
 		switched = true;
 		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name, &wq);
+					  &dentry->d_name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 		if (unlikely(!d_in_lookup(dentry)))
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index df3ca4669df6..43ea897943c0 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -124,7 +124,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
 	struct dentry *alias;
 
 	down_read_non_owner(&NFS_I(dir)->rmdir_sem);
-	alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
+	alias = d_alloc_parallel(dentry->d_parent, &data->args.name);
 	if (IS_ERR(alias)) {
 		up_read_non_owner(&NFS_I(dir)->rmdir_sem);
 		return 0;
@@ -185,7 +185,6 @@ nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
 
 	data->cred = get_current_cred();
 	data->res.dir_attr = &data->dir_attr;
-	init_waitqueue_head(&data->wq);
 
 	status = -EBUSY;
 	spin_lock(&dentry->d_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..d55a4b603188 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2132,8 +2132,7 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx,
 		goto end_instantiate;
 
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			goto end_instantiate;
 		if (d_in_lookup(child)) {
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 49ab74e0bfde..04a382178c65 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -692,8 +692,7 @@ static bool proc_sys_fill_cache(struct file *file,
 
 	child = d_lookup(dir, &qname);
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			return false;
 		if (d_in_lookup(child)) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index be22bbc4a65a..a8995261831c 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -73,7 +73,6 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
 	bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions;
 	bool reparse_need_reval = false;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	int rc;
 
 	cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
@@ -105,7 +104,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 		    (fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
 			return;
 
-		dentry = d_alloc_parallel(parent, name, &wq);
+		dentry = d_alloc_parallel(parent, name);
 	}
 	if (IS_ERR(dentry))
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 2577c05f84ec..14b91a7d0bb6 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -116,10 +116,7 @@ struct dentry {
 					 * possible!
 					 */
 
-	union {
-		struct list_head d_lru;		/* LRU list */
-		wait_queue_head_t *d_wait;	/* in-lookup ones only */
-	};
+	struct list_head d_lru;		/* LRU list */
 	struct hlist_node d_sib;	/* child of parent list */
 	struct hlist_head d_children;	/* our children */
 	/*
@@ -210,6 +207,9 @@ enum dentry_flags {
 	DCACHE_REFERENCED		= BIT(6),	/* Recently used, don't discard. */
 	DCACHE_DONTCACHE		= BIT(7),	/* Purge from memory on final dput() */
 	DCACHE_CANT_MOUNT		= BIT(8),
+	DCACHE_LOCK_WAITER		= BIT(9),	/* A thread is waiting for
+							 * PAR_LOOKUP to clear
+							 */
 	DCACHE_SHRINK_LIST		= BIT(10),
 	DCACHE_OP_WEAK_REVALIDATE	= BIT(11),
 	/*
@@ -256,8 +256,7 @@ extern void d_delete(struct dentry *);
 /* allocate/de-allocate */
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
-extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
-					wait_queue_head_t *);
+extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index fcbd21b5685f..6aced49d5f00 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1743,7 +1743,6 @@ struct nfs_unlinkdata {
 	struct nfs_removeargs args;
 	struct nfs_removeres res;
 	struct dentry *dentry;
-	wait_queue_head_t wq;
 	const struct cred *cred;
 	struct nfs_fattr dir_attr;
 	long timeout;
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 05/19] VFS: introduce d_alloc_noblock()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (3 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 06/19] VFS: add d_duplicate() NeilBrown
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

Several filesystems use the results of readdir to prime the dcache.
These filesystems use d_alloc_parallel() which can block if there is a
concurrent lookup.  Blocking in that case is pointless as the lookup
will add info to the dcache and there is no value in the readdir waiting
to see if it should add the info too.

Also these calls to d_alloc_parallel() are made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So this patch introduces d_alloc_noblock() which doesn't block but
instead returns ERR_PTR(-EWOULDBLOCK).  Filesystems that prime the
dcache (smb/client, nfs, fuse, cephfs) can now use that and ignore
-EWOULDBLOCK errors as harmless.

Unlike d_alloc_parallel(), d_alloc_noblock() calculates the hash and
performs a lookup before an allocation, as that is what all callers
want.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 91 +++++++++++++++++++++++++++++++++++++++---
 include/linux/dcache.h |  1 +
 2 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 2dcefa60db32..dc06e70695d2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -32,6 +32,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/rculist_bl.h>
 #include <linux/list_lru.h>
+#include <linux/namei.h>
 #include "internal.h"
 #include "mount.h"
 
@@ -2672,8 +2673,16 @@ static inline bool d_must_wait(struct dentry *dentry)
 	return true;
 }
 
-struct dentry *d_alloc_parallel(struct dentry *parent,
-				const struct qstr *name)
+/* What to do when __d_alloc_parallel finds a d_in_lookup dentry */
+enum alloc_para {
+	ALLOC_PARA_WAIT,
+	ALLOC_PARA_FAIL,
+};
+
+static inline
+struct dentry *__d_alloc_parallel(struct dentry *parent,
+				  const struct qstr *name,
+				  enum alloc_para how)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2755,9 +2764,20 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		 * wait for them to finish
 		 */
 		spin_lock(&dentry->d_lock);
-		wait_var_event_spinlock(&dentry->d_flags,
-					!d_must_wait(dentry),
-					&dentry->d_lock);
+		if (d_in_lookup(dentry))
+			switch (how) {
+			case ALLOC_PARA_FAIL:
+				spin_unlock(&dentry->d_lock);
+				dput(new);
+				dput(dentry);
+				return ERR_PTR(-EWOULDBLOCK);
+			case ALLOC_PARA_WAIT:
+				wait_var_event_spinlock(&dentry->d_flags,
+							!d_must_wait(dentry),
+							&dentry->d_lock);
+				/* ... and continue */
+			}
+
 		/*
 		 * it's not in-lookup anymore; in principle we should repeat
 		 * everything from dcache lookup, but it's likely to be what
@@ -2786,8 +2806,69 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 	dput(dentry);
 	goto retry;
 }
+
+/**
+ * d_alloc_parallel() - allocate a new dentry and ensure uniqueness
+ * @parent - dentry of the parent
+ * @name   - name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup(), then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_parallel() waits for
+ * that lookup to complete before returning the dentry and then ensures the
+ * match is still valid.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error %-ENOMEM.
+ */
+struct dentry *d_alloc_parallel(struct dentry *parent,
+				const struct qstr *name)
+{
+	return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
+}
 EXPORT_SYMBOL(d_alloc_parallel);
 
+/**
+ * d_alloc_noblock() - find or allocate a new dentry
+ * @parent - dentry of the parent
+ * @name   - name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup() then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_noblock()
+ * returns with error %-EWOULDBLOCK.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name need not already have ->hash set.
+ *
+ * Returns: the dentry, whether found or allocated, or an error
+ *    %-ENOMEM, %-EWOULDBLOCK, and anything returned by ->d_hash().
+ */
+struct dentry *d_alloc_noblock(struct dentry *parent,
+			       struct qstr *name)
+{
+	struct dentry *de;
+
+	de = try_lookup_noperm(name, parent);
+	if (!de)
+		de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
+	return de;
+}
+EXPORT_SYMBOL(d_alloc_noblock);
+
 /*
  * - Unhash the dentry
  * - Retrieve and clear the waitqueue head in dentry
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 14b91a7d0bb6..85e8570cbd48 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -257,6 +257,7 @@ extern void d_delete(struct dentry *);
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
+extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 06/19] VFS: add d_duplicate()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (4 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 05/19] VFS: introduce d_alloc_noblock() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag NeilBrown
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

Occasionally a single operation can require two sub-operations on the
same name, and it is important that a d_alloc_parallel() (once that can
be run unlocked) does not create another dentry with the same name
between the operations.

Two examples:
1/ rename where the target name (a positive dentry) needs to be
  "silly-renamed" to a temporary name so it will remain available on the
  server (NFS and AFS).  Here the same name needs to be the subject
  of one rename, and the target of another.
2/ rename where the subject needs to be replaced with a white-out
  (shmemfs).  Here the same name need to be the target of a rename
  and the target of a mknod()

In both cases the original dentry is renamed to something else, and a
replacement is instantiated, possibly as the target of d_move(), possibly
by d_instantiate().

Currently d_alloc() is used to create the dentry and the exclusive lock
on the parent ensures no other dentry is created.  When
d_alloc_parallel() is moved out of the parent lock, this will no longer
be sufficient.  In particular if the original is renamed away before the
new is instantiated, there is a window where d_alloc_parallel() could
create another name.  "silly-rename" does work in this order.  shmemfs
whiteout doesn't open this hole but is essentially the same pattern and
should use the same approach.

The new d_duplicate() creates an in-lookup dentry with the same name as
the original dentry, which must be hashed.  There is no need to check if
an in-lookup dentry exists with the same name as d_alloc_parallel() will
never try add one while the hashed dentry exists.  Once the new
in-lookup is created, d_alloc_parallel() will find it and wait for it to
complete, then use it.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 51 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |  1 +
 2 files changed, 52 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index dc06e70695d2..569a8ddf4c0d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1900,6 +1900,57 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 }
 EXPORT_SYMBOL(d_alloc);
 
+/**
+ * d_duplicate: duplicate a dentry for combined atomic operation
+ * @dentry: the dentry to duplicate
+ *
+ * Some rename operations need to be combined with another operation
+ * inside the filesystem.
+ * 1/ A cluster filesystem when renaming to an in-use file might need to
+ *   first "silly-rename" that target out of the way before the main rename
+ * 2/ A filesystem that supports white-out might want to create a whiteout
+ *   in place of the file being moved.
+ *
+ * For this they need two dentries which temporarily have the same name,
+ * before one is renamed.  d_duplicate() provides for this.  Given a
+ * positive hashed dentry, it creates a second in-lookup dentry.
+ * Because the original dentry exists, no other thread will try to
+ * create an in-lookup dentry, os there can be no race in this create.
+ *
+ * The caller should d_move() the original to a new name, often via a
+ * rename request, and should call d_lookup_done() on the newly created
+ * dentry.  If the new is instantiated and the old MUST either be moved
+ * or dropped.
+ *
+ * Parent must be locked.
+ *
+ * Returns: an in-lookup dentry, or an error.
+ */
+struct dentry *d_duplicate(struct dentry *dentry)
+{
+	unsigned int hash = dentry->d_name.hash;
+	struct dentry *parent = dentry->d_parent;
+	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+	struct dentry *new = __d_alloc(parent->d_sb, &dentry->d_name);
+
+	if (unlikely(!new))
+		return ERR_PTR(-ENOMEM);
+
+	new->d_flags |= DCACHE_PAR_LOOKUP;
+	spin_lock(&parent->d_lock);
+	new->d_parent = dget_dlock(parent);
+	hlist_add_head(&new->d_sib, &parent->d_children);
+	if (parent->d_flags & DCACHE_DISCONNECTED)
+		new->d_flags |= DCACHE_DISCONNECTED;
+	spin_unlock(&dentry->d_parent->d_lock);
+
+	hlist_bl_lock(b);
+	hlist_bl_add_head(&new->d_in_lookup_hash, b);
+	hlist_bl_unlock(b);
+	return new;
+}
+EXPORT_SYMBOL(d_duplicate);
+
 struct dentry *d_alloc_anon(struct super_block *sb)
 {
 	return __d_alloc(sb, NULL);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 85e8570cbd48..3991f9988599 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -259,6 +259,7 @@ extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
+struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
 					  const struct dentry_operations *);
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (5 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 06/19] VFS: add d_duplicate() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  7:43   ` Amir Goldstein
  2026-04-27  4:01 ` [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

Some ->lookup handlers will need to drop and retake the parent lock, so
they can safely use d_alloc_parallel().

->lookup can be called with the parent lock either exclusive or shared.

A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.

This is rather ugly, but will be gone soon after we move
d_alloc_parallel() out of the directory lock as ->lookup() will *always*
called with a shared lock on the parent.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/namei.c            | 7 ++++---
 include/linux/namei.h | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a6349b31fdb6..e77ba9d31857 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1928,7 +1928,7 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
 	struct inode *inode = dir->d_inode;
 	struct dentry *res;
 	inode_lock_shared(inode);
-	res = __lookup_slow(name, dir, flags);
+	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
 	inode_unlock_shared(inode);
 	return res;
 }
@@ -1942,7 +1942,7 @@ static struct dentry *lookup_slow_killable(const struct qstr *name,
 
 	if (inode_lock_shared_killable(inode))
 		return ERR_PTR(-EINTR);
-	res = __lookup_slow(name, dir, flags);
+	res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
 	inode_unlock_shared(inode);
 	return res;
 }
@@ -4413,6 +4413,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
+	unsigned int shared_flag = (op->open_flag & O_CREAT) ? 0 : LOOKUP_SHARED;
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -4480,7 +4481,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 
 	if (d_in_lookup(dentry)) {
 		struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
-							     nd->flags);
+							     nd->flags | shared_flag);
 		d_lookup_done(dentry);
 		if (unlikely(res)) {
 			if (IS_ERR(res)) {
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ad6dd9987b9..b3346a513d8f 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -37,8 +37,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_CREATE		BIT(17)	/* ... in object creation */
 #define LOOKUP_EXCL		BIT(18)	/* ... in target must not exist */
 #define LOOKUP_RENAME_TARGET	BIT(19)	/* ... in destination of rename() */
+#define LOOKUP_SHARED		BIT(20) /* Parent lock is held shared */
 
-/* 4 spare bits for intent */
+/* 3 spare bits for intent */
 
 /* Scoping flags for lookup. */
 #define LOOKUP_NO_SYMLINKS	BIT(24) /* No symlink crossing. */
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (6 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  7:49   ` Amir Goldstein
  2026-04-27  4:01 ` [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling NeilBrown
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.

We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.

So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c            | 18 +++++++++++++++++-
 fs/ntfs/namei.c        |  4 +++-
 fs/xfs/xfs_iops.c      |  3 ++-
 include/linux/dcache.h |  3 ++-
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 569a8ddf4c0d..a2ddfe811df3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2294,6 +2294,7 @@ EXPORT_SYMBOL(d_obtain_root);
  * @dentry: the negative dentry that was passed to the parent's lookup func
  * @inode:  the inode case-insensitive lookup has found
  * @name:   the case-exact name to be associated with the returned dentry
+ * @bool:   %true if lookup was performed with LOOKUP_SHARED
  *
  * This is to avoid filling the dcache with case-insensitive names to the
  * same inode, only the actual correct case is stored in the dcache for
@@ -2306,7 +2307,7 @@ EXPORT_SYMBOL(d_obtain_root);
  * the exact case, and return the spliced entry.
  */
 struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
-			struct qstr *name)
+			struct qstr *name, bool shared)
 {
 	struct dentry *found, *res;
 
@@ -2319,6 +2320,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		iput(inode);
 		return found;
 	}
+	/*
+	 * We are holding parent lock and so don't want to wait for a
+	 * d_in_lookup() dentry.  We can safely drop the parent lock and
+	 * reclaim it as we have exclusive access to dentry as it is
+	 * d_in_lookup() (so ->d_parent is stable) and we are near the
+	 * end ->lookup() and will shortly drop the lock anyway.
+	 */
+	if (shared)
+		inode_unlock_shared(d_inode(dentry->d_parent));
+	else
+		inode_unlock(d_inode(dentry->d_parent));
 	if (d_in_lookup(dentry)) {
 		found = d_alloc_parallel(dentry->d_parent, name);
 		if (IS_ERR(found) || !d_in_lookup(found)) {
@@ -2332,6 +2344,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 			return ERR_PTR(-ENOMEM);
 		}
 	}
+	if (shared)
+		inode_lock_shared(d_inode(dentry->d_parent));
+	else
+		inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
 	res = d_splice_alias(inode, found);
 	if (res) {
 		d_lookup_done(found);
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 10894de519c3..30dddef43300 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -8,6 +8,7 @@
 
 #include <linux/exportfs.h>
 #include <linux/iversion.h>
+#include <linux/namei.h> // for LOOKUP_SHARED
 
 #include "ntfs.h"
 #include "time.h"
@@ -310,7 +311,8 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
 		}
 		nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
 
-		dent = d_add_ci(dent, dent_inode, &nls_name);
+		dent = d_add_ci(dent, dent_inode, &nls_name,
+				!!(flags & LOOKUP_SHARED));
 		kfree(nls_name.name);
 		return dent;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 325c2200c501..f03d832f1468 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -35,6 +35,7 @@
 #include <linux/security.h>
 #include <linux/iversion.h>
 #include <linux/fiemap.h>
+#include <linux/namei.h> // for LOOKUP_SHARED
 
 /*
  * Directories have different lock order w.r.t. mmap_lock compared to regular
@@ -369,7 +370,7 @@ xfs_vn_ci_lookup(
 	/* else case-insensitive match... */
 	dname.name = ci_name.name;
 	dname.len = ci_name.len;
-	dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+	dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));
 	kfree(ci_name.name);
 	return dentry;
 }
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3991f9988599..662b98185337 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -263,7 +263,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
 extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
 					  const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+				bool);
 extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
 			const struct qstr *name);
 extern struct dentry *d_find_any_alias(struct inode *inode);
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling.
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (7 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27 10:10   ` Amir Goldstein
  2026-04-27  4:01 ` [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return() NeilBrown
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

lookup_one() is expected to be removed as it does not fit well with
proposed changes to directory locking.
Specifically d_alloc_parallel() will be ordered outside of i_rwsem
and as iterate_shared() is called with i_rwsem held it is not safe
to call d_alloc_parallel().

We can instead call d_alloc_noblock() and then call the ->lookup, but
that can fail if there is a lookup attempt concurrent with the
readdir().

ovl cannot afford for the lookup to fail as that could produce incorrect
results, and it cannot safely drop i_rwsem temporarily and that could
introduce races with handling of the directory cache.

Instead we rely on the fact that ovl_iterate() has an exclusive lock on
the directory, so any concurrent lookup will wait for the ovl_iterate()
call to complete.  We allocate a separate dentry and if the lookup is
successful, it is hashed with the result.

When the concurrent lookup gets i_rwsem it mustn't do its own lookup -
it must use the existing dentry.  This is found, if it exists, using
try_lookup_noperm().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/overlayfs/namei.c   | 12 ++++++++++++
 fs/overlayfs/readdir.c | 24 ++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index ca899fdfaafd..69032eb2b1e2 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1385,6 +1385,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct ovl_entry *poe = OVL_E(dentry->d_parent);
 	bool check_redirect = (ovl_redirect_follow(ofs) || ofs->numdatalayer);
+	struct dentry *alias;
 	int err;
 	struct ovl_lookup_ctx ctx = {
 		.dentry = dentry,
@@ -1399,6 +1400,17 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > ofs->namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	/*
+	 * The existance of this in-lookup dentry might have forced
+	 * readdir to do the lookup with a new dentry.  If so we must
+	 * return that one.
+	 */
+	alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
+					    dentry->d_name.len),
+				  dentry->d_parent);
+	if (alias && !IS_ERR(alias))
+		return alias;
+
 	with_ovl_creds(dentry->d_sb)
 		err = ovl_lookup_layers(&ctx, &d);
 
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1dcc75b3a90f..e03b32491941 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -574,8 +574,28 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 		}
 	}
 	/* This checks also for xwhiteouts */
-	this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
-	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
+	this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
+	if (this == ERR_PTR(-EWOULDBLOCK)) {
+		/*
+		 * Some other thead is looking up this name and will
+		 * block on i_rwsem before it can complete the lookup.
+		 * We will do the lookup in a new dentry and when that
+		 * lookup gets a turn it will find and return this
+		 * dentry.
+		 */
+		this = d_alloc_name(dir, p->name);
+	}
+	if (!IS_ERR(this) && !d_unhashed(this)) {
+		/* Either we got an in-lookup or we made our own unhashed */
+		struct dentry *alias = ovl_lookup(dir->d_inode, this, 0);
+
+		if (alias) {
+			d_lookup_done(this);
+			dput(this);
+			this = alias;
+		}
+	}
+	if (IS_ERR(this) || !this->d_inode) {
 		/* Mark a stale entry */
 		p->is_whiteout = true;
 		if (IS_ERR(this)) {
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (8 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  9:40   ` Amir Goldstein
  2026-04-27  4:01 ` [PATCH v3 11/19] efivarfs: use d_alloc_name() NeilBrown
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

ovl_lookup currently needs to check if a dentry with the same name has
already been added to the dcache as readdir might need to do.  This
is an unnecessary performance cost to manage a rare race.

If ovl could know which in-lookup dentries raced with readdir, it could
limit the extra lookup to just those.

So add d_alloc_noblock_return() which provides the in-lookup dentry when
it returns -EWOULDBLOCK.

ovl_readdir() can use this this and flag the dentry such that
ovl_lookup() and easily check if a repeat lookup is needed.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/dcache.c              | 50 ++++++++++++++++++++++++++++++++++++----
 fs/overlayfs/namei.c     | 23 ++++++++++--------
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/readdir.c   |  7 ++++--
 include/linux/dcache.h   |  2 ++
 5 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a2ddfe811df3..2f11257b725b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2749,7 +2749,8 @@ enum alloc_para {
 static inline
 struct dentry *__d_alloc_parallel(struct dentry *parent,
 				  const struct qstr *name,
-				  enum alloc_para how)
+				  enum alloc_para how,
+				  struct dentry **dentryp)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2836,7 +2837,10 @@ struct dentry *__d_alloc_parallel(struct dentry *parent,
 			case ALLOC_PARA_FAIL:
 				spin_unlock(&dentry->d_lock);
 				dput(new);
-				dput(dentry);
+				if (dentryp)
+					*dentryp = dentry;
+				else
+					dput(dentry);
 				return ERR_PTR(-EWOULDBLOCK);
 			case ALLOC_PARA_WAIT:
 				wait_var_event_spinlock(&dentry->d_flags,
@@ -2899,7 +2903,7 @@ struct dentry *__d_alloc_parallel(struct dentry *parent,
 struct dentry *d_alloc_parallel(struct dentry *parent,
 				const struct qstr *name)
 {
-	return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
+	return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT, NULL);
 }
 EXPORT_SYMBOL(d_alloc_parallel);
 
@@ -2931,11 +2935,49 @@ struct dentry *d_alloc_noblock(struct dentry *parent,
 
 	de = try_lookup_noperm(name, parent);
 	if (!de)
-		de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
+		de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL, NULL);
 	return de;
 }
 EXPORT_SYMBOL(d_alloc_noblock);
 
+/**
+ * d_alloc_noblock_return() - find or allocate a new dentry
+ * @parent - dentry of the parent
+ * @name   - name of the dentry within that parent.
+ * @dentryp - place to store the blocking dentry
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup() then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_noblock()
+ * returns with error %-EWOULDBLOCK and the blocking dentry is passed
+ * in @dentryp.  The dentry must be dput() by the caller.
+ *
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name need not already have ->hash set.
+ *
+ * Returns: the dentry, whether found or allocated, or an error
+ *    %-ENOMEM, %-EWOULDBLOCK, and anything returned by ->d_hash().
+ */
+struct dentry *d_alloc_noblock_return(struct dentry *parent,
+				      struct qstr *name,
+				      struct dentry **dentryp)
+{
+	struct dentry *de;
+
+	de = try_lookup_noperm(name, parent);
+	if (!de)
+		de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL,
+					dentryp);
+	return de;
+}
+EXPORT_SYMBOL(d_alloc_noblock_return);
+
 /*
  * - Unhash the dentry
  * - Retrieve and clear the waitqueue head in dentry
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 69032eb2b1e2..524e661c3c8d 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1400,16 +1400,19 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > ofs->namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	/*
-	 * The existance of this in-lookup dentry might have forced
-	 * readdir to do the lookup with a new dentry.  If so we must
-	 * return that one.
-	 */
-	alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
-					    dentry->d_name.len),
-				  dentry->d_parent);
-	if (alias && !IS_ERR(alias))
-		return alias;
+	if (ovl_dentry_test_flag(OVL_E_RACED_READDIR, dentry)) {
+		ovl_dentry_clear_flag(OVL_E_RACED_READDIR, dentry);
+		/*
+		 * The existance of this in-lookup dentry might have
+		 * forced readdir to do the lookup with a new dentry.
+		 * If so we must return that one.
+		 */
+		alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
+						    dentry->d_name.len),
+					  dentry->d_parent);
+		if (alias && !IS_ERR(alias))
+			return alias;
+	}
 
 	with_ovl_creds(dentry->d_sb)
 		err = ovl_lookup_layers(&ctx, &d);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index b75df37f70ac..bd6f1a25aca1 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -71,6 +71,8 @@ enum ovl_entry_flag {
 	OVL_E_CONNECTED,
 	/* Lower stack may contain xwhiteout entries */
 	OVL_E_XWHITEOUTS,
+	/* dentry was found in-lookup during readdir */
+	OVL_E_RACED_READDIR,
 };
 
 enum {
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index e03b32491941..e483bd939a8c 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -553,7 +553,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 {
 	struct dentry *dir = path->dentry;
 	struct ovl_fs *ofs = OVL_FS(dir->d_sb);
-	struct dentry *this = NULL;
+	struct dentry *this = NULL, *in_lookup;
 	enum ovl_path_type type;
 	u64 ino = p->real_ino;
 	int xinobits = ovl_xino_bits(ofs);
@@ -574,7 +574,8 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 		}
 	}
 	/* This checks also for xwhiteouts */
-	this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
+	this = d_alloc_noblock_return(dir, &QSTR_LEN(p->name, p->len),
+				      &in_lookup);
 	if (this == ERR_PTR(-EWOULDBLOCK)) {
 		/*
 		 * Some other thead is looking up this name and will
@@ -583,6 +584,8 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
 		 * lookup gets a turn it will find and return this
 		 * dentry.
 		 */
+		ovl_dentry_set_flag(OVL_E_RACED_READDIR, in_lookup);
+		dput(in_lookup);
 		this = d_alloc_name(dir, p->name);
 	}
 	if (!IS_ERR(this) && !d_unhashed(this)) {
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 662b98185337..db7cdcbac775 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -258,6 +258,8 @@ extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
+extern struct dentry * d_alloc_noblock_return(struct dentry *, struct qstr *,
+					      struct dentry **);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 struct dentry *d_duplicate(struct dentry *dentry);
 /* weird procfs mess; *NOT* exported */
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 11/19] efivarfs: use d_alloc_name()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (9 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 12/19] shmem: use d_duplicate() NeilBrown
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

efivarfs() is one of the few remaining users of d_alloc().
Other similar filesystems use d_alloc_name() in the same circumstances.
Now that d_alloc_name() supports ->d_hash (providing that it never
fails), change efivarfs to use that.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/efivarfs/super.c | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 1c5224cf183e..232d9757804c 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -189,26 +189,6 @@ static const struct dentry_operations efivarfs_d_ops = {
 	.d_hash = efivarfs_d_hash,
 };
 
-static struct dentry *efivarfs_alloc_dentry(struct dentry *parent, char *name)
-{
-	struct dentry *d;
-	struct qstr q;
-	int err;
-
-	q.name = name;
-	q.len = strlen(name);
-
-	err = efivarfs_d_hash(parent, &q);
-	if (err)
-		return ERR_PTR(err);
-
-	d = d_alloc(parent, &q);
-	if (d)
-		return d;
-
-	return ERR_PTR(-ENOMEM);
-}
-
 bool efivarfs_variable_is_present(efi_char16_t *variable_name,
 				  efi_guid_t *vendor, void *data)
 {
@@ -263,9 +243,9 @@ static int efivarfs_create_dentry(struct super_block *sb, efi_char16_t *name16,
 	memcpy(entry->var.VariableName, name16, name_size);
 	memcpy(&(entry->var.VendorGuid), &vendor, sizeof(efi_guid_t));
 
-	dentry = efivarfs_alloc_dentry(root, name);
-	if (IS_ERR(dentry)) {
-		err = PTR_ERR(dentry);
+	dentry = d_alloc_name(root, name);
+	if (!dentry) {
+		err = -ENOMEM;
 		goto fail_inode;
 	}
 
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 12/19] shmem: use d_duplicate()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (10 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 11/19] efivarfs: use d_alloc_name() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

To prepare for d_alloc_parallel() being permitted without a directory
lock, use d_duplicate() when duplicating a dentry in order to create a
whiteout.

Signed-off-by: NeilBrown <neil@brown.name>
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3b5dc21b323c..8b0d2340dee7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4006,11 +4006,12 @@ static int shmem_whiteout(struct mnt_idmap *idmap,
 	struct dentry *whiteout;
 	int error;
 
-	whiteout = d_alloc(old_dentry->d_parent, &old_dentry->d_name);
+	whiteout = d_duplicate(old_dentry);
 	if (!whiteout)
 		return -ENOMEM;
 	error = shmem_mknod(idmap, old_dir, whiteout,
 			    S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
+	d_lookup_done(whiteout);
 	dput(whiteout);
 	return error;
 }
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (11 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 12/19] shmem: use d_duplicate() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link() NeilBrown
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

It is important that two non-create NFS "open"s of a negative dentry
don't race.  They both have only a shared lock on i_rwsem and so could
run concurrently, but they might both try to call d_splice_alias() at
the same time which is confusing at best.

nfs_atomic_open() currently avoids this by discarding the negative
dentry and creating a new one using d_alloc_parallel().  Only one thread
can successfully get the d_in_lookup() dentry, the other will wait for
the first to finish, and can use the result of that first lookup.

A proposed locking change inverts the order between i_rwsem and
d_alloc_parallel() so it will not be safe to call d_alloc_parallel()
while holding i_rwsem - even shared.

We can achieve the same effect by causing ->d_revalidate to invalidate a
negative dentry when LOOKUP_OPEN is set.  Doing this is consistent with
the "close to open" caching semantics of NFS which requires the server
to be queried whenever opening a file - cached information must not be
trusted.

With this change to ->d_revaliate (implemented in nfs_neg_need_reval) we
can be sure that we have exclusive access to any dentry that reaches
nfs_atomic_open().  Either O_CREAT was requested and so the parent is
locked exclusively, or the dentry will have DCACHE_PAR_LOOKUP set.

This means that the d_drop() and d_alloc_parallel() calls in
nfs_atomic_lookup() are no longer needed to provide exclusion

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 30 +++++++-----------------------
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9580af999d70..0791fc2d161b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1656,6 +1656,13 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 {
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
+	if (flags & LOOKUP_OPEN)
+		/* close-to-open semantics require we go to server
+		 * on each open.  By invalidating the dentry we
+		 * also ensure nfs_atomic_open() always has exclusive
+		 * access to the dentry.
+		 */
+		return 0;
 	if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONEG)
 		return 1;
 	/* Case insensitive server? Revalidate negative dentries */
@@ -2111,7 +2118,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct inode *inode;
 	unsigned int lookup_flags = 0;
 	unsigned long dir_verifier;
-	bool switched = false;
 	int created = 0;
 	int err;
 
@@ -2156,17 +2162,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		attr.ia_size = 0;
 	}
 
-	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
-		d_drop(dentry);
-		switched = true;
-		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name);
-		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
-		if (unlikely(!d_in_lookup(dentry)))
-			return finish_no_open(file, dentry);
-	}
-
 	ctx = create_nfs_open_context(dentry, open_flags, file);
 	err = PTR_ERR(ctx);
 	if (IS_ERR(ctx))
@@ -2209,10 +2204,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 	put_nfs_open_context(ctx);
 out:
-	if (unlikely(switched)) {
-		d_lookup_done(dentry);
-		dput(dentry);
-	}
 	return err;
 
 no_open:
@@ -2235,13 +2226,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 			res = ERR_PTR(-EOPENSTALE);
 		}
 	}
-	if (switched) {
-		d_lookup_done(dentry);
-		if (!res)
-			res = dentry;
-		else
-			dput(dentry);
-	}
 	return finish_no_open(file, res);
 }
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (12 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias() NeilBrown
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

When filename_linkat() calls filename_create() which ultimately
calls ->lookup, the flags LOOKUP_CREATE|LOOKUP_EXCL are passed.
nfs_lookup() treats this as an exclusive create (which it is) and
skips the ->lookup, leaving the dentry unchanged.

Currently that means nfs_link() can get a hashed dentry (if the name was
already in the cache) or an unhashed dentry (if it wasn't). As none of
d_add(), d_instantiate(), d_splice_alias() could handle both of these,
nfs_link() calls d_drop() and then then d_add().

Recent changes to d_splice_alias() mean that it *can* work with either
hashed or unhashed dentries.  Future changes to locking mean that it
will be unsafe to d_drop() a dentry while an operation (in this case
"link()") is still ongoing.

So change to use d_splice_alias(), and not to d_drop() until an error is
detected (as in that case was can't be sure what is actually on the server).

Also update the comment for nfs_is_exclusive_create() to note that
link(), mkdir(), mknod(), symlink() all appear as exclusive creates.
Those other than link() already used d_splice_alias() via
nfs_add_or_obtain().

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 0791fc2d161b..2c1315a02e52 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1570,6 +1570,9 @@ static int nfs_check_verifier(struct inode *dir, struct dentry *dentry,
 /*
  * Use intent information to check whether or not we're going to do
  * an O_EXCL create using this path component.
+ * Note that link(), mkdir(), mknod(), symlink() all appear as
+ * exclusive creation.  Regular file creation could be distinguished
+ * with LOOKUP_OPEN.
  */
 static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
 {
@@ -2676,14 +2679,15 @@ nfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
 		old_dentry, dentry);
 
 	trace_nfs_link_enter(inode, dir, dentry);
-	d_drop(dentry);
 	if (S_ISREG(inode->i_mode))
 		nfs_sync_inode(inode);
 	error = NFS_PROTO(dir)->link(inode, dir, &dentry->d_name);
 	if (error == 0) {
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 		ihold(inode);
-		d_add(dentry, inode);
+		d_splice_alias(inode, dentry);
+	} else {
+		d_drop(dentry);
 	}
 	trace_nfs_link_exit(inode, dir, dentry, error);
 	return error;
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (13 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

nfs_add_or_obtain() is used, often via nfs_instantiate(), to attach a
newly created inode to the appropriate dentry - or to provide an
alternate dentry.
It has to drop the dentry first, which is problematic for proposed
locking changes.

As d_splice_alias() now works with hashed dentries, the d_drop() is no
longer needed.

However we still d_drop() on error as the status of the name is
uncertain.

nfs_open_and_get_state() is only used for files so we should be able to
use d_instantiate().  However as that depends on the server for
correctness, it is safer to stay with the current code pattern and use
d_splice_alias() there too.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c      | 3 +--
 fs/nfs/nfs4proc.c | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2c1315a02e52..e1d56400fc6a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2329,8 +2329,6 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	struct dentry *d;
 	int error;
 
-	d_drop(dentry);
-
 	if (fhandle->size == 0) {
 		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
 					       fhandle, fattr);
@@ -2351,6 +2349,7 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	dput(parent);
 	return d;
 out_error:
+	d_drop(dentry);
 	d = ERR_PTR(error);
 	goto out;
 }
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a9b8d482d289..185c933fb54c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3099,7 +3099,6 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
 	nfs_set_verifier(dentry, dir_verifier);
 	if (d_really_is_negative(dentry)) {
 		struct dentry *alias;
-		d_drop(dentry);
 		alias = d_splice_alias(igrab(state->inode), dentry);
 		/* d_splice_alias() can't fail here - it's a non-directory */
 		if (alias) {
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in atomic_create.
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (14 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

When atomic_create fails with -ENOENT we currently d_drop() the dentry
and then re-add it (d_splice_alias()) with a NULL inode.
This drop-and-re-add will not work with proposed locking changes.

As d_splice_alias() now supports hashed dentries, we don't need the
d_drop() until it is determined that some other error has occurred.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e1d56400fc6a..2e8389968317 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2178,7 +2178,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
-		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
 			if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
@@ -2187,7 +2186,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 				dir_verifier = nfs_save_change_attribute(dir);
 			nfs_set_verifier(dentry, dir_verifier);
 			d_splice_alias(NULL, dentry);
-			break;
+			goto out;
 		case -EISDIR:
 		case -ENOTDIR:
 			goto no_open;
@@ -2199,6 +2198,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		default:
 			break;
 		}
+		d_drop(dentry);
 		goto out;
 	}
 	file->f_mode |= FMODE_CAN_ODIRECT;
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (15 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename NeilBrown
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

NFS uses the results of readdir to prime the dcache.  Using
d_alloc_parallel() can block if there is a concurrent lookup.  Blocking
in that case is pointless as the lookup will add info to the dcache and
there is no value in the readdir waiting to see if it should add the
info too.

Also this call to d_alloc_parallel() is made while the parent
directory is locked.  A proposed change to locking will lock the parent
later, after d_alloc_parallel().  This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.

So change to use d_alloc_noblock(), which removes the need for
calculating the hash or doing a preliminary lookup.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2e8389968317..ee4b9b1a484f 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -749,15 +749,12 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		if (filename.len == 2 && filename.name[1] == '.')
 			return;
 	}
-	filename.hash = full_name_hash(parent, filename.name, filename.len);
 
-	dentry = d_lookup(parent, &filename);
 again:
-	if (!dentry) {
-		dentry = d_alloc_parallel(parent, &filename);
-		if (IS_ERR(dentry))
-			return;
-	}
+	dentry = d_alloc_noblock(parent, &filename);
+	if (IS_ERR(dentry))
+		return;
+
 	if (!d_in_lookup(dentry)) {
 		/* Is there a mountpoint here? If so, just exit */
 		if (!nfs_fsid_equal(&NFS_SB(dentry->d_sb)->fsid,
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (16 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  4:01 ` [PATCH v3 19/19] nfs: use d_duplicate() NeilBrown
  2026-04-27  8:42 ` [syzbot ci] Re: Prepare to lift lookup out of exclusive lock for directory ops syzbot ci
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

Rather than performing a normal lookup (which will be awkward with
future locking changes) use d_alloc_noblock() to find a dentry for an
unused name, and then open-code the rest of lookup_slow() to see if it
is free on the server.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/unlink.c | 51 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 43ea897943c0..1ac9bd2a63b2 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -445,7 +445,7 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 	static unsigned int sillycounter;
 	unsigned char silly[SILLYNAME_LEN + 1];
 	unsigned long long fileid;
-	struct dentry *sdentry;
+	struct dentry *sdentry, *old;
 	struct inode *inode = d_inode(dentry);
 	struct rpc_task *task;
 	int            error = -EBUSY;
@@ -462,26 +462,37 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 
 	fileid = NFS_FILEID(d_inode(dentry));
 
-	sdentry = NULL;
-	do {
+newname:
+	sillycounter++;
+	scnprintf(silly, sizeof(silly),
+		  SILLYNAME_PREFIX "%0*llx%0*x",
+		  SILLYNAME_FILEID_LEN, fileid,
+		  SILLYNAME_COUNTER_LEN, sillycounter);
+
+	dfprintk(VFS, "NFS: trying to rename %pd to %s\n", dentry, silly);
+	sdentry = d_alloc_noblock(dentry->d_parent, &QSTR(silly));
+	if (sdentry == ERR_PTR(-EWOULDBLOCK))
+		/* Name currently being looked up */
+		goto newname;
+	/*
+	 * N.B. Better to return EBUSY here ... it could be
+	 * dangerous to delete the file while it's in use.
+	 */
+	if (IS_ERR(sdentry))
+		goto out;
+	if (d_is_positive(sdentry)) {
 		dput(sdentry);
-		sillycounter++;
-		scnprintf(silly, sizeof(silly),
-			  SILLYNAME_PREFIX "%0*llx%0*x",
-			  SILLYNAME_FILEID_LEN, fileid,
-			  SILLYNAME_COUNTER_LEN, sillycounter);
-
-		dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
-				dentry, silly);
-
-		sdentry = lookup_noperm(&QSTR(silly), dentry->d_parent);
-		/*
-		 * N.B. Better to return EBUSY here ... it could be
-		 * dangerous to delete the file while it's in use.
-		 */
-		if (IS_ERR(sdentry))
-			goto out;
-	} while (d_inode(sdentry) != NULL); /* need negative lookup */
+		goto newname;
+	}
+	/* This name isn't known locally - check on server */
+	old = dir->i_op->lookup(dir, sdentry, 0);
+	d_lookup_done(sdentry);
+	if (old || d_is_positive(sdentry)) {
+		if (!IS_ERR(old))
+			dput(old);
+		dput(sdentry);
+		goto newname;
+	}
 
 	ihold(inode);
 
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 19/19] nfs: use d_duplicate()
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (17 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename NeilBrown
@ 2026-04-27  4:01 ` NeilBrown
  2026-04-27  8:42 ` [syzbot ci] Re: Prepare to lift lookup out of exclusive lock for directory ops syzbot ci
  19 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  4:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Amir Goldstein, Jeremy Kerr, Ard Biesheuvel
  Cc: linux-efi, linux-fsdevel, linux-nfs, linux-unionfs, linux-kernel

From: NeilBrown <neil@brown.name>

As preparation for d_alloc_parallel() being allowed without the
directory locked, use d_duplicate() to duplicate a dentry for silly
rename.

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ee4b9b1a484f..dd48dea8bc40 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2778,11 +2778,9 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			spin_unlock(&new_dentry->d_lock);
 
 			/* copy the target dentry's name */
-			dentry = d_alloc(new_dentry->d_parent,
-					 &new_dentry->d_name);
+			dentry = d_duplicate(new_dentry);
 			if (!dentry)
 				goto out;
-
 			/* silly-rename the existing target ... */
 			err = nfs_sillyrename(new_dir, new_dentry);
 			if (err)
@@ -2847,8 +2845,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		nfs_dentry_handle_enoent(old_dentry);
 
 	/* new dentry created? */
-	if (dentry)
+	if (dentry) {
+		d_lookup_done(dentry);
 		dput(dentry);
+	}
 	return error;
 }
 EXPORT_SYMBOL_GPL(nfs_rename);
-- 
2.50.0.107.gf914562f5916.dirty


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
  2026-04-27  4:01 ` [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag NeilBrown
@ 2026-04-27  7:43   ` Amir Goldstein
  2026-04-27  8:47     ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2026-04-27  7:43 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, Apr 27, 2026 at 6:06 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> Some ->lookup handlers will need to drop and retake the parent lock, so
> they can safely use d_alloc_parallel().
>
> ->lookup can be called with the parent lock either exclusive or shared.
>
> A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
>
> This is rather ugly, but will be gone soon after we move
> d_alloc_parallel() out of the directory lock as ->lookup() will *always*
> called with a shared lock on the parent.

Neil,

Forgive me for being skeptical about the *always* part.

How long ago did we add ->iterate_shared()?

It's true that Linus eventually got rid of ->iterate(), but we did not
get rid of the assumption that iterate_shared() might be upgraded
to exclusive lock.

The obvious reason is that *someone* needs to do this work for
old filesystems, which are also hard to test and nobody wants to
touch them.

I have nothing against this patch, but I think it is more realistic
to state that LOOKUP_SHARED is here to stay, so if you think it
is too ugly, maybe there is something to be done about it.
Personally, I do not see the ugliness though.

Am I misjudging the situation of shared lookup wrt old filesystems?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci()
  2026-04-27  4:01 ` [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
@ 2026-04-27  7:49   ` Amir Goldstein
  2026-04-27  8:48     ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2026-04-27  7:49 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, Apr 27, 2026 at 6:07 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> A proposed change will invert the lock ordering between
> d_alloc_parallel() and inode_lock() on the parent.
> When that happens it will not be safe to call d_alloc_parallel() while
> holding the parent lock - even shared.
>
> We don't need to keep the parent lock held when d_add_ci() is run - the
> VFS doesn't need it as dentry is exclusively held due to
> DCACHE_PAR_LOOKUP and the filesystem has finished its work.
>
> So drop and reclaim the lock (shared or exclusive as determined by
> LOOKUP_SHARED) to avoid future deadlock.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/dcache.c            | 18 +++++++++++++++++-
>  fs/ntfs/namei.c        |  4 +++-
>  fs/xfs/xfs_iops.c      |  3 ++-
>  include/linux/dcache.h |  3 ++-
>  4 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 569a8ddf4c0d..a2ddfe811df3 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2294,6 +2294,7 @@ EXPORT_SYMBOL(d_obtain_root);
>   * @dentry: the negative dentry that was passed to the parent's lookup func
>   * @inode:  the inode case-insensitive lookup has found
>   * @name:   the case-exact name to be associated with the returned dentry
> + * @bool:   %true if lookup was performed with LOOKUP_SHARED

I do not understand the choice of API.
Seems better to pass the lookup flags and avoid exposing this very internal
logic in the call sites...

>   *
>   * This is to avoid filling the dcache with case-insensitive names to the
>   * same inode, only the actual correct case is stored in the dcache for
> @@ -2306,7 +2307,7 @@ EXPORT_SYMBOL(d_obtain_root);
>   * the exact case, and return the spliced entry.
>   */
>  struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
> -                       struct qstr *name)
> +                       struct qstr *name, bool shared)
>  {
>         struct dentry *found, *res;
>
> @@ -2319,6 +2320,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
>                 iput(inode);
>                 return found;
>         }
> +       /*
> +        * We are holding parent lock and so don't want to wait for a
> +        * d_in_lookup() dentry.  We can safely drop the parent lock and
> +        * reclaim it as we have exclusive access to dentry as it is
> +        * d_in_lookup() (so ->d_parent is stable) and we are near the
> +        * end ->lookup() and will shortly drop the lock anyway.
> +        */
> +       if (shared)
> +               inode_unlock_shared(d_inode(dentry->d_parent));
> +       else
> +               inode_unlock(d_inode(dentry->d_parent));
>         if (d_in_lookup(dentry)) {
>                 found = d_alloc_parallel(dentry->d_parent, name);
>                 if (IS_ERR(found) || !d_in_lookup(found)) {
> @@ -2332,6 +2344,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
>                         return ERR_PTR(-ENOMEM);
>                 }
>         }
> +       if (shared)
> +               inode_lock_shared(d_inode(dentry->d_parent));
> +       else
> +               inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
>         res = d_splice_alias(inode, found);
>         if (res) {
>                 d_lookup_done(found);
> diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
> index 10894de519c3..30dddef43300 100644
> --- a/fs/ntfs/namei.c
> +++ b/fs/ntfs/namei.c
> @@ -8,6 +8,7 @@
>
>  #include <linux/exportfs.h>
>  #include <linux/iversion.h>
> +#include <linux/namei.h> // for LOOKUP_SHARED

... this won't be needed

>
>  #include "ntfs.h"
>  #include "time.h"
> @@ -310,7 +311,8 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
>                 }
>                 nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
>
> -               dent = d_add_ci(dent, dent_inode, &nls_name);
> +               dent = d_add_ci(dent, dent_inode, &nls_name,
> +                               !!(flags & LOOKUP_SHARED));
>                 kfree(nls_name.name);
>                 return dent;
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 325c2200c501..f03d832f1468 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -35,6 +35,7 @@
>  #include <linux/security.h>
>  #include <linux/iversion.h>
>  #include <linux/fiemap.h>
> +#include <linux/namei.h> // for LOOKUP_SHARED
>
>  /*
>   * Directories have different lock order w.r.t. mmap_lock compared to regular
> @@ -369,7 +370,7 @@ xfs_vn_ci_lookup(
>         /* else case-insensitive match... */
>         dname.name = ci_name.name;
>         dname.len = ci_name.len;
> -       dentry = d_add_ci(dentry, VFS_I(ip), &dname);
> +       dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));


... and this ugliness could be avoided.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [syzbot ci] Re: Prepare to lift lookup out of exclusive lock for directory ops
  2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
                   ` (18 preceding siblings ...)
  2026-04-27  4:01 ` [PATCH v3 19/19] nfs: use d_duplicate() NeilBrown
@ 2026-04-27  8:42 ` syzbot ci
  19 siblings, 0 replies; 28+ messages in thread
From: syzbot ci @ 2026-04-27  8:42 UTC (permalink / raw)
  To: amir73il, anna, ardb, brauner, jack, jk, jlayton, linux-efi,
	linux-fsdevel, linux-kernel, linux-nfs, linux-unionfs, miklos,
	neilb, torvalds, trondmy, viro
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v3] Prepare to lift lookup out of exclusive lock for directory ops
https://lore.kernel.org/all/20260427040517.828226-1-neilb@ownmail.net
* [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc
* [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup dentries
* [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash
* [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel()
* [PATCH v3 05/19] VFS: introduce d_alloc_noblock()
* [PATCH v3 06/19] VFS: add d_duplicate()
* [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
* [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci()
* [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling.
* [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return()
* [PATCH v3 11/19] efivarfs: use d_alloc_name()
* [PATCH v3 12/19] shmem: use d_duplicate()
* [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open()
* [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link()
* [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias()
* [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in atomic_create.
* [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
* [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename
* [PATCH v3 19/19] nfs: use d_duplicate()

and found the following issues:
* KASAN: slab-out-of-bounds Read in __dentry_kill
* WARNING in __d_instantiate

Full report is available here:
https://ci.syzbot.org/series/9ec82ecc-cc80-4fe2-b595-e5c9d7c49aae

***

KASAN: slab-out-of-bounds Read in __dentry_kill

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      254f49634ee16a731174d2ae34bc50bd5f45e731
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/01f9ecd4-e5ae-4e96-aff5-2636e33c0528/config
syz repro: https://ci.syzbot.org/findings/ef8289e5-b522-4c69-bed5-f7be42e035c2/syz_repro

overlayfs: "xino" feature enabled using 3 upper inode bits.
==================================================================
BUG: KASAN: slab-out-of-bounds in __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
BUG: KASAN: slab-out-of-bounds in _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:166
Read of size 1 at addr ffff888119c6e5b8 by task syz.0.17/5869

CPU: 0 UID: 0 PID: 5869 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 __kasan_check_byte+0x2a/0x40 mm/kasan/common.c:574
 kasan_check_byte include/linux/kasan.h:402 [inline]
 lock_acquire+0x84/0x350 kernel/locking/lockdep.c:5842
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
 _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:166
 complete_with_flags kernel/sched/completion.c:25 [inline]
 complete+0x28/0x1b0 kernel/sched/completion.c:52
 d_complete_waiters fs/dcache.c:651 [inline]
 dentry_unlist fs/dcache.c:664 [inline]
 __dentry_kill+0x552/0x690 fs/dcache.c:733
 finish_dput+0xc9/0x480 fs/dcache.c:928
 ovl_cache_update+0x68e/0xc30 fs/overlayfs/readdir.c:643
 ovl_iterate_merged fs/overlayfs/readdir.c:882 [inline]
 ovl_iterate+0x686/0x21a0 fs/overlayfs/readdir.c:930
 wrap_directory_iterator+0x96/0xe0 fs/readdir.c:67
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7aa699cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7aa780a028 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
RAX: ffffffffffffffda RBX: 00007f7aa6c15fa0 RCX: 00007f7aa699cdd9
RDX: 0000000000001000 RSI: 0000200000000400 RDI: 0000000000000003
RBP: 00007f7aa6a32d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7aa6c16038 R14: 00007f7aa6c15fa0 R15: 00007ffce28f8638
 </TASK>

Allocated by task 5869:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4569 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 kmem_cache_alloc_lru_noprof+0x2b8/0x640 mm/slub.c:4917
 __d_alloc+0x37/0x6f0 fs/dcache.c:1808
 __d_alloc_parallel+0xe3/0x1660 fs/dcache.c:2758
 ovl_cache_update+0x2c4/0xc30 fs/overlayfs/readdir.c:577
 ovl_iterate_merged fs/overlayfs/readdir.c:882 [inline]
 ovl_iterate+0x686/0x21a0 fs/overlayfs/readdir.c:930
 wrap_directory_iterator+0x96/0xe0 fs/readdir.c:67
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3131 [inline]
 call_rcu+0xee/0x890 kernel/rcu/tree.c:3251
 __dentry_kill+0x4a9/0x690 fs/dcache.c:738
 finish_dput+0xc9/0x480 fs/dcache.c:928
 ovl_cache_update+0x68e/0xc30 fs/overlayfs/readdir.c:643
 ovl_iterate_merged fs/overlayfs/readdir.c:882 [inline]
 ovl_iterate+0x686/0x21a0 fs/overlayfs/readdir.c:930
 wrap_directory_iterator+0x96/0xe0 fs/readdir.c:67
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888119c6e468
 which belongs to the cache dentry of size 312
The buggy address is located 24 bytes to the right of
 allocated 312-byte region [ffff888119c6e468, ffff888119c6e5a0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x119c6e
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888119c6fed9
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888160417140 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800150015 00000000f5000000 ffff888119c6fed9
head: 017ff00000000040 ffff888160417140 dead000000000100 dead000000000122
head: 0000000000000000 0000000800150015 00000000f5000000 ffff888119c6fed9
head: 017ff00000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 1, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_RECLAIMABLE|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5645, tgid 5645 (syz-executor), ts 69015847006, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x231/0x280 mm/page_alloc.c:1858
 prep_new_page mm/page_alloc.c:1866 [inline]
 get_page_from_freelist+0x24ba/0x2540 mm/page_alloc.c:3946
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5226
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7251
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4651
 alloc_from_pcs mm/slub.c:4749 [inline]
 slab_alloc_node mm/slub.c:4883 [inline]
 kmem_cache_alloc_lru_noprof+0x37c/0x640 mm/slub.c:4917
 __d_alloc+0x37/0x6f0 fs/dcache.c:1808
 d_alloc+0x4b/0x190 fs/dcache.c:1887
 lookup_one_qstr_excl+0xd8/0x360 fs/namei.c:1801
 __start_dirop fs/namei.c:2915 [inline]
 start_dirop fs/namei.c:2937 [inline]
 filename_create+0x20e/0x370 fs/namei.c:4949
 filename_mkdirat+0xd2/0x510 fs/namei.c:5286
 __do_sys_mkdirat fs/namei.c:5314 [inline]
 __se_sys_mkdirat+0x35/0x150 fs/namei.c:5311
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page_owner free stack trace missing

Memory state around the buggy address:
 ffff888119c6e480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888119c6e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888119c6e580: 00 00 00 00 fc fc fc fc fc fc fc fc fa fb fb fb
                                        ^
 ffff888119c6e600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888119c6e680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


***

WARNING in __d_instantiate

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      254f49634ee16a731174d2ae34bc50bd5f45e731
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/01f9ecd4-e5ae-4e96-aff5-2636e33c0528/config
syz repro: https://ci.syzbot.org/findings/f4cad05d-60bf-4656-a4e7-27c99f4f056b/syz_repro

loop2: detected capacity change from 0 to 16
erofs (device loop2): mounted with root inode @ nid 36.
------------[ cut here ]------------
d_in_lookup(dentry)
WARNING: fs/dcache.c:2112 at __d_instantiate+0x3ea/0x6e0 fs/dcache.c:2112, CPU#0: syz.2.19/5857
Modules linked in:
CPU: 0 UID: 0 PID: 5857 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__d_instantiate+0x3ea/0x6e0 fs/dcache.c:2112
Code: 03 41 80 f6 01 41 0f b6 ce c1 e1 0d 09 c1 89 0b 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7d 25 61 09 cc e8 87 21 7c ff 90 <0f> 0b 90 e9 7f fd ff ff e8 79 21 7c ff 41 81 cc 00 00 01 00 e9 34
RSP: 0018:ffffc900038274a0 EFLAGS: 00010293
RAX: ffffffff82498229 RBX: ffff888114bb8a48 RCX: ffff88810b5e9d80
RDX: 0000000000000000 RSI: 0000000001000000 RDI: 0000000000000000
RBP: 0000000001000000 R08: 0000000000000003 R09: 0000000000000004
R10: dffffc0000000000 R11: fffff52000704e8c R12: 0000000000280000
R13: dffffc0000000000 R14: ffff888113d61960 R15: ffff888113d61964
FS:  00007fadb922c6c0(0000) GS:ffff88818dc93000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fadb8272700 CR3: 000000016d53e000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 d_make_persistent+0x8e/0x180 fs/dcache.c:3068
 shmem_mknod+0x2ea/0x360 mm/shmem.c:3881
 shmem_whiteout mm/shmem.c:4012 [inline]
 shmem_rename2+0x265/0x430 mm/shmem.c:4052
 vfs_rename+0xa96/0xeb0 fs/namei.c:6053
 ovl_do_rename_rd fs/overlayfs/overlayfs.h:372 [inline]
 ovl_check_rename_whiteout fs/overlayfs/super.c:593 [inline]
 ovl_make_workdir fs/overlayfs/super.c:713 [inline]
 ovl_get_workdir fs/overlayfs/super.c:836 [inline]
 ovl_fill_super_creds fs/overlayfs/super.c:1449 [inline]
 ovl_fill_super+0x46b7/0x5e20 fs/overlayfs/super.c:1560
 vfs_get_super fs/super.c:1327 [inline]
 get_tree_nodev+0xbb/0x150 fs/super.c:1346
 vfs_get_tree+0x92/0x2a0 fs/super.c:1754
 fc_mount fs/namespace.c:1193 [inline]
 do_new_mount_fc fs/namespace.c:3758 [inline]
 do_new_mount+0x341/0xd30 fs/namespace.c:3834
 do_mount fs/namespace.c:4167 [inline]
 __do_sys_mount fs/namespace.c:4383 [inline]
 __se_sys_mount+0x31d/0x420 fs/namespace.c:4360
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fadb839cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fadb922c028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fadb8615fa0 RCX: 00007fadb839cdd9
RDX: 0000200000000340 RSI: 00002000000000c0 RDI: 0000000000000000
RBP: 00007fadb8432d69 R08: 0000200000000380 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fadb8616038 R14: 00007fadb8615fa0 R15: 00007ffe45be7c28
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
  2026-04-27  7:43   ` Amir Goldstein
@ 2026-04-27  8:47     ` NeilBrown
  2026-04-27  9:05       ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2026-04-27  8:47 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, 27 Apr 2026, Amir Goldstein wrote:
> On Mon, Apr 27, 2026 at 6:06 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > Some ->lookup handlers will need to drop and retake the parent lock, so
> > they can safely use d_alloc_parallel().
> >
> > ->lookup can be called with the parent lock either exclusive or shared.
> >
> > A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
> >
> > This is rather ugly, but will be gone soon after we move
> > d_alloc_parallel() out of the directory lock as ->lookup() will *always*
> > called with a shared lock on the parent.
> 
> Neil,
> 
> Forgive me for being skeptical about the *always* part.

Scepticism should always be encouraged.

> 
> How long ago did we add ->iterate_shared()?
> 
> It's true that Linus eventually got rid of ->iterate(), but we did not
> get rid of the assumption that iterate_shared() might be upgraded
> to exclusive lock.
> 
> The obvious reason is that *someone* needs to do this work for
> old filesystems, which are also hard to test and nobody wants to
> touch them.
> 
> I have nothing against this patch, but I think it is more realistic
> to state that LOOKUP_SHARED is here to stay, so if you think it
> is too ugly, maybe there is something to be done about it.
> Personally, I do not see the ugliness though.
> 
> Am I misjudging the situation of shared lookup wrt old filesystems?

Yes.
All filesystems must support ->lookup with a shared lock because it is
already the case that with a simple lookup only a shared lock is provided.
A filesystem *could* examine the lookup_flags and deduce if the lock is
actually exclusive (e.g.  if LOOKUP_CREATE is set) and misbehave
accordingly, but a vanishing small number of filesystems  (NFS and ....
I can't think of any others) ever look at the lookup_flags and I'm
certain none do anything so ... creative.

So I'm certain that it is safe from the filesystem's perspective to always
call with only a shared lock.  All that is needed is to change the VFS
to only ever do that.  That means separating the lock for lookup from
the lock for create/remove/move in directory ops, and one way to view
the current patch set (the complete one, not just this initial set) is
that it does exactly that.

Thanks for the review,
NeilBrown

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci()
  2026-04-27  7:49   ` Amir Goldstein
@ 2026-04-27  8:48     ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2026-04-27  8:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, 27 Apr 2026, Amir Goldstein wrote:
> On Mon, Apr 27, 2026 at 6:07 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > A proposed change will invert the lock ordering between
> > d_alloc_parallel() and inode_lock() on the parent.
> > When that happens it will not be safe to call d_alloc_parallel() while
> > holding the parent lock - even shared.
> >
> > We don't need to keep the parent lock held when d_add_ci() is run - the
> > VFS doesn't need it as dentry is exclusively held due to
> > DCACHE_PAR_LOOKUP and the filesystem has finished its work.
> >
> > So drop and reclaim the lock (shared or exclusive as determined by
> > LOOKUP_SHARED) to avoid future deadlock.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  fs/dcache.c            | 18 +++++++++++++++++-
> >  fs/ntfs/namei.c        |  4 +++-
> >  fs/xfs/xfs_iops.c      |  3 ++-
> >  include/linux/dcache.h |  3 ++-
> >  4 files changed, 24 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 569a8ddf4c0d..a2ddfe811df3 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -2294,6 +2294,7 @@ EXPORT_SYMBOL(d_obtain_root);
> >   * @dentry: the negative dentry that was passed to the parent's lookup func
> >   * @inode:  the inode case-insensitive lookup has found
> >   * @name:   the case-exact name to be associated with the returned dentry
> > + * @bool:   %true if lookup was performed with LOOKUP_SHARED
> 
> I do not understand the choice of API.
> Seems better to pass the lookup flags and avoid exposing this very internal
> logic in the call sites...

That is an excellent suggestion, thank.  As you say, it cleans up much
messiness.  I will definitely do that.

Thanks,
NeilBrown



> 
> >   *
> >   * This is to avoid filling the dcache with case-insensitive names to the
> >   * same inode, only the actual correct case is stored in the dcache for
> > @@ -2306,7 +2307,7 @@ EXPORT_SYMBOL(d_obtain_root);
> >   * the exact case, and return the spliced entry.
> >   */
> >  struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
> > -                       struct qstr *name)
> > +                       struct qstr *name, bool shared)
> >  {
> >         struct dentry *found, *res;
> >
> > @@ -2319,6 +2320,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
> >                 iput(inode);
> >                 return found;
> >         }
> > +       /*
> > +        * We are holding parent lock and so don't want to wait for a
> > +        * d_in_lookup() dentry.  We can safely drop the parent lock and
> > +        * reclaim it as we have exclusive access to dentry as it is
> > +        * d_in_lookup() (so ->d_parent is stable) and we are near the
> > +        * end ->lookup() and will shortly drop the lock anyway.
> > +        */
> > +       if (shared)
> > +               inode_unlock_shared(d_inode(dentry->d_parent));
> > +       else
> > +               inode_unlock(d_inode(dentry->d_parent));
> >         if (d_in_lookup(dentry)) {
> >                 found = d_alloc_parallel(dentry->d_parent, name);
> >                 if (IS_ERR(found) || !d_in_lookup(found)) {
> > @@ -2332,6 +2344,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
> >                         return ERR_PTR(-ENOMEM);
> >                 }
> >         }
> > +       if (shared)
> > +               inode_lock_shared(d_inode(dentry->d_parent));
> > +       else
> > +               inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
> >         res = d_splice_alias(inode, found);
> >         if (res) {
> >                 d_lookup_done(found);
> > diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
> > index 10894de519c3..30dddef43300 100644
> > --- a/fs/ntfs/namei.c
> > +++ b/fs/ntfs/namei.c
> > @@ -8,6 +8,7 @@
> >
> >  #include <linux/exportfs.h>
> >  #include <linux/iversion.h>
> > +#include <linux/namei.h> // for LOOKUP_SHARED
> 
> ... this won't be needed
> 
> >
> >  #include "ntfs.h"
> >  #include "time.h"
> > @@ -310,7 +311,8 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
> >                 }
> >                 nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
> >
> > -               dent = d_add_ci(dent, dent_inode, &nls_name);
> > +               dent = d_add_ci(dent, dent_inode, &nls_name,
> > +                               !!(flags & LOOKUP_SHARED));
> >                 kfree(nls_name.name);
> >                 return dent;
> >
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 325c2200c501..f03d832f1468 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/security.h>
> >  #include <linux/iversion.h>
> >  #include <linux/fiemap.h>
> > +#include <linux/namei.h> // for LOOKUP_SHARED
> >
> >  /*
> >   * Directories have different lock order w.r.t. mmap_lock compared to regular
> > @@ -369,7 +370,7 @@ xfs_vn_ci_lookup(
> >         /* else case-insensitive match... */
> >         dname.name = ci_name.name;
> >         dname.len = ci_name.len;
> > -       dentry = d_add_ci(dentry, VFS_I(ip), &dname);
> > +       dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));
> 
> 
> ... and this ugliness could be avoided.
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.
  2026-04-27  8:47     ` NeilBrown
@ 2026-04-27  9:05       ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2026-04-27  9:05 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, Apr 27, 2026 at 10:47 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Mon, 27 Apr 2026, Amir Goldstein wrote:
> > On Mon, Apr 27, 2026 at 6:06 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > Some ->lookup handlers will need to drop and retake the parent lock, so
> > > they can safely use d_alloc_parallel().
> > >
> > > ->lookup can be called with the parent lock either exclusive or shared.
> > >
> > > A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
> > >
> > > This is rather ugly, but will be gone soon after we move
> > > d_alloc_parallel() out of the directory lock as ->lookup() will *always*
> > > called with a shared lock on the parent.
> >
> > Neil,
> >
> > Forgive me for being skeptical about the *always* part.
>
> Scepticism should always be encouraged.
>
> >
> > How long ago did we add ->iterate_shared()?
> >
> > It's true that Linus eventually got rid of ->iterate(), but we did not
> > get rid of the assumption that iterate_shared() might be upgraded
> > to exclusive lock.
> >
> > The obvious reason is that *someone* needs to do this work for
> > old filesystems, which are also hard to test and nobody wants to
> > touch them.
> >
> > I have nothing against this patch, but I think it is more realistic
> > to state that LOOKUP_SHARED is here to stay, so if you think it
> > is too ugly, maybe there is something to be done about it.
> > Personally, I do not see the ugliness though.
> >
> > Am I misjudging the situation of shared lookup wrt old filesystems?
>
> Yes.
> All filesystems must support ->lookup with a shared lock because it is
> already the case that with a simple lookup only a shared lock is provided.
> A filesystem *could* examine the lookup_flags and deduce if the lock is
> actually exclusive (e.g.  if LOOKUP_CREATE is set) and misbehave
> accordingly, but a vanishing small number of filesystems  (NFS and ....
> I can't think of any others) ever look at the lookup_flags and I'm
> certain none do anything so ... creative.
>
> So I'm certain that it is safe from the filesystem's perspective to always
> call with only a shared lock.  All that is needed is to change the VFS
> to only ever do that.  That means separating the lock for lookup from
> the lock for create/remove/move in directory ops, and one way to view
> the current patch set (the complete one, not just this initial set) is
> that it does exactly that.
>

I see.
I will remain skeptical, because there is always *something* with some
special filesystems - think fuse servers (not under your control) which
have grown to rely on the kernel to serialize atomic_open() on the directory -
but the level of skepticism is lower now ;)

I mean it would suck pretty badly if locking order would be fs_type dependapt,
but honestly, I do not think we will be able to avoid that, so we need to
make sure that humans and lockdep are able to understand the different
scopes of vfs locking at least per filesystem type.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return()
  2026-04-27  4:01 ` [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return() NeilBrown
@ 2026-04-27  9:40   ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2026-04-27  9:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, Apr 27, 2026 at 6:07 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> ovl_lookup currently needs to check if a dentry with the same name has
> already been added to the dcache as readdir might need to do.  This
> is an unnecessary performance cost to manage a rare race.
>
> If ovl could know which in-lookup dentries raced with readdir, it could
> limit the extra lookup to just those.
>
> So add d_alloc_noblock_return() which provides the in-lookup dentry when
> it returns -EWOULDBLOCK.
>
> ovl_readdir() can use this this and flag the dentry such that
> ovl_lookup() and easily check if a repeat lookup is needed.
>
> Signed-off-by: NeilBrown <neil@brown.name>

Very nice!

One nit about the API

> ---
>  fs/dcache.c              | 50 ++++++++++++++++++++++++++++++++++++----
>  fs/overlayfs/namei.c     | 23 ++++++++++--------
>  fs/overlayfs/overlayfs.h |  2 ++
>  fs/overlayfs/readdir.c   |  7 ++++--
>  include/linux/dcache.h   |  2 ++
>  5 files changed, 68 insertions(+), 16 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index a2ddfe811df3..2f11257b725b 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2749,7 +2749,8 @@ enum alloc_para {
>  static inline
>  struct dentry *__d_alloc_parallel(struct dentry *parent,
>                                   const struct qstr *name,
> -                                 enum alloc_para how)
> +                                 enum alloc_para how,
> +                                 struct dentry **dentryp)
>  {
>         unsigned int hash = name->hash;
>         struct hlist_bl_head *b = in_lookup_hash(parent, hash);
> @@ -2836,7 +2837,10 @@ struct dentry *__d_alloc_parallel(struct dentry *parent,
>                         case ALLOC_PARA_FAIL:
>                                 spin_unlock(&dentry->d_lock);
>                                 dput(new);
> -                               dput(dentry);
> +                               if (dentryp)
> +                                       *dentryp = dentry;
> +                               else
> +                                       dput(dentry);
>                                 return ERR_PTR(-EWOULDBLOCK);
>                         case ALLOC_PARA_WAIT:
>                                 wait_var_event_spinlock(&dentry->d_flags,
> @@ -2899,7 +2903,7 @@ struct dentry *__d_alloc_parallel(struct dentry *parent,
>  struct dentry *d_alloc_parallel(struct dentry *parent,
>                                 const struct qstr *name)
>  {
> -       return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
> +       return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT, NULL);
>  }
>  EXPORT_SYMBOL(d_alloc_parallel);
>
> @@ -2931,11 +2935,49 @@ struct dentry *d_alloc_noblock(struct dentry *parent,
>
>         de = try_lookup_noperm(name, parent);
>         if (!de)
> -               de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
> +               de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL, NULL);
>         return de;
>  }
>  EXPORT_SYMBOL(d_alloc_noblock);
>
> +/**
> + * d_alloc_noblock_return() - find or allocate a new dentry
> + * @parent - dentry of the parent
> + * @name   - name of the dentry within that parent.
> + * @dentryp - place to store the blocking dentry
> + *
> + * A new dentry is allocated and, providing it is unique, added to the
> + * relevant index.
> + * If an existing dentry is found with the same parent/name that is
> + * not d_in_lookup() then that is returned instead.
> + * If the existing dentry is d_in_lookup(), d_alloc_noblock()
> + * returns with error %-EWOULDBLOCK and the blocking dentry is passed
> + * in @dentryp.  The dentry must be dput() by the caller.

This contract is a bit subtle.
We have plenty of contracts where the caller must dput() in case of success
or in case of error, but must dput in case of a specific error that
sounds fragile.

How about:
* If the existing dentry is d_in_lookup(), d_alloc_noblock()
 * returns with error %-EWOULDBLOCK and the blocking dentry is passed
 * in @dentryp. Regardless of the returned error, if @dentryp is set by this
 * function, the returned dentry must be dput() by the caller.

Thanks,
Amir.

> + *
> + * Thus if the returned dentry is d_in_lookup() then the caller has
> + * exclusive access until it completes the lookup.
> + * If the returned dentry is not d_in_lookup() then a lookup has
> + * already completed.
> + *
> + * The @name need not already have ->hash set.
> + *
> + * Returns: the dentry, whether found or allocated, or an error
> + *    %-ENOMEM, %-EWOULDBLOCK, and anything returned by ->d_hash().
> + */
> +struct dentry *d_alloc_noblock_return(struct dentry *parent,
> +                                     struct qstr *name,
> +                                     struct dentry **dentryp)
> +{
> +       struct dentry *de;
> +
> +       de = try_lookup_noperm(name, parent);
> +       if (!de)
> +               de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL,
> +                                       dentryp);
> +       return de;
> +}
> +EXPORT_SYMBOL(d_alloc_noblock_return);
> +
>  /*
>   * - Unhash the dentry
>   * - Retrieve and clear the waitqueue head in dentry
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 69032eb2b1e2..524e661c3c8d 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -1400,16 +1400,19 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         if (dentry->d_name.len > ofs->namelen)
>                 return ERR_PTR(-ENAMETOOLONG);
>
> -       /*
> -        * The existance of this in-lookup dentry might have forced
> -        * readdir to do the lookup with a new dentry.  If so we must
> -        * return that one.
> -        */
> -       alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
> -                                           dentry->d_name.len),
> -                                 dentry->d_parent);
> -       if (alias && !IS_ERR(alias))
> -               return alias;
> +       if (ovl_dentry_test_flag(OVL_E_RACED_READDIR, dentry)) {
> +               ovl_dentry_clear_flag(OVL_E_RACED_READDIR, dentry);
> +               /*
> +                * The existance of this in-lookup dentry might have
> +                * forced readdir to do the lookup with a new dentry.
> +                * If so we must return that one.
> +                */
> +               alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
> +                                                   dentry->d_name.len),
> +                                         dentry->d_parent);
> +               if (alias && !IS_ERR(alias))
> +                       return alias;
> +       }
>
>         with_ovl_creds(dentry->d_sb)
>                 err = ovl_lookup_layers(&ctx, &d);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index b75df37f70ac..bd6f1a25aca1 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -71,6 +71,8 @@ enum ovl_entry_flag {
>         OVL_E_CONNECTED,
>         /* Lower stack may contain xwhiteout entries */
>         OVL_E_XWHITEOUTS,
> +       /* dentry was found in-lookup during readdir */
> +       OVL_E_RACED_READDIR,
>  };
>
>  enum {
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index e03b32491941..e483bd939a8c 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -553,7 +553,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>  {
>         struct dentry *dir = path->dentry;
>         struct ovl_fs *ofs = OVL_FS(dir->d_sb);
> -       struct dentry *this = NULL;
> +       struct dentry *this = NULL, *in_lookup;
>         enum ovl_path_type type;
>         u64 ino = p->real_ino;
>         int xinobits = ovl_xino_bits(ofs);
> @@ -574,7 +574,8 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>                 }
>         }
>         /* This checks also for xwhiteouts */
> -       this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
> +       this = d_alloc_noblock_return(dir, &QSTR_LEN(p->name, p->len),
> +                                     &in_lookup);
>         if (this == ERR_PTR(-EWOULDBLOCK)) {
>                 /*
>                  * Some other thead is looking up this name and will
> @@ -583,6 +584,8 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>                  * lookup gets a turn it will find and return this
>                  * dentry.
>                  */
> +               ovl_dentry_set_flag(OVL_E_RACED_READDIR, in_lookup);
> +               dput(in_lookup);
>                 this = d_alloc_name(dir, p->name);
>         }
>         if (!IS_ERR(this) && !d_unhashed(this)) {
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 662b98185337..db7cdcbac775 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -258,6 +258,8 @@ extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
>  extern struct dentry * d_alloc_anon(struct super_block *);
>  extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
>  extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
> +extern struct dentry * d_alloc_noblock_return(struct dentry *, struct qstr *,
> +                                             struct dentry **);
>  extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
>  struct dentry *d_duplicate(struct dentry *dentry);
>  /* weird procfs mess; *NOT* exported */
> --
> 2.50.0.107.gf914562f5916.dirty
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling.
  2026-04-27  4:01 ` [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling NeilBrown
@ 2026-04-27 10:10   ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2026-04-27 10:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	Jeff Layton, Trond Myklebust, Anna Schumaker, Miklos Szeredi,
	Jeremy Kerr, Ard Biesheuvel, linux-efi, linux-fsdevel, linux-nfs,
	linux-unionfs, linux-kernel

On Mon, Apr 27, 2026 at 6:07 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> lookup_one() is expected to be removed as it does not fit well with
> proposed changes to directory locking.
> Specifically d_alloc_parallel() will be ordered outside of i_rwsem
> and as iterate_shared() is called with i_rwsem held it is not safe
> to call d_alloc_parallel().
>
> We can instead call d_alloc_noblock() and then call the ->lookup, but
> that can fail if there is a lookup attempt concurrent with the
> readdir().
>
> ovl cannot afford for the lookup to fail as that could produce incorrect
> results, and it cannot safely drop i_rwsem temporarily and that could
> introduce races with handling of the directory cache.
>
> Instead we rely on the fact that ovl_iterate() has an exclusive lock on
> the directory, so any concurrent lookup will wait for the ovl_iterate()
> call to complete.  We allocate a separate dentry and if the lookup is
> successful, it is hashed with the result.
>
> When the concurrent lookup gets i_rwsem it mustn't do its own lookup -
> it must use the existing dentry.  This is found, if it exists, using
> try_lookup_noperm().

Hi Niel,

One nit about the subject -
Please change it to "ovl: stop using lookup_one() in ovl_iterate()"

Apart from that, I let Claude do the review, because it has surpassed
my abilities in reviewing such subtle api changes.

I will copy its output verbatim, if for nothing else, to encourage you
to start applying Claude review before sending out your patches...

>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/overlayfs/namei.c   | 12 ++++++++++++
>  fs/overlayfs/readdir.c | 24 ++++++++++++++++++++++--
>  2 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index ca899fdfaafd..69032eb2b1e2 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -1385,6 +1385,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
>         struct ovl_entry *poe = OVL_E(dentry->d_parent);
>         bool check_redirect = (ovl_redirect_follow(ofs) || ofs->numdatalayer);
> +       struct dentry *alias;
>         int err;
>         struct ovl_lookup_ctx ctx = {
>                 .dentry = dentry,
> @@ -1399,6 +1400,17 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>         if (dentry->d_name.len > ofs->namelen)
>                 return ERR_PTR(-ENAMETOOLONG);
>
> +       /*
> +        * The existance of this in-lookup dentry might have forced

"existance" should be "existence" in the namei.c comment

> +        * readdir to do the lookup with a new dentry.  If so we must
> +        * return that one.
> +        */
> +       alias = try_lookup_noperm(&QSTR_LEN(dentry->d_name.name,
> +                                           dentry->d_name.len),
> +                                 dentry->d_parent);
> +       if (alias && !IS_ERR(alias))
> +               return alias;
> +
>         with_ovl_creds(dentry->d_sb)
>                 err = ovl_lookup_layers(&ctx, &d);
>
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 1dcc75b3a90f..e03b32491941 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -574,8 +574,28 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>                 }
>         }
>         /* This checks also for xwhiteouts */
> -       this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
> -       if (IS_ERR_OR_NULL(this) || !this->d_inode) {
> +       this = d_alloc_noblock(dir, &QSTR_LEN(p->name, p->len));
> +       if (this == ERR_PTR(-EWOULDBLOCK)) {
> +               /*
> +                * Some other thead is looking up this name and will

"thead" should be "thread" in the readdir.c comment

> +                * block on i_rwsem before it can complete the lookup.
> +                * We will do the lookup in a new dentry and when that
> +                * lookup gets a turn it will find and return this
> +                * dentry.
> +                */
> +               this = d_alloc_name(dir, p->name);

Claude suggests this as the simplest fix to NULL deref bug below:
                    if (!this)
                            this = ERR_PTR(-ENOMEM);

> +       }
> +       if (!IS_ERR(this) && !d_unhashed(this)) {
> +               /* Either we got an in-lookup or we made our own unhashed */

The condition should be d_unhashed(this) (without !), matching the comment
which says "Either we got an in-lookup or we made our own unhashed".

d_alloc_name() calls d_alloc() which returns NULL on allocation failure.
The original code checked IS_ERR_OR_NULL(this), but the new code only
checks IS_ERR(this). A NULL this would pass the !IS_ERR(this) check and
then crash on !d_unhashed(this) (or d_unhashed(this) with the fix).

> +               struct dentry *alias = ovl_lookup(dir->d_inode, this, 0);
> +
> +               if (alias) {
> +                       d_lookup_done(this);
> +                       dput(this);
> +                       this = alias;
> +               }
> +       }
> +       if (IS_ERR(this) || !this->d_inode) {
>                 /* Mark a stale entry */
>                 p->is_whiteout = true;
>                 if (IS_ERR(this)) {
> --

Claude (Opus 4.6) review summary:

The patch has a critical condition inversion (!d_unhashed vs d_unhashed)
that would break impure readdir entirely and cause use-after-free corruption
in the dcache. It also has a missing NULL check for d_alloc_name() failure.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-04-27 10:10 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-27  4:01 [PATCH v3 00/19] Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
2026-04-27  4:01 ` [PATCH v3 01/19] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
2026-04-27  4:01 ` [PATCH v3 02/19] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
2026-04-27  4:01 ` [PATCH v3 03/19] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
2026-04-27  4:01 ` [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel() NeilBrown
2026-04-27  4:01 ` [PATCH v3 05/19] VFS: introduce d_alloc_noblock() NeilBrown
2026-04-27  4:01 ` [PATCH v3 06/19] VFS: add d_duplicate() NeilBrown
2026-04-27  4:01 ` [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag NeilBrown
2026-04-27  7:43   ` Amir Goldstein
2026-04-27  8:47     ` NeilBrown
2026-04-27  9:05       ` Amir Goldstein
2026-04-27  4:01 ` [PATCH v3 08/19] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
2026-04-27  7:49   ` Amir Goldstein
2026-04-27  8:48     ` NeilBrown
2026-04-27  4:01 ` [PATCH v3 09/19] ovl: stop using lookup_one() in iterate_shared() handling NeilBrown
2026-04-27 10:10   ` Amir Goldstein
2026-04-27  4:01 ` [PATCH v3 10/19] VFS/ovl: add d_alloc_noblock_return() NeilBrown
2026-04-27  9:40   ` Amir Goldstein
2026-04-27  4:01 ` [PATCH v3 11/19] efivarfs: use d_alloc_name() NeilBrown
2026-04-27  4:01 ` [PATCH v3 12/19] shmem: use d_duplicate() NeilBrown
2026-04-27  4:01 ` [PATCH v3 13/19] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
2026-04-27  4:01 ` [PATCH v3 14/19] nfs: use d_splice_alias() in nfs_link() NeilBrown
2026-04-27  4:01 ` [PATCH v3 15/19] nfs: don't d_drop() before d_splice_alias() NeilBrown
2026-04-27  4:01 ` [PATCH v3 16/19] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
2026-04-27  4:01 ` [PATCH v3 17/19] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
2026-04-27  4:01 ` [PATCH v3 18/19] nfs: use d_alloc_noblock() in silly-rename NeilBrown
2026-04-27  4:01 ` [PATCH v3 19/19] nfs: use d_duplicate() NeilBrown
2026-04-27  8:42 ` [syzbot ci] Re: Prepare to lift lookup out of exclusive lock for directory ops syzbot ci

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox