* [PATCH v4 1/7] VFS: fix various typos in documentation for start_creating start_removing etc
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 2/7] VFS: use wait_var_event for waiting in d_alloc_parallel() NeilBrown
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
Various typos fixes.
start_creating_dentry() now documented as *creating*, not *removing* the
entry.
Unwanted spaces in Documentation/filesystems/porting.rst removed.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 8 +++----
fs/namei.c | 30 +++++++++++++--------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index fdf074429cd3..bfdff4608028 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1203,7 +1203,7 @@ will fail-safe.
---
-** mandatory**
+**mandatory**
lookup_one(), lookup_one_unlocked(), lookup_one_positive_unlocked() now
take a qstr instead of a name and len. These, not the "one_len"
@@ -1212,7 +1212,7 @@ that filesysmtem, through a mount point - which will have a mnt_idmap.
---
-** mandatory**
+**mandatory**
Functions try_lookup_one_len(), lookup_one_len(),
lookup_one_len_unlocked() and lookup_positive_unlocked() have been
@@ -1229,7 +1229,7 @@ already been performed such as after vfs_path_parent_lookup()
---
-** mandatory**
+**mandatory**
d_hash_and_lookup() is no longer exported or available outside the VFS.
Use try_lookup_noperm() instead. This adds name validation and takes
@@ -1371,7 +1371,7 @@ similar.
---
-** mandatory**
+**mandatory**
lock_rename(), lock_rename_child(), unlock_rename() are no
longer available. Use start_renaming() or similar.
diff --git a/fs/namei.c b/fs/namei.c
index c7fac83c9a85..65e60536a6d1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2942,8 +2942,8 @@ struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
* end_dirop - signal completion of a dirop
* @de: the dentry which was returned by start_dirop or similar.
*
- * If the de is an error, nothing happens. Otherwise any lock taken to
- * protect the dentry is dropped and the dentry itself is release (dput()).
+ * If the @de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
*/
void end_dirop(struct dentry *de)
{
@@ -3260,7 +3260,7 @@ EXPORT_SYMBOL(lookup_one_unlocked);
* the i_rwsem itself if necessary. If a fatal signal is pending or
* delivered, it will return %-EINTR if the lock is needed.
*
- * Returns: A dentry, possibly negative, or
+ * Returns: A positive dentry, or
* - same errors as lookup_one_unlocked() or
* - ERR_PTR(-EINTR) if a fatal signal is pending.
*/
@@ -3382,7 +3382,7 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
/**
- * start_creating - prepare to create a given name with permission checking
+ * start_creating - prepare to access or create a given name with permission checking
* @idmap: idmap of the mount
* @parent: directory in which to prepare to create the name
* @name: the name to be created
@@ -3414,8 +3414,8 @@ EXPORT_SYMBOL(start_creating);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
- * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * Locks are taken and a lookup is performed prior to removing an object
+ * from a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
* If the name doesn't exist, an error is returned.
@@ -3441,7 +3441,7 @@ EXPORT_SYMBOL(start_removing);
* @parent: directory in which to prepare to create the name
* @name: the name to be created
*
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
* an object in a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
@@ -3470,7 +3470,7 @@ EXPORT_SYMBOL(start_creating_killable);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
* an object from a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
@@ -3500,7 +3500,7 @@ EXPORT_SYMBOL(start_removing_killable);
* @parent: directory in which to prepare to create the name
* @name: the name to be created
*
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
* an object in a directory.
*
* If the name already exists, a positive dentry is returned.
@@ -3523,7 +3523,7 @@ EXPORT_SYMBOL(start_creating_noperm);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
* an object from a directory.
*
* If the name doesn't exist, an error is returned.
@@ -3544,11 +3544,11 @@ struct dentry *start_removing_noperm(struct dentry *parent,
EXPORT_SYMBOL(start_removing_noperm);
/**
- * start_creating_dentry - prepare to create a given dentry
- * @parent: directory from which dentry should be removed
- * @child: the dentry to be removed
+ * start_creating_dentry - prepare to access or create a given dentry
+ * @parent: directory of dentry
+ * @child: the dentry to be prepared
*
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
* the validity of the dentry is checked: correct parent and still hashed.
*
* If the dentry is valid and negative a reference is taken and
@@ -3581,7 +3581,7 @@ EXPORT_SYMBOL(start_creating_dentry);
* @parent: directory from which dentry should be removed
* @child: the dentry to be removed
*
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
* the validity of the dentry is checked: correct parent and still hashed.
*
* If the dentry is valid and positive, a reference is taken and
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 2/7] VFS: use wait_var_event for waiting in d_alloc_parallel()
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
2026-04-30 2:03 ` [PATCH v4 1/7] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
` (5 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
d_alloc_parallel() currently requires a wait_queue_head to be passed in.
This must have a life time which extends until the lookup is completed.
This makes it awkward to use and particularly make it hard to use in
lookup_one_qstr_excl() which I hope to do in the future.
This patch changes d_alloc_parallel() to use wake_up_var_locked() to
wake up waiters, and wait_var_event_spinlock() to wait. dentry->d_lock
is used for synchronisation as it is already held and the relevant
times.
In most cases there will be no waiters so the wake_up_var_locked() call
(which can walk an arbitrarily long list) would be a complete waste. To
optimise this a new ->d_flags flag is added: DCACHE_LOOKUP_WAITER. This
is set whenever any thread prepares to wait for the dentry, and if it
isn't set when DCACHE_PAR_LOOKUP is cleared, no wakeup is sent.
DCACHE_LOOKUP_WAITER is cleared after the wakeup is set.
__d_lookup_unhash() no longer returns a wq. Callers check
DCACHE_LOOKUP_WAITER to check if a wakeup is needed, and
wake_up_var_locked() can find the wq.
__d_lookup_unhash() no longer needs to re-init ->d_lru. That was
previously shared (in a union) with ->d_wait but ->d_wait is now gone
so it no longer corrupts ->d_lru.
Wakeup is move out of end_dir_add() as it is conceptually a separate action.
Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 6 ++
fs/afs/dir_silly.c | 4 +-
fs/dcache.c | 83 +++++++++++++++------------
fs/fuse/readdir.c | 3 +-
fs/namei.c | 6 +-
fs/nfs/dir.c | 6 +-
fs/nfs/unlink.c | 3 +-
fs/proc/base.c | 3 +-
fs/proc/proc_sysctl.c | 3 +-
fs/smb/client/readdir.c | 3 +-
include/linux/dcache.h | 11 ++--
include/linux/nfs_xdr.h | 1 -
12 files changed, 67 insertions(+), 65 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index bfdff4608028..5cc6ae19845c 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1385,3 +1385,9 @@ for_each_alias(dentry, inode) instead of hlist_for_each_entry; better
yet, see if any of the exported primitives could be used instead of
the entire loop. You still need to hold ->i_lock of the inode over
either form of manual loop.
+
+---
+
+**mandatory**
+
+d_alloc_parallel() no longer requires a waitqueue_head.
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index a748fd133faf..982bb6ec15f0 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -248,13 +248,11 @@ int afs_silly_iput(struct dentry *dentry, struct inode *inode)
struct dentry *alias;
int ret;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
down_read(&dvnode->rmdir_lock);
- alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq);
+ alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name);
if (IS_ERR(alias)) {
up_read(&dvnode->rmdir_lock);
return 0;
diff --git a/fs/dcache.c b/fs/dcache.c
index 2c61aeea41f4..1a065df22ecd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2250,8 +2250,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return found;
}
if (d_in_lookup(dentry)) {
- found = d_alloc_parallel(dentry->d_parent, name,
- dentry->d_wait);
+ found = d_alloc_parallel(dentry->d_parent, name);
if (IS_ERR(found) || !d_in_lookup(found)) {
iput(inode);
return found;
@@ -2638,32 +2637,24 @@ static inline unsigned start_dir_add(struct inode *dir)
}
}
-static inline void end_dir_add(struct inode *dir, unsigned int n,
- wait_queue_head_t *d_wait)
+static inline void end_dir_add(struct inode *dir, unsigned int n)
{
smp_store_release(&dir->i_dir_seq, n + 2);
preempt_enable_nested();
- if (wq_has_sleeper(d_wait))
- wake_up_all(d_wait);
}
static void d_wait_lookup(struct dentry *dentry)
{
- if (d_in_lookup(dentry)) {
- DECLARE_WAITQUEUE(wait, current);
- add_wait_queue(dentry->d_wait, &wait);
- do {
- set_current_state(TASK_UNINTERRUPTIBLE);
- spin_unlock(&dentry->d_lock);
- schedule();
- spin_lock(&dentry->d_lock);
- } while (d_in_lookup(dentry));
+ if (likely(d_in_lookup(dentry))) {
+ dentry->d_flags |= DCACHE_LOOKUP_WAITERS;
+ wait_var_event_spinlock(&dentry->d_flags,
+ !d_in_lookup(dentry),
+ &dentry->d_lock);
}
}
struct dentry *d_alloc_parallel(struct dentry *parent,
- const struct qstr *name,
- wait_queue_head_t *wq)
+ const struct qstr *name)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2766,7 +2757,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
return dentry;
}
rcu_read_unlock();
- new->d_wait = wq;
hlist_bl_add_head(&new->d_in_lookup_hash, b);
hlist_bl_unlock(b);
return new;
@@ -2778,13 +2768,26 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
EXPORT_SYMBOL(d_alloc_parallel);
/*
- * - Unhash the dentry
- * - Retrieve and clear the waitqueue head in dentry
- * - Return the waitqueue head
+ * Move dentry from in-lookup state to busy-negative one.
+ *
+ * From now on d_in_lookup(dentry) will return false and dentry is gone from
+ * in-lookup hash.
+ *
+ * Anyone who had been waiting on it in d_alloc_parallel() is free to
+ * proceed after that. Note that waking such waiters up is left to
+ * the callers; PREEMPT_RT kernels can't have that wakeup done while
+ * in write-side critical area for ->i_dir_seq, so it's done by calling
+ * __d_wake_in_lookup_waiters() once it's safe to do so.
+ *
+ * Both __d_lookup_unhash() and __d_wake_in_lookup_waiters() should
+ * be called within the same ->d_lock scope. PAR_LOOKUP is cleared
+ * here, while LOOKUP_WAITERS (set by somebody finding dentry in
+ * the in-lookup hash and setting down to wait) is checked and cleared
+ * in __d_wake_in_lookup_waiters(). Both are gone by the end of
+ * ->d_lock scope.
*/
-static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
+static void __d_lookup_unhash(struct dentry *dentry)
{
- wait_queue_head_t *d_wait;
struct hlist_bl_head *b;
lockdep_assert_held(&dentry->d_lock);
@@ -2793,18 +2796,23 @@ static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
hlist_bl_lock(b);
dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
__hlist_bl_del(&dentry->d_in_lookup_hash);
- d_wait = dentry->d_wait;
- dentry->d_wait = NULL;
hlist_bl_unlock(b);
dentry->waiters = NULL;
- INIT_LIST_HEAD(&dentry->d_lru);
- return d_wait;
+}
+
+static inline void __d_wake_in_lookup_waiters(struct dentry *dentry)
+{
+ if (dentry->d_flags & DCACHE_LOOKUP_WAITERS) {
+ wake_up_var_locked(&dentry->d_flags, &dentry->d_lock);
+ dentry->d_flags &= ~DCACHE_LOOKUP_WAITERS;
+ }
}
void __d_lookup_unhash_wake(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
- wake_up_all(__d_lookup_unhash(dentry));
+ __d_lookup_unhash(dentry);
+ __d_wake_in_lookup_waiters(dentry);
spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(__d_lookup_unhash_wake);
@@ -2814,14 +2822,13 @@ EXPORT_SYMBOL(__d_lookup_unhash_wake);
static inline void __d_add(struct dentry *dentry, struct inode *inode,
const struct dentry_operations *ops)
{
- wait_queue_head_t *d_wait;
struct inode *dir = NULL;
unsigned n;
spin_lock(&dentry->d_lock);
if (unlikely(d_in_lookup(dentry))) {
dir = dentry->d_parent->d_inode;
n = start_dir_add(dir);
- d_wait = __d_lookup_unhash(dentry);
+ __d_lookup_unhash(dentry);
}
if (unlikely(ops))
d_set_d_op(dentry, ops);
@@ -2834,8 +2841,10 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
fsnotify_update_flags(dentry);
}
__d_rehash(dentry);
- if (dir)
- end_dir_add(dir, n, d_wait);
+ if (dir) {
+ end_dir_add(dir, n);
+ __d_wake_in_lookup_waiters(dentry);
+ }
spin_unlock(&dentry->d_lock);
if (inode)
spin_unlock(&inode->i_lock);
@@ -2948,7 +2957,6 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
bool exchange)
{
struct dentry *old_parent, *p;
- wait_queue_head_t *d_wait;
struct inode *dir = NULL;
unsigned n;
@@ -2979,7 +2987,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
if (unlikely(d_in_lookup(target))) {
dir = target->d_parent->d_inode;
n = start_dir_add(dir);
- d_wait = __d_lookup_unhash(target);
+ __d_lookup_unhash(target);
}
write_seqcount_begin(&dentry->d_seq);
@@ -3018,9 +3026,10 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
write_seqcount_end(&target->d_seq);
write_seqcount_end(&dentry->d_seq);
- if (dir)
- end_dir_add(dir, n, d_wait);
-
+ if (dir) {
+ end_dir_add(dir, n);
+ __d_wake_in_lookup_waiters(dentry);
+ }
if (dentry->d_parent != old_parent)
spin_unlock(&dentry->d_parent->d_lock);
if (dentry != old_parent)
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index db5ae8ec1030..a2361f1d9905 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -164,7 +164,6 @@ static int fuse_direntplus_link(struct file *file,
struct inode *dir = d_inode(parent);
struct fuse_conn *fc;
struct inode *inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
int epoch;
if (!o->nodeid) {
@@ -201,7 +200,7 @@ static int fuse_direntplus_link(struct file *file,
dentry = d_lookup(parent, &name);
if (!dentry) {
retry:
- dentry = d_alloc_parallel(parent, &name, &wq);
+ dentry = d_alloc_parallel(parent, &name);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
}
diff --git a/fs/namei.c b/fs/namei.c
index 65e60536a6d1..a6349b31fdb6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1891,13 +1891,12 @@ static struct dentry *__lookup_slow(const struct qstr *name,
{
struct dentry *dentry, *old;
struct inode *inode = dir->d_inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
/* Don't go there if it's already dead */
if (unlikely(IS_DEADDIR(inode)))
return ERR_PTR(-ENOENT);
again:
- dentry = d_alloc_parallel(dir, name, &wq);
+ dentry = d_alloc_parallel(dir, name);
if (IS_ERR(dentry))
return dentry;
if (unlikely(!d_in_lookup(dentry))) {
@@ -4414,7 +4413,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
struct dentry *dentry;
int error, create_error = 0;
umode_t mode = op->mode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
if (unlikely(IS_DEADDIR(dir_inode)))
return ERR_PTR(-ENOENT);
@@ -4423,7 +4421,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
dentry = d_lookup(dir, &nd->last);
for (;;) {
if (!dentry) {
- dentry = d_alloc_parallel(dir, &nd->last, &wq);
+ dentry = d_alloc_parallel(dir, &nd->last);
if (IS_ERR(dentry))
return dentry;
}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e9ce1883288c..9580af999d70 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -726,7 +726,6 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
unsigned long dir_verifier)
{
struct qstr filename = QSTR_INIT(entry->name, entry->len);
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
struct dentry *dentry;
struct dentry *alias;
struct inode *inode;
@@ -755,7 +754,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
dentry = d_lookup(parent, &filename);
again:
if (!dentry) {
- dentry = d_alloc_parallel(parent, &filename, &wq);
+ dentry = d_alloc_parallel(parent, &filename);
if (IS_ERR(dentry))
return;
}
@@ -2106,7 +2105,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned open_flags,
umode_t mode)
{
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
struct nfs_open_context *ctx;
struct dentry *res;
struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -2162,7 +2160,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
d_drop(dentry);
switched = true;
dentry = d_alloc_parallel(dentry->d_parent,
- &dentry->d_name, &wq);
+ &dentry->d_name);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
if (unlikely(!d_in_lookup(dentry)))
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index df3ca4669df6..43ea897943c0 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -124,7 +124,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
struct dentry *alias;
down_read_non_owner(&NFS_I(dir)->rmdir_sem);
- alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
+ alias = d_alloc_parallel(dentry->d_parent, &data->args.name);
if (IS_ERR(alias)) {
up_read_non_owner(&NFS_I(dir)->rmdir_sem);
return 0;
@@ -185,7 +185,6 @@ nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
data->cred = get_current_cred();
data->res.dir_attr = &data->dir_attr;
- init_waitqueue_head(&data->wq);
status = -EBUSY;
spin_lock(&dentry->d_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..d55a4b603188 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2132,8 +2132,7 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx,
goto end_instantiate;
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
- child = d_alloc_parallel(dir, &qname, &wq);
+ child = d_alloc_parallel(dir, &qname);
if (IS_ERR(child))
goto end_instantiate;
if (d_in_lookup(child)) {
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 49ab74e0bfde..04a382178c65 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -692,8 +692,7 @@ static bool proc_sys_fill_cache(struct file *file,
child = d_lookup(dir, &qname);
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
- child = d_alloc_parallel(dir, &qname, &wq);
+ child = d_alloc_parallel(dir, &qname);
if (IS_ERR(child))
return false;
if (d_in_lookup(child)) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index be22bbc4a65a..a8995261831c 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -73,7 +73,6 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions;
bool reparse_need_reval = false;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
int rc;
cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
@@ -105,7 +104,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
(fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
return;
- dentry = d_alloc_parallel(parent, name, &wq);
+ dentry = d_alloc_parallel(parent, name);
}
if (IS_ERR(dentry))
return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 2577c05f84ec..97a887be150a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -116,10 +116,7 @@ struct dentry {
* possible!
*/
- union {
- struct list_head d_lru; /* LRU list */
- wait_queue_head_t *d_wait; /* in-lookup ones only */
- };
+ struct list_head d_lru; /* LRU list */
struct hlist_node d_sib; /* child of parent list */
struct hlist_head d_children; /* our children */
/*
@@ -210,6 +207,9 @@ enum dentry_flags {
DCACHE_REFERENCED = BIT(6), /* Recently used, don't discard. */
DCACHE_DONTCACHE = BIT(7), /* Purge from memory on final dput() */
DCACHE_CANT_MOUNT = BIT(8),
+ DCACHE_LOOKUP_WAITERS = BIT(9), /* A thread is waiting for
+ * PAR_LOOKUP to clear
+ */
DCACHE_SHRINK_LIST = BIT(10),
DCACHE_OP_WEAK_REVALIDATE = BIT(11),
/*
@@ -256,8 +256,7 @@ extern void d_delete(struct dentry *);
/* allocate/de-allocate */
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
-extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
- wait_queue_head_t *);
+extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index fcbd21b5685f..6aced49d5f00 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1743,7 +1743,6 @@ struct nfs_unlinkdata {
struct nfs_removeargs args;
struct nfs_removeres res;
struct dentry *dentry;
- wait_queue_head_t wq;
const struct cred *cred;
struct nfs_fattr dir_attr;
long timeout;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup dentries
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
2026-04-30 2:03 ` [PATCH v4 1/7] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
2026-04-30 2:03 ` [PATCH v4 2/7] VFS: use wait_var_event for waiting in d_alloc_parallel() NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 4/7] VFS: introduce d_alloc_noblock() NeilBrown
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
We currently have three interfaces for attaching existing inodes to
normal filesystems(*).
- d_add() requires an unhashed or in-lookup dentry and doesn't handle
splicing in case a directory already has dentry
- d_instantiate() requires a hashed dentry, and also doesn't handle
splicing.
- d_splice_alias() requires unhashed or in-lookup and does handle
splicing, and can return an alternate dentry.
So there is no interface that supports both hashed and in-lookup, which
is what ->atomic_open needs to deal with.
Some filesystems check for in-lookup in their atomic_open and if found,
perform a ->lookup and can subsequently use d_instantiate() if the
dentry is still negative. Others d_drop() the dentry so they can use
d_splice_alias().
This last will cause a problem for proposed changes to locking which
require the dentry to remain hashed while and operation proceeds on it.
There is also no interface which splices a directory (which might
already have a dentry) to a hashed dentry. Filesystems which need to do
this d_drop() first.
Some filesystemss (NFS) skip ->lookup processing for
LOOKUP_CREATE|LOOKUP_EXCL
which includes mknod, link, symlink etc. So these inode operations
might get an unhashed or a hashed-negative dentry. There is no
interface for instantiating these so again they need to unhash
first (nfs_link)
So with this patch d_splice_alias() can handle hashed, unhashed, or
in-lookup dentries. This makes it suitable for ->lookup, ->atomic_open,
and ->mkdir and others.
As a side effect d_add() will also now handle hashed dentries, but
I have plans to remove d_add() as there is no benefit having it as
well as the others.
__d_add() currently contains code that is identical to
__d_instantiate(), so the former is changed to call the later, and both
d_add() and d_instantiate() call __d_add().
* There is also d_make_persistent() for filesystems which are
dcache-based and don't support mkdir, create etc, and
d_instantiate_new() for newly created inodes that are still locked.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/vfs.rst | 4 ++--
fs/dcache.c | 31 ++++++++++++-------------------
2 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..d8df0a84cdba 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -507,8 +507,8 @@ otherwise noted.
dentry before the first mkdir returns.
If there is any chance this could happen, then the new inode
- should be d_drop()ed and attached with d_splice_alias(). The
- returned dentry (if any) should be returned by ->mkdir().
+ should be attached with d_splice_alias(). The returned
+ dentry (if any) should be returned by ->mkdir().
``rmdir``
called by the rmdir(2) system call. Only required if you want
diff --git a/fs/dcache.c b/fs/dcache.c
index 1a065df22ecd..a4ee380f3dfc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2068,7 +2068,6 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
* (or otherwise set) by the caller to indicate that it is now
* in use by the dcache.
*/
-
void d_instantiate(struct dentry *entry, struct inode * inode)
{
BUG_ON(d_really_is_positive(entry));
@@ -2829,18 +2828,14 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
dir = dentry->d_parent->d_inode;
n = start_dir_add(dir);
__d_lookup_unhash(dentry);
+ __d_rehash(dentry);
+ } else if (d_unhashed(dentry)) {
+ __d_rehash(dentry);
}
if (unlikely(ops))
d_set_d_op(dentry, ops);
- if (inode) {
- unsigned add_flags = d_flags_for_inode(inode);
- hlist_add_head(&dentry->d_alias, &inode->i_dentry);
- raw_write_seqcount_begin(&dentry->d_seq);
- __d_set_inode_and_type(dentry, inode, add_flags);
- raw_write_seqcount_end(&dentry->d_seq);
- fsnotify_update_flags(dentry);
- }
- __d_rehash(dentry);
+ if (inode)
+ __d_instantiate(dentry, inode);
if (dir) {
end_dir_add(dir, n);
__d_wake_in_lookup_waiters(dentry);
@@ -3142,8 +3137,6 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
if (IS_ERR(inode))
return ERR_CAST(inode);
- BUG_ON(!d_unhashed(dentry));
-
if (!inode)
goto out;
@@ -3192,6 +3185,8 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
* @inode: the inode which may have a disconnected dentry
* @dentry: a negative dentry which we want to point to the inode.
*
+ * @dentry must be negative and may be in-lookup or unhashed or hashed.
+ *
* If inode is a directory and has an IS_ROOT alias, then d_move that in
* place of the given dentry and return it, else simply d_add the inode
* to the dentry and return NULL.
@@ -3199,16 +3194,14 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
* If a non-IS_ROOT directory is found, the filesystem is corrupt, and
* we should error out: directories can't have multiple aliases.
*
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
+ * This should be used to return the result of ->lookup() and to
+ * instantiate the result of ->mkdir(), is often useful for
+ * ->atomic_open, and may be used to instantiate other objects.
*
* If a dentry was found and moved, then it is returned. Otherwise NULL
- * is returned. This matches the expected return value of ->lookup.
+ * is returned. This matches the expected return value of ->lookup and
+ * ->mkdir.
*
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
*/
struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 4/7] VFS: introduce d_alloc_noblock()
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
` (2 preceding siblings ...)
2026-04-30 2:03 ` [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 5/7] VFS: add d_duplicate() NeilBrown
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
Several filesystems use the results of readdir to prime the dcache.
These filesystems use d_alloc_parallel() which can block if there is a
concurrent lookup. Blocking in that case is pointless as the lookup
will add info to the dcache and there is no value in the readdir waiting
to see if it should add the info too.
Also these calls to d_alloc_parallel() are made while the parent
directory is locked. A proposed change to locking will lock the parent
later, after d_alloc_parallel(). This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.
So this patch introduces d_alloc_noblock() which doesn't block but
instead returns ERR_PTR(-EWOULDBLOCK). Filesystems that prime the
dcache (smb/client, nfs, fuse, cephfs) can now use that and ignore
-EWOULDBLOCK errors as harmless.
Unlike d_alloc_parallel(), d_alloc_noblock() calculates the hash and
performs a lookup before an allocation, as that is what all callers
want. This done using try_lookup_noperm() necessitating the includion
of namei.h in dcache.c.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 87 ++++++++++++++++++++++++++++++++++++++++--
include/linux/dcache.h | 1 +
2 files changed, 85 insertions(+), 3 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index a4ee380f3dfc..a0096ac02790 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -32,6 +32,7 @@
#include <linux/bit_spinlock.h>
#include <linux/rculist_bl.h>
#include <linux/list_lru.h>
+#include <linux/namei.h>
#include "internal.h"
#include "mount.h"
@@ -2652,8 +2653,16 @@ static void d_wait_lookup(struct dentry *dentry)
}
}
-struct dentry *d_alloc_parallel(struct dentry *parent,
- const struct qstr *name)
+/* What to do when __d_alloc_parallel finds a d_in_lookup dentry */
+enum alloc_para {
+ ALLOC_PARA_WAIT,
+ ALLOC_PARA_FAIL,
+};
+
+static inline
+struct dentry *__d_alloc_parallel(struct dentry *parent,
+ const struct qstr *name,
+ enum alloc_para how)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2735,7 +2744,18 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
* wait for them to finish
*/
spin_lock(&dentry->d_lock);
- d_wait_lookup(dentry);
+ if (d_in_lookup(dentry))
+ switch (how) {
+ case ALLOC_PARA_FAIL:
+ spin_unlock(&dentry->d_lock);
+ dput(new);
+ dput(dentry);
+ return ERR_PTR(-EWOULDBLOCK);
+ case ALLOC_PARA_WAIT:
+ d_wait_lookup(dentry);
+ /* ... and continue */
+ }
+
/*
* it's not in-lookup anymore; in principle we should repeat
* everything from dcache lookup, but it's likely to be what
@@ -2764,8 +2784,69 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
dput(dentry);
goto retry;
}
+
+/**
+ * d_alloc_parallel() - allocate a new dentry and ensure uniqueness
+ * @parent: dentry of the parent
+ * @name: name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup(), then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_parallel() waits for
+ * that lookup to complete before returning the dentry and then ensures the
+ * match is still valid.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error %-ENOMEM.
+ */
+struct dentry *d_alloc_parallel(struct dentry *parent,
+ const struct qstr *name)
+{
+ return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
+}
EXPORT_SYMBOL(d_alloc_parallel);
+/**
+ * d_alloc_noblock() - find or allocate a new dentry
+ * @parent: dentry of the parent
+ * @name: name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup() then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_noblock()
+ * returns with error %-EWOULDBLOCK.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name need not already have ->hash set.
+ *
+ * Returns: the dentry, whether found or allocated, or an error
+ * %-ENOMEM, %-EWOULDBLOCK, and anything returned by ->d_hash().
+ */
+struct dentry *d_alloc_noblock(struct dentry *parent,
+ struct qstr *name)
+{
+ struct dentry *de;
+
+ de = try_lookup_noperm(name, parent);
+ if (!de)
+ de = __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
+ return de;
+}
+EXPORT_SYMBOL(d_alloc_noblock);
+
/*
* Move dentry from in-lookup state to busy-negative one.
*
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 97a887be150a..d22c308043bb 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -257,6 +257,7 @@ extern void d_delete(struct dentry *);
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
+extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 5/7] VFS: add d_duplicate()
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
` (3 preceding siblings ...)
2026-04-30 2:03 ` [PATCH v4 4/7] VFS: introduce d_alloc_noblock() NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag NeilBrown
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
Occasionally a single operation can require two sub-operations on the
same name, and it is important that a d_alloc_parallel() (once that can
be run unlocked) does not create another dentry with the same name
between the operations.
Two examples:
1/ rename where the target name (a positive dentry) needs to be
"silly-renamed" to a temporary name so it will remain available on the
server (NFS and AFS). Here the same name needs to be the subject
of one rename, and the target of another.
2/ rename where the subject needs to be replaced with a white-out
(shmemfs). Here the same name need to be the subject of a rename
and the target of a mknod()
In both cases the original dentry is renamed to something else, and a
replacement is instantiated, possibly as the target of d_move(), possibly
by d_instantiate().
Currently d_alloc() is used to create the dentry and the exclusive lock
on the parent ensures no other dentry is created. When
d_alloc_parallel() is moved out of the parent lock, this will no longer
be sufficient. In particular if the original is renamed away before the
new is instantiated, there is a window where d_alloc_parallel() could
create another name. "silly-rename" does work in this order. shmemfs
whiteout doesn't open this hole but is essentially the same pattern and
should use the same approach.
The new d_duplicate() creates an in-lookup dentry with the same name as
the original dentry, which must be hashed. There is no need to check if
an in-lookup dentry exists with the same name as d_alloc_parallel() will
never try add one while the hashed dentry exists. Once the new
in-lookup is created, d_alloc_parallel() will find it and wait for it to
complete, then use it.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 51 ++++++++++++++++++++++++++++++++++++++++++
include/linux/dcache.h | 1 +
2 files changed, 52 insertions(+)
diff --git a/fs/dcache.c b/fs/dcache.c
index a0096ac02790..1943607f7547 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1900,6 +1900,57 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
}
EXPORT_SYMBOL(d_alloc);
+/**
+ * d_duplicate - duplicate a dentry for combined atomic operation
+ * @dentry: the dentry to duplicate
+ *
+ * Some rename operations need to be combined with another operation
+ * inside the filesystem.
+ * 1/ A cluster filesystem when renaming to an in-use file might need to
+ * first "silly-rename" that target out of the way before the main rename
+ * 2/ A filesystem that supports white-out might want to create a whiteout
+ * in place of the file being moved.
+ *
+ * For this they need two dentries which temporarily have the same name,
+ * before one is renamed. d_duplicate() provides for this. Given a
+ * positive hashed dentry, it creates a second in-lookup dentry.
+ * Because the original dentry exists, no other thread will try to
+ * create an in-lookup dentry, so there can be no race in this create.
+ *
+ * The caller should d_move() the original to a new name, often via a
+ * rename request, and should call d_lookup_done() on the newly created
+ * dentry. If the new is instantiated then old MUST either be moved
+ * or dropped.
+ *
+ * Parent must be locked.
+ *
+ * Returns: an in-lookup dentry, or an error.
+ */
+struct dentry *d_duplicate(struct dentry *dentry)
+{
+ unsigned int hash = dentry->d_name.hash;
+ struct dentry *parent = dentry->d_parent;
+ struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+ struct dentry *new = __d_alloc(parent->d_sb, &dentry->d_name);
+
+ if (unlikely(!new))
+ return ERR_PTR(-ENOMEM);
+
+ new->d_flags |= DCACHE_PAR_LOOKUP;
+ spin_lock(&parent->d_lock);
+ new->d_parent = dget_dlock(parent);
+ hlist_add_head(&new->d_sib, &parent->d_children);
+ if (parent->d_flags & DCACHE_DISCONNECTED)
+ new->d_flags |= DCACHE_DISCONNECTED;
+ spin_unlock(&dentry->d_parent->d_lock);
+
+ hlist_bl_lock(b);
+ hlist_bl_add_head(&new->d_in_lookup_hash, b);
+ hlist_bl_unlock(b);
+ return new;
+}
+EXPORT_SYMBOL(d_duplicate);
+
struct dentry *d_alloc_anon(struct super_block *sb)
{
return __d_alloc(sb, NULL);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d22c308043bb..b4663a1a0636 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -259,6 +259,7 @@ extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
+struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag.
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
` (4 preceding siblings ...)
2026-04-30 2:03 ` [PATCH v4 5/7] VFS: add d_duplicate() NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 2:03 ` [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
2026-04-30 7:50 ` [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops Jeff Layton
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
Some ->lookup handlers will need to drop and retake the parent lock, so
they can safely use d_alloc_parallel().
->lookup can be called with the parent lock either exclusive or shared.
A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
This is rather ugly, but will be gone soon after we move
d_alloc_parallel() out of the directory lock as ->lookup() will *always*
called with a shared lock on the parent.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 7 ++++---
include/linux/namei.h | 3 ++-
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index a6349b31fdb6..e77ba9d31857 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1928,7 +1928,7 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
struct inode *inode = dir->d_inode;
struct dentry *res;
inode_lock_shared(inode);
- res = __lookup_slow(name, dir, flags);
+ res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
inode_unlock_shared(inode);
return res;
}
@@ -1942,7 +1942,7 @@ static struct dentry *lookup_slow_killable(const struct qstr *name,
if (inode_lock_shared_killable(inode))
return ERR_PTR(-EINTR);
- res = __lookup_slow(name, dir, flags);
+ res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
inode_unlock_shared(inode);
return res;
}
@@ -4413,6 +4413,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
struct dentry *dentry;
int error, create_error = 0;
umode_t mode = op->mode;
+ unsigned int shared_flag = (op->open_flag & O_CREAT) ? 0 : LOOKUP_SHARED;
if (unlikely(IS_DEADDIR(dir_inode)))
return ERR_PTR(-ENOENT);
@@ -4480,7 +4481,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
if (d_in_lookup(dentry)) {
struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
- nd->flags);
+ nd->flags | shared_flag);
d_lookup_done(dentry);
if (unlikely(res)) {
if (IS_ERR(res)) {
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ad6dd9987b9..b3346a513d8f 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -37,8 +37,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
#define LOOKUP_CREATE BIT(17) /* ... in object creation */
#define LOOKUP_EXCL BIT(18) /* ... in target must not exist */
#define LOOKUP_RENAME_TARGET BIT(19) /* ... in destination of rename() */
+#define LOOKUP_SHARED BIT(20) /* Parent lock is held shared */
-/* 4 spare bits for intent */
+/* 3 spare bits for intent */
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS BIT(24) /* No symlink crossing. */
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci()
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
` (5 preceding siblings ...)
2026-04-30 2:03 ` [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag NeilBrown
@ 2026-04-30 2:03 ` NeilBrown
2026-04-30 7:50 ` [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops Jeff Layton
7 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 2:03 UTC (permalink / raw)
To: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara, Jeff Layton,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
From: NeilBrown <neil@brown.name>
A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.
We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.
So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 7 +++++++
fs/dcache.c | 18 +++++++++++++++++-
fs/ntfs/namei.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
include/linux/dcache.h | 3 ++-
5 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 5cc6ae19845c..146720fc9f6f 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1391,3 +1391,10 @@ either form of manual loop.
**mandatory**
d_alloc_parallel() no longer requires a waitqueue_head.
+
+---
+
+**mandatory**
+
+d_add_ci() must now be passed the flags arguemnt that was given to ->lookup
+
diff --git a/fs/dcache.c b/fs/dcache.c
index 1943607f7547..43cd0765670f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2275,6 +2275,7 @@ EXPORT_SYMBOL(d_obtain_root);
* @dentry: the negative dentry that was passed to the parent's lookup func
* @inode: the inode case-insensitive lookup has found
* @name: the case-exact name to be associated with the returned dentry
+ * @lookup_flags: flags passed to ->lookup
*
* This is to avoid filling the dcache with case-insensitive names to the
* same inode, only the actual correct case is stored in the dcache for
@@ -2287,7 +2288,7 @@ EXPORT_SYMBOL(d_obtain_root);
* the exact case, and return the spliced entry.
*/
struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
- struct qstr *name)
+ struct qstr *name, unsigned int lookup_flags)
{
struct dentry *found, *res;
@@ -2300,6 +2301,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
iput(inode);
return found;
}
+ /*
+ * We are holding parent lock and so don't want to wait for a
+ * d_in_lookup() dentry. We can safely drop the parent lock and
+ * reclaim it as we have exclusive access to dentry as it is
+ * d_in_lookup() (so ->d_parent is stable) and we are near the
+ * end ->lookup() and will shortly drop the lock anyway.
+ */
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_unlock_shared(d_inode(dentry->d_parent));
+ else
+ inode_unlock(d_inode(dentry->d_parent));
if (d_in_lookup(dentry)) {
found = d_alloc_parallel(dentry->d_parent, name);
if (IS_ERR(found) || !d_in_lookup(found)) {
@@ -2313,6 +2325,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return ERR_PTR(-ENOMEM);
}
}
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_lock_shared(d_inode(dentry->d_parent));
+ else
+ inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
res = d_splice_alias(inode, found);
if (res) {
d_lookup_done(found);
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 10894de519c3..e2f3430c2e6d 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -310,7 +310,7 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
}
nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
- dent = d_add_ci(dent, dent_inode, &nls_name);
+ dent = d_add_ci(dent, dent_inode, &nls_name, flags);
kfree(nls_name.name);
return dent;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 325c2200c501..db0beb3831a9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -369,7 +369,7 @@ xfs_vn_ci_lookup(
/* else case-insensitive match... */
dname.name = ci_name.name;
dname.len = ci_name.len;
- dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+ dentry = d_add_ci(dentry, VFS_I(ip), &dname, flags);
kfree(ci_name.name);
return dentry;
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b4663a1a0636..9553bffbb098 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -263,7 +263,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+ unsigned int);
extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name);
extern struct dentry *d_find_any_alias(struct inode *inode);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops
2026-04-30 2:03 [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops NeilBrown
` (6 preceding siblings ...)
2026-04-30 2:03 ` [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
@ 2026-04-30 7:50 ` Jeff Layton
2026-04-30 9:08 ` NeilBrown
7 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2026-04-30 7:50 UTC (permalink / raw)
To: NeilBrown, Linus Torvalds, Al Viro, Christian Brauner, Jan Kara,
Amir Goldstein
Cc: linux-fsdevel, linux-kernel
On Thu, 2026-04-30 at 12:03 +1000, NeilBrown wrote:
> Following are 7 VFS patches which modify or introduce APIs that will
> allow modifying filesystems so that they will work with a proposed
> change to move d_alloc_paralle() out from the parent i_rw_sem lock.
>
> If these can land in a non-rebasing tree, I can work with individual
> filesystem maintainers to start using these APIs.
>
> I haven't included d_alloc_noblock_return() as it is only needed for one
> fs (ovl) and it is not yet clear that it is the best approach.
>
> I also haven't included the change to d_alloc_name() as that is only
> needed so that I can deprecate d_alloc() and there is no rush for that.
>
> Patch 2/7 is exactly the patch Al proposed in the conversation for v3.
> I have taken the libery of adding a Signed-off-by from Al to match the
> Co-developed-by. I hope that was not inappropriate.
>
> I have been testing this series over NFS mounts from XFS so patches 2
> and 3 don't seem to be causing any problems. The changes in 4/5/6/7
> won't be tested by this, and some cannot be tested until filesystems
> start using new interfaces.
>
> Thanks,
> NeilBrown
>
>
> [PATCH v4 1/7] VFS: fix various typos in documentation for
> [PATCH v4 2/7] VFS: use wait_var_event for waiting in
> [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup
> [PATCH v4 4/7] VFS: introduce d_alloc_noblock()
> [PATCH v4 5/7] VFS: add d_duplicate()
> [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag.
> [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across
I pointed Claude at the version of this in your tree and it spotted a
regression that I think looks legitimate:
2. Lock imbalance on early return: The parent lock is dropped unconditionally before
d_alloc_parallel()/d_alloc(), but three early return paths exit without reacquiring it:
- IS_ERR(found) from d_alloc_parallel()
- !d_in_lookup(found) from d_alloc_parallel()
- !found from d_alloc()
The callers (lookup_slow(), lookup_slow_killable()) unconditionally call inode_unlock_shared()
after ->lookup() returns. If d_add_ci() returns without the lock held, the caller unlocks an
unheld rwsem — corrupting its state.
Cheers,
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops
2026-04-30 7:50 ` [PATCH v4 0/7]] VFS: Prepare to lift lookup out of exclusive lock for directory ops Jeff Layton
@ 2026-04-30 9:08 ` NeilBrown
0 siblings, 0 replies; 11+ messages in thread
From: NeilBrown @ 2026-04-30 9:08 UTC (permalink / raw)
To: Jeff Layton
Cc: Linus Torvalds, Al Viro, Christian Brauner, Jan Kara,
Amir Goldstein, linux-fsdevel, linux-kernel
On Thu, 30 Apr 2026, Jeff Layton wrote:
> On Thu, 2026-04-30 at 12:03 +1000, NeilBrown wrote:
> > Following are 7 VFS patches which modify or introduce APIs that will
> > allow modifying filesystems so that they will work with a proposed
> > change to move d_alloc_paralle() out from the parent i_rw_sem lock.
> >
> > If these can land in a non-rebasing tree, I can work with individual
> > filesystem maintainers to start using these APIs.
> >
> > I haven't included d_alloc_noblock_return() as it is only needed for one
> > fs (ovl) and it is not yet clear that it is the best approach.
> >
> > I also haven't included the change to d_alloc_name() as that is only
> > needed so that I can deprecate d_alloc() and there is no rush for that.
> >
> > Patch 2/7 is exactly the patch Al proposed in the conversation for v3.
> > I have taken the libery of adding a Signed-off-by from Al to match the
> > Co-developed-by. I hope that was not inappropriate.
> >
> > I have been testing this series over NFS mounts from XFS so patches 2
> > and 3 don't seem to be causing any problems. The changes in 4/5/6/7
> > won't be tested by this, and some cannot be tested until filesystems
> > start using new interfaces.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > [PATCH v4 1/7] VFS: fix various typos in documentation for
> > [PATCH v4 2/7] VFS: use wait_var_event for waiting in
> > [PATCH v4 3/7] VFS: enhance d_splice_alias() to handle in-lookup
> > [PATCH v4 4/7] VFS: introduce d_alloc_noblock()
> > [PATCH v4 5/7] VFS: add d_duplicate()
> > [PATCH v4 6/7] VFS: Add LOOKUP_SHARED flag.
> > [PATCH v4 7/7] VFS/xfs/ntfs: drop parent lock across
>
> I pointed Claude at the version of this in your tree and it spotted a
> regression that I think looks legitimate:
>
> 2. Lock imbalance on early return: The parent lock is dropped unconditionally before
> d_alloc_parallel()/d_alloc(), but three early return paths exit without reacquiring it:
> - IS_ERR(found) from d_alloc_parallel()
> - !d_in_lookup(found) from d_alloc_parallel()
> - !found from d_alloc()
>
> The callers (lookup_slow(), lookup_slow_killable()) unconditionally call inode_unlock_shared()
> after ->lookup() returns. If d_add_ci() returns without the lock held, the caller unlocks an
> unheld rwsem — corrupting its state.
Thanks for that - yes that was careless.
The unlock/relock is only needed around d_alloc_parallel() so I've put
it there which make the problem go away.
I've updated the github repp.
New patch below.
Thanks,
NeilBrown
From: NeilBrown <neil@brown.name>
Subject: [PATCH] VFS/xfs/ntfs: drop parent lock across d_alloc_parallel() in
d_add_ci()
A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.
We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.
So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 7 +++++++
fs/dcache.c | 21 +++++++++++++++++++--
fs/ntfs/namei.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
include/linux/dcache.h | 3 ++-
5 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 5cc6ae19845c..146720fc9f6f 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1391,3 +1391,10 @@ either form of manual loop.
**mandatory**
d_alloc_parallel() no longer requires a waitqueue_head.
+
+---
+
+**mandatory**
+
+d_add_ci() must now be passed the flags arguemnt that was given to ->lookup
+
diff --git a/fs/dcache.c b/fs/dcache.c
index 1943607f7547..665ce74eaadc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2275,6 +2275,7 @@ EXPORT_SYMBOL(d_obtain_root);
* @dentry: the negative dentry that was passed to the parent's lookup func
* @inode: the inode case-insensitive lookup has found
* @name: the case-exact name to be associated with the returned dentry
+ * @lookup_flags: flags passed to ->lookup
*
* This is to avoid filling the dcache with case-insensitive names to the
* same inode, only the actual correct case is stored in the dcache for
@@ -2287,7 +2288,7 @@ EXPORT_SYMBOL(d_obtain_root);
* the exact case, and return the spliced entry.
*/
struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
- struct qstr *name)
+ struct qstr *name, unsigned int lookup_flags)
{
struct dentry *found, *res;
@@ -2301,7 +2302,23 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return found;
}
if (d_in_lookup(dentry)) {
+ /*
+ * We are holding parent lock and so don't want to wait
+ * for a d_in_lookup() dentry. We can safely drop the
+ * parent lock and reclaim it as we have exclusive
+ * access to dentry as it is d_in_lookup() (so
+ * ->d_parent is stable) and we are near the end
+ * ->lookup() and will shortly drop the lock anyway.
+ */
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_unlock_shared(d_inode(dentry->d_parent));
+ else
+ inode_unlock(d_inode(dentry->d_parent));
found = d_alloc_parallel(dentry->d_parent, name);
+ if (lookup_flags & LOOKUP_SHARED)
+ inode_lock_shared(d_inode(dentry->d_parent));
+ else
+ inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
if (IS_ERR(found) || !d_in_lookup(found)) {
iput(inode);
return found;
@@ -2311,7 +2328,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
if (!found) {
iput(inode);
return ERR_PTR(-ENOMEM);
- }
+ }
}
res = d_splice_alias(inode, found);
if (res) {
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 10894de519c3..e2f3430c2e6d 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -310,7 +310,7 @@ static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
}
nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
- dent = d_add_ci(dent, dent_inode, &nls_name);
+ dent = d_add_ci(dent, dent_inode, &nls_name, flags);
kfree(nls_name.name);
return dent;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 325c2200c501..db0beb3831a9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -369,7 +369,7 @@ xfs_vn_ci_lookup(
/* else case-insensitive match... */
dname.name = ci_name.name;
dname.len = ci_name.len;
- dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+ dentry = d_add_ci(dentry, VFS_I(ip), &dname, flags);
kfree(ci_name.name);
return dentry;
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b4663a1a0636..9553bffbb098 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -263,7 +263,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+ unsigned int);
extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name);
extern struct dentry *d_find_any_alias(struct inode *inode);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread