* Re: [PATCH mm-unstable v15 11/13] mm/khugepaged: avoid unnecessary mTHP collapse attempts
From: David Hildenbrand (Arm) @ 2026-03-12 21:19 UTC (permalink / raw)
To: Nico Pache, linux-doc, linux-kernel, linux-mm, linux-trace-kernel
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <20260226032631.234234-1-npache@redhat.com>
On 2/26/26 04:26, Nico Pache wrote:
> There are cases where, if an attempted collapse fails, all subsequent
> orders are guaranteed to also fail. Avoid these collapse attempts by
> bailing out early.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 35 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1c3711ed4513..388d3f2537e2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1492,9 +1492,42 @@ static int mthp_collapse(struct mm_struct *mm, unsigned long address,
> ret = collapse_huge_page(mm, collapse_address, referenced,
> unmapped, cc, mmap_locked,
> order);
> - if (ret == SCAN_SUCCEED) {
> +
> + switch (ret) {
> + /* Cases were we continue to next collapse candidate */
> + case SCAN_SUCCEED:
> collapsed += nr_pte_entries;
> + fallthrough;
> + case SCAN_PTE_MAPPED_HUGEPAGE:
> continue;
> + /* Cases were lower orders might still succeed */
> + case SCAN_LACK_REFERENCED_PAGE:
> + case SCAN_EXCEED_NONE_PTE:
> + case SCAN_EXCEED_SWAP_PTE:
> + case SCAN_EXCEED_SHARED_PTE:
> + case SCAN_PAGE_LOCK:
> + case SCAN_PAGE_COUNT:
> + case SCAN_PAGE_LRU:
> + case SCAN_PAGE_NULL:
> + case SCAN_DEL_PAGE_LRU:
> + case SCAN_PTE_NON_PRESENT:
> + case SCAN_PTE_UFFD_WP:
> + case SCAN_ALLOC_HUGE_PAGE_FAIL:
> + goto next_order;
> + /* Cases were no further collapse is possible */
> + case SCAN_CGROUP_CHARGE_FAIL:
> + case SCAN_COPY_MC:
> + case SCAN_ADDRESS_RANGE:
> + case SCAN_NO_PTE_TABLE:
> + case SCAN_ANY_PROCESS:
> + case SCAN_VMA_NULL:
> + case SCAN_VMA_CHECK:
> + case SCAN_SCAN_ABORT:
> + case SCAN_PAGE_ANON:
> + case SCAN_PMD_MAPPED:
> + case SCAN_FAIL:
> + default:
> + return collapsed;
> }
> }
>
LGTM, but I do wonder, given that you have the "default" case, why spell
out the ones that fall into the "default" category? I'd strip those
/* For all other cases no futher collapse is possible */
default:
return collapsed;
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH mm-unstable v15 12/13] mm/khugepaged: run khugepaged for all orders
From: David Hildenbrand (Arm) @ 2026-03-12 21:22 UTC (permalink / raw)
To: Nico Pache, linux-doc, linux-kernel, linux-mm, linux-trace-kernel
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <20260226032650.234386-1-npache@redhat.com>
On 2/26/26 04:26, Nico Pache wrote:
> From: Baolin Wang <baolin.wang@linux.alibaba.com>
>
> If any order (m)THP is enabled we should allow running khugepaged to
> attempt scanning and collapsing mTHPs. In order for khugepaged to operate
> when only mTHP sizes are specified in sysfs, we must modify the predicate
> function that determines whether it ought to run to do so.
>
> This function is currently called hugepage_pmd_enabled(), this patch
> renames it to hugepage_enabled() and updates the logic to check to
> determine whether any valid orders may exist which would justify
> khugepaged running.
>
> We must also update collapse_allowable_orders() to check all orders if
> the vma is anonymous and the collapse is khugepaged.
>
> After this patch khugepaged mTHP collapse is fully enabled.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
Nothing jumped at me
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v2 2/3] lib/bootconfig: check bounds before writing in __xbc_open_brace()
From: Andrew Morton @ 2026-03-12 21:28 UTC (permalink / raw)
To: Josh Law
Cc: Steven Rostedt, Masami Hiramatsu, Josh Law, linux-kernel,
linux-trace-kernel
In-Reply-To: <4c426803-91f8-48fc-ae8e-20676479b370@gmail.com>
On Thu, 12 Mar 2026 21:09:52 +0000 Josh Law <hlcj1234567@gmail.com> wrote:
> > That's a fair point, Steve. Given that brace_index isn't touched elsewhere and the current check effectively prevents the overflow, I agree this isn't strictly necessary. I'll drop this patch and stick with the fix for the off-by-one reporting error instead. Thanks for the feedback!
>
> Wait Steve,
> Thanks for the look. I see your point that it's currently redundant given the call patterns. It looks like Andrew has already merged this into the -mm tree, likely as a 'belt-and-suspenders' safety measure. I'll keep your feedback in mind for future cleanup, but I'm glad we got the other off-by-one fix in as well!
Please wordwrap the emails.
> And in my opinion, merging it is a decent idea.
You've changed your position without explaining why?
^ permalink raw reply
* Re: [PATCH v2 2/3] lib/bootconfig: check bounds before writing in __xbc_open_brace()
From: Josh Law @ 2026-03-12 21:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Steven Rostedt, Masami Hiramatsu, Josh Law, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260312142809.cb92d6b5197e89645edab80b@linux-foundation.org>
12 Mar 2026 21:28:11 Andrew Morton <akpm@linux-foundation.org>:
> On Thu, 12 Mar 2026 21:09:52 +0000 Josh Law <hlcj1234567@gmail.com> wrote:
>
>>> That's a fair point, Steve. Given that brace_index isn't touched elsewhere and the current check effectively prevents the overflow, I agree this isn't strictly necessary. I'll drop this patch and stick with the fix for the off-by-one reporting error instead. Thanks for the feedback!
>>
>> Wait Steve,
>> Thanks for the look. I see your point that it's currently redundant given the call patterns. It looks like Andrew has already merged this into the -mm tree, likely as a 'belt-and-suspenders' safety measure. I'll keep your feedback in mind for future cleanup, but I'm glad we got the other off-by-one fix in as well!
>
> Please wordwrap the emails.
>
>> And in my opinion, merging it is a decent idea.
>
> You've changed your position without explaining why?
Sorry, I think it should be merged because it's better to be safe than sorry, I know there is different methods of implementation, but this one still works... I know it's churn (and I'm sorry)
^ permalink raw reply
* [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates
in a given directory.
There are a bunch of VFS patches which introduce some new APIs and
improve existing ones. Then a bunch of per-filesystem changes which
adjust to meet new needs, often using the new APIs, then a final bunch
of VFS patches which discard some APIs that are no longer wanted, and
one (the second last) which makes the big change. Some of the fs
patches don't depend on any preceeding patch and if maintainers wanted
to take those early I certainly wouldn't object! I've put a '*' next
to patches which I think can be taken at any time.
My longer term goal involves pushing the parent-directory locking down
into filesystems (which can then discard it if it isn't needed) and using
exclusive dentry locking in the VFS for all directory operations other
than readdir - which by its nature needs shared locking and will
continue to use the directory lock.
The VFS already has exclusive dentry locking for the limited case of
lookup. Newly created dentries (when created by d_alloc_parallel()) are
exclusively locked using the DCACHE_PAR_LOOKUP bit. They remain
exclusive locked until they are hashed as negative or positive dentries,
or they are discarded.
DCACHE_PAR_LOOKUP currently depends on a shared parent lock to exclude
directory modifying operations. This patch set removes this dependency
so that d_alloc_parallel() can be called without locking and all
directory modifying operations receive either a hashed dentry or an
in-lookup dentry (they currently recieve either a hashed or unhashed,
or sometimes in-lookup (atomic_open only)).
The cases where a filesystem can receive an in-lookup dentry are:
- lookup. Currently can receive in-lookup or unhashed. After this patch set
it always receives in-lookup
- atomic_open. Currently can receive in-lookup or hashed-negative.
This doesn't change with this patchset.
- rename. currently can receive hashed or unhashed. After this patchset
can also receive in-lookup where previously it would receive unhashed.
This is only for the target of a rename over NFS.
- link, mknod, mkdir, symlink. currently received hashed-negative except for
NFS which notices the implied exclusive create and skips the lookup so
the filesystem can received unhashed-negative for the operation.
There are two particular needs to be addressed before we can use d_alloc_parallel()
outside of the directory lock.
1/ d_alloc_parallel() effects a blocking lock so lock ordering is important.
If we are to take the directory lock *after* calling d_alloc_parallel() (and
still holding an in-lookup dentry, as happens at least when ->atomic_open
is called) then we must never call d_alloc_parallel() while holding the
directory lock, even a shared lock.
This particularly affects readdir as several filesystems prime the dcache
with readdir results and so use d_alloc_parallel() in the ->iterate_shared
handler, which will now have deadlock potential. To address this we
introduce d_alloc_noblock() which fails rather than blocking.
A few other cases of potential lock inversion exist. These are
addressed by dropping the directory lock when it is safe to do so
before calling d_alloc_parallel(). This requires the addtion of
LOOKUP_SHARED so that ->lookup knows how the parent is locked. This
is ugly but is gone by the end of the series. After the locking is
rearranged in the second last patch, ->lookup is only ever called
with a shared lock.
2/ As d_alloc_parallel() will be able to run without the directory lock,
holding that lock exclusively is not enough to protect some dcache
manipulations. In particular, several filesystems d_drop() a dentry
and (possibly) re-hash it. This will no longer be safe as
d_alloc_parallel() could run while the dentry was dropped, would find
that name doesn't exist in the dcache, and would create a new dentry
leading to two uncoordinated dentries with the same name.
It will still be safe to d_drop() a dentry after the operation has
completed, whether in success or failure. But d_drop()ing before that
is best avoided. An early d_drop() that isn't followed by a rehash is
not clearly problematic for a filesystem which still uses parent locking
(as all do at present) but is good to discourage that pattern now.
This is addressed, in part, by changing d_splice_alias() to be able to
instantiate any negative dentry, whether hashed, unhashed, or
in-lookup. This removes the need for d_drop() in most cases.
New APIs added are:
- d_alloc_noblock - see patch 05 for details
- d_duplicate - patch 06
Removed APIs:
- d_alloc
- d_rehash
- d_add
- lookup_one
- lookup_noperm
Changed APIs:
- d_alloc_paralle - no longer requires a waitqueue_head_t
- d_splice_alias - now works with in-lookup dentry
- d_alloc_name - now works with ->d_hash
d_alloc_name() should be used with d_make_persistent(). These don't require
VFS locking as the filesystem doesn't permit create/remove via VFS calls,
and provides its own locking to avoid duplicate names.
d_splice_alias() should *always* be used:
in ->lookup
in ->iterate_shared for cache priming.
in ->atomic_open, possibly via a call to ->lookup
in ->mkdir unless d_instantiate_new() can be used.
in ->link ->symlink ->mknod if ->lookup skips LOOKUP_CREATE|LOOKUP_EXCL
Thanks for reading this far! I've been testing NFS but haven't tried
anything else yet. As well as the normal review of details I'd love to
know if I've missed any important conseqeunces of the locking change.
It is a big conceptual change and there could easily be surprising
implications.
Thanks,
NeilBrown
[PATCH 01/53] VFS: fix various typos in documentation for
[PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup
[PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
[PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
[PATCH 05/53] VFS: introduce d_alloc_noblock()
[PATCH 06/53] VFS: add d_duplicate()
[PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
[PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in
*[PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from
[PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
[PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
[PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in
[PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
[PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
[PATCH 15/53] nfs: use d_duplicate()
*[PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
*[PATCH 17/53] coda: don't d_drop() early.
[PATCH 18/53] shmem: use d_duplicate()
*[PATCH 19/53] afs: use d_time instead of d_fsdata
*[PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
[PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
[PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename()
[PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
[PATCH 24/53] afs: use d_duplicate()
*[PATCH 25/53] smb/client: use d_time to store a timestamp in dentry,
*[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
*[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
[PATCH 28/53] smb/client: Use d_alloc_noblock() in
*[PATCH 29/53] exfat: simplify exfat_lookup()
*[PATCH 30/53] configfs: remove d_add() calls before
[PATCH 31/53] configfs: stop using d_add().
*[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
*[PATCH 33/53] ext4: use on-stack dentries in
[PATCH 34/53] tracefs: stop using d_add().
[PATCH 35/53] cephfs: stop using d_add().
*[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
[PATCH 37/53] cephfs: Use d_alloc_noblock() in
[PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
[PATCH 39/53] ecryptfs: stop using d_add().
[PATCH 40/53] gfs2: stop using d_add().
[PATCH 41/53] libfs: stop using d_add().
[PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
[PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
[PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in
[PATCH 45/53] efivarfs: use d_alloc_name()
[PATCH 46/53] Remove references to d_add() in documentation and
[PATCH 47/53] VFS: make d_alloc() local to VFS.
[PATCH 48/53] VFS: remove d_add()
[PATCH 49/53] VFS: remove d_rehash()
[PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
[PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
[PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
[PATCH 53/53] VFS: remove LOOKUP_SHARED
^ permalink raw reply
* [PATCH 01/53] VFS: fix various typos in documentation for start_creating start_removing etc
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Various typos fixes.
start_creating_dentry() now documented as *creating*, not *removing* the
entry.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 8 +++----
fs/namei.c | 30 +++++++++++++--------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index d02aa57e4477..560b473e02d0 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1203,7 +1203,7 @@ will fail-safe.
---
-** mandatory**
+**mandatory**
lookup_one(), lookup_one_unlocked(), lookup_one_positive_unlocked() now
take a qstr instead of a name and len. These, not the "one_len"
@@ -1212,7 +1212,7 @@ that filesysmtem, through a mount point - which will have a mnt_idmap.
---
-** mandatory**
+**mandatory**
Functions try_lookup_one_len(), lookup_one_len(),
lookup_one_len_unlocked() and lookup_positive_unlocked() have been
@@ -1229,7 +1229,7 @@ already been performed such as after vfs_path_parent_lookup()
---
-** mandatory**
+**mandatory**
d_hash_and_lookup() is no longer exported or available outside the VFS.
Use try_lookup_noperm() instead. This adds name validation and takes
@@ -1370,7 +1370,7 @@ lookup_one_qstr_excl() is no longer exported - use start_creating() or
similar.
---
-** mandatory**
+**mandatory**
lock_rename(), lock_rename_child(), unlock_rename() are no
longer available. Use start_renaming() or similar.
diff --git a/fs/namei.c b/fs/namei.c
index 77189335bbcc..6ffb8367b1cf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2936,8 +2936,8 @@ struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
* end_dirop - signal completion of a dirop
* @de: the dentry which was returned by start_dirop or similar.
*
- * If the de is an error, nothing happens. Otherwise any lock taken to
- * protect the dentry is dropped and the dentry itself is release (dput()).
+ * If the @de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
*/
void end_dirop(struct dentry *de)
{
@@ -3254,7 +3254,7 @@ EXPORT_SYMBOL(lookup_one_unlocked);
* the i_rwsem itself if necessary. If a fatal signal is pending or
* delivered, it will return %-EINTR if the lock is needed.
*
- * Returns: A dentry, possibly negative, or
+ * Returns: A positive dentry, or
* - same errors as lookup_one_unlocked() or
* - ERR_PTR(-EINTR) if a fatal signal is pending.
*/
@@ -3376,7 +3376,7 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
/**
- * start_creating - prepare to create a given name with permission checking
+ * start_creating - prepare to access or create a given name with permission checking
* @idmap: idmap of the mount
* @parent: directory in which to prepare to create the name
* @name: the name to be created
@@ -3408,8 +3408,8 @@ EXPORT_SYMBOL(start_creating);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
- * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * Locks are taken and a lookup is performed prior to removing an object
+ * from a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
* If the name doesn't exist, an error is returned.
@@ -3435,7 +3435,7 @@ EXPORT_SYMBOL(start_removing);
* @parent: directory in which to prepare to create the name
* @name: the name to be created
*
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
* an object in a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
@@ -3464,7 +3464,7 @@ EXPORT_SYMBOL(start_creating_killable);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
* an object from a directory. Permission checking (MAY_EXEC) is performed
* against @idmap.
*
@@ -3494,7 +3494,7 @@ EXPORT_SYMBOL(start_removing_killable);
* @parent: directory in which to prepare to create the name
* @name: the name to be created
*
- * Locks are taken and a lookup in performed prior to creating
+ * Locks are taken and a lookup is performed prior to creating
* an object in a directory.
*
* If the name already exists, a positive dentry is returned.
@@ -3517,7 +3517,7 @@ EXPORT_SYMBOL(start_creating_noperm);
* @parent: directory in which to find the name
* @name: the name to be removed
*
- * Locks are taken and a lookup in performed prior to removing
+ * Locks are taken and a lookup is performed prior to removing
* an object from a directory.
*
* If the name doesn't exist, an error is returned.
@@ -3538,11 +3538,11 @@ struct dentry *start_removing_noperm(struct dentry *parent,
EXPORT_SYMBOL(start_removing_noperm);
/**
- * start_creating_dentry - prepare to create a given dentry
- * @parent: directory from which dentry should be removed
- * @child: the dentry to be removed
+ * start_creating_dentry - prepare to access or create a given dentry
+ * @parent: directory of dentry
+ * @child: the dentry to be prepared
*
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
* the validity of the dentry is checked: correct parent and still hashed.
*
* If the dentry is valid and negative a reference is taken and
@@ -3575,7 +3575,7 @@ EXPORT_SYMBOL(start_creating_dentry);
* @parent: directory from which dentry should be removed
* @child: the dentry to be removed
*
- * A lock is taken to protect the dentry again other dirops and
+ * A lock is taken to protect the dentry against other dirops and
* the validity of the dentry is checked: correct parent and still hashed.
*
* If the dentry is valid and positive, a reference is taken and
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup dentries
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
We currently have three interfaces for attaching existing inodes to
normal filesystems(*).
- d_add() requires an unhashed or in-lookup dentry and doesn't handle
splicing in case a directory already has dentry
- d_instantiate() requires a hashed dentry, and also doesn't handle
splicing.
- d_splice_alias() requires unhashed or in-lookup and does handle
splicing, and can return an alternate dentry.
So there is no interface that supports both hashed and in-lookup, which
is what ->atomic_open needs to deal with.
Some filesystems check for in-lookup in their atomic_open and if found,
perform a ->lookup and can subsequently use d_instantiate() if the
dentry is still negative. Other d_drop() the dentry so they can use
d_splice_alias().
This last will cause a problem for proposed changes to locking which
require the dentry to remain hashed while and operation proceeds on it.
There is also no interface which splices a directory (which might
already have a dentry) to a hashed dentry. Filesystems which need to do
this d_drop() first.
So with this patch d_splice_alias() can handle hashed, unhashed, or
in-lookup dentries. This makes it suitable for ->lookup, ->atomic_open,
and ->mkdir.
As a side effect d_add() will also now handle hashed dentries, but
future patches will remove d_add() as there is no benefit having it as
well as the others.
__d_add() currently contains code that is identical to
__d_instantiate(), so the former is changed to call the later, and both
d_add() and d_instantiate() call __d_add().
* There is also d_make_persistent() for filesystems which are
dcache-based and don't support mkdir, create etc, and
d_instantiate_new() for newly created inodes that are still locked.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/vfs.rst | 4 ++--
fs/dcache.c | 31 ++++++++++++-------------------
2 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..d8df0a84cdba 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -507,8 +507,8 @@ otherwise noted.
dentry before the first mkdir returns.
If there is any chance this could happen, then the new inode
- should be d_drop()ed and attached with d_splice_alias(). The
- returned dentry (if any) should be returned by ->mkdir().
+ should be attached with d_splice_alias(). The returned
+ dentry (if any) should be returned by ->mkdir().
``rmdir``
called by the rmdir(2) system call. Only required if you want
diff --git a/fs/dcache.c b/fs/dcache.c
index 7ba1801d8132..2a100c616576 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2001,7 +2001,6 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
* (or otherwise set) by the caller to indicate that it is now
* in use by the dcache.
*/
-
void d_instantiate(struct dentry *entry, struct inode * inode)
{
BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
@@ -2755,18 +2754,14 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
dir = dentry->d_parent->d_inode;
n = start_dir_add(dir);
d_wait = __d_lookup_unhash(dentry);
+ __d_rehash(dentry);
+ } else if (d_unhashed(dentry)) {
+ __d_rehash(dentry);
}
if (unlikely(ops))
d_set_d_op(dentry, ops);
- if (inode) {
- unsigned add_flags = d_flags_for_inode(inode);
- hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
- raw_write_seqcount_begin(&dentry->d_seq);
- __d_set_inode_and_type(dentry, inode, add_flags);
- raw_write_seqcount_end(&dentry->d_seq);
- fsnotify_update_flags(dentry);
- }
- __d_rehash(dentry);
+ if (inode)
+ __d_instantiate(dentry, inode);
if (dir)
end_dir_add(dir, n, d_wait);
spin_unlock(&dentry->d_lock);
@@ -3066,8 +3061,6 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
if (IS_ERR(inode))
return ERR_CAST(inode);
- BUG_ON(!d_unhashed(dentry));
-
if (!inode)
goto out;
@@ -3116,6 +3109,8 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
* @inode: the inode which may have a disconnected dentry
* @dentry: a negative dentry which we want to point to the inode.
*
+ * @dentry must be negative and may be in-lookup or unhashed or hashed.
+ *
* If inode is a directory and has an IS_ROOT alias, then d_move that in
* place of the given dentry and return it, else simply d_add the inode
* to the dentry and return NULL.
@@ -3123,16 +3118,14 @@ struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
* If a non-IS_ROOT directory is found, the filesystem is corrupt, and
* we should error out: directories can't have multiple aliases.
*
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
+ * This should be used to return the result of ->lookup() and to
+ * instantiate the result of ->mkdir(), is often useful for
+ * ->atomic_open, and may be used to instantiate other objects.
*
* If a dentry was found and moved, then it is returned. Otherwise NULL
- * is returned. This matches the expected return value of ->lookup.
+ * is returned. This matches the expected return value of ->lookup and
+ * ->mkdir.
*
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
*/
struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
efivarfs() is similar to other filesystems which use d_alloc_name(), but
it cannot use d_alloc_name() as it has a ->d_hash function.
The only problem with using ->d_hash if available is that it can return
an error, but d_alloc_name() cannot. If we document that d_alloc_name()
cannot be used when ->d_hash returns an error, then any filesystem which
has a safe ->d_hash can safely use d_alloc_name().
So enhance d_alloc_name() to check for a ->d_hash function
and document that this is not permitted if the ->d_hash function can
fail( which efivarfs_d_hash() cannot).
Also document locking requirements for use.
This is a step towards eventually deprecating d_alloc().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/dcache.c b/fs/dcache.c
index 2a100c616576..6dfc2c7110ba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1878,12 +1878,29 @@ struct dentry *d_alloc_pseudo(struct super_block *sb, const struct qstr *name)
return dentry;
}
+/**
+ * d_alloc_name: allocate a dentry for use in a dcache-based filesystem.
+ * @parent: dentry of the parent for the dentry
+ * @name: name of the dentry
+ *
+ * d_alloc_name() allocates a dentry without any protection against races.
+ * It should only be used in directories that do not support create/rename/link
+ * inode operations. The result is typically passed to d_make_persistent().
+ *
+ * This must NOT be used by filesystems which provide a d_hash() function
+ * which can return an error.
+ */
struct dentry *d_alloc_name(struct dentry *parent, const char *name)
{
struct qstr q;
q.name = name;
q.hash_len = hashlen_string(parent, name);
+ if (parent->d_flags & DCACHE_OP_HASH) {
+ int err = parent->d_op->d_hash(parent, &q);
+ if (WARN_ON_ONCE(err))
+ return NULL;
+ }
return d_alloc(parent, &q);
}
EXPORT_SYMBOL(d_alloc_name);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
d_alloc_parallel() currently requires a wait_queue_head to be passed in.
This must have a life time which extends until the lookup is completed.
Future patches will make more use of d_alloc_parallel() and use it in
contexts where having an on-stack wait_queue_head is not convenient.
For exaple lookup_one_qstr_excl() can only use d_alloc_parallel() if it
accepts a wait_queue_head pass in by the caller.
The interface would be easier to use if this need were removed. Rather
than passing a wait_queue_head into lookup_one_qstr_excl() we can let
d_alloc_parallel() manage the wait_queue_head entirely itself.
This patch replaces the on-stack wqs with a global array of wqs which
are used as needed. A wq is NOT assigned when a dentry is first
created but only when a second thread attempts to use the same name and
so is forced to wait. At this moment a wq is chosen using a hash of the
dentry pointer and that wq is assigned to ->d_wait. The ->d_lock is
then dropped and the task waits.
When the dentry is finally moved out of "in_lookup" a wake up is only
sent if ->d_wait is not NULL. This avoids an (uncontended) spin
lock/unlock which saves a couple of atomic operations in a common case.
The wake up passes the dentry that the wake up is for as the "key" and
the waiter will only wake processes waiting on the same key. This means
that when these global waitqueues are shared (which is inevitable
though unlikely to be frequent), a task will not be woken prematurely.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 6 +++
fs/afs/dir_silly.c | 4 +-
fs/dcache.c | 78 ++++++++++++++++++++++-----
fs/fuse/readdir.c | 3 +-
fs/namei.c | 6 +--
fs/nfs/dir.c | 6 +--
fs/nfs/unlink.c | 3 +-
fs/proc/base.c | 3 +-
fs/proc/proc_sysctl.c | 3 +-
fs/smb/client/readdir.c | 3 +-
include/linux/dcache.h | 3 +-
include/linux/nfs_xdr.h | 1 -
12 files changed, 81 insertions(+), 38 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 560b473e02d0..6a507c508ccf 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1375,3 +1375,9 @@ similar.
lock_rename(), lock_rename_child(), unlock_rename() are no
longer available. Use start_renaming() or similar.
+---
+
+**mandatory**
+
+d_alloc_parallel() no longer requires a waitqueue_head. It uses one
+from an internal table when needed.
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index a748fd133faf..982bb6ec15f0 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -248,13 +248,11 @@ int afs_silly_iput(struct dentry *dentry, struct inode *inode)
struct dentry *alias;
int ret;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
down_read(&dvnode->rmdir_lock);
- alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq);
+ alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name);
if (IS_ERR(alias)) {
up_read(&dvnode->rmdir_lock);
return 0;
diff --git a/fs/dcache.c b/fs/dcache.c
index 6dfc2c7110ba..c80406bfa0d8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2199,8 +2199,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return found;
}
if (d_in_lookup(dentry)) {
- found = d_alloc_parallel(dentry->d_parent, name,
- dentry->d_wait);
+ found = d_alloc_parallel(dentry->d_parent, name);
if (IS_ERR(found) || !d_in_lookup(found)) {
iput(inode);
return found;
@@ -2210,7 +2209,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
if (!found) {
iput(inode);
return ERR_PTR(-ENOMEM);
- }
+ }
}
res = d_splice_alias(inode, found);
if (res) {
@@ -2576,6 +2575,46 @@ void d_rehash(struct dentry * entry)
}
EXPORT_SYMBOL(d_rehash);
+#define PAR_LOOKUP_WQ_BITS 8
+#define PAR_LOOKUP_WQS (1 << PAR_LOOKUP_WQ_BITS)
+static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS] __cacheline_aligned;
+
+static int __init par_wait_init(void)
+{
+ int i;
+
+ for (i = 0; i < PAR_LOOKUP_WQS; i++)
+ init_waitqueue_head(&par_wait_table[i]);
+ return 0;
+}
+fs_initcall(par_wait_init);
+
+struct par_wait_key {
+ struct dentry *de;
+ struct wait_queue_entry wqe;
+};
+
+static int d_wait_wake_fn(struct wait_queue_entry *wq_entry,
+ unsigned mode, int sync, void *key)
+{
+ struct par_wait_key *pwk = container_of(wq_entry,
+ struct par_wait_key, wqe);
+ if (pwk->de == key)
+ return default_wake_function(wq_entry, mode, sync, key);
+ return 0;
+}
+
+static inline void d_wake_waiters(struct wait_queue_head *d_wait,
+ struct dentry *dentry)
+{
+ /* ->d_wait is only set if some thread is actually waiting.
+ * If we find it is NULL - the common case - then there was no
+ * contention and there are no waiters to be woken.
+ */
+ if (d_wait)
+ __wake_up(d_wait, TASK_NORMAL, 0, dentry);
+}
+
static inline unsigned start_dir_add(struct inode *dir)
{
preempt_disable_nested();
@@ -2588,31 +2627,42 @@ static inline unsigned start_dir_add(struct inode *dir)
}
static inline void end_dir_add(struct inode *dir, unsigned int n,
- wait_queue_head_t *d_wait)
+ wait_queue_head_t *d_wait, struct dentry *de)
{
smp_store_release(&dir->i_dir_seq, n + 2);
preempt_enable_nested();
- if (wq_has_sleeper(d_wait))
- wake_up_all(d_wait);
+ d_wake_waiters(d_wait, de);
}
static void d_wait_lookup(struct dentry *dentry)
{
if (d_in_lookup(dentry)) {
- DECLARE_WAITQUEUE(wait, current);
- add_wait_queue(dentry->d_wait, &wait);
+ struct par_wait_key wk = {
+ .de = dentry,
+ .wqe = {
+ .private = current,
+ .func = d_wait_wake_fn,
+ },
+ };
+ struct wait_queue_head *wq;
+
+ if (!dentry->d_wait)
+ dentry->d_wait = &par_wait_table[hash_ptr(dentry,
+ PAR_LOOKUP_WQ_BITS)];
+ wq = dentry->d_wait;
+ add_wait_queue(wq, &wk.wqe);
do {
set_current_state(TASK_UNINTERRUPTIBLE);
spin_unlock(&dentry->d_lock);
schedule();
spin_lock(&dentry->d_lock);
} while (d_in_lookup(dentry));
+ remove_wait_queue(wq, &wk.wqe);
}
}
struct dentry *d_alloc_parallel(struct dentry *parent,
- const struct qstr *name,
- wait_queue_head_t *wq)
+ const struct qstr *name)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2625,6 +2675,7 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
return ERR_PTR(-ENOMEM);
new->d_flags |= DCACHE_PAR_LOOKUP;
+ new->d_wait = NULL;
spin_lock(&parent->d_lock);
new->d_parent = dget_dlock(parent);
hlist_add_head(&new->d_sib, &parent->d_children);
@@ -2715,7 +2766,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
return dentry;
}
rcu_read_unlock();
- new->d_wait = wq;
hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
hlist_bl_unlock(b);
return new;
@@ -2753,7 +2803,7 @@ static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
void __d_lookup_unhash_wake(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
- wake_up_all(__d_lookup_unhash(dentry));
+ d_wake_waiters(__d_lookup_unhash(dentry), dentry);
spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(__d_lookup_unhash_wake);
@@ -2780,7 +2830,7 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode,
if (inode)
__d_instantiate(dentry, inode);
if (dir)
- end_dir_add(dir, n, d_wait);
+ end_dir_add(dir, n, d_wait, dentry);
spin_unlock(&dentry->d_lock);
if (inode)
spin_unlock(&inode->i_lock);
@@ -2964,7 +3014,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
write_seqcount_end(&dentry->d_seq);
if (dir)
- end_dir_add(dir, n, d_wait);
+ end_dir_add(dir, n, d_wait, target);
if (dentry->d_parent != old_parent)
spin_unlock(&dentry->d_parent->d_lock);
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index c2aae2eef086..f588252891af 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -160,7 +160,6 @@ static int fuse_direntplus_link(struct file *file,
struct inode *dir = d_inode(parent);
struct fuse_conn *fc;
struct inode *inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
int epoch;
if (!o->nodeid) {
@@ -197,7 +196,7 @@ static int fuse_direntplus_link(struct file *file,
dentry = d_lookup(parent, &name);
if (!dentry) {
retry:
- dentry = d_alloc_parallel(parent, &name, &wq);
+ dentry = d_alloc_parallel(parent, &name);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
}
diff --git a/fs/namei.c b/fs/namei.c
index 6ffb8367b1cf..d31c3db7eb5e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1891,13 +1891,12 @@ static struct dentry *__lookup_slow(const struct qstr *name,
{
struct dentry *dentry, *old;
struct inode *inode = dir->d_inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
/* Don't go there if it's already dead */
if (unlikely(IS_DEADDIR(inode)))
return ERR_PTR(-ENOENT);
again:
- dentry = d_alloc_parallel(dir, name, &wq);
+ dentry = d_alloc_parallel(dir, name);
if (IS_ERR(dentry))
return dentry;
if (unlikely(!d_in_lookup(dentry))) {
@@ -4408,7 +4407,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
struct dentry *dentry;
int error, create_error = 0;
umode_t mode = op->mode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
if (unlikely(IS_DEADDIR(dir_inode)))
return ERR_PTR(-ENOENT);
@@ -4417,7 +4415,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
dentry = d_lookup(dir, &nd->last);
for (;;) {
if (!dentry) {
- dentry = d_alloc_parallel(dir, &nd->last, &wq);
+ dentry = d_alloc_parallel(dir, &nd->last);
if (IS_ERR(dentry))
return dentry;
}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2402f57c8e7d..52e7656195ec 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -727,7 +727,6 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
unsigned long dir_verifier)
{
struct qstr filename = QSTR_INIT(entry->name, entry->len);
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
struct dentry *dentry;
struct dentry *alias;
struct inode *inode;
@@ -756,7 +755,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
dentry = d_lookup(parent, &filename);
again:
if (!dentry) {
- dentry = d_alloc_parallel(parent, &filename, &wq);
+ dentry = d_alloc_parallel(parent, &filename);
if (IS_ERR(dentry))
return;
}
@@ -2107,7 +2106,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned open_flags,
umode_t mode)
{
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
struct nfs_open_context *ctx;
struct dentry *res;
struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -2163,7 +2161,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
d_drop(dentry);
switched = true;
dentry = d_alloc_parallel(dentry->d_parent,
- &dentry->d_name, &wq);
+ &dentry->d_name);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
if (unlikely(!d_in_lookup(dentry)))
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index df3ca4669df6..43ea897943c0 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -124,7 +124,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
struct dentry *alias;
down_read_non_owner(&NFS_I(dir)->rmdir_sem);
- alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
+ alias = d_alloc_parallel(dentry->d_parent, &data->args.name);
if (IS_ERR(alias)) {
up_read_non_owner(&NFS_I(dir)->rmdir_sem);
return 0;
@@ -185,7 +185,6 @@ nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
data->cred = get_current_cred();
data->res.dir_attr = &data->dir_attr;
- init_waitqueue_head(&data->wq);
status = -EBUSY;
spin_lock(&dentry->d_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4eec684baca9..070c0d58b2da 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2129,8 +2129,7 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx,
child = try_lookup_noperm(&qname, dir);
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
- child = d_alloc_parallel(dir, &qname, &wq);
+ child = d_alloc_parallel(dir, &qname);
if (IS_ERR(child))
goto end_instantiate;
if (d_in_lookup(child)) {
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 49ab74e0bfde..04a382178c65 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -692,8 +692,7 @@ static bool proc_sys_fill_cache(struct file *file,
child = d_lookup(dir, &qname);
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
- child = d_alloc_parallel(dir, &qname, &wq);
+ child = d_alloc_parallel(dir, &qname);
if (IS_ERR(child))
return false;
if (d_in_lookup(child)) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 8615a8747b7f..47f5d620b750 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -73,7 +73,6 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions;
bool reparse_need_reval = false;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
int rc;
cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
@@ -105,7 +104,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
(fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
return;
- dentry = d_alloc_parallel(parent, name, &wq);
+ dentry = d_alloc_parallel(parent, name);
}
if (IS_ERR(dentry))
return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 898c60d21c92..c6440c626a0f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -244,8 +244,7 @@ extern void d_delete(struct dentry *);
/* allocate/de-allocate */
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
-extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
- wait_queue_head_t *);
+extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..1acc2479cb38 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1740,7 +1740,6 @@ struct nfs_unlinkdata {
struct nfs_removeargs args;
struct nfs_removeres res;
struct dentry *dentry;
- wait_queue_head_t wq;
const struct cred *cred;
struct nfs_fattr dir_attr;
long timeout;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 05/53] VFS: introduce d_alloc_noblock()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Several filesystems use the results of readdir to prime the dcache.
These filesystems use d_alloc_parallel() which can block if there is a
concurrent lookup. Blocking in that case is pointless as the lookup
will add info to the dcache and there is no value in the readdir waiting
to see if it should add the info too.
Also these calls to d_alloc_parallel() are made while the parent
directory is locked. A proposed change to locking will lock the parent
later, after d_alloc_parallel(). This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.
So this patch introduces d_alloc_noblock() which doesn't block but
instead returns ERR_PTR(-EWOULDBLOCK). Filesystems that prime the
dcache (smb/client, nfs, fuse, cephfs) can now use that and ignore
-EWOULDBLOCK errors as harmless.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 82 ++++++++++++++++++++++++++++++++++++++++--
include/linux/dcache.h | 1 +
2 files changed, 80 insertions(+), 3 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index c80406bfa0d8..f4d7d200bc46 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2661,8 +2661,16 @@ static void d_wait_lookup(struct dentry *dentry)
}
}
-struct dentry *d_alloc_parallel(struct dentry *parent,
- const struct qstr *name)
+/* What to do when __d_alloc_parallel finds a d_in_lookup dentry */
+enum alloc_para {
+ ALLOC_PARA_WAIT,
+ ALLOC_PARA_FAIL,
+};
+
+static inline
+struct dentry *__d_alloc_parallel(struct dentry *parent,
+ const struct qstr *name,
+ enum alloc_para how)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2745,7 +2753,18 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
* wait for them to finish
*/
spin_lock(&dentry->d_lock);
- d_wait_lookup(dentry);
+ if (d_in_lookup(dentry))
+ switch (how) {
+ case ALLOC_PARA_FAIL:
+ spin_unlock(&dentry->d_lock);
+ dput(new);
+ dput(dentry);
+ return ERR_PTR(-EWOULDBLOCK);
+ case ALLOC_PARA_WAIT:
+ d_wait_lookup(dentry);
+ /* ... and continue */
+ }
+
/*
* it's not in-lookup anymore; in principle we should repeat
* everything from dcache lookup, but it's likely to be what
@@ -2774,8 +2793,65 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
dput(dentry);
goto retry;
}
+
+/**
+ * d_alloc_parallel() - allocate a new dentry and ensure uniqueness
+ * @parent - dentry of the parent
+ * @name - name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup(), then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_parallel() waits for
+ * that lookup to complete before returning the dentry and then ensures the
+ * match is still valid.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error %-ENOMEM.
+ */
+struct dentry *d_alloc_parallel(struct dentry *parent,
+ const struct qstr *name)
+{
+ return __d_alloc_parallel(parent, name, ALLOC_PARA_WAIT);
+}
EXPORT_SYMBOL(d_alloc_parallel);
+/**
+ * d_alloc_noblock() - find or allocate a new dentry
+ * @parent - dentry of the parent
+ * @name - name of the dentry within that parent.
+ *
+ * A new dentry is allocated and, providing it is unique, added to the
+ * relevant index.
+ * If an existing dentry is found with the same parent/name that is
+ * not d_in_lookup() then that is returned instead.
+ * If the existing dentry is d_in_lookup(), d_alloc_noblock()
+ * returns with error %-EWOULDBLOCK.
+ * Thus if the returned dentry is d_in_lookup() then the caller has
+ * exclusive access until it completes the lookup.
+ * If the returned dentry is not d_in_lookup() then a lookup has
+ * already completed.
+ *
+ * The @name must already have ->hash set, as can be achieved
+ * by e.g. try_lookup_noperm().
+ *
+ * Returns: the dentry, whether found or allocated, or an error
+ * %-ENOMEM or %-EWOULDBLOCK.
+ */
+struct dentry *d_alloc_noblock(struct dentry *parent,
+ struct qstr *name)
+{
+ return __d_alloc_parallel(parent, name, ALLOC_PARA_FAIL);
+}
+EXPORT_SYMBOL(d_alloc_noblock);
+
/*
* - Unhash the dentry
* - Retrieve and clear the waitqueue head in dentry
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c6440c626a0f..3cb70b3398f0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -245,6 +245,7 @@ extern void d_delete(struct dentry *);
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
+extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 06/53] VFS: add d_duplicate()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Occasionally a single operation can require two sub-operations on the
same name, and it is important that a d_alloc_parallel() (once that can
be run unlocked) does not create another dentry with the same name
between the operations.
Two examples:
1/ rename where the target name (a positive dentry) needs to be
"silly-renamed" to a temporary name so it will remain available on the
server (NFS and AFS). Here the same name needs to be the subject
of one rename, and the target of another.
2/ rename where the subject needs to be replaced with a white-out
(shmemfs). Here the same name need to be the target of a rename
and the target of a mknod()
In both cases the original dentry is renamed to something else, and a
replacement is instantiated, possibly as the target of d_move(), possibly
by d_instantiate().
Currently d_alloc() is used to create the dentry and the exclusive lock
on the parent ensures no other dentry is created. When
d_alloc_parallel() is moved out of the parent lock, this will no longer
be sufficient. In particular if the original is renamed away before the
new is instantiated, there is a window where d_alloc_parallel() could
create another name. "silly-rename" does work in this order. shmemfs
whiteout doesn't open this hole but is essentially the same pattern and
should use the same approach.
The new d_duplicate() creates an in-lookup dentry with the same name as
the original dentry, which must be hashed. There is no need to check if
an in-lookup dentry exists with the same name as d_alloc_parallel() will
never try add one while the hashed dentry exists. Once the new
in-lookup is created, d_alloc_parallel() will find it and wait for it to
complete, then use it.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 52 ++++++++++++++++++++++++++++++++++++++++++
include/linux/dcache.h | 1 +
2 files changed, 53 insertions(+)
diff --git a/fs/dcache.c b/fs/dcache.c
index f4d7d200bc46..c12319097d6e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1832,6 +1832,58 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
}
EXPORT_SYMBOL(d_alloc);
+/**
+ * d_duplicate: duplicate a dentry for combined atomic operation
+ * @dentry: the dentry to duplicate
+ *
+ * Some rename operations need to be combined with another operation
+ * inside the filesystem.
+ * 1/ A cluster filesystem when renaming to an in-use file might need to
+ * first "silly-rename" that target out of the way before the main rename
+ * 2/ A filesystem that supports white-out might want to create a whiteout
+ * in place of the file being moved.
+ *
+ * For this they need two dentries which temporarily have the same name,
+ * before one is renamed. d_duplicate() provides for this. Given a
+ * positive hashed dentry, it creates a second in-lookup dentry.
+ * Because the original dentry exists, no other thread will try to
+ * create an in-lookup dentry, os there can be no race in this create.
+ *
+ * The caller should d_move() the original to a new name, often via a
+ * rename request, and should call d_lookup_done() on the newly created
+ * dentry. If the new is instantiated and the old MUST either be moved
+ * or dropped.
+ *
+ * Parent must be locked.
+ *
+ * Returns: an in-lookup dentry, or an error.
+ */
+struct dentry *d_duplicate(struct dentry *dentry)
+{
+ unsigned int hash = dentry->d_name.hash;
+ struct dentry *parent = dentry->d_parent;
+ struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+ struct dentry *new = __d_alloc(parent->d_sb, &dentry->d_name);
+
+ if (unlikely(!new))
+ return ERR_PTR(-ENOMEM);
+
+ new->d_flags |= DCACHE_PAR_LOOKUP;
+ new->d_wait = NULL;
+ spin_lock(&parent->d_lock);
+ new->d_parent = dget_dlock(parent);
+ hlist_add_head(&new->d_sib, &parent->d_children);
+ if (parent->d_flags & DCACHE_DISCONNECTED)
+ new->d_flags |= DCACHE_DISCONNECTED;
+ spin_unlock(&dentry->d_parent->d_lock);
+
+ hlist_bl_lock(b);
+ hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
+ hlist_bl_unlock(b);
+ return new;
+}
+EXPORT_SYMBOL(d_duplicate);
+
struct dentry *d_alloc_anon(struct super_block *sb)
{
return __d_alloc(sb, NULL);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3cb70b3398f0..2a3ebd368ed9 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -247,6 +247,7 @@ extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_noblock(struct dentry *, struct qstr *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
+struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Some ->lookup handlers will need to drop and retake the parent lock, so
they can safely use d_alloc_parallel().
->lookup can be called with the parent lock either exclusive or shared.
A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
This is rather ugly, but will be gone by the end of the series when
->lookup is *always* called with a shared lock on the parent.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 7 ++++---
include/linux/namei.h | 3 ++-
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index d31c3db7eb5e..eed388ee8a30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1928,7 +1928,7 @@ static noinline struct dentry *lookup_slow(const struct qstr *name,
struct inode *inode = dir->d_inode;
struct dentry *res;
inode_lock_shared(inode);
- res = __lookup_slow(name, dir, flags);
+ res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
inode_unlock_shared(inode);
return res;
}
@@ -1942,7 +1942,7 @@ static struct dentry *lookup_slow_killable(const struct qstr *name,
if (inode_lock_shared_killable(inode))
return ERR_PTR(-EINTR);
- res = __lookup_slow(name, dir, flags);
+ res = __lookup_slow(name, dir, flags | LOOKUP_SHARED);
inode_unlock_shared(inode);
return res;
}
@@ -4407,6 +4407,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
struct dentry *dentry;
int error, create_error = 0;
umode_t mode = op->mode;
+ unsigned int shared_flag = (op->open_flag & O_CREAT) ? 0 : LOOKUP_SHARED;
if (unlikely(IS_DEADDIR(dir_inode)))
return ERR_PTR(-ENOENT);
@@ -4474,7 +4475,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
if (d_in_lookup(dentry)) {
struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
- nd->flags);
+ nd->flags | shared_flag);
d_lookup_done(dentry);
if (unlikely(res)) {
if (IS_ERR(res)) {
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ad6dd9987b9..b3346a513d8f 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -37,8 +37,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
#define LOOKUP_CREATE BIT(17) /* ... in object creation */
#define LOOKUP_EXCL BIT(18) /* ... in target must not exist */
#define LOOKUP_RENAME_TARGET BIT(19) /* ... in destination of rename() */
+#define LOOKUP_SHARED BIT(20) /* Parent lock is held shared */
-/* 4 spare bits for intent */
+/* 3 spare bits for intent */
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS BIT(24) /* No symlink crossing. */
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
A proposed change will invert the lock ordering between
d_alloc_parallel() and inode_lock() on the parent.
When that happens it will not be safe to call d_alloc_parallel() while
holding the parent lock - even shared.
We don't need to keep the parent lock held when d_add_ci() is run - the
VFS doesn't need it as dentry is exclusively held due to
DCACHE_PAR_LOOKUP and the filesystem has finished its work.
So drop and reclaim the lock (shared or exclusive as determined by
LOOKUP_SHARED) to avoid future deadlock.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/dcache.c | 18 +++++++++++++++++-
fs/xfs/xfs_iops.c | 3 ++-
include/linux/dcache.h | 3 ++-
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index c12319097d6e..a1219b446b74 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2225,6 +2225,7 @@ EXPORT_SYMBOL(d_obtain_root);
* @dentry: the negative dentry that was passed to the parent's lookup func
* @inode: the inode case-insensitive lookup has found
* @name: the case-exact name to be associated with the returned dentry
+ * @bool: %true if lookup was performed with LOOKUP_SHARED
*
* This is to avoid filling the dcache with case-insensitive names to the
* same inode, only the actual correct case is stored in the dcache for
@@ -2237,7 +2238,7 @@ EXPORT_SYMBOL(d_obtain_root);
* the exact case, and return the spliced entry.
*/
struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
- struct qstr *name)
+ struct qstr *name, bool shared)
{
struct dentry *found, *res;
@@ -2250,6 +2251,17 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
iput(inode);
return found;
}
+ /*
+ * We are holding parent lock and so don't want to wait for a
+ * d_in_lookup() dentry. We can safely drop the parent lock and
+ * reclaim it as we have exclusive access to dentry as it is
+ * d_in_lookup() (so ->d_parent is stable) and we are near the
+ * end ->lookup() and will shortly drop the lock anyway.
+ */
+ if (shared)
+ inode_unlock_shared(d_inode(dentry->d_parent));
+ else
+ inode_unlock(d_inode(dentry->d_parent));
if (d_in_lookup(dentry)) {
found = d_alloc_parallel(dentry->d_parent, name);
if (IS_ERR(found) || !d_in_lookup(found)) {
@@ -2263,6 +2275,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
return ERR_PTR(-ENOMEM);
}
}
+ if (shared)
+ inode_lock_shared(d_inode(dentry->d_parent));
+ else
+ inode_lock_nested(d_inode(dentry->d_parent), I_MUTEX_PARENT);
res = d_splice_alias(inode, found);
if (res) {
d_lookup_done(found);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 208543e57eda..ec19d3ec7cf0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -35,6 +35,7 @@
#include <linux/security.h>
#include <linux/iversion.h>
#include <linux/fiemap.h>
+#include <linux/namei.h> // for LOOKUP_SHARED
/*
* Directories have different lock order w.r.t. mmap_lock compared to regular
@@ -369,7 +370,7 @@ xfs_vn_ci_lookup(
/* else case-insensitive match... */
dname.name = ci_name.name;
dname.len = ci_name.len;
- dentry = d_add_ci(dentry, VFS_I(ip), &dname);
+ dentry = d_add_ci(dentry, VFS_I(ip), &dname, !!(flags & LOOKUP_SHARED));
kfree(ci_name.name);
return dentry;
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 2a3ebd368ed9..a97eb151d9db 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -251,7 +251,8 @@ struct dentry *d_duplicate(struct dentry *dentry);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
-extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
+extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *,
+ bool);
extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name);
extern struct dentry *d_find_any_alias(struct inode *inode);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* Re: [PATCH v2 2/3] lib/bootconfig: check bounds before writing in __xbc_open_brace()
From: Steven Rostedt @ 2026-03-12 21:47 UTC (permalink / raw)
To: Josh Law
Cc: Andrew Morton, Masami Hiramatsu, Josh Law, linux-kernel,
linux-trace-kernel
In-Reply-To: <1bd0e36d-0ea0-4a79-bbee-e38de553f1ea@gmail.com>
On Thu, 12 Mar 2026 21:30:10 +0000
Josh Law <hlcj1234567@gmail.com> wrote:
> 12 Mar 2026 21:28:11 Andrew Morton <akpm@linux-foundation.org>:
>
> > On Thu, 12 Mar 2026 21:09:52 +0000 Josh Law <hlcj1234567@gmail.com> wrote:
> >
> >>> That's a fair point, Steve. Given that brace_index isn't touched
> >>> elsewhere and the current check effectively prevents the overflow, I
> >>> agree this isn't strictly necessary. I'll drop this patch and stick
> >>> with the fix for the off-by-one reporting error instead. Thanks for
> >>> the feedback!
> >>
> >> Wait Steve,
> >> Thanks for the look. I see your point that it's currently redundant
> >> given the call patterns. It looks like Andrew has already merged this
> >> into the -mm tree, likely as a 'belt-and-suspenders' safety measure.
> >> I'll keep your feedback in mind for future cleanup, but I'm glad we
> >> got the other off-by-one fix in as well!
> >
> > Please wordwrap the emails.
> >
> >> And in my opinion, merging it is a decent idea.
> >
> > You've changed your position without explaining why?
>
/me wraps your email (claws-mail does that nicely ;-)
> Sorry, I think it should be merged because it's better to be safe than
> sorry, I know there is different methods of implementation, but this one
> still works... I know it's churn (and I'm sorry)
It's not the only place that does that. And since the value is safe as is,
I rather not touch it.
-- Steve
^ permalink raw reply
* [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
It is important that two non-create NFS "open"s of a negative dentry
don't race. They both have only a shared lock on i_rwsem and so could
run concurrently, but they might both try to call d_splice_alias() at
the same time which is confusing at best.
nfs_atomic_open() currently avoids this by discarding the negative
dentry and creating a new one using d_alloc_parallel(). Only one thread
can successfully get the d_in_lookup() dentry, the other will wait for
the first to finish, and can use the result of that first lookup.
A proposed locking change inverts the order between i_rwsem and
d_alloc_parallel() so it will not be safe to call d_alloc_parallel()
while holding i_rwsem - even shared.
We can achieve the same effect by causing ->d_revalidate to invalidate a
negative dentry when LOOKUP_OPEN is set. Doing this is consistent with
the "close to open" caching semantics of NFS which requires the server
to be queried whenever opening a file - cached information must not be
trusted.
With this change to ->d_revaliate (implemented in nfs_neg_need_reval) we
can be sure that we have exclusive access to any dentry that reaches
nfs_atomic_open(). Either O_CREAT was requested and so the parent is
locked exclusively, or the dentry will have DCACHE_PAR_LOOKUP set.
This means that the d_drop() and d_alloc_parallel() calls in
nfs_atomic_lookup() are no longer needed to provide exclusion
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 30 +++++++-----------------------
1 file changed, 7 insertions(+), 23 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 52e7656195ec..3033cc5ce12f 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1657,6 +1657,13 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
{
if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
return 0;
+ if (flags & LOOKUP_OPEN)
+ /* close-to-open semantics require we go to server
+ * on each open. By invalidating the dentry we
+ * also ensure nfs_atomic_open() always has exclusive
+ * access to the dentry.
+ */
+ return 0;
if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONEG)
return 1;
/* Case insensitive server? Revalidate negative dentries */
@@ -2112,7 +2119,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
struct inode *inode;
unsigned int lookup_flags = 0;
unsigned long dir_verifier;
- bool switched = false;
int created = 0;
int err;
@@ -2157,17 +2163,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
attr.ia_size = 0;
}
- if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
- d_drop(dentry);
- switched = true;
- dentry = d_alloc_parallel(dentry->d_parent,
- &dentry->d_name);
- if (IS_ERR(dentry))
- return PTR_ERR(dentry);
- if (unlikely(!d_in_lookup(dentry)))
- return finish_no_open(file, dentry);
- }
-
ctx = create_nfs_open_context(dentry, open_flags, file);
err = PTR_ERR(ctx);
if (IS_ERR(ctx))
@@ -2210,10 +2205,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
out:
- if (unlikely(switched)) {
- d_lookup_done(dentry);
- dput(dentry);
- }
return err;
no_open:
@@ -2236,13 +2227,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
res = ERR_PTR(-EOPENSTALE);
}
}
- if (switched) {
- d_lookup_done(dentry);
- if (!res)
- res = dentry;
- else
- dput(dentry);
- }
return finish_no_open(file, res);
}
EXPORT_SYMBOL_GPL(nfs_atomic_open);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
When filename_linkat() calls filename_create() which ultimately
calls ->lookup, the flags LOOKUP_CREATE|LOOKUP_EXCL are passed.
nfs_lookup() treats this as an exclusive create (which it is) and
skips the ->lookup, leaving the dentry unchanged.
Currently that means nfs_link() can get a hashed dentry (if the name was
already in the cache) or an unhashed dentry (if it wasn't). As none of
d_add(), d_instantiate(), d_splice_alias() could handle both of these,
nfs_link() calls d_drop() and then then d_add().
Recent changes to d_splice_alias() mean that it *can* work with either
hashed or unhashed dentries. Future changes to locking mean that it
will be unsafe to d_drop() a dentry while an operation (in this case
"link()") is still ongoing.
So change to use d_splice_alias(), and not to d_drop() until an error is
detected (as in that case was can't be sure what is actually on the server).
Also update the comment for nfs_is_exclusive_create() to note that
link(), mkdir(), mknod(), symlink() all appear as exclusive creates.
Those other than link() already used d_splice_alias() via
nfs_add_or_obtain().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3033cc5ce12f..a188b09c9a54 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1571,6 +1571,9 @@ static int nfs_check_verifier(struct inode *dir, struct dentry *dentry,
/*
* Use intent information to check whether or not we're going to do
* an O_EXCL create using this path component.
+ * Note that link(), mkdir(), mknod(), symlink() all appear as
+ * exclusive creation. Regular file creation could be distinguished
+ * with LOOKUP_OPEN.
*/
static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags)
{
@@ -2677,14 +2680,15 @@ nfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
old_dentry, dentry);
trace_nfs_link_enter(inode, dir, dentry);
- d_drop(dentry);
if (S_ISREG(inode->i_mode))
nfs_sync_inode(inode);
error = NFS_PROTO(dir)->link(inode, dir, &dentry->d_name);
if (error == 0) {
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
ihold(inode);
- d_add(dentry, inode);
+ d_splice_alias(inode, dentry);
+ } else {
+ d_drop(dentry);
}
trace_nfs_link_exit(inode, dir, dentry, error);
return error;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
nfs_add_or_obtain() is used, often via nfs_instantiate(), to attach a
newly created inode to the appropriate dentry - or to provide an
alternate dentry.
It has to drop the dentry first, which is problematic for proposed
locking changes.
As d_splice_alias() now works with hashed dentries, the d_drop() is no
longer needed.
However we still d_drop() on error as the status of the name is
uncertain.
nfs_open_and_get_state() is only used for files so we should be able to
use d_instantiate(). However as that depends on the server for
correctness, it is safer to stay with the current code pattern and use
d_splice_alias() there too.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 3 +--
fs/nfs/nfs4proc.c | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a188b09c9a54..f92ea11aea44 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2330,8 +2330,6 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
struct dentry *d;
int error;
- d_drop(dentry);
-
if (fhandle->size == 0) {
error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
fhandle, fattr);
@@ -2352,6 +2350,7 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
dput(parent);
return d;
out_error:
+ d_drop(dentry);
d = ERR_PTR(error);
goto out;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 91bcf67bd743..a4ee0c0b4567 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3099,7 +3099,6 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
nfs_set_verifier(dentry, dir_verifier);
if (d_really_is_negative(dentry)) {
struct dentry *alias;
- d_drop(dentry);
alias = d_splice_alias(igrab(state->inode), dentry);
/* d_splice_alias() can't fail here - it's a non-directory */
if (alias) {
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create.
From: NeilBrown @ 2026-03-12 21:11 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
When atomic_create fails with -ENOENT we currently d_drop() the dentry
and then re-add it (d_splice_alias()) with a NULL inode.
This drop-and-re-add will not work with proposed locking changes.
As d_splice_alias() now supports hashed dentries, we don't need the
d_drop() until it is determined that some other error has occurred.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index f92ea11aea44..ffba4de3df01 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2179,7 +2179,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
err = PTR_ERR(inode);
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
- d_drop(dentry);
switch (err) {
case -ENOENT:
if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
@@ -2188,7 +2187,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
dir_verifier = nfs_save_change_attribute(dir);
nfs_set_verifier(dentry, dir_verifier);
d_splice_alias(NULL, dentry);
- break;
+ goto out;
case -EISDIR:
case -ENOTDIR:
goto no_open;
@@ -2200,6 +2199,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
default:
break;
}
+ d_drop(dentry);
goto out;
}
file->f_mode |= FMODE_CAN_ODIRECT;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
NFS uses the results of readdir to prime the dcache. Using
d_alloc_parallel() can block if there is a concurrent lookup. Blocking
in that case is pointless as the lookup will add info to the dcache and
there is no value in the readdir waiting to see if it should add the
info too.
Also this call to d_alloc_parallel() is made while the parent
directory is locked. A proposed change to locking will lock the parent
later, after d_alloc_parallel(). This means it won't be safe to wait in
d_alloc_parallel() while holding the directory lock.
So change to use d_alloc_noblock(), and use try_lookup_noperm() rather
than full_name_hash and d_lookup.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ffba4de3df01..4b73ec59bbcc 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -750,15 +750,14 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
if (filename.len == 2 && filename.name[1] == '.')
return;
}
- filename.hash = full_name_hash(parent, filename.name, filename.len);
- dentry = d_lookup(parent, &filename);
+ dentry = try_lookup_noperm(&filename, parent);
again:
- if (!dentry) {
- dentry = d_alloc_parallel(parent, &filename);
- if (IS_ERR(dentry))
- return;
- }
+ if (!dentry)
+ dentry = d_alloc_noblock(parent, &filename);
+ if (IS_ERR(dentry))
+ return;
+
if (!d_in_lookup(dentry)) {
/* Is there a mountpoint here? If so, just exit */
if (!nfs_fsid_equal(&NFS_SB(dentry->d_sb)->fsid,
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Rather than performing a normal lookup (which will be awkward with
future locking changes) use d_alloc_noblock() to find a dentry for an
unused name, and then open-code the rest of lookup_slow() to see if it
is free on the server.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/unlink.c | 56 +++++++++++++++++++++++++++++++------------------
1 file changed, 36 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 43ea897943c0..f112c13d97a1 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -445,7 +445,8 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
static unsigned int sillycounter;
unsigned char silly[SILLYNAME_LEN + 1];
unsigned long long fileid;
- struct dentry *sdentry;
+ struct dentry *sdentry, *old;
+ struct qstr qsilly;
struct inode *inode = d_inode(dentry);
struct rpc_task *task;
int error = -EBUSY;
@@ -462,26 +463,41 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
fileid = NFS_FILEID(d_inode(dentry));
- sdentry = NULL;
- do {
+newname:
+ sillycounter++;
+ scnprintf(silly, sizeof(silly),
+ SILLYNAME_PREFIX "%0*llx%0*x",
+ SILLYNAME_FILEID_LEN, fileid,
+ SILLYNAME_COUNTER_LEN, sillycounter);
+
+ dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
+ dentry, silly);
+ qsilly = QSTR(silly);
+ sdentry = try_lookup_noperm(&qsilly, dentry->d_parent);
+ if (!sdentry)
+ sdentry = d_alloc_noblock(dentry->d_parent, &qsilly);
+ if (sdentry == ERR_PTR(-EWOULDBLOCK))
+ /* Name currently being looked up */
+ goto newname;
+ /*
+ * N.B. Better to return EBUSY here ... it could be
+ * dangerous to delete the file while it's in use.
+ */
+ if (IS_ERR(sdentry))
+ goto out;
+ if (d_is_positive(sdentry)) {
dput(sdentry);
- sillycounter++;
- scnprintf(silly, sizeof(silly),
- SILLYNAME_PREFIX "%0*llx%0*x",
- SILLYNAME_FILEID_LEN, fileid,
- SILLYNAME_COUNTER_LEN, sillycounter);
-
- dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
- dentry, silly);
-
- sdentry = lookup_noperm(&QSTR(silly), dentry->d_parent);
- /*
- * N.B. Better to return EBUSY here ... it could be
- * dangerous to delete the file while it's in use.
- */
- if (IS_ERR(sdentry))
- goto out;
- } while (d_inode(sdentry) != NULL); /* need negative lookup */
+ goto newname;
+ }
+ /* This name isn't known locally - check on server */
+ old = dir->i_op->lookup(dir, sdentry, 0);
+ d_lookup_done(sdentry);
+ if (old || d_is_positive(sdentry)) {
+ if (!IS_ERR(old))
+ dput(old);
+ dput(sdentry);
+ goto newname;
+ }
ihold(inode);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 15/53] nfs: use d_duplicate()
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
As preparation for d_alloc_parallel() being allowed without the
directory locked, use d_duplicate() to duplicate a dentry for silly
rename.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 4b73ec59bbcc..655a1e8467f7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2781,11 +2781,9 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
spin_unlock(&new_dentry->d_lock);
/* copy the target dentry's name */
- dentry = d_alloc(new_dentry->d_parent,
- &new_dentry->d_name);
+ dentry = d_duplicate(new_dentry);
if (!dentry)
goto out;
-
/* silly-rename the existing target ... */
err = nfs_sillyrename(new_dir, new_dentry);
if (err)
@@ -2850,8 +2848,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
nfs_dentry_handle_enoent(old_dentry);
/* new dentry created? */
- if (dentry)
+ if (dentry) {
+ d_lookup_done(dentry);
dput(dentry);
+ }
return error;
}
EXPORT_SYMBOL_GPL(nfs_rename);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
When performing an "impure" readdir, ovl needs to perform a lookup on some
of the names that it found.
With proposed locking changes it will not be possible to perform this
lookup (in particular, not safe to wait for d_alloc_parallel()) while
holding a lock on the directory.
ovl doesn't really need the lock at this point. It has already iterated
the directory and has cached a list of the contents. It now needs to
gather extra information about some contents. It can do this without
the lock.
After gathering that info it needs to retake the lock for API
correctness. After doing this it must check IS_DEADDIR() again to
ensure readdir always returns -ENOENT on a removed directory.
Note that while ->iterate_shared is called with a shared lock, ovl uses
WRAP_DIR_ITER() so an exclusive lock is held and so we drop and retake
that exclusive lock.
As the directory is no longer locked in ovl_cache_update() we need
dget_parent() to get a reference to the parent.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/overlayfs/readdir.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1dcc75b3a90f..d5123b37921c 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -568,13 +568,12 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
goto get;
}
if (p->len == 2) {
- /* we shall not be moved */
- this = dget(dir->d_parent);
+ this = dget_parent(dir);
goto get;
}
}
/* This checks also for xwhiteouts */
- this = lookup_one(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
+ this = lookup_one_unlocked(mnt_idmap(path->mnt), &QSTR_LEN(p->name, p->len), dir);
if (IS_ERR_OR_NULL(this) || !this->d_inode) {
/* Mark a stale entry */
p->is_whiteout = true;
@@ -666,11 +665,12 @@ static int ovl_dir_read_impure(const struct path *path, struct list_head *list,
if (err)
return err;
+ inode_unlock(path->dentry->d_inode);
list_for_each_entry_safe(p, n, list, l_node) {
if (!name_is_dot_dotdot(p->name, p->len)) {
err = ovl_cache_update(path, p, true);
if (err)
- return err;
+ break;
}
if (p->ino == p->real_ino) {
list_del(&p->l_node);
@@ -680,14 +680,19 @@ static int ovl_dir_read_impure(const struct path *path, struct list_head *list,
struct rb_node *parent = NULL;
if (WARN_ON(ovl_cache_entry_find_link(p->name, p->len,
- &newp, &parent)))
- return -EIO;
+ &newp, &parent))) {
+ err = -EIO;
+ break;
+ }
rb_link_node(&p->node, parent, newp);
rb_insert_color(&p->node, root);
}
}
- return 0;
+ inode_lock(path->dentry->d_inode);
+ if (IS_DEADDIR(path->dentry->d_inode))
+ err = -ENOENT;
+ return err;
}
static struct ovl_dir_cache *ovl_cache_get_impure(const struct path *path)
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 17/53] coda: don't d_drop() early.
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
Proposed locking changes will mean that calling d_drop() could
effectively unlock the name allowing a parallel lookup to proceed.
For this reason it could only be called *after* the attempt to create a
symlink (in this case) has completed (whether successfully or not).
So move the d_drop() to after the venus_symlink() call.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/coda/dir.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index c64b8cd81568..70eb6042fdaa 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -244,13 +244,13 @@ static int coda_symlink(struct mnt_idmap *idmap,
if (symlen > CODA_MAXPATHLEN)
return -ENAMETOOLONG;
+ error = venus_symlink(dir_inode->i_sb, coda_i2f(dir_inode), name, len,
+ symname, symlen);
/*
- * This entry is now negative. Since we do not create
+ * This entry is still negative. Since we did not create
* an inode for the entry we have to drop it.
*/
d_drop(de);
- error = venus_symlink(dir_inode->i_sb, coda_i2f(dir_inode), name, len,
- symname, symlen);
/* mtime is no good anymore */
if (!error)
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 18/53] shmem: use d_duplicate()
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
To prepare for d_alloc_parallel() being permitted without a directory
lock, use d_duplicate() when duplicating a dentry in order to create a
whiteout.
Signed-off-by: NeilBrown <neil@brown.name>
---
mm/shmem.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index b40f3cd48961..6b39a59355d7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4030,11 +4030,12 @@ static int shmem_whiteout(struct mnt_idmap *idmap,
struct dentry *whiteout;
int error;
- whiteout = d_alloc(old_dentry->d_parent, &old_dentry->d_name);
+ whiteout = d_duplicate(old_dentry);
if (!whiteout)
return -ENOMEM;
error = shmem_mknod(idmap, old_dir, whiteout,
S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
+ d_lookup_done(whiteout);
dput(whiteout);
return error;
}
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
* [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
From: NeilBrown @ 2026-03-12 21:12 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
Baolin Wang, David Howells, Marc Dionne, Steve French,
Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
Breno Leitao, Theodore Ts'o, Andreas Dilger, Steven Rostedt,
Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Tyler Hicks, Andreas Gruenbacher, Richard Weinberger,
Anton Ivanov, Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel, linux-nfs, linux-xfs, linux-unionfs, coda,
linux-mm, linux-afs, linux-cifs, linux-ext4, linux-kernel,
linux-trace-kernel, ceph-devel, ecryptfs, gfs2, linux-um,
linux-efi
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
In two places ceph drops a dentry and then calls d_splice_alias().
The d_drop() is no longer needed before d_splice_alias() and will
cause problems for proposed changes to locking.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/ceph/file.c | 2 --
fs/ceph/inode.c | 3 ---
2 files changed, 5 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 66bbf6d517a9..c40d129bbd03 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -751,8 +751,6 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
unlock_new_inode(inode);
}
if (d_in_lookup(dentry) || d_really_is_negative(dentry)) {
- if (!d_unhashed(dentry))
- d_drop(dentry);
dn = d_splice_alias(inode, dentry);
WARN_ON_ONCE(dn && dn != dentry);
}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8557b207d337..32bac5cac8c4 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1517,9 +1517,6 @@ static int splice_dentry(struct dentry **pdn, struct inode *in)
}
}
- /* dn must be unhashed */
- if (!d_unhashed(dn))
- d_drop(dn);
realdn = d_splice_alias(in, dn);
if (IS_ERR(realdn)) {
pr_err_client(cl, "error %ld %p inode %p ino %llx.%llx\n",
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox