* [PATCH] ceph: only d_add() negative dentries when they are unhashed
@ 2026-03-27 16:23 Max Kellermann
2026-03-27 18:44 ` Viacheslav Dubeyko
0 siblings, 1 reply; 2+ messages in thread
From: Max Kellermann @ 2026-03-27 16:23 UTC (permalink / raw)
To: idryomov, amarkuze, ceph-devel, linux-kernel; +Cc: Max Kellermann, stable
Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192)
CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
lookup_fast+0x9f/0x100
walk_component+0x1f/0x150
link_path_walk+0x20e/0x3d0
path_lookupat+0x68/0x180
filename_lookup+0xdc/0x1e0
vfs_statx+0x6c/0x140
vfs_fstatat+0x67/0xa0
__do_sys_newfstatat+0x24/0x60
do_syscall_64+0x6a/0x230
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.
Fixes: 2817b000b02c ("ceph: directory operations")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
fs/ceph/dir.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index bac9cfb6b982..27ce9e55e947 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -769,7 +769,8 @@ struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
d_drop(dentry);
err = -ENOENT;
} else {
- d_add(dentry, NULL);
+ if (d_unhashed(dentry))
+ d_add(dentry, NULL);
}
}
}
@@ -840,7 +841,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
spin_unlock(&ci->i_ceph_lock);
doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
ceph_vinop(dir));
- d_add(dentry, NULL);
+ if (d_unhashed(dentry))
+ d_add(dentry, NULL);
di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
return NULL;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] ceph: only d_add() negative dentries when they are unhashed
2026-03-27 16:23 [PATCH] ceph: only d_add() negative dentries when they are unhashed Max Kellermann
@ 2026-03-27 18:44 ` Viacheslav Dubeyko
0 siblings, 0 replies; 2+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-27 18:44 UTC (permalink / raw)
To: max.kellermann@ionos.com, idryomov@gmail.com, Alex Markuze,
linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org
Cc: stable@vger.kernel.org
On Fri, 2026-03-27 at 17:23 +0100, Max Kellermann wrote:
> Ceph can call d_add(dentry, NULL) on a negative dentry that is already
> present in the primary dcache hash.
>
> In the current VFS that is not safe. d_add() goes through __d_add()
> to __d_rehash(), which unconditionally reinserts dentry->d_hash into
> the hlist_bl bucket. If the dentry is already hashed, reinserting the
> same node can corrupt the bucket, including creating a self-loop.
> Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
> typically looping only on the d_name.hash mismatch check and
> eventually triggering RCU stall reports like this one:
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
> rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192)
> CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
> Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
> RIP: 0010:__d_lookup+0x46/0xb0
> Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
> RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
> RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
> RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
> RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
> R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
> R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
> FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> lookup_fast+0x9f/0x100
> walk_component+0x1f/0x150
> link_path_walk+0x20e/0x3d0
> path_lookupat+0x68/0x180
> filename_lookup+0xdc/0x1e0
> vfs_statx+0x6c/0x140
> vfs_fstatat+0x67/0xa0
> __do_sys_newfstatat+0x24/0x60
> do_syscall_64+0x6a/0x230
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> This is reachable with reused cached negative dentries. A Ceph lookup
> or atomic_open can be handed a negative dentry that is already hashed,
> and fs/ceph/dir.c then hits one of two paths that incorrectly assume
> "negative" also means "unhashed":
>
> - ceph_finish_lookup():
> MDS reply is -ENOENT with no trace
> -> d_add(dentry, NULL)
>
> - ceph_lookup():
> local ENOENT fast path for a complete directory with shared caps
> -> d_add(dentry, NULL)
>
> Both paths can therefore re-add an already-hashed negative dentry.
>
> Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
> calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
> is true.
>
> Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
> negative dentry when it is actually unhashed. If the negative dentry
> is already hashed, leave it in place and reuse it as-is.
>
> This preserves the existing behavior for unhashed dentries while
> avoiding d_hash list corruption for reused hashed negatives.
>
> Fixes: 2817b000b02c ("ceph: directory operations")
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> fs/ceph/dir.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index bac9cfb6b982..27ce9e55e947 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -769,7 +769,8 @@ struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
> d_drop(dentry);
> err = -ENOENT;
> } else {
> - d_add(dentry, NULL);
> + if (d_unhashed(dentry))
> + d_add(dentry, NULL);
> }
> }
> }
> @@ -840,7 +841,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> spin_unlock(&ci->i_ceph_lock);
> doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
> ceph_vinop(dir));
> - d_add(dentry, NULL);
> + if (d_unhashed(dentry))
> + d_add(dentry, NULL);
> di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
> return NULL;
> }
Makes sense.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Let me run xfstests for the patch to double check that everything works well.
Thanks,
Slava.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-03-27 18:44 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-27 16:23 [PATCH] ceph: only d_add() negative dentries when they are unhashed Max Kellermann
2026-03-27 18:44 ` Viacheslav Dubeyko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox