All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "max.kellermann@ionos.com" <max.kellermann@ionos.com>,
	"idryomov@gmail.com" <idryomov@gmail.com>,
	Alex Markuze <amarkuze@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Cc: "stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: RE:  [PATCH] ceph: only d_add() negative dentries when they are unhashed
Date: Mon, 30 Mar 2026 17:04:00 +0000	[thread overview]
Message-ID: <f8c25bcd64be6fdaafd4e49507ea9e04110d56a5.camel@ibm.com> (raw)
In-Reply-To: <765945680a8b83b26148430752295deedea831e7.camel@ibm.com>

On Fri, 2026-03-27 at 18:44 +0000, Viacheslav Dubeyko wrote:
> On Fri, 2026-03-27 at 17:23 +0100, Max Kellermann wrote:
> > Ceph can call d_add(dentry, NULL) on a negative dentry that is already
> > present in the primary dcache hash.
> > 
> > In the current VFS that is not safe.  d_add() goes through __d_add()
> > to __d_rehash(), which unconditionally reinserts dentry->d_hash into
> > the hlist_bl bucket.  If the dentry is already hashed, reinserting the
> > same node can corrupt the bucket, including creating a self-loop.
> > Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
> > typically looping only on the d_name.hash mismatch check and
> > eventually triggering RCU stall reports like this one:
> > 
> >  rcu: INFO: rcu_sched self-detected stall on CPU
> >  rcu:         87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
> >  rcu:         (t=2101 jiffies g=79058445 q=698988 ncpus=192)
> >  CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
> >  Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
> >  RIP: 0010:__d_lookup+0x46/0xb0
> >  Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
> >  RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
> >  RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
> >  RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
> >  RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
> >  R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
> >  R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
> >  FS:  00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >  CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
> >  PKRU: 55555554
> >  Call Trace:
> >   <TASK>
> >   lookup_fast+0x9f/0x100
> >   walk_component+0x1f/0x150
> >   link_path_walk+0x20e/0x3d0
> >   path_lookupat+0x68/0x180
> >   filename_lookup+0xdc/0x1e0
> >   vfs_statx+0x6c/0x140
> >   vfs_fstatat+0x67/0xa0
> >   __do_sys_newfstatat+0x24/0x60
> >   do_syscall_64+0x6a/0x230
> >   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > This is reachable with reused cached negative dentries.  A Ceph lookup
> > or atomic_open can be handed a negative dentry that is already hashed,
> > and fs/ceph/dir.c then hits one of two paths that incorrectly assume
> > "negative" also means "unhashed":
> > 
> >   - ceph_finish_lookup():
> >       MDS reply is -ENOENT with no trace
> >       -> d_add(dentry, NULL)
> > 
> >   - ceph_lookup():
> >       local ENOENT fast path for a complete directory with shared caps
> >       -> d_add(dentry, NULL)
> > 
> > Both paths can therefore re-add an already-hashed negative dentry.
> > 
> > Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
> > calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
> > is true.
> > 
> > Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
> > negative dentry when it is actually unhashed.  If the negative dentry
> > is already hashed, leave it in place and reuse it as-is.
> > 
> > This preserves the existing behavior for unhashed dentries while
> > avoiding d_hash list corruption for reused hashed negatives.
> > 
> > Fixes: 2817b000b02c ("ceph: directory operations")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> > ---
> >  fs/ceph/dir.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > index bac9cfb6b982..27ce9e55e947 100644
> > --- a/fs/ceph/dir.c
> > +++ b/fs/ceph/dir.c
> > @@ -769,7 +769,8 @@ struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
> >  				d_drop(dentry);
> >  				err = -ENOENT;
> >  			} else {
> > -				d_add(dentry, NULL);
> > +				if (d_unhashed(dentry))
> > +					d_add(dentry, NULL);
> >  			}
> >  		}
> >  	}
> > @@ -840,7 +841,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
> >  			spin_unlock(&ci->i_ceph_lock);
> >  			doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
> >  			      ceph_vinop(dir));
> > -			d_add(dentry, NULL);
> > +			if (d_unhashed(dentry))
> > +				d_add(dentry, NULL);
> >  			di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
> >  			return NULL;
> >  		}
> 
> Makes sense.
> 
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
> 
> Let me run xfstests for the patch to double check that everything works well.
> 

I had multiple xfstests issues during last run. Most probably, it was some
glitch on my side or inconsistent build. I need to repeat the xfstests run with
the patch.

Thanks,
Slava.

  reply	other threads:[~2026-03-30 17:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-27 16:23 [PATCH] ceph: only d_add() negative dentries when they are unhashed Max Kellermann
2026-03-27 18:44 ` Viacheslav Dubeyko
2026-03-30 17:04   ` Viacheslav Dubeyko [this message]
2026-03-31 17:01     ` Viacheslav Dubeyko
2026-04-02 18:11       ` Viacheslav Dubeyko
2026-04-02 18:33         ` Viacheslav Dubeyko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8c25bcd64be6fdaafd4e49507ea9e04110d56a5.camel@ibm.com \
    --to=slava.dubeyko@ibm.com \
    --cc=amarkuze@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.