public inbox for linux-fsdevel@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	clm@meta.com,  gustavold@meta.com
Subject: Re: [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru
Date: Wed, 08 Apr 2026 17:05:41 -0400	[thread overview]
Message-ID: <10d8499d60824f319e32f52fd6a196b3645f6652.camel@kernel.org> (raw)
In-Reply-To: <20260408192620.GI3836593@ZenIV>

On Wed, 2026-04-08 at 20:26 +0100, Al Viro wrote:
> On Wed, Apr 08, 2026 at 02:28:20PM -0400, Jeff Layton wrote:
> 
> > ...it turns out that Gustavo had been chasing this independently of me,
> > and had Claude do a bit more analysis. I included it below, but here's
> > a link that may be more readable. Any thoughts?
> 
> Other than rather uncharitable ones about the usefulness of the Turing Test,
> you mean?
> 
> > **In production** (narrow race window): The livelock occasionally
> > resolves through specific timing that allows a parent dentry to be
> > freed and its slab page reused.
> 
> Livelock is real and known, all right, but do explain what does "resolves
> through specific timing that allows a parent dentry to be freed" mean.
> Especially since the reference to parent is *not* dropped until after
> having child detached from the tree and DCACHE_DENTRY_KILLED on it,
> with ->d_lock on child held over that.  So select_collect2() seeing
> the victim still locked and attached to the tree has to happen before
> the grace period for parent has a chance to begin.  And rcu_read_lock()
> grabbed there prevents that grace period from completing until we
> do the matching rcu_read_unlock() in shrink_dcache_parent().
> 
> > In `select_collect()` (the `d_walk` callback used by
> > `shrink_dcache_parent`), two types of dentries are incorrectly counted
> 						     really?
> > as "found":
> 
> > 1. **Dead dentries** (`d_lockref.count < 0`): Another CPU called
> > `lockref_mark_dead()` in `__dentry_kill()` but hasn't yet called
> > `dentry_unlist()` to remove the dentry from the parent's children list.
> > With the debug delay, the dentry stays dead-but-visible for 5ms.
> 
> Yes.  And?  That's the livelock, all right, and it needs fixing,
> but how does busy-wait here lead to UAF on anything?
> 
> > 2. **`DCACHE_SHRINK_LIST` dentries**: Already isolated by another
> > shrinker path (e.g., the global LRU shrinker from `drop_caches`) to its
> > own dispose list. These are being processed by that other path but
> > slowly (5ms per proc dentry with the debug delay).
> > 
> > When `select_collect` counts these as `found++`,
> > `shrink_dcache_parent()` sees `data.found > 0` and loops again. But
> > these dentries can never be collected onto `data.dispose` (dead ones
> > have count < 0, shrink-list ones already have `DCACHE_SHRINK_LIST`
> > set), so the loop never makes progress → **infinite loop**.
> 
> They have no business going into data.dispose; for fuck's sake, dentries
> on somebody else's shrink list are explicitly fed to shrink_kill().
> 
> > **Why this is correct:**
> 
> It is not.
> 
> > - **Dead dentries (`count < 0`)**: These are being killed by another
> > CPU's `__dentry_kill()`. That CPU will call `dentry_unlist()` to remove
> > them from the parent's children list. `shrink_dcache_parent()` doesn't
> > need to wait for them — they'll disappear from the tree on their own.
> 
> ... and since they keep their parents busy, we should not leave until
> they are gone.  For real fun, consider calls from shrink_dcache_for_umount() -
> and yes, it *is* possible for another thread's shrink list to contain
> dentries from filesystem being shut down.  Legitimately so.
> 
> > - **`DCACHE_SHRINK_LIST` dentries**: These are already on another
> > shrinker's dispose list and will be processed by that path. Counting
> > them as "found" forces `shrink_dcache_parent()` to wait for the other
> > shrinker to finish, which can take arbitrarily long (especially with
> > filesystem callbacks or the debug delay).
> 
> Ditto.
> 
> > - **The `select_collect2` path** (used when `data.found > 0` but
> > `data.dispose` is empty) handles `DCACHE_SHRINK_LIST` dentries
> > separately by setting `data->victim` and processing them directly. With
> > this fix, `select_collect2` is only reached when there are genuinely
> > unprocessable dentries (count > 0, not dead, not on shrink list), not
> > when there are merely in-flight kills or concurrent shrinkers.
> 
> Bollocks, due to above.
> 
> > ### Relationship to the production UAF crash
> > 
> > The livelock is the **precursor** to the use-after-free crash seen in
> > production (P2260313060):
> 
> ... and it still offers zero explanation of the path from livelock to
> UAF.  It may or may not be real, but there's nothing in all that verbiage
> even suggesting what it might be.  And proposed analysis is flat-out
> wrong.
> 
> As for the livelock, see viro/vfs.git #work.dcache-busy-wait (in -next
> as of today).

Thanks for taking a look, Al. We'll keep looking at this thing and see
if we can collect more info, and come up with a better theory of the
crash. I'll dig deeper into the vmcore tomorrow.

FWIW, I've been running a bpftrace script across a swath of machines
that emulates the WARN_ON_ONCE() in the original patch I proposed.
We've had a few crashes while it's running and haven't yet collected
any stack traces.
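(The script itself wasn't posted; as a purely hypothetical reconstruction of what such an emulation might look like, the sketch below fires when a dentry reaches its final free with a non-empty `->d_lru`. The probe point, `dentry_free`, is a static function in fs/dcache.c, and both the choice of probe and the open-coded `list_empty()` check are assumptions, not Jeff's actual script.)

```
// Hypothetical bpftrace sketch; probe point and field check are guesses.
kprobe:dentry_free
{
	$d = (struct dentry *)arg0;
	// list_empty(&d->d_lru) means d_lru.next == &d_lru;
	// a non-empty list at free time is the condition the
	// proposed WARN_ON_ONCE() would have caught.
	if ($d->d_lru.next != &$d->d_lru) {
		printf("non-empty ->d_lru at free: %p\n", $d);
		@stacks[kstack] = count();
	}
}
```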

The script only runs about 80% of the time though, so it's possible
we're just getting unlucky, but I sort of doubt it. It seems more
likely that these dentries are racing onto the list after we've passed
the check.
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 8+ messages
2026-04-06 16:44 [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru Jeff Layton
2026-04-07 10:51 ` Jan Kara
2026-04-08  6:42 ` Al Viro
2026-04-08 11:10   ` Jeff Layton
2026-04-08 18:28   ` Jeff Layton
2026-04-08 19:26     ` Al Viro
2026-04-08 21:05       ` Jeff Layton [this message]
2026-04-08 22:43         ` Al Viro
