From: Al Viro <viro@zeniv.linux.org.uk>
To: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
clm@meta.com, gustavold@meta.com
Subject: Re: [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru
Date: Wed, 8 Apr 2026 20:26:20 +0100 [thread overview]
Message-ID: <20260408192620.GI3836593@ZenIV> (raw)
In-Reply-To: <f44d1047296ede580c69bdad8e527ac8294746ec.camel@kernel.org>
On Wed, Apr 08, 2026 at 02:28:20PM -0400, Jeff Layton wrote:
> ...it turns out that Gustavo had been chasing this independently to me,
> and had Claude do a bit more analysis. I included it below, but here's
> a link that may be more readable. Any thoughts?
Other than rather uncharitable ones about the usefulness of the Turing Test,
you mean?
> **In production** (narrow race window): The livelock occasionally
> resolves through specific timing that allows a parent dentry to be
> freed and its slab page reused.
Livelock is real and known, all right, but do explain what does "resolves
through specific timing that allows a parent dentry to be freed" mean.
Especially since the reference to parent is *not* dropped until after
having child detached from the tree and DCACHE_DENTRY_KILLED on it,
with ->d_lock on child held over that. So select_collect2() seeing
the victim still locked and attached to the tree has to happen before
the grace period for parent has a chance to begin. And rcu_read_lock()
grabbed there prevents that grace period from completing until we
do the matching rcu_read_unlock() in shrink_dcache_parent().
> In `select_collect()` (the `d_walk` callback used by
> `shrink_dcache_parent`), two types of dentries are incorrectly counted
really?
> as "found":
> 1. **Dead dentries** (`d_lockref.count < 0`): Another CPU called
> `lockref_mark_dead()` in `__dentry_kill()` but hasn't yet called
> `dentry_unlist()` to remove the dentry from the parent's children list.
> With the debug delay, the dentry stays dead-but-visible for 5ms.
Yes. And? That's the livelock, all right, and it needs fixing,
but how does busy-wait here lead to UAF on anything?
> 2. **`DCACHE_SHRINK_LIST` dentries**: Already isolated by another
> shrinker path (e.g., the global LRU shrinker from `drop_caches`) to its
> own dispose list. These are being processed by that other path but
> slowly (5ms per proc dentry with the debug delay).
>
> When `select_collect` counts these as `found++`,
> `shrink_dcache_parent()` sees `data.found > 0` and loops again. But
> these dentries can never be collected onto `data.dispose` (dead ones
> have count < 0, shrink-list ones already have `DCACHE_SHRINK_LIST`
> set), so the loop never makes progress → **infinite loop**.
They have no business going into data.dispose; for fuck sake, dentries
on somebody else's shrink list are explicitly fed to shrink_kill().
> **Why this is correct:**
It is not.
> - **Dead dentries (`count < 0`)**: These are being killed by another
> CPU's `__dentry_kill()`. That CPU will call `dentry_unlist()` to remove
> them from the parent's children list. `shrink_dcache_parent()` doesn't
> need to wait for them — they'll disappear from the tree on their own.
... and since they keep their parents busy, we should not leave until
they are gone. For real fun, consider calls from shrink_dcache_for_umount() -
and yes, it *is* possible for another thread's shrink list to contain
dentries from filesystem being shut down. Legitimately so.
> - **`DCACHE_SHRINK_LIST` dentries**: These are already on another
> shrinker's dispose list and will be processed by that path. Counting
> them as "found" forces `shrink_dcache_parent()` to wait for the other
> shrinker to finish, which can take arbitrarily long (especially with
> filesystem callbacks or the debug delay).
Ditto.
> - **The `select_collect2` path** (used when `data.found > 0` but
> `data.dispose` is empty) handles `DCACHE_SHRINK_LIST` dentries
> separately by setting `data->victim` and processing them directly. With
> this fix, `select_collect2` is only reached when there are genuinely
> unprocessable dentries (count > 0, not dead, not on shrink list), not
> when there are merely in-flight kills or concurrent shrinkers.
Bollocks, due to above.
> ### Relationship to the production UAF crash
>
> The livelock is the **precursor** to the use-after-free crash seen in
> production (P2260313060):
... and it still offers zero explanation of the path from livelock to
UAF. It may or may not be real, but there's nothing in all that verbiage
even suggesting what it might be. And proposed analysis is flat-out
wrong.
As for the livelock, see viro/vfs.git #work.dcache-busy-wait (in -next
as of today).
next prev parent reply other threads:[~2026-04-08 19:22 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-06 16:44 [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru Jeff Layton
2026-04-07 10:51 ` Jan Kara
2026-04-08 6:42 ` Al Viro
2026-04-08 11:10 ` Jeff Layton
2026-04-08 18:28 ` Jeff Layton
2026-04-08 19:26 ` Al Viro [this message]
2026-04-08 21:05 ` Jeff Layton
2026-04-08 22:43 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260408192620.GI3836593@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=clm@meta.com \
--cc=gustavold@meta.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox