From: Jeff Layton <jlayton@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Nikolay Borisov <nik.borisov@suse.com>,
Max Kellermann <max.kellermann@ionos.com>,
Eric Sandeen <sandeen@redhat.com>,
Paulo Alcantara <pc@manguebit.org>
Subject: Re: [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent()
Date: Thu, 09 Apr 2026 16:10:41 -0400 [thread overview]
Message-ID: <41cfd0f95b7fde411c0d59463dce979be89cb8ef.camel@kernel.org> (raw)
In-Reply-To: <20260409190213.GQ3836593@ZenIV>
On Thu, 2026-04-09 at 20:02 +0100, Al Viro wrote:
> On Thu, Apr 09, 2026 at 12:51:38PM -0400, Jeff Layton wrote:
>
> > I can also confirm that this fixes the livelock we were seeing as a
> > precursor to this UAF [1]. I'm still not 100% sure how that livelock
> > leads to a UAF. Claude thinks it has to do with the livelock making the
> > loop outlive an RCU grace period.
>
> Claude is full of substance likely to cause CoC complaints when mentioned,
> film at 11. I agree with the "livelock opens up some window" part,
> what with the stack traces there, but I very much doubt that mechanism
> has anything to do with the stuff on a shrink list staying around from
> one iteration to another. The thing is, *nothing* is held between the
> iterations of loop in there; data.dispose is empty at the end of the loop
> body in all cases and the reference in data.victim is not retained
> across the iterations either. We do have a weird RCU use pattern there,
> all right (unbalanced acquire in select_collect2() paired with drop in
> the caller of the caller of select_collect2()), but the scope is neither
> long nor extended by the livelock - in that case the function between
> shrink_dentry_tree() and select_collect2() (d_walk()) is told to drop
> the locks and return to its caller immediately.
>
> What I suspect might be getting opened up is the possibility
> of d_invalidate() (e.g. from pathname resolution trying
> to walk into /proc/<pid>/...) hitting in the middle of
> proc_invalidate_siblings_dcache().
>
> Incidentally, one thing to check would be whether anything on affected
> boxen had things mounted under /proc/<pid>/... - that would give
> some idea whether detach_mounts() (and namespace_unlock() with its
> handling of the list of mountpoints) might be needed in the mix.
>
had_submounts = false. The d_invalidate code never found submounts, so
detach_mounts() was never called for this dentry. The find_submount /
namespace_unlock path was not involved in this crash.
Summary for Al: On this crashed machine, there were no mounts under
/proc/<pid>/... in any of the 49 mount namespaces checked (from
surviving tasks). The d_invalidate stack frame shows had_submounts =
false, confirming that detach_mounts() and namespace_unlock() were not
in the mix for this crash. The code path was purely
shrink_dcache_parent → shrink_dentry_list → lock_for_kill, without any
mount detachment involved.
> > I did eventually find a vmcore and was able to confirm some things:
> >
> > - in the freed/reallocated dentry, most of the fields are clobbered,
> > but the d_lru list pointers seemed to be intact. That tells me that the
> > sucker was added to the shrink list after being freed and reallocated.
> > That means that the WARN_ON_ONCE() wouldn't have caught this anyway.
> >
> > - there was some list corruption too: it looked like there were two
> > different (on-stack) list heads in the list. There were also some
> > dentries in there that were outside the
...directories being shrunk from processes exiting. Here's what it
found when I told it to validate the integrity of the d_lru that the
realloc'ed entry was on.
More claude slop from the earlier vmcore analysis:
------------------8<------------------
### Dispose List: Two Merged Lists
The dispose list contains **55 unique entries** and **two stack-based
list heads** forming a single ring:
1. **head_A** at `0xffffc900c0e8fcc0` — crashed thread's `data.dispose`
2. **head_B** at `0xffffc900c0e7fcc0` — another thread's `data.dispose`
(exited, stack freed, exactly 0x10000 bytes apart)
```
head_A ↔ UAF ↔ mountinfo(1) ↔ swaps(/) ↔ ... ↔
head_B(STACK) ↔ status(5962) ↔ exe(5962) ↔ ... ↔
stat(8808) ↔ ... ↔ mountinfo(1) ↔ UAF ↔ head_A
```
### d_walk Escaped Its Starting Dentry
`d_walk` was called with `parent = /proc/4530/task/5964` (`data.start`
confirmed in stack frame). It should only traverse descendants of 5964.
But the dispose list contains entries from:
- `/proc/4530/task/5962/*` (151 children — sibling of 5964)
- `/proc/4530/task/6830`, `/proc/4530/task/8808` — other task entries
- `/proc/1/mountinfo`, `/proc/1/status`, `/proc/1/net` — PID 1 entries
- `/proc/sys/vm/overcommit_memory`, `/proc/sys/fs/*` — sysctl entries
- `/proc/pressure/{cpu,io,memory}` — PSI entries
- `/proc/swaps`, `/proc/cpuinfo`, `/proc/kcore` — root proc entries
------------------8<------------------
>
>
> > Note that this was an older (v6.13-based) kernel though, and it's
> > possible there are other unfixed bugs in here.
> >
> > I'm currently trying to reproduce the UAF using that mdelay() based
> > fault injection patch on something closer to mainline.
...and so far it's not trivially reproducible. I can reproduce the
stalls on a recent kernel without your patches, but whatever causes
this UAF is not easily triggerable by just stalling things out.
I think I'm going to have to hunt for more vmcores...
--
Jeff Layton <jlayton@kernel.org>