public inbox for linux-fsdevel@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Nikolay Borisov <nik.borisov@suse.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Paulo Alcantara <pc@manguebit.org>
Subject: Re: [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent()
Date: Thu, 09 Apr 2026 16:10:41 -0400	[thread overview]
Message-ID: <41cfd0f95b7fde411c0d59463dce979be89cb8ef.camel@kernel.org> (raw)
In-Reply-To: <20260409190213.GQ3836593@ZenIV>

On Thu, 2026-04-09 at 20:02 +0100, Al Viro wrote:
> On Thu, Apr 09, 2026 at 12:51:38PM -0400, Jeff Layton wrote:
> 
> > I can also confirm that this fixes the livelock we were seeing as a
> > precursor to this UAF [1]. I'm still not 100% sure how that livelock
> > leads to a UAF. Claude thinks it has to do with the livelock making the
> > loop outlive an RCU grace period.
> 
> Claude is full of substance likely to cause CoC complaints when mentioned,
> film at 11.  I agree with the "livelock opens up some window" part,
> what with the stack traces there, but I very much doubt that mechanism
> has anything to do with the stuff on a shrink list staying around from
> one iteration to another.  The thing is, *nothing* is held between the
> iterations of loop in there; data.dispose is empty at the end of the loop
> body in all cases and the reference in data.victim is not retained
> across iterations either.  We do have a weird RCU use pattern there,
> all right (unbalanced acquire in select_collect2() paired with a drop
> in the caller's caller of select_collect2()), but the scope is neither
> long nor extended by the livelock - in that case the function between
> shrink_dentry_tree() and select_collect2() (d_walk()) is told to drop
> the locks and return to its caller immediately.
> 
> What I suspect might be getting opened up is the possibility
> of d_invalidate() (e.g. from pathname resolution trying
> to walk into /proc/<pid>/...) hitting in the middle of
> proc_invalidate_siblings_dcache().
> 
> Incidentally, one thing to check would be whether anything on affected
> boxen had things mounted under /proc/<pid>/... - that would give
> some idea whether detach_mounts() (and namespace_unlock() with its
> handling of the list of mountpoints) might be needed in the mix.
> 

had_submounts = false. The d_invalidate code never found submounts, so
detach_mounts() was never called for this dentry. The find_submount /
namespace_unlock path was not involved in this crash.

Summary for Al: On this crashed machine, there were no mounts under
/proc/<pid>/... in any of the 49 mount namespaces checked (from
surviving tasks). The d_invalidate stack frame shows had_submounts =
false, confirming that detach_mounts() and namespace_unlock() were not
in the mix for this crash. The code path was purely
shrink_dcache_parent → shrink_dentry_list → lock_for_kill, with no
mount detachment involved.


> > I did eventually find a vmcore and was able to confirm some things:
> > 
> > - in the freed/reallocated dentry, most of the fields are clobbered,
> > but the d_lru list pointers seemed to be intact. That tells me that the
> > sucker was added to the shrink list after being freed and reallocated.
> > That means that the WARN_ON_ONCE() wouldn't have caught this anyway.
> > 
> > - there was some list corruption too: it looked like there were two
> > different (on-stack) list heads in the list. There were also some
> > dentries in there that were outside the 

...directories being shrunk from processes exiting. Here's what it
found when I told it to validate the integrity of the d_lru that the
realloc'ed entry was on.

More Claude slop from the earlier vmcore analysis:

------------------8<------------------
### Dispose List: Two Merged Lists
The dispose list contains **55 unique entries** and **two stack-based
list heads** forming a single ring:
1. **head_A** at `0xffffc900c0e8fcc0` — crashed thread's `data.dispose`
2. **head_B** at `0xffffc900c0e7fcc0` — another thread's `data.dispose`
(exited, stack freed, exactly 0x10000 bytes apart)
```
head_A ↔ UAF ↔ mountinfo(1) ↔ swaps(/) ↔ ... ↔
  head_B(STACK) ↔ status(5962) ↔ exe(5962) ↔ ... ↔
  stat(8808) ↔ ... ↔ mountinfo(1) ↔ UAF ↔ head_A
```
### d_walk Escaped Its Starting Dentry
`d_walk` was called with `parent = /proc/4530/task/5964` (`data.start`
confirmed in stack frame). It should only traverse descendants of 5964.
But the dispose list contains entries from:
- `/proc/4530/task/5962/*` (151 children — sibling of 5964)
- `/proc/4530/task/6830`, `/proc/4530/task/8808` — other task entries
- `/proc/1/mountinfo`, `/proc/1/status`, `/proc/1/net` — PID 1 entries
- `/proc/sys/vm/overcommit_memory`, `/proc/sys/fs/*` — sysctl entries
- `/proc/pressure/{cpu,io,memory}` — PSI entries
- `/proc/swaps`, `/proc/cpuinfo`, `/proc/kcore` — root proc entries
------------------8<------------------

> 
>  
> > Note that this was an older (v6.13-based) kernel though, and it's
> > possible there are other unfixed bugs in here.
> > 
> > I'm currently trying to reproduce the UAF using that mdelay() based
> > fault injection patch on something closer to mainline.

...and so far it's not trivially reproducible. I can reproduce the
stalls on a recent kernel without your patches, but whatever causes
this UAF is not easily triggerable by just stalling things out.

I think I'm going to have to hunt for more vmcores...
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 41+ messages
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
2026-01-23  0:19 ` Linus Torvalds
2026-01-23  0:36   ` Al Viro
2026-01-24  4:36     ` Al Viro
2026-01-24  4:46       ` Linus Torvalds
2026-01-24  5:36         ` Al Viro
2026-01-24 17:45           ` Linus Torvalds
2026-01-24 18:43             ` Al Viro
2026-01-24 19:32               ` Linus Torvalds
2026-01-24 20:28                 ` Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-02 19:52     ` Linus Torvalds
2026-04-02 22:44       ` Al Viro
2026-04-02 22:49         ` Linus Torvalds
2026-04-02 23:16           ` Al Viro
2026-04-03  0:29             ` Linus Torvalds
2026-04-03  2:15               ` Al Viro
2026-04-04  0:02                 ` Al Viro
2026-04-04  0:04                   ` Linus Torvalds
2026-04-04 18:54                     ` Al Viro
2026-04-04 19:04                       ` Linus Torvalds
2026-04-05  0:04                         ` Al Viro
2026-04-02 20:28   ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Paulo Alcantara
2026-04-03  4:46     ` Al Viro
2026-04-04  8:07 ` [RFC PATCH v3 " Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-09 16:51   ` [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent() Jeff Layton
2026-04-09 19:02     ` Al Viro
2026-04-09 20:10       ` Jeff Layton [this message]
2026-04-09 21:57         ` Al Viro
2026-04-09 22:38           ` Jeff Layton
2026-04-10  8:48           ` [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order Al Viro
2026-04-10 11:18             ` Jeff Layton
2026-04-10 11:56               ` Jeff Layton
