public inbox for linux-fsdevel@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>, Jan Kara	 <jack@suse.cz>,
	Nikolay Borisov <nik.borisov@suse.com>,
	Max Kellermann	 <max.kellermann@ionos.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Paulo Alcantara	 <pc@manguebit.org>
Subject: Re: [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent()
Date: Thu, 09 Apr 2026 18:38:28 -0400	[thread overview]
Message-ID: <a8fc5dfa19cf704c3f668f91443fdd1b2e01e612.camel@kernel.org> (raw)
In-Reply-To: <20260409215733.GS3836593@ZenIV>

On Thu, 2026-04-09 at 22:57 +0100, Al Viro wrote:
> On Thu, Apr 09, 2026 at 04:10:41PM -0400, Jeff Layton wrote:
> 
> > head_A ↔ UAF ↔ mountinfo(1) ↔ swaps(/) ↔ ... ↔
> >   head_B(STACK) ↔ status(5962) ↔ exe(5962) ↔ ... ↔
> >   stat(8808) ↔ ... ↔ mountinfo(1) ↔ UAF ↔ head_A
> > ```
> 

I think the above is BS. Claude says:

In the earlier analysis I walked the list both forward from
head_B (the stack entry) and backward from head_A. That was
misleading: there's only one mountinfo node. The ring passes through
it once. The duplicate in my listing was an artifact of walking the
ring from two entry points and not deduplicating.


> Wait a bloody minute; *two* UAF and two mountinfo there?  Are the links
> in that cyclic list consistent?  ->next and ->prev, I mean.

No. The list was NOT a proper ring:

  - FWD (head.next): LOOP at entry 1 — immediately hits a self-loop (the processed "status" dentry)
  - BWD (head.prev): LOOP at entry 55 — walks 55 entries then hits a cycle
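
To make that concrete: the check amounts to walking the ring in each
direction and separately verifying that ->next and ->prev agree. A toy
model of those two checks (plain Python, fabricated node names; the real
walk was of course over kernel memory, not this):

```python
# Toy model of the two list checks run against the vmcore: a one-
# directional walk that detects cycles which never return to the head,
# and a per-node ->next/->prev agreement check.  Node names are made up
# for illustration.

class Node:
    def __init__(self, name):
        self.name = name
        self.next = self.prev = None

def make_ring(names):
    """Build a well-formed circular doubly-linked list."""
    nodes = [Node(n) for n in names]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.next, b.prev = b, a
    return nodes

def walk(head, direction="next", limit=10000):
    """Follow one link field; ('ok', steps) if we return to head,
    ('loop', name, steps) if we revisit some other node first."""
    seen = {id(head)}
    n, steps = head, 0
    while steps < limit:
        n, steps = getattr(n, direction), steps + 1
        if n is head:
            return ("ok", steps)
        if id(n) in seen:
            return ("loop", n.name, steps)
        seen.add(id(n))
    return ("runaway", limit)

def links_consistent(head, limit=10000):
    """True iff n.next.prev == n and n.prev.next == n all the way round."""
    n = head
    for _ in range(limit):
        if n.next.prev is not n or n.prev.next is not n:
            return False
        n = n.next
        if n is head:
            return True
    return False

ring = make_ring(["head_B", "status", "exe", "stat", "mountinfo"])
print(walk(ring[0], "next"), walk(ring[0], "prev"), links_consistent(ring[0]))

# Emulate the kind of damage seen in the dump: "status" becomes a
# self-loop, so the forward walk never gets back to head_B.
ring[1].next = ring[1]
print(walk(ring[0], "next"), links_consistent(ring[0]))
```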


> If I understand the notation correctly... are there two different
> dentries, both with "mountinfo" for name and PROC_I(_->d_inode)->pid
> being PID 1?  What are their ->d_parent pointing to?  Are they hashed?
> 
> > ### d_walk Escaped Its Starting Dentry
> > `d_walk` was called with `parent = /proc/4530/task/5964` (`data.start`
> > confirmed in stack frame). It should only traverse descendants of 5964.
> > But the dispose list contains entries from:
> > - `/proc/4530/task/5962/*` (151 children — sibling of 5964)
> > - `/proc/4530/task/6830`, `/proc/4530/task/8808` — other task entries
> > - `/proc/1/mountinfo`, `/proc/1/status`, `/proc/1/net` — PID 1 entries
> > - `/proc/sys/vm/overcommit_memory`, `/proc/sys/fs/*` — sysctl entries
> > - `/proc/pressure/{cpu,io,memory}` — PSI entries
> > - `/proc/swaps`, `/proc/cpuinfo`, `/proc/kcore` — root proc entries
> 
> Not at all obvious; there's a list from another thread mixed into that,
> and we've no idea what had the root been for that one.  For that matter,
> I'd check ->d_flags on the entries, just to verify that those are
> from shrink lists and not from LRU - the fact that these UAF still
> have pointers to plausible dentries does not mean they are from the
> same moment in time; if dentry in question had been on a different
> list before getting freed...  That's why the question about ->prev
> and ->next consistency.
>
> And seeing that procfs has zero callers of d_splice_alias() or d_move(),
> I would expect ->d_parent on all dentries in there to be constant over
> the entire lifetime; would be rather hard for d_walk() to escape in
> such conditions...

Definitely not legitimately. I suspect some other sort of memory
corruption. /proc/1/mountinfo is on the list though:

mountinfo dentries on dispose list: 1

    /proc/1/mountinfo (dentry 0xffff88823deef800)
      d_parent: 0xffff891373c06a80 "1" -> "/" (root)
      hashed: yes (d_hash.pprev=0xffff893d337264d0)
      PROC_I->pid nr: 3217077
      d_inode: 0xffff88ea55a44508
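
On the ->d_flags check Al suggested, this is the classification I'm
applying. The bit values are as I read them from include/linux/dcache.h
for this tree (6.13); they have moved around across kernel versions, so
verify against the exact source before trusting the output:

```python
# Classify a dentry's d_flags as LRU vs shrink-list, per Al's question.
# Bit values as I read include/linux/dcache.h for 6.13; double-check
# against the exact tree, since these have changed across versions.

DCACHE_LRU_LIST    = 0x00080000   # on the superblock LRU
DCACHE_SHRINK_LIST = 0x00100000   # on somebody's shrink/dispose list

def classify(d_flags):
    on_lru    = bool(d_flags & DCACHE_LRU_LIST)
    on_shrink = bool(d_flags & DCACHE_SHRINK_LIST)
    if on_shrink and on_lru:
        return "both bits set (anomalous)"
    if on_shrink:
        return "shrink list"
    if on_lru:
        return "LRU"
    return "neither"

# The UAF'd entries show d_flags of 0x0 or garbage (0x81e73ce0), so the
# flags tell us nothing post slab reuse -- which is exactly the problem.
for flags in (0x0, 0x81e73ce0, DCACHE_SHRINK_LIST, DCACHE_LRU_LIST):
    print(hex(flags), "->", classify(flags))
```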

Claude's theory (maybe garbage):

PID 1's mountinfo being on the list is actually a strong clue — it's
a dentry that would be on the global LRU (negative or zero-count
after someone closed /proc/1/mountinfo), and drop_caches would pull
it off the LRU via prune_dcache_sb. That's likely how it ended up on
Thread B's dispose list.

Here's a cursory comparative analysis across the three vmcores I have
so far:

  Comparative analysis across three vmcores                                                      
  =========================================                                                      
                                                                                                 
  Kernel                6.13.2-0_fbk5-hf2    6.13.2-0_fbk12-hf1   6.13.2-0_fbk5-hf2
  Date                  Mar 31               Apr 9                Apr 2
  Process               SREventBase61        tokio-runtime-w      TTLSFWorkerU151
  d_invalidate target   /task/5964           /task/3555587        /task/1102
  had_submounts         False                False                False
  UAF slab              kmalloc-96           kmalloc-64           kmalloc-96
  UAF d_flags           0x0                  0x81e73ce0           0x0
  UAF d_sb              0x0                  0x0                  0x0
  UAF d_lockref.count   0                    -14080               -771573072

  Consistent findings across all three cores:                                                    
                        
    - had_submounts is False in every case.  detach_mounts() and                                 
      namespace_unlock() are not involved in any of these crashes.                               

    - No mounts under /proc/<pid>/... in any mount namespace (checked                            
      on the od0103 core across 49 namespaces, zero proc submounts).                             

    - All three are in the same call path:                                                       
        release_task -> proc_flush_pid -> d_invalidate ->                                        
        shrink_dcache_parent -> shrink_dentry_list -> spin_lock (UAF)                            
                                                
    - The UAF'd dentry is in a non-dentry slab in every case (kmalloc-64                         
      or kmalloc-96), confirming the dentry was freed and its page reused                        
      by a different slab cache before shrink_dentry_list tried to lock it.                      
                                                                                                 
    - Different applications triggered the crash (Rust async runtime,                            
      C++ worker, Java-like worker) -- not application-specific.
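
To restate the working theory in executable form (a toy model only;
thread names and the double-listing mechanism are my assumptions, and
none of this is kernel API): if the same dentry can land on two
threads' dispose lists, the loser of the race takes spin_lock on memory
the winner already freed:

```python
# Toy model of the suspected race: one dentry ends up on two threads'
# dispose lists (e.g. LRU pruning on thread A, d_invalidate ->
# shrink_dcache_parent on thread B).  Whoever shrinks second touches
# freed memory.  Purely illustrative; these are not kernel APIs.

freed = set()

def shrink_dentry_list(dispose, who):
    """Free everything on a dispose list; flag touching freed memory."""
    for d in dispose:
        if id(d) in freed:
            # In the kernel this is the spin_lock on a recycled slab
            # object -- the UAF we see in the vmcores.
            return f"{who}: UAF on {d['name']}"
        freed.add(id(d))
    return f"{who}: ok"

mountinfo = {"name": "/proc/1/mountinfo"}
thread_a_dispose = [mountinfo]   # drop_caches / LRU pruning path
thread_b_dispose = [mountinfo]   # release_task -> d_invalidate path

print(shrink_dentry_list(thread_a_dispose, "A"))  # frees it first
print(shrink_dentry_list(thread_b_dispose, "B"))  # hits the stale entry
```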

Thread overview: 39+ messages
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
2026-01-23  0:19 ` Linus Torvalds
2026-01-23  0:36   ` Al Viro
2026-01-24  4:36     ` Al Viro
2026-01-24  4:46       ` Linus Torvalds
2026-01-24  5:36         ` Al Viro
2026-01-24 17:45           ` Linus Torvalds
2026-01-24 18:43             ` Al Viro
2026-01-24 19:32               ` Linus Torvalds
2026-01-24 20:28                 ` Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-02 19:52     ` Linus Torvalds
2026-04-02 22:44       ` Al Viro
2026-04-02 22:49         ` Linus Torvalds
2026-04-02 23:16           ` Al Viro
2026-04-03  0:29             ` Linus Torvalds
2026-04-03  2:15               ` Al Viro
2026-04-04  0:02                 ` Al Viro
2026-04-04  0:04                   ` Linus Torvalds
2026-04-04 18:54                     ` Al Viro
2026-04-04 19:04                       ` Linus Torvalds
2026-04-05  0:04                         ` Al Viro
2026-04-02 20:28   ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Paulo Alcantara
2026-04-03  4:46     ` Al Viro
2026-04-04  8:07 ` [RFC PATCH v3 " Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-09 16:51   ` [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent() Jeff Layton
2026-04-09 19:02     ` Al Viro
2026-04-09 20:10       ` Jeff Layton
2026-04-09 21:57         ` Al Viro
2026-04-09 22:38           ` Jeff Layton [this message]
2026-04-10  8:48           ` [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order Al Viro
