public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,  Jan Kara <jack@suse.cz>,
	Nikolay Borisov <nik.borisov@suse.com>,
	Max Kellermann	 <max.kellermann@ionos.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Paulo Alcantara	 <pc@manguebit.org>
Subject: Re: [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent()
Date: Thu, 09 Apr 2026 12:51:38 -0400	[thread overview]
Message-ID: <53b4795292d1df2fe1569fc724325ab52fcab322.camel@kernel.org> (raw)
In-Reply-To: <20260404080751.2526990-1-viro@zeniv.linux.org.uk>

On Sat, 2026-04-04 at 09:07 +0100, Al Viro wrote:
> Changes since v2:
> 	* separate select_data from the completion list elements - add a separate
> struct completion_list (private to fs/dcache.c at the moment, but it's completely
> independent from struct dentry and can be lifted e.g. into linux/completion.h
> if there are other potential users).
> 
> Branch (currently -rc6-based) lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.dcache-busy-wait
> individual patches in followups.  Please, review.
> 
> Shortlog:
> Al Viro (4):
>   for_each_alias(): helper macro for iterating through dentries of given inode
>   struct dentry: make ->d_u anonymous
>   dcache.c: more idiomatic "positives are not allowed" sanity checks
>   get rid of busy-waiting in shrink_dcache_tree()
> 
> Diffstat:
>  Documentation/filesystems/porting.rst |  10 ++
>  fs/affs/amigaffs.c                    |   2 +-
>  fs/ceph/mds_client.c                  |   4 +-
>  fs/dcache.c                           | 129 +++++++++++++++++++-------
>  fs/exportfs/expfs.c                   |   2 +-
>  fs/inode.c                            |   2 +-
>  fs/nfs/dir.c                          |   4 +-
>  fs/nfs/getroot.c                      |   2 +-
>  fs/notify/fsnotify.c                  |   2 +-
>  fs/ocfs2/dcache.c                     |   2 +-
>  fs/overlayfs/dir.c                    |   2 +-
>  fs/smb/client/inode.c                 |   2 +-
>  include/linux/dcache.h                |  24 ++++-
>  13 files changed, 140 insertions(+), 47 deletions(-)
> 
> Series overview
> 
> First a couple of cleanups to reduce the amount of noise:
> 
> 1) for_each_alias(): helper macro for iterating through dentries of given inode
> 
> Most of the places using d_alias are loops iterating through all aliases for
> given inode; introduce a helper macro (for_each_alias(dentry, inode))
> and convert open-coded instances of such loop to it.
> 
> They are easier to read that way and it reduces the noise on the next steps.
> 
> 2) struct dentry: make ->d_u anonymous
> 
> Making ->d_rcu and (then) ->d_child overlapping dates back to 2006; anon unions
> support had been added to gcc only in 4.6 (2011) and the minimal gcc version hadn't
> been bumped to that until 4.19 (2018).
> These days there's no reason not to keep that union named.
> 
> Next we get rid of the aforementioned pathologies:
> 
> 3) dcache.c: more idiomatic "positives are not allowed" sanity checks
> 
> Several functions have BUG_ON/WARN_ON sanity checks that want to verify
> that dentry is not positive and instead of looking at ->d_inode (as we
> do in all other places that check that) they look at ->d_alias.
> Just use the normal helpers instead - that way we no longer even look
> at ->d_alias for negative dentries
> 
> Now that everything is ready for adding a new member and killing busy-wait:
> 
> 4) get rid of busy-waiting in shrink_dcache_tree()
> 
> If shrink_dcache_tree() runs into a potential victim that is already
> dying, it must wait for that dentry to go away.  To avoid busy-waiting
> we need some object to wait on and a way for dentry_unlist() to see that
> we need to be notified.
> 
> The obvious place for the object to wait on would be on our stack frame.
> We will store a pointer to that object (struct completion_list) in victim
> dentry; if there's more than one thread wanting to wait for the same
> dentry to finish dying, we'll have their instances linked into a list,
> with reference in dentry pointing to the head of that list.
> 
> * new object - struct completion_list.  A pair of struct completion and
> pointer to the next instance.  That's what shrink_dcache_tree() will wait
> on if needed.
> 
> * add a new member (->waiters, opaque pointer to struct completion_list)
> to struct dentry.  It is defined for negative live dentries that are
> not in-lookup ones and it will remain NULL for almost all of them.
> 
> It does not conflict with ->d_rcu (defined for killed dentries), ->d_alias
> (defined for positive dentries, all live) or ->d_in_lookup_hash (defined
> for in-lookup dentries, all live negative).  That allows to colocate
> all four members.
> 
> * make sure that all places where dentry enters the state where ->waiters
> is defined (live, negative, not-in-lookup) initialize ->waiters to NULL.
> 
> * if select_collect2() runs into a dentry that is already dying, have
> its caller insert a local instance of struct completion_list into the
> head of the list hanging off dentry->waiters and wait for completion.
> 
> * if dentry_unlist() sees non-NULL ->waiters, have it carefully walk
> through the completion_list instances in that list, calling complete()
> for each.
> 
> For now struct completion_list is local to fs/dcache.c; it's obviously
> dentry-agnostic, and it can be trivially lifted into linux/completion.h
> if somebody finds a reason to do so...
> 
> 

This all looks good, Al. Nice work. I like the new wrappers and anon
union too -- makes the code a lot more readable.

I can also confirm that this fixes the livelock we were seeing as a
precursor to this UAF [1]. I'm still not 100% sure how that livelock
leads to a UAF. Claude thinks is has to do with the livelock making the
loop outlive an RCU grace period.

I did eventually find a vmcore and was able to confirm some things:

- in the freed/reallocated dentry, most of the fields are clobbered,
but the d_lru list pointers seemed to be intact. That tells me that the
sucker was added to the shrink list after being freed and reallocated.
That means that the WARN_ON_ONCE() wouldn't have caught this anyway.

- there was some list corruption too: it looked like there were two
different (on-stack) list heads in the list. There were also some
dentries in there that were outside the shrin

Note that this was an older (v6.13-based) kernel though, and it's
possible there are other unfixed bugs in here.

I'm currently trying to reproduce the UAF using that mdelay() based
fault injection patch on something closer to mainline.

[1]: https://lore.kernel.org/linux-fsdevel/20260406-dcache-warn-v1-1-c665efbc005f@kernel.org/

I reviewed the version in your work.dcache-busy-wait branch, but you
can add this to the lot.

Reviewed-by: Jeff Layton <jlayton@kernel.org>



  parent reply	other threads:[~2026-04-09 16:51 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
2026-01-23  0:19 ` Linus Torvalds
2026-01-23  0:36   ` Al Viro
2026-01-24  4:36     ` Al Viro
2026-01-24  4:46       ` Linus Torvalds
2026-01-24  5:36         ` Al Viro
2026-01-24 17:45           ` Linus Torvalds
2026-01-24 18:43             ` Al Viro
2026-01-24 19:32               ` Linus Torvalds
2026-01-24 20:28                 ` Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-02 19:52     ` Linus Torvalds
2026-04-02 22:44       ` Al Viro
2026-04-02 22:49         ` Linus Torvalds
2026-04-02 23:16           ` Al Viro
2026-04-03  0:29             ` Linus Torvalds
2026-04-03  2:15               ` Al Viro
2026-04-04  0:02                 ` Al Viro
2026-04-04  0:04                   ` Linus Torvalds
2026-04-04 18:54                     ` Al Viro
2026-04-04 19:04                       ` Linus Torvalds
2026-04-05  0:04                         ` Al Viro
2026-04-02 20:28   ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Paulo Alcantara
2026-04-03  4:46     ` Al Viro
2026-04-04  8:07 ` [RFC PATCH v3 " Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-09 16:51   ` Jeff Layton [this message]
2026-04-09 19:02     ` [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-09 20:10       ` Jeff Layton
2026-04-09 21:57         ` Al Viro
2026-04-09 22:38           ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53b4795292d1df2fe1569fc724325ab52fcab322.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=nik.borisov@suse.com \
    --cc=pc@manguebit.org \
    --cc=sandeen@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox