From: Al Viro <viro@zeniv.linux.org.uk>
To: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
clm@meta.com, gustavold@meta.com
Subject: Re: [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru
Date: Wed, 8 Apr 2026 23:43:19 +0100
Message-ID: <20260408224319.GJ3836593@ZenIV>
In-Reply-To: <10d8499d60824f319e32f52fd6a196b3645f6652.camel@kernel.org>
On Wed, Apr 08, 2026 at 05:05:41PM -0400, Jeff Layton wrote:
> > ... and it still offers zero explanation of the path from livelock to
> > UAF. It may or may not be real, but there's nothing in all that verbiage
> > even suggesting what it might be. And proposed analysis is flat-out
> > wrong.
> >
> > As for the livelock, see viro/vfs.git #work.dcache-busy-wait (in -next
> > as of today).
>
> Thanks for taking a look, Al. We'll keep looking at this thing and see
> if we can collect more info, and come up with a better theory of the
> crash. I'll dig deeper into the vmcore tomorrow.
OK... FWIW, on the trees up to 7.0-rc7:
* a dentry passed to select_collect2() must still be attached to
the tree (no DCACHE_DENTRY_KILLED on it)
* caller holds ->d_lock on that dentry.
* in case when its ->d_lockref.count is positive, it's busy (at least
at the moment) and there's nothing to be done; move on.
* in case when its ->d_lockref.count is negative, it's getting killed
right now. That's a busy-wait case - we ignore it on this pass, but remember
that we'll need to rescan.
* in case when its ->d_lockref.count is 0 and it's not on a shrink list:
it's evictable, move it to data.dispose, no problem.
* in case when its ->d_lockref.count is 0 and it *is* on a shrink list:
try to steal it. We can't do that from the d_walk() callback itself - it's
a non-blocking environment, to start with, so no evictions are possible there.
What we can do is return it to shrink_dcache_tree() - grab rcu_read_lock()
to make sure it won't get freed (it hasn't reached dentry_unlist(), let alone
dentry_free(), so the grace period for freeing hasn't started yet) and tell d_walk()
to stop and return to the caller immediately.
Note that we deliberately return without rcu_read_unlock() - that's what
protects the victim from getting freed under us. Other thread might or might not
get around to starting the eviction of the victim, but whether it does that or not,
it won't get around to freeing it.
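The case analysis above boils down to a small decision function. A sketch only - the struct, the enum and the flag value below are invented stand-ins for illustration, not the kernel's types:

```c
#include <stdbool.h>

/* invented stand-ins; not the kernel's definitions */
#define FAKE_ON_SHRINK_LIST 0x1

struct fake_dentry {
	int refcount;		/* stands in for ->d_lockref.count */
	unsigned flags;
};

enum verdict {
	SKIP,		/* positive count: busy, move on             */
	RESCAN,		/* negative count: mid-kill, busy-wait case  */
	DISPOSE,	/* zero, not on a shrink list: evictable     */
	STEAL,		/* zero, on a shrink list: grab rcu_read_lock()
			 * and have d_walk() return to the caller    */
};

static enum verdict classify(const struct fake_dentry *d)
{
	if (d->refcount > 0)
		return SKIP;
	if (d->refcount < 0)
		return RESCAN;
	if (d->flags & FAKE_ON_SHRINK_LIST)
		return STEAL;
	return DISPOSE;
}
```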
In shrink_dcache_tree() we grab ->d_lock on the victim again (it had been
dropped by d_walk() when the callback returned). Then we call lock_for_kill(),
which starts with checking that ->d_lockref.count is 0. If it isn't, there's
nothing to be done to that sucker; it either went busy (in which case we should
ignore it and move on) or somebody (likely the owner of the shrink list it had
been on) got around to evicting it; in the latter case we need to wait until
the damn thing is killed. *IF* refcount is still 0, we carefully acquire the
->i_lock on its inode (if any). Note that we are still holding rcu_read_lock(),
so it can't get freed under us even if we have to drop and regain ->d_lock.
We do need to recheck that refcount is still zero if we do that, obviously.
Failing lock_for_kill() is either due to dentry becoming busy (and thus
to be skipped) or due to dentry having been passed to __dentry_kill(). In
the latter case we busy-wait, same as we'd do if we saw negative ->d_lockref.count
in select_collect2().
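A rough single-threaded model of that check-recheck pattern; locks are plain booleans here, lockref and RCU are not modelled, and every name is invented:

```c
#include <stdbool.h>

struct sketch_inode { bool i_locked; };

struct sketch_dentry {
	bool d_locked;
	int refcount;
	struct sketch_inode *inode;
};

/* Caller holds ->d_lock.  True: both locks held, dentry committed to
 * death.  False: skip it - it went busy, or its killer beat us to it. */
static bool lock_for_kill_sketch(struct sketch_dentry *d)
{
	if (d->refcount != 0)
		return false;
	if (d->inode && !d->inode->i_locked) {
		/* lock-ordering detour: drop ->d_lock, take ->i_lock,
		 * retake ->d_lock.  In the real code the caller's
		 * rcu_read_lock() keeps the dentry from being freed
		 * during that window. */
		d->d_locked = false;
		d->inode->i_locked = true;
		d->d_locked = true;
		if (d->refcount != 0) {	/* changed while unlocked */
			d->inode->i_locked = false;
			return false;
		}
	}
	d->refcount = -1;	/* negative count: "being killed" */
	return true;
}
```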
Successful lock_for_kill() is followed by evicting the sucker;
we do *not* remove it (or its ancestors, if it had been the only thing
holding them busy) from whatever shrink list they'd been on. Anything on
shrink lists gets reduced to the state of passive chunk of memory no
longer connected to filesystem objects, marked with DCACHE_DENTRY_KILLED
and left for the owner of shrink list to free once it gets around to that.
There's no urgency anymore -
	rcu_read_lock();
	if (!lock_for_kill(dentry)) {
		bool can_free;
		rcu_read_unlock();
		d_shrink_del(dentry);
		can_free = dentry->d_flags & DCACHE_DENTRY_KILLED;
		spin_unlock(&dentry->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}
on the shrink_dentry_list() side will have lock_for_kill() return false
(->d_lockref.count already negative), remove the sucker from the list,
see DCACHE_DENTRY_KILLED on it, unlock the sucker and free it. Note that
nothing in that sequence touches any fs objects - it's just a disposal of
inert chunk of memory now.
Note that we can't be stealing from ourselves - if anything had
been added to data.dispose, d_walk() sees D_WALK_QUIT or D_WALK_NORETRY
and either buggers off immediately or sets retry to false. It won't
restart walking the tree in either case, so there's no way for the same
dentry to be revisited by it.
Having one shrink_dcache_tree() steal from another is OK - see
the conditions under which shrink_dentry_list() and __dentry_kill() call
dentry_free() (called 'can_free' in both).
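That "exactly one side frees" handoff can be modelled like this - all names below are invented stand-ins, the locking that serializes the two sides is not shown:

```c
#include <stdbool.h>

#define F_KILLED	0x1	/* stands in for DCACHE_DENTRY_KILLED      */
#define F_SHRINKING	0x2	/* stands in for "on someone's shrink list" */

struct handoff_dentry {
	unsigned flags;
	int frees;	/* times dentry_free() ran - must end up at 1 */
};

/* __dentry_kill() side: mark the dentry dead; free it only if no
 * shrink-list owner is left to do so */
static void killer_side(struct handoff_dentry *d)
{
	d->flags |= F_KILLED;
	if (!(d->flags & F_SHRINKING))
		d->frees++;
}

/* shrink_dentry_list() side after a failed lock_for_kill(): unlist,
 * and free only if the killer has already run */
static void list_owner_side(struct handoff_dentry *d)
{
	bool can_free = d->flags & F_KILLED;

	d->flags &= ~F_SHRINKING;
	if (can_free)
		d->frees++;
}
```

Whichever order the two sides run in, the dentry is freed exactly once and never touched by both.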
Getting rid of the busy-wait is handled with a fairly small modification
to that: select_collect2() treats a negative refcount the same as the
0-and-on-shrink-list case, and shrink_dcache_tree() checks whether the
victim has a negative refcount and no DCACHE_DENTRY_KILLED yet. In that
case it adds a local object (struct completion_list node) to a list hanging
off dentry->waiters, evicts whatever else it might've collected, then waits
for dentry_unlist() on the victim to have called complete() on the
node.completion for everything on ->waiters.
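The waiters-list mechanism might look roughly like this single-threaded model - a bool stands in for struct completion, and all names are invented:

```c
#include <stddef.h>
#include <stdbool.h>

/* in the kernel the waiter would block in wait_for_completion()
 * instead of checking ->done */
struct shrink_waiter {
	struct shrink_waiter *next;
	bool done;
};

struct waited_dentry {
	struct shrink_waiter *waiters;
};

/* shrink_dcache_tree() side: hang a local node off ->waiters */
static void add_waiter(struct waited_dentry *d, struct shrink_waiter *w)
{
	w->done = false;
	w->next = d->waiters;
	d->waiters = w;
}

/* dentry_unlist() side: complete() everything on ->waiters */
static void complete_waiters(struct waited_dentry *d)
{
	struct shrink_waiter *w = d->waiters;

	d->waiters = NULL;
	while (w) {
		struct shrink_waiter *next = w->next;
		w->done = true;		/* complete(&node->completion) */
		w = next;
	}
}
```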
The rest of the logic is unchanged - if the victim is not in the
middle of eviction, we try to steal it, etc., same as in mainline.
Thread overview: 8+ messages
2026-04-06 16:44 [PATCH] dcache: warn when a dentry is freed with a non-empty ->d_lru Jeff Layton
2026-04-07 10:51 ` Jan Kara
2026-04-08 6:42 ` Al Viro
2026-04-08 11:10 ` Jeff Layton
2026-04-08 18:28 ` Jeff Layton
2026-04-08 19:26 ` Al Viro
2026-04-08 21:05 ` Jeff Layton
2026-04-08 22:43 ` Al Viro [this message]