public inbox for linux-fsdevel@vger.kernel.org
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Jeff Layton <jlayton@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Nikolay Borisov <nik.borisov@suse.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Paulo Alcantara <pc@manguebit.org>
Subject: Re: [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order
Date: Fri, 10 Apr 2026 21:24:04 +0100	[thread overview]
Message-ID: <20260410202404.GW3836593@ZenIV> (raw)
In-Reply-To: <CAHk-=wiiJ4uO6C3U-B_SVuyjzQpNd3yMjWck8kJ7tZ4oPwhGWA@mail.gmail.com>

On Fri, Apr 10, 2026 at 12:30:13PM -0700, Linus Torvalds wrote:

> The reason it exists is because lock_for_kill() can drop d_lock(), but
> that's in the unlikely case that we can't just immediately get the
> inode lock.
> 
> So honestly, I think that rcu_read_lock() should be inside
> lock_for_kill(), rather than in the caller as a "just in case things
> go down".

Yup, I've mentioned that in the cascade of followups...

> IOW, that code basically now not only does that typically unnecessary
> rcu locking (and then unlocking), it does so *because* it knows about
> the subtle internal behavior of lock_for_kill().
> 
> In contrast, if we put it inside lock_for_kill(), all we need is a
> fairly straightforward comment along the lines of
> 
>   "we're dropping the spinlock in order to take the inode lock, but
> the caller may be depending on RCU behavior so we need to take the rcu
> read lock first".
> 
>  that would turn strange illogical code that also generates worse code
> into straightforwardly explained code that also performs better.
> 
> Ok, so "performs better" is kind of exaggerated, in that obviously the
> extra rcu_read_lock/unlock sequences aren't exactly expensive, but
> still - I feel it's a win-win to just do this differently.
> 
> Or am I missing something else?

Mostly that fast_dput() calling conventions would need to change a bit
as well.
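
For concreteness, the step 1 lock_for_kill() would be something like the
following - a sketch only: it's the mainline loop with the
rcu_read_lock()/rcu_read_unlock() pair pulled in around it, the caller
still acquiring and dropping ->d_lock itself on failure:

static bool lock_for_kill(struct dentry *dentry)
{
	struct inode *inode = dentry->d_inode;

	if (unlikely(dentry->d_lockref.count))
		return false;

	if (!inode || likely(spin_trylock(&inode->i_lock)))
		return true;

	rcu_read_lock();	/* caller may rely on RCU once ->d_lock is dropped */
	do {
		spin_unlock(&dentry->d_lock);
		spin_lock(&inode->i_lock);
		spin_lock(&dentry->d_lock);
		if (likely(inode == dentry->d_inode))
			break;
		spin_unlock(&inode->i_lock);
		inode = dentry->d_inode;
	} while (inode);
	rcu_read_unlock();	/* ->d_lock is held again */
	if (likely(!dentry->d_lockref.count))
		return true;
	if (inode)
		spin_unlock(&inode->i_lock);
	return false;
}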

With the above as step 1,

Step 2: fold lock_for_kill() into shrink_kill().  Result: 2 call sites of
lock_for_kill() left, shrink_kill() becomes
static inline void shrink_kill(struct dentry *victim)
{
	while (lock_for_kill(victim)) {   
		rcu_read_unlock();
		victim = __dentry_kill(victim);
		if (!victim)
			return;
		rcu_read_lock();
	}
	spin_unlock(&victim->d_lock);
	rcu_read_unlock();
}
shrink_dentry_list() and shrink_dentry_tree() then call shrink_kill()
without bothering with lock_for_kill().

Step 3: note that both call sites of lock_for_kill() are followed by the
same sequence - drop ->d_lock + rcu_read_unlock on failure,
rcu_read_unlock + __dentry_kill() on success.  And these are the
only callers of __dentry_kill().  Moreover, NULL from __dentry_kill()
means "bugger off" in both callers.  So add

struct dentry *dentry_kill(struct dentry *dentry)
{
	if (!lock_for_kill(dentry)) {
		spin_unlock(&dentry->d_lock);
		rcu_read_unlock();
		return NULL;
	}
	rcu_read_unlock();
	return __dentry_kill(dentry);
}
and convert both callers to that:
static inline void shrink_kill(struct dentry *victim)
{
	while ((victim = dentry_kill(victim)) != NULL)
		rcu_read_lock();
}
and
static void finish_dput(struct dentry *dentry)
{
	while ((dentry = dentry_kill(dentry)) != NULL) {
		if (retain_dentry(dentry, true)) {
			spin_unlock(&dentry->d_lock);
			return;
		}
		rcu_read_lock();
	}
}

Step 4: note that callers of shrink_kill() and finish_dput()
have dentry->d_lock held.  We can (assuming there are no RCU
bugs) drop the lock before calling them and grab it first
thing inside.  Moreover, we can shift grabbing it into
lock_for_kill(), leaving these
static inline void shrink_kill(struct dentry *victim)
{
	while ((victim = dentry_kill(victim)) != NULL)
		;
}
and
static void finish_dput(struct dentry *dentry)
{
	while ((dentry = dentry_kill(dentry)) != NULL) {
		if (retain_dentry(dentry, true)) {
			spin_unlock(&dentry->d_lock);
			return;
		}
	}
}
resp., with the callers becoming
void dput(struct dentry *dentry)
{
	if (!dentry)
		return;
	might_sleep();
	rcu_read_lock();
	if (likely(fast_dput(dentry))) {
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
	finish_dput(dentry);
}

void d_make_discardable(struct dentry *dentry)
{
        spin_lock(&dentry->d_lock);
        WARN_ON(!(dentry->d_flags & DCACHE_PERSISTENT));
        dentry->d_flags &= ~DCACHE_PERSISTENT;
        dentry->d_lockref.count--;
        finish_dput(dentry);
}

void shrink_dentry_list(struct list_head *list)
{
	while (!list_empty(list)) {
		struct dentry *dentry;

		dentry = list_entry(list->prev, struct dentry, d_lru);
		spin_lock(&dentry->d_lock);
		d_shrink_del(dentry);
		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
			spin_unlock(&dentry->d_lock);
			dentry_free(dentry);
			continue;
		}
		shrink_kill(dentry);
	}
}

and, in shrink_dcache_tree(),
		spin_lock(&v->d_lock);
		rcu_read_unlock();	// now held by ->d_lock
		if (v->d_lockref.count < 0 &&
		    !(v->d_flags & DCACHE_DENTRY_KILLED)) {
			struct completion_list wait;
			// It's busy dying; have it notify us once
			// it becomes invisible to d_walk().
			d_add_waiter(v, &wait);
			spin_unlock(&v->d_lock);
			if (!list_empty(&data.dispose))
				shrink_dentry_list(&data.dispose);
			wait_for_completion(&wait.completion);
			continue;
		}
		shrink_kill(v);

lock_for_kill() has grown a call of rcu_read_lock() at the very beginning;
it will be gone shortly.

Step 5: pull the followup rcu_read_unlock() (and dropping ->d_lock on
failure) into lock_for_kill(); again, that assumes no RCU bugs there:
static bool lock_for_kill(struct dentry *dentry)
{
	struct inode *inode = dentry->d_inode;

	if (unlikely(dentry->d_lockref.count)) {
		spin_unlock(&dentry->d_lock);
		return false;
	}

	if (!inode || likely(spin_trylock(&inode->i_lock)))
		return true;

	rcu_read_lock();
	do {
		spin_unlock(&dentry->d_lock);
		spin_lock(&inode->i_lock);
		spin_lock(&dentry->d_lock);
		if (likely(inode == dentry->d_inode))
			break;
		spin_unlock(&inode->i_lock);
		inode = dentry->d_inode;
	} while (inode);
	rcu_read_unlock();
	if (likely(!dentry->d_lockref.count))
		return true;
	if (inode)
		spin_unlock(&inode->i_lock);
	spin_unlock(&dentry->d_lock);
	return false;
}

and dentry_kill() becomes
struct dentry *dentry_kill(struct dentry *dentry)
{
	if (!lock_for_kill(dentry))
		return NULL;
	return __dentry_kill(dentry);
}

Step 6: fold __dentry_kill() into dentry_kill().

Step 7: both callers of fast_dput() are identical:
        rcu_read_lock();
        if (likely(fast_dput(dentry))) {
                rcu_read_unlock();
                return;
        }
	rcu_read_unlock();
(in dput() and dput_to_list()).  Pull rcu_read_lock() and rcu_read_unlock()
into fast_dput(); ends up with 5 lines added there.  Result (with comments
skipped) is
static inline bool fast_dput(struct dentry *dentry)
{
	int ret;

	rcu_read_lock();		// added
	ret = lockref_put_return(&dentry->d_lockref);

	if (unlikely(ret < 0)) {
		spin_lock(&dentry->d_lock);
		rcu_read_unlock();	// added
		if (WARN_ON_ONCE(dentry->d_lockref.count <= 0)) {
			spin_unlock(&dentry->d_lock);
			return true;
		}
		dentry->d_lockref.count--;
		goto locked;
	}
	if (ret) {
		rcu_read_unlock();	// added
		return true;
	}
	if (retain_dentry(dentry, false)) {
		rcu_read_unlock();	// added
		return true;
	}
	spin_lock(&dentry->d_lock);
	rcu_read_unlock();		// added
locked:
	if (dentry->d_lockref.count || retain_dentry(dentry, true)) {
		spin_unlock(&dentry->d_lock);
		return true;
	}
	return false;
}

Callers become
void dput(struct dentry *dentry)
{
	if (!dentry)
		return;
	might_sleep();
	if (likely(fast_dput(dentry)))
		return;
	finish_dput(dentry);
}
and
void dput_to_list(struct dentry *dentry, struct list_head *list)
{
	if (likely(fast_dput(dentry)))
		return;
	to_shrink_list(dentry, list);
	spin_unlock(&dentry->d_lock);
}
resp.

Thread overview: 63+ messages
2026-01-22 20:20 [PATCH][RFC] get rid of busy-wait in shrink_dcache_tree() Al Viro
2026-01-23  0:19 ` Linus Torvalds
2026-01-23  0:36   ` Al Viro
2026-01-24  4:36     ` Al Viro
2026-01-24  4:46       ` Linus Torvalds
2026-01-24  5:36         ` Al Viro
2026-01-24 17:45           ` Linus Torvalds
2026-01-24 18:43             ` Al Viro
2026-01-24 19:32               ` Linus Torvalds
2026-01-24 20:28                 ` Al Viro
2026-04-02 18:08 ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-02 18:08   ` [RFC PATCH v2 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-02 19:52     ` Linus Torvalds
2026-04-02 22:44       ` Al Viro
2026-04-02 22:49         ` Linus Torvalds
2026-04-02 23:16           ` Al Viro
2026-04-03  0:29             ` Linus Torvalds
2026-04-03  2:15               ` Al Viro
2026-04-04  0:02                 ` Al Viro
2026-04-04  0:04                   ` Linus Torvalds
2026-04-04 18:54                     ` Al Viro
2026-04-04 19:04                       ` Linus Torvalds
2026-04-05  0:04                         ` Al Viro
2026-04-02 20:28   ` [RFC PATCH v2 0/4] getting rid of busy-wait in shrink_dcache_parent() Paulo Alcantara
2026-04-03  4:46     ` Al Viro
2026-04-04  8:07 ` [RFC PATCH v3 " Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 1/4] for_each_alias(): helper macro for iterating through dentries of given inode Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 2/4] struct dentry: make ->d_u anonymous Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 3/4] dcache.c: more idiomatic "positives are not allowed" sanity checks Al Viro
2026-04-04  8:07   ` [RFC PATCH v3 4/4] get rid of busy-waiting in shrink_dcache_tree() Al Viro
2026-04-09 16:51   ` [RFC PATCH v3 0/4] getting rid of busy-wait in shrink_dcache_parent() Jeff Layton
2026-04-09 19:02     ` Al Viro
2026-04-09 20:10       ` Jeff Layton
2026-04-09 21:57         ` Al Viro
2026-04-09 22:38           ` Jeff Layton
2026-04-10  8:48           ` [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order Al Viro
2026-04-10 11:18             ` Jeff Layton
2026-04-10 11:56               ` Jeff Layton
2026-04-10 15:25             ` Linus Torvalds
2026-04-10 15:57               ` Al Viro
2026-04-10 16:27               ` Boqun Feng
2026-04-10 17:31                 ` Linus Torvalds
2026-04-10 18:11                   ` Paul E. McKenney
2026-04-10 18:21                   ` Jeff Layton
2026-04-10 19:19                     ` Al Viro
2026-04-10 19:32                       ` Jeff Layton
2026-04-10 21:13                         ` Calvin Owens
2026-04-10 21:24                           ` Al Viro
2026-04-10 22:15                             ` Calvin Owens
2026-04-10 23:05                               ` Al Viro
2026-04-10 23:30                                 ` Calvin Owens
2026-04-11  0:51                                   ` Al Viro
2026-04-10 17:32               ` Paul E. McKenney
2026-04-10 18:26                 ` Jeff Layton
2026-04-10 18:36                   ` Paul E. McKenney
2026-04-10 18:52               ` Al Viro
2026-04-10 19:21                 ` Paul E. McKenney
2026-04-10 19:30                 ` Linus Torvalds
2026-04-10 20:24                   ` Al Viro [this message]
2026-04-10 20:48                     ` Al Viro
