From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Jeff Layton <jlayton@kernel.org>,
linux-fsdevel@vger.kernel.org,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Nikolay Borisov <nik.borisov@suse.com>,
Max Kellermann <max.kellermann@ionos.com>,
Eric Sandeen <sandeen@redhat.com>,
Paulo Alcantara <pc@manguebit.org>
Subject: Re: [RFC][PATCH] make sure that lock_for_kill() callers drop the locks in safe order
Date: Fri, 10 Apr 2026 21:24:04 +0100
Message-ID: <20260410202404.GW3836593@ZenIV>
In-Reply-To: <CAHk-=wiiJ4uO6C3U-B_SVuyjzQpNd3yMjWck8kJ7tZ4oPwhGWA@mail.gmail.com>
On Fri, Apr 10, 2026 at 12:30:13PM -0700, Linus Torvalds wrote:
> The reason it exists is because lock_for_kill() can drop d_lock(), but
> that's in the unlikely case that we can't just immediately get the
> inode lock.
>
> So honestly, I think that rcu_read_lock() should be inside
> lock_for_kill(), rather than in the caller as a "just in case things
> go down".
Yup, as I've mentioned in the cascade of followups...
> IOW, that code basically now not only does that typically unnecessary
> rcu locking (and then unlocking), it does so *because* it knows about
> the subtle internal behavior of lock_for_kill().
>
> In contrast, if we put it inside lock_for_kill(), all we need is a
> fairly straightforward comment along the lines of
>
> "we're dropping the spinlock in order to take the inode lock, but
> the caller may be depending on RCU behavior so we need to take the rcu
> read lock first".
>
> that would turn strange illogical code that also generates worse code
> into straightforwardly explained code that also performs better.
>
> Ok, so "performs better" is kind of exaggerated, in that obviously the
> extra rcu_read_lock/unlock sequences aren't exactly expensive, but
> still - I feel it's a win-win to just do this differently.
>
> Or am I missing something else?
Mostly that fast_dput() calling conventions would need to change a bit
as well.
With the above as step 1,
Step 2: fold lock_for_kill() into shrink_kill(). Result: 2 call sites of
lock_for_kill() left, shrink_kill() becomes
static inline void shrink_kill(struct dentry *victim)
{
	while (lock_for_kill(victim)) {
		rcu_read_unlock();
		victim = __dentry_kill(victim);
		if (!victim)
			return;
		rcu_read_lock();
	}
	spin_unlock(&victim->d_lock);
	rcu_read_unlock();
}
shrink_dentry_list() and shrink_dentry_tree() call shrink_kill()
without bothering with lock_for_kill().
Step 3: note that both call sites of lock_for_kill() are followed by the
same sequence: drop ->d_lock + rcu_read_unlock() on failure,
rcu_read_unlock() + __dentry_kill() on success. And these are the
only callers of __dentry_kill(). Moreover, NULL from __dentry_kill()
means "bugger off" in both callers. So add
struct dentry *dentry_kill(struct dentry *dentry)
{
	if (!lock_for_kill(dentry)) {
		spin_unlock(&dentry->d_lock);
		rcu_read_unlock();
		return NULL;
	}
	rcu_read_unlock();
	return __dentry_kill(dentry);
}
and convert both callers to that:
static inline void shrink_kill(struct dentry *victim)
{
	while ((victim = dentry_kill(victim)) != NULL)
		rcu_read_lock();
}
and
static void finish_dput(struct dentry *dentry)
{
	while ((dentry = dentry_kill(dentry)) != NULL) {
		if (retain_dentry(dentry, true)) {
			spin_unlock(&dentry->d_lock);
			return;
		}
		rcu_read_lock();
	}
}
Step 4: note that callers of shrink_kill() and finish_dput()
have dentry->d_lock held. We can (assuming there are no RCU
bugs) drop the lock before calling them and grab it first
thing inside. Moreover, we can shift grabbing it into
lock_for_kill(), leaving these as
static inline void shrink_kill(struct dentry *victim)
{
	while ((victim = dentry_kill(victim)) != NULL)
		;
}
and
static void finish_dput(struct dentry *dentry)
{
	while ((dentry = dentry_kill(dentry)) != NULL) {
		if (retain_dentry(dentry, true)) {
			spin_unlock(&dentry->d_lock);
			return;
		}
	}
}
resp., with the callers becoming
void dput(struct dentry *dentry)
{
	if (!dentry)
		return;
	might_sleep();
	rcu_read_lock();
	if (likely(fast_dput(dentry))) {
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
	finish_dput(dentry);
}
void d_make_discardable(struct dentry *dentry)
{
	spin_lock(&dentry->d_lock);
	WARN_ON(!(dentry->d_flags & DCACHE_PERSISTENT));
	dentry->d_flags &= ~DCACHE_PERSISTENT;
	dentry->d_lockref.count--;
	finish_dput(dentry);
}
void shrink_dentry_list(struct list_head *list)
{
	while (!list_empty(list)) {
		struct dentry *dentry;

		dentry = list_entry(list->prev, struct dentry, d_lru);
		spin_lock(&dentry->d_lock);
		d_shrink_del(dentry);
		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
			spin_unlock(&dentry->d_lock);
			dentry_free(dentry);
			continue;
		}
		shrink_kill(dentry);
	}
}
and, in shrink_dcache_parent(),
	spin_lock(&v->d_lock);
	rcu_read_unlock();	// now held by ->d_lock
	if (v->d_lockref.count < 0 &&
	    !(v->d_flags & DCACHE_DENTRY_KILLED)) {
		struct completion_list wait;

		// It's busy dying; have it notify us once
		// it becomes invisible to d_walk().
		d_add_waiter(v, &wait);
		spin_unlock(&v->d_lock);
		if (!list_empty(&data.dispose))
			shrink_dentry_list(&data.dispose);
		wait_for_completion(&wait.completion);
		continue;
	}
	shrink_kill(v);
lock_for_kill() has grown a call of rcu_read_lock() at the very
beginning; it will be gone shortly.
Step 5: pull the followup rcu_read_unlock() (and dropping ->d_lock on
failure) into lock_for_kill(); again, that assumes no RCU bugs there:
static bool lock_for_kill(struct dentry *dentry)
{
	struct inode *inode = dentry->d_inode;

	if (unlikely(dentry->d_lockref.count)) {
		spin_unlock(&dentry->d_lock);
		return false;
	}

	if (!inode || likely(spin_trylock(&inode->i_lock)))
		return true;

	rcu_read_lock();
	do {
		spin_unlock(&dentry->d_lock);
		spin_lock(&inode->i_lock);
		spin_lock(&dentry->d_lock);
		if (likely(inode == dentry->d_inode))
			break;
		spin_unlock(&inode->i_lock);
		inode = dentry->d_inode;
	} while (inode);
	rcu_read_unlock();

	if (likely(!dentry->d_lockref.count))
		return true;

	if (inode)
		spin_unlock(&inode->i_lock);
	spin_unlock(&dentry->d_lock);
	return false;
}
and in dentry_kill()
{
	if (!lock_for_kill(dentry))
		return NULL;
	return __dentry_kill(dentry);
}
Step 6: fold __dentry_kill() into dentry_kill().
Step 7: both callers of fast_dput() are identical:
	rcu_read_lock();
	if (likely(fast_dput(dentry))) {
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
(in dput() and dput_to_list()). Pull rcu_read_lock() and rcu_read_unlock()
into fast_dput(); ends up with 5 lines added there. Result (with comments
skipped) is
static inline bool fast_dput(struct dentry *dentry)
{
	int ret;

	rcu_read_lock();			// added
	ret = lockref_put_return(&dentry->d_lockref);
	if (unlikely(ret < 0)) {
		spin_lock(&dentry->d_lock);
		rcu_read_unlock();		// added
		if (WARN_ON_ONCE(dentry->d_lockref.count <= 0)) {
			spin_unlock(&dentry->d_lock);
			return true;
		}
		dentry->d_lockref.count--;
		goto locked;
	}
	if (ret) {
		rcu_read_unlock();		// added
		return true;
	}
	if (retain_dentry(dentry, false)) {
		rcu_read_unlock();		// added
		return true;
	}
	spin_lock(&dentry->d_lock);
	rcu_read_unlock();			// added
locked:
	if (dentry->d_lockref.count || retain_dentry(dentry, true)) {
		spin_unlock(&dentry->d_lock);
		return true;
	}
	return false;
}
Callers become
void dput(struct dentry *dentry)
{
	if (!dentry)
		return;
	might_sleep();
	if (likely(fast_dput(dentry)))
		return;
	finish_dput(dentry);
}
and
void dput_to_list(struct dentry *dentry, struct list_head *list)
{
	if (likely(fast_dput(dentry)))
		return;
	to_shrink_list(dentry, list);
	spin_unlock(&dentry->d_lock);
}
resp.