From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
NeilBrown <neil@brown.name>
Subject: Re: [RFC PATCH 12/25] reducing rcu_read_lock() scopes in dput and friends, step 1
Date: Tue, 5 May 2026 23:42:56 +0100 [thread overview]
Message-ID: <20260505224256.GH3518998@ZenIV> (raw)
In-Reply-To: <CAADWXX_dChuduyjE2=h4JsRkX1Tbjhgw47p0v9KOSbHkWCs7ng@mail.gmail.com>
On Tue, May 05, 2026 at 09:47:03AM -0700, Linus Torvalds wrote:
> On Mon, May 4, 2026 at 10:54 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > This one breaks rcu_read_lock() scopes in front of shrink_kill() call
> > in shrink_dentry_list() and finish_dput() call in dput(); in both
> > places we are withing ->d_lock scope, so doing that won't change
> > the RCU read-side critical areas.
>
> Ugh.
>
> Ok, so I see what this does and why it does it, but I still dislike
> this one intensely. That
>
> rcu_read_unlock();
> rcu_read_lock();
>
> pattern is just ugly, and I'd like to see any actual numbers that it
> makes a real latency difference.
It almost certainly doesn't; the only point is to split the changes into
equivalent transformations - the whole thing is tricky enough as it is.
> > With that done, we have all calls of shrink_kill() and finish_dput()
> > immediately preceded by rcu_read_lock(); next step will use that.
>
> Yes, the next patch at least removes the ugly pattern. It still
> obviously *does* the same "drop and immediately re-take", just without
> making it as obvious.
Yes, and the next two steps after that drive those down into lock_for_kill(),
where we finally get the sucker off the fast path - it's not (re)taken unless
trylock on inode->i_lock fails and we have to play games with dropping and
retaking ->d_lock.
> Honestly, if this is then going to make some still later patch
> cleaner, I'd prefer 12/13 to be just combined into one "change rcu
> locking semantics to be better".
>
> Because as-is, I think patch 12 on its own is just ugly.
>
> It looks like patch 14 then uses the thing to clean up dentry_kill() a
> bit, so that combined patch wouldn't trigger my "this is just ugly"
> case, adn would still have a real reason for it (outside of a
> theoretical "shrink rcu region" without numbers to say why it's
> needed).
The main payoff is in #15, really. The entire 12--17 started as a single
commit, which I fucked up on the first two attempts. Thus the carve-up
into chunks too small for idiot me to fumble...
FWIW, an issue with large rcu_read_lock() scopes (as opposed to RCU read-side
critical areas) is that they are harder to reason about; having them cover
smaller chunks makes it a lot more obvious what each one is for.
In the fs/dcache.c after this series:
1. dentry_name_snapshot(). Protection of external name - from fetching
->d_name.name to atomic_inc_not_zero() on the external name refcount.
The paired RCU delay is somewhat subtle and needs to be documented.
In release_dentry_name_snapshot() and copy_name() it's the use
of kfree_rcu() when refcount hits zero; that's the obvious part.
In dentry_free() it's having __d_free_external() called via call_rcu(),
regardless of DCACHE_NORCU on dentry itself.
2. lock_for_kill(). Bridging over the spinlock switchover in case of
trylock failure, protecting dentry and inode from being freed.
Safety for !NORCU case: freeing of dentry is RCU-delayed due to !NORCU,
freeing of inode is RCU-delayed with 3 exceptions. Inode must have
non-NULL ->destroy_inode(), NULL ->free_inode() and ->destroy_inode()
must do prompt freeing (the last part excludes XFS)
* pipefs inodes (only on NORCU dentries, except for pipefs root,
which is never dropped)
* btrfs-test inodes. A bug, should have btrfs_test_destroy_inode()
reduced to the call of btrfs_drop_extent_map_range(), with ->free_inode
set to btrfs_free_inode, same as for btrfs proper. Not triggered due to
the very limited use of those suckers - it's never exposed to userland and
self-tests don't step into that fun.
* bpffs inodes. A bug. Another manifestation of the same bug had
been reported by folks doing fuzzing; I'd posted a completely untested fix
about a week ago, need to return to it (Alexey replied with "Feel free to
take it into your tree directly", I didn't get around to testing it yet).
Safety for NORCU case: we only get there if dentry refcount had
been dropped to zero by ourselves; no other thread will be calling
dentry_kill() on that dentry *or* making it negative. IOW, neither
dentry nor inode are going to be even scheduled for freeing until after
we return.
3. fast_dput(). Protected area from attempted decrement of refcount
to either having grabbed ->d_lock or to deciding that we are done
and returning true, protecting dentry from being freed. Safety for
!NORCU: dentry freeing is RCU-delayed; if it's already scheduled by
the time we have grabbed ->d_lock, we'll observe a negative number
in ->d_lockref.count, drop ->d_lock and that'll be our last access to
dentry. Safety for NORCU: if decrement succeeds and yields non-zero,
we are not touching dentry anymore. If it succeeds and yields zero, no
other thread will be able to increment the refcount, so no other thread
will be able to drive it back to zero. As the result, no other thread
will even attempt to call dentry_kill(), let alone free the sucker.
The comment in front is stale - will update.
4. dget_parent(). Protects candidate parent from freeing. Protected area
from fetching ->d_parent to lockref_get_not_zero() on its refcount.
Safety: NORCU dentries are never anyone's parents (and argument of
dget_parent() is held by the caller). For !NORCU dentries freeing
is RCU-delayed.
5. dget_parent(). Protects candidate parent from freeing. Protected area
from fetching ->d_parent to grabbing ->d_lock on it. Safety: same as the
previous one. FWIW, I wonder if this
rcu_read_lock();
ret = dentry->d_parent;
spin_lock(&ret->d_lock);
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
rcu_read_unlock();
goto repeat;
}
rcu_read_unlock();
would be better off as
rcu_read_lock();
ret = dentry->d_parent;
spin_lock(&ret->d_lock);
rcu_read_unlock();
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
goto repeat;
}
Or, for that matter, as
scoped_guard(rcu) {
// grab ->d_parent->d_lock safely
// NORCU dentries can't be anyone's parents
ret = dentry->d_parent;
spin_lock(&ret->d_lock);
}
// dentry->d_parent is not stable; however, the result of comparison is
// - the set of children is stabilized by ->d_lock
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
goto repeat;
}
5. d_walk(). Bridging over from ->d_lock on dentry to ->d_lock on
this_parent (aka dentry->d_parent, fetched under dentry->d_lock),
protecting this_parent from being freed. NORCU dentries are never
parents, so freeing this_parent is RCU-delayed.
6. select_collect2(). Found dentry that isn't busy and can't be placed
into our shrink list; need to return it to caller's caller, protecting
it from being freed. We are in ->d_lock scope, but that'll end in the
caller, so we need to extend the RCU read-side critical area all the
way into shrink_dcache_tree() (caller's caller). The end of scope is
in shrink_dcache_tree(), after it has grabbed ->d_lock. NORCU dentries
are never passed to d_walk() callbacks, so freeing is RCU-delayed.
7. shrink_dcache_for_umount(). Bridging over from ->s_roots_lock to
->d_lock, protecting the secondary root we'd found from being freed; NORCU
dentries are never anyone's secondary roots, so freeing is RCU-delayed.
8. __d_lookup(). Protects hlist_bl_for_each_entry_rcu traversal of the hash
chain. All ->d_hash modifications are done by RCU-aware primitives (and
serialized on chain's bitlock), NORCU dentries are never hashed, so freeing
of anything we walk into is RCU-delayed.
9. d_alloc_parallel(). Scope from __d_lookup_rcu() to lockref_get_not_dead().
Protects the hash chain traversal in __d_lookup_rcu() from being disrupted
under us and its result from being freed under us; NORCU dentries are never
returned by __d_lookup_rcu().
10. d_alloc_parallel(). Bridges over from the bitlock of in-lookup
hash chain to ->d_lock, protects the found match from being freed.
->d_lock nests outside of the chain's bitlock, so we need to transfer.
Nothing on the in-lookup hash chains is NORCU, so freeing is RCU-delayed.
11. is_subdir(). Somewhat excessive - we only need it around the lockless
call of d_ancestor(), to protect ->d_parent involved from freeing.
Additionally we have two functions that rely upon rcu_read_lock() held
by the caller: __d_lookup_rcu(), __d_lookup_rcu_op_compare(). Both
use it to protect the hash chain traversal (with the same iterator as
__d_lookup()); again, hash chain modifications are all by RCU-aware primitives
and no NORCU dentries are ever hashed, so nothing we walk through can be freed
under us. The latter provides RCU read-side critical area for ->d_compare(),
protecting dentry argument, the external name (if any) and whatever objects
the method itself might need (if any).
d_same_name() and dentry_cmp() are interesting - they rely either upon
being called in RCU read-side critical area, protecting dentry and external
name (if any) from being freed along with whatever objects required by
->d_compare() (in case of d_same_name()) OR upon being called with dentry
having name and parent stable and an active reference to superblock held
by the caller.
next prev parent reply other threads:[~2026-05-05 22:42 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 5:53 [RFC PATCH 00/25] assorted dcache cleanups and fixes Al Viro
2026-05-05 5:53 ` [RFC PATCH 01/25] VFS: use wait_var_event for waiting in d_alloc_parallel() Al Viro
2026-05-05 5:53 ` [RFC PATCH 02/25] alloc_path_pseudo(): make sure we don't end up with NORCU dentries for directories Al Viro
2026-05-05 8:21 ` NeilBrown
2026-05-05 17:48 ` Al Viro
2026-05-05 5:53 ` [RFC PATCH 03/25] fix a race between d_find_any_alias() and final dput() of NORCU dentries Al Viro
2026-05-05 17:06 ` Linus Torvalds
2026-05-05 20:29 ` Al Viro
2026-05-05 5:53 ` [RFC PATCH 04/25] find_acceptable_alias(): skip NORCU aliases with zero refcount Al Viro
2026-05-05 5:53 ` [RFC PATCH 05/25] select_collect(): ignore dentries on shrink lists if they have positive refcounts Al Viro
2026-05-05 5:53 ` [RFC PATCH 06/25] make to_shrink_list() return whether it has moved dentry to list Al Viro
2026-05-05 5:53 ` [RFC PATCH 07/25] kill d_dispose_if_unused() Al Viro
2026-05-05 5:53 ` [RFC PATCH 08/25] d_prune_aliases(): make sure to skip NORCU aliases Al Viro
2026-05-05 5:53 ` [RFC PATCH 09/25] shrink_dentry_list(): start with removing from shrink list Al Viro
2026-05-07 20:39 ` Al Viro
2026-05-05 5:53 ` [RFC PATCH 10/25] fold lock_for_kill() into shrink_kill() Al Viro
2026-05-05 5:53 ` [RFC PATCH 11/25] fold lock_for_kill() and __dentry_kill() into common helper Al Viro
2026-05-05 5:53 ` [RFC PATCH 12/25] reducing rcu_read_lock() scopes in dput and friends, step 1 Al Viro
2026-05-05 8:55 ` NeilBrown
2026-05-05 14:22 ` Al Viro
2026-05-05 21:58 ` NeilBrown
2026-05-05 16:47 ` Linus Torvalds
2026-05-05 22:42 ` Al Viro [this message]
2026-05-07 7:35 ` Al Viro
2026-05-07 15:32 ` Linus Torvalds
2026-05-05 5:54 ` [RFC PATCH 13/25] reducing rcu_read_lock() scopes in dput and friends, step 2 Al Viro
2026-05-05 5:54 ` [RFC PATCH 14/25] reducing rcu_read_lock() scopes in dput and friends, step 3 Al Viro
2026-05-05 5:54 ` [RFC PATCH 15/25] reducing rcu_read_lock() scopes in dput and friends, step 4 Al Viro
2026-05-05 5:54 ` [RFC PATCH 16/25] reducing rcu_read_lock() scopes in dput and friends, step 5 Al Viro
2026-05-05 5:54 ` [RFC PATCH 17/25] reducing rcu_read_lock() scopes in dput and friends, step 6 Al Viro
2026-05-05 5:54 ` [RFC PATCH 18/25] adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill() Al Viro
2026-05-05 5:54 ` [RFC PATCH 19/25] document dentry_kill() Al Viro
2026-05-05 5:54 ` [RFC PATCH 20/25] d_walk(): shrink rcu_read_lock() scope Al Viro
2026-05-05 17:01 ` Linus Torvalds
2026-05-05 20:05 ` Al Viro
2026-05-05 21:40 ` Frederic Weisbecker
2026-05-05 22:50 ` Al Viro
2026-05-06 3:49 ` Paul E. McKenney
2026-05-07 22:39 ` NeilBrown
2026-05-07 23:21 ` Paul E. McKenney
2026-05-08 14:47 ` Al Viro
2026-05-08 22:03 ` Paul E. McKenney
2026-05-08 23:03 ` Al Viro
2026-05-08 3:01 ` Al Viro
2026-05-05 5:54 ` [RFC PATCH 21/25] shrinking rcu_read_lock() scope in d_alloc_parallel() Al Viro
2026-05-07 21:52 ` Jori Koolstra
2026-05-08 3:12 ` Al Viro
2026-05-08 9:28 ` Jori Koolstra
2026-05-05 5:54 ` [RFC PATCH 22/25] shrink_dentry_tree(): unify the calls of shrink_dentry_list() Al Viro
2026-05-05 5:54 ` [RFC PATCH 23/25] wind ->s_roots via ->d_sib instead of ->d_hash Al Viro
2026-05-05 5:54 ` [RFC PATCH 24/25] nfs: get rid of fake root dentries Al Viro
2026-05-05 5:54 ` [RFC PATCH 25/25] make cursors NORCU Al Viro
2026-05-05 17:09 ` [RFC PATCH 00/25] assorted dcache cleanups and fixes Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260505224256.GH3518998@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=neil@brown.name \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox