Linux filesystem development
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	NeilBrown <neil@brown.name>, Boqun Feng <boqun@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>
Subject: Re: [RFC PATCH 20/25] d_walk(): shrink rcu_read_lock() scope
Date: Fri, 8 May 2026 04:01:12 +0100	[thread overview]
Message-ID: <20260508030112.GN3518998@ZenIV> (raw)
In-Reply-To: <afpj3FBrvHer0E6Y@pavilion.home>

On Tue, May 05, 2026 at 11:40:44PM +0200, Frederic Weisbecker wrote:

> > I would really prefer to have all explicit rcu_read_lock() in fs/dcache.c
> > clearly documented ;-/
> 
> And the calls to rcu_read_lock() / rcu_read_unlock() are _usually_ cheap enough
> that their explicit calls should go unnoticed over an implicit spinlock RCU
> critical section. Ok rcu_read_unlock() has its slowpath but it would otherwise
> trigger randomly when implicit read side are used (context switch, interrupts,
> etc...)
> 
> Also rcu_read_lock() has the advantage to tell clearly that we are doing an RCU read
> side (and its uses indeed claim to be documented) here while relying on spinlocks to do
> that is fine but implicit and more error-prone when there are tricky things
> like spinlock transfers: better don't ask for trouble.
> 
> So I would say that we should probably avoid introducing such an API if possible.

The problem is, for dentries such transfer may be _less_ error-prone than
a large rcu_read_lock() scope.  There's such thing as NORCU dentries, and
for those freeing is not RCU-delayed.  So for d_walk() the safety of
                dentry = this_parent;
		this_parent = dentry->d_parent;
		 
		spin_unlock(&dentry->d_lock);
		spin_lock(&this_parent->d_lock);
requires more than just "everything is in RCU read-side critical area"; it's
also "and NORCU dentries never occur as anyone's parents".

We do rely upon ->d_lock for all other transitions in that graph walk -
for descent into a subtree and for traversals into the next sibling,
including those during ascent from subtree.

Any place where we rely solely upon rcu_read_lock() needs an analysis of
NORCU case and it's not always "NORCU can't occur here" - e.g. fast_dput()
is hit every time we close a pipe or socket, and these are the original
reason for having NORCU in the first place.

FWIW, the rules (and I'm still not finished writing down a coherent
variant of those) are very heavily ->d_lock-based; basically, we have
3 safe cases - counting reference, RCU reference and locked reference.
We don't have that enforced by the type system, unfortunately, and I'm
not sure we *can* express that with C type system in a way that would
be entirely compiler-enforced.  I do think we can reduce most of that
down to a handful of "prove the safety manually" areas; we'll see.

Long story short, rcu_read_lock() is really too blunt a tool in that area.
Look at the list of scopes in
https://lore.kernel.org/all/20260507073510.GL3518998@ZenIV/
and tell me if you like having to rely upon that kind of analysis in
proof of correctness...

  parent reply	other threads:[~2026-05-08  3:00 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  5:53 [RFC PATCH 00/25] assorted dcache cleanups and fixes Al Viro
2026-05-05  5:53 ` [RFC PATCH 01/25] VFS: use wait_var_event for waiting in d_alloc_parallel() Al Viro
2026-05-05  5:53 ` [RFC PATCH 02/25] alloc_path_pseudo(): make sure we don't end up with NORCU dentries for directories Al Viro
2026-05-05  8:21   ` NeilBrown
2026-05-05 17:48     ` Al Viro
2026-05-05  5:53 ` [RFC PATCH 03/25] fix a race between d_find_any_alias() and final dput() of NORCU dentries Al Viro
2026-05-05 17:06   ` Linus Torvalds
2026-05-05 20:29     ` Al Viro
2026-05-05  5:53 ` [RFC PATCH 04/25] find_acceptable_alias(): skip NORCU aliases with zero refcount Al Viro
2026-05-05  5:53 ` [RFC PATCH 05/25] select_collect(): ignore dentries on shrink lists if they have positive refcounts Al Viro
2026-05-05  5:53 ` [RFC PATCH 06/25] make to_shrink_list() return whether it has moved dentry to list Al Viro
2026-05-05  5:53 ` [RFC PATCH 07/25] kill d_dispose_if_unused() Al Viro
2026-05-05  5:53 ` [RFC PATCH 08/25] d_prune_aliases(): make sure to skip NORCU aliases Al Viro
2026-05-05  5:53 ` [RFC PATCH 09/25] shrink_dentry_list(): start with removing from shrink list Al Viro
2026-05-07 20:39   ` Al Viro
2026-05-05  5:53 ` [RFC PATCH 10/25] fold lock_for_kill() into shrink_kill() Al Viro
2026-05-05  5:53 ` [RFC PATCH 11/25] fold lock_for_kill() and __dentry_kill() into common helper Al Viro
2026-05-05  5:53 ` [RFC PATCH 12/25] reducing rcu_read_lock() scopes in dput and friends, step 1 Al Viro
2026-05-05  8:55   ` NeilBrown
2026-05-05 14:22     ` Al Viro
2026-05-05 21:58       ` NeilBrown
2026-05-05 16:47   ` Linus Torvalds
2026-05-05 22:42     ` Al Viro
2026-05-07  7:35       ` Al Viro
2026-05-07 15:32         ` Linus Torvalds
2026-05-05  5:54 ` [RFC PATCH 13/25] reducing rcu_read_lock() scopes in dput and friends, step 2 Al Viro
2026-05-05  5:54 ` [RFC PATCH 14/25] reducing rcu_read_lock() scopes in dput and friends, step 3 Al Viro
2026-05-05  5:54 ` [RFC PATCH 15/25] reducing rcu_read_lock() scopes in dput and friends, step 4 Al Viro
2026-05-05  5:54 ` [RFC PATCH 16/25] reducing rcu_read_lock() scopes in dput and friends, step 5 Al Viro
2026-05-05  5:54 ` [RFC PATCH 17/25] reducing rcu_read_lock() scopes in dput and friends, step 6 Al Viro
2026-05-05  5:54 ` [RFC PATCH 18/25] adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill() Al Viro
2026-05-05  5:54 ` [RFC PATCH 19/25] document dentry_kill() Al Viro
2026-05-05  5:54 ` [RFC PATCH 20/25] d_walk(): shrink rcu_read_lock() scope Al Viro
2026-05-05 17:01   ` Linus Torvalds
2026-05-05 20:05     ` Al Viro
2026-05-05 21:40       ` Frederic Weisbecker
2026-05-05 22:50         ` Al Viro
2026-05-06  3:49           ` Paul E. McKenney
2026-05-07 22:39             ` NeilBrown
2026-05-07 23:21               ` Paul E. McKenney
2026-05-08 14:47                 ` Al Viro
2026-05-08 22:03                   ` Paul E. McKenney
2026-05-08 23:03                     ` Al Viro
2026-05-08  3:01         ` Al Viro [this message]
2026-05-05  5:54 ` [RFC PATCH 21/25] shrinking rcu_read_lock() scope in d_alloc_parallel() Al Viro
2026-05-07 21:52   ` Jori Koolstra
2026-05-08  3:12     ` Al Viro
2026-05-08  9:28       ` Jori Koolstra
2026-05-05  5:54 ` [RFC PATCH 22/25] shrink_dentry_tree(): unify the calls of shrink_dentry_list() Al Viro
2026-05-05  5:54 ` [RFC PATCH 23/25] wind ->s_roots via ->d_sib instead of ->d_hash Al Viro
2026-05-05  5:54 ` [RFC PATCH 24/25] nfs: get rid of fake root dentries Al Viro
2026-05-05  5:54 ` [RFC PATCH 25/25] make cursors NORCU Al Viro
2026-05-05 17:09 ` [RFC PATCH 00/25] assorted dcache cleanups and fixes Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508030112.GN3518998@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=boqun@kernel.org \
    --cc=brauner@kernel.org \
    --cc=frederic@kernel.org \
    --cc=jack@suse.cz \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=paulmck@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox