public inbox for linux-fsdevel@vger.kernel.org
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>
Subject: [RFC] shrinking rcu_read_lock() scope in d_alloc_parallel()
Date: Thu, 23 Apr 2026 19:29:17 +0100	[thread overview]
Message-ID: <20260423182917.GP3518998@ZenIV> (raw)

	While documenting rcu_read_lock() uses in fs/dcache.c, I've run
into a fairly opaque one in d_alloc_parallel() and I wonder if we would
be better off reducing its scope.

	The current variant serves two purposes.  We start with a lookup
in the normal hash, where it puts __d_lookup_rcu() and the subsequent
lockref_get_not_dead() into the same RCU read-side critical section.
If no match is found, we proceed to lock the hash chain of the in-lookup
hash and scan that for a match.  If we find one, we want to grab
it and wait for the lookup in progress to finish.  Since the bitlock we
use for these hash chains has to nest inside ->d_lock, we need to unlock
the chain first and use lockref_get_not_dead() on the match.  That
has to be done without leaving the RCU read-side critical section, and
we use the same rcu_read_lock() scope to bridge over.

	The thing is, after having grabbed the reference (and that grab
is very unlikely to fail) we proceed to take ->d_lock, which makes
lockref_get_not_dead() somewhat pointless.  If we take ->d_lock first
and replace lockref_get_not_dead() with a direct check for negative
->d_lockref.count plus an increment if it's non-negative, we can move
rcu_read_unlock() to immediately after grabbing ->d_lock.  Moreover,
the RCU read-side critical section doesn't need to stretch contiguously
all the way from the earlier __d_lookup_rcu() - we can just as well
terminate the earlier one ASAP and call rcu_read_lock() again only
after having found a match (if any) in the in-lookup hash chain.

	That makes the entire thing easier to follow and the purpose of
those rcu_read_lock() calls easier to describe - the first scope covers
__d_lookup_rcu() + lockref_get_not_dead(), while the second one bridges
over from the bitlock scope to the ->d_lock scope on the match found in
the in-lookup hash.

	Does anybody see any problems with something like the delta below
(on top of current mainline)?  It seems to survive the local beating, but
then I haven't tried it with RT configs, etc.


diff --git a/fs/dcache.c b/fs/dcache.c
index 2c61aeea41f4..08aca152fccf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2684,38 +2684,33 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 	spin_unlock(&parent->d_lock);
 
 retry:
-	rcu_read_lock();
 	seq = smp_load_acquire(&parent->d_inode->i_dir_seq);
 	r_seq = read_seqbegin(&rename_lock);
+	rcu_read_lock();
 	dentry = __d_lookup_rcu(parent, name, &d_seq);
 	if (unlikely(dentry)) {
 		if (!lockref_get_not_dead(&dentry->d_lockref)) {
 			rcu_read_unlock();
 			goto retry;
 		}
+		rcu_read_unlock();
 		if (read_seqcount_retry(&dentry->d_seq, d_seq)) {
-			rcu_read_unlock();
 			dput(dentry);
 			goto retry;
 		}
-		rcu_read_unlock();
 		dput(new);
 		return dentry;
 	}
-	if (unlikely(read_seqretry(&rename_lock, r_seq))) {
-		rcu_read_unlock();
+	rcu_read_unlock();
+	if (unlikely(read_seqretry(&rename_lock, r_seq)))
 		goto retry;
-	}
 
-	if (unlikely(seq & 1)) {
-		rcu_read_unlock();
+	if (unlikely(seq & 1))
 		goto retry;
-	}
 
 	hlist_bl_lock(b);
 	if (unlikely(READ_ONCE(parent->d_inode->i_dir_seq) != seq)) {
 		hlist_bl_unlock(b);
-		rcu_read_unlock();
 		goto retry;
 	}
 	/*
@@ -2732,19 +2727,20 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 			continue;
 		if (!d_same_name(dentry, parent, name))
 			continue;
+		rcu_read_lock();
 		hlist_bl_unlock(b);
+		spin_lock(&dentry->d_lock);
+		rcu_read_unlock();
 		/* now we can try to grab a reference */
-		if (!lockref_get_not_dead(&dentry->d_lockref)) {
-			rcu_read_unlock();
+		if (unlikely(dentry->d_lockref.count < 0)) {
+			spin_unlock(&dentry->d_lock);
 			goto retry;
 		}
-
-		rcu_read_unlock();
 		/*
 		 * somebody is likely to be still doing lookup for it;
-		 * wait for them to finish
+		 * pin it and wait for them to finish
 		 */
-		spin_lock(&dentry->d_lock);
+		dget_dlock(dentry);
 		d_wait_lookup(dentry);
 		/*
 		 * it's not in-lookup anymore; in principle we should repeat
@@ -2765,7 +2761,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 		dput(new);
 		return dentry;
 	}
-	rcu_read_unlock();
 	new->d_wait = wq;
 	hlist_bl_add_head(&new->d_in_lookup_hash, b);
 	hlist_bl_unlock(b);

Thread overview: 2+ messages
2026-04-23 18:29 Al Viro [this message]
2026-04-23 21:09 ` [RFC] shrinking rcu_read_lock() scope in d_alloc_parallel() Linus Torvalds
