public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] fuse update for 6.19
Date: Mon, 13 Apr 2026 01:29:40 +0100	[thread overview]
Message-ID: <20260413002940.GA2846532@ZenIV> (raw)
In-Reply-To: <20260412061644.GA2485189@ZenIV>

On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:

> 	Looking at that thing again, I really hate how subtle it is ;-/
> Consider the lockless case in your ->d_release() and try to write down
> the reasons why it's safe.  Among the other things, kfree_rcu() is there
> to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> doubles as delaying that freeing past the scope of dentry_hash[].lock in
> which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> to lockless path.  Another side of that fun is the proof that if
> fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> Locked case of ->d_release() is easy; proving that the lockless one
> is OK without explict barriers is more interesting.  Basically, all
> insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> we are observing in RB_EMPTY_NODE() has come from those we will hit the
> locked case.  If the value we observe has come from RB_CLEAR_NODE() in
> earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> wrt each other and insertions), there mustn't have been any subsequent
> insertions, or ->d_release() would've observed the effect of those.
> If the value has come from the _same_ fuse_dentry_tree_work(), the
> implicit barrier in spin_lock() would've ordered that store wrt beginning
> of the scope, and since dentry_free() is ordered wrt ->d_release()
> the callback of call_rcu() in there wouldn't run until the the end of
> the scope in question.  So I think it's OK, but having it done without
> a single comment either on barriers or on memory safety...  Ouch.
> 
> 	Note that we *can* run into a dentry getting killed just after
> RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> a no-op for dying and killed ones - it's not just "If dentry has no
> external references, move it to shrink list" as your comment in
> fs/dcache.c says.  And in your case it very much does have an
> external reference - it's just that it's an equivalent of RCU one,
> with all the joy that inflicts upon the user.

FWIW, here's what I've ended up with yesterday:

	/*
	 * ->d_release() is ordered (on ->d_lock) after everything done
	 * while holding counting references, including all calls of
	 * fuse_dentry_add_tree_node().  Freeing a dentry is RCU-delayed
	 * and scheduling it is ordered after ->d_release().
	 *
	 * In ->d_release() we see the initialization and all stores done by
	 * fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
	 * ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
	 * we also see all stores from fuse_dentry_tree_work() except possibly
	 * the last one.
	 *
	 * If the value fuse_dentry_release() observes in
	 *	if (!RB_EMPTY_NODE(&fd->node))
	 * has come from initialization, the node had never been inserted
	 * into rbtree and fuse_dentry_tree_work() won't see it at all.
	 *
	 * If the value we observe there comes from the last store in
	 * fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
	 * false and we are going to enter a dentry_hash[].lock scope,
	 * providing an ordering wrt fuse_dentry_tree_work() which either
	 * won't find fd in rbtree, or will complete before we
	 * get to even scheduling dentry freeing.
	 * 
	 * If the value we observe comes from fuse_dentry_tree_work(), we will
	 * be ordered past the beginning of dentry_hash[].lock scope that store
	 * had happened in.  Since call_rcu() that schedules freeing the dentry
	 * is ordered past the return from ->d_release(), freeing won't happen
	 * until the end of that scope.  Since there can be no subsequent calls
	 * of fuse_dentry_add_tree_node(), any subsequent calls of
	 * fuse_dentry_tree_work() won't see the node at all.
	 *
	 * Either way, if fuse_dentry_tree_work() sees a node, we are
	 * guaranteed that node->dentry won't get freed under it.
	 *
	 * Freeing the node itself is also RCU-delayed and scheduled in the
	 * very end of FUSE ->d_release(); similar considerations shows that
	 * if fuse_dentry_tree_work() sees a node, we are guaranteed that
	 * the node won't get freed under us either.
	 *
	 * While it is possible to run into a dying (or killed) dentry in
	 * fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
	 * used for those - it's a no-op in such case.
	 */

Is the above sane?  If it is, something similar would be worth having
written down somewhere in fs/fuse - it's not trivial and it's not hard
to break accidentally...

As for the primitive itself...

/**
 * __move_to_shrink_list - try to place a dentry into a shrink list
 * @dentry:	dentry to try putting into shrink list
 * @list:	the list to put @dentry into.
 * Returns:	true @dentry had been placed into @list, false otherwise
 *
 * If @dentry is idle and not already include into a shrink list, move
 * it into @list and return %true; otherwise do nothing and return %false.
 *
 * Caller must be holding @dentry->d_lock.  There must have been no calls of
 * dentry_free(@dentry) prior to the beginning of the RCU read-side critical
 * area in which __move_to_shrink_list(@dentry, @list) is called.
 *
 * @list should be thread-private and eventually emptied by passing it to
 * shrink_dentry_list().
 */
bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
__must_hold(&dentry->d_lock)
{
        if (likely(!dentry->d_lockref.count &&
	    !(dentry->d_flags & DCACHE_SHRINK_LIST))) {
		if (dentry->d_flags & DCACHE_LRU_LIST)
			d_lru_del(dentry);
		d_shrink_add(dentry, list);
		return true;
	}
	return false;
}
EXPORT_SYMBOL(__move_to_shrink_list);

With something like
	spin_lock(&fd->dentry->d_lock);
	/* If dentry is still referenced, let next dput release it */
	fd->dentry->d_flags |= DCACHE_OP_DELETE;
	__move_to_shrink_list(fd->dentry, &dispose);
	spin_unlock(&fd->dentry->d_lock);
in fuse_dentry_tree_work() (along with the explanations of memory safety,
that is)

Comments?

  reply	other threads:[~2026-04-13  0:25 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-04  8:25 [GIT PULL] fuse update for 6.19 Miklos Szeredi
2025-12-05 23:47 ` Linus Torvalds
2025-12-06  1:42   ` Al Viro
2025-12-06  1:52     ` Linus Torvalds
2025-12-06  2:28       ` Al Viro
2025-12-06  3:10         ` Al Viro
2025-12-06  3:29         ` Linus Torvalds
2025-12-06  3:54           ` Al Viro
2025-12-06  4:22             ` Al Viro
2025-12-08 10:37               ` Miklos Szeredi
2026-01-14 15:23               ` Miklos Szeredi
2026-04-12  6:16                 ` Al Viro
2026-04-13  0:29                   ` Al Viro [this message]
2026-04-13 15:32                     ` Miklos Szeredi
2025-12-06  0:17 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260413002940.GA2846532@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox