From: Al Viro <viro@zeniv.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] fuse update for 6.19
Date: Mon, 13 Apr 2026 01:29:40 +0100 [thread overview]
Message-ID: <20260413002940.GA2846532@ZenIV> (raw)
In-Reply-To: <20260412061644.GA2485189@ZenIV>
On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:
> Looking at that thing again, I really hate how subtle it is ;-/
> Consider the lockless case in your ->d_release() and try to write down
> the reasons why it's safe. Among the other things, kfree_rcu() is there
> to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> doubles as delaying that freeing past the scope of dentry_hash[].lock in
> which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> to lockless path. Another side of that fun is the proof that if
> fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> Locked case of ->d_release() is easy; proving that the lockless one
> is OK without explict barriers is more interesting. Basically, all
> insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> we are observing in RB_EMPTY_NODE() has come from those we will hit the
> locked case. If the value we observe has come from RB_CLEAR_NODE() in
> earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> wrt each other and insertions), there mustn't have been any subsequent
> insertions, or ->d_release() would've observed the effect of those.
> If the value has come from the _same_ fuse_dentry_tree_work(), the
> implicit barrier in spin_lock() would've ordered that store wrt beginning
> of the scope, and since dentry_free() is ordered wrt ->d_release()
> the callback of call_rcu() in there wouldn't run until the the end of
> the scope in question. So I think it's OK, but having it done without
> a single comment either on barriers or on memory safety... Ouch.
>
> Note that we *can* run into a dentry getting killed just after
> RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> a no-op for dying and killed ones - it's not just "If dentry has no
> external references, move it to shrink list" as your comment in
> fs/dcache.c says. And in your case it very much does have an
> external reference - it's just that it's an equivalent of RCU one,
> with all the joy that inflicts upon the user.
FWIW, here's what I've ended up with yesterday:
/*
* ->d_release() is ordered (on ->d_lock) after everything done
* while holding counting references, including all calls of
* fuse_dentry_add_tree_node(). Freeing a dentry is RCU-delayed
* and scheduling it is ordered after ->d_release().
*
* In ->d_release() we see the initialization and all stores done by
* fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
* ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
* we also see all stores from fuse_dentry_tree_work() except possibly
* the last one.
*
* If the value fuse_dentry_release() observes in
* if (!RB_EMPTY_NODE(&fd->node))
* has come from initialization, the node had never been inserted
* into rbtree and fuse_dentry_tree_work() won't see it at all.
*
* If the value we observe there comes from the last store in
* fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
* false and we are going to enter a dentry_hash[].lock scope,
* providing an ordering wrt fuse_dentry_tree_work() which either
* won't find fd in rbtree, or will complete before we
* get to even scheduling dentry freeing.
*
* If the value we observe comes from fuse_dentry_tree_work(), we will
* be ordered past the beginning of dentry_hash[].lock scope that store
* had happened in. Since call_rcu() that schedules freeing the dentry
* is ordered past the return from ->d_release(), freeing won't happen
* until the end of that scope. Since there can be no subsequent calls
* of fuse_dentry_add_tree_node(), any subsequent calls of
* fuse_dentry_tree_work() won't see the node at all.
*
* Either way, if fuse_dentry_tree_work() sees a node, we are
* guaranteed that node->dentry won't get freed under it.
*
* Freeing the node itself is also RCU-delayed and scheduled in the
* very end of FUSE ->d_release(); similar considerations shows that
* if fuse_dentry_tree_work() sees a node, we are guaranteed that
* the node won't get freed under us either.
*
* While it is possible to run into a dying (or killed) dentry in
* fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
* used for those - it's a no-op in such case.
*/
Is the above sane? If it is, something similar would be worth having
written down somewhere in fs/fuse - it's not trivial and it's not hard
to break accidentally...
As for the primitive itself...
/**
* __move_to_shrink_list - try to place a dentry into a shrink list
* @dentry: dentry to try putting into shrink list
* @list: the list to put @dentry into.
* Returns: true @dentry had been placed into @list, false otherwise
*
* If @dentry is idle and not already include into a shrink list, move
* it into @list and return %true; otherwise do nothing and return %false.
*
* Caller must be holding @dentry->d_lock. There must have been no calls of
* dentry_free(@dentry) prior to the beginning of the RCU read-side critical
* area in which __move_to_shrink_list(@dentry, @list) is called.
*
* @list should be thread-private and eventually emptied by passing it to
* shrink_dentry_list().
*/
bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
__must_hold(&dentry->d_lock)
{
if (likely(!dentry->d_lockref.count &&
!(dentry->d_flags & DCACHE_SHRINK_LIST))) {
if (dentry->d_flags & DCACHE_LRU_LIST)
d_lru_del(dentry);
d_shrink_add(dentry, list);
return true;
}
return false;
}
EXPORT_SYMBOL(__move_to_shrink_list);
With something like
spin_lock(&fd->dentry->d_lock);
/* If dentry is still referenced, let next dput release it */
fd->dentry->d_flags |= DCACHE_OP_DELETE;
__move_to_shrink_list(fd->dentry, &dispose);
spin_unlock(&fd->dentry->d_lock);
in fuse_dentry_tree_work() (along with the explanations of memory safety,
that is)
Comments?
next prev parent reply other threads:[~2026-04-13 0:25 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-04 8:25 [GIT PULL] fuse update for 6.19 Miklos Szeredi
2025-12-05 23:47 ` Linus Torvalds
2025-12-06 1:42 ` Al Viro
2025-12-06 1:52 ` Linus Torvalds
2025-12-06 2:28 ` Al Viro
2025-12-06 3:10 ` Al Viro
2025-12-06 3:29 ` Linus Torvalds
2025-12-06 3:54 ` Al Viro
2025-12-06 4:22 ` Al Viro
2025-12-08 10:37 ` Miklos Szeredi
2026-01-14 15:23 ` Miklos Szeredi
2026-04-12 6:16 ` Al Viro
2026-04-13 0:29 ` Al Viro [this message]
2026-04-13 15:32 ` Miklos Szeredi
2026-04-13 22:04 ` Al Viro
2025-12-06 0:17 ` pr-tracker-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260413002940.GA2846532@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.