From: Al Viro <viro@zeniv.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] fuse update for 6.19
Date: Mon, 13 Apr 2026 01:29:40 +0100 [thread overview]
Message-ID: <20260413002940.GA2846532@ZenIV> (raw)
In-Reply-To: <20260412061644.GA2485189@ZenIV>
On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:
> Looking at that thing again, I really hate how subtle it is ;-/
> Consider the lockless case in your ->d_release() and try to write down
> the reasons why it's safe. Among the other things, kfree_rcu() is there
> to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> doubles as delaying that freeing past the scope of dentry_hash[].lock in
> which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> to lockless path. Another side of that fun is the proof that if
> fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> Locked case of ->d_release() is easy; proving that the lockless one
> is OK without explicit barriers is more interesting. Basically, all
> insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> we are observing in RB_EMPTY_NODE() has come from those we will hit the
> locked case. If the value we observe has come from RB_CLEAR_NODE() in
> earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> wrt each other and insertions), there mustn't have been any subsequent
> insertions, or ->d_release() would've observed the effect of those.
> If the value has come from the _same_ fuse_dentry_tree_work(), the
> implicit barrier in spin_lock() would've ordered that store wrt beginning
> of the scope, and since dentry_free() is ordered wrt ->d_release()
> the callback of call_rcu() in there wouldn't run until the end of
> the scope in question. So I think it's OK, but having it done without
> a single comment either on barriers or on memory safety... Ouch.
>
> Note that we *can* run into a dentry getting killed just after
> RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> a no-op for dying and killed ones - it's not just "If dentry has no
> external references, move it to shrink list" as your comment in
> fs/dcache.c says. And in your case it very much does have an
> external reference - it's just that it's an equivalent of RCU one,
> with all the joy that inflicts upon the user.
FWIW, here's what I ended up with yesterday:
/*
 * ->d_release() is ordered (on ->d_lock) after everything done
 * while holding counting references, including all calls of
 * fuse_dentry_add_tree_node(). Freeing a dentry is RCU-delayed
 * and scheduling it is ordered after ->d_release().
 *
 * In ->d_release() we see the initialization and all stores done by
 * fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
 * ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
 * we also see all stores from fuse_dentry_tree_work() except possibly
 * the last one.
 *
 * If the value fuse_dentry_release() observes in
 *	if (!RB_EMPTY_NODE(&fd->node))
 * has come from initialization, the node had never been inserted
 * into the rbtree and fuse_dentry_tree_work() won't see it at all.
 *
 * If the value we observe there comes from the last store in
 * fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
 * false and we are going to enter a dentry_hash[].lock scope,
 * providing an ordering wrt fuse_dentry_tree_work(), which either
 * won't find fd in the rbtree, or will complete before we
 * even get to scheduling the dentry freeing.
 *
 * If the value we observe comes from fuse_dentry_tree_work(), we will
 * be ordered past the beginning of the dentry_hash[].lock scope that
 * store had happened in. Since the call_rcu() that schedules freeing
 * the dentry is ordered past the return from ->d_release(), freeing
 * won't happen until the end of that scope. Since there can be no
 * subsequent calls of fuse_dentry_add_tree_node(), any subsequent
 * calls of fuse_dentry_tree_work() won't see the node at all.
 *
 * Either way, if fuse_dentry_tree_work() sees a node, we are
 * guaranteed that node->dentry won't get freed under it.
 *
 * Freeing the node itself is also RCU-delayed and scheduled at the
 * very end of the FUSE ->d_release(); similar considerations show that
 * if fuse_dentry_tree_work() sees a node, we are guaranteed that
 * the node won't get freed under us either.
 *
 * While it is possible to run into a dying (or killed) dentry in
 * fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
 * used on those - it's a no-op in such a case.
 */
Is the above sane? If it is, something similar would be worth having
written down somewhere in fs/fuse - it's not trivial and it's not hard
to break accidentally...
As for the primitive itself...
/**
 * __move_to_shrink_list - try to place a dentry into a shrink list
 * @dentry: dentry to try putting into shrink list
 * @list: the list to put @dentry into.
 * Returns: %true if @dentry has been placed into @list, %false otherwise
 *
 * If @dentry is idle and not already included in a shrink list, move
 * it into @list and return %true; otherwise do nothing and return %false.
 *
 * Caller must be holding @dentry->d_lock. There must have been no calls of
 * dentry_free(@dentry) prior to the beginning of the RCU read-side critical
 * section in which __move_to_shrink_list(@dentry, @list) is called.
 *
 * @list should be thread-private and eventually emptied by passing it to
 * shrink_dentry_list().
 */
bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
	__must_hold(&dentry->d_lock)
{
	if (likely(!dentry->d_lockref.count &&
		   !(dentry->d_flags & DCACHE_SHRINK_LIST))) {
		if (dentry->d_flags & DCACHE_LRU_LIST)
			d_lru_del(dentry);
		d_shrink_add(dentry, list);
		return true;
	}
	return false;
}
EXPORT_SYMBOL(__move_to_shrink_list);
With something like
	spin_lock(&fd->dentry->d_lock);
	/* If dentry is still referenced, let next dput release it */
	fd->dentry->d_flags |= DCACHE_OP_DELETE;
	__move_to_shrink_list(fd->dentry, &dispose);
	spin_unlock(&fd->dentry->d_lock);
in fuse_dentry_tree_work() (along with the explanations of memory safety,
that is)
Comments?