public inbox for linux-fsdevel@vger.kernel.org
* [GIT PULL] fuse update for 6.19
@ 2025-12-04  8:25 Miklos Szeredi
  2025-12-05 23:47 ` Linus Torvalds
  2025-12-06  0:17 ` pr-tracker-bot
  0 siblings, 2 replies; 15+ messages in thread
From: Miklos Szeredi @ 2025-12-04  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-kernel

Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
tags/fuse-update-6.19

- Add mechanism for cleaning out unused, stale dentries; controlled
via a module option (Luis Henriques)

- Fix various bugs

- Cleanups

The stale dentry cleanup has a patch touching dcache.c: this extracts
a helper from d_prune_aliases() that puts the unused dentry on a
dispose list.  Export this and shrink_dentry_list() to modules.

Thanks,
Miklos

---
Bernd Schubert (3):
      fuse: Fix whitespace for fuse_uring_args_to_ring() comment
      fuse: Invalidate the page cache after FOPEN_DIRECT_IO write
      fuse: Always flush the page cache before FOPEN_DIRECT_IO write

Cheng Ding (1):
      fuse: missing copy_finish in fuse-over-io-uring argument copies

Dan Carpenter (1):
      fuse: Uninitialized variable in fuse_epoch_work()

Darrick J. Wong (1):
      fuse: signal that a fuse inode should exhibit local fs behaviors

Joanne Koong (2):
      fuse: fix readahead reclaim deadlock
      fuse: fix io-uring list corruption for terminated non-committed requests

Luis Henriques (4):
      dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()
      fuse: new work queue to periodically invalidate expired dentries
      fuse: new work queue to invalidate dentries from old epochs
      fuse: refactor fuse_conn_put() to remove negative logic.

Miklos Szeredi (1):
      fuse: add WARN_ON and comment for RCU revalidate

Miquel Sabaté Solà (2):
      fuse: use strscpy instead of strcpy
      fuse: rename 'namelen' to 'namesize'

---
 fs/dcache.c            |  18 ++--
 fs/fuse/dev.c          |   9 +-
 fs/fuse/dev_uring.c    |  12 ++-
 fs/fuse/dir.c          | 248 +++++++++++++++++++++++++++++++++++++++++++------
 fs/fuse/file.c         |  37 ++++++--
 fs/fuse/fuse_dev_i.h   |   1 +
 fs/fuse/fuse_i.h       |  28 +++++-
 fs/fuse/inode.c        |  44 +++++----
 fs/overlayfs/super.c   |  12 ++-
 include/linux/dcache.h |   2 +
 10 files changed, 340 insertions(+), 71 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] fuse update for 6.19
  2025-12-04  8:25 [GIT PULL] fuse update for 6.19 Miklos Szeredi
@ 2025-12-05 23:47 ` Linus Torvalds
  2025-12-06  1:42   ` Al Viro
  2025-12-06  0:17 ` pr-tracker-bot
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-12-05 23:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel

On Thu, 4 Dec 2025 at 00:25, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> The stale dentry cleanup has a patch touching dcache.c: this extracts
> a helper from d_prune_aliases() that puts the unused dentry on a
> dispose list.  Export this and shrink_dentry_list() to modules.

Is that

        spin_lock(&dentry->d_lock);
        if (!dentry->d_lockref.count)
                to_shrink_list(dentry, dispose);
        spin_unlock(&dentry->d_lock);

thing possibly hot, and count might be commonly non-zero?

Because it's possible that we could just make it a lockref operation
where we atomically don't take the lock if the count is non-zero so
that we don't unnecessarily move cachelines around...

IOW, some kind of "lockref_lock_if_zero()" pattern?

I have no idea what the fuse dentry lifetime patterns might be, maybe
this is a complete non-issue...

         Linus
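
[The "lockref_lock_if_zero()" idea can be sketched in userspace with C11 atomics. This is a hypothetical illustration of the pattern only — one atomic word holding both a lock bit and the count, so a non-zero count is rejected after a plain load with no store at all — not the kernel's lockref, whose layout and CMPXCHG_LOOP differ:]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 is the lock; the remaining bits hold the reference count. */
struct mini_lockref { _Atomic uint32_t word; };
#define MINI_LOCKED 1u

/*
 * Take the lock only if the count is zero.  A non-zero count is
 * refused after a plain load, without dirtying the cacheline.
 */
static bool mini_lock_if_zero(struct mini_lockref *lr)
{
	uint32_t old = atomic_load_explicit(&lr->word, memory_order_relaxed);

	for (;;) {
		if (old >> 1)			/* count != 0: nothing to do */
			return false;
		if (old & MINI_LOCKED) {	/* held by someone else: reload */
			old = atomic_load_explicit(&lr->word, memory_order_relaxed);
			continue;
		}
		if (atomic_compare_exchange_weak_explicit(&lr->word, &old,
				old | MINI_LOCKED,
				memory_order_acquire, memory_order_relaxed))
			return true;		/* locked, count observed as zero */
	}
}

static void mini_unlock(struct mini_lockref *lr)
{
	atomic_fetch_and_explicit(&lr->word, ~MINI_LOCKED, memory_order_release);
}

static void mini_get(struct mini_lockref *lr)
{
	atomic_fetch_add_explicit(&lr->word, 2, memory_order_relaxed);
}
```

[In a fuse_dentry_tree_work()-style loop this would replace the spin_lock / check / spin_unlock triple, so the common case of a referenced dentry never bounces the lock cacheline.]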


* Re: [GIT PULL] fuse update for 6.19
  2025-12-04  8:25 [GIT PULL] fuse update for 6.19 Miklos Szeredi
  2025-12-05 23:47 ` Linus Torvalds
@ 2025-12-06  0:17 ` pr-tracker-bot
  1 sibling, 0 replies; 15+ messages in thread
From: pr-tracker-bot @ 2025-12-06  0:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

The pull request you sent on Thu, 4 Dec 2025 09:25:29 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git tags/fuse-update-6.19

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4b6b4321280ea1ea1e101fd39d8664195d18ecb0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT PULL] fuse update for 6.19
  2025-12-05 23:47 ` Linus Torvalds
@ 2025-12-06  1:42   ` Al Viro
  2025-12-06  1:52     ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2025-12-06  1:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Fri, Dec 05, 2025 at 03:47:50PM -0800, Linus Torvalds wrote:
> On Thu, 4 Dec 2025 at 00:25, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > The stale dentry cleanup has a patch touching dcache.c: this extracts
> > a helper from d_prune_aliases() that puts the unused dentry on a
> > dispose list.  Export this and shrink_dentry_list() to modules.
> 
> Is that
> 
>         spin_lock(&dentry->d_lock);
>         if (!dentry->d_lockref.count)
>                 to_shrink_list(dentry, dispose);
>         spin_unlock(&dentry->d_lock);
> 
> thing possibly hot, and count might be commonly non-zero?
> 
> Because it's possible that we could just make it a lockref operation
> where we atomically don't take the lock if the count is non-zero so
> that we don't unnecessarily move cachelines around...
> 
> IOW, some kind of "lockref_lock_if_zero()" pattern?
> 
> I have no idea what the fuse dentry lifetime patterns might be, maybe
> this is a complete non-issue...

Far more interesting question, IMO, is what's to prevent memory
pressure from evicting the damn argument right under us.

AFAICS, fuse_dentry_tree_work() calls that thing with no locks held.
The one and only reason why that's OK in d_prune_aliases() is ->i_lock
held over that thing - that's enough to prevent eviction.  I don't
see anything to serve the same purpose here.


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  1:42   ` Al Viro
@ 2025-12-06  1:52     ` Linus Torvalds
  2025-12-06  2:28       ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-12-06  1:52 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Fri, 5 Dec 2025 at 17:42, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Far more interesting question, IMO, is what's to prevent memory
> pressure from evicting the damn argument right under us.

That was my first reaction, but look at the 'fuse_dentry_prune()' logic.

So if the dentry is removed by the VFS layer, it should be removed here too.

But maybe I missed something,

            Linus


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  1:52     ` Linus Torvalds
@ 2025-12-06  2:28       ` Al Viro
  2025-12-06  3:10         ` Al Viro
  2025-12-06  3:29         ` Linus Torvalds
  0 siblings, 2 replies; 15+ messages in thread
From: Al Viro @ 2025-12-06  2:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Fri, Dec 05, 2025 at 05:52:51PM -0800, Linus Torvalds wrote:
> On Fri, 5 Dec 2025 at 17:42, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Far more interesting question, IMO, is what's to prevent memory
> > pressure from evicting the damn argument right under us.
> 
> That was my first reaction, but look at the 'fuse_dentry_prune()' logic.
> 
> So if the dentry is removed by the VFS layer, it should be removed here too.

Sure, ->d_prune() would take it out of the rbtree, but what if it hits
                                rb_erase(&fd->node, &dentry_hash[i].tree);
                                RB_CLEAR_NODE(&fd->node);
                                spin_unlock(&dentry_hash[i].lock);
... right here, when we are not holding any locks anymore?
                                d_dispose_if_unused(fd->dentry, &dispose);
                                cond_resched();
                                spin_lock(&dentry_hash[i].lock);



* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  2:28       ` Al Viro
@ 2025-12-06  3:10         ` Al Viro
  2025-12-06  3:29         ` Linus Torvalds
  1 sibling, 0 replies; 15+ messages in thread
From: Al Viro @ 2025-12-06  3:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Sat, Dec 06, 2025 at 02:28:26AM +0000, Al Viro wrote:
> On Fri, Dec 05, 2025 at 05:52:51PM -0800, Linus Torvalds wrote:
> > On Fri, 5 Dec 2025 at 17:42, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Far more interesting question, IMO, is what's to prevent memory
> > > pressure from evicting the damn argument right under us.
> > 
> > That was my first reaction, but look at the 'fuse_dentry_prune()' logic.
> > 
> > So if the dentry is removed by the VFS layer, it should be removed here too.
> 
> Sure, ->d_prune() would take it out of the rbtree, but what if it hits
>                                 rb_erase(&fd->node, &dentry_hash[i].tree);
>                                 RB_CLEAR_NODE(&fd->node);
>                                 spin_unlock(&dentry_hash[i].lock);
> ... right here, when we are not holding any locks anymore?
>                                 d_dispose_if_unused(fd->dentry, &dispose);
>                                 cond_resched();
>                                 spin_lock(&dentry_hash[i].lock);

... and with what fuse_dentry_prune() is doing, we can't grab ->d_lock
or bump ->d_count before dropping dentry_hash[...].lock.  ->d_release()
is the one called outside of ->d_lock; ->d_prune() is under it, so we'd
get AB-BA deadlock if we tried to do that kind of stuff.

Moving the eviction to ->d_release() might be doable; then we'd have
fuse locks outside of ->d_lock and could call that thing under those.

I'll need to poke around some more, but TBH I don't like that primitive -
it's really easy to fuck up and conditions for its safe use are, AFAICS,
never spelled out.


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  2:28       ` Al Viro
  2025-12-06  3:10         ` Al Viro
@ 2025-12-06  3:29         ` Linus Torvalds
  2025-12-06  3:54           ` Al Viro
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-12-06  3:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Fri, 5 Dec 2025 at 18:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Sure, ->d_prune() would take it out of the rbtree, but what if it hits

Ahh.

Maybe increase the d_count before releasing that rbtree lock?

Or yeah, maybe moving it to d_release. Miklos?

           Linus


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  3:29         ` Linus Torvalds
@ 2025-12-06  3:54           ` Al Viro
  2025-12-06  4:22             ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2025-12-06  3:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Fri, Dec 05, 2025 at 07:29:13PM -0800, Linus Torvalds wrote:
> On Fri, 5 Dec 2025 at 18:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Sure, ->d_prune() would take it out of the rbtree, but what if it hits
> 
> Ahh.
> 
> Maybe increase the d_count before releasing that rbtree lock?
> 
> Or yeah, maybe moving it to d_release. Miklos?

Moving it to ->d_release() would be my preference, TBH.  Then
we could simply dget() the sucker under the lock and follow
that with existing dput_to_list() after dropping the lock...


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  3:54           ` Al Viro
@ 2025-12-06  4:22             ` Al Viro
  2025-12-08 10:37               ` Miklos Szeredi
  2026-01-14 15:23               ` Miklos Szeredi
  0 siblings, 2 replies; 15+ messages in thread
From: Al Viro @ 2025-12-06  4:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Sat, Dec 06, 2025 at 03:54:03AM +0000, Al Viro wrote:
> On Fri, Dec 05, 2025 at 07:29:13PM -0800, Linus Torvalds wrote:
> > On Fri, 5 Dec 2025 at 18:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Sure, ->d_prune() would take it out of the rbtree, but what if it hits
> > 
> > Ahh.
> > 
> > Maybe increase the d_count before releasing that rbtree lock?
> > 
> > Or yeah, maybe moving it to d_release. Miklos?
> 
> Moving it to ->d_release() would be my preference, TBH.  Then
> we could simply dget() the sucker under the lock and follow
> that with existing dput_to_list() after dropping the lock...

s/dget/grab ->d_lock, increment ->d_count if not negative,
drop ->d_lock/ - we need to deal with the possibility of
the victim just going into __dentry_kill() as we find it.

And yes, it would be better off with something like
bool lockref_get_if_zero(struct lockref *lockref)
{
	bool retval = false;
	CMPXCHG_LOOP(
		new.count++;
		if (old.count != 0)
			return false;
	,
		return true;
	);
	spin_lock(&lockref->lock);
	if (lockref->count == 0) {
		lockref->count = 1;
		retval = true;
	}
	spin_unlock(&lockref->lock);
	return retval;
}

with
		while (node) {
			fd = rb_entry(node, struct fuse_dentry, node);
			if (!time_after64(get_jiffies_64(), fd->time))
				break;
			rb_erase(&fd->node, &dentry_hash[i].tree);
			RB_CLEAR_NODE(&fd->node);
			if (lockref_get_if_zero(&fd->dentry->d_lockref))
				dput_to_list(fd->dentry, &dispose);
			if (need_resched()) {
				spin_unlock(&dentry_hash[i].lock);
				schedule();
				spin_lock(&dentry_hash[i].lock);
			}
			node = rb_first(&dentry_hash[i].tree);
		}
in that loop.  Actually... a couple of questions:
	* why do we call shrink_dentry_list() separately for each hash
bucket?  Easier to gather everything and call it once...
	* what's the point of rbtree there?  What's wrong with plain
hlist?  Folks?


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  4:22             ` Al Viro
@ 2025-12-08 10:37               ` Miklos Szeredi
  2026-01-14 15:23               ` Miklos Szeredi
  1 sibling, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2025-12-08 10:37 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Sat, 6 Dec 2025 at 05:22, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Dec 06, 2025 at 03:54:03AM +0000, Al Viro wrote:
> > On Fri, Dec 05, 2025 at 07:29:13PM -0800, Linus Torvalds wrote:
> > > On Fri, 5 Dec 2025 at 18:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > Sure, ->d_prune() would take it out of the rbtree, but what if it hits
> > >
> > > Ahh.
> > >
> > > Maybe increase the d_count before releasing that rbtree lock?
> > >
> > > Or yeah, maybe moving it to d_release. Miklos?
> >
> > Moving it to ->d_release() would be my preference, TBH.  Then
> > we could simply dget() the sucker under the lock and follow
> > that with existing dput_to_list() after dropping the lock...
>
> s/dget/grab ->d_lock, increment ->d_count if not negative,
> drop ->d_lock/ - we need to deal with the possibility of
> the victim just going into __dentry_kill() as we find it.
>
> And yes, it would be better off with something like
> bool lockref_get_if_zero(struct lockref *lockref)
> {
>         bool retval = false;
>         CMPXCHG_LOOP(
>                 new.count++;
>                 if (old.count != 0)
>                         return false;
>         ,
>                 return true;
>         );
>         spin_lock(&lockref->lock);
>         if (lockref->count == 0) {
>                 lockref->count = 1;
>                 retval = true;
>         }
>         spin_unlock(&lockref->lock);
>         return retval;
> }
>
> with
>                 while (node) {
>                         fd = rb_entry(node, struct fuse_dentry, node);
>                         if (!time_after64(get_jiffies_64(), fd->time))
>                                 break;
>                         rb_erase(&fd->node, &dentry_hash[i].tree);
>                         RB_CLEAR_NODE(&fd->node);
>                         if (lockref_get_if_zero(&fd->dentry->d_lockref))
>                                 dput_to_list(fd->dentry, &dispose);
>                         if (need_resched()) {
>                                 spin_unlock(&dentry_hash[i].lock);
>                                 schedule();
>                                 spin_lock(&dentry_hash[i].lock);
>                         }
>                         node = rb_first(&dentry_hash[i].tree);
>                 }
> in that loop.  Actually... a couple of questions:

Looks good.  Do you want me to submit a proper patch?

>         * why do we call shrink_dentry_list() separately for each hash
> bucket?  Easier to gather everything and call it once...

No good reason.

>         * what's the point of rbtree there?  What's wrong with plain
> hlist?  Folks?

The list needs to be ordered wrt. end of validity time.  The timeout
can be different from one dentry to another even within a fuse fs, but
more likely to be varied between different fuse filesystems, so
insertion time itself doesn't determine the validity end time.

Thanks,
Miklos
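
[Miklos's point — that with per-dentry timeouts, insertion order and expiry order diverge, so a FIFO hlist cannot stop at the first unexpired entry while an expiry-sorted structure (the rbtree) can — is visible in a tiny userspace model; the names here are hypothetical, not the fuse data structures:]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: each has its own validity timeout. */
struct toy_entry {
	uint64_t inserted;	/* "jiffies" at insertion */
	uint64_t timeout;	/* per-entry validity, may differ */
};

static uint64_t expires(const struct toy_entry *e)
{
	return e->inserted + e->timeout;
}

/* qsort comparator: order by end-of-validity time, as the rbtree does. */
static int cmp_expiry(const void *a, const void *b)
{
	uint64_t ea = expires(a), eb = expires(b);
	return (ea > eb) - (ea < eb);
}

/*
 * Count expired entries, scanning in the given order and stopping at
 * the first unexpired one - the way the rbtree walk terminates early.
 */
static int scan_expired(const struct toy_entry *e, int n, uint64_t now)
{
	int i;

	for (i = 0; i < n; i++)
		if (expires(&e[i]) > now)
			break;
	return i;
}
```

[With entries inserted at times 0, 1, 2 carrying timeouts 100, 10, 10, an insertion-order scan at now=50 stops immediately at the long-lived first entry and misses the two expired ones; the same scan over the expiry-sorted array finds both and still stops early.]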


* Re: [GIT PULL] fuse update for 6.19
  2025-12-06  4:22             ` Al Viro
  2025-12-08 10:37               ` Miklos Szeredi
@ 2026-01-14 15:23               ` Miklos Szeredi
  2026-04-12  6:16                 ` Al Viro
  1 sibling, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2026-01-14 15:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Sat, 6 Dec 2025 at 05:22, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Dec 06, 2025 at 03:54:03AM +0000, Al Viro wrote:
> > On Fri, Dec 05, 2025 at 07:29:13PM -0800, Linus Torvalds wrote:
> > > On Fri, 5 Dec 2025 at 18:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > Sure, ->d_prune() would take it out of the rbtree, but what if it hits
> > >
> > > Ahh.
> > >
> > > Maybe increase the d_count before releasing that rbtree lock?
> > >
> > > Or yeah, maybe moving it to d_release. Miklos?
> >
> > Moving it to ->d_release() would be my preference, TBH.  Then
> > we could simply dget() the sucker under the lock and follow
> > that with existing dput_to_list() after dropping the lock...
>
> s/dget/grab ->d_lock, increment ->d_count if not negative,
> drop ->d_lock/ - we need to deal with the possibility of
> the victim just going into __dentry_kill() as we find it.
>
> And yes, it would be better off with something like
> bool lockref_get_if_zero(struct lockref *lockref)
> {
>         bool retval = false;
>         CMPXCHG_LOOP(
>                 new.count++;
>                 if (old.count != 0)
>                         return false;
>         ,
>                 return true;
>         );
>         spin_lock(&lockref->lock);
>         if (lockref->count == 0) {
>                 lockref->count = 1;
>                 retval = true;
>         }
>         spin_unlock(&lockref->lock);
>         return retval;
> }
>
> with
>                 while (node) {
>                         fd = rb_entry(node, struct fuse_dentry, node);
>                         if (!time_after64(get_jiffies_64(), fd->time))
>                                 break;
>                         rb_erase(&fd->node, &dentry_hash[i].tree);
>                         RB_CLEAR_NODE(&fd->node);
>                         if (lockref_get_if_zero(&fd->dentry->d_lockref))
>                                 dput_to_list(fd->dentry, &dispose);
>                         if (need_resched()) {
>                                 spin_unlock(&dentry_hash[i].lock);
>                                 schedule();
>                                 spin_lock(&dentry_hash[i].lock);
>                         }
>                         node = rb_first(&dentry_hash[i].tree);
>                 }

Posted a short patchset fixing this.

I really think there's no point in doing the get-ref, drop-ref dance.
 Retrieving the dentry from any source will require locking and that
needs to nest d_lock with or without the refcount manipulation.

So I kept the d_dispose_if_unused() API, but added a note to the
function doc that additional locking is necessary to prevent eviction.

Thanks,
Miklos


* Re: [GIT PULL] fuse update for 6.19
  2026-01-14 15:23               ` Miklos Szeredi
@ 2026-04-12  6:16                 ` Al Viro
  2026-04-13  0:29                   ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2026-04-12  6:16 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Wed, Jan 14, 2026 at 04:23:04PM +0100, Miklos Szeredi wrote:

> Posted a short patchset fixing this.
> 
> I really think there's no point in doing the get-ref, drop-ref dance.
>  Retrieving the dentry from any source will require locking and that
> needs to nest d_lock with or without the refcount manipulation.
> 
> So I kept the d_dispose_if_unused() API, but added a note to the
> function doc that additional locking is necessary to prevent eviction.

	Looking at that thing again, I really hate how subtle it is ;-/
Consider the lockless case in your ->d_release() and try to write down
the reasons why it's safe.  Among the other things, kfree_rcu() is there
to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
doubles as delaying that freeing past the scope of dentry_hash[].lock in
which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
to lockless path.  Another side of that fun is the proof that if
fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
Locked case of ->d_release() is easy; proving that the lockless one
is OK without explicit barriers is more interesting.  Basically, all
insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
we are observing in RB_EMPTY_NODE() has come from those we will hit the
locked case.  If the value we observe has come from RB_CLEAR_NODE() in
earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
wrt each other and insertions), there mustn't have been any subsequent
insertions, or ->d_release() would've observed the effect of those.
If the value has come from the _same_ fuse_dentry_tree_work(), the
implicit barrier in spin_lock() would've ordered that store wrt beginning
of the scope, and since dentry_free() is ordered wrt ->d_release()
the callback of call_rcu() in there wouldn't run until the end of
the scope in question.  So I think it's OK, but having it done without
a single comment either on barriers or on memory safety...  Ouch.

	Note that we *can* run into a dentry getting killed just after
RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
a no-op for dying and killed ones - it's not just "If dentry has no
external references, move it to shrink list" as your comment in
fs/dcache.c says.  And in your case it very much does have an
external reference - it's just that it's an equivalent of RCU one,
with all the joy that inflicts upon the user.

	FWIW, right now I'm still looking for the source of the UAF
reported by jlayton; I think this stuff in fuse can be excluded as
a possible cause, but I'm not happy with that primitive at all... ;-/


* Re: [GIT PULL] fuse update for 6.19
  2026-04-12  6:16                 ` Al Viro
@ 2026-04-13  0:29                   ` Al Viro
  2026-04-13 15:32                     ` Miklos Szeredi
  0 siblings, 1 reply; 15+ messages in thread
From: Al Viro @ 2026-04-13  0:29 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:

> 	Looking at that thing again, I really hate how subtle it is ;-/
> Consider the lockless case in your ->d_release() and try to write down
> the reasons why it's safe.  Among the other things, kfree_rcu() is there
> to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> doubles as delaying that freeing past the scope of dentry_hash[].lock in
> which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> to lockless path.  Another side of that fun is the proof that if
> fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> Locked case of ->d_release() is easy; proving that the lockless one
> is OK without explict barriers is more interesting.  Basically, all
> insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> we are observing in RB_EMPTY_NODE() has come from those we will hit the
> locked case.  If the value we observe has come from RB_CLEAR_NODE() in
> earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> wrt each other and insertions), there mustn't have been any subsequent
> insertions, or ->d_release() would've observed the effect of those.
> If the value has come from the _same_ fuse_dentry_tree_work(), the
> implicit barrier in spin_lock() would've ordered that store wrt beginning
> of the scope, and since dentry_free() is ordered wrt ->d_release()
> the callback of call_rcu() in there wouldn't run until the end of
> the scope in question.  So I think it's OK, but having it done without
> a single comment either on barriers or on memory safety...  Ouch.
> 
> 	Note that we *can* run into a dentry getting killed just after
> RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> a no-op for dying and killed ones - it's not just "If dentry has no
> external references, move it to shrink list" as your comment in
> fs/dcache.c says.  And in your case it very much does have an
> external reference - it's just that it's an equivalent of RCU one,
> with all the joy that inflicts upon the user.

FWIW, here's what I've ended up with yesterday:

	/*
	 * ->d_release() is ordered (on ->d_lock) after everything done
	 * while holding counting references, including all calls of
	 * fuse_dentry_add_tree_node().  Freeing a dentry is RCU-delayed
	 * and scheduling it is ordered after ->d_release().
	 *
	 * In ->d_release() we see the initialization and all stores done by
	 * fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
	 * ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
	 * we also see all stores from fuse_dentry_tree_work() except possibly
	 * the last one.
	 *
	 * If the value fuse_dentry_release() observes in
	 *	if (!RB_EMPTY_NODE(&fd->node))
	 * has come from initialization, the node had never been inserted
	 * into rbtree and fuse_dentry_tree_work() won't see it at all.
	 *
	 * If the value we observe there comes from the last store in
	 * fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
	 * false and we are going to enter a dentry_hash[].lock scope,
	 * providing an ordering wrt fuse_dentry_tree_work() which either
	 * won't find fd in rbtree, or will complete before we
	 * get to even scheduling dentry freeing.
	 * 
	 * If the value we observe comes from fuse_dentry_tree_work(), we will
	 * be ordered past the beginning of dentry_hash[].lock scope that store
	 * had happened in.  Since call_rcu() that schedules freeing the dentry
	 * is ordered past the return from ->d_release(), freeing won't happen
	 * until the end of that scope.  Since there can be no subsequent calls
	 * of fuse_dentry_add_tree_node(), any subsequent calls of
	 * fuse_dentry_tree_work() won't see the node at all.
	 *
	 * Either way, if fuse_dentry_tree_work() sees a node, we are
	 * guaranteed that node->dentry won't get freed under it.
	 *
	 * Freeing the node itself is also RCU-delayed and scheduled in the
	 * very end of FUSE ->d_release(); similar considerations show that
	 * if fuse_dentry_tree_work() sees a node, we are guaranteed that
	 * the node won't get freed under us either.
	 *
	 * While it is possible to run into a dying (or killed) dentry in
	 * fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
	 * used for those - it's a no-op in such case.
	 */

Is the above sane?  If it is, something similar would be worth having
written down somewhere in fs/fuse - it's not trivial and it's not hard
to break accidentally...

As for the primitive itself...

/**
 * __move_to_shrink_list - try to place a dentry into a shrink list
 * @dentry:	dentry to try putting into shrink list
 * @list:	the list to put @dentry into.
 * Returns:	%true if @dentry was placed into @list, %false otherwise
 *
 * If @dentry is idle and not already included on a shrink list, move
 * it into @list and return %true; otherwise do nothing and return %false.
 *
 * Caller must be holding @dentry->d_lock.  There must have been no calls of
 * dentry_free(@dentry) prior to the beginning of the RCU read-side critical
 * section in which __move_to_shrink_list(@dentry, @list) is called.
 *
 * @list should be thread-private and eventually emptied by passing it to
 * shrink_dentry_list().
 */
bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
__must_hold(&dentry->d_lock)
{
        if (likely(!dentry->d_lockref.count &&
	    !(dentry->d_flags & DCACHE_SHRINK_LIST))) {
		if (dentry->d_flags & DCACHE_LRU_LIST)
			d_lru_del(dentry);
		d_shrink_add(dentry, list);
		return true;
	}
	return false;
}
EXPORT_SYMBOL(__move_to_shrink_list);

With something like
	spin_lock(&fd->dentry->d_lock);
	/* If dentry is still referenced, let next dput release it */
	fd->dentry->d_flags |= DCACHE_OP_DELETE;
	__move_to_shrink_list(fd->dentry, &dispose);
	spin_unlock(&fd->dentry->d_lock);
in fuse_dentry_tree_work() (along with the explanations of memory safety,
that is)

Comments?

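
[The contract of the __move_to_shrink_list() primitive Al proposes can be modelled in a few lines of userspace C; the flag names and fields here are hypothetical stand-ins, not fs/dcache.c — the point is only that a busy entry, or one already on a shrink list, is left strictly alone:]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for DCACHE_LRU_LIST / DCACHE_SHRINK_LIST. */
#define TOY_LRU_LIST	0x1u
#define TOY_SHRINK_LIST	0x2u

struct toy_dentry {
	int count;		/* counting references */
	unsigned int flags;
};

/*
 * Move an idle entry to the shrink list: drop it from the LRU (the
 * d_lru_del() step) and mark it shrink-listed (the d_shrink_add() step).
 * Referenced or already shrink-listed entries are a no-op, mirroring the
 * "safe for dying/killed dentries" property discussed above.
 */
static bool toy_move_to_shrink_list(struct toy_dentry *d)
{
	if (d->count == 0 && !(d->flags & TOY_SHRINK_LIST)) {
		d->flags &= ~TOY_LRU_LIST;
		d->flags |= TOY_SHRINK_LIST;
		return true;
	}
	return false;
}
```

[The boolean return lets a caller know whether it handed the entry off, which is what makes the primitive composable with a thread-private dispose list.]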

* Re: [GIT PULL] fuse update for 6.19
  2026-04-13  0:29                   ` Al Viro
@ 2026-04-13 15:32                     ` Miklos Szeredi
  0 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2026-04-13 15:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Mon, 13 Apr 2026 at 02:25, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:
>
> >       Looking at that thing again, I really hate how subtle it is ;-/
> > Consider the lockless case in your ->d_release() and try to write down
> > the reasons why it's safe.  Among the other things, kfree_rcu() is there
> > to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> > doubles as delaying that freeing past the scope of dentry_hash[].lock in
> > which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> > to lockless path.  Another side of that fun is the proof that if
> > fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> > Locked case of ->d_release() is easy; proving that the lockless one
> > is OK without explicit barriers is more interesting.  Basically, all
> > insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> > we are observing in RB_EMPTY_NODE() has come from those we will hit the
> > locked case.  If the value we observe has come from RB_CLEAR_NODE() in
> > earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> > wrt each other and insertions), there mustn't have been any subsequent
> > insertions, or ->d_release() would've observed the effect of those.
> > If the value has come from the _same_ fuse_dentry_tree_work(), the
> > implicit barrier in spin_lock() would've ordered that store wrt beginning
> > of the scope, and since dentry_free() is ordered wrt ->d_release()
> > the callback of call_rcu() in there wouldn't run until the end of
> > the scope in question.  So I think it's OK, but having it done without
> > a single comment either on barriers or on memory safety...  Ouch.
> >
> >       Note that we *can* run into a dentry getting killed just after
> > RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> > or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> > a no-op for dying and killed ones - it's not just "If dentry has no
> > external references, move it to shrink list" as your comment in
> > fs/dcache.c says.  And in your case it very much does have an
> > external reference - it's just that it's an equivalent of RCU one,
> > with all the joy that inflicts upon the user.
>
> FWIW, here's what I've ended up with yesterday:
>
>         /*
>          * ->d_release() is ordered (on ->d_lock) after everything done
>          * while holding counting references, including all calls of
>          * fuse_dentry_add_tree_node().  Freeing a dentry is RCU-delayed
>          * and scheduling it is ordered after ->d_release().
>          *
>          * In ->d_release() we see the initialization and all stores done by
>          * fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
>          * ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
>          * we also see all stores from fuse_dentry_tree_work() except possibly
>          * the last one.
>          *
>          * If the value fuse_dentry_release() observes in
>          *      if (!RB_EMPTY_NODE(&fd->node))
>          * has come from initialization, the node had never been inserted
>          * into rbtree and fuse_dentry_tree_work() won't see it at all.
>          *
>          * If the value we observe there comes from the last store in
>          * fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
>          * false and we are going to enter a dentry_hash[].lock scope,
>          * providing an ordering wrt fuse_dentry_tree_work() which either
>          * won't find fd in rbtree, or will complete before we
>          * get to even scheduling dentry freeing.
>          *
>          * If the value we observe comes from fuse_dentry_tree_work(), we will
>          * be ordered past the beginning of dentry_hash[].lock scope that store
>          * had happened in.  Since call_rcu() that schedules freeing the dentry
>          * is ordered past the return from ->d_release(), freeing won't happen
>          * until the end of that scope.  Since there can be no subsequent calls
>          * of fuse_dentry_add_tree_node(), any subsequent calls of
>          * fuse_dentry_tree_work() won't see the node at all.
>          *
>          * Either way, if fuse_dentry_tree_work() sees a node, we are
>          * guaranteed that node->dentry won't get freed under it.

This is indeed subtle.

I think the best way to un-subtle it would be to just remove the lockless
check and unconditionally call fuse_dentry_tree_del_node(), which
would make the ordering explicit on dentry_hash[].lock.

>          *
>          * Freeing the node itself is also RCU-delayed and scheduled in the
>          * very end of FUSE ->d_release(); similar considerations show that
>          * if fuse_dentry_tree_work() sees a node, we are guaranteed that
>          * the node won't get freed under us either.
>          *
>          * While it is possible to run into a dying (or killed) dentry in
>          * fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
>          * used for those - it's a no-op in such case.
>          */
>
> Is the above sane?  If it is, something similar would be worth having
> written down somewhere in fs/fuse - it's not trivial and it's not hard
> to break accidentally...

Yeah, thanks for the analysis.

>
> As for the primitive itself...
>
> /**
>  * __move_to_shrink_list - try to place a dentry into a shrink list
>  * @dentry:     dentry to try putting into shrink list
>  * @list:       the list to put @dentry into.
>  * Returns:     true if @dentry has been placed into @list, false otherwise
>  *
>  * If @dentry is idle and not already included in a shrink list, move
>  * it into @list and return %true; otherwise do nothing and return %false.
>  *
>  * Caller must be holding @dentry->d_lock.  There must have been no calls of
>  * dentry_free(@dentry) prior to the beginning of the RCU read-side critical
>  * section in which __move_to_shrink_list(@dentry, @list) is called.
>  *
>  * @list should be thread-private and eventually emptied by passing it to
>  * shrink_dentry_list().
>  */
> bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
> __must_hold(&dentry->d_lock)
> {
>         if (likely(!dentry->d_lockref.count &&
>             !(dentry->d_flags & DCACHE_SHRINK_LIST))) {
>                 if (dentry->d_flags & DCACHE_LRU_LIST)
>                         d_lru_del(dentry);
>                 d_shrink_add(dentry, list);
>                 return true;
>         }
>         return false;
> }
> EXPORT_SYMBOL(__move_to_shrink_list);
>
> With something like
>         spin_lock(&fd->dentry->d_lock);
>         /* If dentry is still referenced, let next dput release it */
>         fd->dentry->d_flags |= DCACHE_OP_DELETE;
>         __move_to_shrink_list(fd->dentry, &dispose);
>         spin_unlock(&fd->dentry->d_lock);
> in fuse_dentry_tree_work() (along with the explanations of memory safety,
> that is)

Does this make any difference besides not having to lock d_lock twice?

Not against this change, just trying to understand the full implications.

Thanks,
Miklos


end of thread, other threads:[~2026-04-13 15:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-04  8:25 [GIT PULL] fuse update for 6.19 Miklos Szeredi
2025-12-05 23:47 ` Linus Torvalds
2025-12-06  1:42   ` Al Viro
2025-12-06  1:52     ` Linus Torvalds
2025-12-06  2:28       ` Al Viro
2025-12-06  3:10         ` Al Viro
2025-12-06  3:29         ` Linus Torvalds
2025-12-06  3:54           ` Al Viro
2025-12-06  4:22             ` Al Viro
2025-12-08 10:37               ` Miklos Szeredi
2026-01-14 15:23               ` Miklos Szeredi
2026-04-12  6:16                 ` Al Viro
2026-04-13  0:29                   ` Al Viro
2026-04-13 15:32                     ` Miklos Szeredi
2025-12-06  0:17 ` pr-tracker-bot
