linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Henriques <luis@igalia.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Matt Harvey <mharvey@jumptrading.com>,
	Bernd Schubert <bschubert@ddn.com>,
	 Joanne Koong <joannelkoong@gmail.com>
Subject: Re: [PATCH v4] fuse: add new function to invalidate cache for all inodes
Date: Thu, 13 Feb 2025 11:08:28 +0000	[thread overview]
Message-ID: <87mseqcdar.fsf@igalia.com> (raw)
In-Reply-To: <Z60bD3C_p_PHao0n@dread.disaster.area> (Dave Chinner's message of "Thu, 13 Feb 2025 09:05:03 +1100")

On Thu, Feb 13 2025, Dave Chinner wrote:

> On Wed, Feb 12, 2025 at 11:32:40AM +0000, Luis Henriques wrote:
>> On Wed, Feb 12 2025, Dave Chinner wrote:
>> 
>> > [ FWIW: if the commit message directly references someone else's
>> > related (and somewhat relevant) work, please directly CC those
>> > people on the patch(set). I only noticed this by chance, not because
>> > I read every FUSE related patch that goes by me. ]
>> 
>> Point taken -- I should have included you on CC since the initial RFC.
>> 
>> > On Tue, Feb 11, 2025 at 09:26:04AM +0000, Luis Henriques wrote:
>> >> Currently userspace is able to notify the kernel to invalidate the cache
>> >> for an inode.  This means that, if all the inodes in a filesystem need to
>> >> be invalidated, then userspace needs to iterate through all of them and do
>> >> this kernel notification separately.
>> >> 
>> >> This patch adds a new option that allows userspace to invalidate all the
>> >> inodes with a single notification operation.  In addition to invalidate
>> >> all the inodes, it also shrinks the sb dcache.
>> >
>> > That, IMO, seems like a bit naive - we generally don't allow user
>> > controlled denial of service vectors to be added to the kernel. i.e.
>> > this is the equivalent of allowing FUSE fs specific 'echo 1 >
>> > /proc/sys/vm/drop_caches' via some fuse specific UAPI. We only allow
>> > root access to /proc/sys/vm/drop_caches because it can otherwise be
>> > easily abused to cause system wide performance issues.
>> >
>> > It also strikes me as a somewhat dangerous precendent - invalidating
>> > random VFS caches through user APIs hidden deep in random fs
>> > implementations makes for poor visibility and difficult maintenance
>> > of VFS level functionality...
>> 
>> Hmm... OK, I understand the concern and your comment makes perfect sense.
>> But would it be acceptable to move this API upper in the stack and make it
>> visible at the VFS layer?  Something similar to the 'drop_caches' but with
>> a superblock granularity.  I haven't spent any time thinking how could
>> that be done, but it wouldn't be "hidden deep" anymore.
>
> I'm yet to see any justification for why 'user driven entire
> filesystem cache invalidation' is needed. Get agreement on whether
> the functionality should exist first, then worry about how to
> implement it.
>
>> >> Signed-off-by: Luis Henriques <luis@igalia.com>
>> >> ---
>> >> * Changes since v3
>> >> - Added comments to clarify semantic changes in fuse_reverse_inval_inode()
>> >>   when called with FUSE_INVAL_ALL_INODES (suggested by Bernd).
>> >> - Added comments to inodes iteration loop to clarify __iget/iput usage
>> >>   (suggested by Joanne)
>> >> - Dropped get_fuse_mount() call -- fuse_mount can be obtained from
>> >>   fuse_ilookup() directly (suggested by Joanne)
>> >> 
>> >> (Also dropped the RFC from the subject.)
>> >> 
>> >> * Changes since v2
>> >> - Use the new helper from fuse_reverse_inval_inode(), as suggested by Bernd.
>> >> - Also updated patch description as per checkpatch.pl suggestion.
>> >> 
>> >> * Changes since v1
>> >> As suggested by Bernd, this patch v2 simply adds an helper function that
>> >> will make it easier to replace most of it's code by a call to function
>> >> super_iter_inodes() when Dave Chinner's patch[1] eventually gets merged.
>> >> 
>> >> [1] https://lore.kernel.org/r/20241002014017.3801899-3-david@fromorbit.com
>> >
>> > That doesn't make the functionality any more palatable.
>> >
>> > Those iterators are the first step in removing the VFS inode list
>> > and only maintaining it in filesystems that actually need this
>> > functionality. We want this list to go away because maintaining it
>> > is a general VFS cache scalability limitation.
>> >
>> > i.e. if a filesystem has internal functionality that requires
>> > iterating all instantiated inodes, the filesystem itself should
>> > maintain that list in the most efficient manner for the filesystem's
>> > iteration requirements not rely on the VFS to maintain this
>> > information for it.
>> 
>> Right, and in my use-case that's exactly what is currently being done: the
>> FUSE API to invalidate individual inodes is being used.
>>
>> This new
>> functionality just tries to make life easier to userspace when *all* the
>> inodes need to be invalidated. (For reference, the use-case is CVMFS, a
>> read-only FS, where new generations of a filesystem snapshot may become
>> available at some point and the previous one needs to be wiped from the
>> cache.)
>
> But you can't actually "wipe" referenced inodes from cache. That is a
> use case for revoke(), not inode cache invalidation.

I guess the word "wipe" wasn't the best choice.  See below.

>> > I'm left to ponder why the invalidation isn't simply:
>> >
>> > 	/* Remove all possible active references to cached inodes */
>> > 	shrink_dcache_sb();
>> >
>> > 	/* Remove all unreferenced inodes from cache */
>> > 	invalidate_inodes();
>> >
>> > Which will result in far more of the inode cache for the filesystem
>> > being invalidated than the above code....
>> 
>> To be honest, my initial attempt to implement this feature actually used
>> invalidate_inodes().  For some reason that I don't remember anymore why I
>> decided to implement the iterator myself.  I'll go look at that code again
>> and run some tests on this (much!) simplified version of the invalidation
>> function your suggesting.
>
> The above code, while simpler, still doesn't resolve the problem of
> invalidation of inodes that have active references (e.g. open files
> on them). They can't be "invalidated" in this way - they can't be
> removed from cache until all active references go away.

Sure, I understand that and that's *not* what I'm trying to do.  I guess
I'm just failing to describe my goal.  If there's a userspace process that
has a file open for an inode that does not exist anymore, that process
will continue using it -- the user-space filesystem will have to deal with
that.

Right now, fuse allows the user-space filesystem to notify the kernel that
*one* inode is not valid anymore.  This is a per inode operation.  I guess
this is very useful, for example, for network filesystems, where an inode
may have been deleted from somewhere else.

However, when user-space wants to do this for all the filesystem inodes,
it will be slow.  With my patch all I wanted to do is to make it a bit
less painful by moving the inodes iteration into the kernel.

Cheers,
-- 
Luís

> i.e. any operation that is based on the assumption that we can
> remove all references to inodes and dentries by walking across them
> and dropping cache references to them is fundamentally flawed. To do
> this reliably, all active references have to be hunted down and
> released before the inodes can be removed from VFS visibility. i.e.
> the mythical revoke() operation would need to be implemented for
> this to work...
>
> -Dave.
>
> -- 
> Dave Chinner
> david@fromorbit.com


      reply	other threads:[~2025-02-13 11:08 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-11  9:26 [PATCH v4] fuse: add new function to invalidate cache for all inodes Luis Henriques
2025-02-11 20:56 ` Dave Chinner
2025-02-12 11:32   ` Luis Henriques
2025-02-12 22:05     ` Dave Chinner
2025-02-13 11:08       ` Luis Henriques [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mseqcdar.fsf@igalia.com \
    --to=luis@igalia.com \
    --cc=bschubert@ddn.com \
    --cc=david@fromorbit.com \
    --cc=joannelkoong@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mharvey@jumptrading.com \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).