linux-fsdevel.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Luis Henriques <luis@igalia.com>,
	Bernd Schubert <bernd@bsbernd.com>, Theodore Ts'o <tytso@mit.edu>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Bernd Schubert <bschubert@ddn.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kevin Chen <kchen@ddn.com>, Matt Harvey <mharvey@jumptrading.com>
Subject: Re: [RFC] Another take at restarting FUSE servers
Date: Wed, 5 Nov 2025 13:38:55 -0800
Message-ID: <20251105213855.GL196362@frogsfrogsfrogs>
In-Reply-To: <CAOQ4uxg+w5LHnVbYGLc_pq+zfAw5UXbfo0M2=dxFGKLmBvJ+5Q@mail.gmail.com>

On Wed, Nov 05, 2025 at 04:30:51PM +0100, Amir Goldstein wrote:
> On Wed, Nov 5, 2025 at 12:50 PM Luis Henriques <luis@igalia.com> wrote:
> >
> > Hi Amir,
> >
> > On Wed, Nov 05 2025, Amir Goldstein wrote:
> >
> > > On Tue, Nov 4, 2025 at 3:52 PM Luis Henriques <luis@igalia.com> wrote:
> >
> > <...>
> >
> > >> > fuse_entry_out was extended once and fuse_reply_entry()
> > >> > sends the size of the struct.
> > >>
> > >> So, if I'm understanding you correctly, you're suggesting to extend
> > >> fuse_entry_out to add the new handle (a 'size' field + the actual handle).
> > >
> > > Well it depends...
> > >
> > > There are several ways to do it.
> > > I would really like to get Miklos and Bernd's opinion on the preferred way.
> >
> > Sure, all feedback is welcome!
> >
> > > So far, it looks like the client determines the size of the output args.
> > >
> > > If we want the server to be able to write a different file handle size
> > > per inode that's going to be a bigger challenge.
> > >
> > > I think it's plenty enough if server and client negotiate a max file handle
> > > size and then the client always reserves enough space in the output
> > > args buffer.
> > >
> > > One more thing to ask is what is "the actual handle".
> > > If "the actual handle" is the variable sized struct file_handle then
> > > the size is already available in the file handle header.
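To illustrate "the size is already available in the file handle header": with a header laid out like Linux's `struct file_handle` (an `unsigned int handle_bytes`, an `int handle_type`, then the opaque bytes), a receiver can work out how many bytes a handle occupies without a separate length field. This is a minimal sketch; `struct fh_hdr`, `fh_wire_len` and `DEMO_MAX_HANDLE_SZ` are hypothetical names, and the 128-byte cap is assumed to match `MAX_HANDLE_SZ`.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_HANDLE_SZ 128 /* assumed equal to the kernel's MAX_HANDLE_SZ */

/* Hypothetical header mirroring the layout of Linux struct file_handle. */
struct fh_hdr {
	unsigned int handle_bytes; /* length of the opaque bytes that follow */
	int handle_type;           /* filesystem-specific encoding type */
};

/*
 * Given a buffer that starts with an fh_hdr, return the total number of
 * bytes the handle occupies (header + opaque bytes), or 0 if the buffer is
 * too short or the advertised size exceeds the cap.
 */
static size_t fh_wire_len(const void *buf, size_t buflen)
{
	struct fh_hdr hdr;

	if (buflen < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr)); /* avoid an unaligned read */
	if (hdr.handle_bytes > DEMO_MAX_HANDLE_SZ)
		return 0;
	if (buflen - sizeof(hdr) < hdr.handle_bytes)
		return 0;
	return sizeof(hdr) + hdr.handle_bytes;
}
```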
> >
> > Actually, this is exactly what I was trying to mimic for my initial
> > attempt.  However, I was not going to do any size negotiation but instead
> > define a maximum size for the handle.  See below.
> >
> > > If it is not, then I think some sort of type or version of the file handle
> > > encoding should be negotiated beyond the max handle size.
> >
> > In my initial stab at this I was going to take a very simple approach and
> > hard-code a maximum size for the handle.  This would have the advantage of
> > allowing the server to use different sizes for different inodes (though
> > I'm not sure how useful that would be in practice).  So, in summary, I
> > would define the new handle like this:
> >
> > /* Same value as MAX_HANDLE_SZ */
> > #define FUSE_MAX_HANDLE_SZ 128
> >
> > struct fuse_file_handle {
> >         uint32_t        size;
> >         uint32_t        padding;
> 
> I think that the handle type is going to be relevant as well.
> 
> >         char            handle[FUSE_MAX_HANDLE_SZ];
> > };
> >
> > and this struct would be included in fuse_entry_out.
> >
> > There's probably a problem with having this (big) fixed size increase to
> > fuse_entry_out, but maybe that could be fixed once I have all the other
> > details sorted out.  Hopefully I'm not oversimplifying the problem,
> > skipping the need for negotiating a handle size.
> >
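A sketch of filling in the proposed fixed-size struct, folding in the handle type Amir mentions by turning the `padding` word into a `type` field. That change and the `fuse_file_handle_set` helper are assumptions for illustration, not part of any existing FUSE header.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Same value as MAX_HANDLE_SZ */
#define FUSE_MAX_HANDLE_SZ 128

/* Proposed layout; 'type' (hypothetical) replaces the original 'padding'. */
struct fuse_file_handle {
	uint32_t size;                 /* valid bytes in handle[] */
	uint32_t type;                 /* server-defined encoding type */
	char handle[FUSE_MAX_HANDLE_SZ];
};

/* The struct has a fixed size: 8 header bytes plus the maximum payload. */
_Static_assert(sizeof(struct fuse_file_handle) == 8 + FUSE_MAX_HANDLE_SZ,
	       "unexpected padding in fuse_file_handle");

/* Copy an opaque handle into the struct and zero the unused tail. */
static int fuse_file_handle_set(struct fuse_file_handle *fh, uint32_t type,
				const void *data, uint32_t len)
{
	if (len > FUSE_MAX_HANDLE_SZ)
		return -EOVERFLOW;
	fh->size = len;
	fh->type = type;
	memcpy(fh->handle, data, len);
	memset(fh->handle + len, 0, FUSE_MAX_HANDLE_SZ - len);
	return 0;
}
```

Zeroing the tail matters with a fixed-size reply: the bytes after `size` are copied to the other side too, so they should not carry stale memory.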
> 
> Maybe this fixed size is reasonable for the first version of the FUSE protocol
> as long as this overhead is NOT added if the server does not opt in to the
> feature.
> 
> IOW, allow the server to negotiate FUSE_MAX_HANDLE_SZ or 0,
> but keep the negotiation protocol extendable to another value later on.
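One possible reading of "negotiate FUSE_MAX_HANDLE_SZ or 0, but keep it extendable": carry a maximum handle size (rather than a boolean flag) in the INIT exchange, and have a first-version kernel accept only the two values it understands. The helper names, the 8-byte size/type header in front of the handle, and the reply-size calculation are assumptions; the base size of `fuse_entry_out` is passed in rather than hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

#define FUSE_MAX_HANDLE_SZ 128
/* Hypothetical header in front of the handle bytes: size + type words. */
#define FUSE_FH_HDR_SZ 8

/*
 * Kernel side of a hypothetical INIT negotiation. The server proposes a
 * maximum handle size; a first-version kernel accepts only 0 (feature off)
 * or FUSE_MAX_HANDLE_SZ and disables the feature for anything else.
 */
static uint32_t negotiate_max_handle_sz(uint32_t server_proposed)
{
	if (server_proposed == FUSE_MAX_HANDLE_SZ)
		return FUSE_MAX_HANDLE_SZ;
	return 0;
}

/*
 * Size of the entry reply buffer the client reserves. Servers that did not
 * opt in pay no overhead.
 */
static size_t entry_reply_size(size_t entry_out_base, uint32_t max_handle_sz)
{
	if (max_handle_sz == 0)
		return entry_out_base;
	return entry_out_base + FUSE_FH_HDR_SZ + max_handle_sz;
}
```

Because the negotiated field is a size, a later kernel could start accepting other values without changing the INIT layout, and in this sketch servers that never opt in keep the reply size they have today.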
> 
> > >> That's probably a good idea.  I was working towards having
> > >> LOOKUP_HANDLE be similar to LOOKUP, but extending it so that it would
> > >> include:
> > >>
> > >>  - An extra inarg: the parent directory handle.  (To be honest, I'm not
> > >>    really sure this would be needed.)
> > >
> > > Yes, I think you need extra inarg.
> > > Why would it not be needed?
> > > The problem is that you cannot know if the parent node id in the lookup
> > > command is stale after server restart.
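A server-side sketch of why the parent handle has to travel with the request: after a restart, the parent nodeid in a LOOKUP_HANDLE request was issued by the previous server instance and may now be unknown or reused for a different object. It assumes a hypothetical in-memory inode table in which a 64-bit `fh_key` stands in for the real file handle; the nodeid is trusted only while it still maps to the same handle, otherwise the parent is resolved by handle or the request fails with ESTALE.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical server-side inode table entry. */
struct srv_inode {
	uint64_t nodeid; /* id handed to the kernel by *this* server instance */
	uint64_t fh_key; /* stand-in for the persistent file handle */
};

/*
 * Resolve the parent of a LOOKUP_HANDLE request. Returns the table index,
 * or -ESTALE if the handle does not identify any known inode.
 */
static long resolve_parent(const struct srv_inode *tbl, size_t n,
			   uint64_t parent_nodeid, uint64_t parent_fh_key)
{
	size_t i;

	/* Fast path: nodeid is known and still refers to the same object. */
	for (i = 0; i < n; i++)
		if (tbl[i].nodeid == parent_nodeid &&
		    tbl[i].fh_key == parent_fh_key)
			return (long)i;

	/*
	 * Slow path (e.g. after a restart): the nodeid is stale or reused, so
	 * go by the handle. A real passthrough server might decode the handle
	 * with open_by_handle_at(2) instead of scanning a table.
	 */
	for (i = 0; i < n; i++)
		if (tbl[i].fh_key == parent_fh_key)
			return (long)i;

	return -ESTALE;
}
```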
> >
> > Ah, of course.  Hence the need for this extra inarg.
> >
> > > The thing is that the kernel fuse inode will need to store the file handle,
> > > much the same as an NFS client stores the file handle provided by the
> > > NFS server.
> > >
> > > FYI, fanotify has an optimized way to store file handles in
> > > struct fanotify_fid_event - small file handles are stored inline
> > > and larger file handles can use an external buffer.
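The inline-or-external scheme can be sketched as a fixed-size member suitable for an inode allocated from a kmem_cache: short handles are stored inline and only longer ones cost a separate allocation. This is loosely modeled on the idea described above, not on fanotify's actual code; the 12-byte inline capacity and every name below are assumptions, and userspace `malloc`/`free` stand in for kernel allocators.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FH_INLINE_LEN 12 /* assumed inline capacity, in bytes */

/* Hypothetical per-inode handle storage with a fixed footprint. */
struct inode_fh {
	uint8_t len; /* handle length in bytes */
	uint8_t ext; /* nonzero if the bytes live in u.ext_buf */
	union {
		unsigned char inline_buf[FH_INLINE_LEN];
		unsigned char *ext_buf;
	} u;
};

/* Store a handle; 'fh' must be empty (zeroed or released) beforehand. */
static int inode_fh_set(struct inode_fh *fh, const void *data, size_t len)
{
	if (len > UINT8_MAX)
		return -EINVAL;
	if (len <= FH_INLINE_LEN) {
		memcpy(fh->u.inline_buf, data, len);
		fh->ext = 0;
	} else {
		fh->u.ext_buf = malloc(len);
		if (!fh->u.ext_buf)
			return -ENOMEM;
		memcpy(fh->u.ext_buf, data, len);
		fh->ext = 1;
	}
	fh->len = (uint8_t)len;
	return 0;
}

static const unsigned char *inode_fh_data(const struct inode_fh *fh)
{
	return fh->ext ? fh->u.ext_buf : fh->u.inline_buf;
}

/* Free any external buffer and reset to the empty state. */
static void inode_fh_release(struct inode_fh *fh)
{
	if (fh->ext)
		free(fh->u.ext_buf);
	fh->ext = 0;
	fh->len = 0;
}
```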
> > >
> > > But fuse does not need to support any size of file handles.
> > > For first version we could definitely simplify things by limiting the size
> > > of supported file handles, because server and client need to negotiate
> > > the max file handle size anyway.
> >
> > I'll definitely need to have a look at how fanotify does that.  But I
> > guess that if my simplistic approach with a static array is acceptable for
> > now, I'll stick with it for the initial attempt to implement this, and
> > eventually revisit it later to do something more clever.
> >
> 
> What you proposed is the extension of fuse_entry_out for the fuse
> protocol.
> 
> My reference to fanotify_fid_event is meant to explain how to encode
> a file handle in fuse_inode in cache, because the fuse_inode_cachep
> cannot have variable sized inodes and in most of the cases, a short
> inline file handle should be enough.
> 
> Therefore, if you limit the support in the first version to something like
> FANOTIFY_INLINE_FH_LEN, you can always store the file handle
> in fuse_inode and postpone support for bigger file handles to later.

I suggest that you also provide a way for the fuse server to tell the
kernel that it can construct its own handles from {fuse_inode::nodeid,
inode::i_generation} if they want something more efficient than
uploading 128-byte blobs.
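For the {nodeid, i_generation} case, the kernel could build a fixed 12-byte handle itself: 8 bytes of nodeid plus the 32-bit generation (`i_generation` is a `u32` in `struct inode`). A minimal sketch of such an encoding, assuming little-endian byte order and that the server can still resolve these values after a restart; the helper names are hypothetical.

```c
#include <stdint.h>

#define NODEID_GEN_FH_LEN 12 /* 8-byte nodeid + 4-byte generation */

/* Pack {nodeid, generation} into a fixed little-endian 12-byte handle. */
static void nodeid_gen_encode(unsigned char out[NODEID_GEN_FH_LEN],
			      uint64_t nodeid, uint32_t gen)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(nodeid >> (8 * i));
	for (int i = 0; i < 4; i++)
		out[8 + i] = (unsigned char)(gen >> (8 * i));
}

/* Inverse of nodeid_gen_encode(). */
static void nodeid_gen_decode(const unsigned char in[NODEID_GEN_FH_LEN],
			      uint64_t *nodeid, uint32_t *gen)
{
	uint64_t n = 0;
	uint32_t g = 0;

	for (int i = 0; i < 8; i++)
		n |= (uint64_t)in[i] << (8 * i);
	for (int i = 0; i < 4; i++)
		g |= (uint32_t)in[8 + i] << (8 * i);
	*nodeid = n;
	*gen = g;
}
```

Twelve bytes is far smaller than the 136-byte fixed struct proposed earlier in the thread and would fit in a short inline buffer in fuse_inode.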

--D

> Thanks,
> Amir.
> 


Thread overview: 46+ messages
2025-07-29 13:56 [RFC] Another take at restarting FUSE servers Luis Henriques
2025-07-29 23:38 ` Darrick J. Wong
2025-07-30 14:04   ` Luis Henriques
2025-07-31 11:33     ` Christian Brauner
2025-07-31 12:23       ` Luis Henriques
2025-07-31 17:29       ` Darrick J. Wong
2025-08-04  8:45         ` Christian Brauner
2025-08-12 19:28           ` Darrick J. Wong
2025-07-31 13:04   ` Theodore Ts'o
2025-07-31 17:38     ` Darrick J. Wong
2025-08-01 10:15       ` Luis Henriques
2025-08-11 15:43         ` Darrick J. Wong
2025-08-13 13:14           ` Luis Henriques
2025-09-12 10:31         ` Bernd Schubert
2025-09-12 11:41           ` Amir Goldstein
2025-09-12 12:29             ` Bernd Schubert
2025-09-12 14:58               ` Darrick J. Wong
2025-09-12 15:20                 ` Bernd Schubert
2025-09-15  4:43                   ` Darrick J. Wong
2025-09-15  7:07                 ` Amir Goldstein
2025-09-15  8:27                   ` Bernd Schubert
2025-09-15  8:41                     ` Amir Goldstein
2025-09-16  2:53                       ` Darrick J. Wong
2025-09-16  7:59                         ` Amir Goldstein
2025-09-18 17:50                           ` Darrick J. Wong
2025-11-04 11:40                           ` Luis Henriques
2025-11-04 13:10                             ` Amir Goldstein
2025-11-04 14:52                               ` Luis Henriques
2025-11-05 10:21                                 ` Amir Goldstein
2025-11-05 11:50                                   ` Luis Henriques
2025-11-05 15:30                                     ` Amir Goldstein
2025-11-05 21:38                                       ` Darrick J. Wong [this message]
2025-11-05 21:46                                         ` Bernd Schubert
2025-11-05 22:06                                           ` Bernd Schubert
2025-11-05 22:24                               ` Bernd Schubert
2025-11-05 22:42                                 ` Darrick J. Wong
2025-11-05 22:48                                   ` Bernd Schubert
2025-11-06  0:21                                     ` Darrick J. Wong
2025-11-06 10:13                                     ` Amir Goldstein
2025-11-06 15:12                                       ` Luis Henriques
2025-11-06 15:58                                         ` Luis Henriques
2025-11-06 15:49                                       ` Darrick J. Wong
2025-11-06 16:08                                         ` Stef Bon
2025-11-07  9:25                                           ` Luis Henriques
2025-11-10  8:20                                             ` Stef Bon
2025-11-06 16:11                                         ` Amir Goldstein
