From: "Darrick J. Wong" <djwong@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Luis Henriques <luis@igalia.com>,
Bernd Schubert <bernd@bsbernd.com>, Theodore Ts'o <tytso@mit.edu>,
Miklos Szeredi <miklos@szeredi.hu>,
Bernd Schubert <bschubert@ddn.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Kevin Chen <kchen@ddn.com>, Matt Harvey <mharvey@jumptrading.com>
Subject: Re: [RFC] Another take at restarting FUSE servers
Date: Wed, 5 Nov 2025 13:38:55 -0800
Message-ID: <20251105213855.GL196362@frogsfrogsfrogs>
In-Reply-To: <CAOQ4uxg+w5LHnVbYGLc_pq+zfAw5UXbfo0M2=dxFGKLmBvJ+5Q@mail.gmail.com>
On Wed, Nov 05, 2025 at 04:30:51PM +0100, Amir Goldstein wrote:
> On Wed, Nov 5, 2025 at 12:50 PM Luis Henriques <luis@igalia.com> wrote:
> >
> > Hi Amir,
> >
> > On Wed, Nov 05 2025, Amir Goldstein wrote:
> >
> > > On Tue, Nov 4, 2025 at 3:52 PM Luis Henriques <luis@igalia.com> wrote:
> >
> > <...>
> >
> > >> > fuse_entry_out was extended once and fuse_reply_entry()
> > >> > sends the size of the struct.
> > >>
> > >> So, if I'm understanding you correctly, you're suggesting to extend
> > >> fuse_entry_out to add the new handle (a 'size' field + the actual handle).
> > >
> > > Well it depends...
> > >
> > > There are several ways to do it.
> > > I would really like to get Miklos and Bernd's opinion on the preferred way.
> >
> > Sure, all feedback is welcome!
> >
> > > So far, it looks like the client determines the size of the output args.
> > >
> > > If we want the server to be able to write a different file handle size
> > > per inode that's going to be a bigger challenge.
> > >
> > > I think it's plenty enough if server and client negotiate a max file handle
> > > size and then the client always reserves enough space in the output
> > > args buffer.
> > >
> > > One more thing to ask is what is "the actual handle".
> > > If "the actual handle" is the variable sized struct file_handle then
> > > the size is already available in the file handle header.
> >
> > Actually, this is exactly what I was trying to mimic for my initial
> > attempt. However, I was not going to do any size negotiation but instead
> > define a maximum size for the handle. See below.
> >
> > > If it is not, then I think some sort of type or version of the file handles
> > > encoding should be negotiated beyond the max handle size.
> >
> > In my initial stab at this I was going to take a very simple approach and
> > hard-code a maximum size for the handle. This would have the advantage of
> > allowing the server to use different sizes for different inodes (though
> > I'm not sure how useful that would be in practice). So, in summary, I
> > would define the new handle like this:
> >
> > /* Same value as MAX_HANDLE_SZ */
> > #define FUSE_MAX_HANDLE_SZ 128
> >
> > struct fuse_file_handle {
> > 	uint32_t size;
> > 	uint32_t padding;
>
> I think that the handle type is going to be relevant as well.
>
> > 	char handle[FUSE_MAX_HANDLE_SZ];
> > };
> >
> > and this struct would be included in fuse_entry_out.
> >
> > There's probably a problem with having this (big) fixed size increase to
> > fuse_entry_out, but maybe that could be fixed once I have all the other
> > details sorted out. Hopefully I'm not oversimplifying the problem,
> > skipping the need for negotiating a handle size.
> >
>
> Maybe this fixed size is reasonable for the first version of FUSE protocol
> as long as this overhead is NOT added if the server does not opt-in for the
> feature.
>
> IOW, allow the server to negotiate FUSE_MAX_HANDLE_SZ or 0,
> but keep the negotiation protocol extendable to another value later on.
>
> > >> That's probably a good idea. I was working towards having the
> > >> LOOKUP_HANDLE to be similar to LOOKUP, but extending it so that it would
> > >> include:
> > >>
> > >> - An extra inarg: the parent directory handle. (To be honest, I'm not
> > >> really sure this would be needed.)
> > >
> > > Yes, I think you need extra inarg.
> > > Why would it not be needed?
> > > The problem is that you cannot know if the parent node id in the lookup
> > > command is stale after server restart.
> >
> > Ah, of course. Hence the need for this extra inarg.
> >
> > > The thing is that the kernel fuse inode will need to store the file handle,
> > > much the same as an NFS client stores the file handle provided by the
> > > NFS server.
> > >
> > > FYI, fanotify has an optimized way to store file handles in
> > > struct fanotify_fid_event - small file handles are stored inline
> > > and larger file handles can use an external buffer.
> > >
> > > But fuse does not need to support any size of file handles.
> > > For first version we could definitely simplify things by limiting the size
> > > of supported file handles, because server and client need to negotiate
> > > the max file handle size anyway.
> >
> > I'll definitely need to have a look at how fanotify does that. But I
> > guess that if my simplistic approach with a static array is acceptable for
> > now, I'll stick with it for the initial attempt to implement this, and
> > eventually revisit it later to do something more clever.
> >
>
> What you proposed is the extension of fuse_entry_out for fuse
> protocol.
>
> My reference to fanotify_fid_event is meant to explain how to encode
> a file handle in fuse_inode in cache, because the fuse_inode_cachep
> cannot have variable sized inodes and in most of the cases, a short
> inline file handle should be enough.
>
> Therefore, if you limit the support in the first version to something like
> FANOTIFY_INLINE_FH_LEN, you can always store the file handle
> in fuse_inode and postpone support for bigger file handles to later.
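
To make the inline-storage idea above concrete, here is a rough userspace sketch of a fixed-size handle slot that rejects anything larger than the cap. All names (EXAMPLE_INLINE_FH_LEN, struct example_inode_fh, example_fh_store) are hypothetical, and the cap of 16 bytes is only illustrative; it is not the value of FANOTIFY_INLINE_FH_LEN and not a proposal for the real limit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative cap only; NOT the real FANOTIFY_INLINE_FH_LEN value. */
#define EXAMPLE_INLINE_FH_LEN 16

/* The length is stored in one byte, so the cap has to fit in it. */
static_assert(EXAMPLE_INLINE_FH_LEN <= UINT8_MAX, "cap must fit in uint8_t");

/* Hypothetical fixed-size slot that could live inside a cached inode. */
struct example_inode_fh {
	uint8_t len;                        /* bytes of buf[] in use */
	uint8_t buf[EXAMPLE_INLINE_FH_LEN]; /* opaque handle bytes */
};

/*
 * Copy a handle into the inline slot.
 * Returns 0 on success, -1 if the handle does not fit; on failure the
 * slot is left untouched, so the caller can refuse the handle outright.
 */
static int example_fh_store(struct example_inode_fh *dst,
			    const void *handle, size_t len)
{
	if (len > EXAMPLE_INLINE_FH_LEN)
		return -1;

	memset(dst->buf, 0, sizeof(dst->buf));
	memcpy(dst->buf, handle, len);
	dst->len = (uint8_t)len;
	return 0;
}
```

With a fixed slot like this, the inode cache object keeps a constant size, and support for larger handles (for example via an external buffer) can be added later without changing the small-handle path.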
I suggest that you also provide a way for the fuse server to tell the
kernel that the kernel can construct its own handles from
{fuse_inode::nodeid, inode::i_generation} if the server wants something
more efficient than uploading 128-byte blobs.
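
As a rough illustration of what such a compact handle might look like (a sketch only; struct compact_fh and both helpers are hypothetical names and not part of any existing FUSE ABI), the two values fit in 16 bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact handle built from values the kernel already has. */
struct compact_fh {
	uint64_t nodeid;     /* mirrors fuse_inode::nodeid */
	uint32_t generation; /* mirrors inode::i_generation */
	uint32_t padding;    /* always zero; keeps the layout 8-byte aligned */
};

static_assert(sizeof(struct compact_fh) == 16,
	      "compact handle is expected to be 16 bytes");

/* Build a handle; memset() keeps the padding deterministic. */
static struct compact_fh compact_fh_encode(uint64_t nodeid, uint32_t generation)
{
	struct compact_fh fh;

	memset(&fh, 0, sizeof(fh));
	fh.nodeid = nodeid;
	fh.generation = generation;
	return fh;
}

/* Two handles name the same object only if both fields match. */
static int compact_fh_equal(const struct compact_fh *a,
			    const struct compact_fh *b)
{
	return a->nodeid == b->nodeid && a->generation == b->generation;
}
```

The generation check is what would let a restarted server reject a handle whose nodeid has since been reused for a different file.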
--D
> Thanks,
> Amir.
>