From: Bernd Schubert <bernd@bsbernd.com>
To: Bernd Schubert <bschubert@ddn.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Amir Goldstein <amir73il@gmail.com>
Cc: Luis Henriques <luis@igalia.com>, Theodore Ts'o <tytso@mit.edu>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Kevin Chen <kchen@ddn.com>, Matt Harvey <mharvey@jumptrading.com>
Subject: Re: [RFC] Another take at restarting FUSE servers
Date: Wed, 5 Nov 2025 23:06:09 +0100
Message-ID: <ed5084e4-af1c-4185-b66f-2b42d56d37a3@bsbernd.com>
In-Reply-To: <cb7c4237-74b4-4220-90f7-caf59d673bc4@ddn.com>
On 11/5/25 22:46, Bernd Schubert wrote:
>
>
> On 11/5/25 22:38, Darrick J. Wong wrote:
>> On Wed, Nov 05, 2025 at 04:30:51PM +0100, Amir Goldstein wrote:
>>> On Wed, Nov 5, 2025 at 12:50 PM Luis Henriques <luis@igalia.com> wrote:
>>>>
>>>> Hi Amir,
>>>>
>>>> On Wed, Nov 05 2025, Amir Goldstein wrote:
>>>>
>>>>> On Tue, Nov 4, 2025 at 3:52 PM Luis Henriques <luis@igalia.com> wrote:
>>>>
>>>> <...>
>>>>
>>>>>>> fuse_entry_out was extended once and fuse_reply_entry()
>>>>>>> sends the size of the struct.
>>>>>>
>>>>>> So, if I'm understanding you correctly, you're suggesting to extend
>>>>>> fuse_entry_out to add the new handle (a 'size' field + the actual handle).
>>>>>
>>>>> Well it depends...
>>>>>
>>>>> There are several ways to do it.
>>>>> I would really like to get Miklos and Bernd's opinion on the preferred way.
>>>>
>>>> Sure, all feedback is welcome!
>>>>
>>>>> So far, it looks like the client determines the size of the output args.
>>>>>
>>>>> If we want the server to be able to write a different file handle size
>>>>> per inode that's going to be a bigger challenge.
>>>>>
>>>>> I think it's plenty enough if server and client negotiate a max file handle
>>>>> size and then the client always reserves enough space in the output
>>>>> args buffer.
>>>>>
>>>>> One more thing to ask is what is "the actual handle".
>>>>> If "the actual handle" is the variable sized struct file_handle then
>>>>> the size is already available in the file handle header.
>>>>
>>>> Actually, this is exactly what I was trying to mimic for my initial
>>>> attempt. However, I was not going to do any size negotiation but instead
>>>> define a maximum size for the handle. See below.
>>>>
>>>>> If it is not, then I think some sort of type or version of the file handles
>>>>> encoding should be negotiated beyond the max handle size.
>>>>
>>>> In my initial stab at this I was going to take a very simple approach and
>>>> hard-code a maximum size for the handle. This would have the advantage of
>>>> allowing the server to use different sizes for different inodes (though
>>>> I'm not sure how useful that would be in practice). So, in summary, I
>>>> would define the new handle like this:
>>>>
>>>> /* Same value as MAX_HANDLE_SZ */
>>>> #define FUSE_MAX_HANDLE_SZ 128
>>>>
>>>> struct fuse_file_handle {
>>>> 	uint32_t	size;
>>>> 	uint32_t	padding;
>>>
>>> I think that the handle type is going to be relevant as well.
>>>
>>>> 	char		handle[FUSE_MAX_HANDLE_SZ];
>>>> };
>>>>
>>>> and this struct would be included in fuse_entry_out.
>>>>
>>>> There's probably a problem with having this (big) fixed size increase to
>>>> fuse_entry_out, but maybe that could be fixed once I have all the other
>>>> details sorted out. Hopefully I'm not oversimplifying the problem,
>>>> skipping the need for negotiating a handle size.
>>>>
>>>
>>> Maybe this fixed size is reasonable for the first version of FUSE protocol
>>> as long as this overhead is NOT added if the server does not opt-in for the
>>> feature.
>>>
>>> IOW, allow the server to negotiate FUSE_MAX_HANDLE_SZ or 0,
>>> but keep the negotiation protocol extendable to another value later on.
>>>
>>>>>> That's probably a good idea. I was working towards having the
>>>>>> LOOKUP_HANDLE to be similar to LOOKUP, but extending it so that it would
>>>>>> include:
>>>>>>
>>>>>> - An extra inarg: the parent directory handle. (To be honest, I'm not
>>>>>> really sure this would be needed.)
>>>>>
>>>>> Yes, I think you need extra inarg.
>>>>> Why would it not be needed?
>>>>> The problem is that you cannot know if the parent node id in the lookup
>>>>> command is stale after server restart.
>>>>
>>>> Ah, of course. Hence the need for this extra inarg.
>>>>
>>>>> The thing is that the kernel fuse inode will need to store the file handle,
>>>>> much the same as an NFS client stores the file handle provided by the
>>>>> NFS server.
>>>>>
>>>>> FYI, fanotify has an optimized way to store file handles in
>>>>> struct fanotify_fid_event - small file handles are stored inline
>>>>> and larger file handles can use an external buffer.
>>>>>
>>>>> But fuse does not need to support any size of file handles.
>>>>> For first version we could definitely simplify things by limiting the size
>>>>> of supported file handles, because server and client need to negotiate
>>>>> the max file handle size anyway.
>>>>
>>>> I'll definitely need to have a look at how fanotify does that. But I
>>>> guess that if my simplistic approach with a static array is acceptable for
>>>> now, I'll stick with it for the initial attempt to implement this, and
>>>> eventually revisit it later to do something more clever.
>>>>
>>>
>>> What you proposed is the extension of fuse_entry_out for fuse
>>> protocol.
>>>
>>> My reference to fanotify_fid_event is meant to explain how to encode
>>> a file handle in fuse_inode in cache, because the fuse_inode_cachep
>>> cannot have variable sized inodes and in most of the cases, a short
>>> inline file handle should be enough.
>>>
>>> Therefore, if you limit the support in the first version to something like
>>> FANOTIFY_INLINE_FH_LEN, you can always store the file handle
>>> in fuse_inode and postpone support for bigger file handles to later.
>>
>> I suggest that you also provide a way for the fuse server to tell the
>> kernel that it can construct its own handles from {fuse_inode::nodeid,
>> inode::i_generation} if they want something more efficient than
>> uploading 128b blobs.
>
> Isn't that covered by handle size defined in FUSE_INIT reply? I.e.
> handle size would be 0B in this case?

Sorry, my fault. Yeah, this needs a special flag.
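
As a rough illustration (not from any posted patch), the distinction could
look like this in C. All names here (SKETCH_MAX_HANDLE_SZ,
SKETCH_HANDLE_FROM_NODEID, sketch_pick_handle_mode()) are hypothetical, and
treating "flag set plus non-zero size" as invalid is an assumption of this
sketch, not something agreed in the thread. It only shows one reading of why
a size of 0 alone cannot tell "server did not opt in" apart from "kernel
should construct handles from nodeid + generation".

```c
#include <stdint.h>

/* Hypothetical names; none of these exist in the FUSE uapi header. */
#define SKETCH_MAX_HANDLE_SZ		128u		/* same value as MAX_HANDLE_SZ */
#define SKETCH_HANDLE_FROM_NODEID	(1u << 0)	/* hypothetical init flag */

enum sketch_handle_mode {
	SKETCH_HANDLE_NONE,		/* server did not opt in */
	SKETCH_HANDLE_OPAQUE,		/* server uploads handle blobs */
	SKETCH_HANDLE_NODEID_GEN,	/* kernel builds {nodeid, generation} */
	SKETCH_HANDLE_INVALID,		/* contradictory or oversized reply */
};

/*
 * Decide how handles are obtained from the two values a server would
 * return during init in this sketch: a max handle size and a flags word.
 */
enum sketch_handle_mode
sketch_pick_handle_mode(uint32_t max_handle_sz, uint32_t flags)
{
	if (max_handle_sz > SKETCH_MAX_HANDLE_SZ)
		return SKETCH_HANDLE_INVALID;

	if (flags & SKETCH_HANDLE_FROM_NODEID)
		/* Assumption: the flag is only valid with a size of 0. */
		return max_handle_sz == 0 ? SKETCH_HANDLE_NODEID_GEN
					  : SKETCH_HANDLE_INVALID;

	/* Without the flag, size 0 keeps its plain "feature off" meaning. */
	return max_handle_sz ? SKETCH_HANDLE_OPAQUE : SKETCH_HANDLE_NONE;
}
```

With this split, sketch_pick_handle_mode(0, 0) stays "feature off", while
sketch_pick_handle_mode(0, SKETCH_HANDLE_FROM_NODEID) selects the
nodeid/generation mode without any handle payload on the wire.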