Date: Wed, 5 Nov 2025 13:38:55 -0800
From: "Darrick J. Wong"
To: Amir Goldstein
Cc: Luis Henriques, Bernd Schubert, Theodore Ts'o, Miklos Szeredi,
	Bernd Schubert, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Kevin Chen, Matt Harvey
Subject: Re: [RFC] Another take at restarting FUSE servers
Message-ID: <20251105213855.GL196362@frogsfrogsfrogs>
References: <2e1db15f-b2b1-487f-9f42-44dc7480b2e2@bsbernd.com>
	<20250916025341.GO1587915@frogsfrogsfrogs>
	<87ldkm6n5o.fsf@wotan.olymp>
	<87cy5x7sud.fsf@wotan.olymp>
	<87ecqcpujw.fsf@wotan.olymp>

On Wed, Nov 05, 2025 at 04:30:51PM +0100, Amir Goldstein wrote:
> On Wed, Nov 5, 2025 at 12:50 PM Luis Henriques wrote:
> >
> > Hi Amir,
> >
> > On Wed, Nov 05 2025, Amir Goldstein wrote:
> >
> > > On Tue, Nov 4, 2025 at 3:52 PM Luis Henriques wrote:
> >
> > <...>
> >
> > >> > fuse_entry_out was extended once and fuse_reply_entry()
> > >> > sends the size of the struct.
> > >>
> > >> So, if I'm understanding you correctly, you're suggesting to extend
> > >> fuse_entry_out to add the new handle (a 'size' field + the actual handle).
> > >
> > > Well it depends...
> > >
> > > There are several ways to do it.
> > > I would really like to get Miklos and Bernd's opinion on the preferred way.
> >
> > Sure, all feedback is welcome!
> >
> > > So far, it looks like the client determines the size of the output args.
> > >
> > > If we want the server to be able to write a different file handle size
> > > per inode that's going to be a bigger challenge.
> > >
> > > I think it's plenty enough if server and client negotiate a max file handle
> > > size and then the client always reserves enough space in the output
> > > args buffer.
> > >
> > > One more thing to ask is what is "the actual handle".
> > > If "the actual handle" is the variable sized struct file_handle then
> > > the size is already available in the file handle header.
> >
> > Actually, this is exactly what I was trying to mimic for my initial
> > attempt. However, I was not going to do any size negotiation but instead
> > define a maximum size for the handle. See below.
> >
> > > If it is not, then I think some sort of type or version of the file handles
> > > encoding should be negotiated beyond the max handle size.
> >
> > In my initial stab at this I was going to take a very simple approach and
> > hard-code a maximum size for the handle. This would have the advantage of
> > allowing the server to use different sizes for different inodes (though
> > I'm not sure how useful that would be in practice). So, in summary, I
> > would define the new handle like this:
> >
> > /* Same value as MAX_HANDLE_SZ */
> > #define FUSE_MAX_HANDLE_SZ 128
> >
> > struct fuse_file_handle {
> > 	uint32_t	size;
> > 	uint32_t	padding;
>
> I think that the handle type is going to be relevant as well.
>
> > 	char		handle[FUSE_MAX_HANDLE_SZ];
> > };
> >
> > and this struct would be included in fuse_entry_out.
> >
> > There's probably a problem with having this (big) fixed size increase to
> > fuse_entry_out, but maybe that could be fixed once I have all the other
> > details sorted out. Hopefully I'm not oversimplifying the problem,
> > skipping the need for negotiating a handle size.
> >
>
> Maybe this fixed size is reasonable for the first version of FUSE protocol
> as long as this overhead is NOT added if the server does not opt-in for the
> feature.
>
> IOW, allow the server to negotiate FUSE_MAX_HANDLE_SZ or 0,
> but keep the negotiation protocol extendable to another value later on.
>
> > >> That's probably a good idea. I was working towards having the
> > >> LOOKUP_HANDLE to be similar to LOOKUP, but extending it so that it would
> > >> include:
> > >>
> > >>  - An extra inarg: the parent directory handle. (To be honest, I'm not
> > >>    really sure this would be needed.)
> > >
> > > Yes, I think you need extra inarg.
> > > Why would it not be needed?
> > > The problem is that you cannot know if the parent node id in the lookup
> > > command is stale after server restart.
> >
> > Ah, of course. Hence the need for this extra inarg.
> >
> > > The thing is that the kernel fuse inode will need to store the file handle,
> > > much the same as an NFS client stores the file handle provided by the
> > > NFS server.
> > >
> > > FYI, fanotify has an optimized way to store file handles in
> > > struct fanotify_fid_event - small file handles are stored inline
> > > and larger file handles can use an external buffer.
> > >
> > > But fuse does not need to support any size of file handles.
> > > For first version we could definitely simplify things by limiting the size
> > > of supported file handles, because server and client need to negotiate
> > > the max file handle size anyway.
> >
> > I'll definitely need to have a look at how fanotify does that. But I
> > guess that if my simplistic approach with a static array is acceptable for
> > now, I'll stick with it for the initial attempt to implement this, and
> > eventually revisit it later to do something more clever.
> >
>
> What you proposed is the extension of fuse_entry_out for fuse
> protocol.
>
> My reference to fanotify_fid_event is meant to explain how to encode
> a file handle in fuse_inode in cache, because the fuse_inode_cachep
> cannot have variable sized inodes and in most of the cases, a short
> inline file handle should be enough.
>
> Therefore, if you limit the support in the first version to something like
> FANOTIFY_INLINE_FH_LEN, you can always store the file handle
> in fuse_inode and postpone support for bigger file handles to later.

I suggest that you also provide a way for the fuse server to tell the
kernel that it can construct its own handles from {fuse_inode::nodeid,
inode::i_generation} if the server wants something more efficient than
uploading 128-byte blobs.

--D

> Thanks,
> Amir.
>
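
To make the {nodeid, generation} suggestion a bit more concrete, here is a
minimal userspace sketch in C. It is not taken from any patch in this thread
and is not an existing FUSE ABI. The example_ names and the handle type value
are hypothetical. The "type" field stands in for the "padding" word of the
struct quoted above, on the assumption (following Amir's remark) that a handle
type will be wanted. The 32-bit generation and the host-byte-order packing are
also assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From the thread: same value as MAX_HANDLE_SZ. */
#define FUSE_MAX_HANDLE_SZ 128

/* Hypothetical type tag for a {nodeid, generation} handle (assumption). */
#define EXAMPLE_FH_TYPE_NODEID_GEN 1

/*
 * Layout quoted in the thread, with the padding word reused as a
 * handle type. Hypothetical: no such struct exists in the FUSE ABI.
 */
struct example_fuse_file_handle {
	uint32_t size;                     /* bytes of handle[] in use */
	uint32_t type;                     /* hypothetical handle type */
	char     handle[FUSE_MAX_HANDLE_SZ];
};

/* Two 32-bit words plus the array; no hidden padding is expected. */
static_assert(sizeof(struct example_fuse_file_handle) ==
	      8 + FUSE_MAX_HANDLE_SZ, "unexpected struct padding");

/* Pack {nodeid, generation} into the handle, in host byte order. */
static void example_encode_nodeid_gen(struct example_fuse_file_handle *fh,
				      uint64_t nodeid, uint32_t generation)
{
	memset(fh, 0, sizeof(*fh));
	fh->type = EXAMPLE_FH_TYPE_NODEID_GEN;
	fh->size = sizeof(nodeid) + sizeof(generation);
	memcpy(fh->handle, &nodeid, sizeof(nodeid));
	memcpy(fh->handle + sizeof(nodeid), &generation, sizeof(generation));
}

/* Unpack a handle; returns 0 on success, -1 on a type or size mismatch. */
static int example_decode_nodeid_gen(const struct example_fuse_file_handle *fh,
				     uint64_t *nodeid, uint32_t *generation)
{
	if (fh->type != EXAMPLE_FH_TYPE_NODEID_GEN ||
	    fh->size != sizeof(*nodeid) + sizeof(*generation))
		return -1;
	memcpy(nodeid, fh->handle, sizeof(*nodeid));
	memcpy(generation, fh->handle + sizeof(*nodeid), sizeof(*generation));
	return 0;
}
```

With this encoding only 12 bytes of the array are meaningful, and both values
are ones the kernel already has, which is why such a handle would not need to
be uploaded by the server at all. A decoder can also reject any handle whose
type or size it does not recognise.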