From: "Darrick J. Wong" <djwong@kernel.org>
To: John Groves <John@groves.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Dan Williams <dan.j.williams@intel.com>,
Bernd Schubert <bschubert@ddn.com>,
John Groves <jgroves@micron.com>,
Jonathan Corbet <corbet@lwn.net>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Luis Henriques <luis@igalia.com>,
Randy Dunlap <rdunlap@infradead.org>,
Jeff Layton <jlayton@kernel.org>,
Kent Overstreet <kent.overstreet@linux.dev>,
Petr Vorel <pvorel@suse.cz>, Brian Foster <bfoster@redhat.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Amir Goldstein <amir73il@gmail.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Stefan Hajnoczi <shajnocz@redhat.com>,
Joanne Koong <joannelkoong@gmail.com>,
Josef Bacik <josef@toxicpanda.com>,
Aravind Ramesh <arramesh@micron.com>,
Ajay Joshi <ajayjoshi@micron.com>
Subject: Re: [RFC PATCH 13/19] famfs_fuse: Create files with famfs fmaps
Date: Mon, 12 May 2025 21:03:21 -0700
Message-ID: <20250513040321.GO1035866@frogsfrogsfrogs>
In-Reply-To: <aytnzv4tmp7fdvpgxdfoe2ncu7qaxlp2svsxiskfnrvdnknhmp@uu4ifgc6aj34>

On Mon, May 12, 2025 at 02:51:45PM -0500, John Groves wrote:
> On 25/05/06 06:56PM, Miklos Szeredi wrote:
> > On Mon, 28 Apr 2025 at 21:00, Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > > <nod> I don't know what Miklos' opinion is about having multiple
> > > fusecmds that do similar things -- on the one hand keeping yours and my
> > > efforts separate explodes the amount of userspace abi that everyone must
> > > maintain, but on the other hand it then doesn't couple our projects
> > > together, which might be a good thing if it turns out that our domain
> > > models are /really/ actually quite different.
> >
> > Sharing the interface at least would definitely be worthwhile, as
> > there does not seem to be a great deal of difference between the
> > generic one and the famfs specific one. Only implementing part of the
> > functionality that the generic one provides would be fine.
>
> Agreed. I'm coming around to thinking the most practical approach would be
> to share the GET_FMAP message/response, but to add a separate response
> format for Darrick's use case - when the time comes. In this patch set,
> that starts with 'struct fuse_famfs_fmap_header' and is followed by the
> appropriate extent structures, serialized in the message. Collectively
> that's an fmap in message format.

Well, in that case I might as well just plumb in the pieces I need as
separate fuse commands.  fuse_args::opcode is a u32, so there's plenty of
space left.

> Side note: the current patch set sends back the logically-variable-sized
> fmap in a fixed-size message, but V2 of the series will address that;
> I got some help from Bernd there, but haven't finished it yet.
>
> So the next version of the patch set would, say, add a more generic first
> 'struct fmap_header' that would indicate whether the next item would be
> 'struct fuse_famfs_fmap_header' (i.e. my/famfs metadata) or some other
> to be codified metadata format. I'm going here because I'm dubious that
> we even *can* do grand-unified-fmap-metadata (or that we should try).
>
> This will require versioning the affected structures, unless we think
> the fmap-in-message structure can be opaque to the rest of fuse. @miklos,
> is there an example to follow regarding struct versioning in
> already-existing fuse structures?

/me is a n00b, but isn't that a simple matter of making sure that new
revisions change the structure size, and then you can key off of that?

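For readers who want the size-keyed idea spelled out, here is a minimal
sketch in C.  It assumes two made-up revisions of a fixed-size header;
the struct names, fields, and the function demo_fmap_decode() are
hypothetical and do not come from the famfs patch set or from fs/fuse/.
It also only works for the fixed-size part of a message: a variable-length
payload (header plus N extents) would presumably need an explicit header
length or version field instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical revision 1 of a fixed-size header (not the real ABI). */
struct demo_fmap_v1 {
	uint64_t file_size;
	uint32_t nextents;
	uint32_t reserved;
};

/* Hypothetical revision 2: same prefix, plus one new trailing field. */
struct demo_fmap_v2 {
	uint64_t file_size;
	uint32_t nextents;
	uint32_t reserved;
	uint64_t flags;		/* new in v2, so sizeof() changes */
};

/* The scheme only works if every revision has a distinct size. */
static_assert(sizeof(struct demo_fmap_v1) != sizeof(struct demo_fmap_v2),
	      "revisions must differ in size");

/*
 * Decode a received buffer into the newest layout, keying off its length.
 * Returns the detected revision (1 or 2), or 0 if the length matches no
 * known revision.  Fields missing from an older revision are left zeroed.
 */
int demo_fmap_decode(const void *buf, size_t len, struct demo_fmap_v2 *out)
{
	memset(out, 0, sizeof(*out));
	if (len == sizeof(struct demo_fmap_v2)) {
		memcpy(out, buf, sizeof(struct demo_fmap_v2));
		return 2;
	}
	if (len == sizeof(struct demo_fmap_v1)) {
		/* v1 is a strict prefix of v2, so a prefix copy is enough. */
		memcpy(out, buf, sizeof(struct demo_fmap_v1));
		return 1;
	}
	return 0;
}
```

A receiver would call demo_fmap_decode() with the payload length taken
from the message and reject the message when it returns 0.
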
> > > (Especially because I suspect that interleaving is the norm for memory,
> > > whereas we try to avoid that for disk filesystems.)
> >
> > So interleaved extents are just like normal ones except they repeat,
> > right? What about adding a special "repeat last N extent
> > descriptions" type of extent?
>
> It's a bit more than that. The comment at [1] makes it possible to understand
> the scheme, but I'd be happy to talk through it with you on a call if that
> seems helpful.
>
> An interleaved extent stripes data spread across N memory devices in raid 0
> format; the space from each device is described by a single simple extent
> (so it's contiguous), but it's not consumed contiguously - it's consumed in
> fixed-sized chunks that precess across the devices. Notwithstanding that I
> couldn't explain it very well when we talked about it at LPC, I think I
> could make it pretty clear in a pretty brief call now.
>
> In any case, you have my word that it's actually quite elegant :D
> (seriously, but also with a smile...)

Admittedly the more I think about the interleaving in famfs vs straight
block mappings for disk filesystems, the more I think they ought to be
separate interfaces for code that solves different problems.  Then both
our codebases will remain relatively cohesive.

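As an illustration of the striping John describes above (N strips, each a
contiguous extent on one device, consumed in fixed-size chunks that rotate
across the devices), here is a minimal C sketch of a generic RAID 0 style
offset mapping.  It is not taken from famfs; the authoritative description
is the comment at [1].  The names demo_strip, demo_target, and
demo_interleave_map() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One strip: a contiguous range on a single device (hypothetical layout). */
struct demo_strip {
	uint32_t dev_index;	/* which device backs this strip */
	uint64_t dev_offset;	/* start of the strip on that device */
	uint64_t len;		/* length of the strip in bytes */
};

/* Result of a lookup: a device and a byte offset on it. */
struct demo_target {
	uint32_t dev_index;
	uint64_t dev_offset;
};

/*
 * Map a file offset onto (device, device offset) for an interleaved extent.
 * Chunks are handed out round-robin: chunk 0 goes to strip 0, chunk 1 to
 * strip 1, and so on, wrapping back to strip 0 after nstrips chunks.
 * Returns 0 on success, -1 if the arguments are invalid or the offset
 * falls beyond the end of the chosen strip.
 */
int demo_interleave_map(const struct demo_strip *strips, uint64_t nstrips,
			uint64_t chunk_size, uint64_t file_off,
			struct demo_target *out)
{
	assert(strips != NULL && out != NULL);
	if (nstrips == 0 || chunk_size == 0)
		return -1;

	uint64_t chunk = file_off / chunk_size;	/* global chunk number */
	uint64_t which = chunk % nstrips;	/* strip holding that chunk */
	uint64_t row = chunk / nstrips;		/* chunk number within strip */
	uint64_t in_strip = row * chunk_size + file_off % chunk_size;

	if (in_strip >= strips[which].len)
		return -1;

	out->dev_index = strips[which].dev_index;
	out->dev_offset = strips[which].dev_offset + in_strip;
	return 0;
}
```

With two strips and a 4 KiB chunk, file offsets 0x0000 and 0x2000 land on
the first strip and 0x1000 and 0x3000 land on the second, which is the
"not consumed contiguously" behavior described in the quoted text.
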
> > > > But the current implementation does not contemplate partially cached fmaps.
> > > >
> > > > Adding notification could address revoking them post-haste (is that why
> > > > you're thinking about notifications? And if not can you elaborate on what
> > > > you're after there?).
> > >
> > > Yeah, invalidating the mapping cache at random places. If, say, you
> > > implement a clustered filesystem with iomap, the metadata server could
> > > inform the fuse server on the local node that a certain range of inode X
> > > has been written to, at which point you need to revoke any local leases,
> > > invalidate the pagecache, and invalidate the iomapping cache to force
> > > the client to requery the server.
> > >
> > > Or if your fuse server wants to implement its own weird operations (e.g.
> > > XFS EXCHANGE-RANGE) this would make that possible without needing to
> > > add a bunch of code to fs/fuse/ for the benefit of a single fuse driver.
> >
> > Wouldn't existing invalidation framework be sufficient?
> >
> > Thanks,
> > Miklos
>
> My current thinking is that Darrick's use case doesn't need GET_DAXDEV, but
> famfs does. I think Darrick's use case has one backing device, and that should
> be passed in at mount time. Correct me if you think that might be wrong.

Technically speaking, iomap can operate on /any/ block or dax device as
long as you have a reference to it.  Once I get more of the plumbing
sorted out I'll start thinking about how to handle multi-device
filesystems like XFS, which can put file data on more than one block
device.

I was thinking that the fuse server could just send a REGISTER_DEVICE
notification to the fuse driver (I know, again with the notifications
:)), the kernel replies with a magic cookie, and that's what gets passed
in the {read,write,map}_dev field.

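To illustrate the cookie idea only, here is a small C sketch of a
registration table that hands out opaque nonzero cookies and resolves them
later.  Everything in it is an assumption made for illustration:
REGISTER_DEVICE does not exist in fuse as of this thread, and
demo_dev_table, demo_register_device(), and demo_lookup_device() are
hypothetical names.  A plain integer stands in for whatever device
reference the kernel would actually hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEVS 8

/* Table of registered devices; initialize it to all zeroes before use. */
struct demo_dev_table {
	uint64_t handles[DEMO_MAX_DEVS];	/* stand-in for a device reference */
	uint8_t used[DEMO_MAX_DEVS];
};

/*
 * Register a device and return its cookie.  Cookies are slot index + 1,
 * so 0 is never valid and can signal "table full".
 */
uint32_t demo_register_device(struct demo_dev_table *t, uint64_t handle)
{
	assert(t != NULL);
	for (uint32_t i = 0; i < DEMO_MAX_DEVS; i++) {
		if (!t->used[i]) {
			t->used[i] = 1;
			t->handles[i] = handle;
			return i + 1;
		}
	}
	return 0;
}

/*
 * Resolve a cookie that arrived in a mapping reply.  Returns 0 and stores
 * the handle on success, or -1 for an unknown or out-of-range cookie.
 */
int demo_lookup_device(const struct demo_dev_table *t, uint32_t cookie,
		       uint64_t *handle)
{
	assert(t != NULL && handle != NULL);
	if (cookie == 0 || cookie > DEMO_MAX_DEVS || !t->used[cookie - 1])
		return -1;
	*handle = t->handles[cookie - 1];
	return 0;
}
```

The point of the indirection is that a mapping reply only ever carries a
small integer that the receiving side validates, rather than anything that
looks like a pointer or a device number.
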
For now I've reconfigured fuse2fs to present itself as a "fuseblk" driver
so that at least we know that inode->i_sb->s_bdev is a valid pointer.
It turns out to be useful because the kernel sends FUSE_DESTROY commands
synchronously during unmount, which avoids the situation where umount
exits but the block device still can't be opened O_EXCL because the fuse
server program is still exiting.  It may also be useful someday for
wiring up some of the block device ops to fuse servers, though I think
it might conflict with CONFIG_BLK_DEV_WRITE_MOUNTED=y.

I just barely got directio writes and pagecache read/write working
through iomap today, though I'm still getting used to the fuse inode
locking model and sorting through the bugs. :)

(I wonder how nasty it would be to pass fds to the fuse kernel driver
from fuseblk servers?)

> Famfs doesn't necessarily have just one backing dev, which means that famfs
> could pass in the *primary* backing dev at mount time, but it would still
> need GET_DAXDEV to get the rest. But if I just use GET_FMAP every time, I
> only need one way to do this.
>
> I'll add a few more responses to Darrick's reply...

Hehhe, onto that message go I.

--D

>
> Thanks,
> John
>
> [1] https://github.com/cxl-micron-reskit/famfs-linux/blob/c57553c4ca91f0634f137285840ab25be8a87c30/fs/fuse/famfs_kfmap.h#L13
>
>