public inbox for linux-doc@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: John Groves <John@groves.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Dan Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Luis Henriques <luis@igalia.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Petr Vorel <pvorel@suse.cz>, Brian Foster <bfoster@redhat.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Amir Goldstein <amir73il@gmail.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>
Subject: Re: [RFC PATCH 13/19] famfs_fuse: Create files with famfs fmaps
Date: Mon, 12 May 2025 21:03:21 -0700	[thread overview]
Message-ID: <20250513040321.GO1035866@frogsfrogsfrogs> (raw)
In-Reply-To: <aytnzv4tmp7fdvpgxdfoe2ncu7qaxlp2svsxiskfnrvdnknhmp@uu4ifgc6aj34>

On Mon, May 12, 2025 at 02:51:45PM -0500, John Groves wrote:
> On 25/05/06 06:56PM, Miklos Szeredi wrote:
> > On Mon, 28 Apr 2025 at 21:00, Darrick J. Wong <djwong@kernel.org> wrote:
> > 
> > > <nod> I don't know what Miklos' opinion is about having multiple
> > > fusecmds that do similar things -- on the one hand keeping yours and my
> > > efforts separate explodes the amount of userspace abi that everyone must
> > > maintain, but on the other hand it then doesn't couple our projects
> > > together, which might be a good thing if it turns out that our domain
> > > models are /really/ actually quite different.
> > 
> > Sharing the interface at least would definitely be worthwhile, as
> > there does not seem to be a great deal of difference between the
> > generic one and the famfs specific one.  Only implementing part of the
> > functionality that the generic one provides would be fine.
> 
> Agreed. I'm coming around to thinking the most practical approach would be
> to share the GET_FMAP message/response, but to add a separate response
> format for Darrick's use case - when the time comes. In this patch set, 
> that starts with 'struct fuse_famfs_fmap_header' and is followed by the 
> approriate extent structures, serialized in the message. Collectively 
> that's an fmap in message format.

Well in that case I might as well just plumb in the pieces I need as
separate fuse commands.  fuse_args::opcode is u32, there's plenty of
space left.

> Side note: the current patch set sends back the logically-variable-sized 
> fmap in a fixed-size message, but V2 of the series will address that; 
> I got some help from Bernd there, but haven't finished it yet.
> 
> So the next version of the patch set would, say, add a more generic first
> 'struct fmap_header' that would indicate whether the next item would be
> 'struct fuse_famfs_fmap_header' (i.e. my/famfs metadata) or some other
> to be codified metadata format. I'm going here because I'm dubious that
> we even *can* do grand-unified-fmap-metadata (or that we should try).
> 
> This will require versioning the affected structures, unless we think
> the fmap-in-message structure can be opaque to the rest of fuse. @miklos,
> is there an example to follow regarding struct versioning in 
> already-existing fuse structures?

/me is a n00b, but isn't that a simple matter of making sure that new
revisions change the structure size, and then you can key off of that?

> > > (Especially because I suspect that interleaving is the norm for memory,
> > > whereas we try to avoid that for disk filesystems.)
> > 
> > So interleaved extents are just like normal ones except they repeat,
> > right?  What about adding a special "repeat last N extent
> > descriptions" type of extent?
> 
> It's a bit more than that. The comment at [1] makes it possible to understand
> the scheme, but I'd be happy to talk through it with you on a call if that
> seems helpful.
> 
> An interleaved extent stripes data spread across N memory devices in raid 0
> format; the space from each device is described by a single simple extent 
> (so it's contigous), but it's not consumed contiguously - it's consumed in 
> fixed-sized chunks that precess across the devices. Notwithstanding that I 
> couldn't explain it very well when we talked about it at LPC, I think I 
> could make it pretty clear in a pretty brief call now.
> 
> In any case, you have my word that it's actually quite elegant :D
> (seriously, but also with a smile...)

Admittedly the more I think about the interleaving in famfs vs straight
block mappings for disk filesystems, the more I think they ought to be
separate interfaces for code that solves different problems.  Then both
our codebases will remain relatively cohesive.

> > > > But the current implementation does not contemplate partially cached fmaps.
> > > >
> > > > Adding notification could address revoking them post-haste (is that why
> > > > you're thinking about notifications? And if not can you elaborate on what
> > > > you're after there?).
> > >
> > > Yeah, invalidating the mapping cache at random places.  If, say, you
> > > implement a clustered filesystem with iomap, the metadata server could
> > > inform the fuse server on the local node that a certain range of inode X
> > > has been written to, at which point you need to revoke any local leases,
> > > invalidate the pagecache, and invalidate the iomapping cache to force
> > > the client to requery the server.
> > >
> > > Or if your fuse server wants to implement its own weird operations (e.g.
> > > XFS EXCHANGE-RANGE) this would make that possible without needing to
> > > add a bunch of code to fs/fuse/ for the benefit of a single fuse driver.
> > 
> > Wouldn't existing invalidation framework be sufficient?
> > 
> > Thanks,
> > Miklos
> 
> My current thinking is that Darrick's use case doesn't need GET_DAXDEV, but
> famfs does. I think Darrick's use case has one backing device, and that should
> be passed in at mount time. Correct me if you think that might be wrong.

Technically speaking iomap can operate on /any/ block or dax device as
long as you have a reference to them.  Once I get more of the plumbing
sorted out I'll start thinking about how to handle multi-device
filesystems like XFS which can put file data on more than 1 block
device.

I was thinking that the fuse server could just send a REGISTER_DEVICE
notification to the fuse driver (I know, again with the notifications
:)), the kernel replies with a magic cookie, and that's what gets passed
in the {read,write,map}_dev field.

Right now I reconfigured fuse2fs to present itself as a "fuseblk" driver
so that at least we know that inode->i_sb->s_bdev is a valid pointer.
It turns out to be useful because the kernel sends FUSE_DESTROY commands
synchronously during unmount, which avoids the situation where umount
exits but the block device still can't be opened O_EXCL because the fuse
server program is still exiting.  It may be useful for some day wiring
up some of the block device ops to fuse servers.  Though I think it
might conflict with CONFIG_BLK_DEV_WRITE_MOUNTED=y

I just barely got directio writes and pagecache read/write working
through iomap today, though I'm still getting used to the fuse inode
locking model and sorting through the bugs. :)

(I wonder how nasty would it be to pass fds to the fuse kernel driver
from fuseblk servers?)

> Famfs doesn't necessarily have just one backing dev, which means that famfs
> could pass in the *primary* backing dev at mount time, but it would still
> need GET_DAXDEV to get the rest. But if I just use GET_FMAP every time, I
> only need one way to do this.
> 
> I'll add a few more responses to Darrick's reply...

Hehhe onto that message go I.

--D

> 
> Thanks,
> John
> 
> [1] https://github.com/cxl-micron-reskit/famfs-linux/blob/c57553c4ca91f0634f137285840ab25be8a87c30/fs/fuse/famfs_kfmap.h#L13
> 
> 

  reply	other threads:[~2025-05-13  4:03 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-21  1:33 [RFC PATCH 00/19] famfs: port into fuse John Groves
2025-04-21  1:33 ` [RFC PATCH 01/19] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
2025-04-21  1:33 ` [RFC PATCH 02/19] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2025-04-21  1:33 ` [RFC PATCH 03/19] dev_dax_iomap: Save the kva from memremap John Groves
2025-04-21  1:33 ` [RFC PATCH 04/19] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
2025-04-21  1:33 ` [RFC PATCH 05/19] dev_dax_iomap: export dax_dev_get() John Groves
2025-04-21  1:33 ` [RFC PATCH 06/19] dev_dax_iomap: (ignore!) Drop poisoned page warning in fs/dax.c John Groves
2025-04-21  1:33 ` [RFC PATCH 07/19] famfs_fuse: magic.h: Add famfs magic numbers John Groves
2025-04-21  1:33 ` [RFC PATCH 08/19] famfs_fuse: Kconfig John Groves
2025-04-21  1:33 ` [RFC PATCH 09/19] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2025-04-21  1:33 ` [RFC PATCH 10/19] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2025-04-23  1:36   ` Joanne Koong
2025-04-23 20:23     ` John Groves
2025-04-21  1:33 ` [RFC PATCH 11/19] famfs_fuse: Basic famfs mount opts John Groves
2025-04-23  1:51   ` Joanne Koong
2025-04-23 20:19     ` John Groves
2025-04-21  1:33 ` [RFC PATCH 12/19] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2025-05-02  5:48   ` Joanne Koong
2025-05-02 20:35     ` Darrick J. Wong
2025-05-12 16:28     ` John Groves
2025-05-22 15:45       ` Amir Goldstein
2025-05-23  0:30         ` John Groves
2025-04-21  1:33 ` [RFC PATCH 13/19] famfs_fuse: Create files with famfs fmaps John Groves
2025-04-21 21:57   ` Darrick J. Wong
2025-04-21 22:31     ` John Groves
2025-04-24 13:43   ` John Groves
2025-04-24 14:38     ` Darrick J. Wong
2025-04-28  1:48       ` John Groves
2025-04-28 19:00         ` Darrick J. Wong
2025-05-06 16:56           ` Miklos Szeredi
2025-05-08 15:56             ` Darrick J. Wong
2025-05-13  9:14               ` Miklos Szeredi
2025-05-15  2:06                 ` Darrick J. Wong
2025-05-16 10:06                   ` Miklos Szeredi
2025-05-16 23:17                     ` Darrick J. Wong
2025-05-12 19:51             ` John Groves
2025-05-13  4:03               ` Darrick J. Wong [this message]
2025-04-21  1:33 ` [RFC PATCH 14/19] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2025-04-21  3:43   ` Randy Dunlap
2025-04-21 20:57     ` John Groves
2025-04-21  1:33 ` [RFC PATCH 15/19] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2025-04-21  1:33 ` [RFC PATCH 16/19] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2025-04-21  1:33 ` [RFC PATCH 17/19] famfs_fuse: Add famfs metadata documentation John Groves
2025-04-21  3:51   ` Randy Dunlap
2025-04-21 21:00     ` John Groves
2025-04-21  1:33 ` [RFC PATCH 18/19] famfs_fuse: Add documentation John Groves
2025-04-22  2:10   ` Randy Dunlap
2025-04-28  1:50     ` John Groves
2025-04-21  1:33 ` [RFC PATCH 19/19] famfs_fuse: (ignore) debug cruft John Groves
2025-04-21 18:27 ` [RFC PATCH 00/19] famfs: port into fuse Darrick J. Wong
2025-04-21 22:00   ` John Groves
2025-04-22  1:25     ` Darrick J. Wong
2025-04-22 11:50       ` John Groves
2025-04-30 14:42 ` Alireza Sanaee
2025-05-01  2:13   ` John Groves
2025-05-21 22:30 ` John Groves
2025-05-21 23:11   ` Darrick J. Wong
2025-05-22 15:55   ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250513040321.GO1035866@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=John@groves.net \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ajayjoshi@micron.com \
    --cc=amir73il@gmail.com \
    --cc=arramesh@micron.com \
    --cc=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=bschubert@ddn.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=jack@suse.cz \
    --cc=jgroves@micron.com \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luis@igalia.com \
    --cc=miklos@szeredi.hu \
    --cc=nvdimm@lists.linux.dev \
    --cc=pvorel@suse.cz \
    --cc=rdunlap@infradead.org \
    --cc=shajnocz@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox