public inbox for linux-kernel@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dan Williams <djbw@kernel.org>
Cc: Gregory Price <gourry@gourry.net>,
	Joanne Koong <joannelkoong@gmail.com>,
	John Groves <John@groves.net>, Miklos Szeredi <miklos@szeredi.hu>,
	Bernd Schubert <bernd@bsbernd.com>,
	John Groves <john@jagalactic.com>,
	Dan J Williams <dan.j.williams@intel.com>,
	Bernd Schubert <bschubert@ddn.com>,
	Alison Schofield <alison.schofield@intel.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	David Hildenbrand <david@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Chen Linxuan <chenlinxuan@uniontech.com>,
	James Morse <james.morse@arm.com>, Fuad Tabba <tabba@google.com>,
	Sean Christopherson <seanjc@google.com>,
	Shivank Garg <shivankg@amd.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>,
	"venkataravis@micron.com" <venkataravis@micron.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V10 00/10] famfs: port into fuse
Date: Thu, 16 Apr 2026 15:43:31 -0700
Message-ID: <20260416224331.GD114184@frogsfrogsfrogs>
In-Reply-To: <43d36427-4629-4712-a262-391e64006eb5@app.fastmail.com>

On Thu, Apr 16, 2026 at 01:53:27PM -0700, Dan Williams wrote:
> 
> 
> On Thu, Apr 16, 2026, at 1:14 PM, Gregory Price wrote:
> > On Thu, Apr 16, 2026 at 08:56:46AM -0700, Joanne Koong wrote:
> >> On Tue, Apr 14, 2026 at 5:10 PM John Groves <John@groves.net> wrote:
> >> >
> >> > There is a FUSE_DAX_FMAP capability that the kernel may advertise or not
> >> > at init time; this capability "is" the famfs GET_FMAP AND GET_DAXDEV
> >> > commands. In the future, if we find a way to use BPF (or some other
> >> > mechanism) to avoid needing those fuse messages, the kernel could be updated
> >> > to NEVER advertise the FUSE_DAX_FMAP capability. All of the famfs-specific
> >> > code could be taken out of kernels that never advertise that capability.
> >> 
> >> I’m not sure the capability bit can be used like that (though I am
> >> hoping it can!). As I understand it, once the kernel advertises a
> >> capability, it must continue supporting it in future kernels else
> >> userspace programs that rely on it will break.

So don't break fuse servers.  If you wanted to (say) get rid of
GET_FMAP in favor of IOMAP_BEGIN, you could alter libfuse to translate a
fuse server's ->get_fmap implementation into the equivalent
->iomap_begin, and eventually the kernel can stop making GET_FMAP calls
to userspace.

The trouble here is that I've also seen half a dozen projects vendoring
libfuse, so that's a nightmare that will have to be dealt with.  But
maybe that doesn't even matter, because...

> > FUSE_DAX_FMAP is already conditional on CONFIG_FUSE_DAX, the kernel is
> > not required to continue advertising FUSE_DAX_FMAP in perpetuity.
> >
> > Setting CONFIG_FUSE_DAX=n does not mean userland "is broken", this would
> > only be the case if FUSE_DAX_FMAP was advertised but not actually
> > supported.

...the memory interleaving is a rather interesting quality of famfs.
There's no good way to express a formulaic meta-mapping in traditional
iomap parlance, and famfs needs that to interleave across memory
controllers/dimm boxen/whatever.  Throwing individual iomaps at the
kernel is a very inefficient way to do that.  So I don't think there's a
good reason to get rid of GET_FMAP at this time...
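
To make the efficiency argument concrete, here is a rough sketch of what
a formulaic mapping buys you.  It is not taken from the famfs patches:
the struct layout, the names (interleave_desc, interleave_resolve,
extents_needed), and the plain round-robin striping are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a striped file; NOT the famfs ABI. */
struct interleave_desc {
	uint64_t chunk_size;	/* bytes placed on one strip before moving on */
	uint32_t nr_strips;	/* number of devices/regions being striped over */
};

/* Where a given file offset lands under the assumed round-robin layout. */
struct strip_loc {
	uint32_t strip;		/* index of the strip holding the byte */
	uint64_t offset;	/* byte offset within that strip */
};

/* One small description resolves any file offset with pure arithmetic. */
static struct strip_loc interleave_resolve(const struct interleave_desc *d,
					   uint64_t file_off)
{
	assert(d->chunk_size > 0 && d->nr_strips > 0);

	uint64_t chunk = file_off / d->chunk_size;
	uint64_t within = file_off % d->chunk_size;
	struct strip_loc loc = {
		.strip = (uint32_t)(chunk % d->nr_strips),
		.offset = (chunk / d->nr_strips) * d->chunk_size + within,
	};

	return loc;
}

/*
 * By contrast, describing the same file as individual contiguous
 * mappings needs one record per chunk, because adjacent chunks sit on
 * different strips.
 */
static uint64_t extents_needed(const struct interleave_desc *d,
			       uint64_t file_size)
{
	assert(d->chunk_size > 0);

	return (file_size + d->chunk_size - 1) / d->chunk_size;
}
```

Under those assumptions, a 1GiB file striped in 2MiB chunks is fully
described by one interleave_desc, whereas the per-extent form needs 512
separate records to say the same thing.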

> > If DAX were removed from the kernel (unlikely, but stick with me) this
> > would be equivalent to permanently changing CONFIG_FUSE_DAX to always
> > off, and there would be no squabbles over whether that particular
> > change broke userland (there would be much strife over removing dax).

...however, the strongest case (IMO) would be to merge famfs first and
fuse-iomap after it.  Then we extend the existing fuse-iomap-bpf
prototype to allow per-mount and per-inode iomap bpf ops.  That enables
us to analyze thoroughly the performance characteristics of:

a) Using GET_FMAP as-is

b) Uploading raw iomaps (HA)

c) Uploading a single bpf program to make iomaps, exchanging fmap-style
mapping data into a bpf map, and having the single bpf program walk
through the map

d) Uploading a custom bpf program per famfs file to make iomaps.  No
bpf map required, but the setup and compilation are now much more complex

Then we'll finally know which approach is the best, having broken the
Gordian Knot of how to merge famfs and fuse-iomap.

If we decide that (c) or (d) are actually better, then guess what?  To
get any of the iomap functionality, you have to set an inode flag, and
that (FUSE_CAP_FAMFS && FUSE_CAP_IOMAP && FUSE_ATTR_IOMAP) is the signal
for "don't call GET_FMAP".  FUSE_CAP_FAMFS && (!FUSE_CAP_IOMAP ||
!FUSE_ATTR_IOMAP) means "call GET_FMAP".
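
Spelled out as code, the decision could look something like the sketch
below.  The three flag names are the ones used above, but the bit
values, the enum, and pick_mapping_source() are placeholders invented
for this example, and the email does not say what happens without
FUSE_CAP_FAMFS, so that case is only marked as out of scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; not the real fuse ABI. */
#define FUSE_CAP_FAMFS	(1u << 0)	/* connection-level capability */
#define FUSE_CAP_IOMAP	(1u << 1)	/* connection-level capability */
#define FUSE_ATTR_IOMAP	(1u << 0)	/* per-inode attribute flag */

/* Hypothetical result type: where file mappings come from. */
enum mapping_source {
	MAP_NOT_FAMFS,	/* no FUSE_CAP_FAMFS; outside this sketch */
	MAP_GET_FMAP,	/* kernel asks the server via GET_FMAP */
	MAP_IOMAP,	/* kernel uses iomap; GET_FMAP is never sent */
};

static enum mapping_source pick_mapping_source(uint32_t conn_caps,
					       uint32_t inode_flags)
{
	bool famfs = conn_caps & FUSE_CAP_FAMFS;
	bool iomap = (conn_caps & FUSE_CAP_IOMAP) &&
		     (inode_flags & FUSE_ATTR_IOMAP);

	if (!famfs)
		return MAP_NOT_FAMFS;

	/* FAMFS && IOMAP cap && IOMAP inode flag: don't call GET_FMAP. */
	if (iomap)
		return MAP_IOMAP;

	/* FAMFS && (!IOMAP cap || !IOMAP inode flag): call GET_FMAP. */
	return MAP_GET_FMAP;
}
```

Because the inode flag is part of the test, a server could opt in file
by file, and any inode without FUSE_ATTR_IOMAP keeps getting GET_FMAP
calls exactly as before.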

Yes, we burn a couple of fuse command values to find out, but that's all.

(TBH I still dislike GET_DAXDEV, that really should just be another
application of backing files, and the backing file id gets passed to
GET_FMAP.)

What do you all think of doing that?

> > While not a deprecation method, this is what capability bits are
> > designed for. Same as cpuid capability bits - just because the bit is
> > there doesn't mean a processor is required to support it in perpetuity.
> >
> > They're only required to support it if the bit is turned on.
> >
> 
> Right, if the protocol on day one is "user space must ask which method
> is available", then userspace can not be surprised when one option
> disappears. So to give time for the bpf approach to mature the kernel
> can do something like "famfs and bpf mapping support are available".
> In some future kernel the famfs native option disappears after a
> deprecation period.
> 
> When folks ask 10 years from now why this ever supported optionality
> the explanation is "oh because famfs enjoyed first mover advantage to
> prove out fs semantics layered on dax devices", or "turns out there
> are some cases where bpf is not fast enough but it still stops the
> proliferation of more in kernel mapping implementations".

Yes.  We're not *capable* of determining the best mechanism unless we
can start shipping these things to users to get their feedback.  Only
then can we iterate and make real improvements.

> Something like FUSE_DAX_FMAP is always available but the backend to
> that is optionally native vs bpf. ...or some other arrangement to make
> it clear that native might be gone someday.

--D
