From: John Groves <John@groves.net>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Bernd Schubert <bernd@bsbernd.com>,
John Groves <john@jagalactic.com>,
Dan Williams <dan.j.williams@intel.com>,
Bernd Schubert <bschubert@ddn.com>,
Alison Schofield <alison.schofield@intel.com>,
John Groves <jgroves@micron.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
David Hildenbrand <david@kernel.org>,
Christian Brauner <brauner@kernel.org>,
Randy Dunlap <rdunlap@infradead.org>,
Jeff Layton <jlayton@kernel.org>,
Amir Goldstein <amir73il@gmail.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Stefan Hajnoczi <shajnocz@redhat.com>,
Josef Bacik <josef@toxicpanda.com>,
Bagas Sanjaya <bagasdotme@gmail.com>,
Chen Linxuan <chenlinxuan@uniontech.com>,
James Morse <james.morse@arm.com>, Fuad Tabba <tabba@google.com>,
Sean Christopherson <seanjc@google.com>,
Shivank Garg <shivankg@amd.com>,
Ackerley Tng <ackerleytng@google.com>,
Gregory Price <gourry@gourry.net>,
Aravind Ramesh <arramesh@micron.com>,
Ajay Joshi <ajayjoshi@micron.com>,
"venkataravis@micron.com" <venkataravis@micron.com>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
djbw@kernel.org
Subject: Re: [PATCH V10 00/10] famfs: port into fuse
Date: Tue, 14 Apr 2026 19:10:38 -0500 [thread overview]
Message-ID: <ad7Tps4tkNbndd9Z@groves.net> (raw)
In-Reply-To: <CAJnrk1ZgcMuwfMpT1fXvUwBBiq9eWFHWVeOFQFFKiamGGe1RJg@mail.gmail.com>
On 26/04/14 03:13PM, Joanne Koong wrote:
> On Tue, Apr 14, 2026 at 11:57 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Apr 14, 2026 at 08:41:42AM -0500, John Groves wrote:
> > > On 26/04/14 03:19PM, Miklos Szeredi wrote:
> > > > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > > Overall, my intention with bringing this up is just to make sure we're
> > > > > at least aware of this alternative before anything is merged and
> > > > > permanent. If Miklos and you think we should land this series, then
> > > > > I'm on board with that.
> > > >
> > > > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > > > absolutely necessary. This was the main sticking point originally,
> > > > but there seemed to be no better alternative.
> > > >
> > > > However with the bpf approach this would be gone, which is great.
> >
> > Well... you can't get away with having *no* mapping interface at all.
>
> Yes but the mapping interface should be *generic*, not one that is so
> specifically tailored to one server. fuse will have to support this
> forever.
Mapping interfaces being generic is a nice idea, but I'm no sure it's
realistic in a generalized sense. But other mitigating comments below.
>
> > You still have to define a UABI that BPF programs can use to convey
> > mapping data into fsdax/iomap. BTF is a nice piece of work that smooths
> > over minor fluctuations in struct layout between a running kernel and
> > a precompiled BPF program, but fundamentally we still need a fuse-native
> > representation.
> >
> > That last sentence was an indirect way of saying: No, we're not going
> > to export struct iomap to userspace. The fuse-iomap patchset provides
> > all the UABI pieces we need for regular filesystems (ext4) and hardware
> > adjacent filesystems (famfs) to exchange file mapping data with the
> > kernel. This has been out for review since last October, but the lack
> > of engagement with that patchset (or its February resubmission) doesn't
> > leave me with confidence that any of it is going anywhere.
> >
> > Note: The reason for bolting BPF atop fuse-iomap is so that famfs can
> > upload bpf programs to generate interleaved mappings. It's not so hard
> > to convert famfs' iomapping paths to use fuse-iomap, but I haven't
> > helped him do that because:
> >
> > a) I have no idea what Miklos' thoughts are about merging any of the
> > famfs stuff.
> >
> > b) I also have no idea what his thoughts are about fuse-iomap. The
> > sparse replies are not encouraging.
> >
> > c) It didn't seem fair to John to make him take on a whole new patchset
> > dependency given (a) and (b).
> >
> > d) Nobody ever replied to my reply to the LSFMM thread about "can we do
> > some code review of fuse iomap without waiting three months for LSFMM?"
> > I've literally done nothing with fuse-iomap for two of the three months
> > requested.
> >
> > > > So let us please at least have a try at this. I'm not into bpf yet,
> > > > but willing to learn.
> >
> > I sent out the patches to enable exactly this sort of experimentation
> > two months ago, and have not received any responses:
> >
> > https://lore.kernel.org/linux-fsdevel/177188736765.3938194.6770791688236041940.stgit@frogsfrogsfrogs/
> >
> > I would like to say this as gently as possible: I don't know what the
> > problem here is, Miklos -- are you uninterested in the work? Do you
> > have too many other things to do inside RH that you can't talk about?
> > Is it too difficult to figure out how the iomap stuff fits into the rest
> > of the fuse codebase? Do you need help from the rest of us to get
> > reviews done? Is there something else with which I could help?
> >
> > Because ... over the past few years, many of my team's filesystem
> > projects have endured monthslong review cycles and often fail to get
> > merged. This has led to burnout and frustration among my teammates such
> > that many of them chose to move on to other things. For the remaining
> > people, it was very difficult to justify continuing headcount when
> > progress on projects is so slow that individuals cannot achieve even one
> > milestone per quarter on any project.
> >
> > There's now nobody left here but me.
> >
> > I'm not blaming you (Miklos) for any of this, but that is the current
> > deplorable state of things.
> >
> > > > Thanks,
> > > > Miklos
> > >
> > > Thanks for responding...
> > >
> > > My short response: Noooooooooo!!!!!!
> > >
> > > I very strongly object to making this a prerequisite to merging. This
> > > is an untested idea that will certainly delay us by at least a couple
> > > of merge windows when products are shipping now, and the existing approach
> > > has been in circulation for a long time. It is TOO LATE!!!!!!
> >
> > /me notes that has "we're shipping so you have to merge it over peoples'
> > concerns" rarely carries the day in LKML land, and has never ended well
> > in the few cases that it happens. As Ted is fond of saying, this is a
> > team sport, not an individual effort. Unfortunately, to abuse your
> > sports metaphor, we all play for the ******* A's.
> >
> > That said, you're clearly pissed at the goalposts changing yet again,
> > and that's really not fair that we collectively keep moving them.
> >
> > It's a rotten situation that I could have even helped you to solve both
> > our problems via fuse-iomap, but I just couldn't motivate myself to
> > entwine our two projects until the technical direction questions got
> > answered.
> >
> > > Famfs is not a science project, it's enablement for actual products and
> > > early versions are available now!!!
> > >
> > > That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
> >
> > Heck, the fuse command field is a u32. There are plenty of numberspace
> > left, and the kernel can just *stop issuing them*.
>
> I don't think the problem is the command field. As I understand it, if
> this lands and is converted over later, none of the famfs code in this
> series can be removed from fuse. If fuse has native non-bpf support
> for famfs, then it will always need to have that. That's the part that
> worries me.
I believe this basic premise is completely wrong. Here is why:
There is a FUSE_DAX_FMAP capability that the kernel may advertise or not
at init time; this capability "is" the famfs GET_FMAP AND GET_DAXDEV
commands. In the future, if we find a way to use BPF (or some other
mechanism) to avoid needing those fuse messages, the kernel could be updated
to NEVER advertise the FUSE_DAX_FMAP capability. All of the famfs-specific
code could be taken out of kernels that never advertise that capability.
Simple, really. Can't re-use the message opcodes, but as Darrick pointed out
those are not a scarce resource.
>
> >
> > > What are the risks of converting to BPF?
>
> I think maybe there is a misinterpretation of what the alternative
> approach entails. From my point of view, the alternative approach is
> not that different from what is already in this series. The only piece
> of the famfs logic that would need to use bpf is the logic for
> finding/computing the extent mappings (which is the famfs-specific
> logic that would not be applicable to any other server). That famfs
> bpf code is minimal and already written [1], as it is just the logic
> that is in patch 6 [2] in this series copied over. No other part of
> famfs touches bpf. The rest is renaming the functions in
> fs/fuse/famfs.c to generic fuse_iomap_dax_XXX names (the logic is the
> same logic in this series, eg invoking the lower-level calls to
> dax_iomap_rw/fault/etc) and moving the daxdev setup/initialization to
> connection initialization time where the server passes that daxdev
> setup info/configs upfront. I don't think this would delay things by
> several merge windows, as the code is already mostly written. If it
> would be helpful, I can clean up what's in the prototype and send that
> out.
>
> I think the part that is not clear yet and needs to be verified is
> whether this approach runs into any technical limitations on famfs's
> production workloads. For example, does the overhead of using bpf maps
> lead to a noticeable performance drop on real workloads? In the
> future, will there be too many extent mappings on high-scale systems
> to make this feasible? etc. If there are technical reasons why the
> famfs logic has to be in fuse, then imo we should figure that out and
> ideally that's the discussion we should be having. I am not a cxl
> expert so perhaps there is something missing in the approach that
> makes it not sufficient on production systems. If we don't end up
> going with the alternative approach, I still think this series should
> try to make the famfs uapi additions to fuse as generic as possible
> since that will be irreversible.
>
> If we expedited the alternative approach in terms of reviewing and
> merging, would that suffice? Is the main pushback the timing of it, eg
> that it would take too long to get reviewed, merged, and shipped?
>
> > >
> > > - I don't know how to do it - so it'll be slow (kinda like my fuse learning
> > > curve cost about a year because this is not that similar to anything
> > > else that was already in fuse.
> >
> > ...and per above, BPF isn't some magic savior that avoids the expansion
> > of the UABI.
>
> It doesn't avoid the expansion of the UABI but it makes the UABI
> generic (eg plenty of future servers can/will use the generic iomap
> layer).
Um, advertised capabilities allow contraction of the UABI-handling code with
only some small cruft. Code that is only reachable in the presence of dead
capability can totally be removed.
>
> >
> > > - Those of us who are involved don't fully understand either the security
> > > or performance implications of this. It
> >
> > Correct. I sure think it's swell that people can inject IR programs
> > that jit/link into the kernel. Don't ask which secondary connotation of
> > "swell" I'm talking about.
>
> bpf is used elsewhere in the kernel (eg networking, scheduling). If it
> is the case that it is unsafe (which maybe it is, I don't know), then
> wouldn't those other areas have the same issues?
See my long comment to Darrick's prior email.
I suspect that this would be the only place BPF has been tried for a vma
fault handler. That is a special, performance critical path - especially
for famfs. In discussion with the right people we can probably reason
through whether this is a non-starter or not.
>
> >
> > > - Famfs is enabling access to memory and mapping fault handling must be
> > > at "memory speed". We know that BPF walks some data structures when a
> > > program executes. That exposes us to additional serialized L3 cache
> > > misses each time we service a mapping fault (any TLB & page table miss).
> > > This should be studied side-by-side with the existing approach under
> > > multiple loads before being adopted for production.
> >
> > Yes, it should. AFAICT if one switched to a per-inode bpf program, then
> > you could do per-inode bpf programs. Then you don't even need the bpf
> > map, and the ->iomap_begin becomes an indirect call into JITted x86_64
> > math code.
> >
> > (The downside is that dyn code can't be meaningfully signed, requires
> > clang on the system, and you have to deal with inode eviction issues.)
> >
> > > - This has never been done in production, and we're throwing it in the way
> > > of a project that has been soaking for years and needs to support early
> > > shipments of products.
> >
> > Correct. I haven't even implemented BPF-iomap for fuse4fs. This BPF
> > integration stuff is *highly* experimental code.
>
> I think what fuse4fs needs for bpf is significantly more complicated
> and intensive than what famfs needs. For famfs, the extent mapping
> logic is straightforward computation.
>
> >
> > > If this is the only path, I'd like to revive famfs as a standalone file
> > > system. I'm still maintaining that and it's still in use.
> >
> > Honestly, you should probably just ship that to your users. As long as
> > the ondisk format doesn't change much, switching the implementation at a
> > later date is at least still possible.
>
> I recognize this is an unfair situation John as you've already spent
> years working on this and did what the community asked with rewriting
> it. What I'm hoping to convey is that the approach where the extent
> computing/finding logic gets moved to bpf is not radically different
> from the famfs logic already in this patchset. In my view, moving this
> logic to bpf is more advantageous for both fuse *and* famfs
> (decoupling famfs releases from kernel releases) - it would be great
> to consider this on technical merits if expediting the timeline of the
> alternative approach would suffice.
>
> Thanks,
> Joanne
>
> [1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
> [2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/
>
> >
> > --D
Regards,
John
next prev parent reply other threads:[~2026-04-15 0:11 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260331123702.35052-1-john@jagalactic.com>
2026-03-31 12:37 ` [PATCH V10 00/10] famfs: port into fuse John Groves
2026-03-31 12:38 ` [PATCH V10 01/10] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2026-03-31 12:38 ` [PATCH V10 02/10] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2026-03-31 12:38 ` [PATCH V10 03/10] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2026-03-31 12:38 ` [PATCH V10 04/10] famfs_fuse: Create files with famfs fmaps John Groves
2026-03-31 12:38 ` [PATCH V10 05/10] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2026-03-31 12:39 ` [PATCH V10 06/10] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2026-03-31 12:39 ` [PATCH V10 07/10] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2026-03-31 12:39 ` [PATCH V10 08/10] famfs_fuse: Add DAX address_space_operations with noop_dirty_folio John Groves
2026-03-31 12:39 ` [PATCH V10 09/10] famfs_fuse: Add famfs fmap metadata documentation John Groves
2026-03-31 12:39 ` [PATCH V10 10/10] famfs_fuse: Add documentation John Groves
2026-04-01 15:15 ` [PATCH V10 00/10] famfs: port into fuse John Groves
2026-04-06 17:43 ` Joanne Koong
2026-04-10 14:46 ` John Groves
2026-04-10 15:24 ` Bernd Schubert
2026-04-10 18:38 ` John Groves
2026-04-10 19:44 ` Joanne Koong
2026-04-14 13:19 ` Miklos Szeredi
2026-04-14 13:41 ` John Groves
2026-04-14 14:18 ` Miklos Szeredi
2026-04-14 15:23 ` John Groves
2026-04-14 18:57 ` Darrick J. Wong
2026-04-14 22:13 ` Joanne Koong
2026-04-14 23:36 ` Darrick J. Wong
2026-04-15 0:10 ` John Groves [this message]
2026-04-14 22:20 ` Gregory Price
2026-04-14 23:53 ` John Groves
2026-04-15 0:15 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ad7Tps4tkNbndd9Z@groves.net \
--to=john@groves.net \
--cc=Jonathan.Cameron@huawei.com \
--cc=ackerleytng@google.com \
--cc=ajayjoshi@micron.com \
--cc=alison.schofield@intel.com \
--cc=amir73il@gmail.com \
--cc=arramesh@micron.com \
--cc=bagasdotme@gmail.com \
--cc=bernd@bsbernd.com \
--cc=brauner@kernel.org \
--cc=bschubert@ddn.com \
--cc=chenlinxuan@uniontech.com \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=david@kernel.org \
--cc=djbw@kernel.org \
--cc=djwong@kernel.org \
--cc=gourry@gourry.net \
--cc=jack@suse.cz \
--cc=james.morse@arm.com \
--cc=jgroves@micron.com \
--cc=jlayton@kernel.org \
--cc=joannelkoong@gmail.com \
--cc=john@jagalactic.com \
--cc=josef@toxicpanda.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=nvdimm@lists.linux.dev \
--cc=rdunlap@infradead.org \
--cc=seanjc@google.com \
--cc=shajnocz@redhat.com \
--cc=shivankg@amd.com \
--cc=skhan@linuxfoundation.org \
--cc=tabba@google.com \
--cc=venkataravis@micron.com \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox