public inbox for linux-fsdevel@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: John Groves <john@groves.net>,
	Amir Goldstein <amir73il@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Bernd Schubert <bernd@bsbernd.com>,
	Luis Henriques <luis@igalia.com>,
	Horst Birthelmer <horst@birthelmer.de>
Subject: Re: [LSF/MM/BPF TOPIC] Where is fuse going? API cleanup, restructuring and more
Date: Fri, 20 Feb 2026 16:37:56 -0800
Message-ID: <20260221003756.GD11076@frogsfrogsfrogs>
In-Reply-To: <CAJnrk1YMqDKA5gDZasrxGjJtfdbhmjxX5uhUv=OSPyA=G5EE+Q@mail.gmail.com>

On Wed, Feb 11, 2026 at 08:46:26PM -0800, Joanne Koong wrote:
> On Fri, Feb 6, 2026 at 4:22 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Fri, Feb 6, 2026 at 12:48 PM John Groves <john@groves.net> wrote:
> > >
> > > On 26/02/05 09:52PM, Darrick J. Wong wrote:
> > > > On Thu, Feb 05, 2026 at 10:27:52AM +0100, Amir Goldstein wrote:
> > > > > On Thu, Feb 5, 2026 at 4:33 AM John Groves <john@jagalactic.com> wrote:
> > > > > >
> > > > > > On 26/02/04 11:06AM, Darrick J. Wong wrote:
> > > > > >
> > > > > > [ ... ]
> > > > > >
> > > > > > > >  - famfs: export distributed memory
> > > > > > >
> > > > > > > This has been, uh, hanging out for an extraordinarily long time.
> > > > > >
> > > > > > Um, *yeah*. Although a significant part of that time was on me, because
> > > > > > getting it ported into fuse was kinda hard, my users and I are hoping we
> > > > > > can get this upstreamed fairly soon now. I'm hoping that after the 6.19
> > > > > > merge window dust settles we can negotiate any needed changes etc. and
> > > > > > shoot for the 7.0 merge window.
> > > >
> > > > I think we've all missed getting merged for 7.0 since 6.19 will be
> > > > released in 3 days. :/
> > > >
> > > > (Granted most of the maintainers I know are /much/ less conservative
> > > > than I was about the schedule)
> > >
> > > Doh - right you are...
> > >
> > > >
> > > > > I think that the work on famfs is setting an example, and I very much
> > > > > hope it will be a good example, of how improving existing infrastructure
> > > > > (FUSE) is a better contribution than adding another fs to the pile.
> > > >
> > > > Yeah.  Joanne and I spent a couple of days this week coprogramming a
> > > > prototype of a way for famfs to create BPF programs to handle
> > > > INTERLEAVED_EXTENT files.  We might be ready to show that off in a
> > > > couple of weeks, and that might be a way to clear up the
> > > > GET_FMAP/IOMAP_BEGIN logjam at last.
> > >
> > > I'd love to learn more about this; happy to do a call if that's a
> > > good way to get me briefed.
> > >
> > > I [generally but not specifically] understand how this could avoid
> > > GET_FMAP, but not GET_DAXDEV.
> > >
> > > But I'm not sure it could (or should) avoid dax_iomap_rw() and
> > > dax_iomap_fault(). The thing is that those call my begin() function
> > > to resolve an offset in a file to an offset on a daxdev, and then
> > > dax completes the fault or memcpy. In that dance, famfs never knows
> > > the kernel address of the memory at all (also true of xfs in fs-dax
> > > mode, unless that's changed fairly recently). I think that's a pretty
> > > decent interface all in all.
> > >
> > > Also: dunno whether y'all have looked at the dax patches in the famfs
> > > series, but the solution to working with Alistair's folio-ification
> > > and cleanup of the dax layer (which set me back months) was to create
> > > drivers/dax/fsdev.c, which, when bound to a daxdev in place of
> > > drivers/dax/device.c, configures folios & pages compatibly with
> > > fs-dax. So I kinda think I need the dax_iomap* interface.
> > >
> > > As usual, if I'm overlooking something let me know...
> >
> > Hi John,
> >
> > The conversation started [1] on Darrick's containerization patchset
> > about using bpf to a) avoid extra requests / context switching for
> > ->iomap_begin and ->iomap_end calls and b) offload what would
> > otherwise have to be hard-coded kernel logic into userspace, which
> > gives userspace more flexibility / control with updating the logic and
> > is less of a maintenance burden for fuse. There was some musing [2]
> > about whether with bpf infrastructure added, it would allow famfs to
> > move all famfs-specific logic to userspace/bpf.
> >
> > I agree that it makes sense for famfs to go through dax iomap
> > interfaces. imo it seems cleanest if fuse has a generic iomap
> > interface with iomap dax going through that plumbing, and any
> > famfs-specific logic that would be needed beyond that (eg computing
> > the interleaved mappings) being moved to custom famfs bpf programs. I
> > started trying to implement this yesterday afternoon because I wanted
> > to make sure it would actually be doable for the famfs logic before
> > bringing it up and I didn't want to derail your project. So far I only
> > have the general iomap interface for fuse added with dax operations
> > going through dax_iomap* and haven't tried out integrating the famfs
> > GET_FMAP/GET_DAXDEV bpf program part yet but I'm planning/hoping to
> > get to that early next week. The work I did with Darrick this week was
> > on getting a server's bpf programs hooked up to fuse through bpf links
> > and Darrick has fleshed that out and gotten that working now. If it
> > turns out famfs can go through a generic iomap fuse plumbing layer,
> > I'd be curious to hear your thoughts on which approach you'd prefer.
> 
> I put together a quick prototype to test this out - this is what it
> looks like with fuse having a generic iomap interface that supports
> dax [1], and the famfs custom logic moved to a bpf program [2]. I

The bpf maps that you've used to upload per-inode data into the kernel
are a /much/ cleaner method than custom-compiling C into BPF at runtime!
You can statically compile the BPF object code into the fuse server,
which means that (a) you can take advantage of the bpftool skeletons,
and (b) you can in theory vendor-sign the BPF code if and when that
becomes a requirement.

I think that's way better than having to put vmlinux.h and
fuse_iomap_bpf.h on the deployed system.  Though there's one hitch in
example/Makefile:

vmlinux.h:
	$(BPFTOOL) btf dump file /sys/kernel/btf/vmlinux format c > $@

The build system isn't necessarily running the same kernel as the deploy
images.  It might be at Meta, but it's not unheard of for our build
system to be running (say) an OL10+UEK8 kernel while the build target is
OL8 with UEK7.

There doesn't seem to be any standardization across distros for where a
vmlinux.h file might be found.  Fedora puts it under
/usr/src/$unamestuff, Debian puts it in /usr/include/$gcc_triple, and I
guess SUSE doesn't ship it at all?

That's going to be a headache for deployment as I've been muttering for
a couple of weeks now. :(

Maybe we could reduce the fuse-iomap bpf definitions to use only
cardinal types and the types that iomap itself defines.  That might not
be too hard right now because bpf functions reuse structures from
include/uapi/linux/fuse.h, which currently use uint{8,16,32,64}_t.  It'll
get harder if that uintXX_t -> __uXX transition actually happens.
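For illustration, a per-inode map value restricted to cardinal types
might look something like the sketch below.  The struct name, its
fields, and the four-strip limit are all hypothetical, not an existing
fuse-iomap definition.  It assumes both the BPF object and the fuse
server can get fixed-width integer types (stdint.h or equivalent
typedefs) without vmlinux.h; explicit padding plus compile-time layout
checks would then catch any drift between the two sides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-inode value that a fuse server would upload into a
 * BPF map.  Only fixed-width integers are used, so the layout does not
 * depend on any kernel-internal type from vmlinux.h.
 */
struct fiomap_inode_val {
	uint64_t nodeid;              /* fuse node id this mapping describes */
	uint64_t chunk_size;          /* interleave chunk size in bytes */
	uint32_t nstrips;             /* number of strips actually in use */
	uint32_t flags;               /* reserved; also avoids implicit padding */
	uint64_t strip_dev_offset[4]; /* per-strip start offset on its daxdev */
	uint32_t strip_devindex[4];   /* per-strip daxdev index */
};

/* Pin the layout so a mismatch between the BPF object and the server is
 * a build failure instead of silently misread map values. */
_Static_assert(sizeof(struct fiomap_inode_val) == 72,
	       "fiomap_inode_val size changed");
_Static_assert(offsetof(struct fiomap_inode_val, strip_devindex) == 56,
	       "fiomap_inode_val layout changed");
```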

But getting back to the famfs bpf stuff, I think doing the interleaved
mappings via BPF gives the famfs server a lot more flexibility in terms
of what it can do when future hardware arrives with even weirder
configurations.
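As a rough sketch of the arithmetic such a BPF program would do for an
interleaved extent: a file offset is split into a chunk number and an
offset within the chunk, chunks go round-robin across strips, and the
result is a (daxdev index, device offset, contiguous length) triple that
dax_iomap_rw()/dax_iomap_fault() could consume.  The ilv_* names and the
strip layout are hypothetical and are not famfs's actual format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical strip descriptor: which daxdev a strip lives on and where
 * it starts on that device. */
struct ilv_strip {
	uint32_t devindex;   /* index into a daxdev table configured at init */
	uint64_t dev_offset; /* byte offset of the strip on that daxdev */
};

/* Hypothetical interleaved extent: file data is laid out round-robin,
 * chunk_size bytes at a time, across nstrips strips. */
struct ilv_extent {
	uint64_t chunk_size;
	uint32_t nstrips;
	const struct ilv_strip *strips;
};

/*
 * Map a file offset to (daxdev index, offset on that daxdev).  Returns the
 * number of bytes that stay contiguous on the device from that point,
 * i.e. up to the end of the current chunk, which is what an
 * iomap_begin-style lookup would report as the mapping length.
 */
static uint64_t ilv_resolve(const struct ilv_extent *ie, uint64_t file_off,
			    uint32_t *devindex, uint64_t *dev_off)
{
	assert(ie->chunk_size != 0 && ie->nstrips != 0);

	uint64_t chunk = file_off / ie->chunk_size;   /* global chunk number */
	uint64_t within = file_off % ie->chunk_size;  /* offset inside that chunk */
	uint32_t s = (uint32_t)(chunk % ie->nstrips); /* strip holding the chunk */
	uint64_t row = chunk / ie->nstrips;           /* earlier chunks on that strip */

	*devindex = ie->strips[s].devindex;
	*dev_off = ie->strips[s].dev_offset + row * ie->chunk_size + within;
	return ie->chunk_size - within;
}
```

For example, with two strips starting at device offsets 0x100000 and
0x200000 and a 2 MiB chunk, file offset 5 MiB lands in chunk 2, the
second chunk on strip 0, so the lookup yields daxdev 0, device offset
0x400000, and 1 MiB of contiguous length.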

--D

> didn't change much, I just moved around your famfs code to the bpf
> side. The kernel side changes are in [3] and the libfuse changes are
> in [4].
> 
> For testing out the prototype, I hooked it up to passthrough_hp to
> test running the bpf program and verify that it is able to find the
> extent from the bpf map. In my opinion, this makes the fuse side
> infrastructure cleaner and more extendable for other servers that will
> want to go through dax iomap in the future, but I think this also has
> a few benefits for famfs. Instead of needing to issue a FUSE_GET_FMAP
> request after a file is opened, the server can directly populate the
> metadata map from userspace with the mapping info when it processes
> the FUSE_OPEN request, which gets rid of the roundtrip cost. The
> server can dynamically update the metadata at any time from userspace
> if the mapping info needs to change in the future. For setting up the
> daxdevs, I moved your logic to the init side, where the server passes
> the daxdev info upfront through an IOMAP_CONFIG exchange with the
> kernel initializing the daxdevs based off that info. I think this will
> also make deploying future updates for famfs easier, as updating the
> logic won't need to go through the upstream kernel mailing list
> process and deploying updates won't require a new kernel release.
> 
> These are just my two cents based on my (cursory) understanding of
> famfs. Just wanted to float this alternative approach in case it's
> useful.
> 
> Thanks,
> Joanne
> 
> [1] https://github.com/joannekoong/linux/commit/b8f9d284a6955391f00f576d890e1c1ccc943cfd
> [2] https://github.com/joannekoong/libfuse/commit/444fa27fa9fd2118a0dc332933197faf9bbf25aa
> [3] https://github.com/joannekoong/linux/commits/prototype_generic_iomap_dax/
> [4] https://github.com/joannekoong/libfuse/commits/famfs_bpf/
> 
> >
> > Thanks,
> > Joanne
> >
> > [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#md1b8003a109760d8ee1d5397e053673c1978ed4d
> > [2] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#u
> >
> > >
> > > Regards,
> > > John
> > >
> 
