From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: John Groves <john@groves.net>,
	Amir Goldstein <amir73il@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Bernd Schubert <bernd@bsbernd.com>,
	Luis Henriques <luis@igalia.com>,
	Horst Birthelmer <horst@birthelmer.de>
Subject: Re: [LSF/MM/BPF TOPIC] Where is fuse going? API cleanup, restructuring and more
Date: Mon, 2 Mar 2026 20:57:55 -0800
Message-ID: <20260303045755.GN13829@frogsfrogsfrogs>
In-Reply-To: <CAJnrk1ZJksW=uz1itdh+zoaQBo_XQ4ZSF13BSnZXMie5pBCvYA@mail.gmail.com>
On Thu, Feb 26, 2026 at 12:21:43PM -0800, Joanne Koong wrote:
> On Fri, Feb 20, 2026 at 4:37 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Wed, Feb 11, 2026 at 08:46:26PM -0800, Joanne Koong wrote:
> > > On Fri, Feb 6, 2026 at 4:22 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Fri, Feb 6, 2026 at 12:48 PM John Groves <john@groves.net> wrote:
> > > > >
> > > > > On 26/02/05 09:52PM, Darrick J. Wong wrote:
> > > > > > On Thu, Feb 05, 2026 at 10:27:52AM +0100, Amir Goldstein wrote:
> > > > > > > On Thu, Feb 5, 2026 at 4:33 AM John Groves <john@jagalactic.com> wrote:
> > > > > > > >
> > > > > > > > On 26/02/04 11:06AM, Darrick J. Wong wrote:
> > > > > > > >
> > > > > > > > [ ... ]
> > > > > > > >
> > > > > > > > > > - famfs: export distributed memory
> > > > > > > > >
> > > > > > > > > This has been, uh, hanging out for an extraordinarily long time.
> > > > > > > >
> > > > > > > > Um, *yeah*. Although a significant part of that time was on me, because
> > > > > > > > getting it ported into fuse was kinda hard, my users and I are hoping we
> > > > > > > > can get this upstreamed fairly soon now. I'm hoping that after the 6.19
> > > > > > > > merge window dust settles we can negotiate any needed changes etc. and
> > > > > > > > shoot for the 7.0 merge window.
> > > > > >
> > > > > > I think we've all missed getting merged for 7.0 since 6.19 will be
> > > > > > released in 3 days. :/
> > > > > >
> > > > > > (Granted most of the maintainers I know are /much/ less conservative
> > > > > > than I was about the schedule)
> > > > >
> > > > > Doh - right you are...
> > > > >
> > > > > >
> > > > > > > I think that the work on famfs is setting an example, and I very much
> > > > > > > hope it will be a good example, of how improving existing infrastructure
> > > > > > > (FUSE) is a better contribution than adding another fs to the pile.
> > > > > >
> > > > > > Yeah. Joanne and I spent a couple of days this week coprogramming a
> > > > > > prototype of a way for famfs to create BPF programs to handle
> > > > > > INTERLEAVED_EXTENT files. We might be ready to show that off in a
> > > > > > couple of weeks, and that might be a way to clear up the
> > > > > > GET_FMAP/IOMAP_BEGIN logjam at last.
> > > > >
> > > > > I'd love to learn more about this; happy to do a call if that's a
> > > > > good way to get me briefed.
> > > > >
> > > > > I [generally but not specifically] understand how this could avoid
> > > > > GET_FMAP, but not GET_DAXDEV.
> > > > >
> > > > > But I'm not sure it could (or should) avoid dax_iomap_rw() and
> > > > > dax_iomap_fault(). The thing is that those call my begin() function
> > > > > to resolve an offset in a file to an offset on a daxdev, and then
> > > > > dax completes the fault or memcpy. In that dance, famfs never knows
> > > > > the kernel address of the memory at all (also true of xfs in fs-dax
> > > > > mode, unless that's changed fairly recently). I think that's a pretty
> > > > > decent interface all in all.
> > > > >
> > > > > Also: dunno whether y'all have looked at the dax patches in the famfs
> > > > > series, but the solution to working with Alistair's folio-ification
> > > > > and cleanup of the dax layer (which set me back months) was to create
> > > > > drivers/dax/fsdev.c, which, when bound to a daxdev in place of
> > > > > drivers/dax/device.c, configures folios & pages compatibly with
> > > > > fs-dax. So I kinda think I need the dax_iomap* interface.
> > > > >
> > > > > As usual, if I'm overlooking something let me know...
> > > >
> > > > Hi John,
> > > >
> > > > The conversation started [1] on Darrick's containerization patchset
> > > > about using bpf to a) avoid extra requests / context switching for
> > > > ->iomap_begin and ->iomap_end calls and b) offload what would
> > > > otherwise have to be hard-coded kernel logic into userspace, which
> > > > gives userspace more flexibility / control with updating the logic and
> > > > is less of a maintenance burden for fuse. There was some musing [2]
> > > > about whether with bpf infrastructure added, it would allow famfs to
> > > > move all famfs-specific logic to userspace/bpf.
> > > >
> > > > I agree that it makes sense for famfs to go through dax iomap
> > > > interfaces. imo it seems cleanest if fuse has a generic iomap
> > > > interface with iomap dax going through that plumbing, and any
> > > > famfs-specific logic that would be needed beyond that (eg computing
> > > > the interleaved mappings) being moved to custom famfs bpf programs. I
> > > > started trying to implement this yesterday afternoon because I wanted
> > > > to make sure it would actually be doable for the famfs logic before
> > > > bringing it up and I didn't want to derail your project. So far I only
> > > > have the general iomap interface for fuse added with dax operations
> > > > going through dax_iomap* and haven't tried out integrating the famfs
> > > > GET_FMAP/GET_DAXDEV bpf program part yet but I'm planning/hoping to
> > > > get to that early next week. The work I did with Darrick this week was
> > > > on getting a server's bpf programs hooked up to fuse through bpf links
> > > > and Darrick has fleshed that out and gotten that working now. If it
> > > > turns out famfs can go through a generic iomap fuse plumbing layer,
> > > > I'd be curious to hear your thoughts on which approach you'd prefer.
> > >
> > > I put together a quick prototype to test this out - this is what it
> > > looks like with fuse having a generic iomap interface that supports
> > > dax [1], and the famfs custom logic moved to a bpf program [2]. I
> >
> > The bpf maps that you've used to upload per-inode data into the kernel
> > is a /much/ cleaner method than custom-compiling C into BPF at runtime!
> > You can statically compile the BPF object code into the fuse server,
> > which means that (a) you can take advantage of the bpftool skeletons,
> > and (b) you can in theory vendor-sign the BPF code if and when that
> > becomes a requirement.
> >
> > I think that's way better than having to put vmlinux.h and
> > fuse_iomap_bpf.h on the deployed system. Though there's one hitch in
> > example/Makefile:
> >
> > vmlinux.h:
> > 	$(BPFTOOL) btf dump file /sys/kernel/btf/vmlinux format c > $@
> >
> > The build system isn't necessarily running the same kernel as the deploy
> > images. It might be for Meta, but it's not unheard of for our build
> > system to be running (say) OL10+UEK8 kernel, but the build target is OL8
> > and UEK7.
> >
> > There doesn't seem to be any standardization across distros for where a
> > vmlinux.h file might be found. Fedora puts it under
> > /usr/src/$unamestuf, Debian puts it in /usr/include/$gcc_triple, and I
> > guess SUSE doesn't ship it at all?
> >
> > That's going to be a headache for deployment as I've been muttering for
> > a couple of weeks now. :(
>
> I don't think this is an issue because bpf does dynamic btf-based
> relocations (CO-RE) at load time [1]. On the target machine, when
> libbpf loads the bpf object it will read the machine's btf and patch
> any offsets in bytecode and load the fixed-up version into the kernel.
> All that's needed on the target machine for CO-RE is
> CONFIG_DEBUG_INFO_BTF=y which is enabled by default on mainstream
> distributions. I think this addresses the deployment headache you've
> been running into?
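To make the load-time relocation step above concrete, here is a toy model in Python. It is only a sketch of the idea: real libbpf works on BTF type information and patches BPF instructions, whereas this model reduces a "layout" to a dictionary of field offsets. The struct name `example_inode` and its fields are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# struct name -> field name -> byte offset (a stand-in for real BTF data)
Layout = Dict[str, Dict[str, int]]


def relocate(accesses: List[Tuple[str, str]], target: Layout) -> List[int]:
    """Resolve each recorded (struct, field) access against the target layout.

    The program records *which* field it reads, not a fixed offset; the
    loader fills in the offset that is valid on the machine it runs on.
    """
    resolved = []
    for struct, field in accesses:
        try:
            resolved.append(target[struct][field])
        except KeyError:
            raise LookupError(f"{struct}.{field} not found in target layout") from None
    return resolved


# Layout seen at build time and a different layout on the deploy target.
build_layout: Layout = {"example_inode": {"size": 8, "flags": 16}}
target_layout: Layout = {"example_inode": {"extra": 8, "size": 16, "flags": 24}}

accesses = [("example_inode", "size"), ("example_inode", "flags")]
print(relocate(accesses, build_layout))
print(relocate(accesses, target_layout))
```

Note that the model only covers what happens at load time. It says nothing about where the compiler gets the struct definitions from at build time.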

Not really -- CO-RE does indeed work quite nicely to smooth over layout
changes in C structures between a BPF program and the kernel it's being
loaded into (thanks, whoever came up with that!) but the problem I have
is how you /get/ those definitions into clang in the first place.

I was under the impression from many of the bpf examples that you're
supposed to #include a distro-provided "vmlinux.h", but there doesn't
seem to be a standard way to find that file.  Most -dev packages provide
a pkgconfig file that gives you the appropriate CFLAGS/LDFLAGS to add,
but apparently this is not the case for BPF...?

Perhaps it's the case that distro packages that are building BPF
programs simply add a build dependency on the package providing
vmlinux.h (e.g. Build-Depends: linux-bpf-dev on Debian) and patch in
"CFLAGS=-I/some/path" as needed?

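As a rough illustration of the lookup problem, a build helper could search a caller-supplied list of include directories for vmlinux.h and only fall back to generating one from a BTF file. This is a sketch under assumptions: the function names and the `candidate_dirs`, `btf_path` and `out_dir` parameters are hypothetical, and no distro paths are hard-coded because there is no standard one. The bpftool invocation is the same one used in the example/Makefile rule quoted earlier.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def dump_btf_with_bpftool(btf_path: Path, out_path: Path) -> None:
    """Write a C header generated from a BTF file, as the Makefile rule does."""
    with out_path.open("w") as out:
        subprocess.run(
            ["bpftool", "btf", "dump", "file", str(btf_path), "format", "c"],
            stdout=out,
            check=True,
        )


def find_vmlinux_h(
    candidate_dirs: Iterable[Path],
    btf_path: Path,
    out_dir: Path,
    generate: Callable[[Path, Path], None] = dump_btf_with_bpftool,
) -> Path:
    """Return a usable vmlinux.h, preferring a packaged copy over a generated one."""
    for directory in candidate_dirs:
        header = Path(directory) / "vmlinux.h"
        if header.is_file():
            return header
    # Nothing packaged was found: generate from the given BTF file. Pointing
    # btf_path at the *target* kernel's BTF avoids depending on the build host.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    generated = out_dir / "vmlinux.h"
    generate(Path(btf_path), generated)
    return generated
```

The directory containing the returned path would then be passed to clang with -I. The `generate` hook is injectable so the fallback can be swapped out or exercised without bpftool installed.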

I suppose for a dynamically generated and compiled BPF program, one
could just "bpftool btf dump" the /sys/kernel/btf files, capture the
output, and "#include </dev/fd/XXX>" the results.  Honestly that sounds
better than trusting some weird system package.

But maybe dynamic compilation is a totally stupid idea.  I did grow up
in the era of mshtml email wreaking havoc, after all...

--D

> Thanks,
> Joanne
>
> [1] https://docs.ebpf.io/concepts/core/
>
> >
> > Maybe we could reduce the fuse-iomap bpf definitions to use only
> > cardinal types and the types that iomap itself defines. That might not
> > be too hard right now because bpf functions reuse structures from
> > include/uapi/fuse.h, which currently use uint{8,16,32,64}_t. It'll get
> > harder if that __uintXX_t -> __uXX transition actually happens.
> >
> > But getting back to the famfs bpf stuff, I think doing the interleaved
> > mappings via BPF gives the famfs server a lot more flexibility in terms
> > of what it can do when future hardware arrives with even weirder
> > configurations.
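For readers unfamiliar with interleaved mappings, the arithmetic such a program has to perform can be sketched as follows. This is generic round-robin striping written in Python for readability. It is not taken from the famfs code, and the `Strip` type, the field names and the chunk size are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Strip(NamedTuple):
    dev_index: int   # which device this strip lives on
    dev_offset: int  # where the strip's region starts on that device


def resolve_interleaved(
    file_off: int, chunk_size: int, strips: Sequence[Strip]
) -> Tuple[int, int, int]:
    """Map a file offset to (device index, device offset, contiguous bytes).

    Chunks of chunk_size bytes are dealt out to the strips round-robin.
    """
    if file_off < 0 or chunk_size <= 0 or not strips:
        raise ValueError("invalid offset, chunk size, or strip list")
    chunk_no, within = divmod(file_off, chunk_size)
    row, strip_no = divmod(chunk_no, len(strips))
    strip = strips[strip_no]
    dev_off = strip.dev_offset + row * chunk_size + within
    # A mapping is only contiguous up to the end of the current chunk.
    return strip.dev_index, dev_off, chunk_size - within


strips = [Strip(dev_index=0, dev_offset=0), Strip(dev_index=1, dev_offset=4096)]
print(resolve_interleaved(3100, 1024, strips))
```

A different hardware configuration would change only this function, which is the flexibility argument made above.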
> >
> > --D
> >
> > > didn't change much, I just moved around your famfs code to the bpf
> > > side. The kernel side changes are in [3] and the libfuse changes are
> > > in [4].
> > >
> > > For testing out the prototype, I hooked it up to passthrough_hp to
> > > test running the bpf program and verify that it is able to find the
> > > extent from the bpf map. In my opinion, this makes the fuse side
> > > infrastructure cleaner and more extendable for other servers that will
> > > want to go through dax iomap in the future, but I think this also has
> > > a few benefits for famfs. Instead of needing to issue a FUSE_GET_FMAP
> > > request after a file is opened, the server can directly populate the
> > > metadata map from userspace with the mapping info when it processes
> > > the FUSE_OPEN request, which gets rid of the roundtrip cost. The
> > > server can dynamically update the metadata at any time from userspace
> > > if the mapping info needs to change in the future. For setting up the
> > > daxdevs, I moved your logic to the init side, where the server passes
> > > the daxdev info upfront through an IOMAP_CONFIG exchange with the
> > > kernel initializing the daxdevs based off that info. I think this will
> > > also make deploying future updates for famfs easier, as updating the
> > > logic won't need to go through the upstream kernel mailing list
> > > process and deploying updates won't require a new kernel release.
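The populate-at-open, look-up-at-mapping-time flow described above can be modelled in plain Python. This is a userspace model only: the actual prototype uses bpf maps, and the class, method and field names here (`InodeExtentMap`, `on_open`, `lookup`, `Extent`) are hypothetical.

```python
import bisect
from typing import Dict, Iterable, List, NamedTuple, Optional


class Extent(NamedTuple):
    file_off: int   # start of the extent within the file
    length: int     # extent length in bytes
    dev_index: int  # backing device
    dev_off: int    # start of the extent on that device


class InodeExtentMap:
    """Per-inode mapping table, keyed by inode number."""

    def __init__(self) -> None:
        self._by_inode: Dict[int, List[Extent]] = {}

    def on_open(self, ino: int, extents: Iterable[Extent]) -> None:
        # The server fills in (or replaces) the mappings while handling open,
        # so no separate fetch request is needed afterwards.
        self._by_inode[ino] = sorted(extents)

    def on_forget(self, ino: int) -> None:
        self._by_inode.pop(ino, None)

    def lookup(self, ino: int, pos: int) -> Optional[Extent]:
        """Return the extent covering pos, or None if there is no mapping."""
        extents = self._by_inode.get(ino)
        if not extents:
            return None
        starts = [e.file_off for e in extents]
        i = bisect.bisect_right(starts, pos) - 1
        if i < 0:
            return None
        extent = extents[i]
        return extent if pos < extent.file_off + extent.length else None
```

Calling `on_open` again for the same inode replaces its mappings, which mirrors the point that the server can update the metadata at any time from userspace.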
> > >
> > > These are just my two cents based on my (cursory) understanding of
> > > famfs. Just wanted to float this alternative approach in case it's
> > > useful.
> > >
> > > Thanks,
> > > Joanne
> > >
> > > [1] https://github.com/joannekoong/linux/commit/b8f9d284a6955391f00f576d890e1c1ccc943cfd
> > > [2] https://github.com/joannekoong/libfuse/commit/444fa27fa9fd2118a0dc332933197faf9bbf25aa
> > > [3] https://github.com/joannekoong/linux/commits/prototype_generic_iomap_dax/
> > > [4] https://github.com/joannekoong/libfuse/commits/famfs_bpf/
> > >
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#md1b8003a109760d8ee1d5397e053673c1978ed4d
> > > > [2] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#u
> > > >
> > > > >
> > > > > Regards,
> > > > > John
> > > > >
> > >
>