From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: John Groves <john@groves.net>,
Amir Goldstein <amir73il@gmail.com>,
Miklos Szeredi <miklos@szeredi.hu>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Bernd Schubert <bernd@bsbernd.com>,
Luis Henriques <luis@igalia.com>,
Horst Birthelmer <horst@birthelmer.de>
Subject: Re: [LSF/MM/BPF TOPIC] Where is fuse going? API cleanup, restructuring and more
Date: Mon, 2 Mar 2026 20:57:55 -0800
Message-ID: <20260303045755.GN13829@frogsfrogsfrogs>
In-Reply-To: <CAJnrk1ZJksW=uz1itdh+zoaQBo_XQ4ZSF13BSnZXMie5pBCvYA@mail.gmail.com>
On Thu, Feb 26, 2026 at 12:21:43PM -0800, Joanne Koong wrote:
> On Fri, Feb 20, 2026 at 4:37 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Wed, Feb 11, 2026 at 08:46:26PM -0800, Joanne Koong wrote:
> > > On Fri, Feb 6, 2026 at 4:22 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Fri, Feb 6, 2026 at 12:48 PM John Groves <john@groves.net> wrote:
> > > > >
> > > > > On 26/02/05 09:52PM, Darrick J. Wong wrote:
> > > > > > On Thu, Feb 05, 2026 at 10:27:52AM +0100, Amir Goldstein wrote:
> > > > > > > On Thu, Feb 5, 2026 at 4:33 AM John Groves <john@jagalactic.com> wrote:
> > > > > > > >
> > > > > > > > On 26/02/04 11:06AM, Darrick J. Wong wrote:
> > > > > > > >
> > > > > > > > [ ... ]
> > > > > > > >
> > > > > > > > > > - famfs: export distributed memory
> > > > > > > > >
> > > > > > > > > This has been, uh, hanging out for an extraordinarily long time.
> > > > > > > >
> > > > > > > > Um, *yeah*. Although a significant part of that time was on me, because
> > > > > > > > getting it ported into fuse was kinda hard, my users and I are hoping we
> > > > > > > > can get this upstreamed fairly soon now. I'm hoping that after the 6.19
> > > > > > > > merge window dust settles we can negotiate any needed changes etc. and
> > > > > > > > shoot for the 7.0 merge window.
> > > > > >
> > > > > > I think we've all missed getting merged for 7.0 since 6.19 will be
> > > > > > released in 3 days. :/
> > > > > >
> > > > > > (Granted most of the maintainers I know are /much/ less conservative
> > > > > > than I was about the schedule)
> > > > >
> > > > > Doh - right you are...
> > > > >
> > > > > >
> > > > > > > I think that the work on famfs is setting an example, and I very much
> > > > > > > hope it will be a good example, of how improving existing infrastructure
> > > > > > > (FUSE) is a better contribution than adding another fs to the pile.
> > > > > >
> > > > > > Yeah. Joanne and I spent a couple of days this week coprogramming a
> > > > > > prototype of a way for famfs to create BPF programs to handle
> > > > > > INTERLEAVED_EXTENT files. We might be ready to show that off in a
> > > > > > couple of weeks, and that might be a way to clear up the
> > > > > > GET_FMAP/IOMAP_BEGIN logjam at last.
> > > > >
> > > > > I'd love to learn more about this; happy to do a call if that's a
> > > > > good way to get me briefed.
> > > > >
> > > > > I [generally but not specifically] understand how this could avoid
> > > > > GET_FMAP, but not GET_DAXDEV.
> > > > >
> > > > > But I'm not sure it could (or should) avoid dax_iomap_rw() and
> > > > > dax_iomap_fault(). The thing is that those call my begin() function
> > > > > to resolve an offset in a file to an offset on a daxdev, and then
> > > > > dax completes the fault or memcpy. In that dance, famfs never knows
> > > > > the kernel address of the memory at all (also true of xfs in fs-dax
> > > > > mode, unless that's changed fairly recently). I think that's a pretty
> > > > > decent interface all in all.
> > > > >
> > > > > Also: dunno whether y'all have looked at the dax patches in the famfs
> > > > > series, but the solution to working with Alistair's folio-ification
> > > > > and cleanup of the dax layer (which set me back months) was to create
> > > > > drivers/dax/fsdev.c, which, when bound to a daxdev in place of
> > > > > drivers/dax/device.c, configures folios & pages compatibly with
> > > > > fs-dax. So I kinda think I need the dax_iomap* interface.
> > > > >
> > > > > As usual, if I'm overlooking something let me know...
> > > >
> > > > Hi John,
> > > >
> > > > The conversation started [1] on Darrick's containerization patchset
> > > > about using bpf to a) avoid extra requests / context switching for
> > > > ->iomap_begin and ->iomap_end calls and b) offload what would
> > > > otherwise have to be hard-coded kernel logic into userspace, which
> > > > gives userspace more flexibility / control with updating the logic and
> > > > is less of a maintenance burden for fuse. There was some musing [2]
> > > > about whether with bpf infrastructure added, it would allow famfs to
> > > > move all famfs-specific logic to userspace/bpf.
> > > >
> > > > I agree that it makes sense for famfs to go through dax iomap
> > > > interfaces. imo it seems cleanest if fuse has a generic iomap
> > > > interface with iomap dax going through that plumbing, and any
> > > > famfs-specific logic that would be needed beyond that (eg computing
> > > > the interleaved mappings) being moved to custom famfs bpf programs. I
> > > > started trying to implement this yesterday afternoon because I wanted
> > > > to make sure it would actually be doable for the famfs logic before
> > > > bringing it up and I didn't want to derail your project. So far I only
> > > > have the general iomap interface for fuse added with dax operations
> > > > going through dax_iomap* and haven't tried out integrating the famfs
> > > > GET_FMAP/GET_DAXDEV bpf program part yet but I'm planning/hoping to
> > > > get to that early next week. The work I did with Darrick this week was
> > > > on getting a server's bpf programs hooked up to fuse through bpf links
> > > > and Darrick has fleshed that out and gotten that working now. If it
> > > > turns out famfs can go through a generic iomap fuse plumbing layer,
> > > > I'd be curious to hear your thoughts on which approach you'd prefer.
> > >
> > > I put together a quick prototype to test this out - this is what it
> > > looks like with fuse having a generic iomap interface that supports
> > > dax [1], and the famfs custom logic moved to a bpf program [2]. I
> >
> > The bpf maps that you've used to upload per-inode data into the kernel
> > is a /much/ cleaner method than custom-compiling C into BPF at runtime!
> > You can statically compile the BPF object code into the fuse server,
> > which means that (a) you can take advantage of the bpftool skeletons,
> > and (b) you can in theory vendor-sign the BPF code if and when that
> > becomes a requirement.
> >
> > I think that's way better than having to put vmlinux.h and
> > fuse_iomap_bpf.h on the deployed system. Though there's one hitch in
> > example/Makefile:
> >
> > vmlinux.h:
> > $(BPFTOOL) btf dump file /sys/kernel/btf/vmlinux format c > $@
> >
> > The build system isn't necessarily running the same kernel as the deploy
> > images. It might be for Meta, but it's not unheard of for our build
> > system to be running (say) OL10+UEK8 kernel, but the build target is OL8
> > and UEK7.
> >
> > There doesn't seem to be any standardization across distros for where a
> > vmlinux.h file might be found. Fedora puts it under
> > /usr/src/$unamestuf, Debian puts it in /usr/include/$gcc_triple, and I
> > guess SUSE doesn't ship it at all?
> >
> > That's going to be a headache for deployment as I've been muttering for
> > a couple of weeks now. :(
>
> I don't think this is an issue because bpf does dynamic btf-based
> relocations (CO-RE) at load time [1]. On the target machine, when
> libbpf loads the bpf object it will read the machine's btf and patch
> any offsets in bytecode and load the fixed-up version into the kernel.
> All that's needed on the target machine for CO-RE is
> CONFIG_DEBUG_INFO_BTF=y which is enabled by default on mainstream
> distributions. I think this addresses the deployment headache you've
> been running into?

Not really -- CO-RE does indeed work quite nicely to smooth over layout
changes in C structures between a BPF program and the kernel it's being
loaded into (thanks, whoever came up with that!) but the problem I have
is how you /get/ those definitions into clang in the first place.

I was under the impression from many of the bpf examples that you're
supposed to #include a distro-provided "vmlinux.h", but there doesn't
seem to be a standard way to find that file.  Most -dev packages provide
a pkgconfig file that gives you the appropriate CFLAGS/LDFLAGS to add,
but apparently this is not the case for BPF...?

Perhaps it's the case that distro packages that are building BPF
programs simply add a build dependency on the package providing
vmlinux.h (e.g. Build-Depends: linux-bpf-dev on Debian) and patch in
"CFLAGS=-I/some/path" as needed?

I suppose for a dynamically generated and compiled BPF program, one
could just "bpftool btf dump" the /sys/kernel/btf files, capture the
output, and "#include </dev/fd/XXX>" the results.  Honestly that sounds
better than trusting some weird system package.
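
A rough sketch of what that lookup could look like, in case it helps the
discussion.  This is only an illustration under stated assumptions: the
function name find_vmlinux_dir and the idea of passing candidate
directories as arguments are hypothetical, and the only real command used
is the "bpftool btf dump file ... format c" invocation already quoted from
example/Makefile.  It prints a directory that can be handed to clang via
-I, preferring a packaged vmlinux.h and falling back to the running
kernel's BTF.

```shell
#!/bin/sh
# Hypothetical helper: print a directory containing a usable vmlinux.h.
# Arguments: candidate directories to search, in priority order
# (the caller supplies them; no distro path is assumed here).
find_vmlinux_dir() {
    for dir in "$@"; do
        if [ -f "$dir/vmlinux.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done

    # No packaged copy found: dump the running kernel's BTF as a C header.
    # This describes the build host's kernel, which may differ from the
    # deployment target; CO-RE relocations only help at load time.
    btf=/sys/kernel/btf/vmlinux
    if command -v bpftool >/dev/null 2>&1 && [ -r "$btf" ]; then
        outdir=$(mktemp -d) || return 1
        if bpftool btf dump file "$btf" format c > "$outdir/vmlinux.h"; then
            printf '%s\n' "$outdir"
            return 0
        fi
        rm -rf "$outdir"
    fi

    # Nothing found and nothing could be generated.
    return 1
}
```

A build script might then do something like
clang -O2 -g -target bpf -I"$(find_vmlinux_dir /some/path)" -c prog.bpf.c
where /some/path and prog.bpf.c are placeholders.  Whether the fallback is
acceptable depends on how far the build host's kernel is from the target.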

But maybe dynamic compilation is a totally stupid idea.  I did grow up
in the era of mshtml email wreaking havoc, after all...

--D

> Thanks,
> Joanne
>
> [1] https://docs.ebpf.io/concepts/core/
>
> >
> > Maybe we could reduce the fuse-iomap bpf definitions to use only
> > cardinal types and the types that iomap itself defines. That might not
> > be too hard right now because bpf functions reuse structures from
> > include/uapi/fuse.h, which currently use uint{8,16,32,64}_t. It'll get
> > harder if that __uintXX_t -> __uXX transition actually happens.
> >
> > But getting back to the famfs bpf stuff, I think doing the interleaved
> > mappings via BPF gives the famfs server a lot more flexibility in terms
> > of what it can do when future hardware arrives with even weirder
> > configurations.
> >
> > --D
> >
> > > didn't change much, I just moved around your famfs code to the bpf
> > > side. The kernel side changes are in [3] and the libfuse changes are
> > > in [4].
> > >
> > > For testing out the prototype, I hooked it up to passthrough_hp to
> > > test running the bpf program and verify that it is able to find the
> > > extent from the bpf map. In my opinion, this makes the fuse side
> > > infrastructure cleaner and more extendable for other servers that will
> > > want to go through dax iomap in the future, but I think this also has
> > > a few benefits for famfs. Instead of needing to issue a FUSE_GET_FMAP
> > > request after a file is opened, the server can directly populate the
> > > metadata map from userspace with the mapping info when it processes
> > > the FUSE_OPEN request, which gets rid of the roundtrip cost. The
> > > server can dynamically update the metadata at any time from userspace
> > > if the mapping info needs to change in the future. For setting up the
> > > daxdevs, I moved your logic to the init side, where the server passes
> > > the daxdev info upfront through an IOMAP_CONFIG exchange with the
> > > kernel initializing the daxdevs based off that info. I think this will
> > > also make deploying future updates for famfs easier, as updating the
> > > logic won't need to go through the upstream kernel mailing list
> > > process and deploying updates won't require a new kernel release.
> > >
> > > These are just my two cents based on my (cursory) understanding of
> > > famfs. Just wanted to float this alternative approach in case it's
> > > useful.
> > >
> > > Thanks,
> > > Joanne
> > >
> > > [1] https://github.com/joannekoong/linux/commit/b8f9d284a6955391f00f576d890e1c1ccc943cfd
> > > [2] https://github.com/joannekoong/libfuse/commit/444fa27fa9fd2118a0dc332933197faf9bbf25aa
> > > [3] https://github.com/joannekoong/linux/commits/prototype_generic_iomap_dax/
> > > [4] https://github.com/joannekoong/libfuse/commits/famfs_bpf/
> > >
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#md1b8003a109760d8ee1d5397e053673c1978ed4d
> > > > [2] https://lore.kernel.org/linux-fsdevel/CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com/t/#u
> > > >
> > > > >
> > > > > Regards,
> > > > > John
> > > > >
> > >
>