From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, bernd@bsbernd.com, neal@gompa.dev,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better file IO performance
Date: Tue, 27 Jan 2026 15:21:25 -0800
Message-ID: <20260127232125.GA5966@frogsfrogsfrogs>
In-Reply-To: <CAJnrk1bSVy4=c=N_FfOajs1FE4o8T=Br=jFm7gBDaCGvRpgGVA@mail.gmail.com>

On Tue, Jan 27, 2026 at 11:47:31AM -0800, Joanne Koong wrote:
> On Mon, Jan 26, 2026 at 6:22 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Mon, Jan 26, 2026 at 04:59:16PM -0800, Joanne Koong wrote:
> > > On Tue, Oct 28, 2025 at 5:38 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This series connects fuse (the userspace filesystem layer) to fs-iomap
> > > > to get fuse servers out of the business of handling file I/O themselves.
> > > > By keeping the IO path mostly within the kernel, we can dramatically
> > > > improve the speed of disk-based filesystems. This enables us to move
> > > > all the filesystem metadata parsing code out of the kernel and into
> > > > userspace, which means that we can containerize them for security
> > > > without losing a lot of performance.
> > >
> > > I haven't looked through how the fuse2fs or fuse4fs servers are
> > > implemented yet (also, could you explain the difference between the
> > > two? Which one should we look at to see how it all ties together?),
> >
> > fuse4fs is a lowlevel fuse server; fuse2fs is a high(?) level fuse
> > server. fuse4fs is the successor to fuse2fs, at least on Linux and BSD.
>
> Ah I see, thanks for the explanation. In that case, I'll just look at
> fuse4fs then.
>
> >
> > > but I wonder if having bpf infrastructure hooked up to fuse would be
> > > especially helpful for what you're doing here with fuse iomap. afaict,
> > > every read/write whether it's buffered or direct will incur at least 1
> > > call to ->iomap_begin() to get the mapping metadata, which will be 2
> > > context-switches (and if the server has ->iomap_end() implemented,
> > > then 2 more context-switches).
> >
> > Yes, I agree that's a lot of context switching for file IO...
> >
> > > But it seems like the logic for retrieving mapping
> > > offsets/lengths/metadata should be pretty straightforward?
> >
> > ...but it gets very cheap if the fuse server can cache mappings in the
> > kernel to avoid all that. That is, incidentally, what patchset #7
> > implements.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache_2026-01-22
> >
> > > If the extent lookups are table lookups or tree
> > > traversals without complex side effects, then having
> > > ->iomap_begin()/->iomap_end() be executed as a bpf program would avoid
> > > the context switches and allow all the caching logic to be moved from
> > > the kernel to the server-side (eg using bpf maps).
> >
> > Hrmm. Now that /is/ an interesting proposal. Does BPF have a data
> > structure that supports interval mappings? I think the existing bpf map
>
> Not yet but I don't see why a b+ tree like data structure couldn't be added.
> Maybe one workaround in the meantime that could work is using a sorted
> array map and doing binary search on that, until interval mappings can
> be natively supported?

I guess, though I already had a C structure to borrow from xfs ;)

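
For illustration only (this sketch is not taken from the thread or from
the patchset): the "sorted array plus binary search" workaround
suggested above could look roughly like the following in plain C.  The
struct extent_map layout and the extent_lookup() name are hypothetical,
and the code assumes the array is sorted by file offset, has no
overlapping entries, and has no zero-length entries.  It is ordinary
userspace C, not a BPF program, so it says nothing about what the
verifier would accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one contiguous file range and where it lives. */
struct extent_map {
	uint64_t file_off;	/* first byte of the file range */
	uint64_t len;		/* length in bytes, assumed nonzero */
	uint64_t dev_addr;	/* byte address on the backing device */
};

/*
 * Return the extent containing @pos, or NULL if @pos falls in a hole.
 * @map must be sorted by file_off and must not contain overlaps.
 */
static const struct extent_map *
extent_lookup(const struct extent_map *map, size_t nr, uint64_t pos)
{
	size_t lo = 0, hi = nr;

	assert(nr == 0 || map != NULL);

	/* Binary search for the first extent that starts after @pos. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (map[mid].file_off <= pos)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Every extent starts after @pos, so nothing can contain it. */
	if (lo == 0)
		return NULL;

	/* The candidate is the last extent starting at or before @pos. */
	const struct extent_map *m = &map[lo - 1];

	/* No underflow here: m->file_off <= pos by construction. */
	return (pos - m->file_off < m->len) ? m : NULL;
}
```

Each lookup costs O(log n) comparisons, but inserting or splitting a
mapping means shifting array entries, which is one reason a tree-shaped
structure may be the better fit for heavily fragmented files.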
> > only does key -> value. Also, is there an upper limit on the size of a
> > map? You could have hundreds of millions of mappings for a very fragmented
> > regular file.
>
> If I'm remembering correctly, there's an upper limit on the number of
> map entries, which is bounded by u32

That's problematic, since files can have 64-bit logical block numbers.

> > At one point I suggested to the famfs maintainer that it might be
> > easier/better to implement the interleaved mapping lookups as bpf
> > programs instead of being stuck with a fixed format in the fuse
> > userspace abi, but I don't know if he ever implemented that.
>
> This seems like a good use case for it too
> >
> > > Is this your
> > > assessment of it as well or do you think the server-side logic for
> > > iomap_begin()/iomap_end() is too complicated to make this realistic?
> > > Asking because I'm curious whether this direction makes sense, not
> > > because I think it would be a blocker for your series.
> >
> > For disk-based filesystems I think it would be difficult to model a bpf
> > program to do mappings, since they can basically point anywhere and be
> > of any size.
>
> Hmm I'm not familiar enough with disk-based filesystems to know what
> the "point anywhere and be of any size" means. For the mapping stuff,
> doesn't it just point to a block number? Or are you saying the problem
> would be there's too many mappings since a mapping could be any size?

The second -- mappings can be any size, and unprivileged userspace can
control the mappings.

> I was thinking the issue would be more that there might be other logic
> inside ->iomap_begin()/->iomap_end() besides the mapping stuff that
> would need to be done that would be too out-of-scope for bpf. But I
> think I need to read through the fuse4fs stuff to understand more what
> it's doing in those functions.

<nod>

--D

>
> Thanks,
> Joanne
>
> >
> > OTOH it would be enormously hilarious to me if one could load a file
> > mapping predictive model into the kernel as a bpf program and use that
> > as a first tier before checking the in-memory btree mapping cache from
> > patchset 7. Quite a few years ago now there was a FAST paper
> > establishing that even a stupid linear regression model could in theory
> > beat a disk btree lookup.
> >
> > --D
> >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > If you're going to start using this code, I strongly recommend pulling
> > > > from my git trees, which are linked below.
> > > >
> > > > This has been running on the djcloud for months with no problems. Enjoy!
> > > > Comments and questions are, as always, welcome.
> > > >
> > > > --D
> > > >
> > > > kernel git tree:
> > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-fileio
> > > > ---
> > > > Commits in this patchset:
> > > > * fuse: implement the basic iomap mechanisms
> > > > * fuse_trace: implement the basic iomap mechanisms
> > > > * fuse: make debugging configurable at runtime
> > > > * fuse: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > > * fuse_trace: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > > * fuse: flush events and send FUSE_SYNCFS and FUSE_DESTROY on unmount
> > > > * fuse: create a per-inode flag for toggling iomap
> > > > * fuse_trace: create a per-inode flag for toggling iomap
> > > > * fuse: isolate the other regular file IO paths from iomap
> > > > * fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > > * fuse_trace: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > > * fuse: implement direct IO with iomap
> > > > * fuse_trace: implement direct IO with iomap
> > > > * fuse: implement buffered IO with iomap
> > > > * fuse_trace: implement buffered IO with iomap
> > > > * fuse: implement large folios for iomap pagecache files
> > > > * fuse: use an unrestricted backing device with iomap pagecache io
> > > > * fuse: advertise support for iomap
> > > > * fuse: query filesystem geometry when using iomap
> > > > * fuse_trace: query filesystem geometry when using iomap
> > > > * fuse: implement fadvise for iomap files
> > > > * fuse: invalidate ranges of block devices being used for iomap
> > > > * fuse_trace: invalidate ranges of block devices being used for iomap
> > > > * fuse: implement inline data file IO via iomap
> > > > * fuse_trace: implement inline data file IO via iomap
> > > > * fuse: allow more statx fields
> > > > * fuse: support atomic writes with iomap
> > > > * fuse_trace: support atomic writes with iomap
> > > > * fuse: disable direct reclaim for any fuse server that uses iomap
> > > > * fuse: enable swapfile activation on iomap
> > > > * fuse: implement freeze and shutdowns for iomap filesystems
> > > > ---
> > > > fs/fuse/fuse_i.h | 161 +++
> > > > fs/fuse/fuse_trace.h | 939 +++++++++++++++++++
> > > > fs/fuse/iomap_i.h | 52 +
> > > > include/uapi/linux/fuse.h | 219 ++++
> > > > fs/fuse/Kconfig | 48 +
> > > > fs/fuse/Makefile | 1
> > > > fs/fuse/backing.c | 12
> > > > fs/fuse/dev.c | 30 +
> > > > fs/fuse/dir.c | 120 ++
> > > > fs/fuse/file.c | 133 ++-
> > > > fs/fuse/file_iomap.c | 2230 +++++++++++++++++++++++++++++++++++++++++++++
> > > > fs/fuse/inode.c | 162 +++
> > > > fs/fuse/iomode.c | 2
> > > > fs/fuse/trace.c | 2
> > > > 14 files changed, 4056 insertions(+), 55 deletions(-)
> > > > create mode 100644 fs/fuse/iomap_i.h
> > > > create mode 100644 fs/fuse/file_iomap.c
> > > >
> > >