Linux filesystem development
From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, bernd@bsbernd.com, neal@gompa.dev,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better file IO performance
Date: Tue, 27 Jan 2026 15:21:25 -0800
Message-ID: <20260127232125.GA5966@frogsfrogsfrogs>
In-Reply-To: <CAJnrk1bSVy4=c=N_FfOajs1FE4o8T=Br=jFm7gBDaCGvRpgGVA@mail.gmail.com>

On Tue, Jan 27, 2026 at 11:47:31AM -0800, Joanne Koong wrote:
> On Mon, Jan 26, 2026 at 6:22 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Mon, Jan 26, 2026 at 04:59:16PM -0800, Joanne Koong wrote:
> > > On Tue, Oct 28, 2025 at 5:38 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This series connects fuse (the userspace filesystem layer) to fs-iomap
> > > > to get fuse servers out of the business of handling file I/O themselves.
> > > > By keeping the IO path mostly within the kernel, we can dramatically
> > > > improve the speed of disk-based filesystems.  This enables us to move
> > > > all the filesystem metadata parsing code out of the kernel and into
> > > > userspace, which means that we can containerize them for security
> > > > without losing a lot of performance.
> > >
> > > I haven't looked through how the fuse2fs or fuse4fs servers are
> > > implemented yet (also, could you explain the difference between the
> > > two? Which one should we look at to see how it all ties together?),
> >
> > fuse4fs is a lowlevel fuse server; fuse2fs is a high(?) level fuse
> > server.  fuse4fs is the successor to fuse2fs, at least on Linux and BSD.
> 
> Ah I see, thanks for the explanation. In that case, I'll just look at
> fuse4fs then.
> 
> >
> > > but I wonder if having bpf infrastructure hooked up to fuse would be
> > > especially helpful for what you're doing here with fuse iomap. afaict,
> > > every read/write whether it's buffered or direct will incur at least 1
> > > call to ->iomap_begin() to get the mapping metadata, which will be 2
> > > context-switches (and if the server has ->iomap_end() implemented,
> > > then 2 more context-switches).
> >
> > Yes, I agree that's a lot of context switching for file IO...
> >
> > > But it seems like the logic for retrieving mapping
> > > offsets/lengths/metadata should be pretty straightforward?
> >
> > ...but it gets very cheap if the fuse server can cache mappings in the
> > kernel to avoid all that.  That is, incidentally, what patchset #7
> > implements.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache_2026-01-22
> >
> > > If the extent lookups are table lookups or tree
> > > traversals without complex side effects, then having
> > > ->iomap_begin()/->iomap_end() be executed as a bpf program would avoid
> > > the context switches and allow all the caching logic to be moved from
> > > the kernel to the server-side (eg using bpf maps).
> >
> > Hrmm.  Now that /is/ an interesting proposal.  Does BPF have a data
> > structure that supports interval mappings?  I think the existing bpf map
> 
> Not yet, but I don't see why a b+ tree like data structure couldn't be added.
> Maybe one workaround in the meantime that could work is using a sorted
> array map and doing binary search on that, until interval mappings can
> be natively supported?

I guess, though I already had a C structure to borrow from xfs ;)
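
For readers following along, here is a minimal sketch of the sorted-array
workaround suggested above: keep extents sorted by file offset and binary
search for the one covering a given position.  This is plain userspace C,
not a BPF program and not the in-kernel mapping cache from the patchset;
struct demo_extent and demo_extent_lookup() are hypothetical names made up
for illustration, and the sketch assumes the extents never overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; this is NOT the fuse iomap uapi layout. */
struct demo_extent {
	uint64_t offset;	/* logical file offset, in bytes */
	uint64_t length;	/* length in bytes, nonzero */
	uint64_t addr;		/* device address, in bytes */
};

/*
 * Find the extent containing @pos in an array that is sorted by offset
 * and has no overlapping entries.  Returns NULL if @pos falls in a hole.
 */
static const struct demo_extent *
demo_extent_lookup(const struct demo_extent *map, size_t nr, uint64_t pos)
{
	const struct demo_extent *ext;
	size_t lo = 0, hi = nr;

	assert(map != NULL || nr == 0);

	/* Binary search for the first extent that starts beyond pos. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (map[mid].offset <= pos)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* lo now counts the extents that start at or before pos. */
	if (lo == 0)
		return NULL;

	/* Subtract rather than add so offset + length cannot overflow. */
	ext = &map[lo - 1];
	if (pos - ext->offset >= ext->length)
		return NULL;
	return ext;
}
```

The lookup is O(log n), but inserting or splitting an extent means
shifting the tail of the array, which is one reason a tree is the more
usual choice once mappings change often.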

> > only does key -> value.  Also, is there an upper limit on the size of a
> > map?  You could have hundreds of millions of mappings for a very fragmented
> > regular file.
> 
> If I'm remembering correctly, there's an upper limit on the number of
> map entries, which is bounded by u32

That's problematic, since files can have 64-bit logical block numbers.

> > At one point I suggested to the famfs maintainer that it might be
> > easier/better to implement the interleaved mapping lookups as bpf
> > programs instead of being stuck with a fixed format in the fuse
> > userspace abi, but I don't know if he ever implemented that.
> 
> This seems like a good use case for it too
> >
> > > Is this your
> > > assessment of it as well or do you think the server-side logic for
> > > iomap_begin()/iomap_end() is too complicated to make this realistic?
> > > Asking because I'm curious whether this direction makes sense, not
> > > because I think it would be a blocker for your series.
> >
> > For disk-based filesystems I think it would be difficult to model a bpf
> > program to do mappings, since they can basically point anywhere and be
> > of any size.
> 
> Hmm I'm not familiar enough with disk-based filesystems to know what
> the "point anywhere and be of any size" means. For the mapping stuff,
> doesn't it just point to a block number? Or are you saying the problem
> would be there's too many mappings since a mapping could be any size?

The second -- mappings can be any size, and unprivileged userspace can
control the mappings.

> I was thinking the issue would be more that there might be other logic
> inside ->iomap_begin()/->iomap_end() besides the mapping stuff that
> would need to be done that would be too out-of-scope for bpf. But I
> think I need to read through the fuse4fs stuff to understand more what
> it's doing in those functions.

<nod>

--D

> 
> Thanks,
> Joanne
> 
> >
> > OTOH it would be enormously hilarious to me if one could load a file
> > mapping predictive model into the kernel as a bpf program and use that
> > as a first tier before checking the in-memory btree mapping cache from
> > patchset 7.  Quite a few years ago now there was a FAST paper
> > establishing that even a stupid linear regression model could in theory
> > beat a disk btree lookup.
> >
> > --D
> >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > If you're going to start using this code, I strongly recommend pulling
> > > > from my git trees, which are linked below.
> > > >
> > > > This has been running on the djcloud for months with no problems.  Enjoy!
> > > > Comments and questions are, as always, welcome.
> > > >
> > > > --D
> > > >
> > > > kernel git tree:
> > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-fileio
> > > > ---
> > > > Commits in this patchset:
> > > >  * fuse: implement the basic iomap mechanisms
> > > >  * fuse_trace: implement the basic iomap mechanisms
> > > >  * fuse: make debugging configurable at runtime
> > > >  * fuse: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > >  * fuse_trace: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > >  * fuse: flush events and send FUSE_SYNCFS and FUSE_DESTROY on unmount
> > > >  * fuse: create a per-inode flag for toggling iomap
> > > >  * fuse_trace: create a per-inode flag for toggling iomap
> > > >  * fuse: isolate the other regular file IO paths from iomap
> > > >  * fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > >  * fuse_trace: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > >  * fuse: implement direct IO with iomap
> > > >  * fuse_trace: implement direct IO with iomap
> > > >  * fuse: implement buffered IO with iomap
> > > >  * fuse_trace: implement buffered IO with iomap
> > > >  * fuse: implement large folios for iomap pagecache files
> > > >  * fuse: use an unrestricted backing device with iomap pagecache io
> > > >  * fuse: advertise support for iomap
> > > >  * fuse: query filesystem geometry when using iomap
> > > >  * fuse_trace: query filesystem geometry when using iomap
> > > >  * fuse: implement fadvise for iomap files
> > > >  * fuse: invalidate ranges of block devices being used for iomap
> > > >  * fuse_trace: invalidate ranges of block devices being used for iomap
> > > >  * fuse: implement inline data file IO via iomap
> > > >  * fuse_trace: implement inline data file IO via iomap
> > > >  * fuse: allow more statx fields
> > > >  * fuse: support atomic writes with iomap
> > > >  * fuse_trace: support atomic writes with iomap
> > > >  * fuse: disable direct reclaim for any fuse server that uses iomap
> > > >  * fuse: enable swapfile activation on iomap
> > > >  * fuse: implement freeze and shutdowns for iomap filesystems
> > > > ---
> > > >  fs/fuse/fuse_i.h          |  161 +++
> > > >  fs/fuse/fuse_trace.h      |  939 +++++++++++++++++++
> > > >  fs/fuse/iomap_i.h         |   52 +
> > > >  include/uapi/linux/fuse.h |  219 ++++
> > > >  fs/fuse/Kconfig           |   48 +
> > > >  fs/fuse/Makefile          |    1
> > > >  fs/fuse/backing.c         |   12
> > > >  fs/fuse/dev.c             |   30 +
> > > >  fs/fuse/dir.c             |  120 ++
> > > >  fs/fuse/file.c            |  133 ++-
> > > >  fs/fuse/file_iomap.c      | 2230 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/fuse/inode.c           |  162 +++
> > > >  fs/fuse/iomode.c          |    2
> > > >  fs/fuse/trace.c           |    2
> > > >  14 files changed, 4056 insertions(+), 55 deletions(-)
> > > >  create mode 100644 fs/fuse/iomap_i.h
> > > >  create mode 100644 fs/fuse/file_iomap.c
> > > >
> > >
