Linux filesystem development
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, bernd@bsbernd.com, neal@gompa.dev,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 01/31] fuse: implement the basic iomap mechanisms
Date: Wed, 21 Jan 2026 16:34:01 -0800	[thread overview]
Message-ID: <20260122003401.GL5966@frogsfrogsfrogs> (raw)
In-Reply-To: <CAJnrk1aa7eyFLm30wiR9fVuwW6RKKniuFmFUSHhs7NVXKJVKtQ@mail.gmail.com>

On Wed, Jan 21, 2026 at 04:06:39PM -0800, Joanne Koong wrote:
> On Wed, Jan 21, 2026 at 2:45 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Wed, Jan 21, 2026 at 11:34:24AM -0800, Joanne Koong wrote:
> > > On Tue, Oct 28, 2025 at 5:45 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > >
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > >
> > > > Implement functions to enable upcalling of iomap_begin and iomap_end to
> > > > userspace fuse servers.
> > > >
> > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > ---
> > > >  fs/fuse/fuse_i.h          |   22 ++
> > > >  fs/fuse/iomap_i.h         |   36 ++++
> > > >  include/uapi/linux/fuse.h |   90 +++++++++
> > > >  fs/fuse/Kconfig           |   32 +++
> > > >  fs/fuse/Makefile          |    1
> > > >  fs/fuse/file_iomap.c      |  434 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/fuse/inode.c           |    8 +
> > > >  7 files changed, 621 insertions(+), 2 deletions(-)
> > > >  create mode 100644 fs/fuse/iomap_i.h
> > > >  create mode 100644 fs/fuse/file_iomap.c
> > > >
> > > >
> > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > index 7c7d255d817f1e..45be59df7ae592 100644
> > > > --- a/fs/fuse/fuse_i.h
> > > > +++ b/fs/fuse/fuse_i.h
> > > > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > > > index 18713cfaf09171..7d709cf12b41a7 100644
> > > > --- a/include/uapi/linux/fuse.h
> > > > +++ b/include/uapi/linux/fuse.h
> > > > @@ -240,6 +240,9 @@
> > > >   *  - add FUSE_COPY_FILE_RANGE_64
> > > >   *  - add struct fuse_copy_file_range_out
> > > >   *  - add FUSE_NOTIFY_PRUNE
> > > > + *
> > > > + *  7.99
> > >
> > > Should this be changed to something like 7.46 now that this patch is
> > > submitted for merging into the tree?
> >
> > When review of this patchset nears completion I'll change the 99s to
> > 46 or whatever the fuse/libfuse minor version happens to be at that
> > point.
> 
> Sounds good.

I'll add another XXX comment here to increase the likelihood it doesn't
get missed.

> >
> > Nobody's touched this series since 29 October (during 6.19 development)
> > and I've been busy with xfs_healer so I'm not submitting this for 7.0
> > either.
> >
> > > > + *  - add FUSE_IOMAP and iomap_{begin,end,ioend} for regular file operations
> > > >   */
> > > >
> > > > +/* fuse-specific mapping type indicating that writes use the read mapping */
> > > > +#define FUSE_IOMAP_TYPE_PURE_OVERWRITE (255)
> > > > +
> > > > +#define FUSE_IOMAP_DEV_NULL            (0U)    /* null device cookie */
> > > > +
> > > > +/* mapping flags passed back from iomap_begin; see corresponding IOMAP_F_ */
> > > > +#define FUSE_IOMAP_F_NEW               (1U << 0)
> > > > +#define FUSE_IOMAP_F_DIRTY             (1U << 1)
> > > > +#define FUSE_IOMAP_F_SHARED            (1U << 2)
> > > > +#define FUSE_IOMAP_F_MERGED            (1U << 3)
> > > > +#define FUSE_IOMAP_F_BOUNDARY          (1U << 4)
> > > > +#define FUSE_IOMAP_F_ANON_WRITE                (1U << 5)
> > > > +#define FUSE_IOMAP_F_ATOMIC_BIO                (1U << 6)
> > >
> > > Do you think it makes sense to have the fuse iomap constants mirror
> > > the in-kernel iomap ones? Maybe I'm mistaken but it seems like the
> > > fuse iomap capabilities won't diverge too much from fs/iomap ones? I
> > > like that if they're mirrored, then it makes it simpler instead of
> > > needing to convert back and forth.
> >
> > "Mirrored"?  As in, having the define use a symbol:
> >
> > #define FUSE_IOMAP_F_NEW                IOMAP_F_NEW
> >
> > instead of defining it to be a specific numerical constant like it is
> > here?
> 
> I was thinking keeping it like it is with defining it to a specific
> numerical constant, but having the number correspond to the number
> iomap.h uses and having static asserts to ensure they match, and then
> being able to just pass struct fuse_iomap_io's flags directly to
> iomap->flags and vice versa. But I guess the iomap constants could
> change at any time since it's not a uapi.

Yep.  iomap's api stability is only guaranteed until the mtime changes
on include/linux/iomap.h.

I actually /did/ do the static assert thing earlier in the lifetime of
this patchset, but then I godbolted what the conversion functions were
actually doing and observed that gcc and clang are smart enough to
collapse all the C code into the appropriate masking if you compile with
-O2.

<snip>

> > > > +struct fuse_iomap_io {
> > > > +       uint64_t offset;        /* file offset of mapping, bytes */
> > > > +       uint64_t length;        /* length of mapping, bytes */
> > > > +       uint64_t addr;          /* disk offset of mapping, bytes */
> > > > +       uint16_t type;          /* FUSE_IOMAP_TYPE_* */
> > > > +       uint16_t flags;         /* FUSE_IOMAP_F_* */
> > > > +       uint32_t dev;           /* device cookie */
> > >
> > > Do you think it's a good idea to add a reserved field here in case we
> > > end up needing it in the future?
> >
> > I'm open to the idea of pre-padding the structs, though that's extra
> > copy overhead until they get used for something.
> 
> Bernd would know better than me on this, but iirc, fuse generally
> tries to prepad structs to avoid having to deal with backwards
> compatibility issues if future fields get added.

<nod> for xfs I've generally added one u64 unless two would round us up
to a cacheline... or just defined the struct size to be something insane
like 512 bytes.

> >
> > Does that fuse-iouring-zerocopy patchset that you're working on enable
> > the kernel to avoid copying fuse command data around?  I haven't read it
> > in sufficient (or any) detail to know the answer to that question.
> 
> No, only the payload bypasses the copy. All the header stuff would
> have to get copied out to the ring.

D'oh! :/

> >
> > Second: how easy is it to send a variable sized fuse command to
> > userspace?  It looks like some commands like FUSE_WRITE do things like:
> >
> >         if (ff->fm->fc->minor < 9)
> >                 args->in_args[0].size = FUSE_COMPAT_WRITE_IN_SIZE;
> >         else
> >                 args->in_args[0].size = sizeof(ia->write.in);
> >         args->in_args[0].value = &ia->write.in;
> >         args->in_args[1].size = count;
> >
> > Which means that future expansion can (in theory) bump the minor version
> > and send larer commands.
> >
> > It also looks like the kernel can support receiving variable-sized
> > responses, like FUSE_READ does:
> >
> >         args->out_argvar = true;
> >         args->out_numargs = 1;
> >         args->out_args[0].size = count;
> >
> > I think this means that if we ever needed to expand the _out struct to
> > allow the fuse server to send back a more lengthy response, we could
> > potentially do that without needing a minor protocol version bump.
> 
> I'm not sure, Bernd or Miklos would know more, but my general
> impression has been that we try to avoid doing the FUSE_COMPAT_ stuff
> if we can.

<nod> revving the minor protocol version will take time to propagate.

<snip>

> > > > +};
> > > > +
> > > > +struct fuse_iomap_end_in {
> > > > +       uint32_t opflags;       /* FUSE_IOMAP_OP_* */
> > > > +       uint32_t reserved;      /* zero */
> > > > +       uint64_t attr_ino;      /* matches fuse_attr:ino */
> > > > +       uint64_t pos;           /* file position, in bytes */
> > > > +       uint64_t count;         /* operation length, in bytes */
> > > > +       int64_t written;        /* bytes processed */
> > >
> > > On the fs/iomap side, I see that written is passed through by
> > > iomap_iter() to ->iomap_end through 'ssize_t advanced' but it's not
> > > clear to me why advanced needs to be signed. I think it used to also
> > > represent the error status, but it looks like now that's represented
> > > through iter->status and 'advanced' strictly reflects the number of
> > > bytes written. As such, do you think it makes sense to change
> > > 'advanced' to loff_t and have written be uint64_t instead?
> >
> > Not quite -- back in the bad old days, iomap_iter::processed was a s64
> > value that the iteration loop had to set to one of:
> >
> >  * a positive number for positive progress
> >  * zero to stop the iteration
> >  * a negative errno to fail out
> >
> > Nowadays we just move iomap_iter::pos forward via iomap_iter_advance or
> > set status to a negative number to end the iteration.

Slight inaccuracy: one sets iter->status to a negative number to fail
out of the iteration.  To end early, they should call iomap_iter without
calling iomap_iter_advance.

> > So yes, I think @advanced should be widened to 64-bits since iomap
> > operations can jump more than 2GB per iter step.  Practically speaking I
> > think this hasn't yet been a problem because the only operations that
> > can do that (fiemap, seek, swap) also don't have any client filesystems
> > that implement iomap_end; or they do but never send mappings large
> > enough to cause problems.
> >
> > iomap iters can't go backwards so @advanced could be u64 as well.
> >
> > Also the name of the ->iomap_end parameter could be changed to
> > "advanced" because iomap_end could in theory be called for any
> > operation, not just writes.  That's a throwback to the days when the
> > iomap code was just part of xfs.  It also is an unsigned quantity.
> 
> That makes sense, thanks for the context.

<nod>

<snip>

> > > > +/* Convert a mapping from the server into something the kernel can use */
> > > > +static inline void fuse_iomap_from_server(struct inode *inode,
> > >
> > > Maybe worth adding a const in front of struct inode?
> >
> > It can go away in a patch or two when we wire up bdev support.
> >
> > Though considering that fuse_iomap_enabled returns false all the way to
> > the end of the patchset I guess I could just set bdev to null and skip
> > passing in the inode at all.

Done.

> > > > +                                         struct iomap *iomap,
> > > > +                                         const struct fuse_iomap_io *fmap)
> > > > +{
> > > > +       iomap->addr = fmap->addr;
> > > > +       iomap->offset = fmap->offset;
> > > > +       iomap->length = fmap->length;
> > > > +       iomap->type = fuse_iomap_type_from_server(fmap->type);
> > > > +       iomap->flags = fuse_iomap_flags_from_server(fmap->flags);
> > > > +       iomap->bdev = inode->i_sb->s_bdev; /* XXX */
> > > > +}
> > > > +
> > > > +/* Convert a mapping from the kernel into something the server can use */
> > > > +static inline void fuse_iomap_to_server(struct fuse_iomap_io *fmap,
> > > > +                                       const struct iomap *iomap)
> > > > +{
> > > > +       fmap->addr = FUSE_IOMAP_NULL_ADDR; /* XXX */
> > > > +       fmap->offset = iomap->offset;
> > > > +       fmap->length = iomap->length;
> > > > +       fmap->type = fuse_iomap_type_to_server(iomap->type);
> > > > +       fmap->flags = fuse_iomap_flags_to_server(iomap->flags);
> > > > +       fmap->dev = FUSE_IOMAP_DEV_NULL; /* XXX */
> > >
> > > AFAICT, this only gets used for sending the FUSE_IOMAP_END request. Is
> > > passing the iomap->addr to fmap->addr and inode->i_sb->s_bdev to
> > > fmap->dev not useful to the server here?
> >
> > So far the only fields I've needed in fuse4fs are the
> > offset/count/written fields as provided by iomap_iter, and the flags
> > field from the mapping.  The addr field isn't necessary for fuse4fs
> > because the fuse server would know if the mapping had changed.  OTOH
> > it's probably harmless to send it along.
> >
> > Hrm.  I probably need a way to look up the backing_id from the iomap
> > bdev.
> >
> > Looking further ahead at the ioend patch, I just realized that iomap
> > ioends can tell you the new address of a write-append operation but they
> > don't tell you which device.  I guess you can read that from the
> > ioend->io_bio.bi_bdev.
> >
> > > Also, did you mean to leave in the /* XXX */ comments?
> >
> > Yes, because they're a reminder to come back and check if I /ever/
> > needed them.
> 
> Makes sense, seems like you're planning to remove them when the patch
> is ready to merge, if I understand correctly.

Yeah.  I also fixed this fuse_iomap_to_server to set fmap->dev.

<snip>

> > > > +static int fuse_iomap_end(struct inode *inode, loff_t pos, loff_t count,
> > > > +                         ssize_t written, unsigned opflags,
> > > > +                         struct iomap *iomap)
> > > > +{
> > > > +       struct fuse_inode *fi = get_fuse_inode(inode);
> > > > +       struct fuse_mount *fm = get_fuse_mount(inode);
> > > > +       int err = 0;
> > > > +
> > > > +       if (fuse_should_send_iomap_end(iomap, opflags, count, written)) {
> > > > +               struct fuse_iomap_end_in inarg = {
> > > > +                       .opflags = fuse_iomap_op_to_server(opflags),
> > > > +                       .attr_ino = fi->orig_ino,
> > > > +                       .pos = pos,
> > > > +                       .count = count,
> > > > +                       .written = written,
> > > > +               };
> > > > +               FUSE_ARGS(args);
> > > > +
> > > > +               fuse_iomap_to_server(&inarg.map, iomap);
> > > > +
> > > > +               args.opcode = FUSE_IOMAP_END;
> > > > +               args.nodeid = get_node_id(inode);
> > >
> > > Just curious about this - does it make sense to set args.force here
> > > for this opcode? It seems like it serves the same sort of purpose a
> > > flush request (which sets args.force) does?
> >
> > What does args.force do?  There's no documentation of what behaviors
> > these fields are supposed to trigger.
> 
> The args.force forces the request to be sent even if it gets
> interrupted by a signal. It'll also bypass the fuse_block_alloc()
> check when sending the request, but I don't think that's too relevant
> to this case.

Hrm.  For iomap_begin I think it's ok if a signal kills the IO
operation.  For iomap_end ... I guess we really should force the command
out to the server in case it needs to clean up, even if the user is
hammering on kill -9.

For iomap_ioend the same probably applies, but it's called from
workqueue context so there's not going to be a fatal signal.  But maybe
we should do that, just in case someone develops motivation to make
directio completions run in the caller's context or something.

--D

  reply	other threads:[~2026-01-22  0:34 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20251029002755.GK6174@frogsfrogsfrogs>
     [not found] ` <176169810144.1424854.11439355400009006946.stgit@frogsfrogsfrogs>
     [not found]   ` <176169810371.1424854.3010195280915622081.stgit@frogsfrogsfrogs>
2026-01-21 19:34     ` [PATCH 01/31] fuse: implement the basic iomap mechanisms Joanne Koong
2026-01-21 22:45       ` Darrick J. Wong
2026-01-22  0:06         ` Joanne Koong
2026-01-22  0:34           ` Darrick J. Wong [this message]
2026-02-05 19:22     ` Chris Mason
2026-02-05 23:31       ` Darrick J. Wong
     [not found]   ` <176169810415.1424854.10373764649459618752.stgit@frogsfrogsfrogs>
2026-01-21 23:42     ` [PATCH 03/31] fuse: make debugging configurable at runtime Joanne Koong
2026-01-22  0:02       ` Darrick J. Wong
2026-01-22  0:23         ` Joanne Koong
2026-01-22  0:40           ` Darrick J. Wong
     [not found]   ` <176169810502.1424854.13869957103489591272.stgit@frogsfrogsfrogs>
2026-01-22  1:13     ` [PATCH 07/31] fuse: create a per-inode flag for toggling iomap Joanne Koong
2026-01-22 22:22       ` Darrick J. Wong
2026-01-23 18:05         ` Joanne Koong
2026-01-24 16:54           ` Darrick J. Wong
2026-01-27 23:33             ` Darrick J. Wong
     [not found]   ` <176169810568.1424854.4073875923015322741.stgit@frogsfrogsfrogs>
2026-01-22  2:07     ` [PATCH 10/31] fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE} Joanne Koong
2026-01-22 22:31       ` Darrick J. Wong
     [not found]   ` <176169810700.1424854.5753715202341698632.stgit@frogsfrogsfrogs>
2026-01-23 21:50     ` [PATCH 16/31] fuse: implement large folios for iomap pagecache files Joanne Koong
     [not found]   ` <176169810721.1424854.6150447623894591900.stgit@frogsfrogsfrogs>
2026-01-26 22:03     ` [PATCH 17/31] fuse: use an unrestricted backing device with iomap pagecache io Joanne Koong
2026-01-26 23:55       ` Darrick J. Wong
2026-01-27  1:35         ` Joanne Koong
2026-01-27  2:09           ` Darrick J. Wong
2026-01-27 18:04             ` Joanne Koong
2026-01-27 23:37               ` Darrick J. Wong
2026-01-27  0:59   ` [PATCHSET v6 4/8] fuse: allow servers to use iomap for better file IO performance Joanne Koong
2026-01-27  2:22     ` Darrick J. Wong
2026-01-27 19:47       ` Joanne Koong
2026-01-27 23:21         ` Darrick J. Wong
2026-01-28  0:10           ` Joanne Koong
2026-01-28  0:34             ` Darrick J. Wong
2026-01-29  1:12               ` Joanne Koong
2026-01-29 20:02                 ` Darrick J. Wong
2026-01-29 22:41                   ` Darrick J. Wong
2026-01-29 22:50                   ` Joanne Koong
2026-01-29 23:12                     ` Darrick J. Wong
     [not found]   ` <176169810980.1424854.10557015500766654898.stgit@frogsfrogsfrogs>
2026-02-05 18:57     ` [PATCH 29/31] fuse: disable direct reclaim for any fuse server that uses iomap Chris Mason
2026-02-06  4:25       ` Darrick J. Wong
     [not found]   ` <176169810874.1424854.5037707950055785011.stgit@frogsfrogsfrogs>
2026-02-05 19:01     ` [PATCH 24/31] fuse: implement inline data file IO via iomap Chris Mason
2026-02-06  2:27       ` Darrick J. Wong
     [not found]   ` <176169810765.1424854.10969346031644824992.stgit@frogsfrogsfrogs>
2026-02-05 19:07     ` [PATCH 19/31] fuse: query filesystem geometry when using iomap Chris Mason
2026-02-06  2:17       ` Darrick J. Wong
     [not found]   ` <176169810656.1424854.15239592653019383193.stgit@frogsfrogsfrogs>
2026-02-05 19:12     ` [PATCH 14/31] fuse: implement buffered IO with iomap Chris Mason
2026-02-06  2:14       ` Darrick J. Wong
     [not found]   ` <176169810634.1424854.13084435884326863405.stgit@frogsfrogsfrogs>
2026-02-05 19:16     ` [PATCH 13/31] fuse_trace: implement direct " Chris Mason
2026-02-06  2:12       ` Darrick J. Wong
     [not found]   ` <176169810612.1424854.16053093294573829123.stgit@frogsfrogsfrogs>
2026-01-23 18:56     ` [PATCH 12/31] fuse: " Joanne Koong
2026-01-26 23:46       ` Darrick J. Wong
2026-02-05 19:19     ` Chris Mason
2026-02-06  2:08       ` Darrick J. Wong
2026-02-06  2:52         ` Chris Mason
2026-02-06  5:08           ` Darrick J. Wong
2026-02-06 14:27             ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260122003401.GL5966@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bernd@bsbernd.com \
    --cc=joannelkoong@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=neal@gompa.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox