From: Joanne Koong <joannelkoong@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: brauner@kernel.org, miklos@szeredi.hu, hch@infradead.org,
linux-block@vger.kernel.org, gfs2@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-doc@vger.kernel.org, hsiangkao@linux.alibaba.com,
kernel-team@meta.com
Subject: Re: [PATCH v4 09/15] iomap: add caller-provided callbacks for read and readahead
Date: Wed, 24 Sep 2025 11:18:14 -0700 [thread overview]
Message-ID: <CAJnrk1bYAaJofNBpYYKB2fWGVw-9BPrOUBy_ivmfnjR=49BmNQ@mail.gmail.com> (raw)
In-Reply-To: <20250924002654.GM1587915@frogsfrogsfrogs>
On Tue, Sep 23, 2025 at 5:26 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Mon, Sep 22, 2025 at 05:23:47PM -0700, Joanne Koong wrote:
> > Add caller-provided callbacks for read and readahead so that it can be
> > used generically, especially by filesystems that are not block-based.
> >
> > In particular, this:
> > * Modifies the read and readahead interface to take in a
> > struct iomap_read_folio_ctx that is publicly defined as:
> >
> > struct iomap_read_folio_ctx {
> > const struct iomap_read_ops *ops;
> > struct folio *cur_folio;
> > struct readahead_control *rac;
> > void *read_ctx;
> > };
>
> I'm starting to wonder if struct iomap_read_ops should contain a struct
> iomap_ops object, but that might result in more churn through this
> patchset.
>
> <shrug> What do you think?
Lol I thought the same thing a while back for "struct iomap_write_ops"
but I don't think Christoph liked the idea [1]
[1] https://lore.kernel.org/linux-fsdevel/20250618044344.GE28041@lst.de/
>
> >
> > where struct iomap_read_ops is defined as:
> >
> > struct iomap_read_ops {
> > int (*read_folio_range)(const struct iomap_iter *iter,
> > struct iomap_read_folio_ctx *ctx,
> > size_t len);
> > void (*read_submit)(struct iomap_read_folio_ctx *ctx);
> > };
> >
> > read_folio_range() reads in the folio range and is required by the
> > caller to provide. read_submit() is optional and is used for
> > submitting any pending read requests.
> >
> > * Modifies existing filesystems that use iomap for read and readahead to
> > use the new API, through the new statically inlined helpers
> > iomap_bio_read_folio() and iomap_bio_readahead(). There is no change
> > in functinality for those filesystems.
>
> Nit: functionality
Thanks, will fix this!
>
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > .../filesystems/iomap/operations.rst | 45 ++++++++++++
> > block/fops.c | 5 +-
> > fs/erofs/data.c | 5 +-
> > fs/gfs2/aops.c | 6 +-
> > fs/iomap/buffered-io.c | 68 +++++++++++--------
> > fs/xfs/xfs_aops.c | 5 +-
> > fs/zonefs/file.c | 5 +-
> > include/linux/iomap.h | 62 ++++++++++++++++-
> > 8 files changed, 158 insertions(+), 43 deletions(-)
> >
> > diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> > index 067ed8e14ef3..dbb193415c0e 100644
> > --- a/Documentation/filesystems/iomap/operations.rst
> > +++ b/Documentation/filesystems/iomap/operations.rst
> > @@ -135,6 +135,29 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
> >
> > * ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
> >
> > +``struct iomap_read_ops``
> > +--------------------------
> > +
> > +.. code-block:: c
> > +
> > + struct iomap_read_ops {
> > + int (*read_folio_range)(const struct iomap_iter *iter,
> > + struct iomap_read_folio_ctx *ctx, size_t len);
> > + void (*submit_read)(struct iomap_read_folio_ctx *ctx);
> > + };
> > +
> > +iomap calls these functions:
> > +
> > + - ``read_folio_range``: Called to read in the range. This must be provided
> > + by the caller. The caller is responsible for calling
> > + iomap_start_folio_read() and iomap_finish_folio_read() before and after
> > + reading in the folio range. This should be done even if an error is
> > + encountered during the read. This returns 0 on success or a negative error
> > + on failure.
> > +
> > + - ``submit_read``: Submit any pending read requests. This function is
> > + optional.
> > +
> > Internal per-Folio State
> > ------------------------
> >
> > @@ -182,6 +205,28 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
> > The pagecache takes whatever locks it needs before calling the
> > filesystem.
> >
> > +Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
> > +iomap_read_folio_ctx``:
> > +
> > +.. code-block:: c
> > +
> > + struct iomap_read_folio_ctx {
> > + const struct iomap_read_ops *ops;
> > + struct folio *cur_folio;
> > + struct readahead_control *rac;
> > + void *read_ctx;
> > + };
> > +
> > +``iomap_readahead`` must set:
> > + * ``ops->read_folio_range()`` and ``rac``
> > +
> > +``iomap_read_folio`` must set:
> > + * ``ops->read_folio_range()`` and ``cur_folio``
>
> Hrmm, so we're multiplexing read and readahead through the same
> iomap_read_folio_ctx. Is there ever a case where cur_folio and rac can
> both be used by the underlying machinery? I think the answer to that
> question is "no" but I don't think the struct definition makes that
> obvious.
In the ->read_folio_range() callback, both rac and cur_folio are used
for readahead, but in passing in the "struct iomap_read_folio_ctx" to
the main iomap_read_folio()/iomap_readahead() entrypoint, no both rac
and cur_folio do not get set at the same time.
We could change the signature back to something like:
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
const struct iomap_read_ops *ops, void *read_ctx);
void iomap_readahead(struct readahead_control *rac, const struct
iomap_ops *ops, const struct iomap_read_ops *ops, void *read_ctx);
but I think it might get a bit much if/when "void *private" needs to
get added too for iomap iter metadata, though maybe that's okay now
that the private read data has been renamed to read_ctx.
>
> > +
> > static int iomap_read_folio_iter(struct iomap_iter *iter,
> > struct iomap_read_folio_ctx *ctx, bool *folio_owned)
> > {
> > @@ -436,7 +438,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> > loff_t length = iomap_length(iter);
> > struct folio *folio = ctx->cur_folio;
> > size_t poff, plen;
> > - loff_t count;
> > + loff_t pos_diff;
> > int ret;
> >
> > if (iomap->type == IOMAP_INLINE) {
> > @@ -454,12 +456,16 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> > iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff,
> > &plen);
> >
> > - count = pos - iter->pos + plen;
> > - if (WARN_ON_ONCE(count > length))
> > + pos_diff = pos - iter->pos;
> > + if (WARN_ON_ONCE(pos_diff + plen > length))
> > return -EIO;
>
> Er, can these changes get their own patch describing why the count ->
> pos_diff change was made?
I will separate this out into its own patch. The reasoning behind this
is so that the ->read_folio_range() callback doesn't need to take in a
pos arg but instead can get it from iter->pos [1]
[1] https://lore.kernel.org/linux-fsdevel/aMKt52YxKi1Wrw4y@infradead.org/
Thanks for looking at this patchset!
Thanks,
Joanne
>
> --D
>
next prev parent reply other threads:[~2025-09-24 18:18 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-23 0:23 [PATCH v4 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-23 0:23 ` [PATCH v4 01/15] iomap: move bio read logic into helper function Joanne Koong
2025-09-23 0:23 ` [PATCH v4 02/15] iomap: move read/readahead bio submission " Joanne Koong
2025-09-23 0:23 ` [PATCH v4 03/15] iomap: store read/readahead bio generically Joanne Koong
2025-09-23 0:23 ` [PATCH v4 04/15] iomap: iterate over folio mapping in iomap_readpage_iter() Joanne Koong
2025-09-23 0:23 ` [PATCH v4 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-23 0:23 ` [PATCH v4 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-23 0:23 ` [PATCH v4 07/15] iomap: track read/readahead folio ownership internally Joanne Koong
2025-09-24 0:13 ` Darrick J. Wong
2025-09-23 0:23 ` [PATCH v4 08/15] iomap: add public start/finish folio read helpers Joanne Koong
2025-09-23 0:23 ` [PATCH v4 09/15] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-24 0:26 ` Darrick J. Wong
2025-09-24 18:18 ` Joanne Koong [this message]
2025-09-23 0:23 ` [PATCH v4 10/15] iomap: add bias for async read requests Joanne Koong
2025-09-24 0:28 ` Darrick J. Wong
2025-09-24 18:23 ` Joanne Koong
2025-09-23 0:23 ` [PATCH v4 11/15] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-23 0:23 ` [PATCH v4 12/15] iomap: make iomap_read_folio() a void return Joanne Koong
2025-09-23 0:23 ` [PATCH v4 13/15] fuse: use iomap for read_folio Joanne Koong
2025-09-23 15:39 ` Miklos Szeredi
2025-09-23 17:21 ` Darrick J. Wong
2025-09-23 0:23 ` [PATCH v4 14/15] fuse: use iomap for readahead Joanne Koong
2025-09-23 0:23 ` [PATCH v4 15/15] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJnrk1bYAaJofNBpYYKB2fWGVw-9BPrOUBy_ivmfnjR=49BmNQ@mail.gmail.com' \
--to=joannelkoong@gmail.com \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=gfs2@lists.linux.dev \
--cc=hch@infradead.org \
--cc=hsiangkao@linux.alibaba.com \
--cc=kernel-team@meta.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).