From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/6] iomap: replace folio_batch allocation with stack allocation
Date: Wed, 5 Nov 2025 10:27:52 -0500
Message-ID: <aQts-Fg2YUoIbVsV@bfoster>
In-Reply-To: <20251105000716.GU196370@frogsfrogsfrogs>
On Tue, Nov 04, 2025 at 04:07:16PM -0800, Darrick J. Wong wrote:
> On Thu, Oct 16, 2025 at 03:02:58PM -0400, Brian Foster wrote:
> > Zhang Yi points out that the dynamic folio_batch allocation in
> > iomap_fill_dirty_folios() is problematic for the ext4 on iomap work
> > that is under development because it doesn't sufficiently handle the
> > allocation failure case (by allowing a retry, for example).
> >
> > The dynamic allocation was initially added for simplicity and to
> > help indicate whether the batch was used or not by the calling fs.
> > To address this issue, put the batch on the stack of
> > iomap_zero_range() and use a flag to control whether the batch
> > should be used in the iomap folio lookup path. This keeps things
> > simple and eliminates the concern for ext4 on iomap.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
>
> Hrmm, so who kmallocs the fbatch array now? Is that left as an exercise
> to the filesystem? I'm confused because I don't see the kmalloc call
> reappear elsewhere, at least not in this patch.
>
It's no longer dynamically allocated. The batch is allocated on the
stack in iomap_zero_range(), and a flag is set on the iomap if the
pagecache lookup occurs. The dynamic allocation is a potential problem
for the ext4 on iomap port, so this avoids the need to deal with
allocation failures and whatnot.
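
To make the shape of it concrete, here's a condensed sketch of the new
arrangement, stitched together from the hunks quoted below (not a
compilable unit on its own):

	/* iomap_zero_range(): the batch now lives on the caller's stack */
	struct folio_batch fbatch;
	struct iomap_iter iter = {
		.inode	= inode,
		.pos	= pos,
		.len	= len,
		.flags	= IOMAP_ZERO,
		.fbatch	= &fbatch,
	};

	folio_batch_init(&fbatch);

	/*
	 * ->iomap_begin (xfs here) opts in by filling the batch and ORing
	 * the returned flag into the iomap flags it hands back:
	 */
	iomap_flags |= iomap_fill_dirty_folios(iter, &foffset, fend);

	/* ... and the folio lookup path keys off the flag, not a pointer: */
	if (iter->iomap.flags & IOMAP_F_FOLIO_BATCH)
		folio = folio_batch_next(iter->fbatch);

So the fs never has to handle an allocation failure; if it doesn't call
iomap_fill_dirty_folios() at all, the flag stays clear and zero range
behaves as it did before.
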
This is probably how it should have been done from the start, but when
the flag-related feedback came along it was deep enough in the
test/review cycle that I preferred to do it separately.
Brian
> --D
>
> > ---
> > fs/iomap/buffered-io.c | 45 ++++++++++++++++++++++++++++--------------
> > fs/iomap/iter.c | 6 +++---
> > fs/xfs/xfs_iomap.c | 11 ++++++-----
> > include/linux/iomap.h | 8 ++++++--
> > 4 files changed, 45 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 51ecb6d48feb..05ff82c5432e 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -761,7 +761,7 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter,
> > if (!mapping_large_folio_support(iter->inode->i_mapping))
> > len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
> >
> > - if (iter->fbatch) {
> > + if (iter->iomap.flags & IOMAP_F_FOLIO_BATCH) {
> > struct folio *folio = folio_batch_next(iter->fbatch);
> >
> > if (!folio)
> > @@ -858,7 +858,7 @@ static int iomap_write_begin(struct iomap_iter *iter,
> > * process so return and let the caller iterate and refill the batch.
> > */
> > if (!folio) {
> > - WARN_ON_ONCE(!iter->fbatch);
> > + WARN_ON_ONCE(!(iter->iomap.flags & IOMAP_F_FOLIO_BATCH));
> > return 0;
> > }
> >
> > @@ -1473,23 +1473,34 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> > return status;
> > }
> >
> > -loff_t
> > +/**
> > + * iomap_fill_dirty_folios - fill a folio batch with dirty folios
> > + * @iter: Iteration structure
> > + * @start: Start offset of range. Updated based on lookup progress.
> > + * @end: End offset of range
> > + *
> > + * Returns the associated control flag if the folio batch is available and the
> > + * lookup performed. The caller is responsible to set the flag on the associated
> > + * iomap.
> > + */
> > +unsigned int
> > iomap_fill_dirty_folios(
> > struct iomap_iter *iter,
> > - loff_t offset,
> > - loff_t length)
> > + loff_t *start,
> > + loff_t end)
> > {
> > struct address_space *mapping = iter->inode->i_mapping;
> > - pgoff_t start = offset >> PAGE_SHIFT;
> > - pgoff_t end = (offset + length - 1) >> PAGE_SHIFT;
> > + pgoff_t pstart = *start >> PAGE_SHIFT;
> > + pgoff_t pend = (end - 1) >> PAGE_SHIFT;
> >
> > - iter->fbatch = kmalloc(sizeof(struct folio_batch), GFP_KERNEL);
> > - if (!iter->fbatch)
> > - return offset + length;
> > - folio_batch_init(iter->fbatch);
> > + if (!iter->fbatch) {
> > + *start = end;
> > + return 0;
> > + }
> >
> > - filemap_get_folios_dirty(mapping, &start, end, iter->fbatch);
> > - return (start << PAGE_SHIFT);
> > + filemap_get_folios_dirty(mapping, &pstart, pend, iter->fbatch);
> > + *start = (pstart << PAGE_SHIFT);
> > + return IOMAP_F_FOLIO_BATCH;
> > }
> > EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
> >
> > @@ -1498,17 +1509,21 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > const struct iomap_ops *ops,
> > const struct iomap_write_ops *write_ops, void *private)
> > {
> > + struct folio_batch fbatch;
> > struct iomap_iter iter = {
> > .inode = inode,
> > .pos = pos,
> > .len = len,
> > .flags = IOMAP_ZERO,
> > .private = private,
> > + .fbatch = &fbatch,
> > };
> > struct address_space *mapping = inode->i_mapping;
> > int ret;
> > bool range_dirty;
> >
> > + folio_batch_init(&fbatch);
> > +
> > /*
> > * To avoid an unconditional flush, check pagecache state and only flush
> > * if dirty and the fs returns a mapping that might convert on
> > @@ -1519,11 +1534,11 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > while ((ret = iomap_iter(&iter, ops)) > 0) {
> > const struct iomap *srcmap = iomap_iter_srcmap(&iter);
> >
> > - if (WARN_ON_ONCE(iter.fbatch &&
> > + if (WARN_ON_ONCE((iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
> > srcmap->type != IOMAP_UNWRITTEN))
> > return -EIO;
> >
> > - if (!iter.fbatch &&
> > + if (!(iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
> > (srcmap->type == IOMAP_HOLE ||
> > srcmap->type == IOMAP_UNWRITTEN)) {
> > s64 status;
> > diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> > index 66ca12aac57d..026d85823c76 100644
> > --- a/fs/iomap/iter.c
> > +++ b/fs/iomap/iter.c
> > @@ -8,10 +8,10 @@
> >
> > static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > {
> > - if (iter->fbatch) {
> > + if (iter->iomap.flags & IOMAP_F_FOLIO_BATCH) {
> > folio_batch_release(iter->fbatch);
> > - kfree(iter->fbatch);
> > - iter->fbatch = NULL;
> > + folio_batch_reinit(iter->fbatch);
> > + iter->iomap.flags &= ~IOMAP_F_FOLIO_BATCH;
> > }
> >
> > iter->status = 0;
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 535bf3b8705d..01833aca37ac 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1775,7 +1775,6 @@ xfs_buffered_write_iomap_begin(
> > */
> > if (flags & IOMAP_ZERO) {
> > xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> > - u64 end;
> >
> > if (isnullstartblock(imap.br_startblock) &&
> > offset_fsb >= eof_fsb)
> > @@ -1795,12 +1794,14 @@ xfs_buffered_write_iomap_begin(
> > */
> > if (imap.br_state == XFS_EXT_UNWRITTEN &&
> > offset_fsb < eof_fsb) {
> > - loff_t len = min(count,
> > - XFS_FSB_TO_B(mp, imap.br_blockcount));
> > + loff_t foffset = offset, fend;
> >
> > - end = iomap_fill_dirty_folios(iter, offset, len);
> > + fend = offset +
> > + min(count, XFS_FSB_TO_B(mp, imap.br_blockcount));
> > + iomap_flags |= iomap_fill_dirty_folios(iter, &foffset,
> > + fend);
> > end_fsb = min_t(xfs_fileoff_t, end_fsb,
> > - XFS_B_TO_FSB(mp, end));
> > + XFS_B_TO_FSB(mp, foffset));
> > }
> >
> > xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index cd0f573156d6..79da917ff45e 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -87,6 +87,9 @@ struct vm_fault;
> > /*
> > * Flags set by the core iomap code during operations:
> > *
> > + * IOMAP_F_FOLIO_BATCH indicates that the folio batch mechanism is active
> > + * for this operation, set by iomap_fill_dirty_folios().
> > + *
> > * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> > * has changed as the result of this write operation.
> > *
> > @@ -94,6 +97,7 @@ struct vm_fault;
> > * range it covers needs to be remapped by the high level before the operation
> > * can proceed.
> > */
> > +#define IOMAP_F_FOLIO_BATCH (1U << 13)
> > #define IOMAP_F_SIZE_CHANGED (1U << 14)
> > #define IOMAP_F_STALE (1U << 15)
> >
> > @@ -351,8 +355,8 @@ bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
> > int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> > const struct iomap_ops *ops,
> > const struct iomap_write_ops *write_ops);
> > -loff_t iomap_fill_dirty_folios(struct iomap_iter *iter, loff_t offset,
> > - loff_t length);
> > +unsigned int iomap_fill_dirty_folios(struct iomap_iter *iter, loff_t *start,
> > + loff_t end);
> > int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> > bool *did_zero, const struct iomap_ops *ops,
> > const struct iomap_write_ops *write_ops, void *private);
> > --
> > 2.51.0
> >
> >
>