* [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances
2024-11-08 12:42 [PATCH v3 0/4] iomap: zero range flush fixes Brian Foster
@ 2024-11-08 12:42 ` Brian Foster
2024-11-09 3:00 ` Darrick J. Wong
2024-11-11 5:53 ` Christoph Hellwig
2024-11-08 12:42 ` [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
` (2 subsequent siblings)
3 siblings, 2 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-08 12:42 UTC (permalink / raw)
To: linux-fsdevel
iomap_iter_advance() zeroes the processed and mapping fields on
every non-error iteration except for the last expected iteration
(i.e. a 0 return that is expected to terminate the iteration loop).
This appears to be circumstantial, as nothing currently relies on
these fields after the final iteration.

Therefore, to better facilitate iomap_iter reuse in subsequent
patches, update iomap_iter_advance() to always reset per-iteration
state on successful completion.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/iomap/iter.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 79a0614eaab7..3790918646af 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -22,26 +22,25 @@
static inline int iomap_iter_advance(struct iomap_iter *iter)
{
bool stale = iter->iomap.flags & IOMAP_F_STALE;
+ int ret = 1;
/* handle the previous iteration (if any) */
if (iter->iomap.length) {
if (iter->processed < 0)
return iter->processed;
- if (!iter->processed && !stale)
- return 0;
if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
return -EIO;
iter->pos += iter->processed;
iter->len -= iter->processed;
- if (!iter->len)
- return 0;
+ if (!iter->len || (!iter->processed && !stale))
+ ret = 0;
}
- /* clear the state for the next iteration */
+ /* clear the per iteration state */
iter->processed = 0;
memset(&iter->iomap, 0, sizeof(iter->iomap));
memset(&iter->srcmap, 0, sizeof(iter->srcmap));
- return 1;
+ return ret;
}
static inline void iomap_iter_done(struct iomap_iter *iter)
--
2.47.0
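The behavioral change above can be modeled outside the kernel tree with a minimal, self-contained C sketch. The struct and function below are simplified stand-ins for struct iomap_iter and iomap_iter_advance() (the IOMAP_F_STALE flag and the mapping are reduced to plain fields, and the WARN_ON_ONCE overrun check is omitted); they are illustrative only, not the kernel API:

```c
#include <assert.h>

/* Simplified stand-in for struct iomap_iter: only the fields used here. */
struct mock_iter {
	long long pos;		/* file position */
	long long len;		/* bytes remaining in the requested range */
	long long processed;	/* bytes handled by the previous iteration */
	long long map_len;	/* stand-in for iter->iomap.length */
	int stale;		/* stand-in for IOMAP_F_STALE */
};

/*
 * Mirrors the patched advance logic: returns <0 on error, 0 when the
 * iteration loop should stop, 1 to continue.  Unlike the pre-patch code,
 * the per-iteration state is cleared on *every* non-error return, not
 * only when another pass is expected.
 */
static int mock_iter_advance(struct mock_iter *it)
{
	int ret = 1;

	/* handle the previous iteration (if any) */
	if (it->map_len) {
		if (it->processed < 0)
			return (int)it->processed;	/* error: bail early */
		it->pos += it->processed;
		it->len -= it->processed;
		if (!it->len || (!it->processed && !it->stale))
			ret = 0;	/* done, but still reset state below */
	}

	/* clear the per-iteration state */
	it->processed = 0;
	it->map_len = 0;
	it->stale = 0;
	return ret;
}
```

The key difference from the old code is that the `!it->len` and `!it->processed && !it->stale` termination cases now fall through to the state-clearing tail instead of returning early, so a reused iterator starts from a clean slate.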
^ permalink raw reply related [flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances
2024-11-08 12:42 ` [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances Brian Foster
@ 2024-11-09 3:00 ` Darrick J. Wong
2024-11-11 5:53 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2024-11-09 3:00 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:42:43AM -0500, Brian Foster wrote:
> iomap_iter_advance() zeroes the processed and mapping fields on
> every non-error iteration except for the last expected iteration
> (i.e. return 0 expected to terminate the iteration loop). This
> appears to be circumstantial as nothing currently relies on these
> fields after the final iteration.
>
> Therefore, to better facilitate iomap_iter reuse in subsequent
> patches, update iomap_iter_advance() to always reset per-iteration
> state on successful completion.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Seems pretty straightforward to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/iomap/iter.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> index 79a0614eaab7..3790918646af 100644
> --- a/fs/iomap/iter.c
> +++ b/fs/iomap/iter.c
> @@ -22,26 +22,25 @@
> static inline int iomap_iter_advance(struct iomap_iter *iter)
> {
> bool stale = iter->iomap.flags & IOMAP_F_STALE;
> + int ret = 1;
>
> /* handle the previous iteration (if any) */
> if (iter->iomap.length) {
> if (iter->processed < 0)
> return iter->processed;
> - if (!iter->processed && !stale)
> - return 0;
> if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
> return -EIO;
> iter->pos += iter->processed;
> iter->len -= iter->processed;
> - if (!iter->len)
> - return 0;
> + if (!iter->len || (!iter->processed && !stale))
> + ret = 0;
> }
>
> - /* clear the state for the next iteration */
> + /* clear the per iteration state */
> iter->processed = 0;
> memset(&iter->iomap, 0, sizeof(iter->iomap));
> memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> - return 1;
> + return ret;
> }
>
> static inline void iomap_iter_done(struct iomap_iter *iter)
> --
> 2.47.0
>
>
* Re: [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances
2024-11-08 12:42 ` [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances Brian Foster
2024-11-09 3:00 ` Darrick J. Wong
@ 2024-11-11 5:53 ` Christoph Hellwig
2024-11-12 13:59 ` Brian Foster
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-11-11 5:53 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
(btw, can you keep Ccing iomap patches to linux-xfs per the entry
in MAINTAINERS?)
* [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-08 12:42 [PATCH v3 0/4] iomap: zero range flush fixes Brian Foster
2024-11-08 12:42 ` [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances Brian Foster
@ 2024-11-08 12:42 ` Brian Foster
2024-11-09 3:01 ` Darrick J. Wong
2024-11-11 6:03 ` Christoph Hellwig
2024-11-08 12:42 ` [PATCH v3 3/4] iomap: elide flush from partial eof zero range Brian Foster
2024-11-08 12:42 ` [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio Brian Foster
3 siblings, 2 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-08 12:42 UTC (permalink / raw)
To: linux-fsdevel
In preparation for special handling of subranges, lift the zeroed
mapping logic from the iterator into the caller. Since this puts the
pagecache dirty check and flushing in the same place, streamline the
comments a bit as well.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
1 file changed, 22 insertions(+), 42 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ef0b68bccbb6..a78b5b9b3df3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
return filemap_write_and_wait_range(mapping, i->pos, end);
}
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
- bool *range_dirty)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
loff_t pos = iter->pos;
loff_t length = iomap_length(iter);
loff_t written = 0;
- /*
- * We must zero subranges of unwritten mappings that might be dirty in
- * pagecache from previous writes. We only know whether the entire range
- * was clean or not, however, and dirty folios may have been written
- * back or reclaimed at any point after mapping lookup.
- *
- * The easiest way to deal with this is to flush pagecache to trigger
- * any pending unwritten conversions and then grab the updated extents
- * from the fs. The flush may change the current mapping, so mark it
- * stale for the iterator to remap it for the next pass to handle
- * properly.
- *
- * Note that holes are treated the same as unwritten because zero range
- * is (ab)used for partial folio zeroing in some cases. Hole backed
- * post-eof ranges can be dirtied via mapped write and the flush
- * triggers writeback time post-eof zeroing.
- */
- if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
- if (*range_dirty) {
- *range_dirty = false;
- return iomap_zero_iter_flush_and_stale(iter);
- }
- /* range is clean and already zeroed, nothing to do */
- return length;
- }
-
do {
struct folio *folio;
int status;
@@ -1433,24 +1405,32 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
bool range_dirty;
/*
- * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
- * pagecache must be flushed to ensure stale data from previous
- * buffered writes is not exposed. A flush is only required for certain
- * types of mappings, but checking pagecache after mapping lookup is
- * racy with writeback and reclaim.
+ * Zero range can skip mappings that are zero on disk so long as
+ * pagecache is clean. If pagecache was dirty prior to zero range, the
+ * mapping converts on writeback completion and so must be zeroed.
*
- * Therefore, check the entire range first and pass along whether any
- * part of it is dirty. If so and an underlying mapping warrants it,
- * flush the cache at that point. This trades off the occasional false
- * positive (and spurious flush, if the dirty data and mapping don't
- * happen to overlap) for simplicity in handling a relatively uncommon
- * situation.
+ * The simplest way to deal with this across a range is to flush
+ * pagecache and process the updated mappings. To avoid an unconditional
+ * flush, check pagecache state and only flush if dirty and the fs
+ * returns a mapping that might convert on writeback.
*/
range_dirty = filemap_range_needs_writeback(inode->i_mapping,
pos, pos + len - 1);
+ while ((ret = iomap_iter(&iter, ops)) > 0) {
+ const struct iomap *s = iomap_iter_srcmap(&iter);
+
+ if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
+ loff_t p = iomap_length(&iter);
+ if (range_dirty) {
+ range_dirty = false;
+ p = iomap_zero_iter_flush_and_stale(&iter);
+ }
+ iter.processed = p;
+ continue;
+ }
- while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
+ iter.processed = iomap_zero_iter(&iter, did_zero);
+ }
return ret;
}
EXPORT_SYMBOL_GPL(iomap_zero_range);
--
2.47.0
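The control flow this patch lifts into iomap_zero_range() boils down to a per-mapping decision: skip, flush once, or actually zero. The self-contained C sketch below models that decision in isolation; the enum names and the helper are simplified stand-ins invented for illustration, not the iomap API:

```c
#include <assert.h>

enum mock_type { MOCK_HOLE, MOCK_UNWRITTEN, MOCK_MAPPED };
enum mock_action { ACT_SKIP, ACT_FLUSH, ACT_ZERO };

/*
 * Models the per-mapping choice made in the new caller loop: hole and
 * unwritten mappings are already zero on disk, so they are skipped unless
 * the range was dirty in pagecache, in which case a one-time flush is
 * triggered (clearing *range_dirty so it only happens once) and the
 * mapping is re-looked-up.  Everything else takes the real zeroing path.
 */
static enum mock_action zero_range_action(enum mock_type type, int *range_dirty)
{
	if (type == MOCK_HOLE || type == MOCK_UNWRITTEN) {
		if (*range_dirty) {
			*range_dirty = 0;
			return ACT_FLUSH;	/* flush and mark iomap stale */
		}
		return ACT_SKIP;		/* clean and already zeroed */
	}
	return ACT_ZERO;			/* written mapping: zero folios */
}
```

In the actual patch this decision sits directly inside the `while ((ret = iomap_iter(&iter, ops)) > 0)` loop, with `iter.processed` set to either the flush result or the full mapping length before continuing.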
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-08 12:42 ` [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
@ 2024-11-09 3:01 ` Darrick J. Wong
2024-11-12 13:59 ` Brian Foster
2024-11-11 6:03 ` Christoph Hellwig
1 sibling, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2024-11-09 3:01 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> In preparation for special handling of subranges, lift the zeroed
> mapping logic from the iterator into the caller. Since this puts the
> pagecache dirty check and flushing in the same place, streamline the
> comments a bit as well.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
> 1 file changed, 22 insertions(+), 42 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ef0b68bccbb6..a78b5b9b3df3 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
> return filemap_write_and_wait_range(mapping, i->pos, end);
> }
>
> -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> - bool *range_dirty)
> +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> {
> - const struct iomap *srcmap = iomap_iter_srcmap(iter);
> loff_t pos = iter->pos;
> loff_t length = iomap_length(iter);
> loff_t written = 0;
>
> - /*
> - * We must zero subranges of unwritten mappings that might be dirty in
> - * pagecache from previous writes. We only know whether the entire range
> - * was clean or not, however, and dirty folios may have been written
> - * back or reclaimed at any point after mapping lookup.
> - *
> - * The easiest way to deal with this is to flush pagecache to trigger
> - * any pending unwritten conversions and then grab the updated extents
> - * from the fs. The flush may change the current mapping, so mark it
> - * stale for the iterator to remap it for the next pass to handle
> - * properly.
> - *
> - * Note that holes are treated the same as unwritten because zero range
> - * is (ab)used for partial folio zeroing in some cases. Hole backed
> - * post-eof ranges can be dirtied via mapped write and the flush
> - * triggers writeback time post-eof zeroing.
> - */
> - if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> - if (*range_dirty) {
> - *range_dirty = false;
> - return iomap_zero_iter_flush_and_stale(iter);
> - }
> - /* range is clean and already zeroed, nothing to do */
> - return length;
> - }
> -
> do {
> struct folio *folio;
> int status;
> @@ -1433,24 +1405,32 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> bool range_dirty;
>
> /*
> - * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> - * pagecache must be flushed to ensure stale data from previous
> - * buffered writes is not exposed. A flush is only required for certain
> - * types of mappings, but checking pagecache after mapping lookup is
> - * racy with writeback and reclaim.
> + * Zero range can skip mappings that are zero on disk so long as
> + * pagecache is clean. If pagecache was dirty prior to zero range, the
> + * mapping converts on writeback completion and so must be zeroed.
> *
> - * Therefore, check the entire range first and pass along whether any
> - * part of it is dirty. If so and an underlying mapping warrants it,
> - * flush the cache at that point. This trades off the occasional false
> - * positive (and spurious flush, if the dirty data and mapping don't
> - * happen to overlap) for simplicity in handling a relatively uncommon
> - * situation.
> + * The simplest way to deal with this across a range is to flush
> + * pagecache and process the updated mappings. To avoid an unconditional
> + * flush, check pagecache state and only flush if dirty and the fs
> + * returns a mapping that might convert on writeback.
> */
> range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> pos, pos + len - 1);
> + while ((ret = iomap_iter(&iter, ops)) > 0) {
> + const struct iomap *s = iomap_iter_srcmap(&iter);
> +
> + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> + loff_t p = iomap_length(&iter);
Another dumb nit: blank line after the declaration.
With that fixed, this is ok by me for further testing:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + if (range_dirty) {
> + range_dirty = false;
> + p = iomap_zero_iter_flush_and_stale(&iter);
> + }
> + iter.processed = p;
> + continue;
> + }
>
> - while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
> + iter.processed = iomap_zero_iter(&iter, did_zero);
> + }
> return ret;
> }
> EXPORT_SYMBOL_GPL(iomap_zero_range);
> --
> 2.47.0
>
>
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-09 3:01 ` Darrick J. Wong
@ 2024-11-12 13:59 ` Brian Foster
0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-12 13:59 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:01:27PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > In preparation for special handling of subranges, lift the zeroed
> > mapping logic from the iterator into the caller. Since this puts the
> > pagecache dirty check and flushing in the same place, streamline the
> > comments a bit as well.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
> > 1 file changed, 22 insertions(+), 42 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ef0b68bccbb6..a78b5b9b3df3 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
> > return filemap_write_and_wait_range(mapping, i->pos, end);
> > }
> >
> > -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> > - bool *range_dirty)
> > +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> > {
> > - const struct iomap *srcmap = iomap_iter_srcmap(iter);
> > loff_t pos = iter->pos;
> > loff_t length = iomap_length(iter);
> > loff_t written = 0;
> >
> > - /*
> > - * We must zero subranges of unwritten mappings that might be dirty in
> > - * pagecache from previous writes. We only know whether the entire range
> > - * was clean or not, however, and dirty folios may have been written
> > - * back or reclaimed at any point after mapping lookup.
> > - *
> > - * The easiest way to deal with this is to flush pagecache to trigger
> > - * any pending unwritten conversions and then grab the updated extents
> > - * from the fs. The flush may change the current mapping, so mark it
> > - * stale for the iterator to remap it for the next pass to handle
> > - * properly.
> > - *
> > - * Note that holes are treated the same as unwritten because zero range
> > - * is (ab)used for partial folio zeroing in some cases. Hole backed
> > - * post-eof ranges can be dirtied via mapped write and the flush
> > - * triggers writeback time post-eof zeroing.
> > - */
> > - if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> > - if (*range_dirty) {
> > - *range_dirty = false;
> > - return iomap_zero_iter_flush_and_stale(iter);
> > - }
> > - /* range is clean and already zeroed, nothing to do */
> > - return length;
> > - }
> > -
> > do {
> > struct folio *folio;
> > int status;
> > @@ -1433,24 +1405,32 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > bool range_dirty;
> >
> > /*
> > - * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> > - * pagecache must be flushed to ensure stale data from previous
> > - * buffered writes is not exposed. A flush is only required for certain
> > - * types of mappings, but checking pagecache after mapping lookup is
> > - * racy with writeback and reclaim.
> > + * Zero range can skip mappings that are zero on disk so long as
> > + * pagecache is clean. If pagecache was dirty prior to zero range, the
> > + * mapping converts on writeback completion and so must be zeroed.
> > *
> > - * Therefore, check the entire range first and pass along whether any
> > - * part of it is dirty. If so and an underlying mapping warrants it,
> > - * flush the cache at that point. This trades off the occasional false
> > - * positive (and spurious flush, if the dirty data and mapping don't
> > - * happen to overlap) for simplicity in handling a relatively uncommon
> > - * situation.
> > + * The simplest way to deal with this across a range is to flush
> > + * pagecache and process the updated mappings. To avoid an unconditional
> > + * flush, check pagecache state and only flush if dirty and the fs
> > + * returns a mapping that might convert on writeback.
> > */
> > range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> > pos, pos + len - 1);
> > + while ((ret = iomap_iter(&iter, ops)) > 0) {
> > + const struct iomap *s = iomap_iter_srcmap(&iter);
> > +
> > + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > + loff_t p = iomap_length(&iter);
>
> Another dumb nit: blank line after the declaration.
>
Fixed.
> With that fixed, this is ok by me for further testing:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
Thanks.
Brian
> --D
>
> > + if (range_dirty) {
> > + range_dirty = false;
> > + p = iomap_zero_iter_flush_and_stale(&iter);
> > + }
> > + iter.processed = p;
> > + continue;
> > + }
> >
> > - while ((ret = iomap_iter(&iter, ops)) > 0)
> > - iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
> > + iter.processed = iomap_zero_iter(&iter, did_zero);
> > + }
> > return ret;
> > }
> > EXPORT_SYMBOL_GPL(iomap_zero_range);
> > --
> > 2.47.0
> >
> >
>
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-08 12:42 ` [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
2024-11-09 3:01 ` Darrick J. Wong
@ 2024-11-11 6:03 ` Christoph Hellwig
2024-11-12 14:00 ` Brian Foster
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-11-11 6:03 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> In preparation for special handling of subranges, lift the zeroed
> mapping logic from the iterator into the caller.
What's that special code? I don't really see anything added to this
in the new code. In general I would prefer if all code for the
iteration were kept in a single function in preparation for
unrolling these loops. If you want to keep this code separate
from the write zeroes logic (which seems like a good idea), please
just move the actual zeroing out of iomap_zero_iter into a
separate helper, similar to how we e.g. have multiple different
implementations in the dio iterator.
> + while ((ret = iomap_iter(&iter, ops)) > 0) {
> + const struct iomap *s = iomap_iter_srcmap(&iter);
> +
> + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> + loff_t p = iomap_length(&iter);
Also please stick to variable names that are readable and preferably
the same as in the surrounding code, e.g. s -> srcmap, p -> pos.
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-11 6:03 ` Christoph Hellwig
@ 2024-11-12 14:00 ` Brian Foster
2024-11-15 14:53 ` Brian Foster
0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2024-11-12 14:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On Sun, Nov 10, 2024 at 10:03:44PM -0800, Christoph Hellwig wrote:
> On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > In preparation for special handling of subranges, lift the zeroed
> > mapping logic from the iterator into the caller.
>
> What's that special code? I don't really see anything added to this
> in the new code? In general I would prefer if all code for the
> iteration would be kept in a single function in preparation for
> unrolling these loops. If you want to keep this code separate
> from the write zeroes logic (which seems like a good idea) please
> just move the actual zeroing out of iomap_zero_iter into
> a separate helper similar to how we e.g. have multiple different
> implementations in the dio iterator.
>
There is no special code... the special treatment is to check the dirty
state of a block-unaligned start in isolation to decide whether to skip
or explicitly zero if dirty. The fallback logic is to check the dirty
state of the entire range and if needed, flush the mapping to push all
pending (dirty && unwritten) instances out to the fs so the iomap is up
to date and we can safely skip iomaps that are inherently zero on disk.
Hmm.. so I see the multiple iter modes for dio, but it looks like that
is inherent to the mapping type. That's not quite what I'm doing here,
so I'm not totally clear on what you're asking for. FWIW, I swizzled
this code around a few times and failed to ultimately find something I'd
consider elegant. For example, initial versions would have something
like another param to iomap_zero_iter() to skip the optimization logic
(i.e. don't skip zeroed extents for this call), which I think is more in
the spirit of what you're saying, but I ultimately found it cleaner to
open code that part. If you had something else in mind, could you share
some pseudocode or something to show the factoring..?
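The subrange policy described above (check the dirty state of a block-unaligned start in isolation, skip if clean, zero explicitly if dirty) can be sketched as a tiny predicate. Everything here is hypothetical and invented for illustration; it is one possible reading of the planned special handling, not code from the series:

```c
#include <assert.h>

/*
 * Hypothetical sketch: for an unwritten mapping, only a block-unaligned
 * start whose backing block is dirty in pagecache forces explicit
 * zeroing; an aligned start or a clean block can rely on the on-disk
 * zero state.  The name and parameters are invented for illustration.
 */
static int must_zero_unaligned_start(long long pos, long long blocksize,
				     int start_block_dirty)
{
	int unaligned = (pos % blocksize) != 0;

	return unaligned && start_block_dirty;
}
```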
> > + while ((ret = iomap_iter(&iter, ops)) > 0) {
> > + const struct iomap *s = iomap_iter_srcmap(&iter);
> > +
> > + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > + loff_t p = iomap_length(&iter);
>
> Also please stick to variable names that are readable and preferably
> the same as in the surrounding code, e.g. s -> srcmap p -> pos.
>
Sure. I think I did this to avoid long lines, but I can change it.
Thanks.
Brian
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-12 14:00 ` Brian Foster
@ 2024-11-15 14:53 ` Brian Foster
2024-11-15 17:02 ` Darrick J. Wong
0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2024-11-15 14:53 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On Tue, Nov 12, 2024 at 09:00:35AM -0500, Brian Foster wrote:
> On Sun, Nov 10, 2024 at 10:03:44PM -0800, Christoph Hellwig wrote:
> > On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > > In preparation for special handling of subranges, lift the zeroed
> > > mapping logic from the iterator into the caller.
> >
> > What's that special code? I don't really see anything added to this
> > in the new code? In general I would prefer if all code for the
> > iteration would be kept in a single function in preparation for
> > unrolling these loops. If you want to keep this code separate
> > from the write zeroes logic (which seems like a good idea) please
> > just move the actual zeroing out of iomap_zero_iter into
> > a separate helper similar to how we e.g. have multiple different
> > implementations in the dio iterator.
> >
>
> There is no special code... the special treatment is to check the dirty
> state of a block unaligned start in isolation to decide whether to skip
> or explicitly zero if dirty. The fallback logic is to check the dirty
> state of the entire range and if needed, flush the mapping to push all
> pending (dirty && unwritten) instances out to the fs so the iomap is up
> to date and we can safely skip iomaps that are inherently zero on disk.
>
> Hmm.. so I see the multiple iter modes for dio, but it looks like that
> is inherent to the mapping type. That's not quite what I'm doing here,
> so I'm not totally clear on what you're asking for. FWIW, I swizzled
> this code around a few times and failed to ultimately find something I'd
> consider elegant. For example, initial versions would have something
> like another param to iomap_zero_iter() to skip the optimization logic
> (i.e. don't skip zeroed extents for this call), which I think is more in
> the spirit of what you're saying, but I ultimately found it cleaner to
> open code that part. If you had something else in mind, could you share
> some pseudocode or something to show the factoring..?
>
FWIW, I'm concurrently hacking on what I'd consider a longer term fix
here, based on some of the earlier discussions. The idea is basically
iomap provides a mechanism for the fs to attach a folio_batch of dirty
folios to the iomap, which zero range can then use as the source of
truth for which subranges to zero of an unwritten mapping.
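The folio_batch idea sketched above can be modeled with a toy structure: the fs hands zero range a batch of dirty folio offsets alongside the unwritten mapping, and only the subranges covered by those folios need zeroing. The fixed-size array and all names below are invented stand-ins for the kernel's folio_batch, purely to illustrate the lookup, not the eventual API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a folio_batch of dirty-folio byte offsets. */
struct mock_batch {
	long long offs[8];	/* byte offsets of dirty folios */
	size_t nr;		/* number of valid entries */
};

/*
 * Count the dirty folios falling inside [pos, pos + len); in the real
 * proposal each such folio marks a subrange of the unwritten mapping
 * that zero range must zero rather than skip.
 */
static size_t batch_dirty_in_range(const struct mock_batch *b,
				   long long pos, long long len)
{
	size_t i, n = 0;

	for (i = 0; i < b->nr; i++)
		if (b->offs[i] >= pos && b->offs[i] < pos + len)
			n++;
	return n;
}
```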
It occurs to me that might lend itself a bit more to what you're looking
for here by avoiding the need for a new instance of the iter loop (I
assume there is some outstanding work that is affected by this?). Given
that this series was kind of a side quest for a band-aid performance fix
in the meantime, and it's not likely 6.13 material anyways, I think I'm
going to put it in a holding pattern and keep it in the back pocket in
favor of trying to move that alternate approach along, at least to where
I can post an RFC for discussion.
If that doesn't work out or there proves to be some critical need for
it in the meantime, then I'll post v4 for an easy fix. I'll post a v2
of patch 4 separately since that is an independent fix.
Brian
> > > + while ((ret = iomap_iter(&iter, ops)) > 0) {
> > > + const struct iomap *s = iomap_iter_srcmap(&iter);
> > > +
> > > + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > > + loff_t p = iomap_length(&iter);
> >
> > Also please stick to variable names that are readable and preferably
> > the same as in the surrounding code, e.g. s -> srcmap p -> pos.
> >
>
> Sure. I think I did this to avoid long lines, but I can change it.
> Thanks.
>
> Brian
>
>
* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-15 14:53 ` Brian Foster
@ 2024-11-15 17:02 ` Darrick J. Wong
2024-11-15 19:31 ` Brian Foster
0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2024-11-15 17:02 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel
On Fri, Nov 15, 2024 at 09:53:14AM -0500, Brian Foster wrote:
> On Tue, Nov 12, 2024 at 09:00:35AM -0500, Brian Foster wrote:
> > On Sun, Nov 10, 2024 at 10:03:44PM -0800, Christoph Hellwig wrote:
> > > On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > > > In preparation for special handling of subranges, lift the zeroed
> > > > mapping logic from the iterator into the caller.
> > >
> > > What's that special code? I don't really see anything added to this
> > > in the new code? In general I would prefer if all code for the
> > > iteration would be kept in a single function in preparation for
> > > unrolling these loops. If you want to keep this code separate
> > > from the write zeroes logic (which seems like a good idea) please
> > > just move the actual zeroing out of iomap_zero_iter into
> > > a separate helper similar to how we e.g. have multiple different
> > > implementations in the dio iterator.
> > >
> >
> > There is no special code... the special treatment is to check the dirty
> > state of a block unaligned start in isolation to decide whether to skip
> > or explicitly zero if dirty. The fallback logic is to check the dirty
> > state of the entire range and if needed, flush the mapping to push all
> > pending (dirty && unwritten) instances out to the fs so the iomap is up
> > to date and we can safely skip iomaps that are inherently zero on disk.
> >
> > Hmm.. so I see the multiple iter modes for dio, but it looks like that
> > is inherent to the mapping type. That's not quite what I'm doing here,
> > so I'm not totally clear on what you're asking for. FWIW, I swizzled
> > this code around a few times and failed to ultimately find something I'd
> > consider elegant. For example, initial versions would have something
> > like another param to iomap_zero_iter() to skip the optimization logic
> > (i.e. don't skip zeroed extents for this call), which I think is more in
> > the spirit of what you're saying, but I ultimately found it cleaner to
> > open code that part. If you had something else in mind, could you share
> > some pseudocode or something to show the factoring..?
> >
>
> FWIW, I'm concurrently hacking on what I'd consider a longer term fix
> here, based on some of the earlier discussions. The idea is basically
> iomap provides a mechanism for the fs to attach a folio_batch of dirty
> folios to the iomap, which zero range can then use as the source of
> truth for which subranges to zero of an unwritten mapping.
That's fun! :)
I wonder, can this mechanism stretch to the generic buffered write path?
In which case, can you hang on to the folios long enough to issue
writeback on them too, if it's a synchronous write?
> It occurs to me that might lend itself a bit more to what you're looking
> for here by avoiding the need for a new instance of the iter loop (I
> assume there is some outstanding work that is affected by this?). Given
> that this series was kind of a side quest for a band-aid performance fix
> in the meantime, and it's not likely 6.13 material anyways, I think I'm
> going to put it in a holding pattern and keep it in the back pocket in
> favor of trying to move that alternate approach along, at least to where
> I can post an RFC for discussion.
>
> If that doesn't work out or there proves some critical need for it in
> the meantime, then I'll post v4 for an easy fix. I'll post a v2 of patch
> 4 separately since that is an independent fix..
I thought the bug robots were complaining about the performance hit,
so at least this part should go in sooner rather than later.
--D
> Brian
>
> > > > + while ((ret = iomap_iter(&iter, ops)) > 0) {
> > > > + const struct iomap *s = iomap_iter_srcmap(&iter);
> > > > +
> > > > + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > > > + loff_t p = iomap_length(&iter);
> > >
> > > Also please stick to variable names that are readable and preferably
> > > the same as in the surrounding code, e.g. s -> srcmap p -> pos.
> > >
> >
> > Sure. I think I did this to avoid long lines, but I can change it.
> > Thanks.
> >
> > Brian
> >
> >
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range()
2024-11-15 17:02 ` Darrick J. Wong
@ 2024-11-15 19:31 ` Brian Foster
0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-15 19:31 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-fsdevel
On Fri, Nov 15, 2024 at 09:02:47AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 15, 2024 at 09:53:14AM -0500, Brian Foster wrote:
> > On Tue, Nov 12, 2024 at 09:00:35AM -0500, Brian Foster wrote:
> > > On Sun, Nov 10, 2024 at 10:03:44PM -0800, Christoph Hellwig wrote:
> > > > On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > > > > In preparation for special handling of subranges, lift the zeroed
> > > > > mapping logic from the iterator into the caller.
> > > >
> > > > What's that special code? I don't really see anything added to this
> > > > in the new code? In general I would prefer if all code for the
> > > > iteration would be kept in a single function in preparation for
> > > > unrolling these loops. If you want to keep this code separate
> > > > from the write zeroes logic (which seems like a good idea) please
> > > > just just move the actual real zeroing out of iomap_zero_iter into
> > > > a separate helper similar to how we e.g. have multiple different
> > > > implementations in the dio iterator.
> > > >
> > >
> > > There is no special code... the special treatment is to check the dirty
> > > state of a block-unaligned start in isolation to decide whether to skip
> > > or explicitly zero if dirty. The fallback logic is to check the dirty
> > > state of the entire range and if needed, flush the mapping to push all
> > > pending (dirty && unwritten) instances out to the fs so the iomap is up
> > > to date and we can safely skip iomaps that are inherently zero on disk.
> > >
> > > Hmm.. so I see the multiple iter modes for dio, but it looks like that
> > > is inherent to the mapping type. That's not quite what I'm doing here,
> > > so I'm not totally clear on what you're asking for. FWIW, I swizzled
> > > this code around a few times and failed to ultimately find something I'd
> > > consider elegant. For example, initial versions would have something
> > > like another param to iomap_zero_iter() to skip the optimization logic
> > > (i.e. don't skip zeroed extents for this call), which I think is more in
> > > the spirit of what you're saying, but I ultimately found it cleaner to
> > > open code that part. If you had something else in mind, could you share
> > > some pseudocode or something to show the factoring..?
> > >
> >
> > FWIW, I'm concurrently hacking on what I'd consider a longer term fix
> > here, based on some of the earlier discussions. The idea is basically
> > iomap provides a mechanism for the fs to attach a folio_batch of dirty
> > folios to the iomap, which zero range can then use as the source of
> > truth for which subranges to zero of an unwritten mapping.
>
> That's fun! :)
>
> I wonder, can this mechanism stretch to the generic buffered write path?
> In which case, can you hang on to the folios long enough to issue
> writeback on them too, if it's a synchronous write?
>
That's an interesting idea. I think it could, but that's several steps
ahead of where I'm at. My current hope is that obviously this works
generically for zero range without the need to flush or revalidate
(unless as a fallback I suppose), and then from there the same thing can
be used for seek data/hole, which has similar wonkiness wrt unwritten
mappings.
From there, it might be interesting to see if there's value in this sort
of thing for buffered writes and whatnot. I'll keep that in mind.
> > It occurs to me that might lend itself a bit more to what you're looking
> > for here by avoiding the need for a new instance of the iter loop (I
> > assume there is some outstanding work that is affected by this?). Given
> > that this series was kind of a side quest for a band-aid performance fix
> > in the meantime, and it's not likely 6.13 material anyways, I think I'm
> > going to put it in a holding pattern and keep it in the back pocket in
> > favor of trying to move that alternate approach along, at least to where
> > I can post an RFC for discussion.
> >
> > If that doesn't work out or there proves some critical need for it in
> > the meantime, then I'll post v4 for an easy fix. I'll post a v2 of patch
> > 4 separately since that is an independent fix..
>
> I thought the bug robots were complaining about the performance hit,
> so at least this part should go in sooner than later.
>
Technically, yeah.. I've just been waffling over it because I'd rather
try to make more progress on that than try too hard to polish up this
one. Christoph was grumbling a bit on factoring related to unrolling
these loops in the future or some such thing, so I'm not really sure
where he is on that.
Ok, maybe I'll do this.. I've already fixed the outstanding nits for v4,
so I might as well just post it. If there's anything critical that needs
fixing on review then obviously I'll address it, but otherwise I'll
prioritize working toward an RFC for the batching thing over pure
factoring/aesthetic changes, because that likely means much of this code
just goes away anyways.
Brian
> --D
>
> > Brian
> >
> > > > > + while ((ret = iomap_iter(&iter, ops)) > 0) {
> > > > > + const struct iomap *s = iomap_iter_srcmap(&iter);
> > > > > +
> > > > > + if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > > > > + loff_t p = iomap_length(&iter);
> > > >
> > > > Also please stick to variable names that are readable and preferably
> > > > the same as in the surrounding code, e.g. s -> srcmap p -> pos.
> > > >
> > >
> > > Sure. I think I did this to avoid long lines, but I can change it.
> > > Thanks.
> > >
> > > Brian
> > >
> > >
> >
> >
>
* [PATCH v3 3/4] iomap: elide flush from partial eof zero range
2024-11-08 12:42 [PATCH v3 0/4] iomap: zero range flush fixes Brian Foster
2024-11-08 12:42 ` [PATCH v3 1/4] iomap: reset per-iter state on non-error iter advances Brian Foster
2024-11-08 12:42 ` [PATCH v3 2/4] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
@ 2024-11-08 12:42 ` Brian Foster
2024-11-09 3:03 ` Darrick J. Wong
2024-11-11 6:06 ` Christoph Hellwig
2024-11-08 12:42 ` [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio Brian Foster
3 siblings, 2 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-08 12:42 UTC (permalink / raw)
To: linux-fsdevel
iomap zero range flushes pagecache in certain situations to
determine which parts of the range might require zeroing if dirty
data is present in pagecache. The kernel robot recently reported a
regression associated with this flushing in the following stress-ng
workload on XFS:
stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
This workload involves repeated small, strided, extending writes. On
XFS, this produces a pattern of post-eof speculative preallocation,
conversion of preallocation from delalloc to unwritten, dirtying
pagecache over newly unwritten blocks, and then rinse and repeat
from the new EOF. This leads to repetitive flushing of the EOF folio
via the zero range call XFS uses for writes that start beyond
current EOF.
To mitigate this problem, special case EOF block zeroing to prefer
zeroing the folio over a flush when the EOF folio is already dirty.
To do this, split out and open code handling of an unaligned start
offset. This brings most of the performance back by avoiding flushes
on zero range calls via write and truncate extension operations. The
flush doesn't occur in these situations because the entire range is
post-eof and therefore the folio that overlaps EOF is the only one
in the range.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a78b5b9b3df3..7f40234a301e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1401,6 +1401,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
.len = len,
.flags = IOMAP_ZERO,
};
+ struct address_space *mapping = inode->i_mapping;
+ unsigned int blocksize = i_blocksize(inode);
+ unsigned int off = pos & (blocksize - 1);
+ loff_t plen = min_t(loff_t, len, blocksize - off);
int ret;
bool range_dirty;
@@ -1410,12 +1414,28 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
* mapping converts on writeback completion and so must be zeroed.
*
* The simplest way to deal with this across a range is to flush
- * pagecache and process the updated mappings. To avoid an unconditional
- * flush, check pagecache state and only flush if dirty and the fs
- * returns a mapping that might convert on writeback.
+ * pagecache and process the updated mappings. To avoid excessive
+ * flushing on partial eof zeroing, special case it to zero the
+ * unaligned start portion if already dirty in pagecache.
+ */
+ if (off &&
+ filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
+ iter.len = plen;
+ while ((ret = iomap_iter(&iter, ops)) > 0)
+ iter.processed = iomap_zero_iter(&iter, did_zero);
+
+ iter.len = len - (iter.pos - pos);
+ if (ret || !iter.len)
+ return ret;
+ }
+
+ /*
+ * To avoid an unconditional flush, check pagecache state and only flush
+ * if dirty and the fs returns a mapping that might convert on
+ * writeback.
*/
range_dirty = filemap_range_needs_writeback(inode->i_mapping,
- pos, pos + len - 1);
+ iter.pos, iter.pos + iter.len - 1);
while ((ret = iomap_iter(&iter, ops)) > 0) {
const struct iomap *s = iomap_iter_srcmap(&iter);
--
2.47.0
* Re: [PATCH v3 3/4] iomap: elide flush from partial eof zero range
2024-11-08 12:42 ` [PATCH v3 3/4] iomap: elide flush from partial eof zero range Brian Foster
@ 2024-11-09 3:03 ` Darrick J. Wong
2024-11-11 6:06 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2024-11-09 3:03 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:42:45AM -0500, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
>
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
>
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
>
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
> 1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a78b5b9b3df3..7f40234a301e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1401,6 +1401,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> .len = len,
> .flags = IOMAP_ZERO,
> };
> + struct address_space *mapping = inode->i_mapping;
> + unsigned int blocksize = i_blocksize(inode);
> + unsigned int off = pos & (blocksize - 1);
> + loff_t plen = min_t(loff_t, len, blocksize - off);
> int ret;
> bool range_dirty;
>
> @@ -1410,12 +1414,28 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> * mapping converts on writeback completion and so must be zeroed.
> *
> * The simplest way to deal with this across a range is to flush
> - * pagecache and process the updated mappings. To avoid an unconditional
> - * flush, check pagecache state and only flush if dirty and the fs
> - * returns a mapping that might convert on writeback.
> + * pagecache and process the updated mappings. To avoid excessive
> + * flushing on partial eof zeroing, special case it to zero the
> + * unaligned start portion if already dirty in pagecache.
> + */
> + if (off &&
> + filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> + iter.len = plen;
> + while ((ret = iomap_iter(&iter, ops)) > 0)
> + iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> + iter.len = len - (iter.pos - pos);
> + if (ret || !iter.len)
> + return ret;
This looks much cleaner to me now, thanks for iterating :)
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + }
> +
> + /*
> + * To avoid an unconditional flush, check pagecache state and only flush
> + * if dirty and the fs returns a mapping that might convert on
> + * writeback.
> */
> range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> - pos, pos + len - 1);
> + iter.pos, iter.pos + iter.len - 1);
> while ((ret = iomap_iter(&iter, ops)) > 0) {
> const struct iomap *s = iomap_iter_srcmap(&iter);
>
> --
> 2.47.0
>
>
* Re: [PATCH v3 3/4] iomap: elide flush from partial eof zero range
2024-11-08 12:42 ` [PATCH v3 3/4] iomap: elide flush from partial eof zero range Brian Foster
2024-11-09 3:03 ` Darrick J. Wong
@ 2024-11-11 6:06 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-11-11 6:06 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio
2024-11-08 12:42 [PATCH v3 0/4] iomap: zero range flush fixes Brian Foster
` (2 preceding siblings ...)
2024-11-08 12:42 ` [PATCH v3 3/4] iomap: elide flush from partial eof zero range Brian Foster
@ 2024-11-08 12:42 ` Brian Foster
2024-11-09 3:06 ` Darrick J. Wong
2024-11-11 6:06 ` Christoph Hellwig
3 siblings, 2 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-08 12:42 UTC (permalink / raw)
To: linux-fsdevel
iomap_zero_range() uses buffered writes for manual zeroing, no
longer updates i_size for such writes, but is still explicitly
called for post-eof ranges. The historical use case for this is
zeroing post-eof speculative preallocation on extending writes from
XFS. However, XFS also recently changed to convert all post-eof
delalloc mappings to unwritten in the iomap_begin() handler, which
means it now never expects manual zeroing of post-eof mappings. In
other words, all post-eof mappings should be reported as holes or
unwritten.
This is a subtle dependency that can be hard to detect if violated
because associated codepaths are likely to update i_size after folio
locks are dropped, but before writeback happens to occur. For
example, if XFS reverts back to some form of manual zeroing of
post-eof blocks on write extension, writeback of those zeroed folios
will now race with the presumed i_size update from the subsequent
buffered write.
Since iomap_zero_range() can't correctly zero post-eof mappings
without updating i_size, warn if this ever occurs. This
serves as minimal indication that if this use case is reintroduced
by a filesystem, iomap_zero_range() might need to reconsider i_size
updates for write extending use cases.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/iomap/buffered-io.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7f40234a301e..e18830e4809b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1354,6 +1354,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
{
loff_t pos = iter->pos;
loff_t length = iomap_length(iter);
+ loff_t isize = iter->inode->i_size;
loff_t written = 0;
do {
@@ -1369,6 +1370,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
if (iter->iomap.flags & IOMAP_F_STALE)
break;
+ /* warn about zeroing folios beyond eof that won't write back */
+ WARN_ON_ONCE(folio_pos(folio) > isize);
offset = offset_in_folio(folio, pos);
if (bytes > folio_size(folio) - offset)
bytes = folio_size(folio) - offset;
--
2.47.0
* Re: [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio
2024-11-08 12:42 ` [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio Brian Foster
@ 2024-11-09 3:06 ` Darrick J. Wong
2024-11-12 14:01 ` Brian Foster
2024-11-11 6:06 ` Christoph Hellwig
1 sibling, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2024-11-09 3:06 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:42:46AM -0500, Brian Foster wrote:
> iomap_zero_range() uses buffered writes for manual zeroing, no
> longer updates i_size for such writes, but is still explicitly
> called for post-eof ranges. The historical use case for this is
> zeroing post-eof speculative preallocation on extending writes from
> XFS. However, XFS also recently changed to convert all post-eof
> delalloc mappings to unwritten in the iomap_begin() handler, which
> means it now never expects manual zeroing of post-eof mappings. In
> other words, all post-eof mappings should be reported as holes or
> unwritten.
>
> This is a subtle dependency that can be hard to detect if violated
> because associated codepaths are likely to update i_size after folio
> locks are dropped, but before writeback happens to occur. For
> example, if XFS reverts back to some form of manual zeroing of
> post-eof blocks on write extension, writeback of those zeroed folios
> will now race with the presumed i_size update from the subsequent
> buffered write.
>
> Since iomap_zero_range() can't correctly zero post-eof mappings
> without updating i_size, warn if this ever occurs. This
> serves as minimal indication that if this use case is reintroduced
> by a filesystem, iomap_zero_range() might need to reconsider i_size
> updates for write extending use cases.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/iomap/buffered-io.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7f40234a301e..e18830e4809b 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1354,6 +1354,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> {
> loff_t pos = iter->pos;
> loff_t length = iomap_length(iter);
> + loff_t isize = iter->inode->i_size;
> loff_t written = 0;
>
> do {
> @@ -1369,6 +1370,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> if (iter->iomap.flags & IOMAP_F_STALE)
> break;
>
> + /* warn about zeroing folios beyond eof that won't write back */
> + WARN_ON_ONCE(folio_pos(folio) > isize);
WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);?
No need to have the extra local variable for something that shouldn't
ever happen. Do you need i_size_read for correctness here?
--D
> offset = offset_in_folio(folio, pos);
> if (bytes > folio_size(folio) - offset)
> bytes = folio_size(folio) - offset;
> --
> 2.47.0
>
>
* Re: [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio
2024-11-09 3:06 ` Darrick J. Wong
@ 2024-11-12 14:01 ` Brian Foster
0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2024-11-12 14:01 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-fsdevel
On Fri, Nov 08, 2024 at 07:06:23PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 08, 2024 at 07:42:46AM -0500, Brian Foster wrote:
> > iomap_zero_range() uses buffered writes for manual zeroing, no
> > longer updates i_size for such writes, but is still explicitly
> > called for post-eof ranges. The historical use case for this is
> > zeroing post-eof speculative preallocation on extending writes from
> > XFS. However, XFS also recently changed to convert all post-eof
> > delalloc mappings to unwritten in the iomap_begin() handler, which
> > means it now never expects manual zeroing of post-eof mappings. In
> > other words, all post-eof mappings should be reported as holes or
> > unwritten.
> >
> > This is a subtle dependency that can be hard to detect if violated
> > because associated codepaths are likely to update i_size after folio
> > locks are dropped, but before writeback happens to occur. For
> > example, if XFS reverts back to some form of manual zeroing of
> > post-eof blocks on write extension, writeback of those zeroed folios
> > will now race with the presumed i_size update from the subsequent
> > buffered write.
> >
> > Since iomap_zero_range() can't correctly zero post-eof mappings
> > without updating i_size, warn if this ever occurs. This
> > serves as minimal indication that if this use case is reintroduced
> > by a filesystem, iomap_zero_range() might need to reconsider i_size
> > updates for write extending use cases.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/iomap/buffered-io.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 7f40234a301e..e18830e4809b 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1354,6 +1354,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> > {
> > loff_t pos = iter->pos;
> > loff_t length = iomap_length(iter);
> > + loff_t isize = iter->inode->i_size;
> > loff_t written = 0;
> >
> > do {
> > @@ -1369,6 +1370,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> > if (iter->iomap.flags & IOMAP_F_STALE)
> > break;
> >
> > + /* warn about zeroing folios beyond eof that won't write back */
> > + WARN_ON_ONCE(folio_pos(folio) > isize);
>
> WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);?
>
> No need to have the extra local variable for something that shouldn't
> ever happen. Do you need i_size_read for correctness here?
>
Dropped isize. I didn't think we needed i_size_read() since we're
typically in an fs operation path, but I could be wrong. I haven't seen
any spurious warnings in my testing so far, at least.
Brian
> --D
>
> > offset = offset_in_folio(folio, pos);
> > if (bytes > folio_size(folio) - offset)
> > bytes = folio_size(folio) - offset;
> > --
> > 2.47.0
> >
> >
>
* Re: [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio
2024-11-08 12:42 ` [PATCH v3 4/4] iomap: warn on zero range of a post-eof folio Brian Foster
2024-11-09 3:06 ` Darrick J. Wong
@ 2024-11-11 6:06 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-11-11 6:06 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel
Looks fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>