public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype
@ 2024-12-13 15:05 Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 1/4] iomap: prep work for folio_batch support Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Brian Foster @ 2024-12-13 15:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Hi all,

This is an update of the folio_batch prototype for zero range. The idea
here is first to provide a mechanism to correctly handle dirty folios
over unwritten extents on zero range without needing to flush, and
second to add it in a way such that it can be extended to other
operations in the future. For example, seek data over unwritten extents
could be given similar treatment to zero range, and multiple people have
raised the idea of using this mechanism for buffered writes.

The primary change in RFCv2 is that the series is now ported on top of
the incremental iter advance patches, which eliminates the need for
patch 1 from RFCv1 and allows plumbing into the normal folio get path.

This is still a WIP and I'm posting mainly as a reference for the
incremental advance patches. Patch 1 is a squash of 7 or 8 prep patches
that aren't all that interesting, patch 2 implements batch support,
patch 3 is a prep patch for XFS, and patch 4 updates XFS to use the iter
batch on zero range.

Thoughts, reviews, flames appreciated.

Brian

RFCv2:
- Port onto incremental advance, drop patch 1 from RFCv1.
- Moved batch into iomap_iter, dynamically allocate and drop flag.
- Tweak XFS patch to always trim zero range on EOF boundary.
RFCv1: https://lore.kernel.org/linux-fsdevel/20241119154656.774395-1-bfoster@redhat.com/

Brian Foster (4):
  iomap: prep work for folio_batch support
  iomap: optional zero range dirty folio processing
  xfs: always trim mapping to requested range for zero range
  xfs: fill dirty folios on zero range of unwritten mappings

 fs/iomap/buffered-io.c | 177 ++++++++++++++++++++++++++++++-----------
 fs/iomap/iter.c        |   6 ++
 fs/xfs/xfs_iomap.c     |  25 ++++--
 include/linux/iomap.h  |   4 +
 4 files changed, 158 insertions(+), 54 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFCv2 1/4] iomap: prep work for folio_batch support
  2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
@ 2024-12-13 15:05 ` Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Brian Foster @ 2024-12-13 15:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Squash of misc. prep work for folio_batch support.

Not-Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 100 +++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 43 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e0ae46b11413..7fdf593b58b1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -743,10 +743,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	return 0;
 }
 
-static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
-		size_t len)
+static struct folio *__iomap_get_folio(struct iomap_iter *iter, size_t len)
 {
 	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
+	loff_t pos = iter->pos;
+
+	if (!mapping_large_folio_support(iter->inode->i_mapping))
+		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
 
 	if (folio_ops && folio_ops->get_folio)
 		return folio_ops->get_folio(iter, pos, len);
@@ -754,10 +757,11 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
 		return iomap_get_folio(iter, pos, len);
 }
 
-static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
+static void __iomap_put_folio(struct iomap_iter *iter, size_t ret,
 		struct folio *folio)
 {
 	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
+	loff_t pos = iter->pos;
 
 	if (folio_ops && folio_ops->put_folio) {
 		folio_ops->put_folio(iter->inode, pos, ret, folio);
@@ -767,6 +771,21 @@ static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
 	}
 }
 
+/* trim pos and bytes to within a given folio */
+static loff_t iomap_trim_folio_range(struct iomap_iter *iter,
+		struct folio *folio, size_t *offset, size_t *bytes)
+{
+	loff_t pos = iter->pos;
+	size_t fsize = folio_size(folio);
+
+	WARN_ON_ONCE(pos < folio_pos(folio) || pos >= folio_pos(folio) + fsize);
+
+	*offset = offset_in_folio(folio, pos);
+	if (*bytes > fsize - *offset)
+		*bytes = fsize - *offset;
+	return pos;
+}
+
 static int iomap_write_begin_inline(const struct iomap_iter *iter,
 		struct folio *folio)
 {
@@ -776,25 +795,27 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
 	return iomap_read_inline_data(iter, folio);
 }
 
-static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
-		size_t len, struct folio **foliop)
+/*
+ * Grab and prepare a folio for write based on iter state. Returns the folio,
+ * offset, and length. Callers can optionally pass a max length *plen,
+ * otherwise init to zero.
+ */
+static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
+		size_t *poffset, size_t *plen)
 {
 	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
+	loff_t pos;
+	size_t len = min_t(u64, SIZE_MAX, iomap_length(iter));
 	struct folio *folio;
 	int status = 0;
 
-	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
-	if (srcmap != &iter->iomap)
-		BUG_ON(pos + len > srcmap->offset + srcmap->length);
+	len = *plen > 0 ? min_t(u64, len, *plen) : len;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
-	if (!mapping_large_folio_support(iter->inode->i_mapping))
-		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
-
-	folio = __iomap_get_folio(iter, pos, len);
+	folio = __iomap_get_folio(iter, len);
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
@@ -818,8 +839,10 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 		}
 	}
 
-	if (pos + len > folio_pos(folio) + folio_size(folio))
-		len = folio_pos(folio) + folio_size(folio) - pos;
+	pos = iomap_trim_folio_range(iter, folio, poffset, &len);
+	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
+	if (srcmap != &iter->iomap)
+		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
 	if (srcmap->type == IOMAP_INLINE)
 		status = iomap_write_begin_inline(iter, folio);
@@ -832,10 +855,11 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 		goto out_unlock;
 
 	*foliop = folio;
+	*plen = len;
 	return 0;
 
 out_unlock:
-	__iomap_put_folio(iter, pos, 0, folio);
+	__iomap_put_folio(iter, 0, folio);
 
 	return status;
 }
@@ -885,10 +909,11 @@ static void iomap_write_end_inline(const struct iomap_iter *iter,
  * Returns true if all copied bytes have been written to the pagecache,
  * otherwise return false.
  */
-static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
-		size_t copied, struct folio *folio)
+static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
+		struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
+	loff_t pos = iter->pos;
 
 	if (srcmap->type == IOMAP_INLINE) {
 		iomap_write_end_inline(iter, folio, pos, copied);
@@ -922,12 +947,12 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
 		size_t written;		/* Bytes have been written */
-		loff_t pos = iter->pos;
+		loff_t pos;
 		loff_t length = iomap_length(iter);
 
 		bytes = iov_iter_count(i);
 retry:
-		offset = pos & (chunk - 1);
+		offset = iter->pos & (chunk - 1);
 		bytes = min(chunk - offset, bytes);
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 							       bdp_flags);
@@ -952,23 +977,21 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			break;
 		}
 
-		status = iomap_write_begin(iter, pos, bytes, &folio);
+		status = iomap_write_begin(iter, &folio, &offset, &bytes);
 		if (unlikely(status)) {
-			iomap_write_failed(iter->inode, pos, bytes);
+			iomap_write_failed(iter->inode, iter->pos, bytes);
 			break;
 		}
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
-		offset = offset_in_folio(folio, pos);
-		if (bytes > folio_size(folio) - offset)
-			bytes = folio_size(folio) - offset;
+		pos = iter->pos;
 
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
-		written = iomap_write_end(iter, pos, bytes, copied, folio) ?
+		written = iomap_write_end(iter, bytes, copied, folio) ?
 			  copied : 0;
 
 		/*
@@ -983,7 +1006,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			i_size_write(iter->inode, pos + written);
 			iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
 		}
-		__iomap_put_folio(iter, pos, written, folio);
+		__iomap_put_folio(iter, written, folio);
 
 		if (old_size < pos)
 			pagecache_isize_extended(iter->inode, old_size, pos);
@@ -1276,22 +1299,17 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		struct folio *folio;
 		int status;
 		size_t offset;
-		size_t bytes = min_t(u64, SIZE_MAX, iomap_length(iter));
-		loff_t pos = iter->pos;
+		size_t bytes = 0;
 		bool ret;
 
-		status = iomap_write_begin(iter, pos, bytes, &folio);
+		status = iomap_write_begin(iter, &folio, &offset, &bytes);
 		if (unlikely(status))
 			return status;
 		if (iomap->flags & IOMAP_F_STALE)
 			break;
 
-		offset = offset_in_folio(folio, pos);
-		if (bytes > folio_size(folio) - offset)
-			bytes = folio_size(folio) - offset;
-
-		ret = iomap_write_end(iter, pos, bytes, bytes, folio);
-		__iomap_put_folio(iter, pos, bytes, folio);
+		ret = iomap_write_end(iter, bytes, bytes, folio);
+		__iomap_put_folio(iter, bytes, folio);
 		if (WARN_ON_ONCE(!ret))
 			return -EIO;
 
@@ -1347,11 +1365,10 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		struct folio *folio;
 		int status;
 		size_t offset;
-		size_t bytes = min_t(u64, SIZE_MAX, iomap_length(iter));
-		loff_t pos = iter->pos;
+		size_t bytes = 0;
 		bool ret;
 
-		status = iomap_write_begin(iter, pos, bytes, &folio);
+		status = iomap_write_begin(iter, &folio, &offset, &bytes);
 		if (status)
 			return status;
 		if (iter->iomap.flags & IOMAP_F_STALE)
@@ -1359,15 +1376,12 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 
 		/* warn about zeroing folios beyond eof that won't write back */
 		WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
-		offset = offset_in_folio(folio, pos);
-		if (bytes > folio_size(folio) - offset)
-			bytes = folio_size(folio) - offset;
 
 		folio_zero_range(folio, offset, bytes);
 		folio_mark_accessed(folio);
 
-		ret = iomap_write_end(iter, pos, bytes, bytes, folio);
-		__iomap_put_folio(iter, pos, bytes, folio);
+		ret = iomap_write_end(iter, bytes, bytes, folio);
+		__iomap_put_folio(iter, bytes, folio);
 		if (WARN_ON_ONCE(!ret))
 			return -EIO;
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 1/4] iomap: prep work for folio_batch support Brian Foster
@ 2024-12-13 15:05 ` Brian Foster
  2025-01-09  7:20   ` Christoph Hellwig
  2024-12-13 15:05 ` [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
  3 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-12-13 15:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

The only way zero range can currently process unwritten mappings
with dirty pagecache is to check whether the range is dirty before
mapping lookup and then flush when at least one underlying mapping
is unwritten. This ordering is required to prevent iomap lookup from
racing with folio writeback and reclaim.

Since zero range can skip ranges of unwritten mappings that are
clean in cache, this operation can be improved by allowing the
filesystem to provide the set of folios backed by such mappings that
require zeroing. In turn, rather than flushing or iterating file
offsets, zero range can process each folio as normal and skip any
clean or uncached ranges in between.

As a first pass prototype solution, stuff a folio_batch in struct
iomap, provide a helper that the fs can use to populate the batch at
lookup time, and define a flag to indicate the mapping was checked.
Note that since the helper is intended for use under internal fs
locks, it trylocks folios in order to filter out clean folios. This
loosely follows the logic from filemap_range_has_writeback().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 77 ++++++++++++++++++++++++++++++++++++++++--
 fs/iomap/iter.c        |  6 ++++
 include/linux/iomap.h  |  4 +++
 3 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7fdf593b58b1..5492dc7fe963 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -751,6 +751,15 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, size_t len)
 	if (!mapping_large_folio_support(iter->inode->i_mapping))
 		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
 
+	if (iter->fbatch) {
+		struct folio *folio = folio_batch_next(iter->fbatch);
+		if (folio) {
+			folio_get(folio);
+			folio_lock(folio);
+		}
+		return folio;
+	}
+
 	if (folio_ops && folio_ops->get_folio)
 		return folio_ops->get_folio(iter, pos, len);
 	else
@@ -839,6 +848,15 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
 		}
 	}
 
+	if (!folio) {
+		WARN_ON_ONCE(!iter->fbatch);
+		len = 0;
+		goto out;
+	} else if (folio_pos(folio) > iter->pos) {
+		BUG_ON(folio_pos(folio) - iter->pos >= iomap_length(iter));
+		iomap_iter_advance(iter, folio_pos(folio) - iter->pos);
+	}
+
 	pos = iomap_trim_folio_range(iter, folio, poffset, &len);
 	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
 	if (srcmap != &iter->iomap)
@@ -854,6 +872,7 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
 	if (unlikely(status))
 		goto out_unlock;
 
+out:
 	*foliop = folio;
 	*plen = len;
 	return 0;
@@ -1374,6 +1393,11 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
+		if (!folio) {
+			iomap_iter_advance(iter, iomap_length(iter));
+			break;
+		}
+
 		/* warn about zeroing folios beyond eof that won't write back */
 		WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
 
@@ -1393,6 +1417,49 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 	return 0;
 }
 
+loff_t
+iomap_fill_dirty_folios(
+	struct iomap_iter	*iter,
+	loff_t			offset,
+	loff_t			length)
+{
+	struct address_space	*mapping = iter->inode->i_mapping;
+	struct folio_batch	fbatch;
+	loff_t			end_pos = offset + length;
+	pgoff_t			start = offset >> PAGE_SHIFT;
+	pgoff_t			end = (end_pos - 1) >> PAGE_SHIFT;
+
+	folio_batch_init(&fbatch);
+	iter->fbatch = kmalloc(sizeof(struct folio_batch), GFP_KERNEL);
+	if (!iter->fbatch)
+		return end_pos;
+	folio_batch_init(iter->fbatch);
+
+	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
+	       folio_batch_space(iter->fbatch)) {
+		struct folio *folio;
+		while ((folio = folio_batch_next(&fbatch))) {
+			if (folio_trylock(folio)) {
+				bool clean = !folio_test_dirty(folio) &&
+					     !folio_test_writeback(folio);
+				folio_unlock(folio);
+				if (clean)
+					continue;
+			}
+
+			folio_get(folio);
+			if (!folio_batch_add(iter->fbatch, folio)) {
+				end_pos = folio_pos(folio) + folio_size(folio);
+				break;
+			}
+		}
+		folio_batch_release(&fbatch);
+	}
+
+	return end_pos;
+}
+EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
+
 int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
@@ -1420,7 +1487,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * flushing on partial eof zeroing, special case it to zero the
 	 * unaligned start portion if already dirty in pagecache.
 	 */
-	if (off &&
+	if (!iter.fbatch && off &&
 	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
 		iter.len = plen;
 		while ((ret = iomap_iter(&iter, ops)) > 0)
@@ -1441,8 +1508,12 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-		if (srcmap->type == IOMAP_HOLE ||
-		    srcmap->type == IOMAP_UNWRITTEN) {
+		if (WARN_ON_ONCE(iter.fbatch && srcmap->type != IOMAP_UNWRITTEN))
+			return -EIO;
+
+		if (!iter.fbatch &&
+		    (srcmap->type == IOMAP_HOLE ||
+		     srcmap->type == IOMAP_UNWRITTEN)) {
 			loff_t proc = iomap_length(&iter);
 
 			if (range_dirty) {
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 5fe0edb51fe5..911846d7386c 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -9,6 +9,12 @@
 
 static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
 {
+	if (iter->fbatch) {
+		folio_batch_release(iter->fbatch);
+		kfree(iter->fbatch);
+		iter->fbatch = NULL;
+	}
+
 	iter->processed = 0;
 	memset(&iter->iomap, 0, sizeof(iter->iomap));
 	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 704ed98159f7..d01e5265de27 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/mm_types.h>
 #include <linux/blkdev.h>
+#include <linux/pagevec.h>
 
 struct address_space;
 struct fiemap_extent_info;
@@ -228,6 +229,7 @@ struct iomap_iter {
 	unsigned flags;
 	struct iomap iomap;
 	struct iomap srcmap;
+	struct folio_batch *fbatch;
 	void *private;
 };
 
@@ -315,6 +317,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
 bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
+loff_t iomap_fill_dirty_folios(struct iomap_iter *iter, loff_t offset,
+		loff_t length);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
 int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range
  2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 1/4] iomap: prep work for folio_batch support Brian Foster
  2024-12-13 15:05 ` [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing Brian Foster
@ 2024-12-13 15:05 ` Brian Foster
  2025-01-09  7:22   ` Christoph Hellwig
  2024-12-13 15:05 ` [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
  3 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-12-13 15:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Refactor and tweak the IOMAP_ZERO logic in preparation to support
filling the folio batch for unwritten mappings. Drop the superfluous
imap offset check since the hole case has already been filtered out.
Split the delalloc case handling into a sub-branch, and always trim
the imap to the requested offset/count so it can be more easily used
to bound the range to look up in pagecache.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iomap.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 50fa3ef89f6c..97fa860a6401 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1059,21 +1059,20 @@ xfs_buffered_write_iomap_begin(
 	}
 
 	/*
-	 * For zeroing, trim a delalloc extent that extends beyond the EOF
-	 * block.  If it starts beyond the EOF block, convert it to an
+	 * For zeroing, trim extents that extend beyond the EOF block. If a
+	 * delalloc extent starts beyond the EOF block, convert it to an
 	 * unwritten extent.
 	 */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff <= offset_fsb &&
-	    isnullstartblock(imap.br_startblock)) {
+	if (flags & IOMAP_ZERO) {
 		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
 
-		if (offset_fsb >= eof_fsb)
+		if (isnullstartblock(imap.br_startblock) &&
+		    offset_fsb >= eof_fsb)
 			goto convert_delay;
-		if (end_fsb > eof_fsb) {
+		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
 			end_fsb = eof_fsb;
-			xfs_trim_extent(&imap, offset_fsb,
-					end_fsb - offset_fsb);
-		}
+
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
 
 	/*
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings
  2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
                   ` (2 preceding siblings ...)
  2024-12-13 15:05 ` [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
@ 2024-12-13 15:05 ` Brian Foster
  2025-01-09  7:26   ` Christoph Hellwig
  3 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-12-13 15:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Use the iomap folio batch mechanism to identify which folios to zero
on zero range of unwritten mappings. Trim the resulting mapping if
the batch is filled (unlikely) and set the HAS_FOLIOS flag to inform
iomap that pagecache has been checked for dirty folios.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iomap.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 97fa860a6401..b7dbd34fc02f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -998,6 +998,7 @@ xfs_buffered_write_iomap_begin(
 	struct iomap		*iomap,
 	struct iomap		*srcmap)
 {
+	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter, iomap);
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
@@ -1065,12 +1066,21 @@ xfs_buffered_write_iomap_begin(
 	 */
 	if (flags & IOMAP_ZERO) {
 		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+		u64 end;
 
 		if (isnullstartblock(imap.br_startblock) &&
 		    offset_fsb >= eof_fsb)
 			goto convert_delay;
 		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
 			end_fsb = eof_fsb;
+		if (imap.br_state == XFS_EXT_UNWRITTEN &&
+		    offset_fsb < eof_fsb) {
+			xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
+			end = iomap_fill_dirty_folios(iter,
+					XFS_FSB_TO_B(mp, imap.br_startoff),
+					XFS_FSB_TO_B(mp, imap.br_blockcount));
+			end_fsb = min_t(xfs_fileoff_t, end_fsb, XFS_B_TO_FSB(mp, end));
+		}
 
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2024-12-13 15:05 ` [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing Brian Foster
@ 2025-01-09  7:20   ` Christoph Hellwig
  2025-01-10 17:53     ` Brian Foster
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-09  7:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

Just a bit of nitpicking, otherwise this looks sane, although I'd
want to return to proper review of the squashed prep patches first.

>  	if (!mapping_large_folio_support(iter->inode->i_mapping))
>  		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
>  
> +	if (iter->fbatch) {
> +		struct folio *folio = folio_batch_next(iter->fbatch);
> +		if (folio) {

Missing empty line after the variable declaration.

> +	if (!folio) {
> +		WARN_ON_ONCE(!iter->fbatch);
> +		len = 0;
> +		goto out;

Given that the put label really just updates the return by reference
folio and len arguments we might as well do that here and avoid the
goto.

> +	} else if (folio_pos(folio) > iter->pos) {

Also no need for an else after a goto (or return).

> @@ -1374,6 +1393,11 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  		if (iter->iomap.flags & IOMAP_F_STALE)
>  			break;
>  
> +		if (!folio) {
> +			iomap_iter_advance(iter, iomap_length(iter));
> +			break;
> +		}

Maybe throw in a comment here about how the NULL folio can happen?

> +	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
> +	       folio_batch_space(iter->fbatch)) {
> +		struct folio *folio;
> +		while ((folio = folio_batch_next(&fbatch))) {
> +			if (folio_trylock(folio)) {
> +				bool clean = !folio_test_dirty(folio) &&
> +					     !folio_test_writeback(folio);
> +				folio_unlock(folio);
> +				if (clean)
> +					continue;
> +			}
> +
> +			folio_get(folio);
> +			if (!folio_batch_add(iter->fbatch, folio)) {
> +				end_pos = folio_pos(folio) + folio_size(folio);
> +				break;
> +			}
> +		}
> +		folio_batch_release(&fbatch);

I think I mentioned this last time, but I'd much prefer to do away
with the local fbatch used for processing and rewrite this using a
find_get_entry() loop.  That probably means this helper needs to move
to filemap.c, which should be easy if we pass in the mapping and outer
fbatch.

> +		if (WARN_ON_ONCE(iter.fbatch && srcmap->type != IOMAP_UNWRITTEN))

Overly long line.

>  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
>  {
> +	if (iter->fbatch) {
> +		folio_batch_release(iter->fbatch);
> +		kfree(iter->fbatch);
> +		iter->fbatch = NULL;
> +	}

Does it make sense to free the fbatch allocation on every iteration,
or should we keep the memory allocation around and only free it after
the last iteration?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range
  2024-12-13 15:05 ` [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
@ 2025-01-09  7:22   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-09  7:22 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings
  2024-12-13 15:05 ` [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
@ 2025-01-09  7:26   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-09  7:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

> +	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter, iomap);

Overly long line.

>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> @@ -1065,12 +1066,21 @@ xfs_buffered_write_iomap_begin(
>  	 */
>  	if (flags & IOMAP_ZERO) {
>  		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> +		u64 end;
>  
>  		if (isnullstartblock(imap.br_startblock) &&
>  		    offset_fsb >= eof_fsb)
>  			goto convert_delay;
>  		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
>  			end_fsb = eof_fsb;
> +		if (imap.br_state == XFS_EXT_UNWRITTEN &&
> +		    offset_fsb < eof_fsb) {
> +			xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> +			end = iomap_fill_dirty_folios(iter,
> +					XFS_FSB_TO_B(mp, imap.br_startoff),
> +					XFS_FSB_TO_B(mp, imap.br_blockcount));
> +			end_fsb = min_t(xfs_fileoff_t, end_fsb, XFS_B_TO_FSB(mp, end));

A few more here.

But most importantly please add a comment describing the logic behind
this in the function.  It's hairy logic and not obvious from reading
the code, so explaining it will be helpful.

Splitting it into a separate helper might be useful for that, but due
to the amount of state shared with the caller that might not look all
that pretty in the end, so I'm not entirely sure about it.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2025-01-09  7:20   ` Christoph Hellwig
@ 2025-01-10 17:53     ` Brian Foster
  2025-01-13  4:51       ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2025-01-10 17:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Jan 08, 2025 at 11:20:39PM -0800, Christoph Hellwig wrote:
> Just a bit of nitpicking, otherwise this looks sane, although I'd
> want to return to proper review of the squashed prep patches first.
> 

Yeah.. I mainly wanted to send this just to show the use case for the
iter advance changes. I'll look into tweaks for the various
nits/comments.

...
> > +	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
> > +	       folio_batch_space(iter->fbatch)) {
> > +		struct folio *folio;
> > +		while ((folio = folio_batch_next(&fbatch))) {
> > +			if (folio_trylock(folio)) {
> > +				bool clean = !folio_test_dirty(folio) &&
> > +					     !folio_test_writeback(folio);
> > +				folio_unlock(folio);
> > +				if (clean)
> > +					continue;
> > +			}
> > +
> > +			folio_get(folio);
> > +			if (!folio_batch_add(iter->fbatch, folio)) {
> > +				end_pos = folio_pos(folio) + folio_size(folio);
> > +				break;
> > +			}
> > +		}
> > +		folio_batch_release(&fbatch);
> 
> I think I mentioned this last time, but I'd much prefer to do away
> with the local fbatch used for processing and rewrite this using a
> find_get_entry() loop.  That probably means this helper needs to move
> to filemap.c, which should be easy if we pass in the mapping and outer
> fbatch.
> 

I recall we discussed making this more generic. That is still on my
radar, I just haven't gotten to it yet.

I don't recall the find_get_entry() loop suggestion, but that seems
reasonable at a quick glance. I've been away from this for a few weeks
but I think my main concern with this trajectory was if/how to deal with
iomap_folio_state if we wanted fully granular dirty folio && dirty block
processing.

For example, if we have a largish dirty folio backed by an unwritten
extent with maybe a single block that is actually dirty, would we be
alright to just zero the requested portion of the folio as long as some
part of the folio is dirty? Given the historical ad hoc nature of XFS
speculative prealloc zeroing, personally I don't see that as much of an
issue in practice as long as subsequent reads return zeroes, but I could
be missing something.

...
> >  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> >  {
> > +	if (iter->fbatch) {
> > +		folio_batch_release(iter->fbatch);
> > +		kfree(iter->fbatch);
> > +		iter->fbatch = NULL;
> > +	}
> 
> Does it make sense to free the fbatch allocation on every iteration,
> or should we keep the memory allocation around and only free it after
> the last iteration?
> 

In the current implementation the existence of the fbatch is what
controls the folio lookup path, so we'd only want it for unwritten
mappings. That said, this could be done differently with a flag or
something that indicates whether to use the batch. Given that we release
the folios anyways and zero range isn't the most frequent thing, I
figured this keeps things simple for now. I don't really have a strong
preference for either approach, however.
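If we did want to keep the allocation around, one illustrative
(untested) shape would be to only drop the folio references between
iterations and free the batch once at the end of iteration, e.g.:

```c
/*
 * Illustrative alternative only, not the posted patch: release refs and
 * reinit the batch per iteration, keep the allocation itself alive until
 * the iter loop completes (where kfree() would happen once). Whether the
 * lookup path is used would then be gated on a separate flag rather than
 * on iter->fbatch being non-NULL.
 */
static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
{
	if (iter->fbatch) {
		folio_batch_release(iter->fbatch);	/* drop folio refs */
		folio_batch_reinit(iter->fbatch);	/* keep the memory */
	}
	/* ... existing iomap reset ... */
}
```

But as said, given zero range frequency I'm not sure the allocation
overhead matters in practice.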

Brian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2025-01-10 17:53     ` Brian Foster
@ 2025-01-13  4:51       ` Christoph Hellwig
  2025-01-13 14:32         ` Brian Foster
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-13  4:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Fri, Jan 10, 2025 at 12:53:19PM -0500, Brian Foster wrote:
> processing.
> 
> For example, if we have a largish dirty folio backed by an unwritten
> extent with maybe a single block that is actually dirty, would we be
> alright to just zero the requested portion of the folio as long as some
> part of the folio is dirty? Given the historical ad hoc nature of XFS
> speculative prealloc zeroing, personally I don't see that as much of an
> issue in practice as long as subsequent reads return zeroes, but I could
> be missing something.

That's a very good question I haven't thought about much yet.  And
every time I try to think of the speculative preallocations and their
implications my head begins to implode..

> > >  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > >  {
> > > +	if (iter->fbatch) {
> > > +		folio_batch_release(iter->fbatch);
> > > +		kfree(iter->fbatch);
> > > +		iter->fbatch = NULL;
> > > +	}
> > 
> > Does it make sense to free the fbatch allocation on every iteration,
> > or should we keep the memory allocation around and only free it after
> > the last iteration?
> > 
> 
> In the current implementation the existence of the fbatch is what
> controls the folio lookup path, so we'd only want it for unwritten
> mappings. That said, this could be done differently with a flag or
> something that indicates whether to use the batch. Given that we release
> the folios anyways and zero range isn't the most frequent thing, I
> figured this keeps things simple for now. I don't really have a strong
> preference for either approach, however.

I was just worried about the overhead of allocating and freeing
it all the time.  OTOH we probably rarely have more than a single
extent to process with the batch right now.


* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2025-01-13  4:51       ` Christoph Hellwig
@ 2025-01-13 14:32         ` Brian Foster
  2025-01-15  5:47           ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2025-01-13 14:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Sun, Jan 12, 2025 at 08:51:12PM -0800, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 12:53:19PM -0500, Brian Foster wrote:
> > processing.
> > 
> > For example, if we have a largish dirty folio backed by an unwritten
> > extent with maybe a single block that is actually dirty, would we be
> > alright to just zero the requested portion of the folio as long as some
> > part of the folio is dirty? Given the historical ad hoc nature of XFS
> > speculative prealloc zeroing, personally I don't see that as much of an
> > issue in practice as long as subsequent reads return zeroes, but I could
> > be missing something.
> 
> That's a very good question I haven't thought about much yet.  And
> every time I try to think of the speculative preallocations and their
> implications my head begins to implode..
> 

Heh. Just context on my thought process, FWIW.. We've obviously zeroed
the newly exposed file range for writes that start beyond EOF for quite
some time. This new post-eof range may or may not have been backed by
speculative prealloc, and if so, that prealloc may either be delalloc or
unwritten extents depending on whether writeback occurred on the EOF
extent (assuming large enough free extents, etc.) before the extending
write.

In turn, this means that extending write zero range would have either
physically zeroed delalloc extents or skipped unwritten blocks,
depending on the situation. Personally, I don't think it really matters
which as there is no real guarantee that "all blocks not previously
written to are unwritten," for example, but rather just that "all blocks
not written to return zeroes on read." For that reason, I'm _hoping_
that we can keep this simple and just deal with some potential spurious
zeroing on folios that are already dirty, but I'm open to arguments
against that.

Note that the post-eof zero range behavior changed in XFS sometime over
the past few releases or so in that it always converts post-eof delalloc
to unwritten in iomap_begin(), but IIRC this was to deal with some other
unrelated issue. I also don't think that change is necessarily right
because it can significantly increase the rate of physical block
allocations in some workloads, but that's a separate issue, particularly
now that zero range doesn't update i_size.. ;P

> > > >  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > >  {
> > > > +	if (iter->fbatch) {
> > > > +		folio_batch_release(iter->fbatch);
> > > > +		kfree(iter->fbatch);
> > > > +		iter->fbatch = NULL;
> > > > +	}
> > > 
> > > Does it make sense to free the fbatch allocation on every iteration,
> > > or should we keep the memory allocation around and only free it after
> > > the last iteration?
> > > 
> > 
> > In the current implementation the existence of the fbatch is what
> > controls the folio lookup path, so we'd only want it for unwritten
> > mappings. That said, this could be done differently with a flag or
> > something that indicates whether to use the batch. Given that we release
> > the folios anyways and zero range isn't the most frequent thing, I
> > figured this keeps things simple for now. I don't really have a strong
> > preference for either approach, however.
> 
> I was just worried about the overhead of allocating and freeing
> it all the time.  OTOH we probably rarely have more than a single
> extent to process with the batch right now.
> 

Ok. This maintains zero range performance in my testing so far, so I'm
going to keep things simple for now until there's a reason to do
otherwise. I'm open to changing it on future iterations, and I suspect
it might change anyway if this ends up used for more operations...

Thanks again for the comments.

Brian


* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2025-01-13 14:32         ` Brian Foster
@ 2025-01-15  5:47           ` Christoph Hellwig
  2025-01-16 14:14             ` Brian Foster
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-15  5:47 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Mon, Jan 13, 2025 at 09:32:37AM -0500, Brian Foster wrote:
> In turn, this means that extending write zero range would have either
> physically zeroed delalloc extents or skipped unwritten blocks,
> depending on the situation. Personally, I don't think it really matters
> which as there is no real guarantee that "all blocks not previously
> written to are unwritten," for example, but rather just that "all blocks
> not written to return zeroes on read."

Yes.

> For that reason, I'm _hoping_
> that we can keep this simple and just deal with some potential spurious
> zeroing on folios that are already dirty, but I'm open to arguments
> against that.

I can't see one.  But we really should find a way to write all this
down, including the arguments for it.


* Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
  2025-01-15  5:47           ` Christoph Hellwig
@ 2025-01-16 14:14             ` Brian Foster
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Foster @ 2025-01-16 14:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Jan 14, 2025 at 09:47:28PM -0800, Christoph Hellwig wrote:
> On Mon, Jan 13, 2025 at 09:32:37AM -0500, Brian Foster wrote:
> > In turn, this means that extending write zero range would have either
> > physically zeroed delalloc extents or skipped unwritten blocks,
> > depending on the situation. Personally, I don't think it really matters
> > which as there is no real guarantee that "all blocks not previously
> > written to are unwritten," for example, but rather just that "all blocks
> > not written to return zeroes on read."
> 
> Yes.
> 
> > For that reason, I'm _hoping_
> > that we can keep this simple and just deal with some potential spurious
> > zeroing on folios that are already dirty, but I'm open to arguments
> > against that.
> 
> I can't see one.  But we really should find a way to write all this
> down, including the arguments for it.
> 

Indeed. If the first non-RFC pass ultimately makes this tradeoff, I'll
plan to document the behavior in the code, and the reasoning and
tradeoffs in the commit log, so it can be reviewed.

Brian


end of thread, other threads:[~2025-01-16 14:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 1/4] iomap: prep work for folio_batch support Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing Brian Foster
2025-01-09  7:20   ` Christoph Hellwig
2025-01-10 17:53     ` Brian Foster
2025-01-13  4:51       ` Christoph Hellwig
2025-01-13 14:32         ` Brian Foster
2025-01-15  5:47           ` Christoph Hellwig
2025-01-16 14:14             ` Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
2025-01-09  7:22   ` Christoph Hellwig
2024-12-13 15:05 ` [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
2025-01-09  7:26   ` Christoph Hellwig
