public inbox for linux-fsdevel@vger.kernel.org
* [PATCH RFC 0/4] iomap: zero range folio batch processing prototype
@ 2024-11-19 15:46 Brian Foster
  2024-11-19 15:46 ` [PATCH RFC 1/4] iomap: allow passing a folio into write begin path Brian Foster
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Brian Foster @ 2024-11-19 15:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Hi all,

Here's a first pass prototype of dirty folio batch processing for zero
range. The idea here is that the fs can avoid pagecache flushing or
wonky ordering/racing issues by passing in a batch of dirty folios that
might exist over unwritten mappings at lookup time.

This is only lightly tested so far, but works well enough to demonstrate
the idea. The biggest open question in my mind is how to better define
the interface. For now I just stuffed the folio_batch in the struct
iomap, but I don't really like it there due to confusion between iomap
and srcmap and because it bloats the structure.

I thought about using ->private along with a custom ->get_folio(), but I
don't think that really fits the idea of a built-in mechanism. It might
be more appropriate to attach to the iter, but that currently isn't
accessible to ->iomap_begin(). I suppose we could define an
iomap_to_iter() or some such helper that the fill helper could use to
populate the batch, but maybe there are other thoughts/ideas?

Brian

Brian Foster (4):
  iomap: allow passing a folio into write begin path
  iomap: optional zero range dirty folio processing
  xfs: always trim mapping to requested range for zero range
  xfs: fill dirty folios on zero range of unwritten mappings

 fs/iomap/buffered-io.c | 93 ++++++++++++++++++++++++++++++++++++++----
 fs/iomap/iter.c        |  2 +
 fs/xfs/xfs_iomap.c     | 28 ++++++++-----
 include/linux/iomap.h  |  5 +++
 4 files changed, 109 insertions(+), 19 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC 1/4] iomap: allow passing a folio into write begin path
  2024-11-19 15:46 [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Brian Foster
@ 2024-11-19 15:46 ` Brian Foster
  2024-11-20  8:38   ` Christoph Hellwig
  2024-11-19 15:46 ` [PATCH RFC 2/4] iomap: optional zero range dirty folio processing Brian Foster
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2024-11-19 15:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

To facilitate batch processing of dirty folios for zero range, tweak
the write begin path to allow the caller to optionally pass in its
own folio/pos combination.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ce73d2a48c1e..d1a86aea1a7a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -781,7 +781,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 {
 	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	struct folio *folio;
+	struct folio *folio = *foliop;
 	int status = 0;
 
 	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
@@ -794,9 +794,15 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 	if (!mapping_large_folio_support(iter->inode->i_mapping))
 		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
 
-	folio = __iomap_get_folio(iter, pos, len);
-	if (IS_ERR(folio))
-		return PTR_ERR(folio);
+	/*
+	 * XXX: Might want to plumb batch handling down through here. For now
+	 * let the caller do it.
+	 */
+	if (!folio) {
+		folio = __iomap_get_folio(iter, pos, len);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
+	}
 
 	/*
 	 * Now we have a locked folio, before we do anything with it we need to
@@ -918,7 +924,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		loff_t old_size;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
@@ -1281,7 +1287,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		return length;
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		int status;
 		size_t offset;
 		size_t bytes = min_t(u64, SIZE_MAX, length);
@@ -1385,7 +1391,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 	}
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		int status;
 		size_t offset;
 		size_t bytes = min_t(u64, SIZE_MAX, length);
-- 
2.47.0



* [PATCH RFC 2/4] iomap: optional zero range dirty folio processing
  2024-11-19 15:46 [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Brian Foster
  2024-11-19 15:46 ` [PATCH RFC 1/4] iomap: allow passing a folio into write begin path Brian Foster
@ 2024-11-19 15:46 ` Brian Foster
  2024-11-20  8:43   ` Christoph Hellwig
  2024-11-19 15:46 ` [PATCH RFC 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2024-11-19 15:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

The only way zero range can currently process unwritten mappings
with dirty pagecache is to check whether the range is dirty before
mapping lookup and then flush when at least one underlying mapping
is unwritten. This ordering is required to prevent iomap lookup from
racing with folio writeback and reclaim.

Since zero range can skip ranges of unwritten mappings that are
clean in cache, this operation can be improved by allowing the
filesystem to provide the set of folios backed by such mappings that
require zeroing up. In turn, rather than flush or iterate file
offsets, zero range can process each folio as normal and skip any
clean or uncached ranges in between.

As a first pass prototype solution, stuff a folio_batch in struct
iomap, provide a helper that the fs can use to populate the batch at
lookup time, and define a flag to indicate the mapping was checked.
Note that since the helper is intended for use under internal fs
locks, it trylocks folios in order to filter out clean folios. This
loosely follows the logic from filemap_range_has_writeback().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 73 ++++++++++++++++++++++++++++++++++++++++--
 fs/iomap/iter.c        |  2 ++
 include/linux/iomap.h  |  5 +++
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d1a86aea1a7a..6ed8dc8dcdd9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1363,6 +1363,10 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
+	bool has_folios = !!(iter->iomap.flags & IOMAP_F_HAS_FOLIOS);
+
+	if (WARN_ON_ONCE(has_folios && srcmap->type != IOMAP_UNWRITTEN))
+		return -EIO;
 
 	/*
 	 * We must zero subranges of unwritten mappings that might be dirty in
@@ -1381,7 +1385,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 	 * post-eof ranges can be dirtied via mapped write and the flush
 	 * triggers writeback time post-eof zeroing.
 	 */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
+	if (!has_folios &&
+	    (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)) {
 		if (*range_dirty) {
 			*range_dirty = false;
 			return iomap_zero_iter_flush_and_stale(iter);
@@ -1395,8 +1400,28 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 		int status;
 		size_t offset;
 		size_t bytes = min_t(u64, SIZE_MAX, length);
+		size_t skipped = 0;
 		bool ret;
 
+		if (has_folios) {
+			folio = folio_batch_next(&iter->iomap.fbatch);
+			if (!folio) {
+				written += length;
+				break;
+			}
+			folio_get(folio);
+			folio_lock(folio);
+			if (pos < folio_pos(folio)) {
+				skipped = folio_pos(folio) - pos;
+				if (WARN_ON_ONCE(skipped >= length))
+					break;
+				pos += skipped;
+				length -= skipped;
+				bytes = min_t(u64, SIZE_MAX, length);
+			} else
+				WARN_ON_ONCE(pos >= folio_pos(folio) + folio_size(folio));
+		}
+
 		status = iomap_write_begin(iter, pos, bytes, &folio);
 		if (status)
 			return status;
@@ -1417,7 +1442,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 
 		pos += bytes;
 		length -= bytes;
-		written += bytes;
+		written += bytes + skipped;
 	} while (length > 0);
 
 	if (did_zero)
@@ -1425,6 +1450,50 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 	return written;
 }
 
+loff_t
+iomap_fill_dirty_folios(
+	struct inode		*inode,
+	struct iomap		*iomap,
+	loff_t			offset,
+	loff_t			length)
+{
+	struct address_space	*mapping = inode->i_mapping;
+	struct folio_batch	fbatch;
+	pgoff_t			start, end;
+	loff_t			end_pos;
+
+	folio_batch_init(&fbatch);
+	folio_batch_init(&iomap->fbatch);
+
+	end_pos = offset + length;
+	start = offset >> PAGE_SHIFT;
+	end = (end_pos - 1) >> PAGE_SHIFT;
+
+	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
+	       folio_batch_space(&iomap->fbatch)) {
+		struct folio *folio;
+		while ((folio = folio_batch_next(&fbatch))) {
+			if (folio_trylock(folio)) {
+				bool clean = !folio_test_dirty(folio) &&
+					     !folio_test_writeback(folio);
+				folio_unlock(folio);
+				if (clean)
+					continue;
+			}
+
+			folio_get(folio);
+			if (!folio_batch_add(&iomap->fbatch, folio)) {
+				end_pos = folio_pos(folio) + folio_size(folio);
+				break;
+			}
+		}
+		folio_batch_release(&fbatch);
+	}
+
+	return end_pos;
+}
+EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
+
 int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 79a0614eaab7..3c87b5c2b1d9 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -25,6 +25,8 @@ static inline int iomap_iter_advance(struct iomap_iter *iter)
 
 	/* handle the previous iteration (if any) */
 	if (iter->iomap.length) {
+		if (iter->iomap.flags & IOMAP_F_HAS_FOLIOS)
+			folio_batch_release(&iter->iomap.fbatch);
 		if (iter->processed < 0)
 			return iter->processed;
 		if (!iter->processed && !stale)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 27048ec10e1c..7e9d86c3defa 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/mm_types.h>
 #include <linux/blkdev.h>
+#include <linux/pagevec.h>
 
 struct address_space;
 struct fiemap_extent_info;
@@ -77,6 +78,7 @@ struct vm_fault;
  */
 #define IOMAP_F_SIZE_CHANGED	(1U << 8)
 #define IOMAP_F_STALE		(1U << 9)
+#define IOMAP_F_HAS_FOLIOS	(1U << 10)
 
 /*
  * Flags from 0x1000 up are for file system specific usage:
@@ -102,6 +104,7 @@ struct iomap {
 	void			*inline_data;
 	void			*private; /* filesystem private */
 	const struct iomap_folio_ops *folio_ops;
+	struct folio_batch	fbatch;
 	u64			validity_cookie; /* used with .iomap_valid() */
 };
 
@@ -301,6 +304,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
 bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
+loff_t iomap_fill_dirty_folios(struct inode *inode, struct iomap *iomap,
+		loff_t offset, loff_t length);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
 int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-- 
2.47.0



* [PATCH RFC 3/4] xfs: always trim mapping to requested range for zero range
  2024-11-19 15:46 [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Brian Foster
  2024-11-19 15:46 ` [PATCH RFC 1/4] iomap: allow passing a folio into write begin path Brian Foster
  2024-11-19 15:46 ` [PATCH RFC 2/4] iomap: optional zero range dirty folio processing Brian Foster
@ 2024-11-19 15:46 ` Brian Foster
  2024-11-19 15:46 ` [PATCH RFC 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
  2024-11-20  8:37 ` [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Christoph Hellwig
  4 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2024-11-19 15:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Refactor and tweak the IOMAP_ZERO logic in preparation to support
filling the folio batch for unwritten mappings. Drop the superfluous
imap offset check since the hole case has already been filtered out.
Split the delalloc case handling into a sub-branch, and always
trim the imap to the requested offset/count so it can be more easily
used to bound the range to lookup in pagecache.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iomap.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 86da16f54be9..ed42a28973bb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1029,17 +1029,16 @@ xfs_buffered_write_iomap_begin(
 	 * block.  If it starts beyond the EOF block, convert it to an
 	 * unwritten extent.
 	 */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff <= offset_fsb &&
-	    isnullstartblock(imap.br_startblock)) {
-		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
-
-		if (offset_fsb >= eof_fsb)
-			goto convert_delay;
-		if (end_fsb > eof_fsb) {
-			end_fsb = eof_fsb;
-			xfs_trim_extent(&imap, offset_fsb,
-					end_fsb - offset_fsb);
+	if (flags & IOMAP_ZERO) {
+		if (isnullstartblock(imap.br_startblock)) {
+			xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+
+			if (offset_fsb >= eof_fsb)
+				goto convert_delay;
+			if (end_fsb > eof_fsb)
+				end_fsb = eof_fsb;
 		}
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
 
 	/*
-- 
2.47.0



* [PATCH RFC 4/4] xfs: fill dirty folios on zero range of unwritten mappings
  2024-11-19 15:46 [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Brian Foster
                   ` (2 preceding siblings ...)
  2024-11-19 15:46 ` [PATCH RFC 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
@ 2024-11-19 15:46 ` Brian Foster
  2024-11-20  8:37 ` [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Christoph Hellwig
  4 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2024-11-19 15:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs

Use the iomap folio batch mechanism to identify which folios to zero
on zero range of unwritten mappings. Trim the resulting mapping if
the batch is filled (unlikely) and set the HAS_FOLIOS flag to inform
iomap that pagecache has been checked for dirty folios.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iomap.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ed42a28973bb..d84b7abf540c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1037,6 +1037,15 @@ xfs_buffered_write_iomap_begin(
 				goto convert_delay;
 			if (end_fsb > eof_fsb)
 				end_fsb = eof_fsb;
+		} else if (imap.br_state == XFS_EXT_UNWRITTEN) {
+			u64 end;
+
+			xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
+			end = iomap_fill_dirty_folios(VFS_I(ip), iomap,
+					XFS_FSB_TO_B(mp, imap.br_startoff),
+					XFS_FSB_TO_B(mp, imap.br_blockcount));
+			end_fsb = min_t(xfs_fileoff_t, end_fsb, XFS_B_TO_FSB(mp, end));
+			iomap_flags |= IOMAP_F_HAS_FOLIOS;
 		}
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
-- 
2.47.0



* Re: [PATCH RFC 0/4] iomap: zero range folio batch processing prototype
  2024-11-19 15:46 [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Brian Foster
                   ` (3 preceding siblings ...)
  2024-11-19 15:46 ` [PATCH RFC 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
@ 2024-11-20  8:37 ` Christoph Hellwig
  2024-11-20 14:29   ` Brian Foster
  4 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-20  8:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Tue, Nov 19, 2024 at 10:46:52AM -0500, Brian Foster wrote:
> I thought about using ->private along with a custom ->get_folio(), but I
> don't think that really fits the idea of a built-in mechanism. It might
> be more appropriate to attach to the iter, but that currently isn't
> accessible to ->iomap_begin(). I suppose we could define an
> iomap_to_iter() or some such helper that the fill helper could use to
> populate the batch, but maybe there are other thoughts/ideas?

The iter is the right place, and you can get at it using
container_of as already done by btrfs (and some of my upcoming code):

	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);



* Re: [PATCH RFC 1/4] iomap: allow passing a folio into write begin path
  2024-11-19 15:46 ` [PATCH RFC 1/4] iomap: allow passing a folio into write begin path Brian Foster
@ 2024-11-20  8:38   ` Christoph Hellwig
  2024-11-20 14:29     ` Brian Foster
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-20  8:38 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Tue, Nov 19, 2024 at 10:46:53AM -0500, Brian Foster wrote:
> To facilitate batch processing of dirty folios for zero range, tweak
> the write begin path to allow the caller to optionally pass in its
> own folio/pos combination.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ce73d2a48c1e..d1a86aea1a7a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -781,7 +781,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
>  {
>  	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
>  	const struct iomap *srcmap = iomap_iter_srcmap(iter);
> -	struct folio *folio;
> +	struct folio *folio = *foliop;
>  	int status = 0;
>  
>  	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> @@ -794,9 +794,15 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
>  	if (!mapping_large_folio_support(iter->inode->i_mapping))
>  		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
>  
> -	folio = __iomap_get_folio(iter, pos, len);
> -	if (IS_ERR(folio))
> -		return PTR_ERR(folio);
> +	/*
> +	 * XXX: Might want to plumb batch handling down through here. For now
> +	 * let the caller do it.

Yeah, plumbing in the batch here would be nicer.

I suspect doing batch processing might actually be a neat thing for
the normal write path as well.



* Re: [PATCH RFC 2/4] iomap: optional zero range dirty folio processing
  2024-11-19 15:46 ` [PATCH RFC 2/4] iomap: optional zero range dirty folio processing Brian Foster
@ 2024-11-20  8:43   ` Christoph Hellwig
  2024-11-20 14:34     ` Brian Foster
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-20  8:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Tue, Nov 19, 2024 at 10:46:54AM -0500, Brian Foster wrote:
> +loff_t
> +iomap_fill_dirty_folios(
> +	struct inode		*inode,
> +	struct iomap		*iomap,

If you pass in the batch directly instead of the iomap this is
completely generic and could go into filemap.c.  Adding willy
and linux-mm for these kinds of things also tends to help to
get good review feedback and often improvements.

> +	loff_t			offset,
> +	loff_t			length)
> +{
> +	struct address_space	*mapping = inode->i_mapping;
> +	struct folio_batch	fbatch;
> +	pgoff_t			start, end;
> +	loff_t			end_pos;
> +
> +	folio_batch_init(&fbatch);
> +	folio_batch_init(&iomap->fbatch);
> +
> +	end_pos = offset + length;
> +	start = offset >> PAGE_SHIFT;
> +	end = (end_pos - 1) >> PAGE_SHIFT;

Nit: initializing these at declaration time make the code easier to
read (at least for me :)).

> +
> +	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
> +	       folio_batch_space(&iomap->fbatch)) {
> +		struct folio *folio;
> +		while ((folio = folio_batch_next(&fbatch))) {
> +			if (folio_trylock(folio)) {
> +				bool clean = !folio_test_dirty(folio) &&
> +					     !folio_test_writeback(folio);
> +				folio_unlock(folio);
> +				if (clean)
> +					continue;

What does the lock protect here given that it can become stale as soon
as we unlock?

Note that there also is a filemap_get_folios_tag that only looks up
folios with the right tag (dirty or writeback).  Currently it only
supports a single tag, which would not be helpful here, but this might
be worth talking to the pagecache and xarray maintainer.



* Re: [PATCH RFC 0/4] iomap: zero range folio batch processing prototype
  2024-11-20  8:37 ` [PATCH RFC 0/4] iomap: zero range folio batch processing prototype Christoph Hellwig
@ 2024-11-20 14:29   ` Brian Foster
  2024-11-21  5:50     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2024-11-20 14:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Nov 20, 2024 at 12:37:41AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2024 at 10:46:52AM -0500, Brian Foster wrote:
> > I thought about using ->private along with a custom ->get_folio(), but I
> > don't think that really fits the idea of a built-in mechanism. It might
> > be more appropriate to attach to the iter, but that currently isn't
> > accessible to ->iomap_begin(). I suppose we could define an
> > iomap_to_iter() or some such helper that the fill helper could use to
> > populate the batch, but maybe there are other thoughts/ideas?
> 
> The iter is the right place, and you can get at it using
> container_of as already done by btrfs (and some of my upcoming code):
> 
> 	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
> 
> 

Ok, yeah.. that's pretty much what I meant by having an iomap_to_iter()
helper, I just wasn't aware that some things were already doing that to
poke at the iter. Thanks.

I'm assuming we'd want this to be a dynamic allocation as well, since
folio_batch is fairly large in comparison (256b compared to 208b
iomap_iter).

Brian



* Re: [PATCH RFC 1/4] iomap: allow passing a folio into write begin path
  2024-11-20  8:38   ` Christoph Hellwig
@ 2024-11-20 14:29     ` Brian Foster
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2024-11-20 14:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Nov 20, 2024 at 12:38:41AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2024 at 10:46:53AM -0500, Brian Foster wrote:
> > To facilitate batch processing of dirty folios for zero range, tweak
> > the write begin path to allow the caller to optionally pass in its
> > own folio/pos combination.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/iomap/buffered-io.c | 20 +++++++++++++-------
> >  1 file changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ce73d2a48c1e..d1a86aea1a7a 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -781,7 +781,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> >  {
> >  	const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
> >  	const struct iomap *srcmap = iomap_iter_srcmap(iter);
> > -	struct folio *folio;
> > +	struct folio *folio = *foliop;
> >  	int status = 0;
> >  
> >  	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> > @@ -794,9 +794,15 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> >  	if (!mapping_large_folio_support(iter->inode->i_mapping))
> >  		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
> >  
> > -	folio = __iomap_get_folio(iter, pos, len);
> > -	if (IS_ERR(folio))
> > -		return PTR_ERR(folio);
> > +	/*
> > +	 * XXX: Might want to plumb batch handling down through here. For now
> > +	 * let the caller do it.
> 
> Yeah, plumbing in the batch here would be nicer.
> 

Ok, I'll take a closer look at that. IIRC I punted on it initially
because we'll also have to fix up pos/len down in this path when the
next folio is not contiguous. This was easier to just get the basics
working.

Brian

> I suspect doing batch processing might actually be a neat thing for
> the normal write path as well.
> 



* Re: [PATCH RFC 2/4] iomap: optional zero range dirty folio processing
  2024-11-20  8:43   ` Christoph Hellwig
@ 2024-11-20 14:34     ` Brian Foster
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2024-11-20 14:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Nov 20, 2024 at 12:43:47AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2024 at 10:46:54AM -0500, Brian Foster wrote:
> > +loff_t
> > +iomap_fill_dirty_folios(
> > +	struct inode		*inode,
> > +	struct iomap		*iomap,
> 
> If you pass in the batch directly instead of the iomap this is
> completely generic and could go into filemap.c.  Adding willy
> and linux-mm for these kinds of things also tends to help to
> get good review feedback and often improvements.
> 

That is a good idea, but I'm not sure this will remain generic. One of
the tradeoffs this current patch makes is that for the sub-folio block
size (whether small blocks or large folios), we zero any subrange so
long as the high level folio is dirty.

This causes something like generic/445 to fail on small block sizes
because we explicitly zero a subrange that technically wasn't written
to, but rather some other part of the same folio was, and therefore
SEEK_DATA finds data where a hole was expected based on the test
workload. I was thinking of improving this by using
ifs_find_dirty_range() here to further filter out unwritten subranges
based on the target range being zeroed, but haven't done that yet.

That said, I suspect that still won't be perfect and some spurious
zeroing may still be possible. Personally, I think it might be fine to
leave as is and just fix up the fstests test to be a little less strict
about SEEK_DATA at block granularity. We have to writeback the folio
anyways and so I'm not sure it makes much practical difference. So if
there's preference to try and keep this generic in favor of that
functional tradeoff, I'm certainly fine with going that route. Thoughts?

(Hmm.. thinking more about it, it may also be possible to fix up via a
post-process on the first/last folios in the batch, so maybe this
doesn't technically have to be an either or choice.)

> > +	loff_t			offset,
> > +	loff_t			length)
> > +{
> > +	struct address_space	*mapping = inode->i_mapping;
> > +	struct folio_batch	fbatch;
> > +	pgoff_t			start, end;
> > +	loff_t			end_pos;
> > +
> > +	folio_batch_init(&fbatch);
> > +	folio_batch_init(&iomap->fbatch);
> > +
> > +	end_pos = offset + length;
> > +	start = offset >> PAGE_SHIFT;
> > +	end = (end_pos - 1) >> PAGE_SHIFT;
> 
> Nit: initializing these at declaration time make the code easier to
> read (at least for me :)).
> 

Ok.

> > +
> > +	while (filemap_get_folios(mapping, &start, end, &fbatch) &&
> > +	       folio_batch_space(&iomap->fbatch)) {
> > +		struct folio *folio;
> > +		while ((folio = folio_batch_next(&fbatch))) {
> > +			if (folio_trylock(folio)) {
> > +				bool clean = !folio_test_dirty(folio) &&
> > +					     !folio_test_writeback(folio);
> > +				folio_unlock(folio);
> > +				if (clean)
> > +					continue;
> 
> What does the lock protect here given that it can become stale as soon
> as we unlock?
> 

I thought the lock was needed to correctly test for dirty and writeback,
because dirty is cleared before writeback is set under folio lock. That
said, I suppose this could also just do the folio_test_locked() thing
that filemap_range_has_writeback() does.

Just note that this is possibly affected by how we decide to handle the
sub-folio case above, as I think we'd also need the lock for looking at
the iomap_folio_state.

> Note that there also is a filemap_get_folios_tag that only looks up
> folios with the right tag (dirty or writeback).  Currently it only
> supports a single tag, which wuld not be helpful here, but this might
> be worth talking to the pagecache and xarray maintainer.
> 

Yeah, that was one of the first things I looked into before writing the
fill helper, but atm it didn't look possible to check for an OR
combination of xarray tags. That might be a nice future improvement if
reasonably possible, or if somebody who knows that code better wants to
cook something up faster than I can, but otherwise I don't really want
to make that a hard requirement.

Brian



* Re: [PATCH RFC 0/4] iomap: zero range folio batch processing prototype
  2024-11-20 14:29   ` Brian Foster
@ 2024-11-21  5:50     ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-21  5:50 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Wed, Nov 20, 2024 at 09:29:17AM -0500, Brian Foster wrote:
> helper, I just wasn't aware that some things were already doing that to
> poke at the iter. Thanks.
> 
> I'm assuming we'd want this to be a dynamic allocation as well, since
> folio_batch is fairly large in comparison (256b compared to 208b
> iomap_iter).

Or at least only add a pointer to it in the iter and then point to a
separate stack allocation for it.


