linux-fsdevel.vger.kernel.org archive mirror
* [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance
@ 2023-06-19  2:28 Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

Hello All,

Please find PATCHv10, which adds per-block dirty tracking to iomap.
As discussed earlier, this is required to improve write performance and reduce
write amplification in cases where either the blocksize is smaller than the
pagesize (such as the Power platform with 64k pagesize) or where we have a
large folio (such as xfs, which currently supports large folios).

v9 -> v10:
===========
1. Mostly function renames from iomap_ifs_** to ifs_** (patch-1)
2. Addressed review comments from Darrick & Andreas
3. Added 2 patches suggested by Matthew (patch-4 & patch-5)
4. Defined a new separate function for iomap_write_delalloc_ifs_punch() in
   Patch-8

Note: since v10 mainly has function name changes in existing patches (which
already had Reviewed-by tags), I have kept the Reviewed-by tags from previous
reviewers as is.

Testing of v10:
===============
I have done weekend-long testing of v10 on my setup with xfstests on
x86 (1k & 4k bs), arm (4k bs, 64k ps) and Power (4k bs). I haven't found any
new failures in my testing so far.


v7/v8 -> v9
============
1. Split the renaming & refactoring changes into separate patches.
   (Patch-1 & Patch-2)
2. Addressed review comments from everyone in v9.
3. Fixed a punch-out bug pointed out by Darrick in Patch-6.
4. Included iomap_ifs_calc_range() function suggested by Christoph in Patch-6.

Testing
=========
I have tested v9 on:-
   - Power with 4k blocksize -g auto
   - x86 with 1k and 1k_adv with -g auto
   - arm64 with 4k blocksize and 64k pagesize with 4k quick
   - also tested gfs2 with minimal local config (-O -b 1024 -p lock_nolock)
   - unit tested failed punch-out operation with "-f" support to pwrite in
     xfs_io.
I haven't observed any new testcase failures in any of my testing so far.

Thanks everyone for helping with reviews and suggestions.
Please do let me know if there are any further review comments on this one.

<Perf data copy paste from previous version>
=============================================
Performance testing of the fio workload below reveals a ~16x write
performance improvement on nvme with XFS (4k blocksize) on Power (64k
pagesize). fio-reported write bandwidth improved from ~28 MBps to ~452 MBps.

1. <test_randwrite.fio>
[global]
	ioengine=psync
	rw=randwrite
	overwrite=1
	pre_read=1
	direct=0
	bs=4k
	size=1G
	dir=./
	numjobs=8
	fdatasync=1
	runtime=60
	iodepth=64
	group_reporting=1

[fio-run]

2. Our internal performance team also reported that this series improves
   their database workload performance by around ~83% (with XFS on Power)


Ritesh Harjani (IBM) (8):
  iomap: Rename iomap_page to iomap_folio_state and others
  iomap: Drop ifs argument from iomap_set_range_uptodate()
  iomap: Add some uptodate state handling helpers for ifs state bitmap
  iomap: Fix possible overflow condition in iomap_write_delalloc_scan
  iomap: Use iomap_punch_t typedef
  iomap: Refactor iomap_write_delalloc_punch() function out
  iomap: Allocate ifs in ->write_begin() early
  iomap: Add per-block dirty state tracking to improve performance

 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 418 +++++++++++++++++++++++++++++------------
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/file.c       |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 298 insertions(+), 127 deletions(-)

--
2.40.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate() Ritesh Harjani (IBM)
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Christoph Hellwig

struct iomap_page actually tracks the per-block state of a folio.
Hence it makes sense to rename some of these function names and data
structures, e.g.:
1. struct iomap_page (iop) -> struct iomap_folio_state (ifs)
2. iomap_page_create() -> ifs_alloc()
3. iomap_page_release() -> ifs_free()
4. iomap_iop_set_range_uptodate() -> ifs_set_range_uptodate()
5. to_iomap_page() -> folio->private

Since later patches will also add per-block dirty state tracking to
iomap_folio_state, this patch also renames the "uptodate" & "uptodate_lock"
members of iomap_folio_state to "state" and "state_lock".

We don't really need the to_iomap_page() function; instead, directly open
code it as folio->private.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 151 ++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 79 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f4..2675a3e0ac1d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -24,64 +24,57 @@
 #define IOEND_BATCH_SIZE	4096
 
 /*
- * Structure allocated for each folio when block size < folio size
- * to track sub-folio uptodate status and I/O completions.
+ * Structure allocated for each folio to track per-block uptodate state
+ * and I/O completions.
  */
-struct iomap_page {
+struct iomap_folio_state {
 	atomic_t		read_bytes_pending;
 	atomic_t		write_bytes_pending;
-	spinlock_t		uptodate_lock;
-	unsigned long		uptodate[];
+	spinlock_t		state_lock;
+	unsigned long		state[];
 };
 
-static inline struct iomap_page *to_iomap_page(struct folio *folio)
-{
-	if (folio_test_private(folio))
-		return folio_get_private(folio);
-	return NULL;
-}
-
 static struct bio_set iomap_ioend_bioset;
 
-static struct iomap_page *
-iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
+static struct iomap_folio_state *ifs_alloc(struct inode *inode,
+		struct folio *folio, unsigned int flags)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
 	gfp_t gfp;
 
-	if (iop || nr_blocks <= 1)
-		return iop;
+	if (ifs || nr_blocks <= 1)
+		return ifs;
 
 	if (flags & IOMAP_NOWAIT)
 		gfp = GFP_NOWAIT;
 	else
 		gfp = GFP_NOFS | __GFP_NOFAIL;
 
-	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+	ifs = kzalloc(struct_size(ifs, state, BITS_TO_LONGS(nr_blocks)),
 		      gfp);
-	if (iop) {
-		spin_lock_init(&iop->uptodate_lock);
+	if (ifs) {
+		spin_lock_init(&ifs->state_lock);
 		if (folio_test_uptodate(folio))
-			bitmap_fill(iop->uptodate, nr_blocks);
-		folio_attach_private(folio, iop);
+			bitmap_fill(ifs->state, nr_blocks);
+		folio_attach_private(folio, ifs);
 	}
-	return iop;
+	return ifs;
 }
 
-static void iomap_page_release(struct folio *folio)
+static void ifs_free(struct folio *folio)
 {
-	struct iomap_page *iop = folio_detach_private(folio);
+	struct iomap_folio_state *ifs = folio_detach_private(folio);
 	struct inode *inode = folio->mapping->host;
 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
 
-	if (!iop)
+	if (!ifs)
 		return;
-	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
-	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
-	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+	WARN_ON_ONCE(atomic_read(&ifs->read_bytes_pending));
+	WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending));
+	WARN_ON_ONCE(bitmap_full(ifs->state, nr_blocks) !=
 			folio_test_uptodate(folio));
-	kfree(iop);
+	kfree(ifs);
 }
 
 /*
@@ -90,7 +83,7 @@ static void iomap_page_release(struct folio *folio)
 static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		loff_t *pos, loff_t length, size_t *offp, size_t *lenp)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
@@ -105,12 +98,12 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * per-block uptodate status and adjust the offset and length if needed
 	 * to avoid reading in already uptodate ranges.
 	 */
-	if (iop) {
+	if (ifs) {
 		unsigned int i;
 
 		/* move forward for each leading block marked uptodate */
 		for (i = first; i <= last; i++) {
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, ifs->state))
 				break;
 			*pos += block_size;
 			poff += block_size;
@@ -120,7 +113,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* truncate len if we find any trailing uptodate block(s) */
 		for ( ; i <= last; i++) {
-			if (test_bit(i, iop->uptodate)) {
+			if (test_bit(i, ifs->state)) {
 				plen -= (last - i + 1) * block_size;
 				last = i - 1;
 				break;
@@ -144,26 +137,26 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	*lenp = plen;
 }
 
-static void iomap_iop_set_range_uptodate(struct folio *folio,
-		struct iomap_page *iop, size_t off, size_t len)
+static void ifs_set_range_uptodate(struct folio *folio,
+		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
 	unsigned long flags;
 
-	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	bitmap_set(iop->uptodate, first, last - first + 1);
-	if (bitmap_full(iop->uptodate, i_blocks_per_folio(inode, folio)))
+	spin_lock_irqsave(&ifs->state_lock, flags);
+	bitmap_set(ifs->state, first, last - first + 1);
+	if (bitmap_full(ifs->state, i_blocks_per_folio(inode, folio)))
 		folio_mark_uptodate(folio);
-	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
+	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
 
 static void iomap_set_range_uptodate(struct folio *folio,
-		struct iomap_page *iop, size_t off, size_t len)
+		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
-	if (iop)
-		iomap_iop_set_range_uptodate(folio, iop, off, len);
+	if (ifs)
+		ifs_set_range_uptodate(folio, ifs, off, len);
 	else
 		folio_mark_uptodate(folio);
 }
@@ -171,16 +164,16 @@ static void iomap_set_range_uptodate(struct folio *folio,
 static void iomap_finish_folio_read(struct folio *folio, size_t offset,
 		size_t len, int error)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 
 	if (unlikely(error)) {
 		folio_clear_uptodate(folio);
 		folio_set_error(folio);
 	} else {
-		iomap_set_range_uptodate(folio, iop, offset, len);
+		iomap_set_range_uptodate(folio, ifs, offset, len);
 	}
 
-	if (!iop || atomic_sub_and_test(len, &iop->read_bytes_pending))
+	if (!ifs || atomic_sub_and_test(len, &ifs->read_bytes_pending))
 		folio_unlock(folio);
 }
 
@@ -213,7 +206,7 @@ struct iomap_readpage_ctx {
 static int iomap_read_inline_data(const struct iomap_iter *iter,
 		struct folio *folio)
 {
-	struct iomap_page *iop;
+	struct iomap_folio_state *ifs;
 	const struct iomap *iomap = iomap_iter_srcmap(iter);
 	size_t size = i_size_read(iter->inode) - iomap->offset;
 	size_t poff = offset_in_page(iomap->offset);
@@ -231,15 +224,15 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
 	if (WARN_ON_ONCE(size > iomap->length))
 		return -EIO;
 	if (offset > 0)
-		iop = iomap_page_create(iter->inode, folio, iter->flags);
+		ifs = ifs_alloc(iter->inode, folio, iter->flags);
 	else
-		iop = to_iomap_page(folio);
+		ifs = folio->private;
 
 	addr = kmap_local_folio(folio, offset);
 	memcpy(addr, iomap->inline_data, size);
 	memset(addr + size, 0, PAGE_SIZE - poff - size);
 	kunmap_local(addr);
-	iomap_set_range_uptodate(folio, iop, offset, PAGE_SIZE - poff);
+	iomap_set_range_uptodate(folio, ifs, offset, PAGE_SIZE - poff);
 	return 0;
 }
 
@@ -260,7 +253,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	loff_t pos = iter->pos + offset;
 	loff_t length = iomap_length(iter) - offset;
 	struct folio *folio = ctx->cur_folio;
-	struct iomap_page *iop;
+	struct iomap_folio_state *ifs;
 	loff_t orig_pos = pos;
 	size_t poff, plen;
 	sector_t sector;
@@ -269,20 +262,20 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		return iomap_read_inline_data(iter, folio);
 
 	/* zero post-eof blocks as the page may be mapped */
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
+	ifs = ifs_alloc(iter->inode, folio, iter->flags);
 	iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
-		iomap_set_range_uptodate(folio, iop, poff, plen);
+		iomap_set_range_uptodate(folio, ifs, poff, plen);
 		goto done;
 	}
 
 	ctx->cur_folio_in_bio = true;
-	if (iop)
-		atomic_add(plen, &iop->read_bytes_pending);
+	if (ifs)
+		atomic_add(plen, &ifs->read_bytes_pending);
 
 	sector = iomap_sector(iomap, pos);
 	if (!ctx->bio ||
@@ -436,11 +429,11 @@ EXPORT_SYMBOL_GPL(iomap_readahead);
  */
 bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
 	unsigned first, last, i;
 
-	if (!iop)
+	if (!ifs)
 		return false;
 
 	/* Caller's range may extend past the end of this folio */
@@ -451,7 +444,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	last = (from + count - 1) >> inode->i_blkbits;
 
 	for (i = first; i <= last; i++)
-		if (!test_bit(i, iop->uptodate))
+		if (!test_bit(i, ifs->state))
 			return false;
 	return true;
 }
@@ -490,7 +483,7 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
 	 */
 	if (folio_test_dirty(folio) || folio_test_writeback(folio))
 		return false;
-	iomap_page_release(folio);
+	ifs_free(folio);
 	return true;
 }
 EXPORT_SYMBOL_GPL(iomap_release_folio);
@@ -507,12 +500,12 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 	if (offset == 0 && len == folio_size(folio)) {
 		WARN_ON_ONCE(folio_test_writeback(folio));
 		folio_cancel_dirty(folio);
-		iomap_page_release(folio);
+		ifs_free(folio);
 	} else if (folio_test_large(folio)) {
-		/* Must release the iop so the page can be split */
+		/* Must release the ifs so the page can be split */
 		WARN_ON_ONCE(!folio_test_uptodate(folio) &&
 			     folio_test_dirty(folio));
-		iomap_page_release(folio);
+		ifs_free(folio);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
@@ -547,7 +540,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		size_t len, struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	struct iomap_page *iop;
+	struct iomap_folio_state *ifs;
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
@@ -559,8 +552,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		return 0;
 	folio_clear_error(folio);
 
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
-	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
+	ifs = ifs_alloc(iter->inode, folio, iter->flags);
+	if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
 		return -EAGAIN;
 
 	do {
@@ -589,7 +582,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 			if (status)
 				return status;
 		}
-		iomap_set_range_uptodate(folio, iop, poff, plen);
+		iomap_set_range_uptodate(folio, ifs, poff, plen);
 	} while ((block_start += plen) < block_end);
 
 	return 0;
@@ -696,7 +689,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 	flush_dcache_folio(folio);
 
 	/*
@@ -712,7 +705,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 */
 	if (unlikely(copied < len && !folio_test_uptodate(folio)))
 		return 0;
-	iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len);
+	iomap_set_range_uptodate(folio, ifs, offset_in_folio(folio, pos), len);
 	filemap_dirty_folio(inode->i_mapping, folio);
 	return copied;
 }
@@ -1290,17 +1283,17 @@ EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len, int error)
 {
-	struct iomap_page *iop = to_iomap_page(folio);
+	struct iomap_folio_state *ifs = folio->private;
 
 	if (error) {
 		folio_set_error(folio);
 		mapping_set_error(inode->i_mapping, error);
 	}
 
-	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !iop);
-	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) <= 0);
+	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
+	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
 
-	if (!iop || atomic_sub_and_test(len, &iop->write_bytes_pending))
+	if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
 		folio_end_writeback(folio);
 }
 
@@ -1567,7 +1560,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
  */
 static void
 iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
-		struct iomap_page *iop, struct iomap_writepage_ctx *wpc,
+		struct iomap_folio_state *ifs, struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct list_head *iolist)
 {
 	sector_t sector = iomap_sector(&wpc->iomap, pos);
@@ -1585,8 +1578,8 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
 		bio_add_folio(wpc->ioend->io_bio, folio, len, poff);
 	}
 
-	if (iop)
-		atomic_add(len, &iop->write_bytes_pending);
+	if (ifs)
+		atomic_add(len, &ifs->write_bytes_pending);
 	wpc->ioend->io_size += len;
 	wbc_account_cgroup_owner(wbc, &folio->page, len);
 }
@@ -1612,7 +1605,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct folio *folio, u64 end_pos)
 {
-	struct iomap_page *iop = iomap_page_create(inode, folio, 0);
+	struct iomap_folio_state *ifs = ifs_alloc(inode, folio, 0);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
@@ -1620,7 +1613,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
+	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
 
 	/*
 	 * Walk through the folio to find areas to write back. If we
@@ -1628,7 +1621,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * invalid, grab a new one.
 	 */
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (iop && !test_bit(i, iop->uptodate))
+		if (ifs && !test_bit(i, ifs->state))
 			continue;
 
 		error = wpc->ops->map_blocks(wpc, inode, pos);
@@ -1639,7 +1632,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, pos, folio, iop, wpc, wbc,
+		iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
 				 &submit_list);
 		count++;
 	}
-- 
2.40.1



* [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate()
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap Ritesh Harjani (IBM)
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Christoph Hellwig

iomap_folio_state (ifs) can be derived directly from the folio, making it
unnecessary to pass "ifs" as an argument to iomap_set_range_uptodate().
This patch eliminates the "ifs" argument from the
iomap_set_range_uptodate() function.

Also, the definitions of the iomap_set_range_uptodate() and
ifs_set_range_uptodate() functions are moved above ifs_alloc().
Upcoming patches will introduce additional helper routines for handling
dirty state, with the intention of consolidating all "ifs" state handling
routines in one place.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 67 +++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2675a3e0ac1d..3ff7688b360a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -36,6 +36,33 @@ struct iomap_folio_state {
 
 static struct bio_set iomap_ioend_bioset;
 
+static void ifs_set_range_uptodate(struct folio *folio,
+		struct iomap_folio_state *ifs, size_t off, size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int first_blk = off >> inode->i_blkbits;
+	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ifs->state_lock, flags);
+	bitmap_set(ifs->state, first_blk, nr_blks);
+	if (bitmap_full(ifs->state, i_blocks_per_folio(inode, folio)))
+		folio_mark_uptodate(folio);
+	spin_unlock_irqrestore(&ifs->state_lock, flags);
+}
+
+static void iomap_set_range_uptodate(struct folio *folio, size_t off,
+		size_t len)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	if (ifs)
+		ifs_set_range_uptodate(folio, ifs, off, len);
+	else
+		folio_mark_uptodate(folio);
+}
+
 static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 		struct folio *folio, unsigned int flags)
 {
@@ -137,30 +164,6 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	*lenp = plen;
 }
 
-static void ifs_set_range_uptodate(struct folio *folio,
-		struct iomap_folio_state *ifs, size_t off, size_t len)
-{
-	struct inode *inode = folio->mapping->host;
-	unsigned first = off >> inode->i_blkbits;
-	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ifs->state_lock, flags);
-	bitmap_set(ifs->state, first, last - first + 1);
-	if (bitmap_full(ifs->state, i_blocks_per_folio(inode, folio)))
-		folio_mark_uptodate(folio);
-	spin_unlock_irqrestore(&ifs->state_lock, flags);
-}
-
-static void iomap_set_range_uptodate(struct folio *folio,
-		struct iomap_folio_state *ifs, size_t off, size_t len)
-{
-	if (ifs)
-		ifs_set_range_uptodate(folio, ifs, off, len);
-	else
-		folio_mark_uptodate(folio);
-}
-
 static void iomap_finish_folio_read(struct folio *folio, size_t offset,
 		size_t len, int error)
 {
@@ -170,7 +173,7 @@ static void iomap_finish_folio_read(struct folio *folio, size_t offset,
 		folio_clear_uptodate(folio);
 		folio_set_error(folio);
 	} else {
-		iomap_set_range_uptodate(folio, ifs, offset, len);
+		iomap_set_range_uptodate(folio, offset, len);
 	}
 
 	if (!ifs || atomic_sub_and_test(len, &ifs->read_bytes_pending))
@@ -206,7 +209,6 @@ struct iomap_readpage_ctx {
 static int iomap_read_inline_data(const struct iomap_iter *iter,
 		struct folio *folio)
 {
-	struct iomap_folio_state *ifs;
 	const struct iomap *iomap = iomap_iter_srcmap(iter);
 	size_t size = i_size_read(iter->inode) - iomap->offset;
 	size_t poff = offset_in_page(iomap->offset);
@@ -224,15 +226,13 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
 	if (WARN_ON_ONCE(size > iomap->length))
 		return -EIO;
 	if (offset > 0)
-		ifs = ifs_alloc(iter->inode, folio, iter->flags);
-	else
-		ifs = folio->private;
+		ifs_alloc(iter->inode, folio, iter->flags);
 
 	addr = kmap_local_folio(folio, offset);
 	memcpy(addr, iomap->inline_data, size);
 	memset(addr + size, 0, PAGE_SIZE - poff - size);
 	kunmap_local(addr);
-	iomap_set_range_uptodate(folio, ifs, offset, PAGE_SIZE - poff);
+	iomap_set_range_uptodate(folio, offset, PAGE_SIZE - poff);
 	return 0;
 }
 
@@ -269,7 +269,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
-		iomap_set_range_uptodate(folio, ifs, poff, plen);
+		iomap_set_range_uptodate(folio, poff, plen);
 		goto done;
 	}
 
@@ -582,7 +582,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 			if (status)
 				return status;
 		}
-		iomap_set_range_uptodate(folio, ifs, poff, plen);
+		iomap_set_range_uptodate(folio, poff, plen);
 	} while ((block_start += plen) < block_end);
 
 	return 0;
@@ -689,7 +689,6 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
-	struct iomap_folio_state *ifs = folio->private;
 	flush_dcache_folio(folio);
 
 	/*
@@ -705,7 +704,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 */
 	if (unlikely(copied < len && !folio_test_uptodate(folio)))
 		return 0;
-	iomap_set_range_uptodate(folio, ifs, offset_in_folio(folio, pos), len);
+	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
 	filemap_dirty_folio(inode->i_mapping, folio);
 	return copied;
 }
-- 
2.40.1



* [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate() Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan Ritesh Harjani (IBM)
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Christoph Hellwig

This patch adds two helper routines, ifs_is_fully_uptodate() and
ifs_block_is_uptodate(), for managing the uptodate state of the "ifs" state
bitmap.

In later patches the ifs state bitmap array will also track the dirty state
of all blocks of a folio. Hence this patch adds these helper routines for
handling the uptodate state of the ifs state bitmap.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3ff7688b360a..790b8413b44c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -36,6 +36,20 @@ struct iomap_folio_state {
 
 static struct bio_set iomap_ioend_bioset;
 
+static inline bool ifs_is_fully_uptodate(struct folio *folio,
+					       struct iomap_folio_state *ifs)
+{
+	struct inode *inode = folio->mapping->host;
+
+	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
+}
+
+static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
+					       unsigned int block)
+{
+	return test_bit(block, ifs->state);
+}
+
 static void ifs_set_range_uptodate(struct folio *folio,
 		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
@@ -47,7 +61,7 @@ static void ifs_set_range_uptodate(struct folio *folio,
 
 	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_set(ifs->state, first_blk, nr_blks);
-	if (bitmap_full(ifs->state, i_blocks_per_folio(inode, folio)))
+	if (ifs_is_fully_uptodate(folio, ifs))
 		folio_mark_uptodate(folio);
 	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
@@ -92,14 +106,12 @@ static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 static void ifs_free(struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio_detach_private(folio);
-	struct inode *inode = folio->mapping->host;
-	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
 
 	if (!ifs)
 		return;
 	WARN_ON_ONCE(atomic_read(&ifs->read_bytes_pending));
 	WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending));
-	WARN_ON_ONCE(bitmap_full(ifs->state, nr_blocks) !=
+	WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
 			folio_test_uptodate(folio));
 	kfree(ifs);
 }
@@ -130,7 +142,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* move forward for each leading block marked uptodate */
 		for (i = first; i <= last; i++) {
-			if (!test_bit(i, ifs->state))
+			if (!ifs_block_is_uptodate(ifs, i))
 				break;
 			*pos += block_size;
 			poff += block_size;
@@ -140,7 +152,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* truncate len if we find any trailing uptodate block(s) */
 		for ( ; i <= last; i++) {
-			if (test_bit(i, ifs->state)) {
+			if (ifs_block_is_uptodate(ifs, i)) {
 				plen -= (last - i + 1) * block_size;
 				last = i - 1;
 				break;
@@ -444,7 +456,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	last = (from + count - 1) >> inode->i_blkbits;
 
 	for (i = first; i <= last; i++)
-		if (!test_bit(i, ifs->state))
+		if (!ifs_block_is_uptodate(ifs, i))
 			return false;
 	return true;
 }
@@ -1620,7 +1632,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * invalid, grab a new one.
 	 */
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (ifs && !test_bit(i, ifs->state))
+		if (ifs && !ifs_block_is_uptodate(ifs, i))
 			continue;
 
 		error = wpc->ops->map_blocks(wpc, inode, pos);
-- 
2.40.1



* [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
                   ` (2 preceding siblings ...)
  2023-06-19  2:28 ` [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 5/8] iomap: Use iomap_punch_t typedef Ritesh Harjani (IBM)
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

folio_next_index() returns an unsigned long, so left shifting it by
PAGE_SHIFT can overflow before the result is widened to loff_t. Use
folio_pos(folio) + folio_size(folio) instead, which performs the
calculation in loff_t.
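The truncation is easiest to see with 32-bit arithmetic (unsigned long is 32 bits wide on 32-bit kernels, while loff_t is always 64 bits). Below is a standalone sketch, using uint32_t to stand in for the 32-bit unsigned long and hypothetical helper names, not the kernel functions themselves:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4K pages */

/* Buggy form: the shift is performed in 32-bit arithmetic, so the high
 * bits are lost before the implicit widening to a 64-bit byte offset. */
static int64_t next_pos_shifted(uint32_t next_index)
{
	return (int64_t)(next_index << PAGE_SHIFT);
}

/* Fixed form, in the style of folio_pos() + folio_size(): widen the
 * index to 64 bits first, then do the byte arithmetic. */
static int64_t next_pos_widened(uint32_t index, uint32_t nr_pages)
{
	return ((int64_t)index << PAGE_SHIFT) +
	       ((int64_t)nr_pages << PAGE_SHIFT);
}
```

For a folio just past the 4GiB boundary, the shifted form wraps to a small offset while the widened form yields the correct byte position.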

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 790b8413b44c..8206f3628586 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -933,7 +933,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
 			 * the end of this data range, not the end of the folio.
 			 */
 			*punch_start_byte = min_t(loff_t, end_byte,
-					folio_next_index(folio) << PAGE_SHIFT);
+					folio_pos(folio) + folio_size(folio));
 		}
 
 		/* move offset to start of next folio in range */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv10 5/8] iomap: Use iomap_punch_t typedef
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
                   ` (3 preceding siblings ...)
  2023-06-19  2:28 ` [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out Ritesh Harjani (IBM)
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

Add an iomap_punch_t typedef for the "punch" function pointer used by
all the delalloc-related punch, scan and release functions. This
shortens their signatures and will be useful in a later patch that
factors out the iomap_write_delalloc_punch() function.
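A minimal sketch of the cleanup, with a stand-in callback rather than a real filesystem punch implementation:

```c
#include <assert.h>

/* A named function-pointer type replaces the long inline declaration
 * that was repeated in every signature taking a "punch" callback. */
typedef int (*punch_fn_t)(long long offset, long long length);

static long long punched_bytes;

/* Stand-in callback: just records how many bytes were punched. */
static int record_punch(long long offset, long long length)
{
	(void)offset;
	punched_bytes += length;
	return 0;
}

/* Before the typedef, this parameter would be spelled
 * "int (*punch)(long long offset, long long length)" here and in
 * every other prototype that passes the callback along. */
static int scan_range(long long start, long long end, punch_fn_t punch)
{
	return punch(start, end - start);
}
```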

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8206f3628586..e03ffdc259c4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -23,6 +23,7 @@
 
 #define IOEND_BATCH_SIZE	4096
 
+typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
 /*
  * Structure allocated for each folio to track per-block uptodate state
  * and I/O completions.
@@ -900,7 +901,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
  */
 static int iomap_write_delalloc_scan(struct inode *inode,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
-		int (*punch)(struct inode *inode, loff_t offset, loff_t length))
+		iomap_punch_t punch)
 {
 	while (start_byte < end_byte) {
 		struct folio	*folio;
@@ -978,8 +979,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
  * the code to subtle off-by-one bugs....
  */
 static int iomap_write_delalloc_release(struct inode *inode,
-		loff_t start_byte, loff_t end_byte,
-		int (*punch)(struct inode *inode, loff_t pos, loff_t length))
+		loff_t start_byte, loff_t end_byte, iomap_punch_t punch)
 {
 	loff_t punch_start_byte = start_byte;
 	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
@@ -1072,8 +1072,7 @@ static int iomap_write_delalloc_release(struct inode *inode,
  */
 int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 		struct iomap *iomap, loff_t pos, loff_t length,
-		ssize_t written,
-		int (*punch)(struct inode *inode, loff_t pos, loff_t length))
+		ssize_t written, iomap_punch_t punch)
 {
 	loff_t			start_byte;
 	loff_t			end_byte;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
                   ` (4 preceding siblings ...)
  2023-06-19  2:28 ` [PATCHv10 5/8] iomap: Use iomap_punch_t typedef Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Christoph Hellwig

Factor the actual punch-out operation out into a new helper,
iomap_write_delalloc_punch(). This avoids deep indentation when a later
patch (which adds per-block dirty state handling to iomap) starts
punching out individual non-dirty blocks within a dirty folio, to avoid
leaking delalloc blocks.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 53 ++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e03ffdc259c4..2d79061022d8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -882,6 +882,32 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
+static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
+		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
+		iomap_punch_t punch)
+{
+	int ret = 0;
+
+	if (!folio_test_dirty(folio))
+		return ret;
+
+	/* if dirty, punch up to offset */
+	if (start_byte > *punch_start_byte) {
+		ret = punch(inode, *punch_start_byte,
+				start_byte - *punch_start_byte);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * Make sure the next punch start is correctly bound to
+	 * the end of this data range, not the end of the folio.
+	 */
+	*punch_start_byte = min_t(loff_t, end_byte,
+				folio_pos(folio) + folio_size(folio));
+
+	return ret;
+}
+
 /*
  * Scan the data range passed to us for dirty page cache folios. If we find a
  * dirty folio, punch out the preceeding range and update the offset from which
@@ -905,6 +931,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
 {
 	while (start_byte < end_byte) {
 		struct folio	*folio;
+		int ret;
 
 		/* grab locked page */
 		folio = filemap_lock_folio(inode->i_mapping,
@@ -915,26 +942,12 @@ static int iomap_write_delalloc_scan(struct inode *inode,
 			continue;
 		}
 
-		/* if dirty, punch up to offset */
-		if (folio_test_dirty(folio)) {
-			if (start_byte > *punch_start_byte) {
-				int	error;
-
-				error = punch(inode, *punch_start_byte,
-						start_byte - *punch_start_byte);
-				if (error) {
-					folio_unlock(folio);
-					folio_put(folio);
-					return error;
-				}
-			}
-
-			/*
-			 * Make sure the next punch start is correctly bound to
-			 * the end of this data range, not the end of the folio.
-			 */
-			*punch_start_byte = min_t(loff_t, end_byte,
-					folio_pos(folio) + folio_size(folio));
+		ret = iomap_write_delalloc_punch(inode, folio, punch_start_byte,
+						 start_byte, end_byte, punch);
+		if (ret) {
+			folio_unlock(folio);
+			folio_put(folio);
+			return ret;
 		}
 
 		/* move offset to start of next folio in range */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
                   ` (5 preceding siblings ...)
  2023-06-19  2:28 ` [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19  2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
  7 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Christoph Hellwig

We don't need to allocate an ifs in ->write_begin() for writes where the
position and length completely overlap the given folio, so such cases
are skipped.

Currently, when the folio is uptodate, we only allocate an ifs at
writeback time (in iomap_writepage_map()). That is fine today, but once
support for a per-block dirty state bitmap is added to the ifs, it
would cause a performance regression: if we don't allocate the ifs
during ->write_begin(), we can never mark the necessary dirty bits in
the ->write_end() call, and would instead have to mark all the bits
dirty at writeback time, reintroducing the same write amplification and
performance problems we have now.
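The skip condition added in __iomap_write_begin() can be sketched as a standalone predicate (folio_pos/folio_size are plain parameters here, not the kernel helpers):

```c
#include <assert.h>
#include <stdbool.h>

/* A write fully covers the folio when [pos, pos + len) spans
 * [folio_pos, folio_pos + folio_size): every block will be dirtied,
 * so no per-block state tracking structure needs to be attached. */
static bool write_covers_folio(long long pos, unsigned long len,
			       long long folio_pos,
			       unsigned long folio_size)
{
	return pos <= folio_pos &&
	       pos + (long long)len >= folio_pos + (long long)folio_size;
}
```

For a 64K folio at offset 0, a 64K write starting at 0 covers it, while a 4K write at offset 4096 does not.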

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2d79061022d8..391d918ddd22 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -561,14 +561,23 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;
 
-	if (folio_test_uptodate(folio))
+	/*
+	 * If the write completely overlaps the current folio, then
+	 * entire folio will be dirtied so there is no need for
+	 * per-block state tracking structures to be attached to this folio.
+	 */
+	if (pos <= folio_pos(folio) &&
+	    pos + len >= folio_pos(folio) + folio_size(folio))
 		return 0;
-	folio_clear_error(folio);
 
 	ifs = ifs_alloc(iter->inode, folio, iter->flags);
 	if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
 		return -EAGAIN;
 
+	if (folio_test_uptodate(folio))
+		return 0;
+	folio_clear_error(folio);
+
 	do {
 		iomap_adjust_read_range(iter->inode, folio, &block_start,
 				block_end - block_start, &poff, &plen);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
                   ` (6 preceding siblings ...)
  2023-06-19  2:28 ` [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early Ritesh Harjani (IBM)
@ 2023-06-19  2:28 ` Ritesh Harjani (IBM)
  2023-06-19 12:43   ` Andreas Gruenbacher
  2023-06-19 14:26   ` Matthew Wilcox
  7 siblings, 2 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-06-19  2:28 UTC (permalink / raw)
  To: linux-xfs
  Cc: linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Matthew Wilcox, Christoph Hellwig, Brian Foster, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM), Aravinda Herle

When the filesystem blocksize is less than the folio size (either with
mapping_large_folio_support() or with blocksize < pagesize) and the
folio is uptodate in the pagecache, then even a single-byte write can
cause the entire folio to be written to disk during writeback. This
happens because we currently don't have a mechanism to track per-block
dirty state within struct iomap_folio_state; we only track uptodate
state.

This patch implements support for tracking per-block dirty state in
iomap_folio_state->state bitmap. This should help improve the filesystem
write performance and help reduce write amplification.

Performance testing of the fio workload below reveals a ~16x performance
improvement using NVMe with XFS (4k blocksize) on Power (64K pagesize):
the fio reported write bandwidth improved from around ~28 MBps to
~452 MBps.

1. <test_randwrite.fio>
[global]
	ioengine=psync
	rw=randwrite
	overwrite=1
	pre_read=1
	direct=0
	bs=4k
	size=1G
	dir=./
	numjobs=8
	fdatasync=1
	runtime=60
	iodepth=64
	group_reporting=1

[fio-run]

2. Also our internal performance team reported that this patch improves
   their database workload performance by around ~83% (with XFS on Power)
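The layout of the extended state bitmap can be sketched as follows (a simplified model of the bit indexing done by ifs_calc_range() in this patch, not the kernel code itself):

```c
#include <assert.h>

/* Two consecutive regions in one bitmap: bits [0, blks_per_folio)
 * hold per-block uptodate state, and bits
 * [blks_per_folio, 2 * blks_per_folio) hold per-block dirty state. */
enum block_state { ST_UPTODATE, ST_DIRTY };

static unsigned int state_bit(unsigned int block, enum block_state state,
			      unsigned int blks_per_folio)
{
	return block + state * blks_per_folio;
}
```

For a 64K folio with 4K blocks (16 blocks per folio), block 3's uptodate bit is bit 3 and its dirty bit is bit 19.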

Reported-by: Aravinda Herle <araherle@in.ibm.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/file.c       |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 171 insertions(+), 25 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a5f4be6b9213..75efec3c3b71 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
 	.writepages = gfs2_writepages,
 	.read_folio = gfs2_read_folio,
 	.readahead = gfs2_readahead,
-	.dirty_folio = filemap_dirty_folio,
+	.dirty_folio = iomap_dirty_folio,
 	.release_folio = iomap_release_folio,
 	.invalidate_folio = iomap_invalidate_folio,
 	.bmap = gfs2_bmap,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 391d918ddd22..50f5840bb5f9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -25,7 +25,7 @@
 
 typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
 /*
- * Structure allocated for each folio to track per-block uptodate state
+ * Structure allocated for each folio to track per-block uptodate, dirty state
  * and I/O completions.
  */
 struct iomap_folio_state {
@@ -35,31 +35,55 @@ struct iomap_folio_state {
 	unsigned long		state[];
 };
 
+enum iomap_block_state {
+	IOMAP_ST_UPTODATE,
+	IOMAP_ST_DIRTY,
+
+	IOMAP_ST_MAX,
+};
+
+static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
+		enum iomap_block_state state, unsigned int *first_blkp,
+		unsigned int *nr_blksp)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+	unsigned int first = off >> inode->i_blkbits;
+	unsigned int last = (off + len - 1) >> inode->i_blkbits;
+
+	*first_blkp = first + (state * blks_per_folio);
+	*nr_blksp = last - first + 1;
+}
+
 static struct bio_set iomap_ioend_bioset;
 
 static inline bool ifs_is_fully_uptodate(struct folio *folio,
 					       struct iomap_folio_state *ifs)
 {
 	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+	unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
 
-	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
+	return bitmap_full(ifs->state, nr_blks);
 }
 
-static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
-					       unsigned int block)
+static inline bool ifs_block_is_uptodate(struct folio *folio,
+		struct iomap_folio_state *ifs, unsigned int block)
 {
-	return test_bit(block, ifs->state);
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+	return test_bit(block + IOMAP_ST_UPTODATE * blks_per_folio, ifs->state);
 }
 
 static void ifs_set_range_uptodate(struct folio *folio,
 		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
-	struct inode *inode = folio->mapping->host;
-	unsigned int first_blk = off >> inode->i_blkbits;
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk, nr_blks;
 	unsigned long flags;
 
+	ifs_calc_range(folio, off, len, IOMAP_ST_UPTODATE, &first_blk,
+			     &nr_blks);
 	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_set(ifs->state, first_blk, nr_blks);
 	if (ifs_is_fully_uptodate(folio, ifs))
@@ -78,6 +102,55 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		folio_mark_uptodate(folio);
 }
 
+static inline bool ifs_block_is_dirty(struct folio *folio,
+		struct iomap_folio_state *ifs, int block)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+	return test_bit(block + IOMAP_ST_DIRTY * blks_per_folio, ifs->state);
+}
+
+static void ifs_clear_range_dirty(struct folio *folio,
+		struct iomap_folio_state *ifs, size_t off, size_t len)
+{
+	unsigned int first_blk, nr_blks;
+	unsigned long flags;
+
+	ifs_calc_range(folio, off, len, IOMAP_ST_DIRTY, &first_blk, &nr_blks);
+	spin_lock_irqsave(&ifs->state_lock, flags);
+	bitmap_clear(ifs->state, first_blk, nr_blks);
+	spin_unlock_irqrestore(&ifs->state_lock, flags);
+}
+
+static void iomap_clear_range_dirty(struct folio *folio, size_t off, size_t len)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	if (ifs)
+		ifs_clear_range_dirty(folio, ifs, off, len);
+}
+
+static void ifs_set_range_dirty(struct folio *folio,
+		struct iomap_folio_state *ifs, size_t off, size_t len)
+{
+	unsigned int first_blk, nr_blks;
+	unsigned long flags;
+
+	ifs_calc_range(folio, off, len, IOMAP_ST_DIRTY, &first_blk, &nr_blks);
+	spin_lock_irqsave(&ifs->state_lock, flags);
+	bitmap_set(ifs->state, first_blk, nr_blks);
+	spin_unlock_irqrestore(&ifs->state_lock, flags);
+}
+
+static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	if (ifs)
+		ifs_set_range_dirty(folio, ifs, off, len);
+}
+
 static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 		struct folio *folio, unsigned int flags)
 {
@@ -93,14 +166,24 @@ static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 	else
 		gfp = GFP_NOFS | __GFP_NOFAIL;
 
-	ifs = kzalloc(struct_size(ifs, state, BITS_TO_LONGS(nr_blocks)),
-		      gfp);
-	if (ifs) {
-		spin_lock_init(&ifs->state_lock);
-		if (folio_test_uptodate(folio))
-			bitmap_fill(ifs->state, nr_blocks);
-		folio_attach_private(folio, ifs);
-	}
+	/*
+	 * ifs->state tracks two sets of state flags when the
+	 * filesystem block size is smaller than the folio size.
+	 * The first state tracks per-block uptodate and the
+	 * second tracks per-block dirty state.
+	 */
+	ifs = kzalloc(struct_size(ifs, state,
+		      BITS_TO_LONGS(IOMAP_ST_MAX * nr_blocks)), gfp);
+	if (!ifs)
+		return ifs;
+
+	spin_lock_init(&ifs->state_lock);
+	if (folio_test_uptodate(folio))
+		bitmap_set(ifs->state, 0, nr_blocks);
+	if (folio_test_dirty(folio))
+		bitmap_set(ifs->state, nr_blocks, nr_blocks);
+	folio_attach_private(folio, ifs);
+
 	return ifs;
 }
 
@@ -143,7 +226,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* move forward for each leading block marked uptodate */
 		for (i = first; i <= last; i++) {
-			if (!ifs_block_is_uptodate(ifs, i))
+			if (!ifs_block_is_uptodate(folio, ifs, i))
 				break;
 			*pos += block_size;
 			poff += block_size;
@@ -153,7 +236,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* truncate len if we find any trailing uptodate block(s) */
 		for ( ; i <= last; i++) {
-			if (ifs_block_is_uptodate(ifs, i)) {
+			if (ifs_block_is_uptodate(folio, ifs, i)) {
 				plen -= (last - i + 1) * block_size;
 				last = i - 1;
 				break;
@@ -457,7 +540,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	last = (from + count - 1) >> inode->i_blkbits;
 
 	for (i = first; i <= last; i++)
-		if (!ifs_block_is_uptodate(ifs, i))
+		if (!ifs_block_is_uptodate(folio, ifs, i))
 			return false;
 	return true;
 }
@@ -523,6 +606,17 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
 
+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
+{
+	struct inode *inode = mapping->host;
+	size_t len = folio_size(folio);
+
+	ifs_alloc(inode, folio, 0);
+	iomap_set_range_dirty(folio, 0, len);
+	return filemap_dirty_folio(mapping, folio);
+}
+EXPORT_SYMBOL_GPL(iomap_dirty_folio);
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -727,6 +821,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	if (unlikely(copied < len && !folio_test_uptodate(folio)))
 		return 0;
 	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
+	iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
 	filemap_dirty_folio(inode->i_mapping, folio);
 	return copied;
 }
@@ -891,6 +986,43 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
+static int iomap_write_delalloc_ifs_punch(struct inode *inode,
+		struct folio *folio, loff_t start_byte, loff_t end_byte,
+		iomap_punch_t punch)
+{
+	unsigned int first_blk, last_blk, i;
+	loff_t last_byte;
+	u8 blkbits = inode->i_blkbits;
+	struct iomap_folio_state *ifs;
+	int ret = 0;
+
+	/*
+	 * When we have per-block dirty tracking, there can be
+	 * blocks within a folio which are marked uptodate
+	 * but not dirty. In that case it is necessary to punch
+	 * out such blocks to avoid leaking any delalloc blocks.
+	 */
+	ifs = folio->private;
+	if (!ifs)
+		return ret;
+
+	last_byte = min_t(loff_t, end_byte - 1,
+			folio_pos(folio) + folio_size(folio) - 1);
+	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
+	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
+	for (i = first_blk; i <= last_blk; i++) {
+		if (!ifs_block_is_dirty(folio, ifs, i)) {
+			ret = punch(inode, folio_pos(folio) + (i << blkbits),
+				    1 << blkbits);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return ret;
+}
+
+
 static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
 		iomap_punch_t punch)
@@ -907,6 +1039,13 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 		if (ret)
 			return ret;
 	}
+
+	/* Punch non-dirty blocks within folio */
+	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte,
+			end_byte, punch);
+	if (ret)
+		return ret;
+
 	/*
 	 * Make sure the next punch start is correctly bound to
 	 * the end of this data range, not the end of the folio.
@@ -1637,7 +1776,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct folio *folio, u64 end_pos)
 {
-	struct iomap_folio_state *ifs = ifs_alloc(inode, folio, 0);
+	struct iomap_folio_state *ifs = folio->private;
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
@@ -1645,6 +1784,11 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
+	if (!ifs && nblocks > 1) {
+		ifs = ifs_alloc(inode, folio, 0);
+		iomap_set_range_dirty(folio, 0, folio_size(folio));
+	}
+
 	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
 
 	/*
@@ -1653,7 +1797,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * invalid, grab a new one.
 	 */
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (ifs && !ifs_block_is_uptodate(ifs, i))
+		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
 			continue;
 
 		error = wpc->ops->map_blocks(wpc, inode, pos);
@@ -1697,6 +1841,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		}
 	}
 
+	iomap_clear_range_dirty(folio, 0, end_pos - folio_pos(folio));
 	folio_start_writeback(folio);
 	folio_unlock(folio);
 
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 451942fb38ec..2fca4b4e7fd8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -578,7 +578,7 @@ const struct address_space_operations xfs_address_space_operations = {
 	.read_folio		= xfs_vm_read_folio,
 	.readahead		= xfs_vm_readahead,
 	.writepages		= xfs_vm_writepages,
-	.dirty_folio		= filemap_dirty_folio,
+	.dirty_folio		= iomap_dirty_folio,
 	.release_folio		= iomap_release_folio,
 	.invalidate_folio	= iomap_invalidate_folio,
 	.bmap			= xfs_vm_bmap,
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f..e508c8e97372 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -175,7 +175,7 @@ const struct address_space_operations zonefs_file_aops = {
 	.read_folio		= zonefs_read_folio,
 	.readahead		= zonefs_readahead,
 	.writepages		= zonefs_writepages,
-	.dirty_folio		= filemap_dirty_folio,
+	.dirty_folio		= iomap_dirty_folio,
 	.release_folio		= iomap_release_folio,
 	.invalidate_folio	= iomap_invalidate_folio,
 	.migrate_folio		= filemap_migrate_folio,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e2b836c2e119..eb9335c46bf3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -264,6 +264,7 @@ bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19  2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
@ 2023-06-19 12:43   ` Andreas Gruenbacher
  2023-06-19 16:55     ` Ritesh Harjani
  2023-06-19 14:26   ` Matthew Wilcox
  1 sibling, 1 reply; 17+ messages in thread
From: Andreas Gruenbacher @ 2023-06-19 12:43 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Matthew Wilcox,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

On Mon, Jun 19, 2023 at 4:29 AM Ritesh Harjani (IBM)
<ritesh.list@gmail.com> wrote:
> When filesystem blocksize is less than folio size (either with
> mapping_large_folio_support() or with blocksize < pagesize) and when the
> folio is uptodate in pagecache, then even a byte write can cause
> an entire folio to be written to disk during writeback. This happens
> because we currently don't have a mechanism to track per-block dirty
> state within struct iomap_folio_state. We currently only track uptodate
> state.
>
> This patch implements support for tracking per-block dirty state in
> iomap_folio_state->state bitmap. This should help improve the filesystem
> write performance and help reduce write amplification.
>
> Performance testing of below fio workload reveals ~16x performance
> improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
> FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>
> 1. <test_randwrite.fio>
> [global]
>         ioengine=psync
>         rw=randwrite
>         overwrite=1
>         pre_read=1
>         direct=0
>         bs=4k
>         size=1G
>         dir=./
>         numjobs=8
>         fdatasync=1
>         runtime=60
>         iodepth=64
>         group_reporting=1
>
> [fio-run]
>
> 2. Also our internal performance team reported that this patch improves
>    their database workload performance by around ~83% (with XFS on Power)
>
> Reported-by: Aravinda Herle <araherle@in.ibm.com>
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/gfs2/aops.c         |   2 +-
>  fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
>  fs/xfs/xfs_aops.c      |   2 +-
>  fs/zonefs/file.c       |   2 +-
>  include/linux/iomap.h  |   1 +
>  5 files changed, 171 insertions(+), 25 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index a5f4be6b9213..75efec3c3b71 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
>         .writepages = gfs2_writepages,
>         .read_folio = gfs2_read_folio,
>         .readahead = gfs2_readahead,
> -       .dirty_folio = filemap_dirty_folio,
> +       .dirty_folio = iomap_dirty_folio,
>         .release_folio = iomap_release_folio,
>         .invalidate_folio = iomap_invalidate_folio,
>         .bmap = gfs2_bmap,
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 391d918ddd22..50f5840bb5f9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -25,7 +25,7 @@
>
>  typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>  /*
> - * Structure allocated for each folio to track per-block uptodate state
> + * Structure allocated for each folio to track per-block uptodate, dirty state
>   * and I/O completions.
>   */
>  struct iomap_folio_state {
> @@ -35,31 +35,55 @@ struct iomap_folio_state {
>         unsigned long           state[];
>  };
>
> +enum iomap_block_state {
> +       IOMAP_ST_UPTODATE,
> +       IOMAP_ST_DIRTY,
> +
> +       IOMAP_ST_MAX,
> +};
> +
> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
> +               enum iomap_block_state state, unsigned int *first_blkp,
> +               unsigned int *nr_blksp)
> +{
> +       struct inode *inode = folio->mapping->host;
> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +       unsigned int first = off >> inode->i_blkbits;
> +       unsigned int last = (off + len - 1) >> inode->i_blkbits;
> +
> +       *first_blkp = first + (state * blks_per_folio);
> +       *nr_blksp = last - first + 1;
> +}
> +
>  static struct bio_set iomap_ioend_bioset;
>
>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>                                                struct iomap_folio_state *ifs)
>  {
>         struct inode *inode = folio->mapping->host;
> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +       unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;

This nr_blks calculation doesn't make sense.

> -       return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
> +       return bitmap_full(ifs->state, nr_blks);

Could you please change this to:

BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
return bitmap_full(ifs->state, blks_per_folio);

Also, I'm seeing that the value of i_blocks_per_folio() is assigned to
local variables with various names in several places (blks_per_folio,
nr_blocks, nblocks). Maybe this could be made consistent.

>  }
>
> -static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
> -                                              unsigned int block)
> +static inline bool ifs_block_is_uptodate(struct folio *folio,
> +               struct iomap_folio_state *ifs, unsigned int block)
>  {
> -       return test_bit(block, ifs->state);
> +       struct inode *inode = folio->mapping->host;
> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +
> +       return test_bit(block + IOMAP_ST_UPTODATE * blks_per_folio, ifs->state);
>  }
>
>  static void ifs_set_range_uptodate(struct folio *folio,
>                 struct iomap_folio_state *ifs, size_t off, size_t len)
>  {
> -       struct inode *inode = folio->mapping->host;
> -       unsigned int first_blk = off >> inode->i_blkbits;
> -       unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
> -       unsigned int nr_blks = last_blk - first_blk + 1;
> +       unsigned int first_blk, nr_blks;
>         unsigned long flags;
>
> +       ifs_calc_range(folio, off, len, IOMAP_ST_UPTODATE, &first_blk,
> +                            &nr_blks);
>         spin_lock_irqsave(&ifs->state_lock, flags);
>         bitmap_set(ifs->state, first_blk, nr_blks);
>         if (ifs_is_fully_uptodate(folio, ifs))
> @@ -78,6 +102,55 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>                 folio_mark_uptodate(folio);
>  }
>
> +static inline bool ifs_block_is_dirty(struct folio *folio,
> +               struct iomap_folio_state *ifs, int block)
> +{
> +       struct inode *inode = folio->mapping->host;
> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +
> +       return test_bit(block + IOMAP_ST_DIRTY * blks_per_folio, ifs->state);
> +}
> +
> +static void ifs_clear_range_dirty(struct folio *folio,
> +               struct iomap_folio_state *ifs, size_t off, size_t len)
> +{
> +       unsigned int first_blk, nr_blks;
> +       unsigned long flags;
> +
> +       ifs_calc_range(folio, off, len, IOMAP_ST_DIRTY, &first_blk, &nr_blks);
> +       spin_lock_irqsave(&ifs->state_lock, flags);
> +       bitmap_clear(ifs->state, first_blk, nr_blks);
> +       spin_unlock_irqrestore(&ifs->state_lock, flags);
> +}
> +
> +static void iomap_clear_range_dirty(struct folio *folio, size_t off, size_t len)
> +{
> +       struct iomap_folio_state *ifs = folio->private;
> +
> +       if (ifs)
> +               ifs_clear_range_dirty(folio, ifs, off, len);
> +}
> +
> +static void ifs_set_range_dirty(struct folio *folio,
> +               struct iomap_folio_state *ifs, size_t off, size_t len)
> +{
> +       unsigned int first_blk, nr_blks;
> +       unsigned long flags;
> +
> +       ifs_calc_range(folio, off, len, IOMAP_ST_DIRTY, &first_blk, &nr_blks);
> +       spin_lock_irqsave(&ifs->state_lock, flags);
> +       bitmap_set(ifs->state, first_blk, nr_blks);
> +       spin_unlock_irqrestore(&ifs->state_lock, flags);
> +}
> +
> +static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
> +{
> +       struct iomap_folio_state *ifs = folio->private;
> +
> +       if (ifs)
> +               ifs_set_range_dirty(folio, ifs, off, len);
> +}
> +
>  static struct iomap_folio_state *ifs_alloc(struct inode *inode,
>                 struct folio *folio, unsigned int flags)
>  {
> @@ -93,14 +166,24 @@ static struct iomap_folio_state *ifs_alloc(struct inode *inode,
>         else
>                 gfp = GFP_NOFS | __GFP_NOFAIL;
>
> -       ifs = kzalloc(struct_size(ifs, state, BITS_TO_LONGS(nr_blocks)),
> -                     gfp);
> -       if (ifs) {
> -               spin_lock_init(&ifs->state_lock);
> -               if (folio_test_uptodate(folio))
> -                       bitmap_fill(ifs->state, nr_blocks);
> -               folio_attach_private(folio, ifs);
> -       }
> +       /*
> +        * ifs->state tracks two sets of state flags when the
> +        * filesystem block size is smaller than the folio size.
> +        * The first state tracks per-block uptodate and the
> +        * second tracks per-block dirty state.
> +        */
> +       ifs = kzalloc(struct_size(ifs, state,
> +                     BITS_TO_LONGS(IOMAP_ST_MAX * nr_blocks)), gfp);
> +       if (!ifs)
> +               return ifs;
> +
> +       spin_lock_init(&ifs->state_lock);
> +       if (folio_test_uptodate(folio))
> +               bitmap_set(ifs->state, 0, nr_blocks);
> +       if (folio_test_dirty(folio))
> +               bitmap_set(ifs->state, nr_blocks, nr_blocks);
> +       folio_attach_private(folio, ifs);
> +
>         return ifs;
>  }
>
> @@ -143,7 +226,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>
>                 /* move forward for each leading block marked uptodate */
>                 for (i = first; i <= last; i++) {
> -                       if (!ifs_block_is_uptodate(ifs, i))
> +                       if (!ifs_block_is_uptodate(folio, ifs, i))
>                                 break;
>                         *pos += block_size;
>                         poff += block_size;
> @@ -153,7 +236,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>
>                 /* truncate len if we find any trailing uptodate block(s) */
>                 for ( ; i <= last; i++) {
> -                       if (ifs_block_is_uptodate(ifs, i)) {
> +                       if (ifs_block_is_uptodate(folio, ifs, i)) {
>                                 plen -= (last - i + 1) * block_size;
>                                 last = i - 1;
>                                 break;
> @@ -457,7 +540,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
>         last = (from + count - 1) >> inode->i_blkbits;
>
>         for (i = first; i <= last; i++)
> -               if (!ifs_block_is_uptodate(ifs, i))
> +               if (!ifs_block_is_uptodate(folio, ifs, i))
>                         return false;
>         return true;
>  }
> @@ -523,6 +606,17 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
>  }
>  EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
>
> +bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
> +{
> +       struct inode *inode = mapping->host;
> +       size_t len = folio_size(folio);
> +
> +       ifs_alloc(inode, folio, 0);
> +       iomap_set_range_dirty(folio, 0, len);
> +       return filemap_dirty_folio(mapping, folio);
> +}
> +EXPORT_SYMBOL_GPL(iomap_dirty_folio);
> +
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -727,6 +821,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>         if (unlikely(copied < len && !folio_test_uptodate(folio)))
>                 return 0;
>         iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
> +       iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
>         filemap_dirty_folio(inode->i_mapping, folio);
>         return copied;
>  }
> @@ -891,6 +986,43 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>
> +static int iomap_write_delalloc_ifs_punch(struct inode *inode,
> +               struct folio *folio, loff_t start_byte, loff_t end_byte,
> +               iomap_punch_t punch)
> +{
> +       unsigned int first_blk, last_blk, i;
> +       loff_t last_byte;
> +       u8 blkbits = inode->i_blkbits;
> +       struct iomap_folio_state *ifs;
> +       int ret = 0;
> +
> +       /*
> +        * When we have per-block dirty tracking, there can be
> +        * blocks within a folio which are marked uptodate
> +        * but not dirty. In that case it is necessary to punch
> +        * out such blocks to avoid leaking any delalloc blocks.
> +        */
> +       ifs = folio->private;
> +       if (!ifs)
> +               return ret;
> +
> +       last_byte = min_t(loff_t, end_byte - 1,
> +                       folio_pos(folio) + folio_size(folio) - 1);
> +       first_blk = offset_in_folio(folio, start_byte) >> blkbits;
> +       last_blk = offset_in_folio(folio, last_byte) >> blkbits;
> +       for (i = first_blk; i <= last_blk; i++) {
> +               if (!ifs_block_is_dirty(folio, ifs, i)) {
> +                       ret = punch(inode, folio_pos(folio) + (i << blkbits),
> +                                   1 << blkbits);
> +                       if (ret)
> +                               return ret;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +
>  static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>                 loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
>                 iomap_punch_t punch)
> @@ -907,6 +1039,13 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>                 if (ret)
>                         return ret;
>         }
> +
> +       /* Punch non-dirty blocks within folio */
> +       ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte,
> +                       end_byte, punch);
> +       if (ret)
> +               return ret;
> +
>         /*
>          * Make sure the next punch start is correctly bound to
>          * the end of this data range, not the end of the folio.
> @@ -1637,7 +1776,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>                 struct writeback_control *wbc, struct inode *inode,
>                 struct folio *folio, u64 end_pos)
>  {
> -       struct iomap_folio_state *ifs = ifs_alloc(inode, folio, 0);
> +       struct iomap_folio_state *ifs = folio->private;
>         struct iomap_ioend *ioend, *next;
>         unsigned len = i_blocksize(inode);
>         unsigned nblocks = i_blocks_per_folio(inode, folio);
> @@ -1645,6 +1784,11 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>         int error = 0, count = 0, i;
>         LIST_HEAD(submit_list);
>
> +       if (!ifs && nblocks > 1) {
> +               ifs = ifs_alloc(inode, folio, 0);
> +               iomap_set_range_dirty(folio, 0, folio_size(folio));
> +       }
> +
>         WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
>
>         /*
> @@ -1653,7 +1797,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>          * invalid, grab a new one.
>          */
>         for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
> -               if (ifs && !ifs_block_is_uptodate(ifs, i))
> +               if (ifs && !ifs_block_is_dirty(folio, ifs, i))
>                         continue;
>
>                 error = wpc->ops->map_blocks(wpc, inode, pos);
> @@ -1697,6 +1841,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>                 }
>         }
>
> +       iomap_clear_range_dirty(folio, 0, end_pos - folio_pos(folio));
>         folio_start_writeback(folio);
>         folio_unlock(folio);
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 451942fb38ec..2fca4b4e7fd8 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -578,7 +578,7 @@ const struct address_space_operations xfs_address_space_operations = {
>         .read_folio             = xfs_vm_read_folio,
>         .readahead              = xfs_vm_readahead,
>         .writepages             = xfs_vm_writepages,
> -       .dirty_folio            = filemap_dirty_folio,
> +       .dirty_folio            = iomap_dirty_folio,
>         .release_folio          = iomap_release_folio,
>         .invalidate_folio       = iomap_invalidate_folio,
>         .bmap                   = xfs_vm_bmap,
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index 132f01d3461f..e508c8e97372 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -175,7 +175,7 @@ const struct address_space_operations zonefs_file_aops = {
>         .read_folio             = zonefs_read_folio,
>         .readahead              = zonefs_readahead,
>         .writepages             = zonefs_writepages,
> -       .dirty_folio            = filemap_dirty_folio,
> +       .dirty_folio            = iomap_dirty_folio,
>         .release_folio          = iomap_release_folio,
>         .invalidate_folio       = iomap_invalidate_folio,
>         .migrate_folio          = filemap_migrate_folio,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e2b836c2e119..eb9335c46bf3 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -264,6 +264,7 @@ bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
>  bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
>  void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
> +bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
>  int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>                 const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> --
> 2.40.1
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19  2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
  2023-06-19 12:43   ` Andreas Gruenbacher
@ 2023-06-19 14:26   ` Matthew Wilcox
  2023-06-19 16:25     ` Ritesh Harjani
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2023-06-19 14:26 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
> +		enum iomap_block_state state, unsigned int *first_blkp,
> +		unsigned int *nr_blksp)
> +{
> +	struct inode *inode = folio->mapping->host;
> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +	unsigned int first = off >> inode->i_blkbits;
> +	unsigned int last = (off + len - 1) >> inode->i_blkbits;
> +
> +	*first_blkp = first + (state * blks_per_folio);
> +	*nr_blksp = last - first + 1;
> +}

As I said, this is not 'first_blkp'.  It's first_bitp.  I think this
misunderstanding is related to Andreas' complaint, but it's not quite
the same.

>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>  					       struct iomap_folio_state *ifs)
>  {
>  	struct inode *inode = folio->mapping->host;
> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +	unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>  
> -	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
> +	return bitmap_full(ifs->state, nr_blks);

I think we have a gap in our bitmap APIs.  We don't have a
'bitmap_range_full(src, pos, nbits)'.  We could use find_next_zero_bit(),
but that's going to do more work than necessary.

Given this lack, perhaps it's time to say that you're making all of
this too hard by using an enum, and pretending that we can switch the
positions of 'uptodate' and 'dirty' in the bitmap just by changing
the enum.  Define the uptodate bits to be the first ones in the bitmap,
document it (and why), and leave it at that.



* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19 14:26   ` Matthew Wilcox
@ 2023-06-19 16:25     ` Ritesh Harjani
  2023-06-19 16:54       ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2023-06-19 16:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

Matthew Wilcox <willy@infradead.org> writes:

> On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
>> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> +		enum iomap_block_state state, unsigned int *first_blkp,
>> +		unsigned int *nr_blksp)
>> +{
>> +	struct inode *inode = folio->mapping->host;
>> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +	unsigned int first = off >> inode->i_blkbits;
>> +	unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> +
>> +	*first_blkp = first + (state * blks_per_folio);
>> +	*nr_blksp = last - first + 1;
>> +}
>
> As I said, this is not 'first_blkp'.  It's first_bitp.  I think this
> misunderstanding is related to Andreas' complaint, but it's not quite
> the same.
>

We represent each FS block as a bit in the bitmap. So first_blkp or
first_bitp or first_blkbitp essentially means the same thing.
I went with first_blk, first_blkp in the first place based on your
suggestion itself [1].

[1]: https://lore.kernel.org/linux-xfs/Y%2FvxlVUJ31PZYaRa@casper.infradead.org/

>>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>>  					       struct iomap_folio_state *ifs)
>>  {
>>  	struct inode *inode = folio->mapping->host;
>> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +	unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>>  
>> -	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> +	return bitmap_full(ifs->state, nr_blks);
>
> I think we have a gap in our bitmap APIs.  We don't have a
> 'bitmap_range_full(src, pos, nbits)'.  We could use find_next_zero_bit(),
> but that's going to do more work than necessary.
>
> Given this lack, perhaps it's time to say that you're making all of
> this too hard by using an enum, and pretending that we can switch the
> positions of 'uptodate' and 'dirty' in the bitmap just by changing
> the enum.

Actually I never wanted to use the enum this way. That's why I was
not fond of the idea behind using enum in all the bitmap state
manipulation APIs (test/set/).

It was only intended to be passed as a state argument to the ifs_calc_range()
function to keep all the first_blkp and nr_blksp calculations in one
place. And just use its IOMAP_ST_MAX value while allocating state bitmap.
It was never intended to be used like this.

We can even now go back to this original idea and keep the use of the
enum limited to what I just mentioned above i.e. for ifs_calc_range().

And maybe just use this in ifs_alloc()?
BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
BUILD_BUG_ON(IOMAP_ST_DIRTY != 1);

> Define the uptodate bits to be the first ones in the bitmap,
> document it (and why), and leave it at that.

Do you think we can go with above suggestion, or do you still think we
need to drop it?

In case we drop it, should we open-code the calculations of first_blk
and last_blk? These calculations are done in the exact same fashion in
3 places: ifs_set_range_uptodate(), ifs_clear_range_dirty() and
ifs_set_range_dirty().
Thoughts?

-ritesh


* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19 16:25     ` Ritesh Harjani
@ 2023-06-19 16:54       ` Matthew Wilcox
  2023-06-19 17:29         ` Ritesh Harjani
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2023-06-19 16:54 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

On Mon, Jun 19, 2023 at 09:55:53PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
> >> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
> >> +		enum iomap_block_state state, unsigned int *first_blkp,
> >> +		unsigned int *nr_blksp)
> >> +{
> >> +	struct inode *inode = folio->mapping->host;
> >> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> >> +	unsigned int first = off >> inode->i_blkbits;
> >> +	unsigned int last = (off + len - 1) >> inode->i_blkbits;
> >> +
> >> +	*first_blkp = first + (state * blks_per_folio);
> >> +	*nr_blksp = last - first + 1;
> >> +}
> >
> > As I said, this is not 'first_blkp'.  It's first_bitp.  I think this
> > misunderstanding is related to Andreas' complaint, but it's not quite
> > the same.
> >
> 
> We represent each FS block as a bit in the bitmap. So first_blkp or
> first_bitp or first_blkbitp essentially means the same thing.
> I went with first_blk, first_blkp in the first place based on your
> suggestion itself [1].

No, it's not the same!  If you have 1kB blocks in a 64kB page, they're
numbered 0-63.  If you 'calc_range' for any of the dirty bits, you get
back a number in the range 64-127.  That's not a block number!  It's
the number of the bit you want to refer to.  Calling it blkp is going
to lead to confusion -- as you yourself seem to be confused.

> [1]: https://lore.kernel.org/linux-xfs/Y%2FvxlVUJ31PZYaRa@casper.infradead.org/

Those _were_ block numbers!  off >> inode->i_blkbits calculates a block
number.  (off >> inode->i_blkbits) + blocks_per_folio() does not calculate
a block number, it calculates a bit number.

> >> -	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
> >> +	return bitmap_full(ifs->state, nr_blks);
> >
> > I think we have a gap in our bitmap APIs.  We don't have a
> > 'bitmap_range_full(src, pos, nbits)'.  We could use find_next_zero_bit(),
> > but that's going to do more work than necessary.
> >
> > Given this lack, perhaps it's time to say that you're making all of
> > this too hard by using an enum, and pretending that we can switch the
> > positions of 'uptodate' and 'dirty' in the bitmap just by changing
> > the enum.
> 
> Actually I never wanted to use the enum this way. That's why I was
> not fond of the idea behind using enum in all the bitmap state
> manipulation APIs (test/set/).
> 
> It was only intended to be passed as a state argument to the ifs_calc_range()
> function to keep all the first_blkp and nr_blksp calculations in one
> place. And just use its IOMAP_ST_MAX value while allocating state bitmap.
> It was never intended to be used like this.
> 
> We can even now go back to this original idea and keep the use of the
> enum limited to what I just mentioned above i.e. for ifs_calc_range().
> 
> And maybe just use this in ifs_alloc()?
> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
> BUILD_BUG_ON(IOMAP_ST_DIRTY != 1);
> 
> > Define the uptodate bits to be the first ones in the bitmap,
> > document it (and why), and leave it at that.
> 
> Do you think we can go with above suggestion, or do you still think we
> need to drop it?
> 
> In case we drop it, should we open-code the calculations of first_blk
> and last_blk? These calculations are done in the exact same fashion in
> 3 places: ifs_set_range_uptodate(), ifs_clear_range_dirty() and
> ifs_set_range_dirty().
> Thoughts?

I disliked the enum from the moment I saw it, but didn't care enough to
say so.

Look, an abstraction should have a _purpose_.  The enum doesn't.  I'd
ditch this calc_range function entirely; it's just not worth it.


* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19 12:43   ` Andreas Gruenbacher
@ 2023-06-19 16:55     ` Ritesh Harjani
  0 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani @ 2023-06-19 16:55 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Matthew Wilcox,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

Andreas Gruenbacher <agruenba@redhat.com> writes:

> On Mon, Jun 19, 2023 at 4:29 AM Ritesh Harjani (IBM)
> <ritesh.list@gmail.com> wrote:
>> When filesystem blocksize is less than folio size (either with
>> mapping_large_folio_support() or with blocksize < pagesize) and when the
>> folio is uptodate in pagecache, then even a byte write can cause
>> an entire folio to be written to disk during writeback. This happens
>> because we currently don't have a mechanism to track per-block dirty
>> state within struct iomap_folio_state. We currently only track uptodate
>> state.
>>
>> This patch implements support for tracking per-block dirty state in
>> iomap_folio_state->state bitmap. This should help improve the filesystem
>> write performance and help reduce write amplification.
>>
>> Performance testing of below fio workload reveals ~16x performance
>> improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
>> FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>>
>> 1. <test_randwrite.fio>
>> [global]
>>         ioengine=psync
>>         rw=randwrite
>>         overwrite=1
>>         pre_read=1
>>         direct=0
>>         bs=4k
>>         size=1G
>>         dir=./
>>         numjobs=8
>>         fdatasync=1
>>         runtime=60
>>         iodepth=64
>>         group_reporting=1
>>
>> [fio-run]
>>
>> 2. Also our internal performance team reported that this patch improves
>>    their database workload performance by around ~83% (with XFS on Power)
>>
>> Reported-by: Aravinda Herle <araherle@in.ibm.com>
>> Reported-by: Brian Foster <bfoster@redhat.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>>  fs/gfs2/aops.c         |   2 +-
>>  fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
>>  fs/xfs/xfs_aops.c      |   2 +-
>>  fs/zonefs/file.c       |   2 +-
>>  include/linux/iomap.h  |   1 +
>>  5 files changed, 171 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
>> index a5f4be6b9213..75efec3c3b71 100644
>> --- a/fs/gfs2/aops.c
>> +++ b/fs/gfs2/aops.c
>> @@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
>>         .writepages = gfs2_writepages,
>>         .read_folio = gfs2_read_folio,
>>         .readahead = gfs2_readahead,
>> -       .dirty_folio = filemap_dirty_folio,
>> +       .dirty_folio = iomap_dirty_folio,
>>         .release_folio = iomap_release_folio,
>>         .invalidate_folio = iomap_invalidate_folio,
>>         .bmap = gfs2_bmap,
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 391d918ddd22..50f5840bb5f9 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -25,7 +25,7 @@
>>
>>  typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>>  /*
>> - * Structure allocated for each folio to track per-block uptodate state
>> + * Structure allocated for each folio to track per-block uptodate, dirty state
>>   * and I/O completions.
>>   */
>>  struct iomap_folio_state {
>> @@ -35,31 +35,55 @@ struct iomap_folio_state {
>>         unsigned long           state[];
>>  };
>>
>> +enum iomap_block_state {
>> +       IOMAP_ST_UPTODATE,
>> +       IOMAP_ST_DIRTY,
>> +
>> +       IOMAP_ST_MAX,
>> +};
>> +
>> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> +               enum iomap_block_state state, unsigned int *first_blkp,
>> +               unsigned int *nr_blksp)
>> +{
>> +       struct inode *inode = folio->mapping->host;
>> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +       unsigned int first = off >> inode->i_blkbits;
>> +       unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> +
>> +       *first_blkp = first + (state * blks_per_folio);
>> +       *nr_blksp = last - first + 1;
>> +}
>> +
>>  static struct bio_set iomap_ioend_bioset;
>>
>>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>>                                                struct iomap_folio_state *ifs)
>>  {
>>         struct inode *inode = folio->mapping->host;
>> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +       unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>
> This nr_blks calculation doesn't make sense.
>

About this, I have replied with more details here [1]

[1]: https://lore.kernel.org/linux-xfs/87o7lbmnam.fsf@doe.com/


>> -       return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> +       return bitmap_full(ifs->state, nr_blks);
>
> Could you please change this to:
>
> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);

ditto

> return bitmap_full(ifs->state, blks_per_folio);
>
> Also, I'm seeing that the value of i_blocks_per_folio() is assigned to
> local variables with various names in several places (blks_per_folio,
> nr_blocks, nblocks). Maybe this could be made consistent.
>

I remember giving it a try, but it was really not worth it, because all
of the above names also make sense in the way they are currently used in
the code. So I think as long as it is clear what those variables mean
and how they are used, it should be OK, IMO.

-ritesh



* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19 16:54       ` Matthew Wilcox
@ 2023-06-19 17:29         ` Ritesh Harjani
  2023-06-20  5:01           ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2023-06-19 17:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, Darrick J . Wong, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

Matthew Wilcox <willy@infradead.org> writes:

> On Mon, Jun 19, 2023 at 09:55:53PM +0530, Ritesh Harjani wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> 
>> > On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
>> >> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> >> +		enum iomap_block_state state, unsigned int *first_blkp,
>> >> +		unsigned int *nr_blksp)
>> >> +{
>> >> +	struct inode *inode = folio->mapping->host;
>> >> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> >> +	unsigned int first = off >> inode->i_blkbits;
>> >> +	unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> >> +
>> >> +	*first_blkp = first + (state * blks_per_folio);
>> >> +	*nr_blksp = last - first + 1;
>> >> +}
>> >
>> > As I said, this is not 'first_blkp'.  It's first_bitp.  I think this
>> > misunderstanding is related to Andreas' complaint, but it's not quite
>> > the same.
>> >
>> 
>> We represent each FS block as a bit in the bitmap. So first_blkp or
>> first_bitp or first_blkbitp essentially means the same thing.
>> I went with first_blk, first_blkp in the first place based on your
>> suggestion itself [1].
>
> No, it's not the same!  If you have 1kB blocks in a 64kB page, they're
> numbered 0-63.  If you 'calc_range' for any of the dirty bits, you get
> back a number in the range 64-127.  That's not a block number!  It's
> the number of the bit you want to refer to.  Calling it blkp is going
> to lead to confusion -- as you yourself seem to be confused.
>
>> [1]: https://lore.kernel.org/linux-xfs/Y%2FvxlVUJ31PZYaRa@casper.infradead.org/
>
> Those _were_ block numbers!  off >> inode->i_blkbits calculates a block
> number.  (off >> inode->i_blkbits) + blocks_per_folio() does not calculate
> a block number, it calculates a bit number.
>

Yes, I don't mind changing it to _bit. It is derived from an FS block
representation only. But I agree with your argument above: using _bit in
the variable name makes it explicit and clear.

>> >> -	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> >> +	return bitmap_full(ifs->state, nr_blks);
>> >
>> > I think we have a gap in our bitmap APIs.  We don't have a
>> > 'bitmap_range_full(src, pos, nbits)'.  We could use find_next_zero_bit(),
>> > but that's going to do more work than necessary.
>> >
>> > Given this lack, perhaps it's time to say that you're making all of
>> > this too hard by using an enum, and pretending that we can switch the
>> > positions of 'uptodate' and 'dirty' in the bitmap just by changing
>> > the enum.
>> 
>> Actually I never wanted to use the enum this way. That's why I was
>> not fond of the idea behind using enum in all the bitmap state
>> manipulation APIs (test/set/).
>> 
>> It was only intended to be passed as a state argument to the ifs_calc_range()
>> function to keep all the first_blkp and nr_blksp calculations in one
>> place. And just use its IOMAP_ST_MAX value while allocating state bitmap.
>> It was never intended to be used like this.
>> 
>> We can even now go back to this original idea and keep the use of the
>> enum limited to what I just mentioned above i.e. for ifs_calc_range().
>> 
>> And maybe just use this in ifs_alloc()?
>> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
>> BUILD_BUG_ON(IOMAP_ST_DIRTY != 1);
>> 
>> > Define the uptodate bits to be the first ones in the bitmap,
>> > document it (and why), and leave it at that.
>> 
>> Do you think we can go with the above suggestion, or do you still
>> think we need to drop it?
>> 
>> If we drop it, then should we open code the calculations for
>> first_blk and last_blk? These calculations are done in the exact same
>> fashion at 3 places: ifs_set_range_uptodate(), ifs_clear_range_dirty()
>> and ifs_set_range_dirty().
>> Thoughts?
>
> I disliked the enum from the moment I saw it, but didn't care enough to
> say so.
>
> Look, an abstraction should have a _purpose_.  The enum doesn't.  I'd
> ditch this calc_range function entirely; it's just not worth it.

I guess the enum is creating more confusion than adding value for almost
everyone, so I don't mind ditching it (unless anyone else opposes dropping it).

Also, it would be helpful if you could let me know of any other review
comments on the rest of the patch. Does the rest look good to you?

-ritesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-19 17:29         ` Ritesh Harjani
@ 2023-06-20  5:01           ` Darrick J. Wong
  2023-06-20  5:58             ` Ritesh Harjani
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2023-06-20  5:01 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

On Mon, Jun 19, 2023 at 10:59:09PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Mon, Jun 19, 2023 at 09:55:53PM +0530, Ritesh Harjani wrote:
> >> Matthew Wilcox <willy@infradead.org> writes:
> >> 
> >> > On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
> >> >> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
> >> >> +		enum iomap_block_state state, unsigned int *first_blkp,
> >> >> +		unsigned int *nr_blksp)
> >> >> +{
> >> >> +	struct inode *inode = folio->mapping->host;
> >> >> +	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> >> >> +	unsigned int first = off >> inode->i_blkbits;
> >> >> +	unsigned int last = (off + len - 1) >> inode->i_blkbits;
> >> >> +
> >> >> +	*first_blkp = first + (state * blks_per_folio);
> >> >> +	*nr_blksp = last - first + 1;
> >> >> +}
> >> >
> >> > As I said, this is not 'first_blkp'.  It's first_bitp.  I think this
> >> > misunderstanding is related to Andreas' complaint, but it's not quite
> >> > the same.
> >> >
> >> 
> >> We represent each FS block as a bit in the bitmap. So first_blkp or
> >> first_bitp or first_blkbitp essentially means the same. 
> >> I went with first_blk, first_blkp in the first place based on your
> >> suggestion itself [1].
> >
> > No, it's not the same!  If you have 1kB blocks in a 64kB page, they're
> > numbered 0-63.  If you 'calc_range' for any of the dirty bits, you get
> > back a number in the range 64-127.  That's not a block number!  It's
> > the number of the bit you want to refer to.  Calling it blkp is going
> > to lead to confusion -- as you yourself seem to be confused.
> >
> >> [1]: https://lore.kernel.org/linux-xfs/Y%2FvxlVUJ31PZYaRa@casper.infradead.org/
> >
> > Those _were_ block numbers!  off >> inode->i_blkbits calculates a block
> > number.  (off >> inode->i_blkbits) + blocks_per_folio() does not calculate
> > a block number, it calculates a bit number.
> >
> 
> Yes, I don't mind changing it to _bit. It is derived from an FS block
> representation only, but I agree with your argument above: using _bit in
> the variable name makes it explicit and clear.
> 
> >> >> -	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
> >> >> +	return bitmap_full(ifs->state, nr_blks);
> >> >
> >> > I think we have a gap in our bitmap APIs.  We don't have a
> >> > 'bitmap_range_full(src, pos, nbits)'.  We could use find_next_zero_bit(),
> >> > but that's going to do more work than necessary.
> >> >
> >> > Given this lack, perhaps it's time to say that you're making all of
> >> > this too hard by using an enum, and pretending that we can switch the
> >> > positions of 'uptodate' and 'dirty' in the bitmap just by changing
> >> > the enum.
> >> 
> >> Actually, I never wanted to use the enum this way. That's why I was
> >> not fond of the idea of using the enum in all the bitmap state
> >> manipulation APIs (test/set/).
> >> 
> >> It was only intended to be passed as a state argument to the ifs_calc_range()
> >> function to keep all the first_blkp and nr_blksp calculations in one
> >> place, and to use its IOMAP_ST_MAX value while allocating the state bitmap.
> >> It was never intended to be used like this.
> >> 
> >> We can even now go back to this original idea and keep the use of the
> >> enum limited to what I just mentioned above i.e. for ifs_calc_range().
> >> 
> >> And maybe just use this in ifs_alloc()?
> >> BUILD_BUG_ON(IOMAP_ST_UPTODATE == 0);
> >> BUILD_BUG_ON(IOMAP_ST_DIRTY == 1);
> >> 
> >> > Define the uptodate bits to be the first ones in the bitmap,
> >> > document it (and why), and leave it at that.
> >> 
> >> Do you think we can go with the above suggestion, or do you still
> >> think we need to drop it?
> >> 
> >> If we drop it, then should we open code the calculations for
> >> first_blk and last_blk? These calculations are done in the exact same
> >> fashion at 3 places: ifs_set_range_uptodate(), ifs_clear_range_dirty()
> >> and ifs_set_range_dirty().
> >> Thoughts?
> >
> > I disliked the enum from the moment I saw it, but didn't care enough to
> > say so.
> >
> > Look, an abstraction should have a _purpose_.  The enum doesn't.  I'd
> > ditch this calc_range function entirely; it's just not worth it.
> 
> I guess the enum is creating more confusion than adding value for almost
> everyone, so I don't mind ditching it (unless anyone else opposes dropping it).
> 
> Also, it would be helpful if you could let me know of any other review
> comments on the rest of the patch. Does the rest look good to you?

I deleted my entire angry rant about how this review has turned a
fairly simple design change into a big mess that even the reviewers
don't understand anymore.  I'm on vacation, I DGAF anymore.

Ritesh: Dump the enum; "because btrfs does it" is not sufficient
justification.  The rest is good enough, I'll put it in iomap-for-next
along with willy's thing as soon as 6.5-rc1 closes, and if you all have
further complaints, send your own patches.

--D

> -ritesh


* Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
  2023-06-20  5:01           ` Darrick J. Wong
@ 2023-06-20  5:58             ` Ritesh Harjani
  0 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani @ 2023-06-20  5:58 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, Andreas Gruenbacher,
	Christoph Hellwig, Brian Foster, Ojaswin Mujoo, Disha Goel,
	Aravinda Herle

"Darrick J. Wong" <djwong@kernel.org> writes:

> Ritesh: Dump the enum; "because btrfs does it" is not sufficient
> justification.  The rest is good enough, I'll put it in iomap-for-next
> along with willy's thing as soon as 6.5-rc1 closes, and if you all have
> further complaints, send your own patches.

Sure, Darrick. I agree that the enum created more confusion than help
(sorry about that). I will drop it in the next revision and address
Matthew's review comment on the variable naming.

I agree to lock down the current set of changes, since functionality-wise
they look good to everyone. Any further optimizations/improvements can be
handled separately so that we don't complicate this one.

Thanks once again Darrick and all the other reviewers!!

-ritesh

I am pulled into something else for the next couple of days. After that,
I will work on getting the next revision out.


end of thread, other threads:[~2023-06-20  5:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-19  2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate() Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 5/8] iomap: Use iomap_punch_t typedef Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early Ritesh Harjani (IBM)
2023-06-19  2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
2023-06-19 12:43   ` Andreas Gruenbacher
2023-06-19 16:55     ` Ritesh Harjani
2023-06-19 14:26   ` Matthew Wilcox
2023-06-19 16:25     ` Ritesh Harjani
2023-06-19 16:54       ` Matthew Wilcox
2023-06-19 17:29         ` Ritesh Harjani
2023-06-20  5:01           ` Darrick J. Wong
2023-06-20  5:58             ` Ritesh Harjani
