* [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting
@ 2025-08-01  0:21 Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 01/10] mm: pass number of pages to __folio_start_writeback() Joanne Koong
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

This patchset is a stab at adding granular dirty and writeback stats
accounting for large folios.

The dirty page balancing logic uses these stats to determine things like
whether the ratelimit has been exceeded, how frequently pages need to be
written back, and whether dirtying should be throttled. Currently, for large
folios, if any byte in the folio is dirtied or written back, all the bytes in
the folio are accounted as such.

In particular, there are four places where dirty and writeback stats get
incremented and decremented as pages get dirtied and written back:
a) folio dirtying (filemap_dirty_folio() -> ... -> folio_account_dirtied())
   - increments NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE,
     current->nr_dirtied

b) writing back a mapping (writeback_iter() -> ... ->
folio_clear_dirty_for_io())
   - decrements NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE

c) starting writeback on a folio (folio_start_writeback())
   - increments WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING

d) ending writeback on a folio (folio_end_writeback())
   - decrements WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING

Patches 1 to 9 add support for the four cases above to take in the number of
pages to be accounted, instead of accounting for the entire folio.

Patch 10 adds the iomap changes that use these new APIs. This relies on the
iomap folio state bitmap to track which pages are dirty (so that we avoid
any double-counting). As such, we can only do granular accounting if the
block size >= PAGE_SIZE.
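
As a hypothetical illustration of what this buys: with 4 KB pages, a 64 KB
block size, and a 2 MB folio (512 pages), dirtying a single block previously
accounted all 512 pages as dirty, whereas with granular accounting only the
16 pages backing that block are accounted, i.e.
nr_pages = nr_dirty_blocks * (block_size >> PAGE_SHIFT).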

This patchset was run through xfstests using fuse passthrough_hp (with an
out-of-tree kernel patch enabling fuse large folios).

This is on top of commit d5212d81 ("Merge patch series "fuse: use iomap..."")
in Christian's vfs iomap tree, and on top of the patchset that removes
BDI_CAP_WRITEBACK_ACCT [1].

Benchmarks using a contrived test program that writes 2 GB in 128 MB chunks to
a fuse mount (with an out-of-tree kernel patch that enables fuse large folios)
and then does 50k 50-byte random writes showed roughly a 10% performance
improvement (0.625 seconds -> 0.547 seconds for the random writes).
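
For reference, a minimal sketch of such a test program (not the original; the
file path, fill pattern, and lack of error handling are assumptions) might
look like:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK	(128UL << 20)	/* 128 MB */
#define TOTAL	(2UL << 30)	/* 2 GB */

int main(int argc, char **argv)
{
	char *buf = malloc(CHUNK);
	struct timespec t0, t1;
	unsigned long off;
	int fd, i;

	if (argc < 2 || !buf)
		return 1;
	/* the file lives on the fuse mount under test */
	fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	memset(buf, 0xab, CHUNK);

	/* sequential phase: write 2 GB in 128 MB chunks */
	for (off = 0; off < TOTAL; off += CHUNK)
		pwrite(fd, buf, CHUNK, off);
	fsync(fd);

	/* timed phase: 50k 50-byte random writes */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < 50000; i++)
		pwrite(fd, buf, 50, rand() % (TOTAL - 50));
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("random writes: %.3f s\n", (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	free(buf);
	close(fd);
	return 0;
}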


Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250707234606.2300149-1-joannelkoong@gmail.com/


Joanne Koong (10):
  mm: pass number of pages to __folio_start_writeback()
  mm: pass number of pages to __folio_end_writeback()
  mm: add folio_end_writeback_pages() helper
  mm: pass number of pages dirtied to __folio_mark_dirty()
  mm: add filemap_dirty_folio_pages() helper
  mm: add __folio_clear_dirty_for_io() helper
  mm: add no_stats_accounting bitfield to wbc
  mm: refactor clearing dirty stats into helper function
  mm: add clear_dirty_for_io_stats() helper
  iomap: add granular dirty and writeback accounting

 fs/buffer.c                |   6 +-
 fs/ext4/page-io.c          |   2 +-
 fs/iomap/buffered-io.c     | 136 ++++++++++++++++++++++++++++++++++---
 include/linux/page-flags.h |   6 +-
 include/linux/pagemap.h    |   4 +-
 include/linux/writeback.h  |   6 ++
 mm/filemap.c               |  25 ++++---
 mm/internal.h              |   2 +-
 mm/page-writeback.c        | 127 ++++++++++++++++++++++------------
 9 files changed, 246 insertions(+), 68 deletions(-)

-- 
2.47.3



* [RFC PATCH v1 01/10] mm: pass number of pages to __folio_start_writeback()
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 02/10] mm: pass number of pages to __folio_end_writeback() Joanne Koong
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add an additional arg to __folio_start_writeback() that takes in the
number of pages to write back.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/ext4/page-io.c          |  2 +-
 include/linux/page-flags.h |  6 +++---
 mm/page-writeback.c        | 10 +++++-----
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 179e54f3a3b6..b9ee40872040 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -580,7 +580,7 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, struct folio *folio,
 		io_folio = page_folio(bounce_page);
 	}
 
-	__folio_start_writeback(folio, keep_towrite);
+	__folio_start_writeback(folio, keep_towrite, folio_nr_pages(folio));
 
 	/* Now submit buffers to write */
 	do {
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4fe5ee67535b..7ec85ece9b67 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -854,13 +854,13 @@ static __always_inline void SetPageUptodate(struct page *page)
 
 CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)
 
-void __folio_start_writeback(struct folio *folio, bool keep_write);
+void __folio_start_writeback(struct folio *folio, bool keep_write, long nr_pages);
 void set_page_writeback(struct page *page);
 
 #define folio_start_writeback(folio)			\
-	__folio_start_writeback(folio, false)
+	__folio_start_writeback(folio, false, folio_nr_pages(folio))
 #define folio_start_writeback_keepwrite(folio)	\
-	__folio_start_writeback(folio, true)
+	__folio_start_writeback(folio, true, folio_nr_pages(folio))
 
 static __always_inline bool folio_test_head(const struct folio *folio)
 {
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 11f9a909e8de..2e6b132f7ac2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -3044,9 +3044,9 @@ bool __folio_end_writeback(struct folio *folio)
 	return ret;
 }
 
-void __folio_start_writeback(struct folio *folio, bool keep_write)
+void __folio_start_writeback(struct folio *folio, bool keep_write,
+		long nr_pages)
 {
-	long nr = folio_nr_pages(folio);
 	struct address_space *mapping = folio_mapping(folio);
 	int access_ret;
 
@@ -3067,7 +3067,7 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
 		on_wblist = mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
 
 		xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
-		wb_stat_mod(wb, WB_WRITEBACK, nr);
+		wb_stat_mod(wb, WB_WRITEBACK, nr_pages);
 		if (!on_wblist) {
 			wb_inode_writeback_start(wb);
 			/*
@@ -3088,8 +3088,8 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
 		folio_test_set_writeback(folio);
 	}
 
-	lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
-	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
+	lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr_pages);
+	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr_pages);
 
 	access_ret = arch_make_folio_accessible(folio);
 	/*
-- 
2.47.3



* [RFC PATCH v1 02/10] mm: pass number of pages to __folio_end_writeback()
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 01/10] mm: pass number of pages to __folio_start_writeback() Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper Joanne Koong
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add an additional arg to __folio_end_writeback() that takes in the
number of pages that were written back.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/filemap.c        |  2 +-
 mm/internal.h       |  2 +-
 mm/page-writeback.c | 13 ++++++-------
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bada249b9fb7..b69ba95746f0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1657,7 +1657,7 @@ void folio_end_writeback(struct folio *folio)
 	 * reused before the folio_wake_bit().
 	 */
 	folio_get(folio);
-	if (__folio_end_writeback(folio))
+	if (__folio_end_writeback(folio, folio_nr_pages(folio)))
 		folio_wake_bit(folio, PG_writeback);
 
 	filemap_end_dropbehind_write(folio);
diff --git a/mm/internal.h b/mm/internal.h
index 6b8ed2017743..d94f3d40cc66 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -416,7 +416,7 @@ static inline vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 void folio_rotate_reclaimable(struct folio *folio);
-bool __folio_end_writeback(struct folio *folio);
+bool __folio_end_writeback(struct folio *folio, long nr_pages);
 void deactivate_file_folio(struct folio *folio);
 void folio_activate(struct folio *folio);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2e6b132f7ac2..2afdfaa285a6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -3008,9 +3008,8 @@ static void wb_inode_writeback_end(struct bdi_writeback *wb)
 	spin_unlock_irqrestore(&wb->work_lock, flags);
 }
 
-bool __folio_end_writeback(struct folio *folio)
+bool __folio_end_writeback(struct folio *folio, long nr_pages)
 {
-	long nr = folio_nr_pages(folio);
 	struct address_space *mapping = folio_mapping(folio);
 	bool ret;
 
@@ -3024,8 +3023,8 @@ bool __folio_end_writeback(struct folio *folio)
 		__xa_clear_mark(&mapping->i_pages, folio_index(folio),
 					PAGECACHE_TAG_WRITEBACK);
 
-		wb_stat_mod(wb, WB_WRITEBACK, -nr);
-		__wb_writeout_add(wb, nr);
+		wb_stat_mod(wb, WB_WRITEBACK, -nr_pages);
+		__wb_writeout_add(wb, nr_pages);
 		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
 			wb_inode_writeback_end(wb);
 			if (mapping->host)
@@ -3037,9 +3036,9 @@ bool __folio_end_writeback(struct folio *folio)
 		ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
 	}
 
-	lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
-	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
-	node_stat_mod_folio(folio, NR_WRITTEN, nr);
+	lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr_pages);
+	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr_pages);
+	node_stat_mod_folio(folio, NR_WRITTEN, nr_pages);
 
 	return ret;
 }
-- 
2.47.3



* [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 01/10] mm: pass number of pages to __folio_start_writeback() Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 02/10] mm: pass number of pages to __folio_end_writeback() Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-12  8:03   ` Christoph Hellwig
  2025-08-01  0:21 ` [RFC PATCH v1 04/10] mm: pass number of pages dirtied to __folio_mark_dirty() Joanne Koong
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add folio_end_writeback_pages() which takes in the number of pages
written back.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e63fbfbd5b0f..312209e0371a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1202,6 +1202,7 @@ void folio_wait_writeback(struct folio *folio);
 int folio_wait_writeback_killable(struct folio *folio);
 void end_page_writeback(struct page *page);
 void folio_end_writeback(struct folio *folio);
+void folio_end_writeback_pages(struct folio *folio, long nr_pages);
 void folio_wait_stable(struct folio *folio);
 void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn);
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb);
diff --git a/mm/filemap.c b/mm/filemap.c
index b69ba95746f0..1a292cff0803 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1626,15 +1626,7 @@ static void filemap_end_dropbehind_write(struct folio *folio)
 	}
 }
 
-/**
- * folio_end_writeback - End writeback against a folio.
- * @folio: The folio.
- *
- * The folio must actually be under writeback.
- *
- * Context: May be called from process or interrupt context.
- */
-void folio_end_writeback(struct folio *folio)
+void folio_end_writeback_pages(struct folio *folio, long nr_pages)
 {
 	VM_BUG_ON_FOLIO(!folio_test_writeback(folio), folio);
 
@@ -1657,13 +1649,26 @@ void folio_end_writeback(struct folio *folio)
 	 * reused before the folio_wake_bit().
 	 */
 	folio_get(folio);
-	if (__folio_end_writeback(folio, folio_nr_pages(folio)))
+	if (__folio_end_writeback(folio, nr_pages))
 		folio_wake_bit(folio, PG_writeback);
 
 	filemap_end_dropbehind_write(folio);
 	acct_reclaim_writeback(folio);
 	folio_put(folio);
 }
+
+/**
+ * folio_end_writeback - End writeback against a folio.
+ * @folio: The folio.
+ *
+ * The folio must actually be under writeback.
+ *
+ * Context: May be called from process or interrupt context.
+ */
+void folio_end_writeback(struct folio *folio)
+{
+	folio_end_writeback_pages(folio, folio_nr_pages(folio));
+}
 EXPORT_SYMBOL(folio_end_writeback);
 
 /**
-- 
2.47.3



* [RFC PATCH v1 04/10] mm: pass number of pages dirtied to __folio_mark_dirty()
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (2 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper Joanne Koong
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add an additional arg to __folio_mark_dirty() that takes in the number
of pages dirtied, so that this can be passed to folio_account_dirtied()
when it updates the stats.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/buffer.c             |  6 ++++--
 include/linux/pagemap.h |  3 ++-
 mm/page-writeback.c     | 10 +++++-----
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8cf4a1dc481e..327bae3f724d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -751,7 +751,8 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
 	spin_unlock(&mapping->i_private_lock);
 
 	if (newly_dirty)
-		__folio_mark_dirty(folio, mapping, 1);
+		__folio_mark_dirty(folio, mapping, 1,
+				   folio_nr_pages(folio));
 
 	if (newly_dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -1209,7 +1210,8 @@ void mark_buffer_dirty(struct buffer_head *bh)
 		if (!folio_test_set_dirty(folio)) {
 			mapping = folio->mapping;
 			if (mapping)
-				__folio_mark_dirty(folio, mapping, 0);
+				__folio_mark_dirty(folio, mapping, 0,
+						   folio_nr_pages(folio));
 		}
 		if (mapping)
 			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 312209e0371a..0ae2c1e93ca5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1204,7 +1204,8 @@ void end_page_writeback(struct page *page);
 void folio_end_writeback(struct folio *folio);
 void folio_end_writeback_pages(struct folio *folio, long nr_pages);
 void folio_wait_stable(struct folio *folio);
-void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn);
+void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn,
+		long nr_pages);
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb);
 void __folio_cancel_dirty(struct folio *folio);
 static inline void folio_cancel_dirty(struct folio *folio)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2afdfaa285a6..b0ae10a6687d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2677,7 +2677,7 @@ EXPORT_SYMBOL(noop_dirty_folio);
  * NOTE: This relies on being atomic wrt interrupts.
  */
 static void folio_account_dirtied(struct folio *folio,
-		struct address_space *mapping)
+		struct address_space *mapping, long nr)
 {
 	struct inode *inode = mapping->host;
 
@@ -2685,7 +2685,6 @@ static void folio_account_dirtied(struct folio *folio,
 
 	if (mapping_can_writeback(mapping)) {
 		struct bdi_writeback *wb;
-		long nr = folio_nr_pages(folio);
 
 		inode_attach_wb(inode, folio);
 		wb = inode_to_wb(inode);
@@ -2733,14 +2732,14 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
  * try_to_free_buffers() to fail.
  */
 void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
-			     int warn)
+			     int warn, long nr_pages)
 {
 	unsigned long flags;
 
 	xa_lock_irqsave(&mapping->i_pages, flags);
 	if (folio->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
-		folio_account_dirtied(folio, mapping);
+		folio_account_dirtied(folio, mapping, nr_pages);
 		__xa_set_mark(&mapping->i_pages, folio_index(folio),
 				PAGECACHE_TAG_DIRTY);
 	}
@@ -2771,7 +2770,8 @@ bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
 	if (folio_test_set_dirty(folio))
 		return false;
 
-	__folio_mark_dirty(folio, mapping, !folio_test_private(folio));
+	__folio_mark_dirty(folio, mapping, !folio_test_private(folio),
+			folio_nr_pages(folio));
 
 	if (mapping->host) {
 		/* !PageAnon && !swapper_space */
-- 
2.47.3



* [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (3 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 04/10] mm: pass number of pages dirtied to __folio_mark_dirty() Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01 17:07   ` Jan Kara
  2025-08-12  8:05   ` Christoph Hellwig
  2025-08-01  0:21 ` [RFC PATCH v1 06/10] mm: add __folio_clear_dirty_for_io() helper Joanne Koong
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add filemap_dirty_folio_pages() which takes in the number of pages to dirty.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/buffer.c               |  4 ++--
 include/linux/pagemap.h   |  2 +-
 include/linux/writeback.h |  2 ++
 mm/page-writeback.c       | 25 +++++++++++++++++++++----
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 327bae3f724d..7c05f6205d39 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -752,7 +752,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
 
 	if (newly_dirty)
 		__folio_mark_dirty(folio, mapping, 1,
-				   folio_nr_pages(folio));
+				   folio_nr_pages(folio), true);
 
 	if (newly_dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -1211,7 +1211,7 @@ void mark_buffer_dirty(struct buffer_head *bh)
 			mapping = folio->mapping;
 			if (mapping)
 				__folio_mark_dirty(folio, mapping, 0,
-						   folio_nr_pages(folio));
+						   folio_nr_pages(folio), true);
 		}
 		if (mapping)
 			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0ae2c1e93ca5..64f17aec9141 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1205,7 +1205,7 @@ void folio_end_writeback(struct folio *folio);
 void folio_end_writeback_pages(struct folio *folio, long nr_pages);
 void folio_wait_stable(struct folio *folio);
 void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn,
-		long nr_pages);
+		long nr_pages, bool newly_dirty);
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb);
 void __folio_cancel_dirty(struct folio *folio);
 static inline void folio_cancel_dirty(struct folio *folio)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index eda4b62511f7..34afa6912a1c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -383,6 +383,8 @@ void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end);
 
 bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio);
+bool filemap_dirty_folio_pages(struct address_space *mapping,
+			       struct folio *folio, long nr_pages);
 bool folio_redirty_for_writepage(struct writeback_control *, struct folio *);
 bool redirty_page_for_writepage(struct writeback_control *, struct page *);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b0ae10a6687d..a3805988f3ad 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2732,7 +2732,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
  * try_to_free_buffers() to fail.
  */
 void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
-			     int warn, long nr_pages)
+			     int warn, long nr_pages, bool newly_dirty)
 {
 	unsigned long flags;
 
@@ -2740,12 +2740,29 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
 	if (folio->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
 		folio_account_dirtied(folio, mapping, nr_pages);
-		__xa_set_mark(&mapping->i_pages, folio_index(folio),
-				PAGECACHE_TAG_DIRTY);
+		if (newly_dirty)
+			__xa_set_mark(&mapping->i_pages, folio_index(folio),
+					PAGECACHE_TAG_DIRTY);
 	}
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 }
 
+bool filemap_dirty_folio_pages(struct address_space *mapping, struct folio *folio,
+			long nr_pages)
+{
+	bool newly_dirty = !folio_test_set_dirty(folio);
+
+	__folio_mark_dirty(folio, mapping, !folio_test_private(folio),
+			nr_pages, newly_dirty);
+
+	if (newly_dirty && mapping->host) {
+		/* !PageAnon && !swapper_space */
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	}
+
+	return newly_dirty;
+}
+
 /**
  * filemap_dirty_folio - Mark a folio dirty for filesystems which do not use buffer_heads.
  * @mapping: Address space this folio belongs to.
@@ -2771,7 +2788,7 @@ bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
 		return false;
 
 	__folio_mark_dirty(folio, mapping, !folio_test_private(folio),
-			folio_nr_pages(folio));
+			folio_nr_pages(folio), true);
 
 	if (mapping->host) {
 		/* !PageAnon && !swapper_space */
-- 
2.47.3



* [RFC PATCH v1 06/10] mm: add __folio_clear_dirty_for_io() helper
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (4 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc Joanne Koong
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add __folio_clear_dirty_for_io(), which takes an argument specifying whether
the folio and wb stats should be updated as part of the call.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/page-writeback.c | 47 +++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a3805988f3ad..77a46bf8052f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2927,21 +2927,7 @@ void __folio_cancel_dirty(struct folio *folio)
 }
 EXPORT_SYMBOL(__folio_cancel_dirty);
 
-/*
- * Clear a folio's dirty flag, while caring for dirty memory accounting.
- * Returns true if the folio was previously dirty.
- *
- * This is for preparing to put the folio under writeout.  We leave
- * the folio tagged as dirty in the xarray so that a concurrent
- * write-for-sync can discover it via a PAGECACHE_TAG_DIRTY walk.
- * The ->writepage implementation will run either folio_start_writeback()
- * or folio_mark_dirty(), at which stage we bring the folio's dirty flag
- * and xarray dirty tag back into sync.
- *
- * This incoherency between the folio's dirty flag and xarray tag is
- * unfortunate, but it only exists while the folio is locked.
- */
-bool folio_clear_dirty_for_io(struct folio *folio)
+static bool __folio_clear_dirty_for_io(struct folio *folio, bool update_stats)
 {
 	struct address_space *mapping = folio_mapping(folio);
 	bool ret = false;
@@ -2990,10 +2976,14 @@ bool folio_clear_dirty_for_io(struct folio *folio)
 		 */
 		wb = unlocked_inode_to_wb_begin(inode, &cookie);
 		if (folio_test_clear_dirty(folio)) {
-			long nr = folio_nr_pages(folio);
-			lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr);
-			zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
-			wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
+			if (update_stats) {
+				long nr = folio_nr_pages(folio);
+				lruvec_stat_mod_folio(folio, NR_FILE_DIRTY,
+						      -nr);
+				zone_stat_mod_folio(folio,
+						    NR_ZONE_WRITE_PENDING, -nr);
+				wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
+			}
 			ret = true;
 		}
 		unlocked_inode_to_wb_end(inode, &cookie);
@@ -3001,6 +2991,25 @@ bool folio_clear_dirty_for_io(struct folio *folio)
 	}
 	return folio_test_clear_dirty(folio);
 }
+
+/*
+ * Clear a folio's dirty flag, while caring for dirty memory accounting.
+ * Returns true if the folio was previously dirty.
+ *
+ * This is for preparing to put the folio under writeout.  We leave
+ * the folio tagged as dirty in the xarray so that a concurrent
+ * write-for-sync can discover it via a PAGECACHE_TAG_DIRTY walk.
+ * The ->writepage implementation will run either folio_start_writeback()
+ * or folio_mark_dirty(), at which stage we bring the folio's dirty flag
+ * and xarray dirty tag back into sync.
+ *
+ * This incoherency between the folio's dirty flag and xarray tag is
+ * unfortunate, but it only exists while the folio is locked.
+ */
+bool folio_clear_dirty_for_io(struct folio *folio)
+{
+	return __folio_clear_dirty_for_io(folio, true);
+}
 EXPORT_SYMBOL(folio_clear_dirty_for_io);
 
 static void wb_inode_writeback_start(struct bdi_writeback *wb)
-- 
2.47.3



* [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (5 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 06/10] mm: add __folio_clear_dirty_for_io() helper Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-12  8:06   ` Christoph Hellwig
  2025-08-01  0:21 ` [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function Joanne Koong
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add a no_stats_accounting bitfield to wbc that callers can set. Hook
this up to __folio_clear_dirty_for_io() when preparing writeback.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/writeback.h | 3 +++
 mm/page-writeback.c       | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 34afa6912a1c..000795a47cb3 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -72,6 +72,9 @@ struct writeback_control {
 	 */
 	unsigned no_cgroup_owner:1;
 
+	/* Do not do any stats accounting. The caller will do this themselves */
+	unsigned no_stats_accounting:1;
+
 	/* To enable batching of swap writes to non-block-device backends,
 	 * "plug" can be set point to a 'struct swap_iocb *'.  When all swap
 	 * writes have been submitted, if with swap_iocb is not NULL,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 77a46bf8052f..c1fec76ee869 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2404,6 +2404,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
+static bool __folio_clear_dirty_for_io(struct folio *folio, bool update_stats);
 static bool folio_prepare_writeback(struct address_space *mapping,
 		struct writeback_control *wbc, struct folio *folio)
 {
@@ -2430,7 +2431,7 @@ static bool folio_prepare_writeback(struct address_space *mapping,
 	}
 	BUG_ON(folio_test_writeback(folio));
 
-	if (!folio_clear_dirty_for_io(folio))
+	if (!__folio_clear_dirty_for_io(folio, !wbc->no_stats_accounting))
 		return false;
 
 	return true;
-- 
2.47.3



* [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (6 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-04 16:26   ` Jeff Layton
  2025-08-01  0:21 ` [RFC PATCH v1 09/10] mm: add clear_dirty_for_io_stats() helper Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting Joanne Koong
  9 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Move the logic for clearing dirty stats into a helper function that both
folio_account_cleaned() and __folio_clear_dirty_for_io() invoke.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/page-writeback.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c1fec76ee869..f5916711db2d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2703,6 +2703,14 @@ static void folio_account_dirtied(struct folio *folio,
 	}
 }
 
+static void __clear_dirty_for_io_stats(struct folio *folio,
+			struct bdi_writeback *wb, long nr_pages)
+{
+	lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr_pages);
+	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr_pages);
+	wb_stat_mod(wb, WB_RECLAIMABLE, -nr_pages);
+}
+
 /*
  * Helper function for deaccounting dirty page without writeback.
  *
@@ -2711,9 +2719,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
 {
 	long nr = folio_nr_pages(folio);
 
-	lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr);
-	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
-	wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
+	__clear_dirty_for_io_stats(folio, wb, nr);
 	task_io_account_cancelled_write(nr * PAGE_SIZE);
 }
 
@@ -2977,14 +2983,9 @@ static bool __folio_clear_dirty_for_io(struct folio *folio, bool update_stats)
 		 */
 		wb = unlocked_inode_to_wb_begin(inode, &cookie);
 		if (folio_test_clear_dirty(folio)) {
-			if (update_stats) {
-				long nr = folio_nr_pages(folio);
-				lruvec_stat_mod_folio(folio, NR_FILE_DIRTY,
-						      -nr);
-				zone_stat_mod_folio(folio,
-						    NR_ZONE_WRITE_PENDING, -nr);
-				wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
-			}
+			if (update_stats)
+				__clear_dirty_for_io_stats(folio, wb,
+						folio_nr_pages(folio));
 			ret = true;
 		}
 		unlocked_inode_to_wb_end(inode, &cookie);
-- 
2.47.3



* [RFC PATCH v1 09/10] mm: add clear_dirty_for_io_stats() helper
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (7 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-01  0:21 ` [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting Joanne Koong
  9 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add clear_dirty_for_io_stats(), which clears the dirty stats corresponding to
a folio.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/writeback.h |  1 +
 mm/page-writeback.c       | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 000795a47cb3..8ca0e106cef7 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -382,6 +382,7 @@ int write_cache_pages(struct address_space *mapping,
 		      void *data);
 int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
 void writeback_set_ratelimit(void);
+void clear_dirty_for_io_stats(struct folio *folio, long nr_pages);
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f5916711db2d..d49cea4854c1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2711,6 +2711,22 @@ static void __clear_dirty_for_io_stats(struct folio *folio,
 	wb_stat_mod(wb, WB_RECLAIMABLE, -nr_pages);
 }
 
+void clear_dirty_for_io_stats(struct folio *folio, long nr_pages)
+{
+	struct address_space *mapping = folio_mapping(folio);
+	struct bdi_writeback *wb;
+	struct wb_lock_cookie cookie = {};
+	struct inode *inode;
+
+	if (!mapping || !mapping_can_writeback(mapping))
+		return;
+
+	inode = mapping->host;
+	wb = unlocked_inode_to_wb_begin(inode, &cookie);
+	__clear_dirty_for_io_stats(folio, wb, nr_pages);
+	unlocked_inode_to_wb_end(inode, &cookie);
+}
+
 /*
  * Helper function for deaccounting dirty page without writeback.
  *
-- 
2.47.3



* [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
                   ` (8 preceding siblings ...)
  2025-08-01  0:21 ` [RFC PATCH v1 09/10] mm: add clear_dirty_for_io_stats() helper Joanne Koong
@ 2025-08-01  0:21 ` Joanne Koong
  2025-08-12  8:15   ` Christoph Hellwig
  2025-08-14 16:37   ` Darrick J. Wong
  9 siblings, 2 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01  0:21 UTC (permalink / raw)
  To: linux-mm, brauner; +Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

Add granular dirty and writeback accounting for large folios. These
stats are used by the mm layer for dirty balancing and throttling.
Having granular dirty and writeback accounting helps prevent
over-aggressive balancing and throttling.

There are four places in iomap that this commit affects:
a) filemap dirtying, which now calls filemap_dirty_folio_pages()
b) writeback iteration, which now sets the wbc->no_stats_accounting bit and
calls clear_dirty_for_io_stats()
c) starting writeback, which now calls __folio_start_writeback()
d) ending writeback, which now calls folio_end_writeback_pages()

This relies on using the ifs->state dirty bitmap to track dirty pages in
the folio. As such, this can only be utilized on filesystems where the
block size >= PAGE_SIZE.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/iomap/buffered-io.c | 136 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 128 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcc6e0e5334e..626c3c8399cc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -20,6 +20,8 @@ struct iomap_folio_state {
 	spinlock_t		state_lock;
 	unsigned int		read_bytes_pending;
 	atomic_t		write_bytes_pending;
+	/* number of pages being currently written back */
+	unsigned		nr_pages_writeback;
 
 	/*
 	 * Each block has two bits in this bitmap:
@@ -81,6 +83,25 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 	return test_bit(block + blks_per_folio, ifs->state);
 }
 
+static unsigned ifs_count_dirty_pages(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned block_size = 1 << inode->i_blkbits;
+	unsigned start_blk = 0;
+	unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
+				i_blocks_per_folio(inode, folio));
+	unsigned nblks = 0;
+
+	while (start_blk < end_blk) {
+		if (ifs_block_is_dirty(folio, ifs, start_blk))
+			nblks++;
+		start_blk++;
+	}
+
+	return nblks * (block_size >> PAGE_SHIFT);
+}
+
 static unsigned ifs_find_dirty_range(struct folio *folio,
 		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
 {
@@ -165,6 +186,63 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
 		ifs_set_range_dirty(folio, ifs, off, len);
 }
 
+static long iomap_get_range_newly_dirtied(struct folio *folio, loff_t pos,
+		unsigned len)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned start_blk = pos >> inode->i_blkbits;
+	unsigned end_blk = min((unsigned)((pos + len - 1) >> inode->i_blkbits),
+				i_blocks_per_folio(inode, folio) - 1);
+	unsigned nblks = 0;
+	unsigned block_size = 1 << inode->i_blkbits;
+
+	while (start_blk <= end_blk) {
+		if (!ifs_block_is_dirty(folio, ifs, start_blk))
+			nblks++;
+		start_blk++;
+	}
+
+	return nblks * (block_size >> PAGE_SHIFT);
+}
+
+static bool iomap_granular_dirty_pages(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode;
+	unsigned block_size;
+
+	if (!ifs)
+		return false;
+
+	inode = folio->mapping->host;
+	block_size = 1 << inode->i_blkbits;
+
+	if (block_size >= PAGE_SIZE) {
+		WARN_ON(block_size & (PAGE_SIZE - 1));
+		return true;
+	}
+	return false;
+}
+
+static bool iomap_dirty_folio_range(struct address_space *mapping, struct folio *folio,
+			loff_t pos, unsigned len)
+{
+	long nr_new_dirty_pages;
+
+	if (!iomap_granular_dirty_pages(folio)) {
+		iomap_set_range_dirty(folio, pos, len);
+		return filemap_dirty_folio(mapping, folio);
+	}
+
+	nr_new_dirty_pages = iomap_get_range_newly_dirtied(folio, pos, len);
+	if (!nr_new_dirty_pages)
+		return false;
+
+	iomap_set_range_dirty(folio, pos, len);
+	return filemap_dirty_folio_pages(mapping, folio, nr_new_dirty_pages);
+}
+
 static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 		struct folio *folio, unsigned int flags)
 {
@@ -661,8 +739,7 @@ bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
 	size_t len = folio_size(folio);
 
 	ifs_alloc(inode, folio, 0);
-	iomap_set_range_dirty(folio, 0, len);
-	return filemap_dirty_folio(mapping, folio);
+	return iomap_dirty_folio_range(mapping, folio, 0, len);
 }
 EXPORT_SYMBOL_GPL(iomap_dirty_folio);
 
@@ -886,8 +963,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	if (unlikely(copied < len && !folio_test_uptodate(folio)))
 		return false;
 	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
-	iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
-	filemap_dirty_folio(inode->i_mapping, folio);
+	iomap_dirty_folio_range(inode->i_mapping, folio,
+			offset_in_folio(folio, pos), copied);
 	return true;
 }
 
@@ -1560,6 +1637,29 @@ void iomap_start_folio_write(struct inode *inode, struct folio *folio,
 }
 EXPORT_SYMBOL_GPL(iomap_start_folio_write);
 
+static void iomap_folio_start_writeback(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	if (!iomap_granular_dirty_pages(folio))
+		return folio_start_writeback(folio);
+
+	__folio_start_writeback(folio, false, ifs->nr_pages_writeback);
+}
+
+static void iomap_folio_end_writeback(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	long nr_pages_writeback;
+
+	if (!iomap_granular_dirty_pages(folio))
+		return folio_end_writeback(folio);
+
+	nr_pages_writeback = ifs->nr_pages_writeback;
+	ifs->nr_pages_writeback = 0;
+	folio_end_writeback_pages(folio, nr_pages_writeback);
+}
+
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len)
 {
@@ -1569,7 +1669,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
 
 	if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
-		folio_end_writeback(folio);
+		iomap_folio_end_writeback(folio);
 }
 EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
@@ -1657,6 +1757,21 @@ static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
 	return true;
 }
 
+static void iomap_update_dirty_stats(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	long nr_dirty_pages;
+
+	if (iomap_granular_dirty_pages(folio)) {
+		nr_dirty_pages = ifs_count_dirty_pages(folio);
+		ifs->nr_pages_writeback = nr_dirty_pages;
+	} else {
+		nr_dirty_pages = folio_nr_pages(folio);
+	}
+
+	clear_dirty_for_io_stats(folio, nr_dirty_pages);
+}
+
 int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
@@ -1674,6 +1789,8 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 
 	trace_iomap_writeback_folio(inode, pos, folio_size(folio));
 
+	iomap_update_dirty_stats(folio);
+
 	if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
 		return 0;
 	WARN_ON_ONCE(end_pos <= pos);
@@ -1681,6 +1798,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	if (i_blocks_per_folio(inode, folio) > 1) {
 		if (!ifs) {
 			ifs = ifs_alloc(inode, folio, 0);
+			ifs->nr_pages_writeback = folio_nr_pages(folio);
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
 		}
 
@@ -1698,7 +1816,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * Set the writeback bit ASAP, as the I/O completion for the single
 	 * block per folio case happens to hit as soon as we're submitting the bio.
 	 */
-	folio_start_writeback(folio);
+	iomap_folio_start_writeback(folio);
 
 	/*
 	 * Walk through the folio to find dirty areas to write back.
@@ -1731,10 +1849,10 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 */
 	if (ifs) {
 		if (atomic_dec_and_test(&ifs->write_bytes_pending))
-			folio_end_writeback(folio);
+			iomap_folio_end_writeback(folio);
 	} else {
 		if (!wb_pending)
-			folio_end_writeback(folio);
+			iomap_folio_end_writeback(folio);
 	}
 	mapping_set_error(inode->i_mapping, error);
 	return error;
@@ -1756,6 +1874,8 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
 			PF_MEMALLOC))
 		return -EIO;
 
+	wpc->wbc->no_stats_accounting = true;
+
 	while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error))) {
 		error = iomap_writeback_folio(wpc, folio);
 		folio_unlock(folio);
-- 
2.47.3



* Re: [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper
  2025-08-01  0:21 ` [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper Joanne Koong
@ 2025-08-01 17:07   ` Jan Kara
  2025-08-01 21:47     ` Joanne Koong
  2025-08-12  8:05   ` Christoph Hellwig
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Kara @ 2025-08-01 17:07 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, djwong, linux-fsdevel,
	kernel-team

On Thu 31-07-25 17:21:26, Joanne Koong wrote:
> Add filemap_dirty_folio_pages() which takes in the number of pages to dirty.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
...
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b0ae10a6687d..a3805988f3ad 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2732,7 +2732,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
>   * try_to_free_buffers() to fail.
>   */
>  void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
> -			     int warn, long nr_pages)
> +			     int warn, long nr_pages, bool newly_dirty)
>  {
>  	unsigned long flags;
>  
> @@ -2740,12 +2740,29 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
>  	if (folio->mapping) {	/* Race with truncate? */
>  		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>  		folio_account_dirtied(folio, mapping, nr_pages);
> -		__xa_set_mark(&mapping->i_pages, folio_index(folio),
> -				PAGECACHE_TAG_DIRTY);
> +		if (newly_dirty)
> +			__xa_set_mark(&mapping->i_pages, folio_index(folio),
> +					PAGECACHE_TAG_DIRTY);
>  	}
>  	xa_unlock_irqrestore(&mapping->i_pages, flags);

I think this is a dangerous coding pattern. What is making sure that by the
time you get here newly_dirty is still valid? I mean the dirtying can race
e.g. with writeback and so it can happen that the page is clean by the time
we get here but newly_dirty is false. We are often protected by page lock
when dirtying a folio but not always... So if nothing else this requires
careful documentation about correct use.
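
For illustration, one possible interleaving, using the names from
filemap_dirty_folio_pages() in patch 5:

	dirtier					writeback
	newly_dirty = !folio_test_set_dirty()
						folio_clear_dirty_for_io()
	__folio_mark_dirty(..., newly_dirty)

If the folio was already dirty at the test-and-set, newly_dirty is false,
and by the time __folio_mark_dirty() runs writeback has already cleaned the
folio, so the xarray dirty tag is skipped for a folio that is effectively
being redirtied.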

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper
  2025-08-01 17:07   ` Jan Kara
@ 2025-08-01 21:47     ` Joanne Koong
  0 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-01 21:47 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, brauner, willy, hch, djwong, linux-fsdevel, kernel-team

On Fri, Aug 1, 2025 at 10:07 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 31-07-25 17:21:26, Joanne Koong wrote:
> > Add filemap_dirty_folio_pages() which takes in the number of pages to dirty.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ...
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index b0ae10a6687d..a3805988f3ad 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2732,7 +2732,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
> >   * try_to_free_buffers() to fail.
> >   */
> >  void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
> > -                          int warn, long nr_pages)
> > +                          int warn, long nr_pages, bool newly_dirty)
> >  {
> >       unsigned long flags;
> >
> > @@ -2740,12 +2740,29 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
> >       if (folio->mapping) {   /* Race with truncate? */
> >               WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
> >               folio_account_dirtied(folio, mapping, nr_pages);
> > -             __xa_set_mark(&mapping->i_pages, folio_index(folio),
> > -                             PAGECACHE_TAG_DIRTY);
> > +             if (newly_dirty)
> > +                     __xa_set_mark(&mapping->i_pages, folio_index(folio),
> > +                                     PAGECACHE_TAG_DIRTY);
> >       }
> >       xa_unlock_irqrestore(&mapping->i_pages, flags);
>
> I think this is a dangerous coding pattern. What is making sure that by the
> time you get here newly_dirty is still valid? I mean the dirtying can race
> e.g. with writeback and so it can happen that the page is clean by the time
> we get here but newly_dirty is false. We are often protected by page lock
> when dirtying a folio but not always... So if nothing else this requires a
> careful documentation about correct use.
>
>                                                                 Honza

I think races against writeback and truncation could already exist
here prior to this patch. afaict from the function documentation for
__folio_mark_dirty(), it's up to the caller to prevent this:

 * It is the caller's responsibility to prevent the folio from being truncated
 * while this function is in progress, although it may have been truncated
 * before this function is called.  Most callers have the folio locked.
 * A few have the folio blocked from truncation through other means (e.g.
 * zap_vma_pages() has it mapped and is holding the page table lock).

The documentation doesn't mention anything about writeback but I think
it applies here similarly.

I'm happy to do this another way though if there's a better approach here.

Thanks,
Joanne

> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function
  2025-08-01  0:21 ` [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function Joanne Koong
@ 2025-08-04 16:26   ` Jeff Layton
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2025-08-04 16:26 UTC (permalink / raw)
  To: Joanne Koong, linux-mm, brauner
  Cc: willy, jack, hch, djwong, linux-fsdevel, kernel-team

On Thu, 2025-07-31 at 17:21 -0700, Joanne Koong wrote:
> Move logic for clearing dirty stats into a helper function
> both folio_account_cleaned() and __folio_clear_dirty_for_io() invoke.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  mm/page-writeback.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c1fec76ee869..f5916711db2d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2703,6 +2703,14 @@ static void folio_account_dirtied(struct folio *folio,
>  	}
>  }
>  
> +static void __clear_dirty_for_io_stats(struct folio *folio,
> +			struct bdi_writeback *wb, long nr_pages)
> +{
> +	lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr_pages);
> +	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr_pages);
> +	wb_stat_mod(wb, WB_RECLAIMABLE, -nr_pages);
> +}
> +
>  /*
>   * Helper function for deaccounting dirty page without writeback.
>   *
> @@ -2711,9 +2719,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
>  {
>  	long nr = folio_nr_pages(folio);
>  
> -	lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr);
> -	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
> -	wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
> +	__clear_dirty_for_io_stats(folio, wb, nr);
>  	task_io_account_cancelled_write(nr * PAGE_SIZE);
>  }
>  
> @@ -2977,14 +2983,9 @@ static bool __folio_clear_dirty_for_io(struct folio *folio, bool update_stats)
>  		 */
>  		wb = unlocked_inode_to_wb_begin(inode, &cookie);
>  		if (folio_test_clear_dirty(folio)) {
> -			if (update_stats) {
> -				long nr = folio_nr_pages(folio);
> -				lruvec_stat_mod_folio(folio, NR_FILE_DIRTY,
> -						      -nr);
> -				zone_stat_mod_folio(folio,
> -						    NR_ZONE_WRITE_PENDING, -nr);
> -				wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
> -			}
> +			if (update_stats)
> +				__clear_dirty_for_io_stats(folio, wb,
> +						folio_nr_pages(folio));
>  			ret = true;
>  		}
>  		unlocked_inode_to_wb_end(inode, &cookie);


This seems like a nice cleanup that isn't dependent on the rest of the
series.

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper
  2025-08-01  0:21 ` [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper Joanne Koong
@ 2025-08-12  8:03   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2025-08-12  8:03 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, djwong, linux-fsdevel,
	kernel-team

On Thu, Jul 31, 2025 at 05:21:24PM -0700, Joanne Koong wrote:
> -/**
> - * folio_end_writeback - End writeback against a folio.
> - * @folio: The folio.
> - *
> - * The folio must actually be under writeback.
> - *
> - * Context: May be called from process or interrupt context.
> - */
> -void folio_end_writeback(struct folio *folio)
> +void folio_end_writeback_pages(struct folio *folio, long nr_pages)

Please keep the kerneldoc comment for the now more complicated function.



* Re: [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper
  2025-08-01  0:21 ` [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper Joanne Koong
  2025-08-01 17:07   ` Jan Kara
@ 2025-08-12  8:05   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2025-08-12  8:05 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, djwong, linux-fsdevel,
	kernel-team

> +bool filemap_dirty_folio_pages(struct address_space *mapping, struct folio *folio,

Overly long line here.  Also this function would benefit from a
kerneldoc comment.



* Re: [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc
  2025-08-01  0:21 ` [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc Joanne Koong
@ 2025-08-12  8:06   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2025-08-12  8:06 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, djwong, linux-fsdevel,
	kernel-team

On Thu, Jul 31, 2025 at 05:21:28PM -0700, Joanne Koong wrote:
> Add a no_stats_accounting bitfield to wbc that callers can set. Hook
> this up to __folio_clear_dirty_for_io() when preparing writeback.

Please explain the use case for this a bit more in the commit log,
and maybe also the comments.



* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-01  0:21 ` [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting Joanne Koong
@ 2025-08-12  8:15   ` Christoph Hellwig
  2025-08-13  1:10     ` Joanne Koong
  2025-08-14 16:37   ` Darrick J. Wong
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2025-08-12  8:15 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, djwong, linux-fsdevel,
	kernel-team

> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index bcc6e0e5334e..626c3c8399cc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -20,6 +20,8 @@ struct iomap_folio_state {
>  	spinlock_t		state_lock;
>  	unsigned int		read_bytes_pending;
>  	atomic_t		write_bytes_pending;
> +	/* number of pages being currently written back */
> +	unsigned		nr_pages_writeback;

This adds more size to the folio state.  Shouldn't this be the same
as

    DIV_ROUND_UP(write_bytes_pending, PAGE_SIZE)

anyway?

> +	unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> +				i_blocks_per_folio(inode, folio));

Overly long line.  Also not sure why the cast is needed to start with?

> +	unsigned nblks = 0;
> +
> +	while (start_blk < end_blk) {
> +		if (ifs_block_is_dirty(folio, ifs, start_blk))
> +			nblks++;
> +		start_blk++;
> +	}

We have this pattern open coded in a few places.  Maybe factor it into a
helper first?  And then maybe someone smart can actually make it use
find_first_bit/find_next_bit.
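
For example, a minimal sketch of such a helper (assuming the two-bits-per-block
ifs->state layout, where block b's dirty bit lives at blks_per_folio + b):

static unsigned ifs_count_dirty_blocks(struct folio *folio,
		struct iomap_folio_state *ifs, unsigned start_blk,
		unsigned end_blk)
{
	unsigned blks = i_blocks_per_folio(folio->mapping->host, folio);
	unsigned nblks = 0, blk;

	/* walk only the set dirty bits instead of testing each block */
	for (blk = find_next_bit(ifs->state, blks + end_blk,
				 blks + start_blk);
	     blk < blks + end_blk;
	     blk = find_next_bit(ifs->state, blks + end_blk, blk + 1))
		nblks++;

	return nblks;
}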

> +static bool iomap_granular_dirty_pages(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode;
> +	unsigned block_size;
> +
> +	if (!ifs)
> +		return false;
> +
> +	inode = folio->mapping->host;
> +	block_size = 1 << inode->i_blkbits;
> +
> +	if (block_size >= PAGE_SIZE) {
> +		WARN_ON(block_size & (PAGE_SIZE - 1));
> +		return true;
> +	}
> +	return false;

Do we need the WARN_ON?  Both the block and page size must be powers
of two, so I can't see how it would trigger.  Also this can use the
i_blocksize helper.

I.e. just turn this into:

	return i_blocksize(folio->mapping->host) >= PAGE_SIZE;


> +static bool iomap_dirty_folio_range(struct address_space *mapping, struct folio *folio,

Overly long line.

> +	wpc->wbc->no_stats_accounting = true;

Who does the writeback accounting now?  Maybe throw in a comment if
iomap is now doing something different than all the other writeback
code.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-12  8:15   ` Christoph Hellwig
@ 2025-08-13  1:10     ` Joanne Koong
  2025-08-13 22:03       ` Joanne Koong
  0 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-13  1:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, brauner, willy, jack, djwong, linux-fsdevel,
	kernel-team

On Tue, Aug 12, 2025 at 1:15 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcc6e0e5334e..626c3c8399cc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -20,6 +20,8 @@ struct iomap_folio_state {
> >       spinlock_t              state_lock;
> >       unsigned int            read_bytes_pending;
> >       atomic_t                write_bytes_pending;
> > +     /* number of pages being currently written back */
> > +     unsigned                nr_pages_writeback;
>
> This adds more size to the folio state.  Shouldn't this be the same
> as
>
>     DIV_ROUND_UP(write_bytes_pending, PAGE_SIZE)
>
> anyway?

I don't think we can use write_bytes_pending because writeback for a
folio may be split into multiple requests (eg for fuse, if the ranges
are not contiguous). write_bytes_pending effectively acts as a
refcount: each request decrements it in iomap_finish_folio_write()
when it completes, so by the time the last writeback request for the
folio finishes and we call folio_end_writeback_pages(), the original
value has already been decremented away. For example, a folio with two
discontiguous dirty ranges written back as two requests ends up with
write_bytes_pending == 0 at the final completion, with no record of
the starting value.

I need to look more into whether readahead/read_folio and writeback
run concurrently, but if they don't, maybe read_bytes_pending and
write_bytes_pending could be consolidated.

>
> > +     unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> > +                             i_blocks_per_folio(inode, folio));
>
> Overly long line.  Also not sure why the cast is needed to start with?

The cast is needed to avoid the build error from comparing a loff_t
with an unsigned int in min(). I see there's a min_t() helper; I'll
use that instead.
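
Something like this untested variant:

	unsigned end_blk = min_t(unsigned,
				 i_size_read(inode) >> inode->i_blkbits,
				 i_blocks_per_folio(inode, folio));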

>
> > +     unsigned nblks = 0;
> > +
> > +     while (start_blk < end_blk) {
> > +             if (ifs_block_is_dirty(folio, ifs, start_blk))
> > +                     nblks++;
> > +             start_blk++;
> > +     }
>
> We have this pattern open coded in a few places.  Maybe factor it into a
> helper first?  And then maybe someone smart can actually make it use
> find_first_bit/find_next_bit.
>
> > +static bool iomap_granular_dirty_pages(struct folio *folio)
> > +{
> > +     struct iomap_folio_state *ifs = folio->private;
> > +     struct inode *inode;
> > +     unsigned block_size;
> > +
> > +     if (!ifs)
> > +             return false;
> > +
> > +     inode = folio->mapping->host;
> > +     block_size = 1 << inode->i_blkbits;
> > +
> > +     if (block_size >= PAGE_SIZE) {
> > +             WARN_ON(block_size & (PAGE_SIZE - 1));
> > +             return true;
> > +     }
> > +     return false;
>
> Do we need the WARN_ON?  Both the block and page size must be powers
> of two, so I can't see how it would trigger.  Also this can use the
> i_blocksize helper.

I'll get rid of the WARN_ON and will incorporate your i_blocksize
helper suggestion.

>
> I.e. just turn this into:
>
>         return i_blocksize(folio->mapping->host) >= PAGE_SIZE;
>
>
> > +static bool iomap_dirty_folio_range(struct address_space *mapping, struct folio *folio,
>
> Overly long line.

I'll fix up the long lines in the patchset, sorry.

>
> > +     wpc->wbc->no_stats_accounting = true;
>
> Who does the writeback accounting now?  Maybe throw in a comment if
> iomap is now doing something different than all the other writeback
> code.

iomap does the writeback accounting now, which happens in
iomap_update_dirty_stats(). I'll add a comment about that.
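
Roughly along these lines (exact wording TBD):

	/*
	 * iomap does granular dirty accounting itself in
	 * iomap_update_dirty_stats(), so tell the generic writeback code
	 * to skip its per-folio stats accounting to avoid double counting.
	 */
	wpc->wbc->no_stats_accounting = true;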


Thanks,
Joanne
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-13  1:10     ` Joanne Koong
@ 2025-08-13 22:03       ` Joanne Koong
  0 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-13 22:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, brauner, willy, jack, djwong, linux-fsdevel,
	kernel-team

On Tue, Aug 12, 2025 at 6:10 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> I need to look more into whether readahead/read_folio and writeback
> run concurrently, but if they don't, maybe read_bytes_pending and
> write_bytes_pending could be consolidated.

Nvm, that doesn't work. read_folio() can still get called for a folio
that's under writeback if it's not fully up to date.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-01  0:21 ` [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting Joanne Koong
  2025-08-12  8:15   ` Christoph Hellwig
@ 2025-08-14 16:37   ` Darrick J. Wong
  2025-08-15 18:38     ` Joanne Koong
  1 sibling, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2025-08-14 16:37 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-mm, brauner, willy, jack, hch, linux-fsdevel, kernel-team

On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> Add granular dirty and writeback accounting for large folios. These
> stats are used by the mm layer for dirty balancing and throttling.
> Having granular dirty and writeback accounting helps prevent
> over-aggressive balancing and throttling.
> 
> There are 4 places in iomap this commit affects:
> a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> b) writeback_iter with setting the wbc->no_stats_accounting bit and
> calling clear_dirty_for_io_stats()
> c) starting writeback, which now calls __folio_start_writeback()
> d) ending writeback, which now calls folio_end_writeback_pages()
> 
> This relies on using the ifs->state dirty bitmap to track dirty pages in
> the folio. As such, this can only be utilized on filesystems where the
> block size >= PAGE_SIZE.

Apologies for my slow responses this month. :)

I wonder, does this cause an observable change in the writeback
accounting and throttling behavior for non-fuse filesystems like XFS
that use large folios?  I *think* this does actually reduce throttling
for XFS, but it might not be so noticeable because the limits are much
more generous outside of fuse?

> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/iomap/buffered-io.c | 136 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 128 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index bcc6e0e5334e..626c3c8399cc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -20,6 +20,8 @@ struct iomap_folio_state {
>  	spinlock_t		state_lock;
>  	unsigned int		read_bytes_pending;
>  	atomic_t		write_bytes_pending;
> +	/* number of pages being currently written back */
> +	unsigned		nr_pages_writeback;
>  
>  	/*
>  	 * Each block has two bits in this bitmap:
> @@ -81,6 +83,25 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
>  	return test_bit(block + blks_per_folio, ifs->state);
>  }
>  
> +static unsigned ifs_count_dirty_pages(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode = folio->mapping->host;
> +	unsigned block_size = 1 << inode->i_blkbits;
> +	unsigned start_blk = 0;
> +	unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> +				i_blocks_per_folio(inode, folio));
> +	unsigned nblks = 0;
> +
> +	while (start_blk < end_blk) {
> +		if (ifs_block_is_dirty(folio, ifs, start_blk))
> +			nblks++;
> +		start_blk++;
> +	}

Hmm, isn't this bitmap_weight(ifs->state, blks_per_folio) ?

Ohh wait no, the dirty bitmap doesn't start on a byte boundary because
the format of the bitmap is [uptodate bits][dirty bits].

Maybe those two should be reversed, because I bet the dirty state gets
changed a lot more over the lifetime of a folio than the uptodate bits.
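
With the layout flipped to [dirty bits][uptodate bits], the counting
loop in ifs_count_dirty_pages() would reduce to a single call, roughly:

	nblks = bitmap_weight(ifs->state, end_blk);

since bitmap_weight() counts the set bits in [0, end_blk).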

> +
> +	return nblks * (block_size >> PAGE_SHIFT);
> +}
> +
>  static unsigned ifs_find_dirty_range(struct folio *folio,
>  		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
>  {
> @@ -165,6 +186,63 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
>  		ifs_set_range_dirty(folio, ifs, off, len);
>  }
>  
> +static long iomap_get_range_newly_dirtied(struct folio *folio, loff_t pos,
> +		unsigned len)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode = folio->mapping->host;
> +	unsigned start_blk = pos >> inode->i_blkbits;
> +	unsigned end_blk = min((unsigned)((pos + len - 1) >> inode->i_blkbits),
> +				i_blocks_per_folio(inode, folio) - 1);
> +	unsigned nblks = 0;
> +	unsigned block_size = 1 << inode->i_blkbits;
> +
> +	while (start_blk <= end_blk) {
> +		if (!ifs_block_is_dirty(folio, ifs, start_blk))
> +			nblks++;
> +		start_blk++;
> +	}

...then this becomes (endblk - startblk) - bitmap_weight().

> +
> +	return nblks * (block_size >> PAGE_SHIFT);
> +}
> +
> +static bool iomap_granular_dirty_pages(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode;
> +	unsigned block_size;
> +
> +	if (!ifs)
> +		return false;
> +
> +	inode = folio->mapping->host;
> +	block_size = 1 << inode->i_blkbits;
> +
> +	if (block_size >= PAGE_SIZE) {
> +		WARN_ON(block_size & (PAGE_SIZE - 1));
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static bool iomap_dirty_folio_range(struct address_space *mapping, struct folio *folio,
> +			loff_t pos, unsigned len)
> +{
> +	long nr_new_dirty_pages;
> +
> +	if (!iomap_granular_dirty_pages(folio)) {
> +		iomap_set_range_dirty(folio, pos, len);
> +		return filemap_dirty_folio(mapping, folio);
> +	}
> +
> +	nr_new_dirty_pages = iomap_get_range_newly_dirtied(folio, pos, len);
> +	if (!nr_new_dirty_pages)
> +		return false;
> +
> +	iomap_set_range_dirty(folio, pos, len);
> +	return filemap_dirty_folio_pages(mapping, folio, nr_new_dirty_pages);
> +}
> +
>  static struct iomap_folio_state *ifs_alloc(struct inode *inode,
>  		struct folio *folio, unsigned int flags)
>  {
> @@ -661,8 +739,7 @@ bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
>  	size_t len = folio_size(folio);
>  
>  	ifs_alloc(inode, folio, 0);
> -	iomap_set_range_dirty(folio, 0, len);
> -	return filemap_dirty_folio(mapping, folio);
> +	return iomap_dirty_folio_range(mapping, folio, 0, len);
>  }
>  EXPORT_SYMBOL_GPL(iomap_dirty_folio);
>  
> @@ -886,8 +963,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>  	if (unlikely(copied < len && !folio_test_uptodate(folio)))
>  		return false;
>  	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
> -	iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
> -	filemap_dirty_folio(inode->i_mapping, folio);
> +	iomap_dirty_folio_range(inode->i_mapping, folio,
> +			offset_in_folio(folio, pos), copied);
>  	return true;
>  }
>  
> @@ -1560,6 +1637,29 @@ void iomap_start_folio_write(struct inode *inode, struct folio *folio,
>  }
>  EXPORT_SYMBOL_GPL(iomap_start_folio_write);
>  
> +static void iomap_folio_start_writeback(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +
> +	if (!iomap_granular_dirty_pages(folio))
> +		return folio_start_writeback(folio);
> +
> +	__folio_start_writeback(folio, false, ifs->nr_pages_writeback);
> +}
> +
> +static void iomap_folio_end_writeback(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	long nr_pages_writeback;
> +
> +	if (!iomap_granular_dirty_pages(folio))
> +		return folio_end_writeback(folio);
> +
> +	nr_pages_writeback = ifs->nr_pages_writeback;
> +	ifs->nr_pages_writeback = 0;
> +	folio_end_writeback_pages(folio, nr_pages_writeback);
> +}
> +
>  void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
>  		size_t len)
>  {
> @@ -1569,7 +1669,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
>  	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
>  
>  	if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
> -		folio_end_writeback(folio);
> +		iomap_folio_end_writeback(folio);
>  }
>  EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
>  
> @@ -1657,6 +1757,21 @@ static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
>  	return true;
>  }
>  
> +static void iomap_update_dirty_stats(struct folio *folio)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	long nr_dirty_pages;
> +
> +	if (iomap_granular_dirty_pages(folio)) {
> +		nr_dirty_pages = ifs_count_dirty_pages(folio);
> +		ifs->nr_pages_writeback = nr_dirty_pages;
> +	} else {
> +		nr_dirty_pages = folio_nr_pages(folio);
> +	}
> +
> +	clear_dirty_for_io_stats(folio, nr_dirty_pages);
> +}
> +
>  int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
> @@ -1674,6 +1789,8 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  
>  	trace_iomap_writeback_folio(inode, pos, folio_size(folio));
>  
> +	iomap_update_dirty_stats(folio);
> +
>  	if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
>  		return 0;
>  	WARN_ON_ONCE(end_pos <= pos);
> @@ -1681,6 +1798,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	if (i_blocks_per_folio(inode, folio) > 1) {
>  		if (!ifs) {
>  			ifs = ifs_alloc(inode, folio, 0);
> +			ifs->nr_pages_writeback = folio_nr_pages(folio);
>  			iomap_set_range_dirty(folio, 0, end_pos - pos);
>  		}
>  
> @@ -1698,7 +1816,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	 * Set the writeback bit ASAP, as the I/O completion for the single
>  	 * block per folio case happens to hit as soon as we're submitting the bio.
>  	 */
> -	folio_start_writeback(folio);
> +	iomap_folio_start_writeback(folio);
>  
>  	/*
>  	 * Walk through the folio to find dirty areas to write back.
> @@ -1731,10 +1849,10 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	 */
>  	if (ifs) {
>  		if (atomic_dec_and_test(&ifs->write_bytes_pending))
> -			folio_end_writeback(folio);
> +			iomap_folio_end_writeback(folio);
>  	} else {
>  		if (!wb_pending)
> -			folio_end_writeback(folio);
> +			iomap_folio_end_writeback(folio);
>  	}
>  	mapping_set_error(inode->i_mapping, error);
>  	return error;
> @@ -1756,6 +1874,8 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
>  			PF_MEMALLOC))
>  		return -EIO;
>  
> +	wpc->wbc->no_stats_accounting = true;
> +
>  	while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error))) {
>  		error = iomap_writeback_folio(wpc, folio);
>  		folio_unlock(folio);
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-14 16:37   ` Darrick J. Wong
@ 2025-08-15 18:38     ` Joanne Koong
  2025-08-28  0:08       ` Joanne Koong
  0 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-15 18:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-mm, brauner, willy, jack, hch, linux-fsdevel, kernel-team

On Thu, Aug 14, 2025 at 9:38 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> > Add granular dirty and writeback accounting for large folios. These
> > stats are used by the mm layer for dirty balancing and throttling.
> > Having granular dirty and writeback accounting helps prevent
> > over-aggressive balancing and throttling.
> >
> > There are 4 places in iomap this commit affects:
> > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > b) writeback_iter with setting the wbc->no_stats_accounting bit and
> > calling clear_dirty_for_io_stats()
> > c) starting writeback, which now calls __folio_start_writeback()
> > d) ending writeback, which now calls folio_end_writeback_pages()
> >
> > This relies on using the ifs->state dirty bitmap to track dirty pages in
> > the folio. As such, this can only be utilized on filesystems where the
> > block size >= PAGE_SIZE.
>
> Apologies for my slow responses this month. :)

No worries at all, thanks for looking at this.
>
> I wonder, does this cause an observable change in the writeback
> accounting and throttling behavior for non-fuse filesystems like XFS
> that use large folios?  I *think* this does actually reduce throttling
> for XFS, but it might not be so noticeable because the limits are much
> more generous outside of fuse?

I haven't run any benchmarks on non-fuse filesystems yet but that's
what I would expect too. Will run some benchmarks to see!

>
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  fs/iomap/buffered-io.c | 136 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 128 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcc6e0e5334e..626c3c8399cc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -20,6 +20,8 @@ struct iomap_folio_state {
> >       spinlock_t              state_lock;
> >       unsigned int            read_bytes_pending;
> >       atomic_t                write_bytes_pending;
> > +     /* number of pages being currently written back */
> > +     unsigned                nr_pages_writeback;
> >
> >       /*
> >        * Each block has two bits in this bitmap:
> > @@ -81,6 +83,25 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
> >       return test_bit(block + blks_per_folio, ifs->state);
> >  }
> >
> > +static unsigned ifs_count_dirty_pages(struct folio *folio)
> > +{
> > +     struct iomap_folio_state *ifs = folio->private;
> > +     struct inode *inode = folio->mapping->host;
> > +     unsigned block_size = 1 << inode->i_blkbits;
> > +     unsigned start_blk = 0;
> > +     unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> > +                             i_blocks_per_folio(inode, folio));
> > +     unsigned nblks = 0;
> > +
> > +     while (start_blk < end_blk) {
> > +             if (ifs_block_is_dirty(folio, ifs, start_blk))
> > +                     nblks++;
> > +             start_blk++;
> > +     }
>
> Hmm, isn't this bitmap_weight(ifs->state, blks_per_folio) ?
>
> Ohh wait no, the dirty bitmap doesn't start on a byte boundary because
> the format of the bitmap is [uptodate bits][dirty bits].
>
> Maybe those two should be reversed, because I bet the dirty state gets
> changed a lot more over the lifetime of a folio than the uptodate bits.

I think the find_next_bit() helper (which Christoph also pointed out)
could probably be used here instead. At least that's what I see a lot
of driver code doing with its bitmaps.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-15 18:38     ` Joanne Koong
@ 2025-08-28  0:08       ` Joanne Koong
  2025-08-29 23:02         ` Joanne Koong
  0 siblings, 1 reply; 24+ messages in thread
From: Joanne Koong @ 2025-08-28  0:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-mm, brauner, willy, jack, hch, linux-fsdevel, kernel-team

On Fri, Aug 15, 2025 at 11:38 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 9:38 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> > > Add granular dirty and writeback accounting for large folios. These
> > > stats are used by the mm layer for dirty balancing and throttling.
> > > Having granular dirty and writeback accounting helps prevent
> > > over-aggressive balancing and throttling.
> > >
> > > There are 4 places in iomap this commit affects:
> > > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > > b) writeback_iter with setting the wbc->no_stats_accounting bit and
> > > calling clear_dirty_for_io_stats()
> > > c) starting writeback, which now calls __folio_start_writeback()
> > > d) ending writeback, which now calls folio_end_writeback_pages()
> > >
> > > This relies on using the ifs->state dirty bitmap to track dirty pages in
> > > the folio. As such, this can only be utilized on filesystems where the
> > > block size >= PAGE_SIZE.
> >
> > Apologies for my slow responses this month. :)
>
> No worries at all, thanks for looking at this.
> >
> > I wonder, does this cause an observable change in the writeback
> > accounting and throttling behavior for non-fuse filesystems like XFS
> > that use large folios?  I *think* this does actually reduce throttling
> > for XFS, but it might not be so noticeable because the limits are much
> > more generous outside of fuse?
>
> I haven't run any benchmarks on non-fuse filesystems yet but that's
> what I would expect too. Will run some benchmarks to see!

I ran some benchmarks on xfs for the contrived test case I used for
fuse (eg writing 2 GB in 128 MB chunks and then doing 50k 50-byte
random writes) and didn't see any noticeable performance difference.

I re-tested it on fuse, this time with strictlimit disabled, and
didn't notice any difference there either, probably because with
strictlimit off the test doesn't run into the upper limit, so there's
no extra throttling to mitigate.

It's unclear to me how often (if at all?) real workloads run up
against their dirty/writeback limits.

>
> >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  fs/iomap/buffered-io.c | 136 ++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 128 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index bcc6e0e5334e..626c3c8399cc 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -20,6 +20,8 @@ struct iomap_folio_state {
> > >       spinlock_t              state_lock;
> > >       unsigned int            read_bytes_pending;
> > >       atomic_t                write_bytes_pending;
> > > +     /* number of pages being currently written back */
> > > +     unsigned                nr_pages_writeback;
> > >
> > >       /*
> > >        * Each block has two bits in this bitmap:
> > > @@ -81,6 +83,25 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
> > >       return test_bit(block + blks_per_folio, ifs->state);
> > >  }
> > >
> > > +static unsigned ifs_count_dirty_pages(struct folio *folio)
> > > +{
> > > +     struct iomap_folio_state *ifs = folio->private;
> > > +     struct inode *inode = folio->mapping->host;
> > > +     unsigned block_size = 1 << inode->i_blkbits;
> > > +     unsigned start_blk = 0;
> > > +     unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> > > +                             i_blocks_per_folio(inode, folio));
> > > +     unsigned nblks = 0;
> > > +
> > > +     while (start_blk < end_blk) {
> > > +             if (ifs_block_is_dirty(folio, ifs, start_blk))
> > > +                     nblks++;
> > > +             start_blk++;
> > > +     }
> >
> > Hmm, isn't this bitmap_weight(ifs->state, blks_per_folio) ?
> >
> > Ohh wait no, the dirty bitmap doesn't start on a byte boundary because
> > the format of the bitmap is [uptodate bits][dirty bits].
> >
> > Maybe those two should be reversed, because I bet the dirty state gets
> > changed a lot more over the lifetime of a folio than the uptodate bits.
>
> I think there's the find_next_bit() helper (which Christoph also
> pointed out) that could probably be used here instead. Or at least
> that's how I see a lot of the driver code doing it for their bitmaps.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting
  2025-08-28  0:08       ` Joanne Koong
@ 2025-08-29 23:02         ` Joanne Koong
  0 siblings, 0 replies; 24+ messages in thread
From: Joanne Koong @ 2025-08-29 23:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-mm, brauner, willy, jack, hch, linux-fsdevel, kernel-team

On Wed, Aug 27, 2025 at 5:08 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, Aug 15, 2025 at 11:38 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Thu, Aug 14, 2025 at 9:38 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> > > > Add granular dirty and writeback accounting for large folios. These
> > > > stats are used by the mm layer for dirty balancing and throttling.
> > > > Having granular dirty and writeback accounting helps prevent
> > > > over-aggressive balancing and throttling.
> > > >
> > > > There are 4 places in iomap this commit affects:
> > > > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > > > b) writeback_iter with setting the wbc->no_stats_accounting bit and
> > > > calling clear_dirty_for_io_stats()
> > > > c) starting writeback, which now calls __folio_start_writeback()
> > > > d) ending writeback, which now calls folio_end_writeback_pages()
> > > >
> > > > This relies on using the ifs->state dirty bitmap to track dirty pages in
> > > > the folio. As such, this can only be utilized on filesystems where the
> > > > block size >= PAGE_SIZE.
> > >
> > > Apologies for my slow responses this month. :)
> >
> > No worries at all, thanks for looking at this.
> > >
> > > I wonder, does this cause an observable change in the writeback
> > > accounting and throttling behavior for non-fuse filesystems like XFS
> > > that use large folios?  I *think* this does actually reduce throttling
> > > for XFS, but it might not be so noticeable because the limits are much
> > > more generous outside of fuse?
> >
> > I haven't run any benchmarks on non-fuse filesystems yet but that's
> > what I would expect too. Will run some benchmarks to see!
>
> I ran some benchmarks on xfs for the contrived test case I used for
> fuse (eg writing 2 GB in 128 MB chunks and then doing 50k 50-byte
> random writes) and I don't see any noticeable performance difference.
>
> I re-tested it on fuse but this time with strictlimiting disabled and
> didn't notice any difference on that either, probably because with
> strictlimiting off we don't run into the upper limit in that test so
> there's no extra throttling that needs to be mitigated.
>
> It's unclear to me how often (if at all?) real workloads run up
> against their dirty/writeback limits.
>

I benchmarked it again today, this time manually setting
/proc/sys/vm/dirty_bytes to 20% of 16 GiB and
/proc/sys/vm/dirty_background_bytes to 10% of 16 GiB, and testing a
more intense workload (the original test scenario but on 10+
threads). Now I do see results on xfs: around 3 seconds (with some
variability, ranging from 0.3 to 5 seconds) for the writes prior to
this patchset vs. a pretty consistent 0.14 seconds with it. I ran the
test scenario a few times, but it'd be great if someone else could
also run it to verify it shows up on their system too.

I set up xfs by following the instructions in the xfstests readme:
    # xfs_io -f -c "falloc 0 10g" test.img
    # xfs_io -f -c "falloc 0 10g" scratch.img
    # mkfs.xfs test.img
    # losetup /dev/loop0 ./test.img
    # losetup /dev/loop1 ./scratch.img
    # mkdir -p /mnt/test && mount /dev/loop0 /mnt/test

and then ran:
sudo sysctl -w vm.dirty_bytes=$((3276 * 1024 * 1024))  # roughly 20% of 16 GiB
sudo sysctl -w vm.dirty_background_bytes=$((1638 * 1024 * 1024))  # roughly 10% of 16 GiB

and then ran this (AI-generated) test program: https://pastebin.com/CbcwTXjq

I'll send out an updated v2 of this series.


Thanks,
Joanne

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-08-29 23:03 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-01  0:21 [RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 01/10] mm: pass number of pages to __folio_start_writeback() Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 02/10] mm: pass number of pages to __folio_end_writeback() Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 03/10] mm: add folio_end_writeback_pages() helper Joanne Koong
2025-08-12  8:03   ` Christoph Hellwig
2025-08-01  0:21 ` [RFC PATCH v1 04/10] mm: pass number of pages dirtied to __folio_mark_dirty() Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 05/10] mm: add filemap_dirty_folio_pages() helper Joanne Koong
2025-08-01 17:07   ` Jan Kara
2025-08-01 21:47     ` Joanne Koong
2025-08-12  8:05   ` Christoph Hellwig
2025-08-01  0:21 ` [RFC PATCH v1 06/10] mm: add __folio_clear_dirty_for_io() helper Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 07/10] mm: add no_stats_accounting bitfield to wbc Joanne Koong
2025-08-12  8:06   ` Christoph Hellwig
2025-08-01  0:21 ` [RFC PATCH v1 08/10] mm: refactor clearing dirty stats into helper function Joanne Koong
2025-08-04 16:26   ` Jeff Layton
2025-08-01  0:21 ` [RFC PATCH v1 09/10] mm: add clear_dirty_for_io_stats() helper Joanne Koong
2025-08-01  0:21 ` [RFC PATCH v1 10/10] iomap: add granular dirty and writeback accounting Joanne Koong
2025-08-12  8:15   ` Christoph Hellwig
2025-08-13  1:10     ` Joanne Koong
2025-08-13 22:03       ` Joanne Koong
2025-08-14 16:37   ` Darrick J. Wong
2025-08-15 18:38     ` Joanne Koong
2025-08-28  0:08       ` Joanne Koong
2025-08-29 23:02         ` Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).