* Convert write_cache_pages() to an iterator v5
@ 2024-01-25 8:57 Christoph Hellwig
2024-01-25 8:57 ` [PATCH 01/19] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
` (18 more replies)
0 siblings, 19 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
Hi all,
this is an evolution of the series Matthew Wilcox originally sent in June
2023.  Based on comments from Jan and Brian this now actually untangles
some of the more confusing conditionals in the writeback code before
refactoring it into the iterator.
This version also adds a new RFC patch at the end that totally changes
the API again to a while loop, based on comments from Jan and some
ideas of my own.  Let me know if that is a good idea or not, and if yes
I'll fold the changes into the earlier patches.
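For readers following the API discussion, a userspace sketch of the
difference between the two calling conventions may help.  This is not the
kernel implementation; all names here (toy_ctx, toy_write_cache_pages,
toy_writeback_iter) are made up purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy context standing in for the dirty-folio lookup state.  The real
 * iterator keeps its state in struct writeback_control instead.
 */
struct toy_ctx {
	int *items;
	int nitems;
	int pos;		/* iteration state kept in the context */
};

/* Example callback for the callback-style interface below. */
static int toy_sum_cb(int item, void *data)
{
	*(int *)data += item;
	return 0;
}

/* Callback style, the shape write_cache_pages() has today. */
static int toy_write_cache_pages(struct toy_ctx *ctx,
				 int (*writepage)(int item, void *data),
				 void *data)
{
	for (int i = 0; i < ctx->nitems; i++) {
		int error = writepage(ctx->items[i], data);
		if (error)
			return error;
	}
	return 0;
}

/* Iterator style: the caller drives a plain while loop instead. */
static int *toy_writeback_iter(struct toy_ctx *ctx)
{
	if (ctx->pos >= ctx->nitems)
		return NULL;
	return &ctx->items[ctx->pos++];
}
```

With the iterator shape, a ->writepages implementation becomes an
ordinary `while ((item = toy_writeback_iter(&ctx)))` loop instead of a
callback, which is the trade-off the RFC patch at the end of the series
puts up for discussion.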
Changes since v4:
- added back the (rebased) iomap conversion now that the conflict is in
mainline
- add a new patch to change the iterator
Changes since v3:
- various commit log spelling fixes
- remove a statement from a commit log that isn't true any more with the
changes in v3
- rename a function
- merge two helpers
Diffstat:
fs/iomap/buffered-io.c | 10 -
include/linux/pagevec.h | 18 ++
include/linux/writeback.h | 12 +
mm/page-writeback.c | 344 ++++++++++++++++++++++++++--------------------
4 files changed, 231 insertions(+), 153 deletions(-)
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 01/19] writeback: fix done_index when hitting the wbc->nr_to_write
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 02/19] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
` (17 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
When write_cache_pages finishes writing out a folio, it fails to update
done_index to account for the number of pages in the folio just written.
That means when range_cyclic writeback is restarted, it will be
restarted at this folio instead of after it as it should. Fix that
by updating done_index before breaking out of the loop.
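To make the off-by-nr concrete: with large folios a single entry spans
several page indices, so the restart point must be folio->index + nr,
not folio->index.  A userspace sketch (restart_index() is a made-up
helper, the numbers are for illustration only):

```c
#include <assert.h>

/*
 * Model of the done_index fix: after writing out a folio that starts
 * at folio_index and covers nr_pages page indices, range_cyclic
 * writeback should resume at the first index past the folio.  The
 * buggy code effectively restarted at folio_index itself.
 */
static unsigned long restart_index(unsigned long folio_index,
				   unsigned long nr_pages)
{
	return folio_index + nr_pages;
}
```

For example, a 4-page folio at index 8 covers indices 8..11, so the next
round of range_cyclic writeback should start at 12 rather than
re-scanning the folio it just wrote.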
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index cd4e4ae77c40ae..ef2334291a7270 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2505,6 +2505,7 @@ int write_cache_pages(struct address_space *mapping,
* keep going until we have written all the pages
* we tagged for writeback prior to entering this loop.
*/
+ done_index = folio->index + nr;
wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
--
2.39.2
* [PATCH 02/19] writeback: also update wbc->nr_to_write on writeback failure
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
2024-01-25 8:57 ` [PATCH 01/19] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 03/19] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
` (16 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
When exiting write_cache_pages early due to a non-integrity write
failure, wbc->nr_to_write currently doesn't account for the folio
we just failed to write.  This doesn't matter because the callers
always ignore the value on a failure, but moving the update to
common code will allow us to simplify the code, so do it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef2334291a7270..0c1e9c016bc48f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
if (unlikely(error)) {
/*
* Handle errors according to the type of
@@ -2506,7 +2507,6 @@ int write_cache_pages(struct address_space *mapping,
* we tagged for writeback prior to entering this loop.
*/
done_index = folio->index + nr;
- wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
--
2.39.2
* [PATCH 03/19] writeback: rework the loop termination condition in write_cache_pages
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
2024-01-25 8:57 ` [PATCH 01/19] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
2024-01-25 8:57 ` [PATCH 02/19] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 04/19] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
` (15 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
Rework the way we deal with the cleanup after the writepage call.
First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
returns to get it out of the way of the actual error handling path.
Then merge the code that sets ret for integrity vs non-integrity
writeback.  For non-integrity writeback the loop is terminated on the
first error, so ret is still zero at that point.  Then use a single
block to check for non-integrity writeback to consolidate the cases
where it returns for either an error or for running off the end of
nr_to_write.
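The consolidated termination logic can be modelled in isolation.  This
is a userspace sketch with a made-up name (should_stop), not the patched
kernel code:

```c
#include <assert.h>

/*
 * Model of the consolidated loop-termination check: integrity
 * writeback (WB_SYNC_ALL) never stops early because the file system
 * may still have per-folio state to clear, while non-integrity
 * writeback stops on the first error or once nr_to_write is used up.
 */
static int should_stop(int integrity_sync, int err, long nr_to_write)
{
	if (integrity_sync)
		return 0;	/* must process every tagged folio */
	return err != 0 || nr_to_write <= 0;
}
```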
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 62 +++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 33 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0c1e9c016bc48f..259c37bc34d2a7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2474,43 +2474,39 @@ int write_cache_pages(struct address_space *mapping,
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- ret = error;
- done_index = folio->index + nr;
- done = 1;
- break;
- }
- if (!ret)
- ret = error;
+
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
+ * value. Eventually all instances should just unlock
+ * the folio themselves and return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
}
/*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
+ * For integrity sync we have to keep going until we
+ * have written all the folios we tagged for writeback
+ * prior to entering this loop, even if we run past
+ * wbc->nr_to_write or encounter errors. This is
+ * because the file system may still have state to clear
+ * for each folio. We'll eventually return the first
+ * error encountered.
+ *
+ * For background writeback just push done_index past
+ * this folio so that we can just restart where we left
+ * off and media errors won't choke writeout for the
+ * entire file.
*/
- done_index = folio->index + nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE) {
- done = 1;
- break;
+ if (error && !ret)
+ ret = error;
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (ret || wbc->nr_to_write <= 0) {
+ done_index = folio->index + nr;
+ done = 1;
+ break;
+ }
}
}
folio_batch_release(&fbatch);
--
2.39.2
* [PATCH 04/19] writeback: only update ->writeback_index for range_cyclic writeback
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (2 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 03/19] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 05/19] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
` (14 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
mapping->writeback_index is only used [1] as the starting point for
range_cyclic writeback, so there is no point in updating it for other
types of writeback.
[1] except for btrfs_defrag_file which does really odd things with
mapping->writeback_index. But btrfs doesn't use write_cache_pages at
all, so this isn't relevant here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 259c37bc34d2a7..437745a511c634 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2403,7 +2403,6 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t index;
pgoff_t end; /* Inclusive */
pgoff_t done_index;
- int range_whole = 0;
xa_mark_t tag;
folio_batch_init(&fbatch);
@@ -2413,8 +2412,6 @@ int write_cache_pages(struct address_space *mapping,
} else {
index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
- if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
- range_whole = 1;
}
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
tag_pages_for_writeback(mapping, index, end);
@@ -2514,14 +2511,21 @@ int write_cache_pages(struct address_space *mapping,
}
/*
- * If we hit the last page and there is more work to be done: wrap
- * back the index back to the start of the file for the next
- * time we are called.
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
*/
- if (wbc->range_cyclic && !done)
- done_index = 0;
- if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
- mapping->writeback_index = done_index;
+ if (wbc->range_cyclic) {
+ if (done)
+ mapping->writeback_index = done_index;
+ else
+ mapping->writeback_index = 0;
+ }
return ret;
}
--
2.39.2
* [PATCH 05/19] writeback: remove a duplicate prototype for tag_pages_for_writeback
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (3 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 04/19] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 06/19] writeback: Factor out writeback_finish() Christoph Hellwig
` (13 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: split from a larger patch]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 453736fd1d23ce..4b8cf9e4810bad 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -363,8 +363,6 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
-void tag_pages_for_writeback(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
--
2.39.2
* [PATCH 06/19] writeback: Factor out writeback_finish()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (4 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 05/19] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-29 20:13 ` Brian Foster
2024-01-25 8:57 ` [PATCH 07/19] writeback: Factor writeback_get_batch() out of write_cache_pages() Christoph Hellwig
` (12 subsequent siblings)
18 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Instead of having a 'done' variable that controls the nested loops,
add a writeback_finish() helper that performs the final cleanup so that
the loops can return directly.  This involves keeping more state in
writeback_control, but it's just moving stuff allocated on the stack
to being allocated slightly earlier on the stack.
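The effect can be sketched in userspace: once the loop state lives in
one control structure, every early exit can funnel through a single
cleanup helper.  The field and function names below mirror the patch,
but struct toy_wbc and toy_writeback_finish() are made-up stand-ins:

```c
#include <assert.h>

/* Stand-in for the writeback_control fields the helper consumes. */
struct toy_wbc {
	int range_cyclic;
	int err;		/* first error seen by the loop */
	long nr_to_write;
};

/*
 * Model of writeback_finish(): for range_cyclic writeback, remember
 * where we stopped if we exited early, otherwise wrap back to the
 * start of the file for the next invocation.
 */
static void toy_writeback_finish(struct toy_wbc *wbc,
				 unsigned long *writeback_index,
				 unsigned long done_index)
{
	if (!wbc->range_cyclic)
		return;
	if (wbc->err || wbc->nr_to_write <= 0)
		*writeback_index = done_index;
	else
		*writeback_index = 0;
}
```

Each early return in the loop then becomes a call to this helper
followed by a plain return, which is what allows the later patches to
collapse the nested loops.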
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: heavily rebased, reordered and commented struct writeback_control]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 6 +++
mm/page-writeback.c | 79 ++++++++++++++++++++-------------------
2 files changed, 47 insertions(+), 38 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4b8cf9e4810bad..7d60a68fa4ea47 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -11,6 +11,7 @@
#include <linux/flex_proportions.h>
#include <linux/backing-dev-defs.h>
#include <linux/blk_types.h>
+#include <linux/pagevec.h>
struct bio;
@@ -40,6 +41,7 @@ enum writeback_sync_modes {
* in a manner such that unspecified fields are set to zero.
*/
struct writeback_control {
+ /* public fields that can be set and/or consumed by the caller: */
long nr_to_write; /* Write this many pages, and decrement
this for each page written */
long pages_skipped; /* Pages which were not written */
@@ -77,6 +79,10 @@ struct writeback_control {
*/
struct swap_iocb **swap_plug;
+ /* internal fields used by the ->writepages implementation: */
+ struct folio_batch fbatch;
+ int err;
+
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
struct inode *inode; /* inode being written out */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 437745a511c634..fcd90a176d806c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,6 +2360,29 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
+static void writeback_finish(struct address_space *mapping,
+ struct writeback_control *wbc, pgoff_t done_index)
+{
+ folio_batch_release(&wbc->fbatch);
+
+ /*
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
+ */
+ if (wbc->range_cyclic) {
+ if (wbc->err || wbc->nr_to_write <= 0)
+ mapping->writeback_index = done_index;
+ else
+ mapping->writeback_index = 0;
+ }
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2395,17 +2418,12 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
- int ret = 0;
- int done = 0;
int error;
- struct folio_batch fbatch;
int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
- pgoff_t done_index;
xa_mark_t tag;
- folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
index = mapping->writeback_index; /* prev offset */
end = -1;
@@ -2419,22 +2437,23 @@ int write_cache_pages(struct address_space *mapping,
} else {
tag = PAGECACHE_TAG_DIRTY;
}
- done_index = index;
- while (!done && (index <= end)) {
+
+ folio_batch_init(&wbc->fbatch);
+ wbc->err = 0;
+
+ while (index <= end) {
int i;
nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &fbatch);
+ tag, &wbc->fbatch);
if (nr_folios == 0)
break;
for (i = 0; i < nr_folios; i++) {
- struct folio *folio = fbatch.folios[i];
+ struct folio *folio = wbc->fbatch.folios[i];
unsigned long nr;
- done_index = folio->index;
-
folio_lock(folio);
/*
@@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
folio_unlock(folio);
error = 0;
}
+
+ if (error && !wbc->err)
+ wbc->err = error;
/*
* For integrity sync we have to keep going until we
@@ -2496,38 +2518,19 @@ int write_cache_pages(struct address_space *mapping,
* off and media errors won't choke writeout for the
* entire file.
*/
- if (error && !ret)
- ret = error;
- if (wbc->sync_mode == WB_SYNC_NONE) {
- if (ret || wbc->nr_to_write <= 0) {
- done_index = folio->index + nr;
- done = 1;
- break;
- }
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc,
+ folio->index + nr);
+ return error;
}
}
- folio_batch_release(&fbatch);
+ folio_batch_release(&wbc->fbatch);
cond_resched();
}
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- if (wbc->range_cyclic) {
- if (done)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-
- return ret;
+ writeback_finish(mapping, wbc, 0);
+ return 0;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
* [PATCH 07/19] writeback: Factor writeback_get_batch() out of write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (5 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 06/19] writeback: Factor out writeback_finish() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 08/19] writeback: Factor folio_prepare_writeback() " Christoph Hellwig
` (11 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
This simple helper will be the basis of the writeback iterator.
To make this work, we need to remember the current index
and end positions in writeback_control.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: heavily rebased, add helpers to get the tag and end index,
don't keep the end index in struct writeback_control]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 1 +
mm/page-writeback.c | 49 +++++++++++++++++++++++++--------------
2 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 7d60a68fa4ea47..a091817a5dba55 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -81,6 +81,7 @@ struct writeback_control {
/* internal fields used by the ->writepages implementation: */
struct folio_batch fbatch;
+ pgoff_t index;
int err;
#ifdef CONFIG_CGROUP_WRITEBACK
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fcd90a176d806c..426d929741e46e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2383,6 +2383,29 @@ static void writeback_finish(struct address_space *mapping,
}
}
+static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
+{
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ return PAGECACHE_TAG_TOWRITE;
+ return PAGECACHE_TAG_DIRTY;
+}
+
+static pgoff_t wbc_end(struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ return -1;
+ return wbc->range_end >> PAGE_SHIFT;
+}
+
+static void writeback_get_batch(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2419,38 +2442,30 @@ int write_cache_pages(struct address_space *mapping,
void *data)
{
int error;
- int nr_folios;
- pgoff_t index;
pgoff_t end; /* Inclusive */
- xa_mark_t tag;
if (wbc->range_cyclic) {
- index = mapping->writeback_index; /* prev offset */
+ wbc->index = mapping->writeback_index; /* prev offset */
end = -1;
} else {
- index = wbc->range_start >> PAGE_SHIFT;
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
}
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
- tag_pages_for_writeback(mapping, index, end);
- tag = PAGECACHE_TAG_TOWRITE;
- } else {
- tag = PAGECACHE_TAG_DIRTY;
- }
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, end);
folio_batch_init(&wbc->fbatch);
wbc->err = 0;
- while (index <= end) {
+ while (wbc->index <= end) {
int i;
- nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &wbc->fbatch);
+ writeback_get_batch(mapping, wbc);
- if (nr_folios == 0)
+ if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < nr_folios; i++) {
+ for (i = 0; i < wbc->fbatch.nr; i++) {
struct folio *folio = wbc->fbatch.folios[i];
unsigned long nr;
@@ -2525,8 +2540,6 @@ int write_cache_pages(struct address_space *mapping,
return error;
}
}
- folio_batch_release(&wbc->fbatch);
- cond_resched();
}
writeback_finish(mapping, wbc, 0);
--
2.39.2
* [PATCH 08/19] writeback: Factor folio_prepare_writeback() out of write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (6 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 07/19] writeback: Factor writeback_get_batch() out of write_cache_pages() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 09/19] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
` (10 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Reduce write_cache_pages() by about 30 lines; much of it is commentary,
but it all bundles nicely into an obvious function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: rename should_writeback_folio to folio_prepare_writeback based on
a comment from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 61 +++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 426d929741e46e..cec683c7217d2e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2397,6 +2397,38 @@ static pgoff_t wbc_end(struct writeback_control *wbc)
return wbc->range_end >> PAGE_SHIFT;
}
+static bool folio_prepare_writeback(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ /*
+ * Folio truncated or invalidated. We can freely skip it then,
+ * even for data integrity operations: the folio has disappeared
+ * concurrently, so there could be no real expectation of this
+ * data integrity operation even if there is now a new, dirty
+ * folio at the same pagecache index.
+ */
+ if (unlikely(folio->mapping != mapping))
+ return false;
+
+ /*
+ * Did somebody else write it for us?
+ */
+ if (!folio_test_dirty(folio))
+ return false;
+
+ if (folio_test_writeback(folio)) {
+ if (wbc->sync_mode == WB_SYNC_NONE)
+ return false;
+ folio_wait_writeback(folio);
+ }
+ BUG_ON(folio_test_writeback(folio));
+
+ if (!folio_clear_dirty_for_io(folio))
+ return false;
+
+ return true;
+}
+
static void writeback_get_batch(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -2470,38 +2502,13 @@ int write_cache_pages(struct address_space *mapping,
unsigned long nr;
folio_lock(folio);
-
- /*
- * Page truncated or invalidated. We can freely skip it
- * then, even for data integrity operations: the page
- * has disappeared concurrently, so there could be no
- * real expectation of this data integrity operation
- * even if there is now a new, dirty page at the same
- * pagecache address.
- */
- if (unlikely(folio->mapping != mapping)) {
-continue_unlock:
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
continue;
}
- if (!folio_test_dirty(folio)) {
- /* someone wrote it for us */
- goto continue_unlock;
- }
-
- if (folio_test_writeback(folio)) {
- if (wbc->sync_mode != WB_SYNC_NONE)
- folio_wait_writeback(folio);
- else
- goto continue_unlock;
- }
-
- BUG_ON(folio_test_writeback(folio));
- if (!folio_clear_dirty_for_io(folio))
- goto continue_unlock;
-
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
--
2.39.2
* [PATCH 09/19] writeback: Simplify the loops in write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (7 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 08/19] writeback: Factor folio_prepare_writeback() " Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 10/19] pagevec: Add ability to iterate a queue Christoph Hellwig
` (9 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Collapse the two nested loops into one. This is needed as a step
towards turning this into an iterator.
Note that this drops the "index <= end" check from the previous outer
loop and just relies on filemap_get_folios_tag() to return 0 entries
when index > end.  This actually has a subtle implication when end == -1,
because then the returned index will be -1 as well, and thus if there is
a page present at index -1 we could be looping indefinitely.  But as the
comment in filemap_get_folios_tag() documents, this is already broken
anyway, so we should not worry about it here either.  The fix for that
would probably be a change to the filemap_get_folios_tag() calling
convention.
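The wraparound in question is ordinary C unsigned arithmetic, which a
short userspace check makes concrete (pgoff_t is an unsigned long;
next_index() is a made-up helper for illustration):

```c
#include <assert.h>

/*
 * When end == -1 (ULONG_MAX as a pgoff_t) and a page exists at the
 * maximum index, advancing the lookup index past that page wraps to 0,
 * so an "index <= end" loop condition can never become false and the
 * scan starts over from the beginning of the file.
 */
static unsigned long next_index(unsigned long last_returned)
{
	return last_returned + 1;	/* wraps at ULONG_MAX */
}
```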
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: updated the commit log based on feedback from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 94 ++++++++++++++++++++++-----------------------
1 file changed, 46 insertions(+), 48 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index cec683c7217d2e..d6ac414ddce9ca 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2475,6 +2475,7 @@ int write_cache_pages(struct address_space *mapping,
{
int error;
pgoff_t end; /* Inclusive */
+ int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2489,63 +2490,60 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
wbc->err = 0;
- while (wbc->index <= end) {
- int i;
-
- writeback_get_batch(mapping, wbc);
+ for (;;) {
+ struct folio *folio;
+ unsigned long nr;
+ if (i == wbc->fbatch.nr) {
+ writeback_get_batch(mapping, wbc);
+ i = 0;
+ }
if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < wbc->fbatch.nr; i++) {
- struct folio *folio = wbc->fbatch.folios[i];
- unsigned long nr;
+ folio = wbc->fbatch.folios[i++];
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
+ folio_lock(folio);
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
+ folio_unlock(folio);
+ continue;
+ }
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
+ error = writepage(folio, wbc, data);
+ nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
- * value. Eventually all instances should just unlock
- * the folio themselves and return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
-
- if (error && !wbc->err)
- wbc->err = error;
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
+ * Eventually all instances should just unlock the folio
+ * themselves and return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
- /*
- * For integrity sync we have to keep going until we
- * have written all the folios we tagged for writeback
- * prior to entering this loop, even if we run past
- * wbc->nr_to_write or encounter errors. This is
- * because the file system may still have state to clear
- * for each folio. We'll eventually return the first
- * error encountered.
- *
- * For background writeback just push done_index past
- * this folio so that we can just restart where we left
- * off and media errors won't choke writeout for the
- * entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc,
- folio->index + nr);
- return error;
- }
+ if (error && !wbc->err)
+ wbc->err = error;
+
+ /*
+ * For integrity sync we have to keep going until we have
+ * written all the folios we tagged for writeback prior to
+ * entering this loop, even if we run past wbc->nr_to_write or
+ * encounter errors. This is because the file system may still
+ * have state to clear for each folio. We'll eventually return
+ * the first error encountered.
+ *
+ * For background writeback just push done_index past this folio
+ * so that we can just restart where we left off and media
+ * errors won't choke writeout for the entire file.
+ */
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc, folio->index + nr);
+ return error;
}
}
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 10/19] pagevec: Add ability to iterate a queue
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (8 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 09/19] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 11/19] writeback: Use the folio_batch queue iterator Christoph Hellwig
` (8 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Add a loop counter inside the folio_batch to let us iterate from 0 to nr
instead of decrementing nr and treating the batch as a stack. Iterating
over the batch as a stack would generate some very weird and suboptimal
I/O patterns for page writeback.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/pagevec.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 87cc678adc850b..fcc06c300a72c3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,6 +27,7 @@ struct folio;
*/
struct folio_batch {
unsigned char nr;
+ unsigned char i;
bool percpu_pvec_drained;
struct folio *folios[PAGEVEC_SIZE];
};
@@ -40,12 +41,14 @@ struct folio_batch {
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
fbatch->percpu_pvec_drained = false;
}
static inline void folio_batch_reinit(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
}
static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
}
+/**
+ * folio_batch_next - Return the next folio to process.
+ * @fbatch: The folio batch being processed.
+ *
+ * Use this function to implement a queue of folios.
+ *
+ * Return: The next folio in the queue, or NULL if the queue is empty.
+ */
+static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
+{
+ if (fbatch->i == fbatch->nr)
+ return NULL;
+ return fbatch->folios[fbatch->i++];
+}
+
void __folio_batch_release(struct folio_batch *pvec);
static inline void folio_batch_release(struct folio_batch *fbatch)
--
2.39.2
* [PATCH 11/19] writeback: Use the folio_batch queue iterator
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (9 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 10/19] pagevec: Add ability to iterate a queue Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 12/19] writeback: Factor writeback_iter_init() out of write_cache_pages() Christoph Hellwig
` (7 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Instead of keeping our own local iterator variable, use the one just
added to folio_batch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d6ac414ddce9ca..432bb42d0829d1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2429,13 +2429,21 @@ static bool folio_prepare_writeback(struct address_space *mapping,
return true;
}
-static void writeback_get_batch(struct address_space *mapping,
+static struct folio *writeback_get_folio(struct address_space *mapping,
struct writeback_control *wbc)
{
- folio_batch_release(&wbc->fbatch);
- cond_resched();
- filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
- wbc_to_tag(wbc), &wbc->fbatch);
+ struct folio *folio;
+
+ folio = folio_batch_next(&wbc->fbatch);
+ if (!folio) {
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+ folio = folio_batch_next(&wbc->fbatch);
+ }
+
+ return folio;
}
/**
@@ -2475,7 +2483,6 @@ int write_cache_pages(struct address_space *mapping,
{
int error;
pgoff_t end; /* Inclusive */
- int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2491,18 +2498,12 @@ int write_cache_pages(struct address_space *mapping,
wbc->err = 0;
for (;;) {
- struct folio *folio;
+ struct folio *folio = writeback_get_folio(mapping, wbc);
unsigned long nr;
- if (i == wbc->fbatch.nr) {
- writeback_get_batch(mapping, wbc);
- i = 0;
- }
- if (wbc->fbatch.nr == 0)
+ if (!folio)
break;
- folio = wbc->fbatch.folios[i++];
-
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
* [PATCH 12/19] writeback: Factor writeback_iter_init() out of write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (10 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 11/19] writeback: Use the folio_batch queue iterator Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 13/19] writeback: Move the folio_prepare_writeback loop " Christoph Hellwig
` (6 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Make it return the first folio in the batch so that we can use it
in a typical for() pattern.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 39 ++++++++++++++++++++-------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 432bb42d0829d1..ae9f659e6796ac 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2446,6 +2446,22 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}
+static struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index; /* prev offset */
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
+
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
+
+ wbc->err = 0;
+ folio_batch_init(&wbc->fbatch);
+ return writeback_get_folio(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2481,29 +2497,14 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
+ struct folio *folio;
int error;
- pgoff_t end; /* Inclusive */
- if (wbc->range_cyclic) {
- wbc->index = mapping->writeback_index; /* prev offset */
- end = -1;
- } else {
- wbc->index = wbc->range_start >> PAGE_SHIFT;
- end = wbc->range_end >> PAGE_SHIFT;
- }
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, end);
-
- folio_batch_init(&wbc->fbatch);
- wbc->err = 0;
-
- for (;;) {
- struct folio *folio = writeback_get_folio(mapping, wbc);
+ for (folio = writeback_iter_init(mapping, wbc);
+ folio;
+ folio = writeback_get_folio(mapping, wbc)) {
unsigned long nr;
- if (!folio)
- break;
-
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
* [PATCH 13/19] writeback: Move the folio_prepare_writeback loop out of write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (11 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 12/19] writeback: Factor writeback_iter_init() out of write_cache_pages() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 14/19] writeback: Factor writeback_iter_next() " Christoph Hellwig
` (5 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Move the loop for should-we-write-this-folio to writeback_get_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: folded the loop into the existing helper instead of a separate one
as suggested by Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ae9f659e6796ac..d170dab07402ce 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2434,6 +2434,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
{
struct folio *folio;
+retry:
folio = folio_batch_next(&wbc->fbatch);
if (!folio) {
folio_batch_release(&wbc->fbatch);
@@ -2441,8 +2442,17 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
+ if (!folio)
+ return NULL;
}
+ folio_lock(folio);
+ if (unlikely(!folio_prepare_writeback(mapping, wbc, folio))) {
+ folio_unlock(folio);
+ goto retry;
+ }
+
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
return folio;
}
@@ -2505,14 +2515,6 @@ int write_cache_pages(struct address_space *mapping,
folio = writeback_get_folio(mapping, wbc)) {
unsigned long nr;
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
-
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
-
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
--
2.39.2
* [PATCH 14/19] writeback: Factor writeback_iter_next() out of write_cache_pages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (12 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 13/19] writeback: Move the folio_prepare_writeback loop " Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 15/19] writeback: Add for_each_writeback_folio() Christoph Hellwig
` (4 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Pull the post-processing of the writepage_t callback into a separate
function. That means changing writeback_get_folio() to call
writeback_finish() when we naturally run out of folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 85 ++++++++++++++++++++++++---------------------
1 file changed, 45 insertions(+), 40 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d170dab07402ce..d5815237fbec29 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2442,8 +2442,10 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
- if (!folio)
+ if (!folio) {
+ writeback_finish(mapping, wbc, 0);
return NULL;
+ }
}
folio_lock(folio);
@@ -2472,6 +2474,46 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}
+static struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error)
+{
+ unsigned long nr = folio_nr_pages(folio);
+
+ wbc->nr_to_write -= nr;
+
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
+ * Eventually all instances should just unlock the folio themselves and
+ * return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+
+ if (error && !wbc->err)
+ wbc->err = error;
+
+ /*
+ * For integrity sync we have to keep going until we have written all
+ * the folios we tagged for writeback prior to entering the writeback
+ * loop, even if we run past wbc->nr_to_write or encounter errors.
+ * This is because the file system may still have state to clear for
+ * each folio. We'll eventually return the first error encountered.
+ *
+ * For background writeback just push done_index past this folio so that
+ * we can just restart where we left off and media errors won't choke
+ * writeout for the entire file.
+ */
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc, folio->index + nr);
+ return NULL;
+ }
+
+ return writeback_get_folio(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2512,47 +2554,10 @@ int write_cache_pages(struct address_space *mapping,
for (folio = writeback_iter_init(mapping, wbc);
folio;
- folio = writeback_get_folio(mapping, wbc)) {
- unsigned long nr;
-
+ folio = writeback_iter_next(mapping, wbc, folio, error))
error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
-
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
- * Eventually all instances should just unlock the folio
- * themselves and return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
-
- if (error && !wbc->err)
- wbc->err = error;
- /*
- * For integrity sync we have to keep going until we have
- * written all the folios we tagged for writeback prior to
- * entering this loop, even if we run past wbc->nr_to_write or
- * encounter errors. This is because the file system may still
- * have state to clear for each folio. We'll eventually return
- * the first error encountered.
- *
- * For background writeback just push done_index past this folio
- * so that we can just restart where we left off and media
- * errors won't choke writeout for the entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc, folio->index + nr);
- return error;
- }
- }
-
- writeback_finish(mapping, wbc, 0);
- return 0;
+ return wbc->err;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
* [PATCH 15/19] writeback: Add for_each_writeback_folio()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (13 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 14/19] writeback: Factor writeback_iter_next() " Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 16/19] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
` (3 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Wrap up the iterator with a nice bit of syntactic sugar. Now the
caller doesn't need to know about wbc->err and can just return the
error, while the iterator takes care of storing errors correctly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 10 ++++++++++
mm/page-writeback.c | 8 +++-----
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a091817a5dba55..2416da933440e2 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -367,6 +367,16 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
+struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc);
+struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error);
+
+#define for_each_writeback_folio(mapping, wbc, folio, error) \
+ for (folio = writeback_iter_init(mapping, wbc); \
+ folio || ((error = wbc->err), false); \
+ folio = writeback_iter_next(mapping, wbc, folio, error))
+
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d5815237fbec29..aca0f43021a20c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2458,7 +2458,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}
-static struct folio *writeback_iter_init(struct address_space *mapping,
+struct folio *writeback_iter_init(struct address_space *mapping,
struct writeback_control *wbc)
{
if (wbc->range_cyclic)
@@ -2474,7 +2474,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}
-static struct folio *writeback_iter_next(struct address_space *mapping,
+struct folio *writeback_iter_next(struct address_space *mapping,
struct writeback_control *wbc, struct folio *folio, int error)
{
unsigned long nr = folio_nr_pages(folio);
@@ -2552,9 +2552,7 @@ int write_cache_pages(struct address_space *mapping,
struct folio *folio;
int error;
- for (folio = writeback_iter_init(mapping, wbc);
- folio;
- folio = writeback_iter_next(mapping, wbc, folio, error))
+ for_each_writeback_folio(mapping, wbc, folio, error)
error = writepage(folio, wbc, data);
return wbc->err;
--
2.39.2
* [PATCH 16/19] writeback: Remove a use of write_cache_pages() from do_writepages()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (14 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 15/19] writeback: Add for_each_writeback_folio() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 17/19] writeback: update the kerneldoc comment for tag_pages_for_writeback Christoph Hellwig
` (2 subsequent siblings)
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Use the new for_each_writeback_folio() directly instead of indirecting
through a callback.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index aca0f43021a20c..81034d5d72e1f4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2559,13 +2559,21 @@ int write_cache_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(write_cache_pages);
-static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
- void *data)
+static int writeback_use_writepage(struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(&folio->page, wbc);
- mapping_set_error(mapping, ret);
- return ret;
+ struct blk_plug plug;
+ struct folio *folio;
+ int err;
+
+ blk_start_plug(&plug);
+ for_each_writeback_folio(mapping, wbc, folio, err) {
+ err = mapping->a_ops->writepage(&folio->page, wbc);
+ mapping_set_error(mapping, err);
+ }
+ blk_finish_plug(&plug);
+
+ return err;
}
int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
@@ -2581,12 +2589,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
if (mapping->a_ops->writepages) {
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
- ret = write_cache_pages(mapping, wbc, writepage_cb,
- mapping);
- blk_finish_plug(&plug);
+ ret = writeback_use_writepage(mapping, wbc);
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2
* [PATCH 17/19] writeback: update the kerneldoc comment for tag_pages_for_writeback
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (15 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 16/19] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-25 8:57 ` [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 19/19] writeback: simplify writeback iteration Christoph Hellwig
18 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Dave Chinner, Jan Kara
Don't refer to write_cache_pages(), which is now just a wrapper for the
writeback iterator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81034d5d72e1f4..2a4b5aee5decd9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
}
/**
- * tag_pages_for_writeback - tag pages to be written by write_cache_pages
+ * tag_pages_for_writeback - tag pages to be written by writeback
* @mapping: address space structure to write
* @start: starting page index
* @end: ending page index (inclusive)
*
* This function scans the page range from @start to @end (inclusive) and tags
- * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
- * that write_cache_pages (or whoever calls this function) will then use
- * TOWRITE tag to identify pages eligible for writeback. This mechanism is
- * used to avoid livelocking of writeback by a process steadily creating new
- * dirty pages in the file (thus it is important for this function to be quick
- * so that it can tag pages faster than a dirtying process can create them).
+ * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
+ * can then use the TOWRITE tag to identify pages eligible for writeback.
+ * This mechanism is used to avoid livelocking of writeback by a process
+ * steadily creating new dirty pages in the file (thus it is important for this
+ * function to be quick so that it can tag pages faster than a dirtying process
+ * can create them).
*/
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end)
--
2.39.2
* [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio()
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (16 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 17/19] writeback: update the kerneldoc comment for tag_pages_for_writeback Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-30 10:03 ` Jan Kara
2024-01-25 8:57 ` [PATCH 19/19] writeback: simplify writeback iteration Christoph Hellwig
18 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
From: Matthew Wilcox <willy@infradead.org>
This removes one indirect function call per folio, and adds type safety
by not casting through a void pointer.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 093c4515b22a53..58b3661f5eac9e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1887,9 +1887,8 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
* regular allocated space.
*/
static int iomap_do_writepage(struct folio *folio,
- struct writeback_control *wbc, void *data)
+ struct writeback_control *wbc, struct iomap_writepage_ctx *wpc)
{
- struct iomap_writepage_ctx *wpc = data;
struct inode *inode = folio->mapping->host;
u64 end_pos, isize;
@@ -1986,10 +1985,12 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops *ops)
{
- int ret;
+ struct folio *folio;
+ int ret;
wpc->ops = ops;
- ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
+ for_each_writeback_folio(mapping, wbc, folio, ret)
+ ret = iomap_do_writepage(folio, wbc, wpc);
if (!wpc->ioend)
return ret;
return iomap_submit_ioend(wpc, wpc->ioend, ret);
--
2.39.2
* [PATCH 19/19] writeback: simplify writeback iteration
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
` (17 preceding siblings ...)
2024-01-25 8:57 ` [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio() Christoph Hellwig
@ 2024-01-25 8:57 ` Christoph Hellwig
2024-01-30 10:46 ` Jan Kara
18 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-25 8:57 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
Based on the feedback from Jan I've tried to figure out how to avoid
the error magic in for_each_writeback_folio(). This patch implements
that by switching to an open-coded while loop around a single
writeback_iter() function:
while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
...
}
The twist here is that the error value is passed by reference, so that
the iterator can restore it when breaking out of the loop.
Additionally this moves the AOP_WRITEPAGE_ACTIVATE handling out of the
iterator and into the callers, in preparation for eventually killing
it off together with the phase-out of write_cache_pages().
To me this form of the loop feels easier to follow, and it has the
added advantage that writeback_iter() can be nicely used in nested
loops, which should help with further iterator conversion of the
iomap writeback code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 7 +-
include/linux/writeback.h | 11 +--
mm/page-writeback.c | 174 +++++++++++++++++++++-----------------
3 files changed, 102 insertions(+), 90 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58b3661f5eac9e..1593a783176ca2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1985,12 +1985,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops *ops)
{
- struct folio *folio;
- int ret;
+ struct folio *folio = NULL;
+ int ret = 0;
wpc->ops = ops;
- for_each_writeback_folio(mapping, wbc, folio, ret)
+ while ((folio = writeback_iter(mapping, wbc, folio, &ret)))
ret = iomap_do_writepage(folio, wbc, wpc);
+
if (!wpc->ioend)
return ret;
return iomap_submit_ioend(wpc, wpc->ioend, ret);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 2416da933440e2..fc4605627496fc 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -367,15 +367,8 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
-struct folio *writeback_iter_init(struct address_space *mapping,
- struct writeback_control *wbc);
-struct folio *writeback_iter_next(struct address_space *mapping,
- struct writeback_control *wbc, struct folio *folio, int error);
-
-#define for_each_writeback_folio(mapping, wbc, folio, error) \
- for (folio = writeback_iter_init(mapping, wbc); \
- folio || ((error = wbc->err), false); \
- folio = writeback_iter_next(mapping, wbc, folio, error))
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2a4b5aee5decd9..9e1cce9be63524 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,29 +2360,6 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
-static void writeback_finish(struct address_space *mapping,
- struct writeback_control *wbc, pgoff_t done_index)
-{
- folio_batch_release(&wbc->fbatch);
-
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- if (wbc->range_cyclic) {
- if (wbc->err || wbc->nr_to_write <= 0)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-}
-
static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
{
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
@@ -2442,10 +2419,8 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
- if (!folio) {
- writeback_finish(mapping, wbc, 0);
+ if (!folio)
return NULL;
- }
}
folio_lock(folio);
@@ -2458,60 +2433,92 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}
-struct folio *writeback_iter_init(struct address_space *mapping,
- struct writeback_control *wbc)
-{
- if (wbc->range_cyclic)
- wbc->index = mapping->writeback_index; /* prev offset */
- else
- wbc->index = wbc->range_start >> PAGE_SHIFT;
-
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
-
- wbc->err = 0;
- folio_batch_init(&wbc->fbatch);
- return writeback_get_folio(mapping, wbc);
-}
-
-struct folio *writeback_iter_next(struct address_space *mapping,
- struct writeback_control *wbc, struct folio *folio, int error)
+/**
+ * writeback_iter - iterate folios of a mapping for writeback
+ * @mapping: address space structure to write
+ * @wbc: writeback context
+ * @folio: previously iterated folio (%NULL to start)
+ * @error: in-out pointer for writeback errors (see below)
+ *
+ * This function should be called in a while loop in the ->writepages
+ * implementation and returns the next folio for the writeback operation
+ * described by @wbc on @mapping.
+ *
+ * To start writeback, @folio should be passed as %NULL; on every following
+ * iteration, the folio returned by the previous call should be passed back.
+ * @error should contain the error from the previous writeback iteration when
+ * calling writeback_iter().
+ *
+ * Once the writeback described in @wbc has finished, this function returns
+ * %NULL and, if any iteration failed, stores the first error in @error.
+ *
+ * Note: callers should not manually break out of the loop using break or goto.
+ */
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error)
{
- unsigned long nr = folio_nr_pages(folio);
+ if (folio) {
+ wbc->nr_to_write -= folio_nr_pages(folio);
+ if (*error && !wbc->err)
+ wbc->err = *error;
- wbc->nr_to_write -= nr;
-
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
- * Eventually all instances should just unlock the folio themselves and
- * return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
+ /*
+ * For integrity sync we have to keep going until we have
+ * written all the folios we tagged for writeback prior to
+ * entering the writeback loop, even if we run past
+ * wbc->nr_to_write or encounter errors.
+ *
+ * This is because the file system may still have state to clear
+ * for each folio. We'll eventually return the first error
+ * encountered.
+ *
+ * For background writeback just push done_index past this folio
+ * so that we can just restart where we left off and media
+ * errors won't choke writeout for the entire file.
+ */
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0))
+ goto finish;
+ } else {
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index; /* prev offset */
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index,
+ wbc_end(wbc));
+ folio_batch_init(&wbc->fbatch);
+ wbc->err = 0;
}
- if (error && !wbc->err)
- wbc->err = error;
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio)
+ goto finish;
+ return folio;
+
+finish:
+ folio_batch_release(&wbc->fbatch);
/*
- * For integrity sync we have to keep going until we have written all
- * the folios we tagged for writeback prior to entering the writeback
- * loop, even if we run past wbc->nr_to_write or encounter errors.
- * This is because the file system may still have state to clear for
- * each folio. We'll eventually return the first error encountered.
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
*
- * For background writeback just push done_index past this folio so that
- * we can just restart where we left off and media errors won't choke
- * writeout for the entire file.
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
*/
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc, folio->index + nr);
- return NULL;
+ if (wbc->range_cyclic) {
+ WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
+ if (wbc->err || wbc->nr_to_write <= 0)
+ mapping->writeback_index =
+ folio->index + folio_nr_pages(folio);
+ else
+ mapping->writeback_index = 0;
}
-
- return writeback_get_folio(mapping, wbc);
+ *error = wbc->err;
+ return NULL;
}
/**
@@ -2549,13 +2556,18 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
- struct folio *folio;
- int error;
+ struct folio *folio = NULL;
+ int error = 0;
- for_each_writeback_folio(mapping, wbc, folio, error)
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
error = writepage(folio, wbc, data);
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+ }
- return wbc->err;
+ return error;
}
EXPORT_SYMBOL(write_cache_pages);
@@ -2563,13 +2575,17 @@ static int writeback_use_writepage(struct address_space *mapping,
struct writeback_control *wbc)
{
struct blk_plug plug;
- struct folio *folio;
- int err;
+ struct folio *folio = 0;
+ int err = 0;
blk_start_plug(&plug);
- for_each_writeback_folio(mapping, wbc, folio, err) {
+ while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
err = mapping->a_ops->writepage(&folio->page, wbc);
mapping_set_error(mapping, err);
+ if (err == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ err = 0;
+ }
}
blk_finish_plug(&plug);
@@ -2590,6 +2606,8 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
ret = writeback_use_writepage(mapping, wbc);
+ if (!ret)
+ ret = wbc->err;
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 06/19] writeback: Factor out writeback_finish()
2024-01-25 8:57 ` [PATCH 06/19] writeback: Factor out writeback_finish() Christoph Hellwig
@ 2024-01-29 20:13 ` Brian Foster
2024-01-30 14:04 ` Christoph Hellwig
0 siblings, 1 reply; 34+ messages in thread
From: Brian Foster @ 2024-01-29 20:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
On Thu, Jan 25, 2024 at 09:57:45AM +0100, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Instead of having a 'done' variable that controls the nested loops,
> have a writeback_finish() that can be returned directly. This involves
> keeping more things in writeback_control, but it's just moving stuff
> allocated on the stack to being allocated slightly earlier on the stack.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: heavily rebased, reordered and commented struct writeback_control]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Acked-by: Dave Chinner <dchinner@redhat.com>
> ---
> include/linux/writeback.h | 6 +++
> mm/page-writeback.c | 79 ++++++++++++++++++++-------------------
> 2 files changed, 47 insertions(+), 38 deletions(-)
>
...
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 437745a511c634..fcd90a176d806c 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
...
> @@ -2419,22 +2437,23 @@ int write_cache_pages(struct address_space *mapping,
> } else {
> tag = PAGECACHE_TAG_DIRTY;
> }
> - done_index = index;
> - while (!done && (index <= end)) {
> +
> + folio_batch_init(&wbc->fbatch);
> + wbc->err = 0;
> +
> + while (index <= end) {
> int i;
>
> nr_folios = filemap_get_folios_tag(mapping, &index, end,
> - tag, &fbatch);
> + tag, &wbc->fbatch);
>
> if (nr_folios == 0)
> break;
>
> for (i = 0; i < nr_folios; i++) {
> - struct folio *folio = fbatch.folios[i];
> + struct folio *folio = wbc->fbatch.folios[i];
> unsigned long nr;
>
> - done_index = folio->index;
> -
> folio_lock(folio);
>
> /*
> @@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
> folio_unlock(folio);
> error = 0;
> }
> +
JFYI: whitespace damage on the above line.
> + if (error && !wbc->err)
> + wbc->err = error;
>
Also what happened to the return of the above "first error encountered"
for the WB_SYNC_ALL case? Is that not needed for some reason (and so the
comment just below might require an update)?
Brian
> /*
> * For integrity sync we have to keep going until we
> @@ -2496,38 +2518,19 @@ int write_cache_pages(struct address_space *mapping,
> * off and media errors won't choke writeout for the
> * entire file.
> */
> - if (error && !ret)
> - ret = error;
> - if (wbc->sync_mode == WB_SYNC_NONE) {
> - if (ret || wbc->nr_to_write <= 0) {
> - done_index = folio->index + nr;
> - done = 1;
> - break;
> - }
> + if (wbc->sync_mode == WB_SYNC_NONE &&
> + (wbc->err || wbc->nr_to_write <= 0)) {
> + writeback_finish(mapping, wbc,
> + folio->index + nr);
> + return error;
> }
> }
> - folio_batch_release(&fbatch);
> + folio_batch_release(&wbc->fbatch);
> cond_resched();
> }
>
> - /*
> - * For range cyclic writeback we need to remember where we stopped so
> - * that we can continue there next time we are called. If we hit the
> - * last page and there is more work to be done, wrap back to the start
> - * of the file.
> - *
> - * For non-cyclic writeback we always start looking up at the beginning
> - * of the file if we are called again, which can only happen due to
> - * -ENOMEM from the file system.
> - */
> - if (wbc->range_cyclic) {
> - if (done)
> - mapping->writeback_index = done_index;
> - else
> - mapping->writeback_index = 0;
> - }
> -
> - return ret;
> + writeback_finish(mapping, wbc, 0);
> + return 0;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> --
> 2.39.2
>
>
* Re: [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio()
2024-01-25 8:57 ` [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio() Christoph Hellwig
@ 2024-01-30 10:03 ` Jan Kara
0 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2024-01-30 10:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Thu 25-01-24 09:57:57, Christoph Hellwig wrote:
> From: Matthew Wilcox <willy@infradead.org>
>
> This removes one indirect function call per folio, and adds typesafety
> by not casting through a void pointer.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/iomap/buffered-io.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 093c4515b22a53..58b3661f5eac9e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1887,9 +1887,8 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> * regular allocated space.
> */
> static int iomap_do_writepage(struct folio *folio,
> - struct writeback_control *wbc, void *data)
> + struct writeback_control *wbc, struct iomap_writepage_ctx *wpc)
> {
> - struct iomap_writepage_ctx *wpc = data;
> struct inode *inode = folio->mapping->host;
> u64 end_pos, isize;
>
> @@ -1986,10 +1985,12 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> struct iomap_writepage_ctx *wpc,
> const struct iomap_writeback_ops *ops)
> {
> - int ret;
> + struct folio *folio;
> + int ret;
>
> wpc->ops = ops;
> - ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
> + for_each_writeback_folio(mapping, wbc, folio, ret)
> + ret = iomap_do_writepage(folio, wbc, wpc);
> if (!wpc->ioend)
> return ret;
> return iomap_submit_ioend(wpc, wpc->ioend, ret);
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-25 8:57 ` [PATCH 19/19] writeback: simplify writeback iteration Christoph Hellwig
@ 2024-01-30 10:46 ` Jan Kara
2024-01-30 14:16 ` Christoph Hellwig
0 siblings, 1 reply; 34+ messages in thread
From: Jan Kara @ 2024-01-30 10:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Thu 25-01-24 09:57:58, Christoph Hellwig wrote:
> Based on the feedback from Jan I've tried to figure out how to
> avoid the error magic in for_each_writeback_folio. This patch
> implements that by switching to an open-coded while loop over a
> single writeback_iter() function:
>
> while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> ...
> }
>
> the twist here is that the error value is passed by reference, so that
> the iterator can restore it when breaking out of the loop.
>
> Additionally it moves the AOP_WRITEPAGE_ACTIVATE out of the iterator
> and into the callers, in preparation for eventually killing it off
> with the phase out of write_cache_pages().
>
> To me this form of the loop feels easier to follow, and it has the
> added advantage that writeback_iter() can actually be nicely used in
> nested loops, which should help with further iterizing the iomap
> writeback code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looking at it now I'm thinking whether we would not be better off to
completely dump the 'error' argument of writeback_iter() /
writeback_iter_next() and just make all .writepage implementations set
wbc->err directly. But that means touching all the ~20 writepage
implementations we still have...
Couple of comments regarding this implementation below (overall I agree it
seems somewhat easier to follow the code).
> +/**
> + * writeback_iter - iterate folios of a mapping for writeback
> + * @mapping: address space structure to write
> + * @wbc: writeback context
> + * @folio: previously iterated folio (%NULL to start)
> + * @error: in-out pointer for writeback errors (see below)
> + *
> + * This function should be called in a while loop in the ->writepages
> + * implementation and returns the next folio for the writeback operation
> + * described by @wbc on @mapping.
> + *
> + * To start writeback @folio should be passed as NULL, for every following
> + * iteration the folio returned by this function previously should be passed.
> + * @error should contain the error from the previous writeback iteration when
> + * calling writeback_iter.
> + *
> + * Once the writeback described in @wbc has finished, this function will return
> + * %NULL and if there was an error in any iteration restore it to @error.
> + *
> + * Note: callers should not manually break out of the loop using break or goto.
> + */
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error)
> {
> - unsigned long nr = folio_nr_pages(folio);
> + if (folio) {
> + wbc->nr_to_write -= folio_nr_pages(folio);
> + if (*error && !wbc->err)
> + wbc->err = *error;
>
> - wbc->nr_to_write -= nr;
> -
> - /*
> - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> - * Eventually all instances should just unlock the folio themselves and
> - * return 0;
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> + /*
> + * For integrity sync we have to keep going until we have
> + * written all the folios we tagged for writeback prior to
> + * entering the writeback loop, even if we run past
> + * wbc->nr_to_write or encounter errors.
> + *
> + * This is because the file system may still have state to clear
> + * for each folio. We'll eventually return the first error
> + * encountered.
> + *
> + * For background writeback just push done_index past this folio
> + * so that we can just restart where we left off and media
> + * errors won't choke writeout for the entire file.
> + */
> + if (wbc->sync_mode == WB_SYNC_NONE &&
> + (wbc->err || wbc->nr_to_write <= 0))
> + goto finish;
I think it would be a bit more comprehensible if we replace the goto with:
folio_batch_release(&wbc->fbatch);
if (wbc->range_cyclic)
mapping->writeback_index =
folio->index + folio_nr_pages(folio);
*error = wbc->err;
return NULL;
> + } else {
> + if (wbc->range_cyclic)
> + wbc->index = mapping->writeback_index; /* prev offset */
> + else
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> + tag_pages_for_writeback(mapping, wbc->index,
> + wbc_end(wbc));
> + folio_batch_init(&wbc->fbatch);
> + wbc->err = 0;
> }
>
> - if (error && !wbc->err)
> - wbc->err = error;
> + folio = writeback_get_folio(mapping, wbc);
> + if (!folio)
> + goto finish;
And here we just need to do:
if (wbc->range_cyclic)
mapping->writeback_index = 0;
*error = wbc->err;
return NULL;
> + return folio;
> +
> +finish:
> + folio_batch_release(&wbc->fbatch);
>
> /*
> - * For integrity sync we have to keep going until we have written all
> - * the folios we tagged for writeback prior to entering the writeback
> - * loop, even if we run past wbc->nr_to_write or encounter errors.
> - * This is because the file system may still have state to clear for
> - * each folio. We'll eventually return the first error encountered.
> + * For range cyclic writeback we need to remember where we stopped so
> + * that we can continue there next time we are called. If we hit the
> + * last page and there is more work to be done, wrap back to the start
> + * of the file.
> *
> - * For background writeback just push done_index past this folio so that
> - * we can just restart where we left off and media errors won't choke
> - * writeout for the entire file.
> + * For non-cyclic writeback we always start looking up at the beginning
> + * of the file if we are called again, which can only happen due to
> + * -ENOMEM from the file system.
> */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - (wbc->err || wbc->nr_to_write <= 0)) {
> - writeback_finish(mapping, wbc, folio->index + nr);
> - return NULL;
> + if (wbc->range_cyclic) {
> + WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
> + if (wbc->err || wbc->nr_to_write <= 0)
> + mapping->writeback_index =
> + folio->index + folio_nr_pages(folio);
> + else
> + mapping->writeback_index = 0;
> }
> -
> - return writeback_get_folio(mapping, wbc);
> + *error = wbc->err;
> + return NULL;
> }
>
> /**
> @@ -2563,13 +2575,17 @@ static int writeback_use_writepage(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> struct blk_plug plug;
> - struct folio *folio;
> - int err;
> + struct folio *folio = 0;
^^ NULL please
> + int err = 0;
>
> blk_start_plug(&plug);
> - for_each_writeback_folio(mapping, wbc, folio, err) {
> + while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
> err = mapping->a_ops->writepage(&folio->page, wbc);
> mapping_set_error(mapping, err);
> + if (err == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + err = 0;
> + }
> }
> blk_finish_plug(&plug);
>
> @@ -2590,6 +2606,8 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> ret = mapping->a_ops->writepages(mapping, wbc);
> } else if (mapping->a_ops->writepage) {
> ret = writeback_use_writepage(mapping, wbc);
> + if (!ret)
> + ret = wbc->err;
AFAICT this should not be needed as writeback_iter() made sure wbc->err is
returned when set?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 06/19] writeback: Factor out writeback_finish()
2024-01-29 20:13 ` Brian Foster
@ 2024-01-30 14:04 ` Christoph Hellwig
2024-01-30 14:28 ` Brian Foster
0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-30 14:04 UTC (permalink / raw)
To: Brian Foster
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel, Jan Kara, Dave Chinner
On Mon, Jan 29, 2024 at 03:13:47PM -0500, Brian Foster wrote:
> > @@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
> > folio_unlock(folio);
> > error = 0;
> > }
> > +
>
> JFYI: whitespace damage on the above line.
Thanks, fixed.
>
> > + if (error && !wbc->err)
> > + wbc->err = error;
> >
>
> Also what happened to the return of the above "first error encountered"
> for the WB_SYNC_ALL case? Is that not needed for some reason (and so the
> comment just below might require an update)?
No, this got broken during the various rebases (and is fixed again later
in the series). We need to return wbc->err from write_cache_pages at
this stage, I'll fix it.
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-30 10:46 ` Jan Kara
@ 2024-01-30 14:16 ` Christoph Hellwig
2024-01-30 14:22 ` Christoph Hellwig
2024-01-30 21:50 ` Jan Kara
0 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-30 14:16 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Brian Foster, Christian Brauner, Darrick J. Wong,
linux-xfs, linux-fsdevel, linux-kernel
On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> Looking at it now I'm thinking whether we would not be better off to
> completely dump the 'error' argument of writeback_iter() /
> writeback_iter_next() and just make all .writepage implementations set
> wbc->err directly. But that means touching all the ~20 writepage
> implementations we still have...
Heh. I actually had an earlier version that looked at wbc->err in
the ->writepages callers. But it felt a bit too ugly.
> > + */
> > + if (wbc->sync_mode == WB_SYNC_NONE &&
> > + (wbc->err || wbc->nr_to_write <= 0))
> > + goto finish;
>
> I think it would be a bit more comprehensible if we replace the goto with:
> folio_batch_release(&wbc->fbatch);
> if (wbc->range_cyclic)
> mapping->writeback_index =
> folio->index + folio_nr_pages(folio);
> *error = wbc->err;
> return NULL;
I agree that keeping the logic on when to break and when to set the
writeback_index is good, but duplicating the batch release and error
assignment seems a bit suboptimal. Let me know what you think of the
alternative variant below.
> > + struct folio *folio = 0;
> ^^ NULL please
Fixed.
> > ret = writeback_use_writepage(mapping, wbc);
> > + if (!ret)
> > + ret = wbc->err;
>
> AFAICT this should not be needed as writeback_iter() made sure wbc->err is
> returned when set?
Heh. That's a leftover from my above mentioned different attempt at
error handling and shouldn't have stayed in.
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-30 14:16 ` Christoph Hellwig
@ 2024-01-30 14:22 ` Christoph Hellwig
2024-01-30 14:32 ` Christoph Hellwig
2024-01-30 21:50 ` Jan Kara
1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-30 14:22 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Brian Foster, Christian Brauner, Darrick J. Wong,
linux-xfs, linux-fsdevel, linux-kernel
On Tue, Jan 30, 2024 at 03:16:01PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> > Looking at it now I'm thinking whether we would not be better off to
> > completely dump the 'error' argument of writeback_iter() /
> > writeback_iter_next() and just make all .writepage implementations set
> > wbc->err directly. But that means touching all the ~20 writepage
> > implementations we still have...
>
> Heh. I actually had an earlier version that looked at wbc->err in
> the ->writepages callers. But it felt a bit too ugly.
>
> > > + */
> > > + if (wbc->sync_mode == WB_SYNC_NONE &&
> > > + (wbc->err || wbc->nr_to_write <= 0))
> > > + goto finish;
> >
> > I think it would be a bit more comprehensible if we replace the goto with:
> > folio_batch_release(&wbc->fbatch);
> > if (wbc->range_cyclic)
> > mapping->writeback_index =
> > folio->index + folio_nr_pages(folio);
> > *error = wbc->err;
> > return NULL;
>
> I agree that keeping the logic on when to break and when to set the
> writeback_index is good, but duplicating the batch release and error
> assignment seems a bit suboptimal. Let me know what you think of the
> alternatіve variant below.
And now for real:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d8fcbac2d72310..3e4aa5bfe75819 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2475,10 +2475,23 @@ struct folio *writeback_iter(struct address_space *mapping,
* For background writeback just push done_index past this folio
* so that we can just restart where we left off and media
* errors won't choke writeout for the entire file.
+ *
+ * For range cyclic writeback we need to remember where we
+ * stopped so that we can continue there next time we are
+ * called. If we hit the last page and there is more work
+ * to be done, wrap back to the start of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the
+ * beginning of the file if we are called again, which can only
+ * happen due to -ENOMEM from the file system.
*/
if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0))
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ if (wbc->range_cyclic)
+ mapping->writeback_index =
+ folio->index + folio_nr_pages(folio);
goto finish;
+ }
} else {
if (wbc->range_cyclic)
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2492,31 +2505,15 @@ struct folio *writeback_iter(struct address_space *mapping,
}
folio = writeback_get_folio(mapping, wbc);
- if (!folio)
+ if (!folio) {
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
goto finish;
+ }
return folio;
finish:
folio_batch_release(&wbc->fbatch);
-
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- if (wbc->range_cyclic) {
- WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
- if (wbc->err || wbc->nr_to_write <= 0)
- mapping->writeback_index =
- folio->index + folio_nr_pages(folio);
- else
- mapping->writeback_index = 0;
- }
*error = wbc->err;
return NULL;
}
* Re: [PATCH 06/19] writeback: Factor out writeback_finish()
2024-01-30 14:04 ` Christoph Hellwig
@ 2024-01-30 14:28 ` Brian Foster
0 siblings, 0 replies; 34+ messages in thread
From: Brian Foster @ 2024-01-30 14:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
On Tue, Jan 30, 2024 at 03:04:59PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 29, 2024 at 03:13:47PM -0500, Brian Foster wrote:
> > > @@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
> > > folio_unlock(folio);
> > > error = 0;
> > > }
> > > +
> >
> > JFYI: whitespace damage on the above line.
>
> Thanks, fixed.
>
> >
> > > + if (error && !wbc->err)
> > > + wbc->err = error;
> > >
> >
> > Also what happened to the return of the above "first error encountered"
> > for the WB_SYNC_ALL case? Is that not needed for some reason (and so the
> > comment just below might require an update)?
>
> No, this got broken during the various rebases (and is fixed again later
> in the series). We need to return wbc->err from write_cache_pages at
> this stage, I'll fix it.
>
Ok, I noticed it was added back once I got to more of the iter
abstraction bits and so figured it was a transient/unintentional thing.
The above tweak makes sense to me.
FWIW, I haven't stared at the final patch long enough to have a strong
opinion. I tend to agree with Jan that the error handling logic in the
current series is a little wonky in that it's one of those things I'd
have to go read the implementation every time to remember what it does,
but the broader changes all seem reasonable to me. So for patches 1-18
and with the above tweak:
Reviewed-by: Brian Foster <bfoster@redhat.com>
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-30 14:22 ` Christoph Hellwig
@ 2024-01-30 14:32 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-30 14:32 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Brian Foster, Christian Brauner, Darrick J. Wong,
linux-xfs, linux-fsdevel, linux-kernel
On Tue, Jan 30, 2024 at 03:22:05PM +0100, Christoph Hellwig wrote:
> And now for real:
Another slight variant of this would be to move the nr_to_write || err
check for WB_SYNC_NONE out of the branch. This adds extra tests for
the initial iteration, but unindents a huge comment, and moves it closer
to the other branch that it also describes. The downside at least to
me is that the normal non-termination path is not the straight line
through function. Thoughs?
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 973f57ad9ee548..ff6e73453aa8c4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2461,24 +2461,6 @@ struct folio *writeback_iter(struct address_space *mapping,
wbc->nr_to_write -= folio_nr_pages(folio);
if (*error && !wbc->err)
wbc->err = *error;
-
- /*
- * For integrity sync we have to keep going until we have
- * written all the folios we tagged for writeback prior to
- * entering the writeback loop, even if we run past
- * wbc->nr_to_write or encounter errors.
- *
- * This is because the file system may still have state to clear
- * for each folio. We'll eventually return the first error
- * encountered.
- *
- * For background writeback just push done_index past this folio
- * so that we can just restart where we left off and media
- * errors won't choke writeout for the entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0))
- goto finish;
} else {
if (wbc->range_cyclic)
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2491,17 +2473,20 @@ struct folio *writeback_iter(struct address_space *mapping,
wbc->err = 0;
}
- folio = writeback_get_folio(mapping, wbc);
- if (!folio)
- goto finish;
- return folio;
-
-finish:
- folio_batch_release(&wbc->fbatch);
-
/*
+ * For integrity sync we have to keep going until we have written all
+ * the folios we tagged for writeback prior to entering the writeback
+ * loop, even if we run past wbc->nr_to_write or encounter errors.
+ *
+ * This is because the file system may still have state to clear for
+ * each folio. We'll eventually return the first error encountered.
+ *
+ * For background writeback just push done_index past this folio so that
+ * we can just restart where we left off and media errors won't choke
+ * writeout for the entire file.
+ *
* For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
+ * that we can continue there next time we are called. If we hit the
* last page and there is more work to be done, wrap back to the start
* of the file.
*
@@ -2509,14 +2494,21 @@ struct folio *writeback_iter(struct address_space *mapping,
* of the file if we are called again, which can only happen due to
* -ENOMEM from the file system.
*/
- if (wbc->range_cyclic) {
- WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
- if (wbc->err || wbc->nr_to_write <= 0)
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ if (wbc->range_cyclic)
mapping->writeback_index =
folio->index + folio_nr_pages(folio);
- else
+ } else {
+ folio = writeback_get_folio(mapping, wbc);
+ if (folio)
+ return folio;
+
+ if (wbc->range_cyclic)
mapping->writeback_index = 0;
}
+
+ folio_batch_release(&wbc->fbatch);
*error = wbc->err;
return NULL;
}
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-30 14:16 ` Christoph Hellwig
2024-01-30 14:22 ` Christoph Hellwig
@ 2024-01-30 21:50 ` Jan Kara
2024-01-31 7:14 ` Christoph Hellwig
1 sibling, 1 reply; 34+ messages in thread
From: Jan Kara @ 2024-01-30 21:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Brian Foster, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel
On Tue 30-01-24 15:16:01, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> > Looking at it now I'm thinking whether we would not be better off to
> > completely dump the 'error' argument of writeback_iter() /
> > writeback_iter_next() and just make all .writepage implementations set
> > wbc->err directly. But that means touching all the ~20 writepage
> > implementations we still have...
>
> Heh. I actually had an earlier version that looked at wbc->err in
> the ->writepages callers. But it felt a bit too ugly.
OK.
> > > + */
> > > + if (wbc->sync_mode == WB_SYNC_NONE &&
> > > + (wbc->err || wbc->nr_to_write <= 0))
> > > + goto finish;
> >
> > I think it would be a bit more comprehensible if we replace the goto with:
> > folio_batch_release(&wbc->fbatch);
> > if (wbc->range_cyclic)
> > mapping->writeback_index =
> > folio->index + folio_nr_pages(folio);
> > *error = wbc->err;
> > return NULL;
>
> I agree that keeping the logic on when to break and when to set the
> writeback_index is good, but duplicating the batch release and error
> assignment seems a bit suboptimal. Let me know what you think of the
alternative variant below.
Well, batch release needs to be only here because if writeback_get_folio()
returns NULL, the batch has already been released by it. So what would be
duplicated is only the error assignment. But I'm fine with the version in
the following email and actually somewhat prefer it compared to the yet
another variant you've sent.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-30 21:50 ` Jan Kara
@ 2024-01-31 7:14 ` Christoph Hellwig
2024-01-31 10:40 ` Jan Kara
2024-01-31 12:50 ` Brian Foster
0 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-31 7:14 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Brian Foster, Christian Brauner, Darrick J. Wong,
linux-xfs, linux-fsdevel, linux-kernel
On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> Well, batch release needs to be only here because if writeback_get_folio()
> returns NULL, the batch has been already released by it.
Indeed.
> So what would be
> duplicated is only the error assignment. But I'm fine with the version in
> the following email and actually somewhat prefer it compared the yet
> another variant you've sent.
So how about another variant? This is closer to your original suggestion,
but I've switched around the order of the folio-or-not branches
from my original patch, and completely reworked and (IMHO) improved the
comments. It replaces patch 19 instead of being incremental so that it
stays readable:
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58b3661f5eac9e..1593a783176ca2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1985,12 +1985,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops *ops)
{
- struct folio *folio;
- int ret;
+ struct folio *folio = NULL;
+ int ret = 0;
wpc->ops = ops;
- for_each_writeback_folio(mapping, wbc, folio, ret)
+ while ((folio = writeback_iter(mapping, wbc, folio, &ret)))
ret = iomap_do_writepage(folio, wbc, wpc);
+
if (!wpc->ioend)
return ret;
return iomap_submit_ioend(wpc, wpc->ioend, ret);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 2416da933440e2..fc4605627496fc 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -367,15 +367,8 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
-struct folio *writeback_iter_init(struct address_space *mapping,
- struct writeback_control *wbc);
-struct folio *writeback_iter_next(struct address_space *mapping,
- struct writeback_control *wbc, struct folio *folio, int error);
-
-#define for_each_writeback_folio(mapping, wbc, folio, error) \
- for (folio = writeback_iter_init(mapping, wbc); \
- folio || ((error = wbc->err), false); \
- folio = writeback_iter_next(mapping, wbc, folio, error))
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0763c4353a676a..eefcb00cb7b227 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,29 +2360,6 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
-static void writeback_finish(struct address_space *mapping,
- struct writeback_control *wbc, pgoff_t done_index)
-{
- folio_batch_release(&wbc->fbatch);
-
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- if (wbc->range_cyclic) {
- if (wbc->err || wbc->nr_to_write <= 0)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-}
-
static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
{
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
@@ -2442,10 +2419,8 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
- if (!folio) {
- writeback_finish(mapping, wbc, 0);
+ if (!folio)
return NULL;
- }
}
folio_lock(folio);
@@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}
-struct folio *writeback_iter_init(struct address_space *mapping,
- struct writeback_control *wbc)
+/**
+ * writeback_iter - iterate folio of a mapping for writeback
+ * @mapping: address space structure to write
+ * @wbc: writeback context
+ * @folio: previously iterated folio (%NULL to start)
+ * @error: in-out pointer for writeback errors (see below)
+ *
+ * This function should be called in a while loop in the ->writepages
+ * implementation and returns the next folio for the writeback operation
+ * described by @wbc on @mapping.
+ *
+ * To start writeback @folio should be passed as NULL, for every following
+ * iteration the folio returned by this function previously should be passed.
+ * @error should contain the error from the previous writeback iteration when
+ * calling writeback_iter.
+ *
+ * Once the writeback described in @wbc has finished, this function will return
+ * %NULL and if there was an error in any iteration restore it to @error.
+ *
+ * Note: callers should not manually break out of the loop using break or goto.
+ */
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error)
{
- if (wbc->range_cyclic)
- wbc->index = mapping->writeback_index; /* prev offset */
- else
- wbc->index = wbc->range_start >> PAGE_SHIFT;
-
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
-
- wbc->err = 0;
- folio_batch_init(&wbc->fbatch);
- return writeback_get_folio(mapping, wbc);
-}
+ if (!folio) {
+ folio_batch_init(&wbc->fbatch);
+ wbc->err = 0;
-struct folio *writeback_iter_next(struct address_space *mapping,
- struct writeback_control *wbc, struct folio *folio, int error)
-{
- unsigned long nr = folio_nr_pages(folio);
+ /*
+ * For range cyclic writeback we remember where we stopped so
+ * that we can continue where we stopped.
+ *
+ * For non-cyclic writeback we always start at the beginning of
+ * the passed in range.
+ */
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index;
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
- wbc->nr_to_write -= nr;
+ /*
+ * To avoid livelocks when other processes dirty new pages, we
+ * first tag pages which should be written back and only then
+ * start writing them.
+ *
+ * For data-integrity sync we have to be careful so that we do
+ * not miss some pages (e.g., because some other process has
+ * cleared the TOWRITE tag we set). The rule we follow is that
+ * TOWRITE tag can be cleared only by the process clearing the
+ * DIRTY tag (and submitting the page for I/O).
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index,
+ wbc_end(wbc));
+ } else {
+ wbc->nr_to_write -= folio_nr_pages(folio);
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
- * Eventually all instances should just unlock the folio themselves and
- * return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
+ /*
+ * For integrity writeback we have to keep going until we have
+ * written all the folios we tagged for writeback prior to
+ * entering the writeback loop, even if we run past
+ * wbc->nr_to_write or encounter errors, and just stash away
+ * the first error we encounter in wbc->err so that it can
+ * be retrieved on return.
+ *
+ * This is because the file system may still have state to clear
+ * for each folio. We'll eventually return the first error
+ * encountered.
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (*error && !wbc->err)
+ wbc->err = *error;
+ } else {
+ if (*error || wbc->nr_to_write <= 0)
+ goto done;
+ }
}
- if (error && !wbc->err)
- wbc->err = error;
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio) {
+ /*
+ * For range cyclic writeback not finding another folio means
+ * that we are at the end of the file. In that case go back
+ * to the start of the file for the next call.
+ */
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
- /*
- * For integrity sync we have to keep going until we have written all
- * the folios we tagged for writeback prior to entering the writeback
- * loop, even if we run past wbc->nr_to_write or encounter errors.
- * This is because the file system may still have state to clear for
- * each folio. We'll eventually return the first error encountered.
- *
- * For background writeback just push done_index past this folio so that
- * we can just restart where we left off and media errors won't choke
- * writeout for the entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc, folio->index + nr);
- return NULL;
+ /*
+ * Return the first error we encountered (if there was any) to
+ * the caller now that we are done.
+ */
+ *error = wbc->err;
}
+ return folio;
- return writeback_get_folio(mapping, wbc);
+done:
+ if (wbc->range_cyclic)
+ mapping->writeback_index = folio->index + folio_nr_pages(folio);
+ folio_batch_release(&wbc->fbatch);
+ return NULL;
}
/**
@@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
- struct folio *folio;
- int error;
+ struct folio *folio = NULL;
+ int error = 0;
- for_each_writeback_folio(mapping, wbc, folio, error)
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
error = writepage(folio, wbc, data);
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+ }
- return wbc->err;
+ return error;
}
EXPORT_SYMBOL(write_cache_pages);
@@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
struct writeback_control *wbc)
{
struct blk_plug plug;
- struct folio *folio;
- int err;
+ struct folio *folio = NULL;
+ int err = 0;
blk_start_plug(&plug);
- for_each_writeback_folio(mapping, wbc, folio, err) {
+ while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
err = mapping->a_ops->writepage(&folio->page, wbc);
mapping_set_error(mapping, err);
+ if (err == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ err = 0;
+ }
}
blk_finish_plug(&plug);
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-31 7:14 ` Christoph Hellwig
@ 2024-01-31 10:40 ` Jan Kara
2024-01-31 10:43 ` Christoph Hellwig
2024-01-31 12:50 ` Brian Foster
1 sibling, 1 reply; 34+ messages in thread
From: Jan Kara @ 2024-01-31 10:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Brian Foster, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel
On Wed 31-01-24 08:14:37, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> > Well, batch release needs to be only here because if writeback_get_folio()
> > returns NULL, the batch has been already released by it.
>
> Indeed.
>
> > So what would be
> > duplicated is only the error assignment. But I'm fine with the version in
> > the following email and actually somewhat prefer it compared the yet
> > another variant you've sent.
>
> So how about another variant, this is closer to your original suggestion.
> But I've switched around the ordered of the folio or not branches
> from my original patch, and completely reworked and (IMHO) improved the
> comments. it replaces patch 19 instead of being incremental to be
> readable:
Yes, this looks very good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 58b3661f5eac9e..1593a783176ca2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1985,12 +1985,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> struct iomap_writepage_ctx *wpc,
> const struct iomap_writeback_ops *ops)
> {
> - struct folio *folio;
> - int ret;
> + struct folio *folio = NULL;
> + int ret = 0;
>
> wpc->ops = ops;
> - for_each_writeback_folio(mapping, wbc, folio, ret)
> + while ((folio = writeback_iter(mapping, wbc, folio, &ret)))
> ret = iomap_do_writepage(folio, wbc, wpc);
> +
> if (!wpc->ioend)
> return ret;
> return iomap_submit_ioend(wpc, wpc->ioend, ret);
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 2416da933440e2..fc4605627496fc 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -367,15 +367,8 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
>
> bool wb_over_bg_thresh(struct bdi_writeback *wb);
>
> -struct folio *writeback_iter_init(struct address_space *mapping,
> - struct writeback_control *wbc);
> -struct folio *writeback_iter_next(struct address_space *mapping,
> - struct writeback_control *wbc, struct folio *folio, int error);
> -
> -#define for_each_writeback_folio(mapping, wbc, folio, error) \
> - for (folio = writeback_iter_init(mapping, wbc); \
> - folio || ((error = wbc->err), false); \
> - folio = writeback_iter_next(mapping, wbc, folio, error))
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error);
>
> typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
> void *data);
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0763c4353a676a..eefcb00cb7b227 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2360,29 +2360,6 @@ void tag_pages_for_writeback(struct address_space *mapping,
> }
> EXPORT_SYMBOL(tag_pages_for_writeback);
>
> -static void writeback_finish(struct address_space *mapping,
> - struct writeback_control *wbc, pgoff_t done_index)
> -{
> - folio_batch_release(&wbc->fbatch);
> -
> - /*
> - * For range cyclic writeback we need to remember where we stopped so
> - * that we can continue there next time we are called. If we hit the
> - * last page and there is more work to be done, wrap back to the start
> - * of the file.
> - *
> - * For non-cyclic writeback we always start looking up at the beginning
> - * of the file if we are called again, which can only happen due to
> - * -ENOMEM from the file system.
> - */
> - if (wbc->range_cyclic) {
> - if (wbc->err || wbc->nr_to_write <= 0)
> - mapping->writeback_index = done_index;
> - else
> - mapping->writeback_index = 0;
> - }
> -}
> -
> static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
> {
> if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> @@ -2442,10 +2419,8 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
> wbc_to_tag(wbc), &wbc->fbatch);
> folio = folio_batch_next(&wbc->fbatch);
> - if (!folio) {
> - writeback_finish(mapping, wbc, 0);
> + if (!folio)
> return NULL;
> - }
> }
>
> folio_lock(folio);
> @@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> return folio;
> }
>
> -struct folio *writeback_iter_init(struct address_space *mapping,
> - struct writeback_control *wbc)
> +/**
> + * writepage_iter - iterate folio of a mapping for writeback
> + * @mapping: address space structure to write
> + * @wbc: writeback context
> + * @folio: previously iterated folio (%NULL to start)
> + * @error: in-out pointer for writeback errors (see below)
> + *
> + * This function should be called in a while loop in the ->writepages
> + * implementation and returns the next folio for the writeback operation
> + * described by @wbc on @mapping.
> + *
> + * To start writeback @folio should be passed as NULL, for every following
> + * iteration the folio returned by this function previously should be passed.
> + * @error should contain the error from the previous writeback iteration when
> + * calling writeback_iter.
> + *
> + * Once the writeback described in @wbc has finished, this function will return
> + * %NULL and if there was an error in any iteration restore it to @error.
> + *
> + * Note: callers should not manually break out of the loop using break or goto.
> + */
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error)
> {
> - if (wbc->range_cyclic)
> - wbc->index = mapping->writeback_index; /* prev offset */
> - else
> - wbc->index = wbc->range_start >> PAGE_SHIFT;
> -
> - if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> - tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
> -
> - wbc->err = 0;
> - folio_batch_init(&wbc->fbatch);
> - return writeback_get_folio(mapping, wbc);
> -}
> + if (!folio) {
> + folio_batch_init(&wbc->fbatch);
> + wbc->err = 0;
>
> -struct folio *writeback_iter_next(struct address_space *mapping,
> - struct writeback_control *wbc, struct folio *folio, int error)
> -{
> - unsigned long nr = folio_nr_pages(folio);
> + /*
> + * For range cyclic writeback we remember where we stopped so
> + * that we can continue where we stopped.
> + *
> + * For non-cyclic writeback we always start at the beginning of
> + * the passed in range.
> + */
> + if (wbc->range_cyclic)
> + wbc->index = mapping->writeback_index;
> + else
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
>
> - wbc->nr_to_write -= nr;
> + /*
> + * To avoid livelocks when other processes dirty new pages, we
> + * first tag pages which should be written back and only then
> + * start writing them.
> + *
> + * For data-integrity sync we have to be careful so that we do
> + * not miss some pages (e.g., because some other process has
> + * cleared the TOWRITE tag we set). The rule we follow is that
> + * TOWRITE tag can be cleared only by the process clearing the
> + * DIRTY tag (and submitting the page for I/O).
> + */
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> + tag_pages_for_writeback(mapping, wbc->index,
> + wbc_end(wbc));
> + } else {
> + wbc->nr_to_write -= folio_nr_pages(folio);
>
> - /*
> - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> - * Eventually all instances should just unlock the folio themselves and
> - * return 0;
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> + /*
> + * For integrity writeback we have to keep going until we have
> + * written all the folios we tagged for writeback prior to
> + * entering the writeback loop, even if we run past
> + * wbc->nr_to_write or encounter errors, and just stash away
> + * the first error we encounter in wbc->err so that it can
> + * be retrieved on return.
> + *
> + * This is because the file system may still have state to clear
> + * for each folio. We'll eventually return the first error
> + * encountered.
> + */
> + if (wbc->sync_mode == WB_SYNC_ALL) {
> + if (*error && !wbc->err)
> + wbc->err = *error;
> + } else {
> + if (*error || wbc->nr_to_write <= 0)
> + goto done;
> + }
> }
>
> - if (error && !wbc->err)
> - wbc->err = error;
> + folio = writeback_get_folio(mapping, wbc);
> + if (!folio) {
> + /*
> + * For range cyclic writeback not finding another folios means
> + * that we are at the end of the file. In that case go back
> + * to the start of the file for the next call.
> + */
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
>
> - /*
> - * For integrity sync we have to keep going until we have written all
> - * the folios we tagged for writeback prior to entering the writeback
> - * loop, even if we run past wbc->nr_to_write or encounter errors.
> - * This is because the file system may still have state to clear for
> - * each folio. We'll eventually return the first error encountered.
> - *
> - * For background writeback just push done_index past this folio so that
> - * we can just restart where we left off and media errors won't choke
> - * writeout for the entire file.
> - */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - (wbc->err || wbc->nr_to_write <= 0)) {
> - writeback_finish(mapping, wbc, folio->index + nr);
> - return NULL;
> + /*
> + * Return the first error we encountered (if there was any) to
> + * the caller now that we are done.
> + */
> + *error = wbc->err;
> }
> + return folio;
>
> - return writeback_get_folio(mapping, wbc);
> +done:
> + if (wbc->range_cyclic)
> + mapping->writeback_index = folio->index + folio_nr_pages(folio);
> + folio_batch_release(&wbc->fbatch);
> + return NULL;
> }
>
> /**
> @@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
> struct writeback_control *wbc, writepage_t writepage,
> void *data)
> {
> - struct folio *folio;
> - int error;
> + struct folio *folio = NULL;
> + int error = 0;
>
> - for_each_writeback_folio(mapping, wbc, folio, error)
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> error = writepage(folio, wbc, data);
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
> + }
>
> - return wbc->err;
> + return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> @@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> struct blk_plug plug;
> - struct folio *folio;
> - int err;
> + struct folio *folio = NULL;
> + int err = 0;
>
> blk_start_plug(&plug);
> - for_each_writeback_folio(mapping, wbc, folio, err) {
> + while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
> err = mapping->a_ops->writepage(&folio->page, wbc);
> mapping_set_error(mapping, err);
> + if (err == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + err = 0;
> + }
> }
> blk_finish_plug(&plug);
>
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-31 10:40 ` Jan Kara
@ 2024-01-31 10:43 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-31 10:43 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Brian Foster, Christian Brauner, Darrick J. Wong,
linux-xfs, linux-fsdevel, linux-kernel
On Wed, Jan 31, 2024 at 11:40:45AM +0100, Jan Kara wrote:
> Yes, this looks very good to me. Feel free to add:
Thanks! I'd also really love to hear from Brian and Matthew before
reposting. Or from anyone else interested.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-31 7:14 ` Christoph Hellwig
2024-01-31 10:40 ` Jan Kara
@ 2024-01-31 12:50 ` Brian Foster
2024-01-31 13:01 ` Christoph Hellwig
1 sibling, 1 reply; 34+ messages in thread
From: Brian Foster @ 2024-01-31 12:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Wed, Jan 31, 2024 at 08:14:37AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> > Well, batch release needs to be only here because if writeback_get_folio()
> > returns NULL, the batch has been already released by it.
>
> Indeed.
>
> > So what would be
> > duplicated is only the error assignment. But I'm fine with the version in
> > the following email and actually somewhat prefer it compared the yet
> > another variant you've sent.
>
> So how about another variant, this is closer to your original suggestion.
> But I've switched around the ordered of the folio or not branches
> from my original patch, and completely reworked and (IMHO) improved the
> comments. it replaces patch 19 instead of being incremental to be
> readable:
>
Not a thorough review but at first glance I like it. I think the direct
function call makes things easier to follow. Just one quick thought...
...
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0763c4353a676a..eefcb00cb7b227 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
...
> @@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> return folio;
> }
>
...
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error)
> {
> - if (wbc->range_cyclic)
> - wbc->index = mapping->writeback_index; /* prev offset */
> - else
> - wbc->index = wbc->range_start >> PAGE_SHIFT;
> -
> - if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> - tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
> -
> - wbc->err = 0;
> - folio_batch_init(&wbc->fbatch);
> - return writeback_get_folio(mapping, wbc);
> -}
> + if (!folio) {
> + folio_batch_init(&wbc->fbatch);
> + wbc->err = 0;
The implied field initialization via !folio feels a little wonky to me
just because it's not clear from the client code that both fields must
be initialized. Even though the interface is simpler, I wonder if it's
still worth having a dumb/macro type init function that at least does
the batch and error field initialization.
Or on second thought maybe having writeback_iter() reset *error as well
on folio == NULL might be a little cleaner without changing the
interface....
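That second thought would amount to a one-line tweak against the patch
above (an untested sketch; writeback_iter() would then own clearing the
caller's error on the initial call):

```
 	if (!folio) {
 		folio_batch_init(&wbc->fbatch);
 		wbc->err = 0;
+		*error = 0;	/* don't rely on the caller pre-clearing it */
```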
Brian
>
> -struct folio *writeback_iter_next(struct address_space *mapping,
> - struct writeback_control *wbc, struct folio *folio, int error)
> -{
> - unsigned long nr = folio_nr_pages(folio);
> + /*
> + * For range cyclic writeback we remember where we stopped so
> + * that we can continue where we stopped.
> + *
> + * For non-cyclic writeback we always start at the beginning of
> + * the passed in range.
> + */
> + if (wbc->range_cyclic)
> + wbc->index = mapping->writeback_index;
> + else
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
>
> - wbc->nr_to_write -= nr;
> + /*
> + * To avoid livelocks when other processes dirty new pages, we
> + * first tag pages which should be written back and only then
> + * start writing them.
> + *
> + * For data-integrity sync we have to be careful so that we do
> + * not miss some pages (e.g., because some other process has
> + * cleared the TOWRITE tag we set). The rule we follow is that
> + * TOWRITE tag can be cleared only by the process clearing the
> + * DIRTY tag (and submitting the page for I/O).
> + */
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> + tag_pages_for_writeback(mapping, wbc->index,
> + wbc_end(wbc));
> + } else {
> + wbc->nr_to_write -= folio_nr_pages(folio);
>
> - /*
> - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> - * Eventually all instances should just unlock the folio themselves and
> - * return 0;
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> + /*
> + * For integrity writeback we have to keep going until we have
> + * written all the folios we tagged for writeback prior to
> + * entering the writeback loop, even if we run past
> + * wbc->nr_to_write or encounter errors, and just stash away
> + * the first error we encounter in wbc->err so that it can
> + * be retrieved on return.
> + *
> + * This is because the file system may still have state to clear
> + * for each folio. We'll eventually return the first error
> + * encountered.
> + */
> + if (wbc->sync_mode == WB_SYNC_ALL) {
> + if (*error && !wbc->err)
> + wbc->err = *error;
> + } else {
> + if (*error || wbc->nr_to_write <= 0)
> + goto done;
> + }
> }
>
> - if (error && !wbc->err)
> - wbc->err = error;
> + folio = writeback_get_folio(mapping, wbc);
> + if (!folio) {
> + /*
> + * For range cyclic writeback not finding another folios means
> + * that we are at the end of the file. In that case go back
> + * to the start of the file for the next call.
> + */
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
>
> - /*
> - * For integrity sync we have to keep going until we have written all
> - * the folios we tagged for writeback prior to entering the writeback
> - * loop, even if we run past wbc->nr_to_write or encounter errors.
> - * This is because the file system may still have state to clear for
> - * each folio. We'll eventually return the first error encountered.
> - *
> - * For background writeback just push done_index past this folio so that
> - * we can just restart where we left off and media errors won't choke
> - * writeout for the entire file.
> - */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - (wbc->err || wbc->nr_to_write <= 0)) {
> - writeback_finish(mapping, wbc, folio->index + nr);
> - return NULL;
> + /*
> + * Return the first error we encountered (if there was any) to
> + * the caller now that we are done.
> + */
> + *error = wbc->err;
> }
> + return folio;
>
> - return writeback_get_folio(mapping, wbc);
> +done:
> + if (wbc->range_cyclic)
> + mapping->writeback_index = folio->index + folio_nr_pages(folio);
> + folio_batch_release(&wbc->fbatch);
> + return NULL;
> }
>
> /**
> @@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
> struct writeback_control *wbc, writepage_t writepage,
> void *data)
> {
> - struct folio *folio;
> - int error;
> + struct folio *folio = NULL;
> + int error = 0;
>
> - for_each_writeback_folio(mapping, wbc, folio, error)
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> error = writepage(folio, wbc, data);
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
> + }
>
> - return wbc->err;
> + return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> @@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> struct blk_plug plug;
> - struct folio *folio;
> - int err;
> + struct folio *folio = NULL;
> + int err = 0;
>
> blk_start_plug(&plug);
> - for_each_writeback_folio(mapping, wbc, folio, err) {
> + while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
> err = mapping->a_ops->writepage(&folio->page, wbc);
> mapping_set_error(mapping, err);
> + if (err == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + err = 0;
> + }
> }
> blk_finish_plug(&plug);
>
>
* Re: [PATCH 19/19] writeback: simplify writeback iteration
2024-01-31 12:50 ` Brian Foster
@ 2024-01-31 13:01 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2024-01-31 13:01 UTC (permalink / raw)
To: Brian Foster
Cc: Christoph Hellwig, Jan Kara, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel
On Wed, Jan 31, 2024 at 07:50:25AM -0500, Brian Foster wrote:
> The implied field initialization via !folio feels a little wonky to me
> just because it's not clear from the client code that both fields must
> be initialized. Even though the interface is simpler, I wonder if it's
> still worth having a dumb/macro type init function that at least does
> the batch and error field initialization.
>
> Or on second thought maybe having writeback_iter() reset *error as well
> on folio == NULL might be a little cleaner without changing the
> interface....
I like that second idea. An initialization helper I could live with,
but if only folio needs a defined state, that seems superfluous.
Thread overview: 34+ messages
2024-01-25 8:57 Convert write_cache_pages() to an iterator v5 Christoph Hellwig
2024-01-25 8:57 ` [PATCH 01/19] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
2024-01-25 8:57 ` [PATCH 02/19] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
2024-01-25 8:57 ` [PATCH 03/19] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
2024-01-25 8:57 ` [PATCH 04/19] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
2024-01-25 8:57 ` [PATCH 05/19] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
2024-01-25 8:57 ` [PATCH 06/19] writeback: Factor out writeback_finish() Christoph Hellwig
2024-01-29 20:13 ` Brian Foster
2024-01-30 14:04 ` Christoph Hellwig
2024-01-30 14:28 ` Brian Foster
2024-01-25 8:57 ` [PATCH 07/19] writeback: Factor writeback_get_batch() out of write_cache_pages() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 08/19] writeback: Factor folio_prepare_writeback() " Christoph Hellwig
2024-01-25 8:57 ` [PATCH 09/19] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 10/19] pagevec: Add ability to iterate a queue Christoph Hellwig
2024-01-25 8:57 ` [PATCH 11/19] writeback: Use the folio_batch queue iterator Christoph Hellwig
2024-01-25 8:57 ` [PATCH 12/19] writeback: Factor writeback_iter_init() out of write_cache_pages() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 13/19] writeback: Move the folio_prepare_writeback loop " Christoph Hellwig
2024-01-25 8:57 ` [PATCH 14/19] writeback: Factor writeback_iter_next() " Christoph Hellwig
2024-01-25 8:57 ` [PATCH 15/19] writeback: Add for_each_writeback_folio() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 16/19] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
2024-01-25 8:57 ` [PATCH 17/19] writeback: update the kerneldoc comment for tag_pages_for_writeback Christoph Hellwig
2024-01-25 8:57 ` [PATCH 18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio() Christoph Hellwig
2024-01-30 10:03 ` Jan Kara
2024-01-25 8:57 ` [PATCH 19/19] writeback: simplify writeback iteration Christoph Hellwig
2024-01-30 10:46 ` Jan Kara
2024-01-30 14:16 ` Christoph Hellwig
2024-01-30 14:22 ` Christoph Hellwig
2024-01-30 14:32 ` Christoph Hellwig
2024-01-30 21:50 ` Jan Kara
2024-01-31 7:14 ` Christoph Hellwig
2024-01-31 10:40 ` Jan Kara
2024-01-31 10:43 ` Christoph Hellwig
2024-01-31 12:50 ` Brian Foster
2024-01-31 13:01 ` Christoph Hellwig