* convert write_cache_pages() to an iterator v6
@ 2024-02-03 7:11 Christoph Hellwig
2024-02-03 7:11 ` [PATCH 01/13] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
` (12 more replies)
0 siblings, 13 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
Hi all,
this is an evolution of the series Matthew Wilcox originally sent in June
2023, which has changed quite a bit since then and now has a while-loop
based iterator.
Note that in this version two patches are so different from the previous
version that I've not kept any Reviews or Acks for them, even if the
final result looks almost the same as the previous patches with the
incremental patch on the list.
Changes since v5:
- completely reshuffle the series to directly prepare for the
writeback_iter() style.
- don't require *error to be initialized on first call
- improve various comments
- fix a bisection hazard where write_cache_pages didn't return a delayed
error for a few commits
- fix a whitespace error
- drop the iomap patch again for now, as the iomap "map multiple blocks"
series isn't in mainline yet
Changes since v4:
- added back the (rebased) iomap conversion now that the conflict is in
mainline
- add a new patch to change the iterator
Changes since v3:
- various commit log spelling fixes
- remove a statement from a commit log that isn't true any more with the
changes in v3
- rename a function
- merge two helpers
Diffstat:
fs/iomap/buffered-io.c | 10 -
include/linux/pagevec.h | 18 ++
include/linux/writeback.h | 12 +
mm/page-writeback.c | 344 ++++++++++++++++++++++++++--------------------
4 files changed, 231 insertions(+), 153 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 01/13] writeback: remove a duplicate prototype for tag_pages_for_writeback
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 02/13] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
` (11 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: split from a larger patch]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 453736fd1d23ce..4b8cf9e4810bad 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -363,8 +363,6 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
-void tag_pages_for_writeback(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
--
2.39.2
* [PATCH 02/13] writeback: fix done_index when hitting the wbc->nr_to_write
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
2024-02-03 7:11 ` [PATCH 01/13] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 03/13] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
` (10 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
When write_cache_pages finishes writing out a folio, it fails to update
done_index to account for the number of pages in the folio just written.
That means when range_cyclic writeback is restarted, it will be
restarted at this folio instead of after it as it should. Fix that
by updating done_index before breaking out of the loop.
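The effect of the fix can be sketched in isolation (this is a hypothetical miniature with made-up names, not the kernel's code): for a multi-page folio, the resume point recorded for the next range_cyclic pass must be one past the folio just written, not the folio's own index.

```c
#include <assert.h>

/* Miniature of the done_index accounting fixed in this patch.  All
 * names here are illustrative stand-ins for the kernel structures. */
typedef unsigned long pgoff_t;

struct mini_folio {
	pgoff_t index;     /* first page index covered by the folio */
	unsigned long nr;  /* number of pages in the folio */
};

/* Before the fix: resuming at the folio we just wrote would make a
 * restarted range_cyclic pass write the same folio again. */
static pgoff_t resume_index_buggy(const struct mini_folio *folio)
{
	return folio->index;
}

/* After the fix: resume one page past the folio we just wrote. */
static pgoff_t resume_index_fixed(const struct mini_folio *folio)
{
	return folio->index + folio->nr;
}
```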
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 02147b61712bc9..b4d978f77b0b69 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2505,6 +2505,7 @@ int write_cache_pages(struct address_space *mapping,
* keep going until we have written all the pages
* we tagged for writeback prior to entering this loop.
*/
+ done_index = folio->index + nr;
wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
--
2.39.2
* [PATCH 03/13] writeback: also update wbc->nr_to_write on writeback failure
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
2024-02-03 7:11 ` [PATCH 01/13] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
2024-02-03 7:11 ` [PATCH 02/13] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 04/13] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
` (9 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
When exiting write_cache_pages early due to a non-integrity write
failure, wbc->nr_to_write currently doesn't account for the folio
we just failed to write. This doesn't matter because the callers
always ignore the value on a failure, but moving the update to
common code will allow us to simplify the code, so do it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b4d978f77b0b69..ee9eb347890cd3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
if (unlikely(error)) {
/*
* Handle errors according to the type of
@@ -2506,7 +2507,6 @@ int write_cache_pages(struct address_space *mapping,
* we tagged for writeback prior to entering this loop.
*/
done_index = folio->index + nr;
- wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
--
2.39.2
* [PATCH 04/13] writeback: only update ->writeback_index for range_cyclic writeback
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (2 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 03/13] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
` (8 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
mapping->writeback_index is only used [1] as the starting point for
range_cyclic writeback, so there is no point in updating it for other
types of writeback.
[1] except for btrfs_defrag_file which does really odd things with
mapping->writeback_index. But btrfs doesn't use write_cache_pages at
all, so this isn't relevant here.
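The resulting policy can be modeled in a few lines (an illustrative sketch with invented names, not the kernel's types): only range_cyclic writeback consumes the saved index, so only range_cyclic writeback updates it, resuming where it stopped or wrapping to zero when the tagged range was fully written.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for the part of struct address_space that
 * this patch is about. */
struct mini_mapping {
	pgoff_t writeback_index;
};

static void finish_writeback(struct mini_mapping *mapping,
			     bool range_cyclic, bool stopped_early,
			     pgoff_t next_index)
{
	/* Nothing consumes writeback_index for ranged writeback. */
	if (!range_cyclic)
		return;
	/* Resume where we stopped, or wrap back to the start of the
	 * file once everything tagged has been written. */
	mapping->writeback_index = stopped_early ? next_index : 0;
}
```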
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ee9eb347890cd3..c7c494526bc650 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2403,7 +2403,6 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t index;
pgoff_t end; /* Inclusive */
pgoff_t done_index;
- int range_whole = 0;
xa_mark_t tag;
folio_batch_init(&fbatch);
@@ -2413,8 +2412,6 @@ int write_cache_pages(struct address_space *mapping,
} else {
index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
- if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
- range_whole = 1;
}
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
tag_pages_for_writeback(mapping, index, end);
@@ -2518,14 +2515,21 @@ int write_cache_pages(struct address_space *mapping,
}
/*
- * If we hit the last page and there is more work to be done: wrap
- * back the index back to the start of the file for the next
- * time we are called.
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
*/
- if (wbc->range_cyclic && !done)
- done_index = 0;
- if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
- mapping->writeback_index = done_index;
+ if (wbc->range_cyclic) {
+ if (done)
+ mapping->writeback_index = done_index;
+ else
+ mapping->writeback_index = 0;
+ }
return ret;
}
--
2.39.2
* [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (3 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 04/13] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-05 11:34 ` Jan Kara
2024-02-05 15:32 ` Brian Foster
2024-02-03 7:11 ` [PATCH 06/13] writeback: Factor folio_prepare_writeback() out of write_cache_pages() Christoph Hellwig
` (7 subsequent siblings)
12 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
Rework the way we deal with the cleanup after the writepage call.
First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
returns to get it out of the way of the actual error handling path.
Then split the handling into integrity vs non-integrity branches, and
return early using a goto for the non-integrity early loop condition.
This removes the need for the done and done_index local variables, and
avoids assigning the error to ret when we can just return error directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/page-writeback.c | 84 ++++++++++++++++++---------------------------
1 file changed, 33 insertions(+), 51 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c7c494526bc650..88b2c4c111c01b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2396,13 +2396,12 @@ int write_cache_pages(struct address_space *mapping,
void *data)
{
int ret = 0;
- int done = 0;
int error;
struct folio_batch fbatch;
+ struct folio *folio;
int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
- pgoff_t done_index;
xa_mark_t tag;
folio_batch_init(&fbatch);
@@ -2419,8 +2418,7 @@ int write_cache_pages(struct address_space *mapping,
} else {
tag = PAGECACHE_TAG_DIRTY;
}
- done_index = index;
- while (!done && (index <= end)) {
+ while (index <= end) {
int i;
nr_folios = filemap_get_folios_tag(mapping, &index, end,
@@ -2430,11 +2428,7 @@ int write_cache_pages(struct address_space *mapping,
break;
for (i = 0; i < nr_folios; i++) {
- struct folio *folio = fbatch.folios[i];
- unsigned long nr;
-
- done_index = folio->index;
-
+ folio = fbatch.folios[i];
folio_lock(folio);
/*
@@ -2469,45 +2463,32 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- ret = error;
- done_index = folio->index + nr;
- done = 1;
- break;
- }
- if (!ret)
- ret = error;
+ wbc->nr_to_write -= folio_nr_pages(folio);
+
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
}
/*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
+ * For integrity writeback we have to keep going until
+ * we have written all the folios we tagged for
+ * writeback above, even if we run past wbc->nr_to_write
+ * or encounter errors.
+ * We stash away the first error we encounter in
+ * wbc->saved_err so that it can be retrieved when we're
+ * done. This is because the file system may still have
+ * state to clear for each folio.
+ *
+ * For background writeback we exit as soon as we run
+ * past wbc->nr_to_write or encounter the first error.
*/
- done_index = folio->index + nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE) {
- done = 1;
- break;
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (error && !ret)
+ ret = error;
+ } else {
+ if (error || wbc->nr_to_write <= 0)
+ goto done;
}
}
folio_batch_release(&fbatch);
@@ -2524,14 +2505,15 @@ int write_cache_pages(struct address_space *mapping,
* of the file if we are called again, which can only happen due to
* -ENOMEM from the file system.
*/
- if (wbc->range_cyclic) {
- if (done)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
return ret;
+
+done:
+ folio_batch_release(&fbatch);
+ if (wbc->range_cyclic)
+ mapping->writeback_index = folio->index + folio_nr_pages(folio);
+ return error;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
* [PATCH 06/13] writeback: Factor folio_prepare_writeback() out of write_cache_pages()
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (4 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 07/13] writeback: Factor writeback_get_batch() " Christoph Hellwig
` (6 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Reduce write_cache_pages() by about 30 lines; much of it is commentary,
but it all bundles nicely into an obvious function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: rename should_writeback_folio to folio_prepare_writeback based on
a comment from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 61 +++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 88b2c4c111c01b..949193624baa38 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,6 +2360,38 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
+static bool folio_prepare_writeback(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ /*
+ * Folio truncated or invalidated. We can freely skip it then,
+ * even for data integrity operations: the folio has disappeared
+ * concurrently, so there could be no real expectation of this
+ * data integrity operation even if there is now a new, dirty
+ * folio at the same pagecache index.
+ */
+ if (unlikely(folio->mapping != mapping))
+ return false;
+
+ /*
+ * Did somebody else write it for us?
+ */
+ if (!folio_test_dirty(folio))
+ return false;
+
+ if (folio_test_writeback(folio)) {
+ if (wbc->sync_mode == WB_SYNC_NONE)
+ return false;
+ folio_wait_writeback(folio);
+ }
+ BUG_ON(folio_test_writeback(folio));
+
+ if (!folio_clear_dirty_for_io(folio))
+ return false;
+
+ return true;
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2430,38 +2462,13 @@ int write_cache_pages(struct address_space *mapping,
for (i = 0; i < nr_folios; i++) {
folio = fbatch.folios[i];
folio_lock(folio);
-
- /*
- * Page truncated or invalidated. We can freely skip it
- * then, even for data integrity operations: the page
- * has disappeared concurrently, so there could be no
- * real expectation of this data integrity operation
- * even if there is now a new, dirty page at the same
- * pagecache address.
- */
- if (unlikely(folio->mapping != mapping)) {
-continue_unlock:
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
continue;
}
- if (!folio_test_dirty(folio)) {
- /* someone wrote it for us */
- goto continue_unlock;
- }
-
- if (folio_test_writeback(folio)) {
- if (wbc->sync_mode != WB_SYNC_NONE)
- folio_wait_writeback(folio);
- else
- goto continue_unlock;
- }
-
- BUG_ON(folio_test_writeback(folio));
- if (!folio_clear_dirty_for_io(folio))
- goto continue_unlock;
-
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+
error = writepage(folio, wbc, data);
wbc->nr_to_write -= folio_nr_pages(folio);
--
2.39.2
* [PATCH 07/13] writeback: Factor writeback_get_batch() out of write_cache_pages()
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (5 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 06/13] writeback: Factor folio_prepare_writeback() out of write_cache_pages() Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 08/13] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
` (5 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
This simple helper will be the basis of the writeback iterator.
To make this work, we need to remember the current index
and end positions in writeback_control.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: heavily rebased, add helpers to get the tag and end index,
don't keep the end index in struct writeback_control]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 6 ++++
mm/page-writeback.c | 60 +++++++++++++++++++++++++--------------
2 files changed, 44 insertions(+), 22 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4b8cf9e4810bad..f67b3ea866a0fb 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -11,6 +11,7 @@
#include <linux/flex_proportions.h>
#include <linux/backing-dev-defs.h>
#include <linux/blk_types.h>
+#include <linux/pagevec.h>
struct bio;
@@ -40,6 +41,7 @@ enum writeback_sync_modes {
* in a manner such that unspecified fields are set to zero.
*/
struct writeback_control {
+ /* public fields that can be set and/or consumed by the caller: */
long nr_to_write; /* Write this many pages, and decrement
this for each page written */
long pages_skipped; /* Pages which were not written */
@@ -77,6 +79,10 @@ struct writeback_control {
*/
struct swap_iocb **swap_plug;
+ /* internal fields used by the ->writepages implementation: */
+ struct folio_batch fbatch;
+ pgoff_t index;
+
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
struct inode *inode; /* inode being written out */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 949193624baa38..23363ed712f646 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2392,6 +2392,29 @@ static bool folio_prepare_writeback(struct address_space *mapping,
return true;
}
+static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
+{
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ return PAGECACHE_TAG_TOWRITE;
+ return PAGECACHE_TAG_DIRTY;
+}
+
+static pgoff_t wbc_end(struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ return -1;
+ return wbc->range_end >> PAGE_SHIFT;
+}
+
+static void writeback_get_batch(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2429,38 +2452,32 @@ int write_cache_pages(struct address_space *mapping,
{
int ret = 0;
int error;
- struct folio_batch fbatch;
struct folio *folio;
- int nr_folios;
- pgoff_t index;
pgoff_t end; /* Inclusive */
- xa_mark_t tag;
- folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
- index = mapping->writeback_index; /* prev offset */
+ wbc->index = mapping->writeback_index; /* prev offset */
end = -1;
} else {
- index = wbc->range_start >> PAGE_SHIFT;
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
}
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
- tag_pages_for_writeback(mapping, index, end);
- tag = PAGECACHE_TAG_TOWRITE;
- } else {
- tag = PAGECACHE_TAG_DIRTY;
- }
- while (index <= end) {
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, end);
+
+ folio_batch_init(&wbc->fbatch);
+
+ while (wbc->index <= end) {
int i;
- nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &fbatch);
+ writeback_get_batch(mapping, wbc);
- if (nr_folios == 0)
+ if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < nr_folios; i++) {
- folio = fbatch.folios[i];
+ for (i = 0; i < wbc->fbatch.nr; i++) {
+ folio = wbc->fbatch.folios[i];
+
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
@@ -2498,8 +2515,6 @@ int write_cache_pages(struct address_space *mapping,
goto done;
}
}
- folio_batch_release(&fbatch);
- cond_resched();
}
/*
@@ -2512,12 +2527,13 @@ int write_cache_pages(struct address_space *mapping,
* of the file if we are called again, which can only happen due to
* -ENOMEM from the file system.
*/
+ folio_batch_release(&wbc->fbatch);
if (wbc->range_cyclic)
mapping->writeback_index = 0;
return ret;
done:
- folio_batch_release(&fbatch);
+ folio_batch_release(&wbc->fbatch);
if (wbc->range_cyclic)
mapping->writeback_index = folio->index + folio_nr_pages(folio);
return error;
--
2.39.2
* [PATCH 08/13] writeback: Simplify the loops in write_cache_pages()
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (6 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 07/13] writeback: Factor writeback_get_batch() " Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 09/13] pagevec: Add ability to iterate a queue Christoph Hellwig
` (4 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Collapse the two nested loops into one. This is needed as a step
towards turning this into an iterator.
Note that this drops the "index <= end" check in the previous outer loop
and just relies on filemap_get_folios_tag() to return 0 entries when
index > end. This actually has a subtle implication when end == -1
because then the returned index will be -1 as well and thus if there is
a page present at index -1, we could loop indefinitely. But as the
comment in filemap_get_folios_tag() documents, this is already broken
anyway, so we should not worry about it here either. The fix for that
would probably be a change to the filemap_get_folios_tag() calling
convention.
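The wraparound hazard the commit log describes is pure unsigned arithmetic and can be demonstrated directly (a stand-alone sketch, not kernel code): with end == -1, pgoff_t being unsigned makes "index <= end" vacuously true, and an index of -1 wraps to 0 when incremented.

```c
#include <assert.h>

/* pgoff_t is an unsigned type in the kernel; model it the same way. */
typedef unsigned long pgoff_t;

/* The range check the dropped outer loop used to perform. */
static int index_in_range(pgoff_t index, pgoff_t end)
{
	return index <= end;
}
```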
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: updated the commit log based on feedback from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 75 ++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 39 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 23363ed712f646..d7ab42def43035 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2454,6 +2454,7 @@ int write_cache_pages(struct address_space *mapping,
int error;
struct folio *folio;
pgoff_t end; /* Inclusive */
+ int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2467,53 +2468,49 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
- while (wbc->index <= end) {
- int i;
-
- writeback_get_batch(mapping, wbc);
-
+ for (;;) {
+ if (i == wbc->fbatch.nr) {
+ writeback_get_batch(mapping, wbc);
+ i = 0;
+ }
if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < wbc->fbatch.nr; i++) {
- folio = wbc->fbatch.folios[i];
+ folio = wbc->fbatch.folios[i++];
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
+ folio_lock(folio);
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
+ folio_unlock(folio);
+ continue;
+ }
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- error = writepage(folio, wbc, data);
- wbc->nr_to_write -= folio_nr_pages(folio);
+ error = writepage(folio, wbc, data);
+ wbc->nr_to_write -= folio_nr_pages(folio);
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
- /*
- * For integrity writeback we have to keep going until
- * we have written all the folios we tagged for
- * writeback above, even if we run past wbc->nr_to_write
- * or encounter errors.
- * We stash away the first error we encounter in
- * wbc->saved_err so that it can be retrieved when we're
- * done. This is because the file system may still have
- * state to clear for each folio.
- *
- * For background writeback we exit as soon as we run
- * past wbc->nr_to_write or encounter the first error.
- */
- if (wbc->sync_mode == WB_SYNC_ALL) {
- if (error && !ret)
- ret = error;
- } else {
- if (error || wbc->nr_to_write <= 0)
- goto done;
- }
+ /*
+ * For integrity writeback we have to keep going until we have
+ * written all the folios we tagged for writeback above, even if
+ * we run past wbc->nr_to_write or encounter errors.
+ * We stash away the first error we encounter in wbc->saved_err
+ * so that it can be retrieved when we're done. This is because
+ * the file system may still have state to clear for each folio.
+ *
+ * For background writeback we exit as soon as we run past
+ * wbc->nr_to_write or encounter the first error.
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (error && !ret)
+ ret = error;
+ } else {
+ if (error || wbc->nr_to_write <= 0)
+ goto done;
}
}
--
2.39.2
* [PATCH 09/13] pagevec: Add ability to iterate a queue
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (7 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 08/13] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 10/13] writeback: Use the folio_batch queue iterator Christoph Hellwig
` (3 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Add a loop counter inside the folio_batch to let us iterate from 0-nr
instead of decrementing nr and treating the batch as a stack. It would
generate some very weird and suboptimal I/O patterns for page writeback
to iterate over the batch as a stack.
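The queue semantics can be shown with a simplified stand-in (illustrative types, mirroring the folio_batch_next() added below but not identical to it): a cursor walks the array in insertion order, preserving ascending file offsets, instead of popping entries off the end like a stack.

```c
#include <assert.h>
#include <stddef.h>

/* Miniature of a folio_batch-style queue.  BATCH_SIZE and the int
 * payload are stand-ins for PAGEVEC_SIZE and struct folio *. */
#define BATCH_SIZE 31

struct mini_batch {
	unsigned char nr;  /* number of valid entries */
	unsigned char i;   /* queue cursor, 0..nr */
	int items[BATCH_SIZE];
};

/* Return the next entry in insertion order, or NULL when drained. */
static int *mini_batch_next(struct mini_batch *b)
{
	if (b->i == b->nr)
		return NULL;
	return &b->items[b->i++];
}
```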
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/pagevec.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 87cc678adc850b..fcc06c300a72c3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,6 +27,7 @@ struct folio;
*/
struct folio_batch {
unsigned char nr;
+ unsigned char i;
bool percpu_pvec_drained;
struct folio *folios[PAGEVEC_SIZE];
};
@@ -40,12 +41,14 @@ struct folio_batch {
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
fbatch->percpu_pvec_drained = false;
}
static inline void folio_batch_reinit(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
}
static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
}
+/**
+ * folio_batch_next - Return the next folio to process.
+ * @fbatch: The folio batch being processed.
+ *
+ * Use this function to implement a queue of folios.
+ *
+ * Return: The next folio in the queue, or NULL if the queue is empty.
+ */
+static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
+{
+ if (fbatch->i == fbatch->nr)
+ return NULL;
+ return fbatch->folios[fbatch->i++];
+}
+
void __folio_batch_release(struct folio_batch *pvec);
static inline void folio_batch_release(struct folio_batch *fbatch)
--
2.39.2
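The queue-vs-stack distinction above can be sketched as a small userspace toy model — this is not kernel code, just a simplified imitation of the `folio_batch` members and helpers added by the patch (`PAGEVEC_SIZE`, field widths, and the `struct folio` contents are trimmed down for illustration):

```c
#include <stddef.h>

/* Userspace toy model of the folio_batch queue iterator from the patch.
 * The new 'i' cursor lets callers consume folios in insertion order
 * (a queue) instead of popping from the end (a stack). */
#define PAGEVEC_SIZE 31

struct folio { unsigned long index; };

struct folio_batch {
	unsigned char nr;	/* number of folios stored */
	unsigned char i;	/* next folio to hand out */
	struct folio *folios[PAGEVEC_SIZE];
};

static void folio_batch_init(struct folio_batch *fbatch)
{
	fbatch->nr = 0;
	fbatch->i = 0;
}

static unsigned folio_batch_add(struct folio_batch *fbatch,
		struct folio *folio)
{
	fbatch->folios[fbatch->nr++] = folio;
	return PAGEVEC_SIZE - fbatch->nr;
}

/* Queue semantics: folios come back in ascending insertion order,
 * and NULL marks the end of the batch. */
static struct folio *folio_batch_next(struct folio_batch *fbatch)
{
	if (fbatch->i == fbatch->nr)
		return NULL;
	return fbatch->folios[fbatch->i++];
}
```

Because page writeback wants ascending `->index` order for sane I/O patterns, the queue order is exactly what the batch producer (`filemap_get_folios_tag()`) already provides.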
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/13] writeback: Use the folio_batch queue iterator
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (8 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 09/13] pagevec: Add ability to iterate a queue Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 11/13] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() Christoph Hellwig
` (2 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Instead of keeping our own local iterator variable, use the one just
added to folio_batch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d7ab42def43035..095ba4db9dcc17 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2406,13 +2406,21 @@ static pgoff_t wbc_end(struct writeback_control *wbc)
return wbc->range_end >> PAGE_SHIFT;
}
-static void writeback_get_batch(struct address_space *mapping,
+static struct folio *writeback_get_folio(struct address_space *mapping,
struct writeback_control *wbc)
{
- folio_batch_release(&wbc->fbatch);
- cond_resched();
- filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
- wbc_to_tag(wbc), &wbc->fbatch);
+ struct folio *folio;
+
+ folio = folio_batch_next(&wbc->fbatch);
+ if (!folio) {
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+ folio = folio_batch_next(&wbc->fbatch);
+ }
+
+ return folio;
}
/**
@@ -2454,7 +2462,6 @@ int write_cache_pages(struct address_space *mapping,
int error;
struct folio *folio;
pgoff_t end; /* Inclusive */
- int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2469,15 +2476,10 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
for (;;) {
- if (i == wbc->fbatch.nr) {
- writeback_get_batch(mapping, wbc);
- i = 0;
- }
- if (wbc->fbatch.nr == 0)
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio)
break;
- folio = wbc->fbatch.folios[i++];
-
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
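The refill-on-empty shape of `writeback_get_folio()` generalizes to any chunked producer. A minimal userspace sketch, with a plain int array standing in for the page cache and hypothetical helper names (`refill`, `next_item`) that do not exist in the kernel:

```c
#include <stddef.h>

#define BATCH 4

struct source { const int *data; size_t len, pos; };
struct batch  { int items[BATCH]; unsigned nr, i; };

/* Stand-in for filemap_get_folios_tag(): copy up to BATCH items
 * from the backing source into the batch and reset the cursor. */
static void refill(struct source *src, struct batch *b)
{
	b->nr = b->i = 0;
	while (b->nr < BATCH && src->pos < src->len)
		b->items[b->nr++] = src->data[src->pos++];
}

/* Stand-in for writeback_get_folio(): hand out the next item,
 * refilling only when the batch runs dry; -1 means end of data
 * (the kernel version returns NULL instead). */
static int next_item(struct source *src, struct batch *b)
{
	if (b->i == b->nr) {
		refill(src, b);
		if (b->nr == 0)
			return -1;
	}
	return b->items[b->i++];
}
```

The caller sees a flat stream of items and never needs its own batch index — which is exactly what lets the patch delete the local `i` from `write_cache_pages()`.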
* [PATCH 11/13] writeback: Move the folio_prepare_writeback loop out of write_cache_pages()
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (9 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 10/13] writeback: Use the folio_batch queue iterator Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-03 7:11 ` [PATCH 12/13] writeback: add a writeback iterator Christoph Hellwig
2024-02-03 7:11 ` [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
12 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel, Jan Kara, Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Move the loop for should-we-write-this-folio to writeback_get_folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: folded the loop into the existing helper instead of a separate one
as suggested by Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 095ba4db9dcc17..3abb053e70580e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2411,6 +2411,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
{
struct folio *folio;
+retry:
folio = folio_batch_next(&wbc->fbatch);
if (!folio) {
folio_batch_release(&wbc->fbatch);
@@ -2418,8 +2419,17 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
+ if (!folio)
+ return NULL;
+ }
+
+ folio_lock(folio);
+ if (unlikely(!folio_prepare_writeback(mapping, wbc, folio))) {
+ folio_unlock(folio);
+ goto retry;
}
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
return folio;
}
@@ -2480,14 +2490,6 @@ int write_cache_pages(struct address_space *mapping,
if (!folio)
break;
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
-
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
-
error = writepage(folio, wbc, data);
wbc->nr_to_write -= folio_nr_pages(folio);
--
2.39.2
* [PATCH 12/13] writeback: add a writeback iterator
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (10 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 11/13] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-05 11:39 ` Jan Kara
2024-02-05 15:33 ` Brian Foster
2024-02-03 7:11 ` [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
12 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
Refactor the code left in write_cache_pages into an iterator that the
file system can call to get the next folio for a writeback operation:
struct folio *folio = NULL;
while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
error = <do per-folio writeback>;
}
The twist here is that the error value is passed by reference, so that
the iterator can restore it when breaking out of the loop.
Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
iterator and is just kept in the write_cache_pages legacy wrapper,
in preparation for eventually killing it off.
Heavily based on a for_each* based iterator from Matthew Wilcox.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/writeback.h | 4 +
mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
2 files changed, 118 insertions(+), 78 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index f67b3ea866a0fb..9845cb62e40b2d 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -82,6 +82,7 @@ struct writeback_control {
/* internal fields used by the ->writepages implementation: */
struct folio_batch fbatch;
pgoff_t index;
+ int saved_err;
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
@@ -366,6 +367,9 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error);
+
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3abb053e70580e..5fe4cdb7dbd61a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
}
/**
- * tag_pages_for_writeback - tag pages to be written by write_cache_pages
+ * tag_pages_for_writeback - tag pages to be written by writeback
* @mapping: address space structure to write
* @start: starting page index
* @end: ending page index (inclusive)
*
* This function scans the page range from @start to @end (inclusive) and tags
- * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
- * that write_cache_pages (or whoever calls this function) will then use
- * TOWRITE tag to identify pages eligible for writeback. This mechanism is
- * used to avoid livelocking of writeback by a process steadily creating new
- * dirty pages in the file (thus it is important for this function to be quick
- * so that it can tag pages faster than a dirtying process can create them).
+ * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
+ * can then use the TOWRITE tag to identify pages eligible for writeback.
+ * This mechanism is used to avoid livelocking of writeback by a process
+ * steadily creating new dirty pages in the file (thus it is important for this
+ * function to be quick so that it can tag pages faster than a dirtying process
+ * can create them).
*/
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end)
@@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
}
/**
- * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
+ * writeback_iter - iterate folio of a mapping for writeback
* @mapping: address space structure to write
- * @wbc: subtract the number of written pages from *@wbc->nr_to_write
- * @writepage: function called for each page
- * @data: data passed to writepage function
+ * @wbc: writeback context
+ * @folio: previously iterated folio (%NULL to start)
+ * @error: in-out pointer for writeback errors (see below)
*
- * If a page is already under I/O, write_cache_pages() skips it, even
- * if it's dirty. This is desirable behaviour for memory-cleaning writeback,
- * but it is INCORRECT for data-integrity system calls such as fsync(). fsync()
- * and msync() need to guarantee that all the data which was dirty at the time
- * the call was made get new I/O started against them. If wbc->sync_mode is
- * WB_SYNC_ALL then we were called for data integrity and we must wait for
- * existing IO to complete.
- *
- * To avoid livelocks (when other process dirties new pages), we first tag
- * pages which should be written back with TOWRITE tag and only then start
- * writing them. For data-integrity sync we have to be careful so that we do
- * not miss some pages (e.g., because some other process has cleared TOWRITE
- * tag we set). The rule we follow is that TOWRITE tag can be cleared only
- * by the process clearing the DIRTY tag (and submitting the page for IO).
- *
- * To avoid deadlocks between range_cyclic writeback and callers that hold
- * pages in PageWriteback to aggregate IO until write_cache_pages() returns,
- * we do not loop back to the start of the file. Doing so causes a page
- * lock/page writeback access order inversion - we should only ever lock
- * multiple pages in ascending page->index order, and looping back to the start
- * of the file violates that rule and causes deadlocks.
+ * This function returns the next folio for the writeback operation described by
+ * @wbc on @mapping and should be called in a while loop in the ->writepages
+ * implementation.
*
- * Return: %0 on success, negative error code otherwise
+ * To start the writeback operation, %NULL is passed in the @folio argument, and
+ * for every subsequent iteration the folio returned previously should be passed
+ * back in.
+ *
+ * If there was an error in the per-folio writeback inside the writeback_iter()
+ * loop, @error should be set to the error value.
+ *
+ * Once the writeback described in @wbc has finished, this function will return
+ * %NULL and if there was an error in any iteration restore it to @error.
+ *
+ * Note: callers should not manually break out of the loop using break or goto
+ * but must keep calling writeback_iter() until it returns %NULL.
+ *
+ * Return: the folio to write or %NULL if the loop is done.
*/
-int write_cache_pages(struct address_space *mapping,
- struct writeback_control *wbc, writepage_t writepage,
- void *data)
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error)
{
- int ret = 0;
- int error;
- struct folio *folio;
- pgoff_t end; /* Inclusive */
-
- if (wbc->range_cyclic) {
- wbc->index = mapping->writeback_index; /* prev offset */
- end = -1;
- } else {
- wbc->index = wbc->range_start >> PAGE_SHIFT;
- end = wbc->range_end >> PAGE_SHIFT;
- }
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, end);
-
- folio_batch_init(&wbc->fbatch);
+ if (!folio) {
+ folio_batch_init(&wbc->fbatch);
+ wbc->saved_err = *error = 0;
- for (;;) {
- folio = writeback_get_folio(mapping, wbc);
- if (!folio)
- break;
+ /*
+ * For range cyclic writeback we remember where we stopped so
+ * that we can continue where we stopped.
+ *
+ * For non-cyclic writeback we always start at the beginning of
+ * the passed in range.
+ */
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index;
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
- error = writepage(folio, wbc, data);
+ /*
+ * To avoid livelocks when other processes dirty new pages, we
+ * first tag pages which should be written back and only then
+ * start writing them.
+ *
+ * For data-integrity writeback we have to be careful so that we
+ * do not miss some pages (e.g., because some other process has
+ * cleared the TOWRITE tag we set). The rule we follow is that
+ * TOWRITE tag can be cleared only by the process clearing the
+ * DIRTY tag (and submitting the page for I/O).
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index,
+ wbc_end(wbc));
+ } else {
wbc->nr_to_write -= folio_nr_pages(folio);
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
+ WARN_ON_ONCE(*error > 0);
/*
* For integrity writeback we have to keep going until we have
@@ -2510,33 +2509,70 @@ int write_cache_pages(struct address_space *mapping,
* wbc->nr_to_write or encounter the first error.
*/
if (wbc->sync_mode == WB_SYNC_ALL) {
- if (error && !ret)
- ret = error;
+ if (*error && !wbc->saved_err)
+ wbc->saved_err = *error;
} else {
- if (error || wbc->nr_to_write <= 0)
+ if (*error || wbc->nr_to_write <= 0)
goto done;
}
}
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- folio_batch_release(&wbc->fbatch);
- if (wbc->range_cyclic)
- mapping->writeback_index = 0;
- return ret;
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio) {
+ /*
+ * To avoid deadlocks between range_cyclic writeback and callers
+ * that hold pages in PageWriteback to aggregate I/O until
+ * the writeback iteration finishes, we do not loop back to the
+ * start of the file. Doing so causes a page lock/page
+ * writeback access order inversion - we should only ever lock
+ * multiple pages in ascending page->index order, and looping
+ * back to the start of the file violates that rule and causes
+ * deadlocks.
+ */
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
+
+ /*
+ * Return the first error we encountered (if there was any) to
+ * the caller.
+ */
+ *error = wbc->saved_err;
+ }
+ return folio;
done:
folio_batch_release(&wbc->fbatch);
if (wbc->range_cyclic)
mapping->writeback_index = folio->index + folio_nr_pages(folio);
+ return NULL;
+}
+
+/**
+ * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
+ * @mapping: address space structure to write
+ * @wbc: subtract the number of written pages from *@wbc->nr_to_write
+ * @writepage: function called for each page
+ * @data: data passed to writepage function
+ *
+ * Return: %0 on success, negative error code otherwise
+ *
+ * Note: please use writeback_iter() instead.
+ */
+int write_cache_pages(struct address_space *mapping,
+ struct writeback_control *wbc, writepage_t writepage,
+ void *data)
+{
+ struct folio *folio = NULL;
+ int error;
+
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
+ error = writepage(folio, wbc, data);
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+ }
+
return error;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
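The pass-the-folio-back-in calling convention and the error-restore twist can be modeled in userspace. This is a deliberately simplified imitation of `writeback_iter()` — an int array replaces the mapping, `struct wb_state` replaces `struct writeback_control`, and `writeback_iter_model` is a hypothetical name — but the first-call initialization, first-error stashing, and restore-on-NULL logic mirror the patch:

```c
#include <stddef.h>

struct wb_state { int idx, nitems, saved_err; };

/* Model of the writeback_iter() protocol: pass NULL to start, pass the
 * previously returned item plus your error back on every later call,
 * and read *error only after the iterator returns NULL. */
static int *writeback_iter_model(struct wb_state *st, int *items,
		int nitems, int *prev, int *error)
{
	if (!prev) {				/* first call: initialise */
		st->idx = 0;
		st->nitems = nitems;
		st->saved_err = *error = 0;
	} else if (*error && !st->saved_err) {
		st->saved_err = *error;		/* remember first failure */
	}
	if (st->idx == st->nitems) {
		*error = st->saved_err;		/* restore it on the way out */
		return NULL;
	}
	return &items[st->idx++];
}
```

Note how the caller's loop body may clobber `error` on every iteration; because the iterator stashes the first failure internally, the value left behind after the final NULL is still the first error, matching the integrity-writeback semantics described in the kernel-doc.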
* [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages()
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
` (11 preceding siblings ...)
2024-02-03 7:11 ` [PATCH 12/13] writeback: add a writeback iterator Christoph Hellwig
@ 2024-02-03 7:11 ` Christoph Hellwig
2024-02-05 11:43 ` Jan Kara
2024-02-05 15:34 ` Brian Foster
12 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-03 7:11 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Use the new writeback_iter() directly instead of indirecting
through a callback.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: ported to the while based iter style]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/page-writeback.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 5fe4cdb7dbd61a..53ff2d8219ddb6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2577,13 +2577,25 @@ int write_cache_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(write_cache_pages);
-static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
- void *data)
+static int writeback_use_writepage(struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(&folio->page, wbc);
- mapping_set_error(mapping, ret);
- return ret;
+ struct folio *folio = NULL;
+ struct blk_plug plug;
+ int err;
+
+ blk_start_plug(&plug);
+ while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
+ err = mapping->a_ops->writepage(&folio->page, wbc);
+ mapping_set_error(mapping, err);
+ if (err == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ err = 0;
+ }
+ }
+ blk_finish_plug(&plug);
+
+ return err;
}
int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
@@ -2599,12 +2611,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
if (mapping->a_ops->writepages) {
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
- ret = write_cache_pages(mapping, wbc, writepage_cb,
- mapping);
- blk_finish_plug(&plug);
+ ret = writeback_use_writepage(mapping, wbc);
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2
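Both the legacy wrapper and the new `writeback_use_writepage()` have to translate the magic `AOP_WRITEPAGE_ACTIVATE` return into "no error, but unlock the folio". That mapping, in isolation, looks like this sketch (the constant's value matches `include/linux/fs.h`; `normalize_writepage_err` is a hypothetical helper for illustration, not a kernel function):

```c
/* AOP_WRITEPAGE_ACTIVATE is a positive in-band signal, distinct from
 * real (negative) errors: the folio should be re-activated rather than
 * written, and the caller must unlock it itself. */
#define AOP_WRITEPAGE_ACTIVATE 0x80000

/* Collapse the magic value to success and report via *unlocked whether
 * the caller still owes a folio_unlock(); real errors pass through. */
static int normalize_writepage_err(int err, int *must_unlock)
{
	*must_unlock = 0;
	if (err == AOP_WRITEPAGE_ACTIVATE) {
		*must_unlock = 1;
		return 0;
	}
	return err;
}
```

Keeping this quirk out of `writeback_iter()` itself is what allows the iterator's error channel to carry only real negative errors (hence the `WARN_ON_ONCE(*error > 0)` in the patch).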
* Re: [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages
2024-02-03 7:11 ` [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
@ 2024-02-05 11:34 ` Jan Kara
2024-02-05 15:32 ` Brian Foster
1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2024-02-05 11:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat 03-02-24 08:11:39, Christoph Hellwig wrote:
> Rework the way we deal with the cleanup after the writepage call.
>
> First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
> returns to get it out of the way of the actual error handling path.
>
> Then split the handling into integrity vs non-integrity branches first,
> and return early using a goto for the non-integrity early loop condition
> to remove the need for the done and done_index local variables, and for
> assigning the error to ret when we can just return error directly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> mm/page-writeback.c | 84 ++++++++++++++++++---------------------------
> 1 file changed, 33 insertions(+), 51 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c7c494526bc650..88b2c4c111c01b 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2396,13 +2396,12 @@ int write_cache_pages(struct address_space *mapping,
> void *data)
> {
> int ret = 0;
> - int done = 0;
> int error;
> struct folio_batch fbatch;
> + struct folio *folio;
> int nr_folios;
> pgoff_t index;
> pgoff_t end; /* Inclusive */
> - pgoff_t done_index;
> xa_mark_t tag;
>
> folio_batch_init(&fbatch);
> @@ -2419,8 +2418,7 @@ int write_cache_pages(struct address_space *mapping,
> } else {
> tag = PAGECACHE_TAG_DIRTY;
> }
> - done_index = index;
> - while (!done && (index <= end)) {
> + while (index <= end) {
> int i;
>
> nr_folios = filemap_get_folios_tag(mapping, &index, end,
> @@ -2430,11 +2428,7 @@ int write_cache_pages(struct address_space *mapping,
> break;
>
> for (i = 0; i < nr_folios; i++) {
> - struct folio *folio = fbatch.folios[i];
> - unsigned long nr;
> -
> - done_index = folio->index;
> -
> + folio = fbatch.folios[i];
> folio_lock(folio);
>
> /*
> @@ -2469,45 +2463,32 @@ int write_cache_pages(struct address_space *mapping,
>
> trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> error = writepage(folio, wbc, data);
> - nr = folio_nr_pages(folio);
> - wbc->nr_to_write -= nr;
> - if (unlikely(error)) {
> - /*
> - * Handle errors according to the type of
> - * writeback. There's no need to continue for
> - * background writeback. Just push done_index
> - * past this page so media errors won't choke
> - * writeout for the entire file. For integrity
> - * writeback, we must process the entire dirty
> - * set regardless of errors because the fs may
> - * still have state to clear for each page. In
> - * that case we continue processing and return
> - * the first error.
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - } else if (wbc->sync_mode != WB_SYNC_ALL) {
> - ret = error;
> - done_index = folio->index + nr;
> - done = 1;
> - break;
> - }
> - if (!ret)
> - ret = error;
> + wbc->nr_to_write -= folio_nr_pages(folio);
> +
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> }
>
> /*
> - * We stop writing back only if we are not doing
> - * integrity sync. In case of integrity sync we have to
> - * keep going until we have written all the pages
> - * we tagged for writeback prior to entering this loop.
> + * For integrity writeback we have to keep going until
> + * we have written all the folios we tagged for
> + * writeback above, even if we run past wbc->nr_to_write
> + * or encounter errors.
> + * We stash away the first error we encounter in
> + * wbc->saved_err so that it can be retrieved when we're
> + * done. This is because the file system may still have
> + * state to clear for each folio.
> + *
> + * For background writeback we exit as soon as we run
> + * past wbc->nr_to_write or encounter the first error.
> */
> - done_index = folio->index + nr;
> - if (wbc->nr_to_write <= 0 &&
> - wbc->sync_mode == WB_SYNC_NONE) {
> - done = 1;
> - break;
> + if (wbc->sync_mode == WB_SYNC_ALL) {
> + if (error && !ret)
> + ret = error;
> + } else {
> + if (error || wbc->nr_to_write <= 0)
> + goto done;
> }
> }
> folio_batch_release(&fbatch);
> @@ -2524,14 +2505,15 @@ int write_cache_pages(struct address_space *mapping,
> * of the file if we are called again, which can only happen due to
> * -ENOMEM from the file system.
> */
> - if (wbc->range_cyclic) {
> - if (done)
> - mapping->writeback_index = done_index;
> - else
> - mapping->writeback_index = 0;
> - }
> -
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
> return ret;
> +
> +done:
> + folio_batch_release(&fbatch);
> + if (wbc->range_cyclic)
> + mapping->writeback_index = folio->index + folio_nr_pages(folio);
> + return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
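The reworked termination rule the patch introduces — integrity sync records the first error and keeps going, background writeback bails at the first error or when the budget is spent — can be condensed into a tiny userspace predicate. A sketch only; `should_stop` and the two-field `struct wb` are illustrative stand-ins, not kernel interfaces:

```c
enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

struct wb {
	enum sync_mode sync_mode;
	long nr_to_write;	/* remaining write budget */
	int first_err;		/* first error seen, 0 if none */
};

/* Returns nonzero when the writeback loop should terminate after the
 * current folio, per the WB_SYNC_ALL vs WB_SYNC_NONE rules. */
static int should_stop(struct wb *wbc, int error)
{
	if (wbc->sync_mode == WB_SYNC_ALL) {
		if (error && !wbc->first_err)
			wbc->first_err = error;
		return 0;	/* must process every tagged folio */
	}
	return error || wbc->nr_to_write <= 0;
}
```

The asymmetry exists because a data-integrity sync tagged its working set up front and the filesystem may still have per-folio state to clear, while background writeback has no such obligation.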
* Re: [PATCH 12/13] writeback: add a writeback iterator
2024-02-03 7:11 ` [PATCH 12/13] writeback: add a writeback iterator Christoph Hellwig
@ 2024-02-05 11:39 ` Jan Kara
2024-02-05 15:33 ` Brian Foster
1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2024-02-05 11:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat 03-02-24 08:11:46, Christoph Hellwig wrote:
> Refactor the code left in write_cache_pages into an iterator that the
> file system can call to get the next folio for a writeback operation:
>
> struct folio *folio = NULL;
>
> while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> error = <do per-folio writeback>;
> }
>
> The twist here is that the error value is passed by reference, so that
> the iterator can restore it when breaking out of the loop.
>
> Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
> iterator and is just kept in the write_cache_pages legacy wrapper,
> in preparation for eventually killing it off.
>
> Heavily based on a for_each* based iterator from Matthew Wilcox.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> include/linux/writeback.h | 4 +
> mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
> 2 files changed, 118 insertions(+), 78 deletions(-)
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index f67b3ea866a0fb..9845cb62e40b2d 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -82,6 +82,7 @@ struct writeback_control {
> /* internal fields used by the ->writepages implementation: */
> struct folio_batch fbatch;
> pgoff_t index;
> + int saved_err;
>
> #ifdef CONFIG_CGROUP_WRITEBACK
> struct bdi_writeback *wb; /* wb this writeback is issued under */
> @@ -366,6 +367,9 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
>
> bool wb_over_bg_thresh(struct bdi_writeback *wb);
>
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error);
> +
> typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
> void *data);
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 3abb053e70580e..5fe4cdb7dbd61a 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
> }
>
> /**
> - * tag_pages_for_writeback - tag pages to be written by write_cache_pages
> + * tag_pages_for_writeback - tag pages to be written by writeback
> * @mapping: address space structure to write
> * @start: starting page index
> * @end: ending page index (inclusive)
> *
> * This function scans the page range from @start to @end (inclusive) and tags
> - * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
> - * that write_cache_pages (or whoever calls this function) will then use
> - * TOWRITE tag to identify pages eligible for writeback. This mechanism is
> - * used to avoid livelocking of writeback by a process steadily creating new
> - * dirty pages in the file (thus it is important for this function to be quick
> - * so that it can tag pages faster than a dirtying process can create them).
> + * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
> + * can then use the TOWRITE tag to identify pages eligible for writeback.
> + * This mechanism is used to avoid livelocking of writeback by a process
> + * steadily creating new dirty pages in the file (thus it is important for this
> + * function to be quick so that it can tag pages faster than a dirtying process
> + * can create them).
> */
> void tag_pages_for_writeback(struct address_space *mapping,
> pgoff_t start, pgoff_t end)
> @@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> }
>
> /**
> - * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> + * writeback_iter - iterate folio of a mapping for writeback
> * @mapping: address space structure to write
> - * @wbc: subtract the number of written pages from *@wbc->nr_to_write
> - * @writepage: function called for each page
> - * @data: data passed to writepage function
> + * @wbc: writeback context
> + * @folio: previously iterated folio (%NULL to start)
> + * @error: in-out pointer for writeback errors (see below)
> *
> - * If a page is already under I/O, write_cache_pages() skips it, even
> - * if it's dirty. This is desirable behaviour for memory-cleaning writeback,
> - * but it is INCORRECT for data-integrity system calls such as fsync(). fsync()
> - * and msync() need to guarantee that all the data which was dirty at the time
> - * the call was made get new I/O started against them. If wbc->sync_mode is
> - * WB_SYNC_ALL then we were called for data integrity and we must wait for
> - * existing IO to complete.
> - *
> - * To avoid livelocks (when other process dirties new pages), we first tag
> - * pages which should be written back with TOWRITE tag and only then start
> - * writing them. For data-integrity sync we have to be careful so that we do
> - * not miss some pages (e.g., because some other process has cleared TOWRITE
> - * tag we set). The rule we follow is that TOWRITE tag can be cleared only
> - * by the process clearing the DIRTY tag (and submitting the page for IO).
> - *
> - * To avoid deadlocks between range_cyclic writeback and callers that hold
> - * pages in PageWriteback to aggregate IO until write_cache_pages() returns,
> - * we do not loop back to the start of the file. Doing so causes a page
> - * lock/page writeback access order inversion - we should only ever lock
> - * multiple pages in ascending page->index order, and looping back to the start
> - * of the file violates that rule and causes deadlocks.
> + * This function returns the next folio for the writeback operation described by
> + * @wbc on @mapping and should be called in a while loop in the ->writepages
> + * implementation.
> *
> - * Return: %0 on success, negative error code otherwise
> + * To start the writeback operation, %NULL is passed in the @folio argument, and
> + * for every subsequent iteration the folio returned previously should be passed
> + * back in.
> + *
> + * If there was an error in the per-folio writeback inside the writeback_iter()
> + * loop, @error should be set to the error value.
> + *
> + * Once the writeback described in @wbc has finished, this function will return
> + * %NULL and if there was an error in any iteration restore it to @error.
> + *
> + * Note: callers should not manually break out of the loop using break or goto
> + * but must keep calling writeback_iter() until it returns %NULL.
> + *
> + * Return: the folio to write or %NULL if the loop is done.
> */
> -int write_cache_pages(struct address_space *mapping,
> - struct writeback_control *wbc, writepage_t writepage,
> - void *data)
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error)
> {
> - int ret = 0;
> - int error;
> - struct folio *folio;
> - pgoff_t end; /* Inclusive */
> -
> - if (wbc->range_cyclic) {
> - wbc->index = mapping->writeback_index; /* prev offset */
> - end = -1;
> - } else {
> - wbc->index = wbc->range_start >> PAGE_SHIFT;
> - end = wbc->range_end >> PAGE_SHIFT;
> - }
> - if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> - tag_pages_for_writeback(mapping, wbc->index, end);
> -
> - folio_batch_init(&wbc->fbatch);
> + if (!folio) {
> + folio_batch_init(&wbc->fbatch);
> + wbc->saved_err = *error = 0;
>
> - for (;;) {
> - folio = writeback_get_folio(mapping, wbc);
> - if (!folio)
> - break;
> + /*
> + * For range cyclic writeback we remember where we stopped so
> + * that we can continue where we stopped.
> + *
> + * For non-cyclic writeback we always start at the beginning of
> + * the passed in range.
> + */
> + if (wbc->range_cyclic)
> + wbc->index = mapping->writeback_index;
> + else
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
>
> - error = writepage(folio, wbc, data);
> + /*
> + * To avoid livelocks when other processes dirty new pages, we
> + * first tag pages which should be written back and only then
> + * start writing them.
> + *
> + * For data-integrity writeback we have to be careful so that we
> + * do not miss some pages (e.g., because some other process has
> + * cleared the TOWRITE tag we set). The rule we follow is that
> + * TOWRITE tag can be cleared only by the process clearing the
> + * DIRTY tag (and submitting the page for I/O).
> + */
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> + tag_pages_for_writeback(mapping, wbc->index,
> + wbc_end(wbc));
> + } else {
> wbc->nr_to_write -= folio_nr_pages(folio);
>
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - }
> + WARN_ON_ONCE(*error > 0);
>
> /*
> * For integrity writeback we have to keep going until we have
> @@ -2510,33 +2509,70 @@ int write_cache_pages(struct address_space *mapping,
> * wbc->nr_to_write or encounter the first error.
> */
> if (wbc->sync_mode == WB_SYNC_ALL) {
> - if (error && !ret)
> - ret = error;
> + if (*error && !wbc->saved_err)
> + wbc->saved_err = *error;
> } else {
> - if (error || wbc->nr_to_write <= 0)
> + if (*error || wbc->nr_to_write <= 0)
> goto done;
> }
> }
>
> - /*
> - * For range cyclic writeback we need to remember where we stopped so
> - * that we can continue there next time we are called. If we hit the
> - * last page and there is more work to be done, wrap back to the start
> - * of the file.
> - *
> - * For non-cyclic writeback we always start looking up at the beginning
> - * of the file if we are called again, which can only happen due to
> - * -ENOMEM from the file system.
> - */
> - folio_batch_release(&wbc->fbatch);
> - if (wbc->range_cyclic)
> - mapping->writeback_index = 0;
> - return ret;
> + folio = writeback_get_folio(mapping, wbc);
> + if (!folio) {
> + /*
> + * To avoid deadlocks between range_cyclic writeback and callers
> + * that hold pages in PageWriteback to aggregate I/O until
> + * the writeback iteration finishes, we do not loop back to the
> + * start of the file. Doing so causes a page lock/page
> + * writeback access order inversion - we should only ever lock
> + * multiple pages in ascending page->index order, and looping
> + * back to the start of the file violates that rule and causes
> + * deadlocks.
> + */
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
> +
> + /*
> + * Return the first error we encountered (if there was any) to
> + * the caller.
> + */
> + *error = wbc->saved_err;
> + }
> + return folio;
>
> done:
> folio_batch_release(&wbc->fbatch);
> if (wbc->range_cyclic)
> mapping->writeback_index = folio->index + folio_nr_pages(folio);
> + return NULL;
> +}
> +
> +/**
> + * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> + * @mapping: address space structure to write
> + * @wbc: subtract the number of written pages from *@wbc->nr_to_write
> + * @writepage: function called for each page
> + * @data: data passed to writepage function
> + *
> + * Return: %0 on success, negative error code otherwise
> + *
> + * Note: please use writeback_iter() instead.
> + */
> +int write_cache_pages(struct address_space *mapping,
> + struct writeback_control *wbc, writepage_t writepage,
> + void *data)
> +{
> + struct folio *folio = NULL;
> + int error;
> +
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> + error = writepage(folio, wbc, data);
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
> + }
> +
> return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages()
2024-02-03 7:11 ` [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
@ 2024-02-05 11:43 ` Jan Kara
2024-02-05 15:34 ` Brian Foster
1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2024-02-05 11:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat 03-02-24 08:11:47, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Use the new writeback_iter() directly instead of indirecting
> through a callback.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: ported to the while based iter style]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
I've just noticed one preexisting problem which was made more visible by
your reshuffling:
> +static int writeback_use_writepage(struct address_space *mapping,
> + struct writeback_control *wbc)
> {
> - struct address_space *mapping = data;
> - int ret = mapping->a_ops->writepage(&folio->page, wbc);
> - mapping_set_error(mapping, ret);
> - return ret;
> + struct folio *folio = NULL;
> + struct blk_plug plug;
> + int err;
> +
> + blk_start_plug(&plug);
> + while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
> + err = mapping->a_ops->writepage(&folio->page, wbc);
> + mapping_set_error(mapping, err);
So if ->writepage() returns AOP_WRITEPAGE_ACTIVATE, we shouldn't call
mapping_set_error() because that's just going to confuse the error
tracking.
> + if (err == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + err = 0;
> + }
> + }
> + blk_finish_plug(&plug);
> +
> + return err;
> }
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages
2024-02-03 7:11 ` [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
2024-02-05 11:34 ` Jan Kara
@ 2024-02-05 15:32 ` Brian Foster
1 sibling, 0 replies; 23+ messages in thread
From: Brian Foster @ 2024-02-05 15:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat, Feb 03, 2024 at 08:11:39AM +0100, Christoph Hellwig wrote:
> Rework the way we deal with the cleanup after the writepage call.
>
> First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
> returns to get it out of the way of the actual error handling path.
>
> Then split the handling into integrity vs. non-integrity branches first,
> and return early using a goto for the non-integrity early loop condition
> to remove the need for the done and done_index local variables, and for
> assigning the error to ret when we can just return error directly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/page-writeback.c | 84 ++++++++++++++++++---------------------------
> 1 file changed, 33 insertions(+), 51 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c7c494526bc650..88b2c4c111c01b 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
...
> @@ -2524,14 +2505,15 @@ int write_cache_pages(struct address_space *mapping,
> * of the file if we are called again, which can only happen due to
> * -ENOMEM from the file system.
> */
> - if (wbc->range_cyclic) {
> - if (done)
> - mapping->writeback_index = done_index;
> - else
> - mapping->writeback_index = 0;
> - }
> -
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
> return ret;
> +
> +done:
> + folio_batch_release(&fbatch);
> + if (wbc->range_cyclic)
> + mapping->writeback_index = folio->index + folio_nr_pages(folio);
Shouldn't this release the batch after we're done accessing folio? With
that addressed:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> + return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> --
> 2.39.2
>
>
* Re: [PATCH 12/13] writeback: add a writeback iterator
2024-02-03 7:11 ` [PATCH 12/13] writeback: add a writeback iterator Christoph Hellwig
2024-02-05 11:39 ` Jan Kara
@ 2024-02-05 15:33 ` Brian Foster
2024-02-06 14:54 ` Brian Foster
1 sibling, 1 reply; 23+ messages in thread
From: Brian Foster @ 2024-02-05 15:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat, Feb 03, 2024 at 08:11:46AM +0100, Christoph Hellwig wrote:
> Refactor the code left in write_cache_pages into an iterator that the
> file system can call to get the next folio for a writeback operation:
>
> struct folio *folio = NULL;
>
> while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> 	error = <do per-folio writeback>;
> }
>
> The twist here is that the error value is passed by reference, so that
> the iterator can restore it when breaking out of the loop.
>
> Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
> iterator and is just kept in the write_cache_pages legacy wrapper
> in preparation for eventually killing it off.
>
> Heavily based on a for_each* based iterator from Matthew Wilcox.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> include/linux/writeback.h | 4 +
> mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
> 2 files changed, 118 insertions(+), 78 deletions(-)
>
...
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 3abb053e70580e..5fe4cdb7dbd61a 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
...
> @@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> }
>
> /**
...
> */
> -int write_cache_pages(struct address_space *mapping,
> - struct writeback_control *wbc, writepage_t writepage,
> - void *data)
> +struct folio *writeback_iter(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int *error)
> {
...
> + } else {
> wbc->nr_to_write -= folio_nr_pages(folio);
>
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - }
> + WARN_ON_ONCE(*error > 0);
Why the warning on writeback error here? It looks like new behavior, but
maybe I missed something. Otherwise the factoring LGTM.
Brian
>
> /*
> * For integrity writeback we have to keep going until we have
> @@ -2510,33 +2509,70 @@ int write_cache_pages(struct address_space *mapping,
> * wbc->nr_to_write or encounter the first error.
> */
> if (wbc->sync_mode == WB_SYNC_ALL) {
> - if (error && !ret)
> - ret = error;
> + if (*error && !wbc->saved_err)
> + wbc->saved_err = *error;
> } else {
> - if (error || wbc->nr_to_write <= 0)
> + if (*error || wbc->nr_to_write <= 0)
> goto done;
> }
> }
>
> - /*
> - * For range cyclic writeback we need to remember where we stopped so
> - * that we can continue there next time we are called. If we hit the
> - * last page and there is more work to be done, wrap back to the start
> - * of the file.
> - *
> - * For non-cyclic writeback we always start looking up at the beginning
> - * of the file if we are called again, which can only happen due to
> - * -ENOMEM from the file system.
> - */
> - folio_batch_release(&wbc->fbatch);
> - if (wbc->range_cyclic)
> - mapping->writeback_index = 0;
> - return ret;
> + folio = writeback_get_folio(mapping, wbc);
> + if (!folio) {
> + /*
> + * To avoid deadlocks between range_cyclic writeback and callers
> + * that hold pages in PageWriteback to aggregate I/O until
> + * the writeback iteration finishes, we do not loop back to the
> + * start of the file. Doing so causes a page lock/page
> + * writeback access order inversion - we should only ever lock
> + * multiple pages in ascending page->index order, and looping
> + * back to the start of the file violates that rule and causes
> + * deadlocks.
> + */
> + if (wbc->range_cyclic)
> + mapping->writeback_index = 0;
> +
> + /*
> + * Return the first error we encountered (if there was any) to
> + * the caller.
> + */
> + *error = wbc->saved_err;
> + }
> + return folio;
>
> done:
> folio_batch_release(&wbc->fbatch);
> if (wbc->range_cyclic)
> mapping->writeback_index = folio->index + folio_nr_pages(folio);
> + return NULL;
> +}
> +
> +/**
> + * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> + * @mapping: address space structure to write
> + * @wbc: subtract the number of written pages from *@wbc->nr_to_write
> + * @writepage: function called for each page
> + * @data: data passed to writepage function
> + *
> + * Return: %0 on success, negative error code otherwise
> + *
> + * Note: please use writeback_iter() instead.
> + */
> +int write_cache_pages(struct address_space *mapping,
> + struct writeback_control *wbc, writepage_t writepage,
> + void *data)
> +{
> + struct folio *folio = NULL;
> + int error;
> +
> + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> + error = writepage(folio, wbc, data);
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
> + }
> +
> return error;
> }
> EXPORT_SYMBOL(write_cache_pages);
> --
> 2.39.2
>
>
* Re: [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages()
2024-02-03 7:11 ` [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
2024-02-05 11:43 ` Jan Kara
@ 2024-02-05 15:34 ` Brian Foster
1 sibling, 0 replies; 23+ messages in thread
From: Brian Foster @ 2024-02-05 15:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Sat, Feb 03, 2024 at 08:11:47AM +0100, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Use the new writeback_iter() directly instead of indirecting
> through a callback.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: ported to the while based iter style]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
LGTM, modulo Jan's comments:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> mm/page-writeback.c | 31 +++++++++++++++++++------------
> 1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 5fe4cdb7dbd61a..53ff2d8219ddb6 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2577,13 +2577,25 @@ int write_cache_pages(struct address_space *mapping,
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> -static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
> - void *data)
> +static int writeback_use_writepage(struct address_space *mapping,
> + struct writeback_control *wbc)
> {
> - struct address_space *mapping = data;
> - int ret = mapping->a_ops->writepage(&folio->page, wbc);
> - mapping_set_error(mapping, ret);
> - return ret;
> + struct folio *folio = NULL;
> + struct blk_plug plug;
> + int err;
> +
> + blk_start_plug(&plug);
> + while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
> + err = mapping->a_ops->writepage(&folio->page, wbc);
> + mapping_set_error(mapping, err);
> + if (err == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + err = 0;
> + }
> + }
> + blk_finish_plug(&plug);
> +
> + return err;
> }
>
> int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> @@ -2599,12 +2611,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> if (mapping->a_ops->writepages) {
> ret = mapping->a_ops->writepages(mapping, wbc);
> } else if (mapping->a_ops->writepage) {
> - struct blk_plug plug;
> -
> - blk_start_plug(&plug);
> - ret = write_cache_pages(mapping, wbc, writepage_cb,
> - mapping);
> - blk_finish_plug(&plug);
> + ret = writeback_use_writepage(mapping, wbc);
> } else {
> /* deal with chardevs and other special files */
> ret = 0;
> --
> 2.39.2
>
* Re: [PATCH 12/13] writeback: add a writeback iterator
2024-02-05 15:33 ` Brian Foster
@ 2024-02-06 14:54 ` Brian Foster
2024-02-07 8:42 ` Jan Kara
0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2024-02-06 14:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-kernel
On Mon, Feb 05, 2024 at 10:33:52AM -0500, Brian Foster wrote:
> On Sat, Feb 03, 2024 at 08:11:46AM +0100, Christoph Hellwig wrote:
> > Refactor the code left in write_cache_pages into an iterator that the
> > file system can call to get the next folio for a writeback operation:
> >
> > struct folio *folio = NULL;
> >
> > while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > 	error = <do per-folio writeback>;
> > }
> >
> > The twist here is that the error value is passed by reference, so that
> > the iterator can restore it when breaking out of the loop.
> >
> > Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
> > iterator and is just kept in the write_cache_pages legacy wrapper
> > in preparation for eventually killing it off.
> >
> > Heavily based on a for_each* based iterator from Matthew Wilcox.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > include/linux/writeback.h | 4 +
> > mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
> > 2 files changed, 118 insertions(+), 78 deletions(-)
> >
> ...
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 3abb053e70580e..5fe4cdb7dbd61a 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> ...
> > @@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> > }
> >
> > /**
> ...
> > */
> > -int write_cache_pages(struct address_space *mapping,
> > - struct writeback_control *wbc, writepage_t writepage,
> > - void *data)
> > +struct folio *writeback_iter(struct address_space *mapping,
> > + struct writeback_control *wbc, struct folio *folio, int *error)
> > {
> ...
> > + } else {
> > wbc->nr_to_write -= folio_nr_pages(folio);
> >
> > - if (error == AOP_WRITEPAGE_ACTIVATE) {
> > - folio_unlock(folio);
> > - error = 0;
> > - }
> > + WARN_ON_ONCE(*error > 0);
>
> Why the warning on writeback error here? It looks like new behavior, but
> maybe I missed something. Otherwise the factoring LGTM.
Err, sorry.. I glossed over the > 0 check and read it as < 0.
Disregard, this seems reasonable to me as long as we no longer expect
those AOP returns (which I'm not really clear on either, but anyways..):
Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> Brian
>
> >
> > /*
> > * For integrity writeback we have to keep going until we have
> > @@ -2510,33 +2509,70 @@ int write_cache_pages(struct address_space *mapping,
> > * wbc->nr_to_write or encounter the first error.
> > */
> > if (wbc->sync_mode == WB_SYNC_ALL) {
> > - if (error && !ret)
> > - ret = error;
> > + if (*error && !wbc->saved_err)
> > + wbc->saved_err = *error;
> > } else {
> > - if (error || wbc->nr_to_write <= 0)
> > + if (*error || wbc->nr_to_write <= 0)
> > goto done;
> > }
> > }
> >
> > - /*
> > - * For range cyclic writeback we need to remember where we stopped so
> > - * that we can continue there next time we are called. If we hit the
> > - * last page and there is more work to be done, wrap back to the start
> > - * of the file.
> > - *
> > - * For non-cyclic writeback we always start looking up at the beginning
> > - * of the file if we are called again, which can only happen due to
> > - * -ENOMEM from the file system.
> > - */
> > - folio_batch_release(&wbc->fbatch);
> > - if (wbc->range_cyclic)
> > - mapping->writeback_index = 0;
> > - return ret;
> > + folio = writeback_get_folio(mapping, wbc);
> > + if (!folio) {
> > + /*
> > + * To avoid deadlocks between range_cyclic writeback and callers
> > + * that hold pages in PageWriteback to aggregate I/O until
> > + * the writeback iteration finishes, we do not loop back to the
> > + * start of the file. Doing so causes a page lock/page
> > + * writeback access order inversion - we should only ever lock
> > + * multiple pages in ascending page->index order, and looping
> > + * back to the start of the file violates that rule and causes
> > + * deadlocks.
> > + */
> > + if (wbc->range_cyclic)
> > + mapping->writeback_index = 0;
> > +
> > + /*
> > + * Return the first error we encountered (if there was any) to
> > + * the caller.
> > + */
> > + *error = wbc->saved_err;
> > + }
> > + return folio;
> >
> > done:
> > folio_batch_release(&wbc->fbatch);
> > if (wbc->range_cyclic)
> > mapping->writeback_index = folio->index + folio_nr_pages(folio);
> > + return NULL;
> > +}
> > +
> > +/**
> > + * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> > + * @mapping: address space structure to write
> > + * @wbc: subtract the number of written pages from *@wbc->nr_to_write
> > + * @writepage: function called for each page
> > + * @data: data passed to writepage function
> > + *
> > + * Return: %0 on success, negative error code otherwise
> > + *
> > + * Note: please use writeback_iter() instead.
> > + */
> > +int write_cache_pages(struct address_space *mapping,
> > + struct writeback_control *wbc, writepage_t writepage,
> > + void *data)
> > +{
> > + struct folio *folio = NULL;
> > + int error;
> > +
> > + while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > + error = writepage(folio, wbc, data);
> > + if (error == AOP_WRITEPAGE_ACTIVATE) {
> > + folio_unlock(folio);
> > + error = 0;
> > + }
> > + }
> > +
> > return error;
> > }
> > EXPORT_SYMBOL(write_cache_pages);
> > --
> > 2.39.2
> >
> >
>
>
* Re: [PATCH 12/13] writeback: add a writeback iterator
2024-02-06 14:54 ` Brian Foster
@ 2024-02-07 8:42 ` Jan Kara
2024-02-07 14:00 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2024-02-07 8:42 UTC (permalink / raw)
To: Brian Foster
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel
On Tue 06-02-24 09:54:01, Brian Foster wrote:
> On Mon, Feb 05, 2024 at 10:33:52AM -0500, Brian Foster wrote:
> > On Sat, Feb 03, 2024 at 08:11:46AM +0100, Christoph Hellwig wrote:
> > > Refactor the code left in write_cache_pages into an iterator that the
> > > file system can call to get the next folio for a writeback operation:
> > >
> > > struct folio *folio = NULL;
> > >
> > > while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > > 	error = <do per-folio writeback>;
> > > }
> > >
> > > The twist here is that the error value is passed by reference, so that
> > > the iterator can restore it when breaking out of the loop.
> > >
> > > Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
> > > iterator and is just kept in the write_cache_pages legacy wrapper
> > > in preparation for eventually killing it off.
> > >
> > > Heavily based on a for_each* based iterator from Matthew Wilcox.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > > include/linux/writeback.h | 4 +
> > > mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
> > > 2 files changed, 118 insertions(+), 78 deletions(-)
> > >
> > ...
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 3abb053e70580e..5fe4cdb7dbd61a 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > ...
> > > @@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> > > }
> > >
> > > /**
> > ...
> > > */
> > > -int write_cache_pages(struct address_space *mapping,
> > > - struct writeback_control *wbc, writepage_t writepage,
> > > - void *data)
> > > +struct folio *writeback_iter(struct address_space *mapping,
> > > + struct writeback_control *wbc, struct folio *folio, int *error)
> > > {
> > ...
> > > + } else {
> > > wbc->nr_to_write -= folio_nr_pages(folio);
> > >
> > > - if (error == AOP_WRITEPAGE_ACTIVATE) {
> > > - folio_unlock(folio);
> > > - error = 0;
> > > - }
> > > + WARN_ON_ONCE(*error > 0);
> >
> > Why the warning on writeback error here? It looks like new behavior, but
> > maybe I missed something. Otherwise the factoring LGTM.
>
> Err, sorry.. I glossed over the > 0 check and read it as < 0.
> Disregard, this seems reasonable to me as long as we no longer expect
> those AOP returns (which I'm not really clear on either, but anyways..):
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
So my understanding is that AOP_WRITEPAGE_ACTIVATE should now be handled
directly by the caller of the ->writepage hook and not by writeback_iter(),
which is the reason the warning is here.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 12/13] writeback: add a writeback iterator
2024-02-07 8:42 ` Jan Kara
@ 2024-02-07 14:00 ` Brian Foster
0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2024-02-07 14:00 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-mm, Matthew Wilcox, Jan Kara,
David Howells, Christian Brauner, Darrick J. Wong, linux-xfs,
linux-fsdevel, linux-kernel
On Wed, Feb 07, 2024 at 09:42:24AM +0100, Jan Kara wrote:
> On Tue 06-02-24 09:54:01, Brian Foster wrote:
> > On Mon, Feb 05, 2024 at 10:33:52AM -0500, Brian Foster wrote:
> > > On Sat, Feb 03, 2024 at 08:11:46AM +0100, Christoph Hellwig wrote:
> > > > Refactor the code left in write_cache_pages into an iterator that the
> > > > file system can call to get the next folio for a writeback operation:
> > > >
> > > > struct folio *folio = NULL;
> > > >
> > > > while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> > > > 	error = <do per-folio writeback>;
> > > > }
> > > >
> > > > The twist here is that the error value is passed by reference, so that
> > > > the iterator can restore it when breaking out of the loop.
> > > >
> > > > Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
> > > > iterator and is just kept in the write_cache_pages legacy wrapper
> > > > in preparation for eventually killing it off.
> > > >
> > > > Heavily based on a for_each* based iterator from Matthew Wilcox.
> > > >
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > > include/linux/writeback.h | 4 +
> > > > mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
> > > > 2 files changed, 118 insertions(+), 78 deletions(-)
> > > >
> > > ...
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index 3abb053e70580e..5fe4cdb7dbd61a 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > ...
> > > > @@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> > > > }
> > > >
> > > > /**
> > > ...
> > > > */
> > > > -int write_cache_pages(struct address_space *mapping,
> > > > - struct writeback_control *wbc, writepage_t writepage,
> > > > - void *data)
> > > > +struct folio *writeback_iter(struct address_space *mapping,
> > > > + struct writeback_control *wbc, struct folio *folio, int *error)
> > > > {
> > > ...
> > > > + } else {
> > > > wbc->nr_to_write -= folio_nr_pages(folio);
> > > >
> > > > - if (error == AOP_WRITEPAGE_ACTIVATE) {
> > > > - folio_unlock(folio);
> > > > - error = 0;
> > > > - }
> > > > + WARN_ON_ONCE(*error > 0);
> > >
> > > Why the warning on writeback error here? It looks like new behavior, but
> > > maybe I missed something. Otherwise the factoring LGTM.
> >
> > Err, sorry.. I glossed over the > 0 check and read it as < 0.
> > Disregard, this seems reasonable to me as long as we no longer expect
> > those AOP returns (which I'm not really clear on either, but anyways..):
> >
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> So my understanding is that AOP_WRITEPAGE_ACTIVATE should be now handled
> directly by the caller of ->writepage hook and not by writeback_iter()
> which is the reason why the warning is here.
>
Yeah, I wasn't really familiar with the AOP error codes, saw that
multiple existed and just assumed they might be arbitrarily relevant
across different aop callbacks (and so then filtered the check/clear for
WRITEPAGE_ACTIVATE out of my brain ;). On taking a closer look, it seems
like the only other one (AOP_TRUNCATED_PAGE) doesn't have any relevance
to ->writepage(), so this all makes more sense to me now. Thanks.
Brian
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>
end of thread, other threads:[~2024-02-07 13:59 UTC | newest]
Thread overview: 23+ messages
2024-02-03 7:11 convert write_cache_pages() to an iterator v6 Christoph Hellwig
2024-02-03 7:11 ` [PATCH 01/13] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
2024-02-03 7:11 ` [PATCH 02/13] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
2024-02-03 7:11 ` [PATCH 03/13] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
2024-02-03 7:11 ` [PATCH 04/13] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
2024-02-03 7:11 ` [PATCH 05/13] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
2024-02-05 11:34 ` Jan Kara
2024-02-05 15:32 ` Brian Foster
2024-02-03 7:11 ` [PATCH 06/13] writeback: Factor folio_prepare_writeback() out of write_cache_pages() Christoph Hellwig
2024-02-03 7:11 ` [PATCH 07/13] writeback: Factor writeback_get_batch() " Christoph Hellwig
2024-02-03 7:11 ` [PATCH 08/13] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
2024-02-03 7:11 ` [PATCH 09/13] pagevec: Add ability to iterate a queue Christoph Hellwig
2024-02-03 7:11 ` [PATCH 10/13] writeback: Use the folio_batch queue iterator Christoph Hellwig
2024-02-03 7:11 ` [PATCH 11/13] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() Christoph Hellwig
2024-02-03 7:11 ` [PATCH 12/13] writeback: add a writeback iterator Christoph Hellwig
2024-02-05 11:39 ` Jan Kara
2024-02-05 15:33 ` Brian Foster
2024-02-06 14:54 ` Brian Foster
2024-02-07 8:42 ` Jan Kara
2024-02-07 14:00 ` Brian Foster
2024-02-03 7:11 ` [PATCH 13/13] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
2024-02-05 11:43 ` Jan Kara
2024-02-05 15:34 ` Brian Foster