* [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 9:31 ` Jan Kara
2024-02-15 11:44 ` Brian Foster
2024-02-15 6:36 ` [PATCH 02/14] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
` (13 subsequent siblings)
14 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel
mapping_set_error should only be called on 0 returns (which it ignores)
or a negative error code.
writepage_cb ends up being able to call writepage_cb on the magic
AOP_WRITEPAGE_ACTIVATE return value from ->writepage which means
success but the caller needs to unlock the page. Ignore that and
just call mapping_set_error on negative errors.
(no fixes tag as this goes back more than 20 years over various renames
and refactors so I've given up chasing down the original introduction)
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/page-writeback.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3f255534986a2f..703e83c69ffe08 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2535,7 +2535,9 @@ static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
{
struct address_space *mapping = data;
int ret = mapping->a_ops->writepage(&folio->page, wbc);
- mapping_set_error(mapping, ret);
+
+ if (ret < 0)
+ mapping_set_error(mapping, ret);
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE
2024-02-15 6:36 ` [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE Christoph Hellwig
@ 2024-02-15 9:31 ` Jan Kara
2024-02-15 11:44 ` Brian Foster
1 sibling, 0 replies; 18+ messages in thread
From: Jan Kara @ 2024-02-15 9:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel
On Thu 15-02-24 07:36:36, Christoph Hellwig wrote:
> mapping_set_error should only be called on 0 returns (which it ignores)
> or a negative error code.
>
> writepage_cb ends up being able to call writepage_cb on the magic
> AOP_WRITEPAGE_ACTIVATE return value from ->writepage which means
> success but the caller needs to unlock the page. Ignore that and
> just call mapping_set_error on negative errors.
>
> (no fixes tag as this goes back more than 20 years over various renames
> and refactors so I've given up chasing down the original introduction)
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> mm/page-writeback.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 3f255534986a2f..703e83c69ffe08 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2535,7 +2535,9 @@ static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
> {
> struct address_space *mapping = data;
> int ret = mapping->a_ops->writepage(&folio->page, wbc);
> - mapping_set_error(mapping, ret);
> +
> + if (ret < 0)
> + mapping_set_error(mapping, ret);
> return ret;
> }
>
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE
2024-02-15 6:36 ` [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE Christoph Hellwig
2024-02-15 9:31 ` Jan Kara
@ 2024-02-15 11:44 ` Brian Foster
1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2024-02-15 11:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Matthew Wilcox, Jan Kara, David Howells,
Christian Brauner, linux-fsdevel, linux-kernel
On Thu, Feb 15, 2024 at 07:36:36AM +0100, Christoph Hellwig wrote:
> mapping_set_error should only be called on 0 returns (which it ignores)
> or a negative error code.
>
> writepage_cb ends up being able to call writepage_cb on the magic
> AOP_WRITEPAGE_ACTIVATE return value from ->writepage which means
> success but the caller needs to unlock the page. Ignore that and
> just call mapping_set_error on negative errors.
>
> (no fixes tag as this goes back more than 20 years over various renames
> and refactors so I've given up chasing down the original introduction)
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
> mm/page-writeback.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 3f255534986a2f..703e83c69ffe08 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2535,7 +2535,9 @@ static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
> {
> struct address_space *mapping = data;
> int ret = mapping->a_ops->writepage(&folio->page, wbc);
> - mapping_set_error(mapping, ret);
> +
> + if (ret < 0)
> + mapping_set_error(mapping, ret);
> return ret;
> }
>
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 02/14] writeback: remove a duplicate prototype for tag_pages_for_writeback
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
2024-02-15 6:36 ` [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 03/14] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
` (12 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: split from a larger patch]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 453736fd1d23ce..4b8cf9e4810bad 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -363,8 +363,6 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
-void tag_pages_for_writeback(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 03/14] writeback: fix done_index when hitting the wbc->nr_to_write
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
2024-02-15 6:36 ` [PATCH 01/14] writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE Christoph Hellwig
2024-02-15 6:36 ` [PATCH 02/14] writeback: remove a duplicate prototype for tag_pages_for_writeback Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 04/14] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
` (11 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
When write_cache_pages finishes writing out a folio, it fails to update
done_index to account for the number of pages in the folio just written.
That means when range_cyclic writeback is restarted, it will be
restarted at this folio instead of after it as it should. Fix that
by updating done_index before breaking out of the loop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 703e83c69ffe08..2679176da316b3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2505,6 +2505,7 @@ int write_cache_pages(struct address_space *mapping,
* keep going until we have written all the pages
* we tagged for writeback prior to entering this loop.
*/
+ done_index = folio->index + nr;
wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 04/14] writeback: also update wbc->nr_to_write on writeback failure
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (2 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 03/14] writeback: fix done_index when hitting the wbc->nr_to_write Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 05/14] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
When exiting write_cache_pages early due to a non-integrity write
failure, wbc->nr_to_write currently doesn't account for the folio
we just failed to write. This doesn't matter because the callers
always ingore the value on a failure, but moving the update to
common code will allow to simplify the code, so do it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2679176da316b3..abbee369405a83 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
if (unlikely(error)) {
/*
* Handle errors according to the type of
@@ -2506,7 +2507,6 @@ int write_cache_pages(struct address_space *mapping,
* we tagged for writeback prior to entering this loop.
*/
done_index = folio->index + nr;
- wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 05/14] writeback: only update ->writeback_index for range_cyclic writeback
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (3 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 04/14] writeback: also update wbc->nr_to_write on writeback failure Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 06/14] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
mapping->writeback_index is only [1] used as the starting point for
range_cyclic writeback, so there is no point in updating it for other
types of writeback.
[1] except for btrfs_defrag_file which does really odd things with
mapping->writeback_index. But btrfs doesn't use write_cache_pages at
all, so this isn't relevant here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index abbee369405a83..f02014007b57cc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2403,7 +2403,6 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t index;
pgoff_t end; /* Inclusive */
pgoff_t done_index;
- int range_whole = 0;
xa_mark_t tag;
folio_batch_init(&fbatch);
@@ -2413,8 +2412,6 @@ int write_cache_pages(struct address_space *mapping,
} else {
index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
- if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
- range_whole = 1;
}
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
tag_pages_for_writeback(mapping, index, end);
@@ -2518,14 +2515,21 @@ int write_cache_pages(struct address_space *mapping,
}
/*
- * If we hit the last page and there is more work to be done: wrap
- * back the index back to the start of the file for the next
- * time we are called.
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
*/
- if (wbc->range_cyclic && !done)
- done_index = 0;
- if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
- mapping->writeback_index = done_index;
+ if (wbc->range_cyclic) {
+ if (done)
+ mapping->writeback_index = done_index;
+ else
+ mapping->writeback_index = 0;
+ }
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 06/14] writeback: rework the loop termination condition in write_cache_pages
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (4 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 05/14] writeback: only update ->writeback_index for range_cyclic writeback Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 07/14] writeback: Factor folio_prepare_writeback() out of write_cache_pages() Christoph Hellwig
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara
Rework the way we deal with the cleanup after the writepage call.
First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
returns to get it out of the way of the actual error handling path.
The split the handling on intgrity vs non-integrity branches first,
and return early using a goto for the non-ingegrity early loop condition
to remove the need for the done and done_index local variables, and for
assigning the error to ret when we can just return error directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 84 ++++++++++++++++++---------------------------
1 file changed, 33 insertions(+), 51 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f02014007b57cc..3a1e23bed4052c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2396,13 +2396,12 @@ int write_cache_pages(struct address_space *mapping,
void *data)
{
int ret = 0;
- int done = 0;
int error;
struct folio_batch fbatch;
+ struct folio *folio;
int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
- pgoff_t done_index;
xa_mark_t tag;
folio_batch_init(&fbatch);
@@ -2419,8 +2418,7 @@ int write_cache_pages(struct address_space *mapping,
} else {
tag = PAGECACHE_TAG_DIRTY;
}
- done_index = index;
- while (!done && (index <= end)) {
+ while (index <= end) {
int i;
nr_folios = filemap_get_folios_tag(mapping, &index, end,
@@ -2430,11 +2428,7 @@ int write_cache_pages(struct address_space *mapping,
break;
for (i = 0; i < nr_folios; i++) {
- struct folio *folio = fbatch.folios[i];
- unsigned long nr;
-
- done_index = folio->index;
-
+ folio = fbatch.folios[i];
folio_lock(folio);
/*
@@ -2469,45 +2463,32 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- ret = error;
- done_index = folio->index + nr;
- done = 1;
- break;
- }
- if (!ret)
- ret = error;
+ wbc->nr_to_write -= folio_nr_pages(folio);
+
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
}
/*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
+ * For integrity writeback we have to keep going until
+ * we have written all the folios we tagged for
+ * writeback above, even if we run past wbc->nr_to_write
+ * or encounter errors.
+ * We stash away the first error we encounter in
+ * wbc->saved_err so that it can be retrieved when we're
+ * done. This is because the file system may still have
+ * state to clear for each folio.
+ *
+ * For background writeback we exit as soon as we run
+ * past wbc->nr_to_write or encounter the first error.
*/
- done_index = folio->index + nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE) {
- done = 1;
- break;
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (error && !ret)
+ ret = error;
+ } else {
+ if (error || wbc->nr_to_write <= 0)
+ goto done;
}
}
folio_batch_release(&fbatch);
@@ -2524,14 +2505,15 @@ int write_cache_pages(struct address_space *mapping,
* of the file if we are called again, which can only happen due to
* -ENOMEM from the file system.
*/
- if (wbc->range_cyclic) {
- if (done)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
return ret;
+
+done:
+ if (wbc->range_cyclic)
+ mapping->writeback_index = folio->index + folio_nr_pages(folio);
+ folio_batch_release(&fbatch);
+ return error;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 07/14] writeback: Factor folio_prepare_writeback() out of write_cache_pages()
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (5 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 06/14] writeback: rework the loop termination condition in write_cache_pages Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 08/14] writeback: Factor writeback_get_batch() " Christoph Hellwig
` (7 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Reduce write_cache_pages() by about 30 lines; much of it is commentary,
but it all bundles nicely into an obvious function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: rename should_writeback_folio to folio_prepare_writeback based on
a comment from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 61 +++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3a1e23bed4052c..4bc55ce9597d13 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,6 +2360,38 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
+static bool folio_prepare_writeback(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ /*
+ * Folio truncated or invalidated. We can freely skip it then,
+ * even for data integrity operations: the folio has disappeared
+ * concurrently, so there could be no real expectation of this
+ * data integrity operation even if there is now a new, dirty
+ * folio at the same pagecache index.
+ */
+ if (unlikely(folio->mapping != mapping))
+ return false;
+
+ /*
+ * Did somebody else write it for us?
+ */
+ if (!folio_test_dirty(folio))
+ return false;
+
+ if (folio_test_writeback(folio)) {
+ if (wbc->sync_mode == WB_SYNC_NONE)
+ return false;
+ folio_wait_writeback(folio);
+ }
+ BUG_ON(folio_test_writeback(folio));
+
+ if (!folio_clear_dirty_for_io(folio))
+ return false;
+
+ return true;
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2430,38 +2462,13 @@ int write_cache_pages(struct address_space *mapping,
for (i = 0; i < nr_folios; i++) {
folio = fbatch.folios[i];
folio_lock(folio);
-
- /*
- * Page truncated or invalidated. We can freely skip it
- * then, even for data integrity operations: the page
- * has disappeared concurrently, so there could be no
- * real expectation of this data integrity operation
- * even if there is now a new, dirty page at the same
- * pagecache address.
- */
- if (unlikely(folio->mapping != mapping)) {
-continue_unlock:
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
continue;
}
- if (!folio_test_dirty(folio)) {
- /* someone wrote it for us */
- goto continue_unlock;
- }
-
- if (folio_test_writeback(folio)) {
- if (wbc->sync_mode != WB_SYNC_NONE)
- folio_wait_writeback(folio);
- else
- goto continue_unlock;
- }
-
- BUG_ON(folio_test_writeback(folio));
- if (!folio_clear_dirty_for_io(folio))
- goto continue_unlock;
-
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+
error = writepage(folio, wbc, data);
wbc->nr_to_write -= folio_nr_pages(folio);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 08/14] writeback: Factor writeback_get_batch() out of write_cache_pages()
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (6 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 07/14] writeback: Factor folio_prepare_writeback() out of write_cache_pages() Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 09/14] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
` (6 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
This simple helper will be the basis of the writeback iterator.
To make this work, we need to remember the current index
and end positions in writeback_control.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: heavily rebased, add helpers to get the tag and end index,
don't keep the end index in struct writeback_control]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/writeback.h | 6 ++++
mm/page-writeback.c | 60 +++++++++++++++++++++++++--------------
2 files changed, 44 insertions(+), 22 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4b8cf9e4810bad..f67b3ea866a0fb 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -11,6 +11,7 @@
#include <linux/flex_proportions.h>
#include <linux/backing-dev-defs.h>
#include <linux/blk_types.h>
+#include <linux/pagevec.h>
struct bio;
@@ -40,6 +41,7 @@ enum writeback_sync_modes {
* in a manner such that unspecified fields are set to zero.
*/
struct writeback_control {
+ /* public fields that can be set and/or consumed by the caller: */
long nr_to_write; /* Write this many pages, and decrement
this for each page written */
long pages_skipped; /* Pages which were not written */
@@ -77,6 +79,10 @@ struct writeback_control {
*/
struct swap_iocb **swap_plug;
+ /* internal fields used by the ->writepages implementation: */
+ struct folio_batch fbatch;
+ pgoff_t index;
+
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
struct inode *inode; /* inode being written out */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4bc55ce9597d13..5b8dbbef713722 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2392,6 +2392,29 @@ static bool folio_prepare_writeback(struct address_space *mapping,
return true;
}
+static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
+{
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ return PAGECACHE_TAG_TOWRITE;
+ return PAGECACHE_TAG_DIRTY;
+}
+
+static pgoff_t wbc_end(struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ return -1;
+ return wbc->range_end >> PAGE_SHIFT;
+}
+
+static void writeback_get_batch(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2429,38 +2452,32 @@ int write_cache_pages(struct address_space *mapping,
{
int ret = 0;
int error;
- struct folio_batch fbatch;
struct folio *folio;
- int nr_folios;
- pgoff_t index;
pgoff_t end; /* Inclusive */
- xa_mark_t tag;
- folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
- index = mapping->writeback_index; /* prev offset */
+ wbc->index = mapping->writeback_index; /* prev offset */
end = -1;
} else {
- index = wbc->range_start >> PAGE_SHIFT;
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
}
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
- tag_pages_for_writeback(mapping, index, end);
- tag = PAGECACHE_TAG_TOWRITE;
- } else {
- tag = PAGECACHE_TAG_DIRTY;
- }
- while (index <= end) {
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, end);
+
+ folio_batch_init(&wbc->fbatch);
+
+ while (wbc->index <= end) {
int i;
- nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &fbatch);
+ writeback_get_batch(mapping, wbc);
- if (nr_folios == 0)
+ if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < nr_folios; i++) {
- folio = fbatch.folios[i];
+ for (i = 0; i < wbc->fbatch.nr; i++) {
+ folio = wbc->fbatch.folios[i];
+
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
@@ -2498,8 +2515,6 @@ int write_cache_pages(struct address_space *mapping,
goto done;
}
}
- folio_batch_release(&fbatch);
- cond_resched();
}
/*
@@ -2512,6 +2527,7 @@ int write_cache_pages(struct address_space *mapping,
* of the file if we are called again, which can only happen due to
* -ENOMEM from the file system.
*/
+ folio_batch_release(&wbc->fbatch);
if (wbc->range_cyclic)
mapping->writeback_index = 0;
return ret;
@@ -2519,7 +2535,7 @@ int write_cache_pages(struct address_space *mapping,
done:
if (wbc->range_cyclic)
mapping->writeback_index = folio->index + folio_nr_pages(folio);
- folio_batch_release(&fbatch);
+ folio_batch_release(&wbc->fbatch);
return error;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 09/14] writeback: Simplify the loops in write_cache_pages()
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (7 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 08/14] writeback: Factor writeback_get_batch() " Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 10/14] pagevec: Add ability to iterate a queue Christoph Hellwig
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Collapse the two nested loops into one. This is needed as a step
towards turning this into an iterator.
Note that this drops the "index <= end" check in the previous outer loop
and just relies on filemap_get_folios_tag() to return 0 entries when
index > end. This actually has a subtle implication when end == -1
because then the returned index will be -1 as well and thus if there is
page present on index -1, we could be looping indefinitely. But as the
comment in filemap_get_folios_tag documents this as already broken anyway
we should not worry about it here either. The fix for that would
probably a change to the filemap_get_folios_tag() calling convention.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: updated the commit log based on feedback from Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 75 ++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 39 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 5b8dbbef713722..358ce3ade9ad1e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2454,6 +2454,7 @@ int write_cache_pages(struct address_space *mapping,
int error;
struct folio *folio;
pgoff_t end; /* Inclusive */
+ int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2467,53 +2468,49 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
- while (wbc->index <= end) {
- int i;
-
- writeback_get_batch(mapping, wbc);
-
+ for (;;) {
+ if (i == wbc->fbatch.nr) {
+ writeback_get_batch(mapping, wbc);
+ i = 0;
+ }
if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < wbc->fbatch.nr; i++) {
- folio = wbc->fbatch.folios[i];
+ folio = wbc->fbatch.folios[i++];
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
+ folio_lock(folio);
+ if (!folio_prepare_writeback(mapping, wbc, folio)) {
+ folio_unlock(folio);
+ continue;
+ }
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- error = writepage(folio, wbc, data);
- wbc->nr_to_write -= folio_nr_pages(folio);
+ error = writepage(folio, wbc, data);
+ wbc->nr_to_write -= folio_nr_pages(folio);
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
- /*
- * For integrity writeback we have to keep going until
- * we have written all the folios we tagged for
- * writeback above, even if we run past wbc->nr_to_write
- * or encounter errors.
- * We stash away the first error we encounter in
- * wbc->saved_err so that it can be retrieved when we're
- * done. This is because the file system may still have
- * state to clear for each folio.
- *
- * For background writeback we exit as soon as we run
- * past wbc->nr_to_write or encounter the first error.
- */
- if (wbc->sync_mode == WB_SYNC_ALL) {
- if (error && !ret)
- ret = error;
- } else {
- if (error || wbc->nr_to_write <= 0)
- goto done;
- }
+ /*
+ * For integrity writeback we have to keep going until we have
+ * written all the folios we tagged for writeback above, even if
+ * we run past wbc->nr_to_write or encounter errors.
+ * We stash away the first error we encounter in wbc->saved_err
+ * so that it can be retrieved when we're done. This is because
+ * the file system may still have state to clear for each folio.
+ *
+ * For background writeback we exit as soon as we run past
+ * wbc->nr_to_write or encounter the first error.
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (error && !ret)
+ ret = error;
+ } else {
+ if (error || wbc->nr_to_write <= 0)
+ goto done;
}
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 10/14] pagevec: Add ability to iterate a queue
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (8 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 09/14] writeback: Simplify the loops in write_cache_pages() Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 11/14] writeback: Use the folio_batch queue iterator Christoph Hellwig
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Add a loop counter inside the folio_batch to let us iterate from 0-nr
instead of decrementing nr and treating the batch as a stack. It would
generate some very weird and suboptimal I/O patterns for page writeback
to iterate over the batch as a stack.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
include/linux/pagevec.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 87cc678adc850b..fcc06c300a72c3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,6 +27,7 @@ struct folio;
*/
struct folio_batch {
unsigned char nr;
+ unsigned char i;
bool percpu_pvec_drained;
struct folio *folios[PAGEVEC_SIZE];
};
@@ -40,12 +41,14 @@ struct folio_batch {
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
fbatch->percpu_pvec_drained = false;
}
static inline void folio_batch_reinit(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
}
static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
}
+/**
+ * folio_batch_next - Return the next folio to process.
+ * @fbatch: The folio batch being processed.
+ *
+ * Use this function to implement a queue of folios.
+ *
+ * Return: The next folio in the queue, or NULL if the queue is empty.
+ */
+static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
+{
+ if (fbatch->i == fbatch->nr)
+ return NULL;
+ return fbatch->folios[fbatch->i++];
+}
+
void __folio_batch_release(struct folio_batch *pvec);
static inline void folio_batch_release(struct folio_batch *fbatch)
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 11/14] writeback: Use the folio_batch queue iterator
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (9 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 10/14] pagevec: Add ability to iterate a queue Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 12/14] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() Christoph Hellwig
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Instead of keeping our own local iterator variable, use the one just
added to folio_batch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 358ce3ade9ad1e..3cbe4a7daa357c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2406,13 +2406,21 @@ static pgoff_t wbc_end(struct writeback_control *wbc)
return wbc->range_end >> PAGE_SHIFT;
}
-static void writeback_get_batch(struct address_space *mapping,
+static struct folio *writeback_get_folio(struct address_space *mapping,
struct writeback_control *wbc)
{
- folio_batch_release(&wbc->fbatch);
- cond_resched();
- filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
- wbc_to_tag(wbc), &wbc->fbatch);
+ struct folio *folio;
+
+ folio = folio_batch_next(&wbc->fbatch);
+ if (!folio) {
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+ folio = folio_batch_next(&wbc->fbatch);
+ }
+
+ return folio;
}
/**
@@ -2454,7 +2462,6 @@ int write_cache_pages(struct address_space *mapping,
int error;
struct folio *folio;
pgoff_t end; /* Inclusive */
- int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2469,15 +2476,10 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
for (;;) {
- if (i == wbc->fbatch.nr) {
- writeback_get_batch(mapping, wbc);
- i = 0;
- }
- if (wbc->fbatch.nr == 0)
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio)
break;
- folio = wbc->fbatch.folios[i++];
-
folio_lock(folio);
if (!folio_prepare_writeback(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 12/14] writeback: Move the folio_prepare_writeback loop out of write_cache_pages()
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (10 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 11/14] writeback: Use the folio_batch queue iterator Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 13/14] writeback: add a writeback iterator Christoph Hellwig
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara,
Dave Chinner
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Move the loop for should-we-write-this-folio to writeback_get_folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: folded the loop into the existing helper instead of a separate one
as suggested by Jan Kara]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
---
mm/page-writeback.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3cbe4a7daa357c..fc421402f81881 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2411,6 +2411,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
{
struct folio *folio;
+retry:
folio = folio_batch_next(&wbc->fbatch);
if (!folio) {
folio_batch_release(&wbc->fbatch);
@@ -2418,8 +2419,17 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
folio = folio_batch_next(&wbc->fbatch);
+ if (!folio)
+ return NULL;
+ }
+
+ folio_lock(folio);
+ if (unlikely(!folio_prepare_writeback(mapping, wbc, folio))) {
+ folio_unlock(folio);
+ goto retry;
}
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
return folio;
}
@@ -2480,14 +2490,6 @@ int write_cache_pages(struct address_space *mapping,
if (!folio)
break;
- folio_lock(folio);
- if (!folio_prepare_writeback(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
-
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
-
error = writepage(folio, wbc, data);
wbc->nr_to_write -= folio_nr_pages(folio);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 13/14] writeback: add a writeback iterator
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (11 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 12/14] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-15 6:36 ` [PATCH 14/14] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
2024-02-19 6:28 ` convert write_cache_pages() to an iterator v8 Christoph Hellwig
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara
Refactor the code left in write_cache_pages into an iterator that the
file system can call to get the next folio for a writeback operation:
struct folio *folio = NULL;
while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
error = <do per-folio writeback>;
}
The twist here is that the error value is passed by reference, so that
the iterator can restore it when breaking out of the loop.
Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
iterator and needs is just kept in the write_cache_pages legacy wrapper.
in preparation for eventually killing it off.
Heavily based on a for_each* based iterator from Matthew Wilcox.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
include/linux/writeback.h | 4 +
mm/page-writeback.c | 192 ++++++++++++++++++++++----------------
2 files changed, 118 insertions(+), 78 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index f67b3ea866a0fb..9845cb62e40b2d 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -82,6 +82,7 @@ struct writeback_control {
/* internal fields used by the ->writepages implementation: */
struct folio_batch fbatch;
pgoff_t index;
+ int saved_err;
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
@@ -366,6 +367,9 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error);
+
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fc421402f81881..3da4345f08a335 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
}
/**
- * tag_pages_for_writeback - tag pages to be written by write_cache_pages
+ * tag_pages_for_writeback - tag pages to be written by writeback
* @mapping: address space structure to write
* @start: starting page index
* @end: ending page index (inclusive)
*
* This function scans the page range from @start to @end (inclusive) and tags
- * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
- * that write_cache_pages (or whoever calls this function) will then use
- * TOWRITE tag to identify pages eligible for writeback. This mechanism is
- * used to avoid livelocking of writeback by a process steadily creating new
- * dirty pages in the file (thus it is important for this function to be quick
- * so that it can tag pages faster than a dirtying process can create them).
+ * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
+ * can then use the TOWRITE tag to identify pages eligible for writeback.
+ * This mechanism is used to avoid livelocking of writeback by a process
+ * steadily creating new dirty pages in the file (thus it is important for this
+ * function to be quick so that it can tag pages faster than a dirtying process
+ * can create them).
*/
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end)
@@ -2434,69 +2434,68 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
}
/**
- * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
+ * writeback_iter - iterate folio of a mapping for writeback
* @mapping: address space structure to write
- * @wbc: subtract the number of written pages from *@wbc->nr_to_write
- * @writepage: function called for each page
- * @data: data passed to writepage function
+ * @wbc: writeback context
+ * @folio: previously iterated folio (%NULL to start)
+ * @error: in-out pointer for writeback errors (see below)
*
- * If a page is already under I/O, write_cache_pages() skips it, even
- * if it's dirty. This is desirable behaviour for memory-cleaning writeback,
- * but it is INCORRECT for data-integrity system calls such as fsync(). fsync()
- * and msync() need to guarantee that all the data which was dirty at the time
- * the call was made get new I/O started against them. If wbc->sync_mode is
- * WB_SYNC_ALL then we were called for data integrity and we must wait for
- * existing IO to complete.
- *
- * To avoid livelocks (when other process dirties new pages), we first tag
- * pages which should be written back with TOWRITE tag and only then start
- * writing them. For data-integrity sync we have to be careful so that we do
- * not miss some pages (e.g., because some other process has cleared TOWRITE
- * tag we set). The rule we follow is that TOWRITE tag can be cleared only
- * by the process clearing the DIRTY tag (and submitting the page for IO).
- *
- * To avoid deadlocks between range_cyclic writeback and callers that hold
- * pages in PageWriteback to aggregate IO until write_cache_pages() returns,
- * we do not loop back to the start of the file. Doing so causes a page
- * lock/page writeback access order inversion - we should only ever lock
- * multiple pages in ascending page->index order, and looping back to the start
- * of the file violates that rule and causes deadlocks.
+ * This function returns the next folio for the writeback operation described by
+ * @wbc on @mapping and should be called in a while loop in the ->writepages
+ * implementation.
*
- * Return: %0 on success, negative error code otherwise
+ * To start the writeback operation, %NULL is passed in the @folio argument, and
+ * for every subsequent iteration the folio returned previously should be passed
+ * back in.
+ *
+ * If there was an error in the per-folio writeback inside the writeback_iter()
+ * loop, @error should be set to the error value.
+ *
+ * Once the writeback described in @wbc has finished, this function will return
+ * %NULL and if there was an error in any iteration restore it to @error.
+ *
+ * Note: callers should not manually break out of the loop using break or goto
+ * but must keep calling writeback_iter() until it returns %NULL.
+ *
+ * Return: the folio to write or %NULL if the loop is done.
*/
-int write_cache_pages(struct address_space *mapping,
- struct writeback_control *wbc, writepage_t writepage,
- void *data)
+struct folio *writeback_iter(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int *error)
{
- int ret = 0;
- int error;
- struct folio *folio;
- pgoff_t end; /* Inclusive */
-
- if (wbc->range_cyclic) {
- wbc->index = mapping->writeback_index; /* prev offset */
- end = -1;
- } else {
- wbc->index = wbc->range_start >> PAGE_SHIFT;
- end = wbc->range_end >> PAGE_SHIFT;
- }
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, end);
-
- folio_batch_init(&wbc->fbatch);
+ if (!folio) {
+ folio_batch_init(&wbc->fbatch);
+ wbc->saved_err = *error = 0;
- for (;;) {
- folio = writeback_get_folio(mapping, wbc);
- if (!folio)
- break;
+ /*
+ * For range cyclic writeback we remember where we stopped so
+ * that we can continue where we stopped.
+ *
+ * For non-cyclic writeback we always start at the beginning of
+ * the passed in range.
+ */
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index;
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
- error = writepage(folio, wbc, data);
+ /*
+ * To avoid livelocks when other processes dirty new pages, we
+ * first tag pages which should be written back and only then
+ * start writing them.
+ *
+ * For data-integrity writeback we have to be careful so that we
+ * do not miss some pages (e.g., because some other process has
+ * cleared the TOWRITE tag we set). The rule we follow is that
+ * TOWRITE tag can be cleared only by the process clearing the
+ * DIRTY tag (and submitting the page for I/O).
+ */
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index,
+ wbc_end(wbc));
+ } else {
wbc->nr_to_write -= folio_nr_pages(folio);
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
+ WARN_ON_ONCE(*error > 0);
/*
* For integrity writeback we have to keep going until we have
@@ -2510,33 +2509,70 @@ int write_cache_pages(struct address_space *mapping,
* wbc->nr_to_write or encounter the first error.
*/
if (wbc->sync_mode == WB_SYNC_ALL) {
- if (error && !ret)
- ret = error;
+ if (*error && !wbc->saved_err)
+ wbc->saved_err = *error;
} else {
- if (error || wbc->nr_to_write <= 0)
+ if (*error || wbc->nr_to_write <= 0)
goto done;
}
}
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- folio_batch_release(&wbc->fbatch);
- if (wbc->range_cyclic)
- mapping->writeback_index = 0;
- return ret;
+ folio = writeback_get_folio(mapping, wbc);
+ if (!folio) {
+ /*
+ * To avoid deadlocks between range_cyclic writeback and callers
+ * that hold pages in PageWriteback to aggregate I/O until
+ * the writeback iteration finishes, we do not loop back to the
+ * start of the file. Doing so causes a page lock/page
+ * writeback access order inversion - we should only ever lock
+ * multiple pages in ascending page->index order, and looping
+ * back to the start of the file violates that rule and causes
+ * deadlocks.
+ */
+ if (wbc->range_cyclic)
+ mapping->writeback_index = 0;
+
+ /*
+ * Return the first error we encountered (if there was any) to
+ * the caller.
+ */
+ *error = wbc->saved_err;
+ }
+ return folio;
done:
if (wbc->range_cyclic)
mapping->writeback_index = folio->index + folio_nr_pages(folio);
folio_batch_release(&wbc->fbatch);
+ return NULL;
+}
+
+/**
+ * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
+ * @mapping: address space structure to write
+ * @wbc: subtract the number of written pages from *@wbc->nr_to_write
+ * @writepage: function called for each page
+ * @data: data passed to writepage function
+ *
+ * Return: %0 on success, negative error code otherwise
+ *
+ * Note: please use writeback_iter() instead.
+ */
+int write_cache_pages(struct address_space *mapping,
+ struct writeback_control *wbc, writepage_t writepage,
+ void *data)
+{
+ struct folio *folio = NULL;
+ int error;
+
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
+ error = writepage(folio, wbc, data);
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+ }
+
return error;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 14/14] writeback: Remove a use of write_cache_pages() from do_writepages()
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (12 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 13/14] writeback: add a writeback iterator Christoph Hellwig
@ 2024-02-15 6:36 ` Christoph Hellwig
2024-02-19 6:28 ` convert write_cache_pages() to an iterator v8 Christoph Hellwig
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-15 6:36 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Use the new writeback_iter() directly instead of indirecting
through a callback.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: ported to the while based iter style]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3da4345f08a335..3e19b87049db17 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2577,15 +2577,25 @@ int write_cache_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(write_cache_pages);
-static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
- void *data)
+static int writeback_use_writepage(struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(&folio->page, wbc);
+ struct folio *folio = NULL;
+ struct blk_plug plug;
+ int err;
- if (ret < 0)
- mapping_set_error(mapping, ret);
- return ret;
+ blk_start_plug(&plug);
+ while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
+ err = mapping->a_ops->writepage(&folio->page, wbc);
+ if (err == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ err = 0;
+ }
+ mapping_set_error(mapping, err);
+ }
+ blk_finish_plug(&plug);
+
+ return err;
}
int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
@@ -2601,12 +2611,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
if (mapping->a_ops->writepages) {
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
- ret = write_cache_pages(mapping, wbc, writepage_cb,
- mapping);
- blk_finish_plug(&plug);
+ ret = writeback_use_writepage(mapping, wbc);
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: convert write_cache_pages() to an iterator v8
2024-02-15 6:36 convert write_cache_pages() to an iterator v8 Christoph Hellwig
` (13 preceding siblings ...)
2024-02-15 6:36 ` [PATCH 14/14] writeback: Remove a use of write_cache_pages() from do_writepages() Christoph Hellwig
@ 2024-02-19 6:28 ` Christoph Hellwig
14 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2024-02-19 6:28 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Jan Kara, David Howells, Brian Foster,
Christian Brauner, linux-fsdevel, linux-kernel
Now that we've got all the reviews it would be good to think about
queuing up the series. Look like page-writeback.c usually goes through
the mm tree, and that seems good for this series as well. Andrew?
^ permalink raw reply [flat|nested] 18+ messages in thread