* [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
Return the pages directly when calculated instead of first assigning
them back to a variable, and directly return for the data integrity /
tagged case instead of going through an else clause.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/fs-writeback.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..11fd08a0efb8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
* (maybe slowly) sync all tagged pages
*/
if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
- pages = LONG_MAX;
- else {
- pages = min(wb->avg_write_bandwidth / 2,
- global_wb_domain.dirty_limit / DIRTY_SCOPE);
- pages = min(pages, work->nr_pages);
- pages = round_down(pages + MIN_WRITEBACK_PAGES,
- MIN_WRITEBACK_PAGES);
- }
+ return LONG_MAX;
- return pages;
+ pages = min(wb->avg_write_bandwidth / 2,
+ global_wb_domain.dirty_limit / DIRTY_SCOPE);
+ pages = min(pages, work->nr_pages);
+ return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
}
/*
--
2.47.3
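The non-integrity path of the cleaned-up writeback_chunk_size() can be modelled in user space to show what the final rounding does. This is a minimal sketch, not kernel code. The bdi_writeback and wb_writeback_work fields are replaced by plain parameters, the global dirty-limit term is assumed to be passed in already divided by DIRTY_SCOPE, 4 KiB pages are assumed, and the helper name chunk_size() is hypothetical. It uses division-based rounding (like the kernel's rounddown()) rather than the mask-based round_down(); the two agree for power-of-two divisors such as MIN_WRITEBACK_PAGES.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define PAGE_SHIFT 12
/* Same expression as the kernel macro: 4096 KiB (4 MiB) expressed in pages. */
#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))

/*
 * Hypothetical user-space model of writeback_chunk_size().
 * dirty_share stands in for global_wb_domain.dirty_limit / DIRTY_SCOPE.
 */
static long chunk_size(bool sync_all_or_tagged, long avg_write_bandwidth,
		       long dirty_share, long nr_pages, long min_pages)
{
	long pages;

	assert(min_pages > 0);

	/* Data integrity or tagged writeback: no chunk limit at all. */
	if (sync_all_or_tagged)
		return LONG_MAX;

	/* Smallest of half the bandwidth, the dirty share and the request. */
	pages = avg_write_bandwidth / 2;
	if (dirty_share < pages)
		pages = dirty_share;
	if (nr_pages < pages)
		pages = nr_pages;

	/*
	 * round_down(pages + min, min): the result is a multiple of
	 * min_pages and never smaller than min_pages.
	 */
	return (pages + min_pages) / min_pages * min_pages;
}
```

With MIN_WRITEBACK_PAGES at 1024 pages, a budget of 10 pages is raised to 1024, 5000 pages become 5120, and an exact multiple such as 2048 becomes 3072, since the rounding adds one full step before truncating.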
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Damien Le Moal @ 2025-10-15 7:05 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs
On 2025/10/15 15:27, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Darrick J. Wong @ 2025-10-15 15:48 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 03:27:14PM +0900, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks pretty simple to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/fs-writeback.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> * (maybe slowly) sync all tagged pages
> */
> if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> - pages = LONG_MAX;
> - else {
> - pages = min(wb->avg_write_bandwidth / 2,
> - global_wb_domain.dirty_limit / DIRTY_SCOPE);
> - pages = min(pages, work->nr_pages);
> - pages = round_down(pages + MIN_WRITEBACK_PAGES,
> - MIN_WRITEBACK_PAGES);
> - }
> + return LONG_MAX;
>
> - return pages;
> + pages = min(wb->avg_write_bandwidth / 2,
> + global_wb_domain.dirty_limit / DIRTY_SCOPE);
> + pages = min(pages, work->nr_pages);
> + return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> }
>
> /*
> --
> 2.47.3
>
>
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Jan Kara @ 2025-10-20 9:34 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed 15-10-25 15:27:14, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> * (maybe slowly) sync all tagged pages
> */
> if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> - pages = LONG_MAX;
> - else {
> - pages = min(wb->avg_write_bandwidth / 2,
> - global_wb_domain.dirty_limit / DIRTY_SCOPE);
> - pages = min(pages, work->nr_pages);
> - pages = round_down(pages + MIN_WRITEBACK_PAGES,
> - MIN_WRITEBACK_PAGES);
> - }
> + return LONG_MAX;
>
> - return pages;
> + pages = min(wb->avg_write_bandwidth / 2,
> + global_wb_domain.dirty_limit / DIRTY_SCOPE);
> + pages = min(pages, work->nr_pages);
> + return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> }
>
> /*
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot. Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached relative
to the available writeback bandwidth.
Add a superblock field that allows the file system to override the
default size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/fs-writeback.c | 14 +++++---------
fs/super.c | 1 +
include/linux/fs.h | 1 +
include/linux/writeback.h | 5 +++++
4 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 11fd08a0efb8..6d50b02cdab6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -31,11 +31,6 @@
#include <linux/memcontrol.h>
#include "internal.h"
-/*
- * 4MB minimal write chunk size
- */
-#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
-
/*
* Passed into wb_writeback(), essentially a subset of writeback_control
*/
@@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
return ret;
}
-static long writeback_chunk_size(struct bdi_writeback *wb,
- struct wb_writeback_work *work)
+static long writeback_chunk_size(struct super_block *sb,
+ struct bdi_writeback *wb, struct wb_writeback_work *work)
{
long pages;
@@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
pages = min(wb->avg_write_bandwidth / 2,
global_wb_domain.dirty_limit / DIRTY_SCOPE);
pages = min(pages, work->nr_pages);
- return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
+ return round_down(pages + sb->s_min_writeback_pages,
+ sb->s_min_writeback_pages);
}
/*
@@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
inode->i_state |= I_SYNC;
wbc_attach_and_unlock_inode(&wbc, inode);
- write_chunk = writeback_chunk_size(wb, work);
+ write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
wbc.nr_to_write = write_chunk;
wbc.pages_skipped = 0;
diff --git a/fs/super.c b/fs/super.c
index 5bab94fb7e03..599c1d2641fe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
goto fail;
if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
goto fail;
+ s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
return s;
fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..23f1f10646b7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1583,6 +1583,7 @@ struct super_block {
spinlock_t s_inode_wblist_lock;
struct list_head s_inodes_wb; /* writeback inodes */
+ unsigned int s_min_writeback_pages;
} __randomize_layout;
static inline struct user_namespace *i_user_ns(const struct inode *inode)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 22dd4adc5667..49e1dd96f43e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
void sb_mark_inode_writeback(struct inode *inode);
void sb_clear_inode_writeback(struct inode *inode);
+/*
+ * 4MB minimal write chunk size
+ */
+#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
+
#endif /* WRITEBACK_H */
--
2.47.3
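The MIN_WRITEBACK_PAGES expression moved into writeback.h converts a fixed 4096 KiB (4 MiB) into pages: shifting right by PAGE_SHIFT - 10 divides by the page size in KiB. A minimal sketch with the page shift made a function parameter; min_writeback_pages() and pages_to_bytes() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring MIN_WRITEBACK_PAGES for an arbitrary page
 * shift: 4096 KiB divided by the page size in KiB (1 << (page_shift - 10)).
 */
static unsigned long min_writeback_pages(unsigned int page_shift)
{
	/* The macro only makes sense for pages of at least 1 KiB. */
	assert(page_shift >= 10);
	return 4096UL >> (page_shift - 10);
}

/* Convert a page count back to bytes, to check the result is 4 MiB. */
static uint64_t pages_to_bytes(unsigned long pages, unsigned int page_shift)
{
	return (uint64_t)pages << page_shift;
}
```

For 4 KiB, 16 KiB and 64 KiB pages this gives 1024, 256 and 64 pages respectively, all of which are 4 MiB of data. The patch keeps that default and only has alloc_super() copy it into the per-superblock s_min_writeback_pages field, which a file system may then raise.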
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Damien Le Moal @ 2025-10-15 7:09 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs
On 2025/10/15 15:27, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
Remove "leads" in the above.
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..23f1f10646b7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + unsigned int s_min_writeback_pages;
Given that writeback_chunk_size() returns a long type, maybe this should be a long?
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-15 7:27 UTC
To: Damien Le Moal
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs
On Wed, Oct 15, 2025 at 04:09:13PM +0900, Damien Le Moal wrote:
> > + unsigned int s_min_writeback_pages;
>
> Given that writeback_chunk_size() returns a long type, maybe this should be a long ?
Not that it currently matters much, but yes for consistency this should
be a long.
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Theodore Ts'o @ 2025-10-15 15:13 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
I wonder if we should bump the default; and if the concern is that it
might be problematic for super slow devices (e.g., cheap USB thumb
drives), perhaps we can measure the time needed to complete the
writeback, and then dynamically adjust the value based on the apparent
write bandwidth?
We could have each file system implement something like this, but
maybe there should be a way to do this in fs generic code?
- Ted
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:33 UTC
To: Theodore Ts'o
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 11:13:53AM -0400, Theodore Ts'o wrote:
> I wonder if we should bump the default; and if the concern is that
> might be problematic for super slow devices (e.g., cheap USB thumb
> drives), perhaps we can measure the time needed to complete the
> writeback, and then dynamically adjust the value based on the apparent
> write bandwidth?
>
> We could have each file system implement something like this, but
> maybe there should be a way to do this in fs generic code?
Right now my main concern here is zoned file systems where the switching
directly leads to fragmentation. Besides XFS that would in theory also
affect f2fs and btrfs, but unlike XFS they do not do the trivial data
separation by inode but just throw all writes into the blender with (f2fs)
or without (btrfs) some hot/cold separation applied. But even if they did
that, finding the zone size is file system specific, so right now I don't
see much to share. If we end up with duplicate code I'll happily factor it
into helpers.
>
> - Ted
---end quoted text---
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Darrick J. Wong @ 2025-10-15 15:57 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
I have a few side-questy questions about this patch:
Should this be some sort of BDI field? Maybe there are other workloads
that create a lot of dirty pages and the sysadmin would like to be able
to tell the fs to schedule larger chunks of writeback before switching
to another inode?
XFS can have two volumes, should we be using the rtdev's bdi for
realtime files and the data dev's bdi for non-rt files? That looks like
a mess to sort out though, since there's a fair number of places where
we just dereference super_block::s_bdi.
Also I have no idea what we'd do for filesystem raid -- synthesize a bdi
for that? And then how would you advertise that such-and-such fd maps
to a particular bdi?
(Except for the first question, I don't view the other Qs as blocking
issues; the mechanical code change looks ok to me, aside from
s_min_writeback_pages needing to be a long, as Damien said.)
--D
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/fs-writeback.c | 14 +++++---------
> fs/super.c | 1 +
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
> #include <linux/memcontrol.h>
> #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
> /*
> * Passed into wb_writeback(), essentially a subset of writeback_control
> */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
> return ret;
> }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> - struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> + struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> pages = min(wb->avg_write_bandwidth / 2,
> global_wb_domain.dirty_limit / DIRTY_SCOPE);
> pages = min(pages, work->nr_pages);
> - return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> + return round_down(pages + sb->s_min_writeback_pages,
> + sb->s_min_writeback_pages);
> }
>
> /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> inode->i_state |= I_SYNC;
> wbc_attach_and_unlock_inode(&wbc, inode);
>
> - write_chunk = writeback_chunk_size(wb, work);
> + write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
> wbc.nr_to_write = write_chunk;
> wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
> goto fail;
> if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
> goto fail;
> + s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
> return s;
>
> fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..23f1f10646b7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + unsigned int s_min_writeback_pages;
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
> void sb_mark_inode_writeback(struct inode *inode);
> void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
> #endif /* WRITEBACK_H */
> --
> 2.47.3
>
>
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:37 UTC
To: Darrick J. Wong
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 08:57:35AM -0700, Darrick J. Wong wrote:
> Should this be some sort of BDI field? Maybe there are other workloads
> that create a lot of dirty pages and the sysadmin would like to be able
> to tell the fs to schedule larger chunks of writeback before switching
> to another inode?
The BDI is not owned by the file system, but rather the gendisk, so we
can't just override it in the file systems. I still hope that eventually
changes, in which case we could revisit it. Having a tunable sounds neat,
but I'd rather get the fix out first and then design something like that.
>
> XFS can have two volumes, should we be using the rtdev's bdi for
> realtime files and the data dev's bdi for non-rt files? That looks like
> a mess to sort out though, since there's a fair number of places where
> we just dereference super_block::s_bdi.
Each file system only uses a single BDI, which in the case of XFS is the
one of the gendisk that the main device sits on. Only the bdevfs uses
multiple BDIs (one per file) and that required hard-coded hacks in the
writeback code. I don't think there is any benefit in having multiple
BDIs for real file systems; the parallelization work that just got
reposted works inside a BDI.
> Also I have no idea what we'd do for filesystem raid -- synthesize a bdi
> for that? And then how would you advertise that such-and-such fd maps
> to a particular bdi?
btrfs allocates its own BDI. And I hope that we eventually move to a
model where the file system always owns the BDI, as that would simplify
object lifetimes and relationships and locking a lot.
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Dave Chinner @ 2025-10-15 20:49 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
Hmmm - won't changing this for the zoned rtdev also change behaviour
for writeback on the data device? i.e. upping the minimum for the
normal data device on XFS will mean writeback bandwidth sharing is a
lot less "fair" and higher latency when we have a mix of different
file sizes than it currently is...
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:39 UTC
To: Dave Chinner
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
On Thu, Oct 16, 2025 at 07:49:20AM +1100, Dave Chinner wrote:
> On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> > The relatively low minimal writeback size of 4MiB leads means that
> > written back inodes on rotational media are switched a lot. Besides
> > introducing additional seeks, this also can lead to extreme file
> > fragmentation on zoned devices when a lot of files are cached relative
> > to the available writeback bandwidth.
> >
> > Add a superblock field that allows the file system to override the
> > default size.
>
> Hmmm - won't changing this for the zoned rtdev also change behaviour
> for writeback on the data device? i.e. upping the minimum for the
> normal data device on XFS will mean writeback bandwidth sharing is a
> lot less "fair" and higher latency when we have a mix of different
> file sizes than it currently is...
In theory it is. In practice with a zoned file system the main device
is:
a) typically only used for metadata
b) a fast SSD when not actually on the same device
So I think these concerns are valid, but not really worth replacing the
simple superblock field with a method to query the value. But I'll write
a comment documenting these assumptions as that is useful for future
readers of the code.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
---end quoted text---
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Dave Chinner @ 2025-10-16 8:23 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Thu, Oct 16, 2025 at 06:39:58AM +0200, Christoph Hellwig wrote:
> On Thu, Oct 16, 2025 at 07:49:20AM +1100, Dave Chinner wrote:
> > On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> > > The relatively low minimal writeback size of 4MiB leads means that
> > > written back inodes on rotational media are switched a lot. Besides
> > > introducing additional seeks, this also can lead to extreme file
> > > fragmentation on zoned devices when a lot of files are cached relative
> > > to the available writeback bandwidth.
> > >
> > > Add a superblock field that allows the file system to override the
> > > default size.
> >
> > Hmmm - won't changing this for the zoned rtdev also change behaviour
> > for writeback on the data device? i.e. upping the minimum for the
> > normal data device on XFS will mean writeback bandwidth sharing is a
> > lot less "fair" and higher latency when we have a mix of different
> > file sizes than it currently is...
>
> In theory it is. In practice with a zoned file system the main device
> is:
>
> a) typically only used for metadata
> b) a fast SSD when not actually on the same device
>
> So I think these concerns are valid, but not really worth replacing the
> simple superblock field with a method to query the value. But I'll write
> a comment documenting these assumptions as that is useful for future
> readers of the code.
That sounds reasonable to me. Eventually we might want to explore
per-device BDIs, but for the moment documenting the trade-off being
made is good enough.
-Dave.
--
Dave Chinner
david@fromorbit.com
* [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
Set s_min_writeback_pages to the zone size, so that writeback always
writes up to a full zone. This ensures that writeback does not add
spurious file fragmentation when writing back a large number of
files that are larger than the zone size.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_alloc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 1147bacb2da8..0f4e460fd3ea 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1215,6 +1215,7 @@ xfs_mount_zones(
.mp = mp,
};
struct xfs_buftarg *bt = mp->m_rtdev_targp;
+ xfs_extlen_t zone_blocks = mp->m_groups[XG_TYPE_RTG].blocks;
int error;
if (!bt) {
@@ -1245,10 +1246,12 @@ xfs_mount_zones(
return -ENOMEM;
xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
- mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks,
- mp->m_max_open_zones);
+ mp->m_sb.sb_rgcount, zone_blocks, mp->m_max_open_zones);
trace_xfs_zones_mount(mp);
+ mp->m_super->s_min_writeback_pages =
+ XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT;
+
if (bdev_is_zoned(bt->bt_bdev)) {
error = blkdev_report_zones(bt->bt_bdev,
XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart),
--
2.47.3
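The new assignment turns the zone size from file system blocks into pages: per the review below, XFS_FSB_TO_B() widens the block count to a 64-bit byte count, and the right shift by PAGE_SHIFT then divides by the page size. A minimal user-space sketch, assuming XFS_FSB_TO_B() is a left shift by the block size log; the block-size log and page shift are explicit parameters and zone_min_writeback_pages() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of
 *     XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT
 * Assumptions: XFS_FSB_TO_B() widens the block count to a signed 64-bit
 * value and shifts it left by the block size log (blocklog); page_shift
 * plays the role of PAGE_SHIFT.
 */
static int64_t zone_min_writeback_pages(uint32_t zone_blocks,
					unsigned int blocklog,
					unsigned int page_shift)
{
	int64_t bytes;

	/* Per the review: a zone (rtgroup) holds at most 2^31 - 1 blocks. */
	assert(zone_blocks <= INT32_MAX);

	bytes = (int64_t)zone_blocks << blocklog;	/* blocks -> bytes */
	return bytes >> page_shift;			/* bytes -> pages */
}
```

A 256 MiB zone of 65536 4 KiB blocks yields 65536 pages on a 4 KiB-page system. Taking the 2^31 - 1 block maximum and assuming 64 KiB blocks with 4 KiB pages, the result is 2^35 - 16 pages: fine for a 64-bit long, but more than a 32-bit unsigned int can hold.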
* Re: [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
From: Damien Le Moal @ 2025-10-15 7:10 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs
On 2025/10/15 15:27, Christoph Hellwig wrote:
> Set s_min_writeback_pages to the zone size, so that writeback always
> writes up to a full zone. This ensures that writeback does not add
> spurious file fragmentation when writing back a large number of
> files that are larger than the zone size.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
From: Darrick J. Wong @ 2025-10-15 16:01 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, Oct 15, 2025 at 03:27:16PM +0900, Christoph Hellwig wrote:
> Set s_min_writeback_pages to the zone size, so that writeback always
> writes up to a full zone. This ensures that writeback does not add
> spurious file fragmentation when writing back a large number of
> files that are larger than the zone size.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_alloc.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
> index 1147bacb2da8..0f4e460fd3ea 100644
> --- a/fs/xfs/xfs_zone_alloc.c
> +++ b/fs/xfs/xfs_zone_alloc.c
> @@ -1215,6 +1215,7 @@ xfs_mount_zones(
> .mp = mp,
> };
> struct xfs_buftarg *bt = mp->m_rtdev_targp;
> + xfs_extlen_t zone_blocks = mp->m_groups[XG_TYPE_RTG].blocks;
> int error;
>
> if (!bt) {
> @@ -1245,10 +1246,12 @@ xfs_mount_zones(
> return -ENOMEM;
>
> xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
> - mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks,
> - mp->m_max_open_zones);
> + mp->m_sb.sb_rgcount, zone_blocks, mp->m_max_open_zones);
> trace_xfs_zones_mount(mp);
>
> + mp->m_super->s_min_writeback_pages =
> + XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT;
Hmm. The maximum rtgroup (and hence zone) size is 2^31-1 blocks.
That quantity is cast to int64_t by XFS_FSB_TO_B, then shifted down by
PAGE_SHIFT. So I think there's no chance of an overflow here,
especially if s_min_writeback_pages becomes type long.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> +
> if (bdev_is_zoned(bt->bt_bdev)) {
> error = blkdev_report_zones(bt->bt_bdev,
> XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart),
> --
> 2.47.3
>
>
* Re: allow file systems to increase the minimum writeback chunk size
From: Damien Le Moal @ 2025-10-15 7:11 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs
On 2025/10/15 15:27, Christoph Hellwig wrote:
> Hi all,
>
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size, and set it to the zone size for zoned XFS.
For the series:
Tested-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research