* allow file systems to increase the minimum writeback chunk size v2
@ 2025-10-17 3:45 Christoph Hellwig
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
Hi all,
The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot. Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached relative
to the available writeback bandwidth.
Add a superblock field that allows the file system to override the
default size, and set it to the zone size for zoned XFS.
Changes since v1:
- convert the field to a long to match other related writeback code
- cap the zone XFS writeback size to the maximum extent size
- write an extensive comment about the tradeoffs of setting the value
- fix a commit message typo
Diffstat:
fs/fs-writeback.c | 26 +++++++++-----------------
fs/super.c | 1 +
fs/xfs/xfs_zone_alloc.c | 28 ++++++++++++++++++++++++++--
include/linux/fs.h | 1 +
include/linux/writeback.h | 5 +++++
5 files changed, 42 insertions(+), 19 deletions(-)
* [PATCH 1/3] writeback: cleanup writeback_chunk_size
2025-10-17 3:45 allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
@ 2025-10-17 3:45 ` Christoph Hellwig
2025-10-17 12:31 ` Jan Kara
` (2 more replies)
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs, Darrick J. Wong
Return the pages directly when calculated instead of first assigning
them back to a variable, and directly return for the data integrity /
tagged case instead of going through an else clause.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fs-writeback.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..11fd08a0efb8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
* (maybe slowly) sync all tagged pages
*/
if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
- pages = LONG_MAX;
- else {
- pages = min(wb->avg_write_bandwidth / 2,
- global_wb_domain.dirty_limit / DIRTY_SCOPE);
- pages = min(pages, work->nr_pages);
- pages = round_down(pages + MIN_WRITEBACK_PAGES,
- MIN_WRITEBACK_PAGES);
- }
+ return LONG_MAX;
- return pages;
+ pages = min(wb->avg_write_bandwidth / 2,
+ global_wb_domain.dirty_limit / DIRTY_SCOPE);
+ pages = min(pages, work->nr_pages);
+ return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
}
/*
--
2.47.3
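For readers following the control flow outside the kernel tree, the refactored function can be approximated by the userspace sketch below. It is only an illustration: the struct-free signature, the parameter names, the 4 KiB page size, and the modulo-based rounding helper are assumptions made for the example, not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumes 4 KiB pages, so 4 MiB is 1024 pages (hypothetical constant). */
#define SKETCH_MIN_WRITEBACK_PAGES 1024L

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Round x down to a multiple of step; works for any positive step. */
static long round_down_to(long x, long step)
{
	assert(step > 0 && x >= 0);
	return x - (x % step);
}

/*
 * Userspace approximation of the refactored writeback_chunk_size().
 * half_bandwidth stands in for wb->avg_write_bandwidth / 2 and
 * dirty_share for global_wb_domain.dirty_limit / DIRTY_SCOPE.
 */
static long sketch_chunk_size(bool sync_all, bool tagged,
			      long half_bandwidth, long dirty_share,
			      long nr_pages)
{
	long pages;

	/* Data integrity and tagged writeback are never chunked. */
	if (sync_all || tagged)
		return LONG_MAX;

	pages = min_long(half_bandwidth, dirty_share);
	pages = min_long(pages, nr_pages);
	return round_down_to(pages + SKETCH_MIN_WRITEBACK_PAGES,
			     SKETCH_MIN_WRITEBACK_PAGES);
}
```

Under these assumptions a budget of 100 pages yields a chunk of 1024 pages and a budget of 3000 pages yields 3072, so the result is never smaller than one minimum-sized chunk.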
* [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
@ 2025-10-17 3:45 ` Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
` (3 more replies)
2025-10-17 3:45 ` [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems Christoph Hellwig
2025-10-22 5:34 ` allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
3 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
The relatively low minimal writeback size of 4MiB means that written back
inodes on rotational media are switched a lot. Besides introducing
additional seeks, this also can lead to extreme file fragmentation on
zoned devices when a lot of files are cached relative to the available
writeback bandwidth.
Add a superblock field that allows the file system to override the
default size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/fs-writeback.c | 14 +++++---------
fs/super.c | 1 +
include/linux/fs.h | 1 +
include/linux/writeback.h | 5 +++++
4 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 11fd08a0efb8..6d50b02cdab6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -31,11 +31,6 @@
#include <linux/memcontrol.h>
#include "internal.h"
-/*
- * 4MB minimal write chunk size
- */
-#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
-
/*
* Passed into wb_writeback(), essentially a subset of writeback_control
*/
@@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
return ret;
}
-static long writeback_chunk_size(struct bdi_writeback *wb,
- struct wb_writeback_work *work)
+static long writeback_chunk_size(struct super_block *sb,
+ struct bdi_writeback *wb, struct wb_writeback_work *work)
{
long pages;
@@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
pages = min(wb->avg_write_bandwidth / 2,
global_wb_domain.dirty_limit / DIRTY_SCOPE);
pages = min(pages, work->nr_pages);
- return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
+ return round_down(pages + sb->s_min_writeback_pages,
+ sb->s_min_writeback_pages);
}
/*
@@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
inode->i_state |= I_SYNC;
wbc_attach_and_unlock_inode(&wbc, inode);
- write_chunk = writeback_chunk_size(wb, work);
+ write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
wbc.nr_to_write = write_chunk;
wbc.pages_skipped = 0;
diff --git a/fs/super.c b/fs/super.c
index 5bab94fb7e03..599c1d2641fe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
goto fail;
if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
goto fail;
+ s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
return s;
fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..ae6f37c6eaa4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1583,6 +1583,7 @@ struct super_block {
spinlock_t s_inode_wblist_lock;
struct list_head s_inodes_wb; /* writeback inodes */
+ long s_min_writeback_pages;
} __randomize_layout;
static inline struct user_namespace *i_user_ns(const struct inode *inode)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 22dd4adc5667..49e1dd96f43e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
void sb_mark_inode_writeback(struct inode *inode);
void sb_clear_inode_writeback(struct inode *inode);
+/*
+ * 4MB minimal write chunk size
+ */
+#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
+
#endif /* WRITEBACK_H */
--
2.47.3
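The arithmetic behind MIN_WRITEBACK_PAGES and the effect of a per-superblock override can be worked through in a small userspace sketch. The struct and function names below are hypothetical stand-ins, and the rounding uses a plain modulo so that it holds for any positive minimum; the kernel's round_down() helper is documented for power-of-two multiples.

```c
#include <assert.h>

/*
 * Default minimum in pages: 4096 KiB (4 MiB) expressed in pages of
 * 2^page_shift bytes, following the MIN_WRITEBACK_PAGES macro.
 */
static unsigned long sketch_min_writeback_pages(unsigned int page_shift)
{
	assert(page_shift >= 10 && page_shift <= 22);
	return 4096UL >> (page_shift - 10);
}

/* Hypothetical stand-in holding only the field this patch adds. */
struct sketch_super_block {
	long s_min_writeback_pages;
};

/* Mirrors the default assignment the patch adds to alloc_super(). */
static void sketch_init_sb(struct sketch_super_block *sb,
			   unsigned int page_shift)
{
	sb->s_min_writeback_pages =
		(long)sketch_min_writeback_pages(page_shift);
}

/* Chunk size for a non-integrity pass, given an already computed budget. */
static long sketch_sb_chunk_size(const struct sketch_super_block *sb,
				 long pages)
{
	long min_pages = sb->s_min_writeback_pages;
	long total;

	assert(min_pages > 0 && pages >= 0);
	total = pages + min_pages;
	return total - (total % min_pages);
}
```

With 4 KiB pages the default is 1024 pages and with 64 KiB pages it is 64 pages, both 4 MiB. A file system that raises the field to 65536 pages turns a 100 page budget into a 65536 page chunk instead of a 1024 page one.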
* [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
2025-10-17 3:45 allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
@ 2025-10-17 3:45 ` Christoph Hellwig
2025-10-22 5:34 ` allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
3 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs, Darrick J. Wong
Set s_min_writeback_pages to the zone size, so that writeback always
writes up to a full zone. This ensures that writeback does not add
spurious file fragmentation when writing back a large number of
files that are larger than the zone size.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_zone_alloc.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 1147bacb2da8..c342595acc3e 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1215,6 +1215,7 @@ xfs_mount_zones(
.mp = mp,
};
struct xfs_buftarg *bt = mp->m_rtdev_targp;
+ xfs_extlen_t zone_blocks = mp->m_groups[XG_TYPE_RTG].blocks;
int error;
if (!bt) {
@@ -1245,10 +1246,33 @@ xfs_mount_zones(
return -ENOMEM;
xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
- mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks,
- mp->m_max_open_zones);
+ mp->m_sb.sb_rgcount, zone_blocks, mp->m_max_open_zones);
trace_xfs_zones_mount(mp);
+ /*
+ * The writeback code switches between inodes regularly to provide
+ * fairness. The default lower bound is 4MiB, but for zoned file
+ * systems we want to increase that both to reduce seeks and, more
+ * importantly, so that workloads that write files in a multiple of the
+ * zone size do not get fragmented and require garbage collection when
+ * they shouldn't. Increase it to the zone size capped by the max
+ * extent len.
+ *
+ * Note that because s_min_writeback_pages is a superblock field, this
+ * value also gets applied to non-zoned files on the data device if
+ * there are any. On a typical zoned setup all data is on the RT device
+ * because using the more efficient sequential write required zones
+ * is the reason for using the zone allocator, and either the RT device
+ * and the (meta)data device are on the same block device, or the
+ * (meta)data device is on a fast SSD while the data on the RT device
+ * is on a SMR HDD. In any combination of the above cases enforcing
+ * the higher min_writeback_pages for non-RT inodes is either a noop
+ * or beneficial.
+ */
+ mp->m_super->s_min_writeback_pages =
+ XFS_FSB_TO_B(mp, min(zone_blocks, XFS_MAX_BMBT_EXTLEN)) >>
+ PAGE_SHIFT;
+
if (bdev_is_zoned(bt->bt_bdev)) {
error = blkdev_report_zones(bt->bt_bdev,
XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart),
--
2.47.3
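The conversion from a zone size in file system blocks to a page count can be sketched in userspace as follows. The function name and parameters are hypothetical, and the value used for the extent length cap (a 21-bit block count) is an assumption about XFS_MAX_BMBT_EXTLEN rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of XFS_MAX_BMBT_EXTLEN: the largest 21-bit block count. */
#define SKETCH_MAX_BMBT_EXTLEN ((1u << 21) - 1)

/*
 * Approximates:
 *   XFS_FSB_TO_B(mp, min(zone_blocks, XFS_MAX_BMBT_EXTLEN)) >> PAGE_SHIFT
 * blocklog is log2 of the file system block size in bytes.
 */
static long sketch_zoned_min_writeback_pages(uint32_t zone_blocks,
					     unsigned int blocklog,
					     unsigned int page_shift)
{
	uint32_t capped = zone_blocks < SKETCH_MAX_BMBT_EXTLEN ?
			  zone_blocks : SKETCH_MAX_BMBT_EXTLEN;
	uint64_t bytes;

	assert(blocklog >= 9 && blocklog <= 16);
	assert(page_shift >= 10 && page_shift <= 22);

	/* Blocks to bytes, then bytes to pages. */
	bytes = (uint64_t)capped << blocklog;
	return (long)(bytes >> page_shift);
}
```

As an example, a 256 MiB zone with 4 KiB blocks is 65536 blocks, which maps to 65536 pages on a 4 KiB page system. A zone of 2^22 blocks would exceed the assumed cap and be limited to 2097151 pages.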
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
@ 2025-10-17 12:31 ` Jan Kara
2025-10-20 9:35 ` Jan Kara
2025-10-24 14:11 ` Nirjhar Roy (IBM)
2 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2025-10-17 12:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs,
Darrick J. Wong
On Fri 17-10-25 05:45:47, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
I think I've already given my tag to this patch but anyway, feel free to
add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> * (maybe slowly) sync all tagged pages
> */
> if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> - pages = LONG_MAX;
> - else {
> - pages = min(wb->avg_write_bandwidth / 2,
> - global_wb_domain.dirty_limit / DIRTY_SCOPE);
> - pages = min(pages, work->nr_pages);
> - pages = round_down(pages + MIN_WRITEBACK_PAGES,
> - MIN_WRITEBACK_PAGES);
> - }
> + return LONG_MAX;
>
> - return pages;
> + pages = min(wb->avg_write_bandwidth / 2,
> + global_wb_domain.dirty_limit / DIRTY_SCOPE);
> + pages = min(pages, work->nr_pages);
> + return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> }
>
> /*
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
@ 2025-10-17 12:32 ` Jan Kara
2025-10-17 15:45 ` Darrick J. Wong
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2025-10-17 12:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Fri 17-10-25 05:45:48, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 14 +++++---------
> fs/super.c | 1 +
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
> #include <linux/memcontrol.h>
> #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
> /*
> * Passed into wb_writeback(), essentially a subset of writeback_control
> */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
> return ret;
> }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> - struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> + struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> pages = min(wb->avg_write_bandwidth / 2,
> global_wb_domain.dirty_limit / DIRTY_SCOPE);
> pages = min(pages, work->nr_pages);
> - return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> + return round_down(pages + sb->s_min_writeback_pages,
> + sb->s_min_writeback_pages);
> }
>
> /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> inode->i_state |= I_SYNC;
> wbc_attach_and_unlock_inode(&wbc, inode);
>
> - write_chunk = writeback_chunk_size(wb, work);
> + write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
> wbc.nr_to_write = write_chunk;
> wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
> goto fail;
> if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
> goto fail;
> + s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
> return s;
>
> fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + long s_min_writeback_pages;
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
> void sb_mark_inode_writeback(struct inode *inode);
> void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
> #endif /* WRITEBACK_H */
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
@ 2025-10-17 15:45 ` Darrick J. Wong
2025-10-20 9:35 ` Jan Kara
2025-10-24 14:33 ` Nirjhar Roy (IBM)
3 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2025-10-17 15:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Fri, Oct 17, 2025 at 05:45:48AM +0200, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
The comment in the next patch satisfies me sufficiently, so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/fs-writeback.c | 14 +++++---------
> fs/super.c | 1 +
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
> #include <linux/memcontrol.h>
> #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
> /*
> * Passed into wb_writeback(), essentially a subset of writeback_control
> */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
> return ret;
> }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> - struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> + struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> pages = min(wb->avg_write_bandwidth / 2,
> global_wb_domain.dirty_limit / DIRTY_SCOPE);
> pages = min(pages, work->nr_pages);
> - return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> + return round_down(pages + sb->s_min_writeback_pages,
> + sb->s_min_writeback_pages);
> }
>
> /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> inode->i_state |= I_SYNC;
> wbc_attach_and_unlock_inode(&wbc, inode);
>
> - write_chunk = writeback_chunk_size(wb, work);
> + write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
> wbc.nr_to_write = write_chunk;
> wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
> goto fail;
> if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
> goto fail;
> + s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
> return s;
>
> fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + long s_min_writeback_pages;
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
> void sb_mark_inode_writeback(struct inode *inode);
> void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
> #endif /* WRITEBACK_H */
> --
> 2.47.3
>
>
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
2025-10-17 15:45 ` Darrick J. Wong
@ 2025-10-20 9:35 ` Jan Kara
2025-10-24 14:33 ` Nirjhar Roy (IBM)
3 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2025-10-20 9:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Fri 17-10-25 05:45:48, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 14 +++++---------
> fs/super.c | 1 +
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
> #include <linux/memcontrol.h>
> #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
> /*
> * Passed into wb_writeback(), essentially a subset of writeback_control
> */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
> return ret;
> }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> - struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> + struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> pages = min(wb->avg_write_bandwidth / 2,
> global_wb_domain.dirty_limit / DIRTY_SCOPE);
> pages = min(pages, work->nr_pages);
> - return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> + return round_down(pages + sb->s_min_writeback_pages,
> + sb->s_min_writeback_pages);
> }
>
> /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> inode->i_state |= I_SYNC;
> wbc_attach_and_unlock_inode(&wbc, inode);
>
> - write_chunk = writeback_chunk_size(wb, work);
> + write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
> wbc.nr_to_write = write_chunk;
> wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
> goto fail;
> if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
> goto fail;
> + s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
> return s;
>
> fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + long s_min_writeback_pages;
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
> void sb_mark_inode_writeback(struct inode *inode);
> void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
> #endif /* WRITEBACK_H */
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
2025-10-17 12:31 ` Jan Kara
@ 2025-10-20 9:35 ` Jan Kara
2025-10-24 14:11 ` Nirjhar Roy (IBM)
2 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2025-10-20 9:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs,
Darrick J. Wong
On Fri 17-10-25 05:45:47, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Looks good, feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> * (maybe slowly) sync all tagged pages
> */
> if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> - pages = LONG_MAX;
> - else {
> - pages = min(wb->avg_write_bandwidth / 2,
> - global_wb_domain.dirty_limit / DIRTY_SCOPE);
> - pages = min(pages, work->nr_pages);
> - pages = round_down(pages + MIN_WRITEBACK_PAGES,
> - MIN_WRITEBACK_PAGES);
> - }
> + return LONG_MAX;
>
> - return pages;
> + pages = min(wb->avg_write_bandwidth / 2,
> + global_wb_domain.dirty_limit / DIRTY_SCOPE);
> + pages = min(pages, work->nr_pages);
> + return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> }
>
> /*
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: allow file systems to increase the minimum writeback chunk size v2
2025-10-17 3:45 allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
` (2 preceding siblings ...)
2025-10-17 3:45 ` [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems Christoph Hellwig
@ 2025-10-22 5:34 ` Christoph Hellwig
2025-10-22 18:38 ` Andrew Morton
3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-22 5:34 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
Looks like everything is reviewed now, can we get this queued up
as it fixes nasty fragmentation for zoned XFS?
It seems like the most recent writeback updates went through the VFS
tree, although -mm has been quite common as well.
On Fri, Oct 17, 2025 at 05:45:46AM +0200, Christoph Hellwig wrote:
> Hi all,
>
> The relatively low minimal writeback size of 4MiB means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size, and set it to the zone size for zoned XFS.
>
> Changes since v1:
> - convert the field to a long to match other related writeback code
> - cap the zone XFS writeback size to the maximum extent size
> - write an extensive comment about the tradeoffs of setting the value
> - fix a commit message typo
>
> Diffstat:
> fs/fs-writeback.c | 26 +++++++++-----------------
> fs/super.c | 1 +
> fs/xfs/xfs_zone_alloc.c | 28 ++++++++++++++++++++++++++--
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 5 files changed, 42 insertions(+), 19 deletions(-)
---end quoted text---
* Re: allow file systems to increase the minimum writeback chunk size v2
2025-10-22 5:34 ` allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
@ 2025-10-22 18:38 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2025-10-22 18:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, willy, dlemoal,
hans.holmberg, linux-mm, linux-fsdevel, linux-xfs
On Wed, 22 Oct 2025 07:34:34 +0200 Christoph Hellwig <hch@lst.de> wrote:
> Looks like everything is reviewed now, can we get this queued up
> as it fixes nasty fragmentation for zoned XFS?
>
> It seems like the most recent writeback updates went through the VFS
> tree, although -mm has been quite common as well.
mpage, writeback, readahead, filemap, buffer.c etc have traditionally
been MM tree things (heck, I basically wrote them all a mere 20 years
ago).
They're transitioning to being fs things nowadays, and that makes sense
- filesystems are the clients for this code.
But please do keep cc'ing linux-mm and myself on this work.
> > fs/fs-writeback.c | 26 +++++++++-----------------
> > fs/super.c | 1 +
> > fs/xfs/xfs_zone_alloc.c | 28 ++++++++++++++++++++++++++--
> > include/linux/fs.h | 1 +
> > include/linux/writeback.h | 5 +++++
VFS tree, please.
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
2025-10-17 3:45 ` [PATCH 1/3] writeback: cleanup writeback_chunk_size Christoph Hellwig
2025-10-17 12:31 ` Jan Kara
2025-10-20 9:35 ` Jan Kara
@ 2025-10-24 14:11 ` Nirjhar Roy (IBM)
2 siblings, 0 replies; 14+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-10-24 14:11 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs, Darrick J. Wong
On Fri, 2025-10-17 at 05:45 +0200, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> fs/fs-writeback.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> * (maybe slowly) sync all tagged pages
> */
> if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> - pages = LONG_MAX;
> - else {
> - pages = min(wb->avg_write_bandwidth / 2,
> - global_wb_domain.dirty_limit / DIRTY_SCOPE);
> - pages = min(pages, work->nr_pages);
> - pages = round_down(pages + MIN_WRITEBACK_PAGES,
> - MIN_WRITEBACK_PAGES);
> - }
> + return LONG_MAX;
>
> - return pages;
> + pages = min(wb->avg_write_bandwidth / 2,
> + global_wb_domain.dirty_limit / DIRTY_SCOPE);
> + pages = min(pages, work->nr_pages);
> + return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
This looks fine to me since it simplifies the overall structure of the code. I don't think this
introduces any functional change.
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> }
>
> /*
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
` (2 preceding siblings ...)
2025-10-20 9:35 ` Jan Kara
@ 2025-10-24 14:33 ` Nirjhar Roy (IBM)
2025-10-24 15:12 ` Christoph Hellwig
3 siblings, 1 reply; 14+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-10-24 14:33 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
On Fri, 2025-10-17 at 05:45 +0200, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
So this patch doesn't really explicitly set s_min_writeback_pages to a non-default/overridden value,
right? That is being done in the next patch, isn't it?
--NR
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/fs-writeback.c | 14 +++++---------
> fs/super.c | 1 +
> include/linux/fs.h | 1 +
> include/linux/writeback.h | 5 +++++
> 4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
> #include <linux/memcontrol.h>
> #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
> /*
> * Passed into wb_writeback(), essentially a subset of writeback_control
> */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
> return ret;
> }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> - struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> + struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
> pages = min(wb->avg_write_bandwidth / 2,
> global_wb_domain.dirty_limit / DIRTY_SCOPE);
> pages = min(pages, work->nr_pages);
> - return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> + return round_down(pages + sb->s_min_writeback_pages,
> + sb->s_min_writeback_pages);
> }
>
> /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
> inode->i_state |= I_SYNC;
> wbc_attach_and_unlock_inode(&wbc, inode);
>
> - write_chunk = writeback_chunk_size(wb, work);
> + write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
> wbc.nr_to_write = write_chunk;
> wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
> goto fail;
> if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
> goto fail;
> + s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
> return s;
>
> fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
> spinlock_t s_inode_wblist_lock;
> struct list_head s_inodes_wb; /* writeback inodes */
> + long s_min_writeback_pages;
> } __randomize_layout;
>
> static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
> void sb_mark_inode_writeback(struct inode *inode);
> void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
> #endif /* WRITEBACK_H */
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-24 14:33 ` Nirjhar Roy (IBM)
@ 2025-10-24 15:12 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-10-24 15:12 UTC (permalink / raw)
To: Nirjhar Roy (IBM)
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs
On Fri, Oct 24, 2025 at 08:03:34PM +0530, Nirjhar Roy (IBM) wrote:
> So this patch doesn't really explicitly set s_min_writeback_pages to a non-default/overridden value,
> right? That is being done in the next patch, isn't it?
Exactly.
^ permalink raw reply [flat|nested] 14+ messages in thread