* allow file systems to increase the minimum writeback chunk size
@ 2025-10-15 6:27 Christoph Hellwig
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs

Hi all,

The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot.  Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached relative
to the available writeback bandwidth.

Add a superblock field that allows the file system to override the
default size, and set it to the zone size for zoned XFS.

Diffstat:
b/fs/fs-writeback.c | 14 +++++---------
b/fs/super.c | 1 +
b/fs/xfs/xfs_zone_alloc.c | 7 +++++--
b/include/linux/fs.h | 1 +
b/include/linux/writeback.h | 5 +++++
fs/fs-writeback.c | 14 +++++---------
6 files changed, 22 insertions(+), 20 deletions(-)

* [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

Return the pages directly when calculated instead of first assigning
them back to a variable, and directly return for the data integrity /
tagged case instead of going through an else clause.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..11fd08a0efb8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	 *                   (maybe slowly) sync all tagged pages
 	 */
 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
-		pages = LONG_MAX;
-	else {
-		pages = min(wb->avg_write_bandwidth / 2,
-			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
-		pages = min(pages, work->nr_pages);
-		pages = round_down(pages + MIN_WRITEBACK_PAGES,
-				   MIN_WRITEBACK_PAGES);
-	}
+		return LONG_MAX;
 
-	return pages;
+	pages = min(wb->avg_write_bandwidth / 2,
+		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
+	pages = min(pages, work->nr_pages);
+	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
 }
 
 /*
-- 
2.47.3

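For readers who want to try the arithmetic outside the kernel, the following is a minimal userspace sketch of the control flow after this cleanup, not the kernel function itself. The function name, the scalar parameters, and the constants are assumptions made for illustration: DIRTY_SCOPE is taken to be 8 and MIN_WRITEBACK_PAGES to be 1024 (4MiB with 4KiB pages).

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for illustration; the kernel derives these elsewhere. */
#define MODEL_DIRTY_SCOPE		8
#define MODEL_MIN_WRITEBACK_PAGES	1024L	/* 4MiB with 4KiB pages */

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical userspace model of writeback_chunk_size() as it reads
 * after the cleanup.  All inputs are page counts passed in as scalars
 * instead of being read from kernel structures.
 */
static long model_chunk_size(long avg_write_bandwidth, long dirty_limit,
			     long nr_pages, int integrity_or_tagged)
{
	const long min_pages = MODEL_MIN_WRITEBACK_PAGES;
	long pages;

	/* The mask-based rounding below is only valid for powers of two. */
	assert((min_pages & (min_pages - 1)) == 0);

	/* Data integrity / tagged writeback: no chunking, return early. */
	if (integrity_or_tagged)
		return LONG_MAX;

	pages = min_long(avg_write_bandwidth / 2,
			 dirty_limit / MODEL_DIRTY_SCOPE);
	pages = min_long(pages, nr_pages);

	/* Equivalent of round_down(pages + min_pages, min_pages). */
	return (pages + min_pages) & ~(min_pages - 1);
}
```

Because the minimum is added before rounding down, the result is never smaller than one minimum chunk, even when the measured bandwidth is zero.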
* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Damien Le Moal @ 2025-10-15 7:05 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
	linux-xfs

On 2025/10/15 15:27, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Darrick J. Wong @ 2025-10-15 15:48 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 03:27:14PM +0900, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks pretty simple to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/fs-writeback.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	 *                   (maybe slowly) sync all tagged pages
>  	 */
>  	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> -		pages = LONG_MAX;
> -	else {
> -		pages = min(wb->avg_write_bandwidth / 2,
> -			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
> -		pages = min(pages, work->nr_pages);
> -		pages = round_down(pages + MIN_WRITEBACK_PAGES,
> -				   MIN_WRITEBACK_PAGES);
> -	}
> +		return LONG_MAX;
>
> -	return pages;
> +	pages = min(wb->avg_write_bandwidth / 2,
> +		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
> +	pages = min(pages, work->nr_pages);
> +	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
>  }
>
>  /*
> --
> 2.47.3
>
>

* Re: [PATCH 1/3] writeback: cleanup writeback_chunk_size
From: Jan Kara @ 2025-10-20 9:34 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed 15-10-25 15:27:14, Christoph Hellwig wrote:
> Return the pages directly when calculated instead of first assigning
> them back to a variable, and directly return for the data integrity /
> tagged case instead of going through an else clause.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..11fd08a0efb8 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1893,16 +1893,12 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	 *                   (maybe slowly) sync all tagged pages
>  	 */
>  	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> -		pages = LONG_MAX;
> -	else {
> -		pages = min(wb->avg_write_bandwidth / 2,
> -			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
> -		pages = min(pages, work->nr_pages);
> -		pages = round_down(pages + MIN_WRITEBACK_PAGES,
> -				   MIN_WRITEBACK_PAGES);
> -	}
> +		return LONG_MAX;
>
> -	return pages;
> +	pages = min(wb->avg_write_bandwidth / 2,
> +		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
> +	pages = min(pages, work->nr_pages);
> +	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
>  }
>
>  /*
> --
> 2.47.3
>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

The relatively low minimal writeback size of 4MiB leads means that
written back inodes on rotational media are switched a lot.  Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached relative
to the available writeback bandwidth.

Add a superblock field that allows the file system to override the
default size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c         | 14 +++++---------
 fs/super.c                |  1 +
 include/linux/fs.h        |  1 +
 include/linux/writeback.h |  5 +++++
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 11fd08a0efb8..6d50b02cdab6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -31,11 +31,6 @@
 #include <linux/memcontrol.h>
 #include "internal.h"
 
-/*
- * 4MB minimal write chunk size
- */
-#define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
 	return ret;
 }
 
-static long writeback_chunk_size(struct bdi_writeback *wb,
-				 struct wb_writeback_work *work)
+static long writeback_chunk_size(struct super_block *sb,
+		struct bdi_writeback *wb, struct wb_writeback_work *work)
 {
 	long pages;
 
@@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	pages = min(wb->avg_write_bandwidth / 2,
 		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 	pages = min(pages, work->nr_pages);
-	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
+	return round_down(pages + sb->s_min_writeback_pages,
+			sb->s_min_writeback_pages);
 }
 
 /*
@@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 		inode->i_state |= I_SYNC;
 		wbc_attach_and_unlock_inode(&wbc, inode);
 
-		write_chunk = writeback_chunk_size(wb, work);
+		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
 		wbc.nr_to_write = write_chunk;
 		wbc.pages_skipped = 0;
 
diff --git a/fs/super.c b/fs/super.c
index 5bab94fb7e03..599c1d2641fe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 		goto fail;
 	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
 		goto fail;
+	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
 	return s;
 
 fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..23f1f10646b7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1583,6 +1583,7 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+	unsigned int		s_min_writeback_pages;
 } __randomize_layout;
 
 static inline struct user_namespace *i_user_ns(const struct inode *inode)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 22dd4adc5667..49e1dd96f43e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
 void sb_mark_inode_writeback(struct inode *inode);
 void sb_clear_inode_writeback(struct inode *inode);
 
+/*
+ * 4MB minimal write chunk size
+ */
+#define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
+
 #endif	/* WRITEBACK_H */
-- 
2.47.3

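As a rough illustration of what the override changes, the sketch below evaluates the MIN_WRITEBACK_PAGES expression for a few page sizes and then applies the rounding step with a caller-supplied minimum, the way s_min_writeback_pages is used in the hunk above. Both helper names are hypothetical, and the 65536-page minimum in the usage note (a 256MiB zone with 4KiB pages) is an assumed example value, not something taken from the patch.

```c
#include <assert.h>

/*
 * Same arithmetic as the MIN_WRITEBACK_PAGES macro, with the page shift
 * passed in so different page sizes can be compared.  4096 is the chunk
 * size in KiB and (page_shift - 10) is log2 of the page size in KiB.
 */
static unsigned long min_writeback_pages(unsigned int page_shift)
{
	assert(page_shift >= 10);
	return 4096UL >> (page_shift - 10);
}

/*
 * Hypothetical helper: the final rounding step of writeback_chunk_size()
 * with the minimum supplied by the caller.  The mask form used here is
 * only valid when min_pages is a power of two.
 */
static long round_chunk(long pages, long min_pages)
{
	assert(min_pages > 0 && (min_pages & (min_pages - 1)) == 0);
	return (pages + min_pages) & ~(min_pages - 1);
}
```

With the default of 1024 pages, a bandwidth-derived value of 1500 pages rounds to 2048; with an assumed minimum of 65536 pages the same input yields 65536, so the chunk size is dominated by the file system's minimum rather than by the measured bandwidth.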
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Damien Le Moal @ 2025-10-15 7:09 UTC
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
	linux-xfs

On 2025/10/15 15:27, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that

Remove "leads" in the above.

> written back inodes on rotational media are switched a lot.  Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..23f1f10646b7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t		s_inode_wblist_lock;
>  	struct list_head	s_inodes_wb;	/* writeback inodes */
> +	unsigned int		s_min_writeback_pages;

Given that writeback_chunk_size() returns a long type, maybe this should
be a long?

>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-15 7:27 UTC
To: Damien Le Moal
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
	Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
	linux-xfs

On Wed, Oct 15, 2025 at 04:09:13PM +0900, Damien Le Moal wrote:
> > +	unsigned int		s_min_writeback_pages;
>
> Given that writeback_chunk_size() returns a long type, maybe this should be a
> long ?

Not that it currently matters much, but yes for consistency this
should be a long.

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Theodore Ts'o @ 2025-10-15 15:13 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot.  Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.

I wonder if we should bump the default; and if the concern is that
might be problematic for super slow devices (e.g., cheap USB thumb
drives), perhaps we can measure the time needed to complete the
writeback, and then dynamically adjust the value based on the apparent
write bandwidth?

We could have each file system implement something like this, but
maybe there should be a way to do this in fs generic code?

					- Ted

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:33 UTC
To: Theodore Ts'o
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
	Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 11:13:53AM -0400, Theodore Ts'o wrote:
> I wonder if we should bump the default; and if the concern is that
> might be problematic for super slow devices (e.g., cheap USB thumb
> drives), perhaps we can measure the time needed to complete the
> writeback, and then dynamically adjust the value based on the apparent
> write bandwidth?
>
> We could have each file system implement something like this, but
> maybe there should be a way to do this in fs generic code?

Right now my main concern here is zoned file systems where the
switching directly leads to fragmentation.  Besides XFS that would
in theory also affect f2fs and btrfs, but unlike XFS they do not do
the trivial data separation by inode but just throw all writes into
the blender with (f2fs) or without (btrfs) some hot cold separation
applied.  But even if they did, finding the zone size is file system
specific, so right now I don't see much to share.  If we end up with
duplicate code I'll happily factor it into helpers.

>
> 					- Ted
---end quoted text---

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Darrick J. Wong @ 2025-10-15 15:57 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot.  Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.

I have a few side-questy questions about this patch:

Should this be some sort of BDI field?  Maybe there are other workloads
that create a lot of dirty pages and the sysadmin would like to be able
to tell the fs to schedule larger chunks of writeback before switching
to another inode?

XFS can have two volumes, should we be using the rtdev's bdi for
realtime files and the data dev's bdi for non-rt files?  That looks like
a mess to sort out though, since there's a fair number of places where
we just dereference super_block::s_bdi.

Also I have no idea what we'd do for filesystem raid -- synthesize a bdi
for that?  And then how would you advertise that such-and-such fd maps
to a particular bdi?

(Except for the first question, I don't view the other Qs as blocking
issues; the mechanical code change looks ok to me aside from
s_min_writeback_pages should be long like Ted said)

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/fs-writeback.c         | 14 +++++---------
>  fs/super.c                |  1 +
>  include/linux/fs.h        |  1 +
>  include/linux/writeback.h |  5 +++++
>  4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
>  #include <linux/memcontrol.h>
>  #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
> -
>  /*
>   * Passed into wb_writeback(), essentially a subset of writeback_control
>   */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
>  	return ret;
>  }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> -				 struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> +		struct bdi_writeback *wb, struct wb_writeback_work *work)
>  {
>  	long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	pages = min(wb->avg_write_bandwidth / 2,
>  		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
>  	pages = min(pages, work->nr_pages);
> -	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> +	return round_down(pages + sb->s_min_writeback_pages,
> +			sb->s_min_writeback_pages);
>  }
>
>  /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		inode->i_state |= I_SYNC;
>  		wbc_attach_and_unlock_inode(&wbc, inode);
>
> -		write_chunk = writeback_chunk_size(wb, work);
> +		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
>  		wbc.nr_to_write = write_chunk;
>  		wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
>  		goto fail;
> +	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
>  	return s;
>
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..23f1f10646b7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t		s_inode_wblist_lock;
>  	struct list_head	s_inodes_wb;	/* writeback inodes */
> +	unsigned int		s_min_writeback_pages;
>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
>  void sb_mark_inode_writeback(struct inode *inode);
>  void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
> +
>  #endif	/* WRITEBACK_H */
> --
> 2.47.3
>
>

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:37 UTC
To: Darrick J. Wong
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
	Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 08:57:35AM -0700, Darrick J. Wong wrote:
> Should this be some sort of BDI field?  Maybe there are other workloads
> that create a lot of dirty pages and the sysadmin would like to be able
> to tell the fs to schedule larger chunks of writeback before switching
> to another inode?

The BDI is not owned by the file system, but rather the gendisk, so
we can't just override it in the file systems.  I still hope that
eventually changes, in which case we could revisit it.  Having a
tunable sounds neat, but I'd rather get the fix out first and then
design something like that.

>
> XFS can have two volumes, should we be using the rtdev's bdi for
> realtime files and the data dev's bdi for non-rt files?  That looks like
> a mess to sort out though, since there's a fair number of places where
> we just dereference super_block::s_bdi.

Each file system only uses a single BDI, which in case of XFS is the
one of the gendisk that the main device sits on.  Only the bdevfs
uses multiple BDIs (one per file) and that required hard coded hacks
in the writeback code.  I don't think there is any benefit in having
multiple BDIs for a real file system, the parallelization work that
just got reposted works inside a BDI.

> Also I have no idea what we'd do for filesystem raid -- synthesize a bdi
> for that?  And then how would you advertise that such-and-such fd maps
> to a particular bdi?

btrfs allocates its own BDI.  And I hope that we eventually move to
a model where the file system always owns the BDI as that would
simplify object lifetimes and relationships and locking a lot.

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Dave Chinner @ 2025-10-15 20:49 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB leads means that
> written back inodes on rotational media are switched a lot.  Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.

Hmmm - won't changing this for the zoned rtdev also change behaviour
for writeback on the data device?  i.e. upping the minimum for the
normal data device on XFS will mean writeback bandwidth sharing is a
lot less "fair" and higher latency when we have a mix of different
file sizes than it currently is...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Christoph Hellwig @ 2025-10-16 4:39 UTC
To: Dave Chinner
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
	Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

On Thu, Oct 16, 2025 at 07:49:20AM +1100, Dave Chinner wrote:
> On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> > The relatively low minimal writeback size of 4MiB leads means that
> > written back inodes on rotational media are switched a lot.  Besides
> > introducing additional seeks, this also can lead to extreme file
> > fragmentation on zoned devices when a lot of files are cached relative
> > to the available writeback bandwidth.
> >
> > Add a superblock field that allows the file system to override the
> > default size.
>
> Hmmm - won't changing this for the zoned rtdev also change behaviour
> for writeback on the data device?  i.e. upping the minimum for the
> normal data device on XFS will mean writeback bandwidth sharing is a
> lot less "fair" and higher latency when we have a mix of different
> file sizes than it currently is...

In theory it is.  In practice with a zoned file system the main device
is:

 a) typically only used for metadata
 b) a fast SSD when not actually on the same device

So I think these concerns are valid, but not really worth replacing the
simple superblock field with a method to query the value.  But I'll write
a comment documenting these assumptions as that is useful for future
readers of the code.

>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
---end quoted text---

* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
From: Dave Chinner @ 2025-10-16 8:23 UTC
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton, willy,
	dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Thu, Oct 16, 2025 at 06:39:58AM +0200, Christoph Hellwig wrote:
> On Thu, Oct 16, 2025 at 07:49:20AM +1100, Dave Chinner wrote:
> > On Wed, Oct 15, 2025 at 03:27:15PM +0900, Christoph Hellwig wrote:
> > > The relatively low minimal writeback size of 4MiB leads means that
> > > written back inodes on rotational media are switched a lot.  Besides
> > > introducing additional seeks, this also can lead to extreme file
> > > fragmentation on zoned devices when a lot of files are cached relative
> > > to the available writeback bandwidth.
> > >
> > > Add a superblock field that allows the file system to override the
> > > default size.
> >
> > Hmmm - won't changing this for the zoned rtdev also change behaviour
> > for writeback on the data device?  i.e. upping the minimum for the
> > normal data device on XFS will mean writeback bandwidth sharing is a
> > lot less "fair" and higher latency when we have a mix of different
> > file sizes than it currently is...
>
> In theory it is.  In practice with a zoned file system the main device
> is:
>
>  a) typically only used for metadata
>  b) a fast SSD when not actually on the same device
>
> So I think these concerns are valid, but not really worth replacing the
> simple superblock field with a method to query the value.  But I'll write
> a comment documenting these assumptions as that is useful for future
> readers of the code.

That sounds reasonable to me.  Eventually we might want to explore
per-device BDIs, but for the moment documenting the trade-off being
made is good enough.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
From: Christoph Hellwig @ 2025-10-15 6:27 UTC
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
	linux-fsdevel, linux-xfs

Set s_min_writeback_pages to the zone size, so that writeback always
writes up to a full zone.  This ensures that writeback does not add
spurious file fragmentation when writing back a large number of
files that are larger than the zone size.

Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_zone_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 1147bacb2da8..0f4e460fd3ea 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1215,6 +1215,7 @@ xfs_mount_zones(
 		.mp		= mp,
 	};
 	struct xfs_buftarg	*bt = mp->m_rtdev_targp;
+	xfs_extlen_t		zone_blocks = mp->m_groups[XG_TYPE_RTG].blocks;
 	int			error;
 
 	if (!bt) {
@@ -1245,10 +1246,12 @@ xfs_mount_zones(
 		return -ENOMEM;
 
 	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
-		 mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks,
-		 mp->m_max_open_zones);
+		 mp->m_sb.sb_rgcount, zone_blocks, mp->m_max_open_zones);
 	trace_xfs_zones_mount(mp);
 
+	mp->m_super->s_min_writeback_pages =
+		XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT;
+
 	if (bdev_is_zoned(bt->bt_bdev)) {
 		error = blkdev_report_zones(bt->bt_bdev,
 				XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart),
-- 
2.47.3

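The conversion in the last hunk can be tried in userspace with the sketch below. It assumes that XFS_FSB_TO_B amounts to a 64-bit left shift of the block count by the file system's block-size log2; the function name and the example geometry (65536 blocks of 4KiB, i.e. a 256MiB zone) are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical model of "XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT":
 * widen the block count to 64 bits, convert blocks to bytes, then
 * convert bytes to pages.
 */
static int64_t zone_size_in_pages(uint32_t zone_blocks, unsigned int blocklog,
				  unsigned int page_shift)
{
	int64_t bytes = (int64_t)zone_blocks << blocklog;

	return bytes >> page_shift;
}
```

For the assumed 256MiB zone, 4KiB pages give 65536 pages and 64KiB pages give 4096. The sketch returns the 64-bit value without narrowing it, since the width of s_min_writeback_pages was still under discussion earlier in the thread.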
* Re: [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
2025-10-15 6:27 ` [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems Christoph Hellwig
@ 2025-10-15 7:10 ` Damien Le Moal
2025-10-15 16:01 ` Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Damien Le Moal @ 2025-10-15 7:10 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs

On 2025/10/15 15:27, Christoph Hellwig wrote:
> Set s_min_writeback_pages to the zone size, so that writeback always
> writes up to a full zone. This ensures that writeback does not add
> spurious file fragmentation when writing back a large number of
> files that are larger than the zone size.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems
2025-10-15 6:27 ` [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems Christoph Hellwig
2025-10-15 7:10 ` Damien Le Moal
@ 2025-10-15 16:01 ` Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2025-10-15 16:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Wed, Oct 15, 2025 at 03:27:16PM +0900, Christoph Hellwig wrote:
> Set s_min_writeback_pages to the zone size, so that writeback always
> writes up to a full zone. This ensures that writeback does not add
> spurious file fragmentation when writing back a large number of
> files that are larger than the zone size.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_zone_alloc.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
> index 1147bacb2da8..0f4e460fd3ea 100644
> --- a/fs/xfs/xfs_zone_alloc.c
> +++ b/fs/xfs/xfs_zone_alloc.c
> @@ -1215,6 +1215,7 @@ xfs_mount_zones(
>  		.mp = mp,
>  	};
>  	struct xfs_buftarg *bt = mp->m_rtdev_targp;
> +	xfs_extlen_t zone_blocks = mp->m_groups[XG_TYPE_RTG].blocks;
>  	int error;
>
>  	if (!bt) {
> @@ -1245,10 +1246,12 @@ xfs_mount_zones(
>  		return -ENOMEM;
>
>  	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
> -		 mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks,
> -		 mp->m_max_open_zones);
> +		 mp->m_sb.sb_rgcount, zone_blocks, mp->m_max_open_zones);
>  	trace_xfs_zones_mount(mp);
>
> +	mp->m_super->s_min_writeback_pages =
> +		XFS_FSB_TO_B(mp, zone_blocks) >> PAGE_SHIFT;

Hmm. The maximum rtgroup (and hence zone) size is 2^31-1 blocks. That
quantity is cast to int64_t by FSB_TO_B, then shifted down by
PAGE_SHIFT. So I think there's no chance of an overflow here,
especially if s_min_writeback_pages becomes type long.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> +
>  	if (bdev_is_zoned(bt->bt_bdev)) {
>  		error = blkdev_report_zones(bt->bt_bdev,
>  				XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart),
> --
> 2.47.3
>
>

^ permalink raw reply [flat|nested] 25+ messages in thread
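The overflow reasoning in the review above can be checked with a small userspace sketch. The 2^31-1 block limit comes from the review; the 64 KiB largest block size and 4 KiB page size are assumptions chosen as a worst case for illustration, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^31 - 1 blocks, the maximum zone size quoted in the review. */
#define SKETCH_MAX_ZONE_BLOCKS ((uint32_t)INT32_MAX)
/* Assumed worst case for illustration: 64 KiB file system blocks. */
#define SKETCH_MAX_BLOCKLOG 16
/* Assumed for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Largest zone in bytes; the 64-bit cast happens before the shift, so
 * the intermediate value stays below 2^47. */
static int64_t sketch_max_zone_bytes(void)
{
	return (int64_t)SKETCH_MAX_ZONE_BLOCKS << SKETCH_MAX_BLOCKLOG;
}

/* Largest zone in pages under the assumptions above. */
static int64_t sketch_max_zone_pages(void)
{
	return sketch_max_zone_bytes() >> SKETCH_PAGE_SHIFT;
}

/* Returns 1 if a page count would fit in a 32-bit unsigned field. */
static int sketch_fits_in_u32(int64_t pages)
{
	assert(pages >= 0);
	return pages <= (int64_t)UINT32_MAX;
}
```

With these assumed limits the 64-bit computation never overflows, but the worst-case page count (2^35 - 16) is too large for a 32-bit field, which is one way to read the remark that the result is safest when the field is a long on a 64-bit kernel.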
* Re: allow file systems to increase the minimum writeback chunk size
2025-10-15 6:27 allow file systems to increase the minimum writeback chunk size Christoph Hellwig
` (2 preceding siblings ...)
2025-10-15 6:27 ` [PATCH 3/3] xfs: set s_min_writeback_pages for zoned file systems Christoph Hellwig
@ 2025-10-15 7:11 ` Damien Le Moal
3 siblings, 0 replies; 25+ messages in thread
From: Damien Le Moal @ 2025-10-15 7:11 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, hans.holmberg, linux-mm, linux-fsdevel,
linux-xfs

On 2025/10/15 15:27, Christoph Hellwig wrote:
> Hi all,
>
> The relatively low minimal writeback size of 4MiB means that
> written back inodes on rotational media are switched a lot. Besides
> introducing additional seeks, this also can lead to extreme file
> fragmentation on zoned devices when a lot of files are cached relative
> to the available writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size, and set it to the zone size for zoned XFS.

For the series:

Tested-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply [flat|nested] 25+ messages in thread
* allow file systems to increase the minimum writeback chunk size v2
@ 2025-10-17 3:45 Christoph Hellwig
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs

Hi all,

The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot. Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached relative
to the available writeback bandwidth.

Add a superblock field that allows the file system to override the
default size, and set it to the zone size for zoned XFS.

Changes since v1:
- convert the field to a long to match other related writeback code
- cap the zoned XFS writeback size to the maximum extent size
- write an extensive comment about the tradeoffs of setting the value
- fix a commit message typo

Diffstat:
 fs/fs-writeback.c         | 26 +++++++++-----------------
 fs/super.c                |  1 +
 fs/xfs/xfs_zone_alloc.c   | 28 ++++++++++++++++++++++++++--
 include/linux/fs.h        |  1 +
 include/linux/writeback.h |  5 +++++
 5 files changed, 42 insertions(+), 19 deletions(-)

^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 allow file systems to increase the minimum writeback chunk size v2 Christoph Hellwig
@ 2025-10-17 3:45 ` Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Christoph Hellwig @ 2025-10-17 3:45 UTC (permalink / raw)
To: Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs

The relatively low minimal writeback size of 4MiB means that written back
inodes on rotational media are switched a lot. Besides introducing
additional seeks, this also can lead to extreme file fragmentation on
zoned devices when a lot of files are cached relative to the available
writeback bandwidth.

Add a superblock field that allows the file system to override the
default size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c         | 14 +++++---------
 fs/super.c                |  1 +
 include/linux/fs.h        |  1 +
 include/linux/writeback.h |  5 +++++
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 11fd08a0efb8..6d50b02cdab6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -31,11 +31,6 @@
 #include <linux/memcontrol.h>
 #include "internal.h"
 
-/*
- * 4MB minimal write chunk size
- */
-#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
 	return ret;
 }
 
-static long writeback_chunk_size(struct bdi_writeback *wb,
-		struct wb_writeback_work *work)
+static long writeback_chunk_size(struct super_block *sb,
+		struct bdi_writeback *wb, struct wb_writeback_work *work)
 {
 	long pages;
 
@@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	pages = min(wb->avg_write_bandwidth / 2,
 		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 	pages = min(pages, work->nr_pages);
-	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
+	return round_down(pages + sb->s_min_writeback_pages,
+			sb->s_min_writeback_pages);
 }
 
 /*
@@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 		inode->i_state |= I_SYNC;
 		wbc_attach_and_unlock_inode(&wbc, inode);
 
-		write_chunk = writeback_chunk_size(wb, work);
+		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
 		wbc.nr_to_write = write_chunk;
 		wbc.pages_skipped = 0;
 
diff --git a/fs/super.c b/fs/super.c
index 5bab94fb7e03..599c1d2641fe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 		goto fail;
 	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
 		goto fail;
+	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
 	return s;
 
 fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..ae6f37c6eaa4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1583,6 +1583,7 @@ struct super_block {
 
 	spinlock_t s_inode_wblist_lock;
 	struct list_head s_inodes_wb; /* writeback inodes */
+	long s_min_writeback_pages;
 } __randomize_layout;
 
 static inline struct user_namespace *i_user_ns(const struct inode *inode)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 22dd4adc5667..49e1dd96f43e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
 void sb_mark_inode_writeback(struct inode *inode);
 void sb_clear_inode_writeback(struct inode *inode);
 
+/*
+ * 4MB minimal write chunk size
+ */
+#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
+
 #endif /* WRITEBACK_H */
-- 
2.47.3

^ permalink raw reply related [flat|nested] 25+ messages in thread
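To see how the per-superblock minimum changes the result of the chunk size calculation, here is a minimal userspace sketch of the non-integrity path of writeback_chunk_size(). The bandwidth and dirty-limit inputs are passed in as plain parameters rather than read from kernel structures, the 4 KiB page size is an assumption, and every `sketch_` name is hypothetical.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB pages, so the default is 1024 pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MIN_WRITEBACK_PAGES (4096UL >> (SKETCH_PAGE_SHIFT - 10))

static long sketch_min(long a, long b)
{
	return a < b ? a : b;
}

/* Round x down to a multiple of align. Like the kernel's round_down(),
 * this mask-based form only works for power-of-two alignments. */
static long sketch_round_down(long x, long align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* bw_half stands in for wb->avg_write_bandwidth / 2 and dirty_share for
 * global_wb_domain.dirty_limit / DIRTY_SCOPE; min_pages plays the role
 * of sb->s_min_writeback_pages. */
static long sketch_chunk_size(long bw_half, long dirty_share,
			      long nr_pages, long min_pages)
{
	long pages = sketch_min(bw_half, dirty_share);

	pages = sketch_min(pages, nr_pages);
	return sketch_round_down(pages + min_pages, min_pages);
}
```

Because the minimum is added before rounding down, the result is never smaller than the minimum itself: with a bandwidth-derived value of 300 pages, the default minimum of 1024 yields a 1024-page chunk, while a minimum of 65536 pages yields a 65536-page chunk.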
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
@ 2025-10-17 12:32 ` Jan Kara
2025-10-17 15:45 ` Darrick J. Wong
` (2 subsequent siblings)
3 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2025-10-17 12:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Fri 17-10-25 05:45:48, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Honza

> ---
>  fs/fs-writeback.c         | 14 +++++---------
>  fs/super.c                |  1 +
>  include/linux/fs.h        |  1 +
>  include/linux/writeback.h |  5 +++++
>  4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
>  #include <linux/memcontrol.h>
>  #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
>  /*
>   * Passed into wb_writeback(), essentially a subset of writeback_control
>   */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
>  	return ret;
>  }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> -		struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> +		struct bdi_writeback *wb, struct wb_writeback_work *work)
>  {
>  	long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	pages = min(wb->avg_write_bandwidth / 2,
>  		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
>  	pages = min(pages, work->nr_pages);
> -	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> +	return round_down(pages + sb->s_min_writeback_pages,
> +			sb->s_min_writeback_pages);
>  }
>
>  /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		inode->i_state |= I_SYNC;
>  		wbc_attach_and_unlock_inode(&wbc, inode);
>
> -		write_chunk = writeback_chunk_size(wb, work);
> +		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
>  		wbc.nr_to_write = write_chunk;
>  		wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
>  		goto fail;
> +	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
>  	return s;
>
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t s_inode_wblist_lock;
>  	struct list_head s_inodes_wb; /* writeback inodes */
> +	long s_min_writeback_pages;
>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
>  void sb_mark_inode_writeback(struct inode *inode);
>  void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
>  #endif /* WRITEBACK_H */
> --
> 2.47.3
>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
@ 2025-10-17 15:45 ` Darrick J. Wong
2025-10-20 9:35 ` Jan Kara
2025-10-24 14:33 ` Nirjhar Roy (IBM)
3 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2025-10-17 15:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Fri, Oct 17, 2025 at 05:45:48AM +0200, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

The comment in the next patch satisfies me sufficiently, so

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/fs-writeback.c         | 14 +++++---------
>  fs/super.c                |  1 +
>  include/linux/fs.h        |  1 +
>  include/linux/writeback.h |  5 +++++
>  4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
>  #include <linux/memcontrol.h>
>  #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
>  /*
>   * Passed into wb_writeback(), essentially a subset of writeback_control
>   */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
>  	return ret;
>  }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> -		struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> +		struct bdi_writeback *wb, struct wb_writeback_work *work)
>  {
>  	long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	pages = min(wb->avg_write_bandwidth / 2,
>  		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
>  	pages = min(pages, work->nr_pages);
> -	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> +	return round_down(pages + sb->s_min_writeback_pages,
> +			sb->s_min_writeback_pages);
>  }
>
>  /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		inode->i_state |= I_SYNC;
>  		wbc_attach_and_unlock_inode(&wbc, inode);
>
> -		write_chunk = writeback_chunk_size(wb, work);
> +		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
>  		wbc.nr_to_write = write_chunk;
>  		wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
>  		goto fail;
> +	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
>  	return s;
>
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t s_inode_wblist_lock;
>  	struct list_head s_inodes_wb; /* writeback inodes */
> +	long s_min_writeback_pages;
>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
>  void sb_mark_inode_writeback(struct inode *inode);
>  void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
>  #endif /* WRITEBACK_H */
> --
> 2.47.3
>
>

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
2025-10-17 12:32 ` Jan Kara
2025-10-17 15:45 ` Darrick J. Wong
@ 2025-10-20 9:35 ` Jan Kara
2025-10-24 14:33 ` Nirjhar Roy (IBM)
3 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2025-10-20 9:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Carlos Maiolino, Andrew Morton,
willy, dlemoal, hans.holmberg, linux-mm, linux-fsdevel, linux-xfs

On Fri 17-10-25 05:45:48, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Honza

> ---
>  fs/fs-writeback.c         | 14 +++++---------
>  fs/super.c                |  1 +
>  include/linux/fs.h        |  1 +
>  include/linux/writeback.h |  5 +++++
>  4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
>  #include <linux/memcontrol.h>
>  #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
>  /*
>   * Passed into wb_writeback(), essentially a subset of writeback_control
>   */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
>  	return ret;
>  }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> -		struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> +		struct bdi_writeback *wb, struct wb_writeback_work *work)
>  {
>  	long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	pages = min(wb->avg_write_bandwidth / 2,
>  		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
>  	pages = min(pages, work->nr_pages);
> -	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> +	return round_down(pages + sb->s_min_writeback_pages,
> +			sb->s_min_writeback_pages);
>  }
>
>  /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		inode->i_state |= I_SYNC;
>  		wbc_attach_and_unlock_inode(&wbc, inode);
>
> -		write_chunk = writeback_chunk_size(wb, work);
> +		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
>  		wbc.nr_to_write = write_chunk;
>  		wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
>  		goto fail;
> +	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
>  	return s;
>
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t s_inode_wblist_lock;
>  	struct list_head s_inodes_wb; /* writeback inodes */
> +	long s_min_writeback_pages;
>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
>  void sb_mark_inode_writeback(struct inode *inode);
>  void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
>  #endif /* WRITEBACK_H */
> --
> 2.47.3
>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-17 3:45 ` [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES Christoph Hellwig
` (2 preceding siblings ...)
2025-10-20 9:35 ` Jan Kara
@ 2025-10-24 14:33 ` Nirjhar Roy (IBM)
2025-10-24 15:12 ` Christoph Hellwig
3 siblings, 1 reply; 25+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-10-24 14:33 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino
Cc: Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs

On Fri, 2025-10-17 at 05:45 +0200, Christoph Hellwig wrote:
> The relatively low minimal writeback size of 4MiB means that written back
> inodes on rotational media are switched a lot. Besides introducing
> additional seeks, this also can lead to extreme file fragmentation on
> zoned devices when a lot of files are cached relative to the available
> writeback bandwidth.
>
> Add a superblock field that allows the file system to override the
> default size.

So this patch doesn't really explicitly set s_min_writeback_pages to a
non-default/overridden value, right? That is being done in the next
patch, isn't it?

--NR

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/fs-writeback.c         | 14 +++++---------
>  fs/super.c                |  1 +
>  include/linux/fs.h        |  1 +
>  include/linux/writeback.h |  5 +++++
>  4 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 11fd08a0efb8..6d50b02cdab6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -31,11 +31,6 @@
>  #include <linux/memcontrol.h>
>  #include "internal.h"
>
> -/*
> - * 4MB minimal write chunk size
> - */
> -#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> -
>  /*
>   * Passed into wb_writeback(), essentially a subset of writeback_control
>   */
> @@ -1874,8 +1869,8 @@ static int writeback_single_inode(struct inode *inode,
>  	return ret;
>  }
>
> -static long writeback_chunk_size(struct bdi_writeback *wb,
> -		struct wb_writeback_work *work)
> +static long writeback_chunk_size(struct super_block *sb,
> +		struct bdi_writeback *wb, struct wb_writeback_work *work)
>  {
>  	long pages;
>
> @@ -1898,7 +1893,8 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
>  	pages = min(wb->avg_write_bandwidth / 2,
>  		    global_wb_domain.dirty_limit / DIRTY_SCOPE);
>  	pages = min(pages, work->nr_pages);
> -	return round_down(pages + MIN_WRITEBACK_PAGES, MIN_WRITEBACK_PAGES);
> +	return round_down(pages + sb->s_min_writeback_pages,
> +			sb->s_min_writeback_pages);
>  }
>
>  /*
> @@ -2000,7 +1996,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		inode->i_state |= I_SYNC;
>  		wbc_attach_and_unlock_inode(&wbc, inode);
>
> -		write_chunk = writeback_chunk_size(wb, work);
> +		write_chunk = writeback_chunk_size(inode->i_sb, wb, work);
>  		wbc.nr_to_write = write_chunk;
>  		wbc.pages_skipped = 0;
>
> diff --git a/fs/super.c b/fs/super.c
> index 5bab94fb7e03..599c1d2641fe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -389,6 +389,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
>  		goto fail;
> +	s->s_min_writeback_pages = MIN_WRITEBACK_PAGES;
>  	return s;
>
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..ae6f37c6eaa4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1583,6 +1583,7 @@ struct super_block {
>
>  	spinlock_t s_inode_wblist_lock;
>  	struct list_head s_inodes_wb; /* writeback inodes */
> +	long s_min_writeback_pages;
>  } __randomize_layout;
>
>  static inline struct user_namespace *i_user_ns(const struct inode *inode)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 22dd4adc5667..49e1dd96f43e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -374,4 +374,9 @@ bool redirty_page_for_writepage(struct writeback_control *, struct page *);
>  void sb_mark_inode_writeback(struct inode *inode);
>  void sb_clear_inode_writeback(struct inode *inode);
>
> +/*
> + * 4MB minimal write chunk size
> + */
> +#define MIN_WRITEBACK_PAGES (4096UL >> (PAGE_SHIFT - 10))
> +
>  #endif /* WRITEBACK_H */

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] writeback: allow the file system to override MIN_WRITEBACK_PAGES
2025-10-24 14:33 ` Nirjhar Roy (IBM)
@ 2025-10-24 15:12 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2025-10-24 15:12 UTC (permalink / raw)
To: Nirjhar Roy (IBM)
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Carlos Maiolino,
Andrew Morton, willy, dlemoal, hans.holmberg, linux-mm,
linux-fsdevel, linux-xfs

On Fri, Oct 24, 2025 at 08:03:34PM +0530, Nirjhar Roy (IBM) wrote:
> So this patch doesn't really explicitly set s_min_writeback_pages to a non-default/overridden value,
> right? That is being done in the next patch, isn't it?

Exactly.

^ permalink raw reply [flat|nested] 25+ messages in thread
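The split confirmed in this exchange (patch 2 only installs the default, a later patch overrides it) can be sketched in userspace as follows. The structure and both functions are hypothetical stand-ins: `sketch_alloc_super()` mirrors the single line added to alloc_super(), and `sketch_mount_zoned()` represents whatever a file system does at mount time. The default of 1024 pages assumes 4 KiB pages.

```c
#include <assert.h>

/* Assumed for illustration: 4 MiB expressed in 4 KiB pages. */
#define SKETCH_MIN_WRITEBACK_PAGES 1024L

/* Hypothetical stand-in for the one superblock field the patch adds. */
struct sketch_super_block {
	long s_min_writeback_pages;
};

/* Mirrors the alloc_super() change: every superblock starts out with
 * the generic default, so file systems that do nothing see no change. */
static void sketch_alloc_super(struct sketch_super_block *sb)
{
	sb->s_min_writeback_pages = SKETCH_MIN_WRITEBACK_PAGES;
}

/* Hypothetical mount-time hook: a file system that wants larger chunks
 * overwrites the default with its own value, in pages. */
static void sketch_mount_zoned(struct sketch_super_block *sb,
			       long zone_pages)
{
	assert(zone_pages > 0);
	sb->s_min_writeback_pages = zone_pages;
}
```

In this arrangement the core writeback code only ever reads the field, so a file system opts in by writing a single value during mount and every other file system keeps the previous 4 MiB behaviour.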