* [PATCH] fuse: disable default bdi strictlimiting
@ 2025-10-08 20:41 Joanne Koong
2025-10-09 14:16 ` Miklos Szeredi
0 siblings, 1 reply; 7+ messages in thread
From: Joanne Koong @ 2025-10-08 20:41 UTC (permalink / raw)
To: miklos, linux-fsdevel; +Cc: kernel-team

Commit 5a53748568f7 ("mm/page-writeback.c: add strictlimit feature")
enabled strictlimiting by default on all fuse bdis to address the lack
of writeback accounting for temporary writeback pages.

Commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal
rb tree") eliminated the use of temporary writeback pages and commit
494d2f508883 ("fuse: use default writeback accounting") switched fuse to
use the standard writeback accounting logic provided by the mm layer.

Since fuse now uses proper writeback accounting without temporary pages,
strictlimiting is no longer needed. Additionally, for fuse large folio
buffered writes, strictlimiting is overly conservative and causes
suboptimal performance due to excessive IO throttling.

Administrators can still enable strictlimiting for specific fuse servers
via /sys/class/bdi/*/strict_limit. If needed in the future,
strictlimiting for all unprivileged fuse servers could be enabled
through a sysctl.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/inode.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 6fcfa15da868..87cb2c2bbc7b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1591,8 +1591,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 	if (err)
 		return err;

-	sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
-
 	/*
 	 * For a single fuse filesystem use max 1% of dirty +
 	 * writeback threshold.
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread

* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-08 20:41 [PATCH] fuse: disable default bdi strictlimiting Joanne Koong
@ 2025-10-09 14:16 ` Miklos Szeredi
2025-10-09 18:36 ` Joanne Koong
0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2025-10-09 14:16 UTC (permalink / raw)
To: Joanne Koong; +Cc: linux-fsdevel, kernel-team

On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:

> Since fuse now uses proper writeback accounting without temporary pages,
> strictlimiting is no longer needed. Additionally, for fuse large folio
> buffered writes, strictlimiting is overly conservative and causes
> suboptimal performance due to excessive IO throttling.

I don't quite get this part. Is this a fuse specific limitation of
strictlimit vs. large folios?

Or is it the case that other filesystems are also affected, but
strictlimit is never used outside of fuse?

> Administrators can still enable strictlimiting for specific fuse servers
> via /sys/class/bdi/*/strict_limit. If needed in the future,

What's the issue with doing the opposite: leaving strictlimit the
default and disabling strictlimit for specific servers?

Thanks,
Miklos
^ permalink raw reply [flat|nested] 7+ messages in thread
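Both messages above refer to the per-bdi strict_limit knob under /sys/class/bdi/. As a minimal sketch of how an administrator-side tool might toggle it, the C helpers below build the knob path and write "1" or "0" to it. The function names are hypothetical, the bdi name "0:42" used in the usage note is a made-up example, and the assumption that the knob accepts "0" and "1" should be checked against the sysfs ABI documentation of the running kernel. Writing the file normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/class/bdi/<bdi>/<knob>" into buf.
 * Returns 0 on success, -1 if an argument is empty or the path does not fit.
 * (Hypothetical helper, not part of any kernel or libfuse API.)
 */
int bdi_knob_path(char *buf, size_t len, const char *bdi, const char *knob)
{
	int n;

	if (!buf || !bdi || !knob || !*bdi || !*knob)
		return -1;
	n = snprintf(buf, len, "/sys/class/bdi/%s/%s", bdi, knob);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a short value such as "1" or "0" to a knob file.
 * Returns 0 on success, -1 on any error (missing privileges, no such bdi,
 * or the kernel rejecting the value).
 */
int write_knob(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Enable (on != 0) or disable strict limiting for one bdi. */
int bdi_set_strict_limit(const char *bdi, int on)
{
	char path[256];

	if (bdi_knob_path(path, sizeof(path), bdi, "strict_limit") != 0)
		return -1;
	return write_knob(path, on ? "1" : "0");
}
```

A caller would use something like bdi_set_strict_limit("0:42", 1) to opt a single server back into strict limiting, or pass 0 to opt it out, whichever default the kernel ends up with.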
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 14:16 ` Miklos Szeredi
@ 2025-10-09 18:36 ` Joanne Koong
2025-10-10 15:01 ` Darrick J. Wong
2025-10-27 22:38 ` Joanne Koong
0 siblings, 2 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-09 18:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, kernel-team

On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> > Since fuse now uses proper writeback accounting without temporary pages,
> > strictlimiting is no longer needed. Additionally, for fuse large folio
> > buffered writes, strictlimiting is overly conservative and causes
> > suboptimal performance due to excessive IO throttling.
>
> I don't quite get this part. Is this a fuse specific limitation of
> strictlimit vs. large folios?
>
> Or is it the case that other filesystems are also affected, but
> strictlimit is never used outside of fuse?

It's the combination of fuse doing strictlimiting and setting the bdi
max ratio to 1%.

I don't think this is fuse-specific. I ran the same fio job [1]
locally on xfs and with setting the bdi max ratio to 1%, saw
performance drops between strictlimiting off vs. on

[1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
--numjobs=2 --ramp_time=30 --group_reporting=1
>
> > Administrators can still enable strictlimiting for specific fuse servers
> > via /sys/class/bdi/*/strict_limit. If needed in the future,
>
> What's the issue with doing the opposite: leaving strictlimit the
> default and disabling strictlimit for specific servers?

If we do that, then we can't enable large folios for servers that use
the writeback cache. I don't think we can just turn on large folios if
an admin later on disables strictlimiting for the server, because I
don't think mapping_set_folio_order_range() can be called after the
inode has been initialized (not 100% sure about this), which means
we'd also need to add some mount option for servers to disable
strictlimiting.

Thanks,
Joanne
>
> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 18:36 ` Joanne Koong
@ 2025-10-10 15:01 ` Darrick J. Wong
2025-10-10 15:07 ` Matthew Wilcox
2025-10-10 23:14 ` Joanne Koong
2025-10-27 22:38 ` Joanne Koong
1 sibling, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-10-10 15:01 UTC (permalink / raw)
To: Joanne Koong; +Cc: Miklos Szeredi, linux-fsdevel, kernel-team, Matthew Wilcox

[cc willy in case he has opinions about dynamically changing the
pagecache order range]

On Thu, Oct 09, 2025 at 11:36:30AM -0700, Joanne Koong wrote:
> On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > > Since fuse now uses proper writeback accounting without temporary pages,
> > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > buffered writes, strictlimiting is overly conservative and causes
> > > suboptimal performance due to excessive IO throttling.
> >
> > I don't quite get this part. Is this a fuse specific limitation of
> > strictlimit vs. large folios?
> >
> > Or is it the case that other filesystems are also affected, but
> > strictlimit is never used outside of fuse?
>
> It's the combination of fuse doing strictlimiting and setting the bdi
> max ratio to 1%.
>
> I don't think this is fuse-specific. I ran the same fio job [1]
> locally on xfs and with setting the bdi max ratio to 1%, saw
> performance drops between strictlimiting off vs. on
>
> [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> --numjobs=2 --ramp_time=30 --group_reporting=1

Er... what kind of numbers? :)

> >
> > > Administrators can still enable strictlimiting for specific fuse servers
> > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> >
> > What's the issue with doing the opposite: leaving strictlimit the
> > default and disabling strictlimit for specific servers?
>
> If we do that, then we can't enable large folios for servers that use
> the writeback cache. I don't think we can just turn on large folios if

What's the limitation on strictlimit && large_folios? Is it just the
throttling problem because dirtying a single byte in a 2M folio charges
the process with all 2M? Or something else?

> an admin later on disables strictlimiting for the server, because I
> don't think mapping_set_folio_order_range() can be called after the
> inode has been initialized (not 100% sure about this), which means
> we'd also need to add some mount option for servers to disable
> strictlimiting.

I think it's ok to increase the folio order range at runtime because
you're merely expanding the range of valid folio sizes in the mapping.
Decreasing the range probably won't work unless you take the inode and
mapping locks exclusively and purge the pagecache.

--D

> Thanks,
> Joanne
> >
> > Thanks,
> > Miklos
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-10 15:01 ` Darrick J. Wong
@ 2025-10-10 15:07 ` Matthew Wilcox
2025-10-10 23:14 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2025-10-10 15:07 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Joanne Koong, Miklos Szeredi, linux-fsdevel, kernel-team

On Fri, Oct 10, 2025 at 08:01:13AM -0700, Darrick J. Wong wrote:
> [cc willy in case he has opinions about dynamically changing the
> pagecache order range]

It's not designed for that. mapping_set_folio_order_range() accesses
mapping->flags without any locking/atomicity, so we can overwrite other
changes to mapping->flags, like setting AS_EIO. It really is supposed
to be "the filesystem supports folios of these sizes", not "we've made
some runtime change to the filesystem and now we'd prefer the MM uses
folios of these sizes instead of those".
^ permalink raw reply [flat|nested] 7+ messages in thread
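The hazard described in the message above is a classic lost update on a shared flags word. The userspace sketch below replays one unlucky interleaving deterministically, with no real threads: an updater snapshots the flags, another context sets an error bit, and the updater then stores a value derived from its stale snapshot. The DEMO_* bit positions and all function names are made up for illustration and are not the kernel's actual layout or API.

```c
#include <assert.h>

/* Illustrative bit layout only; these are NOT the kernel's actual values. */
#define DEMO_AS_EIO      (1UL << 0)
#define DEMO_ORDER_SHIFT 16
#define DEMO_ORDER_MASK  (0x1fUL << DEMO_ORDER_SHIFT)

/* Compute a new flags word with the order field replaced, from a snapshot. */
unsigned long demo_order_update(unsigned long snapshot, unsigned int max_order)
{
	assert(max_order <= 0x1f);
	return (snapshot & ~DEMO_ORDER_MASK) |
	       ((unsigned long)max_order << DEMO_ORDER_SHIFT);
}

/*
 * Replay the racy interleaving step by step:
 *   A loads flags, B sets the error bit, A stores a value built from
 *   its stale snapshot. B's bit is silently dropped.
 */
unsigned long demo_lost_update(unsigned long flags, unsigned int max_order)
{
	unsigned long snapshot = flags;                 /* A: plain load */

	flags |= DEMO_AS_EIO;                           /* B: record an error */
	flags = demo_order_update(snapshot, max_order); /* A: plain store */
	return flags;
}

/* The same two updates applied one after the other: B's bit survives. */
unsigned long demo_serialized(unsigned long flags, unsigned int max_order)
{
	flags |= DEMO_AS_EIO;
	return demo_order_update(flags, max_order);
}
```

With a starting value of 0 and an order of 9, demo_lost_update() returns a word with only the order field set, while demo_serialized() keeps the error bit as well. That difference is why an unlocked read-modify-write is only safe while nothing else can touch the word, for example during inode initialization.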
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-10 15:01 ` Darrick J. Wong
2025-10-10 15:07 ` Matthew Wilcox
@ 2025-10-10 23:14 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-10 23:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Miklos Szeredi, linux-fsdevel, kernel-team, Matthew Wilcox

On Fri, Oct 10, 2025 at 8:01 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> [cc willy in case he has opinions about dynamically changing the
> pagecache order range]
>
> On Thu, Oct 09, 2025 at 11:36:30AM -0700, Joanne Koong wrote:
> > On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > > Since fuse now uses proper writeback accounting without temporary pages,
> > > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > > buffered writes, strictlimiting is overly conservative and causes
> > > > suboptimal performance due to excessive IO throttling.
> > >
> > > I don't quite get this part. Is this a fuse specific limitation of
> > > strictlimit vs. large folios?
> > >
> > > Or is it the case that other filesystems are also affected, but
> > > strictlimit is never used outside of fuse?
> >
> > It's the combination of fuse doing strictlimiting and setting the bdi
> > max ratio to 1%.
> >
> > I don't think this is fuse-specific. I ran the same fio job [1]
> > locally on xfs and with setting the bdi max ratio to 1%, saw
> > performance drops between strictlimiting off vs. on
> >
> > [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> > --numjobs=2 --ramp_time=30 --group_reporting=1
>
> Er... what kind of numbers? :)
>

When I tested it earlier this week it was on a VM but testing it on an
actual machine, this is what I'm seeing:

echo 4294967296 > /proc/sys/vm/dirty_bytes # 4GB
echo 2147483648 > /proc/sys/vm/dirty_background_bytes # 2GB
fio --name=write --ioengine=sync --rw=write --bs=512K --size=2G --numjobs=2 --ramp_time=30 --group_reporting=1

default (no strictlimiting and max_ratio set to 100): around 1600 to 1800 MiB/s
strictlimiting on and max_ratio set to 1: around 1050 MiB/s

On systems with a lot of RAM where /proc/sys/vm/dirty_bytes is high
enough, we don't see the performance drop. But 4 GB seemed like a
reasonable value for /proc/sys/vm/dirty_bytes as that implies 20 GB of
RAM (as I understand it, the default /proc/sys/vm/dirty_ratio value is
usually set to 20% of system ram).

> > >
> > > > Administrators can still enable strictlimiting for specific fuse servers
> > > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> > >
> > > What's the issue with doing the opposite: leaving strictlimit the
> > > default and disabling strictlimit for specific servers?
> >
> > If we do that, then we can't enable large folios for servers that use
> > the writeback cache. I don't think we can just turn on large folios if
>
> What's the limitation on strictlimit && large_folios? Is it just the
> throttling problem because dirtying a single byte in a 2M folio charges
> the process with all 2M? Or something else?

With strictlimiting on, the throttling threshold is a lot more
conservative. When large folios are used, a larger number of pages are
dirtied per write at once and not incrementally balanced, which causes
the logic in balance_dirty_pages() to schedule io waits, whereas small
folios don't have this issue because they incrementally balance pages
as they write them back.

This thread has a lot more context:
https://lore.kernel.org/linux-fsdevel/Z1N505RCcH1dXlLZ@casper.infradead.org/T/#m9e3dd273aa202f9f4e12eb9c96602b5fec2d383d

The dirtying a single byte in a 2M folio should also imo be addressed,
eg through the followup to [1], but when strictlimiting is off, this is
much less of an issue since the threshold is higher.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/5qgjrq6l627byybxjs6vzouspeqj6hdrx2ohqbxqkkjy65mtz5@zp6pimrpeu4e/T/#med8769e865e98960b1f504375cb1c0c2c3bdea51
^ permalink raw reply [flat|nested] 7+ messages in thread
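To put the numbers from the message above in perspective, the following C sketch works through the arithmetic under a deliberately simplified model: a bdi's dirty allowance is treated as at most max_ratio percent of the global dirty limit. The real computation in mm/page-writeback.c works in pages and also scales by each bdi's recent writeout share, so these figures are upper bounds for illustration only. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on one bdi's dirty allowance: max_ratio_pct percent of the
 * global dirty limit. Simplified model; assumes dirty_bytes is small
 * enough that multiplying by 100 does not overflow 64 bits.
 */
uint64_t bdi_dirty_cap_bytes(uint64_t dirty_bytes, unsigned int max_ratio_pct)
{
	assert(max_ratio_pct <= 100);
	return dirty_bytes * max_ratio_pct / 100;
}

/* How many whole units (pages, folios, or write buffers) fit under a cap. */
uint64_t units_under_cap(uint64_t cap_bytes, uint64_t unit_bytes)
{
	assert(unit_bytes > 0);
	return cap_bytes / unit_bytes;
}

/* Memory size for which dirty_bytes equals dirty_ratio_pct percent of it. */
uint64_t memory_implied_bytes(uint64_t dirty_bytes, unsigned int dirty_ratio_pct)
{
	assert(dirty_ratio_pct > 0 && dirty_ratio_pct <= 100);
	return dirty_bytes * 100 / dirty_ratio_pct;
}
```

Under this model, dirty_bytes of 4294967296 with max_ratio set to 1 gives a cap of 42949672 bytes, roughly 41 MiB. That is room for 20 fully dirty 2 MiB folios, or 81 of the 512K write buffers used in the fio job, shared between both jobs, against 10485 individual 4 KiB pages. The same 4 GiB limit at a 20% dirty ratio corresponds to 20 GiB of memory, matching the estimate in the message.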
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 18:36 ` Joanne Koong
2025-10-10 15:01 ` Darrick J. Wong
@ 2025-10-27 22:38 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-27 22:38 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, kernel-team

On Thu, Oct 9, 2025 at 11:36 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > > Since fuse now uses proper writeback accounting without temporary pages,
> > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > buffered writes, strictlimiting is overly conservative and causes
> > > suboptimal performance due to excessive IO throttling.
> >
> > I don't quite get this part. Is this a fuse specific limitation of
> > strictlimit vs. large folios?
> >
> > Or is it the case that other filesystems are also affected, but
> > strictlimit is never used outside of fuse?
>
> It's the combination of fuse doing strictlimiting and setting the bdi
> max ratio to 1%.
>
> I don't think this is fuse-specific. I ran the same fio job [1]
> locally on xfs and with setting the bdi max ratio to 1%, saw
> performance drops between strictlimiting off vs. on
>
> [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> --numjobs=2 --ramp_time=30 --group_reporting=1
> >
> > > Administrators can still enable strictlimiting for specific fuse servers
> > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> >
> > What's the issue with doing the opposite: leaving strictlimit the
> > default and disabling strictlimit for specific servers?
>
> If we do that, then we can't enable large folios for servers that use
> the writeback cache. I don't think we can just turn on large folios if
> an admin later on disables strictlimiting for the server, because I
> don't think mapping_set_folio_order_range() can be called after the
> inode has been initialized (not 100% sure about this), which means
> we'd also need to add some mount option for servers to disable
> strictlimiting.

Miklos, could you share your thoughts on this? Are you in favor of
disabling default strictlimiting? Or do you prefer to have it kept
enabled by default, with some mount option or sysctl added for
privileged servers to be able to disable strictlimiting + enable large
folios if they use the writeback cache?

Thanks,
Joanne

>
> Thanks,
> Joanne
> >
> > Thanks,
> > Miklos
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-10-27 22:39 UTC]

Thread overview: 7+ messages
2025-10-08 20:41 [PATCH] fuse: disable default bdi strictlimiting Joanne Koong
2025-10-09 14:16 ` Miklos Szeredi
2025-10-09 18:36   ` Joanne Koong
2025-10-10 15:01     ` Darrick J. Wong
2025-10-10 15:07       ` Matthew Wilcox
2025-10-10 23:14       ` Joanne Koong
2025-10-27 22:38     ` Joanne Koong