* [PATCH] fuse: disable default bdi strictlimiting
@ 2025-10-08 20:41 Joanne Koong
2025-10-09 14:16 ` Miklos Szeredi
0 siblings, 1 reply; 7+ messages in thread
From: Joanne Koong @ 2025-10-08 20:41 UTC (permalink / raw)
To: miklos, linux-fsdevel; +Cc: kernel-team
Commit 5a53748568f7 ("mm/page-writeback.c: add strictlimit feature")
enabled strictlimiting by default on all fuse bdis to address the lack
of writeback accounting for temporary writeback pages.
Commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal
rb tree") eliminated the use of temporary writeback pages and commit
494d2f508883 ("fuse: use default writeback accounting") switched fuse to
use the standard writeback accounting logic provided by the mm layer.
Since fuse now uses proper writeback accounting without temporary pages,
strictlimiting is no longer needed. Additionally, for fuse large folio
buffered writes, strictlimiting is overly conservative and causes
suboptimal performance due to excessive IO throttling.
Administrators can still enable strictlimiting for specific fuse servers
via /sys/class/bdi/*/strict_limit. If needed in the future,
strictlimiting for all unprivileged fuse servers could be enabled
through a sysctl.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/inode.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 6fcfa15da868..87cb2c2bbc7b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1591,8 +1591,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 	if (err)
 		return err;
 
-	sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
-
 	/*
 	 * For a single fuse filesystem use max 1% of dirty +
 	 * writeback threshold.
--
2.47.3
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-08 20:41 [PATCH] fuse: disable default bdi strictlimiting Joanne Koong
@ 2025-10-09 14:16 ` Miklos Szeredi
2025-10-09 18:36 ` Joanne Koong
0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2025-10-09 14:16 UTC (permalink / raw)
To: Joanne Koong; +Cc: linux-fsdevel, kernel-team
On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> Since fuse now uses proper writeback accounting without temporary pages,
> strictlimiting is no longer needed. Additionally, for fuse large folio
> buffered writes, strictlimiting is overly conservative and causes
> suboptimal performance due to excessive IO throttling.
I don't quite get this part. Is this a fuse specific limitation of
strictlimit vs. large folios?
Or is it the case that other filesystems are also affected, but
strictlimit is never used outside of fuse?
> Administrators can still enable strictlimiting for specific fuse servers
> via /sys/class/bdi/*/strict_limit. If needed in the future,
What's the issue with doing the opposite: leaving strictlimit the
default and disabling strictlimit for specific servers?
Thanks,
Miklos
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 14:16 ` Miklos Szeredi
@ 2025-10-09 18:36 ` Joanne Koong
2025-10-10 15:01 ` Darrick J. Wong
2025-10-27 22:38 ` Joanne Koong
0 siblings, 2 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-09 18:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, kernel-team
On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> > Since fuse now uses proper writeback accounting without temporary pages,
> > strictlimiting is no longer needed. Additionally, for fuse large folio
> > buffered writes, strictlimiting is overly conservative and causes
> > suboptimal performance due to excessive IO throttling.
>
> I don't quite get this part. Is this a fuse specific limitation of
> strictlimit vs. large folios?
>
> Or is it the case that other filesystems are also affected, but
> strictlimit is never used outside of fuse?
It's the combination of fuse doing strictlimiting and setting the bdi
max ratio to 1%.
I don't think this is fuse-specific. I ran the same fio job [1]
locally on xfs and with setting the bdi max ratio to 1%, saw
performance drops between strictlimiting off vs. on
[1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
--numjobs=2 --ramp_time=30 --group_reporting=1
>
> > Administrators can still enable strictlimiting for specific fuse servers
> > via /sys/class/bdi/*/strict_limit. If needed in the future,
>
> What's the issue with doing the opposite: leaving strictlimit the
> default and disabling strictlimit for specific servers?
If we do that, then we can't enable large folios for servers that use
the writeback cache. I don't think we can just turn on large folios if
an admin later on disables strictlimiting for the server, because I
don't think mapping_set_folio_order_range() can be called after the
inode has been initialized (not 100% sure about this), which means
we'd also need to add some mount option for servers to disable
strictlimiting.
Thanks,
Joanne
>
> Thanks,
> Miklos
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 18:36 ` Joanne Koong
@ 2025-10-10 15:01 ` Darrick J. Wong
2025-10-10 15:07 ` Matthew Wilcox
2025-10-10 23:14 ` Joanne Koong
2025-10-27 22:38 ` Joanne Koong
1 sibling, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-10-10 15:01 UTC (permalink / raw)
To: Joanne Koong; +Cc: Miklos Szeredi, linux-fsdevel, kernel-team, Matthew Wilcox
[cc willy in case he has opinions about dynamically changing the
pagecache order range]
On Thu, Oct 09, 2025 at 11:36:30AM -0700, Joanne Koong wrote:
> On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > > Since fuse now uses proper writeback accounting without temporary pages,
> > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > buffered writes, strictlimiting is overly conservative and causes
> > > suboptimal performance due to excessive IO throttling.
> >
> > I don't quite get this part. Is this a fuse specific limitation of
> > strictlimit vs. large folios?
> >
> > Or is it the case that other filesystems are also affected, but
> > strictlimit is never used outside of fuse?
>
> It's the combination of fuse doing strictlimiting and setting the bdi
> max ratio to 1%.
>
> I don't think this is fuse-specific. I ran the same fio job [1]
> locally on xfs and with setting the bdi max ratio to 1%, saw
> performance drops between strictlimiting off vs. on
>
> [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> --numjobs=2 --ramp_time=30 --group_reporting=1
Er... what kind of numbers? :)
> >
> > > Administrators can still enable strictlimiting for specific fuse servers
> > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> >
> > What's the issue with doing the opposite: leaving strictlimit the
> > default and disabling strictlimit for specific servers?
>
> If we do that, then we can't enable large folios for servers that use
> the writeback cache. I don't think we can just turn on large folios if
What's the limitation on strictlimit && large_folios? Is it just the
throttling problem because dirtying a single byte in a 2M folio charges
the process with all 2M? Or something else?
> an admin later on disables strictlimiting for the server, because I
> don't think mapping_set_folio_order_range() can be called after the
> inode has been initialized (not 100% sure about this), which means
> we'd also need to add some mount option for servers to disable
> strictlimiting.
I think it's ok to increase the folio order range at runtime because
you're merely expanding the range of valid folio sizes in the mapping.
Decreasing the range probably won't work unless you take the inode and
mapping locks exclusively and purge the pagecache.
--D
> Thanks,
> Joanne
> >
> > Thanks,
> > Miklos
>
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-10 15:01 ` Darrick J. Wong
@ 2025-10-10 15:07 ` Matthew Wilcox
2025-10-10 23:14 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2025-10-10 15:07 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Joanne Koong, Miklos Szeredi, linux-fsdevel, kernel-team
On Fri, Oct 10, 2025 at 08:01:13AM -0700, Darrick J. Wong wrote:
> [cc willy in case he has opinions about dynamically changing the
> pagecache order range]
It's not designed for that. mapping_set_folio_order_range() accesses
mapping->flags without any locking/atomicity, so we can overwrite
other changes to mapping->flags, like setting AS_EIO. It really
is supposed to be "the filesystem supports folios of these sizes",
not "we've made some runtime change to the filesystem and now we'd
prefer the MM uses folios of these sizes instead of those".
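
A minimal sketch of the lost-update hazard described above, written as a deterministic userspace simulation rather than kernel code. The bit positions (`AS_EIO`, `ORDER_SHIFT`, `ORDER_MASK`) and the helper `with_max_order()` are hypothetical stand-ins chosen for illustration; they are not the kernel's actual layout or API. The point is only that a plain read-modify-write of a shared flags word can discard a bit that another path set in between.

```python
# Hypothetical flag layout, for illustration only (not the kernel's values).
AS_EIO = 1 << 0                    # stand-in for an "I/O error seen" bit
ORDER_SHIFT = 16                   # stand-in position of the folio-order field
ORDER_MASK = 0x1F << ORDER_SHIFT   # stand-in mask covering that field


def with_max_order(flags: int, max_order: int) -> int:
    """Return flags with the order field replaced (plain read-modify-write)."""
    return (flags & ~ORDER_MASK) | (max_order << ORDER_SHIFT)


def simulate_lost_update() -> int:
    """Interleave two updaters by hand and return the final flags word."""
    flags = 0
    snapshot = flags                       # updater A loads the flags word
    flags |= AS_EIO                        # updater B records an I/O error
    flags = with_max_order(snapshot, 9)    # updater A stores its stale result
    return flags


final = simulate_lost_update()
print("AS_EIO survived:", bool(final & AS_EIO))
```

With no interleaving, `with_max_order()` preserves unrelated bits; the error bit is lost only because updater A computes its store from a snapshot taken before updater B's change.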
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-10 15:01 ` Darrick J. Wong
2025-10-10 15:07 ` Matthew Wilcox
@ 2025-10-10 23:14 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-10 23:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Miklos Szeredi, linux-fsdevel, kernel-team, Matthew Wilcox
On Fri, Oct 10, 2025 at 8:01 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> [cc willy in case he has opinions about dynamically changing the
> pagecache order range]
>
> On Thu, Oct 09, 2025 at 11:36:30AM -0700, Joanne Koong wrote:
> > On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > > Since fuse now uses proper writeback accounting without temporary pages,
> > > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > > buffered writes, strictlimiting is overly conservative and causes
> > > > suboptimal performance due to excessive IO throttling.
> > >
> > > I don't quite get this part. Is this a fuse specific limitation of
> > > strictlimit vs. large folios?
> > >
> > > Or is it the case that other filesystems are also affected, but
> > > strictlimit is never used outside of fuse?
> >
> > It's the combination of fuse doing strictlimiting and setting the bdi
> > max ratio to 1%.
> >
> > I don't think this is fuse-specific. I ran the same fio job [1]
> > locally on xfs and with setting the bdi max ratio to 1%, saw
> > performance drops between strictlimiting off vs. on
> >
> > [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> > --numjobs=2 --ramp_time=30 --group_reporting=1
>
> Er... what kind of numbers? :)
>
When I tested it earlier this week it was on a VM but testing it on an
actual machine, this is what I'm seeing:
echo 4294967296 > /proc/sys/vm/dirty_bytes # 4GB
echo 2147483648 > /proc/sys/vm/dirty_background_bytes # 2GB
fio --name=write --ioengine=sync --rw=write --bs=512K --size=2G
--numjobs=2 --ramp_time=30 --group_reporting=1
default (no strictlimiting and max_ratio set to 100):
around 1600 to 1800 MiB/s
strictlimiting on and max_ratio set to 1:
around 1050 MiB/s
On systems with a lot of RAM where /proc/sys/vm/dirty_bytes is high
enough, we don't see the performance drop. But 4 GB seemed like a
reasonable value for /proc/sys/vm/dirty_bytes as that implies 20 GB of
RAM (as I understand it, the default /proc/sys/vm/dirty_ratio value is
usually set to 20% of system ram).
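
As a rough check of the figures above, the arithmetic can be sketched as follows. This assumes a simplified model in which the per-bdi ceiling is just max_ratio percent of the global dirty threshold; the kernel's actual calculation takes more inputs (for example, each bdi's share of recent writeout), and dirty_ratio is applied to available rather than total memory. The function names are made up for this sketch.

```python
# Back-of-the-envelope arithmetic for the quoted settings.
# Assumption: per-bdi ceiling = max_ratio percent of the global dirty
# threshold. This is a simplification of what the kernel computes.

GIB = 1024 ** 3
MIB = 1024 ** 2


def implied_ram_bytes(dirty_bytes: int, dirty_ratio_percent: int) -> int:
    """Memory size for which dirty_ratio_percent would equal dirty_bytes."""
    return dirty_bytes * 100 // dirty_ratio_percent


def bdi_ceiling_bytes(dirty_bytes: int, max_ratio_percent: int) -> int:
    """Upper bound on one bdi's share of the global dirty threshold."""
    return dirty_bytes * max_ratio_percent // 100


dirty_bytes = 4 * GIB
print(implied_ram_bytes(dirty_bytes, 20) // GIB)           # 20
print(round(bdi_ceiling_bytes(dirty_bytes, 1) / MIB, 2))   # 40.96
```

Under this model, a max_ratio of 1 leaves roughly 41 MiB of dirty data for the bdi out of a 4 GiB global threshold.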
> > >
> > > > Administrators can still enable strictlimiting for specific fuse servers
> > > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> > >
> > > What's the issue with doing the opposite: leaving strictlimit the
> > > default and disabling strictlimit for specific servers?
> >
> > If we do that, then we can't enable large folios for servers that use
> > the writeback cache. I don't think we can just turn on large folios if
>
> What's the limitation on strictlimit && large_folios? Is it just the
> throttling problem because dirtying a single byte in a 2M folio charges
> the process with all 2M? Or something else?
With strictlimiting on, the throttling threshold is a lot more
conservative. When large folios are used, a larger number of pages are
dirtied per write at once and not incrementally balanced, which causes
the logic in balance_dirty_pages() to schedule io waits, whereas small
folios don't have this issue because they incrementally balance pages
as they write them back. This thread has a lot more context:
https://lore.kernel.org/linux-fsdevel/Z1N505RCcH1dXlLZ@casper.infradead.org/T/#m9e3dd273aa202f9f4e12eb9c96602b5fec2d383d
The problem of dirtying a single byte in a 2M folio should also imo be
addressed, e.g. through the followup to [1], but when strictlimiting is
off, this is much less of an issue since the threshold is higher.
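
To put the folio-granularity point in numbers, here is a small sketch. It assumes a 4 KiB base page size, the 2M folio size from the question quoted above, and the same simplified ceiling of 1% of a 4 GiB dirty threshold; none of these helpers exist in the kernel, they are illustrative only.

```python
# Illustrative arithmetic: how many folios fit under a per-bdi ceiling.
# Assumptions: 4 KiB base pages, ceiling = 1% of a 4 GiB dirty threshold.

PAGE_SIZE = 4096
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_per_folio(folio_bytes: int) -> int:
    """Number of base pages charged when a whole folio is dirtied."""
    return folio_bytes // PAGE_SIZE


def folios_within(limit_bytes: int, folio_bytes: int) -> int:
    """How many fully charged folios fit under limit_bytes."""
    return limit_bytes // folio_bytes


ceiling = 4 * GIB * 1 // 100                # about 41 MiB
print(pages_per_folio(2 * MIB))             # 512
print(folios_within(ceiling, 2 * MIB))      # 20
print(folios_within(ceiling, PAGE_SIZE))    # 10485
```

In this model the ceiling is reached after about 20 dirtied 2M folios, compared with about ten thousand single-page folios, so each large-folio write consumes a much larger step of the remaining budget.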
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/5qgjrq6l627byybxjs6vzouspeqj6hdrx2ohqbxqkkjy65mtz5@zp6pimrpeu4e/T/#med8769e865e98960b1f504375cb1c0c2c3bdea51
* Re: [PATCH] fuse: disable default bdi strictlimiting
2025-10-09 18:36 ` Joanne Koong
2025-10-10 15:01 ` Darrick J. Wong
@ 2025-10-27 22:38 ` Joanne Koong
1 sibling, 0 replies; 7+ messages in thread
From: Joanne Koong @ 2025-10-27 22:38 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, kernel-team
On Thu, Oct 9, 2025 at 11:36 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Oct 9, 2025 at 7:17 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 8 Oct 2025 at 22:42, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > > Since fuse now uses proper writeback accounting without temporary pages,
> > > strictlimiting is no longer needed. Additionally, for fuse large folio
> > > buffered writes, strictlimiting is overly conservative and causes
> > > suboptimal performance due to excessive IO throttling.
> >
> > I don't quite get this part. Is this a fuse specific limitation of
> > strictlimit vs. large folios?
> >
> > Or is it the case that other filesystems are also affected, but
> > strictlimit is never used outside of fuse?
>
> It's the combination of fuse doing strictlimiting and setting the bdi
> max ratio to 1%.
>
> I don't think this is fuse-specific. I ran the same fio job [1]
> locally on xfs and with setting the bdi max ratio to 1%, saw
> performance drops between strictlimiting off vs. on
>
> [1] fio --name=write --ioengine=sync --rw=write --bs=256K --size=1G
> --numjobs=2 --ramp_time=30 --group_reporting=1
> >
> > > Administrators can still enable strictlimiting for specific fuse servers
> > > via /sys/class/bdi/*/strict_limit. If needed in the future,
> >
> > What's the issue with doing the opposite: leaving strictlimit the
> > default and disabling strictlimit for specific servers?
>
> If we do that, then we can't enable large folios for servers that use
> the writeback cache. I don't think we can just turn on large folios if
> an admin later on disables strictlimiting for the server, because I
> don't think mapping_set_folio_order_range() can be called after the
> inode has been initialized (not 100% sure about this), which means
> we'd also need to add some mount option for servers to disable
> strictlimiting.
Miklos, could you share your thoughts on this? Are you in favor of
disabling default strictlimiting? Or do you prefer to have it kept
enabled by default, with some mount option or sysctl added for
privileged servers to be able to disable strictlimiting + enable large
folios if they use the writeback cache?
Thanks,
Joanne
>
> Thanks,
> Joanne
> >
> > Thanks,
> > Miklos