* [PATCH] xfs: update BDI {io,ra}_pages values based on the RT device limits
2026-06-23 14:21 update BDI {io,ra}_pages values based on the RT device limits Christoph Hellwig
@ 2026-06-23 14:21 ` Christoph Hellwig
2026-06-24 10:40 ` Carlos Maiolino
2026-06-24 12:26 ` Jan Kara
2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-06-23 14:21 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Jan Kara, Filip Blagojevic, Matthew Wilcox, Damien Le Moal,
linux-fsdevel, linux-xfs
When using XFS with a main device on an SSD that stores metadata and a RT
device to store data on a HDD, we fail to take the I/O sizes for the RT
device into accounting, leading to up to 5% slower read performance when
using an SSD for metadata vs storing data and metadata on the HDD.
Fix this up by taking the RT settings into account. Note that this
updates the BDI owned by the main device, and leaves those settings in
place even when the file system is unmounted. This is a bit unexpected
but not different from manual tuning through sysfs (although that is only
possible for the ra_pages value).
Reported-by: Filip Blagojevic <filip.blagojevic@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_super.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e6781f86d867..88edd3822872 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -546,6 +546,29 @@ xfs_open_devices(
return error;
}
+/*
+ * When using a RT device some or all data I/O is using the RT device, but
+ * the BDI is inherited from the main data device. When the underlying block
+ * device for the RT device has larger I/O sizes, the BDI settings might be
+ * incorrect, which is especially bad if the main device is a SSD and the
+ * RT device is a HDD, as the io_opt fixup in blk_apply_bdi_limits is missing
+ * for this case.
+ *
+ * Update the BDI values to the max of the data and RT device to cover our
+ * bases.
+ */
+static void
+xfs_update_bdi_rahead(
+ struct xfs_mount *mp)
+{
+ struct backing_dev_info *rt_bdi =
+ mp->m_rtdev_targp->bt_bdev->bd_disk->bdi;
+ struct backing_dev_info *sb_bdi = mp->m_super->s_bdi;
+
+ sb_bdi->ra_pages = max(sb_bdi->ra_pages, rt_bdi->ra_pages);
+ sb_bdi->io_pages = max(sb_bdi->io_pages, rt_bdi->io_pages);
+}
+
/*
* Setup xfs_mount buffer target pointers based on superblock
*/
@@ -583,6 +606,7 @@ xfs_setup_devices(
mp->m_sb.sb_sectsize, mp->m_sb.sb_rblocks);
if (error)
return error;
+ xfs_update_bdi_rahead(mp);
}
return 0;
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: update BDI {io,ra}_pages values based on the RT device limits
2026-06-23 14:21 update BDI {io,ra}_pages values based on the RT device limits Christoph Hellwig
2026-06-23 14:21 ` [PATCH] xfs: " Christoph Hellwig
@ 2026-06-24 10:40 ` Carlos Maiolino
2026-06-24 15:42 ` Christoph Hellwig
2026-06-24 12:26 ` Jan Kara
2 siblings, 1 reply; 6+ messages in thread
From: Carlos Maiolino @ 2026-06-24 10:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Filip Blagojevic, Matthew Wilcox, Damien Le Moal,
linux-fsdevel, linux-xfs
On Tue, Jun 23, 2026 at 04:21:05PM +0200, Christoph Hellwig wrote:
> Hi all,
>
> we ran into a bit of a funny case where adding an SSD to store metadata
> to a (zoned) XFS file system reduced the performance vs using a HDD for
> both data and metadata. It turns out this is due to readahead code
> looking at the BDI limits, including the io_pages value that can't be
> inspected or tuned from userspace.
>
> This patch has the simplest fix for that by just updating the values from
> XFS, but for a long-term solution this feels a bit ugly. Other, a lot
> more invasive options would be:
>
>
> 1) support multiple BDIs per file system.
>
> This would work pretty well for XFS, as data for a given file is
> always entirely on one device, and the VFS writeback code is not
> used for metadata. But it probably doesn't work well for other
> cases
Giving the current tendency of filesystems being multi-device, this
doesn't sound bad IMHO. Wouldn't accessing io_pages of each BDI also be
worth even for a journal dev? I wonder if what you ran into wouldn't be
possible if somebody would be using just a SSD for journal and a non-rt
XFS on a HDD or a different/slower device.
I don't know how stupid that sounds but perhaps it wouldn't be that
complicated to support multiple BDIs without making it unpleasant for
filesystems that don't care?
sb->bdi could be turned into a dynamic array and a new sb->s_devcount
fields to keep track of it on multi-device filesystems. Filesystems who
don't care about multiple BDIs would have it pointing to a single BDI
struct. Again I feel I'm missing something, so it might sound really
stupid :)
>
> 2) stop propagating these values through the BDI
>
> Have a way to query these parameters from the file system, either
> through a method if we want to be fully dynamic, or through fields
> instead of going through the BDI. The downside would be that
> sysfs modifications of the readahead size would not work after
> the file system initially queried them.
It doesn't sound bad, but to me, not better than keeping multiple BDIs.
But my lack of knowledge on details about BDI handling might be hiding
something from me.
>
> Thoughts?
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: update BDI {io,ra}_pages values based on the RT device limits
2026-06-24 10:40 ` Carlos Maiolino
@ 2026-06-24 15:42 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-06-24 15:42 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Christoph Hellwig, Jan Kara, Filip Blagojevic, Matthew Wilcox,
Damien Le Moal, linux-fsdevel, linux-xfs
On Wed, Jun 24, 2026 at 12:40:50PM +0200, Carlos Maiolino wrote:
> Giving the current tendency of filesystems being multi-device, this
> doesn't sound bad IMHO. Wouldn't accessing io_pages of each BDI also be
> worth even for a journal dev? I wonder if what you ran into wouldn't be
> possible if somebody would be using just a SSD for journal and a non-rt
> XFS on a HDD or a different/slower device.
Nothing looks at the value for the log device.
>
> I don't know how stupid that sounds but perhaps it wouldn't be that
> complicated to support multiple BDIs without making it unpleasant for
> filesystems that don't care?
> sb->bdi could be turned into a dynamic array and a new sb->s_devcount
> fields to keep track of it on multi-device filesystems. Filesystems who
> don't care about multiple BDIs would have it pointing to a single BDI
> struct. Again I feel I'm missing something, so it might sound really
> stupid :)
This will get complicated really soon..
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: update BDI {io,ra}_pages values based on the RT device limits
2026-06-23 14:21 update BDI {io,ra}_pages values based on the RT device limits Christoph Hellwig
2026-06-23 14:21 ` [PATCH] xfs: " Christoph Hellwig
2026-06-24 10:40 ` Carlos Maiolino
@ 2026-06-24 12:26 ` Jan Kara
2026-06-24 13:49 ` Christoph Hellwig
2 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2026-06-24 12:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Carlos Maiolino, Jan Kara, Filip Blagojevic, Matthew Wilcox,
Damien Le Moal, linux-fsdevel, linux-xfs
Hi Christoph!
On Tue 23-06-26 16:21:05, Christoph Hellwig wrote:
> we ran into a bit of a funny case where adding an SSD to store metadata
> to a (zoned) XFS file system reduced the performance vs using a HDD for
> both data and metadata. It turns out this is due to readahead code
> looking at the BDI limits, including the io_pages value that can't be
> inspected or tuned from userspace.
>
> This patch has the simplest fix for that by just updating the values from
> XFS, but for a long-term solution this feels a bit ugly. Other, a lot
> more invasive options would be:
A cleaner but still relatively simple way how to fix your problem would be
to do something similar like e.g. btrfs does. That is when you have a
filesystem with multiple backing devices, you create a new BDI (using
super_setup_bdi()) and set its parameters based on the combination of
devices you have.
The only downside to this is that blk-wbt has a layering violation in it
when it tries to guess whether there are task dirty-throttled for the
device which stops working for when the filesystem uses this synthetic bdi.
But I don't think that's too serious to worry about.
> 1) support multiple BDIs per file system.
>
> This would work pretty well for XFS, as data for a given file is
> always entirely on one device, and the VFS writeback code is not
> used for metadata. But it probably doesn't work well for other
> cases
Since looking up the bdi is abstracted through inode_to_bdi() (which
currently goes through the superblock), this would be relatively easy to do
but either the inode would have to store the bdi pointer or we'd have to
have an operation to query it. Just at this point I'm not fully convinced
the additional complexity / memory cost is worth it compared to what can be
achieved with a synthetic bdi.
> 2) stop propagating these values through the BDI
>
> Have a way to query these parameters from the file system, either
> through a method if we want to be fully dynamic, or through fields
> instead of going through the BDI. The downside would be that
> sysfs modifications of the readahead size would not work after
> the file system initially queried them.
Similarly as above. It makes sense but I'm not sure there are strong enough
usecases to justify this.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: update BDI {io,ra}_pages values based on the RT device limits
2026-06-24 12:26 ` Jan Kara
@ 2026-06-24 13:49 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-06-24 13:49 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Carlos Maiolino, Filip Blagojevic,
Matthew Wilcox, Damien Le Moal, linux-fsdevel, linux-xfs
On Wed, Jun 24, 2026 at 02:26:35PM +0200, Jan Kara wrote:
> A cleaner but still relatively simple way how to fix your problem would be
> to do something similar like e.g. btrfs does. That is when you have a
> filesystem with multiple backing devices, you create a new BDI (using
> super_setup_bdi()) and set its parameters based on the combination of
> devices you have.
>
> The only downside to this is that blk-wbt has a layering violation in it
> when it tries to guess whether there are task dirty-throttled for the
> device which stops working for when the filesystem uses this synthetic bdi.
> But I don't think that's too serious to worry about.
The big downside of that is that it breaks the userspace ABI - any
previous changes to BDI paramters, most importantly the read-ahead size,
would now only apply to the BDI for s_bdev which isn't actually used
by anything.
And yeah, the blk-wbt thing actually looks really broken and could use
fixing..
> > 1) support multiple BDIs per file system.
> >
> > This would work pretty well for XFS, as data for a given file is
> > always entirely on one device, and the VFS writeback code is not
> > used for metadata. But it probably doesn't work well for other
> > cases
>
> Since looking up the bdi is abstracted through inode_to_bdi() (which
> currently goes through the superblock), this would be relatively easy to do
> but either the inode would have to store the bdi pointer or we'd have to
> have an operation to query it. Just at this point I'm not fully convinced
> the additional complexity / memory cost is worth it compared to what can be
> achieved with a synthetic bdi.
Same.
> > 2) stop propagating these values through the BDI
> >
> > Have a way to query these parameters from the file system, either
> > through a method if we want to be fully dynamic, or through fields
> > instead of going through the BDI. The downside would be that
> > sysfs modifications of the readahead size would not work after
> > the file system initially queried them.
>
> Similarly as above. It makes sense but I'm not sure there are strong enough
> usecases to justify this.
It would at least be a lot less complete than option 1. And get us out of
a bit more of all that bdi weirdness.
^ permalink raw reply [flat|nested] 6+ messages in thread