mkfs.xfs "concurrency" change concerns

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* mkfs.xfs "concurrency" change concerns
@ 2025-10-09 20:13 Eric Sandeen
  2025-10-10  5:17 ` Christoph Hellwig
  2025-10-10 19:17 ` Darrick J. Wong
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Sandeen @ 2025-10-09 20:13 UTC (permalink / raw)
  To: linux-xfs@vger.kernel.org; +Cc: fstests@vger.kernel.org, Darrick J. Wong

Hey all -

this got long, so tl;dr:

1) concurrency geometry breaks some xfstests for me
2) concurrency behavior is not consistent w/ loopback vs. imagefile
3) concurrency defaults to the mkfs machine not the mount machine

In detail:

So, I realize I'm late to the game here and didn't review the patches
before they went in, but it looks like the "concurrency" mkfs.xfs
arguments and defaults are breaking several xfstests.

4738ff0 mkfs: allow sizing realtime allocation groups for concurrency
c02a1873 mkfs: allow sizing internal logs for concurrency
9338bc8b mkfs: allow sizing allocation groups for concurrency

Specifically, xfs/078, xfs/216, and xfs/217 are failing for us
on various machines with between 8 and 128 CPUS, due to the
fundamental change in geometry that results from the new
concurrency behavior, which makes any consistent golden
output that involves geometry details quite difficult.

One option might be to detect whether the "concurrency" args
exist in mkfs.xfs, and set that back to 4, which is probably likely
to more or less behave the old way, and match the current golden
output which was (usually) based on 4 AGs. But that might break
the purpose of some of the tests, if we're only validating behavior
when a specific set of arguments is applied.

(for 078, adding -d concurrency=4 seems to fix it. For  216 and 217
I think I needed -l concurrency=4, but this might depend on nr cpus.)

So, we could probably fix xfstests to make mkfs.xfs behave the old way,
with loss of coverage of behavior with current code defaults.

Other concerns, though - I see that we only do this if the storage
is nonrotational. But in testing, if you set up a loop device, the
loop dev is nonrotational, and gets the new concurrency behavior,
while doing a mkfs.xfs directly on the backing file doesn't:

# losetup /dev/loop4 testfile.img

# mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks

# mkfs.xfs -f testfile.img 2>&1 | grep agcount
meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks

so we get different behavior depending on how you access the image file.

And speaking of image files, it's a pretty common use case to use mkfs.xfs
on image files for deployment elsewhere.  Maybe the good news, even if
accidental, is that if you mkfs the file directly, you don't get system-
specific "concurrence" geometry. But I am concerned that there is no
guarantee that the machine performing mkfs is the machine that will mount
the filesystem, so this seems like a slightly dangerous assumption for 
default behavior.

I understand the desire to DTRT by default, but I am concerned about
test breakage, loopdev inconsistencies, and too-broad assumptions about
where the resulting filesystem will actually be used.

Thoughts?
Thanks,
-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-09 20:13 mkfs.xfs "concurrency" change concerns Eric Sandeen
@ 2025-10-10  5:17 ` Christoph Hellwig
  2025-10-10 19:17 ` Darrick J. Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2025-10-10  5:17 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Darrick J. Wong

On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
> Specifically, xfs/078, xfs/216, and xfs/217 are failing for us
> on various machines with between 8 and 128 CPUS, due to the
> fundamental change in geometry that results from the new
> concurrency behavior, which makes any consistent golden
> output that involves geometry details quite difficult.

FYI, the zoned XFS CI has seen these fail as well, and I've never
been able to reproduce it locally.  The CI runs on a fairly beefy
box so I suspect it could be correlated.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-09 20:13 mkfs.xfs "concurrency" change concerns Eric Sandeen
  2025-10-10  5:17 ` Christoph Hellwig
@ 2025-10-10 19:17 ` Darrick J. Wong
  2025-10-10 19:34   ` Darrick J. Wong
  2025-10-10 20:47   ` Eric Sandeen
  1 sibling, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2025-10-10 19:17 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org

On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
> Hey all -
> 
> this got long, so tl;dr:
> 
> 1) concurrency geometry breaks some xfstests for me
> 2) concurrency behavior is not consistent w/ loopback vs. imagefile
> 3) concurrency defaults to the mkfs machine not the mount machine
> 
> In detail:
> 
> So, I realize I'm late to the game here and didn't review the patches
> before they went in, but it looks like the "concurrency" mkfs.xfs
> arguments and defaults are breaking several xfstests.
> 
> 4738ff0 mkfs: allow sizing realtime allocation groups for concurrency
> c02a1873 mkfs: allow sizing internal logs for concurrency
> 9338bc8b mkfs: allow sizing allocation groups for concurrency
> 
> Specifically, xfs/078, xfs/216, and xfs/217 are failing for us
> on various machines with between 8 and 128 CPUS, due to the
> fundamental change in geometry that results from the new
> concurrency behavior, which makes any consistent golden
> output that involves geometry details quite difficult.

Uggggh.  You're right, I see golden output changes in xfs/078 if I boost
the number of VM CPUs past four:

--- /run/fstests/bin/tests/xfs/078.out  2025-07-15 14:41:40.195202883 -0700
+++ /var/tmp/fstests/xfs/078.out.bad    2025-10-10 11:56:14.040263143 -0700
@@ -188,6 +188,6 @@
 *** mount loop filesystem
 *** grow loop filesystem
 xfs_growfs --BlockSize=4096 --Blocks=268435456
-data blocks changed from 268435456 to 4194304001
+data blocks changed from 268435456 to 4194304000
 *** unmount
 *** all done

Can this happen if RAID stripe parameters also get involved and change
the AG count?  This test could be improved by parsing the block counts
and using _within to get past those kinds of problems, if there's more
than one way to make the golden output wrong despite correct operation.

What do your xfs/21[67] failures look like?

> One option might be to detect whether the "concurrency" args
> exist in mkfs.xfs, and set that back to 4, which is probably likely
> to more or less behave the old way, and match the current golden
> output which was (usually) based on 4 AGs. But that might break
> the purpose of some of the tests, if we're only validating behavior
> when a specific set of arguments is applied.

I think you're really asking to force the old behavior from before the
concurrency options existed, but only if fstests is running.  Or maybe
a little more than that; I'll get to that at the end.

> (for 078, adding -d concurrency=4 seems to fix it. For  216 and 217
> I think I needed -l concurrency=4, but this might depend on nr cpus.)
> 
> So, we could probably fix xfstests to make mkfs.xfs behave the old way,
> with loss of coverage of behavior with current code defaults.

Well yes, you'd be losing test coverage either for configurations that
set concurrency options explicitly, or when the storage are
nonrotational.

> Other concerns, though - I see that we only do this if the storage
> is nonrotational. But in testing, if you set up a loop device, the
> loop dev is nonrotational, and gets the new concurrency behavior,
> while doing a mkfs.xfs directly on the backing file doesn't:
> 
> # losetup /dev/loop4 testfile.img
> 
> # mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
> meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks
> 
> # mkfs.xfs -f testfile.img 2>&1 | grep agcount
> meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks
> 
> so we get different behavior depending on how you access the image file.

What kernel is this?  6.17 sets ROTATIONAL by default and clears it if
the backing bdev (or the bdev backing the file) has ROTATIONAL set.
That might be why you see the discrepancy.  I think that behavior has
been in the kernel since ~6.11 or so.

[Aside: Obviously, checking inode->i_sb->sb_bdev isn't sufficient for
files on a multi-disk filesystem, but it's probably close enough here.]

(But see below)

> And speaking of image files, it's a pretty common use case to use mkfs.xfs
> on image files for deployment elsewhere.  Maybe the good news, even if

Yes, mkfs defaults to assuming rotational (and hence not computing a
concurrency factor) if the BLKROTATIONAL query fails.  So that might
be why you get 4 AGs on a regular file but 6 on a loop device pointing
to the same file.

> accidental, is that if you mkfs the file directly, you don't get system-
> specific "concurrence" geometry. But I am concerned that there is no
> guarantee that the machine performing mkfs is the machine that will mount
> the filesystem, so this seems like a slightly dangerous assumption for 
> default behavior.

What I tell our internal customers is:

1. Defer formatting until deployment whenever possible so that mkfs can
optimize the filesystem for the storage and machine it actually gets.

2. If you can't do that, then try to make the image creator machine
match the deployment hardware as much as possible in terms of
rotationality and CPU count.

3. We shouldn't have a #3 because that's leaving performance on the
table.

They were very happy to see performance gains after adjusting their
WOS generation scripts towards #1.

But I think it's this #3 here that's causing the most concern for you?
I suppose if you really don't know what the deployment hardware is going
to look like then ... falling back to the default calculations (which
are mostly for spinning-rust) at least is familiar.

Would the creation of -[dlr] concurrency=never options to mkfs.xfs
address all of your concerns?

> I understand the desire to DTRT by default, but I am concerned about
> test breakage, loopdev inconsistencies, and too-broad assumptions about
> where the resulting filesystem will actually be used.

That itself is a ... very broad statement considering that this code
landed in 6.8 and this is the first I've heard of complaints. ;)

--D

> Thoughts?
> Thanks,
> -Eric
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-10 19:17 ` Darrick J. Wong
@ 2025-10-10 19:34   ` Darrick J. Wong
  2025-10-10 20:47   ` Eric Sandeen
  1 sibling, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2025-10-10 19:34 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org

On Fri, Oct 10, 2025 at 12:17:13PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
> > Hey all -
> > 
> > this got long, so tl;dr:
> > 
> > 1) concurrency geometry breaks some xfstests for me
> > 2) concurrency behavior is not consistent w/ loopback vs. imagefile
> > 3) concurrency defaults to the mkfs machine not the mount machine
> > 
> > In detail:
> > 
> > So, I realize I'm late to the game here and didn't review the patches
> > before they went in, but it looks like the "concurrency" mkfs.xfs
> > arguments and defaults are breaking several xfstests.
> > 
> > 4738ff0 mkfs: allow sizing realtime allocation groups for concurrency
> > c02a1873 mkfs: allow sizing internal logs for concurrency
> > 9338bc8b mkfs: allow sizing allocation groups for concurrency
> > 
> > Specifically, xfs/078, xfs/216, and xfs/217 are failing for us
> > on various machines with between 8 and 128 CPUS, due to the
> > fundamental change in geometry that results from the new
> > concurrency behavior, which makes any consistent golden
> > output that involves geometry details quite difficult.
> 
> Uggggh.  You're right, I see golden output changes in xfs/078 if I boost
> the number of VM CPUs past four:
> 
> --- /run/fstests/bin/tests/xfs/078.out  2025-07-15 14:41:40.195202883 -0700
> +++ /var/tmp/fstests/xfs/078.out.bad    2025-10-10 11:56:14.040263143 -0700
> @@ -188,6 +188,6 @@
>  *** mount loop filesystem
>  *** grow loop filesystem
>  xfs_growfs --BlockSize=4096 --Blocks=268435456
> -data blocks changed from 268435456 to 4194304001
> +data blocks changed from 268435456 to 4194304000
>  *** unmount
>  *** all done
> 
> Can this happen if RAID stripe parameters also get involved and change
> the AG count?  This test could be improved by parsing the block counts
> and using _within to get past those kinds of problems, if there's more
> than one way to make the golden output wrong despite correct operation.
> 
> What do your xfs/21[67] failures look like?
> 
> > One option might be to detect whether the "concurrency" args
> > exist in mkfs.xfs, and set that back to 4, which is probably likely
> > to more or less behave the old way, and match the current golden
> > output which was (usually) based on 4 AGs. But that might break
> > the purpose of some of the tests, if we're only validating behavior
> > when a specific set of arguments is applied.
> 
> I think you're really asking to force the old behavior from before the
> concurrency options existed, but only if fstests is running.  Or maybe
> a little more than that; I'll get to that at the end.
> 
> > (for 078, adding -d concurrency=4 seems to fix it. For  216 and 217
> > I think I needed -l concurrency=4, but this might depend on nr cpus.)
> > 
> > So, we could probably fix xfstests to make mkfs.xfs behave the old way,
> > with loss of coverage of behavior with current code defaults.
> 
> Well yes, you'd be losing test coverage either for configurations that
> set concurrency options explicitly, or when the storage are
> nonrotational.
> 
> > Other concerns, though - I see that we only do this if the storage
> > is nonrotational. But in testing, if you set up a loop device, the
> > loop dev is nonrotational, and gets the new concurrency behavior,
> > while doing a mkfs.xfs directly on the backing file doesn't:
> > 
> > # losetup /dev/loop4 testfile.img
> > 
> > # mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
> > meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks
> > 
> > # mkfs.xfs -f testfile.img 2>&1 | grep agcount
> > meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks
> > 
> > so we get different behavior depending on how you access the image file.
> 
> What kernel is this?  6.17 sets ROTATIONAL by default and clears it if
> the backing bdev (or the bdev backing the file) has ROTATIONAL set.
> That might be why you see the discrepancy.  I think that behavior has
> been in the kernel since ~6.11 or so.
> 
> [Aside: Obviously, checking inode->i_sb->sb_bdev isn't sufficient for
> files on a multi-disk filesystem, but it's probably close enough here.]
> 
> (But see below)
> 
> > And speaking of image files, it's a pretty common use case to use mkfs.xfs
> > on image files for deployment elsewhere.  Maybe the good news, even if
> 
> Yes, mkfs defaults to assuming rotational (and hence not computing a
> concurrency factor) if the BLKROTATIONAL query fails.  So that might
> be why you get 4 AGs on a regular file but 6 on a loop device pointing
> to the same file.
> 
> > accidental, is that if you mkfs the file directly, you don't get system-
> > specific "concurrence" geometry. But I am concerned that there is no
> > guarantee that the machine performing mkfs is the machine that will mount
> > the filesystem, so this seems like a slightly dangerous assumption for 
> > default behavior.
> 
> What I tell our internal customers is:
> 
> 1. Defer formatting until deployment whenever possible so that mkfs can
> optimize the filesystem for the storage and machine it actually gets.
> 
> 2. If you can't do that, then try to make the image creator machine
> match the deployment hardware as much as possible in terms of
> rotationality and CPU count.
> 
> 3. We shouldn't have a #3 because that's leaving performance on the
> table.
> 
> They were very happy to see performance gains after adjusting their
> WOS generation scripts towards #1.
> 
> But I think it's this #3 here that's causing the most concern for you?
> I suppose if you really don't know what the deployment hardware is going
> to look like then ... falling back to the default calculations (which
> are mostly for spinning-rust) at least is familiar.
> 
> Would the creation of -[dlr] concurrency=never options to mkfs.xfs
> address all of your concerns?

Now that I've read the manpage, I'm reminded that -[dlr] concurrency=0
already disables the automatic calculation.  Does injecting that into
dparam (in xfs/078) fix the problem for you?  It does for me.

--D

> > I understand the desire to DTRT by default, but I am concerned about
> > test breakage, loopdev inconsistencies, and too-broad assumptions about
> > where the resulting filesystem will actually be used.
> 
> That itself is a ... very broad statement considering that this code
> landed in 6.8 and this is the first I've heard of complaints. ;)
> 
> --D
> 
> > Thoughts?
> > Thanks,
> > -Eric
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-10 19:17 ` Darrick J. Wong
  2025-10-10 19:34   ` Darrick J. Wong
@ 2025-10-10 20:47   ` Eric Sandeen
  2025-10-14  2:32     ` Darrick J. Wong
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2025-10-10 20:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Christoph Hellwig

On 10/10/25 2:17 PM, Darrick J. Wong wrote:
> On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
>> Hey all -
>>
>> this got long, so tl;dr:
>>
>> 1) concurrency geometry breaks some xfstests for me
>> 2) concurrency behavior is not consistent w/ loopback vs. imagefile
>> 3) concurrency defaults to the mkfs machine not the mount machine

...

> Uggggh.  You're right, I see golden output changes in xfs/078 if I boost
> the number of VM CPUs past four:
> 
> --- /run/fstests/bin/tests/xfs/078.out  2025-07-15 14:41:40.195202883 -0700
> +++ /var/tmp/fstests/xfs/078.out.bad    2025-10-10 11:56:14.040263143 -0700
> @@ -188,6 +188,6 @@
>  *** mount loop filesystem
>  *** grow loop filesystem
>  xfs_growfs --BlockSize=4096 --Blocks=268435456
> -data blocks changed from 268435456 to 4194304001
> +data blocks changed from 268435456 to 4194304000
>  *** unmount
>  *** all done

Yeah I saw exactly that.

> Can this happen if RAID stripe parameters also get involved and change
> the AG count?  This test could be improved by parsing the block counts
> and using _within to get past those kinds of problems, if there's more
> than one way to make the golden output wrong despite correct operation.

Maybe? Not sure tbh. I hadn't seen it fail before.

> What do your xfs/21[67] failures look like?

On a VM w/ 8 CPUs, like this:

xfs/216 4s ... - output mismatch (see /root/xfstests-dev/results//xfs/216.out.bad)
    --- tests/xfs/216.out	2024-02-08 15:39:34.883667028 -0600
    +++ /root/xfstests-dev/results//xfs/216.out.bad	2025-10-10 15:28:24.055130959 -0500
    @@ -7,4 +7,4 @@
     fssize=32g log      =internal log           bsize=4096   blocks=16384, version=2
     fssize=64g log      =internal log           bsize=4096   blocks=16384, version=2
     fssize=128g log      =internal log           bsize=4096   blocks=16384, version=2
    -fssize=256g log      =internal log           bsize=4096   blocks=32768, version=2
    +fssize=256g log      =internal log           bsize=4096   blocks=32767, version=2
    ...
    (Run 'diff -u /root/xfstests-dev/tests/xfs/216.out /root/xfstests-dev/results//xfs/216.out.bad'  to see the entire diff)
Ran: xfs/216
Failures: xfs/216
Failed 1 of 1 tests

217 was similarly off by one. So maybe _within works ...

>> One option might be to detect whether the "concurrency" args
>> exist in mkfs.xfs, and set that back to 4, which is probably likely
>> to more or less behave the old way, and match the current golden
>> output which was (usually) based on 4 AGs. But that might break
>> the purpose of some of the tests, if we're only validating behavior
>> when a specific set of arguments is applied.
> 
> I think you're really asking to force the old behavior from before the
> concurrency options existed, but only if fstests is running.  Or maybe
> a little more than that; I'll get to that at the end.

Not so much asking for as musing about, and I don't think it's the
best idea ...

...

>> # losetup /dev/loop4 testfile.img
>>
>> # mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
>> meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks
>>
>> # mkfs.xfs -f testfile.img 2>&1 | grep agcount
>> meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks
>>
>> so we get different behavior depending on how you access the image file.
> 
> What kernel is this?  6.17 sets ROTATIONAL by default and clears it if
> the backing bdev (or the bdev backing the file) has ROTATIONAL set.
> That might be why you see the discrepancy.  I think that behavior has
> been in the kernel since ~6.11 or so.

This was 6.17. The backing file get the nonrotational/concurrency treatment
but the loop device does. This probably says more about the xfsprogs test
(ddev_is_solidstate) than the kernel.

ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)

fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
says rotational == 0 so we get true for solidstate.

But TBH this might be the right answer for mkfsing a file directly, as it is
likely an image destined for another machine.

Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
files /and/ loopback devices if possible?

> [Aside: Obviously, checking inode->i_sb->sb_bdev isn't sufficient for
> files on a multi-disk filesystem, but it's probably close enough here.]
> 
> (But see below)
> 
>> And speaking of image files, it's a pretty common use case to use mkfs.xfs
>> on image files for deployment elsewhere.  Maybe the good news, even if
> 
> Yes, mkfs defaults to assuming rotational (and hence not computing a
> concurrency factor) if the BLKROTATIONAL query fails.  So that might
> be why you get 4 AGs on a regular file but 6 on a loop device pointing
> to the same file.

Yes, exactly.

>> accidental, is that if you mkfs the file directly, you don't get system-
>> specific "concurrence" geometry. But I am concerned that there is no
>> guarantee that the machine performing mkfs is the machine that will mount
>> the filesystem, so this seems like a slightly dangerous assumption for 
>> default behavior.
> 
> What I tell our internal customers is:
> 
> 1. Defer formatting until deployment whenever possible so that mkfs can
> optimize the filesystem for the storage and machine it actually gets.
> 
> 2. If you can't do that, then try to make the image creator machine
> match the deployment hardware as much as possible in terms of
> rotationality and CPU count.

I just don't think that's practical in real life when you're creating a
generic OS image for wide distribution into unknown environments.

> 3. We shouldn't have a #3 because that's leaving performance on the
> table.
> 
> They were very happy to see performance gains after adjusting their
> WOS generation scripts towards #1.
> 
> But I think it's this #3 here that's causing the most concern for you?
> I suppose if you really don't know what the deployment hardware is going
> to look like then ... falling back to the default calculations (which
> are mostly for spinning-rust) at least is familiar.
> 
> Would the creation of -[dlr] concurrency=never options to mkfs.xfs
> address all of your concerns?

Not quite, because *defaults* have changed, and it's almost impossible
to chase down every consumer who's going to happily update xfsprogs and
suddenly get wildly different geometry which might really help them!
Or might make no sense at all. But that horse has kind of left the barn,
I guess.

I'm tempted to say "do not do the concurrency thing for regular files or
for loopback files because that's a strong hint that this filesystem has
at least a good chance of being destined for another machine." Thoughts?

(from your other email)

> Now that I've read the manpage, I'm reminded that -[dlr] concurrency=0
> already disables the automatic calculation.  Does injecting that into
> dparam (in xfs/078) fix the problem for you?  It does for me.

I'm sure it does. Gets back to my question about whether that is too
detrimental to test coverage for the default behavior. I think probably
a _within is better. _within would take a little work since we're comparing
strings with numbers in them not just numbers but I think we really do
care only about one number in the string, so it's probably fine.

>> I understand the desire to DTRT by default, but I am concerned about
>> test breakage, loopdev inconsistencies, and too-broad assumptions about
>> where the resulting filesystem will actually be used.
> 
> That itself is a ... very broad statement considering that this code
> landed in 6.8 and this is the first I've heard of complaints. ;)

Yeah, I started off with acking that I am very late to this. Been a million
work distractions keeping me from this stuff, I'm sorry. :(

But as you said, any test env. with > 4 CPUs probably started failing tests
since 6.8 so it's probably not all on me. ;)

I think where I'll go from here is do a big run with -d and -l concurrency
set to something biggish in MKFS_OPTIONS, see what breaks, and see what sort
of _within changes might help.

And then I might also see about a patch to make ddev_is_solidstate explicitly
return false for loop devices and regular files, unless anyone sees issues
with that approach.

thanks,
-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-10 20:47   ` Eric Sandeen
@ 2025-10-14  2:32     ` Darrick J. Wong
  2025-10-14 15:36       ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2025-10-14  2:32 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Christoph Hellwig

On Fri, Oct 10, 2025 at 03:47:14PM -0500, Eric Sandeen wrote:
> On 10/10/25 2:17 PM, Darrick J. Wong wrote:
> > On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
> >> Hey all -
> >>
> >> this got long, so tl;dr:
> >>
> >> 1) concurrency geometry breaks some xfstests for me
> >> 2) concurrency behavior is not consistent w/ loopback vs. imagefile
> >> 3) concurrency defaults to the mkfs machine not the mount machine
> 
> ...
> 
> > Uggggh.  You're right, I see golden output changes in xfs/078 if I boost
> > the number of VM CPUs past four:
> > 
> > --- /run/fstests/bin/tests/xfs/078.out  2025-07-15 14:41:40.195202883 -0700
> > +++ /var/tmp/fstests/xfs/078.out.bad    2025-10-10 11:56:14.040263143 -0700
> > @@ -188,6 +188,6 @@
> >  *** mount loop filesystem
> >  *** grow loop filesystem
> >  xfs_growfs --BlockSize=4096 --Blocks=268435456
> > -data blocks changed from 268435456 to 4194304001
> > +data blocks changed from 268435456 to 4194304000
> >  *** unmount
> >  *** all done
> 
> Yeah I saw exactly that.
> 
> > Can this happen if RAID stripe parameters also get involved and change
> > the AG count?  This test could be improved by parsing the block counts
> > and using _within to get past those kinds of problems, if there's more
> > than one way to make the golden output wrong despite correct operation.
> 
> Maybe? Not sure tbh. I hadn't seen it fail before.
> 
> > What do your xfs/21[67] failures look like?
> 
> On a VM w/ 8 CPUs, like this:
> 
> xfs/216 4s ... - output mismatch (see /root/xfstests-dev/results//xfs/216.out.bad)
>     --- tests/xfs/216.out	2024-02-08 15:39:34.883667028 -0600
>     +++ /root/xfstests-dev/results//xfs/216.out.bad	2025-10-10 15:28:24.055130959 -0500
>     @@ -7,4 +7,4 @@
>      fssize=32g log      =internal log           bsize=4096   blocks=16384, version=2
>      fssize=64g log      =internal log           bsize=4096   blocks=16384, version=2
>      fssize=128g log      =internal log           bsize=4096   blocks=16384, version=2
>     -fssize=256g log      =internal log           bsize=4096   blocks=32768, version=2
>     +fssize=256g log      =internal log           bsize=4096   blocks=32767, version=2
>     ...
>     (Run 'diff -u /root/xfstests-dev/tests/xfs/216.out /root/xfstests-dev/results//xfs/216.out.bad'  to see the entire diff)
> Ran: xfs/216
> Failures: xfs/216
> Failed 1 of 1 tests
> 
> 217 was similarly off by one. So maybe _within works ...
> 
> >> One option might be to detect whether the "concurrency" args
> >> exist in mkfs.xfs, and set that back to 4, which is probably likely
> >> to more or less behave the old way, and match the current golden
> >> output which was (usually) based on 4 AGs. But that might break
> >> the purpose of some of the tests, if we're only validating behavior
> >> when a specific set of arguments is applied.
> > 
> > I think you're really asking to force the old behavior from before the
> > concurrency options existed, but only if fstests is running.  Or maybe
> > a little more than that; I'll get to that at the end.
> 
> Not so much asking for as musing about, and I don't think it's the
> best idea ...
> 
> ...
> 
> >> # losetup /dev/loop4 testfile.img
> >>
> >> # mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
> >> meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks
> >>
> >> # mkfs.xfs -f testfile.img 2>&1 | grep agcount
> >> meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks
> >>
> >> so we get different behavior depending on how you access the image file.
> > 
> > What kernel is this?  6.17 sets ROTATIONAL by default and clears it if
> > the backing bdev (or the bdev backing the file) has ROTATIONAL set.
> > That might be why you see the discrepancy.  I think that behavior has
> > been in the kernel since ~6.11 or so.
> 
> This was 6.17. The backing file get the nonrotational/concurrency treatment
> but the loop device does. This probably says more about the xfsprogs test
> (ddev_is_solidstate) than the kernel.
> 
> ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)
> 
> fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
> says rotational == 0 so we get true for solidstate.
> 
> But TBH this might be the right answer for mkfsing a file directly, as it is
> likely an image destined for another machine.
> 
> Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
> files /and/ loopback devices if possible?

It's *possible*, but why would mkfs ignore what the kernel tells us?

Suppose you're creating an image file on a filesystem sitting on a SSD
with the intent of deploying the image in a mostly non-rotational
environment.  Now those people don't get any of the SSD optimizations
even though the creator had an SSD

> > [Aside: Obviously, checking inode->i_sb->sb_bdev isn't sufficient for
> > files on a multi-disk filesystem, but it's probably close enough here.]
> > 
> > (But see below)
> > 
> >> And speaking of image files, it's a pretty common use case to use mkfs.xfs
> >> on image files for deployment elsewhere.  Maybe the good news, even if
> > 
> > Yes, mkfs defaults to assuming rotational (and hence not computing a
> > concurrency factor) if the BLKROTATIONAL query fails.  So that might
> > be why you get 4 AGs on a regular file but 6 on a loop device pointing
> > to the same file.
> 
> Yes, exactly.
> 
> >> accidental, is that if you mkfs the file directly, you don't get system-
> >> specific "concurrence" geometry. But I am concerned that there is no
> >> guarantee that the machine performing mkfs is the machine that will mount
> >> the filesystem, so this seems like a slightly dangerous assumption for 
> >> default behavior.
> > 
> > What I tell our internal customers is:
> > 
> > 1. Defer formatting until deployment whenever possible so that mkfs can
> > optimize the filesystem for the storage and machine it actually gets.
> > 
> > 2. If you can't do that, then try to make the image creator machine
> > match the deployment hardware as much as possible in terms of
> > rotationality and CPU count.
> 
> I just don't think that's practical in real life when you're creating a
> generic OS image for wide distribution into unknown environments.

Uhhh well I exist in real life too.

> > 3. We shouldn't have a #3 because that's leaving performance on the
> > table.
> > 
> > They were very happy to see performance gains after adjusting their
> > WOS generation scripts towards #1.
> > 
> > But I think it's this #3 here that's causing the most concern for you?
> > I suppose if you really don't know what the deployment hardware is going
> > to look like then ... falling back to the default calculations (which
> > are mostly for spinning-rust) at least is familiar.
> > 
> > Would the creation of -[dlr] concurrency=never options to mkfs.xfs
> > address all of your concerns?
> 
> Not quite, because *defaults* have changed, and it's almost impossible
> to chase down every consumer who's going to happily update xfsprogs and
> suddenly get wildly different geometry which might really help them!
> Or might make no sense at all. But that horse has kind of left the barn,
> I guess.

I would say that the *hardware* has changed, and so should the defaults.

The pre-concurrency= defaults, FWIW, are what past xfs developers worked
out as a reasonable setting for spinning rust.

> I'm tempted to say "do not do the concurrency thing for regular files or
> for loopback files because that's a strong hint that this filesystem has
> at least a good chance of being destined for another machine." Thoughts?
> 
> (from your other email)
> 
> > Now that I've read the manpage, I'm reminded that -[dlr] concurrency=0
> > already disables the automatic calculation.  Does injecting that into
> > dparam (in xfs/078) fix the problem for you?  It does for me.
> 
> I'm sure it does. Gets back to my question about whether that is too
> detrimental to test coverage for the default behavior. I think probably
> a _within is better. _within would take a little work since we're comparing
> strings with numbers in them not just numbers but I think we really do
> care only about one number in the string, so it's probably fine.

Yeah.  Could we turn the last two digits in a sequence into "XX" so
that as long as it's not off by more than 100 fsblocks then everything
is ok?

> >> I understand the desire to DTRT by default, but I am concerned about
> >> test breakage, loopdev inconsistencies, and too-broad assumptions about
> >> where the resulting filesystem will actually be used.
> > 
> > That itself is a ... very broad statement considering that this code
> > landed in 6.8 and this is the first I've heard of complaints. ;)
> 
> Yeah, I started off with acking that I am very late to this. Been a million
> work distractions keeping me from this stuff, I'm sorry. :(
> 
> But as you said, any test env. with > 4 CPUs probably started failing tests
> since 6.8 so it's probably not all on me. ;)
> 
> I think where I'll go from here is do a big run with -d and -l concurrency
> set to something biggish in MKFS_OPTIONS, see what breaks, and see what sort
> of _within changes might help.

<nod>

> And then I might also see about a patch to make ddev_is_solidstate explicitly
> return false for loop devices and regular files, unless anyone sees issues
> with that approach.

As I said before, that might be surprising to anyone who saw that the
loop device is not rotational and didn't get nonrotational tuning.

--D

> thanks,
> -Eric
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-14  2:32     ` Darrick J. Wong
@ 2025-10-14 15:36       ` Eric Sandeen
  2025-10-17 22:46         ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2025-10-14 15:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Christoph Hellwig

On 10/13/25 9:32 PM, Darrick J. Wong wrote:
>> This was 6.17. The backing file get the nonrotational/concurrency treatment
>> but the loop device does. This probably says more about the xfsprogs test
>> (ddev_is_solidstate) than the kernel.
>>
>> ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)
>>
>> fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
>> says rotational == 0 so we get true for solidstate.
>>
>> But TBH this might be the right answer for mkfsing a file directly, as it is
>> likely an image destined for another machine.
>>
>> Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
>> files /and/ loopback devices if possible?
> It's *possible*, but why would mkfs ignore what the kernel tells us?

Because for one, it's not reliable or consistent. A loop device and its backing
file are on the same storage, of course. We get different answers when we try to
query one vs the other via the ioctl, currently.

And for two, the actual access patterns and behavior or writing to a loopback
file aren't really the same as either flavor of block device any way, right?

> Suppose you're creating an image file on a filesystem sitting on a SSD
> with the intent of deploying the image in a mostly non-rotational
> environment.  Now those people don't get any of the SSD optimizations
> even though the creator had an SSD

Now suppose you're creating it for deployment on rotating storage, instead.

My point is if the admin is making an image file, mkfs.xfs has absolutely
no idea where that image will be deployed. The administrator might, and 
could explicitly set parameters as needed based on that knowledge.

...

>>> What I tell our internal customers is:
>>>
>>> 1. Defer formatting until deployment whenever possible so that mkfs can
>>> optimize the filesystem for the storage and machine it actually gets.
>>>
>>> 2. If you can't do that, then try to make the image creator machine
>>> match the deployment hardware as much as possible in terms of
>>> rotationality and CPU count.

>> I just don't think that's practical in real life when you're creating a
>> generic OS image for wide distribution into unknown environments.
> Uhhh well I exist in real life too.

Of course...?

I read #2 as "make sure the system you run mkfs on has the same CPU count
as any system you'll deploy that image on" and that's not possible for
a generic image destined for wide deployment into varied environments.

rotationality is pretty trivial and is almost always "not rotational" so
that's not really my major concern. My concern is how CPU count affects
geometry now, by default, once nonrotationality has been determined.

For example if there's some fleet of machines used to produce
an OS and its images, the images may vary depending on which build machine
the image build task lands on unless they all have exactly the same CPU
count. Or say you build for ppc64, aarch64, and x86_64. By default, you're
almost guaranteed to get different fs geometry for each. I just think that's
odd and unexpected. (I understand that it can be overridden but this is
nothing anyone likely expects to be necessary.)

I agree that it makes sense to optimize for nonrotationality more than we
have in the past, by default, for image files. I totally get your point about
how 4 AGs is an optimization for the old world.

So I think my rambling boils down to a few things:

1) mkfsing a file-backed image should be consistent whether you access
   it through a loop device or you open the file directly. That's not
   currently the case.

2) When you are a mkfsing a file-backed image instead of a block device,
   that's a big hint that the filesystem is destined for use on other
   machines, about which mkfs.xfs knows nothing.

3) To meet in the middle, rather than falling back to the old rotational
   behavior of 4 AGs for image files, maybe a new image-file-specific
   heuristic of "more AGs than before, but not scaled by local CPU count"
   would be reasonable. This would make image file generation yield
   consistent and repeatable geometries, by default.

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-14 15:36       ` Eric Sandeen
@ 2025-10-17 22:46         ` Darrick J. Wong
  2025-10-18 15:01           ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2025-10-17 22:46 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Christoph Hellwig

On Tue, Oct 14, 2025 at 10:36:14AM -0500, Eric Sandeen wrote:
> On 10/13/25 9:32 PM, Darrick J. Wong wrote:
> >> This was 6.17. The backing file get the nonrotational/concurrency treatment
> >> but the loop device does. This probably says more about the xfsprogs test
> >> (ddev_is_solidstate) than the kernel.
> >>
> >> ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)
> >>
> >> fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
> >> says rotational == 0 so we get true for solidstate.
> >>
> >> But TBH this might be the right answer for mkfsing a file directly, as it is
> >> likely an image destined for another machine.
> >>
> >> Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
> >> files /and/ loopback devices if possible?
> > It's *possible*, but why would mkfs ignore what the kernel tells us?
> 
> Because for one, it's not reliable or consistent. A loop device and its backing
> file are on the same storage, of course. We get different answers when we try to
> query one vs the other via the ioctl, currently.

The way I see things, I've told you how to turn off the ssd
optimizations for golden image creation.  You don't appear to like that
solution and would prefer some sort of heuristic based on stat::st_rdev.
I suggest you send a patch with your exact solution so that we can all
discuss on list.

--D

> And for two, the actual access patterns and behavior or writing to a loopback
> file aren't really the same as either flavor of block device any way, right?
> 
> > Suppose you're creating an image file on a filesystem sitting on a SSD
> > with the intent of deploying the image in a mostly non-rotational
> > environment.  Now those people don't get any of the SSD optimizations
> > even though the creator had an SSD
> 
> Now suppose you're creating it for deployment on rotating storage, instead.
> 
> My point is if the admin is making an image file, mkfs.xfs has absolutely
> no idea where that image will be deployed. The administrator might, and 
> could explicitly set parameters as needed based on that knowledge.
> 
> ...
> 
> >>> What I tell our internal customers is:
> >>>
> >>> 1. Defer formatting until deployment whenever possible so that mkfs can
> >>> optimize the filesystem for the storage and machine it actually gets.
> >>>
> >>> 2. If you can't do that, then try to make the image creator machine
> >>> match the deployment hardware as much as possible in terms of
> >>> rotationality and CPU count.
> 
> >> I just don't think that's practical in real life when you're creating a
> >> generic OS image for wide distribution into unknown environments.
> > Uhhh well I exist in real life too.
> 
> Of course...?
> 
> I read #2 as "make sure the system you run mkfs on has the same CPU count
> as any system you'll deploy that image on" and that's not possible for
> a generic image destined for wide deployment into varied environments.
> 
> rotationality is pretty trivial and is almost always "not rotational" so
> that's not really my major concern. My concern is how CPU count affects
> geometry now, by default, once nonrotationality has been determined.
> 
> For example if there's some fleet of machines used to produce
> an OS and its images, the images may vary depending on which build machine
> the image build task lands on unless they all have exactly the same CPU
> count. Or say you build for ppc64, aarch64, and x86_64. By default, you're
> almost guaranteed to get different fs geometry for each. I just think that's
> odd and unexpected. (I understand that it can be overridden but this is
> nothing anyone likely expects to be necessary.)
> 
> I agree that it makes sense to optimize for nonrotationality more than we
> have in the past, by default, for image files. I totally get your point about
> how 4 AGs is an optimization for the old world.
> 
> So I think my rambling boils down to a few things:
> 
> 1) mkfsing a file-backed image should be consistent whether you access
>    it through a loop device or you open the file directly. That's not
>    currently the case.
> 
> 2) When you are a mkfsing a file-backed image instead of a block device,
>    that's a big hint that the filesystem is destined for use on other
>    machines, about which mkfs.xfs knows nothing.
> 
> 3) To meet in the middle, rather than falling back to the old rotational
>    behavior of 4 AGs for image files, maybe a new image-file-specific
>    heuristic of "more AGs than before, but not scaled by local CPU count"
>    would be reasonable. This would make image file generation yield
>    consistent and repeatable geometries, by default.
> 
> -Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mkfs.xfs "concurrency" change concerns
  2025-10-17 22:46         ` Darrick J. Wong
@ 2025-10-18 15:01           ` Eric Sandeen
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Sandeen @ 2025-10-18 15:01 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org,
	Christoph Hellwig

On 10/17/25 5:46 PM, Darrick J. Wong wrote:
> On Tue, Oct 14, 2025 at 10:36:14AM -0500, Eric Sandeen wrote:
>> On 10/13/25 9:32 PM, Darrick J. Wong wrote:
>>>> This was 6.17. The backing file get the nonrotational/concurrency treatment
>>>> but the loop device does. This probably says more about the xfsprogs test
>>>> (ddev_is_solidstate) than the kernel.
>>>>
>>>> ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)
>>>>
>>>> fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
>>>> says rotational == 0 so we get true for solidstate.
>>>>
>>>> But TBH this might be the right answer for mkfsing a file directly, as it is
>>>> likely an image destined for another machine.
>>>>
>>>> Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
>>>> files /and/ loopback devices if possible?
>>> It's *possible*, but why would mkfs ignore what the kernel tells us?
>>
>> Because for one, it's not reliable or consistent. A loop device and its backing
>> file are on the same storage, of course. We get different answers when we try to
>> query one vs the other via the ioctl, currently.
> 
> The way I see things, I've told you how to turn off the ssd
> optimizations for golden image creation.  You don't appear to like that
> solution and would prefer some sort of heuristic based on stat::st_rdev.
> I suggest you send a patch with your exact solution so that we can all
> discuss on list.

The way I see it, given the tone of the responses so far, I'll be walking
into a buzz saw regardless of how I approach it, so ... never mind.

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-10-18 15:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 20:13 mkfs.xfs "concurrency" change concerns Eric Sandeen
2025-10-10  5:17 ` Christoph Hellwig
2025-10-10 19:17 ` Darrick J. Wong
2025-10-10 19:34   ` Darrick J. Wong
2025-10-10 20:47   ` Eric Sandeen
2025-10-14  2:32     ` Darrick J. Wong
2025-10-14 15:36       ` Eric Sandeen
2025-10-17 22:46         ` Darrick J. Wong
2025-10-18 15:01           ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).