Re: mkfs.xfs "concurrency" change concerns

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"fstests@vger.kernel.org" <fstests@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: mkfs.xfs "concurrency" change concerns
Date: Mon, 13 Oct 2025 19:32:28 -0700	[thread overview]
Message-ID: <20251014023228.GU6188@frogsfrogsfrogs> (raw)
In-Reply-To: <f7d5eaab-c2fe-4a11-82d5-a7c5ca563e75@sandeen.net>

On Fri, Oct 10, 2025 at 03:47:14PM -0500, Eric Sandeen wrote:
> On 10/10/25 2:17 PM, Darrick J. Wong wrote:
> > On Thu, Oct 09, 2025 at 03:13:47PM -0500, Eric Sandeen wrote:
> >> Hey all -
> >>
> >> this got long, so tl;dr:
> >>
> >> 1) concurrency geometry breaks some xfstests for me
> >> 2) concurrency behavior is not consistent w/ loopback vs. imagefile
> >> 3) concurrency defaults to the mkfs machine not the mount machine
> 
> ...
> 
> > Uggggh.  You're right, I see golden output changes in xfs/078 if I boost
> > the number of VM CPUs past four:
> > 
> > --- /run/fstests/bin/tests/xfs/078.out  2025-07-15 14:41:40.195202883 -0700
> > +++ /var/tmp/fstests/xfs/078.out.bad    2025-10-10 11:56:14.040263143 -0700
> > @@ -188,6 +188,6 @@
> >  *** mount loop filesystem
> >  *** grow loop filesystem
> >  xfs_growfs --BlockSize=4096 --Blocks=268435456
> > -data blocks changed from 268435456 to 4194304001
> > +data blocks changed from 268435456 to 4194304000
> >  *** unmount
> >  *** all done
> 
> Yeah I saw exactly that.
> 
> > Can this happen if RAID stripe parameters also get involved and change
> > the AG count?  This test could be improved by parsing the block counts
> > and using _within to get past those kinds of problems, if there's more
> > than one way to make the golden output wrong despite correct operation.
> 
> Maybe? Not sure tbh. I hadn't seen it fail before.
> 
> > What do your xfs/21[67] failures look like?
> 
> On a VM w/ 8 CPUs, like this:
> 
> xfs/216 4s ... - output mismatch (see /root/xfstests-dev/results//xfs/216.out.bad)
>     --- tests/xfs/216.out	2024-02-08 15:39:34.883667028 -0600
>     +++ /root/xfstests-dev/results//xfs/216.out.bad	2025-10-10 15:28:24.055130959 -0500
>     @@ -7,4 +7,4 @@
>      fssize=32g log      =internal log           bsize=4096   blocks=16384, version=2
>      fssize=64g log      =internal log           bsize=4096   blocks=16384, version=2
>      fssize=128g log      =internal log           bsize=4096   blocks=16384, version=2
>     -fssize=256g log      =internal log           bsize=4096   blocks=32768, version=2
>     +fssize=256g log      =internal log           bsize=4096   blocks=32767, version=2
>     ...
>     (Run 'diff -u /root/xfstests-dev/tests/xfs/216.out /root/xfstests-dev/results//xfs/216.out.bad'  to see the entire diff)
> Ran: xfs/216
> Failures: xfs/216
> Failed 1 of 1 tests
> 
> 217 was similarly off by one. So maybe _within works ...
> 
> >> One option might be to detect whether the "concurrency" args
> >> exist in mkfs.xfs, and set that back to 4, which is probably likely
> >> to more or less behave the old way, and match the current golden
> >> output which was (usually) based on 4 AGs. But that might break
> >> the purpose of some of the tests, if we're only validating behavior
> >> when a specific set of arguments is applied.
> > 
> > I think you're really asking to force the old behavior from before the
> > concurrency options existed, but only if fstests is running.  Or maybe
> > a little more than that; I'll get to that at the end.
> 
> Not so much asking for as musing about, and I don't think it's the
> best idea ...
> 
> ...
> 
> >> # losetup /dev/loop4 testfile.img
> >>
> >> # mkfs.xfs -f /dev/loop4 2>&1 | grep agcount
> >> meta-data=/dev/loop4             isize=512    agcount=6, agsize=11184810 blks
> >>
> >> # mkfs.xfs -f testfile.img 2>&1 | grep agcount
> >> meta-data=testfile.img           isize=512    agcount=4, agsize=16777216 blks
> >>
> >> so we get different behavior depending on how you access the image file.
> > 
> > What kernel is this?  6.17 sets ROTATIONAL by default and clears it if
> > the backing bdev (or the bdev backing the file) has ROTATIONAL set.
> > That might be why you see the discrepancy.  I think that behavior has
> > been in the kernel since ~6.11 or so.
> 
> This was 6.17. The backing file get the nonrotational/concurrency treatment
> but the loop device does. This probably says more about the xfsprogs test
> (ddev_is_solidstate) than the kernel.
> 
> ioctl(3, BLKROTATIONAL, 0x7ffd9d48f696) = -1 ENOTTY (Inappropriate ioctl for device)
> 
> fails, so ddev_is_solidstate returns false. For the loop dev, BLKROTATIONAL
> says rotational == 0 so we get true for solidstate.
> 
> But TBH this might be the right answer for mkfsing a file directly, as it is
> likely an image destined for another machine.
> 
> Perhaps ddev_is_solidstate() should default to "not solidstate" for regular
> files /and/ loopback devices if possible?

It's *possible*, but why would mkfs ignore what the kernel tells us?

Suppose you're creating an image file on a filesystem sitting on a SSD
with the intent of deploying the image in a mostly non-rotational
environment.  Now those people don't get any of the SSD optimizations
even though the creator had an SSD

> > [Aside: Obviously, checking inode->i_sb->sb_bdev isn't sufficient for
> > files on a multi-disk filesystem, but it's probably close enough here.]
> > 
> > (But see below)
> > 
> >> And speaking of image files, it's a pretty common use case to use mkfs.xfs
> >> on image files for deployment elsewhere.  Maybe the good news, even if
> > 
> > Yes, mkfs defaults to assuming rotational (and hence not computing a
> > concurrency factor) if the BLKROTATIONAL query fails.  So that might
> > be why you get 4 AGs on a regular file but 6 on a loop device pointing
> > to the same file.
> 
> Yes, exactly.
> 
> >> accidental, is that if you mkfs the file directly, you don't get system-
> >> specific "concurrence" geometry. But I am concerned that there is no
> >> guarantee that the machine performing mkfs is the machine that will mount
> >> the filesystem, so this seems like a slightly dangerous assumption for 
> >> default behavior.
> > 
> > What I tell our internal customers is:
> > 
> > 1. Defer formatting until deployment whenever possible so that mkfs can
> > optimize the filesystem for the storage and machine it actually gets.
> > 
> > 2. If you can't do that, then try to make the image creator machine
> > match the deployment hardware as much as possible in terms of
> > rotationality and CPU count.
> 
> I just don't think that's practical in real life when you're creating a
> generic OS image for wide distribution into unknown environments.

Uhhh well I exist in real life too.

> > 3. We shouldn't have a #3 because that's leaving performance on the
> > table.
> > 
> > They were very happy to see performance gains after adjusting their
> > WOS generation scripts towards #1.
> > 
> > But I think it's this #3 here that's causing the most concern for you?
> > I suppose if you really don't know what the deployment hardware is going
> > to look like then ... falling back to the default calculations (which
> > are mostly for spinning-rust) at least is familiar.
> > 
> > Would the creation of -[dlr] concurrency=never options to mkfs.xfs
> > address all of your concerns?
> 
> Not quite, because *defaults* have changed, and it's almost impossible
> to chase down every consumer who's going to happily update xfsprogs and
> suddenly get wildly different geometry which might really help them!
> Or might make no sense at all. But that horse has kind of left the barn,
> I guess.

I would say that the *hardware* has changed, and so should the defaults.

The pre-concurrency= defaults, FWIW, are what past xfs developers worked
out as a reasonable setting for spinning rust.

> I'm tempted to say "do not do the concurrency thing for regular files or
> for loopback files because that's a strong hint that this filesystem has
> at least a good chance of being destined for another machine." Thoughts?
> 
> (from your other email)
> 
> > Now that I've read the manpage, I'm reminded that -[dlr] concurrency=0
> > already disables the automatic calculation.  Does injecting that into
> > dparam (in xfs/078) fix the problem for you?  It does for me.
> 
> I'm sure it does. Gets back to my question about whether that is too
> detrimental to test coverage for the default behavior. I think probably
> a _within is better. _within would take a little work since we're comparing
> strings with numbers in them not just numbers but I think we really do
> care only about one number in the string, so it's probably fine.

Yeah.  Could we turn the last two digits in a sequence into "XX" so
that as long as it's not off by more than 100 fsblocks then everything
is ok?

> >> I understand the desire to DTRT by default, but I am concerned about
> >> test breakage, loopdev inconsistencies, and too-broad assumptions about
> >> where the resulting filesystem will actually be used.
> > 
> > That itself is a ... very broad statement considering that this code
> > landed in 6.8 and this is the first I've heard of complaints. ;)
> 
> Yeah, I started off with acking that I am very late to this. Been a million
> work distractions keeping me from this stuff, I'm sorry. :(
> 
> But as you said, any test env. with > 4 CPUs probably started failing tests
> since 6.8 so it's probably not all on me. ;)
> 
> I think where I'll go from here is do a big run with -d and -l concurrency
> set to something biggish in MKFS_OPTIONS, see what breaks, and see what sort
> of _within changes might help.

<nod>

> And then I might also see about a patch to make ddev_is_solidstate explicitly
> return false for loop devices and regular files, unless anyone sees issues
> with that approach.

As I said before, that might be surprising to anyone who saw that the
loop device is not rotational and didn't get nonrotational tuning.

--D

> thanks,
> -Eric
>

next prev parent reply	other threads:[~2025-10-14  2:32 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-09 20:13 mkfs.xfs "concurrency" change concerns Eric Sandeen
2025-10-10  5:17 ` Christoph Hellwig
2025-10-10 19:17 ` Darrick J. Wong
2025-10-10 19:34   ` Darrick J. Wong
2025-10-10 20:47   ` Eric Sandeen
2025-10-14  2:32     ` Darrick J. Wong [this message]
2025-10-14 15:36       ` Eric Sandeen
2025-10-17 22:46         ` Darrick J. Wong
2025-10-18 15:01           ` Eric Sandeen
2026-06-15 14:19 ` Christoph Hellwig
2026-06-15 15:24   ` Eric Sandeen
2026-06-15 15:40   ` Darrick J. Wong
2026-06-16  5:24     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251014023228.GU6188@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.