public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: aalbersh@kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] mkfs: enable new features by default
Date: Wed, 10 Dec 2025 09:25:24 +1100	[thread overview]
Message-ID: <aTih1FDXt8fMrIb4@dread.disaster.area> (raw)
In-Reply-To: <176529676146.3974899.6119777261763784206.stgit@frogsfrogsfrogs>

On Tue, Dec 09, 2025 at 08:16:08AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Since the LTS is coming up, enable parent pointers and exchange-range by
> default for all users.  Also fix up an out of date comment.
> 
> I created a really stupid benchmarking script that does:
> 
> #!/bin/bash
> 
> # pptr overhead benchmark
> 
> umount /opt /mnt
> rmmod xfs
> for i in 1 0; do
> 	umount /opt
> 	mkfs.xfs -f /dev/sdb -n parent=$i | grep -i parent=
> 	mount /dev/sdb /opt
> 	mkdir -p /opt/foo
> 	for ((j=0;j<5;j++)); do
> 		time fsstress -n 100000 -p 4 -z -f creat=1 -d /opt/foo -s 1
> 	done
> done

Hmmm. fsstress is an interesting choice here...

> This is the result of creating an enormous number of empty files in a
> single directory:
> 
> # ./dumb.sh
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
> real    0m18.807s
> user    0m2.169s
> sys     0m54.013s

> 
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
> real    0m20.654s
> user    0m2.374s
> sys     1m4.441s

Yeah, that's only creating 20,000 files/sec. That's a lot less than
I'd expect a single thread to be able to do - why is the kernel
burning all 4 CPUs on this workload?

i.e. I'd expect a pure create workload to run at about 40,000
files/s with sleeping contention on the i_rwsem, but this is much
slower than I'd expect and the contention is on a spinning lock...

Also, parent pointers add about 20% more system time overhead (54s
sys time to 64.4s sys time). Where does this come from? Do you have
kernel profiles? Is it PP overhead, a change in the contention
point, or just worse contention on the same resource?
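The profiles Dave is asking for could be gathered with perf; a minimal
sketch, assuming the same device and fsstress arguments as the script
above (sample scope and report depth are arbitrary choices):

```shell
# Sample all CPUs with call graphs while the create workload runs, then
# summarise the hottest kernel symbols. This should show whether the
# extra system time is in the parent pointer code or in lock contention.
perf record -a -g -- fsstress -n 100000 -p 4 -z -f creat=1 -d /opt/foo -s 1
perf report --stdio --sort symbol | head -30

# 'perf lock' (or /proc/lock_stat on a lockdep-enabled kernel) can then
# tell spinlock contention apart from straight-line PP overhead.
perf lock record -- fsstress -n 100000 -p 4 -z -f creat=1 -d /opt/foo -s 1
perf lock report
```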

> As you can see, there's a 10% increase in runtime here.  If I make the
> workload a bit more representative by changing the -f argument to
> include a directory tree workout:
> 
> -f creat=1,mkdir=1,mknod=1,rmdir=1,unlink=1,link=1,rename=1
> 
> 
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
> real    0m12.742s
> user    0m28.074s
> sys     0m10.839s
> 
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
> real    0m12.782s
> user    0m28.892s
> sys     0m8.897s

Again, that's way slower than I'd expect a 4p metadata workload to
run through 400k modification ops. i.e. it's running at about 35k
ops/s, and I'd be expecting the baseline to be upwards of 100k
ops/s.

Ah, look at the amount of time spent in userspace - 28-29s vs 9-11s
spent in the kernel filesystem code.

Ok, performance is limited by the userspace code, not the kernel
code. I would expect a decent fs benchmark to be at most 10%
userspace CPU time, with >90% of the time being spent in the kernel
doing filesystem operations.

IOWs, there is way too much userspace overhead in this workload to
draw useful conclusions about the impact of the kernel side changes.

System time went up from 9s to 11s when parent pointers are turned
on - a 20% increase in CPU overhead - but that additional overhead
isn't reflected in the wall time results because the CPU overhead is
dominated by the userspace program, not the kernel code that is
being "measured".

> Almost no difference here.

Ah, no. Again, system time went up by ~20%, even though elapsed time
was unchanged. That implies there is some amount of sleeping
contention occurring between processes doing work, and the
additional CPU overhead of the PP code simply resulted in less sleep
time.

Again, this is not noticeable because the workload is dominated by
userspace CPU overhead, not the kernel/filesystem operation
overhead...


> If I then actually write to the regular
> files by adding:
> 
> -f write=1
> 
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
> real    0m16.668s
> user    0m21.709s
> sys     0m15.425s
> 
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
> real    0m15.562s
> user    0m21.740s
> sys     0m12.927s
> 
> So that's about a 2% difference.

Same here - system time went up by ~20%, even though wall time didn't
change. Also, 15.5s to 16.6s is actually a 7% increase in wall time,
not 2%.
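For reference, the system-time deltas can be checked with a line of awk,
using the sys times copied from the three pairs of results quoted above:

```shell
# Percentage increase in system time with parent=1 vs parent=0 for each
# of the three quoted runs (64.441s is 1m4.441s converted to seconds).
awk 'BEGIN {
	printf "pure create: %.0f%%\n", (64.441/54.013 - 1) * 100
	printf "mixed ops:   %.0f%%\n", (10.839/8.897 - 1) * 100
	printf "with writes: %.0f%%\n", (15.425/12.927 - 1) * 100
}'
```

This prints 19%, 22% and 19% respectively - i.e. a consistent ~20%
system time overhead across all three workloads.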

----

Overall, I don't think the benchmarking documented here is
sufficient to justify the conclusion that "parent pointers have
little real world overhead so we can turn them on by default".

I would at least like to see the "will-it-scale" impact on a 64p
machine with a hundred GB of RAM and IO subsystem at least capable
of a million IOPS and a filesystem optimised for max performance
(e.g. highly parallel fsmark based workloads). This will push the
filesystem and CPU usage to their actual limits and directly expose
additional overhead and new contention points in the results.
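A sketch of the kind of highly parallel fs_mark run being described,
where the device path, mount point, thread count and file counts are
all assumptions to be scaled to the test machine:

```shell
# One fs_mark thread per -d directory: 16 threads creating zero-length
# files with no syncs (-S0), which pushes pure metadata create rates to
# the filesystem's limit rather than the benchmark program's.
mkfs.xfs -f /dev/sdb
mount /dev/sdb /mnt/scratch
dirs=""
for i in $(seq 0 15); do
	mkdir -p /mnt/scratch/$i
	dirs="$dirs -d /mnt/scratch/$i"
done
fs_mark -S0 -s 0 -n 100000 -L 5 $dirs
```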

This is also much more representative of the sorts of high
performance, high end deployments that we expect XFS to be deployed
on, and where performance impact actually matters to users.

i.e. we need to know what the impact of the change is on the high
end as well as low end VM/desktop configs before any conclusion can
be drawn w.r.t. changing the parent pointer default setting....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 14+ messages
2025-12-09 16:16 [PATCHSET V2] xfsprogs: enable new stable features for 6.18 Darrick J. Wong
2025-12-09 16:16 ` [PATCH 1/2] mkfs: enable new features by default Darrick J. Wong
2025-12-09 16:22   ` Christoph Hellwig
2025-12-09 22:25   ` Dave Chinner [this message]
2025-12-10 23:49     ` Darrick J. Wong
2025-12-15 23:59       ` Dave Chinner
2025-12-16 23:07         ` Darrick J. Wong
2025-12-09 16:16 ` [PATCH 2/2] mkfs: add 2025 LTS config file Darrick J. Wong
2025-12-09 16:23   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2025-12-02  1:27 [PATCHSET 2/2] xfsprogs: enable new stable features for 6.18 Darrick J. Wong
2025-12-02  1:28 ` [PATCH 1/2] mkfs: enable new features by default Darrick J. Wong
2025-12-02  7:38   ` Christoph Hellwig
2025-12-03  0:53     ` Darrick J. Wong
2025-12-03  6:31       ` Christoph Hellwig
2025-12-04 18:48         ` Darrick J. Wong
