public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
       [not found]         ` <Z3ZhNYHKZPMpv8Cz@ryzen>
@ 2025-01-03  6:49           ` Christoph Hellwig
  2025-01-03  9:09             ` Niklas Cassel
  2025-01-07  8:26             ` Oliver Sang
  0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-01-03  6:49 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, Oliver Sang, oe-lkp, lkp, linux-kernel,
	Jens Axboe, linux-block, virtualization, linux-nvme,
	Damien Le Moal, linux-btrfs, linux-aio

On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > from below information, it seems an 'ahci' to me. but since I have limited
> > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > please let us know. thanks a lot!
> > 
> > Yes, this looks like ahci.  Thanks a lot!
> 
> Did this ever get resolved?
> 
> I haven't seen a patch that seems to address this.
> 
> AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> reordering when issuing a batch of commands, my guess is that the problem
> also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> or block layer.

I started looking into this before the holidays.  blktrace shows perfectly
sequential writes without any reordering using ahci, directly on the
block device or using xfs and btrfs when using dd.  I also started
looking into what the test does and got as far as checking out the
stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
submission does simple reads and writes using increasing offsets.
So if the test result isn't a fluke either the aio code does some
weird reordering or btrfs does.

Oliver, did the test also show any interesting results on non-btrfs
setups?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-03  6:49           ` [linus:master] [block] e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression Christoph Hellwig
@ 2025-01-03  9:09             ` Niklas Cassel
  2025-01-06  7:21               ` Christoph Hellwig
  2025-01-07  8:27               ` Oliver Sang
  2025-01-07  8:26             ` Oliver Sang
  1 sibling, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2025-01-03  9:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, Jens Axboe, linux-block,
	virtualization, linux-nvme, Damien Le Moal, linux-btrfs,
	linux-aio

On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > please let us know. thanks a lot!
> > > 
> > > Yes, this looks like ahci.  Thanks a lot!
> > 
> > Did this ever get resolved?
> > 
> > I haven't seen a patch that seems to address this.
> > 
> > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > reordering when issuing a batch of commands, my guess is that the problem
> > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > or block layer.
> 
> I started looking into this before the holidays.  blktrace shows perfectly
> sequential writes without any reordering using ahci, directly on the
> block device or using xfs and btrfs when using dd.  I also started
> looking into what the test does and got as far as checking out the
> stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> submission does simple reads and writes using increasing offsets.
> So if the test result isn't a fluke either the aio code does some
> weird reordering or btrfs does.
> 
> Oliver, did the test also show any interesting results on non-btrfs
> setups?
> 

One thing that came to mind:
Some distros (e.g. Fedora and openSUSE) ship with a udev rule that sets
the I/O scheduler to BFQ for single-queue HDDs.

It could very well be the I/O scheduler that reorders.

Oliver, which I/O scheduler are you using?
$ cat /sys/block/sdb/queue/scheduler 
none mq-deadline kyber [bfq]


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-03  9:09             ` Niklas Cassel
@ 2025-01-06  7:21               ` Christoph Hellwig
  2025-01-07  8:27               ` Oliver Sang
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-01-06  7:21 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, Oliver Sang, oe-lkp, lkp, linux-kernel,
	Jens Axboe, linux-block, virtualization, linux-nvme,
	Damien Le Moal, linux-btrfs, linux-aio

On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> One thing that came to mind.
> Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> the I/O scheduler to BFQ for single-queue HDDs.
> 
> It could very well be the I/O scheduler that reorders.
> 
> Oliver, which I/O scheduler are you using?
> $ cat /sys/block/sdb/queue/scheduler 
> none mq-deadline kyber [bfq]

I tried cfq as well and there is no reordering with or without various
file systems in the mix.  I've also tried forcing the rotational
attribute on and off just for an extra variation.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-03  6:49           ` [linus:master] [block] e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression Christoph Hellwig
  2025-01-03  9:09             ` Niklas Cassel
@ 2025-01-07  8:26             ` Oliver Sang
  1 sibling, 0 replies; 11+ messages in thread
From: Oliver Sang @ 2025-01-07  8:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Niklas Cassel, oe-lkp, lkp, linux-kernel, Jens Axboe, linux-block,
	virtualization, linux-nvme, Damien Le Moal, linux-btrfs,
	linux-aio, oliver.sang

hi, Christoph Hellwig,

On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > please let us know. thanks a lot!
> > > 
> > > Yes, this looks like ahci.  Thanks a lot!
> > 
> > Did this ever get resolved?
> > 
> > I haven't seen a patch that seems to address this.
> > 
> > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > reordering when issuing a batch of commands, my guess is that the problem
> > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > or block layer.
> 
> I started looking into this before the holidays.  blktrace shows perfectly
> sequential writes without any reordering using ahci, directly on the
> block device or using xfs and btrfs when using dd.  I also started
> looking into what the test does and got as far as checking out the
> stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> submission does simple reads and writes using increasing offsets.
> So if the test result isn't a fluke either the aio code does some
> weird reordering or btrfs does.
> 
> Oliver, did the test also show any interesting results on non-btrfs
> setups?
> 

I tried running with ext4 [1] and xfs [2], but could not get stable
results (the %stddev is too big, even bigger than the %change), so neither
test seems to tell us much.


[1]
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/ext4/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/aiol/stress-ng/60s

a3396b99990d8b4e e70c301faece15b618e54b613b1
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    142.01 ± 17%      -4.6%     135.55 ± 18%  stress-ng.aiol.async_I/O_events_completed_per_sec
     14077 ± 14%      -3.3%      13617 ± 15%  stress-ng.aiol.ops
    233.95 ± 14%      -3.4%     225.97 ± 15%  stress-ng.aiol.ops_per_sec


[2]
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/xfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/aiol/stress-ng/60s

a3396b99990d8b4e e70c301faece15b618e54b613b1
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     11.97 ± 21%     +18.5%      14.19 ± 44%  stress-ng.aiol.async_I/O_events_completed_per_sec
      1498 ± 33%      +9.5%       1640 ± 49%  stress-ng.aiol.ops
     23.45 ± 34%     +10.2%      25.85 ± 52%  stress-ng.aiol.ops_per_sec

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-03  9:09             ` Niklas Cassel
  2025-01-06  7:21               ` Christoph Hellwig
@ 2025-01-07  8:27               ` Oliver Sang
  2025-01-08 10:39                 ` Niklas Cassel
  1 sibling, 1 reply; 11+ messages in thread
From: Oliver Sang @ 2025-01-07  8:27 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio, oliver.sang

hi, Niklas,

On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> > On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > > please let us know. thanks a lot!
> > > > 
> > > > Yes, this looks like ahci.  Thanks a lot!
> > > 
> > > Did this ever get resolved?
> > > 
> > > I haven't seen a patch that seems to address this.
> > > 
> > > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > > reordering when issuing a batch of commands, my guess is that the problem
> > > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > > or block layer.
> > 
> > I started looking into this before the holidays.  blktrace shows perfectly
> > sequential writes without any reordering using ahci, directly on the
> > block device or using xfs and btrfs when using dd.  I also started
> > looking into what the test does and got as far as checking out the
> > stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> > submission does simple reads and writes using increasing offsets.
> > So if the test result isn't a fluke either the aio code does some
> > weird reordering or btrfs does.
> > 
> > Oliver, did the test also show any interesting results on non-btrfs
> > setups?
> > 
> 
> One thing that came to mind.
> Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> the I/O scheduler to BFQ for single-queue HDDs.
> 
> It could very well be the I/O scheduler that reorders.
> 
> Oliver, which I/O scheduler are you using?
> $ cat /sys/block/sdb/queue/scheduler 
> none mq-deadline kyber [bfq]

while our test is running:

# cat /sys/block/sdb/queue/scheduler
none [mq-deadline] kyber bfq

> 
> 
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-07  8:27               ` Oliver Sang
@ 2025-01-08 10:39                 ` Niklas Cassel
  2025-01-10  6:53                   ` Oliver Sang
  2025-01-14  6:45                   ` Oliver Sang
  0 siblings, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2025-01-08 10:39 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio

On Tue, Jan 07, 2025 at 04:27:44PM +0800, Oliver Sang wrote:
> hi, Niklas,
> 
> On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> > On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> > > On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > > > please let us know. thanks a lot!
> > > > > 
> > > > > Yes, this looks like ahci.  Thanks a lot!
> > > > 
> > > > Did this ever get resolved?
> > > > 
> > > > I haven't seen a patch that seems to address this.
> > > > 
> > > > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > > > reordering when issuing a batch of commands, my guess is that the problem
> > > > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > > > or block layer.
> > > 
> > > I started looking into this before the holidays.  blktrace shows perfectly
> > > sequential writes without any reordering using ahci, directly on the
> > > block device or using xfs and btrfs when using dd.  I also started
> > > looking into what the test does and got as far as checking out the
> > > stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> > > submission does simple reads and writes using increasing offsets.
> > > So if the test result isn't a fluke either the aio code does some
> > > weird reordering or btrfs does.
> > > 
> > > Oliver, did the test also show any interesting results on non-btrfs
> > > setups?
> > > 
> > 
> > One thing that came to mind.
> > Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> > the I/O scheduler to BFQ for single-queue HDDs.
> > 
> > It could very well be the I/O scheduler that reorders.
> > 
> > Oliver, which I/O scheduler are you using?
> > $ cat /sys/block/sdb/queue/scheduler 
> > none mq-deadline kyber [bfq]
> 
> while our test running:
> 
> # cat /sys/block/sdb/queue/scheduler
> none [mq-deadline] kyber bfq

The stddev numbers you showed are all over the place, so are we certain
that this is a regression caused by commit e70c301faece ("block:
don't reorder requests in blk_add_rq_to_plug")?

Do you know if the stddev had such big variation for this test even before
the commit?


If it is not too much to ask... It might be interesting to know if we see
a regression when comparing before/after e70c301faece with scheduler none
instead of mq-deadline.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-08 10:39                 ` Niklas Cassel
@ 2025-01-10  6:53                   ` Oliver Sang
  2025-01-15 11:42                     ` Niklas Cassel
  2025-01-14  6:45                   ` Oliver Sang
  1 sibling, 1 reply; 11+ messages in thread
From: Oliver Sang @ 2025-01-10  6:53 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio, oliver.sang

hi, Niklas,

On Wed, Jan 08, 2025 at 11:39:28AM +0100, Niklas Cassel wrote:
> On Tue, Jan 07, 2025 at 04:27:44PM +0800, Oliver Sang wrote:
> > hi, Niklas,
> > 
> > On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> > > On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> > > > On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > > > > please let us know. thanks a lot!
> > > > > > 
> > > > > > Yes, this looks like ahci.  Thanks a lot!
> > > > > 
> > > > > Did this ever get resolved?
> > > > > 
> > > > > I haven't seen a patch that seems to address this.
> > > > > 
> > > > > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > > > > reordering when issuing a batch of commands, my guess is that the problem
> > > > > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > > > > or block layer.
> > > > 
> > > > I started looking into this before the holidays.  blktrace shows perfectly
> > > > sequential writes without any reordering using ahci, directly on the
> > > > block device or using xfs and btrfs when using dd.  I also started
> > > > looking into what the test does and got as far as checking out the
> > > > stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> > > > submission does simple reads and writes using increasing offsets.
> > > > So if the test result isn't a fluke either the aio code does some
> > > > weird reordering or btrfs does.
> > > > 
> > > > Oliver, did the test also show any interesting results on non-btrfs
> > > > setups?
> > > > 
> > > 
> > > One thing that came to mind.
> > > Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> > > the I/O scheduler to BFQ for single-queue HDDs.
> > > 
> > > It could very well be the I/O scheduler that reorders.
> > > 
> > > Oliver, which I/O scheduler are you using?
> > > $ cat /sys/block/sdb/queue/scheduler 
> > > none mq-deadline kyber [bfq]
> > 
> > while our test running:
> > 
> > # cat /sys/block/sdb/queue/scheduler
> > none [mq-deadline] kyber bfq
> 
> The stddev numbers you showed is all over the place, so are we certain
> if this is a regression caused by commit e70c301faece ("block:
> don't reorder requests in blk_add_rq_to_plug") ?
> 
> Do you know if the stddev has such big variation for this test even before
> the commit?

To address your concern, we rebuilt kernels for e70c301fae and its parent
a3396b9999, as well as for v6.12-rc4. The config is still the same as shared
in our original report:
https://download.01.org/0day-ci/archive/20241212/202412122112.ca47bcec-lkp@intel.com/config-6.12.0-rc4-00120-ge70c301faece

(
OK, with one small diff:

@@ -19,7 +19,7 @@ CONFIG_GCC_ASM_GOTO_OUTPUT_BROKEN=y
 CONFIG_TOOLS_SUPPORT_RELR=y
 CONFIG_CC_HAS_ASM_INLINE=y
 CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
-CONFIG_PAHOLE_VERSION=127
+CONFIG_PAHOLE_VERSION=128
 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y

but the configs used for e70c301fae, a3396b9999 and v6.12-rc4 in this round
of runs are the same.
)

We then reran the tests more times than before. Normally we run the tests
for each commit at least 6 times; this time it was 15 times per commit, and
because our bot ran some tests additional times for various purposes, we
ended up with 16 results for a3396b9999 and 24 results for v6.12-rc4.

summary results are as below:

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/aiol/stress-ng/60s

commit:
  v6.12-rc4
  a3396b9999 ("block: add a rq_list type")
  e70c301fae ("block: don't reorder requests in blk_add_rq_to_plug")

       v6.12-rc4 a3396b99990d8b4e5797e7b16fd e70c301faece15b618e54b613b1
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    187.64 ±  5%      -0.6%     186.48 ±  7%     -47.6%      98.29 ± 17%  stress-ng.aiol.ops_per_sec

Yes, the %stddev is not small, which means the data is not very stable,
but it is still much better than the ext4 and xfs data I shared in the
other mail.

a3396b9999 has very similar results to v6.12-rc4, in both the
stress-ng.aiol.ops_per_sec score and the %stddev. e70c301fae has an even
bigger %stddev, but since the %change is bigger still, we think the data
is valid.

The raw data is also listed below FYI.

e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json:  "stress-ng.aiol.ops_per_sec": [
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    112.31,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    76.2,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    111.71,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    109.43,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    106.02,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    101.38,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    109.94,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    103.79,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    116.88,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    61.9,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    104.32,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    100.19,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    87.52,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    64.11,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    108.66
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-  ],


a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json:  "stress-ng.aiol.ops_per_sec": [
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    175.57,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    182.85,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    192.2,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    191.0,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    178.72,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    174.45,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    196.39,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    196.96,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    142.27,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    196.54,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    193.73,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    193.85,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    194.63,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    184.75,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    196.38,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    193.44
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-  ],


v6.12-rc4/matrix.json:  "stress-ng.aiol.ops_per_sec": [
v6.12-rc4/matrix.json-    193.49,
v6.12-rc4/matrix.json-    185.16,
v6.12-rc4/matrix.json-    187.78,
v6.12-rc4/matrix.json-    187.25,
v6.12-rc4/matrix.json-    193.65,
v6.12-rc4/matrix.json-    195.93,
v6.12-rc4/matrix.json-    194.48,
v6.12-rc4/matrix.json-    190.68,
v6.12-rc4/matrix.json-    195.61,
v6.12-rc4/matrix.json-    188.66,
v6.12-rc4/matrix.json-    181.88,
v6.12-rc4/matrix.json-    195.75,
v6.12-rc4/matrix.json-    187.21,
v6.12-rc4/matrix.json-    194.72,
v6.12-rc4/matrix.json-    159.46,
v6.12-rc4/matrix.json-    188.29,
v6.12-rc4/matrix.json-    189.42,
v6.12-rc4/matrix.json-    194.79,
v6.12-rc4/matrix.json-    192.25,
v6.12-rc4/matrix.json-    153.43,
v6.12-rc4/matrix.json-    196.16,
v6.12-rc4/matrix.json-    168.24,
v6.12-rc4/matrix.json-    193.78,
v6.12-rc4/matrix.json-    195.33
v6.12-rc4/matrix.json-  ],


The full comparison is below [1] FYI.

> 
> 
> If it is not too much to ask... It might be interesting to know if we see
> a regression when comparing before/after e70c301faece with scheduler none
> instead of mq-deadline.

OK, we can do this, but due to resource constraints and our current
priorities, please don't expect results soon. Thanks.

> 
> 
> Kind regards,
> Niklas
> 


[1]
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/aiol/stress-ng/60s

commit:
  v6.12-rc4
  a3396b9999 ("block: add a rq_list type")
  e70c301fae ("block: don't reorder requests in blk_add_rq_to_plug")

       v6.12-rc4 a3396b99990d8b4e5797e7b16fd e70c301faece15b618e54b613b1
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    642.17 ± 27%     -14.3%     550.12 ± 39%     -56.2%     281.53 ± 47%  perf-c2c.HITM.local
     45.74            +0.1%      45.79 ±  2%     -10.1%      41.12 ±  2%  iostat.cpu.idle
     53.09            -0.2%      52.99            +8.9%      57.80        iostat.cpu.iowait
     44.17 ±  2%      +0.0       44.20 ±  2%      -4.8       39.36 ±  2%  mpstat.cpu.all.idle%
      0.46 ±  4%      +0.0        0.48 ±  9%      -0.1        0.32 ±  8%  mpstat.cpu.all.sys%
     14022 ± 33%      +5.7%      14817 ± 21%     -57.5%       5953 ± 15%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
      3698 ± 32%     +11.6%       4126 ± 30%     -63.0%       1367 ± 27%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
     14683 ±  5%      -0.2%      14651 ±  7%     -46.3%       7886 ± 17%  vmstat.io.bi
     16122 ±  5%      -0.2%      16087 ±  7%     -45.4%       8796 ± 16%  vmstat.io.bo
      7237 ± 22%      -7.4%       6699 ± 20%     -54.5%       3295 ± 22%  meminfo.Dirty
     24305 ± 15%      -1.1%      24025 ± 17%     -40.6%      14427 ± 14%  meminfo.Inactive
     24305 ± 15%      -1.1%      24025 ± 17%     -40.6%      14427 ± 14%  meminfo.Inactive(file)
   1997820 ±  6%      -1.2%    1974059 ±  7%     -47.9%    1040100 ± 16%  time.file_system_inputs
   2008181 ±  6%      -1.2%    1984688 ±  7%     -47.5%    1054950 ± 16%  time.file_system_outputs
      1893 ±  9%      -2.6%       1844 ±  8%     -25.9%       1403 ± 14%  time.involuntary_context_switches
      0.37 ±  6%      +2.6%       0.38 ±  9%     -42.9%       0.21 ± 14%  time.user_time
    125.95 ±  5%      -0.6%     125.18 ±  7%     -47.2%      66.52 ± 17%  stress-ng.aiol.async_I/O_events_completed_per_sec
     11574 ±  6%      -1.2%      11430 ±  7%     -48.5%       5955 ± 17%  stress-ng.aiol.ops
    187.64 ±  5%      -0.6%     186.48 ±  7%     -47.6%      98.29 ± 17%  stress-ng.aiol.ops_per_sec
   1997820 ±  6%      -1.2%    1974059 ±  7%     -47.9%    1040100 ± 16%  stress-ng.time.file_system_inputs
   2008181 ±  6%      -1.2%    1984688 ±  7%     -47.5%    1054950 ± 16%  stress-ng.time.file_system_outputs
      4333 ± 24%     -10.5%       3877 ± 20%     -57.2%       1853 ± 19%  numa-meminfo.node0.Dirty
     14414 ± 15%      -5.4%      13632 ± 20%     -44.4%       8019 ± 19%  numa-meminfo.node0.Inactive
     14414 ± 15%      -5.4%      13632 ± 20%     -44.4%       8019 ± 19%  numa-meminfo.node0.Inactive(file)
      2916 ± 22%      -2.9%       2833 ± 24%     -50.7%       1439 ± 28%  numa-meminfo.node1.Dirty
      9940 ± 18%      +4.8%      10419 ± 17%     -35.0%       6462 ± 21%  numa-meminfo.node1.Inactive
      9940 ± 18%      +4.8%      10419 ± 17%     -35.0%       6462 ± 21%  numa-meminfo.node1.Inactive(file)
      1.15 ±  4%      -1.0%       1.14 ±  3%     +17.0%       1.35 ±  6%  perf-stat.i.MPKI
  14638106 ±  3%      +1.9%   14923264 ±  3%     -10.8%   13053409 ±  8%  perf-stat.i.cache-references
      4.25 ±  4%      +0.0        4.27            +0.4        4.63 ±  2%  perf-stat.overall.branch-miss-rate%
      0.95            +0.7%       0.95 ±  2%      -4.8%       0.90        perf-stat.overall.cpi
      1.06            -0.6%       1.05 ±  2%      +5.0%       1.11        perf-stat.overall.ipc
  14448948 ±  3%      +2.0%   14733547 ±  3%     -10.8%   12885674 ±  8%  perf-stat.ps.cache-references
 3.675e+08 ± 23%     -12.5%  3.215e+08 ± 38%    +119.8%  8.077e+08 ± 14%  sched_debug.cfs_rq:/.avg_vruntime.avg
 8.117e+08 ± 23%     -14.5%  6.942e+08 ± 38%    +111.1%  1.714e+09 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.max
 2.135e+08 ± 23%     -11.3%  1.893e+08 ± 38%     +94.6%  4.155e+08 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
 3.675e+08 ± 23%     -12.5%  3.215e+08 ± 38%    +119.8%  8.077e+08 ± 14%  sched_debug.cfs_rq:/.min_vruntime.avg
 8.117e+08 ± 23%     -14.5%  6.942e+08 ± 38%    +111.1%  1.714e+09 ±  9%  sched_debug.cfs_rq:/.min_vruntime.max
 2.135e+08 ± 23%     -11.3%  1.893e+08 ± 38%     +94.6%  4.155e+08 ±  7%  sched_debug.cfs_rq:/.min_vruntime.stddev
     10720 ± 16%      -7.9%       9870 ± 19%     -48.9%       5475 ± 18%  numa-vmstat.node0.nr_dirtied
      1086 ± 24%     -10.6%     972.03 ± 20%     -57.2%     464.99 ± 19%  numa-vmstat.node0.nr_dirty
    124767 ± 19%      +8.2%     134943 ± 25%     -48.8%      63821 ± 25%  numa-vmstat.node0.nr_foll_pin_acquired
    124590 ± 19%      +8.1%     134739 ± 25%     -48.9%      63722 ± 25%  numa-vmstat.node0.nr_foll_pin_released
      3627 ± 14%      -5.5%       3428 ± 20%     -44.5%       2014 ± 19%  numa-vmstat.node0.nr_inactive_file
      6842 ± 18%      -5.7%       6453 ± 27%     -44.0%       3829 ± 23%  numa-vmstat.node0.nr_written
      3627 ± 14%      -5.5%       3428 ± 20%     -44.5%       2014 ± 19%  numa-vmstat.node0.nr_zone_inactive_file
      1101 ± 23%     -10.6%     985.18 ± 20%     -57.1%     472.33 ± 18%  numa-vmstat.node0.nr_zone_write_pending
    728.88 ± 22%      -2.4%     711.19 ± 24%     -50.5%     361.14 ± 28%  numa-vmstat.node1.nr_dirty
    132261 ± 21%     -13.7%     114207 ± 24%     -50.6%      65275 ± 24%  numa-vmstat.node1.nr_foll_pin_acquired
    132076 ± 21%     -13.7%     114026 ± 24%     -50.7%      65176 ± 24%  numa-vmstat.node1.nr_foll_pin_released
      2502 ± 18%      +4.9%       2624 ± 17%     -35.1%       1623 ± 21%  numa-vmstat.node1.nr_inactive_file
      2502 ± 18%      +4.9%       2624 ± 17%     -35.1%       1623 ± 21%  numa-vmstat.node1.nr_zone_inactive_file
    738.83 ± 22%      -2.5%     720.39 ± 24%     -50.4%     366.25 ± 28%  numa-vmstat.node1.nr_zone_write_pending
     17946 ± 11%      -1.0%      17775 ± 16%     -43.7%      10111 ± 19%  proc-vmstat.nr_dirtied
      1813 ± 22%      -7.4%       1679 ± 20%     -54.6%     823.72 ± 22%  proc-vmstat.nr_dirty
    256770 ± 14%      -3.1%     248901 ± 13%     -49.7%     129104 ± 12%  proc-vmstat.nr_foll_pin_acquired
    256411 ± 14%      -3.1%     248510 ± 13%     -49.7%     128908 ± 12%  proc-vmstat.nr_foll_pin_released
      6098 ± 14%      -1.3%       6022 ± 17%     -40.6%       3623 ± 14%  proc-vmstat.nr_inactive_file
     47060            -0.1%      47015            -2.7%      45813        proc-vmstat.nr_slab_unreclaimable
     11219 ± 16%      +1.6%      11404 ± 24%     -38.2%       6928 ± 25%  proc-vmstat.nr_written
      6098 ± 14%      -1.3%       6022 ± 17%     -40.6%       3623 ± 14%  proc-vmstat.nr_zone_inactive_file
      1838 ± 22%      -7.4%       1702 ± 20%     -54.6%     834.16 ± 22%  proc-vmstat.nr_zone_write_pending
    420018            +0.2%     420980            -3.6%     404817        proc-vmstat.numa_hit
    353786            +0.3%     354765 ±  2%      -4.3%     338592 ±  2%  proc-vmstat.numa_local
    464103            +0.3%     465694            -4.0%     445644        proc-vmstat.pgalloc_normal
   1000691 ±  6%      -1.2%     989162 ±  7%     -47.8%     522520 ± 17%  proc-vmstat.pgpgin
   1094318 ±  6%      -1.0%    1083461 ±  8%     -46.7%     582829 ± 16%  proc-vmstat.pgpgout
     19.24 ± 19%      -2.2       17.01 ± 29%     -10.7        8.50 ± 17%  perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_drop_extents.insert_reserved_file_extent.btrfs_finish_one_ordered.btrfs_work_helper
     19.22 ± 19%      -2.2       16.99 ± 29%     -10.7        8.49 ± 17%  perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_drop_extents.insert_reserved_file_extent.btrfs_finish_one_ordered
     15.59 ± 22%      -1.9       13.74 ± 30%      -9.2        6.38 ± 20%  perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested.btrfs_lock_root_node.btrfs_search_slot
     15.21 ± 22%      -1.8       13.44 ± 31%      -8.9        6.32 ± 21%  perf-profile.calltrace.cycles-pp.btrfs_lock_root_node.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_drop_extents.insert_reserved_file_extent
     15.16 ± 22%      -1.8       13.40 ± 31%      -8.9        6.29 ± 21%  perf-profile.calltrace.cycles-pp.btrfs_tree_lock_nested.btrfs_lock_root_node.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_drop_extents
     15.16 ± 22%      -1.8       13.40 ± 31%      -8.9        6.29 ± 21%  perf-profile.calltrace.cycles-pp.down_write.btrfs_tree_lock_nested.btrfs_lock_root_node.btrfs_search_slot.btrfs_lookup_file_extent
     15.32 ± 22%      -1.7       13.59 ± 30%      -9.0        6.28 ± 20%  perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested.btrfs_lock_root_node
     11.15 ± 26%      -1.4        9.72 ± 34%      -6.7        4.41 ± 24%  perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
      5.03 ± 15%      -0.5        4.48 ± 24%      -2.9        2.15 ± 21%  perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested
      1.64 ± 19%      -0.3        1.39 ± 29%      -1.0        0.62 ± 42%  perf-profile.calltrace.cycles-pp.btrfs_tree_lock_nested.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_drop_extents.insert_reserved_file_extent
      1.64 ± 19%      -0.3        1.39 ± 29%      -1.0        0.62 ± 42%  perf-profile.calltrace.cycles-pp.down_write.btrfs_tree_lock_nested.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_drop_extents
      3.07 ± 15%      -0.2        2.82 ± 21%      +2.1        5.21 ± 22%  perf-profile.calltrace.cycles-pp.__schedule.schedule.worker_thread.kthread.ret_from_fork
      1.61 ± 20%      -0.2        1.37 ± 29%      -1.0        0.58 ± 52%  perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested.btrfs_search_slot.btrfs_lookup_file_extent
      1.63 ± 18%      -0.2        1.39 ± 27%      -1.1        0.56 ± 52%  perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.btrfs_tree_lock_nested.btrfs_search_slot
      3.10 ± 14%      -0.2        2.86 ± 21%      +2.2        5.26 ± 22%  perf-profile.calltrace.cycles-pp.schedule.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.39 ± 21%      -0.2        2.17 ± 29%      +1.4        3.78 ± 21%  perf-profile.calltrace.cycles-pp.sched_balance_newidle.pick_next_task_fair.__pick_next_task.__schedule.schedule
      2.27 ± 15%      -0.2        2.08 ± 22%      +1.7        3.94 ± 21%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.worker_thread.kthread
      2.24 ± 16%      -0.2        2.06 ± 21%      +1.7        3.91 ± 21%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.worker_thread
      1.25 ± 15%      -0.2        1.10 ± 33%      +0.9        2.15 ± 22%  perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work
      2.84 ± 15%      -0.1        2.70 ± 22%      +2.0        4.82 ± 23%  perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.btrfs_submit_chunk.btrfs_submit_bbio.iomap_dio_bio_iter.__iomap_dio_rw
      2.34 ± 17%      -0.1        2.20 ± 26%      +1.7        4.05 ± 23%  perf-profile.calltrace.cycles-pp.__blk_mq_alloc_requests.blk_mq_submit_bio.__submit_bio.submit_bio_noacct_nocheck.btrfs_submit_chunk
      2.81 ± 15%      -0.1        2.67 ± 22%      +2.0        4.78 ± 23%  perf-profile.calltrace.cycles-pp.blk_mq_submit_bio.__submit_bio.submit_bio_noacct_nocheck.btrfs_submit_chunk.btrfs_submit_bbio
      2.82 ± 15%      -0.1        2.68 ± 22%      +2.0        4.79 ± 23%  perf-profile.calltrace.cycles-pp.__submit_bio.submit_bio_noacct_nocheck.btrfs_submit_chunk.btrfs_submit_bbio.iomap_dio_bio_iter
      2.28 ± 17%      -0.1        2.18 ± 26%      +1.8        4.03 ± 23%  perf-profile.calltrace.cycles-pp.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.__submit_bio.submit_bio_noacct_nocheck
      1.72 ± 15%      -0.1        1.65 ± 22%      +1.4        3.10 ± 21%  perf-profile.calltrace.cycles-pp.sched_balance_find_src_group.sched_balance_rq.sched_balance_newidle.pick_next_task_fair.__pick_next_task
      1.69 ± 15%      -0.1        1.62 ± 22%      +1.4        3.05 ± 21%  perf-profile.calltrace.cycles-pp.update_sd_lb_stats.sched_balance_find_src_group.sched_balance_rq.sched_balance_newidle.pick_next_task_fair
      0.99 ± 11%      -0.1        0.92 ± 27%      +0.3        1.29 ± 15%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      1.04 ± 11%      -0.1        0.98 ± 27%      +0.3        1.37 ± 15%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.87 ± 15%      -0.1        1.81 ± 26%      +1.4        3.29 ± 21%  perf-profile.calltrace.cycles-pp.sched_balance_rq.sched_balance_newidle.pick_next_task_fair.__pick_next_task.__schedule
      1.88 ± 12%      -0.1        1.83 ± 21%      +1.3        3.16 ± 19%  perf-profile.calltrace.cycles-pp.btrfs_submit_bbio.iomap_dio_bio_iter.__iomap_dio_rw.btrfs_direct_write.btrfs_do_write_iter
      1.88 ± 12%      -0.0        1.83 ± 21%      +1.3        3.15 ± 19%  perf-profile.calltrace.cycles-pp.btrfs_submit_chunk.btrfs_submit_bbio.iomap_dio_bio_iter.__iomap_dio_rw.btrfs_direct_write
      1.53 ± 15%      -0.0        1.48 ± 23%      +1.2        2.76 ± 21%  perf-profile.calltrace.cycles-pp.update_sg_lb_stats.update_sd_lb_stats.sched_balance_find_src_group.sched_balance_rq.sched_balance_newidle
      0.64 ± 80%      +0.1        0.70 ± 70%      +1.3        1.95 ± 27%  perf-profile.calltrace.cycles-pp.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.__submit_bio
      0.41 ±103%      +0.2        0.57 ± 76%      +0.8        1.17 ± 15%  perf-profile.calltrace.cycles-pp._nohz_idle_balance.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     46.71 ±  8%      +0.7       47.37 ± 20%     -13.4       33.31 ±  4%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     46.71 ±  8%      +0.7       47.37 ± 20%     -13.4       33.31 ±  4%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
     46.71 ±  8%      +0.7       47.37 ± 20%     -13.4       33.31 ±  4%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     31.82 ± 13%      +0.9       32.73 ± 36%     -14.2       17.66 ± 12%  perf-profile.calltrace.cycles-pp.btrfs_work_helper.process_one_work.worker_thread.kthread.ret_from_fork
     31.74 ± 13%      +0.9       32.66 ± 36%     -14.1       17.59 ± 12%  perf-profile.calltrace.cycles-pp.btrfs_finish_one_ordered.btrfs_work_helper.process_one_work.worker_thread.kthread
     22.58 ± 17%      -2.4       20.19 ± 26%     -12.0       10.53 ± 15%  perf-profile.children.cycles-pp.btrfs_search_slot
     19.26 ± 19%      -2.2       17.02 ± 29%     -10.7        8.51 ± 17%  perf-profile.children.cycles-pp.btrfs_lookup_file_extent
     18.23 ± 19%      -2.0       16.20 ± 28%     -10.4        7.78 ± 17%  perf-profile.children.cycles-pp.btrfs_tree_lock_nested
     18.28 ± 19%      -2.0       16.26 ± 27%     -10.4        7.85 ± 17%  perf-profile.children.cycles-pp.down_write
     18.06 ± 19%      -2.0       16.07 ± 28%     -10.3        7.72 ± 17%  perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     17.92 ± 19%      -2.0       15.95 ± 28%     -10.3        7.64 ± 17%  perf-profile.children.cycles-pp.rwsem_optimistic_spin
     16.20 ± 21%      -1.8       14.40 ± 30%      -9.3        6.88 ± 19%  perf-profile.children.cycles-pp.btrfs_lock_root_node
     11.86 ± 24%      -1.4       10.46 ± 32%      -7.0        4.87 ± 21%  perf-profile.children.cycles-pp.osq_lock
      5.44 ± 13%      -0.5        4.94 ± 22%      -3.0        2.49 ± 14%  perf-profile.children.cycles-pp.rwsem_spin_on_owner
      3.63 ± 11%      -0.3        3.31 ± 21%      -1.2        2.43 ± 18%  perf-profile.children.cycles-pp.setup_items_for_insert
      2.61 ± 10%      -0.3        2.34 ± 22%      -0.9        1.74 ± 23%  perf-profile.children.cycles-pp.__memmove
      6.25 ±  9%      -0.2        6.01 ± 16%      +2.7        8.90 ± 10%  perf-profile.children.cycles-pp.__schedule
      4.94 ±  5%      -0.2        4.71 ± 15%      +1.1        6.03 ±  8%  perf-profile.children.cycles-pp.__irq_exit_rcu
      3.13 ± 11%      -0.2        2.93 ± 19%      +1.4        4.56 ± 16%  perf-profile.children.cycles-pp.sched_balance_rq
      5.29 ± 10%      -0.2        5.09 ± 16%      +2.3        7.61 ± 13%  perf-profile.children.cycles-pp.schedule
      3.66 ± 12%      -0.2        3.48 ± 18%      +1.7        5.38 ± 15%  perf-profile.children.cycles-pp.__pick_next_task
      3.48 ± 13%      -0.2        3.30 ± 19%      +1.7        5.14 ± 16%  perf-profile.children.cycles-pp.pick_next_task_fair
      1.70 ± 18%      -0.2        1.53 ± 26%      -0.7        0.95 ± 30%  perf-profile.children.cycles-pp.btrfs_del_items
      2.77 ± 10%      -0.2        2.61 ± 19%      +1.4        4.15 ± 18%  perf-profile.children.cycles-pp.sched_balance_find_src_group
      3.10 ± 13%      -0.2        2.93 ± 19%      +1.6        4.69 ± 18%  perf-profile.children.cycles-pp.sched_balance_newidle
      1.20 ± 14%      -0.2        1.04 ± 21%      -0.5        0.65 ± 24%  perf-profile.children.cycles-pp.btrfs_set_token_32
      2.73 ± 10%      -0.2        2.58 ± 19%      +1.4        4.10 ± 18%  perf-profile.children.cycles-pp.update_sd_lb_stats
      2.85 ± 15%      -0.1        2.70 ± 22%      +2.0        4.82 ± 23%  perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
      2.82 ± 15%      -0.1        2.68 ± 22%      +2.0        4.78 ± 23%  perf-profile.children.cycles-pp.blk_mq_submit_bio
      2.26 ± 20%      -0.1        2.12 ± 22%      -1.0        1.28 ± 23%  perf-profile.children.cycles-pp.btrfs_insert_empty_items
      2.82 ± 15%      -0.1        2.69 ± 22%      +2.0        4.79 ± 23%  perf-profile.children.cycles-pp.__submit_bio
      2.55 ± 11%      -0.1        2.41 ± 19%      +1.2        3.78 ± 18%  perf-profile.children.cycles-pp.update_sg_lb_stats
      1.97 ± 12%      -0.1        1.85 ± 17%      +0.8        2.75 ± 13%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      2.34 ± 17%      -0.1        2.23 ± 22%      +1.7        4.05 ± 23%  perf-profile.children.cycles-pp.__blk_mq_alloc_requests
      0.82 ± 14%      -0.1        0.71 ± 30%      -0.2        0.57 ± 22%  perf-profile.children.cycles-pp.read_block_for_search
      0.69 ± 19%      -0.1        0.59 ± 26%      -0.3        0.40 ± 16%  perf-profile.children.cycles-pp.up_write
      1.31 ± 17%      -0.1        1.22 ± 22%      +1.0        2.30 ± 23%  perf-profile.children.cycles-pp.handle_edge_irq
      1.26 ± 17%      -0.1        1.18 ± 22%      +1.0        2.23 ± 23%  perf-profile.children.cycles-pp.__handle_irq_event_percpu
      1.25 ± 17%      -0.1        1.18 ± 22%      +1.0        2.23 ± 23%  perf-profile.children.cycles-pp.ahci_single_level_irq_intr
      1.36 ± 17%      -0.1        1.29 ± 22%      +1.1        2.42 ± 22%  perf-profile.children.cycles-pp.__sysvec_posted_msi_notification
      1.30 ± 17%      -0.1        1.22 ± 22%      +1.0        2.29 ± 22%  perf-profile.children.cycles-pp.handle_irq_event
      2.29 ± 17%      -0.1        2.22 ± 22%      +1.8        4.04 ± 23%  perf-profile.children.cycles-pp.blk_mq_get_tag
      0.60 ± 18%      -0.1        0.53 ± 21%      -0.2        0.39 ± 24%  perf-profile.children.cycles-pp.aio_complete_rw
      0.90 ± 15%      -0.1        0.84 ± 27%      +0.8        1.65 ± 23%  perf-profile.children.cycles-pp.scsi_mq_get_budget
      0.51 ± 21%      -0.1        0.45 ± 27%      -0.2        0.28 ± 25%  perf-profile.children.cycles-pp.rwsem_wake
      0.39 ± 21%      -0.1        0.34 ± 21%      -0.2        0.19 ± 35%  perf-profile.children.cycles-pp.pwq_dec_nr_in_flight
      1.15 ± 15%      -0.0        1.10 ± 26%      +0.9        2.08 ± 23%  perf-profile.children.cycles-pp.sbitmap_find_bit
      0.95 ± 15%      -0.0        0.91 ± 21%      +0.3        1.30 ±  5%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      0.48 ± 17%      -0.0        0.44 ± 25%      -0.2        0.30 ± 19%  perf-profile.children.cycles-pp.btrfs_unlock_up_safe
      0.18 ± 30%      -0.0        0.15 ± 29%      -0.1        0.07 ± 71%  perf-profile.children.cycles-pp.node_activate_pending_pwq
      0.57 ± 26%      -0.0        0.55 ± 27%      +0.5        1.10 ± 33%  perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
      1.03 ± 20%      -0.0        1.01 ± 24%      +1.0        1.99 ± 25%  perf-profile.children.cycles-pp.io_schedule
      0.57 ± 26%      -0.0        0.55 ± 27%      +0.5        1.11 ± 33%  perf-profile.children.cycles-pp.blk_mq_flush_plug_list
      0.58 ± 26%      -0.0        0.56 ± 27%      +0.5        1.11 ± 32%  perf-profile.children.cycles-pp.__blk_flush_plug
      0.89 ± 13%      -0.0        0.87 ± 21%      +0.4        1.25 ± 12%  perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
      0.09 ± 34%      -0.0        0.08 ± 55%      +0.1        0.19 ± 26%  perf-profile.children.cycles-pp.prepare_to_wait_exclusive
      1.65 ± 17%      -0.0        1.64 ± 26%      +0.8        2.41 ±  9%  perf-profile.children.cycles-pp._nohz_idle_balance
      0.46 ± 17%      -0.0        0.45 ± 23%      +0.2        0.69 ± 15%  perf-profile.children.cycles-pp.__get_next_timer_interrupt
      0.61 ± 14%      -0.0        0.61 ± 17%      +0.2        0.82 ± 11%  perf-profile.children.cycles-pp.dequeue_entity
      0.56 ± 14%      +0.0        0.57 ± 23%      +0.2        0.78 ± 11%  perf-profile.children.cycles-pp.update_load_avg
      0.37 ± 16%      +0.0        0.40 ± 33%      +0.2        0.53 ± 10%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.47 ± 15%      +0.1        0.52 ± 28%      +0.2        0.67 ± 11%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
     46.71 ±  8%      +0.7       47.37 ± 20%     -13.4       33.31 ±  4%  perf-profile.children.cycles-pp.kthread
     46.73 ±  8%      +0.7       47.40 ± 20%     -13.4       33.35 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
     46.74 ±  8%      +0.7       47.41 ± 20%     -13.4       33.36 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
     31.82 ± 13%      +0.9       32.73 ± 36%     -14.2       17.66 ± 12%  perf-profile.children.cycles-pp.btrfs_work_helper
     31.74 ± 13%      +0.9       32.66 ± 36%     -14.2       17.59 ± 12%  perf-profile.children.cycles-pp.btrfs_finish_one_ordered
      2.44 ± 12%      +3.7        6.16 ±239%      -0.9        1.51 ± 18%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     11.82 ± 24%      -1.4       10.42 ± 32%      -7.0        4.85 ± 21%  perf-profile.self.cycles-pp.osq_lock
      5.31 ± 13%      -0.5        4.82 ± 22%      -2.9        2.44 ± 14%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
      2.59 ± 10%      -0.3        2.33 ± 22%      -0.9        1.73 ± 23%  perf-profile.self.cycles-pp.__memmove
      1.09 ± 14%      -0.1        0.97 ± 21%      -0.5        0.60 ± 27%  perf-profile.self.cycles-pp.btrfs_set_token_32
      1.77 ± 12%      -0.1        1.66 ± 17%      +0.8        2.55 ± 14%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.23 ± 22%      -0.0        0.19 ± 27%      -0.1        0.10 ± 44%  perf-profile.self.cycles-pp.__set_extent_bit
      0.41 ± 25%      -0.0        0.38 ± 23%      -0.2        0.21 ± 30%  perf-profile.self.cycles-pp.btrfs_search_slot
      0.34 ± 29%      -0.0        0.30 ± 24%      -0.1        0.20 ± 27%  perf-profile.self.cycles-pp.btrfs_get_32
      0.09 ± 40%      -0.0        0.08 ± 59%      +0.1        0.17 ± 33%  perf-profile.self.cycles-pp.fold_diff
      0.25 ± 29%      -0.0        0.25 ± 26%      +0.2        0.46 ± 26%  perf-profile.self.cycles-pp.ahci_single_level_irq_intr
      2.44 ± 12%      +3.7        6.14 ±239%      -0.9        1.50 ± 18%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-08 10:39                 ` Niklas Cassel
  2025-01-10  6:53                   ` Oliver Sang
@ 2025-01-14  6:45                   ` Oliver Sang
  1 sibling, 0 replies; 11+ messages in thread
From: Oliver Sang @ 2025-01-14  6:45 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio, oliver.sang

hi, Niklas,

On Wed, Jan 08, 2025 at 11:39:28AM +0100, Niklas Cassel wrote:
> On Tue, Jan 07, 2025 at 04:27:44PM +0800, Oliver Sang wrote:
> > hi, Niklas,
> > 
> > On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> > > On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> > > > On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > > > > please let us know. thanks a lot!
> > > > > > 
> > > > > > Yes, this looks like ahci.  Thanks a lot!
> > > > > 
> > > > > Did this ever get resolved?
> > > > > 
> > > > > I haven't seen a patch that seems to address this.
> > > > > 
> > > > > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > > > > reordering when issuing a batch of commands, my guess is that the problem
> > > > > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > > > > or block layer.
> > > > 
> > > > I started looking into this before the holidays.  blktrace shows perfectly
> > > > sequential writes without any reordering using ahci, directly on the
> > > > block device or using xfs and btrfs when using dd.  I also started
> > > > looking into what the test does and got as far as checking out the
> > > > stress-ng source tree and looking at stress-aiol.c.  AFAICS the default
> > > > submission does simple reads and writes using increasing offsets.
> > > > So if the test result isn't a fluke either the aio code does some
> > > > weird reordering or btrfs does.
> > > > 
> > > > Oliver, did the test also show any interesting results on non-btrfs
> > > > setups?
> > > > 
> > > 
> > > One thing that came to mind.
> > > Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> > > the I/O scheduler to BFQ for single-queue HDDs.
> > > 
> > > It could very well be the I/O scheduler that reorders.
> > > 
> > > Oliver, which I/O scheduler are you using?
> > > $ cat /sys/block/sdb/queue/scheduler 
> > > none mq-deadline kyber [bfq]
> > 
> > while our test was running:
> > 
> > # cat /sys/block/sdb/queue/scheduler
> > none [mq-deadline] kyber bfq
> 
> The stddev numbers you showed are all over the place, so are we certain
> that this is a regression caused by commit e70c301faece ("block:
> don't reorder requests in blk_add_rq_to_plug") ?
> 
> Do you know if the stddev has such big variation for this test even before
> the commit?
> 
> 
> If it is not too much to ask... It might be interesting to know if we see
> a regression when comparing before/after e70c301faece with scheduler none
> instead of mq-deadline.

we also finished the test with the none scheduler, running v6.12-rc4 and the
kernels before/after e70c301faece 15 times each. the data is not stable enough
under the none scheduler, so we can only say there is a regression trend when
comparing e70c301faece to its parent.
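For anyone reproducing the none-scheduler runs, the setup amounts to switching the scheduler on the disk under test and re-running the aiol stressor. A rough sketch follows (device name sdb is taken from earlier in the thread and the 60 second runtime from the job description; adjust to your setup):

```shell
# Switch the disk under test (sdb here, as earlier in the thread) to the
# "none" scheduler. Guarded so the script is a no-op without the device/root.
dev=sdb
sched="/sys/block/$dev/queue/scheduler"
if [ -w "$sched" ]; then
    echo none > "$sched"
    cat "$sched"    # should now show: [none] mq-deadline kyber bfq
else
    echo "cannot write $sched (missing device or not root)" >&2
fi

# Re-run the aiol stressor for 60 seconds, one worker per CPU, and print
# the ops/s metric that the tables in this thread compare.
if command -v stress-ng >/dev/null 2>&1; then
    stress-ng --aiol "$(nproc)" --timeout 60s --metrics-brief
fi
```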

=========================================================================================
compiler/cpufreq_governor/debug-setup/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/none-scheduler/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/aiol/stress-ng/60s

commit:
  v6.12-rc4
  a3396b9999 ("block: add a rq_list type")
  e70c301fae ("block: don't reorder requests in blk_add_rq_to_plug")

       v6.12-rc4 a3396b99990d8b4e5797e7b16fd e70c301faece15b618e54b613b1
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    114.62 ± 19%      -1.9%     112.49 ± 17%     -32.4%      77.47 ± 21%  stress-ng.aiol.ops_per_sec


raw data is as below:

v6.12-rc4/matrix.json:  "stress-ng.aiol.ops_per_sec": [
v6.12-rc4/matrix.json-    108.03,
v6.12-rc4/matrix.json-    108.4,
v6.12-rc4/matrix.json-    109.11,
v6.12-rc4/matrix.json-    109.58,
v6.12-rc4/matrix.json-    194.21,
v6.12-rc4/matrix.json-    111.53,
v6.12-rc4/matrix.json-    107.99,
v6.12-rc4/matrix.json-    115.29,
v6.12-rc4/matrix.json-    105.75,
v6.12-rc4/matrix.json-    113.62,
v6.12-rc4/matrix.json-    96.51,
v6.12-rc4/matrix.json-    110.53,
v6.12-rc4/matrix.json-    108.71,
v6.12-rc4/matrix.json-    98.06,
v6.12-rc4/matrix.json-    121.95
v6.12-rc4/matrix.json-  ],


a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json:  "stress-ng.aiol.ops_per_sec": [
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    116.65,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    106.51,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    119.23,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    108.91,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    111.79,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    111.81,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    114.94,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    99.49,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    106.13,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    124.99,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    174.15,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    92.65,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    113.05,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    75.97,
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-    111.05
a3396b99990d8b4e5797e7b16fdeb64c15ae97bb/matrix.json-  ],


e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json:  "stress-ng.aiol.ops_per_sec": [
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    85.2,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    72.6,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    73.49,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    69.03,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    66.9,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    90.24,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    66.88,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    71.53,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    56.86,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    63.49,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    97.99,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    69.28,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    58.52,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    114.23,
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-    105.79
e70c301faece15b618e54b613b1fd6ece3dd05b4/matrix.json-  ],


since I'm not sure whether it's valuable, I didn't list the full comparison
table here. if you want it, please let us know. thanks!
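As a sanity check, the summary line above can be recomputed from the raw samples; a small sketch:

```python
from statistics import mean, pstdev

# Raw stress-ng.aiol.ops_per_sec samples, copied from the matrix.json dumps above.
v6_12_rc4 = [108.03, 108.4, 109.11, 109.58, 194.21, 111.53, 107.99, 115.29,
             105.75, 113.62, 96.51, 110.53, 108.71, 98.06, 121.95]
e70c301fae = [85.2, 72.6, 73.49, 69.03, 66.9, 90.24, 66.88, 71.53, 56.86,
              63.49, 97.99, 69.28, 58.52, 114.23, 105.79]

def summarize(samples):
    """Return (mean, %stddev) in the form the comparison tables use."""
    m = mean(samples)
    return m, 100 * pstdev(samples) / m

base_mean, base_sd = summarize(v6_12_rc4)
new_mean, new_sd = summarize(e70c301fae)
change = 100 * (new_mean - base_mean) / base_mean

print(f"{base_mean:.2f} +- {base_sd:.0f}%  ->  {new_mean:.2f} +- {new_sd:.0f}%  ({change:+.1f}%)")
```

This reproduces the 114.62 -> 77.47 (-32.4%) comparison above; the exact ± stddev figure may differ by a point depending on whether lkp uses population or sample stddev.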

> 
> 
> Kind regards,
> Niklas
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-10  6:53                   ` Oliver Sang
@ 2025-01-15 11:42                     ` Niklas Cassel
  2025-01-16  6:37                       ` Oliver Sang
  0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2025-01-15 11:42 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio

Hello Oliver,

On Fri, Jan 10, 2025 at 02:53:08PM +0800, Oliver Sang wrote:
> On Wed, Jan 08, 2025 at 11:39:28AM +0100, Niklas Cassel wrote:
> > > > Oliver, which I/O scheduler are you using?
> > > > $ cat /sys/block/sdb/queue/scheduler 
> > > > none mq-deadline kyber [bfq]
> > > 
> > > while our test was running:
> > > 
> > > # cat /sys/block/sdb/queue/scheduler
> > > none [mq-deadline] kyber bfq
> > 
> > The stddev numbers you showed are all over the place, so are we certain
> > that this is a regression caused by commit e70c301faece ("block:
> > don't reorder requests in blk_add_rq_to_plug") ?
> > 
> > Do you know if the stddev has such big variation for this test even before
> > the commit?
> 
> in order to address your concern, we rebuilt kernels for e70c301fae and its
> parent a3396b9999, and also for v6.12-rc4. the config is still the same as
> shared in our original report:
> https://download.01.org/0day-ci/archive/20241212/202412122112.ca47bcec-lkp@intel.com/config-6.12.0-rc4-00120-ge70c301faece

Thank you for putting in the work to do some extra tests.

(Doing performance regression testing is really important IMO,
as without it you are essentially flying blind.
Thank you guys for taking on this important work!)


Looking at the extended number of iterations that you've included in this email,
it is quite clear that e70c301faece, at least with the workload provided
by stress-ng + mq-deadline, introduced a regression:

       v6.12-rc4 a3396b99990d8b4e5797e7b16fd e70c301faece15b618e54b613b1
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    187.64 ±  5%      -0.6%     186.48 ±  7%     -47.6%      98.29 ± 17%  stress-ng.aiol.ops_per_sec




Looking at your results from stress-ng + none scheduler:

         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    114.62 ± 19%      -1.9%     112.49 ± 17%     -32.4%      77.47 ± 21%  stress-ng.aiol.ops_per_sec


This shows a change of -32% rather than -47%, which also seems to suggest a
regression for the stress-ng workload.




Looking closer at the raw numbers for stress-ng + none scheduler in your
other email, it seems clear that the raw values from the stress-ng workload
can vary quite a lot. In the long run, I wonder if we perhaps can find a
workload that has less variation. E.g. fio test for IOPS and fio test for
throughput. But perhaps such workloads are already part of lkp-tests?


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-15 11:42                     ` Niklas Cassel
@ 2025-01-16  6:37                       ` Oliver Sang
  2025-01-16 10:04                         ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Oliver Sang @ 2025-01-16  6:37 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio, oliver.sang

hi, Niklas,

On Wed, Jan 15, 2025 at 12:42:33PM +0100, Niklas Cassel wrote:
> Hello Oliver,
> 
> On Fri, Jan 10, 2025 at 02:53:08PM +0800, Oliver Sang wrote:
> > On Wed, Jan 08, 2025 at 11:39:28AM +0100, Niklas Cassel wrote:
> > > > > Oliver, which I/O scheduler are you using?
> > > > > $ cat /sys/block/sdb/queue/scheduler 
> > > > > none mq-deadline kyber [bfq]
> > > > 
> > > > while our test was running:
> > > > 
> > > > # cat /sys/block/sdb/queue/scheduler
> > > > none [mq-deadline] kyber bfq
> > > 
> > > The stddev numbers you showed are all over the place, so are we certain
> > > that this is a regression caused by commit e70c301faece ("block:
> > > don't reorder requests in blk_add_rq_to_plug") ?
> > > 
> > > Do you know if the stddev has such big variation for this test even before
> > > the commit?
> > 
> > in order to address your concern, we rebuilt kernels for e70c301fae and its
> > parent a3396b9999, and also for v6.12-rc4. the config is still the same as
> > shared in our original report:
> > https://download.01.org/0day-ci/archive/20241212/202412122112.ca47bcec-lkp@intel.com/config-6.12.0-rc4-00120-ge70c301faece
> 
> Thank you for putting in the work to do some extra tests.
> 
> (Doing performance regression testing is really important IMO,
> as without it you are essentially flying blind.
> Thank you guys for taking on this important work!)
> 
> 
> Looking at the extended number of iterations that you've included in this email,
> it is quite clear that e70c301faece, at least with the workload provided
> by stress-ng + mq-deadline, introduced a regression:
> 
>        v6.12-rc4 a3396b99990d8b4e5797e7b16fd e70c301faece15b618e54b613b1
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     187.64 ±  5%      -0.6%     186.48 ±  7%     -47.6%      98.29 ± 17%  stress-ng.aiol.ops_per_sec
> 
> 
> 
> 
> Looking at your results from stress-ng + none scheduler:
> 
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     114.62 ± 19%      -1.9%     112.49 ± 17%     -32.4%      77.47 ± 21%  stress-ng.aiol.ops_per_sec
> 
> 
> This shows a change of -32% rather than -47%, which also seems to suggest a
> regression for the stress-ng workload.
> 
> 
> 
> 
> Looking closer at the raw numbers for stress-ng + none scheduler in your
> other email, it seems clear that the raw values from the stress-ng workload
> can vary quite a lot. In the long run, I wonder if we perhaps can find a
> workload that has less variation. E.g. fio test for IOPS and fio test for
> throughput. But perhaps such workloads are already part of lkp-tests?

yes, we have fio tests [1].
as shown in [2], we get fio from https://github.com/axboe/fio
not sure if that's the fio you mentioned?

our framework is basically automatic. the bot merges the repos/branches it
monitors into a so-called hourly kernel, and if a performance difference from
the base is found, a bisect is triggered to capture which commit causes the change.

due to resource constraints, we cannot allot all test suites (we have around 80)
to all platforms, and there are various other reasons that could cause us to
miss some performance differences.

if you are interested, could you help check the fio-basic-*.yaml files under
[3]? if you can spot the right case, we could do more tests to compare
e70c301fae and its parent. thanks!

[1] https://github.com/intel/lkp-tests/tree/master/programs/fio
[2] https://github.com/intel/lkp-tests/blob/master/programs/fio/pkg/PKGBUILD
[3] https://github.com/intel/lkp-tests/tree/master/jobs
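For completeness, a rough sketch of running one of those jobs locally with lkp-tests follows. The sub-command names are recalled from the lkp-tests README and should be treated as assumptions -- verify them against your checkout:

```shell
# Sketch of running one of the fio-basic jobs locally with lkp-tests.
# Guarded so the script is a no-op if lkp-tests is not installed.
if command -v lkp >/dev/null 2>&1; then
    lkp split-job jobs/fio-basic.yaml    # expand the yaml matrix into concrete jobs
    lkp install ./fio-basic-*.yaml       # install fio and other job dependencies
    lkp run ./fio-basic-*.yaml           # execute one concrete job
else
    echo "lkp-tests not installed; see https://github.com/intel/lkp-tests" >&2
fi
```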

> 
> 
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linus:master] [block]  e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression
  2025-01-16  6:37                       ` Oliver Sang
@ 2025-01-16 10:04                         ` Niklas Cassel
  0 siblings, 0 replies; 11+ messages in thread
From: Niklas Cassel @ 2025-01-16 10:04 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Christoph Hellwig, oe-lkp, lkp, linux-kernel, Jens Axboe,
	linux-block, virtualization, linux-nvme, Damien Le Moal,
	linux-btrfs, linux-aio

On Thu, Jan 16, 2025 at 02:37:08PM +0800, Oliver Sang wrote:
> On Wed, Jan 15, 2025 at 12:42:33PM +0100, Niklas Cassel wrote:
> > 
> > Looking closer at the raw number for stress-ng + none scheduler, in your
> > other email, it seems clear that the raw values from the stress-ng workload
> > can vary quite a lot. In the long run, I wonder if we perhaps can find a
> > workload that has less variation. E.g. a fio test for IOPS and a fio test for
> > throughput. But perhaps such workloads are already part of lkp-tests?
> 
> Yes, we have fio tests [1].
> As in [2], we get fio from https://github.com/axboe/fio.
> Not sure if that is the same fio you mentioned?

Yes, that's the one :)


> 
> Our framework is basically automatic. A bot merges the repos/branches it
> monitors into a so-called hourly kernel; if a performance difference from the
> base kernel is found, a bisect is triggered to capture which commit caused
> the change.
> 
> Due to resource constraints, we cannot run all test suites (we have around 80)
> on all platforms, and there are various other reasons why we could miss some
> performance differences.
> 
> If you are interested, could you help check the fio-basic-*.yaml files under
> [3]? If you can spot the right case, we could run more tests comparing
> e70c301fae and its parent. Thanks!
> 
> [1] https://github.com/intel/lkp-tests/tree/master/programs/fio
> [2] https://github.com/intel/lkp-tests/blob/master/programs/fio/pkg/PKGBUILD
> [3] https://github.com/intel/lkp-tests/tree/master/jobs

I'm probably not the best qualified person to review this; it would be nice
if e.g. Jens himself (or other block layer folks) could have a look at these.

What I can see is:
https://github.com/intel/lkp-tests/blob/master/jobs/fio-basic-local-disk.yaml

seems to do:
    - randrw

but only for SSDs, not HDDs, and only on ext4.



https://github.com/intel/lkp-tests/blob/master/jobs/fio-basic-1hdd-write.yaml

does test ext4, btrfs, and xfs,
but it does not do randrw.
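For reference, a job covering that gap might look something like the sketch
below. This is a hedged illustration, not taken from the lkp-tests job files;
the mount point, size, and iodepth are assumptions to be adjusted per machine.

```shell
# Hypothetical fio job exercising randrw on a filesystem (ext4/btrfs/xfs
# would each be mounted at /mnt/test in turn) -- roughly the combination
# missing from the fio-basic-* jobs.
cat > randrw-fs.fio << 'EOF'
[global]
ioengine=libaio
direct=1
rw=randrw
rwmixread=70
bs=4k
iodepth=32
runtime=60
time_based=1

[randrw-job]
directory=/mnt/test
size=4G
EOF

fio --output-format=json randrw-fs.fio
```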


What are the thresholds for these tests counting as a regression?
Are you comparing BW, or IOPS, or both?

Looking at:
https://github.com/intel/lkp-tests/blob/master/programs/fio/parse

It seems to produce points for:
bw_MBps
iops
total_ios
clat_mean_ns
clat_stddev
slat_mean_us
slat_stddev
and more.

So it does seem to compare BW, IOPS, total IOs, which is what I was looking
for.

Possibly even too much: enabling extensive logging will itself affect the
results, since far more output has to be written to the logs.
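If measurement overhead is a concern, fio itself can shed most of the
per-I/O accounting cost. A sketch, assuming an otherwise ordinary job
(the target device here is a placeholder):

```shell
# gtod_reduce=1 disables the latency (slat/clat/lat) measurements and most
# of fio's time-keeping calls, leaving bandwidth and IOPS accounting
# intact -- useful when latency logging would perturb the result.
fio --name=randread --gtod_reduce=1 --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based=1 \
    --filename=/dev/sdb --readonly
```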

But again, Jens (and other block layer folks) are the experts.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-01-16 10:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <202412122112.ca47bcec-lkp@intel.com>
     [not found] ` <20241213143224.GA16111@lst.de>
     [not found]   ` <20241217045527.GA16091@lst.de>
     [not found]     ` <Z2EgW8/WNfzZ28mn@xsang-OptiPlex-9020>
     [not found]       ` <20241217065614.GA19113@lst.de>
     [not found]         ` <Z3ZhNYHKZPMpv8Cz@ryzen>
2025-01-03  6:49           ` [linus:master] [block] e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression Christoph Hellwig
2025-01-03  9:09             ` Niklas Cassel
2025-01-06  7:21               ` Christoph Hellwig
2025-01-07  8:27               ` Oliver Sang
2025-01-08 10:39                 ` Niklas Cassel
2025-01-10  6:53                   ` Oliver Sang
2025-01-15 11:42                     ` Niklas Cassel
2025-01-16  6:37                       ` Oliver Sang
2025-01-16 10:04                         ` Niklas Cassel
2025-01-14  6:45                   ` Oliver Sang
2025-01-07  8:26             ` Oliver Sang
