From: kernel test robot <oliver.sang@intel.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<io-uring@vger.kernel.org>, <oliver.sang@intel.com>
Subject: [axboe-block:io_uring-defer-tw.4] [io_uring]  61a5e20297: stress-ng.io-uring.ops_per_sec 41.9% regression
Date: Tue, 1 Jul 2025 12:47:55 +0800
Message-ID: <202507010550.2d6f83ea-lkp@intel.com>



Hello,

kernel test robot noticed a 41.9% regression of stress-ng.io-uring.ops_per_sec on:


commit: 61a5e202971d4a242fc761728e89922edde02d38 ("io_uring: switch defer task_work to using a ring")
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git io_uring-defer-tw.4

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: io-uring
	cpufreq_governor: performance


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202507010550.2d6f83ea-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250701/202507010550.2d6f83ea-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/io-uring/stress-ng/60s

commit: 
  8559f3b41f ("io_uring: make task_work pending check dependent on ring type")
  61a5e20297 ("io_uring: switch defer task_work to using a ring")

8559f3b41fdcdd01 61a5e202971d4a242fc761728e8 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1022268 ±  2%     -30.4%     711175        meminfo.Mapped
 7.478e+09           +30.1%  9.727e+09        cpuidle..time
  3.03e+08           -20.9%  2.398e+08 ±  3%  cpuidle..usage
    696425 ±171%    +181.2%    1958387 ± 81%  numa-meminfo.node0.Unevictable
    940879 ± 10%     -32.9%     631792 ± 14%  numa-meminfo.node1.Mapped
     43.50 ± 20%     -73.9%      11.33 ± 53%  perf-c2c.DRAM.local
     32749 ± 10%     -86.3%       4475 ± 29%  perf-c2c.HITM.local
     33251 ± 10%     -85.0%       4989 ± 25%  perf-c2c.HITM.total
  14632245 ±  9%     -38.5%    8999074 ±  7%  numa-numastat.node0.local_node
  14749610 ±  9%     -38.6%    9056826 ±  6%  numa-numastat.node0.numa_hit
  21106190 ±  4%     -37.5%   13198942 ±  4%  numa-numastat.node1.local_node
  21186924 ±  4%     -37.0%   13339356 ±  4%  numa-numastat.node1.numa_hit
     43.02 ±  2%     -12.5%      37.66 ±  2%  vmstat.cpu.id
     19.87          +121.8%      44.07 ±  2%  vmstat.cpu.wa
     73.14          +101.7%     147.54 ±  2%  vmstat.procs.b
    112.33 ±  2%     -64.8%      39.60 ±  7%  vmstat.procs.r
  12695197           -38.2%    7849636 ±  4%  vmstat.system.cs
   5179340 ±  2%     -24.4%    3915343 ±  4%  vmstat.system.in
    174059 ±171%    +181.3%     489607 ± 81%  numa-vmstat.node0.nr_unevictable
    174060 ±171%    +181.3%     489607 ± 81%  numa-vmstat.node0.nr_zone_unevictable
  14750003 ±  9%     -38.6%    9057006 ±  6%  numa-vmstat.node0.numa_hit
  14632638 ±  9%     -38.5%    8999253 ±  7%  numa-vmstat.node0.numa_local
    236391 ± 10%     -33.3%     157713 ± 14%  numa-vmstat.node1.nr_mapped
  21186186 ±  4%     -37.0%   13338387 ±  4%  numa-vmstat.node1.numa_hit
  21105453 ±  4%     -37.5%   13197958 ±  4%  numa-vmstat.node1.numa_local
     41.57            -5.8       35.76 ±  2%  mpstat.cpu.all.idle%
     20.32           +25.1       45.43 ±  2%  mpstat.cpu.all.iowait%
      6.25 ±  4%      -2.2        4.09 ±  6%  mpstat.cpu.all.irq%
      0.34 ±  4%      -0.2        0.14 ±  6%  mpstat.cpu.all.soft%
     28.91           -15.5       13.40 ±  6%  mpstat.cpu.all.sys%
      2.62            -1.4        1.17 ±  6%  mpstat.cpu.all.usr%
     18.83 ±  5%     -84.1%       3.00        mpstat.max_utilization.seconds
     61.41           -30.1%      42.94        mpstat.max_utilization_pct
 3.455e+08           -41.9%  2.006e+08 ±  4%  stress-ng.io-uring.ops
   5758736           -41.9%    3343243 ±  4%  stress-ng.io-uring.ops_per_sec
  63485668           -85.7%    9052788 ± 15%  stress-ng.time.involuntary_context_switches
     86971            -2.2%      85030        stress-ng.time.minor_page_faults
      6021           -54.8%       2724 ±  6%  stress-ng.time.percent_of_cpu_this_job_got
      3383           -53.8%       1562 ±  6%  stress-ng.time.system_time
    248.17           -67.3%      81.18 ±  9%  stress-ng.time.user_time
 4.227e+08           -40.1%  2.531e+08 ±  4%  stress-ng.time.voluntary_context_switches
   2888857 ±  2%      -8.1%    2654260        proc-vmstat.nr_active_anon
    302955            -3.1%     293576        proc-vmstat.nr_anon_pages
   3475920 ±  2%      -6.5%    3250878        proc-vmstat.nr_file_pages
     44207            -3.1%      42858        proc-vmstat.nr_kernel_stack
    255933 ±  3%     -30.6%     177546        proc-vmstat.nr_mapped
   2586684 ±  3%      -8.7%    2361525        proc-vmstat.nr_shmem
     43152            -1.5%      42518        proc-vmstat.nr_slab_reclaimable
   2888857 ±  2%      -8.1%    2654260        proc-vmstat.nr_zone_active_anon
  35939101           -37.7%   22399100 ±  3%  proc-vmstat.numa_hit
  35741003           -37.9%   22200912 ±  3%  proc-vmstat.numa_local
    585759 ±  5%     -27.5%     424436 ±  8%  proc-vmstat.numa_pte_updates
  36196152           -37.5%   22624491 ±  3%  proc-vmstat.pgalloc_normal
    700860 ±  3%      -7.0%     651538 ±  4%  proc-vmstat.pgfault
  32134448           -41.1%   18939637 ±  4%  proc-vmstat.pgfree
  16707904           -77.5%    3755057 ± 10%  proc-vmstat.unevictable_pgs_culled
      0.17 ±  4%     +94.3%       0.32 ± 16%  perf-stat.i.MPKI
 2.698e+10           -40.1%  1.616e+10 ±  4%  perf-stat.i.branch-instructions
      0.92            -0.3        0.64        perf-stat.i.branch-miss-rate%
 2.173e+08           -57.1%   93142321 ±  5%  perf-stat.i.branch-misses
      2.25 ±  4%      +6.4        8.67 ± 17%  perf-stat.i.cache-miss-rate%
 1.262e+09           -68.8%   3.94e+08 ±  6%  perf-stat.i.cache-references
  13218006           -37.6%    8252620 ±  4%  perf-stat.i.context-switches
      3.40            -7.5%       3.15 ±  3%  perf-stat.i.cpi
 4.003e+11           -40.4%  2.384e+11 ±  5%  perf-stat.i.cpu-cycles
   5382764           -76.2%    1281759 ± 10%  perf-stat.i.cpu-migrations
     32980 ±  5%     -25.9%      24437 ±  9%  perf-stat.i.cycles-between-cache-misses
 1.327e+11           -39.9%  7.973e+10 ±  4%  perf-stat.i.instructions
      0.33            +9.9%       0.36 ±  3%  perf-stat.i.ipc
     96.88           -48.8%      49.64 ±  4%  perf-stat.i.metric.K/sec
      8872 ±  4%     -11.6%       7844 ±  4%  perf-stat.i.minor-faults
      8872 ±  4%     -11.6%       7844 ±  4%  perf-stat.i.page-faults
      0.18 ±  3%     +61.7%       0.29 ±  8%  perf-stat.overall.MPKI
      0.81            -0.2        0.58        perf-stat.overall.branch-miss-rate%
      1.88 ±  3%      +4.0        5.86 ±  9%  perf-stat.overall.cache-miss-rate%
     16903 ±  3%     -38.3%      10426 ±  9%  perf-stat.overall.cycles-between-cache-misses
 2.655e+10           -40.1%   1.59e+10 ±  4%  perf-stat.ps.branch-instructions
 2.138e+08           -57.2%   91585587 ±  5%  perf-stat.ps.branch-misses
 1.241e+09           -68.8%  3.875e+08 ±  6%  perf-stat.ps.cache-references
  13003285           -37.6%    8120099 ±  4%  perf-stat.ps.context-switches
 3.938e+11           -40.5%  2.345e+11 ±  5%  perf-stat.ps.cpu-cycles
   5295095           -76.2%    1259803 ± 10%  perf-stat.ps.cpu-migrations
 1.306e+11           -39.9%  7.846e+10 ±  4%  perf-stat.ps.instructions
      8714 ±  4%     -11.7%       7694 ±  4%  perf-stat.ps.minor-faults
      8714 ±  4%     -11.7%       7694 ±  4%  perf-stat.ps.page-faults
 8.049e+12           -40.0%  4.829e+12 ±  4%  perf-stat.total.instructions
    879267 ±  3%     -77.4%     198767 ± 46%  sched_debug.cfs_rq:/.avg_vruntime.avg
   2197261 ±  7%     -80.3%     433455 ± 40%  sched_debug.cfs_rq:/.avg_vruntime.max
    702597 ±  3%     -82.0%     126663 ± 48%  sched_debug.cfs_rq:/.avg_vruntime.min
    144651 ±  9%     -75.7%      35081 ± 36%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.38 ±  7%     -79.6%       0.08 ± 20%  sched_debug.cfs_rq:/.h_nr_queued.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cfs_rq:/.h_nr_queued.max
      0.61 ±  4%     -57.2%       0.26 ±  9%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.34 ±  6%     -77.5%       0.08 ± 19%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cfs_rq:/.h_nr_runnable.max
      0.56 ±  5%     -53.3%       0.26 ±  9%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    115895 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.left_deadline.avg
   1148129 ± 31%     -77.7%     255916 ± 52%  sched_debug.cfs_rq:/.left_deadline.max
    300169 ±  8%     -87.0%      39025 ± 54%  sched_debug.cfs_rq:/.left_deadline.stddev
    115876 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.left_vruntime.avg
   1147975 ± 31%     -77.7%     255883 ± 52%  sched_debug.cfs_rq:/.left_vruntime.max
    300120 ±  8%     -87.0%      39021 ± 54%  sched_debug.cfs_rq:/.left_vruntime.stddev
      2.08 ± 16%    -100.0%       0.00        sched_debug.cfs_rq:/.load_avg.min
    879267 ±  3%     -77.4%     198767 ± 46%  sched_debug.cfs_rq:/.min_vruntime.avg
   2197261 ±  7%     -80.3%     433455 ± 40%  sched_debug.cfs_rq:/.min_vruntime.max
    702597 ±  3%     -82.0%     126663 ± 48%  sched_debug.cfs_rq:/.min_vruntime.min
    144651 ±  9%     -75.7%      35081 ± 36%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.24 ±  5%     -67.9%       0.08 ± 19%  sched_debug.cfs_rq:/.nr_queued.avg
      0.36           -27.4%       0.26 ±  9%  sched_debug.cfs_rq:/.nr_queued.stddev
    115876 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.right_vruntime.avg
   1147975 ± 31%     -77.7%     255883 ± 52%  sched_debug.cfs_rq:/.right_vruntime.max
    300120 ±  8%     -87.0%      39021 ± 54%  sched_debug.cfs_rq:/.right_vruntime.stddev
    293.31 ±  2%     -61.0%     114.35 ± 10%  sched_debug.cfs_rq:/.runnable_avg.avg
    114.75 ±  6%    -100.0%       0.00        sched_debug.cfs_rq:/.runnable_avg.min
    161.40 ±  3%     +16.8%     188.44 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
    243.06 ±  2%     -53.0%     114.20 ± 10%  sched_debug.cfs_rq:/.util_avg.avg
    111.42 ±  5%    -100.0%       0.00        sched_debug.cfs_rq:/.util_avg.min
    143.53 ±  4%     +31.2%     188.36 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
     45.14 ±  5%     -53.8%      20.87 ± 29%  sched_debug.cfs_rq:/.util_est.avg
    117.16 ±  9%     -23.3%      89.81 ± 15%  sched_debug.cfs_rq:/.util_est.stddev
    460889           +78.9%     824600 ±  4%  sched_debug.cpu.avg_idle.avg
    545161 ±  4%     +83.4%    1000000        sched_debug.cpu.avg_idle.max
      7815 ±  7%     -47.7%       4084 ± 14%  sched_debug.cpu.avg_idle.min
     96234 ±  8%    +192.4%     281404 ± 13%  sched_debug.cpu.avg_idle.stddev
    754.64 ±  5%     -19.2%     609.61 ±  9%  sched_debug.cpu.clock_task.stddev
      1016 ±  7%     -74.3%     261.72 ± 25%  sched_debug.cpu.curr->pid.avg
      1648           -37.6%       1027 ± 14%  sched_debug.cpu.curr->pid.stddev
      0.00 ± 24%     -27.7%       0.00 ± 10%  sched_debug.cpu.next_balance.stddev
      0.35 ± 10%     -82.5%       0.06 ± 20%  sched_debug.cpu.nr_running.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cpu.nr_running.max
      0.60 ±  6%     -61.0%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
   2060126           -47.9%    1073009 ± 44%  sched_debug.cpu.nr_switches.avg
   2688437           -31.6%    1839609 ± 44%  sched_debug.cpu.nr_switches.max
    650892 ±  9%     -43.0%     370926 ± 54%  sched_debug.cpu.nr_switches.min
    522908 ±  2%     -49.9%     261974 ± 45%  sched_debug.cpu.nr_switches.stddev
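
For readers checking the table by hand, the following is a minimal sketch of how
the %change column appears to relate to the two value columns. It is not taken
from the lkp-tests sources; the helper names pct_change and point_change are
hypothetical. It assumes that ordinary rows report a relative change against the
parent commit, and that rows whose metric name ends in "%" (the mpstat.cpu.all.*
and *-rate% rows) report an absolute difference in percentage points. The table
values are averages over several runs, so recomputing from the rounded figures
may differ in the last digit for some rows.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent, one decimal."""
    return round((new - base) / base * 100.0, 1)


def point_change(base: float, new: float) -> float:
    """Absolute difference in percentage points, one decimal.

    Assumed to apply to rows whose metric is already a percentage.
    """
    return round(new - base, 1)


if __name__ == "__main__":
    # stress-ng.io-uring.ops_per_sec: parent 8559f3b41f vs. 61a5e20297
    print("ops_per_sec  %+.1f%%" % pct_change(5758736, 3343243))
    # vmstat.cpu.wa
    print("vmstat.wa    %+.1f%%" % pct_change(19.87, 44.07))
    # mpstat.cpu.all.iowait% (treated as a point difference)
    print("iowait%%      %+.1f" % point_change(20.32, 45.43))
```

Run against the headline row, this reproduces the -41.9% figure from the subject
line, and the iowait row reproduces the +25.1 shown in the table.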




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

