All of lore.kernel.org
 help / color / mirror / Atom feed
* [tip:sched/core] [sched/deadline]  cccb45d7c4: will-it-scale.per_thread_ops 36.7% regression
@ 2025-07-23  6:22 kernel test robot
  0 siblings, 0 replies; only message in thread
From: kernel test robot @ 2025-07-23  6:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oe-lkp, lkp, linux-kernel, x86, Chris Mason, Juri Lelli,
	aubrey.li, yu.c.chen, oliver.sang



Hello,

we found various performance regressions/improvements by this commit in
different micro benckmarks. FYI.


kernel test robot noticed a 36.7% regression of will-it-scale.per_thread_ops on:


commit: cccb45d7c4295bbfeba616582d0249f2d21e6df5 ("sched/deadline: Less agressive dl_server handling")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

[test failed on linux-next/master d086c886ceb9f59dea6c3a9dae7eb89e780a20c9]

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_task: 100%
	mode: thread
	test: mmap1
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.schedpolicy.ops_per_sec 8.6% improvement                                |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory    |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=schedpolicy                                                                             |
|                  | testtime=60s                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.netdev.ops_per_sec  61.6% regression                                    |
| test machine     | 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory  |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=netdev                                                                                  |
|                  | testtime=60s                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps 7.7% improvement                                            |
| test machine     | 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory  |
| test parameters  | cluster=cs-localhost                                                                         |
|                  | cpufreq_governor=performance                                                                 |
|                  | ip=ipv4                                                                                      |
|                  | nr_threads=200%                                                                              |
|                  | runtime=300s                                                                                 |
|                  | test=TCP_MAERTS                                                                              |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops  60.6% regression                               |
| test machine     | 256 threads 2 sockets Intel(R) Xeon(R) 6767P  CPU @ 2.4GHz (Granite Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | mode=process                                                                                 |
|                  | nr_task=100%                                                                                 |
|                  | test=pwrite3                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.timer.ops_per_sec 270.9% improvement                                    |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory  |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=timer                                                                                   |
|                  | testtime=60s                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.min-nanosleep.ops_per_sec 2.3% improvement                              |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory    |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=min-nanosleep                                                                           |
|                  | testtime=60s                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.switch.ops_per_sec 10.0% improvement                                    |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory  |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=switch                                                                                  |
|                  | testtime=60s                                                                                 |
+------------------+----------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202507230755.5fe8e03e-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250723/202507230755.5fe8e03e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/mmap1/will-it-scale

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     48929           +20.7%      59075 ±  2%  uptime.idle
 9.128e+09 ±  2%    +111.7%  1.933e+10 ±  7%  cpuidle..time
  19232045 ±  2%    +140.2%   46190322 ±  3%  cpuidle..usage
     96378 ± 38%     -64.6%      34164 ± 96%  numa-meminfo.node1.Mapped
    811324 ±  8%     -38.4%     499711 ± 10%  numa-meminfo.node1.Shmem
   1829489           -17.8%    1503275 ±  4%  meminfo.Active
   1829489           -17.8%    1503275 ±  4%  meminfo.Active(anon)
    138153           -37.3%      86582 ±  2%  meminfo.Mapped
   1041324           -30.6%     722167 ±  8%  meminfo.Shmem
    736805 ± 12%     -22.8%     568536 ± 13%  numa-numastat.node0.local_node
    842124 ±  7%     -21.1%     664384 ±  8%  numa-numastat.node0.numa_hit
    950209 ±  7%     -24.7%     715976 ± 12%  numa-numastat.node1.local_node
   1037055 ±  5%     -21.5%     814163 ±  9%  numa-numastat.node1.numa_hit
     16.03 ±  2%    +104.7%      32.81 ±  7%  vmstat.cpu.id
    153.43           -17.2%     127.05 ±  3%  vmstat.procs.r
    108965 ±  2%     +83.0%     199375 ±  3%  vmstat.system.cs
    270608           +12.1%     303291        vmstat.system.in
    123587 ±  2%     -36.7%      78248 ±  5%  will-it-scale.192.threads
     15.05 ±  2%    +115.1%      32.38 ±  7%  will-it-scale.192.threads_idle
    643.33 ±  2%     -36.7%     407.00 ±  5%  will-it-scale.per_thread_ops
    123587 ±  2%     -36.7%      78248 ±  5%  will-it-scale.workload
     15.53 ±  2%     +17.0       32.50 ±  7%  mpstat.cpu.all.idle%
      0.41 ±  2%      +0.1        0.49 ±  2%  mpstat.cpu.all.irq%
      0.04 ±  2%      +0.0        0.05 ±  4%  mpstat.cpu.all.soft%
     83.91           -17.0       66.90 ±  3%  mpstat.cpu.all.sys%
      0.12            -0.1        0.07 ±  5%  mpstat.cpu.all.usr%
    841246 ±  7%     -21.1%     663460 ±  8%  numa-vmstat.node0.numa_hit
    735926 ± 12%     -22.9%     567613 ± 13%  numa-vmstat.node0.numa_local
     24387 ± 38%     -64.3%       8695 ± 96%  numa-vmstat.node1.nr_mapped
    202834 ±  8%     -38.4%     124853 ± 10%  numa-vmstat.node1.nr_shmem
   1038433 ±  6%     -21.7%     813183 ±  9%  numa-vmstat.node1.numa_hit
    951586 ±  8%     -24.9%     714996 ± 12%  numa-vmstat.node1.numa_local
    457393           -17.8%     375837 ±  4%  proc-vmstat.nr_active_anon
   1145415            -7.0%    1065562        proc-vmstat.nr_file_pages
     34923           -37.1%      21983 ±  2%  proc-vmstat.nr_mapped
    260363           -30.7%     180507 ±  8%  proc-vmstat.nr_shmem
    457393           -17.8%     375837 ±  4%  proc-vmstat.nr_zone_active_anon
   1880864           -21.3%    1479470 ±  3%  proc-vmstat.numa_hit
   1688698           -23.9%    1285435 ±  4%  proc-vmstat.numa_local
   3234094 ±  2%     -37.5%    2021950 ±  6%  proc-vmstat.pgalloc_normal
   2859422 ±  2%     -38.6%    1755130 ±  6%  proc-vmstat.pgfree
      0.06           +63.3%       0.10 ±  8%  perf-stat.i.MPKI
 1.924e+10           -20.2%  1.536e+10 ±  4%  perf-stat.i.branch-instructions
      0.04            +0.1        0.10 ± 11%  perf-stat.i.branch-miss-rate%
   7139830 ±  2%      +8.5%    7746733        perf-stat.i.branch-misses
  22682520            -3.5%   21885670 ±  2%  perf-stat.i.cache-references
    109943 ±  2%     +84.4%     202695 ±  3%  perf-stat.i.context-switches
 5.202e+11           -19.9%  4.164e+11 ±  4%  perf-stat.i.cpu-cycles
    259.66            +3.7%     269.38        perf-stat.i.cpu-migrations
     97159           -18.2%      79505 ±  2%  perf-stat.i.cycles-between-cache-misses
 9.606e+10           -20.2%  7.665e+10 ±  4%  perf-stat.i.instructions
      0.19            +2.5%       0.19        perf-stat.i.ipc
      0.07 ± 26%     -73.4%       0.02 ±101%  perf-stat.i.major-faults
      0.06           +26.6%       0.07 ±  3%  perf-stat.overall.MPKI
      0.04            +0.0        0.05 ±  4%  perf-stat.overall.branch-miss-rate%
     23.94            +0.9       24.82        perf-stat.overall.cache-miss-rate%
     95666           -20.7%      75866 ±  3%  perf-stat.overall.cycles-between-cache-misses
 2.344e+08 ±  2%     +25.0%  2.931e+08 ±  4%  perf-stat.overall.path-length
 1.915e+10           -20.8%  1.516e+10 ±  3%  perf-stat.ps.branch-instructions
   7105457            +7.4%    7633217        perf-stat.ps.branch-misses
  22609904            -3.5%   21824261 ±  2%  perf-stat.ps.cache-references
    109413 ±  2%     +82.8%     200048 ±  3%  perf-stat.ps.context-switches
 5.178e+11           -20.6%   4.11e+11 ±  3%  perf-stat.ps.cpu-cycles
    258.60            +3.4%     267.48        perf-stat.ps.cpu-migrations
 9.563e+10           -20.9%  7.568e+10 ±  3%  perf-stat.ps.instructions
      0.06 ± 26%     -73.7%       0.02 ±101%  perf-stat.ps.major-faults
 2.895e+13           -20.9%   2.29e+13 ±  4%  perf-stat.total.instructions
  24168770           -48.0%   12569759 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.avg
  24397767           -42.1%   14114188 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.max
  23131295           -88.3%    2698143 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.min
    129414 ± 12%   +1811.9%    2474324 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     36961 ± 53%     -93.9%       2249 ±135%  sched_debug.cfs_rq:/.left_deadline.avg
   7096695 ± 53%     -93.9%     431832 ±135%  sched_debug.cfs_rq:/.left_deadline.max
    510824 ± 53%     -93.9%      31084 ±135%  sched_debug.cfs_rq:/.left_deadline.stddev
     36961 ± 53%     -93.9%       2249 ±135%  sched_debug.cfs_rq:/.left_vruntime.avg
   7096632 ± 53%     -93.9%     431793 ±135%  sched_debug.cfs_rq:/.left_vruntime.max
    510819 ± 53%     -93.9%      31081 ±135%  sched_debug.cfs_rq:/.left_vruntime.stddev
    347.47 ± 11%     -37.8%     216.06 ± 23%  sched_debug.cfs_rq:/.load_avg.max
      2.61 ±  4%     -54.3%       1.19 ± 14%  sched_debug.cfs_rq:/.load_avg.min
  24168770           -48.0%   12569758 ±  8%  sched_debug.cfs_rq:/.min_vruntime.avg
  24397767           -42.1%   14114188 ±  8%  sched_debug.cfs_rq:/.min_vruntime.max
  23131295           -88.3%    2698143 ±  7%  sched_debug.cfs_rq:/.min_vruntime.min
    129414 ± 12%   +1811.9%    2474324 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
     36961 ± 53%     -93.9%       2249 ±135%  sched_debug.cfs_rq:/.right_vruntime.avg
   7096633 ± 53%     -93.9%     431793 ±135%  sched_debug.cfs_rq:/.right_vruntime.max
    510819 ± 53%     -93.9%      31081 ±135%  sched_debug.cfs_rq:/.right_vruntime.stddev
    406.31 ±  7%     -72.5%     111.89 ±  6%  sched_debug.cfs_rq:/.runnable_avg.min
    123.61 ±  6%     +43.5%     177.42 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
    399.81 ±  7%     -72.4%     110.19 ±  6%  sched_debug.cfs_rq:/.util_avg.min
    108.91 ±  7%     +61.9%     176.29 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
    968.69 ±  4%      -4.9%     921.42 ±  3%  sched_debug.cfs_rq:/.util_est.max
   1356108 ±  4%     +15.5%    1566412 ±  4%  sched_debug.cpu.avg_idle.avg
     13.33 ±  4%     -10.9%      11.88 ±  6%  sched_debug.cpu.clock.stddev
     84211 ±  2%     +76.6%     148687 ±  3%  sched_debug.cpu.nr_switches.avg
    374937 ± 12%    +424.3%    1965825 ± 39%  sched_debug.cpu.nr_switches.max
     49563 ±  7%    +472.2%     283616 ± 16%  sched_debug.cpu.nr_switches.stddev
      0.06 ±137%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.01 ± 50%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.02 ± 27%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.01 ± 11%     -94.6%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.force_qs_rnp.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.03 ± 37%    +591.7%       0.21 ± 22%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.01 ± 43%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      0.01 ± 35%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.fifo_pipe_write.vfs_write
      0.12 ± 57%     -94.0%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.66 ± 35%     -99.3%       0.00 ± 16%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.04 ± 19%     -85.7%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.01 ± 38%     -87.8%       0.00        perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.08 ± 76%     -94.4%       0.00 ± 17%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.07 ± 42%     -85.5%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.01 ± 37%     -87.2%       0.00 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.30 ± 31%     -83.7%       0.05 ± 43%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.18 ± 32%     -97.0%       0.01 ±  9%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ± 20%     -59.0%       0.00 ±  8%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.02 ±  9%     -72.3%       0.00 ± 10%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ±  6%     -37.8%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±  4%     -47.0%       0.01 ± 10%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.29 ±165%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.02 ± 89%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.02 ± 14%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.01 ± 16%     -95.6%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.force_qs_rnp.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3.52 ±  8%     -77.6%       0.79 ± 27%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.01 ± 52%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      0.01 ± 93%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_write.fifo_pipe_write.vfs_write
      2.96 ± 32%     -89.3%       0.32 ± 72%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      1.75 ± 63%     -99.2%       0.01 ± 44%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.01 ± 35%     -45.9%       0.01 ± 39%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      1.78 ± 67%     -99.6%       0.01 ± 12%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3.05 ± 35%     -96.8%       0.10 ±185%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1.79 ± 48%     -84.7%       0.27 ± 90%  perf-sched.sch_delay.max.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.38 ± 85%     -98.2%       0.01 ± 21%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      2.04 ± 49%     -98.8%       0.03 ± 38%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.44 ± 83%     -94.8%       0.28 ± 93%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 73%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.15 ± 86%     -98.9%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.20 ±105%     -98.7%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.13 ± 61%     -95.9%       0.01 ±154%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      3.71 ±  2%     -78.6%       0.79 ± 24%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      2.98 ± 21%     -99.5%       0.02 ±  6%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.17 ± 17%     -99.0%       0.02 ± 31%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.54 ± 84%     -95.8%       0.02 ± 44%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     11.05 ±  4%     -28.7%       7.88        perf-sched.total_wait_and_delay.average.ms
    238376 ±  3%     +37.9%     328775 ±  3%  perf-sched.total_wait_and_delay.count.ms
      2293 ± 13%     +88.4%       4321 ±  5%  perf-sched.total_wait_and_delay.max.ms
     11.04 ±  4%     -28.7%       7.87        perf-sched.total_wait_time.average.ms
      2293 ± 13%     +88.4%       4321 ±  5%  perf-sched.total_wait_time.max.ms
      0.41 ±223%  +31852.7%     130.74 ± 43%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    107.76 ± 32%     +71.6%     184.90 ±  6%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     23.89 ± 20%     -43.6%      13.48 ± 11%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
     98.40 ± 10%     +63.9%     161.27 ± 27%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.39 ±  4%    +513.5%       2.40 ±  9%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.32 ±  3%    +637.8%       2.36 ±  9%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
    442.42 ±  3%     +39.0%     615.10 ±  2%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.17 ± 25%     -83.8%       1.00 ±152%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    403.50 ± 35%     -45.9%     218.33 ±  6%  perf-sched.wait_and_delay.count.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1234 ± 23%     +69.2%       2087 ±  9%  perf-sched.wait_and_delay.count.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    130.50 ± 10%     -35.5%      84.17 ± 20%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    100155 ±  4%     +57.8%     158055 ±  3%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
    126802 ±  3%     +28.3%     162742 ±  3%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      3720 ±  7%     -50.6%       1839 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1048 ±  4%     -23.6%     801.33 ±  8%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     58.08 ±223%   +4522.0%       2684 ± 41%  perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      2166 ± 20%     +95.2%       4229 ±  8%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1994 ±  5%     +98.4%       3957 ±  7%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      5.53 ±121%     -97.5%       0.14 ±149%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.12 ±128%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.02 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      4.39 ±  5%     -84.9%       0.66 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.force_qs_rnp.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3.04 ± 18%   +4190.5%     130.53 ± 44%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.01 ± 43%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.fifo_pipe_write.vfs_write
      0.74 ±148%     -99.5%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
      2.86 ± 79%    +420.4%      14.91 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    107.71 ± 32%     +71.7%     184.89 ±  6%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     23.89 ± 20%     -43.6%      13.48 ± 11%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.01 ± 59%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.04 ± 93%     -96.0%       0.00 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.02 ± 80%     -89.5%       0.00 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.11 ±163%     -95.4%       0.01 ±154%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
     98.10 ± 10%     +64.3%     161.22 ± 27%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.38 ±  3%    +524.3%       2.39 ±  9%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.31 ±  3%    +651.2%       2.35 ±  9%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      0.88 ± 10%     -32.7%       0.59        perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      4.45           -10.9%       3.97 ±  3%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    442.40 ±  3%     +39.0%     615.09 ±  2%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    224.31 ± 98%     -99.9%       0.14 ±149%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.45 ±136%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.02 ± 14%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      4.82 ±  7%     -86.2%       0.66 ±223%  perf-sched.wait_time.max.ms.__cond_resched.force_qs_rnp.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 52%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      7.36 ± 16%     -41.6%       4.29 ± 44%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.01 ± 93%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.anon_pipe_write.fifo_pipe_write.vfs_write
     12.34 ±136%    -100.0%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
     20.81 ± 11%     -20.3%      16.58 ±  5%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    836.03 ± 44%     -79.5%     171.53 ±216%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 73%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.86 ± 98%     -99.8%       0.00 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.75 ± 99%     -99.7%       0.00 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    167.42 ±223%     -99.9%       0.10 ±138%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      1.09 ±103%     -99.5%       0.01 ±154%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      4.85 ±  6%     -52.6%       2.30 ±  7%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      6.99 ±  8%     -28.4%       5.01        perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      2166 ± 20%     +95.2%       4228 ±  8%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    333.80 ±141%    -100.0%       0.02 ± 44%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      1994 ±  5%     +98.4%       3957 ±  7%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     57.59           -10.2       47.34        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64
     57.93            -9.5       48.45        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     57.94            -9.5       48.49        perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     58.16            -9.1       49.10        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     58.17            -9.0       49.13        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     58.17            -9.0       49.13        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     58.18            -9.0       49.15        perf-profile.calltrace.cycles-pp.__mmap
      0.00            +0.8        0.83 ± 17%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.00            +0.9        0.86 ± 17%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.9        0.90 ± 21%  perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.00            +1.1        1.08 ± 23%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.00            +1.1        1.08 ± 23%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.00            +1.1        1.08 ± 23%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +1.1        1.10 ± 23%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.00            +1.1        1.10 ± 23%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      0.00            +1.2        1.16 ± 23%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.00            +1.3        1.27 ± 22%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      0.00            +1.3        1.27 ± 22%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.00            +1.3        1.27 ± 22%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      0.00            +1.3        1.28 ± 22%  perf-profile.calltrace.cycles-pp.common_startup_64
     41.04            +6.4       47.39        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
     41.28            +7.2       48.50        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
     41.29            +7.2       48.54        perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     41.62            +7.8       49.41        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.62            +7.8       49.41        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.62            +7.8       49.43        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.62            +7.8       49.43        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     41.63            +7.8       49.45        perf-profile.calltrace.cycles-pp.__munmap
     58.16            -9.1       49.10        perf-profile.children.cycles-pp.vm_mmap_pgoff
     58.18            -9.0       49.15        perf-profile.children.cycles-pp.__mmap
     98.64            -3.9       94.75        perf-profile.children.cycles-pp.osq_lock
     99.22            -2.3       96.96        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     99.24            -2.2       97.02        perf-profile.children.cycles-pp.down_write_killable
     99.83            -1.2       98.63        perf-profile.children.cycles-pp.do_syscall_64
     99.83            -1.2       98.63        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.05            +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.get_jiffies_update
      0.05 ±  7%      +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.tmigr_requires_handle_remote
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.rcu_do_batch
      0.00            +0.1        0.06 ± 14%  perf-profile.children.cycles-pp.update_curr
      0.06            +0.1        0.12 ± 19%  perf-profile.children.cycles-pp.vma_expand
      0.00            +0.1        0.07 ± 15%  perf-profile.children.cycles-pp.__resched_curr
      0.00            +0.1        0.07 ± 13%  perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.00            +0.1        0.07 ± 14%  perf-profile.children.cycles-pp.schedule_idle
      0.06            +0.1        0.13 ± 12%  perf-profile.children.cycles-pp.kmem_cache_free
      0.00            +0.1        0.07 ± 12%  perf-profile.children.cycles-pp.mas_store_gfp
      0.00            +0.1        0.08 ± 16%  perf-profile.children.cycles-pp.rcu_core
      0.00            +0.1        0.08 ± 20%  perf-profile.children.cycles-pp.osq_unlock
      0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.12 ±  8%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.update_cfs_group
      0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.vma_complete
      0.00            +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.downgrade_write
      0.07            +0.1        0.15 ± 19%  perf-profile.children.cycles-pp.vma_merge_new_range
      0.00            +0.1        0.09 ± 18%  perf-profile.children.cycles-pp.mas_store_prealloc
      0.05            +0.1        0.14 ± 15%  perf-profile.children.cycles-pp.unmapped_area_topdown
      0.05            +0.1        0.14 ± 15%  perf-profile.children.cycles-pp.vm_unmapped_area
      0.00            +0.1        0.09 ± 12%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00            +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.mas_rev_awalk
      0.00            +0.1        0.10 ± 15%  perf-profile.children.cycles-pp.mas_empty_area_rev
      0.06            +0.1        0.16 ± 17%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.06            +0.1        0.16 ± 17%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      0.09 ±  4%      +0.1        0.19 ± 18%  perf-profile.children.cycles-pp.__vma_start_write
      0.06 ±  7%      +0.1        0.17 ± 16%  perf-profile.children.cycles-pp.__get_unmapped_area
      0.00            +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.00            +0.1        0.11 ± 18%  perf-profile.children.cycles-pp.sched_balance_newidle
      0.00            +0.1        0.11 ± 18%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.sched_balance_find_src_group
      0.00            +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.10            +0.1        0.22 ± 18%  perf-profile.children.cycles-pp.__mmap_region
      0.00            +0.1        0.12 ± 19%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.00            +0.1        0.13 ± 19%  perf-profile.children.cycles-pp.sched_balance_rq
      0.00            +0.1        0.14 ± 17%  perf-profile.children.cycles-pp.__pick_next_task
      0.13            +0.1        0.27 ± 15%  perf-profile.children.cycles-pp.__split_vma
      0.00            +0.2        0.15 ± 18%  perf-profile.children.cycles-pp.handle_softirqs
      0.02 ± 99%      +0.2        0.18 ± 16%  perf-profile.children.cycles-pp.clockevents_program_event
      0.00            +0.2        0.16 ± 17%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.00            +0.2        0.16 ± 17%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +0.2        0.16 ± 19%  perf-profile.children.cycles-pp.try_to_wake_up
      0.00            +0.2        0.16 ± 21%  perf-profile.children.cycles-pp.up_read
      0.00            +0.2        0.16 ± 18%  perf-profile.children.cycles-pp.wake_up_q
      0.00            +0.2        0.18 ±  8%  perf-profile.children.cycles-pp.update_load_avg
      0.00            +0.2        0.18 ± 28%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.00            +0.2        0.19 ± 22%  perf-profile.children.cycles-pp.up_write
      0.16 ±  5%      +0.2        0.36 ±  8%  perf-profile.children.cycles-pp.task_tick_fair
      0.01 ±223%      +0.2        0.22 ± 20%  perf-profile.children.cycles-pp.ktime_get
      0.16            +0.2        0.37 ± 15%  perf-profile.children.cycles-pp.vms_gather_munmap_vmas
      0.03 ± 70%      +0.2        0.26 ± 17%  perf-profile.children.cycles-pp.schedule
      0.00            +0.2        0.24 ± 15%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.17 ±  2%      +0.2        0.41 ± 16%  perf-profile.children.cycles-pp.do_mmap
      0.00            +0.2        0.24 ± 19%  perf-profile.children.cycles-pp.schedule_preempt_disabled
      0.00            +0.2        0.24 ± 16%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.04 ± 44%      +0.2        0.28 ± 16%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.04 ± 44%      +0.3        0.30 ± 21%  perf-profile.children.cycles-pp.rwsem_wake
      0.06 ±  6%      +0.3        0.32 ± 17%  perf-profile.children.cycles-pp.__schedule
      0.11 ±  4%      +0.3        0.38 ± 20%  perf-profile.children.cycles-pp.vms_complete_munmap_vmas
      0.20 ±  3%      +0.3        0.49 ±  4%  perf-profile.children.cycles-pp.sched_tick
      0.06 ±  7%      +0.3        0.38 ± 19%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.00            +0.4        0.38 ± 34%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.00            +0.4        0.39 ± 34%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.30 ±  2%      +0.4        0.70 ±  6%  perf-profile.children.cycles-pp.update_process_times
      0.32            +0.5        0.78 ±  8%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.33            +0.5        0.80 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.30            +0.5        0.83 ± 17%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.32            +0.6        0.86 ± 17%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.40 ±  2%      +0.7        1.06 ±  9%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.41            +0.7        1.07 ±  9%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.41            +0.7        1.13 ± 19%  perf-profile.children.cycles-pp.rwsem_spin_on_owner
      0.45            +0.8        1.27 ± 11%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.47            +0.9        1.39 ± 12%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.10 ±  5%      +1.0        1.08 ± 23%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      0.10 ±  5%      +1.0        1.08 ± 23%  perf-profile.children.cycles-pp.acpi_idle_enter
      0.10 ±  5%      +1.0        1.08 ± 23%  perf-profile.children.cycles-pp.acpi_safe_halt
      0.10 ±  5%      +1.0        1.08 ± 23%  perf-profile.children.cycles-pp.pv_native_safe_halt
      0.10 ±  5%      +1.0        1.10 ± 23%  perf-profile.children.cycles-pp.cpuidle_enter
      0.10 ±  5%      +1.0        1.10 ± 23%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.10 ±  4%      +1.1        1.17 ± 23%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.12 ±  4%      +1.2        1.27 ± 22%  perf-profile.children.cycles-pp.start_secondary
      0.12 ±  6%      +1.2        1.27 ± 22%  perf-profile.children.cycles-pp.do_idle
      0.12 ±  6%      +1.2        1.28 ± 22%  perf-profile.children.cycles-pp.common_startup_64
      0.12 ±  6%      +1.2        1.28 ± 22%  perf-profile.children.cycles-pp.cpu_startup_entry
     41.62            +7.8       49.41        perf-profile.children.cycles-pp.__vm_munmap
     41.62            +7.8       49.41        perf-profile.children.cycles-pp.__x64_sys_munmap
     41.63            +7.8       49.45        perf-profile.children.cycles-pp.__munmap
     98.21            -4.1       94.08        perf-profile.self.cycles-pp.osq_lock
      0.05            +0.1        0.11 ± 19%  perf-profile.self.cycles-pp.get_jiffies_update
      0.00            +0.1        0.07 ± 14%  perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.00            +0.1        0.07 ± 15%  perf-profile.self.cycles-pp.__resched_curr
      0.00            +0.1        0.08 ± 20%  perf-profile.self.cycles-pp.osq_unlock
      0.12 ±  8%      +0.1        0.20 ± 10%  perf-profile.self.cycles-pp.update_cfs_group
      0.00            +0.1        0.09 ± 21%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.09 ± 26%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.09 ±  4%      +0.1        0.18 ± 18%  perf-profile.self.cycles-pp.__vma_start_write
      0.00            +0.1        0.12 ± 25%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.00            +0.1        0.13 ± 10%  perf-profile.self.cycles-pp.update_load_avg
      0.00            +0.2        0.18 ± 31%  perf-profile.self.cycles-pp.pv_native_safe_halt
      0.00            +0.2        0.20 ± 21%  perf-profile.self.cycles-pp.ktime_get
      0.10 ±  3%      +0.3        0.40 ± 20%  perf-profile.self.cycles-pp.rwsem_down_write_slowpath
      0.00            +0.4        0.38 ± 34%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.40            +0.7        1.10 ± 18%  perf-profile.self.cycles-pp.rwsem_spin_on_owner


***************************************************************************************************
lkp-icl-2sp8: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/schedpolicy/stress-ng/60s

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1590744 ±  3%     -10.1%    1429682 ±  3%  meminfo.Mapped
      0.44 ±  6%      -9.4%       0.39 ±  5%  sched_debug.cpu.nr_running.stddev
    398430 ±  3%     -10.5%     356666 ±  3%  proc-vmstat.nr_mapped
      4780            -3.3%       4622 ±  2%  proc-vmstat.nr_page_table_pages
 5.916e+08            +8.6%  6.427e+08        stress-ng.schedpolicy.ops
   9861019            +8.6%   10712524        stress-ng.schedpolicy.ops_per_sec
      3085            -1.4%       3043        stress-ng.time.system_time
    655.33            +6.8%     699.99        stress-ng.time.user_time
     20.70 ±129%     -99.8%       0.04 ± 59%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      3.89 ± 12%     -31.0%       2.68 ± 17%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     20.28 ± 34%     -52.5%       9.64 ± 24%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    700.01 ±130%     -99.9%       0.60 ± 66%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    258.83 ±  2%      +8.1%     279.83 ±  3%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     58.67 ± 10%     +20.5%      70.67 ±  5%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     20.70 ±129%     -99.8%       0.04 ± 60%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    700.01 ±130%     -99.9%       0.60 ± 66%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.25           +14.0%       0.29 ±  2%  perf-stat.i.MPKI
 4.879e+10            -1.5%  4.806e+10        perf-stat.i.branch-instructions
  46494984           +10.3%   51292475 ±  2%  perf-stat.i.cache-misses
 1.416e+08 ±  2%      +8.0%  1.529e+08        perf-stat.i.cache-references
      1.20            +3.5%       1.24        perf-stat.i.cpi
      4741            -9.2%       4302        perf-stat.i.cycles-between-cache-misses
 1.837e+11            -3.4%  1.775e+11        perf-stat.i.instructions
      0.84            -3.4%       0.81        perf-stat.i.ipc
      0.25           +14.2%       0.29 ±  2%  perf-stat.overall.MPKI
      1.20            +3.5%       1.24        perf-stat.overall.cpi
      4736            -9.3%       4295        perf-stat.overall.cycles-between-cache-misses
      0.83            -3.4%       0.81        perf-stat.overall.ipc
   4.8e+10            -1.6%  4.725e+10        perf-stat.ps.branch-instructions
  45737759           +10.3%   50431677 ±  2%  perf-stat.ps.cache-misses
 1.393e+08 ±  2%      +7.9%  1.504e+08        perf-stat.ps.cache-references
 1.807e+11            -3.4%  1.745e+11        perf-stat.ps.instructions
 1.111e+13            -3.5%  1.073e+13        perf-stat.total.instructions



***************************************************************************************************
lkp-srf-2sp3: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp3/netdev/stress-ng/60s

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  3.63e+08          +643.3%  2.698e+09 ± 13%  cpuidle..time
    204743 ±  3%   +1975.4%    4249334 ± 10%  cpuidle..usage
      4.97          +341.0%      21.93 ± 10%  vmstat.cpu.id
    184.63           -22.2%     143.65 ±  4%  vmstat.procs.r
      3473         +3085.4%     110658 ±  8%  vmstat.system.cs
    408964            +2.0%     417266        vmstat.system.in
   1113721           +17.7%    1310862        meminfo.Active
   1113721           +17.7%    1310862        meminfo.Active(anon)
    177366 ±  2%     +11.0%     196822 ±  4%  meminfo.DirectMap4k
    298665           +19.2%     355990 ±  2%  meminfo.Mapped
    420393           +45.3%     610884 ±  2%  meminfo.Shmem
      2.96 ± 20%     +16.7       19.66 ± 14%  mpstat.cpu.all.idle%
      0.26            +0.3        0.61 ±  3%  mpstat.cpu.all.irq%
      0.00 ± 20%      +0.0        0.03 ±  5%  mpstat.cpu.all.soft%
     96.53           -17.1       79.47 ±  3%  mpstat.cpu.all.sys%
     14.83 ± 61%    +238.2%      50.17 ± 20%  mpstat.max_utilization.seconds
    100.00           -14.6%      85.36 ±  3%  mpstat.max_utilization_pct
   9547602           -61.6%    3667923 ±  4%  stress-ng.netdev.ops
    159180           -61.6%      61151 ±  4%  stress-ng.netdev.ops_per_sec
     67355            +2.8%      69256        stress-ng.time.minor_page_faults
     19016           -21.0%      15021 ±  4%  stress-ng.time.percent_of_cpu_this_job_got
     11432           -21.0%       9033 ±  4%  stress-ng.time.system_time
     35368 ±  2%   +9542.0%    3410177 ±  8%  stress-ng.time.voluntary_context_switches
    278515           +17.5%     327222        proc-vmstat.nr_active_anon
    995358            +4.7%    1042520        proc-vmstat.nr_file_pages
     74999           +18.3%      88740 ±  2%  proc-vmstat.nr_mapped
    105146           +44.9%     152305 ±  2%  proc-vmstat.nr_shmem
    278515           +17.5%     327222        proc-vmstat.nr_zone_active_anon
    826913            +7.7%     890858        proc-vmstat.numa_hit
    629070           +10.1%     692863        proc-vmstat.numa_local
    873883            +7.2%     936679        proc-vmstat.pgalloc_normal
    418067            +2.9%     430228        proc-vmstat.pgfault
      0.10 ±  3%     +37.8%       0.14 ±  2%  perf-stat.i.MPKI
 2.248e+10           -22.9%  1.733e+10 ±  4%  perf-stat.i.branch-instructions
      0.10 ±  2%      +0.0        0.15 ±  4%  perf-stat.i.branch-miss-rate%
  18947128           +20.6%   22857416        perf-stat.i.branch-misses
     35.42           -17.1       18.35 ±  9%  perf-stat.i.cache-miss-rate%
   9364646           +11.9%   10482390 ±  2%  perf-stat.i.cache-misses
  27205535          +125.0%   61210467 ± 11%  perf-stat.i.cache-references
      3273 ±  2%   +3392.2%     114320 ±  8%  perf-stat.i.context-switches
      5.35            +3.8%       5.56        perf-stat.i.cpi
 6.028e+11           -20.1%  4.818e+11 ±  4%  perf-stat.i.cpu-cycles
    327.85          +343.1%       1452 ±  9%  perf-stat.i.cpu-migrations
     68905           -27.8%      49741 ±  2%  perf-stat.i.cycles-between-cache-misses
  1.12e+11           -23.0%  8.626e+10 ±  4%  perf-stat.i.instructions
      0.19            -3.5%       0.18        perf-stat.i.ipc
      4316 ±  2%      +6.1%       4578        perf-stat.i.minor-faults
      4316 ±  2%      +6.1%       4578        perf-stat.i.page-faults
      0.08           +45.3%       0.12 ±  2%  perf-stat.overall.MPKI
      0.08            +0.0        0.13 ±  5%  perf-stat.overall.branch-miss-rate%
     34.42           -17.1       17.28 ±  9%  perf-stat.overall.cache-miss-rate%
      5.38            +3.8%       5.59        perf-stat.overall.cpi
     64384           -28.5%      46017 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.19            -3.7%       0.18        perf-stat.overall.ipc
 2.211e+10           -22.9%  1.705e+10 ±  4%  perf-stat.ps.branch-instructions
  18642811           +20.5%   22455956        perf-stat.ps.branch-misses
   9210208           +11.8%   10296398 ±  2%  perf-stat.ps.cache-misses
  26761745          +124.9%   60190009 ± 11%  perf-stat.ps.cache-references
      3220 ±  2%   +3391.4%     112425 ±  8%  perf-stat.ps.context-switches
  5.93e+11           -20.1%  4.739e+11 ±  4%  perf-stat.ps.cpu-cycles
    322.54          +343.0%       1428 ±  9%  perf-stat.ps.cpu-migrations
 1.102e+11           -23.0%  8.484e+10 ±  4%  perf-stat.ps.instructions
      4239 ±  2%      +5.3%       4464        perf-stat.ps.minor-faults
      4239 ±  2%      +5.3%       4464        perf-stat.ps.page-faults
 6.771e+12           -23.7%  5.169e+12 ±  4%  perf-stat.total.instructions
   5992277           -35.8%    3846765 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.avg
   6049811           -19.2%    4888185 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.max
   5847973           -63.4%    2140155 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
     30248 ± 13%   +3774.5%    1171963 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.53           -21.2%       0.42 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.50          -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_queued.min
      0.17 ± 10%     +99.4%       0.34 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.53           -21.3%       0.42 ±  4%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      0.50          -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_runnable.min
      0.17 ± 10%     +98.9%       0.34 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      2696          -100.0%       0.00        sched_debug.cfs_rq:/.load.min
      2.50           -83.3%       0.42 ±107%  sched_debug.cfs_rq:/.load_avg.min
   5992277           -35.8%    3846765 ±  8%  sched_debug.cfs_rq:/.min_vruntime.avg
   6049811           -19.2%    4888185 ±  5%  sched_debug.cfs_rq:/.min_vruntime.max
   5847973           -63.4%    2140155 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
     30248 ± 13%   +3774.5%    1171963 ±  2%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.53           -21.2%       0.42 ±  4%  sched_debug.cfs_rq:/.nr_queued.avg
      0.50          -100.0%       0.00        sched_debug.cfs_rq:/.nr_queued.min
      0.12 ±  8%    +185.0%       0.33 ±  4%  sched_debug.cfs_rq:/.nr_queued.stddev
    588.21           -17.7%     484.36 ±  3%  sched_debug.cfs_rq:/.runnable_avg.avg
    489.25 ±  6%     -95.0%      24.25 ±141%  sched_debug.cfs_rq:/.runnable_avg.min
    136.65 ±  9%     +70.5%     233.00 ±  4%  sched_debug.cfs_rq:/.runnable_avg.stddev
    585.65           -17.5%     482.95 ±  3%  sched_debug.cfs_rq:/.util_avg.avg
    410.58 ± 29%     -94.4%      23.00 ±141%  sched_debug.cfs_rq:/.util_avg.min
    117.24 ±  7%     +99.4%     233.84 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
    520.05           -32.1%     353.31 ±  6%  sched_debug.cfs_rq:/.util_est.avg
      1139 ± 14%     -19.1%     921.17 ± 11%  sched_debug.cfs_rq:/.util_est.max
    387.58 ± 45%    -100.0%       0.00        sched_debug.cfs_rq:/.util_est.min
     67.01 ± 18%    +283.1%     256.74 ±  2%  sched_debug.cfs_rq:/.util_est.stddev
    669274 ± 19%     +60.4%    1073556 ±  6%  sched_debug.cpu.avg_idle.avg
   1885708 ± 21%     +44.0%    2714848 ±  7%  sched_debug.cpu.avg_idle.max
      7213 ± 87%    +597.3%      50301 ±  5%  sched_debug.cpu.avg_idle.min
     16.82 ± 12%     -35.5%      10.86 ±  8%  sched_debug.cpu.clock.stddev
      2573           -21.9%       2010 ±  4%  sched_debug.cpu.curr->pid.avg
      2303 ± 12%    -100.0%       0.00        sched_debug.cpu.curr->pid.min
    448.26 ±  8%    +220.4%       1436 ±  4%  sched_debug.cpu.curr->pid.stddev
    235851 ± 10%     +17.7%     277484 ±  9%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.53           -21.5%       0.42 ±  4%  sched_debug.cpu.nr_running.avg
      0.50          -100.0%       0.00        sched_debug.cpu.nr_running.min
      0.16 ± 11%    +108.2%       0.33 ±  3%  sched_debug.cpu.nr_running.stddev
      1673 ± 16%   +1028.2%      18878 ±  7%  sched_debug.cpu.nr_switches.avg
      2564 ± 67%    +752.3%      21858 ±  4%  sched_debug.cpu.nr_switches.stddev
      0.00 ±111%   +6575.0%       0.12 ± 14%  sched_debug.cpu.nr_uninterruptible.avg
    -31.42          +179.6%     -87.83        sched_debug.cpu.nr_uninterruptible.min
      0.03 ±  7%    +349.1%       0.12 ±  8%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.03 ± 99%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.08 ±107%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.01 ±  8%     +78.1%       0.01 ± 17%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.79 ± 29%     -99.2%       0.01 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.60 ± 65%     -92.2%       0.05 ±192%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.03 ± 70%     -71.0%       0.01 ± 36%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.23 ± 26%     -90.3%       0.02 ± 36%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.50 ± 42%     -86.5%       0.07 ±120%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.05 ± 45%     -83.5%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.06 ± 28%     -73.9%       0.02 ± 35%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 10%    +136.7%       0.02 ± 46%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.26 ± 29%     -96.0%       0.01 ± 85%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.30 ± 40%     -95.2%       0.01 ± 21%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_ifconf
      0.31 ± 41%     -95.3%       0.01 ± 20%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.devinet_ioctl
      0.20 ± 39%     -96.3%       0.01 ± 25%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01           +73.3%       0.01 ± 14%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.01 ± 32%     -39.7%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.21 ± 22%     -96.2%       0.01 ± 22%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.27 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.58 ± 90%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.01 ± 21%    +204.2%       0.02 ± 75%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.39 ±  8%     -99.7%       0.01 ±  7%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      2.08 ± 55%     -92.1%       0.16 ±210%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3.41 ±  7%     -90.9%       0.31 ± 97%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      6.55 ± 39%     -91.8%       0.54 ± 56%  perf-sched.sch_delay.max.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      2.46 ± 39%     -98.1%       0.05 ± 73%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.88 ± 49%     -92.6%       0.44 ± 88%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 21%    +433.9%       0.06 ± 78%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      6.90 ± 37%    +311.9%      28.44 ±  8%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_ifconf
      9.31 ± 16%    +223.8%      30.16 ± 11%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.devinet_ioctl
      3.20 ±  6%     -97.0%       0.10 ± 75%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ±  8%    +174.5%       0.02 ± 47%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.83 ± 20%     -95.6%       0.08 ± 21%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      2.67 ± 38%     -98.4%       0.04 ± 82%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.18 ± 37%     -91.7%       0.01 ± 18%  perf-sched.total_sch_delay.average.ms
     10.22          +205.5%      31.22 ±  9%  perf-sched.total_sch_delay.max.ms
    108.64 ±  6%     -93.1%       7.50 ±  3%  perf-sched.total_wait_and_delay.average.ms
     12100 ±  7%   +1924.9%     245027 ±  4%  perf-sched.total_wait_and_delay.count.ms
      4980           -18.5%       4056 ±  8%  perf-sched.total_wait_and_delay.max.ms
    108.47 ±  6%     -93.1%       7.48 ±  3%  perf-sched.total_wait_time.average.ms
      4980           -18.5%       4056 ±  8%  perf-sched.total_wait_time.max.ms
      7.85           -92.1%       0.62 ±223%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    589.78 ±  7%     +28.3%     756.97        perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      1.16 ± 28%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    500.89           -74.7%     126.73 ± 19%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      1.20 ± 10%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      6.80 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     47.83 ±  7%     -22.6%      37.00        perf-sched.wait_and_delay.count.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    109.67 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    602.00 ± 48%     -93.8%      37.33 ±223%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     24.00          +438.9%     129.33 ± 15%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      3099 ±  8%   +3753.2%     119435 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_ifconf
      3170 ±  8%   +3674.8%     119693 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.devinet_ioctl
     86.33          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    712.00 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1707 ±  2%     +48.8%       2540 ± 19%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    449.67 ±  4%     +20.9%     543.67 ±  5%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      4980           -96.7%     166.80 ±223%  perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      5.85 ± 18%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.96 ± 35%    +145.9%      34.34 ± 34%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_ifconf
     18.63 ± 16%     +81.2%      33.75 ± 25%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.devinet_ioctl
      7.17 ± 12%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    479.50 ±  8%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      7.82           -50.2%       3.90 ±  8%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.25 ±140%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.06 ±147%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
    589.55 ±  7%     +28.4%     756.94        perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.20 ±151%    +304.7%       0.82 ± 29%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      1.10 ± 31%     -70.5%       0.33 ±  9%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    500.62           -74.7%     126.72 ± 19%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.39 ± 43%    +156.4%       1.00 ± 17%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_ifconf
      0.40 ± 39%    +150.1%       1.01 ± 17%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.devinet_ioctl
      1.00 ±  8%     -41.3%       0.58 ±  5%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.21 ± 24%     -98.1%       0.00 ± 38%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      4980           -79.9%       1000        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1.15 ±108%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.44 ± 96%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.41 ±152%    +306.1%       1.65 ± 29%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      3.88 ±  7%     -66.4%       1.30 ± 27%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.36 ±  5%     -61.1%       2.09 ±  4%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.67 ± 38%     -98.7%       0.03 ± 71%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open



***************************************************************************************************
lkp-srf-2sp3: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/200%/debian-12-x86_64-20240206.cgz/300s/lkp-srf-2sp3/TCP_MAERTS/netperf

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    306685 ±  2%     +21.9%     373719 ± 12%  meminfo.Mapped
    244930 ± 10%     +37.3%     336267 ± 13%  numa-meminfo.node1.Mapped
     61246 ± 10%     +37.9%      84450 ± 13%  numa-vmstat.node1.nr_mapped
    167630 ±  6%    +620.6%    1207890 ± 20%  vmstat.system.cs
      3.72            +0.4        4.16 ±  2%  mpstat.cpu.all.soft%
      0.79            +0.2        0.94 ±  5%  mpstat.cpu.all.usr%
     77063 ±  2%     +21.6%      93738 ± 12%  proc-vmstat.nr_mapped
 1.337e+09            +4.9%  1.402e+09        proc-vmstat.numa_hit
 1.337e+09            +4.9%  1.402e+09        proc-vmstat.numa_local
 1.068e+10            +4.9%   1.12e+10        proc-vmstat.pgalloc_normal
 1.068e+10            +4.9%   1.12e+10        proc-vmstat.pgfree
      3050            +7.7%       3286 ±  2%  netperf.ThroughputBoth_Mbps
   1171565            +7.7%    1261877 ±  2%  netperf.ThroughputBoth_total_Mbps
      3050            +7.7%       3286 ±  2%  netperf.Throughput_Mbps
   1171565            +7.7%    1261877 ±  2%  netperf.Throughput_total_Mbps
   7775337           +40.3%   10907917 ±  3%  netperf.time.involuntary_context_switches
      9436            -4.7%       8995        netperf.time.percent_of_cpu_this_job_got
     28543            -4.9%      27151        netperf.time.system_time
     81.34           +47.4%     119.87 ±  8%  netperf.time.user_time
  16720953 ± 10%    +928.6%   1.72e+08 ± 21%  netperf.time.voluntary_context_switches
 2.682e+09            +7.7%  2.888e+09 ±  2%  netperf.workload
   3412907 ± 11%     +52.0%    5189109 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      1.22 ± 12%     -27.3%       0.89 ±  8%  sched_debug.cfs_rq:/.h_nr_queued.min
      0.98 ±  2%     +11.4%       1.09 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.90 ±  2%      +9.9%       0.99 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      5114 ±  8%     +21.1%       6193 ± 14%  sched_debug.cfs_rq:/.load.avg
      6126 ± 95%    +197.0%      18194 ± 52%  sched_debug.cfs_rq:/.load.stddev
   3412907 ± 11%     +52.0%    5189109 ±  4%  sched_debug.cfs_rq:/.min_vruntime.stddev
      4874 ±  2%     +11.5%       5437 ±  3%  sched_debug.cfs_rq:/.runnable_avg.max
    773.27 ±  3%     +15.1%     889.83 ±  3%  sched_debug.cfs_rq:/.runnable_avg.stddev
     93.93 ±  4%     +17.0%     109.92 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
    638.98 ±  2%     +10.5%     706.28 ±  3%  sched_debug.cfs_rq:/.util_est.stddev
    635.15 ± 19%     +79.5%       1140 ±  6%  sched_debug.cpu.clock_task.stddev
    131209 ± 10%     +20.6%     158279 ± 10%  sched_debug.cpu.max_idle_balance_cost.stddev
      1.22 ± 15%     -29.5%       0.86 ±  7%  sched_debug.cpu.nr_running.min
      0.98 ±  2%     +12.0%       1.09 ±  3%  sched_debug.cpu.nr_running.stddev
    144909 ±  7%    +571.8%     973498 ± 20%  sched_debug.cpu.nr_switches.avg
    521657 ± 15%    +739.8%    4380922 ± 24%  sched_debug.cpu.nr_switches.max
     74844 ±  4%     +48.0%     110781 ± 10%  sched_debug.cpu.nr_switches.min
     69526 ± 17%   +1338.8%    1000330 ± 12%  sched_debug.cpu.nr_switches.stddev
     49.95           -15.5%      42.22 ±  4%  perf-stat.i.MPKI
 1.105e+10           +21.2%  1.339e+10 ±  4%  perf-stat.i.branch-instructions
      0.38            +0.0        0.39        perf-stat.i.branch-miss-rate%
  40500273           +25.6%   50848366 ±  5%  perf-stat.i.branch-misses
     57.26            +1.6       58.84        perf-stat.i.cache-miss-rate%
 2.854e+09            +1.6%    2.9e+09        perf-stat.i.cache-misses
  4.98e+09            -1.1%  4.924e+09        perf-stat.i.cache-references
    163743 ±  6%    +636.8%    1206409 ± 20%  perf-stat.i.context-switches
     10.67           -16.8%       8.87 ±  4%  perf-stat.i.cpi
    788.92 ±  2%     +14.7%     904.64 ±  2%  perf-stat.i.cpu-migrations
    215.18            -1.4%     212.21        perf-stat.i.cycles-between-cache-misses
 5.724e+10           +21.3%  6.943e+10 ±  4%  perf-stat.i.instructions
      0.10           +20.4%       0.12 ±  4%  perf-stat.i.ipc
      0.22 ± 32%   +2720.2%       6.27 ± 20%  perf-stat.i.metric.K/sec
     49.87           -16.1%      41.87 ±  4%  perf-stat.overall.MPKI
      0.37            +0.0        0.38        perf-stat.overall.branch-miss-rate%
     57.31            +1.6       58.90        perf-stat.overall.cache-miss-rate%
     10.65           -17.4%       8.80 ±  4%  perf-stat.overall.cpi
    213.53            -1.6%     210.17        perf-stat.overall.cycles-between-cache-misses
      0.09           +21.3%       0.11 ±  4%  perf-stat.overall.ipc
      6443           +12.7%       7263 ±  2%  perf-stat.overall.path-length
 1.099e+10           +21.2%  1.333e+10 ±  4%  perf-stat.ps.branch-instructions
  40255640           +25.6%   50561552 ±  5%  perf-stat.ps.branch-misses
  2.84e+09            +1.6%  2.886e+09        perf-stat.ps.cache-misses
 4.956e+09            -1.1%  4.901e+09        perf-stat.ps.cache-references
    162829 ±  6%    +637.3%    1200587 ± 20%  perf-stat.ps.context-switches
    780.07 ±  2%     +14.8%     895.36 ±  2%  perf-stat.ps.cpu-migrations
 5.695e+10           +21.3%   6.91e+10 ±  4%  perf-stat.ps.instructions
 1.728e+13           +21.5%  2.099e+13 ±  4%  perf-stat.total.instructions
      8.03 ± 18%     -68.8%       2.50 ± 86%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      3.67 ±  6%     -12.7%       3.20 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      6.14 ±  4%     -31.8%       4.19 ±  7%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      1.10 ± 21%     -32.2%       0.75 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      2.75 ± 12%     -30.9%       1.90 ± 23%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      4.24           +28.8%       5.45 ±  8%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      5.44 ± 65%    +370.0%      25.55 ± 46%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      9.31 ± 61%   +3846.0%     367.32 ±131%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.27 ± 17%     -37.5%       5.17 ± 22%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      1027 ± 67%    +167.7%       2750 ± 46%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     16.86 ± 18%     -75.8%       4.07 ±141%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      7.98 ±  6%     -16.4%       6.67 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
     12.79 ±  4%     -31.7%       8.73 ±  7%  perf-sched.wait_and_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
     10.53           +20.0%      12.63 ±  7%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
    162.68 ±  3%     -17.5%     134.19 ±  7%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     21026 ± 11%    +102.8%      42630 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
     97720 ± 35%     +60.6%     156896 ±  7%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
    168.83 ± 11%     +94.2%     327.83 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
    513.00 ± 71%    +241.3%       1750 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
    378227 ± 19%     +49.1%     563802 ±  4%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    708.83 ± 46%    +169.1%       1907 ± 18%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    216494 ± 11%     -35.5%     139689 ± 11%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
    196.65 ±128%   +1011.1%       2184 ±101%  perf-sched.wait_and_delay.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      8.84 ± 19%     -58.9%       3.63 ± 61%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      4.31 ±  6%     -19.6%       3.46 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      6.64 ±  4%     -31.7%       4.54 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
    338.36 ±139%     -99.3%       2.40 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    143.60 ± 84%     -97.5%       3.60 ± 36%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      6.30           +14.0%       7.18 ±  6%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
    162.51 ±  3%     -17.7%     133.83 ±  7%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    339.75 ±138%     -98.5%       5.09 ± 67%  perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      1755 ± 41%     +79.6%       3151 ± 28%  perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read



***************************************************************************************************
lkp-gnr-2sp3: 256 threads 2 sockets Intel(R) Xeon(R) 6767P  CPU @ 2.4GHz (Granite Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2sp3/pwrite3/will-it-scale

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    495.17 ±  6%     +52.6%     755.67 ±  3%  perf-c2c.DRAM.local
     62494           +29.1%      80709 ±  7%  uptime.idle
 2.241e+09 ±  7%    +817.5%  2.056e+10 ± 28%  cpuidle..time
   2486747 ± 22%    +433.5%   13267360 ± 26%  cpuidle..usage
      3.32 ±  6%    +691.1%      26.29 ± 27%  vmstat.cpu.id
     19982 ± 29%    +299.7%      79871 ± 41%  vmstat.system.cs
   2202934           -10.2%    1979216 ±  4%  meminfo.Active
   2202934           -10.2%    1979216 ±  4%  meminfo.Active(anon)
    174250 ± 19%     -29.1%     123491 ± 13%  meminfo.Mapped
   1358971           -16.0%    1141476 ±  7%  meminfo.Shmem
   3646055 ± 40%     -41.7%    2127009 ± 71%  numa-meminfo.node1.FilePages
    151686 ± 32%     -54.1%      69581 ± 54%  numa-meminfo.node1.Mapped
   1225823 ±  7%     -20.1%     979942 ± 13%  numa-meminfo.node1.Shmem
   2420190 ± 59%     -52.6%    1147007 ±135%  numa-meminfo.node1.Unevictable
    848825 ±  4%     -60.6%     334391 ± 21%  will-it-scale.256.processes
      2.34 ±  9%    +987.8%      25.40 ± 29%  will-it-scale.256.processes_idle
      3315 ±  4%     -60.6%       1305 ± 21%  will-it-scale.per_process_ops
    848825 ±  4%     -60.6%     334391 ± 21%  will-it-scale.workload
    911634 ± 40%     -41.7%     531686 ± 71%  numa-vmstat.node1.nr_file_pages
     38323 ± 32%     -53.8%      17709 ± 53%  numa-vmstat.node1.nr_mapped
    306576 ±  7%     -20.1%     244919 ± 13%  numa-vmstat.node1.nr_shmem
    605047 ± 59%     -52.6%     286751 ±135%  numa-vmstat.node1.nr_unevictable
    605047 ± 59%     -52.6%     286751 ±135%  numa-vmstat.node1.nr_zone_unevictable
      2.83 ±  8%     +23.0       25.79 ± 28%  mpstat.cpu.all.idle%
      0.49            +0.5        0.99 ± 11%  mpstat.cpu.all.irq%
      0.01 ±  4%      +0.1        0.08 ± 34%  mpstat.cpu.all.soft%
     96.56           -23.5       73.07 ± 10%  mpstat.cpu.all.sys%
      0.10 ± 17%      -0.0        0.07 ± 25%  mpstat.cpu.all.usr%
     99.44           -20.1%      79.44 ± 13%  mpstat.max_utilization_pct
    550559           -10.1%     494873 ±  4%  proc-vmstat.nr_active_anon
   1224645            -4.4%    1170439        proc-vmstat.nr_file_pages
     44178 ± 19%     -29.0%      31350 ± 13%  proc-vmstat.nr_mapped
    339576           -16.0%     285368 ±  7%  proc-vmstat.nr_shmem
    129664            -1.6%     127553        proc-vmstat.nr_slab_unreclaimable
    550559           -10.1%     494873 ±  4%  proc-vmstat.nr_zone_active_anon
     19739 ± 61%    +134.0%      46184 ± 19%  proc-vmstat.numa_hint_faults
     12532 ± 64%    +220.6%      40178 ± 20%  proc-vmstat.numa_hint_faults_local
   1799418            -5.2%    1706194        proc-vmstat.numa_hit
   1535263            -6.1%    1442214 ±  2%  proc-vmstat.numa_local
   1929792            -5.6%    1821092        proc-vmstat.pgalloc_normal
      0.08 ±  2%     +28.2%       0.10 ±  5%  perf-stat.i.MPKI
 4.292e+10           -22.9%  3.307e+10 ±  9%  perf-stat.i.branch-instructions
      0.02 ±  2%      +0.0        0.03 ±  3%  perf-stat.i.branch-miss-rate%
   6247563 ±  3%     +14.1%    7127517 ±  5%  perf-stat.i.branch-misses
     20048 ± 29%    +302.4%      80680 ± 41%  perf-stat.i.context-switches
      4.17            -1.7%       4.10        perf-stat.i.cpi
 8.963e+11           -24.2%  6.798e+11 ± 10%  perf-stat.i.cpu-cycles
     58005 ±  2%     -24.2%      43965 ±  5%  perf-stat.i.cycles-between-cache-misses
 2.146e+11           -23.0%  1.652e+11 ±  9%  perf-stat.i.instructions
      0.24            +1.7%       0.25        perf-stat.i.ipc
      0.07 ±  2%     +30.9%       0.10 ±  5%  perf-stat.overall.MPKI
      0.01 ±  3%      +0.0        0.02 ±  5%  perf-stat.overall.branch-miss-rate%
      4.18            -1.6%       4.11        perf-stat.overall.cpi
     57130 ±  2%     -24.6%      43086 ±  5%  perf-stat.overall.cycles-between-cache-misses
      0.24            +1.6%       0.24        perf-stat.overall.ipc
  76432035 ±  4%     +99.7%  1.526e+08 ± 11%  perf-stat.overall.path-length
 4.277e+10           -22.9%  3.298e+10 ±  9%  perf-stat.ps.branch-instructions
   6226439 ±  3%     +13.3%    7057144 ±  5%  perf-stat.ps.branch-misses
     19983 ± 29%    +301.2%      80169 ± 41%  perf-stat.ps.context-switches
 8.933e+11           -24.1%  6.779e+11 ± 10%  perf-stat.ps.cpu-cycles
 2.139e+11           -23.0%  1.648e+11 ±  9%  perf-stat.ps.instructions
 6.473e+13           -22.9%  4.989e+13 ±  9%  perf-stat.total.instructions
  36927349           -42.7%   21166906 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.avg
  37069728           -23.2%   28469387 ± 10%  sched_debug.cfs_rq:/.avg_vruntime.max
  29889816 ±  5%     -84.5%    4642664 ± 27%  sched_debug.cfs_rq:/.avg_vruntime.min
    491905 ± 21%   +1883.7%    9757910 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.77 ±  3%     -18.4%       0.63 ± 12%  sched_debug.cfs_rq:/.h_nr_queued.avg
      1.75 ±  7%     -27.0%       1.28 ± 14%  sched_debug.cfs_rq:/.h_nr_queued.max
      0.25 ± 13%     +48.8%       0.37 ± 15%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.77 ±  3%     -18.4%       0.63 ± 12%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      1.64 ± 10%     -27.1%       1.19 ± 18%  sched_debug.cfs_rq:/.h_nr_runnable.max
      0.25 ± 13%     +49.4%       0.37 ± 15%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      3852 ±  7%     +20.4%       4636 ± 16%  sched_debug.cfs_rq:/.load.avg
      1.64 ± 14%     -59.3%       0.67 ± 20%  sched_debug.cfs_rq:/.load_avg.min
  36927349           -42.7%   21166906 ± 18%  sched_debug.cfs_rq:/.min_vruntime.avg
  37069728           -23.2%   28469387 ± 10%  sched_debug.cfs_rq:/.min_vruntime.max
  29889816 ±  5%     -84.5%    4642664 ± 27%  sched_debug.cfs_rq:/.min_vruntime.min
    491905 ± 21%   +1883.7%    9757910 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ±  3%     -18.1%       0.63 ± 12%  sched_debug.cfs_rq:/.nr_queued.avg
      0.24 ± 16%     +57.1%       0.37 ± 15%  sched_debug.cfs_rq:/.nr_queued.stddev
    790.52 ±  2%     -22.5%     612.40 ± 11%  sched_debug.cfs_rq:/.runnable_avg.avg
      1511 ±  8%     -23.7%       1153 ± 13%  sched_debug.cfs_rq:/.runnable_avg.max
    411.75 ± 19%     -79.8%      83.19 ± 50%  sched_debug.cfs_rq:/.runnable_avg.min
     96.86 ±  4%    +220.8%     310.73 ± 16%  sched_debug.cfs_rq:/.runnable_avg.stddev
    787.67 ±  2%     -22.4%     611.05 ± 11%  sched_debug.cfs_rq:/.util_avg.avg
      1218 ±  8%     -17.8%       1001        sched_debug.cfs_rq:/.util_avg.max
    341.69 ± 29%     -78.0%      75.06 ± 53%  sched_debug.cfs_rq:/.util_avg.min
     89.07 ±  7%    +246.3%     308.43 ± 16%  sched_debug.cfs_rq:/.util_avg.stddev
    738.25 ±  3%     -22.4%     572.51 ± 14%  sched_debug.cfs_rq:/.util_est.avg
      1316 ± 13%     -25.6%     979.92 ±  4%  sched_debug.cfs_rq:/.util_est.max
    223.31 ± 15%     +60.9%     359.21 ± 13%  sched_debug.cfs_rq:/.util_est.stddev
    491.62 ±  2%    +187.3%       1412 ± 28%  sched_debug.cpu.clock_task.stddev
      4763 ±  3%     -18.3%       3892 ± 12%  sched_debug.cpu.curr->pid.avg
      1464 ± 15%     +56.2%       2286 ± 15%  sched_debug.cpu.curr->pid.stddev
   1225968           -39.6%     740896 ±  4%  sched_debug.cpu.max_idle_balance_cost.avg
    689300 ±  8%     -27.5%     500000        sched_debug.cpu.max_idle_balance_cost.min
    195813 ±  5%     +47.2%     288179 ±  4%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.77 ±  3%     -18.7%       0.63 ± 12%  sched_debug.cpu.nr_running.avg
      1.75 ±  7%     -19.0%       1.42 ± 13%  sched_debug.cpu.nr_running.max
      0.25 ± 13%     +50.6%       0.38 ± 15%  sched_debug.cpu.nr_running.stddev
     13387 ± 25%    +238.9%      45364 ± 31%  sched_debug.cpu.nr_switches.avg
    114155 ± 48%    +174.0%     312755 ± 72%  sched_debug.cpu.nr_switches.max
      8431 ± 20%     -80.3%       1658 ± 11%  sched_debug.cpu.nr_switches.min
      7889 ± 43%    +872.9%      76756 ± 60%  sched_debug.cpu.nr_switches.stddev
      0.07 ± 32%    +194.9%       0.22 ± 35%  sched_debug.cpu.nr_uninterruptible.avg
      0.01 ±  9%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.23 ± 85%     -96.7%       0.01 ±145%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.36 ± 49%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      0.30 ± 52%     -97.7%       0.01 ± 13%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.71 ± 56%     -98.8%       0.01 ± 18%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.05 ± 50%     -81.0%       0.01 ±  8%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.01 ± 60%     -78.5%       0.00 ± 13%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.26 ± 57%     -90.5%       0.02 ±158%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.08 ± 44%     -85.2%       0.01 ± 15%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.34 ±158%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.02 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.36 ± 30%     -89.4%       0.04 ± 71%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.21 ± 22%     -96.2%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.02 ± 18%     -62.9%       0.01 ± 13%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 13%     +50.0%       0.01 ± 13%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 59%     -71.9%       0.01 ± 57%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.02 ±  9%    +240.0%       0.05 ± 13%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 16%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      2.14 ± 77%     -99.0%       0.02 ±143%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      2.29 ± 56%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      2.48 ± 45%     -99.6%       0.01 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      2.16 ± 55%     -99.3%       0.02 ± 20%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.62 ± 41%     -98.7%       0.03 ± 26%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      2.57 ± 19%     -71.3%       0.74 ± 56%  perf-sched.sch_delay.max.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      2.83 ± 31%     -97.0%       0.09 ±145%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.71 ± 28%     -98.3%       0.05 ± 48%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.67 ±161%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.07 ± 76%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.85 ± 28%     -82.0%       0.51 ± 91%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      3.34 ± 11%     -98.3%       0.06 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.16 ± 17%     -97.6%       0.05 ± 66%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.93 ± 78%     -86.3%       0.13 ±172%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     27.34 ± 17%     -48.8%      14.01 ± 18%  perf-sched.total_wait_and_delay.average.ms
     62576 ± 18%    +137.2%     148418 ± 18%  perf-sched.total_wait_and_delay.count.ms
     27.26 ± 16%     -48.7%      13.98 ± 18%  perf-sched.total_wait_time.average.ms
    133.01 ± 28%     +58.5%     210.76 ±  3%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.95 ± 17%    +224.8%       3.10 ± 19%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.shmem_file_write_iter
     15.33 ± 19%     -90.2%       1.50 ± 74%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    333.33 ± 32%     -42.5%     191.67 ±  3%  perf-sched.wait_and_delay.count.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     54177 ± 20%    +161.2%     141513 ± 19%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.shmem_file_write_iter
    633.67 ±  3%     -66.8%     210.17 ±141%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1948           +12.6%       2195        perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    477.50 ±  3%     -41.3%     280.17 ± 12%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±  9%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      1.85 ± 22%    +185.4%       5.28 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
     59.14 ±179%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
    132.96 ± 28%     +58.5%     210.75 ±  3%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.08 ± 93%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.87 ± 15%    +251.6%       3.06 ± 19%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.shmem_file_write_iter
      1.03 ± 16%     -33.3%       0.69 ±  7%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.04 ± 65%     -82.7%       0.01 ±105%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.02 ± 16%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     10.21 ± 23%     -27.4%       7.42 ±  5%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
    525.30 ±141%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      1000           -99.3%       7.28 ±  6%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.46 ±104%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
     11.93 ± 16%     -27.6%       8.63 ±  8%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.shmem_file_write_iter
     17.21 ± 67%     -84.1%       2.74 ± 10%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.93 ± 78%     -86.3%       0.13 ±172%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     98.85            -2.2       96.62        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.down_write.shmem_file_write_iter.vfs_write
     99.22            -1.4       97.82        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
     99.28            -1.4       97.92        perf-profile.calltrace.cycles-pp.down_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
     99.64            -0.5       99.12        perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe
     99.66            -0.5       99.14        perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     99.66            -0.5       99.14        perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     99.72            -0.5       99.22        perf-profile.calltrace.cycles-pp.__libc_pwrite
     99.72            -0.5       99.22        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     99.71            -0.5       99.21        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      0.53 ±  2%      +0.8        1.33 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.osq_lock.rwsem_down_write_slowpath.down_write.shmem_file_write_iter
      0.43 ± 44%      +0.9        1.31 ± 11%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.osq_lock.rwsem_down_write_slowpath.down_write
      0.00            +0.9        0.93 ± 43%  perf-profile.calltrace.cycles-pp.rwsem_wake.up_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
      0.25 ±100%      +1.0        1.28 ± 11%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.osq_lock.rwsem_down_write_slowpath
      0.00            +1.1        1.05 ± 39%  perf-profile.calltrace.cycles-pp.up_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
      0.00            +1.1        1.06 ± 12%  perf-profile.calltrace.cycles-pp.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.17 ±141%      +1.1        1.28 ± 11%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.osq_lock
      0.00            +1.1        1.11 ± 11%  perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.00            +1.1        1.12 ± 11%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
     98.87            -2.2       96.64        perf-profile.children.cycles-pp.osq_lock
     99.22            -1.4       97.83        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     99.28            -1.4       97.92        perf-profile.children.cycles-pp.down_write
     99.66            -0.5       99.14        perf-profile.children.cycles-pp.__x64_sys_pwrite64
     99.74            -0.5       99.23        perf-profile.children.cycles-pp.__libc_pwrite
     99.72            -0.4       99.30        perf-profile.children.cycles-pp.shmem_file_write_iter
     99.74            -0.4       99.33        perf-profile.children.cycles-pp.vfs_write
     99.82            -0.4       99.44        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.81            -0.4       99.44        perf-profile.children.cycles-pp.do_syscall_64
      0.14 ±  8%      +0.1        0.20 ± 18%  perf-profile.children.cycles-pp.rwsem_spin_on_owner
      0.00            +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.idle_cpu
      0.05            +0.1        0.12 ± 11%  perf-profile.children.cycles-pp.clockevents_program_event
      0.06 ±  7%      +0.1        0.14 ±  4%  perf-profile.children.cycles-pp.get_jiffies_update
      0.00            +0.1        0.08 ± 17%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      0.06 ±  7%      +0.1        0.14 ±  4%  perf-profile.children.cycles-pp.tmigr_requires_handle_remote
      0.05            +0.1        0.13 ± 16%  perf-profile.children.cycles-pp.ktime_get
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock
      0.07 ± 10%      +0.1        0.15 ± 10%  perf-profile.children.cycles-pp.copy_folio_from_iter_atomic
      0.00            +0.1        0.09 ± 22%  perf-profile.children.cycles-pp.kick_ilb
      0.00            +0.1        0.09 ± 22%  perf-profile.children.cycles-pp.update_curr
      0.07 ± 21%      +0.1        0.18 ±  8%  perf-profile.children.cycles-pp.perf_mmap__read_head
      0.07 ± 16%      +0.1        0.17 ±  8%  perf-profile.children.cycles-pp.ring_buffer_read_head
      0.19 ±  6%      +0.1        0.30 ±  7%  perf-profile.children.cycles-pp.generic_perform_write
      0.09 ± 20%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.ksys_write
      0.00            +0.1        0.11 ± 13%  perf-profile.children.cycles-pp.wake_q_add
      0.00            +0.1        0.11 ±  8%  perf-profile.children.cycles-pp.update_load_avg
      0.13 ±  9%      +0.1        0.25 ± 14%  perf-profile.children.cycles-pp.update_cfs_group
      0.00            +0.1        0.12 ± 34%  perf-profile.children.cycles-pp.handle_softirqs
      0.11 ± 17%      +0.1        0.22 ± 11%  perf-profile.children.cycles-pp.writen
      0.10 ± 20%      +0.1        0.22 ± 11%  perf-profile.children.cycles-pp.write
      0.10 ± 19%      +0.1        0.22 ± 11%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +0.1        0.12 ± 30%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +0.1        0.14 ± 22%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.14 ± 21%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.00            +0.1        0.15 ± 20%  perf-profile.children.cycles-pp.sched_balance_find_src_group
      0.00            +0.2        0.15 ± 22%  perf-profile.children.cycles-pp.sched_balance_newidle
      0.00            +0.2        0.16 ± 29%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.00            +0.2        0.16 ± 20%  perf-profile.children.cycles-pp.sched_balance_rq
      0.00            +0.2        0.17 ± 28%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.00            +0.2        0.17 ± 19%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.00            +0.2        0.18 ± 20%  perf-profile.children.cycles-pp.__pick_next_task
      0.05 ± 45%      +0.2        0.23 ± 18%  perf-profile.children.cycles-pp.try_to_wake_up
      0.06 ± 11%      +0.2        0.26 ± 19%  perf-profile.children.cycles-pp.wake_up_q
      0.21 ± 17%      +0.2        0.44 ±  6%  perf-profile.children.cycles-pp.perf_mmap__push
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.acpi_idle_enter
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.acpi_safe_halt
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.cpuidle_enter
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.00            +0.2        0.22 ± 26%  perf-profile.children.cycles-pp.pv_native_safe_halt
      0.20 ±  6%      +0.2        0.43 ± 10%  perf-profile.children.cycles-pp.task_tick_fair
      0.22 ± 18%      +0.2        0.45 ±  6%  perf-profile.children.cycles-pp.handle_internal_command
      0.22 ± 18%      +0.2        0.45 ±  6%  perf-profile.children.cycles-pp.main
      0.22 ± 18%      +0.2        0.45 ±  6%  perf-profile.children.cycles-pp.run_builtin
      0.22 ± 19%      +0.2        0.44 ±  6%  perf-profile.children.cycles-pp.__cmd_record
      0.22 ± 19%      +0.2        0.44 ±  6%  perf-profile.children.cycles-pp.cmd_record
      0.22 ± 19%      +0.2        0.44 ±  6%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.00            +0.2        0.23 ± 19%  perf-profile.children.cycles-pp.schedule_preempt_disabled
      0.00            +0.2        0.24 ± 17%  perf-profile.children.cycles-pp.schedule
      0.00            +0.2        0.25 ± 25%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.02 ±141%      +0.3        0.27 ± 17%  perf-profile.children.cycles-pp.__schedule
      0.00            +0.3        0.26 ± 20%  perf-profile.children.cycles-pp.start_secondary
      0.00            +0.3        0.26 ± 29%  perf-profile.children.cycles-pp.nohz_balancer_kick
      0.00            +0.3        0.29 ± 24%  perf-profile.children.cycles-pp.common_startup_64
      0.00            +0.3        0.29 ± 24%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.00            +0.3        0.29 ± 24%  perf-profile.children.cycles-pp.do_idle
      0.25 ±  5%      +0.3        0.54 ±  9%  perf-profile.children.cycles-pp.sched_tick
      0.00            +0.5        0.46 ± 51%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.09 ± 15%      +0.6        0.69 ± 52%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.39 ±  3%      +0.7        1.11 ± 12%  perf-profile.children.cycles-pp.update_process_times
      0.41 ±  3%      +0.8        1.16 ± 12%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.42 ±  2%      +0.8        1.17 ± 12%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.15 ± 11%      +0.8        0.94 ± 44%  perf-profile.children.cycles-pp.rwsem_wake
      0.22 ±  7%      +0.8        1.06 ± 38%  perf-profile.children.cycles-pp.up_write
      0.50 ±  2%      +0.8        1.35 ± 11%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.50 ±  2%      +0.8        1.34 ± 11%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.52 ±  2%      +0.9        1.39 ± 11%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.54 ±  2%      +0.9        1.42 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.02 ±141%      +1.0        0.97 ± 58%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     98.34            -3.0       95.31        perf-profile.self.cycles-pp.osq_lock
      0.07 ±  6%      +0.0        0.12 ±  9%  perf-profile.self.cycles-pp.up_write
      0.14 ±  8%      +0.1        0.20 ± 17%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
      0.00            +0.1        0.06 ± 14%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.1        0.06 ± 17%  perf-profile.self.cycles-pp.idle_cpu
      0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.update_load_avg
      0.06 ± 19%      +0.1        0.14 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.06 ±  7%      +0.1        0.14 ±  4%  perf-profile.self.cycles-pp.get_jiffies_update
      0.08 ±  8%      +0.1        0.16 ± 14%  perf-profile.self.cycles-pp.rwsem_down_write_slowpath
      0.07 ± 11%      +0.1        0.15 ± 10%  perf-profile.self.cycles-pp.copy_folio_from_iter_atomic
      0.00            +0.1        0.10 ± 20%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.11 ± 13%  perf-profile.self.cycles-pp.wake_q_add
      0.13 ±  9%      +0.1        0.24 ± 15%  perf-profile.self.cycles-pp.update_cfs_group
      0.06 ± 50%      +0.1        0.17 ±  9%  perf-profile.self.cycles-pp.ring_buffer_read_head
      0.01 ±223%      +0.1        0.13 ± 14%  perf-profile.self.cycles-pp.ktime_get
      0.00            +0.2        0.21 ± 25%  perf-profile.self.cycles-pp.nohz_balancer_kick
      0.02 ±141%      +1.0        0.97 ± 58%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/timer/stress-ng/60s

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 8.839e+08 ±  8%     -34.0%  5.835e+08 ± 18%  cpuidle..time
    374076           -22.0%     291744 ±  8%  cpuidle..usage
   4380694 ±  5%     -37.1%    2755939 ±  4%  numa-numastat.node1.local_node
   4513772 ±  4%     -35.4%    2915575 ±  5%  numa-numastat.node1.numa_hit
    831.83 ± 48%    +357.4%       3804 ±  5%  perf-c2c.DRAM.local
     50467 ±  7%     -94.1%       2992 ± 10%  perf-c2c.HITM.local
     51333 ±  7%     -92.2%       4001 ±  9%  perf-c2c.HITM.total
     58.62 ±  2%    +195.9%     173.43        vmstat.procs.r
  14909196          +289.6%   58088357        vmstat.system.cs
  17927003          +261.9%   64878409        vmstat.system.in
      7.56 ±  8%      -3.8        3.80 ± 15%  mpstat.cpu.all.idle%
     74.80           -49.3       25.52        mpstat.cpu.all.irq%
      0.12            +1.0        1.07        mpstat.cpu.all.soft%
     14.42 ±  2%     +46.7       61.14        mpstat.cpu.all.sys%
      3.10            +5.4        8.46        mpstat.cpu.all.usr%
  11720320 ±  3%     -36.0%    7506290 ±  5%  meminfo.Active
  11720320 ±  3%     -36.0%    7506290 ±  5%  meminfo.Active(anon)
  14500482 ±  2%     -30.1%   10137930 ±  3%  meminfo.Cached
  13718912 ±  2%     -31.9%    9341814 ±  4%  meminfo.Committed_AS
    890402 ±  3%     -15.3%     753873 ±  4%  meminfo.Mapped
  18655774           -22.5%   14456476 ±  3%  meminfo.Memused
  10939477 ±  3%     -39.9%    6576917 ±  6%  meminfo.Shmem
  18813194           -22.3%   14611140 ±  3%  meminfo.max_used_kB
   2878970 ±  3%     -37.3%    1805638 ±  5%  numa-vmstat.node1.nr_active_anon
    145499 ± 16%     +27.8%     185994 ± 13%  numa-vmstat.node1.nr_anon_pages
   3426306 ±  8%     -44.5%    1900170 ± 18%  numa-vmstat.node1.nr_file_pages
    218920 ±  3%     -28.1%     157490 ± 19%  numa-vmstat.node1.nr_mapped
   2733686 ±  3%     -40.7%    1619835 ±  5%  numa-vmstat.node1.nr_shmem
   2878963 ±  3%     -37.3%    1805632 ±  5%  numa-vmstat.node1.nr_zone_active_anon
   4513028 ±  4%     -35.4%    2914516 ±  5%  numa-vmstat.node1.numa_hit
   4379950 ±  5%     -37.1%    2754880 ±  4%  numa-vmstat.node1.numa_local
  20328031 ±  3%     -79.3%    4210756 ±  4%  stress-ng.time.involuntary_context_switches
    122804            -2.7%     119428        stress-ng.time.minor_page_faults
      3225          +291.3%      12621        stress-ng.time.percent_of_cpu_this_job_got
      1682          +300.9%       6745        stress-ng.time.system_time
    258.08          +228.2%     847.00        stress-ng.time.user_time
 4.978e+08          +274.5%  1.864e+09        stress-ng.time.voluntary_context_switches
 5.565e+08          +271.0%  2.064e+09        stress-ng.timer.ops
   9278066          +270.9%   34416857        stress-ng.timer.ops_per_sec
  11504678 ±  3%     -37.2%    7230485 ±  5%  numa-meminfo.node1.Active
  11504678 ±  3%     -37.2%    7230485 ±  5%  numa-meminfo.node1.Active(anon)
     23268 ± 62%    +349.7%     104639 ± 96%  numa-meminfo.node1.AnonHugePages
    581787 ± 16%     +27.9%     744372 ± 13%  numa-meminfo.node1.AnonPages
    677481 ± 13%     +24.4%     842680 ± 12%  numa-meminfo.node1.AnonPages.max
  13694251 ±  8%     -44.4%    7608213 ± 18%  numa-meminfo.node1.FilePages
    872038 ±  3%     -27.4%     632983 ± 19%  numa-meminfo.node1.Mapped
  15974582 ±  7%     -37.5%    9978910 ± 14%  numa-meminfo.node1.MemUsed
  10923766 ±  3%     -40.6%    6486872 ±  5%  numa-meminfo.node1.Shmem
   2932299 ±  3%     -36.1%    1873947 ±  5%  proc-vmstat.nr_active_anon
   3627304 ±  2%     -30.2%    2531584 ±  3%  proc-vmstat.nr_file_pages
    223399 ±  3%     -15.3%     189137 ±  4%  proc-vmstat.nr_mapped
   2737051 ±  3%     -40.0%    1641331 ±  5%  proc-vmstat.nr_shmem
     45179            -5.5%      42672        proc-vmstat.nr_slab_reclaimable
   2932299 ±  3%     -36.1%    1873947 ±  5%  proc-vmstat.nr_zone_active_anon
    170124 ± 10%     +36.4%     232098 ± 20%  proc-vmstat.numa_hint_faults
    145600 ± 12%     +47.9%     215383 ± 20%  proc-vmstat.numa_hint_faults_local
   4945216 ±  2%     -30.7%    3428845 ±  4%  proc-vmstat.numa_hit
   4710249 ±  2%     -32.1%    3196862 ±  5%  proc-vmstat.numa_local
   5006509 ±  2%     -29.4%    3533388 ±  5%  proc-vmstat.pgalloc_normal
     31295 ±  2%      -9.6%      28276 ±  4%  proc-vmstat.pgreuse
      0.13 ±  2%     -64.8%       0.05 ±  7%  perf-stat.i.MPKI
  3.78e+10          +167.0%  1.009e+11        perf-stat.i.branch-instructions
      0.31            -0.1        0.18 ±  2%  perf-stat.i.branch-miss-rate%
  1.08e+08           +50.1%   1.62e+08 ±  2%  perf-stat.i.branch-misses
     10.00 ±  3%     +12.7       22.74 ±  5%  perf-stat.i.cache-miss-rate%
  25169188 ±  2%     -26.1%   18610547 ±  9%  perf-stat.i.cache-misses
 2.052e+08           -66.1%   69542808 ±  6%  perf-stat.i.cache-references
  15495581          +292.9%   60874742        perf-stat.i.context-switches
      3.39           -63.6%       1.23        perf-stat.i.cpi
 6.179e+11            +2.9%  6.361e+11        perf-stat.i.cpu-cycles
     32444 ±  4%     -58.8%      13369 ± 10%  perf-stat.i.cpu-migrations
     48598 ±  5%     +32.2%      64230 ± 16%  perf-stat.i.cycles-between-cache-misses
 1.853e+11          +182.2%  5.229e+11        perf-stat.i.instructions
      0.30          +171.0%       0.82        perf-stat.i.ipc
     69.14          +293.0%     271.70        perf-stat.i.metric.K/sec
      0.14 ±  3%     -78.8%       0.03 ± 45%  perf-stat.overall.MPKI
      0.29            -0.2        0.13 ± 44%  perf-stat.overall.branch-miss-rate%
      3.33           -69.5%       1.02 ± 44%  perf-stat.overall.cpi
  24822910 ±  3%     -40.2%   14833984 ± 45%  perf-stat.ps.cache-misses
 2.018e+08           -72.4%   55793889 ± 45%  perf-stat.ps.cache-references
     31979 ±  4%     -66.6%      10693 ± 45%  perf-stat.ps.cpu-migrations
    422827          +586.9%    2904535        sched_debug.cfs_rq:/.avg_vruntime.avg
   1494052 ±  8%    +167.2%    3992568 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
    106476 ±  2%   +2450.1%    2715268 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
    139865 ±  6%     -23.6%     106906 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.45 ±  4%     -22.9%       0.35 ±  7%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.31 ±  4%     +11.5%       0.35 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.44 ±  4%     -23.3%       0.34 ±  8%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      0.32 ±  4%      +9.0%       0.35 ±  4%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    150427 ±  6%     +73.4%     260771 ± 28%  sched_debug.cfs_rq:/.left_deadline.avg
   1011330 ± 32%    +201.1%    3044655 ±  2%  sched_debug.cfs_rq:/.left_deadline.max
    207464 ±  5%    +294.8%     819064 ± 13%  sched_debug.cfs_rq:/.left_deadline.stddev
    150380 ±  6%     +73.4%     260748 ± 28%  sched_debug.cfs_rq:/.left_vruntime.avg
   1011176 ± 32%    +201.1%    3044403 ±  2%  sched_debug.cfs_rq:/.left_vruntime.max
    207402 ±  5%    +294.9%     818991 ± 13%  sched_debug.cfs_rq:/.left_vruntime.stddev
      6671 ± 10%     -62.1%       2525 ±  9%  sched_debug.cfs_rq:/.load.avg
    115596 ±165%     -82.8%      19918 ± 13%  sched_debug.cfs_rq:/.load.max
    422827          +586.9%    2904535        sched_debug.cfs_rq:/.min_vruntime.avg
   1494052 ±  8%    +167.2%    3992568 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
    106476 ±  2%   +2450.1%    2715268 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
    139865 ±  6%     -23.6%     106906 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.45 ±  2%     -28.7%       0.32 ± 11%  sched_debug.cfs_rq:/.nr_queued.avg
      0.27 ±  4%     +24.3%       0.34 ±  4%  sched_debug.cfs_rq:/.nr_queued.stddev
    150380 ±  6%     +73.4%     260748 ± 28%  sched_debug.cfs_rq:/.right_vruntime.avg
   1011176 ± 32%    +201.1%    3044403 ±  2%  sched_debug.cfs_rq:/.right_vruntime.max
    207402 ±  5%    +294.9%     818991 ± 13%  sched_debug.cfs_rq:/.right_vruntime.stddev
    243.60 ±  4%     +73.1%     421.68        sched_debug.cfs_rq:/.runnable_avg.avg
      1043 ±  2%     +15.4%       1204 ±  5%  sched_debug.cfs_rq:/.runnable_avg.max
      1.75 ±177%  +13161.9%     232.08 ± 50%  sched_debug.cfs_rq:/.runnable_avg.min
    165.57 ±  6%     -14.7%     141.26 ±  8%  sched_debug.cfs_rq:/.runnable_avg.stddev
    188.04 ±  5%    +122.3%     418.02        sched_debug.cfs_rq:/.util_avg.avg
      1042 ±  2%     +14.3%       1191 ±  5%  sched_debug.cfs_rq:/.util_avg.max
      1.00 ±165%  +22908.3%     230.08 ± 50%  sched_debug.cfs_rq:/.util_avg.min
    163.44 ±  5%     -13.9%     140.64 ±  8%  sched_debug.cfs_rq:/.util_avg.stddev
    362.91 ±  3%     -18.1%     297.06 ±  3%  sched_debug.cfs_rq:/.util_est.avg
      1212 ±  5%     -23.4%     929.25 ± 12%  sched_debug.cfs_rq:/.util_est.max
   2932116 ±  2%     -44.9%    1616135 ± 22%  sched_debug.cpu.avg_idle.max
    351856 ± 10%     -36.7%     222709 ± 11%  sched_debug.cpu.avg_idle.stddev
     68743 ±  3%     +11.8%      76828 ±  2%  sched_debug.cpu.clock_task.avg
     54408 ±  4%     +14.8%      62434 ±  3%  sched_debug.cpu.clock_task.min
      1299 ±  5%     -18.2%       1062        sched_debug.cpu.clock_task.stddev
      1737 ±  2%     +49.0%       2587 ±  2%  sched_debug.cpu.curr->pid.avg
   1026070           -26.7%     752086        sched_debug.cpu.max_idle_balance_cost.avg
    663991 ± 11%     -24.7%     500000        sched_debug.cpu.max_idle_balance_cost.min
    138202 ± 11%     +81.9%     251453 ±  4%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 10%    +147.3%       0.00 ± 60%  sched_debug.cpu.next_balance.stddev
      0.11 ± 13%    +273.0%       0.43 ±  2%  sched_debug.cpu.nr_running.avg
   2070582          +293.3%    8144031        sched_debug.cpu.nr_switches.avg
   3165060 ± 12%    +180.5%    8877583        sched_debug.cpu.nr_switches.max
      2619 ± 10%  +1.2e+05%    3246312 ±  9%  sched_debug.cpu.nr_switches.min
    299563 ±  3%     +78.0%     533152 ±  5%  sched_debug.cpu.nr_switches.stddev



***************************************************************************************************
lkp-icl-2sp7: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/min-nanosleep/stress-ng/60s

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    798405 ±  3%      +2.9%     821573        vmstat.system.cs
      2.86 ±  4%      -2.1        0.73        mpstat.cpu.all.irq%
      1.10 ±  3%      -0.5        0.65 ±  2%  mpstat.cpu.all.sys%
      9026 ±  2%      -9.5%       8166        sched_debug.cfs_rq:/.avg_vruntime.avg
      9026 ±  2%      -9.5%       8166        sched_debug.cfs_rq:/.min_vruntime.avg
    212.91 ±  3%     -10.0%     191.61 ±  5%  sched_debug.cfs_rq:/.runnable_avg.stddev
      9680 ±  6%     -70.7%       2832 ±  6%  perf-c2c.DRAM.remote
     25889           -86.5%       3500 ±  8%  perf-c2c.HITM.local
      6625 ±  4%     -75.7%       1612 ±  7%  perf-c2c.HITM.remote
     32514           -84.3%       5112 ±  3%  perf-c2c.HITM.total
     72248            +2.3%      73879        stress-ng.min-nanosleep.ops
      1203            +2.3%       1231        stress-ng.min-nanosleep.ops_per_sec
     51.50           -14.6%      44.00        stress-ng.time.percent_of_cpu_this_job_got
     25.69           -22.6%      19.88        stress-ng.time.system_time
  25427905            +2.3%   26004874        stress-ng.time.voluntary_context_switches
      0.01 ±  8%     -30.8%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 11%     -20.0%       0.01 ±  8%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.00 ± 11%     -55.6%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ± 10%     -42.4%       0.01 ±  8%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ±157%     -77.9%       0.01 ± 14%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr
      0.01 ±  7%     -36.4%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.03 ± 24%     -59.1%       0.01 ± 35%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.21 ±202%     -94.8%       0.01 ± 25%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr
      0.00 ± 11%     -55.6%       0.00        perf-sched.total_sch_delay.average.ms
    210.45 ±  5%      -9.4%     190.66 ±  5%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
    331.12 ±  9%     +16.7%     386.42 ±  7%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1087 ±  5%     -15.5%     918.33 ±  6%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3884 ± 13%     -23.1%       2986 ± 17%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    210.44 ±  5%      -9.4%     190.66 ±  5%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      0.05 ±157%     -77.9%       0.01 ± 14%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr
    331.11 ±  9%     +16.7%     386.41 ±  7%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.21 ±202%     -94.8%       0.01 ± 25%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr
      3884 ± 13%     -23.1%       2986 ± 17%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.23 ±  8%     -20.1%       0.99 ± 10%  perf-stat.i.MPKI
 2.343e+09 ±  2%     -11.2%  2.081e+09        perf-stat.i.branch-instructions
  47874283 ±  2%      -9.4%   43364196        perf-stat.i.branch-misses
     23.05 ±  6%     +15.8       38.81 ±  6%  perf-stat.i.cache-miss-rate%
  14180414 ±  8%     -21.5%   11128252 ±  9%  perf-stat.i.cache-misses
  54095888 ±  3%     -59.0%   22190233 ±  3%  perf-stat.i.cache-references
    829020 ±  2%      +3.4%     857028        perf-stat.i.context-switches
      2.16           -25.0%       1.62        perf-stat.i.cpi
 2.267e+10 ±  2%     -32.3%  1.535e+10        perf-stat.i.cpu-cycles
      2865 ±  4%     +55.4%       4453 ±  5%  perf-stat.i.cycles-between-cache-misses
 1.126e+10 ±  2%     -10.4%  1.009e+10        perf-stat.i.instructions
      0.50           +29.2%       0.64        perf-stat.i.ipc
     12.95 ±  2%      +3.4%      13.39        perf-stat.i.metric.K/sec
      2.04            +0.0        2.08        perf-stat.overall.branch-miss-rate%
     26.21 ±  7%     +23.9       50.06 ±  6%  perf-stat.overall.cache-miss-rate%
      2.01 ±  2%     -24.5%       1.52        perf-stat.overall.cpi
      1608 ±  6%     -13.6%       1389 ±  7%  perf-stat.overall.cycles-between-cache-misses
      0.50 ±  2%     +32.4%       0.66        perf-stat.overall.ipc
 2.305e+09 ±  2%     -11.2%  2.047e+09        perf-stat.ps.branch-instructions
  47097166 ±  2%      -9.4%   42662359        perf-stat.ps.branch-misses
  13943941 ±  8%     -21.5%   10946497 ±  9%  perf-stat.ps.cache-misses
  53209541 ±  3%     -59.0%   21830983 ±  3%  perf-stat.ps.cache-references
    815603 ±  2%      +3.4%     843028        perf-stat.ps.context-switches
  2.23e+10 ±  2%     -32.3%   1.51e+10        perf-stat.ps.cpu-cycles
 1.108e+10 ±  2%     -10.4%  9.927e+09        perf-stat.ps.instructions
 6.878e+11           -11.6%  6.081e+11        perf-stat.total.instructions



***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/switch/stress-ng/60s

commit: 
  570c8efd5e ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
  cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")

570c8efd5eb79c37 cccb45d7c4295bbfeba616582d0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    743461           +12.7%     838242 ±  8%  meminfo.Mapped
      2.12            -0.2        1.89        mpstat.cpu.all.irq%
     10845           +11.6%      12103 ±  9%  uptime.idle
    186621           +11.9%     208920 ±  8%  proc-vmstat.nr_mapped
    858832            +5.8%     908319 ±  4%  proc-vmstat.pgfault
     24656 ± 11%   +1747.2%     455454 ± 77%  sched_debug.cfs_rq:/.load.max
    122.86 ±  5%     +17.4%     144.19 ±  5%  sched_debug.cfs_rq:/.util_avg.stddev
   2649501 ± 10%     +16.4%    3084791 ± 11%  numa-vmstat.node1.nr_file_pages
    136293 ± 21%     +29.2%     176049 ± 22%  numa-vmstat.node1.nr_mapped
     18637 ± 25%     +46.2%      27244 ± 18%  numa-vmstat.node1.nr_slab_reclaimable
  10586514 ± 10%     +16.7%   12354729 ± 11%  numa-meminfo.node1.FilePages
     74512 ± 25%     +46.3%     109039 ± 18%  numa-meminfo.node1.KReclaimable
    541917 ± 21%     +30.7%     708183 ± 21%  numa-meminfo.node1.Mapped
  12771253 ±  9%     +13.9%   14550141 ± 10%  numa-meminfo.node1.MemUsed
     74512 ± 25%     +46.3%     109039 ± 18%  numa-meminfo.node1.SReclaimable
    267327 ± 12%     +21.0%     323424 ±  8%  numa-meminfo.node1.Slab
     15650            -9.0%      14241 ±  2%  stress-ng.switch.nanosecs_per_context_switch_pipe_method
 8.583e+08           +10.0%  9.438e+08 ±  2%  stress-ng.switch.ops
  14311754           +10.0%   15738006 ±  2%  stress-ng.switch.ops_per_sec
  68162389           +10.3%   75160292 ±  7%  stress-ng.time.involuntary_context_switches
     14828            +1.0%      14978        stress-ng.time.percent_of_cpu_this_job_got
      8284            +1.1%       8378        stress-ng.time.system_time
 1.484e+09            +1.6%  1.507e+09        stress-ng.time.voluntary_context_switches
 6.215e+10            -3.1%  6.023e+10        perf-stat.i.branch-instructions
 5.429e+08            -2.6%  5.287e+08        perf-stat.i.branch-misses
  4.73e+09            +1.5%  4.801e+09        perf-stat.i.cache-references
  40304510            +2.1%   41144079        perf-stat.i.context-switches
      2.01            +3.7%       2.09        perf-stat.i.cpi
 3.101e+11            -2.8%  3.013e+11        perf-stat.i.instructions
      0.52            -3.0%       0.50        perf-stat.i.ipc
    218.21            +1.7%     221.87        perf-stat.i.metric.K/sec
      0.11            +4.4%       0.11        perf-stat.overall.MPKI
      1.94            +3.6%       2.01        perf-stat.overall.cpi
      0.51            -3.5%       0.50        perf-stat.overall.ipc
 6.104e+10            -4.3%  5.842e+10        perf-stat.ps.branch-instructions
 5.329e+08            -3.0%  5.167e+08        perf-stat.ps.branch-misses
 4.645e+09            +1.2%  4.702e+09        perf-stat.ps.cache-references
  39587505            +1.8%   40301573        perf-stat.ps.context-switches
 3.045e+11            -4.0%  2.923e+11        perf-stat.ps.instructions
 1.867e+13            -5.3%  1.769e+13 ±  2%  perf-stat.total.instructions





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2025-07-23  6:23 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23  6:22 [tip:sched/core] [sched/deadline] cccb45d7c4: will-it-scale.per_thread_ops 36.7% regression kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.