Subject: [tip:sched/core] [sched/fair] e837456fdc: stress-ng.clock.ops_per_sec 3.4% regression
From: kernel test robot
Date: 2025-11-24 13:36 UTC
  To: Mel Gorman
  Cc: oe-lkp, lkp, linux-kernel, x86, Peter Zijlstra, aubrey.li,
	yu.c.chen, oliver.sang



Hello,


We reported an improvement in pts tests (also included in this report). We now
notice a small regression in stress-ng tests. Just FYI.


kernel test robot noticed a 3.4% regression of stress-ng.clock.ops_per_sec on:


commit: e837456fdca81899a3c8e47b3fd39e30eae6e291 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P  CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: clock
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | pts: pts.quadray.1.1080p.fps 23.2% improvement                                                |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters  | cpufreq_governor=performance                                                                  |
|                  | need_x=true                                                                                   |
|                  | option_a=5                                                                                    |
|                  | option_b=1080p                                                                                |
|                  | test=quadray-1.0.0                                                                            |
+------------------+-----------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202511242155.286a5a75-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251124/202511242155.286a5a75-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/clock/stress-ng/60s
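
The two lines above pair a slash-separated list of parameter names with a
slash-separated list of values in the same order. A minimal sketch of reading
them back into a mapping follows. It assumes no value contains a "/", which
holds for this report but is not guaranteed in general, and the function name
parse_params is hypothetical, not part of lkp-tests.

```python
def parse_params(keys_line: str, values_line: str) -> dict[str, str]:
    """Pair the slash-separated parameter names with their values."""
    # The names line ends with a colon; both lines may carry indentation.
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError("key/value count mismatch")
    return dict(zip(keys, values))


keys_line = ("compiler/cpufreq_governor/kconfig/nr_threads/rootfs/"
             "tbox_group/test/testcase/testtime:")
values_line = ("  gcc-14/performance/x86_64-rhel-9.4/100%/"
               "debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/clock/stress-ng/60s")

params = parse_params(keys_line, values_line)
print(params["testcase"], params["test"], params["testtime"])
```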

commit: 
  aceccac58a ("sched/fair: Enable scheduler feature NEXT_BUDDY")
  e837456fdc ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")

aceccac58ad76305 e837456fdca81899a3c8e47b3fd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.032e+09 ±  3%     +28.0%  1.321e+09 ±  2%  cpuidle..time
    897841 ±  5%     -15.0%     763285 ±  7%  numa-numastat.node0.local_node
    896471 ±  5%     -14.8%     763770 ±  7%  numa-vmstat.node0.numa_local
     45743            +1.1%      46225        proc-vmstat.nr_kernel_stack
      5.60            +2.0        7.57        mpstat.cpu.all.idle%
      0.77 ±  3%      +0.1        0.90        mpstat.cpu.all.irq%
     59739 ±  2%     -11.1%      53102 ±  4%  perf-c2c.HITM.local
     17302 ±  2%      -9.8%      15614 ±  3%  perf-c2c.HITM.remote
     77041 ±  2%     -10.8%      68716 ±  3%  perf-c2c.HITM.total
   5452857            -3.4%    5269019        stress-ng.clock.ops
     90934            -3.4%      87863        stress-ng.clock.ops_per_sec
     84662           +20.6%     102102 ±  3%  stress-ng.time.involuntary_context_switches
     24199            -2.3%      23648        stress-ng.time.percent_of_cpu_this_job_got
     14530            -2.3%      14199        stress-ng.time.system_time
      0.30 ± 44%      +0.1        0.40        turbostat.C1%
      0.95 ± 44%      +0.4        1.34 ±  3%  turbostat.C1E%
      2.01 ± 44%      +2.2        4.17 ±  3%  turbostat.C6%
      0.36 ± 44%    +121.2%       0.80 ±  4%  turbostat.CPU%c1
      2.07 ± 48%     +62.7%       3.37 ±  2%  turbostat.CPU%c6
     75763           -10.4%      67848        sched_debug.cfs_rq:/.avg_vruntime.avg
     33867 ±  4%     -15.8%      28530 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.min
     75459           -10.4%      67579        sched_debug.cfs_rq:/.zero_vruntime.avg
     33783 ±  4%     -16.3%      28291 ±  7%  sched_debug.cfs_rq:/.zero_vruntime.min
    110597 ± 16%     +41.9%     156958 ± 12%  sched_debug.cpu.max_idle_balance_cost.stddev
    -36.17           -30.4%     -25.17        sched_debug.cpu.nr_uninterruptible.min
     14.46 ±  6%     -20.0%      11.57 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
 4.239e+10            -2.3%   4.14e+10        perf-stat.i.branch-instructions
      0.08            +0.0        0.08        perf-stat.i.branch-miss-rate%
  33247700            +2.5%   34067265        perf-stat.i.branch-misses
     28.18            -0.8       27.42        perf-stat.i.cache-miss-rate%
  90091009            -3.5%   86913480        perf-stat.i.cache-misses
 3.205e+08            -1.0%  3.175e+08        perf-stat.i.cache-references
 8.768e+11            -2.2%  8.572e+11        perf-stat.i.cpu-cycles
      1621            -8.3%       1486        perf-stat.i.cpu-migrations
      9765            +1.6%       9920        perf-stat.i.cycles-between-cache-misses
 2.121e+11            -2.4%   2.07e+11        perf-stat.i.instructions
      0.42            -1.2%       0.42        perf-stat.overall.MPKI
      0.08            +0.0        0.08        perf-stat.overall.branch-miss-rate%
     28.11            -0.7       27.37        perf-stat.overall.cache-miss-rate%
      9732            +1.4%       9867        perf-stat.overall.cycles-between-cache-misses
 4.168e+10            -2.3%   4.07e+10        perf-stat.ps.branch-instructions
  32683973            +2.4%   33483421        perf-stat.ps.branch-misses
  88576640            -3.6%   85423812        perf-stat.ps.cache-misses
 3.151e+08            -0.9%  3.121e+08        perf-stat.ps.cache-references
  8.62e+11            -2.2%  8.429e+11        perf-stat.ps.cpu-cycles
      1594            -8.2%       1463        perf-stat.ps.cpu-migrations
 2.085e+11            -2.4%  2.036e+11        perf-stat.ps.instructions
 1.262e+13            -2.3%  1.233e+13        perf-stat.total.instructions
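
As a reading aid for the table above: the %change column appears to be the
relative difference between the two commits' mean values, while rows whose
metric is already a percentage (for example mpstat.cpu.all.idle%) show an
absolute difference without a "%" sign. The sketch below checks that reading
against three rows. The helper names are hypothetical, and because the inputs
are the rounded means printed in the table, results may differ from the
robot's own figures in the last digit for other rows.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute change, used for metrics that are already percentages."""
    return new - base


# stress-ng.clock.ops_per_sec: 90934 -> 87863
print(f"{pct_change(90934, 87863):+.1f}%")   # -3.4%

# stress-ng.time.involuntary_context_switches: 84662 -> 102102
print(f"{pct_change(84662, 102102):+.1f}%")  # +20.6%

# mpstat.cpu.all.idle%: 5.60 -> 7.57 (percentage points)
print(f"{point_change(5.60, 7.57):+.1f}")    # +2.0
```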


***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/need_x/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/true/5/1080p/debian-12-x86_64-phoronix/lkp-csl-2sp7/quadray-1.0.0/pts

commit: 
  aceccac58a ("sched/fair: Enable scheduler feature NEXT_BUDDY")
  e837456fdc ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")

aceccac58ad76305 e837456fdca81899a3c8e47b3fd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  15179434 ± 18%     +51.8%   23046485 ± 15%  meminfo.DirectMap2M
      0.37            +0.0        0.41        mpstat.cpu.all.sys%
   1055135 ± 31%     -53.4%     492047 ± 91%  numa-meminfo.node0.Shmem
    263798 ± 31%     -53.4%     122995 ± 91%  numa-vmstat.node0.nr_shmem
     21104            +9.2%      23039        vmstat.system.cs
    371194            +1.8%     378021        proc-vmstat.nr_shmem
      4628            +2.9%       4762        proc-vmstat.numa_huge_pte_updates
     18161 ±  6%     +22.2%      22195 ± 11%  sched_debug.cfs_rq:/system.slice.avg_vruntime.stddev
     18051 ±  6%     +22.4%      22090 ± 11%  sched_debug.cfs_rq:/system.slice.zero_vruntime.stddev
     16231           +11.2%      18053        sched_debug.cpu.nr_switches.avg
      6.26 ± 10%     -19.7%       5.03 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
     39.69 ±  2%     +23.2%      48.89        pts.quadray.1.1080p.fps
      3245            +2.4%       3323        pts.time.percent_of_cpu_this_job_got
     39.97 ±  2%     +14.1%      45.62        pts.time.system_time
      4456            +2.2%       4555        pts.time.user_time
   1290701            +9.9%    1419060        pts.time.voluntary_context_switches
  6.13e+09            +2.8%  6.299e+09        perf-stat.i.branch-instructions
     21517            +9.0%      23449        perf-stat.i.context-switches
 6.949e+10            +2.3%  7.108e+10        perf-stat.i.cpu-cycles
    203.07            +7.8%     218.95        perf-stat.i.cpu-migrations
      8980 ±  2%      -7.8%       8279        perf-stat.i.cycles-between-cache-misses
 2.515e+10            +2.9%  2.589e+10        perf-stat.i.dTLB-loads
  8.21e+09            +2.7%  8.429e+09        perf-stat.i.dTLB-stores
 6.904e+10            +2.9%  7.102e+10        perf-stat.i.instructions
      0.72            +2.3%       0.74        perf-stat.i.metric.GHz
    514.91 ±  2%      -9.4%     466.42 ±  5%  perf-stat.i.metric.K/sec
    412.78            +2.9%     424.58        perf-stat.i.metric.M/sec
    372305 ±  2%     +20.8%     449603 ±  5%  perf-stat.i.node-store-misses
      1.81            -0.0        1.77        perf-stat.overall.branch-miss-rate%
     57.23 ±  2%      +4.3       61.51 ±  3%  perf-stat.overall.node-store-miss-rate%
 6.085e+09            +2.8%  6.254e+09        perf-stat.ps.branch-instructions
     21348            +9.0%      23273        perf-stat.ps.context-switches
   6.9e+10            +2.3%  7.059e+10        perf-stat.ps.cpu-cycles
    201.61            +7.8%     217.38        perf-stat.ps.cpu-migrations
 2.498e+10            +2.9%  2.571e+10        perf-stat.ps.dTLB-loads
 8.154e+09            +2.7%  8.373e+09        perf-stat.ps.dTLB-stores
 6.855e+10            +2.9%  7.053e+10        perf-stat.ps.instructions
    369608 ±  2%     +20.8%     446351 ±  5%  perf-stat.ps.node-store-misses
 9.519e+12            +3.0%  9.803e+12        perf-stat.total.instructions
      0.00 ±223%   +1490.9%       0.03 ±109%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±  9%     -94.7%       0.00 ±111%  perf-sched.sch_delay.avg.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
      0.06 ± 21%     -86.5%       0.01 ± 17%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
      0.12 ± 31%     -89.2%       0.01 ± 68%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.06 ± 67%     -67.6%       0.02 ± 40%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.03 ±  2%     -36.3%       0.02 ±  3%  perf-sched.sch_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
      0.21 ± 12%     -87.8%       0.03 ±  7%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.05 ± 14%     -78.5%       0.01 ± 14%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      0.03 ± 43%     -70.0%       0.01 ±  9%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.22 ± 56%     -94.6%       0.01 ± 41%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_select
      0.02 ± 27%     -54.8%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.00 ±223%   +3636.4%       0.07 ±169%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.11 ± 92%     -98.3%       0.02 ± 11%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.06 ±  6%     -98.4%       0.03 ± 15%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
      3.22 ±  4%     -63.6%       1.17 ± 98%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.67 ±129%    +808.5%       6.12 ± 52%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.00 ±223%   +4066.7%       0.04 ±153%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.02 ± 32%    +715.6%       0.12 ±149%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.22 ± 14%     -43.5%       1.25 ± 30%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      1.84 ± 48%     -91.5%       0.16 ±169%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      1.68 ± 55%     -94.8%       0.09 ±164%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_select
      0.35 ±130%     -93.5%       0.02 ± 25%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.34 ± 35%     -97.2%       0.04 ± 48%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.03 ±  3%     -37.6%       0.02 ±  3%  perf-sched.total_sch_delay.average.ms
      8.33 ±  4%     -18.9%       6.76 ±  3%  perf-sched.total_wait_and_delay.average.ms
    162856           +20.3%     195924        perf-sched.total_wait_and_delay.count.ms
      8.30 ±  4%     -18.8%       6.74 ±  3%  perf-sched.total_wait_time.average.ms
    213.84 ± 51%     -87.7%      26.25 ± 77%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     15.84 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
    800.53           -15.3%     677.82 ± 13%  perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      2.37 ±  2%     -21.9%       1.85        perf-sched.wait_and_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
      1.38 ±  9%     -51.8%       0.67 ± 42%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     38.24 ±  7%     -12.2%      33.58 ±  7%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     61.60 ±  3%     -24.2%      46.70 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
     13.33 ± 24%    +217.5%      42.33 ±  9%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.33 ±141%   +5300.0%      18.00 ± 12%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    196.50 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.ww_mutex_lock.drm_gem_vunmap.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
    152066 ±  2%     +19.5%     181763        perf-sched.wait_and_delay.count.futex_do_wait.__futex_wait.futex_wait.do_futex
      1290 ±  6%     +27.7%       1647 ± 12%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    272.83           +16.9%     319.00 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    238.33 ±  4%     +24.8%     297.33        perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      2356 ± 47%     -66.5%     788.64 ± 92%  perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     44.95 ±147%   +2131.0%       1002 ± 69%  perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     59.34 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
      0.01 ±146%  +30486.1%       1.84 ± 31%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
    213.84 ± 51%     -87.7%      26.25 ± 77%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     15.82 ±  2%     -96.0%       0.63 ± 66%  perf-sched.wait_time.avg.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
    800.27           -15.3%     677.81 ± 13%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.61 ±  5%     +15.0%       0.70 ±  7%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.34 ±  2%     -21.8%       1.83        perf-sched.wait_time.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
      1.05 ±  2%    +199.9%       3.15        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.11 ± 17%   +1066.9%       1.23 ± 12%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.47 ± 38%    +148.3%       1.17 ± 30%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     38.23 ±  7%     -12.2%      33.58 ±  7%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     61.57 ±  3%     -24.2%      46.69 ±  2%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.00 ± 49%   +5108.3%       0.10 ± 73%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.01 ±142%  +28030.6%       2.30 ± 23%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      2356 ± 47%     -66.5%     788.64 ± 92%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     45.07 ±146%   +2125.3%       1002 ± 69%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     11.69 ±  6%     -17.3%       9.67 ±  9%  perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
    237.73 ±143%    +561.4%       1572 ± 61%  perf-sched.wait_time.max.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
      3.69 ±  4%   +1614.4%      63.33 ± 48%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      3.90 ± 26%    +959.8%      41.38 ±100%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.04 ± 50%   +5947.9%       2.19 ± 52%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

