* [linus:master] [rseq]  abc850e761: stress-ng.sem.sem_wait_calls_per_sec 3.1% improvement
@ 2025-12-09 15:41 kernel test robot
From: kernel test robot @ 2025-12-09 15:41 UTC
  To: Thomas Gleixner
  Cc: oe-lkp, lkp, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Mathieu Desnoyers, oliver.sang



Hello,

kernel test robot noticed a 3.1% improvement of stress-ng.sem.sem_wait_calls_per_sec on:


commit: abc850e7616c91ebaa3f5ba3617ab0a104d45039 ("rseq: Provide and use rseq_update_user_cs()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sem
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251209/202512092342.3ee2de77-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/sem/stress-ng/60s

commit: 
  9c37cb6e80 ("rseq: Provide static branch for runtime debugging")
  abc850e761 ("rseq: Provide and use rseq_update_user_cs()")

9c37cb6e80b8fcdd abc850e7616c91ebaa3f5ba3617 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    713480 ± 29%     -24.9%     536114 ± 28%  meminfo.Mapped
  19261235 ± 14%     -28.3%   13815751 ± 45%  perf-sched.total_wait_and_delay.count.ms
  19261235 ± 14%     -28.3%   13815751 ± 45%  perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
    209285 ±  4%      -3.8%     201417 ±  3%  proc-vmstat.nr_anon_pages
    179839 ± 29%     -25.3%     134393 ± 28%  proc-vmstat.nr_mapped
      0.21            +0.0        0.22        perf-stat.i.branch-miss-rate%
 3.044e+08            +3.6%  3.154e+08        perf-stat.i.branch-misses
 1.933e+08            +3.5%  2.001e+08        perf-stat.i.context-switches
      0.93 ±  3%      +6.7%       0.99        perf-stat.i.metric.M/sec
      0.20            +0.0        0.21        perf-stat.overall.branch-miss-rate%
 2.996e+08            +3.6%  3.104e+08        perf-stat.ps.branch-misses
 1.903e+08            +3.5%   1.97e+08        perf-stat.ps.context-switches
 1.341e+10            +2.6%  1.377e+10        stress-ng.sem.ops
 2.235e+08            +2.6%  2.294e+08        stress-ng.sem.ops_per_sec
    374680            +3.1%     386364        stress-ng.sem.sem_timedwait_calls_per_sec
    374638            +3.2%     386525        stress-ng.sem.sem_trywait_calls_per_sec
    374649            +3.1%     386331        stress-ng.sem.sem_wait_calls_per_sec
 1.178e+10            +3.5%  1.219e+10        stress-ng.time.involuntary_context_switches
      7623            -1.2%       7530        stress-ng.time.system_time
      3803            +2.8%       3908        stress-ng.time.user_time
      8.44            -2.8        5.65        perf-profile.calltrace.cycles-pp.__rseq_handle_notify_resume.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      9.62            -2.6        7.02        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     59.80            -1.0       58.80        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     60.34            -1.0       59.38        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.83            +0.0        0.86        perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.__schedule.schedule.__x64_sys_sched_yield
      0.65            +0.0        0.68        perf-profile.calltrace.cycles-pp.__update_load_avg_se.update_load_avg.put_prev_entity.pick_next_task_fair.__pick_next_task
      1.00            +0.0        1.03        perf-profile.calltrace.cycles-pp.update_load_avg.set_next_entity.pick_next_task_fair.__pick_next_task.__schedule
      1.11            +0.0        1.15        perf-profile.calltrace.cycles-pp.__enqueue_entity.put_prev_entity.pick_next_task_fair.__pick_next_task.__schedule
      1.56            +0.1        1.62        perf-profile.calltrace.cycles-pp.update_load_avg.put_prev_entity.pick_next_task_fair.__pick_next_task.__schedule
      1.73            +0.1        1.79        perf-profile.calltrace.cycles-pp.native_sched_clock.sched_clock.sched_clock_cpu.update_rq_clock.yield_task_fair
      1.77            +0.1        1.84        perf-profile.calltrace.cycles-pp.sched_clock.sched_clock_cpu.update_rq_clock.yield_task_fair.do_sched_yield
      2.13            +0.1        2.20        perf-profile.calltrace.cycles-pp.__rdgsbase_inactive.__sched_yield
      1.84            +0.1        1.90        perf-profile.calltrace.cycles-pp.sched_clock_cpu.update_rq_clock.yield_task_fair.do_sched_yield.__x64_sys_sched_yield
      1.95            +0.1        2.02        perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__pick_next_task.__schedule.schedule
      2.06            +0.1        2.14        perf-profile.calltrace.cycles-pp.update_rq_clock.yield_task_fair.do_sched_yield.__x64_sys_sched_yield.do_syscall_64
      2.38            +0.1        2.46        perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
      2.98            +0.1        3.08        perf-profile.calltrace.cycles-pp.__wrgsbase_inactive.__sched_yield
      3.26            +0.1        3.38        perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__pick_next_task.__schedule.schedule
      5.15            +0.2        5.32        perf-profile.calltrace.cycles-pp.yield_task_fair.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.45            +0.2        6.66        perf-profile.calltrace.cycles-pp.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      3.28            +0.2        3.50        perf-profile.calltrace.cycles-pp.rseq_update_cpu_node_id.__rseq_handle_notify_resume.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      9.21            +0.3        9.50        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield
      9.41            +0.3        9.70        perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
     17.41            +0.5       17.88        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.59            +0.5       18.07        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     16.02            +0.6       16.60        perf-profile.calltrace.cycles-pp.os_xsave.__sched_yield
     24.16            +0.7       24.86        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     20.98            +0.7       21.71        perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.42            +0.8       24.26        perf-profile.calltrace.cycles-pp.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     25.22            +0.9       26.11        perf-profile.calltrace.cycles-pp.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      8.52            -2.6        5.88        perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      9.64            -2.6        7.04        perf-profile.children.cycles-pp.exit_to_user_mode_loop
     59.97            -1.0       58.99        perf-profile.children.cycles-pp.do_syscall_64
     60.46            -1.0       59.50        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.09            +0.0        0.10        perf-profile.children.cycles-pp.propagate_entity_load_avg
      0.12 ±  4%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.clock_gettime
      1.12            +0.0        1.16        perf-profile.children.cycles-pp.__enqueue_entity
      1.14            +0.0        1.18        perf-profile.children.cycles-pp.__update_load_avg_se
      1.97            +0.1        2.04        perf-profile.children.cycles-pp.set_next_entity
      2.38            +0.1        2.46        perf-profile.children.cycles-pp.prepare_task_switch
      2.41            +0.1        2.49        perf-profile.children.cycles-pp.__rdgsbase_inactive
      2.67            +0.1        2.77        perf-profile.children.cycles-pp.update_load_avg
      3.26            +0.1        3.37        perf-profile.children.cycles-pp.__wrgsbase_inactive
      3.30            +0.1        3.41        perf-profile.children.cycles-pp.put_prev_entity
      5.17            +0.2        5.34        perf-profile.children.cycles-pp.yield_task_fair
      6.49            +0.2        6.70        perf-profile.children.cycles-pp.do_sched_yield
      3.45            +0.2        3.68        perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      9.24            +0.3        9.54        perf-profile.children.cycles-pp.pick_next_task_fair
      9.43            +0.3        9.73        perf-profile.children.cycles-pp.__pick_next_task
     17.48            +0.5       17.96        perf-profile.children.cycles-pp.__schedule
     17.61            +0.5       18.09        perf-profile.children.cycles-pp.schedule
     16.04            +0.6       16.61        perf-profile.children.cycles-pp.os_xsave
     24.18            +0.7       24.88        perf-profile.children.cycles-pp.__x64_sys_sched_yield
     21.01            +0.7       21.75        perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
     23.44            +0.8       24.28        perf-profile.children.cycles-pp.switch_fpu_return
     25.24            +0.9       26.15        perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.67            +0.0        0.70        perf-profile.self.cycles-pp.___perf_sw_event
      0.92            +0.0        0.95        perf-profile.self.cycles-pp.update_curr
      1.11            +0.0        1.15        perf-profile.self.cycles-pp.__enqueue_entity
      0.80            +0.0        0.84        perf-profile.self.cycles-pp.update_load_avg
      0.70            +0.0        0.73        perf-profile.self.cycles-pp.pick_next_task_fair
      1.04            +0.0        1.08        perf-profile.self.cycles-pp.exit_to_user_mode_loop
      1.12            +0.1        1.17        perf-profile.self.cycles-pp.__update_load_avg_se
      1.78            +0.1        1.84        perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      1.48            +0.1        1.54        perf-profile.self.cycles-pp.prepare_task_switch
      0.69            +0.1        0.75        perf-profile.self.cycles-pp.do_syscall_64
      2.40            +0.1        2.48        perf-profile.self.cycles-pp.__rdgsbase_inactive
      3.12            +0.1        3.23        perf-profile.self.cycles-pp.__wrgsbase_inactive
     16.02            +0.6       16.61        perf-profile.self.cycles-pp.os_xsave
     21.00            +0.7       21.74        perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.37            +2.0        2.40        perf-profile.self.cycles-pp.__rseq_handle_notify_resume
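
The %change column above mixes two conventions: counter-style metrics (stress-ng, perf-stat, meminfo) appear to be reported as a relative change in percent, while the perf-profile rows appear to be reported as an absolute difference in percentage points of cycles. The sketch below recomputes a few rows under that reading. It is inferred from the numbers in this report, not taken from the lkp-tests sources, and the helper names `pct_change` and `point_delta` are hypothetical.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent (assumed convention for counter metrics)."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference in percentage points (assumed for perf-profile rows)."""
    return new - base


# (base, new) pairs copied from the table above:
# left column is 9c37cb6e80, right column is abc850e761.
counter_rows = {
    "stress-ng.sem.sem_wait_calls_per_sec": (374649, 386331),
    "stress-ng.sem.sem_trywait_calls_per_sec": (374638, 386525),
    "meminfo.Mapped": (713480, 536114),
}
profile_rows = {
    "children.cycles-pp.__rseq_handle_notify_resume": (8.52, 5.88),
    "self.cycles-pp.__rseq_handle_notify_resume": (0.37, 2.40),
}

for name, (base, new) in counter_rows.items():
    print(f"{name}: {pct_change(base, new):+.1f}%")

for name, (base, new) in profile_rows.items():
    print(f"{name}: {point_delta(base, new):+.1f}")
```

Read this way, the headline 3.1% is the relative gain in sem_wait calls per second, while the -2.6 on __rseq_handle_notify_resume (children) means that function and its callees account for 2.6 fewer percentage points of sampled cycles, not 2.6% less time.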




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

