* [linus:master] [sched/core]  ea9cffc0a1: stream.triad_bandwidth_MBps 1.1% improvement
From: kernel test robot @ 2024-12-20  2:46 UTC
  To: K Prateek Nayak
  Cc: oe-lkp, lkp, linux-kernel, Peter Zijlstra, aubrey.li, yu.c.chen,
	oliver.sang



Hello,

kernel test robot noticed a 1.1% improvement of stream.triad_bandwidth_MBps on:


commit: ea9cffc0a154124821531991d5afdd7e8b20d7aa ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stream
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
parameters:

	nr_threads: 50%
	iterations: 10x
	array_size: 50000000
	loop: 100
	cpufreq_governor: performance
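
For orientation, the parameters above can be turned into rough sizes. This is a minimal sketch, not part of the lkp tooling. It assumes that "nr_threads: 50%" is taken relative to the 4 hardware threads of the test machine, and that the benchmark uses STREAM's usual layout of three arrays of 8-byte doubles; neither detail is stated in this report. The helper names are hypothetical.

```python
# Rough sizing for the stream job described above.
# Assumptions (not stated in the report): 8-byte double elements and
# three arrays (a, b, c), as in the stock STREAM benchmark.
ELEMENT_BYTES = 8
NUM_ARRAYS = 3


def worker_threads(machine_threads: int, nr_threads_pct: int) -> int:
    """Thread count implied by a percentage nr_threads setting (assumed semantics)."""
    return max(1, machine_threads * nr_threads_pct // 100)


def footprint_bytes(array_size: int) -> int:
    """Total memory held by the three arrays."""
    return NUM_ARRAYS * array_size * ELEMENT_BYTES


def triad_bytes_per_pass(array_size: int) -> int:
    """Bytes counted for one triad pass: a[i] = b[i] + q * c[i].

    Each element costs two reads (b, c) and one write (a).
    """
    return 3 * array_size * ELEMENT_BYTES


if __name__ == "__main__":
    print("threads:", worker_threads(4, 50))
    print("footprint (GB):", footprint_bytes(50_000_000) / 1e9)
    print("triad bytes per pass:", triad_bytes_per_pass(50_000_000))
```

Under these assumptions the job runs 2 threads over roughly 1.2 GB of arrays, far larger than the CPU caches, so the reported bandwidth figures reflect main-memory traffic.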



Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241220/202412201007.aa43a5fa-lkp@intel.com

=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-9.4/100/50%/debian-12-x86_64-20240206.cgz/lkp-skl-d02/stream

commit: 
  6675ce2004 ("softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel")
  ea9cffc0a1 ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")

6675ce20046d149e ea9cffc0a154124821531991d5a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     15264           +23.1%      18793        meminfo.Shmem
      0.02 ±  4%      +0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      3818           +23.1%       4700        proc-vmstat.nr_shmem
    587.28          +302.4%       2363        vmstat.system.cs
      2577            -3.5%       2488        vmstat.system.in
     36673 ±  2%    +164.6%      97051 ±  2%  sched_debug.cpu.nr_switches.avg
     53585 ± 10%    +332.2%     231568 ± 16%  sched_debug.cpu.nr_switches.max
     12003 ± 23%    +578.7%      81463 ± 24%  sched_debug.cpu.nr_switches.stddev
    578.05          +310.5%       2372        perf-stat.i.context-switches
     14.72 ±  4%     +10.8%      16.30        perf-stat.i.cpu-migrations
      0.04 ±  5%    +268.8%       0.15        perf-stat.i.metric.K/sec
    575.63          +310.5%       2363        perf-stat.ps.context-switches
     14.65 ±  4%     +10.8%      16.23        perf-stat.ps.cpu-migrations
     18760            +1.0%      18950        stream.add_bandwidth_MBps
     18759            +1.0%      18948        stream.add_bandwidth_MBps_harmonicMean
     14581            +1.2%      14751        stream.scale_bandwidth_MBps
     14580            +1.2%      14748        stream.scale_bandwidth_MBps_harmonicMean
     18289            +1.1%      18487        stream.triad_bandwidth_MBps
     18287            +1.1%      18484        stream.triad_bandwidth_MBps_harmonicMean
      0.02 ± 12%     -32.3%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 42%     -48.6%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.10 ± 70%    +332.7%       0.44 ± 95%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
     65.81 ±  3%     -68.0%      21.05 ±  3%  perf-sched.total_wait_and_delay.average.ms
      2011          +229.0%       6618 ±  4%  perf-sched.total_wait_and_delay.count.ms
     65.80 ±  3%     -68.0%      21.04 ±  3%  perf-sched.total_wait_time.average.ms
      3.86 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    500.54           +24.3%     622.17 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    497.31 ± 14%     -98.6%       6.72 ±  7%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 15%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     19.83 ± 22%    -100.0%       0.00        perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     53.83 ±  9%   +8594.4%       4680 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     21.00          -100.0%       0.00        perf-sched.wait_and_delay.count.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      3666 ± 51%     -72.7%       1000        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      4.04          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      1001          +136.0%       2362 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.05 ± 37%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    500.52           +24.3%     622.15 ±  5%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    497.29 ± 14%     -98.7%       6.71 ±  7%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ±165%    +525.0%       0.00 ± 68%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      3666 ± 51%     -72.7%       1000        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1001          +136.0%       2362 ±  8%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.01 ±142%    +247.8%       0.04 ± 54%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     97.56            -0.4       97.12        perf-profile.calltrace.cycles-pp.main
      1.17 ±  2%      +0.2        1.42 ±  6%  perf-profile.calltrace.cycles-pp.common_startup_64
     97.61            -0.4       97.17        perf-profile.children.cycles-pp.main
      0.02 ±141%      +0.1        0.07 ± 14%  perf-profile.children.cycles-pp.poll_idle
      0.00            +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.__hrtimer_start_range_ns
      0.00            +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.dequeue_entity
      0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.enqueue_dl_entity
      0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.dl_server_start
      0.00            +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
      0.00            +0.1        0.06 ± 21%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.update_load_avg
      0.00            +0.1        0.08 ± 17%  perf-profile.children.cycles-pp.__pick_next_task
      0.00            +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.dequeue_entities
      0.00            +0.1        0.11 ± 17%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.00            +0.1        0.11 ± 18%  perf-profile.children.cycles-pp.try_to_block_task
      0.01 ±223%      +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.enqueue_task
      0.00            +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.01 ±223%      +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.05 ±  7%      +0.2        0.20 ± 11%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.00            +0.2        0.18 ± 11%  perf-profile.children.cycles-pp.try_to_wake_up
      0.07 ± 14%      +0.2        0.25 ± 18%  perf-profile.children.cycles-pp.kthread
      0.07 ±  8%      +0.2        0.25 ± 18%  perf-profile.children.cycles-pp.ret_from_fork
      0.07 ±  8%      +0.2        0.25 ± 19%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.02 ±141%      +0.2        0.20 ± 20%  perf-profile.children.cycles-pp.schedule
      0.00            +0.2        0.19 ± 24%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.05 ±  8%      +0.2        0.25 ± 21%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      1.17 ±  2%      +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.common_startup_64
      1.17 ±  2%      +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.17 ±  2%      +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.do_idle
      0.09 ± 39%      +0.2        0.34 ± 20%  perf-profile.children.cycles-pp.__schedule
     97.30            -0.4       96.86        perf-profile.self.cycles-pp.main
      0.02 ±141%      +0.0        0.06 ± 14%  perf-profile.self.cycles-pp.poll_idle
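
The %change column in the table above appears to be the relative difference between the means for the two commits, while the perf-profile and mpstat rows appear to show an absolute difference in percentage points. The following is a minimal sketch under that assumption. The function names are hypothetical, and the inputs are the rounded values printed in the table, so a recomputed figure can differ from the printed one in the last digit.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, for rows already expressed in percent."""
    return new - base


# (base, new) pairs copied from the table: 6675ce2004 vs ea9cffc0a1
relative_rows = {
    "stream.triad_bandwidth_MBps": (18289, 18487),
    "vmstat.system.cs": (587.28, 2363),
    "meminfo.Shmem": (15264, 18793),
}

absolute_rows = {
    "perf-profile.calltrace.cycles-pp.main": (97.56, 97.12),
}

if __name__ == "__main__":
    for name, (base, new) in relative_rows.items():
        print(f"{name}: {pct_change(base, new):+.1f}%")
    for name, (base, new) in absolute_rows.items():
        print(f"{name}: {point_delta(base, new):+.1f}")
```

Read this way, the headline 1.1% bandwidth gain sits alongside a roughly fourfold rise in context switches (587.28 to 2363 per second).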




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

