* [linus:master] [sched/core] ea9cffc0a1: stream.triad_bandwidth_MBps 1.1% improvement
From: kernel test robot @ 2024-12-20 2:46 UTC
To: K Prateek Nayak
Cc: oe-lkp, lkp, linux-kernel, Peter Zijlstra, aubrey.li, yu.c.chen,
oliver.sang
Hello,
kernel test robot noticed a 1.1% improvement of stream.triad_bandwidth_MBps on:
commit: ea9cffc0a154124821531991d5afdd7e8b20d7aa ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: stream
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
parameters:
	nr_threads: 50%
	iterations: 10x
	array_size: 50000000
	loop: 100
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241220/202412201007.aa43a5fa-lkp@intel.com
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-9.4/100/50%/debian-12-x86_64-20240206.cgz/lkp-skl-d02/stream
commit:
  6675ce2004 ("softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel")
  ea9cffc0a1 ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")
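Columns, left to right: value on 6675ce20046d149e (± %stddev where reported), %change,
value on ea9cffc0a154124821531991d5a (± %stddev where reported), metric
-------------------------------------------------------------------------------------------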
15264 +23.1% 18793 meminfo.Shmem
0.02 ± 4% +0.0 0.03 ± 4% mpstat.cpu.all.soft%
3818 +23.1% 4700 proc-vmstat.nr_shmem
587.28 +302.4% 2363 vmstat.system.cs
2577 -3.5% 2488 vmstat.system.in
36673 ± 2% +164.6% 97051 ± 2% sched_debug.cpu.nr_switches.avg
53585 ± 10% +332.2% 231568 ± 16% sched_debug.cpu.nr_switches.max
12003 ± 23% +578.7% 81463 ± 24% sched_debug.cpu.nr_switches.stddev
578.05 +310.5% 2372 perf-stat.i.context-switches
14.72 ± 4% +10.8% 16.30 perf-stat.i.cpu-migrations
0.04 ± 5% +268.8% 0.15 perf-stat.i.metric.K/sec
575.63 +310.5% 2363 perf-stat.ps.context-switches
14.65 ± 4% +10.8% 16.23 perf-stat.ps.cpu-migrations
18760 +1.0% 18950 stream.add_bandwidth_MBps
18759 +1.0% 18948 stream.add_bandwidth_MBps_harmonicMean
14581 +1.2% 14751 stream.scale_bandwidth_MBps
14580 +1.2% 14748 stream.scale_bandwidth_MBps_harmonicMean
18289 +1.1% 18487 stream.triad_bandwidth_MBps
18287 +1.1% 18484 stream.triad_bandwidth_MBps_harmonicMean
0.02 ± 12% -32.3% 0.01 ± 16% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.02 ± 42% -48.6% 0.01 ± 7% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.10 ± 70% +332.7% 0.44 ± 95% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
65.81 ± 3% -68.0% 21.05 ± 3% perf-sched.total_wait_and_delay.average.ms
2011 +229.0% 6618 ± 4% perf-sched.total_wait_and_delay.count.ms
65.80 ± 3% -68.0% 21.04 ± 3% perf-sched.total_wait_time.average.ms
3.86 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
500.54 +24.3% 622.17 ± 5% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
497.31 ± 14% -98.6% 6.72 ± 7% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
19.83 ± 22% -100.0% 0.00 perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
53.83 ± 9% +8594.4% 4680 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
21.00 -100.0% 0.00 perf-sched.wait_and_delay.count.wait_for_partner.fifo_open.do_dentry_open.vfs_open
3666 ± 51% -72.7% 1000 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
4.04 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
1001 +136.0% 2362 ± 8% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.05 ± 37% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
500.52 +24.3% 622.15 ± 5% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
497.29 ± 14% -98.7% 6.71 ± 7% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.00 ±165% +525.0% 0.00 ± 68% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
3666 ± 51% -72.7% 1000 perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1001 +136.0% 2362 ± 8% perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.01 ±142% +247.8% 0.04 ± 54% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
97.56 -0.4 97.12 perf-profile.calltrace.cycles-pp.main
1.17 ± 2% +0.2 1.42 ± 6% perf-profile.calltrace.cycles-pp.common_startup_64
97.61 -0.4 97.17 perf-profile.children.cycles-pp.main
0.02 ±141% +0.1 0.07 ± 14% perf-profile.children.cycles-pp.poll_idle
0.00 +0.1 0.06 ± 15% perf-profile.children.cycles-pp.__hrtimer_start_range_ns
0.00 +0.1 0.06 ± 15% perf-profile.children.cycles-pp.dequeue_entity
0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.enqueue_dl_entity
0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.dl_server_start
0.00 +0.1 0.06 ± 17% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.00 +0.1 0.06 ± 21% perf-profile.children.cycles-pp.pick_next_task_fair
0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.1 0.08 ± 17% perf-profile.children.cycles-pp.__pick_next_task
0.00 +0.1 0.10 ± 19% perf-profile.children.cycles-pp.dequeue_entities
0.00 +0.1 0.11 ± 17% perf-profile.children.cycles-pp.dequeue_task_fair
0.00 +0.1 0.11 ± 18% perf-profile.children.cycles-pp.try_to_block_task
0.01 ±223% +0.1 0.14 ± 8% perf-profile.children.cycles-pp.enqueue_task
0.00 +0.1 0.13 ± 8% perf-profile.children.cycles-pp.enqueue_task_fair
0.01 ±223% +0.1 0.14 ± 8% perf-profile.children.cycles-pp.ttwu_do_activate
0.05 ± 7% +0.2 0.20 ± 11% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.00 +0.2 0.18 ± 11% perf-profile.children.cycles-pp.try_to_wake_up
0.07 ± 14% +0.2 0.25 ± 18% perf-profile.children.cycles-pp.kthread
0.07 ± 8% +0.2 0.25 ± 18% perf-profile.children.cycles-pp.ret_from_fork
0.07 ± 8% +0.2 0.25 ± 19% perf-profile.children.cycles-pp.ret_from_fork_asm
0.02 ±141% +0.2 0.20 ± 20% perf-profile.children.cycles-pp.schedule
0.00 +0.2 0.19 ± 24% perf-profile.children.cycles-pp.smpboot_thread_fn
0.05 ± 8% +0.2 0.25 ± 21% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.17 ± 2% +0.2 1.42 ± 6% perf-profile.children.cycles-pp.common_startup_64
1.17 ± 2% +0.2 1.42 ± 6% perf-profile.children.cycles-pp.cpu_startup_entry
1.17 ± 2% +0.2 1.42 ± 6% perf-profile.children.cycles-pp.do_idle
0.09 ± 39% +0.2 0.34 ± 20% perf-profile.children.cycles-pp.__schedule
97.30 -0.4 96.86 perf-profile.self.cycles-pp.main
0.02 ±141% +0.0 0.06 ± 14% perf-profile.self.cycles-pp.poll_idle
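The %change column in the table above can be spot-checked from the two value columns. The following is a minimal sketch, assuming %change is the plain relative difference (new - base) / base; the helper names `pct_change` and `fmt_change` are illustrative and are not part of lkp-tests. The table prints rounded values, so recomputing other rows from them may differ in the last digit. The perf-profile rows show an unsuffixed delta (for example +0.2), which appears to be a difference in percentage points and is not covered here.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed definition)."""
    return (new - base) / base * 100.0


def fmt_change(base: float, new: float) -> str:
    """Format the change with an explicit sign and one decimal, like the report."""
    return f"{pct_change(base, new):+.1f}%"


# (metric, value on 6675ce2004, value on ea9cffc0a1), copied from the table above.
ROWS = [
    ("stream.triad_bandwidth_MBps", 18289, 18487),
    ("vmstat.system.cs", 587.28, 2363),
    ("meminfo.Shmem", 15264, 18793),
    ("vmstat.system.in", 2577, 2488),
]

if __name__ == "__main__":
    for metric, base, new in ROWS:
        print(f"{fmt_change(base, new):>8}  {metric}")
```

For these four rows the recomputed figures agree with the reported +1.1%, +302.4%, +23.1% and -3.5%.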
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki