From: kernel test robot <oliver.sang@intel.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Julia Lawall <julia.lawall@inria.fr>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	<aubrey.li@linux.intel.com>, <yu.c.chen@intel.com>,
	<oliver.sang@intel.com>
Subject: [linus:master] [sched/core]  e932c4ab38: aim9.sync_disk_cp.ops_per_sec 2.3% improvement
Date: Tue, 24 Dec 2024 16:34:05 +0800
Message-ID: <202412241607.dc13db91-lkp@intel.com>



Hello,

kernel test robot noticed a 2.3% improvement of aim9.sync_disk_cp.ops_per_sec on:


commit: e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: aim9
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory
parameters:

	testtime: 300s
	test: sync_disk_cp
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following test:

+------------------+-----------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput 2.4% improvement                  |
| test machine     | 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory |
| test parameters  | cpufreq_governor=performance                                                |
|                  | runtime=300s                                                                |
|                  | test=migrate                                                                |
+------------------+-----------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412241607.dc13db91-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-skl-d06/sync_disk_cp/aim9/300s

commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")

ff47a0acfcce309c e932c4ab38f072ce5894b2851fe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    779244            +2.3%     797195        aim9.sync_disk_cp.ops_per_sec
    444185 ±  2%     -51.7%     214738 ±  3%  cpuidle..usage
     40.83 ± 15%     -84.5%       6.33 ± 23%  perf-c2c.HITM.local
   6505472 ± 12%     +21.6%    7908010 ±  4%  meminfo.DirectMap2M
     29200           -10.3%      26194        meminfo.Shmem
      0.08 ±  2%      -0.0        0.06 ±  2%  mpstat.cpu.all.irq%
      0.04 ±  3%      -0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      2562 ±  2%     -60.3%       1018        vmstat.system.cs
      2343           -23.3%       1798        vmstat.system.in
    117335           -53.2%      54952        sched_debug.cpu.nr_switches.avg
    285639 ±  5%     -71.9%      80403 ±  5%  sched_debug.cpu.nr_switches.max
    100396 ±  9%     -77.1%      22968 ± 14%  sched_debug.cpu.nr_switches.stddev
      7316           -10.5%       6550        proc-vmstat.nr_shmem
  58767234            +2.4%   60172860        proc-vmstat.numa_hit
  58984855            +2.0%   60176451        proc-vmstat.numa_local
  58862408            +2.3%   60212415        proc-vmstat.pgalloc_normal
  58848231            +2.3%   60198260        proc-vmstat.pgfree
 7.448e+08            +1.7%  7.574e+08        perf-stat.i.branch-instructions
      1.35            -0.1        1.29        perf-stat.i.branch-miss-rate%
  65562189 ±  2%      -4.9%   62378502        perf-stat.i.cache-references
      2571 ±  2%     -60.5%       1016        perf-stat.i.context-switches
 3.732e+09            +1.8%  3.797e+09        perf-stat.i.instructions
      0.14 ±  3%     -87.0%       0.02        perf-stat.i.metric.K/sec
 7.426e+08            +1.7%   7.55e+08        perf-stat.ps.branch-instructions
  65356430 ±  2%      -4.9%   62171508        perf-stat.ps.cache-references
      2563 ±  2%     -60.5%       1012        perf-stat.ps.context-switches
  3.72e+09            +1.7%  3.785e+09        perf-stat.ps.instructions
  1.12e+12            +1.8%   1.14e+12        perf-stat.total.instructions
      0.02 ± 25%     +78.4%       0.03 ± 18%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 55%     +82.3%       0.04 ± 16%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
      0.04 ± 21%     +87.3%       0.07 ± 21%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±  9%     +35.2%       0.02 ±  6%  perf-sched.total_sch_delay.average.ms
     20.34 ±  5%    +111.1%      42.94        perf-sched.total_wait_and_delay.average.ms
      7025 ±  6%     -54.0%       3228        perf-sched.total_wait_and_delay.count.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_and_delay.max.ms
     20.33 ±  5%    +111.1%      42.92        perf-sched.total_wait_time.average.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_time.max.ms
    202.58 ± 18%     +94.7%     394.49 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.98 ±  5%     -17.9%     500.63        perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.01 ± 12%   +6133.8%     561.38 ± 15%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3837 ± 12%     -98.6%      52.17 ± 12%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    202.50 ± 18%     +94.8%     394.38 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.95 ±  5%     -17.9%     500.56        perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.00 ± 12%   +6140.7%     561.36 ± 15%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.do_idle
      1.12 ±  6%      -0.6        0.49 ± 13%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.children.cycles-pp.intel_idle
      0.52 ±  8%      -0.2        0.33 ±  7%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.48 ±  6%      -0.2        0.31 ±  7%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.27 ± 18%      -0.2        0.10 ± 36%  perf-profile.children.cycles-pp.__schedule
      0.20 ± 12%      -0.2        0.04 ± 73%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.21 ± 13%      -0.2        0.06 ± 20%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.24 ±  9%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.kthread
      0.18 ±  8%      -0.1        0.05 ± 49%  perf-profile.children.cycles-pp.schedule
      0.31 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.30 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.25 ±  8%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.11 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.try_to_block_task
      0.10 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.21 ± 12%      -0.1        0.14 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.10 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_entities
      0.17 ± 13%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.11 ± 12%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.sched_tick
     40.09            +0.6       40.66        perf-profile.children.cycles-pp.read
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.self.cycles-pp.intel_idle
      0.97 ±  4%      +0.1        1.05 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
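
For readers checking the table above by hand: the %change column appears to be
the relative difference between the parent commit (left) and the tested commit
(right). Rows whose values are already percentages (mpstat, perf-profile) seem
to show an absolute difference in percentage points instead. The sketch below
reproduces a few rows under that assumption; the function names are
illustrative and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, for rows whose values are already percentages."""
    return new - base


if __name__ == "__main__":
    # (metric, ff47a0acfc value, e932c4ab38 value), copied from the table above
    rows = [
        ("aim9.sync_disk_cp.ops_per_sec", 779244, 797195),
        ("cpuidle..usage", 444185, 214738),
        ("vmstat.system.cs", 2562, 1018),
    ]
    for name, base, new in rows:
        print(f"{pct_change(base, new):+6.1f}%  {name}")

    # Percentage-valued row: reported as a point difference, not a ratio
    delta = point_delta(1.51, 0.64)
    print(f"{delta:+6.1f}   perf-profile.children.cycles-pp.do_idle")
```

Rounded to one decimal place, these match the reported +2.3%, -51.7%, -60.3%
and -0.9 entries.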


***************************************************************************************************
lkp-skl-d03: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-skl-d03/migrate/vm-scalability

commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")

ff47a0acfcce309c e932c4ab38f072ce5894b2851fe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    181821           -12.5%     159050        meminfo.Mapped
      0.02 ±  4%     -20.4%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     45923           -12.8%      40022        proc-vmstat.nr_mapped
      1.00 ± 99%    -100.0%       0.00 ± 52%  vm-scalability.free_time
   2422987            +2.4%    2480833        vm-scalability.median
   2422987            +2.4%    2480833        vm-scalability.throughput
     90071            +2.5%      92323        vm-scalability.time.involuntary_context_switches
      3.03 ±  3%      -0.2        2.84 ±  3%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      2.84 ±  2%      -0.2        2.67 ±  3%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
      6.04 ±  2%      -0.2        5.88        perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      6.06 ±  2%      -0.2        5.89        perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.78 ±  2%      -0.1        2.64 ±  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
      2.90            -0.1        2.77 ±  2%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.90 ±  4%      +0.1        0.95 ±  3%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.92 ±  3%      +0.1        0.99 ±  2%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.77 ±  4%      +0.1        0.84 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.80 ±  7%      +0.1        0.91 ±  7%  perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      4.90 ±  2%      -0.2        4.70 ±  2%  perf-profile.children.cycles-pp.do_read_fault
      6.09 ±  2%      -0.2        5.92        perf-profile.children.cycles-pp.exit_mm
      0.54 ±  2%      -0.1        0.49 ±  8%  perf-profile.children.cycles-pp.___perf_sw_event
      0.39 ±  5%      -0.0        0.35 ±  6%  perf-profile.children.cycles-pp.vfs_open
      0.20 ±  4%      -0.0        0.16 ± 10%  perf-profile.children.cycles-pp.opendir
      0.15 ±  8%      +0.0        0.19 ±  5%  perf-profile.children.cycles-pp.__kmalloc_cache_noprof
      0.18 ±  6%      +0.0        0.22 ± 11%  perf-profile.children.cycles-pp.__kernel_read
      0.29 ±  5%      +0.0        0.34 ±  5%  perf-profile.children.cycles-pp.filemap_read
      1.17 ±  4%      +0.1        1.28 ±  4%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.44 ±  4%      -0.0        0.40 ±  8%  perf-profile.self.cycles-pp.___perf_sw_event
      0.07 ± 15%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.__folio_batch_add_and_move





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
