All of lore.kernel.org
 help / color / mirror / Atom feed
From: kernel test robot <oliver.sang@intel.com>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Yang Shi <yang@os.amperecomputing.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	Minchan Kim <minchan@kernel.org>,
	"Hans Boehm" <hboehm@google.com>,
	Lokesh Gidra <lokeshgidra@google.com>, <linux-mm@kvack.org>,
	<oliver.sang@intel.com>
Subject: [linus:master] [mm]  249608ee47:  will-it-scale.per_thread_ops 50.1% improvement
Date: Fri, 13 Dec 2024 00:04:39 +0800	[thread overview]
Message-ID: <202412122346.ea54d461-lkp@intel.com> (raw)



Hello,

kernel test robot noticed a 50.1% improvement of will-it-scale.per_thread_ops on:


commit: 249608ee47132cab3b1adacd9e463548f57bd316 ("mm: respect mmap hint address when aligning for THP")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: thread
	test: brk1
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+---------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 51.6% improvement |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory              |
| test parameters  | cpufreq_governor=performance                                  |
|                  | mode=thread                                                   |
|                  | nr_task=100%                                                  |
|                  | test=brk2                                                     |
+------------------+---------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241212/202412122346.ea54d461-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/brk1/will-it-scale

commit: 
  89dd878282 ("mm: memcg: declare do_memsw_account inline")
  249608ee47 ("mm: respect mmap hint address when aligning for THP")

89dd878282881306 249608ee47132cab3b1adacd9e4 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.271e+09 ± 11%     -23.6%  2.499e+09 ±  4%  cpuidle..time
    534782 ±  3%      -9.8%     482625        meminfo.Shmem
      7292 ± 10%     -16.8%       6068        uptime.idle
    117230            +3.0%     120705        vmstat.system.in
     10.21 ± 10%      -2.5        7.74 ±  4%  mpstat.cpu.all.idle%
      0.10            -0.0        0.08        mpstat.cpu.all.soft%
      0.30 ±  8%      +0.1        0.38 ±  2%  mpstat.cpu.all.usr%
   1562083 ±  5%     -28.9%    1111214 ±  6%  numa-numastat.node0.local_node
   1600171 ±  5%     -27.1%    1165935 ±  5%  numa-numastat.node0.numa_hit
   2469533 ±  5%     -36.7%    1562269 ±  7%  numa-numastat.node1.local_node
   2538689 ±  5%     -36.4%    1615104 ±  7%  numa-numastat.node1.numa_hit
   1599764 ±  5%     -27.2%    1165290 ±  5%  numa-vmstat.node0.numa_hit
   1561676 ±  5%     -28.9%    1110570 ±  6%  numa-vmstat.node0.numa_local
   2537854 ±  5%     -36.4%    1613883 ±  7%  numa-vmstat.node1.numa_hit
   2468697 ±  5%     -36.8%    1561112 ±  7%  numa-vmstat.node1.numa_local
    517.00 ±  6%     +44.8%     748.67 ±  5%  perf-c2c.DRAM.local
      5599 ±  3%     +22.8%       6877 ±  3%  perf-c2c.DRAM.remote
      5356 ±  2%     +17.2%       6277 ±  4%  perf-c2c.HITM.local
      3995 ±  3%     +12.9%       4512 ±  2%  perf-c2c.HITM.remote
    207757 ±  3%     +50.1%     311758 ±  4%  will-it-scale.104.threads
      9.27 ±  4%     -19.6%       7.45 ±  4%  will-it-scale.104.threads_idle
      1997 ±  3%     +50.1%       2997 ±  4%  will-it-scale.per_thread_ops
    207757 ±  3%     +50.1%     311758 ±  4%  will-it-scale.workload
  20771245 ±  7%     +19.8%   24875862 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.avg
   6013540 ±  9%     +29.6%    7795227 ± 15%  sched_debug.cfs_rq:/.avg_vruntime.stddev
  20771245 ±  7%     +19.8%   24875862 ±  5%  sched_debug.cfs_rq:/.min_vruntime.avg
   6013540 ±  9%     +29.6%    7795227 ± 15%  sched_debug.cfs_rq:/.min_vruntime.stddev
      5286 ±  5%     -32.3%       3580 ±  9%  sched_debug.cpu.avg_idle.min
    304791            -4.4%     291399        proc-vmstat.nr_active_anon
   1009858            -1.3%     996889        proc-vmstat.nr_file_pages
     23935            -4.3%      22912        proc-vmstat.nr_mapped
    133626 ±  3%      -9.7%     120653        proc-vmstat.nr_shmem
    108257            -1.7%     106463        proc-vmstat.nr_slab_unreclaimable
    304791            -4.4%     291399        proc-vmstat.nr_zone_active_anon
   4140560           -32.8%    2781620 ±  2%  proc-vmstat.numa_hit
   4033316           -33.7%    2674065 ±  2%  proc-vmstat.numa_local
   7314624 ±  2%     -37.7%    4554492 ±  3%  proc-vmstat.pgalloc_normal
   1102175            -2.4%    1075842        proc-vmstat.pgfault
   7136742 ±  2%     -38.5%    4391328 ±  3%  proc-vmstat.pgfree
      0.49 ±  6%     +23.1%       0.60 ±  6%  perf-stat.i.MPKI
     37.67            +4.2       41.92        perf-stat.i.cache-miss-rate%
  13495545 ±  3%     +26.4%   17064915 ±  6%  perf-stat.i.cache-misses
  36075782 ±  2%     +14.0%   41135363 ±  5%  perf-stat.i.cache-references
      9.29            +2.5%       9.52        perf-stat.i.cpi
 2.621e+11            +2.5%  2.685e+11        perf-stat.i.cpu-cycles
    212.81            -1.4%     209.80        perf-stat.i.cpu-migrations
     19736 ±  4%     -19.1%      15958 ±  7%  perf-stat.i.cycles-between-cache-misses
      0.11 ±  2%      -3.3%       0.11        perf-stat.i.ipc
      0.48 ±  4%     +25.9%       0.60 ±  6%  perf-stat.overall.MPKI
     37.35            +4.0       41.40        perf-stat.overall.cache-miss-rate%
      9.33            +2.0%       9.52        perf-stat.overall.cpi
     19440 ±  3%     -18.7%      15809 ±  7%  perf-stat.overall.cycles-between-cache-misses
      0.11            -2.0%       0.11        perf-stat.overall.ipc
  40994713 ±  3%     -33.4%   27301203 ±  4%  perf-stat.overall.path-length
  13453027 ±  3%     +26.4%   17009626 ±  6%  perf-stat.ps.cache-misses
  36008186 ±  2%     +14.0%   41056969 ±  5%  perf-stat.ps.cache-references
 2.612e+11            +2.5%  2.676e+11        perf-stat.ps.cpu-cycles
    212.16            -1.4%     209.13        perf-stat.ps.cpu-migrations
      0.00 ±143%    +614.3%       0.01 ± 38%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.00 ±223%  +12311.1%       0.19 ±115%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.00         +2575.0%       0.05 ± 92%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.04 ±175%    +275.8%       0.15 ± 89%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ±120%    +669.0%       0.15 ± 89%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ± 32%    +657.1%       0.07 ± 51%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.15 ±114%    +559.8%       1.00 ± 19%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.00 ± 55%    +229.2%       0.01 ± 22%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.04 ± 61%    +378.2%       0.19 ± 15%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.01 ± 15%    +160.3%       0.03 ±109%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.01 ± 30%    +216.1%       0.02 ± 12%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      0.03 ±163%    +448.7%       0.18 ± 24%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ± 30%     +96.7%       0.02 ± 11%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 86%    +234.6%       0.05 ± 60%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.00 ±143%    +700.0%       0.01 ± 33%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.00 ±223%  +50788.9%       0.76 ±137%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      1.05 ±141%    +326.0%       4.46 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_expand.vma_merge_new_range.do_brk_flags
      0.60 ±186%    +271.1%       2.25 ± 74%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.02 ± 97%  +14710.9%       2.72 ± 47%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.17 ±208%    +228.7%       0.54 ± 80%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.10 ±150%   +2829.8%       2.93 ± 34%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.73 ± 99%    +137.5%       4.10 ±  5%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.05 ±162%   +3038.5%       1.62 ± 72%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.18 ±174%   +1759.9%       3.30 ± 41%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      2.19 ± 69%     +74.8%       3.82 ±  6%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1.16 ± 95%    +211.8%       3.61 ±  8%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 25%    +200.0%       0.02 ± 11%  perf-sched.total_sch_delay.average.ms
      5.20 ±  7%     +55.1%       8.06 ±  7%  perf-sched.total_wait_and_delay.average.ms
    338197 ±  7%     -43.5%     190977 ±  7%  perf-sched.total_wait_and_delay.count.ms
      5.19 ±  7%     +54.9%       8.04 ±  7%  perf-sched.total_wait_time.average.ms
      6.72 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     70.88 ±162%    +311.9%     292.00 ± 22%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.91 ± 15%     -43.6%       0.51 ±  3%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
    279.25 ± 11%     +24.7%     348.09 ±  5%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    607.00 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    328796 ±  8%     -45.0%     180683 ±  7%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      3211 ±  6%     -20.9%       2541 ±  7%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1001          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.00 ±223%  +52555.6%       0.79 ± 31%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.00 ±142%  +1.2e+05%       1.79 ± 90%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.commit_merge.vma_expand
     70.88 ±162%    +312.0%     291.99 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.91 ± 16%     -45.1%       0.50 ±  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      0.98 ± 11%     +43.4%       1.40 ± 25%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    279.22 ± 11%     +24.7%     348.08 ±  5%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ±223%  +1.5e+05%       2.21 ± 63%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.00 ±145%  +2.2e+05%       3.74 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.commit_merge.vma_expand
      0.05 ±161%   +3018.3%       1.62 ± 72%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.59 ±  3%      -0.3        0.27 ±100%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable
      0.57 ±  6%      -0.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.70 ±  4%      -0.2        1.49 ±  3%  perf-profile.calltrace.cycles-pp.common_startup_64
      1.61 ±  4%      -0.2        1.40 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.61 ±  4%      -0.2        1.40 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.62 ±  4%      -0.2        1.42 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.68 ±  4%      -0.2        1.47 ±  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.68 ±  4%      -0.2        1.48 ±  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.68 ±  4%      -0.2        1.48 ±  3%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.81            -0.1        0.70        perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
     97.96            +0.1       98.08        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     97.98            +0.1       98.11        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     96.80            +0.1       96.94        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk.do_syscall_64
     98.01            +0.1       98.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
     96.91            +0.2       97.07        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     96.94            +0.2       97.12        perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     95.81            +0.2       96.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
     98.17            +0.2       98.40        perf-profile.calltrace.cycles-pp.brk
      0.00            +0.6        0.59 ±  2%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.53 ±  6%      -0.4        0.17 ±  8%  perf-profile.children.cycles-pp.intel_idle_irq
      1.00 ±  4%      -0.3        0.70 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.70 ±  4%      -0.2        1.49 ±  3%  perf-profile.children.cycles-pp.common_startup_64
      1.70 ±  4%      -0.2        1.49 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.63 ±  4%      -0.2        1.42 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter
      1.63 ±  4%      -0.2        1.42 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.64 ±  4%      -0.2        1.43 ±  3%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.70 ±  4%      -0.2        1.49 ±  3%  perf-profile.children.cycles-pp.do_idle
      1.68 ±  4%      -0.2        1.48 ±  3%  perf-profile.children.cycles-pp.start_secondary
      0.21 ±  2%      -0.2        0.05        perf-profile.children.cycles-pp.mas_store_gfp
      0.72            -0.1        0.58 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.82            -0.1        0.70        perf-profile.children.cycles-pp.rwsem_spin_on_owner
      0.17 ±  2%      -0.1        0.06 ±  7%  perf-profile.children.cycles-pp.mas_store_prealloc
      0.17 ±  2%      -0.1        0.07 ±  5%  perf-profile.children.cycles-pp.vma_complete
      0.58 ±  6%      -0.1        0.49 ±  9%  perf-profile.children.cycles-pp.intel_idle_ibrs
      0.64 ±  3%      -0.1        0.56 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.54 ±  3%      -0.1        0.47 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.54 ±  4%      -0.1        0.47 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.45 ±  3%      -0.1        0.39 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.41 ±  4%      -0.1        0.36 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.35            -0.0        0.31 ±  3%  perf-profile.children.cycles-pp.vms_gather_munmap_vmas
      0.32            -0.0        0.27 ±  3%  perf-profile.children.cycles-pp.__split_vma
      0.36 ±  2%      -0.0        0.31 ±  5%  perf-profile.children.cycles-pp.update_process_times
      0.14 ±  6%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.handle_softirqs
      0.23 ±  2%      -0.0        0.20 ±  4%  perf-profile.children.cycles-pp.sched_tick
      0.13 ±  6%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.rcu_core
      0.13 ±  5%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.rcu_do_batch
      0.15 ±  3%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.kmem_cache_free
      0.06 ±  6%      -0.0        0.04 ± 44%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.06 ± 11%      -0.0        0.05        perf-profile.children.cycles-pp.kthread
      0.06 ± 11%      -0.0        0.05        perf-profile.children.cycles-pp.ret_from_fork
      0.06 ± 11%      -0.0        0.05        perf-profile.children.cycles-pp.ret_from_fork_asm
      0.06 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.smpboot_thread_fn
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.__slab_free
      0.06 ±  7%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.vma_expand
      0.07 ±  7%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.08 ±  6%      +0.0        0.10        perf-profile.children.cycles-pp.vma_merge_new_range
      0.06 ±  9%      +0.0        0.08 ±  4%  perf-profile.children.cycles-pp.anon_vma_clone
      0.08 ±  5%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.up_write
      0.06 ±  8%      +0.0        0.09 ±  8%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.05 ±  7%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      0.08 ±  5%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.vms_clear_ptes
      0.12 ±  4%      +0.0        0.16 ±  2%  perf-profile.children.cycles-pp.do_brk_flags
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.unlink_anon_vmas
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.06 ±  8%  perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      0.00            +0.1        0.06 ±  8%  perf-profile.children.cycles-pp.vm_area_dup
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.free_pgtables
      0.16 ±  4%      +0.1        0.22 ±  3%  perf-profile.children.cycles-pp.vms_complete_munmap_vmas
      0.00            +0.1        0.07 ±  5%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00            +0.1        0.11 ±  4%  perf-profile.children.cycles-pp.poll_idle
     97.96            +0.1       98.08        perf-profile.children.cycles-pp.__do_sys_brk
     98.02            +0.1       98.14        perf-profile.children.cycles-pp.do_syscall_64
     96.80            +0.1       96.94        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     98.05            +0.1       98.19        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.45 ±  4%      +0.2        0.60 ±  2%  perf-profile.children.cycles-pp.intel_idle
     96.91            +0.2       97.07        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     96.94            +0.2       97.12        perf-profile.children.cycles-pp.down_write_killable
     95.84            +0.2       96.02        perf-profile.children.cycles-pp.osq_lock
     98.18            +0.2       98.40        perf-profile.children.cycles-pp.brk
      0.50 ±  6%      -0.3        0.16 ±  9%  perf-profile.self.cycles-pp.intel_idle_irq
      0.81            -0.1        0.70        perf-profile.self.cycles-pp.rwsem_spin_on_owner
      0.58 ±  6%      -0.1        0.49 ±  9%  perf-profile.self.cycles-pp.intel_idle_ibrs
      0.06 ±  8%      +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.06 ±  7%      +0.0        0.09 ±  4%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.05 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.13 ±  2%      +0.1        0.18 ±  2%  perf-profile.self.cycles-pp.rwsem_optimistic_spin
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.up_write
      0.00            +0.1        0.11 ±  4%  perf-profile.self.cycles-pp.poll_idle
      0.45 ±  4%      +0.2        0.60 ±  2%  perf-profile.self.cycles-pp.intel_idle
     95.28            +0.3       95.53        perf-profile.self.cycles-pp.osq_lock


***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/brk2/will-it-scale

commit: 
  89dd878282 ("mm: memcg: declare do_memsw_account inline")
  249608ee47 ("mm: respect mmap hint address when aligning for THP")

89dd878282881306 249608ee47132cab3b1adacd9e4 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.415e+09 ±  5%     -18.3%  2.791e+09 ±  8%  cpuidle..time
    117810            +2.1%     120255        vmstat.system.in
     10.66 ±  4%      -2.0        8.69 ±  8%  mpstat.cpu.all.idle%
      0.10            -0.0        0.08 ±  2%  mpstat.cpu.all.soft%
      0.31            +0.1        0.37 ±  2%  mpstat.cpu.all.usr%
   1679216 ±  5%     -30.5%    1166751 ±  9%  numa-numastat.node0.local_node
   1728543 ±  4%     -29.7%    1214908 ±  8%  numa-numastat.node0.numa_hit
   2318360 ±  3%     -30.9%    1600917 ±  6%  numa-numastat.node1.local_node
   2376686 ±  2%     -30.1%    1660471 ±  5%  numa-numastat.node1.numa_hit
   1726631 ±  4%     -29.7%    1214257 ±  8%  numa-vmstat.node0.numa_hit
   1677304 ±  5%     -30.5%    1166100 ±  9%  numa-vmstat.node0.numa_local
   2374815 ±  2%     -30.1%    1659314 ±  5%  numa-vmstat.node1.numa_hit
   2316489 ±  3%     -30.9%    1599760 ±  6%  numa-vmstat.node1.numa_local
    198860           +51.6%     301493 ±  2%  will-it-scale.104.threads
     10.10           -22.5%       7.82 ±  2%  will-it-scale.104.threads_idle
      1911           +51.6%       2898 ±  2%  will-it-scale.per_thread_ops
    198860           +51.6%     301493 ±  2%  will-it-scale.workload
    506.67 ±  6%     +50.9%     764.67 ±  3%  perf-c2c.DRAM.local
      5447           +27.1%       6925 ±  3%  perf-c2c.DRAM.remote
      5367 ±  2%     +18.6%       6364        perf-c2c.HITM.local
      3830           +17.8%       4513 ±  3%  perf-c2c.HITM.remote
      9197           +18.3%      10877 ±  2%  perf-c2c.HITM.total
     23736            -1.8%      23303        proc-vmstat.nr_mapped
    108712            -2.0%     106548        proc-vmstat.nr_slab_unreclaimable
   4105528           -30.0%    2875907        proc-vmstat.numa_hit
   3997875           -30.8%    2768196        proc-vmstat.numa_local
    236448 ± 14%     -25.0%     177254 ± 12%  proc-vmstat.numa_pte_updates
   7242851           -34.3%    4757136        proc-vmstat.pgalloc_normal
   7071106           -35.1%    4589946        proc-vmstat.pgfree
  19917807 ±  2%     +24.3%   24752419 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.avg
  38832674 ±  6%     +31.8%   51167079 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.max
   5538759 ±  3%     +56.3%    8659607 ± 16%  sched_debug.cfs_rq:/.avg_vruntime.stddev
  19917807 ±  2%     +24.3%   24752418 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
  38832674 ±  6%     +31.8%   51167093 ±  8%  sched_debug.cfs_rq:/.min_vruntime.max
   5538759 ±  3%     +56.3%    8659606 ± 16%  sched_debug.cfs_rq:/.min_vruntime.stddev
    894.81 ±  7%     +11.9%       1001 ±  8%  sched_debug.cfs_rq:/.util_est.max
      5560 ±  6%     -40.7%       3294 ±  3%  sched_debug.cpu.avg_idle.min
      0.52 ±  3%     +21.7%       0.63 ±  3%  perf-stat.i.MPKI
  17623556            -6.6%   16458641 ±  3%  perf-stat.i.branch-misses
     37.96            +3.6       41.59        perf-stat.i.cache-miss-rate%
  14340737 ±  3%     +22.2%   17528616 ±  2%  perf-stat.i.cache-misses
  38069590 ±  2%     +11.5%   42445235 ±  2%  perf-stat.i.cache-references
      9.24            +2.6%       9.48        perf-stat.i.cpi
 2.602e+11            +2.4%  2.665e+11        perf-stat.i.cpu-cycles
     18443 ±  3%     -17.1%      15286 ±  2%  perf-stat.i.cycles-between-cache-misses
      0.51 ±  2%     +22.2%       0.63 ±  2%  perf-stat.overall.MPKI
      0.32            -0.0        0.29 ±  2%  perf-stat.overall.branch-miss-rate%
     37.63            +3.6       41.25        perf-stat.overall.cache-miss-rate%
      9.28            +2.4%       9.50        perf-stat.overall.cpi
     18154 ±  2%     -16.2%      15205 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.11            -2.3%       0.11        perf-stat.overall.ipc
  42574383           -33.8%   28187632 ±  2%  perf-stat.overall.path-length
  17580646            -6.7%   16398374 ±  3%  perf-stat.ps.branch-misses
  14294844 ±  3%     +22.2%   17469729 ±  2%  perf-stat.ps.cache-misses
  37981661 ±  2%     +11.5%   42347645 ±  2%  perf-stat.ps.cache-references
 2.593e+11            +2.4%  2.655e+11        perf-stat.ps.cpu-cycles
      0.00 ±147%    +500.0%       0.01 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.11 ±  8%     -32.5%       0.08 ± 23%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.00 ±223%  +10641.7%       0.21 ± 55%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.00 ±179%   +2890.9%       0.05 ± 53%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.01 ±135%    +390.2%       0.07 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.00 ±223%   +1475.0%       0.01 ± 71%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.00 ±223%   +9837.5%       0.13 ±121%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      0.00 ± 14%   +1830.0%       0.06 ± 97%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.01 ±  8%   +2452.0%       0.21 ± 64%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 16%    +870.6%       0.08 ± 84%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ±  6%    +823.9%       0.07 ± 31%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00 ±100%    +411.1%       0.01 ±  9%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.02 ± 34%   +3178.5%       0.71 ± 32%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.01 ± 75%   +1602.7%       0.10 ±143%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.12 ±150%     -87.6%       0.02 ± 45%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.00 ±150%   +1047.1%       0.03 ±105%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.00 ± 30%    +346.7%       0.01 ± 20%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.02 ± 68%   +1050.0%       0.19 ± 27%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.01 ± 14%    +376.8%       0.04 ±105%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.01 ±  9%    +138.9%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      0.01         +2033.3%       0.13 ± 33%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ± 11%    +216.7%       0.03 ± 83%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.01 ±  5%    +172.1%       0.02 ± 11%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 61%    +173.4%       0.03 ± 46%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.00 ±147%    +787.5%       0.01 ± 37%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.03 ±223%   +4840.4%       1.24 ± 64%  perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      0.00 ±223%  +41625.0%       0.83 ± 60%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.16 ±213%    +813.2%       1.48 ± 78%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_expand.vma_merge_new_range.do_brk_flags
      0.00 ±167%  +43144.0%       1.80 ± 59%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.00 ±223%  +22188.9%       0.33 ±216%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
      0.00 ±223%   +2458.3%       0.05 ±154%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.00 ±223%  +68268.8%       1.82 ± 71%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      0.00 ± 11%  +15918.5%       0.72 ±101%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.01 ± 12%   +5779.5%       0.72 ± 50%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 53%   +2545.4%       0.48 ± 73%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ± 18%  +15675.3%       2.45 ± 11%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00 ±100%   +1100.0%       0.02 ± 76%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.22 ± 70%   +1725.7%       3.94 ±  4%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.01 ± 72%   +3737.3%       0.33 ±114%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.00 ±141%  +25095.7%       0.97 ±144%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.58 ± 79%    +423.4%       3.03 ± 43%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.91 ± 75%    +324.0%       3.84 ±  3%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.02 ± 49%  +18885.6%       3.51 ± 21%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.06 ±  5%   +3199.2%       2.01        perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.93 ±115%    +238.9%       3.16 ± 52%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      5.53 ±  3%     +35.2%       7.48 ±  3%  perf-sched.total_wait_and_delay.average.ms
    330090           -37.0%     207837 ±  4%  perf-sched.total_wait_and_delay.count.ms
      5.52 ±  3%     +35.2%       7.46 ±  3%  perf-sched.total_wait_time.average.ms
      6.70 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    167.82 ± 96%     -92.4%      12.75 ± 78%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      1.20 ±  4%     -58.9%       0.49 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
    280.09 ±  3%     +36.1%     381.15 ±  3%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    606.50 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    320972           -38.3%     197924 ±  4%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      3118 ±  2%     -24.6%       2352 ±  2%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    693.67            -9.8%     626.00        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    167.82 ± 96%     -91.5%      14.30 ± 56%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.55 ±223%    +762.9%       4.74 ±117%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
      0.61 ±  3%     +24.0%       0.76 ±  8%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±221%   +3041.2%       8.22 ±129%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      1.20 ±  4%     -59.9%       0.48 ±  4%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      0.91           +45.7%       1.32 ±  6%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    280.07 ±  3%     +36.1%     381.13 ±  3%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.43 ±223%    +525.8%       2.69 ± 57%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      3.29 ±223%   +1258.4%      44.70 ± 98%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
     29.75 ±  9%     +42.0%      42.24 ± 16%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52 ±222%  +67466.8%     350.90 ±131%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      3.60 ±  5%    +106.8%       7.43 ± 11%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      5.04           +36.0%       6.86 ±  4%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1.72 ±  3%      -0.2        1.47 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.73 ±  3%      -0.2        1.48 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.72 ±  3%      -0.2        1.47 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.82 ±  3%      -0.2        1.57 ±  3%  perf-profile.calltrace.cycles-pp.common_startup_64
      1.80 ±  3%      -0.2        1.56 ±  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.80 ±  3%      -0.2        1.56 ±  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.80 ±  3%      -0.2        1.56 ±  3%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      0.63 ±  3%      -0.2        0.43 ± 44%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable
      0.73            -0.1        0.59        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.82            -0.1        0.71        perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
      0.63 ±  3%      -0.1        0.54 ±  4%  perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     97.85            +0.2       98.02        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     97.87            +0.2       98.04        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     96.68            +0.2       96.85        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk.do_syscall_64
     97.90            +0.2       98.09        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
     96.79            +0.2       96.99        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     96.82            +0.2       97.04        perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     95.68            +0.2       95.91        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__do_sys_brk
     98.06            +0.3       98.32        perf-profile.calltrace.cycles-pp.brk
      0.00            +0.6        0.60 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.56 ±  4%      -0.4        0.16 ±  4%  perf-profile.children.cycles-pp.intel_idle_irq
      1.06 ±  3%      -0.4        0.70 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.73 ±  3%      -0.2        1.49 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter
      1.73 ±  3%      -0.2        1.49 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.74 ±  3%      -0.2        1.50 ±  3%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.82 ±  3%      -0.2        1.57 ±  3%  perf-profile.children.cycles-pp.common_startup_64
      1.82 ±  3%      -0.2        1.57 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.82 ±  3%      -0.2        1.57 ±  3%  perf-profile.children.cycles-pp.do_idle
      1.80 ±  3%      -0.2        1.56 ±  3%  perf-profile.children.cycles-pp.start_secondary
      0.21            -0.2        0.05 ±  7%  perf-profile.children.cycles-pp.mas_store_gfp
      0.73            -0.1        0.59        perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.69 ±  2%      -0.1        0.56 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.58 ±  3%      -0.1        0.47 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.83            -0.1        0.72        perf-profile.children.cycles-pp.rwsem_spin_on_owner
      0.17 ±  2%      -0.1        0.07 ±  7%  perf-profile.children.cycles-pp.mas_store_prealloc
      0.58 ±  3%      -0.1        0.47 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.17 ±  2%      -0.1        0.07 ±  6%  perf-profile.children.cycles-pp.vma_complete
      0.49 ±  3%      -0.1        0.39 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.63 ±  4%      -0.1        0.55 ±  4%  perf-profile.children.cycles-pp.intel_idle_ibrs
      0.44 ±  4%      -0.1        0.36 ±  4%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.39 ±  3%      -0.1        0.32 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.32            -0.0        0.28        perf-profile.children.cycles-pp.__split_vma
      0.36            -0.0        0.31        perf-profile.children.cycles-pp.vms_gather_munmap_vmas
      0.24 ±  4%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.sched_tick
      0.19 ±  7%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.task_tick_fair
      0.06 ±  6%      -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.rcu_do_batch
      0.13 ±  3%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.rcu_core
      0.14 ±  2%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.handle_softirqs
      0.08 ±  4%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.get_jiffies_update
      0.08 ±  5%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.tmigr_requires_handle_remote
      0.14 ±  2%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.kmem_cache_free
      0.07 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.kthread
      0.07 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.ret_from_fork
      0.07 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.ret_from_fork_asm
      0.10 ±  7%      -0.0        0.08 ±  4%  perf-profile.children.cycles-pp.update_cfs_group
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.__slab_free
      0.05            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.commit_merge
      0.06 ±  9%      +0.0        0.08 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.06 ±  6%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.vma_expand
      0.08 ±  4%      +0.0        0.11 ±  5%  perf-profile.children.cycles-pp.up_write
      0.06 ±  6%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.05 ±  7%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.anon_vma_clone
      0.07 ±  5%      +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
      0.06 ±  9%      +0.0        0.09        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      0.08 ±  5%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.vms_clear_ptes
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.unlink_anon_vmas
      0.00            +0.1        0.05 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.11 ±  4%      +0.1        0.17 ±  2%  perf-profile.children.cycles-pp.do_brk_flags
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.free_pgtables
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.vm_area_dup
      0.17 ±  2%      +0.1        0.23 ±  2%  perf-profile.children.cycles-pp.vms_complete_munmap_vmas
      0.00            +0.1        0.07 ±  7%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.poll_idle
      0.46 ±  4%      +0.1        0.60 ±  3%  perf-profile.children.cycles-pp.intel_idle
     97.85            +0.2       98.02        perf-profile.children.cycles-pp.__do_sys_brk
     97.90            +0.2       98.08        perf-profile.children.cycles-pp.do_syscall_64
     96.68            +0.2       96.86        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     97.94            +0.2       98.12        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     96.79            +0.2       96.99        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     96.82            +0.2       97.04        perf-profile.children.cycles-pp.down_write_killable
     95.71            +0.2       95.94        perf-profile.children.cycles-pp.osq_lock
     98.06            +0.3       98.32        perf-profile.children.cycles-pp.brk
      0.54 ±  4%      -0.4        0.15 ±  3%  perf-profile.self.cycles-pp.intel_idle_irq
      0.82            -0.1        0.71        perf-profile.self.cycles-pp.rwsem_spin_on_owner
      0.63 ±  4%      -0.1        0.55 ±  4%  perf-profile.self.cycles-pp.intel_idle_ibrs
      0.08 ±  4%      -0.0        0.06 ± 11%  perf-profile.self.cycles-pp.get_jiffies_update
      0.10 ±  7%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.update_cfs_group
      0.06            -0.0        0.05        perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.06 ±  9%      +0.0        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.06            +0.0        0.09 ±  6%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.13 ±  3%      +0.1        0.18 ±  2%  perf-profile.self.cycles-pp.rwsem_optimistic_spin
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.up_write
      0.00            +0.1        0.12 ±  4%  perf-profile.self.cycles-pp.poll_idle
      0.46 ±  4%      +0.1        0.60 ±  3%  perf-profile.self.cycles-pp.intel_idle
     95.11            +0.3       95.44        perf-profile.self.cycles-pp.osq_lock





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


                 reply	other threads:[~2024-12-12 16:06 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202412122346.ea54d461-lkp@intel.com \
    --to=oliver.sang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hboehm@google.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=lokeshgidra@google.com \
    --cc=minchan@kernel.org \
    --cc=oe-lkp@lists.linux.dev \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=yang@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.