From: kernel test robot <oliver.sang@intel.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	sparclinux@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com,
	oliver.sang@intel.com
Subject: [linus:master] [locking]  c8afaa1b0f:  stress-ng.zero.ops_per_sec 6.3% improvement
Date: Tue, 15 Aug 2023 15:11:45 +0800
Message-ID: <202308151426.97be5bd8-oliver.sang@intel.com>



Hello,

kernel test robot noticed a 6.3% improvement of stress-ng.zero.ops_per_sec on:


commit: c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 ("locking: remove spin_lock_prefetch")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	class: memory
	test: zero
	cpufreq_governor: performance
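
For readers who want a rough idea of what these parameters mean outside the LKP harness, here is a minimal sketch. It assumes that "nr_threads: 100%" means one worker per online CPU and that the job corresponds to a plain stress-ng run of the zero stressor. The helper name stress_ng_argv is hypothetical and is not part of lkp-tests; the authoritative job file is the one in the reproduce materials linked below.

```python
import os


def stress_ng_argv(nr_threads_pct, testtime_s, online_cpus=None):
    """Build an approximate stress-ng command line (hypothetical helper).

    Assumption: nr_threads is a percentage of the online CPUs.
    """
    cpus = online_cpus if online_cpus is not None else (os.cpu_count() or 1)
    workers = max(1, cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--zero", str(workers),          # number of zero stressor workers
        "--timeout", f"{testtime_s}s",   # testtime: 60s
        "--metrics-brief",               # print ops and ops/sec at the end
    ]


# The test machine has 64 threads, so 100% maps to 64 workers.
print(" ".join(stress_ng_argv(100, 60, online_cpus=64)))
# prints: stress-ng --zero 64 --timeout 60s --metrics-brief
```

This only builds the argument list and does not run anything. It also leaves out the cpufreq governor setting and the rest of the LKP environment.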






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230815/202308151426.97be5bd8-oliver.sang@intel.com

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  memory/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/zero/stress-ng/60s

commit: 
  3feecb1b84 ("Merge tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
  c8afaa1b0f ("locking: remove spin_lock_prefetch")

3feecb1b848359b1 c8afaa1b0f8bc93d013ab2ea6b9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     20.98 ±  8%     +12.7%      23.65 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     21.05 ±  8%    +803.4%     190.14 ±196%  perf-sched.total_sch_delay.max.ms
     46437            +2.4%      47564        stress-ng.time.involuntary_context_switches
  87942414            +6.3%   93441484        stress-ng.time.minor_page_faults
  21983137            +6.3%   23357886        stress-ng.zero.ops
    366380            +6.3%     389295        stress-ng.zero.ops_per_sec
    100683            +4.1%     104861 ±  2%  proc-vmstat.nr_shmem
  60215587            +6.2%   63957836        proc-vmstat.numa_hit
  60148996            +6.2%   63889951        proc-vmstat.numa_local
  22046746            +6.2%   23421583        proc-vmstat.pgactivate
  83092777            +6.3%   88309102        proc-vmstat.pgalloc_normal
  88854159            +6.1%   94276960        proc-vmstat.pgfault
  82294936            +6.3%   87489838        proc-vmstat.pgfree
  21970411            +6.3%   23344438        proc-vmstat.unevictable_pgs_culled
  21970116            +6.3%   23344165        proc-vmstat.unevictable_pgs_mlocked
  21970115            +6.3%   23344164        proc-vmstat.unevictable_pgs_munlocked
  21970113            +6.3%   23344161        proc-vmstat.unevictable_pgs_rescued
 1.455e+10            +4.2%  1.517e+10        perf-stat.i.branch-instructions
  58358654            +5.0%   61304729        perf-stat.i.branch-misses
  1.12e+08            +5.2%  1.179e+08        perf-stat.i.cache-misses
 2.569e+08            +5.1%  2.698e+08        perf-stat.i.cache-references
      3.32            -4.4%       3.17        perf-stat.i.cpi
      2031 ±  2%      -5.0%       1930 ±  2%  perf-stat.i.cycles-between-cache-misses
 1.603e+10            +4.4%  1.674e+10        perf-stat.i.dTLB-loads
 7.449e+09            +6.1%  7.901e+09        perf-stat.i.dTLB-stores
  6.52e+10            +4.4%  6.807e+10        perf-stat.i.instructions
      0.31            +5.7%       0.33 ±  3%  perf-stat.i.ipc
    825.05            +4.8%     864.24        perf-stat.i.metric.K/sec
    598.07            +4.7%     626.06        perf-stat.i.metric.M/sec
  12910790            +4.3%   13471810        perf-stat.i.node-load-misses
   7901301 ±  2%      +5.7%    8348185        perf-stat.i.node-loads
  21890957 ±  3%      +6.9%   23410670 ±  2%  perf-stat.i.node-stores
      3.38            -4.3%       3.23        perf-stat.overall.cpi
      1964            -5.1%       1864        perf-stat.overall.cycles-between-cache-misses
      0.30            +4.5%       0.31        perf-stat.overall.ipc
 1.431e+10            +4.3%  1.493e+10        perf-stat.ps.branch-instructions
  57370846            +5.0%   60264193        perf-stat.ps.branch-misses
 1.103e+08            +5.3%   1.16e+08        perf-stat.ps.cache-misses
 2.528e+08            +5.1%  2.657e+08        perf-stat.ps.cache-references
 1.577e+10            +4.5%  1.647e+10        perf-stat.ps.dTLB-loads
  7.33e+09            +6.1%  7.776e+09        perf-stat.ps.dTLB-stores
 6.415e+10            +4.4%  6.699e+10        perf-stat.ps.instructions
  12704753            +4.4%   13259951        perf-stat.ps.node-load-misses
   7778242 ±  2%      +5.7%    8224062        perf-stat.ps.node-loads
  21539559 ±  3%      +7.0%   23044455 ±  2%  perf-stat.ps.node-stores
 4.005e+12            +5.0%  4.205e+12        perf-stat.total.instructions
     38.85            -0.8       38.07        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dentry_kill
     39.12            -0.8       38.34        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dentry_kill.dput
     41.16            -0.7       40.44        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
     42.07            -0.7       41.39        perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
     42.09            -0.7       41.42        perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_user_mode_loop
     42.13            -0.7       41.46        perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare
     42.59            -0.6       41.94        perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
     42.69            -0.6       42.04        perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     42.77            -0.6       42.12        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.75            -0.6       42.10        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.74            -0.6       42.10        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     47.04            -0.4       46.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     46.97            -0.4       46.55        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     47.20            -0.4       46.79        perf-profile.calltrace.cycles-pp.__munmap
     39.35            -0.3       39.09        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
     39.08            -0.2       38.84        perf-profile.calltrace.cycles-pp._raw_spin_lock.inode_sb_list_add.new_inode.shmem_get_inode.__shmem_file_setup
      0.60            +0.0        0.63        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap
      0.64 ±  2%      +0.0        0.66        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
      0.62            +0.0        0.65        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
      0.64            +0.0        0.67        perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.59 ±  2%      +0.0        0.62        perf-profile.calltrace.cycles-pp.shmem_alloc_inode.alloc_inode.new_inode.shmem_get_inode.__shmem_file_setup
      0.80            +0.0        0.84        perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.72            +0.0        0.75        perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
      0.59 ±  2%      +0.0        0.62        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.shmem_alloc_inode.alloc_inode.new_inode.shmem_get_inode
      0.88 ±  2%      +0.0        0.92        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      0.94 ±  2%      +0.0        0.98        perf-profile.calltrace.cycles-pp.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      0.82            +0.0        0.86        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range
      0.79 ±  2%      +0.0        0.84        perf-profile.calltrace.cycles-pp.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      1.22            +0.0        1.26        perf-profile.calltrace.cycles-pp.mas_store_prealloc.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      0.81            +0.0        0.85        perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.__get_user_pages
      0.97            +0.1        1.02        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate
      0.63            +0.1        0.68        perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
      1.07            +0.1        1.12        perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
      0.61            +0.1        0.66        perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict
      0.83 ±  2%      +0.1        0.88        perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
      0.58            +0.1        0.64 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      0.95 ±  3%      +0.1        1.01        perf-profile.calltrace.cycles-pp.alloc_file.alloc_file_pseudo.__shmem_file_setup.shmem_zero_setup.mmap_region
      0.61            +0.1        0.67        perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.31            +0.1        1.38        perf-profile.calltrace.cycles-pp.stress_zero
      1.20            +0.1        1.27        perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dentry_kill
      1.63            +0.1        1.70        perf-profile.calltrace.cycles-pp.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64
      1.37            +0.1        1.44        perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dentry_kill.dput
      0.95 ±  2%      +0.1        1.02        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.32 ±  3%      +0.1        1.40        perf-profile.calltrace.cycles-pp.alloc_file_pseudo.__shmem_file_setup.shmem_zero_setup.mmap_region.do_mmap
      1.82 ±  2%      +0.1        1.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      2.18            +0.1        2.32        perf-profile.calltrace.cycles-pp.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.35            +0.1        2.49        perf-profile.calltrace.cycles-pp.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      0.34 ± 70%      +0.2        0.52        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.mmap_region
      3.82            +0.2        4.03        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      4.01 ±  2%      +0.2        4.23        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.16 ±  2%      +0.2        4.39        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.17 ±  2%      +0.2        4.40        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     50.50            +0.3       50.80        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     50.40            +0.3       50.70        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     50.53            +0.3       50.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     50.71            +0.3       51.02        perf-profile.calltrace.cycles-pp.__mmap
     78.82            -1.0       77.78        perf-profile.children.cycles-pp._raw_spin_lock
     78.20            -0.9       77.29        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     41.17            -0.7       40.44        perf-profile.children.cycles-pp.evict
     42.07            -0.7       41.39        perf-profile.children.cycles-pp.__dentry_kill
     42.10            -0.7       41.42        perf-profile.children.cycles-pp.dentry_kill
     42.13            -0.7       41.46        perf-profile.children.cycles-pp.dput
     42.70            -0.6       42.05        perf-profile.children.cycles-pp.task_work_run
     42.60            -0.6       41.95        perf-profile.children.cycles-pp.__fput
     42.84            -0.6       42.20        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     42.78            -0.6       42.14        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
     42.74            -0.6       42.10        perf-profile.children.cycles-pp.exit_to_user_mode_loop
     47.25            -0.4       46.85        perf-profile.children.cycles-pp.__munmap
     39.36            -0.3       39.10        perf-profile.children.cycles-pp.inode_sb_list_add
     40.21            -0.2       40.01        perf-profile.children.cycles-pp.new_inode
     40.58            -0.2       40.38        perf-profile.children.cycles-pp.shmem_get_inode
     97.92            -0.1       97.83        perf-profile.children.cycles-pp.do_syscall_64
     98.05            -0.1       97.96        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.15 ±  2%      +0.0        0.16        perf-profile.children.cycles-pp.mas_prev_slot
      0.19 ±  2%      +0.0        0.20        perf-profile.children.cycles-pp.ksys_read
      0.14 ±  4%      +0.0        0.16 ±  5%  perf-profile.children.cycles-pp.inode_init_owner
      0.20 ±  2%      +0.0        0.21        perf-profile.children.cycles-pp.errseq_sample
      0.35            +0.0        0.37        perf-profile.children.cycles-pp.shmem_get_unmapped_area
      0.46            +0.0        0.48        perf-profile.children.cycles-pp.mas_empty_area_rev
      0.48            +0.0        0.50        perf-profile.children.cycles-pp.__destroy_inode
      0.22            +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.memcg_list_lru_alloc
      0.49            +0.0        0.51        perf-profile.children.cycles-pp.destroy_inode
      0.49 ±  2%      +0.0        0.52 ±  2%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.25 ±  3%      +0.0        0.28 ±  2%  perf-profile.children.cycles-pp.__munlock_folio
      0.64            +0.0        0.66        perf-profile.children.cycles-pp.shmem_fault
      0.50 ±  2%      +0.0        0.53        perf-profile.children.cycles-pp.perf_event_mmap_output
      0.63            +0.0        0.66        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.22 ±  4%      +0.0        0.25 ±  2%  perf-profile.children.cycles-pp.inode_init_always
      0.24 ±  3%      +0.0        0.27 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.61 ±  2%      +0.0        0.64        perf-profile.children.cycles-pp.perf_iterate_sb
      0.65            +0.0        0.68        perf-profile.children.cycles-pp.vm_unmapped_area
      0.29 ±  2%      +0.0        0.32        perf-profile.children.cycles-pp.clear_nlink
      0.64            +0.0        0.68        perf-profile.children.cycles-pp.__do_fault
      0.73            +0.0        0.76        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.20 ±  4%      +0.0        0.24 ±  3%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.36 ±  2%      +0.0        0.40        perf-profile.children.cycles-pp.slab_pre_alloc_hook
      0.20 ±  8%      +0.0        0.24 ±  9%  perf-profile.children.cycles-pp.free_unref_page
      0.18 ±  9%      +0.0        0.21 ±  8%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.80            +0.0        0.84        perf-profile.children.cycles-pp.get_unmapped_area
      1.23            +0.0        1.27        perf-profile.children.cycles-pp.mas_store_prealloc
      0.88 ±  2%      +0.0        0.92        perf-profile.children.cycles-pp.___slab_alloc
      1.14            +0.0        1.18        perf-profile.children.cycles-pp.mas_wr_node_store
      0.94 ±  2%      +0.0        0.98        perf-profile.children.cycles-pp.perf_event_mmap
      0.89 ±  2%      +0.0        0.93        perf-profile.children.cycles-pp.perf_event_mmap_event
      0.82            +0.0        0.86        perf-profile.children.cycles-pp.do_read_fault
      0.80 ±  2%      +0.0        0.84        perf-profile.children.cycles-pp.vm_area_alloc
      0.82            +0.0        0.86        perf-profile.children.cycles-pp.do_fault
      0.41 ±  2%      +0.0        0.46 ±  2%  perf-profile.children.cycles-pp.mlock_drain_local
      1.16            +0.0        1.21        perf-profile.children.cycles-pp.mas_wr_modify
      0.44            +0.0        0.49 ±  3%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.40 ±  2%      +0.0        0.45        perf-profile.children.cycles-pp.mlock_folio_batch
      0.63 ±  2%      +0.0        0.68        perf-profile.children.cycles-pp.__folio_batch_release
      0.82 ±  2%      +0.0        0.86        perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.fput
      0.98            +0.1        1.03        perf-profile.children.cycles-pp.__handle_mm_fault
      1.07            +0.1        1.13        perf-profile.children.cycles-pp.handle_mm_fault
      0.83 ±  2%      +0.1        0.89        perf-profile.children.cycles-pp.alloc_inode
      0.73 ±  2%      +0.1        0.78        perf-profile.children.cycles-pp.release_pages
      0.58            +0.1        0.64        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.96 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.alloc_file
      0.61            +0.1        0.67        perf-profile.children.cycles-pp.tlb_batch_pages_flush
      1.34 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.kmem_cache_alloc
      1.33            +0.1        1.40        perf-profile.children.cycles-pp.stress_zero
      1.21            +0.1        1.28        perf-profile.children.cycles-pp.shmem_undo_range
      0.95 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.64            +0.1        1.71        perf-profile.children.cycles-pp.__get_user_pages
      1.38            +0.1        1.45        perf-profile.children.cycles-pp.shmem_evict_inode
      0.68            +0.1        0.76 ±  3%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.72            +0.1        0.80 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
      1.32 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.alloc_file_pseudo
      0.49            +0.1        0.57 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.67            +0.1        0.78 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.83 ±  2%      +0.1        1.94        perf-profile.children.cycles-pp.unmap_region
      2.18            +0.1        2.32        perf-profile.children.cycles-pp.populate_vma_page_range
      2.35            +0.1        2.49        perf-profile.children.cycles-pp.__mm_populate
      3.87 ±  2%      +0.2        4.08        perf-profile.children.cycles-pp.do_vmi_align_munmap
      4.17 ±  2%      +0.2        4.39        perf-profile.children.cycles-pp.__vm_munmap
      4.19 ±  2%      +0.2        4.42        perf-profile.children.cycles-pp.do_vmi_munmap
      4.17 ±  2%      +0.2        4.40        perf-profile.children.cycles-pp.__x64_sys_munmap
     50.41            +0.3       50.71        perf-profile.children.cycles-pp.vm_mmap_pgoff
     50.75            +0.3       51.07        perf-profile.children.cycles-pp.__mmap
     75.97            -1.0       74.99        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.02 ±  2%      -0.0        0.98        perf-profile.self.cycles-pp._raw_spin_lock
      0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.get_random_u32
      0.19 ±  2%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.17 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.mas_wr_walk
      0.18 ±  2%      +0.0        0.20        perf-profile.self.cycles-pp.memcg_list_lru_alloc
      0.19            +0.0        0.21 ±  2%  perf-profile.self.cycles-pp.errseq_sample
      0.28 ±  2%      +0.0        0.30        perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
      0.18 ±  4%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp.inode_init_always
      0.42            +0.0        0.44        perf-profile.self.cycles-pp.__destroy_inode
      0.50            +0.0        0.53 ±  2%  perf-profile.self.cycles-pp.mas_wr_node_store
      0.29 ±  2%      +0.0        0.32        perf-profile.self.cycles-pp.clear_nlink
      0.21 ±  3%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__fput
      1.31            +0.1        1.37        perf-profile.self.cycles-pp.stress_zero
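
A note on reading the table above, with a small sketch to check it. For the counter rows, the middle column is the relative change between the two commits in percent. For the perf-profile rows, the middle column appears to be the absolute difference in percentage points of cycles. The function names below are illustrative and are not taken from lkp-tests; the input numbers are copied from the table.

```python
def pct_change(base, new):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference, used here for perf-profile cycle percentages."""
    return new - base


counters = {
    "stress-ng.zero.ops_per_sec": (366380, 389295),
    "stress-ng.zero.ops": (21983137, 23357886),
    "stress-ng.time.minor_page_faults": (87942414, 93441484),
}
for name, (base, new) in counters.items():
    print(f"{name}: {pct_change(base, new):+.1f}%")
# prints +6.3% for all three rows

profiles = {
    "self.native_queued_spin_lock_slowpath": (75.97, 74.99),
    "calltrace.native_queued_spin_lock_slowpath...evict": (38.85, 38.07),
}
for name, (base, new) in profiles.items():
    print(f"{name}: {point_delta(base, new):+.1f}")
# prints -1.0 and -0.8
```

Rows whose printed values are already rounded (for example cpi and ipc) may not reproduce the reported %change exactly, since the report presumably computes it from unrounded averages.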




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


      1.07            +0.1        1.12        perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
      0.61            +0.1        0.66        perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict
      0.83 ±  2%      +0.1        0.88        perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.shmem_get_inode.__shmem_file_setup.shmem_zero_setup
      0.58            +0.1        0.64 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      0.95 ±  3%      +0.1        1.01        perf-profile.calltrace.cycles-pp.alloc_file.alloc_file_pseudo.__shmem_file_setup.shmem_zero_setup.mmap_region
      0.61            +0.1        0.67        perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.31            +0.1        1.38        perf-profile.calltrace.cycles-pp.stress_zero
      1.20            +0.1        1.27        perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dentry_kill
      1.63            +0.1        1.70        perf-profile.calltrace.cycles-pp.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64
      1.37            +0.1        1.44        perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dentry_kill.dput
      0.95 ±  2%      +0.1        1.02        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.32 ±  3%      +0.1        1.40        perf-profile.calltrace.cycles-pp.alloc_file_pseudo.__shmem_file_setup.shmem_zero_setup.mmap_region.do_mmap
      1.82 ±  2%      +0.1        1.93        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      2.18            +0.1        2.32        perf-profile.calltrace.cycles-pp.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.35            +0.1        2.49        perf-profile.calltrace.cycles-pp.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      0.34 ± 70%      +0.2        0.52        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.mmap_region
      3.82            +0.2        4.03        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      4.01 ±  2%      +0.2        4.23        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.16 ±  2%      +0.2        4.39        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.17 ±  2%      +0.2        4.40        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     50.50            +0.3       50.80        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     50.40            +0.3       50.70        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     50.53            +0.3       50.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     50.71            +0.3       51.02        perf-profile.calltrace.cycles-pp.__mmap
     78.82            -1.0       77.78        perf-profile.children.cycles-pp._raw_spin_lock
     78.20            -0.9       77.29        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     41.17            -0.7       40.44        perf-profile.children.cycles-pp.evict
     42.07            -0.7       41.39        perf-profile.children.cycles-pp.__dentry_kill
     42.10            -0.7       41.42        perf-profile.children.cycles-pp.dentry_kill
     42.13            -0.7       41.46        perf-profile.children.cycles-pp.dput
     42.70            -0.6       42.05        perf-profile.children.cycles-pp.task_work_run
     42.60            -0.6       41.95        perf-profile.children.cycles-pp.__fput
     42.84            -0.6       42.20        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     42.78            -0.6       42.14        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
     42.74            -0.6       42.10        perf-profile.children.cycles-pp.exit_to_user_mode_loop
     47.25            -0.4       46.85        perf-profile.children.cycles-pp.__munmap
     39.36            -0.3       39.10        perf-profile.children.cycles-pp.inode_sb_list_add
     40.21            -0.2       40.01        perf-profile.children.cycles-pp.new_inode
     40.58            -0.2       40.38        perf-profile.children.cycles-pp.shmem_get_inode
     97.92            -0.1       97.83        perf-profile.children.cycles-pp.do_syscall_64
     98.05            -0.1       97.96        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.15 ±  2%      +0.0        0.16        perf-profile.children.cycles-pp.mas_prev_slot
      0.19 ±  2%      +0.0        0.20        perf-profile.children.cycles-pp.ksys_read
      0.14 ±  4%      +0.0        0.16 ±  5%  perf-profile.children.cycles-pp.inode_init_owner
      0.20 ±  2%      +0.0        0.21        perf-profile.children.cycles-pp.errseq_sample
      0.35            +0.0        0.37        perf-profile.children.cycles-pp.shmem_get_unmapped_area
      0.46            +0.0        0.48        perf-profile.children.cycles-pp.mas_empty_area_rev
      0.48            +0.0        0.50        perf-profile.children.cycles-pp.__destroy_inode
      0.22            +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.memcg_list_lru_alloc
      0.49            +0.0        0.51        perf-profile.children.cycles-pp.destroy_inode
      0.49 ±  2%      +0.0        0.52 ±  2%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.25 ±  3%      +0.0        0.28 ±  2%  perf-profile.children.cycles-pp.__munlock_folio
      0.64            +0.0        0.66        perf-profile.children.cycles-pp.shmem_fault
      0.50 ±  2%      +0.0        0.53        perf-profile.children.cycles-pp.perf_event_mmap_output
      0.63            +0.0        0.66        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.22 ±  4%      +0.0        0.25 ±  2%  perf-profile.children.cycles-pp.inode_init_always
      0.24 ±  3%      +0.0        0.27 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.61 ±  2%      +0.0        0.64        perf-profile.children.cycles-pp.perf_iterate_sb
      0.65            +0.0        0.68        perf-profile.children.cycles-pp.vm_unmapped_area
      0.29 ±  2%      +0.0        0.32        perf-profile.children.cycles-pp.clear_nlink
      0.64            +0.0        0.68        perf-profile.children.cycles-pp.__do_fault
      0.73            +0.0        0.76        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.20 ±  4%      +0.0        0.24 ±  3%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.36 ±  2%      +0.0        0.40        perf-profile.children.cycles-pp.slab_pre_alloc_hook
      0.20 ±  8%      +0.0        0.24 ±  9%  perf-profile.children.cycles-pp.free_unref_page
      0.18 ±  9%      +0.0        0.21 ±  8%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.80            +0.0        0.84        perf-profile.children.cycles-pp.get_unmapped_area
      1.23            +0.0        1.27        perf-profile.children.cycles-pp.mas_store_prealloc
      0.88 ±  2%      +0.0        0.92        perf-profile.children.cycles-pp.___slab_alloc
      1.14            +0.0        1.18        perf-profile.children.cycles-pp.mas_wr_node_store
      0.94 ±  2%      +0.0        0.98        perf-profile.children.cycles-pp.perf_event_mmap
      0.89 ±  2%      +0.0        0.93        perf-profile.children.cycles-pp.perf_event_mmap_event
      0.82            +0.0        0.86        perf-profile.children.cycles-pp.do_read_fault
      0.80 ±  2%      +0.0        0.84        perf-profile.children.cycles-pp.vm_area_alloc
      0.82            +0.0        0.86        perf-profile.children.cycles-pp.do_fault
      0.41 ±  2%      +0.0        0.46 ±  2%  perf-profile.children.cycles-pp.mlock_drain_local
      1.16            +0.0        1.21        perf-profile.children.cycles-pp.mas_wr_modify
      0.44            +0.0        0.49 ±  3%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.40 ±  2%      +0.0        0.45        perf-profile.children.cycles-pp.mlock_folio_batch
      0.63 ±  2%      +0.0        0.68        perf-profile.children.cycles-pp.__folio_batch_release
      0.82 ±  2%      +0.0        0.86        perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.fput
      0.98            +0.1        1.03        perf-profile.children.cycles-pp.__handle_mm_fault
      1.07            +0.1        1.13        perf-profile.children.cycles-pp.handle_mm_fault
      0.83 ±  2%      +0.1        0.89        perf-profile.children.cycles-pp.alloc_inode
      0.73 ±  2%      +0.1        0.78        perf-profile.children.cycles-pp.release_pages
      0.58            +0.1        0.64        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.96 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.alloc_file
      0.61            +0.1        0.67        perf-profile.children.cycles-pp.tlb_batch_pages_flush
      1.34 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.kmem_cache_alloc
      1.33            +0.1        1.40        perf-profile.children.cycles-pp.stress_zero
      1.21            +0.1        1.28        perf-profile.children.cycles-pp.shmem_undo_range
      0.95 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.64            +0.1        1.71        perf-profile.children.cycles-pp.__get_user_pages
      1.38            +0.1        1.45        perf-profile.children.cycles-pp.shmem_evict_inode
      0.68            +0.1        0.76 ±  3%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.72            +0.1        0.80 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
      1.32 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.alloc_file_pseudo
      0.49            +0.1        0.57 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.67            +0.1        0.78 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.83 ±  2%      +0.1        1.94        perf-profile.children.cycles-pp.unmap_region
      2.18            +0.1        2.32        perf-profile.children.cycles-pp.populate_vma_page_range
      2.35            +0.1        2.49        perf-profile.children.cycles-pp.__mm_populate
      3.87 ±  2%      +0.2        4.08        perf-profile.children.cycles-pp.do_vmi_align_munmap
      4.17 ±  2%      +0.2        4.39        perf-profile.children.cycles-pp.__vm_munmap
      4.19 ±  2%      +0.2        4.42        perf-profile.children.cycles-pp.do_vmi_munmap
      4.17 ±  2%      +0.2        4.40        perf-profile.children.cycles-pp.__x64_sys_munmap
     50.41            +0.3       50.71        perf-profile.children.cycles-pp.vm_mmap_pgoff
     50.75            +0.3       51.07        perf-profile.children.cycles-pp.__mmap
     75.97            -1.0       74.99        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.02 ±  2%      -0.0        0.98        perf-profile.self.cycles-pp._raw_spin_lock
      0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.get_random_u32
      0.19 ±  2%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.17 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.mas_wr_walk
      0.18 ±  2%      +0.0        0.20        perf-profile.self.cycles-pp.memcg_list_lru_alloc
      0.19            +0.0        0.21 ±  2%  perf-profile.self.cycles-pp.errseq_sample
      0.28 ±  2%      +0.0        0.30        perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
      0.18 ±  4%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp.inode_init_always
      0.42            +0.0        0.44        perf-profile.self.cycles-pp.__destroy_inode
      0.50            +0.0        0.53 ±  2%  perf-profile.self.cycles-pp.mas_wr_node_store
      0.29 ±  2%      +0.0        0.32        perf-profile.self.cycles-pp.clear_nlink
      0.21 ±  3%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__fput
      1.31            +0.1        1.37        perf-profile.self.cycles-pp.stress_zero
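
The perf-profile rows above appear to follow one layout: the base commit's share of cycles (in percent, with an optional "± N%" stddev), the change in percentage points, the patched commit's share, and the metric name. A minimal sketch of reading one row and checking that the printed change agrees with the two values is shown below. The names `ROW`, `Row`, `parse_row` and `delta_is_consistent` are hypothetical helpers written for this illustration, not part of lkp-tests, and the column meaning is inferred from the table rather than from the robot's documentation.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the rows above (an assumption, not an lkp-tests spec):
#   <base %> [± N%]   <signed change>   <new %> [± N%]   <metric name>
ROW = re.compile(
    r"^\s*(?P<base>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<delta>[+-]\d+\.\d+)"
    r"\s+(?P<new>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<metric>perf-profile\.\S+)\s*$"
)


class Row(NamedTuple):
    base: float    # share of cycles on the base commit, in percent
    delta: float   # printed change, in percentage points (one decimal)
    new: float     # share of cycles on the patched commit, in percent
    metric: str    # e.g. perf-profile.children.cycles-pp._raw_spin_lock


def parse_row(line: str) -> Optional[Row]:
    """Return the parsed row, or None if the line is not a perf-profile row."""
    m = ROW.match(line)
    if m is None:
        return None
    return Row(float(m["base"]), float(m["delta"]), float(m["new"]), m["metric"])


def delta_is_consistent(row: Row, tol: float = 0.06) -> bool:
    """Check new - base against the printed change.

    The change is printed with one decimal and the shares with two, so a
    tolerance slightly above 0.05 absorbs the rounding.
    """
    return abs((row.new - row.base) - row.delta) <= tol


if __name__ == "__main__":
    sample = (
        "     78.82            -1.0       77.78        "
        "perf-profile.children.cycles-pp._raw_spin_lock"
    )
    row = parse_row(sample)
    print(row.metric, round(row.new - row.base, 2), delta_is_consistent(row))
```

Read this way, the largest movement in the table is the drop in time attributed to `_raw_spin_lock` and `native_queued_spin_lock_slowpath`, while most other entries rise by a few hundredths of a percentage point.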




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


      0.80 ±  2%      +0.0        0.84        perf-profile.children.cycles-pp.vm_area_alloc
      0.82            +0.0        0.86        perf-profile.children.cycles-pp.do_fault
      0.41 ±  2%      +0.0        0.46 ±  2%  perf-profile.children.cycles-pp.mlock_drain_local
      1.16            +0.0        1.21        perf-profile.children.cycles-pp.mas_wr_modify
      0.44            +0.0        0.49 ±  3%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.40 ±  2%      +0.0        0.45        perf-profile.children.cycles-pp.mlock_folio_batch
      0.63 ±  2%      +0.0        0.68        perf-profile.children.cycles-pp.__folio_batch_release
      0.82 ±  2%      +0.0        0.86        perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.fput
      0.98            +0.1        1.03        perf-profile.children.cycles-pp.__handle_mm_fault
      1.07            +0.1        1.13        perf-profile.children.cycles-pp.handle_mm_fault
      0.83 ±  2%      +0.1        0.89        perf-profile.children.cycles-pp.alloc_inode
      0.73 ±  2%      +0.1        0.78        perf-profile.children.cycles-pp.release_pages
      0.58            +0.1        0.64        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.96 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.alloc_file
      0.61            +0.1        0.67        perf-profile.children.cycles-pp.tlb_batch_pages_flush
      1.34 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.kmem_cache_alloc
      1.33            +0.1        1.40        perf-profile.children.cycles-pp.stress_zero
      1.21            +0.1        1.28        perf-profile.children.cycles-pp.shmem_undo_range
      0.95 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.64            +0.1        1.71        perf-profile.children.cycles-pp.__get_user_pages
      1.38            +0.1        1.45        perf-profile.children.cycles-pp.shmem_evict_inode
      0.68            +0.1        0.76 ±  3%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.72            +0.1        0.80 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
      1.32 ±  2%      +0.1        1.40        perf-profile.children.cycles-pp.alloc_file_pseudo
      0.49            +0.1        0.57 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.67            +0.1        0.78 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.83 ±  2%      +0.1        1.94        perf-profile.children.cycles-pp.unmap_region
      2.18            +0.1        2.32        perf-profile.children.cycles-pp.populate_vma_page_range
      2.35            +0.1        2.49        perf-profile.children.cycles-pp.__mm_populate
      3.87 ±  2%      +0.2        4.08        perf-profile.children.cycles-pp.do_vmi_align_munmap
      4.17 ±  2%      +0.2        4.39        perf-profile.children.cycles-pp.__vm_munmap
      4.19 ±  2%      +0.2        4.42        perf-profile.children.cycles-pp.do_vmi_munmap
      4.17 ±  2%      +0.2        4.40        perf-profile.children.cycles-pp.__x64_sys_munmap
     50.41            +0.3       50.71        perf-profile.children.cycles-pp.vm_mmap_pgoff
     50.75            +0.3       51.07        perf-profile.children.cycles-pp.__mmap
     75.97            -1.0       74.99        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.02 ±  2%      -0.0        0.98        perf-profile.self.cycles-pp._raw_spin_lock
      0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.get_random_u32
      0.19 ±  2%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.17 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.mas_wr_walk
      0.18 ±  2%      +0.0        0.20        perf-profile.self.cycles-pp.memcg_list_lru_alloc
      0.19            +0.0        0.21 ±  2%  perf-profile.self.cycles-pp.errseq_sample
      0.28 ±  2%      +0.0        0.30        perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
      0.18 ±  4%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp.inode_init_always
      0.42            +0.0        0.44        perf-profile.self.cycles-pp.__destroy_inode
      0.50            +0.0        0.53 ±  2%  perf-profile.self.cycles-pp.mas_wr_node_store
      0.29 ±  2%      +0.0        0.32        perf-profile.self.cycles-pp.clear_nlink
      0.21 ±  3%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__fput
      1.31            +0.1        1.37        perf-profile.self.cycles-pp.stress_zero
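
The perf-profile rows above share one layout: base value, delta, new value, then the metric name, with an optional "± N%" stddev after either value. A minimal Python sketch for pulling them into records and ranking them by the size of the delta follows. The names Row, ROW_RE, parse_rows and biggest_movers are hypothetical helpers, not part of lkp-tests, and the reading of the middle column as an absolute difference in percentage points is an assumption drawn from the rows themselves (for example 42.60 -> 41.95 is shown as -0.6).

```python
import re
from typing import List, NamedTuple


class Row(NamedTuple):
    kind: str      # "calltrace", "children" or "self"
    symbol: str    # function name, or dotted call chain for calltrace rows
    base: float    # percentage of cycles on the parent commit
    delta: float   # middle column as printed (assumed percentage points)
    new: float     # percentage of cycles on the tested commit


# An optional "± N%" stddev may follow the base and/or the new value.
ROW_RE = re.compile(
    r"^\s*(?P<base>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<delta>[+-]\d+\.\d+)"
    r"\s+(?P<new>\d+\.\d+)(?:\s*±\s*\d+%)?"
    r"\s+perf-profile\.(?P<kind>calltrace|children|self)\.cycles-pp\.(?P<sym>\S+)$"
)


def parse_rows(text: str) -> List[Row]:
    """Return one Row per perf-profile line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append(Row(m["kind"], m["sym"], float(m["base"]),
                            float(m["delta"]), float(m["new"])))
    return rows


def biggest_movers(rows: List[Row], kind: str, top: int = 5) -> List[Row]:
    """Rows of the given kind, largest absolute delta first."""
    picked = [r for r in rows if r.kind == kind]
    return sorted(picked, key=lambda r: abs(r.delta), reverse=True)[:top]


# Three rows copied from the report above.
SAMPLE = """\
     42.60            -0.6       41.95        perf-profile.children.cycles-pp.__fput
      0.14 ±  4%      +0.0        0.16 ±  5%  perf-profile.children.cycles-pp.inode_init_owner
     75.97            -1.0       74.99        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
"""

if __name__ == "__main__":
    for row in biggest_movers(parse_rows(SAMPLE), "self"):
        print(row.symbol, row.delta)
```

Run over the whole report, biggest_movers(rows, "self") gives a quick view of where cycles moved between the two commits. The percentage rows in the summary table (those with a "%change" column) use a different layout and are not matched by this pattern.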




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

