* [linus:master] [slab]  e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
@ 2026-03-18  7:35 kernel test robot
From: kernel test robot @ 2026-03-18  7:35 UTC
  To: Vlastimil Babka
  Cc: oe-lkp, lkp, linux-kernel, Suren Baghdasaryan, Harry Yoo, Hao Li,
	Breno Leitao, Liam R. Howlett, Zhao Liu, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 15.1% improvement in stress-ng.lockofd.ops_per_sec on:


commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	test: lockofd
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s

commit: 
  4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
  e47c897a29 ("slab: add sheaves to most caches")

4b038a9670154e8b e47c897a29491ade20b27612fdd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.912e+09 ±  3%     +15.1%  2.201e+09 ±  2%  stress-ng.lockofd.ops
  31892971 ±  3%     +15.1%   36721113 ±  2%  stress-ng.lockofd.ops_per_sec
      0.28 ±  2%      +0.0        0.31 ±  2%  mpstat.cpu.all.irq%
      0.33 ±  3%     +12.5%       0.38 ±  2%  turbostat.IPC
     17.77 ±  2%     -19.2%      14.36 ±  5%  turbostat.RAMWatt
      6374 ± 13%     -91.1%     568.33 ± 47%  perf-c2c.DRAM.local
    209276 ± 12%     -69.0%      64817 ± 40%  perf-c2c.DRAM.remote
    475369 ± 14%     -64.4%     169285 ± 36%  perf-c2c.HITM.local
     24507 ± 12%     -49.3%      12424 ± 40%  perf-c2c.HITM.remote
    499876 ± 13%     -63.6%     181710 ± 36%  perf-c2c.HITM.total
      4.03 ± 11%     -49.8%       2.02 ± 10%  perf-stat.i.MPKI
 7.069e+10 ±  3%     +14.3%  8.083e+10        perf-stat.i.branch-instructions
      0.40            -0.1        0.33        perf-stat.i.branch-miss-rate%
     22.12 ± 13%      -6.0       16.15 ± 10%  perf-stat.i.cache-miss-rate%
 1.229e+09 ±  7%     -43.2%  6.977e+08 ±  9%  perf-stat.i.cache-misses
 5.599e+09 ±  5%     -22.7%  4.329e+09 ±  2%  perf-stat.i.cache-references
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.i.cpi
    748.51 ±  7%     +78.5%       1336 ±  9%  perf-stat.i.cycles-between-cache-misses
 3.062e+11 ±  3%     +12.9%  3.456e+11        perf-stat.i.instructions
      0.33 ±  3%     +11.8%       0.37 ±  2%  perf-stat.i.ipc
      4.03 ± 11%     -49.9%       2.02 ± 10%  perf-stat.overall.MPKI
      0.40            -0.1        0.33        perf-stat.overall.branch-miss-rate%
     22.10 ± 13%      -6.0       16.11 ± 10%  perf-stat.overall.cache-miss-rate%
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.overall.cpi
    748.41 ±  7%     +78.6%       1336 ±  9%  perf-stat.overall.cycles-between-cache-misses
      0.34 ±  3%     +11.8%       0.37 ±  2%  perf-stat.overall.ipc
 6.915e+10 ±  3%     +13.8%  7.871e+10 ±  2%  perf-stat.ps.branch-instructions
 1.201e+09 ±  7%     -43.5%  6.784e+08 ±  9%  perf-stat.ps.cache-misses
 5.475e+09 ±  5%     -23.0%  4.216e+09 ±  2%  perf-stat.ps.cache-references
 2.995e+11 ±  3%     +12.4%  3.366e+11 ±  2%  perf-stat.ps.instructions
 1.829e+13 ±  3%     +10.2%  2.015e+13 ±  4%  perf-stat.total.instructions
      3.09 ±  2%      -1.6        1.52 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      3.25 ±  2%      -1.2        2.06 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     93.17            -0.5       92.64        perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     95.62            -0.3       95.36        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  4%      -0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.04 ±  4%      -0.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
     96.32            -0.1       96.21        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.45            +0.1       97.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66 ±  4%      +0.1        0.76 ±  7%  perf-profile.calltrace.cycles-pp.stress_lockofd_contention
     97.61            +0.1       97.71        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74 ±  2%      +0.3        1.07        perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      3.66 ±  3%      -1.6        2.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      1.66 ±  3%      -1.5        0.15 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      1.66 ±  4%      -1.2        0.42 ±  8%  perf-profile.children.cycles-pp.__slab_free
      4.08            -0.9        3.16 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.97 ±  5%      -0.9        0.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.69 ±  5%      -0.6        0.08 ±  8%  perf-profile.children.cycles-pp.get_partial_node
     93.45            -0.6       92.87        perf-profile.children.cycles-pp.posix_lock_inode
      0.62 ±  4%      -0.5        0.07 ± 12%  perf-profile.children.cycles-pp.__put_partials
     95.72            -0.2       95.48        perf-profile.children.cycles-pp.fcntl_setlk
      0.46 ±  5%      -0.2        0.29        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.16 ±  4%      -0.1        1.01 ±  2%  perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.82            -0.1        0.70 ±  2%  perf-profile.children.cycles-pp.locks_release_private
      0.31            -0.1        0.21 ±  2%  perf-profile.children.cycles-pp.__locks_delete_block
     96.34            -0.1       96.24        perf-profile.children.cycles-pp.do_fcntl
      0.15 ±  3%      -0.0        0.11 ± 14%  perf-profile.children.cycles-pp.__libc_fcntl64
      0.16 ±  3%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.stress_mwc16
      0.12 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.locks_copy_lock
      0.18            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.locks_get_lock_context
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.stress_mwc64
      0.15 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.flock64_to_posix_lock
      0.29            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.11 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__init_waitqueue_head
      0.23 ±  4%      +0.1        0.28        perf-profile.children.cycles-pp.fdget_raw
      0.32 ±  4%      +0.1        0.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.71 ±  3%      +0.1        0.78 ±  7%  perf-profile.children.cycles-pp.stress_lockofd_contention
     97.58            +0.1       97.66        perf-profile.children.cycles-pp.do_syscall_64
      0.42 ±  4%      +0.2        0.58        perf-profile.children.cycles-pp.__pi_memset
      0.00            +0.2        0.22 ±  9%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.2        0.23 ±  7%  perf-profile.children.cycles-pp.__pcs_replace_empty_main
      1.59 ±  3%      -1.2        0.42 ±  8%  perf-profile.self.cycles-pp.__slab_free
      0.95 ±  2%      -0.9        0.06 ±  7%  perf-profile.self.cycles-pp.___slab_alloc
      0.24 ±  2%      -0.2        0.06 ± 13%  perf-profile.self.cycles-pp.get_partial_node
      1.42 ±  3%      -0.2        1.24 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.46 ±  5%      -0.2        0.29        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.79            -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.locks_release_private
      0.30            -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.__locks_delete_block
      0.51 ±  8%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.locks_unlink_lock_ctx
      0.09            -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.__put_partials
      0.13 ±  3%      -0.0        0.09 ± 17%  perf-profile.self.cycles-pp.__libc_fcntl64
      0.15 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.stress_mwc16
      0.12 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.locks_insert_lock_ctx
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.locks_copy_lock
      0.11 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.locks_get_lock_context
      0.09 ±  4%      +0.0        0.10        perf-profile.self.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flock64_to_posix_lock
      0.06 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.stress_mwc64
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_fcntl
      0.08 ±  4%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__init_waitqueue_head
      0.22 ±  4%      +0.1        0.27        perf-profile.self.cycles-pp.fdget_raw
      0.31 ±  4%      +0.1        0.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24 ±  4%      +0.1        0.29        perf-profile.self.cycles-pp.do_syscall_64
      0.24 ±  2%      +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.42 ±  2%      +0.1        0.48        perf-profile.self.cycles-pp.fcntl_setlk
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.67 ±  4%      +0.1        0.75 ±  7%  perf-profile.self.cycles-pp.stress_lockofd_contention
      0.40 ±  4%      +0.2        0.56 ±  2%  perf-profile.self.cycles-pp.__pi_memset
      1.57            +1.0        2.56        perf-profile.self.cycles-pp.kmem_cache_free
     60.77            +1.9       62.63        perf-profile.self.cycles-pp.posix_lock_inode
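
The report does not state how the middle column of the table above is
computed. The numbers appear consistent with two conventions: rows shown with
a "%" suffix are relative changes against the parent commit, while rows
without it (the perf-profile entries and the "-rate%" / "irq%" metrics) are
absolute differences in percentage points. A minimal sketch under that
assumption, with hypothetical helper names and values copied from the table:

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference between two values that are already percentages."""
    return new - base


# Rows reported with a '%' suffix: relative change.
print(f"stress-ng.lockofd.ops_per_sec: {pct_change(31892971, 36721113):+.1f}%")
print(f"perf-stat.i.cache-misses:      {pct_change(1.229e9, 6.977e8):+.1f}%")

# Rows reported without a '%' suffix: percentage-point delta.
print(f"children kmem_cache_alloc_noprof: {point_delta(3.66, 2.10):+.1f}")
print(f"perf-stat.i.cache-miss-rate%:     {point_delta(22.12, 16.15):+.1f}")
```

Read this way, the -1.6 on kmem_cache_alloc_noprof means its share of sampled
cycles fell from 3.66% to 2.10%, not that it became 1.6% cheaper.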


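Several perf-stat rows in the table are derived from others, which gives a
quick sanity check on the figures. The sketch below assumes (the report does
not define these) that MPKI means cache misses per thousand instructions, that
CPI is the reciprocal of IPC, and that cycles-between-cache-misses is total
cycles divided by cache misses. The function names are illustrative only.
Exact agreement is not expected, since the perf-stat.i.* values are interval
averages and the printed inputs are rounded.

```python
def mpki(cache_misses: float, instructions: float) -> float:
    """Cache misses per 1000 instructions (assumed definition)."""
    return cache_misses / instructions * 1000.0


def cycles_between_misses(cpi: float, instructions: float,
                          cache_misses: float) -> float:
    """Approximate cycles per cache miss, using cycles ~= cpi * instructions."""
    return cpi * instructions / cache_misses


# perf-stat.i.* values for the parent commit and for e47c897a29.
runs = {
    "4b038a9670": dict(cache_misses=1.229e9, instructions=3.062e11,
                       cpi=2.99, ipc=0.33),
    "e47c897a29": dict(cache_misses=6.977e8, instructions=3.456e11,
                       cpi=2.67, ipc=0.37),
}

for commit, r in runs.items():
    print(commit,
          f"MPKI~{mpki(r['cache_misses'], r['instructions']):.2f}",
          f"1/IPC~{1.0 / r['ipc']:.2f}",
          f"cycles/miss~{cycles_between_misses(r['cpi'], r['instructions'], r['cache_misses']):.0f}")
```

The recomputed values land within a few percent of the reported MPKI, cpi and
cycles-between-cache-misses rows for both commits.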

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


