public inbox for linux-mm@kvack.org
* [linus:master] [slab]  e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
From: kernel test robot @ 2026-03-18  7:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: oe-lkp, lkp, linux-kernel, Suren Baghdasaryan, Harry Yoo, Hao Li,
	Breno Leitao, Liam R. Howlett, Zhao Liu, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 15.1% improvement in stress-ng.lockofd.ops_per_sec on:


commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads, 2 sockets, Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	test: lockofd
	cpufreq_governor: performance



Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s

commit: 
  4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
  e47c897a29 ("slab: add sheaves to most caches")

4b038a9670154e8b e47c897a29491ade20b27612fdd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.912e+09 ±  3%     +15.1%  2.201e+09 ±  2%  stress-ng.lockofd.ops
  31892971 ±  3%     +15.1%   36721113 ±  2%  stress-ng.lockofd.ops_per_sec
      0.28 ±  2%      +0.0        0.31 ±  2%  mpstat.cpu.all.irq%
      0.33 ±  3%     +12.5%       0.38 ±  2%  turbostat.IPC
     17.77 ±  2%     -19.2%      14.36 ±  5%  turbostat.RAMWatt
      6374 ± 13%     -91.1%     568.33 ± 47%  perf-c2c.DRAM.local
    209276 ± 12%     -69.0%      64817 ± 40%  perf-c2c.DRAM.remote
    475369 ± 14%     -64.4%     169285 ± 36%  perf-c2c.HITM.local
     24507 ± 12%     -49.3%      12424 ± 40%  perf-c2c.HITM.remote
    499876 ± 13%     -63.6%     181710 ± 36%  perf-c2c.HITM.total
      4.03 ± 11%     -49.8%       2.02 ± 10%  perf-stat.i.MPKI
 7.069e+10 ±  3%     +14.3%  8.083e+10        perf-stat.i.branch-instructions
      0.40            -0.1        0.33        perf-stat.i.branch-miss-rate%
     22.12 ± 13%      -6.0       16.15 ± 10%  perf-stat.i.cache-miss-rate%
 1.229e+09 ±  7%     -43.2%  6.977e+08 ±  9%  perf-stat.i.cache-misses
 5.599e+09 ±  5%     -22.7%  4.329e+09 ±  2%  perf-stat.i.cache-references
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.i.cpi
    748.51 ±  7%     +78.5%       1336 ±  9%  perf-stat.i.cycles-between-cache-misses
 3.062e+11 ±  3%     +12.9%  3.456e+11        perf-stat.i.instructions
      0.33 ±  3%     +11.8%       0.37 ±  2%  perf-stat.i.ipc
      4.03 ± 11%     -49.9%       2.02 ± 10%  perf-stat.overall.MPKI
      0.40            -0.1        0.33        perf-stat.overall.branch-miss-rate%
     22.10 ± 13%      -6.0       16.11 ± 10%  perf-stat.overall.cache-miss-rate%
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.overall.cpi
    748.41 ±  7%     +78.6%       1336 ±  9%  perf-stat.overall.cycles-between-cache-misses
      0.34 ±  3%     +11.8%       0.37 ±  2%  perf-stat.overall.ipc
 6.915e+10 ±  3%     +13.8%  7.871e+10 ±  2%  perf-stat.ps.branch-instructions
 1.201e+09 ±  7%     -43.5%  6.784e+08 ±  9%  perf-stat.ps.cache-misses
 5.475e+09 ±  5%     -23.0%  4.216e+09 ±  2%  perf-stat.ps.cache-references
 2.995e+11 ±  3%     +12.4%  3.366e+11 ±  2%  perf-stat.ps.instructions
 1.829e+13 ±  3%     +10.2%  2.015e+13 ±  4%  perf-stat.total.instructions
      3.09 ±  2%      -1.6        1.52 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      3.25 ±  2%      -1.2        2.06 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     93.17            -0.5       92.64        perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     95.62            -0.3       95.36        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  4%      -0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.04 ±  4%      -0.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
     96.32            -0.1       96.21        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.45            +0.1       97.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66 ±  4%      +0.1        0.76 ±  7%  perf-profile.calltrace.cycles-pp.stress_lockofd_contention
     97.61            +0.1       97.71        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74 ±  2%      +0.3        1.07        perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      3.66 ±  3%      -1.6        2.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      1.66 ±  3%      -1.5        0.15 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      1.66 ±  4%      -1.2        0.42 ±  8%  perf-profile.children.cycles-pp.__slab_free
      4.08            -0.9        3.16 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.97 ±  5%      -0.9        0.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.69 ±  5%      -0.6        0.08 ±  8%  perf-profile.children.cycles-pp.get_partial_node
     93.45            -0.6       92.87        perf-profile.children.cycles-pp.posix_lock_inode
      0.62 ±  4%      -0.5        0.07 ± 12%  perf-profile.children.cycles-pp.__put_partials
     95.72            -0.2       95.48        perf-profile.children.cycles-pp.fcntl_setlk
      0.46 ±  5%      -0.2        0.29        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.16 ±  4%      -0.1        1.01 ±  2%  perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.82            -0.1        0.70 ±  2%  perf-profile.children.cycles-pp.locks_release_private
      0.31            -0.1        0.21 ±  2%  perf-profile.children.cycles-pp.__locks_delete_block
     96.34            -0.1       96.24        perf-profile.children.cycles-pp.do_fcntl
      0.15 ±  3%      -0.0        0.11 ± 14%  perf-profile.children.cycles-pp.__libc_fcntl64
      0.16 ±  3%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.stress_mwc16
      0.12 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.locks_copy_lock
      0.18            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.locks_get_lock_context
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.stress_mwc64
      0.15 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.flock64_to_posix_lock
      0.29            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.11 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__init_waitqueue_head
      0.23 ±  4%      +0.1        0.28        perf-profile.children.cycles-pp.fdget_raw
      0.32 ±  4%      +0.1        0.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.71 ±  3%      +0.1        0.78 ±  7%  perf-profile.children.cycles-pp.stress_lockofd_contention
     97.58            +0.1       97.66        perf-profile.children.cycles-pp.do_syscall_64
      0.42 ±  4%      +0.2        0.58        perf-profile.children.cycles-pp.__pi_memset
      0.00            +0.2        0.22 ±  9%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.2        0.23 ±  7%  perf-profile.children.cycles-pp.__pcs_replace_empty_main
      1.59 ±  3%      -1.2        0.42 ±  8%  perf-profile.self.cycles-pp.__slab_free
      0.95 ±  2%      -0.9        0.06 ±  7%  perf-profile.self.cycles-pp.___slab_alloc
      0.24 ±  2%      -0.2        0.06 ± 13%  perf-profile.self.cycles-pp.get_partial_node
      1.42 ±  3%      -0.2        1.24 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.46 ±  5%      -0.2        0.29        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.79            -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.locks_release_private
      0.30            -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.__locks_delete_block
      0.51 ±  8%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.locks_unlink_lock_ctx
      0.09            -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.__put_partials
      0.13 ±  3%      -0.0        0.09 ± 17%  perf-profile.self.cycles-pp.__libc_fcntl64
      0.15 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.stress_mwc16
      0.12 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.locks_insert_lock_ctx
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.locks_copy_lock
      0.11 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.locks_get_lock_context
      0.09 ±  4%      +0.0        0.10        perf-profile.self.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flock64_to_posix_lock
      0.06 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.stress_mwc64
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_fcntl
      0.08 ±  4%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__init_waitqueue_head
      0.22 ±  4%      +0.1        0.27        perf-profile.self.cycles-pp.fdget_raw
      0.31 ±  4%      +0.1        0.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24 ±  4%      +0.1        0.29        perf-profile.self.cycles-pp.do_syscall_64
      0.24 ±  2%      +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.42 ±  2%      +0.1        0.48        perf-profile.self.cycles-pp.fcntl_setlk
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.67 ±  4%      +0.1        0.75 ±  7%  perf-profile.self.cycles-pp.stress_lockofd_contention
      0.40 ±  4%      +0.2        0.56 ±  2%  perf-profile.self.cycles-pp.__pi_memset
      1.57            +1.0        2.56        perf-profile.self.cycles-pp.kmem_cache_free
     60.77            +1.9       62.63        perf-profile.self.cycles-pp.posix_lock_inode
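
As a reading aid for the table above, the sketch below recomputes a few
entries of the %change column. It assumes, based on the figures themselves,
that throughput and counter rows report a relative change, while rows whose
values are already percentages (such as the perf-profile cycles-pp rows)
report a difference in percentage points. The helper names are hypothetical
and are not part of lkp-tests.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_change(base_pct: float, new_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return new_pct - base_pct


if __name__ == "__main__":
    # stress-ng.lockofd.ops_per_sec (reported: +15.1%)
    print(f"{relative_change(31892971, 36721113):+.1f}%")
    # perf-stat.i.cache-misses (reported: -43.2%)
    print(f"{relative_change(1.229e9, 6.977e8):+.1f}%")
    # kmem_cache_alloc_noprof under posix_lock_inode (reported: -1.6)
    print(f"{point_change(3.09, 1.52):+.1f}")
```

The inputs are the rounded means printed in the table, so results can differ
from the reported column in the last digit for other rows.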

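The perf-stat rows can also be cross-checked against each other. The sketch
below assumes that MPKI means cache misses per thousand instructions and that
IPC is the reciprocal of CPI. Both assumptions agree with the reported values
to within rounding, but the exact formulas used by the test robot are not
stated in this report. The function names are hypothetical.

```python
def mpki(cache_misses: float, instructions: float) -> float:
    """Cache misses per thousand instructions (assumed definition)."""
    return cache_misses / instructions * 1000.0


def ipc_from_cpi(cpi: float) -> float:
    """Instructions per cycle as the reciprocal of cycles per instruction."""
    return 1.0 / cpi


if __name__ == "__main__":
    # Inputs: perf-stat.ps.cache-misses and perf-stat.ps.instructions.
    print(f"MPKI before: {mpki(1.201e9, 2.995e11):.2f}")  # reported: 4.03
    print(f"MPKI after:  {mpki(6.784e8, 3.366e11):.2f}")  # reported: 2.02
    # Inputs: perf-stat.overall.cpi.
    print(f"IPC before:  {ipc_from_cpi(2.99):.2f}")       # reported: 0.34
    print(f"IPC after:   {ipc_from_cpi(2.67):.2f}")       # reported: 0.37
```

Small gaps between the recomputed and reported values are expected, since the
table prints rounded averages across runs.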


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


