* [linus:master] [slab] e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
From: kernel test robot @ 2026-03-18 7:35 UTC
To: Vlastimil Babka
Cc: oe-lkp, lkp, linux-kernel, Suren Baghdasaryan, Harry Yoo, Hao Li,
Breno Leitao, Liam R. Howlett, Zhao Liu, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 15.1% improvement of stress-ng.lockofd.ops_per_sec on:
commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:
nr_threads: 100%
disk: 1SSD
testtime: 60s
fs: xfs
test: lockofd
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s
commit:
4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
e47c897a29 ("slab: add sheaves to most caches")
4b038a9670154e8b e47c897a29491ade20b27612fdd
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.912e+09 ± 3% +15.1% 2.201e+09 ± 2% stress-ng.lockofd.ops
31892971 ± 3% +15.1% 36721113 ± 2% stress-ng.lockofd.ops_per_sec
0.28 ± 2% +0.0 0.31 ± 2% mpstat.cpu.all.irq%
0.33 ± 3% +12.5% 0.38 ± 2% turbostat.IPC
17.77 ± 2% -19.2% 14.36 ± 5% turbostat.RAMWatt
6374 ± 13% -91.1% 568.33 ± 47% perf-c2c.DRAM.local
209276 ± 12% -69.0% 64817 ± 40% perf-c2c.DRAM.remote
475369 ± 14% -64.4% 169285 ± 36% perf-c2c.HITM.local
24507 ± 12% -49.3% 12424 ± 40% perf-c2c.HITM.remote
499876 ± 13% -63.6% 181710 ± 36% perf-c2c.HITM.total
4.03 ± 11% -49.8% 2.02 ± 10% perf-stat.i.MPKI
7.069e+10 ± 3% +14.3% 8.083e+10 perf-stat.i.branch-instructions
0.40 -0.1 0.33 perf-stat.i.branch-miss-rate%
22.12 ± 13% -6.0 16.15 ± 10% perf-stat.i.cache-miss-rate%
1.229e+09 ± 7% -43.2% 6.977e+08 ± 9% perf-stat.i.cache-misses
5.599e+09 ± 5% -22.7% 4.329e+09 ± 2% perf-stat.i.cache-references
2.99 ± 3% -10.6% 2.67 ± 2% perf-stat.i.cpi
748.51 ± 7% +78.5% 1336 ± 9% perf-stat.i.cycles-between-cache-misses
3.062e+11 ± 3% +12.9% 3.456e+11 perf-stat.i.instructions
0.33 ± 3% +11.8% 0.37 ± 2% perf-stat.i.ipc
4.03 ± 11% -49.9% 2.02 ± 10% perf-stat.overall.MPKI
0.40 -0.1 0.33 perf-stat.overall.branch-miss-rate%
22.10 ± 13% -6.0 16.11 ± 10% perf-stat.overall.cache-miss-rate%
2.99 ± 3% -10.6% 2.67 ± 2% perf-stat.overall.cpi
748.41 ± 7% +78.6% 1336 ± 9% perf-stat.overall.cycles-between-cache-misses
0.34 ± 3% +11.8% 0.37 ± 2% perf-stat.overall.ipc
6.915e+10 ± 3% +13.8% 7.871e+10 ± 2% perf-stat.ps.branch-instructions
1.201e+09 ± 7% -43.5% 6.784e+08 ± 9% perf-stat.ps.cache-misses
5.475e+09 ± 5% -23.0% 4.216e+09 ± 2% perf-stat.ps.cache-references
2.995e+11 ± 3% +12.4% 3.366e+11 ± 2% perf-stat.ps.instructions
1.829e+13 ± 3% +10.2% 2.015e+13 ± 4% perf-stat.total.instructions
3.09 ± 2% -1.6 1.52 ± 2% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
3.25 ± 2% -1.2 2.06 ± 3% perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
93.17 -0.5 92.64 perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
95.62 -0.3 95.36 perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.15 ± 4% -0.1 1.00 ± 2% perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
0.72 -0.1 0.58 ± 2% perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
1.04 ± 4% -0.1 0.92 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
96.32 -0.1 96.21 perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
97.45 +0.1 97.54 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.66 ± 4% +0.1 0.76 ± 7% perf-profile.calltrace.cycles-pp.stress_lockofd_contention
97.61 +0.1 97.71 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
0.74 ± 2% +0.3 1.07 perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
3.66 ± 3% -1.6 2.10 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
1.66 ± 3% -1.5 0.15 ± 9% perf-profile.children.cycles-pp.___slab_alloc
1.66 ± 4% -1.2 0.42 ± 8% perf-profile.children.cycles-pp.__slab_free
4.08 -0.9 3.16 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
0.97 ± 5% -0.9 0.06 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.69 ± 5% -0.6 0.08 ± 8% perf-profile.children.cycles-pp.get_partial_node
93.45 -0.6 92.87 perf-profile.children.cycles-pp.posix_lock_inode
0.62 ± 4% -0.5 0.07 ± 12% perf-profile.children.cycles-pp.__put_partials
95.72 -0.2 95.48 perf-profile.children.cycles-pp.fcntl_setlk
0.46 ± 5% -0.2 0.29 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.16 ± 4% -0.1 1.01 ± 2% perf-profile.children.cycles-pp.locks_insert_lock_ctx
0.82 -0.1 0.70 ± 2% perf-profile.children.cycles-pp.locks_release_private
0.31 -0.1 0.21 ± 2% perf-profile.children.cycles-pp.__locks_delete_block
96.34 -0.1 96.24 perf-profile.children.cycles-pp.do_fcntl
0.15 ± 3% -0.0 0.11 ± 14% perf-profile.children.cycles-pp.__libc_fcntl64
0.16 ± 3% -0.0 0.13 ± 5% perf-profile.children.cycles-pp.stress_mwc16
0.12 ± 3% -0.0 0.10 ± 5% perf-profile.children.cycles-pp.locks_copy_lock
0.18 -0.0 0.17 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.15 -0.0 0.14 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.07 +0.0 0.08 perf-profile.children.cycles-pp.locks_get_lock_context
0.10 ± 3% +0.0 0.11 perf-profile.children.cycles-pp.x64_sys_call
0.13 ± 5% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.07 +0.0 0.10 ± 4% perf-profile.children.cycles-pp.stress_mwc64
0.15 ± 3% +0.0 0.18 ± 3% perf-profile.children.cycles-pp.flock64_to_posix_lock
0.29 +0.0 0.32 ± 2% perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
0.11 ± 4% +0.0 0.15 ± 2% perf-profile.children.cycles-pp.__init_waitqueue_head
0.23 ± 4% +0.1 0.28 perf-profile.children.cycles-pp.fdget_raw
0.32 ± 4% +0.1 0.37 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.71 ± 3% +0.1 0.78 ± 7% perf-profile.children.cycles-pp.stress_lockofd_contention
97.58 +0.1 97.66 perf-profile.children.cycles-pp.do_syscall_64
0.42 ± 4% +0.2 0.58 perf-profile.children.cycles-pp.__pi_memset
0.00 +0.2 0.22 ± 9% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
0.00 +0.2 0.23 ± 7% perf-profile.children.cycles-pp.__pcs_replace_empty_main
1.59 ± 3% -1.2 0.42 ± 8% perf-profile.self.cycles-pp.__slab_free
0.95 ± 2% -0.9 0.06 ± 7% perf-profile.self.cycles-pp.___slab_alloc
0.24 ± 2% -0.2 0.06 ± 13% perf-profile.self.cycles-pp.get_partial_node
1.42 ± 3% -0.2 1.24 ± 2% perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
0.46 ± 5% -0.2 0.29 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.79 -0.2 0.64 ± 2% perf-profile.self.cycles-pp.locks_release_private
0.30 -0.1 0.20 ± 2% perf-profile.self.cycles-pp.__locks_delete_block
0.51 ± 8% -0.1 0.43 ± 5% perf-profile.self.cycles-pp.locks_unlink_lock_ctx
0.09 -0.1 0.02 ± 99% perf-profile.self.cycles-pp.__put_partials
0.13 ± 3% -0.0 0.09 ± 17% perf-profile.self.cycles-pp.__libc_fcntl64
0.15 ± 3% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.stress_mwc16
0.12 ± 6% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.locks_insert_lock_ctx
0.18 ± 2% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.12 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.locks_copy_lock
0.11 ± 4% -0.0 0.10 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.06 +0.0 0.07 perf-profile.self.cycles-pp.locks_get_lock_context
0.09 ± 4% +0.0 0.10 perf-profile.self.cycles-pp.x64_sys_call
0.13 ± 5% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.13 ± 5% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.flock64_to_posix_lock
0.06 ± 6% +0.0 0.09 perf-profile.self.cycles-pp.stress_mwc64
0.27 ± 2% +0.0 0.30 ± 2% perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
0.14 ± 4% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.do_fcntl
0.08 ± 4% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.__init_waitqueue_head
0.22 ± 4% +0.1 0.27 perf-profile.self.cycles-pp.fdget_raw
0.31 ± 4% +0.1 0.36 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.24 ± 4% +0.1 0.29 perf-profile.self.cycles-pp.do_syscall_64
0.24 ± 2% +0.1 0.30 ± 2% perf-profile.self.cycles-pp.__x64_sys_fcntl
0.42 ± 2% +0.1 0.48 perf-profile.self.cycles-pp.fcntl_setlk
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
0.67 ± 4% +0.1 0.75 ± 7% perf-profile.self.cycles-pp.stress_lockofd_contention
0.40 ± 4% +0.2 0.56 ± 2% perf-profile.self.cycles-pp.__pi_memset
1.57 +1.0 2.56 perf-profile.self.cycles-pp.kmem_cache_free
60.77 +1.9 62.63 perf-profile.self.cycles-pp.posix_lock_inode
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki