Subject: [linus:master] [slab] e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
From: kernel test robot @ 2026-03-18 7:35 UTC
To: Vlastimil Babka
Cc: oe-lkp, lkp, linux-kernel, Suren Baghdasaryan, Harry Yoo, Hao Li,
    Breno Leitao, Liam R. Howlett, Zhao Liu, linux-mm, oliver.sang

Hello,

kernel test robot noticed a 15.1% improvement of stress-ng.lockofd.ops_per_sec on:
commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	test: lockofd
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s

commit:
  4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
  e47c897a29 ("slab: add sheaves to most caches")

4b038a9670154e8b e47c897a29491ade20b27612fdd
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.912e+09 ±  3%     +15.1%  2.201e+09 ±  2%  stress-ng.lockofd.ops
  31892971 ±  3%     +15.1%   36721113 ±  2%  stress-ng.lockofd.ops_per_sec
      0.28 ±  2%      +0.0        0.31 ±  2%  mpstat.cpu.all.irq%
      0.33 ±  3%     +12.5%       0.38 ±  2%  turbostat.IPC
     17.77 ±  2%     -19.2%      14.36 ±  5%  turbostat.RAMWatt
      6374 ± 13%     -91.1%     568.33 ± 47%  perf-c2c.DRAM.local
    209276 ± 12%     -69.0%      64817 ± 40%  perf-c2c.DRAM.remote
    475369 ± 14%     -64.4%     169285 ± 36%  perf-c2c.HITM.local
     24507 ± 12%     -49.3%      12424 ± 40%  perf-c2c.HITM.remote
    499876 ± 13%     -63.6%     181710 ± 36%  perf-c2c.HITM.total
      4.03 ± 11%     -49.8%       2.02 ± 10%  perf-stat.i.MPKI
 7.069e+10 ±  3%     +14.3%  8.083e+10        perf-stat.i.branch-instructions
      0.40            -0.1        0.33        perf-stat.i.branch-miss-rate%
     22.12 ± 13%      -6.0       16.15 ± 10%  perf-stat.i.cache-miss-rate%
 1.229e+09 ±  7%     -43.2%  6.977e+08 ±  9%  perf-stat.i.cache-misses
 5.599e+09 ±  5%     -22.7%  4.329e+09 ±  2%  perf-stat.i.cache-references
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.i.cpi
    748.51 ±  7%     +78.5%       1336 ±  9%  perf-stat.i.cycles-between-cache-misses
 3.062e+11 ±  3%     +12.9%  3.456e+11        perf-stat.i.instructions
      0.33 ±  3%     +11.8%       0.37 ±  2%  perf-stat.i.ipc
      4.03 ± 11%     -49.9%       2.02 ± 10%  perf-stat.overall.MPKI
      0.40            -0.1        0.33        perf-stat.overall.branch-miss-rate%
     22.10 ± 13%      -6.0       16.11 ± 10%  perf-stat.overall.cache-miss-rate%
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.overall.cpi
    748.41 ±  7%     +78.6%       1336 ±  9%  perf-stat.overall.cycles-between-cache-misses
      0.34 ±  3%     +11.8%       0.37 ±  2%  perf-stat.overall.ipc
 6.915e+10 ±  3%     +13.8%  7.871e+10 ±  2%  perf-stat.ps.branch-instructions
 1.201e+09 ±  7%     -43.5%  6.784e+08 ±  9%  perf-stat.ps.cache-misses
 5.475e+09 ±  5%     -23.0%  4.216e+09 ±  2%  perf-stat.ps.cache-references
 2.995e+11 ±  3%     +12.4%  3.366e+11 ±  2%  perf-stat.ps.instructions
 1.829e+13 ±  3%     +10.2%  2.015e+13 ±  4%  perf-stat.total.instructions
      3.09 ±  2%      -1.6        1.52 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      3.25 ±  2%      -1.2        2.06 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     93.17            -0.5       92.64        perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     95.62            -0.3       95.36        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  4%      -0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.04 ±  4%      -0.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
     96.32            -0.1       96.21        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.45            +0.1       97.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66 ±  4%      +0.1        0.76 ±  7%  perf-profile.calltrace.cycles-pp.stress_lockofd_contention
     97.61            +0.1       97.71        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74 ±  2%      +0.3        1.07        perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      3.66 ±  3%      -1.6        2.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      1.66 ±  3%      -1.5        0.15 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      1.66 ±  4%      -1.2        0.42 ±  8%  perf-profile.children.cycles-pp.__slab_free
      4.08            -0.9        3.16 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.97 ±  5%      -0.9        0.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.69 ±  5%      -0.6        0.08 ±  8%  perf-profile.children.cycles-pp.get_partial_node
     93.45            -0.6       92.87        perf-profile.children.cycles-pp.posix_lock_inode
      0.62 ±  4%      -0.5        0.07 ± 12%  perf-profile.children.cycles-pp.__put_partials
     95.72            -0.2       95.48        perf-profile.children.cycles-pp.fcntl_setlk
      0.46 ±  5%      -0.2        0.29        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.16 ±  4%      -0.1        1.01 ±  2%  perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.82            -0.1        0.70 ±  2%  perf-profile.children.cycles-pp.locks_release_private
      0.31            -0.1        0.21 ±  2%  perf-profile.children.cycles-pp.__locks_delete_block
     96.34            -0.1       96.24        perf-profile.children.cycles-pp.do_fcntl
      0.15 ±  3%      -0.0        0.11 ± 14%  perf-profile.children.cycles-pp.__libc_fcntl64
      0.16 ±  3%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.stress_mwc16
      0.12 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.locks_copy_lock
      0.18            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.locks_get_lock_context
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.stress_mwc64
      0.15 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.flock64_to_posix_lock
      0.29            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.11 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__init_waitqueue_head
      0.23 ±  4%      +0.1        0.28        perf-profile.children.cycles-pp.fdget_raw
      0.32 ±  4%      +0.1        0.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.71 ±  3%      +0.1        0.78 ±  7%  perf-profile.children.cycles-pp.stress_lockofd_contention
     97.58            +0.1       97.66        perf-profile.children.cycles-pp.do_syscall_64
      0.42 ±  4%      +0.2        0.58        perf-profile.children.cycles-pp.__pi_memset
      0.00            +0.2        0.22 ±  9%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.2        0.23 ±  7%  perf-profile.children.cycles-pp.__pcs_replace_empty_main
      1.59 ±  3%      -1.2        0.42 ±  8%  perf-profile.self.cycles-pp.__slab_free
      0.95 ±  2%      -0.9        0.06 ±  7%  perf-profile.self.cycles-pp.___slab_alloc
      0.24 ±  2%      -0.2        0.06 ± 13%  perf-profile.self.cycles-pp.get_partial_node
      1.42 ±  3%      -0.2        1.24 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.46 ±  5%      -0.2        0.29        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.79            -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.locks_release_private
      0.30            -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.__locks_delete_block
      0.51 ±  8%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.locks_unlink_lock_ctx
      0.09            -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.__put_partials
      0.13 ±  3%      -0.0        0.09 ± 17%  perf-profile.self.cycles-pp.__libc_fcntl64
      0.15 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.stress_mwc16
      0.12 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.locks_insert_lock_ctx
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.locks_copy_lock
      0.11 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.locks_get_lock_context
      0.09 ±  4%      +0.0        0.10        perf-profile.self.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flock64_to_posix_lock
      0.06 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.stress_mwc64
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_fcntl
      0.08 ±  4%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__init_waitqueue_head
      0.22 ±  4%      +0.1        0.27        perf-profile.self.cycles-pp.fdget_raw
      0.31 ±  4%      +0.1        0.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24 ±  4%      +0.1        0.29        perf-profile.self.cycles-pp.do_syscall_64
      0.24 ±  2%      +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.42 ±  2%      +0.1        0.48        perf-profile.self.cycles-pp.fcntl_setlk
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.67 ±  4%      +0.1        0.75 ±  7%  perf-profile.self.cycles-pp.stress_lockofd_contention
      0.40 ±  4%      +0.2        0.56 ±  2%  perf-profile.self.cycles-pp.__pi_memset
      1.57            +1.0        2.56        perf-profile.self.cycles-pp.kmem_cache_free
     60.77            +1.9       62.63        perf-profile.self.cycles-pp.posix_lock_inode
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki