From: kernel test robot <oliver.sang@intel.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Harry Yoo <harry.yoo@oracle.com>, "Hao Li" <hao.li@linux.dev>,
	Breno Leitao <leitao@debian.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Zhao Liu <zhao1.liu@intel.com>, <linux-mm@kvack.org>,
	<oliver.sang@intel.com>
Subject: [linus:master] [slab]  e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
Date: Wed, 18 Mar 2026 15:35:29 +0800
Message-ID: <202603181437.2b4fc5d4-lkp@intel.com>



Hello,

kernel test robot noticed a 15.1% improvement in stress-ng.lockofd.ops_per_sec on:


commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	test: lockofd
	cpufreq_governor: performance
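
For readers unfamiliar with the workload: the call traces in this report
(fcntl_setlk -> posix_lock_inode) are consistent with a loop that takes and
releases open-file-description (OFD) byte-range locks. The sketch below is
an illustration only and is not taken from the stress-ng source. It assumes
64-bit Linux (struct flock layout of two shorts, padding, two 64-bit
offsets, pid) and Python 3.9 or later for fcntl.F_OFD_SETLK. The names
FLOCK_FMT, ofd_lock, lock_unlock_cycles and demo are hypothetical.

```python
import fcntl
import os
import struct
import tempfile

# Assumed struct flock layout on 64-bit Linux: l_type, l_whence (shorts),
# l_start, l_len (64-bit), l_pid (int). "0q" pads the size up to 32 bytes.
FLOCK_FMT = "hhqqi0q"


def ofd_lock(fd, lock_type, start, length):
    """Issue one non-blocking OFD lock request on [start, start + length)."""
    # l_pid must be 0 for the F_OFD_* commands.
    arg = struct.pack(FLOCK_FMT, lock_type, os.SEEK_SET, start, length, 0)
    fcntl.fcntl(fd, fcntl.F_OFD_SETLK, arg)


def lock_unlock_cycles(path, n):
    """Write-lock and then unlock n distinct 16-byte regions of one file."""
    fd = os.open(path, os.O_RDWR)
    try:
        done = 0
        for i in range(n):
            ofd_lock(fd, fcntl.F_WRLCK, i * 16, 16)
            ofd_lock(fd, fcntl.F_UNLCK, i * 16, 16)
            done += 1
        return done
    finally:
        os.close(fd)


def demo(n=100):
    """Run the lock/unlock loop against a temporary file."""
    with tempfile.NamedTemporaryFile() as tmp:
        return lock_unlock_cycles(tmp.name, n)


if __name__ == "__main__":
    print(demo())
```

Each request of this kind reaches posix_lock_inode in the kernel, which is
where the profile in this report shows the kmem_cache_alloc_noprof and
kmem_cache_free calls affected by the commit.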



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s

commit: 
  4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
  e47c897a29 ("slab: add sheaves to most caches")

4b038a9670154e8b e47c897a29491ade20b27612fdd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.912e+09 ±  3%     +15.1%  2.201e+09 ±  2%  stress-ng.lockofd.ops
  31892971 ±  3%     +15.1%   36721113 ±  2%  stress-ng.lockofd.ops_per_sec
      0.28 ±  2%      +0.0        0.31 ±  2%  mpstat.cpu.all.irq%
      0.33 ±  3%     +12.5%       0.38 ±  2%  turbostat.IPC
     17.77 ±  2%     -19.2%      14.36 ±  5%  turbostat.RAMWatt
      6374 ± 13%     -91.1%     568.33 ± 47%  perf-c2c.DRAM.local
    209276 ± 12%     -69.0%      64817 ± 40%  perf-c2c.DRAM.remote
    475369 ± 14%     -64.4%     169285 ± 36%  perf-c2c.HITM.local
     24507 ± 12%     -49.3%      12424 ± 40%  perf-c2c.HITM.remote
    499876 ± 13%     -63.6%     181710 ± 36%  perf-c2c.HITM.total
      4.03 ± 11%     -49.8%       2.02 ± 10%  perf-stat.i.MPKI
 7.069e+10 ±  3%     +14.3%  8.083e+10        perf-stat.i.branch-instructions
      0.40            -0.1        0.33        perf-stat.i.branch-miss-rate%
     22.12 ± 13%      -6.0       16.15 ± 10%  perf-stat.i.cache-miss-rate%
 1.229e+09 ±  7%     -43.2%  6.977e+08 ±  9%  perf-stat.i.cache-misses
 5.599e+09 ±  5%     -22.7%  4.329e+09 ±  2%  perf-stat.i.cache-references
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.i.cpi
    748.51 ±  7%     +78.5%       1336 ±  9%  perf-stat.i.cycles-between-cache-misses
 3.062e+11 ±  3%     +12.9%  3.456e+11        perf-stat.i.instructions
      0.33 ±  3%     +11.8%       0.37 ±  2%  perf-stat.i.ipc
      4.03 ± 11%     -49.9%       2.02 ± 10%  perf-stat.overall.MPKI
      0.40            -0.1        0.33        perf-stat.overall.branch-miss-rate%
     22.10 ± 13%      -6.0       16.11 ± 10%  perf-stat.overall.cache-miss-rate%
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.overall.cpi
    748.41 ±  7%     +78.6%       1336 ±  9%  perf-stat.overall.cycles-between-cache-misses
      0.34 ±  3%     +11.8%       0.37 ±  2%  perf-stat.overall.ipc
 6.915e+10 ±  3%     +13.8%  7.871e+10 ±  2%  perf-stat.ps.branch-instructions
 1.201e+09 ±  7%     -43.5%  6.784e+08 ±  9%  perf-stat.ps.cache-misses
 5.475e+09 ±  5%     -23.0%  4.216e+09 ±  2%  perf-stat.ps.cache-references
 2.995e+11 ±  3%     +12.4%  3.366e+11 ±  2%  perf-stat.ps.instructions
 1.829e+13 ±  3%     +10.2%  2.015e+13 ±  4%  perf-stat.total.instructions
      3.09 ±  2%      -1.6        1.52 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      3.25 ±  2%      -1.2        2.06 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     93.17            -0.5       92.64        perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     95.62            -0.3       95.36        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  4%      -0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.04 ±  4%      -0.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
     96.32            -0.1       96.21        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.45            +0.1       97.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66 ±  4%      +0.1        0.76 ±  7%  perf-profile.calltrace.cycles-pp.stress_lockofd_contention
     97.61            +0.1       97.71        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74 ±  2%      +0.3        1.07        perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      3.66 ±  3%      -1.6        2.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      1.66 ±  3%      -1.5        0.15 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      1.66 ±  4%      -1.2        0.42 ±  8%  perf-profile.children.cycles-pp.__slab_free
      4.08            -0.9        3.16 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.97 ±  5%      -0.9        0.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.69 ±  5%      -0.6        0.08 ±  8%  perf-profile.children.cycles-pp.get_partial_node
     93.45            -0.6       92.87        perf-profile.children.cycles-pp.posix_lock_inode
      0.62 ±  4%      -0.5        0.07 ± 12%  perf-profile.children.cycles-pp.__put_partials
     95.72            -0.2       95.48        perf-profile.children.cycles-pp.fcntl_setlk
      0.46 ±  5%      -0.2        0.29        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.16 ±  4%      -0.1        1.01 ±  2%  perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.82            -0.1        0.70 ±  2%  perf-profile.children.cycles-pp.locks_release_private
      0.31            -0.1        0.21 ±  2%  perf-profile.children.cycles-pp.__locks_delete_block
     96.34            -0.1       96.24        perf-profile.children.cycles-pp.do_fcntl
      0.15 ±  3%      -0.0        0.11 ± 14%  perf-profile.children.cycles-pp.__libc_fcntl64
      0.16 ±  3%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.stress_mwc16
      0.12 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.locks_copy_lock
      0.18            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.locks_get_lock_context
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.stress_mwc64
      0.15 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.flock64_to_posix_lock
      0.29            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.11 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__init_waitqueue_head
      0.23 ±  4%      +0.1        0.28        perf-profile.children.cycles-pp.fdget_raw
      0.32 ±  4%      +0.1        0.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.71 ±  3%      +0.1        0.78 ±  7%  perf-profile.children.cycles-pp.stress_lockofd_contention
     97.58            +0.1       97.66        perf-profile.children.cycles-pp.do_syscall_64
      0.42 ±  4%      +0.2        0.58        perf-profile.children.cycles-pp.__pi_memset
      0.00            +0.2        0.22 ±  9%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.2        0.23 ±  7%  perf-profile.children.cycles-pp.__pcs_replace_empty_main
      1.59 ±  3%      -1.2        0.42 ±  8%  perf-profile.self.cycles-pp.__slab_free
      0.95 ±  2%      -0.9        0.06 ±  7%  perf-profile.self.cycles-pp.___slab_alloc
      0.24 ±  2%      -0.2        0.06 ± 13%  perf-profile.self.cycles-pp.get_partial_node
      1.42 ±  3%      -0.2        1.24 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.46 ±  5%      -0.2        0.29        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.79            -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.locks_release_private
      0.30            -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.__locks_delete_block
      0.51 ±  8%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.locks_unlink_lock_ctx
      0.09            -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.__put_partials
      0.13 ±  3%      -0.0        0.09 ± 17%  perf-profile.self.cycles-pp.__libc_fcntl64
      0.15 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.stress_mwc16
      0.12 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.locks_insert_lock_ctx
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.locks_copy_lock
      0.11 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.locks_get_lock_context
      0.09 ±  4%      +0.0        0.10        perf-profile.self.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flock64_to_posix_lock
      0.06 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.stress_mwc64
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_fcntl
      0.08 ±  4%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__init_waitqueue_head
      0.22 ±  4%      +0.1        0.27        perf-profile.self.cycles-pp.fdget_raw
      0.31 ±  4%      +0.1        0.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24 ±  4%      +0.1        0.29        perf-profile.self.cycles-pp.do_syscall_64
      0.24 ±  2%      +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.42 ±  2%      +0.1        0.48        perf-profile.self.cycles-pp.fcntl_setlk
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.67 ±  4%      +0.1        0.75 ±  7%  perf-profile.self.cycles-pp.stress_lockofd_contention
      0.40 ±  4%      +0.2        0.56 ±  2%  perf-profile.self.cycles-pp.__pi_memset
      1.57            +1.0        2.56        perf-profile.self.cycles-pp.kmem_cache_free
     60.77            +1.9       62.63        perf-profile.self.cycles-pp.posix_lock_inode
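
A note on reading the middle column of the table above. For counters and
rates it appears to be a relative change in percent. For rows that are
already percentages (the perf-profile cycles-pp rows, miss rates, irq%) it
appears to be an absolute difference in percentage points. The sketch below
reproduces a few rows under that assumption. The helper names are
hypothetical, and the MPKI formula (cache misses per 1000 instructions) is
the usual definition rather than something confirmed from the lkp-tests
source, so it only approximates the reported value.

```python
def pct_change(base, new):
    """Relative change in percent, as used for counters and rates."""
    return (new - base) / base * 100.0


def point_change(base, new):
    """Absolute difference, as used for rows that are already percentages."""
    return new - base


def mpki(cache_misses, instructions):
    """Cache misses per 1000 instructions (usual definition)."""
    return cache_misses / instructions * 1000.0


if __name__ == "__main__":
    # stress-ng.lockofd.ops_per_sec
    print(round(pct_change(31892971, 36721113), 1))   # 15.1
    # perf-stat.i.cache-misses
    print(round(pct_change(1.229e9, 6.977e8), 1))     # -43.2
    # perf-profile.children.cycles-pp.___slab_alloc
    print(round(point_change(1.66, 0.15), 1))         # -1.5
    # perf-stat.i.MPKI for the parent commit, reported as 4.03
    print(round(mpki(1.229e9, 3.062e11), 2))
```

The derived MPKI will not match the reported figure exactly, presumably
because the report averages per-interval samples rather than dividing the
averaged totals.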



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


