public inbox for linux-mm@kvack.org
From: kernel test robot <oliver.sang@intel.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Harry Yoo <harry.yoo@oracle.com>, "Hao Li" <hao.li@linux.dev>,
	Breno Leitao <leitao@debian.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Zhao Liu <zhao1.liu@intel.com>, <linux-mm@kvack.org>,
	<oliver.sang@intel.com>
Subject: [linus:master] [slab]  e47c897a29: stress-ng.lockofd.ops_per_sec 15.1% improvement
Date: Wed, 18 Mar 2026 15:35:29 +0800
Message-ID: <202603181437.2b4fc5d4-lkp@intel.com>



Hello,

kernel test robot noticed a 15.1% improvement of stress-ng.lockofd.ops_per_sec on:


commit: e47c897a29491ade20b27612fdd3107c39a07357 ("slab: add sheaves to most caches")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:

	nr_threads: 100%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	test: lockofd
	cpufreq_governor: performance



Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260318/202603181437.2b4fc5d4-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/1SSD/xfs/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/lockofd/stress-ng/60s

commit: 
  4b038a9670 ("slub: keep empty main sheaf as spare in __pcs_replace_empty_main()")
  e47c897a29 ("slab: add sheaves to most caches")

4b038a9670154e8b e47c897a29491ade20b27612fdd 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.912e+09 ±  3%     +15.1%  2.201e+09 ±  2%  stress-ng.lockofd.ops
  31892971 ±  3%     +15.1%   36721113 ±  2%  stress-ng.lockofd.ops_per_sec
      0.28 ±  2%      +0.0        0.31 ±  2%  mpstat.cpu.all.irq%
      0.33 ±  3%     +12.5%       0.38 ±  2%  turbostat.IPC
     17.77 ±  2%     -19.2%      14.36 ±  5%  turbostat.RAMWatt
      6374 ± 13%     -91.1%     568.33 ± 47%  perf-c2c.DRAM.local
    209276 ± 12%     -69.0%      64817 ± 40%  perf-c2c.DRAM.remote
    475369 ± 14%     -64.4%     169285 ± 36%  perf-c2c.HITM.local
     24507 ± 12%     -49.3%      12424 ± 40%  perf-c2c.HITM.remote
    499876 ± 13%     -63.6%     181710 ± 36%  perf-c2c.HITM.total
      4.03 ± 11%     -49.8%       2.02 ± 10%  perf-stat.i.MPKI
 7.069e+10 ±  3%     +14.3%  8.083e+10        perf-stat.i.branch-instructions
      0.40            -0.1        0.33        perf-stat.i.branch-miss-rate%
     22.12 ± 13%      -6.0       16.15 ± 10%  perf-stat.i.cache-miss-rate%
 1.229e+09 ±  7%     -43.2%  6.977e+08 ±  9%  perf-stat.i.cache-misses
 5.599e+09 ±  5%     -22.7%  4.329e+09 ±  2%  perf-stat.i.cache-references
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.i.cpi
    748.51 ±  7%     +78.5%       1336 ±  9%  perf-stat.i.cycles-between-cache-misses
 3.062e+11 ±  3%     +12.9%  3.456e+11        perf-stat.i.instructions
      0.33 ±  3%     +11.8%       0.37 ±  2%  perf-stat.i.ipc
      4.03 ± 11%     -49.9%       2.02 ± 10%  perf-stat.overall.MPKI
      0.40            -0.1        0.33        perf-stat.overall.branch-miss-rate%
     22.10 ± 13%      -6.0       16.11 ± 10%  perf-stat.overall.cache-miss-rate%
      2.99 ±  3%     -10.6%       2.67 ±  2%  perf-stat.overall.cpi
    748.41 ±  7%     +78.6%       1336 ±  9%  perf-stat.overall.cycles-between-cache-misses
      0.34 ±  3%     +11.8%       0.37 ±  2%  perf-stat.overall.ipc
 6.915e+10 ±  3%     +13.8%  7.871e+10 ±  2%  perf-stat.ps.branch-instructions
 1.201e+09 ±  7%     -43.5%  6.784e+08 ±  9%  perf-stat.ps.cache-misses
 5.475e+09 ±  5%     -23.0%  4.216e+09 ±  2%  perf-stat.ps.cache-references
 2.995e+11 ±  3%     +12.4%  3.366e+11 ±  2%  perf-stat.ps.instructions
 1.829e+13 ±  3%     +10.2%  2.015e+13 ±  4%  perf-stat.total.instructions
      3.09 ±  2%      -1.6        1.52 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      3.25 ±  2%      -1.2        2.06 ±  3%  perf-profile.calltrace.cycles-pp.kmem_cache_free.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     93.17            -0.5       92.64        perf-profile.calltrace.cycles-pp.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     95.62            -0.3       95.36        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  4%      -0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      0.72            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.locks_release_private.posix_lock_inode.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.04 ±  4%      -0.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_insert_lock_ctx.posix_lock_inode.fcntl_setlk.do_fcntl
     96.32            -0.1       96.21        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.45            +0.1       97.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66 ±  4%      +0.1        0.76 ±  7%  perf-profile.calltrace.cycles-pp.stress_lockofd_contention
     97.61            +0.1       97.71        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74 ±  2%      +0.3        1.07        perf-profile.calltrace.cycles-pp.kmem_cache_free.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      3.66 ±  3%      -1.6        2.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      1.66 ±  3%      -1.5        0.15 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      1.66 ±  4%      -1.2        0.42 ±  8%  perf-profile.children.cycles-pp.__slab_free
      4.08            -0.9        3.16 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.97 ±  5%      -0.9        0.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.69 ±  5%      -0.6        0.08 ±  8%  perf-profile.children.cycles-pp.get_partial_node
     93.45            -0.6       92.87        perf-profile.children.cycles-pp.posix_lock_inode
      0.62 ±  4%      -0.5        0.07 ± 12%  perf-profile.children.cycles-pp.__put_partials
     95.72            -0.2       95.48        perf-profile.children.cycles-pp.fcntl_setlk
      0.46 ±  5%      -0.2        0.29        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.16 ±  4%      -0.1        1.01 ±  2%  perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.82            -0.1        0.70 ±  2%  perf-profile.children.cycles-pp.locks_release_private
      0.31            -0.1        0.21 ±  2%  perf-profile.children.cycles-pp.__locks_delete_block
     96.34            -0.1       96.24        perf-profile.children.cycles-pp.do_fcntl
      0.15 ±  3%      -0.0        0.11 ± 14%  perf-profile.children.cycles-pp.__libc_fcntl64
      0.16 ±  3%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.stress_mwc16
      0.12 ±  3%      -0.0        0.10 ±  5%  perf-profile.children.cycles-pp.locks_copy_lock
      0.18            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.locks_get_lock_context
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.stress_mwc64
      0.15 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.flock64_to_posix_lock
      0.29            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      0.11 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__init_waitqueue_head
      0.23 ±  4%      +0.1        0.28        perf-profile.children.cycles-pp.fdget_raw
      0.32 ±  4%      +0.1        0.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.71 ±  3%      +0.1        0.78 ±  7%  perf-profile.children.cycles-pp.stress_lockofd_contention
     97.58            +0.1       97.66        perf-profile.children.cycles-pp.do_syscall_64
      0.42 ±  4%      +0.2        0.58        perf-profile.children.cycles-pp.__pi_memset
      0.00            +0.2        0.22 ±  9%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.2        0.23 ±  7%  perf-profile.children.cycles-pp.__pcs_replace_empty_main
      1.59 ±  3%      -1.2        0.42 ±  8%  perf-profile.self.cycles-pp.__slab_free
      0.95 ±  2%      -0.9        0.06 ±  7%  perf-profile.self.cycles-pp.___slab_alloc
      0.24 ±  2%      -0.2        0.06 ± 13%  perf-profile.self.cycles-pp.get_partial_node
      1.42 ±  3%      -0.2        1.24 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
      0.46 ±  5%      -0.2        0.29        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.79            -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.locks_release_private
      0.30            -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.__locks_delete_block
      0.51 ±  8%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.locks_unlink_lock_ctx
      0.09            -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.__put_partials
      0.13 ±  3%      -0.0        0.09 ± 17%  perf-profile.self.cycles-pp.__libc_fcntl64
      0.15 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.stress_mwc16
      0.12 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.locks_insert_lock_ctx
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.locks_copy_lock
      0.11 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.locks_get_lock_context
      0.09 ±  4%      +0.0        0.10        perf-profile.self.cycles-pp.x64_sys_call
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.13 ±  5%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flock64_to_posix_lock
      0.06 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.stress_mwc64
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_fcntl
      0.08 ±  4%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__init_waitqueue_head
      0.22 ±  4%      +0.1        0.27        perf-profile.self.cycles-pp.fdget_raw
      0.31 ±  4%      +0.1        0.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24 ±  4%      +0.1        0.29        perf-profile.self.cycles-pp.do_syscall_64
      0.24 ±  2%      +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.42 ±  2%      +0.1        0.48        perf-profile.self.cycles-pp.fcntl_setlk
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.67 ±  4%      +0.1        0.75 ±  7%  perf-profile.self.cycles-pp.stress_lockofd_contention
      0.40 ±  4%      +0.2        0.56 ±  2%  perf-profile.self.cycles-pp.__pi_memset
      1.57            +1.0        2.56        perf-profile.self.cycles-pp.kmem_cache_free
     60.77            +1.9       62.63        perf-profile.self.cycles-pp.posix_lock_inode
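
A note on reading the table above: rows whose change carries a "%" sign show a
relative change against the parent commit, while rows without one (the
perf-profile and *-rate% rows) show an absolute difference in percentage
points. The sketch below rechecks a few rows under that reading. The helper
names are illustrative only and are not part of lkp-tests, and MPKI is assumed
to mean cache misses per thousand instructions. Because each cell is a mean
over several runs, derived ratios agree with the table only approximately.

```python
def relative_change(base: float, new: float) -> float:
    """Percent change of `new` relative to `base` (rows marked with '%')."""
    return (new - base) / base * 100.0


def point_change(base_pct: float, new_pct: float) -> float:
    """Absolute difference in percentage points (rows without '%')."""
    return new_pct - base_pct


def mpki(cache_misses: float, instructions: float) -> float:
    """Cache misses per thousand instructions (assumed definition)."""
    return cache_misses / instructions * 1000.0


# stress-ng.lockofd.ops_per_sec: 31892971 -> 36721113
ops_change = relative_change(31892971, 36721113)

# perf-profile.calltrace ... kmem_cache_alloc_noprof.posix_lock_inode: 3.09 -> 1.52
alloc_delta = point_change(3.09, 1.52)

# perf-stat.i.cache-misses / perf-stat.i.instructions, before and after
mpki_before = mpki(1.229e9, 3.062e11)
mpki_after = mpki(6.977e8, 3.456e11)

print(f"ops_per_sec change:       {ops_change:+.1f}%")
print(f"kmem_cache_alloc_noprof:  {alloc_delta:+.1f} points")
print(f"MPKI before/after:        {mpki_before:.2f} / {mpki_after:.2f}")
```

The recomputed MPKI for the parent commit differs slightly from the 4.03
reported in the table, which is expected when dividing two averaged columns
rather than averaging per-run ratios.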



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


