public inbox for rcu@vger.kernel.org
From: kernel test robot <oliver.sang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, <rcu@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: [paulmckrcu:dev.2025.12.16a] [rcu]  1ac50ec628: stress-ng.memfd.ops_per_sec 3.4% improvement
Date: Wed, 7 Jan 2026 14:39:06 +0800
Message-ID: <202601071316.992a1d32-lkp@intel.com>


Hi Paul,

Similar to the b41f5a411f report we sent out recently, this report is based on
stable data.
Please let us know if this kind of report is less useful to you. Thanks.


Hello,

kernel test robot noticed a 3.4% improvement of stress-ng.memfd.ops_per_sec on:


commit: 1ac50ec62874025381a864f784583dbdc30dcc7c ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
https://github.com/paulmckrcu/linux dev.2025.12.16a
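
For context, the commit changes the implementation behind the RCU Tasks Trace
API, not the reader-facing calls themselves. A sketch of the long-standing
in-kernel reader/updater pattern (assuming, as the commit subject suggests, that
only the machinery underneath moves to SRCU-fast):

```c
#include <linux/rcupdate_trace.h>

/* Reader: may sleep and take page faults; used e.g. by BPF trampolines. */
rcu_read_lock_trace();
/* ... access trace-protected state ... */
rcu_read_unlock_trace();

/* Updater: waits for all pre-existing Tasks Trace readers to finish. */
synchronize_rcu_tasks_trace();
```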

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: memfd
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/202601071316.992a1d32-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/memfd/stress-ng/60s

commit: 
  43c23963b3 ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
  1ac50ec628 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")

43c23963b3c549da 1ac50ec62874025381a864f7845 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    203232 ± 22%     -21.8%     158902 ± 18%  numa-meminfo.node1.Mapped
     50977 ± 22%     -21.8%      39853 ± 18%  numa-vmstat.node1.nr_mapped
      7039            +1.6%       7152        vmstat.system.cs
    107613            -3.9%     103453        stress-ng.memfd.nanosecs_per_memfd_create_call
    193537            +3.4%     200175        stress-ng.memfd.ops
      3226            +3.4%       3337        stress-ng.memfd.ops_per_sec
    187908            +1.6%     190921        stress-ng.time.involuntary_context_switches
  99134672            +3.4%  1.025e+08        stress-ng.time.minor_page_faults
     61965 ±  3%      -6.2%      58116        proc-vmstat.nr_mapped
 1.526e+08            +3.4%  1.578e+08        proc-vmstat.numa_hit
 1.524e+08            +3.4%  1.576e+08        proc-vmstat.numa_local
 1.631e+08            +3.4%  1.687e+08        proc-vmstat.pgalloc_normal
  99574028            +3.4%   1.03e+08        proc-vmstat.pgfault
 1.624e+08            +3.4%  1.679e+08        proc-vmstat.pgfree
      2.26            +1.1%       2.28        perf-stat.i.MPKI
 1.646e+10            +1.8%  1.675e+10        perf-stat.i.branch-instructions
      0.24            +0.0        0.25        perf-stat.i.branch-miss-rate%
  38660390            +6.8%   41301381        perf-stat.i.branch-misses
 1.714e+08            +3.1%  1.768e+08        perf-stat.i.cache-misses
 2.884e+08            +3.3%  2.979e+08        perf-stat.i.cache-references
      6758            +1.3%       6846        perf-stat.i.context-switches
      7.88            -2.0%       7.73        perf-stat.i.cpi
      3504            -3.1%       3397        perf-stat.i.cycles-between-cache-misses
 7.628e+10            +2.0%  7.783e+10        perf-stat.i.instructions
      0.13            +2.0%       0.13        perf-stat.i.ipc
     17.01            +3.5%      17.59        perf-stat.i.metric.K/sec
   1632582            +3.5%    1689088        perf-stat.i.minor-faults
   1632582            +3.5%    1689088        perf-stat.i.page-faults
      2.25            +1.1%       2.27        perf-stat.overall.MPKI
      0.23            +0.0        0.25        perf-stat.overall.branch-miss-rate%
      7.91            -2.0%       7.76        perf-stat.overall.cpi
      3519            -3.1%       3411        perf-stat.overall.cycles-between-cache-misses
      0.13            +2.0%       0.13        perf-stat.overall.ipc
 1.619e+10            +1.8%  1.648e+10        perf-stat.ps.branch-instructions
  37927942            +6.8%   40525278        perf-stat.ps.branch-misses
 1.687e+08            +3.1%   1.74e+08        perf-stat.ps.cache-misses
 2.841e+08            +3.3%  2.933e+08        perf-stat.ps.cache-references
      6638            +1.4%       6729        perf-stat.ps.context-switches
 7.503e+10            +2.0%  7.654e+10        perf-stat.ps.instructions
   1606108            +3.5%    1661536        perf-stat.ps.minor-faults
   1606108            +3.5%    1661536        perf-stat.ps.page-faults
 4.564e+12            +1.8%  4.647e+12        perf-stat.total.instructions
     46.05            -0.3       45.79        perf-profile.calltrace.cycles-pp._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup
     45.93            -0.3       45.68        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode
     46.36            -0.3       46.10        perf-profile.calltrace.cycles-pp.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.12            -0.3       45.87        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create
     46.26            -0.3       46.01        perf-profile.calltrace.cycles-pp.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64
     46.57            -0.2       46.32        perf-profile.calltrace.cycles-pp.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     46.62            -0.2       46.38        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.memfd_create
     46.61            -0.2       46.37        perf-profile.calltrace.cycles-pp.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     46.64            -0.2       46.40        perf-profile.calltrace.cycles-pp.memfd_create
     46.62            -0.2       46.37        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     45.57            -0.2       45.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.finish_dput.__fput
     45.40            -0.2       45.22        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.finish_dput
     46.69            -0.2       46.53        perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.61            -0.2       46.45        perf-profile.calltrace.cycles-pp.finish_dput.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.close_range
     46.71            -0.2       46.55        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.71            -0.2       46.55        perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close_range
     46.60            -0.2       46.45        perf-profile.calltrace.cycles-pp.__dentry_kill.finish_dput.__fput.task_work_run.exit_to_user_mode_loop
     46.40            -0.2       46.24        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.finish_dput.__fput.task_work_run
      0.59            +0.0        0.60        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.56            +0.0        0.57        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.62            +0.0        0.64        perf-profile.calltrace.cycles-pp.__munmap
      0.59            +0.0        0.60        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.57            +0.0        0.59        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.59            +0.0        0.61        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      0.59            +0.0        0.61        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.98            +0.0        1.01        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.94            +0.0        0.97        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.94            +0.0        0.97        perf-profile.calltrace.cycles-pp.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.01            +0.0        1.04        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      0.83            +0.0        0.86        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_shared_fault.do_fault
      0.84            +0.0        0.87        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_shared_fault.do_fault.__handle_mm_fault
      0.84            +0.0        0.88        perf-profile.calltrace.cycles-pp.__do_fault.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.08            +0.0        1.12        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      1.09            +0.0        1.14        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      1.25            +0.1        1.30        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_memfd_child
      1.26            +0.1        1.32        perf-profile.calltrace.cycles-pp.stress_memfd_child
      0.81 ±  2%      +0.1        0.91 ±  2%  perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.80 ±  3%      +0.1        0.90 ±  2%  perf-profile.calltrace.cycles-pp.__mmap_region.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      0.99 ±  2%      +0.1        1.11        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.00 ±  2%      +0.1        1.13        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.03 ±  2%      +0.1        1.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
      1.02 ±  2%      +0.1        1.15        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.03 ±  2%      +0.1        1.16        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.21            +0.1        1.35        perf-profile.calltrace.cycles-pp.__mmap
     92.36            -0.4       91.92        perf-profile.children.cycles-pp._raw_spin_lock
     92.52            -0.3       92.17        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     46.12            -0.3       45.87        perf-profile.children.cycles-pp.inode_sb_list_add
     46.26            -0.3       46.01        perf-profile.children.cycles-pp.new_inode
     46.36            -0.3       46.10        perf-profile.children.cycles-pp.__shmem_get_inode
     46.57            -0.2       46.32        perf-profile.children.cycles-pp.__shmem_file_setup
     46.61            -0.2       46.37        perf-profile.children.cycles-pp.__x64_sys_memfd_create
     46.65            -0.2       46.41        perf-profile.children.cycles-pp.memfd_create
     47.18            -0.2       47.01        perf-profile.children.cycles-pp.__fput
     47.10            -0.2       46.94        perf-profile.children.cycles-pp.finish_dput
     46.73            -0.2       46.57        perf-profile.children.cycles-pp.close_range
     47.09            -0.2       46.93        perf-profile.children.cycles-pp.__dentry_kill
     46.88            -0.2       46.72        perf-profile.children.cycles-pp.evict
     46.71            -0.2       46.55        perf-profile.children.cycles-pp.exit_to_user_mode_loop
     46.71            -0.2       46.55        perf-profile.children.cycles-pp.task_work_run
     97.60            -0.1       97.51        perf-profile.children.cycles-pp.do_syscall_64
     97.61            -0.1       97.53        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.14            +0.0        0.15        perf-profile.children.cycles-pp.xas_create
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.mas_rev_awalk
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.29            +0.0        0.30        perf-profile.children.cycles-pp.kmem_cache_free
      0.09 ±  5%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.xas_expand
      0.17 ±  2%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru_noprof
      0.22            +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.xas_store
      0.13 ±  3%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.flush_tlb_func
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.kthread
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.ret_from_fork
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.ret_from_fork_asm
      0.23            +0.0        0.25        perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.12 ±  4%      +0.0        0.14        perf-profile.children.cycles-pp.shmem_alloc_folio
      0.46            +0.0        0.48        perf-profile.children.cycles-pp.run_ksoftirqd
      0.14 ±  3%      +0.0        0.16        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.13 ±  3%      +0.0        0.15        perf-profile.children.cycles-pp.vm_unmapped_area
      0.15 ±  3%      +0.0        0.17        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.08 ±  5%      +0.0        0.10        perf-profile.children.cycles-pp.mas_empty_area_rev
      0.59            +0.0        0.60        perf-profile.children.cycles-pp.__vm_munmap
      0.53            +0.0        0.55        perf-profile.children.cycles-pp.handle_softirqs
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.rcu_do_batch
      0.56            +0.0        0.57        perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.59            +0.0        0.61        perf-profile.children.cycles-pp.__x64_sys_munmap
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.rcu_core
      0.31            +0.0        0.33        perf-profile.children.cycles-pp.unmap_page_range
      0.13 ±  2%      +0.0        0.15        perf-profile.children.cycles-pp.unmapped_area_topdown
      0.15 ±  2%      +0.0        0.17        perf-profile.children.cycles-pp.__get_unmapped_area
      0.28            +0.0        0.30        perf-profile.children.cycles-pp.zap_pte_range
      0.63            +0.0        0.65        perf-profile.children.cycles-pp.__munmap
      0.57            +0.0        0.59        perf-profile.children.cycles-pp.do_vmi_munmap
      0.15 ±  2%      +0.0        0.17        perf-profile.children.cycles-pp.shmem_get_unmapped_area
      0.21            +0.0        0.23        perf-profile.children.cycles-pp.zap_page_range_single
      0.36            +0.0        0.38        perf-profile.children.cycles-pp.__mmap_new_vma
      0.29            +0.0        0.31        perf-profile.children.cycles-pp.zap_pmd_range
      0.19            +0.0        0.22 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single_batched
      0.23            +0.0        0.26        perf-profile.children.cycles-pp.unmap_mapping_range
      0.05            +0.0        0.08        perf-profile.children.cycles-pp.perf_iterate_sb
      0.51            +0.0        0.54        perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.98            +0.0        1.02        perf-profile.children.cycles-pp.__handle_mm_fault
      0.94            +0.0        0.97        perf-profile.children.cycles-pp.do_fault
      0.94            +0.0        0.97        perf-profile.children.cycles-pp.do_shared_fault
      1.01            +0.0        1.04        perf-profile.children.cycles-pp.handle_mm_fault
      0.84            +0.0        0.87        perf-profile.children.cycles-pp.shmem_fault
      1.09            +0.0        1.12        perf-profile.children.cycles-pp.do_user_addr_fault
      0.84            +0.0        0.88        perf-profile.children.cycles-pp.__do_fault
      1.09            +0.0        1.14        perf-profile.children.cycles-pp.exc_page_fault
      0.14 ±  3%      +0.0        0.18        perf-profile.children.cycles-pp.perf_event_mmap
      0.13 ±  3%      +0.0        0.17        perf-profile.children.cycles-pp.perf_event_mmap_event
      1.02            +0.0        1.06        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.fault_dirty_shared_page
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.perf_event_mmap_output
      1.40            +0.1        1.46        perf-profile.children.cycles-pp.asm_exc_page_fault
      1.53            +0.1        1.60        perf-profile.children.cycles-pp.stress_memfd_child
      0.81 ±  2%      +0.1        0.91 ±  2%  perf-profile.children.cycles-pp.mmap_region
      0.80 ±  2%      +0.1        0.90 ±  2%  perf-profile.children.cycles-pp.__mmap_region
      0.99 ±  2%      +0.1        1.11        perf-profile.children.cycles-pp.do_mmap
      1.00 ±  2%      +0.1        1.13        perf-profile.children.cycles-pp.vm_mmap_pgoff
      1.02 ±  2%      +0.1        1.15        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      1.22            +0.1        1.36        perf-profile.children.cycles-pp.__mmap
     92.16            -0.4       91.80        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.mas_rev_awalk
      0.12            +0.0        0.13        perf-profile.self.cycles-pp.native_flush_tlb_one_user




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

