From: kernel test robot <oliver.sang@intel.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Tejun Heo <tj@kernel.org>, "JP Kobryn" <inwardvessel@gmail.com>,
	<cgroups@vger.kernel.org>, <oliver.sang@intel.com>
Subject: [linux-next:master] [cgroup]  36df6e3dbd: will-it-scale.per_process_ops 2.9% improvement
Date: Thu, 31 Jul 2025 15:31:46 +0800
Message-ID: <202507310831.cf3e212e-lkp@intel.com>



Hello,

kernel test robot noticed a 2.9% improvement of will-it-scale.per_process_ops on:


commit: 36df6e3dbd7e7b074e55fec080012184e2fa3a46 ("cgroup: make css_rstat_updated nmi safe")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V  CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:

	nr_task: 100%
	mode: process
	test: tlb_flush2
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250731/202507310831.cf3e212e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp3/tlb_flush2/will-it-scale

commit: 
  1257b8786a ("cgroup: support to enable nmi-safe css_rstat_updated")
  36df6e3dbd ("cgroup: make css_rstat_updated nmi safe")

1257b8786ac689a2 36df6e3dbd7e7b074e55fec0800 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.78 ±  2%      +0.3        3.11        mpstat.cpu.all.usr%
    522283 ± 32%     +29.1%     674402 ±  5%  sched_debug.cpu.avg_idle.min
  11822911            +2.9%   12170263        will-it-scale.192.processes
     61577            +2.9%      63386        will-it-scale.per_process_ops
  11822911            +2.9%   12170263        will-it-scale.workload
      2.98 ± 11%     -25.3%       2.23 ± 31%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      4312 ± 11%     +13.1%       4878        perf-sched.total_wait_and_delay.max.ms
      4312 ± 11%     +13.1%       4878        perf-sched.total_wait_time.max.ms
    320.37 ±104%    +191.2%     932.90 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    365.55 ± 83%    +155.2%     932.79 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      2.98 ± 11%     -32.0%       2.03 ± 32%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     18415            +2.9%      18955        proc-vmstat.nr_kernel_stack
 1.791e+09            +3.1%  1.848e+09        proc-vmstat.nr_unaccepted
  3.59e+09            +3.0%  3.697e+09        proc-vmstat.numa_interleave
 3.589e+09            +3.0%  3.697e+09        proc-vmstat.pgalloc_dma32
  7.12e+09            +3.0%  7.334e+09        proc-vmstat.pglazyfree
 3.589e+09            +3.0%  3.696e+09        proc-vmstat.pgskip_device
 2.716e+10            +1.7%  2.761e+10        perf-stat.i.branch-instructions
      0.15            +0.0        0.15        perf-stat.i.branch-miss-rate%
  38117449            +3.6%   39497918        perf-stat.i.branch-misses
      4.20            -1.1%       4.16        perf-stat.i.cpi
      0.24            +1.1%       0.24        perf-stat.i.ipc
    245.58            +3.0%     252.83        perf-stat.i.metric.K/sec
  23582407            +2.9%   24272602        perf-stat.i.minor-faults
  23582407            +2.9%   24272602        perf-stat.i.page-faults
      0.14            +0.0        0.14        perf-stat.overall.branch-miss-rate%
      4.21            -1.1%       4.16        perf-stat.overall.cpi
   3359915            -1.8%    3300246        perf-stat.overall.path-length
 2.706e+10            +1.7%  2.752e+10        perf-stat.ps.branch-instructions
  37940559            +3.7%   39340939        perf-stat.ps.branch-misses
  23496794            +2.9%   24189927        perf-stat.ps.minor-faults
  23496794            +2.9%   24189927        perf-stat.ps.page-faults
 3.972e+13            +1.1%  4.016e+13        perf-stat.total.instructions
     58.13            -1.6       56.50        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault
     58.24            -1.6       56.62        perf-profile.calltrace.cycles-pp.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     59.95            -1.5       58.46        perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     60.58            -1.4       59.16        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     61.08            -1.4       59.69        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     62.15            -1.3       60.87        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     62.26            -1.3       60.99        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     28.72            -1.1       27.64        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
     28.74            -1.1       27.66        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.do_anonymous_page
     28.74            -1.1       27.66        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault
     66.84            -0.9       65.93        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     67.47            -0.8       66.62        perf-profile.calltrace.cycles-pp.testcase
     28.90            -0.6       28.27        perf-profile.calltrace.cycles-pp.folios_put_refs.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault
      0.54 ±  4%      +0.1        0.59        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
      0.60 ±  4%      +0.1        0.66        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
      0.68 ±  3%      +0.1        0.74        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.84 ±  4%      +0.1        0.92        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single_batched.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior
      0.94 ±  4%      +0.1        1.03        perf-profile.calltrace.cycles-pp.zap_page_range_single_batched.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior.do_madvise
      1.00 ±  4%      +0.1        1.10        perf-profile.calltrace.cycles-pp.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior.do_madvise.__x64_sys_madvise
      0.92 ±  3%      +0.1        1.02        perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.06 ±  4%      +0.1        1.16        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.madvise_do_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
      1.48 ±  4%      +0.2        1.64        perf-profile.calltrace.cycles-pp.madvise_do_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±100%      +0.3        0.55        perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.60 ±  6%      +0.3        0.95        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise
      0.77 ±  5%      +0.4        1.14        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64
      0.08 ±223%      +0.5        0.59        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.do_madvise
     32.13 ±  2%      +0.8       32.98        perf-profile.calltrace.cycles-pp.__madvise
     58.20            -1.6       56.57        perf-profile.children.cycles-pp.folio_batch_move_lru
     58.30            -1.6       56.68        perf-profile.children.cycles-pp.folio_add_lru
     59.98            -1.5       58.50        perf-profile.children.cycles-pp.do_anonymous_page
     60.60            -1.4       59.18        perf-profile.children.cycles-pp.__handle_mm_fault
     61.12            -1.4       59.73        perf-profile.children.cycles-pp.handle_mm_fault
     62.19            -1.3       60.90        perf-profile.children.cycles-pp.do_user_addr_fault
     62.28            -1.3       61.01        perf-profile.children.cycles-pp.exc_page_fault
     67.07            -0.9       66.15        perf-profile.children.cycles-pp.asm_exc_page_fault
      0.14 ±  5%      -0.1        0.09 ±  4%  perf-profile.children.cycles-pp.css_rstat_updated
      0.11 ±  3%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.handle_internal_command
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.main
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.run_builtin
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.__cmd_record
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.cmd_record
      0.11            -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.perf_mmap__push
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.access_error
      0.11 ±  3%      +0.0        0.12        perf-profile.children.cycles-pp.update_process_times
      0.16 ±  5%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.clear_page_erms
      0.19 ±  3%      +0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.30 ±  5%      +0.0        0.34        perf-profile.children.cycles-pp.find_vma_prev
      0.45 ±  4%      +0.0        0.49        perf-profile.children.cycles-pp.get_page_from_freelist
      0.26 ±  9%      +0.0        0.31        perf-profile.children.cycles-pp.lru_gen_add_folio
      0.41 ±  3%      +0.0        0.45        perf-profile.children.cycles-pp.mas_walk
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.get_vma_policy
      0.52 ±  4%      +0.1        0.57        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.56 ±  4%      +0.1        0.61        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      0.63 ±  3%      +0.1        0.69        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.68 ±  3%      +0.1        0.75        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      0.38 ±  9%      +0.1        0.44        perf-profile.children.cycles-pp.lru_add
      0.96 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.unmap_page_range
      0.94 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.zap_page_range_single_batched
      0.94 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.alloc_anon_folio
      1.01 ±  4%      +0.1        1.11        perf-profile.children.cycles-pp.madvise_dontneed_free
      1.06 ±  4%      +0.1        1.17        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.01 ±223%      +0.1        0.12        perf-profile.children.cycles-pp.mm_needs_global_asid
      0.47 ±  5%      +0.1        0.59        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      1.49 ±  4%      +0.2        1.64        perf-profile.children.cycles-pp.madvise_do_behavior
      0.62 ±  5%      +0.4        0.97        perf-profile.children.cycles-pp.flush_tlb_func
      0.79 ±  6%      +0.4        1.17        perf-profile.children.cycles-pp.flush_tlb_mm_range
     32.30 ±  2%      +0.9       33.16        perf-profile.children.cycles-pp.__madvise
      0.21 ± 10%      -0.1        0.09 ±  5%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.13 ±  6%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.css_rstat_updated
      0.05            +0.0        0.06        perf-profile.self.cycles-pp._raw_spin_trylock
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.zap_page_range_single_batched
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.find_vma_prev
      0.07 ±  6%      +0.0        0.09 ±  4%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.14 ±  4%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.unmap_page_range
      0.12 ±  6%      +0.0        0.14        perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.10 ±  8%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.lru_add
      0.21 ±  3%      +0.0        0.23 ±  2%  perf-profile.self.cycles-pp.handle_mm_fault
      0.18 ±  9%      +0.0        0.21        perf-profile.self.cycles-pp.lru_gen_add_folio
      0.39 ±  3%      +0.0        0.44        perf-profile.self.cycles-pp.mas_walk
      0.00            +0.1        0.12 ±  4%  perf-profile.self.cycles-pp.mm_needs_global_asid
      0.46 ±  5%      +0.1        0.58        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      0.11 ±  7%      +0.2        0.27        perf-profile.self.cycles-pp.flush_tlb_func
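
For readers checking the %change column: the rows above appear to use two conventions. Count-like metrics (ops, faults, branch-misses) show a relative change in percent, while metrics that are already percentages (mpstat usr%, perf-profile cycles-pp) show an absolute difference in percentage points. The following is a minimal sketch of that arithmetic, using values copied from the table. The helper names pct_change and point_change are hypothetical and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as shown for count-like metrics."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, as shown for metrics that are already percentages."""
    return new - base


# (base commit 1257b8786a, tested commit 36df6e3dbd), copied from the table above
relative_rows = {
    "will-it-scale.per_process_ops": (61577, 63386),
    "perf-stat.i.minor-faults": (23582407, 24272602),
    "perf-stat.i.branch-misses": (38117449, 39497918),
}
point_rows = {
    "mpstat.cpu.all.usr%": (2.78, 3.11),
    "perf-profile.children.cycles-pp.folio_batch_move_lru": (58.20, 56.57),
}

for name, (base, new) in relative_rows.items():
    print(f"{name}: {pct_change(base, new):+.1f}%")

for name, (base, new) in point_rows.items():
    print(f"{name}: {point_change(base, new):+.1f}")
```

The table prints rounded values, so recomputing from them can differ in the last digit. The perf-stat.i.cpi row (4.20 to 4.16), for example, recomputes to about -1.0% against the reported -1.1%.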




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

