From: kernel test robot <oliver.sang@intel.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Tejun Heo <tj@kernel.org>, "JP Kobryn" <inwardvessel@gmail.com>,
	<cgroups@vger.kernel.org>, <oliver.sang@intel.com>
Subject: [linux-next:master] [cgroup] 36df6e3dbd: will-it-scale.per_process_ops 2.9% improvement
Date: Thu, 31 Jul 2025 15:31:46 +0800
Message-ID: <202507310831.cf3e212e-lkp@intel.com>



Hello,

kernel test robot noticed a 2.9% improvement in will-it-scale.per_process_ops on:


commit: 36df6e3dbd7e7b074e55fec080012184e2fa3a46 ("cgroup: make css_rstat_updated nmi safe")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
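
The effect of this commit is visible directly in the profile data below, where
time spent in css_rstat_updated drops in both the children and self views. For
context, the general technique implied by the subject "make css_rstat_updated
nmi safe" is to replace a lock-protected insertion into a per-CPU updated list
with a lockless push, so the function can safely be called from NMI context.
The user-space sketch below illustrates only that general pattern; the struct
and function names are hypothetical, and this is not the actual kernel patch.

/* sketch.c - illustrative only; names are hypothetical, not the kernel's.
 * Shows the general idea: an NMI-safe "mark updated" operation built from a
 * lockless singly-linked push guarded by an on-list flag, instead of a
 * per-CPU lock. Build: cc -std=c11 sketch.c
 */
#include <stdatomic.h>
#include <stdio.h>

struct node {
	struct node *_Atomic next;	/* link in the "updated" list */
	atomic_bool queued;		/* already on the list? */
	const char *name;
};

/* One list head per CPU in the real thing; one global head here. */
static struct node *_Atomic updated_head;

/* Safe to call from a context that interrupts another caller mid-push:
 * the flag makes the call idempotent, and the CAS loop makes the push
 * lock-free (no lock an interrupted holder could leave wedged). */
static void mark_updated(struct node *n)
{
	/* Claim the node exactly once; later calls no-op until flushed. */
	if (atomic_exchange(&n->queued, true))
		return;

	struct node *head = atomic_load(&updated_head);
	do {
		atomic_store(&n->next, head);
	} while (!atomic_compare_exchange_weak(&updated_head, &head, n));
}

/* Flush side: detach the whole list at once, then walk it. */
static void flush_updated(void)
{
	struct node *n = atomic_exchange(&updated_head, (struct node *)0);
	while (n) {
		struct node *next = atomic_load(&n->next);
		atomic_store(&n->queued, false);
		printf("flushing %s\n", n->name);
		n = next;
	}
}

int main(void)
{
	struct node a = { .name = "css-a" }, b = { .name = "css-b" };
	mark_updated(&a);
	mark_updated(&a);	/* duplicate: absorbed by the queued flag */
	mark_updated(&b);
	flush_updated();	/* prints css-b then css-a (LIFO push order) */
	return 0;
}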


testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:

	nr_task: 100%
	mode: process
	test: tlb_flush2
	cpufreq_governor: performance
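
The tlb_flush2 testcase drives the kernel paths that dominate the profile
below: each process repeatedly faults anonymous memory back in and then
discards it with madvise(MADV_DONTNEED), forcing TLB flushes. The following is
a rough user-space approximation of that loop, reconstructed from the profile
(do_anonymous_page faults plus the __madvise/flush_tlb_mm_range path) rather
than from the will-it-scale source, which may differ in buffer sizes and
details:

/* tlb_flush2-style loop, approximation only; see the will-it-scale
 * repository for the real testcase. Build: cc -O2 tlb_flush2_approx.c
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMSIZE (128UL * 1024 * 1024)

int main(void)
{
	char *mem = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned long ops = 0;

	for (int iter = 0; iter < 100; iter++) {
		/* Touch every page: anonymous page faults, i.e. the
		 * do_anonymous_page -> folio_add_lru -> lruvec lock
		 * path that is hot in the profile. */
		for (unsigned long off = 0; off < MEMSIZE; off += pagesz)
			mem[off] = 1;

		/* Drop the pages again: madvise() zaps the range and
		 * flushes the TLB (flush_tlb_mm_range in the profile). */
		if (madvise(mem, MEMSIZE, MADV_DONTNEED)) {
			perror("madvise");
			return 1;
		}
		ops++;
	}
	printf("%lu iterations\n", ops);
	munmap(mem, MEMSIZE);
	return 0;
}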


Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250731/202507310831.cf3e212e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp3/tlb_flush2/will-it-scale

commit: 
  1257b8786a ("cgroup: support to enable nmi-safe css_rstat_updated")
  36df6e3dbd ("cgroup: make css_rstat_updated nmi safe")

1257b8786ac689a2 36df6e3dbd7e7b074e55fec0800 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.78 ±  2%      +0.3        3.11        mpstat.cpu.all.usr%
    522283 ± 32%     +29.1%     674402 ±  5%  sched_debug.cpu.avg_idle.min
  11822911            +2.9%   12170263        will-it-scale.192.processes
     61577            +2.9%      63386        will-it-scale.per_process_ops
  11822911            +2.9%   12170263        will-it-scale.workload
      2.98 ± 11%     -25.3%       2.23 ± 31%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      4312 ± 11%     +13.1%       4878        perf-sched.total_wait_and_delay.max.ms
      4312 ± 11%     +13.1%       4878        perf-sched.total_wait_time.max.ms
    320.37 ±104%    +191.2%     932.90 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    365.55 ± 83%    +155.2%     932.79 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      2.98 ± 11%     -32.0%       2.03 ± 32%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     18415            +2.9%      18955        proc-vmstat.nr_kernel_stack
 1.791e+09            +3.1%  1.848e+09        proc-vmstat.nr_unaccepted
  3.59e+09            +3.0%  3.697e+09        proc-vmstat.numa_interleave
 3.589e+09            +3.0%  3.697e+09        proc-vmstat.pgalloc_dma32
  7.12e+09            +3.0%  7.334e+09        proc-vmstat.pglazyfree
 3.589e+09            +3.0%  3.696e+09        proc-vmstat.pgskip_device
 2.716e+10            +1.7%  2.761e+10        perf-stat.i.branch-instructions
      0.15            +0.0        0.15        perf-stat.i.branch-miss-rate%
  38117449            +3.6%   39497918        perf-stat.i.branch-misses
      4.20            -1.1%       4.16        perf-stat.i.cpi
      0.24            +1.1%       0.24        perf-stat.i.ipc
    245.58            +3.0%     252.83        perf-stat.i.metric.K/sec
  23582407            +2.9%   24272602        perf-stat.i.minor-faults
  23582407            +2.9%   24272602        perf-stat.i.page-faults
      0.14            +0.0        0.14        perf-stat.overall.branch-miss-rate%
      4.21            -1.1%       4.16        perf-stat.overall.cpi
   3359915            -1.8%    3300246        perf-stat.overall.path-length
 2.706e+10            +1.7%  2.752e+10        perf-stat.ps.branch-instructions
  37940559            +3.7%   39340939        perf-stat.ps.branch-misses
  23496794            +2.9%   24189927        perf-stat.ps.minor-faults
  23496794            +2.9%   24189927        perf-stat.ps.page-faults
 3.972e+13            +1.1%  4.016e+13        perf-stat.total.instructions
     58.13            -1.6       56.50        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault
     58.24            -1.6       56.62        perf-profile.calltrace.cycles-pp.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     59.95            -1.5       58.46        perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     60.58            -1.4       59.16        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     61.08            -1.4       59.69        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     62.15            -1.3       60.87        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     62.26            -1.3       60.99        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     28.72            -1.1       27.64        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
     28.74            -1.1       27.66        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.do_anonymous_page
     28.74            -1.1       27.66        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault
     66.84            -0.9       65.93        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     67.47            -0.8       66.62        perf-profile.calltrace.cycles-pp.testcase
     28.90            -0.6       28.27        perf-profile.calltrace.cycles-pp.folios_put_refs.folio_batch_move_lru.folio_add_lru.do_anonymous_page.__handle_mm_fault
      0.54 ±  4%      +0.1        0.59        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
      0.60 ±  4%      +0.1        0.66        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
      0.68 ±  3%      +0.1        0.74        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.84 ±  4%      +0.1        0.92        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single_batched.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior
      0.94 ±  4%      +0.1        1.03        perf-profile.calltrace.cycles-pp.zap_page_range_single_batched.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior.do_madvise
      1.00 ±  4%      +0.1        1.10        perf-profile.calltrace.cycles-pp.madvise_dontneed_free.madvise_vma_behavior.madvise_do_behavior.do_madvise.__x64_sys_madvise
      0.92 ±  3%      +0.1        1.02        perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.06 ±  4%      +0.1        1.16        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.madvise_do_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
      1.48 ±  4%      +0.2        1.64        perf-profile.calltrace.cycles-pp.madvise_do_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±100%      +0.3        0.55        perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.60 ±  6%      +0.3        0.95        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise
      0.77 ±  5%      +0.4        1.14        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64
      0.08 ±223%      +0.5        0.59        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.do_madvise
     32.13 ±  2%      +0.8       32.98        perf-profile.calltrace.cycles-pp.__madvise
     58.20            -1.6       56.57        perf-profile.children.cycles-pp.folio_batch_move_lru
     58.30            -1.6       56.68        perf-profile.children.cycles-pp.folio_add_lru
     59.98            -1.5       58.50        perf-profile.children.cycles-pp.do_anonymous_page
     60.60            -1.4       59.18        perf-profile.children.cycles-pp.__handle_mm_fault
     61.12            -1.4       59.73        perf-profile.children.cycles-pp.handle_mm_fault
     62.19            -1.3       60.90        perf-profile.children.cycles-pp.do_user_addr_fault
     62.28            -1.3       61.01        perf-profile.children.cycles-pp.exc_page_fault
     67.07            -0.9       66.15        perf-profile.children.cycles-pp.asm_exc_page_fault
      0.14 ±  5%      -0.1        0.09 ±  4%  perf-profile.children.cycles-pp.css_rstat_updated
      0.11 ±  3%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.handle_internal_command
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.main
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.run_builtin
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.__cmd_record
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.cmd_record
      0.11            -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.perf_mmap__push
      0.10 ±  3%      +0.0        0.11        perf-profile.children.cycles-pp.access_error
      0.11 ±  3%      +0.0        0.12        perf-profile.children.cycles-pp.update_process_times
      0.16 ±  5%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.clear_page_erms
      0.19 ±  3%      +0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.30 ±  5%      +0.0        0.34        perf-profile.children.cycles-pp.find_vma_prev
      0.45 ±  4%      +0.0        0.49        perf-profile.children.cycles-pp.get_page_from_freelist
      0.26 ±  9%      +0.0        0.31        perf-profile.children.cycles-pp.lru_gen_add_folio
      0.41 ±  3%      +0.0        0.45        perf-profile.children.cycles-pp.mas_walk
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.get_vma_policy
      0.52 ±  4%      +0.1        0.57        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.56 ±  4%      +0.1        0.61        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      0.63 ±  3%      +0.1        0.69        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.68 ±  3%      +0.1        0.75        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      0.38 ±  9%      +0.1        0.44        perf-profile.children.cycles-pp.lru_add
      0.96 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.unmap_page_range
      0.94 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.zap_page_range_single_batched
      0.94 ±  4%      +0.1        1.04        perf-profile.children.cycles-pp.alloc_anon_folio
      1.01 ±  4%      +0.1        1.11        perf-profile.children.cycles-pp.madvise_dontneed_free
      1.06 ±  4%      +0.1        1.17        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.01 ±223%      +0.1        0.12        perf-profile.children.cycles-pp.mm_needs_global_asid
      0.47 ±  5%      +0.1        0.59        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      1.49 ±  4%      +0.2        1.64        perf-profile.children.cycles-pp.madvise_do_behavior
      0.62 ±  5%      +0.4        0.97        perf-profile.children.cycles-pp.flush_tlb_func
      0.79 ±  6%      +0.4        1.17        perf-profile.children.cycles-pp.flush_tlb_mm_range
     32.30 ±  2%      +0.9       33.16        perf-profile.children.cycles-pp.__madvise
      0.21 ± 10%      -0.1        0.09 ±  5%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.13 ±  6%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.css_rstat_updated
      0.05            +0.0        0.06        perf-profile.self.cycles-pp._raw_spin_trylock
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.zap_page_range_single_batched
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.find_vma_prev
      0.07 ±  6%      +0.0        0.09 ±  4%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.14 ±  4%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.unmap_page_range
      0.12 ±  6%      +0.0        0.14        perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.10 ±  8%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.lru_add
      0.21 ±  3%      +0.0        0.23 ±  2%  perf-profile.self.cycles-pp.handle_mm_fault
      0.18 ±  9%      +0.0        0.21        perf-profile.self.cycles-pp.lru_gen_add_folio
      0.39 ±  3%      +0.0        0.44        perf-profile.self.cycles-pp.mas_walk
      0.00            +0.1        0.12 ±  4%  perf-profile.self.cycles-pp.mm_needs_global_asid
      0.46 ±  5%      +0.1        0.58        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      0.11 ±  7%      +0.2        0.27        perf-profile.self.cycles-pp.flush_tlb_func
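
A note on the derived metrics above: perf-stat.overall.path-length is
instructions per benchmark operation, and the reported values match
perf-stat.total.instructions divided by will-it-scale.workload exactly:
3.972e13 / 11822911 ~= 3359915 before the patch, and 4.016e13 / 12170263 ~=
3300246 after, i.e. about 1.8% fewer instructions per operation.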




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

