[linus:master] [llist] 375700bab5: will-it-scale.per_thread_ops 2.6% regression

From: kernel test robot <oliver.sang@intel.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	<oliver.sang@intel.com>
Subject: [linus:master] [llist]  375700bab5:  will-it-scale.per_thread_ops 2.6% regression
Date: Fri, 15 Aug 2025 15:36:00 +0800
Message-ID: <202508150803.d5387224-lkp@intel.com>



Hello,


kernel test robot noticed a 2.6% regression of will-it-scale.per_thread_ops on:


commit: 375700bab5b150e876e42d894a9a7470881f8a61 ("llist: make llist_add_batch() a static inline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[still regression on      linus/master 8742b2d8935f476449ef37e263bc4da3295c7b58]
[still regression on linux-next/master 2674d1eadaa2fd3a918dfcdb6d0bb49efe8a8bb9]

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:

	nr_task: 100%
	mode: thread
	test: tlb_flush3
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508150803.d5387224-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250815/202508150803.d5387224-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-csl-2sp10/tlb_flush3/will-it-scale

commit: 
  5ef2dccfcc ("delayacct: remove redundant code and adjust indentation")
  375700bab5 ("llist: make llist_add_batch() a static inline")

5ef2dccfcca8d864 375700bab5b150e876e42d894a9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    118225 ±  2%      -6.0%     111161        perf-c2c.HITM.total
 1.926e+08            -2.5%  1.878e+08        proc-vmstat.pgfault
     14579            -2.2%      14264        vmstat.system.cs
    579287            -2.6%     564220        will-it-scale.192.threads
      1.98            -2.9%       1.92        will-it-scale.192.threads_idle
      3016            -2.6%       2938        will-it-scale.per_thread_ops
    579287            -2.6%     564220        will-it-scale.workload
      0.33 ± 19%     +34.2%       0.44 ±  6%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.79 ±  9%     -44.9%       2.64 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     28.30 ±  3%      +9.9%      31.10 ±  4%  perf-sched.total_wait_and_delay.average.ms
     71544 ±  2%     -12.6%      62531 ±  3%  perf-sched.total_wait_and_delay.count.ms
     28.21 ±  3%      +9.9%      31.00 ±  4%  perf-sched.total_wait_time.average.ms
     47.56 ±115%    +220.4%     152.39 ± 11%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      3197 ±  5%     -13.6%       2761 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      4324 ± 16%     -28.8%       3079 ±  2%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.30 ± 73%     -73.6%       0.08 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
     47.48 ±115%    +220.3%     152.08 ± 11%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      9.36            +4.5%       9.77        perf-stat.i.MPKI
 1.427e+10            -4.5%  1.362e+10        perf-stat.i.branch-instructions
      0.97            +0.0        1.02        perf-stat.i.branch-miss-rate%
     34.20            +0.7       34.87        perf-stat.i.cache-miss-rate%
 1.753e+09            -1.5%  1.727e+09        perf-stat.i.cache-references
     14678            -2.6%      14293        perf-stat.i.context-switches
      9.07            +3.8%       9.42        perf-stat.i.cpi
    556.91 ±  2%      -4.6%     531.43        perf-stat.i.cpu-migrations
 6.398e+10            -4.0%  6.145e+10        perf-stat.i.instructions
      6.62            -2.8%       6.44        perf-stat.i.metric.K/sec
    635521            -2.7%     618322        perf-stat.i.minor-faults
    635521            -2.7%     618322        perf-stat.i.page-faults
     27.27           -27.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
     26.31           -26.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
     12.12           -12.1        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     11.53           -11.5        0.00        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
     11.39           -11.4        0.00        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
     11.36           -11.4        0.00        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
     13.84            -0.3       13.54        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
     48.02            +0.2       48.21        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior.do_madvise
     47.88            +0.2       48.07        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
     47.89            +0.2       48.08        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior
      4.21            +5.9       10.09        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      4.19            +5.9       10.08        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      8.00           +11.0       18.97        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
      8.02           +11.0       19.03        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
      8.11           +11.1       19.25        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     54.16           -54.2        0.00        perf-profile.children.cycles-pp.llist_add_batch
     21.03            -0.5       20.54        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     20.82            -0.5       20.37        perf-profile.children.cycles-pp.__sysvec_call_function
     21.06            -0.4       20.62        perf-profile.children.cycles-pp.sysvec_call_function
     22.05            -0.4       21.64        perf-profile.children.cycles-pp.asm_sysvec_call_function
     14.88            -0.4       14.52        perf-profile.children.cycles-pp.llist_reverse_order
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.common_startup_64
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.do_idle
      0.49 ±  4%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.start_secondary
      0.42 ±  3%      -0.1        0.35 ±  8%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.intel_idle
      0.48 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.21            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.22 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.40 ±  2%      -0.0        0.36 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.29 ±  5%      -0.0        0.26 ±  5%  perf-profile.children.cycles-pp.madvise_lock
      0.22 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.sysvec_call_function_single
      0.52 ±  2%      -0.0        0.48 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.44 ±  3%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.32 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.44 ±  2%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.12 ±  3%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.next_uptodate_folio
      0.49            +0.0        0.53 ±  2%  perf-profile.children.cycles-pp.should_flush_tlb
     48.07            +0.2       48.25        perf-profile.children.cycles-pp.unmap_page_range
     47.94            +0.2       48.12        perf-profile.children.cycles-pp.zap_pmd_range
     47.93            +0.2       48.12        perf-profile.children.cycles-pp.zap_pte_range
     41.92           -41.9        0.00        perf-profile.self.cycles-pp.llist_add_batch
     14.87            -0.4       14.51        perf-profile.self.cycles-pp.llist_reverse_order
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.intel_idle
      0.18 ±  2%      +0.0        0.19        perf-profile.self.cycles-pp.next_uptodate_folio
      0.14 ±  2%      +0.0        0.16        perf-profile.self.cycles-pp.filemap_map_pages
      0.36 ±  2%      +0.0        0.40 ±  3%  perf-profile.self.cycles-pp.should_flush_tlb
     29.83           +42.5       72.37        perf-profile.self.cycles-pp.smp_call_function_many_cond
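
As a reading aid for the table above, here is a minimal sketch that recomputes a few
of the reported deltas from the base and new values. The helper name pct_change and
the dictionary rows are illustrative only and are not part of lkp-tests. Counter rows
report a relative change in percent, while perf-profile rows report an absolute
difference in percentage points. The last lines add up the self cycles of
llist_add_batch and smp_call_function_many_cond on each side. The two sums are close,
which is consistent with (but does not prove) the cycles simply being attributed to
the caller once llist_add_batch() is inlined.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as shown in the %change column."""
    return (new - base) / base * 100.0


# (base, new) pairs copied from the comparison table in this report.
rows = {
    "will-it-scale.per_thread_ops": (3016, 2938),
    "will-it-scale.workload": (579287, 564220),
    "perf-stat.i.instructions": (6.398e10, 6.145e10),
}

changes = {name: round(pct_change(b, n), 1) for name, (b, n) in rows.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")

# perf-profile rows are percentage points, so the delta is a plain difference.
llist_self_delta = round(0.00 - 41.92, 1)

# Self cycles of the two functions, summed per kernel (values from the table).
self_before = 41.92 + 29.83  # llist_add_batch + smp_call_function_many_cond
self_after = 0.00 + 72.37    # llist_add_batch no longer appears as a symbol
print(f"combined self cycles: {self_before:.2f} -> {self_after:.2f}")
```

The combined figure moves by well under one percentage point, so the large per-symbol
swings in the perf-profile rows should not be read as the size of the regression.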




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
