* [linus:master] [llist]  375700bab5:  will-it-scale.per_thread_ops 2.6% regression
From: kernel test robot @ 2025-08-15  7:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, oliver.sang



Hello,


kernel test robot noticed a 2.6% regression of will-it-scale.per_thread_ops on:


commit: 375700bab5b150e876e42d894a9a7470881f8a61 ("llist: make llist_add_batch() a static inline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[still regression on      linus/master 8742b2d8935f476449ef37e263bc4da3295c7b58]
[still regression on linux-next/master 2674d1eadaa2fd3a918dfcdb6d0bb49efe8a8bb9]

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:

	nr_task: 100%
	mode: thread
	test: tlb_flush3
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508150803.d5387224-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250815/202508150803.d5387224-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-csl-2sp10/tlb_flush3/will-it-scale

commit: 
  5ef2dccfcc ("delayacct: remove redundant code and adjust indentation")
  375700bab5 ("llist: make llist_add_batch() a static inline")

5ef2dccfcca8d864 375700bab5b150e876e42d894a9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    118225 ±  2%      -6.0%     111161        perf-c2c.HITM.total
 1.926e+08            -2.5%  1.878e+08        proc-vmstat.pgfault
     14579            -2.2%      14264        vmstat.system.cs
    579287            -2.6%     564220        will-it-scale.192.threads
      1.98            -2.9%       1.92        will-it-scale.192.threads_idle
      3016            -2.6%       2938        will-it-scale.per_thread_ops
    579287            -2.6%     564220        will-it-scale.workload
      0.33 ± 19%     +34.2%       0.44 ±  6%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.79 ±  9%     -44.9%       2.64 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     28.30 ±  3%      +9.9%      31.10 ±  4%  perf-sched.total_wait_and_delay.average.ms
     71544 ±  2%     -12.6%      62531 ±  3%  perf-sched.total_wait_and_delay.count.ms
     28.21 ±  3%      +9.9%      31.00 ±  4%  perf-sched.total_wait_time.average.ms
     47.56 ±115%    +220.4%     152.39 ± 11%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      3197 ±  5%     -13.6%       2761 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      4324 ± 16%     -28.8%       3079 ±  2%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.30 ± 73%     -73.6%       0.08 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
     47.48 ±115%    +220.3%     152.08 ± 11%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      9.36            +4.5%       9.77        perf-stat.i.MPKI
 1.427e+10            -4.5%  1.362e+10        perf-stat.i.branch-instructions
      0.97            +0.0        1.02        perf-stat.i.branch-miss-rate%
     34.20            +0.7       34.87        perf-stat.i.cache-miss-rate%
 1.753e+09            -1.5%  1.727e+09        perf-stat.i.cache-references
     14678            -2.6%      14293        perf-stat.i.context-switches
      9.07            +3.8%       9.42        perf-stat.i.cpi
    556.91 ±  2%      -4.6%     531.43        perf-stat.i.cpu-migrations
 6.398e+10            -4.0%  6.145e+10        perf-stat.i.instructions
      6.62            -2.8%       6.44        perf-stat.i.metric.K/sec
    635521            -2.7%     618322        perf-stat.i.minor-faults
    635521            -2.7%     618322        perf-stat.i.page-faults
     27.27           -27.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
     26.31           -26.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
     12.12           -12.1        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     11.53           -11.5        0.00        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
     11.39           -11.4        0.00        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
     11.36           -11.4        0.00        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
     13.84            -0.3       13.54        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
     48.02            +0.2       48.21        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior.do_madvise
     47.88            +0.2       48.07        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
     47.89            +0.2       48.08        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior
      4.21            +5.9       10.09        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      4.19            +5.9       10.08        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      8.00           +11.0       18.97        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
      8.02           +11.0       19.03        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
      8.11           +11.1       19.25        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     54.16           -54.2        0.00        perf-profile.children.cycles-pp.llist_add_batch
     21.03            -0.5       20.54        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     20.82            -0.5       20.37        perf-profile.children.cycles-pp.__sysvec_call_function
     21.06            -0.4       20.62        perf-profile.children.cycles-pp.sysvec_call_function
     22.05            -0.4       21.64        perf-profile.children.cycles-pp.asm_sysvec_call_function
     14.88            -0.4       14.52        perf-profile.children.cycles-pp.llist_reverse_order
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.common_startup_64
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.do_idle
      0.49 ±  4%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.start_secondary
      0.42 ±  3%      -0.1        0.35 ±  8%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.intel_idle
      0.48 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.21            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.22 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.40 ±  2%      -0.0        0.36 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.29 ±  5%      -0.0        0.26 ±  5%  perf-profile.children.cycles-pp.madvise_lock
      0.22 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.sysvec_call_function_single
      0.52 ±  2%      -0.0        0.48 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.44 ±  3%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.32 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.44 ±  2%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.12 ±  3%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.next_uptodate_folio
      0.49            +0.0        0.53 ±  2%  perf-profile.children.cycles-pp.should_flush_tlb
     48.07            +0.2       48.25        perf-profile.children.cycles-pp.unmap_page_range
     47.94            +0.2       48.12        perf-profile.children.cycles-pp.zap_pmd_range
     47.93            +0.2       48.12        perf-profile.children.cycles-pp.zap_pte_range
     41.92           -41.9        0.00        perf-profile.self.cycles-pp.llist_add_batch
     14.87            -0.4       14.51        perf-profile.self.cycles-pp.llist_reverse_order
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.intel_idle
      0.18 ±  2%      +0.0        0.19        perf-profile.self.cycles-pp.next_uptodate_folio
      0.14 ±  2%      +0.0        0.16        perf-profile.self.cycles-pp.filemap_map_pages
      0.36 ±  2%      +0.0        0.40 ±  3%  perf-profile.self.cycles-pp.should_flush_tlb
     29.83           +42.5       72.37        perf-profile.self.cycles-pp.smp_call_function_many_cond
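
As a rough cross-check of the table above, the sketch below recomputes a few
%change values from the rounded figures shown and sums the self-cycle shares of
llist_add_batch and smp_call_function_many_cond on each side. The numbers are
copied from the rows above. Reading the result as "the inlined
llist_add_batch() samples are now attributed to its caller" is an assumption
that fits the data, not something this report states.

```python
# Values copied from the comparison table (base 5ef2dccfcc vs. 375700bab5).
rows = {
    "will-it-scale.per_thread_ops": (3016, 2938),
    "will-it-scale.workload": (579287, 564220),
    "perf-stat.i.instructions": (6.398e10, 6.145e10),
}


def pct_change(base, new):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


# Recomputed from the rounded table values, so the last digit can differ
# from the robot's own figures for other rows.
changes = {name: round(pct_change(b, n), 1) for name, (b, n) in rows.items()}

# perf-profile.self.cycles-pp shares before and after the commit.
self_before = {"llist_add_batch": 41.92, "smp_call_function_many_cond": 29.83}
self_after = {"llist_add_batch": 0.00, "smp_call_function_many_cond": 72.37}

combined_before = round(sum(self_before.values()), 2)
combined_after = round(sum(self_after.values()), 2)

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
print(f"combined self cycles: {combined_before} -> {combined_after}")
```

The combined share is nearly the same on both sides, so the large per-symbol
swings in perf-profile mostly reflect where samples are accounted once the
function has no symbol of its own. The throughput and instruction-count rows
carry the actual regression signal.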




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

