* [linus:master] [llist] 375700bab5: will-it-scale.per_thread_ops 2.6% regression
From: kernel test robot @ 2025-08-15 7:36 UTC (permalink / raw)
To: Jens Axboe; +Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, oliver.sang
Hello,
kernel test robot noticed a 2.6% regression of will-it-scale.per_thread_ops on:
commit: 375700bab5b150e876e42d894a9a7470881f8a61 ("llist: make llist_add_batch() a static inline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master 8742b2d8935f476449ef37e263bc4da3295c7b58]
[still regression on linux-next/master 2674d1eadaa2fd3a918dfcdb6d0bb49efe8a8bb9]
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:
nr_task: 100%
mode: thread
test: tlb_flush3
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508150803.d5387224-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250815/202508150803.d5387224-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-csl-2sp10/tlb_flush3/will-it-scale
commit:
5ef2dccfcc ("delayacct: remove redundant code and adjust indentation")
375700bab5 ("llist: make llist_add_batch() a static inline")
5ef2dccfcca8d864 375700bab5b150e876e42d894a9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
118225 ± 2% -6.0% 111161 perf-c2c.HITM.total
1.926e+08 -2.5% 1.878e+08 proc-vmstat.pgfault
14579 -2.2% 14264 vmstat.system.cs
579287 -2.6% 564220 will-it-scale.192.threads
1.98 -2.9% 1.92 will-it-scale.192.threads_idle
3016 -2.6% 2938 will-it-scale.per_thread_ops
579287 -2.6% 564220 will-it-scale.workload
0.33 ± 19% +34.2% 0.44 ± 6% perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
4.79 ± 9% -44.9% 2.64 ± 67% perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
28.30 ± 3% +9.9% 31.10 ± 4% perf-sched.total_wait_and_delay.average.ms
71544 ± 2% -12.6% 62531 ± 3% perf-sched.total_wait_and_delay.count.ms
28.21 ± 3% +9.9% 31.00 ± 4% perf-sched.total_wait_time.average.ms
47.56 ±115% +220.4% 152.39 ± 11% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
3197 ± 5% -13.6% 2761 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
4324 ± 16% -28.8% 3079 ± 2% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.30 ± 73% -73.6% 0.08 ±109% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
47.48 ±115% +220.3% 152.08 ± 11% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
9.36 +4.5% 9.77 perf-stat.i.MPKI
1.427e+10 -4.5% 1.362e+10 perf-stat.i.branch-instructions
0.97 +0.0 1.02 perf-stat.i.branch-miss-rate%
34.20 +0.7 34.87 perf-stat.i.cache-miss-rate%
1.753e+09 -1.5% 1.727e+09 perf-stat.i.cache-references
14678 -2.6% 14293 perf-stat.i.context-switches
9.07 +3.8% 9.42 perf-stat.i.cpi
556.91 ± 2% -4.6% 531.43 perf-stat.i.cpu-migrations
6.398e+10 -4.0% 6.145e+10 perf-stat.i.instructions
6.62 -2.8% 6.44 perf-stat.i.metric.K/sec
635521 -2.7% 618322 perf-stat.i.minor-faults
635521 -2.7% 618322 perf-stat.i.page-faults
27.27 -27.3 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
26.31 -26.3 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
12.12 -12.1 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
11.53 -11.5 0.00 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
11.39 -11.4 0.00 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
11.36 -11.4 0.00 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
13.84 -0.3 13.54 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
48.02 +0.2 48.21 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior.do_madvise
47.88 +0.2 48.07 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
47.89 +0.2 48.08 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior
4.21 +5.9 10.09 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
4.19 +5.9 10.08 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
8.00 +11.0 18.97 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
8.02 +11.0 19.03 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
8.11 +11.1 19.25 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
54.16 -54.2 0.00 perf-profile.children.cycles-pp.llist_add_batch
21.03 -0.5 20.54 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
20.82 -0.5 20.37 perf-profile.children.cycles-pp.__sysvec_call_function
21.06 -0.4 20.62 perf-profile.children.cycles-pp.sysvec_call_function
22.05 -0.4 21.64 perf-profile.children.cycles-pp.asm_sysvec_call_function
14.88 -0.4 14.52 perf-profile.children.cycles-pp.llist_reverse_order
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.common_startup_64
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.cpu_startup_entry
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.do_idle
0.49 ± 4% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.start_secondary
0.42 ± 3% -0.1 0.35 ± 8% perf-profile.children.cycles-pp.cpuidle_idle_call
0.40 ± 3% -0.1 0.34 ± 7% perf-profile.children.cycles-pp.cpuidle_enter
0.40 ± 3% -0.1 0.34 ± 7% perf-profile.children.cycles-pp.cpuidle_enter_state
0.23 ± 4% -0.0 0.18 ± 6% perf-profile.children.cycles-pp.intel_idle
0.48 ± 2% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.21 -0.0 0.17 ± 2% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.22 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.40 ± 2% -0.0 0.36 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.29 ± 5% -0.0 0.26 ± 5% perf-profile.children.cycles-pp.madvise_lock
0.22 ± 2% -0.0 0.18 perf-profile.children.cycles-pp.sysvec_call_function_single
0.52 ± 2% -0.0 0.48 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.44 ± 3% -0.0 0.41 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.32 ± 2% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.update_process_times
0.44 ± 2% -0.0 0.41 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.12 ± 3% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.24 +0.0 0.26 perf-profile.children.cycles-pp.next_uptodate_folio
0.49 +0.0 0.53 ± 2% perf-profile.children.cycles-pp.should_flush_tlb
48.07 +0.2 48.25 perf-profile.children.cycles-pp.unmap_page_range
47.94 +0.2 48.12 perf-profile.children.cycles-pp.zap_pmd_range
47.93 +0.2 48.12 perf-profile.children.cycles-pp.zap_pte_range
41.92 -41.9 0.00 perf-profile.self.cycles-pp.llist_add_batch
14.87 -0.4 14.51 perf-profile.self.cycles-pp.llist_reverse_order
0.23 ± 4% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.intel_idle
0.18 ± 2% +0.0 0.19 perf-profile.self.cycles-pp.next_uptodate_folio
0.14 ± 2% +0.0 0.16 perf-profile.self.cycles-pp.filemap_map_pages
0.36 ± 2% +0.0 0.40 ± 3% perf-profile.self.cycles-pp.should_flush_tlb
29.83 +42.5 72.37 perf-profile.self.cycles-pp.smp_call_function_many_cond
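The headline numbers in the table above can be cross-checked with a little arithmetic. The following is only a sketch: the helper name `pct_change` is hypothetical, and all values are copied from the table rows. It recomputes the relative change for a few metrics and sums the self-cycle shares of llist_add_batch and smp_call_function_many_cond on the parent commit. One plausible reading (an assumption, not a conclusion stated in this report) is that, once llist_add_batch() is inlined, its cycles are attributed to the caller rather than disappearing.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# Values copied from the comparison table (5ef2dccfcc vs. 375700bab5).
per_thread_ops = pct_change(3016, 2938)
workload = pct_change(579287, 564220)
instructions = pct_change(6.398e10, 6.145e10)

# perf-profile.self.cycles-pp, in percent of sampled cycles.
# Parent commit: llist_add_batch is its own symbol.
self_before = 41.92 + 29.83  # llist_add_batch + smp_call_function_many_cond
# Patched commit: llist_add_batch reports 0.00, the caller reports 72.37.
self_after = 0.00 + 72.37

print(f"will-it-scale.per_thread_ops: {per_thread_ops:+.1f}%")
print(f"will-it-scale.workload:       {workload:+.1f}%")
print(f"perf-stat.i.instructions:     {instructions:+.1f}%")
print(f"combined self cycles: {self_before:.2f} -> {self_after:.2f}")
```

Under that reading, the combined self-cycle share of the two symbols is nearly unchanged between the two commits (71.75 vs. 72.37), so the large per-symbol swings in the profile mostly reflect symbol attribution, while the throughput metrics show the actual 2.6% drop.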
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki