From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: kernel test robot <oliver.sang@intel.com>,
"Harry Yoo (Oracle)" <harry@kernel.org>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, Hao Li <hao.li@linux.dev>,
linux-mm@kvack.org
Subject: Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
Date: Thu, 14 May 2026 16:45:22 +0200
Message-ID: <90bf195e-45fb-423c-b686-49be9cadbd11@kernel.org>
In-Reply-To: <202605112204.9382cecf-lkp@intel.com>
On 5/11/26 16:45, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
Yay for an optimization that was supposed to have no tradeoffs :)
> commit: 298cdbf5f7c9e19289f46710ed5ab3da4e711150 ("mm, slab: add an optimistic __slab_try_return_freelist()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [still regression on linux-next/master 4cd074ae20bbcc293bbbce9163abe99d68ae6ae0]
>
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
> parameters:
>
> nr_task: 100%
> mode: process
> test: mmap1
> cpufreq_governor: performance
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202605112204.9382cecf-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20260511/202605112204.9382cecf-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/mmap1/will-it-scale
>
> commit:
> 1f7c8e1d52 ("mm/slub: defer freelist construction until after bulk allocation from a new slab")
> 298cdbf5f7 ("mm, slab: add an optimistic __slab_try_return_freelist()")
>
> 1f7c8e1d52428cbf 298cdbf5f7c9e19289f46710ed5
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 35417967 -6.3% 33202379 will-it-scale.192.processes
> 184468 -6.3% 172928 will-it-scale.per_process_ops
> 35417967 -6.3% 33202379 will-it-scale.workload
> 21873 ± 2% +8.0% 23628 vmstat.system.cs
> 3957 ± 19% -64.8% 1392 ± 13% perf-c2c.DRAM.local
> 1016 ± 20% -46.9% 540.33 ± 34% perf-c2c.DRAM.remote
> 0.71 -5.6% 0.67 turbostat.IPC
> 430.37 -1.2% 425.33 turbostat.PkgWatt
> 0.13 -0.0 0.12 mpstat.cpu.all.irq%
> 18.38 +1.4 19.79 mpstat.cpu.all.soft%
> 1.64 -0.1 1.54 mpstat.cpu.all.usr%
> 7.06 ± 14% +26.0% 8.89 ± 12% sched_debug.cfs_rq:/.load_avg.min
> 18788 ± 2% +7.2% 20147 sched_debug.cpu.nr_switches.avg
> 16273 ± 3% +7.6% 17503 sched_debug.cpu.nr_switches.min
> 3418653 +97.9% 6765264 numa-numastat.node0.local_node
> 3479092 +96.6% 6839697 numa-numastat.node0.numa_hit
> 4424809 ± 2% +79.1% 7922798 ± 2% numa-numastat.node1.local_node
> 4564748 ± 2% +76.3% 8048580 ± 2% numa-numastat.node1.numa_hit
> 3478940 +96.6% 6839185 numa-vmstat.node0.numa_hit
> 3418501 +97.9% 6764752 numa-vmstat.node0.numa_local
> 4565277 ± 2% +76.3% 8048347 ± 2% numa-vmstat.node1.numa_hit
> 4425338 ± 2% +79.0% 7922564 ± 2% numa-vmstat.node1.numa_local
> 189874 -2.4% 185345 proc-vmstat.nr_slab_unreclaimable
> 8051058 +85.0% 14892463 proc-vmstat.numa_hit
> 7850680 +87.1% 14692248 proc-vmstat.numa_local
> 25073981 +109.6% 52554060 proc-vmstat.pgalloc_normal
> 23597828 +116.6% 51111380 proc-vmstat.pgfree
Perhaps the weirdest part: the commit shouldn't be affecting page
allocations at all.
> 0.20 +8.9% 0.22 ± 2% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
> 0.20 +8.9% 0.22 ± 2% perf-sched.total_sch_delay.average.ms
> 26.54 ± 2% -12.1% 23.32 ± 2% perf-sched.total_wait_and_delay.average.ms
> 108336 ± 3% +9.0% 118099 perf-sched.total_wait_and_delay.count.ms
> 26.34 ± 2% -12.3% 23.11 ± 2% perf-sched.total_wait_time.average.ms
> 26.54 ± 2% -12.1% 23.32 ± 2% perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
> 108336 ± 3% +9.0% 118099 perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
> 26.34 ± 2% -12.3% 23.11 ± 2% perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
> 9.213e+10 -5.7% 8.687e+10 perf-stat.i.branch-instructions
> 1.098e+08 -6.7% 1.025e+08 perf-stat.i.branch-misses
> 14.49 ± 2% -1.4 13.11 ± 4% perf-stat.i.cache-miss-rate%
> 1.059e+08 ± 2% -12.9% 92245846 ± 5% perf-stat.i.cache-misses
> 7.389e+08 -3.6% 7.123e+08 perf-stat.i.cache-references
> 21981 ± 2% +7.8% 23693 perf-stat.i.context-switches
> 1.40 +6.2% 1.49 perf-stat.i.cpi
> 256.41 +3.0% 263.99 perf-stat.i.cpu-migrations
> 5815 ± 2% +15.3% 6707 ± 4% perf-stat.i.cycles-between-cache-misses
> 4.341e+11 -5.8% 4.089e+11 perf-stat.i.instructions
> 0.71 -5.8% 0.67 perf-stat.i.ipc
> 14.34 ± 2% -1.4 12.97 ± 4% perf-stat.overall.cache-miss-rate%
> 1.41 +6.2% 1.49 perf-stat.overall.cpi
> 5764 ± 2% +15.0% 6626 ± 4% perf-stat.overall.cycles-between-cache-misses
> 0.71 -5.9% 0.67 perf-stat.overall.ipc
> 9.183e+10 -5.7% 8.659e+10 perf-stat.ps.branch-instructions
> 1.095e+08 -6.7% 1.021e+08 perf-stat.ps.branch-misses
> 1.056e+08 ± 2% -12.8% 92081686 ± 5% perf-stat.ps.cache-misses
> 7.367e+08 -3.6% 7.102e+08 perf-stat.ps.cache-references
> 21840 ± 2% +8.1% 23604 perf-stat.ps.context-switches
> 253.30 +3.0% 260.93 perf-stat.ps.cpu-migrations
> 4.327e+11 -5.8% 4.076e+11 perf-stat.ps.instructions
> 1.312e+14 -5.9% 1.235e+14 perf-stat.total.instructions
> 12.77 -12.8 0.00 perf-profile.calltrace.cycles-pp.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
> 12.68 -12.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main
> 12.60 -12.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects
The zeroes here suggest the patch is working as expected: we're avoiding
__slab_free() completely on this path because __slab_try_return_freelist()
succeeds reliably.
> 4.81 -0.5 4.28 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_replace_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
> 2.52 -0.4 2.09 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
> 7.46 -0.4 7.06 perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
> 5.21 ± 2% -0.4 4.86 perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 3.50 -0.3 3.19 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store
> 3.47 -0.3 3.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu
> 5.90 -0.3 5.60 perf-profile.calltrace.cycles-pp.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 1.96 -0.3 1.70 perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
> 2.43 ± 4% -0.3 2.17 ± 4% perf-profile.calltrace.cycles-pp.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap
> 2.25 ± 4% -0.2 2.00 ± 4% perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp
> 1.27 ± 10% -0.2 1.03 ± 6% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
> 2.66 ± 4% -0.2 2.44 ± 3% perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
> 5.57 ± 2% -0.2 5.37 perf-profile.calltrace.cycles-pp.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
> 1.46 ± 8% -0.2 1.27 ± 4% perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp.do_vmi_align_munmap
> 1.27 ± 8% -0.2 1.08 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
> 5.06 ± 2% -0.2 4.88 perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
> 2.48 -0.2 2.32 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs
> 2.51 -0.2 2.36 perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
> 2.00 -0.1 1.86 perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.24 -0.1 2.11 perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
> 2.89 -0.1 2.76 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
> 2.26 -0.1 2.12 perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
> 2.56 -0.1 2.43 perf-profile.calltrace.cycles-pp.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas
> 2.13 -0.1 2.01 perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas
> 2.00 -0.1 1.88 perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
> 2.74 -0.1 2.62 perf-profile.calltrace.cycles-pp.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
> 1.94 -0.1 1.82 perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64
> 1.67 -0.1 1.58 perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff
> 1.44 -0.1 1.36 perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
> 1.50 -0.1 1.42 perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region
> 1.35 -0.1 1.28 perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area
> 1.38 -0.1 1.30 perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap
> 1.58 -0.1 1.52 perf-profile.calltrace.cycles-pp.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
> 1.44 -0.1 1.39 perf-profile.calltrace.cycles-pp.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff
> 0.62 -0.0 0.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__munmap
> 0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__mmap
> 1.06 -0.0 1.03 perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap
> 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags
> 0.64 -0.0 0.62 ± 2% perf-profile.calltrace.cycles-pp.kmem_cache_free.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 0.64 -0.0 0.62 perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
> 0.88 +0.1 0.98 perf-profile.calltrace.cycles-pp.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
> 0.55 ± 5% +0.1 0.66 ± 2% perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> 0.51 +0.1 0.64 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap
> 1.52 ± 5% +0.3 1.84 ± 2% perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
> 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
> 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
> 2.17 ± 5% +0.4 2.60 ± 2% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 0.00 +0.6 0.60 ± 8% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
> 0.00 +0.6 0.62 ± 5% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
> 12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
> 12.83 +1.0 13.82 perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node
> 12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main
> 12.81 +1.0 13.81 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
> 12.82 +1.0 13.82 perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore
> 12.87 +1.0 13.87 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects
> 12.81 +1.0 13.82 perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 10.64 +1.3 11.92 perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
> 11.72 +1.6 13.30 perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
> 11.63 +1.6 13.23 perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
> 11.33 +1.6 12.94 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch
> 11.20 +1.6 12.82 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf
> 22.90 +14.2 37.13 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main
> 22.99 +14.3 37.26 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
Taking the spin_lock isn't avoided (and isn't supposed to be); we're just
not taking it from __slab_free() but from __refill_objects_node(). That
should be the same frequency and the same amount of work under the lock
(maybe less, as there's no double cmpxchg under the spin lock), and we
additionally avoid walking the freelist (which was done outside of any lock
anyway). So it should be the same or better.
> 27.87 -10.5 17.33 perf-profile.children.cycles-pp.__slab_free
> 7.27 -0.8 6.48 perf-profile.children.cycles-pp.barn_get_empty_sheaf
> 5.91 -0.6 5.32 perf-profile.children.cycles-pp.barn_replace_empty_sheaf
> 10.36 -0.5 9.82 perf-profile.children.cycles-pp.mas_wr_node_store
> 7.58 -0.4 7.18 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
> 3.98 -0.4 3.61 perf-profile.children.cycles-pp.barn_put_full_sheaf
> 4.70 -0.4 4.34 perf-profile.children.cycles-pp.__kfree_rcu_sheaf
> 5.91 -0.3 5.60 perf-profile.children.cycles-pp.unmap_region
> 5.19 -0.3 4.90 perf-profile.children.cycles-pp.kvfree_call_rcu
> 94.66 -0.3 94.41 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 94.52 -0.2 94.27 perf-profile.children.cycles-pp.do_syscall_64
> 5.57 ± 2% -0.2 5.37 perf-profile.children.cycles-pp.mas_store_prealloc
> 2.06 -0.2 1.89 perf-profile.children.cycles-pp.__get_unmapped_area
> 2.96 -0.2 2.80 perf-profile.children.cycles-pp.__pi_memcpy
> 2.55 -0.2 2.40 perf-profile.children.cycles-pp.free_pgtables
> 2.25 -0.1 2.12 perf-profile.children.cycles-pp.vms_gather_munmap_vmas
> 2.26 -0.1 2.13 perf-profile.children.cycles-pp.free_pgd_range
> 2.58 -0.1 2.44 perf-profile.children.cycles-pp.zap_pmd_range
> 2.90 -0.1 2.77 perf-profile.children.cycles-pp.unmap_vmas
> 2.14 -0.1 2.02 perf-profile.children.cycles-pp.free_p4d_range
> 2.75 -0.1 2.63 perf-profile.children.cycles-pp.__zap_vma_range
> 2.01 -0.1 1.89 perf-profile.children.cycles-pp.free_pud_range
> 1.94 -0.1 1.82 perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
> 1.69 -0.1 1.59 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
> 1.26 -0.1 1.18 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 1.42 -0.1 1.34 perf-profile.children.cycles-pp.mas_find
> 1.38 -0.1 1.30 perf-profile.children.cycles-pp.vm_unmapped_area
> 1.36 -0.1 1.28 perf-profile.children.cycles-pp.unmapped_area_topdown
> 0.98 -0.1 0.92 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.58 -0.1 1.52 perf-profile.children.cycles-pp.__mmap_complete
> 1.46 -0.1 1.40 perf-profile.children.cycles-pp.perf_event_mmap
> 0.64 -0.0 0.60 perf-profile.children.cycles-pp.mas_prev_slot
> 0.54 -0.0 0.50 perf-profile.children.cycles-pp.mas_wr_store_type
> 1.08 -0.0 1.04 perf-profile.children.cycles-pp.perf_event_mmap_event
> 0.63 -0.0 0.59 perf-profile.children.cycles-pp.mas_empty_area_rev
> 0.54 -0.0 0.50 perf-profile.children.cycles-pp.__vma_start_write
> 0.54 -0.0 0.52 perf-profile.children.cycles-pp.mas_next_slot
> 0.43 -0.0 0.41 perf-profile.children.cycles-pp.mas_rev_awalk
> 0.53 -0.0 0.51 perf-profile.children.cycles-pp.mas_walk
> 0.64 -0.0 0.62 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
> 0.36 -0.0 0.33 perf-profile.children.cycles-pp.__vma_start_exclude_readers
> 0.29 -0.0 0.27 perf-profile.children.cycles-pp.__rcu_free_sheaf_prepare
> 0.44 -0.0 0.42 perf-profile.children.cycles-pp.vma_merge_new_range
> 0.65 -0.0 0.63 perf-profile.children.cycles-pp.perf_iterate_sb
> 0.24 -0.0 0.22 perf-profile.children.cycles-pp.security_vm_enough_memory_mm
> 0.07 ± 5% -0.0 0.05 perf-profile.children.cycles-pp.mmap_region
> 0.14 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.mas_prev
> 0.30 -0.0 0.28 perf-profile.children.cycles-pp.up_read
> 0.30 -0.0 0.29 perf-profile.children.cycles-pp.vma_set_page_prot
> 0.07 -0.0 0.06 ± 9% perf-profile.children.cycles-pp.unlink_file_vma_batch_add
> 0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.__alloc_empty_sheaf
> 0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.remove_vma
> 0.24 -0.0 0.22 perf-profile.children.cycles-pp.downgrade_write
> 0.20 ± 3% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.mas_wr_walk_descend
> 0.28 -0.0 0.27 perf-profile.children.cycles-pp.down_write_killable
> 0.18 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.vma_wants_writenotify
> 0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__kmalloc_noprof
> 0.20 -0.0 0.19 perf-profile.children.cycles-pp.tlb_gather_mmu
> 0.07 -0.0 0.06 perf-profile.children.cycles-pp.__call_rcu_common
> 0.16 -0.0 0.15 perf-profile.children.cycles-pp.may_expand_vm
> 0.14 -0.0 0.13 perf-profile.children.cycles-pp.up_write
> 0.12 -0.0 0.11 perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.15 -0.0 0.14 perf-profile.children.cycles-pp.vma_is_shared_writable
> 0.11 -0.0 0.10 perf-profile.children.cycles-pp.x64_sys_call
> 0.88 +0.1 0.99 perf-profile.children.cycles-pp.vm_area_alloc
> 0.36 +0.1 0.50 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.children.cycles-pp.run_ksoftirqd
> 2.17 ± 5% +0.4 2.60 ± 2% perf-profile.children.cycles-pp.smpboot_thread_fn
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.kthread
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork
> 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm
> 0.69 ± 5% +0.5 1.22 ± 4% perf-profile.children.cycles-pp.alloc_from_new_slab
> 15.26 +1.2 16.44 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> 18.16 +1.3 19.49 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 18.14 +1.3 19.47 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 18.02 +1.3 19.36 perf-profile.children.cycles-pp.__irq_exit_rcu
> 60.59 +1.5 62.05 perf-profile.children.cycles-pp.__pcs_replace_empty_main
> 61.51 +1.6 63.10 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
> 20.16 +1.8 21.94 perf-profile.children.cycles-pp.rcu_core
> 20.18 +1.8 21.95 perf-profile.children.cycles-pp.handle_softirqs
> 20.14 +1.8 21.92 perf-profile.children.cycles-pp.rcu_do_batch
> 50.97 +2.0 52.95 perf-profile.children.cycles-pp.__refill_objects_node
> 15.07 +2.2 17.25 perf-profile.children.cycles-pp.__kmem_cache_free_bulk
> 15.67 +2.2 17.86 perf-profile.children.cycles-pp.rcu_free_sheaf
> 65.82 +2.4 68.25 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 65.33 +2.5 67.80 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 51.67 +2.5 54.18 perf-profile.children.cycles-pp.refill_objects
> 2.16 -0.5 1.66 perf-profile.self.cycles-pp.__refill_objects_node
> 2.64 -0.2 2.48 perf-profile.self.cycles-pp.__pi_memcpy
> 2.36 -0.1 2.22 perf-profile.self.cycles-pp.zap_pmd_range
> 1.84 -0.1 1.71 perf-profile.self.cycles-pp.free_pud_range
> 1.78 -0.1 1.66 perf-profile.self.cycles-pp.__mmap_region
> 0.52 -0.1 0.40 perf-profile.self.cycles-pp.__slab_free
> 1.44 -0.1 1.34 perf-profile.self.cycles-pp.mas_wr_node_store
> 0.98 -0.1 0.92 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.84 -0.1 0.78 perf-profile.self.cycles-pp.mas_store_gfp
> 0.55 -0.0 0.50 perf-profile.self.cycles-pp.do_vmi_align_munmap
> 0.58 -0.0 0.54 perf-profile.self.cycles-pp.mas_prev_slot
> 0.62 -0.0 0.58 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.17 -0.0 0.13 perf-profile.self.cycles-pp.do_mmap
> 0.48 -0.0 0.44 perf-profile.self.cycles-pp.__mmap
> 0.49 -0.0 0.45 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.35 -0.0 0.32 perf-profile.self.cycles-pp.vms_gather_munmap_vmas
> 0.18 -0.0 0.15 ± 2% perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
> 0.42 -0.0 0.39 perf-profile.self.cycles-pp.__munmap
> 0.49 -0.0 0.46 perf-profile.self.cycles-pp.mas_next_slot
> 0.48 -0.0 0.45 perf-profile.self.cycles-pp.perf_iterate_sb
> 0.10 -0.0 0.07 ± 5% perf-profile.self.cycles-pp.__get_unmapped_area
> 0.40 -0.0 0.37 perf-profile.self.cycles-pp.mas_rev_awalk
> 0.48 -0.0 0.45 perf-profile.self.cycles-pp.mas_walk
> 0.31 -0.0 0.28 perf-profile.self.cycles-pp.kmem_cache_free
> 0.37 -0.0 0.34 perf-profile.self.cycles-pp.__vm_munmap
> 0.30 -0.0 0.28 perf-profile.self.cycles-pp.__vma_start_exclude_readers
> 0.31 -0.0 0.29 perf-profile.self.cycles-pp.mas_wr_store_type
> 0.36 -0.0 0.34 perf-profile.self.cycles-pp.perf_event_mmap
> 0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_preallocate
> 0.68 -0.0 0.66 perf-profile.self.cycles-pp.mas_update_gap
> 0.32 -0.0 0.30 perf-profile.self.cycles-pp.vma_merge_new_range
> 0.29 -0.0 0.27 perf-profile.self.cycles-pp.__rcu_free_sheaf_prepare
> 0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_find
> 0.18 -0.0 0.16 perf-profile.self.cycles-pp.rcu_free_sheaf
> 0.33 -0.0 0.31 perf-profile.self.cycles-pp.vm_area_alloc
> 0.07 -0.0 0.05 perf-profile.self.cycles-pp.unlink_file_vma_batch_add
> 0.38 -0.0 0.36 perf-profile.self.cycles-pp.mas_store_prealloc
> 0.26 -0.0 0.24 perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
> 0.25 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.up_read
> 0.26 -0.0 0.24 perf-profile.self.cycles-pp.down_write_killable
> 0.22 ± 2% -0.0 0.20 perf-profile.self.cycles-pp.downgrade_write
> 0.26 -0.0 0.25 perf-profile.self.cycles-pp.__kfree_rcu_sheaf
> 0.20 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.vms_complete_munmap_vmas
> 0.18 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.mas_wr_walk_descend
> 0.19 ± 2% -0.0 0.18 perf-profile.self.cycles-pp.do_syscall_64
> 0.38 -0.0 0.37 perf-profile.self.cycles-pp.unmapped_area_topdown
> 0.17 -0.0 0.16 perf-profile.self.cycles-pp.__vma_start_write
> 0.13 -0.0 0.12 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.10 -0.0 0.09 perf-profile.self.cycles-pp.free_pgd_range
> 0.07 -0.0 0.06 perf-profile.self.cycles-pp.mas_prev
> 0.17 -0.0 0.16 perf-profile.self.cycles-pp.tlb_finish_mmu
> 0.12 -0.0 0.11 perf-profile.self.cycles-pp.security_vm_enough_memory_mm
> 0.06 -0.0 0.05 perf-profile.self.cycles-pp.mmap_region
> 0.43 +0.1 0.50 perf-profile.self.cycles-pp.kvfree_call_rcu
> 0.29 +0.1 0.41 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> 65.33 +2.5 67.80 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
And yet it seems we spend more time spinning as a result. I can't explain
that by what the patch does, so could it be just some code cache layout effect?
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
Thread overview: 4+ messages
2026-05-11 14:45 [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression kernel test robot
2026-05-14 14:45 ` Vlastimil Babka (SUSE) [this message]
2026-05-14 16:00 ` Vlastimil Babka (SUSE)
2026-05-14 16:02 ` Vlastimil Babka (SUSE)