* [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
@ 2026-05-11 14:45 kernel test robot
2026-05-14 14:45 ` Vlastimil Babka (SUSE)
0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2026-05-11 14:45 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: oe-lkp, lkp, Hao Li, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
commit: 298cdbf5f7c9e19289f46710ed5ab3da4e711150 ("mm, slab: add an optimistic __slab_try_return_freelist()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[still regression on linux-next/master 4cd074ae20bbcc293bbbce9163abe99d68ae6ae0]
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:
nr_task: 100%
mode: process
test: mmap1
cpufreq_governor: performance
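[Editor's note: the parameters above map onto an lkp-tests job description; a minimal, hypothetical job-file sketch (field names taken from this report's own parameter list and result-matrix header; real lkp job files carry additional fields) might look like:]

```yaml
suite: will-it-scale
testcase: will-it-scale
test: mmap1
mode: process
nr_task: 100%
cpufreq_governor: performance
kconfig: x86_64-rhel-9.4
compiler: gcc-14
```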
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202605112204.9382cecf-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260511/202605112204.9382cecf-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/mmap1/will-it-scale
commit:
1f7c8e1d52 ("mm/slub: defer freelist construction until after bulk allocation from a new slab")
298cdbf5f7 ("mm, slab: add an optimistic __slab_try_return_freelist()")
1f7c8e1d52428cbf 298cdbf5f7c9e19289f46710ed5
---------------- ---------------------------
%stddev %change %stddev
\ | \
35417967 -6.3% 33202379 will-it-scale.192.processes
184468 -6.3% 172928 will-it-scale.per_process_ops
35417967 -6.3% 33202379 will-it-scale.workload
21873 ± 2% +8.0% 23628 vmstat.system.cs
3957 ± 19% -64.8% 1392 ± 13% perf-c2c.DRAM.local
1016 ± 20% -46.9% 540.33 ± 34% perf-c2c.DRAM.remote
0.71 -5.6% 0.67 turbostat.IPC
430.37 -1.2% 425.33 turbostat.PkgWatt
0.13 -0.0 0.12 mpstat.cpu.all.irq%
18.38 +1.4 19.79 mpstat.cpu.all.soft%
1.64 -0.1 1.54 mpstat.cpu.all.usr%
7.06 ± 14% +26.0% 8.89 ± 12% sched_debug.cfs_rq:/.load_avg.min
18788 ± 2% +7.2% 20147 sched_debug.cpu.nr_switches.avg
16273 ± 3% +7.6% 17503 sched_debug.cpu.nr_switches.min
3418653 +97.9% 6765264 numa-numastat.node0.local_node
3479092 +96.6% 6839697 numa-numastat.node0.numa_hit
4424809 ± 2% +79.1% 7922798 ± 2% numa-numastat.node1.local_node
4564748 ± 2% +76.3% 8048580 ± 2% numa-numastat.node1.numa_hit
3478940 +96.6% 6839185 numa-vmstat.node0.numa_hit
3418501 +97.9% 6764752 numa-vmstat.node0.numa_local
4565277 ± 2% +76.3% 8048347 ± 2% numa-vmstat.node1.numa_hit
4425338 ± 2% +79.0% 7922564 ± 2% numa-vmstat.node1.numa_local
189874 -2.4% 185345 proc-vmstat.nr_slab_unreclaimable
8051058 +85.0% 14892463 proc-vmstat.numa_hit
7850680 +87.1% 14692248 proc-vmstat.numa_local
25073981 +109.6% 52554060 proc-vmstat.pgalloc_normal
23597828 +116.6% 51111380 proc-vmstat.pgfree
0.20 +8.9% 0.22 ± 2% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
0.20 +8.9% 0.22 ± 2% perf-sched.total_sch_delay.average.ms
26.54 ± 2% -12.1% 23.32 ± 2% perf-sched.total_wait_and_delay.average.ms
108336 ± 3% +9.0% 118099 perf-sched.total_wait_and_delay.count.ms
26.34 ± 2% -12.3% 23.11 ± 2% perf-sched.total_wait_time.average.ms
26.54 ± 2% -12.1% 23.32 ± 2% perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
108336 ± 3% +9.0% 118099 perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
26.34 ± 2% -12.3% 23.11 ± 2% perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
9.213e+10 -5.7% 8.687e+10 perf-stat.i.branch-instructions
1.098e+08 -6.7% 1.025e+08 perf-stat.i.branch-misses
14.49 ± 2% -1.4 13.11 ± 4% perf-stat.i.cache-miss-rate%
1.059e+08 ± 2% -12.9% 92245846 ± 5% perf-stat.i.cache-misses
7.389e+08 -3.6% 7.123e+08 perf-stat.i.cache-references
21981 ± 2% +7.8% 23693 perf-stat.i.context-switches
1.40 +6.2% 1.49 perf-stat.i.cpi
256.41 +3.0% 263.99 perf-stat.i.cpu-migrations
5815 ± 2% +15.3% 6707 ± 4% perf-stat.i.cycles-between-cache-misses
4.341e+11 -5.8% 4.089e+11 perf-stat.i.instructions
0.71 -5.8% 0.67 perf-stat.i.ipc
14.34 ± 2% -1.4 12.97 ± 4% perf-stat.overall.cache-miss-rate%
1.41 +6.2% 1.49 perf-stat.overall.cpi
5764 ± 2% +15.0% 6626 ± 4% perf-stat.overall.cycles-between-cache-misses
0.71 -5.9% 0.67 perf-stat.overall.ipc
9.183e+10 -5.7% 8.659e+10 perf-stat.ps.branch-instructions
1.095e+08 -6.7% 1.021e+08 perf-stat.ps.branch-misses
1.056e+08 ± 2% -12.8% 92081686 ± 5% perf-stat.ps.cache-misses
7.367e+08 -3.6% 7.102e+08 perf-stat.ps.cache-references
21840 ± 2% +8.1% 23604 perf-stat.ps.context-switches
253.30 +3.0% 260.93 perf-stat.ps.cpu-migrations
4.327e+11 -5.8% 4.076e+11 perf-stat.ps.instructions
1.312e+14 -5.9% 1.235e+14 perf-stat.total.instructions
12.77 -12.8 0.00 perf-profile.calltrace.cycles-pp.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
12.68 -12.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main
12.60 -12.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects
4.81 -0.5 4.28 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_replace_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
2.52 -0.4 2.09 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
7.46 -0.4 7.06 perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
5.21 ± 2% -0.4 4.86 perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
3.50 -0.3 3.19 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store
3.47 -0.3 3.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu
5.90 -0.3 5.60 perf-profile.calltrace.cycles-pp.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.96 -0.3 1.70 perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
2.43 ± 4% -0.3 2.17 ± 4% perf-profile.calltrace.cycles-pp.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap
2.25 ± 4% -0.2 2.00 ± 4% perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp
1.27 ± 10% -0.2 1.03 ± 6% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
2.66 ± 4% -0.2 2.44 ± 3% perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
5.57 ± 2% -0.2 5.37 perf-profile.calltrace.cycles-pp.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
1.46 ± 8% -0.2 1.27 ± 4% perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp.do_vmi_align_munmap
1.27 ± 8% -0.2 1.08 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
5.06 ± 2% -0.2 4.88 perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
2.48 -0.2 2.32 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs
2.51 -0.2 2.36 perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
2.00 -0.1 1.86 perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.24 -0.1 2.11 perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
2.89 -0.1 2.76 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
2.26 -0.1 2.12 perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
2.56 -0.1 2.43 perf-profile.calltrace.cycles-pp.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas
2.13 -0.1 2.01 perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas
2.00 -0.1 1.88 perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
2.74 -0.1 2.62 perf-profile.calltrace.cycles-pp.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
1.94 -0.1 1.82 perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64
1.67 -0.1 1.58 perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff
1.44 -0.1 1.36 perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
1.50 -0.1 1.42 perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region
1.35 -0.1 1.28 perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area
1.38 -0.1 1.30 perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap
1.58 -0.1 1.52 perf-profile.calltrace.cycles-pp.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
1.44 -0.1 1.39 perf-profile.calltrace.cycles-pp.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff
0.62 -0.0 0.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__munmap
0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__mmap
1.06 -0.0 1.03 perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap
0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags
0.64 -0.0 0.62 ± 2% perf-profile.calltrace.cycles-pp.kmem_cache_free.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.64 -0.0 0.62 perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
0.88 +0.1 0.98 perf-profile.calltrace.cycles-pp.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
0.55 ± 5% +0.1 0.66 ± 2% perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
0.51 +0.1 0.64 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap
1.52 ± 5% +0.3 1.84 ± 2% perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
2.17 ± 5% +0.4 2.60 ± 2% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
0.00 +0.6 0.60 ± 8% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
0.00 +0.6 0.62 ± 5% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
12.83 +1.0 13.82 perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node
12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main
12.81 +1.0 13.81 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
12.82 +1.0 13.82 perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore
12.87 +1.0 13.87 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects
12.81 +1.0 13.82 perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
10.64 +1.3 11.92 perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
11.72 +1.6 13.30 perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
11.63 +1.6 13.23 perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
11.33 +1.6 12.94 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch
11.20 +1.6 12.82 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf
22.90 +14.2 37.13 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main
22.99 +14.3 37.26 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
27.87 -10.5 17.33 perf-profile.children.cycles-pp.__slab_free
7.27 -0.8 6.48 perf-profile.children.cycles-pp.barn_get_empty_sheaf
5.91 -0.6 5.32 perf-profile.children.cycles-pp.barn_replace_empty_sheaf
10.36 -0.5 9.82 perf-profile.children.cycles-pp.mas_wr_node_store
7.58 -0.4 7.18 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
3.98 -0.4 3.61 perf-profile.children.cycles-pp.barn_put_full_sheaf
4.70 -0.4 4.34 perf-profile.children.cycles-pp.__kfree_rcu_sheaf
5.91 -0.3 5.60 perf-profile.children.cycles-pp.unmap_region
5.19 -0.3 4.90 perf-profile.children.cycles-pp.kvfree_call_rcu
94.66 -0.3 94.41 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
94.52 -0.2 94.27 perf-profile.children.cycles-pp.do_syscall_64
5.57 ± 2% -0.2 5.37 perf-profile.children.cycles-pp.mas_store_prealloc
2.06 -0.2 1.89 perf-profile.children.cycles-pp.__get_unmapped_area
2.96 -0.2 2.80 perf-profile.children.cycles-pp.__pi_memcpy
2.55 -0.2 2.40 perf-profile.children.cycles-pp.free_pgtables
2.25 -0.1 2.12 perf-profile.children.cycles-pp.vms_gather_munmap_vmas
2.26 -0.1 2.13 perf-profile.children.cycles-pp.free_pgd_range
2.58 -0.1 2.44 perf-profile.children.cycles-pp.zap_pmd_range
2.90 -0.1 2.77 perf-profile.children.cycles-pp.unmap_vmas
2.14 -0.1 2.02 perf-profile.children.cycles-pp.free_p4d_range
2.75 -0.1 2.63 perf-profile.children.cycles-pp.__zap_vma_range
2.01 -0.1 1.89 perf-profile.children.cycles-pp.free_pud_range
1.94 -0.1 1.82 perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
1.69 -0.1 1.59 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
1.26 -0.1 1.18 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.42 -0.1 1.34 perf-profile.children.cycles-pp.mas_find
1.38 -0.1 1.30 perf-profile.children.cycles-pp.vm_unmapped_area
1.36 -0.1 1.28 perf-profile.children.cycles-pp.unmapped_area_topdown
0.98 -0.1 0.92 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.58 -0.1 1.52 perf-profile.children.cycles-pp.__mmap_complete
1.46 -0.1 1.40 perf-profile.children.cycles-pp.perf_event_mmap
0.64 -0.0 0.60 perf-profile.children.cycles-pp.mas_prev_slot
0.54 -0.0 0.50 perf-profile.children.cycles-pp.mas_wr_store_type
1.08 -0.0 1.04 perf-profile.children.cycles-pp.perf_event_mmap_event
0.63 -0.0 0.59 perf-profile.children.cycles-pp.mas_empty_area_rev
0.54 -0.0 0.50 perf-profile.children.cycles-pp.__vma_start_write
0.54 -0.0 0.52 perf-profile.children.cycles-pp.mas_next_slot
0.43 -0.0 0.41 perf-profile.children.cycles-pp.mas_rev_awalk
0.53 -0.0 0.51 perf-profile.children.cycles-pp.mas_walk
0.64 -0.0 0.62 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
0.36 -0.0 0.33 perf-profile.children.cycles-pp.__vma_start_exclude_readers
0.29 -0.0 0.27 perf-profile.children.cycles-pp.__rcu_free_sheaf_prepare
0.44 -0.0 0.42 perf-profile.children.cycles-pp.vma_merge_new_range
0.65 -0.0 0.63 perf-profile.children.cycles-pp.perf_iterate_sb
0.24 -0.0 0.22 perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.07 ± 5% -0.0 0.05 perf-profile.children.cycles-pp.mmap_region
0.14 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.mas_prev
0.30 -0.0 0.28 perf-profile.children.cycles-pp.up_read
0.30 -0.0 0.29 perf-profile.children.cycles-pp.vma_set_page_prot
0.07 -0.0 0.06 ± 9% perf-profile.children.cycles-pp.unlink_file_vma_batch_add
0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.__alloc_empty_sheaf
0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.remove_vma
0.24 -0.0 0.22 perf-profile.children.cycles-pp.downgrade_write
0.20 ± 3% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.mas_wr_walk_descend
0.28 -0.0 0.27 perf-profile.children.cycles-pp.down_write_killable
0.18 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.vma_wants_writenotify
0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__kmalloc_noprof
0.20 -0.0 0.19 perf-profile.children.cycles-pp.tlb_gather_mmu
0.07 -0.0 0.06 perf-profile.children.cycles-pp.__call_rcu_common
0.16 -0.0 0.15 perf-profile.children.cycles-pp.may_expand_vm
0.14 -0.0 0.13 perf-profile.children.cycles-pp.up_write
0.12 -0.0 0.11 perf-profile.children.cycles-pp.hrtimer_interrupt
0.15 -0.0 0.14 perf-profile.children.cycles-pp.vma_is_shared_writable
0.11 -0.0 0.10 perf-profile.children.cycles-pp.x64_sys_call
0.88 +0.1 0.99 perf-profile.children.cycles-pp.vm_area_alloc
0.36 +0.1 0.50 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
2.16 ± 5% +0.4 2.59 ± 2% perf-profile.children.cycles-pp.run_ksoftirqd
2.17 ± 5% +0.4 2.60 ± 2% perf-profile.children.cycles-pp.smpboot_thread_fn
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.kthread
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork
2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm
0.69 ± 5% +0.5 1.22 ± 4% perf-profile.children.cycles-pp.alloc_from_new_slab
15.26 +1.2 16.44 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
18.16 +1.3 19.49 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
18.14 +1.3 19.47 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
18.02 +1.3 19.36 perf-profile.children.cycles-pp.__irq_exit_rcu
60.59 +1.5 62.05 perf-profile.children.cycles-pp.__pcs_replace_empty_main
61.51 +1.6 63.10 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
20.16 +1.8 21.94 perf-profile.children.cycles-pp.rcu_core
20.18 +1.8 21.95 perf-profile.children.cycles-pp.handle_softirqs
20.14 +1.8 21.92 perf-profile.children.cycles-pp.rcu_do_batch
50.97 +2.0 52.95 perf-profile.children.cycles-pp.__refill_objects_node
15.07 +2.2 17.25 perf-profile.children.cycles-pp.__kmem_cache_free_bulk
15.67 +2.2 17.86 perf-profile.children.cycles-pp.rcu_free_sheaf
65.82 +2.4 68.25 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
65.33 +2.5 67.80 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
51.67 +2.5 54.18 perf-profile.children.cycles-pp.refill_objects
2.16 -0.5 1.66 perf-profile.self.cycles-pp.__refill_objects_node
2.64 -0.2 2.48 perf-profile.self.cycles-pp.__pi_memcpy
2.36 -0.1 2.22 perf-profile.self.cycles-pp.zap_pmd_range
1.84 -0.1 1.71 perf-profile.self.cycles-pp.free_pud_range
1.78 -0.1 1.66 perf-profile.self.cycles-pp.__mmap_region
0.52 -0.1 0.40 perf-profile.self.cycles-pp.__slab_free
1.44 -0.1 1.34 perf-profile.self.cycles-pp.mas_wr_node_store
0.98 -0.1 0.92 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.84 -0.1 0.78 perf-profile.self.cycles-pp.mas_store_gfp
0.55 -0.0 0.50 perf-profile.self.cycles-pp.do_vmi_align_munmap
0.58 -0.0 0.54 perf-profile.self.cycles-pp.mas_prev_slot
0.62 -0.0 0.58 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.17 -0.0 0.13 perf-profile.self.cycles-pp.do_mmap
0.48 -0.0 0.44 perf-profile.self.cycles-pp.__mmap
0.49 -0.0 0.45 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.35 -0.0 0.32 perf-profile.self.cycles-pp.vms_gather_munmap_vmas
0.18 -0.0 0.15 ± 2% perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
0.42 -0.0 0.39 perf-profile.self.cycles-pp.__munmap
0.49 -0.0 0.46 perf-profile.self.cycles-pp.mas_next_slot
0.48 -0.0 0.45 perf-profile.self.cycles-pp.perf_iterate_sb
0.10 -0.0 0.07 ± 5% perf-profile.self.cycles-pp.__get_unmapped_area
0.40 -0.0 0.37 perf-profile.self.cycles-pp.mas_rev_awalk
0.48 -0.0 0.45 perf-profile.self.cycles-pp.mas_walk
0.31 -0.0 0.28 perf-profile.self.cycles-pp.kmem_cache_free
0.37 -0.0 0.34 perf-profile.self.cycles-pp.__vm_munmap
0.30 -0.0 0.28 perf-profile.self.cycles-pp.__vma_start_exclude_readers
0.31 -0.0 0.29 perf-profile.self.cycles-pp.mas_wr_store_type
0.36 -0.0 0.34 perf-profile.self.cycles-pp.perf_event_mmap
0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_preallocate
0.68 -0.0 0.66 perf-profile.self.cycles-pp.mas_update_gap
0.32 -0.0 0.30 perf-profile.self.cycles-pp.vma_merge_new_range
0.29 -0.0 0.27 perf-profile.self.cycles-pp.__rcu_free_sheaf_prepare
0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_find
0.18 -0.0 0.16 perf-profile.self.cycles-pp.rcu_free_sheaf
0.33 -0.0 0.31 perf-profile.self.cycles-pp.vm_area_alloc
0.07 -0.0 0.05 perf-profile.self.cycles-pp.unlink_file_vma_batch_add
0.38 -0.0 0.36 perf-profile.self.cycles-pp.mas_store_prealloc
0.26 -0.0 0.24 perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
0.25 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.up_read
0.26 -0.0 0.24 perf-profile.self.cycles-pp.down_write_killable
0.22 ± 2% -0.0 0.20 perf-profile.self.cycles-pp.downgrade_write
0.26 -0.0 0.25 perf-profile.self.cycles-pp.__kfree_rcu_sheaf
0.20 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.vms_complete_munmap_vmas
0.18 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.mas_wr_walk_descend
0.19 ± 2% -0.0 0.18 perf-profile.self.cycles-pp.do_syscall_64
0.38 -0.0 0.37 perf-profile.self.cycles-pp.unmapped_area_topdown
0.17 -0.0 0.16 perf-profile.self.cycles-pp.__vma_start_write
0.13 -0.0 0.12 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.10 -0.0 0.09 perf-profile.self.cycles-pp.free_pgd_range
0.07 -0.0 0.06 perf-profile.self.cycles-pp.mas_prev
0.17 -0.0 0.16 perf-profile.self.cycles-pp.tlb_finish_mmu
0.12 -0.0 0.11 perf-profile.self.cycles-pp.security_vm_enough_memory_mm
0.06 -0.0 0.05 perf-profile.self.cycles-pp.mmap_region
0.43 +0.1 0.50 perf-profile.self.cycles-pp.kvfree_call_rcu
0.29 +0.1 0.41 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
65.33 +2.5 67.80 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-11 14:45 [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression kernel test robot
@ 2026-05-14 14:45 ` Vlastimil Babka (SUSE)
  2026-05-14 16:00 ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 14:45 UTC (permalink / raw)
To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/11/26 16:45, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:

Yay for an optimization that was supposed to have no tradeoffs :)

> commit: 298cdbf5f7c9e19289f46710ed5ab3da4e711150 ("mm, slab: add an optimistic __slab_try_return_freelist()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[...]

>   25073981          +109.6%   52554060        proc-vmstat.pgalloc_normal
>   23597828          +116.6%   51111380        proc-vmstat.pgfree

Perhaps the weirdest part: the commit shouldn't be affecting page
allocations at all.

[...]

>      12.77          -12.8        0.00        perf-profile.calltrace.cycles-pp.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
>      12.68          -12.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main
>      12.60          -12.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects

The zeroes here suggest the patch is working as expected: we're avoiding
__slab_free() entirely in this path because __slab_try_return_freelist()
succeeds reliably.

[...]
perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap > 1.52 ± 5% +0.3 1.84 ± 2% perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd > 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork > 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread > 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn > 2.17 ± 5% +0.4 2.60 ± 2% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.calltrace.cycles-pp.ret_from_fork_asm > 0.00 +0.6 0.60 ± 8% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate > 0.00 +0.6 0.62 ± 5% perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp > 12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof > 12.83 +1.0 13.82 perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node > 12.88 +1.0 13.87 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main > 12.81 +1.0 13.81 
perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
> 12.82 +1.0 13.82 perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore
> 12.87 +1.0 13.87 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects
> 12.81 +1.0 13.82 perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 10.64 +1.3 11.92 perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
> 11.72 +1.6 13.30 perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
> 11.63 +1.6 13.23 perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
> 11.33 +1.6 12.94 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch
> 11.20 +1.6 12.82 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf
> 22.90 +14.2 37.13 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main
> 22.99 +14.3 37.26 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof

Taking the spin_lock isn't avoided (and isn't supposed to be); we're just
not taking it in __slab_free() but in __refill_objects_node(). It should
be the same frequency and the same amount of work under the lock (maybe
less, as there's no double cmpxchg under the spin lock); we just avoid
walking the freelist (outside of any lock). So it should be the same or
better.
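To make that argument concrete, here is a toy sketch (illustrative counters only, not kernel code; the function names are invented stand-ins) of the two shapes being compared: both take the list lock once per returned slab, and only the old path pays a per-object freelist walk beforehand.

```c
/* Counters standing in for lock acquisitions and per-object work. */
static int old_lock_taken, new_lock_taken, old_objects_walked;

/* Old shape: __slab_free() rebuilds the slab freelist object by object
 * (outside any lock), then takes n->list_lock once to move the slab. */
static void old_path_return_slab(int nr_objects)
{
	for (int i = 0; i < nr_objects; i++)
		old_objects_walked++;	/* freelist walk, unlocked */
	old_lock_taken++;		/* _raw_spin_lock_irqsave() */
}

/* New shape: __slab_try_return_freelist() splices the whole chain at
 * once; the single lock round-trip moves to __refill_objects_node(). */
static void new_path_return_slab(int nr_objects)
{
	(void)nr_objects;		/* no per-object walk needed */
	new_lock_taken++;		/* same lock, taken at the same rate */
}
```

If the lock is taken equally often in both shapes, the extra spinning in the profile can't come from lock frequency, which is what motivates the code-layout question below.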
> 27.87 -10.5 17.33 perf-profile.children.cycles-pp.__slab_free > 7.27 -0.8 6.48 perf-profile.children.cycles-pp.barn_get_empty_sheaf > 5.91 -0.6 5.32 perf-profile.children.cycles-pp.barn_replace_empty_sheaf > 10.36 -0.5 9.82 perf-profile.children.cycles-pp.mas_wr_node_store > 7.58 -0.4 7.18 perf-profile.children.cycles-pp.vms_complete_munmap_vmas > 3.98 -0.4 3.61 perf-profile.children.cycles-pp.barn_put_full_sheaf > 4.70 -0.4 4.34 perf-profile.children.cycles-pp.__kfree_rcu_sheaf > 5.91 -0.3 5.60 perf-profile.children.cycles-pp.unmap_region > 5.19 -0.3 4.90 perf-profile.children.cycles-pp.kvfree_call_rcu > 94.66 -0.3 94.41 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > 94.52 -0.2 94.27 perf-profile.children.cycles-pp.do_syscall_64 > 5.57 ± 2% -0.2 5.37 perf-profile.children.cycles-pp.mas_store_prealloc > 2.06 -0.2 1.89 perf-profile.children.cycles-pp.__get_unmapped_area > 2.96 -0.2 2.80 perf-profile.children.cycles-pp.__pi_memcpy > 2.55 -0.2 2.40 perf-profile.children.cycles-pp.free_pgtables > 2.25 -0.1 2.12 perf-profile.children.cycles-pp.vms_gather_munmap_vmas > 2.26 -0.1 2.13 perf-profile.children.cycles-pp.free_pgd_range > 2.58 -0.1 2.44 perf-profile.children.cycles-pp.zap_pmd_range > 2.90 -0.1 2.77 perf-profile.children.cycles-pp.unmap_vmas > 2.14 -0.1 2.02 perf-profile.children.cycles-pp.free_p4d_range > 2.75 -0.1 2.63 perf-profile.children.cycles-pp.__zap_vma_range > 2.01 -0.1 1.89 perf-profile.children.cycles-pp.free_pud_range > 1.94 -0.1 1.82 perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags > 1.69 -0.1 1.59 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown > 1.26 -0.1 1.18 perf-profile.children.cycles-pp.entry_SYSCALL_64 > 1.42 -0.1 1.34 perf-profile.children.cycles-pp.mas_find > 1.38 -0.1 1.30 perf-profile.children.cycles-pp.vm_unmapped_area > 1.36 -0.1 1.28 perf-profile.children.cycles-pp.unmapped_area_topdown > 0.98 -0.1 0.92 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > 1.58 -0.1 1.52 
perf-profile.children.cycles-pp.__mmap_complete > 1.46 -0.1 1.40 perf-profile.children.cycles-pp.perf_event_mmap > 0.64 -0.0 0.60 perf-profile.children.cycles-pp.mas_prev_slot > 0.54 -0.0 0.50 perf-profile.children.cycles-pp.mas_wr_store_type > 1.08 -0.0 1.04 perf-profile.children.cycles-pp.perf_event_mmap_event > 0.63 -0.0 0.59 perf-profile.children.cycles-pp.mas_empty_area_rev > 0.54 -0.0 0.50 perf-profile.children.cycles-pp.__vma_start_write > 0.54 -0.0 0.52 perf-profile.children.cycles-pp.mas_next_slot > 0.43 -0.0 0.41 perf-profile.children.cycles-pp.mas_rev_awalk > 0.53 -0.0 0.51 perf-profile.children.cycles-pp.mas_walk > 0.64 -0.0 0.62 ± 2% perf-profile.children.cycles-pp.kmem_cache_free > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.__vma_start_exclude_readers > 0.29 -0.0 0.27 perf-profile.children.cycles-pp.__rcu_free_sheaf_prepare > 0.44 -0.0 0.42 perf-profile.children.cycles-pp.vma_merge_new_range > 0.65 -0.0 0.63 perf-profile.children.cycles-pp.perf_iterate_sb > 0.24 -0.0 0.22 perf-profile.children.cycles-pp.security_vm_enough_memory_mm > 0.07 ± 5% -0.0 0.05 perf-profile.children.cycles-pp.mmap_region > 0.14 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.mas_prev > 0.30 -0.0 0.28 perf-profile.children.cycles-pp.up_read > 0.30 -0.0 0.29 perf-profile.children.cycles-pp.vma_set_page_prot > 0.07 -0.0 0.06 ± 9% perf-profile.children.cycles-pp.unlink_file_vma_batch_add > 0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.__alloc_empty_sheaf > 0.08 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.remove_vma > 0.24 -0.0 0.22 perf-profile.children.cycles-pp.downgrade_write > 0.20 ± 3% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.mas_wr_walk_descend > 0.28 -0.0 0.27 perf-profile.children.cycles-pp.down_write_killable > 0.18 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.vma_wants_writenotify > 0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__kmalloc_noprof > 0.20 -0.0 0.19 perf-profile.children.cycles-pp.tlb_gather_mmu > 0.07 -0.0 0.06 
perf-profile.children.cycles-pp.__call_rcu_common > 0.16 -0.0 0.15 perf-profile.children.cycles-pp.may_expand_vm > 0.14 -0.0 0.13 perf-profile.children.cycles-pp.up_write > 0.12 -0.0 0.11 perf-profile.children.cycles-pp.hrtimer_interrupt > 0.15 -0.0 0.14 perf-profile.children.cycles-pp.vma_is_shared_writable > 0.11 -0.0 0.10 perf-profile.children.cycles-pp.x64_sys_call > 0.88 +0.1 0.99 perf-profile.children.cycles-pp.vm_area_alloc > 0.36 +0.1 0.50 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook > 2.16 ± 5% +0.4 2.59 ± 2% perf-profile.children.cycles-pp.run_ksoftirqd > 2.17 ± 5% +0.4 2.60 ± 2% perf-profile.children.cycles-pp.smpboot_thread_fn > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.kthread > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork > 2.17 ± 5% +0.4 2.61 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm > 0.69 ± 5% +0.5 1.22 ± 4% perf-profile.children.cycles-pp.alloc_from_new_slab > 15.26 +1.2 16.44 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore > 18.16 +1.3 19.49 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt > 18.14 +1.3 19.47 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt > 18.02 +1.3 19.36 perf-profile.children.cycles-pp.__irq_exit_rcu > 60.59 +1.5 62.05 perf-profile.children.cycles-pp.__pcs_replace_empty_main > 61.51 +1.6 63.10 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof > 20.16 +1.8 21.94 perf-profile.children.cycles-pp.rcu_core > 20.18 +1.8 21.95 perf-profile.children.cycles-pp.handle_softirqs > 20.14 +1.8 21.92 perf-profile.children.cycles-pp.rcu_do_batch > 50.97 +2.0 52.95 perf-profile.children.cycles-pp.__refill_objects_node > 15.07 +2.2 17.25 perf-profile.children.cycles-pp.__kmem_cache_free_bulk > 15.67 +2.2 17.86 perf-profile.children.cycles-pp.rcu_free_sheaf > 65.82 +2.4 68.25 perf-profile.children.cycles-pp._raw_spin_lock_irqsave > 65.33 +2.5 67.80 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath > 51.67 +2.5 54.18 
perf-profile.children.cycles-pp.refill_objects > 2.16 -0.5 1.66 perf-profile.self.cycles-pp.__refill_objects_node > 2.64 -0.2 2.48 perf-profile.self.cycles-pp.__pi_memcpy > 2.36 -0.1 2.22 perf-profile.self.cycles-pp.zap_pmd_range > 1.84 -0.1 1.71 perf-profile.self.cycles-pp.free_pud_range > 1.78 -0.1 1.66 perf-profile.self.cycles-pp.__mmap_region > 0.52 -0.1 0.40 perf-profile.self.cycles-pp.__slab_free > 1.44 -0.1 1.34 perf-profile.self.cycles-pp.mas_wr_node_store > 0.98 -0.1 0.92 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > 0.84 -0.1 0.78 perf-profile.self.cycles-pp.mas_store_gfp > 0.55 -0.0 0.50 perf-profile.self.cycles-pp.do_vmi_align_munmap > 0.58 -0.0 0.54 perf-profile.self.cycles-pp.mas_prev_slot > 0.62 -0.0 0.58 perf-profile.self.cycles-pp.entry_SYSCALL_64 > 0.17 -0.0 0.13 perf-profile.self.cycles-pp.do_mmap > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.__mmap > 0.49 -0.0 0.45 perf-profile.self.cycles-pp._raw_spin_lock_irqsave > 0.35 -0.0 0.32 perf-profile.self.cycles-pp.vms_gather_munmap_vmas > 0.18 -0.0 0.15 ± 2% perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags > 0.42 -0.0 0.39 perf-profile.self.cycles-pp.__munmap > 0.49 -0.0 0.46 perf-profile.self.cycles-pp.mas_next_slot > 0.48 -0.0 0.45 perf-profile.self.cycles-pp.perf_iterate_sb > 0.10 -0.0 0.07 ± 5% perf-profile.self.cycles-pp.__get_unmapped_area > 0.40 -0.0 0.37 perf-profile.self.cycles-pp.mas_rev_awalk > 0.48 -0.0 0.45 perf-profile.self.cycles-pp.mas_walk > 0.31 -0.0 0.28 perf-profile.self.cycles-pp.kmem_cache_free > 0.37 -0.0 0.34 perf-profile.self.cycles-pp.__vm_munmap > 0.30 -0.0 0.28 perf-profile.self.cycles-pp.__vma_start_exclude_readers > 0.31 -0.0 0.29 perf-profile.self.cycles-pp.mas_wr_store_type > 0.36 -0.0 0.34 perf-profile.self.cycles-pp.perf_event_mmap > 0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_preallocate > 0.68 -0.0 0.66 perf-profile.self.cycles-pp.mas_update_gap > 0.32 -0.0 0.30 perf-profile.self.cycles-pp.vma_merge_new_range > 0.29 -0.0 0.27 
perf-profile.self.cycles-pp.__rcu_free_sheaf_prepare
> 0.37 -0.0 0.35 perf-profile.self.cycles-pp.mas_find
> 0.18 -0.0 0.16 perf-profile.self.cycles-pp.rcu_free_sheaf
> 0.33 -0.0 0.31 perf-profile.self.cycles-pp.vm_area_alloc
> 0.07 -0.0 0.05 perf-profile.self.cycles-pp.unlink_file_vma_batch_add
> 0.38 -0.0 0.36 perf-profile.self.cycles-pp.mas_store_prealloc
> 0.26 -0.0 0.24 perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
> 0.25 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.up_read
> 0.26 -0.0 0.24 perf-profile.self.cycles-pp.down_write_killable
> 0.22 ± 2% -0.0 0.20 perf-profile.self.cycles-pp.downgrade_write
> 0.26 -0.0 0.25 perf-profile.self.cycles-pp.__kfree_rcu_sheaf
> 0.20 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.vms_complete_munmap_vmas
> 0.18 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.mas_wr_walk_descend
> 0.19 ± 2% -0.0 0.18 perf-profile.self.cycles-pp.do_syscall_64
> 0.38 -0.0 0.37 perf-profile.self.cycles-pp.unmapped_area_topdown
> 0.17 -0.0 0.16 perf-profile.self.cycles-pp.__vma_start_write
> 0.13 -0.0 0.12 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.10 -0.0 0.09 perf-profile.self.cycles-pp.free_pgd_range
> 0.07 -0.0 0.06 perf-profile.self.cycles-pp.mas_prev
> 0.17 -0.0 0.16 perf-profile.self.cycles-pp.tlb_finish_mmu
> 0.12 -0.0 0.11 perf-profile.self.cycles-pp.security_vm_enough_memory_mm
> 0.06 -0.0 0.05 perf-profile.self.cycles-pp.mmap_region
> 0.43 +0.1 0.50 perf-profile.self.cycles-pp.kvfree_call_rcu
> 0.29 +0.1 0.41 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> 65.33 +2.5 67.80 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

And yet it seems we spend more time spinning as a result. I can't explain
that by what the patch does, so could it just be some code cache layout
effect?

> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only.
> Any difference in system hardware or software design or configuration may affect
> actual performance.
* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-14 14:45 ` Vlastimil Babka (SUSE)
@ 2026-05-14 16:00 ` Vlastimil Babka (SUSE)
  2026-05-14 16:02 ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 16:00 UTC (permalink / raw)
To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/14/26 16:45, Vlastimil Babka (SUSE) wrote:
> On 5/11/26 16:45, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
>
> Yay for an optimization that was supposed to have no tradeoffs :)

Does this help? I don't expect much, but perhaps...

- list_empty(&pc.slabs) is no longer unlikely, since the list can now
  likely contain a slab where we returned part of the freelist
- let's ignore s->min_partial when returning slabs: we just pulled them
  from the list, so it's unlikely there are too many free slabs, and
  unlikely we have free slabs to return
Maybe it will reduce the page allocs/frees, and the code is simpler.

From 761946747d53b880855ac1e795ae0627be416c3e Mon Sep 17 00:00:00 2001
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Date: Thu, 14 May 2026 17:55:31 +0200
Subject: [PATCH] mm, slab: simplify __refill_objects_node

---
 mm/slub.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5d867349912b..0cc6c88f11e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7187,26 +7187,16 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 		break;
 	}
 
-	if (unlikely(!list_empty(&pc.slabs))) {
+	if (!list_empty(&pc.slabs)) {
 		spin_lock_irqsave(&n->list_lock, flags);
 
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
-			if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial))
-				continue;
-
 			list_del(&slab->slab_list);
 			add_partial(n, slab, ADD_TO_HEAD);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
-
-		/* any slabs left are completely free and for discard */
-		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
-
-			list_del(&slab->slab_list);
-			discard_slab(s, slab);
-		}
 	}
 
 	return refilled;
-- 
2.54.0
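For readers unfamiliar with the list helpers the patch leans on, here is a minimal userspace sketch of the simplified loop: every slab on a local list is unlinked and re-added to a partial list in one pass, using the `_safe` iteration pattern that tolerates deletion of the current entry. The helpers below are hand-rolled stand-ins for `<linux/list.h>`, not the kernel code.

```c
/* Circular doubly linked list, kernel-style: an empty list points at
 * itself, so iteration needs no NULL checks. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void list_add_head(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static void list_del_item(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

static int list_is_empty(const struct list_head *h) { return h->next == h; }

struct toy_slab { int id; struct list_head slab_list; };

/* The _safe pattern: cache the next pointer before unlinking, so
 * list_del_item() on the current entry doesn't break the walk. */
static int return_all_to_partial(struct list_head *slabs,
				 struct list_head *partial)
{
	struct list_head *pos = slabs->next, *n;
	int moved = 0;

	for (; pos != slabs; pos = n) {
		n = pos->next;
		list_del_item(pos);
		list_add_head(pos, partial);	/* ADD_TO_HEAD equivalent */
		moved++;
	}
	return moved;
}
```

With the min_partial check dropped, the loop has exactly this shape: no entry is ever skipped, so the local list is guaranteed empty afterwards and the old discard pass becomes dead code.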
* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-14 16:00 ` Vlastimil Babka (SUSE)
@ 2026-05-14 16:02 ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 16:02 UTC (permalink / raw)
To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/14/26 18:00, Vlastimil Babka (SUSE) wrote:
> On 5/14/26 16:45, Vlastimil Babka (SUSE) wrote:
>> On 5/11/26 16:45, kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
>>
>> Yay for an optimization that was supposed to have no tradeoffs :)
>
> Does this help? I don't expect much, but perhaps...

And a separate measurement with this on top of the previous one, please?
It's just that __slab_free() would have been adding to the tail, so let's
try that too.

From 8fba1377797478a945d97ad6163021d95ac7665c Mon Sep 17 00:00:00 2001
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Date: Thu, 14 May 2026 18:00:52 +0200
Subject: [PATCH] mm, slab: ADD_TO_TAIL in __refill_objects_node

---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0cc6c88f11e3..35e574e94538 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7193,7 +7193,7 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
 			list_del(&slab->slab_list);
-			add_partial(n, slab, ADD_TO_HEAD);
+			add_partial(n, slab, ADD_TO_TAIL);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
-- 
2.54.0
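A tiny sketch of why the flag could matter, assuming allocations refill from the head of the partial list: ADD_TO_HEAD reuses the just-returned slab first (LIFO, cache-hot), while ADD_TO_TAIL drains older partial slabs first (FIFO), matching the order a per-object __slab_free() would have produced. The array-based list below is purely illustrative, not the kernel's list implementation.

```c
/* Toy partial list as a small array; index 0 is the list head. */
#define CAP 8
struct toy_list { int slab_id[CAP]; int n; };

static void add_to_head(struct toy_list *l, int id)
{
	for (int i = l->n; i > 0; i--)		/* shift everything down */
		l->slab_id[i] = l->slab_id[i - 1];
	l->slab_id[0] = id;
	l->n++;
}

static void add_to_tail(struct toy_list *l, int id)
{
	l->slab_id[l->n++] = id;
}

/* Refill is assumed to always pick the slab at the head of the list. */
static int take_from_head(struct toy_list *l)
{
	int id = l->slab_id[0];

	for (int i = 1; i < l->n; i++)
		l->slab_id[i - 1] = l->slab_id[i];
	l->n--;
	return id;
}
```

The two insertion flags thus change which slab the next refill sees, which can shift cache behavior even though the total work is identical.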