Linux-mm Archive on lore.kernel.org
* [linux-next:master] [mm, slab]  298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
@ 2026-05-11 14:45 kernel test robot
  2026-05-14 14:45 ` Vlastimil Babka (SUSE)
From: kernel test robot @ 2026-05-11 14:45 UTC
  To: Vlastimil Babka; +Cc: oe-lkp, lkp, Hao Li, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:


commit: 298cdbf5f7c9e19289f46710ed5ab3da4e711150 ("mm, slab: add an optimistic __slab_try_return_freelist()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[still regression on linux-next/master 4cd074ae20bbcc293bbbce9163abe99d68ae6ae0]

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_task: 100%
	mode: process
	test: mmap1
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202605112204.9382cecf-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260511/202605112204.9382cecf-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/mmap1/will-it-scale

commit: 
  1f7c8e1d52 ("mm/slub: defer freelist construction until after bulk allocation from a new slab")
  298cdbf5f7 ("mm, slab: add an optimistic __slab_try_return_freelist()")

1f7c8e1d52428cbf 298cdbf5f7c9e19289f46710ed5 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  35417967            -6.3%   33202379        will-it-scale.192.processes
    184468            -6.3%     172928        will-it-scale.per_process_ops
  35417967            -6.3%   33202379        will-it-scale.workload
     21873 ±  2%      +8.0%      23628        vmstat.system.cs
      3957 ± 19%     -64.8%       1392 ± 13%  perf-c2c.DRAM.local
      1016 ± 20%     -46.9%     540.33 ± 34%  perf-c2c.DRAM.remote
      0.71            -5.6%       0.67        turbostat.IPC
    430.37            -1.2%     425.33        turbostat.PkgWatt
      0.13            -0.0        0.12        mpstat.cpu.all.irq%
     18.38            +1.4       19.79        mpstat.cpu.all.soft%
      1.64            -0.1        1.54        mpstat.cpu.all.usr%
      7.06 ± 14%     +26.0%       8.89 ± 12%  sched_debug.cfs_rq:/.load_avg.min
     18788 ±  2%      +7.2%      20147        sched_debug.cpu.nr_switches.avg
     16273 ±  3%      +7.6%      17503        sched_debug.cpu.nr_switches.min
   3418653           +97.9%    6765264        numa-numastat.node0.local_node
   3479092           +96.6%    6839697        numa-numastat.node0.numa_hit
   4424809 ±  2%     +79.1%    7922798 ±  2%  numa-numastat.node1.local_node
   4564748 ±  2%     +76.3%    8048580 ±  2%  numa-numastat.node1.numa_hit
   3478940           +96.6%    6839185        numa-vmstat.node0.numa_hit
   3418501           +97.9%    6764752        numa-vmstat.node0.numa_local
   4565277 ±  2%     +76.3%    8048347 ±  2%  numa-vmstat.node1.numa_hit
   4425338 ±  2%     +79.0%    7922564 ±  2%  numa-vmstat.node1.numa_local
    189874            -2.4%     185345        proc-vmstat.nr_slab_unreclaimable
   8051058           +85.0%   14892463        proc-vmstat.numa_hit
   7850680           +87.1%   14692248        proc-vmstat.numa_local
  25073981          +109.6%   52554060        proc-vmstat.pgalloc_normal
  23597828          +116.6%   51111380        proc-vmstat.pgfree
      0.20            +8.9%       0.22 ±  2%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.20            +8.9%       0.22 ±  2%  perf-sched.total_sch_delay.average.ms
     26.54 ±  2%     -12.1%      23.32 ±  2%  perf-sched.total_wait_and_delay.average.ms
    108336 ±  3%      +9.0%     118099        perf-sched.total_wait_and_delay.count.ms
     26.34 ±  2%     -12.3%      23.11 ±  2%  perf-sched.total_wait_time.average.ms
     26.54 ±  2%     -12.1%      23.32 ±  2%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
    108336 ±  3%      +9.0%     118099        perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
     26.34 ±  2%     -12.3%      23.11 ±  2%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
 9.213e+10            -5.7%  8.687e+10        perf-stat.i.branch-instructions
 1.098e+08            -6.7%  1.025e+08        perf-stat.i.branch-misses
     14.49 ±  2%      -1.4       13.11 ±  4%  perf-stat.i.cache-miss-rate%
 1.059e+08 ±  2%     -12.9%   92245846 ±  5%  perf-stat.i.cache-misses
 7.389e+08            -3.6%  7.123e+08        perf-stat.i.cache-references
     21981 ±  2%      +7.8%      23693        perf-stat.i.context-switches
      1.40            +6.2%       1.49        perf-stat.i.cpi
    256.41            +3.0%     263.99        perf-stat.i.cpu-migrations
      5815 ±  2%     +15.3%       6707 ±  4%  perf-stat.i.cycles-between-cache-misses
 4.341e+11            -5.8%  4.089e+11        perf-stat.i.instructions
      0.71            -5.8%       0.67        perf-stat.i.ipc
     14.34 ±  2%      -1.4       12.97 ±  4%  perf-stat.overall.cache-miss-rate%
      1.41            +6.2%       1.49        perf-stat.overall.cpi
      5764 ±  2%     +15.0%       6626 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.71            -5.9%       0.67        perf-stat.overall.ipc
 9.183e+10            -5.7%  8.659e+10        perf-stat.ps.branch-instructions
 1.095e+08            -6.7%  1.021e+08        perf-stat.ps.branch-misses
 1.056e+08 ±  2%     -12.8%   92081686 ±  5%  perf-stat.ps.cache-misses
 7.367e+08            -3.6%  7.102e+08        perf-stat.ps.cache-references
     21840 ±  2%      +8.1%      23604        perf-stat.ps.context-switches
    253.30            +3.0%     260.93        perf-stat.ps.cpu-migrations
 4.327e+11            -5.8%  4.076e+11        perf-stat.ps.instructions
 1.312e+14            -5.9%  1.235e+14        perf-stat.total.instructions
     12.77           -12.8        0.00        perf-profile.calltrace.cycles-pp.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
     12.68           -12.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main
     12.60           -12.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects
      4.81            -0.5        4.28        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_replace_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
      2.52            -0.4        2.09        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
      7.46            -0.4        7.06        perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      5.21 ±  2%      -0.4        4.86        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      3.50            -0.3        3.19        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store
      3.47            -0.3        3.16        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu
      5.90            -0.3        5.60        perf-profile.calltrace.cycles-pp.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.96            -0.3        1.70        perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
      2.43 ±  4%      -0.3        2.17 ±  4%  perf-profile.calltrace.cycles-pp.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap
      2.25 ±  4%      -0.2        2.00 ±  4%  perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp
      1.27 ± 10%      -0.2        1.03 ±  6%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
      2.66 ±  4%      -0.2        2.44 ±  3%  perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
      5.57 ±  2%      -0.2        5.37        perf-profile.calltrace.cycles-pp.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
      1.46 ±  8%      -0.2        1.27 ±  4%  perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp.do_vmi_align_munmap
      1.27 ±  8%      -0.2        1.08 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
      5.06 ±  2%      -0.2        4.88        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
      2.48            -0.2        2.32        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs
      2.51            -0.2        2.36        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
      2.00            -0.1        1.86        perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.24            -0.1        2.11        perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      2.89            -0.1        2.76        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
      2.26            -0.1        2.12        perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
      2.56            -0.1        2.43        perf-profile.calltrace.cycles-pp.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas
      2.13            -0.1        2.01        perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas
      2.00            -0.1        1.88        perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
      2.74            -0.1        2.62        perf-profile.calltrace.cycles-pp.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
      1.94            -0.1        1.82        perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64
      1.67            -0.1        1.58        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff
      1.44            -0.1        1.36        perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
      1.50            -0.1        1.42        perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region
      1.35            -0.1        1.28        perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area
      1.38            -0.1        1.30        perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap
      1.58            -0.1        1.52        perf-profile.calltrace.cycles-pp.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      1.44            -0.1        1.39        perf-profile.calltrace.cycles-pp.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff
      0.62            -0.0        0.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__munmap
      0.61            -0.0        0.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__mmap
      1.06            -0.0        1.03        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap
      0.62            -0.0        0.59        perf-profile.calltrace.cycles-pp.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags
      0.64            -0.0        0.62 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
      0.88            +0.1        0.98        perf-profile.calltrace.cycles-pp.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
      0.55 ±  5%      +0.1        0.66 ±  2%  perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      0.51            +0.1        0.64        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap
      1.52 ±  5%      +0.3        1.84 ±  2%  perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
      2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
      2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
      2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
      2.17 ±  5%      +0.4        2.60 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.00            +0.6        0.60 ±  8%  perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
      0.00            +0.6        0.62 ±  5%  perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
     12.88            +1.0       13.87        perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
     12.83            +1.0       13.82        perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node
     12.88            +1.0       13.87        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main
     12.81            +1.0       13.81        perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
     12.82            +1.0       13.82        perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore
     12.87            +1.0       13.87        perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects
     12.81            +1.0       13.82        perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
     10.64            +1.3       11.92        perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
     11.72            +1.6       13.30        perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
     11.63            +1.6       13.23        perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
     11.33            +1.6       12.94        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch
     11.20            +1.6       12.82        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf
     22.90           +14.2       37.13        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main
     22.99           +14.3       37.26        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
     27.87           -10.5       17.33        perf-profile.children.cycles-pp.__slab_free
      7.27            -0.8        6.48        perf-profile.children.cycles-pp.barn_get_empty_sheaf
      5.91            -0.6        5.32        perf-profile.children.cycles-pp.barn_replace_empty_sheaf
     10.36            -0.5        9.82        perf-profile.children.cycles-pp.mas_wr_node_store
      7.58            -0.4        7.18        perf-profile.children.cycles-pp.vms_complete_munmap_vmas
      3.98            -0.4        3.61        perf-profile.children.cycles-pp.barn_put_full_sheaf
      4.70            -0.4        4.34        perf-profile.children.cycles-pp.__kfree_rcu_sheaf
      5.91            -0.3        5.60        perf-profile.children.cycles-pp.unmap_region
      5.19            -0.3        4.90        perf-profile.children.cycles-pp.kvfree_call_rcu
     94.66            -0.3       94.41        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     94.52            -0.2       94.27        perf-profile.children.cycles-pp.do_syscall_64
      5.57 ±  2%      -0.2        5.37        perf-profile.children.cycles-pp.mas_store_prealloc
      2.06            -0.2        1.89        perf-profile.children.cycles-pp.__get_unmapped_area
      2.96            -0.2        2.80        perf-profile.children.cycles-pp.__pi_memcpy
      2.55            -0.2        2.40        perf-profile.children.cycles-pp.free_pgtables
      2.25            -0.1        2.12        perf-profile.children.cycles-pp.vms_gather_munmap_vmas
      2.26            -0.1        2.13        perf-profile.children.cycles-pp.free_pgd_range
      2.58            -0.1        2.44        perf-profile.children.cycles-pp.zap_pmd_range
      2.90            -0.1        2.77        perf-profile.children.cycles-pp.unmap_vmas
      2.14            -0.1        2.02        perf-profile.children.cycles-pp.free_p4d_range
      2.75            -0.1        2.63        perf-profile.children.cycles-pp.__zap_vma_range
      2.01            -0.1        1.89        perf-profile.children.cycles-pp.free_pud_range
      1.94            -0.1        1.82        perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      1.69            -0.1        1.59        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      1.26            -0.1        1.18        perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.42            -0.1        1.34        perf-profile.children.cycles-pp.mas_find
      1.38            -0.1        1.30        perf-profile.children.cycles-pp.vm_unmapped_area
      1.36            -0.1        1.28        perf-profile.children.cycles-pp.unmapped_area_topdown
      0.98            -0.1        0.92        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.58            -0.1        1.52        perf-profile.children.cycles-pp.__mmap_complete
      1.46            -0.1        1.40        perf-profile.children.cycles-pp.perf_event_mmap
      0.64            -0.0        0.60        perf-profile.children.cycles-pp.mas_prev_slot
      0.54            -0.0        0.50        perf-profile.children.cycles-pp.mas_wr_store_type
      1.08            -0.0        1.04        perf-profile.children.cycles-pp.perf_event_mmap_event
      0.63            -0.0        0.59        perf-profile.children.cycles-pp.mas_empty_area_rev
      0.54            -0.0        0.50        perf-profile.children.cycles-pp.__vma_start_write
      0.54            -0.0        0.52        perf-profile.children.cycles-pp.mas_next_slot
      0.43            -0.0        0.41        perf-profile.children.cycles-pp.mas_rev_awalk
      0.53            -0.0        0.51        perf-profile.children.cycles-pp.mas_walk
      0.64            -0.0        0.62 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.36            -0.0        0.33        perf-profile.children.cycles-pp.__vma_start_exclude_readers
      0.29            -0.0        0.27        perf-profile.children.cycles-pp.__rcu_free_sheaf_prepare
      0.44            -0.0        0.42        perf-profile.children.cycles-pp.vma_merge_new_range
      0.65            -0.0        0.63        perf-profile.children.cycles-pp.perf_iterate_sb
      0.24            -0.0        0.22        perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.07 ±  5%      -0.0        0.05        perf-profile.children.cycles-pp.mmap_region
      0.14            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.mas_prev
      0.30            -0.0        0.28        perf-profile.children.cycles-pp.up_read
      0.30            -0.0        0.29        perf-profile.children.cycles-pp.vma_set_page_prot
      0.07            -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.unlink_file_vma_batch_add
      0.08 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.__alloc_empty_sheaf
      0.08 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.remove_vma
      0.24            -0.0        0.22        perf-profile.children.cycles-pp.downgrade_write
      0.20 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.mas_wr_walk_descend
      0.28            -0.0        0.27        perf-profile.children.cycles-pp.down_write_killable
      0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.vma_wants_writenotify
      0.07 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.__kmalloc_noprof
      0.20            -0.0        0.19        perf-profile.children.cycles-pp.tlb_gather_mmu
      0.07            -0.0        0.06        perf-profile.children.cycles-pp.__call_rcu_common
      0.16            -0.0        0.15        perf-profile.children.cycles-pp.may_expand_vm
      0.14            -0.0        0.13        perf-profile.children.cycles-pp.up_write
      0.12            -0.0        0.11        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.vma_is_shared_writable
      0.11            -0.0        0.10        perf-profile.children.cycles-pp.x64_sys_call
      0.88            +0.1        0.99        perf-profile.children.cycles-pp.vm_area_alloc
      0.36            +0.1        0.50        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
      2.17 ±  5%      +0.4        2.60 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.kthread
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
      2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.69 ±  5%      +0.5        1.22 ±  4%  perf-profile.children.cycles-pp.alloc_from_new_slab
     15.26            +1.2       16.44        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
     18.16            +1.3       19.49        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     18.14            +1.3       19.47        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
     18.02            +1.3       19.36        perf-profile.children.cycles-pp.__irq_exit_rcu
     60.59            +1.5       62.05        perf-profile.children.cycles-pp.__pcs_replace_empty_main
     61.51            +1.6       63.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
     20.16            +1.8       21.94        perf-profile.children.cycles-pp.rcu_core
     20.18            +1.8       21.95        perf-profile.children.cycles-pp.handle_softirqs
     20.14            +1.8       21.92        perf-profile.children.cycles-pp.rcu_do_batch
     50.97            +2.0       52.95        perf-profile.children.cycles-pp.__refill_objects_node
     15.07            +2.2       17.25        perf-profile.children.cycles-pp.__kmem_cache_free_bulk
     15.67            +2.2       17.86        perf-profile.children.cycles-pp.rcu_free_sheaf
     65.82            +2.4       68.25        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     65.33            +2.5       67.80        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     51.67            +2.5       54.18        perf-profile.children.cycles-pp.refill_objects
      2.16            -0.5        1.66        perf-profile.self.cycles-pp.__refill_objects_node
      2.64            -0.2        2.48        perf-profile.self.cycles-pp.__pi_memcpy
      2.36            -0.1        2.22        perf-profile.self.cycles-pp.zap_pmd_range
      1.84            -0.1        1.71        perf-profile.self.cycles-pp.free_pud_range
      1.78            -0.1        1.66        perf-profile.self.cycles-pp.__mmap_region
      0.52            -0.1        0.40        perf-profile.self.cycles-pp.__slab_free
      1.44            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_node_store
      0.98            -0.1        0.92        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.84            -0.1        0.78        perf-profile.self.cycles-pp.mas_store_gfp
      0.55            -0.0        0.50        perf-profile.self.cycles-pp.do_vmi_align_munmap
      0.58            -0.0        0.54        perf-profile.self.cycles-pp.mas_prev_slot
      0.62            -0.0        0.58        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.17            -0.0        0.13        perf-profile.self.cycles-pp.do_mmap
      0.48            -0.0        0.44        perf-profile.self.cycles-pp.__mmap
      0.49            -0.0        0.45        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.35            -0.0        0.32        perf-profile.self.cycles-pp.vms_gather_munmap_vmas
      0.18            -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
      0.42            -0.0        0.39        perf-profile.self.cycles-pp.__munmap
      0.49            -0.0        0.46        perf-profile.self.cycles-pp.mas_next_slot
      0.48            -0.0        0.45        perf-profile.self.cycles-pp.perf_iterate_sb
      0.10            -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.__get_unmapped_area
      0.40            -0.0        0.37        perf-profile.self.cycles-pp.mas_rev_awalk
      0.48            -0.0        0.45        perf-profile.self.cycles-pp.mas_walk
      0.31            -0.0        0.28        perf-profile.self.cycles-pp.kmem_cache_free
      0.37            -0.0        0.34        perf-profile.self.cycles-pp.__vm_munmap
      0.30            -0.0        0.28        perf-profile.self.cycles-pp.__vma_start_exclude_readers
      0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_wr_store_type
      0.36            -0.0        0.34        perf-profile.self.cycles-pp.perf_event_mmap
      0.37            -0.0        0.35        perf-profile.self.cycles-pp.mas_preallocate
      0.68            -0.0        0.66        perf-profile.self.cycles-pp.mas_update_gap
      0.32            -0.0        0.30        perf-profile.self.cycles-pp.vma_merge_new_range
      0.29            -0.0        0.27        perf-profile.self.cycles-pp.__rcu_free_sheaf_prepare
      0.37            -0.0        0.35        perf-profile.self.cycles-pp.mas_find
      0.18            -0.0        0.16        perf-profile.self.cycles-pp.rcu_free_sheaf
      0.33            -0.0        0.31        perf-profile.self.cycles-pp.vm_area_alloc
      0.07            -0.0        0.05        perf-profile.self.cycles-pp.unlink_file_vma_batch_add
      0.38            -0.0        0.36        perf-profile.self.cycles-pp.mas_store_prealloc
      0.26            -0.0        0.24        perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
      0.25            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.up_read
      0.26            -0.0        0.24        perf-profile.self.cycles-pp.down_write_killable
      0.22 ±  2%      -0.0        0.20        perf-profile.self.cycles-pp.downgrade_write
      0.26            -0.0        0.25        perf-profile.self.cycles-pp.__kfree_rcu_sheaf
      0.20 ±  2%      -0.0        0.19        perf-profile.self.cycles-pp.vms_complete_munmap_vmas
      0.18 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.mas_wr_walk_descend
      0.19 ±  2%      -0.0        0.18        perf-profile.self.cycles-pp.do_syscall_64
      0.38            -0.0        0.37        perf-profile.self.cycles-pp.unmapped_area_topdown
      0.17            -0.0        0.16        perf-profile.self.cycles-pp.__vma_start_write
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.10            -0.0        0.09        perf-profile.self.cycles-pp.free_pgd_range
      0.07            -0.0        0.06        perf-profile.self.cycles-pp.mas_prev
      0.17            -0.0        0.16        perf-profile.self.cycles-pp.tlb_finish_mmu
      0.12            -0.0        0.11        perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.06            -0.0        0.05        perf-profile.self.cycles-pp.mmap_region
      0.43            +0.1        0.50        perf-profile.self.cycles-pp.kvfree_call_rcu
      0.29            +0.1        0.41        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
     65.33            +2.5       67.80        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
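
For readers checking the table by hand, here is a minimal sketch of how the middle column appears to be derived. It assumes that the "%change" entries are relative changes against the parent commit and that the perf-profile rows, which are already shares of cycles, report a plain difference in percentage points. The report does not state this explicitly, but the figures are consistent with it. The helper names pct_change and point_delta are illustrative only and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Difference between two percentage shares, in percentage points."""
    return new - base


# (parent commit value, tested commit value), copied from the table above.
rows = {
    "will-it-scale.per_process_ops": (184468, 172928),
    "proc-vmstat.pgalloc_normal": (25073981, 52554060),
}

for name, (base, new) in rows.items():
    print(f"{name}: {pct_change(base, new):+.1f}%")

# perf-profile.calltrace share of native_queued_spin_lock_slowpath reached
# via __refill_objects_node (assumed to be a percentage-point difference).
print(f"refill spinlock slowpath: {point_delta(22.90, 37.13):+.1f}")
```

With the values above this prints -6.3%, +109.6% and +14.2, matching the corresponding rows of the table.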




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-11 14:45 [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression kernel test robot
@ 2026-05-14 14:45 ` Vlastimil Babka (SUSE)
  2026-05-14 16:00   ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 14:45 UTC (permalink / raw)
  To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/11/26 16:45, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:

Yay for an optimization that was supposed to have no tradeoffs :)

> commit: 298cdbf5f7c9e19289f46710ed5ab3da4e711150 ("mm, slab: add an optimistic __slab_try_return_freelist()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [still regression on linux-next/master 4cd074ae20bbcc293bbbce9163abe99d68ae6ae0]
> 
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
> parameters:
> 
> 	nr_task: 100%
> 	mode: process
> 	test: mmap1
> 	cpufreq_governor: performance
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202605112204.9382cecf-lkp@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20260511/202605112204.9382cecf-lkp@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/mmap1/will-it-scale
> 
> commit: 
>   1f7c8e1d52 ("mm/slub: defer freelist construction until after bulk allocation from a new slab")
>   298cdbf5f7 ("mm, slab: add an optimistic __slab_try_return_freelist()")
> 
> 1f7c8e1d52428cbf 298cdbf5f7c9e19289f46710ed5 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>   35417967            -6.3%   33202379        will-it-scale.192.processes
>     184468            -6.3%     172928        will-it-scale.per_process_ops
>   35417967            -6.3%   33202379        will-it-scale.workload
>      21873 ±  2%      +8.0%      23628        vmstat.system.cs
>       3957 ± 19%     -64.8%       1392 ± 13%  perf-c2c.DRAM.local
>       1016 ± 20%     -46.9%     540.33 ± 34%  perf-c2c.DRAM.remote
>       0.71            -5.6%       0.67        turbostat.IPC
>     430.37            -1.2%     425.33        turbostat.PkgWatt
>       0.13            -0.0        0.12        mpstat.cpu.all.irq%
>      18.38            +1.4       19.79        mpstat.cpu.all.soft%
>       1.64            -0.1        1.54        mpstat.cpu.all.usr%
>       7.06 ± 14%     +26.0%       8.89 ± 12%  sched_debug.cfs_rq:/.load_avg.min
>      18788 ±  2%      +7.2%      20147        sched_debug.cpu.nr_switches.avg
>      16273 ±  3%      +7.6%      17503        sched_debug.cpu.nr_switches.min
>    3418653           +97.9%    6765264        numa-numastat.node0.local_node
>    3479092           +96.6%    6839697        numa-numastat.node0.numa_hit
>    4424809 ±  2%     +79.1%    7922798 ±  2%  numa-numastat.node1.local_node
>    4564748 ±  2%     +76.3%    8048580 ±  2%  numa-numastat.node1.numa_hit
>    3478940           +96.6%    6839185        numa-vmstat.node0.numa_hit
>    3418501           +97.9%    6764752        numa-vmstat.node0.numa_local
>    4565277 ±  2%     +76.3%    8048347 ±  2%  numa-vmstat.node1.numa_hit
>    4425338 ±  2%     +79.0%    7922564 ±  2%  numa-vmstat.node1.numa_local
>     189874            -2.4%     185345        proc-vmstat.nr_slab_unreclaimable
>    8051058           +85.0%   14892463        proc-vmstat.numa_hit
>    7850680           +87.1%   14692248        proc-vmstat.numa_local
>   25073981          +109.6%   52554060        proc-vmstat.pgalloc_normal
>   23597828          +116.6%   51111380        proc-vmstat.pgfree

Perhaps the weirdest part: the commit shouldn't be affecting page
allocations at all, yet pgalloc_normal and pgfree more than double.

>       0.20            +8.9%       0.22 ±  2%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>       0.20            +8.9%       0.22 ±  2%  perf-sched.total_sch_delay.average.ms
>      26.54 ±  2%     -12.1%      23.32 ±  2%  perf-sched.total_wait_and_delay.average.ms
>     108336 ±  3%      +9.0%     118099        perf-sched.total_wait_and_delay.count.ms
>      26.34 ±  2%     -12.3%      23.11 ±  2%  perf-sched.total_wait_time.average.ms
>      26.54 ±  2%     -12.1%      23.32 ±  2%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>     108336 ±  3%      +9.0%     118099        perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
>      26.34 ±  2%     -12.3%      23.11 ±  2%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>  9.213e+10            -5.7%  8.687e+10        perf-stat.i.branch-instructions
>  1.098e+08            -6.7%  1.025e+08        perf-stat.i.branch-misses
>      14.49 ±  2%      -1.4       13.11 ±  4%  perf-stat.i.cache-miss-rate%
>  1.059e+08 ±  2%     -12.9%   92245846 ±  5%  perf-stat.i.cache-misses
>  7.389e+08            -3.6%  7.123e+08        perf-stat.i.cache-references
>      21981 ±  2%      +7.8%      23693        perf-stat.i.context-switches
>       1.40            +6.2%       1.49        perf-stat.i.cpi
>     256.41            +3.0%     263.99        perf-stat.i.cpu-migrations
>       5815 ±  2%     +15.3%       6707 ±  4%  perf-stat.i.cycles-between-cache-misses
>  4.341e+11            -5.8%  4.089e+11        perf-stat.i.instructions
>       0.71            -5.8%       0.67        perf-stat.i.ipc
>      14.34 ±  2%      -1.4       12.97 ±  4%  perf-stat.overall.cache-miss-rate%
>       1.41            +6.2%       1.49        perf-stat.overall.cpi
>       5764 ±  2%     +15.0%       6626 ±  4%  perf-stat.overall.cycles-between-cache-misses
>       0.71            -5.9%       0.67        perf-stat.overall.ipc
>  9.183e+10            -5.7%  8.659e+10        perf-stat.ps.branch-instructions
>  1.095e+08            -6.7%  1.021e+08        perf-stat.ps.branch-misses
>  1.056e+08 ±  2%     -12.8%   92081686 ±  5%  perf-stat.ps.cache-misses
>  7.367e+08            -3.6%  7.102e+08        perf-stat.ps.cache-references
>      21840 ±  2%      +8.1%      23604        perf-stat.ps.context-switches
>     253.30            +3.0%     260.93        perf-stat.ps.cpu-migrations
>  4.327e+11            -5.8%  4.076e+11        perf-stat.ps.instructions
>  1.312e+14            -5.9%  1.235e+14        perf-stat.total.instructions
>      12.77           -12.8        0.00        perf-profile.calltrace.cycles-pp.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
>      12.68           -12.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects.__pcs_replace_empty_main
>      12.60           -12.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__refill_objects_node.refill_objects

The zeroes here suggest the patch is working as expected: we're avoiding
__slab_free() completely here because __slab_try_return_freelist() succeeds
reliably.

>       4.81            -0.5        4.28        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_replace_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
>       2.52            -0.4        2.09        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof
>       7.46            -0.4        7.06        perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       5.21 ±  2%      -0.4        4.86        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       3.50            -0.3        3.19        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store
>       3.47            -0.3        3.16        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu
>       5.90            -0.3        5.60        perf-profile.calltrace.cycles-pp.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       1.96            -0.3        1.70        perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
>       2.43 ±  4%      -0.3        2.17 ±  4%  perf-profile.calltrace.cycles-pp.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap
>       2.25 ±  4%      -0.2        2.00 ±  4%  perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__kfree_rcu_sheaf.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp
>       1.27 ± 10%      -0.2        1.03 ±  6%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
>       2.66 ±  4%      -0.2        2.44 ±  3%  perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       5.57 ±  2%      -0.2        5.37        perf-profile.calltrace.cycles-pp.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       1.46 ±  8%      -0.2        1.27 ±  4%  perf-profile.calltrace.cycles-pp.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp.do_vmi_align_munmap
>       1.27 ±  8%      -0.2        1.08 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_get_empty_sheaf.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
>       5.06 ±  2%      -0.2        4.88        perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
>       2.48            -0.2        2.32        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs
>       2.51            -0.2        2.36        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>       2.00            -0.1        1.86        perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.24            -0.1        2.11        perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       2.89            -0.1        2.76        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>       2.26            -0.1        2.12        perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
>       2.56            -0.1        2.43        perf-profile.calltrace.cycles-pp.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas
>       2.13            -0.1        2.01        perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas
>       2.00            -0.1        1.88        perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
>       2.74            -0.1        2.62        perf-profile.calltrace.cycles-pp.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
>       1.94            -0.1        1.82        perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff.do_syscall_64
>       1.67            -0.1        1.58        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap.vm_mmap_pgoff
>       1.44            -0.1        1.36        perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       1.50            -0.1        1.42        perf-profile.calltrace.cycles-pp.__pi_memcpy.mas_wr_node_store.mas_store_prealloc.__mmap_new_vma.__mmap_region
>       1.35            -0.1        1.28        perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area
>       1.38            -0.1        1.30        perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags.__get_unmapped_area.do_mmap
>       1.58            -0.1        1.52        perf-profile.calltrace.cycles-pp.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
>       1.44            -0.1        1.39        perf-profile.calltrace.cycles-pp.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff
>       0.62            -0.0        0.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__munmap
>       0.61            -0.0        0.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__mmap
>       1.06            -0.0        1.03        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap
>       0.62            -0.0        0.59        perf-profile.calltrace.cycles-pp.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.thp_get_unmapped_area_vmflags
>       0.64            -0.0        0.62 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.64            -0.0        0.62        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
>       0.88            +0.1        0.98        perf-profile.calltrace.cycles-pp.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       0.55 ±  5%      +0.1        0.66 ±  2%  perf-profile.calltrace.cycles-pp.barn_put_full_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       0.51            +0.1        0.64        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap
>       1.52 ±  5%      +0.3        1.84 ±  2%  perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>       2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
>       2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
>       2.17 ±  5%      +0.4        2.60 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       0.00            +0.6        0.60 ±  8%  perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_preallocate
>       0.00            +0.6        0.62 ±  5%  perf-profile.calltrace.cycles-pp.alloc_from_new_slab.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof.mas_store_gfp
>      12.88            +1.0       13.87        perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof
>      12.83            +1.0       13.82        perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node
>      12.88            +1.0       13.87        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects.__pcs_replace_empty_main
>      12.81            +1.0       13.81        perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
>      12.82            +1.0       13.82        perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore
>      12.87            +1.0       13.87        perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt._raw_spin_unlock_irqrestore.__refill_objects_node.refill_objects
>      12.81            +1.0       13.82        perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>      10.64            +1.3       11.92        perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu
>      11.72            +1.6       13.30        perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
>      11.63            +1.6       13.23        perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
>      11.33            +1.6       12.94        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch
>      11.20            +1.6       12.82        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf
>      22.90           +14.2       37.13        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main
>      22.99           +14.3       37.26        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__pcs_replace_empty_main.kmem_cache_alloc_noprof

Taking the spin_lock isn't avoided (and isn't supposed to be); we're just
not doing it from __slab_free() but from __refill_objects_node(). It should
happen at the same frequency, with the same amount of work under the lock
(maybe less, as there's no double cmpxchg under the spin lock); we just
avoid walking the freelist (outside of any lock). So it should be the same
or better.
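
To make the "avoid walking the freelist" point concrete, here is a minimal
user-space sketch in C11. It is only an assumption about the general shape
of the two return paths, not the code from commit 298cdbf5f7: struct
toy_slab, toy_return_chain() and toy_try_return_chain() are hypothetical
names, and the list_lock and the double cmpxchg are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct slab. */
struct object {
	struct object *next;
};

struct toy_slab {
	_Atomic(struct object *) freelist;
};

/*
 * Generic return path (assumed shape): splice a non-empty chain in front
 * of whatever is currently on the slab freelist. This needs the chain's
 * tail, hence the walk. Returns the number of objects walked.
 */
static size_t toy_return_chain(struct toy_slab *slab, struct object *head)
{
	struct object *tail = head;
	size_t walked = 1;

	while (tail->next) {
		tail = tail->next;
		walked++;
	}

	struct object *old = atomic_load(&slab->freelist);
	do {
		tail->next = old;
	} while (!atomic_compare_exchange_weak(&slab->freelist, &old, head));

	return walked;
}

/*
 * Optimistic path (assumed shape): bet that nobody freed into the slab in
 * the meantime, so its freelist is still empty. One compare-exchange and
 * no walk; on failure the caller falls back to toy_return_chain().
 */
static bool toy_try_return_chain(struct toy_slab *slab, struct object *head)
{
	struct object *expected = NULL;

	return atomic_compare_exchange_strong(&slab->freelist, &expected, head);
}
```

In this toy version the optimistic path succeeds only while the slab
freelist is still empty; any concurrent free makes it fail, and the caller
then pays for the walk in the fallback path.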

>      27.87           -10.5       17.33        perf-profile.children.cycles-pp.__slab_free
>       7.27            -0.8        6.48        perf-profile.children.cycles-pp.barn_get_empty_sheaf
>       5.91            -0.6        5.32        perf-profile.children.cycles-pp.barn_replace_empty_sheaf
>      10.36            -0.5        9.82        perf-profile.children.cycles-pp.mas_wr_node_store
>       7.58            -0.4        7.18        perf-profile.children.cycles-pp.vms_complete_munmap_vmas
>       3.98            -0.4        3.61        perf-profile.children.cycles-pp.barn_put_full_sheaf
>       4.70            -0.4        4.34        perf-profile.children.cycles-pp.__kfree_rcu_sheaf
>       5.91            -0.3        5.60        perf-profile.children.cycles-pp.unmap_region
>       5.19            -0.3        4.90        perf-profile.children.cycles-pp.kvfree_call_rcu
>      94.66            -0.3       94.41        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      94.52            -0.2       94.27        perf-profile.children.cycles-pp.do_syscall_64
>       5.57 ±  2%      -0.2        5.37        perf-profile.children.cycles-pp.mas_store_prealloc
>       2.06            -0.2        1.89        perf-profile.children.cycles-pp.__get_unmapped_area
>       2.96            -0.2        2.80        perf-profile.children.cycles-pp.__pi_memcpy
>       2.55            -0.2        2.40        perf-profile.children.cycles-pp.free_pgtables
>       2.25            -0.1        2.12        perf-profile.children.cycles-pp.vms_gather_munmap_vmas
>       2.26            -0.1        2.13        perf-profile.children.cycles-pp.free_pgd_range
>       2.58            -0.1        2.44        perf-profile.children.cycles-pp.zap_pmd_range
>       2.90            -0.1        2.77        perf-profile.children.cycles-pp.unmap_vmas
>       2.14            -0.1        2.02        perf-profile.children.cycles-pp.free_p4d_range
>       2.75            -0.1        2.63        perf-profile.children.cycles-pp.__zap_vma_range
>       2.01            -0.1        1.89        perf-profile.children.cycles-pp.free_pud_range
>       1.94            -0.1        1.82        perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
>       1.69            -0.1        1.59        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
>       1.26            -0.1        1.18        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       1.42            -0.1        1.34        perf-profile.children.cycles-pp.mas_find
>       1.38            -0.1        1.30        perf-profile.children.cycles-pp.vm_unmapped_area
>       1.36            -0.1        1.28        perf-profile.children.cycles-pp.unmapped_area_topdown
>       0.98            -0.1        0.92        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.58            -0.1        1.52        perf-profile.children.cycles-pp.__mmap_complete
>       1.46            -0.1        1.40        perf-profile.children.cycles-pp.perf_event_mmap
>       0.64            -0.0        0.60        perf-profile.children.cycles-pp.mas_prev_slot
>       0.54            -0.0        0.50        perf-profile.children.cycles-pp.mas_wr_store_type
>       1.08            -0.0        1.04        perf-profile.children.cycles-pp.perf_event_mmap_event
>       0.63            -0.0        0.59        perf-profile.children.cycles-pp.mas_empty_area_rev
>       0.54            -0.0        0.50        perf-profile.children.cycles-pp.__vma_start_write
>       0.54            -0.0        0.52        perf-profile.children.cycles-pp.mas_next_slot
>       0.43            -0.0        0.41        perf-profile.children.cycles-pp.mas_rev_awalk
>       0.53            -0.0        0.51        perf-profile.children.cycles-pp.mas_walk
>       0.64            -0.0        0.62 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
>       0.36            -0.0        0.33        perf-profile.children.cycles-pp.__vma_start_exclude_readers
>       0.29            -0.0        0.27        perf-profile.children.cycles-pp.__rcu_free_sheaf_prepare
>       0.44            -0.0        0.42        perf-profile.children.cycles-pp.vma_merge_new_range
>       0.65            -0.0        0.63        perf-profile.children.cycles-pp.perf_iterate_sb
>       0.24            -0.0        0.22        perf-profile.children.cycles-pp.security_vm_enough_memory_mm
>       0.07 ±  5%      -0.0        0.05        perf-profile.children.cycles-pp.mmap_region
>       0.14            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.mas_prev
>       0.30            -0.0        0.28        perf-profile.children.cycles-pp.up_read
>       0.30            -0.0        0.29        perf-profile.children.cycles-pp.vma_set_page_prot
>       0.07            -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.unlink_file_vma_batch_add
>       0.08 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.__alloc_empty_sheaf
>       0.08 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.remove_vma
>       0.24            -0.0        0.22        perf-profile.children.cycles-pp.downgrade_write
>       0.20 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.mas_wr_walk_descend
>       0.28            -0.0        0.27        perf-profile.children.cycles-pp.down_write_killable
>       0.18 ±  2%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.vma_wants_writenotify
>       0.07 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.__kmalloc_noprof
>       0.20            -0.0        0.19        perf-profile.children.cycles-pp.tlb_gather_mmu
>       0.07            -0.0        0.06        perf-profile.children.cycles-pp.__call_rcu_common
>       0.16            -0.0        0.15        perf-profile.children.cycles-pp.may_expand_vm
>       0.14            -0.0        0.13        perf-profile.children.cycles-pp.up_write
>       0.12            -0.0        0.11        perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.15            -0.0        0.14        perf-profile.children.cycles-pp.vma_is_shared_writable
>       0.11            -0.0        0.10        perf-profile.children.cycles-pp.x64_sys_call
>       0.88            +0.1        0.99        perf-profile.children.cycles-pp.vm_area_alloc
>       0.36            +0.1        0.50        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
>       2.16 ±  5%      +0.4        2.59 ±  2%  perf-profile.children.cycles-pp.run_ksoftirqd
>       2.17 ±  5%      +0.4        2.60 ±  2%  perf-profile.children.cycles-pp.smpboot_thread_fn
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.kthread
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
>       2.17 ±  5%      +0.4        2.61 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       0.69 ±  5%      +0.5        1.22 ±  4%  perf-profile.children.cycles-pp.alloc_from_new_slab
>      15.26            +1.2       16.44        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
>      18.16            +1.3       19.49        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>      18.14            +1.3       19.47        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>      18.02            +1.3       19.36        perf-profile.children.cycles-pp.__irq_exit_rcu
>      60.59            +1.5       62.05        perf-profile.children.cycles-pp.__pcs_replace_empty_main
>      61.51            +1.6       63.10        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
>      20.16            +1.8       21.94        perf-profile.children.cycles-pp.rcu_core
>      20.18            +1.8       21.95        perf-profile.children.cycles-pp.handle_softirqs
>      20.14            +1.8       21.92        perf-profile.children.cycles-pp.rcu_do_batch
>      50.97            +2.0       52.95        perf-profile.children.cycles-pp.__refill_objects_node
>      15.07            +2.2       17.25        perf-profile.children.cycles-pp.__kmem_cache_free_bulk
>      15.67            +2.2       17.86        perf-profile.children.cycles-pp.rcu_free_sheaf
>      65.82            +2.4       68.25        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>      65.33            +2.5       67.80        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>      51.67            +2.5       54.18        perf-profile.children.cycles-pp.refill_objects
>       2.16            -0.5        1.66        perf-profile.self.cycles-pp.__refill_objects_node
>       2.64            -0.2        2.48        perf-profile.self.cycles-pp.__pi_memcpy
>       2.36            -0.1        2.22        perf-profile.self.cycles-pp.zap_pmd_range
>       1.84            -0.1        1.71        perf-profile.self.cycles-pp.free_pud_range
>       1.78            -0.1        1.66        perf-profile.self.cycles-pp.__mmap_region
>       0.52            -0.1        0.40        perf-profile.self.cycles-pp.__slab_free
>       1.44            -0.1        1.34        perf-profile.self.cycles-pp.mas_wr_node_store
>       0.98            -0.1        0.92        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.84            -0.1        0.78        perf-profile.self.cycles-pp.mas_store_gfp
>       0.55            -0.0        0.50        perf-profile.self.cycles-pp.do_vmi_align_munmap
>       0.58            -0.0        0.54        perf-profile.self.cycles-pp.mas_prev_slot
>       0.62            -0.0        0.58        perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.17            -0.0        0.13        perf-profile.self.cycles-pp.do_mmap
>       0.48            -0.0        0.44        perf-profile.self.cycles-pp.__mmap
>       0.49            -0.0        0.45        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.35            -0.0        0.32        perf-profile.self.cycles-pp.vms_gather_munmap_vmas
>       0.18            -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.thp_get_unmapped_area_vmflags
>       0.42            -0.0        0.39        perf-profile.self.cycles-pp.__munmap
>       0.49            -0.0        0.46        perf-profile.self.cycles-pp.mas_next_slot
>       0.48            -0.0        0.45        perf-profile.self.cycles-pp.perf_iterate_sb
>       0.10            -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.__get_unmapped_area
>       0.40            -0.0        0.37        perf-profile.self.cycles-pp.mas_rev_awalk
>       0.48            -0.0        0.45        perf-profile.self.cycles-pp.mas_walk
>       0.31            -0.0        0.28        perf-profile.self.cycles-pp.kmem_cache_free
>       0.37            -0.0        0.34        perf-profile.self.cycles-pp.__vm_munmap
>       0.30            -0.0        0.28        perf-profile.self.cycles-pp.__vma_start_exclude_readers
>       0.31            -0.0        0.29        perf-profile.self.cycles-pp.mas_wr_store_type
>       0.36            -0.0        0.34        perf-profile.self.cycles-pp.perf_event_mmap
>       0.37            -0.0        0.35        perf-profile.self.cycles-pp.mas_preallocate
>       0.68            -0.0        0.66        perf-profile.self.cycles-pp.mas_update_gap
>       0.32            -0.0        0.30        perf-profile.self.cycles-pp.vma_merge_new_range
>       0.29            -0.0        0.27        perf-profile.self.cycles-pp.__rcu_free_sheaf_prepare
>       0.37            -0.0        0.35        perf-profile.self.cycles-pp.mas_find
>       0.18            -0.0        0.16        perf-profile.self.cycles-pp.rcu_free_sheaf
>       0.33            -0.0        0.31        perf-profile.self.cycles-pp.vm_area_alloc
>       0.07            -0.0        0.05        perf-profile.self.cycles-pp.unlink_file_vma_batch_add
>       0.38            -0.0        0.36        perf-profile.self.cycles-pp.mas_store_prealloc
>       0.26            -0.0        0.24        perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
>       0.25            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.up_read
>       0.26            -0.0        0.24        perf-profile.self.cycles-pp.down_write_killable
>       0.22 ±  2%      -0.0        0.20        perf-profile.self.cycles-pp.downgrade_write
>       0.26            -0.0        0.25        perf-profile.self.cycles-pp.__kfree_rcu_sheaf
>       0.20 ±  2%      -0.0        0.19        perf-profile.self.cycles-pp.vms_complete_munmap_vmas
>       0.18 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.mas_wr_walk_descend
>       0.19 ±  2%      -0.0        0.18        perf-profile.self.cycles-pp.do_syscall_64
>       0.38            -0.0        0.37        perf-profile.self.cycles-pp.unmapped_area_topdown
>       0.17            -0.0        0.16        perf-profile.self.cycles-pp.__vma_start_write
>       0.13            -0.0        0.12        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.10            -0.0        0.09        perf-profile.self.cycles-pp.free_pgd_range
>       0.07            -0.0        0.06        perf-profile.self.cycles-pp.mas_prev
>       0.17            -0.0        0.16        perf-profile.self.cycles-pp.tlb_finish_mmu
>       0.12            -0.0        0.11        perf-profile.self.cycles-pp.security_vm_enough_memory_mm
>       0.06            -0.0        0.05        perf-profile.self.cycles-pp.mmap_region
>       0.43            +0.1        0.50        perf-profile.self.cycles-pp.kvfree_call_rcu
>       0.29            +0.1        0.41        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
>      65.33            +2.5       67.80        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

And yet it seems we spend more time spinning as a result. Can't explain it
by what the patch does, so could it be just some code cache layout effect?

> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 




* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-14 14:45 ` Vlastimil Babka (SUSE)
@ 2026-05-14 16:00   ` Vlastimil Babka (SUSE)
  2026-05-14 16:02     ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 16:00 UTC (permalink / raw)
  To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/14/26 16:45, Vlastimil Babka (SUSE) wrote:
> On 5/11/26 16:45, kernel test robot wrote:
>> 
>> 
>> Hello,
>> 
>> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
> 
> Yay for an optimization that was supposed to have no tradeoffs :)

Does this help? I don't expect much, but perhaps...

- a non-empty pc.slabs is no longer unlikely, as it can likely hold a slab
where we returned part of the freelist
- let's ignore s->min_partial when returning slabs: we just pulled them from
the list, so it's unlikely there are too many free slabs, and unlikely we have
free slabs to return. Maybe it will reduce the page allocs/frees, and the code
is simpler
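
To make the second point concrete, here is a rough userspace model of the
return loop, not the kernel code. The names return_counts and
model_return_slabs and the honor_min_partial switch are made up for
illustration, and it assumes each returned slab bumps the node's nr_partial
by one, as add_partial() is expected to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how many slabs went back vs. were discarded. */
struct return_counts {
	unsigned int returned;
	unsigned int discarded;
};

/*
 * Model of returning leftover slabs to a node's partial list.
 * inuse[i] is the number of allocated objects in slab i.
 * honor_min_partial != 0 models the loop before the patch,
 * honor_min_partial == 0 models the simplified loop.
 */
static struct return_counts
model_return_slabs(const unsigned int *inuse, size_t nr_slabs,
		   unsigned long nr_partial, unsigned long min_partial,
		   int honor_min_partial)
{
	struct return_counts c = { 0, 0 };

	assert(inuse != NULL || nr_slabs == 0);

	for (size_t i = 0; i < nr_slabs; i++) {
		/* Old behaviour: a fully free slab is discarded once the
		 * node already holds at least min_partial slabs. */
		if (honor_min_partial && inuse[i] == 0 &&
		    nr_partial >= min_partial) {
			c.discarded++;
			continue;
		}
		/* Otherwise the slab goes back on the partial list. */
		nr_partial++;
		c.returned++;
	}
	return c;
}
```

With slabs whose inuse counts are { 0, 3, 0 } and nr_partial == min_partial,
the old policy in this model discards both empty slabs, while the simplified
one puts all three back and leaves any freeing to later paths.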

From 761946747d53b880855ac1e795ae0627be416c3e Mon Sep 17 00:00:00 2001
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Date: Thu, 14 May 2026 17:55:31 +0200
Subject: [PATCH] mm, slab: simplify __refill_objects_node

---
 mm/slub.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5d867349912b..0cc6c88f11e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7187,26 +7187,16 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 			break;
 	}
 
-	if (unlikely(!list_empty(&pc.slabs))) {
+	if (!list_empty(&pc.slabs)) {
 		spin_lock_irqsave(&n->list_lock, flags);
 
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
 
-			if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial))
-				continue;
-
 			list_del(&slab->slab_list);
 			add_partial(n, slab, ADD_TO_HEAD);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
-
-		/* any slabs left are completely free and for discard */
-		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
-
-			list_del(&slab->slab_list);
-			discard_slab(s, slab);
-		}
 	}
 
 	return refilled;
-- 
2.54.0





* Re: [linux-next:master] [mm, slab] 298cdbf5f7: will-it-scale.per_process_ops 6.3% regression
  2026-05-14 16:00   ` Vlastimil Babka (SUSE)
@ 2026-05-14 16:02     ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-14 16:02 UTC (permalink / raw)
  To: kernel test robot, Harry Yoo (Oracle); +Cc: oe-lkp, lkp, Hao Li, linux-mm

On 5/14/26 18:00, Vlastimil Babka (SUSE) wrote:
> On 5/14/26 16:45, Vlastimil Babka (SUSE) wrote:
>> On 5/11/26 16:45, kernel test robot wrote:
>>> 
>>> 
>>> Hello,
>>> 
>>> kernel test robot noticed a 6.3% regression of will-it-scale.per_process_ops on:
>> 
>> Yay for an optimization that was supposed to have no tradeoffs :)
> 
> Does this help? I don't expect much, but perhaps...

And a separate measurement with this on top of the previous one, please?
It's just that __slab_free() would have been adding to tail so let's try it
too.
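
For anyone following along, the head vs. tail difference can be sketched in
userspace like this. It is only an illustration with made-up names (node,
add_head, add_tail, next_picked_after_return), not the kernel's list_head
implementation, and it assumes the refill path scans the partial list
starting from the head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list entry. */
struct node {
	struct node *prev, *next;
	int id;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct node *n, struct node *prev,
			   struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the list head (models ADD_TO_HEAD). */
static void add_head(struct node *n, struct node *head)
{
	insert_between(n, head, head->next);
}

/* Insert right before the list head (models ADD_TO_TAIL). */
static void add_tail(struct node *n, struct node *head)
{
	insert_between(n, head->prev, head);
}

/* Id of the first entry a head-first scan would see, -1 if empty. */
static int first_id(const struct node *head)
{
	assert(head != NULL);
	return head->next == head ? -1 : head->next->id;
}

/*
 * A partial list already holds slabs 1 and 2; slab 3 is returned either
 * to the head or to the tail. Report which slab a scan picks up next.
 */
static int next_picked_after_return(int to_tail)
{
	struct node head, a = { .id = 1 }, b = { .id = 2 }, ret = { .id = 3 };

	list_init(&head);
	add_tail(&a, &head);
	add_tail(&b, &head);
	if (to_tail)
		add_tail(&ret, &head);
	else
		add_head(&ret, &head);
	return first_id(&head);
}
```

In this model a slab returned to the head is the first one the next scan
touches, while a slab returned to the tail is reached only after the older
entries, which is the behaviour the patch below switches to.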
 
From 8fba1377797478a945d97ad6163021d95ac7665c Mon Sep 17 00:00:00 2001
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Date: Thu, 14 May 2026 18:00:52 +0200
Subject: [PATCH] mm, slab: ADD_TO_TAIL in __refill_objects_node

---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0cc6c88f11e3..35e574e94538 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7193,7 +7193,7 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
 
 			list_del(&slab->slab_list);
-			add_partial(n, slab, ADD_TO_HEAD);
+			add_partial(n, slab, ADD_TO_TAIL);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
-- 
2.54.0



