From: Zhao Liu <zhao1.liu@intel.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Hao Li <haolee.swjtu@gmail.com>,
	akpm@linux-foundation.org, harry.yoo@oracle.com, cl@gentwo.org,
	rientjes@google.com, roman.gushchin@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	tim.c.chen@intel.com, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in __pcs_replace_empty_main()
Date: Fri, 16 Jan 2026 17:07:30 +0800
Message-ID: <aWn/0mn93MmUvTPY@intel.com>
In-Reply-To: <6be60100-e94c-4c06-9542-29ac8bf8f013@suse.cz>

> > The following is the perf data comparing 2 tests w/o fix & with this fix:
> > 
> > # Baseline  Delta Abs  Shared Object            Symbol
> > # ........  .........  .......................  ....................................
> > #
> >     61.76%     +4.78%  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
> >      0.93%     -0.32%  [kernel.vmlinux]         [k] __slab_free
> >      0.39%     -0.31%  [kernel.vmlinux]         [k] barn_get_empty_sheaf
> >      1.35%     -0.30%  [kernel.vmlinux]         [k] mas_leaf_max_gap
> >      3.22%     -0.30%  [kernel.vmlinux]         [k] __kmem_cache_alloc_bulk
> >      1.73%     -0.20%  [kernel.vmlinux]         [k] __cond_resched
> >      0.52%     -0.19%  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
> >      0.92%     +0.18%  [kernel.vmlinux]         [k] _raw_spin_lock
> >      1.91%     -0.15%  [kernel.vmlinux]         [k] zap_pmd_range.isra.0
> >      1.37%     -0.13%  [kernel.vmlinux]         [k] mas_wr_node_store
> >      1.29%     -0.12%  [kernel.vmlinux]         [k] free_pud_range
> >      0.92%     -0.11%  [kernel.vmlinux]         [k] __mmap_region
> >      0.12%     -0.11%  [kernel.vmlinux]         [k] barn_put_empty_sheaf
> >      0.20%     -0.09%  [kernel.vmlinux]         [k] barn_replace_empty_sheaf
> >      0.31%     +0.09%  [kernel.vmlinux]         [k] get_partial_node
> >      0.29%     -0.07%  [kernel.vmlinux]         [k] __rcu_free_sheaf_prepare
> >      0.12%     -0.07%  [kernel.vmlinux]         [k] intel_idle_xstate
> >      0.21%     -0.07%  [kernel.vmlinux]         [k] __kfree_rcu_sheaf
> >      0.26%     -0.07%  [kernel.vmlinux]         [k] down_write
> >      0.53%     -0.06%  libc.so.6                [.] __mmap
> >      0.66%     -0.06%  [kernel.vmlinux]         [k] mas_walk
> >      0.48%     -0.06%  [kernel.vmlinux]         [k] mas_prev_slot
> >      0.45%     -0.06%  [kernel.vmlinux]         [k] mas_find
> >      0.38%     -0.06%  [kernel.vmlinux]         [k] mas_wr_store_type
> >      0.23%     -0.06%  [kernel.vmlinux]         [k] do_vmi_align_munmap
> >      0.21%     -0.05%  [kernel.vmlinux]         [k] perf_event_mmap_event
> >      0.32%     -0.05%  [kernel.vmlinux]         [k] entry_SYSRETQ_unsafe_stack
> >      0.19%     -0.05%  [kernel.vmlinux]         [k] downgrade_write
> >      0.59%     -0.05%  [kernel.vmlinux]         [k] mas_next_slot
> >      0.31%     -0.05%  [kernel.vmlinux]         [k] __mmap_new_vma
> >      0.44%     -0.05%  [kernel.vmlinux]         [k] kmem_cache_alloc_noprof
> >      0.28%     -0.05%  [kernel.vmlinux]         [k] __vma_enter_locked
> >      0.41%     -0.05%  [kernel.vmlinux]         [k] memcpy
> >      0.48%     -0.04%  [kernel.vmlinux]         [k] mas_store_gfp
> >      0.14%     +0.04%  [kernel.vmlinux]         [k] __put_partials
> >      0.19%     -0.04%  [kernel.vmlinux]         [k] mas_empty_area_rev
> >      0.30%     -0.04%  [kernel.vmlinux]         [k] do_syscall_64
> >      0.25%     -0.04%  [kernel.vmlinux]         [k] mas_preallocate
> >      0.15%     -0.04%  [kernel.vmlinux]         [k] rcu_free_sheaf
> >      0.22%     -0.04%  [kernel.vmlinux]         [k] entry_SYSCALL_64
> >      0.49%     -0.04%  libc.so.6                [.] __munmap
> >      0.91%     -0.04%  [kernel.vmlinux]         [k] rcu_all_qs
> >      0.21%     -0.04%  [kernel.vmlinux]         [k] __vm_munmap
> >      0.24%     -0.04%  [kernel.vmlinux]         [k] mas_store_prealloc
> >      0.19%     -0.04%  [kernel.vmlinux]         [k] __kmalloc_cache_noprof
> >      0.34%     -0.04%  [kernel.vmlinux]         [k] build_detached_freelist
> >      0.19%     -0.03%  [kernel.vmlinux]         [k] vms_complete_munmap_vmas
> >      0.36%     -0.03%  [kernel.vmlinux]         [k] mas_rev_awalk
> >      0.05%     -0.03%  [kernel.vmlinux]         [k] shuffle_freelist
> >      0.19%     -0.03%  [kernel.vmlinux]         [k] down_write_killable
> >      0.19%     -0.03%  [kernel.vmlinux]         [k] kmem_cache_free
> >      0.27%     -0.03%  [kernel.vmlinux]         [k] up_write
> >      0.13%     -0.03%  [kernel.vmlinux]         [k] vm_area_alloc
> >      0.18%     -0.03%  [kernel.vmlinux]         [k] arch_get_unmapped_area_topdown
> >      0.08%     -0.03%  [kernel.vmlinux]         [k] userfaultfd_unmap_complete
> >      0.10%     -0.03%  [kernel.vmlinux]         [k] tlb_gather_mmu
> >      0.30%     -0.02%  [kernel.vmlinux]         [k] ___slab_alloc
> > 
> > I think the insteresting item is "get_partial_node". It seems this fix
> > makes "get_partial_node" slightly more frequent. HMM, however, I still
> > can't figure out why this is happening. Do you have any thoughts on it?
> 
> I'm not sure if it's statistically significant or just noise, +0.09% could
> be noise?

A small number doesn't always mean it's noise. When perf samples
get_partial_node on the spin lock call chain, its subroutine (the spin
lock) is much hotter, so most samples are attributed to the subroutine.
If get_partial_node itself (excluding subroutines) executes very quickly,
its own proportion stays low.

I also expanded the perf data with call chains:

* w/o fix:

The proportion of spin lock overhead that comes through get_partial_node
is: 31.05% / 49.91% = 62.21%

    49.91%  mmap2_processes  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
            |
             --49.91%--native_queued_spin_lock_slowpath
                       |
                        --49.91%--_raw_spin_lock_irqsave
                                  |
                                  |--31.05%--get_partial_node
                                  |          |
                                  |          |--23.66%--get_any_partial
                                  |          |          ___slab_alloc
                                  |          |
                                  |           --7.40%--___slab_alloc
                                  |                     __kmem_cache_alloc_bulk
                                  |
                                  |--10.84%--barn_get_empty_sheaf
                                  |          |
                                  |          |--6.18%--__kfree_rcu_sheaf
                                  |          |          kvfree_call_rcu
                                  |          |
                                  |           --4.66%--__pcs_replace_empty_main
                                  |                     kmem_cache_alloc_noprof
                                  |
                                  |--5.10%--barn_put_empty_sheaf
                                  |          |
                                  |           --5.09%--__pcs_replace_empty_main
                                  |                     kmem_cache_alloc_noprof
                                  |
                                  |--2.01%--barn_replace_empty_sheaf
                                  |          __pcs_replace_empty_main
                                  |          kmem_cache_alloc_noprof
                                  |
                                   --0.78%--__put_partials
                                             |
                                              --0.78%--__kmem_cache_free_bulk.part.0
                                                        rcu_free_sheaf


* with fix:

Similarly, the proportion of spin lock overhead that comes through
get_partial_node is: 39.91% / 42.82% = 93.20%

    42.82%  mmap2_processes  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
            |
            ---native_queued_spin_lock_slowpath
               |
                --42.82%--_raw_spin_lock_irqsave
                          |
                          |--39.91%--get_partial_node
                          |          |
                          |          |--28.25%--get_any_partial
                          |          |          ___slab_alloc
                          |          |
                          |           --11.66%--___slab_alloc
                          |                     __kmem_cache_alloc_bulk
                          |
                          |--1.09%--barn_get_empty_sheaf
                          |          |
                          |           --0.90%--__kfree_rcu_sheaf
                          |                     kvfree_call_rcu
                          |
                          |--0.96%--barn_replace_empty_sheaf
                          |          __pcs_replace_empty_main
                          |          kmem_cache_alloc_noprof
                          |
                           --0.77%--__put_partials
                                     __kmem_cache_free_bulk.part.0
                                     rcu_free_sheaf


So the increase from 62.21% to 93.20% could reflect that get_partial_node
now contributes a larger share of the spin lock overhead.
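
For reference, the same arithmetic can be repeated for every caller in
the two call chains. This is only a minimal sketch: the percentages are
copied by hand from the trees above, and the helper names (caller_shares,
barn_total) are made up for illustration, not part of perf or the kernel.

```python
# Percentages copied from the two call chains above: samples in
# native_queued_spin_lock_slowpath, split by the caller of
# _raw_spin_lock_irqsave. Names of the dicts/helpers are hypothetical.
WITHOUT_FIX = {
    "total": 49.91,
    "callers": {
        "get_partial_node": 31.05,
        "barn_get_empty_sheaf": 10.84,
        "barn_put_empty_sheaf": 5.10,
        "barn_replace_empty_sheaf": 2.01,
        "__put_partials": 0.78,
    },
}
WITH_FIX = {
    "total": 42.82,
    "callers": {
        "get_partial_node": 39.91,
        "barn_get_empty_sheaf": 1.09,
        "barn_replace_empty_sheaf": 0.96,
        "__put_partials": 0.77,
    },
}


def caller_shares(profile):
    """Return each caller's share (in %) of the slowpath samples."""
    total = profile["total"]
    return {
        name: round(100.0 * pct / total, 2)
        for name, pct in profile["callers"].items()
    }


def barn_total(profile):
    """Sum the absolute percentages of all barn_* callers."""
    return round(
        sum(pct for name, pct in profile["callers"].items()
            if name.startswith("barn_")),
        2,
    )


if __name__ == "__main__":
    for label, profile in (("w/o fix", WITHOUT_FIX), ("with fix", WITH_FIX)):
        print(label)
        for name, share in caller_shares(profile).items():
            print(f"  {name:28s} {share:6.2f}% of slowpath samples")
        print(f"  barn_* callers in total: {barn_total(profile):.2f}% of all samples")
```

Summing the barn_* callers this way also shows the contention coming
from the barn lock dropping from 17.95% to 2.05% of all samples, which
is consistent with get_partial_node taking a larger share of what
remains.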

> > So, I'd like to know if you think dynamically or adaptively adjusting
> > capacity is a worthwhile idea.
> 
> In the followup series, there will be automatically determined capacity to
> roughly match the current capacity of cpu partial slabs:
> 
> https://lore.kernel.org/all/20260112-sheaves-for-all-v2-4-98225cfb50cf@suse.cz/
> 
> We can use that as starting point for further tuning. But I suspect making
> it adjust dynamically would be complicated.

Thanks, will continue to evaluate this series.

Regards,
Zhao


