Date: Tue, 20 Jan 2026 16:21:16 +0800
From: Zhao Liu <zhao1.liu@intel.com>
To: Hao Li <hao.li@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>, Hao Li <haolee.swjtu@gmail.com>,
	akpm@linux-foundation.org, harry.yoo@oracle.com, cl@gentwo.org,
	rientjes@google.com, roman.gushchin@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, tim.c.chen@intel.com,
	yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in
 __pcs_replace_empty_main()
Message-ID: <aW86/Nc2+bkopFd7@intel.com>
References: <20251210002629.34448-1-haoli.tcs@gmail.com>
 <a231264a-2da5-4468-a276-777fc0241246@suse.cz>
 <aWi9nAbIkTfYFoMM@intel.com>
 <3ozekmmsscrarwoa7vcytwjn5rxsiyxjrcsirlu3bhmlwtdxzn@s7a6rcxnqadc>
In-Reply-To: <3ozekmmsscrarwoa7vcytwjn5rxsiyxjrcsirlu3bhmlwtdxzn@s7a6rcxnqadc>

> 1. Machine Configuration
> 
> The topology of my machine is as follows:
> 
> CPU(s):              384
> On-line CPU(s) list: 0-383
> Thread(s) per core:  2
> Core(s) per socket:  96
> Socket(s):           2
> NUMA node(s):        2

This looks like a GNR (Granite Rapids) machine; perhaps SNC (Sub-NUMA Clustering) could be enabled.

> Since my machine only has 192 cores when counting physical cores, I had to
> enable SMT to support the higher number of tasks in the LKP test cases. My
> configuration was as follows:
> 
> will-it-scale:
>   mode: process
>   test: mmap2
>   no_affinity: 0
>   smt: 1

In the lkp setup, the smt parameter is disabled. When I tried smt=1 locally,
the difference between "with fix" and "w/o fix" was not significant. Maybe
the smt parameter could be set to 0.

On another machine (2 sockets with SNC3 enabled, i.e. 6 NUMA nodes), a
similar regression happens once the tasks fill up a socket and
get_partial_node() is then called more often.
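
To see roughly where that socket boundary lands, here is a small calculation
using the quoted lscpu numbers. It is a sketch under stated assumptions:
tasks are assumed to be placed socket by socket (one per physical core with
smt=0, one per logical CPU with smt=1), and the SNC3 case reuses the quoted
96-core socket purely for illustration, since the core count of the SNC3
machine is not given. `struct topo` and its helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical description of a machine's CPU topology. */
struct topo {
	int sockets;
	int cores_per_socket;
	int threads_per_core;
	int nodes_per_socket; /* 1 without SNC, 3 with SNC3 */
};

/* Total logical CPUs, i.e. the maximum nr_tasks in the test sweep. */
static int logical_cpus(const struct topo *t)
{
	return t->sockets * t->cores_per_socket * t->threads_per_core;
}

static int physical_cores(const struct topo *t)
{
	return t->sockets * t->cores_per_socket;
}

static int numa_nodes(const struct topo *t)
{
	return t->sockets * t->nodes_per_socket;
}

/* Physical cores per NUMA node; assumes an even split across SNC nodes. */
static int cores_per_node(const struct topo *t)
{
	assert(t->nodes_per_socket > 0);
	assert(t->cores_per_socket % t->nodes_per_socket == 0);
	return t->cores_per_socket / t->nodes_per_socket;
}

/*
 * Number of tasks after which one socket is full, assuming socket-by-socket
 * placement (an assumption, not a guarantee of how tasks are placed).
 */
static int tasks_to_fill_socket(const struct topo *t, int smt)
{
	return t->cores_per_socket * (smt ? t->threads_per_core : 1);
}
```

With the quoted topology `{2, 96, 2, 1}` this gives 384 logical CPUs, 192
physical cores, and a socket boundary at 96 tasks (smt=0) or 192 tasks
(smt=1); with `nodes_per_socket = 3` the same socket would expose 6 NUMA
nodes of 32 cores each.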

> Here's the "perf report --no-children -g" output with the patch:
> 
> ```
> +   30.36%  mmap2_processes  [kernel.kallsyms]     [k] perf_iterate_ctx
> -   28.80%  mmap2_processes  [kernel.kallsyms]     [k] native_queued_spin_lock_slowpath
>    - 24.72% testcase
>       - 24.71% __mmap
>          - 24.68% entry_SYSCALL_64_after_hwframe
>             - do_syscall_64
>                - 24.61% ksys_mmap_pgoff
>                   - 24.57% vm_mmap_pgoff
>                      - 24.51% do_mmap
>                         - 24.30% __mmap_region
>                            - 18.33% mas_preallocate
>                               - 18.30% mas_alloc_nodes
>                                  - 18.30% kmem_cache_alloc_noprof
>                                     - 18.28% __pcs_replace_empty_main
>                                        + 9.06% barn_replace_empty_sheaf
>                                        + 6.12% barn_get_empty_sheaf
>                                        + 3.09% refill_sheaf

This is the difference from my previous perf report: here the proportion
of refill_sheaf is low, which indicates that there are enough sheaves most
of the time.

Back to my previous test: my guess is that with this fix, under extreme
conditions of massive mmap usage, each CPU now keeps an empty spare sheaf
locally, whereas previously each CPU's spare sheaf was NULL. Holding more
spare sheaves locally increases memory pressure, and in that extreme
scenario cross-socket remote NUMA access incurs significant overhead,
which is why the regression occurs here.
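
The guessed mechanism can be illustrated with a deliberately simplified
model. The sketch below is hypothetical: `struct sheaf`, `struct pcs`,
`struct barn`, `BARN_MAX` and `replace_empty_main()` are illustrative names
that only loosely mirror mm/slub.c; this is not the kernel code or the patch
itself. It shows the one difference relevant here: with `keep_spare` set, an
empty main sheaf stays on the CPU as its spare instead of going back to the
barn.

```c
#include <assert.h>
#include <stddef.h>

#define BARN_MAX 8

/* Hypothetical, simplified sheaf: only object counts are tracked. */
struct sheaf {
	unsigned int size;     /* objects currently cached */
	unsigned int capacity; /* maximum objects */
};

/* Per-CPU state: the sheaf allocations are served from, plus a spare. */
struct pcs {
	struct sheaf *main;
	struct sheaf *spare;
};

/* Toy per-node barn holding full and empty sheaves. */
struct barn {
	struct sheaf *full[BARN_MAX];
	struct sheaf *empty[BARN_MAX];
	int nr_full;
	int nr_empty;
};

/*
 * Called when pcs->main is empty. Returns 1 if a non-empty main sheaf was
 * installed, 0 if the barn had no full sheaf (the real code would then
 * fall back to refilling from slabs). keep_spare selects the behaviour
 * named in the patch subject: keep the empty main as spare if none exists.
 */
static int replace_empty_main(struct pcs *pcs, struct barn *barn, int keep_spare)
{
	struct sheaf *full;

	assert(pcs->main != NULL && pcs->main->size == 0);

	/* A non-empty spare can simply be swapped with the empty main. */
	if (pcs->spare != NULL && pcs->spare->size > 0) {
		struct sheaf *tmp = pcs->main;

		pcs->main = pcs->spare;
		pcs->spare = tmp;
		return 1;
	}

	if (barn->nr_full == 0)
		return 0;
	full = barn->full[--barn->nr_full];

	if (keep_spare && pcs->spare == NULL) {
		/* With the fix: the empty sheaf stays on this CPU as spare. */
		pcs->spare = pcs->main;
	} else {
		/* Otherwise: the empty sheaf is handed back to the barn. */
		assert(barn->nr_empty < BARN_MAX);
		barn->empty[barn->nr_empty++] = pcs->main;
	}
	pcs->main = full;
	return 1;
}
```

In this model, every CPU that starts with a NULL spare ends up holding one
extra empty sheaf after its first refill, so on a 384-CPU machine up to 384
additional sheaves stay CPU-local per cache; that is the extra local
footprint the memory-pressure guess refers to.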

However, testing from 1 task up to the maximum number of tasks
(nr_tasks = nr_logical_cpus) shows significant overall improvements in
most scenarios. Regressions only occur at the specific topology
boundaries described above.

I believe the cases with performance gains are more common, so I consider
the regression a corner case. If it does turn out to affect certain
workloads in the future, we may need to reconsider the optimization at
that time. For now, these results can serve as a reference.

Thanks,
Zhao