public inbox for virtualization@lists.linux-foundation.org
* [PATCH v2] mm/mempolicy: track page allocations per mempolicy
@ 2026-03-07  4:55 JP Kobryn (Meta)
  2026-03-07 12:27 ` Huang, Ying
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-07  4:55 UTC (permalink / raw)
  To: linux-mm, akpm, mhocko, vbabka
  Cc: apopple, axelrasmussen, byungchul, cgroups, david, eperezma,
	gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
	linux-kernel, lorenzo.stoakes, matthew.brost, mst, rppt,
	muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang, yuanchu, ziy, kernel-team

When investigating pressure on a NUMA node, there is no straightforward way
to determine which policies are driving allocations to it.

Add per-policy page allocation counters as new node stat items. These
counters track allocations to nodes and also whether the allocations were
intentional or fallbacks.

The new stats follow the existing numa hit/miss/foreign style and have the
following meanings:

  hit
    - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
    - for other policies, allocation succeeded on the intended node
    - counted on the node of the allocation
  miss
    - allocation was intended for another node, but landed on this one
    - counted on this node (the fallback node)
  foreign
    - allocation was intended for this node, but landed on another node
    - counted on this node (the intended node)

Counters are exposed per-memcg, per-node in memory.numa_stat and globally
in /proc/vmstat.

Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
---
v2:
  - Replaced single per-policy total counter (PGALLOC_MPOL_*) with
    hit/miss/foreign triplet per policy
  - Changed from global node stats to per-memcg per-node tracking

v1:
https://lore.kernel.org/linux-mm/20260212045109.255391-2-inwardvessel@gmail.com/

 include/linux/mmzone.h | 20 ++++++++++
 mm/memcontrol.c        | 60 ++++++++++++++++++++++++++++
 mm/mempolicy.c         | 90 ++++++++++++++++++++++++++++++++++++++++--
 mm/vmstat.c            | 20 ++++++++++
 4 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..c0517cbcb0e2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -323,6 +323,26 @@ enum node_stat_item {
 	PGSCAN_ANON,
 	PGSCAN_FILE,
 	PGREFILL,
+#ifdef CONFIG_NUMA
+	NUMA_MPOL_LOCAL_HIT,
+	NUMA_MPOL_LOCAL_MISS,
+	NUMA_MPOL_LOCAL_FOREIGN,
+	NUMA_MPOL_PREFERRED_HIT,
+	NUMA_MPOL_PREFERRED_MISS,
+	NUMA_MPOL_PREFERRED_FOREIGN,
+	NUMA_MPOL_PREFERRED_MANY_HIT,
+	NUMA_MPOL_PREFERRED_MANY_MISS,
+	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
+	NUMA_MPOL_BIND_HIT,
+	NUMA_MPOL_BIND_MISS,
+	NUMA_MPOL_BIND_FOREIGN,
+	NUMA_MPOL_INTERLEAVE_HIT,
+	NUMA_MPOL_INTERLEAVE_MISS,
+	NUMA_MPOL_INTERLEAVE_FOREIGN,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
+#endif
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 982231a078f2..4d29f723a2de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -420,6 +420,26 @@ static const unsigned int memcg_node_stat_items[] = {
 	PGSCAN_ANON,
 	PGSCAN_FILE,
 	PGREFILL,
+#ifdef CONFIG_NUMA
+	NUMA_MPOL_LOCAL_HIT,
+	NUMA_MPOL_LOCAL_MISS,
+	NUMA_MPOL_LOCAL_FOREIGN,
+	NUMA_MPOL_PREFERRED_HIT,
+	NUMA_MPOL_PREFERRED_MISS,
+	NUMA_MPOL_PREFERRED_FOREIGN,
+	NUMA_MPOL_PREFERRED_MANY_HIT,
+	NUMA_MPOL_PREFERRED_MANY_MISS,
+	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
+	NUMA_MPOL_BIND_HIT,
+	NUMA_MPOL_BIND_MISS,
+	NUMA_MPOL_BIND_FOREIGN,
+	NUMA_MPOL_INTERLEAVE_HIT,
+	NUMA_MPOL_INTERLEAVE_MISS,
+	NUMA_MPOL_INTERLEAVE_FOREIGN,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
+	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
+#endif
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
@@ -1591,6 +1611,26 @@ static const struct memory_stat memory_stats[] = {
 #ifdef CONFIG_NUMA_BALANCING
 	{ "pgpromote_success",		PGPROMOTE_SUCCESS	},
 #endif
+#ifdef CONFIG_NUMA
+	{ "numa_mpol_local_hit",		NUMA_MPOL_LOCAL_HIT		},
+	{ "numa_mpol_local_miss",		NUMA_MPOL_LOCAL_MISS		},
+	{ "numa_mpol_local_foreign",		NUMA_MPOL_LOCAL_FOREIGN		},
+	{ "numa_mpol_preferred_hit",		NUMA_MPOL_PREFERRED_HIT		},
+	{ "numa_mpol_preferred_miss",		NUMA_MPOL_PREFERRED_MISS	},
+	{ "numa_mpol_preferred_foreign",	NUMA_MPOL_PREFERRED_FOREIGN	},
+	{ "numa_mpol_preferred_many_hit",	NUMA_MPOL_PREFERRED_MANY_HIT	},
+	{ "numa_mpol_preferred_many_miss",	NUMA_MPOL_PREFERRED_MANY_MISS	},
+	{ "numa_mpol_preferred_many_foreign",	NUMA_MPOL_PREFERRED_MANY_FOREIGN },
+	{ "numa_mpol_bind_hit",			NUMA_MPOL_BIND_HIT		},
+	{ "numa_mpol_bind_miss",		NUMA_MPOL_BIND_MISS		},
+	{ "numa_mpol_bind_foreign",		NUMA_MPOL_BIND_FOREIGN		},
+	{ "numa_mpol_interleave_hit",		NUMA_MPOL_INTERLEAVE_HIT	},
+	{ "numa_mpol_interleave_miss",		NUMA_MPOL_INTERLEAVE_MISS	},
+	{ "numa_mpol_interleave_foreign",	NUMA_MPOL_INTERLEAVE_FOREIGN	},
+	{ "numa_mpol_weighted_interleave_hit",	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT },
+	{ "numa_mpol_weighted_interleave_miss",	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS },
+	{ "numa_mpol_weighted_interleave_foreign", NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN },
+#endif
 };
 
 /* The actual unit of the state item, not the same as the output unit */
@@ -1642,6 +1682,26 @@ static int memcg_page_state_output_unit(int item)
 	case PGREFILL:
 #ifdef CONFIG_NUMA_BALANCING
 	case PGPROMOTE_SUCCESS:
+#endif
+#ifdef CONFIG_NUMA
+	case NUMA_MPOL_LOCAL_HIT:
+	case NUMA_MPOL_LOCAL_MISS:
+	case NUMA_MPOL_LOCAL_FOREIGN:
+	case NUMA_MPOL_PREFERRED_HIT:
+	case NUMA_MPOL_PREFERRED_MISS:
+	case NUMA_MPOL_PREFERRED_FOREIGN:
+	case NUMA_MPOL_PREFERRED_MANY_HIT:
+	case NUMA_MPOL_PREFERRED_MANY_MISS:
+	case NUMA_MPOL_PREFERRED_MANY_FOREIGN:
+	case NUMA_MPOL_BIND_HIT:
+	case NUMA_MPOL_BIND_MISS:
+	case NUMA_MPOL_BIND_FOREIGN:
+	case NUMA_MPOL_INTERLEAVE_HIT:
+	case NUMA_MPOL_INTERLEAVE_MISS:
+	case NUMA_MPOL_INTERLEAVE_FOREIGN:
+	case NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT:
+	case NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS:
+	case NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN:
 #endif
 		return 1;
 	default:
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0e5175f1c767..2417de75098d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -117,6 +117,7 @@
 #include <asm/tlb.h>
 #include <linux/uaccess.h>
 #include <linux/memory.h>
+#include <linux/memcontrol.h>
 
 #include "internal.h"
 
@@ -2426,6 +2427,83 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
 	return page;
 }
 
+/*
+ * Count a mempolicy allocation. Stats are tracked per-node and per-cgroup.
+ * The following numa_{hit/miss/foreign} pattern is used:
+ *
+ *   hit
+ *     - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
+ *     - for other policies, allocation succeeded on the intended node
+ *     - counted on the node of the allocation
+ *   miss
+ *     - allocation was intended for another node, but landed on this one
+ *     - counted on this node (the fallback node)
+ *   foreign
+ *     - allocation was intended for this node, but landed on another node
+ *     - counted on this node (the intended node)
+ */
+static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
+				  struct page *page, unsigned int order)
+{
+	int actual_nid = page_to_nid(page);
+	long nr_pages = 1L << order;
+	enum node_stat_item hit_idx;
+	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
+	bool is_hit;
+
+	if (!root_mem_cgroup || mem_cgroup_disabled())
+		return;
+
+	/*
+	 * Start with hit then use +1 or +2 later on to change to miss or
+	 * foreign respectively if needed.
+	 */
+	switch (pol->mode) {
+	case MPOL_PREFERRED:
+		hit_idx = NUMA_MPOL_PREFERRED_HIT;
+		break;
+	case MPOL_PREFERRED_MANY:
+		hit_idx = NUMA_MPOL_PREFERRED_MANY_HIT;
+		break;
+	case MPOL_BIND:
+		hit_idx = NUMA_MPOL_BIND_HIT;
+		break;
+	case MPOL_INTERLEAVE:
+		hit_idx = NUMA_MPOL_INTERLEAVE_HIT;
+		break;
+	case MPOL_WEIGHTED_INTERLEAVE:
+		hit_idx = NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT;
+		break;
+	default:
+		hit_idx = NUMA_MPOL_LOCAL_HIT;
+		break;
+	}
+
+	if (pol->mode == MPOL_BIND || pol->mode == MPOL_PREFERRED_MANY)
+		is_hit = node_isset(actual_nid, pol->nodes);
+	else
+		is_hit = (actual_nid == intended_nid);
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+
+	if (is_hit) {
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
+		mod_lruvec_state(lruvec, hit_idx, nr_pages);
+	} else {
+		/* account for miss on the fallback node */
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
+		mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
+
+		/* account for foreign on the intended node */
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
+		mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
+	}
+
+	rcu_read_unlock();
+}
+
 /**
  * alloc_pages_mpol - Allocate pages according to NUMA mempolicy.
  * @gfp: GFP flags.
@@ -2444,8 +2522,10 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 
 	nodemask = policy_nodemask(gfp, pol, ilx, &nid);
 
-	if (pol->mode == MPOL_PREFERRED_MANY)
-		return alloc_pages_preferred_many(gfp, order, nid, nodemask);
+	if (pol->mode == MPOL_PREFERRED_MANY) {
+		page = alloc_pages_preferred_many(gfp, order, nid, nodemask);
+		goto out;
+	}
 
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
 	    /* filter "hugepage" allocation, unless from alloc_pages() */
@@ -2471,7 +2551,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 				gfp | __GFP_THISNODE | __GFP_NORETRY, order,
 				nid, NULL);
 			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
-				return page;
+				goto out;
 			/*
 			 * If hugepage allocations are configured to always
 			 * synchronous compact or the vma has been madvised
@@ -2494,6 +2574,10 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 		}
 	}
 
+out:
+	if (page)
+		mpol_count_numa_alloc(pol, nid, page, order);
+
 	return page;
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b33097ab9bc8..d9f745831624 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1291,6 +1291,26 @@ const char * const vmstat_text[] = {
 	[I(PGSCAN_ANON)]			= "pgscan_anon",
 	[I(PGSCAN_FILE)]			= "pgscan_file",
 	[I(PGREFILL)]				= "pgrefill",
+#ifdef CONFIG_NUMA
+	[I(NUMA_MPOL_LOCAL_HIT)]		= "numa_mpol_local_hit",
+	[I(NUMA_MPOL_LOCAL_MISS)]		= "numa_mpol_local_miss",
+	[I(NUMA_MPOL_LOCAL_FOREIGN)]		= "numa_mpol_local_foreign",
+	[I(NUMA_MPOL_PREFERRED_HIT)]		= "numa_mpol_preferred_hit",
+	[I(NUMA_MPOL_PREFERRED_MISS)]		= "numa_mpol_preferred_miss",
+	[I(NUMA_MPOL_PREFERRED_FOREIGN)]	= "numa_mpol_preferred_foreign",
+	[I(NUMA_MPOL_PREFERRED_MANY_HIT)]	= "numa_mpol_preferred_many_hit",
+	[I(NUMA_MPOL_PREFERRED_MANY_MISS)]	= "numa_mpol_preferred_many_miss",
+	[I(NUMA_MPOL_PREFERRED_MANY_FOREIGN)]	= "numa_mpol_preferred_many_foreign",
+	[I(NUMA_MPOL_BIND_HIT)]			= "numa_mpol_bind_hit",
+	[I(NUMA_MPOL_BIND_MISS)]		= "numa_mpol_bind_miss",
+	[I(NUMA_MPOL_BIND_FOREIGN)]		= "numa_mpol_bind_foreign",
+	[I(NUMA_MPOL_INTERLEAVE_HIT)]		= "numa_mpol_interleave_hit",
+	[I(NUMA_MPOL_INTERLEAVE_MISS)]		= "numa_mpol_interleave_miss",
+	[I(NUMA_MPOL_INTERLEAVE_FOREIGN)]	= "numa_mpol_interleave_foreign",
+	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT)]	= "numa_mpol_weighted_interleave_hit",
+	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS)]	= "numa_mpol_weighted_interleave_miss",
+	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN)] = "numa_mpol_weighted_interleave_foreign",
+#endif
 #ifdef CONFIG_HUGETLB_PAGE
 	[I(NR_HUGETLB)]				= "nr_hugetlb",
 #endif
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
@ 2026-03-07 12:27 ` Huang, Ying
  2026-03-08 19:20   ` Gregory Price
  2026-03-09  4:31   ` JP Kobryn (Meta)
  2026-03-07 14:32 ` kernel test robot
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Huang, Ying @ 2026-03-07 12:27 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

"JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:

> When investigating pressure on a NUMA node, there is no straightforward way
> to determine which policies are driving allocations to it.
>
> Add per-policy page allocation counters as new node stat items. These
> counters track allocations to nodes and also whether the allocations were
> intentional or fallbacks.
>
> The new stats follow the existing numa hit/miss/foreign style and have the
> following meanings:
>
>   hit
>     - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
>     - for other policies, allocation succeeded on the intended node
>     - counted on the node of the allocation
>   miss
>     - allocation was intended for another node, but landed on this one
>     - counted on this node (the fallback node)
>   foreign
>     - allocation was intended for this node, but landed on another node
>     - counted on this node (the intended node)
>
> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> in /proc/vmstat.

IMHO, it may be better to describe your workflow as an example of using
the newly added statistics.  That would explain why we need them.  For
example, what you have described in

https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/

> 1) Pressure/OOMs reported while system-wide memory is free.
> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
> down node(s) under pressure. They become available in
> /sys/devices/system/node/nodeN/vmstat.
> 3) Check per-policy allocation counters (this patch) on that node to
> find what policy was driving it. Same readout at nodeN/vmstat.
> 4) Now use /proc/*/numa_maps to identify tasks using the policy.

One question.  If we have to search /proc/*/numa_maps, why can't we
find all necessary information via /proc/*/numa_maps?  For example,
which VMA uses the most pages on the node?  Which policy is used in the
VMA? ...

---
Best Regards,
Huang, Ying

[snip]


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
  2026-03-07 12:27 ` Huang, Ying
@ 2026-03-07 14:32 ` kernel test robot
  2026-03-07 19:57 ` kernel test robot
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2026-03-07 14:32 UTC (permalink / raw)
  To: JP Kobryn (Meta), linux-mm, akpm, mhocko, vbabka
  Cc: oe-kbuild-all, apopple, axelrasmussen, byungchul, cgroups, david,
	eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
	linux-kernel, lorenzo.stoakes, matthew.brost, mst, rppt,
	muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang

Hi JP,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/JP-Kobryn-Meta/mm-mempolicy-track-page-allocations-per-mempolicy/20260307-125642
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260307045520.247998-1-jp.kobryn%40linux.dev
patch subject: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
config: x86_64-randconfig-074-20260307 (https://download.01.org/0day-ci/archive/20260307/202603072210.TSPUKsyq-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260307/202603072210.TSPUKsyq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603072210.TSPUKsyq-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/mempolicy.c: In function 'mpol_count_numa_alloc':
   mm/mempolicy.c:2489:17: error: implicit declaration of function 'mem_cgroup_from_task'; did you mean 'mem_cgroup_from_css'? [-Wimplicit-function-declaration]
    2489 |         memcg = mem_cgroup_from_task(current);
         |                 ^~~~~~~~~~~~~~~~~~~~
         |                 mem_cgroup_from_css
>> mm/mempolicy.c:2489:15: error: assignment to 'struct mem_cgroup *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2489 |         memcg = mem_cgroup_from_task(current);
         |               ^


vim +2489 mm/mempolicy.c

  2429	
  2430	/*
  2431	 * Count a mempolicy allocation. Stats are tracked per-node and per-cgroup.
  2432	 * The following numa_{hit/miss/foreign} pattern is used:
  2433	 *
  2434	 *   hit
  2435	 *     - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
  2436	 *     - for other policies, allocation succeeded on the intended node
  2437	 *     - counted on the node of the allocation
  2438	 *   miss
  2439	 *     - allocation was intended for another node, but landed on this one
  2440	 *     - counted on this node (the fallback node)
  2441	 *   foreign
  2442	 *     - allocation was intended for this node, but landed on another node
  2443	 *     - counted on this node (the intended node)
  2444	 */
  2445	static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
  2446					  struct page *page, unsigned int order)
  2447	{
  2448		int actual_nid = page_to_nid(page);
  2449		long nr_pages = 1L << order;
  2450		enum node_stat_item hit_idx;
  2451		struct mem_cgroup *memcg;
  2452		struct lruvec *lruvec;
  2453		bool is_hit;
  2454	
  2455		if (!root_mem_cgroup || mem_cgroup_disabled())
  2456			return;
  2457	
  2458		/*
  2459		 * Start with hit then use +1 or +2 later on to change to miss or
  2460		 * foreign respectively if needed.
  2461		 */
  2462		switch (pol->mode) {
  2463		case MPOL_PREFERRED:
  2464			hit_idx = NUMA_MPOL_PREFERRED_HIT;
  2465			break;
  2466		case MPOL_PREFERRED_MANY:
  2467			hit_idx = NUMA_MPOL_PREFERRED_MANY_HIT;
  2468			break;
  2469		case MPOL_BIND:
  2470			hit_idx = NUMA_MPOL_BIND_HIT;
  2471			break;
  2472		case MPOL_INTERLEAVE:
  2473			hit_idx = NUMA_MPOL_INTERLEAVE_HIT;
  2474			break;
  2475		case MPOL_WEIGHTED_INTERLEAVE:
  2476			hit_idx = NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT;
  2477			break;
  2478		default:
  2479			hit_idx = NUMA_MPOL_LOCAL_HIT;
  2480			break;
  2481		}
  2482	
  2483		if (pol->mode == MPOL_BIND || pol->mode == MPOL_PREFERRED_MANY)
  2484			is_hit = node_isset(actual_nid, pol->nodes);
  2485		else
  2486			is_hit = (actual_nid == intended_nid);
  2487	
  2488		rcu_read_lock();
> 2489		memcg = mem_cgroup_from_task(current);
  2490	
  2491		if (is_hit) {
  2492			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
  2493			mod_lruvec_state(lruvec, hit_idx, nr_pages);
  2494		} else {
  2495			/* account for miss on the fallback node */
  2496			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
  2497			mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
  2498	
  2499			/* account for foreign on the intended node */
  2500			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
  2501			mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
  2502		}
  2503	
  2504		rcu_read_unlock();
  2505	}
  2506	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
  2026-03-07 12:27 ` Huang, Ying
  2026-03-07 14:32 ` kernel test robot
@ 2026-03-07 19:57 ` kernel test robot
  2026-03-08 19:24 ` Usama Arif
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2026-03-07 19:57 UTC (permalink / raw)
  To: JP Kobryn (Meta), linux-mm, akpm, mhocko, vbabka
  Cc: oe-kbuild-all, apopple, axelrasmussen, byungchul, cgroups, david,
	eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
	linux-kernel, lorenzo.stoakes, matthew.brost, mst, rppt,
	muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang

Hi JP,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/JP-Kobryn-Meta/mm-mempolicy-track-page-allocations-per-mempolicy/20260307-125642
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260307045520.247998-1-jp.kobryn%40linux.dev
patch subject: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
config: arm64-randconfig-001-20260307 (https://download.01.org/0day-ci/archive/20260308/202603080349.ya7tIjgk-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260308/202603080349.ya7tIjgk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603080349.ya7tIjgk-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   mm/mempolicy.c: In function 'mpol_count_numa_alloc':
>> mm/mempolicy.c:2489:10: error: implicit declaration of function 'mem_cgroup_from_task'; did you mean 'perf_cgroup_from_task'? [-Werror=implicit-function-declaration]
     memcg = mem_cgroup_from_task(current);
             ^~~~~~~~~~~~~~~~~~~~
             perf_cgroup_from_task
>> mm/mempolicy.c:2489:8: warning: assignment to 'struct mem_cgroup *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     memcg = mem_cgroup_from_task(current);
           ^
   cc1: some warnings being treated as errors


vim +2489 mm/mempolicy.c

  2429	
  2430	/*
  2431	 * Count a mempolicy allocation. Stats are tracked per-node and per-cgroup.
  2432	 * The following numa_{hit/miss/foreign} pattern is used:
  2433	 *
  2434	 *   hit
  2435	 *     - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
  2436	 *     - for other policies, allocation succeeded on the intended node
  2437	 *     - counted on the node of the allocation
  2438	 *   miss
  2439	 *     - allocation was intended for another node, but landed on this one
  2440	 *     - counted on this node (the fallback node)
  2441	 *   foreign
  2442	 *     - allocation was intended for this node, but landed on another node
  2443	 *     - counted on this node (the intended node)
  2444	 */
  2445	static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
  2446					  struct page *page, unsigned int order)
  2447	{
  2448		int actual_nid = page_to_nid(page);
  2449		long nr_pages = 1L << order;
  2450		enum node_stat_item hit_idx;
  2451		struct mem_cgroup *memcg;
  2452		struct lruvec *lruvec;
  2453		bool is_hit;
  2454	
  2455		if (!root_mem_cgroup || mem_cgroup_disabled())
  2456			return;
  2457	
  2458		/*
  2459		 * Start with hit then use +1 or +2 later on to change to miss or
  2460		 * foreign respectively if needed.
  2461		 */
  2462		switch (pol->mode) {
  2463		case MPOL_PREFERRED:
  2464			hit_idx = NUMA_MPOL_PREFERRED_HIT;
  2465			break;
  2466		case MPOL_PREFERRED_MANY:
  2467			hit_idx = NUMA_MPOL_PREFERRED_MANY_HIT;
  2468			break;
  2469		case MPOL_BIND:
  2470			hit_idx = NUMA_MPOL_BIND_HIT;
  2471			break;
  2472		case MPOL_INTERLEAVE:
  2473			hit_idx = NUMA_MPOL_INTERLEAVE_HIT;
  2474			break;
  2475		case MPOL_WEIGHTED_INTERLEAVE:
  2476			hit_idx = NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT;
  2477			break;
  2478		default:
  2479			hit_idx = NUMA_MPOL_LOCAL_HIT;
  2480			break;
  2481		}
  2482	
  2483		if (pol->mode == MPOL_BIND || pol->mode == MPOL_PREFERRED_MANY)
  2484			is_hit = node_isset(actual_nid, pol->nodes);
  2485		else
  2486			is_hit = (actual_nid == intended_nid);
  2487	
  2488		rcu_read_lock();
> 2489		memcg = mem_cgroup_from_task(current);
  2490	
  2491		if (is_hit) {
  2492			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
  2493			mod_lruvec_state(lruvec, hit_idx, nr_pages);
  2494		} else {
  2495			/* account for miss on the fallback node */
  2496			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
  2497			mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
  2498	
  2499			/* account for foreign on the intended node */
  2500			lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
  2501			mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
  2502		}
  2503	
  2504		rcu_read_unlock();
  2505	}
  2506	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07 12:27 ` Huang, Ying
@ 2026-03-08 19:20   ` Gregory Price
  2026-03-09  4:11     ` JP Kobryn (Meta)
  2026-03-09  4:31   ` JP Kobryn (Meta)
  1 sibling, 1 reply; 30+ messages in thread
From: Gregory Price @ 2026-03-08 19:20 UTC (permalink / raw)
  To: Huang, Ying
  Cc: JP Kobryn (Meta), linux-mm, akpm, mhocko, vbabka, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, jasowang,
	hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

On Sat, Mar 07, 2026 at 08:27:22PM +0800, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
> >
> >   hit
> >     - for BIND and PREFERRED_MANY, allocation succeeded on a node in nodemask
> >     - for other policies, allocation succeeded on the intended node
> >     - counted on the node of the allocation
> >   miss
> >     - allocation was intended for another node, but landed on this one
> >     - counted on this node (the fallback node)
> >   foreign
> >     - allocation was intended for this node, but landed on another node
> >     - counted on this node (the intended node)
> >
> > Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> > in /proc/vmstat.
> 
> IMHO, it may be better to describe your workflow as an example to use
> the newly added statistics.  That can describe why we need them.  For
> example, what you have described in
> 
> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
> 
> > 1) Pressure/OOMs reported while system-wide memory is free.
> > 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
> > down node(s) under pressure. They become available in
> > /sys/devices/system/node/nodeN/vmstat.
> > 3) Check per-policy allocation counters (this patch) on that node to
> > find what policy was driving it. Same readout at nodeN/vmstat.
> > 4) Now use /proc/*/numa_maps to identify tasks using the policy.
> 
> One question.  If we have to search /proc/*/numa_maps, why can't we
> find all necessary information via /proc/*/numa_maps?  For example,
> which VMA uses the most pages on the node?  Which policy is used in the
> VMA? ...
> 

I am a little confused by this too - consider:

7f85dca86000 interleave=0,1 file=[...] mapped=14 mapmax=5 N0=3 N1=10 ...

Is N0=3 and N1=10 because we did those allocations according to the
policy but got fallbacks, or is it that way because we did 7/7 and
then things got migrated due to pressure?

Do these counters let you capture that, or does it just make the numbers
even more meaningless?

The page allocator will happily fall back to other nodes - even when a
mempolicy is present - because mempolicy is more of a suggestion than a
rule (unlike cpusets).  So I'd like to understand a little better how
these counters are intended to be used.

~Gregory


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
                   ` (2 preceding siblings ...)
  2026-03-07 19:57 ` kernel test robot
@ 2026-03-08 19:24 ` Usama Arif
  2026-03-09  3:30   ` JP Kobryn (Meta)
  2026-03-09 23:35 ` Shakeel Butt
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Usama Arif @ 2026-03-08 19:24 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: Usama Arif, linux-mm, akpm, mhocko, vbabka, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
	kernel-team

On Fri,  6 Mar 2026 20:55:20 -0800 "JP Kobryn (Meta)" <jp.kobryn@linux.dev> wrote:

> When investigating pressure on a NUMA node, there is no straightforward way
> to determine which policies are driving allocations to it.
> 
> Add per-policy page allocation counters as new node stat items. These
> counters track allocations to nodes and also whether the allocations were
> intentional or fallbacks.
> 
> The new stats follow the existing numa hit/miss/foreign style and have the
> following meanings:
> 
>   hit
>     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>     - for other policies, allocation succeeded on intended node
>     - counted on the node of the allocation
>   miss
>     - allocation intended for other node, but happened on this one
>     - counted on other node
>   foreign
>     - allocation intended on this node, but happened on other node
>     - counted on this node
> 
> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> in /proc/vmstat.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
> v2:
>   - Replaced single per-policy total counter (PGALLOC_MPOL_*) with
>     hit/miss/foreign triplet per policy
>   - Changed from global node stats to per-memcg per-node tracking
> 
> v1:
> https://lore.kernel.org/linux-mm/20260212045109.255391-2-inwardvessel@gmail.com/
> 
>  include/linux/mmzone.h | 20 ++++++++++
>  mm/memcontrol.c        | 60 ++++++++++++++++++++++++++++
>  mm/mempolicy.c         | 90 ++++++++++++++++++++++++++++++++++++++++--
>  mm/vmstat.c            | 20 ++++++++++
>  4 files changed, 187 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..c0517cbcb0e2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -323,6 +323,26 @@ enum node_stat_item {
>  	PGSCAN_ANON,
>  	PGSCAN_FILE,
>  	PGREFILL,
> +#ifdef CONFIG_NUMA
> +	NUMA_MPOL_LOCAL_HIT,
> +	NUMA_MPOL_LOCAL_MISS,
> +	NUMA_MPOL_LOCAL_FOREIGN,
> +	NUMA_MPOL_PREFERRED_HIT,
> +	NUMA_MPOL_PREFERRED_MISS,
> +	NUMA_MPOL_PREFERRED_FOREIGN,
> +	NUMA_MPOL_PREFERRED_MANY_HIT,
> +	NUMA_MPOL_PREFERRED_MANY_MISS,
> +	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
> +	NUMA_MPOL_BIND_HIT,
> +	NUMA_MPOL_BIND_MISS,
> +	NUMA_MPOL_BIND_FOREIGN,
> +	NUMA_MPOL_INTERLEAVE_HIT,
> +	NUMA_MPOL_INTERLEAVE_MISS,
> +	NUMA_MPOL_INTERLEAVE_FOREIGN,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
> +#endif
>  #ifdef CONFIG_HUGETLB_PAGE
>  	NR_HUGETLB,
>  #endif
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 982231a078f2..4d29f723a2de 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -420,6 +420,26 @@ static const unsigned int memcg_node_stat_items[] = {
>  	PGSCAN_ANON,
>  	PGSCAN_FILE,
>  	PGREFILL,
> +#ifdef CONFIG_NUMA
> +	NUMA_MPOL_LOCAL_HIT,
> +	NUMA_MPOL_LOCAL_MISS,
> +	NUMA_MPOL_LOCAL_FOREIGN,
> +	NUMA_MPOL_PREFERRED_HIT,
> +	NUMA_MPOL_PREFERRED_MISS,
> +	NUMA_MPOL_PREFERRED_FOREIGN,
> +	NUMA_MPOL_PREFERRED_MANY_HIT,
> +	NUMA_MPOL_PREFERRED_MANY_MISS,
> +	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
> +	NUMA_MPOL_BIND_HIT,
> +	NUMA_MPOL_BIND_MISS,
> +	NUMA_MPOL_BIND_FOREIGN,
> +	NUMA_MPOL_INTERLEAVE_HIT,
> +	NUMA_MPOL_INTERLEAVE_MISS,
> +	NUMA_MPOL_INTERLEAVE_FOREIGN,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
> +#endif
>  #ifdef CONFIG_HUGETLB_PAGE
>  	NR_HUGETLB,
>  #endif
> @@ -1591,6 +1611,26 @@ static const struct memory_stat memory_stats[] = {
>  #ifdef CONFIG_NUMA_BALANCING
>  	{ "pgpromote_success",		PGPROMOTE_SUCCESS	},
>  #endif
> +#ifdef CONFIG_NUMA
> +	{ "numa_mpol_local_hit",		NUMA_MPOL_LOCAL_HIT		},
> +	{ "numa_mpol_local_miss",		NUMA_MPOL_LOCAL_MISS		},
> +	{ "numa_mpol_local_foreign",		NUMA_MPOL_LOCAL_FOREIGN		},
> +	{ "numa_mpol_preferred_hit",		NUMA_MPOL_PREFERRED_HIT		},
> +	{ "numa_mpol_preferred_miss",		NUMA_MPOL_PREFERRED_MISS	},
> +	{ "numa_mpol_preferred_foreign",	NUMA_MPOL_PREFERRED_FOREIGN	},
> +	{ "numa_mpol_preferred_many_hit",	NUMA_MPOL_PREFERRED_MANY_HIT	},
> +	{ "numa_mpol_preferred_many_miss",	NUMA_MPOL_PREFERRED_MANY_MISS	},
> +	{ "numa_mpol_preferred_many_foreign",	NUMA_MPOL_PREFERRED_MANY_FOREIGN },
> +	{ "numa_mpol_bind_hit",			NUMA_MPOL_BIND_HIT		},
> +	{ "numa_mpol_bind_miss",		NUMA_MPOL_BIND_MISS		},
> +	{ "numa_mpol_bind_foreign",		NUMA_MPOL_BIND_FOREIGN		},
> +	{ "numa_mpol_interleave_hit",		NUMA_MPOL_INTERLEAVE_HIT	},
> +	{ "numa_mpol_interleave_miss",		NUMA_MPOL_INTERLEAVE_MISS	},
> +	{ "numa_mpol_interleave_foreign",	NUMA_MPOL_INTERLEAVE_FOREIGN	},
> +	{ "numa_mpol_weighted_interleave_hit",	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT },
> +	{ "numa_mpol_weighted_interleave_miss",	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS },
> +	{ "numa_mpol_weighted_interleave_foreign", NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN },
> +#endif
>  };
>  
>  /* The actual unit of the state item, not the same as the output unit */
> @@ -1642,6 +1682,26 @@ static int memcg_page_state_output_unit(int item)
>  	case PGREFILL:
>  #ifdef CONFIG_NUMA_BALANCING
>  	case PGPROMOTE_SUCCESS:
> +#endif
> +#ifdef CONFIG_NUMA
> +	case NUMA_MPOL_LOCAL_HIT:
> +	case NUMA_MPOL_LOCAL_MISS:
> +	case NUMA_MPOL_LOCAL_FOREIGN:
> +	case NUMA_MPOL_PREFERRED_HIT:
> +	case NUMA_MPOL_PREFERRED_MISS:
> +	case NUMA_MPOL_PREFERRED_FOREIGN:
> +	case NUMA_MPOL_PREFERRED_MANY_HIT:
> +	case NUMA_MPOL_PREFERRED_MANY_MISS:
> +	case NUMA_MPOL_PREFERRED_MANY_FOREIGN:
> +	case NUMA_MPOL_BIND_HIT:
> +	case NUMA_MPOL_BIND_MISS:
> +	case NUMA_MPOL_BIND_FOREIGN:
> +	case NUMA_MPOL_INTERLEAVE_HIT:
> +	case NUMA_MPOL_INTERLEAVE_MISS:
> +	case NUMA_MPOL_INTERLEAVE_FOREIGN:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN:
>  #endif
>  		return 1;
>  	default:
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0e5175f1c767..2417de75098d 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -117,6 +117,7 @@
>  #include <asm/tlb.h>
>  #include <linux/uaccess.h>
>  #include <linux/memory.h>
> +#include <linux/memcontrol.h>
>  
>  #include "internal.h"
>  
> @@ -2426,6 +2427,83 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
>  	return page;
>  }
>  
> +/*
> + * Count a mempolicy allocation. Stats are tracked per-node and per-cgroup.
> + * The following numa_{hit/miss/foreign} pattern is used:
> + *
> + *   hit
> + *     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
> + *     - for other policies, allocation succeeded on intended node
> + *     - counted on the node of the allocation
> + *   miss
> + *     - allocation intended for other node, but happened on this one
> + *     - counted on other node
> + *   foreign
> + *     - allocation intended on this node, but happened on other node
> + *     - counted on this node
> + */
> +static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
> +				  struct page *page, unsigned int order)
> +{
> +	int actual_nid = page_to_nid(page);
> +	long nr_pages = 1L << order;
> +	enum node_stat_item hit_idx;
> +	struct mem_cgroup *memcg;
> +	struct lruvec *lruvec;
> +	bool is_hit;
> +
> +	if (!root_mem_cgroup || mem_cgroup_disabled())
> +		return;

Hello JP!

The stats are exposed via /proc/vmstat and are guarded by CONFIG_NUMA, not
CONFIG_MEMCG. Returning early here would make them inaccurate. Does it
make sense to fall back to mod_node_page_state if memcg is not available,
so that these global counters work regardless of the cgroup configuration?

> +
> +	/*
> +	 * Start with hit then use +1 or +2 later on to change to miss or
> +	 * foreign respectively if needed.
> +	 */
> +	switch (pol->mode) {
> +	case MPOL_PREFERRED:
> +		hit_idx = NUMA_MPOL_PREFERRED_HIT;
> +		break;
> +	case MPOL_PREFERRED_MANY:
> +		hit_idx = NUMA_MPOL_PREFERRED_MANY_HIT;
> +		break;
> +	case MPOL_BIND:
> +		hit_idx = NUMA_MPOL_BIND_HIT;
> +		break;
> +	case MPOL_INTERLEAVE:
> +		hit_idx = NUMA_MPOL_INTERLEAVE_HIT;
> +		break;
> +	case MPOL_WEIGHTED_INTERLEAVE:
> +		hit_idx = NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT;
> +		break;
> +	default:
> +		hit_idx = NUMA_MPOL_LOCAL_HIT;
> +		break;
> +	}
> +
> +	if (pol->mode == MPOL_BIND || pol->mode == MPOL_PREFERRED_MANY)
> +		is_hit = node_isset(actual_nid, pol->nodes);
> +	else
> +		is_hit = (actual_nid == intended_nid);
> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +
> +	if (is_hit) {
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx, nr_pages);
> +	} else {
> +		/* account for miss on the fallback node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
> +
> +		/* account for foreign on the intended node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
> +	}
> +
> +	rcu_read_unlock();
> +}
> +
>  /**
>   * alloc_pages_mpol - Allocate pages according to NUMA mempolicy.
>   * @gfp: GFP flags.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-08 19:24 ` Usama Arif
@ 2026-03-09  3:30   ` JP Kobryn (Meta)
  2026-03-11 18:06     ` Johannes Weiner
  0 siblings, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-09  3:30 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang, yuanchu, ziy, kernel-team

On 3/8/26 12:24 PM, Usama Arif wrote:
> On Fri,  6 Mar 2026 20:55:20 -0800 "JP Kobryn (Meta)" <jp.kobryn@linux.dev> wrote:
[...]
>> +static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
>> +				  struct page *page, unsigned int order)
>> +{
>> +	int actual_nid = page_to_nid(page);
>> +	long nr_pages = 1L << order;
>> +	enum node_stat_item hit_idx;
>> +	struct mem_cgroup *memcg;
>> +	struct lruvec *lruvec;
>> +	bool is_hit;
>> +
>> +	if (!root_mem_cgroup || mem_cgroup_disabled())
>> +		return;
> 
> Hello JP!
> 
> The stats are exposed via /proc/vmstat and are guarded by CONFIG_NUMA, not
> CONFIG_MEMCG. Returning early here would make them inaccurate. Does it
> make sense to fall back to mod_node_page_state if memcg is not available,
> so that these global counters work regardless of the cgroup configuration?
>

Good call. I can instead do:

if (!mem_cgroup_disabled() && root_mem_cgroup) {
	struct mem_cgroup *memcg;
	struct lruvec *lruvec;
	/* use lruvec for updating stats */
} else {
	/* use node for updating stats */
}

This should also take care of the bot warning on mem_cgroup_from_task()
not being available.
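
Filled out, that split could look something like the following. This is a
hypothetical sketch based on the v2 function (only the hit leg is shown for
brevity); the else branch is the suggested mod_node_page_state fallback, and
mod_lruvec_state already folds the update into the node counters when a
memcg is present:

```c
	if (!mem_cgroup_disabled() && root_mem_cgroup) {
		struct mem_cgroup *memcg;
		struct lruvec *lruvec;

		rcu_read_lock();
		memcg = mem_cgroup_from_task(current);
		/* per-memcg path: mod_lruvec_state() also updates the node stat */
		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
		mod_lruvec_state(lruvec, hit_idx, nr_pages);
		rcu_read_unlock();
	} else {
		/* no memcg available: still bump the global per-node counter */
		mod_node_page_state(NODE_DATA(actual_nid), hit_idx, nr_pages);
	}
```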

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-08 19:20   ` Gregory Price
@ 2026-03-09  4:11     ` JP Kobryn (Meta)
  0 siblings, 0 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-09  4:11 UTC (permalink / raw)
  To: Gregory Price, Huang, Ying
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/8/26 12:20 PM, Gregory Price wrote:
> On Sat, Mar 07, 2026 at 08:27:22PM +0800, Huang, Ying wrote:
>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>
>>>
>>>    hit
>>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>      - for other policies, allocation succeeded on intended node
>>>      - counted on the node of the allocation
>>>    miss
>>>      - allocation intended for other node, but happened on this one
>>>      - counted on other node
>>>    foreign
>>>      - allocation intended on this node, but happened on other node
>>>      - counted on this node
>>>
>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>> in /proc/vmstat.
>>
>> IMHO, it may be better to describe your workflow as an example to use
>> the newly added statistics.  That can describe why we need them.  For
>> example, what you have described in
>>
>> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>>
>>> 1) Pressure/OOMs reported while system-wide memory is free.
>>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>>> down node(s) under pressure. They become available in
>>> /sys/devices/system/node/nodeN/vmstat.
>>> 3) Check per-policy allocation counters (this patch) on that node to
>>> find what policy was driving it. Same readout at nodeN/vmstat.
>>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>>
>> One question.  If we have to search /proc/*/numa_maps, why can't we
>> find all necessary information via /proc/*/numa_maps?  For example,
>> which VMA uses the most pages on the node?  Which policy is used in the
>> VMA? ...
>>
> 
> I am a little confused by this too - consider:
> 
> 7f85dca86000 interleave=0,1 file=[...] mapped=14 mapmax=5 N0=3 N1=10 ...
> 
> Is N0=3 and N1=10 because we did those allocations according to the
> policy but got fallbacks, or is it that way because we did 7/7 and
> then things got migrated due to pressure?

That ambiguity should be resolved with this patch.

> 
> Do these counters let you capture that, or does it just make the numbers
> even more meaningless?

You would be able to look at the new counters and see that the
allocations were distributed evenly at the time of allocation. If an
imbalance is observed afterward, we would know that it was due to
migration.

> 
> The page allocator will happily fallback to other nodes - even when a
> mempolicy is present - because mempolicy is more of a suggestion rather
> than a rule (unlike cpusets).  So I'd like to understand how these
> counters are intended to be used a little better.

That was the motivation for v2. In the previous rev, there was debate
about the lack of accounting for the fallback cases. So in this patch we
account for the fallbacks by making use of miss/foreign. In terms of how
the counters are intended to be used, the workflow would resemble:

1) Pressure/OOMs reported while system-wide memory is free.
2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
    down node(s) under pressure.
3) Check per-policy hit/miss/foreign counters (added by this patch) on
    node(s) to see what policy is driving allocations there (intentional
    vs fallback).
4) Use /proc/*/numa_maps to identify tasks using the policy.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07 12:27 ` Huang, Ying
  2026-03-08 19:20   ` Gregory Price
@ 2026-03-09  4:31   ` JP Kobryn (Meta)
  2026-03-11  2:56     ` Huang, Ying
  1 sibling, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-09  4:31 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/7/26 4:27 AM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>> When investigating pressure on a NUMA node, there is no straightforward way
>> to determine which policies are driving allocations to it.
>>
>> Add per-policy page allocation counters as new node stat items. These
>> counters track allocations to nodes and also whether the allocations were
>> intentional or fallbacks.
>>
>> The new stats follow the existing numa hit/miss/foreign style and have the
>> following meanings:
>>
>>    hit
>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>      - for other policies, allocation succeeded on intended node
>>      - counted on the node of the allocation
>>    miss
>>      - allocation intended for other node, but happened on this one
>>      - counted on other node
>>    foreign
>>      - allocation intended on this node, but happened on other node
>>      - counted on this node
>>
>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>> in /proc/vmstat.
> 
> IMHO, it may be better to describe your workflow as an example to use
> the newly added statistics.  That can describe why we need them.  For
> example, what you have described in
> 
> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
> 
>> 1) Pressure/OOMs reported while system-wide memory is free.
>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>> down node(s) under pressure. They become available in
>> /sys/devices/system/node/nodeN/vmstat.
>> 3) Check per-policy allocation counters (this patch) on that node to
>> find what policy was driving it. Same readout at nodeN/vmstat.
>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
> 

Good call. I'll add a workflow adapted for the current approach in
the next revision. I included it in another response in this thread, but
I'll repeat here because it will make it easier to answer your question
below.

1) Pressure/OOMs reported while system-wide memory is free.
2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
    down node(s) under pressure.
3) Check per-policy hit/miss/foreign counters (added by this patch) on
    node(s) to see what policy is driving allocations there (intentional
    vs fallback).
4) Use /proc/*/numa_maps to identify tasks using the policy.

> One question.  If we have to search /proc/*/numa_maps, why can't we
> find all necessary information via /proc/*/numa_maps?  For example,
> which VMA uses the most pages on the node?  Which policy is used in the
> VMA? ...
> 

There's a gap in the flow of information if we go straight from a node
in question to numa_maps. Without step 3 above, we can't distinguish
whether pages landed there intentionally, as a fallback, or were
migrated sometime after the allocation. These new counters track the
results of allocations at the time they happen, preserving that
information regardless of what may happen later on.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
                   ` (3 preceding siblings ...)
  2026-03-08 19:24 ` Usama Arif
@ 2026-03-09 23:35 ` Shakeel Butt
  2026-03-09 23:43 ` Shakeel Butt
  2026-03-12 13:40 ` Vlastimil Babka (SUSE)
  6 siblings, 0 replies; 30+ messages in thread
From: Shakeel Butt @ 2026-03-09 23:35 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	surenb, virtualization, weixugc, xuanzhuo, ying.huang, yuanchu,
	ziy, kernel-team

On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
> When investigating pressure on a NUMA node, there is no straightforward way
> to determine which policies are driving allocations to it.
> 
> Add per-policy page allocation counters as new node stat items. These
> counters track allocations to nodes and also whether the allocations were
> intentional or fallbacks.
> 
> The new stats follow the existing numa hit/miss/foreign style and have the
> following meanings:
> 
>   hit
>     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>     - for other policies, allocation succeeded on intended node
>     - counted on the node of the allocation
>   miss
>     - allocation intended for other node, but happened on this one
>     - counted on other node
>   foreign
>     - allocation intended on this node, but happened on other node
>     - counted on this node
> 
> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> in /proc/vmstat.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
> v2:
>   - Replaced single per-policy total counter (PGALLOC_MPOL_*) with
>     hit/miss/foreign triplet per policy
>   - Changed from global node stats to per-memcg per-node tracking
> 
> v1:
> https://lore.kernel.org/linux-mm/20260212045109.255391-2-inwardvessel@gmail.com/
> 
>  include/linux/mmzone.h | 20 ++++++++++
>  mm/memcontrol.c        | 60 ++++++++++++++++++++++++++++
>  mm/mempolicy.c         | 90 ++++++++++++++++++++++++++++++++++++++++--
>  mm/vmstat.c            | 20 ++++++++++
>  4 files changed, 187 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..c0517cbcb0e2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -323,6 +323,26 @@ enum node_stat_item {
>  	PGSCAN_ANON,
>  	PGSCAN_FILE,
>  	PGREFILL,
> +#ifdef CONFIG_NUMA
> +	NUMA_MPOL_LOCAL_HIT,
> +	NUMA_MPOL_LOCAL_MISS,
> +	NUMA_MPOL_LOCAL_FOREIGN,
> +	NUMA_MPOL_PREFERRED_HIT,
> +	NUMA_MPOL_PREFERRED_MISS,
> +	NUMA_MPOL_PREFERRED_FOREIGN,
> +	NUMA_MPOL_PREFERRED_MANY_HIT,
> +	NUMA_MPOL_PREFERRED_MANY_MISS,
> +	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
> +	NUMA_MPOL_BIND_HIT,
> +	NUMA_MPOL_BIND_MISS,
> +	NUMA_MPOL_BIND_FOREIGN,
> +	NUMA_MPOL_INTERLEAVE_HIT,
> +	NUMA_MPOL_INTERLEAVE_MISS,
> +	NUMA_MPOL_INTERLEAVE_FOREIGN,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
> +#endif

I have not looked into what these metrics mean but these are too many, at least
for the memcg. For the memcg, there is significant memory cost for each metric
added to memcg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
                   ` (4 preceding siblings ...)
  2026-03-09 23:35 ` Shakeel Butt
@ 2026-03-09 23:43 ` Shakeel Butt
  2026-03-10  4:17   ` JP Kobryn (Meta)
  2026-03-12 13:40 ` Vlastimil Babka (SUSE)
  6 siblings, 1 reply; 30+ messages in thread
From: Shakeel Butt @ 2026-03-09 23:43 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	surenb, virtualization, weixugc, xuanzhuo, ying.huang, yuanchu,
	ziy, kernel-team

On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
> When investigating pressure on a NUMA node, there is no straightforward way
> to determine which policies are driving allocations to it.
> 
> Add per-policy page allocation counters as new node stat items. These
> counters track allocations to nodes and also whether the allocations were
> intentional or fallbacks.
> 
> The new stats follow the existing numa hit/miss/foreign style and have the
> following meanings:
> 
>   hit
>     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>     - for other policies, allocation succeeded on intended node
>     - counted on the node of the allocation
>   miss
>     - allocation intended for other node, but happened on this one
>     - counted on other node
>   foreign
>     - allocation intended on this node, but happened on other node
>     - counted on this node
> 
> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> in /proc/vmstat.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>

[...]

> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +
> +	if (is_hit) {
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx, nr_pages);
> +	} else {
> +		/* account for miss on the fallback node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
> +
> +		/* account for foreign on the intended node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
> +	}

This seems like monotonic increasing metrics and I think you don't care about
their absolute value but rather rate of change. Any reason this can not be
achieved through tracepoints and BPF combination?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-09 23:43 ` Shakeel Butt
@ 2026-03-10  4:17   ` JP Kobryn (Meta)
  2026-03-10 14:53     ` Shakeel Butt
  0 siblings, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-10  4:17 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	surenb, virtualization, weixugc, xuanzhuo, ying.huang, yuanchu,
	ziy, kernel-team

On 3/9/26 4:43 PM, Shakeel Butt wrote:
> On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
>> When investigating pressure on a NUMA node, there is no straightforward way
>> to determine which policies are driving allocations to it.
>>
>> Add per-policy page allocation counters as new node stat items. These
>> counters track allocations to nodes and also whether the allocations were
>> intentional or fallbacks.
>>
>> The new stats follow the existing numa hit/miss/foreign style and have the
>> following meanings:
>>
>>    hit
>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>      - for other policies, allocation succeeded on intended node
>>      - counted on the node of the allocation
>>    miss
>>      - allocation intended for other node, but happened on this one
>>      - counted on other node
>>    foreign
>>      - allocation intended on this node, but happened on other node
>>      - counted on this node
>>
>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>> in /proc/vmstat.
>>
>> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> 
> [...]
> 
>> +
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(current);
>> +
>> +	if (is_hit) {
>> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
>> +		mod_lruvec_state(lruvec, hit_idx, nr_pages);
>> +	} else {
>> +		/* account for miss on the fallback node */
>> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
>> +		mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
>> +
>> +		/* account for foreign on the intended node */
>> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
>> +		mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
>> +	}
> 
> This seems like monotonic increasing metrics and I think you don't care about
> their absolute value but rather rate of change. Any reason this can not be
> achieved through tracepoints and BPF combination?

We have the per-node reclaim stats (pg{steal,scan,refill}) in
nodeN/vmstat and memory.numa_stat now. The new stats in this patch would
be collected from the same source. They were meant to be used together,
so it seemed like a reasonable location. I think the advantage over
tracepoints is that the observability is on from the start, and it would
be simple to extend existing programs that already read stats from the
cgroup directory files.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-10  4:17   ` JP Kobryn (Meta)
@ 2026-03-10 14:53     ` Shakeel Butt
  2026-03-10 17:01       ` JP Kobryn (Meta)
  0 siblings, 1 reply; 30+ messages in thread
From: Shakeel Butt @ 2026-03-10 14:53 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	surenb, virtualization, weixugc, xuanzhuo, ying.huang, yuanchu,
	ziy, kernel-team

On Mon, Mar 09, 2026 at 09:17:43PM -0700, JP Kobryn (Meta) wrote:
> On 3/9/26 4:43 PM, Shakeel Butt wrote:
> > On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
[...]
> > 
> > This seems like monotonic increasing metrics and I think you don't care about
> > their absolute value but rather rate of change. Any reason this can not be
> > achieved through tracepoints and BPF combination?
> 
> We have the per-node reclaim stats (pg{steal,scan,refill}) in
> nodeN/vmstat and memory.numa_stat now. The new stats in this patch would
> be collected from the same source. They were meant to be used together,
> so it seemed like a reasonable location. I think the advantage over
> tracepoints is we get the observability on from the start and it would
> be simple to extend existing programs that already read stats from the
> cgroup dir files.

Convenience does not really justify the cost of adding 18 counters,
particularly in memcg. We can argue about adding them just as system-level
metrics, but not for memcg.

counter_cost = nr_cpus * nr_nodes * nr_memcg * 16 (struct lruvec_stats_percpu)

On a typical prod machine, we can see 1000s of memcgs, 100s of cpus and a
couple of NUMA nodes, so a single counter's cost can range from 200KiB to
MiBs. This does not seem like a cost we should force everyone to pay.

If you really want these per-memcg, and assuming these metrics are updated
in a non-performance-critical path, we can try to decouple these and other
reclaim-related stats from the rstat infra. That would at least reduce the
nr_cpus factor in the above equation to 1, though we will need to actually
evaluate the performance of the change before committing to it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-10 14:53     ` Shakeel Butt
@ 2026-03-10 17:01       ` JP Kobryn (Meta)
  0 siblings, 0 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-10 17:01 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	surenb, virtualization, weixugc, xuanzhuo, ying.huang, yuanchu,
	ziy, kernel-team

On 3/10/26 7:53 AM, Shakeel Butt wrote:
> On Mon, Mar 09, 2026 at 09:17:43PM -0700, JP Kobryn (Meta) wrote:
>> On 3/9/26 4:43 PM, Shakeel Butt wrote:
>>> On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
> [...]
>>>
>>> This seems like monotonic increasing metrics and I think you don't care about
>>> their absolute value but rather rate of change. Any reason this can not be
>>> achieved through tracepoints and BPF combination?
>>
>> We have the per-node reclaim stats (pg{steal,scan,refill}) in
>> nodeN/vmstat and memory.numa_stat now. The new stats in this patch would
>> be collected from the same source. They were meant to be used together,
>> so it seemed like a reasonable location. I think the advantage over
>> tracepoints is we get the observability on from the start and it would
>> be simple to extend existing programs that already read stats from the
>> cgroup dir files.
> 
> Convenience is not really justifying the cost of adding 18 counters,
> particularly in memcg. We can argue about adding just in system level metrics
> but not for memcg.
> 
> counter_cost = nr_cpus * nr_nodes * nr_memcg * 16 (struct lruvec_stats_percpu)
> 
> On a typical prod machine, we can see 1000s of memcg, 100s of cpus and couple of
> numa nodes. So, a single counter's cost can range from 200KiB to MiBs. This does
> not seem like a cost we should force everyone to pay.
> 
> If you really want these per-memcg and assuming these metrics are updated in
> non-performance critical path, we can try to decouple these and other reclaim
> related stats from rstat infra. That would at least reduce nr_cpus factor in the
> above equation to 1. Though we will need to actually evaluate the performance
> for the change before committing to it.

I could trade off the per-cgroup granularity and change these stats to
become global per-node stats.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-09  4:31   ` JP Kobryn (Meta)
@ 2026-03-11  2:56     ` Huang, Ying
  2026-03-11 17:31       ` JP Kobryn (Meta)
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2026-03-11  2:56 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

"JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:

> On 3/7/26 4:27 AM, Huang, Ying wrote:
>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>> 
>>> When investigating pressure on a NUMA node, there is no straightforward way
>>> to determine which policies are driving allocations to it.
>>>
>>> Add per-policy page allocation counters as new node stat items. These
>>> counters track allocations to nodes and also whether the allocations were
>>> intentional or fallbacks.
>>>
>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>> following meanings:
>>>
>>>    hit
>>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>      - for other policies, allocation succeeded on intended node
>>>      - counted on the node of the allocation
>>>    miss
>>>      - allocation intended for other node, but happened on this one
>>>      - counted on other node
>>>    foreign
>>>      - allocation intended on this node, but happened on other node
>>>      - counted on this node
>>>
>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>> in /proc/vmstat.
>> IMHO, it may be better to describe your workflow as an example of how
>> to use the newly added statistics.  That can describe why we need
>> them.  For
>> example, what you have described in
>> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>> 
>>> 1) Pressure/OOMs reported while system-wide memory is free.
>>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>>> down node(s) under pressure. They become available in
>>> /sys/devices/system/node/nodeN/vmstat.
>>> 3) Check per-policy allocation counters (this patch) on that node to
>>> find what policy was driving it. Same readout at nodeN/vmstat.
>>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>> 
>
> Good call. I'll add a workflow adapted for the current approach in
> the next revision. I included it in another response in this thread, but
> I'll repeat here because it will make it easier to answer your question
> below.
>
> 1) Pressure/OOMs reported while system-wide memory is free.
> 2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
>    down node(s) under pressure.
> 3) Check per-policy hit/miss/foreign counters (added by this patch) on
>    node(s) to see what policy is driving allocations there (intentional
>    vs fallback).
> 4) Use /proc/*/numa_maps to identify tasks using the policy.
>
>> One question.  If we have to search /proc/*/numa_maps, why can't we
>> find all necessary information via /proc/*/numa_maps?  For example,
>> which VMA uses the most pages on the node?  Which policy is used in the
>> VMA? ...
>> 
>
> There's a gap in the flow of information if we go straight from a node
> in question to numa_maps. Without step 3 above, we can't distinguish
> whether pages landed there intentionally, as a fallback, or were
> migrated sometime after the allocation. These new counters track the
> results of allocations at the time they happen, preserving that
> information regardless of what may happen later on.

Sorry for late reply.

IMHO, step 3) doesn't add much to the flow.  It only counts allocations,
not migrations, freeing, etc.  I'm afraid that it may be misleading.  For
example, a lot of pages may have been allocated with a mempolicy and then
freed, yet the counters would still reflect them.  /proc/*/numa_maps is a
more useful stat for the goal.  To get all necessary information, I think
that more thorough tracing is necessary.

---
Best Regards,
Huang, Ying


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-11  2:56     ` Huang, Ying
@ 2026-03-11 17:31       ` JP Kobryn (Meta)
  0 siblings, 0 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-11 17:31 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, akpm, mhocko, vbabka, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/10/26 7:56 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>> On 3/7/26 4:27 AM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>
>>>> When investigating pressure on a NUMA node, there is no straightforward way
>>>> to determine which policies are driving allocations to it.
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters track allocations to nodes and also whether the allocations were
>>>> intentional or fallbacks.
>>>>
>>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>>> following meanings:
>>>>
>>>>     hit
>>>>       - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>>       - for other policies, allocation succeeded on intended node
>>>>       - counted on the node of the allocation
>>>>     miss
>>>>       - allocation intended for other node, but happened on this one
>>>>       - counted on other node
>>>>     foreign
>>>>       - allocation intended on this node, but happened on other node
>>>>       - counted on this node
>>>>
>>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>>> in /proc/vmstat.
>>> IMHO, it may be better to describe your workflow as an example of how
>>> to use the newly added statistics.  That can describe why we need
>>> them.  For
>>> example, what you have described in
>>> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>>>
>>>> 1) Pressure/OOMs reported while system-wide memory is free.
>>>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>>>> down node(s) under pressure. They become available in
>>>> /sys/devices/system/node/nodeN/vmstat.
>>>> 3) Check per-policy allocation counters (this patch) on that node to
>>>> find what policy was driving it. Same readout at nodeN/vmstat.
>>>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>>>
>>
>> Good call. I'll add a workflow adapted for the current approach in
>> the next revision. I included it in another response in this thread, but
>> I'll repeat here because it will make it easier to answer your question
>> below.
>>
>> 1) Pressure/OOMs reported while system-wide memory is free.
>> 2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
>>     down node(s) under pressure.
>> 3) Check per-policy hit/miss/foreign counters (added by this patch) on
>>     node(s) to see what policy is driving allocations there (intentional
>>     vs fallback).
>> 4) Use /proc/*/numa_maps to identify tasks using the policy.
>>
>>> One question.  If we have to search /proc/*/numa_maps, why can't we
>>> find all necessary information via /proc/*/numa_maps?  For example,
>>> which VMA uses the most pages on the node?  Which policy is used in the
>>> VMA? ...
>>>
>>
>> There's a gap in the flow of information if we go straight from a node
>> in question to numa_maps. Without step 3 above, we can't distinguish
>> whether pages landed there intentionally, as a fallback, or were
>> migrated sometime after the allocation. These new counters track the
>> results of allocations at the time they happen, preserving that
>> information regardless of what may happen later on.
> 
> Sorry for late reply.
> 
> IMHO, step 3) doesn't add much to the flow.  It only counts allocations,
> not migrations, freeing, etc.

By that logic, existing stats like numa_hit/miss would be undermined as
well, since they also count only allocations.

> I'm afraid that it may be misleading.  For
> example, a lot of pages may have been allocated with a mempolicy and then
> freed, yet the counters would still reflect them.  /proc/*/numa_maps is a
> more useful stat for the goal.

numa_maps only shows live snapshots with no attribution. Even if we
tracked them over time, there's no way to determine whether the
allocations exist as a result of a policy decision.

> To get all necessary information, I think that more thorough
> tracing is necessary.

Tracking other sources of pages on a node (migration, etc) is
beyond the goal of this patch.


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-09  3:30   ` JP Kobryn (Meta)
@ 2026-03-11 18:06     ` Johannes Weiner
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2026-03-11 18:06 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: Usama Arif, linux-mm, akpm, mhocko, vbabka, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
	kernel-team

On Sun, Mar 08, 2026 at 08:30:47PM -0700, JP Kobryn (Meta) wrote:
> On 3/8/26 12:24 PM, Usama Arif wrote:
> > On Fri,  6 Mar 2026 20:55:20 -0800 "JP Kobryn (Meta)" <jp.kobryn@linux.dev> wrote:
> [...]
> >> +static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
> >> +				  struct page *page, unsigned int order)
> >> +{
> >> +	int actual_nid = page_to_nid(page);
> >> +	long nr_pages = 1L << order;
> >> +	enum node_stat_item hit_idx;
> >> +	struct mem_cgroup *memcg;
> >> +	struct lruvec *lruvec;
> >> +	bool is_hit;
> >> +
> >> +	if (!root_mem_cgroup || mem_cgroup_disabled())
> >> +		return;
> > 
> > Hello JP!
> > 
> > The stats are exposed via /proc/vmstat and are guarded by CONFIG_NUMA, not
> > CONFIG_MEMCG. Early returning over here would make it inaccurate. Does
> > it make sense to use mod_node_page_state if memcg is not available,
> > so that these global counters work regardless of cgroup configuration?
> >
> 
> Good call. I can instead do:
> 
> if (!mem_cgroup_disabled() && root_mem_cgroup) {
> 	struct mem_cgroup *memcg;
> 	struct lruvec *lruvec;
> 	/* use lruvec for updating stats */
> } else {
> 	/* use node for updating stats */
> }
> 
> This should also take care of the bot warning on mem_cgroup_from_task()
> not being available.

mem_cgroup_lruvec() and mod_lruvec_state() already do the right thing
for !CONFIG_MEMCG. Add a dummy for mem_cgroup_from_task() and you can
do a single, shared sequence for both configs.


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
                   ` (5 preceding siblings ...)
  2026-03-09 23:43 ` Shakeel Butt
@ 2026-03-12 13:40 ` Vlastimil Babka (SUSE)
  2026-03-12 16:13   ` JP Kobryn (Meta)
  6 siblings, 1 reply; 30+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-03-12 13:40 UTC (permalink / raw)
  To: JP Kobryn (Meta), linux-mm, akpm, mhocko
  Cc: apopple, axelrasmussen, byungchul, cgroups, david, eperezma,
	gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
	linux-kernel, lorenzo.stoakes, matthew.brost, mst, rppt,
	muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang, yuanchu, ziy, kernel-team

On 3/7/26 05:55, JP Kobryn (Meta) wrote:
> When investigating pressure on a NUMA node, there is no straightforward way
> to determine which policies are driving allocations to it.
> 
> Add per-policy page allocation counters as new node stat items. These
> counters track allocations to nodes and also whether the allocations were
> intentional or fallbacks.
> 
> The new stats follow the existing numa hit/miss/foreign style and have the
> following meanings:
> 
>   hit
>     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>     - for other policies, allocation succeeded on intended node
>     - counted on the node of the allocation
>   miss
>     - allocation intended for other node, but happened on this one
>     - counted on other node
>   foreign
>     - allocation intended on this node, but happened on other node
>     - counted on this node
> 
> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
> in /proc/vmstat.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>

I think I've been one of the folks on previous versions arguing against the
many counters, and one of the arguments was that they can't tell the full
story anyway (compared to e.g. tracing), but I don't think adding even more
counters is the right solution. Seems like a number of other people
responding to the thread are providing similar feedback.

For example, I'm still not sure how it would help me if I knew the
hits/misses were due to a preferred vs preferred_many policy, or interleave
vs weighted interleave?

> ---
> v2:
>   - Replaced single per-policy total counter (PGALLOC_MPOL_*) with
>     hit/miss/foreign triplet per policy
>   - Changed from global node stats to per-memcg per-node tracking
> 
> v1:
> https://lore.kernel.org/linux-mm/20260212045109.255391-2-inwardvessel@gmail.com/
> 
>  include/linux/mmzone.h | 20 ++++++++++
>  mm/memcontrol.c        | 60 ++++++++++++++++++++++++++++
>  mm/mempolicy.c         | 90 ++++++++++++++++++++++++++++++++++++++++--
>  mm/vmstat.c            | 20 ++++++++++
>  4 files changed, 187 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..c0517cbcb0e2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -323,6 +323,26 @@ enum node_stat_item {
>  	PGSCAN_ANON,
>  	PGSCAN_FILE,
>  	PGREFILL,
> +#ifdef CONFIG_NUMA
> +	NUMA_MPOL_LOCAL_HIT,
> +	NUMA_MPOL_LOCAL_MISS,
> +	NUMA_MPOL_LOCAL_FOREIGN,
> +	NUMA_MPOL_PREFERRED_HIT,
> +	NUMA_MPOL_PREFERRED_MISS,
> +	NUMA_MPOL_PREFERRED_FOREIGN,
> +	NUMA_MPOL_PREFERRED_MANY_HIT,
> +	NUMA_MPOL_PREFERRED_MANY_MISS,
> +	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
> +	NUMA_MPOL_BIND_HIT,
> +	NUMA_MPOL_BIND_MISS,
> +	NUMA_MPOL_BIND_FOREIGN,
> +	NUMA_MPOL_INTERLEAVE_HIT,
> +	NUMA_MPOL_INTERLEAVE_MISS,
> +	NUMA_MPOL_INTERLEAVE_FOREIGN,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
> +#endif
>  #ifdef CONFIG_HUGETLB_PAGE
>  	NR_HUGETLB,
>  #endif
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 982231a078f2..4d29f723a2de 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -420,6 +420,26 @@ static const unsigned int memcg_node_stat_items[] = {
>  	PGSCAN_ANON,
>  	PGSCAN_FILE,
>  	PGREFILL,
> +#ifdef CONFIG_NUMA
> +	NUMA_MPOL_LOCAL_HIT,
> +	NUMA_MPOL_LOCAL_MISS,
> +	NUMA_MPOL_LOCAL_FOREIGN,
> +	NUMA_MPOL_PREFERRED_HIT,
> +	NUMA_MPOL_PREFERRED_MISS,
> +	NUMA_MPOL_PREFERRED_FOREIGN,
> +	NUMA_MPOL_PREFERRED_MANY_HIT,
> +	NUMA_MPOL_PREFERRED_MANY_MISS,
> +	NUMA_MPOL_PREFERRED_MANY_FOREIGN,
> +	NUMA_MPOL_BIND_HIT,
> +	NUMA_MPOL_BIND_MISS,
> +	NUMA_MPOL_BIND_FOREIGN,
> +	NUMA_MPOL_INTERLEAVE_HIT,
> +	NUMA_MPOL_INTERLEAVE_MISS,
> +	NUMA_MPOL_INTERLEAVE_FOREIGN,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS,
> +	NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN,
> +#endif
>  #ifdef CONFIG_HUGETLB_PAGE
>  	NR_HUGETLB,
>  #endif
> @@ -1591,6 +1611,26 @@ static const struct memory_stat memory_stats[] = {
>  #ifdef CONFIG_NUMA_BALANCING
>  	{ "pgpromote_success",		PGPROMOTE_SUCCESS	},
>  #endif
> +#ifdef CONFIG_NUMA
> +	{ "numa_mpol_local_hit",		NUMA_MPOL_LOCAL_HIT		},
> +	{ "numa_mpol_local_miss",		NUMA_MPOL_LOCAL_MISS		},
> +	{ "numa_mpol_local_foreign",		NUMA_MPOL_LOCAL_FOREIGN		},
> +	{ "numa_mpol_preferred_hit",		NUMA_MPOL_PREFERRED_HIT		},
> +	{ "numa_mpol_preferred_miss",		NUMA_MPOL_PREFERRED_MISS	},
> +	{ "numa_mpol_preferred_foreign",	NUMA_MPOL_PREFERRED_FOREIGN	},
> +	{ "numa_mpol_preferred_many_hit",	NUMA_MPOL_PREFERRED_MANY_HIT	},
> +	{ "numa_mpol_preferred_many_miss",	NUMA_MPOL_PREFERRED_MANY_MISS	},
> +	{ "numa_mpol_preferred_many_foreign",	NUMA_MPOL_PREFERRED_MANY_FOREIGN },
> +	{ "numa_mpol_bind_hit",			NUMA_MPOL_BIND_HIT		},
> +	{ "numa_mpol_bind_miss",		NUMA_MPOL_BIND_MISS		},
> +	{ "numa_mpol_bind_foreign",		NUMA_MPOL_BIND_FOREIGN		},
> +	{ "numa_mpol_interleave_hit",		NUMA_MPOL_INTERLEAVE_HIT	},
> +	{ "numa_mpol_interleave_miss",		NUMA_MPOL_INTERLEAVE_MISS	},
> +	{ "numa_mpol_interleave_foreign",	NUMA_MPOL_INTERLEAVE_FOREIGN	},
> +	{ "numa_mpol_weighted_interleave_hit",	NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT },
> +	{ "numa_mpol_weighted_interleave_miss",	NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS },
> +	{ "numa_mpol_weighted_interleave_foreign", NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN },
> +#endif
>  };
>  
>  /* The actual unit of the state item, not the same as the output unit */
> @@ -1642,6 +1682,26 @@ static int memcg_page_state_output_unit(int item)
>  	case PGREFILL:
>  #ifdef CONFIG_NUMA_BALANCING
>  	case PGPROMOTE_SUCCESS:
> +#endif
> +#ifdef CONFIG_NUMA
> +	case NUMA_MPOL_LOCAL_HIT:
> +	case NUMA_MPOL_LOCAL_MISS:
> +	case NUMA_MPOL_LOCAL_FOREIGN:
> +	case NUMA_MPOL_PREFERRED_HIT:
> +	case NUMA_MPOL_PREFERRED_MISS:
> +	case NUMA_MPOL_PREFERRED_FOREIGN:
> +	case NUMA_MPOL_PREFERRED_MANY_HIT:
> +	case NUMA_MPOL_PREFERRED_MANY_MISS:
> +	case NUMA_MPOL_PREFERRED_MANY_FOREIGN:
> +	case NUMA_MPOL_BIND_HIT:
> +	case NUMA_MPOL_BIND_MISS:
> +	case NUMA_MPOL_BIND_FOREIGN:
> +	case NUMA_MPOL_INTERLEAVE_HIT:
> +	case NUMA_MPOL_INTERLEAVE_MISS:
> +	case NUMA_MPOL_INTERLEAVE_FOREIGN:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS:
> +	case NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN:
>  #endif
>  		return 1;
>  	default:
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0e5175f1c767..2417de75098d 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -117,6 +117,7 @@
>  #include <asm/tlb.h>
>  #include <linux/uaccess.h>
>  #include <linux/memory.h>
> +#include <linux/memcontrol.h>
>  
>  #include "internal.h"
>  
> @@ -2426,6 +2427,83 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
>  	return page;
>  }
>  
> +/*
> + * Count a mempolicy allocation. Stats are tracked per-node and per-cgroup.
> + * The following numa_{hit/miss/foreign} pattern is used:
> + *
> + *   hit
> + *     - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
> + *     - for other policies, allocation succeeded on intended node
> + *     - counted on the node of the allocation
> + *   miss
> + *     - allocation intended for other node, but happened on this one
> + *     - counted on other node
> + *   foreign
> + *     - allocation intended on this node, but happened on other node
> + *     - counted on this node
> + */
> +static void mpol_count_numa_alloc(struct mempolicy *pol, int intended_nid,
> +				  struct page *page, unsigned int order)
> +{
> +	int actual_nid = page_to_nid(page);
> +	long nr_pages = 1L << order;
> +	enum node_stat_item hit_idx;
> +	struct mem_cgroup *memcg;
> +	struct lruvec *lruvec;
> +	bool is_hit;
> +
> +	if (!root_mem_cgroup || mem_cgroup_disabled())
> +		return;
> +
> +	/*
> +	 * Start with hit then use +1 or +2 later on to change to miss or
> +	 * foreign respectively if needed.
> +	 */
> +	switch (pol->mode) {
> +	case MPOL_PREFERRED:
> +		hit_idx = NUMA_MPOL_PREFERRED_HIT;
> +		break;
> +	case MPOL_PREFERRED_MANY:
> +		hit_idx = NUMA_MPOL_PREFERRED_MANY_HIT;
> +		break;
> +	case MPOL_BIND:
> +		hit_idx = NUMA_MPOL_BIND_HIT;
> +		break;
> +	case MPOL_INTERLEAVE:
> +		hit_idx = NUMA_MPOL_INTERLEAVE_HIT;
> +		break;
> +	case MPOL_WEIGHTED_INTERLEAVE:
> +		hit_idx = NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT;
> +		break;
> +	default:
> +		hit_idx = NUMA_MPOL_LOCAL_HIT;
> +		break;
> +	}
> +
> +	if (pol->mode == MPOL_BIND || pol->mode == MPOL_PREFERRED_MANY)
> +		is_hit = node_isset(actual_nid, pol->nodes);
> +	else
> +		is_hit = (actual_nid == intended_nid);
> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +
> +	if (is_hit) {
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx, nr_pages);
> +	} else {
> +		/* account for miss on the fallback node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(actual_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 1, nr_pages);
> +
> +		/* account for foreign on the intended node */
> +		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(intended_nid));
> +		mod_lruvec_state(lruvec, hit_idx + 2, nr_pages);
> +	}
> +
> +	rcu_read_unlock();
> +}
> +
>  /**
>   * alloc_pages_mpol - Allocate pages according to NUMA mempolicy.
>   * @gfp: GFP flags.
> @@ -2444,8 +2522,10 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
>  
>  	nodemask = policy_nodemask(gfp, pol, ilx, &nid);
>  
> -	if (pol->mode == MPOL_PREFERRED_MANY)
> -		return alloc_pages_preferred_many(gfp, order, nid, nodemask);
> +	if (pol->mode == MPOL_PREFERRED_MANY) {
> +		page = alloc_pages_preferred_many(gfp, order, nid, nodemask);
> +		goto out;
> +	}
>  
>  	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>  	    /* filter "hugepage" allocation, unless from alloc_pages() */
> @@ -2471,7 +2551,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
>  				gfp | __GFP_THISNODE | __GFP_NORETRY, order,
>  				nid, NULL);
>  			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
> -				return page;
> +				goto out;
>  			/*
>  			 * If hugepage allocations are configured to always
>  			 * synchronous compact or the vma has been madvised
> @@ -2494,6 +2574,10 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
>  		}
>  	}
>  
> +out:
> +	if (page)
> +		mpol_count_numa_alloc(pol, nid, page, order);
> +
>  	return page;
>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b33097ab9bc8..d9f745831624 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1291,6 +1291,26 @@ const char * const vmstat_text[] = {
>  	[I(PGSCAN_ANON)]			= "pgscan_anon",
>  	[I(PGSCAN_FILE)]			= "pgscan_file",
>  	[I(PGREFILL)]				= "pgrefill",
> +#ifdef CONFIG_NUMA
> +	[I(NUMA_MPOL_LOCAL_HIT)]		= "numa_mpol_local_hit",
> +	[I(NUMA_MPOL_LOCAL_MISS)]		= "numa_mpol_local_miss",
> +	[I(NUMA_MPOL_LOCAL_FOREIGN)]		= "numa_mpol_local_foreign",
> +	[I(NUMA_MPOL_PREFERRED_HIT)]		= "numa_mpol_preferred_hit",
> +	[I(NUMA_MPOL_PREFERRED_MISS)]		= "numa_mpol_preferred_miss",
> +	[I(NUMA_MPOL_PREFERRED_FOREIGN)]	= "numa_mpol_preferred_foreign",
> +	[I(NUMA_MPOL_PREFERRED_MANY_HIT)]	= "numa_mpol_preferred_many_hit",
> +	[I(NUMA_MPOL_PREFERRED_MANY_MISS)]	= "numa_mpol_preferred_many_miss",
> +	[I(NUMA_MPOL_PREFERRED_MANY_FOREIGN)]	= "numa_mpol_preferred_many_foreign",
> +	[I(NUMA_MPOL_BIND_HIT)]			= "numa_mpol_bind_hit",
> +	[I(NUMA_MPOL_BIND_MISS)]		= "numa_mpol_bind_miss",
> +	[I(NUMA_MPOL_BIND_FOREIGN)]		= "numa_mpol_bind_foreign",
> +	[I(NUMA_MPOL_INTERLEAVE_HIT)]		= "numa_mpol_interleave_hit",
> +	[I(NUMA_MPOL_INTERLEAVE_MISS)]		= "numa_mpol_interleave_miss",
> +	[I(NUMA_MPOL_INTERLEAVE_FOREIGN)]	= "numa_mpol_interleave_foreign",
> +	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_HIT)]	= "numa_mpol_weighted_interleave_hit",
> +	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_MISS)]	= "numa_mpol_weighted_interleave_miss",
> +	[I(NUMA_MPOL_WEIGHTED_INTERLEAVE_FOREIGN)] = "numa_mpol_weighted_interleave_foreign",
> +#endif
>  #ifdef CONFIG_HUGETLB_PAGE
>  	[I(NR_HUGETLB)]				= "nr_hugetlb",
>  #endif



* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-12 13:40 ` Vlastimil Babka (SUSE)
@ 2026-03-12 16:13   ` JP Kobryn (Meta)
  2026-03-13  5:07     ` Huang, Ying
  0 siblings, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-12 16:13 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko
  Cc: apopple, axelrasmussen, byungchul, cgroups, david, eperezma,
	gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
	linux-kernel, lorenzo.stoakes, matthew.brost, mst, rppt,
	muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo,
	ying.huang, yuanchu, ziy, kernel-team

On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
> On 3/7/26 05:55, JP Kobryn (Meta) wrote:
>> When investigating pressure on a NUMA node, there is no straightforward way
>> to determine which policies are driving allocations to it.
>>
>> Add per-policy page allocation counters as new node stat items. These
>> counters track allocations to nodes and also whether the allocations were
>> intentional or fallbacks.
>>
>> The new stats follow the existing numa hit/miss/foreign style and have the
>> following meanings:
>>
>>    hit
>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>      - for other policies, allocation succeeded on intended node
>>      - counted on the node of the allocation
>>    miss
>>      - allocation intended for other node, but happened on this one
>>      - counted on other node
>>    foreign
>>      - allocation intended on this node, but happened on other node
>>      - counted on this node
>>
>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>> in /proc/vmstat.
>>
>> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> 
> I think I've been one of the folks on previous versions arguing against the
> many counters, and one of the arguments was that they can't tell the full
> story anyway (compared to e.g. tracing), but I don't think adding even more
> counters is the right solution. Seems like a number of other people
> responding to the thread are providing similar feedback.
> 
> For example, I'm still not sure how it would help me if I knew the
> hits/misses were due to a preferred vs preferred_many policy, or interleave
> vs weighted interleave?
> 

How about I change from per-policy hit/miss/foreign triplets to a single
aggregated policy triplet (i.e. just 3 new counters which account for
all policies)? They would follow the same hit/miss/foreign semantics
already proposed (visible in quoted text above). This would still
provide the otherwise missing signal of whether policy-driven
allocations to a node are intentional or fallback.

Note that I am also planning on moving the stats off of the memcg so the
3 new counters will be global per-node in response to similar feedback.
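
The aggregated triplet would keep the same classification rule. As a
plain userspace sketch of those semantics (illustrative only: the _SIM
names and the unsigned long bitmask are stand-ins for the kernel's
MPOL_* modes and nodemask_t):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's mempolicy modes. */
enum mpol_mode_sim {
	MPOL_LOCAL_SIM,
	MPOL_PREFERRED_SIM,
	MPOL_PREFERRED_MANY_SIM,
	MPOL_BIND_SIM,
	MPOL_INTERLEAVE_SIM,
};

struct mpol_counts {
	int hit_nid;	 /* node credited with a hit, or -1 */
	int miss_nid;	 /* node credited with a miss, or -1 */
	int foreign_nid; /* node credited with a foreign, or -1 */
};

/*
 * Classify one allocation per the proposed semantics: BIND and
 * PREFERRED_MANY count a hit when the actual node is anywhere in the
 * policy's nodemask; other policies compare against the single
 * intended node. On a non-hit, the miss is charged to the fallback
 * node and the foreign to the intended node.
 */
static struct mpol_counts classify(enum mpol_mode_sim mode,
				   unsigned long nodemask,
				   int intended_nid, int actual_nid)
{
	bool is_hit;

	if (mode == MPOL_BIND_SIM || mode == MPOL_PREFERRED_MANY_SIM)
		is_hit = nodemask & (1UL << actual_nid);
	else
		is_hit = (actual_nid == intended_nid);

	if (is_hit)
		return (struct mpol_counts){ actual_nid, -1, -1 };
	return (struct mpol_counts){ -1, actual_nid, intended_nid };
}
```

For example, a BIND allocation that falls back within its nodemask
still classifies as a hit, while a PREFERRED allocation on any other
node records a miss on the fallback node and a foreign on the intended
one.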


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-12 16:13   ` JP Kobryn (Meta)
@ 2026-03-13  5:07     ` Huang, Ying
  2026-03-13  6:14       ` JP Kobryn (Meta)
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2026-03-13  5:07 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

"JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:

> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>> On 3/7/26 05:55, JP Kobryn (Meta) wrote:
>>> When investigating pressure on a NUMA node, there is no straightforward way
>>> to determine which policies are driving allocations to it.
>>>
>>> Add per-policy page allocation counters as new node stat items. These
>>> counters track allocations to nodes and also whether the allocations were
>>> intentional or fallbacks.
>>>
>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>> following meanings:
>>>
>>>    hit
>>>      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>      - for other policies, allocation succeeded on intended node
>>>      - counted on the node of the allocation
>>>    miss
>>>      - allocation intended for other node, but happened on this one
>>>      - counted on other node
>>>    foreign
>>>      - allocation intended on this node, but happened on other node
>>>      - counted on this node
>>>
>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>> in /proc/vmstat.
>>>
>>> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
>> I think I've been one of the folks on previous versions arguing
>> against the many counters, and one of the arguments was that they
>> can't tell the full story anyway (compared to e.g. tracing), but I
>> don't think adding even more counters is the right solution. Seems
>> like a number of other people responding to the thread are providing
>> similar feedback.
>> For example, I'm still not sure how it would help me if I knew the
>> hits/misses were due to a preferred vs preferred_many policy, or
>> interleave vs weighted interleave?
>> 
>
> How about I change from per-policy hit/miss/foreign triplets to a single
> aggregated policy triplet (i.e. just 3 new counters which account for
> all policies)? They would follow the same hit/miss/foreign semantics
> already proposed (visible in quoted text above). This would still
> provide the otherwise missing signal of whether policy-driven
> allocations to a node are intentional or fallback.
>
> Note that I am also planning on moving the stats off of the memcg so the
> 3 new counters will be global per-node in response to similar feedback.

Emm, what's the difference between these newly added counters and the
existing numa_hit/miss/foreign counters?

---
Best Regards,
Huang, Ying


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13  5:07     ` Huang, Ying
@ 2026-03-13  6:14       ` JP Kobryn (Meta)
  2026-03-13  7:34         ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-13  6:14 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

On 3/12/26 10:07 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>> On 3/7/26 05:55, JP Kobryn (Meta) wrote:
>>>> When investigating pressure on a NUMA node, there is no straightforward way
>>>> to determine which policies are driving allocations to it.
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters track allocations to nodes and also whether the allocations were
>>>> intentional or fallbacks.
>>>>
>>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>>> following meanings:
>>>>
>>>>     hit
>>>>       - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>>       - for other policies, allocation succeeded on intended node
>>>>       - counted on the node of the allocation
>>>>     miss
>>>>       - allocation intended for other node, but happened on this one
>>>>       - counted on other node
>>>>     foreign
>>>>       - allocation intended on this node, but happened on other node
>>>>       - counted on this node
>>>>
>>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>>> in /proc/vmstat.
>>>>
>>>> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
>>> I think I've been one of the folks on previous versions arguing
>>> against the many counters, and one of the arguments was that they
>>> can't tell the full story anyway (compared to e.g. tracing), but I
>>> don't think adding even more counters is the right solution. Seems
>>> like a number of other people responding to the thread are providing
>>> similar feedback.
>>> For example, I'm still not sure how it would help me if I knew the
>>> hits/misses were due to a preferred vs preferred_many policy, or
>>> interleave vs weighted interleave?
>>>
>>
>> How about I change from per-policy hit/miss/foreign triplets to a single
>> aggregated policy triplet (i.e. just 3 new counters which account for
>> all policies)? They would follow the same hit/miss/foreign semantics
>> already proposed (visible in quoted text above). This would still
>> provide the otherwise missing signal of whether policy-driven
>> allocations to a node are intentional or fallback.
>>
>> Note that I am also planning on moving the stats off of the memcg so the
>> 3 new counters will be global per-node in response to similar feedback.
> 
> Emm, what's the difference between these newly added counters and the
> existing numa_hit/miss/foreign counters?

The existing counters don't account for node masks in the policies that
make use of them. An allocation can land on a node in the mask and still
be considered a miss because it wasn't the preferred node.
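
A minimal userspace illustration of that gap (simplified: the existing
rule is reduced to a single preferred-node comparison, and the unsigned
long bitmask stands in for the kernel's nodemask_t):

```c
#include <assert.h>
#include <stdbool.h>

/* Existing numa_hit semantics (simplified): only an allocation on the
 * single preferred node counts as a hit. */
static bool hit_existing(int preferred_nid, int actual_nid)
{
	return actual_nid == preferred_nid;
}

/* Proposed semantics for BIND/PREFERRED_MANY: any node in the policy's
 * nodemask counts as a hit. */
static bool hit_nodemask(unsigned long nodemask, int actual_nid)
{
	return nodemask & (1UL << actual_nid);
}
```

With BIND over nodes {0,1} and a fallback from node 0 to node 1,
today's accounting registers a numa_miss even though the allocation
honored the policy; the proposed check would count it as a hit.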


* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13  6:14       ` JP Kobryn (Meta)
@ 2026-03-13  7:34         ` Vlastimil Babka (SUSE)
  2026-03-13  9:31           ` Huang, Ying
  2026-03-13 18:09           ` JP Kobryn (Meta)
  0 siblings, 2 replies; 30+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-03-13  7:34 UTC (permalink / raw)
  To: JP Kobryn (Meta), Huang, Ying
  Cc: linux-mm, akpm, mhocko, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/13/26 07:14, JP Kobryn (Meta) wrote:
> On 3/12/26 10:07 PM, Huang, Ying wrote:
>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>> 
>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>
>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>> all policies)? They would follow the same hit/miss/foreign semantics
>>> already proposed (visible in quoted text above). This would still
>>> provide the otherwise missing signal of whether policy-driven
>>> allocations to a node are intentional or fallback.
>>>
>>> Note that I am also planning on moving the stats off of the memcg so the
>>> 3 new counters will be global per-node in response to similar feedback.
>> 
>> Emm, what's the difference between these newly added counters and the
>> existing numa_hit/miss/foreign counters?
> 
> The existing counters don't account for node masks in the policies that
> make use of them. An allocation can land on a node in the mask and still
> be considered a miss because it wasn't the preferred node.

That sounds like we could just add a new counter e.g. numa_hit_preferred
and adjust definitions accordingly? Or some other variant that fills the gap?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13  7:34         ` Vlastimil Babka (SUSE)
@ 2026-03-13  9:31           ` Huang, Ying
  2026-03-13 18:28             ` JP Kobryn (Meta)
  2026-03-13 18:09           ` JP Kobryn (Meta)
  1 sibling, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2026-03-13  9:31 UTC (permalink / raw)
  To: JP Kobryn (Meta), Vlastimil Babka (SUSE)
  Cc: linux-mm, akpm, mhocko, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

"Vlastimil Babka (SUSE)" <vbabka@kernel.org> writes:

> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>> 
>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>
>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>> already proposed (visible in quoted text above). This would still
>>>> provide the otherwise missing signal of whether policy-driven
>>>> allocations to a node are intentional or fallback.
>>>>
>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>> 3 new counters will be global per-node in response to similar feedback.
>>> 
>>> Emm, what's the difference between these newly added counters and the
>>> existing numa_hit/miss/foreign counters?
>> 
>> The existing counters don't account for node masks in the policies that
>> make use of them. An allocation can land on a node in the mask and still
>> be considered a miss because it wasn't the preferred node.
>
> That sounds like we could just add a new counter e.g. numa_hit_preferred
> and adjust definitions accordingly? Or some other variant that fills the gap?

Or can we adjust the semantics of numa_hit/miss/foreign to consider the
preferred nodemask instead of the preferred node?  Are there programs
that depend on the current behavior?

---
Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13  7:34         ` Vlastimil Babka (SUSE)
  2026-03-13  9:31           ` Huang, Ying
@ 2026-03-13 18:09           ` JP Kobryn (Meta)
  2026-03-16  2:54             ` Huang, Ying
  1 sibling, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-13 18:09 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Huang, Ying
  Cc: linux-mm, akpm, mhocko, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/13/26 12:34 AM, Vlastimil Babka (SUSE) wrote:
> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>
>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>
>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>> already proposed (visible in quoted text above). This would still
>>>> provide the otherwise missing signal of whether policy-driven
>>>> allocations to a node are intentional or fallback.
>>>>
>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>> 3 new counters will be global per-node in response to similar feedback.
>>>
>>> Emm, what's the difference between these newly added counters and the
>>> existing numa_hit/miss/foreign counters?
>>
>> The existing counters don't account for node masks in the policies that
>> make use of them. An allocation can land on a node in the mask and still
>> be considered a miss because it wasn't the preferred node.
> 
> That sounds like we could just add a new counter e.g. numa_hit_preferred
> and adjust definitions accordingly? Or some other variant that fills the gap?

It's an interesting thought. Looking into these existing counters more,
the in-kernel direct node allocations, which don't fall under any
mempolicy, are also included in these stats. One good example might be
include/linux/skbuff.h, where __dev_alloc_pages() calls
alloc_pages_node_noprof(NUMA_NO_NODE, ...), which eventually reaches
zone_statistics() and increments the stats. So if we applied the
hit/miss/foreign semantics in this patch to the existing counters, we
would be mixing allocations that are in and out of policy, losing
accuracy.

The 3 new counters I last proposed (in an effort to keep the number of
new counters as small as possible) would isolate mempolicy allocs and be
named to reflect that: numa_mpol_{hit,miss,foreign}.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13  9:31           ` Huang, Ying
@ 2026-03-13 18:28             ` JP Kobryn (Meta)
  0 siblings, 0 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-13 18:28 UTC (permalink / raw)
  To: Huang, Ying, Vlastimil Babka (SUSE)
  Cc: linux-mm, akpm, mhocko, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/13/26 2:31 AM, Huang, Ying wrote:
> "Vlastimil Babka (SUSE)" <vbabka@kernel.org> writes:
> 
>> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>>
>>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>>
>>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>>> already proposed (visible in quoted text above). This would still
>>>>> provide the otherwise missing signal of whether policy-driven
>>>>> allocations to a node are intentional or fallback.
>>>>>
>>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>>> 3 new counters will be global per-node in response to similar feedback.
>>>>
>>>> Emm, what's the difference between these newly added counters and the
>>>> existing numa_hit/miss/foreign counters?
>>>
>>> The existing counters don't account for node masks in the policies that
>>> make use of them. An allocation can land on a node in the mask and still
>>> be considered a miss because it wasn't the preferred node.
>>
>> That sounds like we could just a new counter e.g. numa_hit_preferred and
>> adjust definitions accordingly? Or some other variant that fills the gap?
> 
> Or can we adjust the semantics of numa_hit/miss/foreign to consider the
> preferred nodemask instead of the preferred node?  Are there programs
> that depend on the current behavior?

Good question. I think it comes down to whether the existing semantics
are correct with respect to policies that make use of node masks. I gave
some thoughts on this in the previous reply to Vlastimil. That
correctness question may be outside the scope of this patch, but I can
give it a try afterward. I'd like to send a revision that reduces the
new counters to just 3 and moves them off of the memcg (as previously
mentioned in this thread).

I know numastat is one consumer of the existing stats. Interpretation of
the data seems to be left up to the user. Not sure about others.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-13 18:09           ` JP Kobryn (Meta)
@ 2026-03-16  2:54             ` Huang, Ying
  2026-03-17  4:37               ` JP Kobryn (Meta)
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2026-03-16  2:54 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

"JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:

> On 3/13/26 12:34 AM, Vlastimil Babka (SUSE) wrote:
>> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>>
>>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>>
>>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>>> already proposed (visible in quoted text above). This would still
>>>>> provide the otherwise missing signal of whether policy-driven
>>>>> allocations to a node are intentional or fallback.
>>>>>
>>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>>> 3 new counters will be global per-node in response to similar feedback.
>>>>
>>>> Emm, what's the difference between these newly added counters and the
>>>> existing numa_hit/miss/foreign counters?
>>>
>>> The existing counters don't account for node masks in the policies that
>>> make use of them. An allocation can land on a node in the mask and still
>>> be considered a miss because it wasn't the preferred node.
>> That sounds like we could just add a new counter e.g. numa_hit_preferred
>> and adjust definitions accordingly? Or some other variant that fills the gap?
>
> It's an interesting thought. Looking into these existing counters more,
> the in-kernel direct node allocations, which don't fall under any
> mempolicy, are also included in these stats. One good example might be
> include/linux/skbuff.h, where __dev_alloc_pages() calls
> alloc_pages_node_noprof(NUMA_NO_NODE, ...) which eventually reaches
> zone_statistics() and increments the stats.

IIUC, the default memory policy is used here, that is, MPOL_LOCAL.

> So if we applied the hit/miss/foreign semantics in this patch to the
> existing counters we would be mixing allocations that are in and out
> of policy, losing the accuracy.
>
> The new 3 counters I last proposed (in an effort to reduce the amount of
> new counters as much as possible) would isolate mempolicy allocs and be
> named to reflect that: numa_mpol_{hit,miss,foreign}.

---
Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-16  2:54             ` Huang, Ying
@ 2026-03-17  4:37               ` JP Kobryn (Meta)
  2026-03-17  6:44                 ` Huang, Ying
  0 siblings, 1 reply; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-17  4:37 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

On 3/15/26 7:54 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>> On 3/13/26 12:34 AM, Vlastimil Babka (SUSE) wrote:
>>> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>>>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>>>
>>>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>>>
>>>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>>>> already proposed (visible in quoted text above). This would still
>>>>>> provide the otherwise missing signal of whether policy-driven
>>>>>> allocations to a node are intentional or fallback.
>>>>>>
>>>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>>>> 3 new counters will be global per-node in response to similar feedback.
>>>>>
>>>>> Emm, what's the difference between these newly added counters and the
>>>>> existing numa_hit/miss/foreign counters?
>>>>
>>>> The existing counters don't account for node masks in the policies that
>>>> make use of them. An allocation can land on a node in the mask and still
>>>> be considered a miss because it wasn't the preferred node.
>>> That sounds like we could just add a new counter e.g. numa_hit_preferred
>>> and adjust definitions accordingly? Or some other variant that fills the gap?
>>
>> It's an interesting thought. Looking into these existing counters more,
>> the in-kernel direct node allocations, which don't fall under any
>> mempolicy, are also included in these stats. One good example might be
>> include/linux/skbuff.h, where __dev_alloc_pages() calls
>> alloc_pages_node_noprof(NUMA_NO_NODE, ...) which eventually reaches
>> zone_statistics() and increments the stats.
> 
> IIUC, the default memory policy is used here, that is, MPOL_LOCAL.

I'm not seeing that. zone_statistics() is eventually reached.
alloc_pages_mpol() is not.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-17  4:37               ` JP Kobryn (Meta)
@ 2026-03-17  6:44                 ` Huang, Ying
  2026-03-17 11:10                   ` Vlastimil Babka (SUSE)
  2026-03-17 17:55                   ` JP Kobryn (Meta)
  0 siblings, 2 replies; 30+ messages in thread
From: Huang, Ying @ 2026-03-17  6:44 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

"JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:

> On 3/15/26 7:54 PM, Huang, Ying wrote:
>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>> 
>>> On 3/13/26 12:34 AM, Vlastimil Babka (SUSE) wrote:
>>>> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>>>>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>>>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>>>>
>>>>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>>>>
>>>>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>>>>> already proposed (visible in quoted text above). This would still
>>>>>>> provide the otherwise missing signal of whether policy-driven
>>>>>>> allocations to a node are intentional or fallback.
>>>>>>>
>>>>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>>>>> 3 new counters will be global per-node in response to similar feedback.
>>>>>>
>>>>>> Emm, what's the difference between these newly added counters and the
>>>>>> existing numa_hit/miss/foreign counters?
>>>>>
>>>>> The existing counters don't account for node masks in the policies that
>>>>> make use of them. An allocation can land on a node in the mask and still
>>>>> be considered a miss because it wasn't the preferred node.
>>>> That sounds like we could just add a new counter e.g. numa_hit_preferred
>>>> and adjust definitions accordingly? Or some other variant that fills the gap?
>>>
>>> It's an interesting thought. Looking into these existing counters more,
>>> the in-kernel direct node allocations, which don't fall under any
>>> mempolicy, are also included in these stats. One good example might be
>>> include/linux/skbuff.h, where __dev_alloc_pages() calls
>>> alloc_pages_node_noprof(NUMA_NO_NODE, ...) which eventually reaches
>>> zone_statistics() and increments the stats.
>> IIUC, the default memory policy is used here, that is, MPOL_LOCAL.
>
> I'm not seeing that. zone_statistics() is eventually reached.
> alloc_pages_mpol() is not.

Yes.  The page isn't allocated through alloc_pages_mpol().  For example,
we may want to allocate pages for the kernel instead of for user space
applications.  However, IMHO, the equivalent memory policy is
MPOL_LOCAL, that is, allocate from the local node first, then fall back
to other nodes.  I don't think that alloc_pages_mpol() is so special.

---
Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-17  6:44                 ` Huang, Ying
@ 2026-03-17 11:10                   ` Vlastimil Babka (SUSE)
  2026-03-17 17:55                   ` JP Kobryn (Meta)
  1 sibling, 0 replies; 30+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-03-17 11:10 UTC (permalink / raw)
  To: Huang, Ying, JP Kobryn (Meta)
  Cc: linux-mm, akpm, mhocko, apopple, axelrasmussen, byungchul,
	cgroups, david, eperezma, gourry, jasowang, hannes, joshua.hahnjy,
	Liam.Howlett, linux-kernel, lorenzo.stoakes, matthew.brost, mst,
	rppt, muchun.song, zhengqi.arch, rakie.kim, roman.gushchin,
	shakeel.butt, surenb, virtualization, weixugc, xuanzhuo, yuanchu,
	ziy, kernel-team

On 3/17/26 07:44, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>>>>
>>>> It's an interesting thought. Looking into these existing counters more,
>>>> the in-kernel direct node allocations, which don't fall under any
>>>> mempolicy, are also included in these stats. One good example might be
>>>> include/linux/skbuff.h, where __dev_alloc_pages() calls
>>>> alloc_pages_node_noprof(NUMA_NO_NODE, ...) which eventually reaches
>>>> zone_statistics() and increments the stats.
>>> IIUC, the default memory policy is used here, that is, MPOL_LOCAL.
>>
>> I'm not seeing that. zone_statistics() is eventually reached.
>> alloc_pages_mpol() is not.
> 
> Yes.  The page isn't allocated through alloc_pages_mpol().  For example,
> we may want to allocate pages for the kernel instead of for user space
> applications.  However, IMHO, the equivalent memory policy is
> MPOL_LOCAL, that is, allocate from the local node first, then fall back
> to other nodes.  I don't think that alloc_pages_mpol() is so special.

Agree, it's equivalent to MPOL_LOCAL.

> ---
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
  2026-03-17  6:44                 ` Huang, Ying
  2026-03-17 11:10                   ` Vlastimil Babka (SUSE)
@ 2026-03-17 17:55                   ` JP Kobryn (Meta)
  1 sibling, 0 replies; 30+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-17 17:55 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Vlastimil Babka (SUSE), linux-mm, akpm, mhocko, apopple,
	axelrasmussen, byungchul, cgroups, david, eperezma, gourry,
	jasowang, hannes, joshua.hahnjy, Liam.Howlett, linux-kernel,
	lorenzo.stoakes, matthew.brost, mst, rppt, muchun.song,
	zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
	virtualization, weixugc, xuanzhuo, yuanchu, ziy, kernel-team

On 3/16/26 11:44 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
> 
>> On 3/15/26 7:54 PM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>
>>>> On 3/13/26 12:34 AM, Vlastimil Babka (SUSE) wrote:
>>>>> On 3/13/26 07:14, JP Kobryn (Meta) wrote:
>>>>>> On 3/12/26 10:07 PM, Huang, Ying wrote:
>>>>>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>>>>>
>>>>>>>> On 3/12/26 6:40 AM, Vlastimil Babka (SUSE) wrote:
>>>>>>>>
>>>>>>>> How about I change from per-policy hit/miss/foreign triplets to a single
>>>>>>>> aggregated policy triplet (i.e. just 3 new counters which account for
>>>>>>>> all policies)? They would follow the same hit/miss/foreign semantics
>>>>>>>> already proposed (visible in quoted text above). This would still
>>>>>>>> provide the otherwise missing signal of whether policy-driven
>>>>>>>> allocations to a node are intentional or fallback.
>>>>>>>>
>>>>>>>> Note that I am also planning on moving the stats off of the memcg so the
>>>>>>>> 3 new counters will be global per-node in response to similar feedback.
>>>>>>>
>>>>>>> Emm, what's the difference between these newly added counters and the
>>>>>>> existing numa_hit/miss/foreign counters?
>>>>>>
>>>>>> The existing counters don't account for node masks in the policies that
>>>>>> make use of them. An allocation can land on a node in the mask and still
>>>>>> be considered a miss because it wasn't the preferred node.
>>>>> That sounds like we could just add a new counter e.g. numa_hit_preferred
>>>>> and adjust definitions accordingly? Or some other variant that fills the gap?
>>>>
>>>> It's an interesting thought. Looking into these existing counters more,
>>>> the in-kernel direct node allocations, which don't fall under any
>>>> mempolicy, are also included in these stats. One good example might be
>>>> include/linux/skbuff.h, where __dev_alloc_pages() calls
>>>> alloc_pages_node_noprof(NUMA_NO_NODE, ...) which eventually reaches
>>>> zone_statistics() and increments the stats.
>>> IIUC, the default memory policy is used here, that is, MPOL_LOCAL.
>>
>> I'm not seeing that. zone_statistics() is eventually reached.
>> alloc_pages_mpol() is not.
> 
> Yes.  The page isn't allocated through alloc_pages_mpol().  For example,
> we may want to allocate pages for the kernel instead of for user space
> applications.  However, IMHO, the equivalent memory policy is
> MPOL_LOCAL, that is, allocate from the local node first, then fall back
> to other nodes.  I don't think that alloc_pages_mpol() is so special.

Sure. My response was based on how you said, "the default memory policy
is used here". I took that literally. I agree on the behavioral
equivalence, but the important point is that no mempolicy is set. In the
v3 patch which was recently sent out, I'm using that aspect to
distinguish between allocations with a user-defined mempolicy and
allocations without one.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2026-03-17 17:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-07  4:55 [PATCH v2] mm/mempolicy: track page allocations per mempolicy JP Kobryn (Meta)
2026-03-07 12:27 ` Huang, Ying
2026-03-08 19:20   ` Gregory Price
2026-03-09  4:11     ` JP Kobryn (Meta)
2026-03-09  4:31   ` JP Kobryn (Meta)
2026-03-11  2:56     ` Huang, Ying
2026-03-11 17:31       ` JP Kobryn (Meta)
2026-03-07 14:32 ` kernel test robot
2026-03-07 19:57 ` kernel test robot
2026-03-08 19:24 ` Usama Arif
2026-03-09  3:30   ` JP Kobryn (Meta)
2026-03-11 18:06     ` Johannes Weiner
2026-03-09 23:35 ` Shakeel Butt
2026-03-09 23:43 ` Shakeel Butt
2026-03-10  4:17   ` JP Kobryn (Meta)
2026-03-10 14:53     ` Shakeel Butt
2026-03-10 17:01       ` JP Kobryn (Meta)
2026-03-12 13:40 ` Vlastimil Babka (SUSE)
2026-03-12 16:13   ` JP Kobryn (Meta)
2026-03-13  5:07     ` Huang, Ying
2026-03-13  6:14       ` JP Kobryn (Meta)
2026-03-13  7:34         ` Vlastimil Babka (SUSE)
2026-03-13  9:31           ` Huang, Ying
2026-03-13 18:28             ` JP Kobryn (Meta)
2026-03-13 18:09           ` JP Kobryn (Meta)
2026-03-16  2:54             ` Huang, Ying
2026-03-17  4:37               ` JP Kobryn (Meta)
2026-03-17  6:44                 ` Huang, Ying
2026-03-17 11:10                   ` Vlastimil Babka (SUSE)
2026-03-17 17:55                   ` JP Kobryn (Meta)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox