From: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>
To: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@suse.com,
vbabka@suse.cz, apopple@nvidia.com, axelrasmussen@google.com,
byungchul@sk.com, cgroups@vger.kernel.org, david@kernel.org,
eperezma@redhat.com, gourry@gourry.net, jasowang@redhat.com,
hannes@cmpxchg.org, joshua.hahnjy@gmail.com,
Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org,
lorenzo.stoakes@oracle.com, matthew.brost@intel.com,
mst@redhat.com, rppt@kernel.org, muchun.song@linux.dev,
zhengqi.arch@bytedance.com, rakie.kim@sk.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
surenb@google.com, virtualization@lists.linux.dev,
weixugc@google.com, xuanzhuo@linux.alibaba.com,
yuanchu@google.com, ziy@nvidia.com, kernel-team@meta.com
Subject: Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
Date: Wed, 11 Mar 2026 10:31:48 -0700
Message-ID: <a26df89c-5504-445d-a639-ffd2d12efaf5@linux.dev>
In-Reply-To: <87cy1boyzd.fsf@DESKTOP-5N7EMDA>
On 3/10/26 7:56 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>
>> On 3/7/26 4:27 AM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>
>>>> When investigating pressure on a NUMA node, there is no straightforward way
>>>> to determine which policies are driving allocations to it.
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters track allocations to nodes and also whether the allocations were
>>>> intentional or fallbacks.
>>>>
>>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>>> following meanings:
>>>>
>>>> hit
>>>> - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>> - for other policies, allocation succeeded on intended node
>>>> - counted on the node of the allocation
>>>> miss
>>>> - allocation intended for other node, but happened on this one
>>>> - counted on other node
>>>> foreign
>>>> - allocation intended on this node, but happened on other node
>>>> - counted on this node
>>>>
>>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>>> in /proc/vmstat.
>>> IMHO, it may be better to describe your workflow as an example to
>>> use the newly added statistics. That can describe why we need them.
>>> For example, what you have described in
>>> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>>>
>>>> 1) Pressure/OOMs reported while system-wide memory is free.
>>>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>>>> down node(s) under pressure. They become available in
>>>> /sys/devices/system/node/nodeN/vmstat.
>>>> 3) Check per-policy allocation counters (this patch) on that node to
>>>> find what policy was driving it. Same readout at nodeN/vmstat.
>>>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>>>
>>
>> Good call. I'll add a workflow adapted for the current approach in
>> the next revision. I included it in another response in this thread, but
>> I'll repeat here because it will make it easier to answer your question
>> below.
>>
>> 1) Pressure/OOMs reported while system-wide memory is free.
>> 2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
>> down node(s) under pressure.
>> 3) Check per-policy hit/miss/foreign counters (added by this patch) on
>> node(s) to see what policy is driving allocations there (intentional
>> vs fallback).
>> 4) Use /proc/*/numa_maps to identify tasks using the policy.
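>>
>> Step 3 is easy to script against the nodeN/vmstat readout. A minimal
>> sketch of that comparison, where the counter names of the form
>> mpol_<policy>_{hit,miss,foreign} and the sample numbers are
>> hypothetical stand-ins for whatever the patch actually exposes:

```python
# Sketch: given nodeN/vmstat-style text, summarize per-policy allocation
# counters on a node so intentional allocations (hit) can be compared
# against fallbacks (miss/foreign). The "mpol_*" counter names below are
# hypothetical placeholders, not the names used by the patch.

import re

def summarize_policy_counters(vmstat_text):
    """Return {policy: {'hit': n, 'miss': n, 'foreign': n}}."""
    stats = {}
    for line in vmstat_text.splitlines():
        m = re.match(r"mpol_(\w+)_(hit|miss|foreign)\s+(\d+)", line.strip())
        if m:
            policy, kind, value = m.group(1), m.group(2), int(m.group(3))
            stats.setdefault(policy, {"hit": 0, "miss": 0, "foreign": 0})
            stats[policy][kind] = value
    return stats

# Made-up readout: a high miss count relative to hits would suggest this
# node is absorbing fallback allocations from some other intended node.
sample = """\
mpol_bind_hit 120000
mpol_bind_miss 5
mpol_bind_foreign 0
mpol_interleave_hit 300
mpol_interleave_miss 98000
mpol_interleave_foreign 12
"""

for policy, counts in summarize_policy_counters(sample).items():
    print(policy, counts)
```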
>>
>>> One question. If we have to search /proc/*/numa_maps, why can't we
>>> find all necessary information via /proc/*/numa_maps? For example,
>>> which VMA uses the most pages on the node? Which policy is used in the
>>> VMA? ...
>>>
>>
>> There's a gap in the flow of information if we go straight from a node
>> in question to numa_maps. Without step 3 above, we can't distinguish
>> whether pages landed there intentionally, as a fallback, or were
>> migrated sometime after the allocation. These new counters track the
>> results of allocations at the time they happen, preserving that
>> information regardless of what may happen later on.
>
> Sorry for late reply.
>
> IMHO, step 3) doesn't add much to the flow. It only counts
> allocations, not migration, freeing, etc.
By that logic, existing stats like numa hit/miss/foreign would be
undermined as well: they too only count allocations, yet they remain
useful.
> I'm afraid that it may be misleading. For example, a lot of pages may
> have been allocated with a mempolicy and then freed, yet the counters
> would still reflect those allocations. /proc/*/numa_maps is a more
> useful source of stats for the goal.
numa_maps only shows a live snapshot with no attribution. Even if we
sampled it over time, there would be no way to tell whether the pages
present on a node landed there as the result of a policy decision or
as a fallback.
> To get all necessary information, I think that more thorough
> tracing is necessary.
Tracking other sources of pages on a node (migration, etc.) is beyond
the scope of this patch.