From: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>
To: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@suse.com,
vbabka@suse.cz, apopple@nvidia.com, axelrasmussen@google.com,
byungchul@sk.com, cgroups@vger.kernel.org, david@kernel.org,
eperezma@redhat.com, gourry@gourry.net, jasowang@redhat.com,
hannes@cmpxchg.org, joshua.hahnjy@gmail.com,
Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org,
lorenzo.stoakes@oracle.com, matthew.brost@intel.com,
mst@redhat.com, rppt@kernel.org, muchun.song@linux.dev,
zhengqi.arch@bytedance.com, rakie.kim@sk.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
surenb@google.com, virtualization@lists.linux.dev,
weixugc@google.com, xuanzhuo@linux.alibaba.com,
yuanchu@google.com, ziy@nvidia.com, kernel-team@meta.com
Subject: Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
Date: Wed, 11 Mar 2026 10:31:48 -0700
Message-ID: <a26df89c-5504-445d-a639-ffd2d12efaf5@linux.dev>
In-Reply-To: <87cy1boyzd.fsf@DESKTOP-5N7EMDA>
On 3/10/26 7:56 PM, Huang, Ying wrote:
> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>
>> On 3/7/26 4:27 AM, Huang, Ying wrote:
>>> "JP Kobryn (Meta)" <jp.kobryn@linux.dev> writes:
>>>
>>>> When investigating pressure on a NUMA node, there is no straightforward way
>>>> to determine which policies are driving allocations to it.
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters track allocations to nodes and also whether the allocations were
>>>> intentional or fallbacks.
>>>>
>>>> The new stats follow the existing numa hit/miss/foreign style and have the
>>>> following meanings:
>>>>
>>>> hit
>>>> - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
>>>> - for other policies, allocation succeeded on intended node
>>>> - counted on the node of the allocation
>>>> miss
>>>> - allocation intended for other node, but happened on this one
>>>> - counted on other node
>>>> foreign
>>>> - allocation intended on this node, but happened on other node
>>>> - counted on this node
>>>>
>>>> Counters are exposed per-memcg, per-node in memory.numa_stat and globally
>>>> in /proc/vmstat.
>>> IMHO, it may be better to describe your workflow as an example to
>>> use the newly added statistics. That can describe why we need them.
>>> For example, what you have described in
>>> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>>>
>>>> 1) Pressure/OOMs reported while system-wide memory is free.
>>>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
>>>> down node(s) under pressure. They become available in
>>>> /sys/devices/system/node/nodeN/vmstat.
>>>> 3) Check per-policy allocation counters (this patch) on that node to
>>>> find what policy was driving it. Same readout at nodeN/vmstat.
>>>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>>>
>>
>> Good call. I'll add a workflow adapted for the current approach in
>> the next revision. I included it in another response in this thread, but
>> I'll repeat here because it will make it easier to answer your question
>> below.
>>
>> 1) Pressure/OOMs reported while system-wide memory is free.
>> 2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
>> down node(s) under pressure.
>> 3) Check per-policy hit/miss/foreign counters (added by this patch) on
>> node(s) to see what policy is driving allocations there (intentional
>> vs fallback).
>> 4) Use /proc/*/numa_maps to identify tasks using the policy.
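>>
>> Step 3 is easy to script against the nodeN/vmstat readout. A minimal
>> sketch of that comparison, where the counter names of the form
>> mpol_<policy>_{hit,miss,foreign} and the sample numbers are
>> hypothetical stand-ins for whatever the patch actually exposes:

```python
# Sketch: given nodeN/vmstat-style text, summarize per-policy allocation
# counters on a node so intentional allocations (hit) can be compared
# against fallbacks (miss/foreign). The "mpol_*" counter names below are
# hypothetical placeholders, not the names used by the patch.

import re

def summarize_policy_counters(vmstat_text):
    """Return {policy: {'hit': n, 'miss': n, 'foreign': n}}."""
    stats = {}
    for line in vmstat_text.splitlines():
        m = re.match(r"mpol_(\w+)_(hit|miss|foreign)\s+(\d+)", line.strip())
        if m:
            policy, kind, value = m.group(1), m.group(2), int(m.group(3))
            stats.setdefault(policy, {"hit": 0, "miss": 0, "foreign": 0})
            stats[policy][kind] = value
    return stats

# Made-up readout: a high miss count relative to hits would suggest this
# node is absorbing fallback allocations from some other intended node.
sample = """\
mpol_bind_hit 120000
mpol_bind_miss 5
mpol_bind_foreign 0
mpol_interleave_hit 300
mpol_interleave_miss 98000
mpol_interleave_foreign 12
"""

for policy, counts in summarize_policy_counters(sample).items():
    print(policy, counts)
```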
>>
>>> One question. If we have to search /proc/*/numa_maps, why can't we
>>> find all necessary information via /proc/*/numa_maps? For example,
>>> which VMA uses the most pages on the node? Which policy is used in the
>>> VMA? ...
>>>
>>
>> There's a gap in the flow of information if we go straight from a node
>> in question to numa_maps. Without step 3 above, we can't distinguish
>> whether pages landed there intentionally, as a fallback, or were
>> migrated sometime after the allocation. These new counters track the
>> results of allocations at the time they happen, preserving that
>> information regardless of what may happen later on.
>
> Sorry for late reply.
>
> IMHO, step 3) doesn't add much to the flow. It only counts
> allocations, not migration, freeing, etc.
By that logic, existing stats like numa hit/miss/foreign would be
undermined as well: they too only count allocations, yet they remain
useful.
> I'm afraid that it may be misleading. For example, a lot of pages may
> have been allocated with a mempolicy and then freed, yet the counters
> would still reflect those allocations. /proc/*/numa_maps is a more
> useful source of stats for the goal.
numa_maps only shows a live snapshot with no attribution. Even if we
sampled it over time, there would be no way to tell whether the pages
present on a node landed there as the result of a policy decision or
as a fallback.
> To get all necessary information, I think that more thorough
> tracing is necessary.
Tracking other sources of pages on a node (migration, etc.) is beyond
the scope of this patch.