From: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, vbabka@kernel.org, mhocko@suse.com,
ying.huang@linux.alibaba.com, hannes@cmpxchg.org,
shakeel.butt@linux.dev, gourry@gourry.net, kasong@tencent.com,
qi.zheng@linux.dev, baohua@kernel.org, axelrasmussen@google.com,
yuanchu@google.com, weixugc@google.com, david@kernel.org,
ljs@kernel.org, liam@infradead.org, rppt@kernel.org,
surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
apopple@nvidia.com, linux-kernel@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v4] mm/mempolicy: track user-defined mempolicy allocations
Date: Mon, 4 May 2026 23:22:40 -0700
Message-ID: <8b0cf967-9d90-4494-8264-a28bbc498ca7@linux.dev>
In-Reply-To: <20260427141123.734c66450100c0f900e02947@linux-foundation.org>
On 4/27/26 2:11 PM, Andrew Morton wrote:
> On Mon, 27 Apr 2026 08:15:20 -0700 "JP Kobryn (Meta)" <jp.kobryn@linux.dev> wrote:
>
>> When investigating pressure on a NUMA node, there is no straightforward way
>> to determine which user-defined policies are driving allocations to it.
>>
>> Add NUMA mempolicy allocation counters as new node stat items. These
>> counters track allocations to nodes and also whether the allocations were
>> intentional or fallbacks.
>
> AI review:
> https://sashiko.dev/#/patchset/20260427151520.137341-1-jp.kobryn@linux.dev
This was helpful. I quoted the review points and answered them inline.
: For MPOL_PREFERRED_MANY and MPOL_BIND policies, policy_nodemask() does
: not modify the nid parameter unless home_node is set, so intended_nid
: defaults to the caller's local node.
Yes, this patch is not intended to change that behavior.
: If an allocation falls back outside the preferred nodemask, will the
: FOREIGN stat incorrectly penalize the local node, which was never the
: intended target?
An allocation that lands in the mask counts as a hit. If it lands
outside the mask, foreign is incremented on the intended node. Assuming
home node is not set, the local node dictates the fallback search path,
so in that regard foreign can apply. The alternative would be to
increment foreign for all nodes in the mask after a miss, but that would
imbalance miss/foreign and skew the data. Foreign may not make sense for
mask-based policies.
: Furthermore, if the fallback allocation lands on the local node, will
: it simultaneously count as both a 'miss' and a 'foreign' on the exact
: same node?
Yes, which is further evidence that foreign does not map well to a
mask-based policy. Having data on just hit and miss would be sufficient
for the investigative purpose of this patch.
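To make the double counting concrete, here is a minimal userspace sketch
of the accounting described above. It is not the patch's code:
count_alloc(), struct mpol_stats, MAX_NODES, and the bitmask
representation of the nodemask are hypothetical names used only for
illustration, and counting the hit on the node that served the
allocation is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical node count for this model */

/* Hypothetical per-node counters modelling hit/miss/foreign. */
struct mpol_stats {
	unsigned long hit;
	unsigned long miss;
	unsigned long foreign;
};

static struct mpol_stats stats[MAX_NODES];

/*
 * Model of the accounting for a mask-based policy:
 *  - served by a node inside the mask: hit on that node (assumption)
 *  - served by a node outside the mask: miss on the serving node and
 *    foreign on the intended node
 * With no home node set, intended_nid is the caller's local node.
 */
static void count_alloc(unsigned int mask, int intended_nid, int actual_nid)
{
	assert(intended_nid >= 0 && intended_nid < MAX_NODES);
	assert(actual_nid >= 0 && actual_nid < MAX_NODES);

	bool in_mask = (mask >> actual_nid) & 1u;

	if (in_mask) {
		stats[actual_nid].hit++;
		return;
	}

	stats[actual_nid].miss++;
	stats[intended_nid].foreign++;
}
```

With mask 0x6 (nodes 1 and 2) and local node 0 as the intended node,
count_alloc(0x6, 0, 1) records a hit on node 1, while
count_alloc(0x6, 0, 0) records both a miss and a foreign on node 0,
even though node 0 was never part of the policy's mask.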
: mod_node_page_state() unconditionally executes local_irq_save() and
: local_irq_restore().
: Since mpol_count_numa_alloc() is invoked on every
: successful page allocation governed by a user-defined mempolicy, does
: this introduce severe IRQ-disabling overhead into the highly optimized
: page allocation fast path?
: Established NUMA counters (like NUMA_HIT) avoid this lock contention
: by using lockless per-cpu operations via __count_numa_event() and
: raw_cpu_add().
For reasons explained further below, I plan on switching from counters
to tracepoints.
: The patch tracks user-defined mempolicy allocations by instrumenting
: alloc_pages_mpol().
: Bulk memory allocations under a mempolicy are routed through
: alloc_pages_bulk_mempolicy_noprof(), which dispatches to specialized
: bulk allocators and bypasses alloc_pages_mpol() entirely. Will this
: lead to silent undercounting of mempolicy allocations for workloads
: utilizing bulk allocation?
: Similarly, Hugetlbfs allocations resolve their mempolicies
: independently via huge_node() and allocate pages through
: alloc_buddy_hugetlb_folio_with_mpol(), which directly invokes the
: buddy allocator. Will Hugetlbfs allocations also be completely
: excluded from the new NUMA_MPOL_* counters?
It seems the existing NUMA_INTERLEAVE_HIT misses this as well (it is
only counted in alloc_pages_mpol()). But closing this gap with the new
stats looks like it would get messy, since every individual allocation
of the bulk request would have to be accounted for. I think tracepoints
would be a cleaner way to address not only this bulk issue, but the
other concerns as well.
I know some other reviewers favored tracepoints over adding new stats
altogether. Originally I saw stats as a convenience trade-off because
of the instrumentation tracepoints require from a userspace consumer.
But given the challenges with the foreign mapping, the irq concern, and
the bulk counting complexity, I'll go in this direction in v5 and
hopefully get more consensus.