Linux-mm Archive on lore.kernel.org
* [PATCH] mm: mempolicy: fix automatic numa balancing for shmem
@ 2026-06-29 16:33 Johannes Weiner
  2026-06-29 17:59 ` Gregory Price
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Johannes Weiner @ 2026-06-29 16:33 UTC
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	linux-mm, linux-kernel, Neha Gholkar

Neha reports that mapped shmem VMAs aren't considered for NUMA
balancing, noting convergence problems and bandwidth bottlenecks for
cachelib-based workloads on tiered memory systems.

Looking at the code and going through the git history, this doesn't
actually seem intentional:

Commit fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault
VMAs") added a vma_policy_mof() gate to task_numa_work() so VMAs whose
policy lacks MPOL_F_MOF are skipped from NUMA balancing scans. The
motivation was a real use case: Oracle was pinning shared segments with
mbind(MPOL_BIND), so trapping faults was both expensive and pointless.

The handling of NULL from vm_ops->get_policy, however, treated "user
explicitly opted out" the same as "user never specified anything." For
VMAs whose shared policy is absent - the common case for shmem - the
scan was disabled too.

This issue is old. It probably hurts less in conventional NUMA. But it's
very noticeable on tiered systems, where entire tmpfs working sets can
get stuck on lower-bandwidth memory.

Fix this by having vma_policy_mof() use __get_vma_policy() directly, and
thereby handle the fallback to task policy (-> preferred_node_policy()
has MPOL_F_MOF per default). Every other consumer of vm_ops->get_policy
already handles it this way; the scan-eligibility check was the outlier.

This preserves Mel's intended fix (don't scan stuff the user explicitly
pinned) while allowing default-policy VMAs to participate in balancing.
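
For illustration only, the eligibility decision before and after this
change can be modeled in userspace roughly as below. This is a sketch, not
kernel code: the model_* names, MODEL_F_MOF, and the flattened struct are
hypothetical stand-ins, and it assumes (as described above) that the
fallback task policy carries the migrate-on-fault flag while an explicit
bind policy does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MPOL_F_MOF flag bit. */
#define MODEL_F_MOF (1u << 3)

/* Hypothetical, heavily simplified stand-in for struct mempolicy. */
struct model_policy {
	unsigned int flags;
};

/* Hypothetical, flattened stand-in for the relevant VMA state. */
struct model_vma {
	bool has_get_policy;                   /* models vm_ops->get_policy being set */
	const struct model_policy *shared_pol; /* models get_policy() result; NULL = none */
	const struct model_policy *vm_policy;  /* models vma->vm_policy; NULL = none */
};

/* Old logic: a NULL shared policy is treated like an explicit opt-out. */
bool model_mof_old(const struct model_vma *vma,
		   const struct model_policy *task_pol)
{
	const struct model_policy *pol;

	if (vma->has_get_policy) {
		pol = vma->shared_pol;
		return pol && (pol->flags & MODEL_F_MOF);
	}

	pol = vma->vm_policy;
	if (!pol)
		pol = task_pol;
	return pol->flags & MODEL_F_MOF;
}

/* New logic: a NULL policy from either source falls back to the task policy. */
bool model_mof_new(const struct model_vma *vma,
		   const struct model_policy *task_pol)
{
	const struct model_policy *pol;

	pol = vma->has_get_policy ? vma->shared_pol : vma->vm_policy;
	if (!pol)
		pol = task_pol;
	return pol->flags & MODEL_F_MOF;
}
```

In this model, a shmem-like VMA with no shared policy is skipped by the
old check and scanned by the new one. A VMA with an explicit policy that
lacks the flag is skipped by both, and a VMA without get_policy and
without a VMA policy is scanned by both.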

Reported-by: Neha Gholkar <nehagholkar@gmail.com>
Tested-by: Neha Gholkar <nehagholkar@gmail.com>
Fixes: fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault VMAs")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/mempolicy.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36699fabd3c2..bba65898aee1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2057,24 +2057,15 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
 bool vma_policy_mof(struct vm_area_struct *vma)
 {
 	struct mempolicy *pol;
+	pgoff_t ilx;
+	bool mof;
 
-	if (vma->vm_ops && vma->vm_ops->get_policy) {
-		bool ret = false;
-		pgoff_t ilx;		/* ignored here */
-
-		pol = vma->vm_ops->get_policy(vma, vma->vm_start, &ilx);
-		if (pol && (pol->flags & MPOL_F_MOF))
-			ret = true;
-		mpol_cond_put(pol);
-
-		return ret;
-	}
-
-	pol = vma->vm_policy;
+	pol = __get_vma_policy(vma, vma->vm_start, &ilx);
 	if (!pol)
 		pol = get_task_policy(current);
-
-	return pol->flags & MPOL_F_MOF;
+	mof = pol->flags & MPOL_F_MOF;
+	mpol_cond_put(pol);
+	return mof;
 }
 
 bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
-- 
2.54.0



Thread overview: 12+ messages
2026-06-29 16:33 [PATCH] mm: mempolicy: fix automatic numa balancing for shmem Johannes Weiner
2026-06-29 17:59 ` Gregory Price
2026-06-29 18:22   ` Johannes Weiner
2026-06-30 11:20   ` Huang, Ying
2026-06-30 15:29     ` Gregory Price
2026-07-01 11:03       ` Huang, Ying
2026-07-01 15:33         ` Gregory Price
2026-07-01 15:49           ` Johannes Weiner
2026-06-29 18:33 ` David Hildenbrand (Arm)
2026-06-29 18:47   ` Johannes Weiner
2026-06-30 11:26     ` David Hildenbrand (Arm)
2026-06-30 23:40 ` Balbir Singh
