From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>, Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Neha Gholkar <nehagholkar@gmail.com>
Subject: Re: [PATCH] mm: mempolicy: fix automatic numa balancing for shmem
Date: Mon, 29 Jun 2026 20:33:32 +0200
Message-ID: <e18f075a-2203-4ebb-8f4e-713d386d0ef3@kernel.org>
In-Reply-To: <20260629163337.1264881-1-hannes@cmpxchg.org>

On 6/29/26 18:33, Johannes Weiner wrote:
> Neha reports that mapped shmem isn't considered for NUMA balancing,
> noting convergence problems and bandwidth bottlenecking for cachelib
> based workloads on tiered memory systems.
> 
> Looking at the code and going through the git history, this doesn't
> actually seem intentional:
> 
> Commit fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault
> VMAs") added a vma_policy_mof() gate to task_numa_work() so VMAs whose
> policy lacks MPOL_F_MOF are skipped from NUMA balancing scans. The
> motivation was a real usecase: Oracle was pinning shared segments with
> mbind(MPOL_BIND) so trapping faults was both expensive and pointless.
> 
> The handling of NULL from vm_ops->get_policy, however, treated "user
> explicitly opted out" the same as "user never specified anything." For
> VMAs whose shared policy is absent - the common case for shmem - the
> scan was disabled too.
> 
> This issue is old. It probably hurts less in conventional NUMA. But it's
> very noticeable on tiered systems, where entire tmpfs workingsets can get
> stuck on lower-bandwidth memory.

Sounds bad enough to warrant CC: stable?

> 
> Fix this by having vma_policy_mof() use __get_vma_policy() directly, and
> thereby handle the fallback to task policy (-> preferred_node_policy()
> has MPOL_F_MOF per default). Every other consumer of vm_ops->get_policy
> already handles it this way, the scan-eligibility check was the outlier.
> 
> This preserves Mel's intended fix: don't scan stuff the user explicitly
> pinned. But allow default policy vmas to participate in balancing.
> 
> Reported-by: Neha Gholkar <nehagholkar@gmail.com>
> Tested-by: Neha Gholkar <nehagholkar@gmail.com>
> Fixes: fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault VMAs")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/mempolicy.c | 21 ++++++---------------
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 36699fabd3c2..bba65898aee1 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2057,24 +2057,15 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>  bool vma_policy_mof(struct vm_area_struct *vma)
>  {
>  	struct mempolicy *pol;
> +	pgoff_t ilx;
> +	bool mof;
>  
> -	if (vma->vm_ops && vma->vm_ops->get_policy) {
> -		bool ret = false;
> -		pgoff_t ilx;		/* ignored here */
> -
> -		pol = vma->vm_ops->get_policy(vma, vma->vm_start, &ilx);
> -		if (pol && (pol->flags & MPOL_F_MOF))
> -			ret = true;
> -		mpol_cond_put(pol);
> -
> -		return ret;
> -	}

Okay, we used the fallback of vma->vm_policy before (if vma->vm_ops->get_policy
was not available), which is what __get_vma_policy() does as well.

But if vma->vm_ops->get_policy now returns NULL, we fall back to get_task_policy().
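
To make the before/after concrete, here is a minimal userspace sketch of the
two lookups. It is a simplified model, not the kernel code: struct vma_model,
shmem_like_get_policy() and the old_/new_ function names are made up for
illustration, the MPOL_F_MOF value is only a stand-in, and refcounting
(mpol_cond_put()) is left out. The task policy is passed in explicitly where
the kernel would call get_task_policy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag value for this model; do not rely on it matching the kernel. */
#define MPOL_F_MOF (1u << 3)

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct mempolicy {
	unsigned int flags;
};

struct vma_model {
	/* Models vm_ops->get_policy; NULL means the hook is absent. */
	struct mempolicy *(*get_policy)(const struct vma_model *vma);
	struct mempolicy *shared_policy;	/* what a shmem-like hook returns */
	struct mempolicy *vm_policy;		/* models vma->vm_policy */
};

/* A shmem-like hook: returns NULL when no shared policy was ever set. */
struct mempolicy *shmem_like_get_policy(const struct vma_model *vma)
{
	return vma->shared_policy;
}

/* Before the patch: a NULL result from the hook means "not MOF". */
bool old_vma_policy_mof(const struct vma_model *vma,
			const struct mempolicy *task_pol)
{
	const struct mempolicy *pol;

	assert(task_pol != NULL);

	if (vma->get_policy) {
		pol = vma->get_policy(vma);
		return pol && (pol->flags & MPOL_F_MOF);
	}

	pol = vma->vm_policy ? vma->vm_policy : task_pol;
	return (pol->flags & MPOL_F_MOF) != 0;
}

/* After the patch: a NULL result from either source falls back to the task policy. */
bool new_vma_policy_mof(const struct vma_model *vma,
			const struct mempolicy *task_pol)
{
	const struct mempolicy *pol;

	assert(task_pol != NULL);

	pol = vma->get_policy ? vma->get_policy(vma) : vma->vm_policy;
	if (!pol)
		pol = task_pol;

	return (pol->flags & MPOL_F_MOF) != 0;
}
```

In this model, a VMA whose hook returns NULL combined with a task policy that
carries MPOL_F_MOF is skipped by the old variant and scanned by the new one.
A VMA with an explicit shared policy lacking MPOL_F_MOF is skipped by both.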


Makes sense to me, although I find this confusing.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David
