From: David Hildenbrand <david@redhat.com>
To: Bijan Tabatabai <bijan311@gmail.com>
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, sj@kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net, ziy@nvidia.com,
	matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	bijantabatab@micron.com, venkataravis@micron.com,
	emirakhur@micron.com, ajayjoshi@micron.com,
	vtavarespetr@micron.com, damon@lists.linux.dev
Subject: Re: [RFC PATCH 1/4] mm/mempolicy: Expose policy_nodemask() in include/linux/mempolicy.h
Date: Mon, 16 Jun 2025 11:45:58 +0200
Message-ID: <e40aa590-f0a2-4666-84b0-c33c8f4fef87@redhat.com>
In-Reply-To: <CAMvvPS5U8exSvy0fknfhv8ym_dKgMVa7cfMOqn0fGyd+NSjSuQ@mail.gmail.com>

On 13.06.25 18:33, Bijan Tabatabai wrote:
> On Fri, Jun 13, 2025 at 8:45 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 12.06.25 20:13, Bijan Tabatabai wrote:
>>> From: Bijan Tabatabai <bijantabatab@micron.com>
>>>
>>> This patch is to allow DAMON to call policy_nodemask() so it can
>>> determine where to place a page for interleaving.
>>>
>>> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
>>> ---
>>>    include/linux/mempolicy.h | 9 +++++++++
>>>    mm/mempolicy.c            | 4 +---
>>>    2 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
>>> index 0fe96f3ab3ef..e96bf493ff7a 100644
>>> --- a/include/linux/mempolicy.h
>>> +++ b/include/linux/mempolicy.h
>>> @@ -133,6 +133,8 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
>>>    struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>>>                unsigned long addr, int order, pgoff_t *ilx);
>>>    bool vma_policy_mof(struct vm_area_struct *vma);
>>> +nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
>>> +             pgoff_t ilx, int *nid);
>>>
>>>    extern void numa_default_policy(void);
>>>    extern void numa_policy_init(void);
>>> @@ -232,6 +234,13 @@ static inline struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>>>        return NULL;
>>>    }
>>>
>>> +static inline nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
>>> +                             pgoff_t ilx, int *nid)
>>> +{
>>> +     *nid = NUMA_NO_NODE;
>>> +     return NULL;
>>> +}
>>> +
>>>    static inline int
>>>    vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
>>>    {
>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>> index 3b1dfd08338b..54f539497e20 100644
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -596,8 +596,6 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
>>>
>>>    static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
>>>                                unsigned long flags);
>>> -static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
>>> -                             pgoff_t ilx, int *nid);
>>>
>>>    static bool strictly_unmovable(unsigned long flags)
>>>    {
>>> @@ -2195,7 +2193,7 @@ static unsigned int interleave_nid(struct mempolicy *pol, pgoff_t ilx)
>>>     * Return a nodemask representing a mempolicy for filtering nodes for
>>>     * page allocation, together with preferred node id (or the input node id).
>>>     */
>>> -static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
>>> +nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
>>>                                   pgoff_t ilx, int *nid)
>>>    {
>>>        nodemask_t *nodemask = NULL;
>>
>> You actually only care about the nid for your use case.
>>
>> Maybe we should add
>>
>> get_vma_policy_node() that internally does a get_vma_policy() to then
>> give you only the node back.
>>
>> If get_vma_policy() is not the right thing (see my reply to patch #2),
>> of course a get_task_policy_node() could be added.
>>
>> --
>> Cheers,
>>
>> David / dhildenb
> 
> Hi David,

Hi,

> 
> I did not use get_vma_policy or mpol_misplaced, which I believe is the
> closest function that exists for what I want in this patch, because
> those functions

I think what you mean is that you are performing an rmap walk. But
there, you do have a VMA + MM available (and stable).
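
For reference, the rmap walk already hands the VMA to its callback; from
include/linux/rmap.h (other members elided):

struct rmap_walk_control {
	...
	bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
			 unsigned long addr, void *arg);
	...
};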

> seem to assume they are called inside of the task that the folio/vma
> is mapped to.

But we do have a VMA at hand, so why would we want to ignore any policy
set on it? (I think VMA policies so far only apply to shmem, but still.)

I really think you want to use get_vma_policy() instead of the task policy.
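
Something like this (completely untested sketch, assuming your patch to
expose policy_nodemask()) would match the get_vma_policy_node() idea
from earlier in the thread:

/* Untested sketch: resolve a VMA's policy to a preferred node id. */
static int get_vma_policy_node(struct vm_area_struct *vma,
			       unsigned long addr, int order, gfp_t gfp)
{
	struct mempolicy *pol;
	pgoff_t ilx;
	int nid = numa_node_id();	/* seeded like existing callers */

	pol = get_vma_policy(vma, addr, order, &ilx);
	policy_nodemask(gfp, pol, ilx, &nid);
	mpol_cond_put(pol);
	return nid;
}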


> More specifically, mpol_misplaced assumes it is being called within a
> page fault. This doesn't work for us, because we call it from a
> kdamond process.

Right.

But it uses the vmf only for ...

1) Obtaining the VMA
2) Sanity-checking that the ptlock is held.

Both of which you also have during the rmap walk.


So what about factoring that handling out of mpol_misplaced() into
another function where you pass the VMA instead of the vmf?
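
Something like (completely untested, names are just examples):

/*
 * Untested sketch: the guts of mpol_misplaced(), with the VMA passed in
 * directly so that an rmap walk can call it as well.
 */
int folio_misplaced(struct folio *folio, struct vm_area_struct *vma,
		    unsigned long addr)
{
	struct mempolicy *pol;
	pgoff_t ilx;
	int ret = NUMA_NO_NODE;

	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
	if (!(pol->flags & MPOL_F_MOF))
		goto out;
	/* ... the existing policy evaluation of mpol_misplaced() ... */
out:
	mpol_cond_put(pol);
	return ret;
}

int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
		   unsigned long addr)
{
	/* vmf would now only be used for the VMA + the ptlock sanity check. */
	return folio_misplaced(folio, vmf->vma, addr);
}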

> 
> I would be open to adding a new function that takes a folio, vma,
> address, and task_struct and returns the nid the folio should be
> placed on. It could possibly be implemented as a function internal to
> mpol_misplaced because the two would be very similar.

Good, you had the same thought :)

> 
> How would you propose we handle MPOL_BIND and MPOL_PREFERRED_MANY
> in this function? mpol_misplaced chooses a nid based on the node and
> cpu the fault occurred on, which we wouldn't have in a kdamond
> context. The two options I see are:
> 1. return the nid of the first node in the policy's nodemask
> 2. return NUMA_NO_NODE
> I think I would lean towards the first.

I guess we'd need a way for your new helper to deal with both cases 
(is_fault vs. !is_fault), and make a decision based on that.
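
For 1), the !is_fault case could then be as simple as (untested):

/* Untested sketch: pick a node when there is no faulting CPU to prefer. */
static int policy_node_nofault(struct mempolicy *pol)
{
	switch (pol->mode) {
	case MPOL_BIND:
	case MPOL_PREFERRED_MANY:
		/* Option 1: simply take the first node of the nodemask. */
		return first_node(pol->nodes);
	default:
		return NUMA_NO_NODE;
	}
}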


For your use case, you can then decide what would be appropriate. It's a
good question: 1) sounds better, but I do wonder whether we would rather
want to distribute the folios across the applicable nodes in some other
way, not sure ...

-- 
Cheers,

David / dhildenb



Thread overview: 30+ messages
2025-06-12 18:13 [RFC PATCH 0/4] mm/damon: Add DAMOS action to interleave data across nodes Bijan Tabatabai
2025-06-12 18:13 ` [RFC PATCH 1/4] mm/mempolicy: Expose policy_nodemask() in include/linux/mempolicy.h Bijan Tabatabai
2025-06-13 13:45   ` David Hildenbrand
2025-06-13 16:33     ` Bijan Tabatabai
2025-06-16  9:45       ` David Hildenbrand [this message]
2025-06-16 11:02         ` Huang, Ying
2025-06-16 11:11           ` David Hildenbrand
2025-06-16 14:16         ` Bijan Tabatabai
2025-06-16 14:26           ` David Hildenbrand
2025-06-16 17:43           ` Gregory Price
2025-06-16 22:16             ` Bijan Tabatabai
2025-06-17 18:58               ` SeongJae Park
2025-06-17 19:54                 ` Bijan Tabatabai
2025-06-17 22:30                   ` SeongJae Park
2025-06-16 10:55       ` Huang, Ying
2025-06-12 18:13 ` [RFC PATCH 2/4] mm/damon/paddr: Add DAMOS_INTERLEAVE action Bijan Tabatabai
2025-06-13 13:43   ` David Hildenbrand
2025-06-12 18:13 ` [RFC PATCH 3/4] mm/damon: Move damon_pa_migrate_pages to ops-common Bijan Tabatabai
2025-06-12 18:13 ` [RFC PATCH 4/4] mm/damon/vaddr: Add vaddr version of DAMOS_INTERLEAVE Bijan Tabatabai
2025-06-12 23:49 ` [RFC PATCH 0/4] mm/damon: Add DAMOS action to interleave data across nodes SeongJae Park
2025-06-13  2:41   ` Huang, Ying
2025-06-13 16:02     ` Bijan Tabatabai
2025-06-13 15:44   ` Bijan Tabatabai
2025-06-13 17:12     ` SeongJae Park
2025-06-16  7:42     ` Byungchul Park
2025-06-16 15:01       ` Bijan Tabatabai
2025-06-13  9:55 ` Rakie Kim
2025-06-13 16:12   ` Bijan Tabatabai
2025-06-13 15:25 ` Joshua Hahn
2025-06-13 16:46   ` Bijan Tabatabai
