From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Donet Tom <donettom@linux.ibm.com>
Cc: David Hildenbrand <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ritesh Harjani <ritesh.list@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Ying Huang <huang.ying.caritas@gmail.com>,
Juri Lelli <juri.lelli@redhat.com>,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
Date: Thu, 02 Apr 2026 14:24:11 +0800
Message-ID: <87o6k1ubg4.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <c571bb69-82f2-4346-9f99-6a7258e28a27@linux.ibm.com> (Donet Tom's message of "Thu, 2 Apr 2026 10:29:39 +0530")

Donet Tom <donettom@linux.ibm.com> writes:

> Hi

Hi, Donet,

> On 4/2/26 8:57 AM, Huang, Ying wrote:
>> Donet Tom <donettom@linux.ibm.com> writes:
>>
>>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>>> disabled and the pages are on the lower tier, the pages may still be
>>> promoted.
>>>
>>> This happens because task_numa_work() updates the last_cpupid field to
>>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>>> enabled and the folio is on the lower tier. If
>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>>> can retain a valid last CPU id.
>>>
>>> In should_numa_migrate_memory(), the decision checks whether
>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
>>> tier, and last_cpupid is invalid. However, because last_cpupid can be
>>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
>>> evaluates to false and migration is allowed.
>>>
>>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>>> disabled and the folio is on the lower tier.
>>>
>>> Behavior before this change:
>>> ============================
>>> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>>> nodes within the same memory tier, and promotion from lower
>>> tier to higher tier may also happen.
>>>
>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>>> lower tier to higher tier nodes is allowed.
>>>
>>> Behavior after this change:
>>> ===========================
>>> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>>> between nodes within the same memory tier.
>>>
>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>>> tier to higher tier nodes will be allowed.
>>>
>>> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>>> enabled, both migration (same tier) and promotion (cross tier) are
>>> allowed.
>>>
>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>> ---
>>> v1 -> v2
>>> ========
>>> 1. Dropped changes in task_numa_fault() since the original changes
>>> already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
>>>
>>> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
>>> ---
>>> kernel/sched/fair.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index bf948db905ed..4b43809a3fb1 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>>>  	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
>>>  	last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>>>  
>>> +	/*
>>> +	 * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
>>> +	 * and the pages are on the lower tier.
>>> +	 */
>>>  	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
>>> -	    !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
>>> +	    !node_is_toptier(src_nid))
>>>  		return false;
>>>  
>>>  	/*
>> No. Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
>> allow migrating pages from the lower tier to the higher tier via
>> NUMA_BALANCING_NORMAL. If we have precious DDR, why waste it? This
>> follows the semantics NUMA_BALANCING_NORMAL had before
>> NUMA_BALANCING_MEMORY_TIERING was introduced.
>
> Thank you for the review comments.
>
> One thing I am trying to understand is that page promotion
> appears to happen regardless of whether
> NUMA_BALANCING_MEMORY_TIERING is enabled or disabled. In that
> case, what is the specific role of
> NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
> when it is enabled?

You can search the kernel source for NUMA_BALANCING_MEMORY_TIERING to
find out what it does. We can get better performance, as the original
commit message says. When NUMA_BALANCING_MEMORY_TIERING was introduced,
we didn't change the original behavior of NUMA_BALANCING_NORMAL because
we had no good reason to do that. Your patch does change that behavior,
so you should provide some supporting data or a bug report to justify
the change.

> My initial understanding was that disabling
> NUMA_BALANCING_MEMORY_TIERING could be used to turn off
> promotion. However, it seems that currently we cannot control
> promotion independently. If NUMA_BALANCING_NORMAL is disabled,
> neither migration nor promotion happens, and if it is enabled,
> both migration and promotion can occur.
>
> I was under the impression that:
> - NUMA_BALANCING_NORMAL would handle migration within the same tier,
> - NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
> - and enabling both would allow both migration and promotion.
>
> This would provide more fine-grained control. Is my
> understanding correct, or am I missing something here?

You can change this if you have supporting data or a bug report.

---
Best Regards,
Huang, Ying