public inbox for linux-mm@kvack.org
* [PATCH] sched/numa, mm: Skip page promotion if cpu pid is valid
@ 2026-03-26  7:12 Donet Tom
  2026-03-26 10:29 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 2+ messages in thread
From: Donet Tom @ 2026-03-26  7:12 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Ingo Molnar, Peter Zijlstra
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Baolin Wang, Ying Huang,
	Juri Lelli, Mel Gorman, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Donet Tom

If memory tiering is disabled, cpupid of slow memory pages may
contain a valid CPU and PID. If tiering is enabled at runtime,
there is a chance that in should_numa_migrate_memory(), this
valid CPU/PID is treated as a last access timestamp, leading
to unnecessary promotion.

Prevent this by skipping promotion when cpupid is valid.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b43809a3fb1..f5830a5a94d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2001,6 +2001,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
 		unsigned int latency, th, def_th;
 		long nr = folio_nr_pages(folio);
 
+		/* When tiering is enabled at runtime, last_cpupid may
+		 * hold a valid cpupid instead of an access timestamp.
+		 * If so, skip page promotion.
+		 */
+		if (cpupid_valid(folio_last_cpupid(folio)))
+			return false;
+
 		pgdat = NODE_DATA(dst_nid);
 		if (pgdat_free_space_enough(pgdat)) {
 			/* workload changed, reset hot threshold */
-- 
2.47.1




* Re: [PATCH] sched/numa, mm: Skip page promotion if cpu pid is valid
  2026-03-26  7:12 [PATCH] sched/numa, mm: Skip page promotion if cpu pid is valid Donet Tom
@ 2026-03-26 10:29 ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 2+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-26 10:29 UTC (permalink / raw)
  To: Donet Tom, Andrew Morton, Ingo Molnar, Peter Zijlstra
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Baolin Wang, Ying Huang,
	Juri Lelli, Mel Gorman, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt

On 3/26/26 08:12, Donet Tom wrote:
> If memory tiering is disabled, cpupid of slow memory pages may
> contain a valid CPU and PID. If tiering is enabled at runtime,
> there is a chance that in should_numa_migrate_memory(), this
> valid CPU/PID is treated as a last access timestamp, leading
> to unnecessary promotion.

Is that measurable? Should we at least have a Fixes: ?

> 
> Prevent this by skipping promotion when cpupid is valid.
> 
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4b43809a3fb1..f5830a5a94d5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2001,6 +2001,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  		unsigned int latency, th, def_th;
>  		long nr = folio_nr_pages(folio);
>  

/*
 * When ...

> +		/* When tiering is enabled at runtime, last_cpupid may
> +		 * hold a valid cpupid instead of an access timestamp.
> +		 * If so, skip page promotion.
> +		 */
> +		if (cpupid_valid(folio_last_cpupid(folio)))
> +			return false;
> +

IIUC, we use jiffies_to_msecs() as the timestamp. So, soon after bootup,
we would no longer get false positives for cpupid_valid().
I suppose overflows are not a problem, correct?
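
To make the overflow question concrete, here is a minimal user-space
sketch of a millisecond clock truncated into a narrow field and compared
with modular subtraction. It is not the kernel code: the 16-bit width and
all toy_* names are assumptions made up for illustration, and the real
field width depends on the kernel configuration.

```c
#include <stdint.h>

/*
 * Hypothetical width of the stored field. The real width is
 * configuration dependent; 16 bits is assumed here only to keep
 * the numbers small.
 */
#define TOY_TIME_BITS 16u
#define TOY_TIME_MASK ((1u << TOY_TIME_BITS) - 1u)

/* Keep only the low bits of a millisecond clock that fit in the field. */
static uint32_t toy_store_time(uint32_t now_ms)
{
	return now_ms & TOY_TIME_MASK;
}

/*
 * Elapsed time modulo the field width. The result equals the true gap
 * as long as that gap is smaller than 2^TOY_TIME_BITS ms, even when the
 * clock wrapped in between. Larger gaps alias to small values.
 */
static uint32_t toy_elapsed(uint32_t now_ms, uint32_t stored)
{
	return (now_ms - stored) & TOY_TIME_MASK;
}
```

In this model a wrap of the clock between store and load is harmless,
but a gap of one full period or more is indistinguishable from a short
one, so a very old page can look recently accessed.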

So what we're saying is that folio_use_access_time()==true does not
imply that there is actually a valid time in there.
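
A minimal user-space sketch of that failure mode, assuming a toy field
layout: a value packed from a CPU and PID is later read back as if it
were an access time. The bit widths, the 1000 ms threshold, and all
toy_* names are hypothetical and are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 8 bits of PID below 8 bits of CPU. */
#define TOY_PID_BITS 8u
#define TOY_CPU_BITS 8u
#define TOY_PID_MASK ((1u << TOY_PID_BITS) - 1u)
#define TOY_CPU_MASK ((1u << TOY_CPU_BITS) - 1u)
#define TOY_FIELD_MASK ((1u << (TOY_PID_BITS + TOY_CPU_BITS)) - 1u)

/* Hypothetical hot threshold: pages seen within this window count as hot. */
#define TOY_HOT_THRESHOLD_MS 1000u

/* What the field holds while tiering is off: a packed CPU and PID. */
static uint32_t toy_pack_cpupid(uint32_t cpu, uint32_t pid)
{
	return ((cpu & TOY_CPU_MASK) << TOY_PID_BITS) | (pid & TOY_PID_MASK);
}

/* How the field is read once tiering is on: as a truncated time in ms. */
static uint32_t toy_latency_ms(uint32_t now_ms, uint32_t field)
{
	return (now_ms - field) & TOY_FIELD_MASK;
}

/* A page looks hot, and so eligible for promotion, if the latency is small. */
static bool toy_looks_hot(uint32_t now_ms, uint32_t field)
{
	return toy_latency_ms(now_ms, field) < TOY_HOT_THRESHOLD_MS;
}
```

With CPU 3 and PID bits 0x42 the field holds 0x0342 (834). Read as a
time at now_ms = 1000, that gives a latency of 166 ms, so the toy page
looks hot although no access time was ever recorded for it.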

In numa_migrate_check() we could still use the valid cpupid, I guess, and
make that code a bit clearer?

diff --git a/mm/memory.c b/mm/memory.c
index 631205a384e1..ba68933a9e4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6119,10 +6119,9 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
         * For memory tiering mode, cpupid of slow memory page is used
         * to record page access time.  So use default value.
         */
-       if (folio_use_access_time(folio))
+       *last_cpupid = folio_last_cpupid(folio);
+       if (!cpupid_valid(*last_cpupid))
                *last_cpupid = (-1 & LAST_CPUPID_MASK);
-       else
-               *last_cpupid = folio_last_cpupid(folio);
 
        /* Record the current PID accessing VMA */
        vma_set_access_pid_bit(vma);


The change itself here looks reasonable to me.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


