From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Li Zhijian <lizhijian@fujitsu.com>
Cc: linux-mm@kvack.org,  akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org,  y-goto@fujitsu.com,
	 Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>,  Mel Gorman <mgorman@suse.de>,
	 Valentin Schneider <vschneid@redhat.com>
Subject: Re: [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting
Date: Fri, 20 Jun 2025 14:28:51 +0800
Message-ID: <87ldpn2afw.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250619075245.3272384-1-lizhijian@fujitsu.com> (Li Zhijian's message of "Thu, 19 Jun 2025 15:52:45 +0800")

Li Zhijian <lizhijian@fujitsu.com> writes:

> Goto-san reported confusing pgpromote statistics where
> the pgpromote_success count significantly exceeded pgpromote_candidate.
> The issue manifests under specific memory pressure conditions:
> when top-tier memory (DRAM) is exhausted by memhog and allocation begins
> in lower-tier memory (CXL). After terminating memhog, the stats show:

The above description is confusing.  Page promotion occurs when the
top-tier free space is large enough (that is, after killing the memhog
above).  Lower-tier pages are then promoted when they are accessed, to
take full advantage of the more expensive top-tier memory.

> $ grep -e pgpromote /proc/vmstat
> pgpromote_success 2579
> pgpromote_candidate 1
>
> This update increments PGPROMOTE_CANDIDATE in the free-space branch
> when a promotion decision is made.  Because the rate limit is computed
> from PGPROMOTE_CANDIDATE, this changes its behavior: the rate limit
> becomes easier to reach than it was previously.
>
> For example:
> Rate Limit = 100 pages/sec
> Scenario:
>   T0: 90 free-space migrations
>   T0+100ms: 20-page migration request
>
> Before:
>   Rate limit is *not* reached: 0 + 20 = 20 < 100
>   PGPROMOTE_CANDIDATE: 20
> After:
>   Rate limit is reached: 90 + 20 = 110 > 100
>   PGPROMOTE_CANDIDATE: 110

Yes.  The rate limit will be influenced by the change.  So, more tests
may be needed to verify that it does not introduce regressions.

>
> Reported-by: Yasunori Gotou (Fujitsu) <y-goto@fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>
> This is marked as RFC because I am uncertain whether this was originally
> intended or simply overlooked.
>
> However, the current situation where pgpromote_candidate < pgpromote_success
> is indeed confusing when interpreted literally.
>
> Cc: Huang Ying <ying.huang@linux.alibaba.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> ---
>  kernel/sched/fair.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb..4715cd4fa248 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1940,11 +1940,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  		struct pglist_data *pgdat;
>  		unsigned long rate_limit;
>  		unsigned int latency, th, def_th;
> +		long nr = folio_nr_pages(folio);
>  
>  		pgdat = NODE_DATA(dst_nid);
>  		if (pgdat_free_space_enough(pgdat)) {
>  			/* workload changed, reset hot threshold */
>  			pgdat->nbp_threshold = 0;
> +			mod_node_page_state(pgdat, PGPROMOTE_CANDIDATE, nr);
>  			return true;
>  		}
>  
> @@ -1958,8 +1960,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  		if (latency >= th)
>  			return false;
>  
> -		return !numa_promotion_rate_limit(pgdat, rate_limit,
> -						  folio_nr_pages(folio));
> +		return !numa_promotion_rate_limit(pgdat, rate_limit, nr);
>  	}
>  
>  	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);

---
Best Regards,
Huang, Ying



Thread overview: 7+ messages
2025-06-19  7:52 [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting Li Zhijian
2025-06-19 22:06 ` kernel test robot
2025-06-20  2:04   ` Zhijian Li (Fujitsu)
2025-06-20  2:22     ` Philip Li
2025-06-20  6:28 ` Huang, Ying [this message]
2025-06-23  8:54   ` Zhijian Li (Fujitsu)
2025-06-24  2:46     ` Huang, Ying
