From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Li Zhijian <lizhijian@fujitsu.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, y-goto@fujitsu.com,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>
Subject: Re: [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting
Date: Fri, 20 Jun 2025 14:28:51 +0800
Message-ID: <87ldpn2afw.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250619075245.3272384-1-lizhijian@fujitsu.com> (Li Zhijian's message of "Thu, 19 Jun 2025 15:52:45 +0800")
Li Zhijian <lizhijian@fujitsu.com> writes:
> Goto-san reported confusing pgpromote statistics where
> the pgpromote_success count significantly exceeded pgpromote_candidate.
> The issue manifests under specific memory pressure conditions:
> when top-tier memory (DRAM) is exhausted by memhog and allocation begins
> in lower-tier memory (CXL). After terminating memhog, the stats show:
The above description is confusing. Page promotion occurs once the
top tier has enough free space (here, after the memhog above is
killed). Lower-tier pages are then promoted when they are accessed, to
take full advantage of the more expensive top-tier memory.
> $ grep -e pgpromote /proc/vmstat
> pgpromote_success 2579
> pgpromote_candidate 1
>
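To check for the reported symptom on another system, the two counters
can be compared programmatically. The following is only a minimal
sketch: vmstat_lookup() and promote_stats_inconsistent() are
hypothetical helper names, and the input is assumed to be text in the
"name value" per-line layout shown in the grep output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "name value" in vmstat-style text (one counter per line).
 * Returns true and stores the value in *out when the counter is found.
 */
static bool vmstat_lookup(const char *text, const char *name,
			  unsigned long *out)
{
	size_t len;
	const char *line = text;

	assert(text && name && out);
	len = strlen(name);

	while (line && *line) {
		/* Require a space after the name so prefixes do not match. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%lu", out) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}

/* True when more promotions succeeded than were counted as candidates. */
static bool promote_stats_inconsistent(const char *text)
{
	unsigned long success, candidate;

	if (!vmstat_lookup(text, "pgpromote_success", &success) ||
	    !vmstat_lookup(text, "pgpromote_candidate", &candidate))
		return false;
	return success > candidate;
}
```

Fed the two lines quoted above, promote_stats_inconsistent() returns
true; in practice the text would be read from /proc/vmstat first.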
> This update increments PGPROMOTE_CANDIDATE within the free space branch
> when a promotion decision is made, which may alter the mechanism of the
> rate limit. Consequently, it becomes easier to reach the rate limit than
> it was previously.
>
> For example:
> Rate Limit = 100 pages/sec
> Scenario:
> T0: 90 free-space migrations
> T0+100ms: 20-page migration request
>
> Before:
> Rate limit is *not* reached: 0 + 20 = 20 < 100
> PGPROMOTE_CANDIDATE: 20
> After:
> Rate limit is reached: 90 + 20 = 110 > 100
> PGPROMOTE_CANDIDATE: 110
Yes. The change will influence the rate limit. So, more tests may be
needed to verify that it does not introduce regressions.
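The arithmetic in the quoted example can be reproduced with a small
user-space model. This is only a sketch under stated assumptions, not
the kernel implementation: struct node_model and the helpers below are
hypothetical names, the rate-limit window is assumed to be one second,
and the limit is assumed to be checked against the growth of the
candidate counter since the start of the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one node's promotion accounting. */
struct node_model {
	unsigned long candidate;	/* models PGPROMOTE_CANDIDATE */
	unsigned long window_base;	/* counter value at window start */
	unsigned int window_start_ms;	/* start of the current window */
};

/*
 * Free-space path: the page is promoted unconditionally. It is counted
 * as a candidate only when modelling the patched behaviour.
 */
static void free_space_promote(struct node_model *n, long nr, bool patched)
{
	assert(nr > 0);
	if (patched)
		n->candidate += nr;
}

/*
 * Rate-limited path: count the request, open a new window if more than
 * one second has passed, then compare the growth in this window against
 * the limit. Returns true when the limit is reached.
 */
static bool rate_limited(struct node_model *n, unsigned long limit,
			 unsigned int now_ms, long nr)
{
	assert(nr > 0);
	n->candidate += nr;
	if (now_ms - n->window_start_ms > 1000) {
		n->window_start_ms = now_ms;
		n->window_base = n->candidate;
	}
	return n->candidate - n->window_base >= limit;
}

/* Replay the quoted scenario; reports the final candidate count. */
static bool run_scenario(bool patched, unsigned long *candidate_out)
{
	struct node_model n = { 0, 0, 0 };
	bool limited;

	free_space_promote(&n, 90, patched);		/* T0 */
	limited = rate_limited(&n, 100, 100, 20);	/* T0+100ms */
	*candidate_out = n.candidate;
	return limited;
}
```

With patched == false the model is not rate limited and ends with a
candidate count of 20; with patched == true it is rate limited and ends
with 110, matching the "Before" and "After" cases above.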
>
> Reported-by: Yasunori Gotou (Fujitsu) <y-goto@fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>
> This is marked as RFC because I am uncertain whether we originally
> intended for this or if it was overlooked.
>
> However, the current situation where pgpromote_candidate < pgpromote_success
> is indeed confusing when interpreted literally.
>
> Cc: Huang Ying <ying.huang@linux.alibaba.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> ---
> kernel/sched/fair.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb..4715cd4fa248 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1940,11 +1940,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  		struct pglist_data *pgdat;
>  		unsigned long rate_limit;
>  		unsigned int latency, th, def_th;
> +		long nr = folio_nr_pages(folio);
>
>  		pgdat = NODE_DATA(dst_nid);
>  		if (pgdat_free_space_enough(pgdat)) {
>  			/* workload changed, reset hot threshold */
>  			pgdat->nbp_threshold = 0;
> +			mod_node_page_state(pgdat, PGPROMOTE_CANDIDATE, nr);
>  			return true;
>  		}
>
> @@ -1958,8 +1960,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  		if (latency >= th)
>  			return false;
>
> -		return !numa_promotion_rate_limit(pgdat, rate_limit,
> -						  folio_nr_pages(folio));
> +		return !numa_promotion_rate_limit(pgdat, rate_limit, nr);
>  	}
>
>  	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
---
Best Regards,
Huang, Ying
Thread overview: 7+ messages
2025-06-19 7:52 [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting Li Zhijian
2025-06-19 22:06 ` kernel test robot
2025-06-20 2:04 ` Zhijian Li (Fujitsu)
2025-06-20 2:22 ` Philip Li
2025-06-20 6:28 ` Huang, Ying [this message]
2025-06-23 8:54 ` Zhijian Li (Fujitsu)
2025-06-24 2:46 ` Huang, Ying