Re: [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting


From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
	 Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>,  Mel Gorman <mgorman@suse.de>,
	 Valentin Schneider <vschneid@redhat.com>,
	 kernel test robot <lkp@intel.com>
Subject: Re: [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting
Date: Tue, 24 Jun 2025 10:46:44 +0800
Message-ID: <87ms9xonzf.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <47f42c60-9752-4bc6-9079-627b6e0b9cfc@fujitsu.com> (Zhijian Li's message of "Mon, 23 Jun 2025 08:54:28 +0000")

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 20/06/2025 14:28, Huang, Ying wrote:
>> Li Zhijian <lizhijian@fujitsu.com> writes:
>> 
>>> Goto-san reported confusing pgpromote statistics where
>>> the pgpromote_success count significantly exceeded pgpromote_candidate.
>>> The issue manifests under specific memory pressure conditions:
>>> when top-tier memory (DRAM) is exhausted by memhog and allocation begins
>>> in lower-tier memory (CXL). After terminating memhog, the stats show:
>> 
>> The above description is confusing.  The page promotion occurs when the
>> size of the top-tier free space is large enough (after killing the
>> memhog above).  The accessed lower-tier memory will be promoted upon
>> access to take full advantage of the more expensive top-tier memory.
>
> Yeah, that's what the promotion does.
>
> Let me clarify the reproducer steps (thanks to Goto-san for the reproducer):
> On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
>
> # Enable demotion only
> echo 1 > /sys/kernel/mm/numa/demotion_enabled
> numactl -m 0-1 memhog -r200 3500M >/dev/null &
> pid=$!
> sleep 2
> numactl memhog -r100 2500M >/dev/null &
> sleep 10
> kill -9 $pid
> # Enable promotion
> echo 2 > /proc/sys/kernel/numa_balancing
>
> # After a few seconds, we observe `pgpromote_candidate < pgpromote_success`
>
> In this scenario, after terminating the first memhog, the conditions
> for pgdat_free_space_enough() are quickly met, triggering promotion.
> However, these migrated pages are only accounted for in PGPROMOTE_SUCCESS, not in PGPROMOTE_CANDIDATE.

Yes.  This is the expected behavior of the current implementation.
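To check for the symptom described in the report, a minimal C sketch can read the two counters from /proc/vmstat (one "name value" pair per line) and flag when pgpromote_success exceeds pgpromote_candidate. The struct and function names below are hypothetical helpers, not an existing tool; on kernels that do not expose these counters, read_promote_stats() simply returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two counters discussed in this thread. */
struct promote_stats {
	unsigned long success;   /* pgpromote_success */
	unsigned long candidate; /* pgpromote_candidate */
	bool have_success;
	bool have_candidate;
};

/* Parse one "name value" line from /proc/vmstat; other counters are ignored. */
void parse_vmstat_line(const char *line, struct promote_stats *st)
{
	char name[64];
	unsigned long val;

	assert(line != NULL && st != NULL);
	if (sscanf(line, "%63s %lu", name, &val) != 2)
		return;
	if (strcmp(name, "pgpromote_success") == 0) {
		st->success = val;
		st->have_success = true;
	} else if (strcmp(name, "pgpromote_candidate") == 0) {
		st->candidate = val;
		st->have_candidate = true;
	}
}

/* Read both counters from path (normally "/proc/vmstat").
 * Returns 0 if both were found, -1 otherwise. */
int read_promote_stats(const char *path, struct promote_stats *st)
{
	char line[256];
	FILE *fp;

	memset(st, 0, sizeof(*st));
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp))
		parse_vmstat_line(line, st);
	fclose(fp);
	return (st->have_success && st->have_candidate) ? 0 : -1;
}

/* The reported symptom: more successful promotions than candidates. */
bool promote_stats_anomalous(const struct promote_stats *st)
{
	return st->have_success && st->have_candidate &&
	       st->success > st->candidate;
}
```

After running the reproducer, calling read_promote_stats("/proc/vmstat", &st) followed by promote_stats_anomalous(&st) would report the mismatch; with the numbers quoted in the patch (2579 vs. 1) it returns true, which is consistent with free-space promotions bypassing candidate accounting.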

>
>> 
>>> $ grep -e pgpromote /proc/vmstat
>>> pgpromote_success 2579
>>> pgpromote_candidate 1
>>>
>>> This update increments PGPROMOTE_CANDIDATE within the free space branch
>>> when a promotion decision is made, which may alter the mechanism of the
>>> rate limit. Consequently, it becomes easier to reach the rate limit than
>>> it was previously.
>>>
>>> For example:
>>> Rate Limit = 100 pages/sec
>>> Scenario:
>>>    T0: 90 free-space migrations
>>>    T0+100ms: 20-page migration request
>>>
>>> Before:
>>>    Rate limit is *not* reached: 0 + 20 = 20 < 100
>>>    PGPROMOTE_CANDIDATE: 20
>>> After:
>>>    Rate limit is reached: 90 + 20 = 110 > 100
>>>    PGPROMOTE_CANDIDATE: 110
>> 
>> Yes.  The rate limit will be influenced by the change.  So, more tests
>> may be needed to verify that it does not cause regressions.
>
>
> Testing this might be challenging due to workload dependencies. Do you
> have any recommended workloads for evaluation?

In-memory databases, for example Redis, should be good workloads.

> Alternatively, could we rely on the LKP project for impact assessment? (The current patch has
> not really been tested by LKP due to a build error; I will post a V2 soon.)

LKP has some basic workloads to test this, for example, pmbench with the
Gauss-ih access pattern.

> However, regarding the rate limit change itself, I consider this patch
> logically correct. As stated in the numa_promotion_rate_limit()
> comment:
>> "For memory tiering mode, too high promotion/demotion throughput may hurt application latency."
> It seems there is no justification for excluding
> pgdat_free_space_enough() triggered promotions from the rate limiting
> mechanism.

In fact, we don't rate limit promotion when there is enough free space in
fast memory, so that the fast memory can be filled quickly.  I think it's
necessary to end any under-utilization of the fast memory ASAP.

>
>
>> 
>>>
>>> Reported-by: Yasunori Gotou (Fujitsu) <y-goto@fujitsu.com>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

[snip]

---
Best Regards,
Huang, Ying

Thread overview: 7+ messages
2025-06-19  7:52 [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting Li Zhijian
2025-06-19 22:06 ` kernel test robot
2025-06-20  2:04   ` Zhijian Li (Fujitsu)
2025-06-20  2:22     ` Philip Li
2025-06-20  6:28 ` Huang, Ying
2025-06-23  8:54   ` Zhijian Li (Fujitsu)
2025-06-24  2:46     ` Huang, Ying [this message]
