public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Bharata B Rao <bharata@amd.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: <Jonathan.Cameron@huawei.com>, <dave.hansen@intel.com>,
	<gourry@gourry.net>, <mgorman@techsingularity.net>,
	<mingo@redhat.com>, <peterz@infradead.org>,
	<raghavendra.kt@amd.com>, <riel@surriel.com>,
	<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
	<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
	<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
	<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<byungchul@sk.com>, <kinseyho@google.com>,
	<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
	<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
	<shivankg@amd.com>
Subject: Re: [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure
Date: Mon, 23 Mar 2026 15:28:29 +0530	[thread overview]
Message-ID: <957f2242-56d4-4bf0-8aeb-9d60fbea8c8c@amd.com> (raw)
In-Reply-To: <20260323095104.238982-1-bharata@amd.com>

Redis-memtier results

Test system details
-------------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)

$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2:  255  255  10

Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
         in the patched case. No migrations occur.
NUMAB2 - Existing hot page promotion for the base case and
         use of hint faults as source in the patched case.

Pghot by default promotes after two accesses but for NUMAB2 source,
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)

==============================================================
Scenario 1 - Enough memory in toptier and hence only promotion
==============================================================
In the setup phase, 64GB database is provisioned and explicitly moved
to Node 2 by migrating redis-server's memory to Node 2.
Memtier is run on Node 1.

Parallel distribution, 50% of the keys accessed, each 4 times.
16        Threads
100       Connections per thread
77808     Requests per client

==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9
Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals     226611.42       225.92873       224.25500       423.93500
454.65500    514886.68
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals     257211.48       204.99755       216.06300       370.68700
454.65500    584413.47
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals     255631.78       209.20335       216.06300       378.87900
450.55900    580824.22
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals     249494.46       209.31820       212.99100       380.92700
448.51100    566879.53
==================================================================================================

pgpromote_success
==================================
Base, NUMAB0            0
Base, NUMAB2            10,435,176
pghot-default, NUMAB2   10,435,235
pghot-precise, NUMAB2   10,435,294
==================================

- There is a clear benefit of hot page promotion seen. Both
  base and pghot show similar benefits.
- The number of pages promoted in both cases are more or less
  same.

==============================================================
Scenario 2 - Toptier memory overcommited, promotion + demotion
==============================================================
In the setup phase, 192GB database is provisioned. The database occupies
Node 1 entirely(~128GB) and spills over to Node 2 (~64GB).
Memtier is run on Node 1.

Parallel distribution, 50% of the keys accessed, each 4 times.
16        Threads
100       Connections per thread
233424    Requests per client

==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9
Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals     237743.40       217.72842       201.72700       395.26300
440.31900    540389.78
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals     235935.72       219.36544       210.94300       411.64700
477.18300    536280.93
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals     248283.99       219.74875       211.96700       413.69500
509.95100    564348.49
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals     240529.35       222.11878       215.03900       411.64700
464.89500    546722.22
==================================================================================================
                        pgpromote_success       pgdemote_kswapd
===============================================================
Base, NUMAB0            0                       672,591
Base, NUMAB2            350,632                 689,751
pghot-default, NUMAB2   17,118,987              17,421,474
pghot-precise, NUMAB2   24,030,292              24,342,569
===============================================================

- No clear benefit is seen with hot page promotion both in base and pghot case.
- Most promotion attempts in base case fail because the NUMA hint fault latency
  is found to exceed the threshold value (default threshold of 1000ms) in
  majority of the promotion attempts.
- Unlike base NUMAB2 where the hint fault latency is the difference between the
  PTE update time (during scanning) and the access time (hint fault), pghot uses
  a single latency threshold (3000ms in pghot-default and 5000ms in
  pghot-precise) for two purposes.
        1. If the time difference between successive accesses are within the
           threshold, the page is marked as hot.
        2. Later when kmigrated picks up the page for migration, it will migrate
           only if the difference between the current time and the time when the
          page was marked hot is with the threshold.
  Because of the above difference in behaviour, more number of pages get
  qualified for promotion compared to base NUMAB2.



  parent reply	other threads:[~2026-03-23  9:58 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-23  9:50 [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 1/5] mm: migrate: Allow misplaced migration without VMA Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 2/5] mm: migrate: Add migrate_misplaced_folios_batch() Bharata B Rao
2026-03-26  5:50   ` Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 4/5] mm: pghot: Precision mode for pghot Bharata B Rao
2026-03-26 10:41   ` Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 5/5] mm: sched: move NUMA balancing tiering promotion to pghot Bharata B Rao
2026-03-23  9:56 ` [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2026-03-23  9:58 ` Bharata B Rao [this message]
2026-03-23  9:59 ` Bharata B Rao
2026-03-23 10:01 ` Bharata B Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=957f2242-56d4-4bf0-8aeb-9d60fbea8c8c@amd.com \
    --to=bharata@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alok.rathore@samsung.com \
    --cc=balbirs@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=dave.hansen@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=gourry@gourry.net \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kinseyho@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=nifan.cxl@gmail.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shivankg@amd.com \
    --cc=sj@kernel.org \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=xuezhengchu@huawei.com \
    --cc=yiannis@zptcorp.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuanchu@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox