From: Bharata B Rao <bharata@amd.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: <Jonathan.Cameron@huawei.com>, <dave.hansen@intel.com>,
<gourry@gourry.net>, <mgorman@techsingularity.net>,
<mingo@redhat.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>,
<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
<akpm@linux-foundation.org>, <david@redhat.com>,
<byungchul@sk.com>, <kinseyho@google.com>,
<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
<shivankg@amd.com>
Subject: Re: [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure
Date: Mon, 23 Mar 2026 15:31:22 +0530
Message-ID: <2f37d86b-82d5-4fbe-8071-2f346ae3e5d9@amd.com>
In-Reply-To: <20260323095104.238982-1-bharata@amd.com>
NAS Parallel Benchmarks - BT results
Test system details
-------------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)
$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2: 255 255  10
Hotness sources
---------------
NUMAB0 - NUMA Balancing disabled in the base case and no hotness source
enabled in the pghot case. No migrations occur.
NUMAB2 - Existing hot page promotion in the base case and
hint faults used as the hotness source in the pghot case.
Both promotion and demotion are enabled in this case.
NUMAB3 - Both the regular and tiering modes of NUMA Balancing are enabled
(kernel.numa_balancing=3)
Pghot by default promotes a page after two accesses, but for the NUMAB2 source
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)
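
For anyone reproducing the runs, the settings above could be applied roughly as
in the sketch below. It assumes that NUMAB0 and NUMAB2 correspond to
kernel.numa_balancing=0 and kernel.numa_balancing=2 (only the value 3 is stated
explicitly above). pghot_setup_cmds is a hypothetical helper name, and the
debugfs file exists only on kernels carrying this series. The helper only
prints the commands so that they can be reviewed before being piped to a root
shell.

```shell
# Sketch only: print the setup commands for one configuration.
# Pipe the output into "sh" as root to actually apply them.
pghot_setup_cmds() {
    mode=$1            # kernel.numa_balancing value (assumed: 0, 2 or 3 here)
    threshold=${2:-}   # optional pghot freq_threshold override
    echo "sysctl -w kernel.numa_balancing=$mode"
    if [ -n "$threshold" ]; then
        # debugfs path as given in the text above
        echo "echo $threshold > /sys/kernel/debug/pghot/freq_threshold"
    fi
}

# pghot case with the NUMAB2 source: promote after one access
pghot_setup_cmds 2 1
```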
NAS-BT details
--------------
Command: mpirun -np 16 /usr/bin/numactl --cpunodebind=0,1
NPB3.4.4/NPB3.4-MPI/bin/bt.F.x
While class D uses around 24GB of memory (which is too little to show the
benefit of promotion), class E results in around 368GB of memory, which
overflows my top tier. Hence I wanted something in between these classes. So I
have modified class F to a problem size of 768, which results in around 160GB
of memory.
After the memory consumption stabilizes, all the rank PIDs are paused and
their memory is moved to the CXL node using the migratepages command. This
simulates the situation of memory residing on a lower tier node, with accesses
by the BT processes leading to promotion.
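
The pause-and-migrate step could look like the sketch below. It is not the
exact script used for these runs: migrate_ranks_to_node is a hypothetical
helper name, resuming the ranks with SIGCONT afterwards is an assumption, and
the node numbers in the usage comment come from the numactl output above.
migratepages is the tool from the numactl package and takes a PID, the source
nodes and the destination nodes.

```shell
# Sketch only: stop all ranks, move their pages, then let them continue.
# Usage: migrate_ranks_to_node FROM_NODES TO_NODE PID...
migrate_ranks_to_node() {
    from_nodes=$1
    to_node=$2
    shift 2
    # Pause every rank first so that none of them touches memory mid-move
    for pid in "$@"; do kill -STOP "$pid"; done
    # Move each rank's pages from the top-tier nodes to the target node
    for pid in "$@"; do migratepages "$pid" "$from_nodes" "$to_node"; done
    # Resume the ranks; their accesses now hit the lower tier
    for pid in "$@"; do kill -CONT "$pid"; done
}

# Example (not executed here): move all bt.F.x ranks from nodes 0,1 to node 2
#   migrate_ranks_to_node 0,1 2 $(pgrep -f bt.F.x)
```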
Time in seconds - Lower is better
Mop/s total - Higher is better
=====================================================================================
                         Base          Base          pghot-default  pghot-precise
                         NUMAB0        NUMAB2        NUMAB2         NUMAB2
=====================================================================================
Time in seconds          7321.79       4333.85       6498.78        4386.27
Mop/s total              53451.77      90303.780     60221.01       89224.51
pgpromote_success        0             41971151      423163051      41957809
pgpromote_candidate      0             0             1870949786     0
pgpromote_candidate_nrl  0             41971151      29360089       41957809
pgdemote_kswapd          0             0             391179763      0
numa_pte_updates         0             42041312      1919944389     2568923206
numa_hint_faults         0             41972330      1911683592     2562729196
=====================================================================================
                         pghot-default
                         NUMAB3
=====================================================================================
Time in seconds          4425.84
Mop/s total              88426.77
pgpromote_success        41957442
pgpromote_candidate      0
pgpromote_candidate_nrl  41957442
pgdemote_kswapd          0
numa_pte_updates         2588634775
numa_hint_faults         2581645889
=====================================================================================
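
The counters in these tables are /proc/vmstat fields. This message does not say
how they were collected, so the following is only one possible way to do it:
snapshot the counters before and after a run and print the difference.
vmstat_snapshot and vmstat_delta are hypothetical helper names.

```shell
# Sketch only: print the counters used in the tables from a vmstat-format
# file (default: /proc/vmstat).
vmstat_snapshot() {
    fields='pgpromote_success|pgpromote_candidate|pgpromote_candidate_nrl'
    fields="$fields|pgdemote_kswapd|numa_pte_updates|numa_hint_faults"
    grep -E "^($fields) " "${1:-/proc/vmstat}"
}

# vmstat_delta BEFORE AFTER prints "name delta" for each counter in AFTER.
# %.0f is used because some awk versions overflow %d above 2^31 - 1.
vmstat_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         { printf "%s %.0f\n", $1, $2 - before[$1] }' "$1" "$2"
}

# Example (not executed here):
#   vmstat_snapshot > before.txt
#   ... run the benchmark ...
#   vmstat_snapshot > after.txt
#   vmstat_delta before.txt after.txt
```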
- In the base case, the benchmark numbers improve significantly due to hot page
promotion (7321.79s with NUMAB0 vs 4333.85s with NUMAB2).
- Though the benchmark runs for more than an hour, the pages get promoted
within the first few minutes.
- pghot-precise is able to match the base case numbers.
- The benchmark suffers in pghot-default case due to promotion being limited
to the default NID (0) only. This leads to excessive PTE updates, hint faults,
demotion and promotion churn.
- With NUMAB3, the pghot-default case recovers the performance, because in this
mode hot pages that were promoted to the wrong node get migrated to the right
node by regular NUMA balancing (mode=1), which is also active.
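
To put numbers on the observations above, the speedup over Base NUMAB0 and the
extra promotion volume in the pghot-default NUMAB2 case can be derived from the
table values. This is a small sketch: ratio is a hypothetical helper name, and
every input below is copied from the tables in this message.

```shell
# ratio A B prints A / B rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Speedup over Base NUMAB0, from the "Time in seconds" rows
echo "Base NUMAB2:          $(ratio 7321.79 4333.85)x"
echo "pghot-default NUMAB2: $(ratio 7321.79 6498.78)x"
echo "pghot-precise NUMAB2: $(ratio 7321.79 4386.27)x"
echo "pghot-default NUMAB3: $(ratio 7321.79 4425.84)x"

# Promotion churn: pgpromote_success of pghot-default NUMAB2 vs Base NUMAB2
echo "pgpromote_success, pghot-default vs base: $(ratio 423163051 41971151)x"
```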