Re: [RFC PATCH V1 00/13] mm: slowtier page promotion based on PTE A bit

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: <AneeshKumar.KizhakeVeetil@arm.com>, <Hasan.Maruf@amd.com>,
	<Michael.Day@amd.com>, <akpm@linux-foundation.org>,
	<bharata@amd.com>, <dave.hansen@intel.com>, <david@redhat.com>,
	<dongjoo.linux.dev@gmail.com>, <feng.tang@intel.com>,
	<gourry@gourry.net>, <hannes@cmpxchg.org>, <honggyu.kim@sk.com>,
	<hughd@google.com>, <jhubbard@nvidia.com>, <jon.grimm@amd.com>,
	<k.shutemov@gmail.com>, <kbusch@meta.com>,
	<kmanaouil.dev@gmail.com>, <leesuyeon0506@gmail.com>,
	<leillc@google.com>, <liam.howlett@oracle.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<mgorman@techsingularity.net>, <mingo@redhat.com>,
	<nadav.amit@gmail.com>, <nphamcs@gmail.com>,
	<peterz@infradead.org>, <riel@surriel.com>, <rientjes@google.com>,
	<rppt@kernel.org>, <santosh.shukla@amd.com>, <shivankg@amd.com>,
	<shy828301@gmail.com>, <sj@kernel.org>, <vbabka@suse.cz>,
	<weixugc@google.com>, <willy@infradead.org>,
	<ying.huang@linux.alibaba.com>, <ziy@nvidia.com>,
	<dave@stgolabs.net>
Subject: Re: [RFC PATCH V1 00/13] mm: slowtier page promotion based on PTE A bit
Date: Fri, 21 Mar 2025 15:52:44 +0000
Message-ID: <20250321155244.00006338@huawei.com>
In-Reply-To: <20250319193028.29514-1-raghavendra.kt@amd.com>

On Wed, 19 Mar 2025 19:30:15 +0000
Raghavendra K T <raghavendra.kt@amd.com> wrote:

> Introduction:
> =============
> In the current hot page promotion, all the activities, including
> process address space scanning, NUMA hint fault handling and page
> migration, are performed in process context, i.e., the scanning
> overhead is borne by applications.
> 
> This is the RFC V1 patch series for (slow tier) CXL page promotion.
> The approach in this patchset addresses the issue by adding PTE
> Accessed bit scanning.
> 
> Scanning is done by a global kernel thread which routinely scans all
> the processes' address spaces and checks for accesses by reading the
> PTE A bit. 
> 
> A separate migration thread migrates/promotes the pages to the toptier
> node based on a simple heuristic that uses toptier scan/access information
> of the mm.
> 
> Additionally based on the feedback for RFC V0 [4], a prctl knob with
> a scalar value is provided to control per task scanning.
> 
> Initial results show promising numbers on a microbenchmark. Numbers
> from real benchmarks, along with findings (tunings), will follow soon.
> 
> Experiment:
> ============
> Abench microbenchmark,
> - Allocates 8GB/16GB/32GB/64GB of memory on CXL node
> - 64 threads created, and each thread randomly accesses pages in 4K
>   granularity.

So if I'm reading this right, this is a flat distribution and any
estimate of what is hot is noise?

That will put a positive spin on costs of migration as we will
be moving something that isn't really all that hot and so is moderately
unlikely to be accessed whilst migration is going on.  Or is the point that
the rest of the memory is also mapped but not being accessed?
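
A minimal sketch of the concern above, not taken from the series: if
accesses are uniform over all pages, a page seen accessed in one scan
window should be no more likely than any other page to be accessed in
the next window. The page count, access count, seed and function names
are hypothetical, and a Python set stands in for the PTE A bits a
scanner would read.

```python
import random


def scan_window(rng, num_pages, num_accesses):
    """Pages touched during one scan window (stand-in for set A bits)."""
    return {rng.randrange(num_pages) for _ in range(num_accesses)}


def repeat_rate(num_pages=20000, num_accesses=10000, seed=0):
    """Compare P(accessed in window 2) with the same probability
    restricted to pages that were already seen accessed in window 1."""
    rng = random.Random(seed)
    first = scan_window(rng, num_pages, num_accesses)
    second = scan_window(rng, num_pages, num_accesses)
    baseline = len(second) / num_pages
    conditional = len(first & second) / len(first)
    return baseline, conditional


baseline, conditional = repeat_rate()
print(f"P(accessed in window 2)                        = {baseline:.3f}")
print(f"P(accessed in window 2 | accessed in window 1) = {conditional:.3f}")
```

If the two printed probabilities come out close, the scan result says
little about which pages will be touched next, which is the "noise"
case raised here.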

I'm not entirely sure I follow what this is bound by. Is it bandwidth
bound?


> - 512 iterations with a delay of 1 us between two successive iterations.
> 
> SUT: 512 CPU, 2 node 256GB, AMD EPYC.
> 
> 3 runs, command:  abench -m 2 -d 1 -i 512 -s <size>
> 
> Calculates how much time is taken to complete the task; lower is better.
> The expectation is that CXL node memory is migrated as fast as
> possible.
>
> Base case:    6.14-rc6 w/ numab mode = 2 (hot page promotion is enabled).
> Patched case: 6.14-rc6 w/ numab mode = 1 (numa balancing is enabled);
> we expect the daemon to do page promotion.
> 
> Result:
> ========
>          base NUMAB2                    patched NUMAB1
>          time in sec  (%stdev)   time in sec  (%stdev)     %gain
>  8GB     134.33       ( 0.19 )        120.52  ( 0.21 )     10.28
> 16GB     292.24       ( 0.60 )        275.97  ( 0.18 )      5.56
> 32GB     585.06       ( 0.24 )        546.49  ( 0.35 )      6.59
> 64GB    1278.98       ( 0.27 )       1205.20  ( 2.29 )      5.76
> 
> Base case: 6.14-rc6    w/ numab mode = 1 (numa balancing is enabled).
> patched case: 6.14-rc6 w/ numab mode = 1 (numa balancing is enabled).
>          base NUMAB1                    patched NUMAB1
>          time in sec  (%stdev)   time in sec  (%stdev)     %gain
>  8GB     186.71       ( 0.99 )        120.52  ( 0.21 )     35.45 
> 16GB     376.09       ( 0.46 )        275.97  ( 0.18 )     26.62 
> 32GB     744.37       ( 0.71 )        546.49  ( 0.35 )     26.58 
> 64GB    1534.49       ( 0.09 )       1205.20  ( 2.29 )     21.45
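
The %gain column appears to be the relative reduction in run time,
(base - patched) / base * 100. That formula is inferred from the
figures rather than stated in the posting. The sketch below recomputes
it from the quoted times; the helper name percent_gain and the table
layout are hypothetical.

```python
def percent_gain(base_s, patched_s):
    """Relative reduction in run time, in percent (assumed formula)."""
    return (base_s - patched_s) / base_s * 100.0


# (base time, patched time, reported %gain), copied from the quoted tables.
RESULTS = {
    "NUMAB2 base": {
        "8GB": (134.33, 120.52, 10.28),
        "16GB": (292.24, 275.97, 5.56),
        "32GB": (585.06, 546.49, 6.59),
        "64GB": (1278.98, 1205.20, 5.76),
    },
    "NUMAB1 base": {
        "8GB": (186.71, 120.52, 35.45),
        "16GB": (376.09, 275.97, 26.62),
        "32GB": (744.37, 546.49, 26.58),
        "64GB": (1534.49, 1205.20, 21.45),
    },
}

for case, rows in RESULTS.items():
    for size, (base_s, patched_s, reported) in rows.items():
        gain = percent_gain(base_s, patched_s)
        print(f"{case:12s} {size:>5s}  recomputed {gain:6.2f}  reported {reported:6.2f}")
```

Note that the gain is a ratio of total run times only; it does not say
when, or whether, the working set finished migrating to the fast node.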

Nice numbers, but maybe some more details on what they are showing?
At what point in the workload has all the memory migrated to the
fast node or does that never happen?

I'm confused :(

Jonathan
