From: Raghavendra K T <raghavendra.kt@amd.com>
To: SeongJae Park <sj@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
gourry@gourry.net, nehagholkar@meta.com, abhishekd@meta.com,
david@redhat.com, ying.huang@intel.com, nphamcs@gmail.com,
akpm@linux-foundation.org, hannes@cmpxchg.org,
feng.tang@intel.com, kbusch@meta.com, bharata@amd.com,
Hasan.Maruf@amd.com, willy@infradead.org,
kirill.shutemov@linux.intel.com, mgorman@techsingularity.net,
vbabka@suse.cz, hughd@google.com, rientjes@google.com,
shy828301@gmail.com, Liam.Howlett@Oracle.com,
peterz@infradead.org, mingo@redhat.com
Subject: Re: [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit
Date: Fri, 20 Dec 2024 12:00:09 +0530
Message-ID: <84c8de4c-cf2b-4b5d-b1e2-952d52f42fd4@amd.com>
In-Reply-To: <20241210185357.81214-1-sj@kernel.org>
On 12/11/2024 12:23 AM, SeongJae Park wrote:
> Hello Raghavendra,
>
>
> Thank you for posting this nice patch series. I gave you some feedback
> offline. Adding those here again for transparency on this grateful public
> discussion.
>
> On Sun, 1 Dec 2024 15:38:08 +0000 Raghavendra K T <raghavendra.kt@amd.com> wrote:
>
>> Introduction:
>> =============
>> This patchset is an outcome of an ongoing collaboration between AMD and Meta.
>> Meta wanted to explore an alternative page promotion technique as they
>> observe high latency spikes in their workloads that access CXL memory.
>>
>> In the current hot page promotion, all the activities, including
>> process address space scanning, NUMA hint fault handling and page
>> migration, are performed in process context, i.e., the scanning overhead
>> is borne by applications.
>
> Yet another approach is using DAMON. DAMON does access monitoring, and further
> allows users to request access pattern-driven system operations in the name of
> DAMOS (Data Access Monitoring-based Operation Schemes). Using it, users can
> request DAMON to find hot pages and promote them, while finding cold pages and
> demoting them. SK hynix has built their CXL-based memory capacity expansion
> solution this way (https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion).
> We collaboratively developed new DAMON features for that, and those are all
> in the mainline since Linux v6.11.
>
> I also proposed an idea for advancing it using DAMOS auto-tuning on a more
> general (>2 tiers) setup
> (https://lore.kernel.org/20231112195602.61525-1-sj@kernel.org). I haven't had
> time to further implement and test the idea so far, though.
>
>>
>> This is an early RFC patch series to do (slow tier) CXL page promotion.
>> The approach in this patchset assists/addresses the issue by adding PTE
>> Accessed bit scanning.
>>
>> Scanning is done by a global kernel thread which routinely scans all
>> the processes' address spaces and checks for accesses by reading the
>> PTE A bit. It then migrates/promotes the pages to the toptier node
>> (node 0 in the current approach).
>>
>> Thus, the approach pushes overhead of scanning, NUMA hint faults and
>> migrations off from process context.
>
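As a rough illustration of the scanning loop described above, here is a user-space toy model. It is not code from the patch series: the `Page` and `AddressSpace` classes, the `scan_once` and `promote` helpers, and the slow-tier node id are all hypothetical. Only the idea is taken from the cover letter: test and clear the A bit from a scanner, then promote accessed slow-tier pages to node 0.

```python
from dataclasses import dataclass, field

FAST_NODE = 0  # top-tier node; the RFC currently promotes to node 0
SLOW_NODE = 1  # hypothetical id for the CXL (slow tier) node


@dataclass
class Page:
    pfn: int
    node: int
    accessed: bool = False  # stands in for the PTE Accessed bit


@dataclass
class AddressSpace:
    pages: list = field(default_factory=list)


def scan_once(address_spaces):
    """One scanner pass: test-and-clear the A bit, collect slow-tier hits."""
    migration_list = []
    for mm in address_spaces:
        for page in mm.pages:
            was_accessed = page.accessed
            page.accessed = False  # clear so the next pass sees fresh accesses
            if was_accessed and page.node != FAST_NODE:
                migration_list.append(page)
    return migration_list


def promote(migration_list):
    """Move every collected page to the top-tier node; return the count."""
    for page in migration_list:
        page.node = FAST_NODE
    return len(migration_list)
```

In this model the work happens in whatever calls `scan_once()` and `promote()`, which corresponds to the kernel thread rather than the application.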
> DAMON also uses PTE A bit as major source of the access information. And DAMON
> does both access monitoring and promotion/demotion in a global kernel thread,
> namely kdamond. Hence the DAMON-based approach would also offload the
> overheads from process context. So I feel your approach has a sort of
> similarity with DAMON-based one in a way, and we might have a chance to avoid
> unnecessary duplicates.
>
> [...]
>>
>> Limitations:
>> ===========
>> PTE A bit scanning approach lacks information about exact destination
>> node to migrate to.
>
> This is the same for the DAMON-based approach, since DAMON also uses the PTE A
> bit as its major source of information. We aim to extend DAMON to be aware of
> the access source CPU, and to use that to solve this problem, though. Utilizing
> page faults or AMD IBS-like h/w features is among the ideas on the table.
>
>>
>> Notes/Observations on design/Implementations/Alternatives/TODOs...
>> ================================
>> 1. Fine-tuning scan throttling
>
> DAMON allows users to set an upper limit on the monitoring overhead, using the
> max_nr_regions parameter. Within that limit it provides best-effort accuracy.
> We also have ongoing projects for making it more accurate and easier to tune.
>
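To make the overhead/accuracy trade-off concrete, here is a toy sketch of how capping the number of monitored regions bounds the per-interval work. It assumes one access check per region and merges the most similar neighbours until the cap is met. This is not DAMON's actual region adjustment algorithm: `merge_to_limit` and the `(start, end, nr_accesses)` tuples are hypothetical.

```python
def merge_to_limit(regions, max_nr_regions):
    """Merge adjacent regions until at most max_nr_regions remain.

    regions: list of (start, end, nr_accesses) tuples, sorted by address
    and contiguous. The adjacent pair with the smallest access-count gap
    is merged first, so the least information is lost per merge.
    """
    regions = list(regions)
    limit = max(1, max_nr_regions)
    while len(regions) > limit:
        # Index of the adjacent pair whose access counts are most alike.
        i = min(
            range(len(regions) - 1),
            key=lambda k: abs(regions[k][2] - regions[k + 1][2]),
        )
        (s1, e1, a1), (s2, e2, a2) = regions[i], regions[i + 1]
        size1, size2 = e1 - s1, e2 - s2
        # Size-weighted average keeps the merged count representative.
        merged = (a1 * size1 + a2 * size2) / (size1 + size2)
        regions[i:i + 2] = [(s1, e2, merged)]
    return regions
```

With fewer regions, fewer checks are needed per sampling interval, at the cost of coarser access information inside each merged region.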
>>
>> 2. Use migrate_balanced_pgdat() to balance toptier node before migration
>> OR Use migrate_misplaced_folio_prepare() directly.
>> But it may need some optimizations (e.g., invoke occasionally so
>> that overhead is not there for every migration).
>>
>> 3. Explore if a separate PAGE_EXT flag is needed instead of reusing
>> PAGE_IDLE flag (cons: complicates PTE A bit handling in the system),
>> but in practice it does not look like a good idea.
>>
>> 4. Use timestamp information-based migration (similar to numab mode=2)
>> instead of migrating immediately when the PTE A bit is set.
>> (cons:
>> - It will not be accurate since it is done outside of process
>> context.
>> - Performance benefit may be lost.)
>
> DAMON provides a sort of time-based aggregated monitoring results. And DAMOS
> provides prioritization of pages based on the access temperature. Hence, the
> DAMON-based approach can also be used for a similar purpose (promoting not
> every accessed page but pages that are more frequently used for a longer time).
>
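The difference between promoting on the first observed access and promoting by aggregated temperature can be sketched as below. This is a simplified model, not DAMON's or DAMOS's real scoring: `aggregate`, `pick_promotion_candidates`, `min_hits` and `quota` are hypothetical names.

```python
from collections import defaultdict


def aggregate(samples_per_interval):
    """Count, per page, in how many sampling intervals an access was seen."""
    counts = defaultdict(int)
    for accessed_pfns in samples_per_interval:
        for pfn in set(accessed_pfns):  # at most one hit per interval
            counts[pfn] += 1
    return dict(counts)


def pick_promotion_candidates(counts, min_hits, quota):
    """Return up to `quota` pfns seen in >= `min_hits` intervals, hottest first."""
    hot = [(hits, pfn) for pfn, hits in counts.items() if hits >= min_hits]
    hot.sort(key=lambda item: (-item[0], item[1]))  # ties broken by pfn
    return [pfn for _, pfn in hot[:quota]]
```

For example, with samples `[[1, 2], [1], [1, 3], [3]]`, page 2 was touched only once and is skipped when `min_hits=2`, while pages 1 and 3 qualify, with page 1 ranked first.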
>>
>> 5. Explore if we need to use PFN information + hash list instead of
>> simple migration list. Here scanning is directly done with PFN belonging
>> to CXL node.
>
> DAMON supports physical address space monitoring, and maintains the access
> monitoring results in its own data structure called damon_region. So I think
> similar benefit can be achieved using DAMON?
>
> [...]
>> 8. Using DAMON APIs OR Reusing part of DAMON which already tracks range of
>> physical addresses accessed.
>
> My biased humble opinion is that it would be very nice to explore this
> opportunity, since I see some similarities and opportunities to solve some of
> the challenges of your approach in an easier way. Even if it turns out that
> DAMON cannot be used for your use case, failing earlier is a good thing, I'd
> say :)
>
>>
>> 9. Gregory has nicely mentioned some details/ideas on different approaches in
>> [1] : development notes, in the context of promoting unmapped page cache folios.
>
> DAMON supports monitoring accesses to unmapped page cache folios, so hopefully
> DAMON-based approaches can also solve this issue.
>
Hello SJ,
Thank you for the detailed explanation again. (Sorry for the late
acknowledgement; I was looking forward to the MM alignment discussion when
this message came.)
I think once the direction is fixed, we could surely reuse a lot of
source code from DAMON and MGLRU. The amazing design of DAMON should surely
help. I will keep in mind all the points raised here.
Thanks and Regards
- Raghu
Thread overview: 18+ messages
2024-12-01 15:38 [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 01/10] mm: Add kmmscand kernel daemon Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 02/10] mm: Maintain mm_struct list in the system Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 03/10] mm: Scan the mm and create a migration list Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 04/10] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 05/10] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 06/10] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 07/10] sysfs: Add sysfs support to tune scanning Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 08/10] vmstat: Add vmstat counters Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 09/10] trace/kmmscand: Add tracing of scanning and migration Raghavendra K T
2024-12-05 17:46 ` Steven Rostedt
2024-12-06 6:33 ` Raghavendra K T
2024-12-06 14:49 ` Steven Rostedt
2024-12-01 15:38 ` [RFC PATCH V0 DO NOT MERGE 10/10] kmmscand: Add scanning Raghavendra K T
2024-12-10 18:53 ` [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit SeongJae Park
2024-12-20 6:30 ` Raghavendra K T [this message]
2025-02-12 17:02 ` Davidlohr Bueso
2025-02-13 5:39 ` Raghavendra K T