public inbox for linux-kernel@vger.kernel.org
From: Raghavendra K T <raghavendra.kt@amd.com>
To: SeongJae Park <sj@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	gourry@gourry.net, nehagholkar@meta.com, abhishekd@meta.com,
	david@redhat.com, ying.huang@intel.com, nphamcs@gmail.com,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	feng.tang@intel.com, kbusch@meta.com, bharata@amd.com,
	Hasan.Maruf@amd.com, willy@infradead.org,
	kirill.shutemov@linux.intel.com, mgorman@techsingularity.net,
	vbabka@suse.cz, hughd@google.com, rientjes@google.com,
	shy828301@gmail.com, Liam.Howlett@Oracle.com,
	peterz@infradead.org, mingo@redhat.com
Subject: Re: [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit
Date: Fri, 20 Dec 2024 12:00:09 +0530
Message-ID: <84c8de4c-cf2b-4b5d-b1e2-952d52f42fd4@amd.com>
In-Reply-To: <20241210185357.81214-1-sj@kernel.org>



On 12/11/2024 12:23 AM, SeongJae Park wrote:
> Hello Raghavendra,
> 
> 
> Thank you for posting this nice patch series.  I gave you some feedback
> offline.  I'm adding it here again for transparency in this public
> discussion.
> 
> On Sun, 1 Dec 2024 15:38:08 +0000 Raghavendra K T <raghavendra.kt@amd.com> wrote:
> 
>> Introduction:
>> =============
>> This patchset is an outcome of an ongoing collaboration between AMD and Meta.
>> Meta wanted to explore an alternative page promotion technique as they
>> observe high latency spikes in their workloads that access CXL memory.
>>
>> In the current hot page promotion, all the activities, including
>> process address space scanning, NUMA hint fault handling and page
>> migration, are performed in process context, i.e., the scanning overhead
>> is borne by applications.
> 
> Yet another approach is using DAMON.  DAMON does access monitoring, and further
> allows users to request access pattern-driven system operations under the name
> of DAMOS (Data Access Monitoring-based Operation Schemes).  Using it, users can
> ask DAMON to find and promote hot pages, while finding and demoting cold
> pages.  SK hynix has built their CXL-based memory capacity expansion solution
> this way (https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion).  We
> collaboratively developed new DAMON features for that, and those have all
> been in mainline since Linux v6.11.
>
> I also proposed an idea for advancing it using DAMOS auto-tuning on more
> general (>2 tiers) setups
> (https://lore.kernel.org/20231112195602.61525-1-sj@kernel.org).  I haven't had
> time to further implement and test the idea so far, though.
> 
>>
>> This is an early RFC patch series to do (slow tier) CXL page promotion.
>> The approach in this patchset assists/addresses the issue by adding PTE
>> Accessed bit scanning.
>>
>> Scanning is done by a global kernel thread which routinely scans all
>> the processes' address spaces and checks for accesses by reading the
>> PTE A bit. It then migrates/promotes the pages to the toptier node
>> (node 0 in the current approach).
>>
>> Thus, the approach pushes the overhead of scanning, NUMA hint faults and
>> migrations out of process context.
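To make the scanning mechanism described above concrete, here is a minimal userspace model of a single scan pass. It is an illustrative sketch only, not code from this series: struct sim_pte, PTE_ACCESSED, scan_and_collect() and promote() are hypothetical names, and a real kernel implementation would walk page tables (e.g. with walk_page_range()), flush the TLB after clearing A bits, and migrate folios rather than just flip a node id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, NOT kernel code. */
#define PTE_ACCESSED 0x1u /* models the hardware-set PTE Accessed (A) bit */

struct sim_pte {
	unsigned int flags; /* PTE_ACCESSED is set by "hardware" on access */
	int nid;            /* NUMA node currently backing the page */
};

/*
 * One scan pass: test-and-clear the A bit of every PTE and queue pages
 * that were accessed and do not already live on the top-tier node.
 * Clearing the bit means the next pass only sees fresh accesses.
 * Young pages beyond list_cap are dropped for this pass.
 * Returns the number of page indices written to migrate_list.
 */
static size_t scan_and_collect(struct sim_pte *ptes, size_t nr_ptes,
			       int toptier_nid, size_t *migrate_list,
			       size_t list_cap)
{
	size_t nr = 0;

	assert(ptes != NULL && migrate_list != NULL);
	for (size_t i = 0; i < nr_ptes; i++) {
		bool young = ptes[i].flags & PTE_ACCESSED;

		ptes[i].flags &= ~PTE_ACCESSED; /* test-and-clear */
		if (young && ptes[i].nid != toptier_nid && nr < list_cap)
			migrate_list[nr++] = i;
	}
	return nr;
}

/* "Promote" the queued pages by moving them to the top-tier node. */
static void promote(struct sim_pte *ptes, const size_t *migrate_list,
		    size_t nr, int toptier_nid)
{
	for (size_t i = 0; i < nr; i++)
		ptes[migrate_list[i]].nid = toptier_nid;
}
```

With node 0 as the top tier, a page on node 1 with its A bit set is queued and promoted; a second pass with no new accesses queues nothing. This corresponds to the "migrate immediately when the PTE A bit is set" policy that item 4 below proposes to refine.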
> 
> DAMON also uses the PTE A bit as the major source of access information.  And DAMON
> does both access monitoring and promotion/demotion in a global kernel thread,
> namely kdamond.  Hence the DAMON-based approach would also offload the
> overheads from process context.  So I feel your approach is similar to the
> DAMON-based one in some ways, and we might have a chance to avoid
> unnecessary duplication.
> 
> [...]
>>
>> Limitations:
>> ===========
>> The PTE A bit scanning approach lacks information about the exact
>> destination node to migrate to.
> 
> This is the same for the DAMON-based approach, since DAMON also uses the PTE A
> bit as the major source of the information.  We aim to extend DAMON to be aware
> of the access source CPU, and use it to solve this problem, though.  Utilizing
> page faults or AMD IBS-like h/w features are among the ideas on the table.
> 
>>
>> Notes/Observations on design/Implementations/Alternatives/TODOs...
>> ================================
>> 1. Fine-tuning scan throttling
> 
> DAMON allows users to set the upper limit of monitoring overhead, using the
> max_nr_regions parameter.  Within that limit, it provides best-effort accuracy.
> We also have ongoing projects for making it more accurate and easier to tune.
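A rough model of why the region count bounds the overhead: in DAMON's region-based sampling, roughly one page per region is checked per sampling interval, and hits accumulate in a per-region counter. The sketch below is a simplified, hypothetical model (struct sim_region, struct sim_mem and sample_regions() are not DAMON's API); real DAMON picks sampling addresses randomly and adaptively splits and merges regions between min_nr_regions and max_nr_regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model inspired by DAMON's regions; not struct damon_region. */
struct sim_region {
	size_t start, end;        /* page range [start, end), end > start */
	size_t sampling_page;     /* the one page checked this interval */
	unsigned int nr_accesses; /* sampled hits in this aggregation window */
};

/* Simulated memory: one A bit per page plus a count of A-bit checks. */
struct sim_mem {
	bool *abits;
	size_t nr_checks;
};

/* Test-and-clear the simulated A bit of one page, counting the check. */
static bool check_and_clear(struct sim_mem *mem, size_t page)
{
	bool young = mem->abits[page];

	mem->abits[page] = false;
	mem->nr_checks++;
	return young;
}

/*
 * One sampling interval: exactly one A-bit check per region, so the work
 * is bounded by max_nr_regions no matter how much memory regions cover.
 */
static void sample_regions(struct sim_region *regions, size_t nr_regions,
			   size_t max_nr_regions, struct sim_mem *mem)
{
	assert(nr_regions <= max_nr_regions);
	for (size_t i = 0; i < nr_regions; i++) {
		struct sim_region *r = &regions[i];
		size_t len = r->end - r->start;

		if (check_and_clear(mem, r->sampling_page))
			r->nr_accesses++;
		/* Step to the next page in the region (DAMON picks randomly). */
		r->sampling_page = r->start + (r->sampling_page - r->start + 1) % len;
	}
}
```

A 10-page region and a 990-page region each cost one check per interval; the trade-off is accuracy, since a large region is represented by a single sample at a time.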
> 
>>
>> 2. Use migrate_balanced_pgdat() to balance the toptier node before migration,
>>   OR use migrate_misplaced_folio_prepare() directly.
>>   But it may need some optimizations (e.g., invoke it occasionally so
>> that the overhead is not incurred on every migration).
>>
>> 3. Explore whether a separate PAGE_EXT flag is needed instead of reusing
>> the PAGE_IDLE flag (cons: complicates PTE A bit handling in the system),
>> but in practice it does not look like a good idea.
>>
>> 4. Use timestamp information-based migration (similar to numab mode=2)
>> instead of migrating immediately when the PTE A bit is set.
>> (cons:
>>   - It will not be accurate since it is done outside of process
>> context.
>>   - Performance benefit may be lost.)
> 
> DAMON provides a sort of time-based aggregation of monitoring results.  And DAMOS
> provides prioritization of pages based on the access temperature.  Hence, the
> DAMON-based approach can also be used for a similar purpose (promoting not
> every accessed page, but only pages that are used more frequently for longer).
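A simplified illustration of that kind of prioritization, assuming each region carries an access frequency and an age (how many aggregation intervals it has kept a similar frequency), as DAMON regions do. The ordering, the min_accesses filter and the greedy page quota below are hypothetical stand-ins (struct agg_region and pick_for_promotion() are made up); DAMOS computes its own hotness score and applies its quotas differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical aggregated region; not DAMON's struct damon_region. */
struct agg_region {
	size_t nr_pages;
	unsigned int nr_accesses; /* frequency in the last aggregation window */
	unsigned int age;         /* windows the frequency stayed similar */
};

/* Hotter first: higher frequency wins; ties go to the older region. */
static int hotter_first(const void *a, const void *b)
{
	const struct agg_region *x = a;
	const struct agg_region *y = b;

	if (x->nr_accesses != y->nr_accesses)
		return x->nr_accesses < y->nr_accesses ? 1 : -1;
	if (x->age != y->age)
		return x->age < y->age ? 1 : -1;
	return 0;
}

/*
 * Sort regions hottest-first, then pick leading regions for promotion.
 * Stops at the first region that is below min_accesses or that would
 * exceed quota_pages. Returns how many leading regions were picked.
 */
static size_t pick_for_promotion(struct agg_region *regions, size_t nr,
				 unsigned int min_accesses, size_t quota_pages)
{
	size_t picked = 0, used = 0;

	if (nr == 0)
		return 0;
	assert(regions != NULL);
	qsort(regions, nr, sizeof(*regions), hotter_first);
	while (picked < nr && regions[picked].nr_accesses >= min_accesses &&
	       used + regions[picked].nr_pages <= quota_pages) {
		used += regions[picked].nr_pages;
		picked++;
	}
	return picked;
}
```

In this sketch a never-accessed region is not picked even if it is the oldest: age only breaks ties between regions of equal frequency, so a briefly touched page does not outrank a region that has been hot for many intervals.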
> 
>>
>> 5. Explore whether we need to use PFN information + a hash list instead of
>> a simple migration list. Here scanning is done directly on PFNs belonging
>> to the CXL node.
> 
> DAMON supports physical address space monitoring, and maintains the access
> monitoring results in its own data structure called damon_region.  So I think
> a similar benefit could be achieved using DAMON?
> 
> [...]
>> 8. Using DAMON APIs OR reusing the part of DAMON that already tracks ranges of
>> physical addresses accessed.
> 
> My biased humble opinion is that it would be very nice to explore this
> opportunity, since I see some similarities, and opportunities to solve some of
> the challenges in your approach more easily.  Even if it turns out that DAMON
> cannot be used for your use case, failing earlier is a good thing, I'd say :)
> 
>>
>> 9. Gregory has nicely mentioned some details/ideas on different approaches in
>> [1] : development notes, in the context of promoting unmapped page cache folios.
> 
> DAMON supports monitoring accesses to unmapped page cache folios, so hopefully
> DAMON-based approaches can also solve this issue.
> 

Hello SJ,

Thank you again for the detailed explanation. (Sorry for the late
acknowledgement; I was looking forward to the MM alignment discussion when
this message came.)

I think once the direction is fixed, we could surely use/reuse a lot of
source code from DAMON and MGLRU. The design of DAMON should surely
help. Will keep in mind all the points raised here.

Thanks and Regards
- Raghu


Thread overview: 18+ messages
2024-12-01 15:38 [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 01/10] mm: Add kmmscand kernel daemon Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 02/10] mm: Maintain mm_struct list in the system Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 03/10] mm: Scan the mm and create a migration list Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 04/10] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 05/10] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 06/10] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 07/10] sysfs: Add sysfs support to tune scanning Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 08/10] vmstat: Add vmstat counters Raghavendra K T
2024-12-01 15:38 ` [RFC PATCH V0 09/10] trace/kmmscand: Add tracing of scanning and migration Raghavendra K T
2024-12-05 17:46   ` Steven Rostedt
2024-12-06  6:33     ` Raghavendra K T
2024-12-06 14:49       ` Steven Rostedt
2024-12-01 15:38 ` [RFC PATCH V0 DO NOT MERGE 10/10] kmmscand: Add scanning Raghavendra K T
2024-12-10 18:53 ` [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit SeongJae Park
2024-12-20  6:30   ` Raghavendra K T [this message]
2025-02-12 17:02 ` Davidlohr Bueso
2025-02-13  5:39   ` Raghavendra K T
