linux-mm.kvack.org archive mirror
* [Linux Memory Hotness and Promotion] Notes from July 17, 2025
From: David Rientjes @ 2025-07-30  4:11 UTC (permalink / raw)
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker, SeongJae Park,
	Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, July 17.  Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
I shared the news with the group that the Device and Specific Purpose
Memory microconference is happening at LPC on December 11-13:
https://lpc.events/event/19/contributions/2009/

This has a lot of overlap with this group, including hotness tracking and
CXL memory expansion.

----->o-----
We chatted about the status of pfn-tracked folios.  Bharata noted that he
had updated the patch series for kmigrated and started experimenting with
it.  Page hotness information was recorded per page with page flag
extensions, and a kthread then scanned the pfns to determine which ones
needed to be migrated.
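To make the scan-based design concrete, here is a minimal Python model, not the kmigrated code: a flat list with one counter per pfn stands in for the per-page hotness kept in page extensions, and `scan()` walks a window of pfns and returns those that crossed a threshold. `PfnHotness`, `HOT_THRESHOLD`, and the clear-on-scan policy are all hypothetical.

```python
from typing import List

HOT_THRESHOLD = 2  # hypothetical: accesses needed before a pfn counts as hot


class PfnHotness:
    """Hypothetical model: one hotness counter per pfn (standing in for a
    page extension field) plus a linear scanner over all pfns."""

    def __init__(self, nr_pfns: int) -> None:
        # Sized by the whole pfn range, allocated up front.
        self.counts: List[int] = [0] * nr_pfns

    def record_access(self, pfn: int) -> None:
        self.counts[pfn] += 1

    def scan(self, start: int, nr: int) -> List[int]:
        """Scan nr pfns beginning at start (wrapping around) and return the
        pfns whose counters reached the threshold, clearing them as we go."""
        hot = []
        total = len(self.counts)
        for i in range(nr):
            pfn = (start + i) % total
            if self.counts[pfn] >= HOT_THRESHOLD:
                hot.append(pfn)
                self.counts[pfn] = 0
        return hot
```

Because the counters are sized by the full pfn range, both the metadata and the scan time grow with the amount of memory, which is the cost the discussion below turns to.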

This turned out to be problematic: bimodal behavior was observed depending
on when the kmigrated kthread reached a given page in its scan, so
detection was either very fast or very slow, depending on the scan period.
As a result, hot pages were not recognized in time, which made the worst
case likely for the benchmarks being used.
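A rough way to see why detection latency depends on scan position: if a scanner sweeps N pfns at a constant r pfns per second and wraps around, a page that becomes hot when the cursor is d pfns behind it waits d / r seconds, anywhere from nearly zero to almost a full period N / r. The helpers below are an illustrative model only; they ignore throttling, batching, and the cost of migration itself, and the rates used are made up.

```python
def scan_period(nr_pfns: int, pfns_per_sec: float) -> float:
    """Seconds for one full sweep over all pfns (illustrative model)."""
    return nr_pfns / pfns_per_sec


def detection_delay(cursor: int, pfn: int, nr_pfns: int,
                    pfns_per_sec: float) -> float:
    """Seconds until a linear scanner currently at `cursor` reaches `pfn`,
    assuming a constant scan rate and wrap-around at nr_pfns."""
    distance = (pfn - cursor) % nr_pfns  # pfns still to visit before `pfn`
    return distance / pfns_per_sec
```

For example, with 2^20 pfns (4 GiB of 4 KiB pages) and a hypothetical rate of 2^16 pfns per second, the period is 16 seconds: a page just ahead of the cursor is seen almost immediately, while one just behind it waits close to 16 seconds.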

Exempting some memory zones from the scan was insufficient by itself due
to the sheer amount of memory that could be present.  This led to a pivot
to kpromoted, which only manages the set of hot memory.  This had its own
challenges, however, due to the synchronization required between the
producers and consumers.  The hot pfns were maintained in a hash table,
which was good for lookup but poorly suited to extracting the hottest pfns,
so a secondary data structure that keeps them in priority order (a max
heap) was going to be required.
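A minimal single-threaded sketch of the hash-plus-max-heap pairing, assuming Python's `dict` for the hash table and `heapq` (a min heap, so scores are negated) for priority order. Rather than updating heap entries in place, each score change pushes a new entry and stale ones are skipped on extraction; that lazy-deletion scheme is a choice made here for brevity, not necessarily what kpromoted does, and the locking between the two structures is left out entirely. `HotPfnSet` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class HotPfnSet:
    """Hypothetical kpromoted-style store: a hash table for lookup/update
    and a max heap (negated scores in a min heap) for extracting the
    hottest pfn. Outdated heap entries are discarded lazily."""

    def __init__(self) -> None:
        self.score: Dict[int, int] = {}
        self.heap: List[Tuple[int, int]] = []  # entries are (-score, pfn)

    def record(self, pfn: int, delta: int = 1) -> None:
        new = self.score.get(pfn, 0) + delta
        self.score[pfn] = new
        heapq.heappush(self.heap, (-new, pfn))

    def pop_hottest(self) -> Optional[int]:
        while self.heap:
            neg, pfn = heapq.heappop(self.heap)
            if self.score.get(pfn) == -neg:  # entry matches current score
                del self.score[pfn]
                return pfn
            # Otherwise the entry is stale: skip it.
        return None
```

In this scheme the heap can hold several entries per pfn, so a long-running version would need periodic compaction; keeping the two structures consistent under concurrent producers and consumers is the synchronization cost noted above.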

This was working in practice but more testing was needed, including for
scalability due to synchronization between the two data structures.  I
asked about the testing methodology for kmigrated and Bharata noted that
it was a very simple test that just simulates memory accesses on the
remote node (traditional NUMA Balancing style testing).

Additionally, these data structures only need to be maintained for lower
tier memory given the approach here is focused on kpromoted.

----->o-----
Wei Xu asked whether we need to allocate a node to maintain the data
structures constantly.  Bharata noted this is why the shift toward page
extensions happened instead.  Wei had considered maintaining multi-level
bitmaps, which avoid allocations for every single page.
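One way to read the multi-level bitmap idea, sketched under assumptions: a top-level table indexed by region, where each entry is a leaf bitmap covering `LEAF_BITS` pfns that is allocated only when a bit in that region is first set and freed once it becomes empty. The leaf size and `TwoLevelBitmap` are hypothetical; the point is that memory scales with the number of regions containing hot pages rather than with the number of pages.

```python
from typing import Dict

LEAF_BITS = 4096  # hypothetical: pfns covered by one lazily allocated leaf


class TwoLevelBitmap:
    """Hypothetical sparse bitmap: the top level maps a region index to a
    leaf (a Python int used as a bitmap of LEAF_BITS pfns). Leaves exist
    only for regions with at least one bit set."""

    def __init__(self) -> None:
        self.leaves: Dict[int, int] = {}

    def set(self, pfn: int) -> None:
        leaf, bit = divmod(pfn, LEAF_BITS)
        self.leaves[leaf] = self.leaves.get(leaf, 0) | (1 << bit)

    def test(self, pfn: int) -> bool:
        leaf, bit = divmod(pfn, LEAF_BITS)
        return bool((self.leaves.get(leaf, 0) >> bit) & 1)

    def clear(self, pfn: int) -> None:
        leaf, bit = divmod(pfn, LEAF_BITS)
        word = self.leaves.get(leaf, 0) & ~(1 << bit)
        if word:
            self.leaves[leaf] = word
        else:
            self.leaves.pop(leaf, None)  # free the leaf once it is empty
```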

Bharata asked about the hotness information itself, including access
information.  Wei described this data structure as similar to page flags
but sparsely populated.  It is based on upcoming support for the CXL
Hotness Monitoring Unit (CHMU): when a pfn is read from it, we can promote
it immediately.  However, since two tasks are involved (collection and
promotion), a single thread is less than ideal.
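The split between collection and promotion can be sketched as a producer/consumer pair. In this hypothetical Python model, `collector` stands in for the thread draining hot pfns from a device such as a CHMU and only enqueues, while `promoter` does the slow migration work (here just recorded in a list). The bounded queue provides back-pressure, so a slow promoter throttles the collector instead of losing pfns. None of these names come from the patch series discussed.

```python
import queue
import threading
from typing import List

STOP = -1  # hypothetical sentinel telling the promoter to exit


def collector(samples: List[int], q: "queue.Queue[int]") -> None:
    """Hypothetical stand-in for the thread reading hot pfns from hardware;
    it only enqueues and never migrates."""
    for pfn in samples:
        q.put(pfn)  # blocks when the queue is full
    q.put(STOP)


def promoter(q: "queue.Queue[int]", promoted: List[int]) -> None:
    """Hypothetical stand-in for the thread doing the migration work."""
    while True:
        pfn = q.get()
        if pfn == STOP:
            break
        promoted.append(pfn)  # placeholder for an actual migration


def run(samples: List[int]) -> List[int]:
    q: "queue.Queue[int]" = queue.Queue(maxsize=64)  # bounded: back-pressure
    promoted: List[int] = []
    consumer = threading.Thread(target=promoter, args=(q, promoted))
    producer = threading.Thread(target=collector, args=(samples, q))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return promoted
```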

I asked how we could share the code between the groups so that people can
work from a common understanding.  Wei mentioned that it would be possible
to share the code soon.

Raghavendra discussed pfn scanning and the data structure that would be
needed, something similar to a maple tree to store ranges of memory.  The
idea was to store the scan timestamp per range to avoid storing this
information for every page.  He noted that in his RFC v2 patch series the
scanning and migration threads are separate.
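A rough stand-in for the range-based bookkeeping: the sketch below keeps sorted, non-overlapping `[start, end)` pfn ranges with one scan timestamp each, using `bisect` for lookup in place of a real maple tree. It assumes callers never pass overlapping ranges and only updates a range in place when it matches exactly; the splitting and merging a maple tree would handle are left out. `ScanRanges` is a hypothetical name.

```python
import bisect
from typing import List, Optional


class ScanRanges:
    """Hypothetical range store: one last-scanned timestamp per pfn range
    instead of per page. Ranges are assumed never to overlap."""

    def __init__(self) -> None:
        self.starts: List[int] = []
        self.ends: List[int] = []
        self.stamps: List[float] = []

    def mark_scanned(self, start: int, end: int, when: float) -> None:
        i = bisect.bisect_left(self.starts, start)
        if (i < len(self.starts) and self.starts[i] == start
                and self.ends[i] == end):
            self.stamps[i] = when  # rescan of a known range: update in place
            return
        self.starts.insert(i, start)
        self.ends.insert(i, end)
        self.stamps.insert(i, when)

    def last_scanned(self, pfn: int) -> Optional[float]:
        # Find the last range starting at or before pfn, then check its end.
        i = bisect.bisect_right(self.starts, pfn) - 1
        if i >= 0 and pfn < self.ends[i]:
            return self.stamps[i]
        return None
```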

Raghavendra presented a slide that proposed a kpromoted interface, to be
used when the source does not maintain hotness information itself, and a
per-node kmigrated thread that does blind migration, throttling, and
batching.  Bharata noted that Raghu's PTE scanning series would use
kmigrated directly and bypass kpromoted; approaches that do not have
complete hotness information themselves would instead go through
kpromoted.

Kpromoted will maintain the hot page information, potentially based on
multiple inputs, and then hand the information to kmigrated for the actual
migration itself.
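The division of labor described above can be sketched as follows, with every name and constant hypothetical: `kpromoted_select` merges scores reported by several sources into one score per pfn and keeps those at or above a threshold, `route` hands the hot pfns, hottest first, to a per-node queue keyed by promotion target node, and `kmigrated_round` drains one node's queue with a per-round cap (throttling) and fixed-size batches. A source that already has complete hotness information, such as the PTE scanning series, would skip the first step and enqueue directly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BATCH = 4          # hypothetical: pfns migrated per batch
MAX_PER_ROUND = 8  # hypothetical throttle: pfns per node per round


def kpromoted_select(reports: Iterable[Tuple[str, int, int]],
                     threshold: int) -> Dict[int, int]:
    """Combine (source, pfn, score) reports from several hypothetical
    inputs into one score per pfn; keep pfns at or above threshold."""
    total: Dict[int, int] = defaultdict(int)
    for _source, pfn, score in reports:
        total[pfn] += score
    return {pfn: s for pfn, s in total.items() if s >= threshold}


def route(hot: Dict[int, int],
          node_of: Dict[int, int]) -> Dict[int, List[int]]:
    """Append hot pfns, hottest first, to the queue of their target node."""
    queues: Dict[int, List[int]] = defaultdict(list)
    for pfn in sorted(hot, key=lambda p: (-hot[p], p)):
        queues[node_of[pfn]].append(pfn)
    return queues


def kmigrated_round(pending: List[int]) -> List[List[int]]:
    """One round of a hypothetical per-node migrator: take at most
    MAX_PER_ROUND pfns and split them into batches; the rest waits."""
    todo = pending[:MAX_PER_ROUND]
    del pending[:MAX_PER_ROUND]
    return [todo[i:i + BATCH] for i in range(0, len(todo), BATCH)]
```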

----->o-----
Next meeting will be on Thursday, July 31 at 8:30am PDT (UTC-7),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - update on status of kpromoted (collector) and kmigrated (promoter)
   approaches and testing
 - update on sharing Google approach for both to overlap the shared goals
   and converge where possible
 - discuss proactive demotion interface as an extension to memory.reclaim
   + possibly leveraging working set extensions on top of MGLRU
 - discuss overall testing and benchmarking methodology for various
   approaches as we go along
   + minimal viable infrastructure, testing workloads, and metrics of
     interest to collect
 - enlightening migrate_pages() for hardware assists and how this work
   will be charged to userspace

Please let me know if you'd like to propose additional topics for
discussion, thank you!



* Re: [Linux Memory Hotness and Promotion] Notes from July 17, 2025
From: Davidlohr Bueso @ 2025-07-31 17:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Fan Ni, Gregory Price, Jonathan Cameron, Joshua Hahn,
	Raghavendra K T, Rao, Bharata Bhasker, SeongJae Park, Wei Xu,
	Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan, linux-mm

On Tue, 29 Jul 2025, David Rientjes wrote:

>Topics for the next meeting:
>
> - update on status of kpromoted (collector) and kmigrated (promoter)
>   approaches and testing
> - update on sharing Google approach for both to overlap the shared goals
>   and converge where possible
> - discuss proactive demotion interface as an extension to memory.reclaim
>   + possibly leveraging working set extensions on top of MGLRU

fyi the per-node interface to proactive reclaim has also recently been
added.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b980077899ea49cc747afe003e01ca303b00d463
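For reference, a minimal sketch of driving such an interface from userspace. It assumes the per-node file lives at `/sys/devices/system/node/nodeN/reclaim` and accepts the same request format as cgroup v2's `memory.reclaim` (an amount such as "512M"); verify both against the linked commit before relying on them. For `memory.reclaim`, the cgroup v2 documentation says the write fails with `EAGAIN` when less than the requested amount was reclaimed, which would surface in Python as an `OSError`.

```python
from pathlib import Path


def node_reclaim_path(nid: int,
                      sysfs_root: str = "/sys/devices/system/node") -> Path:
    """Assumed location of the per-node reclaim file (verify on your kernel)."""
    return Path(sysfs_root) / f"node{nid}" / "reclaim"


def request_reclaim(path: Path, amount: str) -> None:
    """Write a memory.reclaim-style request such as "512M" to `path`.
    The request format is an assumption carried over from memory.reclaim."""
    path.write_text(amount)
```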

Thanks,
Davidlohr


