* [Linux Memory Hotness and Promotion] Notes from July 31, 2025
From: David Rientjes @ 2025-08-14 3:21 UTC
To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker, SeongJae Park,
Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
Cc: linux-mm
Hi everybody,
Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, July 31. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
Jonathan Cameron started off by talking about the LPC microconference: see
https://lpc.events/event/19/contributions/2009/ for all the details on
that conference since there are some overlapping topics with this call.
----->o-----
Raghavendra updated in chat: "Status: Coming with RFC V3 with many of the
TODOs addressed There was one BUG that was causing idle system to crash
got fixed (wrong kmemcache) there was a corner case where in MAP_POPULATE
where all the pages were getting migrated, fixed that, but code is not
looking great... will experiment a bit, cleanup and send it very soon
(early next week if not by this week)."
----->o-----
Bharata discussed early integration of PTE Accessed bit scanning and
kpromoted. He presented a slide that I added to the shared drive and went
over kpromoted as the single source of page hotness truth. Sources like
AMD IBS, kscand, LRU, CHMU, etc. can record accesses and store them in
kpromoted as the single source of truth. There was some confusion about
whether kpromoted is a kthread; it was clarified that it is not. The list
of hot pages can then
be sent to kmigrated to handle hot page migration/promotion with an
optional use of hardware acceleration if available. PTE Accessed bit
scanning is now referred to as kscand.
The single source of truth maintains page hotness information only for
memory that has been accessed, not for the entire system; this was to
address issues where it took too long to discover hot pages with regular
scanning. The single source of truth determines which pages to promote
as the final decision maker.
Bharata showed an early benchmark, abench, that does random memory access.
kpromoted+kscand was roughly on par with NUMAB=2 and very significantly
better than NUMAB=0. The number of pages migrated was also almost
identical between kpromoted+kscand and NUMAB=2.
----->o-----
The page hotness data structure ("kpromoted") uses a hash table for
quick lookup and to eliminate duplicates. The hotness of a record is
updated when an access is recorded, and records that cross the hotness
threshold are pushed to a Max Heap for easy retrieval. kmigrated then
extracts the N hottest records and tries to promote that memory.
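
As a rough illustration of the flow described above (and not the actual
kpromoted code), here is a minimal Python sketch. The names
HotnessTracker, record_access and extract_hottest, the threshold value,
and the use of a plain access count as "hotness" are all assumptions made
for illustration only.

```python
import heapq


class HotnessTracker:
    """Toy model: hash table of per-page records plus a max heap of hot pages."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical hotness threshold
        self.records = {}           # pfn -> access count; the dict acts as the hash table
        self.heap = []              # (-hotness, pfn); heapq is a min heap, so negate

    def record_access(self, pfn):
        """Update the record for pfn; push it to the heap once it is hot enough."""
        hotness = self.records.get(pfn, 0) + 1
        self.records[pfn] = hotness
        if hotness >= self.threshold:
            # Older heap entries for this pfn become stale and are skipped later.
            heapq.heappush(self.heap, (-hotness, pfn))

    def extract_hottest(self, n):
        """Return up to n pfns in descending hotness order (promotion candidates)."""
        out = []
        while self.heap and len(out) < n:
            neg_hotness, pfn = heapq.heappop(self.heap)
            if self.records.get(pfn) != -neg_hotness:
                continue  # stale entry: already extracted or superseded
            del self.records[pfn]
            out.append(pfn)
        return out
```

Note that the push happens in record_access(), i.e. at the time of
access, and extract_hottest() stands in for what kmigrated would consume.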
Allocation and deallocation of millions of records can be an issue.
Accesses are reported from atomic contexts, so statically available
per-page space for storing the hotness record is preferable.
We discussed statically allocated data structures to avoid constant
allocation and deallocation of the data structure. I asked how many bits
were required for this; Bharata suggested 64 bits. Previously, this was
done with page_ext. Gregory Price noted the static allocation of this
may be contentious upstream.
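
To make the 64-bit budget concrete, here is a small Python sketch of
packing a hotness record into a single 64-bit word. The field layout
(32-bit last access time, 22-bit access frequency, 10-bit node id) and the
function names are purely hypothetical; the call did not settle on what
the 64 bits would contain.

```python
# Hypothetical field widths; they only need to add up to 64 bits.
TIME_BITS, FREQ_BITS, NID_BITS = 32, 22, 10
TIME_MASK = (1 << TIME_BITS) - 1
FREQ_MAX = (1 << FREQ_BITS) - 1
NID_MAX = (1 << NID_BITS) - 1


def pack_record(last_access, freq, nid):
    """Pack (last_access, freq, nid) into one 64-bit integer."""
    if not 0 <= nid <= NID_MAX:
        raise ValueError("node id does not fit in the assumed field")
    last_access &= TIME_MASK            # timestamp wraps around
    freq = min(max(freq, 0), FREQ_MAX)  # frequency saturates instead of wrapping
    return (nid << (TIME_BITS + FREQ_BITS)) | (freq << TIME_BITS) | last_access


def unpack_record(record):
    """Inverse of pack_record(); returns (last_access, freq, nid)."""
    last_access = record & TIME_MASK
    freq = (record >> TIME_BITS) & FREQ_MAX
    nid = (record >> (TIME_BITS + FREQ_BITS)) & NID_MAX
    return last_access, freq, nid
```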
Wei Xu said time may be optional if we can reset. He pointed at three
data structures: hash table that stores all information, Max Heap for the
hottest pages, and the migration list. Bharata said the Max Heap would
not be as big as the hash table because of the hotness threshold. Max
Heap can only store a certain number of elements.
Bharata noted that pages move from the hash table to the Max Heap at the
time of access, not during a scan, when the hotness threshold is reached.
If we exceed the capacity of the Max Heap, then we'd need to rediscover
the page on a later iteration.
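
A capacity-limited heap along these lines could look like the Python
sketch below. The class name BoundedMaxHeap and the policy of simply
rejecting a push when the heap is full are assumptions; the notes only say
that such a page would need to be rediscovered later.

```python
import heapq


class BoundedMaxHeap:
    """Max heap of (hotness, pfn) that refuses new entries once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # (-hotness, pfn); heapq is a min heap, so negate

    def __len__(self):
        return len(self._heap)

    def push(self, hotness, pfn):
        """Return True if stored, False if the heap is full.

        On False the caller keeps the page in the hash table only, so the
        page has to be rediscovered on a later access or scan.
        """
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (-hotness, pfn))
        return True

    def pop_hottest(self):
        """Remove and return (pfn, hotness) for the hottest entry."""
        neg_hotness, pfn = heapq.heappop(self._heap)
        return pfn, -neg_hotness
```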
Gregory also noted that we could discover page cache this way, which was
not possible with traditional NUMAB=2.
----->o-----
I followed up with the group in email to keep the conversation going
before the next biweekly meeting. Bharata there noted that the source of
truth only maintains page hotness information for lower tier memory. It
would be possible to extend this for multiple lower tier nodes if needed.
He was also planning on exploring reusing or moving the existing
throttling mechanisms from NUMAB=2 to kpromoted.
I asked a few more questions as well:
- any thoughts on memcg controls that this could use if we want to
control the demotion of memory for latency sensitive vs latency
tolerant workloads?
- klruscand was mentioned; trying to figure out an update for that and
any potential next steps
- Wei, could you update on sharing of internal code so that we're all
operating from the same base understanding and surfacing the overlaps
and opportunity for collaboration?
----->o-----
Next meeting will be on Thursday, August 14 at 8:30am PDT (UTC-7),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm
Topics for the next meeting:
- update on status of kpromoted as single source of truth and the
kmigrated kernel thread
- determining if klruscand will provide yet another source of page
hotness information based on MGLRU data
- update on sharing Google approach for both to overlap the shared goals
and converge where possible
- discuss proactive demotion interface as an extension to memory.reclaim
+ possibly leveraging working set extensions on top of MGLRU
- discuss overall testing and benchmarking methodology for various
approaches as we go along
+ minimal viable infrastructure, testing workloads, and metrics of
interest to collect
- enlightening migrate_pages() for hardware assists and how this work
will be charged to userspace
Please let me know if you'd like to propose additional topics for
discussion, thank you!