[Linux Memory Hotness and Promotion] Notes from May 22, 2025

linux-mm.kvack.org archive mirror

From: David Rientjes @ 2025-06-04  2:42 UTC
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker, SeongJae Park,
	Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, May 22.  Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going between meetings.

----->o-----
Bharata was working on separating out migration from NUMA Balancing[1].
He had posted the patch set on May 21 and received feedback from Gregory
Price and others.

Gregory said that any concerns could be addressed on the mailing list, but
he also noted that if the series is going to go down this route, then doing
the migration asynchronously per task is likely not ideal.  It would be
better to do it per node or per cpu.  This would also help resolve some
issues related to lengthy lists of isolated folios.

I asked about the per-task asynchronous migration.  Bharata described his
previous kpromoted approach, which was per node.  This series was instead
inspired by Gregory's recent work and by looking more deeply into NUMA
Balancing, which showed some advantages to isolating a small number of
folios and then ensuring they are migrated within a short period of time.
He was not against doing this per node.

Bharata discussed two aspects of NUMA Balancing: locality-based balancing
and hot page promotion.  For the latter, there has been a lot of discussion
on a generalized abstraction to describe memory hotness that would then be
handed over to the migrator; traditional NUMA Balancing does not need this.

Raghavendra suggested a migration thread that would only do the migration
itself.  Bharata said this should be independent of the backend that
determines which folios to migrate.

Instead of isolating the pages for a per-node migration thread, Bharata
suggested tracking them by pfn.  Gregory didn't have a strong opinion on
isolation vs pfn tracking; his pushback on the per-task approach was
because the migration is asynchronous, and he did not want to leave a large
amount of memory isolated.  I agreed.

----->o-----
I discussed the methodology for benchmarking and collecting data for
various approaches.  Bharata noted that the majority feeling was to move
this out of the task_work context, so there is no urgency in testing the
current approach today.  Once we have the alternative mechanism, we can
collect numbers then.

Gregory noted that all that remains to be done with the current patchset
is to clean up the putback of isolated folios on task exit.  He noted that
doing the migration as part of task_work would cause latencies to show up
in unexpected places that may not be easy to attribute to migration.

Davidlohr noted that, even with this approach, the cost of the scheduler
side of NUMA Balancing is still borne by the task.

Gregory noted that his feeling was still that migration should be done by
a kthread, but that the discussion would be more about where to store the
data and how much data to store to track what needs to be migrated.  As a
simple example, he described a per-node list from which pages would
occasionally be migrated.  This would be possible to experiment with and
evolve, potentially including per-cpu lists.  Bharata said that he would
do this.

Raghu noted that the migration thread may also be able to handle either an
isolated list of folios *or* pfn-tracked memory.
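One way to picture a single migration thread that accepts either input is
a tagged request type.  In this hypothetical userspace C sketch, struct
fake_folio stands in for struct folio, and lookup_pfn() stands in for
whatever pfn-to-folio resolution a real implementation would need; none of
these names come from the kernel or from the patch set.  Requests that
arrive as pfns are only isolated when the thread reaches them, and a pfn
whose page has gone away is simply skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct folio in this userspace sketch. */
struct fake_folio {
	uint64_t pfn;
	bool isolated;  /* taken off the LRU and owned by the migrator */
};

/* Hypothetical request type: the migrator accepts either input form. */
enum migrate_req_kind { REQ_ISOLATED_FOLIO, REQ_PFN };

struct migrate_req {
	enum migrate_req_kind kind;
	union {
		struct fake_folio *folio;  /* REQ_ISOLATED_FOLIO */
		uint64_t pfn;              /* REQ_PFN */
	};
};

#define NR_FAKE_PFNS 16
static struct fake_folio fake_memory[NR_FAKE_PFNS];

/* Hypothetical pfn lookup: NULL models a page that went away. */
static struct fake_folio *lookup_pfn(uint64_t pfn)
{
	if (pfn >= NR_FAKE_PFNS)
		return NULL;
	fake_memory[pfn].pfn = pfn;
	return &fake_memory[pfn];
}

/* Returns true if the request ended in a (simulated) migration. */
bool handle_request(const struct migrate_req *req)
{
	struct fake_folio *folio;

	switch (req->kind) {
	case REQ_ISOLATED_FOLIO:
		folio = req->folio;          /* producer already isolated it */
		break;
	case REQ_PFN:
		folio = lookup_pfn(req->pfn);
		if (!folio || folio->isolated)
			return false;        /* gone, or someone else owns it */
		folio->isolated = true;      /* isolate only at migration time */
		break;
	default:
		return false;
	}

	assert(folio->isolated);
	/* The actual copy to the target node would happen here. */
	folio->isolated = false;         /* put back after the "migration" */
	return true;
}
```

Keeping the request format separate from the migration step is what would
let the backend that picks folios change without touching the thread.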

----->o-----
Raghavendra discussed upstream feedback from Davidlohr.  PTE-based scanning
has implications for first access, which is something we would like to
filter out.  He raised whether the LRU could provide insight into whether a
folio was active in the N-1 generation; this would allow us to filter as
desired.

Davidlohr described his idea: when a page fault comes in, we determine
that the page is hot, and we decide to migrate it, the page is added to the
asynchronous migrating entity described above for processing.  When that
entity is scheduled, it checks the active list to determine whether the
memory is still hot, and only then does the isolation needed for migration.
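Davidlohr's flow can be modeled as a two-stage pipeline: the fault path
only records the page, and the deferred worker re-checks hotness before
isolating.  This is a hypothetical userspace C sketch: the active field
stands in for the page's LRU active state, and fault_mark_for_promotion()
and process_pending() are invented names.  For brevity it stores page
pointers; a real implementation would also need to cope with the page
being freed or reused while queued (for example by tracking pfns and
revalidating), which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the LRU state of one page in this simulation. */
struct sim_page {
	bool active;    /* stand-in for "on the active LRU list" */
	bool isolated;  /* removed from the LRU for migration */
	bool migrated;
};

#define PENDING_MAX 16

/* Hypothetical asynchronous migration queue.  The fault path only records
 * the page here; it does not isolate anything. */
static struct sim_page *pending[PENDING_MAX];
static size_t npending;

/* Fault path: the page was judged hot and chosen for promotion. */
bool fault_mark_for_promotion(struct sim_page *page)
{
	if (npending == PENDING_MAX)
		return false;  /* queue full: skip rather than block the fault */
	pending[npending++] = page;
	return true;
}

/* Deferred worker: re-check hotness, then isolate and migrate only the
 * pages that are still active.  Returns the number migrated. */
size_t process_pending(void)
{
	size_t migrated = 0;

	for (size_t i = 0; i < npending; i++) {
		struct sim_page *page = pending[i];

		if (!page->active)
			continue;          /* cooled off since the fault */
		assert(!page->isolated);
		page->isolated = true;     /* isolation happens only here */
		page->migrated = true;     /* placeholder for the migration */
		page->isolated = false;
		migrated++;
	}
	npending = 0;
	return migrated;
}
```

Pages that cool off between the fault and the worker run are never
isolated, which is the filtering effect this approach is after.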

----->o-----
Next meeting will be on Thursday, June 5 at 8:30am PDT (UTC-7), everybody
is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - update on the pivot toward per-node or per-cpu tracking of pages to
   migrate
   + decide if pfn based tracking is required instead
   + update on Bharata's latest proposal that incorporates this feedback,
     targeting the next meeting if possible
 - discuss overall testing and benchmarking methodology for various
   approaches as we go along
   + minimal viable infrastructure, testing workloads, and metrics of
     interest to collect
 - enlightening migrate_pages() for hardware assists and how this work
   will be charged to userspace

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1]
https://lore.kernel.org/linux-mm/aC6VIG7GPnqr3ug-@gourry-fedora-PF4VCD3F/T/#t
