* [Linux Memory Hotness and Promotion] Notes from April 23, 2026
@ 2026-04-25 22:10 David Rientjes
From: David Rientjes @ 2026-04-25 22:10 UTC
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker, SeongJae Park,
	Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, April 23.  Thanks to everybody who was 
involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
Bharata updated offline that he has a working version of the IBS memory 
profiler driver, which acts as a page hotness source for pghot.  It is 
currently going through review.  He should be able to post v7 of pghot 
with NUMAB2 hint faults and the IBS memory profiler as sources of hotness 
information by the end of April.
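For background on how such a source plugs in: a minimal sketch of the 
shape of a hotness producer is below.  The pghot_record_access() hook 
name, its arguments, and the source id are assumptions based on earlier 
revisions of the series, not necessarily the v7 interface:

	/*
	 * Rough sketch of a hotness source feeding pghot: the driver's
	 * sample handler reports each observed physical-page access to
	 * the common tracking layer, which aggregates hotness per page.
	 * pghot_record_access() and PGHOT_HW_HINT are assumptions, not
	 * necessarily the v7 API.
	 */
	static void ibs_handle_mem_sample(u64 paddr, int target_nid)
	{
		unsigned long pfn = PHYS_PFN(paddr);

		/* One report per sample; pghot does the aggregation. */
		pghot_record_access(pfn, target_nid, PGHOT_HW_HINT,
				    jiffies);
	}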

----->o-----
Shivank updated on the status of his patch series for page migration 
hardware assist.  He will be posting v5 of that series on Monday.  
Functionally this is working and he also tested this with memory 
compaction.  His slides are attached to the cover letter for the meeting.  
In his example, a fully fragmented 250GB node is 50% free (every other 
4KB page is in use), so hugepage allocation is blocked.  He pinned 
compaction and a cpu hog to the same core so that they compete for cpu 
cycles.  Allocating 16384 hugepages triggered ~4.6M page migrations 
through compaction:

		Time	Pages migrated	Hog iters	Sys%	User%
Baseline	33.3s	4.6M		62.3B		49.8%	49.9%
DMA offload	32.5s	4.6M		66.0B		43.0%	56.3%

DMA offload frees cpu time during compaction, so ~6% more work (66.0B vs 
62.3B hog iterations) was done by the application.  This is also discussed 
in the upstream patch series[1].
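As a rough consistency check on that migration count (assuming 2MB 
hugepages):

	512 x 4KB pages per 2MB hugepage, every other page in use
		-> ~256 in-use pages to migrate per hugepage
	16384 hugepages x 256 migrations ~= 4.2M migrations

which is in the same ballpark as the ~4.6M observed; the remainder 
plausibly comes from retries and re-fragmentation while the hog runs.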

I asked if this is direct compaction coming from the page allocator 
slowpath and Shivank clarified that it is.  This example used DMA 
offload on Zen 3 with a migration batch size of 32 pages.  Shivank 
noted that the workload is still stalled in the page allocator while this 
migration is happening, so the benefit here is purely the speed-up 
achieved by DMA offload.
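For context, a minimal sketch of what a batched, DMA-backed folio copy 
hook could look like is below.  The callback shape and both helper 
functions are illustrative assumptions, not the interface from Shivank's 
series:

	/*
	 * Illustrative sketch only: a batched folio copy callback that a
	 * DMA driver could register with the migration core.  Both
	 * dma_submit_folio_copy() and dma_wait_for_batch() are
	 * hypothetical helpers, not APIs from the series.
	 */
	static int dma_copy_folio_batch(struct list_head *dst_list,
					struct list_head *src_list)
	{
		struct folio *dst, *src;

		/* Queue one DMA descriptor per folio in the batch. */
		dst = list_first_entry(dst_list, struct folio, lru);
		list_for_each_entry(src, src_list, lru) {
			dma_submit_folio_copy(dst, src);
			dst = list_next_entry(dst, lru);
		}

		/* A single wait amortizes completion cost over the batch. */
		return dma_wait_for_batch();
	}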

Jonathan asked if there were results for more real-world scenarios 
involving memory fragmentation.  Shivank noted this is the extent of the 
deterministic data that he has.  Jonathan opined that it may actually show 
better results with more realistic scenarios.

----->o-----
Joshua discussed his latest update for tier-aware memcg limits.  The 
graphs of data that he presented at the meeting demonstrate throughput 
differences between three noisy neighbor memory hogs and a victim 
workload.  On a 1TB machine (750GB DRAM, 250GB CXL), each workload takes 
up 220GB of memory.  The three hogs are launched first and allocate all of 
their memory, and only once they are done allocating, the victim workload 
gets to start allocating.  Once the victim's memory is allocated, all 
workloads begin accessing their memory and the number of reads each can 
perform is measured.  The three setups presented are (1) random access, 
(2) 60-40
hot/cold region accesses, and (3) 90-10 hot/cold region accesses.  He 
tested with both NUMAB2 and NUMAB0.

In all of the experiments, tiered memcg limits provide a tighter band of 
throughput.  Monitoring memory.numa_stat and looking at anonymous memory 
usage, in the non-tiered setup only the victim workload uses CXL memory.  
In a tiered setup, everybody uses the same amount of DRAM and CXL.  Joshua 
noted that the difference between NUMAB2 and NUMAB0 is also interesting: 
NUMAB2 appears actively harmful to the system under these scenarios, 
since it fights against the promotion/demotion caused by tiered limits.  
He's planning on sending out a new RFC later today.  His slides are 
attached to the cover letter for the meeting.
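Mechanically, tier-aware limits amount to charging an allocation against 
a per-tier counter in addition to the usual memcg counter.  A minimal 
sketch, assuming a hypothetical per-memcg dram counter 
(page_counter_try_charge() itself is the existing kernel API):

	/*
	 * Minimal sketch: try to charge a DRAM allocation against a
	 * hypothetical per-tier counter.  The memcg->dram field is an
	 * assumption; page_counter_try_charge() is the existing API.
	 */
	static bool memcg_try_charge_dram(struct mem_cgroup *memcg,
					  unsigned long nr_pages)
	{
		struct page_counter *fail;

		if (!page_counter_try_charge(&memcg->dram, nr_pages, &fail))
			return false;	/* over the DRAM limit: caller falls
					 * back to CXL or demotes instead of
					 * failing the allocation */
		return true;
	}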

Yiannis asked if all the demotions in this scenario are from the LRU and 
there were no promotions.  Joshua confirmed that this is the case: reads 
are served directly from CXL without promotions.

We discussed the design and implementation of NUMAB2 and Joshua made the 
observation that it is unaware of memcg, so it tries to do what is in 
the best interest of the system overall, which may be why it fights 
with the memcg tier-aware limits.

----->o-----
NOTE!!!  The next meeting will be canceled due to LSF/MM/BPF 2026.

Next meeting will be on Thursday, May 21 at 8:30am PDT (UTC-7), everybody
is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - debrief discussions from LSF/MM/BPF 2026
 - v7 of Bharata's patch series, including new IBS hotness information
   and NUMAB2 hint faults
 - v5 of Shivank's series for enlightening migrate_pages() for hardware
   assists and how this work will be charged to userspace, including for
   memory compaction
 - v2 of tier-aware memcg limits, including new page counters and rework
   to pass folios into the charge path
 - Yiannis's patch series for non-temporal stores support
 - discuss generalized subsystem for providing bandwidth information
   independent of the underlying platform, ideally through resctrl,
   otherwise utilizing bandwidth information will be challenging
   + preferably this bandwidth monitoring is not per NUMA node but rather
     per slow and fast tier
 - later: testing of tier-aware memcg limits with Bharata's changes once
   the tier-aware memcg limits work is stable and further along

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] https://lore.kernel.org/linux-mm/a69f463c-0ee3-492c-8505-710d757a1f21@amd.com/

