From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Raghavendra K T <raghavendra.kt@amd.com>, <linux-mm@kvack.org>,
	<akpm@linux-foundation.org>, <lsf-pc@lists.linux-foundation.org>,
	<bharata@amd.com>, <gourry@gourry.net>, <nehagholkar@meta.com>,
	<abhishekd@meta.com>, <nphamcs@gmail.com>, <hannes@cmpxchg.org>,
	<feng.tang@intel.com>, <kbusch@meta.com>, <Hasan.Maruf@amd.com>,
	<sj@kernel.org>, <david@redhat.com>, <willy@infradead.org>,
	<k.shutemov@gmail.com>, <mgorman@techsingularity.net>,
	<vbabka@suse.cz>, <hughd@google.com>, <rientjes@google.com>,
	<shy828301@gmail.com>, <liam.howlett@oracle.com>,
	<peterz@infradead.org>, <mingo@redhat.com>,
	<nadav.amit@gmail.com>, <shivankg@amd.com>, <ziy@nvidia.com>,
	<jhubbard@nvidia.com>, <AneeshKumar.KizhakeVeetil@arm.com>,
	<linux-kernel@vger.kernel.org>, <jon.grimm@amd.com>,
	<santosh.shukla@amd.com>, <Michael.Day@amd.com>,
	<riel@surriel.com>, <weixugc@google.com>,
	<leesuyeon0506@gmail.com>, <honggyu.kim@sk.com>,
	<leillc@google.com>, <kmanaouil.dev@gmail.com>, <rppt@kernel.org>,
	<dave.hansen@intel.com>
Subject: Re: [LSF/MM/BPF TOPIC] Unifying sources of page temperature information - what info is actually wanted?
Date: Fri, 14 Mar 2025 14:24:12 +0000
Message-ID: <20250314142412.00001689@huawei.com>
In-Reply-To: <87h64u2xkh.fsf@DESKTOP-5N7EMDA>

On Sun, 16 Feb 2025 14:49:50 +0800
"Huang, Ying" <ying.huang@linux.alibaba.com> wrote:

> Hi, Jonathan,
> 
> Sorry for late reply.

Sorry for even later reply!

> 
> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
> 
> > On Fri, 31 Jan 2025 12:28:03 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >  
> >> > Here is the list of potential discussion points:    
> >> ...
> >>   
> >> > 2. Possibility of maintaining single source of truth for page hotness that would
> >> > maintain hot page information from multiple sources and let other sub-systems
> >> > use that info.    
> >> Hi,
> >> 
> >> I was thinking of proposing a separate topic on a single source of hotness,
> >> but this question covers it so I'll add some thoughts here instead.
> >> I think we are very early, but sharing some experience and thoughts in a
> >> session may be useful.  
> >
> > Thinking more on this over lunch, I think it is worth calling this out as a
> > potential session topic in its own right rather than trying to find
> > time within other sessions.  Hence the title change.
> >
> > I think a session would start with a brief listing of the temperature sources
> > we have and those on the horizon to motivate what we are unifying, then
> > discussion to focus on need for such a unification + requirements 
> > (maybe with a straw man).
> >  
> >> 
> >> What do the other subsystems that want to use a single source of page hotness
> >> want to be able to find out? (subject to filters like memory range, process etc)
> >> 
> >> A) How hot is page X?  
> >> - Is this useful, or too much data? What would use it?
> >>   * Application optimization maybe. Very handy for developing algorithms
> >>     to do the rest of the options here as an Oracle!
> >> - Provides both the cold and hot end of the scale, but maybe measurement
> >>   techniques vary and can not be easily combined. Hard in general to combine
> >>   multiple sources of truth if aiming for an absolute number.
> >> 
> >> B) Which pages are super hot?
> >> - Probably those that make the most difference if they are in a slower memory tier.
> >> 
> >> C) Some pages are hot enough to consider moving?
> >> - This may be good enough to get the key data into the fast memory over time.
> >> - Can combine sources of info as being able to compare precise numbers doesn't matter.
> >> 
> >> D) Which pages are fairly cold?
> >> - Likewise maybe good enough over time.
> >> 
> >> E) Which pages are very cold?
> >> - Ideal case for tiering. Swap these with the super hot ones.
> >> - Maybe extra signal for swap / zswap etc
> >> 
> >> F) Did these hot pages remain hot (and same for cold)
> >> - This is needed to know when to back off doing things as we have unstable
> >>   hotness (two phase applications are a pain for this), sampling a few
> >>   pages may be fine.
> >> 
> >> Messy corners:
> >> 
> >> Temporal aspects.
> >> - If only providing lists of hottest / coldest in last second, very hard
> >>   to find those that are of a stable temperature. We end up moving
> >>   very hot data (which is disruptive) and it doesn't stay hot.
> >> - Can reduce that effect by long sampling windows on some measurement approaches
> >>   (on hardware trackers that can trash accuracy due to resource exhaustion
> >>    and other subtle effects).
> >> - bistable / phase based applications are a pain but perhaps up to higher
> >>   levels to back off.
> >> 
> >> My main interest is migrating in tiered systems but good to look at what
> >> else would use a common layer.
> >> 
> >> Mostly I want to know something that is useful to move, and assume convergence
> >> over the long term with the best things to move, so to me the ideal layer has
> >> the following interface (strawman so shoot holes in it!):
> >> 
> >> 1) Give me up to X hotish pages from a slow tier (greater than a specific measure
> >> of temperature)  
> 
> Because the hot pages may be available upon page access (such as a PROT_NONE
> page fault), the interface may be "push" style instead of "pull" style,
> e.g.,

Absolutely agree that might be the approach, but with some form of back pressure,
as for at least some approaches it is much cheaper to find a few hot
pages than to find lots of them.  It gets more complex depending on whether you
want a few of the very hottest or just those hotter than X.

> 
> int register_hot_page_handler(void (*handler)(struct page *hot_page, int temperature));
> 
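To make the back pressure idea a bit more concrete, here is a minimal
userspace sketch. Only the handler signature comes from the proposal above;
struct hot_page_source, the budget mechanism and every function name below
are assumptions for illustration, not an existing kernel interface. The
consumer grants a budget of reports and the source drops anything beyond it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page (userspace sketch). */
struct page {
	unsigned long pfn;
};

/* Handler signature as in the proposal quoted above. */
typedef void (*hot_page_handler_t)(struct page *hot_page, int temperature);

/* Assumed back pressure state: the consumer grants a budget of reports. */
struct hot_page_source {
	hot_page_handler_t handler;
	unsigned int budget;	/* reports the consumer will still accept */
	unsigned int dropped;	/* reports suppressed for lack of budget */
};

/* Returns 0 on success, -1 if there is no handler or one is already set. */
static int register_hot_page_handler_sketch(struct hot_page_source *src,
					    hot_page_handler_t handler)
{
	assert(src != NULL);
	if (!handler || src->handler)
		return -1;
	src->handler = handler;
	src->budget = 0;
	src->dropped = 0;
	return 0;
}

/* Consumer side: "I can take up to n more hot pages". */
static void hot_page_grant_budget(struct hot_page_source *src, unsigned int n)
{
	src->budget += n;
}

/* Producer side: returns 1 if the page was delivered, 0 if it was dropped. */
static int hot_page_report(struct hot_page_source *src, struct page *page,
			   int temperature)
{
	if (!src->handler || src->budget == 0) {
		src->dropped++;
		return 0;
	}
	src->budget--;
	src->handler(page, temperature);
	return 1;
}

/* Example consumer that only counts what it was given. */
static unsigned int delivered_count;

static void count_hot_page(struct page *hot_page, int temperature)
{
	(void)hot_page;
	(void)temperature;
	delivered_count++;
}
```

A consumer that can only migrate a handful of pages per interval would
grant a small budget each time it wakes up, so a cheap source is never asked
to do the expensive "find lots of them" work. Whether dropped reports should
be counted, queued or simply ignored is left open here.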
> >> 2) Give me X coldish pages from a faster tier.
> >> 3) I expect to ask again in X seconds so please have some info ready for me!
> >> 4) (a path to get an idea of 'unhelpful moves' from earlier iterations - this
> >>     is bleeding the tiering application into a shared interface though).  
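
For reference, points 1) and 2) of the strawman could look roughly like the
following in "pull" form. This is a userspace sketch only: struct page_temp,
the tier enum and the function names are made up for illustration, and it
returns the first matches rather than the hottest or coldest ones, which
fits the "good enough, converge over time" assumption above.

```c
#include <assert.h>
#include <stddef.h>

enum mem_tier { TIER_FAST, TIER_SLOW };

/* Hypothetical per-page record; a real tracking layer would differ. */
struct page_temp {
	unsigned long pfn;
	enum mem_tier tier;
	int temperature;	/* arbitrary units, higher is hotter */
};

/* Copy up to max matching PFNs into out_pfns and return how many matched. */
static size_t select_pages(const struct page_temp *pages, size_t n,
			   enum mem_tier tier, int threshold, int want_hot,
			   unsigned long *out_pfns, size_t max)
{
	size_t found = 0;

	assert(out_pfns != NULL || max == 0);
	for (size_t i = 0; i < n && found < max; i++) {
		int match = want_hot ? pages[i].temperature > threshold
				     : pages[i].temperature < threshold;

		if (pages[i].tier == tier && match)
			out_pfns[found++] = pages[i].pfn;
	}
	return found;
}

/* 1) Up to max hotish pages from the slow tier (hotter than threshold). */
static size_t get_hotish_pages(const struct page_temp *pages, size_t n,
			       int threshold, unsigned long *out_pfns,
			       size_t max)
{
	return select_pages(pages, n, TIER_SLOW, threshold, 1, out_pfns, max);
}

/* 2) Up to max coldish pages from the fast tier (colder than threshold). */
static size_t get_coldish_pages(const struct page_temp *pages, size_t n,
				int threshold, unsigned long *out_pfns,
				size_t max)
{
	return select_pages(pages, n, TIER_FAST, threshold, 0, out_pfns, max);
}
```

Points 3) and 4) are not covered; they would need state kept between calls.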
> 
> In addition to getting a list of hot/cold pages, it's also useful to get
> hot/cold statistics of a memory device (NUMA node), e.g., something like
> below,
> 
> Access frequency        percent
>    > 1000 Hz            10%
>  600-1000 Hz            20%
>  200- 600 Hz            50%
>    1- 200 Hz            15%
>       < 1 Hz             5%
> 
> Compared with the hot/cold pages list, this may be obtained with lower
> overhead and can be useful for tuning the promotion/demotion algorithm.  At
> the same time, a sampled (incomplete) list of hot/cold pages may be
> available too.

I agree it's useful info and 'might' be cheaper to get.  Depends on the
tracking solution and impacts of sampling approaches.
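
As a rough illustration of that kind of per-node statistic, a userspace
sketch that bins sampled per-page access rates into the five bands from the
table might look like this. The band edges follow the table, but treating
each lower bound as inclusive is my assumption, as are all the names, and a
kernel version would want integer arithmetic rather than doubles.

```c
#include <assert.h>
#include <stddef.h>

enum { NR_BANDS = 5 };

/*
 * Lower bound of each band in Hz, hottest first.  Band i covers
 * [band_lower_hz[i], band_lower_hz[i - 1]); band 0 has no upper bound.
 */
static const double band_lower_hz[NR_BANDS] = {
	1000.0, 600.0, 200.0, 1.0, 0.0
};

/* Map an access rate to its band index (0 is hottest, 4 is coldest). */
static size_t band_for_rate(double hz)
{
	for (size_t i = 0; i < NR_BANDS - 1; i++)
		if (hz >= band_lower_hz[i])
			return i;
	return NR_BANDS - 1;
}

/* Fill percent[] with the share of sampled pages falling in each band. */
static void hotness_histogram(const double *rates_hz, size_t n,
			      double percent[NR_BANDS])
{
	size_t counts[NR_BANDS] = { 0 };

	assert(rates_hz != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		counts[band_for_rate(rates_hz[i])]++;
	for (size_t b = 0; b < NR_BANDS; b++)
		percent[b] = n ? 100.0 * counts[b] / n : 0.0;
}
```

With sampling, the percentages describe the sampled pages only, so how well
they reflect the whole node depends on the sampling approach.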

> 
> >> If we have multiple subsystems using the data we will need to resolve their
> >> conflicting demands to generate good enough data with appropriate overhead.
> >> 
> >> I'd also like a virtualized solution for the case of hardware PA trackers (what
> >> I have with CXL Hotness Monitoring Units) and the classic memory pool / stranding
> >> avoidance case where the VM is the right entity to make migration decisions.
> >> Making that interface convey what the kernel is going to use would be an
> >> efficient option. I'd like to hide how the sausage was made from the VM.  
> 
> ---
> Best Regards,
> Huang, Ying


