Re: [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot

public inbox for linux-mm@kvack.org

From: Donet Tom <donettom@linux.ibm.com>
To: Bharata B Rao <bharata@amd.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
	gourry@gourry.net, mgorman@techsingularity.net, mingo@redhat.com,
	peterz@infradead.org, raghavendra.kt@amd.com, riel@surriel.com,
	rientjes@google.com, sj@kernel.org, weixugc@google.com,
	willy@infradead.org, ying.huang@linux.alibaba.com,
	ziy@nvidia.com, dave@stgolabs.net, nifan.cxl@gmail.com,
	xuezhengchu@huawei.com, yiannis@zptcorp.com,
	akpm@linux-foundation.org, david@redhat.com, byungchul@sk.com,
	kinseyho@google.com, joshua.hahnjy@gmail.com, yuanchu@google.com,
	balbirs@nvidia.com, alok.rathore@samsung.com, shivankg@amd.com
Subject: Re: [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot
Date: Thu, 30 Apr 2026 12:36:06 +0530	[thread overview]
Message-ID: <32d03cb3-2199-444b-94de-ff34cf2d5315@linux.ibm.com> (raw)
In-Reply-To: <ce8bf715-5e76-4c73-9523-8e7309821404@amd.com>

Hi Bharata

On 4/27/26 10:54 AM, Bharata B Rao wrote:
> On 24-Apr-26 6:27 PM, Donet Tom wrote:
>>> +int pghot_record_access(unsigned long pfn, int nid, int src, unsigned long now)
>>> +{
>>> +    struct mem_section *ms;
>>> +    struct folio *folio;
>>> +    phi_t *phi, *hot_map;
>>> +    struct page *page;
>>> +
>>> +    if (!kmigrated_started)
>>> +        return 0;
>>> +
>>> +    if (!pghot_nid_valid(nid))
>>> +        return -EINVAL;
>>> +
>>> +    switch (src) {
>>> +    case PGHOT_HINTFAULTS:
>>> +        if (!static_branch_unlikely(&pghot_src_hintfaults))
>>> +            return 0;
>>> +        count_vm_event(PGHOT_RECORDED_HINTFAULTS);
>>> +        break;
>>> +    case PGHOT_HWHINTS:
>>> +        if (!static_branch_unlikely(&pghot_src_hwhints))
>>> +            return 0;
>>> +        count_vm_event(PGHOT_RECORDED_HWHINTS);
>>> +        break;
>>> +    default:
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /*
>>> +     * Record only accesses from lower tiers.
>>> +     */
>>> +    if (node_is_toptier(pfn_to_nid(pfn)))
>>> +        return 0;
>>
>> Just a thought—could we check this at the beginning of the function, before the
>> switch case?
> I am accumulating two stats here: how many hot-page notifications pghot
> received, attributed to each source,
>
> and
>
> out of those, how many turned out to be actionable.
>

Understood. Thanks for the clarification.
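To make the ordering concrete for myself, here is a small userspace model of the two counters (a sketch, not the patch code). The per-source counter is bumped for every notification, and the access counter only for lower-tier PFNs, which is why the top-tier check has to come after the switch. The enum, the counter arrays and record_access() are hypothetical stand-ins for the PGHOT_* vm events; the static-key checks and the folio validation are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PGHOT_HINTFAULTS / PGHOT_HWHINTS. */
enum pghot_src { SRC_HINTFAULTS, SRC_HWHINTS, NR_SRCS };

/* Modeled on PGHOT_RECORDED_HINTFAULTS / PGHOT_RECORDED_HWHINTS. */
static unsigned long recorded_by_src[NR_SRCS];
/* Modeled on PGHOT_RECORDED_ACCESSES: only lower-tier, actionable accesses. */
static unsigned long recorded_accesses;

static int record_access(enum pghot_src src, bool pfn_on_toptier)
{
	switch (src) {
	case SRC_HINTFAULTS:
	case SRC_HWHINTS:
		/* Attribute every notification to its source first. */
		recorded_by_src[src]++;
		break;
	default:
		return -EINVAL;
	}

	/* Top-tier PFNs are not promotion candidates: counted above, dropped here. */
	if (pfn_on_toptier)
		return 0;

	recorded_accesses++;

	/* Actionable accesses are always a subset of the notifications. */
	assert(recorded_accesses <=
	       recorded_by_src[SRC_HINTFAULTS] + recorded_by_src[SRC_HWHINTS]);
	return 0;
}
```

With this ordering, record_access(SRC_HINTFAULTS, true) bumps only the per-source counter, while record_access(SRC_HINTFAULTS, false) bumps both; moving the top-tier check above the switch would lose the first kind of sample.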

>
>>
>>> +
>>> +    /*
>>> +     * Reject the non-migratable pages right away.
>>> +     */
>>> +    page = pfn_to_online_page(pfn);
>>> +    if (!page || is_zone_device_page(page))
>>> +        return 0;
>>> +
>>> +    folio = page_folio(page);
>>> +    if (!folio_try_get(folio))
>>> +        return 0;
>>> +
>>> +    if (unlikely(page_folio(page) != folio))
>>> +        goto out;
>>> +
>>> +    if (!folio_test_lru(folio))
>>> +        goto out;
>>> +
>>> +    /* Get the hotness slot corresponding to the 1st PFN of the folio */
>>> +    pfn = folio_pfn(folio);
>>> +    ms = __pfn_to_section(pfn);
>>> +    if (!ms || !ms->hot_map)
>>> +        goto out;
>>> +
>>> +    hot_map = (phi_t *)(((unsigned long)(ms->hot_map)) & ~PGHOT_SECTION_HOT_MASK);
>>> +    phi = &hot_map[pfn % PAGES_PER_SECTION];
>>> +
>>> +    count_vm_event(PGHOT_RECORDED_ACCESSES);
> which is this ^
>
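Side note on the hot_map lookup quoted above: it masks ~PGHOT_SECTION_HOT_MASK off ms->hot_map before indexing by pfn % PAGES_PER_SECTION, which suggests the low bits of the pointer double as per-section flag bits. Below is a userspace sketch of that tagged-pointer pattern, assuming a single flag bit and a one-byte slot; struct section, SECTION_HOT_MASK, the width of phi_t and all helper names are hypothetical, not the patch's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical values chosen for illustration only. */
#define PAGES_PER_SECTION	32768UL
#define SECTION_HOT_MASK	0x1UL	/* assumed: low pointer bit used as a flag */

typedef uint8_t phi_t;			/* hypothetical per-PFN hotness slot */

struct section {
	phi_t *hot_map;			/* tagged pointer: array address | flag bits */
};

/* Strip the flag bits to recover the real array address. */
static phi_t *section_hot_map(const struct section *ms)
{
	return (phi_t *)((uintptr_t)ms->hot_map & ~SECTION_HOT_MASK);
}

/* Slot for a PFN, indexed by the PFN's offset within its section. */
static phi_t *section_slot(const struct section *ms, unsigned long pfn)
{
	return &section_hot_map(ms)[pfn % PAGES_PER_SECTION];
}

static void section_mark_hot(struct section *ms)
{
	ms->hot_map = (phi_t *)((uintptr_t)ms->hot_map | SECTION_HOT_MASK);
}

static int section_is_hot(const struct section *ms)
{
	return ((uintptr_t)ms->hot_map & SECTION_HOT_MASK) != 0;
}

static int section_alloc_hot_map(struct section *ms)
{
	phi_t *map = calloc(PAGES_PER_SECTION, sizeof(*map));

	if (!map)
		return -1;
	/* The tag bits must be zero in the real address for masking to work. */
	assert(((uintptr_t)map & SECTION_HOT_MASK) == 0);
	ms->hot_map = map;
	return 0;
}
```

This only works because the allocator returns addresses aligned to at least the mask width; the assert in section_alloc_hot_map() documents that assumption.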
>>> +static void kmigrated_do_work(pg_data_t *pgdat)
>>> +{
>>> +    unsigned long section_nr, s_begin, start_pfn;
>>> +    struct mem_section *ms;
>>> +    int nid;
>>> +
>>> +    clear_bit(PGDAT_KMIGRATED_ACTIVATE, &pgdat->flags);
>>> +    s_begin = next_present_section_nr(-1);
>>> +    for_each_present_section_nr(s_begin, section_nr) {
>>> +        start_pfn = section_nr_to_pfn(section_nr);
>>
>> I may be missing something, but in pghot_setup_hot_map() and kmigrated_do_work()
>> we seem to iterate over all memory sections. On large-memory systems, could this
>> become a bottleneck?
>>
>> Since hot_map is allocated only for lower-tier memory and the hotness
>> information is primarily used there, would it make sense to skip scanning
>> higher-tier sections?
>>
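>> for_each_online_node(nid) {
>> 	if (node_is_toptier(nid))
>> 		continue;
>>
>> 	start_pfn = node_start_pfn(nid);
>> 	end_pfn = node_end_pfn(nid);
>>
>> 	s_begin = pfn_to_section_nr(start_pfn);
>> 	for_each_present_section_nr(s_begin, section_nr) {
>> 		/* Stop once we walk past the end of this node. */
>> 		if (section_nr_to_pfn(section_nr) >= end_pfn)
>> 			break;
>> 		/* ... process section_nr ... */
>> 	}
>> }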
>>
>> Would this approach be reasonable, or am I overlooking something?
> I haven't optimized the walk yet. Since there is one kmigrated thread per
> lower-tier node, this routine already knows which node to scan, so we can
> limit the section walk to that node. Something like this:
>
> static void kmigrated_do_work(pg_data_t *pgdat)
> {
> 	unsigned long section_nr, pfn, start_pfn, end_pfn;
> 	struct mem_section *ms;
> 	int nid = pgdat->node_id;
>
> 	/* node_end_pfn() is already an absolute PFN; align it directly. */
> 	start_pfn = SECTION_ALIGN_DOWN(node_start_pfn(nid));
> 	end_pfn = SECTION_ALIGN_UP(node_end_pfn(nid));
>
> 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> 		section_nr = pfn_to_section_nr(pfn);
>
> 		if (!present_section_nr(section_nr))
> 			continue;
>
> 		ms = __nr_to_section(section_nr);
>
> 		...
>
> 		kmigrated_walk_zone(pfn, pfn + PAGES_PER_SECTION, nid);
> 	}
> }


Thanks. This looks good to me.
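To convince myself about the bounds, a userspace sketch of walking only the sections one node spans, with hypothetical section geometry (2^15 pages per section, as with 128 MiB sections and 4 KiB pages) and a made-up presence map in place of present_section_nr(). One detail worth keeping in mind: node_end_pfn() already returns an absolute, exclusive PFN, so it is aligned up on its own rather than added to the start PFN.

```c
#include <assert.h>

/* Hypothetical geometry: 128 MiB sections with 4 KiB pages. */
#define PAGES_PER_SECTION	(1UL << 15)
#define SECTION_ALIGN_DOWN(pfn)	((pfn) & ~(PAGES_PER_SECTION - 1))
#define SECTION_ALIGN_UP(pfn)	SECTION_ALIGN_DOWN((pfn) + PAGES_PER_SECTION - 1)
#define NR_SECTIONS		64

/* Hypothetical presence map standing in for present_section_nr(). */
static unsigned char section_present[NR_SECTIONS];
/* Counts calls to the per-section worker, for illustration. */
static unsigned long sections_visited;

static void walk_section(unsigned long start_pfn, unsigned long end_pfn)
{
	(void)start_pfn;
	(void)end_pfn;
	sections_visited++;
}

/*
 * Visit only the present sections spanned by [node_start_pfn, node_end_pfn).
 * node_end_pfn is an absolute, exclusive PFN, so it is aligned up directly.
 */
static void walk_node_sections(unsigned long node_start_pfn,
			       unsigned long node_end_pfn)
{
	unsigned long start = SECTION_ALIGN_DOWN(node_start_pfn);
	unsigned long end = SECTION_ALIGN_UP(node_end_pfn);
	unsigned long pfn;

	assert(node_start_pfn <= node_end_pfn);

	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
		unsigned long nr = pfn / PAGES_PER_SECTION;

		if (nr >= NR_SECTIONS || !section_present[nr])
			continue;
		walk_section(pfn, pfn + PAGES_PER_SECTION);
	}
}
```

For a node that starts 100 pages into section 10 and ends just past the start of section 12, with section 11 absent, the walk visits sections 10 and 12 only, instead of every present section in the system.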


>
>>> +static int pghot_online_sec_hotmap(unsigned long start_pfn,
>>> +                   unsigned long nr_pages)
>>> +{
>>> +    int nid = pfn_to_nid(start_pfn);
>>> +    unsigned long start, end, pfn;
>>> +    struct mem_section *ms;
>>> +    int fail = 0;
>>> +
>>> +    start = SECTION_ALIGN_DOWN(start_pfn);
>>> +    end = SECTION_ALIGN_UP(start_pfn + nr_pages);
>>> +
>>> +    for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
>>> +        ms = __pfn_to_section(pfn);
>>> +        if (!ms || ms->hot_map)
>>> +            continue;
>>> +
>>> +        fail = pghot_alloc_hot_map(ms, nid);
>> I may be missing something, but after pghot_alloc_hot_map fails, we continue the
>> loop. Would it make sense to break and go to the cleanup logic instead?
> The for-loop condition includes a !fail check, so the loop exits as soon as an allocation fails.

My bad, I missed it.
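For completeness, a userspace sketch of the loop shape described above: the !fail term in the loop condition stops the walk at the first failed allocation, and the error path then unwinds only the maps this call allocated, leaving sections that were already set up untouched. The arrays, the fail_at fault injection and the helper names are hypothetical; the patch's actual cleanup path is not shown in this thread.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SECS 8

/* Hypothetical stand-ins for each mem_section's hot_map. */
static void *hot_maps[NR_SECS];
/* Hypothetical fault injection: index of the section whose allocation fails. */
static int fail_at = -1;

static int alloc_hot_map(int sec)
{
	if (sec == fail_at)
		return -1;
	hot_maps[sec] = malloc(64);	/* hypothetical map size */
	return hot_maps[sec] ? 0 : -1;
}

static int online_hot_maps(int start, int end)
{
	unsigned char allocated_here[NR_SECS] = { 0 };
	int sec, fail = 0;

	assert(start >= 0 && end <= NR_SECS);

	/* !fail in the condition ends the loop right after a failed allocation. */
	for (sec = start; !fail && sec < end; sec++) {
		if (hot_maps[sec])
			continue;	/* already set up by an earlier online */
		fail = alloc_hot_map(sec);
		if (!fail)
			allocated_here[sec] = 1;
	}
	if (!fail)
		return 0;

	/* Error path: release only the maps this call allocated. */
	for (sec = start; sec < end; sec++) {
		if (allocated_here[sec]) {
			free(hot_maps[sec]);
			hot_maps[sec] = NULL;
		}
	}
	return fail;
}
```

Tracking what this call allocated matters because the loop skips sections that already have a map; freeing everything in the range on failure would tear down maps owned by an earlier online operation.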

-Donet

> Regards,
> Bharata.
>


Thread overview: 24+ messages
2026-03-23  9:50 [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 1/5] mm: migrate: Allow misplaced migration without VMA Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 2/5] mm: migrate: Add migrate_misplaced_folios_batch() Bharata B Rao
2026-03-26  5:50   ` Bharata B Rao
2026-04-21 15:25   ` Donet Tom
2026-04-21 16:05     ` Gregory Price
2026-04-22  3:26       ` Bharata B Rao
2026-04-22  3:37         ` Gregory Price
2026-04-22  4:04           ` Donet Tom
2026-04-22  4:15             ` Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot Bharata B Rao
2026-04-24 12:57   ` Donet Tom
2026-04-24 13:21     ` Gregory Price
2026-04-24 15:40       ` Donet Tom
2026-04-27  5:24     ` Bharata B Rao
2026-04-30  7:06       ` Donet Tom [this message]
2026-03-23  9:51 ` [RFC PATCH v6 4/5] mm: pghot: Precision mode for pghot Bharata B Rao
2026-03-26 10:41   ` Bharata B Rao
2026-03-23  9:51 ` [RFC PATCH v6 5/5] mm: sched: move NUMA balancing tiering promotion to pghot Bharata B Rao
2026-03-30  4:46   ` Bharata B Rao
2026-03-23  9:56 ` [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2026-03-23  9:58 ` Bharata B Rao
2026-03-23  9:59 ` Bharata B Rao
2026-03-23 10:01 ` Bharata B Rao
