From: Donet Tom <donettom@linux.ibm.com>
To: Bharata B Rao <bharata@amd.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
gourry@gourry.net, mgorman@techsingularity.net, mingo@redhat.com,
peterz@infradead.org, raghavendra.kt@amd.com, riel@surriel.com,
rientjes@google.com, sj@kernel.org, weixugc@google.com,
willy@infradead.org, ying.huang@linux.alibaba.com,
ziy@nvidia.com, dave@stgolabs.net, nifan.cxl@gmail.com,
xuezhengchu@huawei.com, yiannis@zptcorp.com,
akpm@linux-foundation.org, david@redhat.com, byungchul@sk.com,
kinseyho@google.com, joshua.hahnjy@gmail.com, yuanchu@google.com,
balbirs@nvidia.com, alok.rathore@samsung.com, shivankg@amd.com
Subject: Re: [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot
Date: Thu, 30 Apr 2026 12:36:06 +0530
Message-ID: <32d03cb3-2199-444b-94de-ff34cf2d5315@linux.ibm.com>
In-Reply-To: <ce8bf715-5e76-4c73-9523-8e7309821404@amd.com>
Hi Bharata
On 4/27/26 10:54 AM, Bharata B Rao wrote:
> On 24-Apr-26 6:27 PM, Donet Tom wrote:
>>> +int pghot_record_access(unsigned long pfn, int nid, int src, unsigned long now)
>>> +{
>>> + struct mem_section *ms;
>>> + struct folio *folio;
>>> + phi_t *phi, *hot_map;
>>> + struct page *page;
>>> +
>>> + if (!kmigrated_started)
>>> + return 0;
>>> +
>>> + if (!pghot_nid_valid(nid))
>>> + return -EINVAL;
>>> +
>>> + switch (src) {
>>> + case PGHOT_HINTFAULTS:
>>> + if (!static_branch_unlikely(&pghot_src_hintfaults))
>>> + return 0;
>>> + count_vm_event(PGHOT_RECORDED_HINTFAULTS);
>>> + break;
>>> + case PGHOT_HWHINTS:
>>> + if (!static_branch_unlikely(&pghot_src_hwhints))
>>> + return 0;
>>> + count_vm_event(PGHOT_RECORDED_HWHINTS);
>>> + break;
>>> + default:
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /*
>>> + * Record only accesses from lower tiers.
>>> + */
>>> + if (node_is_toptier(pfn_to_nid(pfn)))
>>> + return 0;
>>
>> Just a thought—could we check this at the beginning of the function, before the
>> switch case?
> I am accumulating two stats here: how many hot page reports pghot received
> from each source, and, out of those, how many turned out to be actionable.
>
Understood. Thanks for the clarification.
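
To check my understanding, here is a minimal userspace sketch of that two-level accounting. All sketch_* names are hypothetical and not from the patch; the two boolean arguments stand in for the top-tier and migratability checks in the quoted function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-source and "actionable" vm events. */
enum sketch_src { SKETCH_HINTFAULTS, SKETCH_HWHINTS, SKETCH_NR_SRC };

struct sketch_stats {
	unsigned long recorded_by_src[SKETCH_NR_SRC]; /* every report, per source */
	unsigned long recorded_accesses;              /* reports that passed the filters */
};

/*
 * Same ordering as the quoted pghot_record_access(): count the source
 * first, then filter, then count the access that is actually recorded.
 * Moving the filter above the per-source count would lose the first stat.
 */
static int sketch_record_access(struct sketch_stats *st, int src,
				bool is_toptier, bool migratable)
{
	if (src < 0 || src >= SKETCH_NR_SRC)
		return -1;

	st->recorded_by_src[src]++;

	/* Assumed filters: top-tier or non-migratable pages are ignored. */
	if (is_toptier || !migratable)
		return 0;

	st->recorded_accesses++;
	return 0;
}
```

With this ordering, the gap between the per-source totals and recorded_accesses shows how many reports were filtered out.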
>
>>
>>> +
>>> + /*
>>> + * Reject the non-migratable pages right away.
>>> + */
>>> + page = pfn_to_online_page(pfn);
>>> + if (!page || is_zone_device_page(page))
>>> + return 0;
>>> +
>>> + folio = page_folio(page);
>>> + if (!folio_try_get(folio))
>>> + return 0;
>>> +
>>> + if (unlikely(page_folio(page) != folio))
>>> + goto out;
>>> +
>>> + if (!folio_test_lru(folio))
>>> + goto out;
>>> +
>>> + /* Get the hotness slot corresponding to the 1st PFN of the folio */
>>> + pfn = folio_pfn(folio);
>>> + ms = __pfn_to_section(pfn);
>>> + if (!ms || !ms->hot_map)
>>> + goto out;
>>> +
>>> + hot_map = (phi_t *)(((unsigned long)(ms->hot_map)) &
>>> ~PGHOT_SECTION_HOT_MASK);
>>> + phi = &hot_map[pfn % PAGES_PER_SECTION];
>>> +
>>> + count_vm_event(PGHOT_RECORDED_ACCESSES);
> which is this ^
>
>>> +static void kmigrated_do_work(pg_data_t *pgdat)
>>> +{
>>> + unsigned long section_nr, s_begin, start_pfn;
>>> + struct mem_section *ms;
>>> + int nid;
>>> +
>>> + clear_bit(PGDAT_KMIGRATED_ACTIVATE, &pgdat->flags);
>>> + s_begin = next_present_section_nr(-1);
>>> + for_each_present_section_nr(s_begin, section_nr) {
>>> + start_pfn = section_nr_to_pfn(section_nr);
>>
>> I may be missing something, but in pghot_setup_hot_map() and kmigrated_do_work()
>> we seem to iterate over all memory sections. On large-memory systems, could
>> this become a bottleneck?
>>
>> Since hot_map is allocated only for lower-tier memory and the hotness
>> information is primarily used there, would it make sense to skip scanning
>> higher-tier sections?
>>
>> for_each_online_node(nid) {
>>         if (node_is_toptier(nid))
>>                 continue;
>>
>>         start_pfn = node_start_pfn(nid);
>>         end_pfn = node_end_pfn(nid);
>>
>>         s_begin = pfn_to_section_nr(start_pfn);
>>         for_each_present_section_nr(s_begin, section_nr) {
>>         }
>> }
>>
>> Would this approach be reasonable, or am I overlooking something?
> I haven't optimized the walk yet. Since there is one kmigrated thread per
> lower-tier node, this routine already knows which node to scan. We can limit
> the section walk to that node instead. Something like this:
>
> static void kmigrated_do_work(pg_data_t *pgdat)
> {
>         unsigned long section_nr, start_pfn, end_pfn, pfn;
>         struct mem_section *ms;
>         int nid = pgdat->node_id;
>
>         start_pfn = SECTION_ALIGN_DOWN(node_start_pfn(nid));
>         /* node_end_pfn() is already an absolute PFN, so don't add start_pfn */
>         end_pfn = SECTION_ALIGN_UP(node_end_pfn(nid));
>
>         for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>                 section_nr = pfn_to_section_nr(pfn);
>
>                 if (!present_section_nr(section_nr))
>                         continue;
>
>                 ms = __nr_to_section(section_nr);
>
>                 ...
>
>                 kmigrated_walk_zone(pfn, pfn + PAGES_PER_SECTION, nid);
>         }
> }
Thanks. This looks good to me.
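
For reference, a small userspace sketch of the node-limited section walk, assuming the end of the range is the section-aligned node_end_pfn(). The sketch_* names, the section size and the presence callback are all assumptions for illustration, not kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed section size: 128 MiB sections with 4 KiB pages. */
#define SKETCH_PAGES_PER_SECTION 32768UL
#define SKETCH_ALIGN_DOWN(pfn) ((pfn) & ~(SKETCH_PAGES_PER_SECTION - 1))
#define SKETCH_ALIGN_UP(pfn) \
	(((pfn) + SKETCH_PAGES_PER_SECTION - 1) & ~(SKETCH_PAGES_PER_SECTION - 1))

/* Hypothetical replacement for present_section_nr(). */
typedef bool (*sketch_present_fn)(unsigned long section_nr);

/* Example callback: pretend only even-numbered sections are present. */
static bool sketch_even_sections_present(unsigned long section_nr)
{
	return (section_nr % 2) == 0;
}

/*
 * Walk only the sections spanned by [node_start_pfn, node_end_pfn) and
 * return how many present sections were visited. A NULL callback means
 * every section is treated as present.
 */
static unsigned long sketch_walk_node(unsigned long node_start_pfn,
				      unsigned long node_end_pfn,
				      sketch_present_fn present)
{
	unsigned long start = SKETCH_ALIGN_DOWN(node_start_pfn);
	unsigned long end = SKETCH_ALIGN_UP(node_end_pfn);
	unsigned long pfn, visited = 0;

	for (pfn = start; pfn < end; pfn += SKETCH_PAGES_PER_SECTION) {
		unsigned long section_nr = pfn / SKETCH_PAGES_PER_SECTION;

		if (present != NULL && !present(section_nr))
			continue;

		visited++; /* the real code would walk the section here */
	}
	return visited;
}
```

The cost of the walk then scales with the span of the node rather than with all present sections in the system.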
>
>>> +static int pghot_online_sec_hotmap(unsigned long start_pfn,
>>> + unsigned long nr_pages)
>>> +{
>>> + int nid = pfn_to_nid(start_pfn);
>>> + unsigned long start, end, pfn;
>>> + struct mem_section *ms;
>>> + int fail = 0;
>>> +
>>> + start = SECTION_ALIGN_DOWN(start_pfn);
>>> + end = SECTION_ALIGN_UP(start_pfn + nr_pages);
>>> +
>>> + for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
>>> + ms = __pfn_to_section(pfn);
>>> + if (!ms || ms->hot_map)
>>> + continue;
>>> +
>>> + fail = pghot_alloc_hot_map(ms, nid);
>> I may be missing something, but after pghot_alloc_hot_map fails, we continue the
>> loop. Would it make sense to break and go to the cleanup logic instead?
> There is a !fail check in the for-loop due to which we break when alloc fails.
My bad, I missed it.
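
For anyone else reading along, a tiny userspace sketch of the pattern: because !fail is part of the loop condition, the loop exits right after the first failed allocation. The sketch_* names and the fake allocator are hypothetical.

```c
/* Hypothetical allocator: fails (returns -1) for section index fail_at. */
static int sketch_alloc(unsigned long idx, unsigned long fail_at)
{
	return idx == fail_at ? -1 : 0;
}

/*
 * Same loop shape as the quoted pghot_online_sec_hotmap(): the !fail
 * test in the condition stops the loop after the first failure.
 * *attempts reports how many allocations were tried.
 */
static int sketch_online_sections(unsigned long nr_sections,
				  unsigned long fail_at,
				  unsigned long *attempts)
{
	unsigned long i;
	int fail = 0;

	*attempts = 0;
	for (i = 0; !fail && i < nr_sections; i++) {
		(*attempts)++;
		fail = sketch_alloc(i, fail_at);
	}
	return fail; /* the caller would run its cleanup when this is non-zero */
}
```

With 8 sections and a failure at index 3, only 4 allocations are attempted before the loop stops.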
-Donet
> Regards,
> Bharata.
>