From: Bharata B Rao <bharata@amd.com>
To: Donet Tom <donettom@linux.ibm.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: <Jonathan.Cameron@huawei.com>, <dave.hansen@intel.com>,
<gourry@gourry.net>, <mgorman@techsingularity.net>,
<mingo@redhat.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>,
<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
<akpm@linux-foundation.org>, <david@redhat.com>,
<byungchul@sk.com>, <kinseyho@google.com>,
<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
<shivankg@amd.com>
Subject: Re: [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot
Date: Mon, 27 Apr 2026 10:54:55 +0530
Message-ID: <ce8bf715-5e76-4c73-9523-8e7309821404@amd.com>
In-Reply-To: <250e68f3-3664-4148-bfbf-52fd4230a3b9@linux.ibm.com>
On 24-Apr-26 6:27 PM, Donet Tom wrote:
>> +int pghot_record_access(unsigned long pfn, int nid, int src, unsigned long now)
>> +{
>> +	struct mem_section *ms;
>> +	struct folio *folio;
>> +	phi_t *phi, *hot_map;
>> +	struct page *page;
>> +
>> +	if (!kmigrated_started)
>> +		return 0;
>> +
>> +	if (!pghot_nid_valid(nid))
>> +		return -EINVAL;
>> +
>> +	switch (src) {
>> +	case PGHOT_HINTFAULTS:
>> +		if (!static_branch_unlikely(&pghot_src_hintfaults))
>> +			return 0;
>> +		count_vm_event(PGHOT_RECORDED_HINTFAULTS);
>> +		break;
>> +	case PGHOT_HWHINTS:
>> +		if (!static_branch_unlikely(&pghot_src_hwhints))
>> +			return 0;
>> +		count_vm_event(PGHOT_RECORDED_HWHINTS);
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * Record only accesses from lower tiers.
>> +	 */
>> +	if (node_is_toptier(pfn_to_nid(pfn)))
>> +		return 0;
>
>
> Just a thought—could we check this at the beginning of the function, before the
> switch case?
I am accumulating two stats here: how many hot page intimations pghot received
from each source and, out of those, how many turned out to be actionable.
The per-source counters therefore have to be bumped before the toptier check.
>
>
>> +
>> +	/*
>> +	 * Reject the non-migratable pages right away.
>> +	 */
>> +	page = pfn_to_online_page(pfn);
>> +	if (!page || is_zone_device_page(page))
>> +		return 0;
>> +
>> +	folio = page_folio(page);
>> +	if (!folio_try_get(folio))
>> +		return 0;
>> +
>> +	if (unlikely(page_folio(page) != folio))
>> +		goto out;
>> +
>> +	if (!folio_test_lru(folio))
>> +		goto out;
>> +
>> +	/* Get the hotness slot corresponding to the 1st PFN of the folio */
>> +	pfn = folio_pfn(folio);
>> +	ms = __pfn_to_section(pfn);
>> +	if (!ms || !ms->hot_map)
>> +		goto out;
>> +
>> +	hot_map = (phi_t *)(((unsigned long)(ms->hot_map)) & ~PGHOT_SECTION_HOT_MASK);
>> +	phi = &hot_map[pfn % PAGES_PER_SECTION];
>> +
>> +	count_vm_event(PGHOT_RECORDED_ACCESSES);
The actionable count is this one ^
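
To make the ordering concrete, here is a minimal userspace sketch of the two
levels of accounting. It is only a model: struct pghot_stats, the SRC_*
constants and record_access() are hypothetical stand-ins for the vmstat
counters and filters in the patch, and the two boolean arguments collapse the
toptier and migratability checks into flags.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-source identifiers in the patch. */
enum { SRC_HINTFAULTS, SRC_HWHINTS, NR_SRC };

struct pghot_stats {
	unsigned long recorded_by_src[NR_SRC];	/* every hint received, per source */
	unsigned long recorded_accesses;	/* hints that passed all filters */
};

/*
 * Model of the ordering in pghot_record_access(): the per-source counter
 * is bumped before any filtering, the "actionable" counter only after.
 * Returns true when the hint was actionable.
 */
static bool record_access(struct pghot_stats *st, int src,
			  bool on_toptier, bool migratable)
{
	if (src < 0 || src >= NR_SRC)
		return false;

	st->recorded_by_src[src]++;

	/* Filters: top-tier pages and non-migratable pages are dropped. */
	if (on_toptier || !migratable)
		return false;

	st->recorded_accesses++;
	return true;
}
```

In this model, hoisting the on_toptier test above the per-source increment
would make recorded_by_src[] undercount the hints each source delivered.
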
>> +static void kmigrated_do_work(pg_data_t *pgdat)
>> +{
>> +	unsigned long section_nr, s_begin, start_pfn;
>> +	struct mem_section *ms;
>> +	int nid;
>> +
>> +	clear_bit(PGDAT_KMIGRATED_ACTIVATE, &pgdat->flags);
>> +	s_begin = next_present_section_nr(-1);
>> +	for_each_present_section_nr(s_begin, section_nr) {
>> +		start_pfn = section_nr_to_pfn(section_nr);
>
>
> I may be missing something, but in pghot_setup_hot_map() and kmigrated_do_work()
> we seem to iterate over all memory sections. On large memory systems, could this
> become a bottleneck?
>
> Since hot_map is allocated only for lower-tier memory and the hotness
> information is primarily used there, would it make sense to skip scanning
> higher-tier sections?
>
> for_each_online_node(nid) {
> 	if (node_is_toptier(nid))
> 		continue;
>
> 	start_pfn = node_start_pfn(nid);
> 	end_pfn = node_end_pfn(nid);
>
> 	s_begin = pfn_to_section_nr(start_pfn);
> 	for_each_present_section_nr(s_begin, section_nr) {
> 	}
> }
>
> Would this approach be reasonable, or am I overlooking something?
I haven't optimized the walk yet. Since there is one kmigrated thread per
lower-tier node, this routine already knows which node to scan, so we can limit
the section walk to that node instead. Something like this:
static void kmigrated_do_work(pg_data_t *pgdat)
{
	unsigned long section_nr, pfn, start_pfn, end_pfn;
	struct mem_section *ms;
	int nid = pgdat->node_id;

	start_pfn = SECTION_ALIGN_DOWN(node_start_pfn(nid));
	/* node_end_pfn() is already an absolute PFN, so align it directly */
	end_pfn = SECTION_ALIGN_UP(node_end_pfn(nid));

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		section_nr = pfn_to_section_nr(pfn);
		if (!present_section_nr(section_nr))
			continue;

		ms = __nr_to_section(section_nr);
		...
		kmigrated_walk_zone(pfn, pfn + PAGES_PER_SECTION, nid);
	}
}
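
For illustration, a small userspace sketch of how such a per-node,
section-aligned walk could compute its bounds. It assumes node_end_pfn()
yields an absolute PFN, so the upper bound is aligned directly rather than
added to the start. The PAGES_PER_SECTION value below is only illustrative
(the kernel derives it per architecture), and node_section_starts() is a
hypothetical helper, not part of the patch.

```c
#include <stddef.h>

/* Illustrative value only; must be a power of two for the masks below. */
#define PAGES_PER_SECTION	32768UL
#define SECTION_ALIGN_DOWN(pfn)	((pfn) & ~(PAGES_PER_SECTION - 1))
#define SECTION_ALIGN_UP(pfn)	\
	(((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))

/*
 * Collect the first PFN of every section overlapping [node_start, node_end).
 * Returns the number of such sections and stores at most max of them in out.
 */
static size_t node_section_starts(unsigned long node_start,
				  unsigned long node_end,
				  unsigned long *out, size_t max)
{
	unsigned long start = SECTION_ALIGN_DOWN(node_start);
	unsigned long end = SECTION_ALIGN_UP(node_end);
	unsigned long pfn;
	size_t n = 0;

	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
		if (n < max)
			out[n] = pfn;
		n++;
	}
	return n;
}
```

With these values, a node spanning PFNs [40000, 100000) touches the three
sections starting at 32768, 65536 and 98304, so only those are visited
instead of every present section in the system.
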
>> +static int pghot_online_sec_hotmap(unsigned long start_pfn,
>> +				   unsigned long nr_pages)
>> +{
>> +	int nid = pfn_to_nid(start_pfn);
>> +	unsigned long start, end, pfn;
>> +	struct mem_section *ms;
>> +	int fail = 0;
>> +
>> +	start = SECTION_ALIGN_DOWN(start_pfn);
>> +	end = SECTION_ALIGN_UP(start_pfn + nr_pages);
>> +
>> +	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
>> +		ms = __pfn_to_section(pfn);
>> +		if (!ms || ms->hot_map)
>> +			continue;
>> +
>> +		fail = pghot_alloc_hot_map(ms, nid);
>
> I may be missing something, but after pghot_alloc_hot_map fails, we continue the
> loop. Would it make sense to break and go to the cleanup logic instead?
There is a !fail check in the for-loop condition, so the loop exits as soon as
the allocation fails.
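
A tiny userspace sketch of the same loop shape, in case it helps. Both
alloc_slot() and alloc_all() are hypothetical, and -1 merely stands in for
whatever error the real allocation would return.

```c
/*
 * Hypothetical allocator stub: fails (returns -1) for slot fail_at,
 * succeeds otherwise, and counts how many times it was called.
 */
static int alloc_slot(int idx, int fail_at, int *calls)
{
	(*calls)++;
	return idx == fail_at ? -1 : 0;
}

/* Same shape as the loop above: "!fail" is part of the loop condition. */
static int alloc_all(int nr, int fail_at, int *calls)
{
	int fail = 0;
	int i;

	for (i = 0; !fail && i < nr; i++)
		fail = alloc_slot(i, fail_at, calls);

	return fail;
}
```

With nr = 8 and fail_at = 2 the stub is called three times and the loop then
stops, since the condition is re-evaluated with fail set before the next
iteration.
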
Regards,
Bharata.