From: Wei Yang <richard.weiyang@gmail.com>
To: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>, Wei Yang <richard.weiyang@gmail.com>,
akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
wangkefeng.wang@huawei.com, linux-mm@kvack.org,
Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Sat, 13 Sep 2025 00:11:59 +0000 [thread overview]
Message-ID: <20250913001159.2oamn2veekw2f6li@master> (raw)
In-Reply-To: <aMRWuS-Ore7jgna3@fedora>
On Fri, Sep 12, 2025 at 10:22:01AM -0700, Vishal Moola (Oracle) wrote:
[...]
>> >>>>>>> ---
>> >>>>>>> mm/compaction.c | 2 +-
>> >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >>>>>>>
>> >>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>> >>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
>> >>>>>>> --- a/mm/compaction.c
>> >>>>>>> +++ b/mm/compaction.c
>> >>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >>>>>>> * Hugepage was successfully isolated and placed
>> >>>>>>> * on the cc->migratepages list.
>> >>>>>>> */
>> >>>>>>> - low_pfn += folio_nr_pages(folio) - 1;
>> >>>>>>> + low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
>> >>>>>>
>> >>>>>> One question is why we advance compound_nr() in original version.
>> >>>>>>
>> >>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
>> >>>>>> on the same large page and do the same thing and advance 1 again.
>> >>>>>>
>> >>>>>> Not sure which part story I missed.
>> >>>>>
>> >>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
>> >>>>> How likely the code hit in the middle of a hugetlb?
>> >>>>>
>> >>>>
>> >>>> OK, this is a kind of optimization based on the knowledge it is not likely to
>> >>>> be a tail page?
>> >>>
>> >>> No, it might be that most of the time page is the head, or people assume so.
>> >>
>> >> For compound pages, we will always have tail pfn < head pfn, so we should
>> >> always find the head page first.
>> >>
>> >
>> > I think you want to say tail pfn > head pfn?
>> >
>> >> If you did find a case where we somehow encounter a tail page here, I'd
>> >> love to see it. And then you'd also want to make sure the other compaction
>> >> trackers are appropriately accounted for.
>> >
>> > I may not follow you here, below is the call flow for
>> > isolate_migratepages_block() invoked during __alloc_contig_pages().
>> >
>> > __alloc_contig_pages(nr_pages, ..);
>> > start = ALIGN(zone->zone_start_pfn, nr_pages);
>> > alloc_contig_range_noprof(start, ..);
>> > __alloc_contig_migrate_range(.., start, ..);
>> > pfn = start;
>> > isolate_migratepages_range(.., pfn, ..);
>> > isolate_migratepages_block(.., pfn, ..);
>> > page = pfn_to_page(pfn);
>> > start += nr_pages;
>> >
>> > In the loop of __alloc_contig_pages(), it iterate on each nr_pages range. And
>> > nr_pages seems could be any positive number, so it looks the first pfn checked
>> > by isolate_migratepages_block() could be not aligned with page order or less
>> > than MAX_PAGE_ORDER. This mean it could be a tail page per my understanding.
>> >
>> > Maybe I missed some point here?
>
>That looks right, I missed the comment:
> /* Scan block by block. First and last block may be incomplete */
>
>> You are right.
>>
>> But nr_pages cannot be any positive number, since ALIGN only accepts
>> power of 2 as the alignment. So alloc_contig_pages() might need another
>> fix to handle the case nr_pages is not power of 2.
>>
>> Oh, after I checked pfn_range_valid_contig(), I find your example does not
>> apply, since it returns false when any page in the range is PageHuge.
>> This means with your example, PageHuge branch will never be executed.
>> But alloc_contig_range_noprof() is exported and can be used directly,
>> the @start input can be any pfn, which can be in the middle of PageHuge.
>> So your fix is still needed for this case.
>
>Makes sense to me.
Thanks :-)
--
Wei Yang
Help you, Help me
next prev parent reply other threads:[~2025-09-13 0:12 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 9:22 [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb Wei Yang
2025-09-11 1:25 ` Wei Yang
2025-09-11 1:35 ` Zi Yan
2025-09-11 1:38 ` Zi Yan
2025-09-11 1:50 ` Zi Yan
2025-09-11 3:34 ` Wei Yang
2025-09-11 6:30 ` Wei Yang
2025-09-11 16:28 ` Zi Yan
2025-09-12 0:28 ` Wei Yang
2025-09-12 1:13 ` Zi Yan
2025-09-12 6:00 ` Baolin Wang
2025-09-13 0:22 ` Zi Yan
2025-09-11 3:27 ` Wei Yang
2025-09-11 16:19 ` Zi Yan
2025-09-11 17:27 ` Vishal Moola (Oracle)
2025-09-12 1:07 ` Wei Yang
2025-09-12 1:29 ` Zi Yan
2025-09-12 17:22 ` Vishal Moola (Oracle)
2025-09-13 0:11 ` Wei Yang [this message]
2025-09-13 0:10 ` Wei Yang
2025-09-23 2:35 ` Andrew Morton
2025-09-23 2:41 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250913001159.2oamn2veekw2f6li@master \
--to=richard.weiyang@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=vishal.moola@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).