All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>, Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	wangkefeng.wang@huawei.com, linux-mm@kvack.org,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Fri, 12 Sep 2025 14:00:53 +0800	[thread overview]
Message-ID: <2e2dd208-c3f2-4d44-835e-003af2f019bd@linux.alibaba.com> (raw)
In-Reply-To: <0D6DB0F3-A893-4B60-9939-FF4575FEE3CB@nvidia.com>



On 2025/9/12 09:13, Zi Yan wrote:
> On 11 Sep 2025, at 20:28, Wei Yang wrote:
> 
>> On Thu, Sep 11, 2025 at 12:28:24PM -0400, Zi Yan wrote:
>>> On 11 Sep 2025, at 2:30, Wei Yang wrote:
>>>
>>>> On Thu, Sep 11, 2025 at 03:34:13AM +0000, Wei Yang wrote:
>>>>> On Wed, Sep 10, 2025 at 09:50:09PM -0400, Zi Yan wrote:
>>>>>> On 10 Sep 2025, at 21:38, Zi Yan wrote:
>>>>>>
>>>>>>> On 10 Sep 2025, at 21:35, Zi Yan wrote:
>>>>>>>
>>>>>>>> On 10 Sep 2025, at 21:25, Wei Yang wrote:
>>>>>>>>
>>>>>>>>> On Wed, Sep 10, 2025 at 09:22:40AM +0000, Wei Yang wrote:
>>>>>>>>>> Commit 56ae0bb349b4 ("mm: compaction: convert to use a folio in
>>>>>>>>>> isolate_migratepages_block()") converts api from page to folio. But the
>>>>>>>>>> low_pfn advance for hugetlb page seems wrong when low_pfn doesn't point
>>>>>>>>>> to head page.
>>>>>>>>>>
>>>>>>>>>> Originally, if page is a hugetlb tail page, compound_nr() return 1,
>>>>>>>>>> which means low_pfn only advance one in next iteration. After the
>>>>>>>>>> change, low_pfn would advance more than the hugetlb range, since
>>>>>>>>>> folio_nr_pages() always return total number of the large page. This
>>>>>>>>>> results in skipping some range to isolate and then to migrate.
>>>>>>>>>>
>>>>>>>>>> The worst case for alloc_contig is it does all the isolation and
>>>>>>>>>> migration, but finally find some range is still not isolated. And then
>>>>>>>>>> undo all the work and try a new range.
>>>>>>>>>>
>>>>>>>>>> Advance low_pfn to the end of hugetlb.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>>>>>>> Fixes: 56ae0bb349b4 ("mm: compaction: convert to use a folio in isolate_migratepages_block()")
>>>>>>
>>>>>> This behavior seems to be introduced by commit 369fa227c219 ("mm: make
>>>>>> alloc_contig_range handle free hugetlb pages”). The related change is:
>>>>>>
>>>>>> +               if (PageHuge(page) && cc->alloc_contig) {
>>>>>> +                       ret = isolate_or_dissolve_huge_page(page);
>>>>>> +
>>>>>> +                       /*
>>>>>> +                        * Fail isolation in case isolate_or_dissolve_huge_page()
>>>>>> +                        * reports an error. In case of -ENOMEM, abort right away.
>>>>>> +                        */
>>>>>> +                       if (ret < 0) {
>>>>>> +                                /* Do not report -EBUSY down the chain */
>>>>>> +                               if (ret == -EBUSY)
>>>>>> +                                       ret = 0;
>>>>>> +                               low_pfn += (1UL << compound_order(page)) - 1;
>>>>>
>>>>> The compound_order(page) return 1 for a tail page.
>>>>>
>>>>> See below.
>>>>>
>>>>>> +                               goto isolate_fail;
>>>>>> +                       }
>>>>>> +
>>>>>> +                       /*
>>>>>> +                        * Ok, the hugepage was dissolved. Now these pages are
>>>>>> +                        * Buddy and cannot be re-allocated because they are
>>>>>> +                        * isolated. Fall-through as the check below handles
>>>>>> +                        * Buddy pages.
>>>>>> +                        */
>>>>>> +               }
>>>>>> +
>>>>>>
>>>>>>>>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>
>>>>>>>>> Forgot to cc stable.
>>>>>>>>>
>>>>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>>>
>>>>>>>> Is there any bug report to justify the backport? Since it is more likely
>>>>>>>> to be a performance issue instead of a correctness issue.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> mm/compaction.c | 2 +-
>>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
>>>>>>>>>> --- a/mm/compaction.c
>>>>>>>>>> +++ b/mm/compaction.c
>>>>>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>>>>>> 				 * Hugepage was successfully isolated and placed
>>>>>>>>>> 				 * on the cc->migratepages list.
>>>>>>>>>> 				 */
>>>>>>>>>> -				low_pfn += folio_nr_pages(folio) - 1;
>>>>>>>>>> +				low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
>>>>>>>>>
>>>>>>>>> One question is why we advance compound_nr() in original version.
>>>>>>>>>
>>>>>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
>>>>>>>>> on the same large page and do the same thing and advance 1 again.
>>>>>>>>>
>>>>>>>>> Not sure which part story I missed.
>>>>>>>>
>>>>>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
>>>>>>>> How likely the code hit in the middle of a hugetlb?
>>>>>>>>
>>>>>>>
>>>>>>> In addition, there are two other “low_pfn += (1UL << order) - 1”
>>>>>>> in the if (PageHuge(page)), why not change them too if you think
>>>>>>> page can point to the middle of a hugetlb?
>>>>>>>
>>>>>
>>>>> The order here is get from compound_order(page), which is 1 for a tail page.
>>>>>
>>>>> So it looks right. Maybe I misunderstand it?
>>>>
>>>> Oops, compound_order(page) returns 0 for tail page.
>>>>
>>>> What I want to say is low_pfn advance by 1 for tail page. Sorry for the
>>>> misleading.
>>>
>>> OK, that sounds inefficient and inconsistent with your fix.
>>>
>>> While at it, can you also change two “low_pfn += (1UL << order) - 1” to skip
>>> the rest of hugetlb?
>>>
>>
>> Sure, glad to.
>>
>> You prefer do the fix in one patch or have a separate one?
> 
> A separate one is better, since these are optimizations, whereas your current
> patch is a fix.

I'm afraid we should not do that. Because the order getting from 
compound_order(page) is not stable without lock protection for a 
hugetlb, This is why we add a sanity check 'if (order <= 
MAX_PAGE_ORDER)' before advancing low_pfn.


  reply	other threads:[~2025-09-12  6:01 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10  9:22 [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb Wei Yang
2025-09-11  1:25 ` Wei Yang
2025-09-11  1:35   ` Zi Yan
2025-09-11  1:38     ` Zi Yan
2025-09-11  1:50       ` Zi Yan
2025-09-11  3:34         ` Wei Yang
2025-09-11  6:30           ` Wei Yang
2025-09-11 16:28             ` Zi Yan
2025-09-12  0:28               ` Wei Yang
2025-09-12  1:13                 ` Zi Yan
2025-09-12  6:00                   ` Baolin Wang [this message]
2025-09-13  0:22                     ` Zi Yan
2025-09-11  3:27     ` Wei Yang
2025-09-11 16:19       ` Zi Yan
2025-09-11 17:27         ` Vishal Moola (Oracle)
2025-09-12  1:07           ` Wei Yang
2025-09-12  1:29             ` Zi Yan
2025-09-12 17:22               ` Vishal Moola (Oracle)
2025-09-13  0:11                 ` Wei Yang
2025-09-13  0:10               ` Wei Yang
2025-09-23  2:35                 ` Andrew Morton
2025-09-23  2:41                   ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2e2dd208-c3f2-4d44-835e-003af2f019bd@linux.alibaba.com \
    --to=baolin.wang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=richard.weiyang@gmail.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.