All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muchun Song <muchun.song@linux.dev>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	David Rientjes <rientjes@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Barry Song <21cnbao@gmail.com>, Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
Date: Thu, 21 Sep 2023 09:42:10 +0800	[thread overview]
Message-ID: <98613579-0644-4532-8DCC-D4BA7183AB15@linux.dev> (raw)
In-Reply-To: <257b5833-5aaf-4748-a576-7610bd36e632@oracle.com>



> On Sep 20, 2023, at 18:39, Joao Martins <joao.m.martins@oracle.com> wrote:
> 
> On 20/09/2023 03:47, Muchun Song wrote:
>>> On Sep 19, 2023, at 23:09, Joao Martins <joao.m.martins@oracle.com> wrote:
>>> On 19/09/2023 09:57, Muchun Song wrote:
>>>>> On Sep 19, 2023, at 16:55, Joao Martins <joao.m.martins@oracle.com> wrote:
>>>>> On 19/09/2023 09:41, Muchun Song wrote:
>>>>>>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@oracle.com> wrote:
>>>>>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>>>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>>>>>>   list_for_each_entry(folio, folio_list, lru) {
>>>>>>>>>       int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>>>>>                               &vmemmap_pages);
>>>>>>>> 
>>>>>>>> This is unlikely to be failed since the page table allocation
>>>>>>>> is moved to the above 
>>>>>>> 
>>>>>>>> (Note that the head vmemmap page allocation
>>>>>>>> is not mandatory). 
>>>>>>> 
>>>>>>> Good point that I almost forgot
>>>>>>> 
>>>>>>>> So we should handle the error case in the above
>>>>>>>> splitting operation.
>>>>>>> 
>>>>>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>>>>>> got split, and say could allow some PTE remapping to occur and free some pages
>>>>>>> back (each page allows 6 more splits worst case). Then the next
>>>>>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>>>>>> hugepages that failed the batch PMD split (as we only defer the PTE remap tlb
>>>>>>> flush in this stage).
>>>>>> 
>>>>>> Oh, yes. Maybe we could break the above traversal as early as possible
>>>>>> once we enter an ENOMEM?
>>>>>> 
>>>>> 
>>>>> Sounds good -- no point in keep trying to split if we are failing with OOM.
>>>>> 
>>>>> Perhaps a comment in both of these clauses (the early break on split and the OOM
>>>>> handling in batch optimize) could help make this clear.
>>>> 
>>>> Make sense.
>>> 
>>> These are the changes I have so far for this patch based on the discussion so
>>> far. For next one it's at the end:
>> 
>> Code looks good to me. One nit below.
>> 
> Thanks
> 
>>> 
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index e8bc2f7567db..d9c6f2cf698c 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -27,7 +27,8 @@
>>> * @reuse_addr:                the virtual address of the @reuse_page page.
>>> * @vmemmap_pages:     the list head of the vmemmap pages that can be freed
>>> *                     or is mapped from.
>>> - * @flags:             used to modify behavior in bulk operations
>>> + * @flags:             used to modify behavior in vmemmap page table walking
>>> + *                     operations.
>>> */
>>> struct vmemmap_remap_walk {
>>>       void                    (*remap_pte)(pte_t *pte, unsigned long addr,
>>> @@ -36,6 +37,8 @@ struct vmemmap_remap_walk {
>>>       struct page             *reuse_page;
>>>       unsigned long           reuse_addr;
>>>       struct list_head        *vmemmap_pages;
>>> +
>>> +/* Skip the TLB flush when we split the PMD */
>>> #define VMEMMAP_SPLIT_NO_TLB_FLUSH     BIT(0)
>>>       unsigned long           flags;
>>> };
>>> @@ -132,7 +135,7 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>>>               int ret;
>>> 
>>>               ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
>>> -                               walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
>>> +                               !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
>>>               if (ret)
>>>                       return ret;
>>> 
>>> @@ -677,13 +680,13 @@ void hugetlb_vmemmap_optimize(const struct hstate *h,
>>> struct page *head)
>>>       free_vmemmap_page_list(&vmemmap_pages);
>>> }
>>> 
>>> -static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>> +static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>> {
>>>       unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>>>       unsigned long vmemmap_reuse;
>>> 
>>>       if (!vmemmap_should_optimize(h, head))
>>> -               return;
>>> +               return 0;
>>> 
>>>       vmemmap_end     = vmemmap_start + hugetlb_vmemmap_size(h);
>>>       vmemmap_reuse   = vmemmap_start;
>>> @@ -693,7 +696,7 @@ static void hugetlb_vmemmap_split(const struct hstate *h,
>>> struct page *head)
>>>        * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
>>>        * @vmemmap_end]
>>>        */
>>> -       vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>>> +       return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>>> }
>>> 
>>> void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head
>>> *folio_list)
>>> @@ -701,8 +704,18 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h,
>>> struct list_head *folio_l
>>>       struct folio *folio;
>>>       LIST_HEAD(vmemmap_pages);
>>> 
>>> -       list_for_each_entry(folio, folio_list, lru)
>>> -               hugetlb_vmemmap_split(h, &folio->page);
>>> +       list_for_each_entry(folio, folio_list, lru) {
>>> +               int ret = hugetlb_vmemmap_split(h, &folio->page);
>>> +
>>> +               /*
>>> +                * Spliting the PMD requires allocating a page, thus lets fail
>>                      ^^^^                                 ^^^
>>                    Splitting                           page table page
>> 
>> I'd like to specify the functionality of the allocated page.
>> 
> OK
> 
>>> +                * early once we encounter the first OOM. No point in retrying
>>> +                * as it can be dynamically done on remap with the memory
>>> +                * we get back from the vmemmap deduplication.
>>> +                */
>>> +               if (ret == -ENOMEM)
>>> +                       break;
>>> +       }
>>> 
>>>       flush_tlb_all();
>>> 
>>> For patch 7, I only have commentary added derived from this earlier discussion
>>> above:
>>> 
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index d9c6f2cf698c..f6a1020a4b6a 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -40,6 +40,8 @@ struct vmemmap_remap_walk {
>>> 
>>> /* Skip the TLB flush when we split the PMD */
>>> #define VMEMMAP_SPLIT_NO_TLB_FLUSH     BIT(0)
>>> +/* Skip the TLB flush when we remap the PTE */
>>> #define VMEMMAP_REMAP_NO_TLB_FLUSH     BIT(1)
>>>       unsigned long           flags;
>>> };
>>> 
>>> @@ -721,19 +739,28 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h,
>>> struct list_head *folio_l
>>> 
>>>       list_for_each_entry(folio, folio_list, lru) {
>>>               int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>                                              &vmemmap_pages,
>>>                                              VMEMMAP_REMAP_NO_TLB_FLUSH);
>>> 
>>>               /*
>>>                * Pages to be freed may have been accumulated.  If we
>>>                * encounter an ENOMEM,  free what we have and try again.
>>> +                * This can occur in the case that both spliting fails
>>                                                            ^^^
>>                                                         splitting
>> 
> 
> ok
> 
>>> +                * halfway and head page allocation also failed. In this
>>                                 ^^^^^^^
>>                            head vmemmap page
>> 
> ok
> 
>> Otherwise:
>> 
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> 
> 
> Thanks, I assume that's for both patches?

Yes. Thanks.

> 
>> Thanks.
>> 
>>> +                * case __hugetlb_vmemmap_optimize() would free memory
>>> +                * allowing more vmemmap remaps to occur.
>>>                */
>>>               if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {




  reply	other threads:[~2023-09-21  1:42 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-18 23:01 [PATCH v4 0/8] Batch hugetlb vmemmap modification operations Mike Kravetz
2023-09-18 23:01 ` [PATCH v4 1/8] hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles Mike Kravetz
2023-09-18 23:01 ` [PATCH v4 2/8] hugetlb: restructure pool allocations Mike Kravetz
2023-09-18 23:01 ` [PATCH v4 3/8] hugetlb: perform vmemmap optimization on a list of pages Mike Kravetz
2023-09-19  3:10   ` Muchun Song
2023-09-19 20:49     ` Mike Kravetz
2023-09-20  3:05       ` Muchun Song
2023-09-18 23:01 ` [PATCH v4 4/8] hugetlb: perform vmemmap restoration " Mike Kravetz
2023-09-19  9:52   ` Muchun Song
2023-09-19 20:57     ` Mike Kravetz
2023-09-20  2:56       ` Muchun Song
2023-09-20  3:03         ` Muchun Song
2023-09-21  1:12           ` Mike Kravetz
2023-09-21  9:31             ` Muchun Song
2023-09-21  9:47               ` Muchun Song
2023-09-21 21:58               ` Mike Kravetz
2023-09-22  8:19                 ` Muchun Song
2023-09-22 17:01                   ` Mike Kravetz
2023-09-22 17:28                     ` Mike Kravetz
2023-09-18 23:01 ` [PATCH v4 5/8] hugetlb: batch freeing of vmemmap pages Mike Kravetz
2023-09-19  6:09   ` Muchun Song
2023-09-19 21:32     ` Mike Kravetz
2023-09-18 23:01 ` [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup Mike Kravetz
2023-09-19  6:27   ` Muchun Song
2023-09-19  8:18     ` Joao Martins
2023-09-19  6:42   ` Muchun Song
2023-09-19  8:26     ` Joao Martins
2023-09-19  8:41       ` Muchun Song
2023-09-19  8:55         ` Joao Martins
2023-09-19  8:57           ` Muchun Song
2023-09-19 15:09             ` Joao Martins
2023-09-20  2:47               ` Muchun Song
2023-09-20 10:39                 ` Joao Martins
2023-09-21  1:42                   ` Muchun Song [this message]
2023-09-18 23:01 ` [PATCH v4 7/8] hugetlb: batch TLB flushes when freeing vmemmap Mike Kravetz
2023-09-18 23:02 ` [PATCH v4 8/8] hugetlb: batch TLB flushes when restoring vmemmap Mike Kravetz
2023-09-19  6:48   ` Muchun Song
2023-09-19 21:53     ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98613579-0644-4532-8DCC-D4BA7183AB15@linux.dev \
    --to=muchun.song@linux.dev \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=joao.m.martins@oracle.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.