From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song <songmuchun@bytedance.com>,
Joao Martins <joao.m.martins@oracle.com>,
Oscar Salvador <osalvador@suse.de>,
David Hildenbrand <david@redhat.com>,
Miaohe Lin <linmiaohe@huawei.com>,
David Rientjes <rientjes@google.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Barry Song <song.bao.hua@hisilicon.com>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Xiongchun Duan <duanxiongchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH v3 11/12] hugetlb: batch TLB flushes when freeing vmemmap
Date: Fri, 15 Sep 2023 15:15:44 -0700
Message-ID: <20230915221548.552084-12-mike.kravetz@oracle.com>
In-Reply-To: <20230915221548.552084-1-mike.kravetz@oracle.com>
From: Joao Martins <joao.m.martins@oracle.com>
Now that a list of pages is deduplicated at once, the TLB
flush can be batched for all the vmemmap pages that were remapped.
Extend the flags field to indicate whether the TLB flush should
be skipped when remapping the PTE.
The TLB flush is global because the caller gives no guarantee
that the set of folios is contiguous, and composing a list of
kernel virtual address ranges to flush would add complexity.
Modified by Mike Kravetz to perform a TLB flush on a single folio
if an error is encountered.
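The pattern, in outline (a minimal sketch, not the literal code in
the diff below; remap_one_folio() and the vmemmap_start/vmemmap_end
names are hypothetical stand-ins for the per-folio remap, while
list_for_each_entry(), flush_tlb_kernel_range() and flush_tlb_all()
are the real kernel interfaces):

	/* Before: one ranged flush per remapped folio. */
	list_for_each_entry(folio, folio_list, lru) {
		remap_one_folio(folio);		/* remap vmemmap PTEs */
		flush_tlb_kernel_range(vmemmap_start, vmemmap_end);
	}

	/*
	 * After: skip the per-folio flush and issue one global flush
	 * once the whole batch is remapped.  The flush must be global
	 * since the folios' vmemmap ranges need not be contiguous.
	 */
	list_for_each_entry(folio, folio_list, lru)
		remap_one_folio(folio);		/* VMEMMAP_REMAP_NO_TLB_FLUSH */
	flush_tlb_all();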
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
mm/hugetlb_vmemmap.c | 44 +++++++++++++++++++++++++++++++++-----------
1 file changed, 33 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c952e95a829c..921f2fa7cf1b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -37,6 +37,7 @@ struct vmemmap_remap_walk {
unsigned long reuse_addr;
struct list_head *vmemmap_pages;
#define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
+#define VMEMMAP_REMAP_NO_TLB_FLUSH BIT(1)
unsigned long flags;
};
@@ -211,7 +212,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
return ret;
} while (pgd++, addr = next, addr != end);
- if (walk->remap_pte)
+ if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
flush_tlb_kernel_range(start, end);
return 0;
@@ -355,19 +356,21 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
* @reuse: reuse address.
* @vmemmap_pages: list to deposit vmemmap pages to be freed. It is callers
* responsibility to free pages.
+ * @flags: modifications to vmemmap_remap_walk flags
*
* Return: %0 on success, negative error code otherwise.
*/
static int vmemmap_remap_free(unsigned long start, unsigned long end,
unsigned long reuse,
- struct list_head *vmemmap_pages)
+ struct list_head *vmemmap_pages,
+ unsigned long flags)
{
int ret;
struct vmemmap_remap_walk walk = {
.remap_pte = vmemmap_remap_pte,
.reuse_addr = reuse,
.vmemmap_pages = vmemmap_pages,
- .flags = 0,
+ .flags = flags,
};
int nid = page_to_nid((struct page *)start);
gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
@@ -628,7 +631,8 @@ static bool vmemmap_should_optimize(const struct hstate *h, const struct page *h
static int __hugetlb_vmemmap_optimize(const struct hstate *h,
struct page *head,
- struct list_head *vmemmap_pages)
+ struct list_head *vmemmap_pages,
+ unsigned long flags)
{
int ret = 0;
unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
@@ -639,6 +643,18 @@ static int __hugetlb_vmemmap_optimize(const struct hstate *h,
return ret;
static_branch_inc(&hugetlb_optimize_vmemmap_key);
+	/*
+	 * Very Subtle
+	 * If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed
+	 * immediately after remapping. As a result, subsequent accesses
+	 * and modifications to struct pages associated with the hugetlb
+	 * page could be to the OLD struct pages. Set the vmemmap optimized
+	 * flag here so that it is copied to the new head page. This keeps
+	 * the old and new struct pages in sync.
+	 * If there is an error during optimization, we will immediately FLUSH
+	 * the TLB and clear the flag below.
+	 */
+	SetHPageVmemmapOptimized(head);
vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
vmemmap_reuse = vmemmap_start;
@@ -650,11 +666,12 @@ static int __hugetlb_vmemmap_optimize(const struct hstate *h,
* mapping the range to vmemmap_pages list so that they can be freed by
* the caller.
*/
- ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, vmemmap_pages);
- if (ret)
+ ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse,
+ vmemmap_pages, flags);
+ if (ret) {
static_branch_dec(&hugetlb_optimize_vmemmap_key);
- else
- SetHPageVmemmapOptimized(head);
+ ClearHPageVmemmapOptimized(head);
+ }
return ret;
}
@@ -673,7 +690,7 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
{
LIST_HEAD(vmemmap_pages);
- __hugetlb_vmemmap_optimize(h, head, &vmemmap_pages);
+ __hugetlb_vmemmap_optimize(h, head, &vmemmap_pages, 0);
free_vmemmap_page_list(&vmemmap_pages);
}
@@ -708,19 +725,24 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
list_for_each_entry(folio, folio_list, lru) {
int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
- &vmemmap_pages);
+ &vmemmap_pages,
+ VMEMMAP_REMAP_NO_TLB_FLUSH);
/*
* Pages may have been accumulated, thus free what we have
* and try again.
*/
if (ret == -ENOMEM) {
+ flush_tlb_all();
free_vmemmap_page_list(&vmemmap_pages);
INIT_LIST_HEAD(&vmemmap_pages);
- __hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
+ __hugetlb_vmemmap_optimize(h, &folio->page,
+ &vmemmap_pages,
+ VMEMMAP_REMAP_NO_TLB_FLUSH);
}
}
+ flush_tlb_all();
free_vmemmap_page_list(&vmemmap_pages);
}
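One subtlety worth restating outside the diff: with the flush
deferred, other CPUs can keep using stale translations to the old
vmemmap pages until flush_tlb_all() runs. Roughly, the ordering the
"Very Subtle" comment relies on is (remap_and_copy() is a
hypothetical stand-in for the remap plus the copy to the reused
page; SetHPageVmemmapOptimized() and flush_tlb_all() are real):

	SetHPageVmemmapOptimized(head);	/* flag set on the OLD struct page */
	remap_and_copy(head);		/* copy carries the flag to the NEW
					 * struct page; old and new stay in
					 * sync while both are reachable */
	/* ... remaining folios in the batch ... */
	flush_tlb_all();		/* stale translations dropped; only
					 * the new struct pages stay visible */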
--
2.41.0