From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D659E9410A for ; Fri, 6 Oct 2023 19:18:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233392AbjJFTSt (ORCPT ); Fri, 6 Oct 2023 15:18:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233407AbjJFTSr (ORCPT ); Fri, 6 Oct 2023 15:18:47 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF1D2A6 for ; Fri, 6 Oct 2023 12:18:45 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 22DE5C433C8; Fri, 6 Oct 2023 19:18:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1696619925; bh=I73V64LOYdDPCgOQn4M17U5NsUR2wlK7Kwfjsm3P1cg=; h=Date:To:From:Subject:From; b=Mu1uokrinEox+9JZka1sQn8z77EwV9kcDeI9y3sm0AUMAk2ddtD/jKBAd3gkYZkpZ 5CqEQ/oztWjBtjibgRwznkRYd3mYGyJ0AioYkJ7rIUA0OSDv9xzLqZB0EKu9WJf3qM enntSWf5RBbfjkDsC+a5o1qtWgHhk8zPiWh98HsI= Date: Fri, 06 Oct 2023 12:18:43 -0700 To: mm-commits@vger.kernel.org, willy@infradead.org, songmuchun@bytedance.com, rientjes@google.com, osalvador@suse.de, naoya.horiguchi@linux.dev, mike.kravetz@oracle.com, mhocko@suse.com, linmiaohe@huawei.com, konradybcio@kernel.org, jthoughton@google.com, duanxiongchun@bytedance.com, david@redhat.com, anshuman.khandual@arm.com, 21cnbao@gmail.com, joao.m.martins@oracle.com, akpm@linux-foundation.org From: Andrew Morton Subject: + hugetlb-batch-tlb-flushes-when-freeing-vmemmap.patch added to mm-unstable branch Message-Id: <20231006191845.22DE5C433C8@smtp.kernel.org> Precedence: bulk Reply-To: linux-kernel@vger.kernel.org List-ID: X-Mailing-List: mm-commits@vger.kernel.org The patch titled Subject: hugetlb: batch TLB flushes when freeing vmemmap has been added to the -mm mm-unstable branch. Its filename is hugetlb-batch-tlb-flushes-when-freeing-vmemmap.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/hugetlb-batch-tlb-flushes-when-freeing-vmemmap.patch This patch will later appear in the mm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Joao Martins Subject: hugetlb: batch TLB flushes when freeing vmemmap Date: Thu, 5 Oct 2023 20:20:09 -0700 Now that a list of pages is deduplicated at once, the TLB flush can be batched for all vmemmap pages that got remapped. Expand the flags field value to pass whether to skip the TLB flush on remap of the PTE. The TLB flush is global as we don't have guarantees from caller that the set of folios is contiguous, or to add complexity in composing a list of kVAs to flush. Modified by Mike Kravetz to perform TLB flush on single folio if an error is encountered. Link: https://lkml.kernel.org/r/20231006032012.296473-8-mike.kravetz@oracle.com Signed-off-by: Joao Martins Signed-off-by: Mike Kravetz Reviewed-by: Muchun Song Cc: Anshuman Khandual Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand Cc: David Rientjes Cc: James Houghton Cc: Konrad Dybcio Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Michal Hocko Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- mm/hugetlb_vmemmap.c | 49 +++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 11 deletions(-) --- a/mm/hugetlb_vmemmap.c~hugetlb-batch-tlb-flushes-when-freeing-vmemmap +++ a/mm/hugetlb_vmemmap.c @@ -40,6 +40,8 @@ struct vmemmap_remap_walk { /* Skip the TLB flush when we split the PMD */ #define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0) +/* Skip the TLB flush when we remap the PTE */ +#define VMEMMAP_REMAP_NO_TLB_FLUSH BIT(1) unsigned long flags; }; @@ -214,7 +216,7 @@ static int vmemmap_remap_range(unsigned return ret; } while (pgd++, addr = next, addr != end); - if (walk->remap_pte) + if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH)) flush_tlb_kernel_range(start, end); return 0; @@ -355,19 +357,21 @@ static int vmemmap_remap_split(unsigned * @reuse: reuse address. * @vmemmap_pages: list to deposit vmemmap pages to be freed. It is callers * responsibility to free pages. + * @flags: modifications to vmemmap_remap_walk flags * * Return: %0 on success, negative error code otherwise. */ static int vmemmap_remap_free(unsigned long start, unsigned long end, unsigned long reuse, - struct list_head *vmemmap_pages) + struct list_head *vmemmap_pages, + unsigned long flags) { int ret; struct vmemmap_remap_walk walk = { .remap_pte = vmemmap_remap_pte, .reuse_addr = reuse, .vmemmap_pages = vmemmap_pages, - .flags = 0, + .flags = flags, }; int nid = page_to_nid((struct page *)reuse); gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; @@ -629,7 +633,8 @@ static bool vmemmap_should_optimize(cons static int __hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head, - struct list_head *vmemmap_pages) + struct list_head *vmemmap_pages, + unsigned long flags) { int ret = 0; unsigned long vmemmap_start = (unsigned long)head, vmemmap_end; @@ -640,6 +645,18 @@ static int __hugetlb_vmemmap_optimize(co return ret; static_branch_inc(&hugetlb_optimize_vmemmap_key); + /* + * Very Subtle + * If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed + * immediately after remapping. As a result, subsequent accesses + * and modifications to struct pages associated with the hugetlb + * page could be to the OLD struct pages. Set the vmemmap optimized + * flag here so that it is copied to the new head page. This keeps + * the old and new struct pages in sync. + * If there is an error during optimization, we will immediately FLUSH + * the TLB and clear the flag below. + */ + SetHPageVmemmapOptimized(head); vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h); vmemmap_reuse = vmemmap_start; @@ -651,11 +668,12 @@ static int __hugetlb_vmemmap_optimize(co * mapping the range to vmemmap_pages list so that they can be freed by * the caller. */ - ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, vmemmap_pages); - if (ret) + ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, + vmemmap_pages, flags); + if (ret) { static_branch_dec(&hugetlb_optimize_vmemmap_key); - else - SetHPageVmemmapOptimized(head); + ClearHPageVmemmapOptimized(head); + } return ret; } @@ -674,7 +692,7 @@ void hugetlb_vmemmap_optimize(const stru { LIST_HEAD(vmemmap_pages); - __hugetlb_vmemmap_optimize(h, head, &vmemmap_pages); + __hugetlb_vmemmap_optimize(h, head, &vmemmap_pages, 0); free_vmemmap_page_list(&vmemmap_pages); } @@ -719,19 +737,28 @@ void hugetlb_vmemmap_optimize_folios(str list_for_each_entry(folio, folio_list, lru) { int ret = __hugetlb_vmemmap_optimize(h, &folio->page, - &vmemmap_pages); + &vmemmap_pages, + VMEMMAP_REMAP_NO_TLB_FLUSH); /* * Pages to be freed may have been accumulated. If we * encounter an ENOMEM, free what we have and try again. + * This can occur in the case that both spliting fails + * halfway and head page allocation also failed. In this + * case __hugetlb_vmemmap_optimize() would free memory + * allowing more vmemmap remaps to occur. */ if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) { + flush_tlb_all(); free_vmemmap_page_list(&vmemmap_pages); INIT_LIST_HEAD(&vmemmap_pages); - __hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages); + __hugetlb_vmemmap_optimize(h, &folio->page, + &vmemmap_pages, + VMEMMAP_REMAP_NO_TLB_FLUSH); } } + flush_tlb_all(); free_vmemmap_page_list(&vmemmap_pages); } _ Patches currently in -mm which might be from joao.m.martins@oracle.com are hugetlb-batch-pmd-split-for-bulk-vmemmap-dedup.patch hugetlb-batch-tlb-flushes-when-freeing-vmemmap.patch