From: chrisl@kernel.org
Date: Fri, 18 Oct 2024 10:27:04 -0700
Subject: [PATCH 6.11.y v2 1/3] mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO
Message-Id: <20241018-stable-yuzhao-v2-1-1fd556716eda@kernel.org>
References: <20241018-stable-yuzhao-v2-0-1fd556716eda@kernel.org>
In-Reply-To: <20241018-stable-yuzhao-v2-0-1fd556716eda@kernel.org>
To: stable@vger.kernel.org
Cc: Greg KH, Muchun Song, Andrew Morton, Yu Zhao, Suren Baghdasaryan,
    Kent Overstreet, Vlastimil Babka, kernel test robot, Janosch Frank,
    Marc Hartmayer, Chris Li

From: Yu Zhao

[ Upstream commit c2a967f6ab0ec896648c0497d3dc15d8f136b148 ]

hugetlb_vmemmap_optimize_folio() and hugetlb_vmemmap_restore_folio() are
wrappers meant to be called regardless of whether HVO is enabled.
Therefore, they should not call synchronize_rcu(); otherwise, they
regress use cases that do not enable HVO.

So move synchronize_rcu() to __hugetlb_vmemmap_optimize_folio() and
__hugetlb_vmemmap_restore_folio(), and call it only once for each batch
of folios when HVO is enabled.
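The once-per-batch idea is easy to see outside the kernel. Below is a
minimal stand-alone sketch of the same flag-clearing pattern; it is
illustrative only, not kernel code: synchronize_rcu() is a userspace
stub, and restore_one()/restore_batch() are hypothetical stand-ins for
__hugetlb_vmemmap_restore_folio() and hugetlb_vmemmap_restore_folios().

#include <stdio.h>

#define VMEMMAP_REMAP_NO_TLB_FLUSH	(1UL << 1)
#define VMEMMAP_SYNCHRONIZE_RCU		(1UL << 2)

/* Stub standing in for the kernel's synchronize_rcu(). */
static void synchronize_rcu(void)
{
	puts("synchronize_rcu()");
}

/*
 * Stands in for __hugetlb_vmemmap_restore_folio(): the inner helper
 * waits for a grace period only when the caller asks for one.
 */
static int restore_one(int folio, unsigned long flags)
{
	if (flags & VMEMMAP_SYNCHRONIZE_RCU)
		synchronize_rcu();
	printf("restore folio %d\n", folio);
	return 0;
}

/*
 * Stands in for hugetlb_vmemmap_restore_folios(): request the grace
 * period for the first folio, then clear the flag so the remaining
 * folios in the batch skip it.
 */
static void restore_batch(const int *folios, int n)
{
	unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH |
			      VMEMMAP_SYNCHRONIZE_RCU;
	int i;

	for (i = 0; i < n; i++) {
		restore_one(folios[i], flags);
		/* only need to synchronize_rcu() once for each batch */
		flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
	}
}

int main(void)
{
	int folios[3] = { 1, 2, 3 };

	restore_batch(folios, 3);	/* one grace-period wait, three folios */
	return 0;
}

However large the batch, the grace-period wait is paid once; and in the
actual patch, when HVO is disabled the inner helper returns before it
ever tests the flag, so no wait happens at all.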
Link: https://lkml.kernel.org/r/20240719042503.2752316-1-yuzhao@google.com
Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202407091001.1250ad4a-oliver.sang@intel.com
Reported-by: Janosch Frank
Tested-by: Marc Hartmayer
Acked-by: Muchun Song
Signed-off-by: Andrew Morton
Signed-off-by: Chris Li
---
 mm/hugetlb_vmemmap.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0c3f56b3578eb..57b7f591eee82 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -43,6 +43,8 @@ struct vmemmap_remap_walk {
 #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
 /* Skip the TLB flush when we remap the PTE */
 #define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
+/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
+#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
 	unsigned long		flags;
 };
 
@@ -457,6 +459,9 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
 	if (!folio_test_hugetlb_vmemmap_optimized(folio))
 		return 0;
 
+	if (flags & VMEMMAP_SYNCHRONIZE_RCU)
+		synchronize_rcu();
+
 	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
 	vmemmap_reuse	= vmemmap_start;
 	vmemmap_start	+= HUGETLB_VMEMMAP_RESERVE_SIZE;
@@ -489,10 +494,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
  */
 int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio)
 {
-	/* avoid writes from page_ref_add_unless() while unfolding vmemmap */
-	synchronize_rcu();
-
-	return __hugetlb_vmemmap_restore_folio(h, folio, 0);
+	return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU);
 }
 
 /**
@@ -515,14 +517,14 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 	struct folio *folio, *t_folio;
 	long restored = 0;
 	long ret = 0;
-
-	/* avoid writes from page_ref_add_unless() while unfolding vmemmap */
-	synchronize_rcu();
+	unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU;
 
 	list_for_each_entry_safe(folio, t_folio, folio_list, lru) {
 		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
-			ret = __hugetlb_vmemmap_restore_folio(h, folio,
-							      VMEMMAP_REMAP_NO_TLB_FLUSH);
+			ret = __hugetlb_vmemmap_restore_folio(h, folio, flags);
+			/* only need to synchronize_rcu() once for each batch */
+			flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
+
 			if (ret)
 				break;
 			restored++;
@@ -570,6 +572,9 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
 		return ret;
 
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);
+
+	if (flags & VMEMMAP_SYNCHRONIZE_RCU)
+		synchronize_rcu();
 	/*
 	 * Very Subtle
 	 * If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed
@@ -617,10 +622,7 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
 {
 	LIST_HEAD(vmemmap_pages);
 
-	/* avoid writes from page_ref_add_unless() while folding vmemmap */
-	synchronize_rcu();
-
-	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, 0);
+	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHRONIZE_RCU);
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
@@ -647,6 +649,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 {
 	struct folio *folio;
 	LIST_HEAD(vmemmap_pages);
+	unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU;
 
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret = hugetlb_vmemmap_split_folio(h, folio);
@@ -663,14 +666,12 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 
 	flush_tlb_all();
 
-	/* avoid writes from page_ref_add_unless() while folding vmemmap */
-	synchronize_rcu();
-
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret;
 
-		ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
-						       VMEMMAP_REMAP_NO_TLB_FLUSH);
+		ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
+		/* only need to synchronize_rcu() once for each batch */
+		flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
 
 		/*
 		 * Pages to be freed may have been accumulated.  If we
@@ -684,8 +685,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 			flush_tlb_all();
 			free_vmemmap_page_list(&vmemmap_pages);
 			INIT_LIST_HEAD(&vmemmap_pages);
-			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
-							 VMEMMAP_REMAP_NO_TLB_FLUSH);
+			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
 		}
 	}

-- 
2.47.0.rc1.288.g06298d1525-goog