From: Lance Yang <lance.yang@linux.dev>
To: akpm@linux-foundation.org
Cc: will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, david@kernel.org,
lorenzo.stoakes@oracle.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, ioworker0@gmail.com, shy828301@gmail.com,
riel@surriel.com, jannh@google.com, linux-arch@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Lance Yang <lance.yang@linux.dev>
Subject: [PATCH RFC 3/3] mm/khugepaged: skip redundant IPI in collapse_huge_page()
Date: Sat, 13 Dec 2025 16:00:25 +0800 [thread overview]
Message-ID: <20251213080038.10917-4-lance.yang@linux.dev> (raw)
In-Reply-To: <20251213080038.10917-1-lance.yang@linux.dev>
From: Lance Yang <lance.yang@linux.dev>
Similar to the hugetlb PMD unsharing optimization, skip the second IPI
in collapse_huge_page() when the TLB flush already provides necessary
synchronization.
Before commit a37259732a7d ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE
unconditional"), bare metal x86 didn't enable MMU_GATHER_RCU_TABLE_FREE.
In that configuration, tlb_remove_table_sync_one() was a NOP. GUP-fast
synchronization relied on IRQ disabling, which blocks TLB flush IPIs.
When Rik made MMU_GATHER_RCU_TABLE_FREE unconditional to support AMD's
INVLPGB, all x86 systems started sending the second IPI. However, on
native x86 this is redundant:
- pmdp_collapse_flush() calls flush_tlb_range(), sending IPIs to all
CPUs to invalidate TLB entries
- GUP-fast runs with IRQs disabled, so when the flush IPI completes,
any concurrent GUP-fast must have finished
- tlb_remove_table_sync_one() provides no additional synchronization
On x86, skip the second IPI when running native (without paravirt) and
without INVLPGB. For paravirt with non-native flush_tlb_multi and for
INVLPGB, conservatively keep both IPIs.
Use tlb_table_flush_implies_ipi_broadcast(), consistent with the hugetlb
optimization.
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/khugepaged.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386..06ea793a8190 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1178,7 +1178,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
_pmd = pmdp_collapse_flush(vma, address, pmd);
spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(&range);
- tlb_remove_table_sync_one();
+ /*
+ * Skip the second IPI if the TLB flush above already synchronized
+ * with concurrent GUP-fast via broadcast IPIs.
+ */
+ if (!tlb_table_flush_implies_ipi_broadcast())
+ tlb_remove_table_sync_one();
pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
if (pte) {
--
2.49.0
next prev parent reply other threads:[~2025-12-13 8:01 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-13 8:00 [PATCH RFC 0/3] skip redundant TLB sync IPIs Lance Yang
2025-12-13 8:00 ` [PATCH RFC 1/3] mm/tlb: allow architectures to " Lance Yang
2025-12-15 5:48 ` Lance Yang
2025-12-18 13:08 ` David Hildenbrand (Red Hat)
2025-12-13 8:00 ` [PATCH RFC 2/3] x86/mm: implement redundant IPI elimination for PMD unsharing Lance Yang
2025-12-18 13:08 ` David Hildenbrand (Red Hat)
2025-12-22 3:19 ` [PATCH RFC 2/3] x86/mm: implement redundant IPI elimination for Lance Yang
2025-12-23 9:44 ` David Hildenbrand (Red Hat)
2025-12-23 11:13 ` Lance Yang
2025-12-13 8:00 ` Lance Yang [this message]
2025-12-14 13:24 ` [PATCH RFC 3/3] mm/khugepaged: skip redundant IPI in collapse_huge_page() kernel test robot
2025-12-14 13:56 ` kernel test robot
2025-12-18 13:13 ` David Hildenbrand (Red Hat)
2025-12-18 14:35 ` Lance Yang
2025-12-19 8:25 ` David Hildenbrand (Red Hat)
2025-12-21 10:43 ` Lance Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251213080038.10917-4-lance.yang@linux.dev \
--to=lance.yang@linux.dev \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@kernel.org \
--cc=arnd@arndb.de \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=hpa@zytor.com \
--cc=ioworker0@gmail.com \
--cc=jannh@google.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mingo@redhat.com \
--cc=npache@redhat.com \
--cc=npiggin@gmail.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=ryan.roberts@arm.com \
--cc=shy828301@gmail.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.