* [PATCH v3 0/2] skip redundant TLB sync IPIs
@ 2026-01-06 11:50 lance.yang
2026-01-06 11:50 ` [PATCH v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized lance.yang
2026-01-06 11:50 ` [PATCH v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI lance.yang
0 siblings, 2 replies; 3+ messages in thread
From: lance.yang @ 2026-01-06 11:50 UTC (permalink / raw)
To: akpm
Cc: david, dave.hansen, dave.hansen, will, aneesh.kumar, npiggin,
peterz, tglx, mingo, bp, x86, hpa, arnd, lorenzo.stoakes, ziy,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
shy828301, riel, jannh, linux-arch, linux-mm, linux-kernel,
ioworker0
Hi all,
When unsharing hugetlb PMD page tables or collapsing pages in khugepaged,
we send two IPIs: one for TLB invalidation, and another to synchronize
with concurrent GUP-fast walkers. However, if the TLB flush already
reaches all CPUs, the second IPI is redundant. GUP-fast runs with IRQs
disabled, so when the TLB flush IPI completes, any concurrent GUP-fast
must have finished.
We now track whether IPIs were actually sent during TLB flush. We pass
the mmu_gather context through the flush path, and native_flush_tlb_multi()
sets a flag when sending IPIs. Works with PV and INVLPGB since only
native_flush_tlb_multi() sets the flag - no matter what replaces
pv_ops.mmu.flush_tlb_multi or whether INVLPGB is available.
David Hildenbrand did the initial implementation. I built on his work and
relied on off-list discussions to push it further - thanks a lot David!
v2 -> v3:
- Complete rewrite: use dynamic IPI tracking instead of static checks
(per Dave Hansen, thanks!)
- Track IPIs via mmu_gather: native_flush_tlb_multi() sets flag when
actually sending IPIs
- Motivation for skipping redundant IPIs explained by David:
https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
- https://lore.kernel.org/linux-mm/20251229145245.85452-1-lance.yang@linux.dev/
v1 -> v2:
- Fix cover letter encoding to resolve send-email issues. Apologies for
any email flood caused by the failed send attempts :(
RFC -> v1:
- Use a callback function in pv_mmu_ops instead of comparing function
pointers (per David)
- Embed the check directly in tlb_remove_table_sync_one() instead of
requiring every caller to check explicitly (per David)
- Move tlb_table_flush_implies_ipi_broadcast() outside of
CONFIG_MMU_GATHER_RCU_TABLE_FREE to fix build error on architectures
that don't enable this config.
https://lore.kernel.org/oe-kbuild-all/202512142156.cShiu6PU-lkp@intel.com/
- https://lore.kernel.org/linux-mm/20251213080038.10917-1-lance.yang@linux.dev/
Lance Yang (2):
mm/tlb: skip redundant IPI when TLB flush already synchronized
mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI
arch/x86/include/asm/tlb.h | 3 ++-
arch/x86/include/asm/tlbflush.h | 9 +++++----
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/ldt.c | 2 +-
arch/x86/mm/tlb.c | 22 +++++++++++++++------
include/asm-generic/tlb.h | 14 +++++++++-----
include/linux/pgtable.h | 13 +++++++++----
mm/khugepaged.c | 9 +++------
mm/mmu_gather.c | 26 ++++++++++++++++++-------
mm/pgtable-generic.c | 34 +++++++++++++++++++++++++++++++++
10 files changed, 99 insertions(+), 35 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized
2026-01-06 11:50 [PATCH v3 0/2] skip redundant TLB sync IPIs lance.yang
@ 2026-01-06 11:50 ` lance.yang
2026-01-06 11:50 ` [PATCH v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI lance.yang
1 sibling, 0 replies; 3+ messages in thread
From: lance.yang @ 2026-01-06 11:50 UTC (permalink / raw)
To: akpm
Cc: david, dave.hansen, dave.hansen, will, aneesh.kumar, npiggin,
peterz, tglx, mingo, bp, x86, hpa, arnd, lorenzo.stoakes, ziy,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
shy828301, riel, jannh, linux-arch, linux-mm, linux-kernel,
ioworker0, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
When unsharing hugetlb PMD page tables, we currently send two IPIs: one
for TLB invalidation, and another to synchronize with concurrent GUP-fast
walkers via tlb_remove_table_sync_one().
However, if the TLB flush already sent IPIs to all CPUs (when freed_tables
or unshared_tables is true), the second IPI is redundant. GUP-fast runs
with IRQs disabled, so when the TLB flush IPI completes, any concurrent
GUP-fast must have finished.
To avoid the redundant IPI, we add a flag to mmu_gather to track whether
the TLB flush sent IPIs. We pass the mmu_gather pointer through the TLB
flush path via flush_tlb_info, so native_flush_tlb_multi() can set the
flag when it sends IPIs for freed_tables. We also set the flag for
local-only flushes, since disabling IRQs provides the same guarantee.
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
arch/x86/include/asm/tlb.h | 3 ++-
arch/x86/include/asm/tlbflush.h | 9 +++++----
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/ldt.c | 2 +-
arch/x86/mm/tlb.c | 22 ++++++++++++++++------
include/asm-generic/tlb.h | 14 +++++++++-----
mm/mmu_gather.c | 26 +++++++++++++++++++-------
7 files changed, 53 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 866ea78ba156..c5950a92058c 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -20,7 +20,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
end = tlb->end;
}
- flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
+ flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
+ tlb->freed_tables || tlb->unshared_tables, tlb);
}
static inline void invlpg(unsigned long addr)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 00daedfefc1b..83c260c88b80 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -220,6 +220,7 @@ struct flush_tlb_info {
* will be zero.
*/
struct mm_struct *mm;
+ struct mmu_gather *tlb;
unsigned long start;
unsigned long end;
u64 new_tlb_gen;
@@ -305,23 +306,23 @@ static inline bool mm_in_asid_transition(struct mm_struct *mm) { return false; }
#endif
#define flush_tlb_mm(mm) \
- flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
+ flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true, NULL)
#define flush_tlb_range(vma, start, end) \
flush_tlb_mm_range((vma)->vm_mm, start, end, \
((vma)->vm_flags & VM_HUGETLB) \
? huge_page_shift(hstate_vma(vma)) \
- : PAGE_SHIFT, true)
+ : PAGE_SHIFT, true, NULL)
extern void flush_tlb_all(void);
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned int stride_shift,
- bool freed_tables);
+ bool freed_tables, struct mmu_gather *tlb);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
{
- flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
+ flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false, NULL);
}
static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 28518371d8bf..006f3705b616 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2572,7 +2572,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
*/
flush_tlb_mm_range(text_poke_mm, text_poke_mm_addr, text_poke_mm_addr +
(cross_page_boundary ? 2 : 1) * PAGE_SIZE,
- PAGE_SHIFT, false);
+ PAGE_SHIFT, false, NULL);
if (func == text_poke_memcpy) {
/*
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 0f19ef355f5f..d8494706fec5 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -374,7 +374,7 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
}
va = (unsigned long)ldt_slot_va(ldt->slot);
- flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false);
+ flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false, NULL);
}
#else /* !CONFIG_MITIGATION_PAGE_TABLE_ISOLATION */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f5b93e01e347..be45976c0d16 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1374,6 +1374,9 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
else
on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
+
+ if (info->freed_tables && info->tlb)
+ info->tlb->tlb_flush_sent_ipi = true;
}
void flush_tlb_multi(const struct cpumask *cpumask,
@@ -1403,7 +1406,7 @@ static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
unsigned long start, unsigned long end,
unsigned int stride_shift, bool freed_tables,
- u64 new_tlb_gen)
+ u64 new_tlb_gen, struct mmu_gather *tlb)
{
struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
@@ -1433,6 +1436,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
info->new_tlb_gen = new_tlb_gen;
info->initiating_cpu = smp_processor_id();
info->trim_cpumask = 0;
+ info->tlb = tlb;
return info;
}
@@ -1447,8 +1451,8 @@ static void put_flush_tlb_info(void)
}
void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, unsigned int stride_shift,
- bool freed_tables)
+ unsigned long end, unsigned int stride_shift,
+ bool freed_tables, struct mmu_gather *tlb)
{
struct flush_tlb_info *info;
int cpu = get_cpu();
@@ -1458,7 +1462,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
new_tlb_gen = inc_mm_tlb_gen(mm);
info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
- new_tlb_gen);
+ new_tlb_gen, tlb);
/*
* flush_tlb_multi() is not optimized for the common case in which only
@@ -1476,6 +1480,12 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
local_irq_disable();
flush_tlb_func(info);
local_irq_enable();
+ /*
+ * Only current CPU uses this mm, so we can treat this as
+ * having synchronized with GUP-fast. No sync IPI needed.
+ */
+ if (tlb && freed_tables)
+ tlb->tlb_flush_sent_ipi = true;
}
put_flush_tlb_info();
@@ -1553,7 +1563,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
guard(preempt)();
info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
- TLB_GENERATION_INVALID);
+ TLB_GENERATION_INVALID, NULL);
if (info->end == TLB_FLUSH_ALL)
kernel_tlb_flush_all(info);
@@ -1733,7 +1743,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
int cpu = get_cpu();
info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
- TLB_GENERATION_INVALID);
+ TLB_GENERATION_INVALID, NULL);
/*
* flush_tlb_multi() is not optimized for the common case in which only
* a local TLB flush is needed. Optimize this use-case by calling
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3975f7d11553..cbbe008590ee 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -249,6 +249,7 @@ static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
#define tlb_needs_table_invalidate() (true)
#endif
+void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb);
void tlb_remove_table_sync_one(void);
#else
@@ -257,6 +258,7 @@ void tlb_remove_table_sync_one(void);
#error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
#endif
+static inline void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb) { }
static inline void tlb_remove_table_sync_one(void) { }
#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
@@ -378,6 +380,12 @@ struct mmu_gather {
*/
unsigned int fully_unshared_tables : 1;
+ /*
+ * Did the TLB flush for freed/unshared tables send IPIs to all CPUs?
+ * If true, we can skip the redundant IPI in tlb_remove_table_sync_one().
+ */
+ unsigned int tlb_flush_sent_ipi : 1;
+
unsigned int batch_count;
#ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -833,13 +841,9 @@ static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
*
* We only perform this when we are the last sharer of a page table,
* as the IPI will reach all CPUs: any GUP-fast.
- *
- * Note that on configs where tlb_remove_table_sync_one() is a NOP,
- * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
- * required IPIs already for us.
*/
if (tlb->fully_unshared_tables) {
- tlb_remove_table_sync_one();
+ tlb_gather_remove_table_sync_one(tlb);
tlb->fully_unshared_tables = false;
}
}
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 2faa23d7f8d4..da36de52b281 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -273,8 +273,14 @@ static void tlb_remove_table_smp_sync(void *arg)
/* Simply deliver the interrupt */
}
-void tlb_remove_table_sync_one(void)
+void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb)
{
+ /* Skip the IPI if the TLB flush already synchronized with other CPUs */
+ if (tlb && tlb->tlb_flush_sent_ipi) {
+ tlb->tlb_flush_sent_ipi = false;
+ return;
+ }
+
/*
* This isn't an RCU grace period and hence the page-tables cannot be
* assumed to be actually RCU-freed.
@@ -285,6 +291,11 @@ void tlb_remove_table_sync_one(void)
smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
}
+void tlb_remove_table_sync_one(void)
+{
+ tlb_gather_remove_table_sync_one(NULL);
+}
+
static void tlb_remove_table_rcu(struct rcu_head *head)
{
__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
@@ -328,7 +339,7 @@ static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
__tlb_remove_table(ptdesc);
}
-static inline void __tlb_remove_table_one(void *table)
+static inline void __tlb_remove_table_one(void *table, struct mmu_gather *tlb)
{
struct ptdesc *ptdesc;
@@ -336,16 +347,16 @@ static inline void __tlb_remove_table_one(void *table)
call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
}
#else
-static inline void __tlb_remove_table_one(void *table)
+static inline void __tlb_remove_table_one(void *table, struct mmu_gather *tlb)
{
- tlb_remove_table_sync_one();
+ tlb_gather_remove_table_sync_one(tlb);
__tlb_remove_table(table);
}
#endif /* CONFIG_PT_RECLAIM */
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_one(void *table, struct mmu_gather *tlb)
{
- __tlb_remove_table_one(table);
+ __tlb_remove_table_one(table, tlb);
}
static void tlb_table_flush(struct mmu_gather *tlb)
@@ -367,7 +378,7 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT);
if (*batch == NULL) {
tlb_table_invalidate(tlb);
- tlb_remove_table_one(table);
+ tlb_remove_table_one(table, tlb);
return;
}
(*batch)->nr = 0;
@@ -427,6 +438,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
tlb->vma_pfn = 0;
tlb->fully_unshared_tables = 0;
+ tlb->tlb_flush_sent_ipi = 0;
__tlb_reset_range(tlb);
inc_tlb_flush_pending(tlb->mm);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI
2026-01-06 11:50 [PATCH v3 0/2] skip redundant TLB sync IPIs lance.yang
2026-01-06 11:50 ` [PATCH v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized lance.yang
@ 2026-01-06 11:50 ` lance.yang
1 sibling, 0 replies; 3+ messages in thread
From: lance.yang @ 2026-01-06 11:50 UTC (permalink / raw)
To: akpm
Cc: david, dave.hansen, dave.hansen, will, aneesh.kumar, npiggin,
peterz, tglx, mingo, bp, x86, hpa, arnd, lorenzo.stoakes, ziy,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
shy828301, riel, jannh, linux-arch, linux-mm, linux-kernel,
ioworker0, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
pmdp_collapse_flush() may already send IPIs to flush TLBs, and then
callers send another IPI via tlb_remove_table_sync_one() or
pmdp_get_lockless_sync() to synchronize with concurrent GUP-fast walkers.
However, since GUP-fast runs with IRQs disabled, the TLB flush IPI already
provides the necessary synchronization. We can avoid the redundant second
IPI.
Introduce pmdp_collapse_flush_sync() which combines flush and sync:
- For architectures using the generic pmdp_collapse_flush() implementation
(e.g., x86): Use mmu_gather to track IPI sends. If the TLB flush sent
an IPI, tlb_gather_remove_table_sync_one() will skip the redundant one.
- For architectures with custom pmdp_collapse_flush() (s390, riscv,
powerpc): Fall back to calling pmdp_collapse_flush() followed by
tlb_remove_table_sync_one(). No behavior change.
Update khugepaged to use pmdp_collapse_flush_sync() instead of separate
flush and sync calls. Remove the now-unused pmdp_get_lockless_sync() macro.
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
include/linux/pgtable.h | 13 +++++++++----
mm/khugepaged.c | 9 +++------
mm/pgtable-generic.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 46 insertions(+), 10 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index eb8aacba3698..69e290dab450 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
return pmd;
}
#define pmdp_get_lockless pmdp_get_lockless
-#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
#endif /* CONFIG_PGTABLE_LEVELS > 2 */
#endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
@@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
{
return pmdp_get(pmdp);
}
-static inline void pmdp_get_lockless_sync(void)
-{
-}
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -1174,6 +1170,8 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm,
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp);
+extern pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
#else
static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
unsigned long address,
@@ -1182,6 +1180,13 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
BUILD_BUG();
return *pmdp;
}
+static inline pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma,
+ unsigned long address,
+ pmd_t *pmdp)
+{
+ BUILD_BUG();
+ return *pmdp;
+}
#define pmdp_collapse_flush pmdp_collapse_flush
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9f790ec34400..0a98afc85c50 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1177,10 +1177,9 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
* Parallel GUP-fast is fine since GUP-fast will back off when
* it detects PMD is changed.
*/
- _pmd = pmdp_collapse_flush(vma, address, pmd);
+ _pmd = pmdp_collapse_flush_sync(vma, address, pmd);
spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(&range);
- tlb_remove_table_sync_one();
pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
if (pte) {
@@ -1663,8 +1662,7 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
}
}
}
- pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
- pmdp_get_lockless_sync();
+ pgt_pmd = pmdp_collapse_flush_sync(vma, haddr, pmd);
pte_unmap_unlock(start_pte, ptl);
if (ptl != pml)
spin_unlock(pml);
@@ -1817,8 +1815,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* races against the prior checks.
*/
if (likely(file_backed_vma_is_retractable(vma))) {
- pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
- pmdp_get_lockless_sync();
+ pgt_pmd = pmdp_collapse_flush_sync(vma, addr, pmd);
success = true;
}
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d3aec7a9926a..be2ee82e6fc4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -233,6 +233,40 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
}
+
+pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ struct mmu_gather tlb;
+ pmd_t pmd;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+ VM_BUG_ON(pmd_trans_huge(*pmdp));
+
+ tlb_gather_mmu(&tlb, vma->vm_mm);
+ pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
+
+ flush_tlb_mm_range(vma->vm_mm, address, address + HPAGE_PMD_SIZE,
+ PAGE_SHIFT, true, &tlb);
+
+ /*
+ * Synchronize with GUP-fast. If the flush sent IPIs, skip the
+ * redundant sync IPI.
+ */
+ tlb_gather_remove_table_sync_one(&tlb);
+ tlb_finish_mmu(&tlb);
+ return pmd;
+}
+#else
+pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ pmd_t pmd;
+
+ pmd = pmdp_collapse_flush(vma, address, pmdp);
+ tlb_remove_table_sync_one();
+ return pmd;
+}
#endif
/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
--
2.49.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-01-06 11:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-06 11:50 [PATCH v3 0/2] skip redundant TLB sync IPIs lance.yang
2026-01-06 11:50 ` [PATCH v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized lance.yang
2026-01-06 11:50 ` [PATCH v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI lance.yang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.