* [PATCH v3 1/1] mm/mmu_gather: replace IPI with synchronize_rcu() when batch allocation fails
@ 2026-02-24 14:21 Lance Yang
From: Lance Yang @ 2026-02-24 14:21 UTC
To: akpm, peterz
Cc: david, dave.hansen, will, aneesh.kumar, npiggin, linux-arch,
linux-mm, linux-kernel, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
When freeing page tables, we try to batch them. If batch allocation fails
(GFP_NOWAIT), __tlb_remove_table_one() frees that single table immediately
instead of batching it.
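As a rough illustration of that fallback, here is a minimal userspace C sketch. It is not kernel code. struct table_batch, BATCH_CAPACITY, alloc_should_fail, sync_with_walkers() and the other helpers are hypothetical stand-ins for the mmu_gather batch, a failed GFP_NOWAIT allocation, and tlb_remove_table_sync_one()/tlb_remove_table_sync_rcu(). TLB invalidation is omitted, and flushing at a fixed capacity is a simplification. The sketch only shows the control flow: while a batch exists the table is queued, and otherwise the caller waits for walkers and frees the single table at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: none of these names are kernel API. */
#define BATCH_CAPACITY 4

struct table_batch {
	int nr;
	void *tables[BATCH_CAPACITY];
};

static bool alloc_should_fail;	/* simulates a failed GFP_NOWAIT allocation */
static int walker_sync_calls;	/* times we waited for lockless walkers */
static int tables_freed;	/* tables handed back to the allocator */

static struct table_batch *try_alloc_batch(void)
{
	if (alloc_should_fail)
		return NULL;
	return calloc(1, sizeof(struct table_batch));
}

/* Stand-in for tlb_remove_table_sync_one() / tlb_remove_table_sync_rcu(). */
static void sync_with_walkers(void)
{
	walker_sync_calls++;
}

static void free_table(void *table)
{
	free(table);
	tables_freed++;
}

/*
 * Free a whole batch behind a single synchronization. The model waits
 * synchronously; the kernel instead defers batch freeing with call_rcu().
 */
static void flush_batch(struct table_batch **batchp)
{
	struct table_batch *batch = *batchp;

	if (!batch)
		return;
	sync_with_walkers();
	for (int i = 0; i < batch->nr; i++)
		free_table(batch->tables[i]);
	free(batch);
	*batchp = NULL;
}

static void remove_table(struct table_batch **batchp, void *table)
{
	if (!*batchp) {
		*batchp = try_alloc_batch();
		if (!*batchp) {
			/* Fallback: no batch, so wait for walkers and free this table now. */
			sync_with_walkers();
			free_table(table);
			return;
		}
	}
	assert((*batchp)->nr < BATCH_CAPACITY);
	(*batchp)->tables[(*batchp)->nr++] = table;
	if ((*batchp)->nr == BATCH_CAPACITY)
		flush_batch(batchp);
}
```

In this model a full batch is covered by one synchronization, whereas every table that takes the fallback path triggers its own.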
On !CONFIG_PT_RECLAIM, this fallback broadcasts an IPI to all CPUs via
tlb_remove_table_sync_one(). That disrupts every CPU even when only a single
process is unmapping memory, and the IPI broadcast has been reported to hurt
RT workloads [1].
tlb_remove_table_sync_one() synchronizes with lockless page-table walkers
(e.g., GUP-fast) that rely on disabling IRQs. These walkers use
local_irq_disable(), which also acts as an RCU read-side critical section.
This patch introduces tlb_remove_table_sync_rcu(), which waits for an RCU
grace period (synchronize_rcu()) instead of broadcasting an IPI. This provides
the same guarantee as the IPI without disrupting all CPUs. Since batch
allocation has already failed, we are in a slow path where sleeping is
acceptable: we are in process context (unmap_region, exit_mmap) with only
mmap_lock held.
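The guarantee relied on here can be illustrated with a toy userspace model of a grace-period wait. This is a simplified sketch under stated assumptions, not how the kernel implements synchronize_rcu(). NR_READERS, reader_seq, reader_lock()/reader_unlock() (standing in for local_irq_disable()/local_irq_enable() in a lockless walker) and wait_for_readers() (standing in for synchronize_rcu()) are all hypothetical. The waiter never interrupts a reader. It only waits until every reader that was inside a critical section when the wait began has left it.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Toy model only: NR_READERS "CPUs", each with a sequence counter. */
#define NR_READERS 4

/* Odd value: that reader is inside a read-side critical section. */
static _Atomic unsigned long reader_seq[NR_READERS];

/* Stands in for local_irq_disable() at the start of a lockless walk. */
static void reader_lock(int cpu)
{
	unsigned long old = atomic_fetch_add(&reader_seq[cpu], 1);

	assert(!(old & 1));	/* sections do not nest in this model */
	(void)old;
}

/* Stands in for local_irq_enable() at the end of the walk. */
static void reader_unlock(int cpu)
{
	unsigned long old = atomic_fetch_add(&reader_seq[cpu], 1);

	assert(old & 1);	/* must pair with reader_lock() */
	(void)old;
}

/*
 * Stands in for synchronize_rcu(): return only after every reader that was
 * inside a section when we started has left it. Sections that begin later
 * are not waited for.
 */
static void wait_for_readers(void)
{
	unsigned long snap[NR_READERS];

	for (int cpu = 0; cpu < NR_READERS; cpu++)
		snap[cpu] = atomic_load(&reader_seq[cpu]);

	for (int cpu = 0; cpu < NR_READERS; cpu++) {
		if (!(snap[cpu] & 1))
			continue;	/* idle at snapshot time: nothing to wait for */
		while (atomic_load(&reader_seq[cpu]) == snap[cpu])
			sched_yield();	/* yield the CPU; the reader is never interrupted */
	}
}
```

A caller would first unlink the page table so new walkers cannot reach it, then call wait_for_readers(), and only then free the table. Assuming the unlink and the walkers' pointer loads are also sequentially consistent atomics, a reader that enters its section after the snapshot cannot observe the unlinked table.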
tlb_remove_table_sync_one() is retained for other callers that are not slow
paths (e.g., khugepaged after pmdp_collapse_flush(), and tlb_finish_mmu()
when tlb->fully_unshared_tables is set). Converting those may require
different approaches, such as targeted IPIs.
[1] https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
Link: https://lore.kernel.org/linux-mm/20260202150957.GD1282955@noisy.programming.kicks-ass.net/
Link: https://lore.kernel.org/linux-mm/dfdfeac9-5cd5-46fc-a5c1-9ccf9bd3502a@intel.com/
Link: https://lore.kernel.org/linux-mm/bc489455-bb18-44dc-8518-ae75abda6bec@kernel.org/
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
v2 -> v3:
- Remove explicit might_sleep() as synchronize_rcu() already has it
(per Peter)
- Add changelog explanation for why tlb_remove_table_sync_one() is retained
(per Peter)
- Collect Acked-by from David and Peter, thanks!
- https://lore.kernel.org/linux-mm/20260224030700.35857-1-lance.yang@linux.dev/
v1 -> v2:
- Wrap synchronize_rcu() in tlb_remove_table_sync_rcu() with proper
kerneldoc (per David)
- Add might_sleep() to make sleeping constraint explicit (per Dave)
- Clarify this is for synchronization, not memory freeing (per Dave)
- https://lore.kernel.org/linux-mm/20260223033604.10198-1-lance.yang@linux.dev/
include/asm-generic/tlb.h | 4 ++++
mm/mmu_gather.c | 21 ++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 4aeac0c3d3f0..bdcc2778ac64 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -251,6 +251,8 @@ static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
void tlb_remove_table_sync_one(void);
+void tlb_remove_table_sync_rcu(void);
+
#else
#ifdef tlb_needs_table_invalidate
@@ -259,6 +261,8 @@ void tlb_remove_table_sync_one(void);
static inline void tlb_remove_table_sync_one(void) { }
+static inline void tlb_remove_table_sync_rcu(void) { }
+
#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index fe5b6a031717..3985d856de7f 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -296,6 +296,25 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
call_rcu(&batch->rcu, tlb_remove_table_rcu);
}
+/**
+ * tlb_remove_table_sync_rcu - synchronize with software page-table walkers
+ *
+ * Like tlb_remove_table_sync_one(), but waits for an RCU grace period instead
+ * of broadcasting an IPI. Use in slow paths where sleeping is acceptable.
+ *
+ * Software/lockless page-table walkers use local_irq_disable(), which is also
+ * an RCU read-side critical section. synchronize_rcu() waits for all such
+ * sections, providing the same guarantee as tlb_remove_table_sync_one() but
+ * without disrupting all CPUs with IPIs.
+ *
+ * Do not use for freeing memory. Use RCU callbacks instead to avoid latency
+ * spikes.
+ */
+void tlb_remove_table_sync_rcu(void)
+{
+ synchronize_rcu();
+}
+
#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
static void tlb_remove_table_free(struct mmu_table_batch *batch)
@@ -339,7 +358,7 @@ static inline void __tlb_remove_table_one(void *table)
#else
static inline void __tlb_remove_table_one(void *table)
{
- tlb_remove_table_sync_one();
+ tlb_remove_table_sync_rcu();
__tlb_remove_table(table);
}
#endif /* CONFIG_PT_RECLAIM */
--
2.49.0