[PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups

* [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups
@ 2026-06-10 23:04 Dave Hansen
  2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka

tl;dr: Make per-VMA locks available in all configs. Simplify some
of the per-VMA lock users now that they can rely on them being
always available.

Binder and networking folks: Your code is the target of the cleanups.
I'm cc'ing you now on v2 because there's emerging consensus on the mm
side that the approach here is sane. I'm not quite sure how this pile
would get merged, but ack/review tags would be appreciated if this
looks good to you.

Longer version:

When working on some x86 shadow stack code, it was a real pain to
avoid causing recursive locking problems with mmap_lock. One way
to avoid those was to avoid mmap_lock and use per-VMA locks instead.
They are great, but they are not available in all configs which
makes them unusable in generic code, or if you want to completely
avoid mmap_lock.

Make per-VMA locks available in all configs. Right now, they are
only available on select architectures when SMP and MMU are enabled.
But all of the primitives that per-VMA locks are built on (RCU, maple
trees, refcounts) work just fine without SMP or MMU.

The only real downside is making VMAs a wee bit bigger on !MMU
and !SMP builds.
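
To put a rough shape on "a wee bit bigger", here is a toy userspace
sketch. The struct layouts are made up for illustration; only the field
names vm_lock_seq and vm_refcnt are taken from the mm_types.h hunk in
patch 1, and a plain int stands in for refcount_t. The real
vm_area_struct has many more members, so its actual growth depends on
padding and on debug options such as CONFIG_DEBUG_LOCK_ALLOC.

```c
#include <assert.h>
#include <stddef.h>

/* Toy layout WITHOUT the per-VMA lock fields (hypothetical, not the kernel's). */
struct toy_vma_plain {
	unsigned long vm_start;
	unsigned long vm_end;
	void *vm_mm;
};

/* Same toy layout WITH the two fields that patch 1 makes unconditional. */
struct toy_vma_locked {
	unsigned long vm_start;
	unsigned long vm_end;
	void *vm_mm;
	unsigned int vm_lock_seq;	/* name taken from the mm_types.h hunk */
	int vm_refcnt;			/* plain int standing in for refcount_t */
};

/* Extra bytes per VMA in this toy layout only. */
static size_t toy_vma_growth(void)
{
	return sizeof(struct toy_vma_locked) - sizeof(struct toy_vma_plain);
}
```

In this toy layout the growth is the two 4-byte fields. That says
nothing precise about the real structure, which may absorb some of the
cost in existing padding or pay more for alignment.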

The upside is much cleaner code, lower complexity and less #ifdeffery.

Clean up a binder VMA locking site now that it can rely on per-VMA
locks.

Building on top of universally-available per-VMA locks, introduce a
new helper. Since the new API does not require callers to have a
fallback to mmap_lock, it's much easier to use. Callers can
potentially replace this very common kernel idiom:

	mmap_read_lock(mm);
	vma = vma_lookup(mm, address);
	// fiddle with vma
	mmap_read_unlock(mm);

with:

	vma = vma_start_read_unlocked(mm, address);
	// fiddle with vma
	vma_end_read(vma);

Which avoids mmap_lock entirely in the fast path.
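
For contrast, here is a userspace toy model of the caller shape that is
needed while per-VMA locks are optional. Every name below (toy_vma,
toy_mm, toy_try_lock_vma(), toy_touch_vma()) is hypothetical and exists
only for this sketch. The model uses plain counters instead of real
locks, and it is not how the new helper is implemented. It only shows
why each caller has to carry an mmap_lock fallback when the per-VMA
lock attempt can fail wholesale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel's types. */
struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	int readers;			/* models per-VMA read-lock holders */
};

struct toy_mm {
	struct toy_vma *vmas;
	size_t nr_vmas;
	bool per_vma_locks;		/* models per-VMA locks being built in or not */
	int mmap_lock_reads;		/* times the mmap_lock stand-in was taken */
};

/* Linear search standing in for the real VMA lookup. */
static struct toy_vma *toy_lookup(struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct toy_vma *vma = &mm->vmas[i];

		if (addr >= vma->start && addr < vma->end)
			return vma;
	}
	return NULL;
}

/* Models a per-VMA read-lock attempt; always fails if locks are compiled out. */
static struct toy_vma *toy_try_lock_vma(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma;

	if (!mm->per_vma_locks)
		return NULL;
	vma = toy_lookup(mm, addr);
	if (vma)
		vma->readers++;
	return vma;
}

/* Caller shape with optional per-VMA locks: a fallback path is mandatory. */
static bool toy_touch_vma(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma = toy_try_lock_vma(mm, addr);
	bool found;

	if (vma) {
		/* fiddle with vma */
		vma->readers--;		/* stands in for vma_end_read() */
		return true;
	}

	mm->mmap_lock_reads++;		/* stands in for mmap_read_lock() */
	found = toy_lookup(mm, addr) != NULL;
	/* fiddle with vma, then the mmap_read_unlock() stand-in */
	return found;
}
```

With per_vma_locks set to false, every call to toy_touch_vma() lands in
the fallback branch. Making the locks unconditional lets callers drop
that branch from their own code.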

Use that new API for another binder site and one in the TCP code.

Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

Changes from v1:
 * Better naming and non-loopy, simpler implementation.
   Thanks Suren and Lorenzo!
 * Cc networking and binder folks
 * Add tags. Thanks reviewers!
 * Drop x86 shadow stack changes

 arch/arm/Kconfig                       |    1 
 arch/arm64/Kconfig                     |    1 
 arch/loongarch/Kconfig                 |    1 
 arch/powerpc/platforms/powernv/Kconfig |    1 
 arch/powerpc/platforms/pseries/Kconfig |    1 
 arch/riscv/Kconfig                     |    1 
 arch/s390/Kconfig                      |    1 
 arch/x86/Kconfig                       |    2 -
 drivers/android/binder_alloc.c         |   43 ++++++++-----------------
 fs/proc/internal.h                     |    2 -
 fs/proc/task_mmu.c                     |   51 ------------------------------
 include/linux/mm.h                     |   12 -------
 include/linux/mm_types.h               |    7 ----
 include/linux/mmap_lock.h              |   51 +-----------------------------
 kernel/bpf/task_iter.c                 |    5 ---
 kernel/fork.c                          |    2 -
 mm/Kconfig                             |   13 -------
 mm/Kconfig.debug                       |    1 
 mm/debug.c                             |    4 --
 mm/init-mm.c                           |    2 -
 mm/memory.c                            |    2 -
 mm/mmap_lock.c                         |   51 ++++++++++++++++--------------
 mm/pagewalk.c                          |    2 -
 mm/rmap.c                              |    2 -
 mm/userfaultfd.c                       |   55 ---------------------------------
 net/ipv4/tcp.c                         |   31 +++++-------------
 rust/kernel/mm.rs                      |    7 ----
 tools/testing/vma/include/dup.h        |    4 --
 tools/testing/vma/vma_internal.h       |    1 
 29 files changed, 54 insertions(+), 303 deletions(-)


* [PATCH v2 1/5] mm: Make per-VMA locks available universally
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
@ 2026-06-10 23:04 ` Dave Hansen
  2026-06-11 19:29   ` Suren Baghdasaryan
  2026-06-12 14:12   ` Vlastimil Babka (SUSE)
  2026-06-10 23:04 ` [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka


From: Dave Hansen <dave.hansen@linux.intel.com>

The per-VMA locks have been around for several years. They've had some
bugs worked out of them and have seen quite wide use. However, they
are still only available when architectures explicitly enable them.
Remove the conditional compilation around the per-VMA locks, making
them available on all architectures and configs.

The approach up to now seemed to be to add ARCH_SUPPORTS_PER_VMA_LOCK
when the architecture started using per-VMA locks in the fault
handler. But, contrary to the naming, the Kconfig option does not
really indicate whether the architecture supports per-VMA locks or
not. It is more of a marker for whether the architecture is likely to
benefit from per-VMA locks.

To me, the most important side-effect of universal availability
is letting per-VMA locks be used in SMP=n configs. This lets us use
per-VMA locking in all x86 code without fallbacks.

Overall, this just generally makes the kernel simpler. Just look at
the diffstat. It also opens the door to users that want to use the
per-VMA locks in common code. Doing *that* brings additional
simplifications.

The downside of this is adding some fields to vm_area_struct and
mm_struct. There are likely ways to optimize this, especially for
things like SMP=n configs. For now, do the simplest thing: use the
same implementation everywhere.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

--

Changes from v1:
 * Remove a bunch of left over CONFIG_PER_VMA_LOCKs
 * Trim some speculation out of the changelog
---

 b/arch/arm/Kconfig                       |    1 
 b/arch/arm64/Kconfig                     |    1 
 b/arch/loongarch/Kconfig                 |    1 
 b/arch/powerpc/platforms/powernv/Kconfig |    1 
 b/arch/powerpc/platforms/pseries/Kconfig |    1 
 b/arch/riscv/Kconfig                     |    1 
 b/arch/s390/Kconfig                      |    1 
 b/arch/x86/Kconfig                       |    2 -
 b/fs/proc/internal.h                     |    2 -
 b/fs/proc/task_mmu.c                     |   51 ----------------------------
 b/include/linux/mm.h                     |   12 ------
 b/include/linux/mm_types.h               |    7 ---
 b/include/linux/mmap_lock.h              |   48 ---------------------------
 b/kernel/bpf/task_iter.c                 |    5 --
 b/kernel/fork.c                          |    2 -
 b/mm/Kconfig                             |   13 -------
 b/mm/Kconfig.debug                       |    1 
 b/mm/debug.c                             |    4 --
 b/mm/init-mm.c                           |    2 -
 b/mm/memory.c                            |    2 -
 b/mm/mmap_lock.c                         |   24 -------------
 b/mm/pagewalk.c                          |    2 -
 b/mm/rmap.c                              |    2 -
 b/mm/userfaultfd.c                       |   55 -------------------------------
 b/rust/kernel/mm.rs                      |    7 ---
 b/tools/testing/vma/include/dup.h        |    4 --
 b/tools/testing/vma/vma_internal.h       |    1 
 27 files changed, 1 insertion(+), 252 deletions(-)

diff -puN arch/arm64/Kconfig~unconditional-vma-locks arch/arm64/Kconfig
--- a/arch/arm64/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.491348630 -0700
+++ b/arch/arm64/Kconfig	2026-06-10 15:57:54.069369179 -0700
@@ -80,7 +80,6 @@ config ARM64
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SCHED_SMT
diff -puN arch/arm/Kconfig~unconditional-vma-locks arch/arm/Kconfig
--- a/arch/arm/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.499348914 -0700
+++ b/arch/arm/Kconfig	2026-06-10 15:57:54.070369215 -0700
@@ -41,7 +41,6 @@ config ARM
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_CFI
 	select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_RT
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
diff -puN arch/loongarch/Kconfig~unconditional-vma-locks arch/loongarch/Kconfig
--- a/arch/loongarch/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.542350439 -0700
+++ b/arch/loongarch/Kconfig	2026-06-10 15:57:54.070369215 -0700
@@ -68,7 +68,6 @@ config LOONGARCH
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SCHED_SMT if SMP
 	select ARCH_SUPPORTS_SCHED_MC  if SMP
diff -puN arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks arch/powerpc/platforms/powernv/Kconfig
--- a/arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.544350510 -0700
+++ b/arch/powerpc/platforms/powernv/Kconfig	2026-06-10 15:57:54.070369215 -0700
@@ -17,7 +17,6 @@ config PPC_POWERNV
 	select PPC_DOORBELL
 	select MMU_NOTIFIER
 	select FORCE_SMP
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
 	default y
 
diff -puN arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks arch/powerpc/platforms/pseries/Kconfig
--- a/arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.552350794 -0700
+++ b/arch/powerpc/platforms/pseries/Kconfig	2026-06-10 15:57:54.070369215 -0700
@@ -23,7 +23,6 @@ config PPC_PSERIES
 	select HOTPLUG_CPU
 	select FORCE_SMP
 	select SWIOTLB
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
 	default y
 
diff -puN arch/riscv/Kconfig~unconditional-vma-locks arch/riscv/Kconfig
--- a/arch/riscv/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.559351043 -0700
+++ b/arch/riscv/Kconfig	2026-06-10 15:57:54.070369215 -0700
@@ -70,7 +70,6 @@ config RISCV
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
-	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
 	select ARCH_SUPPORTS_SCHED_MC if SMP
diff -puN arch/s390/Kconfig~unconditional-vma-locks arch/s390/Kconfig
--- a/arch/s390/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.571351470 -0700
+++ b/arch/s390/Kconfig	2026-06-10 15:57:54.071369250 -0700
@@ -153,7 +153,6 @@ config S390
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_SYM_ANNOTATIONS
diff -puN arch/x86/Kconfig~unconditional-vma-locks arch/x86/Kconfig
--- a/arch/x86/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.577351684 -0700
+++ b/arch/x86/Kconfig	2026-06-10 15:57:54.071369250 -0700
@@ -27,7 +27,6 @@ config X86_64
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
-	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
@@ -1885,7 +1884,6 @@ config X86_USER_SHADOW_STACK
 	bool "X86 userspace shadow stack"
 	depends on AS_WRUSS
 	depends on X86_64
-	depends on PER_VMA_LOCK
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_USER_SHADOW_STACK
 	select X86_CET
diff -puN fs/proc/internal.h~unconditional-vma-locks fs/proc/internal.h
--- a/fs/proc/internal.h~unconditional-vma-locks	2026-06-10 15:57:53.579351755 -0700
+++ b/fs/proc/internal.h	2026-06-10 15:57:54.071369250 -0700
@@ -382,10 +382,8 @@ struct mem_size_stats;
 
 struct proc_maps_locking_ctx {
 	struct mm_struct *mm;
-#ifdef CONFIG_PER_VMA_LOCK
 	bool mmap_locked;
 	struct vm_area_struct *locked_vma;
-#endif
 };
 
 struct proc_maps_private {
diff -puN fs/proc/task_mmu.c~unconditional-vma-locks fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~unconditional-vma-locks	2026-06-10 15:57:53.594352288 -0700
+++ b/fs/proc/task_mmu.c	2026-06-10 15:57:54.072369286 -0700
@@ -130,8 +130,6 @@ static void release_task_mempolicy(struc
 }
 #endif
 
-#ifdef CONFIG_PER_VMA_LOCK
-
 static void reset_lock_ctx(struct proc_maps_locking_ctx *lock_ctx)
 {
 	lock_ctx->locked_vma = NULL;
@@ -213,33 +211,6 @@ static inline bool fallback_to_mmap_lock
 	return true;
 }
 
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool lock_vma_range(struct seq_file *m,
-				  struct proc_maps_locking_ctx *lock_ctx)
-{
-	return mmap_read_lock_killable(lock_ctx->mm) == 0;
-}
-
-static inline void unlock_vma_range(struct proc_maps_locking_ctx *lock_ctx)
-{
-	mmap_read_unlock(lock_ctx->mm);
-}
-
-static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
-					   loff_t last_pos)
-{
-	return vma_next(&priv->iter);
-}
-
-static inline bool fallback_to_mmap_lock(struct proc_maps_private *priv,
-					 loff_t pos)
-{
-	return false;
-}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 static struct vm_area_struct *proc_get_vma(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
@@ -527,8 +498,6 @@ static int pid_maps_open(struct inode *i
 		PROCMAP_QUERY_VMA_FLAGS				\
 )
 
-#ifdef CONFIG_PER_VMA_LOCK
-
 static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
 {
 	reset_lock_ctx(lock_ctx);
@@ -581,26 +550,6 @@ static struct vm_area_struct *query_vma_
 	return vma;
 }
 
-#else /* CONFIG_PER_VMA_LOCK */
-
-static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
-{
-	return mmap_read_lock_killable(lock_ctx->mm);
-}
-
-static void query_vma_teardown(struct proc_maps_locking_ctx *lock_ctx)
-{
-	mmap_read_unlock(lock_ctx->mm);
-}
-
-static struct vm_area_struct *query_vma_find_by_addr(struct proc_maps_locking_ctx *lock_ctx,
-						     unsigned long addr)
-{
-	return find_vma(lock_ctx->mm, addr);
-}
-
-#endif  /* CONFIG_PER_VMA_LOCK */
-
 static struct vm_area_struct *query_matching_vma(struct proc_maps_locking_ctx *lock_ctx,
 						 unsigned long addr, u32 flags)
 {
diff -puN include/linux/mmap_lock.h~unconditional-vma-locks include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~unconditional-vma-locks	2026-06-10 15:57:53.599352466 -0700
+++ b/include/linux/mmap_lock.h	2026-06-10 15:57:54.072369286 -0700
@@ -76,8 +76,6 @@ static inline void mmap_assert_write_loc
 	rwsem_assert_held_write(&mm->mmap_lock);
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
-
 #ifdef CONFIG_LOCKDEP
 #define __vma_lockdep_map(vma) (&vma->vmlock_dep_map)
 #else
@@ -484,52 +482,6 @@ struct vm_area_struct *lock_next_vma(str
 				     struct vma_iterator *iter,
 				     unsigned long address);
 
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
-static inline void mm_lock_seqcount_begin(struct mm_struct *mm) {}
-static inline void mm_lock_seqcount_end(struct mm_struct *mm) {}
-
-static inline bool mmap_lock_speculate_try_begin(struct mm_struct *mm, unsigned int *seq)
-{
-	return false;
-}
-
-static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int seq)
-{
-	return true;
-}
-static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
-static inline void vma_end_read(struct vm_area_struct *vma) {}
-static inline void vma_start_write(struct vm_area_struct *vma) {}
-static inline __must_check
-int vma_start_write_killable(struct vm_area_struct *vma) { return 0; }
-static inline void vma_assert_write_locked(struct vm_area_struct *vma)
-		{ mmap_assert_write_locked(vma->vm_mm); }
-static inline void vma_assert_attached(struct vm_area_struct *vma) {}
-static inline void vma_assert_detached(struct vm_area_struct *vma) {}
-static inline void vma_mark_attached(struct vm_area_struct *vma) {}
-static inline void vma_mark_detached(struct vm_area_struct *vma) {}
-
-static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
-		unsigned long address)
-{
-	return NULL;
-}
-
-static inline void vma_assert_locked(struct vm_area_struct *vma)
-{
-	mmap_assert_locked(vma->vm_mm);
-}
-
-static inline void vma_assert_stabilised(struct vm_area_struct *vma)
-{
-	/* If no VMA locks, then either mmap lock suffices to stabilise. */
-	mmap_assert_locked(vma->vm_mm);
-}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
diff -puN include/linux/mm.h~unconditional-vma-locks include/linux/mm.h
--- a/include/linux/mm.h~unconditional-vma-locks	2026-06-10 15:57:53.745357660 -0700
+++ b/include/linux/mm.h	2026-06-10 15:57:54.073369321 -0700
@@ -890,7 +890,6 @@ static inline void vma_numab_state_free(
  * These must be here rather than mmap_lock.h as dependent on vm_fault type,
  * declared in this header.
  */
-#ifdef CONFIG_PER_VMA_LOCK
 static inline void release_fault_lock(struct vm_fault *vmf)
 {
 	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
@@ -906,17 +905,6 @@ static inline void assert_fault_locked(c
 	else
 		mmap_assert_locked(vmf->vma->vm_mm);
 }
-#else
-static inline void release_fault_lock(struct vm_fault *vmf)
-{
-	mmap_read_unlock(vmf->vma->vm_mm);
-}
-
-static inline void assert_fault_locked(const struct vm_fault *vmf)
-{
-	mmap_assert_locked(vmf->vma->vm_mm);
-}
-#endif /* CONFIG_PER_VMA_LOCK */
 
 static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
 {
diff -puN include/linux/mm_types.h~unconditional-vma-locks include/linux/mm_types.h
--- a/include/linux/mm_types.h~unconditional-vma-locks	2026-06-10 15:57:53.763358300 -0700
+++ b/include/linux/mm_types.h	2026-06-10 15:57:54.074369357 -0700
@@ -959,7 +959,6 @@ struct vm_area_struct {
 		vma_flags_t flags;
 	};
 
-#ifdef CONFIG_PER_VMA_LOCK
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
@@ -975,7 +974,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-#endif
+
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
 	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
@@ -1007,7 +1006,6 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
-#ifdef CONFIG_PER_VMA_LOCK
 	/*
 	 * Used to keep track of firstly, whether the VMA is attached, secondly,
 	 * if attached, how many read locks are taken, and thirdly, if the
@@ -1050,7 +1048,6 @@ struct vm_area_struct {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map vmlock_dep_map;
 #endif
-#endif
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
@@ -1249,7 +1246,6 @@ struct mm_struct {
 					  * init_mm.mmlist, and are protected
 					  * by mmlist_lock
 					  */
-#ifdef CONFIG_PER_VMA_LOCK
 		struct rcuwait vma_writer_wait;
 		/*
 		 * This field has lock-like semantics, meaning it is sometimes
@@ -1269,7 +1265,6 @@ struct mm_struct {
 		 * mmap_lock.
 		 */
 		seqcount_t mm_lock_seq;
-#endif
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 		struct mutex			futex_hash_lock;
 		struct futex_private_hash	__rcu *futex_phash;
diff -puN kernel/bpf/task_iter.c~unconditional-vma-locks kernel/bpf/task_iter.c
--- a/kernel/bpf/task_iter.c~unconditional-vma-locks	2026-06-10 15:57:53.773358655 -0700
+++ b/kernel/bpf/task_iter.c	2026-06-10 15:57:54.074369357 -0700
@@ -835,11 +835,6 @@ __bpf_kfunc int bpf_iter_task_vma_new(st
 	BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
 	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
 
-	if (!IS_ENABLED(CONFIG_PER_VMA_LOCK)) {
-		kit->data = NULL;
-		return -EOPNOTSUPP;
-	}
-
 	/*
 	 * Reject irqs-disabled contexts including NMI. Operations used
 	 * by _next() and _destroy() (vma_end_read, fput, bpf_iter_mmput_async)
diff -puN kernel/fork.c~unconditional-vma-locks kernel/fork.c
--- a/kernel/fork.c~unconditional-vma-locks	2026-06-10 15:57:53.783359011 -0700
+++ b/kernel/fork.c	2026-06-10 15:57:54.074369357 -0700
@@ -1067,9 +1067,7 @@ static void mmap_init_lock(struct mm_str
 {
 	init_rwsem(&mm->mmap_lock);
 	mm_lock_seqcount_init(mm);
-#ifdef CONFIG_PER_VMA_LOCK
 	rcuwait_init(&mm->vma_writer_wait);
-#endif
 }
 
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
diff -puN mm/debug.c~unconditional-vma-locks mm/debug.c
--- a/mm/debug.c~unconditional-vma-locks	2026-06-10 15:57:53.785359082 -0700
+++ b/mm/debug.c	2026-06-10 15:57:54.075369392 -0700
@@ -157,17 +157,13 @@ void dump_vma(const struct vm_area_struc
 	pr_emerg("vma %px start %px end %px mm %px\n"
 		"prot %lx anon_vma %px vm_ops %px\n"
 		"pgoff %lx file %px private_data %px\n"
-#ifdef CONFIG_PER_VMA_LOCK
 		"refcnt %x\n"
-#endif
 		"flags: %#lx(%pGv)\n",
 		vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
 		(unsigned long)pgprot_val(vma->vm_page_prot),
 		vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
 		vma->vm_file, vma->vm_private_data,
-#ifdef CONFIG_PER_VMA_LOCK
 		refcount_read(&vma->vm_refcnt),
-#endif
 		vma->vm_flags, &vma->vm_flags);
 }
 EXPORT_SYMBOL(dump_vma);
diff -puN mm/init-mm.c~unconditional-vma-locks mm/init-mm.c
--- a/mm/init-mm.c~unconditional-vma-locks	2026-06-10 15:57:53.808359899 -0700
+++ b/mm/init-mm.c	2026-06-10 15:57:54.075369392 -0700
@@ -39,10 +39,8 @@ struct mm_struct init_mm = {
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
-#ifdef CONFIG_PER_VMA_LOCK
 	.vma_writer_wait = __RCUWAIT_INITIALIZER(init_mm.vma_writer_wait),
 	.mm_lock_seq	= SEQCNT_ZERO(init_mm.mm_lock_seq),
-#endif
 	.user_ns	= &init_user_ns,
 #ifdef CONFIG_SCHED_MM_CID
 	.mm_cid.lock = __RAW_SPIN_LOCK_UNLOCKED(init_mm.mm_cid.lock),
diff -puN mm/Kconfig~unconditional-vma-locks mm/Kconfig
--- a/mm/Kconfig~unconditional-vma-locks	2026-06-10 15:57:53.816360183 -0700
+++ b/mm/Kconfig	2026-06-10 15:57:54.075369392 -0700
@@ -1394,19 +1394,6 @@ config LRU_GEN_STATS
 config LRU_GEN_WALKS_MMU
 	def_bool y
 	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
-# }
-
-config ARCH_SUPPORTS_PER_VMA_LOCK
-       def_bool n
-
-config PER_VMA_LOCK
-	def_bool y
-	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
-	help
-	  Allow per-vma locking during page fault handling.
-
-	  This feature allows locking each virtual memory area separately when
-	  handling page faults instead of taking mmap_lock.
 
 config LOCK_MM_AND_FIND_VMA
 	bool
diff -puN mm/Kconfig.debug~unconditional-vma-locks mm/Kconfig.debug
--- a/mm/Kconfig.debug~unconditional-vma-locks	2026-06-10 15:57:53.820360326 -0700
+++ b/mm/Kconfig.debug	2026-06-10 15:57:54.075369392 -0700
@@ -310,7 +310,6 @@ config DEBUG_KMEMLEAK_VERBOSE
 
 config PER_VMA_LOCK_STATS
 	bool "Statistics for per-vma locks"
-	depends on PER_VMA_LOCK
 	help
 	  Say Y here to enable success, retry and failure counters of page
 	  faults handled under protection of per-vma locks. When enabled, the
diff -puN mm/memory.c~unconditional-vma-locks mm/memory.c
--- a/mm/memory.c~unconditional-vma-locks	2026-06-10 15:57:53.830360681 -0700
+++ b/mm/memory.c	2026-06-10 15:57:54.076369428 -0700
@@ -6659,7 +6659,6 @@ static vm_fault_t sanitize_fault_flags(s
 				 !is_cow_mapping(vma->vm_flags)))
 			return VM_FAULT_SIGSEGV;
 	}
-#ifdef CONFIG_PER_VMA_LOCK
 	/*
 	 * Per-VMA locks can't be used with FAULT_FLAG_RETRY_NOWAIT because of
 	 * the assumption that lock is dropped on VM_FAULT_RETRY.
@@ -6668,7 +6667,6 @@ static vm_fault_t sanitize_fault_flags(s
 			(FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)) ==
 			(FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)))
 		return VM_FAULT_SIGSEGV;
-#endif
 
 	return 0;
 }
diff -puN mm/mmap_lock.c~unconditional-vma-locks mm/mmap_lock.c
--- a/mm/mmap_lock.c~unconditional-vma-locks	2026-06-10 15:57:53.834360824 -0700
+++ b/mm/mmap_lock.c	2026-06-10 15:57:54.077369463 -0700
@@ -43,9 +43,6 @@ void __mmap_lock_do_trace_released(struc
 EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #endif /* CONFIG_TRACING */
 
-#ifdef CONFIG_MMU
-#ifdef CONFIG_PER_VMA_LOCK
-
 /* State shared across __vma_[start, end]_exclude_readers. */
 struct vma_exclude_readers_state {
 	/* Input parameters. */
@@ -431,7 +428,6 @@ fallback:
 
 	return vma;
 }
-#endif /* CONFIG_PER_VMA_LOCK */
 
 #ifdef CONFIG_LOCK_MM_AND_FIND_VMA
 #include <linux/extable.h>
@@ -548,23 +544,3 @@ fail:
 	return NULL;
 }
 #endif /* CONFIG_LOCK_MM_AND_FIND_VMA */
-
-#else /* CONFIG_MMU */
-
-/*
- * At least xtensa ends up having protection faults even with no
- * MMU.. No stack expansion, at least.
- */
-struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
-			unsigned long addr, struct pt_regs *regs)
-{
-	struct vm_area_struct *vma;
-
-	mmap_read_lock(mm);
-	vma = vma_lookup(mm, addr);
-	if (!vma)
-		mmap_read_unlock(mm);
-	return vma;
-}
-
-#endif /* CONFIG_MMU */
diff -puN mm/pagewalk.c~unconditional-vma-locks mm/pagewalk.c
--- a/mm/pagewalk.c~unconditional-vma-locks	2026-06-10 15:57:53.851361429 -0700
+++ b/mm/pagewalk.c	2026-06-10 15:57:54.077369463 -0700
@@ -446,7 +446,6 @@ static inline void process_mm_walk_lock(
 static inline void process_vma_walk_lock(struct vm_area_struct *vma,
 					 enum page_walk_lock walk_lock)
 {
-#ifdef CONFIG_PER_VMA_LOCK
 	switch (walk_lock) {
 	case PGWALK_WRLOCK:
 		vma_start_write(vma);
@@ -461,7 +460,6 @@ static inline void process_vma_walk_lock
 		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
 		break;
 	}
-#endif
 }
 
 /*
diff -puN mm/rmap.c~unconditional-vma-locks mm/rmap.c
--- a/mm/rmap.c~unconditional-vma-locks	2026-06-10 15:57:54.018367366 -0700
+++ b/mm/rmap.c	2026-06-10 15:57:54.077369463 -0700
@@ -260,11 +260,9 @@ static void check_anon_vma_clone(struct
 	/* For the anon_vma to be compatible, it can only be singular. */
 	VM_WARN_ON_ONCE(operation == VMA_OP_MERGE_UNFAULTED &&
 			!list_is_singular(&src->anon_vma_chain));
-#ifdef CONFIG_PER_VMA_LOCK
 	/* Only merging an unfaulted VMA leaves the destination attached. */
 	VM_WARN_ON_ONCE(operation != VMA_OP_MERGE_UNFAULTED &&
 			vma_is_attached(dst));
-#endif
 }
 
 static void maybe_reuse_anon_vma(struct vm_area_struct *dst,
diff -puN mm/userfaultfd.c~unconditional-vma-locks mm/userfaultfd.c
--- a/mm/userfaultfd.c~unconditional-vma-locks	2026-06-10 15:57:54.049368468 -0700
+++ b/mm/userfaultfd.c	2026-06-10 15:57:54.078369499 -0700
@@ -104,7 +104,6 @@ struct vm_area_struct *find_vma_and_prep
 	return vma;
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
 /*
  * uffd_lock_vma() - Lookup and lock vma corresponding to @address.
  * @mm: mm to search vma in.
@@ -164,34 +163,6 @@ static void uffd_mfill_unlock(struct vm_
 	vma_end_read(vma);
 }
 
-#else
-
-static struct vm_area_struct *uffd_mfill_lock(struct mm_struct *dst_mm,
-					      unsigned long dst_start,
-					      unsigned long len)
-{
-	struct vm_area_struct *dst_vma;
-
-	mmap_read_lock(dst_mm);
-	dst_vma = find_vma_and_prepare_anon(dst_mm, dst_start);
-	if (IS_ERR(dst_vma))
-		goto out_unlock;
-
-	if (validate_dst_vma(dst_vma, dst_start + len))
-		return dst_vma;
-
-	dst_vma = ERR_PTR(-ENOENT);
-out_unlock:
-	mmap_read_unlock(dst_mm);
-	return dst_vma;
-}
-
-static void uffd_mfill_unlock(struct vm_area_struct *vma)
-{
-	mmap_read_unlock(vma->vm_mm);
-}
-#endif
-
 static void mfill_put_vma(struct mfill_state *state)
 {
 	if (!state->vma)
@@ -1672,7 +1643,6 @@ out_success:
 	return 0;
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
 static int uffd_move_lock(struct mm_struct *mm,
 			  unsigned long dst_start,
 			  unsigned long src_start,
@@ -1747,31 +1717,6 @@ static void uffd_move_unlock(struct vm_a
 		vma_end_read(dst_vma);
 }
 
-#else
-
-static int uffd_move_lock(struct mm_struct *mm,
-			  unsigned long dst_start,
-			  unsigned long src_start,
-			  struct vm_area_struct **dst_vmap,
-			  struct vm_area_struct **src_vmap)
-{
-	int err;
-
-	mmap_read_lock(mm);
-	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
-	if (err)
-		mmap_read_unlock(mm);
-	return err;
-}
-
-static void uffd_move_unlock(struct vm_area_struct *dst_vma,
-			     struct vm_area_struct *src_vma)
-{
-	mmap_assert_locked(src_vma->vm_mm);
-	mmap_read_unlock(dst_vma->vm_mm);
-}
-#endif
-
 /**
  * move_pages - move arbitrary anonymous pages of an existing vma
  * @ctx: pointer to the userfaultfd context
diff -puN rust/kernel/mm.rs~unconditional-vma-locks rust/kernel/mm.rs
--- a/rust/kernel/mm.rs~unconditional-vma-locks	2026-06-10 15:57:54.051368539 -0700
+++ b/rust/kernel/mm.rs	2026-06-10 15:57:54.078369499 -0700
@@ -174,7 +174,6 @@ impl MmWithUser {
     /// When per-vma locks are disabled, this always returns `None`.
     #[inline]
     pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
-        #[cfg(CONFIG_PER_VMA_LOCK)]
         {
             // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
             // `mm_users` is non-zero.
@@ -188,12 +187,6 @@ impl MmWithUser {
                 });
             }
         }
-
-        // Silence warnings about unused variables.
-        #[cfg(not(CONFIG_PER_VMA_LOCK))]
-        let _ = vma_addr;
-
-        None
     }
 
     /// Lock the mmap read lock.
diff -puN tools/testing/vma/include/dup.h~unconditional-vma-locks tools/testing/vma/include/dup.h
--- a/tools/testing/vma/include/dup.h~unconditional-vma-locks	2026-06-10 15:57:54.064369001 -0700
+++ b/tools/testing/vma/include/dup.h	2026-06-10 15:57:54.078369499 -0700
@@ -569,7 +569,6 @@ struct vm_area_struct {
 		vma_flags_t flags;
 	};
 
-#ifdef CONFIG_PER_VMA_LOCK
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
@@ -585,7 +584,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-#endif
 
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
@@ -618,10 +616,8 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
-#ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
 	refcount_t vm_refcnt;
-#endif
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
diff -puN tools/testing/vma/vma_internal.h~unconditional-vma-locks tools/testing/vma/vma_internal.h
--- a/tools/testing/vma/vma_internal.h~unconditional-vma-locks	2026-06-10 15:57:54.066369072 -0700
+++ b/tools/testing/vma/vma_internal.h	2026-06-10 15:57:54.078369499 -0700
@@ -15,7 +15,6 @@
 #include <stdlib.h>
 
 #define CONFIG_MMU
-#define CONFIG_PER_VMA_LOCK
 
 #ifdef __CONCAT
 #undef __CONCAT
_


* Re: [PATCH v2 1/5] mm: Make per-VMA locks available universally
  2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
@ 2026-06-11 19:29   ` Suren Baghdasaryan
  2026-06-12 14:09     ` Vlastimil Babka (SUSE)
  2026-06-12 14:12   ` Vlastimil Babka (SUSE)
  1 sibling, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-11 19:29 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos, Vlastimil Babka

On Wed, Jun 10, 2026 at 4:04 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> The per-VMA locks have been around for several years. They've had some
> bugs worked out of them and have seen quite wide use. However, they
> are still only available when architectures explicitly enable them.
> Remove the conditional compilation around the per-VMA locks, making
> them available on all architectures and configs.
>
> The approach up to now seemed to be to add ARCH_SUPPORTS_PER_VMA_LOCK
> when the architecture started using per-VMA locks in the fault
> handler. But, contrary to the naming, the Kconfig option does not
> really indicate whether the architecture supports per-VMA locks or
> not. It is more of a marker for whether the architecture is likely to
> benefit from per-VMA locks.

Correct. Originally per-VMA locks were used only in the page fault
handling path and architectures that used them in that path would set
ARCH_SUPPORTS_PER_VMA_LOCK. Over time these locks came to be used in
more places, so ARCH_SUPPORTS_PER_VMA_LOCK lost its meaning and can
indeed be removed.
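
For readers less familiar with the calling pattern being discussed, here
is a minimal userspace sketch of "try the per-VMA lock first, fall back
to the coarse lock". It is only an illustration: the mock_* types and
functions are hypothetical stand-ins invented for this example, not
kernel APIs. In the kernel the corresponding primitives are the ones
visible in the diff (lock_vma_under_rcu(), vma_end_read(),
mmap_read_lock(), mmap_read_unlock()), and the real implementation
relies on RCU and refcounts rather than plain counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct mock_vma {
	unsigned long start, end;	/* covers [start, end) */
	int readers;			/* per-VMA read locks held */
	bool write_locked;		/* a writer is excluding readers */
};

struct mock_mm {
	struct mock_vma *vmas;
	size_t nr_vmas;
	int mmap_readers;		/* coarse read locks held */
};

enum mock_lock { MOCK_NONE, MOCK_VMA, MOCK_MMAP };

/* Plain lookup of the VMA covering addr, no locking involved. */
static struct mock_vma *mock_lookup(struct mock_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct mock_vma *vma = &mm->vmas[i];

		if (addr >= vma->start && addr < vma->end)
			return vma;
	}
	return NULL;
}

/* Fast path: read-lock only the covering VMA; NULL means "use the fallback". */
static struct mock_vma *mock_lock_vma(struct mock_mm *mm, unsigned long addr)
{
	struct mock_vma *vma = mock_lookup(mm, addr);

	if (!vma || vma->write_locked)
		return NULL;
	vma->readers++;
	return vma;
}

/* Try the fine-grained lock first, then fall back to the coarse one. */
static enum mock_lock mock_find_and_lock(struct mock_mm *mm, unsigned long addr,
					 struct mock_vma **vmap)
{
	*vmap = mock_lock_vma(mm, addr);
	if (*vmap)
		return MOCK_VMA;

	mm->mmap_readers++;
	*vmap = mock_lookup(mm, addr);
	if (*vmap)
		return MOCK_MMAP;

	mm->mmap_readers--;	/* nothing mapped at addr: drop the coarse lock */
	return MOCK_NONE;
}

/* Release whichever lock mock_find_and_lock() reported. */
static void mock_unlock(struct mock_mm *mm, struct mock_vma *vma,
			enum mock_lock how)
{
	if (how == MOCK_VMA) {
		assert(vma->readers > 0);
		vma->readers--;
	} else if (how == MOCK_MMAP) {
		assert(mm->mmap_readers > 0);
		mm->mmap_readers--;
	}
}
```

The point of the sketch is that the fast path is always attempted. The
compile-time stub removed by this patch made the fast path return NULL
unconditionally in some configs, so callers there only ever took the
fallback.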


>
> To me, the most important side-effect of universal availability
> is letting per-VMA locks be used in SMP=n configs. This lets us use
> per-VMA locking in all x86 code without fallbacks.
>
> Overall, this just generally makes the kernel simpler. Just look at
> the diffstat. It also opens the door to users that want to use the
> per-VMA locks in common code. Doing *that* brings additional
> simplifications.
>
> The downside of this is adding some fields to vm_area_struct and
> mm_struct. There are likely ways to optimize this, especially for
> things like SMP=n configs. For now, do the simplest thing: use the
> same implementation everywhere.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
>
> --
>
> Changes from v1:
>  * Remove a bunch of left over CONFIG_PER_VMA_LOCKs
>  * Trim some speculation out of the changelog
> ---
>
>  b/arch/arm/Kconfig                       |    1
>  b/arch/arm64/Kconfig                     |    1
>  b/arch/loongarch/Kconfig                 |    1
>  b/arch/powerpc/platforms/powernv/Kconfig |    1
>  b/arch/powerpc/platforms/pseries/Kconfig |    1
>  b/arch/riscv/Kconfig                     |    1
>  b/arch/s390/Kconfig                      |    1
>  b/arch/x86/Kconfig                       |    2 -
>  b/fs/proc/internal.h                     |    2 -
>  b/fs/proc/task_mmu.c                     |   51 ----------------------------
>  b/include/linux/mm.h                     |   12 ------
>  b/include/linux/mm_types.h               |    7 ---
>  b/include/linux/mmap_lock.h              |   48 ---------------------------
>  b/kernel/bpf/task_iter.c                 |    5 --
>  b/kernel/fork.c                          |    2 -
>  b/mm/Kconfig                             |   13 -------
>  b/mm/Kconfig.debug                       |    1
>  b/mm/debug.c                             |    4 --
>  b/mm/init-mm.c                           |    2 -
>  b/mm/memory.c                            |    2 -
>  b/mm/mmap_lock.c                         |   24 -------------
>  b/mm/pagewalk.c                          |    2 -
>  b/mm/rmap.c                              |    2 -
>  b/mm/userfaultfd.c                       |   55 -------------------------------
>  b/rust/kernel/mm.rs                      |    7 ---
>  b/tools/testing/vma/include/dup.h        |    4 --
>  b/tools/testing/vma/vma_internal.h       |    1
>  27 files changed, 1 insertion(+), 252 deletions(-)
>
> diff -puN arch/arm64/Kconfig~unconditional-vma-locks arch/arm64/Kconfig
> --- a/arch/arm64/Kconfig~unconditional-vma-locks        2026-06-10 15:57:53.491348630 -0700
> +++ b/arch/arm64/Kconfig        2026-06-10 15:57:54.069369179 -0700
> @@ -80,7 +80,6 @@ config ARM64
>         select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>         select ARCH_SUPPORTS_NUMA_BALANCING
>         select ARCH_SUPPORTS_PAGE_TABLE_CHECK
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
>         select ARCH_SUPPORTS_RT
>         select ARCH_SUPPORTS_SCHED_SMT
> diff -puN arch/arm/Kconfig~unconditional-vma-locks arch/arm/Kconfig
> --- a/arch/arm/Kconfig~unconditional-vma-locks  2026-06-10 15:57:53.499348914 -0700
> +++ b/arch/arm/Kconfig  2026-06-10 15:57:54.070369215 -0700
> @@ -41,7 +41,6 @@ config ARM
>         select ARCH_SUPPORTS_ATOMIC_RMW
>         select ARCH_SUPPORTS_CFI
>         select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_SUPPORTS_RT
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_CMPXCHG_LOCKREF
> diff -puN arch/loongarch/Kconfig~unconditional-vma-locks arch/loongarch/Kconfig
> --- a/arch/loongarch/Kconfig~unconditional-vma-locks    2026-06-10 15:57:53.542350439 -0700
> +++ b/arch/loongarch/Kconfig    2026-06-10 15:57:54.070369215 -0700
> @@ -68,7 +68,6 @@ config LOONGARCH
>         select ARCH_SUPPORTS_LTO_CLANG_THIN
>         select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
>         select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_SUPPORTS_RT
>         select ARCH_SUPPORTS_SCHED_SMT if SMP
>         select ARCH_SUPPORTS_SCHED_MC  if SMP
> diff -puN arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks arch/powerpc/platforms/powernv/Kconfig
> --- a/arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks    2026-06-10 15:57:53.544350510 -0700
> +++ b/arch/powerpc/platforms/powernv/Kconfig    2026-06-10 15:57:54.070369215 -0700
> @@ -17,7 +17,6 @@ config PPC_POWERNV
>         select PPC_DOORBELL
>         select MMU_NOTIFIER
>         select FORCE_SMP
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
>         default y
>
> diff -puN arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks arch/powerpc/platforms/pseries/Kconfig
> --- a/arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks    2026-06-10 15:57:53.552350794 -0700
> +++ b/arch/powerpc/platforms/pseries/Kconfig    2026-06-10 15:57:54.070369215 -0700
> @@ -23,7 +23,6 @@ config PPC_PSERIES
>         select HOTPLUG_CPU
>         select FORCE_SMP
>         select SWIOTLB
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
>         default y
>
> diff -puN arch/riscv/Kconfig~unconditional-vma-locks arch/riscv/Kconfig
> --- a/arch/riscv/Kconfig~unconditional-vma-locks        2026-06-10 15:57:53.559351043 -0700
> +++ b/arch/riscv/Kconfig        2026-06-10 15:57:54.070369215 -0700
> @@ -70,7 +70,6 @@ config RISCV
>         select ARCH_SUPPORTS_LTO_CLANG_THIN
>         select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
>         select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
> -       select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
>         select ARCH_SUPPORTS_RT
>         select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
>         select ARCH_SUPPORTS_SCHED_MC if SMP
> diff -puN arch/s390/Kconfig~unconditional-vma-locks arch/s390/Kconfig
> --- a/arch/s390/Kconfig~unconditional-vma-locks 2026-06-10 15:57:53.571351470 -0700
> +++ b/arch/s390/Kconfig 2026-06-10 15:57:54.071369250 -0700
> @@ -153,7 +153,6 @@ config S390
>         select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
>         select ARCH_SUPPORTS_NUMA_BALANCING
>         select ARCH_SUPPORTS_PAGE_TABLE_CHECK
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_USE_BUILTIN_BSWAP
>         select ARCH_USE_CMPXCHG_LOCKREF
>         select ARCH_USE_SYM_ANNOTATIONS
> diff -puN arch/x86/Kconfig~unconditional-vma-locks arch/x86/Kconfig
> --- a/arch/x86/Kconfig~unconditional-vma-locks  2026-06-10 15:57:53.577351684 -0700
> +++ b/arch/x86/Kconfig  2026-06-10 15:57:54.071369250 -0700
> @@ -27,7 +27,6 @@ config X86_64
>         select ARCH_HAS_GIGANTIC_PAGE
>         select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
>         select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
> -       select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
>         select HAVE_ARCH_SOFT_DIRTY
>         select MODULES_USE_ELF_RELA
> @@ -1885,7 +1884,6 @@ config X86_USER_SHADOW_STACK
>         bool "X86 userspace shadow stack"
>         depends on AS_WRUSS
>         depends on X86_64
> -       depends on PER_VMA_LOCK
>         select ARCH_USES_HIGH_VMA_FLAGS
>         select ARCH_HAS_USER_SHADOW_STACK
>         select X86_CET
> diff -puN fs/proc/internal.h~unconditional-vma-locks fs/proc/internal.h
> --- a/fs/proc/internal.h~unconditional-vma-locks        2026-06-10 15:57:53.579351755 -0700
> +++ b/fs/proc/internal.h        2026-06-10 15:57:54.071369250 -0700
> @@ -382,10 +382,8 @@ struct mem_size_stats;
>
>  struct proc_maps_locking_ctx {
>         struct mm_struct *mm;
> -#ifdef CONFIG_PER_VMA_LOCK
>         bool mmap_locked;
>         struct vm_area_struct *locked_vma;
> -#endif
>  };
>
>  struct proc_maps_private {
> diff -puN fs/proc/task_mmu.c~unconditional-vma-locks fs/proc/task_mmu.c
> --- a/fs/proc/task_mmu.c~unconditional-vma-locks        2026-06-10 15:57:53.594352288 -0700
> +++ b/fs/proc/task_mmu.c        2026-06-10 15:57:54.072369286 -0700
> @@ -130,8 +130,6 @@ static void release_task_mempolicy(struc
>  }
>  #endif
>
> -#ifdef CONFIG_PER_VMA_LOCK
> -

A bunch of these helpers can be removed too but I'll do that cleanup
as a followup. Let's keep this one simple.

>  static void reset_lock_ctx(struct proc_maps_locking_ctx *lock_ctx)
>  {
>         lock_ctx->locked_vma = NULL;
> @@ -213,33 +211,6 @@ static inline bool fallback_to_mmap_lock
>         return true;
>  }
>
> -#else /* CONFIG_PER_VMA_LOCK */
> -
> -static inline bool lock_vma_range(struct seq_file *m,
> -                                 struct proc_maps_locking_ctx *lock_ctx)
> -{
> -       return mmap_read_lock_killable(lock_ctx->mm) == 0;
> -}
> -
> -static inline void unlock_vma_range(struct proc_maps_locking_ctx *lock_ctx)
> -{
> -       mmap_read_unlock(lock_ctx->mm);
> -}
> -
> -static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
> -                                          loff_t last_pos)
> -{
> -       return vma_next(&priv->iter);
> -}
> -
> -static inline bool fallback_to_mmap_lock(struct proc_maps_private *priv,
> -                                        loff_t pos)
> -{
> -       return false;
> -}
> -
> -#endif /* CONFIG_PER_VMA_LOCK */
> -
>  static struct vm_area_struct *proc_get_vma(struct seq_file *m, loff_t *ppos)
>  {
>         struct proc_maps_private *priv = m->private;
> @@ -527,8 +498,6 @@ static int pid_maps_open(struct inode *i
>                 PROCMAP_QUERY_VMA_FLAGS                         \
>  )
>
> -#ifdef CONFIG_PER_VMA_LOCK
> -
>  static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
>  {
>         reset_lock_ctx(lock_ctx);
> @@ -581,26 +550,6 @@ static struct vm_area_struct *query_vma_
>         return vma;
>  }
>
> -#else /* CONFIG_PER_VMA_LOCK */
> -
> -static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
> -{
> -       return mmap_read_lock_killable(lock_ctx->mm);
> -}
> -
> -static void query_vma_teardown(struct proc_maps_locking_ctx *lock_ctx)
> -{
> -       mmap_read_unlock(lock_ctx->mm);
> -}
> -
> -static struct vm_area_struct *query_vma_find_by_addr(struct proc_maps_locking_ctx *lock_ctx,
> -                                                    unsigned long addr)
> -{
> -       return find_vma(lock_ctx->mm, addr);
> -}
> -
> -#endif  /* CONFIG_PER_VMA_LOCK */
> -
>  static struct vm_area_struct *query_matching_vma(struct proc_maps_locking_ctx *lock_ctx,
>                                                  unsigned long addr, u32 flags)
>  {
> diff -puN include/linux/mmap_lock.h~unconditional-vma-locks include/linux/mmap_lock.h
> --- a/include/linux/mmap_lock.h~unconditional-vma-locks 2026-06-10 15:57:53.599352466 -0700
> +++ b/include/linux/mmap_lock.h 2026-06-10 15:57:54.072369286 -0700
> @@ -76,8 +76,6 @@ static inline void mmap_assert_write_loc
>         rwsem_assert_held_write(&mm->mmap_lock);
>  }
>
> -#ifdef CONFIG_PER_VMA_LOCK
> -
>  #ifdef CONFIG_LOCKDEP
>  #define __vma_lockdep_map(vma) (&vma->vmlock_dep_map)
>  #else
> @@ -484,52 +482,6 @@ struct vm_area_struct *lock_next_vma(str
>                                      struct vma_iterator *iter,
>                                      unsigned long address);
>
> -#else /* CONFIG_PER_VMA_LOCK */
> -
> -static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
> -static inline void mm_lock_seqcount_begin(struct mm_struct *mm) {}
> -static inline void mm_lock_seqcount_end(struct mm_struct *mm) {}
> -
> -static inline bool mmap_lock_speculate_try_begin(struct mm_struct *mm, unsigned int *seq)
> -{
> -       return false;
> -}
> -
> -static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int seq)
> -{
> -       return true;
> -}
> -static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
> -static inline void vma_end_read(struct vm_area_struct *vma) {}
> -static inline void vma_start_write(struct vm_area_struct *vma) {}
> -static inline __must_check
> -int vma_start_write_killable(struct vm_area_struct *vma) { return 0; }
> -static inline void vma_assert_write_locked(struct vm_area_struct *vma)
> -               { mmap_assert_write_locked(vma->vm_mm); }
> -static inline void vma_assert_attached(struct vm_area_struct *vma) {}
> -static inline void vma_assert_detached(struct vm_area_struct *vma) {}
> -static inline void vma_mark_attached(struct vm_area_struct *vma) {}
> -static inline void vma_mark_detached(struct vm_area_struct *vma) {}
> -
> -static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> -               unsigned long address)
> -{
> -       return NULL;
> -}
> -
> -static inline void vma_assert_locked(struct vm_area_struct *vma)
> -{
> -       mmap_assert_locked(vma->vm_mm);
> -}
> -
> -static inline void vma_assert_stabilised(struct vm_area_struct *vma)
> -{
> -       /* If no VMA locks, then either mmap lock suffices to stabilise. */
> -       mmap_assert_locked(vma->vm_mm);
> -}
> -
> -#endif /* CONFIG_PER_VMA_LOCK */
> -
>  static inline void mmap_write_lock(struct mm_struct *mm)
>  {
>         __mmap_lock_trace_start_locking(mm, true);
> diff -puN include/linux/mm.h~unconditional-vma-locks include/linux/mm.h
> --- a/include/linux/mm.h~unconditional-vma-locks        2026-06-10 15:57:53.745357660 -0700
> +++ b/include/linux/mm.h        2026-06-10 15:57:54.073369321 -0700
> @@ -890,7 +890,6 @@ static inline void vma_numab_state_free(
>   * These must be here rather than mmap_lock.h as dependent on vm_fault type,
>   * declared in this header.
>   */
> -#ifdef CONFIG_PER_VMA_LOCK
>  static inline void release_fault_lock(struct vm_fault *vmf)
>  {
>         if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> @@ -906,17 +905,6 @@ static inline void assert_fault_locked(c
>         else
>                 mmap_assert_locked(vmf->vma->vm_mm);
>  }
> -#else
> -static inline void release_fault_lock(struct vm_fault *vmf)
> -{
> -       mmap_read_unlock(vmf->vma->vm_mm);
> -}
> -
> -static inline void assert_fault_locked(const struct vm_fault *vmf)
> -{
> -       mmap_assert_locked(vmf->vma->vm_mm);
> -}
> -#endif /* CONFIG_PER_VMA_LOCK */
>
>  static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
>  {
> diff -puN include/linux/mm_types.h~unconditional-vma-locks include/linux/mm_types.h
> --- a/include/linux/mm_types.h~unconditional-vma-locks  2026-06-10 15:57:53.763358300 -0700
> +++ b/include/linux/mm_types.h  2026-06-10 15:57:54.074369357 -0700
> @@ -959,7 +959,6 @@ struct vm_area_struct {
>                 vma_flags_t flags;
>         };
>
> -#ifdef CONFIG_PER_VMA_LOCK
>         /*
>          * Can only be written (using WRITE_ONCE()) while holding both:
>          *  - mmap_lock (in write mode)
> @@ -975,7 +974,7 @@ struct vm_area_struct {
>          * slowpath.
>          */
>         unsigned int vm_lock_seq;
> -#endif
> +
>         /*
>          * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
>          * list, after a COW of one of the file pages.  A MAP_SHARED vma
> @@ -1007,7 +1006,6 @@ struct vm_area_struct {
>  #ifdef CONFIG_NUMA_BALANCING
>         struct vma_numab_state *numab_state;    /* NUMA Balancing state */
>  #endif
> -#ifdef CONFIG_PER_VMA_LOCK
>         /*
>          * Used to keep track of firstly, whether the VMA is attached, secondly,
>          * if attached, how many read locks are taken, and thirdly, if the
> @@ -1050,7 +1048,6 @@ struct vm_area_struct {
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>         struct lockdep_map vmlock_dep_map;
>  #endif
> -#endif
>         /*
>          * For areas with an address space and backing store,
>          * linkage into the address_space->i_mmap interval tree.
> @@ -1249,7 +1246,6 @@ struct mm_struct {
>                                           * init_mm.mmlist, and are protected
>                                           * by mmlist_lock
>                                           */
> -#ifdef CONFIG_PER_VMA_LOCK
>                 struct rcuwait vma_writer_wait;
>                 /*
>                  * This field has lock-like semantics, meaning it is sometimes
> @@ -1269,7 +1265,6 @@ struct mm_struct {
>                  * mmap_lock.
>                  */
>                 seqcount_t mm_lock_seq;
> -#endif
>  #ifdef CONFIG_FUTEX_PRIVATE_HASH
>                 struct mutex                    futex_hash_lock;
>                 struct futex_private_hash       __rcu *futex_phash;
> diff -puN kernel/bpf/task_iter.c~unconditional-vma-locks kernel/bpf/task_iter.c
> --- a/kernel/bpf/task_iter.c~unconditional-vma-locks    2026-06-10 15:57:53.773358655 -0700
> +++ b/kernel/bpf/task_iter.c    2026-06-10 15:57:54.074369357 -0700
> @@ -835,11 +835,6 @@ __bpf_kfunc int bpf_iter_task_vma_new(st
>         BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
>         BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
>
> -       if (!IS_ENABLED(CONFIG_PER_VMA_LOCK)) {
> -               kit->data = NULL;
> -               return -EOPNOTSUPP;
> -       }
> -
>         /*
>          * Reject irqs-disabled contexts including NMI. Operations used
>          * by _next() and _destroy() (vma_end_read, fput, bpf_iter_mmput_async)
> diff -puN kernel/fork.c~unconditional-vma-locks kernel/fork.c
> --- a/kernel/fork.c~unconditional-vma-locks     2026-06-10 15:57:53.783359011 -0700
> +++ b/kernel/fork.c     2026-06-10 15:57:54.074369357 -0700
> @@ -1067,9 +1067,7 @@ static void mmap_init_lock(struct mm_str
>  {
>         init_rwsem(&mm->mmap_lock);
>         mm_lock_seqcount_init(mm);
> -#ifdef CONFIG_PER_VMA_LOCK
>         rcuwait_init(&mm->vma_writer_wait);
> -#endif
>  }
>
>  static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> diff -puN mm/debug.c~unconditional-vma-locks mm/debug.c
> --- a/mm/debug.c~unconditional-vma-locks        2026-06-10 15:57:53.785359082 -0700
> +++ b/mm/debug.c        2026-06-10 15:57:54.075369392 -0700
> @@ -157,17 +157,13 @@ void dump_vma(const struct vm_area_struc
>         pr_emerg("vma %px start %px end %px mm %px\n"
>                 "prot %lx anon_vma %px vm_ops %px\n"
>                 "pgoff %lx file %px private_data %px\n"
> -#ifdef CONFIG_PER_VMA_LOCK
>                 "refcnt %x\n"
> -#endif
>                 "flags: %#lx(%pGv)\n",
>                 vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
>                 (unsigned long)pgprot_val(vma->vm_page_prot),
>                 vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
>                 vma->vm_file, vma->vm_private_data,
> -#ifdef CONFIG_PER_VMA_LOCK
>                 refcount_read(&vma->vm_refcnt),
> -#endif
>                 vma->vm_flags, &vma->vm_flags);
>  }
>  EXPORT_SYMBOL(dump_vma);
> diff -puN mm/init-mm.c~unconditional-vma-locks mm/init-mm.c
> --- a/mm/init-mm.c~unconditional-vma-locks      2026-06-10 15:57:53.808359899 -0700
> +++ b/mm/init-mm.c      2026-06-10 15:57:54.075369392 -0700
> @@ -39,10 +39,8 @@ struct mm_struct init_mm = {
>         .page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
>         .arg_lock       =  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
>         .mmlist         = LIST_HEAD_INIT(init_mm.mmlist),
> -#ifdef CONFIG_PER_VMA_LOCK
>         .vma_writer_wait = __RCUWAIT_INITIALIZER(init_mm.vma_writer_wait),
>         .mm_lock_seq    = SEQCNT_ZERO(init_mm.mm_lock_seq),
> -#endif
>         .user_ns        = &init_user_ns,
>  #ifdef CONFIG_SCHED_MM_CID
>         .mm_cid.lock = __RAW_SPIN_LOCK_UNLOCKED(init_mm.mm_cid.lock),
> diff -puN mm/Kconfig~unconditional-vma-locks mm/Kconfig
> --- a/mm/Kconfig~unconditional-vma-locks        2026-06-10 15:57:53.816360183 -0700
> +++ b/mm/Kconfig        2026-06-10 15:57:54.075369392 -0700
> @@ -1394,19 +1394,6 @@ config LRU_GEN_STATS
>  config LRU_GEN_WALKS_MMU
>         def_bool y
>         depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> -# }
> -
> -config ARCH_SUPPORTS_PER_VMA_LOCK
> -       def_bool n
> -
> -config PER_VMA_LOCK
> -       def_bool y
> -       depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
> -       help
> -         Allow per-vma locking during page fault handling.
> -
> -         This feature allows locking each virtual memory area separately when
> -         handling page faults instead of taking mmap_lock.
>
>  config LOCK_MM_AND_FIND_VMA
>         bool
> diff -puN mm/Kconfig.debug~unconditional-vma-locks mm/Kconfig.debug
> --- a/mm/Kconfig.debug~unconditional-vma-locks  2026-06-10 15:57:53.820360326 -0700
> +++ b/mm/Kconfig.debug  2026-06-10 15:57:54.075369392 -0700
> @@ -310,7 +310,6 @@ config DEBUG_KMEMLEAK_VERBOSE
>
>  config PER_VMA_LOCK_STATS
>         bool "Statistics for per-vma locks"
> -       depends on PER_VMA_LOCK
>         help
>           Say Y here to enable success, retry and failure counters of page
>           faults handled under protection of per-vma locks. When enabled, the
> diff -puN mm/memory.c~unconditional-vma-locks mm/memory.c
> --- a/mm/memory.c~unconditional-vma-locks       2026-06-10 15:57:53.830360681 -0700
> +++ b/mm/memory.c       2026-06-10 15:57:54.076369428 -0700
> @@ -6659,7 +6659,6 @@ static vm_fault_t sanitize_fault_flags(s
>                                  !is_cow_mapping(vma->vm_flags)))
>                         return VM_FAULT_SIGSEGV;
>         }
> -#ifdef CONFIG_PER_VMA_LOCK
>         /*
>          * Per-VMA locks can't be used with FAULT_FLAG_RETRY_NOWAIT because of
>          * the assumption that lock is dropped on VM_FAULT_RETRY.
> @@ -6668,7 +6667,6 @@ static vm_fault_t sanitize_fault_flags(s
>                         (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)) ==
>                         (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)))
>                 return VM_FAULT_SIGSEGV;
> -#endif
>
>         return 0;
>  }
> diff -puN mm/mmap_lock.c~unconditional-vma-locks mm/mmap_lock.c
> --- a/mm/mmap_lock.c~unconditional-vma-locks    2026-06-10 15:57:53.834360824 -0700
> +++ b/mm/mmap_lock.c    2026-06-10 15:57:54.077369463 -0700
> @@ -43,9 +43,6 @@ void __mmap_lock_do_trace_released(struc
>  EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #endif /* CONFIG_TRACING */
>
> -#ifdef CONFIG_MMU
> -#ifdef CONFIG_PER_VMA_LOCK
> -
>  /* State shared across __vma_[start, end]_exclude_readers. */
>  struct vma_exclude_readers_state {
>         /* Input parameters. */
> @@ -431,7 +428,6 @@ fallback:
>
>         return vma;
>  }
> -#endif /* CONFIG_PER_VMA_LOCK */
>
>  #ifdef CONFIG_LOCK_MM_AND_FIND_VMA
>  #include <linux/extable.h>
> @@ -548,23 +544,3 @@ fail:
>         return NULL;
>  }
>  #endif /* CONFIG_LOCK_MM_AND_FIND_VMA */
> -
> -#else /* CONFIG_MMU */
> -
> -/*
> - * At least xtensa ends up having protection faults even with no
> - * MMU.. No stack expansion, at least.
> - */
> -struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
> -                       unsigned long addr, struct pt_regs *regs)
> -{
> -       struct vm_area_struct *vma;
> -
> -       mmap_read_lock(mm);
> -       vma = vma_lookup(mm, addr);
> -       if (!vma)
> -               mmap_read_unlock(mm);
> -       return vma;
> -}

Might this removal break the CONFIG_MMU=n && CONFIG_LOCK_MM_AND_FIND_VMA=n case?
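
To make the concern concrete, here is a small sketch that models, as
plain booleans, which configs get a definition of
lock_mm_and_find_vma() from mm/mmap_lock.c. It is derived only from
the preprocessor structure visible in the hunks above; the function
names below are made up for the illustration and nothing here is
kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Before the patch: under CONFIG_MMU the definition is guarded by
 * CONFIG_LOCK_MM_AND_FIND_VMA, while the "#else" branch for !MMU
 * carries its own unconditional definition.
 */
static bool defined_before(bool mmu, bool lock_mm_and_find_vma)
{
	if (mmu)
		return lock_mm_and_find_vma;
	return true;
}

/*
 * After the patch: the CONFIG_MMU guard and the !MMU branch are gone,
 * so only the CONFIG_LOCK_MM_AND_FIND_VMA guard remains.
 */
static bool defined_after(bool mmu, bool lock_mm_and_find_vma)
{
	(void)mmu;
	return lock_mm_and_find_vma;
}

/* True when a config had a definition before the patch but not after. */
static bool definition_lost(bool mmu, bool lock_mm_and_find_vma)
{
	return defined_before(mmu, lock_mm_and_find_vma) &&
	       !defined_after(mmu, lock_mm_and_find_vma);
}

/* Sanity check of the model: the patch only ever removes a definition. */
static void check_patch_only_removes(void)
{
	for (int mmu = 0; mmu <= 1; mmu++)
		for (int lm = 0; lm <= 1; lm++)
			assert(!defined_after(mmu, lm) || defined_before(mmu, lm));
}
```

In this model the only combination that loses its definition is MMU=n
together with LOCK_MM_AND_FIND_VMA=n. Whether that is an actual build
break depends on whether any caller is compiled in such a config, which
the model does not answer.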


> -
> -#endif /* CONFIG_MMU */
> diff -puN mm/pagewalk.c~unconditional-vma-locks mm/pagewalk.c
> --- a/mm/pagewalk.c~unconditional-vma-locks     2026-06-10 15:57:53.851361429 -0700
> +++ b/mm/pagewalk.c     2026-06-10 15:57:54.077369463 -0700
> @@ -446,7 +446,6 @@ static inline void process_mm_walk_lock(
>  static inline void process_vma_walk_lock(struct vm_area_struct *vma,
>                                          enum page_walk_lock walk_lock)
>  {
> -#ifdef CONFIG_PER_VMA_LOCK
>         switch (walk_lock) {
>         case PGWALK_WRLOCK:
>                 vma_start_write(vma);
> @@ -461,7 +460,6 @@ static inline void process_vma_walk_lock
>                 /* PGWALK_RDLOCK is handled by process_mm_walk_lock */
>                 break;
>         }
> -#endif
>  }
>
>  /*
> diff -puN mm/rmap.c~unconditional-vma-locks mm/rmap.c
> --- a/mm/rmap.c~unconditional-vma-locks 2026-06-10 15:57:54.018367366 -0700
> +++ b/mm/rmap.c 2026-06-10 15:57:54.077369463 -0700
> @@ -260,11 +260,9 @@ static void check_anon_vma_clone(struct
>         /* For the anon_vma to be compatible, it can only be singular. */
>         VM_WARN_ON_ONCE(operation == VMA_OP_MERGE_UNFAULTED &&
>                         !list_is_singular(&src->anon_vma_chain));
> -#ifdef CONFIG_PER_VMA_LOCK
>         /* Only merging an unfaulted VMA leaves the destination attached. */
>         VM_WARN_ON_ONCE(operation != VMA_OP_MERGE_UNFAULTED &&
>                         vma_is_attached(dst));
> -#endif
>  }
>
>  static void maybe_reuse_anon_vma(struct vm_area_struct *dst,
> diff -puN mm/userfaultfd.c~unconditional-vma-locks mm/userfaultfd.c
> --- a/mm/userfaultfd.c~unconditional-vma-locks  2026-06-10 15:57:54.049368468 -0700
> +++ b/mm/userfaultfd.c  2026-06-10 15:57:54.078369499 -0700
> @@ -104,7 +104,6 @@ struct vm_area_struct *find_vma_and_prep
>         return vma;
>  }
>
> -#ifdef CONFIG_PER_VMA_LOCK
>  /*
>   * uffd_lock_vma() - Lookup and lock vma corresponding to @address.
>   * @mm: mm to search vma in.
> @@ -164,34 +163,6 @@ static void uffd_mfill_unlock(struct vm_
>         vma_end_read(vma);
>  }
>
> -#else
> -
> -static struct vm_area_struct *uffd_mfill_lock(struct mm_struct *dst_mm,
> -                                             unsigned long dst_start,
> -                                             unsigned long len)
> -{
> -       struct vm_area_struct *dst_vma;
> -
> -       mmap_read_lock(dst_mm);
> -       dst_vma = find_vma_and_prepare_anon(dst_mm, dst_start);
> -       if (IS_ERR(dst_vma))
> -               goto out_unlock;
> -
> -       if (validate_dst_vma(dst_vma, dst_start + len))
> -               return dst_vma;
> -
> -       dst_vma = ERR_PTR(-ENOENT);
> -out_unlock:
> -       mmap_read_unlock(dst_mm);
> -       return dst_vma;
> -}
> -
> -static void uffd_mfill_unlock(struct vm_area_struct *vma)
> -{
> -       mmap_read_unlock(vma->vm_mm);
> -}
> -#endif
> -
>  static void mfill_put_vma(struct mfill_state *state)
>  {
>         if (!state->vma)
> @@ -1672,7 +1643,6 @@ out_success:
>         return 0;
>  }
>
> -#ifdef CONFIG_PER_VMA_LOCK
>  static int uffd_move_lock(struct mm_struct *mm,
>                           unsigned long dst_start,
>                           unsigned long src_start,
> @@ -1747,31 +1717,6 @@ static void uffd_move_unlock(struct vm_a
>                 vma_end_read(dst_vma);
>  }
>
> -#else
> -
> -static int uffd_move_lock(struct mm_struct *mm,
> -                         unsigned long dst_start,
> -                         unsigned long src_start,
> -                         struct vm_area_struct **dst_vmap,
> -                         struct vm_area_struct **src_vmap)
> -{
> -       int err;
> -
> -       mmap_read_lock(mm);
> -       err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
> -       if (err)
> -               mmap_read_unlock(mm);
> -       return err;
> -}
> -
> -static void uffd_move_unlock(struct vm_area_struct *dst_vma,
> -                            struct vm_area_struct *src_vma)
> -{
> -       mmap_assert_locked(src_vma->vm_mm);
> -       mmap_read_unlock(dst_vma->vm_mm);
> -}
> -#endif
> -
>  /**
>   * move_pages - move arbitrary anonymous pages of an existing vma
>   * @ctx: pointer to the userfaultfd context
> diff -puN rust/kernel/mm.rs~unconditional-vma-locks rust/kernel/mm.rs
> --- a/rust/kernel/mm.rs~unconditional-vma-locks 2026-06-10 15:57:54.051368539 -0700
> +++ b/rust/kernel/mm.rs 2026-06-10 15:57:54.078369499 -0700
> @@ -174,7 +174,6 @@ impl MmWithUser {
>      /// When per-vma locks are disabled, this always returns `None`.
>      #[inline]
>      pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
> -        #[cfg(CONFIG_PER_VMA_LOCK)]
>          {
>              // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
>              // `mm_users` is non-zero.
> @@ -188,12 +187,6 @@ impl MmWithUser {
>                  });
>              }
>          }
> -
> -        // Silence warnings about unused variables.
> -        #[cfg(not(CONFIG_PER_VMA_LOCK))]
> -        let _ = vma_addr;
> -
> -        None
>      }
>
>      /// Lock the mmap read lock.
> diff -puN tools/testing/vma/include/dup.h~unconditional-vma-locks tools/testing/vma/include/dup.h
> --- a/tools/testing/vma/include/dup.h~unconditional-vma-locks   2026-06-10 15:57:54.064369001 -0700
> +++ b/tools/testing/vma/include/dup.h   2026-06-10 15:57:54.078369499 -0700
> @@ -569,7 +569,6 @@ struct vm_area_struct {
>                 vma_flags_t flags;
>         };
>
> -#ifdef CONFIG_PER_VMA_LOCK
>         /*
>          * Can only be written (using WRITE_ONCE()) while holding both:
>          *  - mmap_lock (in write mode)
> @@ -585,7 +584,6 @@ struct vm_area_struct {
>          * slowpath.
>          */
>         unsigned int vm_lock_seq;
> -#endif
>
>         /*
>          * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
> @@ -618,10 +616,8 @@ struct vm_area_struct {
>  #ifdef CONFIG_NUMA_BALANCING
>         struct vma_numab_state *numab_state;    /* NUMA Balancing state */
>  #endif
> -#ifdef CONFIG_PER_VMA_LOCK
>         /* Unstable RCU readers are allowed to read this. */
>         refcount_t vm_refcnt;
> -#endif
>         /*
>          * For areas with an address space and backing store,
>          * linkage into the address_space->i_mmap interval tree.
> diff -puN tools/testing/vma/vma_internal.h~unconditional-vma-locks tools/testing/vma/vma_internal.h
> --- a/tools/testing/vma/vma_internal.h~unconditional-vma-locks  2026-06-10 15:57:54.066369072 -0700
> +++ b/tools/testing/vma/vma_internal.h  2026-06-10 15:57:54.078369499 -0700
> @@ -15,7 +15,6 @@
>  #include <stdlib.h>
>
>  #define CONFIG_MMU
> -#define CONFIG_PER_VMA_LOCK
>
>  #ifdef __CONCAT
>  #undef __CONCAT
> _

* Re: [PATCH v2 1/5] mm: Make per-VMA locks available universally
  2026-06-11 19:29   ` Suren Baghdasaryan
@ 2026-06-12 14:09     ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 14:09 UTC (permalink / raw)
  To: Suren Baghdasaryan, Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

On 6/11/26 21:29, Suren Baghdasaryan wrote:
>> -
>> -/*
>> - * At least xtensa ends up having protection faults even with no
>> - * MMU.. No stack expansion, at least.
>> - */
>> -struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
>> -                       unsigned long addr, struct pt_regs *regs)
>> -{
>> -       struct vm_area_struct *vma;
>> -
>> -       mmap_read_lock(mm);
>> -       vma = vma_lookup(mm, addr);
>> -       if (!vma)
>> -               mmap_read_unlock(mm);
>> -       return vma;
>> -}
> 
> Might this removal break CONFIG_MMU=n && CONFIG_LOCK_MM_AND_FIND_VMA=n case?

Seems to me only the architectures that select CONFIG_LOCK_MM_AND_FIND_VMA
also call lock_mm_and_find_vma(). So it's probably fine?

I think this fallback was only for architectures that select
CONFIG_LOCK_MM_AND_FIND_VMA *and* have CONFIG_MMU=n (variant).
It just wasn't guarded by #ifdef CONFIG_LOCK_MM_AND_FIND_VMA as well.

>> -
>> -#endif /* CONFIG_MMU */
>> diff -puN mm/pagewalk.c~unconditional-vma-locks mm/pagewalk.c

* Re: [PATCH v2 1/5] mm: Make per-VMA locks available universally
  2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
  2026-06-11 19:29   ` Suren Baghdasaryan
@ 2026-06-12 14:12   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 14:12 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos

On 6/11/26 01:04, Dave Hansen wrote:

Not a Rust expert but...

> --- a/rust/kernel/mm.rs~unconditional-vma-locks	2026-06-10 15:57:54.051368539 -0700
> +++ b/rust/kernel/mm.rs	2026-06-10 15:57:54.078369499 -0700
> @@ -174,7 +174,6 @@ impl MmWithUser {
>      /// When per-vma locks are disabled, this always returns `None`.
>      #[inline]
>      pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
> -        #[cfg(CONFIG_PER_VMA_LOCK)]
>          {
>              // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
>              // `mm_users` is non-zero.
> @@ -188,12 +187,6 @@ impl MmWithUser {
>                  });
>              }
>          }

I think you can remove the { } as well (that was the scope of the #[cfg])
and reduce the indentation of what's inside.

> -
> -        // Silence warnings about unused variables.
> -        #[cfg(not(CONFIG_PER_VMA_LOCK))]
> -        let _ = vma_addr;
> -
> -        None

And here you should leave the 'None' as the #[cfg] only applied to the one
line below.

>      }
>  
>      /// Lock the mmap read lock.
> diff -puN tools/testing/vma/include/dup.h~unconditional-vma-locks tools/testing/vma/include/dup.h



* [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
  2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
@ 2026-06-10 23:04 ` Dave Hansen
  2026-06-11  7:53   ` Alice Ryhl
  2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka

From: Dave Hansen <dave.hansen@linux.intel.com>

tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().

Long Version:

== Background ==

Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but left mmap_read_trylock() as a
fallback.

This was presumably because the per-VMA locking can fail for several
reasons and most (all?) lock_vma_under_rcu() callers have a fallback
to mmap_read_trylock().

== Problem ==

The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().

The only remedy for a collision with mmap_write_lock() is to wait,
which this code can not do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is really
likely to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.

== Solution ==

Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.

Full disclosure: I originally tried to do this with
lock_vma_under_rcu_wait(), but it did not fit well with the mmap_lock
trylock semantics. Claude caught this in a review and suggested the
approach in this patch. It seemed sane to me. So, Suggested-by: Claude,
I guess.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

--

Changes from v1:
 * Move forward even if 'vma' is NULL in binder_alloc_free_page().
   This can happen if the VMA is unmapped (Sashiko).
 * Rename goto label to be more accurate for new lock scheme

---

 b/drivers/android/binder_alloc.c |   26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock	2026-06-10 15:57:55.274412018 -0700
+++ b/drivers/android/binder_alloc.c	2026-06-10 15:57:55.277412124 -0700
@@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
 	struct vm_area_struct *vma;
 	struct page *page_to_free;
 	unsigned long page_addr;
-	int mm_locked = 0;
 	size_t index;

 	if (!mmget_not_zero(mm))
@@ -1151,15 +1150,12 @@ enum lru_status binder_alloc_free_page(s
 	index = mdata->page_index;
 	page_addr = alloc->vm_start + index * PAGE_SIZE;

-	/* attempt per-vma lock first */
+	/*
+	 * Attempt per-vma lock. This is essentially a
+	 * "trylock". It can fail even if the VMA exists
+	 * for 'page_addr'.
+	 */
 	vma = lock_vma_under_rcu(mm, page_addr);
-	if (!vma) {
-		/* fall back to mmap_lock */
-		if (!mmap_read_trylock(mm))
-			goto err_mmap_read_lock_failed;
-		mm_locked = 1;
-		vma = vma_lookup(mm, page_addr);
-	}

 	if (!mutex_trylock(&alloc->mutex))
 		goto err_get_alloc_mutex_failed;
@@ -1188,13 +1184,11 @@ enum lru_status binder_alloc_free_page(s
 		zap_vma_range(vma, page_addr, PAGE_SIZE);

 		trace_binder_unmap_user_end(alloc, index);
+
+		vma_end_read(vma);
 	}

 	mutex_unlock(&alloc->mutex);
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
-		vma_end_read(vma);
 	mmput_async(mm);
 	binder_free_page(page_to_free);

@@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(s
 err_invalid_vma:
 	mutex_unlock(&alloc->mutex);
 err_get_alloc_mutex_failed:
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
+	if (vma)
 		vma_end_read(vma);
-err_mmap_read_lock_failed:
+err_vma_lock_failed:
 	mmput_async(mm);
 err_mmget:
 	return LRU_SKIP;
_

* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-10 23:04 ` [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
@ 2026-06-11  7:53   ` Alice Ryhl
  2026-06-11 19:59     ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Alice Ryhl @ 2026-06-11  7:53 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka

On Wed, Jun 10, 2026 at 04:04:13PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
> it and mmap_read_trylock().
> 
> Long Version:
> 
> == Background ==
> 
> Historically, binder used an mmap_read_trylock() in its shrinker code.
> This ensures that reclaim is not blocked on an mmap_lock. Commit
> 95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
> support for the per-VMA lock, but left mmap_read_trylock() as a
> fallback.
> 
> This was presumably because the per-VMA locking can fail for several
> reasons and most (all?) lock_vma_under_rcu() callers have a fallback
> to mmap_read_trylock().
> 
> == Problem ==
> 
> The fallback is not worth the complexity here. lock_vma_under_rcu() is
> essentially already a non-blocking trylock. The main reason it fails
> is also the reason mmap_read_trylock() fails: something is holding
> mmap_write_lock().
> 
> The only remedy for a collision with mmap_write_lock() is to wait,
> which this code can not do. So the "fallback" after
> lock_vma_under_rcu() failure is not really a fallback: it is really
> likely to just be retrying in vain. That retry in and of itself isn't
> horrible. But it adds complexity.
> 
> == Solution ==
> 
> Now that per-VMA locks are universally available, lock_vma_under_rcu()
> will not persistently fail. Rely on it alone and simplify the code.
> 
> Full disclosure: I originally tried to do this with
> lock_vma_under_rcu_wait(), but it did not fit well with the mmap_lock
> trylock semantics. Claude caught this in a review and suggested the
> approach in this patch. It seemed sane to me. So, Suggested-by: Claude,
> I guess.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> 
> --
> 
> Changes from v1:
>  * Move forward even if 'vma' is NULL in binder_alloc_free_page().
>    This can happen if the VMA is unmapped (Sashiko).
>  * Rename goto label to be more accurate for new lock scheme
> 
> 
> ---

This seems to include the list of changes in the commit message instead
of under the --- line.

>  b/drivers/android/binder_alloc.c |   26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
> --- a/drivers/android/binder_alloc.c~binder-try-vma-lock	2026-06-10 15:57:55.274412018 -0700
> +++ b/drivers/android/binder_alloc.c	2026-06-10 15:57:55.277412124 -0700
> @@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
>  	struct vm_area_struct *vma;
>  	struct page *page_to_free;
>  	unsigned long page_addr;
> -	int mm_locked = 0;
>  	size_t index;
>  
>  	if (!mmget_not_zero(mm))
> @@ -1151,15 +1150,12 @@ enum lru_status binder_alloc_free_page(s
>  	index = mdata->page_index;
>  	page_addr = alloc->vm_start + index * PAGE_SIZE;
>  
> -	/* attempt per-vma lock first */
> +	/*
> +	 * Attempt per-vma lock. This is essentially a
> +	 * "trylock". It can fail even if the VMA exists
> +	 * for 'page_addr'.
> +	 */
>  	vma = lock_vma_under_rcu(mm, page_addr);
> -	if (!vma) {
> -		/* fall back to mmap_lock */
> -		if (!mmap_read_trylock(mm))
> -			goto err_mmap_read_lock_failed;
> -		mm_locked = 1;
> -		vma = vma_lookup(mm, page_addr);
> -	}
>  
>  	if (!mutex_trylock(&alloc->mutex))
>  		goto err_get_alloc_mutex_failed;
> @@ -1188,13 +1184,11 @@ enum lru_status binder_alloc_free_page(s
>  		zap_vma_range(vma, page_addr, PAGE_SIZE);
>  
>  		trace_binder_unmap_user_end(alloc, index);
> +
> +		vma_end_read(vma);
>  	}
>  
>  	mutex_unlock(&alloc->mutex);
> -	if (mm_locked)
> -		mmap_read_unlock(mm);
> -	else
> -		vma_end_read(vma);
>  	mmput_async(mm);
>  	binder_free_page(page_to_free);
>  
> @@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(s
>  err_invalid_vma:
>  	mutex_unlock(&alloc->mutex);
>  err_get_alloc_mutex_failed:
> -	if (mm_locked)
> -		mmap_read_unlock(mm);
> -	else
> +	if (vma)
>  		vma_end_read(vma);
> -err_mmap_read_lock_failed:
> +err_vma_lock_failed:
>  	mmput_async(mm);

If the vma lookup fails because the mmap write lock is held, but the vma
actually exists (has not been unmapped), then this code might "successfully"
remove the page without invoking zap_vma_range(). This means that the
page does not actually get freed and will just hang around forever until
the process owning the vma exits or Binder needs this page and maps a
new page on top of the page.

Alice
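
To make the scenario above concrete, here is a minimal userspace sketch in C.
It is only a model of the control flow in the v2 patch, not binder code: the
names model_vma, lookup_state, model_lock_vma() and model_free_page_v2() are
all hypothetical, and the "lock" is reduced to an enum describing why a
lookup might return NULL. The point it illustrates is that a NULL result is
treated the same whether the VMA is gone or merely contended, so the
contended case releases the page without zapping the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct model_vma { int id; };

enum lookup_state {
	VMA_ABSENT,	/* nothing is mapped at the address */
	VMA_CONTENDED,	/* a VMA exists, but a writer blocks the read lock */
	VMA_AVAILABLE,	/* a VMA exists and can be read-locked */
};

struct outcome {
	bool zapped;	/* user mapping was torn down */
	bool released;	/* caller dropped the page from its own bookkeeping */
};

static struct model_vma the_vma = { .id = 1 };

/* Trylock-style lookup: NULL covers both "absent" and "contended". */
static struct model_vma *model_lock_vma(enum lookup_state state)
{
	return state == VMA_AVAILABLE ? &the_vma : NULL;
}

/* Models the v2 flow: keep going even when the lookup returned NULL. */
static struct outcome model_free_page_v2(enum lookup_state state)
{
	struct outcome out = { .zapped = false, .released = false };
	struct model_vma *vma = model_lock_vma(state);

	if (vma)
		out.zapped = true;	/* stands in for zap_vma_range() */
	out.released = true;
	return out;
}

/* True when the page was released while a user mapping still exists. */
static bool model_leaves_stale_mapping(enum lookup_state state)
{
	struct outcome out = model_free_page_v2(state);

	return state != VMA_ABSENT && out.released && !out.zapped;
}

/* Self-check of the three scenarios. */
void model_check(void)
{
	assert(!model_leaves_stale_mapping(VMA_ABSENT));
	assert(model_leaves_stale_mapping(VMA_CONTENDED));
	assert(!model_leaves_stale_mapping(VMA_AVAILABLE));
}
```

Only the VMA_CONTENDED case ends up with a released page and an intact
mapping, which is the leak described above.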

* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-11  7:53   ` Alice Ryhl
@ 2026-06-11 19:59     ` Suren Baghdasaryan
  2026-06-12 15:41       ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-11 19:59 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Dave Hansen, linux-kernel, Andrew Morton,
	Arve Hjønnevåg, Carlos Llamas, Christian Brauner,
	David Ahern, David S. Miller, Greg Kroah-Hartman, Liam R. Howlett,
	linux-mm, Lorenzo Stoakes, netdev, Shakeel Butt, Todd Kjos,
	Vlastimil Babka

On Thu, Jun 11, 2026 at 12:53 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Wed, Jun 10, 2026 at 04:04:13PM -0700, Dave Hansen wrote:
> >
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> >
> > tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
> > it and mmap_read_trylock().
> >
> > Long Version:
> >
> > == Background ==
> >
> > Historically, binder used an mmap_read_trylock() in its shrinker code.
> > This ensures that reclaim is not blocked on an mmap_lock. Commit
> > 95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
> > support for the per-VMA lock, but left mmap_read_trylock() as a
> > fallback.
> >
> > This was presumably because the per-VMA locking can fail for several
> > reasons and most (all?) lock_vma_under_rcu() callers have a fallback
> > to mmap_read_trylock().
> >
> > == Problem ==
> >
> > The fallback is not worth the complexity here. lock_vma_under_rcu() is
> > essentially already a non-blocking trylock. The main reason it fails
> > is also the reason mmap_read_trylock() fails: something is holding
> > mmap_write_lock().
> >
> > The only remedy for a collision with mmap_write_lock() is to wait,
> > which this code can not do. So the "fallback" after
> > lock_vma_under_rcu() failure is not really a fallback: it is really
> > likely to just be retrying in vain. That retry in and of itself isn't
> > horrible. But it adds complexity.
> >
> > == Solution ==
> >
> > Now that per-VMA locks are universally available, lock_vma_under_rcu()
> > will not persistently fail. Rely on it alone and simplify the code.
> >
> > Full disclosure: I originally tried to do this with
> > lock_vma_under_rcu_wait(), but it did not fit well with the mmap_lock
> > trylock semantics. Claude caught this in a review and suggested the
> > approach in this patch. It seemed sane to me. So, Suggested-by: Claude,
> > I guess.
> >
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> > Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > Cc: Vlastimil Babka <vbabka@kernel.org>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: linux-mm@kvack.org
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Arve Hjønnevåg <arve@android.com>
> > Cc: Todd Kjos <tkjos@android.com>
> > Cc: Christian Brauner <christian@brauner.io>
> > Cc: Carlos Llamas <cmllamas@google.com>
> > Cc: Alice Ryhl <aliceryhl@google.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: netdev@vger.kernel.org
> >
> > --
> >
> > Changes from v1:
> >  * Move forward even if 'vma' is NULL in binder_alloc_free_page().
> >    This can happen if the VMA is unmapped (Sashiko).
> >  * Rename goto label to be more accurate for new lock scheme
> >
> >
> > ---
>
> This seems to include the list of changes in the commit message instead
> of under the --- line.
>
> >  b/drivers/android/binder_alloc.c |   26 +++++++++-----------------
> >  1 file changed, 9 insertions(+), 17 deletions(-)
> >
> > diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
> > --- a/drivers/android/binder_alloc.c~binder-try-vma-lock      2026-06-10 15:57:55.274412018 -0700
> > +++ b/drivers/android/binder_alloc.c  2026-06-10 15:57:55.277412124 -0700
> > @@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
> >       struct vm_area_struct *vma;
> >       struct page *page_to_free;
> >       unsigned long page_addr;
> > -     int mm_locked = 0;
> >       size_t index;
> >
> >       if (!mmget_not_zero(mm))
> > @@ -1151,15 +1150,12 @@ enum lru_status binder_alloc_free_page(s
> >       index = mdata->page_index;
> >       page_addr = alloc->vm_start + index * PAGE_SIZE;
> >
> > -     /* attempt per-vma lock first */
> > +     /*
> > +      * Attempt per-vma lock. This is essentially a
> > +      * "trylock". It can fail even if the VMA exists
> > +      * for 'page_addr'.
> > +      */
> >       vma = lock_vma_under_rcu(mm, page_addr);
> > -     if (!vma) {
> > -             /* fall back to mmap_lock */
> > -             if (!mmap_read_trylock(mm))
> > -                     goto err_mmap_read_lock_failed;
> > -             mm_locked = 1;
> > -             vma = vma_lookup(mm, page_addr);
> > -     }
> >
> >       if (!mutex_trylock(&alloc->mutex))
> >               goto err_get_alloc_mutex_failed;
> > @@ -1188,13 +1184,11 @@ enum lru_status binder_alloc_free_page(s
> >               zap_vma_range(vma, page_addr, PAGE_SIZE);
> >
> >               trace_binder_unmap_user_end(alloc, index);
> > +
> > +             vma_end_read(vma);
> >       }
> >
> >       mutex_unlock(&alloc->mutex);
> > -     if (mm_locked)
> > -             mmap_read_unlock(mm);
> > -     else
> > -             vma_end_read(vma);
> >       mmput_async(mm);
> >       binder_free_page(page_to_free);
> >
> > @@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(s
> >  err_invalid_vma:
> >       mutex_unlock(&alloc->mutex);
> >  err_get_alloc_mutex_failed:
> > -     if (mm_locked)
> > -             mmap_read_unlock(mm);
> > -     else
> > +     if (vma)
> >               vma_end_read(vma);
> > -err_mmap_read_lock_failed:
> > +err_vma_lock_failed:
> >       mmput_async(mm);
>
> If the vma lookup fails because the mmap write lock is held, but the vma
> actually exists (has not been unmapped), then this code might "successfully"
> remove the page without invoking zap_vma_range(). This means that the
> page does not actually get freed and will just hang around forever until
> the process owning the vma exits or Binder needs this page and maps a
> new page on top of the page.

Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
jump to err_mmap_read_lock_failed, like we currently do if
mmap_read_trylock() fails.

>
> Alice

* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-11 19:59     ` Suren Baghdasaryan
@ 2026-06-12 15:41       ` Vlastimil Babka (SUSE)
  2026-06-12 16:01         ` Suren Baghdasaryan
  2026-06-12 16:04         ` Dave Hansen
  0 siblings, 2 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 15:41 UTC (permalink / raw)
  To: Suren Baghdasaryan, Alice Ryhl
  Cc: Dave Hansen, linux-kernel, Andrew Morton,
	Arve Hjønnevåg, Carlos Llamas, Christian Brauner,
	David Ahern, David S. Miller, Greg Kroah-Hartman, Liam R. Howlett,
	linux-mm, Lorenzo Stoakes, netdev, Shakeel Butt, Todd Kjos

On 6/11/26 21:59, Suren Baghdasaryan wrote:
> On Thu, Jun 11, 2026 at 12:53 AM Alice Ryhl <aliceryhl@google.com> wrote:
>>
>> >  b/drivers/android/binder_alloc.c |   26 +++++++++-----------------
>> >  1 file changed, 9 insertions(+), 17 deletions(-)
>> >
>> > diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
>> > --- a/drivers/android/binder_alloc.c~binder-try-vma-lock      2026-06-10 15:57:55.274412018 -0700
>> > +++ b/drivers/android/binder_alloc.c  2026-06-10 15:57:55.277412124 -0700
>> > @@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
>> >       struct vm_area_struct *vma;
>> >       struct page *page_to_free;
>> >       unsigned long page_addr;
>> > -     int mm_locked = 0;
>> >       size_t index;
>> >
>> >       if (!mmget_not_zero(mm))
>> > @@ -1151,15 +1150,12 @@ enum lru_status binder_alloc_free_page(s
>> >       index = mdata->page_index;
>> >       page_addr = alloc->vm_start + index * PAGE_SIZE;
>> >
>> > -     /* attempt per-vma lock first */
>> > +     /*
>> > +      * Attempt per-vma lock. This is essentially a
>> > +      * "trylock". It can fail even if the VMA exists
>> > +      * for 'page_addr'.
>> > +      */
>> >       vma = lock_vma_under_rcu(mm, page_addr);
>> > -     if (!vma) {
>> > -             /* fall back to mmap_lock */
>> > -             if (!mmap_read_trylock(mm))
>> > -                     goto err_mmap_read_lock_failed;
>> > -             mm_locked = 1;
>> > -             vma = vma_lookup(mm, page_addr);
>> > -     }
>> >
>> >       if (!mutex_trylock(&alloc->mutex))
>> >               goto err_get_alloc_mutex_failed;
>> > @@ -1188,13 +1184,11 @@ enum lru_status binder_alloc_free_page(s
>> >               zap_vma_range(vma, page_addr, PAGE_SIZE);
>> >
>> >               trace_binder_unmap_user_end(alloc, index);
>> > +
>> > +             vma_end_read(vma);
>> >       }
>> >
>> >       mutex_unlock(&alloc->mutex);
>> > -     if (mm_locked)
>> > -             mmap_read_unlock(mm);
>> > -     else
>> > -             vma_end_read(vma);
>> >       mmput_async(mm);
>> >       binder_free_page(page_to_free);
>> >
>> > @@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(s
>> >  err_invalid_vma:
>> >       mutex_unlock(&alloc->mutex);
>> >  err_get_alloc_mutex_failed:
>> > -     if (mm_locked)
>> > -             mmap_read_unlock(mm);
>> > -     else
>> > +     if (vma)
>> >               vma_end_read(vma);
>> > -err_mmap_read_lock_failed:
>> > +err_vma_lock_failed:

This label is unused btw, which is related to Alice's point.

>> >       mmput_async(mm);
>>
>> If the vma lookup fails because the mmap write lock is held, but the vma
>> actually exists (has not been unmapped), then this code might "successfully"
>> remove the page without invoking zap_vma_range(). This means that the
>> page does not actually get freed and will just hang around forever until
>> the process owning the vma exits or Binder needs this page and maps a
>> new page on top of the page.
> 
> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
> jump to err_mmap_read_lock_failed, like we currently do if
> mmap_read_trylock() fails.

I don't think that will be enough either, as the current code AFAICS does
something meaningful when mmap_read_trylock() succeeds but vma_lookup() returns
NULL because there's no vma at that address. Now we would just assume the
trylock failed even if the reason was that vma lookup found nothing for the
address. The problem is that lock_vma_under_rcu() can't distinguish those
two outcomes, so we would need something that does?

* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 15:41       ` Vlastimil Babka (SUSE)
@ 2026-06-12 16:01         ` Suren Baghdasaryan
  2026-06-12 16:04         ` Dave Hansen
  1 sibling, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-12 16:01 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Alice Ryhl, Dave Hansen, linux-kernel, Andrew Morton,
	Arve Hjønnevåg, Carlos Llamas, Christian Brauner,
	David Ahern, David S. Miller, Greg Kroah-Hartman, Liam R. Howlett,
	linux-mm, Lorenzo Stoakes, netdev, Shakeel Butt, Todd Kjos

On Fri, Jun 12, 2026 at 8:41 AM Vlastimil Babka (SUSE)
<vbabka@kernel.org> wrote:
>
> On 6/11/26 21:59, Suren Baghdasaryan wrote:
> > On Thu, Jun 11, 2026 at 12:53 AM Alice Ryhl <aliceryhl@google.com> wrote:
> >>
> >> >  b/drivers/android/binder_alloc.c |   26 +++++++++-----------------
> >> >  1 file changed, 9 insertions(+), 17 deletions(-)
> >> >
> >> > diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
> >> > --- a/drivers/android/binder_alloc.c~binder-try-vma-lock      2026-06-10 15:57:55.274412018 -0700
> >> > +++ b/drivers/android/binder_alloc.c  2026-06-10 15:57:55.277412124 -0700
> >> > @@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
> >> >       struct vm_area_struct *vma;
> >> >       struct page *page_to_free;
> >> >       unsigned long page_addr;
> >> > -     int mm_locked = 0;
> >> >       size_t index;
> >> >
> >> >       if (!mmget_not_zero(mm))
> >> > @@ -1151,15 +1150,12 @@ enum lru_status binder_alloc_free_page(s
> >> >       index = mdata->page_index;
> >> >       page_addr = alloc->vm_start + index * PAGE_SIZE;
> >> >
> >> > -     /* attempt per-vma lock first */
> >> > +     /*
> >> > +      * Attempt per-vma lock. This is essentially a
> >> > +      * "trylock". It can fail even if the VMA exists
> >> > +      * for 'page_addr'.
> >> > +      */
> >> >       vma = lock_vma_under_rcu(mm, page_addr);
> >> > -     if (!vma) {
> >> > -             /* fall back to mmap_lock */
> >> > -             if (!mmap_read_trylock(mm))
> >> > -                     goto err_mmap_read_lock_failed;
> >> > -             mm_locked = 1;
> >> > -             vma = vma_lookup(mm, page_addr);
> >> > -     }
> >> >
> >> >       if (!mutex_trylock(&alloc->mutex))
> >> >               goto err_get_alloc_mutex_failed;
> >> > @@ -1188,13 +1184,11 @@ enum lru_status binder_alloc_free_page(s
> >> >               zap_vma_range(vma, page_addr, PAGE_SIZE);
> >> >
> >> >               trace_binder_unmap_user_end(alloc, index);
> >> > +
> >> > +             vma_end_read(vma);
> >> >       }
> >> >
> >> >       mutex_unlock(&alloc->mutex);
> >> > -     if (mm_locked)
> >> > -             mmap_read_unlock(mm);
> >> > -     else
> >> > -             vma_end_read(vma);
> >> >       mmput_async(mm);
> >> >       binder_free_page(page_to_free);
> >> >
> >> > @@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(s
> >> >  err_invalid_vma:
> >> >       mutex_unlock(&alloc->mutex);
> >> >  err_get_alloc_mutex_failed:
> >> > -     if (mm_locked)
> >> > -             mmap_read_unlock(mm);
> >> > -     else
> >> > +     if (vma)
> >> >               vma_end_read(vma);
> >> > -err_mmap_read_lock_failed:
> >> > +err_vma_lock_failed:
>
> This label is unused btw, which is related to Alice's point.
>
> >> >       mmput_async(mm);
> >>
> >> If the vma lookup fails because the mmap write lock is held, but the vma
> >> actually exists (has not been unmapped), then this code might "successfully"
> >> remove the page without invoking zap_vma_range(). This means that the
> >> page does not actually get freed and will just hang around forever until
> >> the process owning the vma exits or Binder needs this page and maps a
> >> new page on top of the page.
> >
> > Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
> > jump to err_mmap_read_lock_failed, like we currently do if
> > mmap_read_trylock() fails.
>
> I don't think that will be enough either, as the current code AFAICS does
> something meaningful when mmap_read_trylock() succeeds but vma_lookup() returns
> NULL because there's no vma at that address. Now we would just assume the
> trylock failed even if the reason was that vma lookup found nothing for the
> address. The problem is that lock_vma_under_rcu() can't distinguish those
> two outcomes, so we would need something that does?

Ah, you are right. I misread the condition in that code.
Changing lock_vma_under_rcu() to distinguish between lock failure and
no-VMA case would require some refactoring. vma_start_read() should
return -ESRCH if VMA was not found and then we handle that in
lock_vma_under_rcu(). That part is not hard because vma_start_read()
has only 2 users but then we have to propagate that error code to
lock_vma_under_rcu() users and there are many of them. Though
converting these users is straightforward. Instead of checking if
(vma) they would have to check if (IS_ERR_OR_NULL(vma)).
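
To illustrate the idea, here is a minimal userspace sketch of that
convention. It is only a model: the ERR_PTR helpers are simplified
stand-ins for the kernel ones, and struct vma_model and lock_vma_model()
are hypothetical names. The sketch assumes that "no VMA" becomes
ERR_PTR(-ESRCH) while a lock failure stays NULL, so callers that only
care about success can test IS_ERR_OR_NULL().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical VMA model: an address range plus a "writer holds it" flag. */
struct vma_model {
	unsigned long start, end;
	bool write_locked;
};

/*
 * Hypothetical lookup (not the kernel function):
 *   valid pointer    - VMA found and read-locked
 *   NULL             - VMA exists but could not be locked (transient)
 *   ERR_PTR(-ESRCH)  - no VMA covers 'addr'
 */
static struct vma_model *lock_vma_model(struct vma_model *vmas, size_t n,
					unsigned long addr)
{
	assert(vmas || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (addr < vmas[i].start || addr >= vmas[i].end)
			continue;
		if (vmas[i].write_locked)
			return NULL;
		return &vmas[i];
	}
	return ERR_PTR(-ESRCH);
}
```

A caller that needs to tell the two failures apart would compare
PTR_ERR(vma) against -ESRCH. Everyone else just swaps if (!vma) for
if (IS_ERR_OR_NULL(vma)).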


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 15:41       ` Vlastimil Babka (SUSE)
  2026-06-12 16:01         ` Suren Baghdasaryan
@ 2026-06-12 16:04         ` Dave Hansen
  2026-06-12 16:41           ` Suren Baghdasaryan
  1 sibling, 1 reply; 29+ messages in thread
From: Dave Hansen @ 2026-06-12 16:04 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Suren Baghdasaryan, Alice Ryhl
  Cc: Dave Hansen, linux-kernel, Andrew Morton,
	Arve Hjønnevåg, Carlos Llamas, Christian Brauner,
	David Ahern, David S. Miller, Greg Kroah-Hartman, Liam R. Howlett,
	linux-mm, Lorenzo Stoakes, netdev, Shakeel Butt, Todd Kjos

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On 6/12/26 08:41, Vlastimil Babka (SUSE) wrote:
>>> If the vma lookup fails because the mmap write lock is held, but the vma
>>> actually exists (has not been unmapped), then this code might "successfully"
>>> remove the page without invoking zap_vma_range(). This means that the
>>> page does not actually get freed and will just hang around forever until
>>> the process owning the vma exits or Binder needs this page and maps a
>>> new page on top of the page.
>> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
>> jump to err_mmap_read_lock_failed, like we currently do if
>> mmap_read_trylock() fails.
> I don't think that will be enough as well, as the current code AFAICS does
> something meaningful when mmap_read_trylock() succeeds but vma_lookup returns
> NULL because there's no vma at that address. Now we would just assume the
> trylock failed even if the reason was that vma lookup found nothing for the
> address. The problem is that lock_vma_under_rcu() can't distinguish those
> two outcomes, so we would need something that does?

I spent way too much time staring at this yesterday.

I think the key to distinguishing between:

	vma==NULL because there's no VMA
and
	vma==NULL because of a trylock failure

is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
finishes. vm_ops->close() shouldn't be able to happen while the per-VMA
lock taken by lock_vma_under_rcu() is held. So if you've got a non-NULL
VMA, you've also got a stable binder_alloc_is_mapped().

So, if you've got a vma!=NULL *and* binder_alloc_is_mapped()==true, I
think you can be pretty sure you've got the right VMA.

If you have vma==NULL and binder_alloc_is_mapped()==true, you can be
pretty sure that you hit some kind of transient lock_vma_under_rcu()
failure.

I came up with the attached patch. More eyeballs would be welcome.
There's a _lot_ going on here.

[-- Attachment #2: binder-try-vma-lock.patch --]
[-- Type: text/x-patch, Size: 5890 bytes --]

From: Dave Hansen <dave.hansen@linux.intel.com>

tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().

Long Version:

== Background ==

Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but left mmap_read_trylock() as a
fallback.

This fallback was required when per-VMA locking could fail
persistently, but that is no longer the case.

== Problem ==

The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().

The only remedy for a collision with mmap_write_lock() is to wait,
which this code can not do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is really
likely to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.

== Solution ==

Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.

The one wrinkle is that lock_vma_under_rcu() can return NULL even if
there is a VMA at 'page_addr'. But the later page zapping code
*must* run if the page might be mapped in to a VMA. Stop relying on
vma_lookup() for this. Just rely on binder_alloc_is_mapped().

== Discussion ==

I think there end up being four possible cases to handle. The first
two are straightforward. Note that "mapped" is shorthand for
binder_alloc_is_mapped().

        !vma && !mapped: reclaim, no zap
         vma &&  mapped: reclaim, with zap

The next one is arguably wrong in the code today:

         vma && !mapped: Wrong VMA. Skip and retry.

It induces LRU_SKIP behavior from another VMA getting mapped. That
seems wrong. It is possible to continue this behavior, but it also
seems a bit silly to go to any lengths to keep doing it if it is
a bug.

The last case comes from normal, transient lock_vma_under_rcu()
failures such as overflows. It can _safely_ be handled by
LRU_SKIP. This case is new.

        !vma &&  mapped: VMA lock race. Skip and retry.
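
As a rough illustration, the decision the patch below ends up making
can be modeled in plain C. This is a sketch only: the enum and
classify() are hypothetical names, 'vma_locked' stands for
lock_vma_under_rcu() returning non-NULL, and the "vma && !mapped" case
follows the new behavior (reclaim without zapping) rather than the
existing skip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three outcomes discussed above. */
enum reclaim_action {
	RECLAIM_NO_ZAP,		/* free the page, nothing to unmap */
	RECLAIM_WITH_ZAP,	/* zap the user mapping, then free */
	SKIP_AND_RETRY,		/* LRU_SKIP: try again later */
};

/*
 * vma_locked: lock_vma_under_rcu() returned a VMA.
 * mapped:     binder_alloc_is_mapped() returned true.
 */
static enum reclaim_action classify(bool vma_locked, bool mapped)
{
	/* Binder VMA should exist but was not locked: transient race. */
	if (!vma_locked && mapped)
		return SKIP_AND_RETRY;

	/* mapped now implies a valid, locked binder VMA. */
	if (mapped)
		return RECLAIM_WITH_ZAP;

	/* Not mapped: any VMA that was found is not the binder one. */
	return RECLAIM_NO_ZAP;
}
```

Note that 'mapped', not 'vma_locked', is what selects the zap. That is
the whole point of the change.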

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

---

Changes from v1:
 * Move forward even if 'vma' is NULL in binder_alloc_free_page().
   This can happen if the VMA is unmapped (Sashiko).
 * Rename goto label to be more accurate for new lock scheme

Changes from v2:
 * Remove review tags. There's too much churn in here to keep them.
 * Rely on binder_alloc_is_mapped() instead of VMA lookups alone to
   determine if the range must be zapped.

---

 b/drivers/android/binder_alloc.c |   42 ++++++++++++++-------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock	2026-06-10 15:57:55.274412018 -0700
+++ b/drivers/android/binder_alloc.c	2026-06-11 15:17:25.240473010 -0700
@@ -1142,7 +1142,7 @@ enum lru_status binder_alloc_free_page(s
 	struct vm_area_struct *vma;
 	struct page *page_to_free;
 	unsigned long page_addr;
-	int mm_locked = 0;
+	bool mapped;
 	size_t index;

 	if (!mmget_not_zero(mm))
@@ -1151,26 +1151,21 @@ enum lru_status binder_alloc_free_page(s
 	index = mdata->page_index;
 	page_addr = alloc->vm_start + index * PAGE_SIZE;

-	/* attempt per-vma lock first */
 	vma = lock_vma_under_rcu(mm, page_addr);
-	if (!vma) {
-		/* fall back to mmap_lock */
-		if (!mmap_read_trylock(mm))
-			goto err_mmap_read_lock_failed;
-		mm_locked = 1;
-		vma = vma_lookup(mm, page_addr);
-	}

 	if (!mutex_trylock(&alloc->mutex))
-		goto err_get_alloc_mutex_failed;
+		goto err_vma_end_read;

 	/*
-	 * Since a binder_alloc can only be mapped once, we ensure
-	 * the vma corresponds to this mapping by checking whether
-	 * the binder_alloc is still mapped.
+	 * mapped==true means a VMA should be present. Any
+	 * inconsistency should be transient.  Skip the page
+	 * and try again later.
 	 */
-	if (vma && !binder_alloc_is_mapped(alloc))
-		goto err_invalid_vma;
+	mapped = binder_alloc_is_mapped(alloc);
+	if (!vma && mapped)
+		goto err_vma_inconsistent;
+
+	/* mapped==true now implies a valid 'vma' */

 	trace_binder_unmap_kernel_start(alloc, index);

@@ -1182,32 +1177,27 @@ enum lru_status binder_alloc_free_page(s
 	list_lru_isolate(lru, item);
 	spin_unlock(&lru->lock);

-	if (vma) {
+	if (mapped) {
 		trace_binder_unmap_user_start(alloc, index);

 		zap_vma_range(vma, page_addr, PAGE_SIZE);

 		trace_binder_unmap_user_end(alloc, index);
+
+		vma_end_read(vma);
 	}

 	mutex_unlock(&alloc->mutex);
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
-		vma_end_read(vma);
 	mmput_async(mm);
 	binder_free_page(page_to_free);

 	return LRU_REMOVED_RETRY;

-err_invalid_vma:
+err_vma_inconsistent:
 	mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
+err_vma_end_read:
+	if (vma)
 		vma_end_read(vma);
-err_mmap_read_lock_failed:
 	mmput_async(mm);
 err_mmget:
 	return LRU_SKIP;
_


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 16:04         ` Dave Hansen
@ 2026-06-12 16:41           ` Suren Baghdasaryan
  2026-06-12 16:54             ` Dave Hansen
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-12 16:41 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Vlastimil Babka (SUSE), Alice Ryhl, Dave Hansen, linux-kernel,
	Andrew Morton, Arve Hjønnevåg, Carlos Llamas,
	Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

On Fri, Jun 12, 2026 at 9:05 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 6/12/26 08:41, Vlastimil Babka (SUSE) wrote:
> >>> If the vma lookup fails because the mmap write lock is held, but the vma
> >>> actually exists (has not been unmapped), then this code might "successfully"
> >>> remove the page without invoking zap_vma_range(). This means that the
> >>> page does not actually get freed and will just hang around forever until
> >>> the process owning the vma exits or Binder needs this page and maps a
> >>> new page on top of the page.
> >> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
> >> jump to err_mmap_read_lock_failed, like we currently do if
> >> mmap_read_trylock() fails.
> > I don't think that will be enough as well, as the current code AFAICS does
> > something meaningful when mmap_read_trylock() succeeds but vma_lookup returns
> > NULL because there's no vma at that address. Now we would just assume the
> > trylock failed even if the reason was that vma lookup found nothing for the
> > address. The problem is that lock_vma_under_rcu() can't distinguish those
> > two outcomes, so we would need something that does?
>
> I spent way too much time staring at this yesterday.
>
> I think the key to distinguishing between:
>
>         vma==NULL because there's no VMA
> and
>         vma==NULL because of a trylock failure
>
> is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
> finishes. vm_ops->close() shouldn't be able to happen while
> lock_vma_under_rcu() is held. So if you've got a non-NULL VMA, you've
> also got a stable binder_alloc_is_mapped().

By "stable binder_alloc_is_mapped()" do you mean it would always be
true? Asking because in your patch you removed this condition:

-         if (vma && !binder_alloc_is_mapped(alloc))
-                  goto err_invalid_vma;

So, previously if we found the VMA but binder_alloc_is_mapped()==false
we would bail out and now we don't. Are you reasoning that this
combination is impossible?

>
> So, if you've got a vma!=NULL *and* binder_alloc_is_mapped()==true, I
> think you can be pretty sure you've got the right VMA.
>
> If you have vma==NULL and binder_alloc_is_mapped()==true, you can be
> pretty sure that you hit some kind of transient lock_vma_under_rcu()
> failure.
>
> I came up with the attached patch. More eyeballs would be welcome.
> There's a _lot_ going on here.


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 16:41           ` Suren Baghdasaryan
@ 2026-06-12 16:54             ` Dave Hansen
  2026-06-12 17:07               ` Carlos Llamas
  2026-06-12 17:44               ` Suren Baghdasaryan
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-12 16:54 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Alice Ryhl, Dave Hansen, linux-kernel,
	Andrew Morton, Arve Hjønnevåg, Carlos Llamas,
	Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

On 6/12/26 09:41, Suren Baghdasaryan wrote:
>> I think the key to distinguishing between:
>>
>>         vma==NULL because there's no VMA
>> and
>>         vma==NULL because of a trylock failure
>>
>> is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
>> finishes. vm_ops->close() shouldn't be able to happen while
>> lock_vma_under_rcu() is held. So if you've got a non-NULL VMA, you've
>> also got a stable binder_alloc_is_mapped().
> By "stable binder_alloc_is_mapped()" do you mean it would always be
> true?

By stable, I meant that it can't change.

	vma = lock_vma_under_rcu()
	mapped = binder_alloc_is_mapped();
	<window>
	vma_end_read(vma);

During <window> it can't go from true=>false or false=>true.

false=>true never happens from what I can tell. It's just plain
impossible given the current code.

true=>false is locked out while the lock from lock_vma_under_rcu() is held.

> Asking because in your patch you removed this condition:
> 
> -         if (vma && !binder_alloc_is_mapped(alloc))
> -                  goto err_invalid_vma;
> 
> So, previously if we found the VMA but binder_alloc_is_mapped()==false
> we would bail out and now we don't. Are you reasoning that this
> combination is impossible?

It's not impossible, but I do think it is irrelevant. Or at least that
the *VMA* is irrelevant in this case. binder_alloc_is_mapped()==false
means that the binder VMA is gone. It's not in the maple tree, and it's
not coming back. If a VMA is found, it's an impostor.

That's why I did:

-        if (vma) {
+        if (mapped) {

The question isn't whether a VMA was found. The question is whether the
binder VMA is still mapped at page_addr. *That* is best inferred from
binder_alloc_is_mapped(), not the VMA lookup.

At least that's what I decided after staring at it for far too long.


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 16:54             ` Dave Hansen
@ 2026-06-12 17:07               ` Carlos Llamas
  2026-06-12 17:44               ` Suren Baghdasaryan
  1 sibling, 0 replies; 29+ messages in thread
From: Carlos Llamas @ 2026-06-12 17:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Suren Baghdasaryan, Vlastimil Babka (SUSE), Alice Ryhl,
	Dave Hansen, linux-kernel, Andrew Morton,
	Arve Hjønnevåg, Christian Brauner, David Ahern,
	David S. Miller, Greg Kroah-Hartman, Liam R. Howlett, linux-mm,
	Lorenzo Stoakes, netdev, Shakeel Butt, Todd Kjos

On Fri, Jun 12, 2026 at 09:54:58AM -0700, Dave Hansen wrote:
> On 6/12/26 09:41, Suren Baghdasaryan wrote:
> >> I think the key to distinguishing between:
> >>
> >>         vma==NULL because there's no VMA
> >> and
> >>         vma==NULL because of a trylock failure
> >>
> >> is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
> >> finishes. vm_ops->close() shouldn't be able to happen while
> >> lock_vma_under_rcu() is held. So if you've got a non-NULL VMA, you've
> >> also got a stable binder_alloc_is_mapped().
> > By "stable binder_alloc_is_mapped()" do you mean it would always be
> > true?
> 
> By stable, I meant that it can't change.
> 
> 	vma = lock_vma_under_rcu()
> 	mapped = binder_alloc_is_mapped();
> 	<window>
> 	vma_end_read(vma);
> 
> During <window> it can't go from true=>false or false=>true.
> 
> false=>true never happens from what I can tell. It's just plain
> impossible given the current code.
> 
> true=>false is locked out while the lock from lock_vma_under_rcu() is held.
> 
> > Asking because in your patch you removed this condition:
> > 
> > -         if (vma && !binder_alloc_is_mapped(alloc))
> > -                  goto err_invalid_vma;
> > 
> > So, previously if we found the VMA but binder_alloc_is_mapped()==false
> > we would bail out and now we don't. Are you reasoning that this
> > combination is impossible?
> 
> It's not impossible, but I do think it is irrelevant. Or at least that
> the *VMA* is irrelevant in this case. binder_alloc_is_mapped()==false
> means that the binder VMA is gone. It's not in the maple tree, and it's
> not coming back. If a VMA is found, it's an impostor.
> 
> That's why I did:
> 
> -        if (vma) {
> +        if (mapped) {
> 
> The question isn't whether a VMA was found. The question is whether the
> binder VMA is still mapped at page_addr. *That* is best inferred from
> binder_alloc_is_mapped(), not the VMA lookup.
> 
> At least that's what I decided after staring at it for far too long.

Yes, I _think_ binder_alloc_is_mapped() can help distinguish between the
two scenarios (contention vs vma-close). However, I think it would be
simpler and safe to do an early exit:

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 88c3e1667d5b..9dd7d927249d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1149,6 +1149,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	 * for 'page_addr'.
 	 */
 	vma = lock_vma_under_rcu(mm, page_addr);
+	if (!vma && binder_alloc_is_mapped(alloc))
+		goto err_vma_lock_failed;
 
 	if (!mutex_trylock(&alloc->mutex))
 		goto err_get_alloc_mutex_failed;


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 16:54             ` Dave Hansen
  2026-06-12 17:07               ` Carlos Llamas
@ 2026-06-12 17:44               ` Suren Baghdasaryan
  2026-06-12 18:47                 ` Dave Hansen
  1 sibling, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-12 17:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Vlastimil Babka (SUSE), Alice Ryhl, Dave Hansen, linux-kernel,
	Andrew Morton, Arve Hjønnevåg, Carlos Llamas,
	Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

On Fri, Jun 12, 2026 at 9:55 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 6/12/26 09:41, Suren Baghdasaryan wrote:
> >> I think the key to distinguishing between:
> >>
> >>         vma==NULL because there's no VMA
> >> and
> >>         vma==NULL because of a trylock failure
> >>
> >> is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
> >> finishes. vm_ops->close() shouldn't be able to happen while
> >> lock_vma_under_rcu() is held. So if you've got a non-NULL VMA, you've
> >> also got a stable binder_alloc_is_mapped().
> > By "stable binder_alloc_is_mapped()" do you mean it would always be
> > true?
>
> By stable, I meant that it can't change.
>
>         vma = lock_vma_under_rcu()
>         mapped = binder_alloc_is_mapped();
>         <window>
>         vma_end_read(vma);
>
> During <window> it can't go from true=>false or false=>true.
>
> false=>true never happens from what I can tell. It's just plain
> impossible given the current code.
>
> true=>false is locked out while the lock from lock_vma_under_rcu() is held.
>
> > Asking because in your patch you removed this condition:
> >
> > -         if (vma && !binder_alloc_is_mapped(alloc))
> > -                  goto err_invalid_vma;
> >
> > So, previously if we found the VMA but binder_alloc_is_mapped()==false
> > we would bail out and now we don't. Are you reasoning that this
> > combination is impossible?
>
> It's not impossible, but I do think it is irrelevant. Or at least that
> the *VMA* is irrelevant in this case. binder_alloc_is_mapped()==false
> means that the binder VMA is gone. It's not in the maple tree, and it's
> not coming back. If a VMA is found, it's an impostor.

Right, but before your change we were bailing out early. With your
change we would be generating the traces and freeing the page. I think
that's a functional change. Was that your intention?

>
> That's why I did:
>
> -        if (vma) {
> +        if (mapped) {
>
> The question isn't whether a VMA was found. The question is whether the
> binder VMA is still mapped at page_addr. *That* is best inferred from
> binder_alloc_is_mapped(), not the VMA lookup.
>
> At least that's what I decided after staring at it for far too long.


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 17:44               ` Suren Baghdasaryan
@ 2026-06-12 18:47                 ` Dave Hansen
  2026-06-12 19:50                   ` Alice Ryhl
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Hansen @ 2026-06-12 18:47 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Alice Ryhl, Dave Hansen, linux-kernel,
	Andrew Morton, Arve Hjønnevåg, Carlos Llamas,
	Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On 6/12/26 10:44, Suren Baghdasaryan wrote:
>> It's not impossible, but I do think it is irrelevant. Or at least that
>> the *VMA* is irrelevant in this case. binder_alloc_is_mapped()==false
>> means that the binder VMA is gone. It's not in the maple tree, and it's
>> not coming back. If a VMA is found, it's an impostor.
> Right, but before your change we were bailing out early. With your
> change we would be generating the traces and freeing the page. I think
> that's a functional change. Was that your intention?

Yeah, it was intentional.

I think the existing behavior is buggy. It also complicates the goal of
removing the mmap lock fallback. I've broken that behavior change out
into a separate patch. (attached here)

[-- Attachment #2: binder-impostor-fix.patch --]
[-- Type: text/x-patch, Size: 2462 bytes --]

tl;dr: Stop relying on VMA lookups to determine when to reclaim
pages. Instead, use binder-internal metadata.

== Background ==

Each 'struct binder_alloc' has one and only one place where it is
recorded as having been mapped. It can be munmap()'d. But after that,
binder_alloc_mmap_handler() will return errors for it being "already
mapped". So, binder mmap()s are a one-shot thing.

But, the original mmap() location is special even after munmap(). It
is still recorded in alloc->vm_start and never cleared out.
binder_alloc_free_page() continues to look up VMAs at that address.

== Problem ==

That leads to some suboptimal behavior. The moment an "impostor" VMA
is created at the old binder address, the shrinker function will
always hit the:

	if (vma && !binder_alloc_is_mapped(alloc))

case and LRU_SKIP all pages.

== Solution ==

Stop using the VMA to drive zapping decisions. Instead, use
binder_alloc_is_mapped().

== Discussion ==

Here's some pseudocode for how this behavior could be triggered:

	addr = mmap(..., len, binder_fd);
	// pages can be reclaimed
	munmap(addr, len);
	// pages can still be reclaimed
	mmap(addr, len, MAP_ANONYMOUS|MAP_PRIVATE, -1, ...);
	// Pages can no longer be reclaimed

There are plenty of ways the code could be restructured now
that it is less dependent on VMAs. But I've left that for future
patches.
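
For anyone who wants to poke at the address-reuse part from userspace,
here is a minimal sketch of the sequence above. It is an illustration
under assumptions: it does not touch binder at all, the first mapping
is a plain anonymous one standing in for the binder mmap(), and
impostor_lands_at_old_address() is a made-up helper name. It only
shows that an unrelated VMA can be placed at the exact address that
alloc->vm_start would still be pointing at.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page, unmap it, then map an unrelated anonymous page at the
 * same address. Returns true if the "impostor" landed at the old
 * address.
 */
static bool impostor_lands_at_old_address(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *addr, *impostor;
	bool same;

	/* Stand-in for the binder mmap() (assumption: anonymous memory). */
	addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return false;

	if (munmap(addr, len) != 0)
		return false;

	/* MAP_FIXED puts the new mapping at exactly 'addr'. */
	impostor = mmap(addr, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (impostor == MAP_FAILED)
		return false;

	same = (impostor == addr);
	munmap(impostor, len);
	return same;
}
```

MAP_FIXED is only safe here because the range was just unmapped by the
same single-threaded caller.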

---

 b/drivers/android/binder_alloc.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff -puN drivers/android/binder_alloc.c~binder-impostor-fix drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-impostor-fix	2026-06-12 10:46:06.704707233 -0700
+++ b/drivers/android/binder_alloc.c	2026-06-12 11:34:15.304460520 -0700
@@ -1164,14 +1164,6 @@ enum lru_status binder_alloc_free_page(s
 	if (!mutex_trylock(&alloc->mutex))
 		goto err_get_alloc_mutex_failed;

-	/*
-	 * Since a binder_alloc can only be mapped once, we ensure
-	 * the vma corresponds to this mapping by checking whether
-	 * the binder_alloc is still mapped.
-	 */
-	if (vma && !binder_alloc_is_mapped(alloc))
-		goto err_invalid_vma;
-
 	trace_binder_unmap_kernel_start(alloc, index);

 	page_to_free = alloc->pages[index];
@@ -1182,7 +1174,7 @@ enum lru_status binder_alloc_free_page(s
 	list_lru_isolate(lru, item);
 	spin_unlock(&lru->lock);

-	if (vma) {
+	if (binder_alloc_is_mapped(alloc)) {
 		trace_binder_unmap_user_start(alloc, index);

 		zap_vma_range(vma, page_addr, PAGE_SIZE);
_


* Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
  2026-06-12 18:47                 ` Dave Hansen
@ 2026-06-12 19:50                   ` Alice Ryhl
  0 siblings, 0 replies; 29+ messages in thread
From: Alice Ryhl @ 2026-06-12 19:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Suren Baghdasaryan, Vlastimil Babka (SUSE), Dave Hansen,
	linux-kernel, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos

On Fri, Jun 12, 2026 at 11:47:59AM -0700, Dave Hansen wrote:
> On 6/12/26 10:44, Suren Baghdasaryan wrote:
> >> It's not impossible, but I do think it is irrelevant. Or at least that
> >> the *VMA* is irrelevant in this case. binder_alloc_is_mapped()==false
> >> means that the binder VMA is gone. It's not in the maple tree, and it's
> >> not coming back. If a VMA is found, it's an impostor.
> > Right, but before your change we were bailing out early. With your
> > change we would be generating the traces and freeing the page. I think
> > that's a functional change. Was that your intention?
> 
> Yeah, it was intentional.
> 
> I think the existing behavior is buggy. It also complicates the goal of
> removing the mmap lock fallback. I've broken that behavior change out
> into a separate patch. (attached here)

I think you can just:

1. do a lock_vma_under_rcu().
2. if it fails, check binder_alloc_is_mapped().
3. if still mapped, return LRU_SKIP, otherwise behave like a failed
   vma_lookup() does today under the mmap read lock.

Or you can even skip steps 2 and 3 and treat failed lock_vma_under_rcu()
as LRU_SKIP because processes that unmap their Binder vma without
immediately closing the fd (freeing all the pages) do not really exist
in practice.

Alice


* [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
  2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
  2026-06-10 23:04 ` [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
@ 2026-06-10 23:04 ` Dave Hansen
  2026-06-10 23:40   ` Dave Hansen
                     ` (2 more replies)
  2026-06-10 23:04 ` [PATCH v2 4/5] binder: Remove mmap_lock fallback Dave Hansen
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 4423 bytes --]


From: Dave Hansen <dave.hansen@linux.intel.com>

== Background ==

There are basically two parallel ways to look up a VMA: the
traditional way, which is protected by mmap_lock, and the per-VMA
lock way, which is based on RCU and refcounts.

== Problem ==

The mmap_lock one is more straightforward to use but it has a big
disadvantage: it can not be mixed with page faults. Faults can take
mmap_lock for read, which can deadlock if mmap_lock is already held
for read and a writer is waiting. For example:

	mmap_read_lock(mm);
	// Another thread does mmap_write_lock().
	// New mmap_lock readers are blocked.
	vma = vma_lookup(mm, address);
	// This deadlocks on mmap_read_lock() if it faults:
	copy_from_user(address);
	mmap_read_unlock(mm);

The RCU one can be mixed with faults, but it is not available in all
configs, so all RCU users need to be able to fall back to the
traditional way.

== Solution ==

Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but it also takes
mmap_lock for read and waits for writers to finish before returning
the VMA. This has some advantages:

 1. Callers do not need to have a fallback path for when they
    collide with writers.
 2. It can be used in contexts where page faults can happen because
    it can take the mmap_lock for read but never *holds* it.
 3. Its fast path does not require taking mmap_lock for read.

Basically, when applied correctly, this approach results in faster
*and* simpler code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

--

Changes from v1:
 * Add a comment explaining that this can not be mixed with other
   per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
 * Add a FIXME about making the mmap_read_lock() killable
 * Add more changelog bits about the possibility for an infinite goto
   loop.
 * Adopt vma_start_read_unlocked() implementation from Lorenzo
---

 b/include/linux/mmap_lock.h |    3 +++
 b/mm/mmap_lock.c            |   27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-06-10 15:57:55.828431712 -0700
+++ b/include/linux/mmap_lock.h	2026-06-10 15:57:55.834431925 -0700
@@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
 	return vma_start_read_locked_nested(vma, 0);
 }
 
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address);
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	vma_refcount_put(vma);
diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
--- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-06-10 15:57:55.831431819 -0700
+++ b/mm/mmap_lock.c	2026-06-10 16:02:50.723860779 -0700
@@ -338,6 +338,33 @@ inval:
 	return NULL;
 }
 
+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for writers to
+ * finish if the VMA is being modified. Returns NULL if there is no VMA covering
+ * 'address'.
+ *
+ * Use only in code paths where no mmap_lock and no VMA lock is held.
+ *
+ * The fast path does not take mmap_lock.
+ */
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+					       unsigned long address)
+{
+	struct vm_area_struct *vma;
+
+	/* Fast path: return stable VMA covering 'address': */
+	vma = lock_vma_under_rcu(mm, address);
+	if (vma)
+		return vma;
+
+	/* Slow path: preclude VMA writers by getting mmap read lock. */
+	guard(rwsem_read)(&mm->mmap_lock);
+	if (!vma_start_read_locked(vma))
+		return NULL;
+
+	return vma;
+}
+
 static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
 							    struct vma_iterator *vmi,
 							    unsigned long from_addr)
_

* Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
  2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
@ 2026-06-10 23:40   ` Dave Hansen
  2026-06-11 20:35   ` Suren Baghdasaryan
  2026-06-12 18:00   ` Vlastimil Babka (SUSE)
  2 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:40 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka

On 6/10/26 16:04, Dave Hansen wrote:
> +	/* Slow path: preclude VMA writers by getting mmap read lock. */
> +	guard(rwsem_read)(&mm->mmap_lock);
> +	if (!vma_start_read_locked(vma))
> +		return NULL;

Welp, I actually ran and tested this, but it's got a big bug. The slow
path is broken. It needs:

	/* Slow path: preclude VMA writers by getting mmap read lock. */
	guard(rwsem_read)(&mm->mmap_lock);
+	vma = vma_lookup(mm, address);
	if (!vma_start_read_locked(vma))
		return NULL;

Otherwise 'vma' is always NULL in the slow path, since the fast path only
falls through when lock_vma_under_rcu() returned NULL. So it'll definitely
need a v3 or a fixup before it goes anywhere.
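
For reference, the corrected flow can be modeled in userspace C. This is
only a sketch under assumptions: the types and helpers below are
hypothetical stand-ins that reuse the kernel names, mmap_lock is reduced
to a reader count, the 'writer_active' flag exists only to force the slow
path, and the NULL check after vma_lookup() is an extra that is not part
of the fixup above. It uses explicit lock/unlock simply because plain C
has no guard().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
struct vm_area_struct {
	unsigned long vm_start, vm_end;	/* covers [vm_start, vm_end) */
	int refcnt;			/* models per-VMA read locks held */
};

struct mm_struct {
	int mmap_readers;		/* models mmap_lock held for read */
	struct vm_area_struct *vmas;
	size_t nr_vmas;
	bool writer_active;		/* makes the modeled fast path fail */
};

static void mmap_read_lock(struct mm_struct *mm)
{
	mm->mmap_readers++;
}

static void mmap_read_unlock(struct mm_struct *mm)
{
	assert(mm->mmap_readers > 0);
	mm->mmap_readers--;
}

/* Model of vma_lookup(): NULL when no VMA covers 'address'. */
static struct vm_area_struct *vma_lookup(struct mm_struct *mm,
					 unsigned long address)
{
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct vm_area_struct *vma = &mm->vmas[i];

		if (address >= vma->vm_start && address < vma->vm_end)
			return vma;
	}
	return NULL;
}

/* Model of lock_vma_under_rcu(): gives up while a writer is active. */
static struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
						 unsigned long address)
{
	struct vm_area_struct *vma;

	if (mm->writer_active)
		return NULL;
	vma = vma_lookup(mm, address);
	if (vma)
		vma->refcnt++;
	return vma;
}

/* Model of vma_start_read_locked(): always succeeds in this sketch. */
static bool vma_start_read_locked(struct vm_area_struct *vma)
{
	vma->refcnt++;
	return true;
}

struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
					       unsigned long address)
{
	struct vm_area_struct *vma;

	/* Fast path: mmap_lock is not touched. */
	vma = lock_vma_under_rcu(mm, address);
	if (vma)
		return vma;

	/* Slow path: take mmap_lock for read only across lookup + lock. */
	mmap_read_lock(mm);
	vma = vma_lookup(mm, address);	/* the lookup missing from v2 */
	if (vma && !vma_start_read_locked(vma))
		vma = NULL;
	mmap_read_unlock(mm);

	return vma;
}
```

In the model, the caller gets back either NULL or a read-locked VMA, and
the mmap_lock reader count is back to zero on return in both paths.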

* Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
  2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
  2026-06-10 23:40   ` Dave Hansen
@ 2026-06-11 20:35   ` Suren Baghdasaryan
  2026-06-11 21:04     ` Dave Hansen
  2026-06-12 18:00   ` Vlastimil Babka (SUSE)
  2 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-11 20:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos, Vlastimil Babka

On Wed, Jun 10, 2026 at 4:04 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> == Background ==
>
> There are basically two parallel ways to look up a VMA: the
> traditional way, which is protected by mmap_lock, and the RCU-based

"which is protected by mmap_lock" better be more specific "which is
protected by mmap_read_lock".

> per-VMA lock way which is based on RCU and refcounts.
>
> == Problem ==
>
> The mmap_lock one is more straightforward to use but it has a big
> disadvantage in that it can not be mixed with page faults since those
> can take mmap_lock for read, which can deadlock when mixed with page
> faults.

"when mixed with page faults" or "when mixed with address space
modifiers that take mmap_write_lock"? I think that's what your example
shows.

> For example:
>
>         mmap_read_lock(mm);
>         // Another thread does mmap_write_lock().
>         // New mmap_lock readers are blocked.
>         vma = vma_lookup(mm, address);
>         // This deadlocks on mmap_read_lock() if it faults:
>         copy_from_user(address);
>         mmap_read_unlock(mm);

Ultimately the problem here is calling something that might require
mmap_read_lock(mm) (in this case copy_from_user()) while already
holding mmap_read_lock(mm). Normally that works up until you throw an
mmap_write_lock(mm) in the mix.
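
The cycle can be replayed with a toy model in userspace C. This is a
sketch, assuming only what the quoted example states: once a writer is
queued, new readers are blocked, and the writer cannot proceed until all
current readers have left. 'struct toy_rwsem' and its helpers are
hypothetical and are not the kernel's rw_semaphore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a rwsem; not the kernel implementation. */
struct toy_rwsem {
	int readers;		/* current read holders */
	bool writer_waiting;	/* a writer is queued behind the readers */
};

/* A new reader gets in only when no writer is queued. */
static bool toy_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* A queued writer can proceed only after every reader has left. */
static bool toy_writer_can_proceed(const struct toy_rwsem *sem)
{
	return sem->readers == 0;
}

/*
 * Replay the quoted sequence. Returns true when the nested read
 * acquisition (the fault) is stuck behind a writer that is itself
 * stuck behind the outer read hold, i.e. nobody can make progress.
 */
bool nested_read_deadlocks(bool writer_arrives)
{
	struct toy_rwsem sem = { 0 };
	bool fault_blocked, writer_blocked;

	toy_read_trylock(&sem);			/* outer mmap_read_lock() */
	sem.writer_waiting = writer_arrives;	/* mmap_write_lock() queues */

	/* copy_from_user() faults and wants mmap_lock for read again. */
	fault_blocked = !toy_read_trylock(&sem);
	writer_blocked = writer_arrives && !toy_writer_can_proceed(&sem);

	if (!fault_blocked)
		toy_read_unlock(&sem);		/* nested hold */
	toy_read_unlock(&sem);			/* outer hold */

	return fault_blocked && writer_blocked;
}
```

Without the writer, the nested read acquisition simply succeeds, which
matches "normally that works".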

>
> The RCU one can be mixed with faults, but it is not available in all
> configs, so all RCU users need to be able to fall back to the
> traditional way.
>
> == Solution ==
>
> Add a variant of the RCU-based lookup that waits for writers. This is
> basically the same as the existing RCU-based lookup, but it also takes
> mmap_lock for read and waits for writers to finish before returning
> the VMA. This has some advantages:
>
>  1. Callers do not need to have a fallback path for when they
>     collide with writers.
>  2. It can be used in contexts where page faults can happen because
>     it can take the mmap_lock for read but never *holds* it.
>  3. Its fast path does not require taking mmap_lock for read.
>
> Basically, when applied correctly, this approach results in faster
> *and* simpler code.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
>
> --
>
> Changes from v1:
>  * Add a comment explaining that this can not be mixed with other
>    per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
>  * Add a FIXME about making the mmap_read_lock() killable
>  * Add more changelog bits about the possibility for an infinite goto
>    loop.
>  * Adopt vma_start_read_unlocked() implementation from Lorenzo
> ---
>
>  b/include/linux/mmap_lock.h |    3 +++
>  b/mm/mmap_lock.c            |   27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait 2026-06-10 15:57:55.828431712 -0700
> +++ b/include/linux/mmap_lock.h 2026-06-10 15:57:55.834431925 -0700
> @@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
>         return vma_start_read_locked_nested(vma, 0);
>  }
>
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +                                              unsigned long address);
> +
>  static inline void vma_end_read(struct vm_area_struct *vma)
>  {
>         vma_refcount_put(vma);
> diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait    2026-06-10 15:57:55.831431819 -0700
> +++ b/mm/mmap_lock.c    2026-06-10 16:02:50.723860779 -0700
> @@ -338,6 +338,33 @@ inval:
>         return NULL;
>  }
>
> +/*
> + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> + * 'address'.
> + *
> + * Use only in code paths where no mmap_lock and no VMA lock is held.
> + *
> + * The fast path does not take mmap_lock.
> + */
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +                                              unsigned long address)

It's harder to review a function without a user but I guess users will
be added later in this patchset.

> +{
> +       struct vm_area_struct *vma;
> +
> +       /* Fast path: return stable VMA covering 'address': */
> +       vma = lock_vma_under_rcu(mm, address);
> +       if (vma)
> +               return vma;
> +
> +       /* Slow path: preclude VMA writers by getting mmap read lock. */
> +       guard(rwsem_read)(&mm->mmap_lock);

guard() is nice but mmap_read_{lock|unlock} has those
__mmap_lock_trace_* traces which we lose with guard(). Not sure if
that's a good enough reason to keep using older primitives.

> +       if (!vma_start_read_locked(vma))
> +               return NULL;
> +
> +       return vma;
> +}
> +
>  static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
>                                                             struct vma_iterator *vmi,
>                                                             unsigned long from_addr)
> _

* Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
  2026-06-11 20:35   ` Suren Baghdasaryan
@ 2026-06-11 21:04     ` Dave Hansen
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-11 21:04 UTC (permalink / raw)
  To: Suren Baghdasaryan, Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos, Vlastimil Babka

On 6/11/26 13:35, Suren Baghdasaryan wrote:
>> +       /* Slow path: preclude VMA writers by getting mmap read lock. */
>> +       guard(rwsem_read)(&mm->mmap_lock);
> guard() is nice but mmap_read_{lock|unlock} has those
> __mmap_lock_trace_* traces which we lose with guard(). Not sure if
> that's a good enough reason to keep using older primitives.

I stole the guard() from Lorenzo's suggestion. I'm totally fine doing it
whichever way you two would prefer.

I'm old school but I personally find guard() harder to read. I tend to
only use it where it is *vastly* superior to explicit lock/unlock. But,
seriously, I'm open to whatever folks want.

* Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
  2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
  2026-06-10 23:40   ` Dave Hansen
  2026-06-11 20:35   ` Suren Baghdasaryan
@ 2026-06-12 18:00   ` Vlastimil Babka (SUSE)
  2 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 18:00 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos

On 6/11/26 01:04, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> == Background ==
> 
> There are basically two parallel ways to look up a VMA: the
> traditional way, which is protected by mmap_lock, and the RCU-based
> per-VMA lock way which is based on RCU and refcounts.
> 
> == Problem ==
> 
> The mmap_lock one is more straightforward to use but it has a big
> disadvantage in that it can not be mixed with page faults since those
> can take mmap_lock for read, which can deadlock when mixed with page
> faults. For example:

... mixed with nested page faults and parallel writers, perhaps?

> 
> 	mmap_read_lock(mm);
> 	// Another thread does mmap_write_lock().
> 	// New mmap_lock readers are blocked.
> 	vma = vma_lookup(mm, address);
> 	// This deadlocks on mmap_read_lock() if it faults:
> 	copy_from_user(address);
> 	mmap_read_unlock(mm);
> 
> The RCU one can be mixed with faults, but it is not available in all

I'd stick to the "per-VMA lock" term rather than "RCU one".

> configs, so all RCU users need to be able to fall back to the
> traditional way.

This is now an obsolete statement as patch 1 makes them available? But the
problem is that they can fail and using mmap read lock as a fallback in a
simple way has the above issue?

> == Solution ==
> 
> Add a variant of the RCU-based lookup that waits for writers. This is
> basically the same as the existing RCU-based lookup, but it also takes
> mmap_lock for read and waits for writers to finish before returning
> the VMA. This has some advantages:

I would stress the part that the mmap lock is taken *only temporarily* to
wait for the writers and ensure we obtain a per-vma read lock, and then
dropped again? As that's the main trick IIUC.

>  1. Callers do not need to have a fallback path for when they
>     collide with writers.
>  2. It can be used in contexts where page faults can happen because
>     it can take the mmap_lock for read but never *holds* it.
>  3. Its fast path does not require taking mmap_lock for read.
> 
> Basically, when applied correctly, this approach results in faster
> *and* simpler code.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> 
> --
> 
> Changes from v1:
>  * Add a comment explaining that this can not be mixed with other
>    per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
>  * Add a FIXME about making the mmap_read_lock() killable

I don't see it anywhere?

>  * Add more changelog bits about the possibility for an infinite goto
>    loop.
>  * Adopt vma_start_read_unlocked() implementation from Lorenzo
> ---
> 
>  b/include/linux/mmap_lock.h |    3 +++
>  b/mm/mmap_lock.c            |   27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-06-10 15:57:55.828431712 -0700
> +++ b/include/linux/mmap_lock.h	2026-06-10 15:57:55.834431925 -0700
> @@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
>  	return vma_start_read_locked_nested(vma, 0);
>  }
>  
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address);
> +
>  static inline void vma_end_read(struct vm_area_struct *vma)
>  {
>  	vma_refcount_put(vma);
> diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-06-10 15:57:55.831431819 -0700
> +++ b/mm/mmap_lock.c	2026-06-10 16:02:50.723860779 -0700
> @@ -338,6 +338,33 @@ inval:
>  	return NULL;
>  }
>  
> +/*
> + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> + * 'address'.
> + *
> + * Use only in code paths where no mmap_lock and no VMA lock is held.

I think we have various asserts that could be used and are stronger than a
comment ;)

> + *
> + * The fast path does not take mmap_lock.
> + */
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address)
> +{
> +	struct vm_area_struct *vma;
> +
> +	/* Fast path: return stable VMA covering 'address': */
> +	vma = lock_vma_under_rcu(mm, address);
> +	if (vma)
> +		return vma;
> +
> +	/* Slow path: preclude VMA writers by getting mmap read lock. */

Again I would say "temporarily".

> +	guard(rwsem_read)(&mm->mmap_lock);

Aside from the missing vma_lookup() I'm not sure we should also trust the
result of the lookup blindly? Should we also verify we found a vma? Some
callers might not fail the lookup because they will only lookup something
that's sure to be present, but some might fail?

> +	if (!vma_start_read_locked(vma))
> +		return NULL;

You can count me on the side that would rather see explicit operations than
the guard. Exactly because it's a subtle usage of the mmap sem, and yeah
also the tracing that Suren pointed out.

Seems to me uffd_lock_vma() mostly does all this right (but also does more
stuff that we don't want to do). I'm just not sure right now when
vma_start_read_locked() failures can happen in practice here (can it be only
refcount overflow or also refcount being zero? hopefully not zero if we
found the vma under mmap lock for read? comment in
lock_next_vma_under_mmap_lock() seems to hint at that) and what to do about
them. We seem to have some unhelpfully stale comments around.

- uffd_lock_vma() doesn't document that it can return -EAGAIN. (and should
the caller then retry or what?)

- vma_start_read_locked() has a comment saying how it cannot fail, but it in
fact can.

> +
> +	return vma;
> +}
> +
>  static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
>  							    struct vma_iterator *vmi,
>  							    unsigned long from_addr)
> _


* [PATCH v2 4/5] binder: Remove mmap_lock fallback
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
                   ` (2 preceding siblings ...)
  2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
@ 2026-06-10 23:04 ` Dave Hansen
  2026-06-11 20:40   ` Suren Baghdasaryan
  2026-06-12 18:07   ` Vlastimil Babka (SUSE)
  2026-06-10 23:04 ` [PATCH v2 5/5] tcp: Remove mmap_lock fallback path Dave Hansen
  2026-06-11 20:24 ` [syzbot ci] Re: mm: Unconditional per-VMA locks and cleanups syzbot ci
  5 siblings, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka


From: Dave Hansen <dave.hansen@linux.intel.com>

Previously, per-VMA locking could fail in the face of writers,
which necessitated a fallback to mmap_lock. The new
vma_start_read_unlocked() will wait for writers instead of failing.

Use the new helper. Wait for writers. Remove the fallback to mmap_lock.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

---

 b/drivers/android/binder_alloc.c |   17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff -puN drivers/android/binder_alloc.c~binder-vma-waiter drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-vma-waiter	2026-06-10 15:57:56.419452721 -0700
+++ b/drivers/android/binder_alloc.c	2026-06-10 15:57:56.423452863 -0700
@@ -259,21 +259,14 @@ static int binder_page_insert(struct bin
 	struct vm_area_struct *vma;
 	int ret = -ESRCH;
 
-	/* attempt per-vma lock first */
-	vma = lock_vma_under_rcu(mm, addr);
-	if (vma) {
-		if (binder_alloc_is_mapped(alloc))
-			ret = vm_insert_page(vma, addr, page);
-		vma_end_read(vma);
+	vma = vma_start_read_unlocked(mm, addr);
+	if (!vma)
 		return ret;
-	}
 
-	/* fall back to mmap_lock */
-	mmap_read_lock(mm);
-	vma = vma_lookup(mm, addr);
-	if (vma && binder_alloc_is_mapped(alloc))
+	if (binder_alloc_is_mapped(alloc))
 		ret = vm_insert_page(vma, addr, page);
-	mmap_read_unlock(mm);
+
+	vma_end_read(vma);
 
 	return ret;
 }
_

* Re: [PATCH v2 4/5] binder: Remove mmap_lock fallback
  2026-06-10 23:04 ` [PATCH v2 4/5] binder: Remove mmap_lock fallback Dave Hansen
@ 2026-06-11 20:40   ` Suren Baghdasaryan
  2026-06-12 18:07   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-11 20:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos, Vlastimil Babka

On Wed, Jun 10, 2026 at 4:04 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> Previously, per-VMA locking could fail in the face of writers,
> which necessitated a fallback to mmap_lock. The new
> vma_start_read_unlocked() will wait for writers instead of failing.
>
> Use the new helper. Wait for writers. Remove the fallback to mmap_lock.

Nice!

>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
>
> ---
>
>  b/drivers/android/binder_alloc.c |   17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff -puN drivers/android/binder_alloc.c~binder-vma-waiter drivers/android/binder_alloc.c
> --- a/drivers/android/binder_alloc.c~binder-vma-waiter  2026-06-10 15:57:56.419452721 -0700
> +++ b/drivers/android/binder_alloc.c    2026-06-10 15:57:56.423452863 -0700
> @@ -259,21 +259,14 @@ static int binder_page_insert(struct bin
>         struct vm_area_struct *vma;
>         int ret = -ESRCH;
>
> -       /* attempt per-vma lock first */
> -       vma = lock_vma_under_rcu(mm, addr);
> -       if (vma) {
> -               if (binder_alloc_is_mapped(alloc))
> -                       ret = vm_insert_page(vma, addr, page);
> -               vma_end_read(vma);
> +       vma = vma_start_read_unlocked(mm, addr);
> +       if (!vma)
>                 return ret;
> -       }
>
> -       /* fall back to mmap_lock */
> -       mmap_read_lock(mm);
> -       vma = vma_lookup(mm, addr);
> -       if (vma && binder_alloc_is_mapped(alloc))
> +       if (binder_alloc_is_mapped(alloc))
>                 ret = vm_insert_page(vma, addr, page);
> -       mmap_read_unlock(mm);
> +
> +       vma_end_read(vma);
>
>         return ret;
>  }
> _

* Re: [PATCH v2 4/5] binder: Remove mmap_lock fallback
  2026-06-10 23:04 ` [PATCH v2 4/5] binder: Remove mmap_lock fallback Dave Hansen
  2026-06-11 20:40   ` Suren Baghdasaryan
@ 2026-06-12 18:07   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 18:07 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos

On 6/11/26 01:04, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Previously, per-VMA locking could fail in the face of writers,
> which necessitated a fallback to mmap_lock. The new
> vma_start_read_unlocked() will wait for writers instead of failing.
> 
> Use the new helper. Wait for writers. Remove the fallback to mmap_lock.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

The usage seems fine to me. But since it handles cases where vma is NULL, it
seems to support my point for patch 3 that vma_start_read_unlocked() should
also allow for and handle vma_lookup() returning NULL.

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> 
> ---
> 
>  b/drivers/android/binder_alloc.c |   17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff -puN drivers/android/binder_alloc.c~binder-vma-waiter drivers/android/binder_alloc.c
> --- a/drivers/android/binder_alloc.c~binder-vma-waiter	2026-06-10 15:57:56.419452721 -0700
> +++ b/drivers/android/binder_alloc.c	2026-06-10 15:57:56.423452863 -0700
> @@ -259,21 +259,14 @@ static int binder_page_insert(struct bin
>  	struct vm_area_struct *vma;
>  	int ret = -ESRCH;
>  
> -	/* attempt per-vma lock first */
> -	vma = lock_vma_under_rcu(mm, addr);
> -	if (vma) {
> -		if (binder_alloc_is_mapped(alloc))
> -			ret = vm_insert_page(vma, addr, page);
> -		vma_end_read(vma);
> +	vma = vma_start_read_unlocked(mm, addr);
> +	if (!vma)
>  		return ret;
> -	}
>  
> -	/* fall back to mmap_lock */
> -	mmap_read_lock(mm);
> -	vma = vma_lookup(mm, addr);
> -	if (vma && binder_alloc_is_mapped(alloc))
> +	if (binder_alloc_is_mapped(alloc))
>  		ret = vm_insert_page(vma, addr, page);
> -	mmap_read_unlock(mm);
> +
> +	vma_end_read(vma);
>  
>  	return ret;
>  }
> _


* [PATCH v2 5/5] tcp: Remove mmap_lock fallback path
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
                   ` (3 preceding siblings ...)
  2026-06-10 23:04 ` [PATCH v2 4/5] binder: Remove mmap_lock fallback Dave Hansen
@ 2026-06-10 23:04 ` Dave Hansen
  2026-06-11 20:44   ` Suren Baghdasaryan
  2026-06-12 18:13   ` Vlastimil Babka (SUSE)
  2026-06-11 20:24 ` [syzbot ci] Re: mm: Unconditional per-VMA locks and cleanups syzbot ci
  5 siblings, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2026-06-10 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos,
	Vlastimil Babka


From: Dave Hansen <dave.hansen@linux.intel.com>

Previously, per-VMA locking could fail in the face of writers,
which necessitated a fallback to mmap_lock. The new
vma_start_read_unlocked() will wait for writers instead of failing.

Use the new helper. Wait for writers. Remove the fallback to mmap_lock.

This really is a nice cleanup. It removes the need to pass the lock
state back and forth to find_tcp_vma().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org
---

 b/net/ipv4/tcp.c |   31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff -puN net/ipv4/tcp.c~ipv4-tcp-vma-waiter net/ipv4/tcp.c
--- a/net/ipv4/tcp.c~ipv4-tcp-vma-waiter	2026-06-10 15:57:56.972472379 -0700
+++ b/net/ipv4/tcp.c	2026-06-10 15:57:56.976472521 -0700
@@ -2171,27 +2171,18 @@ static void tcp_zc_finalize_rx_tstamp(st
 }
 
 static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
-					   unsigned long address,
-					   bool *mmap_locked)
+					   unsigned long address)
 {
-	struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
+	struct vm_area_struct *vma = vma_start_read_unlocked(mm, address);
 
-	if (vma) {
-		if (vma->vm_ops != &tcp_vm_ops) {
-			vma_end_read(vma);
-			return NULL;
-		}
-		*mmap_locked = false;
-		return vma;
-	}
+	if (!vma)
+		return NULL;
 
-	mmap_read_lock(mm);
-	vma = vma_lookup(mm, address);
-	if (!vma || vma->vm_ops != &tcp_vm_ops) {
-		mmap_read_unlock(mm);
+	if (vma->vm_ops != &tcp_vm_ops) {
+		vma_end_read(vma);
 		return NULL;
 	}
-	*mmap_locked = true;
+
 	return vma;
 }
 
@@ -2212,7 +2203,6 @@ static int tcp_zerocopy_receive(struct s
 	u32 seq = tp->copied_seq;
 	u32 total_bytes_to_map;
 	int inq = tcp_inq(sk);
-	bool mmap_locked;
 	int ret;
 
 	zc->copybuf_len = 0;
@@ -2237,7 +2227,7 @@ static int tcp_zerocopy_receive(struct s
 		return 0;
 	}
 
-	vma = find_tcp_vma(current->mm, address, &mmap_locked);
+	vma = find_tcp_vma(current->mm, address);
 	if (!vma)
 		return -EINVAL;
 
@@ -2319,10 +2309,7 @@ static int tcp_zerocopy_receive(struct s
 						   zc, total_bytes_to_map);
 	}
 out:
-	if (mmap_locked)
-		mmap_read_unlock(current->mm);
-	else
-		vma_end_read(vma);
+	vma_end_read(vma);
 	/* Try to copy straggler data. */
 	if (!ret)
 		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
_

* Re: [PATCH v2 5/5] tcp: Remove mmap_lock fallback path
  2026-06-10 23:04 ` [PATCH v2 5/5] tcp: Remove mmap_lock fallback path Dave Hansen
@ 2026-06-11 20:44   ` Suren Baghdasaryan
  2026-06-12 18:13   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2026-06-11 20:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Todd Kjos, Vlastimil Babka

On Wed, Jun 10, 2026 at 4:04 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> Previously, per-VMA locking could fail in the face of writers,
> which necessitated a fallback to mmap_lock. The new
> vma_start_read_unlocked() will wait for writers instead of failing.
>
> Use the new helper. Wait for writers. Remove the fallback to mmap_lock.
>
> This really is a nice cleanup. It removes the need to pass the lock
> state back and forth to find_tcp_vma().

LGTM. As in the previous patch, you already have my Reviewed-by.

>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> ---
>
>  b/net/ipv4/tcp.c |   31 +++++++++----------------------
>  1 file changed, 9 insertions(+), 22 deletions(-)
>
> diff -puN net/ipv4/tcp.c~ipv4-tcp-vma-waiter net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c~ipv4-tcp-vma-waiter        2026-06-10 15:57:56.972472379 -0700
> +++ b/net/ipv4/tcp.c    2026-06-10 15:57:56.976472521 -0700
> @@ -2171,27 +2171,18 @@ static void tcp_zc_finalize_rx_tstamp(st
>  }
>
>  static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> -                                          unsigned long address,
> -                                          bool *mmap_locked)
> +                                          unsigned long address)
>  {
> -       struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +       struct vm_area_struct *vma = vma_start_read_unlocked(mm, address);
>
> -       if (vma) {
> -               if (vma->vm_ops != &tcp_vm_ops) {
> -                       vma_end_read(vma);
> -                       return NULL;
> -               }
> -               *mmap_locked = false;
> -               return vma;
> -       }
> +       if (!vma)
> +               return NULL;
>
> -       mmap_read_lock(mm);
> -       vma = vma_lookup(mm, address);
> -       if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -               mmap_read_unlock(mm);
> +       if (vma->vm_ops != &tcp_vm_ops) {
> +               vma_end_read(vma);
>                 return NULL;
>         }
> -       *mmap_locked = true;
> +
>         return vma;
>  }
>
> @@ -2212,7 +2203,6 @@ static int tcp_zerocopy_receive(struct s
>         u32 seq = tp->copied_seq;
>         u32 total_bytes_to_map;
>         int inq = tcp_inq(sk);
> -       bool mmap_locked;
>         int ret;
>
>         zc->copybuf_len = 0;
> @@ -2237,7 +2227,7 @@ static int tcp_zerocopy_receive(struct s
>                 return 0;
>         }
>
> -       vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +       vma = find_tcp_vma(current->mm, address);
>         if (!vma)
>                 return -EINVAL;
>
> @@ -2319,10 +2309,7 @@ static int tcp_zerocopy_receive(struct s
>                                                    zc, total_bytes_to_map);
>         }
>  out:
> -       if (mmap_locked)
> -               mmap_read_unlock(current->mm);
> -       else
> -               vma_end_read(vma);
> +       vma_end_read(vma);
>         /* Try to copy straggler data. */
>         if (!ret)
>                 copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> _


* Re: [PATCH v2 5/5] tcp: Remove mmap_lock fallback path
  2026-06-10 23:04 ` [PATCH v2 5/5] tcp: Remove mmap_lock fallback path Dave Hansen
  2026-06-11 20:44   ` Suren Baghdasaryan
@ 2026-06-12 18:13   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 29+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-12 18:13 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Alice Ryhl, Andrew Morton, Arve Hjønnevåg,
	Carlos Llamas, Christian Brauner, David Ahern, David S. Miller,
	Greg Kroah-Hartman, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
	netdev, Shakeel Butt, Suren Baghdasaryan, Todd Kjos

On 6/11/26 01:04, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Previously, the per-VMA locking could fail in the face of writers
> which necessitates a fallback to mmap_lock. The new
> lock_vma_under_rcu_wait() will wait for writers instead of failing.

vma_start_read_unlocked()

> 
> Use the new helper. Wait for writers. Remove the fallback to mmap_lock.
> 
> This really is a nice cleanup.

[obama_medal_meme.jpg] ;)

But yeah, it is!

> It removes the need to pass the lock
> state back and forth to find_tcp_vma().
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Lorenzo Stoakes <ljs@kernel.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> ---
> 
>  b/net/ipv4/tcp.c |   31 +++++++++----------------------
>  1 file changed, 9 insertions(+), 22 deletions(-)
> 
> diff -puN net/ipv4/tcp.c~ipv4-tcp-vma-waiter net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c~ipv4-tcp-vma-waiter	2026-06-10 15:57:56.972472379 -0700
> +++ b/net/ipv4/tcp.c	2026-06-10 15:57:56.976472521 -0700
> @@ -2171,27 +2171,18 @@ static void tcp_zc_finalize_rx_tstamp(st
>  }
>  
>  static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> -					   unsigned long address,
> -					   bool *mmap_locked)
> +					   unsigned long address)
>  {
> -	struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +	struct vm_area_struct *vma = vma_start_read_unlocked(mm, address);
>  
> -	if (vma) {
> -		if (vma->vm_ops != &tcp_vm_ops) {
> -			vma_end_read(vma);
> -			return NULL;
> -		}
> -		*mmap_locked = false;
> -		return vma;
> -	}
> +	if (!vma)
> +		return NULL;
>  
> -	mmap_read_lock(mm);
> -	vma = vma_lookup(mm, address);
> -	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -		mmap_read_unlock(mm);
> +	if (vma->vm_ops != &tcp_vm_ops) {
> +		vma_end_read(vma);
>  		return NULL;
>  	}
> -	*mmap_locked = true;
> +
>  	return vma;
>  }
>  
> @@ -2212,7 +2203,6 @@ static int tcp_zerocopy_receive(struct s
>  	u32 seq = tp->copied_seq;
>  	u32 total_bytes_to_map;
>  	int inq = tcp_inq(sk);
> -	bool mmap_locked;
>  	int ret;
>  
>  	zc->copybuf_len = 0;
> @@ -2237,7 +2227,7 @@ static int tcp_zerocopy_receive(struct s
>  		return 0;
>  	}
>  
> -	vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +	vma = find_tcp_vma(current->mm, address);
>  	if (!vma)
>  		return -EINVAL;
>  
> @@ -2319,10 +2309,7 @@ static int tcp_zerocopy_receive(struct s
>  						   zc, total_bytes_to_map);
>  	}
>  out:
> -	if (mmap_locked)
> -		mmap_read_unlock(current->mm);
> -	else
> -		vma_end_read(vma);
> +	vma_end_read(vma);
>  	/* Try to copy straggler data. */
>  	if (!ret)
>  		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> _



* [syzbot ci] Re: mm: Unconditional per-VMA locks and cleanups
  2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
                   ` (4 preceding siblings ...)
  2026-06-10 23:04 ` [PATCH v2 5/5] tcp: Remove mmap_lock fallback path Dave Hansen
@ 2026-06-11 20:24 ` syzbot ci
  5 siblings, 0 replies; 29+ messages in thread
From: syzbot ci @ 2026-06-11 20:24 UTC (permalink / raw)
  To: akpm, aliceryhl, arve, christian, cmllamas, dave.hansen, davem,
	dsahern, gregkh, liam.howlett, linux-kernel, linux-mm, ljs,
	netdev, shakeel.butt, surenb, tkjos, vbabka
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v2] mm: Unconditional per-VMA locks and cleanups
https://lore.kernel.org/all/20260610230409.A44D29FA@davehans-spike.ostc.intel.com
* [PATCH v2 1/5] mm: Make per-VMA locks available universally
* [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
* [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
* [PATCH v2 4/5] binder: Remove mmap_lock fallback
* [PATCH v2 5/5] tcp: Remove mmap_lock fallback path

and found the following issue:
general protection fault in tcp_zerocopy_receive

Full report is available here:
https://ci.syzbot.org/series/3e6d125a-b2ae-49a4-b833-babfb8bc9150

***

general protection fault in tcp_zerocopy_receive

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      c8459ee2fef502d6ef6c063751c33d9ac7943eab
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/e7d66981-7900-4c3d-b992-664ccd13a57e/config
syz repro: https://ci.syzbot.org/findings/59d09544-f280-48fe-8ca9-a2fd8225e9df/syz_repro

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 0 UID: 0 PID: 5876 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:vma_start_read_locked_nested include/linux/mmap_lock.h:240 [inline]
RIP: 0010:vma_start_read_locked+0xa0/0x300 include/linux/mmap_lock.h:257
Code: 28 84 c0 0f 85 2b 02 00 00 44 8b 35 ba 0d 1b 0e 31 ff 44 89 f6 e8 c0 e2 af ff 45 85 f6 74 48 4c 8d 73 10 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 f7 e8 b1 37 1b 00 bf 38 03 00 00 49 03
RSP: 0018:ffffc9000399f4a0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88816d558000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc9000399f538 R08: ffff8881076c64df R09: 1ffff11020ed8c9b
R10: dffffc0000000000 R11: ffffed1020ed8c9c R12: 1ffff92000733e94
R13: dffffc0000000000 R14: 0000000000000010 R15: 0000000000011000
FS:  00007f4866b966c0(0000) GS:ffff88818dc86000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000011000 CR3: 0000000117c8a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 vma_start_read_unlocked+0x3f/0x70 mm/mmap_lock.c:362
 find_tcp_vma net/ipv4/tcp.c:2173 [inline]
 tcp_zerocopy_receive+0x762/0x2200 net/ipv4/tcp.c:2227
 do_tcp_getsockopt+0x2079/0x2940 net/ipv4/tcp.c:4758
 tcp_getsockopt+0x83/0x130 net/ipv4/tcp.c:4856
 do_sock_getsockopt+0x51d/0x7e0 net/socket.c:2487
 __sys_getsockopt net/socket.c:2518 [inline]
 __do_sys_getsockopt net/socket.c:2525 [inline]
 __se_sys_getsockopt net/socket.c:2522 [inline]
 __x64_sys_getsockopt+0x1a4/0x240 net/socket.c:2522
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4865d9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4866b96028 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007f4866015fa0 RCX: 00007f4865d9ce59
RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000003
RBP: 00007f4865e32d6f R08: 0000200000000380 R09: 0000000000000000
R10: 0000200000000340 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f4866016038 R14: 00007f4866015fa0 R15: 00007ffc748de268
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vma_start_read_locked_nested include/linux/mmap_lock.h:240 [inline]
RIP: 0010:vma_start_read_locked+0xa0/0x300 include/linux/mmap_lock.h:257
Code: 28 84 c0 0f 85 2b 02 00 00 44 8b 35 ba 0d 1b 0e 31 ff 44 89 f6 e8 c0 e2 af ff 45 85 f6 74 48 4c 8d 73 10 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 f7 e8 b1 37 1b 00 bf 38 03 00 00 49 03
RSP: 0018:ffffc9000399f4a0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88816d558000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc9000399f538 R08: ffff8881076c64df R09: 1ffff11020ed8c9b
R10: dffffc0000000000 R11: ffffed1020ed8c9c R12: 1ffff92000733e94
R13: dffffc0000000000 R14: 0000000000000010 R15: 0000000000011000
FS:  00007f4866b966c0(0000) GS:ffff88818dc86000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4865dea540 CR3: 0000000117c8a000 CR4: 00000000000006f0
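
One plausible reading of the trace above: RBX is 0 and the access is at offset 0x10, so vma_start_read_locked() appears to have been handed a NULL VMA by vma_start_read_unlocked(), presumably because no VMA covers the address passed to getsockopt(). The body of the helper is not shown in this message, so the following is only a user-space model of the lookup-then-lock pattern with the guard in place. Every name in it (model_vma, model_lookup, model_read_lock, model_lookup_and_lock) is hypothetical and is not a kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long start;	/* first address covered */
	unsigned long end;	/* one past the last address covered */
	int readers;		/* models the per-VMA read lock count */
};

/* Hypothetical lookup: returns NULL when no mapping covers addr. */
static struct model_vma *model_lookup(struct model_vma *vmas, size_t n,
				      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Models taking the read lock. Dereferences vma, so vma must not be NULL. */
static void model_read_lock(struct model_vma *vma)
{
	vma->readers++;
}

/* Lookup followed by lock, with the NULL check between the two steps. */
static struct model_vma *model_lookup_and_lock(struct model_vma *vmas,
					       size_t n, unsigned long addr)
{
	struct model_vma *vma = model_lookup(vmas, n, addr);

	if (!vma)
		return NULL;	/* a caller like find_tcp_vma() can map this to -EINVAL */

	model_read_lock(vma);
	return vma;
}
```

With a single model VMA covering [0x1000, 0x2000), model_lookup_and_lock() returns NULL for 0x11000 without touching any lock state, and returns the VMA with readers == 1 for 0x1800.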
----------------
Code disassembly (best guess):
   0:	28 84 c0 0f 85 2b 02 	sub    %al,0x22b850f(%rax,%rax,8)
   7:	00 00                	add    %al,(%rax)
   9:	44 8b 35 ba 0d 1b 0e 	mov    0xe1b0dba(%rip),%r14d        # 0xe1b0dca
  10:	31 ff                	xor    %edi,%edi
  12:	44 89 f6             	mov    %r14d,%esi
  15:	e8 c0 e2 af ff       	call   0xffafe2da
  1a:	45 85 f6             	test   %r14d,%r14d
  1d:	74 48                	je     0x67
  1f:	4c 8d 73 10          	lea    0x10(%rbx),%r14
  23:	4c 89 f0             	mov    %r14,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 28 00       	cmpb   $0x0,(%rax,%r13,1) <-- trapping instruction
  2f:	74 08                	je     0x39
  31:	4c 89 f7             	mov    %r14,%rdi
  34:	e8 b1 37 1b 00       	call   0x1b37ea
  39:	bf 38 03 00 00       	mov    $0x338,%edi
  3e:	49                   	rex.WB
  3f:	03                   	.byte 0x3


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.


Thread overview: 29+ messages
2026-06-10 23:04 [PATCH v2 0/5] mm: Unconditional per-VMA locks and cleanups Dave Hansen
2026-06-10 23:04 ` [PATCH v2 1/5] mm: Make per-VMA locks available universally Dave Hansen
2026-06-11 19:29   ` Suren Baghdasaryan
2026-06-12 14:09     ` Vlastimil Babka (SUSE)
2026-06-12 14:12   ` Vlastimil Babka (SUSE)
2026-06-10 23:04 ` [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
2026-06-11  7:53   ` Alice Ryhl
2026-06-11 19:59     ` Suren Baghdasaryan
2026-06-12 15:41       ` Vlastimil Babka (SUSE)
2026-06-12 16:01         ` Suren Baghdasaryan
2026-06-12 16:04         ` Dave Hansen
2026-06-12 16:41           ` Suren Baghdasaryan
2026-06-12 16:54             ` Dave Hansen
2026-06-12 17:07               ` Carlos Llamas
2026-06-12 17:44               ` Suren Baghdasaryan
2026-06-12 18:47                 ` Dave Hansen
2026-06-12 19:50                   ` Alice Ryhl
2026-06-10 23:04 ` [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers Dave Hansen
2026-06-10 23:40   ` Dave Hansen
2026-06-11 20:35   ` Suren Baghdasaryan
2026-06-11 21:04     ` Dave Hansen
2026-06-12 18:00   ` Vlastimil Babka (SUSE)
2026-06-10 23:04 ` [PATCH v2 4/5] binder: Remove mmap_lock fallback Dave Hansen
2026-06-11 20:40   ` Suren Baghdasaryan
2026-06-12 18:07   ` Vlastimil Babka (SUSE)
2026-06-10 23:04 ` [PATCH v2 5/5] tcp: Remove mmap_lock fallback path Dave Hansen
2026-06-11 20:44   ` Suren Baghdasaryan
2026-06-12 18:13   ` Vlastimil Babka (SUSE)
2026-06-11 20:24 ` [syzbot ci] Re: mm: Unconditional per-VMA locks and cleanups syzbot ci
