* [PATCH 1/6] mm: Make per-VMA locks available universally
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
@ 2026-04-29 18:19 ` Dave Hansen
2026-04-29 18:19 ` [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:19 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
The per-VMA locks have been around for several years. They've had some
bugs worked out of them and have seen quite wide use. However, they
are still only available when architectures explicitly enable them.
Remove the conditional compilation around the per-VMA locks, making
them available on all architectures and configs.
The approach up to now seemed to be to add ARCH_SUPPORTS_PER_VMA_LOCK
when the architecture started using per-VMA locks in the fault
handler. But, contrary to the naming, the Kconfig option does not
really indicate whether the architecture supports per-VMA locks or
not. It is more of a marker for whether the architecture is likely to
benefit from per-VMA locks.
To me, the most important side-effect of universal availability
is letting per-VMA locks be used in SMP=n configs. This lets us use
per-VMA locking in all x86 code without fallbacks.
Overall, this generally makes the kernel simpler. Just look at
the diffstat. It also opens the door to users that want to use the
per-VMA locks in common code. Doing *that* can bring additional
simplifications.
The downside of this is adding some fields to vm_area_struct and
mm_struct. I suspect there are some very simple ways to implement the
per-VMA locks that don't require any additional fields, especially if
such an approach were limited to SMP=n configs*. For now, do the
simplest thing: use the same implementation everywhere.
* For example, since SMP=n configs don't care much about scalability or
false sharing, there could be a single, global VMA seqcount that is
bumped when any VMA is modified instead of having space in each VMA
for a seqcount.
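A minimal sketch of that idea, purely for illustration (nothing below
exists in this series; the helper names and bodies are made up):

	#include <linux/seqlock.h>

	/* Hypothetical SMP=n-only: one seqcount shared by every VMA. */
	static seqcount_t vma_global_seq = SEQCNT_ZERO(vma_global_seq);

	static void global_vma_start_write(struct vm_area_struct *vma)
	{
		/* Bracket *any* VMA modification with the shared count: */
		raw_write_seqcount_begin(&vma_global_seq);
	}

	static void global_vma_end_write(struct vm_area_struct *vma)
	{
		raw_write_seqcount_end(&vma_global_seq);
	}

	static bool global_vma_read_trybegin(unsigned int *seq)
	{
		/* Readers retry later if any VMA write is in flight: */
		*seq = raw_read_seqcount(&vma_global_seq);
		return !(*seq & 1);
	}

The single shared cacheline would be awful for scalability and false
sharing, but with one CPU there is nobody to share falsely with.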
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/arch/arm/Kconfig | 1
b/arch/arm64/Kconfig | 1
b/arch/loongarch/Kconfig | 1
b/arch/powerpc/platforms/powernv/Kconfig | 1
b/arch/powerpc/platforms/pseries/Kconfig | 1
b/arch/riscv/Kconfig | 1
b/arch/s390/Kconfig | 1
b/arch/x86/Kconfig | 2 -
b/fs/proc/internal.h | 2 -
b/fs/proc/task_mmu.c | 51 -------------------------------
b/include/linux/mm.h | 12 -------
b/include/linux/mm_types.h | 7 ----
b/include/linux/mmap_lock.h | 48 -----------------------------
b/kernel/fork.c | 2 -
b/mm/Kconfig | 13 -------
b/mm/mmap_lock.c | 2 -
16 files changed, 1 insertion(+), 145 deletions(-)
diff -puN arch/arm64/Kconfig~unconditional-vma-locks arch/arm64/Kconfig
--- a/arch/arm64/Kconfig~unconditional-vma-locks 2026-04-29 11:18:47.795519653 -0700
+++ b/arch/arm64/Kconfig 2026-04-29 11:18:49.088569421 -0700
@@ -80,7 +80,6 @@ config ARM64
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
- select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_SCHED_SMT
diff -puN arch/arm/Kconfig~unconditional-vma-locks arch/arm/Kconfig
--- a/arch/arm/Kconfig~unconditional-vma-locks 2026-04-29 11:18:47.915524272 -0700
+++ b/arch/arm/Kconfig 2026-04-29 11:18:49.088569421 -0700
@@ -41,7 +41,6 @@ config ARM
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_CFI
select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
- select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_RT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
diff -puN arch/loongarch/Kconfig~unconditional-vma-locks arch/loongarch/Kconfig
--- a/arch/loongarch/Kconfig~unconditional-vma-locks 2026-04-29 11:18:47.956525850 -0700
+++ b/arch/loongarch/Kconfig 2026-04-29 11:18:49.088569421 -0700
@@ -68,7 +68,6 @@ config LOONGARCH
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
- select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_SCHED_SMT if SMP
select ARCH_SUPPORTS_SCHED_MC if SMP
diff -puN arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks arch/powerpc/platforms/powernv/Kconfig
--- a/arch/powerpc/platforms/powernv/Kconfig~unconditional-vma-locks 2026-04-29 11:18:47.969526350 -0700
+++ b/arch/powerpc/platforms/powernv/Kconfig 2026-04-29 11:18:49.089569460 -0700
@@ -17,7 +17,6 @@ config PPC_POWERNV
select PPC_DOORBELL
select MMU_NOTIFIER
select FORCE_SMP
- select ARCH_SUPPORTS_PER_VMA_LOCK
select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
default y
diff -puN arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks arch/powerpc/platforms/pseries/Kconfig
--- a/arch/powerpc/platforms/pseries/Kconfig~unconditional-vma-locks 2026-04-29 11:18:47.972526466 -0700
+++ b/arch/powerpc/platforms/pseries/Kconfig 2026-04-29 11:18:49.089569460 -0700
@@ -23,7 +23,6 @@ config PPC_PSERIES
select HOTPLUG_CPU
select FORCE_SMP
select SWIOTLB
- select ARCH_SUPPORTS_PER_VMA_LOCK
select PPC_RADIX_BROADCAST_TLBIE if PPC_RADIX_MMU
default y
diff -puN arch/riscv/Kconfig~unconditional-vma-locks arch/riscv/Kconfig
--- a/arch/riscv/Kconfig~unconditional-vma-locks 2026-04-29 11:18:48.060529854 -0700
+++ b/arch/riscv/Kconfig 2026-04-29 11:18:49.089569460 -0700
@@ -70,7 +70,6 @@ config RISCV
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
- select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_SCHED_MC if SMP
diff -puN arch/s390/Kconfig~unconditional-vma-locks arch/s390/Kconfig
--- a/arch/s390/Kconfig~unconditional-vma-locks 2026-04-29 11:18:48.125532357 -0700
+++ b/arch/s390/Kconfig 2026-04-29 11:18:49.089569460 -0700
@@ -153,7 +153,6 @@ config S390
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
- select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_SYM_ANNOTATIONS
diff -puN arch/x86/Kconfig~unconditional-vma-locks arch/x86/Kconfig
--- a/arch/x86/Kconfig~unconditional-vma-locks 2026-04-29 11:18:48.128532472 -0700
+++ b/arch/x86/Kconfig 2026-04-29 11:18:49.090569499 -0700
@@ -27,7 +27,6 @@ config X86_64
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
- select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
@@ -1885,7 +1884,6 @@ config X86_USER_SHADOW_STACK
bool "X86 userspace shadow stack"
depends on AS_WRUSS
depends on X86_64
- depends on PER_VMA_LOCK
select ARCH_USES_HIGH_VMA_FLAGS
select ARCH_HAS_USER_SHADOW_STACK
select X86_CET
diff -puN fs/proc/internal.h~unconditional-vma-locks fs/proc/internal.h
--- a/fs/proc/internal.h~unconditional-vma-locks 2026-04-29 11:18:48.305539283 -0700
+++ b/fs/proc/internal.h 2026-04-29 11:18:49.090569499 -0700
@@ -382,10 +382,8 @@ struct mem_size_stats;
struct proc_maps_locking_ctx {
struct mm_struct *mm;
-#ifdef CONFIG_PER_VMA_LOCK
bool mmap_locked;
struct vm_area_struct *locked_vma;
-#endif
};
struct proc_maps_private {
diff -puN fs/proc/task_mmu.c~unconditional-vma-locks fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~unconditional-vma-locks 2026-04-29 11:18:48.346540861 -0700
+++ b/fs/proc/task_mmu.c 2026-04-29 11:18:49.090569499 -0700
@@ -130,8 +130,6 @@ static void release_task_mempolicy(struc
}
#endif
-#ifdef CONFIG_PER_VMA_LOCK
-
static void reset_lock_ctx(struct proc_maps_locking_ctx *lock_ctx)
{
lock_ctx->locked_vma = NULL;
@@ -213,33 +211,6 @@ static inline bool fallback_to_mmap_lock
return true;
}
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool lock_vma_range(struct seq_file *m,
- struct proc_maps_locking_ctx *lock_ctx)
-{
- return mmap_read_lock_killable(lock_ctx->mm) == 0;
-}
-
-static inline void unlock_vma_range(struct proc_maps_locking_ctx *lock_ctx)
-{
- mmap_read_unlock(lock_ctx->mm);
-}
-
-static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
- loff_t last_pos)
-{
- return vma_next(&priv->iter);
-}
-
-static inline bool fallback_to_mmap_lock(struct proc_maps_private *priv,
- loff_t pos)
-{
- return false;
-}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
static struct vm_area_struct *proc_get_vma(struct seq_file *m, loff_t *ppos)
{
struct proc_maps_private *priv = m->private;
@@ -527,8 +498,6 @@ static int pid_maps_open(struct inode *i
PROCMAP_QUERY_VMA_FLAGS \
)
-#ifdef CONFIG_PER_VMA_LOCK
-
static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
{
reset_lock_ctx(lock_ctx);
@@ -581,26 +550,6 @@ static struct vm_area_struct *query_vma_
return vma;
}
-#else /* CONFIG_PER_VMA_LOCK */
-
-static int query_vma_setup(struct proc_maps_locking_ctx *lock_ctx)
-{
- return mmap_read_lock_killable(lock_ctx->mm);
-}
-
-static void query_vma_teardown(struct proc_maps_locking_ctx *lock_ctx)
-{
- mmap_read_unlock(lock_ctx->mm);
-}
-
-static struct vm_area_struct *query_vma_find_by_addr(struct proc_maps_locking_ctx *lock_ctx,
- unsigned long addr)
-{
- return find_vma(lock_ctx->mm, addr);
-}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
static struct vm_area_struct *query_matching_vma(struct proc_maps_locking_ctx *lock_ctx,
unsigned long addr, u32 flags)
{
diff -puN include/linux/mmap_lock.h~unconditional-vma-locks include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~unconditional-vma-locks 2026-04-29 11:18:48.700554487 -0700
+++ b/include/linux/mmap_lock.h 2026-04-29 11:18:49.091569537 -0700
@@ -76,8 +76,6 @@ static inline void mmap_assert_write_loc
rwsem_assert_held_write(&mm->mmap_lock);
}
-#ifdef CONFIG_PER_VMA_LOCK
-
#ifdef CONFIG_LOCKDEP
#define __vma_lockdep_map(vma) (&vma->vmlock_dep_map)
#else
@@ -484,52 +482,6 @@ struct vm_area_struct *lock_next_vma(str
struct vma_iterator *iter,
unsigned long address);
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
-static inline void mm_lock_seqcount_begin(struct mm_struct *mm) {}
-static inline void mm_lock_seqcount_end(struct mm_struct *mm) {}
-
-static inline bool mmap_lock_speculate_try_begin(struct mm_struct *mm, unsigned int *seq)
-{
- return false;
-}
-
-static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int seq)
-{
- return true;
-}
-static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
-static inline void vma_end_read(struct vm_area_struct *vma) {}
-static inline void vma_start_write(struct vm_area_struct *vma) {}
-static inline __must_check
-int vma_start_write_killable(struct vm_area_struct *vma) { return 0; }
-static inline void vma_assert_write_locked(struct vm_area_struct *vma)
- { mmap_assert_write_locked(vma->vm_mm); }
-static inline void vma_assert_attached(struct vm_area_struct *vma) {}
-static inline void vma_assert_detached(struct vm_area_struct *vma) {}
-static inline void vma_mark_attached(struct vm_area_struct *vma) {}
-static inline void vma_mark_detached(struct vm_area_struct *vma) {}
-
-static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
- unsigned long address)
-{
- return NULL;
-}
-
-static inline void vma_assert_locked(struct vm_area_struct *vma)
-{
- mmap_assert_locked(vma->vm_mm);
-}
-
-static inline void vma_assert_stabilised(struct vm_area_struct *vma)
-{
- /* If no VMA locks, then either mmap lock suffices to stabilise. */
- mmap_assert_locked(vma->vm_mm);
-}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
static inline void mmap_write_lock(struct mm_struct *mm)
{
__mmap_lock_trace_start_locking(mm, true);
diff -puN include/linux/mm.h~unconditional-vma-locks include/linux/mm.h
--- a/include/linux/mm.h~unconditional-vma-locks 2026-04-29 11:18:48.714555026 -0700
+++ b/include/linux/mm.h 2026-04-29 11:18:49.091569537 -0700
@@ -890,7 +890,6 @@ static inline void vma_numab_state_free(
* These must be here rather than mmap_lock.h as dependent on vm_fault type,
* declared in this header.
*/
-#ifdef CONFIG_PER_VMA_LOCK
static inline void release_fault_lock(struct vm_fault *vmf)
{
if (vmf->flags & FAULT_FLAG_VMA_LOCK)
@@ -906,17 +905,6 @@ static inline void assert_fault_locked(c
else
mmap_assert_locked(vmf->vma->vm_mm);
}
-#else
-static inline void release_fault_lock(struct vm_fault *vmf)
-{
- mmap_read_unlock(vmf->vma->vm_mm);
-}
-
-static inline void assert_fault_locked(const struct vm_fault *vmf)
-{
- mmap_assert_locked(vmf->vma->vm_mm);
-}
-#endif /* CONFIG_PER_VMA_LOCK */
static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
{
diff -puN include/linux/mm_types.h~unconditional-vma-locks include/linux/mm_types.h
--- a/include/linux/mm_types.h~unconditional-vma-locks 2026-04-29 11:18:48.761556836 -0700
+++ b/include/linux/mm_types.h 2026-04-29 11:18:49.092569576 -0700
@@ -959,7 +959,6 @@ struct vm_area_struct {
vma_flags_t flags;
};
-#ifdef CONFIG_PER_VMA_LOCK
/*
* Can only be written (using WRITE_ONCE()) while holding both:
* - mmap_lock (in write mode)
@@ -975,7 +974,7 @@ struct vm_area_struct {
* slowpath.
*/
unsigned int vm_lock_seq;
-#endif
+
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
@@ -1007,7 +1006,6 @@ struct vm_area_struct {
#ifdef CONFIG_NUMA_BALANCING
struct vma_numab_state *numab_state; /* NUMA Balancing state */
#endif
-#ifdef CONFIG_PER_VMA_LOCK
/*
* Used to keep track of firstly, whether the VMA is attached, secondly,
* if attached, how many read locks are taken, and thirdly, if the
@@ -1050,7 +1048,6 @@ struct vm_area_struct {
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map vmlock_dep_map;
#endif
-#endif
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
@@ -1249,7 +1246,6 @@ struct mm_struct {
* init_mm.mmlist, and are protected
* by mmlist_lock
*/
-#ifdef CONFIG_PER_VMA_LOCK
struct rcuwait vma_writer_wait;
/*
* This field has lock-like semantics, meaning it is sometimes
@@ -1269,7 +1265,6 @@ struct mm_struct {
* mmap_lock.
*/
seqcount_t mm_lock_seq;
-#endif
#ifdef CONFIG_FUTEX_PRIVATE_HASH
struct mutex futex_hash_lock;
struct futex_private_hash __rcu *futex_phash;
diff -puN kernel/fork.c~unconditional-vma-locks kernel/fork.c
--- a/kernel/fork.c~unconditional-vma-locks 2026-04-29 11:18:48.774557336 -0700
+++ b/kernel/fork.c 2026-04-29 11:18:49.092569576 -0700
@@ -1067,9 +1067,7 @@ static void mmap_init_lock(struct mm_str
{
init_rwsem(&mm->mmap_lock);
mm_lock_seqcount_init(mm);
-#ifdef CONFIG_PER_VMA_LOCK
rcuwait_init(&mm->vma_writer_wait);
-#endif
}
static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
diff -puN mm/Kconfig~unconditional-vma-locks mm/Kconfig
--- a/mm/Kconfig~unconditional-vma-locks 2026-04-29 11:18:48.838559801 -0700
+++ b/mm/Kconfig 2026-04-29 11:18:49.093569614 -0700
@@ -1394,19 +1394,6 @@ config LRU_GEN_STATS
config LRU_GEN_WALKS_MMU
def_bool y
depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
-# }
-
-config ARCH_SUPPORTS_PER_VMA_LOCK
- def_bool n
-
-config PER_VMA_LOCK
- def_bool y
- depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
- help
- Allow per-vma locking during page fault handling.
-
- This feature allows locking each virtual memory area separately when
- handling page faults instead of taking mmap_lock.
config LOCK_MM_AND_FIND_VMA
bool
diff -puN mm/mmap_lock.c~unconditional-vma-locks mm/mmap_lock.c
--- a/mm/mmap_lock.c~unconditional-vma-locks 2026-04-29 11:18:49.084569267 -0700
+++ b/mm/mmap_lock.c 2026-04-29 11:18:49.093569614 -0700
@@ -44,7 +44,6 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_relea
#endif /* CONFIG_TRACING */
#ifdef CONFIG_MMU
-#ifdef CONFIG_PER_VMA_LOCK
/* State shared across __vma_[start, end]_exclude_readers. */
struct vma_exclude_readers_state {
@@ -431,7 +430,6 @@ fallback:
return vma;
}
-#endif /* CONFIG_PER_VMA_LOCK */
#ifdef CONFIG_LOCK_MM_AND_FIND_VMA
#include <linux/extable.h>
_
* [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
2026-04-29 18:19 ` [PATCH 1/6] mm: Make per-VMA locks available universally Dave Hansen
@ 2026-04-29 18:19 ` Dave Hansen
2026-04-29 18:19 ` [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers Dave Hansen
` (6 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:19 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().
Long Version:
== Background ==
Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but but left mmap_read_trylock() as a
fallback.
This was presumably because the per-VMA locking can fail for several
reasons and most (all?) lock_vma_under_rcu() callers have a fallback
to mmap_read_trylock().
== Problem ==
The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().
The only remedy for a collision with mmap_write_lock() is to wait,
which this code can not do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is really
likely to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.
== Solution ==
Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.
Full disclosure: I originally tried to do this with
lock_vma_under_rcu_wait(), but it did not fit well with the mmap_lock
trylock semantics. Claude caught this in a review and suggested the
approach in this path. It seemed sane to me. So, Suggesed-by: Claude,
I guess.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/drivers/android/binder_alloc.c | 22 +++++-----------------
1 file changed, 5 insertions(+), 17 deletions(-)
diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock 2026-04-29 11:18:50.066607065 -0700
+++ b/drivers/android/binder_alloc.c 2026-04-29 11:18:50.069607180 -0700
@@ -1142,7 +1142,6 @@ enum lru_status binder_alloc_free_page(s
struct vm_area_struct *vma;
struct page *page_to_free;
unsigned long page_addr;
- int mm_locked = 0;
size_t index;
if (!mmget_not_zero(mm))
@@ -1151,15 +1150,10 @@ enum lru_status binder_alloc_free_page(s
index = mdata->page_index;
page_addr = alloc->vm_start + index * PAGE_SIZE;
- /* attempt per-vma lock first */
+ /* attempt per-vma lock */
vma = lock_vma_under_rcu(mm, page_addr);
- if (!vma) {
- /* fall back to mmap_lock */
- if (!mmap_read_trylock(mm))
- goto err_mmap_read_lock_failed;
- mm_locked = 1;
- vma = vma_lookup(mm, page_addr);
- }
+ if (!vma)
+ goto err_mmap_read_lock_failed;
if (!mutex_trylock(&alloc->mutex))
goto err_get_alloc_mutex_failed;
@@ -1191,10 +1185,7 @@ enum lru_status binder_alloc_free_page(s
}
mutex_unlock(&alloc->mutex);
- if (mm_locked)
- mmap_read_unlock(mm);
- else
- vma_end_read(vma);
+ vma_end_read(vma);
mmput_async(mm);
binder_free_page(page_to_free);
@@ -1203,10 +1194,7 @@ enum lru_status binder_alloc_free_page(s
err_invalid_vma:
mutex_unlock(&alloc->mutex);
err_get_alloc_mutex_failed:
- if (mm_locked)
- mmap_read_unlock(mm);
- else
- vma_end_read(vma);
+ vma_end_read(vma);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
_
* [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
2026-04-29 18:19 ` [PATCH 1/6] mm: Make per-VMA locks available universally Dave Hansen
2026-04-29 18:19 ` [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
@ 2026-04-29 18:19 ` Dave Hansen
2026-04-29 18:20 ` [PATCH 4/6] binder: Remove mmap_lock fallback Dave Hansen
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:19 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
== Background ==
There are basically two parallel ways to look up a VMA: the
traditional way, which is protected by mmap_lock, and the RCU-based
per-VMA lock way which is based on RCU and refcounts.
== Problems ==
The mmap_lock one is more straightforward to use but it has a big
disadvantage in that it can not be mixed with page faults since those
can take mmap_lock for read.
The RCU one can be mixed with faults, but it is not available in all
configs, so all RCU users need to be able to fall back to the
traditional way.
== Solution ==
Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but it also takes
mmap_lock for read and waits for writers to finish before returning
the VMA. This has two big advantages:
1. Callers do not need to have a fallback path for when they
collide with writers.
2. It can be used in contexts where page faults can happen because
it can take the mmap_lock for read but never *holds* it.
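To make the calling convention concrete, here is a rough sketch of a
converted caller (the function and its -EINVAL/-EACCES choices are made
up; patches 4-6 in this series do the real conversions):

	static int frob_one_vma(struct mm_struct *mm, unsigned long addr)
	{
		struct vm_area_struct *vma;
		int ret = 0;

		/* Waits out any writer; NULL only if no VMA covers 'addr': */
		vma = lock_vma_under_rcu_wait(mm, addr);
		if (!vma)
			return -EINVAL;

		if (!(vma->vm_flags & VM_READ))
			ret = -EACCES;

		vma_end_read(vma);
		return ret;
	}

Note the absence of any mmap_lock fallback path at the call site.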
== Discussion ==
I am not married to the naming here at all. Naming suggestions would
be much appreciated.
This basically uses mmap_lock to wait for writers, nothing else. The
VMA is obviously stable under mmap_read_lock() and the code _can_
likely take advantage of that and possibly even remove the goto. For
instance, it could (probably) bump the VMA refcount and exclude future
writers. That would eliminate the goto.
But the approach as-is is probably the smallest line count and
arguably the simplest approach. It is a good place to start a
conversation if nothing else.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/include/linux/mmap_lock.h | 2 ++
b/mm/mmap_lock.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait 2026-04-29 11:18:50.633628887 -0700
+++ b/include/linux/mmap_lock.h 2026-04-29 11:18:50.707631737 -0700
@@ -470,6 +470,8 @@ static inline void vma_mark_detached(str
struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
unsigned long address);
+struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
+ unsigned long address);
/*
* Locks next vma pointed by the iterator. Confirms the locked vma has not
diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
--- a/mm/mmap_lock.c~lock-vma-under-rcu-wait 2026-04-29 11:18:50.704631622 -0700
+++ b/mm/mmap_lock.c 2026-04-29 11:18:50.707631737 -0700
@@ -340,6 +340,49 @@ inval:
return NULL;
}
+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for writers to
+ * finish if the VMA is being modified. Returns NULL if there is no VMA covering
+ * 'address'.
+ *
+ * The fast path does not take mmap lock.
+ */
+struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
+ unsigned long address)
+{
+ struct vm_area_struct *vma;
+
+retry:
+ vma = lock_vma_under_rcu(mm, address);
+ /* Fast path: return stable VMA covering 'address': */
+ if (vma)
+ return vma;
+
+ /*
+ * Slow path: the VMA covering 'address' is being modified,
+ * or there is no VMA covering 'address'. Rule out the
+ * possibility that the VMA is being modified:
+ */
+ mmap_read_lock(mm);
+ vma = vma_lookup(mm, address);
+ mmap_read_unlock(mm);
+
+ /* There was for sure no VMA covering 'address': */
+ if (!vma)
+ return NULL;
+
+ /*
+ * VMA was likely being modified during RCU lookup. Try again.
+ * mmap_read_lock() waited for the writer to complete and the
+ * writer is now done.
+ *
+ * There is no guarantee that any single retry will succeed,
+ * and it is possible but highly unlikely this will loop
+ * forever.
+ */
+ goto retry;
+}
+
static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
struct vma_iterator *vmi,
unsigned long from_addr)
_
* [PATCH 4/6] binder: Remove mmap_lock fallback
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (2 preceding siblings ...)
2026-04-29 18:19 ` [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers Dave Hansen
@ 2026-04-29 18:20 ` Dave Hansen
2026-04-29 18:20 ` [PATCH 5/6] tcp: Remove mmap_lock fallback path Dave Hansen
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:20 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
Previously, the per-VMA locking could fail in the face of writers
which necessitated a fallback to mmap_lock. The new
lock_vma_under_rcu_wait() will wait for writers instead of failing.
Use the new helper. Wait for writers. Remove the fallback to mmap_lock.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/drivers/android/binder_alloc.c | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)
diff -puN drivers/android/binder_alloc.c~binder-vma-waiter drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-vma-waiter 2026-04-29 11:18:51.307654829 -0700
+++ b/drivers/android/binder_alloc.c 2026-04-29 11:18:51.310654944 -0700
@@ -259,21 +259,14 @@ static int binder_page_insert(struct bin
struct vm_area_struct *vma;
int ret = -ESRCH;
- /* attempt per-vma lock first */
- vma = lock_vma_under_rcu(mm, addr);
- if (vma) {
- if (binder_alloc_is_mapped(alloc))
- ret = vm_insert_page(vma, addr, page);
- vma_end_read(vma);
+ vma = lock_vma_under_rcu_wait(mm, addr);
+ if (!vma)
return ret;
- }
- /* fall back to mmap_lock */
- mmap_read_lock(mm);
- vma = vma_lookup(mm, addr);
- if (vma && binder_alloc_is_mapped(alloc))
+ if (binder_alloc_is_mapped(alloc))
ret = vm_insert_page(vma, addr, page);
- mmap_read_unlock(mm);
+
+ vma_end_read(vma);
return ret;
}
_
* [PATCH 5/6] tcp: Remove mmap_lock fallback path
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (3 preceding siblings ...)
2026-04-29 18:20 ` [PATCH 4/6] binder: Remove mmap_lock fallback Dave Hansen
@ 2026-04-29 18:20 ` Dave Hansen
2026-04-29 18:20 ` [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path Dave Hansen
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:20 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
Previously, the per-VMA locking could fail in the face of writers
which necessitated a fallback to mmap_lock. The new
lock_vma_under_rcu_wait() will wait for writers instead of failing.
Use the new helper. Wait for writers. Remove the fallback to mmap_lock.
This really is a nice cleanup. It removes the need to pass the lock
state back and forth to find_tcp_vma().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/net/ipv4/tcp.c | 31 +++++++++----------------------
1 file changed, 9 insertions(+), 22 deletions(-)
diff -puN net/ipv4/tcp.c~ipv4-tcp-vma-waiter net/ipv4/tcp.c
--- a/net/ipv4/tcp.c~ipv4-tcp-vma-waiter 2026-04-29 11:18:51.870676498 -0700
+++ b/net/ipv4/tcp.c 2026-04-29 11:18:51.874676652 -0700
@@ -2171,27 +2171,18 @@ static void tcp_zc_finalize_rx_tstamp(st
}
static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
- unsigned long address,
- bool *mmap_locked)
+ unsigned long address)
{
- struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
+ struct vm_area_struct *vma = lock_vma_under_rcu_wait(mm, address);
- if (vma) {
- if (vma->vm_ops != &tcp_vm_ops) {
- vma_end_read(vma);
- return NULL;
- }
- *mmap_locked = false;
- return vma;
- }
+ if (!vma)
+ return NULL;
- mmap_read_lock(mm);
- vma = vma_lookup(mm, address);
- if (!vma || vma->vm_ops != &tcp_vm_ops) {
- mmap_read_unlock(mm);
+ if (vma->vm_ops != &tcp_vm_ops) {
+ vma_end_read(vma);
return NULL;
}
- *mmap_locked = true;
+
return vma;
}
@@ -2212,7 +2203,6 @@ static int tcp_zerocopy_receive(struct s
u32 seq = tp->copied_seq;
u32 total_bytes_to_map;
int inq = tcp_inq(sk);
- bool mmap_locked;
int ret;
zc->copybuf_len = 0;
@@ -2237,7 +2227,7 @@ static int tcp_zerocopy_receive(struct s
return 0;
}
- vma = find_tcp_vma(current->mm, address, &mmap_locked);
+ vma = find_tcp_vma(current->mm, address);
if (!vma)
return -EINVAL;
@@ -2319,10 +2309,7 @@ static int tcp_zerocopy_receive(struct s
zc, total_bytes_to_map);
}
out:
- if (mmap_locked)
- mmap_read_unlock(current->mm);
- else
- vma_end_read(vma);
+ vma_end_read(vma);
/* Try to copy straggler data. */
if (!ret)
copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
_
* [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (4 preceding siblings ...)
2026-04-29 18:20 ` [PATCH 5/6] tcp: Remove mmap_lock fallback path Dave Hansen
@ 2026-04-29 18:20 ` Dave Hansen
2026-05-04 23:15 ` Edgecombe, Rick P
2026-04-29 18:22 ` [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (2 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:20 UTC (permalink / raw)
To: linux-kernel
Cc: Dave Hansen, Andrew Morton, Liam R. Howlett, linux-mm,
Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan,
Vlastimil Babka
From: Dave Hansen <dave.hansen@linux.intel.com>
The shadow stack code needs to look at the VMA from which it is
reading a userspace "token" to ensure that the memory is shadow stack
memory. If it did not do this, it might read the token from
non-shadow-stack memory, which could result in a control flow hijack.
But that lookup requires two things:
* Looking at a VMA, which must be locked
* Touching userspace
That's a bit of a pain because mmap_lock can not be held while
touching userspace. So the code has to drop the lock, touch userspace,
then re-acquire the lock and check if the VMA might have changed.
The current implementation does this with a combination of holding
mmap_lock and looping if the VMA might have changed. It works great.
But the lock_vma_under_rcu_wait() API is a little simpler and also
does not use mmap_lock in its fast path.
Switch to lock_vma_under_rcu_wait().
BTW, this does swap in a mmap_read_lock() for
mmap_read_lock_killable(). That obviously isn't ideal, but it's
trivially fixable with another variant of the helper. I'd appreciate
if we could handwave that away for the moment. :)
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
---
b/arch/x86/kernel/shstk.c | 47 ++++++++++++++++------------------------------
1 file changed, 17 insertions(+), 30 deletions(-)
diff -puN arch/x86/kernel/shstk.c~shstk-pop-rcu arch/x86/kernel/shstk.c
--- a/arch/x86/kernel/shstk.c~shstk-pop-rcu 2026-04-29 11:18:52.425697858 -0700
+++ b/arch/x86/kernel/shstk.c 2026-04-29 11:18:52.428697973 -0700
@@ -326,8 +326,9 @@ static int shstk_push_sigframe(unsigned
static int shstk_pop_sigframe(unsigned long *ssp)
{
+ struct vm_area_struct *vma;
unsigned long token_addr;
- unsigned int seq;
+ int err;
/*
* It is possible for the SSP to be off the end of a shadow stack by 4
@@ -338,35 +339,21 @@ static int shstk_pop_sigframe(unsigned l
if (!IS_ALIGNED(*ssp, 8))
return -EINVAL;
- do {
- struct vm_area_struct *vma;
- bool valid_vma;
- int err;
-
- if (mmap_read_lock_killable(current->mm))
- return -EINTR;
-
- vma = find_vma(current->mm, *ssp);
- valid_vma = vma && (vma->vm_flags & VM_SHADOW_STACK);
-
- /*
- * VMAs can change between get_shstk_data() and find_vma().
- * Watch for changes and ensure that 'token_addr' comes from
- * 'vma' by recording a seqcount.
- *
- * Ignore the return value of mmap_lock_speculate_try_begin()
- * because the mmap lock excludes the possibility of writers.
- */
- mmap_lock_speculate_try_begin(current->mm, &seq);
- mmap_read_unlock(current->mm);
-
- if (!valid_vma)
- return -EINVAL;
-
- err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp);
- if (err)
- return err;
- } while (mmap_lock_speculate_retry(current->mm, seq));
+ vma = lock_vma_under_rcu_wait(current->mm, *ssp);
+ if (!vma)
+ return -EINVAL;
+
+ if (!(vma->vm_flags & VM_SHADOW_STACK)) {
+ vma_end_read(vma);
+ return -EINVAL;
+ }
+
+ err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp);
+
+ vma_end_read(vma);
+
+ if (err)
+ return err;
/* Restore SSP aligned? */
if (unlikely(!IS_ALIGNED(token_addr, 8)))
_
* Re: [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path
2026-04-29 18:20 ` [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path Dave Hansen
@ 2026-05-04 23:15 ` Edgecombe, Rick P
0 siblings, 0 replies; 15+ messages in thread
From: Edgecombe, Rick P @ 2026-05-04 23:15 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com
Cc: Liam.Howlett@oracle.com, linux-mm@kvack.org, ljs@kernel.org,
surenb@google.com, vbabka@kernel.org, shakeel.butt@linux.dev,
akpm@linux-foundation.org
On Wed, 2026-04-29 at 11:20 -0700, Dave Hansen wrote:
> + vma = lock_vma_under_rcu_wait(current->mm, *ssp);
> + if (!vma)
> + return -EINVAL;
> +
> + if (!(vma->vm_flags & VM_SHADOW_STACK)) {
> + vma_end_read(vma);
> + return -EINVAL;
> + }
> +
> + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp);
Unfortunately, I think it won't work for the shadow stack case with the user
access. I get this splat from the shadow stack selftests:
======================================================
WARNING: possible circular locking dependency detected
7.1.0-rc1+ #2936 Not tainted
------------------------------------------------------
test_shadow_sta/930 is trying to acquire lock:
ff32a05fbc6a1008 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x3c/0x80
but task is already holding lock:
ff32a05f4caf3c48 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xaf/0x2e0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (vm_lock){++++}-{0:0}:
lock_acquire+0xbd/0x2f0
__vma_start_exclude_readers+0x8d/0x1e0
__vma_start_write+0x56/0xe0
vma_expand+0x7e/0x390
relocate_vma_down+0x126/0x220
setup_arg_pages+0x269/0x430
load_elf_binary+0x3d1/0x1840
bprm_execve+0x2cf/0x730
kernel_execve+0xf6/0x160
kernel_init+0xb9/0x1c0
ret_from_fork+0x2eb/0x340
ret_from_fork_asm+0x1a/0x30
-> #0 (&mm->mmap_lock){++++}-{4:4}:
check_prev_add+0xf1/0xd00
__lock_acquire+0x14a8/0x1ac0
lock_acquire+0xbd/0x2f0
__might_fault+0x5b/0x80
restore_signal_shadow_stack+0xd6/0x270
__do_sys_rt_sigreturn+0xdf/0xf0
do_syscall_64+0x11c/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(vm_lock);
lock(&mm->mmap_lock);
lock(vm_lock);
rlock(&mm->mmap_lock);
*** DEADLOCK ***
1 lock held by test_shadow_sta/930:
#0: ff32a05f4caf3c48 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xaf/0x2e0
stack backtrace:
CPU: 18 UID: 0 PID: 930 Comm: test_shadow_sta Not tainted 7.1.0-rc1+ #2936
PREEMPT(full)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_circular_bug+0x2ca/0x400
check_noncircular+0x12f/0x150
? __lock_acquire+0x49c/0x1ac0
check_prev_add+0xf1/0xd00
? reacquire_held_locks+0xe4/0x200
__lock_acquire+0x14a8/0x1ac0
lock_acquire+0xbd/0x2f0
? __might_fault+0x3c/0x80
? lock_is_held_type+0xa0/0x120
? __might_fault+0x3c/0x80
__might_fault+0x5b/0x80
? __might_fault+0x3c/0x80
restore_signal_shadow_stack+0xd6/0x270
__do_sys_rt_sigreturn+0xdf/0xf0
do_syscall_64+0x11c/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x40212f
Code: 61 00 00 e8 73 f1 ff ff 48 8b 05 4c 61 00 00 31 d2 48 0f 38 f6 10 48 8b
44 24 08 64 48 2b 08
RSP: 002b:00007ffc286fb208 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007ff628b187b0
RDX: 0000000000000000 RSI: 00000000066492a0 RDI: 0000000000000000
RBP: 00007ffc286fb360 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000001 R14: 00007ff628b6c000 R15: 0000000000406e18
I guess the problem is the lock ordering. Not sure if there are any slow path
avoidance details that could make this splat a false positive. But how about
this simpler munmap() case:
Shadow stack signal                        munmap()
-------------------                        --------
vma_start_read()
(VM_SHADOW_STACK check)
                                           mmap_write_lock()
mmap_read_lock() (user fault) <- deadlock
                                           vma_start_write() <- deadlock
> +
> + vma_end_read(vma);
> +
> + if (err)
> + return err;
>
> /* Restore SSP aligned? */
> if (unlikely(!IS_ALIGNED(token_addr, 8)))
* Re: [PATCH 0/6] mm: Make per-VMA locks available in all builds
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (5 preceding siblings ...)
2026-04-29 18:20 ` [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path Dave Hansen
@ 2026-04-29 18:22 ` Dave Hansen
2026-04-30 8:11 ` Lorenzo Stoakes
2026-04-30 7:55 ` [syzbot ci] " syzbot ci
[not found] ` <20260430072053.e0be1b431bcff02831f07e9d@linux-foundation.org>
8 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2026-04-29 18:22 UTC (permalink / raw)
To: Dave Hansen, linux-kernel
Cc: Andrew Morton, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
Shakeel Butt, Suren Baghdasaryan, Vlastimil Babka
BTW, this is *ENTIRELY* an [RFC]. It's just not tagged properly.
* Re: [PATCH 0/6] mm: Make per-VMA locks available in all builds
2026-04-29 18:22 ` [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
@ 2026-04-30 8:11 ` Lorenzo Stoakes
2026-04-30 17:17 ` Suren Baghdasaryan
0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2026-04-30 8:11 UTC (permalink / raw)
To: Dave Hansen
Cc: Dave Hansen, linux-kernel, Andrew Morton, Liam R. Howlett,
linux-mm, Shakeel Butt, Suren Baghdasaryan, Vlastimil Babka
On Wed, Apr 29, 2026 at 11:22:28AM -0700, Dave Hansen wrote:
> BTW, this is *ENTIRELY* an [RFC]. It's just not tagged properly.
Was going to say :)
Not going to be able to get to this until after LSF... :) Likely the same for
Suren also.
Cheers, Lorenzo
* Re: [PATCH 0/6] mm: Make per-VMA locks available in all builds
2026-04-30 8:11 ` Lorenzo Stoakes
@ 2026-04-30 17:17 ` Suren Baghdasaryan
2026-04-30 17:20 ` Dave Hansen
0 siblings, 1 reply; 15+ messages in thread
From: Suren Baghdasaryan @ 2026-04-30 17:17 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Dave Hansen, Dave Hansen, linux-kernel, Andrew Morton,
Liam R. Howlett, linux-mm, Shakeel Butt, Vlastimil Babka
On Thu, Apr 30, 2026 at 1:11 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Wed, Apr 29, 2026 at 11:22:28AM -0700, Dave Hansen wrote:
> > BTW, this is *ENTIRELY* an [RFC]. It's just not tagged properly.
>
> Was going to say :)
>
> Not going to be able to get to this until after LSF... :) Likely the same for
> Suren also.
Yeah, sorry. Trying to wrap up all the urgent stuff before the trip. I
might be able to review the patches later this week before the
conference starts, but can't promise.
Thanks,
Suren.
>
> Cheers, Lorenzo
* Re: [PATCH 0/6] mm: Make per-VMA locks available in all builds
2026-04-30 17:17 ` Suren Baghdasaryan
@ 2026-04-30 17:20 ` Dave Hansen
0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-30 17:20 UTC (permalink / raw)
To: Suren Baghdasaryan, Lorenzo Stoakes
Cc: Dave Hansen, linux-kernel, Andrew Morton, Liam R. Howlett,
linux-mm, Shakeel Butt, Vlastimil Babka
On 4/30/26 10:17, Suren Baghdasaryan wrote:
> On Thu, Apr 30, 2026 at 1:11 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
>> On Wed, Apr 29, 2026 at 11:22:28AM -0700, Dave Hansen wrote:
>>> BTW, this is *ENTIRELY* an [RFC]. It's just not tagged properly.
>> Was going to say 🙂
>>
>> Not going to be able to get to this until after LSF... 🙂 Likely the same for
>> Suren also.
> Yeah, sorry. Trying to wrap up all the urgent stuff before the trip. I
> might be able to review the patches later this week before the
> conference starts, but can't promise.
Seriously, don't worry about rushing this. After the conference is
perfectly fine with me. There are a few things that Sashiko complained
about that need to get fixed up anyway.
* [syzbot ci] Re: mm: Make per-VMA locks available in all builds
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
` (6 preceding siblings ...)
2026-04-29 18:22 ` [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
@ 2026-04-30 7:55 ` syzbot ci
2026-04-30 16:59 ` Dave Hansen
[not found] ` <20260430072053.e0be1b431bcff02831f07e9d@linux-foundation.org>
8 siblings, 1 reply; 15+ messages in thread
From: syzbot ci @ 2026-04-30 7:55 UTC (permalink / raw)
To: akpm, dave.hansen, liam.howlett, linux-kernel, linux-mm, ljs,
shakeel.butt, surenb, vbabka
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] mm: Make per-VMA locks available in all builds
https://lore.kernel.org/all/20260429181954.F50224AE@davehans-spike.ostc.intel.com
* [PATCH 1/6] mm: Make per-VMA locks available universally
* [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock
* [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers
* [PATCH 4/6] binder: Remove mmap_lock fallback
* [PATCH 5/6] tcp: Remove mmap_lock fallback path
* [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path
and found the following issue:
WARNING in mbind_range
Full report is available here:
https://ci.syzbot.org/series/374f338e-4b3b-4645-871c-78964f944bbd
***
WARNING in mbind_range
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: 57b8e2d666a31fa201432d58f5fe3469a0dd83ba
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/786ac31f-0f5e-4ceb-88c7-45f4bee79d60/config
syz repro: https://ci.syzbot.org/findings/4b270176-3e48-4ac7-8ddb-f326d6883d93/syz_repro
pgoff 200000000 file 0000000000000000 private_data 0000000000000000
flags: 0x8100077(read|write|exec|mayread|maywrite|mayexec|account|softdirty)
------------[ cut here ]------------
1
WARNING: ./include/linux/mmap_lock.h:332 at vma_assert_write_locked include/linux/mmap_lock.h:332 [inline], CPU#0: syz.2.19/5876
WARNING: ./include/linux/mmap_lock.h:332 at vma_replace_policy mm/mempolicy.c:1016 [inline], CPU#0: syz.2.19/5876
WARNING: ./include/linux/mmap_lock.h:332 at mbind_range+0x57a/0x810 mm/mempolicy.c:1063, CPU#0: syz.2.19/5876
Modules linked in:
CPU: 0 UID: 0 PID: 5876 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:vma_assert_write_locked include/linux/mmap_lock.h:332 [inline]
RIP: 0010:vma_replace_policy mm/mempolicy.c:1016 [inline]
RIP: 0010:mbind_range+0x57a/0x810 mm/mempolicy.c:1063
Code: 97 ff e9 b2 fb ff ff e8 a4 4a 97 ff 90 0f 0b 90 e9 14 fe ff ff e8 96 4a 97 ff 4c 89 ff e8 0e b5 f9 fe c6 05 9c 2e ec 0d 01 90 <0f> 0b 90 4d 85 e4 0f 85 48 fe ff ff e8 75 4a 97 ff 31 db 4d 8d 77
RSP: 0018:ffffc90003bc7c78 EFLAGS: 00010292
RAX: 000000000000011f RBX: 000000000000000b RCX: a410c902b7e34800
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000009 R08: ffffc90003bc7967 R09: 1ffff92000778f2c
R10: dffffc0000000000 R11: fffff52000778f2d R12: ffff8881012c9e00
R13: dffffc0000000000 R14: ffff888115b99bf8 R15: ffff8881162fa300
FS: 00007fa09187c6c0(0000) GS:ffff88818dc93000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b33b63fff CR3: 0000000115bda000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_mbind mm/mempolicy.c:1560 [inline]
kernel_mbind mm/mempolicy.c:1757 [inline]
__do_sys_mbind mm/mempolicy.c:1831 [inline]
__se_sys_mbind+0xad4/0x10f0 mm/mempolicy.c:1827
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa09099cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa09187c028 EFLAGS: 00000246 ORIG_RAX: 00000000000000ed
RAX: ffffffffffffffda RBX: 00007fa090c15fa0 RCX: 00007fa09099cdd9
RDX: 0000000000000001 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007fa090a32d69 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa090c16038 R14: 00007fa090c15fa0 R15: 00007ffff596d0f8
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
* Re: [syzbot ci] Re: mm: Make per-VMA locks available in all builds
2026-04-30 7:55 ` [syzbot ci] " syzbot ci
@ 2026-04-30 16:59 ` Dave Hansen
0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-30 16:59 UTC (permalink / raw)
To: syzbot ci, akpm, dave.hansen, liam.howlett, linux-kernel,
linux-mm, ljs, shakeel.butt, surenb, vbabka
Cc: syzbot, syzkaller-bugs
On 4/30/26 00:55, syzbot ci wrote:
> and found the following issue:
> WARNING in mbind_range
>
> Full report is available here:
> https://ci.syzbot.org/series/374f338e-4b3b-4645-871c-78964f944bbd
>
> ***
>
> WARNING in mbind_range
I left a whole bunch of #ifdef CONFIG_PER_VMA_LOCK cruft around in v1. I
suspect some of the debugging code ended up in a weird state in a config
that I didn't test.
I'll try to reproduce this splat, though. It looks straightforward enough.
[parent not found: <20260430072053.e0be1b431bcff02831f07e9d@linux-foundation.org>]
* Re: [PATCH 0/6] mm: Make per-VMA locks available in all builds
[not found] ` <20260430072053.e0be1b431bcff02831f07e9d@linux-foundation.org>
@ 2026-04-30 16:52 ` Dave Hansen
0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2026-04-30 16:52 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, linux-mm, Lorenzo Stoakes,
Shakeel Butt, Suren Baghdasaryan, Vlastimil Babka, LKML
... adding all the cc's back.
On 4/30/26 07:20, Andrew Morton wrote:
> On Wed, 29 Apr 2026 11:19:54 -0700 Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>> When working on some x86 shadow stack code, it was a real pain to
>> avoid causing recursive locking problems with mmap_lock. One way
>> to avoid those was to avoid mmap_lock and use per-VMA locks instead.
>> They are great, but they are not available in all configs which
>> makes them unusable in generic code, or if you want to completely
>> avoid mmap_lock.
>
> Did you see the AI review?
> https://sashiko.dev/#/patchset/20260429181954.F50224AE@davehans-spike.ostc.intel.com
I just went through it. There was some absolutely valid stuff in there
like a bunch of CONFIG_PER_VMA_LOCK references needing to be cleaned up.
It also made some good points about the binder shrinker sites that I
think I cleaned up for v2.
There are three very valid structural problems that it's concerned about.
First is that lock_vma_under_rcu_wait() doesn't use
mmap_read_lock_killable(). It probably needs to, or at least there would
need to be killable and non-killable variants. That's easy enough to do
if folks agree that this is overall something that should go forward.
I'd prefer to hand wave it away for the moment, though.
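For the record, the killable variant is presumably just the obvious
transformation of the patch 3 helper (a sketch; the name and the
ERR_PTR() convention are my own invention):

	struct vm_area_struct *
	lock_vma_under_rcu_wait_killable(struct mm_struct *mm,
					 unsigned long address)
	{
		struct vm_area_struct *vma;

	retry:
		vma = lock_vma_under_rcu(mm, address);
		if (vma)
			return vma;

		/* Wait for writers, but let fatal signals through: */
		if (mmap_read_lock_killable(mm))
			return ERR_PTR(-EINTR);
		vma = vma_lookup(mm, address);
		mmap_read_unlock(mm);

		if (!vma)
			return NULL;
		goto retry;
	}

Callers would then have to distinguish NULL from ERR_PTR(), which is
part of why I'd rather wave it away for now.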
Second, it brings up concerns about lock_vma_under_rcu_wait() deadlocks
in the face of other per-VMA or mmap_lock holders. This is very valid,
but it's inherent for users of mmap_lock. I think it's just a
documentation issue.
Third, Sashiko is _very_ peeved about the lock_vma_under_rcu_wait() goto
loop. Broadly, it's concerned about fairness and livelocks in the face
of userspace being able to compel 'goto retry' to happen forever. It's a
valid theoretical concern for sure. I'm less convinced that it will be a
problem in practice, and I should probably hack together a torture test
to see how many retries actually happen.
The other way to fix it more robustly would be to acquire the vma
reference under the existing mmap_read_lock(). I _think_ it's just a
couple extra lines of code, but I haven't done the legwork to flesh out
how that would look.
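Very roughly, the slow path would do something like this (untested, and
the vma_start_read() call and its signature here are assumptions):

	mmap_read_lock(mm);
	vma = vma_lookup(mm, address);
	if (vma) {
		/*
		 * Writers need mmap_write_lock() to mark a VMA, so
		 * taking the per-VMA read lock here should not race
		 * with one the way the bare RCU lookup can:
		 */
		vma = vma_start_read(mm, vma);
	}
	mmap_read_unlock(mm);
	return vma;

No goto, no retry, and userspace cannot starve the lookup.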
But the key questions for this series remain:
1. Should/can per-VMA locking be available in all configs?
2. Is *a* lock_vma_under_rcu_wait() implementation feasible and useful?
The implementation in this series is highly imperfect. Is there a
chance of a better one or is it just impossible?