* [PATCH v2 1/2] mm: limit the scope of vma_start_read()
@ 2025-07-31 15:19 Suren Baghdasaryan
2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Suren Baghdasaryan @ 2025-07-31 15:19 UTC (permalink / raw)
To: akpm
Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm,
linux-kernel, surenb
Limit the scope of vma_start_read() as it is used only as a helper for
higher-level locking functions implemented inside mmap_lock.c and we are
about to introduce more complex RCU rules for this function.
The change is pure code refactoring and has no functional changes.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/mmap_lock.h | 85 ---------------------------------------
mm/mmap_lock.c | 85 +++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+), 85 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 11a078de9150..2c9fffa58714 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -147,91 +147,6 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
}
}
-/*
- * Try to read-lock a vma. The function is allowed to occasionally yield false
- * locked result to avoid performance overhead, in which case we fall back to
- * using mmap_lock. The function should never yield false unlocked result.
- * False locked result is possible if mm_lock_seq overflows or if vma gets
- * reused and attached to a different mm before we lock it.
- * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
- * detached.
- *
- * WARNING! The vma passed to this function cannot be used if the function
- * fails to lock it because in certain cases RCU lock is dropped and then
- * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
- */
-static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
- struct vm_area_struct *vma)
-{
- int oldcnt;
-
- /*
- * Check before locking. A race might cause false locked result.
- * We can use READ_ONCE() for the mm_lock_seq here, and don't need
- * ACQUIRE semantics, because this is just a lockless check whose result
- * we don't rely on for anything - the mm_lock_seq read against which we
- * need ordering is below.
- */
- if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
- return NULL;
-
- /*
- * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
- * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
- * Acquire fence is required here to avoid reordering against later
- * vm_lock_seq check and checks inside lock_vma_under_rcu().
- */
- if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
- VMA_REF_LIMIT))) {
- /* return EAGAIN if vma got detached from under us */
- return oldcnt ? NULL : ERR_PTR(-EAGAIN);
- }
-
- rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
-
- /*
- * If vma got attached to another mm from under us, that mm is not
- * stable and can be freed in the narrow window after vma->vm_refcnt
- * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
- * releasing vma->vm_refcnt.
- */
- if (unlikely(vma->vm_mm != mm)) {
- /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
- struct mm_struct *other_mm = vma->vm_mm;
-
- /*
- * __mmdrop() is a heavy operation and we don't need RCU
- * protection here. Release RCU lock during these operations.
- * We reinstate the RCU read lock as the caller expects it to
- * be held when this function returns even on error.
- */
- rcu_read_unlock();
- mmgrab(other_mm);
- vma_refcount_put(vma);
- mmdrop(other_mm);
- rcu_read_lock();
- return NULL;
- }
-
- /*
- * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
- * False unlocked result is impossible because we modify and check
- * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
- * modification invalidates all existing locks.
- *
- * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
- * racing with vma_end_write_all(), we only start reading from the VMA
- * after it has been unlocked.
- * This pairs with RELEASE semantics in vma_end_write_all().
- */
- if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
- vma_refcount_put(vma);
- return NULL;
- }
-
- return vma;
-}
-
/*
* Use only while holding mmap read lock which guarantees that locking will not
* fail (nobody can concurrently write-lock the vma). vma_start_read() should
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index b006cec8e6fe..10826f347a9f 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -127,6 +127,91 @@ void vma_mark_detached(struct vm_area_struct *vma)
}
}
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ * False locked result is possible if mm_lock_seq overflows or if vma gets
+ * reused and attached to a different mm before we lock it.
+ * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
+ * detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
+ */
+static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
+ struct vm_area_struct *vma)
+{
+ int oldcnt;
+
+ /*
+ * Check before locking. A race might cause false locked result.
+ * We can use READ_ONCE() for the mm_lock_seq here, and don't need
+ * ACQUIRE semantics, because this is just a lockless check whose result
+ * we don't rely on for anything - the mm_lock_seq read against which we
+ * need ordering is below.
+ */
+ if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
+ return NULL;
+
+ /*
+ * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
+ * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
+ * Acquire fence is required here to avoid reordering against later
+ * vm_lock_seq check and checks inside lock_vma_under_rcu().
+ */
+ if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
+ VMA_REF_LIMIT))) {
+ /* return EAGAIN if vma got detached from under us */
+ return oldcnt ? NULL : ERR_PTR(-EAGAIN);
+ }
+
+ rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
+ /*
+ * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
+ * False unlocked result is impossible because we modify and check
+ * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
+ * modification invalidates all existing locks.
+ *
+ * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
+ * racing with vma_end_write_all(), we only start reading from the VMA
+ * after it has been unlocked.
+ * This pairs with RELEASE semantics in vma_end_write_all().
+ */
+ if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
+ vma_refcount_put(vma);
+ return NULL;
+ }
+
+ return vma;
+}
+
/*
* Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
* stable and not isolated. If the VMA is not found or is being modified the
base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302
--
2.50.1.552.g942d659e1b-goog
* [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure
2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
@ 2025-07-31 15:19 ` Suren Baghdasaryan
2025-08-01 9:09 ` Vlastimil Babka
2025-08-01 11:00 ` Lorenzo Stoakes
2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
2025-08-01 8:42 ` Vlastimil Babka
2 siblings, 2 replies; 8+ messages in thread
From: Suren Baghdasaryan @ 2025-07-31 15:19 UTC (permalink / raw)
To: akpm
Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm,
linux-kernel, surenb
vma_start_read() can drop and reacquire the RCU read lock in certain
failure cases. It is not apparent to the caller that the RCU read-side
critical section it started might be interrupted when vma_start_read()
fails to lock the vma. This might become a source of subtle bugs, so to
prevent that we change the locking rules for vma_start_read() to drop
the RCU read lock upon failure. This way it is more obvious that
RCU-protected objects are unsafe after vma locking fails.
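To illustrate the new contract, a caller now follows roughly this
pattern (simplified sketch; the real lock_vma_under_rcu() additionally
retries on -EAGAIN and validates the address range):
        rcu_read_lock();
        vma = mas_walk(&mas);
        if (!vma) {
                /* vma_start_read() was never called, drop the lock ourselves */
                rcu_read_unlock();
                return NULL;
        }
        vma = vma_start_read(mm, vma);
        if (IS_ERR_OR_NULL(vma))
                return NULL; /* vma_start_read() already dropped the RCU read lock */
        /* vma is locked and stable, RCU protection is no longer needed */
        rcu_read_unlock();
        return vma;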
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
Changes since v1 [1]:
- Fixed missing RCU unlock in lock_vma_under_rcu(), per Lorenzo Stoakes
- Modified comments, per Lorenzo Stoakes
[1] https://lore.kernel.org/all/20250731013405.4066346-2-surenb@google.com/
mm/mmap_lock.c | 86 +++++++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 39 deletions(-)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 10826f347a9f..7ea603f26975 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -136,15 +136,16 @@ void vma_mark_detached(struct vm_area_struct *vma)
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
*
- * WARNING! The vma passed to this function cannot be used if the function
- * fails to lock it because in certain cases RCU lock is dropped and then
- * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
+ * IMPORTANT: RCU lock must be held upon entering the function, but upon error
+ * IT IS RELEASED. The caller must handle this correctly.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
{
+ struct mm_struct *other_mm;
int oldcnt;
+ RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held");
/*
* Check before locking. A race might cause false locked result.
* We can use READ_ONCE() for the mm_lock_seq here, and don't need
@@ -152,8 +153,10 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
* we don't rely on for anything - the mm_lock_seq read against which we
* need ordering is below.
*/
- if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
- return NULL;
+ if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) {
+ vma = NULL;
+ goto err;
+ }
/*
* If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
@@ -164,34 +167,14 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
VMA_REF_LIMIT))) {
/* return EAGAIN if vma got detached from under us */
- return oldcnt ? NULL : ERR_PTR(-EAGAIN);
+ vma = oldcnt ? NULL : ERR_PTR(-EAGAIN);
+ goto err;
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
- /*
- * If vma got attached to another mm from under us, that mm is not
- * stable and can be freed in the narrow window after vma->vm_refcnt
- * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
- * releasing vma->vm_refcnt.
- */
- if (unlikely(vma->vm_mm != mm)) {
- /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
- struct mm_struct *other_mm = vma->vm_mm;
-
- /*
- * __mmdrop() is a heavy operation and we don't need RCU
- * protection here. Release RCU lock during these operations.
- * We reinstate the RCU read lock as the caller expects it to
- * be held when this function returns even on error.
- */
- rcu_read_unlock();
- mmgrab(other_mm);
- vma_refcount_put(vma);
- mmdrop(other_mm);
- rcu_read_lock();
- return NULL;
- }
+ if (unlikely(vma->vm_mm != mm))
+ goto err_unstable;
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
@@ -206,10 +189,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
*/
if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
vma_refcount_put(vma);
- return NULL;
+ vma = NULL;
+ goto err;
}
return vma;
+err:
+ rcu_read_unlock();
+
+ return vma;
+err_unstable:
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ other_mm = vma->vm_mm; /* use a copy as vma can be freed after we drop vm_refcnt */
+
+ /* __mmdrop() is a heavy operation, do it after dropping RCU lock. */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+
+ return NULL;
}
/*
@@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
MA_STATE(mas, &mm->mm_mt, address, address);
struct vm_area_struct *vma;
- rcu_read_lock();
retry:
+ rcu_read_lock();
vma = mas_walk(&mas);
- if (!vma)
+ if (!vma) {
+ rcu_read_unlock();
goto inval;
+ }
vma = vma_start_read(mm, vma);
if (IS_ERR_OR_NULL(vma)) {
@@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
/* Failed to lock the VMA */
goto inval;
}
+
+ rcu_read_unlock();
+
/*
* At this point, we have a stable reference to a VMA: The VMA is
* locked and we know it hasn't already been isolated.
@@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(address < vma->vm_start || address >= vma->vm_end))
- goto inval_end_read;
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+ vma_end_read(vma);
+ goto inval;
+ }
- rcu_read_unlock();
return vma;
-inval_end_read:
- vma_end_read(vma);
inval:
- rcu_read_unlock();
count_vm_vma_lock_event(VMA_LOCK_ABORT);
return NULL;
}
@@ -313,6 +320,7 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
*/
if (PTR_ERR(vma) == -EAGAIN) {
/* reset to search from the last address */
+ rcu_read_lock();
vma_iter_set(vmi, from_addr);
goto retry;
}
@@ -342,9 +350,9 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
return vma;
fallback_unlock:
+ rcu_read_unlock();
vma_end_read(vma);
fallback:
- rcu_read_unlock();
vma = lock_next_vma_under_mmap_lock(mm, vmi, from_addr);
rcu_read_lock();
/* Reinitialize the iterator after re-entering rcu read section */
--
2.50.1.552.g942d659e1b-goog
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read()
2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan
@ 2025-07-31 15:20 ` Suren Baghdasaryan
2025-08-01 10:43 ` Lorenzo Stoakes
2025-08-01 8:42 ` Vlastimil Babka
2 siblings, 1 reply; 8+ messages in thread
From: Suren Baghdasaryan @ 2025-07-31 15:20 UTC (permalink / raw)
To: akpm
Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm,
linux-kernel
On Thu, Jul 31, 2025 at 8:19 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Limit the scope of vma_start_read() as it is used only as a helper for
> higher-level locking functions implemented inside mmap_lock.c and we are
> about to introduce more complex RCU rules for this function.
> The change is pure code refactoring and has no functional changes.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Forgot to add Lorenzo's
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thanks!
> ---
> include/linux/mmap_lock.h | 85 ---------------------------------------
> mm/mmap_lock.c | 85 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 85 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 11a078de9150..2c9fffa58714 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -147,91 +147,6 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
> }
> }
>
> -/*
> - * Try to read-lock a vma. The function is allowed to occasionally yield false
> - * locked result to avoid performance overhead, in which case we fall back to
> - * using mmap_lock. The function should never yield false unlocked result.
> - * False locked result is possible if mm_lock_seq overflows or if vma gets
> - * reused and attached to a different mm before we lock it.
> - * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
> - * detached.
> - *
> - * WARNING! The vma passed to this function cannot be used if the function
> - * fails to lock it because in certain cases RCU lock is dropped and then
> - * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
> - */
> -static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> - struct vm_area_struct *vma)
> -{
> - int oldcnt;
> -
> - /*
> - * Check before locking. A race might cause false locked result.
> - * We can use READ_ONCE() for the mm_lock_seq here, and don't need
> - * ACQUIRE semantics, because this is just a lockless check whose result
> - * we don't rely on for anything - the mm_lock_seq read against which we
> - * need ordering is below.
> - */
> - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
> - return NULL;
> -
> - /*
> - * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
> - * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
> - * Acquire fence is required here to avoid reordering against later
> - * vm_lock_seq check and checks inside lock_vma_under_rcu().
> - */
> - if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
> - VMA_REF_LIMIT))) {
> - /* return EAGAIN if vma got detached from under us */
> - return oldcnt ? NULL : ERR_PTR(-EAGAIN);
> - }
> -
> - rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> -
> - /*
> - * If vma got attached to another mm from under us, that mm is not
> - * stable and can be freed in the narrow window after vma->vm_refcnt
> - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
> - * releasing vma->vm_refcnt.
> - */
> - if (unlikely(vma->vm_mm != mm)) {
> - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> - struct mm_struct *other_mm = vma->vm_mm;
> -
> - /*
> - * __mmdrop() is a heavy operation and we don't need RCU
> - * protection here. Release RCU lock during these operations.
> - * We reinstate the RCU read lock as the caller expects it to
> - * be held when this function returns even on error.
> - */
> - rcu_read_unlock();
> - mmgrab(other_mm);
> - vma_refcount_put(vma);
> - mmdrop(other_mm);
> - rcu_read_lock();
> - return NULL;
> - }
> -
> - /*
> - * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
> - * False unlocked result is impossible because we modify and check
> - * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
> - * modification invalidates all existing locks.
> - *
> - * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
> - * racing with vma_end_write_all(), we only start reading from the VMA
> - * after it has been unlocked.
> - * This pairs with RELEASE semantics in vma_end_write_all().
> - */
> - if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
> - vma_refcount_put(vma);
> - return NULL;
> - }
> -
> - return vma;
> -}
> -
> /*
> * Use only while holding mmap read lock which guarantees that locking will not
> * fail (nobody can concurrently write-lock the vma). vma_start_read() should
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index b006cec8e6fe..10826f347a9f 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -127,6 +127,91 @@ void vma_mark_detached(struct vm_area_struct *vma)
> }
> }
>
> +/*
> + * Try to read-lock a vma. The function is allowed to occasionally yield false
> + * locked result to avoid performance overhead, in which case we fall back to
> + * using mmap_lock. The function should never yield false unlocked result.
> + * False locked result is possible if mm_lock_seq overflows or if vma gets
> + * reused and attached to a different mm before we lock it.
> + * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
> + * detached.
> + *
> + * WARNING! The vma passed to this function cannot be used if the function
> + * fails to lock it because in certain cases RCU lock is dropped and then
> + * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
> + */
> +static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> + struct vm_area_struct *vma)
> +{
> + int oldcnt;
> +
> + /*
> + * Check before locking. A race might cause false locked result.
> + * We can use READ_ONCE() for the mm_lock_seq here, and don't need
> + * ACQUIRE semantics, because this is just a lockless check whose result
> + * we don't rely on for anything - the mm_lock_seq read against which we
> + * need ordering is below.
> + */
> + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
> + return NULL;
> +
> + /*
> + * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
> + * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
> + * Acquire fence is required here to avoid reordering against later
> + * vm_lock_seq check and checks inside lock_vma_under_rcu().
> + */
> + if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
> + VMA_REF_LIMIT))) {
> + /* return EAGAIN if vma got detached from under us */
> + return oldcnt ? NULL : ERR_PTR(-EAGAIN);
> + }
> +
> + rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> +
> + /*
> + * If vma got attached to another mm from under us, that mm is not
> + * stable and can be freed in the narrow window after vma->vm_refcnt
> + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
> + * releasing vma->vm_refcnt.
> + */
> + if (unlikely(vma->vm_mm != mm)) {
> + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> + struct mm_struct *other_mm = vma->vm_mm;
> +
> + /*
> + * __mmdrop() is a heavy operation and we don't need RCU
> + * protection here. Release RCU lock during these operations.
> + * We reinstate the RCU read lock as the caller expects it to
> + * be held when this function returns even on error.
> + */
> + rcu_read_unlock();
> + mmgrab(other_mm);
> + vma_refcount_put(vma);
> + mmdrop(other_mm);
> + rcu_read_lock();
> + return NULL;
> + }
> +
> + /*
> + * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
> + * False unlocked result is impossible because we modify and check
> + * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
> + * modification invalidates all existing locks.
> + *
> + * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
> + * racing with vma_end_write_all(), we only start reading from the VMA
> + * after it has been unlocked.
> + * This pairs with RELEASE semantics in vma_end_write_all().
> + */
> + if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
> + vma_refcount_put(vma);
> + return NULL;
> + }
> +
> + return vma;
> +}
> +
> /*
> * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
> * stable and not isolated. If the VMA is not found or is being modified the
>
> base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302
> --
> 2.50.1.552.g942d659e1b-goog
>
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read()
2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan
2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
@ 2025-08-01 8:42 ` Vlastimil Babka
2 siblings, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2025-08-01 8:42 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm
Cc: jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm,
linux-kernel
On 7/31/25 17:19, Suren Baghdasaryan wrote:
> Limit the scope of vma_start_read() as it is used only as a helper for
> higher-level locking functions implemented inside mmap_lock.c and we are
> about to introduce more complex RCU rules for this function.
> The change is pure code refactoring and has no functional changes.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure
2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan
@ 2025-08-01 9:09 ` Vlastimil Babka
2025-08-01 15:30 ` Suren Baghdasaryan
2025-08-01 11:00 ` Lorenzo Stoakes
1 sibling, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2025-08-01 9:09 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm
Cc: jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm,
linux-kernel
On 7/31/25 17:19, Suren Baghdasaryan wrote:
> vma_start_read() can drop and reacquire the RCU read lock in certain
> failure cases. It is not apparent to the caller that the RCU read-side
> critical section it started might be interrupted when vma_start_read()
> fails to lock the vma. This might become a source of subtle bugs, so to
> prevent that we change the locking rules for vma_start_read() to drop
> the RCU read lock upon failure. This way it is more obvious that
> RCU-protected objects are unsafe after vma locking fails.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
IIRC you considered it yourself, I just convinced you to try :)
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
I thought we didn't need to drop the RCU lock for -EAGAIN, but that would
just have made it more complex for little gain, so this looks good to me.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Nit:
> @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> MA_STATE(mas, &mm->mm_mt, address, address);
> struct vm_area_struct *vma;
>
> - rcu_read_lock();
> retry:
> + rcu_read_lock();
> vma = mas_walk(&mas);
> - if (!vma)
> + if (!vma) {
> + rcu_read_unlock();
> goto inval;
> + }
>
> vma = vma_start_read(mm, vma);
> if (IS_ERR_OR_NULL(vma)) {
> @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> /* Failed to lock the VMA */
> goto inval;
> }
> +
> + rcu_read_unlock();
Would it make sense to put this under the comment below?
> +
> /*
> * At this point, we have a stable reference to a VMA: The VMA is
> * locked and we know it hasn't already been isolated.
Given that it continues like this:
* From here on, we can access the VMA without worrying about which
* fields are accessible for RCU readers.
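IOW, roughly (just a sketch):
        /*
         * At this point, we have a stable reference to a VMA: The VMA is
         * locked and we know it hasn't already been isolated.
         * From here on, we can access the VMA without worrying about which
         * fields are accessible for RCU readers.
         */
        rcu_read_unlock();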
> @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> */
>
> /* Check if the vma we locked is the right one. */
> - if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> - goto inval_end_read;
> + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> + vma_end_read(vma);
> + goto inval;
> + }
>
> - rcu_read_unlock();
> return vma;
>
> -inval_end_read:
> - vma_end_read(vma);
> inval:
> - rcu_read_unlock();
> count_vm_vma_lock_event(VMA_LOCK_ABORT);
> return NULL;
> }
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read()
2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan
@ 2025-08-01 10:43 ` Lorenzo Stoakes
0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2025-08-01 10:43 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, jannh, Liam.Howlett, vbabka, pfalcato, linux-mm,
linux-kernel
On Thu, Jul 31, 2025 at 08:20:37AM -0700, Suren Baghdasaryan wrote:
> On Thu, Jul 31, 2025 at 8:19 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Limit the scope of vma_start_read() as it is used only as a helper for
> > higher-level locking functions implemented inside mmap_lock.c and we are
> > about to introduce more complex RCU rules for this function.
> > The change is pure code refactoring and has no functional changes.
> >
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Forgot to add Lorenzo's
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks!
:>)
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure
2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan
2025-08-01 9:09 ` Vlastimil Babka
@ 2025-08-01 11:00 ` Lorenzo Stoakes
1 sibling, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2025-08-01 11:00 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, jannh, Liam.Howlett, vbabka, pfalcato, linux-mm,
linux-kernel
On Thu, Jul 31, 2025 at 08:19:19AM -0700, Suren Baghdasaryan wrote:
> vma_start_read() can drop and reacquire the RCU read lock in certain
> failure cases. It is not apparent to the caller that the RCU read-side
> critical section it started might be interrupted when vma_start_read()
> fails to lock the vma. This might become a source of subtle bugs, so to
> prevent that we change the locking rules for vma_start_read() to drop
> the RCU read lock upon failure. This way it is more obvious that
> RCU-protected objects are unsafe after vma locking fails.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Thanks for iterating on this so quickly - I think this is a good cleanup,
so it's well worth having.
The change LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
I also confirmed this fixes the issue locally and ran the mm self-tests, so
feel free to also add:
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> Changes since v1 [1]:
> - Fixed missing RCU unlock in lock_vma_under_rcu(), per Lorenzo Stoakes
> - Modified comments, per Lorenzo Stoakes
>
> [1] https://lore.kernel.org/all/20250731013405.4066346-2-surenb@google.com/
>
> mm/mmap_lock.c | 86 +++++++++++++++++++++++++++-----------------------
> 1 file changed, 47 insertions(+), 39 deletions(-)
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 10826f347a9f..7ea603f26975 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -136,15 +136,16 @@ void vma_mark_detached(struct vm_area_struct *vma)
> * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
> * detached.
> *
> - * WARNING! The vma passed to this function cannot be used if the function
> - * fails to lock it because in certain cases RCU lock is dropped and then
> - * reacquired. Once RCU lock is dropped the vma can be concurrently freed.
> + * IMPORTANT: RCU lock must be held upon entering the function, but upon error
> + * IT IS RELEASED. The caller must handle this correctly.
> */
Actually, maybe this is better as a replacement for the original, as it is
super clear and direct.
> static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> struct vm_area_struct *vma)
> {
> + struct mm_struct *other_mm;
> int oldcnt;
>
> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held");
> /*
> * Check before locking. A race might cause false locked result.
> * We can use READ_ONCE() for the mm_lock_seq here, and don't need
> @@ -152,8 +153,10 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> * we don't rely on for anything - the mm_lock_seq read against which we
> * need ordering is below.
> */
> - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
> - return NULL;
> + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) {
> + vma = NULL;
> + goto err;
> + }
>
> /*
> * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
> @@ -164,34 +167,14 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
> VMA_REF_LIMIT))) {
> /* return EAGAIN if vma got detached from under us */
> - return oldcnt ? NULL : ERR_PTR(-EAGAIN);
> + vma = oldcnt ? NULL : ERR_PTR(-EAGAIN);
> + goto err;
> }
>
> rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
>
> - /*
> - * If vma got attached to another mm from under us, that mm is not
> - * stable and can be freed in the narrow window after vma->vm_refcnt
> - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
> - * releasing vma->vm_refcnt.
> - */
> - if (unlikely(vma->vm_mm != mm)) {
> - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> - struct mm_struct *other_mm = vma->vm_mm;
> -
> - /*
> - * __mmdrop() is a heavy operation and we don't need RCU
> - * protection here. Release RCU lock during these operations.
> - * We reinstate the RCU read lock as the caller expects it to
> - * be held when this function returns even on error.
> - */
> - rcu_read_unlock();
> - mmgrab(other_mm);
> - vma_refcount_put(vma);
> - mmdrop(other_mm);
> - rcu_read_lock();
> - return NULL;
> - }
> + if (unlikely(vma->vm_mm != mm))
> + goto err_unstable;
>
> /*
> * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
> @@ -206,10 +189,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> */
> if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
> vma_refcount_put(vma);
> - return NULL;
> + vma = NULL;
> + goto err;
> }
>
> return vma;
> +err:
> + rcu_read_unlock();
> +
> + return vma;
> +err_unstable:
> + /*
> + * If vma got attached to another mm from under us, that mm is not
> + * stable and can be freed in the narrow window after vma->vm_refcnt
> + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
> + * releasing vma->vm_refcnt.
> + */
> + other_mm = vma->vm_mm; /* use a copy as vma can be freed after we drop vm_refcnt */
> +
> + /* __mmdrop() is a heavy operation, do it after dropping RCU lock. */
> + rcu_read_unlock();
> + mmgrab(other_mm);
> + vma_refcount_put(vma);
> + mmdrop(other_mm);
> +
> + return NULL;
> }
>
> /*
> @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> MA_STATE(mas, &mm->mm_mt, address, address);
> struct vm_area_struct *vma;
>
> - rcu_read_lock();
> retry:
> + rcu_read_lock();
> vma = mas_walk(&mas);
> - if (!vma)
> + if (!vma) {
> + rcu_read_unlock();
Good :>)
Again, very easy thing to miss.
> goto inval;
> + }
>
> vma = vma_start_read(mm, vma);
> if (IS_ERR_OR_NULL(vma)) {
> @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> /* Failed to lock the VMA */
> goto inval;
> }
> +
> + rcu_read_unlock();
> +
> /*
> * At this point, we have a stable reference to a VMA: The VMA is
> * locked and we know it hasn't already been isolated.
> @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> */
>
> /* Check if the vma we locked is the right one. */
> - if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> - goto inval_end_read;
> + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> + vma_end_read(vma);
> + goto inval;
> + }
>
> - rcu_read_unlock();
> return vma;
>
> -inval_end_read:
> - vma_end_read(vma);
> inval:
> - rcu_read_unlock();
> count_vm_vma_lock_event(VMA_LOCK_ABORT);
> return NULL;
> }
> @@ -313,6 +320,7 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
> */
> if (PTR_ERR(vma) == -EAGAIN) {
> /* reset to search from the last address */
> + rcu_read_lock();
> vma_iter_set(vmi, from_addr);
> goto retry;
> }
> @@ -342,9 +350,9 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
> return vma;
>
> fallback_unlock:
> + rcu_read_unlock();
> vma_end_read(vma);
> fallback:
> - rcu_read_unlock();
> vma = lock_next_vma_under_mmap_lock(mm, vmi, from_addr);
> rcu_read_lock();
> /* Reinitialize the iterator after re-entering rcu read section */
> --
> 2.50.1.552.g942d659e1b-goog
>
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure
2025-08-01 9:09 ` Vlastimil Babka
@ 2025-08-01 15:30 ` Suren Baghdasaryan
0 siblings, 0 replies; 8+ messages in thread
From: Suren Baghdasaryan @ 2025-08-01 15:30 UTC (permalink / raw)
To: Vlastimil Babka
Cc: akpm, jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm,
linux-kernel
On Fri, Aug 1, 2025 at 2:09 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 7/31/25 17:19, Suren Baghdasaryan wrote:
> > vma_start_read() can drop and reacquire RCU lock in certain failure
> > cases. It's not apparent that the RCU session started by the caller of
> > this function might be interrupted when vma_start_read() fails to lock
> > the vma. This might become a source of subtle bugs and to prevent that
> > we change the locking rules for vma_start_read() to drop RCU read lock
> > upon failure. This way it's more obvious that RCU-protected objects are
> > unsafe after vma locking fails.
> >
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>
> IIRC you considered it yourself, I just convinced you to try :)
>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I thought we didn't need to drop the RCU lock for -EAGAIN, but that would
> just have made it more complex for little gain, so this looks good to me.
Yes, we technically don't, but this way it's simpler to explain (drop
RCU on any failure).
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Nit:
>
> > @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > MA_STATE(mas, &mm->mm_mt, address, address);
> > struct vm_area_struct *vma;
> >
> > - rcu_read_lock();
> > retry:
> > + rcu_read_lock();
> > vma = mas_walk(&mas);
> > - if (!vma)
> > + if (!vma) {
> > + rcu_read_unlock();
> > goto inval;
> > + }
> >
> > vma = vma_start_read(mm, vma);
> > if (IS_ERR_OR_NULL(vma)) {
> > @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > /* Failed to lock the VMA */
> > goto inval;
> > }
> > +
> > + rcu_read_unlock();
>
> Would it make sense to put this under the comment below?
>
> > +
> > /*
> > * At this point, we have a stable reference to a VMA: The VMA is
> > * locked and we know it hasn't already been isolated.
>
> Given that it continues like this:
>
> * From here on, we can access the VMA without worrying about which
>
> * fields are accessible for RCU readers.
Yep, will change.
Thanks!
>
> > @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > */
> >
> > /* Check if the vma we locked is the right one. */
> > - if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> > - goto inval_end_read;
> > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> > + vma_end_read(vma);
> > + goto inval;
> > + }
> >
> > - rcu_read_unlock();
> > return vma;
> >
> > -inval_end_read:
> > - vma_end_read(vma);
> > inval:
> > - rcu_read_unlock();
> > count_vm_vma_lock_event(VMA_LOCK_ABORT);
> > return NULL;
> > }