* [PATCH v2 1/2] mm: limit the scope of vma_start_read() @ 2025-07-31 15:19 Suren Baghdasaryan 2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Suren Baghdasaryan @ 2025-07-31 15:19 UTC (permalink / raw) To: akpm Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm, linux-kernel, surenb Limit the scope of vma_start_read() as it is used only as a helper for higher-level locking functions implemented inside mmap_lock.c and we are about to introduce more complex RCU rules for this function. The change is pure code refactoring and has no functional changes. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- include/linux/mmap_lock.h | 85 --------------------------------------- mm/mmap_lock.c | 85 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 85 deletions(-) diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 11a078de9150..2c9fffa58714 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -147,91 +147,6 @@ static inline void vma_refcount_put(struct vm_area_struct *vma) } } -/* - * Try to read-lock a vma. The function is allowed to occasionally yield false - * locked result to avoid performance overhead, in which case we fall back to - * using mmap_lock. The function should never yield false unlocked result. - * False locked result is possible if mm_lock_seq overflows or if vma gets - * reused and attached to a different mm before we lock it. - * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got - * detached. - * - * WARNING! The vma passed to this function cannot be used if the function - * fails to lock it because in certain cases RCU lock is dropped and then - * reacquired. Once RCU lock is dropped the vma can be concurently freed. - */ -static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, - struct vm_area_struct *vma) -{ - int oldcnt; - - /* - * Check before locking. A race might cause false locked result. - * We can use READ_ONCE() for the mm_lock_seq here, and don't need - * ACQUIRE semantics, because this is just a lockless check whose result - * we don't rely on for anything - the mm_lock_seq read against which we - * need ordering is below. - */ - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) - return NULL; - - /* - * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() - * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET. - * Acquire fence is required here to avoid reordering against later - * vm_lock_seq check and checks inside lock_vma_under_rcu(). - */ - if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, - VMA_REF_LIMIT))) { - /* return EAGAIN if vma got detached from under us */ - return oldcnt ? NULL : ERR_PTR(-EAGAIN); - } - - rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); - - /* - * If vma got attached to another mm from under us, that mm is not - * stable and can be freed in the narrow window after vma->vm_refcnt - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before - * releasing vma->vm_refcnt. - */ - if (unlikely(vma->vm_mm != mm)) { - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ - struct mm_struct *other_mm = vma->vm_mm; - - /* - * __mmdrop() is a heavy operation and we don't need RCU - * protection here. 
Release RCU lock during these operations. - * We reinstate the RCU read lock as the caller expects it to - * be held when this function returns even on error. - */ - rcu_read_unlock(); - mmgrab(other_mm); - vma_refcount_put(vma); - mmdrop(other_mm); - rcu_read_lock(); - return NULL; - } - - /* - * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. - * False unlocked result is impossible because we modify and check - * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq - * modification invalidates all existing locks. - * - * We must use ACQUIRE semantics for the mm_lock_seq so that if we are - * racing with vma_end_write_all(), we only start reading from the VMA - * after it has been unlocked. - * This pairs with RELEASE semantics in vma_end_write_all(). - */ - if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { - vma_refcount_put(vma); - return NULL; - } - - return vma; -} - /* * Use only while holding mmap read lock which guarantees that locking will not * fail (nobody can concurrently write-lock the vma). vma_start_read() should diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c index b006cec8e6fe..10826f347a9f 100644 --- a/mm/mmap_lock.c +++ b/mm/mmap_lock.c @@ -127,6 +127,91 @@ void vma_mark_detached(struct vm_area_struct *vma) } } +/* + * Try to read-lock a vma. The function is allowed to occasionally yield false + * locked result to avoid performance overhead, in which case we fall back to + * using mmap_lock. The function should never yield false unlocked result. + * False locked result is possible if mm_lock_seq overflows or if vma gets + * reused and attached to a different mm before we lock it. + * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got + * detached. + * + * WARNING! The vma passed to this function cannot be used if the function + * fails to lock it because in certain cases RCU lock is dropped and then + * reacquired. Once RCU lock is dropped the vma can be concurently freed. + */ +static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, + struct vm_area_struct *vma) +{ + int oldcnt; + + /* + * Check before locking. A race might cause false locked result. + * We can use READ_ONCE() for the mm_lock_seq here, and don't need + * ACQUIRE semantics, because this is just a lockless check whose result + * we don't rely on for anything - the mm_lock_seq read against which we + * need ordering is below. + */ + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) + return NULL; + + /* + * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() + * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET. + * Acquire fence is required here to avoid reordering against later + * vm_lock_seq check and checks inside lock_vma_under_rcu(). + */ + if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, + VMA_REF_LIMIT))) { + /* return EAGAIN if vma got detached from under us */ + return oldcnt ? NULL : ERR_PTR(-EAGAIN); + } + + rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); + + /* + * If vma got attached to another mm from under us, that mm is not + * stable and can be freed in the narrow window after vma->vm_refcnt + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before + * releasing vma->vm_refcnt. 
+ */ + if (unlikely(vma->vm_mm != mm)) { + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ + struct mm_struct *other_mm = vma->vm_mm; + + /* + * __mmdrop() is a heavy operation and we don't need RCU + * protection here. Release RCU lock during these operations. + * We reinstate the RCU read lock as the caller expects it to + * be held when this function returns even on error. + */ + rcu_read_unlock(); + mmgrab(other_mm); + vma_refcount_put(vma); + mmdrop(other_mm); + rcu_read_lock(); + return NULL; + } + + /* + * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. + * False unlocked result is impossible because we modify and check + * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq + * modification invalidates all existing locks. + * + * We must use ACQUIRE semantics for the mm_lock_seq so that if we are + * racing with vma_end_write_all(), we only start reading from the VMA + * after it has been unlocked. + * This pairs with RELEASE semantics in vma_end_write_all(). + */ + if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { + vma_refcount_put(vma); + return NULL; + } + + return vma; +} + /* * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be * stable and not isolated. If the VMA is not found or is being modified the base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302 -- 2.50.1.552.g942d659e1b-goog ^ permalink raw reply related [flat|nested] 8+ messages in thread
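The helper being moved implements a speculative read-lock in three steps: an unordered pre-check of the mm write sequence, a bounded "increment if not zero" on the vma reference count, and an ordered re-check of the sequence that backs the reference out again if a writer slipped in between. Below is a minimal, compilable userspace model of just that pattern (C11 atomics); every name and the limit value are illustrative, and it deliberately omits the detached-vma (-EAGAIN), foreign-mm and RCU handling that the real vma_start_read() also performs.

/* Toy model of the check / ref-get / re-check pattern.  Not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define REF_LIMIT 0x00ffffffu            /* stand-in for VMA_REF_LIMIT */

struct mm_model  { atomic_uint lock_seq; };
struct vma_model {
	atomic_uint  refcnt;             /* 0 means "detached"               */
	unsigned int lock_seq;           /* snapshot taken when write-locked */
};

/* increment-if-not-zero with an upper bound; acquire ordering on success */
static bool ref_get_limited(atomic_uint *ref, unsigned int limit)
{
	unsigned int old = atomic_load_explicit(ref, memory_order_relaxed);

	do {
		if (old == 0 || old > limit)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(ref, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

static struct vma_model *vma_read_trylock(struct mm_model *mm,
					  struct vma_model *vma)
{
	/* 1. cheap, unordered pre-check: a writer currently holds the vma */
	if (vma->lock_seq == atomic_load_explicit(&mm->lock_seq,
						  memory_order_relaxed))
		return NULL;

	/* 2. pin it, unless detached (0) or write-locked (above the limit) */
	if (!ref_get_limited(&vma->refcnt, REF_LIMIT))
		return NULL;

	/* 3. ordered re-check: a writer may have locked it since step 1 */
	if (vma->lock_seq == atomic_load_explicit(&mm->lock_seq,
						  memory_order_acquire)) {
		atomic_fetch_sub_explicit(&vma->refcnt, 1, memory_order_release);
		return NULL;
	}
	return vma;                      /* read-locked: holds one reference */
}

int main(void)
{
	struct mm_model  mm  = { .lock_seq = 1 };
	struct vma_model vma = { .refcnt = 1, .lock_seq = 0 };

	printf("locked: %s\n", vma_read_trylock(&mm, &vma) ? "yes" : "no");
	return 0;
}

Read against the kernel function above: a spurious "writer holds it" answer in step 1 or 3 (a false locked result) is tolerated because the caller simply falls back to mmap_lock, while the reverse — treating a write-locked vma as free — must never happen; that is what the acquire ordering in step 3, pairing with the writer's release of mm_lock_seq in vma_end_write_all(), guarantees.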
* [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure 2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan @ 2025-07-31 15:19 ` Suren Baghdasaryan 2025-08-01 9:09 ` Vlastimil Babka 2025-08-01 11:00 ` Lorenzo Stoakes 2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan 2025-08-01 8:42 ` Vlastimil Babka 2 siblings, 2 replies; 8+ messages in thread From: Suren Baghdasaryan @ 2025-07-31 15:19 UTC (permalink / raw) To: akpm Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm, linux-kernel, surenb vma_start_read() can drop and reacquire RCU lock in certain failure cases. It's not apparent that the RCU session started by the caller of this function might be interrupted when vma_start_read() fails to lock the vma. This might become a source of subtle bugs and to prevent that we change the locking rules for vma_start_read() to drop RCU read lock upon failure. This way it's more obvious that RCU-protected objects are unsafe after vma locking fails. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- Changes since v1 [1]: - Fixed missing RCU unlock in lock_vma_under_rcu(), per Lorenzo Stoakes - Modified comments, per Lorenzo Stoakes [1] https://lore.kernel.org/all/20250731013405.4066346-2-surenb@google.com/ mm/mmap_lock.c | 86 +++++++++++++++++++++++++++----------------------- 1 file changed, 47 insertions(+), 39 deletions(-) diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c index 10826f347a9f..7ea603f26975 100644 --- a/mm/mmap_lock.c +++ b/mm/mmap_lock.c @@ -136,15 +136,16 @@ void vma_mark_detached(struct vm_area_struct *vma) * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got * detached. * - * WARNING! The vma passed to this function cannot be used if the function - * fails to lock it because in certain cases RCU lock is dropped and then - * reacquired. Once RCU lock is dropped the vma can be concurently freed. + * IMPORTANT: RCU lock must be held upon entering the function, but upon error + * IT IS RELEASED. The caller must handle this correctly. */ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma) { + struct mm_struct *other_mm; int oldcnt; + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held"); /* * Check before locking. A race might cause false locked result. * We can use READ_ONCE() for the mm_lock_seq here, and don't need @@ -152,8 +153,10 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, * we don't rely on for anything - the mm_lock_seq read against which we * need ordering is below. */ - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) - return NULL; + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) { + vma = NULL; + goto err; + } /* * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() @@ -164,34 +167,14 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, VMA_REF_LIMIT))) { /* return EAGAIN if vma got detached from under us */ - return oldcnt ? NULL : ERR_PTR(-EAGAIN); + vma = oldcnt ? 
NULL : ERR_PTR(-EAGAIN); + goto err; } rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); - /* - * If vma got attached to another mm from under us, that mm is not - * stable and can be freed in the narrow window after vma->vm_refcnt - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before - * releasing vma->vm_refcnt. - */ - if (unlikely(vma->vm_mm != mm)) { - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ - struct mm_struct *other_mm = vma->vm_mm; - - /* - * __mmdrop() is a heavy operation and we don't need RCU - * protection here. Release RCU lock during these operations. - * We reinstate the RCU read lock as the caller expects it to - * be held when this function returns even on error. - */ - rcu_read_unlock(); - mmgrab(other_mm); - vma_refcount_put(vma); - mmdrop(other_mm); - rcu_read_lock(); - return NULL; - } + if (unlikely(vma->vm_mm != mm)) + goto err_unstable; /* * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. @@ -206,10 +189,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, */ if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { vma_refcount_put(vma); - return NULL; + vma = NULL; + goto err; } return vma; +err: + rcu_read_unlock(); + + return vma; +err_unstable: + /* + * If vma got attached to another mm from under us, that mm is not + * stable and can be freed in the narrow window after vma->vm_refcnt + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before + * releasing vma->vm_refcnt. + */ + other_mm = vma->vm_mm; /* use a copy as vma can be freed after we drop vm_refcnt */ + + /* __mmdrop() is a heavy operation, do it after dropping RCU lock. */ + rcu_read_unlock(); + mmgrab(other_mm); + vma_refcount_put(vma); + mmdrop(other_mm); + + return NULL; } /* @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, MA_STATE(mas, &mm->mm_mt, address, address); struct vm_area_struct *vma; - rcu_read_lock(); retry: + rcu_read_lock(); vma = mas_walk(&mas); - if (!vma) + if (!vma) { + rcu_read_unlock(); goto inval; + } vma = vma_start_read(mm, vma); if (IS_ERR_OR_NULL(vma)) { @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, /* Failed to lock the VMA */ goto inval; } + + rcu_read_unlock(); + /* * At this point, we have a stable reference to a VMA: The VMA is * locked and we know it hasn't already been isolated. @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, */ /* Check if the vma we locked is the right one. 
*/ - if (unlikely(address < vma->vm_start || address >= vma->vm_end)) - goto inval_end_read; + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { + vma_end_read(vma); + goto inval; + } - rcu_read_unlock(); return vma; -inval_end_read: - vma_end_read(vma); inval: - rcu_read_unlock(); count_vm_vma_lock_event(VMA_LOCK_ABORT); return NULL; } @@ -313,6 +320,7 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, */ if (PTR_ERR(vma) == -EAGAIN) { /* reset to search from the last address */ + rcu_read_lock(); vma_iter_set(vmi, from_addr); goto retry; } @@ -342,9 +350,9 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, return vma; fallback_unlock: + rcu_read_unlock(); vma_end_read(vma); fallback: - rcu_read_unlock(); vma = lock_next_vma_under_mmap_lock(mm, vmi, from_addr); rcu_read_lock(); /* Reinitialize the iterator after re-entering rcu read section */ -- 2.50.1.552.g942d659e1b-goog ^ permalink raw reply related [flat|nested] 8+ messages in thread
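With this change the contract seen by callers becomes: enter with the RCU read lock held; on success the lock is still held and the vma is pinned; on any failure — NULL or ERR_PTR(-EAGAIN) — vma_start_read() has already dropped the RCU read lock, so the caller must neither unlock RCU again nor touch the vma (or anything else it found under RCU) and has to re-enter the RCU section itself before retrying. A schematic caller shaped after the lock_vma_under_rcu() hunk above (an illustration of the rule, not a drop-in kernel function):

static struct vm_area_struct *lookup_and_lock(struct mm_struct *mm,
					      unsigned long address)
{
	MA_STATE(mas, &mm->mm_mt, address, address);
	struct vm_area_struct *vma;

retry:
	rcu_read_lock();
	vma = mas_walk(&mas);
	if (!vma) {
		rcu_read_unlock();	/* lookup failed while RCU was still ours */
		return NULL;
	}

	vma = vma_start_read(mm, vma);
	if (IS_ERR_OR_NULL(vma)) {
		/* RCU read lock already released inside vma_start_read() */
		if (PTR_ERR(vma) == -EAGAIN) {
			/* vma was detached and may be reused: redo the lookup */
			mas_set(&mas, address);
			goto retry;
		}
		return NULL;		/* could not lock: fall back to mmap_lock */
	}

	rcu_read_unlock();
	/* vma is read-locked and stable; pair with vma_end_read(vma) */
	return vma;
}

Note that the lock is dropped even on the -EAGAIN path, which is not strictly required, but keeping a single rule — any failure returns with RCU unlocked — is what makes the contract easy to audit.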
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure 2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan @ 2025-08-01 9:09 ` Vlastimil Babka 2025-08-01 15:30 ` Suren Baghdasaryan 2025-08-01 11:00 ` Lorenzo Stoakes 1 sibling, 1 reply; 8+ messages in thread From: Vlastimil Babka @ 2025-08-01 9:09 UTC (permalink / raw) To: Suren Baghdasaryan, akpm Cc: jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm, linux-kernel On 7/31/25 17:19, Suren Baghdasaryan wrote: > vma_start_read() can drop and reacquire RCU lock in certain failure > cases. It's not apparent that the RCU session started by the caller of > this function might be interrupted when vma_start_read() fails to lock > the vma. This might become a source of subtle bugs and to prevent that > we change the locking rules for vma_start_read() to drop RCU read lock > upon failure. This way it's more obvious that RCU-protected objects are > unsafe after vma locking fails. > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> IIRC you considered it yourself, I just convinced you to try :) > Signed-off-by: Suren Baghdasaryan <surenb@google.com> I thought we didn't need the drop rcu lock for -EAGAIN, but that would just made it more complex for little gain, so this looks good to me. Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Nit: > @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > MA_STATE(mas, &mm->mm_mt, address, address); > struct vm_area_struct *vma; > > - rcu_read_lock(); > retry: > + rcu_read_lock(); > vma = mas_walk(&mas); > - if (!vma) > + if (!vma) { > + rcu_read_unlock(); > goto inval; > + } > > vma = vma_start_read(mm, vma); > if (IS_ERR_OR_NULL(vma)) { > @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > /* Failed to lock the VMA */ > goto inval; > } > + > + rcu_read_unlock(); Would it make sense to put this under the comment below? > + > /* > * At this point, we have a stable reference to a VMA: The VMA is > * locked and we know it hasn't already been isolated. Give it continues like this: * From here on, we can access the VMA without worrying about which * fields are accessible for RCU readers. > @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > */ > > /* Check if the vma we locked is the right one. */ > - if (unlikely(address < vma->vm_start || address >= vma->vm_end)) > - goto inval_end_read; > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > + vma_end_read(vma); > + goto inval; > + } > > - rcu_read_unlock(); > return vma; > > -inval_end_read: > - vma_end_read(vma); > inval: > - rcu_read_unlock(); > count_vm_vma_lock_event(VMA_LOCK_ABORT); > return NULL; > } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure 2025-08-01 9:09 ` Vlastimil Babka @ 2025-08-01 15:30 ` Suren Baghdasaryan 0 siblings, 0 replies; 8+ messages in thread From: Suren Baghdasaryan @ 2025-08-01 15:30 UTC (permalink / raw) To: Vlastimil Babka Cc: akpm, jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm, linux-kernel On Fri, Aug 1, 2025 at 2:09 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > On 7/31/25 17:19, Suren Baghdasaryan wrote: > > vma_start_read() can drop and reacquire RCU lock in certain failure > > cases. It's not apparent that the RCU session started by the caller of > > this function might be interrupted when vma_start_read() fails to lock > > the vma. This might become a source of subtle bugs and to prevent that > > we change the locking rules for vma_start_read() to drop RCU read lock > > upon failure. This way it's more obvious that RCU-protected objects are > > unsafe after vma locking fails. > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> > > IIRC you considered it yourself, I just convinced you to try :) > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > I thought we didn't need the drop rcu lock for -EAGAIN, but that would just > made it more complex for little gain, so this looks good to me. Yes, we technically don't but this way it's simpler to explain (drop RCU on any failure). > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz> > > Nit: > > > @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > MA_STATE(mas, &mm->mm_mt, address, address); > > struct vm_area_struct *vma; > > > > - rcu_read_lock(); > > retry: > > + rcu_read_lock(); > > vma = mas_walk(&mas); > > - if (!vma) > > + if (!vma) { > > + rcu_read_unlock(); > > goto inval; > > + } > > > > vma = vma_start_read(mm, vma); > > if (IS_ERR_OR_NULL(vma)) { > > @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > /* Failed to lock the VMA */ > > goto inval; > > } > > + > > + rcu_read_unlock(); > > Would it make sense to put this under the comment below? > > > + > > /* > > * At this point, we have a stable reference to a VMA: The VMA is > > * locked and we know it hasn't already been isolated. > > Give it continues like this: > > * From here on, we can access the VMA without worrying about which > > * fields are accessible for RCU readers. Yep, will change. Thanks! > > > @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > */ > > > > /* Check if the vma we locked is the right one. */ > > - if (unlikely(address < vma->vm_start || address >= vma->vm_end)) > > - goto inval_end_read; > > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > > + vma_end_read(vma); > > + goto inval; > > + } > > > > - rcu_read_unlock(); > > return vma; > > > > -inval_end_read: > > - vma_end_read(vma); > > inval: > > - rcu_read_unlock(); > > count_vm_vma_lock_event(VMA_LOCK_ABORT); > > return NULL; > > } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure 2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan 2025-08-01 9:09 ` Vlastimil Babka @ 2025-08-01 11:00 ` Lorenzo Stoakes 1 sibling, 0 replies; 8+ messages in thread From: Lorenzo Stoakes @ 2025-08-01 11:00 UTC (permalink / raw) To: Suren Baghdasaryan Cc: akpm, jannh, Liam.Howlett, vbabka, pfalcato, linux-mm, linux-kernel On Thu, Jul 31, 2025 at 08:19:19AM -0700, Suren Baghdasaryan wrote: > vma_start_read() can drop and reacquire RCU lock in certain failure > cases. It's not apparent that the RCU session started by the caller of > this function might be interrupted when vma_start_read() fails to lock > the vma. This might become a source of subtle bugs and to prevent that > we change the locking rules for vma_start_read() to drop RCU read lock > upon failure. This way it's more obvious that RCU-protected objects are > unsafe after vma locking fails. > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> Thanks for iterating on this so quickly - I think this is a good cleanup so well worth having. The change LGTM, so: Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> I also confirmed this fixes the issue locally and ran mm self-tests so feel free to also add: Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > Changes since v1 [1]: > - Fixed missing RCU unlock in lock_vma_under_rcu(), per Lorenzo Stoakes > - Modified comments, per Lorenzo Stoakes > > [1] https://lore.kernel.org/all/20250731013405.4066346-2-surenb@google.com/ > > mm/mmap_lock.c | 86 +++++++++++++++++++++++++++----------------------- > 1 file changed, 47 insertions(+), 39 deletions(-) > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index 10826f347a9f..7ea603f26975 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -136,15 +136,16 @@ void vma_mark_detached(struct vm_area_struct *vma) > * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got > * detached. > * > - * WARNING! The vma passed to this function cannot be used if the function > - * fails to lock it because in certain cases RCU lock is dropped and then > - * reacquired. Once RCU lock is dropped the vma can be concurently freed. > + * IMPORTANT: RCU lock must be held upon entering the function, but upon error > + * IT IS RELEASED. The caller must handle this correctly. > */ Actually maybe this is better replacing the original, as it is super clear and direct. > static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > struct vm_area_struct *vma) > { > + struct mm_struct *other_mm; > int oldcnt; > > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held"); > /* > * Check before locking. A race might cause false locked result. > * We can use READ_ONCE() for the mm_lock_seq here, and don't need > @@ -152,8 +153,10 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > * we don't rely on for anything - the mm_lock_seq read against which we > * need ordering is below. 
> */ > - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) > - return NULL; > + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) { > + vma = NULL; > + goto err; > + } > > /* > * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() > @@ -164,34 +167,14 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, > VMA_REF_LIMIT))) { > /* return EAGAIN if vma got detached from under us */ > - return oldcnt ? NULL : ERR_PTR(-EAGAIN); > + vma = oldcnt ? NULL : ERR_PTR(-EAGAIN); > + goto err; > } > > rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); > > - /* > - * If vma got attached to another mm from under us, that mm is not > - * stable and can be freed in the narrow window after vma->vm_refcnt > - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before > - * releasing vma->vm_refcnt. > - */ > - if (unlikely(vma->vm_mm != mm)) { > - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ > - struct mm_struct *other_mm = vma->vm_mm; > - > - /* > - * __mmdrop() is a heavy operation and we don't need RCU > - * protection here. Release RCU lock during these operations. > - * We reinstate the RCU read lock as the caller expects it to > - * be held when this function returns even on error. > - */ > - rcu_read_unlock(); > - mmgrab(other_mm); > - vma_refcount_put(vma); > - mmdrop(other_mm); > - rcu_read_lock(); > - return NULL; > - } > + if (unlikely(vma->vm_mm != mm)) > + goto err_unstable; > > /* > * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. > @@ -206,10 +189,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > */ > if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { > vma_refcount_put(vma); > - return NULL; > + vma = NULL; > + goto err; > } > > return vma; > +err: > + rcu_read_unlock(); > + > + return vma; > +err_unstable: > + /* > + * If vma got attached to another mm from under us, that mm is not > + * stable and can be freed in the narrow window after vma->vm_refcnt > + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before > + * releasing vma->vm_refcnt. > + */ > + other_mm = vma->vm_mm; /* use a copy as vma can be freed after we drop vm_refcnt */ > + > + /* __mmdrop() is a heavy operation, do it after dropping RCU lock. */ > + rcu_read_unlock(); > + mmgrab(other_mm); > + vma_refcount_put(vma); > + mmdrop(other_mm); > + > + return NULL; > } > > /* > @@ -223,11 +227,13 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > MA_STATE(mas, &mm->mm_mt, address, address); > struct vm_area_struct *vma; > > - rcu_read_lock(); > retry: > + rcu_read_lock(); > vma = mas_walk(&mas); > - if (!vma) > + if (!vma) { > + rcu_read_unlock(); Good :>) Again, very easy thing to miss. > goto inval; > + } > > vma = vma_start_read(mm, vma); > if (IS_ERR_OR_NULL(vma)) { > @@ -241,6 +247,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > /* Failed to lock the VMA */ > goto inval; > } > + > + rcu_read_unlock(); > + > /* > * At this point, we have a stable reference to a VMA: The VMA is > * locked and we know it hasn't already been isolated. > @@ -249,16 +258,14 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > */ > > /* Check if the vma we locked is the right one. 
*/ > - if (unlikely(address < vma->vm_start || address >= vma->vm_end)) > - goto inval_end_read; > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > + vma_end_read(vma); > + goto inval; > + } > > - rcu_read_unlock(); > return vma; > > -inval_end_read: > - vma_end_read(vma); > inval: > - rcu_read_unlock(); > count_vm_vma_lock_event(VMA_LOCK_ABORT); > return NULL; > } > @@ -313,6 +320,7 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, > */ > if (PTR_ERR(vma) == -EAGAIN) { > /* reset to search from the last address */ > + rcu_read_lock(); > vma_iter_set(vmi, from_addr); > goto retry; > } > @@ -342,9 +350,9 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, > return vma; > > fallback_unlock: > + rcu_read_unlock(); > vma_end_read(vma); > fallback: > - rcu_read_unlock(); > vma = lock_next_vma_under_mmap_lock(mm, vmi, from_addr); > rcu_read_lock(); > /* Reinitialize the iterator after re-entering rcu read section */ > -- > 2.50.1.552.g942d659e1b-goog > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read() 2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan 2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan @ 2025-07-31 15:20 ` Suren Baghdasaryan 2025-08-01 10:43 ` Lorenzo Stoakes 2025-08-01 8:42 ` Vlastimil Babka 2 siblings, 1 reply; 8+ messages in thread From: Suren Baghdasaryan @ 2025-07-31 15:20 UTC (permalink / raw) To: akpm Cc: jannh, Liam.Howlett, lorenzo.stoakes, vbabka, pfalcato, linux-mm, linux-kernel On Thu, Jul 31, 2025 at 8:19 AM Suren Baghdasaryan <surenb@google.com> wrote: > > Limit the scope of vma_start_read() as it is used only as a helper for > higher-level locking functions implemented inside mmap_lock.c and we are > about to introduce more complex RCU rules for this function. > The change is pure code refactoring and has no functional changes. > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> Forgot to add Lorenzo's Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Thanks! > --- > include/linux/mmap_lock.h | 85 --------------------------------------- > mm/mmap_lock.c | 85 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 85 insertions(+), 85 deletions(-) > > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h > index 11a078de9150..2c9fffa58714 100644 > --- a/include/linux/mmap_lock.h > +++ b/include/linux/mmap_lock.h > @@ -147,91 +147,6 @@ static inline void vma_refcount_put(struct vm_area_struct *vma) > } > } > > -/* > - * Try to read-lock a vma. The function is allowed to occasionally yield false > - * locked result to avoid performance overhead, in which case we fall back to > - * using mmap_lock. The function should never yield false unlocked result. > - * False locked result is possible if mm_lock_seq overflows or if vma gets > - * reused and attached to a different mm before we lock it. > - * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got > - * detached. > - * > - * WARNING! The vma passed to this function cannot be used if the function > - * fails to lock it because in certain cases RCU lock is dropped and then > - * reacquired. Once RCU lock is dropped the vma can be concurently freed. > - */ > -static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > - struct vm_area_struct *vma) > -{ > - int oldcnt; > - > - /* > - * Check before locking. A race might cause false locked result. > - * We can use READ_ONCE() for the mm_lock_seq here, and don't need > - * ACQUIRE semantics, because this is just a lockless check whose result > - * we don't rely on for anything - the mm_lock_seq read against which we > - * need ordering is below. > - */ > - if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) > - return NULL; > - > - /* > - * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() > - * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET. > - * Acquire fence is required here to avoid reordering against later > - * vm_lock_seq check and checks inside lock_vma_under_rcu(). > - */ > - if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, > - VMA_REF_LIMIT))) { > - /* return EAGAIN if vma got detached from under us */ > - return oldcnt ? 
NULL : ERR_PTR(-EAGAIN); > - } > - > - rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); > - > - /* > - * If vma got attached to another mm from under us, that mm is not > - * stable and can be freed in the narrow window after vma->vm_refcnt > - * is dropped and before rcuwait_wake_up(mm) is called. Grab it before > - * releasing vma->vm_refcnt. > - */ > - if (unlikely(vma->vm_mm != mm)) { > - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ > - struct mm_struct *other_mm = vma->vm_mm; > - > - /* > - * __mmdrop() is a heavy operation and we don't need RCU > - * protection here. Release RCU lock during these operations. > - * We reinstate the RCU read lock as the caller expects it to > - * be held when this function returns even on error. > - */ > - rcu_read_unlock(); > - mmgrab(other_mm); > - vma_refcount_put(vma); > - mmdrop(other_mm); > - rcu_read_lock(); > - return NULL; > - } > - > - /* > - * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. > - * False unlocked result is impossible because we modify and check > - * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq > - * modification invalidates all existing locks. > - * > - * We must use ACQUIRE semantics for the mm_lock_seq so that if we are > - * racing with vma_end_write_all(), we only start reading from the VMA > - * after it has been unlocked. > - * This pairs with RELEASE semantics in vma_end_write_all(). > - */ > - if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { > - vma_refcount_put(vma); > - return NULL; > - } > - > - return vma; > -} > - > /* > * Use only while holding mmap read lock which guarantees that locking will not > * fail (nobody can concurrently write-lock the vma). vma_start_read() should > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index b006cec8e6fe..10826f347a9f 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -127,6 +127,91 @@ void vma_mark_detached(struct vm_area_struct *vma) > } > } > > +/* > + * Try to read-lock a vma. The function is allowed to occasionally yield false > + * locked result to avoid performance overhead, in which case we fall back to > + * using mmap_lock. The function should never yield false unlocked result. > + * False locked result is possible if mm_lock_seq overflows or if vma gets > + * reused and attached to a different mm before we lock it. > + * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got > + * detached. > + * > + * WARNING! The vma passed to this function cannot be used if the function > + * fails to lock it because in certain cases RCU lock is dropped and then > + * reacquired. Once RCU lock is dropped the vma can be concurently freed. > + */ > +static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, > + struct vm_area_struct *vma) > +{ > + int oldcnt; > + > + /* > + * Check before locking. A race might cause false locked result. > + * We can use READ_ONCE() for the mm_lock_seq here, and don't need > + * ACQUIRE semantics, because this is just a lockless check whose result > + * we don't rely on for anything - the mm_lock_seq read against which we > + * need ordering is below. > + */ > + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence)) > + return NULL; > + > + /* > + * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire() > + * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET. 
> + * Acquire fence is required here to avoid reordering against later > + * vm_lock_seq check and checks inside lock_vma_under_rcu(). > + */ > + if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt, > + VMA_REF_LIMIT))) { > + /* return EAGAIN if vma got detached from under us */ > + return oldcnt ? NULL : ERR_PTR(-EAGAIN); > + } > + > + rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); > + > + /* > + * If vma got attached to another mm from under us, that mm is not > + * stable and can be freed in the narrow window after vma->vm_refcnt > + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before > + * releasing vma->vm_refcnt. > + */ > + if (unlikely(vma->vm_mm != mm)) { > + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ > + struct mm_struct *other_mm = vma->vm_mm; > + > + /* > + * __mmdrop() is a heavy operation and we don't need RCU > + * protection here. Release RCU lock during these operations. > + * We reinstate the RCU read lock as the caller expects it to > + * be held when this function returns even on error. > + */ > + rcu_read_unlock(); > + mmgrab(other_mm); > + vma_refcount_put(vma); > + mmdrop(other_mm); > + rcu_read_lock(); > + return NULL; > + } > + > + /* > + * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. > + * False unlocked result is impossible because we modify and check > + * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq > + * modification invalidates all existing locks. > + * > + * We must use ACQUIRE semantics for the mm_lock_seq so that if we are > + * racing with vma_end_write_all(), we only start reading from the VMA > + * after it has been unlocked. > + * This pairs with RELEASE semantics in vma_end_write_all(). > + */ > + if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) { > + vma_refcount_put(vma); > + return NULL; > + } > + > + return vma; > +} > + > /* > * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be > * stable and not isolated. If the VMA is not found or is being modified the > > base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302 > -- > 2.50.1.552.g942d659e1b-goog > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read() 2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan @ 2025-08-01 10:43 ` Lorenzo Stoakes 0 siblings, 0 replies; 8+ messages in thread From: Lorenzo Stoakes @ 2025-08-01 10:43 UTC (permalink / raw) To: Suren Baghdasaryan Cc: akpm, jannh, Liam.Howlett, vbabka, pfalcato, linux-mm, linux-kernel On Thu, Jul 31, 2025 at 08:20:37AM -0700, Suren Baghdasaryan wrote: > On Thu, Jul 31, 2025 at 8:19 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > Limit the scope of vma_start_read() as it is used only as a helper for > > higher-level locking functions implemented inside mmap_lock.c and we are > > about to introduce more complex RCU rules for this function. > > The change is pure code refactoring and has no functional changes. > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > Forgot to add Lorenzo's > > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Thanks! :>) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm: limit the scope of vma_start_read() 2025-07-31 15:19 [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan 2025-07-31 15:19 ` [PATCH v2 2/2] mm: change vma_start_read() to drop RCU lock on failure Suren Baghdasaryan 2025-07-31 15:20 ` [PATCH v2 1/2] mm: limit the scope of vma_start_read() Suren Baghdasaryan @ 2025-08-01 8:42 ` Vlastimil Babka 2 siblings, 0 replies; 8+ messages in thread From: Vlastimil Babka @ 2025-08-01 8:42 UTC (permalink / raw) To: Suren Baghdasaryan, akpm Cc: jannh, Liam.Howlett, lorenzo.stoakes, pfalcato, linux-mm, linux-kernel On 7/31/25 17:19, Suren Baghdasaryan wrote: > Limit the scope of vma_start_read() as it is used only as a helper for > higher-level locking functions implemented inside mmap_lock.c and we are > about to introduce more complex RCU rules for this function. > The change is pure code refactoring and has no functional changes. > > Suggested-by: Vlastimil Babka <vbabka@suse.cz> > Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> ^ permalink raw reply [flat|nested] 8+ messages in thread