Linux-mm Archive on lore.kernel.org
From: Lorenzo Stoakes <ljs@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	linux-mm@kvack.org, Shakeel Butt <shakeel.butt@linux.dev>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers
Date: Fri, 8 May 2026 21:15:54 +0100	[thread overview]
Message-ID: <af5C8bxEKEkQEjwV@lucifer> (raw)
In-Reply-To: <af4Zx0gJIWbdDeY2@lucifer>

On Fri, May 08, 2026 at 06:26:57PM +0100, Lorenzo Stoakes wrote:
> On Wed, Apr 29, 2026 at 11:19:59AM -0700, Dave Hansen wrote:
> >
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> >
> > == Background ==
> >
> > There are basically two parallel ways to look up a VMA: the
> > traditional way, which is protected by mmap_lock, and the per-VMA
> > lock way, which is based on RCU and refcounts.
> >
> > == Problems ==
> >
> > The mmap_lock one is more straightforward to use, but it has a big
> > disadvantage: it cannot be mixed with page faults, since those can
> > take mmap_lock for read.
> >
> > The RCU one can be mixed with faults, but it is not available in all
> > configs, so all RCU users need to be able to fall back to the
> > traditional way.
> >
> > == Solution ==
> >
> > Add a variant of the RCU-based lookup that waits for writers. This is
> > basically the same as the existing RCU-based lookup, but it also takes
> > mmap_lock for read and waits for writers to finish before returning
> > the VMA. This has two big advantages:
> >
> >  1. Callers do not need to have a fallback path for when they
> >     collide with writers.
> >  2. It can be used in contexts where page faults can happen because
> >     it can take the mmap_lock for read but never *holds* it.
> >
> > == Discussion ==
> >
> > I am not married to the naming here at all. Naming suggestions would
> > be much appreciated.
> >
> > This basically uses mmap_lock to wait for writers, nothing else. The
> > VMA is obviously stable under mmap_read_lock() and the code _can_
> > likely take advantage of that and possibly even remove the goto. For
> > instance, it could (probably) bump the VMA refcount and exclude future
> > writers. That would eliminate the goto.
> >
> > But the approach as-is is probably the smallest line count and
> > arguably the simplest approach. It is a good place to start a
> > conversation if nothing else.
> >
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Vlastimil Babka <vbabka@kernel.org>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: linux-mm@kvack.org
> > ---
> >
> >  b/include/linux/mmap_lock.h |    2 ++
> >  b/mm/mmap_lock.c            |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 45 insertions(+)
> >
> > diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> > --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-04-29 11:18:50.633628887 -0700
> > +++ b/include/linux/mmap_lock.h	2026-04-29 11:18:50.707631737 -0700
> > @@ -470,6 +470,8 @@ static inline void vma_mark_detached(str
> >
> >  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  					  unsigned long address);
> > +struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
> > +					  unsigned long address);
> >
> >  /*
> >   * Locks next vma pointed by the iterator. Confirms the locked vma has not
> > diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> > --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-04-29 11:18:50.704631622 -0700
> > +++ b/mm/mmap_lock.c	2026-04-29 11:18:50.707631737 -0700
> > @@ -340,6 +340,49 @@ inval:
> >  	return NULL;
> >  }
> >
> > +/*
> > + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> > + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> > + * 'address'.
> > + *
> > + * The fast path does not take mmap lock.
> > + */
> > +struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
> > +					       unsigned long address)
> > +{
> > +	struct vm_area_struct *vma;
> > +
> > +retry:
> > +	vma = lock_vma_under_rcu(mm, address);
> > +	/* Fast path: return stable VMA covering 'address': */
> > +	if (vma)
> > +		return vma;
> > +
> > +	/*
> > +	 * Slow path: the VMA covering 'address' is being modified,
> > +	 * or there is no VMA covering 'address'. Rule out the
> > +	 * possibility that the VMA is being modified:
> > +	 */
> > +	mmap_read_lock(mm);
> > +	vma = vma_lookup(mm, address);
> > +	mmap_read_unlock(mm);
> > +
> > +	/* There was for sure no VMA covering 'address': */
> > +	if (!vma)
> > +		return NULL;
> > +
> > +	/*
> > +	 * VMA was likely being modified during RCU lookup. Try again.
> > +	 * mmap_read_lock() waited for the writer to complete and the
> > +	 * writer is now done.
> > +	 *
> > +	 * There is no guarantee that any single retry will succeed,
> > +	 * and it is possible but highly unlikely this will loop
> > +	 * forever.
> > +	 */
> > +	goto retry;
> > +}
>
> Hmm yeah this is not ideal :)
>
> You don't have to do any of this - we already have logic to help out here:
> vma_start_read_locked().
>
> That uses the fact that the mmap read lock is held to pin the VMA lock: VMA
> write locks require the mmap write lock, which the mmap read lock excludes.
>
> That way you can eliminate the retry.
>
> So instead:
>
> /*
>  * Tries to lock under RCU, failing that it acquires the VMA lock with the mmap
>  * read lock held.
>  */
> struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> 					       unsigned long address)
> {
> 	struct vm_area_struct *vma;
>
> 	might_sleep();
>
> 	vma = lock_vma_under_rcu(mm, address);
> 	if (vma)
> 		return vma;
>
> 	/* Slow path: preclude VMA writers by getting mmap read lock. */
> 	guard(rwsem_read)(&mm->mmap_lock);

Actually, we'd possibly want the killable version of this, so a fatal signal can
interrupt it rather than blocking others indefinitely.
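
i.e. the slow path would then open with something like this (untested, and note
the cleanup.h guard above wouldn't apply as-is to the killable variant):

	if (mmap_read_lock_killable(mm))
		return NULL;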

> 	vma = vma_lookup(mm, address);
> 	/* VMA isn't there. */
> 	if (!vma)
> 		return NULL;
>
> 	return vma_start_read_locked(vma);

Sorry this should be:

	if (!vma_start_read_locked(vma))
		return NULL;
	return vma;

But as per below, I'm not sure we will actually see vma_start_read_locked()
fail; it doesn't touch the write-side sequence number, so overflow shouldn't be
a problem.

> }
>
> (Untested, not even build tested code)
>
> Not sure if we can use the linux/cleanup.h guard here, because mmap_read_lock()
> also does some trace stuff, but the guard makes it WAY nicer (when it goes out
> of scope, the mmap read lock is dropped).
>
> Maybe we could add a custom mmap lock guard to cover that too?
>
> Suren - I wonder if vma_start_read_locked() actually needs to return a
> boolean? The failure cases for __refcount_inc_not_zero_limited_acquire() are
> detached or excluding readers on write/detach.
>
> But in both of those cases, vma_lookup() would surely not find the VMA, and
> since we're precluding writers (so no write lock, no detach possible) that means
> we should never hit them?
>
> Wonder if we should make it a void and add a VM_WARN_ON_ONCE()?
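
i.e. something like this (untested sketch - __vma_start_read_locked() here is a
hypothetical helper wrapping the current refcount bump, just to make the shape
concrete):

	static void vma_start_read_locked(struct vm_area_struct *vma)
	{
		mmap_assert_locked(vma->vm_mm);
		/*
		 * Holding mmap_lock for read precludes both the write
		 * lock and detach, so the refcount bump should never
		 * fail here:
		 */
		VM_WARN_ON_ONCE(!__vma_start_read_locked(vma));
	}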
>
>
> > +
> >  static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
> >  							    struct vma_iterator *vmi,
> >  							    unsigned long from_addr)
> > _
>
> Cheers, Lorenzo


Thread overview: 27+ messages
2026-04-29 18:19 [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
2026-04-29 18:19 ` [PATCH 1/6] mm: Make per-VMA locks available universally Dave Hansen
2026-05-08 10:12   ` David Hildenbrand (Arm)
2026-05-08 10:58     ` David Hildenbrand (Arm)
2026-05-08 16:55       ` Lorenzo Stoakes
2026-04-29 18:19 ` [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock Dave Hansen
2026-05-08 17:06   ` Lorenzo Stoakes
2026-04-29 18:19 ` [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers Dave Hansen
2026-05-08 17:26   ` Lorenzo Stoakes
2026-05-08 20:15     ` Lorenzo Stoakes [this message]
2026-04-29 18:20 ` [PATCH 4/6] binder: Remove mmap_lock fallback Dave Hansen
2026-05-08 17:29   ` Lorenzo Stoakes
2026-04-29 18:20 ` [PATCH 5/6] tcp: Remove mmap_lock fallback path Dave Hansen
2026-05-08 17:32   ` Lorenzo Stoakes
2026-04-29 18:20 ` [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path Dave Hansen
2026-05-04 23:15   ` Edgecombe, Rick P
2026-05-05 16:39     ` Dave Hansen
2026-05-08 20:39       ` Lorenzo Stoakes
2026-04-29 18:22 ` [PATCH 0/6] mm: Make per-VMA locks available in all builds Dave Hansen
2026-04-30  8:11   ` Lorenzo Stoakes
2026-04-30 17:17     ` Suren Baghdasaryan
2026-04-30 17:20       ` Dave Hansen
2026-04-30  7:55 ` [syzbot ci] " syzbot ci
2026-04-30 16:59   ` Dave Hansen
     [not found] ` <20260430072053.e0be1b431bcff02831f07e9d@linux-foundation.org>
2026-04-30 16:52   ` [PATCH 0/6] " Dave Hansen
2026-05-08 16:52 ` Lorenzo Stoakes
2026-05-08 17:01   ` Dave Hansen
