From: Lorenzo Stoakes <ljs@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-mm@kvack.org, Shakeel Butt <shakeel.butt@linux.dev>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers
Date: Fri, 8 May 2026 21:15:54 +0100
Message-ID: <af5C8bxEKEkQEjwV@lucifer>
In-Reply-To: <af4Zx0gJIWbdDeY2@lucifer>
On Fri, May 08, 2026 at 06:26:57PM +0100, Lorenzo Stoakes wrote:
> On Wed, Apr 29, 2026 at 11:19:59AM -0700, Dave Hansen wrote:
> >
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> >
> > == Background ==
> >
> > There are basically two parallel ways to look up a VMA: the
> > traditional way, which is protected by mmap_lock, and the per-VMA
> > lock way, which is based on RCU and refcounts.
> >
> > == Problems ==
> >
> > The mmap_lock one is more straightforward to use, but it has a big
> > disadvantage: it cannot be mixed with page faults, since those can
> > take mmap_lock for read.
> >
> > The RCU one can be mixed with faults, but it is not available in all
> > configs, so all RCU users need to be able to fall back to the
> > traditional way.
> >
> > == Solution ==
> >
> > Add a variant of the RCU-based lookup that waits for writers. This is
> > basically the same as the existing RCU-based lookup, but it also takes
> > mmap_lock for read and waits for writers to finish before returning
> > the VMA. This has two big advantages:
> >
> > 1. Callers do not need to have a fallback path for when they
> > collide with writers.
> > 2. It can be used in contexts where page faults can happen because
> > it can take the mmap_lock for read but never *holds* it.
> >
> > == Discussion ==
> >
> > I am not married to the naming here at all. Naming suggestions would
> > be much appreciated.
> >
> > This basically uses mmap_lock to wait for writers, nothing else. The
> > VMA is obviously stable under mmap_read_lock() and the code _can_
> > likely take advantage of that and possibly even remove the goto. For
> > instance, it could (probably) bump the VMA refcount and exclude future
> > writers. That would eliminate the goto.
> >
> > But the approach as-is is probably the smallest line count and
> > arguably the simplest approach. It is a good place to start a
> > conversation if nothing else.
> >
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Vlastimil Babka <vbabka@kernel.org>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: linux-mm@kvack.org
> > ---
> >
> > b/include/linux/mmap_lock.h | 2 ++
> > b/mm/mmap_lock.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 45 insertions(+)
> >
> > diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> > --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait 2026-04-29 11:18:50.633628887 -0700
> > +++ b/include/linux/mmap_lock.h 2026-04-29 11:18:50.707631737 -0700
> > @@ -470,6 +470,8 @@ static inline void vma_mark_detached(str
> >
> > struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > unsigned long address);
> > +struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
> > + unsigned long address);
> >
> > /*
> > * Locks next vma pointed by the iterator. Confirms the locked vma has not
> > diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> > --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait 2026-04-29 11:18:50.704631622 -0700
> > +++ b/mm/mmap_lock.c 2026-04-29 11:18:50.707631737 -0700
> > @@ -340,6 +340,49 @@ inval:
> > return NULL;
> > }
> >
> > +/*
> > + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> > + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> > + * 'address'.
> > + *
> > + * The fast path does not take mmap lock.
> > + */
> > +struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
> > + unsigned long address)
> > +{
> > + struct vm_area_struct *vma;
> > +
> > +retry:
> > + vma = lock_vma_under_rcu(mm, address);
> > + /* Fast path: return stable VMA covering 'address': */
> > + if (vma)
> > + return vma;
> > +
> > + /*
> > + * Slow path: the VMA covering 'address' is being modified,
> > + * or there is no VMA covering 'address'. Rule out the
> > + * possibility that the VMA is being modified:
> > + */
> > + mmap_read_lock(mm);
> > + vma = vma_lookup(mm, address);
> > + mmap_read_unlock(mm);
> > +
> > + /* There was for sure no VMA covering 'address': */
> > + if (!vma)
> > + return NULL;
> > +
> > + /*
> > + * VMA was likely being modified during RCU lookup. Try again.
> > + * mmap_read_lock() waited for the writer to complete and the
> > + * writer is now done.
> > + *
> > + * There is no guarantee that any single retry will succeed,
> > + * and it is possible but highly unlikely this will loop
> > + * forever.
> > + */
> > + goto retry;
> > +}
>
> Hmm yeah this is not ideal :)
>
> You don't have to do any of this; we already have logic to help out here:
> vma_start_read_locked().
>
> That uses the fact that the mmap read lock is held to pin the VMA lock: VMA
> write locks require the mmap write lock, and the mmap read lock prevents them.
>
> That way you can eliminate the retry.
>
> So instead:
>
> /*
> * Tries to lock under RCU; failing that, it acquires the VMA lock with the
> * mmap read lock held.
> */
> struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> unsigned long address)
> {
> struct vm_area_struct *vma;
>
> might_sleep();
>
> vma = lock_vma_under_rcu(mm, address);
> if (vma)
> return vma;
>
> /* Slow path: preclude VMA writers by getting mmap read lock. */
> guard(rwsem_read)(&mm->mmap_lock);
Actually, we'd possibly want the killable version of this, so a fatal signal
can interrupt the wait rather than it blocking indefinitely.
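A rough sketch of that (untested; the guard doesn't help with a conditional
lock, so this goes back to an explicit unlock, and the error handling is an
open question since returning NULL on a fatal signal conflates "killed" with
"no VMA"):

	/* Slow path: preclude VMA writers, but let a fatal signal interrupt. */
	if (mmap_read_lock_killable(mm))
		return NULL;
	vma = vma_lookup(mm, address);
	if (vma && !vma_start_read_locked(vma))
		vma = NULL;
	mmap_read_unlock(mm);
	return vma;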
> vma = vma_lookup(mm, address);
> /* VMA isn't there. */
> if (!vma)
> return NULL;
>
> return vma_start_read_locked(vma);
Sorry, this should be:
if (!vma_start_read_locked(vma))
return NULL;
return vma;
But as per below, I'm not sure we will actually see vma_start_read_locked()
fail; it doesn't touch the write-side sequence number, so overflow shouldn't be
a problem.
> }
>
> (Untested, not even build-tested code)
>
> Not sure if we can use the linux/cleanup.h guard here, because mmap_read_lock()
> also does some trace stuff, but the guard makes it WAY nicer (when it goes out
> of scope, the mmap read lock is dropped).
>
> Maybe we could add a custom mmap lock guard to cover that too?
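Something like this, perhaps (a sketch assuming the generic DEFINE_GUARD()
helper from linux/cleanup.h; untested):

	DEFINE_GUARD(mmap_read_lock, struct mm_struct *,
		     mmap_read_lock(_T), mmap_read_unlock(_T))

Then the slow path is just guard(mmap_read_lock)(mm); and the trace stuff is
kept, because the guard expands to the full mmap_read_lock() /
mmap_read_unlock() helpers rather than raw rwsem calls.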
>
> Suren - I wonder if vma_start_read_locked() actually needs to return a
> boolean? The failure cases for __refcount_inc_not_zero_limited_acquire() are:
> the VMA is detached, or it is excluding readers on write/detach.
>
> But in both of those cases, vma_lookup() would surely not find the VMA, and
> since we're precluding writers (so no write lock, no detach possible), we
> should never hit either failure case?
>
> Wonder if we should make it return void and add a VM_WARN_ON_ONCE()?
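If so, the slow path above would collapse to something like (untested):

	vma = vma_lookup(mm, address);
	if (!vma)
		return NULL;

	/* Can't fail: we hold the mmap read lock, so no writers, no detach. */
	vma_start_read_locked(vma);
	return vma;

with the VM_WARN_ON_ONCE() firing inside vma_start_read_locked() if the
refcount bump ever does fail.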
>
>
> > +
> > static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
> > struct vma_iterator *vmi,
> > unsigned long from_addr)
> > _
>
> Cheers, Lorenzo