From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>, linux-kernel@vger.kernel.org
Cc: "Alice Ryhl" <aliceryhl@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Carlos Llamas" <cmllamas@google.com>,
	"Christian Brauner" <christian@brauner.io>,
	"David Ahern" <dsahern@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	linux-mm@kvack.org, "Lorenzo Stoakes" <ljs@kernel.org>,
	netdev@vger.kernel.org, "Shakeel Butt" <shakeel.butt@linux.dev>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Todd Kjos" <tkjos@android.com>
Subject: Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
Date: Fri, 12 Jun 2026 20:00:25 +0200
Message-ID: <5b27d22d-a3fd-4267-b63b-5de9e6586d5e@kernel.org>
In-Reply-To: <20260610230415.C0521C88@davehans-spike.ostc.intel.com>

On 6/11/26 01:04, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> == Background ==
> 
> There are basically two parallel ways to look up a VMA: the
> traditional way, which is protected by mmap_lock, and the RCU-based
> per-VMA lock way which is based on RCU and refcounts.
> 
> == Problem ==
> 
> The mmap_lock one is more straightforward to use but it has a big
> disadvantage in that it can not be mixed with page faults since those
> can take mmap_lock for read, which can deadlock when mixed with page
> faults. For example:

... mixed with nested page faults and parallel writers, perhaps?

> 
> 	mmap_read_lock(mm);
> 	// Another thread does mmap_write_lock().
> 	// New mmap_lock readers are blocked.
> 	vma = vma_lookup(mm, address);
> 	// This deadlocks on mmap_read_lock() if it faults:
> 	copy_from_user(address);
> 	mmap_read_unlock(mm);
> 
> The RCU one can be mixed with faults, but it is not available in all

I'd stick to the term "per-VMA lock" rather than "RCU one".

> configs, so all RCU users need to be able to fall back to the
> traditional way.

This statement is now obsolete, as patch 1 makes them available everywhere?
But the problem remains that they can fail, and simply falling back to the
mmap read lock has the deadlock issue shown above?

> == Solution ==
> 
> Add a variant of the RCU-based lookup that waits for writers. This is
> basically the same as the existing RCU-based lookup, but it also takes
> mmap_lock for read and waits for writers to finish before returning
> the VMA. This has some advantages:

I would stress the part that the mmap lock is taken *only temporarily* to
wait for the writers and ensure we obtain a per-vma read lock, and then
dropped again? As that's the main trick IIUC.

>  1. Callers do not need to have a fallback path for when they
>     collide with writers.
>  2. It can be used in contexts where page faults can happen because
>     it can take the mmap_lock for read but never *holds* it.
>  3. Its fast path does not require taking mmap_lock for read.
> 
> Basically, when applied correctly, this approach results in faster
> *and* simpler code.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Cc: Todd Kjos <tkjos@android.com>
> Cc: Christian Brauner <christian@brauner.io>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: netdev@vger.kernel.org
> 
> --
> 
> Changes from v1:
>  * Add a comment explaining that this can not be mixed with other
>    per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
>  * Add a FIXME about making the mmap_read_lock() killable

I don't see it anywhere?

>  * Add more changelog bits about the possibility for an infinite goto
>    loop.
>  * Adopt vma_start_read_unlocked() implementation from Lorenzo
> ---
> 
>  b/include/linux/mmap_lock.h |    3 +++
>  b/mm/mmap_lock.c            |   27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-06-10 15:57:55.828431712 -0700
> +++ b/include/linux/mmap_lock.h	2026-06-10 15:57:55.834431925 -0700
> @@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
>  	return vma_start_read_locked_nested(vma, 0);
>  }
>  
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address);
> +
>  static inline void vma_end_read(struct vm_area_struct *vma)
>  {
>  	vma_refcount_put(vma);
> diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-06-10 15:57:55.831431819 -0700
> +++ b/mm/mmap_lock.c	2026-06-10 16:02:50.723860779 -0700
> @@ -338,6 +338,33 @@ inval:
>  	return NULL;
>  }
>  
> +/*
> + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> + * 'address'.
> + *
> + * Use only in code paths where no mmap_lock and no VMA lock is held.

I think we have various asserts that could be used and are stronger than a
comment ;)

> + *
> + * The fast path does not take mmap_lock.
> + */
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address)
> +{
> +	struct vm_area_struct *vma;
> +
> +	/* Fast path: return stable VMA covering 'address': */
> +	vma = lock_vma_under_rcu(mm, address);
> +	if (vma)
> +		return vma;
> +
> +	/* Slow path: preclude VMA writers by getting mmap read lock. */

Again I would say "temporarily".

> +	guard(rwsem_read)(&mm->mmap_lock);

Aside from the missing vma_lookup(), I'm not sure we should trust the
result of the lookup blindly? Shouldn't we also verify that we found a vma?
Some callers can't fail the lookup because they only look up something
that's sure to be present, but others might?

> +	if (!vma_start_read_locked(vma))
> +		return NULL;

You can count me on the side that would rather see explicit operations than
the guard. Exactly because it's a subtle usage of the mmap sem, and yeah
also the tracing that Suren pointed out.

Seems to me uffd_lock_vma() mostly does all this right (but also does more
stuff that we don't want to do). I'm just not sure right now when
vma_start_read_locked() failures can happen in practice here (can it be only
refcount overflow or also refcount being zero? hopefully not zero if we
found the vma under mmap lock for read? comment in
lock_next_vma_under_mmap_lock() seems to hint at that) and what to do about
them. We seem to have some unhelpfully stale comments around.

- uffd_lock_vma() doesn't document that it can return -EAGAIN. (and should
the caller then retry or what?)

- vma_start_read_locked() has a comment saying how it cannot fail, but it in
fact can.
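
To make the explicit variant concrete, here is a minimal user-space sketch. It
is not kernel code and not the patch's implementation: a pthread rwlock stands
in for mmap_lock, an atomic counter stands in for the per-VMA read lock, and
every model_* name is made up for illustration. The slow path uses explicit
lock/unlock calls instead of guard(), redoes the lookup, tolerates a NULL
result, and drops the lock before returning. The fast-path stand-in is
deliberately naive and not race-free, and the model does not cover
vma_start_read_locked() failures, which remain the open question above.

```c
#define _POSIX_C_SOURCE 200809L
/* User-space model only. All model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>	/* nanosleep(), for drivers that hold the writer */

struct model_vma {
	unsigned long start, end;	/* covers [start, end) */
	atomic_int readers;		/* models the per-VMA read lock count */
	atomic_bool write_locked;	/* set only under mmap_lock held for write */
};

struct model_mm {
	pthread_rwlock_t mmap_lock;
	struct model_vma *vmas;
	size_t nr_vmas;
};

struct model_req {	/* argument block for model_reader() */
	struct model_mm *mm;
	unsigned long addr;
	struct model_vma *result;
};

static struct model_vma *model_vma_lookup(struct model_mm *mm,
					  unsigned long addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Fast path stand-in: gives up (NULL) when a writer holds the VMA. */
static struct model_vma *model_lock_vma_fast(struct model_mm *mm,
					     unsigned long addr)
{
	struct model_vma *vma = model_vma_lookup(mm, addr);

	if (!vma || atomic_load(&vma->write_locked))
		return NULL;
	atomic_fetch_add(&vma->readers, 1);
	return vma;
}

static struct model_vma *model_start_read_unlocked(struct model_mm *mm,
						   unsigned long addr)
{
	struct model_vma *vma = model_lock_vma_fast(mm, addr);

	if (vma)
		return vma;

	/* Slow path: hold mmap_lock for read only temporarily. */
	pthread_rwlock_rdlock(&mm->mmap_lock);	/* waits for the writer */
	vma = model_vma_lookup(mm, addr);	/* redo the lookup, may be NULL */
	if (vma)
		atomic_fetch_add(&vma->readers, 1);
	pthread_rwlock_unlock(&mm->mmap_lock);	/* explicit drop, no guard() */

	return vma;
}

static void model_end_read(struct model_vma *vma)
{
	int old = atomic_fetch_sub(&vma->readers, 1);

	assert(old > 0);
	(void)old;
}

/* Writer side: a VMA is write-locked only while mmap_lock is held for write. */
static void model_write_lock_vma(struct model_mm *mm, struct model_vma *vma)
{
	pthread_rwlock_wrlock(&mm->mmap_lock);
	atomic_store(&vma->write_locked, true);
}

static void model_write_unlock_vma(struct model_mm *mm, struct model_vma *vma)
{
	atomic_store(&vma->write_locked, false);
	pthread_rwlock_unlock(&mm->mmap_lock);
}

/* Thread entry point so a reader can run concurrently with a writer. */
static void *model_reader(void *arg)
{
	struct model_req *req = arg;

	req->result = model_start_read_unlocked(req->mm, req->addr);
	return NULL;
}
```

In a small driver, a writer that calls model_write_lock_vma(), sleeps briefly,
and then calls model_write_unlock_vma() should make a concurrent
model_reader() thread miss the fast path, block in the slow path, and still
come back with the VMA read-locked and the rwlock released.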

> +
> +	return vma;
> +}
> +
>  static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
>  							    struct vma_iterator *vmi,
>  							    unsigned long from_addr)
> _

