From: Dave Hansen <dave.hansen@intel.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Alice Ryhl <aliceryhl@google.com>
Cc: "Dave Hansen" <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Carlos Llamas" <cmllamas@google.com>,
	"Christian Brauner" <christian@brauner.io>,
	"David Ahern" <dsahern@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	linux-mm@kvack.org, "Lorenzo Stoakes" <ljs@kernel.org>,
	netdev@vger.kernel.org, "Shakeel Butt" <shakeel.butt@linux.dev>,
	"Todd Kjos" <tkjos@android.com>
Subject: Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
Date: Fri, 12 Jun 2026 09:04:48 -0700
Message-ID: <876077cc-db6a-4517-9d81-cdfbb43e599e@intel.com>
In-Reply-To: <131c6a49-9177-418b-a653-8f13942fb8d3@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On 6/12/26 08:41, Vlastimil Babka (SUSE) wrote:
>>> If the vma lookup fails because the mmap write lock is held, but the vma
>>> actually exists (has not been unmapped), then this code might "successfully"
>>> remove the page without invoking zap_vma_range(). This means that the
>>> page does not actually get freed and will just hang around forever until
>>> the process owning the vma exits or Binder needs this page and maps a
>>> new page on top of the page.
>> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
>> jump to err_mmap_read_lock_failed, like we currently do if
>> mmap_read_trylock() fails.
> I don't think that will be enough either, as the current code AFAICS does
> something meaningful when mmap_read_trylock() succeeds but vma_lookup() returns
> NULL because there's no vma at that address. Now we would just assume the
> trylock failed even if the reason was that vma lookup found nothing for the
> address. The problem is that lock_vma_under_rcu() can't distinguish those
> two outcomes, so we would need something that does?

I spent way too much time staring at this yesterday.

I think the key to distinguishing between:

	vma==NULL because there's no VMA
and
	vma==NULL because of a trylock failure

is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
finishes. vm_ops->close() shouldn't be able to happen while the VMA read
lock taken by lock_vma_under_rcu() is held. So if you've got a non-NULL
VMA, you've also got a stable binder_alloc_is_mapped().

So, if you've got a vma!=NULL *and* binder_alloc_is_mapped()==true, I
think you can be pretty sure you've got the right VMA.

If you have vma==NULL and binder_alloc_is_mapped()==true, you can be
pretty sure that you hit some kind of transient lock_vma_under_rcu()
failure.
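
For what it's worth, here is a rough sketch of that reasoning as a tiny
standalone C function. The names (enum vma_verdict, classify_lookup())
are made up for illustration and do not exist in binder_alloc.c. It only
models the two booleans, not the locking, and the two !mapped cases are
taken from the table in the attached patch description:

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not part of binder_alloc.c. */
enum vma_verdict {
	VERDICT_NO_VMA,		/* !vma && !mapped: nothing left to zap */
	VERDICT_RIGHT_VMA,	/*  vma &&  mapped: zap through this VMA */
	VERDICT_WRONG_VMA,	/*  vma && !mapped: VMA belongs to something else */
	VERDICT_LOCK_RACE,	/* !vma &&  mapped: transient lookup failure */
};

/*
 * have_vma: lock_vma_under_rcu() returned non-NULL
 * mapped:   binder_alloc_is_mapped() returned true
 */
static enum vma_verdict classify_lookup(bool have_vma, bool mapped)
{
	if (have_vma)
		return mapped ? VERDICT_RIGHT_VMA : VERDICT_WRONG_VMA;

	return mapped ? VERDICT_LOCK_RACE : VERDICT_NO_VMA;
}
```

The lookup result alone cannot tell VERDICT_LOCK_RACE apart from
VERDICT_NO_VMA. The 'mapped' check is what separates them.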

I came up with the attached patch. More eyeballs would be welcome.
There's a _lot_ going on here.

[-- Attachment #2: binder-try-vma-lock.patch --]
[-- Type: text/x-patch, Size: 5890 bytes --]


From: Dave Hansen <dave.hansen@linux.intel.com>

tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().

Long Version:

== Background ==

Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but left mmap_read_trylock() as a
fallback.

This fallback was required when per-VMA locking could fail
persistently, but that is no longer the case.

== Problem ==

The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().

The only remedy for a collision with mmap_write_lock() is to wait,
which this code cannot do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is likely
to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.

== Solution ==

Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.

The one wrinkle is that lock_vma_under_rcu() can return NULL even if
there is a VMA at 'page_addr'. But the later page zapping code
*must* run if the page might be mapped into a VMA. Stop relying on
vma_lookup() for this. Just rely on binder_alloc_is_mapped().

== Discussion ==

I think there end up being four possible cases to handle. The first
two are straightforward. Note that "mapped" is shorthand for
binder_alloc_is_mapped().

        !vma && !mapped: reclaim, no zap
         vma &&  mapped: reclaim, with zap

The next one is arguably wrong in the code today:

         vma && !mapped: Wrong VMA. Skip and retry.

It induces LRU_SKIP behavior from another VMA getting mapped. That
seems wrong. It is possible to continue this behavior, but it also
seems a bit silly to go to any lengths to keep doing it if it is
a bug.

The last case comes from normal lock_vma_under_rcu() behavior, like
overflows, that is transient. It can _safely_ be handled by
LRU_SKIP. This case is new.

        !vma &&  mapped: VMA lock race. Skip and retry.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org

---

Changes from v1:
 * Move forward even if 'vma' is NULL in binder_alloc_free_page().
   This can happen if the VMA is unmapped (Sashiko).
 * Rename goto label to be more accurate for new lock scheme

Changes from v2:
 * Remove review tags. There's too much churn in here to keep them.
 * Rely on binder_alloc_is_mapped() instead of VMA lookups alone to
   determine if the range must be zapped.

---

 b/drivers/android/binder_alloc.c |   42 ++++++++++++++-------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock	2026-06-10 15:57:55.274412018 -0700
+++ b/drivers/android/binder_alloc.c	2026-06-11 15:17:25.240473010 -0700
@@ -1142,7 +1142,7 @@ enum lru_status binder_alloc_free_page(s
 	struct vm_area_struct *vma;
 	struct page *page_to_free;
 	unsigned long page_addr;
-	int mm_locked = 0;
+	bool mapped;
 	size_t index;
 
 	if (!mmget_not_zero(mm))
@@ -1151,26 +1151,21 @@ enum lru_status binder_alloc_free_page(s
 	index = mdata->page_index;
 	page_addr = alloc->vm_start + index * PAGE_SIZE;
 
-	/* attempt per-vma lock first */
 	vma = lock_vma_under_rcu(mm, page_addr);
-	if (!vma) {
-		/* fall back to mmap_lock */
-		if (!mmap_read_trylock(mm))
-			goto err_mmap_read_lock_failed;
-		mm_locked = 1;
-		vma = vma_lookup(mm, page_addr);
-	}
 
 	if (!mutex_trylock(&alloc->mutex))
-		goto err_get_alloc_mutex_failed;
+		goto err_vma_end_read;
 
 	/*
-	 * Since a binder_alloc can only be mapped once, we ensure
-	 * the vma corresponds to this mapping by checking whether
-	 * the binder_alloc is still mapped.
+	 * mapped==true means a VMA should be present. Any
+	 * inconsistency should be transient.  Skip the page
+	 * and try again later.
 	 */
-	if (vma && !binder_alloc_is_mapped(alloc))
-		goto err_invalid_vma;
+	mapped = binder_alloc_is_mapped(alloc);
+	if (!vma && mapped)
+		goto err_vma_inconsistent;
+
+	/* mapped==true now implies a valid 'vma' */
 
 	trace_binder_unmap_kernel_start(alloc, index);
 
@@ -1182,32 +1177,27 @@ enum lru_status binder_alloc_free_page(s
 	list_lru_isolate(lru, item);
 	spin_unlock(&lru->lock);
 
-	if (vma) {
+	if (mapped) {
 		trace_binder_unmap_user_start(alloc, index);
 
 		zap_vma_range(vma, page_addr, PAGE_SIZE);
 
 		trace_binder_unmap_user_end(alloc, index);
+
+		vma_end_read(vma);
 	}
 
 	mutex_unlock(&alloc->mutex);
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
-		vma_end_read(vma);
 	mmput_async(mm);
 	binder_free_page(page_to_free);
 
 	return LRU_REMOVED_RETRY;
 
-err_invalid_vma:
+err_vma_inconsistent:
 	mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
-	if (mm_locked)
-		mmap_read_unlock(mm);
-	else
+err_vma_end_read:
+	if (vma)
 		vma_end_read(vma);
-err_mmap_read_lock_failed:
 	mmput_async(mm);
 err_mmget:
 	return LRU_SKIP;
_