From: Dave Hansen <dave.hansen@intel.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Alice Ryhl <aliceryhl@google.com>
Cc: "Dave Hansen" <dave.hansen@linux.intel.com>,
linux-kernel@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Arve Hjønnevåg" <arve@android.com>,
"Carlos Llamas" <cmllamas@google.com>,
"Christian Brauner" <christian@brauner.io>,
"David Ahern" <dsahern@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-mm@kvack.org, "Lorenzo Stoakes" <ljs@kernel.org>,
netdev@vger.kernel.org, "Shakeel Butt" <shakeel.butt@linux.dev>,
"Todd Kjos" <tkjos@android.com>
Subject: Re: [PATCH v2 2/5] binder: Make shrinker rely solely on per-VMA lock
Date: Fri, 12 Jun 2026 09:04:48 -0700
Message-ID: <876077cc-db6a-4517-9d81-cdfbb43e599e@intel.com>
In-Reply-To: <131c6a49-9177-418b-a653-8f13942fb8d3@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]
On 6/12/26 08:41, Vlastimil Babka (SUSE) wrote:
>>> If the vma lookup fails because the mmap write lock is held, but the vma
>>> actually exists (has not been unmapped), then this code might "successfully"
>>> remove the page without invoking zap_vma_range(). This means that the
>>> page does not actually get freed and will just hang around forever until
>>> the process owning the vma exits or Binder needs this page and maps a
>>> new page on top of the page.
>> Yeah, I think if lock_vma_under_rcu() returns NULL you just need to
>> jump to err_mmap_read_lock_failed, like we currently do if
>> mmap_read_trylock() fails.
> I don't think that will be enough as well, as the current code AFAICS does
> something meaningful when mmap_read_trylock() succeeds but vma_lookup returns
> NULL because there's no vma at that address. Now we would just assume the
> trylock failed even if the reason was that vma lookup found nothing for the
> address. The problem is that lock_vma_under_rcu() can't distinguish those
> two outcomes, so we would need something that does?
I spent way too much time staring at this yesterday.
I think the key to distinguishing between:
	vma==NULL because there's no VMA
and
	vma==NULL because of a trylock failure
is binder_alloc_is_mapped(). It won't return false until vm_ops->close()
finishes. vm_ops->close() shouldn't be able to happen while the read
lock taken by lock_vma_under_rcu() is held. So if you've got a non-NULL
VMA, you've also got a stable binder_alloc_is_mapped().
So, if you've got a vma!=NULL *and* binder_alloc_is_mapped()==true, I
think you can be pretty sure you've got the right VMA.
If you have vma==NULL and binder_alloc_is_mapped()==true, you can be
pretty sure that you hit some kind of transient lock_vma_under_rcu()
failure.
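To make that reasoning concrete, here is a rough userspace sketch of how
the two inputs combine. It is not part of the attached patch. The enum,
the function names and the self-check are all made up for illustration;
only the (vma, mapped) pairing comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none exist in the kernel. */
enum vma_verdict {
	VERDICT_NO_VMA,      /* lookup failed, nothing mapped */
	VERDICT_RIGHT_VMA,   /* lookup succeeded, binder_alloc still mapped */
	VERDICT_TRANSIENT,   /* lookup failed, but the mapping still exists */
	VERDICT_OTHER_VMA,   /* lookup succeeded, binder_alloc not mapped */
};

/*
 * have_vma: stands in for "lock_vma_under_rcu() returned non-NULL".
 * mapped:   stands in for "binder_alloc_is_mapped() returned true".
 */
enum vma_verdict classify(bool have_vma, bool mapped)
{
	if (have_vma)
		return mapped ? VERDICT_RIGHT_VMA : VERDICT_OTHER_VMA;
	return mapped ? VERDICT_TRANSIENT : VERDICT_NO_VMA;
}

/* Only VERDICT_TRANSIENT means "the lookup result cannot be trusted". */
bool lookup_is_untrustworthy(bool have_vma, bool mapped)
{
	return classify(have_vma, mapped) == VERDICT_TRANSIENT;
}

/* Small self-check that walks all four input combinations. */
void classify_self_check(void)
{
	assert(classify(false, false) == VERDICT_NO_VMA);
	assert(classify(true, true) == VERDICT_RIGHT_VMA);
	assert(classify(false, true) == VERDICT_TRANSIENT);
	assert(classify(true, false) == VERDICT_OTHER_VMA);
}
```

The point of the sketch is that 'mapped' is the tie-breaker: a NULL
lookup result is ambiguous by itself and only becomes meaningful once
it is paired with binder_alloc_is_mapped().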
I came up with the attached patch. More eyeballs would be welcome.
There's a _lot_ going on here.
[-- Attachment #2: binder-try-vma-lock.patch --]
[-- Type: text/x-patch, Size: 5890 bytes --]
From: Dave Hansen <dave.hansen@linux.intel.com>
tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both
it and mmap_read_trylock().
Long Version:
== Background ==
Historically, binder used an mmap_read_trylock() in its shrinker code.
This ensures that reclaim is not blocked on an mmap_lock. Commit
95bc2d4a9020 ("binder: use per-vma lock in page reclaiming") added
support for the per-VMA lock, but left mmap_read_trylock() as a
fallback.
This fallback was required when per-VMA locking could fail
persistently, but that is no longer the case.
== Problem ==
The fallback is not worth the complexity here. lock_vma_under_rcu() is
essentially already a non-blocking trylock. The main reason it fails
is also the reason mmap_read_trylock() fails: something is holding
mmap_write_lock().
The only remedy for a collision with mmap_write_lock() is to wait,
which this code can not do. So the "fallback" after
lock_vma_under_rcu() failure is not really a fallback: it is really
likely to just be retrying in vain. That retry in and of itself isn't
horrible. But it adds complexity.
== Solution ==
Now that per-VMA locks are universally available, lock_vma_under_rcu()
will not persistently fail. Rely on it alone and simplify the code.
The one wrinkle is that lock_vma_under_rcu() can return NULL even if
there is a VMA at 'page_addr'. But the later page zapping code
*must* run if the page might be mapped into a VMA. Stop relying on
vma_lookup() for this. Just rely on binder_alloc_is_mapped().
== Discussion ==
I think there end up being four possible cases to handle. The first
two are straightforward. Note that "mapped" is shorthand for
binder_alloc_is_mapped().
	!vma && !mapped: reclaim, no zap
	 vma &&  mapped: reclaim, with zap
The next one is arguably wrong in the code today:
	 vma && !mapped: Wrong VMA. Skip and retry.
It induces LRU_SKIP behavior from another VMA getting mapped. That
seems wrong. It is possible to continue this behavior, but it also
seems a bit silly to go to any lengths to keep doing it if it is
a bug.
The last case comes from normal lock_vma_under_rcu() behavior, like
overflows, that is transient. It can _safely_ be handled by
LRU_SKIP. This case is new.
	!vma &&  mapped: VMA lock race. Skip and retry.
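
The four cases above can be exercised against a small userspace model
of the control flow. This is only a sketch: every name in it is
hypothetical, the locks are plain counters, and it assumes that the
"vma && !mapped" case is reclaimed without a zap rather than skipped,
which is one possible reading of the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name here is hypothetical. */
enum model_status { MODEL_SKIP, MODEL_REMOVED };

struct model_result {
	enum model_status status;
	bool zapped;         /* was the user mapping torn down? */
	int vma_locks_held;  /* per-VMA read locks still held on return */
};

struct model_result model_free_page(bool have_vma, bool mapped)
{
	struct model_result r = { MODEL_SKIP, false, 0 };

	/* lock_vma_under_rcu() analogue: success leaves a read lock held. */
	if (have_vma)
		r.vma_locks_held++;

	/* !vma && mapped: transient lookup failure. Skip and retry later. */
	if (!have_vma && mapped)
		goto out_unlock;

	/* From here on, mapped implies a locked VMA, so a zap is safe. */
	if (mapped)
		r.zapped = true;

	/* ASSUMPTION: vma && !mapped is reclaimed without a zap. */
	r.status = MODEL_REMOVED;

out_unlock:
	/* vma_end_read() analogue: drop the lock on every path that took it. */
	if (have_vma)
		r.vma_locks_held--;
	return r;
}

/* Every combination must return with no per-VMA lock still held. */
void model_self_check(void)
{
	for (int v = 0; v < 2; v++)
		for (int m = 0; m < 2; m++)
			assert(model_free_page(v, m).vma_locks_held == 0);
}
```

The invariant worth checking in a model like this is lock balance: a
non-NULL lookup result always carries a read lock, so it has to be
released whether or not the page ends up being zapped.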
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org
---
Changes from v1:
* Move forward even if 'vma' is NULL in binder_alloc_free_page().
This can happen if the VMA is unmapped (Sashiko).
* Rename goto label to be more accurate for new lock scheme
Changes from v2:
* Remove review tags. There's too much churn in here to keep them.
* Rely on binder_alloc_is_mapped() instead of VMA lookups alone to
determine if the range must be zapped.
---
b/drivers/android/binder_alloc.c | 42 ++++++++++++++-------------------------
1 file changed, 16 insertions(+), 26 deletions(-)
diff -puN drivers/android/binder_alloc.c~binder-try-vma-lock drivers/android/binder_alloc.c
--- a/drivers/android/binder_alloc.c~binder-try-vma-lock 2026-06-10 15:57:55.274412018 -0700
+++ b/drivers/android/binder_alloc.c 2026-06-11 15:17:25.240473010 -0700
@@ -1142,7 +1142,7 @@ enum lru_status binder_alloc_free_page(s
struct vm_area_struct *vma;
struct page *page_to_free;
unsigned long page_addr;
- int mm_locked = 0;
+ bool mapped;
size_t index;
if (!mmget_not_zero(mm))
@@ -1151,26 +1151,21 @@ enum lru_status binder_alloc_free_page(s
index = mdata->page_index;
page_addr = alloc->vm_start + index * PAGE_SIZE;
- /* attempt per-vma lock first */
vma = lock_vma_under_rcu(mm, page_addr);
- if (!vma) {
- /* fall back to mmap_lock */
- if (!mmap_read_trylock(mm))
- goto err_mmap_read_lock_failed;
- mm_locked = 1;
- vma = vma_lookup(mm, page_addr);
- }
if (!mutex_trylock(&alloc->mutex))
- goto err_get_alloc_mutex_failed;
+ goto err_vma_end_read;
/*
- * Since a binder_alloc can only be mapped once, we ensure
- * the vma corresponds to this mapping by checking whether
- * the binder_alloc is still mapped.
+ * mapped==true means a VMA should be present. Any
+ * inconsistency should be transient. Skip the page
+ * and try again later.
*/
- if (vma && !binder_alloc_is_mapped(alloc))
- goto err_invalid_vma;
+ mapped = binder_alloc_is_mapped(alloc);
+ if (!vma && mapped)
+ goto err_vma_inconsistent;
+
+ /* mapped==true now implies a valid 'vma' */
trace_binder_unmap_kernel_start(alloc, index);
@@ -1182,32 +1177,27 @@ enum lru_status binder_alloc_free_page(s
list_lru_isolate(lru, item);
spin_unlock(&lru->lock);
- if (vma) {
+ if (mapped) {
trace_binder_unmap_user_start(alloc, index);
zap_vma_range(vma, page_addr, PAGE_SIZE);
trace_binder_unmap_user_end(alloc, index);
}
+ if (vma)
+ vma_end_read(vma);
mutex_unlock(&alloc->mutex);
- if (mm_locked)
- mmap_read_unlock(mm);
- else
- vma_end_read(vma);
mmput_async(mm);
binder_free_page(page_to_free);
return LRU_REMOVED_RETRY;
-err_invalid_vma:
+err_vma_inconsistent:
mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
- if (mm_locked)
- mmap_read_unlock(mm);
- else
+err_vma_end_read:
+ if (vma)
vma_end_read(vma);
-err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
return LRU_SKIP;
_