* [PATCH v2 0/8] binder: faster page installations
@ 2024-11-07 4:02 Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, kernel-team, Carlos Llamas, Alice Ryhl,
David Hildenbrand, Liam R. Howlett, Matthew Wilcox, Minchan Kim,
Nhat Pham, Johannes Weiner, Suren Baghdasaryan, Todd Kjos,
Viktor Martensson, Hillf Danton, Lorenzo Stoakes
The main focus of these patches is to improve the performance of binder
page installations, primarily by reducing contention on the mmap_lock.
The idea is to allow concurrent page insertion by leveraging per-vma
locking and get_user_pages_remote().
Unfortunately, this required reverting the alloc->lock spinlock back
to a mutex in order to serialize against the shrinker, at least until
a better solution is found, e.g. supporting page zapping under a
spinlock. The trade-off is still well worth it though.
Other patches are also included that remove unsafe and redundant things
such as the alloc->vma pointer or the struct binder_lru_page concept.
Note: I'll work on setting up a page fault handler for binder next.
I believe an idea from Alice Ryhl to defer the page insertions will
finally make this feasible. I only need to figure out a few performance
bits, but if/when that is done, most of the manual page insertion code
in binder could be dropped. :)
Changelog:
v2:
* fix locking order when upgrading from vma lock to mmap lock
* switch folio_walk_start() for get_user_pages_remote()
* release vma/mmap locks and mmput() right after vm_insert_page()
* add binder_page_alloc() helper for binder_install_single_page()
v1:
https://lore.kernel.org/all/20241105200258.2380168-1-cmllamas@google.com/
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Viktor Martensson <vmartensson@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Carlos Llamas (8):
Revert "binder: switch alloc->mutex to spinlock_t"
binder: concurrent page installation
binder: select correct nid for pages in LRU
binder: remove struct binder_lru_page
binder: use alloc->mapped to save the vma state
binder: remove cached alloc->vma pointer
binder: rename alloc->buffer to vm_start
binder: use per-vma lock in page installation
drivers/android/binder.c | 2 +-
drivers/android/binder_alloc.c | 322 ++++++++++++++----------
drivers/android/binder_alloc.h | 35 +--
drivers/android/binder_alloc_selftest.c | 18 +-
drivers/android/binder_trace.h | 2 +-
5 files changed, 212 insertions(+), 167 deletions(-)
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t"
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 8:56 ` Greg Kroah-Hartman
2024-11-07 4:02 ` [PATCH v2 2/8] binder: concurrent page installation Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Mukesh Ojha
This reverts commit 7710e2cca32e7f3958480e8bd44f50e29d0c2509.
In preparation for concurrent page installations, restore the original
alloc->mutex which will serialize zap_page_range_single() against page
installations in subsequent patches (instead of the mmap_sem).
Cc: Mukesh Ojha <quic_mojha@quicinc.com>
[cmllamas: fix trivial conflict due to 2c10a20f5e84a]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 46 +++++++++++++++++-----------------
drivers/android/binder_alloc.h | 10 ++++----
2 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index b3acbc4174fb..7241bf4a3ff2 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -169,9 +169,9 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
{
struct binder_buffer *buffer;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
buffer = binder_alloc_prepare_to_free_locked(alloc, user_ptr);
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
return buffer;
}
@@ -597,10 +597,10 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
if (!next)
return ERR_PTR(-ENOMEM);
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
if (IS_ERR(buffer)) {
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
goto out;
}
@@ -608,7 +608,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
buffer->offsets_size = offsets_size;
buffer->extra_buffers_size = extra_buffers_size;
buffer->pid = current->tgid;
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
ret = binder_install_buffer_pages(alloc, buffer, size);
if (ret) {
@@ -785,17 +785,17 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
* We could eliminate the call to binder_alloc_clear_buf()
* from binder_alloc_deferred_release() by moving this to
* binder_free_buf_locked(). However, that could
- * increase contention for the alloc->lock if clear_on_free
- * is used frequently for large buffers. This lock is not
+ * increase contention for the alloc mutex if clear_on_free
+ * is used frequently for large buffers. The mutex is not
* needed for correctness here.
*/
if (buffer->clear_on_free) {
binder_alloc_clear_buf(alloc, buffer);
buffer->clear_on_free = false;
}
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
binder_free_buf_locked(alloc, buffer);
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
}
/**
@@ -893,7 +893,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
struct binder_buffer *buffer;
buffers = 0;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
BUG_ON(alloc->vma);
while ((n = rb_first(&alloc->allocated_buffers))) {
@@ -940,7 +940,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
page_count++;
}
}
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
kvfree(alloc->pages);
if (alloc->mm)
mmdrop(alloc->mm);
@@ -964,7 +964,7 @@ void binder_alloc_print_allocated(struct seq_file *m,
struct binder_buffer *buffer;
struct rb_node *n;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
for (n = rb_first(&alloc->allocated_buffers); n; n = rb_next(n)) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
seq_printf(m, " buffer %d: %lx size %zd:%zd:%zd %s\n",
@@ -974,7 +974,7 @@ void binder_alloc_print_allocated(struct seq_file *m,
buffer->extra_buffers_size,
buffer->transaction ? "active" : "delivered");
}
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
}
/**
@@ -991,7 +991,7 @@ void binder_alloc_print_pages(struct seq_file *m,
int lru = 0;
int free = 0;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
/*
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
@@ -1007,7 +1007,7 @@ void binder_alloc_print_pages(struct seq_file *m,
lru++;
}
}
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
}
@@ -1023,10 +1023,10 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
struct rb_node *n;
int count = 0;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n))
count++;
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
return count;
}
@@ -1071,8 +1071,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- if (!spin_trylock(&alloc->lock))
- goto err_get_alloc_lock_failed;
+ if (!mutex_trylock(&alloc->mutex))
+ goto err_get_alloc_mutex_failed;
if (!page->page_ptr)
goto err_page_already_freed;
@@ -1091,7 +1091,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_kernel_end(alloc, index);
list_lru_isolate(lru, item);
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
spin_unlock(lock);
if (vma) {
@@ -1111,8 +1111,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
err_invalid_vma:
err_page_already_freed:
- spin_unlock(&alloc->lock);
-err_get_alloc_lock_failed:
+ mutex_unlock(&alloc->mutex);
+err_get_alloc_mutex_failed:
mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
@@ -1147,7 +1147,7 @@ void binder_alloc_init(struct binder_alloc *alloc)
alloc->pid = current->group_leader->pid;
alloc->mm = current->mm;
mmgrab(alloc->mm);
- spin_lock_init(&alloc->lock);
+ mutex_init(&alloc->mutex);
INIT_LIST_HEAD(&alloc->buffers);
}
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 70387234477e..a5181916942e 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -9,7 +9,7 @@
#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/mm.h>
-#include <linux/spinlock.h>
+#include <linux/rtmutex.h>
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/list_lru.h>
@@ -72,7 +72,7 @@ struct binder_lru_page {
/**
* struct binder_alloc - per-binder proc state for binder allocator
- * @lock: protects binder_alloc fields
+ * @mutex: protects binder_alloc fields
* @vma: vm_area_struct passed to mmap_handler
* (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
@@ -96,7 +96,7 @@ struct binder_lru_page {
* struct binder_buffer objects used to track the user buffers
*/
struct binder_alloc {
- spinlock_t lock;
+ struct mutex mutex;
struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long buffer;
@@ -153,9 +153,9 @@ binder_alloc_get_free_async_space(struct binder_alloc *alloc)
{
size_t free_async_space;
- spin_lock(&alloc->lock);
+ mutex_lock(&alloc->mutex);
free_async_space = alloc->free_async_space;
- spin_unlock(&alloc->lock);
+ mutex_unlock(&alloc->mutex);
return free_async_space;
}
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 2/8] binder: concurrent page installation
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 15:10 ` Suren Baghdasaryan
2024-11-07 4:02 ` [PATCH v2 3/8] binder: select correct nid for pages in LRU Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, David Hildenbrand, Barry Song,
Liam R. Howlett
Allow multiple callers to install pages simultaneously by downgrading
the mmap_sem to non-exclusive mode. Races to the same PTE are handled
using get_user_pages_remote() to retrieve the already installed page.
This method significantly reduces contention on the mmap semaphore.
To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid
operating on an isolated VMA. In addition, zap_page_range_single() is
called under the alloc->mutex to avoid racing with the shrinker.
Many thanks to Barry Song who posted a similar approach [1].
Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 64 +++++++++++++++++++++-------------
1 file changed, 40 insertions(+), 24 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 7241bf4a3ff2..acdc05552603 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -221,26 +221,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
struct binder_lru_page *lru_page,
unsigned long addr)
{
+ struct vm_area_struct *vma;
struct page *page;
- int ret = 0;
+ long npages;
+ int ret;
if (!mmget_not_zero(alloc->mm))
return -ESRCH;
- /*
- * Protected with mmap_sem in write mode as multiple tasks
- * might race to install the same page.
- */
- mmap_write_lock(alloc->mm);
- if (binder_get_installed_page(lru_page))
- goto out;
-
- if (!alloc->vma) {
- pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
- ret = -ESRCH;
- goto out;
- }
-
page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
if (!page) {
pr_err("%d: failed to allocate page\n", alloc->pid);
@@ -248,19 +236,47 @@ static int binder_install_single_page(struct binder_alloc *alloc,
goto out;
}
- ret = vm_insert_page(alloc->vma, addr, page);
- if (ret) {
- pr_err("%d: %s failed to insert page at offset %lx with %d\n",
- alloc->pid, __func__, addr - alloc->buffer, ret);
+ mmap_read_lock(alloc->mm);
+ vma = vma_lookup(alloc->mm, addr);
+ if (!vma || vma != alloc->vma) {
+ mmap_read_unlock(alloc->mm);
__free_page(page);
- ret = -ENOMEM;
+ pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
+ ret = -ESRCH;
goto out;
}
- /* Mark page installation complete and safe to use */
- binder_set_installed_page(lru_page, page);
+ ret = vm_insert_page(vma, addr, page);
+ switch (ret) {
+ case -EBUSY:
+ /*
+ * EBUSY is ok. Someone installed the pte first but the
+ * lru_page->page_ptr has not been updated yet. Discard
+ * our page and look up the one already installed.
+ */
+ ret = 0;
+ __free_page(page);
+ npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
+ if (npages <= 0) {
+ pr_err("%d: failed to find page at offset %lx\n",
+ alloc->pid, addr - alloc->buffer);
+ ret = -ESRCH;
+ break;
+ }
+ fallthrough;
+ case 0:
+ /* Mark page installation complete and safe to use */
+ binder_set_installed_page(lru_page, page);
+ break;
+ default:
+ __free_page(page);
+ pr_err("%d: %s failed to insert page at offset %lx with %d\n",
+ alloc->pid, __func__, addr - alloc->buffer, ret);
+ ret = -ENOMEM;
+ break;
+ }
+ mmap_read_unlock(alloc->mm);
out:
- mmap_write_unlock(alloc->mm);
mmput_async(alloc->mm);
return ret;
}
@@ -1091,7 +1107,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_kernel_end(alloc, index);
list_lru_isolate(lru, item);
- mutex_unlock(&alloc->mutex);
spin_unlock(lock);
if (vma) {
@@ -1102,6 +1117,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_user_end(alloc, index);
}
+ mutex_unlock(&alloc->mutex);
mmap_read_unlock(mm);
mmput_async(mm);
__free_page(page_to_free);
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 3/8] binder: select correct nid for pages in LRU
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 2/8] binder: concurrent page installation Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 4/8] binder: remove struct binder_lru_page Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Nhat Pham, Johannes Weiner
The NUMA node id for binder pages is currently derived from the lru
entry embedded in struct binder_lru_page. However, this object does not
reflect the node id of the separately allocated struct page.
Instead, select the correct node id from the page itself. This was made
possible since commit 0a97c01cd20b ("list_lru: allow explicit memcg and
NUMA node selection").
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index acdc05552603..7e205b508ce9 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -210,7 +210,10 @@ static void binder_lru_freelist_add(struct binder_alloc *alloc,
trace_binder_free_lru_start(alloc, index);
- ret = list_lru_add_obj(&binder_freelist, &page->lru);
+ ret = list_lru_add(&binder_freelist,
+ &page->lru,
+ page_to_nid(page->page_ptr),
+ NULL);
WARN_ON(!ret);
trace_binder_free_lru_end(alloc, index);
@@ -333,7 +336,10 @@ static void binder_lru_freelist_del(struct binder_alloc *alloc,
if (page->page_ptr) {
trace_binder_alloc_lru_start(alloc, index);
- on_lru = list_lru_del_obj(&binder_freelist, &page->lru);
+ on_lru = list_lru_del(&binder_freelist,
+ &page->lru,
+ page_to_nid(page->page_ptr),
+ NULL);
WARN_ON(!on_lru);
trace_binder_alloc_lru_end(alloc, index);
@@ -946,8 +952,10 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
if (!alloc->pages[i].page_ptr)
continue;
- on_lru = list_lru_del_obj(&binder_freelist,
- &alloc->pages[i].lru);
+ on_lru = list_lru_del(&binder_freelist,
+ &alloc->pages[i].lru,
+ page_to_nid(alloc->pages[i].page_ptr),
+ NULL);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%s: %d: page %d %s\n",
__func__, alloc->pid, i,
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 4/8] binder: remove struct binder_lru_page
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 3/8] binder: select correct nid for pages in LRU Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 5/8] binder: use alloc->mapped to save the vma state Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Matthew Wilcox, Liam R. Howlett
Remove the redundant struct binder_lru_page concept. Instead, let's use
available struct page->lru and page->private members directly to achieve
the same functionality.
This reduces the maximum memory allocated for alloc->pages from 32768
down to 8192 bytes (aarch64). Savings are per binder instance.
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 102 +++++++++++++-----------
drivers/android/binder_alloc.h | 16 +---
drivers/android/binder_alloc_selftest.c | 14 ++--
3 files changed, 63 insertions(+), 69 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 7e205b508ce9..46789fa530a1 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -176,25 +176,26 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
}
static inline void
-binder_set_installed_page(struct binder_lru_page *lru_page,
+binder_set_installed_page(struct binder_alloc *alloc,
+ unsigned long index,
struct page *page)
{
/* Pairs with acquire in binder_get_installed_page() */
- smp_store_release(&lru_page->page_ptr, page);
+ smp_store_release(&alloc->pages[index], page);
}
static inline struct page *
-binder_get_installed_page(struct binder_lru_page *lru_page)
+binder_get_installed_page(struct binder_alloc *alloc, unsigned long index)
{
/* Pairs with release in binder_set_installed_page() */
- return smp_load_acquire(&lru_page->page_ptr);
+ return smp_load_acquire(&alloc->pages[index]);
}
static void binder_lru_freelist_add(struct binder_alloc *alloc,
unsigned long start, unsigned long end)
{
- struct binder_lru_page *page;
unsigned long page_addr;
+ struct page *page;
trace_binder_update_page_range(alloc, false, start, end);
@@ -203,16 +204,15 @@ static void binder_lru_freelist_add(struct binder_alloc *alloc,
int ret;
index = (page_addr - alloc->buffer) / PAGE_SIZE;
- page = &alloc->pages[index];
-
- if (!binder_get_installed_page(page))
+ page = binder_get_installed_page(alloc, index);
+ if (!page)
continue;
trace_binder_free_lru_start(alloc, index);
ret = list_lru_add(&binder_freelist,
&page->lru,
- page_to_nid(page->page_ptr),
+ page_to_nid(page),
NULL);
WARN_ON(!ret);
@@ -220,8 +220,25 @@ static void binder_lru_freelist_add(struct binder_alloc *alloc,
}
}
+static struct page *binder_page_alloc(struct binder_alloc *alloc,
+ unsigned long index,
+ unsigned long addr)
+{
+ struct page *page;
+
+ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+ if (!page)
+ return NULL;
+
+ page->private = (unsigned long)alloc;
+ INIT_LIST_HEAD(&page->lru);
+ page->index = index;
+
+ return page;
+}
+
static int binder_install_single_page(struct binder_alloc *alloc,
- struct binder_lru_page *lru_page,
+ unsigned long index,
unsigned long addr)
{
struct vm_area_struct *vma;
@@ -232,9 +249,8 @@ static int binder_install_single_page(struct binder_alloc *alloc,
if (!mmget_not_zero(alloc->mm))
return -ESRCH;
- page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+ page = binder_page_alloc(alloc, index, addr);
if (!page) {
- pr_err("%d: failed to allocate page\n", alloc->pid);
ret = -ENOMEM;
goto out;
}
@@ -254,7 +270,7 @@ static int binder_install_single_page(struct binder_alloc *alloc,
case -EBUSY:
/*
* EBUSY is ok. Someone installed the pte first but the
- * lru_page->page_ptr has not been updated yet. Discard
+ * alloc->pages[index] has not been updated yet. Discard
* our page and look up the one already installed.
*/
ret = 0;
@@ -269,7 +285,7 @@ static int binder_install_single_page(struct binder_alloc *alloc,
fallthrough;
case 0:
/* Mark page installation complete and safe to use */
- binder_set_installed_page(lru_page, page);
+ binder_set_installed_page(alloc, index, page);
break;
default:
__free_page(page);
@@ -288,7 +304,6 @@ static int binder_install_buffer_pages(struct binder_alloc *alloc,
struct binder_buffer *buffer,
size_t size)
{
- struct binder_lru_page *page;
unsigned long start, final;
unsigned long page_addr;
@@ -300,14 +315,12 @@ static int binder_install_buffer_pages(struct binder_alloc *alloc,
int ret;
index = (page_addr - alloc->buffer) / PAGE_SIZE;
- page = &alloc->pages[index];
-
- if (binder_get_installed_page(page))
+ if (binder_get_installed_page(alloc, index))
continue;
trace_binder_alloc_page_start(alloc, index);
- ret = binder_install_single_page(alloc, page, page_addr);
+ ret = binder_install_single_page(alloc, index, page_addr);
if (ret)
return ret;
@@ -321,8 +334,8 @@ static int binder_install_buffer_pages(struct binder_alloc *alloc,
static void binder_lru_freelist_del(struct binder_alloc *alloc,
unsigned long start, unsigned long end)
{
- struct binder_lru_page *page;
unsigned long page_addr;
+ struct page *page;
trace_binder_update_page_range(alloc, true, start, end);
@@ -331,14 +344,14 @@ static void binder_lru_freelist_del(struct binder_alloc *alloc,
bool on_lru;
index = (page_addr - alloc->buffer) / PAGE_SIZE;
- page = &alloc->pages[index];
+ page = binder_get_installed_page(alloc, index);
- if (page->page_ptr) {
+ if (page) {
trace_binder_alloc_lru_start(alloc, index);
on_lru = list_lru_del(&binder_freelist,
&page->lru,
- page_to_nid(page->page_ptr),
+ page_to_nid(page),
NULL);
WARN_ON(!on_lru);
@@ -759,11 +772,10 @@ static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
(buffer->user_data - alloc->buffer);
pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
size_t index = buffer_space_offset >> PAGE_SHIFT;
- struct binder_lru_page *lru_page;
- lru_page = &alloc->pages[index];
*pgoffp = pgoff;
- return lru_page->page_ptr;
+
+ return binder_get_installed_page(alloc, index);
}
/**
@@ -838,7 +850,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
{
struct binder_buffer *buffer;
const char *failure_string;
- int ret, i;
+ int ret;
if (unlikely(vma->vm_mm != alloc->mm)) {
ret = -EINVAL;
@@ -861,17 +873,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
sizeof(alloc->pages[0]),
GFP_KERNEL);
- if (alloc->pages == NULL) {
+ if (!alloc->pages) {
ret = -ENOMEM;
failure_string = "alloc page array";
goto err_alloc_pages_failed;
}
- for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- alloc->pages[i].alloc = alloc;
- INIT_LIST_HEAD(&alloc->pages[i].lru);
- }
-
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer) {
ret = -ENOMEM;
@@ -947,20 +954,22 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
int i;
for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+ struct page *page;
bool on_lru;
- if (!alloc->pages[i].page_ptr)
+ page = binder_get_installed_page(alloc, i);
+ if (!page)
continue;
on_lru = list_lru_del(&binder_freelist,
- &alloc->pages[i].lru,
- page_to_nid(alloc->pages[i].page_ptr),
+ &page->lru,
+ page_to_nid(page),
NULL);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%s: %d: page %d %s\n",
__func__, alloc->pid, i,
on_lru ? "on lru" : "active");
- __free_page(alloc->pages[i].page_ptr);
+ __free_page(page);
page_count++;
}
}
@@ -1009,7 +1018,7 @@ void binder_alloc_print_allocated(struct seq_file *m,
void binder_alloc_print_pages(struct seq_file *m,
struct binder_alloc *alloc)
{
- struct binder_lru_page *page;
+ struct page *page;
int i;
int active = 0;
int lru = 0;
@@ -1022,8 +1031,8 @@ void binder_alloc_print_pages(struct seq_file *m,
*/
if (binder_alloc_get_vma(alloc) != NULL) {
for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- page = &alloc->pages[i];
- if (!page->page_ptr)
+ page = binder_get_installed_page(alloc, i);
+ if (!page)
free++;
else if (list_empty(&page->lru))
active++;
@@ -1083,8 +1092,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
void *cb_arg)
__must_hold(lock)
{
- struct binder_lru_page *page = container_of(item, typeof(*page), lru);
- struct binder_alloc *alloc = page->alloc;
+ struct page *page = container_of(item, typeof(*page), lru);
+ struct binder_alloc *alloc = (struct binder_alloc *)page->private;
struct mm_struct *mm = alloc->mm;
struct vm_area_struct *vma;
struct page *page_to_free;
@@ -1097,10 +1106,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmap_read_lock_failed;
if (!mutex_trylock(&alloc->mutex))
goto err_get_alloc_mutex_failed;
- if (!page->page_ptr)
- goto err_page_already_freed;
- index = page - alloc->pages;
+ index = page->index;
page_addr = alloc->buffer + index * PAGE_SIZE;
vma = vma_lookup(mm, page_addr);
@@ -1109,8 +1116,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_kernel_start(alloc, index);
- page_to_free = page->page_ptr;
- page->page_ptr = NULL;
+ page_to_free = page;
+ binder_set_installed_page(alloc, index, NULL);
trace_binder_unmap_kernel_end(alloc, index);
@@ -1134,7 +1141,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
return LRU_REMOVED_RETRY;
err_invalid_vma:
-err_page_already_freed:
mutex_unlock(&alloc->mutex);
err_get_alloc_mutex_failed:
mmap_read_unlock(mm);
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index a5181916942e..5c2473e95494 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -58,18 +58,6 @@ struct binder_buffer {
int pid;
};
-/**
- * struct binder_lru_page - page object used for binder shrinker
- * @page_ptr: pointer to physical page in mmap'd space
- * @lru: entry in binder_freelist
- * @alloc: binder_alloc for a proc
- */
-struct binder_lru_page {
- struct list_head lru;
- struct page *page_ptr;
- struct binder_alloc *alloc;
-};
-
/**
* struct binder_alloc - per-binder proc state for binder allocator
* @mutex: protects binder_alloc fields
@@ -83,7 +71,7 @@ struct binder_lru_page {
* @allocated_buffers: rb tree of allocated buffers sorted by address
* @free_async_space: VA space available for async buffers. This is
* initialized at mmap time to 1/2 the full VA space
- * @pages: array of binder_lru_page
+ * @pages: array of struct page *
* @buffer_size: size of address space specified via mmap
* @pid: pid for associated binder_proc (invariant after init)
* @pages_high: high watermark of offset in @pages
@@ -104,7 +92,7 @@ struct binder_alloc {
struct rb_root free_buffers;
struct rb_root allocated_buffers;
size_t free_async_space;
- struct binder_lru_page *pages;
+ struct page **pages;
size_t buffer_size;
int pid;
size_t pages_high;
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 81442fe20a69..c6941b9abad9 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -105,10 +105,10 @@ static bool check_buffer_pages_allocated(struct binder_alloc *alloc,
page_addr = buffer->user_data;
for (; page_addr < end; page_addr += PAGE_SIZE) {
page_index = (page_addr - alloc->buffer) / PAGE_SIZE;
- if (!alloc->pages[page_index].page_ptr ||
- !list_empty(&alloc->pages[page_index].lru)) {
+ if (!alloc->pages[page_index] ||
+ !list_empty(&alloc->pages[page_index]->lru)) {
pr_err("expect alloc but is %s at page index %d\n",
- alloc->pages[page_index].page_ptr ?
+ alloc->pages[page_index] ?
"lru" : "free", page_index);
return false;
}
@@ -148,10 +148,10 @@ static void binder_selftest_free_buf(struct binder_alloc *alloc,
* if binder shrinker ran during binder_alloc_free_buf
* calls above.
*/
- if (list_empty(&alloc->pages[i].lru)) {
+ if (list_empty(&alloc->pages[i]->lru)) {
pr_err_size_seq(sizes, seq);
pr_err("expect lru but is %s at page index %d\n",
- alloc->pages[i].page_ptr ? "alloc" : "free", i);
+ alloc->pages[i] ? "alloc" : "free", i);
binder_selftest_failures++;
}
}
@@ -168,9 +168,9 @@ static void binder_selftest_free_page(struct binder_alloc *alloc)
}
for (i = 0; i < (alloc->buffer_size / PAGE_SIZE); i++) {
- if (alloc->pages[i].page_ptr) {
+ if (alloc->pages[i]) {
pr_err("expect free but is %s at page index %d\n",
- list_empty(&alloc->pages[i].lru) ?
+ list_empty(&alloc->pages[i]->lru) ?
"alloc" : "lru", i);
binder_selftest_failures++;
}
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 5/8] binder: use alloc->mapped to save the vma state
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 4/8] binder: remove struct binder_lru_page Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 6/8] binder: remove cached alloc->vma pointer Carlos Llamas
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Minchan Kim, Liam R. Howlett,
Matthew Wilcox
It is unsafe to use alloc->vma outside of the mmap_sem. Instead, add a
new boolean alloc->mapped to save the vma state (mapped or unmapped) and
use it to validate several paths. This change is in preparation for
removing the alloc->vma pointer in a subsequent patch.
Cc: Minchan Kim <minchan@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 29 ++++++++++++++++++-------
drivers/android/binder_alloc.h | 2 ++
drivers/android/binder_alloc_selftest.c | 2 +-
3 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 46789fa530a1..7239b92ef20a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -220,6 +220,19 @@ static void binder_lru_freelist_add(struct binder_alloc *alloc,
}
}
+static inline
+void binder_alloc_set_mapped(struct binder_alloc *alloc, bool state)
+{
+ /* pairs with smp_load_acquire in binder_alloc_is_mapped() */
+ smp_store_release(&alloc->mapped, state);
+}
+
+static inline bool binder_alloc_is_mapped(struct binder_alloc *alloc)
+{
+ /* pairs with smp_store_release in binder_alloc_set_mapped() */
+ return smp_load_acquire(&alloc->mapped);
+}
+
static struct page *binder_page_alloc(struct binder_alloc *alloc,
unsigned long index,
unsigned long addr)
@@ -257,7 +270,7 @@ static int binder_install_single_page(struct binder_alloc *alloc,
mmap_read_lock(alloc->mm);
vma = vma_lookup(alloc->mm, addr);
- if (!vma || vma != alloc->vma) {
+ if (!vma || !binder_alloc_is_mapped(alloc)) {
mmap_read_unlock(alloc->mm);
__free_page(page);
pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
@@ -611,7 +624,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
int ret;
/* Check binder_alloc is fully initialized */
- if (!binder_alloc_get_vma(alloc)) {
+ if (!binder_alloc_is_mapped(alloc)) {
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf, no vma\n",
alloc->pid);
@@ -893,7 +906,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->free_async_space = alloc->buffer_size / 2;
/* Signal binder_alloc is fully initialized */
- binder_alloc_set_vma(alloc, vma);
+ binder_alloc_set_mapped(alloc, true);
return 0;
@@ -923,7 +936,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
buffers = 0;
mutex_lock(&alloc->mutex);
- BUG_ON(alloc->vma);
+ BUG_ON(alloc->mapped);
while ((n = rb_first(&alloc->allocated_buffers))) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -1029,7 +1042,7 @@ void binder_alloc_print_pages(struct seq_file *m,
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
*/
- if (binder_alloc_get_vma(alloc) != NULL) {
+ if (binder_alloc_is_mapped(alloc)) {
for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
page = binder_get_installed_page(alloc, i);
if (!page)
@@ -1069,12 +1082,12 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
* @alloc: binder_alloc for this proc
*
* Called from binder_vma_close() when releasing address space.
- * Clears alloc->vma to prevent new incoming transactions from
+ * Clears alloc->mapped to prevent new incoming transactions from
* allocating more buffers.
*/
void binder_alloc_vma_close(struct binder_alloc *alloc)
{
- binder_alloc_set_vma(alloc, NULL);
+ binder_alloc_set_mapped(alloc, false);
}
/**
@@ -1111,7 +1124,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
page_addr = alloc->buffer + index * PAGE_SIZE;
vma = vma_lookup(mm, page_addr);
- if (vma && vma != binder_alloc_get_vma(alloc))
+ if (vma && !binder_alloc_is_mapped(alloc))
goto err_invalid_vma;
trace_binder_unmap_kernel_start(alloc, index);
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 5c2473e95494..89f6fa25c670 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -75,6 +75,7 @@ struct binder_buffer {
* @buffer_size: size of address space specified via mmap
* @pid: pid for associated binder_proc (invariant after init)
* @pages_high: high watermark of offset in @pages
+ * @mapped: whether the vm area is mapped
* @oneway_spam_detected: %true if oneway spam detection fired, clear that
* flag once the async buffer has returned to a healthy state
*
@@ -96,6 +97,7 @@ struct binder_alloc {
size_t buffer_size;
int pid;
size_t pages_high;
+ bool mapped;
bool oneway_spam_detected;
};
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index c6941b9abad9..2dda82d0d5e8 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -291,7 +291,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
if (!binder_selftest_run)
return;
mutex_lock(&binder_selftest_lock);
- if (!binder_selftest_run || !alloc->vma)
+ if (!binder_selftest_run || !alloc->mapped)
goto done;
pr_info("STARTED\n");
binder_selftest_alloc_offset(alloc, end_offset, 0);
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 6/8] binder: remove cached alloc->vma pointer
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
` (4 preceding siblings ...)
2024-11-07 4:02 ` [PATCH v2 5/8] binder: use alloc->mapped to save the vma state Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 15:33 ` Suren Baghdasaryan
2024-11-07 4:02 ` [PATCH v2 7/8] binder: rename alloc->buffer to vm_start Carlos Llamas
` (2 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Minchan Kim, Liam R. Howlett,
Matthew Wilcox
All usage of the alloc->vma pointer has been removed from binder.
Instead, paths have been refactored to use either vma_lookup() or the
alloc->mapped state. Using the alloc->vma was unsafe and required the
mmap_sem in exclusive mode, which caused several performance and
security issues in the past.
Cc: Minchan Kim <minchan@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 14 --------------
drivers/android/binder_alloc.h | 3 ---
2 files changed, 17 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 7239b92ef20a..13b476cc5b62 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -377,20 +377,6 @@ static void binder_lru_freelist_del(struct binder_alloc *alloc,
}
}
-static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
- struct vm_area_struct *vma)
-{
- /* pairs with smp_load_acquire in binder_alloc_get_vma() */
- smp_store_release(&alloc->vma, vma);
-}
-
-static inline struct vm_area_struct *binder_alloc_get_vma(
- struct binder_alloc *alloc)
-{
- /* pairs with smp_store_release in binder_alloc_set_vma() */
- return smp_load_acquire(&alloc->vma);
-}
-
static void debug_no_space_locked(struct binder_alloc *alloc)
{
size_t largest_alloc_size = 0;
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 89f6fa25c670..b9e2e9dc90b3 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -61,8 +61,6 @@ struct binder_buffer {
/**
* struct binder_alloc - per-binder proc state for binder allocator
* @mutex: protects binder_alloc fields
- * @vma: vm_area_struct passed to mmap_handler
- * (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
* @buffer: base of per-proc address space mapped via mmap
* @buffers: list of all buffers for this proc
@@ -86,7 +84,6 @@ struct binder_buffer {
*/
struct binder_alloc {
struct mutex mutex;
- struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long buffer;
struct list_head buffers;
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 7/8] binder: rename alloc->buffer to vm_start
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
` (5 preceding siblings ...)
2024-11-07 4:02 ` [PATCH v2 6/8] binder: remove cached alloc->vma pointer Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 8/8] binder: use per-vma lock in page installation Carlos Llamas
2024-11-07 16:18 ` [PATCH v2 0/8] binder: faster page installations Suren Baghdasaryan
8 siblings, 0 replies; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The alloc->buffer field in struct binder_alloc stores the starting
address of the mapped vma. Rename this field to alloc->vm_start to
better reflect its purpose. This also avoids confusion with the binder
buffer concept, e.g. transaction->buffer.
No functional changes in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder.c | 2 +-
drivers/android/binder_alloc.c | 28 ++++++++++++-------------
drivers/android/binder_alloc.h | 4 ++--
drivers/android/binder_alloc_selftest.c | 2 +-
drivers/android/binder_trace.h | 2 +-
5 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 978740537a1a..57265cabec43 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -6350,7 +6350,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m,
seq_printf(m, " node %d", buffer->target_node->debug_id);
seq_printf(m, " size %zd:%zd offset %lx\n",
buffer->data_size, buffer->offsets_size,
- proc->alloc.buffer - buffer->user_data);
+ proc->alloc.vm_start - buffer->user_data);
}
static void print_binder_work_ilocked(struct seq_file *m,
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 13b476cc5b62..814435a2601a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -61,7 +61,7 @@ static size_t binder_alloc_buffer_size(struct binder_alloc *alloc,
struct binder_buffer *buffer)
{
if (list_is_last(&buffer->entry, &alloc->buffers))
- return alloc->buffer + alloc->buffer_size - buffer->user_data;
+ return alloc->vm_start + alloc->buffer_size - buffer->user_data;
return binder_buffer_next(buffer)->user_data - buffer->user_data;
}
@@ -203,7 +203,7 @@ static void binder_lru_freelist_add(struct binder_alloc *alloc,
size_t index;
int ret;
- index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ index = (page_addr - alloc->vm_start) / PAGE_SIZE;
page = binder_get_installed_page(alloc, index);
if (!page)
continue;
@@ -291,7 +291,7 @@ static int binder_install_single_page(struct binder_alloc *alloc,
npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
if (npages <= 0) {
pr_err("%d: failed to find page at offset %lx\n",
- alloc->pid, addr - alloc->buffer);
+ alloc->pid, addr - alloc->vm_start);
ret = -ESRCH;
break;
}
@@ -303,7 +303,7 @@ static int binder_install_single_page(struct binder_alloc *alloc,
default:
__free_page(page);
pr_err("%d: %s failed to insert page at offset %lx with %d\n",
- alloc->pid, __func__, addr - alloc->buffer, ret);
+ alloc->pid, __func__, addr - alloc->vm_start, ret);
ret = -ENOMEM;
break;
}
@@ -327,7 +327,7 @@ static int binder_install_buffer_pages(struct binder_alloc *alloc,
unsigned long index;
int ret;
- index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ index = (page_addr - alloc->vm_start) / PAGE_SIZE;
if (binder_get_installed_page(alloc, index))
continue;
@@ -356,7 +356,7 @@ static void binder_lru_freelist_del(struct binder_alloc *alloc,
unsigned long index;
bool on_lru;
- index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ index = (page_addr - alloc->vm_start) / PAGE_SIZE;
page = binder_get_installed_page(alloc, index);
if (page) {
@@ -708,8 +708,8 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
BUG_ON(buffer->free);
BUG_ON(size > buffer_size);
BUG_ON(buffer->transaction != NULL);
- BUG_ON(buffer->user_data < alloc->buffer);
- BUG_ON(buffer->user_data > alloc->buffer + alloc->buffer_size);
+ BUG_ON(buffer->user_data < alloc->vm_start);
+ BUG_ON(buffer->user_data > alloc->vm_start + alloc->buffer_size);
if (buffer->async_transaction) {
alloc->free_async_space += buffer_size;
@@ -768,7 +768,7 @@ static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
pgoff_t *pgoffp)
{
binder_size_t buffer_space_offset = buffer_offset +
- (buffer->user_data - alloc->buffer);
+ (buffer->user_data - alloc->vm_start);
pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
size_t index = buffer_space_offset >> PAGE_SHIFT;
@@ -867,7 +867,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
SZ_4M);
mutex_unlock(&binder_alloc_mmap_lock);
- alloc->buffer = vma->vm_start;
+ alloc->vm_start = vma->vm_start;
alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
sizeof(alloc->pages[0]),
@@ -885,7 +885,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
goto err_alloc_buf_struct_failed;
}
- buffer->user_data = alloc->buffer;
+ buffer->user_data = alloc->vm_start;
list_add(&buffer->entry, &alloc->buffers);
buffer->free = 1;
binder_insert_free_buffer(alloc, buffer);
@@ -900,7 +900,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
kvfree(alloc->pages);
alloc->pages = NULL;
err_alloc_pages_failed:
- alloc->buffer = 0;
+ alloc->vm_start = 0;
mutex_lock(&binder_alloc_mmap_lock);
alloc->buffer_size = 0;
err_already_mapped:
@@ -1001,7 +1001,7 @@ void binder_alloc_print_allocated(struct seq_file *m,
buffer = rb_entry(n, struct binder_buffer, rb_node);
seq_printf(m, " buffer %d: %lx size %zd:%zd:%zd %s\n",
buffer->debug_id,
- buffer->user_data - alloc->buffer,
+ buffer->user_data - alloc->vm_start,
buffer->data_size, buffer->offsets_size,
buffer->extra_buffers_size,
buffer->transaction ? "active" : "delivered");
@@ -1107,7 +1107,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_get_alloc_mutex_failed;
index = page->index;
- page_addr = alloc->buffer + index * PAGE_SIZE;
+ page_addr = alloc->vm_start + index * PAGE_SIZE;
vma = vma_lookup(mm, page_addr);
if (vma && !binder_alloc_is_mapped(alloc))
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index b9e2e9dc90b3..05a01d980f61 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -62,7 +62,7 @@ struct binder_buffer {
* struct binder_alloc - per-binder proc state for binder allocator
* @mutex: protects binder_alloc fields
* @mm: copy of task->mm (invariant after open)
- * @buffer: base of per-proc address space mapped via mmap
+ * @vm_start: base of per-proc address space mapped via mmap
* @buffers: list of all buffers for this proc
* @free_buffers: rb tree of buffers available for allocation
* sorted by size
@@ -85,7 +85,7 @@ struct binder_buffer {
struct binder_alloc {
struct mutex mutex;
struct mm_struct *mm;
- unsigned long buffer;
+ unsigned long vm_start;
struct list_head buffers;
struct rb_root free_buffers;
struct rb_root allocated_buffers;
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 2dda82d0d5e8..d2d086d2c037 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -104,7 +104,7 @@ static bool check_buffer_pages_allocated(struct binder_alloc *alloc,
end = PAGE_ALIGN(buffer->user_data + size);
page_addr = buffer->user_data;
for (; page_addr < end; page_addr += PAGE_SIZE) {
- page_index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ page_index = (page_addr - alloc->vm_start) / PAGE_SIZE;
if (!alloc->pages[page_index] ||
!list_empty(&alloc->pages[page_index]->lru)) {
pr_err("expect alloc but is %s at page index %d\n",
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index fe38c6fc65d0..16de1b9e72f7 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -328,7 +328,7 @@ TRACE_EVENT(binder_update_page_range,
TP_fast_assign(
__entry->proc = alloc->pid;
__entry->allocate = allocate;
- __entry->offset = start - alloc->buffer;
+ __entry->offset = start - alloc->vm_start;
__entry->size = end - start;
),
TP_printk("proc=%d allocate=%d offset=%zu size=%zu",
--
2.47.0.199.ga7371fff76-goog
* [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
` (6 preceding siblings ...)
2024-11-07 4:02 ` [PATCH v2 7/8] binder: rename alloc->buffer to vm_start Carlos Llamas
@ 2024-11-07 4:02 ` Carlos Llamas
2024-11-07 16:16 ` Suren Baghdasaryan
2024-11-07 16:18 ` [PATCH v2 0/8] binder: faster page installations Suren Baghdasaryan
8 siblings, 1 reply; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, Nhat Pham, Johannes Weiner, Barry Song,
Hillf Danton, Lorenzo Stoakes
Use per-vma locking for concurrent page installations; this minimizes
contention with unrelated vmas, improving performance. The mmap_lock is
still acquired when needed, e.g. before get_user_pages_remote().
Many thanks to Barry Song who posted a similar approach [1].
Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 85 +++++++++++++++++++++++-----------
1 file changed, 57 insertions(+), 28 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 814435a2601a..debfa541e01b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -233,6 +233,56 @@ static inline bool binder_alloc_is_mapped(struct binder_alloc *alloc)
return smp_load_acquire(&alloc->mapped);
}
+static struct page *binder_page_lookup(struct mm_struct *mm,
+ unsigned long addr)
+{
+ struct page *page;
+ long ret;
+
+ if (!mmget_not_zero(mm))
+ return NULL;
+
+ mmap_read_lock(mm);
+ ret = get_user_pages_remote(mm, addr, 1, 0, &page, NULL);
+ mmap_read_unlock(mm);
+ mmput_async(mm);
+
+ return ret > 0 ? page : NULL;
+}
+
+static int binder_page_insert(struct binder_alloc *alloc,
+ unsigned long addr,
+ struct page *page)
+{
+ struct mm_struct *mm = alloc->mm;
+ struct vm_area_struct *vma;
+ int ret = -ESRCH;
+
+ if (!mmget_not_zero(mm))
+ return -ESRCH;
+
+ /* attempt per-vma lock first */
+ vma = lock_vma_under_rcu(mm, addr);
+ if (!vma)
+ goto lock_mmap;
+
+ if (binder_alloc_is_mapped(alloc))
+ ret = vm_insert_page(vma, addr, page);
+ vma_end_read(vma);
+ goto done;
+
+lock_mmap:
+ /* fall back to mmap_lock */
+ mmap_read_lock(mm);
+ vma = vma_lookup(mm, addr);
+ if (vma && binder_alloc_is_mapped(alloc))
+ ret = vm_insert_page(vma, addr, page);
+ mmap_read_unlock(mm);
+done:
+ mmput_async(mm);
+ return ret;
+}
+
static struct page *binder_page_alloc(struct binder_alloc *alloc,
unsigned long index,
unsigned long addr)
@@ -254,31 +304,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
unsigned long index,
unsigned long addr)
{
- struct vm_area_struct *vma;
struct page *page;
- long npages;
int ret;
- if (!mmget_not_zero(alloc->mm))
- return -ESRCH;
-
page = binder_page_alloc(alloc, index, addr);
- if (!page) {
- ret = -ENOMEM;
- goto out;
- }
-
- mmap_read_lock(alloc->mm);
- vma = vma_lookup(alloc->mm, addr);
- if (!vma || !binder_alloc_is_mapped(alloc)) {
- mmap_read_unlock(alloc->mm);
- __free_page(page);
- pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
- ret = -ESRCH;
- goto out;
- }
+ if (!page)
+ return -ENOMEM;
- ret = vm_insert_page(vma, addr, page);
+ ret = binder_page_insert(alloc, addr, page);
switch (ret) {
case -EBUSY:
/*
@@ -288,12 +321,11 @@ static int binder_install_single_page(struct binder_alloc *alloc,
*/
ret = 0;
__free_page(page);
- npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
- if (npages <= 0) {
+ page = binder_page_lookup(alloc->mm, addr);
+ if (!page) {
pr_err("%d: failed to find page at offset %lx\n",
alloc->pid, addr - alloc->vm_start);
- ret = -ESRCH;
- break;
+ return -ESRCH;
}
fallthrough;
case 0:
@@ -304,12 +336,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
__free_page(page);
pr_err("%d: %s failed to insert page at offset %lx with %d\n",
alloc->pid, __func__, addr - alloc->vm_start, ret);
- ret = -ENOMEM;
break;
}
- mmap_read_unlock(alloc->mm);
-out:
- mmput_async(alloc->mm);
+
return ret;
}
--
2.47.0.199.ga7371fff76-goog
* Re: [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t"
2024-11-07 4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
@ 2024-11-07 8:56 ` Greg Kroah-Hartman
2024-11-07 14:30 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2024-11-07 8:56 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
linux-kernel, kernel-team, Mukesh Ojha
On Thu, Nov 07, 2024 at 04:02:23AM +0000, Carlos Llamas wrote:
> This reverts commit 7710e2cca32e7f3958480e8bd44f50e29d0c2509.
>
> In preparation for concurrent page installations, restore the original
> alloc->mutex which will serialize zap_page_range_single() against page
> installations in subsequent patches (instead of the mmap_sem).
>
> Cc: Mukesh Ojha <quic_mojha@quicinc.com>
> [cmllamas: fix trivial conflict due to 2c10a20f5e84a]
Nit, commits have the full name in them as well, not just the sha1 one,
i.e.:
2c10a20f5e84 ("binder_alloc: Fix sleeping function called from invalid context")
thanks,
greg k-h
* Re: [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t"
2024-11-07 8:56 ` Greg Kroah-Hartman
@ 2024-11-07 14:30 ` Carlos Llamas
0 siblings, 0 replies; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 14:30 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
linux-kernel, kernel-team, Mukesh Ojha
On Thu, Nov 07, 2024 at 09:56:16AM +0100, Greg Kroah-Hartman wrote:
> On Thu, Nov 07, 2024 at 04:02:23AM +0000, Carlos Llamas wrote:
> > This reverts commit 7710e2cca32e7f3958480e8bd44f50e29d0c2509.
> >
> > In preparation for concurrent page installations, restore the original
> > alloc->mutex which will serialize zap_page_range_single() against page
> > installations in subsequent patches (instead of the mmap_sem).
> >
> > Cc: Mukesh Ojha <quic_mojha@quicinc.com>
> > [cmllamas: fix trivial conflict due to 2c10a20f5e84a]
>
> Nit, commits have the full name in them as well, not just the sha1 one,
> i.e.:
> 2c10a20f5e84 ("binder_alloc: Fix sleeping function called from invalid context")
>
Sounds good, done for v3.
Cheers,
Carlos Llamas
* Re: [PATCH v2 2/8] binder: concurrent page installation
2024-11-07 4:02 ` [PATCH v2 2/8] binder: concurrent page installation Carlos Llamas
@ 2024-11-07 15:10 ` Suren Baghdasaryan
2024-11-07 15:31 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 15:10 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, David Hildenbrand, Barry Song, Liam R. Howlett
On Wed, Nov 6, 2024 at 8:02 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> Allow multiple callers to install pages simultaneously by downgrading
> the mmap_sem to non-exclusive mode. Races to the same PTE are handled
> using get_user_pages_remote() to retrieve the already installed page.
> This method significantly reduces contention in the mmap semaphore.
>
> To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid
> operating on an isolated VMA. In addition, zap_page_range_single() is
> called under the alloc->mutex to avoid racing with the shrinker.
>
> Many thanks to Barry Song who posted a similar approach [1].
>
> Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Barry Song <v-songbaohua@oppo.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> drivers/android/binder_alloc.c | 64 +++++++++++++++++++++-------------
> 1 file changed, 40 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 7241bf4a3ff2..acdc05552603 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -221,26 +221,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> struct binder_lru_page *lru_page,
> unsigned long addr)
> {
> + struct vm_area_struct *vma;
> struct page *page;
> - int ret = 0;
> + long npages;
> + int ret;
>
> if (!mmget_not_zero(alloc->mm))
> return -ESRCH;
>
> - /*
> - * Protected with mmap_sem in write mode as multiple tasks
> - * might race to install the same page.
> - */
> - mmap_write_lock(alloc->mm);
> - if (binder_get_installed_page(lru_page))
> - goto out;
> -
> - if (!alloc->vma) {
> - pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> - ret = -ESRCH;
> - goto out;
> - }
> -
> page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> if (!page) {
> pr_err("%d: failed to allocate page\n", alloc->pid);
> @@ -248,19 +236,47 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> goto out;
> }
>
> - ret = vm_insert_page(alloc->vma, addr, page);
> - if (ret) {
> - pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> - alloc->pid, __func__, addr - alloc->buffer, ret);
> + mmap_read_lock(alloc->mm);
> + vma = vma_lookup(alloc->mm, addr);
> + if (!vma || vma != alloc->vma) {
> + mmap_read_unlock(alloc->mm);
nit: instead of unlocking here you could have another label before
mmap_read_unlock() at the end and jump to it.
> __free_page(page);
> - ret = -ENOMEM;
> + pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> + ret = -ESRCH;
> goto out;
> }
>
> - /* Mark page installation complete and safe to use */
> - binder_set_installed_page(lru_page, page);
> + ret = vm_insert_page(vma, addr, page);
> + switch (ret) {
> + case -EBUSY:
> + /*
> + * EBUSY is ok. Someone installed the pte first but the
> + * lru_page->page_ptr has not been updated yet. Discard
> + * our page and look up the one already installed.
> + */
> + ret = 0;
> + __free_page(page);
> + npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
> + if (npages <= 0) {
> + pr_err("%d: failed to find page at offset %lx\n",
> + alloc->pid, addr - alloc->buffer);
> + ret = -ESRCH;
> + break;
> + }
> + fallthrough;
> + case 0:
> + /* Mark page installation complete and safe to use */
> + binder_set_installed_page(lru_page, page);
> + break;
> + default:
> + __free_page(page);
> + pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> + alloc->pid, __func__, addr - alloc->buffer, ret);
> + ret = -ENOMEM;
vm_insert_page() can return EINVAL (see
validate_page_before_insert()). Instead of converting other codes into
ENOMEM why not return "ret" as is?
> + break;
> + }
> + mmap_read_unlock(alloc->mm);
> out:
> - mmap_write_unlock(alloc->mm);
> mmput_async(alloc->mm);
> return ret;
> }
> @@ -1091,7 +1107,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> trace_binder_unmap_kernel_end(alloc, index);
>
> list_lru_isolate(lru, item);
> - mutex_unlock(&alloc->mutex);
> spin_unlock(lock);
>
> if (vma) {
> @@ -1102,6 +1117,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> trace_binder_unmap_user_end(alloc, index);
> }
>
> + mutex_unlock(&alloc->mutex);
> mmap_read_unlock(mm);
> mmput_async(mm);
> __free_page(page_to_free);
> --
> 2.47.0.199.ga7371fff76-goog
>
* Re: [PATCH v2 2/8] binder: concurrent page installation
2024-11-07 15:10 ` Suren Baghdasaryan
@ 2024-11-07 15:31 ` Carlos Llamas
0 siblings, 0 replies; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 15:31 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, David Hildenbrand, Barry Song, Liam R. Howlett
On Thu, Nov 07, 2024 at 07:10:28AM -0800, Suren Baghdasaryan wrote:
> On Wed, Nov 6, 2024 at 8:02 PM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > Allow multiple callers to install pages simultaneously by downgrading
> > the mmap_sem to non-exclusive mode. Races to the same PTE are handled
> > using get_user_pages_remote() to retrieve the already installed page.
> > This method significantly reduces contention in the mmap semaphore.
> >
> > To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid
> > operating on an isolated VMA. In addition, zap_page_range_single() is
> > called under the alloc->mutex to avoid racing with the shrinker.
> >
> > Many thanks to Barry Song who posted a similar approach [1].
> >
> > Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Barry Song <v-songbaohua@oppo.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
> > ---
> > drivers/android/binder_alloc.c | 64 +++++++++++++++++++++-------------
> > 1 file changed, 40 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index 7241bf4a3ff2..acdc05552603 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -221,26 +221,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > struct binder_lru_page *lru_page,
> > unsigned long addr)
> > {
> > + struct vm_area_struct *vma;
> > struct page *page;
> > - int ret = 0;
> > + long npages;
> > + int ret;
> >
> > if (!mmget_not_zero(alloc->mm))
> > return -ESRCH;
> >
> > - /*
> > - * Protected with mmap_sem in write mode as multiple tasks
> > - * might race to install the same page.
> > - */
> > - mmap_write_lock(alloc->mm);
> > - if (binder_get_installed_page(lru_page))
> > - goto out;
> > -
> > - if (!alloc->vma) {
> > - pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > - ret = -ESRCH;
> > - goto out;
> > - }
> > -
> > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> > if (!page) {
> > pr_err("%d: failed to allocate page\n", alloc->pid);
> > @@ -248,19 +236,47 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > goto out;
> > }
> >
> > - ret = vm_insert_page(alloc->vma, addr, page);
> > - if (ret) {
> > - pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> > - alloc->pid, __func__, addr - alloc->buffer, ret);
> > + mmap_read_lock(alloc->mm);
> > + vma = vma_lookup(alloc->mm, addr);
> > + if (!vma || vma != alloc->vma) {
> > + mmap_read_unlock(alloc->mm);
>
> nit: instead of unlocking here you could have another label before
> mmap_read_unlock() at the end and jump to it.
Sounds good, I'll do this.
>
> > __free_page(page);
> > - ret = -ENOMEM;
> > + pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > + ret = -ESRCH;
> > goto out;
> > }
> >
> > - /* Mark page installation complete and safe to use */
> > - binder_set_installed_page(lru_page, page);
> > + ret = vm_insert_page(vma, addr, page);
> > + switch (ret) {
> > + case -EBUSY:
> > + /*
> > + * EBUSY is ok. Someone installed the pte first but the
> > + * lru_page->page_ptr has not been updated yet. Discard
> > + * our page and look up the one already installed.
> > + */
> > + ret = 0;
> > + __free_page(page);
> > + npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
> > + if (npages <= 0) {
> > + pr_err("%d: failed to find page at offset %lx\n",
> > + alloc->pid, addr - alloc->buffer);
> > + ret = -ESRCH;
> > + break;
> > + }
> > + fallthrough;
> > + case 0:
> > + /* Mark page installation complete and safe to use */
> > + binder_set_installed_page(lru_page, page);
> > + break;
> > + default:
> > + __free_page(page);
> > + pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> > + alloc->pid, __func__, addr - alloc->buffer, ret);
> > + ret = -ENOMEM;
>
> vm_insert_page() can return EINVAL (see
> validate_page_before_insert()). Instead of converting other codes into
> ENOMEM why not return "ret" as is?
This is purely historical: binder has always propagated -ENOMEM to
userspace on errors from vm_insert_page() and I'm not sure why.
FWIW, I've dropped the behavior in the last patch and now I just forward
whatever vm_insert_page() returns. I had a look at libbinder code and it
doesn't really make a difference.
Perhaps I should be explicit about this move and do it in a separate
commit.
Thanks,
--
Carlos Llamas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 6/8] binder: remove cached alloc->vma pointer
2024-11-07 4:02 ` [PATCH v2 6/8] binder: remove cached alloc->vma pointer Carlos Llamas
@ 2024-11-07 15:33 ` Suren Baghdasaryan
0 siblings, 0 replies; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 15:33 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Minchan Kim, Liam R. Howlett, Matthew Wilcox
On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> All usage of the alloc->vma pointer has been removed from binder.
> Instead, paths have been refactored to use either vma_lookup() or the
> alloc->mapped state. Using the alloc->vma was unsafe and required the
> mmap_sem in exclusive mode, which caused several performance and
> security issues in the past.
nit: I think this can be safely squashed with the previous patch and
call it a replacement.
>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> drivers/android/binder_alloc.c | 14 --------------
> drivers/android/binder_alloc.h | 3 ---
> 2 files changed, 17 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 7239b92ef20a..13b476cc5b62 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -377,20 +377,6 @@ static void binder_lru_freelist_del(struct binder_alloc *alloc,
> }
> }
>
> -static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
> - struct vm_area_struct *vma)
> -{
> - /* pairs with smp_load_acquire in binder_alloc_get_vma() */
> - smp_store_release(&alloc->vma, vma);
> -}
> -
> -static inline struct vm_area_struct *binder_alloc_get_vma(
> - struct binder_alloc *alloc)
> -{
> - /* pairs with smp_store_release in binder_alloc_set_vma() */
> - return smp_load_acquire(&alloc->vma);
> -}
> -
> static void debug_no_space_locked(struct binder_alloc *alloc)
> {
> size_t largest_alloc_size = 0;
> diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
> index 89f6fa25c670..b9e2e9dc90b3 100644
> --- a/drivers/android/binder_alloc.h
> +++ b/drivers/android/binder_alloc.h
> @@ -61,8 +61,6 @@ struct binder_buffer {
> /**
> * struct binder_alloc - per-binder proc state for binder allocator
> * @mutex: protects binder_alloc fields
> - * @vma: vm_area_struct passed to mmap_handler
> - * (invariant after mmap)
> * @mm: copy of task->mm (invariant after open)
> * @buffer: base of per-proc address space mapped via mmap
> * @buffers: list of all buffers for this proc
> @@ -86,7 +84,6 @@ struct binder_buffer {
> */
> struct binder_alloc {
> struct mutex mutex;
> - struct vm_area_struct *vma;
> struct mm_struct *mm;
> unsigned long buffer;
> struct list_head buffers;
> --
> 2.47.0.199.ga7371fff76-goog
>
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 4:02 ` [PATCH v2 8/8] binder: use per-vma lock in page installation Carlos Llamas
@ 2024-11-07 16:16 ` Suren Baghdasaryan
2024-11-07 17:55 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 16:16 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> Use per-vma locking for concurrent page installations; this minimizes
> contention with unrelated vmas, improving performance. The mmap_lock is
> still acquired when needed though, e.g. before get_user_pages_remote().
>
> Many thanks to Barry Song who posted a similar approach [1].
>
> Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Barry Song <v-songbaohua@oppo.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Hillf Danton <hdanton@sina.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> drivers/android/binder_alloc.c | 85 +++++++++++++++++++++++-----------
> 1 file changed, 57 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 814435a2601a..debfa541e01b 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -233,6 +233,56 @@ static inline bool binder_alloc_is_mapped(struct binder_alloc *alloc)
> return smp_load_acquire(&alloc->mapped);
> }
>
> +static struct page *binder_page_lookup(struct mm_struct *mm,
Maybe pass "struct binder_alloc" in both binder_page_lookup() and
binder_page_insert()?
I like how previous code stabilized mm with mmget_not_zero() once vs
now binder_page_lookup() and binder_page_insert() have to mmget/mmput
individually. Not a big deal but looked cleaner.
> + unsigned long addr)
> +{
> + struct page *page;
> + long ret;
> +
> + if (!mmget_not_zero(mm))
> + return NULL;
> +
> + mmap_read_lock(mm);
> + ret = get_user_pages_remote(mm, addr, 1, 0, &page, NULL);
> + mmap_read_unlock(mm);
> + mmput_async(mm);
> +
> + return ret > 0 ? page : NULL;
> +}
> +
> +static int binder_page_insert(struct binder_alloc *alloc,
> + unsigned long addr,
> + struct page *page)
> +{
> + struct mm_struct *mm = alloc->mm;
> + struct vm_area_struct *vma;
> + int ret = -ESRCH;
> +
> + if (!mmget_not_zero(mm))
> + return -ESRCH;
> +
> + /* attempt per-vma lock first */
> + vma = lock_vma_under_rcu(mm, addr);
> + if (!vma)
> + goto lock_mmap;
> +
> + if (binder_alloc_is_mapped(alloc))
I don't think you need this check here. lock_vma_under_rcu() ensures
that the VMA was not detached from the tree after locking the VMA, so
if you got a VMA it's in the tree and it can't be removed (because
it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
called after VMA gets detached from the tree and that won't happen
while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
binder_alloc_is_mapped() has to always return true. A WARN_ON() check
here to ensure that might be a better option.
> + ret = vm_insert_page(vma, addr, page);
> + vma_end_read(vma);
> + goto done;
I think the code would be more readable without these jumps:
vma = lock_vma_under_rcu(mm, addr);
if (vma) {
if (!WARN_ON(!binder_alloc_is_mapped(alloc)))
ret = vm_insert_page(vma, addr, page);
vma_end_read(vma);
} else {
/* fall back to mmap_lock */
mmap_read_lock(mm);
vma = vma_lookup(mm, addr);
if (vma && binder_alloc_is_mapped(alloc))
ret = vm_insert_page(vma, addr, page);
mmap_read_unlock(mm);
}
mmput_async(mm);
return ret;
> +
> +lock_mmap:
> + /* fall back to mmap_lock */
> + mmap_read_lock(mm);
> + vma = vma_lookup(mm, addr);
> + if (vma && binder_alloc_is_mapped(alloc))
> + ret = vm_insert_page(vma, addr, page);
> + mmap_read_unlock(mm);
> +done:
> + mmput_async(mm);
> + return ret;
> +}
> +
> static struct page *binder_page_alloc(struct binder_alloc *alloc,
> unsigned long index,
> unsigned long addr)
> @@ -254,31 +304,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> unsigned long index,
> unsigned long addr)
> {
> - struct vm_area_struct *vma;
> struct page *page;
> - long npages;
> int ret;
>
> - if (!mmget_not_zero(alloc->mm))
> - return -ESRCH;
> -
> page = binder_page_alloc(alloc, index, addr);
> - if (!page) {
> - ret = -ENOMEM;
> - goto out;
> - }
> -
> - mmap_read_lock(alloc->mm);
> - vma = vma_lookup(alloc->mm, addr);
> - if (!vma || !binder_alloc_is_mapped(alloc)) {
> - mmap_read_unlock(alloc->mm);
> - __free_page(page);
> - pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> - ret = -ESRCH;
> - goto out;
> - }
> + if (!page)
> + return -ENOMEM;
>
> - ret = vm_insert_page(vma, addr, page);
> + ret = binder_page_insert(alloc, addr, page);
> switch (ret) {
> case -EBUSY:
> /*
> @@ -288,12 +321,11 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> */
> ret = 0;
> __free_page(page);
> - npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
> - if (npages <= 0) {
> + page = binder_page_lookup(alloc->mm, addr);
> + if (!page) {
> pr_err("%d: failed to find page at offset %lx\n",
> alloc->pid, addr - alloc->vm_start);
> - ret = -ESRCH;
> - break;
> + return -ESRCH;
> }
> fallthrough;
> case 0:
> @@ -304,12 +336,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> __free_page(page);
> pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> alloc->pid, __func__, addr - alloc->vm_start, ret);
> - ret = -ENOMEM;
> break;
> }
> - mmap_read_unlock(alloc->mm);
> -out:
> - mmput_async(alloc->mm);
> +
> return ret;
> }
>
> --
> 2.47.0.199.ga7371fff76-goog
>
* Re: [PATCH v2 0/8] binder: faster page installations
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
` (7 preceding siblings ...)
2024-11-07 4:02 ` [PATCH v2 8/8] binder: use per-vma lock in page installation Carlos Llamas
@ 2024-11-07 16:18 ` Suren Baghdasaryan
8 siblings, 0 replies; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 16:18 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, linux-kernel, kernel-team, Alice Ryhl,
David Hildenbrand, Liam R. Howlett, Matthew Wilcox, Minchan Kim,
Nhat Pham, Johannes Weiner, Todd Kjos, Viktor Martensson,
Hillf Danton, Lorenzo Stoakes
On Wed, Nov 6, 2024 at 8:02 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> The main focus of these patches is to improve the performance of binder
> page installations, primarily by reducing contention on the mmap_lock.
> The idea is to allow concurrent page insertion by leveraging per-vma
> locking and get_user_pages_remote().
>
> Unfortunately, this required reverting the alloc->lock spinlock back
> into a mutex in order to serialize with the shrinker. At least until
> finding a better solution e.g. support page zapping with a spinlock.
> The trade-off is still quite worth it though.
>
> Other patches are also included that remove unsafe and redundant things
> such as the alloc->vma pointer or the struct binder_lru_page concept.
>
> Note: I'll work on setting up a page fault handler for binder next.
> I believe an idea from Alice Ryhl to defer the page insertions will
> make this finally feasible. I only need to figure out a few performance
> bits but if/when done most of the manual page insertion code in binder
> could be dropped. :)
>
> Changelog:
>
> v2:
> * fix locking order when upgrading from vma lock to mmap lock
> * switch folio_walk_start() for get_user_pages_remote()
> * release vma/mmap locks and mmput() right after vm_insert_page()
> * add binder_page_alloc() helper for binder_install_single_page()
>
> v1:
> https://lore.kernel.org/all/20241105200258.2380168-1-cmllamas@google.com/
>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Viktor Martensson <vmartensson@google.com>
> Cc: Hillf Danton <hdanton@sina.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Some nits but overall looks quite good.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> Carlos Llamas (8):
> Revert "binder: switch alloc->mutex to spinlock_t"
> binder: concurrent page installation
> binder: select correct nid for pages in LRU
> binder: remove struct binder_lru_page
> binder: use alloc->mapped to save the vma state
> binder: remove cached alloc->vma pointer
> binder: rename alloc->buffer to vm_start
> binder: use per-vma lock in page installation
>
> drivers/android/binder.c | 2 +-
> drivers/android/binder_alloc.c | 322 ++++++++++++++----------
> drivers/android/binder_alloc.h | 35 +--
> drivers/android/binder_alloc_selftest.c | 18 +-
> drivers/android/binder_trace.h | 2 +-
> 5 files changed, 212 insertions(+), 167 deletions(-)
>
> --
> 2.47.0.199.ga7371fff76-goog
>
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 16:16 ` Suren Baghdasaryan
@ 2024-11-07 17:55 ` Carlos Llamas
2024-11-07 18:04 ` Suren Baghdasaryan
0 siblings, 1 reply; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 17:55 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > Use per-vma locking for concurrent page installations; this minimizes
> > contention with unrelated vmas, improving performance. The mmap_lock is
> > still acquired when needed though, e.g. before get_user_pages_remote().
> >
> > Many thanks to Barry Song who posted a similar approach [1].
> >
> > Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> > Cc: Nhat Pham <nphamcs@gmail.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Barry Song <v-songbaohua@oppo.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Hillf Danton <hdanton@sina.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
> > ---
> > drivers/android/binder_alloc.c | 85 +++++++++++++++++++++++-----------
> > 1 file changed, 57 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index 814435a2601a..debfa541e01b 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -233,6 +233,56 @@ static inline bool binder_alloc_is_mapped(struct binder_alloc *alloc)
> > return smp_load_acquire(&alloc->mapped);
> > }
> >
> > +static struct page *binder_page_lookup(struct mm_struct *mm,
>
> Maybe pass "struct binder_alloc" in both binder_page_lookup() and
> binder_page_insert()?
I'm not sure this is worth it though. Yeah, it would match with
binder_page_insert() nicely, but also there is no usage for alloc in
binder_page_lookup(). Its only purpose would be to access the mm:
static struct page *binder_page_lookup(struct binder_alloc *alloc,
unsigned long addr)
{
struct mm_struct *mm = alloc->mm;
If you think this is cleaner I really don't mind adding it for v3.
> I like how previous code stabilized mm with mmget_not_zero() once vs
> now binder_page_lookup() and binder_page_insert() have to mmget/mmput
> individually. Not a big deal but looked cleaner.
Sure, I can factor this out (the way it was in v1).
>
> > + unsigned long addr)
> > +{
> > + struct page *page;
> > + long ret;
> > +
> > + if (!mmget_not_zero(mm))
> > + return NULL;
> > +
> > + mmap_read_lock(mm);
> > + ret = get_user_pages_remote(mm, addr, 1, 0, &page, NULL);
> > + mmap_read_unlock(mm);
> > + mmput_async(mm);
> > +
> > + return ret > 0 ? page : NULL;
> > +}
> > +
> > +static int binder_page_insert(struct binder_alloc *alloc,
> > + unsigned long addr,
> > + struct page *page)
> > +{
> > + struct mm_struct *mm = alloc->mm;
> > + struct vm_area_struct *vma;
> > + int ret = -ESRCH;
> > +
> > + if (!mmget_not_zero(mm))
> > + return -ESRCH;
> > +
> > + /* attempt per-vma lock first */
> > + vma = lock_vma_under_rcu(mm, addr);
> > + if (!vma)
> > + goto lock_mmap;
> > +
> > + if (binder_alloc_is_mapped(alloc))
>
> I don't think you need this check here. lock_vma_under_rcu() ensures
> that the VMA was not detached from the tree after locking the VMA, so
> if you got a VMA it's in the tree and it can't be removed (because
> it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> called after VMA gets detached from the tree and that won't happen
> while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> here to ensure that might be a better option.
Yes, we are guaranteed to have _a_ non-isolated vma. However, the check
validates that it's the _expected_ vma. IIUC, our vma could have been
unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
been assigned the same address range?
The binder_alloc_is_mapped() checks if the vma belongs to binder. This
reminds me, I should also check this for get_user_pages_remote().
>
> > + ret = vm_insert_page(vma, addr, page);
> > + vma_end_read(vma);
> > + goto done;
>
> I think the code would be more readable without these jumps:
>
> vma = lock_vma_under_rcu(mm, addr);
> if (vma) {
> if (!WARN_ON(!binder_alloc_is_mapped(alloc)))
> ret = vm_insert_page(vma, addr, page);
> vma_end_read(vma);
> } else {
> /* fall back to mmap_lock */
> mmap_read_lock(mm);
> vma = vma_lookup(mm, addr);
> if (vma && binder_alloc_is_mapped(alloc))
> ret = vm_insert_page(vma, addr, page);
> mmap_read_unlock(mm);
> }
> mmput_async(mm);
> return ret;
Ok. I'm thinking that with mmput_async() factored out, I'll add an
early return, e.g.:
vma = lock_vma_under_rcu(mm, addr);
if (vma) {
if (binder_alloc_is_mapped(alloc))
ret = vm_insert_page(vma, addr, page);
vma_end_read(vma);
return ret;
}
/* fall back to mmap_lock */
mmap_read_lock(mm);
[...]
Thanks,
Carlos Llamas
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 17:55 ` Carlos Llamas
@ 2024-11-07 18:04 ` Suren Baghdasaryan
2024-11-07 18:19 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 18:04 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
>
> On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > >
> > > Use per-vma locking for concurrent page installations; this minimizes
> > > contention with unrelated vmas, improving performance. The mmap_lock is
> > > still acquired when needed though, e.g. before get_user_pages_remote().
> > >
> > > Many thanks to Barry Song who posted a similar approach [1].
> > >
> > > Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> > > Cc: Nhat Pham <nphamcs@gmail.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Barry Song <v-songbaohua@oppo.com>
> > > Cc: Suren Baghdasaryan <surenb@google.com>
> > > Cc: Hillf Danton <hdanton@sina.com>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Signed-off-by: Carlos Llamas <cmllamas@google.com>
> > > ---
> > > drivers/android/binder_alloc.c | 85 +++++++++++++++++++++++-----------
> > > 1 file changed, 57 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > > index 814435a2601a..debfa541e01b 100644
> > > --- a/drivers/android/binder_alloc.c
> > > +++ b/drivers/android/binder_alloc.c
> > > @@ -233,6 +233,56 @@ static inline bool binder_alloc_is_mapped(struct binder_alloc *alloc)
> > > return smp_load_acquire(&alloc->mapped);
> > > }
> > >
> > > +static struct page *binder_page_lookup(struct mm_struct *mm,
> >
> > Maybe pass "struct binder_alloc" in both binder_page_lookup() and
> > binder_page_insert()?
>
> I'm not sure this is worth it though. Yeah, it would match with
> binder_page_insert() nicely, but also there is no usage for alloc in
> binder_page_lookup(). Its only purpose would be to access the mm:
>
> static struct page *binder_page_lookup(struct binder_alloc *alloc,
> unsigned long addr)
> {
> struct mm_struct *mm = alloc->mm;
>
> If you think this is cleaner I really don't mind adding it for v3.
>
> > I like how previous code stabilized mm with mmget_not_zero() once vs
> > now binder_page_lookup() and binder_page_insert() have to mmget/mmput
> > individually. Not a big deal but looked cleaner.
>
> Sure, I can factor this out (the way it was in v1).
>
> >
> > > + unsigned long addr)
> > > +{
> > > + struct page *page;
> > > + long ret;
> > > +
> > > + if (!mmget_not_zero(mm))
> > > + return NULL;
> > > +
> > > + mmap_read_lock(mm);
> > > + ret = get_user_pages_remote(mm, addr, 1, 0, &page, NULL);
> > > + mmap_read_unlock(mm);
> > > + mmput_async(mm);
> > > +
> > > + return ret > 0 ? page : NULL;
> > > +}
> > > +
> > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > + unsigned long addr,
> > > + struct page *page)
> > > +{
> > > + struct mm_struct *mm = alloc->mm;
> > > + struct vm_area_struct *vma;
> > > + int ret = -ESRCH;
> > > +
> > > + if (!mmget_not_zero(mm))
> > > + return -ESRCH;
> > > +
> > > + /* attempt per-vma lock first */
> > > + vma = lock_vma_under_rcu(mm, addr);
> > > + if (!vma)
> > > + goto lock_mmap;
> > > +
> > > + if (binder_alloc_is_mapped(alloc))
> >
> > I don't think you need this check here. lock_vma_under_rcu() ensures
> > that the VMA was not detached from the tree after locking the VMA, so
> > if you got a VMA it's in the tree and it can't be removed (because
> > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > called after VMA gets detached from the tree and that won't happen
> > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > here to ensure that might be a better option.
>
> Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> validates that it's the _expected_ vma. IIUC, our vma could have been
> unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> gotten the same address space assigned?
No, this should never happen. lock_vma_under_rcu() specifically checks
the address range *after* it locks the VMA:
https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
>
> The binder_alloc_is_mapped() checks if the vma belongs to binder. This
> reminds me, I should also check this for get_user_pages_remote().
>
> >
> > > + ret = vm_insert_page(vma, addr, page);
> > > + vma_end_read(vma);
> > > + goto done;
> >
> > I think the code would be more readable without these jumps:
> >
> > vma = lock_vma_under_rcu(mm, addr);
> > if (vma) {
> > if (!WARN_ON(!binder_alloc_is_mapped(alloc)))
> > ret = vm_insert_page(vma, addr, page);
> > vma_end_read(vma);
> > } else {
> > /* fall back to mmap_lock */
> > mmap_read_lock(mm);
> > vma = vma_lookup(mm, addr);
> > if (vma && binder_alloc_is_mapped(alloc))
> > ret = vm_insert_page(vma, addr, page);
> > mmap_read_unlock(mm);
> > }
> > mmput_async(mm);
> > return ret;
>
> Ok. I'm thinking that with mmput_async() factored out, I'll add an
> early return, e.g.:
>
> vma = lock_vma_under_rcu(mm, addr);
> if (vma) {
> if (binder_alloc_is_mapped(alloc))
> ret = vm_insert_page(vma, addr, page);
> vma_end_read(vma);
> return ret;
> }
>
> /* fall back to mmap_lock */
> mmap_read_lock(mm);
> [...]
>
>
> Thanks,
> Carlos Llamas
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 18:04 ` Suren Baghdasaryan
@ 2024-11-07 18:19 ` Carlos Llamas
2024-11-07 18:27 ` Suren Baghdasaryan
0 siblings, 1 reply; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 18:19 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 07, 2024 at 10:04:23AM -0800, Suren Baghdasaryan wrote:
> On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
> > On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > > + unsigned long addr,
> > > > + struct page *page)
> > > > +{
> > > > + struct mm_struct *mm = alloc->mm;
> > > > + struct vm_area_struct *vma;
> > > > + int ret = -ESRCH;
> > > > +
> > > > + if (!mmget_not_zero(mm))
> > > > + return -ESRCH;
> > > > +
> > > > + /* attempt per-vma lock first */
> > > > + vma = lock_vma_under_rcu(mm, addr);
> > > > + if (!vma)
> > > > + goto lock_mmap;
> > > > +
> > > > + if (binder_alloc_is_mapped(alloc))
> > >
> > > I don't think you need this check here. lock_vma_under_rcu() ensures
> > > that the VMA was not detached from the tree after locking the VMA, so
> > > if you got a VMA it's in the tree and it can't be removed (because
> > > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > > called after VMA gets detached from the tree and that won't happen
> > > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > > here to ensure that might be a better option.
> >
> > Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> > validates that it's the _expected_ vma. IIUC, our vma could have been
> > unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> > gotten the same address space assigned?
>
> No, this should never happen. lock_vma_under_rcu() specifically checks
> the address range *after* it locks the VMA:
> https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
The scenario I'm describing is the following:
Proc A Proc B
mmap(addr, binder_fd)
binder_page_insert()
mmget_not_zero()
munmap(addr)
alloc->mapped = false;
[...]
// mmap other vma but same addr
mmap(addr, other_fd)
vma = lock_vma_under_rcu()
Isn't there a chance that the vma Proc A receives is an unrelated
vma that was placed in the same address range?
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 18:19 ` Carlos Llamas
@ 2024-11-07 18:27 ` Suren Baghdasaryan
2024-11-07 18:52 ` Suren Baghdasaryan
0 siblings, 1 reply; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 18:27 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 7, 2024 at 10:19 AM Carlos Llamas <cmllamas@google.com> wrote:
>
> On Thu, Nov 07, 2024 at 10:04:23AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
> > > On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > > > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > > > + unsigned long addr,
> > > > > + struct page *page)
> > > > > +{
> > > > > + struct mm_struct *mm = alloc->mm;
> > > > > + struct vm_area_struct *vma;
> > > > > + int ret = -ESRCH;
> > > > > +
> > > > > + if (!mmget_not_zero(mm))
> > > > > + return -ESRCH;
> > > > > +
> > > > > + /* attempt per-vma lock first */
> > > > > + vma = lock_vma_under_rcu(mm, addr);
> > > > > + if (!vma)
> > > > > + goto lock_mmap;
> > > > > +
> > > > > + if (binder_alloc_is_mapped(alloc))
> > > >
> > > > I don't think you need this check here. lock_vma_under_rcu() ensures
> > > > that the VMA was not detached from the tree after locking the VMA, so
> > > > if you got a VMA it's in the tree and it can't be removed (because
> > > > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > > > called after VMA gets detached from the tree and that won't happen
> > > > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > > > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > > > here to ensure that might be a better option.
> > >
> > > Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> > > validates that it's the _expected_ vma. IIUC, our vma could have been
> > > unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> > > gotten the same address space assigned?
> >
> > No, this should never happen. lock_vma_under_rcu() specifically checks
> > the address range *after* it locks the VMA:
> > https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
>
> The scenario I'm describing is the following:
>
> Proc A Proc B
> mmap(addr, binder_fd)
> binder_page_insert()
> mmget_not_zero()
> munmap(addr)
> alloc->mapped = false;
> [...]
> // mmap other vma but same addr
> mmap(addr, other_fd)
>
> vma = lock_vma_under_rcu()
>
> Isn't there a chance that the vma Proc A receives is an unrelated
> vma that was placed in the same address range?
Ah, I see now. The VMA is a valid one and at the address we specified
but it does not belong to the binder. Yes, then you do need this
check.
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 18:27 ` Suren Baghdasaryan
@ 2024-11-07 18:52 ` Suren Baghdasaryan
2024-11-07 19:33 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Suren Baghdasaryan @ 2024-11-07 18:52 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 7, 2024 at 10:27 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:19 AM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > On Thu, Nov 07, 2024 at 10:04:23AM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
> > > > On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > > > > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > > > > + unsigned long addr,
> > > > > > + struct page *page)
> > > > > > +{
> > > > > > + struct mm_struct *mm = alloc->mm;
> > > > > > + struct vm_area_struct *vma;
> > > > > > + int ret = -ESRCH;
> > > > > > +
> > > > > > + if (!mmget_not_zero(mm))
> > > > > > + return -ESRCH;
> > > > > > +
> > > > > > + /* attempt per-vma lock first */
> > > > > > + vma = lock_vma_under_rcu(mm, addr);
> > > > > > + if (!vma)
> > > > > > + goto lock_mmap;
> > > > > > +
> > > > > > + if (binder_alloc_is_mapped(alloc))
> > > > >
> > > > > I don't think you need this check here. lock_vma_under_rcu() ensures
> > > > > that the VMA was not detached from the tree after locking the VMA, so
> > > > > if you got a VMA it's in the tree and it can't be removed (because
> > > > > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > > > > called after VMA gets detached from the tree and that won't happen
> > > > > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > > > > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > > > > here to ensure that might be a better option.
> > > >
> > > > Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> > > > validates that it's the _expected_ vma. IIUC, our vma could have been
> > > > unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> > > > gotten the same address space assigned?
> > >
> > > No, this should never happen. lock_vma_under_rcu() specifically checks
> > > the address range *after* it locks the VMA:
> > > https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
> >
> > The scenario I'm describing is the following:
> >
> > Proc A Proc B
> > mmap(addr, binder_fd)
> > binder_page_insert()
> > mmget_not_zero()
> > munmap(addr)
> > alloc->mapped = false;
> > [...]
> > // mmap other vma but same addr
> > mmap(addr, other_fd)
> >
> > vma = lock_vma_under_rcu()
> >
> > Isn't there a chance that the vma Proc A receives is an unrelated
> > vma that was placed in the same address range?
>
> Ah, I see now. The VMA is a valid one and at the address we specified
> but it does not belong to the binder. Yes, then you do need this
> check.
Is this scenario possible?:
Proc A Proc B
mmap(addr, binder_fd)
binder_page_insert()
mmget_not_zero()
munmap(addr)
alloc->mapped = false;
[...]
// mmap other vma but same addr
mmap(addr, other_fd)
mmap(other_addr, binder_fd)
vma = lock_vma_under_rcu(addr)
If so, I think your binder_alloc_is_mapped() check will return true
but the binder area is mapped at a different other_addr. To avoid that
I think you can check that "addr" still belongs to [alloc->vm_start,
alloc->vm_start + alloc->buffer_size) after you obtained and locked the VMA.
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 18:52 ` Suren Baghdasaryan
@ 2024-11-07 19:33 ` Carlos Llamas
2024-11-07 19:40 ` Carlos Llamas
0 siblings, 1 reply; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 19:33 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 07, 2024 at 10:52:30AM -0800, Suren Baghdasaryan wrote:
> On Thu, Nov 7, 2024 at 10:27 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 10:19 AM Carlos Llamas <cmllamas@google.com> wrote:
> > >
> > > On Thu, Nov 07, 2024 at 10:04:23AM -0800, Suren Baghdasaryan wrote:
> > > > On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > > > > > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > > > > > + unsigned long addr,
> > > > > > > + struct page *page)
> > > > > > > +{
> > > > > > > + struct mm_struct *mm = alloc->mm;
> > > > > > > + struct vm_area_struct *vma;
> > > > > > > + int ret = -ESRCH;
> > > > > > > +
> > > > > > > + if (!mmget_not_zero(mm))
> > > > > > > + return -ESRCH;
> > > > > > > +
> > > > > > > + /* attempt per-vma lock first */
> > > > > > > + vma = lock_vma_under_rcu(mm, addr);
> > > > > > > + if (!vma)
> > > > > > > + goto lock_mmap;
> > > > > > > +
> > > > > > > + if (binder_alloc_is_mapped(alloc))
> > > > > >
> > > > > > I don't think you need this check here. lock_vma_under_rcu() ensures
> > > > > > that the VMA was not detached from the tree after locking the VMA, so
> > > > > > if you got a VMA it's in the tree and it can't be removed (because
> > > > > > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > > > > > called after VMA gets detached from the tree and that won't happen
> > > > > > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > > > > > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > > > > > here to ensure that might be a better option.
> > > > >
> > > > > Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> > > > > validates that it's the _expected_ vma. IIUC, our vma could have been
> > > > > unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> > > > > gotten the same address space assigned?
> > > >
> > > > No, this should never happen. lock_vma_under_rcu() specifically checks
> > > > the address range *after* it locks the VMA:
> > > > https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
> > >
> > > The scenario I'm describing is the following:
> > >
> > > Proc A Proc B
> > > mmap(addr, binder_fd)
> > > binder_page_insert()
> > > mmget_not_zero()
> > > munmap(addr)
> > > alloc->mapped = false;
> > > [...]
> > > // mmap other vma but same addr
> > > mmap(addr, other_fd)
> > >
> > > vma = lock_vma_under_rcu()
> > >
> > > Isn't there a chance that the vma Proc A receives is an unrelated
> > > vma that was placed in the same address range?
> >
> > Ah, I see now. The VMA is a valid one and at the address we specified
> > but it does not belong to the binder. Yes, then you do need this
> > check.
>
> Is this scenario possible?:
>
> Proc A Proc B
> mmap(addr, binder_fd)
> binder_page_insert()
> mmget_not_zero()
> munmap(addr)
> alloc->mapped = false;
> [...]
> // mmap other vma but same addr
> mmap(addr, other_fd)
> mmap(other_addr, binder_fd)
> vma = lock_vma_under_rcu(addr)
>
> If so, I think your binder_alloc_is_mapped() check will return true
> but the binder area is mapped at a different other_addr. To avoid that
> I think you can check that "addr" still belongs to [alloc->vm_start,
> alloc->vm_start + alloc->buffer_size) after you obtained and locked the VMA.
Wait, I thought that vm_ops->close() was called with the mmap_lock in
exclusive mode. This is where binder clears the alloc->mapped. If this
is not the case (was it ever?), then I'd definitely need to fix this.
I'll have a closer look.
Thanks,
Carlos Llamas
* Re: [PATCH v2 8/8] binder: use per-vma lock in page installation
2024-11-07 19:33 ` Carlos Llamas
@ 2024-11-07 19:40 ` Carlos Llamas
0 siblings, 0 replies; 23+ messages in thread
From: Carlos Llamas @ 2024-11-07 19:40 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, linux-kernel,
kernel-team, Nhat Pham, Johannes Weiner, Barry Song, Hillf Danton,
Lorenzo Stoakes
On Thu, Nov 07, 2024 at 07:33:24PM +0000, Carlos Llamas wrote:
> On Thu, Nov 07, 2024 at 10:52:30AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Nov 7, 2024 at 10:27 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 10:19 AM Carlos Llamas <cmllamas@google.com> wrote:
> > > >
> > > > On Thu, Nov 07, 2024 at 10:04:23AM -0800, Suren Baghdasaryan wrote:
> > > > > On Thu, Nov 7, 2024 at 9:55 AM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > > On Thu, Nov 07, 2024 at 08:16:39AM -0800, Suren Baghdasaryan wrote:
> > > > > > > On Wed, Nov 6, 2024 at 8:03 PM Carlos Llamas <cmllamas@google.com> wrote:
> > > > > > > > +static int binder_page_insert(struct binder_alloc *alloc,
> > > > > > > > + unsigned long addr,
> > > > > > > > + struct page *page)
> > > > > > > > +{
> > > > > > > > + struct mm_struct *mm = alloc->mm;
> > > > > > > > + struct vm_area_struct *vma;
> > > > > > > > + int ret = -ESRCH;
> > > > > > > > +
> > > > > > > > + if (!mmget_not_zero(mm))
> > > > > > > > + return -ESRCH;
> > > > > > > > +
> > > > > > > > + /* attempt per-vma lock first */
> > > > > > > > + vma = lock_vma_under_rcu(mm, addr);
> > > > > > > > + if (!vma)
> > > > > > > > + goto lock_mmap;
> > > > > > > > +
> > > > > > > > + if (binder_alloc_is_mapped(alloc))
> > > > > > >
> > > > > > > I don't think you need this check here. lock_vma_under_rcu() ensures
> > > > > > > that the VMA was not detached from the tree after locking the VMA, so
> > > > > > > if you got a VMA it's in the tree and it can't be removed (because
> > > > > > > it's locked). remove_vma()->vma_close()->vma->vm_ops->close() is
> > > > > > > called after VMA gets detached from the tree and that won't happen
> > > > > > > while VMA is locked. So, if lock_vma_under_rcu() returns a VMA,
> > > > > > > binder_alloc_is_mapped() has to always return true. A WARN_ON() check
> > > > > > > here to ensure that might be a better option.
> > > > > >
> > > > > > Yes we are guaranteed to have _a_ non-isolated vma. However, the check
> > > > > > validates that it's the _expected_ vma. IIUC, our vma could have been
> > > > > > unmapped (clearing alloc->mapped) and a _new_ unrelated vma could have
> > > > > > gotten the same address space assigned?
> > > > >
> > > > > No, this should never happen. lock_vma_under_rcu() specifically checks
> > > > > the address range *after* it locks the VMA:
> > > > > https://elixir.bootlin.com/linux/v6.11.6/source/mm/memory.c#L6026
> > > >
> > > > The scenario I'm describing is the following:
> > > >
> > > > Proc A Proc B
> > > > mmap(addr, binder_fd)
> > > > binder_page_insert()
> > > > mmget_not_zero()
> > > > munmap(addr)
> > > > alloc->mapped = false;
> > > > [...]
> > > > // mmap other vma but same addr
> > > > mmap(addr, other_fd)
> > > >
> > > > vma = lock_vma_under_rcu()
> > > >
> > > > Isn't there a chance that the vma Proc A receives is an unrelated
> > > > vma that was placed in the same address range?
> > >
> > > Ah, I see now. The VMA is a valid one and at the address we specified
> > > but it does not belong to the binder. Yes, then you do need this
> > > check.
> >
> > Is this scenario possible?:
> >
> > Proc A Proc B
> > mmap(addr, binder_fd)
> > binder_page_insert()
> > mmget_not_zero()
> > munmap(addr)
> > alloc->mapped = false;
> > [...]
> > // mmap other vma but same addr
> > mmap(addr, other_fd)
> > mmap(other_addr, binder_fd)
> > vma = lock_vma_under_rcu(addr)
> >
> > If so, I think your binder_alloc_is_mapped() check will return true
> > but the binder area is mapped at a different other_addr. To avoid that
> > I think you can check that "addr" still belongs to [alloc->vm_start,
> > alloc->vm_start + alloc->buffer_size) after you obtained and locked the VMA.
>
> Wait, I thought that vm_ops->close() was called with the mmap_lock in
> exclusive mode. This is where binder clears the alloc->mapped. If this
> is not the case (was it ever?), then I'd definitely need to fix this.
On second thought, we don't need the mmap_lock in exclusive mode. When
we acquire the vma through lock_vma_under_rcu() we are guaranteed that
it is not isolated, and if our alloc->mapped is still set, our mapping
has not been unmapped either. So we are good to proceed.
--
Carlos Llamas
end of thread, other threads:[~2024-11-07 19:40 UTC | newest]
Thread overview: 23+ messages
2024-11-07 4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
2024-11-07 8:56 ` Greg Kroah-Hartman
2024-11-07 14:30 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 2/8] binder: concurrent page installation Carlos Llamas
2024-11-07 15:10 ` Suren Baghdasaryan
2024-11-07 15:31 ` Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 3/8] binder: select correct nid for pages in LRU Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 4/8] binder: remove struct binder_lru_page Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 5/8] binder: use alloc->mapped to save the vma state Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 6/8] binder: remove cached alloc->vma pointer Carlos Llamas
2024-11-07 15:33 ` Suren Baghdasaryan
2024-11-07 4:02 ` [PATCH v2 7/8] binder: rename alloc->buffer to vm_start Carlos Llamas
2024-11-07 4:02 ` [PATCH v2 8/8] binder: use per-vma lock in page installation Carlos Llamas
2024-11-07 16:16 ` Suren Baghdasaryan
2024-11-07 17:55 ` Carlos Llamas
2024-11-07 18:04 ` Suren Baghdasaryan
2024-11-07 18:19 ` Carlos Llamas
2024-11-07 18:27 ` Suren Baghdasaryan
2024-11-07 18:52 ` Suren Baghdasaryan
2024-11-07 19:33 ` Carlos Llamas
2024-11-07 19:40 ` Carlos Llamas
2024-11-07 16:18 ` [PATCH v2 0/8] binder: faster page installations Suren Baghdasaryan