* [PATCH 01/21] binder: use EPOLLERR from eventpoll.h
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:07 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 02/21] binder: fix use-after-free in shrinker's callback Carlos Llamas
` (19 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan, Eric Biggers
Cc: linux-kernel, kernel-team, stable
Use EPOLLERR instead of POLLERR to make sure it is cast to the correct
__poll_t type. This fixes the following sparse issue:
drivers/android/binder.c:5030:24: warning: incorrect type in return expression (different base types)
drivers/android/binder.c:5030:24: expected restricted __poll_t
drivers/android/binder.c:5030:24: got int
Fixes: f88982679f54 ("binder: check for binder_thread allocation failure in binder_poll()")
Cc: stable@vger.kernel.org
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 92128aae2d06..71a40a4c546f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -5030,7 +5030,7 @@ static __poll_t binder_poll(struct file *filp,
thread = binder_get_thread(proc);
if (!thread)
- return POLLERR;
+ return EPOLLERR;
binder_inner_proc_lock(thread->proc);
thread->looper |= BINDER_LOOPER_STATE_POLL;
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 01/21] binder: use EPOLLERR from eventpoll.h
2023-11-02 18:59 ` [PATCH 01/21] binder: use EPOLLERR from eventpoll.h Carlos Llamas
@ 2023-11-07 9:07 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:07 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Eric Biggers,
Greg Kroah-Hartman, Joel Fernandes, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos, kernel-team, linux-kernel, stable
Carlos Llamas <cmllamas@google.com> writes:
> Use EPOLLERR instead of POLLERR to make sure it is cast to the correct
> __poll_t type. This fixes the following sparse issue:
>
> drivers/android/binder.c:5030:24: warning: incorrect type in return expression (different base types)
> drivers/android/binder.c:5030:24: expected restricted __poll_t
> drivers/android/binder.c:5030:24: got int
>
> Fixes: f88982679f54 ("binder: check for binder_thread allocation failure in binder_poll()")
> Cc: stable@vger.kernel.org
> Cc: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 02/21] binder: fix use-after-free in shrinker's callback
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
2023-11-02 18:59 ` [PATCH 01/21] binder: use EPOLLERR from eventpoll.h Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-02 19:20 ` Liam R. Howlett
2023-11-07 9:07 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 03/21] binder: fix race between mmput() and do_exit() Carlos Llamas
` (18 subsequent siblings)
20 siblings, 2 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
Kirill A. Shutemov, Matthew Wilcox, Michal Hocko
Cc: linux-kernel, kernel-team, stable, Liam Howlett, Minchan Kim
The mmap read lock is used during the shrinker's callback, which means
that using alloc->vma pointer isn't safe as it can race with munmap().
As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and
triggering page reclaiming through the shrinker's debug sysfs. The
following KASAN report confirms the UAF:
==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478
CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
zap_page_range_single+0x470/0x4b8
binder_alloc_free_page+0x608/0xadc
__list_lru_walk_one+0x130/0x3b0
list_lru_walk_node+0xc4/0x22c
binder_shrink_scan+0x108/0x1dc
shrinker_debugfs_scan_write+0x2b4/0x500
full_proxy_write+0xd4/0x140
vfs_write+0x1ac/0x758
ksys_write+0xf0/0x1dc
__arm64_sys_write+0x6c/0x9c
Allocated by task 492:
kmem_cache_alloc+0x130/0x368
vm_area_alloc+0x2c/0x190
mmap_region+0x258/0x18bc
do_mmap+0x694/0xa60
vm_mmap_pgoff+0x170/0x29c
ksys_mmap_pgoff+0x290/0x3a0
__arm64_sys_mmap+0xcc/0x144
Freed by task 491:
kmem_cache_free+0x17c/0x3c8
vm_area_free_rcu_cb+0x74/0x98
rcu_core+0xa38/0x26d4
rcu_core_si+0x10/0x1c
__do_softirq+0x2fc/0xd24
Last potentially related work creation:
__call_rcu_common.constprop.0+0x6c/0xba0
call_rcu+0x10/0x1c
vm_area_free+0x18/0x24
remove_vma+0xe4/0x118
do_vmi_align_munmap.isra.0+0x718/0xb5c
do_vmi_munmap+0xdc/0x1fc
__vm_munmap+0x10c/0x278
__arm64_sys_munmap+0x58/0x7c
Fix this issue by performing instead a vma_lookup() which will fail to
find the vma that was isolated before the mmap lock downgrade. Note that
this option has better performance than upgrading to a mmap write lock
which would increase contention. Plus, mmap_write_trylock() has been
recently removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable@vger.kernel.org
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- vma = binder_alloc_get_vma(alloc);
+ vma = vma_lookup(mm, page_addr);
+ if (vma && vma != binder_alloc_get_vma(alloc))
+ goto err_invalid_vma;
list_lru_isolate(lru, item);
spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;
+err_invalid_vma:
+ mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 02/21] binder: fix use-after-free in shrinker's callback
2023-11-02 18:59 ` [PATCH 02/21] binder: fix use-after-free in shrinker's callback Carlos Llamas
@ 2023-11-02 19:20 ` Liam R. Howlett
2023-11-02 20:09 ` Carlos Llamas
2023-11-07 9:07 ` Alice Ryhl
1 sibling, 1 reply; 57+ messages in thread
From: Liam R. Howlett @ 2023-11-02 19:20 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner,
Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
Kirill A. Shutemov, Matthew Wilcox, Michal Hocko, linux-kernel,
kernel-team, stable, Minchan Kim
* Carlos Llamas <cmllamas@google.com> [231102 15:00]:
> The mmap read lock is used during the shrinker's callback, which means
> that using alloc->vma pointer isn't safe as it can race with munmap().
I think you know my feelings about the safety of that pointer from
previous discussions.
> As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
> munmap") the mmap lock is downgraded after the vma has been isolated.
>
> I was able to reproduce this issue by manually adding some delays and
> triggering page reclaiming through the shrinker's debug sysfs. The
> following KASAN report confirms the UAF:
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
> Read of size 8 at addr ffff356ed50e50f0 by task bash/478
>
> CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> zap_page_range_single+0x470/0x4b8
> binder_alloc_free_page+0x608/0xadc
> __list_lru_walk_one+0x130/0x3b0
> list_lru_walk_node+0xc4/0x22c
> binder_shrink_scan+0x108/0x1dc
> shrinker_debugfs_scan_write+0x2b4/0x500
> full_proxy_write+0xd4/0x140
> vfs_write+0x1ac/0x758
> ksys_write+0xf0/0x1dc
> __arm64_sys_write+0x6c/0x9c
>
> Allocated by task 492:
> kmem_cache_alloc+0x130/0x368
> vm_area_alloc+0x2c/0x190
> mmap_region+0x258/0x18bc
> do_mmap+0x694/0xa60
> vm_mmap_pgoff+0x170/0x29c
> ksys_mmap_pgoff+0x290/0x3a0
> __arm64_sys_mmap+0xcc/0x144
>
> Freed by task 491:
> kmem_cache_free+0x17c/0x3c8
> vm_area_free_rcu_cb+0x74/0x98
> rcu_core+0xa38/0x26d4
> rcu_core_si+0x10/0x1c
> __do_softirq+0x2fc/0xd24
>
> Last potentially related work creation:
> __call_rcu_common.constprop.0+0x6c/0xba0
> call_rcu+0x10/0x1c
> vm_area_free+0x18/0x24
> remove_vma+0xe4/0x118
> do_vmi_align_munmap.isra.0+0x718/0xb5c
> do_vmi_munmap+0xdc/0x1fc
> __vm_munmap+0x10c/0x278
> __arm64_sys_munmap+0x58/0x7c
>
> Fix this issue by performing instead a vma_lookup() which will fail to
> find the vma that was isolated before the mmap lock downgrade. Note that
> this option has better performance than upgrading to a mmap write lock
> which would increase contention. Plus, mmap_write_trylock() has been
> recently removed anyway.
>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Cc: stable@vger.kernel.org
> Cc: Liam Howlett <liam.howlett@oracle.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> drivers/android/binder_alloc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index e3db8297095a..c4d60d81221b 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> goto err_mmget;
> if (!mmap_read_trylock(mm))
> goto err_mmap_read_lock_failed;
> - vma = binder_alloc_get_vma(alloc);
> + vma = vma_lookup(mm, page_addr);
> + if (vma && vma != binder_alloc_get_vma(alloc))
> + goto err_invalid_vma;
Doesn't this need to be:
if (!vma || vma != binder_alloc_get_vma(alloc))
This way, we catch a different vma and a NULL vma.
Or even, just:
if (vma != binder_alloc_get_vma(alloc))
if the alloc vma cannot be NULL?
>
> list_lru_isolate(lru, item);
> spin_unlock(lock);
> @@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> mutex_unlock(&alloc->mutex);
> return LRU_REMOVED_RETRY;
>
> +err_invalid_vma:
> + mmap_read_unlock(mm);
> err_mmap_read_lock_failed:
> mmput_async(mm);
> err_mmget:
> --
> 2.42.0.869.gea05f2083d-goog
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 02/21] binder: fix use-after-free in shrinker's callback
2023-11-02 19:20 ` Liam R. Howlett
@ 2023-11-02 20:09 ` Carlos Llamas
2023-11-02 20:27 ` Liam R. Howlett
0 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 20:09 UTC (permalink / raw)
To: Liam R. Howlett
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner,
Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
Kirill A. Shutemov, Matthew Wilcox, Michal Hocko, linux-kernel,
kernel-team, stable, Minchan Kim
On Thu, Nov 02, 2023 at 03:20:51PM -0400, Liam R. Howlett wrote:
> * Carlos Llamas <cmllamas@google.com> [231102 15:00]:
> > The mmap read lock is used during the shrinker's callback, which means
> > that using alloc->vma pointer isn't safe as it can race with munmap().
>
> I think you know my feelings about the safety of that pointer from
> previous discussions.
>
Yeah. The work here is not done. We actually already store the vm_start
address in alloc->buffer, so in theory we don't even need to swap the
alloc->vma pointer; we could just drop it. So, I agree with you.
I want to include this safety "fix" along with some other work that uses
the page fault handler and get_user_pages_remote(). I've tried a quick
prototype of this and it works fine.
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index e3db8297095a..c4d60d81221b 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> > goto err_mmget;
> > if (!mmap_read_trylock(mm))
> > goto err_mmap_read_lock_failed;
> > - vma = binder_alloc_get_vma(alloc);
> > + vma = vma_lookup(mm, page_addr);
> > + if (vma && vma != binder_alloc_get_vma(alloc))
> > + goto err_invalid_vma;
>
> Doesn't this need to be:
> if (!vma || vma != binder_alloc_get_vma(alloc))
>
> This way, we catch a different vma and a NULL vma.
>
> Or even, just:
> if (vma != binder_alloc_get_vma(alloc))
>
> if the alloc vma cannot be NULL?
>
If the vma_lookup() is NULL then we still need to isolate and free the
given binder page and we obviously skip the zap() in this case.
However, if we receive a random unexpected vma because of a corrupted
address or similar, then the whole process is skipped.
Thus, why we use the check above.
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 02/21] binder: fix use-after-free in shrinker's callback
2023-11-02 20:09 ` Carlos Llamas
@ 2023-11-02 20:27 ` Liam R. Howlett
0 siblings, 0 replies; 57+ messages in thread
From: Liam R. Howlett @ 2023-11-02 20:27 UTC (permalink / raw)
To: Carlos Llamas
Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner,
Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
Kirill A. Shutemov, Matthew Wilcox, Michal Hocko, linux-kernel,
kernel-team, stable, Minchan Kim
* Carlos Llamas <cmllamas@google.com> [231102 16:09]:
...
>
> > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > > index e3db8297095a..c4d60d81221b 100644
> > > --- a/drivers/android/binder_alloc.c
> > > +++ b/drivers/android/binder_alloc.c
> > > @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> > > goto err_mmget;
> > > if (!mmap_read_trylock(mm))
> > > goto err_mmap_read_lock_failed;
> > > - vma = binder_alloc_get_vma(alloc);
> > > + vma = vma_lookup(mm, page_addr);
> > > + if (vma && vma != binder_alloc_get_vma(alloc))
> > > + goto err_invalid_vma;
> >
> > Doesn't this need to be:
> > if (!vma || vma != binder_alloc_get_vma(alloc))
> >
> > This way, we catch a different vma and a NULL vma.
> >
> > Or even, just:
> > if (vma != binder_alloc_get_vma(alloc))
> >
> > if the alloc vma cannot be NULL?
> >
>
> If the vma_lookup() is NULL then we still need to isolate and free the
> given binder page and we obviously skip the zap() in this case.
I would have thought if there was no VMA, then the entire process could
be avoided. Thanks for clarifying.
>
> However, if we receive a random unexpected vma because of a corrupted
> address or similar, then the whole process is skipped.
>
> Thus, why we use the check above.
>
> --
> Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 02/21] binder: fix use-after-free in shrinker's callback
2023-11-02 18:59 ` [PATCH 02/21] binder: fix use-after-free in shrinker's callback Carlos Llamas
2023-11-02 19:20 ` Liam R. Howlett
@ 2023-11-07 9:07 ` Alice Ryhl
1 sibling, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:07 UTC (permalink / raw)
To: Carlos Llamas
Cc: Liam R . Howlett, Andrew Morton, Arve Hjønnevåg,
Christian Brauner, Greg Kroah-Hartman, Joel Fernandes,
Kirill A . Shutemov, Martijn Coenen, Michal Hocko, Minchan Kim,
Suren Baghdasaryan, Todd Kjos, Vlastimil Babka, Matthew Wilcox,
linux-kernel, kernel-team, stable
Carlos Llamas <cmllamas@google.com> writes:
> The mmap read lock is used during the shrinker's callback, which means
> that using alloc->vma pointer isn't safe as it can race with munmap().
> As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
> munmap") the mmap lock is downgraded after the vma has been isolated.
>
> I was able to reproduce this issue by manually adding some delays and
> triggering page reclaiming through the shrinker's debug sysfs. The
> following KASAN report confirms the UAF:
>
> [...snip...]
>
> Fix this issue by performing instead a vma_lookup() which will fail to
> find the vma that was isolated before the mmap lock downgrade. Note that
> this option has better performance than upgrading to a mmap write lock
> which would increase contention. Plus, mmap_write_trylock() has been
> recently removed anyway.
>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Cc: stable@vger.kernel.org
> Cc: Liam Howlett <liam.howlett@oracle.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
This change makes sense to me, and I agree that the code still needs to
run when the vma is null.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 03/21] binder: fix race between mmput() and do_exit()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
2023-11-02 18:59 ` [PATCH 01/21] binder: use EPOLLERR from eventpoll.h Carlos Llamas
2023-11-02 18:59 ` [PATCH 02/21] binder: fix use-after-free in shrinker's callback Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:07 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 04/21] binder: fix async space check for 0-sized buffers Carlos Llamas
` (17 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan, Brian Swetland
Cc: linux-kernel, kernel-team, Greg Kroah-Hartman
Task A calls binder_update_page_range() to allocate and insert pages on
a remote address space from Task B. For this, Task A pins the remote mm
via mmget_not_zero() first. This can race with Task B's do_exit() and the
final mmput() refcount decrement will come from Task A.
Task A | Task B
------------------+------------------
mmget_not_zero() |
| do_exit()
| exit_mm()
| mmput()
mmput() |
exit_mmap() |
remove_vma() |
fput() |
In this case, the work of ____fput() from Task B is queued up in Task A
as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup
work gets executed. However, Task A instead sleeps, waiting for a reply
from Task B that never comes (it's dead).
This means the binder_deferred_release() is blocked until an unrelated
binder event forces Task A to go back to userspace. All the associated
death notifications will also be delayed until then.
In order to fix this, use mmput_async(), which will schedule the work in
the corresponding mm->async_put_work WQ instead of in Task A.
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index c4d60d81221b..046c3ae9eb8f 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -271,7 +271,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
if (mm) {
mmap_write_unlock(mm);
- mmput(mm);
+ mmput_async(mm);
}
return 0;
@@ -304,7 +304,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
err_no_vma:
if (mm) {
mmap_write_unlock(mm);
- mmput(mm);
+ mmput_async(mm);
}
return vma ? -ENOMEM : -ESRCH;
}
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 03/21] binder: fix race between mmput() and do_exit()
2023-11-02 18:59 ` [PATCH 03/21] binder: fix race between mmput() and do_exit() Carlos Llamas
@ 2023-11-07 9:07 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:07 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Greg Kroah-Hartman, Joel Fernandes, Martijn Coenen,
Suren Baghdasaryan, Brian Swetland, Todd Kjos, kernel-team,
linux-kernel
Carlos Llamas <cmllamas@google.com> writes:
> Task A calls binder_update_page_range() to allocate and insert pages on
> a remote address space from Task B. For this, Task A pins the remote mm
> via mmget_not_zero() first. This can race with Task B's do_exit() and the
> final mmput() refcount decrement will come from Task A.
>
> Task A | Task B
> ------------------+------------------
> mmget_not_zero() |
> | do_exit()
> | exit_mm()
> | mmput()
> mmput() |
> exit_mmap() |
> remove_vma() |
> fput() |
>
> In this case, the work of ____fput() from Task B is queued up in Task A
> as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup
> work gets executed. However, Task A instead sleeps, waiting for a reply
> from Task B that never comes (it's dead).
>
> This means the binder_deferred_release() is blocked until an unrelated
> binder event forces Task A to go back to userspace. All the associated
> death notifications will also be delayed until then.
>
> In order to fix this use mmput_async() that will schedule the work in
> the corresponding mm->async_put_work WQ instead of Task A.
>
> Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Nice catch!
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
The code really could use a comment, though.
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 04/21] binder: fix async space check for 0-sized buffers
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (2 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 03/21] binder: fix race between mmput() and do_exit() Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked() Carlos Llamas
` (16 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan, Sherry Yang
Cc: linux-kernel, kernel-team
Move the padding of 0-sized buffers to an earlier stage to account for
this round up during the alloc->free_async_space check.
Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 046c3ae9eb8f..9b28d0f9666d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -407,6 +407,10 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
alloc->pid, extra_buffers_size);
return ERR_PTR(-EINVAL);
}
+
+ /* Pad 0-size buffers so they get assigned unique addresses */
+ size = max(size, sizeof(void *));
+
if (is_async &&
alloc->free_async_space < size + sizeof(struct binder_buffer)) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
@@ -415,9 +419,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
return ERR_PTR(-ENOSPC);
}
- /* Pad 0-size buffers so they get assigned unique addresses */
- size = max(size, sizeof(void *));
-
while (n) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
BUG_ON(!buffer->free);
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 04/21] binder: fix async space check for 0-sized buffers
2023-11-02 18:59 ` [PATCH 04/21] binder: fix async space check for 0-sized buffers Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
sherryy, Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Move the padding of 0-sized buffers to an earlier stage to account for
> this round up during the alloc->free_async_space check.
>
> Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space")
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (3 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 04/21] binder: fix async space check for 0-sized buffers Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value Carlos Llamas
` (15 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, stable, Todd Kjos
Fix minor misspelling of the function in the comment section.
No functional changes in this patch.
Cc: stable@vger.kernel.org
Fixes: 0f966cba95c7 ("binder: add flag to clear buffer on txn complete")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 9b28d0f9666d..cd720bb5c9ce 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -707,7 +707,7 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
/*
* We could eliminate the call to binder_alloc_clear_buf()
* from binder_alloc_deferred_release() by moving this to
- * binder_alloc_free_buf_locked(). However, that could
+ * binder_free_buf_locked(). However, that could
* increase contention for the alloc mutex if clear_on_free
* is used frequently for large buffers. The mutex is not
* needed for correctness here.
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked()
2023-11-02 18:59 ` [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked() Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 6:52 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, Martijn Coenen, Suren Baghdasaryan, Todd Kjos,
Todd Kjos, kernel-team, linux-kernel, stable
Carlos Llamas <cmllamas@google.com> writes:
> Fix minor misspelling of the function in the comment section.
>
> No functional changes in this patch.
>
> Cc: stable@vger.kernel.org
> Fixes: 0f966cba95c7 ("binder: add flag to clear buffer on txn complete")
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
It's a bit confusing that the pair of methods binder_alloc_free_buf and
binder_free_buf_locked are inconsistent in whether they use the alloc_
prefix. It might be worth it to change them to be consistent?
Either way, this change LGTM.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked()
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 6:52 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 6:52 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, Martijn Coenen, Suren Baghdasaryan, Todd Kjos,
Todd Kjos, kernel-team, linux-kernel, stable
On Tue, Nov 07, 2023 at 09:08:05AM +0000, Alice Ryhl wrote:
> It's a bit confusing that the pair of methods binder_alloc_free_buf and
> binder_free_buf_locked are inconsistent in whether they user the alloc_
> prefix. It might be worth it to change them to be consistent?
Right, the prefix is consistent across the APIs but not really for the
local function names. I wouldn't mind adding the "alloc" part, it just
seemed easier to fix the comment and backport that.
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (4 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked() Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 07/21] binder: remove extern from function prototypes Carlos Llamas
` (14 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team, stable
Update the comments of binder_alloc_new_buf() to reflect that the return
value of the function is now ERR_PTR(-errno) on failure.
No functional changes in this patch.
Cc: stable@vger.kernel.org
Fixes: 57ada2fb2250 ("binder: add log information for binder transaction failures")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index cd720bb5c9ce..0e8312f4b771 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -558,7 +558,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* is the sum of the three given sizes (each rounded up to
* pointer-sized boundary)
*
- * Return: The allocated buffer or %NULL if error
+ * Return: The allocated buffer or %ERR_PTR(-errno) if error
*/
struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value
2023-11-02 18:59 ` [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen, stable,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Update the comments of binder_alloc_new_buf() to reflect that the return
> value of the function is now ERR_PTR(-errno) on failure.
>
> No functional changes in this patch.
>
> Cc: stable@vger.kernel.org
> Fixes: 57ada2fb2250 ("binder: add log information for binder transaction failures")
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 07/21] binder: remove extern from function prototypes
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (5 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 08/21] binder: keep vma addresses type as unsigned long Carlos Llamas
` (13 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The kernel coding style does not require 'extern' in function prototypes
in .h files, so remove them from drivers/android/binder_alloc.h as they
are not needed.
No functional changes in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.h | 38 +++++++++++++++++-----------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index dc1e2b01dd64..82380febdd85 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -121,27 +121,27 @@ static inline void binder_selftest_alloc(struct binder_alloc *alloc) {}
enum lru_status binder_alloc_free_page(struct list_head *item,
struct list_lru_one *lru,
spinlock_t *lock, void *cb_arg);
-extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
- size_t data_size,
- size_t offsets_size,
- size_t extra_buffers_size,
- int is_async,
- int pid);
-extern void binder_alloc_init(struct binder_alloc *alloc);
-extern int binder_alloc_shrinker_init(void);
-extern void binder_alloc_shrinker_exit(void);
-extern void binder_alloc_vma_close(struct binder_alloc *alloc);
-extern struct binder_buffer *
+struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
+ size_t data_size,
+ size_t offsets_size,
+ size_t extra_buffers_size,
+ int is_async,
+ int pid);
+void binder_alloc_init(struct binder_alloc *alloc);
+int binder_alloc_shrinker_init(void);
+void binder_alloc_shrinker_exit(void);
+void binder_alloc_vma_close(struct binder_alloc *alloc);
+struct binder_buffer *
binder_alloc_prepare_to_free(struct binder_alloc *alloc,
uintptr_t user_ptr);
-extern void binder_alloc_free_buf(struct binder_alloc *alloc,
- struct binder_buffer *buffer);
-extern int binder_alloc_mmap_handler(struct binder_alloc *alloc,
- struct vm_area_struct *vma);
-extern void binder_alloc_deferred_release(struct binder_alloc *alloc);
-extern int binder_alloc_get_allocated_count(struct binder_alloc *alloc);
-extern void binder_alloc_print_allocated(struct seq_file *m,
- struct binder_alloc *alloc);
+void binder_alloc_free_buf(struct binder_alloc *alloc,
+ struct binder_buffer *buffer);
+int binder_alloc_mmap_handler(struct binder_alloc *alloc,
+ struct vm_area_struct *vma);
+void binder_alloc_deferred_release(struct binder_alloc *alloc);
+int binder_alloc_get_allocated_count(struct binder_alloc *alloc);
+void binder_alloc_print_allocated(struct seq_file *m,
+ struct binder_alloc *alloc);
void binder_alloc_print_pages(struct seq_file *m,
struct binder_alloc *alloc);
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 07/21] binder: remove extern from function prototypes
2023-11-02 18:59 ` [PATCH 07/21] binder: remove extern from function prototypes Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> The kernel coding style does not require 'extern' in function prototypes
> in .h files, so remove them from drivers/android/binder_alloc.h as they
> are not needed.
>
> No functional changes in this patch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
* [PATCH 08/21] binder: keep vma addresses type as unsigned long
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (6 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 07/21] binder: remove extern from function prototypes Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 09/21] binder: split up binder_update_page_range() Carlos Llamas
` (12 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The vma addresses in binder are currently stored as void __user *. This
requires casting back and forth with the mm/ API, which uses unsigned
long. Since we also do internal arithmetic on these addresses, we end up
having to cast them _again_ to an integer type.
Let's stop all this unnecessary casting, which kills code readability,
and store the virtual addresses as the native unsigned long from mm/. Note
that this approach is preferred over uintptr_t as Linus explains in [1].
Opportunistically add a few cosmetic touchups.
Link: https://lore.kernel.org/all/CAHk-=wj2OHy-5e+srG1fy+ZU00TmZ1NFp6kFLbVLMXHe7A1d-g@mail.gmail.com/ [1]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder.c | 21 +++---
drivers/android/binder_alloc.c | 91 +++++++++++--------------
drivers/android/binder_alloc.h | 6 +-
drivers/android/binder_alloc_selftest.c | 6 +-
drivers/android/binder_trace.h | 2 +-
5 files changed, 57 insertions(+), 69 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 71a40a4c546f..437d1097b118 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2077,9 +2077,8 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
* Convert the address to an offset relative to
* the base of the transaction buffer.
*/
- fda_offset =
- (parent->buffer - (uintptr_t)buffer->user_data) +
- fda->parent_offset;
+ fda_offset = parent->buffer - buffer->user_data +
+ fda->parent_offset;
for (fd_index = 0; fd_index < fda->num_fds;
fd_index++) {
u32 fd;
@@ -2597,7 +2596,7 @@ static int binder_translate_fd_array(struct list_head *pf_head,
* Convert the address to an offset relative to
* the base of the transaction buffer.
*/
- fda_offset = (parent->buffer - (uintptr_t)t->buffer->user_data) +
+ fda_offset = parent->buffer - t->buffer->user_data +
fda->parent_offset;
sender_ufda_base = (void __user *)(uintptr_t)sender_uparent->buffer +
fda->parent_offset;
@@ -2672,8 +2671,9 @@ static int binder_fixup_parent(struct list_head *pf_head,
proc->pid, thread->pid);
return -EINVAL;
}
- buffer_offset = bp->parent_offset +
- (uintptr_t)parent->buffer - (uintptr_t)b->user_data;
+
+ buffer_offset = bp->parent_offset + parent->buffer - b->user_data;
+
return binder_add_fixup(pf_head, buffer_offset, bp->buffer, 0);
}
@@ -3250,7 +3250,7 @@ static void binder_transaction(struct binder_proc *proc,
ALIGN(extra_buffers_size, sizeof(void *)) -
ALIGN(secctx_sz, sizeof(u64));
- t->security_ctx = (uintptr_t)t->buffer->user_data + buf_offset;
+ t->security_ctx = t->buffer->user_data + buf_offset;
err = binder_alloc_copy_to_buffer(&target_proc->alloc,
t->buffer, buf_offset,
secctx, secctx_sz);
@@ -3527,8 +3527,7 @@ static void binder_transaction(struct binder_proc *proc,
goto err_translate_failed;
}
/* Fixup buffer pointer to target proc address space */
- bp->buffer = (uintptr_t)
- t->buffer->user_data + sg_buf_offset;
+ bp->buffer = t->buffer->user_data + sg_buf_offset;
sg_buf_offset += ALIGN(bp->length, sizeof(u64));
num_valid = (buffer_offset - off_start_offset) /
@@ -4698,7 +4697,7 @@ static int binder_thread_read(struct binder_proc *proc,
}
trd->data_size = t->buffer->data_size;
trd->offsets_size = t->buffer->offsets_size;
- trd->data.ptr.buffer = (uintptr_t)t->buffer->user_data;
+ trd->data.ptr.buffer = t->buffer->user_data;
trd->data.ptr.offsets = trd->data.ptr.buffer +
ALIGN(t->buffer->data_size,
sizeof(void *));
@@ -5981,7 +5980,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m,
}
if (buffer->target_node)
seq_printf(m, " node %d", buffer->target_node->debug_id);
- seq_printf(m, " size %zd:%zd data %pK\n",
+ seq_printf(m, " size %zd:%zd data %lx\n",
buffer->data_size, buffer->offsets_size,
buffer->user_data);
}
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 0e8312f4b771..b99d9845d69a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -125,23 +125,20 @@ static void binder_insert_allocated_buffer_locked(
static struct binder_buffer *binder_alloc_prepare_to_free_locked(
struct binder_alloc *alloc,
- uintptr_t user_ptr)
+ unsigned long user_ptr)
{
struct rb_node *n = alloc->allocated_buffers.rb_node;
struct binder_buffer *buffer;
- void __user *uptr;
-
- uptr = (void __user *)user_ptr;
while (n) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
BUG_ON(buffer->free);
- if (uptr < buffer->user_data)
+ if (user_ptr < buffer->user_data) {
n = n->rb_left;
- else if (uptr > buffer->user_data)
+ } else if (user_ptr > buffer->user_data) {
n = n->rb_right;
- else {
+ } else {
/*
* Guard against user threads attempting to
* free the buffer when in use by kernel or
@@ -168,7 +165,7 @@ static struct binder_buffer *binder_alloc_prepare_to_free_locked(
* Return: Pointer to buffer or NULL
*/
struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
- uintptr_t user_ptr)
+ unsigned long user_ptr)
{
struct binder_buffer *buffer;
@@ -179,18 +176,17 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
}
static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
- void __user *start, void __user *end)
+ unsigned long start, unsigned long end)
{
- void __user *page_addr;
- unsigned long user_page_addr;
- struct binder_lru_page *page;
struct vm_area_struct *vma = NULL;
+ struct binder_lru_page *page;
struct mm_struct *mm = NULL;
+ unsigned long page_addr;
bool need_mm = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: %s pages %pK-%pK\n", alloc->pid,
- allocate ? "allocate" : "free", start, end);
+ "%d: %s allocate pages %lx-%lx\n", alloc->pid,
+ allocate ? "allocate" : "free", start, end);
if (end <= start)
return 0;
@@ -249,18 +245,17 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
__GFP_HIGHMEM |
__GFP_ZERO);
if (!page->page_ptr) {
- pr_err("%d: binder_alloc_buf failed for page at %pK\n",
- alloc->pid, page_addr);
+ pr_err("%d: binder_alloc_buf failed for page at %lx\n",
+ alloc->pid, page_addr);
goto err_alloc_page_failed;
}
page->alloc = alloc;
INIT_LIST_HEAD(&page->lru);
- user_page_addr = (uintptr_t)page_addr;
- ret = vm_insert_page(vma, user_page_addr, page[0].page_ptr);
+ ret = vm_insert_page(vma, page_addr, page->page_ptr);
if (ret) {
pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n",
- alloc->pid, user_page_addr);
+ alloc->pid, page_addr);
goto err_vm_insert_page_failed;
}
@@ -378,9 +373,9 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_buffer *buffer;
size_t buffer_size;
struct rb_node *best_fit = NULL;
- void __user *has_page_addr;
- void __user *end_page_addr;
size_t size, data_offsets_size;
+ unsigned long has_page_addr;
+ unsigned long end_page_addr;
int ret;
/* Check binder_alloc is fully initialized */
@@ -479,15 +474,13 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
"%d: binder_alloc_buf size %zd got buffer %pK size %zd\n",
alloc->pid, size, buffer, buffer_size);
- has_page_addr = (void __user *)
- (((uintptr_t)buffer->user_data + buffer_size) & PAGE_MASK);
+ has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
WARN_ON(n && buffer_size != size);
- end_page_addr =
- (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size);
+ end_page_addr = PAGE_ALIGN(buffer->user_data + size);
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
- ret = binder_update_page_range(alloc, 1, (void __user *)
- PAGE_ALIGN((uintptr_t)buffer->user_data), end_page_addr);
+ ret = binder_update_page_range(alloc, 1, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
if (ret)
return ERR_PTR(ret);
@@ -500,7 +493,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
__func__, alloc->pid);
goto err_alloc_buf_struct_failed;
}
- new_buffer->user_data = (u8 __user *)buffer->user_data + size;
+ new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
new_buffer->free = 1;
binder_insert_free_buffer(alloc, new_buffer);
@@ -538,8 +531,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
return buffer;
err_alloc_buf_struct_failed:
- binder_update_page_range(alloc, 0, (void __user *)
- PAGE_ALIGN((uintptr_t)buffer->user_data),
+ binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
end_page_addr);
return ERR_PTR(-ENOMEM);
}
@@ -576,15 +568,14 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
return buffer;
}
-static void __user *buffer_start_page(struct binder_buffer *buffer)
+static unsigned long buffer_start_page(struct binder_buffer *buffer)
{
- return (void __user *)((uintptr_t)buffer->user_data & PAGE_MASK);
+ return buffer->user_data & PAGE_MASK;
}
-static void __user *prev_buffer_end_page(struct binder_buffer *buffer)
+static unsigned long prev_buffer_end_page(struct binder_buffer *buffer)
{
- return (void __user *)
- (((uintptr_t)(buffer->user_data) - 1) & PAGE_MASK);
+ return (buffer->user_data - 1) & PAGE_MASK;
}
static void binder_delete_free_buffer(struct binder_alloc *alloc,
@@ -599,7 +590,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (prev_buffer_end_page(prev) == buffer_start_page(buffer)) {
to_free = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK share page with %pK\n",
+ "%d: merge free, buffer %lx share page with %lx\n",
alloc->pid, buffer->user_data,
prev->user_data);
}
@@ -609,7 +600,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (buffer_start_page(next) == buffer_start_page(buffer)) {
to_free = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK share page with %pK\n",
+ "%d: merge free, buffer %lx share page with %lx\n",
alloc->pid,
buffer->user_data,
next->user_data);
@@ -618,17 +609,17 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (PAGE_ALIGNED(buffer->user_data)) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer start %pK is page aligned\n",
+ "%d: merge free, buffer start %lx is page aligned\n",
alloc->pid, buffer->user_data);
to_free = false;
}
if (to_free) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK do not share page with %pK or %pK\n",
+ "%d: merge free, buffer %lx do not share page with %lx or %lx\n",
alloc->pid, buffer->user_data,
prev->user_data,
- next ? next->user_data : NULL);
+ next ? next->user_data : 0);
binder_update_page_range(alloc, 0, buffer_start_page(buffer),
buffer_start_page(buffer) + PAGE_SIZE);
}
@@ -665,10 +656,8 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
alloc->pid, size, alloc->free_async_space);
}
- binder_update_page_range(alloc, 0,
- (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data),
- (void __user *)(((uintptr_t)
- buffer->user_data + buffer_size) & PAGE_MASK));
+ binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
+ (buffer->user_data + buffer_size) & PAGE_MASK);
rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
@@ -757,7 +746,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
SZ_4M);
mutex_unlock(&binder_alloc_mmap_lock);
- alloc->buffer = (void __user *)vma->vm_start;
+ alloc->buffer = vma->vm_start;
alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
sizeof(alloc->pages[0]),
@@ -790,7 +779,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
kfree(alloc->pages);
alloc->pages = NULL;
err_alloc_pages_failed:
- alloc->buffer = NULL;
+ alloc->buffer = 0;
mutex_lock(&binder_alloc_mmap_lock);
alloc->buffer_size = 0;
err_already_mapped:
@@ -843,7 +832,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
int i;
for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- void __user *page_addr;
+ unsigned long page_addr;
bool on_lru;
if (!alloc->pages[i].page_ptr)
@@ -853,7 +842,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
&alloc->pages[i].lru);
page_addr = alloc->buffer + i * PAGE_SIZE;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%s: %d: page %d at %pK %s\n",
+ "%s: %d: page %d at %lx %s\n",
__func__, alloc->pid, i, page_addr,
on_lru ? "on lru" : "active");
__free_page(alloc->pages[i].page_ptr);
@@ -873,7 +862,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
static void print_binder_buffer(struct seq_file *m, const char *prefix,
struct binder_buffer *buffer)
{
- seq_printf(m, "%s %d: %pK size %zd:%zd:%zd %s\n",
+ seq_printf(m, "%s %d: %lx size %zd:%zd:%zd %s\n",
prefix, buffer->debug_id, buffer->user_data,
buffer->data_size, buffer->offsets_size,
buffer->extra_buffers_size,
@@ -987,7 +976,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
struct binder_lru_page,
lru);
struct binder_alloc *alloc;
- uintptr_t page_addr;
+ unsigned long page_addr;
size_t index;
struct vm_area_struct *vma;
@@ -999,7 +988,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_page_already_freed;
index = page - alloc->pages;
- page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
+ page_addr = alloc->buffer + index * PAGE_SIZE;
mm = alloc->mm;
if (!mmget_not_zero(mm))
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 82380febdd85..cb19677a5c15 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -56,7 +56,7 @@ struct binder_buffer {
size_t data_size;
size_t offsets_size;
size_t extra_buffers_size;
- void __user *user_data;
+ unsigned long user_data;
int pid;
};
@@ -101,7 +101,7 @@ struct binder_alloc {
struct mutex mutex;
struct vm_area_struct *vma;
struct mm_struct *mm;
- void __user *buffer;
+ unsigned long buffer;
struct list_head buffers;
struct rb_root free_buffers;
struct rb_root allocated_buffers;
@@ -133,7 +133,7 @@ void binder_alloc_shrinker_exit(void);
void binder_alloc_vma_close(struct binder_alloc *alloc);
struct binder_buffer *
binder_alloc_prepare_to_free(struct binder_alloc *alloc,
- uintptr_t user_ptr);
+ unsigned long user_ptr);
void binder_alloc_free_buf(struct binder_alloc *alloc,
struct binder_buffer *buffer);
int binder_alloc_mmap_handler(struct binder_alloc *alloc,
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index c2b323bc3b3a..341c73b4a807 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -93,11 +93,11 @@ static bool check_buffer_pages_allocated(struct binder_alloc *alloc,
struct binder_buffer *buffer,
size_t size)
{
- void __user *page_addr;
- void __user *end;
+ unsigned long page_addr;
+ unsigned long end;
int page_index;
- end = (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size);
+ end = PAGE_ALIGN(buffer->user_data + size);
page_addr = buffer->user_data;
for (; page_addr < end; page_addr += PAGE_SIZE) {
page_index = (page_addr - alloc->buffer) / PAGE_SIZE;
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index 8cc07e6a4273..fe38c6fc65d0 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -317,7 +317,7 @@ DEFINE_EVENT(binder_buffer_class, binder_transaction_update_buffer_release,
TRACE_EVENT(binder_update_page_range,
TP_PROTO(struct binder_alloc *alloc, bool allocate,
- void __user *start, void __user *end),
+ unsigned long start, unsigned long end),
TP_ARGS(alloc, allocate, start, end),
TP_STRUCT__entry(
__field(int, proc)
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 08/21] binder: keep vma addresses type as unsigned long
2023-11-02 18:59 ` [PATCH 08/21] binder: keep vma addresses type as unsigned long Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:01 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> - seq_printf(m, " size %zd:%zd data %pK\n",
> + seq_printf(m, " size %zd:%zd data %lx\n",
> buffer->data_size, buffer->offsets_size,
> buffer->user_data);
This changes all of the print operations to use %lx instead of %pK,
which means that the addresses are no longer being hidden when using
kptr_restrict.
Since the pointers are all userspace pointers, it's not clear to me what
the consequences of this are. However, I'd like to confirm whether this
is an intentional change?
And if so, maybe it should use %px instead?
(Everything else in this change LGTM.)
Alice
* Re: [PATCH 08/21] binder: keep vma addresses type as unsigned long
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:01 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:01 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:13AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > - seq_printf(m, " size %zd:%zd data %pK\n",
> > + seq_printf(m, " size %zd:%zd data %lx\n",
> > buffer->data_size, buffer->offsets_size,
> > buffer->user_data);
>
> This changes all of the print operations to use %lx instead of %pK,
> which means that the addresses are no longer being hidden when using
> kptr_restrict.
>
> Since the pointers are all userspace pointers, it's not clear to me what
> the consequences of this are. However, I'd like to confirm whether this
> is an intentional change?
I confirm the change was intentional; with %pK we were giving the
impression that these are kernel pointers when they are not. However, I
do think your concern is valid. I've added a patch to v2 to deal with this.
I can tell you we are already logging the unhashed userspace addresses
in other places and we should probably fix those too.
--
Carlos Llamas
* [PATCH 09/21] binder: split up binder_update_page_range()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (7 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 08/21] binder: keep vma addresses type as unsigned long Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf() Carlos Llamas
` (11 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The binder_update_page_range() function performs both allocation and
freeing of binder pages. However, these two operations are unrelated and
have no common logic. In fact, when a free operation is requested, the
allocation logic is skipped entirely. This behavior makes the error path
unnecessarily complex. To improve readability of the code, this patch
splits the allocation and freeing operations into separate functions.
No functional changes are introduced by this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 83 ++++++++++++++++++----------------
1 file changed, 44 insertions(+), 39 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index b99d9845d69a..27c7163761c4 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -175,8 +175,36 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
return buffer;
}
-static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
- unsigned long start, unsigned long end)
+static void binder_free_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
+{
+ struct binder_lru_page *page;
+ unsigned long page_addr;
+
+ binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
+ "%d: free pages %lx-%lx\n",
+ alloc->pid, start, end);
+
+ trace_binder_update_page_range(alloc, false, start, end);
+
+ for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
+ size_t index;
+ int ret;
+
+ index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ page = &alloc->pages[index];
+
+ trace_binder_free_lru_start(alloc, index);
+
+ ret = list_lru_add(&binder_alloc_lru, &page->lru);
+ WARN_ON(!ret);
+
+ trace_binder_free_lru_end(alloc, index);
+ }
+}
+
+static int binder_allocate_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
{
struct vm_area_struct *vma = NULL;
struct binder_lru_page *page;
@@ -185,16 +213,13 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
bool need_mm = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: %s allocate pages %lx-%lx\n", alloc->pid,
- allocate ? "allocate" : "free", start, end);
+ "%d: allocate pages %lx-%lx\n",
+ alloc->pid, start, end);
if (end <= start)
return 0;
- trace_binder_update_page_range(alloc, allocate, start, end);
-
- if (allocate == 0)
- goto free_range;
+ trace_binder_update_page_range(alloc, true, start, end);
for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
page = &alloc->pages[(page_addr - alloc->buffer) / PAGE_SIZE];
@@ -270,32 +295,12 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
return 0;
-free_range:
- for (page_addr = end - PAGE_SIZE; 1; page_addr -= PAGE_SIZE) {
- bool ret;
- size_t index;
-
- index = (page_addr - alloc->buffer) / PAGE_SIZE;
- page = &alloc->pages[index];
-
- trace_binder_free_lru_start(alloc, index);
-
- ret = list_lru_add(&binder_alloc_lru, &page->lru);
- WARN_ON(!ret);
-
- trace_binder_free_lru_end(alloc, index);
- if (page_addr == start)
- break;
- continue;
-
err_vm_insert_page_failed:
- __free_page(page->page_ptr);
- page->page_ptr = NULL;
+ __free_page(page->page_ptr);
+ page->page_ptr = NULL;
err_alloc_page_failed:
err_page_ptr_cleared:
- if (page_addr == start)
- break;
- }
+ binder_free_page_range(alloc, start, page_addr);
err_no_vma:
if (mm) {
mmap_write_unlock(mm);
@@ -479,8 +484,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
end_page_addr = PAGE_ALIGN(buffer->user_data + size);
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
- ret = binder_update_page_range(alloc, 1, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
+ ret = binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
if (ret)
return ERR_PTR(ret);
@@ -531,8 +536,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
return buffer;
err_alloc_buf_struct_failed:
- binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
+ binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
return ERR_PTR(-ENOMEM);
}
@@ -620,8 +625,8 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
alloc->pid, buffer->user_data,
prev->user_data,
next ? next->user_data : 0);
- binder_update_page_range(alloc, 0, buffer_start_page(buffer),
- buffer_start_page(buffer) + PAGE_SIZE);
+ binder_free_page_range(alloc, buffer_start_page(buffer),
+ buffer_start_page(buffer) + PAGE_SIZE);
}
list_del(&buffer->entry);
kfree(buffer);
@@ -656,8 +661,8 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
alloc->pid, size, alloc->free_async_space);
}
- binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
- (buffer->user_data + buffer_size) & PAGE_MASK);
+ binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ (buffer->user_data + buffer_size) & PAGE_MASK);
rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 09/21] binder: split up binder_update_page_range()
2023-11-02 18:59 ` [PATCH 09/21] binder: split up binder_update_page_range() Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:03 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> The binder_update_page_range() function performs both allocation and
> freeing of binder pages. However, these two operations are unrelated and
> have no common logic. In fact, when a free operation is requested, the
> allocation logic is skipped entirely. This behavior makes the error path
> unnecessarily complex. To improve readability of the code, this patch
> splits the allocation and freeing operations into separate functions.
>
> No functional changes are introduced by this patch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
The part below err_vm_insert_page_failed was a bit tricky, but I agree
that this is correct.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
* Re: [PATCH 09/21] binder: split up binder_update_page_range()
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:03 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:03 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:15AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > The binder_update_page_range() function performs both allocation and
> > freeing of binder pages. However, these two operations are unrelated and
> > have no common logic. In fact, when a free operation is requested, the
> > allocation logic is skipped entirely. This behavior makes the error path
> > unnecessarily complex. To improve readability of the code, this patch
> > splits the allocation and freeing operations into separate functions.
> >
> > No functional changes are introduced by this patch.
> >
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> The part below err_vm_insert_page_failed was a bit tricky, but I agree
> that this is correct.
Yes, this was a big reason behind this patch.
--
Carlos Llamas
* [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (8 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 09/21] binder: split up binder_update_page_range() Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 11/21] binder: remove pid param " Carlos Llamas
` (10 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Extract non-critical sections from binder_alloc_new_buf_locked() that
don't require holding the alloc->mutex. While we are here, consolidate
the multiple checks for size overflow into a single statement.
Also add a few touchups to follow the coding guidelines.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 85 ++++++++++++++++++----------------
1 file changed, 44 insertions(+), 41 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 27c7163761c4..ed1f52f98b0d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -368,9 +368,7 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
- size_t data_size,
- size_t offsets_size,
- size_t extra_buffers_size,
+ size_t size,
int is_async,
int pid)
{
@@ -378,39 +376,10 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_buffer *buffer;
size_t buffer_size;
struct rb_node *best_fit = NULL;
- size_t size, data_offsets_size;
unsigned long has_page_addr;
unsigned long end_page_addr;
int ret;
- /* Check binder_alloc is fully initialized */
- if (!binder_alloc_get_vma(alloc)) {
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "%d: binder_alloc_buf, no vma\n",
- alloc->pid);
- return ERR_PTR(-ESRCH);
- }
-
- data_offsets_size = ALIGN(data_size, sizeof(void *)) +
- ALIGN(offsets_size, sizeof(void *));
-
- if (data_offsets_size < data_size || data_offsets_size < offsets_size) {
- binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: got transaction with invalid size %zd-%zd\n",
- alloc->pid, data_size, offsets_size);
- return ERR_PTR(-EINVAL);
- }
- size = data_offsets_size + ALIGN(extra_buffers_size, sizeof(void *));
- if (size < data_offsets_size || size < extra_buffers_size) {
- binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: got transaction with invalid extra_buffers_size %zd\n",
- alloc->pid, extra_buffers_size);
- return ERR_PTR(-EINVAL);
- }
-
- /* Pad 0-size buffers so they get assigned unique addresses */
- size = max(size, sizeof(void *));
-
if (is_async &&
alloc->free_async_space < size + sizeof(struct binder_buffer)) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
@@ -427,13 +396,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
if (size < buffer_size) {
best_fit = n;
n = n->rb_left;
- } else if (size > buffer_size)
+ } else if (size > buffer_size) {
n = n->rb_right;
- else {
+ } else {
best_fit = n;
break;
}
}
+
if (best_fit == NULL) {
size_t allocated_buffers = 0;
size_t largest_alloc_size = 0;
@@ -511,11 +481,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: binder_alloc_buf size %zd got %pK\n",
alloc->pid, size, buffer);
- buffer->data_size = data_size;
- buffer->offsets_size = offsets_size;
- buffer->async_transaction = is_async;
- buffer->extra_buffers_size = extra_buffers_size;
- buffer->pid = pid;
+
buffer->oneway_spam_suspect = false;
if (is_async) {
alloc->free_async_space -= size + sizeof(struct binder_buffer);
@@ -533,6 +499,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
alloc->oneway_spam_detected = false;
}
}
+
return buffer;
err_alloc_buf_struct_failed:
@@ -565,11 +532,47 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
int pid)
{
struct binder_buffer *buffer;
+ size_t size;
+
+ /* Check binder_alloc is fully initialized */
+ if (!binder_alloc_get_vma(alloc)) {
+ binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+ "%d: binder_alloc_buf, no vma\n",
+ alloc->pid);
+ return ERR_PTR(-ESRCH);
+ }
+
+ size = ALIGN(data_size, sizeof(void *)) +
+ ALIGN(offsets_size, sizeof(void *)) +
+ ALIGN(extra_buffers_size, sizeof(void *));
+
+ if (size < data_size ||
+ size < offsets_size ||
+ size < extra_buffers_size) {
+ binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
+ "%d: got transaction with invalid size %zd-%zd-%zd\n",
+ alloc->pid, data_size, offsets_size,
+ extra_buffers_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ /* Pad 0-size buffers so they get assigned unique addresses */
+ size = max(size, sizeof(void *));
mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, data_size, offsets_size,
- extra_buffers_size, is_async, pid);
+ buffer = binder_alloc_new_buf_locked(alloc, size, is_async, pid);
mutex_unlock(&alloc->mutex);
+
+ if (IS_ERR(buffer))
+ goto out;
+
+ buffer->data_size = data_size;
+ buffer->offsets_size = offsets_size;
+ buffer->async_transaction = is_async;
+ buffer->extra_buffers_size = extra_buffers_size;
+ buffer->pid = pid;
+
+out:
return buffer;
}
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf()
2023-11-02 18:59 ` [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf() Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:10 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
I found a few issues in this patch:
Carlos Llamas <cmllamas@google.com> writes:
> - data_offsets_size = ALIGN(data_size, sizeof(void *)) +
> - ALIGN(offsets_size, sizeof(void *));
> -
> - if (data_offsets_size < data_size || data_offsets_size < offsets_size) {
> - binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
> - "%d: got transaction with invalid size %zd-%zd\n",
> - alloc->pid, data_size, offsets_size);
> - return ERR_PTR(-EINVAL);
> - }
> - size = data_offsets_size + ALIGN(extra_buffers_size, sizeof(void *));
> - if (size < data_offsets_size || size < extra_buffers_size) {
> - binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
> - "%d: got transaction with invalid extra_buffers_size %zd\n",
> - alloc->pid, extra_buffers_size);
> - return ERR_PTR(-EINVAL);
> - }
> [snip]
> + size = ALIGN(data_size, sizeof(void *)) +
> + ALIGN(offsets_size, sizeof(void *)) +
> + ALIGN(extra_buffers_size, sizeof(void *));
> +
> + if (size < data_size ||
> + size < offsets_size ||
> + size < extra_buffers_size) {
> + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
> + "%d: got transaction with invalid size %zd-%zd-%zd\n",
> + alloc->pid, data_size, offsets_size,
> + extra_buffers_size);
> + return ERR_PTR(-EINVAL);
> + }
Consolidating the overflow check into one if statement like this doesn't
catch all cases of integer overflow. For example, if all three sizes are
9223372036854775816, then the computed size will be 9223372036854775832,
so this would not trigger the overflow check.
Carlos Llamas <cmllamas@google.com> writes:
> mutex_unlock(&alloc->mutex);
> +
> + if (IS_ERR(buffer))
> + goto out;
> +
> + buffer->data_size = data_size;
> + buffer->offsets_size = offsets_size;
> + buffer->async_transaction = is_async;
> + buffer->extra_buffers_size = extra_buffers_size;
> + buffer->pid = pid;
With this change, if there is a concurrent call to
debug_low_async_space_locked, then there is a data race on the
async_transaction field. Similarly for print_binder_buffer.
Perhaps these writes should be moved before the mutex_unlock?
Alice
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:10 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:10 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:18AM +0000, Alice Ryhl wrote:
> I found a few issues in this patch:
>
> Consolidating the overflow check into one if statement like this doesn't
> catch all cases of integer overflow. For example, if all three sizes are
> 9223372036854775816, then the computed size will be 9223372036854775832,
> so this would not trigger the overflow check.
Thanks for pointing this out, you are right.
I don't understand the reasoning behind using size_t for the uapi. It
just made things more complicated than needed. These sizes are much
larger than the maximum buffer size of SZ_4M.
Anyway, I've fixed this for v2.
>
> Carlos Llamas <cmllamas@google.com> writes:
> > mutex_unlock(&alloc->mutex);
> > +
> > + if (IS_ERR(buffer))
> > + goto out;
> > +
> > + buffer->data_size = data_size;
> > + buffer->offsets_size = offsets_size;
> > + buffer->async_transaction = is_async;
> > + buffer->extra_buffers_size = extra_buffers_size;
> > + buffer->pid = pid;
>
> With this change, if there is a concurrent call to
> debug_low_async_space_locked, then there is a data race on the
> async_transaction field. Similarly for print_binder_buffer.
>
> Perhaps these writes should be moved before the mutex_unlock?
Also fixed, thanks!
--
Carlos Llamas
* [PATCH 11/21] binder: remove pid param in binder_alloc_new_buf()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (9 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf() Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 12/21] binder: separate the no-space debugging logic Carlos Llamas
` (9 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Binder attributes the buffer allocation to the current->tgid every time.
There is no need to pass this as a parameter so drop it.
Also add a few touchups to follow the coding guidelines. No functional
changes are introduced in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder.c | 2 +-
drivers/android/binder_alloc.c | 18 ++++++++----------
drivers/android/binder_alloc.h | 7 ++-----
drivers/android/binder_alloc_selftest.c | 2 +-
4 files changed, 12 insertions(+), 17 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 437d1097b118..45674af6310f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3225,7 +3225,7 @@ static void binder_transaction(struct binder_proc *proc,
t->buffer = binder_alloc_new_buf(&target_proc->alloc, tr->data_size,
tr->offsets_size, extra_buffers_size,
- !reply && (t->flags & TF_ONE_WAY), current->tgid);
+ !reply && (t->flags & TF_ONE_WAY));
if (IS_ERR(t->buffer)) {
char *s;
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index ed1f52f98b0d..eacc5a3ce538 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -323,7 +323,7 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return smp_load_acquire(&alloc->vma);
}
-static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
+static bool debug_low_async_space_locked(struct binder_alloc *alloc)
{
/*
* Find the amount and size of buffers allocated by the current caller;
@@ -332,10 +332,11 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
* and at some point we'll catch them in the act. This is more efficient
* than keeping a map per pid.
*/
- struct rb_node *n;
struct binder_buffer *buffer;
size_t total_alloc_size = 0;
+ int pid = current->tgid;
size_t num_buffers = 0;
+ struct rb_node *n;
for (n = rb_first(&alloc->allocated_buffers); n != NULL;
n = rb_next(n)) {
@@ -369,8 +370,7 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
size_t size,
- int is_async,
- int pid)
+ int is_async)
{
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -494,7 +494,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* of async space left (which is less than 10% of total
* buffer size).
*/
- buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc, pid);
+ buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);
} else {
alloc->oneway_spam_detected = false;
}
@@ -515,7 +515,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* @offsets_size: user specified buffer offset
* @extra_buffers_size: size of extra space for meta-data (eg, security context)
* @is_async: buffer for async transaction
- * @pid: pid to attribute allocation to (used for debugging)
*
* Allocate a new buffer given the requested sizes. Returns
* the kernel version of the buffer pointer. The size allocated
@@ -528,8 +527,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
- int is_async,
- int pid)
+ int is_async)
{
struct binder_buffer *buffer;
size_t size;
@@ -560,7 +558,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size = max(size, sizeof(void *));
mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, size, is_async, pid);
+ buffer = binder_alloc_new_buf_locked(alloc, size, is_async);
mutex_unlock(&alloc->mutex);
if (IS_ERR(buffer))
@@ -570,7 +568,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
buffer->offsets_size = offsets_size;
buffer->async_transaction = is_async;
buffer->extra_buffers_size = extra_buffers_size;
- buffer->pid = pid;
+ buffer->pid = current->tgid;
out:
return buffer;
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index cb19677a5c15..bbc16bc6d5ac 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -49,15 +49,13 @@ struct binder_buffer {
unsigned async_transaction:1;
unsigned oneway_spam_suspect:1;
unsigned debug_id:27;
-
struct binder_transaction *transaction;
-
struct binder_node *target_node;
size_t data_size;
size_t offsets_size;
size_t extra_buffers_size;
unsigned long user_data;
- int pid;
+ int pid;
};
/**
@@ -125,8 +123,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
- int is_async,
- int pid);
+ int is_async);
void binder_alloc_init(struct binder_alloc *alloc);
int binder_alloc_shrinker_init(void);
void binder_alloc_shrinker_exit(void);
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 341c73b4a807..ed753747e54c 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -119,7 +119,7 @@ static void binder_selftest_alloc_buf(struct binder_alloc *alloc,
int i;
for (i = 0; i < BUFFER_NUM; i++) {
- buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0, 0);
+ buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0);
if (IS_ERR(buffers[i]) ||
!check_buffer_pages_allocated(alloc, buffers[i],
sizes[i])) {
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 11/21] binder: remove pid param in binder_alloc_new_buf()
2023-11-02 18:59 ` [PATCH 11/21] binder: remove pid param " Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Binder attributes the buffer allocation to the current->tgid every time.
> There is no need to pass this as a parameter so drop it.
>
> Also add a few touchups to follow the coding guidelines. No functional
> changes are introduced in this patch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
You could argue that this code would be cleaner without this change,
since it now relies on global state. However, I'm okay either way.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
* [PATCH 12/21] binder: separate the no-space debugging logic
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (10 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 11/21] binder: remove pid param " Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 13/21] binder: relocate low space calculation Carlos Llamas
` (8 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Move the no-space debugging logic into a separate function. Let's also
mark this branch as unlikely in binder_alloc_new_buf_locked() as most
requests will fit without issue.
Also add a few cosmetic changes and suggestions from checkpatch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 71 +++++++++++++++++++---------------
1 file changed, 40 insertions(+), 31 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index eacc5a3ce538..3fe5c734c3df 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -323,6 +323,43 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return smp_load_acquire(&alloc->vma);
}
+static void debug_no_space_locked(struct binder_alloc *alloc)
+{
+ size_t largest_alloc_size = 0;
+ struct binder_buffer *buffer;
+ size_t allocated_buffers = 0;
+ size_t largest_free_size = 0;
+ size_t total_alloc_size = 0;
+ size_t total_free_size = 0;
+ size_t free_buffers = 0;
+ size_t buffer_size;
+ struct rb_node *n;
+
+ for (n = rb_first(&alloc->allocated_buffers); n; n = rb_next(n)) {
+ buffer = rb_entry(n, struct binder_buffer, rb_node);
+ buffer_size = binder_alloc_buffer_size(alloc, buffer);
+ allocated_buffers++;
+ total_alloc_size += buffer_size;
+ if (buffer_size > largest_alloc_size)
+ largest_alloc_size = buffer_size;
+ }
+
+ for (n = rb_first(&alloc->free_buffers); n; n = rb_next(n)) {
+ buffer = rb_entry(n, struct binder_buffer, rb_node);
+ buffer_size = binder_alloc_buffer_size(alloc, buffer);
+ free_buffers++;
+ total_free_size += buffer_size;
+ if (buffer_size > largest_free_size)
+ largest_free_size = buffer_size;
+ }
+
+ binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+ "allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n",
+ total_alloc_size, allocated_buffers,
+ largest_alloc_size, total_free_size,
+ free_buffers, largest_free_size);
+}
+
static bool debug_low_async_space_locked(struct binder_alloc *alloc)
{
/*
@@ -404,42 +441,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
}
}
- if (best_fit == NULL) {
- size_t allocated_buffers = 0;
- size_t largest_alloc_size = 0;
- size_t total_alloc_size = 0;
- size_t free_buffers = 0;
- size_t largest_free_size = 0;
- size_t total_free_size = 0;
-
- for (n = rb_first(&alloc->allocated_buffers); n != NULL;
- n = rb_next(n)) {
- buffer = rb_entry(n, struct binder_buffer, rb_node);
- buffer_size = binder_alloc_buffer_size(alloc, buffer);
- allocated_buffers++;
- total_alloc_size += buffer_size;
- if (buffer_size > largest_alloc_size)
- largest_alloc_size = buffer_size;
- }
- for (n = rb_first(&alloc->free_buffers); n != NULL;
- n = rb_next(n)) {
- buffer = rb_entry(n, struct binder_buffer, rb_node);
- buffer_size = binder_alloc_buffer_size(alloc, buffer);
- free_buffers++;
- total_free_size += buffer_size;
- if (buffer_size > largest_free_size)
- largest_free_size = buffer_size;
- }
+ if (unlikely(!best_fit)) {
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf size %zd failed, no address space\n",
alloc->pid, size);
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n",
- total_alloc_size, allocated_buffers,
- largest_alloc_size, total_free_size,
- free_buffers, largest_free_size);
+ debug_no_space_locked(alloc);
return ERR_PTR(-ENOSPC);
}
+
if (n == NULL) {
buffer = rb_entry(best_fit, struct binder_buffer, rb_node);
buffer_size = binder_alloc_buffer_size(alloc, buffer);
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 12/21] binder: separate the no-space debugging logic
2023-11-02 18:59 ` [PATCH 12/21] binder: separate the no-space debugging logic Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Move the no-space debugging logic into a separate function. Let's also
> mark this branch as unlikely in binder_alloc_new_buf_locked() as most
> requests will fit without issue.
>
> Also add a few cosmetic changes and suggestions from checkpatch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
I double checked that this does not change behavior.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
* [PATCH 13/21] binder: relocate low space calculation
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (11 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 12/21] binder: separate the no-space debugging logic Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 14/21] binder: do not add pages to LRU in release path Carlos Llamas
` (7 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Move the low async space calculation to debug_low_async_space_locked().
This logic not only fits better here but also offloads some of the many
tasks currently done in binder_alloc_new_buf_locked().
No functional change in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 3fe5c734c3df..cc627c106a01 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -375,6 +375,15 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc)
size_t num_buffers = 0;
struct rb_node *n;
+ /*
+ * Only start detecting spammers once we have less than 20% of async
+ * space left (which is less than 10% of total buffer size).
+ */
+ if (alloc->free_async_space >= alloc->buffer_size / 10) {
+ alloc->oneway_spam_detected = false;
+ return false;
+ }
+
for (n = rb_first(&alloc->allocated_buffers); n != NULL;
n = rb_next(n)) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -497,16 +506,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC,
"%d: binder_alloc_buf size %zd async free %zd\n",
alloc->pid, size, alloc->free_async_space);
- if (alloc->free_async_space < alloc->buffer_size / 10) {
- /*
- * Start detecting spammers once we have less than 20%
- * of async space left (which is less than 10% of total
- * buffer size).
- */
- buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);
- } else {
- alloc->oneway_spam_detected = false;
- }
+ if (debug_low_async_space_locked(alloc))
+ buffer->oneway_spam_suspect = true;
}
return buffer;
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 13/21] binder: relocate low space calculation
2023-11-02 18:59 ` [PATCH 13/21] binder: relocate low space calculation Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:12 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Move the low async space calculation to debug_low_async_space_locked().
> This logic not only fits better here but also offloads some of the many
> tasks currently done in binder_alloc_new_buf_locked().
>
> No functional change in this patch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
One suggestion below, but I'm fine either way.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Carlos Llamas <cmllamas@google.com> writes:
> + if (debug_low_async_space_locked(alloc))
> + buffer->oneway_spam_suspect = true;
You could avoid a branch here like this:
buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);
Alice
* Re: [PATCH 13/21] binder: relocate low space calculation
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:12 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:12 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:26AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > Move the low async space calculation to debug_low_async_space_locked().
> > This logic not only fits better here but also offloads some of the many
> > tasks currently done in binder_alloc_new_buf_locked().
> >
> > No functional change in this patch.
> >
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> One suggestion below, but I'm fine either way.
>
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>
>
>
> Carlos Llamas <cmllamas@google.com> writes:
> > + if (debug_low_async_space_locked(alloc))
> > + buffer->oneway_spam_suspect = true;
>
> You could avoid a branch here like this:
>
Sure, sounds good to me.
--
Carlos Llamas
* [PATCH 14/21] binder: do not add pages to LRU in release path
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (12 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 13/21] binder: relocate low space calculation Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 15/21] binder: relocate binder_alloc_clear_buf() Carlos Llamas
` (6 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
In binder_alloc_deferred_release() pages are added to the LRU list via
binder_free_buf_locked(). However, this is pointless because these pages
are kfree'd immediately afterwards. Add an option to skip the LRU list.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index cc627c106a01..d2a38dee12db 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -595,16 +595,16 @@ static unsigned long prev_buffer_end_page(struct binder_buffer *buffer)
}
static void binder_delete_free_buffer(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
+ struct binder_buffer *buffer,
+ bool free_pages)
{
struct binder_buffer *prev, *next = NULL;
- bool to_free = true;
BUG_ON(alloc->buffers.next == &buffer->entry);
prev = binder_buffer_prev(buffer);
BUG_ON(!prev->free);
if (prev_buffer_end_page(prev) == buffer_start_page(buffer)) {
- to_free = false;
+ free_pages = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx share page with %lx\n",
alloc->pid, buffer->user_data,
@@ -614,7 +614,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (!list_is_last(&buffer->entry, &alloc->buffers)) {
next = binder_buffer_next(buffer);
if (buffer_start_page(next) == buffer_start_page(buffer)) {
- to_free = false;
+ free_pages = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx share page with %lx\n",
alloc->pid,
@@ -627,10 +627,10 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer start %lx is page aligned\n",
alloc->pid, buffer->user_data);
- to_free = false;
+ free_pages = false;
}
- if (to_free) {
+ if (free_pages) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx do not share page with %lx or %lx\n",
alloc->pid, buffer->user_data,
@@ -644,7 +644,8 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
}
static void binder_free_buf_locked(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
+ struct binder_buffer *buffer,
+ bool free_pages)
{
size_t size, buffer_size;
@@ -672,8 +673,9 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
alloc->pid, size, alloc->free_async_space);
}
- binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- (buffer->user_data + buffer_size) & PAGE_MASK);
+ if (free_pages)
+ binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ (buffer->user_data + buffer_size) & PAGE_MASK);
rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
@@ -682,14 +684,14 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
if (next->free) {
rb_erase(&next->rb_node, &alloc->free_buffers);
- binder_delete_free_buffer(alloc, next);
+ binder_delete_free_buffer(alloc, next, free_pages);
}
}
if (alloc->buffers.next != &buffer->entry) {
struct binder_buffer *prev = binder_buffer_prev(buffer);
if (prev->free) {
- binder_delete_free_buffer(alloc, buffer);
+ binder_delete_free_buffer(alloc, buffer, free_pages);
rb_erase(&prev->rb_node, &alloc->free_buffers);
buffer = prev;
}
@@ -722,7 +724,7 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
buffer->clear_on_free = false;
}
mutex_lock(&alloc->mutex);
- binder_free_buf_locked(alloc, buffer);
+ binder_free_buf_locked(alloc, buffer, true);
mutex_unlock(&alloc->mutex);
}
@@ -829,7 +831,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
binder_alloc_clear_buf(alloc, buffer);
buffer->clear_on_free = false;
}
- binder_free_buf_locked(alloc, buffer);
+ binder_free_buf_locked(alloc, buffer, false);
buffers++;
}
--
2.42.0.869.gea05f2083d-goog
* Re: [PATCH 14/21] binder: do not add pages to LRU in release path
2023-11-02 18:59 ` [PATCH 14/21] binder: do not add pages to LRU in release path Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:15 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> In binder_alloc_deferred_release() pages are added to the LRU list via
> binder_free_buf_locked(). However, this is pointless because these pages
> are kfree'd immediately afterwards. Add an option to skip the LRU list.
They aren't freed with kfree, but with __free_page.
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
The change itself looks correct, so I'll give my tag for this:
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
But I'm wondering whether process cleanup really is so performance
intensive to justify the added complexity of this?
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 14/21] binder: do not add pages to LRU in release path
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:15 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:15 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:29AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > In binder_alloc_deferred_release() pages are added to the LRU list via
> > binder_free_buf_locked(). However, this is pointless because these pages
> > are kfree'd immediately afterwards. Add an option to skip the LRU list.
>
> They aren't freed with kfree, but with __free_page.
>
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> The change itself looks correct, so I'll give my tag for this:
>
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>
>
> But I'm wondering whether process cleanup really is so performance
> intensive to justify the added complexity of this?
So, this was needed on an earlier version of the patchset and I was
hoping that it would also help with an issue reported here:
https://lore.kernel.org/all/ZSHmtLqtNZRAtaZ0@google.com/
However, I do agree that it is unnecessary at this stage so I've decided
to drop it from v2.
Thanks,
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 15/21] binder: relocate binder_alloc_clear_buf()
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (13 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 14/21] binder: do not add pages to LRU in release path Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 16/21] binder: refactor page range allocation Carlos Llamas
` (5 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Move this function up along with binder_alloc_get_page() so that their
prototypes aren't necessary.
No functional change in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 124 ++++++++++++++++-----------------
1 file changed, 61 insertions(+), 63 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index d2a38dee12db..4821a29799c8 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -699,8 +699,68 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
binder_insert_free_buffer(alloc, buffer);
}
+/**
+ * binder_alloc_get_page() - get kernel pointer for given buffer offset
+ * @alloc: binder_alloc for this proc
+ * @buffer: binder buffer to be accessed
+ * @buffer_offset: offset into @buffer data
+ * @pgoffp: address to copy final page offset to
+ *
+ * Lookup the struct page corresponding to the address
+ * at @buffer_offset into @buffer->user_data. If @pgoffp is not
+ * NULL, the byte-offset into the page is written there.
+ *
+ * The caller is responsible to ensure that the offset points
+ * to a valid address within the @buffer and that @buffer is
+ * not freeable by the user. Since it can't be freed, we are
+ * guaranteed that the corresponding elements of @alloc->pages[]
+ * cannot change.
+ *
+ * Return: struct page
+ */
+static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
+ struct binder_buffer *buffer,
+ binder_size_t buffer_offset,
+ pgoff_t *pgoffp)
+{
+ binder_size_t buffer_space_offset = buffer_offset +
+ (buffer->user_data - alloc->buffer);
+ pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
+ size_t index = buffer_space_offset >> PAGE_SHIFT;
+ struct binder_lru_page *lru_page;
+
+ lru_page = &alloc->pages[index];
+ *pgoffp = pgoff;
+ return lru_page->page_ptr;
+}
+
+/**
+ * binder_alloc_clear_buf() - zero out buffer
+ * @alloc: binder_alloc for this proc
+ * @buffer: binder buffer to be cleared
+ *
+ * memset the given buffer to 0
+ */
static void binder_alloc_clear_buf(struct binder_alloc *alloc,
- struct binder_buffer *buffer);
+ struct binder_buffer *buffer)
+{
+ size_t bytes = binder_alloc_buffer_size(alloc, buffer);
+ binder_size_t buffer_offset = 0;
+
+ while (bytes) {
+ unsigned long size;
+ struct page *page;
+ pgoff_t pgoff;
+
+ page = binder_alloc_get_page(alloc, buffer,
+ buffer_offset, &pgoff);
+ size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
+ memset_page(page, pgoff, 0, size);
+ bytes -= size;
+ buffer_offset += size;
+ }
+}
+
/**
* binder_alloc_free_buf() - free a binder buffer
* @alloc: binder_alloc for this proc
@@ -1137,68 +1197,6 @@ static inline bool check_buffer(struct binder_alloc *alloc,
(!buffer->allow_user_free || !buffer->transaction);
}
-/**
- * binder_alloc_get_page() - get kernel pointer for given buffer offset
- * @alloc: binder_alloc for this proc
- * @buffer: binder buffer to be accessed
- * @buffer_offset: offset into @buffer data
- * @pgoffp: address to copy final page offset to
- *
- * Lookup the struct page corresponding to the address
- * at @buffer_offset into @buffer->user_data. If @pgoffp is not
- * NULL, the byte-offset into the page is written there.
- *
- * The caller is responsible to ensure that the offset points
- * to a valid address within the @buffer and that @buffer is
- * not freeable by the user. Since it can't be freed, we are
- * guaranteed that the corresponding elements of @alloc->pages[]
- * cannot change.
- *
- * Return: struct page
- */
-static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
- struct binder_buffer *buffer,
- binder_size_t buffer_offset,
- pgoff_t *pgoffp)
-{
- binder_size_t buffer_space_offset = buffer_offset +
- (buffer->user_data - alloc->buffer);
- pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
- size_t index = buffer_space_offset >> PAGE_SHIFT;
- struct binder_lru_page *lru_page;
-
- lru_page = &alloc->pages[index];
- *pgoffp = pgoff;
- return lru_page->page_ptr;
-}
-
-/**
- * binder_alloc_clear_buf() - zero out buffer
- * @alloc: binder_alloc for this proc
- * @buffer: binder buffer to be cleared
- *
- * memset the given buffer to 0
- */
-static void binder_alloc_clear_buf(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
-{
- size_t bytes = binder_alloc_buffer_size(alloc, buffer);
- binder_size_t buffer_offset = 0;
-
- while (bytes) {
- unsigned long size;
- struct page *page;
- pgoff_t pgoff;
-
- page = binder_alloc_get_page(alloc, buffer,
- buffer_offset, &pgoff);
- size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
- memset_page(page, pgoff, 0, size);
- bytes -= size;
- buffer_offset += size;
- }
-}
-
/**
* binder_alloc_copy_user_to_buffer() - copy src user to tgt user
* @alloc: binder_alloc for this proc
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 15/21] binder: relocate binder_alloc_clear_buf()
2023-11-02 18:59 ` [PATCH 15/21] binder: relocate binder_alloc_clear_buf() Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Move this function up along with binder_alloc_get_page() so that their
> prototypes aren't necessary.
>
> No functional change in this patch.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
I used diff to compare them, and the diff was empty.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 16/21] binder: refactor page range allocation
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (14 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 15/21] binder: relocate binder_alloc_clear_buf() Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 17/21] binder: malloc new_buffer outside of locks Carlos Llamas
` (4 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Instead of looping through the page range twice to first determine if
the mmap lock is required, simply do it per-page as needed. Split out
all this logic into a separate binder_get_user_page_remote() function.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 107 +++++++++++++++------------------
1 file changed, 47 insertions(+), 60 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 4821a29799c8..56936430954f 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -203,14 +203,51 @@ static void binder_free_page_range(struct binder_alloc *alloc,
}
}
+static int binder_get_user_page_remote(struct binder_alloc *alloc,
+ struct binder_lru_page *lru_page,
+ unsigned long addr)
+{
+ struct page *page;
+ int ret = 0;
+
+ if (!mmget_not_zero(alloc->mm))
+ return -ESRCH;
+
+ mmap_write_lock(alloc->mm);
+ if (!alloc->vma) {
+ pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
+ ret = -ESRCH;
+ goto out;
+ }
+
+ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+ if (!page) {
+ pr_err("%d: failed to allocate page\n", alloc->pid);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = vm_insert_page(alloc->vma, addr, page);
+ if (ret) {
+ pr_err("%d: %s failed to insert page at %lx with %d\n",
+ alloc->pid, __func__, addr, ret);
+ __free_page(page);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ lru_page->page_ptr = page;
+out:
+ mmap_write_unlock(alloc->mm);
+ mmput_async(alloc->mm);
+ return ret;
+}
+
static int binder_allocate_page_range(struct binder_alloc *alloc,
unsigned long start, unsigned long end)
{
- struct vm_area_struct *vma = NULL;
struct binder_lru_page *page;
- struct mm_struct *mm = NULL;
unsigned long page_addr;
- bool need_mm = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: allocate pages %lx-%lx\n",
@@ -222,32 +259,9 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
trace_binder_update_page_range(alloc, true, start, end);
for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
- page = &alloc->pages[(page_addr - alloc->buffer) / PAGE_SIZE];
- if (!page->page_ptr) {
- need_mm = true;
- break;
- }
- }
-
- if (need_mm && mmget_not_zero(alloc->mm))
- mm = alloc->mm;
-
- if (mm) {
- mmap_write_lock(mm);
- vma = alloc->vma;
- }
-
- if (!vma && need_mm) {
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "%d: binder_alloc_buf failed to map pages in userspace, no vma\n",
- alloc->pid);
- goto err_no_vma;
- }
-
- for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
- int ret;
+ unsigned long index;
bool on_lru;
- size_t index;
+ int ret;
index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];
@@ -262,26 +276,15 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
continue;
}
- if (WARN_ON(!vma))
- goto err_page_ptr_cleared;
-
trace_binder_alloc_page_start(alloc, index);
- page->page_ptr = alloc_page(GFP_KERNEL |
- __GFP_HIGHMEM |
- __GFP_ZERO);
- if (!page->page_ptr) {
- pr_err("%d: binder_alloc_buf failed for page at %lx\n",
- alloc->pid, page_addr);
- goto err_alloc_page_failed;
- }
+
page->alloc = alloc;
INIT_LIST_HEAD(&page->lru);
- ret = vm_insert_page(vma, page_addr, page->page_ptr);
+ ret = binder_get_user_page_remote(alloc, page, page_addr);
if (ret) {
- pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n",
- alloc->pid, page_addr);
- goto err_vm_insert_page_failed;
+ binder_free_page_range(alloc, start, page_addr);
+ return ret;
}
if (index + 1 > alloc->pages_high)
@@ -289,24 +292,8 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
trace_binder_alloc_page_end(alloc, index);
}
- if (mm) {
- mmap_write_unlock(mm);
- mmput_async(mm);
- }
- return 0;
-err_vm_insert_page_failed:
- __free_page(page->page_ptr);
- page->page_ptr = NULL;
-err_alloc_page_failed:
-err_page_ptr_cleared:
- binder_free_page_range(alloc, start, page_addr);
-err_no_vma:
- if (mm) {
- mmap_write_unlock(mm);
- mmput_async(mm);
- }
- return vma ? -ENOMEM : -ESRCH;
+ return 0;
}
static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
--
2.42.0.869.gea05f2083d-goog
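[Editor's note: the refactor in this patch replaces a "scan the whole range first to decide if the expensive path is needed" loop with a per-page check that takes the expensive path only for pages actually missing. A minimal userspace sketch of that shape, with hypothetical names standing in for binder_get_user_page_remote() and the page array:]

```c
#include <assert.h>
#include <stddef.h>

static int expensive_fills;   /* counts "mmap-lock style" fills */

/* stand-in for binder_get_user_page_remote(): populate one missing slot */
static int fill_slot_remote(int *slot)
{
	expensive_fills++;
	*slot = 1;
	return 0;
}

/* per-slot check instead of a separate up-front scan of the range */
static int fill_range(int *slots, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++) {
		int ret;

		if (slots[i])          /* already populated: cheap path */
			continue;

		ret = fill_slot_remote(&slots[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

[The expensive path runs once per missing slot rather than being decided for the whole range, which is what lets the real code drop the two-pass loop.]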
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 16/21] binder: refactor page range allocation
2023-11-02 18:59 ` [PATCH 16/21] binder: refactor page range allocation Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:19 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Instead of looping through the page range twice to first determine if
> the mmap lock is required, simply do it per-page as needed. Split out
> all this logic into a separate binder_get_user_page_remote() function.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
One nit, otherwise looks okay to me.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Carlos Llamas <cmllamas@google.com> writes:
> + unsigned long index;
> - size_t index;
You're changing the type of index to unsigned long here, but it seems to
me that size_t is the correct type.
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 16/21] binder: refactor page range allocation
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:19 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:19 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:35AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > Instead of looping through the page range twice to first determine if
> > the mmap lock is required, simply do it per-page as needed. Split out
> > all this logic into a separate binder_get_user_page_remote() function.
> >
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> One nit, otherwise looks okay to me.
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>
>
> Carlos Llamas <cmllamas@google.com> writes:
> > + unsigned long index;
> > - size_t index;
>
> You're changing the type of index to unsigned long here, but it seems to
> me that size_t is the correct type.
Perhaps size_t made sense before the addresses were also moved to
unsigned long. I think this is a better fit now.
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 17/21] binder: malloc new_buffer outside of locks
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (15 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 16/21] binder: refactor page range allocation Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 18/21] binder: initialize lru pages in mmap callback Carlos Llamas
` (3 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Preallocate new_buffer before acquiring the alloc->mutex and hand it
down to binder_alloc_new_buf_locked(). The new buffer will be used in
the vast majority of requests (measured at 98.2% in field data). The
buffer is discarded otherwise. This change is required in preparation
for transitioning alloc->mutex into a spinlock in subsequent commits.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 38 +++++++++++++++++-----------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 56936430954f..da6c62567ffb 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -400,17 +400,19 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc)
return false;
}
+/* Callers preallocate @new_buffer, it is freed by this function if unused */
static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
+ struct binder_buffer *new_buffer,
size_t size,
int is_async)
{
struct rb_node *n = alloc->free_buffers.rb_node;
- struct binder_buffer *buffer;
- size_t buffer_size;
struct rb_node *best_fit = NULL;
+ struct binder_buffer *buffer;
unsigned long has_page_addr;
unsigned long end_page_addr;
+ size_t buffer_size;
int ret;
if (is_async &&
@@ -461,22 +463,17 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
end_page_addr = has_page_addr;
ret = binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
end_page_addr);
- if (ret)
- return ERR_PTR(ret);
+ if (ret) {
+ buffer = ERR_PTR(ret);
+ goto out;
+ }
if (buffer_size != size) {
- struct binder_buffer *new_buffer;
-
- new_buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
- if (!new_buffer) {
- pr_err("%s: %d failed to alloc new buffer struct\n",
- __func__, alloc->pid);
- goto err_alloc_buf_struct_failed;
- }
new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
new_buffer->free = 1;
binder_insert_free_buffer(alloc, new_buffer);
+ new_buffer = NULL;
}
rb_erase(best_fit, &alloc->free_buffers);
@@ -497,12 +494,10 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->oneway_spam_suspect = true;
}
+out:
+ /* discard possibly unused new_buffer */
+ kfree(new_buffer);
return buffer;
-
-err_alloc_buf_struct_failed:
- binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
- return ERR_PTR(-ENOMEM);
}
/**
@@ -526,7 +521,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t extra_buffers_size,
int is_async)
{
- struct binder_buffer *buffer;
+ struct binder_buffer *buffer, *next;
size_t size;
/* Check binder_alloc is fully initialized */
@@ -551,11 +546,16 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
return ERR_PTR(-EINVAL);
}
+ /* preallocate the next buffer */
+ next = kzalloc(sizeof(*next), GFP_KERNEL);
+ if (!next)
+ return ERR_PTR(-ENOMEM);
+
/* Pad 0-size buffers so they get assigned unique addresses */
size = max(size, sizeof(void *));
mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, size, is_async);
+ buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
mutex_unlock(&alloc->mutex);
if (IS_ERR(buffer))
--
2.42.0.869.gea05f2083d-goog
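[Editor's note: the "preallocate before taking the lock, discard if unused" pattern this patch introduces can be sketched as a userspace analogue. All names here are hypothetical; pthreads stands in for alloc->mutex and malloc/free for kzalloc/kfree:]

```c
#include <pthread.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

struct stack {
	pthread_mutex_t lock;
	struct node *head;
};

/* Caller preallocates @new_node; it is freed here if unused, mirroring
 * the contract on binder_alloc_new_buf_locked(). Pushes only even
 * values, so the preallocation is sometimes discarded. */
static int push_even_locked(struct stack *s, struct node *new_node, int value)
{
	int used = 0;

	if (value % 2 == 0) {
		new_node->value = value;
		new_node->next = s->head;
		s->head = new_node;
		new_node = NULL;   /* ownership transferred */
		used = 1;
	}
	free(new_node);            /* free(NULL) is a no-op */
	return used;
}

static int push_even(struct stack *s, int value)
{
	/* allocate before acquiring the lock: no allocation (and thus no
	 * potential sleep) inside the critical section */
	struct node *new_node = malloc(sizeof(*new_node));
	int used;

	if (!new_node)
		return -1;

	pthread_mutex_lock(&s->lock);
	used = push_even_locked(s, new_node, value);
	pthread_mutex_unlock(&s->lock);
	return used;
}
```

[Keeping allocations out of the critical section is what later allows alloc->mutex to become a spinlock, since spinlock holders must not sleep.]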
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 17/21] binder: malloc new_buffer outside of locks
2023-11-02 18:59 ` [PATCH 17/21] binder: malloc new_buffer outside of locks Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:20 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Preallocate new_buffer before acquiring the alloc->mutex and hand it
> down to binder_alloc_new_buf_locked(). The new buffer will be used in
> the vast majority of requests (measured at 98.2% in field data). The
> buffer is discarded otherwise. This change is required in preparation
> for transitioning alloc->mutex into a spinlock in subsequent commits.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
You also need to free the new buffer here:
if (unlikely(!best_fit)) {
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf size %zd failed, no address space\n",
alloc->pid, size);
debug_no_space_locked(alloc);
return ERR_PTR(-ENOSPC);
}
Other than the above, this looks correct to me.
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 17/21] binder: malloc new_buffer outside of locks
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:20 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:20 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:38AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > Preallocate new_buffer before acquiring the alloc->mutex and hand it
> > down to binder_alloc_new_buf_locked(). The new buffer will be used in
> > the vast majority of requests (measured at 98.2% in field data). The
> > buffer is discarded otherwise. This change is required in preparation
> > for transitioning alloc->mutex into a spinlock in subsequent commits.
> >
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> You also need to free the new buffer here:
>
> if (unlikely(!best_fit)) {
> binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
> "%d: binder_alloc_buf size %zd failed, no address space\n",
> alloc->pid, size);
> debug_no_space_locked(alloc);
> return ERR_PTR(-ENOSPC);
> }
Ouch! this is true and there is a second instance that needs the kfree
as well. Thanks for catching it.
--
Carlos Llamas
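[Editor's note: the leak Alice caught is the classic hazard of the preallocation pattern: an early return that skips the discard. One way to make every exit path consume the preallocation is to funnel them through a single cleanup label, as a hypothetical sketch:]

```c
#include <stdlib.h>

/* Returns 0 on success, negative on error. @prealloc is always
 * consumed: either ownership is transferred out via @used_out, or it
 * is freed before returning -- including on early error exits. */
static int consume(int *prealloc, int fail_early, int use_it, int **used_out)
{
	int ret = 0;

	*used_out = NULL;
	if (fail_early) {
		ret = -1;
		goto out;          /* early exit still reaches the free below */
	}
	if (use_it) {
		*used_out = prealloc;
		prealloc = NULL;   /* ownership transferred; don't free */
	}
out:
	free(prealloc);            /* no-op when ownership moved */
	return ret;
}
```

[With a bare `return ERR_PTR(-ENOSPC)` instead of `goto out`, the preallocated buffer leaks, which is exactly the bug fixed for v2.]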
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 18/21] binder: initialize lru pages in mmap callback
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (16 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 17/21] binder: malloc new_buffer outside of locks Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 19/21] binder: perform page allocation outside of locks Carlos Llamas
` (2 subsequent siblings)
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Rather than repeatedly initializing some of the binder_lru_page members
during binder_alloc_new_buf(), perform this initialization just once in
binder_alloc_mmap_handler(), after the pages have been created.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index da6c62567ffb..7e0af1786b20 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -278,9 +278,6 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
trace_binder_alloc_page_start(alloc, index);
- page->alloc = alloc;
- INIT_LIST_HEAD(&page->lru);
-
ret = binder_get_user_page_remote(alloc, page, page_addr);
if (ret) {
binder_free_page_range(alloc, start, page_addr);
@@ -791,9 +788,9 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
int binder_alloc_mmap_handler(struct binder_alloc *alloc,
struct vm_area_struct *vma)
{
- int ret;
- const char *failure_string;
struct binder_buffer *buffer;
+ const char *failure_string;
+ int ret, i;
if (unlikely(vma->vm_mm != alloc->mm)) {
ret = -EINVAL;
@@ -822,6 +819,11 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
goto err_alloc_pages_failed;
}
+ for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+ alloc->pages[i].alloc = alloc;
+ INIT_LIST_HEAD(&alloc->pages[i].lru);
+ }
+
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer) {
ret = -ENOMEM;
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread* Re: [PATCH 18/21] binder: initialize lru pages in mmap callback
2023-11-02 18:59 ` [PATCH 18/21] binder: initialize lru pages in mmap callback Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
0 siblings, 0 replies; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Rather than repeatedly initializing some of the binder_lru_page members
> during binder_alloc_new_buf(), perform this initialization just once in
> binder_alloc_mmap_handler(), after the pages have been created.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 19/21] binder: perform page allocation outside of locks
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (17 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 18/21] binder: initialize lru pages in mmap callback Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 20/21] binder: reverse locking order in shrinker callback Carlos Llamas
2023-11-02 18:59 ` [PATCH 21/21] binder: switch alloc->mutex to spinlock_t Carlos Llamas
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
Split out the insertion of pages to be done outside of the alloc->mutex
in a separate binder_get_pages_range() routine. Since this is no longer
serialized with other requests we need to make sure we look at the full
range of pages for this buffer, including those shared with neighboring
buffers. The insertion of pages into the vma is still serialized with
the mmap write lock.
Besides avoiding unnecessary nested locking this helps in preparation of
switching the alloc->mutex into a spinlock_t in subsequent patches.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 85 ++++++++++++++++++++++------------
1 file changed, 56 insertions(+), 29 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 7e0af1786b20..e739be7f2dd4 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -194,6 +194,9 @@ static void binder_free_page_range(struct binder_alloc *alloc,
index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];
+ if (!page->page_ptr)
+ continue;
+
trace_binder_free_lru_start(alloc, index);
ret = list_lru_add(&binder_alloc_lru, &page->lru);
@@ -214,6 +217,9 @@ static int binder_get_user_page_remote(struct binder_alloc *alloc,
return -ESRCH;
mmap_write_lock(alloc->mm);
+ if (lru_page->page_ptr)
+ goto out;
+
if (!alloc->vma) {
pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
ret = -ESRCH;
@@ -236,32 +242,64 @@ static int binder_get_user_page_remote(struct binder_alloc *alloc,
goto out;
}
- lru_page->page_ptr = page;
+ /* mark page insertion complete and safe to acquire */
+ smp_store_release(&lru_page->page_ptr, page);
out:
mmap_write_unlock(alloc->mm);
mmput_async(alloc->mm);
return ret;
}
-static int binder_allocate_page_range(struct binder_alloc *alloc,
- unsigned long start, unsigned long end)
+/* The range of pages should include those shared with other buffers */
+static int binder_get_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
{
struct binder_lru_page *page;
unsigned long page_addr;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: allocate pages %lx-%lx\n",
+ "%d: get pages %lx-%lx\n",
alloc->pid, start, end);
- if (end <= start)
- return 0;
+ for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
+ unsigned long index;
+ int ret;
+
+ index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ page = &alloc->pages[index];
+
+ /* check if page insertion is marked complete by release */
+ if (smp_load_acquire(&page->page_ptr))
+ continue;
+
+ trace_binder_alloc_page_start(alloc, index);
+
+ ret = binder_get_user_page_remote(alloc, page, page_addr);
+ if (ret)
+ return ret;
+
+ trace_binder_alloc_page_end(alloc, index);
+ }
+
+ return 0;
+}
+
+/* The range of pages should exclude those shared with other buffers */
+static void binder_allocate_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
+{
+ struct binder_lru_page *page;
+ unsigned long page_addr;
+
+ binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
+ "%d: allocate pages %lx-%lx\n",
+ alloc->pid, start, end);
trace_binder_update_page_range(alloc, true, start, end);
for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
unsigned long index;
bool on_lru;
- int ret;
index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];
@@ -276,21 +314,9 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
continue;
}
- trace_binder_alloc_page_start(alloc, index);
-
- ret = binder_get_user_page_remote(alloc, page, page_addr);
- if (ret) {
- binder_free_page_range(alloc, start, page_addr);
- return ret;
- }
-
if (index + 1 > alloc->pages_high)
alloc->pages_high = index + 1;
-
- trace_binder_alloc_page_end(alloc, index);
}
-
- return 0;
}
static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
@@ -410,7 +436,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
unsigned long has_page_addr;
unsigned long end_page_addr;
size_t buffer_size;
- int ret;
if (is_async &&
alloc->free_async_space < size + sizeof(struct binder_buffer)) {
@@ -453,18 +478,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
"%d: binder_alloc_buf size %zd got buffer %pK size %zd\n",
alloc->pid, size, buffer, buffer_size);
- has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
WARN_ON(n && buffer_size != size);
+
+ has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
end_page_addr = PAGE_ALIGN(buffer->user_data + size);
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
- ret = binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
- if (ret) {
- buffer = ERR_PTR(ret);
- goto out;
- }
-
+ binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
if (buffer_size != size) {
new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
@@ -491,7 +512,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->oneway_spam_suspect = true;
}
-out:
/* discard possibly unused new_buffer */
kfree(new_buffer);
return buffer;
@@ -520,6 +540,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
{
struct binder_buffer *buffer, *next;
size_t size;
+ int ret;
/* Check binder_alloc is fully initialized */
if (!binder_alloc_get_vma(alloc)) {
@@ -564,6 +585,12 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
buffer->extra_buffers_size = extra_buffers_size;
buffer->pid = current->tgid;
+ ret = binder_get_page_range(alloc, buffer->user_data & PAGE_MASK,
+ PAGE_ALIGN(buffer->user_data + size));
+ if (ret) {
+ binder_alloc_free_buf(alloc, buffer);
+ buffer = ERR_PTR(ret);
+ }
out:
return buffer;
}
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 19/21] binder: perform page allocation outside of locks
2023-11-02 18:59 ` [PATCH 19/21] binder: perform page allocation outside of locks Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:39 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> Split out the insertion of pages to be done outside of the alloc->mutex
> in a separate binder_get_page_range() routine. Since this is no longer
> serialized with other requests we need to make sure we look at the full
> range of pages for this buffer, including those shared with neighboring
> buffers. The insertion of pages into the vma is still serialized with
> the mmap write lock.
>
> Besides avoiding unnecessary nested locking this helps in preparation of
> switching the alloc->mutex into a spinlock_t in subsequent patches.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
This is rather complex, so it could use more comments. However, as far
as I can tell, the change is correct.
> +/* The range of pages should include those shared with other buffers */
> +static int binder_get_page_range(struct binder_alloc *alloc,
> + unsigned long start, unsigned long end)
> [snip]
> +/* The range of pages should exclude those shared with other buffers */
> +static void binder_allocate_page_range(struct binder_alloc *alloc,
> + unsigned long start, unsigned long end)
I would really like a comment on each function explaining that:
* The binder_allocate_page_range function ensures that existing pages
will not be reclaimed by the shrinker.
* The binder_get_page_range function ensures that missing pages are
allocated and inserted.
I also don't think the names are great. The function with "allocate" in
its name is the one that doesn't perform any allocations.
> mmap_write_lock(alloc->mm);
> + if (lru_page->page_ptr)
> + goto out;
Another comment that I'd like to see somewhere is one that says
something along these lines:
Multiple processes may call `binder_get_user_page_remote` on the
same page in parallel. When this happens, one of them will allocate
the page and insert it, and the other process will use the mmap
write lock to wait for the insertion to complete. This means that we
can't use a mmap read lock here.
> + /* mark page insertion complete and safe to acquire */
> + smp_store_release(&lru_page->page_ptr, page);
> [snip]
> + /* check if page insertion is marked complete by release */
> + if (smp_load_acquire(&page->page_ptr))
> + continue;
We already discussed this when I asked you to make this an acquire /
release operation so that it isn't racy, but it could use a comment
explaining its purpose.
> mmap_write_lock(alloc->mm);
> + if (lru_page->page_ptr)
> + goto out;
> +
> if (!alloc->vma) {
> pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> ret = -ESRCH;
> goto out;
> }
>
> page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> if (!page) {
> pr_err("%d: failed to allocate page\n", alloc->pid);
> ret = -ENOMEM;
> goto out;
> }
Maybe it would be worth allocating the page before taking the mmap
write lock? It has the disadvantage that you may have to immediately
deallocate it if we trigger the `if (lru_page->page_ptr) goto out`
branch, but that shouldn't happen that often, and it would reduce the
amount of time we spend holding the mmap write lock.
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 19/21] binder: perform page allocation outside of locks
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:39 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:39 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:43AM +0000, Alice Ryhl wrote:
> I would really like a comment on each function explaining that:
>
> * The binder_allocate_page_range function ensures that existing pages
> will not be reclaimed by the shrinker.
> * The binder_get_page_range function ensures that missing pages are
> allocated and inserted.
Ok, I think I'd rather go for better naming than compensate through
comments, so I came up with the following names:
- binder_lru_freelist_{add,del}()
- binder_install_buffer_pages()
There will be more details in v2. The new names give a clear
separation of the scope of these functions.
> > mmap_write_lock(alloc->mm);
> > + if (lru_page->page_ptr)
> > + goto out;
>
> Another comment that I'd like to see somewhere is one that says
> something along these lines:
>
> Multiple processes may call `binder_get_user_page_remote` on the
> same page in parallel. When this happens, one of them will allocate
> the page and insert it, and the other process will use the mmap
> write lock to wait for the insertion to complete. This means that we
> can't use a mmap read lock here.
>
I've added a shorter version of this to v2, thanks.
> > + /* mark page insertion complete and safe to acquire */
> > + smp_store_release(&lru_page->page_ptr, page);
> > [snip]
> > + /* check if page insertion is marked complete by release */
> > + if (smp_load_acquire(&page->page_ptr))
> > + continue;
>
> We already discussed this when I asked you to make this an acquire /
> release operation so that it isn't racy, but it could use a comment
> explaining its purpose.
I've wrapped these calls into inline functions with better names in v2.
The purpose should now be evident.
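For readers unfamiliar with the pattern being discussed, here is a minimal
userspace C11 sketch of the same release/acquire publication idiom, using
stdatomic as a stand-in for the kernel's smp_store_release()/smp_load_acquire().
The names (`page_set_installed`, `page_get_installed`) are illustrative guesses,
not the actual v2 helpers:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical analogue of a binder LRU page slot. */
struct lru_page {
    _Atomic(void *) page_ptr;
};

/* Publisher side: mark page insertion complete and safe to acquire.
 * The release store guarantees that all writes initializing the page
 * happen-before any reader that observes a non-NULL page_ptr. */
static void page_set_installed(struct lru_page *p, void *page)
{
    atomic_store_explicit(&p->page_ptr, page, memory_order_release);
}

/* Reader side: check whether page insertion is marked complete.
 * Returns NULL while the page is still being installed. */
static void *page_get_installed(struct lru_page *p)
{
    return atomic_load_explicit(&p->page_ptr, memory_order_acquire);
}
```

Wrapping the raw atomics in named helpers like these is what makes the
intent self-documenting, which is the point of the v2 change.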
>
> > mmap_write_lock(alloc->mm);
> > + if (lru_page->page_ptr)
> > + goto out;
> > +
> > if (!alloc->vma) {
> > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > ret = -ESRCH;
> > goto out;
> > }
> >
> > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> > if (!page) {
> > pr_err("%d: failed to allocate page\n", alloc->pid);
> > ret = -ENOMEM;
> > goto out;
> > }
>
> Maybe it would be worth allocating the page before taking the mmap
> write lock? It has the disadvantage that you may have to immediately
> deallocate it if we trigger the `if (lru_page->page_ptr) goto out`
> branch, but that shouldn't happen that often, and it would reduce the
> amount of time we spend holding the mmap write lock.
If we sleep on alloc_page() then chances are that having other tasks
allocating more pages could create more memory pressure. In some cases
this would be unnecessary (e.g. if it's the same page). I do think this
could happen often since buffer requests tend to be < PAGE_SIZE and
adjacent too. I'll look into this with more detail and send a follow-up
patch if needed. Thanks!
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 20/21] binder: reverse locking order in shrinker callback
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (18 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 19/21] binder: perform page allocation outside of locks Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
2023-11-02 18:59 ` [PATCH 21/21] binder: switch alloc->mutex to spinlock_t Carlos Llamas
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The locking order currently requires the alloc->mutex to be acquired
first followed by the mmap lock. However, the alloc->mutex is converted
into a spinlock in subsequent commits so the order needs to be reversed
to avoid nesting the sleeping mmap lock under the spinlock.
The shrinker's callback binder_alloc_free_page() is the only place that
needs to be reordered since other functions have been refactored and no
longer nest these locks.
Some minor cosmetic changes are also included in this patch.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 44 ++++++++++++++++------------------
1 file changed, 20 insertions(+), 24 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e739be7f2dd4..0ba9f524e0ff 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1065,35 +1065,38 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
void *cb_arg)
__must_hold(lock)
{
- struct mm_struct *mm = NULL;
- struct binder_lru_page *page = container_of(item,
- struct binder_lru_page,
- lru);
- struct binder_alloc *alloc;
+ struct binder_lru_page *page = container_of(item, typeof(*page), lru);
+ struct binder_alloc *alloc = page->alloc;
+ struct mm_struct *mm = alloc->mm;
+ struct vm_area_struct *vma;
unsigned long page_addr;
size_t index;
- struct vm_area_struct *vma;
- alloc = page->alloc;
+ if (!mmget_not_zero(mm))
+ goto err_mmget;
+ if (!mmap_read_trylock(mm))
+ goto err_mmap_read_lock_failed;
if (!mutex_trylock(&alloc->mutex))
goto err_get_alloc_mutex_failed;
-
if (!page->page_ptr)
goto err_page_already_freed;
index = page - alloc->pages;
page_addr = alloc->buffer + index * PAGE_SIZE;
- mm = alloc->mm;
- if (!mmget_not_zero(mm))
- goto err_mmget;
- if (!mmap_read_trylock(mm))
- goto err_mmap_read_lock_failed;
vma = vma_lookup(mm, page_addr);
if (vma && vma != binder_alloc_get_vma(alloc))
goto err_invalid_vma;
+ trace_binder_unmap_kernel_start(alloc, index);
+
+ __free_page(page->page_ptr);
+ page->page_ptr = NULL;
+
+ trace_binder_unmap_kernel_end(alloc, index);
+
list_lru_isolate(lru, item);
+ mutex_unlock(&alloc->mutex);
spin_unlock(lock);
if (vma) {
@@ -1103,28 +1106,21 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_user_end(alloc, index);
}
+
mmap_read_unlock(mm);
mmput_async(mm);
- trace_binder_unmap_kernel_start(alloc, index);
-
- __free_page(page->page_ptr);
- page->page_ptr = NULL;
-
- trace_binder_unmap_kernel_end(alloc, index);
-
spin_lock(lock);
- mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;
err_invalid_vma:
+err_page_already_freed:
+ mutex_unlock(&alloc->mutex);
+err_get_alloc_mutex_failed:
mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
-err_page_already_freed:
- mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
return LRU_SKIP;
}
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 20/21] binder: reverse locking order in shrinker callback
2023-11-02 18:59 ` [PATCH 20/21] binder: reverse locking order in shrinker callback Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:42 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> + trace_binder_unmap_kernel_start(alloc, index);
> +
> + __free_page(page->page_ptr);
> + page->page_ptr = NULL;
> +
> + trace_binder_unmap_kernel_end(alloc, index);
> +
> list_lru_isolate(lru, item);
> + mutex_unlock(&alloc->mutex);
> spin_unlock(lock);
>
> if (vma) {
> trace_binder_unmap_user_start(alloc, index);
>
> zap_page_range_single(vma, page_addr, PAGE_SIZE, NULL);
>
> trace_binder_unmap_user_end(alloc, index);
> }
> +
> mmap_read_unlock(mm);
> mmput_async(mm);
>
> - trace_binder_unmap_kernel_start(alloc, index);
> -
> - __free_page(page->page_ptr);
> - page->page_ptr = NULL;
> -
> - trace_binder_unmap_kernel_end(alloc, index);
This reverses the order so that __free_page is called before
zap_page_range_single. Is that okay?
Another option would be to do something along these lines:
page_to_free = page->page_ptr;
page->page_ptr = NULL;
[snip]
mmap_read_unlock(mm);
mmput_async(mm);
__free_page(page_to_free);
This way you wouldn't reverse the order. Also, you reduce the amount of
time you spend holding the mmap read lock, since the page is freed
without holding the lock.
Other than the above, this LGTM.
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 20/21] binder: reverse locking order in shrinker callback
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:42 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:42 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:46AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> This reverses the order so that __free_page is called before
> zap_page_range_single. Is that okay?
It is not OK. Userspace would still have access to the page between
__free_page() and zap_page() calls. This is bad, so thanks for catching
this.
> Another option would be to do something along these lines:
>
> page_to_free = page->page_ptr;
> page->page_ptr = NULL;
>
> [snip]
>
> mmap_read_unlock(mm);
> mmput_async(mm);
> __free_page(page_to_free);
>
> This way you wouldn't reverse the order. Also, you reduce the amount of
> time you spend holding the mmap read lock, since the page is freed
> without holding the lock.
>
Thanks, I've added this approach to v2.
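For illustration, the "detach under the lock, free outside it" shape Alice
suggests can be sketched in plain userspace C, with a pthread mutex standing
in for alloc->lock and malloc/free for the backing page. All names here are
hypothetical; this is not the actual v2 code:

```c
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for a binder page slot guarded by alloc->lock. */
struct page_slot {
    pthread_mutex_t lock;   /* stands in for alloc->lock */
    void *page_ptr;
};

/* Detach the page while serialized, but only free it after every
 * lock is dropped -- mirroring the page_to_free pattern. Returns 1
 * if a page was reclaimed, 0 if the slot was already empty. */
static int slot_reclaim(struct page_slot *s)
{
    void *page_to_free;

    pthread_mutex_lock(&s->lock);
    page_to_free = s->page_ptr;
    s->page_ptr = NULL;           /* unpublish under the lock */
    pthread_mutex_unlock(&s->lock);

    /* ...the user mapping would be zapped here, before the backing
     * page is returned to the allocator, preserving the ordering
     * discussed above... */

    free(page_to_free);           /* outside every lock */
    return page_to_free != NULL;
}
```

The design point is that the expensive teardown (zap + free) runs without
holding the lock, while the slot itself is still updated atomically with
respect to other lock holders.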
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 21/21] binder: switch alloc->mutex to spinlock_t
2023-11-02 18:59 [PATCH 00/21] binder: convert alloc->mutex to spinlock Carlos Llamas
` (19 preceding siblings ...)
2023-11-02 18:59 ` [PATCH 20/21] binder: reverse locking order in shrinker callback Carlos Llamas
@ 2023-11-02 18:59 ` Carlos Llamas
2023-11-07 9:08 ` Alice Ryhl
20 siblings, 1 reply; 57+ messages in thread
From: Carlos Llamas @ 2023-11-02 18:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
Suren Baghdasaryan
Cc: linux-kernel, kernel-team
The alloc->mutex is a highly contended lock that causes performance
issues on Android devices. When a low-priority task is given this lock
and it sleeps, it becomes difficult for the task to wakeup and complete
its work. This delays other tasks that are also waiting on the mutex.
The problem gets worse when there is memory pressure in the system,
because this increases the contention on the alloc->mutex while the
shrinker reclaims binder pages.
Switching to a spinlock helps to keep the waiters running and avoids the
overhead of waking up tasks. This significantly improves the transaction
latency when the problematic scenario occurs.
The performance impact of this patchset was measured by stress-testing
the binder alloc contention. In this test, several clients of different
priorities send thousands of transactions of different sizes to a single
server. In parallel, pages get reclaimed using the shrinker's debugfs.
The test was run on a Pixel 8, Pixel 6 and qemu machine. The results
were similar on all three devices:
after:
| sched | prio | average | max | min |
|--------+------+---------+-----------+---------|
| fifo | 99 | 0.135ms | 1.197ms | 0.022ms |
| fifo | 01 | 0.136ms | 5.232ms | 0.018ms |
| other | -20 | 0.180ms | 7.403ms | 0.019ms |
| other | 19 | 0.241ms | 58.094ms | 0.018ms |
before:
| sched | prio | average | max | min |
|--------+------+---------+-----------+---------|
| fifo | 99 | 0.350ms | 248.730ms | 0.020ms |
| fifo | 01 | 0.357ms | 248.817ms | 0.024ms |
| other | -20 | 0.399ms | 249.906ms | 0.020ms |
| other | 19 | 0.477ms | 297.756ms | 0.022ms |
The key metrics above are the average and max latencies (wall time).
These improvements should roughly translate to p95-p99 latencies on real
workloads. The response time is up to 200x faster in these scenarios and
there is no penalty in the regular path.
Note that it is only possible to convert this lock after a series of
changes made by previous patches. These mainly include refactoring the
sections that might_sleep() and changing the locking order with the
mmap_lock amongst others.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
drivers/android/binder_alloc.c | 44 +++++++++++++++++-----------------
drivers/android/binder_alloc.h | 10 ++++----
2 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 0ba9f524e0ff..a8a3e91ba308 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -169,9 +169,9 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
{
struct binder_buffer *buffer;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
buffer = binder_alloc_prepare_to_free_locked(alloc, user_ptr);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return buffer;
}
@@ -572,9 +572,9 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
/* Pad 0-size buffers so they get assigned unique addresses */
size = max(size, sizeof(void *));
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
if (IS_ERR(buffer))
goto out;
@@ -786,17 +786,17 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
* We could eliminate the call to binder_alloc_clear_buf()
* from binder_alloc_deferred_release() by moving this to
* binder_free_buf_locked(). However, that could
- * increase contention for the alloc mutex if clear_on_free
- * is used frequently for large buffers. The mutex is not
+ * increase contention for the alloc->lock if clear_on_free
+ * is used frequently for large buffers. This lock is not
* needed for correctness here.
*/
if (buffer->clear_on_free) {
binder_alloc_clear_buf(alloc, buffer);
buffer->clear_on_free = false;
}
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
binder_free_buf_locked(alloc, buffer, true);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
}
/**
@@ -894,7 +894,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
struct binder_buffer *buffer;
buffers = 0;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
BUG_ON(alloc->vma);
while ((n = rb_first(&alloc->allocated_buffers))) {
@@ -944,7 +944,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
}
kfree(alloc->pages);
}
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
if (alloc->mm)
mmdrop(alloc->mm);
@@ -976,11 +976,11 @@ void binder_alloc_print_allocated(struct seq_file *m,
{
struct rb_node *n;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n))
print_binder_buffer(m, " buffer",
rb_entry(n, struct binder_buffer, rb_node));
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
}
/**
@@ -997,7 +997,7 @@ void binder_alloc_print_pages(struct seq_file *m,
int lru = 0;
int free = 0;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
/*
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
@@ -1013,7 +1013,7 @@ void binder_alloc_print_pages(struct seq_file *m,
lru++;
}
}
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
}
@@ -1029,10 +1029,10 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
struct rb_node *n;
int count = 0;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n))
count++;
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return count;
}
@@ -1076,8 +1076,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- if (!mutex_trylock(&alloc->mutex))
- goto err_get_alloc_mutex_failed;
+ if (!spin_trylock(&alloc->lock))
+ goto err_get_alloc_lock_failed;
if (!page->page_ptr)
goto err_page_already_freed;
@@ -1096,7 +1096,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_kernel_end(alloc, index);
list_lru_isolate(lru, item);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
spin_unlock(lock);
if (vma) {
@@ -1115,8 +1115,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
err_invalid_vma:
err_page_already_freed:
- mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
+ spin_unlock(&alloc->lock);
+err_get_alloc_lock_failed:
mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
@@ -1155,7 +1155,7 @@ void binder_alloc_init(struct binder_alloc *alloc)
alloc->pid = current->group_leader->pid;
alloc->mm = current->mm;
mmgrab(alloc->mm);
- mutex_init(&alloc->mutex);
+ spin_lock_init(&alloc->lock);
INIT_LIST_HEAD(&alloc->buffers);
}
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index bbc16bc6d5ac..3602a60d9609 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -9,7 +9,7 @@
#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/mm.h>
-#include <linux/rtmutex.h>
+#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/list_lru.h>
@@ -72,7 +72,7 @@ struct binder_lru_page {
/**
* struct binder_alloc - per-binder proc state for binder allocator
- * @mutex: protects binder_alloc fields
+ * @lock: protects binder_alloc fields
* @vma: vm_area_struct passed to mmap_handler
* (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
@@ -96,7 +96,7 @@ struct binder_lru_page {
* struct binder_buffer objects used to track the user buffers
*/
struct binder_alloc {
- struct mutex mutex;
+ spinlock_t lock;
struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long buffer;
@@ -153,9 +153,9 @@ binder_alloc_get_free_async_space(struct binder_alloc *alloc)
{
size_t free_async_space;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
free_async_space = alloc->free_async_space;
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return free_async_space;
}
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 21/21] binder: switch alloc->mutex to spinlock_t
2023-11-02 18:59 ` [PATCH 21/21] binder: switch alloc->mutex to spinlock_t Carlos Llamas
@ 2023-11-07 9:08 ` Alice Ryhl
2023-12-01 7:46 ` Carlos Llamas
0 siblings, 1 reply; 57+ messages in thread
From: Alice Ryhl @ 2023-11-07 9:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
Carlos Llamas <cmllamas@google.com> writes:
> The alloc->mutex is a highly contended lock that causes performance
> issues on Android devices. When a low-priority task is given this lock
> and it sleeps, it becomes difficult for the task to wakeup and complete
> its work. This delays other tasks that are also waiting on the mutex.
Grammar nit: "to wake up"
> The problem gets worse when there is memory pressure in the system,
> because this increases the contention on the alloc->mutex while the
> shrinker reclaims binder pages.
>
> Switching to a spinlock helps to keep the waiters running and avoids the
> overhead of waking up tasks. This significantly improves the transaction
> latency when the problematic scenario occurs.
>
> [snip]
>
> Note that it is only possible to convert this lock after a series of
> changes made by previous patches. These mainly include refactoring the
> sections that might_sleep() and changing the locking order with the
> mmap_lock amongst others.
>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
Nice!
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
> * binder_free_buf_locked(). However, that could
> - * increase contention for the alloc mutex if clear_on_free
> - * is used frequently for large buffers. The mutex is not
> + * increase contention for the alloc->lock if clear_on_free
> + * is used frequently for large buffers. This lock is not
Grammar nit: Shouldn't this say "However, that could increase contention
on alloc->lock if clear_on_free is used frequently for large buffers."?
Alice
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 21/21] binder: switch alloc->mutex to spinlock_t
2023-11-07 9:08 ` Alice Ryhl
@ 2023-12-01 7:46 ` Carlos Llamas
0 siblings, 0 replies; 57+ messages in thread
From: Carlos Llamas @ 2023-12-01 7:46 UTC (permalink / raw)
To: Alice Ryhl
Cc: Arve Hjønnevåg, Christian Brauner, Greg Kroah-Hartman,
Joel Fernandes, kernel-team, linux-kernel, Martijn Coenen,
Suren Baghdasaryan, Todd Kjos
On Tue, Nov 07, 2023 at 09:08:49AM +0000, Alice Ryhl wrote:
> Carlos Llamas <cmllamas@google.com> writes:
> > The alloc->mutex is a highly contended lock that causes performance
> > issues on Android devices. When a low-priority task is given this lock
> > and it sleeps, it becomes difficult for the task to wakeup and complete
> > its work. This delays other tasks that are also waiting on the mutex.
>
> Grammar nit: "to wake up"
OK.
>
> > The problem gets worse when there is memory pressure in the system,
> > because this increases the contention on the alloc->mutex while the
> > shrinker reclaims binder pages.
> >
> > Switching to a spinlock helps to keep the waiters running and avoids the
> > overhead of waking up tasks. This significantly improves the transaction
> > latency when the problematic scenario occurs.
> >
> > [snip]
> >
> > Note that it is only possible to convert this lock after a series of
> > changes made by previous patches. These mainly include refactoring the
> > sections that might_sleep() and changing the locking order with the
> > mmap_lock amongst others.
> >
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
>
> Nice!
>
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>
>
> > * binder_free_buf_locked(). However, that could
> > - * increase contention for the alloc mutex if clear_on_free
> > - * is used frequently for large buffers. The mutex is not
> > + * increase contention for the alloc->lock if clear_on_free
> > + * is used frequently for large buffers. This lock is not
>
> Grammar nit: Shouldn't this say "However, that could increase contention
> on alloc->lock if clear_on_free is used frequently for large buffers."?
Do you mean "contention for" vs "contention on"? They both seem correct
to me but this was written by Todd, so I'd trust his English much more
than mine.
--
Carlos Llamas
^ permalink raw reply [flat|nested] 57+ messages in thread