public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Carlos Llamas <cmllamas@google.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Todd Kjos" <tkjos@android.com>,
	"Martijn Coenen" <maco@android.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Christian Brauner" <brauner@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@android.com,
	"David Hildenbrand" <david@redhat.com>,
	"Barry Song" <v-songbaohua@oppo.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH v2 2/8] binder: concurrent page installation
Date: Thu, 7 Nov 2024 15:31:31 +0000	[thread overview]
Message-ID: <ZyzdUyHr7S_cpsso@google.com> (raw)
In-Reply-To: <CAJuCfpGAr=Tm=JJtnrJ0ipFo2yqYSEVAMtx5aUU-k1F6prjYjQ@mail.gmail.com>

On Thu, Nov 07, 2024 at 07:10:28AM -0800, Suren Baghdasaryan wrote:
> On Wed, Nov 6, 2024 at 8:02 PM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > Allow multiple callers to install pages simultaneously by downgrading
> > the mmap_sem to non-exclusive mode. Races to the same PTE are handled
> > using get_user_pages_remote() to retrieve the already installed page.
> > This method significantly reduces contention in the mmap semaphore.
> >
> > To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid
> > operating on an isolated VMA. In addition, zap_page_range_single() is
> > called under the alloc->mutex to avoid racing with the shrinker.
> >
> > Many thanks to Barry Song who posted a similar approach [1].
> >
> > Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Barry Song <v-songbaohua@oppo.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
> > ---
> >  drivers/android/binder_alloc.c | 64 +++++++++++++++++++++-------------
> >  1 file changed, 40 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index 7241bf4a3ff2..acdc05552603 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -221,26 +221,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >                                       struct binder_lru_page *lru_page,
> >                                       unsigned long addr)
> >  {
> > +       struct vm_area_struct *vma;
> >         struct page *page;
> > -       int ret = 0;
> > +       long npages;
> > +       int ret;
> >
> >         if (!mmget_not_zero(alloc->mm))
> >                 return -ESRCH;
> >
> > -       /*
> > -        * Protected with mmap_sem in write mode as multiple tasks
> > -        * might race to install the same page.
> > -        */
> > -       mmap_write_lock(alloc->mm);
> > -       if (binder_get_installed_page(lru_page))
> > -               goto out;
> > -
> > -       if (!alloc->vma) {
> > -               pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > -               ret = -ESRCH;
> > -               goto out;
> > -       }
> > -
> >         page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> >         if (!page) {
> >                 pr_err("%d: failed to allocate page\n", alloc->pid);
> > @@ -248,19 +236,47 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >                 goto out;
> >         }
> >
> > -       ret = vm_insert_page(alloc->vma, addr, page);
> > -       if (ret) {
> > -               pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> > -                      alloc->pid, __func__, addr - alloc->buffer, ret);
> > +       mmap_read_lock(alloc->mm);
> > +       vma = vma_lookup(alloc->mm, addr);
> > +       if (!vma || vma != alloc->vma) {
> > +               mmap_read_unlock(alloc->mm);
> 
> nit: instead of unlocking here you could have another label before
> mmap_read_unlock() at the end and jump to it.

Sounds good, I'll do this.

> 
> >                 __free_page(page);
> > -               ret = -ENOMEM;
> > +               pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > +               ret = -ESRCH;
> >                 goto out;
> >         }
> >
> > -       /* Mark page installation complete and safe to use */
> > -       binder_set_installed_page(lru_page, page);
> > +       ret = vm_insert_page(vma, addr, page);
> > +       switch (ret) {
> > +       case -EBUSY:
> > +               /*
> > +                * EBUSY is ok. Someone installed the pte first but the
> > +                * lru_page->page_ptr has not been updated yet. Discard
> > +                * our page and look up the one already installed.
> > +                */
> > +               ret = 0;
> > +               __free_page(page);
> > +               npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
> > +               if (npages <= 0) {
> > +                       pr_err("%d: failed to find page at offset %lx\n",
> > +                              alloc->pid, addr - alloc->buffer);
> > +                       ret = -ESRCH;
> > +                       break;
> > +               }
> > +               fallthrough;
> > +       case 0:
> > +               /* Mark page installation complete and safe to use */
> > +               binder_set_installed_page(lru_page, page);
> > +               break;
> > +       default:
> > +               __free_page(page);
> > +               pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> > +                      alloc->pid, __func__, addr - alloc->buffer, ret);
> > +               ret = -ENOMEM;
> 
> vm_insert_page() can return EINVAL (see
> validate_page_before_insert()). Instead of converting other codes into
> ENOMEM why not return "ret" as is?

This is purely historical, binder has always propagated -ENOMEM to
userspace on errors from vm_insert_page() and I'm not sure why.

FWIW, I've dropped the behavior in the last patch and now I just forward
whatever vm_insert_page() returns. I had a look at libbinder code and it
doesn't really make a difference.

Perhaps, I should be explicit about this move and do it in a separate
commit.

Thanks,
--
Carlos Llamas

  reply	other threads:[~2024-11-07 15:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-07  4:02 [PATCH v2 0/8] binder: faster page installations Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 1/8] Revert "binder: switch alloc->mutex to spinlock_t" Carlos Llamas
2024-11-07  8:56   ` Greg Kroah-Hartman
2024-11-07 14:30     ` Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 2/8] binder: concurrent page installation Carlos Llamas
2024-11-07 15:10   ` Suren Baghdasaryan
2024-11-07 15:31     ` Carlos Llamas [this message]
2024-11-07  4:02 ` [PATCH v2 3/8] binder: select correct nid for pages in LRU Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 4/8] binder: remove struct binder_lru_page Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 5/8] binder: use alloc->mapped to save the vma state Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 6/8] binder: remove cached alloc->vma pointer Carlos Llamas
2024-11-07 15:33   ` Suren Baghdasaryan
2024-11-07  4:02 ` [PATCH v2 7/8] binder: rename alloc->buffer to vm_start Carlos Llamas
2024-11-07  4:02 ` [PATCH v2 8/8] binder: use per-vma lock in page installation Carlos Llamas
2024-11-07 16:16   ` Suren Baghdasaryan
2024-11-07 17:55     ` Carlos Llamas
2024-11-07 18:04       ` Suren Baghdasaryan
2024-11-07 18:19         ` Carlos Llamas
2024-11-07 18:27           ` Suren Baghdasaryan
2024-11-07 18:52             ` Suren Baghdasaryan
2024-11-07 19:33               ` Carlos Llamas
2024-11-07 19:40                 ` Carlos Llamas
2024-11-07 16:18 ` [PATCH v2 0/8] binder: faster page installations Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZyzdUyHr7S_cpsso@google.com \
    --to=cmllamas@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=arve@android.com \
    --cc=brauner@kernel.org \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=joel@joelfernandes.org \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maco@android.com \
    --cc=surenb@google.com \
    --cc=tkjos@android.com \
    --cc=v-songbaohua@oppo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox