From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54B1D1EBFE4 for ; Tue, 3 Sep 2024 22:23:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725402220; cv=none; b=ImfAL7gIXck5LW2zUAQDBvoQg2Az4RJyzfX0J4ZV5A/GuhbIL9AygyZht8QLeas5/4n7hogHGTDBrC85C+htx3zQS8bxLJK6VtUxyrDam8pvE1OaBp99SUJMLf7QZvphb+zQcBxyXK1aQ3gL7yRnfOJD9eWRsyG0H5Wgf9+Xdjo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725402220; c=relaxed/simple; bh=g9uVYzq1UsU2PbcrqmQGTnSNLBXbtYLselYEvmUKEqc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=oe9ZElw9jKnKnnL7myCl7JtcePEtImkBDn/QgjKtoZGZ4pTXEMY1YQAHJ5qOzN2P+r7kFhZo7ub+w7w6mPm7qJLRIHEluc422/HxuIp8JwHhhVdZGOXbrKT67eeKu6OqzIaVsNSvz+TYxPzsUKnphatSy0FC+qWozzbbhvT1fS0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=NePF8YJa; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="NePF8YJa" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-2056aa5cefcso33565ad.0 for ; Tue, 03 Sep 2024 15:23:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1725402219; x=1726007019; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=NroSIqwal6C0TPK35/SdmH0JtLPdZfaKT8kFABhfjM0=; b=NePF8YJaMg9iAaRfot9sKAF5nfZhHb5IA4e2zr2/19GwmFGbQpt11P7odzl0wb2nVR 7zL/muuPhwkMrscERFr4tO79SfIUssAuM7g14CW5UoMk62M3/e50JX4GYuXlAhTf+6tA MeE8I1vNENnHFa3bG/GUBOskb3JEdjCZrE6AHEmBXQyBXzD+yvWbH69acF2thwHkleHo BXlZOeM+pJvpo+fL07lz9mbj2vGShUf8Sx/7DYF/82K1MGeFF4LMhtBfXLuty1gggvvQ ELYfSJnhGYwicGkCLhABBYLTq/dTXTNjgnLCRVIODd8PElOq9T5ZftjbHqNiumWjCLla arNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725402219; x=1726007019; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NroSIqwal6C0TPK35/SdmH0JtLPdZfaKT8kFABhfjM0=; b=PRlISxzhRIo9ndHTxqsMg0h2E2UXomS8JAOA3zL+aru/mx4cZ2vrusGAOHQXJ7O0IL plXm8oLL445zYdiOEz6JqO17QMbgwtZ96ezTH5xNLqiNbruZQZ0L1+Vi86vijhYyL+Op UeOpmlFwLedgblnD2zo+rZClkr9rwpBD6oMD2lEDoKqHHnMIdpypdxtjwOKOdm+E1R8l tKPS7WLimqXl76Os5Rfv3LpAa6HDJgA4j54aHnVtonurG3QGomzUXLZrX6P2t7UdLBL/ ugLpbKCNGDCvQwX2RhDJ0R3cPNJ1JFnfUx/OZOT9DF8CRhcR4e47q2Lm2svq8oX3CsLM xwaQ== X-Forwarded-Encrypted: i=1; AJvYcCURrazGU6YdYUW60sWWvEkQrqQ6gQsSgXElz2eE8FVyoDXau4nt3ffPB/HFsommpOTr8cLeGmZrVdOzjME=@vger.kernel.org X-Gm-Message-State: AOJu0YyG7VFClM8Nz5WD7jaRNiFdKtY1cAUzhDyF6J+77sOJWrRnIYpn zXGkiDYwjDwMaVD1AHWeLrRplV47pu3L9dD8PRYE6Uz80qFMDV8/+TaxB5cN7Q== X-Google-Smtp-Source: AGHT+IGPK+Sty2egGVVQYW1P+X2yQYVnvIv8jHjlO6wDgkw0UaUcxded/EZRq+TQYfvI+DIfmARohw== X-Received: by 2002:a17:902:f546:b0:206:ae39:9f9 with SMTP id d9443c01a7336-206b58bdcf2mr424035ad.21.1725402218366; Tue, 03 Sep 2024 15:23:38 -0700 (PDT) Received: from google.com (55.212.185.35.bc.googleusercontent.com. [35.185.212.55]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2d8d38ffff9sm5300801a91.39.2024.09.03.15.23.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Sep 2024 15:23:37 -0700 (PDT) Date: Tue, 3 Sep 2024 22:23:32 +0000 From: Carlos Llamas To: Barry Song <21cnbao@gmail.com> Cc: Hillf Danton , Greg Kroah-Hartman , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song , Suren Baghdasaryan , Michal Hocko , Tangquan Zheng Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration Message-ID: References: <20240902225009.34576-1-21cnbao@gmail.com> <20240903110109.1696-1-hdanton@sina.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed, Sep 04, 2024 at 09:10:20AM +1200, Barry Song wrote: > However, I'm not entirely convinced that this is a problem :-) Concurrent > allocations like this can occur in many places, especially in PFs. Reclamation > is not useless because it helps free up memory for others; it's not > without value. Yeah, for binder though there is a high chance of multiple callers trying to allocate the _same_ page concurrently. What I observed is that pages being reclaimed "in vain" are often in the same vma and this means subsequent callers will need to allocate these pages. But even if this wasn't an issue, the correct solution would still be to support concurrent faults. So, in reality it doesn't matter and we still need to refactor the call to be non-exclusive. > I also don't believe binder is one of the largest users executing concurrent > allocations. Oh, I have no statistics on this. > Awesome! I’m eager to see your patch, and we’re ready to help with testing. > I strongly recommend dropping the write lock entirely. Using > `mmap_write_lock()` isn’t just a binder-specific concern; it has the > potential to affect the entire Android system. > > In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_ > proposing it for merging since you’re already working on it, but I wanted > to share the idea. (just like patch2, it has only passed build-test) Yeap, I'm using the per-vma-lock too per Suren's suggestion. > > [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock > > To further reduce the read lock duration, let's try using per_vma_lock > first. If that fails, we can take the read lock, similar to how page > fault handlers operate. > > Signed-off-by: Barry Song > --- > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > index a2281dfacbbc..b40a5dd650c8 100644 > --- a/drivers/android/binder_alloc.c > +++ b/drivers/android/binder_alloc.c > @@ -221,6 +221,8 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > struct binder_lru_page *lru_page, > unsigned long addr) > { > + struct vm_area_struct *vma; > + bool per_vma_lock = true; > struct page *page; > int ret = 0; > > @@ -235,10 +237,15 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > */ > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > - mmap_read_lock(alloc->mm); > + vma = lock_vma_under_rcu(alloc->mm, addr); > + if (!vma) { > + per_vma_lock = false; > + mmap_read_lock(alloc->mm); > + vma = find_vma(alloc->mm, addr); > + } > > - /* vma might have been dropped or deattached */ > - if (!alloc->vma || !find_vma(alloc->mm, addr)) { > + /* vma might have been dropped, deattached or changed to new one */ > + if (!alloc->vma || !vma || vma != alloc->vma) { > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > ret = -ESRCH; > goto out; > @@ -270,7 +277,10 @@ static int binder_install_single_page(struct > binder_alloc *alloc, > binder_set_installed_page(lru_page, page); > spin_unlock(&alloc->lock); > out: > - mmap_read_unlock(alloc->mm); > + if (per_vma_lock) > + vma_end_read(vma); > + else > + mmap_read_unlock(alloc->mm); > mmput_async(alloc->mm); > if (ret && page) > __free_page(page); > -- > 2.39.3 (Apple Git-146) This looks fairly similar to my patch. However, you would need to handle the case were vm_insert_page() returns -EBUSY (success that raced) and also sync with the shrinker callbacks in binder_alloc_free_page() which is the biggest concern. Let's not duplicate efforts though. Can we please wait for my patch? I'll add you as Co-Developed-by, since you've posted this already? Regards, Carlos Llamas