public inbox for linux-arm-kernel@lists.infradead.org
* Re: [PATCH v3 21/35] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area
       [not found] ` <20230216051750.3125598-22-surenb@google.com>
@ 2023-02-16 15:34   ` Liam R. Howlett
       [not found]     ` <CAJuCfpEkujbHNxNWcWr8bmrsMhXGcpDyraOfQaPAcOH=RQPv5A@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-16 15:34 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team


First, sorry I didn't see this before v3..

* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> While unmapping VMAs, adjacent VMAs might be able to grow into the area
> being unmapped. In such cases write-lock adjacent VMAs to prevent this
> growth.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 118b2246bba9..00f8c5798936 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
>  	 */
>  	if (downgrade) {
> -		if (next && (next->vm_flags & VM_GROWSDOWN))
> +		if (next && (next->vm_flags & VM_GROWSDOWN)) {
> +			vma_start_write(next);
>  			downgrade = false;

If the mmap write lock is insufficient to protect us from next/prev
modifications then we need to move *most* of this block above the maple
tree write operation, otherwise we have a race here.  When I say most, I
mean everything besides the call to mmap_write_downgrade() needs to be
moved.

If the mmap write lock is sufficient to protect us from next/prev
modifications then we don't need to write lock the vmas themselves.

I believe this is for expand_stack() protection, so I believe it's okay
to not vma write lock these vmas.. I don't think there are other areas
where we can modify the vmas without holding the mmap lock, but others
on the CC list please chime in if I've forgotten something.

So, if I am correct, then you shouldn't lock next/prev, and should allow
the vma locking fault method on these vmas.  This will work because
lock_vma_under_rcu() uses mas_walk() on the faulting address.  That is,
your lock_vma_under_rcu() will fail to find anything that needs to be
grown and go back to mmap lock protection.  As it is written today, the
vma locking fault handler will fail and we will wait for the mmap lock
to be released even when the vma isn't going to expand.


> -		else if (prev && (prev->vm_flags & VM_GROWSUP))
> +		} else if (prev && (prev->vm_flags & VM_GROWSUP)) {
> +			vma_start_write(prev);
>  			downgrade = false;
> -		else
> +		} else
>  			mmap_write_downgrade(mm);
>  	}
>  
> -- 
> 2.39.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
       [not found] ` <20230216051750.3125598-27-surenb@google.com>
@ 2023-02-16 15:44   ` Matthew Wilcox
  2023-02-16 19:43     ` Suren Baghdasaryan
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2023-02-16 15:44 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> When vma->anon_vma is not set, page fault handler will set it by either
> reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> a compatible adjacent VMA and that requires not only the faulting VMA
> to be stable but also the tree structure and other VMAs inside that tree.
> Therefore locking just the faulting VMA is not enough for this search.
> Fall back to taking mmap_lock when vma->anon_vma is not set. This
> situation happens only on the first page fault and should not affect
> overall performance.

I think I asked this before, but don't remember getting an answer.
Why do we defer setting anon_vma to the first fault?  Why don't we
set it up at mmap time?


* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
  2023-02-16 15:44   ` [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set Matthew Wilcox
@ 2023-02-16 19:43     ` Suren Baghdasaryan
  2023-02-17  2:14       ` Suren Baghdasaryan
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2023-02-16 19:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > When vma->anon_vma is not set, page fault handler will set it by either
> > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > a compatible adjacent VMA and that requires not only the faulting VMA
> > to be stable but also the tree structure and other VMAs inside that tree.
> > Therefore locking just the faulting VMA is not enough for this search.
> > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > situation happens only on the first page fault and should not affect
> > overall performance.
>
> I think I asked this before, but don't remember getting an answer.
> Why do we defer setting anon_vma to the first fault?  Why don't we
> set it up at mmap time?

Yeah, I remember that conversation, Matthew, and I could not find the
definitive answer at the time. I'll look into that again, or maybe
someone can answer it here.

In the end rather than changing that logic I decided to skip
vma->anon_vma==NULL cases because I measured them being less than
0.01% of all page faults, so ROI from changing that would be quite
low. But I agree that the logic is weird and maybe we can improve
that. I will have to review that again when I'm working on eliminating
all these special cases we skip, like swap/userfaults/etc.


* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
  2023-02-16 19:43     ` Suren Baghdasaryan
@ 2023-02-17  2:14       ` Suren Baghdasaryan
  2023-02-17 16:05         ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2023-02-17  2:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > When vma->anon_vma is not set, page fault handler will set it by either
> > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > to be stable but also the tree structure and other VMAs inside that tree.
> > > Therefore locking just the faulting VMA is not enough for this search.
> > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > situation happens only on the first page fault and should not affect
> > > overall performance.
> >
> > I think I asked this before, but don't remember getting an answer.
> > Why do we defer setting anon_vma to the first fault?  Why don't we
> > set it up at mmap time?
>
> Yeah, I remember that conversation, Matthew, and I could not find the
> definitive answer at the time. I'll look into that again, or maybe
> someone can answer it here.

After looking into it again I'm still under the impression that
vma->anon_vma is populated lazily (during the first page fault rather
than at mmap time) to avoid doing extra work for areas which are never
faulted. Though I might be missing some important detail here.

>
> In the end rather than changing that logic I decided to skip
> vma->anon_vma==NULL cases because I measured them being less than
> 0.01% of all page faults, so ROI from changing that would be quite
> low. But I agree that the logic is weird and maybe we can improve
> that. I will have to review that again when I'm working on eliminating
> all these special cases we skip, like swap/userfaults/etc.


* Re: [PATCH v3 21/35] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area
       [not found]     ` <CAJuCfpEkujbHNxNWcWr8bmrsMhXGcpDyraOfQaPAcOH=RQPv5A@mail.gmail.com>
@ 2023-02-17 14:50       ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-17 14:50 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

* Suren Baghdasaryan <surenb@google.com> [230216 14:36]:
> On Thu, Feb 16, 2023 at 7:34 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> >
> > First, sorry I didn't see this before v3..
> 
> Feedback at any time is highly appreciated!
> 
> >
> > * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> > > While unmapping VMAs, adjacent VMAs might be able to grow into the area
> > > being unmapped. In such cases write-lock adjacent VMAs to prevent this
> > > growth.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > ---
> > >  mm/mmap.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 118b2246bba9..00f8c5798936 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > >        * down_read(mmap_lock) and collide with the VMA we are about to unmap.
> > >        */
> > >       if (downgrade) {
> > > -             if (next && (next->vm_flags & VM_GROWSDOWN))
> > > +             if (next && (next->vm_flags & VM_GROWSDOWN)) {
> > > +                     vma_start_write(next);
> > >                       downgrade = false;
> >
> > If the mmap write lock is insufficient to protect us from next/prev
> > modifications then we need to move *most* of this block above the maple
> > tree write operation, otherwise we have a race here.  When I say most, I
> > mean everything besides the call to mmap_write_downgrade() needs to be
> > moved.
> 
> Which prior maple tree write operation are you referring to? I see
> __split_vma() and munmap_sidetree() which both already lock the VMAs
> they operate on, so page faults can't happen in those VMAs.

The write that removes the VMAs from the maple tree a few lines above..
/* Point of no return */

If the mmap lock is not sufficient, then we need to move the
vma_start_write() of prev/next to above the call to
vma_iter_clear_gfp() in do_vmi_align_munmap().

But I still think it IS enough.

> 
> >
> > If the mmap write lock is sufficient to protect us from next/prev
> > modifications then we don't need to write lock the vmas themselves.
> 
> mmap write lock is not sufficient because with per-VMA locks we do not
> take mmap lock at all.

Understood, but it also does not expand VMAs.

> 
> >
> > I believe this is for expand_stack() protection, so I believe it's okay
> > to not vma write lock these vmas.. I don't think there are other areas
> > where we can modify the vmas without holding the mmap lock, but others
> > on the CC list please chime in if I've forgotten something.
> >
> > So, if I am correct, then you shouldn't lock next/prev, and should allow
> > the vma locking fault method on these vmas.  This will work because
> > lock_vma_under_rcu() uses mas_walk() on the faulting address.  That is,
> > your lock_vma_under_rcu() will fail to find anything that needs to be
> > grown and go back to mmap lock protection.  As it is written today, the
> > vma locking fault handler will fail and we will wait for the mmap lock
> > to be released even when the vma isn't going to expand.
> 
> So, let's consider a case when the next VMA is not being removed (so
> it was neither removed nor locked by munmap_sidetree()) and it is
> found by lock_vma_under_rcu() in the page fault handling path.

By this point next VMA is either NULL or outside the munmap area, so
what you said here is always true.

>Page
> fault handler can now expand it and push into the area we are
> unmapping in unmap_region(). That is the race I'm trying to prevent
> here by locking the next/prev VMAs which can be expanded before
> unmap_region() unmaps them. Am I missing something?

Yes, I think the part you are missing (or I am missing..) is that
expand_stack() will never be called without the mmap lock.  We don't use
the vma locking to expand the stack.

...



* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
  2023-02-17  2:14       ` Suren Baghdasaryan
@ 2023-02-17 16:05         ` Matthew Wilcox
  2023-02-17 16:10           ` Suren Baghdasaryan
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2023-02-17 16:05 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > situation happens only on the first page fault and should not affect
> > > > overall performance.
> > >
> > > I think I asked this before, but don't remember getting an answer.
> > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > set it up at mmap time?
> >
> > Yeah, I remember that conversation, Matthew, and I could not find the
> > definitive answer at the time. I'll look into that again, or maybe
> > someone can answer it here.
> 
> After looking into it again I'm still under the impression that
> vma->anon_vma is populated lazily (during the first page fault rather
> than at mmap time) to avoid doing extra work for areas which are never
> faulted. Though I might be missing some important detail here.

How often does userspace call mmap() and then _never_ fault on it?
I appreciate that userspace might mmap() gigabytes of address space and
then only end up using a small amount of it, so populating it lazily
makes sense.  But creating a region and never faulting on it?  The only
use-case I can think of is loading shared libraries:

openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
(...)
mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000

but that's a file-backed VMA, not an anon VMA.


* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
  2023-02-17 16:05         ` Matthew Wilcox
@ 2023-02-17 16:10           ` Suren Baghdasaryan
  2023-04-03 19:49             ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2023-02-17 16:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Fri, Feb 17, 2023 at 8:05 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > situation happens only on the first page fault and should not affect
> > > > > overall performance.
> > > >
> > > > I think I asked this before, but don't remember getting an answer.
> > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > set it up at mmap time?
> > >
> > > Yeah, I remember that conversation, Matthew, and I could not find the
> > > definitive answer at the time. I'll look into that again, or maybe
> > > someone can answer it here.
> >
> > After looking into it again I'm still under the impression that
> > vma->anon_vma is populated lazily (during the first page fault rather
> > than at mmap time) to avoid doing extra work for areas which are never
> > faulted. Though I might be missing some important detail here.
>
> How often does userspace call mmap() and then _never_ fault on it?
> I appreciate that userspace might mmap() gigabytes of address space and
> then only end up using a small amount of it, so populating it lazily
> makes sense.  But creating a region and never faulting on it?  The only
> use-case I can think of is loading shared libraries:
>
> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> (...)
> mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
> mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
> mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
> mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
> mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000
>
> but that's a file-backed VMA, not an anon VMA.

Might the case of dup_mmap() while forking be the reason why a VMA in
the child process might never be used while the parent uses it (or vice
versa)? Again, I'm not sure this is the reason, but I can find no other
good explanation.


* Re: [PATCH v3 23/35] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration
       [not found] ` <20230216051750.3125598-24-surenb@google.com>
@ 2023-02-23 20:06   ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-23 20:06 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> Page fault handlers might need to fire MMU notifications while a new
> notifier is being registered. Modify mm_take_all_locks to write-lock all
> VMAs and prevent this race with page fault handlers that would hold VMA
> locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
> locking order as in page fault handlers.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 00f8c5798936..801608726be8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3501,6 +3501,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
>   * of mm/rmap.c:
>   *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
>   *     hugetlb mapping);
> + *   - all vmas marked locked
>   *   - all i_mmap_rwsem locks;
>   *   - all anon_vma->rwseml
>   *
> @@ -3523,6 +3524,13 @@ int mm_take_all_locks(struct mm_struct *mm)
>  
>  	mutex_lock(&mm_all_locks_mutex);
>  
> +	mas_for_each(&mas, vma, ULONG_MAX) {
> +		if (signal_pending(current))
> +			goto out_unlock;
> +		vma_start_write(vma);
> +	}
> +
> +	mas_set(&mas, 0);
>  	mas_for_each(&mas, vma, ULONG_MAX) {
>  		if (signal_pending(current))
>  			goto out_unlock;

Do we need a vma_end_write_all(mm) in the out_unlock unrolling?

Also, does honouring the strict locking order really require adding an
entire new loop?  This function is...suboptimal today, but if we could
get away with not looping through every VMA a 4th time, that would be
nice.

> @@ -3612,6 +3620,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
>  		if (vma->vm_file && vma->vm_file->f_mapping)
>  			vm_unlock_mapping(vma->vm_file->f_mapping);
>  	}
> +	vma_end_write_all(mm);
>  
>  	mutex_unlock(&mm_all_locks_mutex);
>  }
> -- 
> 2.39.1
> 


* Re: [PATCH v3 24/35] mm: introduce vma detached flag
       [not found] ` <20230216051750.3125598-25-surenb@google.com>
@ 2023-02-23 20:08   ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-23 20:08 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team


Can we change this to active/inactive?  I think there is potential for
reusing this functionality to an even larger degree, and that name would
fit better while still making sense in this context.

ie: vma_mark_active() and vma_mark_inactive() ?

* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> Per-vma locking mechanism will search for VMA under RCU protection and
> then after locking it, has to ensure it was not removed from the VMA
> tree after we found it. To make this check efficient, introduce a
> vma->detached flag to mark VMAs which were removed from the VMA tree.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  include/linux/mm.h       | 11 +++++++++++
>  include/linux/mm_types.h |  3 +++
>  mm/mmap.c                |  2 ++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f4f702224ec5..3f98344f829c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -693,6 +693,14 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
>  	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
>  }
>  
> +static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
> +{
> +	/* When detaching vma should be write-locked */
> +	if (detached)
> +		vma_assert_write_locked(vma);
> +	vma->detached = detached;
> +}
> +
>  #else /* CONFIG_PER_VMA_LOCK */
>  
>  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> @@ -701,6 +709,8 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>  static inline void vma_end_read(struct vm_area_struct *vma) {}
>  static inline void vma_start_write(struct vm_area_struct *vma) {}
>  static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
> +static inline void vma_mark_detached(struct vm_area_struct *vma,
> +				     bool detached) {}
>  
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
> @@ -712,6 +722,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
>  	vma->vm_mm = mm;
>  	vma->vm_ops = &dummy_vm_ops;
>  	INIT_LIST_HEAD(&vma->anon_vma_chain);
> +	vma_mark_detached(vma, false);
>  	vma_init_lock(vma);
>  }
>  
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index e268723eaf44..939f4f5a1115 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -511,6 +511,9 @@ struct vm_area_struct {
>  #ifdef CONFIG_PER_VMA_LOCK
>  	int vm_lock_seq;
>  	struct rw_semaphore lock;
> +
> +	/* Flag to indicate areas detached from the mm->mm_mt tree */
> +	bool detached;
>  #endif
>  
>  	/*
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 801608726be8..adf40177e68f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -593,6 +593,7 @@ static inline void vma_complete(struct vma_prepare *vp,
>  
>  	if (vp->remove) {
>  again:
> +		vma_mark_detached(vp->remove, true);
>  		if (vp->file) {
>  			uprobe_munmap(vp->remove, vp->remove->vm_start,
>  				      vp->remove->vm_end);
> @@ -2267,6 +2268,7 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
>  	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
>  		return -ENOMEM;
>  
> +	vma_mark_detached(vma, true);
>  	if (vma->vm_flags & VM_LOCKED)
>  		vma->vm_mm->locked_vm -= vma_pages(vma);
>  
> -- 
> 2.39.1
> 


* Re: [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it
       [not found] ` <20230216051750.3125598-18-surenb@google.com>
@ 2023-02-23 20:20   ` Liam R. Howlett
  2023-02-23 20:28     ` Liam R. Howlett
  0 siblings, 1 reply; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-23 20:20 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> vma_expand and vma_shrink change VMA boundaries. Expansion might also
> result in freeing of an adjacent VMA. Write-lock affected VMAs to prevent
> concurrent page faults.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index ec2f8d0af280..f079e5bbcd57 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -674,6 +674,9 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		ret = dup_anon_vma(vma, next);
>  		if (ret)
>  			return ret;
> +
> +		/* Lock the VMA  before removing it */
> +		vma_start_write(next);
>  	}
>  
>  	init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
> @@ -686,6 +689,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	if (vma_iter_prealloc(vmi))
>  		goto nomem;
>  
> +	vma_start_write(vma);
>  	vma_adjust_trans_huge(vma, start, end, 0);
>  	/* VMA iterator points to previous, so set to start if necessary */
>  	if (vma_iter_addr(vmi) != start)
> @@ -725,6 +729,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	if (vma_iter_prealloc(vmi))
>  		return -ENOMEM;
>  
> +	vma_start_write(vma);
>  	init_vma_prep(&vp, vma);
>  	vma_adjust_trans_huge(vma, start, end, 0);
>  	vma_prepare(&vp);
> -- 
> 2.39.1
> 


* Re: [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it
  2023-02-23 20:20   ` [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it Liam R. Howlett
@ 2023-02-23 20:28     ` Liam R. Howlett
       [not found]       ` <CAJuCfpE3YtSQuXJwOYWKe1z9O4GASS9pA_FTWGkdveHb3bcMXA@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-23 20:28 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm, michel, jglisse, mhocko, vbabka, hannes,
	mgorman, dave, willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team


Wait, I figured a better place to do this.

init_multi_vma_prep() should vma_start_write() on any VMA that is passed
in.. that way we catch any modifications here & in vma_merge(), which I
think is missed in this patch set?


* Liam R. Howlett <Liam.Howlett@Oracle.com> [230223 15:20]:
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> 
> * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> > vma_expand and vma_shrink change VMA boundaries. Expansion might also
> > result in freeing of an adjacent VMA. Write-lock affected VMAs to prevent
> > concurrent page faults.
> > 
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/mmap.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index ec2f8d0af280..f079e5bbcd57 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -674,6 +674,9 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  		ret = dup_anon_vma(vma, next);
> >  		if (ret)
> >  			return ret;
> > +
> > +		/* Lock the VMA  before removing it */
> > +		vma_start_write(next);
> >  	}
> >  
> >  	init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
> > @@ -686,6 +689,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	if (vma_iter_prealloc(vmi))
> >  		goto nomem;
> >  
> > +	vma_start_write(vma);
> >  	vma_adjust_trans_huge(vma, start, end, 0);
> >  	/* VMA iterator points to previous, so set to start if necessary */
> >  	if (vma_iter_addr(vmi) != start)
> > @@ -725,6 +729,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	if (vma_iter_prealloc(vmi))
> >  		return -ENOMEM;
> >  
> > +	vma_start_write(vma);
> >  	init_vma_prep(&vp, vma);
> >  	vma_adjust_trans_huge(vma, start, end, 0);
> >  	vma_prepare(&vp);
> > -- 
> > 2.39.1
> > 


* Re: [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it
       [not found]       ` <CAJuCfpE3YtSQuXJwOYWKe1z9O4GASS9pA_FTWGkdveHb3bcMXA@mail.gmail.com>
@ 2023-02-24  1:46         ` Liam R. Howlett
       [not found]           ` <CAJuCfpG4JOv4aeJ6KJDi7R649vuhc0h75230ZRJgUg8spqti8w@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-24  1:46 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

* Suren Baghdasaryan <surenb@google.com> [230223 16:16]:
> On Thu, Feb 23, 2023 at 12:28 PM Liam R. Howlett
> <Liam.Howlett@oracle.com> wrote:
> >
> >
> > Wait, I figured a better place to do this.
> >
> > init_multi_vma_prep() should vma_start_write() on any VMA that is passed
> > in.. that way we catch any modifications here & in vma_merge(), which I
> > think is missed in this patch set?
> 
> Hmm. That looks like a good idea but in that case, why not do the
> locking inside vma_prepare() itself? From the description of that
> function it sounds like it was designed to acquire locks before VMA
> modifications, so would be the ideal location for doing that. WDYT?

That might be even better.  I think it will result in even less code.

There is also a vma_complete() which might work to call
vma_end_write_all() as well?

> The only concern is vma_adjust_trans_huge() being called before
> vma_prepare() but I *think* that's safe because
> vma_adjust_trans_huge() does its modifications after acquiring PTL
> lock, which page fault handlers also have to take. Does that sound
> right?

I am not sure.  We are certainly safe the way it is, and the PTL has to
be safe for concurrent faults.. but this could alter the walk to a page
table while that walk is occurring and I don't think that happens today.

It might be best to leave the locking order the way you have it, unless
someone can tell us it's safe?

We could pass through the three extra variables that are needed to move
the vma_adjust_trans_huge() call within that function as well?  This
would have the added benefit of having all locking grouped in one
location, but the argument list would be getting long; however, we could
use the struct.

remove & remove2 should be detached in vma_prepare() or
vma_complete() as well?

> 
> >
> >
> > * Liam R. Howlett <Liam.Howlett@Oracle.com> [230223 15:20]:
> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > >
> > > * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> > > > vma_expand and vma_shrink change VMA boundaries. Expansion might also
> > > > result in freeing of an adjacent VMA. Write-lock affected VMAs to prevent
> > > > concurrent page faults.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > ---
> > > >  mm/mmap.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index ec2f8d0af280..f079e5bbcd57 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -674,6 +674,9 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >             ret = dup_anon_vma(vma, next);
> > > >             if (ret)
> > > >                     return ret;
> > > > +
> > > > +           /* Lock the VMA  before removing it */
> > > > +           vma_start_write(next);
> > > >     }
> > > >
> > > >     init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
> > > > @@ -686,6 +689,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >     if (vma_iter_prealloc(vmi))
> > > >             goto nomem;
> > > >
> > > > +   vma_start_write(vma);
> > > >     vma_adjust_trans_huge(vma, start, end, 0);
> > > >     /* VMA iterator points to previous, so set to start if necessary */
> > > >     if (vma_iter_addr(vmi) != start)
> > > > @@ -725,6 +729,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >     if (vma_iter_prealloc(vmi))
> > > >             return -ENOMEM;
> > > >
> > > > +   vma_start_write(vma);
> > > >     init_vma_prep(&vp, vma);
> > > >     vma_adjust_trans_huge(vma, start, end, 0);
> > > >     vma_prepare(&vp);
> > > > --
> > > > 2.39.1
> > > >
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >


* Re: [PATCH v3 00/35] Per-VMA locks
       [not found] <20230216051750.3125598-1-surenb@google.com>
                   ` (4 preceding siblings ...)
       [not found] ` <20230216051750.3125598-18-surenb@google.com>
@ 2023-02-24  9:21 ` freak07
  2023-02-27 16:50   ` Davidlohr Bueso
  5 siblings, 1 reply; 17+ messages in thread
From: freak07 @ 2023-02-24  9:21 UTC (permalink / raw)
  To: surenb
  Cc: akpm, arjunroy, axelrasmussen, bigeasy, chriscli, dave, david,
	dhowells, edumazet, gthelen, gurua, hannes, hughd, jannh, jglisse,
	joelaf, kent.overstreet, kernel-team, ldufour, leewalsh,
	liam.howlett, linux-arm-kernel, linux-kernel, linux-mm,
	linuxppc-dev, lstoakes, luto, mgorman, mhocko, michel, minchan,
	mingo, paulmck, peterjung1337, peterx, peterz, posk,
	punit.agrawal, rientjes, rppt, shakeelb, soheil, songliubraving,
	tatashin, vbabka, will, willy, x86


Here are some measurements from a Pixel 7 Pro that's running a kernel either with the Per-VMA locks patchset or without.
If there's interest I can provide results of other specific apps as well.

Results are from consecutive cold app launches issued with "am start" command spawning the main activity of Slack Android app. 

https://docs.google.com/spreadsheets/d/1ktujfcyDmIJoQMWsoizGIE-0A_jMS9VMw_seehUY9s0/edit?usp=sharing

There's quite a noticeable improvement, as can be seen in the graph. The results were reproducible.

If you have any questions feel free to ask. 


* Re: [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it
       [not found]           ` <CAJuCfpG4JOv4aeJ6KJDi7R649vuhc0h75230ZRJgUg8spqti8w@mail.gmail.com>
@ 2023-02-24 16:14             ` Liam R. Howlett
  0 siblings, 0 replies; 17+ messages in thread
From: Liam R. Howlett @ 2023-02-24 16:14 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	willy, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

* Suren Baghdasaryan <surenb@google.com> [230223 21:06]:
> On Thu, Feb 23, 2023 at 5:46 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Suren Baghdasaryan <surenb@google.com> [230223 16:16]:
> > > On Thu, Feb 23, 2023 at 12:28 PM Liam R. Howlett
> > > <Liam.Howlett@oracle.com> wrote:
> > > >
> > > >
> > > > Wait, I figured a better place to do this.
> > > >
> > > > init_multi_vma_prep() should vma_start_write() on any VMA that is passed
> > > > in.. that way we catch any modifications here & in vma_merge(), which I
> > > > think is missed in this patch set?
> > >
> > > Hmm. That looks like a good idea but in that case, why not do the
> > > locking inside vma_prepare() itself? From the description of that
> > > function it sounds like it was designed to acquire locks before VMA
> > > modifications, so would be the ideal location for doing that. WDYT?
> >
> > That might be even better.  I think it will result in even less code.
> 
> Yes.
> 
> >
> > There is also a vma_complete() which might work to call
> > vma_end_write_all() as well?
> 
> If there are other VMAs already locked before vma_prepare() then we
> would unlock them too. Safer to just let mmap_unlock do
> vma_end_write_all().
> 
> >
> > > The only concern is vma_adjust_trans_huge() being called before
> > > vma_prepare() but I *think* that's safe because
> > > vma_adjust_trans_huge() does its modifications after acquiring PTL
> > > lock, which page fault handlers also have to take. Does that sound
> > > right?
> >
> > I am not sure.  We are certainly safe the way it is, and the PTL has to
> > be safe for concurrent faults.. but this could alter the walk to a page
> > table while that walk is occurring and I don't think that happens today.
> >
> > It might be best to leave the locking order the way you have it, unless
> > someone can tell us it's safe?
> 
> Yes, I have the same feelings about changing this.
> 
> >
> > We could pass through the three extra variables that are needed to move
> > the vma_adjust_trans_huge() call within that function as well?  This
> > would have the added benefit of having all locking grouped in the one
> > location, but the argument list would be getting long, however we could
> > use the struct.
> 
> Any issues if I change the order to have vma_prepare() called always
> before vma_adjust_trans_huge()? That way the VMA will always be locked
> before vma_adjust_trans_huge() executes and we don't need any
> additional arguments.

I preserved the locking order from __vma_adjust() to ensure there were no
issues.

I am not sure but, looking through the page table information [1], it
seems that vma_adjust_trans_huge() uses the pmd lock, which is part of
the split page table lock.  According to the comment in rmap, it should
be fine to reverse the ordering here.

Instead of:

mmap_lock()
vma_adjust_trans_huge()
	pte_lock
	pte_unlock

vma_prepare()
	mapping->i_mmap_rwsem lock
	anon_vma->rwsem lock

<changes to tree/VMAs>

vma_complete()
	anon_vma->rwsem unlock
	mapping->i_mmap_rwsem unlock

mmap_unlock()

---------

We would have:

mmap_lock()
vma_prepare()
	mapping->i_mmap_rwsem lock
	anon_vma->rwsem lock

vma_adjust_trans_huge()
	pte_lock
	pte_unlock

<changes to tree/VMAs>

vma_complete()
	anon_vma->rwsem unlock
	mapping->i_mmap_rwsem unlock

mmap_unlock()


Essentially, increasing the nesting of the pte lock, but not violating
the ordering.

1. https://docs.kernel.org/mm/split_page_table_lock.html

> 
> >
> > remove & remove2 should be detached in vma_prepare() or
> > vma_complete() as well?
> 
> They are marked detached in vma_complete() (see
> https://lore.kernel.org/all/20230216051750.3125598-25-surenb@google.com/)
> and that should be enough. We should be safe as long as we mark them
> detached before unlocking the VMA.
> 

Right, Thanks.

...


* Re: [PATCH v3 00/35] Per-VMA locks
  2023-02-24  9:21 ` [PATCH v3 00/35] Per-VMA locks freak07
@ 2023-02-27 16:50   ` Davidlohr Bueso
  2023-02-27 17:22     ` Suren Baghdasaryan
  0 siblings, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2023-02-27 16:50 UTC (permalink / raw)
  To: freak07
  Cc: surenb, akpm, arjunroy, axelrasmussen, bigeasy, chriscli, david,
	dhowells, edumazet, gthelen, gurua, hannes, hughd, jannh, jglisse,
	joelaf, kent.overstreet, kernel-team, ldufour, leewalsh,
	liam.howlett, linux-arm-kernel, linux-kernel, linux-mm,
	linuxppc-dev, lstoakes, luto, mgorman, mhocko, michel, minchan,
	mingo, paulmck, peterjung1337, peterx, peterz, posk,
	punit.agrawal, rientjes, rppt, shakeelb, soheil, songliubraving,
	tatashin, vbabka, will, willy, x86

On Fri, 24 Feb 2023, freak07 wrote:

>Here are some measurements from a Pixel 7 Pro that's running a kernel either with the Per-VMA locks patchset or without.
>If there's interest I can provide results of other specific apps as well.
>
>Results are from consecutive cold app launches issued with "am start" command spawning the main activity of Slack Android app.
>
>https://docs.google.com/spreadsheets/d/1ktujfcyDmIJoQMWsoizGIE-0A_jMS9VMw_seehUY9s0/edit?usp=sharing
>
>There's quite a noticeable improvement, as can be seen in the graph. The results were reproducible.

Thanks for sharing. I am not surprised - after all, per-vma locks narrow some of the performance gaps
between vanilla and speculative pfs, with less complexity (albeit this is now a 35 patch series :).

Thanks,
Davidlohr


* Re: [PATCH v3 00/35] Per-VMA locks
  2023-02-27 16:50   ` Davidlohr Bueso
@ 2023-02-27 17:22     ` Suren Baghdasaryan
  0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2023-02-27 17:22 UTC (permalink / raw)
  To: freak07, surenb, akpm, arjunroy, axelrasmussen, bigeasy, chriscli,
	david, dhowells, edumazet, gthelen, gurua, hannes, hughd, jannh,
	jglisse, joelaf, kent.overstreet, kernel-team, ldufour, leewalsh,
	liam.howlett, linux-arm-kernel, linux-kernel, linux-mm,
	linuxppc-dev, lstoakes, luto, mgorman, mhocko, michel, minchan,
	mingo, paulmck, peterjung1337, peterx, peterz, posk,
	punit.agrawal, rientjes, rppt, shakeelb, soheil, songliubraving,
	tatashin, vbabka, will, willy, x86

On Mon, Feb 27, 2023 at 9:19 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> On Fri, 24 Feb 2023, freak07 wrote:
>
> >Here are some measurements from a Pixel 7 Pro that's running a kernel either with the Per-VMA locks patchset or without.
> >If there's interest I can provide results of other specific apps as well.
> >
> >Results are from consecutive cold app launches issued with "am start" command spawning the main activity of Slack Android app.
> >
> >https://docs.google.com/spreadsheets/d/1ktujfcyDmIJoQMWsoizGIE-0A_jMS9VMw_seehUY9s0/edit?usp=sharing
> >
> >There's quite a noticeable improvement, as can be seen in the graph. The results were reproducible.
>
> Thanks for sharing. I am not surprised - after all, per-vma locks narrow some of the performance gaps
> between vanilla and speculative pfs, with less complexity (albeit this is now a 35 patch series :).

Yes, depending on the specific app we would expect some launch time
improvement (in this case average improvement is 7%). Thanks for
sharing the numbers!
I'll be posting v4 today, which is a 33 patch series now, so a bit better :)
Thanks,
Suren.

>
> Thanks,
> Davidlohr
>


* Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
  2023-02-17 16:10           ` Suren Baghdasaryan
@ 2023-04-03 19:49             ` Matthew Wilcox
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2023-04-03 19:49 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, michel, jglisse, mhocko, vbabka, hannes, mgorman, dave,
	liam.howlett, peterz, ldufour, paulmck, mingo, will, luto,
	songliubraving, peterx, david, dhowells, hughd, bigeasy,
	kent.overstreet, punit.agrawal, lstoakes, peterjung1337, rientjes,
	chriscli, axelrasmussen, joelaf, minchan, rppt, jannh, shakeelb,
	tatashin, edumazet, gthelen, gurua, arjunroy, soheil, leewalsh,
	posk, michalechner92, linux-mm, linux-arm-kernel, linuxppc-dev,
	x86, linux-kernel, kernel-team

On Fri, Feb 17, 2023 at 08:10:35AM -0800, Suren Baghdasaryan wrote:
> On Fri, Feb 17, 2023 at 8:05 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > > situation happens only on the first page fault and should not affect
> > > > > > overall performance.
> > > > >
> > > > > I think I asked this before, but don't remember getting an aswer.
> > > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > > set it up at mmap time?
> > > >
> > > > Yeah, I remember that conversation Matthew and I could not find the
> > > > definitive answer at the time. I'll look into that again or maybe
> > > > someone can answer it here.
> > >
> > > After looking into it again I'm still under the impression that
> > > vma->anon_vma is populated lazily (during the first page fault rather
> > > than at mmap time) to avoid doing extra work for areas which are never
> > > faulted. Though I might be missing some important detail here.
> >
> > How often does userspace call mmap() and then _never_ fault on it?
> > I appreciate that userspace might mmap() gigabytes of address space and
> > then only end up using a small amount of it, so populating it lazily
> > makes sense.  But creating a region and never faulting on it?  The only
> > use-case I can think of is loading shared libraries:
> >
> > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> > (...)
> > mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
> > mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
> > mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
> > mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
> > mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000
> >
> > but that's a file-backed VMA, not an anon VMA.
> 
> Might the case of dup_mmap() while forking be the reason why a VMA in
> the child process might be never used while parent uses it (or visa
> versa)? Again, I'm not sure this is the reason but I can find no other
> good explanation.

I found an explanation!  Well, a partial one.  If we MAP_PRIVATE a file
mapping (like, er those ones up there) and only take read faults on it,
we can postpone allocation of the anon_vma indefinitely.  But once we
take a write fault in that VMA, we need to allocate an anon_vma for it
so that we can track the anonymous pages that have been allocated to
satisfy the copy-on-write (see do_cow_fault()).

However, I think in that case, we could probably skip the
find_mergeable_anon_vma() step.  We don't today; we check whether
a->vm_file == b->vm_file in anon_vma_compatible, but I wonder if that
triggers often.



end of thread, other threads:[~2023-04-03 19:50 UTC | newest]

Thread overview: 17+ messages
     [not found] <20230216051750.3125598-1-surenb@google.com>
     [not found] ` <20230216051750.3125598-22-surenb@google.com>
2023-02-16 15:34   ` [PATCH v3 21/35] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area Liam R. Howlett
     [not found]     ` <CAJuCfpEkujbHNxNWcWr8bmrsMhXGcpDyraOfQaPAcOH=RQPv5A@mail.gmail.com>
2023-02-17 14:50       ` Liam R. Howlett
     [not found] ` <20230216051750.3125598-27-surenb@google.com>
2023-02-16 15:44   ` [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set Matthew Wilcox
2023-02-16 19:43     ` Suren Baghdasaryan
2023-02-17  2:14       ` Suren Baghdasaryan
2023-02-17 16:05         ` Matthew Wilcox
2023-02-17 16:10           ` Suren Baghdasaryan
2023-04-03 19:49             ` Matthew Wilcox
     [not found] ` <20230216051750.3125598-24-surenb@google.com>
2023-02-23 20:06   ` [PATCH v3 23/35] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration Liam R. Howlett
     [not found] ` <20230216051750.3125598-25-surenb@google.com>
2023-02-23 20:08   ` [PATCH v3 24/35] mm: introduce vma detached flag Liam R. Howlett
     [not found] ` <20230216051750.3125598-18-surenb@google.com>
2023-02-23 20:20   ` [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it Liam R. Howlett
2023-02-23 20:28     ` Liam R. Howlett
     [not found]       ` <CAJuCfpE3YtSQuXJwOYWKe1z9O4GASS9pA_FTWGkdveHb3bcMXA@mail.gmail.com>
2023-02-24  1:46         ` Liam R. Howlett
     [not found]           ` <CAJuCfpG4JOv4aeJ6KJDi7R649vuhc0h75230ZRJgUg8spqti8w@mail.gmail.com>
2023-02-24 16:14             ` Liam R. Howlett
2023-02-24  9:21 ` [PATCH v3 00/35] Per-VMA locks freak07
2023-02-27 16:50   ` Davidlohr Bueso
2023-02-27 17:22     ` Suren Baghdasaryan
