All of lore.kernel.org
 help / color / mirror / Atom feed
From: Harry Yoo <harry.yoo@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Rik van Riel <riel@surriel.com>,
	Wei Yang <richard.weiyang@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
Date: Mon, 12 May 2025 11:39:16 +0900	[thread overview]
Message-ID: <aCFfVHFq_4enEgsL@hyeyoo> (raw)
In-Reply-To: <18c756fc9eaf7ad082a710c91133b8346f8cd9a8.1744104124.git.lorenzo.stoakes@oracle.com>

On Tue, Apr 08, 2025 at 10:29:31AM +0100, Lorenzo Stoakes wrote:
> anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
> change anon_vma linking to fix multi-process server scalability issue").
> 
> This patch was introduced in March 2010. As part of this change, careful
> attention was made to the instance of mprotect() causing a VMA merge, with one
> faulted (i.e. having anon_vma set) and another not:
> 
> 		/*
> 		 * Easily overlooked: when mprotect shifts the boundary,
> 		 * make sure the expanding vma has anon_vma set if the
> 		 * shrinking vma had, to cover any anon pages imported.
> 		 */
> 
> In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
> anon_vma_clone()).
> 
> This case is one of the three configurations of adjacent VMA anon_vma state
> that we might encounter on merge (where dst is the VMA which will be merged
> into and src the one being merged into dst):
> 
> 1.  dst->anon_vma,  src->anon_vma - These must be equal, no-op.
> 2.  dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
> 3. !dst->anon_vma,  src->anon_vma - The case in question here.
> 
> In case 3, the instance addressed here - we duplicate the AVC connections
> from src and place into dst.
> 
> However, in practice, we very often do NOT do this.
> 
> This appears to be due to an inadvertent consequence of the change
> introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
> introduced in May 2011.
> 
> This implies that this merge case was functional only for a little over a
> year, and has since been broken for ~15 years.
> 
> Here, lock scalability concerns lead to us restricting anonymous merges
> only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
> VMA that is not connected to any parent process's anon_vma.
> 
> The mergeability test looks like this:
> 
> static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> 		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> {
> 	if ((!anon_vma1 || !anon_vma2) && (!vma ||
> 		!vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
> 		return true;
> 	return anon_vma1 == anon_vma2;
> }
> 
> However, we have a problem here - typically the vma passed here is the
> destination VMA.
> 
> For instance in vma_merge_existing_range() we invoke:
> 
> can_vma_merge_left()
> -> [ check that there is an immediately adjacent prior VMA ]
> -> can_vma_merge_after()
>   -> is_mergeable_vma() for general attribute check
> -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> 
> So if we were considering a target unfaulted 'prev':
> 
> 	  unfaulted    faulted
> 	|-----------|-----------|
> 	|    prev   |    vma    |
> 	|-----------|-----------|
> 
> This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> 
> The list_is_singular() check for vma->anon_vma_chain, an empty list on
> fault, would cause this merge to _fail_ even though all else indicates a
> merge.
> 
> Equally a simple merge into a next VMA would hit the same problem:
> 
> 	   faulted    unfaulted
> 	|-----------|-----------|
> 	|    vma    |    next   |
> 	|-----------|-----------|
> 
> can_vma_merge_right()
> -> [ check that there is an immediately adjacent succeeding VMA ]
> -> can_vma_merge_before()
>   -> is_mergeable_vma() for general attribute check
> -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
> 
> For a 3-way merge, we'd also hit the same problem if it was configured like
> this for instance:
> 
> 	  unfaulted    faulted    unfaulted
> 	|-----------|-----------|-----------|
> 	|    prev   |    vma    |    next   |
> 	|-----------|-----------|-----------|
> 
> As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
> next, both of which would fail.
> 
> vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
> the new VMA would never already be faulted (it is a proposed new range).
> 
> Because we already handle each of the aforementioned merge cases, and can
> absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
> src->anon_vma, there is absolutely no reason to disallow this kind of
> merge.
> 
> It seems that the intention of this patch is to ensure that, in the
> instance of merging unfaulted VMAs with faulted ones, we never wish to do
> so with those with multiple AVCs due to the fact that anon_vma lock's are
> held across both parent and child anon_vma's (actually, the 'root' parent
> anon_vma's lock is used).
> 
> In fact, the original commit alludes to this - "find_mergeable_anon_vma()
> already considers this case".
> 
> In find_mergeable_anon_vma() however, we check the anon_vma which will be
> merged from, if it is set, then we check
> list_is_singular(vma->anon_vma_chain).
> 
> So to match this logic, update is_mergeable_anon_vma() to perform this
> scalability check on the VMA whose anon_vma we ultimately merge into.
> 
> This matches existing behaviour with forked VMAs, only we no longer wrongly
> disallow ALL empty target merges.
> 
> So we both allow merge cases and ensure the scalability check is correctly
> applied.
> 
> We may wish to revisit these lock scalability concerns at a later date and
> ensure they are still valid.
> 
> Additionally, correct userland VMA tests which were mistakenly not
> asserting these cases correctly previously to now correctly assert this,
> and to ensure vmg->anon_vma state is always consistent to account for newly
> introduced asserts.
> 
> Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---

A little bit late to review, but better late than never :)

This patch makes sense and looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon


  reply	other threads:[~2025-05-12  2:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-08  9:29 [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
2025-04-08  9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
2025-05-12  2:39   ` Harry Yoo [this message]
2025-05-12 11:24     ` Lorenzo Stoakes
2025-04-08  9:29 ` [PATCH v2 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
2025-04-08  9:29 ` [PATCH v2 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aCFfVHFq_4enEgsL@hyeyoo \
    --to=harry.yoo@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=richard.weiyang@gmail.com \
    --cc=riel@surriel.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.