From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, "Suren Baghdasaryan" <surenb@google.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Jacob Young" <jacobly.alt@gmail.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: [PATCH 6.4 7/8] fork: lock VMAs of the parent process when forking
Date: Sun,  9 Jul 2023 13:14:13 +0200
Message-ID: <20230709111345.516444847@linuxfoundation.org>
In-Reply-To: <20230709111345.297026264@linuxfoundation.org>

From: Suren Baghdasaryan <surenb@google.com>

commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.

Patch series "Avoid memory corruption caused by per-VMA locks", v4.

A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.  Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.

Patch 1/2 in the series implements proper VMA locking during fork.  I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.

This fix can potentially regress some fork-heavy workloads.  Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Patch 2/2 disables per-VMA locks until the fix is tested and verified.


This patch (of 2):

When forking a child process, the parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
The parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap().  If we get a write fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because it is COW-shared with the child), this might leave
stale writable TLB entries targeting the wrong (old) page.  A similar
issue happened in the past with userfaultfd (see the flush_tlb_page() call
inside do_wp_page()).
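
For readers less familiar with the COW contract involved, here is a minimal
userspace sketch of what fork is expected to guarantee: after the fork, a
write in the parent must land on the parent's own copy and stay invisible
to the child.  It does not reproduce the race described above, and the
function name cow_isolation_holds() is made up for illustration.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a post-fork write in the parent stays
 * private to the parent, 0 if not, -1 on setup error.
 */
static int cow_isolation_holds(void)
{
	volatile int *val;
	int pipefd[2], status = 0, ok;
	char token = 0;
	pid_t pid;

	val = mmap(NULL, sizeof(*val), PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (val == MAP_FAILED)
		return -1;
	*val = 1;			/* populate the anonymous page */

	if (pipe(pipefd)) {
		munmap((void *)val, sizeof(*val));
		return -1;
	}

	pid = fork();			/* page is now COW-shared */
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		munmap((void *)val, sizeof(*val));
		return -1;
	}
	if (pid == 0) {
		/* Child: wait until the parent has written, then check. */
		close(pipefd[1]);
		if (read(pipefd[0], &token, 1) != 1)
			_exit(2);
		_exit(*val == 1 ? 0 : 1);
	}

	close(pipefd[0]);
	*val = 2;			/* write fault on a write-protected page */
	if (write(pipefd[1], &token, 1) != 1)
		token = 1;		/* child will see EOF and exit with 2 */
	close(pipefd[1]);

	if (waitpid(pid, &status, 0) != pid)
		status = -1;
	ok = status != -1 && WIFEXITED(status) &&
	     WEXITSTATUS(status) == 0 && *val == 2;
	munmap((void *)val, sizeof(*val));
	return ok;
}
```

Calling cow_isolation_holds() from a small main() should return 1 on a
correctly behaving kernel.  The pipe is only there to order the child's
read after the parent's write.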

Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during the fork operation and avoids this issue.  This
fix can potentially regress some fork-heavy workloads.  Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression.  If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance.  Further optimizations
are possible if this regression proves to be problematic.
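
The stress test itself is not included in this message.  A rough sketch of
such a test, assuming a single anonymous mapping split into many VMAs by
alternating page protections, could look like the following.  The helper
name fork_stress() and the splitting trick are assumptions made for
illustration, not the author's benchmark.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: split one anonymous mapping into roughly nr_vmas
 * VMAs, fork nr_forks times in a tight loop, and return the elapsed
 * wall-clock seconds (or -1.0 on error).
 */
static double fork_stress(size_t nr_vmas, int nr_forks)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = nr_vmas * page;
	struct timespec t0, t1;
	double elapsed = -1.0;
	char *base;
	size_t i;
	int n;

	if (nr_vmas == 0)
		return -1.0;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1.0;

	/* Populate every page so fork has anonymous pages to COW-share. */
	memset(base, 1, len);

	/* Alternate protections so that neighbouring pages cannot merge. */
	for (i = 1; i < nr_vmas; i += 2)
		if (mprotect(base + i * page, page, PROT_READ))
			goto out;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (n = 0; n < nr_forks; n++) {
		pid_t pid = fork();

		if (pid < 0)
			goto out;
		if (pid == 0)
			_exit(0);	/* child exits immediately */
		if (waitpid(pid, NULL, 0) != pid)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	elapsed = (double)(t1.tv_sec - t0.tv_sec) +
		  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
	munmap(base, len);
	return elapsed;
}
```

With the figures quoted above this would be called as
fork_stress(10000, 5000), once on a kernel with CONFIG_PER_VMA_LOCK and
once without, comparing the two results.  10000 VMAs should fit under the
usual vm.max_map_count default, but that limit is worth checking on the
test machine.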

Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/fork.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str
 		retval = -EINTR;
 		goto fail_uprobe_end;
 	}
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Disallow any page faults before calling flush_cache_dup_mm */
+	for_each_vma(old_vmi, mpnt)
+		vma_start_write(mpnt);
+	vma_iter_set(&old_vmi, 0);
+#endif
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*



