From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: jirislaby@kernel.org, jacobly.alt@gmail.com,
holger@applied-asynchrony.com, hdegoede@redhat.com,
michel@lespinasse.org, jglisse@google.com, mhocko@suse.com,
vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net,
dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org,
mingo@redhat.com, will@kernel.org, luto@kernel.org,
songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com,
chriscli@google.com, axelrasmussen@google.com, joelaf@google.com,
minchan@google.com, rppt@kernel.org, jannh@google.com,
shakeelb@google.com, tatashin@google.com, edumazet@google.com,
gthelen@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v3 1/2] fork: lock VMAs of the parent process when forking
Date: Wed, 5 Jul 2023 10:12:11 -0700
Message-ID: <20230705171213.2843068-2-surenb@google.com>
In-Reply-To: <20230705171213.2843068-1-surenb@google.com>
When forking a child process, the parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
The parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because it is COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page.
A similar issue happened in the past with userfaultfd (see the
flush_tlb_page() call inside do_wp_page()).

Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during the fork operation and avoids this issue.

This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show a noticeable regression on a 56-core machine, while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows a ~5% regression. If such a fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore fork performance. Further
optimizations are possible if this regression proves to be problematic.
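
For readers who want to reproduce a comparable measurement, here is a
minimal userspace sketch of a stress test of the kind described above. It
is not the test used for the numbers in this message; the function names
(map_many_vmas, fork_loop, timed_fork_loop) are made up for illustration,
and the trick of alternating page protections to keep VMAs from merging is
an assumption about how one might build many VMAs cheaply.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create roughly nr_vmas separate VMAs. Reserve
 * nr_vmas pages as PROT_NONE, then make every other page read-write so
 * that neighbouring pages differ in protection and cannot be merged.
 * Returns the base address, or NULL on failure.
 */
static void *map_many_vmas(size_t nr_vmas)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pg, i;
	char *base;

	if (page <= 0 || nr_vmas == 0)
		return NULL;
	pg = (size_t)page;

	base = mmap(NULL, nr_vmas * pg, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	for (i = 0; i < nr_vmas; i += 2) {
		if (mprotect(base + i * pg, pg, PROT_READ | PROT_WRITE)) {
			munmap(base, nr_vmas * pg);
			return NULL;
		}
		/* Touch the page so fork has an anonymous page to COW-share. */
		base[i * pg] = 1;
	}
	return base;
}

/* Fork nr_forks children one at a time; return how many were reaped. */
static int fork_loop(int nr_forks)
{
	int done = 0;

	for (int i = 0; i < nr_forks; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0)
			_exit(0);	/* child exits immediately */
		if (waitpid(pid, NULL, 0) != pid)
			break;
		done++;
	}
	return done;
}

/* Run fork_loop() and return the elapsed wall-clock time in seconds. */
static double timed_fork_loop(int nr_forks, int *completed)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	*completed = fork_loop(nr_forks);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling map_many_vmas(10000) followed by timed_fork_loop(5000, &done) on
kernels built with and without CONFIG_PER_VMA_LOCK would approximate the
workload quoted above, though absolute timings will depend on the machine.
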
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
kernel/fork.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index b85814e614a5..403bc2b72301 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		retval = -EINTR;
 		goto fail_uprobe_end;
 	}
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Disallow any page faults before calling flush_cache_dup_mm */
+	for_each_vma(old_vmi, mpnt)
+		vma_start_write(mpnt);
+	vma_iter_init(&old_vmi, oldmm, 0);
+#endif
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*
--
2.41.0.255.g8b1d071c50-goog
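
To illustrate why write-locking every parent VMA keeps the per-VMA fault
path out of the way until fork is done, here is a toy, single-threaded
model of the sequence-count idea behind per-VMA locks as I understand it:
a VMA counts as write-locked while its sequence number equals the mm's,
and all write locks are released together when the mm's number is bumped
on mmap_lock write unlock. All toy_* names are hypothetical. The real
kernel code also takes a per-VMA rwsem and deals with concurrency, which
this sketch leaves out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for mm_struct and vm_area_struct; not the kernel types. */
struct toy_mm {
	int mm_lock_seq;	/* bumped when the mmap write lock is dropped */
};

struct toy_vma {
	struct toy_mm *mm;
	int vm_lock_seq;	/* equals mm->mm_lock_seq while write-locked */
};

static void toy_vma_init(struct toy_vma *vma, struct toy_mm *mm)
{
	vma->mm = mm;
	vma->vm_lock_seq = -1;	/* never matches a non-negative mm_lock_seq */
}

/* Writer side: mark the VMA as locked for the current mmap_lock cycle. */
static void toy_vma_start_write(struct toy_vma *vma)
{
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/*
 * Reader side (page fault under the per-VMA lock): refuse if the VMA is
 * write-locked. A real fault would then fall back to taking mmap_lock.
 */
static bool toy_vma_start_read(const struct toy_vma *vma)
{
	return vma->vm_lock_seq != vma->mm->mm_lock_seq;
}

/* Dropping the mmap write lock releases every VMA write lock at once. */
static void toy_vma_end_write_all(struct toy_mm *mm)
{
	mm->mm_lock_seq++;
}

/* Model of the loop added to dup_mmap(): write-lock every parent VMA. */
static void toy_lock_all_vmas(struct toy_vma *vmas, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		toy_vma_start_write(&vmas[i]);
}
```

In this model, once toy_lock_all_vmas() has run, toy_vma_start_read()
fails for every VMA until toy_vma_end_write_all() is called. That mirrors
the intent of the patch: faults in the parent cannot proceed under a
per-VMA lock until dup_mmap() has flushed the TLB and dropped mmap_lock.
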