* [PATCH hotfix 6.12 0/2] fork: do not expose incomplete mm on fork
@ 2024-10-15 17:56 Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 1/2] fork: do not invoke uffd on fork if error occurs Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 2/2] fork: only invoke khugepaged, ksm hooks if no error Lorenzo Stoakes
0 siblings, 2 replies; 3+ messages in thread
From: Lorenzo Stoakes @ 2024-10-15 17:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Jann Horn, Liam Howlett, Vlastimil Babka,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-mm, linux-kernel
During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.
In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.
As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.
We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and disallow
uffd from observing fork events should an error have occurred.
Lorenzo Stoakes (2):
fork: do not invoke uffd on fork if error occurs
fork: only invoke khugepaged, ksm hooks if no error
fs/userfaultfd.c | 28 ++++++++++++++++++++++++++++
include/linux/ksm.h | 10 ++++------
include/linux/userfaultfd_k.h | 5 +++++
kernel/fork.c | 12 ++++++------
4 files changed, 43 insertions(+), 12 deletions(-)
--
2.46.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH hotfix 6.12 1/2] fork: do not invoke uffd on fork if error occurs
2024-10-15 17:56 [PATCH hotfix 6.12 0/2] fork: do not expose incomplete mm on fork Lorenzo Stoakes
@ 2024-10-15 17:56 ` Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 2/2] fork: only invoke khugepaged, ksm hooks if no error Lorenzo Stoakes
1 sibling, 0 replies; 3+ messages in thread
From: Lorenzo Stoakes @ 2024-10-15 17:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Jann Horn, Liam Howlett, Vlastimil Babka,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-mm, linux-kernel
Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.
This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers for
the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().
This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple
tree in dup_mmap()") makes this more pertinent as we may be in a state
where entries in the maple tree are not yet consistent.
We address this by, on fork error, ensuring that we roll back state that we
would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().
We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.
Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
Reported-by: Jann Horn <jannh@google.com>
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++++++++
include/linux/userfaultfd_k.h | 5 +++++
kernel/fork.c | 5 ++++-
3 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 68cdd89c97a3..7c0bd0b55f88 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -692,6 +692,34 @@ void dup_userfaultfd_complete(struct list_head *fcs)
}
}
+void dup_userfaultfd_fail(struct list_head *fcs)
+{
+ struct userfaultfd_fork_ctx *fctx, *n;
+
+ /*
+ * An error has occurred on fork, we will tear memory down, but have
+ * allocated memory for fctx's and raised reference counts for both the
+ * original and child contexts (and on the mm for each as a result).
+ *
+ * These would ordinarily be taken care of by a user handling the event,
+ * but we are no longer doing so, so manually clean up here.
+ *
+ * mm tear down will take care of cleaning up VMA contexts.
+ */
+ list_for_each_entry_safe(fctx, n, fcs, list) {
+ struct userfaultfd_ctx *octx = fctx->orig;
+ struct userfaultfd_ctx *ctx = fctx->new;
+
+ atomic_dec(&octx->mmap_changing);
+ VM_BUG_ON(atomic_read(&octx->mmap_changing) < 0);
+ userfaultfd_ctx_put(octx);
+ userfaultfd_ctx_put(ctx);
+
+ list_del(&fctx->list);
+ kfree(fctx);
+ }
+}
+
void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *vm_ctx)
{
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 9fc6ce15c499..cb40f1a1d081 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -249,6 +249,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
+void dup_userfaultfd_fail(struct list_head *);
extern void mremap_userfaultfd_prep(struct vm_area_struct *,
struct vm_userfaultfd_ctx *);
@@ -351,6 +352,10 @@ static inline void dup_userfaultfd_complete(struct list_head *l)
{
}
+static inline void dup_userfaultfd_fail(struct list_head *l)
+{
+}
+
static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *ctx)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index 89ceb4a68af2..597b477dd491 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -775,7 +775,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
mmap_write_unlock(oldmm);
- dup_userfaultfd_complete(&uf);
+ if (!retval)
+ dup_userfaultfd_complete(&uf);
+ else
+ dup_userfaultfd_fail(&uf);
fail_uprobe_end:
uprobe_end_dup_mmap();
return retval;
--
2.46.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH hotfix 6.12 2/2] fork: only invoke khugepaged, ksm hooks if no error
2024-10-15 17:56 [PATCH hotfix 6.12 0/2] fork: do not expose incomplete mm on fork Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 1/2] fork: do not invoke uffd on fork if error occurs Lorenzo Stoakes
@ 2024-10-15 17:56 ` Lorenzo Stoakes
1 sibling, 0 replies; 3+ messages in thread
From: Lorenzo Stoakes @ 2024-10-15 17:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Jann Horn, Liam Howlett, Vlastimil Babka,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-mm, linux-kernel
There is no reason to invoke these hooks early against an mm that is in an
incomplete state.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple
tree in dup_mmap()") makes this more pertinent as we may be in a state
where entries in the maple tree are not yet consistent.
Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in any
case.
Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't really
matter all that much if, under such dire memory pressure, we fail to
register an mm with these.
As a result, we follow the example of commit d2081b2bf819 ("mm: khugepaged:
make khugepaged_enter() void function") and make ksm_fork() a void function
also.
We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.
Reported-by: Jann Horn <jannh@google.com>
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Cc: stable@vger.kernel.org
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/ksm.h | 10 ++++------
kernel/fork.c | 7 ++-----
2 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 11690dacd986..ec9c05044d4f 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -54,12 +54,11 @@ static inline long mm_ksm_zero_pages(struct mm_struct *mm)
return atomic_long_read(&mm->ksm_zero_pages);
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
+ /* Adding mm to ksm is best effort on fork. */
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
- return __ksm_enter(mm);
-
- return 0;
+ __ksm_enter(mm);
}
static inline int ksm_execve(struct mm_struct *mm)
@@ -107,9 +106,8 @@ static inline int ksm_disable(struct mm_struct *mm)
return 0;
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
- return 0;
}
static inline int ksm_execve(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 597b477dd491..3bf38d260cb3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -653,11 +653,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
mm->exec_vm = oldmm->exec_vm;
mm->stack_vm = oldmm->stack_vm;
- retval = ksm_fork(mm, oldmm);
- if (retval)
- goto out;
- khugepaged_fork(mm, oldmm);
-
/* Use __mt_dup() to efficiently build an identical maple tree. */
retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
if (unlikely(retval))
@@ -760,6 +755,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
vma_iter_free(&vmi);
if (!retval) {
mt_set_in_rcu(vmi.mas.tree);
+ ksm_fork(mm, oldmm);
+ khugepaged_fork(mm, oldmm);
} else if (mpnt) {
/*
* The entire maple tree has already been duplicated. If the
--
2.46.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-10-15 17:56 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-15 17:56 [PATCH hotfix 6.12 0/2] fork: do not expose incomplete mm on fork Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 1/2] fork: do not invoke uffd on fork if error occurs Lorenzo Stoakes
2024-10-15 17:56 ` [PATCH hotfix 6.12 2/2] fork: only invoke khugepaged, ksm hooks if no error Lorenzo Stoakes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).