* [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more
@ 2025-04-02 9:45 Ingo Molnar
2025-04-02 9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
` (6 more replies)
0 siblings, 7 replies; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
These are a couple of cleanups and micro-optimizations by
Andy and Peter around the x86 use_/unuse_temporary_mm() APIs,
which were posted back in November, and which I merged on top
of the WIP.x86/alternatives tree:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/mm
Thanks,
Ingo
===============>
Andy Lutomirski (5):
x86/events, x86/insn-eval: Remove incorrect current->active_mm references
x86/mm: Make use_/unuse_temporary_mm() non-static
x86/mm: Allow temporary MMs when IRQs are on
x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
x86/mm: Opt-in to IRQs-off activate_mm()
Peter Zijlstra (2):
x86/mm: Add 'mm' argument to unuse_temporary_mm()
x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
arch/x86/Kconfig | 1 +
arch/x86/events/core.c | 9 ++++-
arch/x86/include/asm/mmu_context.h | 5 ++-
arch/x86/kernel/alternative.c | 64 -----------------------------------
arch/x86/lib/insn-eval.c | 13 +++++--
arch/x86/mm/tlb.c | 69 ++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 7 ++--
7 files changed, 94 insertions(+), 74 deletions(-)
--
2.45.2
* [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm()
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
2025-04-02 9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
` (5 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Peter Zijlstra <peterz@infradead.org>
In commit 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
unuse_temporary_mm() grew the assumption that it gets used on
poking_mm exclusively. While this is currently true, let's not
hard-code this assumption.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.322525475@infradead.org
---
arch/x86/kernel/alternative.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5b1a6252a4b9..cfffcb80f564 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2161,14 +2161,14 @@ static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
__ro_after_init struct mm_struct *text_poke_mm;
__ro_after_init unsigned long text_poke_mm_addr;
-static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
+static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
{
lockdep_assert_irqs_disabled();
switch_mm_irqs_off(NULL, prev_mm, current);
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
+ cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
/*
* Restore the breakpoints if they were disabled before the temporary mm
@@ -2275,7 +2275,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
* instruction that already allows the core to see the updated version.
* Xen-PV is assumed to serialize execution in a similar manner.
*/
- unuse_temporary_mm(prev_mm);
+ unuse_temporary_mm(text_poke_mm, prev_mm);
/*
* Flushing the TLB might involve IPIs, which would require enabled
--
2.45.2
* [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
2025-04-02 9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02 9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
` (4 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Andy Lutomirski <luto@kernel.org>
When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy. Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.
Clean up the code. If nmi_uaccess_okay() says we don't have a valid
context, just fail. Otherwise use current->mm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.433533770@infradead.org
---
arch/x86/events/core.c | 9 ++++++++-
arch/x86/lib/insn-eval.c | 13 ++++++++++---
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6866cc5acb0b..95118b52b606 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2803,8 +2803,15 @@ static unsigned long get_segment_base(unsigned int segment)
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
+ /*
+ * If we're not in a valid context with a real (not just lazy)
+ * user mm, then don't even try.
+ */
+ if (!nmi_uaccess_okay())
+ return 0;
+
/* IRQs are off, so this synchronizes with smp_store_release */
- ldt = READ_ONCE(current->active_mm->context.ldt);
+ ldt = smp_load_acquire(&current->mm->context.ldt);
if (!ldt || idx >= ldt->nr_entries)
return 0;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 98631c0e7a11..f786401ac15d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -631,14 +631,21 @@ static bool get_desc(struct desc_struct *out, unsigned short sel)
/* Bits [15:3] contain the index of the desired entry. */
sel >>= 3;
- mutex_lock(&current->active_mm->context.lock);
- ldt = current->active_mm->context.ldt;
+ /*
+ * If we're not in a valid context with a real (not just lazy)
+ * user mm, then don't even try.
+ */
+ if (!nmi_uaccess_okay())
+ return false;
+
+ mutex_lock(&current->mm->context.lock);
+ ldt = current->mm->context.ldt;
if (ldt && sel < ldt->nr_entries) {
*out = ldt->entries[sel];
success = true;
}
- mutex_unlock(&current->active_mm->context.lock);
+ mutex_unlock(&current->mm->context.lock);
return success;
}
--
2.45.2
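
As a reading aid for patch 2/7 above, here is a minimal userspace sketch of the guard it introduces. It is not kernel code. The types `toy_mm`, `toy_task` and `toy_cpu` and both functions are hypothetical stand-ins, and `toy_uaccess_okay()` only approximates the idea behind nmi_uaccess_okay(): the task's real mm must be the one actually loaded on the CPU, otherwise the lookup fails rather than falling back to active_mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real layouts. */
struct toy_ldt { unsigned int nr_entries; };
struct toy_mm { struct toy_ldt *ldt; };

struct toy_task {
	struct toy_mm *mm;        /* real user mm, NULL for kernel threads */
	struct toy_mm *active_mm; /* may be a borrowed (lazy) mm */
};

struct toy_cpu {
	struct toy_mm *loaded_mm; /* what the hardware state really references */
};

/* Rough model: user access is only safe if the task's own mm is loaded. */
static bool toy_uaccess_okay(const struct toy_task *t, const struct toy_cpu *c)
{
	return t->mm != NULL && t->mm == c->loaded_mm;
}

/* Returns the number of usable LDT entries, or 0 when the context is invalid. */
static unsigned int toy_get_ldt_entries(const struct toy_task *t,
					const struct toy_cpu *c)
{
	if (!toy_uaccess_okay(t, c))
		return 0; /* fail instead of trusting active_mm */

	return t->mm->ldt ? t->mm->ldt->nr_entries : 0;
}
```

Note that `active_mm` is never consulted in the sketch: a kernel thread that merely borrows a user mm gets 0 back, which mirrors the "just fail" behaviour described in the commit message.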
* [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
2025-04-02 9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
2025-04-02 9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02 9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
` (3 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Andy Lutomirski <luto@kernel.org>
This prepares them for use outside of the alternative machinery.
The code is unchanged.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.533822339@infradead.org
---
arch/x86/include/asm/mmu_context.h | 3 ++
arch/x86/kernel/alternative.c | 64 --------------------------------------
arch/x86/mm/tlb.c | 64 ++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 64 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2398058b6e83..b103e1709a67 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -272,4 +272,7 @@ unsigned long __get_current_cr3_fast(void);
#include <asm-generic/mmu_context.h>
+extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
+extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+
#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cfffcb80f564..25abadaf8751 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2111,73 +2111,9 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
}
}
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- * harden security IRQs must be disabled while the temporary mm is
- * loaded, thereby preventing interrupt handler bugs from overriding
- * the kernel memory protection.
- */
-static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
-{
- struct mm_struct *prev_mm;
-
- lockdep_assert_irqs_disabled();
-
- /*
- * Make sure not to be in TLB lazy mode, as otherwise we'll end up
- * with a stale address space WITHOUT being in lazy mode after
- * restoring the previous mm.
- */
- if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
- leave_mm();
-
- prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
- switch_mm_irqs_off(NULL, temp_mm, current);
-
- /*
- * If breakpoints are enabled, disable them while the temporary mm is
- * used. Userspace might set up watchpoints on addresses that are used
- * in the temporary mm, which would lead to wrong signals being sent or
- * crashes.
- *
- * Note that breakpoints are not disabled selectively, which also causes
- * kernel breakpoints (e.g., perf's) to be disabled. This might be
- * undesirable, but still seems reasonable as the code that runs in the
- * temporary mm should be short.
- */
- if (hw_breakpoint_active())
- hw_breakpoint_disable();
-
- return prev_mm;
-}
-
__ro_after_init struct mm_struct *text_poke_mm;
__ro_after_init unsigned long text_poke_mm_addr;
-static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
-{
- lockdep_assert_irqs_disabled();
-
- switch_mm_irqs_off(NULL, prev_mm, current);
-
- /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
-
- /*
- * Restore the breakpoints if they were disabled before the temporary mm
- * was loaded.
- */
- if (hw_breakpoint_active())
- hw_breakpoint_restore();
-}
-
static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
memcpy(dst, src, len);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0925768d00cb..06a1ad39be74 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -972,6 +972,70 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
}
+/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ * harden security IRQs must be disabled while the temporary mm is
+ * loaded, thereby preventing interrupt handler bugs from overriding
+ * the kernel memory protection.
+ */
+struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
+{
+ struct mm_struct *prev_mm;
+
+ lockdep_assert_irqs_disabled();
+
+ /*
+ * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+ * with a stale address space WITHOUT being in lazy mode after
+ * restoring the previous mm.
+ */
+ if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+ leave_mm();
+
+ prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+ switch_mm_irqs_off(NULL, temp_mm, current);
+
+ /*
+ * If breakpoints are enabled, disable them while the temporary mm is
+ * used. Userspace might set up watchpoints on addresses that are used
+ * in the temporary mm, which would lead to wrong signals being sent or
+ * crashes.
+ *
+ * Note that breakpoints are not disabled selectively, which also causes
+ * kernel breakpoints (e.g., perf's) to be disabled. This might be
+ * undesirable, but still seems reasonable as the code that runs in the
+ * temporary mm should be short.
+ */
+ if (hw_breakpoint_active())
+ hw_breakpoint_disable();
+
+ return prev_mm;
+}
+
+void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+{
+ lockdep_assert_irqs_disabled();
+
+ switch_mm_irqs_off(NULL, prev_mm, current);
+
+ /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
+ cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+
+ /*
+ * Restore the breakpoints if they were disabled before the temporary mm
+ * was loaded.
+ */
+ if (hw_breakpoint_active())
+ hw_breakpoint_restore();
+}
+
/*
* Call this when reinitializing a CPU. It fixes the following potential
* problems:
--
2.45.2
* [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
` (2 preceding siblings ...)
2025-04-02 9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
2025-04-02 9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
` (2 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Peter Zijlstra <peterz@infradead.org>
Now that unuse_temporary_mm() lives in tlb.c it can access
cpu_tlbstate.loaded_mm.
[ mingo: Merged it on top of x86/alternatives ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.648739178@infradead.org
---
arch/x86/include/asm/mmu_context.h | 2 +-
arch/x86/kernel/alternative.c | 2 +-
arch/x86/mm/tlb.c | 8 ++++----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b103e1709a67..988c11792634 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -273,6 +273,6 @@ unsigned long __get_current_cr3_fast(void);
#include <asm-generic/mmu_context.h>
extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
-extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+extern void unuse_temporary_mm(struct mm_struct *prev_mm);
#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 25abadaf8751..964a2eb0071a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2211,7 +2211,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
* instruction that already allows the core to see the updated version.
* Xen-PV is assumed to serialize execution in a similar manner.
*/
- unuse_temporary_mm(text_poke_mm, prev_mm);
+ unuse_temporary_mm(prev_mm);
/*
* Flushing the TLB might involve IPIs, which would require enabled
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 06a1ad39be74..e672508ca158 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1019,14 +1019,14 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
return prev_mm;
}
-void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+void unuse_temporary_mm(struct mm_struct *prev_mm)
{
lockdep_assert_irqs_disabled();
- switch_mm_irqs_off(NULL, prev_mm, current);
-
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+ cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
+
+ switch_mm_irqs_off(NULL, prev_mm, current);
/*
* Restore the breakpoints if they were disabled before the temporary mm
--
2.45.2
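
The reordering in patch 4/7 above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `toy_mm`, `toy_loaded_mm`, `toy_cpu` and the `toy_*()` functions are hypothetical, there is a single CPU, and mm_cpumask() is reduced to a plain bitmask. It shows why the cpumask bit has to be cleared before switching back: once the switch has happened, the "loaded mm" is the previous mm again, no longer the temporary one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: one CPU, cpumask reduced to a bitmask. */
struct toy_mm { unsigned long cpumask; };

static struct toy_mm *toy_loaded_mm;   /* stands in for cpu_tlbstate.loaded_mm */
static const unsigned int toy_cpu = 0; /* stands in for smp_processor_id() */

/* Load 'next' and mark this CPU in its mask. */
static void toy_switch_mm(struct toy_mm *next)
{
	next->cpumask |= 1UL << toy_cpu;
	toy_loaded_mm = next;
}

/* Switch to the temporary mm and hand back the one that was loaded. */
static struct toy_mm *toy_use_temporary_mm(struct toy_mm *temp)
{
	struct toy_mm *prev = toy_loaded_mm;

	toy_switch_mm(temp);
	return prev;
}

static void toy_unuse_temporary_mm(struct toy_mm *prev)
{
	/* Must run before the switch: afterwards loaded_mm is prev, not temp. */
	toy_loaded_mm->cpumask &= ~(1UL << toy_cpu);
	toy_switch_mm(prev);
}
```

With this shape the caller only has to keep the value returned by `toy_use_temporary_mm()`; it no longer needs to pass the temporary mm back in, which is the interface simplification the patch makes.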
* [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
` (3 preceding siblings ...)
2025-04-02 9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02 9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
2025-04-02 9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar, Ard Biesheuvel
From: Andy Lutomirski <luto@kernel.org>
EFI runtime services should use temporary MMs, but EFI runtime services
want IRQs on. Preemption must still be disabled in a temporary MM context.
At some point, the entire temporary MM mechanism should be moved out of
arch code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241119163035.758732080@infradead.org
---
arch/x86/mm/tlb.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e672508ca158..8e4818ce04a5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -978,18 +978,23 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
* that override the kernel memory protections (e.g., W^X), without exposing the
* temporary page-table mappings that are required for these write operations to
* other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
+ * mapping is torn down. Temporary mms can also be used for EFI runtime service
+ * calls or similar functionality.
*
- * Context: The temporary mm needs to be used exclusively by a single core. To
- * harden security IRQs must be disabled while the temporary mm is
- * loaded, thereby preventing interrupt handler bugs from overriding
- * the kernel memory protection.
+ * It is illegal to schedule while using a temporary mm -- the context switch
+ * code is unaware of the temporary mm and does not know how to context switch.
+ * Use a real (non-temporary) mm in a kernel thread if you need to sleep.
+ *
+ * Note: For sensitive memory writes, the temporary mm needs to be used
+ * exclusively by a single core, and IRQs should be disabled while the
+ * temporary mm is loaded, thereby preventing interrupt handler bugs from
+ * overriding the kernel memory protection.
*/
struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
{
struct mm_struct *prev_mm;
- lockdep_assert_irqs_disabled();
+ lockdep_assert_preemption_disabled();
/*
* Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1021,7 +1026,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
void unuse_temporary_mm(struct mm_struct *prev_mm)
{
- lockdep_assert_irqs_disabled();
+ lockdep_assert_preemption_disabled();
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
--
2.45.2
* [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
` (4 preceding siblings ...)
2025-04-02 9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02 9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Andy Lutomirski <luto@kernel.org>
This should be considerably more robust. It's also necessary for optimized
for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
lazy context would remove the lazy mm from mm_cpumask().
[ mingo: Merged it on top of x86/alternatives ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.877939834@infradead.org
---
arch/x86/platform/efi/efi_64.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ac57259a432b..a5d3496d32a5 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
*/
static void efi_enter_mm(void)
{
- efi_prev_mm = current->active_mm;
- current->active_mm = &efi_mm;
- switch_mm(efi_prev_mm, &efi_mm, NULL);
+ efi_prev_mm = use_temporary_mm(&efi_mm);
}
static void efi_leave_mm(void)
{
- current->active_mm = efi_prev_mm;
- switch_mm(&efi_mm, efi_prev_mm, NULL);
+ unuse_temporary_mm(efi_prev_mm);
}
void arch_efi_call_virt_setup(void)
--
2.45.2
* [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm()
2025-04-02 9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
` (5 preceding siblings ...)
2025-04-02 9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
@ 2025-04-02 9:45 ` Ingo Molnar
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02 9:45 UTC
To: linux-kernel
Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
Linus Torvalds, Andrew Morton, Ingo Molnar
From: Andy Lutomirski <luto@kernel.org>
We gain nothing by having the core code enable IRQs right before calling
activate_mm() only for us to turn them right back off again in switch_mm().
This will save a few cycles, so execve() should be blazingly fast with this
patch applied!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.985203915@infradead.org
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu_context.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 98bd4935280c..6b90d93fc40e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -149,6 +149,7 @@ config X86
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
+ select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_WATCHDOG
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 988c11792634..c511f8584ae4 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -190,7 +190,7 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
#define activate_mm(prev, next) \
do { \
paravirt_enter_mmap(next); \
- switch_mm((prev), (next), NULL); \
+ switch_mm_irqs_off((prev), (next), NULL); \
} while (0);
#ifdef CONFIG_X86_32
--
2.45.2
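
To make the "enable IRQs only to turn them right back off" point of patch 7/7 concrete, here is a toy userspace model that counts IRQ state transitions. It is a sketch, not kernel code: all `toy_*` names are hypothetical, the IRQ flag is a plain boolean, and the two branches are only loosely modeled on a caller that holds IRQs off before activate_mm(), with and without ARCH_WANT_IRQS_OFF_ACTIVATE_MM selected.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_irqs_on;
static unsigned int toy_irq_flips; /* counts every on/off transition */

static void toy_irq_set(bool on)
{
	if (toy_irqs_on != on) {
		toy_irqs_on = on;
		toy_irq_flips++;
	}
}

/* Models switch_mm(): save IRQ state, disable, do the work, restore. */
static void toy_switch_mm(void)
{
	bool saved = toy_irqs_on;

	toy_irq_set(false);
	/* ... address-space switch would happen here ... */
	toy_irq_set(saved);
}

/* Models switch_mm_irqs_off(): caller guarantees IRQs are already off. */
static void toy_switch_mm_irqs_off(void)
{
	assert(!toy_irqs_on);
	/* ... address-space switch would happen here ... */
}

/* Returns the number of IRQ state flips for one simulated activate_mm(). */
static unsigned int toy_exec_path(bool irqs_off_activate)
{
	toy_irqs_on = false; /* caller starts with IRQs disabled */
	toy_irq_flips = 0;

	if (!irqs_off_activate) {
		toy_irq_set(true);   /* core enables IRQs first ... */
		toy_switch_mm();     /* ... and switch_mm() turns them off again */
	} else {
		toy_switch_mm_irqs_off();
		toy_irq_set(true);   /* single enable after the switch */
	}
	return toy_irq_flips;
}
```

In this model the opt-in path reaches the same end state (IRQs on, new mm active) with fewer flag transitions, which is the saving the commit message refers to. The model says nothing about how many cycles that is worth on real hardware.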
* [tip: x86/alternatives] x86/mm: Opt-in to IRQs-off activate_mm()
2025-04-02 9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Andy Lutomirski
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC
To: linux-tip-commits
Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: af8967158f9ad759a93e8e7a933c10e7cbb01ba2
Gitweb: https://git.kernel.org/tip/af8967158f9ad759a93e8e7a933c10e7cbb01ba2
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 02 Apr 2025 11:45:40 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:08 +02:00
x86/mm: Opt-in to IRQs-off activate_mm()
We gain nothing by having the core code enable IRQs right before calling
activate_mm() only for us to turn them right back off again in switch_mm().
This will save a few cycles, so execve() should be blazingly fast with this
patch applied!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-8-mingo@kernel.org
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu_context.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b9f378..aeac63b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -153,6 +153,7 @@ config X86
select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
+ select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_WATCHDOG
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 988c117..c511f85 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -190,7 +190,7 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
#define activate_mm(prev, next) \
do { \
paravirt_enter_mmap(next); \
- switch_mm((prev), (next), NULL); \
+ switch_mm_irqs_off((prev), (next), NULL); \
} while (0);
#ifdef CONFIG_X86_32
* [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-02 9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Andy Lutomirski
2025-04-17 14:17 ` Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC
To: linux-tip-commits
Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: e7021e2fe0b4335523d3f6e2221000bdfc633b62
Gitweb: https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 02 Apr 2025 11:45:39 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00
x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
This should be considerably more robust. It's also necessary for optimized
for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
lazy context would remove the lazy mm from mm_cpumask().
[ mingo: Merged it on top of x86/alternatives ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
---
arch/x86/platform/efi/efi_64.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ac57259..a5d3496 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
*/
static void efi_enter_mm(void)
{
- efi_prev_mm = current->active_mm;
- current->active_mm = &efi_mm;
- switch_mm(efi_prev_mm, &efi_mm, NULL);
+ efi_prev_mm = use_temporary_mm(&efi_mm);
}
static void efi_leave_mm(void)
{
- current->active_mm = efi_prev_mm;
- switch_mm(&efi_mm, efi_prev_mm, NULL);
+ unuse_temporary_mm(efi_prev_mm);
}
void arch_efi_call_virt_setup(void)
* [tip: x86/alternatives] x86/mm: Allow temporary MMs when IRQs are on
2025-04-02 9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Andy Lutomirski
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton,
Ard Biesheuvel, x86, linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: 58f8ffa917669a0c8c027e24d5349f0b488f8181
Gitweb: https://git.kernel.org/tip/58f8ffa917669a0c8c027e24d5349f0b488f8181
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 02 Apr 2025 11:45:38 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:00 +02:00
x86/mm: Allow temporary MMs when IRQs are on
EFI runtime services should use temporary MMs, but EFI runtime services
want IRQs on. Preemption must still be disabled in a temporary MM context.
At some point, the entire temporary MM mechanism should be moved out of
arch code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-6-mingo@kernel.org
---
arch/x86/mm/tlb.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 38fdcf8..c9b87e5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -977,18 +977,23 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
* that override the kernel memory protections (e.g., W^X), without exposing the
* temporary page-table mappings that are required for these write operations to
* other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
+ * mapping is torn down. Temporary mms can also be used for EFI runtime service
+ * calls or similar functionality.
*
- * Context: The temporary mm needs to be used exclusively by a single core. To
- * harden security IRQs must be disabled while the temporary mm is
- * loaded, thereby preventing interrupt handler bugs from overriding
- * the kernel memory protection.
+ * It is illegal to schedule while using a temporary mm -- the context switch
+ * code is unaware of the temporary mm and does not know how to context switch.
+ * Use a real (non-temporary) mm in a kernel thread if you need to sleep.
+ *
+ * Note: For sensitive memory writes, the temporary mm needs to be used
+ * exclusively by a single core, and IRQs should be disabled while the
+ * temporary mm is loaded, thereby preventing interrupt handler bugs from
+ * overriding the kernel memory protection.
*/
struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
{
struct mm_struct *prev_mm;
- lockdep_assert_irqs_disabled();
+ lockdep_assert_preemption_disabled();
/*
* Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1020,7 +1025,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
void unuse_temporary_mm(struct mm_struct *prev_mm)
{
- lockdep_assert_irqs_disabled();
+ lockdep_assert_preemption_disabled();
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [tip: x86/alternatives] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
2025-04-02 9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-12 18:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Andy Lutomirski,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: 4873f494bbe4670f353a9b76ce44e6028c811cbb
Gitweb: https://git.kernel.org/tip/4873f494bbe4670f353a9b76ce44e6028c811cbb
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 02 Apr 2025 11:45:37 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:56 +02:00
x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
Now that unuse_temporary_mm() lives in tlb.c it can access
cpu_tlbstate.loaded_mm.
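[ Illustrative sketch, not part of the patch: a minimal userspace C model
of why the 'mm' argument can be dropped. All model_* names are
hypothetical; model_loaded_mm[] stands in for cpu_tlbstate.loaded_mm and
a plain bitmask stands in for mm_cpumask(). What it tries to show is the
ordering: the CPU bit is cleared in the still-loaded temporary mm before
switching back to prev_mm. ]

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, heavily simplified mm: only a bitmask of CPUs. */
struct model_mm {
	unsigned long cpumask;
};

/* Per-CPU record of the loaded mm (models cpu_tlbstate.loaded_mm). */
static struct model_mm *model_loaded_mm[MODEL_NR_CPUS];

/* Load 'next' on 'cpu' and set the CPU's bit in its mask. */
static void model_switch_mm(int cpu, struct model_mm *next)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(next != NULL);

	next->cpumask |= 1UL << cpu;
	model_loaded_mm[cpu] = next;
}

/* Switch to 'temp' and hand back whatever was loaded before. */
static struct model_mm *model_use_temporary_mm(int cpu, struct model_mm *temp)
{
	struct model_mm *prev;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	prev = model_loaded_mm[cpu];
	model_switch_mm(cpu, temp);
	return prev;
}

/* No 'mm' argument needed: the temporary mm is whatever is loaded now. */
static void model_unuse_temporary_mm(int cpu, struct model_mm *prev)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	/* Must happen before the switch, while loaded_mm is still the temp mm. */
	model_loaded_mm[cpu]->cpumask &= ~(1UL << cpu);
	model_switch_mm(cpu, prev);
}
```

[ In this model, swapping the two statements in model_unuse_temporary_mm()
would clear the bit in prev_mm instead, which is presumably why the patch
also moves cpumask_clear_cpu() ahead of switch_mm_irqs_off(). ]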
[ mingo: Merged it on top of x86/alternatives ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-5-mingo@kernel.org
---
arch/x86/include/asm/mmu_context.h | 2 +-
arch/x86/kernel/alternative.c | 2 +-
arch/x86/mm/tlb.c | 8 ++++----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b103e17..988c117 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -273,6 +273,6 @@ unsigned long __get_current_cr3_fast(void);
#include <asm-generic/mmu_context.h>
extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
-extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+extern void unuse_temporary_mm(struct mm_struct *prev_mm);
#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index bdbdfa0..ddbc303 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2211,7 +2211,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
* instruction that already allows the core to see the updated version.
* Xen-PV is assumed to serialize execution in a similar manner.
*/
- unuse_temporary_mm(text_poke_mm, prev_mm);
+ unuse_temporary_mm(prev_mm);
/*
* Flushing the TLB might involve IPIs, which would require enabled
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f3da20b..38fdcf8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1018,14 +1018,14 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
return prev_mm;
}
-void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+void unuse_temporary_mm(struct mm_struct *prev_mm)
{
lockdep_assert_irqs_disabled();
- switch_mm_irqs_off(NULL, prev_mm, current);
-
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+ cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
+
+ switch_mm_irqs_off(NULL, prev_mm, current);
/*
* Restore the breakpoints if they were disabled before the temporary mm
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [tip: x86/alternatives] x86/mm: Make use_/unuse_temporary_mm() non-static
2025-04-02 9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Andy Lutomirski
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: d376972c9825ac4e8ad74872ee0730a5b4292e44
Gitweb: https://git.kernel.org/tip/d376972c9825ac4e8ad74872ee0730a5b4292e44
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 02 Apr 2025 11:45:36 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:52 +02:00
x86/mm: Make use_/unuse_temporary_mm() non-static
This prepares them for use outside of the alternative machinery.
The code is unchanged.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-4-mingo@kernel.org
---
arch/x86/include/asm/mmu_context.h | 3 +-
arch/x86/kernel/alternative.c | 64 +-----------------------------
arch/x86/mm/tlb.c | 64 +++++++++++++++++++++++++++++-
3 files changed, 67 insertions(+), 64 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2398058..b103e17 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -272,4 +272,7 @@ unsigned long __get_current_cr3_fast(void);
#include <asm-generic/mmu_context.h>
+extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
+extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+
#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 95053e8..bdbdfa0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2111,73 +2111,9 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
}
}
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- * harden security IRQs must be disabled while the temporary mm is
- * loaded, thereby preventing interrupt handler bugs from overriding
- * the kernel memory protection.
- */
-static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
-{
- struct mm_struct *prev_mm;
-
- lockdep_assert_irqs_disabled();
-
- /*
- * Make sure not to be in TLB lazy mode, as otherwise we'll end up
- * with a stale address space WITHOUT being in lazy mode after
- * restoring the previous mm.
- */
- if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
- leave_mm();
-
- prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
- switch_mm_irqs_off(NULL, temp_mm, current);
-
- /*
- * If breakpoints are enabled, disable them while the temporary mm is
- * used. Userspace might set up watchpoints on addresses that are used
- * in the temporary mm, which would lead to wrong signals being sent or
- * crashes.
- *
- * Note that breakpoints are not disabled selectively, which also causes
- * kernel breakpoints (e.g., perf's) to be disabled. This might be
- * undesirable, but still seems reasonable as the code that runs in the
- * temporary mm should be short.
- */
- if (hw_breakpoint_active())
- hw_breakpoint_disable();
-
- return prev_mm;
-}
-
__ro_after_init struct mm_struct *text_poke_mm;
__ro_after_init unsigned long text_poke_mm_addr;
-static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
-{
- lockdep_assert_irqs_disabled();
-
- switch_mm_irqs_off(NULL, prev_mm, current);
-
- /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
-
- /*
- * Restore the breakpoints if they were disabled before the temporary mm
- * was loaded.
- */
- if (hw_breakpoint_active())
- hw_breakpoint_restore();
-}
-
static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
memcpy(dst, src, len);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e459d97..f3da20b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -972,6 +972,70 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
}
/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ * harden security IRQs must be disabled while the temporary mm is
+ * loaded, thereby preventing interrupt handler bugs from overriding
+ * the kernel memory protection.
+ */
+struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
+{
+ struct mm_struct *prev_mm;
+
+ lockdep_assert_irqs_disabled();
+
+ /*
+ * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+ * with a stale address space WITHOUT being in lazy mode after
+ * restoring the previous mm.
+ */
+ if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+ leave_mm();
+
+ prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+ switch_mm_irqs_off(NULL, temp_mm, current);
+
+ /*
+ * If breakpoints are enabled, disable them while the temporary mm is
+ * used. Userspace might set up watchpoints on addresses that are used
+ * in the temporary mm, which would lead to wrong signals being sent or
+ * crashes.
+ *
+ * Note that breakpoints are not disabled selectively, which also causes
+ * kernel breakpoints (e.g., perf's) to be disabled. This might be
+ * undesirable, but still seems reasonable as the code that runs in the
+ * temporary mm should be short.
+ */
+ if (hw_breakpoint_active())
+ hw_breakpoint_disable();
+
+ return prev_mm;
+}
+
+void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+{
+ lockdep_assert_irqs_disabled();
+
+ switch_mm_irqs_off(NULL, prev_mm, current);
+
+ /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
+ cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+
+ /*
+ * Restore the breakpoints if they were disabled before the temporary mm
+ * was loaded.
+ */
+ if (hw_breakpoint_active())
+ hw_breakpoint_restore();
+}
+
+/*
* Call this when reinitializing a CPU. It fixes the following potential
* problems:
*
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [tip: x86/alternatives] x86/events, x86/insn-eval: Remove incorrect current->active_mm references
2025-04-02 9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Andy Lutomirski
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: 81e3cbdef230fd9adfa8569044b07290afd66708
Gitweb: https://git.kernel.org/tip/81e3cbdef230fd9adfa8569044b07290afd66708
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 02 Apr 2025 11:45:35 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:46 +02:00
x86/events, x86/insn-eval: Remove incorrect current->active_mm references
When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy. Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.
Clean up the code. If nmi_uaccess_okay() says we don't have a valid
context, just fail. Otherwise use current->mm.
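[ Illustrative sketch, not part of the patch: the "fail early, otherwise
use current->mm" flow can be modelled in plain C roughly as below. The
model_* types and the uaccess_okay flag are hypothetical stand-ins for
nmi_uaccess_okay() and the LDT; locking and memory ordering are
deliberately left out. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LDT: a table of segment base addresses. */
struct model_ldt {
	unsigned int nr_entries;
	const unsigned long *bases;
};

/* Hypothetical stand-in for the current task's context. */
struct model_ctx {
	bool uaccess_okay;		/* models the result of nmi_uaccess_okay() */
	const struct model_ldt *ldt;	/* models current->mm->context.ldt */
};

/* Return the base of LDT entry 'idx', or 0 if it cannot be looked up safely. */
static unsigned long model_segment_base(const struct model_ctx *ctx,
					unsigned int idx)
{
	const struct model_ldt *ldt;

	assert(ctx != NULL);

	/* No valid user context: do not touch the LDT at all. */
	if (!ctx->uaccess_okay)
		return 0;

	ldt = ctx->ldt;
	if (!ldt || idx >= ldt->nr_entries)
		return 0;

	return ldt->bases[idx];
}
```

[ The check comes first on purpose: in the model, as in the patch, the LDT
pointer is never read unless the context has been declared valid. ]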
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-3-mingo@kernel.org
---
arch/x86/events/core.c | 9 ++++++++-
arch/x86/lib/insn-eval.c | 13 ++++++++++---
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6866cc5..95118b5 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2803,8 +2803,15 @@ static unsigned long get_segment_base(unsigned int segment)
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
+ /*
+ * If we're not in a valid context with a real (not just lazy)
+ * user mm, then don't even try.
+ */
+ if (!nmi_uaccess_okay())
+ return 0;
+
/* IRQs are off, so this synchronizes with smp_store_release */
- ldt = READ_ONCE(current->active_mm->context.ldt);
+ ldt = smp_load_acquire(&current->mm->context.ldt);
if (!ldt || idx >= ldt->nr_entries)
return 0;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 98631c0..f786401 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -631,14 +631,21 @@ static bool get_desc(struct desc_struct *out, unsigned short sel)
/* Bits [15:3] contain the index of the desired entry. */
sel >>= 3;
- mutex_lock(&current->active_mm->context.lock);
- ldt = current->active_mm->context.ldt;
+ /*
+ * If we're not in a valid context with a real (not just lazy)
+ * user mm, then don't even try.
+ */
+ if (!nmi_uaccess_okay())
+ return false;
+
+ mutex_lock(&current->mm->context.lock);
+ ldt = current->mm->context.ldt;
if (ldt && sel < ldt->nr_entries) {
*out = ldt->entries[sel];
success = true;
}
- mutex_unlock(&current->active_mm->context.lock);
+ mutex_unlock(&current->mm->context.lock);
return success;
}
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [tip: x86/alternatives] x86/mm: Add 'mm' argument to unuse_temporary_mm()
2025-04-02 9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
@ 2025-04-12 18:46 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-12 18:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Andy Lutomirski,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: 0812e096cff0fd58d88a21a413fba56c0e6c3caa
Gitweb: https://git.kernel.org/tip/0812e096cff0fd58d88a21a413fba56c0e6c3caa
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 02 Apr 2025 11:45:34 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:37 +02:00
x86/mm: Add 'mm' argument to unuse_temporary_mm()
In commit 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
unuse_temporary_mm() grew the assumption that it gets used on
poking_mm exclusively. While this is currently true, let's not hard-code
this assumption.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-2-mingo@kernel.org
---
arch/x86/kernel/alternative.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f785d23..95053e8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2161,14 +2161,14 @@ static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
__ro_after_init struct mm_struct *text_poke_mm;
__ro_after_init unsigned long text_poke_mm_addr;
-static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
+static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
{
lockdep_assert_irqs_disabled();
switch_mm_irqs_off(NULL, prev_mm, current);
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
- cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
+ cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
/*
* Restore the breakpoints if they were disabled before the temporary mm
@@ -2275,7 +2275,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
* instruction that already allows the core to see the updated version.
* Xen-PV is assumed to serialize execution in a similar manner.
*/
- unuse_temporary_mm(prev_mm);
+ unuse_temporary_mm(text_poke_mm, prev_mm);
/*
* Flushing the TLB might involve IPIs, which would require enabled
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-12 18:46 ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
@ 2025-04-17 14:17 ` Borislav Petkov
2025-04-18 9:50 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2025-04-17 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: linux-tip-commits, Andy Lutomirski, Peter Zijlstra (Intel),
Ingo Molnar, Rik van Riel, H. Peter Anvin, Linus Torvalds,
Andrew Morton, x86
On Sat, Apr 12, 2025 at 06:46:48PM -0000, tip-bot2 for Andy Lutomirski wrote:
> The following commit has been merged into the x86/alternatives branch of tip:
>
> Commit-ID: e7021e2fe0b4335523d3f6e2221000bdfc633b62
> Gitweb: https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
> Author: Andy Lutomirski <luto@kernel.org>
> AuthorDate: Wed, 02 Apr 2025 11:45:39 +02:00
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00
>
> x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
>
> This should be considerably more robust. It's also necessary for optimized
> for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
> lazy context would remove the lazy mm from mm_cpumask().
>
> [ mingo: Merged it on top of x86/alternatives ]
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
> ---
> arch/x86/platform/efi/efi_64.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index ac57259..a5d3496 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
> */
> static void efi_enter_mm(void)
> {
> - efi_prev_mm = current->active_mm;
> - current->active_mm = &efi_mm;
> - switch_mm(efi_prev_mm, &efi_mm, NULL);
> + efi_prev_mm = use_temporary_mm(&efi_mm);
> }
>
> static void efi_leave_mm(void)
> {
> - current->active_mm = efi_prev_mm;
> - switch_mm(&efi_mm, efi_prev_mm, NULL);
> + unuse_temporary_mm(efi_prev_mm);
> }
>
> void arch_efi_call_virt_setup(void)
mingo thinks this one causes this:
[ 0.119491] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.119498] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[ 0.137368] Freeing SMP alternatives memory: 40K
[ 0.137381] pid_max: default: 32768 minimum: 301
[ 0.137496] ------------[ cut here ]------------
[ 0.137502] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795 switch_mm_irqs_off+0x3d3/0x460
[ 0.137516] Modules linked in:
[ 0.137526] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc2+ #3 PREEMPT(voluntary)
[ 0.137537] Hardware name: HP HP ProBook 635 Aero G7 Notebook PC/8830, BIOS S84 Ver. 01.05.00 05/14/2021
[ 0.137548] RIP: 0010:switch_mm_irqs_off+0x3d3/0x460
[ 0.137556] Code: 28 00 65 ff 0d 3e c9 db 01 0f 85 88 fd ff ff 0f 1f 44 00 00 e9 7e fd ff ff be 00 01 00 00 31 ff e8 02 cb fb ff e9 be fd ff ff <0f> 0b e9 6c fc ff ff 9c 58 f6 c4 02 0f 84 c4 fd ff ff e8 46 3b 59
[ 0.137575] RSP: 0000:ffffffffb6a03e00 EFLAGS: 00010202
[ 0.137583] RAX: 0000000000000246 RBX: ffffffffb6c5fd40 RCX: 0000000100238000
[ 0.137591] RDX: ffffffffb6a149c0 RSI: ffffffffb6c5fd40 RDI: 0000000000000000
[ 0.137599] RBP: ffffffffb6bbcdc0 R08: 00000000b357d000 R09: 0000000000000000
[ 0.137607] R10: 000000010ab06067 R11: 0000000000000000 R12: ffffffffb6a149c0
[ 0.137616] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.137624] FS: 0000000000000000(0000) GS:ffff9b6093385000(0000) knlGS:0000000000000000
[ 0.137633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.137640] CR2: ffff9b6048601000 CR3: 0000000106c26000 CR4: 0000000000350ef0
[ 0.137648] Call Trace:
[ 0.137653] <TASK>
[ 0.137658] use_temporary_mm+0x55/0x90
[ 0.137666] efi_set_virtual_address_map+0xfd/0x1b0
[ 0.137676] efi_enter_virtual_mode+0x3e3/0x450
[ 0.137685] start_kernel+0x6b7/0x720
[ 0.137693] x86_64_start_reservations+0x24/0x30
[ 0.137700] x86_64_start_kernel+0x7a/0x80
[ 0.137706] common_startup_64+0x13e/0x141
[ 0.137717] </TASK>
[ 0.137720] irq event stamp: 128439
[ 0.137725] hardirqs last enabled at (128447): [<ffffffffb579dcd2>] __up_console_sem+0x52/0x60
[ 0.137737] hardirqs last disabled at (128454): [<ffffffffb579dcb7>] __up_console_sem+0x37/0x60
[ 0.137748] softirqs last enabled at (105766): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[ 0.137759] softirqs last disabled at (105759): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[ 0.137770] ---[ end trace 0000000000000000 ]---
[ 0.137777] ------------[ cut here ]------------
[ 0.137782] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:453 cr4_update_irqsoff+0x45/0x70
[ 0.137794] Modules linked in:
[ 0.137800] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.15.0-rc2+ #3 PREEMPT(voluntary)
[ 0.137813] Tainted: [W]=WARN
[ 0.137817] Hardware name: HP HP ProBook 635 Aero G7 Notebook PC/8830, BIOS S84 Ver. 01.05.00 05/14/2021
[ 0.137827] RIP: 0010:cr4_update_irqsoff+0x45/0x70
[ 0.137834] Code: 0b 65 8b 0d d5 3d e0 01 85 c9 74 13 48 f7 d7 48 21 d7 48 09 c7 48 39 fa 75 20 e9 f6 8d b8 00 65 8b 0d 9b 3a e0 01 85 c9 74 e2 <0f> 0b 48 f7 d7 48 21 d7 48 09 c7 48 39 fa 74 e0 65 48 89 3d eb 6a
[ 0.137853] RSP: 0000:ffffffffb6a03df8 EFLAGS: 00010202
[ 0.137860] RAX: 0000000000000000 RBX: ffffffffb6c5fd40 RCX: 0000000000000001
[ 0.137868] RDX: 0000000000350ef0 RSI: 0000000000000100 RDI: 0000000000000100
[ 0.137876] RBP: ffffffffb6bbcdc0 R08: 00000000b357d000 R09: 0000000000000000
[ 0.137884] R10: 000000010ab06067 R11: 0000000000000000 R12: 000000010022e000
[ 0.137892] R13: 0000000000010000 R14: 0000000000000000 R15: 0000000000000000
[ 0.137900] FS: 0000000000000000(0000) GS:ffff9b6093385000(0000) knlGS:0000000000000000
[ 0.137909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.137916] CR2: ffff9b6048601000 CR3: 000000010022e000 CR4: 0000000000350ef0
[ 0.137924] Call Trace:
[ 0.137928] <TASK>
[ 0.137932] switch_mm_irqs_off+0x3ce/0x460
[ 0.137940] use_temporary_mm+0x55/0x90
[ 0.137946] efi_set_virtual_address_map+0xfd/0x1b0
[ 0.137956] efi_enter_virtual_mode+0x3e3/0x450
[ 0.137964] start_kernel+0x6b7/0x720
[ 0.137971] x86_64_start_reservations+0x24/0x30
[ 0.137978] x86_64_start_kernel+0x7a/0x80
[ 0.137984] common_startup_64+0x13e/0x141
[ 0.137994] </TASK>
[ 0.137998] irq event stamp: 128723
[ 0.138002] hardirqs last enabled at (128731): [<ffffffffb579dcd2>] __up_console_sem+0x52/0x60
[ 0.138013] hardirqs last disabled at (128738): [<ffffffffb579dcb7>] __up_console_sem+0x37/0x60
[ 0.138024] softirqs last enabled at (105766): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[ 0.138034] softirqs last disabled at (105759): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[ 0.138045] ---[ end trace 0000000000000000 ]---
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-17 14:17 ` Borislav Petkov
@ 2025-04-18 9:50 ` Peter Zijlstra
2025-04-18 11:43 ` Borislav Petkov
2025-04-18 12:48 ` [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state tip-bot2 for Peter Zijlstra
0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2025-04-18 9:50 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-kernel, linux-tip-commits, Andy Lutomirski, Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86
On Thu, Apr 17, 2025 at 04:17:51PM +0200, Borislav Petkov wrote:
> On Sat, Apr 12, 2025 at 06:46:48PM -0000, tip-bot2 for Andy Lutomirski wrote:
> > The following commit has been merged into the x86/alternatives branch of tip:
> >
> > Commit-ID: e7021e2fe0b4335523d3f6e2221000bdfc633b62
> > Gitweb: https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
> > Author: Andy Lutomirski <luto@kernel.org>
> > AuthorDate: Wed, 02 Apr 2025 11:45:39 +02:00
> > Committer: Ingo Molnar <mingo@kernel.org>
> > CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00
> >
> > x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
> >
> > This should be considerably more robust. It's also necessary for optimized
> > for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
> > lazy context would remove the lazy mm from mm_cpumask().
> >
> > [ mingo: Merged it on top of x86/alternatives ]
> >
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
> > ---
> > arch/x86/platform/efi/efi_64.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> > index ac57259..a5d3496 100644
> > --- a/arch/x86/platform/efi/efi_64.c
> > +++ b/arch/x86/platform/efi/efi_64.c
> > @@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
> > */
> > static void efi_enter_mm(void)
> > {
> > - efi_prev_mm = current->active_mm;
> > - current->active_mm = &efi_mm;
> > - switch_mm(efi_prev_mm, &efi_mm, NULL);
> > + efi_prev_mm = use_temporary_mm(&efi_mm);
> > }
> >
> > static void efi_leave_mm(void)
> > {
> > - current->active_mm = efi_prev_mm;
> > - switch_mm(&efi_mm, efi_prev_mm, NULL);
> > + unuse_temporary_mm(efi_prev_mm);
> > }
> >
> > void arch_efi_call_virt_setup(void)
>
> mingo thinks this one causes this:
>
> [ 0.119491] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.119498] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
> [ 0.137368] Freeing SMP alternatives memory: 40K
> [ 0.137381] pid_max: default: 32768 minimum: 301
> [ 0.137496] ------------[ cut here ]------------
> [ 0.137502] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795 switch_mm_irqs_off+0x3d3/0x460
> [ 0.137516] Modules linked in:
Ah yes :-( Something like so perhaps..
---
Subject: x86/mm: Fix {,un}use_temporary_mm() IRQ state
As the function switch_mm_irqs_off() implies, it ought to be called with
IRQs *off*. Commit 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs
are on") caused this to not be the case for EFI.
Ensure IRQs are off where it matters.
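[ Illustrative sketch, not part of the patch: a userspace C model of the
scope-based pattern that guard(irqsave)() relies on. It assumes the
GCC/Clang cleanup attribute; every model_* name is hypothetical and a
plain bool stands in for the CPU interrupt flag. It is meant to show the
save/disable/restore shape only, not the kernel's actual guard
implementation. ]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool model_irqs_enabled = true;

/* State saved by one guard instance. */
struct model_irq_guard {
	bool was_enabled;
};

/* Save the current state, then "disable interrupts". */
static struct model_irq_guard model_irq_guard_enter(void)
{
	struct model_irq_guard g = { .was_enabled = model_irqs_enabled };

	model_irqs_enabled = false;
	return g;
}

/* Called automatically when the guard variable leaves scope. */
static void model_irq_guard_exit(struct model_irq_guard *g)
{
	model_irqs_enabled = g->was_enabled;
}

/* GCC/Clang extension: run model_irq_guard_exit() at end of scope. */
#define MODEL_GUARD_IRQSAVE()						\
	struct model_irq_guard model_guard				\
		__attribute__((cleanup(model_irq_guard_exit), unused)) = \
		model_irq_guard_enter()

/* Models a callee that, like switch_mm_irqs_off(), expects IRQs off. */
static void model_switch_irqs_off(void)
{
	assert(!model_irqs_enabled);
}

/* The caller may have IRQs on or off; the guard covers both cases. */
static void model_use_temporary_mm(void)
{
	MODEL_GUARD_IRQSAVE();

	model_switch_irqs_off();
}
```

[ Because the previous state is saved and restored rather than forced on,
the same function works for a caller with IRQs enabled (the EFI case) and
for one that already has them disabled (the text poking case). ]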
Fixes: 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 92bde0d6205a..1451e022129a 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -991,6 +991,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
struct mm_struct *prev_mm;
lockdep_assert_preemption_disabled();
+ guard(irqsave)();
/*
* Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1023,6 +1024,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
void unuse_temporary_mm(struct mm_struct *prev_mm)
{
lockdep_assert_preemption_disabled();
+ guard(irqsave)();
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
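
For readers unfamiliar with the guard(irqsave)() lines added above: the guard saves the current IRQ state and disables interrupts at the point of declaration, then restores the saved state automatically when the enclosing scope ends. Below is a minimal user-space sketch of that pattern, assuming a GCC- or Clang-compatible compiler with the cleanup attribute. Every sim_* name and the SIM_GUARD_IRQSAVE() macro is hypothetical and exists only for illustration; none of it is kernel API, and the "IRQ flag" is a plain variable rather than real CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool sim_irqs_enabled = true;

/* Counts how often the simulated switch ran with "IRQs" enabled. */
static int sim_warnings;

/* Stand-in for local_irq_save(): remember the old state, then disable. */
static bool sim_irq_save(void)
{
	bool was_enabled = sim_irqs_enabled;

	sim_irqs_enabled = false;
	return was_enabled;
}

/* Stand-in for local_irq_restore(); invoked by the cleanup attribute. */
static void sim_irq_restore(bool *was_enabled)
{
	sim_irqs_enabled = *was_enabled;
}

/* Scope-based guard: save + disable now, restore when the scope ends. */
#define SIM_GUARD_IRQSAVE() \
	bool sim_guard_flags __attribute__((cleanup(sim_irq_restore))) = \
		sim_irq_save()

/* Stand-in for switch_mm_irqs_off(): complains if "IRQs" are still on. */
static void sim_switch_mm_irqs_off(void)
{
	if (sim_irqs_enabled) {
		sim_warnings++;
		fprintf(stderr, "WARNING: mm switch with IRQs enabled\n");
	}
}

/* Shape of the code before the fix: nothing disables "IRQs". */
static void sim_use_temporary_mm_unguarded(void)
{
	sim_switch_mm_irqs_off();
}

/* Shape of the code after the fix: the guard covers the whole function. */
static void sim_use_temporary_mm_guarded(void)
{
	SIM_GUARD_IRQSAVE();

	sim_switch_mm_irqs_off();
}	/* sim_irq_restore() runs here and restores the saved state */
```

In this sketch, calling sim_use_temporary_mm_unguarded() with the simulated flag set triggers the warning, while sim_use_temporary_mm_guarded() does not and leaves the flag exactly as it found it. Because the guard restores the saved state rather than unconditionally re-enabling, the same function behaves correctly for callers that already run with IRQs off.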
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-18 9:50 ` Peter Zijlstra
@ 2025-04-18 11:43 ` Borislav Petkov
2025-04-18 12:37 ` Ingo Molnar
2025-04-18 12:48 ` [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state tip-bot2 for Peter Zijlstra
1 sibling, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2025-04-18 11:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-tip-commits, Andy Lutomirski, Ingo Molnar,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86
On Fri, Apr 18, 2025 at 11:50:34AM +0200, Peter Zijlstra wrote:
> Ah yes :-( Something like so perhaps..
Thanks, that does it.
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
2025-04-18 11:43 ` Borislav Petkov
@ 2025-04-18 12:37 ` Ingo Molnar
0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2025-04-18 12:37 UTC (permalink / raw)
To: Borislav Petkov
Cc: Peter Zijlstra, linux-kernel, linux-tip-commits, Andy Lutomirski,
Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86
* Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Apr 18, 2025 at 11:50:34AM +0200, Peter Zijlstra wrote:
> > Ah yes :-( Something like so perhaps..
>
> Thanks, that does it.
>
> Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
> Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Applied to tip:x86/alternatives, thanks guys!
Ingo
^ permalink raw reply [flat|nested] 20+ messages in thread
* [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state
2025-04-18 9:50 ` Peter Zijlstra
2025-04-18 11:43 ` Borislav Petkov
@ 2025-04-18 12:48 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-18 12:48 UTC (permalink / raw)
To: linux-tip-commits
Cc: Borislav Petkov (AMD), Peter Zijlstra (Intel), Ingo Molnar,
H. Peter Anvin, Andrew Morton, Andy Lutomirski, Linus Torvalds,
Rik van Riel, x86, linux-kernel
The following commit has been merged into the x86/alternatives branch of tip:
Commit-ID: aef1d0209ddf127a8069aca5fa3a062be4136b76
Gitweb: https://git.kernel.org/tip/aef1d0209ddf127a8069aca5fa3a062be4136b76
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 18 Apr 2025 11:50:34 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 18 Apr 2025 14:36:18 +02:00
x86/mm: Fix {,un}use_temporary_mm() IRQ state
As the function switch_mm_irqs_off() implies, it ought to be called with
IRQs *off*. Commit 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs
are on") caused this to not be the case for EFI.
Ensure IRQs are off where it matters.
Fixes: 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on")
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20250418095034.GR38216@noisy.programming.kicks-ass.net
---
arch/x86/mm/tlb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 79c124f..39761c7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -986,6 +986,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
struct mm_struct *prev_mm;
lockdep_assert_preemption_disabled();
+ guard(irqsave)();
/*
* Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1018,6 +1019,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
void unuse_temporary_mm(struct mm_struct *prev_mm)
{
lockdep_assert_preemption_disabled();
+ guard(irqsave)();
/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
^ permalink raw reply related [flat|nested] 20+ messages in thread