* [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more
@ 2025-04-02  9:45 Ingo Molnar
  2025-04-02  9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

These are a couple of cleanups and micro-optimizations by
Andy and Peter around the x86 use_/unuse_temporary_mm() APIs,
which were posted back in November, and which I merged on top
of the WIP.x86/alternatives tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/mm

Thanks,

	Ingo

===============>

Andy Lutomirski (5):
  x86/events, x86/insn-eval: Remove incorrect current->active_mm references
  x86/mm: Make use_/unuse_temporary_mm() non-static
  x86/mm: Allow temporary MMs when IRQs are on
  x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  x86/mm: Opt-in to IRQs-off activate_mm()

Peter Zijlstra (2):
  x86/mm: Add 'mm' argument to unuse_temporary_mm()
  x86/mm: Remove 'mm' argument from unuse_temporary_mm() again

 arch/x86/Kconfig                   |  1 +
 arch/x86/events/core.c             |  9 ++++-
 arch/x86/include/asm/mmu_context.h |  5 ++-
 arch/x86/kernel/alternative.c      | 64 -----------------------------------
 arch/x86/lib/insn-eval.c           | 13 +++++--
 arch/x86/mm/tlb.c                  | 69 ++++++++++++++++++++++++++++++++++++++
 arch/x86/platform/efi/efi_64.c     |  7 ++--
 7 files changed, 94 insertions(+), 74 deletions(-)

-- 
2.45.2



* [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm()
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
  2025-04-02  9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Peter Zijlstra <peterz@infradead.org>

In commit 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
unuse_temporary_mm() grew the assumption that it gets used on
poking_mm exclusively. While this is currently true, let's not hard-code
this assumption.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.322525475@infradead.org
---
 arch/x86/kernel/alternative.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5b1a6252a4b9..cfffcb80f564 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2161,14 +2161,14 @@ static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
+static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
 	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
 
 	/*
 	 * Restore the breakpoints if they were disabled before the temporary mm
@@ -2275,7 +2275,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(prev_mm);
+	unuse_temporary_mm(text_poke_mm, prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled
-- 
2.45.2



* [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
  2025-04-02  9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
  2025-04-02  9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy.  Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.

Clean up the code.  If nmi_uaccess_okay() says we don't have a valid
context, just fail.  Otherwise use current->mm.
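
The guard described above can be sketched outside the kernel. This is a
minimal userspace model, not the kernel implementation: struct task,
struct cpu_state, uaccess_okay() and ldt_entries_for() are hypothetical
stand-ins, and the model assumes that nmi_uaccess_okay() essentially
checks that current->mm is the mm actually loaded on this CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; names are illustrative. */
struct ldt { unsigned int nr_entries; };
struct mm { struct ldt *ldt; };

struct task {
	struct mm *mm;        /* real user mm, NULL for kernel threads */
	struct mm *active_mm; /* may be a lazily borrowed mm */
};

struct cpu_state {
	struct mm *loaded_mm; /* what CR3/LDTR currently reference */
};

/* Model of the check: only trust current->mm if it is really loaded. */
bool uaccess_okay(const struct task *cur, const struct cpu_state *cpu)
{
	assert(cpu->loaded_mm != NULL);
	return cur->mm != NULL && cur->mm == cpu->loaded_mm;
}

/* Returns the number of usable LDT entries, or 0 when the lookup must fail. */
unsigned int ldt_entries_for(const struct task *cur,
			     const struct cpu_state *cpu)
{
	if (!uaccess_okay(cur, cpu))
		return 0;
	if (!cur->mm->ldt)
		return 0;
	return cur->mm->ldt->nr_entries;
}
```

Note that active_mm is never consulted: a kernel thread that lazily
borrows a user mm has a non-NULL active_mm, yet the lookup still fails
because its mm is NULL.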

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.433533770@infradead.org
---
 arch/x86/events/core.c   |  9 ++++++++-
 arch/x86/lib/insn-eval.c | 13 ++++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6866cc5acb0b..95118b52b606 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2803,8 +2803,15 @@ static unsigned long get_segment_base(unsigned int segment)
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return 0;
+
 		/* IRQs are off, so this synchronizes with smp_store_release */
-		ldt = READ_ONCE(current->active_mm->context.ldt);
+		ldt = smp_load_acquire(&current->mm->context.ldt);
 		if (!ldt || idx >= ldt->nr_entries)
 			return 0;
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 98631c0e7a11..f786401ac15d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -631,14 +631,21 @@ static bool get_desc(struct desc_struct *out, unsigned short sel)
 		/* Bits [15:3] contain the index of the desired entry. */
 		sel >>= 3;
 
-		mutex_lock(&current->active_mm->context.lock);
-		ldt = current->active_mm->context.ldt;
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return false;
+
+		mutex_lock(&current->mm->context.lock);
+		ldt = current->mm->context.ldt;
 		if (ldt && sel < ldt->nr_entries) {
 			*out = ldt->entries[sel];
 			success = true;
 		}
 
-		mutex_unlock(&current->active_mm->context.lock);
+		mutex_unlock(&current->mm->context.lock);
 
 		return success;
 	}
-- 
2.45.2



* [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
  2025-04-02  9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
  2025-04-02  9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
  2025-04-02  9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

This prepares them for use outside of the alternative machinery.
The code is unchanged.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.533822339@infradead.org
---
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/kernel/alternative.c      | 64 --------------------------------------
 arch/x86/mm/tlb.c                  | 64 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2398058b6e83..b103e1709a67 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -272,4 +272,7 @@ unsigned long __get_current_cr3_fast(void);
 
 #include <asm-generic/mmu_context.h>
 
+extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
+extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cfffcb80f564..25abadaf8751 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2111,73 +2111,9 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 	}
 }
 
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
- */
-static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
-{
-	struct mm_struct *prev_mm;
-
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
-	 * with a stale address space WITHOUT being in lazy mode after
-	 * restoring the previous mm.
-	 */
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
-		leave_mm();
-
-	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, temp_mm, current);
-
-	/*
-	 * If breakpoints are enabled, disable them while the temporary mm is
-	 * used. Userspace might set up watchpoints on addresses that are used
-	 * in the temporary mm, which would lead to wrong signals being sent or
-	 * crashes.
-	 *
-	 * Note that breakpoints are not disabled selectively, which also causes
-	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
-	 * undesirable, but still seems reasonable as the code that runs in the
-	 * temporary mm should be short.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_disable();
-
-	return prev_mm;
-}
-
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
-{
-	lockdep_assert_irqs_disabled();
-
-	switch_mm_irqs_off(NULL, prev_mm, current);
-
-	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
-
-	/*
-	 * Restore the breakpoints if they were disabled before the temporary mm
-	 * was loaded.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
-}
-
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
 	memcpy(dst, src, len);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0925768d00cb..06a1ad39be74 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -972,6 +972,70 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 	this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
 }
 
+/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ *          harden security IRQs must be disabled while the temporary mm is
+ *          loaded, thereby preventing interrupt handler bugs from overriding
+ *          the kernel memory protection.
+ */
+struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
+{
+	struct mm_struct *prev_mm;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+		leave_mm();
+
+	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, temp_mm, current);
+
+	/*
+	 * If breakpoints are enabled, disable them while the temporary mm is
+	 * used. Userspace might set up watchpoints on addresses that are used
+	 * in the temporary mm, which would lead to wrong signals being sent or
+	 * crashes.
+	 *
+	 * Note that breakpoints are not disabled selectively, which also causes
+	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
+	 * undesirable, but still seems reasonable as the code that runs in the
+	 * temporary mm should be short.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_disable();
+
+	return prev_mm;
+}
+
+void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	switch_mm_irqs_off(NULL, prev_mm, current);
+
+	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+
+	/*
+	 * Restore the breakpoints if they were disabled before the temporary mm
+	 * was loaded.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();
+}
+
 /*
  * Call this when reinitializing a CPU.  It fixes the following potential
  * problems:
-- 
2.45.2



* [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
                   ` (2 preceding siblings ...)
  2025-04-02  9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
  2025-04-02  9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Peter Zijlstra <peterz@infradead.org>

Now that unuse_temporary_mm() lives in tlb.c it can access
cpu_tlbstate.loaded_mm directly, so callers no longer need to pass the
temporary mm.
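
The ordering this relies on can be illustrated with a small userspace
model. It is only a sketch under stated assumptions: struct mm,
struct cpu_state and the model_*() helpers are hypothetical, the cpumask
is reduced to a single unsigned long, and model_switch_mm() merely
imitates the fact that switching to an mm marks the CPU in that mm's
cpumask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; the kernel uses per-CPU state and cpumasks. */
struct mm { unsigned long cpumask; /* one bit per CPU */ };

struct cpu_state {
	int id;
	struct mm *loaded_mm;
};

/* Stand-in for switch_mm_irqs_off(): load next and mark this CPU in its mask. */
void model_switch_mm(struct cpu_state *cpu, struct mm *next)
{
	cpu->loaded_mm = next;
	next->cpumask |= 1UL << cpu->id;
}

struct mm *model_use_temporary_mm(struct cpu_state *cpu, struct mm *temp)
{
	struct mm *prev = cpu->loaded_mm;

	model_switch_mm(cpu, temp);
	return prev;
}

void model_unuse_temporary_mm(struct cpu_state *cpu, struct mm *prev)
{
	assert(cpu->loaded_mm != NULL);

	/* loaded_mm still names the temporary mm, so clear its bit first. */
	cpu->loaded_mm->cpumask &= ~(1UL << cpu->id);

	/* Only now switch back; afterwards loaded_mm no longer names temp. */
	model_switch_mm(cpu, prev);
}
```

If the two steps in model_unuse_temporary_mm() were swapped, loaded_mm
would already point at prev and the wrong cpumask would be cleared,
which is why the cpumask update has to come before the switch once the
explicit argument is gone.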

[ mingo: Merged it on top of x86/alternatives ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.648739178@infradead.org
---
 arch/x86/include/asm/mmu_context.h | 2 +-
 arch/x86/kernel/alternative.c      | 2 +-
 arch/x86/mm/tlb.c                  | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b103e1709a67..988c11792634 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -273,6 +273,6 @@ unsigned long __get_current_cr3_fast(void);
 #include <asm-generic/mmu_context.h>
 
 extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
-extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+extern void unuse_temporary_mm(struct mm_struct *prev_mm);
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 25abadaf8751..964a2eb0071a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2211,7 +2211,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(text_poke_mm, prev_mm);
+	unuse_temporary_mm(prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 06a1ad39be74..e672508ca158 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1019,14 +1019,14 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 	return prev_mm;
 }
 
-void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
-	switch_mm_irqs_off(NULL, prev_mm, current);
-
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
+
+	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/*
 	 * Restore the breakpoints if they were disabled before the temporary mm
-- 
2.45.2



* [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
                   ` (3 preceding siblings ...)
  2025-04-02  9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
  2025-04-02  9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
  2025-04-02  9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar, Ard Biesheuvel

From: Andy Lutomirski <luto@kernel.org>

EFI runtime services should use temporary MMs, but EFI runtime services
want IRQs on.  Preemption must still be disabled in a temporary MM context.

At some point, the entire temporary MM mechanism should be moved out of
arch code.
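
The relaxed calling convention can be sketched with a userspace model.
Everything here is hypothetical (struct cpu_model, the sketch_*()
helpers, run_in_temporary_mm()); it assumes that
lockdep_assert_preemption_disabled() is satisfied when either
preemption or IRQs are disabled, so existing IRQs-off callers keep
working while IRQs-on callers only need preemption off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one CPU; not kernel code. */
struct mm { int id; };

struct cpu_model {
	int preempt_count;     /* > 0 means preemption is disabled */
	bool irqs_enabled;
	struct mm *loaded_mm;
};

struct mm *sketch_use_temporary_mm(struct cpu_model *cpu, struct mm *temp)
{
	struct mm *prev;

	/* Models the relaxed assertion: IRQs may stay on if preemption is off. */
	assert(cpu->preempt_count > 0 || !cpu->irqs_enabled);
	prev = cpu->loaded_mm;
	cpu->loaded_mm = temp;
	return prev;
}

void sketch_unuse_temporary_mm(struct cpu_model *cpu, struct mm *prev)
{
	assert(cpu->preempt_count > 0 || !cpu->irqs_enabled);
	cpu->loaded_mm = prev;
}

/* Run fn() under temp with preemption off; the IRQ state is left untouched. */
int run_in_temporary_mm(struct cpu_model *cpu, struct mm *temp,
			int (*fn)(const struct cpu_model *))
{
	struct mm *prev;
	int ret;

	cpu->preempt_count++;                 /* models preempt_disable() */
	prev = sketch_use_temporary_mm(cpu, temp);
	ret = fn(cpu);
	sketch_unuse_temporary_mm(cpu, prev);
	cpu->preempt_count--;                 /* models preempt_enable() */
	return ret;
}

/* Example callback: reports which mm it observed while running. */
int observed_mm_id(const struct cpu_model *cpu)
{
	return cpu->loaded_mm->id;
}
```

The callback must not sleep: in this model nothing would restore the
temporary mm after a context switch, which matches the rule that
scheduling while a temporary mm is loaded is not allowed.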

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241119163035.758732080@infradead.org
---
 arch/x86/mm/tlb.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e672508ca158..8e4818ce04a5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -978,18 +978,23 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
  * that override the kernel memory protections (e.g., W^X), without exposing the
  * temporary page-table mappings that are required for these write operations to
  * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
+ * mapping is torn down.  Temporary mms can also be used for EFI runtime service
+ * calls or similar functionality.
  *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
+ * It is illegal to schedule while using a temporary mm -- the context switch
+ * code is unaware of the temporary mm and does not know how to context switch.
+ * Use a real (non-temporary) mm in a kernel thread if you need to sleep.
+ *
+ * Note: For sensitive memory writes, the temporary mm needs to be used
+ *       exclusively by a single core, and IRQs should be disabled while the
+ *       temporary mm is loaded, thereby preventing interrupt handler bugs from
+ *       overriding the kernel memory protection.
  */
 struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 {
 	struct mm_struct *prev_mm;
 
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1021,7 +1026,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 
 void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
 	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
-- 
2.45.2



* [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
                   ` (4 preceding siblings ...)
  2025-04-02  9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
  2025-04-02  9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

This should be considerably more robust.  It's also necessary for optimized
for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
lazy context would remove the lazy mm from mm_cpumask().

[ mingo: Merged it on top of x86/alternatives ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.877939834@infradead.org
---
 arch/x86/platform/efi/efi_64.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ac57259a432b..a5d3496d32a5 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
  */
 static void efi_enter_mm(void)
 {
-	efi_prev_mm = current->active_mm;
-	current->active_mm = &efi_mm;
-	switch_mm(efi_prev_mm, &efi_mm, NULL);
+	efi_prev_mm = use_temporary_mm(&efi_mm);
 }
 
 static void efi_leave_mm(void)
 {
-	current->active_mm = efi_prev_mm;
-	switch_mm(&efi_mm, efi_prev_mm, NULL);
+	unuse_temporary_mm(efi_prev_mm);
 }
 
 void arch_efi_call_virt_setup(void)
-- 
2.45.2



* [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm()
  2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
                   ` (5 preceding siblings ...)
  2025-04-02  9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
@ 2025-04-02  9:45 ` Ingo Molnar
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
  6 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-02  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Rik van Riel, H . Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

We gain nothing by having the core code enable IRQs right before calling
activate_mm() only for us to turn them right back off again in switch_mm().

This will save a few cycles, so execve() should be blazingly fast with this
patch applied!
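
The saving can be made concrete by counting IRQ-flag transitions in a
toy model. This is a sketch only: struct irq_model and the model_*()
functions are hypothetical, and it assumes the core exec path disables
IRQs while it updates the mm pointers and, once
ARCH_WANT_IRQS_OFF_ACTIVATE_MM is selected, keeps them off across
activate_mm().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counts how often the IRQ flag is flipped. */
struct irq_model {
	bool enabled;
	int transitions;
};

void irq_set(struct irq_model *irq, bool on)
{
	if (irq->enabled != on) {
		irq->enabled = on;
		irq->transitions++;
	}
}

/* Stand-in for switch_mm_irqs_off(): requires IRQs to be off already. */
void model_switch_mm_irqs_off(struct irq_model *irq)
{
	assert(!irq->enabled);
}

/* Stand-in for switch_mm(): saves the IRQ state, disables, then restores. */
void model_switch_mm(struct irq_model *irq)
{
	bool saved = irq->enabled;

	irq_set(irq, false);
	model_switch_mm_irqs_off(irq);
	irq_set(irq, saved);
}

/* Models the exec path; returns the number of IRQ flag transitions. */
int model_exec_mmap(bool irqs_off_activate_mm)
{
	struct irq_model irq = { .enabled = true, .transitions = 0 };

	irq_set(&irq, false);                   /* core code updates mm pointers */
	if (irqs_off_activate_mm) {
		model_switch_mm_irqs_off(&irq); /* activate_mm() with this patch */
		irq_set(&irq, true);
	} else {
		irq_set(&irq, true);
		model_switch_mm(&irq);          /* activate_mm() without it */
	}
	return irq.transitions;
}
```

In this model the old path flips the flag four times and the new path
twice, which is the redundant enable/disable pair the changelog refers
to.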

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241119163035.985203915@infradead.org
---
 arch/x86/Kconfig                   | 1 +
 arch/x86/include/asm/mmu_context.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 98bd4935280c..6b90d93fc40e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -149,6 +149,7 @@ config X86
 	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP	if X86_64
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
+	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
 	select CLOCKSOURCE_WATCHDOG
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 988c11792634..c511f8584ae4 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -190,7 +190,7 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 #define activate_mm(prev, next)			\
 do {						\
 	paravirt_enter_mmap(next);		\
-	switch_mm((prev), (next), NULL);	\
+	switch_mm_irqs_off((prev), (next), NULL);	\
 } while (0);
 
 #ifdef CONFIG_X86_32
-- 
2.45.2



* [tip: x86/alternatives] x86/mm: Opt-in to IRQs-off activate_mm()
  2025-04-02  9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     af8967158f9ad759a93e8e7a933c10e7cbb01ba2
Gitweb:        https://git.kernel.org/tip/af8967158f9ad759a93e8e7a933c10e7cbb01ba2
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:40 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:08 +02:00

x86/mm: Opt-in to IRQs-off activate_mm()

We gain nothing by having the core code enable IRQs right before calling
activate_mm() only for us to turn them right back off again in switch_mm().

This will save a few cycles, so execve() should be blazingly fast with this
patch applied!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-8-mingo@kernel.org
---
 arch/x86/Kconfig                   | 1 +
 arch/x86/include/asm/mmu_context.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b9f378..aeac63b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -153,6 +153,7 @@ config X86
 	select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
+	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
 	select CLOCKSOURCE_WATCHDOG
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 988c117..c511f85 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -190,7 +190,7 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 #define activate_mm(prev, next)			\
 do {						\
 	paravirt_enter_mmap(next);		\
-	switch_mm((prev), (next), NULL);	\
+	switch_mm_irqs_off((prev), (next), NULL);	\
 } while (0);
 
 #ifdef CONFIG_X86_32


* [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-02  9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Andy Lutomirski
  2025-04-17 14:17     ` Borislav Petkov
  0 siblings, 1 reply; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     e7021e2fe0b4335523d3f6e2221000bdfc633b62
Gitweb:        https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:39 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00

x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery

This should be considerably more robust.  It's also necessary for optimized
for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
lazy context would remove the lazy mm from mm_cpumask().

[ mingo: Merged it on top of x86/alternatives ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
---
 arch/x86/platform/efi/efi_64.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ac57259..a5d3496 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
  */
 static void efi_enter_mm(void)
 {
-	efi_prev_mm = current->active_mm;
-	current->active_mm = &efi_mm;
-	switch_mm(efi_prev_mm, &efi_mm, NULL);
+	efi_prev_mm = use_temporary_mm(&efi_mm);
 }
 
 static void efi_leave_mm(void)
 {
-	current->active_mm = efi_prev_mm;
-	switch_mm(&efi_mm, efi_prev_mm, NULL);
+	unuse_temporary_mm(efi_prev_mm);
 }
 
 void arch_efi_call_virt_setup(void)


* [tip: x86/alternatives] x86/mm: Allow temporary MMs when IRQs are on
  2025-04-02  9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton,
	Ard Biesheuvel, x86, linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     58f8ffa917669a0c8c027e24d5349f0b488f8181
Gitweb:        https://git.kernel.org/tip/58f8ffa917669a0c8c027e24d5349f0b488f8181
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:38 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:06:00 +02:00

x86/mm: Allow temporary MMs when IRQs are on

EFI runtime services should use temporary MMs, but EFI runtime services
want IRQs on.  Preemption must still be disabled in a temporary MM context.

At some point, the entire temporary MM mechanism should be moved out of
arch code.
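
As a rough illustration of the relaxed precondition, here is a userspace
sketch. It assumes that the preemption assertion is satisfied whenever the
preempt count is non-zero or IRQs are off, which is how I understand the
lockdep helper to behave; the toy_* names and the struct are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the execution context; not kernel state. */
struct toy_context {
	int preempt_count;	/* > 0 means preemption is disabled */
	bool irqs_enabled;
};

/* Old rule: the temporary mm may only be loaded with IRQs off. */
static bool toy_old_rule_ok(const struct toy_context *ctx)
{
	return !ctx->irqs_enabled;
}

/*
 * New rule: it is enough that the task cannot be preempted.
 * Running with IRQs off still qualifies.
 */
static bool toy_new_rule_ok(const struct toy_context *ctx)
{
	return ctx->preempt_count > 0 || !ctx->irqs_enabled;
}
```

A caller that disables preemption but keeps IRQs on, as the EFI runtime
path wants to, fails the old check and passes the new one, while a fully
preemptible caller fails both.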

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-6-mingo@kernel.org
---
 arch/x86/mm/tlb.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 38fdcf8..c9b87e5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -977,18 +977,23 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
  * that override the kernel memory protections (e.g., W^X), without exposing the
  * temporary page-table mappings that are required for these write operations to
  * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
+ * mapping is torn down.  Temporary mms can also be used for EFI runtime service
+ * calls or similar functionality.
  *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
+ * It is illegal to schedule while using a temporary mm -- the context switch
+ * code is unaware of the temporary mm and does not know how to context switch.
+ * Use a real (non-temporary) mm in a kernel thread if you need to sleep.
+ *
+ * Note: For sensitive memory writes, the temporary mm needs to be used
+ *       exclusively by a single core, and IRQs should be disabled while the
+ *       temporary mm is loaded, thereby preventing interrupt handler bugs from
+ *       overriding the kernel memory protection.
  */
 struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 {
 	struct mm_struct *prev_mm;
 
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1020,7 +1025,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 
 void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
 	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));


* [tip: x86/alternatives] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again
  2025-04-02  9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ingo Molnar, Andy Lutomirski,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     4873f494bbe4670f353a9b76ce44e6028c811cbb
Gitweb:        https://git.kernel.org/tip/4873f494bbe4670f353a9b76ce44e6028c811cbb
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:37 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:56 +02:00

x86/mm: Remove 'mm' argument from unuse_temporary_mm() again

Now that unuse_temporary_mm() lives in tlb.c it can access
cpu_tlbstate.loaded_mm.
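
Reading the temporary mm back from the per-CPU state is also why the diff
moves the cpumask clear ahead of the switch: once the previous mm has been
reloaded, loaded_mm no longer points at the temporary one. A minimal
userspace sketch of that ordering, with hypothetical toy_* names and a
single modelled CPU, might look like this:

```c
#include <assert.h>

/* Toy stand-in for an mm with a one-word cpumask. */
struct toy_mm {
	unsigned long cpumask;
};

static struct toy_mm *toy_loaded_mm;	/* models cpu_tlbstate.loaded_mm */
static const int toy_cpu = 0;		/* the only CPU in this model */

/* Mark this CPU in next's mask and make next the loaded mm. */
static void toy_switch_mm(struct toy_mm *next)
{
	next->cpumask |= 1UL << toy_cpu;
	toy_loaded_mm = next;
}

/* Order used by the patch: clear the bit first, then switch back. */
static void toy_unuse_clear_then_switch(struct toy_mm *prev_mm)
{
	toy_loaded_mm->cpumask &= ~(1UL << toy_cpu);
	toy_switch_mm(prev_mm);
}

/* Hypothetical wrong order: after the switch, loaded_mm is prev_mm. */
static void toy_unuse_switch_then_clear(struct toy_mm *prev_mm)
{
	toy_switch_mm(prev_mm);
	toy_loaded_mm->cpumask &= ~(1UL << toy_cpu);
}
```

With the second variant the temporary mm keeps its bit and the restored mm
loses it, which is the opposite of what the comment in the code intends.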

[ mingo: Merged it on top of x86/alternatives ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-5-mingo@kernel.org
---
 arch/x86/include/asm/mmu_context.h | 2 +-
 arch/x86/kernel/alternative.c      | 2 +-
 arch/x86/mm/tlb.c                  | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b103e17..988c117 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -273,6 +273,6 @@ unsigned long __get_current_cr3_fast(void);
 #include <asm-generic/mmu_context.h>
 
 extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
-extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+extern void unuse_temporary_mm(struct mm_struct *prev_mm);
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index bdbdfa0..ddbc303 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2211,7 +2211,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(text_poke_mm, prev_mm);
+	unuse_temporary_mm(prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f3da20b..38fdcf8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1018,14 +1018,14 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 	return prev_mm;
 }
 
-void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
-	switch_mm_irqs_off(NULL, prev_mm, current);
-
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
+
+	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/*
 	 * Restore the breakpoints if they were disabled before the temporary mm


* [tip: x86/alternatives] x86/mm: Make use_/unuse_temporary_mm() non-static
  2025-04-02  9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     d376972c9825ac4e8ad74872ee0730a5b4292e44
Gitweb:        https://git.kernel.org/tip/d376972c9825ac4e8ad74872ee0730a5b4292e44
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:36 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:52 +02:00

x86/mm: Make use_/unuse_temporary_mm() non-static

This prepares them for use outside of the alternative machinery.
The code is unchanged.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-4-mingo@kernel.org
---
 arch/x86/include/asm/mmu_context.h |  3 +-
 arch/x86/kernel/alternative.c      | 64 +-----------------------------
 arch/x86/mm/tlb.c                  | 64 +++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2398058..b103e17 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -272,4 +272,7 @@ unsigned long __get_current_cr3_fast(void);
 
 #include <asm-generic/mmu_context.h>
 
+extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
+extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 95053e8..bdbdfa0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2111,73 +2111,9 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 	}
 }
 
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
- */
-static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
-{
-	struct mm_struct *prev_mm;
-
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
-	 * with a stale address space WITHOUT being in lazy mode after
-	 * restoring the previous mm.
-	 */
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
-		leave_mm();
-
-	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, temp_mm, current);
-
-	/*
-	 * If breakpoints are enabled, disable them while the temporary mm is
-	 * used. Userspace might set up watchpoints on addresses that are used
-	 * in the temporary mm, which would lead to wrong signals being sent or
-	 * crashes.
-	 *
-	 * Note that breakpoints are not disabled selectively, which also causes
-	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
-	 * undesirable, but still seems reasonable as the code that runs in the
-	 * temporary mm should be short.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_disable();
-
-	return prev_mm;
-}
-
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
-{
-	lockdep_assert_irqs_disabled();
-
-	switch_mm_irqs_off(NULL, prev_mm, current);
-
-	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
-
-	/*
-	 * Restore the breakpoints if they were disabled before the temporary mm
-	 * was loaded.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
-}
-
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
 	memcpy(dst, src, len);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e459d97..f3da20b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -972,6 +972,70 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 }
 
 /*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ *          harden security IRQs must be disabled while the temporary mm is
+ *          loaded, thereby preventing interrupt handler bugs from overriding
+ *          the kernel memory protection.
+ */
+struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
+{
+	struct mm_struct *prev_mm;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+		leave_mm();
+
+	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, temp_mm, current);
+
+	/*
+	 * If breakpoints are enabled, disable them while the temporary mm is
+	 * used. Userspace might set up watchpoints on addresses that are used
+	 * in the temporary mm, which would lead to wrong signals being sent or
+	 * crashes.
+	 *
+	 * Note that breakpoints are not disabled selectively, which also causes
+	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
+	 * undesirable, but still seems reasonable as the code that runs in the
+	 * temporary mm should be short.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_disable();
+
+	return prev_mm;
+}
+
+void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	switch_mm_irqs_off(NULL, prev_mm, current);
+
+	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+
+	/*
+	 * Restore the breakpoints if they were disabled before the temporary mm
+	 * was loaded.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();
+}
+
+/*
  * Call this when reinitializing a CPU.  It fixes the following potential
  * problems:
  *


* [tip: x86/alternatives] x86/events, x86/insn-eval: Remove incorrect current->active_mm references
  2025-04-02  9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Peter Zijlstra (Intel), Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     81e3cbdef230fd9adfa8569044b07290afd66708
Gitweb:        https://git.kernel.org/tip/81e3cbdef230fd9adfa8569044b07290afd66708
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:35 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:46 +02:00

x86/events, x86/insn-eval: Remove incorrect current->active_mm references

When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy.  Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.

Clean up the code.  If nmi_uaccess_okay() says we don't have a valid
context, just fail.  Otherwise use current->mm.
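
The rule can be sketched in userspace as follows. This is only a model
under assumptions: the toy_* types and helpers are hypothetical, and
toy_uaccess_okay() merely imitates the idea that the mm loaded on the CPU
must be the task's own mm, without any of the real CR3 checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's real structures. */
struct toy_ldt {
	unsigned int nr_entries;
};

struct toy_mm {
	struct toy_ldt *ldt;
};

struct toy_task {
	struct toy_mm *mm;		/* NULL for kernel threads */
	struct toy_mm *active_mm;	/* may be a borrowed (lazy) user mm */
};

/* Imitates nmi_uaccess_okay(): the loaded mm must be the task's own mm. */
static bool toy_uaccess_okay(const struct toy_task *task,
			     const struct toy_mm *loaded_mm)
{
	return task->mm != NULL && loaded_mm == task->mm;
}

/* Return the LDT to consult, or NULL when the lookup must fail. */
static struct toy_ldt *toy_get_ldt(const struct toy_task *task,
				   const struct toy_mm *loaded_mm)
{
	if (!toy_uaccess_okay(task, loaded_mm))
		return NULL;

	/* Only task->mm is consulted; active_mm is deliberately ignored. */
	return task->mm->ldt;
}
```

Both a kernel thread that merely borrows a user mm and a user task that
currently has some other mm loaded get NULL here instead of a possibly
stale LDT.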

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-3-mingo@kernel.org
---
 arch/x86/events/core.c   |  9 ++++++++-
 arch/x86/lib/insn-eval.c | 13 ++++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6866cc5..95118b5 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2803,8 +2803,15 @@ static unsigned long get_segment_base(unsigned int segment)
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return 0;
+
 		/* IRQs are off, so this synchronizes with smp_store_release */
-		ldt = READ_ONCE(current->active_mm->context.ldt);
+		ldt = smp_load_acquire(&current->mm->context.ldt);
 		if (!ldt || idx >= ldt->nr_entries)
 			return 0;
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 98631c0..f786401 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -631,14 +631,21 @@ static bool get_desc(struct desc_struct *out, unsigned short sel)
 		/* Bits [15:3] contain the index of the desired entry. */
 		sel >>= 3;
 
-		mutex_lock(&current->active_mm->context.lock);
-		ldt = current->active_mm->context.ldt;
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return false;
+
+		mutex_lock(&current->mm->context.lock);
+		ldt = current->mm->context.ldt;
 		if (ldt && sel < ldt->nr_entries) {
 			*out = ldt->entries[sel];
 			success = true;
 		}
 
-		mutex_unlock(&current->active_mm->context.lock);
+		mutex_unlock(&current->mm->context.lock);
 
 		return success;
 	}


* [tip: x86/alternatives] x86/mm: Add 'mm' argument to unuse_temporary_mm()
  2025-04-02  9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
@ 2025-04-12 18:46   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-12 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ingo Molnar, Andy Lutomirski,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86,
	linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     0812e096cff0fd58d88a21a413fba56c0e6c3caa
Gitweb:        https://git.kernel.org/tip/0812e096cff0fd58d88a21a413fba56c0e6c3caa
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 02 Apr 2025 11:45:34 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 10:05:37 +02:00

x86/mm: Add 'mm' argument to unuse_temporary_mm()

In commit 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
unuse_temporary_mm() grew the assumption that it gets used on
poking_mm exclusively. While this is currently true, let's not
hard-code this assumption.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-2-mingo@kernel.org
---
 arch/x86/kernel/alternative.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f785d23..95053e8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2161,14 +2161,14 @@ static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
+static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
 	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
 
 	/*
 	 * Restore the breakpoints if they were disabled before the temporary mm
@@ -2275,7 +2275,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(prev_mm);
+	unuse_temporary_mm(text_poke_mm, prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled


* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
@ 2025-04-17 14:17     ` Borislav Petkov
  2025-04-18  9:50       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2025-04-17 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Andy Lutomirski, Peter Zijlstra (Intel),
	Ingo Molnar, Rik van Riel, H. Peter Anvin, Linus Torvalds,
	Andrew Morton, x86

On Sat, Apr 12, 2025 at 06:46:48PM -0000, tip-bot2 for Andy Lutomirski wrote:
> The following commit has been merged into the x86/alternatives branch of tip:
> 
> Commit-ID:     e7021e2fe0b4335523d3f6e2221000bdfc633b62
> Gitweb:        https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
> Author:        Andy Lutomirski <luto@kernel.org>
> AuthorDate:    Wed, 02 Apr 2025 11:45:39 +02:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00
> 
> x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
> 
> This should be considerably more robust.  It's also necessary for optimized
> for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
> lazy context would remove the lazy mm from mm_cpumask().
> 
> [ mingo: Merged it on top of x86/alternatives ]
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
> ---
>  arch/x86/platform/efi/efi_64.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index ac57259..a5d3496 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
>   */
>  static void efi_enter_mm(void)
>  {
> -	efi_prev_mm = current->active_mm;
> -	current->active_mm = &efi_mm;
> -	switch_mm(efi_prev_mm, &efi_mm, NULL);
> +	efi_prev_mm = use_temporary_mm(&efi_mm);
>  }
>  
>  static void efi_leave_mm(void)
>  {
> -	current->active_mm = efi_prev_mm;
> -	switch_mm(&efi_mm, efi_prev_mm, NULL);
> +	unuse_temporary_mm(efi_prev_mm);
>  }
>  
>  void arch_efi_call_virt_setup(void)

mingo thinks this one causes this:

[    0.119491] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.119498] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[    0.137368] Freeing SMP alternatives memory: 40K
[    0.137381] pid_max: default: 32768 minimum: 301
[    0.137496] ------------[ cut here ]------------
[    0.137502] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795 switch_mm_irqs_off+0x3d3/0x460
[    0.137516] Modules linked in:
[    0.137526] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc2+ #3 PREEMPT(voluntary) 
[    0.137537] Hardware name: HP HP ProBook 635 Aero G7 Notebook PC/8830, BIOS S84 Ver. 01.05.00 05/14/2021
[    0.137548] RIP: 0010:switch_mm_irqs_off+0x3d3/0x460
[    0.137556] Code: 28 00 65 ff 0d 3e c9 db 01 0f 85 88 fd ff ff 0f 1f 44 00 00 e9 7e fd ff ff be 00 01 00 00 31 ff e8 02 cb fb ff e9 be fd ff ff <0f> 0b e9 6c fc ff ff 9c 58 f6 c4 02 0f 84 c4 fd ff ff e8 46 3b 59
[    0.137575] RSP: 0000:ffffffffb6a03e00 EFLAGS: 00010202
[    0.137583] RAX: 0000000000000246 RBX: ffffffffb6c5fd40 RCX: 0000000100238000
[    0.137591] RDX: ffffffffb6a149c0 RSI: ffffffffb6c5fd40 RDI: 0000000000000000
[    0.137599] RBP: ffffffffb6bbcdc0 R08: 00000000b357d000 R09: 0000000000000000
[    0.137607] R10: 000000010ab06067 R11: 0000000000000000 R12: ffffffffb6a149c0
[    0.137616] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.137624] FS:  0000000000000000(0000) GS:ffff9b6093385000(0000) knlGS:0000000000000000
[    0.137633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.137640] CR2: ffff9b6048601000 CR3: 0000000106c26000 CR4: 0000000000350ef0
[    0.137648] Call Trace:
[    0.137653]  <TASK>
[    0.137658]  use_temporary_mm+0x55/0x90
[    0.137666]  efi_set_virtual_address_map+0xfd/0x1b0
[    0.137676]  efi_enter_virtual_mode+0x3e3/0x450
[    0.137685]  start_kernel+0x6b7/0x720
[    0.137693]  x86_64_start_reservations+0x24/0x30
[    0.137700]  x86_64_start_kernel+0x7a/0x80
[    0.137706]  common_startup_64+0x13e/0x141
[    0.137717]  </TASK>
[    0.137720] irq event stamp: 128439
[    0.137725] hardirqs last  enabled at (128447): [<ffffffffb579dcd2>] __up_console_sem+0x52/0x60
[    0.137737] hardirqs last disabled at (128454): [<ffffffffb579dcb7>] __up_console_sem+0x37/0x60
[    0.137748] softirqs last  enabled at (105766): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[    0.137759] softirqs last disabled at (105759): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[    0.137770] ---[ end trace 0000000000000000 ]---
[    0.137777] ------------[ cut here ]------------
[    0.137782] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:453 cr4_update_irqsoff+0x45/0x70
[    0.137794] Modules linked in:
[    0.137800] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W           6.15.0-rc2+ #3 PREEMPT(voluntary) 
[    0.137813] Tainted: [W]=WARN
[    0.137817] Hardware name: HP HP ProBook 635 Aero G7 Notebook PC/8830, BIOS S84 Ver. 01.05.00 05/14/2021
[    0.137827] RIP: 0010:cr4_update_irqsoff+0x45/0x70
[    0.137834] Code: 0b 65 8b 0d d5 3d e0 01 85 c9 74 13 48 f7 d7 48 21 d7 48 09 c7 48 39 fa 75 20 e9 f6 8d b8 00 65 8b 0d 9b 3a e0 01 85 c9 74 e2 <0f> 0b 48 f7 d7 48 21 d7 48 09 c7 48 39 fa 74 e0 65 48 89 3d eb 6a
[    0.137853] RSP: 0000:ffffffffb6a03df8 EFLAGS: 00010202
[    0.137860] RAX: 0000000000000000 RBX: ffffffffb6c5fd40 RCX: 0000000000000001
[    0.137868] RDX: 0000000000350ef0 RSI: 0000000000000100 RDI: 0000000000000100
[    0.137876] RBP: ffffffffb6bbcdc0 R08: 00000000b357d000 R09: 0000000000000000
[    0.137884] R10: 000000010ab06067 R11: 0000000000000000 R12: 000000010022e000
[    0.137892] R13: 0000000000010000 R14: 0000000000000000 R15: 0000000000000000
[    0.137900] FS:  0000000000000000(0000) GS:ffff9b6093385000(0000) knlGS:0000000000000000
[    0.137909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.137916] CR2: ffff9b6048601000 CR3: 000000010022e000 CR4: 0000000000350ef0
[    0.137924] Call Trace:
[    0.137928]  <TASK>
[    0.137932]  switch_mm_irqs_off+0x3ce/0x460
[    0.137940]  use_temporary_mm+0x55/0x90
[    0.137946]  efi_set_virtual_address_map+0xfd/0x1b0
[    0.137956]  efi_enter_virtual_mode+0x3e3/0x450
[    0.137964]  start_kernel+0x6b7/0x720
[    0.137971]  x86_64_start_reservations+0x24/0x30
[    0.137978]  x86_64_start_kernel+0x7a/0x80
[    0.137984]  common_startup_64+0x13e/0x141
[    0.137994]  </TASK>
[    0.137998] irq event stamp: 128723
[    0.138002] hardirqs last  enabled at (128731): [<ffffffffb579dcd2>] __up_console_sem+0x52/0x60
[    0.138013] hardirqs last disabled at (128738): [<ffffffffb579dcb7>] __up_console_sem+0x37/0x60
[    0.138024] softirqs last  enabled at (105766): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[    0.138034] softirqs last disabled at (105759): [<ffffffffb56f57b6>] __irq_exit_rcu+0x96/0xc0
[    0.138045] ---[ end trace 0000000000000000 ]---


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-17 14:17     ` Borislav Petkov
@ 2025-04-18  9:50       ` Peter Zijlstra
  2025-04-18 11:43         ` Borislav Petkov
  2025-04-18 12:48         ` [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2025-04-18  9:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-tip-commits, Andy Lutomirski, Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86

On Thu, Apr 17, 2025 at 04:17:51PM +0200, Borislav Petkov wrote:
> On Sat, Apr 12, 2025 at 06:46:48PM -0000, tip-bot2 for Andy Lutomirski wrote:
> > The following commit has been merged into the x86/alternatives branch of tip:
> > 
> > Commit-ID:     e7021e2fe0b4335523d3f6e2221000bdfc633b62
> > Gitweb:        https://git.kernel.org/tip/e7021e2fe0b4335523d3f6e2221000bdfc633b62
> > Author:        Andy Lutomirski <luto@kernel.org>
> > AuthorDate:    Wed, 02 Apr 2025 11:45:39 +02:00
> > Committer:     Ingo Molnar <mingo@kernel.org>
> > CommitterDate: Sat, 12 Apr 2025 10:06:04 +02:00
> > 
> > x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
> > 
> > This should be considerably more robust.  It's also necessary for optimized
> > for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
> > lazy context would remove the lazy mm from mm_cpumask().
> > 
> > [ mingo: Merged it on top of x86/alternatives ]
> > 
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org
> > ---
> >  arch/x86/platform/efi/efi_64.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> > index ac57259..a5d3496 100644
> > --- a/arch/x86/platform/efi/efi_64.c
> > +++ b/arch/x86/platform/efi/efi_64.c
> > @@ -434,15 +434,12 @@ void __init efi_dump_pagetable(void)
> >   */
> >  static void efi_enter_mm(void)
> >  {
> > -	efi_prev_mm = current->active_mm;
> > -	current->active_mm = &efi_mm;
> > -	switch_mm(efi_prev_mm, &efi_mm, NULL);
> > +	efi_prev_mm = use_temporary_mm(&efi_mm);
> >  }
> >  
> >  static void efi_leave_mm(void)
> >  {
> > -	current->active_mm = efi_prev_mm;
> > -	switch_mm(&efi_mm, efi_prev_mm, NULL);
> > +	unuse_temporary_mm(efi_prev_mm);
> >  }
> >  
> >  void arch_efi_call_virt_setup(void)
> 
> mingo thinks this one causes this:
> 
> [    0.119491] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> [    0.119498] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
> [    0.137368] Freeing SMP alternatives memory: 40K
> [    0.137381] pid_max: default: 32768 minimum: 301
> [    0.137496] ------------[ cut here ]------------
> [    0.137502] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795 switch_mm_irqs_off+0x3d3/0x460
> [    0.137516] Modules linked in:

Ah yes :-( Something like so perhaps..

---
Subject: x86/mm: Fix {,un}use_temporary_mm() IRQ state

As the name switch_mm_irqs_off() implies, the function ought to be called
with IRQs *off*. Commit 58f8ffa91766 ("x86/mm: Allow temporary MMs when
IRQs are on") caused this to not be the case for EFI.

Ensure IRQs are off where it matters.

Fixes: 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 92bde0d6205a..1451e022129a 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -991,6 +991,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 	struct mm_struct *prev_mm;
 
 	lockdep_assert_preemption_disabled();
+	guard(irqsave)();
 
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1023,6 +1024,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_preemption_disabled();
+	guard(irqsave)();
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
 	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
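For readers unfamiliar with the guard(irqsave)() idiom used in the hunks above:
it saves the current IRQ state, disables IRQs, and restores the saved state
automatically when the enclosing scope ends. The following is only a userspace
sketch of that idea, built on the compiler's cleanup attribute (GCC/Clang). It
is not the kernel implementation, and every sim_* name is hypothetical and
introduced purely for illustration.

```c
/* Userspace model of a scope-based IRQ guard. NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool sim_irqs_enabled = true;

/* State carried by the guard object: the IRQ state seen on entry. */
struct sim_irq_guard {
	bool saved;
};

/* "Constructor": remember the current state, then disable IRQs. */
static inline struct sim_irq_guard sim_irq_save(void)
{
	struct sim_irq_guard g = { .saved = sim_irqs_enabled };

	sim_irqs_enabled = false;
	return g;
}

/* "Destructor": called by the compiler when the guard leaves scope. */
static inline void sim_irq_restore(struct sim_irq_guard *g)
{
	sim_irqs_enabled = g->saved;
}

/* Hypothetical analogue of guard(irqsave)(): one line at the top of a scope. */
#define SIM_GUARD_IRQSAVE()						\
	struct sim_irq_guard sim_guard					\
		__attribute__((cleanup(sim_irq_restore), unused)) =	\
		sim_irq_save()

/* Stand-in for switch_mm_irqs_off(): reports whether IRQs were off. */
static bool sim_switch_mm_irqs_off(void)
{
	return !sim_irqs_enabled;
}

/* Shaped like the patched use_temporary_mm(): guard first, then switch. */
static bool sim_use_temporary_mm(void)
{
	SIM_GUARD_IRQSAVE();

	/* The return value is computed before the guard's cleanup runs. */
	return sim_switch_mm_irqs_off();
}

/* Small self-check: IRQs are off inside and restored afterwards. */
static void sim_demo(void)
{
	sim_irqs_enabled = true;
	assert(sim_use_temporary_mm());
	assert(sim_irqs_enabled);
}
```

Because the restore step is tied to scope exit, every return path restores the
caller's IRQ state, and a caller that already had IRQs off keeps them off. That
is why a single added line per function is enough in the patch above.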

* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-18  9:50       ` Peter Zijlstra
@ 2025-04-18 11:43         ` Borislav Petkov
  2025-04-18 12:37           ` Ingo Molnar
  2025-04-18 12:48         ` [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2025-04-18 11:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-tip-commits, Andy Lutomirski, Ingo Molnar,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86

On Fri, Apr 18, 2025 at 11:50:34AM +0200, Peter Zijlstra wrote:
> Ah yes :-( Something like so perhaps..

Thanks, that does it.

Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [tip: x86/alternatives] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery
  2025-04-18 11:43         ` Borislav Petkov
@ 2025-04-18 12:37           ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2025-04-18 12:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, linux-kernel, linux-tip-commits, Andy Lutomirski,
	Rik van Riel, H. Peter Anvin, Linus Torvalds, Andrew Morton, x86


* Borislav Petkov <bp@alien8.de> wrote:

> On Fri, Apr 18, 2025 at 11:50:34AM +0200, Peter Zijlstra wrote:
> > Ah yes :-( Something like so perhaps..
> 
> Thanks, that does it.
> 
> Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
> Tested-by: Borislav Petkov (AMD) <bp@alien8.de>

Applied to tip:x86/alternatives, thanks guys!

	Ingo

* [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state
  2025-04-18  9:50       ` Peter Zijlstra
  2025-04-18 11:43         ` Borislav Petkov
@ 2025-04-18 12:48         ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-04-18 12:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Borislav Petkov (AMD), Peter Zijlstra (Intel), Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Andy Lutomirski, Linus Torvalds,
	Rik van Riel, x86, linux-kernel

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     aef1d0209ddf127a8069aca5fa3a062be4136b76
Gitweb:        https://git.kernel.org/tip/aef1d0209ddf127a8069aca5fa3a062be4136b76
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 18 Apr 2025 11:50:34 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 18 Apr 2025 14:36:18 +02:00

x86/mm: Fix {,un}use_temporary_mm() IRQ state

As the name switch_mm_irqs_off() implies, the function ought to be called
with IRQs *off*. Commit 58f8ffa91766 ("x86/mm: Allow temporary MMs when
IRQs are on") caused this to not be the case for EFI.

Ensure IRQs are off where it matters.

Fixes: 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on")
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20250418095034.GR38216@noisy.programming.kicks-ass.net
---
 arch/x86/mm/tlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 79c124f..39761c7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -986,6 +986,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 	struct mm_struct *prev_mm;
 
 	lockdep_assert_preemption_disabled();
+	guard(irqsave)();
 
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -1018,6 +1019,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_preemption_disabled();
+	guard(irqsave)();
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
 	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));

Thread overview: 20+ messages
2025-04-02  9:45 [PATCH 0/7 -v2] Factor out, clean up and use the use_/unuse_temporary_mm() APIs some more Ingo Molnar
2025-04-02  9:45 ` [PATCH 1/7] x86/mm: Add 'mm' argument to unuse_temporary_mm() Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
2025-04-02  9:45 ` [PATCH 2/7] x86/events, x86/insn-eval: Remove incorrect current->active_mm references Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02  9:45 ` [PATCH 3/7] x86/mm: Make use_/unuse_temporary_mm() non-static Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02  9:45 ` [PATCH 4/7] x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Peter Zijlstra
2025-04-02  9:45 ` [PATCH 5/7] x86/mm: Allow temporary MMs when IRQs are on Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-02  9:45 ` [PATCH 6/7] x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
2025-04-17 14:17     ` Borislav Petkov
2025-04-18  9:50       ` Peter Zijlstra
2025-04-18 11:43         ` Borislav Petkov
2025-04-18 12:37           ` Ingo Molnar
2025-04-18 12:48         ` [tip: x86/alternatives] x86/mm: Fix {,un}use_temporary_mm() IRQ state tip-bot2 for Peter Zijlstra
2025-04-02  9:45 ` [PATCH 7/7] x86/mm: Opt-in to IRQs-off activate_mm() Ingo Molnar
2025-04-12 18:46   ` [tip: x86/alternatives] " tip-bot2 for Andy Lutomirski
