* [PATCH 1/3] x86/fpu: Add fpu_save_state() for __save_processor_state()
2025-05-16 23:18 [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
@ 2025-05-16 23:18 ` Eric Biggers
2025-05-16 23:18 ` [PATCH 2/3] x86/pm: Use fpu_save_state() in __save_processor_state() Eric Biggers
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Eric Biggers @ 2025-05-16 23:18 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
From: Eric Biggers <ebiggers@google.com>
Add a new function fpu_save_state() which handles the FPU state saving
and invalidation that __save_processor_state() needs. This will allow
__save_processor_state() to stop using kernel_fpu_begin() and
kernel_fpu_end() for something other than actual kernel-mode FPU.
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/include/asm/fpu/api.h | 1 +
arch/x86/kernel/fpu/core.c | 43 ++++++++++++++++++++++++++++------
2 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 8e6848f55dcdb..3ad359c5b100e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -26,10 +26,11 @@
#define KFPU_MXCSR _BITUL(1) /* MXCSR will be initialized */
extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
extern void kernel_fpu_end(void);
extern bool irq_fpu_usable(void);
+extern void fpu_save_state(void);
extern void fpregs_mark_activate(void);
/* Code that is unaware of kernel_fpu_begin_mask() can use this */
static inline void kernel_fpu_begin(void)
{
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 948b4f5fad99c..476393b1d5e8f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -118,11 +118,11 @@ static void update_avx_timestamp(struct fpu *fpu)
/*
* Save the FPU register state in fpu->fpstate->regs. The register state is
* preserved.
*
- * Must be called with fpregs_lock() held.
+ * Must be called with fpregs_lock() held or hardirqs disabled.
*
* The legacy FNSAVE instruction clears all FPU state unconditionally, so
* register state has to be reloaded. That might be a pointless exercise
* when the FPU is going to be used by another task right after that. But
* this only affects 20+ years old 32bit systems and avoids conditionals all
@@ -431,26 +431,31 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
+static __always_inline void __fpu_save_state(void)
+{
+ if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
+ !test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ save_fpregs_to_fpstate(x86_task_fpu(current));
+ }
+ __cpu_invalidate_fpregs_state();
+}
+
void kernel_fpu_begin_mask(unsigned int kfpu_mask)
{
if (!irqs_disabled())
fpregs_lock();
WARN_ON_FPU(!irq_fpu_usable());
WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
this_cpu_write(in_kernel_fpu, true);
- if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
- !test_thread_flag(TIF_NEED_FPU_LOAD)) {
- set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(x86_task_fpu(current));
- }
- __cpu_invalidate_fpregs_state();
+ __fpu_save_state();
/* Put sane initial values into the control registers. */
if (likely(kfpu_mask & KFPU_MXCSR) && boot_cpu_has(X86_FEATURE_XMM))
ldmxcsr(MXCSR_DEFAULT);
@@ -467,10 +472,34 @@ void kernel_fpu_end(void)
if (!irqs_disabled())
fpregs_unlock();
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
+#ifdef CONFIG_PM_SLEEP
+/*
+ * If the FPU registers are live for the current task, save them to current's
+ * memory register state and set TIF_NEED_FPU_LOAD. This is used by the suspend
+ * and kexec code to prepare for the FPU registers being clobbered. Unlike
+ * kernel_fpu_begin(), this function can be called with hardirqs disabled, and
+ * it does not initialize the FPU control registers for kernel-mode FPU use.
+ */
+void fpu_save_state(void)
+{
+ unsigned long flags;
+
+ WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
+
+ /*
+ * This is sometimes called with hardirqs disabled, so we need to use
+ * local_irq_save/restore() instead of fpregs_lock/unlock().
+ */
+ local_irq_save(flags);
+ __fpu_save_state();
+ local_irq_restore(flags);
+}
+#endif /* CONFIG_PM_SLEEP */
+
/*
* Sync the FPU register state to current's memory register state when the
* current task owns the FPU. The hardware register state is preserved.
*/
void fpu_sync_fpstate(struct fpu *fpu)
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] x86/pm: Use fpu_save_state() in __save_processor_state()
2025-05-16 23:18 [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
2025-05-16 23:18 ` [PATCH 1/3] x86/fpu: Add fpu_save_state() for __save_processor_state() Eric Biggers
@ 2025-05-16 23:18 ` Eric Biggers
2025-05-16 23:18 ` [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled() Eric Biggers
2025-05-17 1:30 ` [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
3 siblings, 0 replies; 15+ messages in thread
From: Eric Biggers @ 2025-05-16 23:18 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
From: Eric Biggers <ebiggers@google.com>
Make __save_processor_state() use fpu_save_state() instead of a
kernel_fpu_begin() and kernel_fpu_end() pair. This matches more
directly what it needs. This eliminates the need for kernel_fpu_begin()
and kernel_fpu_end() to support the irqs_disabled() case.
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/power/cpu.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 916441f5e85ce..dde4ccbc77f4b 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -81,11 +81,17 @@ static void msr_restore_context(struct saved_context *ctxt)
static void __save_processor_state(struct saved_context *ctxt)
{
#ifdef CONFIG_X86_32
mtrr_save_fixed_ranges(NULL);
#endif
- kernel_fpu_begin();
+
+ /*
+ * The FPU registers may be live for the current task, so save them to
+ * current's memory register state. The corresponding restore happens
+ * lazily when returning to userspace, not in restore_processor_state().
+ */
+ fpu_save_state();
/*
* descriptor tables
*/
store_idt(&ctxt->idt);
@@ -99,11 +105,10 @@ static void __save_processor_state(struct saved_context *ctxt)
ctxt->gdt_desc.size = GDT_SIZE - 1;
ctxt->gdt_desc.address = (unsigned long)get_cpu_gdt_rw(smp_processor_id());
store_tr(ctxt->tr);
- /* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
/*
* segment registers
*/
savesegment(gs, ctxt->gs);
#ifdef CONFIG_X86_64
@@ -139,18 +144,10 @@ void save_processor_state(void)
}
#ifdef CONFIG_X86_32
EXPORT_SYMBOL(save_processor_state);
#endif
-static void do_fpu_end(void)
-{
- /*
- * Restore FPU regs if necessary.
- */
- kernel_fpu_end();
-}
-
static void fix_processor_context(void)
{
int cpu = smp_processor_id();
#ifdef CONFIG_X86_64
struct desc_struct *desc = get_cpu_gdt_rw(cpu);
@@ -272,11 +269,10 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrq(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
#else
loadsegment(gs, ctxt->gs);
#endif
- do_fpu_end();
tsc_verify_tsc_adjust(true);
x86_platform.restore_sched_clock_state();
cache_bp_restore();
perf_restore_debug_store();
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-16 23:18 [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
2025-05-16 23:18 ` [PATCH 1/3] x86/fpu: Add fpu_save_state() for __save_processor_state() Eric Biggers
2025-05-16 23:18 ` [PATCH 2/3] x86/pm: Use fpu_save_state() in __save_processor_state() Eric Biggers
@ 2025-05-16 23:18 ` Eric Biggers
2025-05-17 7:09 ` Ingo Molnar
2025-05-17 1:30 ` [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
3 siblings, 1 reply; 15+ messages in thread
From: Eric Biggers @ 2025-05-16 23:18 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
From: Eric Biggers <ebiggers@google.com>
Make irq_fpu_usable() return false when irqs_disabled(). That makes the
irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
unnecessary, so also remove those.
Rationale:
- There's no known use case for kernel-mode FPU when irqs_disabled().
arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
__save_processor_state() previously did expect kernel_fpu_begin() and
kernel_fpu_end() to work when irqs_disabled(), but this was a
different use case and not actual kernel-mode FPU use.
- This is more efficient, since one call to irqs_disabled() replaces two
irqs_disabled() and one in_hardirq().
- This fixes irq_fpu_usable() to correctly return false during CPU
initialization. Incorrectly returning true caused the SHA-256 library
code, which is called when loading AMD microcode, to take a
SIMD-optimized code path too early, causing a crash. By correctly
returning false from irq_fpu_usable(), the generic SHA-256 code
correctly gets used instead. (Note: SIMD-optimized SHA-256 doesn't
get enabled until subsys_initcall, but CPU hotplug can happen later.)
Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash")
Reported-by: Ayush Jain <Ayush.Jain3@amd.com>
Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/kernel/fpu/core.c | 49 ++++++++++++++------------------------
1 file changed, 18 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 476393b1d5e8f..9b3c5e17f86cd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -66,42 +66,31 @@ struct fpu *x86_task_fpu(struct task_struct *task)
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
*/
bool irq_fpu_usable(void)
{
- if (WARN_ON_ONCE(in_nmi()))
- return false;
-
/*
- * In kernel FPU usage already active? This detects any explicitly
- * nested usage in task or softirq context, which is unsupported. It
- * also detects attempted usage in a hardirq that has interrupted a
- * kernel-mode FPU section.
+ * To ensure that (non-explicitly-nested) kernel-mode FPU is always
+ * usable in task and softirq contexts, kernel_fpu_begin() disables
+ * preemption and softirqs, and kernel_fpu_end() re-enables them. That
+ * is not compatible with hardirqs being disabled (including hardirq
+ * context), or with NMI context. Support for kernel-mode FPU is not
+ * needed in those contexts anyway. Return false in those contexts.
+ *
+ * Returning false when irqs_disabled() also eliminates the need to
+ * explicitly check whether the FPU has been initialized yet during CPU
+ * initialization. Before then, hardirqs are still disabled.
*/
+ if (irqs_disabled() || WARN_ON_ONCE(in_nmi()))
+ return false;
+
+ /* Catch any explicitly nested usage, which should never happen. */
if (this_cpu_read(in_kernel_fpu)) {
- WARN_ON_FPU(!in_hardirq());
+ WARN_ON_FPU(1);
return false;
}
-
- /*
- * When not in NMI or hard interrupt context, FPU can be used in:
- *
- * - Task context except from within fpregs_lock()'ed critical
- * regions.
- *
- * - Soft interrupt processing context which cannot happen
- * while in a fpregs_lock()'ed critical region.
- */
- if (!in_hardirq())
- return true;
-
- /*
- * In hard interrupt context it's safe when soft interrupts
- * are enabled, which means the interrupt did not hit in
- * a fpregs_lock()'ed critical region.
- */
- return !softirq_count();
+ return true;
}
EXPORT_SYMBOL(irq_fpu_usable);
/*
* Track AVX512 state use because it is known to slow the max clock
@@ -443,12 +432,11 @@ static __always_inline void __fpu_save_state(void)
__cpu_invalidate_fpregs_state();
}
void kernel_fpu_begin_mask(unsigned int kfpu_mask)
{
- if (!irqs_disabled())
- fpregs_lock();
+ fpregs_lock();
WARN_ON_FPU(!irq_fpu_usable());
WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
this_cpu_write(in_kernel_fpu, true);
@@ -467,12 +455,11 @@ EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
void kernel_fpu_end(void)
{
WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
this_cpu_write(in_kernel_fpu, false);
- if (!irqs_disabled())
- fpregs_unlock();
+ fpregs_unlock();
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
#ifdef CONFIG_PM_SLEEP
/*
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-16 23:18 ` [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled() Eric Biggers
@ 2025-05-17 7:09 ` Ingo Molnar
2025-05-17 18:39 ` Eric Biggers
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-05-17 7:09 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
* Eric Biggers <ebiggers@kernel.org> wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Make irq_fpu_usable() return false when irqs_disabled(). That makes the
> irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
> unnecessary, so also remove those.
>
> Rationale:
>
> - There's no known use case for kernel-mode FPU when irqs_disabled().
Except EFI?
> arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
> __save_processor_state() previously did expect kernel_fpu_begin() and
> kernel_fpu_end() to work when irqs_disabled(), but this was a
> different use case and not actual kernel-mode FPU use.
>
> - This is more efficient, since one call to irqs_disabled() replaces two
> irqs_disabled() and one in_hardirq().
This is noise compared to the overhead of saving/restoring vector CPU
context ...
> - This fixes irq_fpu_usable() to correctly return false during CPU
> initialization. Incorrectly returning true caused the SHA-256 library
> code, which is called when loading AMD microcode, to take a
> SIMD-optimized code path too early, causing a crash. By correctly
> returning false from irq_fpu_usable(), the generic SHA-256 code
> correctly gets used instead. (Note: SIMD-optimized SHA-256 doesn't
> get enabled until subsys_initcall, but CPU hotplug can happen later.)
Alternatively we could set in_kernel_fpu during CPU bootstrap, and
clear it once we know the FPU is usable? This is only a relatively
short early boot period, with no scheduling, right?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-17 7:09 ` Ingo Molnar
@ 2025-05-17 18:39 ` Eric Biggers
2025-05-18 6:34 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Eric Biggers @ 2025-05-17 18:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
On Sat, May 17, 2025 at 09:09:01AM +0200, Ingo Molnar wrote:
>
> * Eric Biggers <ebiggers@kernel.org> wrote:
>
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Make irq_fpu_usable() return false when irqs_disabled(). That makes the
> > irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
> > unnecessary, so also remove those.
> >
> > Rationale:
> >
> > - There's no known use case for kernel-mode FPU when irqs_disabled().
>
> Except EFI?
Yes, I remembered that just after sending this... And EFI does want the ldmxcsr
and fninit, which makes it like actual kernel-mode FPU. That implies we at
least need to disable BH (and preemption) if it wasn't already disabled. But if
hardirqs may or may not be disabled already, that means we either need to
conditionally use local_bh_disable()/enable (or preempt_enable()/disable on
PREEMPT_RT) as the current code does, or use local_irq_save()/restore.
If we did the latter, then all EFI calls would run with hardirqs disabled. It
looks like hardirqs are currently intentionally disabled before some of the EFI
calls, but not all of them. I'm not sure what the logic is there, and whether
it would be okay to just always disable them.
>
> > arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
> > __save_processor_state() previously did expect kernel_fpu_begin() and
> > kernel_fpu_end() to work when irqs_disabled(), but this was a
> > different use case and not actual kernel-mode FPU use.
> >
> > - This is more efficient, since one call to irqs_disabled() replaces two
> > irqs_disabled() and one in_hardirq().
>
> This is noise compared to the overhead of saving/restoring vector CPU
> context ...
In practice most calls to kernel_fpu_begin() don't actually do the
save_fpregs_to_fpstate(), since either TIF_NEED_FPU_LOAD is already set or it's
a kthread. So, the overhead from the other parts like the EFLAGS checks and
ldmxcsr are measurable, especially when processing small amounts of data.
> > - This fixes irq_fpu_usable() to correctly return false during CPU
> > initialization. Incorrectly returning true caused the SHA-256 library
> > code, which is called when loading AMD microcode, to take a
> > SIMD-optimized code path too early, causing a crash. By correctly
> > returning false from irq_fpu_usable(), the generic SHA-256 code
> > correctly gets used instead. (Note: SIMD-optimized SHA-256 doesn't
> > get enabled until subsys_initcall, but CPU hotplug can happen later.)
>
> Alternatively we could set in_kernel_fpu during CPU bootstrap, and
> clear it once we know the FPU is usable? This is only a relatively
> short early boot period, with no scheduling, right?
Yes, if there isn't agreement on this approach we can do that instead. Say:
- Replace in_kernel_fpu with kernel_fpu_supported, with the opposite meaning
(so that the initial value of false means "unsupported")
- fpu__init_cpu() sets it to true
- cpu_disable_common() sets it to false
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-17 18:39 ` Eric Biggers
@ 2025-05-18 6:34 ` Ingo Molnar
2025-05-18 13:18 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-05-18 6:34 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
* Eric Biggers <ebiggers@kernel.org> wrote:
> > Alternatively we could set in_kernel_fpu during CPU bootstrap, and
> > clear it once we know the FPU is usable? This is only a relatively
> > short early boot period, with no scheduling, right?
>
> Yes, if there isn't agreement on this approach we can do that
> instead. Say:
>
> - Replace in_kernel_fpu with kernel_fpu_supported, with the opposite
> meaning (so that the initial value of false means "unsupported")
I'm not against simplifying the x86 FPU model to exclude IRQs-off
context (especially if it also micro-optimizes some of the key runtime
kernel-FPU primitives), but it has to be a full solution and we'll have
to see how complicated the EFI changes get.
Ie. without seeing the full cost-benefit balance it's hard to call this
in advance. Mind sending a full series that addresses the EFI case too?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-18 6:34 ` Ingo Molnar
@ 2025-05-18 13:18 ` Ard Biesheuvel
2025-05-18 20:01 ` Eric Biggers
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-05-18 13:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Biggers, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
On Sun, 18 May 2025 at 08:34, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Eric Biggers <ebiggers@kernel.org> wrote:
>
> > > Alternatively we could set in_kernel_fpu during CPU bootstrap, and
> > > clear it once we know the FPU is usable? This is only a relatively
> > > short early boot period, with no scheduling, right?
> >
> > Yes, if there isn't agreement on this approach we can do that
> > instead. Say:
> >
> > - Replace in_kernel_fpu with kernel_fpu_supported, with the opposite
> > meaning (so that the initial value of false means "unsupported")
>
> I'm not against simplifying the x86 FPU model to exclude IRQs-off
> context (especially if it also micro-optimizes some of the key runtime
> kernel-FPU primitives), but it has to be a full solution and we'll have
> to see how complicated the EFI changes get.
>
> Ie. without seeing the full cost-benefit balance it's hard to call this
> in advance. Mind sending a full series that addresses the EFI case too?
>
EFI services are only called with IRQs disabled in exceptional cases,
so it would be unfortunate if it prevents us from make meaningful
improvements here. In ordinary cases, they are called from a
workqueue, and I'd prefer it if we can address this without calling
all EFI services with interrupts disabled either.
AIUI, the reason we cannot tolerate IRQs being disabled is because
re-enabling softirqs will complain if IRQs are disabled, due to the
fact that handling softirqs should not be attempted at that point?
I don't know the history here, but I wonder if that isn't overly
pedantic? Disabling softirqs could be avoided entirely when IRQs are
off, given that they are disabled implicitly already. But why then is
it not permitted to disable and re-enable softirqs under this
condition, given that it makes no difference? Or perhaps I'm missing
something here.
A good way to trigger such an exceptional case is running a kernel
with efi-pstore and lkdtm built-in under QEMU with OVMF, and do
# echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
Another case that likely executes with IRQs disabled (but I haven't
double checked) is reset_system(), which may return with an error, or
reboot/poweroff the machine and never return.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-18 13:18 ` Ard Biesheuvel
@ 2025-05-18 20:01 ` Eric Biggers
2025-05-19 8:05 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Eric Biggers @ 2025-05-18 20:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ingo Molnar, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
On Sun, May 18, 2025 at 03:18:58PM +0200, Ard Biesheuvel wrote:
> On Sun, 18 May 2025 at 08:34, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > > Alternatively we could set in_kernel_fpu during CPU bootstrap, and
> > > > clear it once we know the FPU is usable? This is only a relatively
> > > > short early boot period, with no scheduling, right?
> > >
> > > Yes, if there isn't agreement on this approach we can do that
> > > instead. Say:
> > >
> > > - Replace in_kernel_fpu with kernel_fpu_supported, with the opposite
> > > meaning (so that the initial value of false means "unsupported")
> >
> > I'm not against simplifying the x86 FPU model to exclude IRQs-off
> > context (especially if it also micro-optimizes some of the key runtime
> > kernel-FPU primitives), but it has to be a full solution and we'll have
> > to see how complicated the EFI changes get.
> >
> > Ie. without seeing the full cost-benefit balance it's hard to call this
> > in advance. Mind sending a full series that addresses the EFI case too?
> >
>
> EFI services are only called with IRQs disabled in exceptional cases,
> so it would be unfortunate if it prevents us from make meaningful
> improvements here. In ordinary cases, they are called from a
> workqueue, and I'd prefer it if we can address this without calling
> all EFI services with interrupts disabled either.
>
> AIUI, the reason we cannot tolerate IRQs being disabled is because
> re-enabling softirqs will complain if IRQs are disabled, due to the
> fact that handling softirqs should not be attempted at that point?
>
> I don't know the history here, but I wonder if that isn't overly
> pedantic? Disabling softirqs could be avoided entirely when IRQs are
> off, given that they are disabled implicitly already. But why then is
> it not permitted to disable and re-enable softirqs under this
> condition, given that it makes no difference? Or perhaps I'm missing
> something here.
>
> A good way to trigger such an exceptional case is running a kernel
> with efi-pstore and lkdtm built-in under QEMU with OVMF, and do
>
> # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
>
> Another case that likely executes with IRQs disabled (but I haven't
> double checked) is reset_system(), which may return with an error, or
> reboot/poweroff the machine and never return.
That makes sense to me. preempt_disable() and preempt_enable() are already
allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
local_bh_enable() are different. local_bh_enable() already uses
local_irq_save(flags) instead of local_irq_disable(), so it seems it's sort of
intended to work when IRQs are disabled, despite the
lockdep_assert_irqs_enabled().
Anyway, that would point to continuing to support kernel-mode FPU when IRQs are
disabled. But also EFI needs it anyway, unless we refactor it to use
kernel_fpu_begin() and kernel_fpu_end() only when irq_fpu_usable() and otherwise
use different code, analogous what arm64 does.
So for now I've sent
https://lore.kernel.org/lkml/20250518193212.1822-1-ebiggers@kernel.org which
implements the other possible fix, where we just start keeping track of whether
the FPU has been initialized or not.
- Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-18 20:01 ` Eric Biggers
@ 2025-05-19 8:05 ` Ingo Molnar
2025-05-19 9:49 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-05-19 8:05 UTC (permalink / raw)
To: Eric Biggers
Cc: Ard Biesheuvel, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
* Eric Biggers <ebiggers@kernel.org> wrote:
> > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > Another case that likely executes with IRQs disabled (but I haven't
> > double checked) is reset_system(), which may return with an error, or
> > reboot/poweroff the machine and never return.
>
> That makes sense to me. preempt_disable() and preempt_enable() are already
> allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> local_bh_enable() are different.
Because local_bh_enable() may run softirq handlers immediately if
there's pending softirqs, which shouldn't be done in hardirq context.
This is a key optimization of the Linux networking code, which uses
BH-off/BH-on sections instead of IRQS-off/IRQS-on critical sections,
for performance reasons.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-19 8:05 ` Ingo Molnar
@ 2025-05-19 9:49 ` Ard Biesheuvel
2025-05-19 12:57 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-05-19 9:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Biggers, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Eric Biggers <ebiggers@kernel.org> wrote:
>
> > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > >
> > > Another case that likely executes with IRQs disabled (but I haven't
> > > double checked) is reset_system(), which may return with an error, or
> > > reboot/poweroff the machine and never return.
> >
> > That makes sense to me. preempt_disable() and preempt_enable() are already
> > allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> > local_bh_enable() are different.
>
> Because local_bh_enable() may run softirq handlers immediately if
> there's pending softirqs, which shouldn't be done in hardirq context.
>
Sure, but why is that mandatory?
preempt_disable() has preempt_enable() and preempt_enable_no_resched()
counterparts. Could we have a local_bh_enable_no_xxx() version that
re-enables async softirq processing on the current CPU but does not
kick off a synchronous processing run?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-19 9:49 ` Ard Biesheuvel
@ 2025-05-19 12:57 ` Ingo Molnar
2025-05-19 13:50 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-05-19 12:57 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Eric Biggers, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
* Ard Biesheuvel <ardb@kernel.org> wrote:
> On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > > >
> > > > Another case that likely executes with IRQs disabled (but I haven't
> > > > double checked) is reset_system(), which may return with an error, or
> > > > reboot/poweroff the machine and never return.
> > >
> > > That makes sense to me. preempt_disable() and preempt_enable() are already
> > > allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> > > local_bh_enable() are different.
> >
> > Because local_bh_enable() may run softirq handlers immediately if
> > there's pending softirqs, which shouldn't be done in hardirq context.
> >
>
> Sure, but why is that mandatory?
>
>
> preempt_disable() has preempt_enable() and preempt_enable_no_resched()
> counterparts.
> [...] Could we have a local_bh_enable_no_xxx() version that
> re-enables async softirq processing on the current CPU but does not
> kick off a synchronous processing run?
Yes, that's what __local_bh_enable() does, but if used it for
kernel_fpu_end() we'd be introducing random softirq processing
latencies. The softirq execution model is for softirqs to be
immediately executed after local_bh_enable(), and various networking
code is tuned to that behavior.
You can try talking the networking folks into an asynchronous
local_bh_enable() executed on the next IRQ or the next scheduler tick
or so, but it's a non-trivial behavioral change. It would probably also
need user-return callback activation.
I'm pretty sure that the naive implementation would increase LAN ping
latencies by +4 msecs on a typical distro kernel.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-19 12:57 ` Ingo Molnar
@ 2025-05-19 13:50 ` Ard Biesheuvel
2025-05-20 7:42 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-05-19 13:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Biggers, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
On Mon, 19 May 2025 at 14:57, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > > > >
> > > > > Another case that likely executes with IRQs disabled (but I haven't
> > > > > double checked) is reset_system(), which may return with an error, or
> > > > > reboot/poweroff the machine and never return.
> > > >
> > > > That makes sense to me. preempt_disable() and preempt_enable() are already
> > > > allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> > > > local_bh_enable() are different.
> > >
> > > Because local_bh_enable() may run softirq handlers immediately if
> > > there's pending softirqs, which shouldn't be done in hardirq context.
> > >
> >
> > Sure, but why is that mandatory?
> >
> >
> > preempt_disable() has preempt_enable() and preempt_enable_no_resched()
> > counterparts.
>
> > [...] Could we have a local_bh_enable_no_xxx() version that
> > re-enables async softirq processing on the current CPU but does not
> > kick off a synchronous processing run?
>
> Yes, that's what __local_bh_enable() does, but if used it for
> kernel_fpu_end() we'd be introducing random softirq processing
> latencies. The softirq execution model is for softirqs to be
> immediately executed after local_bh_enable(), and various networking
> code is tuned to that behavior.
>
All of that only applies when re-enabling softirqs with IRQs enabled.
> You can try talking the networking folks into an asynchronous
> local_bh_enable() executed on the next IRQ or the next scheduler tick
> or so, but it's a non-trivial behavioral change. It would probably also
> need user-return callback activation.
>
> I'm pretty sure that the naive implementation would increase LAN ping
> latencies by +4 msecs on a typical distro kernel.
>
I don't see why we'd need all of that.
Conceptually, kernel_fpu_end() would do
if (irqs_disabled())
local_bh_enable_no_xxx();
else
local_bh_enable();
which cannot affect any existing use cases, given that the former case
is forbidden atm.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
2025-05-19 13:50 ` Ard Biesheuvel
@ 2025-05-20 7:42 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2025-05-20 7:42 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Eric Biggers, x86, linux-kernel, linux-crypto, linux-pm,
Borislav Petkov, Thomas Gleixner, Ayush Jain, Herbert Xu
* Ard Biesheuvel <ardb@kernel.org> wrote:
> On Mon, 19 May 2025 at 14:57, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > > > > >
> > > > > > Another case that likely executes with IRQs disabled (but I
> > > > > > haven't double checked) is reset_system(), which may return
> > > > > > with an error, or reboot/poweroff the machine and never
> > > > > > return.
> > > > >
> > > > > That makes sense to me. preempt_disable() and
> > > > > preempt_enable() are already allowed when IRQs are disabled,
> > > > > and I'm not sure why local_bh_disable() and local_bh_enable()
> > > > > are different.
> > > >
> > > > Because local_bh_enable() may run softirq handlers immediately
> > > > if there's pending softirqs, which shouldn't be done in hardirq
> > > > context.
> > >
> > > Sure, but why is that mandatory?
> > >
> > >
> > > preempt_disable() has preempt_enable() and
> > > preempt_enable_no_resched() counterparts.
> >
> > > [...] Could we have a local_bh_enable_no_xxx() version that
> > > re-enables async softirq processing on the current CPU but does not
> > > kick off a synchronous processing run?
> >
> > Yes, that's what __local_bh_enable() does, but if used it for
> > kernel_fpu_end() we'd be introducing random softirq processing
> > latencies. The softirq execution model is for softirqs to be
> > immediately executed after local_bh_enable(), and various networking
> > code is tuned to that behavior.
> >
>
> All of that only applies when re-enabling softirqs with IRQs enabled.
Yes, of course. BHs in the networking code are typically
disabled/enabled when IRQs are on. It's the whole *point* of the
facility: it was written as a 'lightweight' IRQs-on/off facility
originally, back in the days when local_irq_save() was very expensive,
especially on non-x86 platforms.
> I don't see why we'd need all of that.
>
> Conceptually, kernel_fpu_end() would do
>
> if (irqs_disabled())
> local_bh_enable_no_xxx();
> else
> local_bh_enable();
In normal kernel code local_bh_enable() obviously cannot be done with
IRQs off, and local_bh_disable()/__local_bh_enable() is highly frowned
upon because it's generally pointless: if irqs are off to begin with,
why disable any BHs?
What you probably mean is to only disable BHs if IRQs are not off
(because hardirqs-off state already disables BHs):
kernel_fpu_begin()
if (!irqs_disabled)
local_bh_disable();
kernel_fpu_end()
if (!irqs_disabled())
local_bh_enable();
... which is basically what the current code does:
if (!irqs_disabled())
fpregs_lock();
...
if (!irqs_disabled())
fpregs_unlock();
BTW., maybe we should add a lockdep check to make sure we never enable
hardirqs while kernel FPU handling is ongoing. It should be relatively
straightforward and cheap.
But that brings is far away from the original question:
> > > > > preempt_disable() and preempt_enable() are already allowed
> > > > > when IRQs are disabled, and I'm not sure why
> > > > > local_bh_disable() and local_bh_enable() are different.
> > > >
> > > > Because local_bh_enable() may run softirq handlers immediately
> > > > if there's pending softirqs, which shouldn't be done in hardirq
> > > > context.
To rephrase my answer: local_bh_disable()/enable() are part of the
softirq execution mechanism whose primary historical purpose was to be
a lighter-weight replacement hardirq disable/enable critical sections,
combined with a relaxation of how long a softirq could run versus a
hardirq. It makes little sense to try to nest BH handling primitives to
within hardirq critical sections.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled
2025-05-16 23:18 [PATCH 0/3] x86: Don't support kernel-mode FPU with hardirqs disabled Eric Biggers
` (2 preceding siblings ...)
2025-05-16 23:18 ` [PATCH 3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled() Eric Biggers
@ 2025-05-17 1:30 ` Eric Biggers
3 siblings, 0 replies; 15+ messages in thread
From: Eric Biggers @ 2025-05-17 1:30 UTC (permalink / raw)
To: x86
Cc: linux-kernel, linux-crypto, linux-pm, Borislav Petkov,
Thomas Gleixner, Ayush Jain, Herbert Xu, Ard Biesheuvel
On Fri, May 16, 2025 at 04:18:55PM -0700, Eric Biggers wrote:
> This series returns to my earlier suggestion to make x86 not support
> kernel-mode FPU when hardirqs are disabled, aligning it with arm64
> (https://lore.kernel.org/r/20250220051325.340691-2-ebiggers@kernel.org).
> To make this possible despite the use of the kernel-mode FPU functions
> by __save_processor_state() (which I mentioned at
> https://lore.kernel.org/r/20250228035924.GC5588@sol.localdomain), I've
> changed __save_processor_state() to use a new function instead of
> (mis)using the kernel-mode FPU functions.
>
> This slightly reduces the overhead of kernel-mode FPU (since the result
> is fewer checks), and it fixes the issue reported at
> https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local/
> where irq_fpu_usable() incorrectly returned false during CPU
> initialization, causing a crash in the SHA-256 library code.
>
> Eric Biggers (3):
> x86/fpu: Add fpu_save_state() for __save_processor_state()
> x86/pm: Use fpu_save_state() in __save_processor_state()
> x86/fpu: Don't support kernel-mode FPU when irqs_disabled()
>
> arch/x86/include/asm/fpu/api.h | 1 +
> arch/x86/kernel/fpu/core.c | 92 ++++++++++++++++++++--------------
> arch/x86/power/cpu.c | 18 +++----
> 3 files changed, 62 insertions(+), 49 deletions(-)
I realized I forgot about EFI again. Ard had mentioned that earlier.
I think we'll need a !irq_disabled()-safe solution for efi_fpu_begin(). It
could be a different function from the regular kernel_fpu_begin(), though.
- Eric
^ permalink raw reply [flat|nested] 15+ messages in thread