* [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu
@ 2024-06-08 7:31 Ingo Molnar
2024-06-08 7:31 ` [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
` (8 more replies)
0 siblings, 9 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
This series is one of the dependencies of the fast-headers work,
which aims to reduce header complexity by removing <asm/processor.h>
from the <linux/sched.h> dependency chain, which headers are headers
are fat enough already even if we do not combine them.
To achieve that decoupling, one of the key steps is to not embedd any
C types from <asm/processor.h> into task_struct.
The only architecture that relies on that in a serious fashion is x86,
via the 'struct thread::fpu' variable size structure. The series below
attempts to resolve it by using a calculated fpu context area address
value via the x86_task_fpu() helper. The allocation layout of
task_struct + fpu-save-area doesn't change.
Changes in -v3:
- Restructure the series to be easier to review
- Extend the debug checks to all PF_KTHREAD tasks
- A few cleanups on top
The Git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu
Thanks,
Ingo
======================>
Ingo Molnar (9):
x86/fpu: Introduce the x86_task_fpu() helper method
x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()
x86/fpu: Make task_struct::thread constant size
x86/fpu: Remove the thread::fpu pointer
x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit
x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
x86/fpu: Use 'fpstate' variable names consistently
x86/fpu: Fix stale comment in ex_handler_fprestore()
arch/x86/include/asm/fpu/api.h | 2 +-
arch/x86/include/asm/fpu/sched.h | 4 +-
arch/x86/include/asm/processor.h | 23 ++++++------
arch/x86/kernel/fpu/context.h | 4 +-
arch/x86/kernel/fpu/core.c | 80 +++++++++++++++++++++++-----------------
arch/x86/kernel/fpu/init.c | 23 +++++++-----
arch/x86/kernel/fpu/regset.c | 22 +++++------
arch/x86/kernel/fpu/signal.c | 18 ++++-----
arch/x86/kernel/fpu/xstate.c | 27 ++++++--------
arch/x86/kernel/fpu/xstate.h | 6 +--
arch/x86/kernel/process.c | 7 +---
arch/x86/kernel/signal.c | 6 +--
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 +-
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
include/linux/sched.h | 13 ++-----
18 files changed, 126 insertions(+), 121 deletions(-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 2/9] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu() Ingo Molnar
` (7 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
The per-task FPU context/save area is allocated right
next to task_struct, currently in a variable-size
array via task_struct::thread.fpu[], but we plan to
fully hide it from the C type scope.
Introduce the x86_task_fpu() accessor that gets to the
FPU context pointer explicitly from the task pointer.
Right now this is a simple (task)->thread.fpu wrapper.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/processor.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cb4f6c513c48..35aa8f652964 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -511,6 +511,8 @@ struct thread_struct {
*/
};
+#define x86_task_fpu(task) (&(task)->thread.fpu)
+
extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
static inline void arch_thread_struct_whitelist(unsigned long *offset,
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/9] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
2024-06-08 7:31 ` [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 3/9] x86/fpu: Make task_struct::thread constant size Ingo Molnar
` (6 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
This will make the removal of the task_struct::thread.fpu array
easier.
No change in functionality - code generated before and after this
commit is identical on x86-defconfig:
kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm
kepler:~/tip>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/kernel/fpu/context.h | 4 ++--
arch/x86/kernel/fpu/core.c | 30 +++++++++++++++---------------
arch/x86/kernel/fpu/init.c | 8 ++++----
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 +++---
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
15 files changed, 68 insertions(+), 68 deletions(-)
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c485f1944c5f..1feaa68b7567 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
- struct fpu *old_fpu = &old->thread.fpu;
+ struct fpu *old_fpu = x86_task_fpu(old);
save_fpregs_to_fpstate(old_fpu);
/*
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index f6d856bd50bc..10d0a720659c 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int cpu = smp_processor_id();
if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
@@ -67,7 +67,7 @@ static inline void fpregs_restore_userregs(void)
* If PKRU is enabled, then the PKRU value is already
* correct because it was either set in switch_to() or in
* flush_thread(). So it is excluded because it might be
- * not up to date in current->thread.fpu.xsave state.
+ * not up to date in current->thread.fpu->xsave state.
*
* XFD state is handled in restore_fpregs_from_fpstate().
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..ca6745f8ac2a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
return;
spin_lock_irq(¤t->sighand->siglock);
- fpuperm = ¤t->group_leader->thread.fpu.guest_perm;
+ fpuperm = &x86_task_fpu(current->group_leader)->guest_perm;
perm = fpuperm->__state_perm;
/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = current->thread.fpu.fpstate;
+ struct fpstate *fps = x86_task_fpu(current)->fpstate;
lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
{
struct fpstate *guest_fps = guest_fpu->fpstate;
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *cur_fps = fpu->fpstate;
fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(¤t->thread.fpu);
+ save_fpregs_to_fpstate(x86_task_fpu(current));
}
__cpu_invalidate_fpregs_state();
@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
*/
void fpu_sync_fpstate(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != ¤t->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));
fpregs_lock();
trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
static inline void fpu_inherit_perms(struct fpu *dst_fpu)
{
if (fpu_state_size_dynamic()) {
- struct fpu *src_fpu = ¤t->group_leader->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current->group_leader);
spin_lock_irq(¤t->sighand->siglock);
/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
if (!ssp)
return 0;
- xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+ xstate = get_xsave_addr(&x86_task_fpu(dst)->fpstate->regs.xsave,
XFEATURE_CET_USER);
/*
@@ -584,8 +584,8 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long ssp)
{
- struct fpu *src_fpu = ¤t->thread.fpu;
- struct fpu *dst_fpu = &dst->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current);
+ struct fpu *dst_fpu = x86_task_fpu(dst);
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -677,7 +677,7 @@ void fpu__drop(struct fpu *fpu)
{
preempt_disable();
- if (fpu == ¤t->thread.fpu) {
+ if (fpu == x86_task_fpu(current)) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
@@ -711,7 +711,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
*/
static void fpu_reset_fpregs(void)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
fpregs_lock();
__fpu_invalidate_fpregs_state(fpu);
@@ -740,7 +740,7 @@ static void fpu_reset_fpregs(void)
*/
void fpu__clear_user_states(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != ¤t->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));
fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -773,7 +773,7 @@ void fpu__clear_user_states(struct fpu *fpu)
void fpu_flush_thread(void)
{
- fpstate_reset(¤t->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
fpu_reset_fpregs();
}
/*
@@ -814,7 +814,7 @@ void fpregs_lock_and_load(void)
*/
void fpregs_assert_state_consistent(void)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
if (test_thread_flag(TIF_NEED_FPU_LOAD))
return;
@@ -826,7 +826,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
void fpregs_mark_activate(void)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
fpregs_activate(fpu);
fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 998a08f17e33..ad5cb2943d37 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(¤t->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
else
#endif
asm volatile ("fninit");
@@ -154,7 +154,7 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(current->thread.fpu.__fpstate.regs);
+ task_size -= sizeof(union fpregs_state);
/*
* Add back the dynamically-calculated register state
@@ -204,7 +204,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(¤t->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}
/*
@@ -213,7 +213,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
void __init fpu__init_system(void)
{
- fpstate_reset(¤t->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
fpu__init_system_early_generic();
/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2a21bd..19fd159217f7 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static void sync_fpstate(struct fpu *fpu)
{
- if (fpu == ¤t->thread.fpu)
+ if (fpu == x86_task_fpu(current))
fpu_sync_fpstate(fpu);
}
@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
* Only stopped child tasks can be used to modify the FPU
* state in the fpstate buffer:
*/
- WARN_ON_FPU(fpu == ¤t->thread.fpu);
+ WARN_ON_FPU(fpu == x86_task_fpu(current));
__fpu_invalidate_fpregs_state(fpu);
}
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct fxregs_state newstate;
int ret;
@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;
- sync_fpstate(&target->thread.fpu);
+ sync_fpstate(x86_task_fpu(target));
copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *tmpbuf = NULL;
int ret;
@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct cet_user_state *cetregs;
if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
struct cet_user_state *cetregs;
unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
- __convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
+ __convert_from_fxsr(env, tsk, &x86_task_fpu(tsk)->fpstate->regs.fxsave);
}
void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;
@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
int ret;
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..b203bf4617fc 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
- fx_sw->xstate_size > current->thread.fpu.fpstate->user_size ||
+ fx_sw->xstate_size > x86_task_fpu(current)->fpstate->user_size ||
fx_sw->xstate_size > fx_sw->extended_size)
goto setfx;
@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
if (likely(magic2 == FP_XSTATE_MAGIC2))
return true;
setfx:
- trace_x86_fpu_xstate_check_failed(¤t->thread.fpu);
+ trace_x86_fpu_xstate_check_failed(x86_task_fpu(current));
/* Set the parameters for fx only state */
fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
{
if (use_fxsr()) {
- struct xregs_state *xsave = &tsk->thread.fpu.fpstate->regs.xsave;
+ struct xregs_state *xsave = &x86_task_fpu(tsk)->fpstate->regs.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_32 __user *fp = buf;
fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD))
- fxsave(&tsk->thread.fpu.fpstate->regs.fxsave);
+ fxsave(&x86_task_fpu(tsk)->fpstate->regs.fxsave);
fpregs_unlock();
convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
struct task_struct *tsk = current;
- struct fpstate *fpstate = tsk->thread.fpu.fpstate;
+ struct fpstate *fpstate = x86_task_fpu(tsk)->fpstate;
bool ia32_fxstate = (buf != buf_fx);
int ret;
@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
*/
static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int ret;
/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
bool ia32_fxstate)
{
struct task_struct *tsk = current;
- struct fpu *fpu = &tsk->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
*/
bool fpu__restore_sig(void __user *buf, int ia32_frame)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
void __user *buf_fx = buf;
bool ia32_fxstate = false;
bool success = false;
@@ -503,7 +503,7 @@ unsigned long
fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long *buf_fx, unsigned long *size)
{
- unsigned long frame_size = xstate_sigframe_size(current->thread.fpu.fpstate);
+ unsigned long frame_size = xstate_sigframe_size(x86_task_fpu(current)->fpstate);
*buf_fx = sp = round_down(sp - frame_size, 64);
if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..90b11671e943 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
*/
init_fpstate.xfd = 0;
- fpstate_reset(¤t->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}
/*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
/* Reset the state for the current task */
- fpstate_reset(¤t->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
/*
* Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
}
if (fpu_state_size_dynamic())
- wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
+ wrmsrl(MSR_IA32_XFD, x86_task_fpu(current)->fpstate->xfd);
}
/*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, tsk->thread.fpu.fpstate,
- tsk->thread.fpu.fpstate->user_xfeatures,
+ __copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
+ x86_task_fpu(tsk)->fpstate->user_xfeatures,
tsk->thread.pkru, copy_mode);
}
@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
+ return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru);
}
static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
* The XFD MSR does not match fpstate->xfd. That's invalid when
* the passed in fpstate is current's fpstate.
*/
- if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+ if (fpstate->xfd == x86_task_fpu(current)->fpstate->xfd)
return false;
/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
unsigned int usize, struct fpu_guest *guest_fpu)
{
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *curfps, *newfps = NULL;
unsigned int fpsize;
bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
* AVX512.
*/
bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
- struct fpu *fpu = ¤t->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;
unsigned int ksize, usize;
u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
return -EPERM;
}
- fpu = ¤t->group_leader->thread.fpu;
+ fpu = x86_task_fpu(current->group_leader);
perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
ksize = perm->__state_size;
usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
*/
static void avx512_status(struct seq_file *m, struct task_struct *task)
{
- unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+ unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
long delta;
if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 05df04f39628..391fa3c69e5e 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
static inline u64 xstate_get_group_perm(bool guest)
{
- struct fpu *fpu = ¤t->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;
/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
- struct fpstate *fpstate = current->thread.fpu.fpstate;
+ struct fpstate *fpstate = x86_task_fpu(current)->fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask;
u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
u32 hmask = mask >> 32;
int err;
- xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+ xfd_validate_state(x86_task_fpu(current)->fpstate, mask, true);
stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..adfeefd6375a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
dst->thread.vm86 = NULL;
#endif
/* Drop the copied pointer to current's fpstate */
- dst->thread.fpu.fpstate = NULL;
+ x86_task_fpu(dst)->fpstate = NULL;
return 0;
}
@@ -106,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
void arch_release_task_struct(struct task_struct *tsk)
{
if (fpu_state_size_dynamic())
- fpstate_free(&tsk->thread.fpu);
+ fpstate_free(x86_task_fpu(tsk));
}
#endif
@@ -116,7 +116,7 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = &t->fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);
if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..d7906f5979a4 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
bool stepping, failed;
- struct fpu *fpu = ¤t->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
if (v8086_mode(regs))
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
return true;
- fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+ fsize += x86_task_fpu(current->group_leader)->perm.__user_state_size;
if (likely(ss_size > fsize))
return true;
if (strict_sigaltstack_size)
return ss_size > fsize;
- mask = current->group_leader->thread.fpu.perm.__state_perm;
+ mask = x86_task_fpu(current->group_leader)->perm.__state_perm;
if (mask & XFEATURE_MASK_USER_DYNAMIC)
return ss_size > fsize;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..b19fb494c57c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
- struct fpu *fpu = &task->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(task);
int si_code;
char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
"simd exception";
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index d62662bdd460..5f253ae406b6 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)
void finit(void)
{
- fpstate_init_soft(¤t->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
}
/*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 91c52ead1226..5034df617740 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
void *space = s387->st_space;
int ret;
int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
const void *space = s387->st_space;
int offset = (S387->ftop & 7) * 10, other = 80 - offset;
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index eec3e4805c75..5e238e930fe3 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
}
-#define I387 (¤t->thread.fpu.fpstate->regs)
+#define I387 (&x86_task_fpu(current)->fpstate->regs)
#define FPU_info (I387->soft.info)
#define FPU_CS (*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 51986e8a9d35..1359ad75da3a 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu.state)
+ * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
2024-06-08 7:31 ` [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
2024-06-08 7:31 ` [PATCH 2/9] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu() Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-10 21:13 ` Nathan Chancellor
2024-06-08 7:31 ` [PATCH 4/9] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
` (5 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.
This allows us to remove the old kludge of task_struct being variable size:
struct task_struct {
...
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
/* CPU-specific state of this task: */
struct thread_struct thread;
/*
* WARNING: on x86, 'thread_struct' contains a variable-sized
* structure. It *MUST* be at the end of 'task_struct'.
*
* Do not put anything below here!
*/
};
... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.
But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:
DECLARE_PER_TASK(type, name);
...
per_task(current, name) = val;
... which requires task_struct to be a constant size struct.
The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20240605083557.2051480-2-mingo@kernel.org
---
arch/x86/include/asm/processor.h | 20 +++++++++-----------
arch/x86/kernel/fpu/core.c | 23 ++++++++++++-----------
arch/x86/kernel/fpu/init.c | 19 ++++++++++++-------
arch/x86/kernel/process.c | 2 +-
include/linux/sched.h | 13 +++----------
5 files changed, 37 insertions(+), 40 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 35aa8f652964..64509c7f26c8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,21 +504,19 @@ struct thread_struct {
#endif
/* Floating point and extended processor state */
- struct fpu fpu;
- /*
- * WARNING: 'fpu' is dynamically-sized. It *MUST* be at
- * the end.
- */
+ struct fpu *fpu;
};
-#define x86_task_fpu(task) (&(task)->thread.fpu)
+#define x86_task_fpu(task) ((task)->thread.fpu)
-extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
-
-static inline void arch_thread_struct_whitelist(unsigned long *offset,
- unsigned long *size)
+/*
+ * X86 doesn't need any embedded-FPU-struct quirks:
+ */
+static inline void
+arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
{
- fpu_thread_struct_whitelist(offset, size);
+ *offset = 0;
+ *size = 0;
}
static inline void
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ca6745f8ac2a..f0c4367804b3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long ssp)
{
+ /*
+ * We allocate the new FPU structure right after the end of the task struct.
+ * task allocation size already took this into account.
+ *
+ * This is safe because task_struct size is a multiple of cacheline size.
+ */
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = x86_task_fpu(dst);
+ struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+
+ BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
+ BUG_ON(!src_fpu);
+
+ dst->thread.fpu = dst_fpu;
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
return 0;
}
-/*
- * Whitelist the FPU register state embedded into task_struct for hardened
- * usercopy.
- */
-void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
-{
- *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
- *size = fpu_kernel_cfg.default_size;
-}
-
/*
* Drops current FPU state: deactivates the fpregs and
* the fpstate. NOTE: it still leaves previous contents
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index ad5cb2943d37..4e8d37b5a90b 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}
+static struct fpu x86_init_fpu __read_mostly;
+
static void __init fpu__init_system_early_generic(void)
{
+ int this_cpu = smp_processor_id();
+
+ fpstate_reset(&x86_init_fpu);
+ current->thread.fpu = &x86_init_fpu;
+ per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
+ x86_init_fpu.last_cpu = this_cpu;
+
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
if (fpu__probe_without_cpuid())
@@ -150,6 +159,8 @@ static void __init fpu__init_task_struct_size(void)
{
int task_size = sizeof(struct task_struct);
+ task_size += sizeof(struct fpu);
+
/*
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
@@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)
/*
* We dynamically size 'struct fpu', so we require that
- * it be at the end of 'thread_struct' and that
- * 'thread_struct' be at the end of 'task_struct'. If
- * you hit a compile error here, check the structure to
- * see if something got added to the end.
+ * 'state' be at the end of 'it:
*/
CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
- CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
- CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
arch_task_struct_size = task_size;
}
@@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
void __init fpu__init_system(void)
{
- fpstate_reset(x86_task_fpu(current));
fpu__init_system_early_generic();
/*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index adfeefd6375a..5bb73bc0e31a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
dst->thread.vm86 = NULL;
#endif
/* Drop the copied pointer to current's fpstate */
- x86_task_fpu(dst)->fpstate = NULL;
+ dst->thread.fpu = NULL;
return 0;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..215a7380e41c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1554,21 +1554,14 @@ struct task_struct {
struct user_event_mm *user_event_mm;
#endif
+ /* CPU-specific state of this task: */
+ struct thread_struct thread;
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-
- /* CPU-specific state of this task: */
- struct thread_struct thread;
-
- /*
- * WARNING: on x86, 'thread_struct' contains a variable-sized
- * structure. It *MUST* be at the end of 'task_struct'.
- *
- * Do not put anything below here!
- */
};
#define TASK_REPORT_IDLE (TASK_REPORT + 1)
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 4/9] x86/fpu: Remove the thread::fpu pointer
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (2 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 3/9] x86/fpu: Make task_struct::thread constant size Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 5/9] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
` (4 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
As suggested by Oleg, remove the thread::fpu pointer, as we can
calculate it via x86_task_fpu() at compile-time.
This improves code generation a bit:
kepler:~/tip> size vmlinux.before vmlinux.after
text data bss dec hex filename
26475405 10435342 1740804 38651551 24dc69f vmlinux.before
26475339 10959630 1216516 38651485 24dc65d vmlinux.after
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20240605083557.2051480-3-mingo@kernel.org
---
arch/x86/include/asm/processor.h | 5 +----
arch/x86/kernel/fpu/core.c | 4 +---
arch/x86/kernel/fpu/init.c | 1 -
arch/x86/kernel/process.c | 2 --
arch/x86/kernel/vmlinux.lds.S | 4 ++++
5 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64509c7f26c8..3de609aad0af 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -502,12 +502,9 @@ struct thread_struct {
struct thread_shstk shstk;
#endif
-
- /* Floating point and extended processor state */
- struct fpu *fpu;
};
-#define x86_task_fpu(task) ((task)->thread.fpu)
+#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f0c4367804b3..167a9c7ed6d3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+ struct fpu *dst_fpu = x86_task_fpu(dst);
BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
BUG_ON(!src_fpu);
- dst->thread.fpu = dst_fpu;
-
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..794682b52373 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -78,7 +78,6 @@ static void __init fpu__init_system_early_generic(void)
int this_cpu = smp_processor_id();
fpstate_reset(&x86_init_fpu);
- current->thread.fpu = &x86_init_fpu;
per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
x86_init_fpu.last_cpu = this_cpu;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5bb73bc0e31a..4184c085627e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,8 +96,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
#ifdef CONFIG_VM86
dst->thread.vm86 = NULL;
#endif
- /* Drop the copied pointer to current's fpstate */
- dst->thread.fpu = NULL;
return 0;
}
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3509afc6a672..226244a894da 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,6 +170,10 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
+ __x86_init_fpu_begin = .;
+ . = __x86_init_fpu_begin + 128*PAGE_SIZE;
+ __x86_init_fpu_end = .;
+
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 5/9] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (3 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 4/9] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
` (3 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/kernel/fpu/core.c | 4 +++-
arch/x86/kernel/process.c | 3 +--
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 1feaa68b7567..5fd12634bcc4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -10,7 +10,7 @@
#include <asm/trace/fpu.h>
extern void save_fpregs_to_fpstate(struct fpu *fpu);
-extern void fpu__drop(struct fpu *fpu);
+extern void fpu__drop(struct task_struct *tsk);
extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long shstk_addr);
extern void fpu_flush_thread(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 167a9c7ed6d3..c85667c0695d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -672,8 +672,10 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* a state-restore is coming: either an explicit one,
* or a reschedule.
*/
-void fpu__drop(struct fpu *fpu)
+void fpu__drop(struct task_struct *tsk)
{
+ struct fpu *fpu = x86_task_fpu(tsk);
+
preempt_disable();
if (fpu == x86_task_fpu(current)) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4184c085627e..0a24997c8cc6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -114,7 +114,6 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = x86_task_fpu(tsk);
if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
@@ -122,7 +121,7 @@ void exit_thread(struct task_struct *tsk)
free_vm86(t);
shstk_free(tsk);
- fpu__drop(fpu);
+ fpu__drop(tsk);
}
static int set_new_tls(struct task_struct *p, unsigned long tls)
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (4 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 5/9] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-10 10:01 ` Oleg Nesterov
2024-06-08 7:31 ` [PATCH 7/9] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
` (2 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
fpu__drop() calls x86_task_fpu() unconditionally, while the FPU context
area will not be present if it's the init task, and should not be in
use when it's some other type of kthread.
Return early for PF_KTHREAD tasks. The debug warning in x86_task_fpu()
will catch any kthreads attempting to use the FPU save area.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/fpu/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c85667c0695d..52d5843c886c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -674,7 +674,13 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
*/
void fpu__drop(struct task_struct *tsk)
{
- struct fpu *fpu = x86_task_fpu(tsk);
+ struct fpu *fpu;
+
+ /* PF_KTHREAD tasks do not use the FPU context area: */
+ if (tsk->flags & PF_KTHREAD)
+ return;
+
+ fpu = x86_task_fpu(tsk);
preempt_disable();
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 7/9] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (5 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 8/9] x86/fpu: Use 'fpstate' variable names consistently Ingo Molnar
2024-06-08 7:31 ` [PATCH 9/9] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
init_task's FPU state initialization was a bit of a hack:
__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;
But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.
As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20240605083557.2051480-4-mingo@kernel.org
---
arch/x86/include/asm/processor.h | 6 +++++-
arch/x86/kernel/fpu/core.c | 15 +++++++++++----
arch/x86/kernel/fpu/init.c | 3 +--
arch/x86/kernel/fpu/xstate.c | 3 ---
arch/x86/kernel/vmlinux.lds.S | 4 ----
5 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3de609aad0af..4fd3364dbc73 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
#endif
};
-#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+#endif
/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 52d5843c886c..db608afa686f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,16 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+ if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+ return NULL;
+
+ return (void *)task + sizeof(*task);
+}
+#endif
+
/*
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
@@ -590,11 +600,9 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
*
* This is safe because task_struct size is a multiple of cacheline size.
*/
- struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = x86_task_fpu(dst);
+ struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
- BUG_ON(!src_fpu);
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -657,7 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
if (update_fpu_shstk(dst, ssp))
return 1;
- trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);
return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 794682b52373..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+ ;
else
#endif
asm volatile ("fninit");
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(x86_task_fpu(current));
}
/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
if (err)
goto out_disable;
- /* Reset the state for the current task */
- fpstate_reset(x86_task_fpu(current));
-
/*
* Update info used for ptrace frames; use standard-format size and no
* supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
- __x86_init_fpu_begin = .;
- . = __x86_init_fpu_begin + 128*PAGE_SIZE;
- __x86_init_fpu_end = .;
-
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 8/9] x86/fpu: Use 'fpstate' variable names consistently
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (6 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 7/9] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
2024-06-08 7:31 ` [PATCH 9/9] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
A few uses of 'fps' snuck in, which is rather confusing
(to me) as it suggests frames-per-second. ;-)
Rename them to the canonical 'fpstate' name.
No change in functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/fpu/api.h | 2 +-
arch/x86/kernel/fpu/core.c | 14 +++++++-------
arch/x86/kernel/fpu/xstate.c | 4 ++--
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index f86ad3335529..708ea13ab2b0 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -139,7 +139,7 @@ static inline void fpstate_free(struct fpu *fpu) { }
#endif
/* fpstate-related functions which are exported to KVM */
-extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
+extern void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature);
extern u64 xstate_get_guest_group_perm(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index db608afa686f..6aef0116add4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -266,16 +266,16 @@ EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate);
void fpu_free_guest_fpstate(struct fpu_guest *gfpu)
{
- struct fpstate *fps = gfpu->fpstate;
+ struct fpstate *fpstate = gfpu->fpstate;
- if (!fps)
+ if (!fpstate)
return;
- if (WARN_ON_ONCE(!fps->is_valloc || !fps->is_guest || fps->in_use))
+ if (WARN_ON_ONCE(!fpstate->is_valloc || !fpstate->is_guest || fpstate->in_use))
return;
gfpu->fpstate = NULL;
- vfree(fps);
+ vfree(fpstate);
}
EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate);
@@ -326,12 +326,12 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = x86_task_fpu(current)->fpstate;
+ struct fpstate *fpstate = x86_task_fpu(current)->fpstate;
lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
- rdmsrl(MSR_IA32_XFD, fps->xfd);
- __this_cpu_write(xfd_state, fps->xfd);
+ rdmsrl(MSR_IA32_XFD, fpstate->xfd);
+ __this_cpu_write(xfd_state, fpstate->xfd);
}
}
EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1f37da22ddbe..68dea76b4afc 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1392,9 +1392,9 @@ void xrstors(struct xregs_state *xstate, u64 mask)
}
#if IS_ENABLED(CONFIG_KVM)
-void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature)
+void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
{
- void *addr = get_xsave_addr(&fps->regs.xsave, xfeature);
+ void *addr = get_xsave_addr(&fpstate->regs.xsave, xfeature);
if (addr)
memset(addr, 0, xstate_sizes[xfeature]);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 9/9] x86/fpu: Fix stale comment in ex_handler_fprestore()
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
` (7 preceding siblings ...)
2024-06-08 7:31 ` [PATCH 8/9] x86/fpu: Use 'fpstate' variable names consistently Ingo Molnar
@ 2024-06-08 7:31 ` Ingo Molnar
8 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-08 7:31 UTC (permalink / raw)
To: linux-kernel
Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
Borislav Petkov, Brian Gerst, H . Peter Anvin, Linus Torvalds,
Oleg Nesterov, Thomas Gleixner, Uros Bizjak
The state -> fpstate rename of the fpu::fpstate field didn't
get propagated to the comment describing ex_handler_fprestore(),
fix it.
Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/mm/extable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 1359ad75da3a..bf8dab18be97 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
+ * here because the FPU state of a task using the FPU (struct fpu::fpstate)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit
2024-06-08 7:31 ` [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
@ 2024-06-10 10:01 ` Oleg Nesterov
0 siblings, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-10 10:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
Peter Zijlstra, Borislav Petkov, Brian Gerst, H . Peter Anvin,
Linus Torvalds, Thomas Gleixner, Uros Bizjak
The whole series looks good to me, and afaics 7/9 allows more
cleanups / improvements.
But let me ask a stupid question about fpu__drop(), I know nothing
about fpu asm.
fpu__drop() does
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
and this comment predates the git history. Could someone explain
why exactly the exiting user-space thread needs fwait ?
And if it is needed, suppose that a kernel thread exits right
after kernel_fpu_end(), can this lead to the delayed exception?
And otoh, perhaps fpu__drop() can set TIF_NEED_FPU_LOAD to avoid
switch_fpu_prepare()->save_fpregs_to_fpstate() on its path to the
final schedule?
On 06/08, Ingo Molnar wrote:
>
> void fpu__drop(struct task_struct *tsk)
> {
> - struct fpu *fpu = x86_task_fpu(tsk);
> + struct fpu *fpu;
> +
> + /* PF_KTHREAD tasks do not use the FPU context area: */
> + if (tsk->flags & PF_KTHREAD)
> + return;
I think it can already do
if (tsk->flags & (PF_KTHREAD | PF_USER_WORKER))
return;
This matches other similar checks. But I won't insist, and I
think all these checks need some cleanups anyway.
Oleg.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-08 7:31 ` [PATCH 3/9] x86/fpu: Make task_struct::thread constant size Ingo Molnar
@ 2024-06-10 21:13 ` Nathan Chancellor
2024-06-11 12:42 ` Oleg Nesterov
0 siblings, 1 reply; 21+ messages in thread
From: Nathan Chancellor @ 2024-06-10 21:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
Peter Zijlstra, Borislav Petkov, Brian Gerst, H . Peter Anvin,
Linus Torvalds, Oleg Nesterov, Thomas Gleixner, Uros Bizjak
Hi Ingo,
On Sat, Jun 08, 2024 at 09:31:28AM +0200, Ingo Molnar wrote:
> Turn thread.fpu into a pointer. Since most FPU code internals work by passing
> around the FPU pointer already, the code generation impact is small.
>
> This allows us to remove the old kludge of task_struct being variable size:
>
> struct task_struct {
>
> ...
> /*
> * New fields for task_struct should be added above here, so that
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
>
> /* CPU-specific state of this task: */
> struct thread_struct thread;
>
> /*
> * WARNING: on x86, 'thread_struct' contains a variable-sized
> * structure. It *MUST* be at the end of 'task_struct'.
> *
> * Do not put anything below here!
> */
> };
>
> ... which creates a number of problems, such as requiring thread_struct to be
> the last member of the struct - not allowing it to be struct-randomized, etc.
>
> But the primary motivation is to allow the decoupling of task_struct from
> hardware details (<asm/processor.h> in particular), and to eventually allow
> the per-task infrastructure:
>
> DECLARE_PER_TASK(type, name);
> ...
> per_task(current, name) = val;
>
> ... which requires task_struct to be a constant size struct.
>
> The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
> now that the FPU structure is not embedded in the task struct anymore, which
> reduces text footprint a bit.
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Link: https://lore.kernel.org/r/20240605083557.2051480-2-mingo@kernel.org
> ---
> arch/x86/include/asm/processor.h | 20 +++++++++-----------
> arch/x86/kernel/fpu/core.c | 23 ++++++++++++-----------
> arch/x86/kernel/fpu/init.c | 19 ++++++++++++-------
> arch/x86/kernel/process.c | 2 +-
> include/linux/sched.h | 13 +++----------
> 5 files changed, 37 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 35aa8f652964..64509c7f26c8 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -504,21 +504,19 @@ struct thread_struct {
> #endif
>
> /* Floating point and extended processor state */
> - struct fpu fpu;
> - /*
> - * WARNING: 'fpu' is dynamically-sized. It *MUST* be at
> - * the end.
> - */
> + struct fpu *fpu;
> };
>
> -#define x86_task_fpu(task) (&(task)->thread.fpu)
> +#define x86_task_fpu(task) ((task)->thread.fpu)
>
> -extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
> -
> -static inline void arch_thread_struct_whitelist(unsigned long *offset,
> - unsigned long *size)
> +/*
> + * X86 doesn't need any embedded-FPU-struct quirks:
> + */
> +static inline void
> +arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
> {
> - fpu_thread_struct_whitelist(offset, size);
> + *offset = 0;
> + *size = 0;
> }
>
> static inline void
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index ca6745f8ac2a..f0c4367804b3 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> unsigned long ssp)
> {
> + /*
> + * We allocate the new FPU structure right after the end of the task struct.
> + * task allocation size already took this into account.
> + *
> + * This is safe because task_struct size is a multiple of cacheline size.
> + */
> struct fpu *src_fpu = x86_task_fpu(current);
> - struct fpu *dst_fpu = x86_task_fpu(dst);
> + struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> +
> + BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
> + BUG_ON(!src_fpu);
> +
> + dst->thread.fpu = dst_fpu;
>
> /* The new task's FPU state cannot be valid in the hardware. */
> dst_fpu->last_cpu = -1;
> @@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> return 0;
> }
>
> -/*
> - * Whitelist the FPU register state embedded into task_struct for hardened
> - * usercopy.
> - */
> -void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
> -{
> - *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
> - *size = fpu_kernel_cfg.default_size;
> -}
> -
> /*
> * Drops current FPU state: deactivates the fpregs and
> * the fpstate. NOTE: it still leaves previous contents
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index ad5cb2943d37..4e8d37b5a90b 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
> return fsw == 0 && (fcw & 0x103f) == 0x003f;
> }
>
> +static struct fpu x86_init_fpu __read_mostly;
> +
> static void __init fpu__init_system_early_generic(void)
> {
> + int this_cpu = smp_processor_id();
> +
> + fpstate_reset(&x86_init_fpu);
> + current->thread.fpu = &x86_init_fpu;
> + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> + x86_init_fpu.last_cpu = this_cpu;
> +
> if (!boot_cpu_has(X86_FEATURE_CPUID) &&
> !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
> if (fpu__probe_without_cpuid())
> @@ -150,6 +159,8 @@ static void __init fpu__init_task_struct_size(void)
> {
> int task_size = sizeof(struct task_struct);
>
> + task_size += sizeof(struct fpu);
> +
> /*
> * Subtract off the static size of the register state.
> * It potentially has a bunch of padding.
> @@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)
>
> /*
> * We dynamically size 'struct fpu', so we require that
> - * it be at the end of 'thread_struct' and that
> - * 'thread_struct' be at the end of 'task_struct'. If
> - * you hit a compile error here, check the structure to
> - * see if something got added to the end.
> + * 'state' be at the end of 'it:
> */
> CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
> - CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
> - CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
>
> arch_task_struct_size = task_size;
> }
> @@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
> */
> void __init fpu__init_system(void)
> {
> - fpstate_reset(x86_task_fpu(current));
> fpu__init_system_early_generic();
>
> /*
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index adfeefd6375a..5bb73bc0e31a 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> dst->thread.vm86 = NULL;
> #endif
> /* Drop the copied pointer to current's fpstate */
> - x86_task_fpu(dst)->fpstate = NULL;
> + dst->thread.fpu = NULL;
>
> return 0;
> }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 61591ac6eab6..215a7380e41c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1554,21 +1554,14 @@ struct task_struct {
> struct user_event_mm *user_event_mm;
> #endif
>
> + /* CPU-specific state of this task: */
> + struct thread_struct thread;
> +
> /*
> * New fields for task_struct should be added above here, so that
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
> -
> - /* CPU-specific state of this task: */
> - struct thread_struct thread;
> -
> - /*
> - * WARNING: on x86, 'thread_struct' contains a variable-sized
> - * structure. It *MUST* be at the end of 'task_struct'.
> - *
> - * Do not put anything below here!
> - */
> };
>
> #define TASK_REPORT_IDLE (TASK_REPORT + 1)
> --
> 2.43.0
>
I am seeing a crash with this change in -next as
commit 018d456409d6 ("x86/fpu: Make task_struct::thread constant size")
and I still see it in WIP.x86/fpu from commit 4f4a9b399357 ("x86/fpu:
Make task_struct::thread constant size").
$ make -skj"$(nproc)" ARCH=i386 CROSS_COMPILE=i386-linux- defconfig bzImage
$ qemu-system-i386 \
-display none \
-nodefaults \
-M q35 \
-d unimp,guest_errors \
-append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
-kernel arch/x86/boot/bzImage \
-initrd rootfs.cpio \
-cpu host \
-enable-kvm \
-m 512m \
-smp 8 \
-serial mon:stdio
[ 0.000000] Linux version 6.10.0-rc2-00003-g4f4a9b399357 (nathan@thelio-3990X) (i386-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Mon Jun 10 13:58:20 MST 2024
...
[ 0.337514] ------------[ cut here ]------------
[ 0.338184] Bad FPU state detected at restore_fpregs_from_fpstate+0x38/0x6c, reinitializing FPU registers.
[ 0.338195] WARNING: CPU: 2 PID: 100 at arch/x86/mm/extable.c:127 fixup_exception+0x41e/0x45c
[ 0.340506] Modules linked in:
[ 0.340905] CPU: 2 PID: 100 Comm: modprobe Not tainted 6.10.0-rc2-00003-g4f4a9b399357 #1
[ 0.341939] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.343363] EIP: fixup_exception+0x41e/0x45c
[ 0.343916] Code: e8 cb c0 00 00 0f 0b eb cb 0f 0b ba 4c e9 7f da eb b6 b2 01 88 15 16 bf 6c da 89 44 24 04 c7 04 24 18 39 3b da e8 a6 c0 00 00 <0f> 0b e9 ee fd ff ff 8d b4 26 00 00 00 00 0f 0b ba 4c e9 7f da e9
[ 0.346288] EAX: 0000005e EBX: da4e53f0 ECX: 00000000 EDX: da5d606c
[ 0.347082] ESI: c1ba5ef0 EDI: 0000000d EBP: c1ba5e58 ESP: c1ba5dd8
[ 0.347882] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010086
[ 0.348744] CR0: 80050033 CR2: bfe4afcb CR3: 012f5000 CR4: 00350ed0
[ 0.349540] Call Trace:
[ 0.349855] ? show_regs+0x4d/0x54
[ 0.350300] ? fixup_exception+0x41e/0x45c
[ 0.350828] ? __warn+0x84/0x150
[ 0.351242] ? fixup_exception+0x41e/0x45c
[ 0.351765] ? fixup_exception+0x41e/0x45c
[ 0.352289] ? report_bug+0x186/0x1b0
[ 0.352756] ? exc_overflow+0x50/0x50
[ 0.353230] ? handle_bug+0x2d/0x50
[ 0.353675] ? exc_invalid_op+0x1b/0x70
[ 0.354171] ? console_unlock+0x53/0xc4
[ 0.354658] ? handle_exception+0x14b/0x14b
[ 0.355196] ? exc_overflow+0x50/0x50
[ 0.355663] ? fixup_exception+0x41e/0x45c
[ 0.356184] ? exc_overflow+0x50/0x50
[ 0.356651] ? fixup_exception+0x41e/0x45c
[ 0.357172] ? restore_fpregs_from_fpstate+0x38/0x6c
[ 0.357809] ? _get_random_bytes+0x65/0x190
[ 0.358341] ? mt_find+0xd1/0x458
[ 0.358766] ? exc_bounds+0xac/0xac
[ 0.359213] exc_general_protection+0x97/0x358
[ 0.359775] ? randomize_page+0x37/0x54
[ 0.360271] ? exc_bounds+0xac/0xac
[ 0.360722] handle_exception+0x14b/0x14b
[ 0.361232] EIP: restore_fpregs_from_fpstate+0x38/0x6c
[ 0.361885] Code: 7d fc 89 ca eb 09 cc cc cc db e2 0f 77 db 03 3e 8d 74 26 00 8b 3d ec 31 46 da 8b 0d e8 31 46 da 21 fa 8d 7b 40 21 c8 0f c7 1f <8b> 5d f8 8b 7d fc 89 ec 5d c3 66 90 3e 8d 74 26 00 0f ae 4b 40 8b
[ 0.364284] EAX: 00000007 EBX: c1a73860 ECX: 00000007 EDX: 00000000
[ 0.365077] ESI: c1a73820 EDI: c1a738a0 EBP: c1ba5f54 ESP: c1ba5f4c
[ 0.365870] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
[ 0.366728] ? exc_bounds+0xac/0xac
[ 0.367171] ? exc_bounds+0xac/0xac
[ 0.367621] ? restore_fpregs_from_fpstate+0x35/0x6c
[ 0.368255] switch_fpu_return+0x49/0xd0
[ 0.368756] syscall_exit_to_user_mode+0x181/0x1a8
[ 0.369364] ? call_usermodehelper_exec_async+0xbe/0x1ac
[ 0.370040] ? call_usermodehelper+0x8c/0x8c
[ 0.370586] ret_from_fork+0x23/0x44
[ 0.371048] ? call_usermodehelper+0x8c/0x8c
[ 0.371592] ret_from_fork_asm+0x12/0x18
[ 0.372092] entry_INT80_32+0x108/0x108
[ 0.372581] EIP: 0xb7f77087
[ 0.372941] Code: Unable to access opcode bytes at 0xb7f7705d.
[ 0.373679] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 0.374477] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe4aed0
[ 0.375271] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
[ 0.376134] ---[ end trace 0000000000000000 ]---
...
The rootfs is from [1] if you should need it (x86-rootfs.cpio.zst,
decompress with zstd first). I am more than happy to provide any
additional information or test patches as necessary.
[1]: https://github.com/ClangBuiltLinux/boot-utils/releases
Cheers,
Nathan
# bad: [d35b2284e966c0bef3e2182a5c5ea02177dd32e4] Add linux-next specific files for 20240607
# good: [8a92980606e3585d72d510a03b59906e96755b8a] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect start 'd35b2284e966c0bef3e2182a5c5ea02177dd32e4' '8a92980606e3585d72d510a03b59906e96755b8a'
# good: [faef37a085e57f29479f853624948cdc7df6e366] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect good faef37a085e57f29479f853624948cdc7df6e366
# good: [822946749b5f02d9f49dde4b4cb8f2535c247ce5] Merge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel
git bisect good 822946749b5f02d9f49dde4b4cb8f2535c247ce5
# bad: [9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad 9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32
# good: [9a85e49be89a9150fd2ec9964f48013a00c261d1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 9a85e49be89a9150fd2ec9964f48013a00c261d1
# bad: [0435675ef6ba0b3e14859bd4c6edb0d81093d28e] Merge branch 'edac-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
git bisect bad 0435675ef6ba0b3e14859bd4c6edb0d81093d28e
# bad: [8f4652aac08030f2d5110be87229ad0f15c33496] Merge branch into tip/master: 'timers/core'
git bisect bad 8f4652aac08030f2d5110be87229ad0f15c33496
# bad: [0724c2b228bef4ea71cf5be0ab64de1065e32299] Merge branch into tip/master: 'irq/core'
git bisect bad 0724c2b228bef4ea71cf5be0ab64de1065e32299
# good: [51d8bfbcae9aeedee48cddefe6776c96acb2d83e] Merge branch into tip/master: 'x86/urgent'
git bisect good 51d8bfbcae9aeedee48cddefe6776c96acb2d83e
# bad: [80b691d02d005beafcd120a3b92c951155db543a] x86/fpu: Use 'fpstate' variable names consistently
git bisect bad 80b691d02d005beafcd120a3b92c951155db543a
# bad: [eb6428fd2eb4f72c735ba6fddd62147ac8d544c2] x86/fpu: Remove the thread::fpu pointer
git bisect bad eb6428fd2eb4f72c735ba6fddd62147ac8d544c2
# bad: [7329f3c69f07a502eb2ab5a6b4d27cd6a067579b] x86/fpu: Introduce the x86_task_fpu() helper method
git bisect bad 7329f3c69f07a502eb2ab5a6b4d27cd6a067579b
# bad: [018d456409d6c9ef4046eb5db95ce357acfeba23] x86/fpu: Make task_struct::thread constant size
git bisect bad 018d456409d6c9ef4046eb5db95ce357acfeba23
# first bad commit: [018d456409d6c9ef4046eb5db95ce357acfeba23] x86/fpu: Make task_struct::thread constant size
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-10 21:13 ` Nathan Chancellor
@ 2024-06-11 12:42 ` Oleg Nesterov
2024-06-12 8:17 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-11 12:42 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak,
Nathan Chancellor
I don't think this can explain the problem reported by Nathan, but.
On 06/08, Ingo Molnar wrote:
>
> +static struct fpu x86_init_fpu __read_mostly;
> +
> static void __init fpu__init_system_early_generic(void)
> {
> + int this_cpu = smp_processor_id();
> +
> + fpstate_reset(&x86_init_fpu);
> + current->thread.fpu = &x86_init_fpu;
OK,
> + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> + x86_init_fpu.last_cpu = this_cpu;
Why? I think it should do
x86_init_fpu.last_cpu = -1;
set_thread_flag(TIF_NEED_FPU_LOAD);
And the next patch should kill x86_init_fpu altogether, but keep
TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.
Oleg.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-11 12:42 ` Oleg Nesterov
@ 2024-06-12 8:17 ` Ingo Molnar
2024-06-12 9:40 ` Oleg Nesterov
2024-06-12 18:41 ` Oleg Nesterov
0 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2024-06-12 8:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]
* Oleg Nesterov <oleg@redhat.com> wrote:
> I don't think this can explain the problem reported by Nathan, but.
>
> On 06/08, Ingo Molnar wrote:
> >
> > +static struct fpu x86_init_fpu __read_mostly;
> > +
> > static void __init fpu__init_system_early_generic(void)
> > {
> > + int this_cpu = smp_processor_id();
> > +
> > + fpstate_reset(&x86_init_fpu);
> > + current->thread.fpu = &x86_init_fpu;
>
> OK,
>
> > + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> > + x86_init_fpu.last_cpu = this_cpu;
>
> Why? I think it should do
>
> x86_init_fpu.last_cpu = -1;
> set_thread_flag(TIF_NEED_FPU_LOAD);
>
> And the next patch should kill x86_init_fpu altogether, but keep
> TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.
So I applied the patch further below on top of:
4f4a9b399357 x86/fpu: Make task_struct::thread constant size
And Nathan's 32-bit kernel testcase [but running with 1 CPU to simplify it]
still crashes in a similar fashion in the (first?) modprobe instance with a
bad FPU state exception:
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[...]
netconsole: network logging started
cfg80211: Loading compiled-in X.509 certificates for regulatory database
------------[ cut here ]------------
Bad FPU state detected at restore_fpregs_from_fpstate+0x38/0x6c, reinitializing FPU registers.
WARNING: CPU: 0 PID: 60 at arch/x86/mm/extable.c:127 fixup_exception+0x41e/0x45c
Modules linked in:
CPU: 0 PID: 60 Comm: modprobe Not tainted 6.10.0-rc2-00003-g4f4a9b399357-dirty #39
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
...
... and the kernel goes down shortly afterwards - full crashlog attached.
What am I missing?
Thanks,
Ingo
===================>
arch/x86/kernel/fpu/init.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..8f912f564fb1 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -75,12 +75,11 @@ static struct fpu x86_init_fpu __read_mostly;
static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+
+ x86_init_fpu.last_cpu = -1;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
[-- Attachment #2: crash.log --]
[-- Type: text/plain, Size: 25096 bytes --]
Linux version 6.10.0-rc2-00003-g4f4a9b399357-dirty (mingo@kepler) (gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #39 SMP PREEMPT_DYNAMIC Wed Jun 12 10:10:53 CEST 2024
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
**********************************************************
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
** **
** This system shows unhashed kernel memory addresses **
** via the console, logs, and other interfaces. This **
** might reduce the security of your system. **
** **
** If you see this message and you are not debugging **
** the kernel, report this immediately to your system **
** administrator! **
** **
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
**********************************************************
earlycon: uart8250 at I/O port 0x3f8 (options '')
printk: legacy bootconsole [uart8250] enabled
Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
APIC: Static calls initialized
SMBIOS 3.0.0 present.
DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
DMI: Memory slots populated: 1/1
Hypervisor detected: KVM
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: using sched offset of 243836685 cycles
clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
tsc: Detected 4010.116 MHz processor
last_pfn = 0x1ffe0 max_arch_pfn = 0x100000
MTRR map: 4 entries (3 fixed + 1 variable; max 19), built from 8 variable MTRRs
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
found SMP MP-table at [mem 0x000f5480-0x000f548f]
RAMDISK: [mem 0x1fabd000-0x1ffdffff]
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x00000000000F52C0 000014 (v00 BOCHS )
ACPI: RSDT 0x000000001FFE228B 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: FACP 0x000000001FFE2083 0000F4 (v03 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: DSDT 0x000000001FFE0040 002043 (v01 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: FACS 0x000000001FFE0000 000040
ACPI: APIC 0x000000001FFE2177 000078 (v03 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: HPET 0x000000001FFE21EF 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: MCFG 0x000000001FFE2227 00003C (v01 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: WAET 0x000000001FFE2263 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001)
ACPI: Reserving FACP table memory at [mem 0x1ffe2083-0x1ffe2176]
ACPI: Reserving DSDT table memory at [mem 0x1ffe0040-0x1ffe2082]
ACPI: Reserving FACS table memory at [mem 0x1ffe0000-0x1ffe003f]
ACPI: Reserving APIC table memory at [mem 0x1ffe2177-0x1ffe21ee]
ACPI: Reserving HPET table memory at [mem 0x1ffe21ef-0x1ffe2226]
ACPI: Reserving MCFG table memory at [mem 0x1ffe2227-0x1ffe2262]
ACPI: Reserving WAET table memory at [mem 0x1ffe2263-0x1ffe228a]
0MB HIGHMEM available.
511MB LOWMEM available.
mapped low ram: 0 - 1ffe0000
low ram: 0 - 1ffe0000
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
Normal [mem 0x0000000001000000-0x000000001ffdffff]
HighMem empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x000000001ffdffff]
Initmem setup node 0 [mem 0x0000000000001000-0x000000001ffdffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
ACPI: PM-Timer IO Port: 0x608
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
ACPI: Using ACPI (MADT) for SMP configuration information
TSC deadline timer available
CPU topo: Max. logical packages: 1
CPU topo: Max. logical dies: 1
CPU topo: Max. dies per package: 1
CPU topo: Max. threads per core: 1
CPU topo: Num. cores per package: 1
CPU topo: Num. threads per package: 1
CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff]
PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[mem 0x20000000-0xafffffff] available for PCI devices
Booting paravirtualized kernel on KVM
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
setup_percpu: NR_CPUS:8 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
percpu: Embedded 32 pages/cpu s101428 r0 d29644 u131072
Kernel command line: no_hash_pointers console=ttyS0 earlycon=uart8250,io,0x3f8
random: crng init done
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
Built 1 zonelists, mobility grouping on. Total pages: 130942
mem auto-init: stack:all(zero), heap alloc:off, heap free:off, mlocked free:off
Initializing HighMem for node 0 (00000000:00000000)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
Memory: 490028K/523768K available (14430K kernel code, 2069K rwdata, 4928K rodata, 848K init, 556K bss, 33740K reserved, 0K cma-reserved, 0K highmem)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
trace event string verifier disabled
Dynamic Preempt: voluntary
rcu: Preemptible hierarchical RCU implementation.
rcu: RCU event tracing is enabled.
rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
Trampoline variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
RCU Tasks: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
NR_IRQS: 2304, nr_irqs: 256, preallocated irqs: 16
rcu: srcu_init: Setting srcu_struct sizes based on contention.
Console: colour *CGA 80x25
printk: legacy console [ttyS0] enabled
printk: legacy console [ttyS0] enabled
printk: legacy bootconsole [uart8250] disabled
printk: legacy bootconsole [uart8250] disabled
ACPI: Core revision 20240322
APIC: Switch to symmetric I/O mode setup
clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x39cdb139b23, max_idle_ns: 440795315175 ns
Calibrating delay loop (skipped) preset value.. 8020.23 BogoMIPS (lpj=4010116)
x86/cpu: User Mode Instruction Prevention (UMIP) activated
Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Spectre V2 : Mitigation: Retpolines
Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
Spectre V2 : Enabling Speculation Barrier for firmware calls
RETBleed: Vulnerable
Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
Speculative Return Stack Overflow: IBPB-extending microcode not applied!
Speculative Return Stack Overflow: WARNING: See https://kernel.org/doc/html/latest/admin-guide/hw-vuln/srso.html for mitigation options.
Speculative Return Stack Overflow: WARNING: kernel not compiled with MITIGATION_SRSO.
Speculative Return Stack Overflow: Vulnerable: No microcode
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
Freeing SMP alternatives memory: 48K
pid_max: default: 32768 minimum: 301
LSM: initializing lsm=capability,selinux
SELinux: Initializing.
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
smpboot: CPU0: AMD Ryzen Threadripper 3970X 32-Core Processor (family: 0x17, model: 0x31, stepping: 0x0)
Performance Events: Fam17h+ core perfctr, AMD PMU driver.
... version: 0
... bit width: 48
... generic registers: 6
... value mask: 0000ffffffffffff
... max period: 00007fffffffffff
... fixed-purpose events: 0
... event mask: 000000000000003f
signal: max sigframe size: 1760
rcu: Hierarchical SRCU implementation.
rcu: Max phase no-delay instances is 400.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
smpboot: Total of 1 processors activated (8020.23 BogoMIPS)
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
futex hash table entries: 256 (order: 1, 8192 bytes, linear)
PM: RTC time: 08:11:04, date: 2024-06-12
NET: Registered PF_NETLINK/PF_ROUTE protocol family
audit: initializing netlink subsys (disabled)
thermal_sys: Registered thermal governor 'step_wise'
thermal_sys: Registered thermal governor 'user_space'
audit: type=2000 audit(1718179864.319:1): state=initialized audit_enabled=0 res=1
cpuidle: using governor menu
PCI: ECAM [mem 0xb0000000-0xbfffffff] (base 0xb0000000) for domain 0000 [bus 00-ff]
PCI: ECAM [mem 0xb0000000-0xbfffffff] reserved as E820 entry
PCI: Using ECAM for extended config space
PCI: Using configuration type 1 for base access
kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
ACPI: Added _OSI(Module Device)
ACPI: Added _OSI(Processor Device)
ACPI: Added _OSI(3.0 _SCP Extensions)
ACPI: Added _OSI(Processor Aggregator Device)
ACPI: 1 ACPI AML tables successfully acquired and loaded
ACPI: _OSC evaluation for CPUs failed, trying _PDC
ACPI: Interpreter enabled
ACPI: PM: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
PCI: Using E820 reservations for host bridge windows
ACPI: Enabled 2 GPEs in block 00 to 3F
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
acpi PNP0A08:00: _OSC: platform does not support [LTR]
acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
acpi resource window ([0x100000000-0x8ffffffff] ignored, not CPU addressable)
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x20000000-0xafffffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000 conventional PCI endpoint
pci 0000:00:1f.0: [8086:2918] type 00 class 0x060100 conventional PCI endpoint
pci 0000:00:1f.0: quirk: [io 0x0600-0x067f] claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.2: [8086:2922] type 00 class 0x010601 conventional PCI endpoint
pci 0000:00:1f.2: BAR 4 [io 0xc040-0xc05f]
pci 0000:00:1f.2: BAR 5 [mem 0xfebff000-0xfebfffff]
pci 0000:00:1f.3: [8086:2930] type 00 class 0x0c0500 conventional PCI endpoint
pci 0000:00:1f.3: BAR 4 [io 0x0700-0x073f]
ACPI: PCI: Interrupt link LNKA configured for IRQ 10
ACPI: PCI: Interrupt link LNKB configured for IRQ 10
ACPI: PCI: Interrupt link LNKC configured for IRQ 11
ACPI: PCI: Interrupt link LNKD configured for IRQ 11
ACPI: PCI: Interrupt link LNKE configured for IRQ 10
ACPI: PCI: Interrupt link LNKF configured for IRQ 10
ACPI: PCI: Interrupt link LNKG configured for IRQ 11
ACPI: PCI: Interrupt link LNKH configured for IRQ 11
ACPI: PCI: Interrupt link GSIA configured for IRQ 16
ACPI: PCI: Interrupt link GSIB configured for IRQ 17
ACPI: PCI: Interrupt link GSIC configured for IRQ 18
ACPI: PCI: Interrupt link GSID configured for IRQ 19
ACPI: PCI: Interrupt link GSIE configured for IRQ 20
ACPI: PCI: Interrupt link GSIF configured for IRQ 21
ACPI: PCI: Interrupt link GSIG configured for IRQ 22
ACPI: PCI: Interrupt link GSIH configured for IRQ 23
iommu: Default domain type: Translated
iommu: DMA domain TLB invalidation policy: lazy mode
SCSI subsystem initialized
ACPI: bus type USB registered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
Advanced Linux Sound Architecture Driver Initialized.
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
NetLabel: unlabeled traffic allowed by default
PCI: Using ACPI for IRQ routing
vgaarb: loaded
clocksource: Switched to clocksource kvm-clock
VFS: Disk quotas dquot_6.6.0
VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
pnp: PnP ACPI init
system 00:04: [mem 0xb0000000-0xbfffffff window] has been reserved
pnp: PnP ACPI: found 5 devices
clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
NET: Registered PF_INET protocol family
IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
TCP: Hash tables configured (established 4096 bind 4096)
UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
NET: Registered PF_UNIX/PF_LOCAL protocol family
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp-with-tls transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: resource 7 [mem 0x20000000-0xafffffff window]
pci_bus 0000:00: resource 8 [mem 0xc0000000-0xfebfffff window]
PCI: CLS 0 bytes, default 64
Unpacking initramfs...
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x39cdb139b23, max_idle_ns: 440795315175 ns
Initialise system trusted keyrings
workingset: timestamp_bits=30 max_order=17 bucket_order=0
Freeing initrd memory: 5260K
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
9p: Installing v9fs 9p2000 file system support
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler mq-deadline registered
io scheduler kyber registered
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: button: Power Button [PWRF]
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
ACPI: bus type drm_connector registered
loop: module loaded
ACPI: \_SB_.GSIA: Enabled at IRQ 16
ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 1.5 Gbps, SATA mode
ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f)
ahci 0000:00:1f.2: flags: 64bit ncq only
scsi host0: ahci
scsi host1: ahci
scsi host2: ahci
scsi host3: ahci
scsi host4: ahci
scsi host5: ahci
ata1: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff100 irq 24 lpm-pol 0
ata2: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff180 irq 24 lpm-pol 0
ata3: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff200 irq 24 lpm-pol 0
ata4: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff280 irq 24 lpm-pol 0
ata5: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff300 irq 24 lpm-pol 0
ata6: SATA max UDMA/133 abar m4096@0xfebff000 port 0xfebff380 irq 24 lpm-pol 0
e100: Intel(R) PRO/100 Network Driver
e100: Copyright(c) 1999-2006 Intel Corporation
e1000: Intel(R) PRO/1000 Network Driver
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
sky2: driver version 1.30
usbcore: registered new interface driver usblp
usbcore: registered new interface driver usb-storage
i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
rtc_cmos 00:03: RTC can wake from S4
rtc_cmos 00:03: registered as rtc0
rtc_cmos 00:03: alarms up to one day, y3k, 242 bytes nvram
i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
i2c i2c-0: Memory type 0x07 not supported yet, not instantiating SPD
amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled
hid: raw HID events driver (C) Jiri Kosina
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
Initializing XFRM netlink socket
NET: Registered PF_INET6 protocol family
Segment Routing with IPv6
In-situ OAM (IOAM) with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered PF_PACKET protocol family
9pnet: Installing 9P2000 support
Key type dns_resolver registered
IPI shorthand broadcast: enabled
registered taskstats version 1
Loading compiled-in X.509 certificates
PM: Magic number: 12:364:169
printk: legacy console [netcon0] enabled
netconsole: network logging started
cfg80211: Loading compiled-in X.509 certificates for regulatory database
------------[ cut here ]------------
Bad FPU state detected at restore_fpregs_from_fpstate+0x38/0x6c, reinitializing FPU registers.
WARNING: CPU: 0 PID: 60 at arch/x86/mm/extable.c:127 fixup_exception+0x41e/0x45c
Modules linked in:
CPU: 0 PID: 60 Comm: modprobe Not tainted 6.10.0-rc2-00003-g4f4a9b399357-dirty #39
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
EIP: fixup_exception+0x41e/0x45c
Code: e8 cb c0 00 00 0f 0b eb cb 0f 0b ba 4c e9 ff cb eb b6 b2 01 88 15 16 bf ec cb 89 44 24 04 c7 04 24 b8 3a bb cb e8 a6 c0 00 00 <0f> 0b e9 ee fd ff ff 8d b4 26 00 00 00 00 0f 0b ba 4c e9 ff cb e9
EAX: 0000005e EBX: cbce53f0 ECX: 00000000 EDX: cbdd606c
ESI: c1ce5ef0 EDI: 0000000d EBP: c1ce5e58 ESP: c1ce5dd8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010086
CR0: 80050033 CR2: bff85b6b CR3: 01840000 CR4: 00350ed0
Call Trace:
? show_regs+0x4d/0x54
? fixup_exception+0x41e/0x45c
? __warn+0x84/0x150
? fixup_exception+0x41e/0x45c
? fixup_exception+0x41e/0x45c
? report_bug+0x186/0x1b0
? exc_overflow+0x50/0x50
? handle_bug+0x2d/0x50
? exc_invalid_op+0x1b/0x70
? console_unlock+0x53/0xc4
? handle_exception+0x14b/0x14b
? exc_overflow+0x50/0x50
? fixup_exception+0x41e/0x45c
? exc_overflow+0x50/0x50
? fixup_exception+0x41e/0x45c
? restore_fpregs_from_fpstate+0x38/0x6c
? _get_random_bytes+0x65/0x190
? mt_find+0xd1/0x458
? exc_bounds+0xac/0xac
exc_general_protection+0x97/0x358
? randomize_page+0x37/0x54
? exc_bounds+0xac/0xac
handle_exception+0x14b/0x14b
EIP: restore_fpregs_from_fpstate+0x38/0x6c
Code: 7d fc 89 ca eb 09 cc cc cc db e2 0f 77 db 03 3e 8d 74 26 00 8b 3d ec 31 c6 cb 8b 0d e8 31 c6 cb 21 fa 8d 7b 40 21 c8 0f c7 1f <8b> 5d f8 8b 7d fc 89 ec 5d c3 66 90 3e 8d 74 26 00 0f ae 4b 40 8b
EAX: 00000007 EBX: c1ccdc60 ECX: 00000007 EDX: 00000000
ESI: c1ccdc20 EDI: c1ccdca0 EBP: c1ce5f54 ESP: c1ce5f4c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
? exc_bounds+0xac/0xac
? exc_bounds+0xac/0xac
? restore_fpregs_from_fpstate+0x35/0x6c
switch_fpu_return+0x49/0xd0
syscall_exit_to_user_mode+0x181/0x1a8
? call_usermodehelper_exec_async+0xbe/0x1ac
? call_usermodehelper+0x8c/0x8c
ret_from_fork+0x23/0x44
? call_usermodehelper+0x8c/0x8c
ret_from_fork_asm+0x12/0x18
entry_INT80_32+0x108/0x108
EIP: 0xb7f3b087
Code: Unable to access opcode bytes at 0xb7f3b05d.
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bff85a70
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 60 at arch/x86/kernel/fpu/xstate.h:195 save_fpregs_to_fpstate+0x98/0xa4
Modules linked in:
CPU: 0 PID: 60 Comm: modprobe Tainted: G W 6.10.0-rc2-00003-g4f4a9b399357-dirty #39
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
EIP: save_fpregs_to_fpstate+0x98/0xa4
Code: 2e 8d b4 26 00 00 00 00 66 90 8b 40 08 dd 70 40 9b 8b 43 08 dd 60 40 8b 5d f8 8b 7d fc 89 ec 5d c3 2e 8d b4 26 00 00 00 00 90 <0f> 0b eb 96 8d 74 26 00 0f 0b eb 82 55 89 e5 83 ec 0c 89 5d fc 89
EAX: 00000007 EBX: c1ccdc20 ECX: c1ccdc60 EDX: 00000000
ESI: c1ccd400 EDI: fffffff2 EBP: c10efb2c ESP: c10efb24
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010082
CR0: 80050033 CR2: b7f3b05d CR3: 01840000 CR4: 00350ed0
Call Trace:
---[ end trace 0000000000000000 ]---
modprobe (60) used greatest stack depth: 5836 bytes left
Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Loaded X.509 cert 'wens: 61c038651aabdcf94bd0ac7ff06c7248db18c600'
platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
cfg80211: failed to load regulatory.db
Unstable clock detected, switching default tracing clock to "global"
If you want to keep using the local clock, then add:
"trace_clock=local"
on the kernel command line
ALSA device list:
No soundcards found.
ata6: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ata2: SATA link down (SStatus 0 SControl 300)
ata1: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata5: SATA link down (SStatus 0 SControl 300)
Freeing unused kernel image (initmem) memory: 848K
Write protecting kernel text and read-only data: 19360k
Run /init as init process
init[1] bad frame in 32bit sigreturn frame:bfb9d86c ip:b7ee4517 sp:bfb9df20 orax:ffffffff in libuClibc-1.0.42.so[18517,b7edf000+52000]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU: 0 PID: 1 Comm: init Tainted: G W 6.10.0-rc2-00003-g4f4a9b399357-dirty #39
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
dump_stack_lvl+0x21/0x60
dump_stack+0x12/0x18
panic+0x2ab/0x2d0
do_exit+0x885/0x9e0
? __switch_to_asm+0xdd/0xf0
do_group_exit+0x24/0x74
get_signal+0xa1f/0xa20
? __switch_to_asm+0x53/0xf0
? __switch_to_asm+0x4d/0xf0
? finish_task_switch.isra.0+0x79/0x254
? __switch_to_asm+0x41/0xf0
arch_do_signal_or_restart+0x30/0x1f0
syscall_exit_to_user_mode+0xbf/0x1a8
do_int80_syscall_32+0x8d/0x14c
entry_INT80_32+0x108/0x108
EIP: 0xb7ee4517
Code: c8 ff c3 56 53 8b 44 24 0c e8 39 ce ff ff 81 c3 59 67 08 00 8b 4c 24 10 8b 54 24 14 8b 74 24 18 53 89 c3 b8 72 00 00 00 cd 80 <5b> 3d 00 f0 ff ff 76 0e 8b 93 dc 02 00 00 f7 d8 65 89 02 83 c8 ff
EAX: 00000000 EBX: ffffffff ECX: bfb9df88 EDX: 00000000
ESI: 00000000 EDI: 00529a4c EBP: 00000001 ESP: bfb9df20
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
/home/mingo/qemu/i386/run-image-i386.sh: line 15: 2538063 Killed qemu-system-i386 -display none -nodefaults -M q35 -d unimp,guest_errors -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' -kernel arch/x86/boot/bzImage -initrd $DIR/x86-rootfs.cpio -cpu host -enable-kvm -m 512m -smp 1 -serial mon:stdio
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-12 8:17 ` Ingo Molnar
@ 2024-06-12 9:40 ` Oleg Nesterov
2024-06-12 18:41 ` Oleg Nesterov
1 sibling, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-12 9:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
On 06/12, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> > > + x86_init_fpu.last_cpu = this_cpu;
> >
> > Why? I think it should do
> >
> > x86_init_fpu.last_cpu = -1;
> > set_thread_flag(TIF_NEED_FPU_LOAD);
> >
> > And the next patch should kill x86_init_fpu altogether, but keep
> > TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.
>
> So I applied the patch further below on top of:
>
> 4f4a9b399357 x86/fpu: Make task_struct::thread constant size
>
> And Nathan's 32-bit kernel testcase [but running with 1 CPU to simplify it]
> still crashes in a similar fashion
Yes, I didn't expect it can fix the problem. Still makes sense, I think.
> in the (first?) modprobe instance with a
> bad FPU state exception:
OK, I reproduced it too. I see nothing wrong in the usermodehelper or
kernel_execve paths... and fpu_clone() looks fine, "minimal" is still
true if init_task or another PF_KTHREAD calls user_mode_thread().
So I appiled the patch below and save_fpregs_to_fpstate() in
fpu__init_system() triggers the WARN_ON_FPU(err) in os_xsave()
[ 0.014609] RESTORED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ 0.014958] ------------[ cut here ]------------
[ 0.014958] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.h:189 save_fpregs_to_fpstate+0x74/0x80
...
so I _think_ we can probably forget about modprobe/etc.
Oleg.
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 10d0a720659c..9fa78f75b2e5 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -56,8 +56,8 @@ static inline void fpregs_restore_userregs(void)
struct fpu *fpu = x86_task_fpu(current);
int cpu = smp_processor_id();
- if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
- return;
+// if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
+// return;
if (!fpregs_state_valid(fpu, cpu)) {
/*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..0e63d54595aa 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,11 +5,11 @@
#include <asm/fpu/api.h>
#include <asm/tlbflush.h>
#include <asm/setup.h>
-
+#include <asm/fpu/signal.h>
#include <linux/sched.h>
#include <linux/sched/task.h>
#include <linux/init.h>
-
+#include "context.h"
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
@@ -75,12 +75,12 @@ static struct fpu x86_init_fpu __read_mostly;
static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
+// int this_cpu = smp_processor_id();
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
@@ -217,6 +217,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
* Called on the boot CPU once per system bootup, to set up the initial
* FPU state that is later cloned into all processes:
*/
+void save_fpregs_to_fpstate(struct fpu *fpu);
void __init fpu__init_system(void)
{
fpu__init_system_early_generic();
@@ -231,4 +232,10 @@ void __init fpu__init_system(void)
fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate(fpu_kernel_cfg.max_size);
fpu__init_task_struct_size();
+
+ BUG_ON(x86_task_fpu(current) != &x86_init_fpu);
+ fpregs_restore_userregs();
+ pr_crit("RESTORED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
+ save_fpregs_to_fpstate(&x86_init_fpu);
+ pr_crit("SAVED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-12 8:17 ` Ingo Molnar
2024-06-12 9:40 ` Oleg Nesterov
@ 2024-06-12 18:41 ` Oleg Nesterov
2024-06-12 20:30 ` Oleg Nesterov
2024-06-13 9:36 ` [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels Ingo Molnar
1 sibling, 2 replies; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-12 18:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
The patch below seems to fix the problem.
Again, the changes in fpu__init_system_early_generic() are not
strictly needed to fix it, but I believe make sense anyway.
Oleg.
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..848ea79886ba 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}
-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;
static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 215a7380e41c..ec22b9bf27f5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1562,7 +1562,7 @@ struct task_struct {
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-};
+} __attribute__ ((aligned (64)));
#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size
2024-06-12 18:41 ` Oleg Nesterov
@ 2024-06-12 20:30 ` Oleg Nesterov
2024-06-13 9:36 ` [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels Ingo Molnar
1 sibling, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-12 20:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
Damn, sorry for the additional spam, but if I was not clear...
On 06/12, Oleg Nesterov wrote:
>
> The patch below seems to fix the problem.
Of course I am not trying to propose this change. This is just
the debugging patch. Which can hopefully explain the problem.
> Again, the changes in fpu__init_system_early_generic() are not
> strictly needed to fix it, but I believe make sense anyway.
...
> static void __init fpu__init_system_early_generic(void)
> {
> - int this_cpu = smp_processor_id();
> -
> fpstate_reset(&x86_init_fpu);
> current->thread.fpu = &x86_init_fpu;
> - per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> - x86_init_fpu.last_cpu = this_cpu;
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + x86_init_fpu.last_cpu = -1;
Yes, in particular set_thread_flag(TIF_NEED_FPU_LOAD). And x86_init_fpu
should die after the next patch.
Oleg.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels
2024-06-12 18:41 ` Oleg Nesterov
2024-06-12 20:30 ` Oleg Nesterov
@ 2024-06-13 9:36 ` Ingo Molnar
2024-06-14 15:16 ` Oleg Nesterov
1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2024-06-13 9:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
* Oleg Nesterov <oleg@redhat.com> wrote:
> The patch below seems to fix the problem.
>
> Again, the changes in fpu__init_system_early_generic() are not
> strictly needed to fix it, but I believe make sense anyway.
>
> Oleg.
>
>
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index 4e8d37b5a90b..848ea79886ba 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
> return fsw == 0 && (fcw & 0x103f) == 0x003f;
> }
>
> -static struct fpu x86_init_fpu __read_mostly;
> +static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;
>
> static void __init fpu__init_system_early_generic(void)
> {
> - int this_cpu = smp_processor_id();
> -
> fpstate_reset(&x86_init_fpu);
> current->thread.fpu = &x86_init_fpu;
> - per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> - x86_init_fpu.last_cpu = this_cpu;
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + x86_init_fpu.last_cpu = -1;
>
> if (!boot_cpu_has(X86_FEATURE_CPUID) &&
> !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 215a7380e41c..ec22b9bf27f5 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1562,7 +1562,7 @@ struct task_struct {
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
> -};
> +} __attribute__ ((aligned (64)));
Oh ... indeed, FPU context save area must be 64 bytes aligned!
On 64-bit kernels this was a given, accidentally, but on 32-bit kernels
init_task was only 32-byte aligned:
c22f04e0 D init_task
... which misaligned the struct fpu as well, I think. With your fix:
c22f0500 D init_task
What happened is that due to my series 'struct task_struct' lost its
64-byte alignment attribute, which broke the fpu struct allocation code on
32-bit kernels and made the 64-bit one probably unrobust as well.
To add insult to injury, I was aware of the alignment requirement, and
tried to cover it with an assert, but doubly mis-coded it:
+ BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
Which is buggy:
- As on 32-bit kernels CONFIG_X86_L1_CACHE_SHIFT=5, ie. 32 bytes ...
- Nor does it really check the alignment of the FPU context save area
within struct fpu as it's allocated after task_struct ...
The interim patch below against the full WIP.x86/fpu series is what fixes
Nathan's 32-bit testcase.
Further improvements:
- The extra alignment attribute in <linux/sched.h> will affect other
architecture as well, although in practice the alignment of init_task is
not critical, and is very likely at least 32 bytes, probably more.
Still, it's a bit ugly in its current form.
- Also, because this was pretty hard to debug, we should probably add an
alignment check to fpu__init_task_struct_size() where we allocate the
fpu context structure, and fix the buggy size-assert.
Thanks a lot for your help Oleg! I've added this tag of yours:
Fixed-by: Oleg Nesterov <oleg@redhat.com>
... and would appreciate your Acked-by or Reviewed-by for the eventual
final version of the series, but I don't insist. ;-)
Ingo
=================>
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 53580e59e5db..16b6611634c3 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,15 +71,13 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}
-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;
static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 215a7380e41c..ec22b9bf27f5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1562,7 +1562,7 @@ struct task_struct {
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-};
+} __attribute__ ((aligned (64)));
#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels
2024-06-13 9:36 ` [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels Ingo Molnar
@ 2024-06-14 15:16 ` Oleg Nesterov
2024-06-15 10:23 ` Oleg Nesterov
0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-14 15:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
Hi Ingo, sorry for delay.
On 06/13, Ingo Molnar wrote:
>
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1562,7 +1562,7 @@ struct task_struct {
> > * they are included in the randomized portion of task_struct.
> > */
> > randomized_struct_fields_end
> > -};
> > +} __attribute__ ((aligned (64)));
I guess __aligned(64) will look a bit better, but this is minor.
> What happened is that due to my series 'struct task_struct' lost its
> 64-byte alignment attribute, which broke the fpu struct allocation code on
> 32-bit kernels and made the 64-bit one probably unrobust as well.
Yes, and note that struct fpstate has the same __aligned(64), that is
how I noticed the potential problem and decided to check.
But Ingo, it was a shot in the dark ;) I still don't really understand
what exactly should be aligned. Is it the fpstate->regs member? Or what?
If yes, perhaps this member needs __aligned(64) too to be safe?
> ... and would appreciate your Acked-by or Reviewed-by for the eventual
> final version of the series, but I don't insist. ;-)
Thanks ;) will do.
Oleg.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels
2024-06-14 15:16 ` Oleg Nesterov
@ 2024-06-15 10:23 ` Oleg Nesterov
2024-06-16 10:55 ` Oleg Nesterov
0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-15 10:23 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
On 06/14, Oleg Nesterov wrote:
>
> On 06/13, Ingo Molnar wrote:
> >
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1562,7 +1562,7 @@ struct task_struct {
> > > * they are included in the randomized portion of task_struct.
> > > */
> > > randomized_struct_fields_end
> > > -};
> > > +} __attribute__ ((aligned (64)));
>
> I guess __aligned(64) will look a bit better, but this is minor.
...
> But Ingo, it was a shot in the dark ;) I still don't really understand
> what exactly should be aligned. Is it the fpstate->regs member? Or what?
> If yes, perhaps this member needs __aligned(64) too to be safe?
Ah, I didn't notice that fpregs_state->xregs_state has
__attribute__ ((packed, aligned (64))), so everything is clear.
From your previous email:
- The extra alignment attribute in <linux/sched.h> will affect other
architecture as well, although in practice the alignment of init_task is
not critical, and is very likely at least 32 bytes, probably more.
Still, it's a bit ugly in its current form.
Agreed, but afaics we need to align task_struct only to ensure that
(void *)task + sizeof(*task);
doesn't break the alignment.
So perhaps we can (later) change x86_task_fpu(), fpu_clone(), and
fpu__init_task_struct_size() to use
ALIGN(sizeof(struct task_struct), 64)
and remove the alignment attribute in sched.h?
Or use ARCH_MIN_TASKALIGN == __alignof__(union fpregs_state) which is
also used in fork_init()->kmem_cache_create().
Oleg.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels
2024-06-15 10:23 ` Oleg Nesterov
@ 2024-06-16 10:55 ` Oleg Nesterov
0 siblings, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2024-06-16 10:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nathan Chancellor, linux-kernel, Andy Lutomirski, Andrew Morton,
Dave Hansen, Peter Zijlstra, Borislav Petkov, Brian Gerst,
H . Peter Anvin, Linus Torvalds, Thomas Gleixner, Uros Bizjak
On 06/15, Oleg Nesterov wrote:
>
> So perhaps we can (later) change x86_task_fpu(), fpu_clone(), and
> fpu__init_task_struct_size() to use
>
> ALIGN(sizeof(struct task_struct), 64)
>
> and remove the alignment attribute in sched.h?
On the 2nd thought, perhaps this makes sense from the very beginning?
See the patch below, up to you.
> Or use ARCH_MIN_TASKALIGN == __alignof__(union fpregs_state) which is
> also used in fork_init()->kmem_cache_create().
Either way, I hope that CONFIG_X86_VSMP can't define ARCH_MIN_TASKALIGN
less than __alignof__(fpregs_state).
Oleg.
---
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64509c7f26c8..7887e9493330 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,6 +507,9 @@ struct thread_struct {
struct fpu *fpu;
};
+#define X86_TASK_SIZE \
+ ALIGN(sizeof(struct task_struct), __alignof__(union fpregs_state))
+
#define x86_task_fpu(task) ((task)->thread.fpu)
/*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f0c4367804b3..613198372764 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -591,7 +591,7 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+ struct fpu *dst_fpu = (void *)dst + X86_TASK_SIZE;
BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
BUG_ON(!src_fpu);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..8b43c83b82c7 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}
-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __aligned(64) __read_mostly;
static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
@@ -157,7 +155,7 @@ static void __init fpu__init_system_generic(void)
*/
static void __init fpu__init_task_struct_size(void)
{
- int task_size = sizeof(struct task_struct);
+ int task_size = X86_TASK_SIZE;
task_size += sizeof(struct fpu);
^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-06-16 10:57 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-08 7:31 [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu Ingo Molnar
2024-06-08 7:31 ` [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
2024-06-08 7:31 ` [PATCH 2/9] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu() Ingo Molnar
2024-06-08 7:31 ` [PATCH 3/9] x86/fpu: Make task_struct::thread constant size Ingo Molnar
2024-06-10 21:13 ` Nathan Chancellor
2024-06-11 12:42 ` Oleg Nesterov
2024-06-12 8:17 ` Ingo Molnar
2024-06-12 9:40 ` Oleg Nesterov
2024-06-12 18:41 ` Oleg Nesterov
2024-06-12 20:30 ` Oleg Nesterov
2024-06-13 9:36 ` [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels Ingo Molnar
2024-06-14 15:16 ` Oleg Nesterov
2024-06-15 10:23 ` Oleg Nesterov
2024-06-16 10:55 ` Oleg Nesterov
2024-06-08 7:31 ` [PATCH 4/9] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
2024-06-08 7:31 ` [PATCH 5/9] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
2024-06-08 7:31 ` [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
2024-06-10 10:01 ` Oleg Nesterov
2024-06-08 7:31 ` [PATCH 7/9] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
2024-06-08 7:31 ` [PATCH 8/9] x86/fpu: Use 'fpstate' variable names consistently Ingo Molnar
2024-06-08 7:31 ` [PATCH 9/9] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox