public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer
@ 2024-06-05  8:35 Ingo Molnar
  2024-06-05  8:35 ` [PATCH 1/3] x86/fpu: Make task_struct::thread constant size Ingo Molnar
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-05  8:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak

This series is one of the dependencies of the fast-headers work,
which aims to reduce header complexity by removing <asm/processor.h>
from the <linux/sched.h> dependency chain; both headers are fat
enough already, even if we do not combine them.

To achieve that decoupling, one of the key steps is to not embed any
C types from <asm/processor.h> into task_struct.

The only architecture that relies on that in a serious fashion is x86,
via 'struct thread::fpu', and the series below attempts to resolve it
by using a calculated &task->thread.fpu value via the x86_task_fpu()
helper.

Code generation: without CONFIG_X86_DEBUG_FPU=y we get a small reduction:

   text        data        bss        dec         hex        filename
   26475783    10439082    1740804    38655669    24dd6b5    vmlinux.before
   26475601    10435146    1740804    38651551    24dc69f    vmlinux.after

With the new debug code (which I think we'll remove soon-ish) there's an
expected increase:

   text        data        bss        dec         hex        filename
   26476198    10439286    1740804    38656288    24dd920    vmlinux.CONFIG_X86_DEBUG_FPU.before
   26477000    10435458    1740804    38653262    24dcd4e    vmlinux.CONFIG_X86_DEBUG_FPU.after

The Git tree can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu

Thanks,

    Ingo

===============>
Ingo Molnar (3):
  x86/fpu: Make task_struct::thread constant size
  x86/fpu: Remove the thread::fpu pointer
  x86/fpu: Remove init_task FPU state dependencies, add debugging warning

 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/include/asm/processor.h | 19 ++++++++++---------
 arch/x86/kernel/fpu/context.h    |  4 ++--
 arch/x86/kernel/fpu/core.c       | 57 +++++++++++++++++++++++++++++++--------------------------
 arch/x86/kernel/fpu/init.c       | 23 +++++++++++++----------
 arch/x86/kernel/fpu/regset.c     | 22 +++++++++++-----------
 arch/x86/kernel/fpu/signal.c     | 18 +++++++++---------
 arch/x86/kernel/fpu/xstate.c     | 23 ++++++++++-------------
 arch/x86/kernel/fpu/xstate.h     |  6 +++---
 arch/x86/kernel/process.c        |  6 ++----
 arch/x86/kernel/signal.c         |  6 +++---
 arch/x86/kernel/traps.c          |  2 +-
 arch/x86/math-emu/fpu_aux.c      |  2 +-
 arch/x86/math-emu/fpu_entry.c    |  4 ++--
 arch/x86/math-emu/fpu_system.h   |  2 +-
 arch/x86/mm/extable.c            |  2 +-
 include/linux/sched.h            | 13 +++----------
 17 files changed, 104 insertions(+), 107 deletions(-)

-- 
2.43.0



* [PATCH 1/3] x86/fpu: Make task_struct::thread constant size
  2024-06-05  8:35 [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
@ 2024-06-05  8:35 ` Ingo Molnar
  2024-06-05 19:04   ` Chang S. Bae
  2024-06-05  8:35 ` [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-05  8:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak

Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.

This allows us to remove the old kludge of task_struct being variable size:

  struct task_struct {

       ...
       /*
        * New fields for task_struct should be added above here, so that
        * they are included in the randomized portion of task_struct.
        */
       randomized_struct_fields_end

       /* CPU-specific state of this task: */
       struct thread_struct            thread;

       /*
        * WARNING: on x86, 'thread_struct' contains a variable-sized
        * structure.  It *MUST* be at the end of 'task_struct'.
        *
        * Do not put anything below here!
        */
  };

... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct and not allowing it to be struct-randomized.

But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:

   DECLARE_PER_TASK(type, name);
   ...
   per_task(current, name) = val;

... which requires task_struct to be a constant size struct.

The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/ZgMeX+lFNgwYdOJF@gmail.com
---
 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/include/asm/processor.h | 14 ++++++--------
 arch/x86/kernel/fpu/context.h    |  4 ++--
 arch/x86/kernel/fpu/core.c       | 51 ++++++++++++++++++++++++++-------------------------
 arch/x86/kernel/fpu/init.c       | 25 +++++++++++++++----------
 arch/x86/kernel/fpu/regset.c     | 22 +++++++++++-----------
 arch/x86/kernel/fpu/signal.c     | 18 +++++++++---------
 arch/x86/kernel/fpu/xstate.c     | 22 +++++++++++-----------
 arch/x86/kernel/fpu/xstate.h     |  6 +++---
 arch/x86/kernel/process.c        |  6 +++---
 arch/x86/kernel/signal.c         |  6 +++---
 arch/x86/kernel/traps.c          |  2 +-
 arch/x86/math-emu/fpu_aux.c      |  2 +-
 arch/x86/math-emu/fpu_entry.c    |  4 ++--
 arch/x86/math-emu/fpu_system.h   |  2 +-
 arch/x86/mm/extable.c            |  2 +-
 include/linux/sched.h            | 13 +++----------
 17 files changed, 99 insertions(+), 102 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c485f1944c5f..3cf20ab49b5f 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
 	    !(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
-		struct fpu *old_fpu = &old->thread.fpu;
+		struct fpu *old_fpu = old->thread.fpu;
 
 		save_fpregs_to_fpstate(old_fpu);
 		/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cb4f6c513c48..920b0beebd11 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,19 +504,17 @@ struct thread_struct {
 #endif
 
 	/* Floating point and extended processor state */
-	struct fpu		fpu;
-	/*
-	 * WARNING: 'fpu' is dynamically-sized.  It *MUST* be at
-	 * the end.
-	 */
+	struct fpu		*fpu;
 };
 
-extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
-
+/*
+ * X86 doesn't need any embedded-FPU-struct quirks:
+ */
 static inline void arch_thread_struct_whitelist(unsigned long *offset,
 						unsigned long *size)
 {
-	fpu_thread_struct_whitelist(offset, size);
+	*offset = 0;
+	*size = 0;
 }
 
 static inline void
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index f6d856bd50bc..96d1f34179b3 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
 /* Internal helper for switch_fpu_return() and signal frame setup */
 static inline void fpregs_restore_userregs(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 	int cpu = smp_processor_id();
 
 	if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
@@ -67,7 +67,7 @@ static inline void fpregs_restore_userregs(void)
 		 * If PKRU is enabled, then the PKRU value is already
 		 * correct because it was either set in switch_to() or in
 		 * flush_thread(). So it is excluded because it might be
-		 * not up to date in current->thread.fpu.xsave state.
+		 * not up to date in current->thread.fpu->xsave state.
 		 *
 		 * XFD state is handled in restore_fpregs_from_fpstate().
 		 */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..816d48978da4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
 		return;
 
 	spin_lock_irq(&current->sighand->siglock);
-	fpuperm = &current->group_leader->thread.fpu.guest_perm;
+	fpuperm = &current->group_leader->thread.fpu->guest_perm;
 	perm = fpuperm->__state_perm;
 
 	/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
  */
 void fpu_sync_guest_vmexit_xfd_state(void)
 {
-	struct fpstate *fps = current->thread.fpu.fpstate;
+	struct fpstate *fps = current->thread.fpu->fpstate;
 
 	lockdep_assert_irqs_disabled();
 	if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
 int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 {
 	struct fpstate *guest_fps = guest_fpu->fpstate;
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 	struct fpstate *cur_fps = fpu->fpstate;
 
 	fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
 	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		set_thread_flag(TIF_NEED_FPU_LOAD);
-		save_fpregs_to_fpstate(&current->thread.fpu);
+		save_fpregs_to_fpstate(current->thread.fpu);
 	}
 	__cpu_invalidate_fpregs_state();
 
@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
  */
 void fpu_sync_fpstate(struct fpu *fpu)
 {
-	WARN_ON_FPU(fpu != &current->thread.fpu);
+	WARN_ON_FPU(fpu != current->thread.fpu);
 
 	fpregs_lock();
 	trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
 static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 {
 	if (fpu_state_size_dynamic()) {
-		struct fpu *src_fpu = &current->group_leader->thread.fpu;
+		struct fpu *src_fpu = current->group_leader->thread.fpu;
 
 		spin_lock_irq(&current->sighand->siglock);
 		/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
 	if (!ssp)
 		return 0;
 
-	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+	xstate = get_xsave_addr(&dst->thread.fpu->fpstate->regs.xsave,
 				XFEATURE_CET_USER);
 
 	/*
@@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
 int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	      unsigned long ssp)
 {
-	struct fpu *src_fpu = &current->thread.fpu;
-	struct fpu *dst_fpu = &dst->thread.fpu;
+	/*
+	 * We allocate the new FPU structure right after the end of the task struct.
+	 * task allocation size already took this into account.
+	 *
+	 * This is safe because task_struct size is a multiple of cacheline size.
+	 */
+	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+	struct fpu *src_fpu = current->thread.fpu;
+
+	BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
+	BUG_ON(!src_fpu);
+
+	dst->thread.fpu = dst_fpu;
 
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
@@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	return 0;
 }
 
-/*
- * Whitelist the FPU register state embedded into task_struct for hardened
- * usercopy.
- */
-void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
-{
-	*offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
-	*size = fpu_kernel_cfg.default_size;
-}
-
 /*
  * Drops current FPU state: deactivates the fpregs and
  * the fpstate. NOTE: it still leaves previous contents
@@ -677,7 +678,7 @@ void fpu__drop(struct fpu *fpu)
 {
 	preempt_disable();
 
-	if (fpu == &current->thread.fpu) {
+	if (fpu == current->thread.fpu) {
 		/* Ignore delayed exceptions from user space */
 		asm volatile("1: fwait\n"
 			     "2:\n"
@@ -711,7 +712,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
  */
 static void fpu_reset_fpregs(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 
 	fpregs_lock();
 	__fpu_invalidate_fpregs_state(fpu);
@@ -740,7 +741,7 @@ static void fpu_reset_fpregs(void)
  */
 void fpu__clear_user_states(struct fpu *fpu)
 {
-	WARN_ON_FPU(fpu != &current->thread.fpu);
+	WARN_ON_FPU(fpu != current->thread.fpu);
 
 	fpregs_lock();
 	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -773,7 +774,7 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 void fpu_flush_thread(void)
 {
-	fpstate_reset(&current->thread.fpu);
+	fpstate_reset(current->thread.fpu);
 	fpu_reset_fpregs();
 }
 /*
@@ -814,7 +815,7 @@ void fpregs_lock_and_load(void)
  */
 void fpregs_assert_state_consistent(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		return;
@@ -826,7 +827,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
 
 void fpregs_mark_activate(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 
 	fpregs_activate(fpu);
 	fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 998a08f17e33..de618ec509aa 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
 	/* Flush out any pending x87 state: */
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU))
-		fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+		fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
 	else
 #endif
 		asm volatile ("fninit");
@@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
 	return fsw == 0 && (fcw & 0x103f) == 0x003f;
 }
 
+static struct fpu x86_init_fpu __read_mostly;
+
 static void __init fpu__init_system_early_generic(void)
 {
+	int this_cpu = smp_processor_id();
+
+	fpstate_reset(&x86_init_fpu);
+	current->thread.fpu = &x86_init_fpu;
+	per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
+	x86_init_fpu.last_cpu = this_cpu;
+
 	if (!boot_cpu_has(X86_FEATURE_CPUID) &&
 	    !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
 		if (fpu__probe_without_cpuid())
@@ -150,11 +159,13 @@ static void __init fpu__init_task_struct_size(void)
 {
 	int task_size = sizeof(struct task_struct);
 
+	task_size += sizeof(struct fpu);
+
 	/*
 	 * Subtract off the static size of the register state.
 	 * It potentially has a bunch of padding.
 	 */
-	task_size -= sizeof(current->thread.fpu.__fpstate.regs);
+	task_size -= sizeof(current->thread.fpu->__fpstate.regs);
 
 	/*
 	 * Add back the dynamically-calculated register state
@@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)
 
 	/*
 	 * We dynamically size 'struct fpu', so we require that
-	 * it be at the end of 'thread_struct' and that
-	 * 'thread_struct' be at the end of 'task_struct'.  If
-	 * you hit a compile error here, check the structure to
-	 * see if something got added to the end.
+	 * '__fpstate' be at the end of it:
 	 */
 	CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
-	CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
-	CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
 
 	arch_task_struct_size = task_size;
 }
@@ -204,7 +210,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_kernel_cfg.default_size = size;
 	fpu_user_cfg.max_size = size;
 	fpu_user_cfg.default_size = size;
-	fpstate_reset(&current->thread.fpu);
+	fpstate_reset(current->thread.fpu);
 }
 
 /*
@@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  */
 void __init fpu__init_system(void)
 {
-	fpstate_reset(&current->thread.fpu);
 	fpu__init_system_early_generic();
 
 	/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2a21bd..38bc0b390d02 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
  */
 static void sync_fpstate(struct fpu *fpu)
 {
-	if (fpu == &current->thread.fpu)
+	if (fpu == current->thread.fpu)
 		fpu_sync_fpstate(fpu);
 }
 
@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
 	 * Only stopped child tasks can be used to modify the FPU
 	 * state in the fpstate buffer:
 	 */
-	WARN_ON_FPU(fpu == &current->thread.fpu);
+	WARN_ON_FPU(fpu == current->thread.fpu);
 
 	__fpu_invalidate_fpregs_state(fpu);
 }
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
 int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
 		struct membuf to)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 
 	if (!cpu_feature_enabled(X86_FEATURE_FXSR))
 		return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
 		unsigned int pos, unsigned int count,
 		const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct fxregs_state newstate;
 	int ret;
 
@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
 	if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
 		return -ENODEV;
 
-	sync_fpstate(&target->thread.fpu);
+	sync_fpstate(target->thread.fpu);
 
 	copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
 	return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 		  unsigned int pos, unsigned int count,
 		  const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct xregs_state *tmpbuf = NULL;
 	int ret;
 
@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
 int ssp_get(struct task_struct *target, const struct user_regset *regset,
 	    struct membuf to)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct cet_user_state *cetregs;
 
 	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
 	    unsigned int pos, unsigned int count,
 	    const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
 	struct cet_user_state *cetregs;
 	unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
 void
 convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 {
-	__convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
+	__convert_from_fxsr(env, tsk, &tsk->thread.fpu->fpstate->regs.fxsave);
 }
 
 void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
 int fpregs_get(struct task_struct *target, const struct user_regset *regset,
 	       struct membuf to)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct user_i387_ia32_struct env;
 	struct fxregs_state fxsave, *fx;
 
@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 	       unsigned int pos, unsigned int count,
 	       const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = &target->thread.fpu;
+	struct fpu *fpu = target->thread.fpu;
 	struct user_i387_ia32_struct env;
 	int ret;
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..6b262d0b8fc1 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	/* Check for the first magic field and other error scenarios. */
 	if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
 	    fx_sw->xstate_size < min_xstate_size ||
-	    fx_sw->xstate_size > current->thread.fpu.fpstate->user_size ||
+	    fx_sw->xstate_size > current->thread.fpu->fpstate->user_size ||
 	    fx_sw->xstate_size > fx_sw->extended_size)
 		goto setfx;
 
@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	if (likely(magic2 == FP_XSTATE_MAGIC2))
 		return true;
 setfx:
-	trace_x86_fpu_xstate_check_failed(&current->thread.fpu);
+	trace_x86_fpu_xstate_check_failed(current->thread.fpu);
 
 	/* Set the parameters for fx only state */
 	fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
 {
 	if (use_fxsr()) {
-		struct xregs_state *xsave = &tsk->thread.fpu.fpstate->regs.xsave;
+		struct xregs_state *xsave = &tsk->thread.fpu->fpstate->regs.xsave;
 		struct user_i387_ia32_struct env;
 		struct _fpstate_32 __user *fp = buf;
 
 		fpregs_lock();
 		if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-			fxsave(&tsk->thread.fpu.fpstate->regs.fxsave);
+			fxsave(&tsk->thread.fpu->fpstate->regs.fxsave);
 		fpregs_unlock();
 
 		convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct task_struct *tsk = current;
-	struct fpstate *fpstate = tsk->thread.fpu.fpstate;
+	struct fpstate *fpstate = tsk->thread.fpu->fpstate;
 	bool ia32_fxstate = (buf != buf_fx);
 	int ret;
 
@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
  */
 static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 	int ret;
 
 	/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 			      bool ia32_fxstate)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = &tsk->thread.fpu;
+	struct fpu *fpu = tsk->thread.fpu;
 	struct user_i387_ia32_struct env;
 	bool success, fx_only = false;
 	union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
  */
 bool fpu__restore_sig(void __user *buf, int ia32_frame)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 	void __user *buf_fx = buf;
 	bool ia32_fxstate = false;
 	bool success = false;
@@ -503,7 +503,7 @@ unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size)
 {
-	unsigned long frame_size = xstate_sigframe_size(current->thread.fpu.fpstate);
+	unsigned long frame_size = xstate_sigframe_size(current->thread.fpu->fpstate);
 
 	*buf_fx = sp = round_down(sp - frame_size, 64);
 	if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..1d1f6599731b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
 	 */
 	init_fpstate.xfd = 0;
 
-	fpstate_reset(&current->thread.fpu);
+	fpstate_reset(current->thread.fpu);
 }
 
 /*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 		goto out_disable;
 
 	/* Reset the state for the current task */
-	fpstate_reset(&current->thread.fpu);
+	fpstate_reset(current->thread.fpu);
 
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
 	}
 
 	if (fpu_state_size_dynamic())
-		wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
+		wrmsrl(MSR_IA32_XFD, current->thread.fpu->fpstate->xfd);
 }
 
 /*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
 			     enum xstate_copy_mode copy_mode)
 {
-	__copy_xstate_to_uabi_buf(to, tsk->thread.fpu.fpstate,
-				  tsk->thread.fpu.fpstate->user_xfeatures,
+	__copy_xstate_to_uabi_buf(to, tsk->thread.fpu->fpstate,
+				  tsk->thread.fpu->fpstate->user_xfeatures,
 				  tsk->thread.pkru, copy_mode);
 }
 
@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
 int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
 				      const void __user *ubuf)
 {
-	return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
+	return copy_uabi_to_xstate(tsk->thread.fpu->fpstate, NULL, ubuf, &tsk->thread.pkru);
 }
 
 static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
 	  * The XFD MSR does not match fpstate->xfd. That's invalid when
 	  * the passed in fpstate is current's fpstate.
 	  */
-	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+	if (fpstate->xfd == current->thread.fpu->fpstate->xfd)
 		return false;
 
 	/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
 static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 			   unsigned int usize, struct fpu_guest *guest_fpu)
 {
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 	struct fpstate *curfps, *newfps = NULL;
 	unsigned int fpsize;
 	bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 	 * AVX512.
 	 */
 	bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
-	struct fpu *fpu = &current->group_leader->thread.fpu;
+	struct fpu *fpu = current->group_leader->thread.fpu;
 	struct fpu_state_perm *perm;
 	unsigned int ksize, usize;
 	u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
 		return -EPERM;
 	}
 
-	fpu = &current->group_leader->thread.fpu;
+	fpu = current->group_leader->thread.fpu;
 	perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
 	ksize = perm->__state_size;
 	usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
  */
 static void avx512_status(struct seq_file *m, struct task_struct *task)
 {
-	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+	unsigned long timestamp = READ_ONCE(task->thread.fpu->avx512_timestamp);
 	long delta;
 
 	if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 05df04f39628..1c720c376a69 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 
 static inline u64 xstate_get_group_perm(bool guest)
 {
-	struct fpu *fpu = &current->group_leader->thread.fpu;
+	struct fpu *fpu = current->group_leader->thread.fpu;
 	struct fpu_state_perm *perm;
 
 	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 	 * internally, e.g. PKRU. That's user space ABI and also required
 	 * to allow the signal handler to modify PKRU.
 	 */
-	struct fpstate *fpstate = current->thread.fpu.fpstate;
+	struct fpstate *fpstate = current->thread.fpu->fpstate;
 	u64 mask = fpstate->user_xfeatures;
 	u32 lmask;
 	u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
 	u32 hmask = mask >> 32;
 	int err;
 
-	xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+	xfd_validate_state(current->thread.fpu->fpstate, mask, true);
 
 	stac();
 	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..5f3f48713870 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.vm86 = NULL;
 #endif
 	/* Drop the copied pointer to current's fpstate */
-	dst->thread.fpu.fpstate = NULL;
+	dst->thread.fpu = NULL;
 
 	return 0;
 }
@@ -106,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 void arch_release_task_struct(struct task_struct *tsk)
 {
 	if (fpu_state_size_dynamic())
-		fpstate_free(&tsk->thread.fpu);
+		fpstate_free(tsk->thread.fpu);
 }
 #endif
 
@@ -116,7 +116,7 @@ void arch_release_task_struct(struct task_struct *tsk)
 void exit_thread(struct task_struct *tsk)
 {
 	struct thread_struct *t = &tsk->thread;
-	struct fpu *fpu = &t->fpu;
+	struct fpu *fpu = t->fpu;
 
 	if (test_thread_flag(TIF_IO_BITMAP))
 		io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..7e709ae8e99a 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
 handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 {
 	bool stepping, failed;
-	struct fpu *fpu = &current->thread.fpu;
+	struct fpu *fpu = current->thread.fpu;
 
 	if (v8086_mode(regs))
 		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
 	if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
 		return true;
 
-	fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+	fsize += current->group_leader->thread.fpu->perm.__user_state_size;
 	if (likely(ss_size > fsize))
 		return true;
 
 	if (strict_sigaltstack_size)
 		return ss_size > fsize;
 
-	mask = current->group_leader->thread.fpu.perm.__state_perm;
+	mask = current->group_leader->thread.fpu->perm.__state_perm;
 	if (mask & XFEATURE_MASK_USER_DYNAMIC)
 		return ss_size > fsize;
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..c79e36c4071b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
 static void math_error(struct pt_regs *regs, int trapnr)
 {
 	struct task_struct *task = current;
-	struct fpu *fpu = &task->thread.fpu;
+	struct fpu *fpu = task->thread.fpu;
 	int si_code;
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index d62662bdd460..8087953164fc 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)
 
 void finit(void)
 {
-	fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+	fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
 }
 
 /*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 91c52ead1226..30400d95d9d0 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
 		    unsigned int pos, unsigned int count,
 		    const void *kbuf, const void __user *ubuf)
 {
-	struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+	struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
 	void *space = s387->st_space;
 	int ret;
 	int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
 		    const struct user_regset *regset,
 		    struct membuf to)
 {
-	struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+	struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
 	const void *space = s387->st_space;
 	int offset = (S387->ftop & 7) * 10, other = 80 - offset;
 
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index eec3e4805c75..3417337e7d99 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
 	return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
 }
 
-#define I387			(&current->thread.fpu.fpstate->regs)
+#define I387			(&current->thread.fpu->fpstate->regs)
 #define FPU_info		(I387->soft.info)
 
 #define FPU_CS			(*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 51986e8a9d35..9400c1c29fc8 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 
 /*
  * Handler for when we fail to restore a task's FPU state.  We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu.state)
+ * here because the FPU state of a task using the FPU (task->thread.fpu->state)
  * should always be valid.  However, past bugs have allowed userspace to set
  * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
  * These caused XRSTOR to fail when switching to the task, leaking the FPU
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..215a7380e41c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1554,21 +1554,14 @@ struct task_struct {
 	struct user_event_mm		*user_event_mm;
 #endif
 
+	/* CPU-specific state of this task: */
+	struct thread_struct		thread;
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
 	 */
 	randomized_struct_fields_end
-
-	/* CPU-specific state of this task: */
-	struct thread_struct		thread;
-
-	/*
-	 * WARNING: on x86, 'thread_struct' contains a variable-sized
-	 * structure.  It *MUST* be at the end of 'task_struct'.
-	 *
-	 * Do not put anything below here!
-	 */
 };
 
 #define TASK_REPORT_IDLE	(TASK_REPORT + 1)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-05  8:35 [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
  2024-06-05  8:35 ` [PATCH 1/3] x86/fpu: Make task_struct::thread constant size Ingo Molnar
@ 2024-06-05  8:35 ` Ingo Molnar
  2024-06-05 13:38   ` Oleg Nesterov
  2024-06-25  5:26   ` Edgecombe, Rick P
  2024-06-05  8:35 ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ingo Molnar
  2024-06-05 21:21 ` [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Brian Gerst
  3 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-05  8:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak

As suggested by Oleg, remove the thread::fpu pointer: the FPU context
sits at a constant offset from the task_struct, so the x86_task_fpu()
helper can compute its address directly instead of loading a pointer.

This improves code generation a bit:

   kepler:~/tip> size vmlinux.before vmlinux.after
   text        data        bss        dec         hex        filename
   26475405    10435342    1740804    38651551    24dc69f    vmlinux.before
   26475339    10959630    1216516    38651485    24dc65d    vmlinux.after

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/ZgaFfyHMOdLHEKm+@gmail.com
---
 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/include/asm/processor.h |  5 ++---
 arch/x86/kernel/fpu/context.h    |  2 +-
 arch/x86/kernel/fpu/core.c       | 30 ++++++++++++++----------------
 arch/x86/kernel/fpu/init.c       |  7 +++----
 arch/x86/kernel/fpu/regset.c     | 22 +++++++++++-----------
 arch/x86/kernel/fpu/signal.c     | 18 +++++++++---------
 arch/x86/kernel/fpu/xstate.c     | 22 +++++++++++-----------
 arch/x86/kernel/fpu/xstate.h     |  6 +++---
 arch/x86/kernel/process.c        |  6 ++----
 arch/x86/kernel/signal.c         |  6 +++---
 arch/x86/kernel/traps.c          |  2 +-
 arch/x86/kernel/vmlinux.lds.S    |  4 ++++
 arch/x86/math-emu/fpu_aux.c      |  2 +-
 arch/x86/math-emu/fpu_entry.c    |  4 ++--
 arch/x86/math-emu/fpu_system.h   |  2 +-
 arch/x86/mm/extable.c            |  2 +-
 17 files changed, 70 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 3cf20ab49b5f..1feaa68b7567 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
 	    !(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
-		struct fpu *old_fpu = old->thread.fpu;
+		struct fpu *old_fpu = x86_task_fpu(old);
 
 		save_fpregs_to_fpstate(old_fpu);
 		/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 920b0beebd11..249c5fa20de4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -502,11 +502,10 @@ struct thread_struct {
 
 	struct thread_shstk	shstk;
 #endif
-
-	/* Floating point and extended processor state */
-	struct fpu		*fpu;
 };
 
+#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+
 /*
  * X86 doesn't need any embedded-FPU-struct quirks:
  */
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 96d1f34179b3..10d0a720659c 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
 /* Internal helper for switch_fpu_return() and signal frame setup */
 static inline void fpregs_restore_userregs(void)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 	int cpu = smp_processor_id();
 
 	if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 816d48978da4..0ccabcd3bf62 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
 		return;
 
 	spin_lock_irq(&current->sighand->siglock);
-	fpuperm = &current->group_leader->thread.fpu->guest_perm;
+	fpuperm = &x86_task_fpu(current->group_leader)->guest_perm;
 	perm = fpuperm->__state_perm;
 
 	/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
  */
 void fpu_sync_guest_vmexit_xfd_state(void)
 {
-	struct fpstate *fps = current->thread.fpu->fpstate;
+	struct fpstate *fps = x86_task_fpu(current)->fpstate;
 
 	lockdep_assert_irqs_disabled();
 	if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
 int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 {
 	struct fpstate *guest_fps = guest_fpu->fpstate;
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 	struct fpstate *cur_fps = fpu->fpstate;
 
 	fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
 	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		set_thread_flag(TIF_NEED_FPU_LOAD);
-		save_fpregs_to_fpstate(current->thread.fpu);
+		save_fpregs_to_fpstate(x86_task_fpu(current));
 	}
 	__cpu_invalidate_fpregs_state();
 
@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
  */
 void fpu_sync_fpstate(struct fpu *fpu)
 {
-	WARN_ON_FPU(fpu != current->thread.fpu);
+	WARN_ON_FPU(fpu != x86_task_fpu(current));
 
 	fpregs_lock();
 	trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
 static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 {
 	if (fpu_state_size_dynamic()) {
-		struct fpu *src_fpu = current->group_leader->thread.fpu;
+		struct fpu *src_fpu = x86_task_fpu(current->group_leader);
 
 		spin_lock_irq(&current->sighand->siglock);
 		/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
 	if (!ssp)
 		return 0;
 
-	xstate = get_xsave_addr(&dst->thread.fpu->fpstate->regs.xsave,
+	xstate = get_xsave_addr(&x86_task_fpu(dst)->fpstate->regs.xsave,
 				XFEATURE_CET_USER);
 
 	/*
@@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	 * This is safe because task_struct size is a multiple of cacheline size.
 	 */
 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
-	struct fpu *src_fpu = current->thread.fpu;
+	struct fpu *src_fpu = x86_task_fpu(current);
 
 	BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
 	BUG_ON(!src_fpu);
 
-	dst->thread.fpu = dst_fpu;
-
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
 
@@ -678,7 +676,7 @@ void fpu__drop(struct fpu *fpu)
 {
 	preempt_disable();
 
-	if (fpu == current->thread.fpu) {
+	if (fpu == x86_task_fpu(current)) {
 		/* Ignore delayed exceptions from user space */
 		asm volatile("1: fwait\n"
 			     "2:\n"
@@ -712,7 +710,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
  */
 static void fpu_reset_fpregs(void)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 
 	fpregs_lock();
 	__fpu_invalidate_fpregs_state(fpu);
@@ -741,7 +739,7 @@ static void fpu_reset_fpregs(void)
  */
 void fpu__clear_user_states(struct fpu *fpu)
 {
-	WARN_ON_FPU(fpu != current->thread.fpu);
+	WARN_ON_FPU(fpu != x86_task_fpu(current));
 
 	fpregs_lock();
 	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -774,7 +772,7 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 void fpu_flush_thread(void)
 {
-	fpstate_reset(current->thread.fpu);
+	fpstate_reset(x86_task_fpu(current));
 	fpu_reset_fpregs();
 }
 /*
@@ -815,7 +813,7 @@ void fpregs_lock_and_load(void)
  */
 void fpregs_assert_state_consistent(void)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		return;
@@ -827,7 +825,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
 
 void fpregs_mark_activate(void)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 
 	fpregs_activate(fpu);
 	fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index de618ec509aa..11aa31410df2 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
 	/* Flush out any pending x87 state: */
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU))
-		fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
+		fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
 	else
 #endif
 		asm volatile ("fninit");
@@ -78,7 +78,6 @@ static void __init fpu__init_system_early_generic(void)
 	int this_cpu = smp_processor_id();
 
 	fpstate_reset(&x86_init_fpu);
-	current->thread.fpu = &x86_init_fpu;
 	per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
 	x86_init_fpu.last_cpu = this_cpu;
 
@@ -165,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
 	 * Subtract off the static size of the register state.
 	 * It potentially has a bunch of padding.
 	 */
-	task_size -= sizeof(current->thread.fpu->__fpstate.regs);
+	task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
 
 	/*
 	 * Add back the dynamically-calculated register state
@@ -210,7 +209,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_kernel_cfg.default_size = size;
 	fpu_user_cfg.max_size = size;
 	fpu_user_cfg.default_size = size;
-	fpstate_reset(current->thread.fpu);
+	fpstate_reset(x86_task_fpu(current));
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 38bc0b390d02..19fd159217f7 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
  */
 static void sync_fpstate(struct fpu *fpu)
 {
-	if (fpu == current->thread.fpu)
+	if (fpu == x86_task_fpu(current))
 		fpu_sync_fpstate(fpu);
 }
 
@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
 	 * Only stopped child tasks can be used to modify the FPU
 	 * state in the fpstate buffer:
 	 */
-	WARN_ON_FPU(fpu == current->thread.fpu);
+	WARN_ON_FPU(fpu == x86_task_fpu(current));
 
 	__fpu_invalidate_fpregs_state(fpu);
 }
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
 int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
 		struct membuf to)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 
 	if (!cpu_feature_enabled(X86_FEATURE_FXSR))
 		return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
 		unsigned int pos, unsigned int count,
 		const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct fxregs_state newstate;
 	int ret;
 
@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
 	if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
 		return -ENODEV;
 
-	sync_fpstate(target->thread.fpu);
+	sync_fpstate(x86_task_fpu(target));
 
 	copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
 	return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 		  unsigned int pos, unsigned int count,
 		  const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct xregs_state *tmpbuf = NULL;
 	int ret;
 
@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
 int ssp_get(struct task_struct *target, const struct user_regset *regset,
 	    struct membuf to)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct cet_user_state *cetregs;
 
 	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
 	    unsigned int pos, unsigned int count,
 	    const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
 	struct cet_user_state *cetregs;
 	unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
 void
 convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 {
-	__convert_from_fxsr(env, tsk, &tsk->thread.fpu->fpstate->regs.fxsave);
+	__convert_from_fxsr(env, tsk, &x86_task_fpu(tsk)->fpstate->regs.fxsave);
 }
 
 void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
 int fpregs_get(struct task_struct *target, const struct user_regset *regset,
 	       struct membuf to)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct user_i387_ia32_struct env;
 	struct fxregs_state fxsave, *fx;
 
@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 	       unsigned int pos, unsigned int count,
 	       const void *kbuf, const void __user *ubuf)
 {
-	struct fpu *fpu = target->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(target);
 	struct user_i387_ia32_struct env;
 	int ret;
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 6b262d0b8fc1..b203bf4617fc 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	/* Check for the first magic field and other error scenarios. */
 	if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
 	    fx_sw->xstate_size < min_xstate_size ||
-	    fx_sw->xstate_size > current->thread.fpu->fpstate->user_size ||
+	    fx_sw->xstate_size > x86_task_fpu(current)->fpstate->user_size ||
 	    fx_sw->xstate_size > fx_sw->extended_size)
 		goto setfx;
 
@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	if (likely(magic2 == FP_XSTATE_MAGIC2))
 		return true;
 setfx:
-	trace_x86_fpu_xstate_check_failed(current->thread.fpu);
+	trace_x86_fpu_xstate_check_failed(x86_task_fpu(current));
 
 	/* Set the parameters for fx only state */
 	fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
 {
 	if (use_fxsr()) {
-		struct xregs_state *xsave = &tsk->thread.fpu->fpstate->regs.xsave;
+		struct xregs_state *xsave = &x86_task_fpu(tsk)->fpstate->regs.xsave;
 		struct user_i387_ia32_struct env;
 		struct _fpstate_32 __user *fp = buf;
 
 		fpregs_lock();
 		if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-			fxsave(&tsk->thread.fpu->fpstate->regs.fxsave);
+			fxsave(&x86_task_fpu(tsk)->fpstate->regs.fxsave);
 		fpregs_unlock();
 
 		convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct task_struct *tsk = current;
-	struct fpstate *fpstate = tsk->thread.fpu->fpstate;
+	struct fpstate *fpstate = x86_task_fpu(tsk)->fpstate;
 	bool ia32_fxstate = (buf != buf_fx);
 	int ret;
 
@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
  */
 static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 	int ret;
 
 	/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 			      bool ia32_fxstate)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = tsk->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(tsk);
 	struct user_i387_ia32_struct env;
 	bool success, fx_only = false;
 	union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
  */
 bool fpu__restore_sig(void __user *buf, int ia32_frame)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 	void __user *buf_fx = buf;
 	bool ia32_fxstate = false;
 	bool success = false;
@@ -503,7 +503,7 @@ unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size)
 {
-	unsigned long frame_size = xstate_sigframe_size(current->thread.fpu->fpstate);
+	unsigned long frame_size = xstate_sigframe_size(x86_task_fpu(current)->fpstate);
 
 	*buf_fx = sp = round_down(sp - frame_size, 64);
 	if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1d1f6599731b..90b11671e943 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
 	 */
 	init_fpstate.xfd = 0;
 
-	fpstate_reset(current->thread.fpu);
+	fpstate_reset(x86_task_fpu(current));
 }
 
 /*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 		goto out_disable;
 
 	/* Reset the state for the current task */
-	fpstate_reset(current->thread.fpu);
+	fpstate_reset(x86_task_fpu(current));
 
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
 	}
 
 	if (fpu_state_size_dynamic())
-		wrmsrl(MSR_IA32_XFD, current->thread.fpu->fpstate->xfd);
+		wrmsrl(MSR_IA32_XFD, x86_task_fpu(current)->fpstate->xfd);
 }
 
 /*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
 			     enum xstate_copy_mode copy_mode)
 {
-	__copy_xstate_to_uabi_buf(to, tsk->thread.fpu->fpstate,
-				  tsk->thread.fpu->fpstate->user_xfeatures,
+	__copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
+				  x86_task_fpu(tsk)->fpstate->user_xfeatures,
 				  tsk->thread.pkru, copy_mode);
 }
 
@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
 int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
 				      const void __user *ubuf)
 {
-	return copy_uabi_to_xstate(tsk->thread.fpu->fpstate, NULL, ubuf, &tsk->thread.pkru);
+	return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru);
 }
 
 static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
 	  * The XFD MSR does not match fpstate->xfd. That's invalid when
 	  * the passed in fpstate is current's fpstate.
 	  */
-	if (fpstate->xfd == current->thread.fpu->fpstate->xfd)
+	if (fpstate->xfd == x86_task_fpu(current)->fpstate->xfd)
 		return false;
 
 	/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
 static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 			   unsigned int usize, struct fpu_guest *guest_fpu)
 {
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 	struct fpstate *curfps, *newfps = NULL;
 	unsigned int fpsize;
 	bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 	 * AVX512.
 	 */
 	bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
-	struct fpu *fpu = current->group_leader->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current->group_leader);
 	struct fpu_state_perm *perm;
 	unsigned int ksize, usize;
 	u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
 		return -EPERM;
 	}
 
-	fpu = current->group_leader->thread.fpu;
+	fpu = x86_task_fpu(current->group_leader);
 	perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
 	ksize = perm->__state_size;
 	usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
  */
 static void avx512_status(struct seq_file *m, struct task_struct *task)
 {
-	unsigned long timestamp = READ_ONCE(task->thread.fpu->avx512_timestamp);
+	unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
 	long delta;
 
 	if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 1c720c376a69..391fa3c69e5e 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 
 static inline u64 xstate_get_group_perm(bool guest)
 {
-	struct fpu *fpu = current->group_leader->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current->group_leader);
 	struct fpu_state_perm *perm;
 
 	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 	 * internally, e.g. PKRU. That's user space ABI and also required
 	 * to allow the signal handler to modify PKRU.
 	 */
-	struct fpstate *fpstate = current->thread.fpu->fpstate;
+	struct fpstate *fpstate = x86_task_fpu(current)->fpstate;
 	u64 mask = fpstate->user_xfeatures;
 	u32 lmask;
 	u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
 	u32 hmask = mask >> 32;
 	int err;
 
-	xfd_validate_state(current->thread.fpu->fpstate, mask, true);
+	xfd_validate_state(x86_task_fpu(current)->fpstate, mask, true);
 
 	stac();
 	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5f3f48713870..4184c085627e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,8 +96,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #ifdef CONFIG_VM86
 	dst->thread.vm86 = NULL;
 #endif
-	/* Drop the copied pointer to current's fpstate */
-	dst->thread.fpu = NULL;
 
 	return 0;
 }
@@ -106,7 +104,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 void arch_release_task_struct(struct task_struct *tsk)
 {
 	if (fpu_state_size_dynamic())
-		fpstate_free(tsk->thread.fpu);
+		fpstate_free(x86_task_fpu(tsk));
 }
 #endif
 
@@ -116,7 +114,7 @@ void arch_release_task_struct(struct task_struct *tsk)
 void exit_thread(struct task_struct *tsk)
 {
 	struct thread_struct *t = &tsk->thread;
-	struct fpu *fpu = t->fpu;
+	struct fpu *fpu = x86_task_fpu(tsk);
 
 	if (test_thread_flag(TIF_IO_BITMAP))
 		io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 7e709ae8e99a..d7906f5979a4 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
 handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 {
 	bool stepping, failed;
-	struct fpu *fpu = current->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(current);
 
 	if (v8086_mode(regs))
 		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
 	if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
 		return true;
 
-	fsize += current->group_leader->thread.fpu->perm.__user_state_size;
+	fsize += x86_task_fpu(current->group_leader)->perm.__user_state_size;
 	if (likely(ss_size > fsize))
 		return true;
 
 	if (strict_sigaltstack_size)
 		return ss_size > fsize;
 
-	mask = current->group_leader->thread.fpu->perm.__state_perm;
+	mask = x86_task_fpu(current->group_leader)->perm.__state_perm;
 	if (mask & XFEATURE_MASK_USER_DYNAMIC)
 		return ss_size > fsize;
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c79e36c4071b..b19fb494c57c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
 static void math_error(struct pt_regs *regs, int trapnr)
 {
 	struct task_struct *task = current;
-	struct fpu *fpu = task->thread.fpu;
+	struct fpu *fpu = x86_task_fpu(task);
 	int si_code;
 	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
 						"simd exception";
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3509afc6a672..226244a894da 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,6 +170,10 @@ SECTIONS
 		/* equivalent to task_pt_regs(&init_task) */
 		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
 
+		__x86_init_fpu_begin = .;
+		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
+		__x86_init_fpu_end = .;
+
 #ifdef CONFIG_X86_32
 		/* 32 bit has nosave before _edata */
 		NOSAVE_DATA
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index 8087953164fc..5f253ae406b6 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)
 
 void finit(void)
 {
-	fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
+	fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
 }
 
 /*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 30400d95d9d0..5034df617740 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
 		    unsigned int pos, unsigned int count,
 		    const void *kbuf, const void __user *ubuf)
 {
-	struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
+	struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
 	void *space = s387->st_space;
 	int ret;
 	int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
 		    const struct user_regset *regset,
 		    struct membuf to)
 {
-	struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
+	struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
 	const void *space = s387->st_space;
 	int offset = (S387->ftop & 7) * 10, other = 80 - offset;
 
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 3417337e7d99..5e238e930fe3 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
 	return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
 }
 
-#define I387			(&current->thread.fpu->fpstate->regs)
+#define I387			(&x86_task_fpu(current)->fpstate->regs)
 #define FPU_info		(I387->soft.info)
 
 #define FPU_CS			(*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 9400c1c29fc8..1359ad75da3a 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 
 /*
  * Handler for when we fail to restore a task's FPU state.  We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu->state)
+ * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
  * should always be valid.  However, past bugs have allowed userspace to set
  * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
  * These caused XRSTOR to fail when switching to the task, leaking the FPU
-- 
2.43.0



* [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05  8:35 [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
  2024-06-05  8:35 ` [PATCH 1/3] x86/fpu: Make task_struct::thread constant size Ingo Molnar
  2024-06-05  8:35 ` [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
@ 2024-06-05  8:35 ` Ingo Molnar
  2024-06-05 14:17   ` Oleg Nesterov
  2024-06-24  6:47   ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ning, Hongyu
  2024-06-05 21:21 ` [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Brian Gerst
  3 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-05  8:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak

init_task's FPU state initialization was a bit of a hack:

		__x86_init_fpu_begin = .;
		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
		__x86_init_fpu_end = .;

But the init task isn't supposed to be using the FPU in any case,
so remove the hack and add in some debug warnings.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/ZgaNs1lC2Y+AnRG4@gmail.com
---
 arch/x86/include/asm/processor.h |  6 +++++-
 arch/x86/kernel/fpu/core.c       | 12 +++++++++---
 arch/x86/kernel/fpu/init.c       |  5 ++---
 arch/x86/kernel/fpu/xstate.c     |  3 ---
 arch/x86/kernel/vmlinux.lds.S    |  4 ----
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 249c5fa20de4..ed8981866f4d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
 #endif
 };
 
-#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#endif
 
 /*
  * X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 0ccabcd3bf62..fdc3b227800d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,15 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
  */
 DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+	WARN_ON_ONCE(task == &init_task);
+
+	return (void *)task + sizeof(*task);
+}
+#endif
+
 /*
  * Can we use the FPU in kernel mode with the
  * whole "kernel_fpu_begin/end()" sequence?
@@ -591,10 +600,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	 * This is safe because task_struct size is a multiple of cacheline size.
 	 */
 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
-	struct fpu *src_fpu = x86_task_fpu(current);
 
 	BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
-	BUG_ON(!src_fpu);
 
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
@@ -657,7 +664,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	if (update_fpu_shstk(dst, ssp))
 		return 1;
 
-	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
 	return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 11aa31410df2..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
 	/* Flush out any pending x87 state: */
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU))
-		fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+		;
 	else
 #endif
 		asm volatile ("fninit");
@@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
 	 * Subtract off the static size of the register state.
 	 * It potentially has a bunch of padding.
 	 */
-	task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
+	task_size -= sizeof(union fpregs_state);
 
 	/*
 	 * Add back the dynamically-calculated register state
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_kernel_cfg.default_size = size;
 	fpu_user_cfg.max_size = size;
 	fpu_user_cfg.default_size = size;
-	fpstate_reset(x86_task_fpu(current));
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	if (err)
 		goto out_disable;
 
-	/* Reset the state for the current task */
-	fpstate_reset(x86_task_fpu(current));
-
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
 		/* equivalent to task_pt_regs(&init_task) */
 		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
 
-		__x86_init_fpu_begin = .;
-		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
-		__x86_init_fpu_end = .;
-
 #ifdef CONFIG_X86_32
 		/* 32 bit has nosave before _edata */
 		NOSAVE_DATA
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-05  8:35 ` [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
@ 2024-06-05 13:38   ` Oleg Nesterov
  2024-06-06  8:53     ` Ingo Molnar
  2024-06-25  5:26   ` Edgecombe, Rick P
  1 sibling, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-06-05 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak

On 06/05, Ingo Molnar wrote:
>
> @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
>  	 * This is safe because task_struct size is a multiple of cacheline size.
>  	 */
>  	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> -	struct fpu *src_fpu = current->thread.fpu;
> +	struct fpu *src_fpu = x86_task_fpu(current);

I think this patch can also change

	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);

above to use x86_task_fpu(dst).

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05  8:35 ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ingo Molnar
@ 2024-06-05 14:17   ` Oleg Nesterov
  2024-06-05 16:08     ` Linus Torvalds
  2024-06-24  6:47   ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ning, Hongyu
  1 sibling, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-06-05 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak

On 06/05, Ingo Molnar wrote:
>
> But the init task isn't supposed to be using the FPU in any case,

Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?

> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> +	WARN_ON_ONCE(task == &init_task);
> +
> +	return (void *)task + sizeof(*task);
> +}

So perhaps WARN_ON_ONCE(task->flags & PF_KTHREAD) makes more sense?

Although in this case fpu_clone() can't use x86_task_fpu(dst) as I tried
to suggest in reply to 2/3.




This is offtopic right now, but I am wondering if it makes any sense to
make arch_task_struct_size depend on PF_KTHREAD... I mean, create another
task_struct_kthread_cachep used by alloc_task_struct_node() if PF_KTHREAD.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05 14:17   ` Oleg Nesterov
@ 2024-06-05 16:08     ` Linus Torvalds
  2024-06-05 16:26       ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2024-06-05 16:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

On Wed, 5 Jun 2024 at 07:19, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/05, Ingo Molnar wrote:
> >
> > But the init task isn't supposed to be using the FPU in any case,
>
> Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?

I don't think so. We have various users of kernel_fpu_begin()/end()
that are very much about things like crypto and RAID xor memory copies
etc that will be used by kernel worker threads.

In fact, as far as I know, we'll use the FPU in the init_task too
thanks to irq_fpu_usable(). Look up the  nft_pipapo_avx2_lookup()
function.

Maybe other patches removed that, I didn't check the context, but the
"init_task doesn't use FPU" doesn't actually sound true to me.

                Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05 16:08     ` Linus Torvalds
@ 2024-06-05 16:26       ` Oleg Nesterov
  2024-06-05 17:28         ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-06-05 16:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

On 06/05, Linus Torvalds wrote:
>
> On Wed, 5 Jun 2024 at 07:19, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 06/05, Ingo Molnar wrote:
> > >
> > > But the init task isn't supposed to be using the FPU in any case,
> >
> > Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?
>
> I don't think so. We have various users of kernel_fpu_begin()/end()
> that are very much about things like crypto and RAID xor memory copies
> etc that will be used by kernel worker threads.

Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
current->flags & PF_KTHREAD ?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05 16:26       ` Oleg Nesterov
@ 2024-06-05 17:28         ` Linus Torvalds
  2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2024-06-05 17:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

On Wed, 5 Jun 2024 at 09:27, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
> current->flags & PF_KTHREAD ?

Ahh, and init_task does have PF_KTHREAD.

Ok, looks fine to me, except I think the commit message should be cleared up.

The whole sentence about

 "But the init task isn't supposed to be using the FPU in any case ..."

is just simply not true.

It should be more along the lines of "kernel threads don't need an FPU
save area, because their FPU use is not preemptible or reentrant and
they don't return to user space".

                Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] x86/fpu: Make task_struct::thread constant size
  2024-06-05  8:35 ` [PATCH 1/3] x86/fpu: Make task_struct::thread constant size Ingo Molnar
@ 2024-06-05 19:04   ` Chang S. Bae
  2024-06-06  9:30     ` [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Chang S. Bae @ 2024-06-05 19:04 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak

On 6/5/2024 1:35 AM, Ingo Molnar wrote:
>   
>   /*
>    * Handler for when we fail to restore a task's FPU state.  We should never get
> - * here because the FPU state of a task using the FPU (task->thread.fpu.state)
> + * here because the FPU state of a task using the FPU (task->thread.fpu->state)

Just a nitpick:
	fpu::fpstate now points to the active FPU in-memory storage.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer
  2024-06-05  8:35 [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
                   ` (2 preceding siblings ...)
  2024-06-05  8:35 ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ingo Molnar
@ 2024-06-05 21:21 ` Brian Gerst
  2024-06-06  9:06   ` [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
  3 siblings, 1 reply; 30+ messages in thread
From: Brian Gerst @ 2024-06-05 21:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak

On Wed, Jun 5, 2024 at 4:36 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> This series is one of the dependencies of the fast-headers work,
> which aims to reduce header complexity by removing <asm/processor.h>
> from the <linux/sched.h> dependency chain; those headers are fat
> enough already even if we do not combine them.
>
> To achieve that decoupling, one of the key steps is to not embed any
> C types from <asm/processor.h> into task_struct.
>
> The only architecture that relies on that in a serious fashion is x86,
> via 'struct thread::fpu', and the series below attempts to resolve it
> by using a calculated &task->thread.fpu value via the x86_task_fpu()
> helper.
>
> Code generation: without CONFIG_X86_DEBUG_FPU=y we get a small reduction:
>
>    text        data        bss        dec         hex        filename
>    26475783    10439082    1740804    38655669    24dd6b5    vmlinux.before
>    26475601    10435146    1740804    38651551    24dc69f    vmlinux.after
>
> With the new debug code - which I think we'll remove soon-ish, there's an
> expected increase:
>
>    text        data        bss        dec         hex        filename
>    26476198    10439286    1740804    38656288    24dd920    vmlinux.CONFIG_X86_DEBUG_FPU.before
>    26477000    10435458    1740804    38653262    24dcd4e    vmlinux.CONFIG_X86_DEBUG_FPU.after
>
> The Git tree can be found at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu
>
> Thanks,
>
>     Ingo
>
> ===============>
> Ingo Molnar (3):
>   x86/fpu: Make task_struct::thread constant size
>   x86/fpu: Remove the thread::fpu pointer
>   x86/fpu: Remove init_task FPU state dependencies, add debugging warning
>
>  arch/x86/include/asm/fpu/sched.h |  2 +-
>  arch/x86/include/asm/processor.h | 19 ++++++++++---------
>  arch/x86/kernel/fpu/context.h    |  4 ++--
>  arch/x86/kernel/fpu/core.c       | 57 +++++++++++++++++++++++++++++++--------------------------
>  arch/x86/kernel/fpu/init.c       | 23 +++++++++++++----------
>  arch/x86/kernel/fpu/regset.c     | 22 +++++++++++-----------
>  arch/x86/kernel/fpu/signal.c     | 18 +++++++++---------
>  arch/x86/kernel/fpu/xstate.c     | 23 ++++++++++-------------
>  arch/x86/kernel/fpu/xstate.h     |  6 +++---
>  arch/x86/kernel/process.c        |  6 ++----
>  arch/x86/kernel/signal.c         |  6 +++---
>  arch/x86/kernel/traps.c          |  2 +-
>  arch/x86/math-emu/fpu_aux.c      |  2 +-
>  arch/x86/math-emu/fpu_entry.c    |  4 ++--
>  arch/x86/math-emu/fpu_system.h   |  2 +-
>  arch/x86/mm/extable.c            |  2 +-
>  include/linux/sched.h            | 13 +++----------
>  17 files changed, 104 insertions(+), 107 deletions(-)

This series would be better if you added the x86_task_fpu() helper in
an initial patch without any other changes.  That would make the
actual changes more visible with less code churn.

Brian Gerst

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
  2024-06-05 17:28         ` Linus Torvalds
@ 2024-06-06  8:30           ` Ingo Molnar
  2024-06-06  8:46             ` [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
                               ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  8:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 5 Jun 2024 at 09:27, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
> > current->flags & PF_KTHREAD ?
> 
> Ahh, and init_thread does have PF_KTHREAD.
> 
> Ok, looks fine to me, except I think the commit message should be cleared up.
> 
> The whole sentence about
> 
>  "But the init task isn't supposed to be using the FPU in any case ..."
> 
> is just simply not true.
>
> It should be more along the lines of "kernel threads don't need an FPU
> save area, because their FPU use is not preemptible or reentrant and
> they don't return to user space".

Indeed - I fixed & clarified the changelog accordingly - see the new patch 
further below.

I changed the debug check to test for PF_KTHREAD, and to return NULL:

+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+	if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+		return NULL;
+
+	return (void *)task + sizeof(*task);
+}
+#endif

... and the NULL we return will likely crash & exit any kthreads attempting 
to use the FPU context area - which I think is preferable to returning 
invalid memory that may then be corrupted.

Hopefully this remains a hypothetical concern. :-)

Alternatively, this may be one of the very few cases where a BUG_ON() might 
be justified? This condition is not recoverable in any sane fashion IMO.

Thanks,

	Ingo

============================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 5 Jun 2024 10:35:57 +0200
Subject: [PATCH] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks

init_task's FPU state initialization was a bit of a hack:

		__x86_init_fpu_begin = .;
		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
		__x86_init_fpu_end = .;

But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.

As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20240605083557.2051480-4-mingo@kernel.org
---
 arch/x86/include/asm/processor.h |  6 +++++-
 arch/x86/kernel/fpu/core.c       | 13 ++++++++++---
 arch/x86/kernel/fpu/init.c       |  5 ++---
 arch/x86/kernel/fpu/xstate.c     |  3 ---
 arch/x86/kernel/vmlinux.lds.S    |  4 ----
 5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 249c5fa20de4..ed8981866f4d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
 #endif
 };
 
-#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#endif
 
 /*
  * X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 0ccabcd3bf62..d9ff8ca5b32d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,16 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
  */
 DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+	if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+		return NULL;
+
+	return (void *)task + sizeof(*task);
+}
+#endif
+
 /*
  * Can we use the FPU in kernel mode with the
  * whole "kernel_fpu_begin/end()" sequence?
@@ -591,10 +601,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	 * This is safe because task_struct size is a multiple of cacheline size.
 	 */
 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
-	struct fpu *src_fpu = x86_task_fpu(current);
 
 	BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
-	BUG_ON(!src_fpu);
 
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
@@ -657,7 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 	if (update_fpu_shstk(dst, ssp))
 		return 1;
 
-	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
 	return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 11aa31410df2..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
 	/* Flush out any pending x87 state: */
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU))
-		fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+		;
 	else
 #endif
 		asm volatile ("fninit");
@@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
 	 * Subtract off the static size of the register state.
 	 * It potentially has a bunch of padding.
 	 */
-	task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
+	task_size -= sizeof(union fpregs_state);
 
 	/*
 	 * Add back the dynamically-calculated register state
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_kernel_cfg.default_size = size;
 	fpu_user_cfg.max_size = size;
 	fpu_user_cfg.default_size = size;
-	fpstate_reset(x86_task_fpu(current));
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	if (err)
 		goto out_disable;
 
-	/* Reset the state for the current task */
-	fpstate_reset(x86_task_fpu(current));
-
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
 		/* equivalent to task_pt_regs(&init_task) */
 		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
 
-		__x86_init_fpu_begin = .;
-		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
-		__x86_init_fpu_end = .;
-
 #ifdef CONFIG_X86_32
 		/* 32 bit has nosave before _edata */
 		NOSAVE_DATA

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
  2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
@ 2024-06-06  8:46             ` Ingo Molnar
  2024-06-06  8:47             ` [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  8:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/sched.h | 2 +-
 arch/x86/kernel/fpu/core.c       | 4 +++-
 arch/x86/kernel/process.c        | 3 +--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 1feaa68b7567..5fd12634bcc4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -10,7 +10,7 @@
 #include <asm/trace/fpu.h>
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
-extern void fpu__drop(struct fpu *fpu);
+extern void fpu__drop(struct task_struct *tsk);
 extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
 		      unsigned long shstk_addr);
 extern void fpu_flush_thread(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d9ff8ca5b32d..3223eb3dd09d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -679,8 +679,10 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
  * a state-restore is coming: either an explicit one,
  * or a reschedule.
  */
-void fpu__drop(struct fpu *fpu)
+void fpu__drop(struct task_struct *tsk)
 {
+	struct fpu *fpu = x86_task_fpu(tsk);
+
 	preempt_disable();
 
 	if (fpu == x86_task_fpu(current)) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4184c085627e..0a24997c8cc6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -114,7 +114,6 @@ void arch_release_task_struct(struct task_struct *tsk)
 void exit_thread(struct task_struct *tsk)
 {
 	struct thread_struct *t = &tsk->thread;
-	struct fpu *fpu = x86_task_fpu(tsk);
 
 	if (test_thread_flag(TIF_IO_BITMAP))
 		io_bitmap_exit(tsk);
@@ -122,7 +121,7 @@ void exit_thread(struct task_struct *tsk)
 	free_vm86(t);
 
 	shstk_free(tsk);
-	fpu__drop(fpu);
+	fpu__drop(tsk);
 }
 
 static int set_new_tls(struct task_struct *p, unsigned long tls)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit
  2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
  2024-06-06  8:46             ` [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
@ 2024-06-06  8:47             ` Ingo Molnar
  2024-06-06  8:48             ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
  2024-06-06 12:00             ` Oleg Nesterov
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  8:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

fpu__drop() calls x86_task_fpu() unconditionally, while the FPU context
area will not be present if it's the init task, and should not be in
use when it's some other type of kthread.

Return early for PF_KTHREAD tasks. The debug warning in x86_task_fpu()
will catch any kthreads attempting to use the FPU save area.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/fpu/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3223eb3dd09d..db608afa686f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -681,7 +681,13 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
  */
 void fpu__drop(struct task_struct *tsk)
 {
-	struct fpu *fpu = x86_task_fpu(tsk);
+	struct fpu *fpu;
+
+	/* PF_KTHREAD tasks do not use the FPU context area: */
+	if (tsk->flags & PF_KTHREAD)
+		return;
+
+	fpu = x86_task_fpu(tsk);
 
 	preempt_disable();
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
  2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
  2024-06-06  8:46             ` [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
  2024-06-06  8:47             ` [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
@ 2024-06-06  8:48             ` Ingo Molnar
  2024-06-06 12:00             ` Oleg Nesterov
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  8:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak


* Ingo Molnar <mingo@kernel.org> wrote:

> I changed the debug check to test for PF_KTHREAD, and to return NULL:
> 
> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> +	if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> +		return NULL;
> +
> +	return (void *)task + sizeof(*task);
> +}
> +#endif
> 
> ... and the NULL we return will likely crash & exit any kthreads attempting 
> to use the FPU context area - which I think is preferable to returning 
> invalid memory that may then be corrupted.
> 
> Hopefully this remains a hypothetical concern. :-)
> 
> Alternatively, this may be one of the very few cases where a BUG_ON() might 
> be justified? This condition is not recoverable in any sane fashion IMO.

And promptly this triggered in live testing, because while kthreads do not 
use the FPU context area, the current fpu__drop() code does call 
x86_task_fpu() even for kthreads...

See the two new 4/3 and 5/3 patches in this thread I've sent that clean 
this up:

  [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
  [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit

I'll also reorder the patches to apply these fpu__drop() changes before 
adding the debug warning.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-05 13:38   ` Oleg Nesterov
@ 2024-06-06  8:53     ` Ingo Molnar
  2024-06-08  6:55       ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  8:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 06/05, Ingo Molnar wrote:
> >
> > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> >  	 * This is safe because task_struct size is a multiple of cacheline size.
> >  	 */
> >  	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > -	struct fpu *src_fpu = current->thread.fpu;
> > +	struct fpu *src_fpu = x86_task_fpu(current);
> 
> I think this patch can also change
> 
> 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> 
> above to use x86_task_fpu(dst).

Yeah, so I'd prefer to keep it open coded, both because of the comment, and 
because the debug check makes a lot more sense if the pointer calculation is 
visible:

        /*
         * We allocate the new FPU structure right after the end of the task struct.
         * task allocation size already took this into account.
         *
         * This is safe because task_struct size is a multiple of cacheline size.
         */
        struct fpu *dst_fpu = (void *)dst + sizeof(*dst);

        BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);

But yes, you are right, this is technically an open-coded x86_task_fpu().

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
  2024-06-05 21:21 ` [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Brian Gerst
@ 2024-06-06  9:06   ` Ingo Molnar
  2024-06-06 15:35     ` Brian Gerst
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  9:06 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak


* Brian Gerst <brgerst@gmail.com> wrote:

> >  17 files changed, 104 insertions(+), 107 deletions(-)
> 
> This series would be better if you added the x86_task_fpu() helper in
> an initial patch without any other changes.  That would make the
> actual changes more visible with less code churn.

Makes sense - I've split out the patch below and adjusted the rest of the 
series. Is this what you had in mind?

Note that I also robustified the macro a bit:

 -# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
 +# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))

Thanks,

	Ingo

========================>
From: Ingo Molnar <mingo@kernel.org>
Date: Thu, 6 Jun 2024 11:01:14 +0200
Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method

The per-task FPU context/save area is allocated right
after the task_struct - introduce the x86_task_fpu()
helper that calculates its address explicitly from the
task pointer.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 920b0beebd11..fb6f030f0692 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,6 +507,8 @@ struct thread_struct {
 	struct fpu		*fpu;
 };
 
+#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+
 /*
  * X86 doesn't need any embedded-FPU-struct quirks:
  */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()
  2024-06-05 19:04   ` Chang S. Bae
@ 2024-06-06  9:30     ` Ingo Molnar
  2024-06-06 15:55       ` Chang S. Bae
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-06  9:30 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak


* Chang S. Bae <chang.seok.bae@intel.com> wrote:

> On 6/5/2024 1:35 AM, Ingo Molnar wrote:
> >   /*
> >    * Handler for when we fail to restore a task's FPU state.  We should never get
> > - * here because the FPU state of a task using the FPU (task->thread.fpu.state)
> > + * here because the FPU state of a task using the FPU (task->thread.fpu->state)
> 
> Just a nitpick:
> 	fpu::fpstate now points to the active FPU in-memory storage.

Yeah, that is a stale comment, but this was just a 'sed' job in essence, 
and I'd like to keep that particular patch semi-automated.

Note that later on that comment gets further mangled into:

/*
 * Handler for when we fail to restore a task's FPU state.  We should never get
 * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * should always be valid.  However, past bugs have allowed userspace to set
 * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
 * These caused XRSTOR to fail when switching to the task, leaking the FPU
 * registers of the task previously executing on the CPU.  Mitigate this class
 * of vulnerability by restoring from the initial state (essentially, zeroing
 * out all the FPU registers) if we can't restore from the task's FPU state.
 */

... which didn't improve clarity either. :-)

How about the patch below?

Thanks,

	Ingo

=========================>
From: Ingo Molnar <mingo@kernel.org>
Date: Thu, 6 Jun 2024 11:27:57 +0200
Subject: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()

The state -> fpstate rename of the fpu::fpstate field didn't
get propagated to the comment describing ex_handler_fprestore(),
fix it.

Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/extable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 1359ad75da3a..bf8dab18be97 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 
 /*
  * Handler for when we fail to restore a task's FPU state.  We should never get
- * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
+ * here because the FPU state of a task using the FPU (struct fpu::fpstate)
  * should always be valid.  However, past bugs have allowed userspace to set
  * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
  * These caused XRSTOR to fail when switching to the task, leaking the FPU

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
  2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
                               ` (2 preceding siblings ...)
  2024-06-06  8:48             ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
@ 2024-06-06 12:00             ` Oleg Nesterov
  2024-06-07 10:56               ` Ingo Molnar
  3 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-06-06 12:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak

On 06/06, Ingo Molnar wrote:
>
> I changed the debug check to test for PF_KTHREAD, and to return NULL:
>
> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> +	if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> +		return NULL;
> +
> +	return (void *)task + sizeof(*task);
> +}
> +#endif

How many users enable CONFIG_X86_DEBUG_FPU? Perhaps it makes sense
to check PF_KTHREAD unconditionally for a start, then add
if (IS_ENABLED(CONFIG_X86_DEBUG_FPU)). But I won't insist.


For the record, I think we can later change this code to check

	task->flags & (PF_KTHREAD | PF_USER_WORKER)

but I guess this needs some (simple) changes in the ptrace/coredump
paths.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
  2024-06-06  9:06   ` [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
@ 2024-06-06 15:35     ` Brian Gerst
  2024-06-07 11:38       ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Gerst @ 2024-06-06 15:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak

On Thu, Jun 6, 2024 at 5:06 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > >  17 files changed, 104 insertions(+), 107 deletions(-)
> >
> > This series would be better if you added the x86_task_fpu() helper in
> > an initial patch without any other changes.  That would make the
> > actual changes more visible with less code churn.
>
> Makes sense - I've split out the patch below and adjusted the rest of the
> series. Is this what you had in mind?
>
> Note that I also robustified the macro a bit:
>
>  -# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
>  +# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
>
> Thanks,
>
>         Ingo
>
> ========================>
> From: Ingo Molnar <mingo@kernel.org>
> Date: Thu, 6 Jun 2024 11:01:14 +0200
> Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
>
> The per-task FPU context/save area is allocated right
> after the task_struct - introduce the x86_task_fpu()
> helper that calculates its address explicitly from the
> task pointer.
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/include/asm/processor.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 920b0beebd11..fb6f030f0692 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -507,6 +507,8 @@ struct thread_struct {
>         struct fpu              *fpu;
>  };
>
> +#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> +
>  /*
>   * X86 doesn't need any embedded-FPU-struct quirks:
>   */

Since this should be the first patch in the series, it would be:

#define x86_task_fpu(task) (&(task)->thread.fpu)

along with converting the existing accesses to task->thread.fpu in one
patch with no other functional changes. Then you could change how the
fpu struct is allocated without touching every access site again.


Brian Gerst

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()
  2024-06-06  9:30     ` [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
@ 2024-06-06 15:55       ` Chang S. Bae
  0 siblings, 0 replies; 30+ messages in thread
From: Chang S. Bae @ 2024-06-06 15:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak

On 6/6/2024 2:30 AM, Ingo Molnar wrote:
> 
> From: Ingo Molnar <mingo@kernel.org>
> Date: Thu, 6 Jun 2024 11:27:57 +0200
> Subject: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()
> 
> The state -> fpstate rename of the fpu::fpstate field didn't
> get propagated to the comment describing ex_handler_fprestore(),
> fix it.
> 
> Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

Yeah, looks good to me. Feel free to add:

Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>

Thanks,
Chang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
  2024-06-06 12:00             ` Oleg Nesterov
@ 2024-06-07 10:56               ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-07 10:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, linux-kernel, Andy Lutomirski, Andrew Morton,
	Dave Hansen, Peter Zijlstra, Borislav Petkov, H . Peter Anvin,
	Thomas Gleixner, Uros Bizjak


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 06/06, Ingo Molnar wrote:
> >
> > I changed the debug check to test for PF_KTHREAD, and to return NULL:
> >
> > +#ifdef CONFIG_X86_DEBUG_FPU
> > +struct fpu *x86_task_fpu(struct task_struct *task)
> > +{
> > +	if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> > +		return NULL;
> > +
> > +	return (void *)task + sizeof(*task);
> > +}
> > +#endif
> 
> How many users enable CONFIG_X86_DEBUG_FPU?

Ubuntu does:

  kepler:~/tip> grep X86_DEBUG_FPU /boot/config-6.*
  /boot/config-6.8.0-35-generic:CONFIG_X86_DEBUG_FPU=y

Fedora doesn't:

  kepler:~/s/tmp> grep X86_DEBUG_FPU ./lib/modules/6.9.0-64.fc41.x86_64/config
  # CONFIG_X86_DEBUG_FPU is not set

So it's a bit hit-and-miss, but at least one major distribution enables it, 
which is all we really need.

> [...] Perhaps it makes sense to check PF_KTHREAD unconditionally for a 
> start, then add if (IS_ENABLED(CONFIG_X86_DEBUG_FPU)). But I won't insist.

I think Ubuntu alone ought to provide enough debug coverage - and this check 
isn't exactly cheap, given that it replaces a simple build-time structure 
offset with a full-blown function call ...

> For the record, I think we can later change this code to check
> 
> 	task->flags & (PF_KTHREAD | PF_USER_WORKER)
> 
> but I guess this needs some (simple) changes in the ptrace/coredump
> paths.

Sounds useful - would you be interested in cooking up a series on top of 
tip:master (or tip:WIP.x86/fpu), if you have the time? 

I'll send out one last iteration of this series today, otherwise the 
changes seem close to final for an upstream merge via the tip:x86/fpu tree.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
  2024-06-06 15:35     ` Brian Gerst
@ 2024-06-07 11:38       ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2024-06-07 11:38 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Oleg Nesterov, Thomas Gleixner, Uros Bizjak


* Brian Gerst <brgerst@gmail.com> wrote:

> On Thu, Jun 6, 2024 at 5:06 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Brian Gerst <brgerst@gmail.com> wrote:
> >
> > > >  17 files changed, 104 insertions(+), 107 deletions(-)
> > >
> > > This series would be better if you added the x86_task_fpu() helper in
> > > an initial patch without any other changes.  That would make the
> > > actual changes more visible with less code churn.
> >
> > Makes sense - I've split out the patch below and adjusted the rest of the
> > series. Is this what you had in mind?
> >
> > Note that I also robustified the macro a bit:
> >
> >  -# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
> >  +# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> >
> > Thanks,
> >
> >         Ingo
> >
> > ========================>
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Thu, 6 Jun 2024 11:01:14 +0200
> > Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
> >
> > The per-task FPU context/save area is allocated right
> > after the task_struct - introduce the x86_task_fpu()
> > helper that calculates its address explicitly from the
> > task pointer.
> >
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  arch/x86/include/asm/processor.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 920b0beebd11..fb6f030f0692 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -507,6 +507,8 @@ struct thread_struct {
> >         struct fpu              *fpu;
> >  };
> >
> > +#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> > +
> >  /*
> >   * X86 doesn't need any embedded-FPU-struct quirks:
> >   */
> 
> Since this should be the first patch in the series, it would be:
> 
> #define x86_task_fpu(task) (&(task)->thread.fpu)
> 
> along with converting the existing accesses to task->thread.fpu in one
> patch with no other functional changes. Then you could change how the
> fpu struct is allocated without touching every access site again.

Yeah, you are right of course - I've restructured the series accordingly, 
and it's indeed a lot easier to review internally now, but has the same end 
result. Will post it later today.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-06  8:53     ` Ingo Molnar
@ 2024-06-08  6:55       ` Ingo Molnar
  2024-06-08  7:26         ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-08  6:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > On 06/05, Ingo Molnar wrote:
> > >
> > > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> > >  	 * This is safe because task_struct size is a multiple of cacheline size.
> > >  	 */
> > >  	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > > -	struct fpu *src_fpu = current->thread.fpu;
> > > +	struct fpu *src_fpu = x86_task_fpu(current);
> > 
> > I think this patch can also change
> > 
> > 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > 
> > above to use x86_task_fpu(dst).
> 
> Yeah, so I'd prefer to keep it open coded, because of the comment and the 
> debug check makes a lot more sense if the pointer calculation is visible:

On a second thought I changed it to your suggested variant:

        struct fpu *src_fpu = x86_task_fpu(current);
        struct fpu *dst_fpu = x86_task_fpu(dst);

because you are right, it's in fact easier to read this way.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-08  6:55       ` Ingo Molnar
@ 2024-06-08  7:26         ` Ingo Molnar
  2024-06-08 10:10           ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2024-06-08  7:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> > 
> > > On 06/05, Ingo Molnar wrote:
> > > >
> > > > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> > > >  	 * This is safe because task_struct size is a multiple of cacheline size.
> > > >  	 */
> > > >  	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > > > -	struct fpu *src_fpu = current->thread.fpu;
> > > > +	struct fpu *src_fpu = x86_task_fpu(current);
> > > 
> > > I think this patch can also change
> > > 
> > > 	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > > 
> > > above to use x86_task_fpu(dst).
> > 
> > Yeah, so I'd prefer to keep it open coded, because of the comment and the 
> > debug check makes a lot more sense if the pointer calculation is visible:
> 
> On a second thought I changed it to your suggested variant:
> 
>         struct fpu *src_fpu = x86_task_fpu(current);
>         struct fpu *dst_fpu = x86_task_fpu(dst);
> 
> because you are right, it's in fact easier to read this way.

On a third thought, while more readable, this doesn't work in practice with 
the current scheme, because x86_task_fpu() gets called on kthreads in 
fpu_clone(), which trips up the new debugging code.

We could resolve it by special-casing PF_KTHREAD here too, but that weakens 
the whole readability argument. I'll leave it as-is for now.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-08  7:26         ` Ingo Molnar
@ 2024-06-08 10:10           ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-06-08 10:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andy Lutomirski, Andrew Morton, Dave Hansen,
	Peter Zijlstra, Borislav Petkov, H . Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Uros Bizjak

On 06/08, Ingo Molnar wrote:
>
> On a third thought, while more readable, this doesn't work in practice with
> the current scheme, because x86_task_fpu() gets called on kthreads in
> fpu_clone(), which trips up the new debugging code.

Yes, yes, I even mentioned this in reply to 3/3.

> We could resolve it by special-casing PF_KTHREAD here too, but that weakens
> the whole readability argument. I'll leave it as-is for now.

Agreed.

Note that PF_KTHREAD | PF_USER_WORKER is already a special case in fpu_clone(),
see the "if (minimal)" check. We can probably do more changes later; for example,
I don't even understand why we need to initialize dst_fpu->fpstate->regs, since
switch_fpu_return() is only called by arch_exit_to_user_mode_prepare().

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-05  8:35 ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ingo Molnar
  2024-06-05 14:17   ` Oleg Nesterov
@ 2024-06-24  6:47   ` Ning, Hongyu
  2024-06-27  3:50     ` Ning, Hongyu
  1 sibling, 1 reply; 30+ messages in thread
From: Ning, Hongyu @ 2024-06-24  6:47 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak, kirill.shutemov



On 2024/6/5 16:35, Ingo Molnar wrote:
> init_task's FPU state initialization was a bit of a hack:
> 
> 		__x86_init_fpu_begin = .;
> 		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
> 		__x86_init_fpu_end = .;
> 
> But the init task isn't supposed to be using the FPU in any case,
> so remove the hack and add in some debug warnings.
> 
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Link: https://lore.kernel.org/r/ZgaNs1lC2Y+AnRG4@gmail.com
> ---
>   arch/x86/include/asm/processor.h |  6 +++++-
>   arch/x86/kernel/fpu/core.c       | 12 +++++++++---
>   arch/x86/kernel/fpu/init.c       |  5 ++---
>   arch/x86/kernel/fpu/xstate.c     |  3 ---
>   arch/x86/kernel/vmlinux.lds.S    |  4 ----
>   5 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 249c5fa20de4..ed8981866f4d 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -504,7 +504,11 @@ struct thread_struct {
>   #endif
>   };
>   
> -#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
> +#ifdef CONFIG_X86_DEBUG_FPU
> +extern struct fpu *x86_task_fpu(struct task_struct *task);
> +#else
> +# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
> +#endif
>   
>   /*
>    * X86 doesn't need any embedded-FPU-struct quirks:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 0ccabcd3bf62..fdc3b227800d 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -51,6 +51,15 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
>    */
>   DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
>   
> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> +	WARN_ON_ONCE(task == &init_task);
> +
> +	return (void *)task + sizeof(*task);
> +}
> +#endif
> +
>   /*
>    * Can we use the FPU in kernel mode with the
>    * whole "kernel_fpu_begin/end()" sequence?
> @@ -591,10 +600,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
>   	 * This is safe because task_struct size is a multiple of cacheline size.
>   	 */
>   	struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> -	struct fpu *src_fpu = x86_task_fpu(current);
>   
>   	BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
> -	BUG_ON(!src_fpu);
>   
>   	/* The new task's FPU state cannot be valid in the hardware. */
>   	dst_fpu->last_cpu = -1;
> @@ -657,7 +664,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
>   	if (update_fpu_shstk(dst, ssp))
>   		return 1;
>   
> -	trace_x86_fpu_copy_src(src_fpu);
>   	trace_x86_fpu_copy_dst(dst_fpu);
>   
>   	return 0;
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index 11aa31410df2..53580e59e5db 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
>   	/* Flush out any pending x87 state: */
>   #ifdef CONFIG_MATH_EMULATION
>   	if (!boot_cpu_has(X86_FEATURE_FPU))
> -		fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
> +		;
>   	else
>   #endif
>   		asm volatile ("fninit");
> @@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
>   	 * Subtract off the static size of the register state.
>   	 * It potentially has a bunch of padding.
>   	 */
> -	task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
> +	task_size -= sizeof(union fpregs_state);
>   
>   	/*
>   	 * Add back the dynamically-calculated register state
> @@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
>   	fpu_kernel_cfg.default_size = size;
>   	fpu_user_cfg.max_size = size;
>   	fpu_user_cfg.default_size = size;
> -	fpstate_reset(x86_task_fpu(current));
>   }
>   
>   /*
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 90b11671e943..1f37da22ddbe 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>   	if (err)
>   		goto out_disable;
>   
> -	/* Reset the state for the current task */
> -	fpstate_reset(x86_task_fpu(current));
> -
>   	/*
>   	 * Update info used for ptrace frames; use standard-format size and no
>   	 * supervisor xstates:
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 226244a894da..3509afc6a672 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -170,10 +170,6 @@ SECTIONS
>   		/* equivalent to task_pt_regs(&init_task) */
>   		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
>   
> -		__x86_init_fpu_begin = .;
> -		. = __x86_init_fpu_begin + 128*PAGE_SIZE;
> -		__x86_init_fpu_end = .;
> -
>   #ifdef CONFIG_X86_32
>   		/* 32 bit has nosave before _edata */
>   		NOSAVE_DATA

Hi,

we've hit an x86/fpu related WARNING and a NULL pointer dereference during 
KVM/QEMU VM boot with the latest linux-next kernel. Bisection points to 
this commit - would you take a look?

A detailed description is at https://bugzilla.kernel.org/show_bug.cgi?id=218980

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-05  8:35 ` [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
  2024-06-05 13:38   ` Oleg Nesterov
@ 2024-06-25  5:26   ` Edgecombe, Rick P
  2024-06-25 13:45     ` Edgecombe, Rick P
  1 sibling, 1 reply; 30+ messages in thread
From: Edgecombe, Rick P @ 2024-06-25  5:26 UTC (permalink / raw)
  To: mingo@kernel.org, linux-kernel@vger.kernel.org
  Cc: akpm@linux-foundation.org, bp@alien8.de, peterz@infradead.org,
	hpa@zytor.com, luto@amacapital.net, tglx@linutronix.de,
	torvalds@linux-foundation.org, dave@sr71.net, oleg@redhat.com,
	ubizjak@gmail.com

On Wed, 2024-06-05 at 10:35 +0200, Ingo Molnar wrote:
> As suggested by Oleg, remove the thread::fpu pointer, as we can
> calculate it via x86_task_fpu() at compile-time.

I'm seeing boot failures in a TDX VM that bisect to this commit in tip
(807333522953). The host is running a pile of out-of-tree KVM patches, but the
nature of the change makes me wonder if it's not TDX-related. On the first bad
commit the failure looks like the log below; some of the later commits failed
with a more FPU-related stack trace. It also only shows up when I have lock
debugging on:
#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

If it's not obvious, I can investigate some more tomorrow on a more normal VM
configuration.

[    8.830714] ------------[ cut here ]------------
[    8.830714] DEBUG_LOCKS_WARN_ON(1)
[    8.830714] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:232
__lock_acquire+0xa5c/0x2120
[    8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S                
6.10.0-rc3-00004-g807333522953 #117
[    8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown
2/2/2022
[    8.830714] RIP: 0010:__lock_acquire+0xa5c/0x2120
[    8.830714] Code: b8 85 c0 0f 84 39 fd ff ff 8b 05 13 4d 92 01 85 c0 0f 85 2b
fd ff ff 48 c7 c6 c5 93 3e 82 48 c7 c7 66 aa 39 82 e8 24 34 f8 ff <0f> 0b 31 c0
44 8b 5d b8 e9 60 f7 ff ff 88 55 b0 44 89 5d b8 e8 4b
[    8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010086
[    8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[    8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[    8.830714] R10: ff11000437ffe000 R11: ff11000464000d78 R12: ffffffff826b57a8
[    8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[    8.830714] FS:  0000000000000000(0000) GS:ff11000427a00000(0000)
knlGS:0000000000000000
[    8.830714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.830714] CR2: ff1100047ffff000 CR3: 0000000002eda001 CR4: 0000000000771ef0
[    8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[    8.830714] PKRU: 55555554
[    8.830714] Call Trace:
[    8.830714]  <TASK>
[    8.830714]  ? show_regs+0x60/0x70
[    8.830714]  ? __warn+0x84/0x180
[    8.830714]  ? __lock_acquire+0xa5c/0x2120
[    8.830714]  ? report_bug+0x1f3/0x200
[    8.830714]  ? handle_bug+0x40/0x70
[    8.830714]  ? exc_invalid_op+0x19/0x70
[    8.830714]  ? asm_exc_invalid_op+0x1b/0x20
[    8.830714]  ? __lock_acquire+0xa5c/0x2120
[    8.830714]  ? __lock_acquire+0xa5c/0x2120
[    8.830714]  ? common_startup_64+0x13e/0x148
[    8.830714]  lock_acquire+0xc8/0x2e0
[    8.830714]  ? copy_process+0x107/0x2b80
[    8.830714]  ? __this_cpu_preempt_check+0x13/0x20
[    8.830714]  ? _raw_spin_lock_irq+0x4b/0x50
[    8.830714]  _raw_spin_lock_irq+0x37/0x50
[    8.830714]  ? copy_process+0x107/0x2b80
[    8.830714]  copy_process+0x107/0x2b80
[    8.830714]  ? find_held_lock+0x31/0x90
[    8.830714]  ? lock_release+0x130/0x290
[    8.830714]  ? _raw_spin_unlock_irqrestore+0x2c/0x60
[    8.830714]  ? find_held_lock+0x31/0x90
[    8.830714]  kernel_clone+0x97/0x3b0
[    8.830714]  ? __mutex_unlock_slowpath+0x3c/0x2a0
[    8.830714]  user_mode_thread+0x59/0x70
[    8.830714]  ? rest_init+0x190/0x190
[    8.830714]  rest_init+0x1e/0x190
[    8.830714]  start_kernel+0x672/0x790
[    8.830714]  x86_64_start_reservations+0x18/0x30
[    8.830714]  x86_64_start_kernel+0xd0/0xe0
[    8.830714]  common_startup_64+0x13e/0x148
[    8.830714]  </TASK>
[    8.830714] irq event stamp: 27428
[    8.830714] hardirqs last  enabled at (27427): [<ffffffff81dc236c>]
_raw_spin_unlock_irqrestore+0x2c/0x60
[    8.830714] hardirqs last disabled at (27428): [<ffffffff81dc208b>]
_raw_spin_lock_irq+0x4b/0x50
[    8.830714] softirqs last  enabled at (27360): [<ffffffff8116d93c>]
cgroup_idr_alloc.constprop.0+0x5c/0x100
[    8.830714] softirqs last disabled at (27358): [<ffffffff8116d916>]
cgroup_idr_alloc.constprop.0+0x36/0x100
[    8.830714] ---[ end trace 0000000000000000 ]---
[    8.830714] BUG: kernel NULL pointer dereference, address: 00000000000000c4
[    8.830714] #PF: supervisor read access in kernel mode
[    8.830714] #PF: error_code(0x0000) - not-present page
[    8.830714] PGD 0 
[    8.830714] Oops: Oops: 0000 [#1] PREEMPT SMP
[    8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S      W         
6.10.0-rc3-00004-g807333522953 #117
[    8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown
2/2/2022
[    8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[    8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a
03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4
00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[    8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[    8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[    8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[    8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[    8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[    8.830714] FS:  0000000000000000(0000) GS:ff11000427a00000(0000)
knlGS:0000000000000000
[    8.830714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[    8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[    8.830714] PKRU: 55555554
[    8.830714] Call Trace:
[    8.830714]  <TASK>
[    8.830714]  ? show_regs+0x60/0x70
[    8.830714]  ? __die+0x20/0x60
[    8.830714]  ? page_fault_oops+0x15a/0x480
[    8.830714]  ? __lock_acquire+0xa5c/0x2120
[    8.830714]  ? exc_page_fault+0x437/0x910
[    8.830714]  ? asm_exc_page_fault+0x27/0x30
[    8.830714]  ? __lock_acquire+0x1c9/0x2120
[    8.830714]  ? __lock_acquire+0xa5c/0x2120
[    8.830714]  ? common_startup_64+0x13e/0x148
[    8.830714]  lock_acquire+0xc8/0x2e0
[    8.830714]  ? copy_process+0x107/0x2b80
[    8.830714]  ? __this_cpu_preempt_check+0x13/0x20
[    8.830714]  ? _raw_spin_lock_irq+0x4b/0x50
[    8.830714]  _raw_spin_lock_irq+0x37/0x50
[    8.830714]  ? copy_process+0x107/0x2b80
[    8.830714]  copy_process+0x107/0x2b80
[    8.830714]  ? find_held_lock+0x31/0x90
[    8.830714]  ? lock_release+0x130/0x290
[    8.830714]  ? _raw_spin_unlock_irqrestore+0x2c/0x60
[    8.830714]  ? find_held_lock+0x31/0x90
[    8.830714]  kernel_clone+0x97/0x3b0
[    8.830714]  ? __mutex_unlock_slowpath+0x3c/0x2a0
[    8.830714]  user_mode_thread+0x59/0x70
[    8.830714]  ? rest_init+0x190/0x190
[    8.830714]  rest_init+0x1e/0x190
[    8.830714]  start_kernel+0x672/0x790
[    8.830714]  x86_64_start_reservations+0x18/0x30
[    8.830714]  x86_64_start_kernel+0xd0/0xe0
[    8.830714]  common_startup_64+0x13e/0x148
[    8.830714]  </TASK>
[    8.830714] CR2: 00000000000000c4
[    8.830714] ---[ end trace 0000000000000000 ]---
[    8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[    8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a
03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4
00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[    8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[    8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[    8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[    8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[    8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[    8.830714] FS:  0000000000000000(0000) GS:ff11000427a00000(0000)
knlGS:0000000000000000
[    8.830714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[    8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[    8.830714] PKRU: 55555554
[    8.830714] Kernel panic - not syncing: Attempted to kill the idle task!
[    8.830714] ---[ end Kernel panic - not syncing: Attempted to kill the idle
task! ]---



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
  2024-06-25  5:26   ` Edgecombe, Rick P
@ 2024-06-25 13:45     ` Edgecombe, Rick P
  0 siblings, 0 replies; 30+ messages in thread
From: Edgecombe, Rick P @ 2024-06-25 13:45 UTC (permalink / raw)
  To: mingo@kernel.org, linux-kernel@vger.kernel.org
  Cc: akpm@linux-foundation.org, bp@alien8.de, peterz@infradead.org,
	hpa@zytor.com, luto@amacapital.net, tglx@linutronix.de,
	torvalds@linux-foundation.org, dave@sr71.net, oleg@redhat.com,
	ubizjak@gmail.com

On Mon, 2024-06-24 at 22:26 -0700, Rick Edgecombe wrote:
> 
> If it's not obvious, I can investigate some more tomorrow on a more normal VM
> configuration.

I see the same issue on a normal VM. CONFIG_DEBUG_LOCK_ALLOC seems to be the
critical config.


* Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning
  2024-06-24  6:47   ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ning, Hongyu
@ 2024-06-27  3:50     ` Ning, Hongyu
  0 siblings, 0 replies; 30+ messages in thread
From: Ning, Hongyu @ 2024-06-27  3:50 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Andy Lutomirski, Andrew Morton, Dave Hansen, Peter Zijlstra,
	Borislav Petkov, H . Peter Anvin, Linus Torvalds, Oleg Nesterov,
	Thomas Gleixner, Uros Bizjak, kirill.shutemov



On 2024/6/24 14:47, Ning, Hongyu wrote:
> 
> 
> On 2024/6/5 16:35, Ingo Molnar wrote:
>> init_task's FPU state initialization was a bit of a hack:
>>
>>         __x86_init_fpu_begin = .;
>>         . = __x86_init_fpu_begin + 128*PAGE_SIZE;
>>         __x86_init_fpu_end = .;
>>
>> But the init task isn't supposed to be using the FPU in any case,
>> so remove the hack and add in some debug warnings.
>>
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Fenghua Yu <fenghua.yu@intel.com>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Uros Bizjak <ubizjak@gmail.com>
>> Link: https://lore.kernel.org/r/ZgaNs1lC2Y+AnRG4@gmail.com
>> ---
>>   arch/x86/include/asm/processor.h |  6 +++++-
>>   arch/x86/kernel/fpu/core.c       | 12 +++++++++---
>>   arch/x86/kernel/fpu/init.c       |  5 ++---
>>   arch/x86/kernel/fpu/xstate.c     |  3 ---
>>   arch/x86/kernel/vmlinux.lds.S    |  4 ----
>>   5 files changed, 16 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 249c5fa20de4..ed8981866f4d 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -504,7 +504,11 @@ struct thread_struct {
>>   #endif
>>   };
>> -#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
>> +#ifdef CONFIG_X86_DEBUG_FPU
>> +extern struct fpu *x86_task_fpu(struct task_struct *task);
>> +#else
>> +# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
>> +#endif
>>   /*
>>    * X86 doesn't need any embedded-FPU-struct quirks:
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index 0ccabcd3bf62..fdc3b227800d 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -51,6 +51,15 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
>>    */
>>   DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
>> +#ifdef CONFIG_X86_DEBUG_FPU
>> +struct fpu *x86_task_fpu(struct task_struct *task)
>> +{
>> +    WARN_ON_ONCE(task == &init_task);
>> +
>> +    return (void *)task + sizeof(*task);
>> +}
>> +#endif
>> +
>>   /*
>>    * Can we use the FPU in kernel mode with the
>>    * whole "kernel_fpu_begin/end()" sequence?
>> @@ -591,10 +600,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
>>        * This is safe because task_struct size is a multiple of cacheline size.
>>        */
>>       struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
>> -    struct fpu *src_fpu = x86_task_fpu(current);
>>       BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
>> -    BUG_ON(!src_fpu);
>>       /* The new task's FPU state cannot be valid in the hardware. */
>>       dst_fpu->last_cpu = -1;
>> @@ -657,7 +664,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
>>       if (update_fpu_shstk(dst, ssp))
>>           return 1;
>> -    trace_x86_fpu_copy_src(src_fpu);
>>       trace_x86_fpu_copy_dst(dst_fpu);
>>       return 0;
>> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
>> index 11aa31410df2..53580e59e5db 100644
>> --- a/arch/x86/kernel/fpu/init.c
>> +++ b/arch/x86/kernel/fpu/init.c
>> @@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
>>       /* Flush out any pending x87 state: */
>>   #ifdef CONFIG_MATH_EMULATION
>>       if (!boot_cpu_has(X86_FEATURE_FPU))
>> -        fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
>> +        ;
>>       else
>>   #endif
>>           asm volatile ("fninit");
>> @@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
>>        * Subtract off the static size of the register state.
>>        * It potentially has a bunch of padding.
>>        */
>> -    task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
>> +    task_size -= sizeof(union fpregs_state);
>>       /*
>>        * Add back the dynamically-calculated register state
>> @@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
>>       fpu_kernel_cfg.default_size = size;
>>       fpu_user_cfg.max_size = size;
>>       fpu_user_cfg.default_size = size;
>> -    fpstate_reset(x86_task_fpu(current));
>>   }
>>   /*
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 90b11671e943..1f37da22ddbe 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>>       if (err)
>>           goto out_disable;
>> -    /* Reset the state for the current task */
>> -    fpstate_reset(x86_task_fpu(current));
>> -
>>       /*
>>        * Update info used for ptrace frames; use standard-format size and no
>>        * supervisor xstates:
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index 226244a894da..3509afc6a672 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -170,10 +170,6 @@ SECTIONS
>>           /* equivalent to task_pt_regs(&init_task) */
>>          __top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
>> -        __x86_init_fpu_begin = .;
>> -        . = __x86_init_fpu_begin + 128*PAGE_SIZE;
>> -        __x86_init_fpu_end = .;
>> -
>>   #ifdef CONFIG_X86_32
>>           /* 32 bit has nosave before _edata */
>>           NOSAVE_DATA
> 
> Hi,
> 
> we've hit x86/fpu related WARNING and NULL pointer issue during KVM/QEMU 
> VM booting with latest linux-next kernel, bisect results show it's 
> related to this commit, would you take a look?
> 
> detailed description in https://bugzilla.kernel.org/show_bug.cgi?id=218980
> 

A quick update:
1. CONFIG_X86_DEBUG_FPU=y was set by our automated regression framework.
2. Disabling CONFIG_X86_DEBUG_FPU avoids the above WARNING and the NULL
pointer dereference.

Since it may not make sense for general kernel regression checks to enable
CONFIG_X86_DEBUG_FPU=y, we will revise the framework to keep it disabled.

In the meantime, please let me know whether this issue is still worth
looking into.


end of thread, other threads:[~2024-06-27  3:51 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2024-06-05  8:35 [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
2024-06-05  8:35 ` [PATCH 1/3] x86/fpu: Make task_struct::thread constant size Ingo Molnar
2024-06-05 19:04   ` Chang S. Bae
2024-06-06  9:30     ` [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore() Ingo Molnar
2024-06-06 15:55       ` Chang S. Bae
2024-06-05  8:35 ` [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
2024-06-05 13:38   ` Oleg Nesterov
2024-06-06  8:53     ` Ingo Molnar
2024-06-08  6:55       ` Ingo Molnar
2024-06-08  7:26         ` Ingo Molnar
2024-06-08 10:10           ` Oleg Nesterov
2024-06-25  5:26   ` Edgecombe, Rick P
2024-06-25 13:45     ` Edgecombe, Rick P
2024-06-05  8:35 ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ingo Molnar
2024-06-05 14:17   ` Oleg Nesterov
2024-06-05 16:08     ` Linus Torvalds
2024-06-05 16:26       ` Oleg Nesterov
2024-06-05 17:28         ` Linus Torvalds
2024-06-06  8:30           ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
2024-06-06  8:46             ` [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
2024-06-06  8:47             ` [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit Ingo Molnar
2024-06-06  8:48             ` [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
2024-06-06 12:00             ` Oleg Nesterov
2024-06-07 10:56               ` Ingo Molnar
2024-06-24  6:47   ` [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning Ning, Hongyu
2024-06-27  3:50     ` Ning, Hongyu
2024-06-05 21:21 ` [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer Brian Gerst
2024-06-06  9:06   ` [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
2024-06-06 15:35     ` Brian Gerst
2024-06-07 11:38       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox