All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Andy Lutomirski <luto@amacapital.net>,
	Dave Hansen <dave@sr71.net>, Brian Gerst <brgerst@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>, "H . Peter Anvin" <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Chang S . Bae" <chang.seok.bae@intel.com>
Subject: [PATCH tip/x86/fpu 1/6] x86/fpu: simplify the switch_fpu_prepare() + switch_fpu_finish() logic
Date: Sat, 3 May 2025 16:38:30 +0200	[thread overview]
Message-ID: <20250503143830.GA8982@redhat.com> (raw)
In-Reply-To: <20250409211127.3544993-1-mingo@kernel.org>

Now that switch_fpu_finish() doesn't load the FPU state, it makes more
sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and
more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD.
It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD)
until "prev_p" is scheduled again.

There is no worry about the very first context switch, fpu_clone() must
always set TIF_NEED_FPU_LOAD.

Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers
to switch_fpu().

Note that the "PF_KTHREAD | PF_USER_WORKER" check can be removed but
this deserves a separate patch which can change more functions, say,
kernel_fpu_begin_mask().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/fpu/sched.h | 34 +++++++++-----------------------
 arch/x86/kernel/process_32.c     |  5 +----
 arch/x86/kernel/process_64.c     |  5 +----
 3 files changed, 11 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 5fd12634bcc4..c060549c6c94 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -18,31 +18,25 @@ extern void fpu_flush_thread(void);
 /*
  * FPU state switching for scheduling.
  *
- * This is a two-stage process:
+ * switch_fpu() saves the old state and sets TIF_NEED_FPU_LOAD if
+ * TIF_NEED_FPU_LOAD is not set.  This is done within the context
+ * of the old process.
  *
- *  - switch_fpu_prepare() saves the old state.
- *    This is done within the context of the old process.
- *
- *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
- *    will get loaded on return to userspace, or when the kernel needs it.
- *
- * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
- * are saved in the current thread's FPU register state.
- *
- * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
- * hold current()'s FPU registers. It is required to load the
+ * Once TIF_NEED_FPU_LOAD is set, it is required to load the
  * registers before returning to userland or using the content
  * otherwise.
  *
  * The FPU context is only stored/restored for a user task and
  * PF_KTHREAD is used to distinguish between kernel and user threads.
  */
-static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
+static inline void switch_fpu(struct task_struct *old, int cpu)
 {
-	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
+	if (!test_tsk_thread_flag(old, TIF_NEED_FPU_LOAD) &&
+	    cpu_feature_enabled(X86_FEATURE_FPU) &&
 	    !(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		struct fpu *old_fpu = x86_task_fpu(old);
 
+		set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
 		save_fpregs_to_fpstate(old_fpu);
 		/*
 		 * The save operation preserved register state, so the
@@ -50,7 +44,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
 		 * current CPU number in @old_fpu, so the next return
 		 * to user space can avoid the FPU register restore
 		 * when is returns on the same CPU and still owns the
-		 * context.
+		 * context. See fpregs_restore_userregs().
 		 */
 		old_fpu->last_cpu = cpu;
 
@@ -58,14 +52,4 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
 	}
 }
 
-/*
- * Delay loading of the complete FPU state until the return to userland.
- * PKRU is handled separately.
- */
-static inline void switch_fpu_finish(struct task_struct *new)
-{
-	if (cpu_feature_enabled(X86_FEATURE_FPU))
-		set_tsk_thread_flag(new, TIF_NEED_FPU_LOAD);
-}
-
 #endif /* _ASM_X86_FPU_SCHED_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4636ef359973..9bd4fa694da5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -160,8 +160,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	if (!test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_p, cpu);
+	switch_fpu(prev_p, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -208,8 +207,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	raw_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_p);
-
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in(next_p);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 7196ca7048be..d55310d3133c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -616,8 +616,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(hardirq_stack_inuse));
 
-	if (!test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_p, cpu);
+	switch_fpu(prev_p, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -671,8 +670,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	raw_cpu_write(current_task, next_p);
 	raw_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_p);
-
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
-- 
2.25.1.362.g51ebf55


  parent reply	other threads:[~2025-05-03 14:39 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-09 21:11 [PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu Ingo Molnar
2025-04-09 21:11 ` [PATCH 1/8] x86/fpu: Introduce the x86_task_fpu() helper method Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 2/8] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu() Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 3/8] x86/fpu: Make task_struct::thread constant size Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 4/8] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
2025-04-10  7:39   ` Peter Zijlstra
2025-04-10 10:10     ` Ingo Molnar
2025-04-10 10:30       ` Peter Zijlstra
2025-04-10 10:54         ` [PATCH] x86/fpu: Clarify FPU context cacheline alignment Ingo Molnar
2025-04-14  7:34           ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-10 10:51       ` [PATCH 4/8] x86/fpu: Remove the thread::fpu pointer Ingo Molnar
2025-04-10 14:04       ` Oleg Nesterov
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 5/8] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 6/8] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD|PF_USER_WORKER tasks during exit Ingo Molnar
2025-04-11 15:22   ` Chang S. Bae
2025-04-12  8:37     ` Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 7/8] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-09 21:11 ` [PATCH 8/8] x86/fpu: Use 'fpstate' variable names consistently Ingo Molnar
2025-04-14  7:34   ` [tip: x86/merge] " tip-bot2 for Ingo Molnar
2025-04-22 16:11 ` [PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu Oleg Nesterov
2025-04-22 20:09   ` Ingo Molnar
2025-04-22 17:01 ` question about switch_fpu_prepare/switch_fpu_finish Oleg Nesterov
2025-04-22 20:11   ` Ingo Molnar
2025-05-03 14:38 ` Oleg Nesterov [this message]
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Simplify the switch_fpu_prepare() + switch_fpu_finish() logic tip-bot2 for Oleg Nesterov
2025-05-03 14:38 ` [PATCH tip/x86/fpu 2/6] x86/fpu: kill x86_init_fpu Oleg Nesterov
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Remove x86_init_fpu tip-bot2 for Oleg Nesterov
2025-05-03 14:38 ` [PATCH tip/x86/fpu 3/6] x86/fpu: kill DEFINE_EVENT(x86_fpu, x86_fpu_copy_src) Oleg Nesterov
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Remove " tip-bot2 for Oleg Nesterov
2025-05-03 14:38 ` [PATCH tip/x86/fpu 4/6] x86/fpu: arch_dup_task_struct: always use memcpy_and_pad() Oleg Nesterov
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Always use memcpy_and_pad() in arch_dup_task_struct() tip-bot2 for Oleg Nesterov
2025-05-03 14:38 ` [PATCH tip/x86/fpu 5/6] x86/fpu: fpu__drop: check TIF_NEED_FPU_LOAD instead of PF_KTHREAD|PF_USER_WORKER Oleg Nesterov
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Check TIF_NEED_FPU_LOAD instead of PF_KTHREAD|PF_USER_WORKER in fpu__drop() tip-bot2 for Oleg Nesterov
2025-05-03 14:39 ` [PATCH tip/x86/fpu 6/6] x86/fpu: shift fpregs_assert_state_consistent() from arch_exit_work() to its caller Oleg Nesterov
2025-05-04  8:36   ` Ingo Molnar
2025-05-04  8:54   ` [tip: x86/fpu] x86/fpu: Shift " tip-bot2 for Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250503143830.GA8982@redhat.com \
    --to=oleg@redhat.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=chang.seok.bae@intel.com \
    --cc=dave@sr71.net \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.