From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>,
Eranian Stephane <eranian@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Zide Chen <zide.chen@intel.com>,
Falcon Thomas <thomas.falcon@intel.com>,
Dapeng Mi <dapeng1.mi@intel.com>,
Xudong Hao <xudong.hao@intel.com>,
Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: [Patch v7 09/24] x86/fpu: Ensure TIF_NEED_FPU_LOAD is set after saving FPU state
Date: Tue, 24 Mar 2026 08:41:03 +0800 [thread overview]
Message-ID: <20260324004118.3772171-10-dapeng1.mi@linux.intel.com> (raw)
In-Reply-To: <20260324004118.3772171-1-dapeng1.mi@linux.intel.com>
Following Peter and Dave's suggestion, Ensure that the TIF_NEED_FPU_LOAD
flag is always set after saving the FPU state. This guarantees that the
user space FPU state has been saved whenever the TIF_NEED_FPU_LOAD flag
is set.
A subsequent patch will verify if the user space FPU state can be
retrieved from the saved task FPU state in the NMI context by checking
the TIF_NEED_FPU_LOAD flag.
Please check the below link to get more background about the suggestion.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20251204154721.GB2619703@noisy.programming.kicks-ass.net/
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
V7: Add wrapper helper update_fpu_state_and_flag() and corresponding
comments.
arch/x86/include/asm/fpu/sched.h | 5 +++--
arch/x86/kernel/fpu/core.c | 27 ++++++++++++++++++++-------
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 89004f4ca208..dcb2fa5f06d6 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -10,6 +10,8 @@
#include <asm/trace/fpu.h>
extern void save_fpregs_to_fpstate(struct fpu *fpu);
+extern void update_fpu_state_and_flag(struct fpu *fpu,
+ struct task_struct *task);
extern void fpu__drop(struct task_struct *tsk);
extern int fpu_clone(struct task_struct *dst, u64 clone_flags, bool minimal,
unsigned long shstk_addr);
@@ -36,8 +38,7 @@ static inline void switch_fpu(struct task_struct *old, int cpu)
!(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
struct fpu *old_fpu = x86_task_fpu(old);
- set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(old_fpu);
+ update_fpu_state_and_flag(old_fpu, old);
/*
* The save operation preserved register state, so the
* fpu_fpregs_owner_ctx is still @old_fpu. Store the
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 608983806fd7..48d1ab50a961 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -213,6 +213,19 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
}
}
+/*
+ * Save the FPU register state in fpu->fpstate->regs and set
+ * TIF_NEED_FPU_LOAD subsequently.
+ *
+ * Must be called with fpregs_lock() held, ensuring flag
+ * TIF_NEED_FPU_LOAD is set last.
+ */
+void update_fpu_state_and_flag(struct fpu *fpu, struct task_struct *task)
+{
+ save_fpregs_to_fpstate(fpu);
+ set_tsk_thread_flag(task, TIF_NEED_FPU_LOAD);
+}
+
void fpu_reset_from_exception_fixup(void)
{
restore_fpregs_from_fpstate(&init_fpstate, XFEATURE_MASK_FPSTATE);
@@ -379,17 +392,19 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
fpregs_lock();
if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD))
- save_fpregs_to_fpstate(fpu);
+ update_fpu_state_and_flag(fpu, current);
/* Swap fpstate */
if (enter_guest) {
- fpu->__task_fpstate = cur_fps;
+ WRITE_ONCE(fpu->__task_fpstate, cur_fps);
+ barrier();
fpu->fpstate = guest_fps;
guest_fps->in_use = true;
} else {
guest_fps->in_use = false;
fpu->fpstate = fpu->__task_fpstate;
- fpu->__task_fpstate = NULL;
+ barrier();
+ WRITE_ONCE(fpu->__task_fpstate, NULL);
}
cur_fps = fpu->fpstate;
@@ -481,10 +496,8 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
this_cpu_write(kernel_fpu_allowed, false);
if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
- !test_thread_flag(TIF_NEED_FPU_LOAD)) {
- set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(x86_task_fpu(current));
- }
+ !test_thread_flag(TIF_NEED_FPU_LOAD))
+ update_fpu_state_and_flag(x86_task_fpu(current), current);
__cpu_invalidate_fpregs_state();
/* Put sane initial values into the control registers. */
--
2.34.1
next prev parent reply other threads:[~2026-03-24 0:46 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-24 0:40 [Patch v7 00/24] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2026-03-24 0:40 ` [Patch v7 01/24] perf/x86: Move hybrid PMU initialization before x86_pmu_starting_cpu() Dapeng Mi
2026-03-24 0:40 ` [Patch v7 02/24] perf/x86/intel: Avoid PEBS event on fixed counters without extended PEBS Dapeng Mi
2026-03-24 0:40 ` [Patch v7 03/24] perf/x86/intel: Enable large PEBS sampling for XMMs Dapeng Mi
2026-03-24 0:40 ` [Patch v7 04/24] perf/x86/intel: Convert x86_perf_regs to per-cpu variables Dapeng Mi
2026-03-24 0:40 ` [Patch v7 05/24] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2026-03-24 0:41 ` [Patch v7 06/24] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2026-03-24 0:41 ` [Patch v7 07/24] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2026-03-25 5:18 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 08/24] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2026-03-24 0:41 ` Dapeng Mi [this message]
2026-03-24 0:41 ` [Patch v7 10/24] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2026-03-24 0:41 ` [Patch v7 11/24] perf/x86: Enable XMM Register Sampling for Non-PEBS Events Dapeng Mi
2026-03-25 7:30 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 12/24] perf/x86: Enable XMM register sampling for REGS_USER case Dapeng Mi
2026-03-25 7:58 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 13/24] perf: Add sampling support for SIMD registers Dapeng Mi
2026-03-25 8:44 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 14/24] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2026-03-25 9:01 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 15/24] perf/x86: Enable YMM " Dapeng Mi
2026-03-24 0:41 ` [Patch v7 16/24] perf/x86: Enable ZMM " Dapeng Mi
2026-03-24 0:41 ` [Patch v7 17/24] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2026-03-24 0:41 ` [Patch v7 18/24] perf: Enhance perf_reg_validate() with simd_enabled argument Dapeng Mi
2026-03-24 0:41 ` [Patch v7 19/24] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2026-03-24 0:41 ` [Patch v7 20/24] perf/x86: Enable SSP " Dapeng Mi
2026-03-25 9:25 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 21/24] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2026-04-25 2:01 ` sashiko-bot
2026-04-29 5:25 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 22/24] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2026-04-25 3:08 ` sashiko-bot
2026-04-29 5:36 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 23/24] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2026-04-25 3:31 ` sashiko-bot
2026-04-29 6:00 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 24/24] perf/x86/intel: Add sanity check for PEBS fragment size Dapeng Mi
2026-04-25 3:53 ` sashiko-bot
2026-04-29 7:04 ` Mi, Dapeng
2026-03-24 1:08 ` [Patch v7 00/24] Support SIMD/eGPRs/SSP registers sampling for perf Mi, Dapeng
2026-03-25 9:41 ` Mi, Dapeng
2026-05-13 5:52 ` Mi, Dapeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260324004118.3772171-10-dapeng1.mi@linux.intel.com \
--to=dapeng1.mi@linux.intel.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=ak@linux.intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=broonie@kernel.org \
--cc=dapeng1.mi@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=eranian@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=ravi.bangoria@amd.com \
--cc=tglx@linutronix.de \
--cc=thomas.falcon@intel.com \
--cc=xudong.hao@intel.com \
--cc=zide.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.