From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>,
Eranian Stephane <eranian@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Zide Chen <zide.chen@intel.com>,
Falcon Thomas <thomas.falcon@intel.com>,
Dapeng Mi <dapeng1.mi@intel.com>,
Xudong Hao <xudong.hao@intel.com>,
Dapeng Mi <dapeng1.mi@linux.intel.com>,
Kan Liang <kan.liang@linux.intel.com>
Subject: [Patch v7 12/24] perf/x86: Enable XMM register sampling for REGS_USER case
Date: Tue, 24 Mar 2026 08:41:06 +0800 [thread overview]
Message-ID: <20260324004118.3772171-13-dapeng1.mi@linux.intel.com> (raw)
In-Reply-To: <20260324004118.3772171-1-dapeng1.mi@linux.intel.com>
This patch adds support for XMM register sampling in the REGS_USER case.
To handle simultaneous sampling of XMM registers for both REGS_INTR and
REGS_USER cases, a per-CPU `x86_user_regs` is introduced to store
REGS_USER-specific XMM registers. This prevents REGS_USER-specific XMM
register data from being overwritten by REGS_INTR-specific data if they
share the same `x86_perf_regs` structure.
To sample user-space XMM registers, the `x86_pmu_update_user_ext_regs()`
helper function is added. It checks if the `TIF_NEED_FPU_LOAD` flag is
set. If so, the user-space XMM register data can be directly retrieved
from the cached task FPU state, as the corresponding hardware registers
have been cleared or switched to kernel-space data. Otherwise, the data
must be read from the hardware registers using the `xsaves` instruction.
For PEBS events, `x86_pmu_update_user_ext_regs()` checks if the
PEBS-sampled XMM register data belongs to user-space. If so, no further
action is needed. Otherwise, the user-space XMM register data needs to be
re-sampled using the same method as for non-PEBS events.
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
arch/x86/events/core.c | 95 ++++++++++++++++++++++++++++++++++++------
1 file changed, 82 insertions(+), 13 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 22965a8a22b3..a5643c875190 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -696,7 +696,7 @@ int x86_pmu_hw_config(struct perf_event *event)
return -EINVAL;
}
- if (event->attr.sample_type & PERF_SAMPLE_REGS_INTR) {
+ if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
/*
* Besides the general purpose registers, XMM registers may
* be collected as well.
@@ -707,15 +707,6 @@ int x86_pmu_hw_config(struct perf_event *event)
}
}
- if (event->attr.sample_type & PERF_SAMPLE_REGS_USER) {
- /*
- * Currently XMM registers sampling for REGS_USER is not
- * supported yet.
- */
- if (event_has_extended_regs(event))
- return -EINVAL;
- }
-
return x86_setup_perfctr(event);
}
@@ -1745,6 +1736,28 @@ static void x86_pmu_del(struct perf_event *event, int flags)
static_call_cond(x86_pmu_del)(event);
}
+/*
+ * When both PERF_SAMPLE_REGS_INTR and PERF_SAMPLE_REGS_USER are set,
+ * an additional x86_perf_regs is required to save user-space registers.
+ * Without this, user-space register data may be overwritten by kernel-space
+ * registers.
+ */
+static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs);
+static void x86_pmu_perf_get_regs_user(struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct x86_perf_regs *x86_regs_user = this_cpu_ptr(&x86_user_regs);
+ struct perf_regs regs_user;
+
+ perf_get_regs_user(®s_user, regs);
+ data->regs_user.abi = regs_user.abi;
+ if (regs_user.regs) {
+ x86_regs_user->regs = *regs_user.regs;
+ data->regs_user.regs = &x86_regs_user->regs;
+ } else
+ data->regs_user.regs = NULL;
+}
+
static void x86_pmu_setup_gpregs_data(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
@@ -1757,7 +1770,14 @@ static void x86_pmu_setup_gpregs_data(struct perf_event *event,
data->regs_user.abi = perf_reg_abi(current);
data->regs_user.regs = regs;
} else if (!(current->flags & PF_KTHREAD)) {
- perf_get_regs_user(&data->regs_user, regs);
+ /*
+ * It cannot guarantee that the kernel will never
+ * touch the registers outside of the pt_regs,
+ * especially when more and more registers
+ * (e.g., SIMD, eGPR) are added. The live data
+ * cannot be used.
+ */
+ x86_pmu_perf_get_regs_user(data, regs);
} else {
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
@@ -1800,6 +1820,43 @@ static inline void x86_pmu_update_xregs(struct x86_perf_regs *perf_regs,
perf_regs->xmm_space = xsave->i387.xmm_space;
}
+/*
+ * This function retrieves cached user-space fpu registers (XMM/YMM/ZMM).
+ * If TIF_NEED_FPU_LOAD is set, it indicates that the user-space FPU state
+ * is cached. Otherwise, the data should be read directly from the hardware
+ * registers.
+ */
+static inline u64 x86_pmu_update_user_xregs(struct perf_sample_data *data,
+ u64 mask, u64 ignore_mask)
+{
+ struct x86_perf_regs *perf_regs;
+ struct xregs_state *xsave;
+ struct fpu *fpu;
+ struct fpstate *fps;
+
+ if (data->regs_user.abi == PERF_SAMPLE_REGS_ABI_NONE)
+ return 0;
+
+ if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ perf_regs = container_of(data->regs_user.regs,
+ struct x86_perf_regs, regs);
+ fpu = x86_task_fpu(current);
+ /*
+ * If __task_fpstate is set, it holds the right pointer,
+ * otherwise fpstate will.
+ */
+ fps = READ_ONCE(fpu->__task_fpstate);
+ if (!fps)
+ fps = fpu->fpstate;
+ xsave = &fps->regs.xsave;
+
+ x86_pmu_update_xregs(perf_regs, xsave, mask);
+ return 0;
+ }
+
+ return mask & ~ignore_mask;
+}
+
static void x86_pmu_sample_xregs(struct perf_event *event,
struct perf_sample_data *data,
u64 ignore_mask)
@@ -1807,6 +1864,7 @@ static void x86_pmu_sample_xregs(struct perf_event *event,
struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());
u64 sample_type = event->attr.sample_type;
struct x86_perf_regs *perf_regs;
+ u64 user_mask = 0;
u64 intr_mask = 0;
u64 mask = 0;
@@ -1817,15 +1875,26 @@ static void x86_pmu_sample_xregs(struct perf_event *event,
mask |= XFEATURE_MASK_SSE;
mask &= x86_pmu.ext_regs_mask;
+ if ((sample_type & PERF_SAMPLE_REGS_USER) && data->regs_user.abi)
+ user_mask = x86_pmu_update_user_xregs(data, mask, ignore_mask);
if ((sample_type & PERF_SAMPLE_REGS_INTR) && data->regs_intr.abi)
intr_mask = mask & ~ignore_mask;
+ if (user_mask | intr_mask) {
+ xsave->header.xfeatures = 0;
+ xsaves_nmi(xsave, user_mask | intr_mask);
+ }
+
+ if (user_mask) {
+ perf_regs = container_of(data->regs_user.regs,
+ struct x86_perf_regs, regs);
+ x86_pmu_update_xregs(perf_regs, xsave, user_mask);
+ }
+
if (intr_mask) {
perf_regs = container_of(data->regs_intr.regs,
struct x86_perf_regs, regs);
- xsave->header.xfeatures = 0;
- xsaves_nmi(xsave, mask);
x86_pmu_update_xregs(perf_regs, xsave, intr_mask);
}
}
--
2.34.1
next prev parent reply other threads:[~2026-03-24 0:46 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-24 0:40 [Patch v7 00/24] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2026-03-24 0:40 ` [Patch v7 01/24] perf/x86: Move hybrid PMU initialization before x86_pmu_starting_cpu() Dapeng Mi
2026-03-24 0:40 ` [Patch v7 02/24] perf/x86/intel: Avoid PEBS event on fixed counters without extended PEBS Dapeng Mi
2026-03-24 0:40 ` [Patch v7 03/24] perf/x86/intel: Enable large PEBS sampling for XMMs Dapeng Mi
2026-03-24 0:40 ` [Patch v7 04/24] perf/x86/intel: Convert x86_perf_regs to per-cpu variables Dapeng Mi
2026-03-24 0:40 ` [Patch v7 05/24] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2026-03-24 0:41 ` [Patch v7 06/24] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2026-03-24 0:41 ` [Patch v7 07/24] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2026-03-25 5:18 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 08/24] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2026-03-24 0:41 ` [Patch v7 09/24] x86/fpu: Ensure TIF_NEED_FPU_LOAD is set after saving FPU state Dapeng Mi
2026-03-24 0:41 ` [Patch v7 10/24] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2026-03-24 0:41 ` [Patch v7 11/24] perf/x86: Enable XMM Register Sampling for Non-PEBS Events Dapeng Mi
2026-03-25 7:30 ` Mi, Dapeng
2026-03-24 0:41 ` Dapeng Mi [this message]
2026-03-25 7:58 ` [Patch v7 12/24] perf/x86: Enable XMM register sampling for REGS_USER case Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 13/24] perf: Add sampling support for SIMD registers Dapeng Mi
2026-03-25 8:44 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 14/24] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2026-03-25 9:01 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 15/24] perf/x86: Enable YMM " Dapeng Mi
2026-03-24 0:41 ` [Patch v7 16/24] perf/x86: Enable ZMM " Dapeng Mi
2026-03-24 0:41 ` [Patch v7 17/24] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2026-03-24 0:41 ` [Patch v7 18/24] perf: Enhance perf_reg_validate() with simd_enabled argument Dapeng Mi
2026-03-24 0:41 ` [Patch v7 19/24] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2026-03-24 0:41 ` [Patch v7 20/24] perf/x86: Enable SSP " Dapeng Mi
2026-03-25 9:25 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 21/24] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2026-04-25 2:01 ` sashiko-bot
2026-04-29 5:25 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 22/24] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2026-04-25 3:08 ` sashiko-bot
2026-04-29 5:36 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 23/24] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2026-04-25 3:31 ` sashiko-bot
2026-04-29 6:00 ` Mi, Dapeng
2026-03-24 0:41 ` [Patch v7 24/24] perf/x86/intel: Add sanity check for PEBS fragment size Dapeng Mi
2026-04-25 3:53 ` sashiko-bot
2026-04-29 7:04 ` Mi, Dapeng
2026-03-24 1:08 ` [Patch v7 00/24] Support SIMD/eGPRs/SSP registers sampling for perf Mi, Dapeng
2026-03-25 9:41 ` Mi, Dapeng
2026-05-13 5:52 ` Mi, Dapeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260324004118.3772171-13-dapeng1.mi@linux.intel.com \
--to=dapeng1.mi@linux.intel.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=ak@linux.intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=broonie@kernel.org \
--cc=dapeng1.mi@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=eranian@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=ravi.bangoria@amd.com \
--cc=tglx@linutronix.de \
--cc=thomas.falcon@intel.com \
--cc=xudong.hao@intel.com \
--cc=zide.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.