* [RFC PATCH V2 00/13] Support vector and more extended registers in perf
@ 2025-06-26 19:55 kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler kan.liang
` (12 more replies)
0 siblings, 13 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:55 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Changes since V1:
- Apply the new interfaces to configure and dump the SIMD registers
- Utilize the existing FPU functions, e.g., xstate_calculate_size,
get_xsave_addr().
Starting from Intel Ice Lake, the XMM registers can be collected in
a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP, and APX,
will be added in the upcoming architectural PEBS as well. But all of
this requires hardware support.
The patch set provides a software solution that avoids the hardware
requirement. It utilizes the XSAVES instruction to retrieve the
requested registers in the overflow handler. The feature is no longer
limited to PEBS events or specific platforms.
The hardware solution (if available) is still preferred, since it has
lower overhead (especially with large PEBS) and is more accurate.
In theory, the solution should work for all X86 platforms, but I only
have newer Intel platforms to test on. The patch set only enables the
feature for Intel Ice Lake and later platforms.
The new registers include YMM, ZMM, OPMASK, SSP, and APX.
The sample_regs_user/intr fields have run out of bits. New fields in
struct perf_event_attr are required for the registers.
After a long discussion on V1,
https://lore.kernel.org/lkml/3f1c9a9e-cb63-47ff-a5e9-06555fa6cc9a@linux.intel.com/
the new fields look as below.
@@ -543,6 +545,25 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+
+
+ /*
+ * Defines the set of SIMD registers to dump on samples.
+ * A non-zero sample_simd_regs_enabled implies that the
+ * SIMD register fields configure all SIMD registers.
+ * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
+ * configure some SIMD registers on X86.
+ */
+ union {
+ __u16 sample_simd_regs_enabled;
+ __u16 sample_simd_pred_reg_qwords;
+ };
+ __u32 sample_simd_pred_reg_intr;
+ __u32 sample_simd_pred_reg_user;
+ __u16 sample_simd_vec_reg_qwords;
+ __u64 sample_simd_vec_reg_intr;
+ __u64 sample_simd_vec_reg_user;
+ __u32 __reserved_4;
};
@@ -1016,7 +1037,15 @@ enum perf_event_type {
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+ * } && PERF_SAMPLE_REGS_USER
*
* { u64 size;
* char data[size];
@@ -1043,7 +1072,15 @@ enum perf_event_type {
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* { u64 transaction; } && PERF_SAMPLE_TRANSACTION
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+ * } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
* { u64 cgroup;} && PERF_SAMPLE_CGROUP
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
Since there is only one vector qwords field, the tool should set the
qwords for the widest requested vector. For example, if the end user
wants XMM0 and YMM1, the vector qwords should be 4 and the vector mask
0x3. YMM0 and YMM1 will then be dumped to userspace, and it's the tool's
responsibility to present XMM0 and YMM1 to the end user.
I have a POC perf tool patch for testing purposes, but didn't include
it in this RFC series. I will send a complete patch set (including both
the kernel and perf tool changes) once the interface is accepted and
there is no NAK for the solution.
Kan Liang (13):
perf/x86: Use x86_perf_regs in the x86 nmi handler
perf/x86: Setup the regs data
x86/fpu/xstate: Add xsaves_nmi
perf: Move has_extended_regs() to header file
perf/x86: Support XMM register for non-PEBS and REGS_USER
perf: Support SIMD registers
perf/x86: Move XMM to sample_simd_vec_regs
perf/x86: Add YMM into sample_simd_vec_regs
perf/x86: Add ZMM into sample_simd_vec_regs
perf/x86: Add OPMASK into sample_simd_pred_reg
perf/x86: Add eGPRs into sample_regs
perf/x86: Add SSP into sample_regs
perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS
arch/x86/events/core.c | 281 ++++++++++++++++++++++++--
arch/x86/events/intel/core.c | 73 ++++++-
arch/x86/events/intel/ds.c | 12 +-
arch/x86/events/perf_event.h | 32 +++
arch/x86/include/asm/fpu/xstate.h | 3 +
arch/x86/include/asm/perf_event.h | 30 ++-
arch/x86/include/uapi/asm/perf_regs.h | 44 +++-
arch/x86/kernel/fpu/xstate.c | 32 ++-
arch/x86/kernel/perf_regs.c | 105 ++++++++--
include/linux/perf_event.h | 21 ++
include/linux/perf_regs.h | 5 +
include/uapi/linux/perf_event.h | 46 ++++-
kernel/events/core.c | 97 ++++++++-
13 files changed, 731 insertions(+), 50 deletions(-)
--
2.38.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
@ 2025-06-26 19:55 ` kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 02/13] perf/x86: Setup the regs data kan.liang
` (11 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:55 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
More and more registers will be supported in the overflow handler,
e.g., more vector registers, SSP, etc. The generic struct pt_regs
cannot store all of them. Use the x86-specific struct x86_perf_regs
instead.
The struct pt_regs *regs is still passed to x86_pmu_handle_irq(). There
is no functional change for the existing code.
AMD IBS's NMI handler doesn't utilize the static call
x86_pmu_handle_irq(), so the x86_perf_regs struct doesn't apply to AMD
IBS. Support can be added separately later when AMD IBS supports more
registers.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..64a7a8aa2e38 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1752,6 +1752,7 @@ void perf_events_lapic_init(void)
static int
perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
{
+ struct x86_perf_regs x86_regs;
u64 start_clock;
u64 finish_clock;
int ret;
@@ -1764,7 +1765,8 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
return NMI_DONE;
start_clock = sched_clock();
- ret = static_call(x86_pmu_handle_irq)(regs);
+ x86_regs.regs = *regs;
+ ret = static_call(x86_pmu_handle_irq)(&x86_regs.regs);
finish_clock = sched_clock();
perf_sample_event_took(finish_clock - start_clock);
--
2.38.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC PATCH V2 02/13] perf/x86: Setup the regs data
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler kan.liang
@ 2025-06-26 19:55 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi kan.liang
` (10 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:55 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The current code relies on the generic code to set up the regs data.
That will not work well once more registers are introduced.
Introduce an x86-specific x86_pmu_setup_regs_data().
For now, it behaves the same as the generic code. More x86-specific
code will be added later along with the new registers.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/events/intel/ds.c | 4 +++-
arch/x86/events/perf_event.h | 4 ++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 64a7a8aa2e38..c601ad761534 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1685,6 +1685,38 @@ static void x86_pmu_del(struct perf_event *event, int flags)
static_call_cond(x86_pmu_del)(event);
}
+void x86_pmu_setup_regs_data(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ u64 sample_type = event->attr.sample_type;
+
+ if (sample_type & PERF_SAMPLE_REGS_USER) {
+ if (user_mode(regs)) {
+ data->regs_user.abi = perf_reg_abi(current);
+ data->regs_user.regs = regs;
+ } else if (!(current->flags & PF_KTHREAD)) {
+ perf_get_regs_user(&data->regs_user, regs);
+ } else {
+ data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
+ data->regs_user.regs = NULL;
+ }
+ data->dyn_size += sizeof(u64);
+ if (data->regs_user.regs)
+ data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_REGS_USER;
+ }
+
+ if (sample_type & PERF_SAMPLE_REGS_INTR) {
+ data->regs_intr.regs = regs;
+ data->regs_intr.abi = perf_reg_abi(current);
+ data->dyn_size += sizeof(u64);
+ if (data->regs_intr.regs)
+ data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_REGS_INTR;
+ }
+}
+
int x86_pmu_handle_irq(struct pt_regs *regs)
{
struct perf_sample_data data;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c0b7ac1c7594..e67d8a03ddfe 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2126,8 +2126,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
regs->flags &= ~PERF_EFLAGS_EXACT;
}
- if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
+ if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
adaptive_pebs_save_regs(regs, gprs);
+ x86_pmu_setup_regs_data(event, data, regs);
+ }
}
if (format_group & PEBS_DATACFG_MEMINFO) {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2b969386dcdd..12682a059608 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1278,6 +1278,10 @@ void x86_pmu_enable_event(struct perf_event *event);
int x86_pmu_handle_irq(struct pt_regs *regs);
+void x86_pmu_setup_regs_data(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
+
void x86_pmu_show_pmu_cap(struct pmu *pmu);
static inline int x86_pmu_num_counters(struct pmu *pmu)
--
2.38.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 02/13] perf/x86: Setup the regs data kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-07-02 0:18 ` Chang S. Bae
2025-06-26 19:56 ` [RFC PATCH V2 04/13] perf: Move has_extended_regs() to header file kan.liang
` (9 subsequent siblings)
12 siblings, 1 reply; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
There is a hardware feature (the Intel PEBS XMMs group) which can take
XSAVE "snapshots" of randomly running code. This just provides another
XSAVE data source at a random time.
Add an interface to retrieve the actual register contents at the moment
the NMI hit. The interface is different from the other FPU interfaces.
The other mechanisms that deal with xstate try to get something
coherent, but this interface is *in*coherent. There's no telling what
was in the registers when an NMI hits; it writes whatever was in the
registers at that moment. It's the caller's responsibility to make sure
the contents are properly filtered before exposing them to the end
user.
Support for the supervisor state components is required, and the
compacted storage format is preferred, so XSAVES is used.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/include/asm/fpu/xstate.h | 1 +
arch/x86/kernel/fpu/xstate.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index b308a76afbb7..0c8b9251c29f 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -107,6 +107,7 @@ int xfeature_size(int xfeature_nr);
void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);
+void xsaves_nmi(struct xregs_state *xsave, u64 mask);
int xfd_enable_feature(u64 xfd_err);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9aa9ac8399ae..8602683fcb12 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1448,6 +1448,36 @@ void xrstors(struct xregs_state *xstate, u64 mask)
WARN_ON_ONCE(err);
}
+/**
+ * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
+ * @xstate: Pointer to the buffer
+ * @mask: Feature mask to select the components to save
+ *
+ * The @xstate buffer must be 64 byte aligned.
+ *
+ * Caution: The interface is different from the other interfaces of FPU.
+ * The other mechanisms that deal with xstate try to get something coherent.
+ * But this interface is *in*coherent. There's no telling what was in the
+ * registers when a NMI hits. It writes whatever was in the registers when
+ * the NMI hit.
+ * The only user for the interface is perf_event. There is already a
+ * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
+ * "snapshots" from random code running. This just provides another XSAVE
+ * data source at a random time.
+ * This function can only be invoked in an NMI. It returns the *ACTUAL*
+ * register contents when the NMI hit.
+ */
+void xsaves_nmi(struct xregs_state *xstate, u64 mask)
+{
+ int err;
+
+ if (!in_nmi())
+ return;
+
+ XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
+ WARN_ON_ONCE(err);
+}
+
#if IS_ENABLED(CONFIG_KVM)
void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
{
--
2.38.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC PATCH V2 04/13] perf: Move has_extended_regs() to header file
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (2 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER kan.liang
` (8 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The function will also be used in arch-specific code. Move it to the
header file and rename it to follow the naming convention of the
existing functions.
No functional change.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
include/linux/perf_event.h | 8 ++++++++
kernel/events/core.c | 8 +-------
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 52dc7cfab0e0..74c188a699e4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1488,6 +1488,14 @@ perf_event__output_id_sample(struct perf_event *event,
extern void
perf_log_lost_samples(struct perf_event *event, u64 lost);
+static inline bool event_has_extended_regs(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+
+ return (attr->sample_regs_user & PERF_REG_EXTENDED_MASK) ||
+ (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK);
+}
+
static inline bool event_has_any_exclude_flag(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cc77f127e11a..7f0d98d73629 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12502,12 +12502,6 @@ int perf_pmu_unregister(struct pmu *pmu)
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
-static inline bool has_extended_regs(struct perf_event *event)
-{
- return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) ||
- (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
-}
-
static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
{
struct perf_event_context *ctx = NULL;
@@ -12542,7 +12536,7 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
goto err_pmu;
if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) &&
- has_extended_regs(event)) {
+ event_has_extended_regs(event)) {
ret = -EOPNOTSUPP;
goto err_destroy;
}
--
2.38.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (3 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 04/13] perf: Move has_extended_regs() to header file kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-27 14:35 ` Dave Hansen
2025-06-26 19:56 ` [RFC PATCH V2 06/13] perf: Support SIMD registers kan.liang
` (7 subsequent siblings)
12 siblings, 1 reply; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Collecting the XMM registers in a PEBS record has been supported since
Ice Lake, but non-PEBS events don't support the feature. It's possible
to retrieve the XMM registers via XSAVES for non-PEBS events. Add that
to make the feature complete.
To utilize XSAVE, a 64-byte aligned buffer is required. Add a per-CPU
ext_regs_buf to store the vector registers. The size of the buffer is
~2K. kzalloc_node() is used because there is a _guarantee_ that all
kmalloc() allocations with power-of-2 sizes are naturally aligned, and
therefore 64-byte aligned.
Extend the support to both REGS_USER and REGS_INTR. For REGS_USER,
perf_get_regs_user() returns the regs from task_pt_regs(current), which
is a plain struct pt_regs. It has to be copied into the per-CPU
struct x86_perf_regs x86_user_regs.
For PEBS, the HW support is still preferred, and the XMM registers
should be retrieved from the PEBS records.
More vector registers could be supported later. Add ext_regs_mask to
track the supported vector register groups.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 128 +++++++++++++++++++++++++-----
arch/x86/events/intel/core.c | 27 +++++++
arch/x86/events/intel/ds.c | 10 ++-
arch/x86/events/perf_event.h | 12 ++-
arch/x86/include/asm/fpu/xstate.h | 2 +
arch/x86/include/asm/perf_event.h | 5 +-
arch/x86/kernel/fpu/xstate.c | 2 +-
7 files changed, 161 insertions(+), 25 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c601ad761534..899bd5680f6b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -406,6 +406,62 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
return x86_pmu_extra_regs(val, event);
}
+static DEFINE_PER_CPU(struct xregs_state *, ext_regs_buf);
+
+static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
+{
+ struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());
+
+ if (WARN_ON_ONCE(!xsave))
+ return;
+
+ xsaves_nmi(xsave, mask);
+
+ if (mask & XFEATURE_MASK_SSE &&
+ xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE))
+ perf_regs->xmm_space = xsave->i387.xmm_space;
+}
+
+static void release_ext_regs_buffers(void)
+{
+ int cpu;
+
+ if (!x86_pmu.ext_regs_mask)
+ return;
+
+ for_each_possible_cpu(cpu) {
+ kfree(per_cpu(ext_regs_buf, cpu));
+ per_cpu(ext_regs_buf, cpu) = NULL;
+ }
+}
+
+static void reserve_ext_regs_buffers(void)
+{
+ unsigned int size;
+ u64 mask = 0;
+ int cpu;
+
+ if (!x86_pmu.ext_regs_mask)
+ return;
+
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)
+ mask |= XFEATURE_MASK_SSE;
+
+ size = xstate_calculate_size(mask, true);
+
+ for_each_possible_cpu(cpu) {
+ per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (!per_cpu(ext_regs_buf, cpu))
+ goto err;
+ }
+
+ return;
+
+err:
+ release_ext_regs_buffers();
+}
+
int x86_reserve_hardware(void)
{
int err = 0;
@@ -418,6 +474,7 @@ int x86_reserve_hardware(void)
} else {
reserve_ds_buffers();
reserve_lbr_buffers();
+ reserve_ext_regs_buffers();
}
}
if (!err)
@@ -434,6 +491,7 @@ void x86_release_hardware(void)
release_pmc_hardware();
release_ds_buffers();
release_lbr_buffers();
+ release_ext_regs_buffers();
mutex_unlock(&pmc_reserve_mutex);
}
}
@@ -642,21 +700,18 @@ int x86_pmu_hw_config(struct perf_event *event)
return -EINVAL;
}
- /* sample_regs_user never support XMM registers */
- if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK))
- return -EINVAL;
- /*
- * Besides the general purpose registers, XMM registers may
- * be collected in PEBS on some platforms, e.g. Icelake
- */
- if (unlikely(event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK)) {
- if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
- return -EINVAL;
-
- if (!event->attr.precise_ip)
- return -EINVAL;
+ if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
+ /*
+ * Besides the general purpose registers, XMM registers may
+ * be collected as well.
+ */
+ if (event_has_extended_regs(event)) {
+ if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
+ return -EINVAL;
+ if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
+ return -EINVAL;
+ }
}
-
return x86_setup_perfctr(event);
}
@@ -1685,25 +1740,51 @@ static void x86_pmu_del(struct perf_event *event, int flags)
static_call_cond(x86_pmu_del)(event);
}
+static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs);
+
+static struct x86_perf_regs *
+x86_pmu_perf_get_regs_user(struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct x86_perf_regs *x86_regs_user = this_cpu_ptr(&x86_user_regs);
+ struct perf_regs regs_user;
+
+ perf_get_regs_user(&regs_user, regs);
+ data->regs_user.abi = regs_user.abi;
+ if (regs_user.regs) {
+ x86_regs_user->regs = *regs_user.regs;
+ data->regs_user.regs = &x86_regs_user->regs;
+ } else
+ data->regs_user.regs = NULL;
+ return x86_regs_user;
+}
+
void x86_pmu_setup_regs_data(struct perf_event *event,
struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ u64 ignore_mask)
{
- u64 sample_type = event->attr.sample_type;
+ struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
+ struct perf_event_attr *attr = &event->attr;
+ u64 sample_type = attr->sample_type;
+ u64 mask = 0;
+
+ if (!(attr->sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)))
+ return;
if (sample_type & PERF_SAMPLE_REGS_USER) {
if (user_mode(regs)) {
data->regs_user.abi = perf_reg_abi(current);
data->regs_user.regs = regs;
} else if (!(current->flags & PF_KTHREAD)) {
- perf_get_regs_user(&data->regs_user, regs);
+ perf_regs = x86_pmu_perf_get_regs_user(data, regs);
} else {
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
}
data->dyn_size += sizeof(u64);
if (data->regs_user.regs)
- data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+ data->dyn_size += hweight64(attr->sample_regs_user) * sizeof(u64);
data->sample_flags |= PERF_SAMPLE_REGS_USER;
}
@@ -1712,9 +1793,18 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
data->regs_intr.abi = perf_reg_abi(current);
data->dyn_size += sizeof(u64);
if (data->regs_intr.regs)
- data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+ data->dyn_size += hweight64(attr->sample_regs_intr) * sizeof(u64);
data->sample_flags |= PERF_SAMPLE_REGS_INTR;
}
+
+ if (event_has_extended_regs(event)) {
+ perf_regs->xmm_regs = NULL;
+ mask |= XFEATURE_MASK_SSE;
+ }
+
+ mask &= ~ignore_mask;
+ if (mask)
+ x86_pmu_get_ext_regs(perf_regs, mask);
}
int x86_pmu_handle_irq(struct pt_regs *regs)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index edebc8dfbc96..c73c2e57d71b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3285,6 +3285,8 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
if (has_branch_stack(event))
intel_pmu_lbr_save_brstack(&data, cpuc, event);
+ x86_pmu_setup_regs_data(event, &data, regs, 0);
+
perf_event_overflow(event, &data, regs);
}
@@ -5273,6 +5275,29 @@ static inline bool intel_pmu_broken_perf_cap(void)
return false;
}
+static void intel_extended_regs_init(struct pmu *pmu)
+{
+ /*
+ * Extend the vector registers support to non-PEBS.
+ * The feature is limited to newer Intel machines with
+ * PEBS V4+ or archPerfmonExt (0x23) enabled for now.
+ * In theory, the vector registers can be retrieved as
+ * long as the CPU supports. The support for the old
+ * generations may be added later if there is a
+ * requirement.
+ * Only support the extension when XSAVES is available.
+ */
+ if (!boot_cpu_has(X86_FEATURE_XSAVES))
+ return;
+
+ if (!boot_cpu_has(X86_FEATURE_XMM) ||
+ !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
+ return;
+
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_XMM;
+ x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+}
+
static void update_pmu_cap(struct pmu *pmu)
{
unsigned int cntr, fixed_cntr, ecx, edx;
@@ -5307,6 +5332,8 @@ static void update_pmu_cap(struct pmu *pmu)
/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
rdmsrq(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities);
}
+
+ intel_extended_regs_init(pmu);
}
static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index e67d8a03ddfe..8437730abfb7 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1415,8 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
if (gprs || (attr->precise_ip < 2) || tsx_weight)
pebs_data_cfg |= PEBS_DATACFG_GP;
- if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
- (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK))
+ if (event_has_extended_regs(event))
pebs_data_cfg |= PEBS_DATACFG_XMMS;
if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
@@ -2127,8 +2126,12 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
}
if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
+ u64 mask = 0;
+
adaptive_pebs_save_regs(regs, gprs);
- x86_pmu_setup_regs_data(event, data, regs);
+ if (format_group & PEBS_DATACFG_XMMS)
+ mask |= XFEATURE_MASK_SSE;
+ x86_pmu_setup_regs_data(event, data, regs, mask);
}
}
@@ -2755,6 +2758,7 @@ void __init intel_pebs_init(void)
x86_pmu.flags |= PMU_FL_PEBS_ALL;
x86_pmu.pebs_capable = ~0ULL;
pebs_qual = "-baseline";
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_XMM;
x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 12682a059608..37ed46cafa53 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -687,6 +687,10 @@ enum {
x86_lbr_exclusive_max,
};
+enum {
+ X86_EXT_REGS_XMM = BIT_ULL(0),
+};
+
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
#define PERF_PEBS_DATA_SOURCE_MASK (PERF_PEBS_DATA_SOURCE_MAX - 1)
#define PERF_PEBS_DATA_SOURCE_GRT_MAX 0x10
@@ -992,6 +996,11 @@ struct x86_pmu {
struct extra_reg *extra_regs;
unsigned int flags;
+ /*
+ * Extended regs, e.g., vector registers
+ */
+ u64 ext_regs_mask;
+
/*
* Intel host/guest support (KVM)
*/
@@ -1280,7 +1289,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs);
void x86_pmu_setup_regs_data(struct perf_event *event,
struct perf_sample_data *data,
- struct pt_regs *regs);
+ struct pt_regs *regs,
+ u64 ignore_mask);
void x86_pmu_show_pmu_cap(struct pmu *pmu);
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0c8b9251c29f..58bbdf9226d1 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -109,6 +109,8 @@ void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);
void xsaves_nmi(struct xregs_state *xsave, u64 mask);
+unsigned int xstate_calculate_size(u64 xfeatures, bool compacted);
+
int xfd_enable_feature(u64 xfd_err);
#ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 70d1d94aca7e..f36f04bc95f1 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -592,7 +592,10 @@ extern void perf_events_lapic_init(void);
struct pt_regs;
struct x86_perf_regs {
struct pt_regs regs;
- u64 *xmm_regs;
+ union {
+ u64 *xmm_regs;
+ u32 *xmm_space; /* for xsaves */
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 8602683fcb12..4747b29608cd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -583,7 +583,7 @@ static bool __init check_xstate_against_struct(int nr)
return true;
}
-static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
+unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
unsigned int offset, i;
--
2.38.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [RFC PATCH V2 06/13] perf: Support SIMD registers
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (4 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-07-02 11:16 ` Mark Brown
2025-06-26 19:56 ` [RFC PATCH V2 07/13] perf/x86: Move XMM to sample_simd_vec_regs kan.liang
` (6 subsequent siblings)
12 siblings, 1 reply; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Users may be interested in the SIMD registers in a sample while
profiling, but the current sample_regs_XXX doesn't have enough space
for all SIMD registers.
Add the sample_simd_{pred,vec}_reg_* fields to struct perf_event_attr
to define the set of SIMD registers to dump on samples.
X86 currently supports the XMM registers in sample_regs_XXX. To utilize
the new SIMD register configuration method, sample_simd_regs_enabled
should always be set. In that case, the XMM space in sample_regs_XXX is
reserved for other usage.
The SIMD registers are wider than 64 bits. A new output format is
introduced. The number and width of the SIMD registers are dumped first,
followed by the register values. The number and width currently match
the user's configuration. If, for some reason (e.g., on ARM), they
differ, an ARCH-specific perf_output_sample_simd_regs can be implemented
later separately.
Add a new ABI, PERF_SAMPLE_REGS_ABI_SIMD, to indicate the new format.
The enum perf_sample_regs_abi now behaves as a bitmap. There should be
no impact on existing tools, since the enumeration and bitmap encodings
are identical for the values 1 and 2.
Add two new __weak functions to validate the SIMD register
configuration and to retrieve the SIMD register values. The
ARCH-specific implementations will be added in the following patches.
Add a new flag, PERF_PMU_CAP_SIMD_REGS, to indicate that the PMU is
capable of dumping SIMD registers. Error out if the
sample_simd_{pred,vec}_reg_* fields are mistakenly set for a PMU that
doesn't have the capability.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
include/linux/perf_event.h | 13 +++++
include/linux/perf_regs.h | 5 ++
include/uapi/linux/perf_event.h | 47 +++++++++++++++--
kernel/events/core.c | 89 +++++++++++++++++++++++++++++++--
4 files changed, 146 insertions(+), 8 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 74c188a699e4..56bcb073100f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,6 +305,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
+#define PERF_PMU_CAP_SIMD_REGS 0x0800
/**
* pmu::scope
@@ -1488,6 +1489,18 @@ perf_event__output_id_sample(struct perf_event *event,
extern void
perf_log_lost_samples(struct perf_event *event, u64 lost);
+static inline bool event_has_simd_regs(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+
+ return attr->sample_simd_regs_enabled != 0 ||
+ attr->sample_simd_pred_reg_intr != 0 ||
+ attr->sample_simd_pred_reg_user != 0 ||
+ attr->sample_simd_vec_reg_qwords != 0 ||
+ attr->sample_simd_vec_reg_intr != 0 ||
+ attr->sample_simd_vec_reg_user != 0;
+}
+
static inline bool event_has_extended_regs(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h
index f632c5725f16..38d11f152753 100644
--- a/include/linux/perf_regs.h
+++ b/include/linux/perf_regs.h
@@ -9,6 +9,11 @@ struct perf_regs {
struct pt_regs *regs;
};
+int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
+ u16 pred_qwords, u32 pred_mask);
+u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
+ u16 qwords_idx, bool pred);
+
#ifdef CONFIG_HAVE_PERF_REGS
#include <asm/perf_regs.h>
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 78a362b80027..2e9b16acbed6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -313,9 +313,10 @@ enum {
* Values to determine ABI of the registers dump.
*/
enum perf_sample_regs_abi {
- PERF_SAMPLE_REGS_ABI_NONE = 0,
- PERF_SAMPLE_REGS_ABI_32 = 1,
- PERF_SAMPLE_REGS_ABI_64 = 2,
+ PERF_SAMPLE_REGS_ABI_NONE = 0x00,
+ PERF_SAMPLE_REGS_ABI_32 = 0x01,
+ PERF_SAMPLE_REGS_ABI_64 = 0x02,
+ PERF_SAMPLE_REGS_ABI_SIMD = 0x04,
};
/*
@@ -382,6 +383,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
+#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */
/*
* 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +545,25 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+
+
+ /*
+ * Defines the set of SIMD registers to dump on samples.
+ * If sample_simd_regs_enabled != 0, the fields below configure
+ * all SIMD registers.
+ * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
+ * configure some SIMD registers on X86.
+ */
+ union {
+ __u16 sample_simd_regs_enabled;
+ __u16 sample_simd_pred_reg_qwords;
+ };
+ __u32 sample_simd_pred_reg_intr;
+ __u32 sample_simd_pred_reg_user;
+ __u16 sample_simd_vec_reg_qwords;
+ __u64 sample_simd_vec_reg_intr;
+ __u64 sample_simd_vec_reg_user;
+ __u32 __reserved_4;
};
/*
@@ -1016,7 +1037,15 @@ enum perf_event_type {
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+ * } && PERF_SAMPLE_REGS_USER
*
* { u64 size;
* char data[size];
@@ -1043,7 +1072,15 @@ enum perf_event_type {
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* { u64 transaction; } && PERF_SAMPLE_TRANSACTION
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+ * } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
* { u64 cgroup;} && PERF_SAMPLE_CGROUP
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7f0d98d73629..14ae43694833 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7390,6 +7390,43 @@ perf_output_sample_regs(struct perf_output_handle *handle,
}
}
+static void
+perf_output_sample_simd_regs(struct perf_output_handle *handle,
+ struct perf_event *event,
+ struct pt_regs *regs,
+ u64 mask, u16 pred_mask)
+{
+ u16 pred_qwords = event->attr.sample_simd_pred_reg_qwords;
+ u16 vec_qwords = event->attr.sample_simd_vec_reg_qwords;
+ u16 nr_pred = hweight16(pred_mask);
+ u16 nr_vectors = hweight64(mask);
+ int bit;
+ u64 val;
+ u16 i;
+
+ perf_output_put(handle, nr_vectors);
+ perf_output_put(handle, vec_qwords);
+ perf_output_put(handle, nr_pred);
+ perf_output_put(handle, pred_qwords);
+
+ if (nr_vectors) {
+ for_each_set_bit(bit, (unsigned long *)&mask, sizeof(mask) * BITS_PER_BYTE) {
+ for (i = 0; i < vec_qwords; i++) {
+ val = perf_simd_reg_value(regs, bit, i, false);
+ perf_output_put(handle, val);
+ }
+ }
+ }
+ if (nr_pred) {
+ for_each_set_bit(bit, (unsigned long *)&pred_mask, sizeof(pred_mask) * BITS_PER_BYTE) {
+ for (i = 0; i < pred_qwords; i++) {
+ val = perf_simd_reg_value(regs, bit, i, true);
+ perf_output_put(handle, val);
+ }
+ }
+ }
+}
+
static void perf_sample_regs_user(struct perf_regs *regs_user,
struct pt_regs *regs)
{
@@ -7411,6 +7448,17 @@ static void perf_sample_regs_intr(struct perf_regs *regs_intr,
regs_intr->abi = perf_reg_abi(current);
}
+int __weak perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
+ u16 pred_qwords, u32 pred_mask)
+{
+ return vec_qwords || vec_mask || pred_qwords || pred_mask ? -ENOSYS : 0;
+}
+
+u64 __weak perf_simd_reg_value(struct pt_regs *regs, int idx,
+ u16 qwords_idx, bool pred)
+{
+ return 0;
+}
/*
* Get remaining task size from user stack pointer.
@@ -7939,10 +7987,17 @@ void perf_output_sample(struct perf_output_handle *handle,
perf_output_put(handle, abi);
if (abi) {
- u64 mask = event->attr.sample_regs_user;
+ struct perf_event_attr *attr = &event->attr;
+ u64 mask = attr->sample_regs_user;
perf_output_sample_regs(handle,
data->regs_user.regs,
mask);
+ if (abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+ perf_output_sample_simd_regs(handle, event,
+ data->regs_user.regs,
+ attr->sample_simd_vec_reg_user,
+ attr->sample_simd_pred_reg_user);
+ }
}
}
@@ -7970,11 +8025,18 @@ void perf_output_sample(struct perf_output_handle *handle,
perf_output_put(handle, abi);
if (abi) {
- u64 mask = event->attr.sample_regs_intr;
+ struct perf_event_attr *attr = &event->attr;
+ u64 mask = attr->sample_regs_intr;
perf_output_sample_regs(handle,
data->regs_intr.regs,
mask);
+ if (abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+ perf_output_sample_simd_regs(handle, event,
+ data->regs_intr.regs,
+ attr->sample_simd_vec_reg_intr,
+ attr->sample_simd_pred_reg_intr);
+ }
}
}
@@ -12535,6 +12597,12 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
if (ret)
goto err_pmu;
+ if (!(pmu->capabilities & PERF_PMU_CAP_SIMD_REGS) &&
+ event_has_simd_regs(event)) {
+ ret = -EOPNOTSUPP;
+ goto err_destroy;
+ }
+
if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) &&
event_has_extended_regs(event)) {
ret = -EOPNOTSUPP;
@@ -13076,6 +13144,12 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
ret = perf_reg_validate(attr->sample_regs_user);
if (ret)
return ret;
+ ret = perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords,
+ attr->sample_simd_vec_reg_user,
+ attr->sample_simd_pred_reg_qwords,
+ attr->sample_simd_pred_reg_user);
+ if (ret)
+ return ret;
}
if (attr->sample_type & PERF_SAMPLE_STACK_USER) {
@@ -13096,8 +13170,17 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
if (!attr->sample_max_stack)
attr->sample_max_stack = sysctl_perf_event_max_stack;
- if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
+ if (attr->sample_type & PERF_SAMPLE_REGS_INTR) {
ret = perf_reg_validate(attr->sample_regs_intr);
+ if (ret)
+ return ret;
+ ret = perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords,
+ attr->sample_simd_vec_reg_intr,
+ attr->sample_simd_pred_reg_qwords,
+ attr->sample_simd_pred_reg_intr);
+ if (ret)
+ return ret;
+ }
#ifndef CONFIG_CGROUP_PERF
if (attr->sample_type & PERF_SAMPLE_CGROUP)
--
2.38.1
* [RFC PATCH V2 07/13] perf/x86: Move XMM to sample_simd_vec_regs
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (5 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 06/13] perf: Support SIMD registers kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 08/13] perf/x86: Add YMM into sample_simd_vec_regs kan.liang
` (5 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
XMM0-15 are SIMD registers. Move them from sample_regs to
sample_simd_vec_regs. Reject access to the extended space of
sample_regs if the new sample_simd_vec_regs is used.
The perf_reg_value() function requires the ABI to understand the layout
of sample_regs. Add the ABI information to struct x86_perf_regs.
Implement the X86-specific perf_simd_reg_validate to validate the SIMD
register configuration from the user tool. Only XMM0-15 are supported
for now. More registers will be added in the following patches.
Implement the X86-specific perf_simd_reg_value to retrieve the XMM
value.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 38 ++++++++++++++++++++-
arch/x86/events/intel/ds.c | 2 +-
arch/x86/events/perf_event.h | 12 +++++++
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/include/uapi/asm/perf_regs.h | 6 ++++
arch/x86/kernel/perf_regs.c | 49 ++++++++++++++++++++++++++-
6 files changed, 105 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 899bd5680f6b..2515179ac664 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -710,6 +710,22 @@ int x86_pmu_hw_config(struct perf_event *event)
return -EINVAL;
if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
return -EINVAL;
+ if (event->attr.sample_simd_regs_enabled)
+ return -EINVAL;
+ }
+
+ if (event_has_simd_regs(event)) {
+ if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS))
+ return -EINVAL;
+ /* Width set but no vector registers requested */
+ if (event->attr.sample_simd_vec_reg_qwords &&
+ !event->attr.sample_simd_vec_reg_intr &&
+ !event->attr.sample_simd_vec_reg_user)
+ return -EINVAL;
+ /* The vector registers set is not supported */
+ if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_XMM_QWORDS &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
+ return -EINVAL;
}
}
return x86_setup_perfctr(event);
@@ -1785,6 +1801,16 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
data->dyn_size += sizeof(u64);
if (data->regs_user.regs)
data->dyn_size += hweight64(attr->sample_regs_user) * sizeof(u64);
+ if (attr->sample_simd_regs_enabled && data->regs_user.abi) {
+ /* num and qwords of vector and pred registers */
+ data->dyn_size += sizeof(u64);
+ /* data[] */
+ data->dyn_size += hweight64(attr->sample_simd_vec_reg_user) *
+ sizeof(u64) *
+ attr->sample_simd_vec_reg_qwords;
+ data->regs_user.abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+ }
+ perf_regs->abi = data->regs_user.abi;
data->sample_flags |= PERF_SAMPLE_REGS_USER;
}
@@ -1794,10 +1820,20 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
data->dyn_size += sizeof(u64);
if (data->regs_intr.regs)
data->dyn_size += hweight64(attr->sample_regs_intr) * sizeof(u64);
+ if (attr->sample_simd_regs_enabled && data->regs_intr.abi) {
+ /* num and qwords of vector and pred registers */
+ data->dyn_size += sizeof(u64);
+ /* data[] */
+ data->dyn_size += hweight64(attr->sample_simd_vec_reg_intr) *
+ sizeof(u64) *
+ attr->sample_simd_vec_reg_qwords;
+ data->regs_intr.abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+ }
+ perf_regs->abi = data->regs_intr.abi;
data->sample_flags |= PERF_SAMPLE_REGS_INTR;
}
- if (event_has_extended_regs(event)) {
+ if (event_needs_xmm(event)) {
perf_regs->xmm_regs = NULL;
mask |= XFEATURE_MASK_SSE;
}
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8437730abfb7..849136bef336 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1415,7 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
if (gprs || (attr->precise_ip < 2) || tsx_weight)
pebs_data_cfg |= PEBS_DATACFG_GP;
- if (event_has_extended_regs(event))
+ if (event_needs_xmm(event))
pebs_data_cfg |= PEBS_DATACFG_XMMS;
if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 37ed46cafa53..69964433a245 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -133,6 +133,18 @@ static inline bool is_acr_event_group(struct perf_event *event)
return check_leader_group(event->group_leader, PERF_X86_EVENT_ACR);
}
+static inline bool event_needs_xmm(struct perf_event *event)
+{
+ if (event->attr.sample_simd_regs_enabled &&
+ event->attr.sample_simd_vec_reg_qwords >= PERF_X86_XMM_QWORDS)
+ return true;
+
+ if (!event->attr.sample_simd_regs_enabled &&
+ event_has_extended_regs(event))
+ return true;
+ return false;
+}
+
struct amd_nb {
int nb_id; /* NorthBridge id */
int refcnt; /* reference count */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f36f04bc95f1..538219c59979 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -592,6 +592,7 @@ extern void perf_events_lapic_init(void);
struct pt_regs;
struct x86_perf_regs {
struct pt_regs regs;
+ u64 abi;
union {
u64 *xmm_regs;
u32 *xmm_space; /* for xsaves */
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..bd8af802f757 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -55,4 +55,10 @@ enum perf_event_x86_regs {
#define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_SIMD_VEC_REGS_MAX 16
+#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+
+#define PERF_X86_XMM_QWORDS 2
+#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS
+
#endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 624703af80a1..638b9e186c50 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -63,6 +63,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
perf_regs = container_of(regs, struct x86_perf_regs, regs);
+ /* SIMD registers are moved to dedicated sample_simd_vec_reg */
+ if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+ return 0;
if (!perf_regs->xmm_regs)
return 0;
return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0];
@@ -74,6 +77,49 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
return regs_get_register(regs, pt_regs_offset[idx]);
}
+u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
+ u16 qwords_idx, bool pred)
+{
+ struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
+
+ if (pred)
+ return 0;
+
+ if (WARN_ON_ONCE(idx >= PERF_X86_SIMD_VEC_REGS_MAX ||
+ qwords_idx >= PERF_X86_SIMD_QWORDS_MAX))
+ return 0;
+
+ if (qwords_idx < PERF_X86_XMM_QWORDS) {
+ if (!perf_regs->xmm_regs)
+ return 0;
+ return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx];
+ }
+
+ return 0;
+}
+
+int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
+ u16 pred_qwords, u32 pred_mask)
+{
+ /* pred_qwords implies sample_simd_{pred,vec}_reg_* are supported */
+ if (!pred_qwords)
+ return 0;
+
+ if (!vec_qwords) {
+ if (vec_mask)
+ return -EINVAL;
+ } else {
+ if (vec_qwords != PERF_X86_XMM_QWORDS)
+ return -EINVAL;
+ if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
+ return -EINVAL;
+ }
+ if (pred_mask)
+ return -EINVAL;
+
+ return 0;
+}
+
#define PERF_REG_X86_RESERVED (((1ULL << PERF_REG_X86_XMM0) - 1) & \
~((1ULL << PERF_REG_X86_MAX) - 1))
@@ -114,7 +160,8 @@ void perf_get_regs_user(struct perf_regs *regs_user,
int perf_reg_validate(u64 mask)
{
- if (!mask || (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED)))
/* The mask can be 0 if only the SIMD registers are of interest */
+ if (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))
return -EINVAL;
return 0;
--
2.38.1
* [RFC PATCH V2 08/13] perf/x86: Add YMM into sample_simd_vec_regs
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (6 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 07/13] perf/x86: Move XMM to sample_simd_vec_regs kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 09/13] perf/x86: Add ZMM " kan.liang
` (4 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
YMM0-15 are composed of XMM and YMMH halves. Retrieving the complete
value requires two XSAVE components. Internally, XMM and YMMH are
stored in separate structures, which follow the XSAVE format, but the
output dumps each YMM register as a whole.
A vec_qwords value of 4 implies YMM.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 15 +++++++++++++++
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/perf_event.h | 4 ++++
arch/x86/include/uapi/asm/perf_regs.h | 4 +++-
arch/x86/kernel/perf_regs.c | 7 ++++++-
5 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2515179ac664..20c825e83a3f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -420,6 +420,9 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
if (mask & XFEATURE_MASK_SSE &&
xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE))
perf_regs->xmm_space = xsave->i387.xmm_space;
+
+ if (mask & XFEATURE_MASK_YMM)
+ perf_regs->ymmh = get_xsave_addr(xsave, XFEATURE_YMM);
}
static void release_ext_regs_buffers(void)
@@ -446,6 +449,8 @@ static void reserve_ext_regs_buffers(void)
if (x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM)
mask |= XFEATURE_MASK_SSE;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM)
+ mask |= XFEATURE_MASK_YMM;
size = xstate_calculate_size(mask, true);
@@ -726,6 +731,9 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_XMM_QWORDS &&
!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
return -EINVAL;
+ if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_YMM_QWORDS &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM))
+ return -EINVAL;
}
}
return x86_setup_perfctr(event);
@@ -1838,6 +1846,13 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
mask |= XFEATURE_MASK_SSE;
}
+ if (attr->sample_simd_regs_enabled) {
+ if (attr->sample_simd_vec_reg_qwords >= PERF_X86_YMM_QWORDS) {
+ perf_regs->ymmh_regs = NULL;
+ mask |= XFEATURE_MASK_YMM;
+ }
+ }
+
mask &= ~ignore_mask;
if (mask)
x86_pmu_get_ext_regs(perf_regs, mask);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 69964433a245..7d332d0247ed 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -701,6 +701,7 @@ enum {
enum {
X86_EXT_REGS_XMM = BIT_ULL(0),
+ X86_EXT_REGS_YMM = BIT_ULL(1),
};
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 538219c59979..81e3143fd91a 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -597,6 +597,10 @@ struct x86_perf_regs {
u64 *xmm_regs;
u32 *xmm_space; /* for xsaves */
};
+ union {
+ u64 *ymmh_regs;
+ struct ymmh_struct *ymmh;
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index bd8af802f757..feb3e8f80761 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -59,6 +59,8 @@ enum perf_event_x86_regs {
#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
#define PERF_X86_XMM_QWORDS 2
-#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS
+#define PERF_X86_YMM_QWORDS 4
+#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2)
+#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS
#endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 638b9e186c50..37cf0a282915 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -93,6 +93,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
if (!perf_regs->xmm_regs)
return 0;
return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx];
+ } else if (qwords_idx < PERF_X86_YMM_QWORDS) {
+ if (!perf_regs->ymmh_regs)
+ return 0;
+ return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PERF_X86_XMM_QWORDS];
}
return 0;
@@ -109,7 +113,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
if (vec_mask)
return -EINVAL;
} else {
- if (vec_qwords != PERF_X86_XMM_QWORDS)
+ if (vec_qwords != PERF_X86_XMM_QWORDS &&
+ vec_qwords != PERF_X86_YMM_QWORDS)
return -EINVAL;
if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
return -EINVAL;
--
2.38.1
* [RFC PATCH V2 09/13] perf/x86: Add ZMM into sample_simd_vec_regs
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (7 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 08/13] perf/x86: Add YMM into sample_simd_vec_regs kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 10/13] perf/x86: Add OPMASK into sample_simd_pred_reg kan.liang
` (3 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
ZMM0-15 are composed of XMM, YMMH, and ZMMH parts. Retrieving the
complete value requires three XSAVE components.
ZMM16-31 (and the aliased YMM16-31/XMM16-31) are also supported; they
only require the Hi16_ZMM component.
Internally, XMM, YMMH, ZMMH, and Hi16_ZMM are stored in separate
structures, which follow the XSAVE format, but the output dumps each
ZMM or Hi16 XMM/YMM/ZMM register as a whole.
A vec_qwords value of 8 implies ZMM.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 24 ++++++++++++++++++++++++
arch/x86/events/perf_event.h | 2 ++
arch/x86/include/asm/perf_event.h | 8 ++++++++
arch/x86/include/uapi/asm/perf_regs.h | 8 ++++++--
arch/x86/kernel/perf_regs.c | 13 ++++++++++++-
5 files changed, 52 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 20c825e83a3f..3c05ca98ec3f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -423,6 +423,10 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
if (mask & XFEATURE_MASK_YMM)
perf_regs->ymmh = get_xsave_addr(xsave, XFEATURE_YMM);
+ if (mask & XFEATURE_MASK_ZMM_Hi256)
+ perf_regs->zmmh = get_xsave_addr(xsave, XFEATURE_ZMM_Hi256);
+ if (mask & XFEATURE_MASK_Hi16_ZMM)
+ perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
}
static void release_ext_regs_buffers(void)
@@ -451,6 +455,10 @@ static void reserve_ext_regs_buffers(void)
mask |= XFEATURE_MASK_SSE;
if (x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM)
mask |= XFEATURE_MASK_YMM;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_ZMMH)
+ mask |= XFEATURE_MASK_ZMM_Hi256;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM)
+ mask |= XFEATURE_MASK_Hi16_ZMM;
size = xstate_calculate_size(mask, true);
@@ -734,6 +742,13 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_YMM_QWORDS &&
!(x86_pmu.ext_regs_mask & X86_EXT_REGS_YMM))
return -EINVAL;
+ if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_ZMM_QWORDS &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_ZMMH))
+ return -EINVAL;
+ if ((fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+ fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE) &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM))
+ return -EINVAL;
}
}
return x86_setup_perfctr(event);
@@ -1851,6 +1866,15 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
perf_regs->ymmh_regs = NULL;
mask |= XFEATURE_MASK_YMM;
}
+ if (attr->sample_simd_vec_reg_qwords >= PERF_X86_ZMM_QWORDS) {
+ perf_regs->zmmh_regs = NULL;
+ mask |= XFEATURE_MASK_ZMM_Hi256;
+ }
+ if (fls64(attr->sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+ fls64(attr->sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE) {
+ perf_regs->h16zmm_regs = NULL;
+ mask |= XFEATURE_MASK_Hi16_ZMM;
+ }
}
mask &= ~ignore_mask;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7d332d0247ed..cc42e9d3e13d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -702,6 +702,8 @@ enum {
enum {
X86_EXT_REGS_XMM = BIT_ULL(0),
X86_EXT_REGS_YMM = BIT_ULL(1),
+ X86_EXT_REGS_ZMMH = BIT_ULL(2),
+ X86_EXT_REGS_H16ZMM = BIT_ULL(3),
};
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 81e3143fd91a..2d78bd9649bd 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -601,6 +601,14 @@ struct x86_perf_regs {
u64 *ymmh_regs;
struct ymmh_struct *ymmh;
};
+ union {
+ u64 *zmmh_regs;
+ struct avx_512_zmm_uppers_state *zmmh;
+ };
+ union {
+ u64 *h16zmm_regs;
+ struct avx_512_hi16_state *h16zmm;
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index feb3e8f80761..f74e3ba65be2 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -55,12 +55,16 @@ enum perf_event_x86_regs {
#define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
-#define PERF_X86_SIMD_VEC_REGS_MAX 16
+#define PERF_X86_SIMD_VEC_REGS_MAX 32
#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+#define PERF_X86_H16ZMM_BASE 16
+
#define PERF_X86_XMM_QWORDS 2
#define PERF_X86_YMM_QWORDS 4
#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2)
-#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS
+#define PERF_X86_ZMM_QWORDS 8
+#define PERF_X86_ZMMH_QWORDS (PERF_X86_ZMM_QWORDS / 2)
+#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_ZMM_QWORDS
#endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 37cf0a282915..74e05e2e5c90 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -89,6 +89,12 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
qwords_idx >= PERF_X86_SIMD_QWORDS_MAX))
return 0;
+ if (idx >= PERF_X86_H16ZMM_BASE) {
+ if (!perf_regs->h16zmm_regs)
+ return 0;
+ return perf_regs->h16zmm_regs[(idx - PERF_X86_H16ZMM_BASE) * PERF_X86_ZMM_QWORDS + qwords_idx];
+ }
+
if (qwords_idx < PERF_X86_XMM_QWORDS) {
if (!perf_regs->xmm_regs)
return 0;
@@ -97,6 +103,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
if (!perf_regs->ymmh_regs)
return 0;
return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PERF_X86_XMM_QWORDS];
+ } else if (qwords_idx < PERF_X86_ZMM_QWORDS) {
+ if (!perf_regs->zmmh_regs)
+ return 0;
+ return perf_regs->zmmh_regs[idx * PERF_X86_ZMMH_QWORDS + qwords_idx - PERF_X86_YMM_QWORDS];
}
return 0;
@@ -114,7 +124,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
return -EINVAL;
} else {
if (vec_qwords != PERF_X86_XMM_QWORDS &&
- vec_qwords != PERF_X86_YMM_QWORDS)
+ vec_qwords != PERF_X86_YMM_QWORDS &&
+ vec_qwords != PERF_X86_ZMM_QWORDS)
return -EINVAL;
if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
return -EINVAL;
--
2.38.1
* [RFC PATCH V2 10/13] perf/x86: Add OPMASK into sample_simd_pred_reg
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (8 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 09/13] perf/x86: Add ZMM " kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 11/13] perf/x86: Add eGPRs into sample_regs kan.liang
` (2 subsequent siblings)
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The OPMASK registers are the SIMD predicate registers. Add them to
sample_simd_pred_reg. The OPMASK qwords value is 1, and there are 8
registers.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 15 +++++++++++++++
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/perf_event.h | 4 ++++
arch/x86/include/uapi/asm/perf_regs.h | 3 +++
arch/x86/kernel/perf_regs.c | 15 ++++++++++++---
5 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3c05ca98ec3f..d4710edce2e9 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -427,6 +427,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
perf_regs->zmmh = get_xsave_addr(xsave, XFEATURE_ZMM_Hi256);
if (mask & XFEATURE_MASK_Hi16_ZMM)
perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
+ if (mask & XFEATURE_MASK_OPMASK)
+ perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
}
static void release_ext_regs_buffers(void)
@@ -459,6 +461,8 @@ static void reserve_ext_regs_buffers(void)
mask |= XFEATURE_MASK_ZMM_Hi256;
if (x86_pmu.ext_regs_mask & X86_EXT_REGS_H16ZMM)
mask |= XFEATURE_MASK_Hi16_ZMM;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_OPMASK)
+ mask |= XFEATURE_MASK_OPMASK;
size = xstate_calculate_size(mask, true);
@@ -1831,6 +1835,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
data->dyn_size += hweight64(attr->sample_simd_vec_reg_user) *
sizeof(u64) *
attr->sample_simd_vec_reg_qwords;
+ data->dyn_size += hweight32(attr->sample_simd_pred_reg_user) *
+ sizeof(u64) *
+ attr->sample_simd_pred_reg_qwords;
data->regs_user.abi |= PERF_SAMPLE_REGS_ABI_SIMD;
}
perf_regs->abi = data->regs_user.abi;
@@ -1850,6 +1857,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
data->dyn_size += hweight64(attr->sample_simd_vec_reg_intr) *
sizeof(u64) *
attr->sample_simd_vec_reg_qwords;
+ data->dyn_size += hweight32(attr->sample_simd_pred_reg_intr) *
+ sizeof(u64) *
+ attr->sample_simd_pred_reg_qwords;
data->regs_intr.abi |= PERF_SAMPLE_REGS_ABI_SIMD;
}
perf_regs->abi = data->regs_intr.abi;
@@ -1875,6 +1885,11 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
perf_regs->h16zmm_regs = NULL;
mask |= XFEATURE_MASK_Hi16_ZMM;
}
+ if (attr->sample_simd_pred_reg_intr ||
+ attr->sample_simd_pred_reg_user) {
+ perf_regs->opmask_regs = NULL;
+ mask |= XFEATURE_MASK_OPMASK;
+ }
}
mask &= ~ignore_mask;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index cc42e9d3e13d..cc0bd9479fa7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -704,6 +704,7 @@ enum {
X86_EXT_REGS_YMM = BIT_ULL(1),
X86_EXT_REGS_ZMMH = BIT_ULL(2),
X86_EXT_REGS_H16ZMM = BIT_ULL(3),
+ X86_EXT_REGS_OPMASK = BIT_ULL(4),
};
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2d78bd9649bd..dda677022882 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -609,6 +609,10 @@ struct x86_perf_regs {
u64 *h16zmm_regs;
struct avx_512_hi16_state *h16zmm;
};
+ union {
+ u64 *opmask_regs;
+ struct avx_512_opmask_state *opmask;
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index f74e3ba65be2..dd7bd1dd8d39 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -55,11 +55,14 @@ enum perf_event_x86_regs {
#define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_SIMD_PRED_REGS_MAX 8
+#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
#define PERF_X86_SIMD_VEC_REGS_MAX 32
#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
#define PERF_X86_H16ZMM_BASE 16
+#define PERF_X86_OPMASK_QWORDS 1
#define PERF_X86_XMM_QWORDS 2
#define PERF_X86_YMM_QWORDS 4
#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2)
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 74e05e2e5c90..b569368743a4 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -82,8 +82,14 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
{
struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
- if (pred)
- return 0;
+ if (pred) {
+ if (WARN_ON_ONCE(idx >= PERF_X86_SIMD_PRED_REGS_MAX ||
+ qwords_idx >= PERF_X86_OPMASK_QWORDS))
+ return 0;
+ if (!perf_regs->opmask_regs)
+ return 0;
+ return perf_regs->opmask_regs[idx];
+ }
if (WARN_ON_ONCE(idx >= PERF_X86_SIMD_VEC_REGS_MAX ||
qwords_idx >= PERF_X86_SIMD_QWORDS_MAX))
@@ -130,7 +136,10 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
return -EINVAL;
}
- if (pred_mask)
+
+ if (pred_qwords != PERF_X86_OPMASK_QWORDS)
+ return -EINVAL;
+ if (pred_mask & ~PERF_X86_SIMD_PRED_MASK)
return -EINVAL;
return 0;
--
2.38.1
* [RFC PATCH V2 11/13] perf/x86: Add eGPRs into sample_regs
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (9 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 10/13] perf/x86: Add OPMASK into sample_simd_pred_reg kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 12/13] perf/x86: Add SSP " kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 13/13] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS kan.liang
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The eGPRs are only supported when the new SIMD register configuration
method is used, which moves the XMM registers to sample_simd_vec_regs.
The freed space in sample_regs can then be reclaimed for the eGPRs.
The eGPRs are retrieved via XSAVE. They are only supported on X86_64.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 41 +++++++++++++++++++++------
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/perf_event.h | 4 +++
arch/x86/include/uapi/asm/perf_regs.h | 26 +++++++++++++++--
arch/x86/kernel/perf_regs.c | 31 ++++++++++----------
5 files changed, 78 insertions(+), 25 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d4710edce2e9..1da18886e1f3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -429,6 +429,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
if (mask & XFEATURE_MASK_OPMASK)
perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
+ if (mask & XFEATURE_MASK_APX)
+ perf_regs->egpr = get_xsave_addr(xsave, XFEATURE_APX);
}
static void release_ext_regs_buffers(void)
@@ -463,6 +465,8 @@ static void reserve_ext_regs_buffers(void)
mask |= XFEATURE_MASK_Hi16_ZMM;
if (x86_pmu.ext_regs_mask & X86_EXT_REGS_OPMASK)
mask |= XFEATURE_MASK_OPMASK;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS)
+ mask |= XFEATURE_MASK_APX;
size = xstate_calculate_size(mask, true);
@@ -718,17 +722,33 @@ int x86_pmu_hw_config(struct perf_event *event)
}
if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
- /*
- * Besides the general purpose registers, XMM registers may
- * be collected as well.
- */
- if (event_has_extended_regs(event)) {
- if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
+ if (event->attr.sample_simd_regs_enabled) {
+ u64 reserved = ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0);
+
+ if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS))
return -EINVAL;
- if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
+ /*
+ * The XMM space in the perf_event_x86_regs is reclaimed
+ * for eGPRs and other general registers.
+ */
+ if (event->attr.sample_regs_user & reserved ||
+ event->attr.sample_regs_intr & reserved)
return -EINVAL;
- if (event->attr.sample_simd_regs_enabled)
+ if ((event->attr.sample_regs_user & PERF_X86_EGPRS_MASK ||
+ event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS))
return -EINVAL;
+ } else {
+ /*
+ * Besides the general purpose registers, XMM registers may
+ * be collected as well.
+ */
+ if (event_has_extended_regs(event)) {
+ if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
+ return -EINVAL;
+ if (!(x86_pmu.ext_regs_mask & X86_EXT_REGS_XMM))
+ return -EINVAL;
+ }
}
if (event_has_simd_regs(event)) {
@@ -1890,6 +1910,11 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
perf_regs->opmask_regs = NULL;
mask |= XFEATURE_MASK_OPMASK;
}
+ if (attr->sample_regs_user & PERF_X86_EGPRS_MASK ||
+ attr->sample_regs_intr & PERF_X86_EGPRS_MASK) {
+ perf_regs->egpr_regs = NULL;
+ mask |= XFEATURE_MASK_APX;
+ }
}
mask &= ~ignore_mask;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index cc0bd9479fa7..4dd1e7344021 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -705,6 +705,7 @@ enum {
X86_EXT_REGS_ZMMH = BIT_ULL(2),
X86_EXT_REGS_H16ZMM = BIT_ULL(3),
X86_EXT_REGS_OPMASK = BIT_ULL(4),
+ X86_EXT_REGS_EGPRS = BIT_ULL(5),
};
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index dda677022882..4400cb66bc8e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -613,6 +613,10 @@ struct x86_perf_regs {
u64 *opmask_regs;
struct avx_512_opmask_state *opmask;
};
+ union {
+ u64 *egpr_regs;
+ struct apx_state *egpr;
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index dd7bd1dd8d39..cd0f6804debf 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,11 +27,31 @@ enum perf_event_x86_regs {
PERF_REG_X86_R13,
PERF_REG_X86_R14,
PERF_REG_X86_R15,
+ /* Extended GPRs (EGPRs) */
+ PERF_REG_X86_R16,
+ PERF_REG_X86_R17,
+ PERF_REG_X86_R18,
+ PERF_REG_X86_R19,
+ PERF_REG_X86_R20,
+ PERF_REG_X86_R21,
+ PERF_REG_X86_R22,
+ PERF_REG_X86_R23,
+ PERF_REG_X86_R24,
+ PERF_REG_X86_R25,
+ PERF_REG_X86_R26,
+ PERF_REG_X86_R27,
+ PERF_REG_X86_R28,
+ PERF_REG_X86_R29,
+ PERF_REG_X86_R30,
+ PERF_REG_X86_R31,
/* These are the limits for the GPRs. */
PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
- PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+ PERF_REG_X86_64_MAX = PERF_REG_X86_R31 + 1,
- /* These all need two bits set because they are 128bit */
+ /*
+ * These all need two bits set because they are 128bit.
+ * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD
+ */
PERF_REG_X86_XMM0 = 32,
PERF_REG_X86_XMM1 = 34,
PERF_REG_X86_XMM2 = 36,
@@ -55,6 +75,8 @@ enum perf_event_x86_regs {
#define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
+
#define PERF_X86_SIMD_PRED_REGS_MAX 8
#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
#define PERF_X86_SIMD_VEC_REGS_MAX 32
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index b569368743a4..3780a7b0e021 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -61,14 +61,22 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
{
struct x86_perf_regs *perf_regs;
- if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
+ if (idx > PERF_REG_X86_R15) {
perf_regs = container_of(regs, struct x86_perf_regs, regs);
- /* SIMD registers are moved to dedicated sample_simd_vec_reg */
- if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
- return 0;
- if (!perf_regs->xmm_regs)
- return 0;
- return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0];
+
+ if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+ if (idx <= PERF_REG_X86_R31) {
+ if (!perf_regs->egpr_regs)
+ return 0;
+ return perf_regs->egpr_regs[idx - PERF_REG_X86_R16];
+ }
+ } else {
+ if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
+ if (!perf_regs->xmm_regs)
+ return 0;
+ return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0];
+ }
+ }
}
if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
@@ -149,14 +157,7 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
~((1ULL << PERF_REG_X86_MAX) - 1))
#ifdef CONFIG_X86_32
-#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_R8) | \
- (1ULL << PERF_REG_X86_R9) | \
- (1ULL << PERF_REG_X86_R10) | \
- (1ULL << PERF_REG_X86_R11) | \
- (1ULL << PERF_REG_X86_R12) | \
- (1ULL << PERF_REG_X86_R13) | \
- (1ULL << PERF_REG_X86_R14) | \
- (1ULL << PERF_REG_X86_R15))
+#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8)
int perf_reg_validate(u64 mask)
{
--
2.38.1
* [RFC PATCH V2 12/13] perf/x86: Add SSP into sample_regs
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (10 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 11/13] perf/x86: Add eGPRs into sample_regs kan.liang
@ 2025-06-26 19:56 ` kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 13/13] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS kan.liang
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The SSP is only supported when the new SIMD register configuration
method is used, which moves the XMM registers to sample_simd_vec_regs.
The freed space in sample_regs can then be reclaimed for the SSP.
The SSP is retrieved via XSAVE. It is only supported on X86_64.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/core.c | 16 +++++++++++++++-
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/perf_event.h | 4 ++++
arch/x86/include/uapi/asm/perf_regs.h | 3 +++
arch/x86/kernel/perf_regs.c | 8 +++++++-
5 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1da18886e1f3..b35b5695e42f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -431,6 +431,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
if (mask & XFEATURE_MASK_APX)
perf_regs->egpr = get_xsave_addr(xsave, XFEATURE_APX);
+ if (mask & XFEATURE_MASK_CET_USER)
+ perf_regs->cet = get_xsave_addr(xsave, XFEATURE_CET_USER);
}
static void release_ext_regs_buffers(void)
@@ -467,6 +469,8 @@ static void reserve_ext_regs_buffers(void)
mask |= XFEATURE_MASK_OPMASK;
if (x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS)
mask |= XFEATURE_MASK_APX;
+ if (x86_pmu.ext_regs_mask & X86_EXT_REGS_CET)
+ mask |= XFEATURE_MASK_CET_USER;
size = xstate_calculate_size(mask, true);
@@ -723,7 +727,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
if (event->attr.sample_simd_regs_enabled) {
- u64 reserved = ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0);
+ u64 reserved = ~GENMASK_ULL(PERF_REG_MISC_MAX - 1, 0);
if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS))
return -EINVAL;
@@ -738,6 +742,11 @@ int x86_pmu_hw_config(struct perf_event *event)
event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) &&
!(x86_pmu.ext_regs_mask & X86_EXT_REGS_EGPRS))
return -EINVAL;
+ if ((event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) ||
+ event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) &&
+ !(x86_pmu.ext_regs_mask & X86_EXT_REGS_CET))
+ return -EINVAL;
+
} else {
/*
* Besides the general purpose registers, XMM registers may
@@ -1915,6 +1924,11 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
perf_regs->egpr_regs = NULL;
mask |= XFEATURE_MASK_APX;
}
+ if (attr->sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) ||
+ attr->sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) {
+ perf_regs->cet_regs = NULL;
+ mask |= XFEATURE_MASK_CET_USER;
+ }
}
mask &= ~ignore_mask;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 4dd1e7344021..1d958059db07 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -706,6 +706,7 @@ enum {
X86_EXT_REGS_H16ZMM = BIT_ULL(3),
X86_EXT_REGS_OPMASK = BIT_ULL(4),
X86_EXT_REGS_EGPRS = BIT_ULL(5),
+ X86_EXT_REGS_CET = BIT_ULL(6),
};
#define PERF_PEBS_DATA_SOURCE_MAX 0x100
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4400cb66bc8e..28ddff38d232 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -617,6 +617,10 @@ struct x86_perf_regs {
u64 *egpr_regs;
struct apx_state *egpr;
};
+ union {
+ u64 *cet_regs;
+ struct cet_user_state *cet;
+ };
};
extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index cd0f6804debf..4d88cb18acb9 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -48,6 +48,9 @@ enum perf_event_x86_regs {
PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
PERF_REG_X86_64_MAX = PERF_REG_X86_R31 + 1,
+ PERF_REG_X86_SSP,
+ PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
+
/*
* These all need two bits set because they are 128bit.
* These are only available when !PERF_SAMPLE_REGS_ABI_SIMD
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 3780a7b0e021..f985765a799a 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -70,6 +70,11 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
return 0;
return perf_regs->egpr_regs[idx - PERF_REG_X86_R16];
}
+ if (idx == PERF_REG_X86_SSP) {
+ if (!perf_regs->cet_regs)
+ return 0;
+ return perf_regs->cet_regs[1];
+ }
} else {
if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
if (!perf_regs->xmm_regs)
@@ -157,7 +162,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
~((1ULL << PERF_REG_X86_MAX) - 1))
#ifdef CONFIG_X86_32
-#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8)
+#define REG_NOSUPPORT (GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) | \
+ BIT_ULL(PERF_REG_X86_SSP))
int perf_reg_validate(u64 mask)
{
--
2.38.1
* [RFC PATCH V2 13/13] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
` (11 preceding siblings ...)
2025-06-26 19:56 ` [RFC PATCH V2 12/13] perf/x86: Add SSP " kan.liang
@ 2025-06-26 19:56 ` kan.liang
12 siblings, 0 replies; 21+ messages in thread
From: kan.liang @ 2025-06-26 19:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Enable PERF_PMU_CAP_SIMD_REGS if there is XSAVES support for the YMM,
ZMM, OPMASK, eGPR, or SSP registers.
Disable large PEBS for these registers, since the PEBS hardware doesn't
support them yet.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/intel/core.c | 46 ++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c73c2e57d71b..8dc638f9efd2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4034,8 +4034,30 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
flags &= ~PERF_SAMPLE_TIME;
if (!event->attr.exclude_kernel)
flags &= ~PERF_SAMPLE_REGS_USER;
- if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
- flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+ if (event->attr.sample_simd_regs_enabled) {
+ u64 nolarge = PERF_X86_EGPRS_MASK | BIT_ULL(PERF_REG_X86_SSP);
+
+ /*
+ * PEBS HW can only collect the XMM0-XMM15 for now.
+ * Disable large PEBS for other vector registers, predicate
+ * registers, eGPRs, and SSP.
+ */
+ if (event->attr.sample_regs_user & nolarge ||
+ fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE ||
+ event->attr.sample_simd_pred_reg_user)
+ flags &= ~PERF_SAMPLE_REGS_USER;
+
+ if (event->attr.sample_regs_intr & nolarge ||
+ fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+ event->attr.sample_simd_pred_reg_intr)
+ flags &= ~PERF_SAMPLE_REGS_INTR;
+
+ if (event->attr.sample_simd_vec_reg_qwords > PERF_X86_XMM_QWORDS)
+ flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+ } else {
+ if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
+ flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+ }
return flags;
}
@@ -5296,6 +5318,26 @@ static void intel_extended_regs_init(struct pmu *pmu)
x86_pmu.ext_regs_mask |= X86_EXT_REGS_XMM;
x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+
+ if (boot_cpu_has(X86_FEATURE_AVX) &&
+ cpu_has_xfeatures(XFEATURE_MASK_YMM, NULL))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_YMM;
+ if (boot_cpu_has(X86_FEATURE_APX) &&
+ cpu_has_xfeatures(XFEATURE_MASK_APX, NULL))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_EGPRS;
+ if (boot_cpu_has(X86_FEATURE_AVX512F)) {
+ if (cpu_has_xfeatures(XFEATURE_MASK_OPMASK, NULL))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_OPMASK;
+ if (cpu_has_xfeatures(XFEATURE_MASK_ZMM_Hi256, NULL))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_ZMMH;
+ if (cpu_has_xfeatures(XFEATURE_MASK_Hi16_ZMM, NULL))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_H16ZMM;
+ }
+ if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+ x86_pmu.ext_regs_mask |= X86_EXT_REGS_CET;
+
+ if (x86_pmu.ext_regs_mask != X86_EXT_REGS_XMM)
+ x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_SIMD_REGS;
}
static void update_pmu_cap(struct pmu *pmu)
--
2.38.1
* Re: [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER
2025-06-26 19:56 ` [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER kan.liang
@ 2025-06-27 14:35 ` Dave Hansen
2025-06-27 21:23 ` Liang, Kan
0 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2025-06-27 14:35 UTC (permalink / raw)
To: kan.liang, peterz, mingo, acme, namhyung, tglx, dave.hansen,
irogers, adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria
On 6/26/25 12:56, kan.liang@linux.intel.com wrote:
> +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
> +{
> + struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());
> +
> + if (WARN_ON_ONCE(!xsave))
> + return;
> +
> + xsaves_nmi(xsave, mask);
This makes me a little nervous.
Could we maybe keep a mask around that reminds us what 'ext_regs_buf'
was sized for and then ensure that no bits in the passed-in mask are set
in that?
I almost wonder if you want to add a
struct fpu_state_config fpu_perf_cfg;
I guess it's mostly overkill for this. But please do have a look at the
data structures in:
arch/x86/include/asm/fpu/types.h
> + if (mask & XFEATURE_MASK_SSE &&
> + xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE))
> + perf_regs->xmm_space = xsave->i387.xmm_space;
> +}
There's a lot going on here.
'mask' and 'xfeatures' have the exact same format. Why use
XFEATURE_MASK_SSE for one and BIT_ULL(XFEATURE_SSE) for the other?
Why check both? How could a bit get into 'xfeatures' without being in
'mask'?
How does the caller handle the fact that ->xmm_space might be written or
not?
* Re: [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER
2025-06-27 14:35 ` Dave Hansen
@ 2025-06-27 21:23 ` Liang, Kan
0 siblings, 0 replies; 21+ messages in thread
From: Liang, Kan @ 2025-06-27 21:23 UTC (permalink / raw)
To: Dave Hansen, peterz, mingo, acme, namhyung, tglx, dave.hansen,
irogers, adrian.hunter, jolsa, alexander.shishkin, linux-kernel
Cc: dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria
On 2025-06-27 10:35 a.m., Dave Hansen wrote:
> On 6/26/25 12:56, kan.liang@linux.intel.com wrote:
>> +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
>> +{
>> + struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());
>> +
>> + if (WARN_ON_ONCE(!xsave))
>> + return;
>> +
>> + xsaves_nmi(xsave, mask);
>
> This makes me a little nervous.
>
> Could we maybe keep a mask around that reminds us what 'ext_regs_buf'
> was sized for and then ensure that no bits in the passed-in mask are set
> in that?
>
The x86_pmu.ext_regs_mask tracks the available bits of
x86_pmu.ext_regs_buf. But it has its own format.
I will make it use the XSAVE format, and add a check here.
> I almost wonder if you want to add a
>
> struct fpu_state_config fpu_perf_cfg;
>
> I guess it's mostly overkill for this. But please do have a look at the
> data structures in:
>
> arch/x86/include/asm/fpu/types.h
>
That looks like overkill. The perf usage is simple: it should be good
enough to have a single mask tracking the available bits. The size comes
from the FPU's xstate_calculate_size(). I think, as long as perf passes
in the correct mask, the size can be trusted.
>> + if (mask & XFEATURE_MASK_SSE &&
>> + xsave->header.xfeatures & BIT_ULL(XFEATURE_SSE))
>> + perf_regs->xmm_space = xsave->i387.xmm_space;
>> +}
>
> There's a lot going on here.
>
> 'mask' and 'xfeatures' have the exact same format. Why use
> XFEATURE_MASK_SSE for one and BIT_ULL(XFEATURE_SSE) for the other?
>
Ah, my bad. The same XFEATURE_MASK_SSE should be used.
> Why check both? How could a bit get into 'xfeatures' without being in
> 'mask'?
The 'mask' is what perf wants/configures. I think the 'xfeatures' is
what XSAVE really gives. I'm not quite sure if HW can always give us
everything we configured. If not, I think both checks are required.
I'm thinking to add the below first.
valid_mask = x86_pmu.ext_regs_mask & mask & xsave->header.xfeatures;
Then only use the valid_mask to check each XFEATURE.
>
> How does the caller handle the fact that ->xmm_space might be written or
> not?
>
For this series, the returned XMM value is zeroed if ->xmm_space is
NULL.
But I should clear nr_vectors as well, so nothing is dumped to
userspace if ->xmm_space is not available. I will address it in V3.
Thanks,
Kan
* Re: [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi
2025-06-26 19:56 ` [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi kan.liang
@ 2025-07-02 0:18 ` Chang S. Bae
2025-07-07 18:12 ` Liang, Kan
0 siblings, 1 reply; 21+ messages in thread
From: Chang S. Bae @ 2025-07-02 0:18 UTC (permalink / raw)
To: kan.liang
Cc: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel,
dapeng1.mi, ak, zide.chen, mark.rutland, broonie, ravi.bangoria
On 6/26/2025 12:56 PM, kan.liang@linux.intel.com wrote:
<snip>
> Add an interface to retrieve the actual register contents when the NMI
> hit. The interface is different from the other interfaces of FPU. The
> other mechanisms that deal with xstate try to get something coherent.
> But this interface is *in*coherent. There's no telling what was in the
> registers when a NMI hits. It writes whatever was in the registers when
> the NMI hit. It's the invoker's responsibility to make sure the contents
> are properly filtered before exposing them to the end user.
<snip>
>
> +/**
> + * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
> + * @xstate: Pointer to the buffer
> + * @mask: Feature mask to select the components to save
> + *
> + * The @xstate buffer must be 64 byte aligned.
> + *
> + * Caution: The interface is different from the other interfaces of FPU.
> + * The other mechanisms that deal with xstate try to get something coherent.
> + * But this interface is *in*coherent. There's no telling what was in the
> + * registers when a NMI hits. It writes whatever was in the registers when
> + * the NMI hit.
> + * The only user for the interface is perf_event. There is already a
> + * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
> + * "snapshots" from random code running. This just provides another XSAVE
> + * data source at a random time.
> + * This function can only be invoked in an NMI. It returns the *ACTUAL*
> + * register contents when the NMI hit.
> + */
> +void xsaves_nmi(struct xregs_state *xstate, u64 mask)
> +{
> + int err;
> +
> + if (!in_nmi())
> + return;
> +
> + XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
> + WARN_ON_ONCE(err);
> +}
> +
There are xsaves()/xrstors() functions, already narrowed to the
"independent" feature set only. So, adding a new xsaves_yyy() variant
for a different use case -- without renaming the existing helpers to
something like xsaves_xxx() -- would make the naming scheme appear
inconsistent at a glance.
But looking back at history:
1. These helpers were established with "independent" in the name (though
they were initially described as for “dynamic” features):
copy_kernel_to_independent_supervisor()/
copy_independent_supervisor_to_kernel()
2. Later, Thomas reworked them, renaming and simplifying them to
xsaves()/xrstors(), and adding a refactored validator:
validate_xsaves_xrstors() [1]. At that point, their usage was
*relaxed* and not strictly limited to independent features.
3. Subsequently, in preparation for dynamic feature support, the helpers
were restricted again to independent features only [2]. This involved
renaming and enforcing stricter validation via
validate_independent_components().
Given that, rather than introducing a new wrapper for every additional
use case, another option could be to retain the xsaves() naming but
modestly expand its scope. That would mean adding another allowance:
features in tightly constrained contexts (e.g., NMI). Perhaps this
approach can keep the API simple while still expanding usage.
[1] a75c52896b6d ("x86/fpu/xstate: Sanitize handling of independent
features")
[2] f5daf836f292 ("x86/fpu: Restrict xsaves()/xrstors() to independent
states")
* Re: [RFC PATCH V2 06/13] perf: Support SIMD registers
2025-06-26 19:56 ` [RFC PATCH V2 06/13] perf: Support SIMD registers kan.liang
@ 2025-07-02 11:16 ` Mark Brown
2025-07-07 18:12 ` Liang, Kan
0 siblings, 1 reply; 21+ messages in thread
From: Mark Brown @ 2025-07-02 11:16 UTC (permalink / raw)
To: kan.liang
Cc: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel,
dapeng1.mi, ak, zide.chen, mark.rutland, ravi.bangoria
On Thu, Jun 26, 2025 at 12:56:03PM -0700, kan.liang@linux.intel.com wrote:
> * { u64 abi; # enum perf_sample_regs_abi
> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> + * u64 regs[weight(mask)];
> + * struct {
> + * u16 nr_vectors;
> + * u16 vector_qwords;
> + * u16 nr_pred;
> + * u16 pred_qwords;
> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> + * } && PERF_SAMPLE_REGS_USER
I'm not super familiar with perf, but I think this should work for
arm64: it supplies the vector length through the _qwords fields, and we
can handle FFR being optional by varying the number of predicate
registers.
* Re: [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi
2025-07-02 0:18 ` Chang S. Bae
@ 2025-07-07 18:12 ` Liang, Kan
2025-07-07 18:17 ` Dave Hansen
0 siblings, 1 reply; 21+ messages in thread
From: Liang, Kan @ 2025-07-07 18:12 UTC (permalink / raw)
To: Chang S. Bae, Dave Hansen, Thomas Gleixner
Cc: peterz, mingo, acme, namhyung, irogers, adrian.hunter, jolsa,
alexander.shishkin, linux-kernel, dapeng1.mi, ak, zide.chen,
mark.rutland, broonie, ravi.bangoria
On 2025-07-01 5:18 p.m., Chang S. Bae wrote:
> On 6/26/2025 12:56 PM, kan.liang@linux.intel.com wrote:
> <snip>
>> Add an interface to retrieve the actual register contents when the NMI
>> hit. The interface is different from the other interfaces of FPU. The
>> other mechanisms that deal with xstate try to get something coherent.
>> But this interface is *in*coherent. There's no telling what was in the
>> registers when a NMI hits. It writes whatever was in the registers when
>> the NMI hit. It's the invoker's responsibility to make sure the contents
>> are properly filtered before exposing them to the end user.
>
> <snip>
>
>> +/**
>> + * xsaves_nmi - Save selected components to a kernel xstate buffer
>> + *              in NMI
>> + * @xstate: Pointer to the buffer
>> + * @mask: Feature mask to select the components to save
>> + *
>> + * The @xstate buffer must be 64 byte aligned.
>> + *
>> + * Caution: The interface is different from the other interfaces of
>> + * FPU. The other mechanisms that deal with xstate try to get
>> + * something coherent. But this interface is *in*coherent. There's no
>> + * telling what was in the registers when an NMI hits. It writes
>> + * whatever was in the registers when the NMI hit.
>> + * The only user for the interface is perf_event. There is already a
>> + * hardware feature (see the Intel PEBS XMMs group) which can handle
>> + * XSAVE "snapshots" from random code running. This just provides
>> + * another XSAVE data source at a random time.
>> + * This function can only be invoked in an NMI. It returns the *ACTUAL*
>> + * register contents when the NMI hit.
>> + */
>> +void xsaves_nmi(struct xregs_state *xstate, u64 mask)
>> +{
>> + int err;
>> +
>> + if (!in_nmi())
>> + return;
>> +
>> + XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
>> + WARN_ON_ONCE(err);
>> +}
>> +
> There are xsaves()/xrstors() functions, already narrowed to the
> "independent" feature set only. So, adding a new xsaves_yyy() variant
> for a different use case -- without renaming the existing helpers to
> something like xsaves_xxx() -- would make the naming scheme appear
> inconsistent at a glance.
>
> But looking back at history:
>
> 1. These helpers were established with "independent" in the name (though
> they were initially described as for “dynamic” features):
> copy_kernel_to_independent_supervisor()/
> copy_independent_supervisor_to_kernel()
>
> 2. Later, Thomas reworked them, renaming and simplifying them to
> xsaves()/xrstors(), and adding a refactored validator:
> validate_xsaves_xrstors() [1]. At that point, their usage was
> *relaxed* and not strictly limited to independent features.
>
> 3. Subsequently, in preparation for dynamic feature support, the helpers
> were restricted again to independent features only [2]. This involved
> renaming and enforcing stricter validation via
> validate_independent_components().
>
> Given that, rather than introducing a new wrapper for every additional
> use case, another option could be to retain the xsaves() naming but
> modestly expand its scope. That would mean adding another allowance:
> features in tightly constrained contexts (e.g., NMI). Perhaps this
> approach can keep the API simple while still expanding usage.
>
So we need to add a parameter, e.g., nmi. For the NMI case, the limit to
the independent features should be removed, right?
void xsaves(struct xregs_state *xsave, u64 mask, bool nmi);
The only current user of xsaves() is LBR. It should not be a problem to
update the interface.
But perf only needs xsaves for the SIMD and other registers. So the
parameter would only be added to xsaves(); xsaves()/xrstors() would not
be symmetrical anymore. I'm not sure if that's a problem.
Dave? Thomas? Any comments?
Should we extend the existing xsaves() interface or add a new
xsaves_nmi() interface for the perf usage?
Thanks,
Kan
> [1] a75c52896b6d ("x86/fpu/xstate: Sanitize handling of independent
> features")
> [2] f5daf836f292 ("x86/fpu: Restrict xsaves()/xrstors() to independent
> states")
>
>
* Re: [RFC PATCH V2 06/13] perf: Support SIMD registers
2025-07-02 11:16 ` Mark Brown
@ 2025-07-07 18:12 ` Liang, Kan
0 siblings, 0 replies; 21+ messages in thread
From: Liang, Kan @ 2025-07-07 18:12 UTC (permalink / raw)
To: Mark Brown
Cc: peterz, mingo, acme, namhyung, tglx, dave.hansen, irogers,
adrian.hunter, jolsa, alexander.shishkin, linux-kernel,
dapeng1.mi, ak, zide.chen, mark.rutland, ravi.bangoria
On 2025-07-02 4:16 a.m., Mark Brown wrote:
> On Thu, Jun 26, 2025 at 12:56:03PM -0700, kan.liang@linux.intel.com wrote:
>
>> * { u64 abi; # enum perf_sample_regs_abi
>> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
>> + * u64 regs[weight(mask)];
>> + * struct {
>> + * u16 nr_vectors;
>> + * u16 vector_qwords;
>> + * u16 nr_pred;
>> + * u16 pred_qwords;
>> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
>> + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
>> + * } && PERF_SAMPLE_REGS_USER
>
> I'm not super familiar with perf, but I think this should work for
> arm64: it supplies the vector length through the _qwords fields, and
> we can handle FFR being optional by varying the number of predicate
> registers.
That's great. Thanks for the confirmation.
Thanks,
Kan
* Re: [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi
2025-07-07 18:12 ` Liang, Kan
@ 2025-07-07 18:17 ` Dave Hansen
0 siblings, 0 replies; 21+ messages in thread
From: Dave Hansen @ 2025-07-07 18:17 UTC (permalink / raw)
To: Liang, Kan, Chang S. Bae, Dave Hansen, Thomas Gleixner
Cc: peterz, mingo, acme, namhyung, irogers, adrian.hunter, jolsa,
alexander.shishkin, linux-kernel, dapeng1.mi, ak, zide.chen,
mark.rutland, broonie, ravi.bangoria
On 7/7/25 11:12, Liang, Kan wrote:
> Should we extend the existing xsaves() interface or adding a new
> xsaves_nmi() interface for the perf usage?
I think we should just add a new one. The perf usage is totally unique.
Thread overview: 21+ messages
-- links below jump to the message on this page --
2025-06-26 19:55 [RFC PATCH V2 00/13] Support vector and more extended registers in perf kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 01/13] perf/x86: Use x86_perf_regs in the x86 nmi handler kan.liang
2025-06-26 19:55 ` [RFC PATCH V2 02/13] perf/x86: Setup the regs data kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 03/13] x86/fpu/xstate: Add xsaves_nmi kan.liang
2025-07-02 0:18 ` Chang S. Bae
2025-07-07 18:12 ` Liang, Kan
2025-07-07 18:17 ` Dave Hansen
2025-06-26 19:56 ` [RFC PATCH V2 04/13] perf: Move has_extended_regs() to header file kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 05/13] perf/x86: Support XMM register for non-PEBS and REGS_USER kan.liang
2025-06-27 14:35 ` Dave Hansen
2025-06-27 21:23 ` Liang, Kan
2025-06-26 19:56 ` [RFC PATCH V2 06/13] perf: Support SIMD registers kan.liang
2025-07-02 11:16 ` Mark Brown
2025-07-07 18:12 ` Liang, Kan
2025-06-26 19:56 ` [RFC PATCH V2 07/13] perf/x86: Move XMM to sample_simd_vec_regs kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 08/13] perf/x86: Add YMM into sample_simd_vec_regs kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 09/13] perf/x86: Add ZMM " kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 10/13] perf/x86: Add OPMASK into sample_simd_pred_reg kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 11/13] perf/x86: Add eGPRs into sample_regs kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 12/13] perf/x86: Add SSP " kan.liang
2025-06-26 19:56 ` [RFC PATCH V2 13/13] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS kan.liang