Linux Perf Users
* [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling
@ 2026-05-29  8:24 Dapeng Mi
  2026-05-29  8:24 ` [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers Dapeng Mi
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi

Changes since v7:
- Address Sashiko's comments including,
  * Documentation was not updated accordingly; the SIMD registers bitmask
    may be overwritten by a shorter mask; the u16 type in the sample
    metadata breaks cross-endian conversion, etc.
- Add SIMD/eGPRs/SSP support in dwarf-regs (Ian).
- Misc enhancements.

Changes since v6:
- Address Ian's comments including,
  * Add more comments for newly added register indexes (patch 1/4).
  * Several minor enhancements in patch 4/4.
- Limit dwarf minimal regs to legacy GPRs (excluding APX eGPRs)
  (patch 2/4).

Changes since v5:
- Split the sampling support for eGPRs/SSP registers and SIMD registers
  into 2 patches.
- Address Ian's comments including,
  * Convert the architecture dependent functions into regular
    architecture independent functions, like what perf_reg_name() does.
  * Refine the function names to represent what the functions really do.
  * Add comments for some key function arguments.
  * Misc enhancements.

This patch-set adds sampling support for x86 eGPRs/SSP/SIMD registers in
perf tools, based on the corresponding sampling support for
eGPRs/SSP/SIMD registers in the kernel[1]. In versions before v6, these
perf-tools patches were posted together with the kernel patches as a
single patch-set; they are now split out and posted as an independent
perf-tools patch-set for easier review.

History:
v7: https://lore.kernel.org/all/20260324005706.3778057-1-dapeng1.mi@linux.intel.com/
v6: https://lore.kernel.org/all/20260209083514.2225115-1-dapeng1.mi@linux.intel.com/
v5: https://lore.kernel.org/all/20251203065500.2597594-1-dapeng1.mi@linux.intel.com/
v4: https://lore.kernel.org/all/20250925061213.178796-1-dapeng1.mi@linux.intel.com/
v3: https://lore.kernel.org/lkml/20250815213435.1702022-1-kan.liang@linux.intel.com/
v2: https://lore.kernel.org/lkml/20250626195610.405379-1-kan.liang@linux.intel.com/
v1: https://lore.kernel.org/lkml/20250613134943.3186517-1-kan.liang@linux.intel.com/

Ref:
[1] Kernel patches supporting eGPRs/SSP/SIMD registers sampling:
https://lore.kernel.org/all/20260529075645.580362-1-dapeng1.mi@linux.intel.com/


Dapeng Mi (5):
  perf headers: Sync perf_event.h/perf_regs.h with the kernel headers
  perf regs: Support x86 eGPRs/SSP sampling
  perf regs: Support x86 SIMD registers sampling
  perf regs: Enable dumping of SIMD registers
  perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers

 tools/arch/x86/include/uapi/asm/perf_regs.h   |  51 +++
 tools/include/uapi/linux/perf_event.h         |  49 ++-
 tools/perf/Documentation/perf-record.txt      |  10 +-
 tools/perf/builtin-inject.c                   |   7 +
 tools/perf/builtin-script.c                   |   2 +-
 .../util/dwarf-regs-arch/dwarf-regs-x86.c     | 138 +++++-
 tools/perf/util/dwarf-regs.c                  |   7 +-
 tools/perf/util/evsel.c                       | 104 ++++-
 tools/perf/util/header.c                      |   3 +-
 tools/perf/util/include/dwarf-regs.h          |   7 +-
 tools/perf/util/intel-pt.c                    |   1 +
 tools/perf/util/parse-regs-options.c          | 184 +++++++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 416 +++++++++++++++++-
 tools/perf/util/perf_event_attr_fprintf.c     |   6 +
 tools/perf/util/perf_regs.c                   |  84 +++-
 tools/perf/util/perf_regs.h                   |  21 +-
 tools/perf/util/record.h                      |   7 +
 tools/perf/util/sample.h                      |   5 +
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                     |  91 +++-
 tools/perf/util/synthetic-events.c            |  28 ++
 tools/perf/util/unwind-libdw.c                |   6 +-
 22 files changed, 1143 insertions(+), 86 deletions(-)

-- 
2.34.1



* [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers
  2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
@ 2026-05-29  8:24 ` Dapeng Mi
  2026-05-29  9:23   ` sashiko-bot
  2026-05-29  8:24 ` [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi, Kan Liang

Sync the UAPI header changes that support SIMD/eGPRs/SSP sampling into
the corresponding tools UAPI headers.

Additionally, add a sanity check for the newly introduced __reserved_4
field in perf_attr_check().
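
As a rough cross-check of the sizes involved, the new attr fields should
account for the growth from PERF_ATTR_SIZE_VER9 (144) to
PERF_ATTR_SIZE_VER10 (176). The sketch below is only an illustration:
struct sketch_simd_attr_tail, the SKETCH_* macros and
sketch_reserved_ok() are hypothetical names that mirror the fields added
to struct perf_event_attr; they are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields this patch appends to perf_event_attr. */
struct sketch_simd_attr_tail {
	uint16_t sample_simd_regs_enabled;
	uint16_t sample_simd_pred_reg_qwords;
	uint16_t sample_simd_vec_reg_qwords;
	uint16_t reserved_4;		/* __reserved_4 in the UAPI header */
	uint32_t sample_simd_pred_reg_intr;
	uint32_t sample_simd_pred_reg_user;
	uint64_t sample_simd_vec_reg_intr;
	uint64_t sample_simd_vec_reg_user;
};

#define SKETCH_ATTR_SIZE_VER9	144	/* PERF_ATTR_SIZE_VER9 */
#define SKETCH_ATTR_SIZE_VER10	176	/* PERF_ATTR_SIZE_VER10 */

/* 4 * 2 + 2 * 4 + 2 * 8 = 32 bytes, with no padding between fields. */
_Static_assert(sizeof(struct sketch_simd_attr_tail) == 32,
	       "unexpected size of the new attr fields");
_Static_assert(SKETCH_ATTR_SIZE_VER9 + sizeof(struct sketch_simd_attr_tail) ==
	       SKETCH_ATTR_SIZE_VER10,
	       "VER10 size does not match the new fields");

/* Same idea as the perf_attr_check() change: a set reserved field is rejected. */
static int sketch_reserved_ok(const struct sketch_simd_attr_tail *tail)
{
	return tail->reserved_4 == 0;
}
```

Both static assertions hold at compile time on common ABIs, which
matches the 32-byte step between the two size macros.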

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/arch/x86/include/uapi/asm/perf_regs.h | 51 +++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h       | 49 ++++++++++++++++++--
 tools/perf/util/header.c                    |  3 +-
 3 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..31a025cb9dba 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,35 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/*
+	 * The eGPRs/SSP and XMM have overlaps. Only one can be used
+	 * at a time. The ABI PERF_SAMPLE_REGS_ABI_SIMD is used to
+	 * distinguish which one is used. If PERF_SAMPLE_REGS_ABI_SIMD
+	 * is set, then eGPRs/SSP is used, otherwise, XMM is used.
+	 *
+	 * Extended GPRs (eGPRs)
+	 */
+	PERF_REG_X86_R16,
+	PERF_REG_X86_R17,
+	PERF_REG_X86_R18,
+	PERF_REG_X86_R19,
+	PERF_REG_X86_R20,
+	PERF_REG_X86_R21,
+	PERF_REG_X86_R22,
+	PERF_REG_X86_R23,
+	PERF_REG_X86_R24,
+	PERF_REG_X86_R25,
+	PERF_REG_X86_R26,
+	PERF_REG_X86_R27,
+	PERF_REG_X86_R28,
+	PERF_REG_X86_R29,
+	PERF_REG_X86_R30,
+	PERF_REG_X86_R31,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0  = 32,
@@ -54,5 +80,30 @@ enum perf_event_x86_regs {
 };
 
 #define PERF_REG_EXTENDED_MASK	(~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK	__GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
+
+enum {
+	PERF_X86_SIMD_XMM_REGS      = 16,
+	PERF_X86_SIMD_YMM_REGS      = 16,
+	PERF_X86_SIMD_ZMM_REGS      = 32,
+	PERF_X86_SIMD_VEC_REGS_MAX  = PERF_X86_SIMD_ZMM_REGS,
+
+	PERF_X86_SIMD_OPMASK_REGS   = 8,
+	PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
+};
+
+#define PERF_X86_SIMD_PRED_MASK	__GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
+#define PERF_X86_SIMD_VEC_MASK	__GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+
+#define PERF_X86_H16ZMM_BASE		16
+
+enum {
+	/* 1 qword = 8 bytes */
+	PERF_X86_OPMASK_QWORDS   = 1,
+	PERF_X86_XMM_QWORDS      = 2,
+	PERF_X86_YMM_QWORDS      = 4,
+	PERF_X86_ZMM_QWORDS      = 8,
+	PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,
+};
 
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index fd10aa8d697f..c49fc76292f7 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -314,8 +314,9 @@ enum {
  */
 enum perf_sample_regs_abi {
 	PERF_SAMPLE_REGS_ABI_NONE		= 0,
-	PERF_SAMPLE_REGS_ABI_32			= 1,
-	PERF_SAMPLE_REGS_ABI_64			= 2,
+	PERF_SAMPLE_REGS_ABI_32			= (1 << 0),
+	PERF_SAMPLE_REGS_ABI_64			= (1 << 1),
+	PERF_SAMPLE_REGS_ABI_SIMD		= (1 << 2),
 };
 
 /*
@@ -383,6 +384,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
 #define PERF_ATTR_SIZE_VER9			144	/* add: config4 */
+#define PERF_ATTR_SIZE_VER10			176	/* Add: sample_simd_{vec|pred}_reg_* */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -547,6 +549,29 @@ struct perf_event_attr {
 
 	__u64	config3; /* extension of config2 */
 	__u64	config4; /* extension of config3 */
+
+	/*
+	 * Defines the sampling SIMD/PRED(predicate) registers bitmap and
+	 * qwords (8 bytes) length.
+	 *
+	 * sample_simd_regs_enabled != 0 indicates there are SIMD/PRED
+	 * registers to be sampled, the SIMD/PRED registers bitmap and
+	 * qwords length are represented in
+	 * sample_simd_{vec|pred}_reg_{intr|user} and
+	 * sample_simd_{vec|pred}_reg_qwords fields separately.
+	 *
+	 * sample_simd_regs_enabled == 0 indicates no SIMD/PRED registers
+	 * are sampled.
+	 */
+	__u16	sample_simd_regs_enabled;
+	__u16	sample_simd_pred_reg_qwords;
+	__u16	sample_simd_vec_reg_qwords;
+	__u16	__reserved_4;
+
+	__u32	sample_simd_pred_reg_intr;
+	__u32	sample_simd_pred_reg_user;
+	__u64	sample_simd_vec_reg_intr;
+	__u64	sample_simd_vec_reg_user;
 };
 
 /*
@@ -1020,7 +1045,15 @@ enum perf_event_type {
 	 *      } && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 *	{ u64			abi; # enum perf_sample_regs_abi
-	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+	 *	  u64			regs[weight(mask)];
+	 *	  struct {
+	 *		u64 nr_vectors;		# 0 ... weight(sample_simd_vec_reg_user)
+	 *		u64 vector_qwords;	# 0 ... sample_simd_vec_reg_qwords
+	 *		u64 nr_pred;		# 0 ... weight(sample_simd_pred_reg_user)
+	 *		u64 pred_qwords;	# 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_USER
 	 *
 	 *	{ u64			size;
 	 *	  char			data[size];
@@ -1047,7 +1080,15 @@ enum perf_event_type {
 	 *	{ u64			data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64			abi; # enum perf_sample_regs_abi
-	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	  u64			regs[weight(mask)];
+	 *	  struct {
+	 *		u64 nr_vectors;		# 0 ... weight(sample_simd_vec_reg_intr)
+	 *		u64 vector_qwords;	# 0 ... sample_simd_vec_reg_qwords
+	 *		u64 nr_pred;		# 0 ... weight(sample_simd_pred_reg_intr)
+	 *		u64 pred_qwords;	# 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_INTR
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64			cgroup;} && PERF_SAMPLE_CGROUP
 	 *	{ u64			data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..e8e4e00d6b4d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2091,7 +2091,8 @@ static void free_event_desc(struct evsel *events)
 
 static bool perf_attr_check(struct perf_event_attr *attr)
 {
-	if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) {
+	if (attr->__reserved_1 || attr->__reserved_2 ||
+	    attr->__reserved_3 || attr->__reserved_4) {
 		pr_warning("Reserved bits are set unexpectedly. "
 			   "Please update perf tool.\n");
 		return false;
-- 
2.34.1



* [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling
  2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
  2026-05-29  8:24 ` [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers Dapeng Mi
@ 2026-05-29  8:24 ` Dapeng Mi
  2026-05-29 10:06   ` sashiko-bot
  2026-05-29  8:24 ` [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling Dapeng Mi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi

This patch adds support for sampling x86 extended GP registers (R16-R31)
and the shadow stack pointer (SSP) register.

The original XMM registers space in sample_regs_user/sample_regs_intr is
reclaimed to represent the eGPRs and SSP when SIMD registers sampling is
supported with the new SIMD sampling fields in the perf_event_attr
structure. This necessitates a way to distinguish which register layout
is used for the sample_regs_user/sample_regs_intr bitmap.

To address this, a new "abi" argument is added to the helpers
perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
"abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, it indicates the eGPRs and SSP
layout is represented; otherwise, the legacy XMM registers are
represented.
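
To illustrate why the "abi" argument is needed, the following minimal
sketch resolves a register index under both layouts. It is not the code
from this patch: sketch_reg_name() and the SKETCH_* constants are
hypothetical, and the index values assume the enum order in perf_regs.h
(PERF_REG_X86_R15 == 23, so R16..R31 occupy 24..39, SSP is 40, and
PERF_REG_X86_XMM0 is 32).

```c
#include <stddef.h>
#include <stdio.h>

#define SKETCH_ABI_SIMD	(1 << 2)	/* PERF_SAMPLE_REGS_ABI_SIMD */

/* Assumed index values, following the enum order in perf_regs.h. */
enum {
	SKETCH_REG_R15  = 23,
	SKETCH_REG_R16  = 24,
	SKETCH_REG_R31  = 39,
	SKETCH_REG_SSP  = 40,
	SKETCH_REG_XMM0 = 32,
	SKETCH_REG_MAX  = 64,
};

/* Return a name for bit 'id' of the regs bitmap, or NULL if it is unused. */
static const char *sketch_reg_name(int id, int abi, char *buf, size_t len)
{
	if (id < 0 || id >= SKETCH_REG_MAX)
		return NULL;

	/* Legacy GPRs keep the same meaning in both layouts. */
	if (id <= SKETCH_REG_R15)
		return "legacy GPR";

	if (abi & SKETCH_ABI_SIMD) {
		/* New layout: the old XMM space holds eGPRs and SSP. */
		if (id <= SKETCH_REG_R31) {
			snprintf(buf, len, "R%d", 16 + (id - SKETCH_REG_R16));
			return buf;
		}
		return id == SKETCH_REG_SSP ? "SSP" : NULL;
	}

	/* Legacy layout: each XMM register takes two bits. */
	if (id >= SKETCH_REG_XMM0) {
		snprintf(buf, len, "XMM%d", (id - SKETCH_REG_XMM0) / 2);
		return buf;
	}
	return NULL;
}
```

With these assumptions, index 32 resolves to R24 when the SIMD ABI bit
is set and to XMM0 otherwise, which is the ambiguity the "abi" argument
removes.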

Please note that PERF_SAMPLE_REGS_ABI_SIMD is set by default on platforms
that support SIMD register sampling, even when no eGPR or SSP register is
requested (for example, -Iax). As a result, sample_regs_intr and
sample_regs_user always use the new GPR layout on platforms with SIMD
register sampling support.

This patch only supports eGPRs and SSP sampling; complete SIMD register
sampling is added in the next patch.
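
For reference, the regs section of a sample can be sized from the layout
documented in perf_event.h in patch 1/5: an abi word, one u64 per bit
set in the mask and, when the SIMD ABI bit is set, a four-u64 header
followed by the vector and predicate data. The parser in this patch only
steps over that four-u64 header. The sketch below walks the full
documented layout; sketch_regs_section_words() and the SKETCH_*
constants are hypothetical, and the sanity limit of 64 per header field
is an assumption of the sketch, not a kernel rule.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ABI_SIMD		(1u << 2)	/* PERF_SAMPLE_REGS_ABI_SIMD */
#define SKETCH_SIMD_HDR_WORDS	4	/* nr_vectors, vector_qwords, nr_pred, pred_qwords */
#define SKETCH_COUNT_LIMIT	64	/* arbitrary sanity bound for this sketch */

/* Count the bits set in a 64-bit mask. */
static unsigned int sketch_weight64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Return how many u64 words the regs section occupies, or 0 if it does
 * not fit in 'avail' words or the SIMD header looks implausible.
 */
static size_t sketch_regs_section_words(const uint64_t *buf, size_t avail,
					uint64_t mask)
{
	size_t words = 1;	/* the abi word is always present */
	const uint64_t *hdr;
	int i;

	if (avail < words)
		return 0;
	if (buf[0] == 0)	/* PERF_SAMPLE_REGS_ABI_NONE: nothing follows */
		return words;

	words += sketch_weight64(mask);
	if (buf[0] & SKETCH_ABI_SIMD) {
		if (avail < words + SKETCH_SIMD_HDR_WORDS)
			return 0;
		hdr = buf + words;
		for (i = 0; i < SKETCH_SIMD_HDR_WORDS; i++) {
			if (hdr[i] > SKETCH_COUNT_LIMIT)
				return 0;
		}
		/* header + nr_vectors * vector_qwords + nr_pred * pred_qwords */
		words += SKETCH_SIMD_HDR_WORDS + hdr[0] * hdr[1] + hdr[2] * hdr[3];
	}
	return avail < words ? 0 : words;
}
```

When all four header fields are zero, the SIMD part reduces to the four
header words, which is the 4 * sizeof(u64) adjustment this patch makes
in the parser and the size helpers.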

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/builtin-inject.c                   |   2 +
 tools/perf/builtin-script.c                   |   2 +-
 tools/perf/util/evsel.c                       |  23 +++-
 tools/perf/util/intel-pt.c                    |   1 +
 tools/perf/util/parse-regs-options.c          |  35 +++--
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 124 +++++++++++++++---
 tools/perf/util/perf_regs.c                   |  12 +-
 tools/perf/util/perf_regs.h                   |  10 +-
 tools/perf/util/record.h                      |   7 +
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                     |  13 +-
 tools/perf/util/synthetic-events.c            |   8 ++
 12 files changed, 194 insertions(+), 45 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..f6611d7e85eb 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -457,6 +457,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	/* adjust sample size for stack and regs */
 	sample_size -= sample->user_stack.size;
 	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
+	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		sample_size -= 4 * sizeof(u64);	/* Reduce SIMD regs header size */
 	sample_size += (sample->callchain->nr + 1) * sizeof(u64);
 	event_copy->header.size = sample_size;
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c8ac9f01a36b..8ec791e22778 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
 	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
 		u64 val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ",
-				   perf_reg_name(r, e_machine, e_flags),
+				   perf_reg_name(r, e_machine, e_flags, regs->abi),
 				   val);
 	}
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2ee87fd84d3e..1c856a2ecc6e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1055,19 +1055,22 @@ static void __evsel__config_callchain(struct evsel *evsel, const struct record_o
 	}
 
 	if (param->record_mode == CALLCHAIN_DWARF) {
+		int abi = -1; /* -1 indicates only basic GPRs are needed. */
+
 		if (!function) {
 			uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
 
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
 			if (opts->sample_user_regs &&
-			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
+			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
 					   "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
 					   "so the minimal registers set (IP, SP) is explicitly forced.\n");
 			} else {
-				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
+				abi = -1;
+				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
 			}
 			attr->sample_stack_user = param->dump_size;
 			attr->exclude_callchain_user = 1;
@@ -1587,12 +1590,14 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
 	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_intr = opts->sample_intr_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
 	if (opts->sample_user_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_user |= opts->sample_user_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
 		evsel__set_sample_bit(evsel, REGS_USER);
 	}
 
@@ -3495,6 +3500,13 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				/* Skip SIMD-regs header. */
+				sz = 4 * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
@@ -3552,6 +3564,13 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				/* Skip SIMD-regs header. */
+				sz = 4 * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..2729ad8c6d26 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2470,6 +2470,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
 	}
 
 	if (sample_type & PERF_SAMPLE_REGS_INTR &&
+	    !evsel->core.attr.sample_simd_regs_enabled &&
 	    (items->mask[INTEL_PT_GP_REGS_POS] ||
 	     items->mask[INTEL_PT_XMM_POS])) {
 		u64 regs_mask = evsel->core.attr.sample_regs_intr;
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index c93c2f0c8105..70a1cc90b2c1 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -6,11 +6,14 @@
 #include <stdio.h>
 #include "util/debug.h"
 #include <dwarf-regs.h>
+#include <sys/param.h>
 #include <subcmd/parse-options.h>
 #include "util/perf_regs.h"
 #include "util/parse-regs-options.h"
+#include "record.h"
 
-static void list_perf_regs(FILE *fp, uint64_t mask)
+static void
+list_perf_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
@@ -21,7 +24,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (name && (!last_name || strcmp(last_name, name)))
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
@@ -29,7 +32,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 	fputc('\n', fp);
 }
 
-static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
+static uint64_t
+name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
@@ -39,7 +43,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (!name)
 			continue;
 
@@ -53,9 +57,12 @@ static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
 	uint64_t *mode = (uint64_t *)opt->value;
+	struct record_opts *opts;
 	char *s, *os = NULL, *p;
+	const char *warn;
 	int ret = -1;
 	uint64_t mask;
+	int abi = 0;
 
 	if (unset)
 		return 0;
@@ -66,11 +73,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (*mode)
 		return -1;
 
-	mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
+	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) :
+		      perf_user_reg_mask(EM_HOST, &abi);
+	opts = intr ? container_of(opt->value, struct record_opts, sample_intr_regs) :
+		      container_of(opt->value, struct record_opts, sample_user_regs);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (!str) {
 		*mode = mask;
+		if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			opts->sample_simd_regs_enabled = 1;
 		return 0;
 	}
 
@@ -79,6 +91,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (!s)
 		return -1;
 
+	warn = "Unknown register \"%s\", check man page or run \"perf record %s?\"\n";
 	for (;;) {
 		uint64_t reg_mask;
 
@@ -87,14 +100,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask);
+			list_perf_regs(stderr, mask, abi);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask);
-		if (reg_mask == 0) {
-			ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
-				s, intr ? "-I" : "--user-regs=");
+		reg_mask = name_to_perf_reg_mask(s, mask, abi);
+		if (reg_mask) {
+			if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				opts->sample_simd_regs_enabled = 1;
+		} else {
+			ui__warning(warn, s, intr ? "-I" : "--user-regs=");
 			goto error;
 		}
 		*mode |= reg_mask;
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index b6d20522b4e8..ae26d991cdc9 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t __perf_reg_mask_x86(bool intr)
+static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
 {
 	struct perf_event_attr attr = {
-		.type			= PERF_TYPE_HARDWARE,
-		.config			= PERF_COUNT_HW_CPU_CYCLES,
-		.sample_type		= PERF_SAMPLE_REGS_INTR,
-		.sample_regs_intr	= PERF_REG_EXTENDED_MASK,
-		.precise_ip		= 1,
-		.disabled		= 1,
-		.exclude_kernel		= 1,
+		.type				= PERF_TYPE_HARDWARE,
+		.config				= PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type			= sample_type,
+		.precise_ip			= 1,
+		.disabled			= 1,
+		.exclude_kernel			= 1,
+		.sample_simd_regs_enabled	= has_simd_regs,
 	};
 	int fd;
-
-	if (!intr)
-		return PERF_REGS_MASK;
-
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
 	attr.sample_period = 1;
+	if (sample_type == PERF_SAMPLE_REGS_INTR)
+		attr.sample_regs_intr = mask;
+	else
+		attr.sample_regs_user = mask;
 
 	if (perf_pmus__num_core_pmus() > 1) {
 		struct perf_pmu *pmu = NULL;
@@ -276,13 +276,38 @@ uint64_t __perf_reg_mask_x86(bool intr)
 				 /*group_fd=*/-1, /*flags=*/0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		return mask;
+	}
+
+	return 0;
+}
+
+uint64_t __perf_reg_mask_x86(bool intr, int *abi)
+{
+	u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
+	uint64_t mask = PERF_REGS_MASK;
+
+	/* -1 indicates only basic GPRs are needed. */
+	if (*abi < 0)
+		return PERF_REGS_MASK;
+
+	*abi = 0;
+	mask |= __arch__reg_mask(sample_type,
+				 GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
+				 true);
+	mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
+
+	if (mask != PERF_REGS_MASK) {
+		*abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+	} else {
+		mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
+					 false);
 	}
 
-	return PERF_REGS_MASK;
+	return mask;
 }
 
-const char *__perf_reg_name_x86(int id)
+static const char *__arch_reg_gpr_name(int id)
 {
 	switch (id) {
 	case PERF_REG_X86_AX:
@@ -333,7 +358,60 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
 
+static const char *__arch_reg_egpr_name(int id)
+{
+	switch (id) {
+	case PERF_REG_X86_R16:
+		return "R16";
+	case PERF_REG_X86_R17:
+		return "R17";
+	case PERF_REG_X86_R18:
+		return "R18";
+	case PERF_REG_X86_R19:
+		return "R19";
+	case PERF_REG_X86_R20:
+		return "R20";
+	case PERF_REG_X86_R21:
+		return "R21";
+	case PERF_REG_X86_R22:
+		return "R22";
+	case PERF_REG_X86_R23:
+		return "R23";
+	case PERF_REG_X86_R24:
+		return "R24";
+	case PERF_REG_X86_R25:
+		return "R25";
+	case PERF_REG_X86_R26:
+		return "R26";
+	case PERF_REG_X86_R27:
+		return "R27";
+	case PERF_REG_X86_R28:
+		return "R28";
+	case PERF_REG_X86_R29:
+		return "R29";
+	case PERF_REG_X86_R30:
+		return "R30";
+	case PERF_REG_X86_R31:
+		return "R31";
+	case PERF_REG_X86_SSP:
+		return "SSP";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+static const char *__arch_reg_xmm_name(int id)
+{
+	switch (id) {
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x:	\
 	case PERF_REG_X86_XMM ## x + 1:	\
@@ -362,6 +440,22 @@ const char *__perf_reg_name_x86(int id)
 	return NULL;
 }
 
+const char *__perf_reg_name_x86(int id, int abi)
+{
+	const char *name;
+
+	name = __arch_reg_gpr_name(id);
+	if (name)
+		return name;
+
+	if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		name = __arch_reg_egpr_name(id);
+	else
+		name = __arch_reg_xmm_name(id);
+
+	return name;
+}
+
 uint64_t __perf_reg_ip_x86(void)
 {
 	return PERF_REG_X86_IP;
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index f52b0e1f7fc7..18eed85cf220 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -35,7 +35,7 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
 	return ret;
 }
 
-uint64_t perf_intr_reg_mask(uint16_t e_machine)
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi /*inout*/)
 {
 	uint64_t mask = 0;
 
@@ -67,7 +67,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/true);
+		mask = __perf_reg_mask_x86(/*intr=*/true, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
@@ -78,7 +78,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-uint64_t perf_user_reg_mask(uint16_t e_machine)
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi /*inout*/)
 {
 	uint64_t mask = 0;
 
@@ -110,7 +110,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/false);
+		mask = __perf_reg_mask_x86(/*intr=*/false, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
@@ -121,7 +121,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 {
 	const char *reg_name = NULL;
 
@@ -153,7 +153,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		reg_name = __perf_reg_name_x86(id);
+		reg_name = __perf_reg_name_x86(id, abi);
 		break;
 	default:
 		break;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 573f0d1dfe04..3086d2f2a974 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -13,10 +13,10 @@ enum {
 };
 
 int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
-uint64_t perf_intr_reg_mask(uint16_t e_machine);
-uint64_t perf_user_reg_mask(uint16_t e_machine);
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi /*inout*/);
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi /*inout*/);
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
@@ -65,8 +65,8 @@ uint64_t __perf_reg_sp_s390(void);
 int __perf_sdt_arg_parse_op_s390(char *old_op, char **new_op);
 
 int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
-uint64_t __perf_reg_mask_x86(bool intr);
-const char *__perf_reg_name_x86(int id);
+uint64_t __perf_reg_mask_x86(bool intr, int *abi);
+const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
 
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 93627c9a7338..411bb7276ad7 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -62,6 +62,13 @@ struct record_opts {
 	u64	      branch_stack;
 	u64	      sample_intr_regs;
 	u64	      sample_user_regs;
+	u16	      sample_simd_regs_enabled;
+	u16	      sample_vec_reg_qwords;
+	u16	      sample_pred_reg_qwords;
+	u32	      sample_intr_pred_regs;
+	u32	      sample_user_pred_regs;
+	u64	      sample_intr_vec_regs;
+	u64	      sample_user_vec_regs;
 	u64	      default_interval;
 	u64	      user_interval;
 	size_t	      auxtrace_snapshot_size;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 5a30caaec73e..a9ad7d712196 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
 
 		printed += scnprintf(bf + printed, size - printed,
 				     "%5s:0x%" PRIx64 " ",
-				     perf_reg_name(r, e_machine, e_flags), val);
+				     perf_reg_name(r, e_machine, e_flags, regs->abi), val);
 	}
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..9e36c834a8f4 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -966,15 +966,16 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
+static void regs_dump__printf(u64 mask, struct regs_dump *regs,
+			      uint16_t e_machine, uint32_t e_flags)
 {
 	unsigned rid, i = 0;
 
 	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
-		       perf_reg_name(rid, e_machine, e_flags), val);
+		       perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
 	}
 }
 
@@ -982,11 +983,13 @@ static const char *regs_abi[] = {
 	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
 	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
 	[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
+	[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_32] = "32-bit SIMD",
+	[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] = "64-bit SIMD",
 };
 
 static inline const char *regs_dump_abi(struct regs_dump *d)
 {
-	if (d->abi > PERF_SAMPLE_REGS_ABI_64)
+	if (d->abi >= ARRAY_SIZE(regs_abi) || !regs_abi[d->abi])
 		return "unknown";
 
 	return regs_abi[d->abi];
@@ -1002,7 +1005,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
 	       mask,
 	       regs_dump_abi(regs));
 
-	regs_dump__printf(mask, regs->regs, e_machine, e_flags);
+	regs_dump__printf(mask, regs, e_machine, e_flags);
 }
 
 static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..ce61734cd5d2 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1524,6 +1524,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->user_regs && sample->user_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
+			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1552,6 +1554,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
+			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1729,6 +1733,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		if (sample->user_regs && sample->user_regs->abi) {
 			*array++ = sample->user_regs->abi;
 			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
+			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			memcpy(array, sample->user_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
@@ -1765,6 +1771,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			*array++ = sample->intr_regs->abi;
 			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
+			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			memcpy(array, sample->intr_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
-- 
2.34.1



* [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling
  2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
  2026-05-29  8:24 ` [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers Dapeng Mi
  2026-05-29  8:24 ` [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
@ 2026-05-29  8:24 ` Dapeng Mi
  2026-05-29 10:45   ` sashiko-bot
  2026-05-29  8:24 ` [Patch v8 4/5] perf regs: Enable dumping of SIMD registers Dapeng Mi
  2026-05-29  8:24 ` [Patch v8 5/5] perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers Dapeng Mi
  4 siblings, 1 reply; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi

This patch adds support for the newly introduced SIMD register sampling
format by adding the following 5 functions:

uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);

The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap
of kernel-supported SIMD/PRED register classes on the current platform
for intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on
x86 platforms.

The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve
the register bitmap and the per-register length in qwords of a given
SIMD/PRED register class on the current platform for intr-regs and
user-regs sampling. For example, for the XMM registers on x86 platforms,
the returned bitmap is 0xffff (XMM0 ~ XMM15) and the qwords length is 2
(128 bits for each XMM register).

The perf_simd_reg_class_name() function returns the name of the register
class with the given register class index.
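
To illustrate how a class bitmap and its qwords length relate, here is a
minimal standalone sketch. It is not code from this patch: the struct, the
table and both helper names are hypothetical, and the register counts and
widths are taken from the "available registers" listing and the 1/2/4/8
qwords comment in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor; perf keeps this data in per-class arrays. */
struct simd_class_desc {
	const char *name;
	unsigned int nr_regs;	/* registers in the class */
	uint16_t qwords;	/* 64-bit words per register */
};

/* Values assumed from the listing: OPMASK0-7, XMM0-15, YMM0-15, ZMM0-31. */
static const struct simd_class_desc x86_classes[] = {
	{ "OPMASK",  8, 1 },
	{ "XMM",    16, 2 },
	{ "YMM",    16, 4 },
	{ "ZMM",    32, 8 },
};

/* One bit per register: bits 0 .. nr_regs - 1 are set. */
static uint64_t class_bitmap(const struct simd_class_desc *c)
{
	assert(c->nr_regs > 0 && c->nr_regs <= 64);
	return c->nr_regs == 64 ? ~0ULL : (1ULL << c->nr_regs) - 1;
}

/* Register payload in bytes when the whole class is sampled. */
static uint64_t class_payload_bytes(const struct simd_class_desc *c)
{
	return (uint64_t)c->nr_regs * c->qwords * sizeof(uint64_t);
}
```

For the XMM entry this gives a bitmap of 0xffff and 16 * 2 * 8 = 256 bytes
of register data per sample.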

Additionally, the function __parse_regs() is enhanced to support parsing
these newly introduced SIMD/PRED registers. Currently, each class of
register can only be sampled collectively; sampling a specific SIMD
register is not supported. For example, all XMM registers are sampled
together rather than sampling only XMM0.

When multiple overlapping register types, such as XMM and YMM, are
requested together, only the superset (the YMM registers) is sampled.
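
The superset rule can be sketched as follows. This is a simplified
standalone model of what name_to_simd_reg_mask() does for vector
registers, not the patch code itself: struct vec_selection,
select_vec_class() and popcount64() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vector fields of struct record_opts. */
struct vec_selection {
	uint64_t bitmap;	/* registers to sample */
	uint16_t qwords;	/* width of each sampled register */
};

/* Portable population count (number of set bits). */
static int popcount64(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Merge one requested class into the selection: keep the largest width
 * and the bitmap that covers the most registers.
 */
static void select_vec_class(struct vec_selection *sel, uint64_t bitmap,
			     uint16_t qwords)
{
	assert(sel);
	if (qwords > sel->qwords)
		sel->qwords = qwords;
	if (popcount64(bitmap) > popcount64(sel->bitmap))
		sel->bitmap = bitmap;
}
```

Requesting "xmm,ymm" in either order ends with qwords = 4 and
bitmap = 0xffff, i.e. only the YMM registers are sampled. The rule assumes
that wider classes are always supersets of narrower ones.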

With this patch, all supported sampling registers on x86 platforms are
displayed as follows.

 $perf record --intr-regs=?
 available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

 $perf record --user-regs=?
 available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
---
 tools/perf/Documentation/perf-record.txt      |  10 +-
 tools/perf/util/evsel.c                       |  21 ++
 tools/perf/util/parse-regs-options.c          | 159 +++++++++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 292 ++++++++++++++++++
 tools/perf/util/perf_event_attr_fprintf.c     |   6 +
 tools/perf/util/perf_regs.c                   |  72 +++++
 tools/perf/util/perf_regs.h                   |  11 +
 7 files changed, 559 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 178f483140ed..b8ff7ecd941d 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -513,12 +513,16 @@ Capture machine state (registers) at interrupt, i.e., on counter overflows for
 each sample. List of captured registers depends on the architecture. This option
 is off by default. It is possible to select the registers to sample using their
 symbolic names, e.g. on x86, ax, si. To list the available registers use
---intr-regs=\?. To name registers, pass a comma separated list such as
---intr-regs=ax,bx. The list of register is architecture dependent.
+--intr-regs=\?. On supported architectures, SIMD registers are displayed as
+groups (e.g., on x86: XMM0-15,YMM0-15,ZMM0-31). To name registers, pass a comma
+separated list such as --intr-regs=ax,bx,zmm. Note that SIMD registers must be
+specified as a complete set; sampling individual SIMD registers (e.g., zmm0)
+is not supported. The list of registers is architecture dependent.
 
 --user-regs::
 Similar to -I, but capture user registers at sample time. To list the available
-user registers use --user-regs=\?.
+user registers use --user-regs=\?. As with -I, SIMD registers can only be
+sampled as complete register sets.
 
 --running-time::
 Record running and enabled time for read events (:S)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1c856a2ecc6e..cd62af14a4f5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -26,6 +26,7 @@
 #include <sys/resource.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
+#include <sys/param.h>
 #include <dirent.h>
 #include <stdlib.h>
 #include <perf/evsel.h>
@@ -1594,6 +1595,16 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
+	if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
+		attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
+		attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
+		attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
+		attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
+		evsel__set_sample_bit(evsel, REGS_INTR);
+	}
+
 	if (opts->sample_user_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_user |= opts->sample_user_regs;
@@ -1601,6 +1612,16 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
 		evsel__set_sample_bit(evsel, REGS_USER);
 	}
 
+	if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
+		attr->sample_simd_vec_reg_user = opts->sample_user_vec_regs;
+		attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
+		attr->sample_simd_pred_reg_user = opts->sample_user_pred_regs;
+		attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
+		evsel__set_sample_bit(evsel, REGS_USER);
+	}
+
 	if (target__has_cpu(&opts->target) || opts->sample_cpu)
 		evsel__set_sample_bit(evsel, CPU);
 
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index 70a1cc90b2c1..26d560a486c8 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -13,11 +13,10 @@
 #include "record.h"
 
 static void
-list_perf_regs(FILE *fp, uint64_t mask, int abi)
+__list_gp_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
-	fprintf(fp, "available registers: ");
 	for (int reg = 0; reg < 64; reg++) {
 		const char *name;
 
@@ -29,14 +28,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi)
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
 	}
+}
+
+static void
+__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred)
+{
+	uint64_t bitmap = 0;
+	uint16_t qwords = 0;
+	const char *name;
+	int i = 0;
+
+	for (int reg_c = 0; reg_c < 64; reg_c++) {
+		if (((1ULL << reg_c) & mask) == 0)
+			continue;
+
+		name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
+		bitmap = intr ?
+			 perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) :
+			 perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred);
+		if (name && bitmap)
+			fprintf(fp, "%s%s0-%d", i++ > 0 ? " " : "",
+				name, fls64(bitmap) - 1);
+	}
+}
+
+static void
+list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask,
+	       uint64_t pred_mask, int abi, bool intr)
+{
+	bool printed = false;
+
+	fprintf(fp, "available registers: ");
+
+	if (mask) {
+		__list_gp_regs(fp, mask, abi);
+		printed = true;
+	}
+
+	if (simd_mask) {
+		if (printed)
+			fprintf(fp, " ");
+		__list_simd_regs(fp, simd_mask, intr, /*pred=*/false);
+		printed = true;
+	}
+
+	if (pred_mask) {
+		if (printed)
+			fprintf(fp, " ");
+		__list_simd_regs(fp, pred_mask, intr, /*pred=*/true);
+		printed = true;
+	}
+
 	fputc('\n', fp);
 }
 
 static uint64_t
-name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
+name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
+	if (!mask)
+		return reg_mask;
+
 	for (int reg = 0; reg < 64; reg++) {
 		const char *name;
 
@@ -53,22 +106,96 @@ name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 	return reg_mask;
 }
 
+static bool
+name_to_simd_reg_mask(struct record_opts *opts, const char *to_match,
+		      uint64_t mask, bool intr, bool pred)
+{
+	bool matched = false;
+	uint64_t bitmap;
+	uint16_t qwords;
+	int reg_c;
+
+	if (!mask)
+		return false;
+
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		const char *name;
+
+		if (((1ULL << reg_c) & mask) == 0)
+			continue;
+
+		name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
+		if (!name)
+			continue;
+
+		if (!strcasecmp(to_match, name)) {
+			matched = true;
+			break;
+		}
+	}
+
+	if (!matched)
+		return false;
+
+	if (intr) {
+		bitmap = perf_intr_simd_reg_class_bitmap_qwords(EM_HOST,
+							reg_c, &qwords, pred);
+	} else {
+		bitmap = perf_user_simd_reg_class_bitmap_qwords(EM_HOST,
+							reg_c, &qwords, pred);
+	}
+
+	/*
+	 * Assume higher width SIMD registers are always the superset of lower
+	 * width SIMD registers. So only pick the largest qwords and bitmap.
+	 */
+	if (pred) {
+		opts->sample_pred_reg_qwords =
+			MAX(qwords, opts->sample_pred_reg_qwords);
+		if (intr &&
+		    hweight64(bitmap) > hweight32(opts->sample_intr_pred_regs))
+			opts->sample_intr_pred_regs = bitmap;
+		if (!intr &&
+		    hweight64(bitmap) > hweight32(opts->sample_user_pred_regs))
+			opts->sample_user_pred_regs = bitmap;
+	} else {
+		opts->sample_vec_reg_qwords =
+			MAX(qwords, opts->sample_vec_reg_qwords);
+		if (intr &&
+		    hweight64(bitmap) > hweight64(opts->sample_intr_vec_regs))
+			opts->sample_intr_vec_regs = bitmap;
+		if (!intr &&
+		    hweight64(bitmap) > hweight64(opts->sample_user_vec_regs))
+			opts->sample_user_vec_regs = bitmap;
+	}
+
+	if (opts->sample_pred_reg_qwords || opts->sample_vec_reg_qwords)
+		opts->sample_simd_regs_enabled = 1;
+
+	return true;
+}
+
 static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
 	uint64_t *mode = (uint64_t *)opt->value;
 	struct record_opts *opts;
 	char *s, *os = NULL, *p;
+	uint64_t simd_mask;
+	uint64_t pred_mask;
+	uint64_t mask;
 	const char *warn;
+	bool matched;
 	int ret = -1;
-	uint64_t mask;
 	int abi = 0;
 
 	if (unset)
 		return 0;
 
 	/*
-	 * cannot set it twice
+	 * Non-SIMD registers cannot be set twice.
+	 * SIMD registers can be set multiple times, but only the register
+	 * class with largest length (qwords) is sampled.
 	 */
 	if (*mode)
 		return -1;
@@ -91,6 +218,14 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (!s)
 		return -1;
 
+	if (intr) {
+		simd_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/false);
+		pred_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/true);
+	} else {
+		simd_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/false);
+		pred_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/true);
+	}
+
 	warn = "Unknown register \"%s\", check man page or run \"perf record %s?\"\n";
 	for (;;) {
 		uint64_t reg_mask;
@@ -100,17 +235,23 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask, abi);
+			list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask, abi);
+		reg_mask = name_to_gp_reg_mask(s, mask, abi);
 		if (reg_mask) {
 			if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
 				opts->sample_simd_regs_enabled = 1;
 		} else {
-			ui__warning(warn, s, intr ? "-I" : "--user-regs=");
-			goto error;
+			matched = name_to_simd_reg_mask(opts, s, simd_mask,
+							intr, /*pred=*/false) ||
+				  name_to_simd_reg_mask(opts, s, pred_mask,
+							intr, /*pred=*/true);
+			if (!matched) {
+				ui__warning(warn, s, intr ? "-I" : "--user-regs=");
+				goto error;
+			}
 		}
 		*mode |= reg_mask;
 
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index ae26d991cdc9..96f156d9971c 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -465,3 +465,295 @@ uint64_t __perf_reg_sp_x86(void)
 {
 	return PERF_REG_X86_SP;
 }
+
+enum {
+	PERF_REG_CLASS_X86_OPMASK = 0,
+	PERF_REG_CLASS_X86_XMM,
+	PERF_REG_CLASS_X86_YMM,
+	PERF_REG_CLASS_X86_ZMM,
+	PERF_REG_X86_MAX_SIMD_CLASSES,
+};
+
+#define PERF_REG_CLASS_X86_PRED_MASK	(BIT(PERF_REG_CLASS_X86_OPMASK))
+#define PERF_REG_CLASS_X86_SIMD_MASK	(BIT(PERF_REG_CLASS_X86_XMM) | \
+					 BIT(PERF_REG_CLASS_X86_YMM) | \
+					 BIT(PERF_REG_CLASS_X86_ZMM))
+
+/*
+ * Determine whether the kernel perf subsystem supports sampling a given
+ * kind of SIMD register (OPMASK/XMM/YMM/ZMM).
+ *
+ * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER
+ * @qwords: the length of SIMD register, like 1/2/4/8 qwords for
+ *          OPMASK/XMM/YMM/ZMM registers.
+ * @mask: the bitmask of SIMD register, like 0xffff for XMM0 ~ XMM15
+ * @pred: whether it's a predicate SIMD register, like OPMASK register.
+ *
+ * Return value: true indicates support, otherwise no support.
+ */
+static bool
+__support_simd_reg_class(uint64_t sample_type, uint16_t qwords,
+			 uint64_t mask, bool pred)
+{
+	struct perf_event_attr attr = {
+		.type				= PERF_TYPE_HARDWARE,
+		.config				= PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type			= sample_type,
+		.disabled			= 1,
+		.exclude_kernel			= 1,
+		.sample_simd_regs_enabled	= 1,
+	};
+	int fd;
+
+	attr.sample_period = 1;
+
+	if (!pred) {
+		attr.sample_simd_vec_reg_qwords = qwords;
+		if (sample_type == PERF_SAMPLE_REGS_INTR)
+			attr.sample_simd_vec_reg_intr = mask;
+		else
+			attr.sample_simd_vec_reg_user = mask;
+	} else {
+		attr.sample_simd_pred_reg_qwords = PERF_X86_OPMASK_QWORDS;
+		if (sample_type == PERF_SAMPLE_REGS_INTR)
+			attr.sample_simd_pred_reg_intr = PERF_X86_SIMD_PRED_MASK;
+		else
+			attr.sample_simd_pred_reg_user = PERF_X86_SIMD_PRED_MASK;
+	}
+
+	if (perf_pmus__num_core_pmus() > 1) {
+		__u64 type = perf_pmus__find_core_pmu()->type;
+
+		attr.config |= type << PERF_PMU_TYPE_SHIFT;
+	}
+
+	event_attr_init(&attr);
+
+	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+	if (fd != -1) {
+		close(fd);
+		return true;
+	}
+
+	return false;
+}
+
+#define PERF_X86_SIMD_ZMM_LOW_REGS	(PERF_X86_SIMD_ZMM_REGS / 2)
+
+static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class,
+				      uint64_t *mask, uint16_t *qwords)
+{
+	bool supported = false;
+	uint64_t bits;
+
+	*mask = 0;
+	*qwords = 0;
+
+	switch (reg_class) {
+	case PERF_REG_CLASS_X86_OPMASK:
+		bits = BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_OPMASK_QWORDS,
+						     bits, true);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_OPMASK_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_XMM:
+		bits = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_XMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_XMM_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_YMM:
+		bits = BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_YMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_YMM_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_ZMM:
+		bits = BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_ZMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_ZMM_QWORDS;
+			break;
+		}
+
+		bits = BIT_ULL(PERF_X86_SIMD_ZMM_LOW_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_ZMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_ZMM_QWORDS;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return supported;
+}
+
+static bool __support_simd_sampling(void)
+{
+	uint64_t mask = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
+	uint16_t qwords = PERF_X86_XMM_QWORDS;
+	static bool simd_sampling_supported;
+	static bool cached;
+
+	if (cached)
+		return simd_sampling_supported;
+
+	simd_sampling_supported =
+		 __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR,
+					   PERF_REG_CLASS_X86_XMM,
+					   &mask, &qwords);
+	cached = true;
+
+	return simd_sampling_supported;
+}
+
+/*
+ * @x86_intr_simd_cached: indicates that the data of the 3
+ *  x86_intr_simd_* items below has been retrieved from kernel and cached.
+ * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD
+ *  registers are supported for intr-regs option. Assume kernel perf
+ *  subsystem supports XMM/YMM sampling, then the mask is
+ *  BIT(PERF_REG_CLASS_X86_XMM) | BIT(PERF_REG_CLASS_X86_YMM).
+ * @x86_intr_simd_mask: indicates register bitmask for each kind of
+ *  supported PRED/SIMD register, like
+ *  x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] = 0xffff.
+ * @x86_intr_simd_qwords: indicates the register length (qwords unit)
+ *  for each kind of supported PRED/SIMD register, like
+ *  x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] = 2.
+ */
+static bool x86_intr_simd_cached;
+static uint64_t x86_intr_simd_reg_class_mask;
+static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
+static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
+
+/*
+ * Similar to the x86_intr_simd_* items above; the difference is that
+ * these items are used for the user-regs option.
+ */
+static bool x86_user_simd_cached;
+static uint64_t x86_user_simd_reg_class_mask;
+static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
+static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
+
+static uint64_t __arch__simd_reg_class_mask(bool intr)
+{
+	uint64_t mask = 0;
+	bool supported;
+	int reg_c;
+
+	if (!__support_simd_sampling())
+		goto done;
+
+	if (intr && x86_intr_simd_cached)
+		return x86_intr_simd_reg_class_mask;
+
+	if (!intr && x86_user_simd_cached)
+		return x86_user_simd_reg_class_mask;
+
+	for (reg_c = 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) {
+		supported = false;
+
+		if (intr) {
+			supported = __arch_has_simd_reg_class(
+						PERF_SAMPLE_REGS_INTR,
+						reg_c,
+						&x86_intr_simd_mask[reg_c],
+						&x86_intr_simd_qwords[reg_c]);
+		} else {
+			supported = __arch_has_simd_reg_class(
+						PERF_SAMPLE_REGS_USER,
+						reg_c,
+						&x86_user_simd_mask[reg_c],
+						&x86_user_simd_qwords[reg_c]);
+		}
+		if (supported)
+			mask |= BIT_ULL(reg_c);
+	}
+
+done:
+	if (intr) {
+		x86_intr_simd_reg_class_mask = mask;
+		x86_intr_simd_cached = true;
+	} else {
+		x86_user_simd_reg_class_mask = mask;
+		x86_user_simd_cached = true;
+	}
+
+	return mask;
+}
+
+static uint64_t
+__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qwords)
+{
+	uint64_t mask = 0;
+	uint64_t class_mask;
+
+	*qwords = 0;
+	class_mask = intr ? x86_intr_simd_reg_class_mask :
+			    x86_user_simd_reg_class_mask;
+	if (!(class_mask & BIT_ULL(reg_c)))
+		return 0;
+
+	if (intr) {
+		mask = x86_intr_simd_mask[reg_c];
+		*qwords = x86_intr_simd_qwords[reg_c];
+	} else {
+		mask = x86_user_simd_mask[reg_c];
+		*qwords = x86_user_simd_qwords[reg_c];
+	}
+
+	return mask;
+}
+
+uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred)
+{
+	uint64_t mask = __arch__simd_reg_class_mask(intr);
+
+	return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK :
+		      mask & PERF_REG_CLASS_X86_SIMD_MASK;
+}
+
+uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
+						 bool intr, bool pred)
+{
+	if (intr ? !x86_intr_simd_cached : !x86_user_simd_cached)
+		__perf_simd_reg_class_mask_x86(intr, pred);
+	return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords);
+}
+
+const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unused)
+{
+	switch (id) {
+	case PERF_REG_CLASS_X86_OPMASK:
+		return "OPMASK";
+	case PERF_REG_CLASS_X86_XMM:
+		return "XMM";
+	case PERF_REG_CLASS_X86_YMM:
+		return "YMM";
+	case PERF_REG_CLASS_X86_ZMM:
+		return "ZMM";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 741c3d657a8b..c6b8e53e06fd 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(aux_start_paused, p_unsigned);
 	PRINT_ATTRf(aux_pause, p_unsigned);
 	PRINT_ATTRf(aux_resume, p_unsigned);
+	PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned);
+	PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex);
+	PRINT_ATTRf(sample_simd_pred_reg_user, p_hex);
+	PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned);
+	PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex);
+	PRINT_ATTRf(sample_simd_vec_reg_user, p_hex);
 
 	return ret;
 }
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 18eed85cf220..31920eb2fa04 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -249,3 +249,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine)
 		return 0;
 	}
 }
+
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_mask_x86(/*intr=*/true, pred);
+	default:
+		return 0;
+	}
+}
+
+uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_mask_x86(/*intr=*/false, pred);
+	default:
+		return 0;
+	}
+}
+
+uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
+							       /*intr=*/true,
+							       pred);
+	default:
+		*qwords = 0;
+		return 0;
+	}
+}
+
+uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
+							       /*intr=*/false,
+							       pred);
+	default:
+		*qwords = 0;
+		return 0;
+	}
+}
+
+const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred)
+{
+	const char *name = NULL;
+
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		name = __perf_simd_reg_class_name_x86(id, pred);
+		break;
+	default:
+		break;
+	}
+	if (name)
+		return name;
+
+	pr_debug("Failed to find %s register %d for ELF machine type %u\n",
+		 pred ? "PRED" : "SIMD", id, e_machine);
+	return "unknown";
+}
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 3086d2f2a974..8a3a40d6b1bb 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
+uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
+uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred);
+uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred);
+const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);
 
 int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op);
 uint64_t __perf_reg_mask_arm64(bool intr);
@@ -69,6 +76,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi);
 const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
+uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred);
+uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
+						 bool intr, bool pred);
+const char *__perf_simd_reg_class_name_x86(int id, bool pred);
 
 static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine)
 {
-- 
2.34.1



* [Patch v8 4/5] perf regs: Enable dumping of SIMD registers
  2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
                   ` (2 preceding siblings ...)
  2026-05-29  8:24 ` [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling Dapeng Mi
@ 2026-05-29  8:24 ` Dapeng Mi
  2026-05-29 11:23   ` sashiko-bot
  2026-05-29  8:24 ` [Patch v8 5/5] perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers Dapeng Mi
  4 siblings, 1 reply; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi, Kan Liang

This patch adds support for dumping SIMD registers using the new
PERF_SAMPLE_REGS_ABI_SIMD ABI.

Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86
platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.

An example of the output is displayed below:

 $perf record -e cycles:p -Iax,bx,r8,r16,r31,ssp,xmm,ymm,zmm,opmask ./test
 $perf report -D
 ... ...
 3342715685845 0x3afe8 [0xbc8]: PERF_RECORD_SAMPLE(IP, 0x1):
 27776/27776: 0xffffffff91d7c18f period: 10000 addr: 0
 ... intr regs: mask 0x18001010003 ABI 64-bit SIMD
 .... AX    0xffffed102de1a606
 .... BX    0xffffed102de1a606
 .... R8    0x0000000000000001
 .... R16   0x0000000000000000
 .... R31   0x0000000000000000
 .... SSP   0x0000000000000000
 ... SIMD ABI nr_vectors 32 vector_qwords 8 nr_pred 8 pred_qwords 1
 .... ZMM[0][0] 0x616c2f656d6f682f
 .... ZMM[0][1] 0x696c2f7265737562
 .... ZMM[0][2] 0x0000000000000000
 .... ZMM[0][3] 0x0000000000000000
 .... ZMM[0][4] 0x0000000000000000
 .... ZMM[0][5] 0x0000000000000000
 .... ZMM[0][6] 0x0000000000000000
 .... ZMM[0][7] 0x0000000000000000
 .... ZMM[1][0] 0x702f636578656269
 .... ZMM[1][1] 0x65726f632d667265
 .... ZMM[1][2] 0x0000000000000000
 .... ZMM[1][3] 0x0000000000000000
 .... ZMM[1][4] 0x0000000000000000
 .... ZMM[1][5] 0x0000000000000000
 .... ZMM[1][6] 0x0000000000000000
 .... ZMM[1][7] 0x0000000000000000
 ... ...
 .... ZMM[31][0] 0x0000000000000000
 .... ZMM[31][1] 0x0000000000000000
 .... ZMM[31][2] 0x0000000000000000
 .... ZMM[31][3] 0x0000000000000000
 .... ZMM[31][4] 0x0000000000000000
 .... ZMM[31][5] 0x0000000000000000
 .... ZMM[31][6] 0x0000000000000000
 .... ZMM[31][7] 0x0000000000000000
 .... OPMASK[0] 0x0000000000100221
 .... OPMASK[1] 0x0000000000000020
 .... OPMASK[2] 0x000000007fffffff
 .... OPMASK[3] 0x0000000000000000
 .... OPMASK[4] 0x0000000000000000
 .... OPMASK[5] 0x0000000000000000
 .... OPMASK[6] 0x0000000000000000
 .... OPMASK[7] 0x0000000000000000
 ... ...
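
As a rough cross-check of the dump above, the size of a register section
with SIMD data can be estimated with the sketch below. It assumes the
layout implied by this series (one u64 ABI word, one u64 per bit set in
the GP mask, four u64 SIMD header fields, then the vector and predicate
payload). struct simd_hdr and the helper names are hypothetical and are
not part of perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the four fields on the "SIMD ABI" dump line. */
struct simd_hdr {
	uint64_t nr_vectors;
	uint64_t vector_qwords;
	uint64_t nr_pred;
	uint64_t pred_qwords;
};

/* Portable population count (number of set bits). */
static int popcount64(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bytes used by the SIMD part: 4 header qwords plus the register data. */
static size_t simd_bytes(const struct simd_hdr *h)
{
	assert(h);
	return (4 + h->nr_vectors * h->vector_qwords +
		h->nr_pred * h->pred_qwords) * sizeof(uint64_t);
}

/* Bytes used by one register section; pass h = NULL for a non-SIMD ABI. */
static size_t regs_section_bytes(uint64_t gp_mask, const struct simd_hdr *h)
{
	size_t sz = sizeof(uint64_t);			/* ABI word */

	sz += popcount64(gp_mask) * sizeof(uint64_t);	/* GP registers */
	if (h)
		sz += simd_bytes(h);
	return sz;
}
```

With the values from the example (mask 0x18001010003, nr_vectors 32,
vector_qwords 8, nr_pred 8, pred_qwords 1), the SIMD part is
(4 + 256 + 8) * 8 = 2144 bytes and the whole intr regs section is
8 + 6 * 8 + 2144 = 2200 bytes.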

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/builtin-inject.c        |  9 +++-
 tools/perf/util/evsel.c            | 68 ++++++++++++++++++++++++--
 tools/perf/util/sample.h           |  5 ++
 tools/perf/util/session.c          | 78 ++++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.c | 28 +++++++++--
 5 files changed, 178 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f6611d7e85eb..de19d5bd2d57 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -457,8 +457,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	/* adjust sample size for stack and regs */
 	sample_size -= sample->user_stack.size;
 	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
-	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
-		sample_size -= 4 * sizeof(u64);	/* Reduce SIMD regs header size */
+	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+		sample_size -= 4 * sizeof(u64);
+		sample_size -= (sample->user_regs->nr_vectors *
+				sample->user_regs->vector_qwords +
+				sample->user_regs->nr_pred *
+				sample->user_regs->pred_qwords) * sizeof(u64);
+	}
 	sample_size += (sample->callchain->nr + 1) * sizeof(u64);
 	event_copy->header.size = sample_size;
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index cd62af14a4f5..a47747c8be08 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3523,9 +3523,39 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			array = (void *)array + sz;
 
 			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
-				/* Skip SIMD-regs header. */
-				sz = 4 * sizeof(u64);
+				u64 attr_nr_vectors =
+					hweight64(evsel->core.attr.sample_simd_vec_reg_user);
+				u64 attr_vec_qwords =
+					evsel->core.attr.sample_simd_vec_reg_qwords;
+				u64 attr_nr_pred =
+					hweight32(evsel->core.attr.sample_simd_pred_reg_user);
+				u64 attr_pred_qwords =
+					 evsel->core.attr.sample_simd_pred_reg_qwords;
+
+				OVERFLOW_CHECK_u64(array);
+				regs->nr_vectors = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->vector_qwords = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->nr_pred = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->pred_qwords = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+
+				if (regs->nr_vectors > attr_nr_vectors ||
+				    regs->vector_qwords > attr_vec_qwords ||
+				    regs->nr_pred > attr_nr_pred ||
+				    regs->pred_qwords > attr_pred_qwords)
+					goto out_efault;
+
+				sz = (regs->nr_vectors * regs->vector_qwords +
+				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);
 				OVERFLOW_CHECK(array, sz, max_size);
+
+				regs->simd_data = (u64 *)array;
 				array = (void *)array + sz;
 			}
 		}
@@ -3587,9 +3617,39 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			array = (void *)array + sz;
 
 			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
-				/* Skip SIMD-regs header. */
-				sz = 4 * sizeof(u64);
+				u64 attr_nr_vectors =
+					hweight64(evsel->core.attr.sample_simd_vec_reg_intr);
+				u64 attr_vec_qwords =
+					evsel->core.attr.sample_simd_vec_reg_qwords;
+				u64 attr_nr_pred =
+					hweight32(evsel->core.attr.sample_simd_pred_reg_intr);
+				u64 attr_pred_qwords =
+					 evsel->core.attr.sample_simd_pred_reg_qwords;
+
+				OVERFLOW_CHECK_u64(array);
+				regs->nr_vectors = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->vector_qwords = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->nr_pred = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				OVERFLOW_CHECK_u64(array);
+				regs->pred_qwords = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+
+				if (regs->nr_vectors > attr_nr_vectors ||
+				    regs->vector_qwords > attr_vec_qwords ||
+				    regs->nr_pred > attr_nr_pred ||
+				    regs->pred_qwords > attr_pred_qwords)
+					goto out_efault;
+
+				sz = (regs->nr_vectors * regs->vector_qwords +
+				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);
 				OVERFLOW_CHECK(array, sz, max_size);
+
+				regs->simd_data = (u64 *)array;
 				array = (void *)array + sz;
 			}
 		}
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index e556c9b656ea..95f921d482ad 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -16,6 +16,11 @@ struct regs_dump {
 	u64 abi;
 	u64 mask;
 	u64 *regs;
+	u64 nr_vectors;
+	u64 vector_qwords;
+	u64 nr_pred;
+	u64 pred_qwords;
+	u64 *simd_data;
 
 	/* Cached values/mask filled by first register access. */
 	u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 9e36c834a8f4..cd8e9aaa10a1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -979,6 +979,82 @@ static void regs_dump__printf(u64 mask, struct regs_dump *regs,
 	}
 }
 
+static void simd_regs_dump__printf(uint16_t e_machine, struct regs_dump *regs, bool intr)
+{
+	const char *name = "unknown";
+	const char *simd_header;
+	u32 i, j, idx, pred_base;
+	uint16_t qwords;
+	int reg_c;
+
+	if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
+		return;
+
+	if (!regs->nr_vectors && !regs->nr_pred)
+		return;
+
+	simd_header = "... SIMD ABI nr_vectors %" PRIu64 " vector_qwords %" PRIu64 \
+		      " nr_pred %" PRIu64 " pred_qwords %" PRIu64 "\n";
+	printf(simd_header, regs->nr_vectors, regs->vector_qwords,
+	       regs->nr_pred, regs->pred_qwords);
+
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		if (intr) {
+			perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c,
+							       &qwords, /*pred=*/false);
+		} else {
+			perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c,
+							       &qwords, /*pred=*/false);
+		}
+		if (regs->vector_qwords == qwords) {
+			name = perf_simd_reg_class_name(e_machine, reg_c, /*pred=*/false);
+			break;
+		}
+	}
+
+	for (i = 0; i < regs->nr_vectors; i++) {
+		for (j = 0; j < regs->vector_qwords; j++) {
+			idx = i * regs->vector_qwords + j;
+			if (regs->vector_qwords > 1) {
+				printf(".... %3s[%d][%d] 0x%016" PRIx64 "\n",
+				       name, i, j, regs->simd_data[idx++]);
+			} else {
+				printf(".... %3s[%d] 0x%016" PRIx64 "\n",
+				       name, i, regs->simd_data[idx++]);
+			}
+		}
+	}
+
+	name = "unknown";
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		if (intr) {
+			perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c,
+							       &qwords, /*pred=*/true);
+		} else {
+			perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c,
+							       &qwords, /*pred=*/true);
+		}
+		if (regs->pred_qwords == qwords) {
+			name = perf_simd_reg_class_name(e_machine, reg_c, /*pred=*/true);
+			break;
+		}
+	}
+
+	pred_base = regs->nr_vectors * regs->vector_qwords;
+	for (i = 0; i < regs->nr_pred; i++) {
+		for (j = 0; j < regs->pred_qwords; j++) {
+			idx = pred_base + i * regs->pred_qwords + j;
+			if (regs->pred_qwords > 1) {
+				printf(".... %3s[%d][%d] 0x%016" PRIx64 "\n",
+				       name, i, j, regs->simd_data[idx++]);
+			} else {
+				printf(".... %3s[%d] 0x%016" PRIx64 "\n",
+				       name, i, regs->simd_data[idx++]);
+			}
+		}
+	}
+}
+
 static const char *regs_abi[] = {
 	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
 	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
@@ -1019,6 +1095,7 @@ static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, ui
 
 	if (user_regs->regs)
 		regs__printf("user", user_regs, e_machine, e_flags);
+	simd_regs_dump__printf(e_machine, user_regs, /*intr=*/false);
 }
 
 static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
@@ -1032,6 +1109,7 @@ static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, ui
 
 	if (intr_regs->regs)
 		regs__printf("intr", intr_regs, e_machine, e_flags);
+	simd_regs_dump__printf(e_machine, intr_regs, /*intr=*/true);
 }
 
 static void stack_user__printf(struct stack_dump *dump)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index ce61734cd5d2..461a4633fd4e 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1524,8 +1524,13 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->user_regs && sample->user_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
-			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
 				sz += 4 * sizeof(u64);
+				sz += (sample->user_regs->nr_vectors *
+				       sample->user_regs->vector_qwords +
+				       sample->user_regs->nr_pred *
+				       sample->user_regs->pred_qwords) * sizeof(u64);
+			}
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1554,8 +1559,13 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
-			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
 				sz += 4 * sizeof(u64);
+				sz += (sample->intr_regs->nr_vectors *
+				       sample->intr_regs->vector_qwords +
+				       sample->intr_regs->nr_pred *
+				       sample->intr_regs->pred_qwords) * sizeof(u64);
+			}
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1733,8 +1743,13 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		if (sample->user_regs && sample->user_regs->abi) {
 			*array++ = sample->user_regs->abi;
 			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
-			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
 				sz += 4 * sizeof(u64);
+				sz += (sample->user_regs->nr_vectors *
+				       sample->user_regs->vector_qwords +
+				       sample->user_regs->nr_pred *
+				       sample->user_regs->pred_qwords) * sizeof(u64);
+			}
 			memcpy(array, sample->user_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
@@ -1771,8 +1786,13 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			*array++ = sample->intr_regs->abi;
 			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
-			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
 				sz += 4 * sizeof(u64);
+				sz += (sample->intr_regs->nr_vectors *
+				       sample->intr_regs->vector_qwords +
+				       sample->intr_regs->nr_pred *
+				       sample->intr_regs->pred_qwords) * sizeof(u64);
+			}
 			memcpy(array, sample->intr_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
-- 
2.34.1



* [Patch v8 5/5] perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers
  2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
                   ` (3 preceding siblings ...)
  2026-05-29  8:24 ` [Patch v8 4/5] perf regs: Enable dumping of SIMD registers Dapeng Mi
@ 2026-05-29  8:24 ` Dapeng Mi
  4 siblings, 0 replies; 10+ messages in thread
From: Dapeng Mi @ 2026-05-29  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Dave Hansen, Ian Rogers,
	Adrian Hunter, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Eranian Stephane
  Cc: Mark Rutland, broonie, Ravi Bangoria, linux-kernel,
	linux-perf-users, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Dapeng Mi

Enhance the x86-specific DWARF register handling by adding support for
SIMD registers and eGPRs.

This update is based on the "DWARF Register Number Mapping" table from
the "System V Application Binary Interface AMD64 Architecture Processor
Supplement" (version 1.0).

Modifications include:
- Updating the x86_64_regidx_table[] array to incorporate SIMD registers
  and eGPRs.
- Enhancing the __get_dwarf_regnum_for_perf_regnum_x86_64() function to
  retrieve the DWARF register index for eGPRs.
- Enlarging the number of supported x86_64 DWARF registers to 146 to cover
  eGPRs and SIMD registers (get_libdw_frame_nregs()).

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 .../util/dwarf-regs-arch/dwarf-regs-x86.c     | 138 +++++++++++++++---
 tools/perf/util/dwarf-regs.c                  |   7 +-
 tools/perf/util/include/dwarf-regs.h          |   7 +-
 tools/perf/util/unwind-libdw.c                |   6 +-
 4 files changed, 129 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c
index cadef120aeb4..b014a36d21b5 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c
@@ -90,22 +90,22 @@ static const struct dwarf_regs_idx x86_64_regidx_table[] = {
 	{ "r14", 14 }, { "r14d", 14 }, { "r14w", 14 }, { "r14b", 14 },
 	{ "r15", 15 }, { "r15d", 15 }, { "r15w", 15 }, { "r15b", 15 },
 	// 16 - Return Address RA
-	{ "xmm0", 17},
-	{ "xmm1", 18},
-	{ "xmm2", 19},
-	{ "xmm3", 20},
-	{ "xmm4", 21},
-	{ "xmm5", 22},
-	{ "xmm6", 23},
-	{ "xmm7", 24},
-	{ "xmm8", 25},
-	{ "xmm9", 26},
-	{ "xmm10", 27},
-	{ "xmm11", 28},
-	{ "xmm12", 29},
-	{ "xmm13", 30},
-	{ "xmm14", 31},
-	{ "xmm15", 32},
+	{ "zmm0", 17 }, { "ymm0", 17 }, { "xmm0", 17 },
+	{ "zmm1", 18 }, { "ymm1", 18 }, { "xmm1", 18 },
+	{ "zmm2", 19 }, { "ymm2", 19 }, { "xmm2", 19 },
+	{ "zmm3", 20 }, { "ymm3", 20 }, { "xmm3", 20 },
+	{ "zmm4", 21 }, { "ymm4", 21 }, { "xmm4", 21 },
+	{ "zmm5", 22 }, { "ymm5", 22 }, { "xmm5", 22 },
+	{ "zmm6", 23 }, { "ymm6", 23 }, { "xmm6", 23 },
+	{ "zmm7", 24 }, { "ymm7", 24 }, { "xmm7", 24 },
+	{ "zmm8", 25 }, { "ymm8", 25 }, { "xmm8", 25 },
+	{ "zmm9", 26 }, { "ymm9", 26 }, { "xmm9", 26 },
+	{ "zmm10", 27 }, { "ymm10", 27 }, { "xmm10", 27 },
+	{ "zmm11", 28 }, { "ymm11", 28 }, { "xmm11", 28 },
+	{ "zmm12", 29 }, { "ymm12", 29 }, { "xmm12", 29 },
+	{ "zmm13", 30 }, { "ymm13", 30 }, { "xmm13", 30 },
+	{ "zmm14", 31 }, { "ymm14", 31 }, { "xmm14", 31 },
+	{ "zmm15", 32 }, { "ymm15", 32 }, { "xmm15", 32 },
 	{ "st0", 33},
 	{ "st1", 34},
 	{ "st2", 35},
@@ -129,7 +129,7 @@ static const struct dwarf_regs_idx x86_64_regidx_table[] = {
 	{ "ds", 53},
 	{ "fs", 54},
 	{ "gs", 55},
-	// 56-47 - reserved
+	// 56-57 - reserved
 	{ "fs.base", 58},
 	{ "gs.base", 59},
 	// 60-61 - reserved
@@ -138,6 +138,49 @@ static const struct dwarf_regs_idx x86_64_regidx_table[] = {
 	{ "mxcsr", 64}, // 128-bit Media Control and Status
 	{ "fcw", 65}, // x87 Control Word
 	{ "fsw", 66}, // x87 Status Word
+	// 67-82 - Upper Vector Registers 16–31
+	{ "zmm16", 67 }, { "ymm16", 67 }, { "xmm16", 67 },
+	{ "zmm17", 68 }, { "ymm17", 68 }, { "xmm17", 68 },
+	{ "zmm18", 69 }, { "ymm18", 69 }, { "xmm18", 69 },
+	{ "zmm19", 70 }, { "ymm19", 70 }, { "xmm19", 70 },
+	{ "zmm20", 71 }, { "ymm20", 71 }, { "xmm20", 71 },
+	{ "zmm21", 72 }, { "ymm21", 72 }, { "xmm21", 72 },
+	{ "zmm22", 73 }, { "ymm22", 73 }, { "xmm22", 73 },
+	{ "zmm23", 74 }, { "ymm23", 74 }, { "xmm23", 74 },
+	{ "zmm24", 75 }, { "ymm24", 75 }, { "xmm24", 75 },
+	{ "zmm25", 76 }, { "ymm25", 76 }, { "xmm25", 76 },
+	{ "zmm26", 77 }, { "ymm26", 77 }, { "xmm26", 77 },
+	{ "zmm27", 78 }, { "ymm27", 78 }, { "xmm27", 78 },
+	{ "zmm28", 79 }, { "ymm28", 79 }, { "xmm28", 79 },
+	{ "zmm29", 80 }, { "ymm29", 80 }, { "xmm29", 80 },
+	{ "zmm30", 81 }, { "ymm30", 81 }, { "xmm30", 81 },
+	{ "zmm31", 82 }, { "ymm31", 82 }, { "xmm31", 82 },
+	// 118-125 - Vector Mask Registers 0–7
+	{ "k0", 118 },
+	{ "k1", 119 },
+	{ "k2", 120 },
+	{ "k3", 121 },
+	{ "k4", 122 },
+	{ "k5", 123 },
+	{ "k6", 124 },
+	{ "k7", 125 },
+	// 130-145 - APX Integer Registers 16-31
+	{ "r16", 130 }, { "r16d", 130 }, { "r16w", 130 }, { "r16b", 130 },
+	{ "r17", 131 }, { "r17d", 131 }, { "r17w", 131 }, { "r17b", 131 },
+	{ "r18", 132 }, { "r18d", 132 }, { "r18w", 132 }, { "r18b", 132 },
+	{ "r19", 133 }, { "r19d", 133 }, { "r19w", 133 }, { "r19b", 133 },
+	{ "r20", 134 }, { "r20d", 134 }, { "r20w", 134 }, { "r20b", 134 },
+	{ "r21", 135 }, { "r21d", 135 }, { "r21w", 135 }, { "r21b", 135 },
+	{ "r22", 136 }, { "r22d", 136 }, { "r22w", 136 }, { "r22b", 136 },
+	{ "r23", 137 }, { "r23d", 137 }, { "r23w", 137 }, { "r23b", 137 },
+	{ "r24", 138 }, { "r24d", 138 }, { "r24w", 138 }, { "r24b", 138 },
+	{ "r25", 139 }, { "r25d", 139 }, { "r25w", 139 }, { "r25b", 139 },
+	{ "r26", 140 }, { "r26d", 140 }, { "r26w", 140 }, { "r26b", 140 },
+	{ "r27", 141 }, { "r27d", 141 }, { "r27w", 141 }, { "r27b", 141 },
+	{ "r28", 142 }, { "r28d", 142 }, { "r28w", 142 }, { "r28b", 142 },
+	{ "r29", 143 }, { "r29d", 143 }, { "r29w", 143 }, { "r29b", 143 },
+	{ "r30", 144 }, { "r30d", 144 }, { "r30w", 144 }, { "r30b", 144 },
+	{ "r31", 145 }, { "r31d", 145 }, { "r31w", 145 }, { "r31b", 145 },
 	// End of regular dwarf registers.
 	{ "rip", DWARF_REG_PC }, { "eip", DWARF_REG_PC }, { "ip", DWARF_REG_PC },
 };
@@ -204,7 +247,7 @@ int __get_dwarf_regnum_for_perf_regnum_i386(int perf_regnum)
 	return dwarf_i386_regnums[perf_regnum];
 }
 
-int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum)
+int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum, int abi)
 {
 	static const int dwarf_x86_64_regnums[] = {
 		[PERF_REG_X86_AX] = 0,
@@ -248,13 +291,66 @@ int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum)
 		[PERF_REG_X86_XMM14] = 31,
 		[PERF_REG_X86_XMM15] = 32,
 	};
+	static const int dwarf_x86_64_regnums_apx[] = {
+		[PERF_REG_X86_AX] = 0,
+		[PERF_REG_X86_BX] = 3,
+		[PERF_REG_X86_CX] = 2,
+		[PERF_REG_X86_DX] = 1,
+		[PERF_REG_X86_SI] = 4,
+		[PERF_REG_X86_DI] = 5,
+		[PERF_REG_X86_BP] = 6,
+		[PERF_REG_X86_SP] = 7,
+		[PERF_REG_X86_IP] = 16,
+		[PERF_REG_X86_FLAGS] = 49,
+		[PERF_REG_X86_CS] = 51,
+		[PERF_REG_X86_SS] = 52,
+		[PERF_REG_X86_DS] = 53,
+		[PERF_REG_X86_ES] = 50,
+		[PERF_REG_X86_FS] = 54,
+		[PERF_REG_X86_GS] = 55,
+		[PERF_REG_X86_R8] = 8,
+		[PERF_REG_X86_R9] = 9,
+		[PERF_REG_X86_R10] = 10,
+		[PERF_REG_X86_R11] = 11,
+		[PERF_REG_X86_R12] = 12,
+		[PERF_REG_X86_R13] = 13,
+		[PERF_REG_X86_R14] = 14,
+		[PERF_REG_X86_R15] = 15,
+		[PERF_REG_X86_R16] = 130,
+		[PERF_REG_X86_R17] = 131,
+		[PERF_REG_X86_R18] = 132,
+		[PERF_REG_X86_R19] = 133,
+		[PERF_REG_X86_R20] = 134,
+		[PERF_REG_X86_R21] = 135,
+		[PERF_REG_X86_R22] = 136,
+		[PERF_REG_X86_R23] = 137,
+		[PERF_REG_X86_R24] = 138,
+		[PERF_REG_X86_R25] = 139,
+		[PERF_REG_X86_R26] = 140,
+		[PERF_REG_X86_R27] = 141,
+		[PERF_REG_X86_R28] = 142,
+		[PERF_REG_X86_R29] = 143,
+		[PERF_REG_X86_R30] = 144,
+		[PERF_REG_X86_R31] = 145,
+	};
 
 	if (perf_regnum == 0)
 		return 0;
 
-	if (perf_regnum <  0 || perf_regnum > (int)ARRAY_SIZE(dwarf_x86_64_regnums) ||
-	    dwarf_x86_64_regnums[perf_regnum] == 0)
+	if (perf_regnum < 0)
+		return -ENOENT;
+
+	if (!(abi & PERF_SAMPLE_REGS_ABI_SIMD) &&
+	    (perf_regnum >= (int)ARRAY_SIZE(dwarf_x86_64_regnums) ||
+	     dwarf_x86_64_regnums[perf_regnum] == 0))
+		return -ENOENT;
+
+	if ((abi & PERF_SAMPLE_REGS_ABI_SIMD) &&
+	    (perf_regnum >= (int)ARRAY_SIZE(dwarf_x86_64_regnums_apx) ||
+	     dwarf_x86_64_regnums_apx[perf_regnum] == 0))
 		return -ENOENT;
 
-	return dwarf_x86_64_regnums[perf_regnum];
+	return abi & PERF_SAMPLE_REGS_ABI_SIMD ?
+			dwarf_x86_64_regnums_apx[perf_regnum] :
+			dwarf_x86_64_regnums[perf_regnum];
 }
diff --git a/tools/perf/util/dwarf-regs.c b/tools/perf/util/dwarf-regs.c
index 797f455eba0d..9e2a0c93ecc9 100644
--- a/tools/perf/util/dwarf-regs.c
+++ b/tools/perf/util/dwarf-regs.c
@@ -158,7 +158,7 @@ static int get_libdw_frame_nregs(unsigned int machine, unsigned int flags __mayb
 {
 	switch (machine) {
 	case EM_X86_64:
-		return 17;
+		return 146;	/* Support APX eGPRs. */
 	case EM_386:
 		return 9;
 	case EM_ARM:
@@ -187,13 +187,14 @@ static int get_libdw_frame_nregs(unsigned int machine, unsigned int flags __mayb
 }
 
 int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine,
-				     unsigned int flags, bool only_libdw_supported)
+				     unsigned int flags,
+				     bool only_libdw_supported, int abi)
 {
 	int reg;
 
 	switch (machine) {
 	case EM_X86_64:
-		reg = __get_dwarf_regnum_for_perf_regnum_x86_64(perf_regnum);
+		reg = __get_dwarf_regnum_for_perf_regnum_x86_64(perf_regnum, abi);
 		break;
 	case EM_386:
 		reg = __get_dwarf_regnum_for_perf_regnum_i386(perf_regnum);
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..92cf0af93e9e 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -103,7 +103,7 @@ int __get_csky_regnum(const char *name, unsigned int flags);
 int __get_dwarf_regnum_i386(const char *name);
 int __get_dwarf_regnum_x86_64(const char *name);
 int __get_dwarf_regnum_for_perf_regnum_i386(int perf_regnum);
-int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum);
+int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum, int abi);
 
 int __get_dwarf_regnum_for_perf_regnum_arm(int perf_regnum);
 int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum);
@@ -125,8 +125,9 @@ int get_dwarf_regnum(const char *name, unsigned int machine, unsigned int flags)
 /*
  * get_dwarf_regnum - Returns DWARF regnum from perf register number.
  */
-int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, unsigned int flags,
-				     bool only_libdw_supported);
+int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine,
+				     unsigned int flags,
+				     bool only_libdw_supported, int abi);
 
 void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
 
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 05e8e68bd49c..678db5a65ada 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -273,7 +273,8 @@ static bool libdw_set_initial_registers(Dwfl_Thread *thread, void *arg)
 			int dwarf_reg =
 				get_dwarf_regnum_for_perf_regnum(perf_reg, e_machine,
 								 e_flags,
-								 /*only_libdw_supported=*/true);
+								 /*only_libdw_supported=*/true,
+								 user_regs->abi);
 			if (dwarf_reg > max_dwarf_reg)
 				max_dwarf_reg = dwarf_reg;
 		}
@@ -288,7 +289,8 @@ static bool libdw_set_initial_registers(Dwfl_Thread *thread, void *arg)
 			int dwarf_reg =
 				get_dwarf_regnum_for_perf_regnum(perf_reg, e_machine,
 								 e_flags,
-								 /*only_libdw_supported=*/true);
+								 /*only_libdw_supported=*/true,
+								 user_regs->abi);
 			if (dwarf_reg >= 0) {
 				val = 0;
 				if (perf_reg_value(&val, user_regs, perf_reg) == 0)
-- 
2.34.1



* Re: [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers
  2026-05-29  8:24 ` [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers Dapeng Mi
@ 2026-05-29  9:23   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-29  9:23 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] The newly introduced variable-length SIMD payload in `PERF_SAMPLE_REGS_USER` and `PERF_SAMPLE_REGS_INTR` breaks sample parsing, leading to severe data corruption for subsequent fields.
- [High] The newly added SIMD configuration and reserved fields in `struct perf_event_attr` are not byte-swapped in `perf_event__attr_swap()`.

Pre-existing issues:
- [Critical] An integer underflow in `read_attr` when processing the `size` field of a `perf_event_attr` structure leads to a massive stack-based buffer overflow.
- [Critical] In `read_event_desc`, a lack of minimum size validation on the `sz` variable leads to out-of-bounds heap memory corruption during byte-swapping.
--

commit 40d194b08491e7774620de12bd78192095f031fd
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf headers: Sync perf_event.h/perf_regs.h with the kernel headers

This patch syncs the UAPI header changes to support SIMD, eGPRs, and SSP
sampling into the corresponding perf tool UAPI headers. It also adds a
sanity check for the newly introduced __reserved_4 field in
perf_attr_check().

> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index fd10aa8d697f2..c49fc76292f79 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
[ ... ]
> @@ -547,6 +549,29 @@ struct perf_event_attr {
>  
>  	__u64	config3; /* extension of config2 */
>  	__u64	config4; /* extension of config3 */
> +
> +	/*
> +	 * Defines the sampling SIMD/PRED(predicate) registers bitmap and
> +	 * qwords (8 bytes) length.
> +	 *
> +	 * sample_simd_regs_enabled != 0 indicates there are SIMD/PRED
> +	 * registers to be sampled, the SIMD/PRED registers bitmap and
> +	 * qwords length are represented in
> +	 * sample_simd_{vec|pred}_reg_{intr|user} and
> +	 * sample_simd_{vec|pred}_reg_qwords fields separately.
> +	 *
> +	 * sample_simd_regs_enabled == 0 indicates no SIMD/PRED registers
> +	 * are sampled.
> +	 */
> +	__u16	sample_simd_regs_enabled;
> +	__u16	sample_simd_pred_reg_qwords;
> +	__u16	sample_simd_vec_reg_qwords;
> +	__u16	__reserved_4;
> +
> +	__u32	sample_simd_pred_reg_intr;
> +	__u32	sample_simd_pred_reg_user;
> +	__u64	sample_simd_vec_reg_intr;
> +	__u64	sample_simd_vec_reg_user;
>  };

[Severity: High]
Are these newly added fields missing byte-swapping logic in
perf_event__attr_swap()?

When perf reads a cross-endian perf.data file, if these fields are not added
to perf_event__attr_swap() in tools/perf/util/session.c, they will remain in
their original byte order. 

Later, when evsel__parse_sample() relies on fields like attr_nr_vectors and
attr_vec_qwords to validate SIMD data bounds, could incorrect values bypass
validation or cause out-of-bounds read faults?
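
As an illustration of the concern, here is a minimal sketch of swapping the newly added fields at their declared widths. It is not the perf_event__attr_swap() implementation: the struct is a hypothetical stand-alone mirror of the fields quoted above, and the swap helpers are written out by hand so the example does not depend on any perf-internal macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields appended to struct perf_event_attr. */
struct simd_attr_fields {
	uint16_t sample_simd_regs_enabled;
	uint16_t sample_simd_pred_reg_qwords;
	uint16_t sample_simd_vec_reg_qwords;
	uint16_t reserved_4;
	uint32_t sample_simd_pred_reg_intr;
	uint32_t sample_simd_pred_reg_user;
	uint64_t sample_simd_vec_reg_intr;
	uint64_t sample_simd_vec_reg_user;
};

/* The four u16, two u32 and two u64 fields pack into 32 bytes. */
static_assert(sizeof(struct simd_attr_fields) == 32, "unexpected padding");

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
	return ((uint32_t)swap16((uint16_t)v) << 16) | swap16((uint16_t)(v >> 16));
}

static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Each field is swapped at its own width; a single u64 swap would mix fields. */
static void simd_attr_fields_swap(struct simd_attr_fields *f)
{
	f->sample_simd_regs_enabled = swap16(f->sample_simd_regs_enabled);
	f->sample_simd_pred_reg_qwords = swap16(f->sample_simd_pred_reg_qwords);
	f->sample_simd_vec_reg_qwords = swap16(f->sample_simd_vec_reg_qwords);
	f->reserved_4 = swap16(f->reserved_4);
	f->sample_simd_pred_reg_intr = swap32(f->sample_simd_pred_reg_intr);
	f->sample_simd_pred_reg_user = swap32(f->sample_simd_pred_reg_user);
	f->sample_simd_vec_reg_intr = swap64(f->sample_simd_vec_reg_intr);
	f->sample_simd_vec_reg_user = swap64(f->sample_simd_vec_reg_user);
}
```

The per-width handling matters because the four u16 fields share one 64-bit slot: swapping that slot as a single u64 would also reverse the order of the fields within it.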

[ ... ]
> @@ -1020,7 +1045,15 @@ enum perf_event_type {
>  	 *      } && PERF_SAMPLE_BRANCH_STACK
>  	 *
>  	 *	{ u64			abi; # enum perf_sample_regs_abi
> -	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> +	 *	  u64			regs[weight(mask)];
> +	 *	  struct {
> +	 *		u64 nr_vectors;		# 0 ... weight(sample_simd_vec_reg_user)
> +	 *		u64 vector_qwords;	# 0 ... sample_simd_vec_reg_qwords
> +	 *		u64 nr_pred;		# 0 ... weight(sample_simd_pred_reg_user)
> +	 *		u64 pred_qwords;	# 0 ... sample_simd_pred_reg_qwords
> +	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> +	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +	 *	} && PERF_SAMPLE_REGS_USER

[Severity: High]
Does appending this dynamic-length SIMD structure to PERF_SAMPLE_REGS_USER
break sample parsing for older perf tools?

Older parsers do not know about the PERF_SAMPLE_REGS_ABI_SIMD bit. When
parsing the sample payload, they will only advance the array pointer by
hweight64(attr.sample_regs_user) * sizeof(u64) bytes, failing to skip the new
SIMD header and data. 

Could this cause subsequent fields like PERF_SAMPLE_STACK_USER to read the
SIMD data as their own, resulting in parsing desynchronization and data
corruption?

Should a new bit be added to enum perf_event_sample_format instead?
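
To make the size of the potential desynchronization concrete, here is a rough sketch comparing how many bytes a pre-SIMD parser would skip with how many bytes the layout quoted above actually occupies. The helper names are hypothetical and popcount64() merely stands in for perf's hweight64(); the arithmetic follows the documented layout (abi word, regs, four header words, data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count; stands in for hweight64(). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Bytes a pre-SIMD parser consumes: the abi word plus one u64 per mask bit. */
static size_t legacy_regs_bytes(uint64_t mask)
{
	return (1 + (size_t)popcount64(mask)) * sizeof(uint64_t);
}

/* Bytes present when the SIMD ABI bit is set: legacy part, 4 header words, data. */
static size_t simd_regs_bytes(uint64_t mask, uint64_t nr_vectors,
			      uint64_t vector_qwords, uint64_t nr_pred,
			      uint64_t pred_qwords)
{
	size_t data_qwords;

	/* The counts come from a 64-bit and a 32-bit register bitmap. */
	assert(nr_vectors <= 64 && nr_pred <= 32);
	data_qwords = (size_t)(nr_vectors * vector_qwords + nr_pred * pred_qwords);

	return legacy_regs_bytes(mask) + (4 + data_qwords) * sizeof(uint64_t);
}
```

For example, with 32 vectors of 8 qwords and 8 predicates of 1 qword, the difference between the two helpers is 2144 bytes, which is how far an unaware parser would be off before reading the next field.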

[ ... ]
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f30e48eb3fc32..e8e4e00d6b4d0 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c

[Severity: Critical]
This isn't a bug introduced by this patch, but is there an integer underflow
in read_attr() when processing the size field of the perf_event_attr structure?

The function reads the first 64 bytes and extracts attr->size into sz. It
checks if sz == 0 or sz > our_sz, but fails to verify if sz < PERF_ATTR_SIZE_VER0.
If a perf.data file supplies a size between 1 and 63, could the calculation
left = sz - PERF_ATTR_SIZE_VER0 underflow the unsigned size_t variable left,
producing a huge positive value?

The subsequent readn(fd, ptr, left) attempts to read this massive amount of
data into f_attr, which is allocated on the stack in
perf_session__read_header(), potentially resulting in a stack buffer overflow.
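
The kind of guard being asked about can be sketched as follows. This is not the read_attr() code: attr_trailing_bytes() is a hypothetical helper, ATTR_SIZE_VER0 mirrors PERF_ATTR_SIZE_VER0 (64), and a size of 0 is simply rejected here, which may differ from how the real function treats it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_SIZE_VER0 64	/* mirrors PERF_ATTR_SIZE_VER0 */

/*
 * Hypothetical helper: compute how many bytes remain to be read after the
 * first ATTR_SIZE_VER0 bytes. Returns 0 and stores the count in *left, or
 * returns -1 (leaving *left untouched) when the on-file size is unusable.
 */
static int attr_trailing_bytes(uint32_t sz, size_t our_sz, size_t *left)
{
	assert(left != NULL);

	/* Rejecting sz < ATTR_SIZE_VER0 keeps the subtraction from wrapping. */
	if (sz < ATTR_SIZE_VER0 || sz > our_sz)
		return -1;

	*left = (size_t)sz - ATTR_SIZE_VER0;
	return 0;
}
```

Without the lower-bound check, a size such as 10 would make the unsigned subtraction wrap around to a value close to SIZE_MAX.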

[Severity: Critical]
This is a pre-existing issue, but does read_event_desc() lack minimum size
validation on the sz variable before byte-swapping?

The function reads a 32-bit sz from the file, allocates a heap buffer, and
calls perf_event__attr_swap(buf). Inside this function, the code
unconditionally reads and byte-swaps the first 8 bytes.

If sz is between 1 and 7, attr->size reads out-of-bounds heap memory. Because
bswap_safe() uses this uncontrolled attr->size to determine which subsequent
fields to swap, could it proceed to byte-swap up to 176 bytes of heap memory
past the small allocation, leading to heap memory corruption?

> @@ -2091,7 +2091,8 @@ static void free_event_desc(struct evsel *events)
>  
>  static bool perf_attr_check(struct perf_event_attr *attr)
>  {
> -	if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) {
> +	if (attr->__reserved_1 || attr->__reserved_2 ||
> +	    attr->__reserved_3 || attr->__reserved_4) {
>  		pr_warning("Reserved bits are set unexpectedly. "
>  			   "Please update perf tool.\n");
>  		return false;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529082451.591783-1-dapeng1.mi@linux.intel.com?part=1


* Re: [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling
  2026-05-29  8:24 ` [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
@ 2026-05-29 10:06   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-29 10:06 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Intel PT PEBS register synthesis is silently disabled on SIMD-capable machines.
- [Medium] Requesting legacy XMM registers (e.g., `-I XMM0`) fails with an 'Unknown register' error on modern CPUs, breaking backwards compatibility.
--

commit a0d7b85bb399d047c94284c24c34805466f84209
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: perf regs: Support x86 eGPRs/SSP sampling

This commit adds support for sampling x86 extended GP registers (R16-R31) and
the shadow stack pointer (SSP) register. It reclaims the original XMM register
space in sample_regs_user/sample_regs_intr to represent the eGPRs and SSP when
SIMD sampling is supported, using a new abi argument to distinguish layouts.

> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index fc9eec8b54b82..2729ad8c6d263 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -2470,6 +2470,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
>  	}
>  
>  	if (sample_type & PERF_SAMPLE_REGS_INTR &&
> +	    !evsel->core.attr.sample_simd_regs_enabled &&
>  	    (items->mask[INTEL_PT_GP_REGS_POS] ||
>  	     items->mask[INTEL_PT_XMM_POS])) {
>  		u64 regs_mask = evsel->core.attr.sample_regs_intr;

[Severity: Medium]
Does this completely disable PEBS register synthesis on SIMD-capable machines?

Since PERF_SAMPLE_REGS_ABI_SIMD is set by default on supported platforms in
__perf_reg_mask_x86(), opts->sample_simd_regs_enabled will be set to 1 for
any register request (like -I ax).

Could this cause basic GPR synthesis to be silently skipped here, resulting in
a functional regression for userspace tooling?

[ ... ]

> diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> index b6d20522b4e8f..ae26d991cdc9f 100644
> --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> @@ -276,13 +276,38 @@ uint64_t __perf_reg_mask_x86(bool intr)
[ ... ]
> +uint64_t __perf_reg_mask_x86(bool intr, int *abi)
> +{
[ ... ]
> +	if (mask != PERF_REGS_MASK) {
> +		*abi |= PERF_SAMPLE_REGS_ABI_SIMD;
> +	} else {
> +		mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
> +					 false);
>  	}
>  
> -	return PERF_REGS_MASK;
> +	return mask;
>  }
[ ... ]
> +const char *__perf_reg_name_x86(int id, int abi)
> +{
> +	const char *name;
> +
> +	name = __arch_reg_gpr_name(id);
> +	if (name)
> +		return name;
> +
> +	if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +		name = __arch_reg_egpr_name(id);
> +	else
> +		name = __arch_reg_xmm_name(id);
> +
> +	return name;
> +}

[Severity: Medium]
Does this break backwards compatibility for legacy XMM register requests?

When the CPU supports eGPRs/SSP, PERF_SAMPLE_REGS_ABI_SIMD is set and the
legacy XMM registers (PERF_REG_EXTENDED_MASK) are excluded from the returned
mask because they share the same bit space.

If __perf_reg_name_x86() maps these overlapping IDs exclusively to eGPR names
(for instance, mapping ID 32 to R16), will name_to_perf_reg_mask() fail to
resolve legacy requests like -I XMM0 and throw an unknown register error?
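
To make the concern concrete, here is a minimal standalone sketch of the
lookup. It is not the perf code: sketch_reg_name() and sketch_name_to_mask()
are hypothetical stand-ins for __perf_reg_name_x86() and the matching loop in
name_to_perf_reg_mask(), and SKETCH_ABI_SIMD is a placeholder value. Only ID 0
and ID 32 are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Placeholder values for illustration only, not taken from the uapi headers. */
#define SKETCH_ABI_SIMD   (1 << 2)   /* stand-in for PERF_SAMPLE_REGS_ABI_SIMD */
#define SKETCH_FIRST_EXT  32         /* first ID in the shared XMM/eGPR bit space */

/* Stand-in for __perf_reg_name_x86(): one ID, two possible names. */
static const char *sketch_reg_name(int id, int abi)
{
	if (id == 0)
		return "AX";
	if (id == SKETCH_FIRST_EXT)
		return (abi & SKETCH_ABI_SIMD) ? "R16" : "XMM0";
	return NULL;
}

/* Stand-in for the name matching loop: returns the register bit, or 0. */
static unsigned long long sketch_name_to_mask(const char *to_match, int abi)
{
	int id;

	assert(to_match != NULL);
	for (id = 0; id < 64; id++) {
		const char *name = sketch_reg_name(id, abi);

		if (name && !strcasecmp(name, to_match))
			return 1ULL << id;
	}
	return 0;	/* a real caller would report an unknown register */
}
```

In this sketch, "XMM0" resolves to bit 32 when the SIMD ABI bit is clear and
resolves to nothing when it is set, because the name table only offers "R16"
for that ID.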

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529082451.591783-1-dapeng1.mi@linux.intel.com?part=2


* Re: [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling
  2026-05-29  8:24 ` [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling Dapeng Mi
@ 2026-05-29 10:45   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-29 10:45 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [Medium] Parsing of `--intr-regs` and `--user-regs` with multiple options is order-dependent, preventing intended combination of General Purpose (GP) and SIMD registers.
- [Medium] Specifying only SIMD registers (e.g., `--intr-regs=xmm` without GP registers) leaves the general-purpose register mask as 0, causing the kernel to reject the event configuration with `-EINVAL`.
- [Low] The `sample_simd_regs_enabled` field is omitted from `perf_event_attr__fprintf()`.
--

commit 03a81994cf381ee1d67a41deb452833fc73a023e
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf regs: Support x86 SIMD registers sampling

This patch adds support for the newly introduced SIMD register sampling
format by adding functions to support parsing and sampling x86 SIMD/PRED
registers.

> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1c856a2ecc6ec..cd62af14a4f54 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c

[ ... ]

> @@ -1594,6 +1595,16 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
>  		evsel__set_sample_bit(evsel, REGS_INTR);
>  	}
>  
> +	if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
> +	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
> +		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
> +		attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
> +		attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
> +		attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
> +		attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
> +		evsel__set_sample_bit(evsel, REGS_INTR);
> +	}

[Severity: Medium]
Will specifying only SIMD registers (e.g., perf record --intr-regs=xmm)
leave the general-purpose register mask as 0?

If opts->sample_intr_regs is 0, attr->sample_regs_intr remains 0. However,
setting the REGS_INTR bit here causes the kernel validation function
(perf_reg_validate()) to unconditionally return -EINVAL because the sample
type bit is set but the corresponding GP register mask is 0.

> diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
> index 70a1cc90b2c13..26d560a486c85 100644
> --- a/tools/perf/util/parse-regs-options.c
> +++ b/tools/perf/util/parse-regs-options.c

[ ... ]

> @@ -53,22 +106,96 @@ name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)

[ ... ]

>  	/*
> -	 * cannot set it twice
> +	 * Non-SIMD registers cannot be set twice.
> +	 * SIMD registers can be set multiple times, but only the register
> +	 * class with largest length (qwords) is sampled.
>  	 */
>  	if (*mode)
>  		return -1;

[Severity: Medium]
Does the order of options affect whether they are parsed successfully?

If SIMD registers are parsed first (e.g., -I xmm), *mode is left as 0,
which allows a subsequent GP register option like -I ax to be parsed
successfully.

However, if GP registers are parsed first (e.g., -I ax), *mode is set to
a non-zero value, which triggers this failure check when -I xmm is
evaluated. Does this prevent intended combinations of GP and SIMD registers
from working consistently?

> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> index 741c3d657a8b6..c6b8e53e06fd1 100644
> --- a/tools/perf/util/perf_event_attr_fprintf.c
> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> @@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
>  	PRINT_ATTRf(aux_start_paused, p_unsigned);
>  	PRINT_ATTRf(aux_pause, p_unsigned);
>  	PRINT_ATTRf(aux_resume, p_unsigned);
> +	PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned);
> +	PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex);
> +	PRINT_ATTRf(sample_simd_pred_reg_user, p_hex);
> +	PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned);
> +	PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex);
> +	PRINT_ATTRf(sample_simd_vec_reg_user, p_hex);
>  
>  	return ret;
>  }

[Severity: Low]
Does this also need to print the sample_simd_regs_enabled field?

This patch adds several new SIMD-related fields to struct perf_event_attr,
but sample_simd_regs_enabled is missing here. Could this prevent developers
from verifying whether the tool correctly enabled SIMD sampling when inspecting
configurations via debug commands like perf record -vvv?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529082451.591783-1-dapeng1.mi@linux.intel.com?part=3


* Re: [Patch v8 4/5] perf regs: Enable dumping of SIMD registers
  2026-05-29  8:24 ` [Patch v8 4/5] perf regs: Enable dumping of SIMD registers Dapeng Mi
@ 2026-05-29 11:23   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-29 11:23 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

New issues:
- [Low] Misleading post-increment of `idx` in `simd_regs_dump__printf` inner loop.

Pre-existing issues:
- [High] Integer underflow in manual `sample_size` calculation in `perf_event__convert_sample_callchain` corrupts the event stream if registers are missing.
--

commit 7e4b22be95dca28326d1271ce91f25c99e231f74
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf regs: Enable dumping of SIMD registers

This patch adds support for dumping SIMD registers using the new
PERF_SAMPLE_REGS_ABI_SIMD ABI.

> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -457,8 +457,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
>  	/* adjust sample size for stack and regs */
>  	sample_size -= sample->user_stack.size;
>  	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);

[Severity: High]
This is a pre-existing issue, but does unconditionally subtracting the full
expected size of the register mask cause an integer underflow if the perf
sample didn't capture user registers?

If the sample didn't capture user registers (e.g., when sampling a kernel
thread), the ABI is PERF_SAMPLE_REGS_ABI_NONE (0). It seems subtracting the
full mask size would underflow sample_size (a u32), resulting in a massive
size being written to event_copy->header.size.

Could this corrupt the event stream or cause out-of-bounds reads when
perf_event__repipe_synth writes to the pipe?

> -	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
> -		sample_size -= 4 * sizeof(u64);	/* Reduce SIMD regs header size */
> +	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
> +		sample_size -= 4 * sizeof(u64);
> +		sample_size -= (sample->user_regs->nr_vectors *
> +				sample->user_regs->vector_qwords +
> +				sample->user_regs->nr_pred *
> +				sample->user_regs->pred_qwords) * sizeof(u64);
> +	}
>  	sample_size += (sample->callchain->nr + 1) * sizeof(u64);

[ ... ]

> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c

[ ... ]

> +	for (i = 0; i < regs->nr_vectors; i++) {
> +		for (j = 0; j < regs->vector_qwords; j++) {
> +			idx = i * regs->vector_qwords + j;
> +			if (regs->vector_qwords > 1) {
> +				printf(".... %3s[%d][%d] 0x%016" PRIx64 "\n",
> +				       name, i, j, regs->simd_data[idx++]);

[Severity: Low]
Is the post-increment of idx necessary here?

Since idx is re-initialized at the start of each inner loop iteration
(idx = i * regs->vector_qwords + j), the incremented value seems to be
discarded on the next iteration.

> +			} else {
> +				printf(".... %3s[%d] 0x%016" PRIx64 "\n",
> +				       name, i, regs->simd_data[idx++]);
> +			}
> +		}
> +	}
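
For illustration, the same row-major walk without the post-increment could
look like the sketch below. sketch_walk() is a hypothetical helper that copies
values into an output array instead of printing them, so the visiting order
can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Visit a row-major [nr][qwords] register dump using a computed index. */
static size_t sketch_walk(const uint64_t *simd_data, unsigned int nr,
			  unsigned int qwords, uint64_t *out)
{
	size_t n = 0;
	unsigned int i, j;

	assert(simd_data != NULL && out != NULL);
	for (i = 0; i < nr; i++) {
		for (j = 0; j < qwords; j++) {
			/* idx is recomputed every iteration, so no idx++ is needed. */
			size_t idx = (size_t)i * qwords + j;

			out[n++] = simd_data[idx];
		}
	}
	return n;
}
```

Each element is visited exactly once and in order, which suggests the
post-increment in the patch has no effect on the output.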

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529082451.591783-1-dapeng1.mi@linux.intel.com?part=4


Thread overview: 10+ messages
2026-05-29  8:24 [Patch v8 0/5] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
2026-05-29  8:24 ` [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers Dapeng Mi
2026-05-29  9:23   ` sashiko-bot
2026-05-29  8:24 ` [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
2026-05-29 10:06   ` sashiko-bot
2026-05-29  8:24 ` [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling Dapeng Mi
2026-05-29 10:45   ` sashiko-bot
2026-05-29  8:24 ` [Patch v8 4/5] perf regs: Enable dumping of SIMD registers Dapeng Mi
2026-05-29 11:23   ` sashiko-bot
2026-05-29  8:24 ` [Patch v8 5/5] perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers Dapeng Mi
