public inbox for linux-perf-users@vger.kernel.org
 help / color / mirror / Atom feed
* [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling
@ 2026-02-09  8:35 Dapeng Mi
  2026-02-09  8:35 ` [Patch v6 1/4] perf headers: Sync with the kernel headers Dapeng Mi
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Dapeng Mi @ 2026-02-09  8:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Zide Chen, Falcon Thomas,
	Dapeng Mi, Xudong Hao, Dapeng Mi

This patch-set adds sampling support for x86 eGPRs/SSP/SIMD registers in
perf tools base on the corresponding sampling support for eGPRs/SSP/SIMD
registers in kernel[1]. In previous version, these perf-tools patches
are integrated as a whole patch-set with the kernel patches, but it's
split and posted to an independent perf-tools patch-set for convenient
review.

Changes since v5:
- Split the sampling support for eGPRs/SSP registers and SIMD registers
  into 2 patches.
- Address Ian's comments including,
  * Convert the architecture dependent functions into regular
    architectural independent functions, like whatperf_reg_name() does.
  * Refine the functions name to represent what the functions really do.
  * Add comments for some key functions arguments.
  * Misc enhancements.

History:
v5: https://lore.kernel.org/all/20251203065500.2597594-1-dapeng1.mi@linux.intel.com/
v4: https://lore.kernel.org/all/20250925061213.178796-1-dapeng1.mi@linux.intel.com/
v3: https://lore.kernel.org/lkml/20250815213435.1702022-1-kan.liang@linux.intel.com/
v2: https://lore.kernel.org/lkml/20250626195610.405379-1-kan.liang@linux.intel.com/
v1: https://lore.kernel.org/lkml/20250613134943.3186517-1-kan.liang@linux.intel.com/

Ref:
[1] Kernel patches of supporting eGPRs/SSP/SIMD registers sampling:
https://lore.kernel.org/all/20260209072047.2180332-1-dapeng1.mi@linux.intel.com/

Dapeng Mi (2):
  perf regs: Support x86 eGPRs/SSP sampling
  perf regs: Support x86 SIMD registers sampling

Kan Liang (2):
  perf headers: Sync with the kernel headers
  perf regs: Enable dumping of SIMD registers

 tools/arch/x86/include/uapi/asm/perf_regs.h   |  49 +++
 tools/include/uapi/linux/perf_event.h         |  45 +-
 tools/perf/builtin-script.c                   |   2 +-
 tools/perf/util/evsel.c                       |  53 ++-
 tools/perf/util/parse-regs-options.c          | 168 ++++++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 412 +++++++++++++++++-
 tools/perf/util/perf_event_attr_fprintf.c     |   6 +
 tools/perf/util/perf_regs.c                   |  86 +++-
 tools/perf/util/perf_regs.h                   |  21 +-
 tools/perf/util/record.h                      |   6 +
 tools/perf/util/sample.h                      |  10 +
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                     |  86 +++-
 13 files changed, 888 insertions(+), 58 deletions(-)


base-commit: 335047109d7d488bf5ad32a4076e1a011994cd0e
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Patch v6 1/4] perf headers: Sync with the kernel headers
  2026-02-09  8:35 [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
@ 2026-02-09  8:35 ` Dapeng Mi
  2026-02-09 22:09   ` Ian Rogers
  2026-02-09  8:35 ` [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2026-02-09  8:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Zide Chen, Falcon Thomas,
	Dapeng Mi, Xudong Hao, Kan Liang, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Update include/uapi/linux/perf_event.h and
arch/x86/include/uapi/asm/perf_regs.h to support extended regs.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/arch/x86/include/uapi/asm/perf_regs.h | 49 +++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h       | 45 +++++++++++++++++--
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..6da63e1dbb40 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,34 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/*
+	 * The EGPRs/SSP and XMM have overlaps. Only one can be used
+	 * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
+	 * utilize EGPRs/SSP. For the other ABI type, XMM is used.
+	 *
+	 * Extended GPRs (EGPRs)
+	 */
+	PERF_REG_X86_R16,
+	PERF_REG_X86_R17,
+	PERF_REG_X86_R18,
+	PERF_REG_X86_R19,
+	PERF_REG_X86_R20,
+	PERF_REG_X86_R21,
+	PERF_REG_X86_R22,
+	PERF_REG_X86_R23,
+	PERF_REG_X86_R24,
+	PERF_REG_X86_R25,
+	PERF_REG_X86_R26,
+	PERF_REG_X86_R27,
+	PERF_REG_X86_R28,
+	PERF_REG_X86_R29,
+	PERF_REG_X86_R30,
+	PERF_REG_X86_R31,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0  = 32,
@@ -54,5 +79,29 @@ enum perf_event_x86_regs {
 };
 
 #define PERF_REG_EXTENDED_MASK	(~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK	GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
+
+enum {
+	PERF_X86_SIMD_XMM_REGS      = 16,
+	PERF_X86_SIMD_YMM_REGS      = 16,
+	PERF_X86_SIMD_ZMM_REGS      = 32,
+	PERF_X86_SIMD_VEC_REGS_MAX  = PERF_X86_SIMD_ZMM_REGS,
+
+	PERF_X86_SIMD_OPMASK_REGS   = 8,
+	PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
+};
+
+#define PERF_X86_SIMD_PRED_MASK	GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
+#define PERF_X86_SIMD_VEC_MASK	GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+
+#define PERF_X86_H16ZMM_BASE		16
+
+enum {
+	PERF_X86_OPMASK_QWORDS   = 1,
+	PERF_X86_XMM_QWORDS      = 2,
+	PERF_X86_YMM_QWORDS      = 4,
+	PERF_X86_ZMM_QWORDS      = 8,
+	PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,
+};
 
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 72f03153dd32..ce3a14d35390 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -314,8 +314,9 @@ enum {
  */
 enum perf_sample_regs_abi {
 	PERF_SAMPLE_REGS_ABI_NONE		= 0,
-	PERF_SAMPLE_REGS_ABI_32			= 1,
-	PERF_SAMPLE_REGS_ABI_64			= 2,
+	PERF_SAMPLE_REGS_ABI_32			= (1 << 0),
+	PERF_SAMPLE_REGS_ABI_64			= (1 << 1),
+	PERF_SAMPLE_REGS_ABI_SIMD		= (1 << 2),
 };
 
 /*
@@ -383,6 +384,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
 #define PERF_ATTR_SIZE_VER9			144	/* add: config4 */
+#define PERF_ATTR_SIZE_VER10			176	/* Add: sample_simd_{pred,vec}_reg_* */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -547,6 +549,25 @@ struct perf_event_attr {
 
 	__u64	config3; /* extension of config2 */
 	__u64	config4; /* extension of config3 */
+
+	/*
+	 * Defines set of SIMD registers to dump on samples.
+	 * The sample_simd_regs_enabled !=0 implies the
+	 * set of SIMD registers is used to config all SIMD registers.
+	 * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
+	 * config some SIMD registers on X86.
+	 */
+	union {
+		__u16 sample_simd_regs_enabled;
+		__u16 sample_simd_pred_reg_qwords;
+	};
+	__u16	sample_simd_vec_reg_qwords;
+	__u32	__reserved_4;
+
+	__u32	sample_simd_pred_reg_intr;
+	__u32	sample_simd_pred_reg_user;
+	__u64	sample_simd_vec_reg_intr;
+	__u64	sample_simd_vec_reg_user;
 };
 
 /*
@@ -1020,7 +1041,15 @@ enum perf_event_type {
 	 *      } && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 *	{ u64			abi; # enum perf_sample_regs_abi
-	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+	 *	  u64			regs[weight(mask)];
+	 *	  struct {
+	 *		u16 nr_vectors;		# 0 ... weight(sample_simd_vec_reg_user)
+	 *		u16 vector_qwords;	# 0 ... sample_simd_vec_reg_qwords
+	 *		u16 nr_pred;		# 0 ... weight(sample_simd_pred_reg_user)
+	 *		u16 pred_qwords;	# 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_USER
 	 *
 	 *	{ u64			size;
 	 *	  char			data[size];
@@ -1047,7 +1076,15 @@ enum perf_event_type {
 	 *	{ u64			data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64			abi; # enum perf_sample_regs_abi
-	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	  u64			regs[weight(mask)];
+	 *	  struct {
+	 *		u16 nr_vectors;		# 0 ... weight(sample_simd_vec_reg_intr)
+	 *		u16 vector_qwords;	# 0 ... sample_simd_vec_reg_qwords
+	 *		u16 nr_pred;		# 0 ... weight(sample_simd_pred_reg_intr)
+	 *		u16 pred_qwords;	# 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_INTR
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64			cgroup;} && PERF_SAMPLE_CGROUP
 	 *	{ u64			data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling
  2026-02-09  8:35 [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
  2026-02-09  8:35 ` [Patch v6 1/4] perf headers: Sync with the kernel headers Dapeng Mi
@ 2026-02-09  8:35 ` Dapeng Mi
  2026-02-09 22:36   ` Ian Rogers
  2026-02-09  8:35 ` [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling Dapeng Mi
  2026-02-09  8:35 ` [Patch v6 4/4] perf regs: Enable dumping of SIMD registers Dapeng Mi
  3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2026-02-09  8:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Zide Chen, Falcon Thomas,
	Dapeng Mi, Xudong Hao, Dapeng Mi

This patch adds support for sampling x86 extended GP registers (R16-R31)
and the shadow stack pointer (SSP) register.

The original XMM registers space in sample_regs_user/sample_regs_intr is
reclaimed to represent the eGPRs and SSP when SIMD registers sampling is
supported with the new SIMD sampling fields in the perf_event_attr
structure. This necessitates a way to distinguish which register layout
is used for the sample_regs_user/sample_regs_intr bitmap.

To address this, a new "abi" argument is added to the helpers
perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
"abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, it indicates the eGPRs and SSP
layout is represented; otherwise, the legacy XMM registers are
represented.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/builtin-script.c                   |   2 +-
 tools/perf/util/evsel.c                       |   6 +-
 tools/perf/util/parse-regs-options.c          |  17 ++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 120 +++++++++++++++---
 tools/perf/util/perf_regs.c                   |  14 +-
 tools/perf/util/perf_regs.h                   |  10 +-
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                     |   9 +-
 8 files changed, 139 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 14c6f6c3c4f2..ffe51f895666 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
 	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
 		u64 val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ",
-				   perf_reg_name(r, e_machine, e_flags),
+				   perf_reg_name(r, e_machine, e_flags, regs->abi),
 				   val);
 	}
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f59228c1a39e..b7fb3f936ae3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1049,19 +1049,21 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
 	}
 
 	if (param->record_mode == CALLCHAIN_DWARF) {
+		int abi;
+
 		if (!function) {
 			uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
 
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
 			if (opts->sample_user_regs &&
-			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
+			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
 					   "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
 					   "so the minimal registers set (IP, SP) is explicitly forced.\n");
 			} else {
-				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
+				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
 			}
 			attr->sample_stack_user = param->dump_size;
 			attr->exclude_callchain_user = 1;
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index c93c2f0c8105..518327883b18 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -10,7 +10,8 @@
 #include "util/perf_regs.h"
 #include "util/parse-regs-options.h"
 
-static void list_perf_regs(FILE *fp, uint64_t mask)
+static void
+list_perf_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
@@ -21,7 +22,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (name && (!last_name || strcmp(last_name, name)))
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
@@ -29,7 +30,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 	fputc('\n', fp);
 }
 
-static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
+static uint64_t
+name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
@@ -39,7 +41,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (!name)
 			continue;
 
@@ -56,6 +58,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	char *s, *os = NULL, *p;
 	int ret = -1;
 	uint64_t mask;
+	int abi;
 
 	if (unset)
 		return 0;
@@ -66,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (*mode)
 		return -1;
 
-	mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
+	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (!str) {
@@ -87,11 +90,11 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask);
+			list_perf_regs(stderr, mask, abi);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask);
+		reg_mask = name_to_perf_reg_mask(s, mask, abi);
 		if (reg_mask == 0) {
 			ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
 				s, intr ? "-I" : "--user-regs=");
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index b6d20522b4e8..3e9241a11a95 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t __perf_reg_mask_x86(bool intr)
+static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
 {
 	struct perf_event_attr attr = {
-		.type			= PERF_TYPE_HARDWARE,
-		.config			= PERF_COUNT_HW_CPU_CYCLES,
-		.sample_type		= PERF_SAMPLE_REGS_INTR,
-		.sample_regs_intr	= PERF_REG_EXTENDED_MASK,
-		.precise_ip		= 1,
-		.disabled		= 1,
-		.exclude_kernel		= 1,
+		.type				= PERF_TYPE_HARDWARE,
+		.config				= PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type			= sample_type,
+		.precise_ip			= 1,
+		.disabled			= 1,
+		.exclude_kernel			= 1,
+		.sample_simd_regs_enabled	= has_simd_regs,
 	};
 	int fd;
-
-	if (!intr)
-		return PERF_REGS_MASK;
-
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
 	attr.sample_period = 1;
+	if (sample_type == PERF_SAMPLE_REGS_INTR)
+		attr.sample_regs_intr = mask;
+	else
+		attr.sample_regs_user = mask;
 
 	if (perf_pmus__num_core_pmus() > 1) {
 		struct perf_pmu *pmu = NULL;
@@ -276,13 +276,34 @@ uint64_t __perf_reg_mask_x86(bool intr)
 				 /*group_fd=*/-1, /*flags=*/0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		return mask;
+	}
+
+	return 0;
+}
+
+uint64_t __perf_reg_mask_x86(bool intr, int *abi)
+{
+	u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
+	uint64_t mask = PERF_REGS_MASK;
+
+	*abi = 0;
+	mask |= __arch__reg_mask(sample_type,
+				 GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
+				 true);
+	mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
+
+	if (mask != PERF_REGS_MASK) {
+		*abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+	} else {
+		mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
+					 false);
 	}
 
-	return PERF_REGS_MASK;
+	return mask;
 }
 
-const char *__perf_reg_name_x86(int id)
+static const char *__arch_reg_gpr_name(int id)
 {
 	switch (id) {
 	case PERF_REG_X86_AX:
@@ -333,7 +354,60 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
 
+static const char *__arch_reg_egpr_name(int id)
+{
+	switch (id) {
+	case PERF_REG_X86_R16:
+		return "R16";
+	case PERF_REG_X86_R17:
+		return "R17";
+	case PERF_REG_X86_R18:
+		return "R18";
+	case PERF_REG_X86_R19:
+		return "R19";
+	case PERF_REG_X86_R20:
+		return "R20";
+	case PERF_REG_X86_R21:
+		return "R21";
+	case PERF_REG_X86_R22:
+		return "R22";
+	case PERF_REG_X86_R23:
+		return "R23";
+	case PERF_REG_X86_R24:
+		return "R24";
+	case PERF_REG_X86_R25:
+		return "R25";
+	case PERF_REG_X86_R26:
+		return "R26";
+	case PERF_REG_X86_R27:
+		return "R27";
+	case PERF_REG_X86_R28:
+		return "R28";
+	case PERF_REG_X86_R29:
+		return "R29";
+	case PERF_REG_X86_R30:
+		return "R30";
+	case PERF_REG_X86_R31:
+		return "R31";
+	case PERF_REG_X86_SSP:
+		return "SSP";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+static const char *__arch_reg_xmm_name(int id)
+{
+	switch (id) {
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x:	\
 	case PERF_REG_X86_XMM ## x + 1:	\
@@ -362,6 +436,22 @@ const char *__perf_reg_name_x86(int id)
 	return NULL;
 }
 
+const char *__perf_reg_name_x86(int id, int abi)
+{
+	const char *name;
+
+	name = __arch_reg_gpr_name(id);
+	if (name)
+		return name;
+
+	if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		name = __arch_reg_egpr_name(id);
+	else
+		name = __arch_reg_xmm_name(id);
+
+	return name;
+}
+
 uint64_t __perf_reg_ip_x86(void)
 {
 	return PERF_REG_X86_IP;
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 5b8f34beb24e..bdd2eef13bc3 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -32,10 +32,11 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
 	return ret;
 }
 
-uint64_t perf_intr_reg_mask(uint16_t e_machine)
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi)
 {
 	uint64_t mask = 0;
 
+	*abi = 0;
 	switch (e_machine) {
 	case EM_ARM:
 		mask = __perf_reg_mask_arm(/*intr=*/true);
@@ -64,7 +65,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/true);
+		mask = __perf_reg_mask_x86(/*intr=*/true, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
@@ -75,10 +76,11 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-uint64_t perf_user_reg_mask(uint16_t e_machine)
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi)
 {
 	uint64_t mask = 0;
 
+	*abi = 0;
 	switch (e_machine) {
 	case EM_ARM:
 		mask = __perf_reg_mask_arm(/*intr=*/false);
@@ -107,7 +109,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/false);
+		mask = __perf_reg_mask_x86(/*intr=*/false, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
@@ -118,7 +120,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 {
 	const char *reg_name = NULL;
 
@@ -150,7 +152,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		reg_name = __perf_reg_name_x86(id);
+		reg_name = __perf_reg_name_x86(id, abi);
 		break;
 	default:
 		break;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 7c04700bf837..c9501ca8045d 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -13,10 +13,10 @@ enum {
 };
 
 int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
-uint64_t perf_intr_reg_mask(uint16_t e_machine);
-uint64_t perf_user_reg_mask(uint16_t e_machine);
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi);
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi);
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
@@ -64,8 +64,8 @@ uint64_t __perf_reg_ip_s390(void);
 uint64_t __perf_reg_sp_s390(void);
 
 int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
-uint64_t __perf_reg_mask_x86(bool intr);
-const char *__perf_reg_name_x86(int id);
+uint64_t __perf_reg_mask_x86(bool intr, int *abi);
+const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
 
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 2b0df7bd9a46..4cc5b96898e6 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
 
 		printed += scnprintf(bf + printed, size - printed,
 				     "%5s:0x%" PRIx64 " ",
-				     perf_reg_name(r, e_machine, e_flags), val);
+				     perf_reg_name(r, e_machine, e_flags, regs->abi), val);
 	}
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..7cf7bf86205d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -959,15 +959,16 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
+static void regs_dump__printf(u64 mask, struct regs_dump *regs,
+			      uint16_t e_machine, uint32_t e_flags)
 {
 	unsigned rid, i = 0;
 
 	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
-		       perf_reg_name(rid, e_machine, e_flags), val);
+		       perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
 	}
 }
 
@@ -995,7 +996,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
 	       mask,
 	       regs_dump_abi(regs));
 
-	regs_dump__printf(mask, regs->regs, e_machine, e_flags);
+	regs_dump__printf(mask, regs, e_machine, e_flags);
 }
 
 static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling
  2026-02-09  8:35 [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
  2026-02-09  8:35 ` [Patch v6 1/4] perf headers: Sync with the kernel headers Dapeng Mi
  2026-02-09  8:35 ` [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
@ 2026-02-09  8:35 ` Dapeng Mi
  2026-02-09 22:39   ` Ian Rogers
  2026-02-09  8:35 ` [Patch v6 4/4] perf regs: Enable dumping of SIMD registers Dapeng Mi
  3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2026-02-09  8:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Zide Chen, Falcon Thomas,
	Dapeng Mi, Xudong Hao, Dapeng Mi

This patch adds support for the newly introduced SIMD register sampling
format by adding the following 5 functions:

uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);

The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap
of kernel supported SIMD/PRED register classes on current platform for
intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on
x86 platforms.

The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve
the bitmap and qwords length of a certain class of SIMD/PRED register
on current platform for intr-regs and user-regs sampling. For example,
for the XMM registers on x86 platforms, the returned bitmap is 0xffff
(XMM0 ~ XMM15) and the qwords length is 2 (128 bits for each XMM
register).

The perf_simd_reg_class_name() function gets the register class name for
a certain register class index.

Additionally, the function __parse_regs() is enhanced to support parsing
these newly introduced SIMD/PRED registers. Currently, each class of
register can only be sampled collectively; sampling a specific SIMD
register is not supported. For example, all XMM registers are sampled
together rather than sampling only XMM0.

When multiple overlapping register types, such as XMM and YMM, are
sampled simultaneously, only the superset (YMM registers) is sampled.

With this patch, all supported sampling registers on x86 platforms are
displayed as follows.

 $perf record --intr-regs=?
 available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

 $perf record --user-regs=?
 available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/util/evsel.c                       |  27 ++
 tools/perf/util/parse-regs-options.c          | 161 +++++++++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 292 ++++++++++++++++++
 tools/perf/util/perf_event_attr_fprintf.c     |   6 +
 tools/perf/util/perf_regs.c                   |  72 +++++
 tools/perf/util/perf_regs.h                   |  11 +
 tools/perf/util/record.h                      |   6 +
 7 files changed, 565 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b7fb3f936ae3..a86d2434a4ad 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1583,12 +1583,39 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_intr = opts->sample_intr_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_pred_reg_qwords;
+		evsel__set_sample_bit(evsel, REGS_INTR);
+	}
+
+	if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		/* The pred qwords is to implies the set of SIMD registers is used */
+		if (opts->sample_pred_reg_qwords)
+			attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
+		else
+			attr->sample_simd_pred_reg_qwords = 1;
+		attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
+		attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
+		attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
 	if (opts->sample_user_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_user |= opts->sample_user_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_pred_reg_qwords;
+		evsel__set_sample_bit(evsel, REGS_USER);
+	}
+
+	if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		if (opts->sample_pred_reg_qwords)
+			attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
+		else
+			attr->sample_simd_pred_reg_qwords = 1;
+		attr->sample_simd_vec_reg_user = opts->sample_user_vec_regs;
+		attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
+		attr->sample_simd_pred_reg_user = opts->sample_user_pred_regs;
 		evsel__set_sample_bit(evsel, REGS_USER);
 	}
 
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index 518327883b18..f27960846edc 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -9,13 +9,13 @@
 #include <subcmd/parse-options.h>
 #include "util/perf_regs.h"
 #include "util/parse-regs-options.h"
+#include "record.h"
 
 static void
-list_perf_regs(FILE *fp, uint64_t mask, int abi)
+__list_gp_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
-	fprintf(fp, "available registers: ");
 	for (int reg = 0; reg < 64; reg++) {
 		const char *name;
 
@@ -27,14 +27,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi)
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
 	}
+}
+
+static void
+__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred)
+{
+	uint64_t bitmap = 0;
+	uint16_t qwords = 0;
+	const char *name;
+	int i = 0;
+
+	for (int reg_c = 0; reg_c < 64; reg_c++) {
+		if (((1ULL << reg_c) & mask) == 0)
+			continue;
+
+		name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
+		bitmap = intr ?
+			 perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) :
+			 perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred);
+		if (name && bitmap)
+			fprintf(fp, "%s%s0-%d", i++ > 0 ? " " : "",
+				name, fls64(bitmap) - 1);
+	}
+}
+
+static void
+list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask,
+	       uint64_t pred_mask, int abi, bool intr)
+{
+	bool printed = false;
+
+	fprintf(fp, "available registers: ");
+
+	if (mask) {
+		__list_gp_regs(fp, mask, abi);
+		printed = true;
+	}
+
+	if (simd_mask) {
+		if (printed)
+			fprintf(fp, " ");
+		__list_simd_regs(fp, simd_mask, intr, /*pred=*/false);
+		printed = true;
+	}
+
+	if (pred_mask) {
+		if (printed)
+			fprintf(fp, " ");
+		__list_simd_regs(fp, pred_mask, intr, /*pred=*/true);
+		printed = true;
+	}
+
 	fputc('\n', fp);
 }
 
 static uint64_t
-name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
+name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
+	if (!mask)
+		return reg_mask;
+
 	for (int reg = 0; reg < 64; reg++) {
 		const char *name;
 
@@ -51,13 +105,78 @@ name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 	return reg_mask;
 }
 
+static bool
+name_to_simd_reg_mask(struct record_opts *opts, const char *to_match,
+		      uint64_t mask, bool intr, bool pred)
+{
+	bool matched = false;
+	uint64_t bitmap;
+	uint16_t qwords;
+	int reg_c;
+
+	if (!mask)
+		return false;
+
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		const char *name;
+
+		if (((1ULL << reg_c) & mask) == 0)
+			continue;
+
+		name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
+		if (!name)
+			continue;
+
+		if (!strcasecmp(to_match, name)) {
+			matched = true;
+			break;
+		}
+	}
+
+	if (!matched)
+		return false;
+
+	if (intr) {
+		bitmap = perf_intr_simd_reg_class_bitmap_qwords(EM_HOST,
+							reg_c, &qwords, pred);
+	} else {
+		bitmap = perf_user_simd_reg_class_bitmap_qwords(EM_HOST,
+							reg_c, &qwords, pred);
+	}
+
+	/* Just need the highest qwords */
+	if (pred) {
+		if (qwords >= opts->sample_pred_reg_qwords) {
+			opts->sample_pred_reg_qwords = qwords;
+			if (intr)
+				opts->sample_intr_pred_regs = bitmap;
+			else
+				opts->sample_user_pred_regs = bitmap;
+		}
+	} else {
+		if (qwords >= opts->sample_vec_reg_qwords) {
+			opts->sample_vec_reg_qwords = qwords;
+			if (intr)
+				opts->sample_intr_vec_regs = bitmap;
+			else
+				opts->sample_user_vec_regs = bitmap;
+		}
+	}
+
+	return true;
+}
+
 static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
 	uint64_t *mode = (uint64_t *)opt->value;
+	struct record_opts *opts;
 	char *s, *os = NULL, *p;
-	int ret = -1;
+	uint64_t simd_mask;
+	uint64_t pred_mask;
 	uint64_t mask;
+	bool matched;
+	int ret = -1;
 	int abi;
 
 	if (unset)
@@ -69,11 +188,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (*mode)
 		return -1;
 
-	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
+	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) :
+		      perf_user_reg_mask(EM_HOST, &abi);
+	opts = intr ? container_of(opt->value, struct record_opts, sample_intr_regs) :
+		      container_of(opt->value, struct record_opts, sample_user_regs);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (!str) {
 		*mode = mask;
+		if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			opts->sample_pred_reg_qwords = 1;
 		return 0;
 	}
 
@@ -82,6 +206,14 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (!s)
 		return -1;
 
+	if (intr) {
+		simd_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/false);
+		pred_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/true);
+	} else {
+		simd_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/false);
+		pred_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/true);
+	}
+
 	for (;;) {
 		uint64_t reg_mask;
 
@@ -90,15 +222,24 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask, abi);
+			list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask, abi);
-		if (reg_mask == 0) {
-			ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
+		reg_mask = name_to_gp_reg_mask(s, mask, abi);
+		if (reg_mask) {
+			if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				opts->sample_pred_reg_qwords = 1;
+		} else {
+			matched = name_to_simd_reg_mask(opts, s, simd_mask,
+							intr, /*pred=*/false) ||
+				  name_to_simd_reg_mask(opts, s, pred_mask,
+							intr, /*pred=*/true);
+			if (!matched) {
+				ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
 				s, intr ? "-I" : "--user-regs=");
-			goto error;
+				goto error;
+			}
 		}
 		*mode |= reg_mask;
 
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index 3e9241a11a95..867059fc3cb0 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -461,3 +461,295 @@ uint64_t __perf_reg_sp_x86(void)
 {
 	return PERF_REG_X86_SP;
 }
+
+enum {
+	PERF_REG_CLASS_X86_OPMASK = 0,
+	PERF_REG_CLASS_X86_XMM,
+	PERF_REG_CLASS_X86_YMM,
+	PERF_REG_CLASS_X86_ZMM,
+	PERF_REG_X86_MAX_SIMD_CLASSES,
+};
+
+#define PERF_REG_CLASS_X86_PRED_MASK	(BIT(PERF_REG_CLASS_X86_OPMASK))
+#define PERF_REG_CLASS_X86_SIMD_MASK	(BIT(PERF_REG_CLASS_X86_XMM) | \
+					 BIT(PERF_REG_CLASS_X86_YMM) | \
+					 BIT(PERF_REG_CLASS_X86_ZMM))
+
+/*
+ * This function is used to determin whether kernel perf subsystem supports
+ * which kinds of SIMD registers (OPMASK/XMM/YMM/ZMM) sampling.
+ *
+ * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER
+ * @qwords: the length of SIMD register, like 1/2/4/8 qwords for
+ *          OPMASK/XMM/YMM/ZMM regisers.
+ * @mask: the bitamsk of SIMD register, like 0xffff for XMM0 ~ XMM15
+ * @pred: whether It's a preceding SIMD register, like OPMASK register.
+ *
+ * Return value: true indicates support, otherwise no support.
+ */
+static bool
+__support_simd_reg_class(uint64_t sample_type, uint16_t qwords,
+			 uint64_t mask, bool pred)
+{
+	struct perf_event_attr attr = {
+		.type				= PERF_TYPE_HARDWARE,
+		.config				= PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type			= sample_type,
+		.disabled			= 1,
+		.exclude_kernel			= 1,
+		.sample_simd_regs_enabled	= 1,
+	};
+	int fd;
+
+	attr.sample_period = 1;
+
+	if (!pred) {
+		attr.sample_simd_vec_reg_qwords = qwords;
+		if (sample_type == PERF_SAMPLE_REGS_INTR)
+			attr.sample_simd_vec_reg_intr = mask;
+		else
+			attr.sample_simd_vec_reg_user = mask;
+	} else {
+		attr.sample_simd_pred_reg_qwords = PERF_X86_OPMASK_QWORDS;
+		if (sample_type == PERF_SAMPLE_REGS_INTR)
+			attr.sample_simd_pred_reg_intr = PERF_X86_SIMD_PRED_MASK;
+		else
+			attr.sample_simd_pred_reg_user = PERF_X86_SIMD_PRED_MASK;
+	}
+
+	if (perf_pmus__num_core_pmus() > 1) {
+		__u64 type = perf_pmus__find_core_pmu()->type;
+
+		attr.config |= type << PERF_PMU_TYPE_SHIFT;
+	}
+
+	event_attr_init(&attr);
+
+	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+	if (fd != -1) {
+		close(fd);
+		return true;
+	}
+
+	return false;
+}
+
+#define PERF_X86_SIMD_ZMMH_REGS	(PERF_X86_SIMD_ZMM_REGS / 2)
+
+static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class,
+				      uint64_t *mask, uint16_t *qwords)
+{
+	bool supported = false;
+	uint64_t bits;
+
+	*mask = 0;
+	*qwords = 0;
+
+	switch (reg_class) {
+	case PERF_REG_CLASS_X86_OPMASK:
+		bits = BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_OPMASK_QWORDS,
+						     bits, true);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_OPMASK_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_XMM:
+		bits = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_XMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_XMM_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_YMM:
+		bits = BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_YMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_YMM_QWORDS;
+		}
+		break;
+	case PERF_REG_CLASS_X86_ZMM:
+		bits = BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_ZMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_ZMM_QWORDS;
+			break;
+		}
+
+		bits = BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1;
+		supported = __support_simd_reg_class(sample_type,
+						     PERF_X86_ZMM_QWORDS,
+						     bits, false);
+		if (supported) {
+			*mask = bits;
+			*qwords = PERF_X86_ZMM_QWORDS;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return supported;
+}
+
+static bool __support_simd_sampling(void)
+{
+	uint64_t mask = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
+	uint16_t qwords = PERF_X86_XMM_QWORDS;
+	static bool simd_sampling_supported;
+	static bool cached;
+
+	if (cached)
+		return simd_sampling_supported;
+
+	simd_sampling_supported =
+		 __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR,
+					   PERF_REG_CLASS_X86_XMM,
+					   &mask, &qwords);
+	simd_sampling_supported |=
+		 __arch_has_simd_reg_class(PERF_SAMPLE_REGS_USER,
+					   PERF_REG_CLASS_X86_XMM,
+					   &mask, &qwords);
+	cached = true;
+
+	return simd_sampling_supported;
+}
+
+/*
+ * @x86_intr_simd_cached: indicates the data of below 3
+ *  x86_intr_simd_* items has been retrieved from kernel and cached.
+ * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD
+ *  registers are supported for intr-regs option. Assume kernel perf
+ *  subsystem supports XMM/YMM sampling, then the mask is
+ *  PERF_REG_CLASS_X86_XMM|PERF_REG_CLASS_X86_YMM.
+ * @x86_intr_simd_mask: indicates register bitmask for each kind of
+ *  supported PRED/SIMD register, like
+ *  x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] = 0xffff.
+ * @x86_intr_simd_mask: indicates the register length (qwords uinit)
+ *  for each kind of supported PRED/SIMD register, like
+ *  x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] = 2.
+ */
+static bool x86_intr_simd_cached;
+static uint64_t x86_intr_simd_reg_class_mask;
+static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
+static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
+
+/*
+ * Similar with above x86_intr_simd_* items, the difference is these
+ * items are used for user-regs option.
+ */
+static bool x86_user_simd_cached;
+static uint64_t x86_user_simd_reg_class_mask;
+static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
+static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
+
+static uint64_t __arch__simd_reg_class_mask(bool intr)
+{
+	uint64_t mask = 0;
+	bool supported;
+	int reg_c;
+
+	if (!__support_simd_sampling())
+		return 0;
+
+	if (intr && x86_intr_simd_cached)
+		return x86_intr_simd_reg_class_mask;
+
+	if (!intr && x86_user_simd_cached)
+		return x86_user_simd_reg_class_mask;
+
+	for (reg_c = 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) {
+		supported = false;
+
+		if (intr) {
+			supported = __arch_has_simd_reg_class(
+						PERF_SAMPLE_REGS_INTR,
+						reg_c,
+						&x86_intr_simd_mask[reg_c],
+						&x86_intr_simd_qwords[reg_c]);
+		} else {
+			supported = __arch_has_simd_reg_class(
+						PERF_SAMPLE_REGS_USER,
+						reg_c,
+						&x86_user_simd_mask[reg_c],
+						&x86_user_simd_qwords[reg_c]);
+		}
+		if (supported)
+			mask |= BIT_ULL(reg_c);
+	}
+
+	if (intr) {
+		x86_intr_simd_reg_class_mask = mask;
+		x86_intr_simd_cached = true;
+	} else {
+		x86_user_simd_reg_class_mask = mask;
+		x86_user_simd_cached = true;
+	}
+
+	return mask;
+}
+
+static uint64_t
+__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qwords)
+{
+	uint64_t mask = 0;
+
+	*qwords = 0;
+	if (reg_c >= PERF_REG_X86_MAX_SIMD_CLASSES)
+		return mask;
+
+	if (intr) {
+		mask = x86_intr_simd_mask[reg_c];
+		*qwords = x86_intr_simd_qwords[reg_c];
+	} else {
+		mask = x86_user_simd_mask[reg_c];
+		*qwords = x86_user_simd_qwords[reg_c];
+	}
+
+	return mask;
+}
+
+uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred)
+{
+	uint64_t mask = __arch__simd_reg_class_mask(intr);
+
+	return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK :
+		      mask & PERF_REG_CLASS_X86_SIMD_MASK;
+}
+
+uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
+						 bool intr, bool pred)
+{
+	if (!x86_intr_simd_cached)
+		__perf_simd_reg_class_mask_x86(intr, pred);
+	return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords);
+}
+
+const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unused)
+{
+	switch (id) {
+	case PERF_REG_CLASS_X86_OPMASK:
+		return "OPMASK";
+	case PERF_REG_CLASS_X86_XMM:
+		return "XMM";
+	case PERF_REG_CLASS_X86_YMM:
+		return "YMM";
+	case PERF_REG_CLASS_X86_ZMM:
+		return "ZMM";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 741c3d657a8b..c6b8e53e06fd 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(aux_start_paused, p_unsigned);
 	PRINT_ATTRf(aux_pause, p_unsigned);
 	PRINT_ATTRf(aux_resume, p_unsigned);
+	PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned);
+	PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex);
+	PRINT_ATTRf(sample_simd_pred_reg_user, p_hex);
+	PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned);
+	PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex);
+	PRINT_ATTRf(sample_simd_vec_reg_user, p_hex);
 
 	return ret;
 }
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index bdd2eef13bc3..0ad40421f34e 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -248,3 +248,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine)
 		return 0;
 	}
 }
+
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_mask_x86(/*intr=*/true, pred);
+	default:
+		return 0;
+	}
+}
+
+uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_mask_x86(/*intr=*/false, pred);
+	default:
+		return 0;
+	}
+}
+
+uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
+							       /*intr=*/true,
+							       pred);
+	default:
+		*qwords = 0;
+		return 0;
+	}
+}
+
+uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred)
+{
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
+							       /*intr=*/false,
+							       pred);
+	default:
+		*qwords = 0;
+		return 0;
+	}
+}
+
+const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred)
+{
+	const char *name = NULL;
+
+	switch (e_machine) {
+	case EM_386:
+	case EM_X86_64:
+		name = __perf_simd_reg_class_name_x86(id, pred);
+		break;
+	default:
+		break;
+	}
+	if (name)
+		return name;
+
+	pr_debug("Failed to find %s register %d for ELF machine type %u\n",
+		 pred ? "PRED" : "SIMD", id, e_machine);
+	return "unknown";
+}
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index c9501ca8045d..80d1d7316188 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
+uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
+uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred);
+uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
+						uint16_t *qwords, bool pred);
+const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);
 
 int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op);
 uint64_t __perf_reg_mask_arm64(bool intr);
@@ -68,6 +75,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi);
 const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
+uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred);
+uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
+						 bool intr, bool pred);
+const char *__perf_simd_reg_class_name_x86(int id, bool pred);
 
 static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine)
 {
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 93627c9a7338..37ed44b5f15b 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -62,6 +62,12 @@ struct record_opts {
 	u64	      branch_stack;
 	u64	      sample_intr_regs;
 	u64	      sample_user_regs;
+	u64	      sample_intr_vec_regs;
+	u64	      sample_user_vec_regs;
+	u32	      sample_intr_pred_regs;
+	u32	      sample_user_pred_regs;
+	u16	      sample_vec_reg_qwords;
+	u16	      sample_pred_reg_qwords;
 	u64	      default_interval;
 	u64	      user_interval;
 	size_t	      auxtrace_snapshot_size;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch v6 4/4] perf regs: Enable dumping of SIMD registers
  2026-02-09  8:35 [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
                   ` (2 preceding siblings ...)
  2026-02-09  8:35 ` [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling Dapeng Mi
@ 2026-02-09  8:35 ` Dapeng Mi
  2026-02-09 23:02   ` Ian Rogers
  3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2026-02-09  8:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Zide Chen, Falcon Thomas,
	Dapeng Mi, Xudong Hao, Kan Liang, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

This patch adds support for dumping SIMD registers using the new
PERF_SAMPLE_REGS_ABI_SIMD ABI.

Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86
platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.

An example of the output is displayed below.

Example:

 $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test
 $perf report -D
 ... ...
 237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1):
 179370/179370: 0xffffffff969627fc period: 124999 addr: 0
 ... intr regs: mask 0x20000000000 ABI 64-bit
 .... SSP   0x0000000000000000
 ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1
 .... YMM  [0] 0x0000000000004000
 .... YMM  [0] 0x000055e828695270
 .... YMM  [0] 0x0000000000000000
 .... YMM  [0] 0x0000000000000000
 .... YMM  [1] 0x000055e8286990e0
 .... YMM  [1] 0x000055e828698dd0
 .... YMM  [1] 0x0000000000000000
 .... YMM  [1] 0x0000000000000000
 ... ...
 .... YMM  [31] 0x0000000000000000
 .... YMM  [31] 0x0000000000000000
 .... YMM  [31] 0x0000000000000000
 .... YMM  [31] 0x0000000000000000
 .... OPMASK[0] 0x0000000000100221
 .... OPMASK[1] 0x0000000000000020
 .... OPMASK[2] 0x000000007fffffff
 .... OPMASK[3] 0x0000000000000000
 .... OPMASK[4] 0x0000000000000000
 .... OPMASK[5] 0x0000000000000000
 .... OPMASK[6] 0x0000000000000000
 .... OPMASK[7] 0x0000000000000000
 ... ...

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/util/evsel.c   | 20 ++++++++++
 tools/perf/util/sample.h  | 10 +++++
 tools/perf/util/session.c | 77 +++++++++++++++++++++++++++++++++++----
 3 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a86d2434a4ad..2e1d50a72762 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3514,6 +3514,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				regs->config = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				regs->data = (u64 *)array;
+				sz = (regs->nr_vectors * regs->vector_qwords +
+				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
@@ -3571,6 +3581,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				regs->config = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				regs->data = (u64 *)array;
+				sz = (regs->nr_vectors * regs->vector_qwords +
+				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 3cce8dd202aa..b98bc58d365e 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -15,6 +15,16 @@ struct regs_dump {
 	u64 abi;
 	u64 mask;
 	u64 *regs;
+	union {
+		u64 config;
+		struct {
+			u16 nr_vectors;
+			u16 vector_qwords;
+			u16 nr_pred;
+			u16 pred_qwords;
+		};
+	};
+	u64 *data;
 
 	/* Cached values/mask filled by first register access. */
 	u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7cf7bf86205d..fba8ef52f0a1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -972,18 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_dump *regs,
 	}
 }
 
-static const char *regs_abi[] = {
-	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
-	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
-	[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
-};
+static void simd_regs_dump__printf(struct regs_dump *regs, bool intr)
+{
+	const char *name = "unknown";
+	int i, idx = 0;
+	uint16_t qwords;
+	int reg_c;
+
+	if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
+		return;
+
+	printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qwords %d\n",
+	       regs->nr_vectors, regs->vector_qwords,
+	       regs->nr_pred, regs->pred_qwords);
+
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		if (intr) {
+			perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
+							       &qwords, /*pred=*/false);
+		} else {
+			perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
+							       &qwords, /*pred=*/false);
+		}
+		if (regs->vector_qwords == qwords) {
+			name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/false);
+			break;
+		}
+	}
+
+	for (i = 0; i < regs->nr_vectors; i++) {
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		if (regs->vector_qwords > 2) {
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		}
+		if (regs->vector_qwords > 4) {
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		}
+	}
+
+	name = "unknown";
+	for (reg_c = 0; reg_c < 64; reg_c++) {
+		if (intr) {
+			perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
+							       &qwords, /*pred=*/true);
+		} else {
+			perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
+							       &qwords, /*pred=*/true);
+		}
+		if (regs->pred_qwords == qwords) {
+			name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/true);
+			break;
+		}
+	}
+	for (i = 0; i < regs->nr_pred; i++)
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+}
 
 static inline const char *regs_dump_abi(struct regs_dump *d)
 {
-	if (d->abi > PERF_SAMPLE_REGS_ABI_64)
-		return "unknown";
+	if (!d->abi)
+		return "none";
+	if (d->abi & PERF_SAMPLE_REGS_ABI_32)
+		return "32-bit";
+	else if (d->abi & PERF_SAMPLE_REGS_ABI_64)
+		return "64-bit";
 
-	return regs_abi[d->abi];
+	return "unknown";
 }
 
 static void regs__printf(const char *type, struct regs_dump *regs,
@@ -1010,6 +1069,7 @@ static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, ui
 
 	if (user_regs->regs)
 		regs__printf("user", user_regs, e_machine, e_flags);
+	simd_regs_dump__printf(user_regs, /*intr=*/false);
 }
 
 static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
@@ -1023,6 +1083,7 @@ static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, ui
 
 	if (intr_regs->regs)
 		regs__printf("intr", intr_regs, e_machine, e_flags);
+	simd_regs_dump__printf(intr_regs, /*intr=*/true);
 }
 
 static void stack_user__printf(struct stack_dump *dump)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Patch v6 1/4] perf headers: Sync with the kernel headers
  2026-02-09  8:35 ` [Patch v6 1/4] perf headers: Sync with the kernel headers Dapeng Mi
@ 2026-02-09 22:09   ` Ian Rogers
  2026-02-10  5:21     ` Mi, Dapeng
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2026-02-09 22:09 UTC (permalink / raw)
  To: Dapeng Mi
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Kan Liang

On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Update include/uapi/linux/perf_event.h and
> arch/x86/include/uapi/asm/perf_regs.h to support extended regs.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  tools/arch/x86/include/uapi/asm/perf_regs.h | 49 +++++++++++++++++++++
>  tools/include/uapi/linux/perf_event.h       | 45 +++++++++++++++++--
>  2 files changed, 90 insertions(+), 4 deletions(-)
>
> diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
> index 7c9d2bb3833b..6da63e1dbb40 100644
> --- a/tools/arch/x86/include/uapi/asm/perf_regs.h
> +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
> @@ -27,9 +27,34 @@ enum perf_event_x86_regs {
>         PERF_REG_X86_R13,
>         PERF_REG_X86_R14,
>         PERF_REG_X86_R15,
> +       /*
> +        * The EGPRs/SSP and XMM have overlaps. Only one can be used
> +        * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
> +        * utilize EGPRs/SSP. For the other ABI type, XMM is used.
> +        *
> +        * Extended GPRs (EGPRs)
> +        */
> +       PERF_REG_X86_R16,
> +       PERF_REG_X86_R17,
> +       PERF_REG_X86_R18,
> +       PERF_REG_X86_R19,
> +       PERF_REG_X86_R20,
> +       PERF_REG_X86_R21,
> +       PERF_REG_X86_R22,
> +       PERF_REG_X86_R23,
> +       PERF_REG_X86_R24,
> +       PERF_REG_X86_R25,
> +       PERF_REG_X86_R26,
> +       PERF_REG_X86_R27,
> +       PERF_REG_X86_R28,
> +       PERF_REG_X86_R29,
> +       PERF_REG_X86_R30,
> +       PERF_REG_X86_R31,
> +       PERF_REG_X86_SSP,

nit: I think it'd be nice to comment that PERF_REG_X86_SSP and
PERF_REG_X86_XMM0 are both 32, the meaning of the register is
dependent on the PERF_SAMPLE_REGS_ABI_SIMD, 0 meaning XMM0 and 1
meaning SSP (which could be the opposite of what would be expected).

>         /* These are the limits for the GPRs. */
>         PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
>         PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
> +       PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
>
>         /* These all need two bits set because they are 128bit */
>         PERF_REG_X86_XMM0  = 32,
> @@ -54,5 +79,29 @@ enum perf_event_x86_regs {
>  };
>
>  #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
> +#define PERF_X86_EGPRS_MASK    GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
> +
> +enum {
> +       PERF_X86_SIMD_XMM_REGS      = 16,
> +       PERF_X86_SIMD_YMM_REGS      = 16,
> +       PERF_X86_SIMD_ZMM_REGS      = 32,
> +       PERF_X86_SIMD_VEC_REGS_MAX  = PERF_X86_SIMD_ZMM_REGS,
> +
> +       PERF_X86_SIMD_OPMASK_REGS   = 8,
> +       PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
> +};
> +
> +#define PERF_X86_SIMD_PRED_MASK        GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
> +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
> +
> +#define PERF_X86_H16ZMM_BASE           16
> +
> +enum {
> +       PERF_X86_OPMASK_QWORDS   = 1,
> +       PERF_X86_XMM_QWORDS      = 2,
> +       PERF_X86_YMM_QWORDS      = 4,
> +       PERF_X86_ZMM_QWORDS      = 8,
> +       PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,

nit: for a non-x86 audience who may think a word is more than 2 bytes,
I think it would be nice to comment that a QWORD is 8 bytes. I don't
see other mentions of the unit of length in the kernel headers.

> +};
>
>  #endif /* _ASM_X86_PERF_REGS_H */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 72f03153dd32..ce3a14d35390 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -314,8 +314,9 @@ enum {
>   */
>  enum perf_sample_regs_abi {
>         PERF_SAMPLE_REGS_ABI_NONE               = 0,
> -       PERF_SAMPLE_REGS_ABI_32                 = 1,
> -       PERF_SAMPLE_REGS_ABI_64                 = 2,
> +       PERF_SAMPLE_REGS_ABI_32                 = (1 << 0),
> +       PERF_SAMPLE_REGS_ABI_64                 = (1 << 1),
> +       PERF_SAMPLE_REGS_ABI_SIMD               = (1 << 2),
>  };
>
>  /*
> @@ -383,6 +384,7 @@ enum perf_event_read_format {
>  #define PERF_ATTR_SIZE_VER7                    128     /* Add: sig_data */
>  #define PERF_ATTR_SIZE_VER8                    136     /* Add: config3 */
>  #define PERF_ATTR_SIZE_VER9                    144     /* add: config4 */
> +#define PERF_ATTR_SIZE_VER10                   176     /* Add: sample_simd_{pred,vec}_reg_* */
>
>  /*
>   * 'struct perf_event_attr' contains various attributes that define
> @@ -547,6 +549,25 @@ struct perf_event_attr {
>
>         __u64   config3; /* extension of config2 */
>         __u64   config4; /* extension of config3 */
> +
> +       /*
> +        * Defines set of SIMD registers to dump on samples.
> +        * The sample_simd_regs_enabled !=0 implies the
> +        * set of SIMD registers is used to config all SIMD registers.
> +        * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
> +        * config some SIMD registers on X86.

nit: I think this comment could be clearer, perhaps:

If sample_simd_regs_enabled is non-zero then the following
sampled_simd values define a set of SIMD registers to dump in all
samples. Each register is defined as a bitmap position in
(pred|vec)_reg_(intr|user) and the width of the register in qwords
(8-bytes) is given in (pred|vec)_reg_qwords. If sample_simd_regs is 0
then the vector registers may be dumped if they are in use. To
determine if all or a subset of the registers are dumped, and the
register width, the sample contains the values nr_vectors,
vector_qwords, nr_pred and pred_qwords.

Note, it is particularly the notion of "config all SIMD registers"
that I'm having a hard time being clear on here.

> +        */
> +       union {
> +               __u16 sample_simd_regs_enabled;

nit: I wonder if "enabled" is the right name here as the value being 0
means the vector register may be dumped. Perhaps
sample_simd_regs_full.

Thanks,
Ian

> +               __u16 sample_simd_pred_reg_qwords;
> +       };
> +       __u16   sample_simd_vec_reg_qwords;
> +       __u32   __reserved_4;
> +
> +       __u32   sample_simd_pred_reg_intr;
> +       __u32   sample_simd_pred_reg_user;
> +       __u64   sample_simd_vec_reg_intr;
> +       __u64   sample_simd_vec_reg_user;
>  };
>
>  /*
> @@ -1020,7 +1041,15 @@ enum perf_event_type {
>          *      } && PERF_SAMPLE_BRANCH_STACK
>          *
>          *      { u64                   abi; # enum perf_sample_regs_abi
> -        *        u64                   regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> +        *        u64                   regs[weight(mask)];
> +        *        struct {
> +        *              u16 nr_vectors;         # 0 ... weight(sample_simd_vec_reg_user)
> +        *              u16 vector_qwords;      # 0 ... sample_simd_vec_reg_qwords
> +        *              u16 nr_pred;            # 0 ... weight(sample_simd_pred_reg_user)
> +        *              u16 pred_qwords;        # 0 ... sample_simd_pred_reg_qwords
> +        *              u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> +        *        } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +        *      } && PERF_SAMPLE_REGS_USER
>          *
>          *      { u64                   size;
>          *        char                  data[size];
> @@ -1047,7 +1076,15 @@ enum perf_event_type {
>          *      { u64                   data_src; } && PERF_SAMPLE_DATA_SRC
>          *      { u64                   transaction; } && PERF_SAMPLE_TRANSACTION
>          *      { u64                   abi; # enum perf_sample_regs_abi
> -        *        u64                   regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
> +        *        u64                   regs[weight(mask)];
> +        *        struct {
> +        *              u16 nr_vectors;         # 0 ... weight(sample_simd_vec_reg_intr)
> +        *              u16 vector_qwords;      # 0 ... sample_simd_vec_reg_qwords
> +        *              u16 nr_pred;            # 0 ... weight(sample_simd_pred_reg_intr)
> +        *              u16 pred_qwords;        # 0 ... sample_simd_pred_reg_qwords
> +        *              u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> +        *        } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +        *      } && PERF_SAMPLE_REGS_INTR
>          *      { u64                   phys_addr;} && PERF_SAMPLE_PHYS_ADDR
>          *      { u64                   cgroup;} && PERF_SAMPLE_CGROUP
>          *      { u64                   data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling
  2026-02-09  8:35 ` [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
@ 2026-02-09 22:36   ` Ian Rogers
  2026-02-10  5:35     ` Mi, Dapeng
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2026-02-09 22:36 UTC (permalink / raw)
  To: Dapeng Mi
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao

On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> This patch adds support for sampling x86 extended GP registers (R16-R31)
> and the shadow stack pointer (SSP) register.
>
> The original XMM registers space in sample_regs_user/sample_regs_intr is
> reclaimed to represent the eGPRs and SSP when SIMD registers sampling is
> supported with the new SIMD sampling fields in the perf_event_attr
> structure. This necessitates a way to distinguish which register layout
> is used for the sample_regs_user/sample_regs_intr bitmap.
>
> To address this, a new "abi" argument is added to the helpers
> perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
> "abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, it indicates the eGPRs and SSP
> layout is represented; otherwise, the legacy XMM registers are
> represented.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  tools/perf/builtin-script.c                   |   2 +-
>  tools/perf/util/evsel.c                       |   6 +-
>  tools/perf/util/parse-regs-options.c          |  17 ++-
>  .../perf/util/perf-regs-arch/perf_regs_x86.c  | 120 +++++++++++++++---
>  tools/perf/util/perf_regs.c                   |  14 +-
>  tools/perf/util/perf_regs.h                   |  10 +-
>  .../scripting-engines/trace-event-python.c    |   2 +-
>  tools/perf/util/session.c                     |   9 +-
>  8 files changed, 139 insertions(+), 41 deletions(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 14c6f6c3c4f2..ffe51f895666 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
>         for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
>                 u64 val = regs->regs[i++];
>                 printed += fprintf(fp, "%5s:0x%"PRIx64" ",
> -                                  perf_reg_name(r, e_machine, e_flags),
> +                                  perf_reg_name(r, e_machine, e_flags, regs->abi),

It is tempting for clarity to add the ABI to perf_reg_name as the first patch.

>                                    val);
>         }
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f59228c1a39e..b7fb3f936ae3 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1049,19 +1049,21 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
>         }
>
>         if (param->record_mode == CALLCHAIN_DWARF) {
> +               int abi;
> +
>                 if (!function) {
>                         uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
>
>                         evsel__set_sample_bit(evsel, REGS_USER);
>                         evsel__set_sample_bit(evsel, STACK_USER);
>                         if (opts->sample_user_regs &&
> -                           DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
> +                           DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
>                                 attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
>                                 pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
>                                            "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
>                                            "so the minimal registers set (IP, SP) is explicitly forced.\n");
>                         } else {
> -                               attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
> +                               attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
>                         }
>                         attr->sample_stack_user = param->dump_size;
>                         attr->exclude_callchain_user = 1;
> diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
> index c93c2f0c8105..518327883b18 100644
> --- a/tools/perf/util/parse-regs-options.c
> +++ b/tools/perf/util/parse-regs-options.c
> @@ -10,7 +10,8 @@
>  #include "util/perf_regs.h"
>  #include "util/parse-regs-options.h"
>
> -static void list_perf_regs(FILE *fp, uint64_t mask)
> +static void
> +list_perf_regs(FILE *fp, uint64_t mask, int abi)
>  {
>         const char *last_name = NULL;
>
> @@ -21,7 +22,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
>                 if (((1ULL << reg) & mask) == 0)
>                         continue;
>
> -               name = perf_reg_name(reg, EM_HOST, EF_HOST);
> +               name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
>                 if (name && (!last_name || strcmp(last_name, name)))
>                         fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
>                 last_name = name;
> @@ -29,7 +30,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
>         fputc('\n', fp);
>  }
>
> -static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
> +static uint64_t
> +name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
>  {
>         uint64_t reg_mask = 0;
>
> @@ -39,7 +41,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
>                 if (((1ULL << reg) & mask) == 0)
>                         continue;
>
> -               name = perf_reg_name(reg, EM_HOST, EF_HOST);
> +               name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
>                 if (!name)
>                         continue;
>
> @@ -56,6 +58,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>         char *s, *os = NULL, *p;
>         int ret = -1;
>         uint64_t mask;
> +       int abi;
>
>         if (unset)
>                 return 0;
> @@ -66,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>         if (*mode)
>                 return -1;
>
> -       mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
> +       mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
>
>         /* str may be NULL in case no arg is passed to -I */
>         if (!str) {
> @@ -87,11 +90,11 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>                         *p = '\0';
>
>                 if (!strcmp(s, "?")) {
> -                       list_perf_regs(stderr, mask);
> +                       list_perf_regs(stderr, mask, abi);
>                         goto error;
>                 }
>
> -               reg_mask = name_to_perf_reg_mask(s, mask);
> +               reg_mask = name_to_perf_reg_mask(s, mask, abi);
>                 if (reg_mask == 0) {
>                         ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
>                                 s, intr ? "-I" : "--user-regs=");
> diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> index b6d20522b4e8..3e9241a11a95 100644
> --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> @@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
>         return SDT_ARG_VALID;
>  }
>
> -uint64_t __perf_reg_mask_x86(bool intr)
> +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
>  {
>         struct perf_event_attr attr = {
> -               .type                   = PERF_TYPE_HARDWARE,
> -               .config                 = PERF_COUNT_HW_CPU_CYCLES,
> -               .sample_type            = PERF_SAMPLE_REGS_INTR,
> -               .sample_regs_intr       = PERF_REG_EXTENDED_MASK,
> -               .precise_ip             = 1,
> -               .disabled               = 1,
> -               .exclude_kernel         = 1,
> +               .type                           = PERF_TYPE_HARDWARE,
> +               .config                         = PERF_COUNT_HW_CPU_CYCLES,
> +               .sample_type                    = sample_type,
> +               .precise_ip                     = 1,
> +               .disabled                       = 1,
> +               .exclude_kernel                 = 1,
> +               .sample_simd_regs_enabled       = has_simd_regs,
>         };
>         int fd;
> -
> -       if (!intr)
> -               return PERF_REGS_MASK;
> -
>         /*
>          * In an unnamed union, init it here to build on older gcc versions
>          */
>         attr.sample_period = 1;
> +       if (sample_type == PERF_SAMPLE_REGS_INTR)
> +               attr.sample_regs_intr = mask;
> +       else
> +               attr.sample_regs_user = mask;
>
>         if (perf_pmus__num_core_pmus() > 1) {
>                 struct perf_pmu *pmu = NULL;
> @@ -276,13 +276,34 @@ uint64_t __perf_reg_mask_x86(bool intr)
>                                  /*group_fd=*/-1, /*flags=*/0);
>         if (fd != -1) {
>                 close(fd);
> -               return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
> +               return mask;
> +       }
> +
> +       return 0;
> +}
> +
> +uint64_t __perf_reg_mask_x86(bool intr, int *abi)
> +{
> +       u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
> +       uint64_t mask = PERF_REGS_MASK;
> +
> +       *abi = 0;
> +       mask |= __arch__reg_mask(sample_type,
> +                                GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
> +                                true);
> +       mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
> +
> +       if (mask != PERF_REGS_MASK) {
> +               *abi |= PERF_SAMPLE_REGS_ABI_SIMD;
> +       } else {
> +               mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
> +                                        false);
>         }
>
> -       return PERF_REGS_MASK;
> +       return mask;
>  }
>
> -const char *__perf_reg_name_x86(int id)
> +static const char *__arch_reg_gpr_name(int id)
>  {
>         switch (id) {
>         case PERF_REG_X86_AX:
> @@ -333,7 +354,60 @@ const char *__perf_reg_name_x86(int id)
>                 return "R14";
>         case PERF_REG_X86_R15:
>                 return "R15";
> +       default:
> +               return NULL;
> +       }
> +
> +       return NULL;
> +}
>
> +static const char *__arch_reg_egpr_name(int id)
> +{
> +       switch (id) {
> +       case PERF_REG_X86_R16:
> +               return "R16";
> +       case PERF_REG_X86_R17:
> +               return "R17";
> +       case PERF_REG_X86_R18:
> +               return "R18";
> +       case PERF_REG_X86_R19:
> +               return "R19";
> +       case PERF_REG_X86_R20:
> +               return "R20";
> +       case PERF_REG_X86_R21:
> +               return "R21";
> +       case PERF_REG_X86_R22:
> +               return "R22";
> +       case PERF_REG_X86_R23:
> +               return "R23";
> +       case PERF_REG_X86_R24:
> +               return "R24";
> +       case PERF_REG_X86_R25:
> +               return "R25";
> +       case PERF_REG_X86_R26:
> +               return "R26";
> +       case PERF_REG_X86_R27:
> +               return "R27";
> +       case PERF_REG_X86_R28:
> +               return "R28";
> +       case PERF_REG_X86_R29:
> +               return "R29";
> +       case PERF_REG_X86_R30:
> +               return "R30";
> +       case PERF_REG_X86_R31:
> +               return "R31";
> +       case PERF_REG_X86_SSP:
> +               return "SSP";
> +       default:
> +               return NULL;
> +       }
> +
> +       return NULL;
> +}
> +
> +static const char *__arch_reg_xmm_name(int id)
> +{
> +       switch (id) {
>  #define XMM(x) \
>         case PERF_REG_X86_XMM ## x:     \
>         case PERF_REG_X86_XMM ## x + 1: \
> @@ -362,6 +436,22 @@ const char *__perf_reg_name_x86(int id)
>         return NULL;
>  }
>
> +const char *__perf_reg_name_x86(int id, int abi)
> +{
> +       const char *name;
> +
> +       name = __arch_reg_gpr_name(id);
> +       if (name)
> +               return name;
> +
> +       if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +               name = __arch_reg_egpr_name(id);
> +       else
> +               name = __arch_reg_xmm_name(id);
> +
> +       return name;
> +}
> +
>  uint64_t __perf_reg_ip_x86(void)
>  {
>         return PERF_REG_X86_IP;
> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> index 5b8f34beb24e..bdd2eef13bc3 100644
> --- a/tools/perf/util/perf_regs.c
> +++ b/tools/perf/util/perf_regs.c
> @@ -32,10 +32,11 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
>         return ret;
>  }
>
> -uint64_t perf_intr_reg_mask(uint16_t e_machine)
> +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi)

I wonder if abi is the right out argument name here. Before the SIMD
change the ABI meant either 32 or 64-bit. So we could imagine if it
were 32-bit then registers R8 to R15 wouldn't be in the mask for x86.
Perhaps just a "bool *" for sample_simd_regs_enabled.

Everything else looks good. Thanks for the weak function clean up,
this code is much more generic and better than before. I know it
wasn't trivial to do, but I appreciate it!

Thanks,
Ian

>  {
>         uint64_t mask = 0;
>
> +       *abi = 0;
>         switch (e_machine) {
>         case EM_ARM:
>                 mask = __perf_reg_mask_arm(/*intr=*/true);
> @@ -64,7 +65,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
>                 break;
>         case EM_386:
>         case EM_X86_64:
> -               mask = __perf_reg_mask_x86(/*intr=*/true);
> +               mask = __perf_reg_mask_x86(/*intr=*/true, abi);
>                 break;
>         default:
>                 pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
> @@ -75,10 +76,11 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
>         return mask;
>  }
>
> -uint64_t perf_user_reg_mask(uint16_t e_machine)
> +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi)
>  {
>         uint64_t mask = 0;
>
> +       *abi = 0;
>         switch (e_machine) {
>         case EM_ARM:
>                 mask = __perf_reg_mask_arm(/*intr=*/false);
> @@ -107,7 +109,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
>                 break;
>         case EM_386:
>         case EM_X86_64:
> -               mask = __perf_reg_mask_x86(/*intr=*/false);
> +               mask = __perf_reg_mask_x86(/*intr=*/false, abi);
>                 break;
>         default:
>                 pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
> @@ -118,7 +120,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
>         return mask;
>  }
>
> -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
> +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
>  {
>         const char *reg_name = NULL;
>
> @@ -150,7 +152,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
>                 break;
>         case EM_386:
>         case EM_X86_64:
> -               reg_name = __perf_reg_name_x86(id);
> +               reg_name = __perf_reg_name_x86(id, abi);
>                 break;
>         default:
>                 break;
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index 7c04700bf837..c9501ca8045d 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -13,10 +13,10 @@ enum {
>  };
>
>  int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
> -uint64_t perf_intr_reg_mask(uint16_t e_machine);
> -uint64_t perf_user_reg_mask(uint16_t e_machine);
> +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi);
> +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi);
>
> -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
> +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
>  uint64_t perf_arch_reg_ip(uint16_t e_machine);
>  uint64_t perf_arch_reg_sp(uint16_t e_machine);
> @@ -64,8 +64,8 @@ uint64_t __perf_reg_ip_s390(void);
>  uint64_t __perf_reg_sp_s390(void);
>
>  int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
> -uint64_t __perf_reg_mask_x86(bool intr);
> -const char *__perf_reg_name_x86(int id);
> +uint64_t __perf_reg_mask_x86(bool intr, int *abi);
> +const char *__perf_reg_name_x86(int id, int abi);
>  uint64_t __perf_reg_ip_x86(void);
>  uint64_t __perf_reg_sp_x86(void);
>
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 2b0df7bd9a46..4cc5b96898e6 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
>
>                 printed += scnprintf(bf + printed, size - printed,
>                                      "%5s:0x%" PRIx64 " ",
> -                                    perf_reg_name(r, e_machine, e_flags), val);
> +                                    perf_reg_name(r, e_machine, e_flags, regs->abi), val);
>         }
>  }
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4b465abfa36c..7cf7bf86205d 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -959,15 +959,16 @@ static void branch_stack__printf(struct perf_sample *sample,
>         }
>  }
>
> -static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
> +static void regs_dump__printf(u64 mask, struct regs_dump *regs,
> +                             uint16_t e_machine, uint32_t e_flags)
>  {
>         unsigned rid, i = 0;
>
>         for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
> -               u64 val = regs[i++];
> +               u64 val = regs->regs[i++];
>
>                 printf(".... %-5s 0x%016" PRIx64 "\n",
> -                      perf_reg_name(rid, e_machine, e_flags), val);
> +                      perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
>         }
>  }
>
> @@ -995,7 +996,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
>                mask,
>                regs_dump_abi(regs));
>
> -       regs_dump__printf(mask, regs->regs, e_machine, e_flags);
> +       regs_dump__printf(mask, regs, e_machine, e_flags);
>  }
>
>  static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling
  2026-02-09  8:35 ` [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling Dapeng Mi
@ 2026-02-09 22:39   ` Ian Rogers
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2026-02-09 22:39 UTC (permalink / raw)
  To: Dapeng Mi
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao

On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> This patch adds support for the newly introduced SIMD register sampling
> format by adding the following 5 functions:
>
> uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
> uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
> uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
>                                                 uint16_t *qwords, bool pred);
> uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
>                                                 uint16_t *qwords, bool pred);
> const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);
>
> The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap
> of kernel supported SIMD/PRED register classes on current platform for
> intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on
> x86 platforms.
>
> The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve
> the bitmap and qwords length of a certain class of SIMD/PRED register
> on current platform for intr-regs and user-regs sampling. For example,
> for the XMM registers on x86 platforms, the returned bitmap is 0xffff
> (XMM0 ~ XMM15) and the qwords length is 2 (128 bits for each XMM
> register).
>
> The perf_simd_reg_class_name() function gets the register class name for
> a certain register class index.
>
> Additionally, the function __parse_regs() is enhanced to support parsing
> these newly introduced SIMD/PRED registers. Currently, each class of
> register can only be sampled collectively; sampling a specific SIMD
> register is not supported. For example, all XMM registers are sampled
> together rather than sampling only XMM0.
>
> When multiple overlapping register types, such as XMM and YMM, are
> sampled simultaneously, only the superset (YMM registers) is sampled.
>
> With this patch, all supported sampling registers on x86 platforms are
> displayed as follows.
>
>  $perf record --intr-regs=?
>  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
>  R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
>  R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7
>
>  $perf record --user-regs=?
>  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
>  R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
>  R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/evsel.c                       |  27 ++
>  tools/perf/util/parse-regs-options.c          | 161 +++++++++-
>  .../perf/util/perf-regs-arch/perf_regs_x86.c  | 292 ++++++++++++++++++
>  tools/perf/util/perf_event_attr_fprintf.c     |   6 +
>  tools/perf/util/perf_regs.c                   |  72 +++++
>  tools/perf/util/perf_regs.h                   |  11 +
>  tools/perf/util/record.h                      |   6 +
>  7 files changed, 565 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index b7fb3f936ae3..a86d2434a4ad 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1583,12 +1583,39 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>         if (opts->sample_intr_regs && !evsel->no_aux_samples &&
>             !evsel__is_dummy_event(evsel)) {
>                 attr->sample_regs_intr = opts->sample_intr_regs;
> +               attr->sample_simd_regs_enabled = !!opts->sample_pred_reg_qwords;
> +               evsel__set_sample_bit(evsel, REGS_INTR);
> +       }
> +
> +       if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
> +           !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
> +               /* The pred qwords is to implies the set of SIMD registers is used */
> +               if (opts->sample_pred_reg_qwords)
> +                       attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
> +               else
> +                       attr->sample_simd_pred_reg_qwords = 1;
> +               attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
> +               attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
> +               attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
>                 evsel__set_sample_bit(evsel, REGS_INTR);
>         }
>
>         if (opts->sample_user_regs && !evsel->no_aux_samples &&
>             !evsel__is_dummy_event(evsel)) {
>                 attr->sample_regs_user |= opts->sample_user_regs;
> +               attr->sample_simd_regs_enabled = !!opts->sample_pred_reg_qwords;
> +               evsel__set_sample_bit(evsel, REGS_USER);
> +       }
> +
> +       if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) &&
> +           !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
> +               if (opts->sample_pred_reg_qwords)
> +                       attr->sample_simd_pred_reg_qwords = opts->sample_pred_reg_qwords;
> +               else
> +                       attr->sample_simd_pred_reg_qwords = 1;
> +               attr->sample_simd_vec_reg_user = opts->sample_user_vec_regs;
> +               attr->sample_simd_vec_reg_qwords = opts->sample_vec_reg_qwords;
> +               attr->sample_simd_pred_reg_user = opts->sample_user_pred_regs;
>                 evsel__set_sample_bit(evsel, REGS_USER);
>         }
>
> diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
> index 518327883b18..f27960846edc 100644
> --- a/tools/perf/util/parse-regs-options.c
> +++ b/tools/perf/util/parse-regs-options.c
> @@ -9,13 +9,13 @@
>  #include <subcmd/parse-options.h>
>  #include "util/perf_regs.h"
>  #include "util/parse-regs-options.h"
> +#include "record.h"
>
>  static void
> -list_perf_regs(FILE *fp, uint64_t mask, int abi)
> +__list_gp_regs(FILE *fp, uint64_t mask, int abi)
>  {
>         const char *last_name = NULL;
>
> -       fprintf(fp, "available registers: ");
>         for (int reg = 0; reg < 64; reg++) {
>                 const char *name;
>
> @@ -27,14 +27,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi)
>                         fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
>                 last_name = name;
>         }
> +}
> +
> +static void
> +__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred)
> +{
> +       uint64_t bitmap = 0;
> +       uint16_t qwords = 0;
> +       const char *name;
> +       int i = 0;
> +
> +       for (int reg_c = 0; reg_c < 64; reg_c++) {
> +               if (((1ULL << reg_c) & mask) == 0)
> +                       continue;
> +
> +               name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
> +               bitmap = intr ?
> +                        perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) :
> +                        perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred);
> +               if (name && bitmap)
> +                       fprintf(fp, "%s%s0-%d", i++ > 0 ? " " : "",
> +                               name, fls64(bitmap) - 1);
> +       }
> +}
> +
> +static void
> +list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask,
> +              uint64_t pred_mask, int abi, bool intr)
> +{
> +       bool printed = false;
> +
> +       fprintf(fp, "available registers: ");
> +
> +       if (mask) {
> +               __list_gp_regs(fp, mask, abi);
> +               printed = true;
> +       }
> +
> +       if (simd_mask) {
> +               if (printed)
> +                       fprintf(fp, " ");
> +               __list_simd_regs(fp, simd_mask, intr, /*pred=*/false);
> +               printed = true;
> +       }
> +
> +       if (pred_mask) {
> +               if (printed)
> +                       fprintf(fp, " ");
> +               __list_simd_regs(fp, pred_mask, intr, /*pred=*/true);
> +               printed = true;
> +       }
> +
>         fputc('\n', fp);
>  }
>
>  static uint64_t
> -name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
> +name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi)
>  {
>         uint64_t reg_mask = 0;
>
> +       if (!mask)
> +               return reg_mask;
> +
>         for (int reg = 0; reg < 64; reg++) {
>                 const char *name;
>
> @@ -51,13 +105,78 @@ name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
>         return reg_mask;
>  }
>
> +static bool
> +name_to_simd_reg_mask(struct record_opts *opts, const char *to_match,
> +                     uint64_t mask, bool intr, bool pred)
> +{
> +       bool matched = false;
> +       uint64_t bitmap;
> +       uint16_t qwords;
> +       int reg_c;
> +
> +       if (!mask)
> +               return false;
> +
> +       for (reg_c = 0; reg_c < 64; reg_c++) {
> +               const char *name;
> +
> +               if (((1ULL << reg_c) & mask) == 0)
> +                       continue;
> +
> +               name = perf_simd_reg_class_name(EM_HOST, reg_c, pred);
> +               if (!name)
> +                       continue;
> +
> +               if (!strcasecmp(to_match, name)) {
> +                       matched = true;
> +                       break;
> +               }
> +       }
> +
> +       if (!matched)
> +               return false;
> +
> +       if (intr) {
> +               bitmap = perf_intr_simd_reg_class_bitmap_qwords(EM_HOST,
> +                                                       reg_c, &qwords, pred);
> +       } else {
> +               bitmap = perf_user_simd_reg_class_bitmap_qwords(EM_HOST,
> +                                                       reg_c, &qwords, pred);
> +       }
> +
> +       /* Just need the highest qwords */
> +       if (pred) {
> +               if (qwords >= opts->sample_pred_reg_qwords) {
> +                       opts->sample_pred_reg_qwords = qwords;
> +                       if (intr)
> +                               opts->sample_intr_pred_regs = bitmap;
> +                       else
> +                               opts->sample_user_pred_regs = bitmap;
> +               }
> +       } else {
> +               if (qwords >= opts->sample_vec_reg_qwords) {
> +                       opts->sample_vec_reg_qwords = qwords;
> +                       if (intr)
> +                               opts->sample_intr_vec_regs = bitmap;
> +                       else
> +                               opts->sample_user_vec_regs = bitmap;
> +               }
> +       }
> +
> +       return true;
> +}
> +
>  static int
>  __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>  {
>         uint64_t *mode = (uint64_t *)opt->value;
> +       struct record_opts *opts;
>         char *s, *os = NULL, *p;
> -       int ret = -1;
> +       uint64_t simd_mask;
> +       uint64_t pred_mask;
>         uint64_t mask;
> +       bool matched;
> +       int ret = -1;
>         int abi;
>
>         if (unset)
> @@ -69,11 +188,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>         if (*mode)
>                 return -1;
>
> -       mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
> +       mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) :
> +                     perf_user_reg_mask(EM_HOST, &abi);
> +       opts = intr ? container_of(opt->value, struct record_opts, sample_intr_regs) :
> +                     container_of(opt->value, struct record_opts, sample_user_regs);
>
>         /* str may be NULL in case no arg is passed to -I */
>         if (!str) {
>                 *mode = mask;
> +               if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +                       opts->sample_pred_reg_qwords = 1;
>                 return 0;
>         }
>
> @@ -82,6 +206,14 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>         if (!s)
>                 return -1;
>
> +       if (intr) {
> +               simd_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/false);
> +               pred_mask = perf_intr_simd_reg_class_mask(EM_HOST, /*pred=*/true);
> +       } else {
> +               simd_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/false);
> +               pred_mask = perf_user_simd_reg_class_mask(EM_HOST, /*pred=*/true);
> +       }
> +
>         for (;;) {
>                 uint64_t reg_mask;
>
> @@ -90,15 +222,24 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>                         *p = '\0';
>
>                 if (!strcmp(s, "?")) {
> -                       list_perf_regs(stderr, mask, abi);
> +                       list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr);
>                         goto error;
>                 }
>
> -               reg_mask = name_to_perf_reg_mask(s, mask, abi);
> -               if (reg_mask == 0) {
> -                       ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
> +               reg_mask = name_to_gp_reg_mask(s, mask, abi);
> +               if (reg_mask) {
> +                       if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
> +                               opts->sample_pred_reg_qwords = 1;
> +               } else {
> +                       matched = name_to_simd_reg_mask(opts, s, simd_mask,
> +                                                       intr, /*pred=*/false) ||
> +                                 name_to_simd_reg_mask(opts, s, pred_mask,
> +                                                       intr, /*pred=*/true);
> +                       if (!matched) {
> +                               ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
>                                 s, intr ? "-I" : "--user-regs=");
> -                       goto error;
> +                               goto error;
> +                       }
>                 }
>                 *mode |= reg_mask;
>
> diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> index 3e9241a11a95..867059fc3cb0 100644
> --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
> @@ -461,3 +461,295 @@ uint64_t __perf_reg_sp_x86(void)
>  {
>         return PERF_REG_X86_SP;
>  }
> +
> +enum {
> +       PERF_REG_CLASS_X86_OPMASK = 0,
> +       PERF_REG_CLASS_X86_XMM,
> +       PERF_REG_CLASS_X86_YMM,
> +       PERF_REG_CLASS_X86_ZMM,
> +       PERF_REG_X86_MAX_SIMD_CLASSES,
> +};
> +
> +#define PERF_REG_CLASS_X86_PRED_MASK   (BIT(PERF_REG_CLASS_X86_OPMASK))
> +#define PERF_REG_CLASS_X86_SIMD_MASK   (BIT(PERF_REG_CLASS_X86_XMM) | \
> +                                        BIT(PERF_REG_CLASS_X86_YMM) | \
> +                                        BIT(PERF_REG_CLASS_X86_ZMM))
> +
> +/*
> + * This function is used to determin whether kernel perf subsystem supports
> + * which kinds of SIMD registers (OPMASK/XMM/YMM/ZMM) sampling.
> + *
> + * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER
> + * @qwords: the length of SIMD register, like 1/2/4/8 qwords for
> + *          OPMASK/XMM/YMM/ZMM regisers.
> + * @mask: the bitamsk of SIMD register, like 0xffff for XMM0 ~ XMM15
> + * @pred: whether It's a preceding SIMD register, like OPMASK register.
> + *
> + * Return value: true indicates support, otherwise no support.
> + */
> +static bool
> +__support_simd_reg_class(uint64_t sample_type, uint16_t qwords,
> +                        uint64_t mask, bool pred)
> +{
> +       struct perf_event_attr attr = {
> +               .type                           = PERF_TYPE_HARDWARE,
> +               .config                         = PERF_COUNT_HW_CPU_CYCLES,
> +               .sample_type                    = sample_type,
> +               .disabled                       = 1,
> +               .exclude_kernel                 = 1,
> +               .sample_simd_regs_enabled       = 1,
> +       };
> +       int fd;
> +
> +       attr.sample_period = 1;
> +
> +       if (!pred) {
> +               attr.sample_simd_vec_reg_qwords = qwords;
> +               if (sample_type == PERF_SAMPLE_REGS_INTR)
> +                       attr.sample_simd_vec_reg_intr = mask;
> +               else
> +                       attr.sample_simd_vec_reg_user = mask;
> +       } else {
> +               attr.sample_simd_pred_reg_qwords = PERF_X86_OPMASK_QWORDS;
> +               if (sample_type == PERF_SAMPLE_REGS_INTR)
> +                       attr.sample_simd_pred_reg_intr = PERF_X86_SIMD_PRED_MASK;
> +               else
> +                       attr.sample_simd_pred_reg_user = PERF_X86_SIMD_PRED_MASK;
> +       }
> +
> +       if (perf_pmus__num_core_pmus() > 1) {
> +               __u64 type = perf_pmus__find_core_pmu()->type;
> +
> +               attr.config |= type << PERF_PMU_TYPE_SHIFT;
> +       }
> +
> +       event_attr_init(&attr);
> +
> +       fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> +       if (fd != -1) {
> +               close(fd);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +#define PERF_X86_SIMD_ZMMH_REGS        (PERF_X86_SIMD_ZMM_REGS / 2)
> +
> +static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class,
> +                                     uint64_t *mask, uint16_t *qwords)
> +{
> +       bool supported = false;
> +       uint64_t bits;
> +
> +       *mask = 0;
> +       *qwords = 0;
> +
> +       switch (reg_class) {
> +       case PERF_REG_CLASS_X86_OPMASK:
> +               bits = BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1;
> +               supported = __support_simd_reg_class(sample_type,
> +                                                    PERF_X86_OPMASK_QWORDS,
> +                                                    bits, true);
> +               if (supported) {
> +                       *mask = bits;
> +                       *qwords = PERF_X86_OPMASK_QWORDS;
> +               }
> +               break;
> +       case PERF_REG_CLASS_X86_XMM:
> +               bits = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
> +               supported = __support_simd_reg_class(sample_type,
> +                                                    PERF_X86_XMM_QWORDS,
> +                                                    bits, false);
> +               if (supported) {
> +                       *mask = bits;
> +                       *qwords = PERF_X86_XMM_QWORDS;
> +               }
> +               break;
> +       case PERF_REG_CLASS_X86_YMM:
> +               bits = BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1;
> +               supported = __support_simd_reg_class(sample_type,
> +                                                    PERF_X86_YMM_QWORDS,
> +                                                    bits, false);
> +               if (supported) {
> +                       *mask = bits;
> +                       *qwords = PERF_X86_YMM_QWORDS;
> +               }
> +               break;
> +       case PERF_REG_CLASS_X86_ZMM:
> +               bits = BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1;
> +               supported = __support_simd_reg_class(sample_type,
> +                                                    PERF_X86_ZMM_QWORDS,
> +                                                    bits, false);
> +               if (supported) {
> +                       *mask = bits;
> +                       *qwords = PERF_X86_ZMM_QWORDS;
> +                       break;
> +               }
> +
> +               bits = BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1;
> +               supported = __support_simd_reg_class(sample_type,
> +                                                    PERF_X86_ZMM_QWORDS,
> +                                                    bits, false);
> +               if (supported) {
> +                       *mask = bits;
> +                       *qwords = PERF_X86_ZMM_QWORDS;
> +               }
> +               break;
> +       default:
> +               break;
> +       }
> +
> +       return supported;
> +}
> +
> +static bool __support_simd_sampling(void)
> +{
> +       uint64_t mask = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
> +       uint16_t qwords = PERF_X86_XMM_QWORDS;
> +       static bool simd_sampling_supported;
> +       static bool cached;
> +
> +       if (cached)
> +               return simd_sampling_supported;
> +
> +       simd_sampling_supported =
> +                __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR,
> +                                          PERF_REG_CLASS_X86_XMM,
> +                                          &mask, &qwords);
> +       simd_sampling_supported |=
> +                __arch_has_simd_reg_class(PERF_SAMPLE_REGS_USER,
> +                                          PERF_REG_CLASS_X86_XMM,
> +                                          &mask, &qwords);
> +       cached = true;
> +
> +       return simd_sampling_supported;
> +}
> +
> +/*
> + * @x86_intr_simd_cached: indicates the data of below 3
> + *  x86_intr_simd_* items has been retrieved from kernel and cached.
> + * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD
> + *  registers are supported for intr-regs option. Assume kernel perf
> + *  subsystem supports XMM/YMM sampling, then the mask is
> + *  PERF_REG_CLASS_X86_XMM|PERF_REG_CLASS_X86_YMM.
> + * @x86_intr_simd_mask: indicates register bitmask for each kind of
> + *  supported PRED/SIMD register, like
> + *  x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] = 0xffff.
> + * @x86_intr_simd_mask: indicates the register length (qwords uinit)
> + *  for each kind of supported PRED/SIMD register, like
> + *  x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] = 2.
> + */
> +static bool x86_intr_simd_cached;
> +static uint64_t x86_intr_simd_reg_class_mask;
> +static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
> +static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
> +
> +/*
> + * Similar with above x86_intr_simd_* items, the difference is these
> + * items are used for user-regs option.
> + */
> +static bool x86_user_simd_cached;
> +static uint64_t x86_user_simd_reg_class_mask;
> +static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES];
> +static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES];
> +
> +static uint64_t __arch__simd_reg_class_mask(bool intr)
> +{
> +       uint64_t mask = 0;
> +       bool supported;
> +       int reg_c;
> +
> +       if (!__support_simd_sampling())
> +               return 0;
> +
> +       if (intr && x86_intr_simd_cached)
> +               return x86_intr_simd_reg_class_mask;
> +
> +       if (!intr && x86_user_simd_cached)
> +               return x86_user_simd_reg_class_mask;
> +
> +       for (reg_c = 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) {
> +               supported = false;
> +
> +               if (intr) {
> +                       supported = __arch_has_simd_reg_class(
> +                                               PERF_SAMPLE_REGS_INTR,
> +                                               reg_c,
> +                                               &x86_intr_simd_mask[reg_c],
> +                                               &x86_intr_simd_qwords[reg_c]);
> +               } else {
> +                       supported = __arch_has_simd_reg_class(
> +                                               PERF_SAMPLE_REGS_USER,
> +                                               reg_c,
> +                                               &x86_user_simd_mask[reg_c],
> +                                               &x86_user_simd_qwords[reg_c]);
> +               }
> +               if (supported)
> +                       mask |= BIT_ULL(reg_c);
> +       }
> +
> +       if (intr) {
> +               x86_intr_simd_reg_class_mask = mask;
> +               x86_intr_simd_cached = true;
> +       } else {
> +               x86_user_simd_reg_class_mask = mask;
> +               x86_user_simd_cached = true;
> +       }
> +
> +       return mask;
> +}
> +
> +static uint64_t
> +__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qwords)
> +{
> +       uint64_t mask = 0;
> +
> +       *qwords = 0;
> +       if (reg_c >= PERF_REG_X86_MAX_SIMD_CLASSES)
> +               return mask;
> +
> +       if (intr) {
> +               mask = x86_intr_simd_mask[reg_c];
> +               *qwords = x86_intr_simd_qwords[reg_c];
> +       } else {
> +               mask = x86_user_simd_mask[reg_c];
> +               *qwords = x86_user_simd_qwords[reg_c];
> +       }
> +
> +       return mask;
> +}
> +
> +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred)
> +{
> +       uint64_t mask = __arch__simd_reg_class_mask(intr);
> +
> +       return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK :
> +                     mask & PERF_REG_CLASS_X86_SIMD_MASK;
> +}
> +
> +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
> +                                                bool intr, bool pred)
> +{
> +       if (!x86_intr_simd_cached)
> +               __perf_simd_reg_class_mask_x86(intr, pred);
> +       return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords);
> +}
> +
> +const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unused)
> +{
> +       switch (id) {
> +       case PERF_REG_CLASS_X86_OPMASK:
> +               return "OPMASK";
> +       case PERF_REG_CLASS_X86_XMM:
> +               return "XMM";
> +       case PERF_REG_CLASS_X86_YMM:
> +               return "YMM";
> +       case PERF_REG_CLASS_X86_ZMM:
> +               return "ZMM";
> +       default:
> +               return NULL;
> +       }
> +
> +       return NULL;
> +}
> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> index 741c3d657a8b..c6b8e53e06fd 100644
> --- a/tools/perf/util/perf_event_attr_fprintf.c
> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> @@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
>         PRINT_ATTRf(aux_start_paused, p_unsigned);
>         PRINT_ATTRf(aux_pause, p_unsigned);
>         PRINT_ATTRf(aux_resume, p_unsigned);
> +       PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned);
> +       PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex);
> +       PRINT_ATTRf(sample_simd_pred_reg_user, p_hex);
> +       PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned);
> +       PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex);
> +       PRINT_ATTRf(sample_simd_vec_reg_user, p_hex);
>
>         return ret;
>  }
> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> index bdd2eef13bc3..0ad40421f34e 100644
> --- a/tools/perf/util/perf_regs.c
> +++ b/tools/perf/util/perf_regs.c
> @@ -248,3 +248,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine)
>                 return 0;
>         }
>  }
> +
> +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred)
> +{
> +       switch (e_machine) {
> +       case EM_386:
> +       case EM_X86_64:
> +               return __perf_simd_reg_class_mask_x86(/*intr=*/true, pred);
> +       default:
> +               return 0;
> +       }
> +}
> +
> +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred)
> +{
> +       switch (e_machine) {
> +       case EM_386:
> +       case EM_X86_64:
> +               return __perf_simd_reg_class_mask_x86(/*intr=*/false, pred);
> +       default:
> +               return 0;
> +       }
> +}
> +
> +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
> +                                               uint16_t *qwords, bool pred)
> +{
> +       switch (e_machine) {
> +       case EM_386:
> +       case EM_X86_64:
> +               return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
> +                                                              /*intr=*/true,
> +                                                              pred);
> +       default:
> +               *qwords = 0;
> +               return 0;
> +       }
> +}
> +
> +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
> +                                               uint16_t *qwords, bool pred)
> +{
> +       switch (e_machine) {
> +       case EM_386:
> +       case EM_X86_64:
> +               return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords,
> +                                                              /*intr=*/false,
> +                                                              pred);
> +       default:
> +               *qwords = 0;
> +               return 0;
> +       }
> +}
> +
> +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred)
> +{
> +       const char *name = NULL;
> +
> +       switch (e_machine) {
> +       case EM_386:
> +       case EM_X86_64:
> +               name = __perf_simd_reg_class_name_x86(id, pred);
> +               break;
> +       default:
> +               break;
> +       }
> +       if (name)
> +               return name;
> +
> +       pr_debug("Failed to find %s register %d for ELF machine type %u\n",
> +                pred ? "PRED" : "SIMD", id, e_machine);
> +       return "unknown";
> +}
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index c9501ca8045d..80d1d7316188 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
>  uint64_t perf_arch_reg_ip(uint16_t e_machine);
>  uint64_t perf_arch_reg_sp(uint16_t e_machine);
> +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
> +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
> +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
> +                                               uint16_t *qwords, bool pred);
> +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
> +                                               uint16_t *qwords, bool pred);
> +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);
>
>  int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op);
>  uint64_t __perf_reg_mask_arm64(bool intr);
> @@ -68,6 +75,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi);
>  const char *__perf_reg_name_x86(int id, int abi);
>  uint64_t __perf_reg_ip_x86(void);
>  uint64_t __perf_reg_sp_x86(void);
> +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred);
> +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwords,
> +                                                bool intr, bool pred);
> +const char *__perf_simd_reg_class_name_x86(int id, bool pred);
>
>  static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine)
>  {
> diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
> index 93627c9a7338..37ed44b5f15b 100644
> --- a/tools/perf/util/record.h
> +++ b/tools/perf/util/record.h
> @@ -62,6 +62,12 @@ struct record_opts {
>         u64           branch_stack;
>         u64           sample_intr_regs;
>         u64           sample_user_regs;
> +       u64           sample_intr_vec_regs;
> +       u64           sample_user_vec_regs;
> +       u32           sample_intr_pred_regs;
> +       u32           sample_user_pred_regs;
> +       u16           sample_vec_reg_qwords;
> +       u16           sample_pred_reg_qwords;
>         u64           default_interval;
>         u64           user_interval;
>         size_t        auxtrace_snapshot_size;
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 4/4] perf regs: Enable dumping of SIMD registers
  2026-02-09  8:35 ` [Patch v6 4/4] perf regs: Enable dumping of SIMD registers Dapeng Mi
@ 2026-02-09 23:02   ` Ian Rogers
  2026-02-10  6:11     ` Mi, Dapeng
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2026-02-09 23:02 UTC (permalink / raw)
  To: Dapeng Mi
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Kan Liang

On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> This patch adds support for dumping SIMD registers using the new
> PERF_SAMPLE_REGS_ABI_SIMD ABI.

The parsing support is also added, so I think it should be "parsing
and dumping" here and in the message subject.

> Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86
> platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.
>
> An example of the output is displayed below.
>
> Example:
>
>  $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test
>  $perf report -D
>  ... ...
>  237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1):
>  179370/179370: 0xffffffff969627fc period: 124999 addr: 0
>  ... intr regs: mask 0x20000000000 ABI 64-bit
>  .... SSP   0x0000000000000000
>  ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1
>  .... YMM  [0] 0x0000000000004000
>  .... YMM  [0] 0x000055e828695270
>  .... YMM  [0] 0x0000000000000000
>  .... YMM  [0] 0x0000000000000000
>  .... YMM  [1] 0x000055e8286990e0
>  .... YMM  [1] 0x000055e828698dd0
>  .... YMM  [1] 0x0000000000000000
>  .... YMM  [1] 0x0000000000000000
>  ... ...
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... OPMASK[0] 0x0000000000100221
>  .... OPMASK[1] 0x0000000000000020
>  .... OPMASK[2] 0x000000007fffffff
>  .... OPMASK[3] 0x0000000000000000
>  .... OPMASK[4] 0x0000000000000000
>  .... OPMASK[5] 0x0000000000000000
>  .... OPMASK[6] 0x0000000000000000
>  .... OPMASK[7] 0x0000000000000000
>  ... ...
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  tools/perf/util/evsel.c   | 20 ++++++++++
>  tools/perf/util/sample.h  | 10 +++++
>  tools/perf/util/session.c | 77 +++++++++++++++++++++++++++++++++++----
>  3 files changed, 99 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index a86d2434a4ad..2e1d50a72762 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -3514,6 +3514,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>                         regs->mask = mask;
>                         regs->regs = (u64 *)array;
>                         array = (void *)array + sz;
> +
> +                       if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
> +                               regs->config = *(u64 *)array;
> +                               array = (void *)array + sizeof(u64);
> +                               regs->data = (u64 *)array;

It could be nice to add asserts for the comments:
```
        *              u16 nr_vectors;         # 0 ...
weight(sample_simd_vec_reg_user)
        *              u16 vector_qwords;      # 0 ...
sample_simd_vec_reg_qwords
        *              u16 nr_pred;            # 0 ...
weight(sample_simd_pred_reg_user)
        *              u16 pred_qwords;        # 0 ...
sample_simd_pred_reg_qwords
        *              u64 data[nr_vectors * vector_qwords + nr_pred *
pred_qwords];
```
ie:
```
assert(regs->nr_vectors <= hweight64(
evsel->core.attr.sample_simd_vec_reg_user));
assert(regs->vector_qwords <= evsel->core.attr.sample_simd_vec_reg_qwords);
assert(regs->nr_vectors <= hweight64(
evsel->core.attr.sample_simd_pred_reg_user));
assert(regs->vector_qwords <= evsel->core.attr.sample_simd_pred_reg_qwords);
```

> +                               sz = (regs->nr_vectors * regs->vector_qwords +
> +                                     regs->nr_pred * regs->pred_qwords) * sizeof(u64);
> +                               OVERFLOW_CHECK(array, sz, max_size);
> +                               array = (void *)array + sz;
> +                       }
>                 }
>         }
>
> @@ -3571,6 +3581,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>                         regs->mask = mask;
>                         regs->regs = (u64 *)array;
>                         array = (void *)array + sz;
> +
> +                       if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
> +                               regs->config = *(u64 *)array;
> +                               array = (void *)array + sizeof(u64);
> +                               regs->data = (u64 *)array;

As above but for intr:
```
assert(regs->nr_vectors <= hweight64(
evsel->core.attr.sample_simd_vec_reg_intr));
assert(regs->vector_qwords <= evsel->core.attr.sample_simd_vec_reg_qwords);
assert(regs->nr_vectors <= hweight64(
evsel->core.attr.sample_simd_pred_reg_intr));
assert(regs->vector_qwords <= evsel->core.attr.sample_simd_pred_reg_qwords);
```

> +                               sz = (regs->nr_vectors * regs->vector_qwords +
> +                                     regs->nr_pred * regs->pred_qwords) * sizeof(u64);
> +                               OVERFLOW_CHECK(array, sz, max_size);
> +                               array = (void *)array + sz;
> +                       }
>                 }
>         }
>
> diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> index 3cce8dd202aa..b98bc58d365e 100644
> --- a/tools/perf/util/sample.h
> +++ b/tools/perf/util/sample.h
> @@ -15,6 +15,16 @@ struct regs_dump {
>         u64 abi;
>         u64 mask;
>         u64 *regs;
> +       union {
> +               u64 config;
> +               struct {
> +                       u16 nr_vectors;
> +                       u16 vector_qwords;
> +                       u16 nr_pred;
> +                       u16 pred_qwords;
> +               };
> +       };
> +       u64 *data;

I think "data" is a bit generic here and could be confused with regs,
perhaps "simd_data" for clarity.

>
>         /* Cached values/mask filled by first register access. */
>         u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 7cf7bf86205d..fba8ef52f0a1 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -972,18 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_dump *regs,
>         }
>  }
>
> -static const char *regs_abi[] = {
> -       [PERF_SAMPLE_REGS_ABI_NONE] = "none",
> -       [PERF_SAMPLE_REGS_ABI_32] = "32-bit",
> -       [PERF_SAMPLE_REGS_ABI_64] = "64-bit",
> -};
> +static void simd_regs_dump__printf(struct regs_dump *regs, bool intr)
> +{
> +       const char *name = "unknown";
> +       int i, idx = 0;
> +       uint16_t qwords;
> +       int reg_c;
> +
> +       if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
> +               return;
> +
> +       printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qwords %d\n",
> +              regs->nr_vectors, regs->vector_qwords,
> +              regs->nr_pred, regs->pred_qwords);
> +
> +       for (reg_c = 0; reg_c < 64; reg_c++) {
> +               if (intr) {
> +                       perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,

Rather than EM_HOST here, can e_machine be an argument to the
function? That way we can x86 SIMD registers on a non-x86 machine.

> +                                                              &qwords, /*pred=*/false);
> +               } else {
> +                       perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
> +                                                              &qwords, /*pred=*/false);
> +               }
> +               if (regs->vector_qwords == qwords) {
> +                       name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/false);
> +                       break;
> +               }
> +       }
> +
> +       for (i = 0; i < regs->nr_vectors; i++) {
> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);

Is 'i' the correct value to dump here? If the bitmap of pred or vec
registers has gaps in it then we may dump say "YMM0" and "YMM1" for a
bitmap of say "YMM0" and "YMM2". I think you may need to do something
like bitmap's for_each_set_bit.

> +               if (regs->vector_qwords > 2) {
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +               }
> +               if (regs->vector_qwords > 4) {
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +               }
> +       }
> +
> +       name = "unknown";
> +       for (reg_c = 0; reg_c < 64; reg_c++) {
> +               if (intr) {
> +                       perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
> +                                                              &qwords, /*pred=*/true);
> +               } else {
> +                       perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
> +                                                              &qwords, /*pred=*/true);
> +               }
> +               if (regs->pred_qwords == qwords) {
> +                       name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/true);
> +                       break;
> +               }
> +       }
> +       for (i = 0; i < regs->nr_pred; i++)
> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> +}
>
>  static inline const char *regs_dump_abi(struct regs_dump *d)
>  {
> -       if (d->abi > PERF_SAMPLE_REGS_ABI_64)
> -               return "unknown";
> +       if (!d->abi)
> +               return "none";
> +       if (d->abi & PERF_SAMPLE_REGS_ABI_32)
> +               return "32-bit";
> +       else if (d->abi & PERF_SAMPLE_REGS_ABI_64)
> +               return "64-bit";

This isn't testing PERF_SAMPLE_REGS_ABI_SIMD and reports "32-bit" if
both ABI_32 and ABI_64 are set, which is a little surprising. Perhaps:
```
const char *regs_abi[] = {
[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] = "SIMD",
}
if (d->abi >= ARRAY_SIZE(regs_abi) || !regs_abi[d->abi])
  return "unknown";
return regs_abi[d->abi];
```

Thanks,
Ian


>
> -       return regs_abi[d->abi];
> +       return "unknown";
>  }
>
>  static void regs__printf(const char *type, struct regs_dump *regs,
> @@ -1010,6 +1069,7 @@ static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, ui
>
>         if (user_regs->regs)
>                 regs__printf("user", user_regs, e_machine, e_flags);
> +       simd_regs_dump__printf(user_regs, /*intr=*/false);
>  }
>
>  static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
> @@ -1023,6 +1083,7 @@ static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, ui
>
>         if (intr_regs->regs)
>                 regs__printf("intr", intr_regs, e_machine, e_flags);
> +       simd_regs_dump__printf(intr_regs, /*intr=*/true);
>  }
>
>  static void stack_user__printf(struct stack_dump *dump)
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 1/4] perf headers: Sync with the kernel headers
  2026-02-09 22:09   ` Ian Rogers
@ 2026-02-10  5:21     ` Mi, Dapeng
  0 siblings, 0 replies; 12+ messages in thread
From: Mi, Dapeng @ 2026-02-10  5:21 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Kan Liang


On 2/10/2026 6:09 AM, Ian Rogers wrote:
> On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Update include/uapi/linux/perf_event.h and
>> arch/x86/include/uapi/asm/perf_regs.h to support extended regs.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  tools/arch/x86/include/uapi/asm/perf_regs.h | 49 +++++++++++++++++++++
>>  tools/include/uapi/linux/perf_event.h       | 45 +++++++++++++++++--
>>  2 files changed, 90 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
>> index 7c9d2bb3833b..6da63e1dbb40 100644
>> --- a/tools/arch/x86/include/uapi/asm/perf_regs.h
>> +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
>> @@ -27,9 +27,34 @@ enum perf_event_x86_regs {
>>         PERF_REG_X86_R13,
>>         PERF_REG_X86_R14,
>>         PERF_REG_X86_R15,
>> +       /*
>> +        * The EGPRs/SSP and XMM have overlaps. Only one can be used
>> +        * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
>> +        * utilize EGPRs/SSP. For the other ABI type, XMM is used.
>> +        *
>> +        * Extended GPRs (EGPRs)
>> +        */
>> +       PERF_REG_X86_R16,
>> +       PERF_REG_X86_R17,
>> +       PERF_REG_X86_R18,
>> +       PERF_REG_X86_R19,
>> +       PERF_REG_X86_R20,
>> +       PERF_REG_X86_R21,
>> +       PERF_REG_X86_R22,
>> +       PERF_REG_X86_R23,
>> +       PERF_REG_X86_R24,
>> +       PERF_REG_X86_R25,
>> +       PERF_REG_X86_R26,
>> +       PERF_REG_X86_R27,
>> +       PERF_REG_X86_R28,
>> +       PERF_REG_X86_R29,
>> +       PERF_REG_X86_R30,
>> +       PERF_REG_X86_R31,
>> +       PERF_REG_X86_SSP,
> nit: I think it'd be nice to comment that PERF_REG_X86_SSP and
> PERF_REG_X86_XMM0 are both 32, the meaning of the register is
> dependent on the PERF_SAMPLE_REGS_ABI_SIMD, 0 meaning XMM0 and 1
> meaning SSP (which could be the opposite of what would be expected).

PERF_REG_X86_SSP is actually 40 instead of 32 in current table. We have
added comment (above PERF_REG_X86_R16) to declare that the indexes are
overlapped. But I suppose the comment can be optimized further.


>
>>         /* These are the limits for the GPRs. */
>>         PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
>>         PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
>> +       PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
>>
>>         /* These all need two bits set because they are 128bit */
>>         PERF_REG_X86_XMM0  = 32,
>> @@ -54,5 +79,29 @@ enum perf_event_x86_regs {
>>  };
>>
>>  #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
>> +#define PERF_X86_EGPRS_MASK    GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
>> +
>> +enum {
>> +       PERF_X86_SIMD_XMM_REGS      = 16,
>> +       PERF_X86_SIMD_YMM_REGS      = 16,
>> +       PERF_X86_SIMD_ZMM_REGS      = 32,
>> +       PERF_X86_SIMD_VEC_REGS_MAX  = PERF_X86_SIMD_ZMM_REGS,
>> +
>> +       PERF_X86_SIMD_OPMASK_REGS   = 8,
>> +       PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
>> +};
>> +
>> +#define PERF_X86_SIMD_PRED_MASK        GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
>> +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
>> +
>> +#define PERF_X86_H16ZMM_BASE           16
>> +
>> +enum {
>> +       PERF_X86_OPMASK_QWORDS   = 1,
>> +       PERF_X86_XMM_QWORDS      = 2,
>> +       PERF_X86_YMM_QWORDS      = 4,
>> +       PERF_X86_ZMM_QWORDS      = 8,
>> +       PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,
> nit: for a non-x86 audience who may think a word is more than 2 bytes,
> I think it would be nice to comment that a QWORD is 8 bytes. I don't
> see other mentions of the unit of length in the kernel headers.

sure.


>
>> +};
>>
>>  #endif /* _ASM_X86_PERF_REGS_H */
>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> index 72f03153dd32..ce3a14d35390 100644
>> --- a/tools/include/uapi/linux/perf_event.h
>> +++ b/tools/include/uapi/linux/perf_event.h
>> @@ -314,8 +314,9 @@ enum {
>>   */
>>  enum perf_sample_regs_abi {
>>         PERF_SAMPLE_REGS_ABI_NONE               = 0,
>> -       PERF_SAMPLE_REGS_ABI_32                 = 1,
>> -       PERF_SAMPLE_REGS_ABI_64                 = 2,
>> +       PERF_SAMPLE_REGS_ABI_32                 = (1 << 0),
>> +       PERF_SAMPLE_REGS_ABI_64                 = (1 << 1),
>> +       PERF_SAMPLE_REGS_ABI_SIMD               = (1 << 2),
>>  };
>>
>>  /*
>> @@ -383,6 +384,7 @@ enum perf_event_read_format {
>>  #define PERF_ATTR_SIZE_VER7                    128     /* Add: sig_data */
>>  #define PERF_ATTR_SIZE_VER8                    136     /* Add: config3 */
>>  #define PERF_ATTR_SIZE_VER9                    144     /* add: config4 */
>> +#define PERF_ATTR_SIZE_VER10                   176     /* Add: sample_simd_{pred,vec}_reg_* */
>>
>>  /*
>>   * 'struct perf_event_attr' contains various attributes that define
>> @@ -547,6 +549,25 @@ struct perf_event_attr {
>>
>>         __u64   config3; /* extension of config2 */
>>         __u64   config4; /* extension of config3 */
>> +
>> +       /*
>> +        * Defines set of SIMD registers to dump on samples.
>> +        * The sample_simd_regs_enabled !=0 implies the
>> +        * set of SIMD registers is used to config all SIMD registers.
>> +        * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
>> +        * config some SIMD registers on X86.
> nit: I think this comment could be clearer, perhaps:
>
> If sample_simd_regs_enabled is non-zero then the following
> sampled_simd values define a set of SIMD registers to dump in all
> samples. Each register is defined as a bitmap position in
> (pred|vec)_reg_(intr|user) and the width of the register in qwords
> (8-bytes) is given in (pred|vec)_reg_qwords. If sample_simd_regs is 0
> then the vector registers may be dumped if they are in use. To
> determine if all or a subset of the registers are dumped, and the
> register width, the sample contains the values nr_vectors,
> vector_qwords, nr_pred and pred_qwords.

Sure. I would enhance the comment and make it clearer.


>
> Note, it is particularly the notion of "config all SIMD registers"
> that I'm having a hard time being clear on here.
>
>> +        */
>> +       union {
>> +               __u16 sample_simd_regs_enabled;
> nit: I wonder if "enabled" is the right name here as the value being 0
> means the vector register may be dumped. Perhaps
> sample_simd_regs_full.

Hmm, the "enabled" word could still be the best name which I can image. The
"enabled" here is to represent if the below "sample_simd_xxx" fields are
effective and should be parsed. I admit the XMM registers could be sampled
even sample_simd_regs_enabled is 0, but it has nothing to do with these
newly added "sample_simd_xxx" fields.


>
> Thanks,
> Ian
>
>> +               __u16 sample_simd_pred_reg_qwords;
>> +       };
>> +       __u16   sample_simd_vec_reg_qwords;
>> +       __u32   __reserved_4;
>> +
>> +       __u32   sample_simd_pred_reg_intr;
>> +       __u32   sample_simd_pred_reg_user;
>> +       __u64   sample_simd_vec_reg_intr;
>> +       __u64   sample_simd_vec_reg_user;
>>  };
>>
>>  /*
>> @@ -1020,7 +1041,15 @@ enum perf_event_type {
>>          *      } && PERF_SAMPLE_BRANCH_STACK
>>          *
>>          *      { u64                   abi; # enum perf_sample_regs_abi
>> -        *        u64                   regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
>> +        *        u64                   regs[weight(mask)];
>> +        *        struct {
>> +        *              u16 nr_vectors;         # 0 ... weight(sample_simd_vec_reg_user)
>> +        *              u16 vector_qwords;      # 0 ... sample_simd_vec_reg_qwords
>> +        *              u16 nr_pred;            # 0 ... weight(sample_simd_pred_reg_user)
>> +        *              u16 pred_qwords;        # 0 ... sample_simd_pred_reg_qwords
>> +        *              u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
>> +        *        } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
>> +        *      } && PERF_SAMPLE_REGS_USER
>>          *
>>          *      { u64                   size;
>>          *        char                  data[size];
>> @@ -1047,7 +1076,15 @@ enum perf_event_type {
>>          *      { u64                   data_src; } && PERF_SAMPLE_DATA_SRC
>>          *      { u64                   transaction; } && PERF_SAMPLE_TRANSACTION
>>          *      { u64                   abi; # enum perf_sample_regs_abi
>> -        *        u64                   regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
>> +        *        u64                   regs[weight(mask)];
>> +        *        struct {
>> +        *              u16 nr_vectors;         # 0 ... weight(sample_simd_vec_reg_intr)
>> +        *              u16 vector_qwords;      # 0 ... sample_simd_vec_reg_qwords
>> +        *              u16 nr_pred;            # 0 ... weight(sample_simd_pred_reg_intr)
>> +        *              u16 pred_qwords;        # 0 ... sample_simd_pred_reg_qwords
>> +        *              u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
>> +        *        } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
>> +        *      } && PERF_SAMPLE_REGS_INTR
>>          *      { u64                   phys_addr;} && PERF_SAMPLE_PHYS_ADDR
>>          *      { u64                   cgroup;} && PERF_SAMPLE_CGROUP
>>          *      { u64                   data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling
  2026-02-09 22:36   ` Ian Rogers
@ 2026-02-10  5:35     ` Mi, Dapeng
  0 siblings, 0 replies; 12+ messages in thread
From: Mi, Dapeng @ 2026-02-10  5:35 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao


On 2/10/2026 6:36 AM, Ian Rogers wrote:
> On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>> This patch adds support for sampling x86 extended GP registers (R16-R31)
>> and the shadow stack pointer (SSP) register.
>>
>> The original XMM registers space in sample_regs_user/sample_regs_intr is
>> reclaimed to represent the eGPRs and SSP when SIMD registers sampling is
>> supported with the new SIMD sampling fields in the perf_event_attr
>> structure. This necessitates a way to distinguish which register layout
>> is used for the sample_regs_user/sample_regs_intr bitmap.
>>
>> To address this, a new "abi" argument is added to the helpers
>> perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
>> "abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, it indicates the eGPRs and SSP
>> layout is represented; otherwise, the legacy XMM registers are
>> represented.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  tools/perf/builtin-script.c                   |   2 +-
>>  tools/perf/util/evsel.c                       |   6 +-
>>  tools/perf/util/parse-regs-options.c          |  17 ++-
>>  .../perf/util/perf-regs-arch/perf_regs_x86.c  | 120 +++++++++++++++---
>>  tools/perf/util/perf_regs.c                   |  14 +-
>>  tools/perf/util/perf_regs.h                   |  10 +-
>>  .../scripting-engines/trace-event-python.c    |   2 +-
>>  tools/perf/util/session.c                     |   9 +-
>>  8 files changed, 139 insertions(+), 41 deletions(-)
>>
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index 14c6f6c3c4f2..ffe51f895666 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
>>         for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
>>                 u64 val = regs->regs[i++];
>>                 printed += fprintf(fp, "%5s:0x%"PRIx64" ",
>> -                                  perf_reg_name(r, e_machine, e_flags),
>> +                                  perf_reg_name(r, e_machine, e_flags, regs->abi),
> It is tempting for clarity to add the ABI to perf_reg_name as the first patch.
>
>>                                    val);
>>         }
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index f59228c1a39e..b7fb3f936ae3 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -1049,19 +1049,21 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
>>         }
>>
>>         if (param->record_mode == CALLCHAIN_DWARF) {
>> +               int abi;
>> +
>>                 if (!function) {
>>                         uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
>>
>>                         evsel__set_sample_bit(evsel, REGS_USER);
>>                         evsel__set_sample_bit(evsel, STACK_USER);
>>                         if (opts->sample_user_regs &&
>> -                           DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
>> +                           DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
>>                                 attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
>>                                 pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
>>                                            "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
>>                                            "so the minimal registers set (IP, SP) is explicitly forced.\n");
>>                         } else {
>> -                               attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
>> +                               attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
>>                         }
>>                         attr->sample_stack_user = param->dump_size;
>>                         attr->exclude_callchain_user = 1;
>> diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
>> index c93c2f0c8105..518327883b18 100644
>> --- a/tools/perf/util/parse-regs-options.c
>> +++ b/tools/perf/util/parse-regs-options.c
>> @@ -10,7 +10,8 @@
>>  #include "util/perf_regs.h"
>>  #include "util/parse-regs-options.h"
>>
>> -static void list_perf_regs(FILE *fp, uint64_t mask)
>> +static void
>> +list_perf_regs(FILE *fp, uint64_t mask, int abi)
>>  {
>>         const char *last_name = NULL;
>>
>> @@ -21,7 +22,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
>>                 if (((1ULL << reg) & mask) == 0)
>>                         continue;
>>
>> -               name = perf_reg_name(reg, EM_HOST, EF_HOST);
>> +               name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
>>                 if (name && (!last_name || strcmp(last_name, name)))
>>                         fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
>>                 last_name = name;
>> @@ -29,7 +30,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
>>         fputc('\n', fp);
>>  }
>>
>> -static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
>> +static uint64_t
>> +name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
>>  {
>>         uint64_t reg_mask = 0;
>>
>> @@ -39,7 +41,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
>>                 if (((1ULL << reg) & mask) == 0)
>>                         continue;
>>
>> -               name = perf_reg_name(reg, EM_HOST, EF_HOST);
>> +               name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
>>                 if (!name)
>>                         continue;
>>
>> @@ -56,6 +58,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>>         char *s, *os = NULL, *p;
>>         int ret = -1;
>>         uint64_t mask;
>> +       int abi;
>>
>>         if (unset)
>>                 return 0;
>> @@ -66,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>>         if (*mode)
>>                 return -1;
>>
>> -       mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
>> +       mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
>>
>>         /* str may be NULL in case no arg is passed to -I */
>>         if (!str) {
>> @@ -87,11 +90,11 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
>>                         *p = '\0';
>>
>>                 if (!strcmp(s, "?")) {
>> -                       list_perf_regs(stderr, mask);
>> +                       list_perf_regs(stderr, mask, abi);
>>                         goto error;
>>                 }
>>
>> -               reg_mask = name_to_perf_reg_mask(s, mask);
>> +               reg_mask = name_to_perf_reg_mask(s, mask, abi);
>>                 if (reg_mask == 0) {
>>                         ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
>>                                 s, intr ? "-I" : "--user-regs=");
>> diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
>> index b6d20522b4e8..3e9241a11a95 100644
>> --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
>> +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
>> @@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
>>         return SDT_ARG_VALID;
>>  }
>>
>> -uint64_t __perf_reg_mask_x86(bool intr)
>> +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
>>  {
>>         struct perf_event_attr attr = {
>> -               .type                   = PERF_TYPE_HARDWARE,
>> -               .config                 = PERF_COUNT_HW_CPU_CYCLES,
>> -               .sample_type            = PERF_SAMPLE_REGS_INTR,
>> -               .sample_regs_intr       = PERF_REG_EXTENDED_MASK,
>> -               .precise_ip             = 1,
>> -               .disabled               = 1,
>> -               .exclude_kernel         = 1,
>> +               .type                           = PERF_TYPE_HARDWARE,
>> +               .config                         = PERF_COUNT_HW_CPU_CYCLES,
>> +               .sample_type                    = sample_type,
>> +               .precise_ip                     = 1,
>> +               .disabled                       = 1,
>> +               .exclude_kernel                 = 1,
>> +               .sample_simd_regs_enabled       = has_simd_regs,
>>         };
>>         int fd;
>> -
>> -       if (!intr)
>> -               return PERF_REGS_MASK;
>> -
>>         /*
>>          * In an unnamed union, init it here to build on older gcc versions
>>          */
>>         attr.sample_period = 1;
>> +       if (sample_type == PERF_SAMPLE_REGS_INTR)
>> +               attr.sample_regs_intr = mask;
>> +       else
>> +               attr.sample_regs_user = mask;
>>
>>         if (perf_pmus__num_core_pmus() > 1) {
>>                 struct perf_pmu *pmu = NULL;
>> @@ -276,13 +276,34 @@ uint64_t __perf_reg_mask_x86(bool intr)
>>                                  /*group_fd=*/-1, /*flags=*/0);
>>         if (fd != -1) {
>>                 close(fd);
>> -               return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
>> +               return mask;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +uint64_t __perf_reg_mask_x86(bool intr, int *abi)
>> +{
>> +       u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
>> +       uint64_t mask = PERF_REGS_MASK;
>> +
>> +       *abi = 0;
>> +       mask |= __arch__reg_mask(sample_type,
>> +                                GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
>> +                                true);
>> +       mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
>> +
>> +       if (mask != PERF_REGS_MASK) {
>> +               *abi |= PERF_SAMPLE_REGS_ABI_SIMD;
>> +       } else {
>> +               mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
>> +                                        false);
>>         }
>>
>> -       return PERF_REGS_MASK;
>> +       return mask;
>>  }
>>
>> -const char *__perf_reg_name_x86(int id)
>> +static const char *__arch_reg_gpr_name(int id)
>>  {
>>         switch (id) {
>>         case PERF_REG_X86_AX:
>> @@ -333,7 +354,60 @@ const char *__perf_reg_name_x86(int id)
>>                 return "R14";
>>         case PERF_REG_X86_R15:
>>                 return "R15";
>> +       default:
>> +               return NULL;
>> +       }
>> +
>> +       return NULL;
>> +}
>>
>> +static const char *__arch_reg_egpr_name(int id)
>> +{
>> +       switch (id) {
>> +       case PERF_REG_X86_R16:
>> +               return "R16";
>> +       case PERF_REG_X86_R17:
>> +               return "R17";
>> +       case PERF_REG_X86_R18:
>> +               return "R18";
>> +       case PERF_REG_X86_R19:
>> +               return "R19";
>> +       case PERF_REG_X86_R20:
>> +               return "R20";
>> +       case PERF_REG_X86_R21:
>> +               return "R21";
>> +       case PERF_REG_X86_R22:
>> +               return "R22";
>> +       case PERF_REG_X86_R23:
>> +               return "R23";
>> +       case PERF_REG_X86_R24:
>> +               return "R24";
>> +       case PERF_REG_X86_R25:
>> +               return "R25";
>> +       case PERF_REG_X86_R26:
>> +               return "R26";
>> +       case PERF_REG_X86_R27:
>> +               return "R27";
>> +       case PERF_REG_X86_R28:
>> +               return "R28";
>> +       case PERF_REG_X86_R29:
>> +               return "R29";
>> +       case PERF_REG_X86_R30:
>> +               return "R30";
>> +       case PERF_REG_X86_R31:
>> +               return "R31";
>> +       case PERF_REG_X86_SSP:
>> +               return "SSP";
>> +       default:
>> +               return NULL;
>> +       }
>> +
>> +       return NULL;
>> +}
>> +
>> +static const char *__arch_reg_xmm_name(int id)
>> +{
>> +       switch (id) {
>>  #define XMM(x) \
>>         case PERF_REG_X86_XMM ## x:     \
>>         case PERF_REG_X86_XMM ## x + 1: \
>> @@ -362,6 +436,22 @@ const char *__perf_reg_name_x86(int id)
>>         return NULL;
>>  }
>>
>> +const char *__perf_reg_name_x86(int id, int abi)
>> +{
>> +       const char *name;
>> +
>> +       name = __arch_reg_gpr_name(id);
>> +       if (name)
>> +               return name;
>> +
>> +       if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
>> +               name = __arch_reg_egpr_name(id);
>> +       else
>> +               name = __arch_reg_xmm_name(id);
>> +
>> +       return name;
>> +}
>> +
>>  uint64_t __perf_reg_ip_x86(void)
>>  {
>>         return PERF_REG_X86_IP;
>> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
>> index 5b8f34beb24e..bdd2eef13bc3 100644
>> --- a/tools/perf/util/perf_regs.c
>> +++ b/tools/perf/util/perf_regs.c
>> @@ -32,10 +32,11 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
>>         return ret;
>>  }
>>
>> -uint64_t perf_intr_reg_mask(uint16_t e_machine)
>> +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi)
> I wonder if abi is the right out argument name here. Before the SIMD
> change the ABI meant either 32 or 64-bit. So we could imagine if it
> were 32-bit then registers R8 to R15 wouldn't be in the mask for x86.
> Perhaps just a "bool *" for sample_simd_regs_enabled.

Hmm, I ever concerned to add a "bool *simd_enabled" argument as well, but
it looks a little bit weird and abrupt since other architectures may never
need to use this argument. Instead, "abi" is neutral argument and it may be
needed by other architectures in the future.


>
> Everything else looks good. Thanks for the weak function clean up,
> this code is much more generic and better than before. I know it
> wasn't trivial to do, but I appreciate it!

Thanks a lot for your meticulous reviewing as well. :)


>
> Thanks,
> Ian
>
>>  {
>>         uint64_t mask = 0;
>>
>> +       *abi = 0;
>>         switch (e_machine) {
>>         case EM_ARM:
>>                 mask = __perf_reg_mask_arm(/*intr=*/true);
>> @@ -64,7 +65,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
>>                 break;
>>         case EM_386:
>>         case EM_X86_64:
>> -               mask = __perf_reg_mask_x86(/*intr=*/true);
>> +               mask = __perf_reg_mask_x86(/*intr=*/true, abi);
>>                 break;
>>         default:
>>                 pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
>> @@ -75,10 +76,11 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
>>         return mask;
>>  }
>>
>> -uint64_t perf_user_reg_mask(uint16_t e_machine)
>> +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi)
>>  {
>>         uint64_t mask = 0;
>>
>> +       *abi = 0;
>>         switch (e_machine) {
>>         case EM_ARM:
>>                 mask = __perf_reg_mask_arm(/*intr=*/false);
>> @@ -107,7 +109,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
>>                 break;
>>         case EM_386:
>>         case EM_X86_64:
>> -               mask = __perf_reg_mask_x86(/*intr=*/false);
>> +               mask = __perf_reg_mask_x86(/*intr=*/false, abi);
>>                 break;
>>         default:
>>                 pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
>> @@ -118,7 +120,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
>>         return mask;
>>  }
>>
>> -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
>> +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
>>  {
>>         const char *reg_name = NULL;
>>
>> @@ -150,7 +152,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
>>                 break;
>>         case EM_386:
>>         case EM_X86_64:
>> -               reg_name = __perf_reg_name_x86(id);
>> +               reg_name = __perf_reg_name_x86(id, abi);
>>                 break;
>>         default:
>>                 break;
>> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
>> index 7c04700bf837..c9501ca8045d 100644
>> --- a/tools/perf/util/perf_regs.h
>> +++ b/tools/perf/util/perf_regs.h
>> @@ -13,10 +13,10 @@ enum {
>>  };
>>
>>  int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
>> -uint64_t perf_intr_reg_mask(uint16_t e_machine);
>> -uint64_t perf_user_reg_mask(uint16_t e_machine);
>> +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi);
>> +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi);
>>
>> -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
>> +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
>>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
>>  uint64_t perf_arch_reg_ip(uint16_t e_machine);
>>  uint64_t perf_arch_reg_sp(uint16_t e_machine);
>> @@ -64,8 +64,8 @@ uint64_t __perf_reg_ip_s390(void);
>>  uint64_t __perf_reg_sp_s390(void);
>>
>>  int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
>> -uint64_t __perf_reg_mask_x86(bool intr);
>> -const char *__perf_reg_name_x86(int id);
>> +uint64_t __perf_reg_mask_x86(bool intr, int *abi);
>> +const char *__perf_reg_name_x86(int id, int abi);
>>  uint64_t __perf_reg_ip_x86(void);
>>  uint64_t __perf_reg_sp_x86(void);
>>
>> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
>> index 2b0df7bd9a46..4cc5b96898e6 100644
>> --- a/tools/perf/util/scripting-engines/trace-event-python.c
>> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
>> @@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
>>
>>                 printed += scnprintf(bf + printed, size - printed,
>>                                      "%5s:0x%" PRIx64 " ",
>> -                                    perf_reg_name(r, e_machine, e_flags), val);
>> +                                    perf_reg_name(r, e_machine, e_flags, regs->abi), val);
>>         }
>>  }
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 4b465abfa36c..7cf7bf86205d 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -959,15 +959,16 @@ static void branch_stack__printf(struct perf_sample *sample,
>>         }
>>  }
>>
>> -static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
>> +static void regs_dump__printf(u64 mask, struct regs_dump *regs,
>> +                             uint16_t e_machine, uint32_t e_flags)
>>  {
>>         unsigned rid, i = 0;
>>
>>         for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
>> -               u64 val = regs[i++];
>> +               u64 val = regs->regs[i++];
>>
>>                 printf(".... %-5s 0x%016" PRIx64 "\n",
>> -                      perf_reg_name(rid, e_machine, e_flags), val);
>> +                      perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
>>         }
>>  }
>>
>> @@ -995,7 +996,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
>>                mask,
>>                regs_dump_abi(regs));
>>
>> -       regs_dump__printf(mask, regs->regs, e_machine, e_flags);
>> +       regs_dump__printf(mask, regs, e_machine, e_flags);
>>  }
>>
>>  static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v6 4/4] perf regs: Enable dumping of SIMD registers
  2026-02-09 23:02   ` Ian Rogers
@ 2026-02-10  6:11     ` Mi, Dapeng
  0 siblings, 0 replies; 12+ messages in thread
From: Mi, Dapeng @ 2026-02-10  6:11 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao,
	Kan Liang


On 2/10/2026 7:02 AM, Ian Rogers wrote:
> On Mon, Feb 9, 2026 at 12:39 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> This patch adds support for dumping SIMD registers using the new
>> PERF_SAMPLE_REGS_ABI_SIMD ABI.
> The parsing support is also added, so I think it should be "parsing
> and dumping" here and in the message subject.

Sure.


>
>> Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86
>> platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.
>>
>> An example of the output is displayed below.
>>
>> Example:
>>
>>  $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test
>>  $perf report -D
>>  ... ...
>>  237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1):
>>  179370/179370: 0xffffffff969627fc period: 124999 addr: 0
>>  ... intr regs: mask 0x20000000000 ABI 64-bit
>>  .... SSP   0x0000000000000000
>>  ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1
>>  .... YMM  [0] 0x0000000000004000
>>  .... YMM  [0] 0x000055e828695270
>>  .... YMM  [0] 0x0000000000000000
>>  .... YMM  [0] 0x0000000000000000
>>  .... YMM  [1] 0x000055e8286990e0
>>  .... YMM  [1] 0x000055e828698dd0
>>  .... YMM  [1] 0x0000000000000000
>>  .... YMM  [1] 0x0000000000000000
>>  ... ...
>>  .... YMM  [31] 0x0000000000000000
>>  .... YMM  [31] 0x0000000000000000
>>  .... YMM  [31] 0x0000000000000000
>>  .... YMM  [31] 0x0000000000000000
>>  .... OPMASK[0] 0x0000000000100221
>>  .... OPMASK[1] 0x0000000000000020
>>  .... OPMASK[2] 0x000000007fffffff
>>  .... OPMASK[3] 0x0000000000000000
>>  .... OPMASK[4] 0x0000000000000000
>>  .... OPMASK[5] 0x0000000000000000
>>  .... OPMASK[6] 0x0000000000000000
>>  .... OPMASK[7] 0x0000000000000000
>>  ... ...
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  tools/perf/util/evsel.c   | 20 ++++++++++
>>  tools/perf/util/sample.h  | 10 +++++
>>  tools/perf/util/session.c | 77 +++++++++++++++++++++++++++++++++++----
>>  3 files changed, 99 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index a86d2434a4ad..2e1d50a72762 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -3514,6 +3514,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>>                         regs->mask = mask;
>>                         regs->regs = (u64 *)array;
>>                         array = (void *)array + sz;
>> +
>> +                       if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
>> +                               regs->config = *(u64 *)array;
>> +                               array = (void *)array + sizeof(u64);
>> +                               regs->data = (u64 *)array;
> It could be nice to add asserts for the comments:
> ```
>         *              u16 nr_vectors;         # 0 ...
> weight(sample_simd_vec_reg_user)
>         *              u16 vector_qwords;      # 0 ...
> sample_simd_vec_reg_qwords
>         *              u16 nr_pred;            # 0 ...
> weight(sample_simd_pred_reg_user)
>         *              u16 pred_qwords;        # 0 ...
> sample_simd_pred_reg_qwords
>         *              u64 data[nr_vectors * vector_qwords + nr_pred *
> pred_qwords];
> ```
> ie:
> ```
> assert(regs->nr_vectors <= hweight64(
> evsel->core.attr.sample_simd_vec_reg_user));
> assert(regs->vector_qwords <= evsel->core.attr.sample_simd_vec_reg_qwords);
> assert(regs->nr_vectors <= hweight64(
> evsel->core.attr.sample_simd_pred_reg_user));
> assert(regs->vector_qwords <= evsel->core.attr.sample_simd_pred_reg_qwords);

Good idea. Would do.


> ```
>
>> +                               sz = (regs->nr_vectors * regs->vector_qwords +
>> +                                     regs->nr_pred * regs->pred_qwords) * sizeof(u64);
>> +                               OVERFLOW_CHECK(array, sz, max_size);
>> +                               array = (void *)array + sz;
>> +                       }
>>                 }
>>         }
>>
>> @@ -3571,6 +3581,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>>                         regs->mask = mask;
>>                         regs->regs = (u64 *)array;
>>                         array = (void *)array + sz;
>> +
>> +                       if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
>> +                               regs->config = *(u64 *)array;
>> +                               array = (void *)array + sizeof(u64);
>> +                               regs->data = (u64 *)array;
> As above but for intr:
> ```
> assert(regs->nr_vectors <= hweight64(
> evsel->core.attr.sample_simd_vec_reg_intr));
> assert(regs->vector_qwords <= evsel->core.attr.sample_simd_vec_reg_qwords);
> assert(regs->nr_vectors <= hweight64(
> evsel->core.attr.sample_simd_pred_reg_intr));
> assert(regs->vector_qwords <= evsel->core.attr.sample_simd_pred_reg_qwords);
> ```

Sure.


>> +                               sz = (regs->nr_vectors * regs->vector_qwords +
>> +                                     regs->nr_pred * regs->pred_qwords) * sizeof(u64);
>> +                               OVERFLOW_CHECK(array, sz, max_size);
>> +                               array = (void *)array + sz;
>> +                       }
>>                 }
>>         }
>>
>> diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
>> index 3cce8dd202aa..b98bc58d365e 100644
>> --- a/tools/perf/util/sample.h
>> +++ b/tools/perf/util/sample.h
>> @@ -15,6 +15,16 @@ struct regs_dump {
>>         u64 abi;
>>         u64 mask;
>>         u64 *regs;
>> +       union {
>> +               u64 config;
>> +               struct {
>> +                       u16 nr_vectors;
>> +                       u16 vector_qwords;
>> +                       u16 nr_pred;
>> +                       u16 pred_qwords;
>> +               };
>> +       };
>> +       u64 *data;
> I think "data" is a bit generic here and could be confused with regs,
> perhaps "simd_data" for clarity.

Sure.


>
>>         /* Cached values/mask filled by first register access. */
>>         u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 7cf7bf86205d..fba8ef52f0a1 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -972,18 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_dump *regs,
>>         }
>>  }
>>
>> -static const char *regs_abi[] = {
>> -       [PERF_SAMPLE_REGS_ABI_NONE] = "none",
>> -       [PERF_SAMPLE_REGS_ABI_32] = "32-bit",
>> -       [PERF_SAMPLE_REGS_ABI_64] = "64-bit",
>> -};
>> +static void simd_regs_dump__printf(struct regs_dump *regs, bool intr)
>> +{
>> +       const char *name = "unknown";
>> +       int i, idx = 0;
>> +       uint16_t qwords;
>> +       int reg_c;
>> +
>> +       if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
>> +               return;
>> +
>> +       printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qwords %d\n",
>> +              regs->nr_vectors, regs->vector_qwords,
>> +              regs->nr_pred, regs->pred_qwords);
>> +
>> +       for (reg_c = 0; reg_c < 64; reg_c++) {
>> +               if (intr) {
>> +                       perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
> Rather than EM_HOST here, can e_machine be an argument to the
> function? That way we can x86 SIMD registers on a non-x86 machine.
>
>> +                                                              &qwords, /*pred=*/false);
>> +               } else {
>> +                       perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
>> +                                                              &qwords, /*pred=*/false);
>> +               }
>> +               if (regs->vector_qwords == qwords) {
>> +                       name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/false);
>> +                       break;
>> +               }
>> +       }
>> +
>> +       for (i = 0; i < regs->nr_vectors; i++) {
>> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
> Is 'i' the correct value to dump here? If the bitmap of pred or vec
> registers has gaps in it then we may dump say "YMM0" and "YMM1" for a
> bitmap of say "YMM0" and "YMM2". I think you may need to do something
> like bitmap's for_each_set_bit.

Currently the SIMD registers can only be sampled as a whole, so there
should be a hole in the registers bitmap, and I think we don't need to
worry this.

BTW, I looked the code, it seems quite hard to get the "sample_simd_xxx"
registers bitmap in this function ...



>
>> +               if (regs->vector_qwords > 2) {
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +               }
>> +               if (regs->vector_qwords > 4) {
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +                       printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +               }
>> +       }
>> +
>> +       name = "unknown";
>> +       for (reg_c = 0; reg_c < 64; reg_c++) {
>> +               if (intr) {
>> +                       perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
>> +                                                              &qwords, /*pred=*/true);
>> +               } else {
>> +                       perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c,
>> +                                                              &qwords, /*pred=*/true);
>> +               }
>> +               if (regs->pred_qwords == qwords) {
>> +                       name = perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=*/true);
>> +                       break;
>> +               }
>> +       }
>> +       for (i = 0; i < regs->nr_pred; i++)
>> +               printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
>> +}
>>
>>  static inline const char *regs_dump_abi(struct regs_dump *d)
>>  {
>> -       if (d->abi > PERF_SAMPLE_REGS_ABI_64)
>> -               return "unknown";
>> +       if (!d->abi)
>> +               return "none";
>> +       if (d->abi & PERF_SAMPLE_REGS_ABI_32)
>> +               return "32-bit";
>> +       else if (d->abi & PERF_SAMPLE_REGS_ABI_64)
>> +               return "64-bit";
> This isn't testing PERF_SAMPLE_REGS_ABI_SIMD and reports "32-bit" if
> both ABI_32 and ABI_64 are set, which is a little surprising. Perhaps:
> ```
> const char *regs_abi[] = {
> [PERF_SAMPLE_REGS_ABI_32] = "32-bit",
> [PERF_SAMPLE_REGS_ABI_64] = "64-bit",
> [PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] = "SIMD",
> }
> if (d->abi >= ARRAY_SIZE(regs_abi) || !regs_abi[d->abi])
>   return "unknown";
> return regs_abi[d->abi];

I'm not sure if we should return "SIMD" for this function when
PERF_SAMPLE_REGS_ABI_SIMD is set. The original regs_dump_abi() only returns
"32-bit" or "64 bit". If we really want to highlight the SIMD, maybe we can
return "64-bit SIMD".

Thanks.

> ```
>
> Thanks,
> Ian
>
>
>> -       return regs_abi[d->abi];
>> +       return "unknown";
>>  }
>>
>>  static void regs__printf(const char *type, struct regs_dump *regs,
>> @@ -1010,6 +1069,7 @@ static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, ui
>>
>>         if (user_regs->regs)
>>                 regs__printf("user", user_regs, e_machine, e_flags);
>> +       simd_regs_dump__printf(user_regs, /*intr=*/false);
>>  }
>>
>>  static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
>> @@ -1023,6 +1083,7 @@ static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, ui
>>
>>         if (intr_regs->regs)
>>                 regs__printf("intr", intr_regs, e_machine, e_flags);
>> +       simd_regs_dump__printf(intr_regs, /*intr=*/true);
>>  }
>>
>>  static void stack_user__printf(struct stack_dump *dump)
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-02-10  6:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-09  8:35 [Patch v6 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
2026-02-09  8:35 ` [Patch v6 1/4] perf headers: Sync with the kernel headers Dapeng Mi
2026-02-09 22:09   ` Ian Rogers
2026-02-10  5:21     ` Mi, Dapeng
2026-02-09  8:35 ` [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
2026-02-09 22:36   ` Ian Rogers
2026-02-10  5:35     ` Mi, Dapeng
2026-02-09  8:35 ` [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling Dapeng Mi
2026-02-09 22:39   ` Ian Rogers
2026-02-09  8:35 ` [Patch v6 4/4] perf regs: Enable dumping of SIMD registers Dapeng Mi
2026-02-09 23:02   ` Ian Rogers
2026-02-10  6:11     ` Mi, Dapeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox