The Linux Kernel Mailing List
* [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
@ 2026-06-23 13:02 Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

`perf test -v "perf data type profiling tests"` fails on ARM64:

    Basic Rust perf annotate test
    perf mem record -o /tmp/perf.data perf test -w code_with_type
    perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
    Basic annotate [Failed: missing target data type]

The root cause is that ARM64 lacks the instruction parsing infrastructure
required for data type profiling. Specifically:

  1. annotate_get_insn_location() cannot extract register numbers and
     memory offsets from ARM64 load/store instructions, because ARM64
     does not set objdump.register_char or objdump.memory_ref_char
     (unlike x86 which uses '%' and '(').

  2. arch_supports_insn_tracking() does not include ARM64, so
     find_data_type_block() cannot perform instruction-level type state
     tracking.

  3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
     after memset, which causes x0-based memory accesses to be
     misidentified as stack accesses.

As a result, perf annotate --code-with-type silently produces no type
annotations on ARM64, and the test's grep for "# data-type: struct Buf"
fails.

This series adds ARM64 data type profiling support following the PowerPC
model: decode raw 32-bit instruction words rather than parsing objdump
text. ARM64's fixed-width encoding and trivial DWARF register mapping
(x0-x30 = DWARF 0-30) make this approach clean and robust.

Three classes of instructions are tracked for register state propagation:
  - ADRP: compute PC-relative page address for global variable resolution
  - ADD (immediate): combine with ADRP result to form full variable address
  - MOV (register): propagate type state between registers

This covers the common `adrp + add + ldr/str` pattern that ARM64
compilers emit for global variable access.

Known limitations:
  - The `adrp + ldr` pattern (with :lo12: folded into the load offset,
    without an intermediate ADD) is not yet handled. This requires
    extending check_matching_type() to resolve TSR_KIND_CONST with the
    load offset, which can be added incrementally.
  - Pointer chain tracking (load-from-memory propagating type to the
    destination register) is not implemented, matching PowerPC's current
    scope.

Testing:
  All four sub-tests in `perf test "perf data type profiling tests"`
  pass reliably on ARM64 (AArch64, SPE-capable hardware):
    - Basic/Pipe Rust: struct Buf (code_with_type workload)
    - Basic/Pipe C: struct buf (datasym workload, global variable)

Patch breakdown:
  1/5  Widen type_state_reg::imm_value from u32 to u64 (prerequisite
       for storing 64-bit addresses from ADRP)
  2/5  Add arch__is_arm64() detection, raw instruction parsing from
       objdump output, and enable show_asm_raw for ARM64
  3/5  Add get_arm64_regs() to extract registers and memory offsets
       from load/store instruction encodings (4 addressing modes)
  4/5  Wire up ARM64 in annotate_get_insn_location(),
       arch_supports_insn_tracking(), and init_type_state()
  5/5  Main patch: instruction classification, ADRP/ADD/MOV register
       state tracking, and architecture initialization

Shuai Xue (5):
  perf annotate-data: Widen type_state_reg::imm_value to u64
  perf disasm: Add ARM64 architecture detection and raw instruction
    parsing
  perf dwarf-regs: Add ARM64 register and offset extraction from raw
    instructions
  perf annotate: Wire up ARM64 data type profiling infrastructure
  perf annotate-arch: Add ARM64 data type profiling support

 .../perf/util/annotate-arch/annotate-arm64.c  | 333 ++++++++++++++++++
 tools/perf/util/annotate-arch/annotate-x86.c  |   2 +-
 tools/perf/util/annotate-data.c               |  18 +-
 tools/perf/util/annotate-data.h               |   2 +-
 tools/perf/util/annotate.c                    |  12 +-
 tools/perf/util/disasm.c                      |  64 ++++
 tools/perf/util/disasm.h                      |   2 +
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 125 +++++++
 tools/perf/util/include/dwarf-regs.h          |   7 +
 9 files changed, 558 insertions(+), 7 deletions(-)

-- 
2.51.2.612.gdc70283dfc


* [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

The imm_value field in struct type_state_reg is used to store addresses
computed from PC-relative instructions (e.g., ARM64 ADRP). As a u32,
it silently truncates addresses above 4GB, which breaks global variable
resolution for kernel profiling and large-address userspace on ARM64.

Widen it to u64 to support the full 64-bit address space. Update the
corresponding format string in the x86 annotation code.
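
The truncation can be illustrated with a minimal stand-alone sketch.
The two helpers and the sample address below are hypothetical and are
not part of perf; they only model storing a computed address in a
32-bit versus a 64-bit field.

```c
#include <stdint.h>

/*
 * Hypothetical model of the old field: the address is narrowed to
 * 32 bits on store, so the upper half is silently dropped.
 */
static uint64_t store_as_u32(uint64_t addr)
{
	uint32_t imm_value = (uint32_t)addr;

	return imm_value;
}

/* Hypothetical model of the widened field: the address survives intact. */
static uint64_t store_as_u64(uint64_t addr)
{
	uint64_t imm_value = addr;

	return imm_value;
}
```

For an illustrative kernel-range address such as 0xffff800008a41000,
the 32-bit variant keeps only 0x08a41000, which no longer identifies
the original symbol.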

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 tools/perf/util/annotate-arch/annotate-x86.c | 2 +-
 tools/perf/util/annotate-data.h              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index 7e6136536393..985aa8bbd0b9 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -547,7 +547,7 @@ static void update_insn_state_x86(struct type_state *state,
 			tsr->offset = 0;
 			tsr->ok = true;
 
-			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+			pr_debug_dtp("mov [%x] imm=%#"PRIx64" -> reg%d\n",
 				     insn_offset, tsr->imm_value, dst->reg1);
 			return;
 		}
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index c26130744260..4a9b4814479f 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -173,7 +173,7 @@ extern struct annotated_data_stat ann_data_stat;
  */
 struct type_state_reg {
 	Dwarf_Die type;
-	u32 imm_value;
+	u64 imm_value;
 	/*
 	 * The offset within the struct that the register points to.
 	 * A value of 0 means the register points to the beginning.
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add arch__is_arm64() helper to identify ARM64 binaries by ELF machine
type, following the existing arch__is_x86() and arch__is_powerpc()
pattern.

Add disasm_line__parse_arm64() to extract raw 32-bit instruction words
from ARM64 objdump output. Unlike PowerPC, which needs be32_to_cpu()
byte-swapping, ARM64 instructions are always little-endian and can be
used directly. The parser finds the hex word boundary dynamically
instead of using a hardcoded width, and validates the sscanf result.
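
The parsing step can be sketched outside of perf as follows. This is
an illustration only: parse_raw_word() is a hypothetical helper, and
unlike the patch (which stores 0 when sscanf fails) it reports a
failed conversion as an error.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-alone helper, not the perf function.
 * Parses the leading hex word of an objdump line such as
 *   "a9bf7bfd \tstp\tx29, x30, [sp, #-16]!"
 * On success stores the word in *insn, points *rest at the mnemonic
 * and returns 0. Returns -1 if the line is empty or has no hex word.
 */
static int parse_raw_word(const char *line, uint32_t *insn, const char **rest)
{
	const char *p = line;
	const char *end;
	unsigned int val;

	while (isspace((unsigned char)*p))
		p++;
	if (*p == '\0')
		return -1;

	/* The hex word ends at the first whitespace; no fixed width assumed. */
	end = p;
	while (*end && !isspace((unsigned char)*end))
		end++;

	if (sscanf(p, "%x", &val) != 1)
		return -1;

	while (isspace((unsigned char)*end))
		end++;

	*insn = val;
	*rest = end;
	return 0;
}
```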

Set annotate_opts.show_asm_raw in arch__new_arm64() so that objdump
includes raw instruction bytes, which the parser requires.

Wire up the ARM64 parsing path in disasm_line__new() alongside the
existing PowerPC path, and call check_arm64_insn() from __ins__find()
(declared here, defined in patch 5/5).

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  |  1 +
 tools/perf/util/disasm.c                      | 64 +++++++++++++++++++
 tools/perf/util/disasm.h                      |  2 +
 3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 33080fdca125..b98aaf9a8a7b 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -104,6 +104,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.comment_char	  = '/';
 	arch->objdump.skip_functions_char = '+';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
+	annotate_opts.show_asm_raw = true;
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 59ba88e1f744..83fad4f01442 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -52,6 +52,7 @@ const struct ins_ops arithmetic_ops;
 static void ins__sort(struct arch *arch);
 static int disasm_line__parse(char *line, const char **namep, char **rawp);
 static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_args *args);
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -203,6 +204,11 @@ bool arch__is_powerpc(const struct arch *arch)
 	return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
 }
 
+bool arch__is_arm64(const struct arch *arch)
+{
+	return arch->id.e_machine == EM_AARCH64;
+}
+
 static void ins_ops__delete(struct ins_operands *ops)
 {
 	if (ops == NULL)
@@ -777,6 +783,14 @@ static const struct ins_ops *__ins__find(const struct arch *arch, const char *na
 			return ops;
 	}
 
+	if (arch__is_arm64(arch)) {
+		const struct ins_ops *ops;
+
+		ops = check_arm64_insn(dl);
+		if (ops)
+			return ops;
+	}
+
 	if (!arch->sorted_instructions) {
 		ins__sort((struct arch *)arch);
 		((struct arch *)arch)->sorted_instructions = true;
@@ -902,6 +916,53 @@ static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_ar
 	return ret;
 }
 
+/*
+ * Parses ARM64 disassembly output which includes raw instruction bytes.
+ * ARM64 objdump format:
+ *   a9bf7bfd 	stp	x29, x30, [sp, #-16]!
+ *
+ * The raw instruction is a hex word (typically 8 chars) followed by whitespace.
+ */
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args)
+{
+	char *line = dl->al.line;
+	const char **namep = &dl->ins.name;
+	char **rawp = &dl->ops.raw;
+	char *name_raw_insn = skip_spaces(line);
+	char *end_raw, *name, *tmp_raw_insn;
+	int ret = 0;
+
+	if (name_raw_insn[0] == '\0')
+		return -1;
+
+	/* Find end of raw instruction hex by looking for whitespace */
+	end_raw = name_raw_insn;
+	while (*end_raw && !isspace(*end_raw))
+		end_raw++;
+
+	name = skip_spaces(end_raw);
+
+	if (args->options->disassembler_used)
+		ret = disasm_line__parse(name, namep, rawp);
+	else
+		*namep = "";
+
+	tmp_raw_insn = strndup(name_raw_insn, end_raw - name_raw_insn);
+	if (tmp_raw_insn == NULL) {
+		if (args->options->disassembler_used)
+			zfree(namep);
+		return -1;
+	}
+
+	remove_spaces(tmp_raw_insn);
+
+	if (sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn) != 1)
+		dl->raw.raw_insn = 0;
+	free(tmp_raw_insn);
+
+	return ret;
+}
+
 static void annotation_line__init(struct annotation_line *al,
 				  struct annotate_args *args,
 				  int nr)
@@ -958,6 +1019,9 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
 		if (arch__is_powerpc(args->arch)) {
 			if (disasm_line__parse_powerpc(dl, args) < 0)
 				goto out_free_line;
+		} else if (arch__is_arm64(args->arch)) {
+			if (disasm_line__parse_arm64(dl, args) < 0)
+				goto out_free_line;
 		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
 			goto out_free_line;
 
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 25756e3f47e4..dfce128a3188 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -111,6 +111,7 @@ struct annotate_args {
 const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
 bool arch__is_x86(const struct arch *arch);
 bool arch__is_powerpc(const struct arch *arch);
+bool arch__is_arm64(const struct arch *arch);
 
 extern const struct ins_ops call_ops;
 extern const struct ins_ops dec_ops;
@@ -143,6 +144,7 @@ bool ins__is_ret(const struct ins *ins);
 bool ins__is_lock(const struct ins *ins);
 
 const struct ins_ops *check_ppc_insn(struct disasm_line *dl);
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl);
 
 struct disasm_line *disasm_line__new(struct annotate_args *args);
 void disasm_line__free(struct disasm_line *dl);
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add get_arm64_regs() to extract register numbers (Rn, Rt, Rm) and
memory offsets from raw ARM64 load/store instruction encodings. This
follows the same pattern as get_powerpc_regs() for PowerPC.

ARM64 DWARF register numbers map trivially: x0-x30 = 0-30, sp = 31,
so the hardware register fields can be used directly as DWARF regnums.

Four addressing modes are handled:
  - Unsigned offset: imm12 scaled by access size
  - Pre/Post-indexed: sign-extended 9-bit immediate
  - Register offset: offset from Rm (set to 0, handled via multi_regs)
  - Load/Store Pair: sign-extended 7-bit immediate scaled by element size
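
The immediate-offset modes above can be sketched as a stand-alone
decoder. The helper names (sext, a64_pair_offset, a64_unsigned_offset,
a64_prepost_offset) are hypothetical and chosen for this example; the
field positions follow the list above. Scaling is done by
multiplication so that negative immediates are never left-shifted.

```c
#include <stdint.h>

/* Register fields of a load/store encoding. */
#define A64_RT(insn)	((insn) & 0x1f)
#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
#define A64_RT2(insn)	(((insn) >> 10) & 0x1f)

/* Sign-extend the low 'bits' bits of v (hypothetical helper). */
static int32_t sext(uint32_t v, unsigned int bits)
{
	uint32_t sign = 1u << (bits - 1);

	v &= (sign << 1) - 1;
	return (int32_t)(v ^ sign) - (int32_t)sign;
}

/* Load/Store Pair: imm7 at bits[21:15], scaled by 4 or 8 (bit 31). */
static int32_t a64_pair_offset(uint32_t insn)
{
	int32_t scale = 4 << ((insn >> 31) & 1);

	return sext(insn >> 15, 7) * scale;
}

/* Unsigned offset: imm12 at bits[21:10], scaled by size in bits[31:30]. */
static int32_t a64_unsigned_offset(uint32_t insn)
{
	return (int32_t)(((insn >> 10) & 0xfff) << (insn >> 30));
}

/* Pre/post-indexed: unscaled imm9 at bits[20:12]. */
static int32_t a64_prepost_offset(uint32_t insn)
{
	return sext(insn >> 12, 9);
}
```

For the word a9bf7bfd (stp x29, x30, [sp, #-16]!), this yields Rn = 31
(sp), Rt = 29, Rt2 = 30 and a pair offset of -16.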

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 125 ++++++++++++++++++
 tools/perf/util/include/dwarf-regs.h          |   7 +
 2 files changed, 132 insertions(+)

diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
index 593ca7d4fccc..26f296624966 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
@@ -1,8 +1,133 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <errno.h>
 #include <dwarf-regs.h>
+#include "../annotate.h"
 #include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
 
+/*
+ * ARM64 instruction field extraction.
+ * Mirrors definitions in annotate-arm64.c.
+ */
+#define A64_RT(insn)	((insn) & 0x1f)
+#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
+#define A64_RT2(insn)	(((insn) >> 10) & 0x1f)
+#define A64_RM(insn)	(((insn) >> 16) & 0x1f)
+
+/*
+ * Load/Store encoding sub-class detection.
+ * Derived from ARM Architecture Reference Manual, C4.1.
+ *
+ * Load/Store Pair (offset/pre/post): bits[29:27]=101, bit[26]=0
+ * Load/Store Register:              bits[29:27]=111, bit[26]=0
+ *   - Unsigned offset:              bits[25:24]=01
+ *   - Pre/Post-indexed:             bits[25:24]=00, bit[21]=0
+ *   - Register offset:              bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_PAIR_MASK		0x3c000000
+#define A64_INSN_LS_PAIR_VAL		0x28000000
+
+#define A64_INSN_LS_REG_MASK		0x3c000000
+#define A64_INSN_LS_REG_VAL		0x38000000
+
+#define A64_INSN_LS_UNSIGNED_MASK	0x3b000000
+#define A64_INSN_LS_UNSIGNED_VAL	0x39000000
+
+#define A64_INSN_LS_PREPOST_MASK	0x3b200000
+#define A64_INSN_LS_PREPOST_VAL		0x38000000
+
+#define A64_INSN_LS_REG_OFF_MASK	0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL	0x38200800
+
+static int arm64_get_immoff_unsigned(u32 insn)
+{
+	int size = (insn >> 30) & 0x3;
+	int imm12 = (insn >> 10) & 0xfff;
+
+	return imm12 << size;
+}
+
+static int arm64_get_immoff_prepost(u32 insn)
+{
+	int imm9 = (insn >> 12) & 0x1ff;
+
+	/* sign-extend 9-bit immediate */
+	if (imm9 & 0x100)
+		imm9 |= ~0x1ff;
+
+	return imm9;
+}
+
+static int arm64_get_immoff_pair(u32 insn)
+{
+	int imm7 = (insn >> 15) & 0x7f;
+	int scale = 2 + ((insn >> 31) & 1);
+
+	/* sign-extend 7-bit immediate */
+	if (imm7 & 0x40)
+		imm7 |= ~0x7f;
+
+	return imm7 << scale;
+}
+
+/*
+ * Fills op_loc fields depending on whether it is a source or target operand.
+ *
+ * ARM64 load/store encoding forms:
+ *   Register (unsigned offset):  [Rn, #imm12 << scale]
+ *   Register (pre/post-indexed): [Rn, #imm9]  or  [Rn], #imm9
+ *   Register (register offset):  [Rn, Rm{, extend/shift}]
+ *   Pair:                        [Rn, #imm7 << scale]
+ *
+ * For source (memory) operand: reg1=Rn (base), offset=immediate
+ * For target (register) operand: reg1=Rt
+ */
+void get_arm64_regs(u32 raw_insn, int is_source,
+		    struct annotated_op_loc *op_loc)
+{
+	if (is_source)
+		op_loc->reg1 = A64_RN(raw_insn);
+	else
+		op_loc->reg1 = A64_RT(raw_insn);
+
+	if (op_loc->multi_regs) {
+		/* LDP/STP pair: second register is Rt2 (bits[14:10]) */
+		if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL)
+			op_loc->reg2 = A64_RT2(raw_insn);
+		else
+			op_loc->reg2 = A64_RM(raw_insn);
+	}
+
+	if (!op_loc->mem_ref || !is_source)
+		return;
+
+	/* Load/Store Pair */
+	if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL) {
+		op_loc->offset = arm64_get_immoff_pair(raw_insn);
+		return;
+	}
+
+	/* Load/Store Register */
+	if ((raw_insn & A64_INSN_LS_REG_MASK) == A64_INSN_LS_REG_VAL) {
+		/* Unsigned offset */
+		if ((raw_insn & A64_INSN_LS_UNSIGNED_MASK) == A64_INSN_LS_UNSIGNED_VAL) {
+			op_loc->offset = arm64_get_immoff_unsigned(raw_insn);
+			return;
+		}
+
+		/* Register offset */
+		if ((raw_insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL) {
+			op_loc->offset = 0;
+			return;
+		}
+
+		/* Pre/Post-indexed */
+		if ((raw_insn & A64_INSN_LS_PREPOST_MASK) == A64_INSN_LS_PREPOST_VAL) {
+			op_loc->offset = arm64_get_immoff_prepost(raw_insn);
+			return;
+		}
+	}
+}
+
 int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum)
 {
 	if (perf_regnum < 0 || perf_regnum >= PERF_REG_ARM64_MAX)
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..c3f730d2fd88 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -129,6 +129,7 @@ int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, unsi
 				     bool only_libdw_supported);
 
 void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
+void get_arm64_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
 
 #else /* HAVE_LIBDW_SUPPORT */
 
@@ -144,6 +145,12 @@ static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source _
 {
 	return;
 }
+
+static inline void get_arm64_regs(u32 raw_insn __maybe_unused, int is_source __maybe_unused,
+		struct annotated_op_loc *op_loc __maybe_unused)
+{
+	return;
+}
 #endif
 
 #endif
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
                   ` (2 preceding siblings ...)
  2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
  2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim
  5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add ARM64 support to the core dispatch and initialization points:

1. annotate_get_insn_location(): Add an arm64 branch alongside the
   existing powerpc branch to call get_arm64_regs() for extracting
   register numbers and memory offsets from raw instructions.

2. arch_supports_insn_tracking(): Include arm64 so that
   find_data_type_block() can perform instruction-level type state
   tracking on ARM64.

3. init_type_state(): Add arm64 branch to set caller-saved registers
   (x0-x18 per AAPCS64) and stack register (SP, DWARF reg 31).
   Without this, stack_reg defaults to 0 (x0) after memset, causing
   x0-based memory accesses to be misidentified as stack accesses.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 tools/perf/util/annotate-data.c | 18 +++++++++++++++++-
 tools/perf/util/annotate.c      | 12 ++++++++----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 1eff0a27237d..c04ad66ff077 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -29,6 +29,11 @@
 /* register number of the stack pointer */
 #define X86_REG_SP 7
 
+/* ARM64 DWARF register numbers: x0-x30=0-30, SP=31 */
+#define ARM64_REG_SP 31
+#define ARM64_REG_LR 30
+#define ARM64_REG_FP 29
+
 static void delete_var_types(struct die_var_type *var_types);
 
 #define pr_debug_dtp(fmt, ...)					\
@@ -178,6 +183,16 @@ static void init_type_state(struct type_state *state, const struct arch *arch)
 		state->ret_reg = 0;
 		state->stack_reg = X86_REG_SP;
 	}
+
+	if (arch__is_arm64(arch)) {
+		int i;
+
+		/* ARM64 ABI: x0-x18 are caller-saved */
+		for (i = 0; i <= 18; i++)
+			state->regs[i].caller_saved = true;
+		state->ret_reg = 0;
+		state->stack_reg = ARM64_REG_SP;
+	}
 }
 
 static void exit_type_state(struct type_state *state)
@@ -1421,7 +1436,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
 
 static int arch_supports_insn_tracking(struct data_loc_info *dloc)
 {
-	if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)))
+	if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)) ||
+	    (arch__is_arm64(dloc->arch)))
 		return 1;
 	return 0;
 }
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..bd734826538d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2574,19 +2574,23 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
 		op_loc->reg2 = -1;
 
 		if (insn_str == NULL) {
-			if (!arch__is_powerpc(arch))
+			if (!arch__is_powerpc(arch) && !arch__is_arm64(arch))
 				continue;
 		}
 
 		/*
-		 * For powerpc, call get_powerpc_regs function which extracts the
-		 * required fields for op_loc, ie reg1, reg2, offset from the
-		 * raw instruction.
+		 * For powerpc and arm64, call arch-specific functions to
+		 * extract the required fields for op_loc (reg1, reg2, offset)
+		 * from the raw instruction.
 		 */
 		if (arch__is_powerpc(arch)) {
 			op_loc->mem_ref = mem_ref;
 			op_loc->multi_regs = multi_regs;
 			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
+		} else if (arch__is_arm64(arch)) {
+			op_loc->mem_ref = mem_ref;
+			op_loc->multi_regs = multi_regs;
+			get_arm64_regs(dl->raw.raw_insn, !i, op_loc);
 		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
 			op_loc->mem_ref = true;
 			op_loc->multi_regs = multi_regs;
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
                   ` (3 preceding siblings ...)
  2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim
  5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add data type profiling support for ARM64, enabling 'perf annotate
--code-with-type' to show which data types each memory instruction
accesses. This follows the PowerPC model of raw 32-bit instruction
decoding rather than x86's text-based operand parsing.

Instruction classification (check_arm64_insn):
  Classify instructions by raw encoding into load/store, arithmetic
  (add immediate, adrp), and register move categories. GP load/store
  is detected by bits[27:25] pattern with LDR (literal) excluded to
  avoid misinterpreting its different register field layout. ADRP, ADD
  immediate (with ADDG/SUBG excluded via tighter mask), and MOV
  (register) have their own mask/val pairs derived from the ARM ARM.
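
The classification order can be sketched as a stand-alone function.
The enum and a64_classify() are hypothetical names used only for this
illustration; the mask/value pairs are the ones defined in this patch.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum a64_class {
	A64_OTHER,
	A64_LOAD_STORE,
	A64_MOV_REG,
	A64_ADRP,
	A64_ADD_IMM,
};

/* Classify a raw instruction word, testing the exclusion first. */
static enum a64_class a64_classify(uint32_t insn)
{
	/* LDR/LDRSW (literal) has no Rn field: reject before the GP test. */
	if ((insn & 0x3b000000u) == 0x18000000u)
		return A64_OTHER;
	/* General-purpose load/store. */
	if ((insn & 0x0e000000u) == 0x08000000u)
		return A64_LOAD_STORE;
	/* MOV (register), i.e. ORR with the zero register as Rn. */
	if ((insn & 0x7fe0ffe0u) == 0x2a0003e0u)
		return A64_MOV_REG;
	if ((insn & 0x9f000000u) == 0x90000000u)
		return A64_ADRP;
	/* ADD (immediate); bit 23 in the mask keeps ADDG/SUBG out. */
	if ((insn & 0x7f800000u) == 0x11000000u)
		return A64_ADD_IMM;
	return A64_OTHER;
}
```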

Load/store operand parsing:
  Set mem_ref=true for all GP load/store instructions. Detect register
  offset addressing mode to set multi_regs=true when Rm is used as a
  second source operand.

Register state tracking (update_insn_state_arm64):
  Track three instruction patterns for type propagation:

  - ADRP Xd, #page: Compute the PC-relative page address using
    sign_extend64() and either resolve the global variable type
    directly or store the address as TSR_KIND_CONST for later
    resolution by ADD.

  - ADD Xd, Xn, #imm: If Xn holds an ADRP result (TSR_KIND_CONST),
    compute the full variable address and resolve via
    get_global_var_type(). This handles the common ARM64 global
    variable access pattern: adrp+add+ldr.

  - MOV Xd, Xm: Propagate type state including kind, offset, and
    imm_value from the source to destination register.
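
The adrp+add address arithmetic can be sketched as follows. Both
helpers are hypothetical stand-alone functions, not the code in this
patch. The optional 12-bit shift of the ADD immediate (bit 22) is part
of the architectural encoding and is included here as an assumption
about what a complete decoder needs.

```c
#include <stdint.h>

/*
 * Page address produced by an ADRP located at 'pc'.
 * immhi is bits[23:5], immlo is bits[30:29]; together they form a
 * signed 21-bit page count.
 */
static uint64_t a64_adrp_target(uint64_t pc, uint32_t insn)
{
	uint64_t immhi = (insn >> 5) & 0x7ffff;
	uint64_t immlo = (insn >> 29) & 0x3;
	int64_t imm = (int64_t)((immhi << 2) | immlo);

	/* Sign-extend from 21 bits. */
	if (imm & (1 << 20))
		imm -= (1 << 21);

	/* Shift as unsigned so a negative page count is well defined. */
	return (pc & ~0xfffULL) + ((uint64_t)imm << 12);
}

/* Immediate of ADD (immediate): imm12, optionally shifted left by 12. */
static uint64_t a64_add_imm(uint32_t insn)
{
	uint64_t imm12 = (insn >> 10) & 0xfff;

	return ((insn >> 22) & 1) ? imm12 << 12 : imm12;
}
```

The variable address is then a64_adrp_target(pc, adrp) plus
a64_add_imm(add), which is the value that would be handed to the
global variable lookup.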

Known limitations:
  - The adrp+ldr pattern (without intermediate ADD, using lo12 folded
    into the LDR offset) is not yet handled. This requires extending
    check_matching_type() to resolve TSR_KIND_CONST with the load
    offset, which can be added incrementally.
  - Pointer chain tracking (load-from-memory propagating type to the
    destination register) is not implemented, matching PowerPC's
    current scope.

Architecture initialization:
  Register the update_insn_state callback for instruction-level type
  state tracking.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 332 ++++++++++++++++++
 1 file changed, 332 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index b98aaf9a8a7b..887ed22c4ca0 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -1,12 +1,21 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/bitops.h>
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <errno.h>
+#include <inttypes.h>
 #include <stdlib.h>
 #include <string.h>
 #include <linux/zalloc.h>
 #include <regex.h>
 #include "../annotate.h"
+#include "../debug.h"
 #include "../disasm.h"
+#ifdef HAVE_LIBDW_SUPPORT
+#include "../annotate-data.h"
+#include "../map.h"
+#include "../symbol.h"
+#endif
 
 struct arch_arm64 {
 	struct arch arch;
@@ -14,6 +23,47 @@ struct arch_arm64 {
 	regex_t jump_insn;
 };
 
+/*
+ * ARM64 instruction encoding masks and values.
+ * Derived from ARM Architecture Reference Manual, C4.1 A64 encoding index.
+ *
+ * These mirror the definitions in arch/arm64/include/asm/insn.h but are
+ * duplicated here because that header depends on kernel-only macros
+ * (BUILD_BUG_ON, __always_inline).
+ */
+
+/* GP Load/Store: bit[27]=1, bit[26]=0 (GP, not SIMD/FP), bit[25]=0 */
+#define A64_INSN_GP_LS_MASK	0x0e000000
+#define A64_INSN_GP_LS_VAL	0x08000000
+
+/* LDR/LDRSW (literal): bits[29:27]=011, bits[25:24]=00 -- must be excluded from GP LS */
+#define A64_INSN_LDR_LIT_MASK	0x3b000000
+#define A64_INSN_LDR_LIT_VAL	0x18000000
+
+/*
+ * Load/Store register (register offset):
+ * bits[29:27]=111, bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_REG_OFF_MASK	0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL	0x38200800
+
+/* ADRP: mask=0x9F000000, val=0x90000000 */
+#define A64_INSN_ADRP_MASK	0x9f000000
+#define A64_INSN_ADRP_VAL	0x90000000
+
+/* ADD (immediate): mask=0x7F800000, val=0x11000000 (excludes ADDG/SUBG) */
+#define A64_INSN_ADD_IMM_MASK	0x7f800000
+#define A64_INSN_ADD_IMM_VAL	0x11000000
+
+/* MOV (register) = ORR Xd/Wd, XZR/WZR, Xm/Wm: Rn=11111, imm6=000000 */
+#define A64_INSN_MOV_REG_MASK	0x7fe0ffe0
+#define A64_INSN_MOV_REG_VAL	0x2a0003e0
+
+/* Instruction field extraction */
+#define A64_RT(insn)	((insn) & 0x1f)
+#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
+#define A64_RM(insn)	(((insn) >> 16) & 0x1f)
+
 static int arm64_mov__parse(const struct arch *arch __maybe_unused,
 			    struct ins_operands *ops,
 			    struct map_symbol *ms __maybe_unused,
@@ -69,6 +119,285 @@ static const struct ins_ops arm64_mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+/*
+ * ARM64 load/store instruction parser.
+ * Sets mem_ref and multi_regs based on raw instruction encoding.
+ */
+static int arm64_load_store__parse(const struct arch *arch __maybe_unused,
+				   struct ins_operands *ops,
+				   struct map_symbol *ms __maybe_unused,
+				   struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+
+	ops->source.mem_ref = true;
+	ops->source.multi_regs = false;
+
+	/* Load/Store register (register offset) uses Rm as second source */
+	if ((insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL)
+		ops->source.multi_regs = true;
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static int arm64_load_store__scnprintf(const struct ins *ins, char *bf,
+				       size_t size,
+				       struct ins_operands *ops,
+				       int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			 ops->raw);
+}
+
+static const struct ins_ops arm64_load_store_ops = {
+	.parse     = arm64_load_store__parse,
+	.scnprintf = arm64_load_store__scnprintf,
+};
+
+static int arm64_arithmetic__parse(const struct arch *arch __maybe_unused,
+				   struct ins_operands *ops,
+				   struct map_symbol *ms __maybe_unused,
+				   struct disasm_line *dl __maybe_unused)
+{
+	ops->source.mem_ref = false;
+	ops->source.multi_regs = false;
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static int arm64_arithmetic__scnprintf(const struct ins *ins, char *bf,
+				       size_t size,
+				       struct ins_operands *ops,
+				       int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			 ops->raw);
+}
+
+static const struct ins_ops arm64_arithmetic_ops = {
+	.parse     = arm64_arithmetic__parse,
+	.scnprintf = arm64_arithmetic__scnprintf,
+};
+
+/*
+ * Classify ARM64 instructions by raw encoding for data type profiling.
+ */
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+
+	/* Exclude LDR/LDRSW (literal) before matching GP Load/Store */
+	if ((insn & A64_INSN_LDR_LIT_MASK) == A64_INSN_LDR_LIT_VAL)
+		return NULL;
+
+	if ((insn & A64_INSN_GP_LS_MASK) == A64_INSN_GP_LS_VAL)
+		return &arm64_load_store_ops;
+
+	if ((insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL)
+		return &arm64_arithmetic_ops;
+
+	if ((insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL)
+		return &arm64_arithmetic_ops;
+
+	if ((insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL)
+		return &arm64_arithmetic_ops;
+
+	return NULL;
+}
+
+#ifdef HAVE_LIBDW_SUPPORT
+
+static inline bool arm64_is_adrp(u32 insn)
+{
+	return (insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL;
+}
+
+static inline bool arm64_is_add_imm(u32 insn)
+{
+	return (insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL;
+}
+
+static inline bool arm64_is_mov_reg(u32 insn)
+{
+	return (insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL;
+}
+
+/*
+ * Compute the page address from an ADRP instruction.
+ * ADRP Xd, #imm: Xd = (PC & ~0xFFF) + (imm << 12)
+ * immhi = bits[23:5] (19 bits), immlo = bits[30:29] (2 bits)
+ * imm = sign_extend(immhi:immlo, 21)
+ */
+static u64 arm64_adrp_target(u64 pc, u32 insn)
+{
+	u64 immhi = (insn >> 5) & 0x7ffff;
+	u64 immlo = (insn >> 29) & 0x3;
+	u64 imm = (immhi << 2) | immlo;
+
+	return (pc & ~0xfffULL) + (sign_extend64(imm, 20) << 12);
+}
+
+/*
+ * Track register state for ARM64 instructions.
+ *
+ * Handles three instruction patterns:
+ *
+ * 1. ADRP Xd, #page - computes a PC-relative page address.
+ *    Track the computed address so a subsequent LDR can resolve
+ *    the global variable.
+ *
+ * 2. ADD Xd, Xn, #imm - if Xn holds a tracked address (from ADRP),
+ *    propagate the adjusted address to Xd.
+ *
+ * 3. MOV Xd, Xm - propagate type state from Xm to Xd.
+ */
+static void update_insn_state_arm64(struct type_state *state,
+				    struct data_loc_info *dloc,
+				    Dwarf_Die *cu_die,
+				    struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+	int rd, rn;
+	struct type_state_reg *tsr;
+
+	if (arm64_is_adrp(insn)) {
+		u64 pc, page_addr;
+		int offset;
+		Dwarf_Die type_die;
+
+		rd = A64_RT(insn);
+		if (!has_reg_type(state, rd))
+			return;
+
+		tsr = &state->regs[rd];
+
+		pc = map__rip_2objdump(dloc->ms->map,
+				       dloc->ms->sym->start + dl->al.offset);
+		page_addr = arm64_adrp_target(pc, insn);
+
+		/*
+		 * Try to resolve the global variable at this page address.
+		 * If not found, store it as a constant for later ADD resolution.
+		 */
+		if (get_global_var_type(cu_die, dloc,
+					dloc->ms->sym->start + dl->al.offset,
+					page_addr, &offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_POINTER;
+			tsr->offset = offset;
+			tsr->ok = true;
+
+			pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d",
+				     (u32)dl->al.offset, page_addr, rd);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		} else {
+			tsr->kind = TSR_KIND_CONST;
+			tsr->imm_value = page_addr;
+			tsr->ok = true;
+
+			pr_debug_dtp("adrp [%x] page=%#"PRIx64" -> reg%d\n",
+				     (u32)dl->al.offset, page_addr, rd);
+		}
+		return;
+	}
+
+	if (arm64_is_add_imm(insn)) {
+		int imm12, shift;
+		u64 var_addr;
+		int offset;
+		Dwarf_Die type_die;
+
+		rd = A64_RT(insn);
+		rn = A64_RN(insn);
+
+		if (!has_reg_type(state, rd) || !has_reg_type(state, rn))
+			return;
+
+		tsr = &state->regs[rd];
+
+		if (!state->regs[rn].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		imm12 = (insn >> 10) & 0xfff;
+		shift = ((insn >> 22) & 0x1) ? 12 : 0;
+
+		/*
+		 * If Rn holds an ADRP result (TSR_KIND_CONST), compute
+		 * the full address and try to resolve the global variable.
+		 */
+		if (state->regs[rn].kind == TSR_KIND_CONST) {
+			var_addr = state->regs[rn].imm_value +
+				   ((u64)imm12 << shift);
+
+			if (get_global_var_type(cu_die, dloc,
+						dloc->ms->sym->start + dl->al.offset,
+						var_addr, &offset, &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_POINTER;
+				tsr->offset = offset;
+				tsr->ok = true;
+
+				pr_debug_dtp("add [%x] global addr=%#"PRIx64" -> reg%d",
+					     (u32)dl->al.offset, var_addr, rd);
+				pr_debug_type_name(&tsr->type, tsr->kind);
+				return;
+			}
+		}
+
+		/* Otherwise propagate existing type with adjusted offset */
+		if (state->regs[rn].kind == TSR_KIND_TYPE ||
+		    state->regs[rn].kind == TSR_KIND_POINTER) {
+			tsr->type = state->regs[rn].type;
+			tsr->kind = state->regs[rn].kind;
+			tsr->offset = state->regs[rn].offset + (imm12 << shift);
+			tsr->ok = true;
+
+			pr_debug_dtp("add [%x] imm=%#x reg%d -> reg%d",
+				     (u32)dl->al.offset, imm12 << shift, rn, rd);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		} else {
+			tsr->ok = false;
+		}
+		return;
+	}
+
+	if (arm64_is_mov_reg(insn)) {
+		int rm;
+
+		rd = A64_RT(insn);
+		rm = A64_RM(insn);
+		/* ORR encodes XZR/WZR as register 31, but DWARF register 31 is SP */
+		if (rd == 31 || !has_reg_type(state, rd))
+			return;
+
+		tsr = &state->regs[rd];
+
+		if (rm == 31 || !has_reg_type(state, rm) || !state->regs[rm].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		tsr->type = state->regs[rm].type;
+		tsr->kind = state->regs[rm].kind;
+		tsr->offset = state->regs[rm].offset;
+		tsr->imm_value = state->regs[rm].imm_value;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			     (u32)dl->al.offset, rm, rd);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+		return;
+	}
+}
+#endif /* HAVE_LIBDW_SUPPORT */
+
 static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
 {
 	struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
@@ -105,6 +434,9 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.skip_functions_char = '+';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
 	annotate_opts.show_asm_raw = true;
+#ifdef HAVE_LIBDW_SUPPORT
+	arch->update_insn_state = update_insn_state_arm64;
+#endif
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
-- 
2.51.2.612.gdc70283dfc

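As an illustration of the ADRP decoding done by arm64_adrp_target() in the
patch above, the following standalone sketch reproduces the same arithmetic
outside of perf. The names sext64() and adrp_target() are hypothetical
stand-ins (sext64() plays the role of the kernel's sign_extend64()), and any
encodings fed to it would have to be hand-assembled; none of this is taken
from the series itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's sign_extend64(): treat bit
 * 'sign_bit' of 'value' as the sign bit and extend it to 64 bits.
 */
static int64_t sext64(uint64_t value, unsigned int sign_bit)
{
	uint64_t sign = 1ULL << sign_bit;
	uint64_t low;

	assert(sign_bit < 63);
	low = value & ((sign << 1) - 1);	/* keep bits [sign_bit:0] */
	return (int64_t)((low ^ sign) - sign);
}

/* ADRP Xd, #imm: Xd = (PC & ~0xfff) + (sign_extend(immhi:immlo) << 12) */
static uint64_t adrp_target(uint64_t pc, uint32_t insn)
{
	uint64_t immhi = (insn >> 5) & 0x7ffff;	/* bits [23:5] */
	uint64_t immlo = (insn >> 29) & 0x3;	/* bits [30:29] */
	int64_t pages = sext64((immhi << 2) | immlo, 20);

	/* Multiply in unsigned arithmetic so negative page counts wrap cleanly. */
	return (pc & ~0xfffULL) + (uint64_t)pages * 4096;
}
```

The patch shifts the sign-extended value left by 12 directly; the sketch
multiplies in unsigned arithmetic instead, only to avoid left-shifting a
negative value in portable C. The resulting address is the same.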

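The mask/value tests and the ADD (immediate) decoding used by
update_insn_state_arm64() can be tried in isolation with a sketch like the
one below. MOV_REG_MASK/MOV_REG_VAL and the field macros mirror the values
quoted in the patch. ADD_IMM_MASK/ADD_IMM_VAL are an assumption derived from
the A64 encoding, since the patch's A64_INSN_ADD_IMM_* definitions are not
visible in this part of the diff. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch: ORR Xd/Wd, XZR/WZR, Xm/Wm with imm6 == 0 */
#define MOV_REG_MASK	0x7fe0ffe0u
#define MOV_REG_VAL	0x2a0003e0u

/* Assumption: ADD (immediate), sf ignored, op=0, S=0, bits[28:23]=100010 */
#define ADD_IMM_MASK	0x7f800000u
#define ADD_IMM_VAL	0x11000000u

/* Register fields: Rd/Rt in bits [4:0], Rn in [9:5], Rm in [20:16] */
#define RD(insn)	((insn) & 0x1f)
#define RN(insn)	(((insn) >> 5) & 0x1f)
#define RM(insn)	(((insn) >> 16) & 0x1f)

static bool is_mov_reg(uint32_t insn)
{
	return (insn & MOV_REG_MASK) == MOV_REG_VAL;
}

static bool is_add_imm(uint32_t insn)
{
	return (insn & ADD_IMM_MASK) == ADD_IMM_VAL;
}

/* Immediate of ADD (immediate): imm12, optionally shifted left by 12 */
static uint64_t add_imm_value(uint32_t insn)
{
	uint64_t imm12 = (insn >> 10) & 0xfff;
	unsigned int shift = ((insn >> 22) & 0x1) ? 12 : 0;

	return imm12 << shift;
}

/* Address formed by "adrp xN, page; add xD, xN, #lo12" given the ADRP page */
static uint64_t adrp_add_address(uint64_t page_addr, uint32_t add_insn)
{
	return page_addr + add_imm_value(add_insn);
}
```

One detail worth keeping in mind when reading the register fields: in the
ORR (shifted register) encoding behind MOV, register number 31 means
XZR/WZR, whereas in ADD (immediate) it means SP.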

* Re: [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
                   ` (4 preceding siblings ...)
  2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 16:56 ` Namhyung Kim
  5 siblings, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2026-06-23 16:56 UTC (permalink / raw)
  To: Shuai Xue
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Hello,

On Tue, Jun 23, 2026 at 09:02:29PM +0800, Shuai Xue wrote:
> `perf test -v "perf data type profiling tests"` fails on ARM64:
> 
>     Basic Rust perf annotate test
>     perf mem record -o /tmp/perf.data perf test -w code_with_type
>     perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
>     Basic annotate [Failed: missing target data type]
> 
> The root cause is that ARM64 lacks the instruction parsing infrastructure
> required for data type profiling. Specifically:
> 
>   1. annotate_get_insn_location() cannot extract register numbers and
>      memory offsets from ARM64 load/store instructions, because ARM64
>      does not set objdump.register_char or objdump.memory_ref_char
>      (unlike x86 which uses '%' and '(').
> 
>   2. arch_supports_insn_tracking() does not include ARM64, so
>      find_data_type_block() cannot perform instruction-level type state
>      tracking.
> 
>   3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
>      after memset, which causes x0-based memory accesses to be
>      misidentified as stack accesses.
> 
> As a result, perf annotate --code-with-type silently produces no type
> annotations on ARM64, and the test grep for "# data-type: struct Buf"
> fails.
> 
> This series adds ARM64 data type profiling support following the PowerPC
> model: decode raw 32-bit instruction words rather than parsing objdump
> text. ARM64's fixed-width encoding and trivial DWARF register mapping
> (x0-x30 = DWARF 0-30) make this approach clean and robust.
> 
> Three classes of instructions are tracked for register state propagation:
>   - ADRP: compute PC-relative page address for global variable resolution
>   - ADD (immediate): combine with ADRP result to form full variable address
>   - MOV (register): propagate type state between registers
> 
> This covers the common `adrp + add + ldr/str` pattern that ARM64
> compilers emit for global variable access.
> 
> Known limitations:
>   - The `adrp + ldr` pattern (with :lo12: folded into the load offset,
>     without an intermediate ADD) is not yet handled. This requires
>     extending check_matching_type() to resolve TSR_KIND_CONST with the
>     load offset, which can be added incrementally.
>   - Pointer chain tracking (load-from-memory propagating type to the
>     destination register) is not implemented, matching PowerPC's current
>     scope.
> 
> Testing:
>   All four sub-tests in `perf test "perf data type profiling tests"`
>   pass reliably on ARM64 (AArch64, SPE-capable hardware):
>     - Basic/Pipe Rust: struct Buf (code_with_type workload)
>     - Basic/Pipe C: struct buf (datasym workload, global variable)
> 
> Patch breakdown:
>   1/5  Widen type_state_reg::imm_value from u32 to u64 (prerequisite
>        for storing 64-bit addresses from ADRP)
>   2/5  Add arch__is_arm64() detection, raw instruction parsing from
>        objdump output, and enable show_asm_raw for ARM64
>   3/5  Add get_arm64_regs() to extract registers and memory offsets
>        from load/store instruction encodings (4 addressing modes)
>   4/5  Wire up ARM64 in annotate_get_insn_location(),
>        arch_supports_insn_tracking(), and init_type_state()
>   5/5  Main patch: instruction classification, ADRP/ADD/MOV register
>        state tracking, and architecture initialization
> 
> Shuai Xue (5):
>   perf annotate-data: Widen type_state_reg::imm_value to u64
>   perf disasm: Add ARM64 architecture detection and raw instruction
>     parsing
>   perf dwarf-regs: Add ARM64 register and offset extraction from raw
>     instructions
>   perf annotate: Wire up ARM64 data type profiling infrastructure
>   perf annotate-arch: Add ARM64 data type profiling support

Thanks for the contribution!

There was another series on this topic. Please take a look; I hope you
can collaborate.

https://lore.kernel.org/r/20260403094800.1418825-1-wutengda@huaweicloud.com

Thanks,
Namhyung

> 
>  .../perf/util/annotate-arch/annotate-arm64.c  | 333 ++++++++++++++++++
>  tools/perf/util/annotate-arch/annotate-x86.c  |   2 +-
>  tools/perf/util/annotate-data.c               |  18 +-
>  tools/perf/util/annotate-data.h               |   2 +-
>  tools/perf/util/annotate.c                    |  12 +-
>  tools/perf/util/disasm.c                      |  64 ++++
>  tools/perf/util/disasm.h                      |   2 +
>  .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 125 +++++++
>  tools/perf/util/include/dwarf-regs.h          |   7 +
>  9 files changed, 558 insertions(+), 7 deletions(-)
> 
> -- 
> 2.51.2.612.gdc70283dfc

