Linux Perf Users
* [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
@ 2026-06-23 13:02 Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

`perf test -v "perf data type profiling tests"` fails on ARM64:

    Basic Rust perf annotate test
    perf mem record -o /tmp/perf.data perf test -w code_with_type
    perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
    Basic annotate [Failed: missing target data type]

The root cause is that ARM64 lacks the instruction parsing infrastructure
required for data type profiling. Specifically:

  1. annotate_get_insn_location() cannot extract register numbers and
     memory offsets from ARM64 load/store instructions, because ARM64
     does not set objdump.register_char or objdump.memory_ref_char
     (unlike x86 which uses '%' and '(').

  2. arch_supports_insn_tracking() does not include ARM64, so
     find_data_type_block() cannot perform instruction-level type state
     tracking.

  3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
     after memset, which causes x0-based memory accesses to be
     misidentified as stack accesses.

As a result, perf annotate --code-with-type silently produces no type
annotations on ARM64, and the test's grep for "# data-type: struct Buf"
fails.

This series adds ARM64 data type profiling support following the PowerPC
model: decode raw 32-bit instruction words rather than parsing objdump
text. ARM64's fixed-width encoding and trivial DWARF register mapping
(x0-x30 = DWARF 0-30) make this approach clean and robust.

Three classes of instructions are tracked for register state propagation:
  - ADRP: compute PC-relative page address for global variable resolution
  - ADD (immediate): combine with ADRP result to form full variable address
  - MOV (register): propagate type state between registers

This covers the common `adrp + add + ldr/str` pattern that ARM64
compilers emit for global variable access.

Known limitations:
  - The `adrp + ldr` pattern (with :lo12: folded into the load offset,
    without an intermediate ADD) is not yet handled. This requires
    extending check_matching_type() to resolve TSR_KIND_CONST with the
    load offset, which can be added incrementally.
  - Pointer chain tracking (load-from-memory propagating type to the
    destination register) is not implemented, matching PowerPC's current
    scope.
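
For context on the first limitation, the arithmetic an `adrp + ldr`
resolver would need is roughly the following. This is only a sketch of the
address computation, not something this series implements; the function
names and the sample encodings are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: page address produced by an ADRP located at pc. */
static uint64_t sketch_adrp_page(uint64_t pc, uint32_t insn)
{
	uint64_t immhi = (insn >> 5) & 0x7ffff;	/* bits[23:5] */
	uint64_t immlo = (insn >> 29) & 0x3;	/* bits[30:29] */
	int64_t imm = (int64_t)((immhi << 2) | immlo);

	/* sign-extend the 21-bit page count */
	if (imm & (1LL << 20))
		imm -= 1LL << 21;

	return (pc & ~0xfffULL) + (uint64_t)imm * 4096;
}

/* Hypothetical helper: byte offset of LDR/STR (unsigned offset). */
static uint64_t sketch_ldr_unsigned_offset(uint32_t insn)
{
	uint64_t imm12 = (insn >> 10) & 0xfff;	/* bits[21:10] */

	return imm12 << (insn >> 30);		/* scaled by access size */
}

/* Address accessed by "adrp xN, sym; ldr xM, [xN, :lo12:sym]". */
static uint64_t sketch_adrp_ldr_address(uint64_t pc, uint32_t adrp,
					uint32_t ldr)
{
	return sketch_adrp_page(pc, adrp) + sketch_ldr_unsigned_offset(ldr);
}
```

With an ADRP of 0xb0000000 at pc 0x412344 followed by 0xf9400801
(`ldr x1, [x0, #16]`), the sketch yields 0x413010: the next page after the
one containing pc, plus 16.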

Testing:
  All four sub-tests in `perf test "perf data type profiling tests"`
  pass reliably on ARM64 (AArch64, SPE-capable hardware):
    - Basic/Pipe Rust: struct Buf (code_with_type workload)
    - Basic/Pipe C: struct buf (datasym workload, global variable)

Patch breakdown:
  1/5  Widen type_state_reg::imm_value from u32 to u64 (prerequisite
       for storing 64-bit addresses from ADRP)
  2/5  Add arch__is_arm64() detection, raw instruction parsing from
       objdump output, and enable show_asm_raw for ARM64
  3/5  Add get_arm64_regs() to extract registers and memory offsets
       from load/store instruction encodings (4 addressing modes)
  4/5  Wire up ARM64 in annotate_get_insn_location(),
       arch_supports_insn_tracking(), and init_type_state()
  5/5  Main patch: instruction classification, ADRP/ADD/MOV register
       state tracking, and architecture initialization

Shuai Xue (5):
  perf annotate-data: Widen type_state_reg::imm_value to u64
  perf disasm: Add ARM64 architecture detection and raw instruction
    parsing
  perf dwarf-regs: Add ARM64 register and offset extraction from raw
    instructions
  perf annotate: Wire up ARM64 data type profiling infrastructure
  perf annotate-arch: Add ARM64 data type profiling support

 .../perf/util/annotate-arch/annotate-arm64.c  | 333 ++++++++++++++++++
 tools/perf/util/annotate-arch/annotate-x86.c  |   2 +-
 tools/perf/util/annotate-data.c               |  18 +-
 tools/perf/util/annotate-data.h               |   2 +-
 tools/perf/util/annotate.c                    |  12 +-
 tools/perf/util/disasm.c                      |  64 ++++
 tools/perf/util/disasm.h                      |   2 +
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 125 +++++++
 tools/perf/util/include/dwarf-regs.h          |   7 +
 9 files changed, 558 insertions(+), 7 deletions(-)

-- 
2.51.2.612.gdc70283dfc


* [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

The imm_value field in struct type_state_reg is used to store addresses
computed from PC-relative instructions (e.g., ARM64 ADRP). As a u32,
it silently truncates addresses above 4GB, which breaks global variable
resolution for kernel profiling and large-address userspace on ARM64.

Widen it to u64 to support the full 64-bit address space. Update the
corresponding format string in the x86 annotation code.
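
The truncation can be reproduced in isolation. The sketch below uses
hypothetical stand-in functions rather than struct type_state_reg itself,
and the kernel-style address is an arbitrary example value.

```c
#include <stdint.h>

/* Stand-in for the old field: the upper 32 bits are silently dropped. */
static uint64_t sketch_store_u32(uint64_t addr)
{
	uint32_t imm_value = (uint32_t)addr;

	return imm_value;
}

/* Stand-in for the widened field: the address survives unchanged. */
static uint64_t sketch_store_u64(uint64_t addr)
{
	uint64_t imm_value = addr;

	return imm_value;
}
```

An address such as 0xffff800010a43000 comes back as 0x10a43000 from the
u32 variant, while addresses below 4GB are unaffected, which is why the
problem only shows up for kernel or large-address mappings.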

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 tools/perf/util/annotate-arch/annotate-x86.c | 2 +-
 tools/perf/util/annotate-data.h              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index 7e6136536393..985aa8bbd0b9 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -547,7 +547,7 @@ static void update_insn_state_x86(struct type_state *state,
 			tsr->offset = 0;
 			tsr->ok = true;
 
-			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+			pr_debug_dtp("mov [%x] imm=%#"PRIx64" -> reg%d\n",
 				     insn_offset, tsr->imm_value, dst->reg1);
 			return;
 		}
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index c26130744260..4a9b4814479f 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -173,7 +173,7 @@ extern struct annotated_data_stat ann_data_stat;
  */
 struct type_state_reg {
 	Dwarf_Die type;
-	u32 imm_value;
+	u64 imm_value;
 	/*
 	 * The offset within the struct that the register points to.
 	 * A value of 0 means the register points to the beginning.
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:19   ` sashiko-bot
  2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add arch__is_arm64() helper to identify ARM64 binaries by ELF machine
type, following the existing arch__is_x86() and arch__is_powerpc()
pattern.

Add disasm_line__parse_arm64() to extract raw 32-bit instruction words
from ARM64 objdump output. Unlike PowerPC, which needs be32_to_cpu()
byte-swapping, ARM64 instructions are always little-endian and can be
used directly. The parser finds the hex word boundary dynamically
instead of using a hardcoded width, and sets raw_insn to 0 if sscanf()
does not match.

Set annotate_opts.show_asm_raw in arch__new_arm64() so that objdump
includes raw instruction bytes, which the parser requires.

Wire up the ARM64 parsing path in disasm_line__new() alongside the
existing PowerPC path.
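
A standalone sketch of the word extraction, for illustration only. It
assumes the line already starts with the raw word (address prefix
stripped), and it is stricter than the patch: it rejects anything that is
not exactly 8 hex digits instead of falling back to 0. The function name
is hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Returns the 32-bit instruction word found at the start of line,
 * or -1 if the first token is not an 8-digit hex word.
 */
static int64_t sketch_parse_raw_insn(const char *line)
{
	const char *start, *end;

	while (isspace((unsigned char)*line))
		line++;
	start = line;

	/* Walk to the first whitespace, insisting on hex digits only. */
	for (end = start; *end && !isspace((unsigned char)*end); end++) {
		if (!isxdigit((unsigned char)*end))
			return -1;
	}

	if (end - start != 8)
		return -1;

	/* strtoul() stops at the whitespace that follows the word. */
	return (int64_t)strtoul(start, NULL, 16);
}
```

Returning a distinct error value lets a caller tell "no raw word" apart
from a genuine all-zero encoding, at the cost of a wider return type.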

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  |  1 +
 tools/perf/util/disasm.c                      | 64 +++++++++++++++++++
 tools/perf/util/disasm.h                      |  2 +
 3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 33080fdca125..b98aaf9a8a7b 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -104,6 +104,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.comment_char	  = '/';
 	arch->objdump.skip_functions_char = '+';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
+	annotate_opts.show_asm_raw = true;
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 59ba88e1f744..83fad4f01442 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -52,6 +52,7 @@ const struct ins_ops arithmetic_ops;
 static void ins__sort(struct arch *arch);
 static int disasm_line__parse(char *line, const char **namep, char **rawp);
 static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_args *args);
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -203,6 +204,11 @@ bool arch__is_powerpc(const struct arch *arch)
 	return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
 }
 
+bool arch__is_arm64(const struct arch *arch)
+{
+	return arch->id.e_machine == EM_AARCH64;
+}
+
 static void ins_ops__delete(struct ins_operands *ops)
 {
 	if (ops == NULL)
@@ -777,6 +783,14 @@ static const struct ins_ops *__ins__find(const struct arch *arch, const char *na
 			return ops;
 	}
 
+	if (arch__is_arm64(arch)) {
+		const struct ins_ops *ops;
+
+		ops = check_arm64_insn(dl);
+		if (ops)
+			return ops;
+	}
+
 	if (!arch->sorted_instructions) {
 		ins__sort((struct arch *)arch);
 		((struct arch *)arch)->sorted_instructions = true;
@@ -902,6 +916,53 @@ static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_ar
 	return ret;
 }
 
+/*
+ * Parses ARM64 disassembly output which includes raw instruction bytes.
+ * ARM64 objdump format:
+ *   a9bf7bfd 	stp	x29, x30, [sp, #-16]!
+ *
+ * The raw instruction is a hex word (typically 8 chars) followed by whitespace.
+ */
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args)
+{
+	char *line = dl->al.line;
+	const char **namep = &dl->ins.name;
+	char **rawp = &dl->ops.raw;
+	char *name_raw_insn = skip_spaces(line);
+	char *end_raw, *name, *tmp_raw_insn;
+	int ret = 0;
+
+	if (name_raw_insn[0] == '\0')
+		return -1;
+
+	/* Find end of raw instruction hex by looking for whitespace */
+	end_raw = name_raw_insn;
+	while (*end_raw && !isspace(*end_raw))
+		end_raw++;
+
+	name = skip_spaces(end_raw);
+
+	if (args->options->disassembler_used)
+		ret = disasm_line__parse(name, namep, rawp);
+	else
+		*namep = "";
+
+	tmp_raw_insn = strndup(name_raw_insn, end_raw - name_raw_insn);
+	if (tmp_raw_insn == NULL) {
+		if (args->options->disassembler_used)
+			zfree(namep);
+		return -1;
+	}
+
+	remove_spaces(tmp_raw_insn);
+
+	if (sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn) != 1)
+		dl->raw.raw_insn = 0;
+	free(tmp_raw_insn);
+
+	return ret;
+}
+
 static void annotation_line__init(struct annotation_line *al,
 				  struct annotate_args *args,
 				  int nr)
@@ -958,6 +1019,9 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
 		if (arch__is_powerpc(args->arch)) {
 			if (disasm_line__parse_powerpc(dl, args) < 0)
 				goto out_free_line;
+		} else if (arch__is_arm64(args->arch)) {
+			if (disasm_line__parse_arm64(dl, args) < 0)
+				goto out_free_line;
 		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
 			goto out_free_line;
 
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 25756e3f47e4..dfce128a3188 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -111,6 +111,7 @@ struct annotate_args {
 const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
 bool arch__is_x86(const struct arch *arch);
 bool arch__is_powerpc(const struct arch *arch);
+bool arch__is_arm64(const struct arch *arch);
 
 extern const struct ins_ops call_ops;
 extern const struct ins_ops dec_ops;
@@ -143,6 +144,7 @@ bool ins__is_ret(const struct ins *ins);
 bool ins__is_lock(const struct ins *ins);
 
 const struct ins_ops *check_ppc_insn(struct disasm_line *dl);
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl);
 
 struct disasm_line *disasm_line__new(struct annotate_args *args);
 void disasm_line__free(struct disasm_line *dl);
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
  4 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add get_arm64_regs() to extract register numbers (Rn, Rt, Rm) and
memory offsets from raw ARM64 load/store instruction encodings. This
follows the same pattern as get_powerpc_regs() for PowerPC.

ARM64 DWARF register numbers map trivially: x0-x30 = 0-30, sp = 31,
so the hardware register fields can be used directly as DWARF regnums.

Four addressing modes are handled:
  - Unsigned offset: imm12 scaled by access size
  - Pre/Post-indexed: sign-extended 9-bit immediate
  - Register offset: offset from Rm (set to 0, handled via multi_regs)
  - Load/Store Pair: sign-extended 7-bit immediate scaled by element size
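
The four modes can be exercised outside perf with a small decoder. The
sketch below reuses the mask/value pairs from this patch as literals; the
enum, struct, and function names are hypothetical, and the sample
encodings in the note that follows were assembled by hand.

```c
#include <stdint.h>

enum sketch_ls_mode {
	SKETCH_LS_NONE,
	SKETCH_LS_PAIR,
	SKETCH_LS_UNSIGNED,
	SKETCH_LS_REG_OFF,
	SKETCH_LS_PREPOST,
};

struct sketch_ls {
	enum sketch_ls_mode mode;
	int32_t offset;
};

/* Sign-extend the low "bits" bits of val. */
static int32_t sketch_sign_extend(uint32_t val, unsigned int bits)
{
	uint32_t sign = 1u << (bits - 1);

	return (int32_t)((val ^ sign) - sign);
}

static struct sketch_ls sketch_ls_decode(uint32_t insn)
{
	struct sketch_ls r = { SKETCH_LS_NONE, 0 };

	if ((insn & 0x3c000000) == 0x28000000) {
		/* Pair: imm7 scaled by 4 (W) or 8 (X) bytes */
		int scale = 2 + ((insn >> 31) & 1);

		r.mode = SKETCH_LS_PAIR;
		r.offset = sketch_sign_extend((insn >> 15) & 0x7f, 7) * (1 << scale);
	} else if ((insn & 0x3c000000) != 0x38000000) {
		/* not a GP load/store register form */
	} else if ((insn & 0x3b000000) == 0x39000000) {
		/* Unsigned offset: imm12 scaled by the access size */
		r.mode = SKETCH_LS_UNSIGNED;
		r.offset = (int32_t)(((insn >> 10) & 0xfff) << (insn >> 30));
	} else if ((insn & 0x3b200c00) == 0x38200800) {
		/* Register offset: the offset lives in Rm, so report 0 */
		r.mode = SKETCH_LS_REG_OFF;
	} else if ((insn & 0x3b200000) == 0x38000000) {
		/* Pre/post-indexed: unscaled signed imm9 */
		r.mode = SKETCH_LS_PREPOST;
		r.offset = sketch_sign_extend((insn >> 12) & 0x1ff, 9);
	}

	return r;
}
```

For example, 0xa9bf7bfd (`stp x29, x30, [sp, #-16]!`) decodes as a pair
with offset -16, 0xf9400401 (`ldr x1, [x0, #8]`) as unsigned offset 8,
0xf85f8c20 (`ldr x0, [x1, #-8]!`) as pre-indexed -8, and 0xf8626820
(`ldr x0, [x1, x2]`) as register offset.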

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 125 ++++++++++++++++++
 tools/perf/util/include/dwarf-regs.h          |   7 +
 2 files changed, 132 insertions(+)

diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
index 593ca7d4fccc..26f296624966 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
@@ -1,8 +1,133 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <errno.h>
 #include <dwarf-regs.h>
+#include "../annotate.h"
 #include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
 
+/*
+ * ARM64 instruction field extraction.
+ * Mirrors definitions in annotate-arm64.c.
+ */
+#define A64_RT(insn)	((insn) & 0x1f)
+#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
+#define A64_RT2(insn)	(((insn) >> 10) & 0x1f)
+#define A64_RM(insn)	(((insn) >> 16) & 0x1f)
+
+/*
+ * Load/Store encoding sub-class detection.
+ * Derived from ARM Architecture Reference Manual, C4.1.
+ *
+ * Load/Store Pair (offset/pre/post): bits[29:27]=101, bit[26]=0
+ * Load/Store Register:              bits[29:27]=111, bit[26]=0
+ *   - Unsigned offset:              bits[25:24]=01
+ *   - Pre/Post-indexed:             bits[25:24]=00, bit[21]=0
+ *   - Register offset:              bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_PAIR_MASK		0x3c000000
+#define A64_INSN_LS_PAIR_VAL		0x28000000
+
+#define A64_INSN_LS_REG_MASK		0x3c000000
+#define A64_INSN_LS_REG_VAL		0x38000000
+
+#define A64_INSN_LS_UNSIGNED_MASK	0x3b000000
+#define A64_INSN_LS_UNSIGNED_VAL	0x39000000
+
+#define A64_INSN_LS_PREPOST_MASK	0x3b200000
+#define A64_INSN_LS_PREPOST_VAL		0x38000000
+
+#define A64_INSN_LS_REG_OFF_MASK	0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL	0x38200800
+
+static int arm64_get_immoff_unsigned(u32 insn)
+{
+	int size = (insn >> 30) & 0x3;
+	int imm12 = (insn >> 10) & 0xfff;
+
+	return imm12 << size;
+}
+
+static int arm64_get_immoff_prepost(u32 insn)
+{
+	int imm9 = (insn >> 12) & 0x1ff;
+
+	/* sign-extend 9-bit immediate */
+	if (imm9 & 0x100)
+		imm9 |= ~0x1ff;
+
+	return imm9;
+}
+
+static int arm64_get_immoff_pair(u32 insn)
+{
+	int imm7 = (insn >> 15) & 0x7f;
+	int scale = 2 + ((insn >> 31) & 1);
+
+	/* sign-extend 7-bit immediate */
+	if (imm7 & 0x40)
+		imm7 |= ~0x7f;
+
+	return imm7 << scale;
+}
+
+/*
+ * Fills op_loc fields depending on whether it is a source or target operand.
+ *
+ * ARM64 load/store encoding forms:
+ *   Register (unsigned offset):  [Rn, #imm12 << scale]
+ *   Register (pre/post-indexed): [Rn, #imm9]  or  [Rn], #imm9
+ *   Register (register offset):  [Rn, Rm{, extend/shift}]
+ *   Pair:                        [Rn, #imm7 << scale]
+ *
+ * For source (memory) operand: reg1=Rn (base), offset=immediate
+ * For target (register) operand: reg1=Rt
+ */
+void get_arm64_regs(u32 raw_insn, int is_source,
+		    struct annotated_op_loc *op_loc)
+{
+	if (is_source)
+		op_loc->reg1 = A64_RN(raw_insn);
+	else
+		op_loc->reg1 = A64_RT(raw_insn);
+
+	if (op_loc->multi_regs) {
+		/* LDP/STP pair: second register is Rt2 (bits[14:10]) */
+		if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL)
+			op_loc->reg2 = A64_RT2(raw_insn);
+		else
+			op_loc->reg2 = A64_RM(raw_insn);
+	}
+
+	if (!op_loc->mem_ref || !is_source)
+		return;
+
+	/* Load/Store Pair */
+	if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL) {
+		op_loc->offset = arm64_get_immoff_pair(raw_insn);
+		return;
+	}
+
+	/* Load/Store Register */
+	if ((raw_insn & A64_INSN_LS_REG_MASK) == A64_INSN_LS_REG_VAL) {
+		/* Unsigned offset */
+		if ((raw_insn & A64_INSN_LS_UNSIGNED_MASK) == A64_INSN_LS_UNSIGNED_VAL) {
+			op_loc->offset = arm64_get_immoff_unsigned(raw_insn);
+			return;
+		}
+
+		/* Register offset */
+		if ((raw_insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL) {
+			op_loc->offset = 0;
+			return;
+		}
+
+		/* Pre/Post-indexed */
+		if ((raw_insn & A64_INSN_LS_PREPOST_MASK) == A64_INSN_LS_PREPOST_VAL) {
+			op_loc->offset = arm64_get_immoff_prepost(raw_insn);
+			return;
+		}
+	}
+}
+
 int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum)
 {
 	if (perf_regnum < 0 || perf_regnum >= PERF_REG_ARM64_MAX)
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..c3f730d2fd88 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -129,6 +129,7 @@ int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, unsi
 				     bool only_libdw_supported);
 
 void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
+void get_arm64_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
 
 #else /* HAVE_LIBDW_SUPPORT */
 
@@ -144,6 +145,12 @@ static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source _
 {
 	return;
 }
+
+static inline void get_arm64_regs(u32 raw_insn __maybe_unused, int is_source __maybe_unused,
+		struct annotated_op_loc *op_loc __maybe_unused)
+{
+	return;
+}
 #endif
 
 #endif
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
                   ` (2 preceding siblings ...)
  2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
  4 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add ARM64 support to the core dispatch and initialization points:

1. annotate_get_insn_location(): Add an arm64 branch alongside the
   existing powerpc branch to call get_arm64_regs() for extracting
   register numbers and memory offsets from raw instructions.

2. arch_supports_insn_tracking(): Include arm64 so that
   find_data_type_block() can perform instruction-level type state
   tracking on ARM64.

3. init_type_state(): Add arm64 branch to set caller-saved registers
   (x0-x18 per AAPCS64) and stack register (SP, DWARF reg 31).
   Without this, stack_reg defaults to 0 (x0) after memset, causing
   x0-based memory accesses to be misidentified as stack accesses.
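
The stack_reg problem in item 3 can be shown with a reduced model. The
struct and function below are hypothetical simplifications of the type
state, kept only to illustrate the zero-initialization hazard.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_ARM64_REG_SP 31	/* DWARF number of SP on AArch64 */

struct sketch_type_state {
	int stack_reg;
};

/*
 * Reports whether a memory access through base_reg would be treated
 * as a stack access, with or without ARM64-aware initialization.
 */
static bool sketch_base_is_stack(bool arm64_aware, int base_reg)
{
	struct sketch_type_state st;

	memset(&st, 0, sizeof(st));	/* stack_reg == 0, i.e. x0 */
	if (arm64_aware)
		st.stack_reg = SKETCH_ARM64_REG_SP;

	return base_reg == st.stack_reg;
}
```

Without the ARM64 branch, an access based on x0 (register 0) compares
equal to stack_reg; with it, only SP (register 31) does.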

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 tools/perf/util/annotate-data.c | 18 +++++++++++++++++-
 tools/perf/util/annotate.c      | 12 ++++++++----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 1eff0a27237d..c04ad66ff077 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -29,6 +29,11 @@
 /* register number of the stack pointer */
 #define X86_REG_SP 7
 
+/* ARM64 DWARF register numbers: x0-x30=0-30, SP=31 */
+#define ARM64_REG_SP 31
+#define ARM64_REG_LR 30
+#define ARM64_REG_FP 29
+
 static void delete_var_types(struct die_var_type *var_types);
 
 #define pr_debug_dtp(fmt, ...)					\
@@ -178,6 +183,16 @@ static void init_type_state(struct type_state *state, const struct arch *arch)
 		state->ret_reg = 0;
 		state->stack_reg = X86_REG_SP;
 	}
+
+	if (arch__is_arm64(arch)) {
+		int i;
+
+		/* ARM64 ABI: x0-x18 are caller-saved */
+		for (i = 0; i <= 18; i++)
+			state->regs[i].caller_saved = true;
+		state->ret_reg = 0;
+		state->stack_reg = ARM64_REG_SP;
+	}
 }
 
 static void exit_type_state(struct type_state *state)
@@ -1421,7 +1436,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
 
 static int arch_supports_insn_tracking(struct data_loc_info *dloc)
 {
-	if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)))
+	if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)) ||
+	    (arch__is_arm64(dloc->arch)))
 		return 1;
 	return 0;
 }
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..bd734826538d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2574,19 +2574,23 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
 		op_loc->reg2 = -1;
 
 		if (insn_str == NULL) {
-			if (!arch__is_powerpc(arch))
+			if (!arch__is_powerpc(arch) && !arch__is_arm64(arch))
 				continue;
 		}
 
 		/*
-		 * For powerpc, call get_powerpc_regs function which extracts the
-		 * required fields for op_loc, ie reg1, reg2, offset from the
-		 * raw instruction.
+		 * For powerpc and arm64, call arch-specific functions to
+		 * extract the required fields for op_loc (reg1, reg2, offset)
+		 * from the raw instruction.
 		 */
 		if (arch__is_powerpc(arch)) {
 			op_loc->mem_ref = mem_ref;
 			op_loc->multi_regs = multi_regs;
 			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
+		} else if (arch__is_arm64(arch)) {
+			op_loc->mem_ref = mem_ref;
+			op_loc->multi_regs = multi_regs;
+			get_arm64_regs(dl->raw.raw_insn, !i, op_loc);
 		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
 			op_loc->mem_ref = true;
 			op_loc->multi_regs = multi_regs;
-- 
2.51.2.612.gdc70283dfc



* [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support
  2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
                   ` (3 preceding siblings ...)
  2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
  2026-06-23 13:32   ` sashiko-bot
  4 siblings, 1 reply; 8+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
	linux-kernel

Add data type profiling support for ARM64, enabling 'perf annotate
--code-with-type' to show which data types each memory instruction
accesses. This follows the PowerPC model of raw 32-bit instruction
decoding rather than x86's text-based operand parsing.

Instruction classification (check_arm64_insn):
  Classify instructions by raw encoding into load/store, arithmetic
  (ADD immediate, ADRP), and register move categories. GP load/store
  is detected by the bits[27:25] pattern. LDR (literal) is excluded
  first because it has no Rn field (bits[23:5] hold imm19), so decoding
  it as a base+offset access would misread the registers. ADRP, ADD
  immediate (with ADDG/SUBG excluded via a tighter mask), and MOV
  (register) have their own mask/val pairs derived from the ARM ARM.
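
As an illustration of the classification order, here is a standalone
sketch that uses the same mask/value pairs as literals. The enum and
function names are hypothetical, and the three-way result is a
simplification of the ins_ops pointers the real function returns.

```c
#include <stdint.h>

enum sketch_insn_class {
	SKETCH_OTHER,
	SKETCH_LOAD_STORE,
	SKETCH_ARITH,
};

static enum sketch_insn_class sketch_classify(uint32_t insn)
{
	/* LDR (literal) must be rejected before the GP load/store test. */
	if ((insn & 0x3b000000) == 0x18000000)
		return SKETCH_OTHER;

	/* GP load/store: bits[27:25] = 100 */
	if ((insn & 0x0e000000) == 0x08000000)
		return SKETCH_LOAD_STORE;

	/* MOV (register), i.e. ORR Rd, ZR, Rm with no shift */
	if ((insn & 0x7fe0ffe0) == 0x2a0003e0)
		return SKETCH_ARITH;

	/* ADRP */
	if ((insn & 0x9f000000) == 0x90000000)
		return SKETCH_ARITH;

	/* ADD (immediate); bit[23] in the mask keeps ADDG/SUBG out */
	if ((insn & 0x7f800000) == 0x11000000)
		return SKETCH_ARITH;

	return SKETCH_OTHER;
}
```

The ordering matters: 0x58000040 (`ldr x0, <literal>`) also satisfies the
bits[27:25] test, so it would be treated as a base+offset load if the
literal check did not come first. Likewise 0x91800000 (ADDG) differs from
ADD (immediate) only in bit[23].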

Load/store operand parsing:
  Set mem_ref=true for all GP load/store instructions. Detect register
  offset addressing mode to set multi_regs=true when Rm is used as a
  second source operand.

Register state tracking (update_insn_state_arm64):
  Track three instruction patterns for type propagation:

  - ADRP Xd, #page: Compute the PC-relative page address using
    sign_extend64() and either resolve the global variable type
    directly or store the address as TSR_KIND_CONST for later
    resolution by ADD.

  - ADD Xd, Xn, #imm: If Xn holds an ADRP result (TSR_KIND_CONST),
    compute the full variable address and resolve via
    get_global_var_type(). This handles the common ARM64 global
    variable access pattern: adrp+add+ldr.

  - MOV Xd, Xm: Propagate type state including kind, offset, and
    imm_value from the source to destination register.
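
The propagation across the three patterns can be modelled with a reduced
register table. This is a sketch under simplifying assumptions: the types
and names are hypothetical, SKETCH_KIND_GLOBAL stands in for a successful
get_global_var_type() lookup, and register 31 is not given any special
treatment.

```c
#include <stdint.h>

enum sketch_kind {
	SKETCH_KIND_NONE,	/* nothing known about the register */
	SKETCH_KIND_CONST,	/* holds an ADRP page address */
	SKETCH_KIND_GLOBAL,	/* holds the address of a global variable */
};

struct sketch_reg {
	enum sketch_kind kind;
	uint64_t imm_value;
};

struct sketch_state {
	struct sketch_reg regs[32];
};

static void sketch_update(struct sketch_state *st, uint64_t pc, uint32_t insn)
{
	unsigned int rd = insn & 0x1f;
	unsigned int rn = (insn >> 5) & 0x1f;
	unsigned int rm = (insn >> 16) & 0x1f;

	if ((insn & 0x9f000000) == 0x90000000) {
		/* ADRP: imm = sign_extend(immhi:immlo, 21) pages */
		int64_t imm = (((insn >> 5) & 0x7ffff) << 2) | ((insn >> 29) & 0x3);

		if (imm & (1LL << 20))
			imm -= 1LL << 21;
		st->regs[rd].kind = SKETCH_KIND_CONST;
		st->regs[rd].imm_value = (pc & ~0xfffULL) + (uint64_t)imm * 4096;
	} else if ((insn & 0x7f800000) == 0x11000000) {
		/* ADD (immediate): resolves only if Rn holds a page address */
		struct sketch_reg src = st->regs[rn];
		uint64_t imm12 = (insn >> 10) & 0xfff;

		if (insn & (1u << 22))	/* sh=1: immediate shifted left by 12 */
			imm12 <<= 12;
		if (src.kind == SKETCH_KIND_CONST) {
			st->regs[rd].kind = SKETCH_KIND_GLOBAL;
			st->regs[rd].imm_value = src.imm_value + imm12;
		} else {
			st->regs[rd].kind = SKETCH_KIND_NONE;
		}
	} else if ((insn & 0x7fe0ffe0) == 0x2a0003e0) {
		/* MOV (register): copy the whole state from Rm to Rd */
		st->regs[rd] = st->regs[rm];
	}
}

/* Runs "adrp x0; add x0, x0, #0x10; mov x1, x0" and reports x1. */
static uint64_t sketch_demo(void)
{
	struct sketch_state st = { 0 };

	sketch_update(&st, 0x412344, 0xb0000000);	/* adrp x0, next page */
	sketch_update(&st, 0x412348, 0x91004000);	/* add  x0, x0, #0x10 */
	sketch_update(&st, 0x41234c, 0xaa0003e1);	/* mov  x1, x0 */

	return st.regs[1].kind == SKETCH_KIND_GLOBAL ? st.regs[1].imm_value : 0;
}
```

In this model x1 ends up holding 0x413010, so a later load through x1
could be matched against the global variable at that address.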

Known limitations:
  - The adrp+ldr pattern (without intermediate ADD, using lo12 folded
    into the LDR offset) is not yet handled. This requires extending
    check_matching_type() to resolve TSR_KIND_CONST with the load
    offset, which can be added incrementally.
  - Pointer chain tracking (load-from-memory propagating type to the
    destination register) is not implemented, matching PowerPC's
    current scope.

Architecture initialization:
  Register the update_insn_state callback for instruction-level type
  state tracking.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 332 ++++++++++++++++++
 1 file changed, 332 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index b98aaf9a8a7b..887ed22c4ca0 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -1,12 +1,21 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/bitops.h>
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <errno.h>
+#include <inttypes.h>
 #include <stdlib.h>
 #include <string.h>
 #include <linux/zalloc.h>
 #include <regex.h>
 #include "../annotate.h"
+#include "../debug.h"
 #include "../disasm.h"
+#ifdef HAVE_LIBDW_SUPPORT
+#include "../annotate-data.h"
+#include "../map.h"
+#include "../symbol.h"
+#endif
 
 struct arch_arm64 {
 	struct arch arch;
@@ -14,6 +23,47 @@ struct arch_arm64 {
 	regex_t jump_insn;
 };
 
+/*
+ * ARM64 instruction encoding masks and values.
+ * Derived from ARM Architecture Reference Manual, C4.1 A64 encoding index.
+ *
+ * These mirror the definitions in arch/arm64/include/asm/insn.h but are
+ * duplicated here because that header depends on kernel-only macros
+ * (BUILD_BUG_ON, __always_inline).
+ */
+
+/* GP Load/Store: bit[27]=1, bit[26]=0 (GP, not SIMD/FP), bit[25]=0 */
+#define A64_INSN_GP_LS_MASK	0x0e000000
+#define A64_INSN_GP_LS_VAL	0x08000000
+
+/* LDR/LDRSW (literal): bits[29:27]=011, bits[25:24]=00 -- must be excluded from GP LS */
+#define A64_INSN_LDR_LIT_MASK	0x3b000000
+#define A64_INSN_LDR_LIT_VAL	0x18000000
+
+/*
+ * Load/Store register (register offset):
+ * bits[29:27]=111, bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_REG_OFF_MASK	0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL	0x38200800
+
+/* ADRP: mask=0x9F000000, val=0x90000000 */
+#define A64_INSN_ADRP_MASK	0x9f000000
+#define A64_INSN_ADRP_VAL	0x90000000
+
+/* ADD (immediate): mask=0x7F800000, val=0x11000000 (excludes ADDG/SUBG) */
+#define A64_INSN_ADD_IMM_MASK	0x7f800000
+#define A64_INSN_ADD_IMM_VAL	0x11000000
+
+/* MOV (register) = ORR Xd/Wd, XZR/WZR, Xm/Wm: Rn=11111, imm6=000000 */
+#define A64_INSN_MOV_REG_MASK	0x7fe0ffe0
+#define A64_INSN_MOV_REG_VAL	0x2a0003e0
+
+/* Instruction field extraction */
+#define A64_RT(insn)	((insn) & 0x1f)
+#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
+#define A64_RM(insn)	(((insn) >> 16) & 0x1f)
+
 static int arm64_mov__parse(const struct arch *arch __maybe_unused,
 			    struct ins_operands *ops,
 			    struct map_symbol *ms __maybe_unused,
@@ -69,6 +119,285 @@ static const struct ins_ops arm64_mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+/*
+ * ARM64 load/store instruction parser.
+ * Sets mem_ref and multi_regs based on raw instruction encoding.
+ */
+static int arm64_load_store__parse(const struct arch *arch __maybe_unused,
+				   struct ins_operands *ops,
+				   struct map_symbol *ms __maybe_unused,
+				   struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+
+	ops->source.mem_ref = true;
+	ops->source.multi_regs = false;
+
+	/* Load/Store register (register offset) uses Rm as second source */
+	if ((insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL)
+		ops->source.multi_regs = true;
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static int arm64_load_store__scnprintf(const struct ins *ins, char *bf,
+				       size_t size,
+				       struct ins_operands *ops,
+				       int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			 ops->raw);
+}
+
+static const struct ins_ops arm64_load_store_ops = {
+	.parse     = arm64_load_store__parse,
+	.scnprintf = arm64_load_store__scnprintf,
+};
+
+static int arm64_arithmetic__parse(const struct arch *arch __maybe_unused,
+				   struct ins_operands *ops,
+				   struct map_symbol *ms __maybe_unused,
+				   struct disasm_line *dl __maybe_unused)
+{
+	ops->source.mem_ref = false;
+	ops->source.multi_regs = false;
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static int arm64_arithmetic__scnprintf(const struct ins *ins, char *bf,
+				       size_t size,
+				       struct ins_operands *ops,
+				       int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			 ops->raw);
+}
+
+static const struct ins_ops arm64_arithmetic_ops = {
+	.parse     = arm64_arithmetic__parse,
+	.scnprintf = arm64_arithmetic__scnprintf,
+};
+
+/*
+ * Classify ARM64 instructions by raw encoding for data type profiling.
+ */
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+
+	/* Exclude LDR/LDRSW (literal) before matching GP Load/Store */
+	if ((insn & A64_INSN_LDR_LIT_MASK) == A64_INSN_LDR_LIT_VAL)
+		return NULL;
+
+	if ((insn & A64_INSN_GP_LS_MASK) == A64_INSN_GP_LS_VAL)
+		return &arm64_load_store_ops;
+
+	if ((insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL)
+		return &arm64_arithmetic_ops;
+
+	if ((insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL)
+		return &arm64_arithmetic_ops;
+
+	if ((insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL)
+		return &arm64_arithmetic_ops;
+
+	return NULL;
+}
+
+#ifdef HAVE_LIBDW_SUPPORT
+
+static inline bool arm64_is_adrp(u32 insn)
+{
+	return (insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL;
+}
+
+static inline bool arm64_is_add_imm(u32 insn)
+{
+	return (insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL;
+}
+
+static inline bool arm64_is_mov_reg(u32 insn)
+{
+	return (insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL;
+}
+
+/*
+ * Compute the page address from an ADRP instruction.
+ * ADRP Xd, #imm: Xd = (PC & ~0xFFF) + (imm << 12)
+ * immhi = bits[23:5] (19 bits), immlo = bits[30:29] (2 bits)
+ * imm = sign_extend(immhi:immlo, 21)
+ */
+static u64 arm64_adrp_target(u64 pc, u32 insn)
+{
+	u64 immhi = (insn >> 5) & 0x7ffff;
+	u64 immlo = (insn >> 29) & 0x3;
+	u64 imm = (immhi << 2) | immlo;
+
+	return (pc & ~0xfffULL) + (sign_extend64(imm, 20) << 12);
+}
+
+/*
+ * Track register state for ARM64 instructions.
+ *
+ * Handles three instruction patterns:
+ *
+ * 1. ADRP Xd, #page - computes a PC-relative page address.
+ *    Track the computed address so a subsequent LDR can resolve
+ *    the global variable.
+ *
+ * 2. ADD Xd, Xn, #imm - if Xn holds a tracked address (from ADRP),
+ *    propagate the adjusted address to Xd.
+ *
+ * 3. MOV Xd, Xm - propagate type state from Xm to Xd.
+ */
+static void update_insn_state_arm64(struct type_state *state,
+				    struct data_loc_info *dloc,
+				    Dwarf_Die *cu_die,
+				    struct disasm_line *dl)
+{
+	u32 insn = dl->raw.raw_insn;
+	int rd, rn;
+	struct type_state_reg *tsr;
+
+	if (arm64_is_adrp(insn)) {
+		u64 pc, page_addr;
+		int offset;
+		Dwarf_Die type_die;
+
+		rd = A64_RT(insn);
+		if (!has_reg_type(state, rd))
+			return;
+
+		tsr = &state->regs[rd];
+
+		pc = map__rip_2objdump(dloc->ms->map,
+				       dloc->ms->sym->start + dl->al.offset);
+		page_addr = arm64_adrp_target(pc, insn);
+
+		/*
+		 * Try to resolve the global variable at this page address.
+		 * If not found, store it as a constant for later ADD resolution.
+		 */
+		if (get_global_var_type(cu_die, dloc,
+					dloc->ms->sym->start + dl->al.offset,
+					page_addr, &offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_POINTER;
+			tsr->offset = offset;
+			tsr->ok = true;
+
+			pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d",
+				     (u32)dl->al.offset, page_addr, rd);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		} else {
+			tsr->kind = TSR_KIND_CONST;
+			tsr->imm_value = page_addr;
+			tsr->ok = true;
+
+			pr_debug_dtp("adrp [%x] page=%#"PRIx64" -> reg%d\n",
+				     (u32)dl->al.offset, page_addr, rd);
+		}
+		return;
+	}
+
+	if (arm64_is_add_imm(insn)) {
+		int imm12, shift;
+		u64 var_addr;
+		int offset;
+		Dwarf_Die type_die;
+
+		rd = A64_RT(insn);
+		rn = A64_RN(insn);
+
+		if (!has_reg_type(state, rd) || !has_reg_type(state, rn))
+			return;
+
+		tsr = &state->regs[rd];
+
+		if (!state->regs[rn].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		imm12 = (insn >> 10) & 0xfff;
+		shift = ((insn >> 22) & 0x1) ? 12 : 0;
+
+		/*
+		 * If Rn holds an ADRP result (TSR_KIND_CONST), compute
+		 * the full address and try to resolve the global variable.
+		 */
+		if (state->regs[rn].kind == TSR_KIND_CONST) {
+			var_addr = state->regs[rn].imm_value +
+				   ((u64)imm12 << shift);
+
+			if (get_global_var_type(cu_die, dloc,
+						dloc->ms->sym->start + dl->al.offset,
+						var_addr, &offset, &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_POINTER;
+				tsr->offset = offset;
+				tsr->ok = true;
+
+				pr_debug_dtp("add [%x] global addr=%#"PRIx64" -> reg%d",
+					     (u32)dl->al.offset, var_addr, rd);
+				pr_debug_type_name(&tsr->type, tsr->kind);
+				return;
+			}
+		}
+
+		/* Otherwise propagate existing type with adjusted offset */
+		if (state->regs[rn].kind == TSR_KIND_TYPE ||
+		    state->regs[rn].kind == TSR_KIND_POINTER) {
+			tsr->type = state->regs[rn].type;
+			tsr->kind = state->regs[rn].kind;
+			tsr->offset = state->regs[rn].offset + (imm12 << shift);
+			tsr->ok = true;
+
+			pr_debug_dtp("add [%x] imm=%#x reg%d -> reg%d",
+				     (u32)dl->al.offset, imm12 << shift, rn, rd);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		} else {
+			tsr->ok = false;
+		}
+		return;
+	}
+
+	if (arm64_is_mov_reg(insn)) {
+		int rm;
+
+		rd = A64_RT(insn);
+		rm = A64_RM(insn);
+
+		if (!has_reg_type(state, rd))
+			return;
+
+		tsr = &state->regs[rd];
+
+		if (!has_reg_type(state, rm) || !state->regs[rm].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		tsr->type = state->regs[rm].type;
+		tsr->kind = state->regs[rm].kind;
+		tsr->offset = state->regs[rm].offset;
+		tsr->imm_value = state->regs[rm].imm_value;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			     (u32)dl->al.offset, rm, rd);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+		return;
+	}
+}
+#endif /* HAVE_LIBDW_SUPPORT */
+
 static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
 {
 	struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
@@ -105,6 +434,9 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.skip_functions_char = '+';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
 	annotate_opts.show_asm_raw = true;
+#ifdef HAVE_LIBDW_SUPPORT
+	arch->update_insn_state = update_insn_state_arm64;
+#endif
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
-- 
2.51.2.612.gdc70283dfc



* Re: [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing
  2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
@ 2026-06-23 13:19   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-23 13:19 UTC (permalink / raw)
  To: Shuai Xue; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] `disasm_line__parse_arm64` corrupts disassembly lines from LLVM and Capstone by assuming the first word is always raw hex bytes.
- [High] Assigning `dl->ins.name = ""` leads to a glibc crash (`free(): invalid pointer`) during teardown.
- [Low] Missing explicit inclusion of `<stdio.h>` for `sscanf`.

Pre-existing issues:
- [High] Reads past the end of the line buffer when processing raw hexadecimal instructions.
--

commit 76213afac23a5e082aabd6700958b58201fa5b8a
Author: Shuai Xue <xueshuai@linux.alibaba.com>

perf disasm: Add ARM64 architecture detection and raw instruction parsing

This commit adds an architecture detection helper for ARM64 binaries and
introduces a new parser to extract raw 32-bit instruction words from ARM64
objdump output. It wires up the new ARM64 parsing path alongside the existing
PowerPC implementation.

> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 59ba88e1f7443..83fad4f01442b 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -902,6 +916,53 @@ static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_ar

[Severity: High]
This is a pre-existing issue, but does the pointer arithmetic in
disasm_line__parse_powerpc() read past the end of the line buffer?

char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
char *name = skip_spaces(name_raw_insn + RAW_BYTES);

If symbol__disassemble_raw() populates the line argument with an 8-character
string, and RAW_BYTES is hardcoded to 11, could skip_spaces() end up reading
from uninitialized stack memory?
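
One way to rule out the over-read is to bound the skip by the string length
instead of adding a constant to the pointer. The sketch below is illustrative
only: skip_at_most() is a hypothetical helper, not an existing perf function,
and it assumes the caller passes a NUL-terminated line.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of perf): advance past at most `n`
 * bytes of `s`, stopping early at the terminating NUL so the returned
 * pointer never leaves the string.
 */
static const char *skip_at_most(const char *s, size_t n)
{
	while (n-- && *s)
		s++;
	return s;
}
```

With an 8-character line and n = 11, the result points at the NUL terminator
rather than beyond it, so a following skip_spaces() would stop immediately.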

> +/*
> + * Parses ARM64 disassembly output which includes raw instruction bytes.
> + * ARM64 objdump format:
> + *   a9bf7bfd 	stp	x29, x30, [sp, #-16]!
> + *
> + * The raw instruction is a hex word (typically 8 chars) followed by whitespace.
> + */
> +static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args)
> +{
> +	char *line = dl->al.line;
> +	const char **namep = &dl->ins.name;
> +	char **rawp = &dl->ops.raw;
> +	char *name_raw_insn = skip_spaces(line);
> +	char *end_raw, *name, *tmp_raw_insn;
> +	int ret = 0;
> +
> +	if (name_raw_insn[0] == '\0')
> +		return -1;
> +
> +	/* Find end of raw instruction hex by looking for whitespace */
> +	end_raw = name_raw_insn;
> +	while (*end_raw && !isspace(*end_raw))
> +		end_raw++;
> +
> +	name = skip_spaces(end_raw);
> +
> +	if (args->options->disassembler_used)
> +		ret = disasm_line__parse(name, namep, rawp);

[Severity: High]
If a user analyzes a binary with the capstone or llvm disassemblers, the
output lacks raw hexadecimal bytes. Does this loop assume the first
non-whitespace token is always the raw hex instruction?

If so, the mnemonic itself would be consumed as the raw instruction word, and
the first operand would then be parsed as the mnemonic. Could that happen here?
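
A possible guard is to confirm that the first token really looks like a raw
instruction word before skipping it. The sketch below is a heuristic under
stated assumptions: starts_with_raw_insn() is a hypothetical name, and it
assumes the raw word is always printed as exactly 8 hex digits, which the
patch comment only describes as typical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check (not part of perf): true only if the first
 * whitespace-delimited token of `line` is exactly 8 hex digits.
 */
static bool starts_with_raw_insn(const char *line)
{
	size_t i;

	while (*line == ' ' || *line == '\t')
		line++;

	for (i = 0; i < 8; i++) {
		/* isxdigit('\0') is false, so this never reads past the NUL */
		if (!isxdigit((unsigned char)line[i]))
			return false;
	}

	/* The token must end right after the eighth digit. */
	return line[8] == '\0' || isspace((unsigned char)line[8]);
}
```

A line such as "add x0, x1, #1" fails the check at the space after "add", so a
caller could fall back to parsing the line without a raw word.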

> +	else
> +		*namep = "";

[Severity: High]
When perf annotate is run with the source flag, the disassembler_used option
evaluates to 0. This assigns a read-only string literal to the namep pointer.

Later on, disasm_line__free() unconditionally attempts to free this memory:

void disasm_line__free(struct disasm_line *dl)
{
        zfree(&dl->ins.name);
        ...

Will attempting to free this string literal cause a glibc crash?
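
If the name must stay freeable, one option is to store a heap-allocated empty
string instead of a literal. The sketch below uses stand-in types: struct
demo_ins and both helpers are hypothetical and only mirror the ownership rule
described above, not the real perf structures.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the instruction name owner. */
struct demo_ins {
	const char *name;
};

/* Store an empty but heap-allocated name so a later free() is valid. */
static int demo_set_empty_name(struct demo_ins *ins)
{
	/* One zeroed byte is a valid empty C string. */
	ins->name = calloc(1, 1);
	return ins->name ? 0 : -1;
}

/* Mirrors the zfree() pattern: free the name and clear the pointer. */
static void demo_ins_free(struct demo_ins *ins)
{
	free((void *)ins->name);
	ins->name = NULL;
}
```

The alternative is to leave the pointer NULL, since free(NULL) is a no-op, but
that only works if every reader of the name tolerates NULL.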

> +
> +	tmp_raw_insn = strndup(name_raw_insn, end_raw - name_raw_insn);
> +	if (tmp_raw_insn == NULL) {
> +		if (args->options->disassembler_used)
> +			zfree(namep);
> +		return -1;
> +	}
> +
> +	remove_spaces(tmp_raw_insn);
> +
> +	if (sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn) != 1)

[Severity: Low]
Does this new usage of sscanf require an explicit include of <stdio.h> at the
top of the file to prevent musl libc build failures?

> +		dl->raw.raw_insn = 0;
> +	free(tmp_raw_insn);
> +
> +	return ret;
> +}
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623130234.8709-1-xueshuai@linux.alibaba.com?part=2


* Re: [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support
  2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 13:32   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-23 13:32 UTC (permalink / raw)
  To: Shuai Xue; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Eager global variable resolution in ADRP causes incorrect type binding when the page boundary falls inside an adjacent variable.
- [High] Missing register type state invalidation for unhandled instructions leads to stale type propagation.
- [Low] Comment mismatch regarding bit 26 in A64_INSN_LDR_LIT_MASK.
--

commit 429208e90da6fddfa29cce218b8f6998420f5ba0
Author: Shuai Xue <xueshuai@linux.alibaba.com>

perf annotate-arch: Add ARM64 data type profiling support
    
This commit adds data type profiling support for ARM64, allowing
perf annotate --code-with-type to show data types accessed by memory
instructions. It implements raw instruction decoding to track register
states across ADRP, ADD, and MOV instructions.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index b98aaf9a8a7be..887ed22c4ca05 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> @@ -14,6 +23,47 @@ struct arch_arm64 {
>  	regex_t jump_insn;
>  };
>  
> +/*
> + * ARM64 instruction encoding masks and values.
> + * Derived from ARM Architecture Reference Manual, C4.1 A64 encoding index.
> + *
> + * These mirror the definitions in arch/arm64/include/asm/insn.h but are
> + * duplicated here because that header depends on kernel-only macros
> + * (BUILD_BUG_ON, __always_inline).
> + */
> +
> +/* GP Load/Store: bit[27]=1, bit[26]=0 (GP, not SIMD/FP), bit[25]=0 */
> +#define A64_INSN_GP_LS_MASK	0x0e000000
> +#define A64_INSN_GP_LS_VAL	0x08000000
> +
> +/* LDR/LDRSW (literal): bits[29:27]=011, bit[26]=0 -- must be excluded from GP LS */
> +#define A64_INSN_LDR_LIT_MASK	0x3b000000

[Severity: Low]
This isn't a bug, but does the mask definition exclude bit 26, contradicting
the preceding comment that states bit[26]=0?
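
The mismatch can be checked mechanically. The sketch below copies the two
constants from the quoted patch; the helper names mask_tests_bit() and
is_ldr_literal() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted patch. */
#define A64_INSN_LDR_LIT_MASK	0x3b000000u
#define A64_INSN_LDR_LIT_VAL	0x18000000u

/* Hypothetical helper: does `mask` constrain bit number `bit`? */
static bool mask_tests_bit(uint32_t mask, unsigned int bit)
{
	return (mask >> bit) & 1u;
}

/* Same comparison the patch performs in check_arm64_insn(). */
static bool is_ldr_literal(uint32_t insn)
{
	return (insn & A64_INSN_LDR_LIT_MASK) == A64_INSN_LDR_LIT_VAL;
}
```

The mask covers bits 29, 28, 27, 25 and 24 only, so both 0x58000000
(LDR X0, literal, bit 26 clear) and 0x1c000000 (LDR S0, literal, bit 26 set)
match. Either the comment or the mask would need adjusting to agree.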

> +#define A64_INSN_LDR_LIT_VAL	0x18000000
> +
> +/*
> + * Load/Store register (register offset):
[ ... ]
> @@ -69,6 +119,285 @@ static const struct ins_ops arm64_mov_ops = {
[ ... ]
> +static void update_insn_state_arm64(struct type_state *state,
> +				    struct data_loc_info *dloc,
> +				    Dwarf_Die *cu_die,
> +				    struct disasm_line *dl)
> +{
> +	u32 insn = dl->raw.raw_insn;
> +	int rd, rn;
> +	struct type_state_reg *tsr;
> +
> +	if (arm64_is_adrp(insn)) {
> +		u64 pc, page_addr;
> +		int offset;
> +		Dwarf_Die type_die;
> +
> +		rd = A64_RT(insn);
> +		if (!has_reg_type(state, rd))
> +			return;
> +
> +		tsr = &state->regs[rd];
> +
> +		pc = map__rip_2objdump(dloc->ms->map,
> +				       dloc->ms->sym->start + dl->al.offset);
> +		page_addr = arm64_adrp_target(pc, insn);
> +
> +		/*
> +		 * Try to resolve the global variable at this page address.
> +		 * If not found, store it as a constant for later ADD resolution.
> +		 */
> +		if (get_global_var_type(cu_die, dloc,
> +					dloc->ms->sym->start + dl->al.offset,
> +					page_addr, &offset, &type_die)) {

[Severity: High]
Does eager global variable resolution in ADRP cause incorrect type binding when
the page boundary falls inside an adjacent variable?

If the 4KB-aligned page address happens to fall within the memory span of a
preceding global variable, get_global_var_type() might immediately resolve the
address to that wrong variable. The subsequent ADD instruction then adds its
immediate offset to the wrong variable's offset, never attempting to resolve
the true target variable.

Should type resolution be deferred until the full address is formed in the
subsequent ADD?
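
To make the concern concrete, the sketch below recomputes the two addresses
involved in an ADRP + ADD pair. It is a standalone illustration, not the
patch code: sext(), adrp_page() and add_imm_result() are hypothetical names,
and the field positions follow the comments in the quoted patch.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `v`. */
static int64_t sext(uint64_t v, unsigned int bits)
{
	uint64_t sign = 1ULL << (bits - 1);

	v &= (1ULL << bits) - 1;
	return (int64_t)((v ^ sign) - sign);
}

/* ADRP: page = (pc & ~0xfff) + (sign_extend(immhi:immlo, 21) << 12) */
static uint64_t adrp_page(uint64_t pc, uint32_t insn)
{
	uint64_t immhi = (insn >> 5) & 0x7ffff;
	uint64_t immlo = (insn >> 29) & 0x3;
	int64_t imm = sext((immhi << 2) | immlo, 21);

	/* Multiply in unsigned arithmetic to avoid shifting a negative value. */
	return (pc & ~0xfffULL) + (uint64_t)imm * 4096;
}

/* ADD (immediate): result = base + (imm12 << (sh ? 12 : 0)) */
static uint64_t add_imm_result(uint64_t base, uint32_t insn)
{
	uint64_t imm12 = (insn >> 10) & 0xfff;
	unsigned int shift = ((insn >> 22) & 0x1) ? 12 : 0;

	return base + (imm12 << shift);
}
```

For example, with pc = 0x400abc, `adrp x0` encoded as 0xb0000000 yields page
0x401000, and a following `add x0, x0, #0x10` (0x91004000) yields 0x401010.
Only the second address is the real variable address, so a lookup at 0x401000
may land in whatever object happens to span the page boundary.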

> +			tsr->type = type_die;
> +			tsr->kind = TSR_KIND_POINTER;
> +			tsr->offset = offset;
> +			tsr->ok = true;
> +
> +			pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d",
> +				     (u32)dl->al.offset, page_addr, rd);
> +			pr_debug_type_name(&tsr->type, tsr->kind);
> +		} else {
[ ... ]
> +	if (arm64_is_mov_reg(insn)) {
> +		int rm;
> +
> +		rd = A64_RT(insn);
> +		rm = A64_RM(insn);
> +
> +		if (!has_reg_type(state, rd))
> +			return;
> +
> +		tsr = &state->regs[rd];
> +
> +		if (!has_reg_type(state, rm) || !state->regs[rm].ok) {
> +			tsr->ok = false;
> +			return;
> +		}
> +
> +		tsr->type = state->regs[rm].type;
> +		tsr->kind = state->regs[rm].kind;
> +		tsr->offset = state->regs[rm].offset;
> +		tsr->imm_value = state->regs[rm].imm_value;
> +		tsr->ok = true;
> +
> +		pr_debug_dtp("mov [%x] reg%d -> reg%d",
> +			     (u32)dl->al.offset, rm, rd);
> +		pr_debug_type_name(&tsr->type, tsr->kind);
> +		return;
> +	}
> +}

[Severity: High]
Are we missing register type state invalidation for unhandled instructions
here?

If an unhandled instruction clobbers a register, this function simply ignores
the instruction and implicitly returns. The old type state for that register
is never invalidated, causing it to retain a stale data type. Subsequent
memory accesses using the clobbered register could falsely resolve against the
stale type, polluting the output with false positives.

Should there be an invalidation fallback for unhandled instructions, similar
to how x86 explicitly calls invalidate_reg_state() for unhandled clobbering
instructions?
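
A conservative fallback could look like the sketch below. Everything in it is
hypothetical: the demo_* types only imitate the `ok` flag of the real type
state, and treating bits[4:0] as the destination of every unhandled
instruction is an assumption that over-invalidates for stores, compares and
branches, and under-invalidates for cases such as load-pair or writeback
forms that also update a second register.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_REGS	31	/* x0-x30; register number 31 is SP/XZR */

/* Hypothetical stand-ins for the per-register type state. */
struct demo_reg_state {
	bool ok;
};

struct demo_state {
	struct demo_reg_state regs[DEMO_NR_REGS];
};

/*
 * Fallback for instructions the tracker does not model: assume
 * bits[4:0] name a register that gets overwritten and drop whatever
 * type was tracked for it.
 */
static void demo_invalidate_dest(struct demo_state *st, uint32_t insn)
{
	unsigned int rd = insn & 0x1f;

	if (rd < DEMO_NR_REGS)
		st->regs[rd].ok = false;
}
```

Called at the end of the state-update function, this would trade some lost
annotations for fewer stale ones. A fuller version would decode the
instruction class before deciding which registers are written.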

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623130234.8709-1-xueshuai@linux.alibaba.com?part=5


end of thread, other threads:[~2026-06-23 13:32 UTC | newest]

Thread overview: 8+ messages
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
2026-06-23 13:19   ` sashiko-bot
2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:32   ` sashiko-bot
