* [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
@ 2026-06-23 13:02 Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
`perf test -v "perf data type profiling tests"` fails on ARM64:
  Basic Rust perf annotate test
  perf mem record -o /tmp/perf.data perf test -w code_with_type
  perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
  Basic annotate [Failed: missing target data type]
The root cause is that ARM64 lacks the instruction parsing infrastructure
required for data type profiling. Specifically:
1. annotate_get_insn_location() cannot extract register numbers and
memory offsets from ARM64 load/store instructions, because ARM64
does not set objdump.register_char or objdump.memory_ref_char
(unlike x86 which uses '%' and '(').
2. arch_supports_insn_tracking() does not include ARM64, so
find_data_type_block() cannot perform instruction-level type state
tracking.
3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
after memset, which causes x0-based memory accesses to be
misidentified as stack accesses.
As a result, perf annotate --code-with-type silently produces no type
annotations on ARM64, and the test's grep for "# data-type: struct Buf"
fails.
This series adds ARM64 data type profiling support following the PowerPC
model: decode raw 32-bit instruction words rather than parsing objdump
text. ARM64's fixed-width encoding and trivial DWARF register mapping
(x0-x30 = DWARF 0-30) make this approach clean and robust.
Three classes of instructions are tracked for register state propagation:
- ADRP: compute PC-relative page address for global variable resolution
- ADD (immediate): combine with ADRP result to form full variable address
- MOV (register): propagate type state between registers
This covers the common `adrp + add + ldr/str` pattern that ARM64
compilers emit for global variable access.
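To illustrate why raw decoding is straightforward, here is a minimal
sketch of register field extraction, using the stp encoding quoted in the
comment of patch 2/5. The struct and function names are hypothetical and
not part of the series; only the bit positions follow the A64_* macros.
Note that field value 31 means SP when it is the base register of a
load/store.

```c
#include <stdint.h>

/* Field extraction, same bit layout as the A64_* macros in the series. */
#define A64_RT(insn)	((insn) & 0x1f)
#define A64_RN(insn)	(((insn) >> 5) & 0x1f)
#define A64_RT2(insn)	(((insn) >> 10) & 0x1f)

/* Hypothetical container for the registers of a load/store pair. */
struct a64_pair_regs {
	int rt;		/* first transfer register */
	int rt2;	/* second transfer register */
	int rn;		/* base register (31 == SP here) */
};

/* Decode the register fields; the values double as DWARF regnums. */
static struct a64_pair_regs a64_decode_pair(uint32_t insn)
{
	struct a64_pair_regs r = {
		.rt  = A64_RT(insn),
		.rt2 = A64_RT2(insn),
		.rn  = A64_RN(insn),
	};

	return r;
}
```

For 0xa9bf7bfd (stp x29, x30, [sp, #-16]!) this gives rt=29, rt2=30 and
rn=31.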
Known limitations:
- The `adrp + ldr` pattern (with :lo12: folded into the load offset,
without an intermediate ADD) is not yet handled. This requires
extending check_matching_type() to resolve TSR_KIND_CONST with the
load offset, which can be added incrementally.
- Pointer chain tracking (load-from-memory propagating type to the
destination register) is not implemented, matching PowerPC's current
scope.
Testing:
All four sub-tests in `perf test "perf data type profiling tests"`
pass reliably on ARM64 (AArch64, SPE-capable hardware):
- Basic/Pipe Rust: struct Buf (code_with_type workload)
- Basic/Pipe C: struct buf (datasym workload, global variable)
Patch breakdown:
1/5 Widen type_state_reg::imm_value from u32 to u64 (prerequisite
for storing 64-bit addresses from ADRP)
2/5 Add arch__is_arm64() detection, raw instruction parsing from
objdump output, and enable show_asm_raw for ARM64
3/5 Add get_arm64_regs() to extract registers and memory offsets
from load/store instruction encodings (4 addressing modes)
4/5 Wire up ARM64 in annotate_get_insn_location(),
arch_supports_insn_tracking(), and init_type_state()
5/5 Main patch: instruction classification, ADRP/ADD/MOV register
state tracking, and architecture initialization
Shuai Xue (5):
perf annotate-data: Widen type_state_reg::imm_value to u64
perf disasm: Add ARM64 architecture detection and raw instruction
parsing
perf dwarf-regs: Add ARM64 register and offset extraction from raw
instructions
perf annotate: Wire up ARM64 data type profiling infrastructure
perf annotate-arch: Add ARM64 data type profiling support
.../perf/util/annotate-arch/annotate-arm64.c | 333 ++++++++++++++++++
tools/perf/util/annotate-arch/annotate-x86.c | 2 +-
tools/perf/util/annotate-data.c | 18 +-
tools/perf/util/annotate-data.h | 2 +-
tools/perf/util/annotate.c | 12 +-
tools/perf/util/disasm.c | 64 ++++
tools/perf/util/disasm.h | 2 +
.../util/dwarf-regs-arch/dwarf-regs-arm64.c | 125 +++++++
tools/perf/util/include/dwarf-regs.h | 7 +
9 files changed, 558 insertions(+), 7 deletions(-)
--
2.51.2.612.gdc70283dfc
* [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
The imm_value field in struct type_state_reg stores addresses computed
from PC-relative instructions (e.g., ARM64 ADRP). Because the field is a
u32, any address at or above 4 GiB is silently truncated, which breaks
global variable resolution for kernel profiling and for userspace mapped
at high addresses on ARM64.
Widen it to u64 to support the full 64-bit address space. Update the
corresponding format string in the x86 annotation code.
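A minimal sketch of the truncation, assuming an illustrative arm64 kernel
virtual address. The helper names are hypothetical and only model the
field width before and after this patch:

```c
#include <stdint.h>

/* Hypothetical: what a u32 imm_value keeps of a 64-bit address. */
static uint32_t store_in_u32(uint64_t addr)
{
	return (uint32_t)addr;	/* upper 32 bits are dropped */
}

/* Hypothetical: what a u64 imm_value keeps of a 64-bit address. */
static uint64_t store_in_u64(uint64_t addr)
{
	return addr;		/* full address preserved */
}
```

With the illustrative address 0xffff800010001000, the u32 variant keeps
only 0x10001000, which no longer identifies the variable.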
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
tools/perf/util/annotate-arch/annotate-x86.c | 2 +-
tools/perf/util/annotate-data.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index 7e6136536393..985aa8bbd0b9 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -547,7 +547,7 @@ static void update_insn_state_x86(struct type_state *state,
tsr->offset = 0;
tsr->ok = true;
- pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+ pr_debug_dtp("mov [%x] imm=%#"PRIx64" -> reg%d\n",
insn_offset, tsr->imm_value, dst->reg1);
return;
}
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index c26130744260..4a9b4814479f 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -173,7 +173,7 @@ extern struct annotated_data_stat ann_data_stat;
*/
struct type_state_reg {
Dwarf_Die type;
- u32 imm_value;
+ u64 imm_value;
/*
* The offset within the struct that the register points to.
* A value of 0 means the register points to the beginning.
--
2.51.2.612.gdc70283dfc
* [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
Add arch__is_arm64() helper to identify ARM64 binaries by ELF machine
type, following the existing arch__is_x86() and arch__is_powerpc()
pattern.
Add disasm_line__parse_arm64() to extract raw 32-bit instruction words
from ARM64 objdump output. Unlike PowerPC, which needs be32_to_cpu()
byte-swapping, ARM64 instructions are always little-endian and can be
used directly. The parser finds the end of the hex word dynamically
instead of assuming a hardcoded width, and checks the sscanf() result.
Set annotate_opts.show_asm_raw in arch__new_arm64() so that objdump
includes raw instruction bytes, which the parser requires.
Wire up the ARM64 parsing path in disasm_line__new() alongside the
existing PowerPC path.
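The splitting step can be sketched in isolation as follows. This is not
the function added by the patch: parse_raw_word() is a hypothetical
standalone helper that uses strtoul() instead of sscanf(), and it assumes
the line always starts with the raw word (as it does when show_asm_raw is
set), since a mnemonic such as "add" is itself valid hex.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: split "a9bf7bfd  stp x29, ..." into the raw
 * instruction word and the remaining mnemonic text.
 * Returns 0 on success, -1 if the line does not start with a hex word.
 */
static int parse_raw_word(const char *line, uint32_t *insn, const char **rest)
{
	char *end;
	unsigned long val;

	while (isspace((unsigned char)*line))
		line++;

	errno = 0;
	val = strtoul(line, &end, 16);
	if (end == line || errno || val > UINT32_MAX)
		return -1;

	/* The hex word must be followed by whitespace or end of line. */
	if (*end != '\0' && !isspace((unsigned char)*end))
		return -1;

	while (isspace((unsigned char)*end))
		end++;

	*insn = (uint32_t)val;
	*rest = end;	/* points at the mnemonic, or at '\0' */
	return 0;
}
```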
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 1 +
tools/perf/util/disasm.c | 64 +++++++++++++++++++
tools/perf/util/disasm.h | 2 +
3 files changed, 67 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 33080fdca125..b98aaf9a8a7b 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -104,6 +104,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
arch->objdump.comment_char = '/';
arch->objdump.skip_functions_char = '+';
arch->associate_instruction_ops = arm64__associate_instruction_ops;
+ annotate_opts.show_asm_raw = true;
/* bl, blr */
err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 59ba88e1f744..83fad4f01442 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -52,6 +52,7 @@ const struct ins_ops arithmetic_ops;
static void ins__sort(struct arch *arch);
static int disasm_line__parse(char *line, const char **namep, char **rawp);
static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_args *args);
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args);
static __attribute__((constructor)) void symbol__init_regexpr(void)
{
@@ -203,6 +204,11 @@ bool arch__is_powerpc(const struct arch *arch)
return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
}
+bool arch__is_arm64(const struct arch *arch)
+{
+ return arch->id.e_machine == EM_AARCH64;
+}
+
static void ins_ops__delete(struct ins_operands *ops)
{
if (ops == NULL)
@@ -777,6 +783,14 @@ static const struct ins_ops *__ins__find(const struct arch *arch, const char *na
return ops;
}
+ if (arch__is_arm64(arch)) {
+ const struct ins_ops *ops;
+
+ ops = check_arm64_insn(dl);
+ if (ops)
+ return ops;
+ }
+
if (!arch->sorted_instructions) {
ins__sort((struct arch *)arch);
((struct arch *)arch)->sorted_instructions = true;
@@ -902,6 +916,53 @@ static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_ar
return ret;
}
+/*
+ * Parses ARM64 disassembly output which includes raw instruction bytes.
+ * ARM64 objdump format:
+ * a9bf7bfd stp x29, x30, [sp, #-16]!
+ *
+ * The raw instruction is a hex word (typically 8 chars) followed by whitespace.
+ */
+static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args)
+{
+ char *line = dl->al.line;
+ const char **namep = &dl->ins.name;
+ char **rawp = &dl->ops.raw;
+ char *name_raw_insn = skip_spaces(line);
+ char *end_raw, *name, *tmp_raw_insn;
+ int ret = 0;
+
+ if (name_raw_insn[0] == '\0')
+ return -1;
+
+ /* Find end of raw instruction hex by looking for whitespace */
+ end_raw = name_raw_insn;
+ while (*end_raw && !isspace(*end_raw))
+ end_raw++;
+
+ name = skip_spaces(end_raw);
+
+ if (args->options->disassembler_used)
+ ret = disasm_line__parse(name, namep, rawp);
+ else
+ *namep = "";
+
+ tmp_raw_insn = strndup(name_raw_insn, end_raw - name_raw_insn);
+ if (tmp_raw_insn == NULL) {
+ if (args->options->disassembler_used)
+ zfree(namep);
+ return -1;
+ }
+
+ remove_spaces(tmp_raw_insn);
+
+ if (sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn) != 1)
+ dl->raw.raw_insn = 0;
+ free(tmp_raw_insn);
+
+ return ret;
+}
+
static void annotation_line__init(struct annotation_line *al,
struct annotate_args *args,
int nr)
@@ -958,6 +1019,9 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
if (arch__is_powerpc(args->arch)) {
if (disasm_line__parse_powerpc(dl, args) < 0)
goto out_free_line;
+ } else if (arch__is_arm64(args->arch)) {
+ if (disasm_line__parse_arm64(dl, args) < 0)
+ goto out_free_line;
} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
goto out_free_line;
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 25756e3f47e4..dfce128a3188 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -111,6 +111,7 @@ struct annotate_args {
const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
bool arch__is_x86(const struct arch *arch);
bool arch__is_powerpc(const struct arch *arch);
+bool arch__is_arm64(const struct arch *arch);
extern const struct ins_ops call_ops;
extern const struct ins_ops dec_ops;
@@ -143,6 +144,7 @@ bool ins__is_ret(const struct ins *ins);
bool ins__is_lock(const struct ins *ins);
const struct ins_ops *check_ppc_insn(struct disasm_line *dl);
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl);
struct disasm_line *disasm_line__new(struct annotate_args *args);
void disasm_line__free(struct disasm_line *dl);
--
2.51.2.612.gdc70283dfc
* [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
Add get_arm64_regs() to extract register numbers (Rn, Rt, Rm) and
memory offsets from raw ARM64 load/store instruction encodings. This
follows the same pattern as get_powerpc_regs() for PowerPC.
ARM64 DWARF register numbers map trivially: x0-x30 = 0-30, sp = 31,
so the hardware register fields can be used directly as DWARF regnums
(as a load/store base register, field value 31 encodes SP).
Four addressing modes are handled:
- Unsigned offset: imm12 scaled by access size
- Pre/Post-indexed: sign-extended 9-bit immediate
- Register offset: offset from Rm (set to 0, handled via multi_regs)
- Load/Store Pair: sign-extended 7-bit immediate scaled by element size
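The four cases can be sketched as one standalone function. The masks are
the ones defined in this patch; sext() and a64_ls_offset() are
hypothetical names used only for this illustration, and multiplication is
used for scaling so that negative immediates are never left-shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `val` (hypothetical helper). */
static int sext(uint32_t val, int bits)
{
	uint32_t sign = 1u << (bits - 1);

	assert(bits > 0 && bits < 32);
	val &= (sign << 1) - 1;
	return (int)(val ^ sign) - (int)sign;
}

/* Byte offset of a GP load/store, or 0 for forms without an immediate. */
static int a64_ls_offset(uint32_t insn)
{
	/* Load/Store Pair: imm7 scaled by 4 (32-bit) or 8 (64-bit) */
	if ((insn & 0x3c000000) == 0x28000000)
		return sext(insn >> 15, 7) * (4 << (insn >> 31));

	/* Everything below is the Load/Store Register class */
	if ((insn & 0x3c000000) != 0x38000000)
		return 0;

	/* Unsigned offset: imm12 scaled by the access size */
	if ((insn & 0x3b000000) == 0x39000000)
		return (int)((insn >> 10) & 0xfff) << (insn >> 30);

	/* Register offset: the offset lives in Rm, not in an immediate */
	if ((insn & 0x3b200c00) == 0x38200800)
		return 0;

	/* Pre/Post-indexed: unscaled signed imm9 */
	if ((insn & 0x3b200000) == 0x38000000)
		return sext(insn >> 12, 9);

	return 0;
}
```

For example, 0xa9bf7bfd (stp x29, x30, [sp, #-16]!) decodes to -16 and
0xf9400420 (ldr x0, [x1, #8]) decodes to 8.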
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
.../util/dwarf-regs-arch/dwarf-regs-arm64.c | 125 ++++++++++++++++++
tools/perf/util/include/dwarf-regs.h | 7 +
2 files changed, 132 insertions(+)
diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
index 593ca7d4fccc..26f296624966 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
@@ -1,8 +1,133 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <dwarf-regs.h>
+#include "../annotate.h"
#include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
+/*
+ * ARM64 instruction field extraction.
+ * Mirrors definitions in annotate-arm64.c.
+ */
+#define A64_RT(insn) ((insn) & 0x1f)
+#define A64_RN(insn) (((insn) >> 5) & 0x1f)
+#define A64_RT2(insn) (((insn) >> 10) & 0x1f)
+#define A64_RM(insn) (((insn) >> 16) & 0x1f)
+
+/*
+ * Load/Store encoding sub-class detection.
+ * Derived from ARM Architecture Reference Manual, C4.1.
+ *
+ * Load/Store Pair (offset/pre/post): bits[29:27]=101, bit[26]=0
+ * Load/Store Register: bits[29:27]=111, bit[26]=0
+ * - Unsigned offset: bits[25:24]=01
+ * - Pre/Post-indexed: bits[25:24]=00, bit[21]=0
+ * - Register offset: bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_PAIR_MASK 0x3c000000
+#define A64_INSN_LS_PAIR_VAL 0x28000000
+
+#define A64_INSN_LS_REG_MASK 0x3c000000
+#define A64_INSN_LS_REG_VAL 0x38000000
+
+#define A64_INSN_LS_UNSIGNED_MASK 0x3b000000
+#define A64_INSN_LS_UNSIGNED_VAL 0x39000000
+
+#define A64_INSN_LS_PREPOST_MASK 0x3b200000
+#define A64_INSN_LS_PREPOST_VAL 0x38000000
+
+#define A64_INSN_LS_REG_OFF_MASK 0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL 0x38200800
+
+static int arm64_get_immoff_unsigned(u32 insn)
+{
+ int size = (insn >> 30) & 0x3;
+ int imm12 = (insn >> 10) & 0xfff;
+
+ return imm12 << size;
+}
+
+static int arm64_get_immoff_prepost(u32 insn)
+{
+ int imm9 = (insn >> 12) & 0x1ff;
+
+ /* sign-extend 9-bit immediate */
+ if (imm9 & 0x100)
+ imm9 |= ~0x1ff;
+
+ return imm9;
+}
+
+static int arm64_get_immoff_pair(u32 insn)
+{
+ int imm7 = (insn >> 15) & 0x7f;
+ int scale = 2 + ((insn >> 31) & 1);
+
+ /* sign-extend 7-bit immediate */
+ if (imm7 & 0x40)
+ imm7 |= ~0x7f;
+
+ return imm7 << scale;
+}
+
+/*
+ * Fills op_loc fields depending on whether it is a source or target operand.
+ *
+ * ARM64 load/store encoding forms:
+ * Register (unsigned offset): [Rn, #imm12 << scale]
+ * Register (pre/post-indexed): [Rn, #imm9] or [Rn], #imm9
+ * Register (register offset): [Rn, Rm{, extend/shift}]
+ * Pair: [Rn, #imm7 << scale]
+ *
+ * For source (memory) operand: reg1=Rn (base), offset=immediate
+ * For target (register) operand: reg1=Rt
+ */
+void get_arm64_regs(u32 raw_insn, int is_source,
+ struct annotated_op_loc *op_loc)
+{
+ if (is_source)
+ op_loc->reg1 = A64_RN(raw_insn);
+ else
+ op_loc->reg1 = A64_RT(raw_insn);
+
+ if (op_loc->multi_regs) {
+ /* LDP/STP pair: second register is Rt2 (bits[14:10]) */
+ if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL)
+ op_loc->reg2 = A64_RT2(raw_insn);
+ else
+ op_loc->reg2 = A64_RM(raw_insn);
+ }
+
+ if (!op_loc->mem_ref || !is_source)
+ return;
+
+ /* Load/Store Pair */
+ if ((raw_insn & A64_INSN_LS_PAIR_MASK) == A64_INSN_LS_PAIR_VAL) {
+ op_loc->offset = arm64_get_immoff_pair(raw_insn);
+ return;
+ }
+
+ /* Load/Store Register */
+ if ((raw_insn & A64_INSN_LS_REG_MASK) == A64_INSN_LS_REG_VAL) {
+ /* Unsigned offset */
+ if ((raw_insn & A64_INSN_LS_UNSIGNED_MASK) == A64_INSN_LS_UNSIGNED_VAL) {
+ op_loc->offset = arm64_get_immoff_unsigned(raw_insn);
+ return;
+ }
+
+ /* Register offset */
+ if ((raw_insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL) {
+ op_loc->offset = 0;
+ return;
+ }
+
+ /* Pre/Post-indexed */
+ if ((raw_insn & A64_INSN_LS_PREPOST_MASK) == A64_INSN_LS_PREPOST_VAL) {
+ op_loc->offset = arm64_get_immoff_prepost(raw_insn);
+ return;
+ }
+ }
+}
+
int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum)
{
if (perf_regnum < 0 || perf_regnum >= PERF_REG_ARM64_MAX)
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..c3f730d2fd88 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -129,6 +129,7 @@ int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, unsi
bool only_libdw_supported);
void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
+void get_arm64_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
#else /* HAVE_LIBDW_SUPPORT */
@@ -144,6 +145,12 @@ static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source _
{
return;
}
+
+static inline void get_arm64_regs(u32 raw_insn __maybe_unused, int is_source __maybe_unused,
+ struct annotated_op_loc *op_loc __maybe_unused)
+{
+ return;
+}
#endif
#endif
--
2.51.2.612.gdc70283dfc
* [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
` (2 preceding siblings ...)
2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim
5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
Add ARM64 support to the core dispatch and initialization points:
1. annotate_get_insn_location(): Add an arm64 branch alongside the
existing powerpc branch to call get_arm64_regs() for extracting
register numbers and memory offsets from raw instructions.
2. arch_supports_insn_tracking(): Include arm64 so that
find_data_type_block() can perform instruction-level type state
tracking on ARM64.
3. init_type_state(): Add arm64 branch to set caller-saved registers
(x0-x18 per AAPCS64) and stack register (SP, DWARF reg 31).
Without this, stack_reg defaults to 0 (x0) after memset, causing
x0-based memory accesses to be misidentified as stack accesses.
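The stack_reg problem in point 3 can be reproduced with a reduced model.
This is only a sketch: struct mini_type_state is a hypothetical stand-in
for struct type_state, and the "base register equals stack_reg" test is a
simplification of how perf recognizes stack accesses.

```c
#include <stdbool.h>
#include <string.h>

#define ARM64_REG_SP 31	/* DWARF number of SP, as defined in this patch */

/* Hypothetical, reduced stand-in for perf's struct type_state. */
struct mini_type_state {
	int ret_reg;
	int stack_reg;
};

static void mini_init(struct mini_type_state *state, bool handle_arm64)
{
	memset(state, 0, sizeof(*state));	/* stack_reg == 0, i.e. x0 */
	if (handle_arm64)
		state->stack_reg = ARM64_REG_SP;
}

/* Simplified check: a base register equal to stack_reg is a stack access. */
static bool is_stack_access(const struct mini_type_state *state, int base_reg)
{
	return base_reg == state->stack_reg;
}
```

Without the arm64 branch, a load through x0 is classified as a stack
access; with it, only SP-based accesses are.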
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
tools/perf/util/annotate-data.c | 18 +++++++++++++++++-
tools/perf/util/annotate.c | 12 ++++++++----
2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 1eff0a27237d..c04ad66ff077 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -29,6 +29,11 @@
/* register number of the stack pointer */
#define X86_REG_SP 7
+/* ARM64 DWARF register numbers: x0-x30=0-30, SP=31 */
+#define ARM64_REG_SP 31
+#define ARM64_REG_LR 30
+#define ARM64_REG_FP 29
+
static void delete_var_types(struct die_var_type *var_types);
#define pr_debug_dtp(fmt, ...) \
@@ -178,6 +183,16 @@ static void init_type_state(struct type_state *state, const struct arch *arch)
state->ret_reg = 0;
state->stack_reg = X86_REG_SP;
}
+
+ if (arch__is_arm64(arch)) {
+ int i;
+
+ /* ARM64 ABI: x0-x18 are caller-saved */
+ for (i = 0; i <= 18; i++)
+ state->regs[i].caller_saved = true;
+ state->ret_reg = 0;
+ state->stack_reg = ARM64_REG_SP;
+ }
}
static void exit_type_state(struct type_state *state)
@@ -1421,7 +1436,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
static int arch_supports_insn_tracking(struct data_loc_info *dloc)
{
- if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)))
+ if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)) ||
+ (arch__is_arm64(dloc->arch)))
return 1;
return 0;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..bd734826538d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2574,19 +2574,23 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
op_loc->reg2 = -1;
if (insn_str == NULL) {
- if (!arch__is_powerpc(arch))
+ if (!arch__is_powerpc(arch) && !arch__is_arm64(arch))
continue;
}
/*
- * For powerpc, call get_powerpc_regs function which extracts the
- * required fields for op_loc, ie reg1, reg2, offset from the
- * raw instruction.
+ * For powerpc and arm64, call arch-specific functions to
+ * extract the required fields for op_loc (reg1, reg2, offset)
+ * from the raw instruction.
*/
if (arch__is_powerpc(arch)) {
op_loc->mem_ref = mem_ref;
op_loc->multi_regs = multi_regs;
get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
+ } else if (arch__is_arm64(arch)) {
+ op_loc->mem_ref = mem_ref;
+ op_loc->multi_regs = multi_regs;
+ get_arm64_regs(dl->raw.raw_insn, !i, op_loc);
} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
op_loc->mem_ref = true;
op_loc->multi_regs = multi_regs;
--
2.51.2.612.gdc70283dfc
* [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
` (3 preceding siblings ...)
2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
@ 2026-06-23 13:02 ` Shuai Xue
2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim
5 siblings, 0 replies; 7+ messages in thread
From: Shuai Xue @ 2026-06-23 13:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
Add data type profiling support for ARM64, enabling 'perf annotate
--code-with-type' to show which data types each memory instruction
accesses. This follows the PowerPC model of raw 32-bit instruction
decoding rather than x86's text-based operand parsing.
Instruction classification (check_arm64_insn):
Classify instructions by raw encoding into load/store, arithmetic
(add immediate, adrp), and register move categories. GP load/store
is detected by bits[27:25] pattern with LDR (literal) excluded to
avoid misinterpreting its different register field layout. ADRP, ADD
immediate (with ADDG/SUBG excluded via tighter mask), and MOV
(register) have their own mask/val pairs derived from the ARM ARM.
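The classification order can be sketched as below. The mask/val pairs are
the ones defined in this patch; the enum and a64_classify() are
hypothetical names for illustration and return a class instead of an
ins_ops pointer.

```c
#include <stdint.h>

/* Hypothetical result type; the patch returns struct ins_ops pointers. */
enum a64_class { A64_OTHER, A64_LOAD_STORE, A64_ARITH };

static enum a64_class a64_classify(uint32_t insn)
{
	/* LDR/LDRSW (literal) also matches the GP load/store test, so
	 * it has to be rejected first. */
	if ((insn & 0x3b000000) == 0x18000000)
		return A64_OTHER;

	/* GP load/store: bits[27:25] = 100 */
	if ((insn & 0x0e000000) == 0x08000000)
		return A64_LOAD_STORE;

	/* MOV (register), i.e. ORR Rd, ZR, Rm with no shift */
	if ((insn & 0x7fe0ffe0) == 0x2a0003e0)
		return A64_ARITH;

	/* ADRP */
	if ((insn & 0x9f000000) == 0x90000000)
		return A64_ARITH;

	/* ADD (immediate); the mask rejects ADDG/SUBG */
	if ((insn & 0x7f800000) == 0x11000000)
		return A64_ARITH;

	return A64_OTHER;
}
```

For instance, 0x58000000 (ldr x0, <literal>) has bits[27:25] = 100 and
would be taken for a GP load/store if the literal check did not come
first.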
Load/store operand parsing:
Set mem_ref=true for all GP load/store instructions. Detect register
offset addressing mode to set multi_regs=true when Rm is used as a
second source operand.
Register state tracking (update_insn_state_arm64):
Track three instruction patterns for type propagation:
- ADRP Xd, #page: Compute the PC-relative page address using
sign_extend64() and either resolve the global variable type
directly or store the address as TSR_KIND_CONST for later
resolution by ADD.
- ADD Xd, Xn, #imm: If Xn holds an ADRP result (TSR_KIND_CONST),
compute the full variable address and resolve via
get_global_var_type(). This handles the common ARM64 global
variable access pattern: adrp+add+ldr.
- MOV Xd, Xm: Propagate type state including kind, offset, and
imm_value from the source to destination register.
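A reduced model of the three tracked patterns, as a sketch only: the DWARF
lookup via get_global_var_type() is left out, so ADD merely keeps the
computed address as a constant. struct reg_state, REG_CONST and
track_insn() are hypothetical names, and 32-bit (W register) forms are not
treated specially.

```c
#include <stdint.h>

enum reg_kind { REG_UNKNOWN, REG_CONST };	/* REG_CONST ~ TSR_KIND_CONST */

/* Hypothetical, reduced stand-in for struct type_state_reg. */
struct reg_state {
	enum reg_kind kind;
	uint64_t imm_value;
};

/* ADRP result: (PC & ~0xfff) + (sign_extend(immhi:immlo, 21) << 12) */
static uint64_t adrp_target(uint64_t pc, uint32_t insn)
{
	uint64_t imm = (((insn >> 5) & 0x7ffff) << 2) | ((insn >> 29) & 0x3);

	if (imm & (1ULL << 20))		/* sign-extend the 21-bit value */
		imm |= ~0x1fffffULL;
	return (pc & ~0xfffULL) + (imm << 12);
}

static void track_insn(struct reg_state regs[32], uint64_t pc, uint32_t insn)
{
	int rd = insn & 0x1f;
	int rn = (insn >> 5) & 0x1f;
	int rm = (insn >> 16) & 0x1f;

	if ((insn & 0x9f000000) == 0x90000000) {		/* ADRP */
		regs[rd].kind = REG_CONST;
		regs[rd].imm_value = adrp_target(pc, insn);
	} else if ((insn & 0x7f800000) == 0x11000000) {		/* ADD imm */
		uint64_t imm12 = (insn >> 10) & 0xfff;

		if (regs[rn].kind != REG_CONST) {
			regs[rd].kind = REG_UNKNOWN;
			return;
		}
		if (insn & (1u << 22))		/* sh bit: imm12 << 12 */
			imm12 <<= 12;
		/* Read Rn before writing Rd; they may be the same. */
		regs[rd].imm_value = regs[rn].imm_value + imm12;
		regs[rd].kind = REG_CONST;
	} else if ((insn & 0x7fe0ffe0) == 0x2a0003e0) {		/* MOV reg */
		regs[rd] = regs[rm];
	}
}
```

Feeding it adrp x0 (one page forward), add x0, x0, #0x10 and mov x1, x0
from PC 0x400120 leaves the address 0x401010 in both x0 and x1.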
Known limitations:
- The adrp+ldr pattern (without intermediate ADD, using lo12 folded
into the LDR offset) is not yet handled. This requires extending
check_matching_type() to resolve TSR_KIND_CONST with the load
offset, which can be added incrementally.
- Pointer chain tracking (load-from-memory propagating type to the
destination register) is not implemented, matching PowerPC's
current scope.
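For the first limitation, the address that would have to be resolved is
the ADRP page plus the scaled unsigned offset of the load. A minimal
sketch of that arithmetic, assuming an LDR/STR in unsigned-offset form;
adrp_ldr_address() is a hypothetical helper, not something this series or
check_matching_type() provides:

```c
#include <stdint.h>

/*
 * Hypothetical helper: effective address of "ldr Xt, [Xn, #:lo12:sym]"
 * when Xn holds a page address produced by ADRP.
 */
static uint64_t adrp_ldr_address(uint64_t page_addr, uint32_t ldr_insn)
{
	uint64_t imm12 = (ldr_insn >> 10) & 0xfff;	/* unsigned offset */
	unsigned int size = ldr_insn >> 30;		/* log2(access size) */

	return page_addr + (imm12 << size);
}
```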
Architecture initialization:
Register the update_insn_state callback for instruction-level type
state tracking.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 332 ++++++++++++++++++
1 file changed, 332 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index b98aaf9a8a7b..887ed22c4ca0 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -1,12 +1,21 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/bitops.h>
#include <linux/compiler.h>
+#include <linux/kernel.h>
#include <errno.h>
+#include <inttypes.h>
#include <stdlib.h>
#include <string.h>
#include <linux/zalloc.h>
#include <regex.h>
#include "../annotate.h"
+#include "../debug.h"
#include "../disasm.h"
+#ifdef HAVE_LIBDW_SUPPORT
+#include "../annotate-data.h"
+#include "../map.h"
+#include "../symbol.h"
+#endif
struct arch_arm64 {
struct arch arch;
@@ -14,6 +23,47 @@ struct arch_arm64 {
regex_t jump_insn;
};
+/*
+ * ARM64 instruction encoding masks and values.
+ * Derived from ARM Architecture Reference Manual, C4.1 A64 encoding index.
+ *
+ * These mirror the definitions in arch/arm64/include/asm/insn.h but are
+ * duplicated here because that header depends on kernel-only macros
+ * (BUILD_BUG_ON, __always_inline).
+ */
+
+/* GP Load/Store: bit[27]=1, bit[26]=0 (GP, not SIMD/FP), bit[25]=0 */
+#define A64_INSN_GP_LS_MASK 0x0e000000
+#define A64_INSN_GP_LS_VAL 0x08000000
+
+/* LDR/LDRSW (literal): bits[29:27]=011, bits[25:24]=00 -- must be excluded from GP LS */
+#define A64_INSN_LDR_LIT_MASK 0x3b000000
+#define A64_INSN_LDR_LIT_VAL 0x18000000
+
+/*
+ * Load/Store register (register offset):
+ * bits[29:27]=111, bits[25:24]=00, bit[21]=1, bits[11:10]=10
+ */
+#define A64_INSN_LS_REG_OFF_MASK 0x3b200c00
+#define A64_INSN_LS_REG_OFF_VAL 0x38200800
+
+/* ADRP: mask=0x9F000000, val=0x90000000 */
+#define A64_INSN_ADRP_MASK 0x9f000000
+#define A64_INSN_ADRP_VAL 0x90000000
+
+/* ADD (immediate): mask=0x7F800000, val=0x11000000 (excludes ADDG/SUBG) */
+#define A64_INSN_ADD_IMM_MASK 0x7f800000
+#define A64_INSN_ADD_IMM_VAL 0x11000000
+
+/* MOV (register) = ORR Xd/Wd, XZR/WZR, Xm/Wm: Rn=11111, imm6=000000 */
+#define A64_INSN_MOV_REG_MASK 0x7fe0ffe0
+#define A64_INSN_MOV_REG_VAL 0x2a0003e0
+
+/* Instruction field extraction */
+#define A64_RT(insn) ((insn) & 0x1f)
+#define A64_RN(insn) (((insn) >> 5) & 0x1f)
+#define A64_RM(insn) (((insn) >> 16) & 0x1f)
+
static int arm64_mov__parse(const struct arch *arch __maybe_unused,
struct ins_operands *ops,
struct map_symbol *ms __maybe_unused,
@@ -69,6 +119,285 @@ static const struct ins_ops arm64_mov_ops = {
.scnprintf = mov__scnprintf,
};
+/*
+ * ARM64 load/store instruction parser.
+ * Sets mem_ref and multi_regs based on raw instruction encoding.
+ */
+static int arm64_load_store__parse(const struct arch *arch __maybe_unused,
+ struct ins_operands *ops,
+ struct map_symbol *ms __maybe_unused,
+ struct disasm_line *dl)
+{
+ u32 insn = dl->raw.raw_insn;
+
+ ops->source.mem_ref = true;
+ ops->source.multi_regs = false;
+
+ /* Load/Store register (register offset) uses Rm as second source */
+ if ((insn & A64_INSN_LS_REG_OFF_MASK) == A64_INSN_LS_REG_OFF_VAL)
+ ops->source.multi_regs = true;
+
+ ops->target.mem_ref = false;
+ ops->target.multi_regs = false;
+
+ return 0;
+}
+
+static int arm64_load_store__scnprintf(const struct ins *ins, char *bf,
+ size_t size,
+ struct ins_operands *ops,
+ int max_ins_name)
+{
+ return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+ ops->raw);
+}
+
+static const struct ins_ops arm64_load_store_ops = {
+ .parse = arm64_load_store__parse,
+ .scnprintf = arm64_load_store__scnprintf,
+};
+
+static int arm64_arithmetic__parse(const struct arch *arch __maybe_unused,
+ struct ins_operands *ops,
+ struct map_symbol *ms __maybe_unused,
+ struct disasm_line *dl __maybe_unused)
+{
+ ops->source.mem_ref = false;
+ ops->source.multi_regs = false;
+ ops->target.mem_ref = false;
+ ops->target.multi_regs = false;
+
+ return 0;
+}
+
+static int arm64_arithmetic__scnprintf(const struct ins *ins, char *bf,
+ size_t size,
+ struct ins_operands *ops,
+ int max_ins_name)
+{
+ return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+ ops->raw);
+}
+
+static const struct ins_ops arm64_arithmetic_ops = {
+ .parse = arm64_arithmetic__parse,
+ .scnprintf = arm64_arithmetic__scnprintf,
+};
+
+/*
+ * Classify ARM64 instructions by raw encoding for data type profiling.
+ */
+const struct ins_ops *check_arm64_insn(struct disasm_line *dl)
+{
+ u32 insn = dl->raw.raw_insn;
+
+ /* Exclude LDR/LDRSW (literal) before matching GP Load/Store */
+ if ((insn & A64_INSN_LDR_LIT_MASK) == A64_INSN_LDR_LIT_VAL)
+ return NULL;
+
+ if ((insn & A64_INSN_GP_LS_MASK) == A64_INSN_GP_LS_VAL)
+ return &arm64_load_store_ops;
+
+ if ((insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL)
+ return &arm64_arithmetic_ops;
+
+ if ((insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL)
+ return &arm64_arithmetic_ops;
+
+ if ((insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL)
+ return &arm64_arithmetic_ops;
+
+ return NULL;
+}
+
+#ifdef HAVE_LIBDW_SUPPORT
+
+static inline bool arm64_is_adrp(u32 insn)
+{
+ return (insn & A64_INSN_ADRP_MASK) == A64_INSN_ADRP_VAL;
+}
+
+static inline bool arm64_is_add_imm(u32 insn)
+{
+ return (insn & A64_INSN_ADD_IMM_MASK) == A64_INSN_ADD_IMM_VAL;
+}
+
+static inline bool arm64_is_mov_reg(u32 insn)
+{
+ return (insn & A64_INSN_MOV_REG_MASK) == A64_INSN_MOV_REG_VAL;
+}
+
+/*
+ * Compute the page address from an ADRP instruction.
+ * ADRP Xd, #imm: Xd = (PC & ~0xFFF) + (imm << 12)
+ * immhi = bits[23:5] (19 bits), immlo = bits[30:29] (2 bits)
+ * imm = sign_extend(immhi:immlo, 21)
+ */
+static u64 arm64_adrp_target(u64 pc, u32 insn)
+{
+ u64 immhi = (insn >> 5) & 0x7ffff;
+ u64 immlo = (insn >> 29) & 0x3;
+ u64 imm = (immhi << 2) | immlo;
+
+ /* Cast to u64 before shifting: left-shifting a negative s64 is undefined. */
+ return (pc & ~0xfffULL) + ((u64)sign_extend64(imm, 20) << 12);
+}
+
+/*
+ * Track register state for ARM64 instructions.
+ *
+ * Handles three instruction patterns:
+ *
+ * 1. ADRP Xd, #page - computes a PC-relative page address.
+ * Track the computed address so a subsequent LDR can resolve
+ * the global variable.
+ *
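+ * 2. ADD Xd, Xn, #imm - if Xn holds an ADRP page address, add the
+ * immediate and try to resolve the global variable. If Xn already
+ * holds a type or pointer, propagate it to Xd with the offset
+ * adjusted by the immediate.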
+ *
+ * 3. MOV Xd, Xm - propagate type state from Xm to Xd.
+ */
+static void update_insn_state_arm64(struct type_state *state,
+ struct data_loc_info *dloc,
+ Dwarf_Die *cu_die,
+ struct disasm_line *dl)
+{
+ u32 insn = dl->raw.raw_insn;
+ int rd, rn;
+ struct type_state_reg *tsr;
+
+ if (arm64_is_adrp(insn)) {
+ u64 pc, page_addr;
+ int offset;
+ Dwarf_Die type_die;
+
+ rd = A64_RT(insn);
+ if (!has_reg_type(state, rd))
+ return;
+
+ tsr = &state->regs[rd];
+
+ pc = map__rip_2objdump(dloc->ms->map,
+ dloc->ms->sym->start + dl->al.offset);
+ page_addr = arm64_adrp_target(pc, insn);
+
+ /*
+ * Try to resolve the global variable at this page address.
+ * If not found, store it as a constant for later ADD resolution.
+ */
+ if (get_global_var_type(cu_die, dloc,
+ dloc->ms->sym->start + dl->al.offset,
+ page_addr, &offset, &type_die)) {
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_POINTER;
+ tsr->offset = offset;
+ tsr->ok = true;
+
+ pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d",
+ (u32)dl->al.offset, page_addr, rd);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ } else {
+ tsr->kind = TSR_KIND_CONST;
+ tsr->imm_value = page_addr;
+ tsr->ok = true;
+
+ pr_debug_dtp("adrp [%x] page=%#"PRIx64" -> reg%d\n",
+ (u32)dl->al.offset, page_addr, rd);
+ }
+ return;
+ }
+
+ if (arm64_is_add_imm(insn)) {
+ int imm12, shift;
+ u64 var_addr;
+ int offset;
+ Dwarf_Die type_die;
+
+ rd = A64_RT(insn);
+ rn = A64_RN(insn);
+
+ if (!has_reg_type(state, rd) || !has_reg_type(state, rn))
+ return;
+
+ tsr = &state->regs[rd];
+
+ if (!state->regs[rn].ok) {
+ tsr->ok = false;
+ return;
+ }
+
+ imm12 = (insn >> 10) & 0xfff;
+ shift = ((insn >> 22) & 0x1) ? 12 : 0;
+
+ /*
+ * If Rn holds an ADRP result (TSR_KIND_CONST), compute
+ * the full address and try to resolve the global variable.
+ */
+ if (state->regs[rn].kind == TSR_KIND_CONST) {
+ var_addr = state->regs[rn].imm_value +
+ ((u64)imm12 << shift);
+
+ if (get_global_var_type(cu_die, dloc,
+ dloc->ms->sym->start + dl->al.offset,
+ var_addr, &offset, &type_die)) {
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_POINTER;
+ tsr->offset = offset;
+ tsr->ok = true;
+
+ pr_debug_dtp("add [%x] global addr=%#"PRIx64" -> reg%d",
+ (u32)dl->al.offset, var_addr, rd);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+ }
+
+ /* Otherwise propagate existing type with adjusted offset */
+ if (state->regs[rn].kind == TSR_KIND_TYPE ||
+ state->regs[rn].kind == TSR_KIND_POINTER) {
+ tsr->type = state->regs[rn].type;
+ tsr->kind = state->regs[rn].kind;
+ tsr->offset = state->regs[rn].offset + (imm12 << shift);
+ tsr->ok = true;
+
+ pr_debug_dtp("add [%x] imm=%#x reg%d -> reg%d",
+ (u32)dl->al.offset, imm12 << shift, rn, rd);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ } else {
+ tsr->ok = false;
+ }
+ return;
+ }
+
+ if (arm64_is_mov_reg(insn)) {
+ int rm;
+
+ rd = A64_RT(insn);
+ rm = A64_RM(insn);
+
+ if (!has_reg_type(state, rd))
+ return;
+
+ tsr = &state->regs[rd];
+
+ if (!has_reg_type(state, rm) || !state->regs[rm].ok) {
+ tsr->ok = false;
+ return;
+ }
+
+ tsr->type = state->regs[rm].type;
+ tsr->kind = state->regs[rm].kind;
+ tsr->offset = state->regs[rm].offset;
+ tsr->imm_value = state->regs[rm].imm_value;
+ tsr->ok = true;
+
+ pr_debug_dtp("mov [%x] reg%d -> reg%d",
+ (u32)dl->al.offset, rm, rd);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+}
+#endif /* HAVE_LIBDW_SUPPORT */
+
static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
{
struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
@@ -105,6 +434,9 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
arch->objdump.skip_functions_char = '+';
arch->associate_instruction_ops = arm64__associate_instruction_ops;
annotate_opts.show_asm_raw = true;
+#ifdef HAVE_LIBDW_SUPPORT
+ arch->update_insn_state = update_insn_state_arm64;
+#endif
/* bl, blr */
err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
--
2.51.2.612.gdc70283dfc
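
The ADRP and ADD (immediate) decoding used by the patch above can be tried in isolation. The following is a minimal standalone sketch, not the patch's code: the `_sketch` names are hypothetical, and the mask/value constants are written out from the A64 encoding as an assumption rather than taken from the patch's `A64_INSN_*` macros, which are defined elsewhere in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend 'value', whose sign bit sits at bit position 'sign_bit'. */
static inline int64_t sext64_sketch(uint64_t value, unsigned int sign_bit)
{
	uint64_t m = 1ULL << sign_bit;

	value &= (m << 1) - 1;			/* keep bits [sign_bit:0] */
	return (int64_t)((value ^ m) - m);
}

/* ADRP: bit 31 set, bits [28:24] == 0b10000 (assumed constants). */
static inline bool is_adrp_sketch(uint32_t insn)
{
	return (insn & 0x9f000000u) == 0x90000000u;
}

/* 64-bit ADD (immediate), non-flag-setting (assumed constants). */
static inline bool is_add_imm_sketch(uint32_t insn)
{
	return (insn & 0xff800000u) == 0x91000000u;
}

/* Page address computed by ADRP: (PC & ~0xfff) + (imm21 << 12). */
static inline uint64_t adrp_page_sketch(uint64_t pc, uint32_t insn)
{
	uint64_t immhi = (insn >> 5) & 0x7ffff;	/* bits [23:5]  */
	uint64_t immlo = (insn >> 29) & 0x3;	/* bits [30:29] */
	uint64_t imm = (immhi << 2) | immlo;

	/* Shift as unsigned so a negative displacement is well defined. */
	return (pc & ~0xfffULL) + ((uint64_t)sext64_sketch(imm, 20) << 12);
}

/* Immediate added by ADD (immediate): imm12, optionally shifted by 12. */
static inline uint64_t add_imm_sketch(uint32_t insn)
{
	uint64_t imm12 = (insn >> 10) & 0xfff;	/* bits [21:10] */
	unsigned int shift = ((insn >> 22) & 0x1) ? 12 : 0;

	return imm12 << shift;
}
```

For instance, with a PC of 0x400123, the word 0xb0000000 (an ADRP to the next page) gives a page address of 0x401000, and a following 0x91006000 (ADD with immediate 0x18) gives a full address of 0x401018.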
* Re: [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
` (4 preceding siblings ...)
2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
@ 2026-06-23 16:56 ` Namhyung Kim
5 siblings, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2026-06-23 16:56 UTC (permalink / raw)
To: Shuai Xue
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Zecheng Li, linux-perf-users,
linux-kernel
Hello,
On Tue, Jun 23, 2026 at 09:02:29PM +0800, Shuai Xue wrote:
> `perf test -v "perf data type profiling tests"` fails on ARM64:
>
> Basic Rust perf annotate test
> perf mem record -o /tmp/perf.data perf test -w code_with_type
> perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
> Basic annotate [Failed: missing target data type]
>
> The root cause is that ARM64 lacks the instruction parsing infrastructure
> required for data type profiling. Specifically:
>
> 1. annotate_get_insn_location() cannot extract register numbers and
> memory offsets from ARM64 load/store instructions, because ARM64
> does not set objdump.register_char or objdump.memory_ref_char
> (unlike x86 which uses '%' and '(').
>
> 2. arch_supports_insn_tracking() does not include ARM64, so
> find_data_type_block() cannot perform instruction-level type state
> tracking.
>
> 3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
> after memset, which causes x0-based memory accesses to be
> misidentified as stack accesses.
>
> As a result, perf annotate --code-with-type silently produces no type
> annotations on ARM64, and the test grep for "# data-type: struct Buf"
> fails.
>
> This series adds ARM64 data type profiling support following the PowerPC
> model: decode raw 32-bit instruction words rather than parsing objdump
> text. ARM64's fixed-width encoding and trivial DWARF register mapping
> (x0-x30 = DWARF 0-30) make this approach clean and robust.
>
> Three classes of instructions are tracked for register state propagation:
> - ADRP: compute PC-relative page address for global variable resolution
> - ADD (immediate): combine with ADRP result to form full variable address
> - MOV (register): propagate type state between registers
>
> This covers the common `adrp + add + ldr/str` pattern that ARM64
> compilers emit for global variable access.
>
> Known limitations:
> - The `adrp + ldr` pattern (with :lo12: folded into the load offset,
> without an intermediate ADD) is not yet handled. This requires
> extending check_matching_type() to resolve TSR_KIND_CONST with the
> load offset, which can be added incrementally.
> - Pointer chain tracking (load-from-memory propagating type to the
> destination register) is not implemented, matching PowerPC's current
> scope.
>
> Testing:
> All four sub-tests in `perf test "perf data type profiling tests"`
> pass reliably on ARM64 (AArch64, SPE-capable hardware):
> - Basic/Pipe Rust: struct Buf (code_with_type workload)
> - Basic/Pipe C: struct buf (datasym workload, global variable)
>
> Patch breakdown:
> 1/5 Widen type_state_reg::imm_value from u32 to u64 (prerequisite
> for storing 64-bit addresses from ADRP)
> 2/5 Add arch__is_arm64() detection, raw instruction parsing from
> objdump output, and enable show_asm_raw for ARM64
> 3/5 Add get_arm64_regs() to extract registers and memory offsets
> from load/store instruction encodings (4 addressing modes)
> 4/5 Wire up ARM64 in annotate_get_insn_location(),
> arch_supports_insn_tracking(), and init_type_state()
> 5/5 Main patch: instruction classification, ADRP/ADD/MOV register
> state tracking, and architecture initialization
>
> Shuai Xue (5):
> perf annotate-data: Widen type_state_reg::imm_value to u64
> perf disasm: Add ARM64 architecture detection and raw instruction
> parsing
> perf dwarf-regs: Add ARM64 register and offset extraction from raw
> instructions
> perf annotate: Wire up ARM64 data type profiling infrastructure
> perf annotate-arch: Add ARM64 data type profiling support
Thanks for the contribution!
There was another series on this; please take a look. I hope you guys
can collaborate.
https://lore.kernel.org/r/20260403094800.1418825-1-wutengda@huaweicloud.com
Thanks,
Namhyung
>
> .../perf/util/annotate-arch/annotate-arm64.c | 333 ++++++++++++++++++
> tools/perf/util/annotate-arch/annotate-x86.c | 2 +-
> tools/perf/util/annotate-data.c | 18 +-
> tools/perf/util/annotate-data.h | 2 +-
> tools/perf/util/annotate.c | 12 +-
> tools/perf/util/disasm.c | 64 ++++
> tools/perf/util/disasm.h | 2 +
> .../util/dwarf-regs-arch/dwarf-regs-arm64.c | 125 +++++++
> tools/perf/util/include/dwarf-regs.h | 7 +
> 9 files changed, 558 insertions(+), 7 deletions(-)
>
> --
> 2.51.2.612.gdc70283dfc
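
The register state propagation described in the quoted cover letter (MOV copies state, ADD immediate adjusts the offset) can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct sk_reg`, `sk_track_mov()` and `sk_track_add_imm()` are hypothetical stand-ins and not perf's `struct type_state_reg` or its helpers, `type_id` replaces the real `Dwarf_Die`, and the DWARF global-variable lookup for ADRP constants is omitted, so a constant source simply invalidates the destination (the same fallback the patch takes when that lookup fails).

```c
#include <stdbool.h>
#include <stdint.h>

enum sk_kind { SK_KIND_TYPE, SK_KIND_CONST, SK_KIND_POINTER };

/* Hypothetical, simplified per-register tracking state. */
struct sk_reg {
	enum sk_kind kind;
	int type_id;		/* stand-in for a DWARF type DIE */
	int offset;		/* offset into the tracked type */
	uint64_t imm_value;	/* page address when kind is CONST */
	bool ok;		/* false: nothing known about the register */
};

#define SK_NR_REGS 32

/* MOV Xd, Xm: copy the whole state, or invalidate Xd if Xm is unknown. */
static inline void sk_track_mov(struct sk_reg *regs, int rd, int rm)
{
	if (!regs[rm].ok) {
		regs[rd].ok = false;
		return;
	}
	regs[rd] = regs[rm];
}

/* ADD Xd, Xn, #imm: keep type and kind, shift the offset by imm. */
static inline void sk_track_add_imm(struct sk_reg *regs, int rd, int rn,
				    int imm)
{
	struct sk_reg src = regs[rn];	/* copy first: rd may equal rn */

	if (!src.ok || src.kind == SK_KIND_CONST) {
		/* No DWARF lookup in this sketch, so give up on CONST. */
		regs[rd].ok = false;
		return;
	}
	regs[rd] = src;
	regs[rd].offset = src.offset + imm;
}
```

Copying the source state before writing the destination keeps the `rd == rn` case (for example `add x0, x0, #0x10`) correct in this model.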
Thread overview: 7+ messages
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim