linux-perf-users.vger.kernel.org archive mirror
* [PATCH 0/7] Add data type profiling support for arm64
@ 2025-03-14 16:21 Li Huafei
  2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

Hi,

This patchset adds data type profiling support to perf on arm64. Data
type profiling, introduced by Namhyung [1], associates PMU samples
(here, samples of memory access events) with the data types they
reference, giving developers an effective tool for analyzing the impact
of memory usage and layout. For more detailed background, please refer
to [2].

Namhyung initially supported this feature only on x86, and later Athira
added support for it on powerpc [3]. Unlike the x86 implementation, the
powerpc implementation parses operands directly from raw instruction
code instead of using the results from assembler disassembly. As Athira
mentioned, this is mainly because not all memory access instructions on
powerpc have explicit memory reference assembler notations '()' in their
assembly code. On arm64, all memory access instructions use the
notation '[]', so my implementation follows the x86 approach: it takes
the disassembly output from objdump, llvm, or libcapstone and parses it
as strings. I believe this has the advantage of reusing the complex
instruction decoding logic of the disassembler, but it may be less
efficient than parsing raw instructions.
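
As an illustration only (not code from this series), the string-based
approach can be sketched as splitting a disassembled operand string at
the '[' that starts the memory reference. The names 'struct split_ops',
'dup_range()' and 'split_mem_operand()' are hypothetical and exist only
in this sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two halves of an operand string. */
struct split_ops {
	char *source;	/* text before the memory operand, e.g. "x0" */
	char *target;	/* memory operand starting at '[', e.g. "[x1, #8]" */
};

/* Copy 'len' bytes from 'start' into a new NUL-terminated string. */
static char *dup_range(const char *start, size_t len)
{
	char *s = malloc(len + 1);

	if (s) {
		memcpy(s, start, len);
		s[len] = '\0';
	}
	return s;
}

/*
 * Split an operand string such as "x0, [x1, #8]" at the comma that
 * precedes '['. Returns 0 on success, -1 if there is no '[' or if
 * nothing precedes it.
 */
static int split_mem_operand(const char *raw, struct split_ops *out)
{
	const char *bracket, *comma;

	assert(raw != NULL && out != NULL);

	bracket = strchr(raw, '[');
	if (!bracket)
		return -1;

	/* Walk back from '[' to the comma that ends the register list. */
	comma = bracket;
	while (comma > raw && *comma != ',')
		--comma;
	if (comma == raw)
		return -1;

	out->source = dup_range(raw, (size_t)(comma - raw));
	out->target = dup_range(bracket, strlen(bracket));
	if (!out->source || !out->target) {
		free(out->source);
		free(out->target);
		return -1;
	}
	return 0;
}
```

For "x0, [x1, #8]" this yields the source "x0" and the target
"[x1, #8]"; a string without '[' is rejected.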

Below is a brief description of this patchset:
 - Patch 1 first identifies load and store instructions and provides a
   parsing function.
 - Patches 2-3 are refactoring patches. They primarily move the code for
   extracting registers and offsets to specific architecture
   implementations. Additionally, a new callback function
   'extract_reg_offset' is introduced to avoid having too many
   architecture-specific implementations in the function
   'annotate_get_insn_location()'.
 - Patch 4 implements the extract_reg_offset callback for arm64.
   Currently, it does not support parsing instructions with register
   pairs or register offsets in operands. Register pairs often appear in
   stack push/pop instructions, and register offsets are common when
   accessing per-CPU variables, both of which require special handling.
 - Patch 5 adds support for instruction tracking on arm64, primarily
   addressing the issue where DWARF does not generate information for
   intermediate pointers in pointer chains.
 - Patches 6-7 further enhance instruction tracking. Patch 6 supports
   parsing accesses to global variables, while Patch 7 focuses on
   resolving accesses to the kernel's current pointer.

There are still areas for improvement in the current implementation:
 - Support more types of memory access instructions, such as those
   involving register pairs and register offsets.
 - Handle all data processing instructions (e.g., mov, add), as these
   instructions can change the state of registers and may affect the
   accuracy of instruction tracking.
 - Support parsing of special memory access scenarios like per-CPU
   variables and arrays.

The patch set is based on 6.14-rc6 (commit 80e54e84911a). After applying
it, the data type profiling results on arm64 are as follows (SPE support
is required):

 # perf mem record -a -K -- sleep 1
 # perf annotate --data-type --type-stat --stdio
 Only instruction-based sampling period is currently supported by Arm SPE.
 Annotate data type stats:
 total 556, ok 357 (64.2%), bad 199 (35.8%)
 -----------------------------------------------------------
         10 : no_sym
         36 : no_insn_ops
         65 : no_var
         70 : no_typeinfo
         18 : bad_offset
         59 : insn_track
 
 Annotate type: 'struct rq' in [kernel.kallsyms] (29 samples):
 ============================================================================
  Percent     offset       size  field
   100.00          0      0xe80  struct rq        {
     0.00          0        0x4      raw_spinlock_t      __lock {
     0.00          0        0x4          arch_spinlock_t raw_lock {
     0.00          0        0x4              union        {
     0.00          0        0x4                  atomic_t        val {
     0.00          0        0x4                      int counter;
                                                 };
     0.00          0        0x2                  struct   {
     0.00          0        0x1                      u8  locked;
     0.00        0x1        0x1                      u8  pending;
                                                 };
     0.00          0        0x4                  struct   {
     0.00          0        0x2                      u16 locked_pending;
     0.00        0x2        0x2                      u16 tail;
                                                 };
                                             };
                                         };
                                     };
    13.79        0x4        0x4      unsigned int        nr_running;
    13.79        0x8        0x4      unsigned int        nr_numa_running;
     0.00        0xc        0x4      unsigned int        nr_preferred_running;
     0.00       0x10        0x4      unsigned int        numa_migrate_on;
     0.00       0x18        0x8      long unsigned int   last_blocked_load_update_tick;
     0.00       0x20        0x4      unsigned int        has_blocked_load;
     0.00       0x40       0x20      call_single_data_t  nohz_csd {
     0.00       0x40       0x10          struct __call_single_node       node {
     0.00       0x40        0x8              struct llist_node   llist {
     0.00       0x40        0x8                  struct llist_node*      next;
                                             };
     0.00       0x48        0x4              union        {
     0.00       0x48        0x4                  unsigned int    u_flags;
     0.00       0x48        0x4                  atomic_t        a_flags {
     0.00       0x48        0x4                      int counter;
                                                 };
                                             };
     ...

Thanks,
Huafei

[1] https://lore.kernel.org/lkml/20231213001323.718046-1-namhyung@kernel.org/
[2] https://lwn.net/Articles/955709/
[3] https://lore.kernel.org/all/20240718084358.72242-1-atrajeev@linux.vnet.ibm.com/#r

Li Huafei (7):
  perf annotate: Handle arm64 load and store instructions
  perf annotate: Advance the mem_ref check to mov__parse()
  perf annotate: Add 'extract_reg_offset' callback function to extract
    register number and access offset
  perf annotate: Support for the 'extract_reg_offset' callback function
    in arm64
  perf annotate-data: Support instruction tracking for arm64
  perf annotate-data: Handle arm64 global variable access
  perf annotate-data: Handle the access to the 'current' pointer on
    arm64

 tools/perf/arch/arm64/annotate/instructions.c | 302 +++++++++++++++++-
 .../perf/arch/powerpc/annotate/instructions.c |  10 +
 tools/perf/arch/x86/annotate/instructions.c   |  99 ++++++
 tools/perf/util/Build                         |   1 +
 tools/perf/util/annotate-data.c               |  23 +-
 tools/perf/util/annotate-data.h               |   4 +-
 tools/perf/util/annotate.c                    | 112 +------
 tools/perf/util/disasm.c                      |  14 +
 tools/perf/util/disasm.h                      |   4 +
 tools/perf/util/dwarf-regs-arm64.c            |  25 ++
 tools/perf/util/include/dwarf-regs.h          |   7 +
 11 files changed, 490 insertions(+), 111 deletions(-)
 create mode 100644 tools/perf/util/dwarf-regs-arm64.c

-- 
2.25.1



* [PATCH 1/7] perf annotate: Handle arm64 load and store instructions
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18  1:32   ` Namhyung Kim
  2025-03-18 17:15   ` Leo Yan
  2025-03-14 16:21 ` [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse() Li Huafei
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

Add ldst_ops to handle load and store instructions in order to parse
the data types and offsets associated with PMU events for memory access
instructions. arm64 has many variants of load and store instructions,
which makes it impractical to enumerate every instruction name.
Therefore, only the instruction prefixes are matched. The prefix
'ld|st' covers most of the memory access instructions, 'cas|swp'
matches atomic instructions, and 'prf' matches memory prefetch
instructions.
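
As an illustration only, the prefix match can be tried in isolation
with POSIX regcomp()/regexec() and the same pattern the patch compiles.
The helper name 'is_ldst_mnemonic()' is hypothetical:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the mnemonic starts with one of the load/store,
 * atomic, or prefetch prefixes. The pattern is the one used by the
 * patch; everything else here is a sketch.
 */
static bool is_ldst_mnemonic(const char *name)
{
	regex_t re;
	bool matched;

	assert(name != NULL);

	/* REG_NOSUB: only a yes/no answer is needed, no capture groups. */
	if (regcomp(&re, "^(ld|st|cas|prf|swp)", REG_EXTENDED | REG_NOSUB))
		return false;

	matched = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}
```

The sketch recompiles the pattern on every call for simplicity; the
patch compiles it once in arm64__annotate_init() and keeps it in
struct arm64_annotate.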

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/arch/arm64/annotate/instructions.c | 67 ++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index d465d093e7eb..c212eb7341bd 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -6,7 +6,8 @@
 
 struct arm64_annotate {
 	regex_t call_insn,
-		jump_insn;
+		jump_insn,
+		ldst_insn; /* load and store instruction */
 };
 
 static int arm64_mov__parse(struct arch *arch __maybe_unused,
@@ -67,6 +68,57 @@ static struct ins_ops arm64_mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+static int arm64_ldst__parse(struct arch *arch __maybe_unused,
+			     struct ins_operands *ops,
+			     struct map_symbol *ms __maybe_unused,
+			     struct disasm_line *dl __maybe_unused)
+{
+	char *s, *target;
+
+	/*
+	 * The part starting from the memory access annotation '[' is parsed
+	 * as 'target', while the part before it is parsed as 'source'.
+	 */
+	target = s = strchr(ops->raw, '[');
+	if (!s)
+		return -1;
+
+	while (s > ops->raw && *s != ',')
+		--s;
+
+	if (s == ops->raw)
+		return -1;
+
+	*s = '\0';
+	ops->source.raw = strdup(ops->raw);
+
+	*s = ',';
+	if (!ops->source.raw)
+		return -1;
+
+	ops->target.raw = strdup(target);
+	if (!ops->target.raw) {
+		zfree(&ops->source.raw);
+		return -1;
+	}
+	ops->target.mem_ref = true;
+
+	return 0;
+}
+
+static int ldst__scnprintf(struct ins *ins, char *bf, size_t size,
+			   struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s,%s", max_ins_name, ins->name,
+			 ops->source.name ?: ops->source.raw,
+			 ops->target.name ?: ops->target.raw);
+}
+
+static struct ins_ops arm64_ldst_ops = {
+	.parse	   = arm64_ldst__parse,
+	.scnprintf = ldst__scnprintf,
+};
+
 static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
 {
 	struct arm64_annotate *arm = arch->priv;
@@ -77,6 +129,8 @@ static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const
 		ops = &jump_ops;
 	else if (!regexec(&arm->call_insn, name, 2, match, 0))
 		ops = &call_ops;
+	else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
+		ops = &arm64_ldst_ops;
 	else if (!strcmp(name, "ret"))
 		ops = &ret_ops;
 	else
@@ -107,6 +161,15 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 		      REG_EXTENDED);
 	if (err)
 		goto out_free_call;
+	/*
+	 * The ARM64 architecture has many variants of load/store instructions.
+	 * It is quite challenging to match all of them completely. Here, we
+	 * only match the prefixes of these instructions.
+	 */
+	err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
+		      REG_EXTENDED);
+	if (err)
+		goto out_free_jump;
 
 	arch->initialized = true;
 	arch->priv	  = arm;
@@ -117,6 +180,8 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 	arch->e_flags = 0;
 	return 0;
 
+out_free_jump:
+	regfree(&arm->jump_insn);
 out_free_call:
 	regfree(&arm->call_insn);
 out_free_arm:
-- 
2.25.1



* [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse()
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
  2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18 18:02   ` Leo Yan
  2025-03-14 16:21 ` [PATCH 3/7] perf annotate: Add 'extract_reg_offset' callback function to extract register number and access offset Li Huafei
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

Move the x86 mem_ref check earlier, into mov__parse(), alongside the
multi_regs check, to make annotate_get_insn_location() more concise.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/util/annotate.c | 9 ++++-----
 tools/perf/util/disasm.c   | 8 ++++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 31bb326b07a6..860ea6c72411 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2442,18 +2442,17 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 				continue;
 		}
 
+		op_loc->mem_ref = mem_ref;
+		op_loc->multi_regs = multi_regs;
+
 		/*
 		 * For powerpc, call get_powerpc_regs function which extracts the
 		 * required fields for op_loc, ie reg1, reg2, offset from the
 		 * raw instruction.
 		 */
 		if (arch__is(arch, "powerpc")) {
-			op_loc->mem_ref = mem_ref;
-			op_loc->multi_regs = multi_regs;
 			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
-		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
-			op_loc->mem_ref = true;
-			op_loc->multi_regs = multi_regs;
+		} else if (mem_ref) {
 			extract_reg_offset(arch, insn_str, op_loc);
 		} else {
 			char *s, *p = NULL;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 50c5c206b70e..d91526cff9df 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -607,6 +607,12 @@ static bool check_multi_regs(struct arch *arch, const char *op)
 	return count > 1;
 }
 
+/* Check whether the operand accesses memory. */
+static bool check_memory_ref(struct arch *arch, const char *op)
+{
+	return strchr(op, arch->objdump.memory_ref_char) != NULL;
+}
+
 static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
 		struct disasm_line *dl __maybe_unused)
 {
@@ -635,6 +641,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_sy
 	if (ops->source.raw == NULL)
 		return -1;
 
+	ops->source.mem_ref = check_memory_ref(arch, ops->source.raw);
 	ops->source.multi_regs = check_multi_regs(arch, ops->source.raw);
 
 	target = skip_spaces(++s);
@@ -657,6 +664,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_sy
 	if (ops->target.raw == NULL)
 		goto out_free_source;
 
+	ops->target.mem_ref = check_memory_ref(arch, ops->target.raw);
 	ops->target.multi_regs = check_multi_regs(arch, ops->target.raw);
 
 	if (comment == NULL)
-- 
2.25.1



* [PATCH 3/7] perf annotate: Add 'extract_reg_offset' callback function to extract register number and access offset
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
  2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
  2025-03-14 16:21 ` [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse() Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-14 16:21 ` [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64 Li Huafei
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

The assembly syntax for memory access instructions varies significantly
across different architectures, which makes it difficult to reuse the
code for extracting register numbers and access offsets in the function
annotate_get_insn_location().

To simplify the code, the extraction of register numbers and access
offsets from operands is written as a callback function for the
architecture, facilitating the implementation of architecture-specific
extraction logic.
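
As an illustration only, the per-architecture callback pattern can be
sketched with a table of function pointers. The types 'struct op_loc'
and 'struct arch_sketch', the two toy parsers, and 'get_insn_offset()'
are hypothetical simplifications: they extract only the offset and do
not mirror the real struct arch or struct annotated_op_loc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the operand location. */
struct op_loc {
	long offset;
};

/* Hypothetical, reduced stand-in for the architecture descriptor. */
struct arch_sketch {
	const char *name;
	int (*extract_reg_offset)(const char *insn_str, struct op_loc *loc);
};

/* AT&T x86 style "OFFSET(%reg)": the offset leads the operand. */
static int extract_x86(const char *insn_str, struct op_loc *loc)
{
	loc->offset = strtol(insn_str, NULL, 0);
	return 0;
}

/* arm64 style "[reg, #imm]": the offset follows '#', if present. */
static int extract_arm64(const char *insn_str, struct op_loc *loc)
{
	const char *hash = strchr(insn_str, '#');

	loc->offset = hash ? strtol(hash + 1, NULL, 0) : 0;
	return 0;
}

static const struct arch_sketch arches[] = {
	{ "x86",   extract_x86 },
	{ "arm64", extract_arm64 },
};

/* Generic caller: no architecture-specific branches are needed here. */
static int get_insn_offset(const char *arch_name, const char *insn_str,
			   struct op_loc *loc)
{
	size_t i;

	assert(arch_name && insn_str && loc);

	for (i = 0; i < sizeof(arches) / sizeof(arches[0]); i++) {
		if (!strcmp(arches[i].name, arch_name))
			return arches[i].extract_reg_offset(insn_str, loc);
	}
	return -1;	/* unknown architecture: a choice of this sketch */
}
```

The generic caller stays free of arch checks, which is the effect the
callback is meant to have on annotate_get_insn_location().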

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 .../perf/arch/powerpc/annotate/instructions.c |  10 ++
 tools/perf/arch/x86/annotate/instructions.c   |  99 ++++++++++++++++
 tools/perf/util/annotate.c                    | 107 +-----------------
 tools/perf/util/disasm.c                      |   2 +
 tools/perf/util/disasm.h                      |   4 +
 5 files changed, 117 insertions(+), 105 deletions(-)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index ca567cfdcbdb..fd6516890f3b 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/compiler.h>
+#include <dwarf-regs.h>
 
 static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, const char *name)
 {
@@ -315,3 +316,12 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 
 	return 0;
 }
+
+static int extract_reg_offset_powerpc(struct arch *arch __maybe_unused,
+				      struct disasm_line *dl,
+				      const char *insn_str __maybe_unused, int insn_ops,
+				      struct annotated_op_loc *op_loc)
+{
+	get_powerpc_regs(dl->raw.raw_insn, insn_ops == INSN_OP_SOURCE, op_loc);
+	return 0;
+}
diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
index ae94b1f0b9cc..83e0fc4b9788 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -596,3 +596,102 @@ static void update_insn_state_x86(struct type_state *state,
 	/* Case 4. memory to memory transfers (not handled for now) */
 }
 #endif
+
+/*
+ * Get register number and access offset from the given instruction.
+ * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
+ * to revisit the format when it handles different architecture.
+ * Fills @reg and @offset when return 0.
+ */
+static int extract_reg_offset(struct arch *arch, const char *str,
+			      struct annotated_op_loc *op_loc)
+{
+	char *p;
+	char *regname;
+
+	if (arch->objdump.register_char == 0)
+		return -1;
+
+	/*
+	 * It should start from offset, but it's possible to skip 0
+	 * in the asm.  So 0(%rax) should be same as (%rax).
+	 *
+	 * However, it also start with a segment select register like
+	 * %gs:0x18(%rbx).  In that case it should skip the part.
+	 */
+	if (*str == arch->objdump.register_char) {
+		/* FIXME: Handle other segment registers */
+		if (!strncmp(str, "%gs:", 4))
+			op_loc->segment = INSN_SEG_X86_GS;
+
+		while (*str && !isdigit(*str) &&
+		       *str != arch->objdump.memory_ref_char)
+			str++;
+	}
+
+	op_loc->offset = strtol(str, &p, 0);
+
+	p = strchr(p, arch->objdump.register_char);
+	if (p == NULL)
+		return -1;
+
+	regname = strdup(p);
+	if (regname == NULL)
+		return -1;
+
+	op_loc->reg1 = get_dwarf_regnum(regname, arch->e_machine, arch->e_flags);
+	free(regname);
+
+	/* Get the second register */
+	if (op_loc->multi_regs) {
+		p = strchr(p + 1, arch->objdump.register_char);
+		if (p == NULL)
+			return -1;
+
+		regname = strdup(p);
+		if (regname == NULL)
+			return -1;
+
+		op_loc->reg2 = get_dwarf_regnum(regname, arch->e_machine, arch->e_flags);
+		free(regname);
+	}
+	return 0;
+}
+
+static int extract_reg_offset_x86(struct arch *arch, struct disasm_line *dl __maybe_unused,
+				  const char *insn_str, int insn_ops __maybe_unused,
+				  struct annotated_op_loc *op_loc)
+{
+	if (insn_str == NULL)
+		return 0;
+
+	if (op_loc->mem_ref) {
+		extract_reg_offset(arch, insn_str, op_loc);
+	} else {
+		char *s, *p = NULL;
+
+		/* FIXME: Handle other segment registers */
+		if (!strncmp(insn_str, "%gs:", 4)) {
+			op_loc->segment = INSN_SEG_X86_GS;
+			op_loc->offset = strtol(insn_str + 4, &p, 0);
+			if (p && p != insn_str + 4)
+				op_loc->imm = true;
+			return 0;
+		}
+
+		s = strdup(insn_str);
+		if (s == NULL)
+			return -1;
+
+		if (*s == arch->objdump.register_char)
+			op_loc->reg1 = get_dwarf_regnum(s, arch->e_machine, arch->e_flags);
+		else if (*s == arch->objdump.imm_char) {
+			op_loc->offset = strtol(s + 1, &p, 0);
+			if (p && p != s + 1)
+				op_loc->imm = true;
+		}
+		free(s);
+	}
+
+	return 0;
+}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 860ea6c72411..288200e4b2b5 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2318,69 +2318,6 @@ int annotate_check_args(void)
 	return 0;
 }
 
-/*
- * Get register number and access offset from the given instruction.
- * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
- * to revisit the format when it handles different architecture.
- * Fills @reg and @offset when return 0.
- */
-static int extract_reg_offset(struct arch *arch, const char *str,
-			      struct annotated_op_loc *op_loc)
-{
-	char *p;
-	char *regname;
-
-	if (arch->objdump.register_char == 0)
-		return -1;
-
-	/*
-	 * It should start from offset, but it's possible to skip 0
-	 * in the asm.  So 0(%rax) should be same as (%rax).
-	 *
-	 * However, it also start with a segment select register like
-	 * %gs:0x18(%rbx).  In that case it should skip the part.
-	 */
-	if (*str == arch->objdump.register_char) {
-		if (arch__is(arch, "x86")) {
-			/* FIXME: Handle other segment registers */
-			if (!strncmp(str, "%gs:", 4))
-				op_loc->segment = INSN_SEG_X86_GS;
-		}
-
-		while (*str && !isdigit(*str) &&
-		       *str != arch->objdump.memory_ref_char)
-			str++;
-	}
-
-	op_loc->offset = strtol(str, &p, 0);
-
-	p = strchr(p, arch->objdump.register_char);
-	if (p == NULL)
-		return -1;
-
-	regname = strdup(p);
-	if (regname == NULL)
-		return -1;
-
-	op_loc->reg1 = get_dwarf_regnum(regname, arch->e_machine, arch->e_flags);
-	free(regname);
-
-	/* Get the second register */
-	if (op_loc->multi_regs) {
-		p = strchr(p + 1, arch->objdump.register_char);
-		if (p == NULL)
-			return -1;
-
-		regname = strdup(p);
-		if (regname == NULL)
-			return -1;
-
-		op_loc->reg2 = get_dwarf_regnum(regname, arch->e_machine, arch->e_flags);
-		free(regname);
-	}
-	return 0;
-}
-
 /**
  * annotate_get_insn_location - Get location of instruction
  * @arch: the architecture info
@@ -2437,51 +2374,11 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 		op_loc->reg1 = -1;
 		op_loc->reg2 = -1;
 
-		if (insn_str == NULL) {
-			if (!arch__is(arch, "powerpc"))
-				continue;
-		}
-
 		op_loc->mem_ref = mem_ref;
 		op_loc->multi_regs = multi_regs;
 
-		/*
-		 * For powerpc, call get_powerpc_regs function which extracts the
-		 * required fields for op_loc, ie reg1, reg2, offset from the
-		 * raw instruction.
-		 */
-		if (arch__is(arch, "powerpc")) {
-			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
-		} else if (mem_ref) {
-			extract_reg_offset(arch, insn_str, op_loc);
-		} else {
-			char *s, *p = NULL;
-
-			if (arch__is(arch, "x86")) {
-				/* FIXME: Handle other segment registers */
-				if (!strncmp(insn_str, "%gs:", 4)) {
-					op_loc->segment = INSN_SEG_X86_GS;
-					op_loc->offset = strtol(insn_str + 4,
-								&p, 0);
-					if (p && p != insn_str + 4)
-						op_loc->imm = true;
-					continue;
-				}
-			}
-
-			s = strdup(insn_str);
-			if (s == NULL)
-				return -1;
-
-			if (*s == arch->objdump.register_char)
-				op_loc->reg1 = get_dwarf_regnum(s, arch->e_machine, arch->e_flags);
-			else if (*s == arch->objdump.imm_char) {
-				op_loc->offset = strtol(s + 1, &p, 0);
-				if (p && p != s + 1)
-					op_loc->imm = true;
-			}
-			free(s);
-		}
+		if (arch->extract_reg_offset(arch, dl, insn_str, i, op_loc))
+			return -1;
 	}
 
 	return 0;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index d91526cff9df..905eceb824a4 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -155,6 +155,7 @@ static struct arch architectures[] = {
 #ifdef HAVE_LIBDW_SUPPORT
 		.update_insn_state = update_insn_state_x86,
 #endif
+		.extract_reg_offset = extract_reg_offset_x86,
 	},
 	{
 		.name = "powerpc",
@@ -162,6 +163,7 @@ static struct arch architectures[] = {
 #ifdef HAVE_LIBDW_SUPPORT
 		.update_insn_state = update_insn_state_powerpc,
 #endif
+		.extract_reg_offset = extract_reg_offset_powerpc,
 	},
 	{
 		.name = "riscv64",
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index c135db2416b5..44ac5aa892f7 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -16,6 +16,7 @@ struct symbol;
 struct data_loc_info;
 struct type_state;
 struct disasm_line;
+struct annotated_op_loc;
 
 struct arch {
 	const char	*name;
@@ -44,6 +45,9 @@ struct arch {
 				struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				struct disasm_line *dl);
 #endif
+	int		(*extract_reg_offset)(struct arch *arch, struct disasm_line *dl,
+					      const char *insn_str, int insn_ops,
+					      struct annotated_op_loc *op_loc);
 	/** @e_machine: ELF machine associated with arch. */
 	unsigned int e_machine;
 	/** @e_flags: Optional ELF flags associated with arch. */
-- 
2.25.1



* [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
                   ` (2 preceding siblings ...)
  2025-03-14 16:21 ` [PATCH 3/7] perf annotate: Add 'extract_reg_offset' callback function to extract register number and access offset Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18  1:45   ` Namhyung Kim
  2025-03-14 16:21 ` [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64 Li Huafei
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

At present, only the following two addressing modes are supported:

 1. Base register only (no offset): [base{, #0}]
 2. Base plus offset (immediate): [base{, #imm}]

For addressing modes where the offset needs to be calculated from the
register value, it is difficult to know the specific value of the offset
register, making it impossible to calculate the offset.
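
As an illustration only, the two supported addressing modes can be
exercised with a standalone helper built on POSIX regcomp()/regexec().
The pattern follows the one in the patch (without the trailing ".*");
the helper name 'parse_base_imm()' and its output parameters are
hypothetical, and 'sp' maps to 31 as in get_arm64_regnum():

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "[base]" or "[base, #imm]" and fill the base register number
 * and the immediate offset. Returns 0 on success, -1 when the operand
 * uses an addressing mode outside these two forms.
 */
static int parse_base_imm(const char *operand, int *reg, long *offset)
{
	regex_t re;
	regmatch_t m[4];
	char buf[8];
	size_t len;
	int ret = -1;

	assert(operand && reg && offset);

	if (regcomp(&re, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\]",
		    REG_EXTENDED))
		return -1;

	if (regexec(&re, operand, 4, m, 0))
		goto out;

	/* Group 1 is the base register name, at most three characters. */
	len = (size_t)(m[1].rm_eo - m[1].rm_so);
	memcpy(buf, operand + m[1].rm_so, len);
	buf[len] = '\0';

	if (!strcmp(buf, "sp")) {
		*reg = 31;
	} else {
		*reg = (int)strtol(buf + 1, NULL, 10);
		if (*reg > 30)
			goto out;
	}

	/* Group 3 is the immediate; it is unset for the "[base]" form. */
	*offset = 0;
	if (m[3].rm_so != -1)
		*offset = strtol(operand + m[3].rm_so, NULL, 10);

	ret = 0;
out:
	regfree(&re);
	return ret;
}
```

With this helper, "[x29, #-16]" gives register 29 and offset -16,
"[sp]" gives register 31 and offset 0, and a register-offset operand
such as "[x1, x2]" is rejected.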

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/arch/arm64/annotate/instructions.c | 62 +++++++++++++++++++
 tools/perf/util/Build                         |  1 +
 tools/perf/util/disasm.c                      |  1 +
 tools/perf/util/dwarf-regs-arm64.c            | 25 ++++++++
 tools/perf/util/include/dwarf-regs.h          |  7 +++
 5 files changed, 96 insertions(+)
 create mode 100644 tools/perf/util/dwarf-regs-arm64.c

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index c212eb7341bd..54497b72a5c5 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -188,3 +188,65 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 	free(arm);
 	return SYMBOL_ANNOTATE_ERRNO__ARCH_INIT_REGEXP;
 }
+
+
+/*
+ * Get the base register number and access offset in load/store instructions.
+ * At present, only the following two addressing modes are supported:
+ *
+ *  1. Base register only (no offset): [base{, #0}]
+ *  2. Base plus offset (immediate): [base{, #imm}]
+ *
+ * For addressing modes where the offset needs to be calculated from the
+ * register value, it is difficult to know the specific value of the offset
+ * register, making it impossible to calculate the offset.
+ *
+ * Fills @reg and @offset when return 0.
+ */
+static int
+extract_reg_offset_arm64(struct arch *arch __maybe_unused,
+			 struct disasm_line *dl __maybe_unused,
+			 const char *insn_str, int insn_ops __maybe_unused,
+			 struct annotated_op_loc *op_loc)
+{
+	char *str;
+	regmatch_t match[4];
+	static regex_t reg_off_regex;
+	static bool regex_compiled;
+
+	if (!regex_compiled) {
+		regcomp(&reg_off_regex, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
+			REG_EXTENDED);
+		regex_compiled = true;
+	}
+
+	if (!op_loc->mem_ref)
+		return 0;
+
+	if (regexec(&reg_off_regex, insn_str, 4, match, 0))
+		return -1;
+
+	str = strdup(insn_str);
+	if (!str)
+		return -1;
+
+	/* Get the base register number. */
+	str[match[1].rm_eo] = '\0';
+	op_loc->reg1 = get_arm64_regnum(str + match[1].rm_so);
+
+	/*
+	 * If there is an immediate offset, match[2] records the start and end
+	 * positions of "#imm".
+	 */
+	if (match[2].rm_so == -1) {
+		free(str);
+		return 0;
+	}
+
+	/* Get the immediate offset. */
+	str[match[3].rm_eo] = '\0';
+	op_loc->offset = strtol(str + match[3].rm_so, NULL, 0);
+
+	free(str);
+	return 0;
+}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 5ec97e8d6b6d..d408cbe94fdd 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -210,6 +210,7 @@ perf-util-$(CONFIG_LIBDW) += dwarf-regs.o
 perf-util-$(CONFIG_LIBDW) += dwarf-regs-csky.o
 perf-util-$(CONFIG_LIBDW) += dwarf-regs-powerpc.o
 perf-util-$(CONFIG_LIBDW) += dwarf-regs-x86.o
+perf-util-$(CONFIG_LIBDW) += dwarf-regs-arm64.o
 perf-util-$(CONFIG_LIBDW) += debuginfo.o
 perf-util-$(CONFIG_LIBDW) += annotate-data.o
 
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 905eceb824a4..1035c60a8545 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -128,6 +128,7 @@ static struct arch architectures[] = {
 	{
 		.name = "arm64",
 		.init = arm64__annotate_init,
+		.extract_reg_offset = extract_reg_offset_arm64,
 	},
 	{
 		.name = "csky",
diff --git a/tools/perf/util/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arm64.c
new file mode 100644
index 000000000000..edf41c059967
--- /dev/null
+++ b/tools/perf/util/dwarf-regs-arm64.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Mapping of arm64 register names into DWARF debug register numbers.
+ *
+ * Copyright (c) 2025  Huawei Inc, Li Huafei <lihuafei1@huawei.com>
+ */
+#include <errno.h>
+#include <string.h>
+#include <dwarf-regs.h>
+
+int get_arm64_regnum(const char *name)
+{
+	int reg;
+
+	if (!strcmp(name, "sp"))
+		return 31;
+
+	if (*name != 'x' && *name != 'w')
+		return -EINVAL;
+
+	name++;
+	reg = strtol(name, NULL, 0);
+
+	return reg >= 0 && reg <= 30 ? reg : -EINVAL;
+}
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 6f1b9f6b2466..81cc5f69a391 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -101,6 +101,8 @@ const char *get_dwarf_regstr(unsigned int n, unsigned int machine, unsigned int
 
 int get_x86_regnum(const char *name);
 
+int get_arm64_regnum(const char *name);
+
 #if !defined(__x86_64__) && !defined(__i386__)
 int get_arch_regnum(const char *name);
 #endif
@@ -128,6 +130,11 @@ static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source _
 {
 	return;
 }
+
+static inline int get_arm64_regnum(const char *name __maybe_unused)
+{
+	return -1;
+}
 #endif
 
 #endif
-- 
2.25.1



* [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
                   ` (3 preceding siblings ...)
  2025-03-14 16:21 ` [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64 Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18  1:51   ` Namhyung Kim
  2025-03-14 16:21 ` [PATCH 6/7] perf annotate-data: Handle arm64 global variable access Li Huafei
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

Add support for arm64 instruction tracking. This patch addresses the
case where type information cannot be found while following multi-level
pointer references. For example, consider the vfs_ioctl() function:

 long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
     int error = -ENOTTY;

     if (!filp->f_op->unlocked_ioctl)
         goto out;

     error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
     if (error == -ENOIOCTLCMD)
         error = -ENOTTY;
 out:
     return error;
 }

The 'SYSCALL_DEFINE3(ioctl)' inlines vfs_ioctl, and the assembly
instructions for 'if (!filp->f_op->unlocked_ioctl)' are as follows:

 ldr     x0, [x21, #16]
 ldr     x3, [x0, #80]
 cbz     x3, ffff80008048e9a4

The first instruction loads the 'filp->f_op' pointer, and the second
instruction loads the 'filp->f_op->unlocked_ioctl' pointer. The DWARF
debug info has type information for x21, but not for x0. Therefore, if
PMU sampling occurs on the second instruction, the corresponding data
type cannot be obtained. However, by using the type information and
offset from x21 in the first ldr instruction, we can infer the type
of x0 and, combined with the offset, resolve the accessed data member.
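
As a rough illustration of the inference described above, the following
self-contained C sketch models the register type state with a toy table
instead of DWARF. All names here (toy_member, toy_struct, reg_type,
toy_track_load) are hypothetical and are not part of perf; the offsets
16 and 80 are taken from the disassembly above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for DWARF type info; not perf's real structures. */
struct toy_member {
	int offset;		/* byte offset within the struct */
	const char *name;	/* member name */
	const char *pointee;	/* struct it points to, NULL if not tracked */
};

struct toy_struct {
	const char *name;
	const struct toy_member *members;
	size_t nr_members;
};

static const struct toy_member file_members[] = {
	{ 16, "f_op", "file_operations" },
};

static const struct toy_member fops_members[] = {
	{ 80, "unlocked_ioctl", NULL },
};

static const struct toy_struct toy_types[] = {
	{ "file", file_members, 1 },
	{ "file_operations", fops_members, 1 },
};

/* reg_type[n] names the struct that register xn points to, or NULL. */
static const char *reg_type[32];

static const struct toy_member *toy_find_member(const char *type, int offset)
{
	size_t i, j;

	for (i = 0; i < sizeof(toy_types) / sizeof(toy_types[0]); i++) {
		if (strcmp(toy_types[i].name, type))
			continue;
		for (j = 0; j < toy_types[i].nr_members; j++) {
			if (toy_types[i].members[j].offset == offset)
				return &toy_types[i].members[j];
		}
	}
	return NULL;
}

/*
 * Model "ldr x<dst>, [x<base>, #offset]": return the member that is
 * accessed and record what the destination register now points to.
 */
static const struct toy_member *toy_track_load(int dst, int base, int offset)
{
	const char *base_type = reg_type[base];
	const struct toy_member *m;

	m = base_type ? toy_find_member(base_type, offset) : NULL;
	reg_type[dst] = m ? m->pointee : NULL;
	return m;
}
```

Seeding reg_type[21] with "file" and then calling
toy_track_load(0, 21, 16) followed by toy_track_load(3, 0, 80) resolves
the second load to 'unlocked_ioctl', which mirrors what the tracking in
this patch is meant to achieve.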

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/arch/arm64/annotate/instructions.c | 44 ++++++++++++++++++-
 tools/perf/util/annotate-data.c               |  3 +-
 tools/perf/util/annotate-data.h               |  2 +-
 tools/perf/util/disasm.c                      |  3 ++
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index 54497b72a5c5..f70d93001fe7 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -215,7 +215,8 @@ extract_reg_offset_arm64(struct arch *arch __maybe_unused,
 	static bool regex_compiled;
 
 	if (!regex_compiled) {
-		regcomp(&reg_off_regex, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
+		regcomp(&reg_off_regex,
+			"^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
 			REG_EXTENDED);
 		regex_compiled = true;
 	}
@@ -250,3 +251,44 @@ extract_reg_offset_arm64(struct arch *arch __maybe_unused,
 	free(str);
 	return 0;
 }
+
+#ifdef HAVE_LIBDW_SUPPORT
+static void
+update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
+			Dwarf_Die * cu_die __maybe_unused, struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	Dwarf_Die type_die;
+	int sreg, dreg;
+
+	if (strncmp(dl->ins.name, "ld", 2))
+		return;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	sreg = get_arm64_regnum(dl->ops.source.raw);
+	if (sreg < 0)
+		return;
+	if (!has_reg_type(state, sreg))
+		return;
+
+	dreg = dst->reg1;
+	if (has_reg_type(state, dreg) && state->regs[dreg].ok &&
+	    state->regs[dreg].kind == TSR_KIND_TYPE &&
+	    dwarf_tag(&state->regs[dreg].type) == DW_TAG_pointer_type &&
+	    die_deref_ptr_type(&state->regs[dreg].type,
+			       dst->offset, &type_die)) {
+		tsr = &state->regs[sreg];
+		tsr->type = type_die;
+		tsr->kind = TSR_KIND_TYPE;
+		tsr->ok = true;
+
+		pr_debug_dtp("load [%x] %#x(reg%d) -> reg%d",
+			     (u32)dl->al.offset, dst->offset, dreg, sreg);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+}
+#endif
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 976abedca09e..2bc8d646eedc 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1293,7 +1293,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
 
 static int arch_supports_insn_tracking(struct data_loc_info *dloc)
 {
-	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")))
+	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")) ||
+	    (arch__is(dloc->arch, "arm64")))
 		return 1;
 	return 0;
 }
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 98c80b2268dd..717f394eb8f1 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -190,7 +190,7 @@ struct type_state_stack {
 };
 
 /* FIXME: This should be arch-dependent */
-#ifdef __powerpc__
+#if defined(__powerpc__) || defined(__aarch64__)
 #define TYPE_STATE_MAX_REGS  32
 #else
 #define TYPE_STATE_MAX_REGS  16
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 1035c60a8545..540981c155f9 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -129,6 +129,9 @@ static struct arch architectures[] = {
 		.name = "arm64",
 		.init = arm64__annotate_init,
 		.extract_reg_offset = extract_reg_offset_arm64,
+#ifdef HAVE_LIBDW_SUPPORT
+		.update_insn_state = update_insn_state_arm64,
+#endif
 	},
 	{
 		.name = "csky",
-- 
2.25.1



* [PATCH 6/7] perf annotate-data: Handle arm64 global variable access
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
                   ` (4 preceding siblings ...)
  2025-03-14 16:21 ` [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64 Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18  2:01   ` Namhyung Kim
  2025-03-14 16:21 ` [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64 Li Huafei
  2025-03-18  1:25 ` [PATCH 0/7] Add data type profiling support for arm64 Namhyung Kim
  7 siblings, 1 reply; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

Arm64 uses the 'adrp' and 'add' instructions to load the address of a
global variable. For example:

 adrp    x19, ffff8000819c3000
 add     x19, x19, #0x3e8
 <<after some sequence>>
 ldr     x22, [x19, #8]

Here, 'adrp' retrieves the base address of the page where the global
variable is located, and 'add' adds the offset within the page. If PMU
sampling occurs at the instruction 'ldr x22, [x19, #8]', we need to
trace the preceding 'adrp' and 'add' instructions to obtain the status
information of x19.

A new register status type 'TSR_KIND_GLOBAL_ADDR' is introduced,
indicating that the register holds the address of a global variable, and
this address is also stored in the 'type_state_reg' structure. After
obtaining the status information of x19, we use
get_global_var_type() to search for a matching global variable and
verify whether the returned offset is equal to 8. If it is, then we have
identified the data type and offset of the accessed global variable.
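
To make the address arithmetic concrete, here is a small standalone C
sketch. It parses the operands of an 'add' with the same regular
expression that this patch uses and combines the page base, the in-page
offset and the ldr offset. The helper names (toy_regnum, toy_parse_add,
toy_global_addr) are made up for illustration and do not exist in perf.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map "x0".."x30", "w0".."w30" or "sp" to 0..31. */
static int toy_regnum(const char *s, size_t len)
{
	char buf[4];
	char *end;
	long n;

	if (len < 2 || len > 3)
		return -1;
	memcpy(buf, s, len);
	buf[len] = '\0';

	if (!strcmp(buf, "sp"))
		return 31;
	if (buf[0] != 'x' && buf[0] != 'w')
		return -1;

	n = strtol(buf + 1, &end, 10);
	if (*end != '\0' || n < 0 || n > 30)
		return -1;
	return (int)n;
}

/* Parse "<Xd|SP>, <Xn|SP>, #0x<imm>"; return 0 on success, -1 otherwise. */
static int toy_parse_add(const char *ops, int *dreg, int *sreg, uint64_t *imm)
{
	regex_t re;
	regmatch_t m[4];
	int ret = -1;

	if (regcomp(&re,
		    "^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
		    REG_EXTENDED))
		return -1;

	if (!regexec(&re, ops, 4, m, 0)) {
		*dreg = toy_regnum(ops + m[1].rm_so,
				   (size_t)(m[1].rm_eo - m[1].rm_so));
		*sreg = toy_regnum(ops + m[2].rm_so,
				   (size_t)(m[2].rm_eo - m[2].rm_so));
		/* Base 16 accepts the leading "0x". */
		*imm = strtoull(ops + m[3].rm_so, NULL, 16);
		if (*dreg >= 0 && *sreg >= 0)
			ret = 0;
	}

	regfree(&re);	/* released on both the match and no-match paths */
	return ret;
}

/* Address touched by "ldr xN, [xM, #ldr_off]" after adrp + add. */
static uint64_t toy_global_addr(uint64_t page, uint64_t imm, int64_t ldr_off)
{
	return page + imm + (uint64_t)ldr_off;
}
```

With the values from the example above, the page base
0xffff8000819c3000 plus 0x3e8 plus the ldr offset 8 gives
0xffff8000819c33f0, which is the address that would then be looked up
among the global variables.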

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/arch/arm64/annotate/instructions.c | 90 ++++++++++++++++++-
 tools/perf/util/annotate-data.c               | 20 +++++
 tools/perf/util/annotate-data.h               |  2 +
 3 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index f70d93001fe7..f2053e7f60a8 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -262,6 +262,94 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
 	struct type_state_reg *tsr;
 	Dwarf_Die type_die;
 	int sreg, dreg;
+	u32 insn_offset = dl->al.offset;
+
+	/* Access global variables via PC relative addressing, for example:
+	 *
+	 *  adrp    x19, ffff800082074000
+	 *  add     x19, x19, #0x380
+	 *
+	 * The adrp instruction locates the page base address, and the add
+	 * instruction adds the offset within the page.
+	 */
+	if (!strncmp(dl->ins.name, "adrp", 4)) {
+		sreg = get_arm64_regnum(dl->ops.source.raw);
+		if (sreg < 0 || !has_reg_type(state, sreg))
+			return;
+
+		tsr = &state->regs[sreg];
+		tsr->ok = true;
+		tsr->kind = TSR_KIND_GLOBAL_ADDR;
+		/*
+		 * The default arm64_mov_ops has already parsed the adrp
+		 * instruction and saved the target address.
+		 */
+		tsr->addr = dl->ops.target.addr;
+
+		pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d\n",
+			     insn_offset, tsr->addr, sreg);
+		return;
+	}
+
+	/* Add the offset within the page. */
+	if (!strncmp(dl->ins.name, "add", 3)) {
+		regmatch_t match[4];
+		char *ops = strdup(dl->ops.raw);
+		u64 offset;
+		static regex_t add_regex;
+		static bool regex_compiled;
+
+		/*
+		 * Matching the operand assembly syntax of the add instruction:
+		 *
+		 *  <Xd|SP>, <Xn|SP>, #<imm>
+		 */
+		if (!regex_compiled) {
+			regcomp(&add_regex,
+				"^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
+				REG_EXTENDED);
+			regex_compiled = true;
+		}
+
+		if (!ops)
+			return;
+		if (regexec(&add_regex, dl->ops.raw, 4, match, 0)) {
+			free(ops);
+			return;
+		}
+		/*
+		 * Parse the source register first. If it is not of the type
+		 * TSR_KIND_GLOBAL_ADDR, further parsing is not required.
+		 */
+		ops[match[2].rm_eo] = '\0';
+		sreg = get_arm64_regnum(ops + match[2].rm_so);
+		if (sreg < 0 || !has_reg_type(state, sreg) ||
+		    state->regs[sreg].kind != TSR_KIND_GLOBAL_ADDR) {
+			free(ops);
+			return;
+		}
+
+		ops[match[1].rm_eo] = '\0';
+		dreg = get_arm64_regnum(ops + match[1].rm_so);
+		if (dreg < 0 || !has_reg_type(state, dreg)) {
+			free(ops);
+			return;
+		}
+
+		ops[match[3].rm_eo] = '\0';
+		offset = strtoul(ops + match[3].rm_so, NULL, 16);
+
+		tsr = &state->regs[dreg];
+		tsr->ok = true;
+		tsr->kind = TSR_KIND_GLOBAL_ADDR;
+		tsr->addr = state->regs[sreg].addr + offset;
+
+		pr_debug_dtp("add [%x] global addr=%#"PRIx64"(reg%d) -> reg%d\n",
+			     insn_offset, tsr->addr, sreg, dreg);
+
+		free(ops);
+		return;
+	}
 
 	if (strncmp(dl->ins.name, "ld", 2))
 		return;
@@ -287,7 +375,7 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
 		tsr->ok = true;
 
 		pr_debug_dtp("load [%x] %#x(reg%d) -> reg%d",
-			     (u32)dl->al.offset, dst->offset, dreg, sreg);
+			     insn_offset, dst->offset, dreg, sreg);
 		pr_debug_type_name(&tsr->type, tsr->kind);
 	}
 }
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 2bc8d646eedc..aaca08bb9097 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -65,6 +65,9 @@ void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
 	case TSR_KIND_CANARY:
 		pr_info(" stack canary\n");
 		return;
+	case TSR_KIND_GLOBAL_ADDR:
+		pr_info(" global address\n");
+		return;
 	case TSR_KIND_TYPE:
 	default:
 		break;
@@ -1087,6 +1090,23 @@ static enum type_match_result check_matching_type(struct type_state *state,
 		return PERF_TMR_OK;
 	}
 
+	if (state->regs[reg].kind == TSR_KIND_GLOBAL_ADDR) {
+		int var_offset;
+		u64 var_addr;
+
+		pr_debug_dtp("global var by address");
+
+		var_addr = state->regs[reg].addr + dloc->op->offset;
+
+		if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
+					&var_offset, type_die)) {
+			dloc->type_offset = var_offset;
+			return PERF_TMR_OK;
+		}
+
+		return PERF_TMR_BAIL_OUT;
+	}
+
 	if (state->regs[reg].kind == TSR_KIND_CANARY) {
 		pr_debug_dtp("stack canary");
 
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 717f394eb8f1..e3e877313207 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -36,6 +36,7 @@ enum type_state_kind {
 	TSR_KIND_CONST,
 	TSR_KIND_POINTER,
 	TSR_KIND_CANARY,
+	TSR_KIND_GLOBAL_ADDR,
 };
 
 /**
@@ -177,6 +178,7 @@ struct type_state_reg {
 	bool caller_saved;
 	u8 kind;
 	u8 copied_from;
+	u64 addr;
 };
 
 /* Type information in a stack location, dynamically allocated */
-- 
2.25.1



* [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
                   ` (5 preceding siblings ...)
  2025-03-14 16:21 ` [PATCH 6/7] perf annotate-data: Handle arm64 global variable access Li Huafei
@ 2025-03-14 16:21 ` Li Huafei
  2025-03-18  2:06   ` Namhyung Kim
  2025-03-18  1:25 ` [PATCH 0/7] Add data type profiling support for arm64 Namhyung Kim
  7 siblings, 1 reply; 16+ messages in thread
From: Li Huafei @ 2025-03-14 16:21 UTC (permalink / raw)
  To: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers
  Cc: mike.leach, peterz, mingo, alexander.shishkin, jolsa, kjain,
	mhiramat, atrajeev, sesse, adrian.hunter, kan.liang, linux-kernel,
	linux-arm-kernel, linux-perf-users, lihuafei1

According to the implementation of the 'current' macro on ARM64, the
sp_el0 register stores the pointer to the current task's task_struct.
For example:

 mrs x1, sp_el0
 ldr x2, [x1, #1896]

We can infer that the ldr instruction is accessing a member of the
task_struct structure at an offset of 1896. The key is to construct the
data type for x1. The instruction 'mrs x1, sp_el0' belongs to the inline
function get_current(). We find the DIE of the inline function through
its instruction address, and then obtain the DIE for its return type,
which should be 'struct task_struct *'. Then, we update the register
state of x1 with this type information.
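
The first step, recognizing which register receives sp_el0, can be
sketched in isolation as below. The pattern is the one this patch
compiles for 'mrs'; the function name toy_mrs_current_reg is
hypothetical, and the DWARF lookup of get_current()'s return type is
deliberately left out of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdlib.h>

/*
 * Given the operand string of an 'mrs' instruction, return the number
 * of the x register that receives sp_el0, or -1 if the operands do not
 * have the form "<Xt>, sp_el0".
 */
static int toy_mrs_current_reg(const char *ops)
{
	regex_t re;
	regmatch_t m[2];
	int reg = -1;

	if (regcomp(&re, "^(x[0-9]{1,2}), sp_el0", REG_EXTENDED))
		return -1;

	if (!regexec(&re, ops, 2, m, 0)) {
		/* Skip the leading 'x'; strtol stops at the following ','. */
		long n = strtol(ops + m[1].rm_so + 1, NULL, 10);

		if (n >= 0 && n <= 30)
			reg = (int)n;
	}

	regfree(&re);
	return reg;
}
```

For "x1, sp_el0" this yields 1, so x1 is the register whose state would
be set to 'struct task_struct *'. Operands that name another system
register do not match and return -1.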

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
---
 tools/perf/arch/arm64/annotate/instructions.c | 71 +++++++++++++++----
 1 file changed, 57 insertions(+), 14 deletions(-)

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index f2053e7f60a8..c5a0a6381547 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -263,6 +263,20 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
 	Dwarf_Die type_die;
 	int sreg, dreg;
 	u32 insn_offset = dl->al.offset;
+	static regex_t add_regex, mrs_regex;
+	static bool regex_compiled;
+
+	if (!regex_compiled) {
+		/*
+		 * Matching the operand assembly syntax of the add instruction:
+		 *
+		 *  <Xd|SP>, <Xn|SP>, #<imm>
+		 */
+		regcomp(&add_regex, "^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
+			REG_EXTENDED);
+		regcomp(&mrs_regex, "^(x[0-9]{1,2}), sp_el0", REG_EXTENDED);
+		regex_compiled = true;
+	}
 
 	/* Access global variables via PC relative addressing, for example:
 	 *
@@ -296,20 +310,6 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
 		regmatch_t match[4];
 		char *ops = strdup(dl->ops.raw);
 		u64 offset;
-		static regex_t add_regex;
-		static bool regex_compiled;
-
-		/*
-		 * Matching the operand assembly syntax of the add instruction:
-		 *
-		 *  <Xd|SP>, <Xn|SP>, #<imm>
-		 */
-		if (!regex_compiled) {
-			regcomp(&add_regex,
-				"^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
-				REG_EXTENDED);
-			regex_compiled = true;
-		}
 
 		if (!ops)
 			return;
@@ -351,6 +351,49 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
 		return;
 	}
 
+	if (!strncmp(dl->ins.name, "mrs", 3)) {
+		regmatch_t match[2];
+		char *ops = strdup(dl->ops.raw);
+		Dwarf_Die func_die;
+		Dwarf_Attribute attr;
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (!ops)
+			return;
+		if (regexec(&mrs_regex, dl->ops.raw, 2, match, 0)) {
+			free(ops);
+			return;
+		}
+		ops[match[1].rm_eo] = '\0';
+		sreg = get_arm64_regnum(ops + match[1].rm_so);
+		if (sreg < 0 || !has_reg_type(state, sreg)) {
+			free(ops);
+			return;
+		}
+
+		/*
+		 * Find the inline function 'get_current()' Dwarf_Die and
+		 * obtain its return value data type, which should be
+		 * 'struct task_struct *'.
+		 */
+		if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
+		    !dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
+		    !dwarf_formref_die(&attr, &type_die)) {
+			free(ops);
+			return;
+		}
+
+		tsr = &state->regs[sreg];
+		tsr->type = type_die;
+		tsr->kind = TSR_KIND_TYPE;
+		tsr->ok = true;
+
+		pr_debug_dtp("mrs sp_el0 [%x] -> reg%d", insn_offset, sreg);
+		free(ops);
+		return;
+	}
+
 	if (strncmp(dl->ins.name, "ld", 2))
 		return;
 
-- 
2.25.1



* Re: [PATCH 0/7] Add data type profiling support for arm64
  2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
                   ` (6 preceding siblings ...)
  2025-03-14 16:21 ` [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64 Li Huafei
@ 2025-03-18  1:25 ` Namhyung Kim
  7 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  1:25 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

Hello,

On Sat, Mar 15, 2025 at 12:21:30AM +0800, Li Huafei wrote:
> Hi,
> 
> This patchset supports arm64 perf data type profiling. Data type
> profiling was introduced by Namhyung [1], which associates PMU sampling
> (here referring to memory access-related event sampling) with the
> referenced data types, providing developers with an effective tool for
> analyzing the impact of memory usage and layout. For more detailed
> background, please refer to [2].

Thanks a lot for working on this!  I'm glad to see it running on more
architectures!  I'll review and leave comments on each patch.

Thanks,
Namhyung

> 
> Namhyung initially supported this feature only on x86, and later Athira
> added support for it on powerpc [3]. Unlike the x86 implementation, the
> powerpc implementation parses operands directly from raw instruction
> code instead of using the results from assembler disassembly. As Athira
> mentioned, this is mainly because not all memory access instructions on
> powerpc have explicit memory reference assembler notations '()' in their
> assembly code. On arm64, all memory access instructions have the
> notation '[]', so my implementation is similar to x86, using the
> disassembly results from objdump, llvm, or libcapstone, and parsing
> based on strings. I believe this has the advantage of reusing the
> complex instruction parsing logic of the assembler, but it may not
> perform as well as raw instruction parsing in terms of efficiency.
> 
> Below is a brief description of this patchset:
>  - Patch 1 first identifies load and store instructions and provides a
>    parsing function.
>  - Patches 2-3 are refactoring patches. They primarily move the code for
>    extracting registers and offsets to specific architecture
>    implementations. Additionally, a new callback function
>    'extract_reg_offset' is introduced to avoid having too many
>    architecture-specific implementations in the function
>    'annotate_get_insn_location()'.
>  - Patch 4 implements the extract_reg_offset callback for arm64.
>    Currently, it does not support parsing instructions with register
>    pairs or register offsets in operands. Register pairs often appear in
>    stack push/pop instructions, and register offsets are common when
>    accessing per-CPU variables, both of which require special handling.
>  - Patch 5 adds support for instruction tracing on arm64, primarily
>    addressing the issue where DWARF does not generate information for
>    intermediate pointers in pointer chains.
>  - Patches 6-7 further enhance instruction tracing. Patch 6 supports
>    parsing accesses to global variables, while Patch 7 focuses on
>    resolving accesses to the kernel's current pointer.
> 
> There are still areas for improvement in the current implementation:
>  - Support more types of memory access instructions, such as those
>    involving register pairs and register offsets.
>  - Handle all data processing instructions (e.g., mov, add), as these
>    instructions can change the state of registers and may affect the
>    accuracy of instruction tracking.
>  - Support parsing of special memory access scenarios like per-CPU
>    variables and arrays.
> 
> The patch set is based on 6.14-rc6 (commit 80e54e84911a). After applying
> this patch set, the data type profiling results on arm64 are as follows
> (SPE support is required):
> 
>  # perf mem record -a -K -- sleep 1
>  # perf annotate --data-type --type-stat --stdio
>  Only instruction-based sampling period is currently supported by Arm SPE.
>  Annotate data type stats:
>  total 556, ok 357 (64.2%), bad 199 (35.8%)
>  -----------------------------------------------------------
>          10 : no_sym
>          36 : no_insn_ops
>          65 : no_var
>          70 : no_typeinfo
>          18 : bad_offset
>          59 : insn_track
>  
>  Annotate type: 'struct rq' in [kernel.kallsyms] (29 samples):
>  ============================================================================
>   Percent     offset       size  field
>    100.00          0      0xe80  struct rq        {
>      0.00          0        0x4      raw_spinlock_t      __lock {
>      0.00          0        0x4          arch_spinlock_t raw_lock {
>      0.00          0        0x4              union        {
>      0.00          0        0x4                  atomic_t        val {
>      0.00          0        0x4                      int counter;
>                                                  };
>      0.00          0        0x2                  struct   {
>      0.00          0        0x1                      u8  locked;
>      0.00        0x1        0x1                      u8  pending;
>                                                  };
>      0.00          0        0x4                  struct   {
>      0.00          0        0x2                      u16 locked_pending;
>      0.00        0x2        0x2                      u16 tail;
>                                                  };
>                                              };
>                                          };
>                                      };
>     13.79        0x4        0x4      unsigned int        nr_running;
>     13.79        0x8        0x4      unsigned int        nr_numa_running;
>      0.00        0xc        0x4      unsigned int        nr_preferred_running;
>      0.00       0x10        0x4      unsigned int        numa_migrate_on;
>      0.00       0x18        0x8      long unsigned int   last_blocked_load_update_tick;
>      0.00       0x20        0x4      unsigned int        has_blocked_load;
>      0.00       0x40       0x20      call_single_data_t  nohz_csd {
>      0.00       0x40       0x10          struct __call_single_node       node {
>      0.00       0x40        0x8              struct llist_node   llist {
>      0.00       0x40        0x8                  struct llist_node*      next;
>                                              };
>      0.00       0x48        0x4              union        {
>      0.00       0x48        0x4                  unsigned int    u_flags;
>      0.00       0x48        0x4                  atomic_t        a_flags {
>      0.00       0x48        0x4                      int counter;
>                                                  };
>                                              };
>      ...
> 
> Thanks,
> Huafei
> 
> [1] https://lore.kernel.org/lkml/20231213001323.718046-1-namhyung@kernel.org/
> [2] https://lwn.net/Articles/955709/
> [3] https://lore.kernel.org/all/20240718084358.72242-1-atrajeev@linux.vnet.ibm.com/#r
> 
> Li Huafei (7):
>   perf annotate: Handle arm64 load and store instructions
>   perf annotate: Advance the mem_ref check to mov__parse()
>   perf annotate: Add 'extract_reg_offset' callback function to extract
>     register number and access offset
>   perf annotate: Support for the 'extract_reg_offset' callback function
>     in arm64
>   perf annotate-data: Support instruction tracking for arm64
>   perf annotate-data: Handle arm64 global variable access
>   perf annotate-data: Handle the access to the 'current' pointer on
>     arm64
> 
>  tools/perf/arch/arm64/annotate/instructions.c | 302 +++++++++++++++++-
>  .../perf/arch/powerpc/annotate/instructions.c |  10 +
>  tools/perf/arch/x86/annotate/instructions.c   |  99 ++++++
>  tools/perf/util/Build                         |   1 +
>  tools/perf/util/annotate-data.c               |  23 +-
>  tools/perf/util/annotate-data.h               |   4 +-
>  tools/perf/util/annotate.c                    | 112 +------
>  tools/perf/util/disasm.c                      |  14 +
>  tools/perf/util/disasm.h                      |   4 +
>  tools/perf/util/dwarf-regs-arm64.c            |  25 ++
>  tools/perf/util/include/dwarf-regs.h          |   7 +
>  11 files changed, 490 insertions(+), 111 deletions(-)
>  create mode 100644 tools/perf/util/dwarf-regs-arm64.c
> 
> -- 
> 2.25.1
> 


* Re: [PATCH 1/7] perf annotate: Handle arm64 load and store instructions
  2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
@ 2025-03-18  1:32   ` Namhyung Kim
  2025-03-18 17:15   ` Leo Yan
  1 sibling, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  1:32 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:31AM +0800, Li Huafei wrote:
> Add ldst_ops to handle load and store instructions in order to parse
> the data types and offsets associated with PMU events for memory access
> instructions. There are many variants of load and store instructions in
> ARM64, making it difficult to match all of these instruction names
> completely. Therefore, only the instruction prefixes are matched. The
> prefix 'ld|st' covers most of the memory access instructions, 'cas|swp'
> matches atomic instructions, and 'prf' matches memory prefetch
> instructions.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 67 ++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index d465d093e7eb..c212eb7341bd 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -6,7 +6,8 @@
>  
>  struct arm64_annotate {
>  	regex_t call_insn,
> -		jump_insn;
> +		jump_insn,
> +		ldst_insn; /* load and store instruction */
>  };
>  
>  static int arm64_mov__parse(struct arch *arch __maybe_unused,
> @@ -67,6 +68,57 @@ static struct ins_ops arm64_mov_ops = {
>  	.scnprintf = mov__scnprintf,
>  };
>  
> +static int arm64_ldst__parse(struct arch *arch __maybe_unused,
> +			     struct ins_operands *ops,
> +			     struct map_symbol *ms __maybe_unused,
> +			     struct disasm_line *dl __maybe_unused)
> +{
> +	char *s, *target;
> +
> +	/*
> +	 * The part starting from the memory access annotation '[' is parsed
> +	 * as 'target', while the part before it is parsed as 'source'.
> +	 */
> +	target = s = strchr(ops->raw, '[');
> +	if (!s)
> +		return -1;
> +
> +	while (s > ops->raw && *s != ',')
> +		--s;
> +
> +	if (s == ops->raw)
> +		return -1;
> +
> +	*s = '\0';
> +	ops->source.raw = strdup(ops->raw);
> +
> +	*s = ',';
> +	if (!ops->source.raw)
> +		return -1;
> +
> +	ops->target.raw = strdup(target);
> +	if (!ops->target.raw) {
> +		zfree(ops->source.raw);

I think you need 'zfree(&ops->source.raw)' instead.
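
For context, a minimal sketch of why the address is needed, assuming
zfree() frees the pointed-to pointer and then sets it to NULL (as far
as I understand perf's helper). toy_zfree, toy_operand and
toy_dup_operands below are hypothetical local stand-ins, not the perf
implementation:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Stand-in modeled on zfree(): needs the pointer's address to clear it. */
static void toy_zfree(char **ptr)
{
	free(*ptr);
	*ptr = NULL;
}

struct toy_operand {
	char *raw;
};

/*
 * Duplicate both operand strings. Return 0 on success; on failure
 * return -1 and leave no dangling pointer in 'src'. Passing t == NULL
 * simulates a failed second allocation.
 */
static int toy_dup_operands(struct toy_operand *src, struct toy_operand *tgt,
			    const char *s, const char *t)
{
	src->raw = strdup(s);
	if (!src->raw)
		return -1;

	tgt->raw = t ? strdup(t) : NULL;
	if (!tgt->raw) {
		toy_zfree(&src->raw);	/* note the '&' */
		return -1;
	}
	return 0;
}
```

After the failure path runs, src->raw is NULL, so a later cleanup that
frees it again is harmless.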


> +		return -1;
> +	}
> +	ops->target.mem_ref = true;
> +
> +	return 0;
> +}
> +
> +static int ldst__scnprintf(struct ins *ins, char *bf, size_t size,
> +			   struct ins_operands *ops, int max_ins_name)
> +{
> +	return scnprintf(bf, size, "%-*s %s,%s", max_ins_name, ins->name,
> +			 ops->source.name ?: ops->source.raw,
> +			 ops->target.name ?: ops->target.raw);
> +}
> +
> +static struct ins_ops arm64_ldst_ops = {
> +	.parse	   = arm64_ldst__parse,
> +	.scnprintf = ldst__scnprintf,
> +};
> +
>  static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
>  {
>  	struct arm64_annotate *arm = arch->priv;
> @@ -77,6 +129,8 @@ static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const
>  		ops = &jump_ops;
>  	else if (!regexec(&arm->call_insn, name, 2, match, 0))
>  		ops = &call_ops;
> +	else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
> +		ops = &arm64_ldst_ops;
>  	else if (!strcmp(name, "ret"))
>  		ops = &ret_ops;
>  	else
> @@ -107,6 +161,15 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  		      REG_EXTENDED);
>  	if (err)
>  		goto out_free_call;
> +	/*
> +	 * The ARM64 architecture has many variants of load/store instructions.
> +	 * It is quite challenging to match all of them completely. Here, we
> +	 * only match the prefixes of these instructions.
> +	 */
> +	err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
> +		      REG_EXTENDED);
> +	if (err)
> +		goto out_free_jump;
>  
>  	arch->initialized = true;
>  	arch->priv	  = arm;
> @@ -117,6 +180,8 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  	arch->e_flags = 0;
>  	return 0;
>  
> +out_free_jump:
> +	regfree(&arm->jump_insn);
>  out_free_call:
>  	regfree(&arm->call_insn);

It seems we leak these on the success path.  Probably we need arch
annotate_exit() to free the resources.

Thanks,
Namhyung


>  out_free_arm:
> -- 
> 2.25.1
> 


* Re: [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64
  2025-03-14 16:21 ` [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64 Li Huafei
@ 2025-03-18  1:45   ` Namhyung Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  1:45 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:34AM +0800, Li Huafei wrote:
> At present, only the following two addressing modes are supported:
> 
>  1. Base register only (no offset): [base{, #0}]
>  2. Base plus offset (immediate): [base{, #imm}]
> 
> For addressing modes where the offset needs to be calculated from the
> register value, it is difficult to know the specific value of the offset
> register, making it impossible to calculate the offset.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 62 +++++++++++++++++++
>  tools/perf/util/Build                         |  1 +
>  tools/perf/util/disasm.c                      |  1 +
>  tools/perf/util/dwarf-regs-arm64.c            | 25 ++++++++
>  tools/perf/util/include/dwarf-regs.h          |  7 +++
>  5 files changed, 96 insertions(+)
>  create mode 100644 tools/perf/util/dwarf-regs-arm64.c
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index c212eb7341bd..54497b72a5c5 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -188,3 +188,65 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  	free(arm);
>  	return SYMBOL_ANNOTATE_ERRNO__ARCH_INIT_REGEXP;
>  }
> +
> +
> +/*
> + * Get the base register number and access offset in load/store instructions.
> + * At present, only the following two addressing modes are supported:
> + *
> + *  1. Base register only (no offset): [base{, #0}]
> + *  2. Base plus offset (immediate): [base{, #imm}]
> + *
> + * For addressing modes where the offset needs to be calculated from the
> + * register value, it is difficult to know the specific value of the offset
> + * register, making it impossible to calculate the offset.
> + *
> + * Fills @reg and @offset when return 0.
> + */
> +static int
> +extract_reg_offset_arm64(struct arch *arch __maybe_unused,
> +			 struct disasm_line *dl __maybe_unused,
> +			 const char *insn_str, int insn_ops __maybe_unused,
> +			 struct annotated_op_loc *op_loc)
> +{
> +	char *str;
> +	regmatch_t match[4];
> +	static regex_t reg_off_regex;
> +	static bool regex_compiled;
> +
> +	if (!regex_compiled) {
> +		regcomp(&reg_off_regex, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
> +			REG_EXTENDED);
> +		regex_compiled = true;

Probably better to put it in the arch specific data and free it when you
add arch__annotate_exit().


> +	}
> +
> +	if (!op_loc->mem_ref)
> +		return 0;
> +
> +	if (regexec(&reg_off_regex, insn_str, 4, match, 0))
> +		return -1;
> +
> +	str = strdup(insn_str);
> +	if (!str)
> +		return -1;
> +
> +	/* Get the base register number. */
> +	str[match[1].rm_eo] = '\0';
> +	op_loc->reg1 = get_arm64_regnum(str + match[1].rm_so);
> +
> +	/*
> +	 * If there is an immediate offset, match[2] records the start and end
> +	 * positions of "#imm".
> +	 */
> +	if (match[2].rm_so == -1) {
> +		free(str);
> +		return 0;
> +	}
> +
> +	/* Get the immediate offset. */
> +	str[match[3].rm_eo] = '\0';
> +	op_loc->offset = strtol(str + match[3].rm_so, NULL, 0);

Can you please clarify what match 1,2,3 mean - hopefully with an
example?

Thanks,
Namhyung
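
To illustrate the question about the match indices: with POSIX extended
regexes, match[0] is the whole match and match[N] is the N-th
parenthesized group.  For the operand "[x0, #80]" and the pattern from
the patch, match[1] covers "x0", match[2] covers ", #80" (the whole
optional part, including the comma) and match[3] covers "80".  For
"[sp]" the optional groups do not participate and their rm_so is -1.
Below is a self-contained sketch; copy_group() and parse_mem_operand()
are hypothetical helpers written for this example, not perf functions.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy capture group @idx out of @s into @buf.
 * Returns 0 on success, -1 if the group did not participate in the
 * match or does not fit into @buf.
 */
static int copy_group(const char *s, const regmatch_t *m, int idx,
		      char *buf, size_t size)
{
	size_t len;

	if (m[idx].rm_so == -1)
		return -1;
	len = (size_t)(m[idx].rm_eo - m[idx].rm_so);
	if (len >= size)
		return -1;
	memcpy(buf, s + m[idx].rm_so, len);
	buf[len] = '\0';
	return 0;
}

/* Parse "[base{, #imm}]" into a base register name and an offset. */
static int parse_mem_operand(const char *op, char *base, size_t base_size,
			     long *offset)
{
	regex_t re;
	regmatch_t m[4];
	char imm[32];
	int ret = -1;

	if (regcomp(&re, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
		    REG_EXTENDED))
		return -1;
	if (regexec(&re, op, 4, m, 0))
		goto out;
	/* Group 1: base register, e.g. "x0" or "sp". */
	if (copy_group(op, m, 1, base, base_size))
		goto out;
	/* Group 3: digits of the immediate; absent means offset 0. */
	*offset = 0;
	if (!copy_group(op, m, 3, imm, sizeof(imm)))
		*offset = strtol(imm, NULL, 10); /* pattern allows decimal only */
	ret = 0;
out:
	regfree(&re);
	return ret;
}
```

For example, parse_mem_operand("[x21, #-16]!", ...) yields base "x21"
and offset -16, while "[x1, x2]" is rejected because a register offset
does not match the pattern.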

> +
> +	free(str);
> +	return 0;
> +}
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 5ec97e8d6b6d..d408cbe94fdd 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -210,6 +210,7 @@ perf-util-$(CONFIG_LIBDW) += dwarf-regs.o
>  perf-util-$(CONFIG_LIBDW) += dwarf-regs-csky.o
>  perf-util-$(CONFIG_LIBDW) += dwarf-regs-powerpc.o
>  perf-util-$(CONFIG_LIBDW) += dwarf-regs-x86.o
> +perf-util-$(CONFIG_LIBDW) += dwarf-regs-arm64.o
>  perf-util-$(CONFIG_LIBDW) += debuginfo.o
>  perf-util-$(CONFIG_LIBDW) += annotate-data.o
>  
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 905eceb824a4..1035c60a8545 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -128,6 +128,7 @@ static struct arch architectures[] = {
>  	{
>  		.name = "arm64",
>  		.init = arm64__annotate_init,
> +		.extract_reg_offset = extract_reg_offset_arm64,
>  	},
>  	{
>  		.name = "csky",
> diff --git a/tools/perf/util/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arm64.c
> new file mode 100644
> index 000000000000..edf41c059967
> --- /dev/null
> +++ b/tools/perf/util/dwarf-regs-arm64.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Mapping of DWARF debug register numbers into register names.
> + *
> + * Copyright (c) 2025  Huawei Inc, Li Huafei <lihuafei1@huawei.com>
> + */
> +#include <errno.h>
> +#include <string.h>
> +#include <dwarf-regs.h>
> +
> +int get_arm64_regnum(const char *name)
> +{
> +	int reg;
> +
> +	if (!strcmp(name, "sp"))
> +		return 31;
> +
> +	if (*name != 'x' && *name != 'w')
> +		return -EINVAL;
> +
> +	name++;
> +	reg = strtol(name, NULL, 0);
> +
> +	return reg >= 0 && reg <= 30 ? reg : -EINVAL;
> +}
> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
> index 6f1b9f6b2466..81cc5f69a391 100644
> --- a/tools/perf/util/include/dwarf-regs.h
> +++ b/tools/perf/util/include/dwarf-regs.h
> @@ -101,6 +101,8 @@ const char *get_dwarf_regstr(unsigned int n, unsigned int machine, unsigned int
>  
>  int get_x86_regnum(const char *name);
>  
> +int get_arm64_regnum(const char *name);
> +
>  #if !defined(__x86_64__) && !defined(__i386__)
>  int get_arch_regnum(const char *name);
>  #endif
> @@ -128,6 +130,11 @@ static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source _
>  {
>  	return;
>  }
> +
> +static inline int get_arm64_regnum(const char *name __maybe_unused)
> +{
> +	return -1;
> +}
>  #endif
>  
>  #endif
> -- 
> 2.25.1
> 


* Re: [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64
  2025-03-14 16:21 ` [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64 Li Huafei
@ 2025-03-18  1:51   ` Namhyung Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  1:51 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:35AM +0800, Li Huafei wrote:
> Support for arm64 instruction tracking. This patch addresses the scenario
> where type information cannot be found during multi-level pointer
> references. For example, consider the vfs_ioctl() function:
> 
>  long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  {
>      int error = -ENOTTY;
> 
>      if (!filp->f_op->unlocked_ioctl)
>          goto out;
> 
>      error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
>      if (error == -ENOIOCTLCMD)
>          error = -ENOTTY;
>  out:
>      return error;
>  }
> 
> The 'SYSCALL_DEFINE3(ioctl)' inlines vfs_ioctl, and the assembly
> instructions for 'if (!filp->f_op->unlocked_ioctl)' are as follows:
> 
>  ldr     x0, [x21, #16]
>  ldr     x3, [x0, #80]
>  cbz     x3, ffff80008048e9a4
> 
> The first instruction loads the 'filp->f_op' pointer, and the second
> instruction loads the 'filp->f_op->unlocked_ioctl' pointer. DWARF
> generates type information for x21, but not for x0. Therefore, if
> PMU sampling occurs on the second instruction, the corresponding data
> type cannot be obtained. However, by using the type information and
> offset from x21 in the first ldr instruction, we can infer the type
> of x0 and, combined with the offset, resolve the accessed data member.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 44 ++++++++++++++++++-
>  tools/perf/util/annotate-data.c               |  3 +-
>  tools/perf/util/annotate-data.h               |  2 +-
>  tools/perf/util/disasm.c                      |  3 ++
>  4 files changed, 49 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index 54497b72a5c5..f70d93001fe7 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -215,7 +215,8 @@ extract_reg_offset_arm64(struct arch *arch __maybe_unused,
>  	static bool regex_compiled;
>  
>  	if (!regex_compiled) {
> -		regcomp(&reg_off_regex, "^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",
> +		regcomp(&reg_off_regex,
> +			"^\\[(sp|[xw][0-9]{1,2})(, #(-?[0-9]+))?\\].*",

Does it have any real changes?  If not I'd rather leave it or move the
change to the original commit.


>  			REG_EXTENDED);
>  		regex_compiled = true;
>  	}
> @@ -250,3 +251,44 @@ extract_reg_offset_arm64(struct arch *arch __maybe_unused,
>  	free(str);
>  	return 0;
>  }
> +
> +#ifdef HAVE_LIBDW_SUPPORT
> +static void
> +update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
> +			Dwarf_Die * cu_die __maybe_unused, struct disasm_line *dl)
> +{
> +	struct annotated_insn_loc loc;
> +	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
> +	struct type_state_reg *tsr;
> +	Dwarf_Die type_die;
> +	int sreg, dreg;
> +
> +	if (strncmp(dl->ins.name, "ld", 2))
> +		return;
> +
> +	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
> +		return;
> +
> +	sreg = get_arm64_regnum(dl->ops.source.raw);
> +	if (sreg < 0)
> +		return;
> +	if (!has_reg_type(state, sreg))
> +		return;

It'd be better to invalidate the state of the target register even if it
fails to get the information of the source register and its state.  The
destination would be updated anyway, and keeping the stale state would
result in an invalid report at the end.

Thanks,
Namhyung
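
A small model of the ordering suggested above, using simplified
stand-in types rather than perf's real type_state structures (all names
here are assumptions for illustration).  The destination of a load is
invalidated before any early return, so a previously tracked type never
survives an overwrite.

```c
#include <stdbool.h>

#define SKETCH_MAX_REGS 32

/* Simplified stand-in for a tracked register state. */
struct reg_state_sketch {
	bool ok;     /* true if type_id is valid */
	int type_id; /* placeholder for a DWARF type DIE */
};

struct type_state_sketch {
	struct reg_state_sketch regs[SKETCH_MAX_REGS];
};

/*
 * Model of "ldr dst, [base, #off]".  @deref_type_id stands in for the
 * result of dereferencing the base register's pointer type at the
 * given offset; a negative value means the lookup failed.
 */
static void track_load(struct type_state_sketch *st, int dst, int base,
		       int deref_type_id)
{
	bool base_ok;

	if (dst < 0 || dst >= SKETCH_MAX_REGS)
		return;

	/* Read the base state first: dst and base may be the same register. */
	base_ok = base >= 0 && base < SKETCH_MAX_REGS && st->regs[base].ok;

	/* The load overwrites dst in any case, so drop its old state. */
	st->regs[dst].ok = false;

	if (!base_ok || deref_type_id < 0)
		return;

	st->regs[dst].type_id = deref_type_id;
	st->regs[dst].ok = true;
}
```

With this ordering, a load through an untyped base leaves the
destination marked as unknown instead of keeping whatever type it held
before.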

> +
> +	dreg = dst->reg1;
> +	if (has_reg_type(state, dreg) && state->regs[dreg].ok &&
> +	    state->regs[dreg].kind == TSR_KIND_TYPE &&
> +	    dwarf_tag(&state->regs[dreg].type) == DW_TAG_pointer_type &&
> +	    die_deref_ptr_type(&state->regs[dreg].type,
> +			       dst->offset, &type_die)) {
> +		tsr = &state->regs[sreg];
> +		tsr->type = type_die;
> +		tsr->kind = TSR_KIND_TYPE;
> +		tsr->ok = true;
> +
> +		pr_debug_dtp("load [%x] %#x(reg%d) -> reg%d",
> +			     (u32)dl->al.offset, dst->offset, dreg, sreg);
> +		pr_debug_type_name(&tsr->type, tsr->kind);
> +	}
> +}
> +#endif
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 976abedca09e..2bc8d646eedc 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -1293,7 +1293,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
>  
>  static int arch_supports_insn_tracking(struct data_loc_info *dloc)
>  {
> -	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")))
> +	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")) ||
> +	    (arch__is(dloc->arch, "arm64")))
>  		return 1;
>  	return 0;
>  }
> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
> index 98c80b2268dd..717f394eb8f1 100644
> --- a/tools/perf/util/annotate-data.h
> +++ b/tools/perf/util/annotate-data.h
> @@ -190,7 +190,7 @@ struct type_state_stack {
>  };
>  
>  /* FIXME: This should be arch-dependent */
> -#ifdef __powerpc__
> +#if defined(__powerpc__) || defined(__aarch64__)
>  #define TYPE_STATE_MAX_REGS  32
>  #else
>  #define TYPE_STATE_MAX_REGS  16
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 1035c60a8545..540981c155f9 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -129,6 +129,9 @@ static struct arch architectures[] = {
>  		.name = "arm64",
>  		.init = arm64__annotate_init,
>  		.extract_reg_offset = extract_reg_offset_arm64,
> +#ifdef HAVE_LIBDW_SUPPORT
> +		.update_insn_state = update_insn_state_arm64,
> +#endif
>  	},
>  	{
>  		.name = "csky",
> -- 
> 2.25.1
> 


* Re: [PATCH 6/7] perf annotate-data: Handle arm64 global variable access
  2025-03-14 16:21 ` [PATCH 6/7] perf annotate-data: Handle arm64 global variable access Li Huafei
@ 2025-03-18  2:01   ` Namhyung Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  2:01 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:36AM +0800, Li Huafei wrote:
> Arm64 uses the 'adrp' and 'add' instructions to load the address of a
> global variable. For example:
> 
>  adrp    x19, ffff8000819c3000
>  add     x19, x19, #0x3e8
>  <<after some sequence>>
>  ldr     x22, [x19, #8]

You can try perf annotate --stdio --code-with-type and see if it finds a
correct type.  It'd be nice if you include the result in the commit log.

> 
> Here, 'adrp' retrieves the base address of the page where the global
> variable is located, and 'add' adds the offset within the page. If PMU
> sampling occurs at the instruction 'ldr x22, [x19, #8]', we need to
> trace the preceding 'adrp' and 'add' instructions to obtain the status
> information of x19.
> 
> A new register status type 'TSR_KIND_GLOBAL_ADDR' is introduced,
> indicating that the register holds the address of a global variable, and
> this address is also stored in the 'type_state_reg' structure. After
> obtaining the status information of x19, we use
> get_global_var_type() to search for a matching global variable and
> verify whether the returned offset is equal to 8. If it is, then we have
> identified the data type and offset of the accessed global variable.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 90 ++++++++++++++++++-
>  tools/perf/util/annotate-data.c               | 20 +++++
>  tools/perf/util/annotate-data.h               |  2 +
>  3 files changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index f70d93001fe7..f2053e7f60a8 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -262,6 +262,94 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
>  	struct type_state_reg *tsr;
>  	Dwarf_Die type_die;
>  	int sreg, dreg;
> +	u32 insn_offset = dl->al.offset;
> +
> +	/* Access global variables via PC relative addressing, for example:
> +	 *
> +	 *  adrp    x19, ffff800082074000
> +	 *  add     x19, x19, #0x380
> +	 *
> +	 * The adrp instruction locates the page base address, and the add
> +	 * instruction adds the offset within the page.
> +	 */
> +	if (!strncmp(dl->ins.name, "adrp", 4)) {
> +		sreg = get_arm64_regnum(dl->ops.source.raw);
> +		if (sreg < 0 || !has_reg_type(state, sreg))
> +			return;
> +
> +		tsr = &state->regs[sreg];
> +		tsr->ok = true;
> +		tsr->kind = TSR_KIND_GLOBAL_ADDR;
> +		/*
> +		 * The default arm64_mov_ops has already parsed the adrp
> +		 * instruction and saved the target address.
> +		 */
> +		tsr->addr = dl->ops.target.addr;
> +
> +		pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d\n",
> +			     insn_offset, tsr->addr, sreg);
> +		return;
> +	}
> +
> +	/* Add the offset within the page. */
> +	if (!strncmp(dl->ins.name, "add", 3)) {
> +		regmatch_t match[4];
> +		char *ops = strdup(dl->ops.raw);
> +		u64 offset;
> +		static regex_t add_regex;
> +		static bool regex_compiled;
> +
> +		/*
> +		 * Matching the operand assembly syntax of the add instruction:
> +		 *
> +		 *  <Xd|SP>, <Xn|SP>, #<imm>
> +		 */
> +		if (!regex_compiled) {
> +			regcomp(&add_regex,
> +				"^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
> +				REG_EXTENDED);
> +			regex_compiled = true;

Similarly you could put it in the arch and free later.

Thanks.
Namhyung
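
The address arithmetic described in the commit message can be sketched
as follows.  This is only a worked example under stated assumptions:
the variable table, its single entry and the helper names are
hypothetical, whereas perf resolves the variable through
get_global_var_type().  Using the values from the example, adrp gives
0xffff8000819c3000, add contributes 0x3e8, and the ldr offset is 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global-variable record used only for this example. */
struct global_var_sketch {
	const char *name;
	uint64_t addr;
	uint64_t size;
};

/* Address accessed by "adrp; add #imm; ldr [reg, #off]". */
static uint64_t global_access_addr(uint64_t adrp_page, uint64_t add_imm,
				   int64_t ldr_off)
{
	return adrp_page + add_imm + (uint64_t)ldr_off;
}

/* Find the variable covering @addr and report the offset inside it. */
static const struct global_var_sketch *
find_global_var(const struct global_var_sketch *vars, size_t nr,
		uint64_t addr, uint64_t *offset)
{
	for (size_t i = 0; i < nr; i++) {
		if (addr >= vars[i].addr &&
		    addr - vars[i].addr < vars[i].size) {
			*offset = addr - vars[i].addr;
			return &vars[i];
		}
	}
	return NULL;
}
```

If a variable is assumed to start at 0xffff8000819c33e8 (the result of
adrp plus add), the accessed address 0xffff8000819c33f0 resolves to
offset 8 inside it, matching the offset of the ldr instruction.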


> +		}
> +
> +		if (!ops)
> +			return;
> +
> +		if (regexec(&add_regex, dl->ops.raw, 4, match, 0))
> +			return;
> +
> +		/*
> +		 * Parse the source register first. If it is not of the type
> +		 * TSR_KIND_GLOBAL_ADDR, further parsing is not required.
> +		 */
> +		ops[match[2].rm_eo] = '\0';
> +		sreg = get_arm64_regnum(ops + match[2].rm_so);
> +		if (sreg < 0 || !has_reg_type(state, sreg) ||
> +		    state->regs[sreg].kind != TSR_KIND_GLOBAL_ADDR) {
> +			free(ops);
> +			return;
> +		}
> +
> +		ops[match[1].rm_eo] = '\0';
> +		dreg = get_arm64_regnum(ops + match[1].rm_so);
> +		if (dreg < 0 || !has_reg_type(state, dreg)) {
> +			free(ops);
> +			return;
> +		}
> +
> +		ops[match[3].rm_eo] = '\0';
> +		offset = strtoul(ops + match[3].rm_so, NULL, 16);
> +
> +		tsr = &state->regs[dreg];
> +		tsr->ok = true;
> +		tsr->kind = TSR_KIND_GLOBAL_ADDR;
> +		tsr->addr = state->regs[sreg].addr + offset;
> +
> +		pr_debug_dtp("add [%x] global addr=%#"PRIx64"(reg%d) -> reg%d\n",
> +			     insn_offset, tsr->addr, sreg, dreg);
> +
> +		free(ops);
> +		return;
> +	}
>  
>  	if (strncmp(dl->ins.name, "ld", 2))
>  		return;
> @@ -287,7 +375,7 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
>  		tsr->ok = true;
>  
>  		pr_debug_dtp("load [%x] %#x(reg%d) -> reg%d",
> -			     (u32)dl->al.offset, dst->offset, dreg, sreg);
> +			     insn_offset, dst->offset, dreg, sreg);
>  		pr_debug_type_name(&tsr->type, tsr->kind);
>  	}
>  }
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 2bc8d646eedc..aaca08bb9097 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -65,6 +65,9 @@ void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
>  	case TSR_KIND_CANARY:
>  		pr_info(" stack canary\n");
>  		return;
> +	case TSR_KIND_GLOBAL_ADDR:
> +		pr_info(" global address\n");
> +		return;
>  	case TSR_KIND_TYPE:
>  	default:
>  		break;
> @@ -1087,6 +1090,23 @@ static enum type_match_result check_matching_type(struct type_state *state,
>  		return PERF_TMR_OK;
>  	}
>  
> +	if (state->regs[reg].kind == TSR_KIND_GLOBAL_ADDR) {
> +		int var_offset;
> +		u64 var_addr;
> +
> +		pr_debug_dtp("global var by address");
> +
> +		var_addr = state->regs[reg].addr + dloc->op->offset;
> +
> +		if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
> +					&var_offset, type_die)) {
> +			dloc->type_offset = var_offset;
> +			return PERF_TMR_OK;
> +		}
> +
> +		return PERF_TMR_BAIL_OUT;
> +	}
> +
>  	if (state->regs[reg].kind == TSR_KIND_CANARY) {
>  		pr_debug_dtp("stack canary");
>  
> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
> index 717f394eb8f1..e3e877313207 100644
> --- a/tools/perf/util/annotate-data.h
> +++ b/tools/perf/util/annotate-data.h
> @@ -36,6 +36,7 @@ enum type_state_kind {
>  	TSR_KIND_CONST,
>  	TSR_KIND_POINTER,
>  	TSR_KIND_CANARY,
> +	TSR_KIND_GLOBAL_ADDR,
>  };
>  
>  /**
> @@ -177,6 +178,7 @@ struct type_state_reg {
>  	bool caller_saved;
>  	u8 kind;
>  	u8 copied_from;
> +	u64 addr;
>  };
>  
>  /* Type information in a stack location, dynamically allocated */
> -- 
> 2.25.1
> 


* Re: [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64
  2025-03-14 16:21 ` [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64 Li Huafei
@ 2025-03-18  2:06   ` Namhyung Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2025-03-18  2:06 UTC (permalink / raw)
  To: Li Huafei
  Cc: acme, leo.yan, james.clark, mark.rutland, john.g.garry, will,
	irogers, mike.leach, peterz, mingo, alexander.shishkin, jolsa,
	kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:37AM +0800, Li Huafei wrote:
> According to the implementation of the 'current' macro on ARM64, the
> sp_el0 register stores the pointer to the current task's task_struct.
> For example:
> 
>  mrs x1, sp_el0
>  ldr x2, [x1, #1896]

Same here.  It'd be great if you could share a real example where it
found the current for x1 in the second instruction.

> 
> We can infer that the ldr instruction is accessing a member of the
> task_struct structure at an offset of 1896. The key is to construct the
> data type for x1. The instruction 'mrs x1, sp_el0' belongs to the inline
> function get_current(). We find the DIE of the inline function through
> its instruction address and then obtain the DIE for its return type,
> which should be 'struct task_struct *'. Then, we update the register
> state of x1 with this type information.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 71 +++++++++++++++----
>  1 file changed, 57 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index f2053e7f60a8..c5a0a6381547 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -263,6 +263,20 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
>  	Dwarf_Die type_die;
>  	int sreg, dreg;
>  	u32 insn_offset = dl->al.offset;
> +	static regex_t add_regex, mrs_regex;
> +	static bool regex_compiled;
> +
> +	if (!regex_compiled) {
> +		/*
> +		 * Matching the operand assembly syntax of the add instruction:
> +		 *
> +		 *  <Xd|SP>, <Xn|SP>, #<imm>
> +		 */
> +		regcomp(&add_regex, "^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
> +			REG_EXTENDED);
> +		regcomp(&mrs_regex, "^(x[0-9]{1,2}), sp_el0", REG_EXTENDED);
> +		regex_compiled = true;
> +	}
>  
>  	/* Access global variables via PC relative addressing, for example:
>  	 *
> @@ -296,20 +310,6 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
>  		regmatch_t match[4];
>  		char *ops = strdup(dl->ops.raw);
>  		u64 offset;
> -		static regex_t add_regex;
> -		static bool regex_compiled;
> -
> -		/*
> -		 * Matching the operand assembly syntax of the add instruction:
> -		 *
> -		 *  <Xd|SP>, <Xn|SP>, #<imm>
> -		 */
> -		if (!regex_compiled) {
> -			regcomp(&add_regex,
> -				"^([xw][0-9]{1,2}|sp), ([xw][0-9]{1,2}|sp), #(0x[0-9a-f]+)",
> -				REG_EXTENDED);
> -			regex_compiled = true;
> -		}
>  
>  		if (!ops)
>  			return;
> @@ -351,6 +351,49 @@ update_insn_state_arm64(struct type_state *state, struct data_loc_info *dloc,
>  		return;
>  	}
>  
> +	if (!strncmp(dl->ins.name, "mrs", 3)) {

It should be kernel specific, you may want to add a check for it like
__map__is_kernel(dloc->ms->map).

Thanks,
Namhyung


> +		regmatch_t match[2];
> +		char *ops = strdup(dl->ops.raw);
> +		Dwarf_Die func_die;
> +		Dwarf_Attribute attr;
> +		u64 ip = dloc->ms->sym->start + dl->al.offset;
> +		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
> +
> +		if (!ops)
> +			return;
> +
> +		if (regexec(&mrs_regex, dl->ops.raw, 2, match, 0))
> +			return;
> +
> +		ops[match[1].rm_eo] = '\0';
> +		sreg = get_arm64_regnum(ops + match[1].rm_so);
> +		if (sreg < 0 || !has_reg_type(state, sreg)) {
> +			free(ops);
> +			return;
> +		}
> +
> +		/*
> +		 * Find the inline function 'get_current()' Dwarf_Die and
> +		 * obtain its return value data type, which should be
> +		 * 'struct task_struct *'.
> +		 */
> +		if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
> +		    !dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
> +		    !dwarf_formref_die(&attr, &type_die)) {
> +			free(ops);
> +			return;
> +		}
> +
> +		tsr = &state->regs[sreg];
> +		tsr->type = type_die;
> +		tsr->kind = TSR_KIND_TYPE;
> +		tsr->ok = true;
> +
> +		pr_debug_dtp("mrs sp_el0 [%x] -> reg%d", insn_offset, sreg);
> +		free(ops);
> +		return;
> +	}
> +
>  	if (strncmp(dl->ins.name, "ld", 2))
>  		return;
>  
> -- 
> 2.25.1
> 


* Re: [PATCH 1/7] perf annotate: Handle arm64 load and store instructions
  2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
  2025-03-18  1:32   ` Namhyung Kim
@ 2025-03-18 17:15   ` Leo Yan
  1 sibling, 0 replies; 16+ messages in thread
From: Leo Yan @ 2025-03-18 17:15 UTC (permalink / raw)
  To: Li Huafei
  Cc: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers, mike.leach, peterz, mingo, alexander.shishkin,
	jolsa, kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

Hi Huafei,

On Sat, Mar 15, 2025 at 12:21:31AM +0800, Li Huafei wrote:
> Add ldst_ops to handle load and store instructions in order to parse
> the data types and offsets associated with PMU events for memory access
> instructions. There are many variants of load and store instructions in
> ARM64, making it difficult to match all of these instruction names
> completely. Therefore, only the instruction prefixes are matched. The
> prefix 'ld|st' covers most of the memory access instructions, 'cas|swp'
> matches atomic instructions, and 'prf' matches memory prefetch
> instructions.

Thanks a lot for working on this!

> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c | 67 ++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index d465d093e7eb..c212eb7341bd 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -6,7 +6,8 @@
>  
>  struct arm64_annotate {
>  	regex_t call_insn,
> -		jump_insn;
> +		jump_insn,
> +		ldst_insn; /* load and store instruction */
>
>  };
>  
>  static int arm64_mov__parse(struct arch *arch __maybe_unused,
> @@ -67,6 +68,57 @@ static struct ins_ops arm64_mov_ops = {
>  	.scnprintf = mov__scnprintf,
>  };
>  
> +static int arm64_ldst__parse(struct arch *arch __maybe_unused,
> +			     struct ins_operands *ops,
> +			     struct map_symbol *ms __maybe_unused,
> +			     struct disasm_line *dl __maybe_unused)
> +{
> +	char *s, *target;
> +
> +	/*
> +	 * The part starting from the memory access annotation '[' is parsed
> +	 * as 'target', while the part before it is parsed as 'source'.
> +	 */
> +	target = s = strchr(ops->raw, '[');
> +	if (!s)
> +		return -1;

I am wondering if this is sufficient for handling different load /
store instructions.

A simple case is the instruction "ldr x1, [x2]", for which the handling
above should work well.  How about the instructions below with offsets:

    ldr     x2, [x0, #968]
    ldr     w3, [x25], #4

Could you also confirm if the parsing can fit the pattern for
load/store pair instructions?

    ldp     x29, x30, [sp], #16

The instruction loads paired 64-bit data into two registers.
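
To make the question concrete, here is a standalone re-implementation
of the splitting rule in the hunk above (split at the last ',' before
the first '['), so the three examples can be checked by hand.  The
function names are hypothetical and the code works on a const input
instead of modifying ops->raw in place.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate the first @len bytes of @s as a NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

/*
 * Split "regs, [mem...]" at the last ',' before the first '['.
 * On success *source and *target are heap-allocated; the caller frees.
 */
static int split_ldst_operands(const char *raw, char **source, char **target)
{
	const char *bracket = strchr(raw, '[');
	const char *s;

	if (!bracket)
		return -1;

	/* Walk back from '[' to the separating comma. */
	s = bracket;
	while (s > raw && *s != ',')
		--s;
	if (s == raw)
		return -1;

	*source = dup_range(raw, (size_t)(s - raw));
	if (!*source)
		return -1;

	*target = dup_range(bracket, strlen(bracket));
	if (!*target) {
		free(*source);
		*source = NULL;
		return -1;
	}
	return 0;
}
```

Under this rule "x2, [x0, #968]" splits into "x2" and "[x0, #968]",
"w3, [x25], #4" into "w3" and "[x25], #4", and "x29, x30, [sp], #16"
into "x29, x30" and "[sp], #16".  The pair case therefore leaves both
registers in the source string; whether later stages cope with a
two-register source is not covered by this sketch.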

> +	while (s > ops->raw && *s != ',')
> +		--s;
> +
> +	if (s == ops->raw)
> +		return -1;
> +
> +	*s = '\0';
> +	ops->source.raw = strdup(ops->raw);
>
> +
> +	*s = ',';
> +	if (!ops->source.raw)
> +		return -1;
> +
> +	ops->target.raw = strdup(target);
> +	if (!ops->target.raw) {
> +		zfree(ops->source.raw);
> +		return -1;
> +	}
> +	ops->target.mem_ref = true;
> +
> +	return 0;
> +}
> +
> +static int ldst__scnprintf(struct ins *ins, char *bf, size_t size,
> +			   struct ins_operands *ops, int max_ins_name)
> +{
> +	return scnprintf(bf, size, "%-*s %s,%s", max_ins_name, ins->name,
> +			 ops->source.name ?: ops->source.raw,
> +			 ops->target.name ?: ops->target.raw);
> +}
> +
> +static struct ins_ops arm64_ldst_ops = {
> +	.parse	   = arm64_ldst__parse,
> +	.scnprintf = ldst__scnprintf,
> +};
> +
>  static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
>  {
>  	struct arm64_annotate *arm = arch->priv;
> @@ -77,6 +129,8 @@ static struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const
>  		ops = &jump_ops;
>  	else if (!regexec(&arm->call_insn, name, 2, match, 0))
>  		ops = &call_ops;
> +	else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
> +		ops = &arm64_ldst_ops;
>  	else if (!strcmp(name, "ret"))
>  		ops = &ret_ops;
>  	else
> @@ -107,6 +161,15 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  		      REG_EXTENDED);
>  	if (err)
>  		goto out_free_call;
> +	/*
> +	 * The ARM64 architecture has many variants of load/store instructions.
> +	 * It is quite challenging to match all of them completely. Here, we
> +	 * only match the prefixes of these instructions.
> +	 */
> +	err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
> +		      REG_EXTENDED);

As a first step, it is fine with me to support these memory instruction
types.

After I review the whole series, I might go back to check if we can
support other memory instructions (e.g. SVE, Memory Copy and
Memory Set instructions, etc).  At the very least, we should avoid
creating any barriers to extending support to these instructions.

Thanks,
Leo

> +	if (err)
> +		goto out_free_jump;
>  
>  	arch->initialized = true;
>  	arch->priv	  = arm;
> @@ -117,6 +180,8 @@ static int arm64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  	arch->e_flags = 0;
>  	return 0;
>  
> +out_free_jump:
> +	regfree(&arm->jump_insn);
>  out_free_call:
>  	regfree(&arm->call_insn);
>  out_free_arm:
> 
> -- 
> 2.25.1
> 


* Re: [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse()
  2025-03-14 16:21 ` [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse() Li Huafei
@ 2025-03-18 18:02   ` Leo Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2025-03-18 18:02 UTC (permalink / raw)
  To: Li Huafei
  Cc: namhyung, acme, leo.yan, james.clark, mark.rutland, john.g.garry,
	will, irogers, mike.leach, peterz, mingo, alexander.shishkin,
	jolsa, kjain, mhiramat, atrajeev, sesse, adrian.hunter, kan.liang,
	linux-kernel, linux-arm-kernel, linux-perf-users

On Sat, Mar 15, 2025 at 12:21:32AM +0800, Li Huafei wrote:
> Advance the mem_ref check on x86 to mov__parse(), along with the
> multi_reg check, to make annotate_get_insn_location() more concise.
> 
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> ---
>  tools/perf/util/annotate.c | 9 ++++-----
>  tools/perf/util/disasm.c   | 8 ++++++++
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 31bb326b07a6..860ea6c72411 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2442,18 +2442,17 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
>  				continue;
>  		}
>  
> +		op_loc->mem_ref = mem_ref;
> +		op_loc->multi_regs = multi_regs;
> +
>  		/*
>  		 * For powerpc, call get_powerpc_regs function which extracts the
>  		 * required fields for op_loc, ie reg1, reg2, offset from the
>  		 * raw instruction.
>  		 */
>  		if (arch__is(arch, "powerpc")) {
> -			op_loc->mem_ref = mem_ref;
> -			op_loc->multi_regs = multi_regs;
>  			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
> -		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
> -			op_loc->mem_ref = true;
> -			op_loc->multi_regs = multi_regs;
> +		} else if (mem_ref) {
>  			extract_reg_offset(arch, insn_str, op_loc);
>  		} else {
>  			char *s, *p = NULL;
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 50c5c206b70e..d91526cff9df 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -607,6 +607,12 @@ static bool check_multi_regs(struct arch *arch, const char *op)
>  	return count > 1;
>  }
>  
> +/* Check whether the operand accesses memory. */
> +static bool check_memory_ref(struct arch *arch, const char *op)
> +{
> +	return strchr(op, arch->objdump.memory_ref_char) != NULL;
> +}

This patch looks fine to me.

However, I could not find where the 'memory_ref_char' field is set for
Arm64.  Later patches even remove the mem_ref condition check and
invoke extract_reg_offset() unconditionally.

That looks like a logic change to me.  Shouldn't the register offset be
extracted only for memory access instructions?
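
To illustrate the concern, here is a minimal standalone sketch, not the
perf code itself.  'struct arch_sketch' and check_memory_ref_guarded()
are hypothetical names used only for this example; check_memory_ref()
repeats the strchr() test from the hunk above.  The point is that if an
architecture leaves 'memory_ref_char' at zero, strchr() matches the
string terminator, so every operand would be reported as a memory
reference.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the one objdump field that matters here. */
struct arch_sketch {
	char memory_ref_char;	/* '(' on x86; 0 if an arch never sets it */
};

/* Same test as check_memory_ref() in the patch, on the stand-in struct. */
static bool check_memory_ref(const struct arch_sketch *arch, const char *op)
{
	/*
	 * strchr(op, '\0') returns a pointer to the terminating NUL, so
	 * an unset memory_ref_char makes this true for every operand.
	 */
	return strchr(op, arch->memory_ref_char) != NULL;
}

/* Hypothetical guarded variant: an unset character never matches. */
static bool check_memory_ref_guarded(const struct arch_sketch *arch,
				     const char *op)
{
	return arch->memory_ref_char != '\0' &&
	       strchr(op, arch->memory_ref_char) != NULL;
}
```

With memory_ref_char set to '(' the operand "0x8(%rax)" is a memory
reference and "%rax" is not.  With it left at zero, the unguarded check
is true even for a plain register operand such as "x1".  Assuming Arm64
used '[' (the notation mentioned in the cover letter), "[x0, #8]" would
match and "x1" would not.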


>  static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
>  		struct disasm_line *dl __maybe_unused)
>  {
> @@ -635,6 +641,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_sy
>  	if (ops->source.raw == NULL)
>  		return -1;
>  
> +	ops->source.mem_ref = check_memory_ref(arch, ops->source.raw);
>  	ops->source.multi_regs = check_multi_regs(arch, ops->source.raw);
>  
>  	target = skip_spaces(++s);
> @@ -657,6 +664,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_sy
>  	if (ops->target.raw == NULL)
>  		goto out_free_source;
>  
> +	ops->target.mem_ref = check_memory_ref(arch, ops->target.raw);
>  	ops->target.multi_regs = check_multi_regs(arch, ops->target.raw);
>  
>  	if (comment == NULL)
> 
> -- 
> 2.25.1
> 


Thread overview: 16+ messages
2025-03-14 16:21 [PATCH 0/7] Add data type profiling support for arm64 Li Huafei
2025-03-14 16:21 ` [PATCH 1/7] perf annotate: Handle arm64 load and store instructions Li Huafei
2025-03-18  1:32   ` Namhyung Kim
2025-03-18 17:15   ` Leo Yan
2025-03-14 16:21 ` [PATCH 2/7] perf annotate: Advance the mem_ref check to mov__parse() Li Huafei
2025-03-18 18:02   ` Leo Yan
2025-03-14 16:21 ` [PATCH 3/7] perf annotate: Add 'extract_reg_offset' callback function to extract register number and access offset Li Huafei
2025-03-14 16:21 ` [PATCH 4/7] perf annotate: Support for the 'extract_reg_offset' callback function in arm64 Li Huafei
2025-03-18  1:45   ` Namhyung Kim
2025-03-14 16:21 ` [PATCH 5/7] perf annotate-data: Support instruction tracking for arm64 Li Huafei
2025-03-18  1:51   ` Namhyung Kim
2025-03-14 16:21 ` [PATCH 6/7] perf annotate-data: Handle arm64 global variable access Li Huafei
2025-03-18  2:01   ` Namhyung Kim
2025-03-14 16:21 ` [PATCH 7/7] perf annotate-data: Handle the access to the 'current' pointer on arm64 Li Huafei
2025-03-18  2:06   ` Namhyung Kim
2025-03-18  1:25 ` [PATCH 0/7] Add data type profiling support for arm64 Namhyung Kim
