linux-perf-users.vger.kernel.org archive mirror
* [V4 00/16] Add data type profiling support for powerpc
@ 2024-06-14 17:26 Athira Rajeev
  2024-06-14 17:26 ` [V4 01/16] tools/perf: Move the data structures related to register type to header file Athira Rajeev
                   ` (16 more replies)
  0 siblings, 17 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

The patchset from Namhyung added support for data type profiling
in the perf tool. This makes it possible to associate PMU samples with
the data types they refer to, using DWARF debug information. With
upstream perf, it is currently possible to run perf report or perf
annotate to view the data type information on x86.

The initial patchset posted here had the changes needed to enable data
type profiling support for powerpc.

https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/

The main changes were:
1. A powerpc instruction mnemonic table to associate load/store
instructions with move_ops, which is used to identify whether an
instruction is a memory access.
2. To get the register number and access offset from the given
instruction, the code uses fields from "struct arch" -> objdump.
Added an entry for powerpc here.
3. A get_arch_regnum to return the register number from the
register name string.
But the approach used in the initial patchset relied on parsing the
disassembled code, which is what the current perf tool implementation does.

Example: lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset. Also, to find whether there is a memory
reference in the operands, the "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov  (%rax), %rcx".
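
The text based parsing described above can be sketched roughly as below.
This is a simplified illustration, not the perf implementation: the
function name parse_mem_operand_sketch and its parameters are
hypothetical, and it only handles operands of the form "offset(base)".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look for memory_ref_char in the operand string
 * and, if found, extract the displacement and the base register name.
 * Returns false when the operands contain no "offset(base)" reference.
 */
static bool parse_mem_operand_sketch(const char *ops, char memory_ref_char,
				     long *offset, char *base, size_t base_len)
{
	const char *open = strchr(ops, memory_ref_char);
	const char *close, *start;
	size_t len;

	if (open == NULL)
		return false;
	close = strchr(open, ')');
	if (close == NULL)
		return false;

	/* The displacement sits between the previous comma (or start) and '(' */
	start = open;
	while (start > ops && start[-1] != ',')
		start--;
	*offset = strtol(start, NULL, 10);

	/* The base register name sits between '(' and ')' */
	len = (size_t)(close - open - 1);
	if (len == 0 || len >= base_len)
		return false;
	memcpy(base, open + 1, len);
	base[len] = '\0';
	return true;
}
```

For the operands "r10,0(r9)" this yields offset 0 and base "r9". An
operand string with no "(" at all, such as "r10,0,r19", is reported as
having no memory reference.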

In case of powerpc, instructions using "(" are not the only memory
instructions. For example, the above instruction can also be of extended
form (X form): "lwzx r10,0,r19". In order to easily identify the instruction
category and extract the source/target registers, the second patchset added
support to use the raw instruction. With the raw instruction, macros are
added to extract the opcode and register fields.
Link to second patchset:
https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/

Example representation using --show-raw-insn in objdump gives result:

38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
  _____________________________________
  | 58 |  RT  |  RA |      DS       | |
  -------------------------------------
0    6     11    16              30 31
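
As a worked example of the layout above, the sketch below decodes the
bytes "38 01 81 e8" into opcode, RT, RA and displacement. The macro and
function names are hypothetical and may not match the macros added in
the patchset; the sketch assumes the four bytes are shown in
little-endian order, as in the objdump output above.

```c
#include <stdint.h>

/* Hypothetical field extractors for a DS-form instruction (IBM bit 0 = MSB) */
#define PPC_SKETCH_OP(insn)	(((insn) >> 26) & 0x3f)	/* bits 0-5   */
#define PPC_SKETCH_RT(insn)	(((insn) >> 21) & 0x1f)	/* bits 6-10  */
#define PPC_SKETCH_RA(insn)	(((insn) >> 16) & 0x1f)	/* bits 11-15 */
/* bits 16-29 hold DS; the byte displacement is DS << 2, sign extended */
#define PPC_SKETCH_DISP(insn)	((int32_t)(int16_t)(uint16_t)((insn) & 0xfffc))

/* Assemble a 32-bit instruction word from four little-endian bytes */
static uint32_t ppc_insn_from_le_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For "38 01 81 e8" the instruction word is 0xe8810138, which gives
opcode 58, RT 4, RA 1 and a displacement of 312, matching
"ld r4,312(r1)".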

The second patchset used "objdump" again to read the raw instruction.
But since there is no need to disassemble and the binary code can be read
directly from the DSO, the third patchset (i.e. this patchset) uses the
approach below. The approach preferred in powerpc to parse a sample for
data type profiling as of the V3 patchset is:
- Read directly from DSO using dso__data_read_offset
- If that fails for any case, fallback to using libcapstone
- If libcapstone is not supported, approach will use objdump
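
The fallback order above can be pictured with the sketch below. It is
only an illustration: the backend functions are stubs, the context
structure and all names are hypothetical, and the real code paths in
perf are more involved.

```c
#include <stddef.h>

/* Hypothetical context: flags say which backends would succeed in this run */
struct disasm_ctx_sketch {
	int raw_read_ok;	/* reading the bytes from the DSO works */
	int capstone_ok;	/* libcapstone is built in and succeeds */
	const char *used;	/* name of the backend that handled the request */
};

typedef int (*disasm_backend_fn)(struct disasm_ctx_sketch *ctx);

static int backend_raw_sketch(struct disasm_ctx_sketch *ctx)
{
	if (!ctx->raw_read_ok)
		return -1;
	ctx->used = "raw";
	return 0;
}

static int backend_capstone_sketch(struct disasm_ctx_sketch *ctx)
{
	if (!ctx->capstone_ok)
		return -1;
	ctx->used = "capstone";
	return 0;
}

static int backend_objdump_sketch(struct disasm_ctx_sketch *ctx)
{
	/* Last resort in this sketch; assumed to always be available */
	ctx->used = "objdump";
	return 0;
}

/* Try each backend in priority order and stop at the first success */
static int disassemble_with_fallback_sketch(struct disasm_ctx_sketch *ctx)
{
	static const disasm_backend_fn backends[] = {
		backend_raw_sketch,
		backend_capstone_sketch,
		backend_objdump_sketch,
	};

	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
		if (backends[i](ctx) == 0)
			return 0;
	}
	return -1;
}
```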

Patchset adds support to pick the opcode and reg fields from this
raw/binary instruction code. This approach came in from review comment
by Segher Boessenkool and Christophe for the initial patchset.

Apart from that, instruction tracking is enabled for powerpc and a
support function is added to find variables defined as registers.
For example, in powerpc, the two registers below are
defined to represent variables:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");

2. r1: represents stack_pointer
register void *__stack_pointer asm("r1");

These are handled in this patchset.

- Patch 1 rearranges the register state type structures into a header file
so that they can be referred to from other arch specific files
- Patch 2 makes instruction tracking a callback in "struct arch"
so that it can be implemented by other archs easily and defined in arch
specific files
- Patch 3 adds support to capture and parse raw instruction in powerpc
using dso__data_read_offset utility
- Patch 4 adds logic to support using objdump when doing default "perf
report" or "perf annotate" since that needs the disassembled instruction.
- Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
- Patch 6 updates parameters for reg extract functions to use raw
instruction on powerpc
- Patch 7 adds support to identify memory instructions of opcode 31 in
powerpc
- Patch 8 adds more instructions to support instruction tracking in powerpc
- Patch 9 and 10 handle instruction tracking for powerpc.
- Patch 11, 12 and 13 add support to use libcapstone in powerpc
- Patch 14 and patch 15 handle support to find global register variables
- Patch 16 handles insn-stat option for perf annotate
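
To illustrate what identifying memory instructions of opcode 31 (patch 7)
involves: X-form instructions share primary opcode 31 and are told apart
by an extended opcode in bits 21-30. The sketch below is an assumption
about how such a check could look, not the code from the patch. The
OP_31_XOP_* names follow the insn-stat output, the numeric values are
taken from the Power ISA, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC_SKETCH_PRIMARY_OP(insn)	(((insn) >> 26) & 0x3f)	/* bits 0-5   */
#define PPC_SKETCH_31_XOP(insn)		(((insn) >> 1) & 0x3ff)	/* bits 21-30 */

/* Extended opcodes of a few X-form loads (Power ISA values) */
#define OP_31_XOP_LDX	21
#define OP_31_XOP_LWZX	23
#define OP_31_XOP_LBZX	87
#define OP_31_XOP_LHZX	279

/* Hypothetical check: is this an opcode 31 load from the short list above? */
static bool ppc_op31_is_load_sketch(uint32_t insn)
{
	if (PPC_SKETCH_PRIMARY_OP(insn) != 31)
		return false;

	switch (PPC_SKETCH_31_XOP(insn)) {
	case OP_31_XOP_LDX:
	case OP_31_XOP_LWZX:
	case OP_31_XOP_LBZX:
	case OP_31_XOP_LHZX:
		return true;
	default:
		/* e.g. "add" (extended opcode 266) is arithmetic, not a load */
		return false;
	}
}
```

For example, "lwzx r10,0,r19" encodes as 0x7d40982e and is classified as
a load, while "add r3,r4,r5" (0x7c642a14) also has primary opcode 31 but
is not.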

Note:
- There are remaining unknowns (25%) as seen in the annotate Instruction
stats below.
- This patchset is not tested on powerpc32. As the next step of
enhancements, along with handling the remaining unknowns, the plan is to
cover powerpc32 changes based on how testing goes.

With the current patchset:

 ./perf record -a -e mem-loads sleep 1
 ./perf report -s type,typeoff --hierarchy --group --stdio
 ./perf annotate --data-type --insn-stat

perf annotate logs:
==================

Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)

  Name/opcode:  Good   Bad
  -----------------------------------------------------------
  58                  :   323    80
  32                  :    49    43
  34                  :    33    11
  OP_31_XOP_LDX       :     8    20
  40                  :    23     0
  OP_31_XOP_LWARX     :     5     1
  OP_31_XOP_LWZX      :     2     3
  OP_31_XOP_LDARX     :     3     0
  33                  :     0     2
  OP_31_XOP_LBZX      :     0     1
  OP_31_XOP_LWAX      :     0     1
  OP_31_XOP_LHZX      :     0     1

perf report logs:
=================

  Total Lost Samples: 0

  Samples: 1K of event 'mem-loads'
  Event count (approx.): 937238

  Overhead  Data Type  Data Type Offset
 ........  .........  ................

    48.60%  (unknown)  (unknown) +0 (no field)
    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
     2.69%  struct paca_struct  struct paca_struct +2808 (canary)
     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
     1.41%  struct vm_fault  struct vm_fault +0 (vma)
     1.29%  struct task_struct  struct task_struct +276 (flags)
     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
     0.76%  struct rq  struct rq +2856 (cpu)

Thanks
Athira Rajeev

Changelog:
From v3->v4:
- Addressed review comments from Ian by using capstone_init from
  "util/print_insn.c" instead of "open_capstone_handle".
- Addressed review comment from Namhyung by moving "opcode"
  field from "struct ins" to "struct disasm_line"

From v2->v3:
- Addressed review comments from Christophe and Namhyung for V2
- Changed the approach in powerpc to parse a sample for data
  type profiling as:
  Read directly from DSO using dso__data_read_offset
  If that fails for any case, fallback to using libcapstone
  If libcapstone is not supported, approach will use objdump
- Include instructions with opcode as 31 and correctly categorize
  them as memory or arithmetic instructions.
- Include more instructions for instruction tracking in powerpc

From v1->v2:
- Addressed suggestion from Christophe Leroy and Segher Boessenkool
  to use the binary code (raw insn) to fetch opcode, register and
  offset fields.
- Added support for instruction tracking in powerpc
- Find the register defined variables (r13 and r1, which point to
  local_paca and current_stack_pointer in powerpc)

Athira Rajeev (16):
  tools/perf: Move the data structures related to register type to
    header file
  tools/perf: Add "update_insn_state" callback function to handle arch
    specific instruction tracking
  tools/perf: Add support to capture and parse raw instruction in
    powerpc using dso__data_read_offset utility
  tools/perf: Use sort keys to determine whether to pick objdump to
    disassemble
  tools/perf: Add disasm_line__parse to parse raw instruction for
    powerpc
  tools/perf: Update parameters for reg extract functions to use raw
    instruction on powerpc
  tools/perf: Add support to identify memory instructions of opcode 31
    in powerpc
  tools/perf: Add some of the arithmetic instructions to support
    instruction tracking in powerpc
  tools/perf: Add more instructions for instruction tracking
  tools/perf: Update instruction tracking for powerpc
  tools/perf: Make capstone_init non-static so that it can be used
    during symbol disassemble
  tools/perf: Use capstone_init and remove open_capstone_handle from
    disasm.c
  tools/perf: Add support to use libcapstone in powerpc
  tools/perf: Add support to find global register variables using
    find_data_type_global_reg
  tools/perf: Add support for global_die to capture name of variable in
    case of register defined variable
  tools/perf: Set instruction name to be used with insn-stat when using
    raw instruction

 tools/include/linux/string.h                  |   2 +
 tools/lib/string.c                            |  13 +
 tools/perf/arch/arm64/annotate/instructions.c |   3 +-
 .../arch/loongarch/annotate/instructions.c    |   6 +-
 .../perf/arch/powerpc/annotate/instructions.c | 260 +++++++++
 tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
 tools/perf/arch/s390/annotate/instructions.c  |   5 +-
 tools/perf/arch/x86/annotate/instructions.c   | 383 +++++++++++++
 tools/perf/builtin-annotate.c                 |   4 +-
 tools/perf/util/annotate-data.c               | 519 +++---------------
 tools/perf/util/annotate-data.h               |  78 +++
 tools/perf/util/annotate.c                    |  35 +-
 tools/perf/util/annotate.h                    |   6 +-
 tools/perf/util/disasm.c                      | 475 ++++++++++++++--
 tools/perf/util/disasm.h                      |  13 +-
 tools/perf/util/dwarf-aux.c                   |   1 +
 tools/perf/util/dwarf-aux.h                   |   1 +
 tools/perf/util/include/dwarf-regs.h          |   4 +
 tools/perf/util/print_insn.c                  |  15 +-
 tools/perf/util/print_insn.h                  |   5 +
 tools/perf/util/sort.c                        |   7 +-
 21 files changed, 1386 insertions(+), 502 deletions(-)

-- 
2.43.0



* [V4 01/16] tools/perf: Move the data structures related to register type to header file
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  5:15   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 02/16] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures. This is useful to find the data type in cases when the
register state gets transferred from one reg to another, for example
by the "mov" instruction in x86 and the "mr" instruction in powerpc.
Currently these structures are defined in annotate-data.c and instruction
tracking is implemented only for x86. Move these data structures to the
"annotate-data.h" header file so that other arch implementations can use
them in arch specific files as well.
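
The register-to-register case can be pictured with the simplified sketch
below. It is not the perf code: the structures are reduced stand-ins for
"struct type_state_reg" and "struct type_state" (a type name string
replaces the Dwarf_Die), and the function names are hypothetical. The
range check mirrors has_reg_type() and the register count of 32 matches
TYPE_STATE_MAX_REGS in the moved header.

```c
#include <stdbool.h>

#define SKETCH_MAX_REGS	32

/* Reduced stand-in for struct type_state_reg */
struct reg_type_sketch {
	const char *type_name;	/* replaces the Dwarf_Die of the real struct */
	bool ok;		/* type information is valid */
};

/* Reduced stand-in for struct type_state */
struct reg_state_sketch {
	struct reg_type_sketch regs[SKETCH_MAX_REGS];
};

/* Same idea as has_reg_type(): negative numbers wrap and fail the check */
static bool reg_in_range_sketch(int reg)
{
	return (unsigned)reg < SKETCH_MAX_REGS;
}

/* Model of "mr dst,src": the destination inherits the source's type */
static void track_reg_move_sketch(struct reg_state_sketch *state,
				  int dst, int src)
{
	if (!reg_in_range_sketch(dst))
		return;

	if (!reg_in_range_sketch(src) || !state->regs[src].ok) {
		/* Unknown source: the old type of dst is no longer valid */
		state->regs[dst].ok = false;
		return;
	}

	state->regs[dst] = state->regs[src];
}
```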

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c | 53 +------------------------------
 tools/perf/util/annotate-data.h | 55 +++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 52 deletions(-)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 965da6c0b542..a4c7f98a75e3 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -31,15 +31,6 @@
 
 static void delete_var_types(struct die_var_type *var_types);
 
-enum type_state_kind {
-	TSR_KIND_INVALID = 0,
-	TSR_KIND_TYPE,
-	TSR_KIND_PERCPU_BASE,
-	TSR_KIND_CONST,
-	TSR_KIND_POINTER,
-	TSR_KIND_CANARY,
-};
-
 #define pr_debug_dtp(fmt, ...)					\
 do {								\
 	if (debug_type_profile)					\
@@ -140,49 +131,7 @@ static void pr_debug_location(Dwarf_Die *die, u64 pc, int reg)
 	}
 }
 
-/*
- * Type information in a register, valid when @ok is true.
- * The @caller_saved registers are invalidated after a function call.
- */
-struct type_state_reg {
-	Dwarf_Die type;
-	u32 imm_value;
-	bool ok;
-	bool caller_saved;
-	u8 kind;
-};
-
-/* Type information in a stack location, dynamically allocated */
-struct type_state_stack {
-	struct list_head list;
-	Dwarf_Die type;
-	int offset;
-	int size;
-	bool compound;
-	u8 kind;
-};
-
-/* FIXME: This should be arch-dependent */
-#define TYPE_STATE_MAX_REGS  16
-
-/*
- * State table to maintain type info in each register and stack location.
- * It'll be updated when new variable is allocated or type info is moved
- * to a new location (register or stack).  As it'd be used with the
- * shortest path of basic blocks, it only maintains a single table.
- */
-struct type_state {
-	/* state of general purpose registers */
-	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
-	/* state of stack location */
-	struct list_head stack_vars;
-	/* return value register */
-	int ret_reg;
-	/* stack pointer register */
-	int stack_reg;
-};
-
-static bool has_reg_type(struct type_state *state, int reg)
+bool has_reg_type(struct type_state *state, int reg)
 {
 	return (unsigned)reg < ARRAY_SIZE(state->regs);
 }
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 0a57d9f5ee78..ef235b1b15e1 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -6,6 +6,9 @@
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include "dwarf-aux.h"
+#include "annotate.h"
+#include "debuginfo.h"
 
 struct annotated_op_loc;
 struct debuginfo;
@@ -15,6 +18,15 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+enum type_state_kind {
+	TSR_KIND_INVALID = 0,
+	TSR_KIND_TYPE,
+	TSR_KIND_PERCPU_BASE,
+	TSR_KIND_CONST,
+	TSR_KIND_POINTER,
+	TSR_KIND_CANARY,
+};
+
 /**
  * struct annotated_member - Type of member field
  * @node: List entry in the parent list
@@ -142,6 +154,48 @@ struct annotated_data_stat {
 };
 extern struct annotated_data_stat ann_data_stat;
 
+/*
+ * Type information in a register, valid when @ok is true.
+ * The @caller_saved registers are invalidated after a function call.
+ */
+struct type_state_reg {
+	Dwarf_Die type;
+	u32 imm_value;
+	bool ok;
+	bool caller_saved;
+	u8 kind;
+};
+
+/* Type information in a stack location, dynamically allocated */
+struct type_state_stack {
+	struct list_head list;
+	Dwarf_Die type;
+	int offset;
+	int size;
+	bool compound;
+	u8 kind;
+};
+
+/* FIXME: This should be arch-dependent */
+#define TYPE_STATE_MAX_REGS  32
+
+/*
+ * State table to maintain type info in each register and stack location.
+ * It'll be updated when new variable is allocated or type info is moved
+ * to a new location (register or stack).  As it'd be used with the
+ * shortest path of basic blocks, it only maintains a single table.
+ */
+struct type_state {
+	/* state of general purpose registers */
+	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
+	/* state of stack location */
+	struct list_head stack_vars;
+	/* return value register */
+	int ret_reg;
+	/* stack pointer register */
+	int stack_reg;
+};
+
 #ifdef HAVE_DWARF_SUPPORT
 
 /* Returns data type at the location (ip, reg, offset) */
@@ -160,6 +214,7 @@ void global_var_type__tree_delete(struct rb_root *root);
 
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
+bool has_reg_type(struct type_state *state, int reg);
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
-- 
2.43.0



* [V4 02/16] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
  2024-06-14 17:26 ` [V4 01/16] tools/perf: Move the data structures related to register type to header file Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Add "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating instruction state is handled by the static
function "update_insn_state_x86", which is defined in "annotate-data.c".
Make this a callback for the specific arch and move it to the arch specific
file "arch/x86/annotate/instructions.c". This will help to add helper
functions for other platforms in the file
"arch/<platform>/annotate/instructions.c" and make changes/updates
easier.

Define callback "update_insn_state" as part of "struct arch", and also make
some of the debug functions non-static so that they can be referenced from
other places.
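
The callback arrangement can be sketched as below. This is a reduced
illustration with hypothetical names and types; the real callback takes
"struct type_state *", "struct data_loc_info *", "Dwarf_Die *" and
"struct disasm_line *" as seen in update_insn_state_x86.

```c
#include <stddef.h>

/* Reduced stand-ins for the real perf structures */
struct insn_sketch {
	const char *name;
};

struct state_sketch {
	int nr_updates;
};

struct arch_sketch {
	const char *name;
	/* Optional per-arch hook; NULL when tracking is not implemented */
	void (*update_insn_state)(struct state_sketch *state,
				  const struct insn_sketch *insn);
};

/* Example arch-specific implementation: just counts the tracked insns */
static void update_insn_state_demo(struct state_sketch *state,
				   const struct insn_sketch *insn)
{
	(void)insn;
	state->nr_updates++;
}

/* Generic code only calls through the hook and needs no arch knowledge */
static void track_insn_sketch(const struct arch_sketch *arch,
			      struct state_sketch *state,
			      const struct insn_sketch *insn)
{
	if (arch->update_insn_state)
		arch->update_insn_state(state, insn);
}
```

An arch that sets the hook gets its state updated for every instruction,
while an arch that leaves it NULL is skipped without any special casing
in the generic code.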

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/x86/annotate/instructions.c | 383 +++++++++++++++++++
 tools/perf/util/annotate-data.c             | 391 +-------------------
 tools/perf/util/annotate-data.h             |  23 ++
 tools/perf/util/disasm.c                    |   2 +
 tools/perf/util/disasm.h                    |   7 +
 5 files changed, 423 insertions(+), 383 deletions(-)

diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
index 5cdf457f5cbe..715d8ce65f7f 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -206,3 +206,386 @@ static int x86__annotate_init(struct arch *arch, char *cpuid)
 	arch->initialized = true;
 	return err;
 }
+
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_x86(struct type_state *state,
+				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				  struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int fbreg = dloc->fbreg;
+	int fboff = 0;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	if (ins__is_call(&dl->ins)) {
+		struct symbol *func = dl->ops.target.sym;
+
+		if (func == NULL)
+			return;
+
+		/* __fentry__ will preserve all registers */
+		if (!strcmp(func->name, "__fentry__"))
+			return;
+
+		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
+
+		/* Otherwise invalidate caller-saved registers after call */
+		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
+			if (state->regs[i].caller_saved)
+				state->regs[i].ok = false;
+		}
+
+		/* Update register with the return type (if any) */
+		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
+			tsr = &state->regs[state->ret_reg];
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("call [%x] return -> reg%d",
+				     insn_offset, state->ret_reg);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		return;
+	}
+
+	if (!strncmp(dl->ins.name, "add", 3)) {
+		u64 imm_value = -1ULL;
+		int offset;
+		const char *var_name = NULL;
+		struct map_symbol *ms = dloc->ms;
+		u64 ip = ms->sym->start + dl->al.offset;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+		if (src->imm)
+			imm_value = src->offset;
+		else if (has_reg_type(state, src->reg1) &&
+			 state->regs[src->reg1].kind == TSR_KIND_CONST)
+			imm_value = state->regs[src->reg1].imm_value;
+		else if (src->reg1 == DWARF_REG_PC) {
+			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
+							   src->offset, dl);
+
+			if (get_global_var_info(dloc, var_addr,
+						&var_name, &offset) &&
+			    !strcmp(var_name, "this_cpu_off") &&
+			    tsr->kind == TSR_KIND_CONST) {
+				tsr->kind = TSR_KIND_PERCPU_BASE;
+				imm_value = tsr->imm_value;
+			}
+		}
+		else
+			return;
+
+		if (tsr->kind != TSR_KIND_PERCPU_BASE)
+			return;
+
+		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
+					&type_die) && offset == 0) {
+			/*
+			 * This is not a pointer type, but it should be treated
+			 * as a pointer.
+			 */
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_POINTER;
+			tsr->ok = true;
+
+			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
+				     insn_offset, imm_value, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		return;
+	}
+
+	if (strncmp(dl->ins.name, "mov", 3))
+		return;
+
+	if (dloc->fb_cfa) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+			fbreg = -1;
+	}
+
+	/* Case 1. register to register or segment:offset to register transfers */
+	if (!src->mem_ref && !dst->mem_ref) {
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+		if (dso__kernel(map__dso(dloc->ms->map)) &&
+		    src->segment == INSN_SEG_X86_GS && src->imm) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr;
+			int offset;
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			var_addr = src->offset;
+
+			if (var_addr == 40) {
+				tsr->kind = TSR_KIND_CANARY;
+				tsr->ok = true;
+
+				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
+					     insn_offset, dst->reg1);
+				return;
+			}
+
+			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
+						 &offset, &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
+				     insn_offset, var_addr, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+			return;
+		}
+
+		if (src->imm) {
+			tsr->kind = TSR_KIND_CONST;
+			tsr->imm_value = src->offset;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+				     insn_offset, tsr->imm_value, dst->reg1);
+			return;
+		}
+
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		tsr->type = state->regs[src->reg1].type;
+		tsr->kind = state->regs[src->reg1].kind;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			     insn_offset, src->reg1, dst->reg1);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+	/* Case 2. memory to register transers */
+	if (src->mem_ref && !dst->mem_ref) {
+		int sreg = src->reg1;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+retry:
+		/* Check stack variables with offset */
+		if (sreg == fbreg) {
+			struct type_state_stack *stack;
+			int offset = src->offset - fboff;
+
+			stack = find_stack_state(state, offset);
+			if (stack == NULL) {
+				tsr->ok = false;
+				return;
+			} else if (!stack->compound) {
+				tsr->type = stack->type;
+				tsr->kind = stack->kind;
+				tsr->ok = true;
+			} else if (die_get_member_type(&stack->type,
+						       offset - stack->offset,
+						       &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+			} else {
+				tsr->ok = false;
+				return;
+			}
+
+			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
+				     insn_offset, -offset, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* And then dereference the pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_TYPE &&
+			 die_deref_ptr_type(&state->regs[sreg].type,
+					    src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or check if it's a global variable */
+		else if (sreg == DWARF_REG_PC) {
+			struct map_symbol *ms = dloc->ms;
+			u64 ip = ms->sym->start + dl->al.offset;
+			u64 addr;
+			int offset;
+
+			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
+
+			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
+						 &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
+				     insn_offset, addr, dst->reg1);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		/* And check percpu access with base register */
+		else if (has_reg_type(state, sreg) &&
+			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr = src->offset;
+			int offset;
+
+			if (src->multi_regs) {
+				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
+
+				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
+				    state->regs[reg2].kind == TSR_KIND_CONST)
+					var_addr += state->regs[reg2].imm_value;
+			}
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			if (get_global_var_type(cu_die, dloc, ip, var_addr,
+						&offset, &type_die) &&
+			    die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+
+				if (src->multi_regs) {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
+						     insn_offset, src->offset, src->reg1,
+						     src->reg2, dst->reg1);
+				} else {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
+						     insn_offset, src->offset, sreg, dst->reg1);
+				}
+				pr_debug_type_name(&tsr->type, tsr->kind);
+			} else {
+				tsr->ok = false;
+			}
+		}
+		/* And then dereference the calculated pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_POINTER &&
+			 die_get_member_type(&state->regs[sreg].type,
+					     src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or try another register if any */
+		else if (src->multi_regs && sreg == src->reg1 &&
+			 src->reg1 != src->reg2) {
+			sreg = src->reg2;
+			goto retry;
+		}
+		else {
+			int offset;
+			const char *var_name = NULL;
+
+			/* it might be per-cpu variable (in kernel) access */
+			if (src->offset < 0) {
+				if (get_global_var_info(dloc, (s64)src->offset,
+							&var_name, &offset) &&
+				    !strcmp(var_name, "__per_cpu_offset")) {
+					tsr->kind = TSR_KIND_PERCPU_BASE;
+
+					pr_debug_dtp("mov [%x] percpu base reg%d\n",
+						     insn_offset, dst->reg1);
+				}
+			}
+
+			tsr->ok = false;
+		}
+	}
+	/* Case 3. register to memory transfers */
+	if (!src->mem_ref && dst->mem_ref) {
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok)
+			return;
+
+		/* Check stack variables with offset */
+		if (dst->reg1 == fbreg) {
+			struct type_state_stack *stack;
+			int offset = dst->offset - fboff;
+
+			tsr = &state->regs[src->reg1];
+
+			stack = find_stack_state(state, offset);
+			if (stack) {
+				/*
+				 * The source register is likely to hold a type
+				 * of member if it's a compound type.  Do not
+				 * update the stack variable type since we can
+				 * get the member type later by using the
+				 * die_get_member_type().
+				 */
+				if (!stack->compound)
+					set_stack_state(stack, offset, tsr->kind,
+							&tsr->type);
+			} else {
+				findnew_stack_state(state, offset, tsr->kind,
+						    &tsr->type);
+			}
+
+			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
+				     insn_offset, src->reg1, -offset);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/*
+		 * Ignore other transfers since it'd set a value in a struct
+		 * and won't change the type.
+		 */
+	}
+	/* Case 4. memory to memory transfers (not handled for now) */
+}
+#else /* HAVE_DWARF_SUPPORT */
+static void update_insn_state_x86(struct type_state *state __maybe_unused, struct data_loc_info *dloc __maybe_unused,
+		Dwarf_Die * cu_die __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	return;
+}
+#endif
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index a4c7f98a75e3..7a48c3d72b89 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -39,7 +39,7 @@ do {								\
 		pr_debug3(fmt, ##__VA_ARGS__);			\
 } while (0)
 
-static void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
 {
 	struct strbuf sb;
 	char *str;
@@ -390,7 +390,7 @@ static int check_variable(struct data_loc_info *dloc, Dwarf_Die *var_die,
 	return 0;
 }
 
-static struct type_state_stack *find_stack_state(struct type_state *state,
+struct type_state_stack *find_stack_state(struct type_state *state,
 						 int offset)
 {
 	struct type_state_stack *stack;
@@ -406,7 +406,7 @@ static struct type_state_stack *find_stack_state(struct type_state *state,
 	return NULL;
 }
 
-static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 			    Dwarf_Die *type_die)
 {
 	int tag;
@@ -433,7 +433,7 @@ static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 	}
 }
 
-static struct type_state_stack *findnew_stack_state(struct type_state *state,
+struct type_state_stack *findnew_stack_state(struct type_state *state,
 						    int offset, u8 kind,
 						    Dwarf_Die *type_die)
 {
@@ -537,7 +537,7 @@ void global_var_type__tree_delete(struct rb_root *root)
 	}
 }
 
-static bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
 				const char **var_name, int *var_offset)
 {
 	struct addr_location al;
@@ -611,7 +611,7 @@ static void global_var__collect(struct data_loc_info *dloc)
 	}
 }
 
-static bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
 				u64 ip, u64 var_addr, int *var_offset,
 				Dwarf_Die *type_die)
 {
@@ -722,381 +722,6 @@ static void update_var_state(struct type_state *state, struct data_loc_info *dlo
 	}
 }
 
-static void update_insn_state_x86(struct type_state *state,
-				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
-				  struct disasm_line *dl)
-{
-	struct annotated_insn_loc loc;
-	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
-	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
-	struct type_state_reg *tsr;
-	Dwarf_Die type_die;
-	u32 insn_offset = dl->al.offset;
-	int fbreg = dloc->fbreg;
-	int fboff = 0;
-
-	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
-		return;
-
-	if (ins__is_call(&dl->ins)) {
-		struct symbol *func = dl->ops.target.sym;
-
-		if (func == NULL)
-			return;
-
-		/* __fentry__ will preserve all registers */
-		if (!strcmp(func->name, "__fentry__"))
-			return;
-
-		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
-
-		/* Otherwise invalidate caller-saved registers after call */
-		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
-			if (state->regs[i].caller_saved)
-				state->regs[i].ok = false;
-		}
-
-		/* Update register with the return type (if any) */
-		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
-			tsr = &state->regs[state->ret_reg];
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("call [%x] return -> reg%d",
-				     insn_offset, state->ret_reg);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		return;
-	}
-
-	if (!strncmp(dl->ins.name, "add", 3)) {
-		u64 imm_value = -1ULL;
-		int offset;
-		const char *var_name = NULL;
-		struct map_symbol *ms = dloc->ms;
-		u64 ip = ms->sym->start + dl->al.offset;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-		if (src->imm)
-			imm_value = src->offset;
-		else if (has_reg_type(state, src->reg1) &&
-			 state->regs[src->reg1].kind == TSR_KIND_CONST)
-			imm_value = state->regs[src->reg1].imm_value;
-		else if (src->reg1 == DWARF_REG_PC) {
-			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
-							   src->offset, dl);
-
-			if (get_global_var_info(dloc, var_addr,
-						&var_name, &offset) &&
-			    !strcmp(var_name, "this_cpu_off") &&
-			    tsr->kind == TSR_KIND_CONST) {
-				tsr->kind = TSR_KIND_PERCPU_BASE;
-				imm_value = tsr->imm_value;
-			}
-		}
-		else
-			return;
-
-		if (tsr->kind != TSR_KIND_PERCPU_BASE)
-			return;
-
-		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
-					&type_die) && offset == 0) {
-			/*
-			 * This is not a pointer type, but it should be treated
-			 * as a pointer.
-			 */
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_POINTER;
-			tsr->ok = true;
-
-			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
-				     insn_offset, imm_value, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		return;
-	}
-
-	if (strncmp(dl->ins.name, "mov", 3))
-		return;
-
-	if (dloc->fb_cfa) {
-		u64 ip = dloc->ms->sym->start + dl->al.offset;
-		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
-
-		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
-			fbreg = -1;
-	}
-
-	/* Case 1. register to register or segment:offset to register transfers */
-	if (!src->mem_ref && !dst->mem_ref) {
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-		if (dso__kernel(map__dso(dloc->ms->map)) &&
-		    src->segment == INSN_SEG_X86_GS && src->imm) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr;
-			int offset;
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			var_addr = src->offset;
-
-			if (var_addr == 40) {
-				tsr->kind = TSR_KIND_CANARY;
-				tsr->ok = true;
-
-				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
-					     insn_offset, dst->reg1);
-				return;
-			}
-
-			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
-						 &offset, &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
-				     insn_offset, var_addr, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-			return;
-		}
-
-		if (src->imm) {
-			tsr->kind = TSR_KIND_CONST;
-			tsr->imm_value = src->offset;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
-				     insn_offset, tsr->imm_value, dst->reg1);
-			return;
-		}
-
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok) {
-			tsr->ok = false;
-			return;
-		}
-
-		tsr->type = state->regs[src->reg1].type;
-		tsr->kind = state->regs[src->reg1].kind;
-		tsr->ok = true;
-
-		pr_debug_dtp("mov [%x] reg%d -> reg%d",
-			     insn_offset, src->reg1, dst->reg1);
-		pr_debug_type_name(&tsr->type, tsr->kind);
-	}
-	/* Case 2. memory to register transers */
-	if (src->mem_ref && !dst->mem_ref) {
-		int sreg = src->reg1;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-retry:
-		/* Check stack variables with offset */
-		if (sreg == fbreg) {
-			struct type_state_stack *stack;
-			int offset = src->offset - fboff;
-
-			stack = find_stack_state(state, offset);
-			if (stack == NULL) {
-				tsr->ok = false;
-				return;
-			} else if (!stack->compound) {
-				tsr->type = stack->type;
-				tsr->kind = stack->kind;
-				tsr->ok = true;
-			} else if (die_get_member_type(&stack->type,
-						       offset - stack->offset,
-						       &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-			} else {
-				tsr->ok = false;
-				return;
-			}
-
-			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
-				     insn_offset, -offset, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* And then dereference the pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_TYPE &&
-			 die_deref_ptr_type(&state->regs[sreg].type,
-					    src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or check if it's a global variable */
-		else if (sreg == DWARF_REG_PC) {
-			struct map_symbol *ms = dloc->ms;
-			u64 ip = ms->sym->start + dl->al.offset;
-			u64 addr;
-			int offset;
-
-			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
-
-			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
-						 &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
-				     insn_offset, addr, dst->reg1);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		/* And check percpu access with base register */
-		else if (has_reg_type(state, sreg) &&
-			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr = src->offset;
-			int offset;
-
-			if (src->multi_regs) {
-				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
-
-				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
-				    state->regs[reg2].kind == TSR_KIND_CONST)
-					var_addr += state->regs[reg2].imm_value;
-			}
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			if (get_global_var_type(cu_die, dloc, ip, var_addr,
-						&offset, &type_die) &&
-			    die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-
-				if (src->multi_regs) {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
-						     insn_offset, src->offset, src->reg1,
-						     src->reg2, dst->reg1);
-				} else {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
-						     insn_offset, src->offset, sreg, dst->reg1);
-				}
-				pr_debug_type_name(&tsr->type, tsr->kind);
-			} else {
-				tsr->ok = false;
-			}
-		}
-		/* And then dereference the calculated pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_POINTER &&
-			 die_get_member_type(&state->regs[sreg].type,
-					     src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or try another register if any */
-		else if (src->multi_regs && sreg == src->reg1 &&
-			 src->reg1 != src->reg2) {
-			sreg = src->reg2;
-			goto retry;
-		}
-		else {
-			int offset;
-			const char *var_name = NULL;
-
-			/* it might be per-cpu variable (in kernel) access */
-			if (src->offset < 0) {
-				if (get_global_var_info(dloc, (s64)src->offset,
-							&var_name, &offset) &&
-				    !strcmp(var_name, "__per_cpu_offset")) {
-					tsr->kind = TSR_KIND_PERCPU_BASE;
-
-					pr_debug_dtp("mov [%x] percpu base reg%d\n",
-						     insn_offset, dst->reg1);
-				}
-			}
-
-			tsr->ok = false;
-		}
-	}
-	/* Case 3. register to memory transfers */
-	if (!src->mem_ref && dst->mem_ref) {
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok)
-			return;
-
-		/* Check stack variables with offset */
-		if (dst->reg1 == fbreg) {
-			struct type_state_stack *stack;
-			int offset = dst->offset - fboff;
-
-			tsr = &state->regs[src->reg1];
-
-			stack = find_stack_state(state, offset);
-			if (stack) {
-				/*
-				 * The source register is likely to hold a type
-				 * of member if it's a compound type.  Do not
-				 * update the stack variable type since we can
-				 * get the member type later by using the
-				 * die_get_member_type().
-				 */
-				if (!stack->compound)
-					set_stack_state(stack, offset, tsr->kind,
-							&tsr->type);
-			} else {
-				findnew_stack_state(state, offset, tsr->kind,
-						    &tsr->type);
-			}
-
-			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
-				     insn_offset, src->reg1, -offset);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/*
-		 * Ignore other transfers since it'd set a value in a struct
-		 * and won't change the type.
-		 */
-	}
-	/* Case 4. memory to memory transfers (not handled for now) */
-}
-
 /**
  * update_insn_state - Update type state for an instruction
  * @state: type state table
@@ -1115,8 +740,8 @@ static void update_insn_state_x86(struct type_state *state,
 static void update_insn_state(struct type_state *state, struct data_loc_info *dloc,
 			      Dwarf_Die *cu_die, struct disasm_line *dl)
 {
-	if (arch__is(dloc->arch, "x86"))
-		update_insn_state_x86(state, dloc, cu_die, dl);
+	if (dloc->arch->update_insn_state)
+		dloc->arch->update_insn_state(state, dloc, cu_die, dl);
 }
 
 /*
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index ef235b1b15e1..2bc870e61c74 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -7,6 +7,7 @@
 #include <linux/rbtree.h>
 #include <linux/types.h>
 #include "dwarf-aux.h"
+#include "dwarf-regs.h"
 #include "annotate.h"
 #include "debuginfo.h"
 
@@ -18,6 +19,14 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+#define pr_debug_dtp(fmt, ...)					\
+do {								\
+	if (debug_type_profile)					\
+		pr_info(fmt, ##__VA_ARGS__);			\
+	else							\
+		pr_debug3(fmt, ##__VA_ARGS__);			\
+} while (0)
+
 enum type_state_kind {
 	TSR_KIND_INVALID = 0,
 	TSR_KIND_TYPE,
@@ -215,6 +224,20 @@ void global_var_type__tree_delete(struct rb_root *root);
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
 bool has_reg_type(struct type_state *state, int reg);
+struct type_state_stack *findnew_stack_state(struct type_state *state,
+						int offset, u8 kind,
+						Dwarf_Die *type_die);
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+				Dwarf_Die *type_die);
+struct type_state_stack *find_stack_state(struct type_state *state,
+						int offset);
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+				u64 ip, u64 var_addr, int *var_offset,
+				Dwarf_Die *type_die);
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+				const char **var_name, int *var_offset);
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind);
+
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 72aec8f61b94..b5fe3a7508bb 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -12,6 +12,7 @@
 #include <subcmd/run-command.h>
 
 #include "annotate.h"
+#include "annotate-data.h"
 #include "build-id.h"
 #include "debug.h"
 #include "disasm.h"
@@ -145,6 +146,7 @@ static struct arch architectures[] = {
 			.memory_ref_char = '(',
 			.imm_char = '$',
 		},
+		.update_insn_state = update_insn_state_x86,
 	},
 	{
 		.name = "powerpc",
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 3d381a043520..718177fa4775 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -3,12 +3,16 @@
 #define __PERF_UTIL_DISASM_H
 
 #include "map_symbol.h"
+#include "dwarf-aux.h"
 
 struct annotation_options;
 struct disasm_line;
 struct ins;
 struct evsel;
 struct symbol;
+struct data_loc_info;
+struct type_state;
+struct disasm_line;
 
 struct arch {
 	const char	*name;
@@ -32,6 +36,9 @@ struct arch {
 		char memory_ref_char;
 		char imm_char;
 	} objdump;
+	void		(*update_insn_state)(struct type_state *state,
+				struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				struct disasm_line *dl);
 };
 
 struct ins {
-- 
2.43.0



* [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
  2024-06-14 17:26 ` [V4 01/16] tools/perf: Move the data structures related to register type to header file Athira Rajeev
  2024-06-14 17:26 ` [V4 02/16] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  5:29   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble Athira Rajeev
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Add support to capture and parse raw instructions on powerpc.
Currently, the perf tool infrastructure has two ways to disassemble
and understand an instruction: objdump and libcapstone.

When disassembling with "objdump", the perf tool infrastructure passes
the "--no-show-raw-insn" option. Example from powerpc with this option
for an instruction address:

Snippet from:
objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>

c0000000010224b4:	lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset. To find whether there is a memory reference
in the operands, the "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to handle instructions of the
form "mov  (%rax), %rcx".

On powerpc, instructions using "(" are not the only memory
instructions. For example, the above instruction can also be of extended
form (X form) "lwzx r10,0,r19". In order to easily identify the
instruction category and extract the source/target registers, this patch
adds support for using the raw instruction on powerpc. The approach is to
read the raw instruction directly from the DSO file using the
"dso__data_read_offset" utility, which is already implemented in the
perf infrastructure in "util/dso.c".
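
As an illustration only (not part of this patch), splitting a buffer of
function text into fixed-size 4-byte instructions could look like the
sketch below. The helper names le32_load and split_raw_insns are
hypothetical, and the sketch assumes the instruction bytes are stored
little-endian, as in the "38 01 81 e8" example; the patch itself casts
the buffer to u32 and prints each word with "%x".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: assemble one little-endian 32-bit word. */
static uint32_t le32_load(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: write one 8-digit hex string per 4-byte
 * instruction into out[]. Returns the number of instructions, or -1
 * if len is not a multiple of 4 or out[] is too small.
 */
static int split_raw_insns(const uint8_t *buf, size_t len,
			   char out[][9], size_t max)
{
	size_t i, count;

	if (len % 4)
		return -1;

	count = len / 4;
	if (count > max)
		return -1;

	for (i = 0; i < count; i++)
		snprintf(out[i], 9, "%08x", le32_load(buf + 4 * i));

	return (int)count;
}
```

Loading each word byte by byte keeps the result independent of the
endianness of the machine running the tool, which matters when a
profile is analysed on a different host than it was recorded on.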

Example:

38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:

   | 58 |  RT  |  RA |      DS       | |
   -------------------------------------
   0    6     11    16              30 31

Function "symbol__disassemble_dso" is added to read the raw instruction
directly from the DSO using the dso__data_read_offset utility. For the
above example, this captures:
line:    38 01 81 e8
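
To make the encoding above concrete, here is a small sketch (not part
of this patch) that pulls the DS-form fields out of the 32-bit word
0xe8810138, which is "38 01 81 e8" read as a little-endian word. The
shifts and masks follow the bit layout shown above; "struct ds_form",
"decode_ds_form" and the sign extension of DS are assumptions made for
the illustration.

```c
#include <stdint.h>

/* Field extraction following the layout above (bit 0 is the MSB). */
#define PPC_OP(insn)	(((insn) >> 26) & 0x3f)
#define PPC_RT(insn)	(((insn) >> 21) & 0x1f)
#define PPC_RA(insn)	(((insn) >> 16) & 0x1f)
#define PPC_DS(insn)	((insn) & 0xfffc)

/* Hypothetical container for the decoded fields. */
struct ds_form {
	unsigned int op;	/* primary opcode, 58 for ld */
	unsigned int rt;	/* target register number */
	unsigned int ra;	/* base register number */
	int ds;			/* signed byte displacement */
};

static struct ds_form decode_ds_form(uint32_t insn)
{
	struct ds_form f;

	f.op = PPC_OP(insn);
	f.rt = PPC_RT(insn);
	f.ra = PPC_RA(insn);

	/* The low two bits are not part of the displacement. */
	f.ds = (int)PPC_DS(insn);
	/* Sign-extend the 16-bit displacement. */
	if (f.ds & 0x8000)
		f.ds -= 0x10000;

	return f;
}
```

For 0xe8810138 this gives op = 58, rt = 4, ra = 1 and ds = 312, which
matches "ld r4,312(r1)".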

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c | 98 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index b5fe3a7508bb..f19496133bf0 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1586,6 +1586,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 }
 #endif
 
+static int symbol__disassemble_dso(char *filename, struct symbol *sym,
+					struct annotate_args *args)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	struct map *map = args->ms.map;
+	struct dso *dso = map__dso(map);
+	u64 start = map__rip_2objdump(map, sym->start);
+	u64 end = map__rip_2objdump(map, sym->end);
+	u64 len = end - start;
+	u64 offset;
+	int i, count;
+	u8 *buf = NULL;
+	char disasm_buf[512];
+	struct disasm_line *dl;
+	u32 *line;
+
+	/* Return if objdump is specified explicitly */
+	if (args->options->objdump_path)
+		return -1;
+
+	pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);
+
+	buf = malloc(len);
+	if (buf == NULL)
+		goto err;
+
+	count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
+
+	line = (u32 *)buf;
+
+	if ((u64)count != len)
+		goto err;
+
+	/* add the function address and name */
+	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
+		  start, sym->name);
+
+	args->offset = -1;
+	args->line = disasm_buf;
+	args->line_nr = 0;
+	args->fileloc = NULL;
+	args->ms.sym = sym;
+
+	dl = disasm_line__new(args);
+	if (dl == NULL)
+		goto err;
+
+	annotation_line__add(&dl->al, &notes->src->source);
+
+	/* Each raw instruction is 4 byte */
+	count = len/4;
+
+	for (i = 0, offset = 0; i < count; i++) {
+		args->offset = offset;
+		sprintf(args->line, "%x", line[i]);
+		dl = disasm_line__new(args);
+		if (dl == NULL)
+			goto err;
+
+		annotation_line__add(&dl->al, &notes->src->source);
+		offset += 4;
+	}
+
+	/* It failed in the middle */
+	if (offset != len) {
+		struct list_head *list = &notes->src->source;
+
+		/* Discard all lines and fallback to objdump */
+		while (!list_empty(list)) {
+			dl = list_first_entry(list, struct disasm_line, al.node);
+
+			list_del_init(&dl->al.node);
+			disasm_line__free(dl);
+		}
+		count = -1;
+	}
+
+out:
+	free(buf);
+	return count < 0 ? count : 0;
+
+err:
+	count = -1;
+	goto out;
+}
 /*
  * Possibly create a new version of line with tabs expanded. Returns the
  * existing or new line, storage is updated if a new line is allocated. If
@@ -1710,6 +1795,19 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 		strcpy(symfs_filename, tmp);
 	}
 
+	/*
+	 * For powerpc data type profiling, use the dso__data_read_offset
+	 * to read raw instruction directly and interpret the binary code
+	 * to understand instructions and register fields. For sort keys as
+	 * type and typeoff, disassemble to mnemonic notation is
+	 * not required in case of powerpc.
+	 */
+	if (arch__is(args->arch, "powerpc")) {
+		err = symbol__disassemble_dso(symfs_filename, sym, args);
+		if (err == 0)
+			goto out_remove_tmp;
+	}
+
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 	err = symbol__disassemble_capstone(symfs_filename, sym, args);
 	if (err == 0)
-- 
2.43.0



* [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (2 preceding siblings ...)
  2024-06-14 17:26 ` [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  5:32   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

perf annotate can be done in different ways. One way is to directly use
the "perf annotate" command; another way, to annotate a specific symbol,
is to run "perf report" and press "a" on the sample in UI mode. The
approach preferred on powerpc to parse a sample for data type profiling
is:
- Read directly from the DSO using dso__data_read_offset
- If that fails for any reason, fall back to libcapstone
- If libcapstone is not supported, use objdump

The above works well when perf report is invoked with only the sort
keys for data type, i.e. type and typeoff, because no instruction level
annotation is needed if only data type information is requested. To
annotate a sample, the "sym" sort key is needed along with the type and
typeoff sort keys. And by default, invoking just "perf report" uses the
sort key "sym", which displays the symbol information.

With the powerpc approach of first reading the DSO for the raw
instruction, "perf annotate" and "perf report" + "a" key break, since
no instruction level disassembly is done.

Snippet of result from perf report:

Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work  /usr/bin/pmlogger [Percent: local period]
Percent│        ea230010
       │        3a550010
       │        3a600000

       │        38f60001
       │        39490008
       │        42400438
 51.44 │        81290008
       │        7d485378

Here, the raw instruction is displayed in the output instead of the
human readable annotated form.

One way to get the appropriate data is to specify "--objdump path", which
makes the code annotation go through objdump. But that changes the
default behaviour. To fix this breakage, check if the "sym" sort key is
set. If so, fall back to the libcapstone/objdump way of disassembling
the sample.
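
The selection logic described above can be sketched as follows. This is
a simplified illustration, not the code in this patch: the enum, the
helper names and the two boolean parameters are hypothetical, and the
real code tries each disassembler in turn rather than taking flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three ways to get instructions. */
enum disasm_backend {
	DISASM_RAW_DSO,
	DISASM_CAPSTONE,
	DISASM_OBJDUMP,
};

/* Raw DSO read is only wanted when no "sym" sort key is present. */
static bool want_raw_dso_read(const char *sort_order)
{
	return sort_order && !strstr(sort_order, "sym");
}

/*
 * dso_read_ok:   assumed result of the raw read from the DSO
 * have_capstone: assumed to reflect libcapstone support in the build
 */
static enum disasm_backend pick_backend(const char *sort_order,
					bool dso_read_ok,
					bool have_capstone)
{
	if (want_raw_dso_read(sort_order) && dso_read_ok)
		return DISASM_RAW_DSO;
	if (have_capstone)
		return DISASM_CAPSTONE;
	return DISASM_OBJDUMP;
}
```

Note that a substring test like this treats any sort key whose name
contains "sym" the same way as "sym" itself, and that a NULL sort order
never selects the raw DSO read.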

With the changes, "perf report" shows:

Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work  /usr/bin/pmlogger [Percent: local period]
Percent│        ld        r17,16(r3)
       │        addi      r18,r21,16
       │        li        r19,0

       │ 8b0:   rldicl    r10,r10,63,33
       │        addi      r10,r10,1
       │        mtctr     r10
       │      ↓ b         8e4
       │ 8c0:   addi      r7,r22,1
       │        addi      r10,r9,8
       │      ↓ bdz       d00
 51.44 │        lwz       r9,8(r9)
       │        mr        r8,r10
       │        cmpw      r20,r9

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index f19496133bf0..b81cdcf4d6b4 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -25,6 +25,7 @@
 #include "srcline.h"
 #include "symbol.h"
 #include "util.h"
+#include "sort.h"
 
 static regex_t	 file_lineno;
 
@@ -1803,9 +1804,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 	 * not required in case of powerpc.
 	 */
 	if (arch__is(args->arch, "powerpc")) {
-		err = symbol__disassemble_dso(symfs_filename, sym, args);
-		if (err == 0)
-			goto out_remove_tmp;
+		if (sort_order && !strstr(sort_order, "sym")) {
+			err = symbol__disassemble_dso(symfs_filename, sym, args);
+			if (err == 0)
+				goto out_remove_tmp;
+		}
 	}
 
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
-- 
2.43.0



* [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (3 preceding siblings ...)
  2024-06-14 17:26 ` [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  5:39   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse a disassembled line.

Example snippet from objdump:
objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>

c0000000010224b4:	lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset. On powerpc, the approach for data type
profiling uses the raw instruction instead of the objdump result to
identify the instruction category and extract the source/target
registers.

Example: 38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of the raw instruction.
Also update "struct disasm_line" to save the binary code.
With the change, the function captures:

line -> "38 01 81 e8     ld      r4,312(r1)"
raw instruction "38 01 81 e8"
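
As a rough sketch of this step (not the implementation in this patch),
the leading byte dump of such a line can be turned into the 32-bit
instruction word as shown below. The function name parse_raw_insn is
hypothetical, and the sketch only handles the spaced "xx xx xx xx"
form, assuming the bytes are listed in little-endian memory order.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: read four hex bytes from the start of @line
 * and assemble them as a little-endian 32-bit instruction word.
 * Returns 0 on success, -1 if the line does not start with four
 * hex bytes.
 */
static int parse_raw_insn(const char *line, uint32_t *insn)
{
	unsigned int b[4];

	if (sscanf(line, "%2x %2x %2x %2x",
		   &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;

	*insn = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
		((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

For the line "38 01 81 e8     ld      r4,312(r1)" this yields
0xe8810138, the same value listed as rawp_insn in the code comment
added by this patch.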

The raw instruction is used later to extract the reg/offset fields.
Macros are added to extract the opcode and register fields. "struct
disasm_line" is updated with a 32-bit union of "bytes" and "raw_insn"
(raw) to carry the raw code. Function "disasm_line__parse_powerpc" fills
in the raw instruction hex value, and the macros can then be used to get
the opcode. There are no changes to the existing code paths that parse
the disassembled code. Architectures that use the instruction name and
the present approach are not altered. Since this approach targets
powerpc, the macro implementation is added only for powerpc as of now.

Since disasm_line__parse is also used in other cases (perf annotate) and
not only data type profiling, the powerpc callback includes changes to
work with binary code as well as the mnemonic representation. Also, if
the DSO read fails and libcapstone is not supported, the approach falls
back to objdump. Hence the patch has changes to ensure that the objdump
option also works well.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/include/linux/string.h                  |  2 +
 tools/lib/string.c                            | 13 ++++
 .../perf/arch/powerpc/annotate/instructions.c |  1 +
 tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 +++
 tools/perf/util/annotate.h                    |  5 +-
 tools/perf/util/disasm.c                      | 59 ++++++++++++++++++-
 6 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
index db5c99318c79..0acb1fc14e19 100644
--- a/tools/include/linux/string.h
+++ b/tools/include/linux/string.h
@@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
 
 extern char *strim(char *);
 
+extern void remove_spaces(char *s);
+
 extern void *memchr_inv(const void *start, int c, size_t bytes);
 #endif /* _TOOLS_LINUX_STRING_H_ */
diff --git a/tools/lib/string.c b/tools/lib/string.c
index 8b6892f959ab..3126d2cff716 100644
--- a/tools/lib/string.c
+++ b/tools/lib/string.c
@@ -153,6 +153,19 @@ char *strim(char *s)
 	return skip_spaces(s);
 }
 
+/*
+ * remove_spaces - Removes whitespaces from @s
+ */
+void remove_spaces(char *s)
+{
+	char *d = s;
+
+	do {
+		while (*d == ' ')
+			++d;
+	} while ((*s++ = *d++));
+}
+
 /**
  * strreplace - Replace all occurrences of character in string.
  * @s: The string to operate on.
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index a3f423c27cae..d57fd023ef9c 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 		arch->initialized = true;
 		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
 		arch->objdump.comment_char      = '#';
+		annotate_opts.show_asm_raw = true;
 	}
 
 	return 0;
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 0c4f4caf53ac..430623ca5612 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
 			return roff->ptregs_offset;
 	return -EINVAL;
 }
+
+#define PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_RA(a)	(((a) >> 16) & 0x1f)
+#define PPC_RT(t)	(((t) >> 21) & 0x1f)
+#define PPC_RB(b)	(((b) >> 11) & 0x1f)
+#define PPC_D(D)	((D) & 0xfffe)
+#define PPC_DS(DS)	((DS) & 0xfffc)
+#define OP_LD	58
+#define OP_STD	62
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index d5c821c22f79..9ba772f46270 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -113,7 +113,10 @@ struct annotation_line {
 struct disasm_line {
 	struct ins		 ins;
 	struct ins_operands	 ops;
-
+	union {
+		u8 bytes[4];
+		u32 raw_insn;
+	} raw;
 	/* This needs to be at the end. */
 	struct annotation_line	 al;
 };
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index b81cdcf4d6b4..1e8568738b38 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -45,6 +45,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
 
 static void ins__sort(struct arch *arch);
 static int disasm_line__parse(char *line, const char **namep, char **rawp);
+static int disasm_line__parse_powerpc(struct disasm_line *dl);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -844,6 +845,59 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
 	return -1;
 }
 
+/*
+ * Parses the result captured from symbol__disassemble_*
+ * Example, line read from DSO file in powerpc:
+ * line:    38 01 81 e8
+ * opcode: fetched from arch specific get_opcode_insn
+ * rawp_insn: e8810138
+ *
+ * rawp_insn is used later to extract the reg/offset fields
+ */
+#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
+
+static int disasm_line__parse_powerpc(struct disasm_line *dl)
+{
+	char *line = dl->al.line;
+	const char **namep = &dl->ins.name;
+	char **rawp = &dl->ops.raw;
+	char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
+	char *name = skip_spaces(name_raw_insn + 11);
+	int objdump = 0;
+
+	if (strlen(line) > 11)
+		objdump = 1;
+
+	if (name_raw_insn[0] == '\0')
+		return -1;
+
+	if (objdump) {
+		*rawp = name + 1;
+		while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
+			++*rawp;
+		tmp = (*rawp)[0];
+		(*rawp)[0] = '\0';
+
+		*namep = strdup(name);
+		if (*namep == NULL)
+			return -1;
+
+		(*rawp)[0] = tmp;
+		*rawp = strim(*rawp);
+	} else
+		*namep = "";
+
+	tmp_raw_insn = strdup(name_raw_insn);
+	tmp_raw_insn[11] = '\0';
+	remove_spaces(tmp_raw_insn);
+
+	dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
+	if (objdump)
+		dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
+
+	return 0;
+}
+
 static void annotation_line__init(struct annotation_line *al,
 				  struct annotate_args *args,
 				  int nr)
@@ -897,7 +951,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
 		goto out_delete;
 
 	if (args->offset != -1) {
-		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
+		if (arch__is(args->arch, "powerpc")) {
+			if (disasm_line__parse_powerpc(dl) < 0)
+				goto out_free_line;
+		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
 			goto out_free_line;
 
 		disasm_line__init_ins(dl, args->arch, &args->ms);
-- 
2.43.0
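
The comment above disasm_line__parse_powerpc() gives the example line
"38 01 81 e8" turning into the instruction word e8810138. A minimal sketch
of that conversion follows, assuming the bytes are printed in memory order
from a little-endian binary. The helper name raw_insn_from_le_bytes() is
hypothetical and is not part of the patch, which uses strtol() and
be32_to_cpu() instead.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: parse four hex bytes printed
 * in memory order ("38 01 81 e8") and assemble them as a little-endian
 * 32-bit instruction word. Returns 0 on success, -1 on malformed input.
 */
static inline int raw_insn_from_le_bytes(const char *line, uint32_t *insn)
{
	unsigned int b[4];

	if (sscanf(line, "%2x %2x %2x %2x", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;

	/* The first byte in memory is the least significant byte of the word */
	*insn = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
		((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

Assembling the word byte by byte keeps the result independent of the host
endianness, which is the main reason this sketch avoids a byte-swap macro.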



* [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (4 preceding siblings ...)
  2024-06-14 17:26 ` [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  6:00   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 07/16] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc Athira Rajeev
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Use the raw instruction code and macros to identify memory instructions
and to extract the register fields and the offset. The implementation
handles D-form, X-form and DS-form instructions. Two main functions are
added. The first is a new parse function, "load_store__parse", which serves
as the instruction ops parser for memory instructions. Unlike other parsers
(such as mov__parse), it fills in only the "multi_regs" field for
source/target and the newly added "mem_ref" field. No other fields are set
because there is no need to parse the disassembled code here: arch-specific
macros extract the offset and registers, which is easier and more precise.
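
As an illustration of the field extraction described above, here is a
minimal sketch for D-form and DS-form words. The SK_* macros and
sk_mem_offset() are hypothetical names written from the Power ISA field
layout; they are not copied from the patch. The sign extension of the
16-bit displacement is also an assumption of this sketch.

```c
#include <stdint.h>

/* Field extractors written from the Power ISA layout (assumed names) */
#define SK_OP(insn)	(((insn) >> 26) & 0x3f)	/* primary opcode, bits 0:5 */
#define SK_RT(insn)	(((insn) >> 21) & 0x1f)	/* target register, bits 6:10 */
#define SK_RA(insn)	(((insn) >> 16) & 0x1f)	/* base register, bits 11:15 */
#define SK_RB(insn)	(((insn) >> 11) & 0x1f)	/* index register, bits 16:20 */

#define SK_OP_LD	58
#define SK_OP_STD	62

/* Return the displacement of a D-form or DS-form memory instruction */
static inline int sk_mem_offset(uint32_t insn)
{
	int field = insn & 0xffff;

	/* DS-form reuses the two low bits as an extended opcode */
	if (SK_OP(insn) == SK_OP_LD || SK_OP(insn) == SK_OP_STD)
		field &= 0xfffc;

	/* Sign-extend the 16-bit field (assumption of this sketch) */
	return (field ^ 0x8000) - 0x8000;
}
```

For the word e8810138 this yields opcode 58, RT 4, RA 1 and offset 312,
that is "ld r4,312(r1)".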

On powerpc, this patch treats every instruction with a primary opcode from
32 to 63 as a memory instruction. Update the "ins__find" function to take
"raw_insn" as an additional parameter. Instead of "extract_reg_offset", use
the second new function, "get_arch_regs", which sets the fields reg1, reg2
and offset depending on whether the operand is the source or the target.
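
The opcode range check can be sketched as a single bit test, since 32 to 63
are exactly the 6-bit values with the top bit set. The function names below
are hypothetical. Note that this only mirrors the rule used in the patch:
in the Power ISA, primary opcodes such as 59 and 63 also hold
floating-point arithmetic, so the rule is broader than loads and stores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode lives in the top six bits of the instruction word */
static inline unsigned int sk_primary_opcode(uint32_t insn)
{
	return (insn >> 26) & 0x3f;
}

/* True when the primary opcode is in the range 32..63 (bit 0x20 set) */
static inline bool sk_is_mem_primary(uint32_t insn)
{
	return (sk_primary_opcode(insn) & 0x20) != 0;
}
```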

Update the "parse" callback of "struct ins_ops" to also take a "struct
disasm_line" argument. Parse functions need it when the opcode determines
whether to set multi_regs.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/arm64/annotate/instructions.c |  3 +-
 .../arch/loongarch/annotate/instructions.c    |  6 +-
 .../perf/arch/powerpc/annotate/instructions.c | 16 ++++
 tools/perf/arch/powerpc/util/dwarf-regs.c     | 44 +++++++++++
 tools/perf/arch/s390/annotate/instructions.c  |  5 +-
 tools/perf/util/annotate.c                    | 25 ++++++-
 tools/perf/util/disasm.c                      | 73 ++++++++++++++++---
 tools/perf/util/disasm.h                      |  6 +-
 tools/perf/util/include/dwarf-regs.h          |  3 +
 9 files changed, 159 insertions(+), 22 deletions(-)

diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index 4af0c3a0f86e..f86d9f4798bd 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -11,7 +11,8 @@ struct arm64_annotate {
 
 static int arm64_mov__parse(struct arch *arch __maybe_unused,
 			    struct ins_operands *ops,
-			    struct map_symbol *ms __maybe_unused)
+			    struct map_symbol *ms __maybe_unused,
+			    struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *endptr;
 
diff --git a/tools/perf/arch/loongarch/annotate/instructions.c b/tools/perf/arch/loongarch/annotate/instructions.c
index 21cc7e4149f7..ab43b1ab51e3 100644
--- a/tools/perf/arch/loongarch/annotate/instructions.c
+++ b/tools/perf/arch/loongarch/annotate/instructions.c
@@ -5,7 +5,8 @@
  * Copyright (C) 2020-2023 Loongson Technology Corporation Limited
  */
 
-static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *c, *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -51,7 +52,8 @@ static struct ins_ops loongarch_call_ops = {
 	.scnprintf = call__scnprintf,
 };
 
-static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	struct map *map = ms->map;
 	struct symbol *sym = ms->sym;
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index d57fd023ef9c..10fea5e5cf4c 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -49,6 +49,22 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 	return ops;
 }
 
+#define PPC_OP(op)      (((op) >> 26) & 0x3F)
+
+static struct ins_ops *check_ppc_insn(int raw_insn)
+{
+	int opcode = PPC_OP(raw_insn);
+
+	/*
+	 * Instructions with opcode 32 to 63 are memory
+	 * instructions in powerpc
+	 */
+	if ((opcode & 0x20))
+		return &load_store_ops;
+
+	return NULL;
+}
+
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 430623ca5612..e01729f3c0b3 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -107,3 +107,47 @@ int regs_query_register_offset(const char *name)
 #define PPC_DS(DS)	((DS) & 0xfffc)
 #define OP_LD	58
 #define OP_STD	62
+
+static int get_source_reg(unsigned int raw_insn)
+{
+	return PPC_RA(raw_insn);
+}
+
+static int get_target_reg(unsigned int raw_insn)
+{
+	return PPC_RT(raw_insn);
+}
+
+static int get_offset_opcode(int raw_insn)
+{
+	int opcode = PPC_OP(raw_insn);
+
+	/* DS- form */
+	if ((opcode == OP_LD) || (opcode == OP_STD))
+		return PPC_DS(raw_insn);
+	else
+		return PPC_D(raw_insn);
+}
+
+/*
+ * Fills the required fields for op_loc depending on if it
+ * is a source or target.
+ * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
+ * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
+ * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
+ */
+void get_arch_regs(int raw_insn, int is_source,
+		struct annotated_op_loc *op_loc)
+{
+	if (is_source)
+		op_loc->reg1 = get_source_reg(raw_insn);
+	else
+		op_loc->reg1 = get_target_reg(raw_insn);
+
+	if (op_loc->multi_regs)
+		op_loc->reg2 = PPC_RB(raw_insn);
+
+	/* TODO: Implement offset handling for X Form */
+	if ((op_loc->mem_ref) && (PPC_OP(raw_insn) != 31))
+		op_loc->offset = get_offset_opcode(raw_insn);
+}
diff --git a/tools/perf/arch/s390/annotate/instructions.c b/tools/perf/arch/s390/annotate/instructions.c
index da5aa3e1f04c..eeac25cca699 100644
--- a/tools/perf/arch/s390/annotate/instructions.c
+++ b/tools/perf/arch/s390/annotate/instructions.c
@@ -2,7 +2,7 @@
 #include <linux/compiler.h>
 
 static int s390_call__parse(struct arch *arch, struct ins_operands *ops,
-			    struct map_symbol *ms)
+			    struct map_symbol *ms, struct disasm_line *dl __maybe_unused)
 {
 	char *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -52,7 +52,8 @@ static struct ins_ops s390_call_ops = {
 
 static int s390_mov__parse(struct arch *arch __maybe_unused,
 			   struct ins_operands *ops,
-			   struct map_symbol *ms __maybe_unused)
+			   struct map_symbol *ms __maybe_unused,
+			   struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *endptr;
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1451caf25e77..bfa6420dc4b9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2079,6 +2079,12 @@ static int extract_reg_offset(struct arch *arch, const char *str,
 	return 0;
 }
 
+__weak void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused,
+		struct annotated_op_loc *op_loc __maybe_unused)
+{
+	return;
+}
+
 /**
  * annotate_get_insn_location - Get location of instruction
  * @arch: the architecture info
@@ -2123,20 +2129,33 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 	for_each_insn_op_loc(loc, i, op_loc) {
 		const char *insn_str = ops->source.raw;
 		bool multi_regs = ops->source.multi_regs;
+		bool mem_ref = ops->source.mem_ref;
 
 		if (i == INSN_OP_TARGET) {
 			insn_str = ops->target.raw;
 			multi_regs = ops->target.multi_regs;
+			mem_ref = ops->target.mem_ref;
 		}
 
 		/* Invalidate the register by default */
 		op_loc->reg1 = -1;
 		op_loc->reg2 = -1;
 
-		if (insn_str == NULL)
-			continue;
+		if (insn_str == NULL) {
+			if (!arch__is(arch, "powerpc"))
+				continue;
+		}
 
-		if (strchr(insn_str, arch->objdump.memory_ref_char)) {
+		/*
+		 * For powerpc, call get_arch_regs function which extracts the
+		 * required fields for op_loc, ie reg1, reg2, offset from the
+		 * raw instruction.
+		 */
+		if (arch__is(arch, "powerpc")) {
+			op_loc->mem_ref = mem_ref;
+			op_loc->multi_regs = multi_regs;
+			get_arch_regs(dl->raw.raw_insn, !i, op_loc);
+		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
 			op_loc->mem_ref = true;
 			op_loc->multi_regs = multi_regs;
 			extract_reg_offset(arch, insn_str, op_loc);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 1e8568738b38..8428df0b9c17 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -37,6 +37,7 @@ static struct ins_ops mov_ops;
 static struct ins_ops nop_ops;
 static struct ins_ops lock_ops;
 static struct ins_ops ret_ops;
+static struct ins_ops load_store_ops;
 
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
@@ -254,7 +255,8 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2)
 	return arch->ins_is_fused(arch, ins1, ins2);
 }
 
-static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -349,7 +351,8 @@ static inline const char *validate_comma(const char *c, struct ins_operands *ops
 	return c;
 }
 
-static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	struct map *map = ms->map;
 	struct symbol *sym = ms->sym;
@@ -508,7 +511,8 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
 	return 0;
 }
 
-static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	ops->locked.ops = zalloc(sizeof(*ops->locked.ops));
 	if (ops->locked.ops == NULL)
@@ -517,13 +521,13 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
 		goto out_free_ops;
 
-	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
+	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name, 0);
 
 	if (ops->locked.ins.ops == NULL)
 		goto out_free_ops;
 
 	if (ops->locked.ins.ops->parse &&
-	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms) < 0)
+	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms, NULL) < 0)
 		goto out_free_ops;
 
 	return 0;
@@ -594,7 +598,8 @@ static bool check_multi_regs(struct arch *arch, const char *op)
 	return count > 1;
 }
 
-static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
+static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *comment, prev;
 
@@ -672,7 +677,39 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
-static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
+static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
+		struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			ops->raw);
+}
+
+/*
+ * Sets the fields: multi_regs and "mem_ref".
+ * "mem_ref" is set for ops->source which is later used to
+ * fill the objdump->memory_ref-char field. This ops is currently
+ * used by powerpc and since binary instruction code is used to
+ * extract opcode, regs and offset, no other parsing is needed here
+ */
+static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
+		struct map_symbol *ms __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	ops->source.mem_ref = true;
+	ops->source.multi_regs = false;
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static struct ins_ops load_store_ops = {
+	.parse     = load_store__parse,
+	.scnprintf = load_store__scnprintf,
+};
+
+static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *target, *comment, *s, prev;
 
@@ -762,11 +799,23 @@ static void ins__sort(struct arch *arch)
 	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins_ops *__ins__find(struct arch *arch, const char *name)
+static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_insn)
 {
 	struct ins *ins;
 	const int nmemb = arch->nr_instructions;
 
+	if (arch__is(arch, "powerpc")) {
+		/*
+		 * For powerpc, identify the instruction ops
+		 * from the opcode using raw_insn.
+		 */
+		struct ins_ops *ops;
+
+		ops = check_ppc_insn(raw_insn);
+		if (ops)
+			return ops;
+	}
+
 	if (!arch->sorted_instructions) {
 		ins__sort(arch);
 		arch->sorted_instructions = true;
@@ -796,9 +845,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name)
 	return ins ? ins->ops : NULL;
 }
 
-struct ins_ops *ins__find(struct arch *arch, const char *name)
+struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn)
 {
-	struct ins_ops *ops = __ins__find(arch, name);
+	struct ins_ops *ops = __ins__find(arch, name, raw_insn);
 
 	if (!ops && arch->associate_instruction_ops)
 		ops = arch->associate_instruction_ops(arch, name);
@@ -808,12 +857,12 @@ struct ins_ops *ins__find(struct arch *arch, const char *name)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
 {
-	dl->ins.ops = ins__find(arch, dl->ins.name);
+	dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
 
 	if (!dl->ins.ops)
 		return;
 
-	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms) < 0)
+	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms, dl) < 0)
 		dl->ins.ops = NULL;
 }
 
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 718177fa4775..6b6ec23e4f6f 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -57,6 +57,7 @@ struct ins_operands {
 		bool	offset_avail;
 		bool	outside;
 		bool	multi_regs;
+		bool	mem_ref;
 	} target;
 	union {
 		struct {
@@ -64,6 +65,7 @@ struct ins_operands {
 			char	*name;
 			u64	addr;
 			bool	multi_regs;
+			bool	mem_ref;
 		} source;
 		struct {
 			struct ins	    ins;
@@ -78,7 +80,7 @@ struct ins_operands {
 
 struct ins_ops {
 	void (*free)(struct ins_operands *ops);
-	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms);
+	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms, struct disasm_line *dl);
 	int (*scnprintf)(struct ins *ins, char *bf, size_t size,
 			 struct ins_operands *ops, int max_ins_name);
 };
@@ -97,7 +99,7 @@ struct annotate_args {
 struct arch *arch__find(const char *name);
 bool arch__is(struct arch *arch, const char *name);
 
-struct ins_ops *ins__find(struct arch *arch, const char *name);
+struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 		   struct ins_operands *ops, int max_ins_name);
 
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 01fb25a1150a..7ea39362ecaf 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _PERF_DWARF_REGS_H_
 #define _PERF_DWARF_REGS_H_
+#include "annotate.h"
 
 #define DWARF_REG_PC  0xd3af9c /* random number */
 #define DWARF_REG_FB  0xd3affb /* random number */
@@ -31,6 +32,8 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
 }
 #endif
 
+void get_arch_regs(int raw_insn, int is_source, struct annotated_op_loc *op_loc);
+
 #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
 /*
  * Arch should support fetching the offset of a register in pt_regs
-- 
2.43.0



* [V4 07/16] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (5 preceding siblings ...)
  2024-06-14 17:26 ` [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 08/16] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Some powerpc memory instructions have primary opcode 31.
Example: "ldx RT,RA,RB". Its X-form encoding is:

  ______________________________________
  | 31 |  RT  |  RA |  RB |   21     |/|
  --------------------------------------
  0    6     11    16    21         30 31

The opcode for "ldx" is 31. Other memory instructions with opcode 31
include ldux, stbx, lwzx and lhaux. But not all instructions with
opcode 31 are memory instructions; one example is the add instruction,
"add RT,RA,RB".

The value in bits 21-30 (21 for ldx) differs between these
instructions. The patch uses this value to assign instruction ops in
these cases. The names and values used to identify them are taken from
the defines in "arch/powerpc/include/asm/ppc-opcode.h".
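
The lookup described above can be sketched as a bsearch() over a table
sorted by the value of bits 21-30. The table below is a small subset of the
values listed in the patch; the sk_* names and the sk_xform() encoder are
hypothetical helpers added only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

struct sk_xop {
	const char	*name;
	int		value;
};

/* Subset of the patch's table; must stay sorted by value for bsearch() */
static const struct sk_xop sk_mem_xops[] = {
	{ "LDX", 21 }, { "LWZX", 23 }, { "LDUX", 53 },
	{ "STBX", 215 }, { "LHAUX", 375 },
};

static int sk_cmp(const void *a, const void *b)
{
	const struct sk_xop *x = a, *y = b;

	return (x->value > y->value) - (x->value < y->value);
}

/* Return 1 if insn has primary opcode 31 and a listed extended opcode */
static inline int sk_is_mem_op31(uint32_t insn)
{
	struct sk_xop key = { "key", (int)((insn >> 1) & 0x3ff) };

	if (((insn >> 26) & 0x3f) != 31)
		return 0;

	return bsearch(&key, sk_mem_xops,
		       sizeof(sk_mem_xops) / sizeof(sk_mem_xops[0]),
		       sizeof(sk_mem_xops[0]), sk_cmp) != NULL;
}

/* Encode an X-form word with primary opcode 31 (illustration only) */
static inline uint32_t sk_xform(uint32_t rt, uint32_t ra, uint32_t rb,
				uint32_t xo)
{
	return (31u << 26) | (rt << 21) | (ra << 16) | (rb << 11) | (xo << 1);
}
```

With these helpers, an encoded "ldx r3,r4,r5" (extended opcode 21) is
reported as a memory instruction, while "add r3,r4,r5" (extended opcode 266)
is not.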

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 107 +++++++++++++++++-
 tools/perf/util/disasm.c                      |   3 +
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index 10fea5e5cf4c..4ee959a24738 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -49,18 +49,121 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 	return ops;
 }
 
-#define PPC_OP(op)      (((op) >> 26) & 0x3F)
+#define PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_21_30(R)	(((R) >> 1) & 0x3ff)
+
+struct insn_offset {
+	const char	*name;
+	int		value;
+};
+
+/*
+ * There are memory instructions with opcode 31 which are
+ * of X Form, Example:
+ * ldx RT,RA,RB
+ * ______________________________________
+ * | 31 |  RT  |  RA |  RB |   21     |/|
+ * --------------------------------------
+ * 0    6     11    16    21         30 31
+ *
+ * But not all instructions with opcode 31 are memory instructions.
+ * Example: add RT,RA,RB
+ *
+ * Use bits 21 to 30 to check memory insns with 31 as opcode.
+ * In ins_array below, for ldx instruction:
+ * name => OP_31_XOP_LDX
+ * value => 21
+ */
+
+static struct insn_offset ins_array[] = {
+	{ .name = "OP_31_XOP_LXSIWZX",  .value = 12, },
+	{ .name = "OP_31_XOP_LWARX",	.value = 20, },
+	{ .name = "OP_31_XOP_LDX",	.value = 21, },
+	{ .name = "OP_31_XOP_LWZX",	.value = 23, },
+	{ .name = "OP_31_XOP_LDUX",	.value = 53, },
+	{ .name = "OP_31_XOP_LWZUX",	.value = 55, },
+	{ .name = "OP_31_XOP_LXSIWAX",  .value = 76, },
+	{ .name = "OP_31_XOP_LDARX",    .value = 84, },
+	{ .name = "OP_31_XOP_LBZX",	.value = 87, },
+	{ .name = "OP_31_XOP_LVX",      .value = 103, },
+	{ .name = "OP_31_XOP_LBZUX",    .value = 119, },
+	{ .name = "OP_31_XOP_STXSIWX",  .value = 140, },
+	{ .name = "OP_31_XOP_STDX",	.value = 149, },
+	{ .name = "OP_31_XOP_STWX",	.value = 151, },
+	{ .name = "OP_31_XOP_STDUX",	.value = 181, },
+	{ .name = "OP_31_XOP_STWUX",	.value = 183, },
+	{ .name = "OP_31_XOP_STBX",	.value = 215, },
+	{ .name = "OP_31_XOP_STVX",     .value = 231, },
+	{ .name = "OP_31_XOP_STBUX",	.value = 247, },
+	{ .name = "OP_31_XOP_LHZX",	.value = 279, },
+	{ .name = "OP_31_XOP_LHZUX",	.value = 311, },
+	{ .name = "OP_31_XOP_LXVDSX",   .value = 332, },
+	{ .name = "OP_31_XOP_LWAX",	.value = 341, },
+	{ .name = "OP_31_XOP_LHAX",	.value = 343, },
+	{ .name = "OP_31_XOP_LWAUX",	.value = 373, },
+	{ .name = "OP_31_XOP_LHAUX",	.value = 375, },
+	{ .name = "OP_31_XOP_STHX",	.value = 407, },
+	{ .name = "OP_31_XOP_STHUX",	.value = 439, },
+	{ .name = "OP_31_XOP_LXSSPX",   .value = 524, },
+	{ .name = "OP_31_XOP_LDBRX",	.value = 532, },
+	{ .name = "OP_31_XOP_LSWX",	.value = 533, },
+	{ .name = "OP_31_XOP_LWBRX",	.value = 534, },
+	{ .name = "OP_31_XOP_LFSUX",    .value = 567, },
+	{ .name = "OP_31_XOP_LXSDX",    .value = 588, },
+	{ .name = "OP_31_XOP_LSWI",	.value = 597, },
+	{ .name = "OP_31_XOP_LFDX",     .value = 599, },
+	{ .name = "OP_31_XOP_LFDUX",    .value = 631, },
+	{ .name = "OP_31_XOP_STXSSPX",  .value = 652, },
+	{ .name = "OP_31_XOP_STDBRX",	.value = 660, },
+	{ .name = "OP_31_XOP_STXWX",	.value = 661, },
+	{ .name = "OP_31_XOP_STWBRX",	.value = 662, },
+	{ .name = "OP_31_XOP_STFSX",	.value = 663, },
+	{ .name = "OP_31_XOP_STFSUX",	.value = 695, },
+	{ .name = "OP_31_XOP_STXSDX",   .value = 716, },
+	{ .name = "OP_31_XOP_STSWI",	.value = 725, },
+	{ .name = "OP_31_XOP_STFDX",	.value = 727, },
+	{ .name = "OP_31_XOP_STFDUX",	.value = 759, },
+	{ .name = "OP_31_XOP_LXVW4X",   .value = 780, },
+	{ .name = "OP_31_XOP_LHBRX",	.value = 790, },
+	{ .name = "OP_31_XOP_LXVD2X",   .value = 844, },
+	{ .name = "OP_31_XOP_LFIWAX",	.value = 855, },
+	{ .name = "OP_31_XOP_LFIWZX",	.value = 887, },
+	{ .name = "OP_31_XOP_STXVW4X",  .value = 908, },
+	{ .name = "OP_31_XOP_STHBRX",	.value = 918, },
+	{ .name = "OP_31_XOP_STXVD2X",  .value = 972, },
+	{ .name = "OP_31_XOP_STFIWX",	.value = 983, },
+};
+
+static int cmp_offset(const void *a, const void *b)
+{
+	const struct insn_offset *val1 = a;
+	const struct insn_offset *val2 = b;
+
+	return (val1->value - val2->value);
+}
 
 static struct ins_ops *check_ppc_insn(int raw_insn)
 {
 	int opcode = PPC_OP(raw_insn);
+	int mem_insn_31 = PPC_21_30(raw_insn);
+	struct insn_offset *ret;
+	struct insn_offset mem_insns_31_opcode = {
+		"OP_31_INSN",
+		mem_insn_31
+	};
 
 	/*
 	 * Instructions with opcode 32 to 63 are memory
 	 * instructions in powerpc
 	 */
-	if ((opcode & 0x20))
+	if ((opcode & 0x20)) {
 		return &load_store_ops;
+	} else if (opcode == 31) {
+		/* Check for memory instructions with opcode 31 */
+		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
+		if (ret != NULL)
+			return &load_store_ops;
+	}
 
 	return NULL;
 }
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 8428df0b9c17..4e605d082a02 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -696,6 +696,9 @@ static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operan
 {
 	ops->source.mem_ref = true;
 	ops->source.multi_regs = false;
+	/* opcode 31 is of X form */
+	if (PPC_OP(dl->raw.raw_insn) == 31)
+		ops->source.multi_regs = true;
 
 	ops->target.mem_ref = false;
 	ops->target.multi_regs = false;
-- 
2.43.0



* [V4 08/16] tools/perf: Add some of the arithmetic instructions to support instruction tracking in powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (6 preceding siblings ...)
  2024-06-14 17:26 ` [V4 07/16] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 09/16] tools/perf: Add more instructions for instruction tracking Athira Rajeev
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Data type profiling has a concept of instruction tracking.
Example sequence in powerpc:

	ld      r10,264(r3)
	mr      r31,r3
	<after some sequence>
	ld      r9,312(r31)

or differently

	lwz	r10,264(r3)
	add	r31, r3, RB
	lwz	r9, 0(r31)

If a sample hits "lwz r9, 0(r31)", the data type of r31 depends on the
preceding instruction sequence. To track those instructions, the patch
identifies some of the arithmetic instructions that have opcode 31.
Since some memory instructions also have opcode 31, bits 22:30 are used
to pick out the arithmetic instructions. Some of these instructions,
such as addme and addze, take only two operands. The patch adds new
instruction ops, "arithmetic_ops", to handle these cases.
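
To show why bits 22:30 are used for the arithmetic instructions, here is a
minimal sketch. In XO-form, bit 21 is the OE flag, so "add" and "addo"
differ in bits 21:30 but share the same value in bits 22:30. The four
extended opcode values come from the defines in this patch; the sk_* names
and the sk_xoform() encoder are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_XO_21_30(insn)	(((insn) >> 1) & 0x3ff)
#define SK_XO_22_30(insn)	(((insn) >> 1) & 0x1ff)

/* Extended opcodes of the XO-form instructions with a single source */
#define SK_ADD_MINUS_ONE	234	/* addme */
#define SK_SUB_MINUS_ONE	232	/* subfme */
#define SK_ADD_ZERO		202	/* addze */
#define SK_SUB_ZERO		200	/* subfze */

/* True for XO-form instructions that read only RA as a source register */
static inline bool sk_single_source(uint32_t insn)
{
	unsigned int xo = SK_XO_22_30(insn);

	return xo == SK_ADD_MINUS_ONE || xo == SK_SUB_MINUS_ONE ||
	       xo == SK_ADD_ZERO || xo == SK_SUB_ZERO;
}

/* Encode an XO-form word with primary opcode 31 (illustration only) */
static inline uint32_t sk_xoform(uint32_t rt, uint32_t ra, uint32_t rb,
				 uint32_t oe, uint32_t xo)
{
	return (31u << 26) | (rt << 21) | (ra << 16) | (rb << 11) |
	       (oe << 10) | (xo << 1);
}
```

Masking with 0x1ff lets one table entry cover both the plain and the
overflow-enabled variant of each arithmetic instruction.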

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 49 ++++++++++++++++++
 tools/perf/util/disasm.c                      | 51 +++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index 4ee959a24738..bec8ab0ee18d 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -51,6 +51,7 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 
 #define PPC_OP(op)	(((op) >> 26) & 0x3F)
 #define PPC_21_30(R)	(((R) >> 1) & 0x3ff)
+#define PPC_22_30(R)	(((R) >> 1) & 0x1ff)
 
 struct insn_offset {
 	const char	*name;
@@ -134,6 +135,44 @@ static struct insn_offset ins_array[] = {
 	{ .name = "OP_31_XOP_STFIWX",	.value = 983, },
 };
 
+/*
+ * Arithmetic instructions that have opcode 31.
+ * These instructions are tracked to save the register state
+ * changes. Example:
+ *
+ * lwz	r10,264(r3)
+ * add	r31, r3, r3
+ * lwz	r9, 0(r31)
+ *
+ * Here instruction tracking needs to identify the "add"
+ * instruction and save data type of r3 to r31. If a sample
+ * is hit at next "lwz r9, 0(r31)", by this instruction tracking,
+ * data type of r31 can be resolved.
+ */
+static struct insn_offset arithmetic_ins_op_31[] = {
+	{ .name = "SUB_CARRY_XO_FORM",  .value = 8, },
+	{ .name = "MUL_HDW_XO_FORM1",   .value = 9, },
+	{ .name = "ADD_CARRY_XO_FORM",  .value = 10, },
+	{ .name = "MUL_HW_XO_FORM1",    .value = 11, },
+	{ .name = "SUB_XO_FORM",        .value = 40, },
+	{ .name = "MUL_HDW_XO_FORM",    .value = 73, },
+	{ .name = "MUL_HW_XO_FORM",     .value = 75, },
+	{ .name = "SUB_EXT_XO_FORM",    .value = 136, },
+	{ .name = "ADD_EXT_XO_FORM",    .value = 138, },
+	{ .name = "SUB_ZERO_EXT_XO_FORM",       .value = 200, },
+	{ .name = "ADD_ZERO_EXT_XO_FORM",       .value = 202, },
+	{ .name = "SUB_EXT_XO_FORM2",   .value = 232, },
+	{ .name = "MUL_DW_XO_FORM",     .value = 233, },
+	{ .name = "ADD_EXT_XO_FORM2",   .value = 234, },
+	{ .name = "MUL_W_XO_FORM",      .value = 235, },
+	{ .name = "ADD_XO_FORM",	.value = 266, },
+	{ .name = "DIV_DW_XO_FORM1",    .value = 457, },
+	{ .name = "DIV_W_XO_FORM1",     .value = 459, },
+	{ .name = "DIV_DW_XO_FORM",	.value = 489, },
+	{ .name = "DIV_W_XO_FORM",	.value = 491, },
+};
+
+
 static int cmp_offset(const void *a, const void *b)
 {
 	const struct insn_offset *val1 = a;
@@ -163,6 +202,16 @@ static struct ins_ops *check_ppc_insn(int raw_insn)
 		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
 		if (ret != NULL)
 			return &load_store_ops;
+		else {
+			mem_insns_31_opcode.value = PPC_22_30(raw_insn);
+			ret = bsearch(&mem_insns_31_opcode, arithmetic_ins_op_31, ARRAY_SIZE(arithmetic_ins_op_31),
+					sizeof(arithmetic_ins_op_31[0]), cmp_offset);
+			if (ret != NULL)
+				return &arithmetic_ops;
+			/* Bits 21 to 30 has value 444 for "mr" insn ie, OR X form */
+			if (PPC_21_30(raw_insn) == 444)
+				return &arithmetic_ops;
+		}
 	}
 
 	return NULL;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 4e605d082a02..2191a3ec6517 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -38,6 +38,7 @@ static struct ins_ops nop_ops;
 static struct ins_ops lock_ops;
 static struct ins_ops ret_ops;
 static struct ins_ops load_store_ops;
+static struct ins_ops arithmetic_ops;
 
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
@@ -677,6 +678,56 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+#define PPC_22_30(R)    (((R) >> 1) & 0x1ff)
+#define MINUS_EXT_XO_FORM	234
+#define SUB_EXT_XO_FORM		232
+#define	ADD_ZERO_EXT_XO_FORM	202
+#define	SUB_ZERO_EXT_XO_FORM	200
+
+static int arithmetic__scnprintf(struct ins *ins, char *bf, size_t size,
+		struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			ops->raw);
+}
+
+/*
+ * Sets the fields: multi_regs and "mem_ref".
+ * "mem_ref" is set for ops->source which is later used to
+ * fill the objdump->memory_ref-char field. This ops is currently
+ * used by powerpc and since binary instruction code is used to
+ * extract opcode, regs and offset, no other parsing is needed here.
+ *
+ * Don't set multi_regs in four cases, since each of them has only one
+ * source operand:
+ * - Add to Minus One Extended XO-form ( Ex: addme, addmeo )
+ * - Subtract From Minus One Extended XO-form ( Ex: subfme )
+ * - Add to Zero Extended XO-form ( Ex: addze, addzeo )
+ * - Subtract From Zero Extended XO-form ( Ex: subfze )
+ */
+static int arithmetic__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
+		struct map_symbol *ms __maybe_unused, struct disasm_line *dl)
+{
+	int opcode = PPC_OP(dl->raw.raw_insn);
+	/* Bits 22:30 (XO), not the primary opcode, select the operation */
+	int xo = PPC_22_30(dl->raw.raw_insn);
+
+	ops->source.mem_ref = false;
+	if (opcode == 31 && xo != MINUS_EXT_XO_FORM && xo != SUB_EXT_XO_FORM &&
+	    xo != ADD_ZERO_EXT_XO_FORM && xo != SUB_ZERO_EXT_XO_FORM)
+		ops->source.multi_regs = true;
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static struct ins_ops arithmetic_ops = {
+	.parse     = arithmetic__parse,
+	.scnprintf = arithmetic__scnprintf,
+};
+
 static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
 		struct ins_operands *ops, int max_ins_name)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 09/16] tools/perf: Add more instructions for instruction tracking
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (7 preceding siblings ...)
  2024-06-14 17:26 ` [V4 08/16] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 10/16] tools/perf: Update instruction tracking for powerpc Athira Rajeev
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Add a few more instructions and use the primary opcode as the search
key to find whether an instruction is supported by the architecture.
The added ones are: addi, addic, addic., addis, subfic and mulli.
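
The lookup described above can be sketched roughly as follows. This is
only an illustration, not the perf code: struct opcode_entry and the
function names are hypothetical stand-ins, and the table is assumed to
be kept sorted by opcode value so that bsearch() works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct insn_offset. */
struct opcode_entry {
	const char *name;
	int value;		/* primary opcode (top 6 bits) */
};

/* bsearch() needs the table sorted by .value in ascending order. */
static const struct opcode_entry two_operand_arith[] = {
	{ "mulli",   7 },
	{ "subfic",  8 },
	{ "addic",  12 },
	{ "addic.", 13 },
	{ "addi",   14 },
	{ "addis",  15 },
};

static int cmp_opcode(const void *a, const void *b)
{
	const struct opcode_entry *x = a;
	const struct opcode_entry *y = b;

	return x->value - y->value;
}

/* Return the mnemonic for a raw instruction, or NULL if not in the table. */
static const char *lookup_two_operand_arith(uint32_t raw_insn)
{
	struct opcode_entry key = { NULL, (int)(raw_insn >> 26) };
	const struct opcode_entry *hit;

	hit = bsearch(&key, two_operand_arith,
		      sizeof(two_operand_arith) / sizeof(two_operand_arith[0]),
		      sizeof(two_operand_arith[0]), cmp_opcode);
	return hit ? hit->name : NULL;
}

/* 0x38630001 encodes "addi r3,r3,1"; 0xe93f0138 encodes "ld r9,312(r31)". */
void lookup_example(void)
{
	assert(strcmp(lookup_two_operand_arith(0x38630001), "addi") == 0);
	assert(lookup_two_operand_arith(0xe93f0138) == NULL);
}
```

Since only the top 6 bits are used as the key, the register and
immediate fields of the instruction do not affect the match.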

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/annotate/instructions.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index bec8ab0ee18d..db72148eb857 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -172,6 +172,14 @@ static struct insn_offset arithmetic_ins_op_31[] = {
 	{ .name = "DIV_W_XO_FORM",	.value = 491, },
 };
 
+static struct insn_offset arithmetic_two_ops[] = {
+	{ .name = "mulli",      .value = 7, },
+	{ .name = "subfic",     .value = 8, },
+	{ .name = "addic",      .value = 12, },
+	{ .name = "addic.",     .value = 13, },
+	{ .name = "addi",       .value = 14, },
+	{ .name = "addis",      .value = 15, },
+};
 
 static int cmp_offset(const void *a, const void *b)
 {
@@ -212,6 +220,12 @@ static struct ins_ops *check_ppc_insn(int raw_insn)
 			if (PPC_21_30(raw_insn) == 444)
 				return &arithmetic_ops;
 		}
+	} else {
+		mem_insns_31_opcode.value = opcode;
+		ret = bsearch(&mem_insns_31_opcode, arithmetic_two_ops, ARRAY_SIZE(arithmetic_two_ops),
+				sizeof(arithmetic_two_ops[0]), cmp_offset);
+		if (ret != NULL)
+			return &arithmetic_ops;
 	}
 
 	return NULL;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 10/16] tools/perf: Update instruction tracking for powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (8 preceding siblings ...)
  2024-06-14 17:26 ` [V4 09/16] tools/perf: Add more instructions for instruction tracking Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 11/16] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:

ld      r10,264(r3)
mr      r31,r3
<<after some sequence>
ld      r9,312(r31)

Consider that the sample points to "ld r9,312(r31)".
Here the memory reference is "312(r31)", where 312 is the offset
and r31 is the source register. The preceding instructions show that
the register state of r3 is moved to r31. So to identify the data type
for the r31 access, the previous instruction ("mr") needs to be tracked
and the state type entry has to be updated. The current instruction
tracking support in the perf tools infrastructure is specific to x86.
This patch adds the support for powerpc as well.
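
As a rough illustration of the tracking idea (not the code in this
patch), the sketch below propagates a per-register type state across
"mr". struct reg_state and all function names are hypothetical. The
sketch also requires RS == RB to recognise "mr", which is an assumption
on top of the extended opcode check (444) used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GPRS 32

/* Hypothetical, simplified stand-in for perf's per-register type state. */
struct reg_state {
	int type_id;	/* placeholder for a DWARF type reference */
	bool ok;	/* true when type_id is known to be valid */
};

/* X-form register fields, using IBM bit numbering (bit 0 is the MSB). */
static unsigned int field_rs(uint32_t insn) { return (insn >> 21) & 0x1f; }
static unsigned int field_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }
static unsigned int field_rb(uint32_t insn) { return (insn >> 11) & 0x1f; }

/* "mr rA,rS" is "or rA,rS,rS": primary opcode 31, extended opcode 444. */
static bool is_mr(uint32_t insn)
{
	return (insn >> 26) == 31 &&
	       ((insn >> 1) & 0x3ff) == 444 &&
	       field_rs(insn) == field_rb(insn);
}

/* Copy the type state of the source register to the destination. */
static void track_mr(struct reg_state regs[NR_GPRS], uint32_t insn)
{
	unsigned int src, dst;

	if (!is_mr(insn))
		return;

	src = field_rs(insn);
	dst = field_ra(insn);

	if (!regs[src].ok) {
		/* Unknown source type: the destination becomes unknown too. */
		regs[dst].ok = false;
		return;
	}
	regs[dst] = regs[src];
}

/* 0x7c7f1b78 encodes "mr r31,r3". */
void track_mr_example(void)
{
	struct reg_state regs[NR_GPRS] = { { 0, false } };

	regs[3].type_id = 42;	/* pretend r3 holds a known type */
	regs[3].ok = true;

	track_mr(regs, 0x7c7f1b78);
	assert(regs[31].ok && regs[31].type_id == 42);
}
```

After the "mr", a later access such as "312(r31)" can be resolved with
the type that was originally known only for r3.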

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 65 +++++++++++++++++++
 tools/perf/util/annotate-data.c               |  9 ++-
 tools/perf/util/disasm.c                      |  1 +
 3 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index db72148eb857..13eaec36a9dc 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -231,6 +231,71 @@ static struct ins_ops *check_ppc_insn(int raw_insn)
 	return NULL;
 }
 
+/*
+ * Instruction tracking function to track register state moves.
+ * Example sequence:
+ *    ld      r10,264(r3)
+ *    mr      r31,r3
+ *    <<after some sequence>
+ *    ld      r9,312(r31)
+ *
+ * Previous instruction sequence shows that register state of r3
+ * is moved to r31. update_insn_state_powerpc tracks these state
+ * changes
+ */
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_powerpc(struct type_state *state,
+		struct data_loc_info *dloc, Dwarf_Die * cu_die __maybe_unused,
+		struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	/*
+	 * Value 444 for bits 21:30 is for "mr"
+	 * instruction. "mr" is extended OR. So set the
+	 * source and destination reg correctly
+	 */
+	if (PPC_21_30(dl->raw.raw_insn) == 444) {
+		int src_reg = src->reg1;
+
+		src->reg1 = dst->reg1;
+		dst->reg1 = src_reg;
+	}
+
+	if (!has_reg_type(state, dst->reg1))
+		return;
+
+	tsr = &state->regs[dst->reg1];
+
+	if (!has_reg_type(state, src->reg1) ||
+			!state->regs[src->reg1].ok) {
+		tsr->ok = false;
+		return;
+	}
+
+	tsr->type = state->regs[src->reg1].type;
+	tsr->kind = state->regs[src->reg1].kind;
+	tsr->ok = true;
+
+	pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			insn_offset, src->reg1, dst->reg1);
+	pr_debug_type_name(&tsr->type, tsr->kind);
+}
+#else /* HAVE_DWARF_SUPPORT */
+static void update_insn_state_powerpc(struct type_state *state __maybe_unused, struct data_loc_info *dloc __maybe_unused,
+		Dwarf_Die * cu_die __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	return;
+}
+#endif /* HAVE_DWARF_SUPPORT */
+
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 7a48c3d72b89..734acdd8c4b7 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1080,6 +1080,13 @@ static int find_data_type_insn(struct data_loc_info *dloc,
 	return ret;
 }
 
+static int arch_supports_insn_tracking(struct data_loc_info *dloc)
+{
+	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")))
+		return 1;
+	return 0;
+}
+
 /*
  * Construct a list of basic blocks for each scope with variables and try to find
  * the data type by updating a type state table through instructions.
@@ -1094,7 +1101,7 @@ static int find_data_type_block(struct data_loc_info *dloc,
 	int ret = -1;
 
 	/* TODO: other architecture support */
-	if (!arch__is(dloc->arch, "x86"))
+	if (!arch_supports_insn_tracking(dloc))
 		return -1;
 
 	prev_dst_ip = dst_ip = dloc->ip;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 2191a3ec6517..86ff98e64890 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -155,6 +155,7 @@ static struct arch architectures[] = {
 	{
 		.name = "powerpc",
 		.init = powerpc__annotate_init,
+		.update_insn_state = update_insn_state_powerpc,
 	},
 	{
 		.name = "riscv64",
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 11/16] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (9 preceding siblings ...)
  2024-06-14 17:26 ` [V4 10/16] tools/perf: Update instruction tracking for powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 12/16] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

symbol__disassemble_capstone in util/disasm.c calls the function
open_capstone_handle to open/init capstone. We already have a
capstone_init function in "util/print_insn.c", but it is defined as a
static function there. Make it non-static and also declare it in
print_insn.h.

open_capstone_handle checks the disassembler_style option from
annotation_options to decide whether to set CS_OPT_SYNTAX_ATT.
Add that logic to capstone_init as well. The existing caller,
fprintf_insn_asm, passes true so that its behaviour does not change.
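
The decision that callers make before passing the flag can be sketched
as below. want_att_syntax is a hypothetical helper name used only for
illustration; the condition itself mirrors the check done in
open_capstone_handle (no style given, or style "att").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true when AT&T syntax should be requested: either no
 * disassembler style was given, or it was explicitly set to "att".
 */
static bool want_att_syntax(const char *disassembler_style)
{
	return disassembler_style == NULL ||
	       strcmp(disassembler_style, "att") == 0;
}

void want_att_syntax_example(void)
{
	assert(want_att_syntax(NULL));		/* default */
	assert(want_att_syntax("att"));
	assert(!want_att_syntax("intel"));
}
```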

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/print_insn.c | 12 +++++++++---
 tools/perf/util/print_insn.h |  5 +++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index a950e9157d2d..a76aae81d7a0 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -32,7 +32,7 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 #include <capstone/capstone.h>
 
-static int capstone_init(struct machine *machine, csh *cs_handle, bool is64)
+int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disassembler_style)
 {
 	cs_arch arch;
 	cs_mode mode;
@@ -62,7 +62,13 @@ static int capstone_init(struct machine *machine, csh *cs_handle, bool is64)
 	}
 
 	if (machine__normalized_is(machine, "x86")) {
-		cs_option(*cs_handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
+		/*
+		 * In case of using capstone_init while symbol__disassemble
+		 * setting CS_OPT_SYNTAX_ATT depends if disassembler_style opts
+		 * is set via annotation args
+		 */
+		if (disassembler_style)
+			cs_option(*cs_handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
 		/*
 		 * Resolving address operands to symbols is implemented
 		 * on x86 by investigating instruction details.
@@ -122,7 +128,7 @@ ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpum
 	int ret;
 
 	/* TODO: Try to initiate capstone only once but need a proper place. */
-	ret = capstone_init(machine, &cs_handle, is64bit);
+	ret = capstone_init(machine, &cs_handle, is64bit, true);
 	if (ret < 0)
 		return ret;
 
diff --git a/tools/perf/util/print_insn.h b/tools/perf/util/print_insn.h
index 07d11af3fc1c..2c8ee41c4a5d 100644
--- a/tools/perf/util/print_insn.h
+++ b/tools/perf/util/print_insn.h
@@ -19,4 +19,9 @@ ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpum
 			 bool is64bit, const uint8_t *code, size_t code_size,
 			 uint64_t ip, int *lenp, int print_opts, FILE *fp);
 
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+#include <capstone/capstone.h>
+int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disassembler_style);
+#endif
+
 #endif /* PERF_PRINT_INSN_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 12/16] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (10 preceding siblings ...)
  2024-06-14 17:26 ` [V4 11/16] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 13/16] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

capstone_init is made available for all archs to use and is updated to
enable support for CS_ARCH_PPC as well. This patch removes
open_capstone_handle and uses capstone_init in all the places.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c     | 42 +++++++++++-------------------------
 tools/perf/util/print_insn.c |  3 +++
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 86ff98e64890..43743ca4bdc9 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -26,6 +26,7 @@
 #include "symbol.h"
 #include "util.h"
 #include "sort.h"
+#include "print_insn.h"
 
 static regex_t	 file_lineno;
 
@@ -1517,32 +1518,6 @@ symbol__disassemble_bpf_image(struct symbol *sym,
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 #include <capstone/capstone.h>
 
-static int open_capstone_handle(struct annotate_args *args, bool is_64bit,
-				csh *handle)
-{
-	struct annotation_options *opt = args->options;
-	cs_mode mode = is_64bit ? CS_MODE_64 : CS_MODE_32;
-
-	/* TODO: support more architectures */
-	if (!arch__is(args->arch, "x86"))
-		return -1;
-
-	if (cs_open(CS_ARCH_X86, mode, handle) != CS_ERR_OK)
-		return -1;
-
-	if (!opt->disassembler_style ||
-	    !strcmp(opt->disassembler_style, "att"))
-		cs_option(*handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
-
-	/*
-	 * Resolving address operands to symbols is implemented
-	 * on x86 by investigating instruction details.
-	 */
-	cs_option(*handle, CS_OPT_DETAIL, CS_OPT_ON);
-
-	return 0;
-}
-
 struct find_file_offset_data {
 	u64 ip;
 	u64 offset;
@@ -1639,6 +1614,7 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 	cs_insn *insn;
 	char disasm_buf[512];
 	struct disasm_line *dl;
+	bool disassembler_style = false;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -1653,7 +1629,11 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 			    &is_64bit) == 0)
 		goto err;
 
-	if (open_capstone_handle(args, is_64bit, &handle) < 0)
+	if (!args->options->disassembler_style ||
+			!strcmp(args->options->disassembler_style, "att"))
+		disassembler_style = true;
+
+	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
@@ -1973,9 +1953,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 	}
 
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
-	err = symbol__disassemble_capstone(symfs_filename, sym, args);
-	if (err == 0)
-		goto out_remove_tmp;
+	if (arch__is(args->arch, "x86")) {
+		err = symbol__disassemble_capstone(symfs_filename, sym, args);
+		if (err == 0)
+			goto out_remove_tmp;
+	}
 #endif
 
 	err = asprintf(&command,
diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index a76aae81d7a0..79dec5ab3bef 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -52,6 +52,9 @@ int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disas
 	} else if (machine__normalized_is(machine, "s390")) {
 		arch = CS_ARCH_SYSZ;
 		mode = CS_MODE_BIG_ENDIAN;
+	} else if (machine__normalized_is(machine, "powerpc")) {
+		arch = CS_ARCH_PPC;
+		mode = CS_MODE_64;
 	} else {
 		return -1;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 13/16] tools/perf: Add support to use libcapstone in powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (11 preceding siblings ...)
  2024-06-14 17:26 ` [V4 12/16] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  6:08   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

perf now uses the capstone library (if available) to disassemble
instructions for perf annotate, which speeds it up. Currently this only
supports the x86 architecture. This patch includes changes to enable it
on powerpc. For now, this method is used only for the data type sort
keys, and only the binary code (raw instruction) is read. This is
because the powerpc approach to understanding instructions and register
fields uses the raw instruction. "cs_disasm" is currently not enabled:
while attempting to use cs_disasm, some of the instructions were not
identified (ex: extswsli, maddld) and it had to fall back to objdump.
Hence enabling "cs_disasm" is left as a TODO for powerpc in a code
comment.
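
A minimal sketch of the "raw instruction only" walk, assuming a buffer
that holds fixed-width 32-bit instructions in host byte order.
format_raw_insn is a hypothetical helper, not a function from this
patch; it uses memcpy() instead of a pointer cast to stay clear of
alignment issues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the instruction word at index idx of a code buffer as hex.
 * Returns the byte offset of that word, or -1 if idx is out of range.
 * Each instruction is 4 bytes, so the buffer holds len / 4 entries.
 */
static long format_raw_insn(const uint8_t *buf, size_t len, size_t idx,
			    char *out, size_t out_size)
{
	uint32_t word;

	if (idx >= len / 4)
		return -1;

	memcpy(&word, buf + idx * 4, sizeof(word));
	snprintf(out, out_size, "%x", (unsigned int)word);
	return (long)(idx * 4);
}

void format_raw_insn_example(void)
{
	/* "mr r31,r3" followed by "ld r9,312(r31)" */
	uint32_t code[2] = { 0x7c7f1b78, 0xe93f0138 };
	uint8_t buf[sizeof(code)];
	char text[16];

	memcpy(buf, code, sizeof(code));

	assert(format_raw_insn(buf, sizeof(buf), 1, text, sizeof(text)) == 4);
	assert(strcmp(text, "e93f0138") == 0);
}
```

The hex text is all that is stored per line here; mnemonics and
operands are decoded later from the raw word.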

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 43743ca4bdc9..987bff9f71c3 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1592,6 +1592,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
 	}
 }
 
+static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
+					struct annotate_args *args)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	struct map *map = args->ms.map;
+	struct dso *dso = map__dso(map);
+	struct nscookie nsc;
+	u64 start = map__rip_2objdump(map, sym->start);
+	u64 end = map__rip_2objdump(map, sym->end);
+	u64 len = end - start;
+	u64 offset;
+	int i, fd, count;
+	bool is_64bit = false;
+	bool needs_cs_close = false;
+	u8 *buf = NULL;
+	struct find_file_offset_data data = {
+		.ip = start,
+	};
+	csh handle;
+	char disasm_buf[512];
+	struct disasm_line *dl;
+	u32 *line;
+	bool disassembler_style = false;
+
+	if (args->options->objdump_path)
+		return -1;
+
+	nsinfo__mountns_enter(dso->nsinfo, &nsc);
+	fd = open(filename, O_RDONLY);
+	nsinfo__mountns_exit(&nsc);
+	if (fd < 0)
+		return -1;
+
+	if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
+			    &is_64bit) == 0)
+		goto err;
+
+	if (!args->options->disassembler_style ||
+			!strcmp(args->options->disassembler_style, "att"))
+		disassembler_style = true;
+
+	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
+		goto err;
+
+	needs_cs_close = true;
+
+	buf = malloc(len);
+	if (buf == NULL)
+		goto err;
+
+	count = pread(fd, buf, len, data.offset);
+	close(fd);
+	fd = -1;
+
+	if ((u64)count != len)
+		goto err;
+
+	line = (u32 *)buf;
+
+	/* add the function address and name */
+	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
+		  start, sym->name);
+
+	args->offset = -1;
+	args->line = disasm_buf;
+	args->line_nr = 0;
+	args->fileloc = NULL;
+	args->ms.sym = sym;
+
+	dl = disasm_line__new(args);
+	if (dl == NULL)
+		goto err;
+
+	annotation_line__add(&dl->al, &notes->src->source);
+
+	/*
+	 * TODO: enable disassm for powerpc
+	 * count = cs_disasm(handle, buf, len, start, len, &insn);
+	 *
+	 * For now, only binary code is saved in disassembled line
+	 * to be used in "type" and "typeoff" sort keys. Each raw code
+	 * is 32 bit instruction. So use "len/4" to get the number of
+	 * entries.
+	 */
+	count = len/4;
+
+	for (i = 0, offset = 0; i < count; i++) {
+		args->offset = offset;
+		sprintf(args->line, "%x", line[i]);
+
+		dl = disasm_line__new(args);
+		if (dl == NULL)
+			goto err;
+
+		annotation_line__add(&dl->al, &notes->src->source);
+
+		offset += 4;
+	}
+
+	/* It failed in the middle */
+	if (offset != len) {
+		struct list_head *list = &notes->src->source;
+
+		/* Discard all lines and fallback to objdump */
+		while (!list_empty(list)) {
+			dl = list_first_entry(list, struct disasm_line, al.node);
+
+			list_del_init(&dl->al.node);
+			disasm_line__free(dl);
+		}
+		count = -1;
+	}
+
+out:
+	if (needs_cs_close)
+		cs_close(&handle);
+	free(buf);
+	return count < 0 ? count : 0;
+
+err:
+	if (fd >= 0)
+		close(fd);
+	if (needs_cs_close) {
+		struct disasm_line *tmp;
+
+		/*
+		 * It probably failed in the middle of the above loop.
+		 * Release any resources it might add.
+		 */
+		list_for_each_entry_safe(dl, tmp, &notes->src->source, al.node) {
+			list_del(&dl->al.node);
+			free(dl);
+		}
+	}
+	count = -1;
+	goto out;
+}
+
 static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 					struct annotate_args *args)
 {
@@ -1949,6 +2087,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 			err = symbol__disassemble_dso(symfs_filename, sym, args);
 			if (err == 0)
 				goto out_remove_tmp;
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+			err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
+			if (err == 0)
+				goto out_remove_tmp;
+#endif
 		}
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (12 preceding siblings ...)
  2024-06-14 17:26 ` [V4 13/16] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-25  6:17   ` Namhyung Kim
  2024-06-14 17:26 ` [V4 15/16] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

There are cases where a global register variable is defined and
associated with a specified register. For example, in powerpc, two
registers are defined to represent variables:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");

2. r1: represents stack_pointer
register void *__stack_pointer asm("r1");

These regs are present in the dwarf debug info as DW_OP_reg, as part of
the variables in the cu_die (compile unit). They are not found by the
die search done in the list of nested scopes, since they are global
register variables.

Example for local_paca represented by r13:

<<>>
 <1><18dc6b4>: Abbrev Number: 128 (DW_TAG_variable)
    <18dc6b6>   DW_AT_name        : (indirect string, offset: 0x3861): local_paca
    <18dc6ba>   DW_AT_decl_file   : 48
    <18dc6bb>   DW_AT_decl_line   : 36
    <18dc6bc>   DW_AT_decl_column : 30
    <18dc6bd>   DW_AT_type        : <0x18dc6c3>
    <18dc6c1>   DW_AT_external    : 1
    <18dc6c1>   DW_AT_location    : 1 byte block: 5d    (DW_OP_reg13 (r13))

 <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <18dc6c4>   DW_AT_byte_size   : 8
    <18dc6c4>   DW_AT_type        : <0x18dc353>

Where  DW_AT_type : <0x18dc6c3> further points to :

 <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <18dc6c4>   DW_AT_byte_size   : 8
    <18dc6c4>   DW_AT_type        : <0x18dc353>

which belongs to:

 <1><18dc353>: Abbrev Number: 67 (DW_TAG_structure_type)
    <18dc354>   DW_AT_name        : (indirect string, offset: 0x56cd): paca_struct
    <18dc358>   DW_AT_byte_size   : 2944
    <18dc35a>   DW_AT_alignment   : 128
    <18dc35b>   DW_AT_decl_file   : 48
    <18dc35c>   DW_AT_decl_line   : 61
    <18dc35d>   DW_AT_decl_column : 8
    <18dc35d>   DW_AT_sibling     : <0x18dc6b4>
<<>>

The case is similar for "r1".

<<>>
 <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
    <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
    <18dd778>   DW_AT_decl_file   : 51
    <18dd779>   DW_AT_decl_line   : 1468
    <18dd77b>   DW_AT_decl_column : 24
    <18dd77c>   DW_AT_type        : <0x18da5cd>
    <18dd780>   DW_AT_external    : 1
    <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))

 where 18da5cd is:

 <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
    <18da5ce>   DW_AT_byte_size   : 8
    <18da5cf>   DW_AT_encoding    : 7   (unsigned)
    <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
<<>>

To identify the data type for these two special cases, iterate over the
variables in the CU die (Compile Unit) and match them with the register.
If the variable is a base type, i.e. die_get_real_type returns NULL,
set the offset to zero. With these changes, the data types "paca_struct"
for r13 and "long unsigned int" for r1 are identified.

Snippet from ./perf report -s type,type_off

    12.85%  long unsigned int  long unsigned int +0 (no field)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
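
The matching step can be pictured with the simplified sketch below. It
does not use libdw; struct var_entry is a hypothetical stand-in for the
list of CU-level variables, keeping only the register number and the
variable name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entry for a variable found in the CU die. */
struct var_entry {
	int reg;			/* register from DW_OP_regN */
	const char *name;		/* DW_AT_name of the variable */
	const struct var_entry *next;
};

/* Walk the list and return the first variable bound to the register. */
static const struct var_entry *
find_global_reg_var(const struct var_entry *vars, int reg)
{
	for (; vars != NULL; vars = vars->next) {
		if (vars->reg == reg)
			return vars;
	}
	return NULL;
}

void find_global_reg_var_example(void)
{
	const struct var_entry sp = { 1, "current_stack_pointer", NULL };
	const struct var_entry paca = { 13, "local_paca", &sp };

	assert(strcmp(find_global_reg_var(&paca, 13)->name, "local_paca") == 0);
	assert(find_global_reg_var(&paca, 2) == NULL);
}
```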

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c      | 40 ++++++++++++++++++++++++++++
 tools/perf/util/annotate.c           |  8 ++++++
 tools/perf/util/annotate.h           |  1 +
 tools/perf/util/include/dwarf-regs.h |  1 +
 4 files changed, 50 insertions(+)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 734acdd8c4b7..82232f2d8e16 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1170,6 +1170,40 @@ static int find_data_type_block(struct data_loc_info *dloc,
 	return ret;
 }
 
+/*
+ * Handle cases where a global register variable is defined and
+ * associated with a specified register. These regs are
+ * present in dwarf debug as DW_OP_reg as part of variables
+ * in the cu_die (compile unit). Iterate over variables in the
+ * cu_die and match with reg to identify data type die.
+ */
+static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
+		Dwarf_Die *type_die)
+{
+	Dwarf_Die vr_die;
+	int ret = -1;
+	struct die_var_type *var_types = NULL;
+
+	die_collect_vars(cu_die, &var_types);
+	while (var_types) {
+		if (var_types->reg == reg) {
+			if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
+				if (die_get_real_type(&vr_die, type_die) == NULL) {
+					dloc->type_offset = 0;
+					dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
+				}
+				pr_debug_type_name(type_die, TSR_KIND_TYPE);
+				ret = 0;
+				pr_debug_dtp("found by CU for %s (die:%#lx)\n",
+						dwarf_diename(type_die), (long)dwarf_dieoffset(type_die));
+			}
+			break;
+		}
+		var_types = var_types->next;
+	}
+	return ret;
+}
+
 /* The result will be saved in @type_die */
 static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 {
@@ -1217,6 +1251,12 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 	pr_debug_dtp("CU for %s (die:%#lx)\n",
 		     dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
 
+	if (loc->reg_type == DWARF_REG_GLOBAL) {
+		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
+		if (!ret)
+			goto out;
+	}
+
 	if (reg == DWARF_REG_PC) {
 		if (get_global_var_type(&cu_die, dloc, dloc->ip, dloc->var_addr,
 					&offset, type_die)) {
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index bfa6420dc4b9..c7e4fd16e8b4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2431,6 +2431,14 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
 			op_loc->reg1 = DWARF_REG_PC;
 		}
 
+		/* Global reg variable 13 and 1
+		 * assign to DWARF_REG_GLOBAL
+		 */
+		if (arch__is(arch, "powerpc")) {
+			if ((op_loc->reg1 == 13) || (op_loc->reg1 == 1))
+				op_loc->reg_type = DWARF_REG_GLOBAL;
+		}
+
 		mem_type = find_data_type(&dloc);
 
 		if (mem_type == NULL && is_stack_canary(arch, op_loc)) {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 9ba772f46270..ad69842a8ebc 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -475,6 +475,7 @@ struct annotated_op_loc {
 	bool mem_ref;
 	bool multi_regs;
 	bool imm;
+	int reg_type;
 };
 
 enum annotated_insn_ops {
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 7ea39362ecaf..a873c906a86b 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -5,6 +5,7 @@
 
 #define DWARF_REG_PC  0xd3af9c /* random number */
 #define DWARF_REG_FB  0xd3affb /* random number */
+#define DWARF_REG_GLOBAL 0xd3affc /* random number */
 
 #ifdef HAVE_DWARF_SUPPORT
 const char *get_arch_regstr(unsigned int n);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [V4 15/16] tools/perf: Add support for global_die to capture name of variable in case of register defined variable
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (13 preceding siblings ...)
  2024-06-14 17:26 ` [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-14 17:26 ` [V4 16/16] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
  2024-06-20 15:31 ` [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

In the case of a register-defined variable (found using
find_data_type_global_reg), if the type of the variable happens to be a
base type (for example, long unsigned int), perf report shows it as:

    12.85%  long unsigned int  long unsigned int +0 (no field)

The above data type actually refers to samples captured while
accessing "r1", which represents the current stack pointer in powerpc:
register void *__stack_pointer asm("r1");

The dwarf debug contains this as:

<<>>
 <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
    <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
    <18dd778>   DW_AT_decl_file   : 51
    <18dd779>   DW_AT_decl_line   : 1468
    <18dd77b>   DW_AT_decl_column : 24
    <18dd77c>   DW_AT_type        : <0x18da5cd>
    <18dd780>   DW_AT_external    : 1
    <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))

 where 18da5cd is:

 <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
    <18da5ce>   DW_AT_byte_size   : 8
    <18da5cf>   DW_AT_encoding    : 7   (unsigned)
    <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
<<>>

To make this clearer to the user, capture the DW_AT_name of the
variable and save it as part of Dwarf_Global. Dwarf_Global is used so
that the name can be retrieved while presenting the result.

Update the "dso__findnew_data_type" function to set "var_name" if the
variable name is set as part of Dwarf_Global. Update
"hist_entry__typeoff_snprintf" to print var_name if it is set.
With these changes, along with "long unsigned int", the report also
shows the variable name, current_stack_pointer.

Snippet of result:

    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c | 30 ++++++++++++++++++++++++------
 tools/perf/util/dwarf-aux.c     |  1 +
 tools/perf/util/dwarf-aux.h     |  1 +
 tools/perf/util/sort.c          |  7 +++++--
 4 files changed, 31 insertions(+), 8 deletions(-)
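
The display rule described in the commit message (print the variable
name when one was recorded, otherwise "no field") can be sketched in
isolation as below. This is a simplified stand-in, not the perf code:
"struct type_label" and "format_leaf_field" are hypothetical names, and
the sketch uses a bounded snprintf rather than the strcpy in the diff.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's annotated data type. */
struct type_label {
	const char *var_name;	/* NULL when no variable name was recorded */
};

/*
 * Format the field label for a type that has no members: prefer the
 * recorded variable name, and fall back to "no field" otherwise.
 * Returns the snprintf result.
 */
static int format_leaf_field(char *buf, size_t size, const struct type_label *t)
{
	return snprintf(buf, size, "%s", t->var_name ? t->var_name : "no field");
}
```

With var_name set to "current_stack_pointer", the buffer holds that name;
with var_name left NULL, it holds "no field".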

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 82232f2d8e16..2bce522304f4 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -268,23 +268,32 @@ static void delete_members(struct annotated_member *member)
 }
 
 static struct annotated_data_type *dso__findnew_data_type(struct dso *dso,
-							  Dwarf_Die *type_die)
+							  Dwarf_Die *type_die, Dwarf_Global *global_die)
 {
 	struct annotated_data_type *result = NULL;
 	struct annotated_data_type key;
 	struct rb_node *node;
 	struct strbuf sb;
+	struct strbuf sb_var_name;
 	char *type_name;
+	char *var_name;
 	Dwarf_Word size;
 
 	strbuf_init(&sb, 32);
+	strbuf_init(&sb_var_name, 32);
 	if (die_get_typename_from_type(type_die, &sb) < 0)
 		strbuf_add(&sb, "(unknown type)", 14);
+	if (global_die->name) {
+		strbuf_addstr(&sb_var_name, global_die->name);
+		var_name = strbuf_detach(&sb_var_name, NULL);
+	}
 	type_name = strbuf_detach(&sb, NULL);
 	dwarf_aggregate_size(type_die, &size);
 
 	/* Check existing nodes in dso->data_types tree */
 	key.self.type_name = type_name;
+	if (global_die->name)
+		key.self.var_name = var_name;
 	key.self.size = size;
 	node = rb_find(&key, dso__data_types(dso), data_type_cmp);
 	if (node) {
@@ -301,6 +310,8 @@ static struct annotated_data_type *dso__findnew_data_type(struct dso *dso,
 	}
 
 	result->self.type_name = type_name;
+	if (global_die->name)
+		result->self.var_name = var_name;
 	result->self.size = size;
 	INIT_LIST_HEAD(&result->self.children);
 
@@ -1178,7 +1189,7 @@ static int find_data_type_block(struct data_loc_info *dloc,
  * cu_die and match with reg to identify data type die.
  */
 static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
-		Dwarf_Die *type_die)
+		Dwarf_Die *type_die, Dwarf_Global *global_die)
 {
 	Dwarf_Die vr_die;
 	int ret = -1;
@@ -1190,8 +1201,11 @@ static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_
 			if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
 				if (die_get_real_type(&vr_die, type_die) == NULL) {
 					dloc->type_offset = 0;
+					global_die->name = var_types->name;
 					dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
 				}
+				global_die->die_offset = (long)dwarf_dieoffset(type_die);
+				global_die->cu_offset = (long)dwarf_dieoffset(cu_die);
 				pr_debug_type_name(type_die, TSR_KIND_TYPE);
 				ret = 0;
 				pr_debug_dtp("found by CU for %s (die:%#lx)\n",
@@ -1205,7 +1219,8 @@ static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_
 }
 
 /* The result will be saved in @type_die */
-static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
+static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die,
+		Dwarf_Global *global_die)
 {
 	struct annotated_op_loc *loc = dloc->op;
 	Dwarf_Die cu_die, var_die;
@@ -1219,6 +1234,8 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 	u64 pc;
 	char buf[64];
 
+	memset(global_die, 0, sizeof(Dwarf_Global));
+
 	if (dloc->op->multi_regs)
 		snprintf(buf, sizeof(buf), "reg%d, reg%d", dloc->op->reg1, dloc->op->reg2);
 	else if (dloc->op->reg1 == DWARF_REG_PC)
@@ -1252,7 +1269,7 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 		     dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
 
 	if (loc->reg_type == DWARF_REG_GLOBAL) {
-		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
+		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die, global_die);
 		if (!ret)
 			goto out;
 	}
@@ -1388,6 +1405,7 @@ struct annotated_data_type *find_data_type(struct data_loc_info *dloc)
 	struct annotated_data_type *result = NULL;
 	struct dso *dso = map__dso(dloc->ms->map);
 	Dwarf_Die type_die;
+	Dwarf_Global global_die;
 
 	dloc->di = debuginfo__new(dso__long_name(dso));
 	if (dloc->di == NULL) {
@@ -1403,10 +1421,10 @@ struct annotated_data_type *find_data_type(struct data_loc_info *dloc)
 
 	dloc->fbreg = -1;
 
-	if (find_data_type_die(dloc, &type_die) < 0)
+	if (find_data_type_die(dloc, &type_die, &global_die) < 0)
 		goto out;
 
-	result = dso__findnew_data_type(dso, &type_die);
+	result = dso__findnew_data_type(dso, &type_die, &global_die);
 
 out:
 	debuginfo__delete(dloc->di);
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 44ef968a7ad3..9e61ff326651 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -1610,6 +1610,7 @@ static int __die_collect_vars_cb(Dwarf_Die *die_mem, void *arg)
 	vt->reg = reg_from_dwarf_op(ops);
 	vt->offset = offset_from_dwarf_op(ops);
 	vt->next = *var_types;
+	vt->name = dwarf_diename(die_mem);
 	*var_types = vt;
 
 	return DIE_FIND_CB_SIBLING;
diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h
index 24446412b869..406a5b1e269b 100644
--- a/tools/perf/util/dwarf-aux.h
+++ b/tools/perf/util/dwarf-aux.h
@@ -146,6 +146,7 @@ struct die_var_type {
 	u64 addr;
 	int reg;
 	int offset;
+	const char *name;
 };
 
 /* Return type info of a member at offset */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cd39ea972193..535ca19a23fd 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2305,9 +2305,12 @@ static int hist_entry__typeoff_snprintf(struct hist_entry *he, char *bf,
 	char buf[4096];
 
 	buf[0] = '\0';
-	if (list_empty(&he_type->self.children))
+	if (list_empty(&he_type->self.children)) {
 		snprintf(buf, sizeof(buf), "no field");
-	else
+		if (he_type->self.var_name)
+			strcpy(buf, he_type->self.var_name);
+
+	} else
 		fill_member_name(buf, sizeof(buf), &he_type->self,
 				 he->mem_type_off, true);
 	buf[4095] = '\0';
-- 
2.43.0



* [V4 16/16] tools/perf: Set instruction name to be used with insn-stat when using raw instruction
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (14 preceding siblings ...)
  2024-06-14 17:26 ` [V4 15/16] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
@ 2024-06-14 17:26 ` Athira Rajeev
  2024-06-20 15:31 ` [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
  16 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-14 17:26 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, disgoel

Since "ins.name" is not set when using the raw instruction,
perf annotate with insn-stat gives wrong data:

Result from "./perf annotate --data-type --insn-stat":

Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)

  Name      :  Good   Bad
-----------------------------------------------------------
            :   419   196

This patch sets "dl->ins.name" in the arch specific function
"check_ppc_insn" while initialising "struct disasm_line". It also updates
the "ins__find" function to take "struct disasm_line" as a parameter so
that its name field can be set in the arch specific call.

With the patch changes:

Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)

  Name/opcode:  Good   Bad
-----------------------------------------------------------
  58                  :   323    80
  32                  :    49    43
  34                  :    33    11
  OP_31_XOP_LDX       :     8    20
  40                  :    23     0
  OP_31_XOP_LWARX     :     5     1
  OP_31_XOP_LWZX      :     2     3
  OP_31_XOP_LDARX     :     3     0
  33                  :     0     2
  OP_31_XOP_LBZX      :     0     1
  OP_31_XOP_LWAX      :     0     1
  OP_31_XOP_LHZX      :     0     1

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c  | 18 +++++++++++++++---
 tools/perf/builtin-annotate.c                  |  4 ++--
 tools/perf/util/annotate.c                     |  2 +-
 tools/perf/util/disasm.c                       | 10 +++++-----
 tools/perf/util/disasm.h                       |  2 +-
 5 files changed, 24 insertions(+), 12 deletions(-)
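
A minimal sketch of how a numeric name such as "58" in the table above
can be derived from a raw instruction word. It assumes the primary
opcode is the six most significant bits of the 32-bit word (which is what
PPC_OP is assumed to extract), and that the bytes "38 01 81 e8" from the
cover letter's "ld r4,312(r1)" example form the little-endian word
0xe8810138. The helper names are hypothetical and are not part of the
patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Primary opcode: the six most significant bits of a 32-bit instruction word. */
static unsigned int ppc_primary_opcode(uint32_t raw_insn)
{
	return (raw_insn >> 26) & 0x3f;
}

/*
 * Hypothetical helper: write the decimal primary opcode into buf when it
 * falls in 32..63 (bit 0x20 set), the range the patch treats as memory
 * instructions. Returns 1 when a name was written, 0 otherwise.
 */
static int ppc_mem_insn_name(uint32_t raw_insn, char *buf, size_t size)
{
	unsigned int opcode = ppc_primary_opcode(raw_insn);

	if (!(opcode & 0x20))
		return 0;
	snprintf(buf, size, "%u", opcode);
	return 1;
}
```

For 0xe8810138 the helper writes "58", the name shown for the most
frequent entry in the stats. A word whose primary opcode is 31 is left
unnamed here, since the patch names those from the opcode 31 table
instead.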

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index 13eaec36a9dc..0667229cf656 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -189,8 +189,9 @@ static int cmp_offset(const void *a, const void *b)
 	return (val1->value - val2->value);
 }
 
-static struct ins_ops *check_ppc_insn(int raw_insn)
+static struct ins_ops *check_ppc_insn(struct disasm_line *dl)
 {
+	int raw_insn = dl->raw.raw_insn;
 	int opcode = PPC_OP(raw_insn);
 	int mem_insn_31 = PPC_21_30(raw_insn);
 	struct insn_offset *ret;
@@ -198,19 +199,30 @@ static struct ins_ops *check_ppc_insn(int raw_insn)
 		"OP_31_INSN",
 		mem_insn_31
 	};
+	char name_insn[32];
 
 	/*
 	 * Instructions with opcode 32 to 63 are memory
 	 * instructions in powerpc
 	 */
 	if ((opcode & 0x20)) {
+		/*
+		 * Set name in case of raw instruction to
+		 * opcode to be used in insn-stat
+		 */
+		if (!strlen(dl->ins.name)) {
+			sprintf(name_insn, "%d", opcode);
+			dl->ins.name = strdup(name_insn);
+		}
 		return &load_store_ops;
 	} else if (opcode == 31) {
 		/* Check for memory instructions with opcode 31 */
 		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
-		if (ret != NULL)
+		if (ret) {
+			if (!strlen(dl->ins.name))
+				dl->ins.name = strdup(ret->name);
 			return &load_store_ops;
-		else {
+		} else {
 			mem_insns_31_opcode.value = PPC_22_30(raw_insn);
 			ret = bsearch(&mem_insns_31_opcode, arithmetic_ins_op_31, ARRAY_SIZE(arithmetic_ins_op_31),
 					sizeof(arithmetic_ins_op_31[0]), cmp_offset);
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 50d2fb222d48..926467b9a023 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -396,10 +396,10 @@ static void print_annotate_item_stat(struct list_head *head, const char *title)
 	printf("total %d, ok %d (%.1f%%), bad %d (%.1f%%)\n\n", total,
 	       total_good, 100.0 * total_good / (total ?: 1),
 	       total_bad, 100.0 * total_bad / (total ?: 1));
-	printf("  %-10s: %5s %5s\n", "Name", "Good", "Bad");
+	printf("  %-10s: %5s %5s\n", "Name/opcode", "Good", "Bad");
 	printf("-----------------------------------------------------------\n");
 	list_for_each_entry(istat, head, list)
-		printf("  %-10s: %5d %5d\n", istat->name, istat->good, istat->bad);
+		printf("  %-20s: %5d %5d\n", istat->name, istat->good, istat->bad);
 	printf("\n");
 }
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c7e4fd16e8b4..cebaffd24fd7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2235,7 +2235,7 @@ static struct annotated_item_stat *annotate_data_stat(struct list_head *head,
 		return NULL;
 
 	istat->name = strdup(name);
-	if (istat->name == NULL) {
+	if ((istat->name == NULL) || (!strlen(istat->name))) {
 		free(istat);
 		return NULL;
 	}
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 987bff9f71c3..0373cabf2625 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -855,7 +855,7 @@ static void ins__sort(struct arch *arch)
 	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_insn)
+static struct ins_ops *__ins__find(struct arch *arch, const char *name, struct disasm_line *dl)
 {
 	struct ins *ins;
 	const int nmemb = arch->nr_instructions;
@@ -867,7 +867,7 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_
 		 */
 		struct ins_ops *ops;
 
-		ops = check_ppc_insn(raw_insn);
+		ops = check_ppc_insn(dl);
 		if (ops)
 			return ops;
 	}
@@ -901,9 +901,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_
 	return ins ? ins->ops : NULL;
 }
 
-struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn)
+struct ins_ops *ins__find(struct arch *arch, const char *name, struct disasm_line *dl)
 {
-	struct ins_ops *ops = __ins__find(arch, name, raw_insn);
+	struct ins_ops *ops = __ins__find(arch, name, dl);
 
 	if (!ops && arch->associate_instruction_ops)
 		ops = arch->associate_instruction_ops(arch, name);
@@ -913,7 +913,7 @@ struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
 {
-	dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
+	dl->ins.ops = ins__find(arch, dl->ins.name, dl);
 
 	if (!dl->ins.ops)
 		return;
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 6b6ec23e4f6f..e3b32a796e80 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -99,7 +99,7 @@ struct annotate_args {
 struct arch *arch__find(const char *name);
 bool arch__is(struct arch *arch, const char *name);
 
-struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn);
+struct ins_ops *ins__find(struct arch *arch, const char *name, struct disasm_line *dl);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 		   struct ins_operands *ops, int max_ins_name);
 
-- 
2.43.0



* Re: [V4 00/16] Add data type profiling support for powerpc
  2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
                   ` (15 preceding siblings ...)
  2024-06-14 17:26 ` [V4 16/16] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
@ 2024-06-20 15:31 ` Athira Rajeev
  2024-06-22  0:06   ` Namhyung Kim
  16 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-20 15:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Namhyung Kim, segher, christophe.leroy
  Cc: LKML, linux-perf-users, linuxppc-dev, akanksha, maddy, kjain,
	disgoel



> On 14 Jun 2024, at 10:56 PM, Athira Rajeev <atrajeev@linux.vnet.ibm.com> wrote:
> 
> The patchset from Namhyung added support for data type profiling
> in the perf tool. This made it possible to associate PMU samples with
> the data types they refer to, using DWARF debug information. With
> upstream perf, it is currently possible to run perf report or perf
> annotate to view the data type information on x86.
> 
> The initial patchset posted here had the changes needed to enable data
> type profiling support for powerpc.
> 
> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
> 
> The main changes were:
> 1. A powerpc instruction mnemonic table to associate load/store
> instructions with move_ops, which is used to identify whether an
> instruction is a memory access.
> 2. To get the register number and access offset from the given
> instruction, the code uses fields from "struct arch" -> objdump.
> Added an entry for powerpc here.
> 3. A get_arch_regnum to return the register number from the
> register name string.
> 
> But the approach used in the initial patchset relied on parsing
> disassembled code, as the current perf tool implementation does.
> 
> Example: lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract the instruction name,
> register names and offset. Also, to find whether there is a memory
> reference in the operands, the "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, instructions using "(" are not the only memory
> instructions. For example, the above instruction can also be of extended
> form (X form) "lwzx r10,0,r19". In order to easily identify the instruction
> category and extract the source/target registers, the second patchset added
> support to use the raw instruction. With the raw instruction, macros are
> added to extract the opcode and register fields.
> Link to second patchset:
> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
> 
> Example representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
>  _____________________________________
>  | 58 |  RT  |  RA |      DS       | |
>  -------------------------------------
> 0    6     11    16              30 31
> 
> The second patchset used "objdump" again to read the raw instruction.
> But since there is no need to disassemble and the binary code can be read
> directly from the DSO, the third patchset (i.e. this patchset) uses the
> approach below. The approach preferred in powerpc to parse a sample for
> data type profiling in the V3 patchset is:
> - Read directly from the DSO using dso__data_read_offset
> - If that fails for any case, fall back to using libcapstone
> - If libcapstone is not supported, the approach will use objdump
> 
> The patchset adds support to pick the opcode and reg fields from this
> raw/binary instruction code. This approach came from review comments
> by Segher Boessenkool and Christophe for the initial patchset.
> 
> Apart from that, instruction tracking is enabled for powerpc and
> a support function is added to find variables defined as registers.
> For example, in powerpc, the two registers below are
> defined to represent variables:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
> 
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");
> 
> These are handled in this patchset.
> 
> - Patch 1 moves the register state type structures to a header file
> so that they can be referred to from other arch specific files
> - Patch 2 makes instruction tracking a callback in "struct arch"
> so that it can be implemented by other archs easily and defined in arch
> specific files
> - Patch 3 adds support to capture and parse the raw instruction in powerpc
> using the dso__data_read_offset utility
> - Patch 4 adds logic to use objdump when doing default "perf
> report" or "perf annotate", since that needs the disassembled instruction
> - Patch 5 adds disasm_line__parse to parse the raw instruction for powerpc
> - Patch 6 updates the parameters of the reg extract functions to use the
> raw instruction on powerpc
> - Patch 7 adds support to identify memory instructions of opcode 31 in
> powerpc
> - Patch 8 adds more instructions to support instruction tracking in powerpc
> - Patches 9 and 10 handle instruction tracking for powerpc
> - Patches 11, 12 and 13 add support to use libcapstone in powerpc
> - Patches 14 and 15 handle support to find global register variables
> - Patch 16 handles the insn-stat option for perf annotate
> 
> Note:
> - There are remaining unknowns (25%) as seen in the annotate instruction
> stats below.
> - This patchset is not tested on powerpc32. As a next step of enhancements,
> along with handling the remaining unknowns, the plan is to cover powerpc32
> changes based on how testing goes.
> 
> With the current patchset:
> 
> ./perf record -a -e mem-loads sleep 1
> ./perf report -s type,typeoff --hierarchy --group --stdio
> ./perf annotate --data-type --insn-stat
> 
> perf annotate logs:
> ==================
> 
> Annotate Instruction stats
> total 609, ok 446 (73.2%), bad 163 (26.8%)
> 
>  Name/opcode:  Good   Bad
>  -----------------------------------------------------------
>  58                  :   323    80
>  32                  :    49    43
>  34                  :    33    11
>  OP_31_XOP_LDX       :     8    20
>  40                  :    23     0
>  OP_31_XOP_LWARX     :     5     1
>  OP_31_XOP_LWZX      :     2     3
>  OP_31_XOP_LDARX     :     3     0
>  33                  :     0     2
>  OP_31_XOP_LBZX      :     0     1
>  OP_31_XOP_LWAX      :     0     1
>  OP_31_XOP_LHZX      :     0     1
> 
> perf report logs:
> =================
> 
>  Total Lost Samples: 0
> 
>  Samples: 1K of event 'mem-loads'
>  Event count (approx.): 937238
> 
>  Overhead  Data Type  Data Type Offset
> ........  .........  ................
> 
>    48.60%  (unknown)  (unknown) +0 (no field)
>    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>     2.69%  struct paca_struct  struct paca_struct +2808 (canary)
>     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>     1.41%  struct vm_fault  struct vm_fault +0 (vma)
>     1.29%  struct task_struct  struct task_struct +276 (flags)
>     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
>     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
>     0.76%  struct rq  struct rq +2856 (cpu)
> 
> Thanks
> Athira Rajeev

Hi All

Requesting review comments for this patchset.

Thanks
Athira
> 
> Changelog:
> From v3->v4:
> - Addressed review comments from Ian by using capstone_init from
>  "util/print_insn.c" instead of "open_capstone_handle".
> - Addressed review comment from Namhyung by moving the "opcode"
>  field from "struct ins" to "struct disasm_line"
> 
> From v2->v3:
> - Addressed review comments from Christophe and Namhyung for V2
> - Changed the approach in powerpc to parse a sample for data
>  type profiling as:
>  Read directly from the DSO using dso__data_read_offset
>  If that fails for any case, fall back to using libcapstone
>  If libcapstone is not supported, the approach will use objdump
> - Include instructions with opcode as 31 and correctly categorize
>  them as memory or arithmetic instructions.
> - Include more instructions for instruction tracking in powerpc
> 
> From v1->v2:
> - Addressed the suggestion from Christophe Leroy and Segher Boessenkool
>  to use the binary code (raw insn) to fetch opcode, register and
>  offset fields.
> - Added support for instruction tracking in powerpc
> - Find the register defined variables (r13 and r1, which point to
>  local_paca and current_stack_pointer in powerpc)
> 
> Athira Rajeev (16):
>  tools/perf: Move the data structures related to register type to
>    header file
>  tools/perf: Add "update_insn_state" callback function to handle arch
>    specific instruction tracking
>  tools/perf: Add support to capture and parse raw instruction in
>    powerpc using dso__data_read_offset utility
>  tools/perf: Use sort keys to determine whether to pick objdump to
>    disassemble
>  tools/perf: Add disasm_line__parse to parse raw instruction for
>    powerpc
>  tools/perf: Update parameters for reg extract functions to use raw
>    instruction on powerpc
>  tools/perf: Add support to identify memory instructions of opcode 31
>    in powerpc
>  tools/perf: Add some of the arithmetic instructions to support
>    instruction tracking in powerpc
>  tools/perf: Add more instructions for instruction tracking
>  tools/perf: Update instruction tracking for powerpc
>  tools/perf: Make capstone_init non-static so that it can be used
>    during symbol disassemble
>  tools/perf: Use capstone_init and remove open_capstone_handle from
>    disasm.c
>  tools/perf: Add support to use libcapstone in powerpc
>  tools/perf: Add support to find global register variables using
>    find_data_type_global_reg
>  tools/perf: Add support for global_die to capture name of variable in
>    case of register defined variable
>  tools/perf: Set instruction name to be used with insn-stat when using
>    raw instruction
> 
> tools/include/linux/string.h                  |   2 +
> tools/lib/string.c                            |  13 +
> tools/perf/arch/arm64/annotate/instructions.c |   3 +-
> .../arch/loongarch/annotate/instructions.c    |   6 +-
> .../perf/arch/powerpc/annotate/instructions.c | 260 +++++++++
> tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
> tools/perf/arch/s390/annotate/instructions.c  |   5 +-
> tools/perf/arch/x86/annotate/instructions.c   | 383 +++++++++++++
> tools/perf/builtin-annotate.c                 |   4 +-
> tools/perf/util/annotate-data.c               | 519 +++---------------
> tools/perf/util/annotate-data.h               |  78 +++
> tools/perf/util/annotate.c                    |  35 +-
> tools/perf/util/annotate.h                    |   6 +-
> tools/perf/util/disasm.c                      | 475 ++++++++++++++--
> tools/perf/util/disasm.h                      |  13 +-
> tools/perf/util/dwarf-aux.c                   |   1 +
> tools/perf/util/dwarf-aux.h                   |   1 +
> tools/perf/util/include/dwarf-regs.h          |   4 +
> tools/perf/util/print_insn.c                  |  15 +-
> tools/perf/util/print_insn.h                  |   5 +
> tools/perf/util/sort.c                        |   7 +-
> 21 files changed, 1386 insertions(+), 502 deletions(-)
> 
> -- 
> 2.43.0
> 



* Re: [V4 00/16] Add data type profiling support for powerpc
  2024-06-20 15:31 ` [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
@ 2024-06-22  0:06   ` Namhyung Kim
  2024-06-25 11:48     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-22  0:06 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	segher, christophe.leroy, LKML, linux-perf-users, linuxppc-dev,
	akanksha, maddy, kjain, disgoel

Hello,

On Thu, Jun 20, 2024 at 09:01:01PM +0530, Athira Rajeev wrote:
> 
> 
> Hi All
> 
> Requesting review comments for this patchset.

Sorry about the delay, I was traveling and busy with other things.
I'll review this next week!

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 01/16] tools/perf: Move the data structures related to register type to header file
  2024-06-14 17:26 ` [V4 01/16] tools/perf: Move the data structures related to register type to header file Athira Rajeev
@ 2024-06-25  5:15   ` Namhyung Kim
  2024-06-25 10:54     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  5:15 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

Hello,

On Fri, Jun 14, 2024 at 10:56:16PM +0530, Athira Rajeev wrote:
> Data type profiling uses instruction tracking by checking each
> instruction and updating the register type state in some data
> structures. This is useful to find the data type in cases when the
> register state gets transferred from one reg to another. Example, in
> x86, "mov" instruction and in powerpc, "mr" instruction. Currently these
> structures are defined in annotate-data.c and instruction tracking is
> implemented only for x86. Move these data structures to
> "annotate-data.h" header file so that other arch implementations can use
> it in arch specific files as well.
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/annotate-data.c | 53 +------------------------------
>  tools/perf/util/annotate-data.h | 55 +++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 52 deletions(-)
> 
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 965da6c0b542..a4c7f98a75e3 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -31,15 +31,6 @@
>  
>  static void delete_var_types(struct die_var_type *var_types);
>  
> -enum type_state_kind {
> -	TSR_KIND_INVALID = 0,
> -	TSR_KIND_TYPE,
> -	TSR_KIND_PERCPU_BASE,
> -	TSR_KIND_CONST,
> -	TSR_KIND_POINTER,
> -	TSR_KIND_CANARY,
> -};
> -
>  #define pr_debug_dtp(fmt, ...)					\
>  do {								\
>  	if (debug_type_profile)					\
> @@ -140,49 +131,7 @@ static void pr_debug_location(Dwarf_Die *die, u64 pc, int reg)
>  	}
>  }
>  
> -/*
> - * Type information in a register, valid when @ok is true.
> - * The @caller_saved registers are invalidated after a function call.
> - */
> -struct type_state_reg {
> -	Dwarf_Die type;
> -	u32 imm_value;
> -	bool ok;
> -	bool caller_saved;
> -	u8 kind;
> -};
> -
> -/* Type information in a stack location, dynamically allocated */
> -struct type_state_stack {
> -	struct list_head list;
> -	Dwarf_Die type;
> -	int offset;
> -	int size;
> -	bool compound;
> -	u8 kind;
> -};
> -
> -/* FIXME: This should be arch-dependent */
> -#define TYPE_STATE_MAX_REGS  16
> -
> -/*
> - * State table to maintain type info in each register and stack location.
> - * It'll be updated when new variable is allocated or type info is moved
> - * to a new location (register or stack).  As it'd be used with the
> - * shortest path of basic blocks, it only maintains a single table.
> - */
> -struct type_state {
> -	/* state of general purpose registers */
> -	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
> -	/* state of stack location */
> -	struct list_head stack_vars;
> -	/* return value register */
> -	int ret_reg;
> -	/* stack pointer register */
> -	int stack_reg;
> -};
> -
> -static bool has_reg_type(struct type_state *state, int reg)
> +bool has_reg_type(struct type_state *state, int reg)
>  {
>  	return (unsigned)reg < ARRAY_SIZE(state->regs);
>  }
> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
> index 0a57d9f5ee78..ef235b1b15e1 100644
> --- a/tools/perf/util/annotate-data.h
> +++ b/tools/perf/util/annotate-data.h
> @@ -6,6 +6,9 @@
>  #include <linux/compiler.h>
>  #include <linux/rbtree.h>
>  #include <linux/types.h>
> +#include "dwarf-aux.h"
> +#include "annotate.h"
> +#include "debuginfo.h"
>  
>  struct annotated_op_loc;
>  struct debuginfo;
> @@ -15,6 +18,15 @@ struct hist_entry;
>  struct map_symbol;
>  struct thread;
>  
> +enum type_state_kind {
> +	TSR_KIND_INVALID = 0,
> +	TSR_KIND_TYPE,
> +	TSR_KIND_PERCPU_BASE,
> +	TSR_KIND_CONST,
> +	TSR_KIND_POINTER,
> +	TSR_KIND_CANARY,
> +};
> +
>  /**
>   * struct annotated_member - Type of member field
>   * @node: List entry in the parent list
> @@ -142,6 +154,48 @@ struct annotated_data_stat {
>  };
>  extern struct annotated_data_stat ann_data_stat;
>  
> +/*
> + * Type information in a register, valid when @ok is true.
> + * The @caller_saved registers are invalidated after a function call.
> + */
> +struct type_state_reg {
> +	Dwarf_Die type;
> +	u32 imm_value;
> +	bool ok;
> +	bool caller_saved;
> +	u8 kind;
> +};
> +
> +/* Type information in a stack location, dynamically allocated */
> +struct type_state_stack {
> +	struct list_head list;
> +	Dwarf_Die type;
> +	int offset;
> +	int size;
> +	bool compound;
> +	u8 kind;
> +};
> +
> +/* FIXME: This should be arch-dependent */
> +#define TYPE_STATE_MAX_REGS  32

Can you please define this for powerpc separately?  I think x86 should
remain at 16.

Thanks,
Namhyung
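
A minimal sketch of how the register count could be kept arch-dependent,
as requested above. This is only an illustration under assumptions: the
names TYPE_STATE_MAX_REGS_X86, TYPE_STATE_MAX_REGS_POWERPC,
type_state_sketch and has_reg_type_sketch are hypothetical and do not
come from the patch. The array is sized for the largest arch while the
bound check uses a per-arch count.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-arch limits; these names are not from the patch. */
#define TYPE_STATE_MAX_REGS_X86		16
#define TYPE_STATE_MAX_REGS_POWERPC	32
/* Storage is sized for the largest supported arch. */
#define TYPE_STATE_MAX_REGS		32

/* Reduced stand-in for struct type_state, keeping only what the check needs. */
struct type_state_sketch {
	bool reg_ok[TYPE_STATE_MAX_REGS];
	int nr_regs;	/* number of registers valid for the current arch */
};

static void type_state_sketch__init(struct type_state_sketch *state,
				    const char *arch_name)
{
	memset(state, 0, sizeof(*state));
	if (!strcmp(arch_name, "powerpc"))
		state->nr_regs = TYPE_STATE_MAX_REGS_POWERPC;
	else
		state->nr_regs = TYPE_STATE_MAX_REGS_X86;
}

/* Same role as has_reg_type(), but bounded by the per-arch count. */
static bool has_reg_type_sketch(const struct type_state_sketch *state, int reg)
{
	return reg >= 0 && reg < state->nr_regs;
}
```

With this shape, x86 keeps rejecting register numbers 16 and above even
though the backing array has 32 slots.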

> +
> +/*
> + * State table to maintain type info in each register and stack location.
> + * It'll be updated when new variable is allocated or type info is moved
> + * to a new location (register or stack).  As it'd be used with the
> + * shortest path of basic blocks, it only maintains a single table.
> + */
> +struct type_state {
> +	/* state of general purpose registers */
> +	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
> +	/* state of stack location */
> +	struct list_head stack_vars;
> +	/* return value register */
> +	int ret_reg;
> +	/* stack pointer register */
> +	int stack_reg;
> +};
> +
>  #ifdef HAVE_DWARF_SUPPORT
>  
>  /* Returns data type at the location (ip, reg, offset) */
> @@ -160,6 +214,7 @@ void global_var_type__tree_delete(struct rb_root *root);
>  
>  int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
>  
> +bool has_reg_type(struct type_state *state, int reg);
>  #else /* HAVE_DWARF_SUPPORT */
>  
>  static inline struct annotated_data_type *
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-06-14 17:26 ` [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
@ 2024-06-25  5:29   ` Namhyung Kim
  2024-06-25 12:38     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  5:29 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:18PM +0530, Athira Rajeev wrote:
> Add support to capture and parse raw instruction in powerpc.
> Currently, the perf tool infrastructure uses two ways to disassemble
> and understand the instruction. One is objdump and the other is
> via libcapstone.
> 
> Currently, the perf tool infrastructure uses the "--no-show-raw-insn" option
> with "objdump" while disassembling. Example from powerpc with this option
> for an instruction address is:
> 
> Snippet from:
> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> 
> c0000000010224b4:	lwz     r10,0(r9)

What about removing --no-show-raw-insn and parsing the raw byte code in
the output for powerpc?  I think it's better to support normal
annotation together.

> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, instructions using "(" are not the only memory
> instructions. For example, the above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". In order to easily identify the instruction category
> and extract the source/target registers, this patch adds support to use the raw
> instruction for powerpc. The approach used is to read the raw instruction
> directly from the DSO file using the "dso__data_read_offset" utility which
> is already implemented in perf infrastructure in "util/dso.c".
> 
> Example:
> 
> 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
> 
>    | 58 |  RT  |  RA |      DS       | |
>    -------------------------------------
>    0    6     11    16              30 31
> 
> Function "symbol__disassemble_dso" is updated to read raw instruction
> directly from DSO using dso__data_read_offset utility. In case of
> above example, this captures:
> line:    38 01 81 e8
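
The field layout above can be checked with a short standalone sketch.
It assumes the four bytes are in little-endian file order, as in the
example, and reuses the shape of the PPC_* macros from this series; the
struct ds_form_fields type and both helper functions are hypothetical
names used only for illustration. Like the PPC_DS macro, it does not
sign-extend the displacement.

```c
#include <stdint.h>

/* Field extractors with the same shape as the PPC_* macros in this series. */
#define PPC_OP(insn)	(((insn) >> 26) & 0x3f)
#define PPC_RT(insn)	(((insn) >> 21) & 0x1f)
#define PPC_RA(insn)	(((insn) >> 16) & 0x1f)
#define PPC_DS(insn)	((insn) & 0xfffc)

/* Hypothetical container for the decoded DS-form fields. */
struct ds_form_fields {
	uint32_t op;		/* primary opcode */
	uint32_t rt;		/* target register */
	uint32_t ra;		/* base register */
	uint32_t offset;	/* displacement, low two bits cleared */
};

/* Assemble four bytes in little-endian file order into one instruction word. */
static uint32_t insn_from_le_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static struct ds_form_fields decode_ds_form(uint32_t insn)
{
	struct ds_form_fields f = {
		.op = PPC_OP(insn),
		.rt = PPC_RT(insn),
		.ra = PPC_RA(insn),
		.offset = PPC_DS(insn),
	};

	return f;
}
```

For the bytes "38 01 81 e8" this gives the word 0xe8810138, opcode 58,
RT 4, RA 1 and offset 312, which matches "ld r4,312(r1)".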
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/disasm.c | 98 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index b5fe3a7508bb..f19496133bf0 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -1586,6 +1586,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>  }
>  #endif
>  
> +static int symbol__disassemble_dso(char *filename, struct symbol *sym,

Maybe rename to symbol__disassemble_raw() ?

> +					struct annotate_args *args)
> +{
> +	struct annotation *notes = symbol__annotation(sym);
> +	struct map *map = args->ms.map;
> +	struct dso *dso = map__dso(map);
> +	u64 start = map__rip_2objdump(map, sym->start);
> +	u64 end = map__rip_2objdump(map, sym->end);
> +	u64 len = end - start;
> +	u64 offset;
> +	int i, count;
> +	u8 *buf = NULL;
> +	char disasm_buf[512];
> +	struct disasm_line *dl;
> +	u32 *line;
> +
> +	/* Return if objdump is specified explicitly */
> +	if (args->options->objdump_path)
> +		return -1;
> +
> +	pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);

You may want to print the actual offset and remove the "using
dso__data_read_offset" part.

Thanks,
Namhyung

> +
> +	buf = malloc(len);
> +	if (buf == NULL)
> +		goto err;
> +
> +	count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
> +
> +	line = (u32 *)buf;
> +
> +	if ((u64)count != len)
> +		goto err;
> +
> +	/* add the function address and name */
> +	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
> +		  start, sym->name);
> +
> +	args->offset = -1;
> +	args->line = disasm_buf;
> +	args->line_nr = 0;
> +	args->fileloc = NULL;
> +	args->ms.sym = sym;
> +
> +	dl = disasm_line__new(args);
> +	if (dl == NULL)
> +		goto err;
> +
> +	annotation_line__add(&dl->al, &notes->src->source);
> +
> +	/* Each raw instruction is 4 byte */
> +	count = len/4;
> +
> +	for (i = 0, offset = 0; i < count; i++) {
> +		args->offset = offset;
> +		sprintf(args->line, "%x", line[i]);
> +		dl = disasm_line__new(args);
> +		if (dl == NULL)
> +			goto err;
> +
> +		annotation_line__add(&dl->al, &notes->src->source);
> +		offset += 4;
> +	}
> +
> +	/* It failed in the middle */
> +	if (offset != len) {
> +		struct list_head *list = &notes->src->source;
> +
> +		/* Discard all lines and fallback to objdump */
> +		while (!list_empty(list)) {
> +			dl = list_first_entry(list, struct disasm_line, al.node);
> +
> +			list_del_init(&dl->al.node);
> +			disasm_line__free(dl);
> +		}
> +		count = -1;
> +	}
> +
> +out:
> +	free(buf);
> +	return count < 0 ? count : 0;
> +
> +err:
> +	count = -1;
> +	goto out;
> +}
>  /*
>   * Possibly create a new version of line with tabs expanded. Returns the
>   * existing or new line, storage is updated if a new line is allocated. If
> @@ -1710,6 +1795,19 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>  		strcpy(symfs_filename, tmp);
>  	}
>  
> +	/*
> +	 * For powerpc data type profiling, use the dso__data_read_offset
> +	 * to read raw instruction directly and interpret the binary code
> +	 * to understand instructions and register fields. For sort keys as
> +	 * type and typeoff, disassemble to mnemonic notation is
> +	 * not required in case of powerpc.
> +	 */
> +	if (arch__is(args->arch, "powerpc")) {
> +		err = symbol__disassemble_dso(symfs_filename, sym, args);
> +		if (err == 0)
> +			goto out_remove_tmp;
> +	}
> +
>  #ifdef HAVE_LIBCAPSTONE_SUPPORT
>  	err = symbol__disassemble_capstone(symfs_filename, sym, args);
>  	if (err == 0)
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble
  2024-06-14 17:26 ` [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble Athira Rajeev
@ 2024-06-25  5:32   ` Namhyung Kim
  0 siblings, 0 replies; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  5:32 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:19PM +0530, Athira Rajeev wrote:
> perf annotate can be done in different ways. One way is to directly use
> "perf annotate" command, other way to annotate specific symbol is to do
> "perf report" and press "a" on the sample in UI mode. The approach
> preferred in powerpc to parse sample for data type profiling is:
> - Read directly from DSO using dso__data_read_offset
> - If that fails for any case, fallback to using libcapstone
> - If libcapstone is not supported, approach will use objdump
> 
> The above works well when perf report is invoked with only sort keys for
> data type, i.e. type and typeoff, because there is no instruction level
> annotation needed if only data type information is requested. For
> annotating a sample, along with the type and typeoff sort keys, the "sym" sort key
> is also needed. And by default invoking just "perf report" uses sort key
> "sym" that displays the symbol information.
> 
> With the approach changes in powerpc, which first read the DSO for raw
> instructions, "perf annotate" and "perf report" + "a" key break since
> they don't do the instruction level disassembly.

So as I said, it'd be nice if you could read the raw insn from the objdump
output directly.

Thanks,
Namhyung

> 
> Snippet of result from perf report:
> 
> Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
> do_work  /usr/bin/pmlogger [Percent: local period]
> Percent│        ea230010
>        │        3a550010
>        │        3a600000
> 
>        │        38f60001
>        │        39490008
>        │        42400438
>  51.44 │        81290008
>        │        7d485378
> 
> Here, raw instruction is displayed in the output instead of human
> readable annotated form.
> 
> One way to get the appropriate data is to specify "--objdump path", by
> which code annotation will be done. But the default behaviour will be
> changed. To fix this breakage, check if "sym" sort key is set. If so,
> fall back and use the libcapstone/objdump way of disassembling the sample.
> 
> With the changes and "perf report"
> 
> Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
> do_work  /usr/bin/pmlogger [Percent: local period]
> Percent│        ld        r17,16(r3)
>        │        addi      r18,r21,16
>        │        li        r19,0
> 
>        │ 8b0:   rldicl    r10,r10,63,33
>        │        addi      r10,r10,1
>        │        mtctr     r10
>        │      ↓ b         8e4
>        │ 8c0:   addi      r7,r22,1
>        │        addi      r10,r9,8
>        │      ↓ bdz       d00
>  51.44 │        lwz       r9,8(r9)
>        │        mr        r8,r10
>        │        cmpw      r20,r9
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/disasm.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index f19496133bf0..b81cdcf4d6b4 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -25,6 +25,7 @@
>  #include "srcline.h"
>  #include "symbol.h"
>  #include "util.h"
> +#include "sort.h"
>  
>  static regex_t	 file_lineno;
>  
> @@ -1803,9 +1804,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>  	 * not required in case of powerpc.
>  	 */
>  	if (arch__is(args->arch, "powerpc")) {
> -		err = symbol__disassemble_dso(symfs_filename, sym, args);
> -		if (err == 0)
> -			goto out_remove_tmp;
> +		if (sort_order && !strstr(sort_order, "sym")) {
> +			err = symbol__disassemble_dso(symfs_filename, sym, args);
> +			if (err == 0)
> +				goto out_remove_tmp;
> +		}
>  	}
>  
>  #ifdef HAVE_LIBCAPSTONE_SUPPORT
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-14 17:26 ` [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
@ 2024-06-25  5:39   ` Namhyung Kim
  2024-06-25 12:42     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  5:39 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
> Currently, the perf tool infrastructure uses the disasm_line__parse function to
> parse a disassembled line.
> 
> Example snippet from objdump:
> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> 
> c0000000010224b4:	lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. In powerpc, the approach for data type
> profiling uses raw instruction instead of result from objdump to identify
> the instruction category and extract the source/target registers.
> 
> Example: 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. Add function
> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
> Also update "struct disasm_line" to save the binary code.
> With the change, function captures:
> 
> line -> "38 01 81 e8     ld      r4,312(r1)"
> raw instruction "38 01 81 e8"
> 
> Raw instruction is used later to extract the reg/offset fields. Macros
> are added to extract opcode and register fields. "struct disasm_line"
> is updated to carry a union of "bytes" and a 32-bit "raw_insn" to hold the raw
> code (raw). Function "disasm_line__parse_powerpc" fills the raw
> instruction hex value and can use macros to get opcode. There are no
> changes in existing code paths, which parse the disassembled code.
> The architectures using the instruction name and present approach are
> not altered. Since this approach targets powerpc, the macro
> implementation is added for powerpc as of now.
> 
> Since the disasm_line__parse is used in other cases (perf annotate) and
> not only data type profiling, the powerpc callback includes changes to
> work with binary code as well as mnemonic representation. Also, in case
> the DSO read fails and libcapstone is not supported, the approach
> falls back to using objdump. Hence the patch has changes to
> ensure the objdump option also works well.
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/include/linux/string.h                  |  2 +
>  tools/lib/string.c                            | 13 ++++
>  .../perf/arch/powerpc/annotate/instructions.c |  1 +
>  tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 +++
>  tools/perf/util/annotate.h                    |  5 +-
>  tools/perf/util/disasm.c                      | 59 ++++++++++++++++++-
>  6 files changed, 87 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
> index db5c99318c79..0acb1fc14e19 100644
> --- a/tools/include/linux/string.h
> +++ b/tools/include/linux/string.h
> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>  
>  extern char *strim(char *);
>  
> +extern void remove_spaces(char *s);
> +
>  extern void *memchr_inv(const void *start, int c, size_t bytes);
>  #endif /* _TOOLS_LINUX_STRING_H_ */
> diff --git a/tools/lib/string.c b/tools/lib/string.c
> index 8b6892f959ab..3126d2cff716 100644
> --- a/tools/lib/string.c
> +++ b/tools/lib/string.c
> @@ -153,6 +153,19 @@ char *strim(char *s)
>  	return skip_spaces(s);
>  }
>  
> +/*
> + * remove_spaces - Removes whitespaces from @s
> + */
> +void remove_spaces(char *s)
> +{
> +	char *d = s;
> +
> +	do {
> +		while (*d == ' ')
> +			++d;
> +	} while ((*s++ = *d++));
> +}
> +
>  /**
>   * strreplace - Replace all occurrences of character in string.
>   * @s: The string to operate on.
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
> index a3f423c27cae..d57fd023ef9c 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  		arch->initialized = true;
>  		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>  		arch->objdump.comment_char      = '#';
> +		annotate_opts.show_asm_raw = true;

Right, I think this will add the raw insn in the output of objdump, no?
Why not using the information?

>  	}
>  
>  	return 0;
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index 0c4f4caf53ac..430623ca5612 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
>  			return roff->ptregs_offset;
>  	return -EINVAL;
>  }
> +
> +#define PPC_OP(op)	(((op) >> 26) & 0x3F)
> +#define PPC_RA(a)	(((a) >> 16) & 0x1f)
> +#define PPC_RT(t)	(((t) >> 21) & 0x1f)
> +#define PPC_RB(b)	(((b) >> 11) & 0x1f)
> +#define PPC_D(D)	((D) & 0xfffe)
> +#define PPC_DS(DS)	((DS) & 0xfffc)
> +#define OP_LD	58
> +#define OP_STD	62
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index d5c821c22f79..9ba772f46270 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -113,7 +113,10 @@ struct annotation_line {
>  struct disasm_line {
>  	struct ins		 ins;
>  	struct ins_operands	 ops;
> -
> +	union {
> +		u8 bytes[4];
> +		u32 raw_insn;
> +	} raw;
>  	/* This needs to be at the end. */
>  	struct annotation_line	 al;
>  };
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index b81cdcf4d6b4..1e8568738b38 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -45,6 +45,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>  
>  static void ins__sort(struct arch *arch);
>  static int disasm_line__parse(char *line, const char **namep, char **rawp);
> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
>  
>  static __attribute__((constructor)) void symbol__init_regexpr(void)
>  {
> @@ -844,6 +845,59 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>  	return -1;
>  }
>  
> +/*
> + * Parses the result captured from symbol__disassemble_*
> + * Example, line read from DSO file in powerpc:
> + * line:    38 01 81 e8
> + * opcode: fetched from arch specific get_opcode_insn
> + * rawp_insn: e8810138
> + *
> + * rawp_insn is used later to extract the reg/offset fields
> + */
> +#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
> +
> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
> +{
> +	char *line = dl->al.line;
> +	const char **namep = &dl->ins.name;
> +	char **rawp = &dl->ops.raw;
> +	char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
> +	char *name = skip_spaces(name_raw_insn + 11);
> +	int objdump = 0;
> +
> +	if (strlen(line) > 11)
> +		objdump = 1;
> +
> +	if (name_raw_insn[0] == '\0')
> +		return -1;
> +
> +	if (objdump) {
> +		*rawp = name + 1;
> +		while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
> +			++*rawp;
> +		tmp = (*rawp)[0];
> +		(*rawp)[0] = '\0';
> +
> +		*namep = strdup(name);
> +		if (*namep == NULL)
> +			return -1;
> +
> +		(*rawp)[0] = tmp;
> +		*rawp = strim(*rawp);
> +	} else
> +		*namep = "";
> +
> +	tmp_raw_insn = strdup(name_raw_insn);
> +	tmp_raw_insn[11] = '\0';
> +	remove_spaces(tmp_raw_insn);
> +
> +	dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
> +	if (objdump)
> +		dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));

Hmm.. can you use a sscanf() instead?

  sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)

Thanks,
Namhyung
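
A small sketch of the sscanf() idea, written as a standalone helper.
The function name parse_raw_bytes is hypothetical. One assumption worth
flagging: since the destination is a u8 array, the sketch uses the
"%hhx" conversion (which stores into an unsigned char) instead of plain
"%x" (which expects an unsigned int pointer). The field width of 2
keeps each conversion to a single byte.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse four leading hex bytes from a line such as
 *   "38 01 81 e8     ld      r4,312(r1)"
 * Returns 0 when all four bytes were read, -1 otherwise.
 */
static int parse_raw_bytes(const char *line, uint8_t bytes[4])
{
	/* %2hhx reads at most two hex digits into an unsigned char */
	int n = sscanf(line, "%2hhx %2hhx %2hhx %2hhx",
		       &bytes[0], &bytes[1], &bytes[2], &bytes[3]);

	return n == 4 ? 0 : -1;
}
```

A line without the raw bytes, for example "lwz     r10,0(r9)", fails
the first conversion and the helper returns -1, so the caller can tell
the two objdump output styles apart.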

> +
> +	return 0;
> +}
> +
>  static void annotation_line__init(struct annotation_line *al,
>  				  struct annotate_args *args,
>  				  int nr)
> @@ -897,7 +951,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>  		goto out_delete;
>  
>  	if (args->offset != -1) {
> -		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
> +		if (arch__is(args->arch, "powerpc")) {
> +			if (disasm_line__parse_powerpc(dl) < 0)
> +				goto out_free_line;
> +		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>  			goto out_free_line;
>  
>  		disasm_line__init_ins(dl, args->arch, &args->ms);
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-06-14 17:26 ` [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
@ 2024-06-25  6:00   ` Namhyung Kim
  2024-06-25 12:43     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  6:00 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:21PM +0530, Athira Rajeev wrote:
> Use the raw instruction code and macros to identify memory instructions,
> extract register fields and also offset. The implementation addresses
> the D-form, X-form, DS-form instructions. Two main functions are added.
> New parse function "load_store__parse" as instruction ops parser for
> memory instructions. Unlike other parsers (like mov__parse), this parser
> fills in the "multi_regs" field for source/target and the newly added "mem_ref"
> field. No other fields are set because there is no need to parse the
> disassembled code here; arch specific macros take care of extracting
> offset and regs, which is easier and more precise.
> 
> In powerpc, all instructions with a primary opcode from 32 to 63
> are memory instructions. Update "ins__find" function to have "raw_insn"
> also as a parameter. Don't use the "extract_reg_offset", instead use
> newly added function "get_arch_regs" which will set these fields: reg1,
> reg2, offset depending on whether it is a source or target op.
> 
> Update "parse" callback for "struct ins_ops" to also pass "struct
> disasm_line" as argument. This is needed in parse functions where opcode
> is used to determine whether to set multi_regs.

Can you please split "ins__find" change and "parse" change into separate
commits?

> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/arch/arm64/annotate/instructions.c |  3 +-
>  .../arch/loongarch/annotate/instructions.c    |  6 +-
>  .../perf/arch/powerpc/annotate/instructions.c | 16 ++++
>  tools/perf/arch/powerpc/util/dwarf-regs.c     | 44 +++++++++++
>  tools/perf/arch/s390/annotate/instructions.c  |  5 +-
>  tools/perf/util/annotate.c                    | 25 ++++++-
>  tools/perf/util/disasm.c                      | 73 ++++++++++++++++---
>  tools/perf/util/disasm.h                      |  6 +-
>  tools/perf/util/include/dwarf-regs.h          |  3 +
>  9 files changed, 159 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
> index 4af0c3a0f86e..f86d9f4798bd 100644
> --- a/tools/perf/arch/arm64/annotate/instructions.c
> +++ b/tools/perf/arch/arm64/annotate/instructions.c
> @@ -11,7 +11,8 @@ struct arm64_annotate {
>  
>  static int arm64_mov__parse(struct arch *arch __maybe_unused,
>  			    struct ins_operands *ops,
> -			    struct map_symbol *ms __maybe_unused)
> +			    struct map_symbol *ms __maybe_unused,
> +			    struct disasm_line *dl __maybe_unused)
>  {
>  	char *s = strchr(ops->raw, ','), *target, *endptr;
>  
> diff --git a/tools/perf/arch/loongarch/annotate/instructions.c b/tools/perf/arch/loongarch/annotate/instructions.c
> index 21cc7e4149f7..ab43b1ab51e3 100644
> --- a/tools/perf/arch/loongarch/annotate/instructions.c
> +++ b/tools/perf/arch/loongarch/annotate/instructions.c
> @@ -5,7 +5,8 @@
>   * Copyright (C) 2020-2023 Loongson Technology Corporation Limited
>   */
>  
> -static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
> +static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	char *c, *endptr, *tok, *name;
>  	struct map *map = ms->map;
> @@ -51,7 +52,8 @@ static struct ins_ops loongarch_call_ops = {
>  	.scnprintf = call__scnprintf,
>  };
>  
> -static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
> +static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	struct map *map = ms->map;
>  	struct symbol *sym = ms->sym;
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
> index d57fd023ef9c..10fea5e5cf4c 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -49,6 +49,22 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
>  	return ops;
>  }
>  
> +#define PPC_OP(op)      (((op) >> 26) & 0x3F)
> +
> +static struct ins_ops *check_ppc_insn(int raw_insn)

It'd be nice to use 'u32' instead of 'int' for raw_insn if you want to
do some bit operations.
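
A minimal sketch of the unsigned variant, for illustration only. The
helper names ppc_opcode() and ppc_is_load_store() are hypothetical (not
from the patch), uint32_t stands in for the kernel's u32, and the range
check simply mirrors the comment in the quoted hunk:

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of a 32-bit powerpc instruction word. */
static inline uint32_t ppc_opcode(uint32_t raw_insn)
{
	/* Unsigned shift: no sign bits are dragged in from bit 31. */
	return (raw_insn >> 26) & 0x3F;
}

/* Mirrors the quoted check: primary opcodes 32..63 have bit 0x20 set. */
static inline bool ppc_is_load_store(uint32_t raw_insn)
{
	return (ppc_opcode(raw_insn) & 0x20) != 0;
}
```

With the example from the cover letter, "lwz r10,0(r9)" encodes as
0x81490000, so ppc_opcode() yields 32 and the check is true.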

> +{
> +	int opcode = PPC_OP(raw_insn);
> +
> +	/*
> +	 * Instructions with opcode 32 to 63 are memory
> +	 * instructions in powerpc
> +	 */
> +	if ((opcode & 0x20))
> +		return &load_store_ops;
> +
> +	return NULL;
> +}
> +
>  static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  {
>  	if (!arch->initialized) {
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index 430623ca5612..e01729f3c0b3 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -107,3 +107,47 @@ int regs_query_register_offset(const char *name)
>  #define PPC_DS(DS)	((DS) & 0xfffc)
>  #define OP_LD	58
>  #define OP_STD	62
> +
> +static int get_source_reg(unsigned int raw_insn)
> +{
> +	return PPC_RA(raw_insn);

Ditto, and others too.
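
The same idea applied to the register and offset helpers, as a rough
sketch. All names below are hypothetical, the field positions are the
usual D/DS-form layout, and the sign extension of the displacement is an
assumption of this sketch (the quoted PPC_DS macro only masks):

```c
#include <stdint.h>

#define SK_OP_LD	58	/* same values as OP_LD/OP_STD in the patch */
#define SK_OP_STD	62

static inline uint32_t sk_opcode(uint32_t raw) { return (raw >> 26) & 0x3F; }
static inline uint32_t sk_rt(uint32_t raw)     { return (raw >> 21) & 0x1F; }
static inline uint32_t sk_ra(uint32_t raw)     { return (raw >> 16) & 0x1F; }
static inline uint32_t sk_rb(uint32_t raw)     { return (raw >> 11) & 0x1F; }

/* Displacement of a D-form or DS-form access, as a signed byte offset. */
static inline int32_t sk_mem_offset(uint32_t raw)
{
	uint32_t op = sk_opcode(raw);
	int32_t disp = (int32_t)(raw & 0xFFFF);

	/* DS-form: the low two bits belong to the extended opcode. */
	if (op == SK_OP_LD || op == SK_OP_STD)
		disp &= 0xFFFC;

	/* Sign-extend the 16-bit field (assumption, see lead-in). */
	if (disp & 0x8000)
		disp -= 0x10000;

	return disp;
}
```

For example, "std r0,16(r1)" (0xF8010010) gives RT = 0, RA = 1 and an
offset of 16.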


> +}
> +
> +static int get_target_reg(unsigned int raw_insn)
> +{
> +	return PPC_RT(raw_insn);
> +}
> +
> +static int get_offset_opcode(int raw_insn)
> +{
> +	int opcode = PPC_OP(raw_insn);
> +
> +	/* DS- form */
> +	if ((opcode == OP_LD) || (opcode == OP_STD))
> +		return PPC_DS(raw_insn);
> +	else
> +		return PPC_D(raw_insn);
> +}
> +
> +/*
> + * Fills the required fields for op_loc depending on if it
> + * is a source or target.
> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
> + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
> + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
> + */
> +void get_arch_regs(int raw_insn, int is_source,
> +		struct annotated_op_loc *op_loc)
> +{
> +	if (is_source)
> +		op_loc->reg1 = get_source_reg(raw_insn);
> +	else
> +		op_loc->reg1 = get_target_reg(raw_insn);
> +
> +	if (op_loc->multi_regs)
> +		op_loc->reg2 = PPC_RB(raw_insn);
> +
> +	/* TODO: Implement offset handling for X Form */
> +	if ((op_loc->mem_ref) && (PPC_OP(raw_insn) != 31))
> +		op_loc->offset = get_offset_opcode(raw_insn);
> +}
> diff --git a/tools/perf/arch/s390/annotate/instructions.c b/tools/perf/arch/s390/annotate/instructions.c
> index da5aa3e1f04c..eeac25cca699 100644
> --- a/tools/perf/arch/s390/annotate/instructions.c
> +++ b/tools/perf/arch/s390/annotate/instructions.c
> @@ -2,7 +2,7 @@
>  #include <linux/compiler.h>
>  
>  static int s390_call__parse(struct arch *arch, struct ins_operands *ops,
> -			    struct map_symbol *ms)
> +			    struct map_symbol *ms, struct disasm_line *dl __maybe_unused)
>  {
>  	char *endptr, *tok, *name;
>  	struct map *map = ms->map;
> @@ -52,7 +52,8 @@ static struct ins_ops s390_call_ops = {
>  
>  static int s390_mov__parse(struct arch *arch __maybe_unused,
>  			   struct ins_operands *ops,
> -			   struct map_symbol *ms __maybe_unused)
> +			   struct map_symbol *ms __maybe_unused,
> +			   struct disasm_line *dl __maybe_unused)
>  {
>  	char *s = strchr(ops->raw, ','), *target, *endptr;
>  
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 1451caf25e77..bfa6420dc4b9 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2079,6 +2079,12 @@ static int extract_reg_offset(struct arch *arch, const char *str,
>  	return 0;
>  }
>  
> +__weak void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused,
> +		struct annotated_op_loc *op_loc __maybe_unused)

I'd like to avoid adding weak functions if possible.  It's supposed to
be powerpc only, maybe you can add get_powerpc_regs() in the arch
directory and add a dummy static inline somewhere under #ifndef.
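
Roughly along these lines (untested sketch). The macro
HAVE_POWERPC_REGS_SKETCH is hypothetical and stands for whatever
condition the perf build would actually use, and the struct is a reduced
stand-in for the real annotated_op_loc:

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for perf's struct annotated_op_loc. */
struct annotated_op_loc {
	int reg1;
	int reg2;
	int offset;
	bool mem_ref;
	bool multi_regs;
};

/*
 * HAVE_POWERPC_REGS_SKETCH is a hypothetical macro; a real build would
 * key this on whatever the perf build system already provides.
 */
#ifdef HAVE_POWERPC_REGS_SKETCH
/* The real implementation would live under tools/perf/arch/powerpc/. */
void get_powerpc_regs(uint32_t raw_insn, int is_source,
		      struct annotated_op_loc *op_loc);
#else
/* Dummy for other builds: leaves op_loc untouched. */
static inline void get_powerpc_regs(uint32_t raw_insn, int is_source,
				    struct annotated_op_loc *op_loc)
{
	(void)raw_insn;
	(void)is_source;
	(void)op_loc;
}
#endif
```

This keeps the call site unconditional while avoiding a __weak symbol
in the generic code.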

> +{
> +	return;
> +}
> +
>  /**
>   * annotate_get_insn_location - Get location of instruction
>   * @arch: the architecture info
> @@ -2123,20 +2129,33 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
>  	for_each_insn_op_loc(loc, i, op_loc) {
>  		const char *insn_str = ops->source.raw;
>  		bool multi_regs = ops->source.multi_regs;
> +		bool mem_ref = ops->source.mem_ref;
>  
>  		if (i == INSN_OP_TARGET) {
>  			insn_str = ops->target.raw;
>  			multi_regs = ops->target.multi_regs;
> +			mem_ref = ops->target.mem_ref;
>  		}
>  
>  		/* Invalidate the register by default */
>  		op_loc->reg1 = -1;
>  		op_loc->reg2 = -1;
>  
> -		if (insn_str == NULL)
> -			continue;
> +		if (insn_str == NULL) {
> +			if (!arch__is(arch, "powerpc"))
> +				continue;
> +		}
>  
> -		if (strchr(insn_str, arch->objdump.memory_ref_char)) {
> +		/*
> +		 * For powerpc, call get_arch_regs function which extracts the
> +		 * required fields for op_loc, ie reg1, reg2, offset from the
> +		 * raw instruction.
> +		 */
> +		if (arch__is(arch, "powerpc")) {
> +			op_loc->mem_ref = mem_ref;
> +			op_loc->multi_regs = multi_regs;
> +			get_arch_regs(dl->raw.raw_insn, !i, op_loc);
> +		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
>  			op_loc->mem_ref = true;
>  			op_loc->multi_regs = multi_regs;
>  			extract_reg_offset(arch, insn_str, op_loc);
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 1e8568738b38..8428df0b9c17 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -37,6 +37,7 @@ static struct ins_ops mov_ops;
>  static struct ins_ops nop_ops;
>  static struct ins_ops lock_ops;
>  static struct ins_ops ret_ops;
> +static struct ins_ops load_store_ops;
>  
>  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>  			   struct ins_operands *ops, int max_ins_name);
> @@ -254,7 +255,8 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2)
>  	return arch->ins_is_fused(arch, ins1, ins2);
>  }
>  
> -static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
> +static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	char *endptr, *tok, *name;
>  	struct map *map = ms->map;
> @@ -349,7 +351,8 @@ static inline const char *validate_comma(const char *c, struct ins_operands *ops
>  	return c;
>  }
>  
> -static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
> +static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	struct map *map = ms->map;
>  	struct symbol *sym = ms->sym;
> @@ -508,7 +511,8 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
>  	return 0;
>  }
>  
> -static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
> +static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	ops->locked.ops = zalloc(sizeof(*ops->locked.ops));
>  	if (ops->locked.ops == NULL)
> @@ -517,13 +521,13 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>  	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
>  		goto out_free_ops;
>  
> -	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
> +	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name, 0);
>  
>  	if (ops->locked.ins.ops == NULL)
>  		goto out_free_ops;
>  
>  	if (ops->locked.ins.ops->parse &&
> -	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms) < 0)
> +	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms, NULL) < 0)
>  		goto out_free_ops;
>  
>  	return 0;
> @@ -594,7 +598,8 @@ static bool check_multi_regs(struct arch *arch, const char *op)
>  	return count > 1;
>  }
>  
> -static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
> +static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	char *s = strchr(ops->raw, ','), *target, *comment, prev;
>  
> @@ -672,7 +677,39 @@ static struct ins_ops mov_ops = {
>  	.scnprintf = mov__scnprintf,
>  };
>  
> -static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
> +static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
> +		struct ins_operands *ops, int max_ins_name)
> +{
> +	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
> +			ops->raw);
> +}
> +
> +/*
> + * Sets the fields: multi_regs and "mem_ref".
> + * "mem_ref" is set for ops->source which is later used to
> + * fill the objdump->memory_ref-char field. This ops is currently
> + * used by powerpc and since binary instruction code is used to
> + * extract opcode, regs and offset, no other parsing is needed here
> + */
> +static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
> +		struct map_symbol *ms __maybe_unused, struct disasm_line *dl __maybe_unused)
> +{
> +	ops->source.mem_ref = true;
> +	ops->source.multi_regs = false;
> +
> +	ops->target.mem_ref = false;
> +	ops->target.multi_regs = false;
> +
> +	return 0;
> +}
> +
> +static struct ins_ops load_store_ops = {
> +	.parse     = load_store__parse,
> +	.scnprintf = load_store__scnprintf,
> +};
> +
> +static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
> +		struct disasm_line *dl __maybe_unused)
>  {
>  	char *target, *comment, *s, prev;
>  
> @@ -762,11 +799,23 @@ static void ins__sort(struct arch *arch)
>  	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
>  }
>  
> -static struct ins_ops *__ins__find(struct arch *arch, const char *name)
> +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_insn)
>  {
>  	struct ins *ins;
>  	const int nmemb = arch->nr_instructions;
>  
> +	if (arch__is(arch, "powerpc")) {
> +		/*
> +		 * For powerpc, identify the instruction ops
> +		 * from the opcode using raw_insn.
> +		 */
> +		struct ins_ops *ops;
> +
> +		ops = check_ppc_insn(raw_insn);
> +		if (ops)
> +			return ops;
> +	}
> +
>  	if (!arch->sorted_instructions) {
>  		ins__sort(arch);
>  		arch->sorted_instructions = true;
> @@ -796,9 +845,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name)
>  	return ins ? ins->ops : NULL;
>  }
>  
> -struct ins_ops *ins__find(struct arch *arch, const char *name)
> +struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn)
>  {
> -	struct ins_ops *ops = __ins__find(arch, name);
> +	struct ins_ops *ops = __ins__find(arch, name, raw_insn);
>  
>  	if (!ops && arch->associate_instruction_ops)
>  		ops = arch->associate_instruction_ops(arch, name);
> @@ -808,12 +857,12 @@ struct ins_ops *ins__find(struct arch *arch, const char *name)
>  
>  static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
>  {
> -	dl->ins.ops = ins__find(arch, dl->ins.name);
> +	dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
>  
>  	if (!dl->ins.ops)
>  		return;
>  
> -	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms) < 0)
> +	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms, dl) < 0)
>  		dl->ins.ops = NULL;
>  }
>  
> diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
> index 718177fa4775..6b6ec23e4f6f 100644
> --- a/tools/perf/util/disasm.h
> +++ b/tools/perf/util/disasm.h
> @@ -57,6 +57,7 @@ struct ins_operands {
>  		bool	offset_avail;
>  		bool	outside;
>  		bool	multi_regs;
> +		bool	mem_ref;
>  	} target;
>  	union {
>  		struct {
> @@ -64,6 +65,7 @@ struct ins_operands {
>  			char	*name;
>  			u64	addr;
>  			bool	multi_regs;
> +			bool	mem_ref;
>  		} source;
>  		struct {
>  			struct ins	    ins;
> @@ -78,7 +80,7 @@ struct ins_operands {
>  
>  struct ins_ops {
>  	void (*free)(struct ins_operands *ops);
> -	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms);
> +	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms, struct disasm_line *dl);

The line is too long, please break.

Thanks,
Namhyung


>  	int (*scnprintf)(struct ins *ins, char *bf, size_t size,
>  			 struct ins_operands *ops, int max_ins_name);
>  };
> @@ -97,7 +99,7 @@ struct annotate_args {
>  struct arch *arch__find(const char *name);
>  bool arch__is(struct arch *arch, const char *name);
>  
> -struct ins_ops *ins__find(struct arch *arch, const char *name);
> +struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn);
>  int ins__scnprintf(struct ins *ins, char *bf, size_t size,
>  		   struct ins_operands *ops, int max_ins_name);
>  
> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
> index 01fb25a1150a..7ea39362ecaf 100644
> --- a/tools/perf/util/include/dwarf-regs.h
> +++ b/tools/perf/util/include/dwarf-regs.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef _PERF_DWARF_REGS_H_
>  #define _PERF_DWARF_REGS_H_
> +#include "annotate.h"
>  
>  #define DWARF_REG_PC  0xd3af9c /* random number */
>  #define DWARF_REG_FB  0xd3affb /* random number */
> @@ -31,6 +32,8 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
>  }
>  #endif
>  
> +void get_arch_regs(int raw_insn, int is_source, struct annotated_op_loc *op_loc);
> +
>  #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
>  /*
>   * Arch should support fetching the offset of a register in pt_regs
> -- 
> 2.43.0
> 


* Re: [V4 13/16] tools/perf: Add support to use libcapstone in powerpc
  2024-06-14 17:26 ` [V4 13/16] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
@ 2024-06-25  6:08   ` Namhyung Kim
  2024-06-25 12:44     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  6:08 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:28PM +0530, Athira Rajeev wrote:
> perf now uses the capstone library (if available) to disassemble
> instructions on x86, which speeds up perf annotate. Currently only the
> x86 architecture is supported. This patch includes changes to enable
> it on powerpc. For now, this method is used only for data type sort
> keys, and only the binary code (raw instruction) is read, because the
> powerpc approach to understanding instructions and register fields
> uses the raw instruction. "cs_disasm" is currently not enabled: while
> attempting to use cs_disasm, some instructions were not identified
> (e.g. extswsli, maddld) and it had to fall back to objdump. Hence
> enabling "cs_disasm" is left as a TODO for powerpc in a comment.

Well... I'm not sure I understand it correctly, but it seems this
function effectively does nothing more than the raw disassemble.
Can we simply drop this patch for now?  Or did I miss something?
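
For reference, the per-instruction work in the quoted loop boils down to
something like the sketch below: read one 32-bit word and print it as
hex, with cs_disasm() left as a TODO. format_raw_insn() is a
hypothetical helper written for this illustration, and the word is
interpreted in host byte order, as in the quoted code:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the idx-th 32-bit instruction word of buf as hex into out.
 * Returns 0 on success, -1 if the word would run past len bytes.
 */
static int format_raw_insn(const uint8_t *buf, size_t len, size_t idx,
			   char *out, size_t out_size)
{
	uint32_t word;

	if (idx >= len / 4)
		return -1;

	/* memcpy avoids unaligned or aliasing-unsafe loads from a u8 buffer. */
	memcpy(&word, buf + idx * 4, sizeof(word));
	snprintf(out, out_size, "%x", (unsigned int)word);
	return 0;
}
```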

Thanks,
Namhyung

> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 143 insertions(+)
> 
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 43743ca4bdc9..987bff9f71c3 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -1592,6 +1592,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
>  	}
>  }
>  
> +static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
> +					struct annotate_args *args)
> +{
> +	struct annotation *notes = symbol__annotation(sym);
> +	struct map *map = args->ms.map;
> +	struct dso *dso = map__dso(map);
> +	struct nscookie nsc;
> +	u64 start = map__rip_2objdump(map, sym->start);
> +	u64 end = map__rip_2objdump(map, sym->end);
> +	u64 len = end - start;
> +	u64 offset;
> +	int i, fd, count;
> +	bool is_64bit = false;
> +	bool needs_cs_close = false;
> +	u8 *buf = NULL;
> +	struct find_file_offset_data data = {
> +		.ip = start,
> +	};
> +	csh handle;
> +	char disasm_buf[512];
> +	struct disasm_line *dl;
> +	u32 *line;
> +	bool disassembler_style = false;
> +
> +	if (args->options->objdump_path)
> +		return -1;
> +
> +	nsinfo__mountns_enter(dso->nsinfo, &nsc);
> +	fd = open(filename, O_RDONLY);
> +	nsinfo__mountns_exit(&nsc);
> +	if (fd < 0)
> +		return -1;
> +
> +	if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
> +			    &is_64bit) == 0)
> +		goto err;
> +
> +	if (!args->options->disassembler_style ||
> +			!strcmp(args->options->disassembler_style, "att"))
> +		disassembler_style = true;
> +
> +	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
> +		goto err;
> +
> +	needs_cs_close = true;
> +
> +	buf = malloc(len);
> +	if (buf == NULL)
> +		goto err;
> +
> +	count = pread(fd, buf, len, data.offset);
> +	close(fd);
> +	fd = -1;
> +
> +	if ((u64)count != len)
> +		goto err;
> +
> +	line = (u32 *)buf;
> +
> +	/* add the function address and name */
> +	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
> +		  start, sym->name);
> +
> +	args->offset = -1;
> +	args->line = disasm_buf;
> +	args->line_nr = 0;
> +	args->fileloc = NULL;
> +	args->ms.sym = sym;
> +
> +	dl = disasm_line__new(args);
> +	if (dl == NULL)
> +		goto err;
> +
> +	annotation_line__add(&dl->al, &notes->src->source);
> +
> +	/*
> +	 * TODO: enable disassm for powerpc
> +	 * count = cs_disasm(handle, buf, len, start, len, &insn);
> +	 *
> +	 * For now, only binary code is saved in disassembled line
> +	 * to be used in "type" and "typeoff" sort keys. Each raw code
> +	 * is 32 bit instruction. So use "len/4" to get the number of
> +	 * entries.
> +	 */
> +	count = len/4;
> +
> +	for (i = 0, offset = 0; i < count; i++) {
> +		args->offset = offset;
> +		sprintf(args->line, "%x", line[i]);
> +
> +		dl = disasm_line__new(args);
> +		if (dl == NULL)
> +			goto err;
> +
> +		annotation_line__add(&dl->al, &notes->src->source);
> +
> +		offset += 4;
> +	}
> +
> +	/* It failed in the middle */
> +	if (offset != len) {
> +		struct list_head *list = &notes->src->source;
> +
> +		/* Discard all lines and fallback to objdump */
> +		while (!list_empty(list)) {
> +			dl = list_first_entry(list, struct disasm_line, al.node);
> +
> +			list_del_init(&dl->al.node);
> +			disasm_line__free(dl);
> +		}
> +		count = -1;
> +	}
> +
> +out:
> +	if (needs_cs_close)
> +		cs_close(&handle);
> +	free(buf);
> +	return count < 0 ? count : 0;
> +
> +err:
> +	if (fd >= 0)
> +		close(fd);
> +	if (needs_cs_close) {
> +		struct disasm_line *tmp;
> +
> +		/*
> +		 * It probably failed in the middle of the above loop.
> +		 * Release any resources it might add.
> +		 */
> +		list_for_each_entry_safe(dl, tmp, &notes->src->source, al.node) {
> +			list_del(&dl->al.node);
> +			free(dl);
> +		}
> +	}
> +	count = -1;
> +	goto out;
> +}
> +
>  static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>  					struct annotate_args *args)
>  {
> @@ -1949,6 +2087,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>  			err = symbol__disassemble_dso(symfs_filename, sym, args);
>  			if (err == 0)
>  				goto out_remove_tmp;
> +#ifdef HAVE_LIBCAPSTONE_SUPPORT
> +			err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
> +			if (err == 0)
> +				goto out_remove_tmp;
> +#endif
>  		}
>  	}
>  
> -- 
> 2.43.0
> 


* Re: [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-06-14 17:26 ` [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
@ 2024-06-25  6:17   ` Namhyung Kim
  2024-06-25 12:45     ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25  6:17 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, adrian.hunter, irogers, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, disgoel

On Fri, Jun 14, 2024 at 10:56:29PM +0530, Athira Rajeev wrote:
> There are cases where a global register variable is defined and
> associated with a specified register. For example, in powerpc, two
> registers are defined to represent variables:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
> 
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");
> 
> These registers are present in the DWARF debug info as DW_OP_reg, as
> part of the variables in the cu_die (compile unit). They are not found
> by the die search over the list of nested scopes, since they are
> global register variables.
> 
> Example for local_paca represented by r13:
> 
> <<>>
>  <1><18dc6b4>: Abbrev Number: 128 (DW_TAG_variable)
>     <18dc6b6>   DW_AT_name        : (indirect string, offset: 0x3861): local_paca
>     <18dc6ba>   DW_AT_decl_file   : 48
>     <18dc6bb>   DW_AT_decl_line   : 36
>     <18dc6bc>   DW_AT_decl_column : 30
>     <18dc6bd>   DW_AT_type        : <0x18dc6c3>
>     <18dc6c1>   DW_AT_external    : 1
>     <18dc6c1>   DW_AT_location    : 1 byte block: 5d    (DW_OP_reg13 (r13))
> 
>  <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
>     <18dc6c4>   DW_AT_byte_size   : 8
>     <18dc6c4>   DW_AT_type        : <0x18dc353>
> 
> where DW_AT_type <0x18dc6c3> further points to:
> 
>  <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
>     <18dc6c4>   DW_AT_byte_size   : 8
>     <18dc6c4>   DW_AT_type        : <0x18dc353>
> 
> which belongs to:
> 
>  <1><18dc353>: Abbrev Number: 67 (DW_TAG_structure_type)
>     <18dc354>   DW_AT_name        : (indirect string, offset: 0x56cd): paca_struct
>     <18dc358>   DW_AT_byte_size   : 2944
>     <18dc35a>   DW_AT_alignment   : 128
>     <18dc35b>   DW_AT_decl_file   : 48
>     <18dc35c>   DW_AT_decl_line   : 61
>     <18dc35d>   DW_AT_decl_column : 8
>     <18dc35d>   DW_AT_sibling     : <0x18dc6b4>
> <<>>
> 
> The case is similar for "r1".
> 
> <<>>
>  <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
>     <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
>     <18dd778>   DW_AT_decl_file   : 51
>     <18dd779>   DW_AT_decl_line   : 1468
>     <18dd77b>   DW_AT_decl_column : 24
>     <18dd77c>   DW_AT_type        : <0x18da5cd>
>     <18dd780>   DW_AT_external    : 1
>     <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))
> 
>  where 18da5cd is:
> 
>  <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
>     <18da5ce>   DW_AT_byte_size   : 8
>     <18da5cf>   DW_AT_encoding    : 7   (unsigned)
>     <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
> <<>>
> 
> To identify the data type for these two special cases, iterate over the
> variables in the CU die (compile unit) and match them with the register.
> If the variable is a base type, i.e. die_get_real_type returns NULL,
> set the offset to zero. With these changes, the data types "paca_struct"
> for r13 and "long unsigned int" for r1 are identified.
> 
> Snippet from ./perf report -s type,type_off
> 
>     12.85%  long unsigned int  long unsigned int +0 (no field)
>      4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>      4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/annotate-data.c      | 40 ++++++++++++++++++++++++++++
>  tools/perf/util/annotate.c           |  8 ++++++
>  tools/perf/util/annotate.h           |  1 +
>  tools/perf/util/include/dwarf-regs.h |  1 +
>  4 files changed, 50 insertions(+)
> 
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 734acdd8c4b7..82232f2d8e16 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -1170,6 +1170,40 @@ static int find_data_type_block(struct data_loc_info *dloc,
>  	return ret;
>  }
>  
> +/*
> + * Handle cases where define a global register variable and
> + * associate it with a specified register. These regs are
> + * present in dwarf debug as DW_OP_reg as part of variables
> + * in the cu_die (compile unit). Iterate over variables in the
> + * cu_die and match with reg to identify data type die.

OK, if they always point to the same type, you may cache the result and
avoid the repeated search every time.
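
A minimal sketch of such a cache, assuming the DIE offset of the matched
variable is what gets remembered. The struct and both helpers are
hypothetical names for illustration, and the table size assumes the 32
powerpc general purpose registers:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CACHED_REGS 32	/* assumption: powerpc GPRs r0..r31 */

/* Hypothetical cache: DIE offset found for each global register. */
struct global_reg_cache {
	bool valid[NR_CACHED_REGS];
	uint64_t die_off[NR_CACHED_REGS];
};

/* Returns true and fills *die_off if reg was cached earlier. */
static bool global_reg_cache_get(const struct global_reg_cache *cache,
				 int reg, uint64_t *die_off)
{
	if (reg < 0 || reg >= NR_CACHED_REGS || !cache->valid[reg])
		return false;

	*die_off = cache->die_off[reg];
	return true;
}

/* Remember the result of a successful CU walk for reg. */
static void global_reg_cache_put(struct global_reg_cache *cache,
				 int reg, uint64_t die_off)
{
	if (reg < 0 || reg >= NR_CACHED_REGS)
		return;

	cache->die_off[reg] = die_off;
	cache->valid[reg] = true;
}
```

Since DIE offsets are only meaningful within one debug info object, a
cache like this would probably have to live alongside it rather than be
global.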

> + */
> +static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
> +		Dwarf_Die *type_die)
> +{
> +	Dwarf_Die vr_die;
> +	int ret = -1;
> +	struct die_var_type *var_types = NULL;
> +
> +	die_collect_vars(cu_die, &var_types);
> +	while (var_types) {
> +		if (var_types->reg == reg) {
> +			if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
> +				if (die_get_real_type(&vr_die, type_die) == NULL) {
> +					dloc->type_offset = 0;
> +					dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
> +				}
> +				pr_debug_type_name(type_die, TSR_KIND_TYPE);
> +				ret = 0;
> +				pr_debug_dtp("found by CU for %s (die:%#lx)\n",
> +						dwarf_diename(type_die), (long)dwarf_dieoffset(type_die));
> +			}
> +			break;
> +		}
> +		var_types = var_types->next;
> +	}

Please add 'delete_var_types(var_types);' here.
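
One detail worth checking when adding the cleanup: the quoted loop
advances var_types itself, so when the loop exits that pointer no longer
refers to the head of the list. A sketch that walks with a separate
cursor and frees from the head, using reduced stand-in types (struct
var_node and both function names are hypothetical, and it assumes the
destructor frees from the pointer it is handed onward):

```c
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for perf's struct die_var_type. */
struct var_node {
	struct var_node *next;
	int reg;
	uint64_t die_off;
};

/* Frees every node from head onward, like a list destructor would. */
static void delete_var_nodes(struct var_node *head)
{
	while (head) {
		struct var_node *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Takes ownership of the list. Returns 0 and stores the DIE offset
 * if a variable bound to reg is found, -1 otherwise.
 */
static int find_global_reg_var(struct var_node *head, int reg,
			       uint64_t *die_off)
{
	int ret = -1;

	/* Walk with a cursor so head still points at the first node. */
	for (struct var_node *pos = head; pos; pos = pos->next) {
		if (pos->reg == reg) {
			*die_off = pos->die_off;
			ret = 0;
			break;
		}
	}

	delete_var_nodes(head);	/* frees the whole list, match or not */
	return ret;
}
```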

Thanks,
Namhyung


> +	return ret;
> +}
> +
>  /* The result will be saved in @type_die */
>  static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
>  {
> @@ -1217,6 +1251,12 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
>  	pr_debug_dtp("CU for %s (die:%#lx)\n",
>  		     dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
>  
> +	if (loc->reg_type == DWARF_REG_GLOBAL) {
> +		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
> +		if (!ret)
> +			goto out;
> +	}
> +
>  	if (reg == DWARF_REG_PC) {
>  		if (get_global_var_type(&cu_die, dloc, dloc->ip, dloc->var_addr,
>  					&offset, type_die)) {
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index bfa6420dc4b9..c7e4fd16e8b4 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2431,6 +2431,14 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
>  			op_loc->reg1 = DWARF_REG_PC;
>  		}
>  
> +		/* Global reg variable 13 and 1
> +		 * assign to DWARF_REG_GLOBAL
> +		 */
> +		if (arch__is(arch, "powerpc")) {
> +			if ((op_loc->reg1 == 13) || (op_loc->reg1 == 1))
> +				op_loc->reg_type = DWARF_REG_GLOBAL;
> +		}
> +
>  		mem_type = find_data_type(&dloc);
>  
>  		if (mem_type == NULL && is_stack_canary(arch, op_loc)) {
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index 9ba772f46270..ad69842a8ebc 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -475,6 +475,7 @@ struct annotated_op_loc {
>  	bool mem_ref;
>  	bool multi_regs;
>  	bool imm;
> +	int reg_type;
>  };
>  
>  enum annotated_insn_ops {
> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
> index 7ea39362ecaf..a873c906a86b 100644
> --- a/tools/perf/util/include/dwarf-regs.h
> +++ b/tools/perf/util/include/dwarf-regs.h
> @@ -5,6 +5,7 @@
>  
>  #define DWARF_REG_PC  0xd3af9c /* random number */
>  #define DWARF_REG_FB  0xd3affb /* random number */
> +#define DWARF_REG_GLOBAL 0xd3affc /* random number */
>  
>  #ifdef HAVE_DWARF_SUPPORT
>  const char *get_arch_regstr(unsigned int n);
> -- 
> 2.43.0
> 


* Re: [V4 01/16] tools/perf: Move the data structures related to register type to header file
  2024-06-25  5:15   ` Namhyung Kim
@ 2024-06-25 10:54     ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 10:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 25 Jun 2024, at 10:45 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Fri, Jun 14, 2024 at 10:56:16PM +0530, Athira Rajeev wrote:
>> Data type profiling uses instruction tracking: it checks each
>> instruction and updates the register type state in some data
>> structures. This is useful to find the data type in cases where the
>> register state gets transferred from one register to another, for
>> example by the "mov" instruction in x86 and the "mr" instruction in
>> powerpc. Currently these structures are defined in annotate-data.c and
>> instruction tracking is implemented only for x86. Move these data
>> structures to the "annotate-data.h" header file so that other arch
>> implementations can use them in arch specific files as well.
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/util/annotate-data.c | 53 +------------------------------
>> tools/perf/util/annotate-data.h | 55 +++++++++++++++++++++++++++++++++
>> 2 files changed, 56 insertions(+), 52 deletions(-)
>> 
>> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
>> index 965da6c0b542..a4c7f98a75e3 100644
>> --- a/tools/perf/util/annotate-data.c
>> +++ b/tools/perf/util/annotate-data.c
>> @@ -31,15 +31,6 @@
>> 
>> static void delete_var_types(struct die_var_type *var_types);
>> 
>> -enum type_state_kind {
>> - TSR_KIND_INVALID = 0,
>> - TSR_KIND_TYPE,
>> - TSR_KIND_PERCPU_BASE,
>> - TSR_KIND_CONST,
>> - TSR_KIND_POINTER,
>> - TSR_KIND_CANARY,
>> -};
>> -
>> #define pr_debug_dtp(fmt, ...) \
>> do { \
>> if (debug_type_profile) \
>> @@ -140,49 +131,7 @@ static void pr_debug_location(Dwarf_Die *die, u64 pc, int reg)
>> }
>> }
>> 
>> -/*
>> - * Type information in a register, valid when @ok is true.
>> - * The @caller_saved registers are invalidated after a function call.
>> - */
>> -struct type_state_reg {
>> - Dwarf_Die type;
>> - u32 imm_value;
>> - bool ok;
>> - bool caller_saved;
>> - u8 kind;
>> -};
>> -
>> -/* Type information in a stack location, dynamically allocated */
>> -struct type_state_stack {
>> - struct list_head list;
>> - Dwarf_Die type;
>> - int offset;
>> - int size;
>> - bool compound;
>> - u8 kind;
>> -};
>> -
>> -/* FIXME: This should be arch-dependent */
>> -#define TYPE_STATE_MAX_REGS  16
>> -
>> -/*
>> - * State table to maintain type info in each register and stack location.
>> - * It'll be updated when new variable is allocated or type info is moved
>> - * to a new location (register or stack).  As it'd be used with the
>> - * shortest path of basic blocks, it only maintains a single table.
>> - */
>> -struct type_state {
>> - /* state of general purpose registers */
>> - struct type_state_reg regs[TYPE_STATE_MAX_REGS];
>> - /* state of stack location */
>> - struct list_head stack_vars;
>> - /* return value register */
>> - int ret_reg;
>> - /* stack pointer register */
>> - int stack_reg;
>> -};
>> -
>> -static bool has_reg_type(struct type_state *state, int reg)
>> +bool has_reg_type(struct type_state *state, int reg)
>> {
>> return (unsigned)reg < ARRAY_SIZE(state->regs);
>> }
>> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
>> index 0a57d9f5ee78..ef235b1b15e1 100644
>> --- a/tools/perf/util/annotate-data.h
>> +++ b/tools/perf/util/annotate-data.h
>> @@ -6,6 +6,9 @@
>> #include <linux/compiler.h>
>> #include <linux/rbtree.h>
>> #include <linux/types.h>
>> +#include "dwarf-aux.h"
>> +#include "annotate.h"
>> +#include "debuginfo.h"
>> 
>> struct annotated_op_loc;
>> struct debuginfo;
>> @@ -15,6 +18,15 @@ struct hist_entry;
>> struct map_symbol;
>> struct thread;
>> 
>> +enum type_state_kind {
>> + TSR_KIND_INVALID = 0,
>> + TSR_KIND_TYPE,
>> + TSR_KIND_PERCPU_BASE,
>> + TSR_KIND_CONST,
>> + TSR_KIND_POINTER,
>> + TSR_KIND_CANARY,
>> +};
>> +
>> /**
>> * struct annotated_member - Type of member field
>> * @node: List entry in the parent list
>> @@ -142,6 +154,48 @@ struct annotated_data_stat {
>> };
>> extern struct annotated_data_stat ann_data_stat;
>> 
>> +/*
>> + * Type information in a register, valid when @ok is true.
>> + * The @caller_saved registers are invalidated after a function call.
>> + */
>> +struct type_state_reg {
>> + Dwarf_Die type;
>> + u32 imm_value;
>> + bool ok;
>> + bool caller_saved;
>> + u8 kind;
>> +};
>> +
>> +/* Type information in a stack location, dynamically allocated */
>> +struct type_state_stack {
>> + struct list_head list;
>> + Dwarf_Die type;
>> + int offset;
>> + int size;
>> + bool compound;
>> + u8 kind;
>> +};
>> +
>> +/* FIXME: This should be arch-dependent */
>> +#define TYPE_STATE_MAX_REGS  32
> 
> Can you please define this for powerpc separately?  I think x86 should
> remain in 16.
> 
> Thanks,
> Namhyung

Sure, I will have this change in V5
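
A possible shape for the per-arch definition, as a sketch only. It assumes a build-time check on the host architecture; since perf can also analyze data recorded on another architecture, V5 may choose the limit differently. The structs are trimmed stand-ins for the ones in annotate-data.h, and ARRAY_SIZE is redefined locally so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: select the register count from the compiler's arch macros. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define TYPE_STATE_MAX_REGS  32	/* r0-r31 */
#else
#define TYPE_STATE_MAX_REGS  16	/* x86 keeps its current value */
#endif

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Trimmed stand-in: the real struct also carries the Dwarf_Die type etc. */
struct type_state_reg {
	bool ok;
	bool caller_saved;
	unsigned char kind;
};

struct type_state {
	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
	int ret_reg;
	int stack_reg;
};

/* Same range check as in the patch: negative values wrap and are rejected. */
static bool has_reg_type(struct type_state *state, int reg)
{
	assert(state != NULL);
	return (unsigned)reg < ARRAY_SIZE(state->regs);
}
```

With this layout the bound used by has_reg_type follows the array size automatically, so only the macro needs to differ per arch.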
> 
>> +
>> +/*
>> + * State table to maintain type info in each register and stack location.
>> + * It'll be updated when new variable is allocated or type info is moved
>> + * to a new location (register or stack).  As it'd be used with the
>> + * shortest path of basic blocks, it only maintains a single table.
>> + */
>> +struct type_state {
>> + /* state of general purpose registers */
>> + struct type_state_reg regs[TYPE_STATE_MAX_REGS];
>> + /* state of stack location */
>> + struct list_head stack_vars;
>> + /* return value register */
>> + int ret_reg;
>> + /* stack pointer register */
>> + int stack_reg;
>> +};
>> +
>> #ifdef HAVE_DWARF_SUPPORT
>> 
>> /* Returns data type at the location (ip, reg, offset) */
>> @@ -160,6 +214,7 @@ void global_var_type__tree_delete(struct rb_root *root);
>> 
>> int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
>> 
>> +bool has_reg_type(struct type_state *state, int reg);
>> #else /* HAVE_DWARF_SUPPORT */
>> 
>> static inline struct annotated_data_type *
>> -- 
>> 2.43.0



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 00/16] Add data type profiling support for powerpc
  2024-06-22  0:06   ` Namhyung Kim
@ 2024-06-25 11:48     ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 11:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel



> On 22 Jun 2024, at 5:36 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Thu, Jun 20, 2024 at 09:01:01PM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 14 Jun 2024, at 10:56 PM, Athira Rajeev <atrajeev@linux.vnet.ibm.com> wrote:
>>> 
>>> The patchset from Namhyung added support for data type profiling
>>> in the perf tool. It associates PMU samples with the data types
>>> they refer to, using DWARF debug information. With upstream perf,
>>> it is currently possible to run perf report or perf annotate to
>>> view the data type information on x86.
>>> 
>>> The initial patchset posted here had the changes needed to enable
>>> data type profiling support for powerpc.
>>> 
>>> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
>>> 
>>> Main changes were:
>>> 1. A powerpc instruction mnemonic table to associate load/store
>>> instructions with move_ops, which is used to identify whether an
>>> instruction is a memory access.
>>> 2. To get the register number and access offset from the given
>>> instruction, the code uses fields from "struct arch" -> objdump.
>>> Added an entry for powerpc here.
>>> 3. A get_arch_regnum to return the register number from the
>>> register name string.
>>> 
>>> But the approach used in the initial patchset relied on parsing the
>>> disassembled code, which is what the current perf tool implementation does.
>>> 
>>> Example: lwz     r10,0(r9)
>>> 
>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>> registers names and offset. Also to find whether there is a memory
>>> reference in the operands, "memory_ref_char" field of objdump is used.
>>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>>> form "mov  (%rax), %rcx".
>>> 
>>> In case of powerpc, instructions using "(" are not the only memory
>>> instructions. For example, the above instruction can also be of extended
>>> form (X form): "lwzx r10,0,r19". In order to easily identify the
>>> instruction category and extract the source/target registers, the second
>>> patchset added support to use the raw instruction. With the raw
>>> instruction, macros are added to extract opcode and register fields.
>>> Link to second patchset:
>>> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
>>> 
>>> Example representation using --show-raw-insn in objdump gives result:
>>> 
>>> 38 01 81 e8     ld      r4,312(r1)
>>> 
>>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>>> as:
>>> _____________________________________
>>> | 58 |  RT  |  RA |      DS       | |
>>> -------------------------------------
>>> 0    6     11    16              30 31
>>> 
>>> The second patchset used "objdump" again to read the raw instruction.
>>> But since there is no need to disassemble and the binary code can be
>>> read directly from the DSO, the third patchset (i.e. this patchset) uses
>>> the approach below. The approach preferred in powerpc to parse samples
>>> for data type profiling in the V3 patchset is:
>>> - Read directly from DSO using dso__data_read_offset
>>> - If that fails for any case, fall back to using libcapstone
>>> - If libcapstone is not supported, the approach will use objdump
>>> 
>>> Patchset adds support to pick the opcode and reg fields from this
>>> raw/binary instruction code. This approach came in from review comment
>>> by Segher Boessenkool and Christophe for the initial patchset.
>>> 
>>> Apart from that, instruction tracking is enabled for powerpc and
>>> support function is added to find variables defined as registers
>>> Example, in powerpc, below two registers are
>>> defined to represent variable:
>>> 1. r13: represents local_paca
>>> register struct paca_struct *local_paca asm("r13");
>>> 
>>> 2. r1: represents stack_pointer
>>> register void *__stack_pointer asm("r1");
>>> 
>>> These are handled in this patchset.
>>> 
>>> - Patch 1 is to rearrange register state type structures to header file
>>> so that they can be referred to from other arch specific files
>>> - Patch 2 is to make instruction tracking a callback in "struct arch"
>>> so that it can be implemented by other archs easily and defined in arch
>>> specific files
>>> - Patch 3 adds support to capture and parse raw instruction in powerpc
>>> using dso__data_read_offset utility
>>> - Patch 4 adds logic to support using objdump when doing default "perf
>>> report" or "perf annotate" since that needs the disassembled instruction.
>>> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
>>> - Patch 6 update parameters for reg extract functions to use raw
>>> instruction on powerpc
>>> - Patch 7 add support to identify memory instructions of opcode 31 in
>>> powerpc
>>> - Patch 8 adds more instructions to support instruction tracking in powerpc
>>> - Patch 9 and 10 handles instruction tracking for powerpc.
>>> - Patch 11, 12 and 13 add support to use libcapstone in powerpc
>>> - Patch 14 and patch 15 handles support to find global register variables
>>> - Patch 16 handles insn-stat option for perf annotate
>>> 
>>> Note:
>>> - There are remaining unknowns (25%) as seen in annotate Instruction stats
>>> below.
>>> - This patchset is not tested on powerpc32. In next step of enhancements
>>> along with handling remaining unknowns, plan to cover powerpc32 changes
>>> based on how testing goes.
>>> 
>>> With the current patchset:
>>> 
>>> ./perf record -a -e mem-loads sleep 1
>>> ./perf report -s type,typeoff --hierarchy --group --stdio
>>> ./perf annotate --data-type --insn-stat
>>> 
>>> perf annotate logs:
>>> ==================
>>> 
>>> Annotate Instruction stats
>>> total 609, ok 446 (73.2%), bad 163 (26.8%)
>>> 
>>> Name/opcode:  Good   Bad
>>> -----------------------------------------------------------
>>> 58                  :   323    80
>>> 32                  :    49    43
>>> 34                  :    33    11
>>> OP_31_XOP_LDX       :     8    20
>>> 40                  :    23     0
>>> OP_31_XOP_LWARX     :     5     1
>>> OP_31_XOP_LWZX      :     2     3
>>> OP_31_XOP_LDARX     :     3     0
>>> 33                  :     0     2
>>> OP_31_XOP_LBZX      :     0     1
>>> OP_31_XOP_LWAX      :     0     1
>>> OP_31_XOP_LHZX      :     0     1
>>> 
>>> perf report logs:
>>> =================
>>> 
>>> Total Lost Samples: 0
>>> 
>>> Samples: 1K of event 'mem-loads'
>>> Event count (approx.): 937238
>>> 
>>> Overhead  Data Type  Data Type Offset
>>> ........  .........  ................
>>> 
>>> 48.60%  (unknown)  (unknown) +0 (no field)
>>> 12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>>>  4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>>>  4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>>>  2.69%  struct paca_struct  struct paca_struct +2808 (canary)
>>>  2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>>>  2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>>>  1.41%  struct vm_fault  struct vm_fault +0 (vma)
>>>  1.29%  struct task_struct  struct task_struct +276 (flags)
>>>  1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
>>>  0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>>>  0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
>>>  0.76%  struct rq  struct rq +2856 (cpu)
>>> 
>>> Thanks
>>> Athira Rajeev
>> 
>> Hi All
>> 
>> Requesting for review comments for this patchset
> 
> Sorry about the delay, I was traveling and busy with other things.
> I'll review this next week!

Thanks Namhyung
> 
> Thanks,
> Namhyung



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-06-25  5:29   ` Namhyung Kim
@ 2024-06-25 12:38     ` Athira Rajeev
  2024-06-25 18:39       ` Namhyung Kim
  0 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 12:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, linux-kernel,
	linux-perf-users, linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 25 Jun 2024, at 10:59 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Fri, Jun 14, 2024 at 10:56:18PM +0530, Athira Rajeev wrote:
>> Add support to capture and parse raw instruction in powerpc.
>> Currently, the perf tool infrastructure uses two ways to disassemble
>> and understand the instruction. One is objdump and other option is
>> via libcapstone.
>> 
>> Currently, the perf tool infrastructure uses the "--no-show-raw-insn"
>> option with "objdump" while disassembling. Example from powerpc with this
>> option for an instruction address is:
>> 
>> Snippet from:
>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>> 
>> c0000000010224b4: lwz     r10,0(r9)
> 
> What about removing --no-show-raw-insn and parse the raw byte code in
> the output for powerpc?  I think it's better to support normal
> annotation together.
Hi Namhyung,

Yes, another patch in the same series adds support for normal annotation as well.
Patch 5 includes changes to work with binary code as well as the mnemonic representation.

Example representation using --show-raw-insn in objdump gives result:

38 01 81 e8 ld r4,312(r1)
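
To make the example concrete, here is a minimal sketch of how those four bytes decode into the DS-form fields. It assumes the bytes are in little-endian memory order, as in the objdump line above; the struct and function names are made up for illustration and are not the ones used in the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields of "ld RT,DS(RA)". */
struct ds_form {
	unsigned int opcode;	/* primary opcode, bits 0-5 in IBM numbering */
	unsigned int rt;	/* target register */
	unsigned int ra;	/* base address register */
	int ds;			/* signed byte displacement */
};

/* Build the 32-bit instruction word from bytes in little-endian order. */
static uint32_t insn_from_le_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static void decode_ds_form(uint32_t insn, struct ds_form *out)
{
	int disp = (int)(insn & 0xfffc);	/* low two bits select the variant */

	assert(out != NULL);

	if (disp & 0x8000)			/* sign-extend the 16-bit field */
		disp -= 0x10000;

	out->opcode = (insn >> 26) & 0x3f;
	out->rt = (insn >> 21) & 0x1f;
	out->ra = (insn >> 16) & 0x1f;
	out->ds = disp;
}
```

For "38 01 81 e8" the word is 0xe8810138, which gives opcode 58, RT 4, RA 1 and displacement 312, matching "ld r4,312(r1)".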

Patch 5 has changes to use "objdump" with --show-raw-insn to read the raw instruction and also support normal annotation.
For data type profiling with only the sort keys (type, typeoff), there is no need to disassemble and then extract the raw byte code.
The binary code can be read directly from the DSO, which is faster than going through objdump in this case.
In summary, the current patchset uses the approach below:

1. Read directly from the DSO using dso__data_read_offset if only "type, typeoff" is needed.
2. If reading directly from the DSO fails for any reason, fall back to libcapstone. Reading via libcapstone is faster than objdump.
3. If libcapstone is not supported, the approach will use objdump. The patchset has changes to handle objdump output created with --show-raw-insn on powerpc.
4. For normal perf report or perf annotate, the approach will also use objdump.
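
The selection order in the four points above can be modelled roughly as below. This is only an illustration of the decision, not the code in the patchset: the enum, the struct and pick_backend are hypothetical names, and in perf the choice is made by trying each symbol__disassemble_* function in turn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three ways of getting instruction data. */
enum insn_backend {
	BACKEND_DSO_RAW,	/* dso__data_read_offset, raw bytes only */
	BACKEND_CAPSTONE,	/* libcapstone, used here only for raw bytes */
	BACKEND_OBJDUMP,	/* objdump with --show-raw-insn */
};

/* Hypothetical summary of what the caller needs and what is available. */
struct insn_request {
	bool need_mnemonic;	/* normal perf report / perf annotate output */
	bool dso_read_ok;	/* reading from the DSO succeeded */
	bool have_capstone;	/* built with libcapstone support */
};

static enum insn_backend pick_backend(const struct insn_request *req)
{
	assert(req != NULL);

	if (req->need_mnemonic)		/* point 4: disassembly is required */
		return BACKEND_OBJDUMP;
	if (req->dso_read_ok)		/* point 1: fastest path */
		return BACKEND_DSO_RAW;
	if (req->have_capstone)		/* point 2: first fallback */
		return BACKEND_CAPSTONE;
	return BACKEND_OBJDUMP;		/* point 3: last resort */
}
```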

NOTE:
libcapstone is currently used only for reading the raw binary code; disassembly is not enabled. While attempting cs_disasm, some instructions were not identified (e.g. extswsli, maddld), so it had to fall back to objdump. Hence enabling "cs_disasm" is left as a TODO for powerpc in a comment in patch 13.

Thanks
Athira

> 
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. Also to find whether there is a memory
>> reference in the operands, "memory_ref_char" field of objdump is used.
>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>> form "mov  (%rax), %rcx".
>> 
>> In case of powerpc, instructions using "(" are not the only memory
>> instructions. For example, the above instruction can also be of extended
>> form (X form): "lwzx r10,0,r19". In order to easily identify the
>> instruction category and extract the source/target registers, the patch
>> adds support to use the raw instruction for powerpc. The approach is to
>> read the raw instruction directly from the DSO file using the
>> "dso__data_read_offset" utility, which is already implemented in the perf
>> infrastructure in "util/dso.c".
>> 
>> Example:
>> 
>> 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>> as:
>> 
>>   | 58 |  RT  |  RA |      DS       | |
>>   -------------------------------------
>>   0    6     11    16              30 31
>> 
>> Function "symbol__disassemble_dso" is updated to read raw instruction
>> directly from DSO using dso__data_read_offset utility. In case of
>> above example, this captures:
>> line:    38 01 81 e8
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/util/disasm.c | 98 ++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 98 insertions(+)
>> 
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index b5fe3a7508bb..f19496133bf0 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -1586,6 +1586,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>> }
>> #endif
>> 
>> +static int symbol__disassemble_dso(char *filename, struct symbol *sym,
> 
> Maybe rename to symbol__disassemble_raw() ?

This function specifically uses dso__data_read_offset, hence the name symbol__disassemble_dso.
> 
>> + struct annotate_args *args)
>> +{
>> + struct annotation *notes = symbol__annotation(sym);
>> + struct map *map = args->ms.map;
>> + struct dso *dso = map__dso(map);
>> + u64 start = map__rip_2objdump(map, sym->start);
>> + u64 end = map__rip_2objdump(map, sym->end);
>> + u64 len = end - start;
>> + u64 offset;
>> + int i, count;
>> + u8 *buf = NULL;
>> + char disasm_buf[512];
>> + struct disasm_line *dl;
>> + u32 *line;
>> +
>> + /* Return if objdump is specified explicitly */
>> + if (args->options->objdump_path)
>> + return -1;
>> +
>> + pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);
> 
> You may want to print the actual offset and remove the "using
> dso__data_read_offset" part.

Ok Sure
> 
> Thanks,
> Namhyung
> 
>> +
>> + buf = malloc(len);
>> + if (buf == NULL)
>> + goto err;
>> +
>> + count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
>> +
>> + line = (u32 *)buf;
>> +
>> + if ((u64)count != len)
>> + goto err;
>> +
>> + /* add the function address and name */
>> + scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
>> +   start, sym->name);
>> +
>> + args->offset = -1;
>> + args->line = disasm_buf;
>> + args->line_nr = 0;
>> + args->fileloc = NULL;
>> + args->ms.sym = sym;
>> +
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, &notes->src->source);
>> +
>> + /* Each raw instruction is 4 byte */
>> + count = len/4;
>> +
>> + for (i = 0, offset = 0; i < count; i++) {
>> + args->offset = offset;
>> + sprintf(args->line, "%x", line[i]);
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, &notes->src->source);
>> + offset += 4;
>> + }
>> +
>> + /* It failed in the middle */
>> + if (offset != len) {
>> + struct list_head *list = &notes->src->source;
>> +
>> + /* Discard all lines and fallback to objdump */
>> + while (!list_empty(list)) {
>> + dl = list_first_entry(list, struct disasm_line, al.node);
>> +
>> + list_del_init(&dl->al.node);
>> + disasm_line__free(dl);
>> + }
>> + count = -1;
>> + }
>> +
>> +out:
>> + free(buf);
>> + return count < 0 ? count : 0;
>> +
>> +err:
>> + count = -1;
>> + goto out;
>> +}
>> /*
>>  * Possibly create a new version of line with tabs expanded. Returns the
>>  * existing or new line, storage is updated if a new line is allocated. If
>> @@ -1710,6 +1795,19 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>> strcpy(symfs_filename, tmp);
>> }
>> 
>> + /*
>> +  * For powerpc data type profiling, use the dso__data_read_offset
>> +  * to read raw instruction directly and interpret the binary code
>> +  * to understand instructions and register fields. For sort keys as
>> +  * type and typeoff, disassemble to mnemonic notation is
>> +  * not required in case of powerpc.
>> +  */
>> + if (arch__is(args->arch, "powerpc")) {
>> + err = symbol__disassemble_dso(symfs_filename, sym, args);
>> + if (err == 0)
>> + goto out_remove_tmp;
>> + }
>> +
>> #ifdef HAVE_LIBCAPSTONE_SUPPORT
>> err = symbol__disassemble_capstone(symfs_filename, sym, args);
>> if (err == 0)
>> -- 
>> 2.43.0



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-25  5:39   ` Namhyung Kim
@ 2024-06-25 12:42     ` Athira Rajeev
  2024-06-25 18:45       ` Namhyung Kim
  0 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 12:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel



> On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
>> Currently, the perf tool infrastructure uses the disasm_line__parse
>> function to parse a disassembled line.
>> 
>> Example snippet from objdump:
>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>> 
>> c0000000010224b4: lwz     r10,0(r9)
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. In powerpc, the approach for data type
>> profiling uses raw instruction instead of result from objdump to identify
>> the instruction category and extract the source/target registers.
>> 
>> Example: 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. Add function
>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
>> Also update "struct disasm_line" to save the binary code.
>> With the change, function captures:
>> 
>> line -> "38 01 81 e8     ld      r4,312(r1)"
>> raw instruction "38 01 81 e8"
>> 
>> Raw instruction is used later to extract the reg/offset fields. Macros
>> are added to extract opcode and register fields. "struct disasm_line"
>> is updated to carry a 32-bit union of "bytes" and "raw_insn" holding the
>> raw code. Function "disasm_line__parse_powerpc" fills the raw
>> instruction hex value and can use macros to get the opcode. There are no
>> changes in existing code paths, which parse the disassembled code.
>> Architectures using the instruction name and the present approach are
>> not altered. Since this approach targets powerpc, the macro
>> implementation is added for powerpc as of now.
>> 
>> Since disasm_line__parse is used in other cases (perf annotate) and
>> not only data type profiling, the powerpc callback includes changes to
>> work with binary code as well as the mnemonic representation. Also, if
>> the DSO read fails and libcapstone is not supported, the approach
>> falls back to objdump. Hence the patch has changes to ensure the
>> objdump option also works well.
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/include/linux/string.h                  |  2 +
>> tools/lib/string.c                            | 13 ++++
>> .../perf/arch/powerpc/annotate/instructions.c |  1 +
>> tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 +++
>> tools/perf/util/annotate.h                    |  5 +-
>> tools/perf/util/disasm.c                      | 59 ++++++++++++++++++-
>> 6 files changed, 87 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
>> index db5c99318c79..0acb1fc14e19 100644
>> --- a/tools/include/linux/string.h
>> +++ b/tools/include/linux/string.h
>> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>> 
>> extern char *strim(char *);
>> 
>> +extern void remove_spaces(char *s);
>> +
>> extern void *memchr_inv(const void *start, int c, size_t bytes);
>> #endif /* _TOOLS_LINUX_STRING_H_ */
>> diff --git a/tools/lib/string.c b/tools/lib/string.c
>> index 8b6892f959ab..3126d2cff716 100644
>> --- a/tools/lib/string.c
>> +++ b/tools/lib/string.c
>> @@ -153,6 +153,19 @@ char *strim(char *s)
>> return skip_spaces(s);
>> }
>> 
>> +/*
>> + * remove_spaces - Removes whitespaces from @s
>> + */
>> +void remove_spaces(char *s)
>> +{
>> + char *d = s;
>> +
>> + do {
>> + while (*d == ' ')
>> + ++d;
>> + } while ((*s++ = *d++));
>> +}
>> +
>> /**
>>  * strreplace - Replace all occurrences of character in string.
>>  * @s: The string to operate on.
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
>> index a3f423c27cae..d57fd023ef9c 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>> arch->initialized = true;
>> arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>> arch->objdump.comment_char      = '#';
>> + annotate_opts.show_asm_raw = true;
> 
> Right, I think this will add the raw insn in the output of objdump, no?
> Why not using the information?

I have responded to this in my reply on the earlier patch in this series.
> 
>> }
>> 
>> return 0;
>> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> index 0c4f4caf53ac..430623ca5612 100644
>> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
>> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
>> return roff->ptregs_offset;
>> return -EINVAL;
>> }
>> +
>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>> +#define PPC_RA(a) (((a) >> 16) & 0x1f)
>> +#define PPC_RT(t) (((t) >> 21) & 0x1f)
>> +#define PPC_RB(b) (((b) >> 11) & 0x1f)
>> +#define PPC_D(D) ((D) & 0xfffe)
>> +#define PPC_DS(DS) ((DS) & 0xfffc)
>> +#define OP_LD 58
>> +#define OP_STD 62
>> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
>> index d5c821c22f79..9ba772f46270 100644
>> --- a/tools/perf/util/annotate.h
>> +++ b/tools/perf/util/annotate.h
>> @@ -113,7 +113,10 @@ struct annotation_line {
>> struct disasm_line {
>> struct ins  ins;
>> struct ins_operands  ops;
>> -
>> + union {
>> + u8 bytes[4];
>> + u32 raw_insn;
>> + } raw;
>> /* This needs to be at the end. */
>> struct annotation_line  al;
>> };
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index b81cdcf4d6b4..1e8568738b38 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -45,6 +45,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>> 
>> static void ins__sort(struct arch *arch);
>> static int disasm_line__parse(char *line, const char **namep, char **rawp);
>> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
>> 
>> static __attribute__((constructor)) void symbol__init_regexpr(void)
>> {
>> @@ -844,6 +845,59 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>> return -1;
>> }
>> 
>> +/*
>> + * Parses the result captured from symbol__disassemble_*
>> + * Example, line read from DSO file in powerpc:
>> + * line:    38 01 81 e8
>> + * opcode: fetched from arch specific get_opcode_insn
>> + * rawp_insn: e8810138
>> + *
>> + * rawp_insn is used later to extract the reg/offset fields
>> + */
>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>> +
>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
>> +{
>> + char *line = dl->al.line;
>> + const char **namep = &dl->ins.name;
>> + char **rawp = &dl->ops.raw;
>> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
>> + char *name = skip_spaces(name_raw_insn + 11);
>> + int objdump = 0;
>> +
>> + if (strlen(line) > 11)
>> + objdump = 1;
>> +
>> + if (name_raw_insn[0] == '\0')
>> + return -1;
>> +
>> + if (objdump) {
>> + *rawp = name + 1;
>> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
>> + ++*rawp;
>> + tmp = (*rawp)[0];
>> + (*rawp)[0] = '\0';
>> +
>> + *namep = strdup(name);
>> + if (*namep == NULL)
>> + return -1;
>> +
>> + (*rawp)[0] = tmp;
>> + *rawp = strim(*rawp);
>> + } else
>> + *namep = "";
>> +
>> + tmp_raw_insn = strdup(name_raw_insn);
>> + tmp_raw_insn[11] = '\0';
>> + remove_spaces(tmp_raw_insn);
>> +
>> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
>> + if (objdump)
>> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
> 
> Hmm.. can you use a sscanf() instead?
> 
>  sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
> 
> Thanks,
> Namhyung
> 
Sure will address in V5

Thanks
Athira
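
A rough sketch of what the sscanf() variant could look like for the objdump line format. Two assumptions here: "%hhx" is used instead of "%x" because the destination is a byte array (plain "%x" expects an unsigned int pointer), and the word is assembled explicitly from little-endian byte order instead of relying on the union layout. parse_raw_insn is a hypothetical helper, not the function that will be in V5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the four leading hex bytes of an objdump --show-raw-insn line,
 * e.g. "38 01 81 e8     ld      r4,312(r1)".
 * Returns 0 and stores the instruction word on success, -1 otherwise.
 */
static int parse_raw_insn(const char *line, uint32_t *insn)
{
	uint8_t b[4];

	assert(line != NULL && insn != NULL);

	/* Each %hhx reads one hex byte; leading whitespace is skipped. */
	if (sscanf(line, "%hhx %hhx %hhx %hhx", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;

	/* Bytes are printed in memory order, assumed little-endian here. */
	*insn = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
		((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

For the example line this yields 0xe8810138, the same value the be32_to_cpu(strtol(...)) path produces on a little-endian host. The line read from the DSO path ("%x" of the whole word, no spaces) would still need separate handling.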
>> +
>> + return 0;
>> +}
>> +
>> static void annotation_line__init(struct annotation_line *al,
>>   struct annotate_args *args,
>>   int nr)
>> @@ -897,7 +951,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>> goto out_delete;
>> 
>> if (args->offset != -1) {
>> - if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>> + if (arch__is(args->arch, "powerpc")) {
>> + if (disasm_line__parse_powerpc(dl) < 0)
>> + goto out_free_line;
>> + } else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>> goto out_free_line;
>> 
>> disasm_line__init_ins(dl, args->arch, &args->ms);
>> -- 
>> 2.43.0



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-06-25  6:00   ` Namhyung Kim
@ 2024-06-25 12:43     ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 12:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, linux-kernel,
	linux-perf-users, linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 25 Jun 2024, at 11:30 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Fri, Jun 14, 2024 at 10:56:21PM +0530, Athira Rajeev wrote:
>> Use the raw instruction code and macros to identify memory instructions,
>> extract register fields and also offset. The implementation addresses
>> the D-form, X-form, DS-form instructions. Two main functions are added.
>> A new parse function "load_store__parse" is the instruction ops parser for
>> memory instructions. Unlike other parsers (like mov__parse), this parser
>> fills in the "multi_regs" field for source/target and the newly added
>> "mem_ref" field. No other fields are set because there is no need to parse
>> the disassembled code here; arch specific macros take care of extracting
>> the offset and regs, which is easier and more precise.
>> 
>> In powerpc, all instructions with a primary opcode from 32 to 63
>> are memory instructions. Update the "ins__find" function to also take
>> "raw_insn" as a parameter. Don't use "extract_reg_offset"; instead use the
>> newly added function "get_arch_regs", which sets these fields: reg1,
>> reg2, offset, depending on whether it is the source or target op.
>> 
>> Update "parse" callback for "struct ins_ops" to also pass "struct
>> disasm_line" as argument. This is needed in parse functions where opcode
>> is used to determine whether to set multi_regs.
> 
> Can you please split "ins__find" change and "parse" change into separate
> commits?

Ok. 
> 
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/arch/arm64/annotate/instructions.c |  3 +-
>> .../arch/loongarch/annotate/instructions.c    |  6 +-
>> .../perf/arch/powerpc/annotate/instructions.c | 16 ++++
>> tools/perf/arch/powerpc/util/dwarf-regs.c     | 44 +++++++++++
>> tools/perf/arch/s390/annotate/instructions.c  |  5 +-
>> tools/perf/util/annotate.c                    | 25 ++++++-
>> tools/perf/util/disasm.c                      | 73 ++++++++++++++++---
>> tools/perf/util/disasm.h                      |  6 +-
>> tools/perf/util/include/dwarf-regs.h          |  3 +
>> 9 files changed, 159 insertions(+), 22 deletions(-)
>> 
>> diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
>> index 4af0c3a0f86e..f86d9f4798bd 100644
>> --- a/tools/perf/arch/arm64/annotate/instructions.c
>> +++ b/tools/perf/arch/arm64/annotate/instructions.c
>> @@ -11,7 +11,8 @@ struct arm64_annotate {
>> 
>> static int arm64_mov__parse(struct arch *arch __maybe_unused,
>>     struct ins_operands *ops,
>> -     struct map_symbol *ms __maybe_unused)
>> +     struct map_symbol *ms __maybe_unused,
>> +     struct disasm_line *dl __maybe_unused)
>> {
>> char *s = strchr(ops->raw, ','), *target, *endptr;
>> 
>> diff --git a/tools/perf/arch/loongarch/annotate/instructions.c b/tools/perf/arch/loongarch/annotate/instructions.c
>> index 21cc7e4149f7..ab43b1ab51e3 100644
>> --- a/tools/perf/arch/loongarch/annotate/instructions.c
>> +++ b/tools/perf/arch/loongarch/annotate/instructions.c
>> @@ -5,7 +5,8 @@
>>  * Copyright (C) 2020-2023 Loongson Technology Corporation Limited
>>  */
>> 
>> -static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
>> +static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> char *c, *endptr, *tok, *name;
>> struct map *map = ms->map;
>> @@ -51,7 +52,8 @@ static struct ins_ops loongarch_call_ops = {
>> .scnprintf = call__scnprintf,
>> };
>> 
>> -static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
>> +static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> struct map *map = ms->map;
>> struct symbol *sym = ms->sym;
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
>> index d57fd023ef9c..10fea5e5cf4c 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -49,6 +49,22 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
>> return ops;
>> }
>> 
>> +#define PPC_OP(op)      (((op) >> 26) & 0x3F)
>> +
>> +static struct ins_ops *check_ppc_insn(int raw_insn)
> 
> It'd be nice to use 'u32' instead of 'int' for raw_insn if you want to
> do some bit operations.
Sure
> 
>> +{
>> + int opcode = PPC_OP(raw_insn);
>> +
>> + /*
>> +  * Instructions with opcode 32 to 63 are memory
>> +  * instructions in powerpc
>> +  */
>> + if ((opcode & 0x20))
>> + return &load_store_ops;
>> +
>> + return NULL;
>> +}
>> +
>> static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>> {
>> if (!arch->initialized) {
>> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> index 430623ca5612..e01729f3c0b3 100644
>> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
>> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> @@ -107,3 +107,47 @@ int regs_query_register_offset(const char *name)
>> #define PPC_DS(DS) ((DS) & 0xfffc)
>> #define OP_LD 58
>> #define OP_STD 62
>> +
>> +static int get_source_reg(unsigned int raw_insn)
>> +{
>> + return PPC_RA(raw_insn);
> 
> Ditto, and others too.
Ok
> 
> 
>> +}
>> +
>> +static int get_target_reg(unsigned int raw_insn)
>> +{
>> + return PPC_RT(raw_insn);
>> +}
>> +
>> +static int get_offset_opcode(int raw_insn)
>> +{
>> + int opcode = PPC_OP(raw_insn);
>> +
>> + /* DS- form */
>> + if ((opcode == OP_LD) || (opcode == OP_STD))
>> + return PPC_DS(raw_insn);
>> + else
>> + return PPC_D(raw_insn);
>> +}
>> +
>> +/*
>> + * Fills the required fields for op_loc depending on if it
>> + * is a source or target.
>> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
>> + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
>> + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
>> + */
>> +void get_arch_regs(int raw_insn, int is_source,
>> + struct annotated_op_loc *op_loc)
>> +{
>> + if (is_source)
>> + op_loc->reg1 = get_source_reg(raw_insn);
>> + else
>> + op_loc->reg1 = get_target_reg(raw_insn);
>> +
>> + if (op_loc->multi_regs)
>> + op_loc->reg2 = PPC_RB(raw_insn);
>> +
>> + /* TODO: Implement offset handling for X Form */
>> + if ((op_loc->mem_ref) && (PPC_OP(raw_insn) != 31))
>> + op_loc->offset = get_offset_opcode(raw_insn);
>> +}
>> diff --git a/tools/perf/arch/s390/annotate/instructions.c b/tools/perf/arch/s390/annotate/instructions.c
>> index da5aa3e1f04c..eeac25cca699 100644
>> --- a/tools/perf/arch/s390/annotate/instructions.c
>> +++ b/tools/perf/arch/s390/annotate/instructions.c
>> @@ -2,7 +2,7 @@
>> #include <linux/compiler.h>
>> 
>> static int s390_call__parse(struct arch *arch, struct ins_operands *ops,
>> -     struct map_symbol *ms)
>> +     struct map_symbol *ms, struct disasm_line *dl __maybe_unused)
>> {
>> char *endptr, *tok, *name;
>> struct map *map = ms->map;
>> @@ -52,7 +52,8 @@ static struct ins_ops s390_call_ops = {
>> 
>> static int s390_mov__parse(struct arch *arch __maybe_unused,
>>    struct ins_operands *ops,
>> -    struct map_symbol *ms __maybe_unused)
>> +    struct map_symbol *ms __maybe_unused,
>> +    struct disasm_line *dl __maybe_unused)
>> {
>> char *s = strchr(ops->raw, ','), *target, *endptr;
>> 
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 1451caf25e77..bfa6420dc4b9 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -2079,6 +2079,12 @@ static int extract_reg_offset(struct arch *arch, const char *str,
>> return 0;
>> }
>> 
>> +__weak void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused,
>> + struct annotated_op_loc *op_loc __maybe_unused)
> 
> I'd like to avoid adding weak functions if possible.  It's supposed to
> be powerpc only, maybe you can add get_powerpc_regs() in the arch
> directory and add a dummy static inline somewhere under #ifndef.
> 
>> +{
>> + return;
>> +}
>> +
>> /**
>>  * annotate_get_insn_location - Get location of instruction
>>  * @arch: the architecture info
>> @@ -2123,20 +2129,33 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
>> for_each_insn_op_loc(loc, i, op_loc) {
>> const char *insn_str = ops->source.raw;
>> bool multi_regs = ops->source.multi_regs;
>> + bool mem_ref = ops->source.mem_ref;
>> 
>> if (i == INSN_OP_TARGET) {
>> insn_str = ops->target.raw;
>> multi_regs = ops->target.multi_regs;
>> + mem_ref = ops->target.mem_ref;
>> }
>> 
>> /* Invalidate the register by default */
>> op_loc->reg1 = -1;
>> op_loc->reg2 = -1;
>> 
>> - if (insn_str == NULL)
>> - continue;
>> + if (insn_str == NULL) {
>> + if (!arch__is(arch, "powerpc"))
>> + continue;
>> + }
>> 
>> - if (strchr(insn_str, arch->objdump.memory_ref_char)) {
>> + /*
>> +  * For powerpc, call get_arch_regs function which extracts the
>> +  * required fields for op_loc, ie reg1, reg2, offset from the
>> +  * raw instruction.
>> +  */
>> + if (arch__is(arch, "powerpc")) {
>> + op_loc->mem_ref = mem_ref;
>> + op_loc->multi_regs = multi_regs;
>> + get_arch_regs(dl->raw.raw_insn, !i, op_loc);
>> + } else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
>> op_loc->mem_ref = true;
>> op_loc->multi_regs = multi_regs;
>> extract_reg_offset(arch, insn_str, op_loc);
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index 1e8568738b38..8428df0b9c17 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -37,6 +37,7 @@ static struct ins_ops mov_ops;
>> static struct ins_ops nop_ops;
>> static struct ins_ops lock_ops;
>> static struct ins_ops ret_ops;
>> +static struct ins_ops load_store_ops;
>> 
>> static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>>    struct ins_operands *ops, int max_ins_name);
>> @@ -254,7 +255,8 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2)
>> return arch->ins_is_fused(arch, ins1, ins2);
>> }
>> 
>> -static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
>> +static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> char *endptr, *tok, *name;
>> struct map *map = ms->map;
>> @@ -349,7 +351,8 @@ static inline const char *validate_comma(const char *c, struct ins_operands *ops
>> return c;
>> }
>> 
>> -static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
>> +static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> struct map *map = ms->map;
>> struct symbol *sym = ms->sym;
>> @@ -508,7 +511,8 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
>> return 0;
>> }
>> 
>> -static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
>> +static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> ops->locked.ops = zalloc(sizeof(*ops->locked.ops));
>> if (ops->locked.ops == NULL)
>> @@ -517,13 +521,13 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>> if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
>> goto out_free_ops;
>> 
>> - ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
>> + ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name, 0);
>> 
>> if (ops->locked.ins.ops == NULL)
>> goto out_free_ops;
>> 
>> if (ops->locked.ins.ops->parse &&
>> -     ops->locked.ins.ops->parse(arch, ops->locked.ops, ms) < 0)
>> +     ops->locked.ins.ops->parse(arch, ops->locked.ops, ms, NULL) < 0)
>> goto out_free_ops;
>> 
>> return 0;
>> @@ -594,7 +598,8 @@ static bool check_multi_regs(struct arch *arch, const char *op)
>> return count > 1;
>> }
>> 
>> -static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
>> +static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> char *s = strchr(ops->raw, ','), *target, *comment, prev;
>> 
>> @@ -672,7 +677,39 @@ static struct ins_ops mov_ops = {
>> .scnprintf = mov__scnprintf,
>> };
>> 
>> -static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
>> +static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
>> + struct ins_operands *ops, int max_ins_name)
>> +{
>> + return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
>> + ops->raw);
>> +}
>> +
>> +/*
>> + * Sets the fields: multi_regs and "mem_ref".
>> + * "mem_ref" is set for ops->source which is later used to
>> + * fill the objdump->memory_ref-char field. This ops is currently
>> + * used by powerpc and since binary instruction code is used to
>> + * extract opcode, regs and offset, no other parsing is needed here
>> + */
>> +static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
>> + struct map_symbol *ms __maybe_unused, struct disasm_line *dl __maybe_unused)
>> +{
>> + ops->source.mem_ref = true;
>> + ops->source.multi_regs = false;
>> +
>> + ops->target.mem_ref = false;
>> + ops->target.multi_regs = false;
>> +
>> + return 0;
>> +}
>> +
>> +static struct ins_ops load_store_ops = {
>> + .parse     = load_store__parse,
>> + .scnprintf = load_store__scnprintf,
>> +};
>> +
>> +static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
>> + struct disasm_line *dl __maybe_unused)
>> {
>> char *target, *comment, *s, prev;
>> 
>> @@ -762,11 +799,23 @@ static void ins__sort(struct arch *arch)
>> qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
>> }
>> 
>> -static struct ins_ops *__ins__find(struct arch *arch, const char *name)
>> +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int raw_insn)
>> {
>> struct ins *ins;
>> const int nmemb = arch->nr_instructions;
>> 
>> + if (arch__is(arch, "powerpc")) {
>> + /*
>> +  * For powerpc, identify the instruction ops
>> +  * from the opcode using raw_insn.
>> +  */
>> + struct ins_ops *ops;
>> +
>> + ops = check_ppc_insn(raw_insn);
>> + if (ops)
>> + return ops;
>> + }
>> +
>> if (!arch->sorted_instructions) {
>> ins__sort(arch);
>> arch->sorted_instructions = true;
>> @@ -796,9 +845,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name)
>> return ins ? ins->ops : NULL;
>> }
>> 
>> -struct ins_ops *ins__find(struct arch *arch, const char *name)
>> +struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn)
>> {
>> - struct ins_ops *ops = __ins__find(arch, name);
>> + struct ins_ops *ops = __ins__find(arch, name, raw_insn);
>> 
>> if (!ops && arch->associate_instruction_ops)
>> ops = arch->associate_instruction_ops(arch, name);
>> @@ -808,12 +857,12 @@ struct ins_ops *ins__find(struct arch *arch, const char *name)
>> 
>> static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
>> {
>> - dl->ins.ops = ins__find(arch, dl->ins.name);
>> + dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
>> 
>> if (!dl->ins.ops)
>> return;
>> 
>> - if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms) < 0)
>> + if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms, dl) < 0)
>> dl->ins.ops = NULL;
>> }
>> 
>> diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
>> index 718177fa4775..6b6ec23e4f6f 100644
>> --- a/tools/perf/util/disasm.h
>> +++ b/tools/perf/util/disasm.h
>> @@ -57,6 +57,7 @@ struct ins_operands {
>> bool offset_avail;
>> bool outside;
>> bool multi_regs;
>> + bool mem_ref;
>> } target;
>> union {
>> struct {
>> @@ -64,6 +65,7 @@ struct ins_operands {
>> char *name;
>> u64 addr;
>> bool multi_regs;
>> + bool mem_ref;
>> } source;
>> struct {
>> struct ins     ins;
>> @@ -78,7 +80,7 @@ struct ins_operands {
>> 
>> struct ins_ops {
>> void (*free)(struct ins_operands *ops);
>> - int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms);
>> + int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms, struct disasm_line *dl);
> 
> The line is too long, please break.
> 
> Thanks,
> Namhyung
> 
> 
>> int (*scnprintf)(struct ins *ins, char *bf, size_t size,
>>  struct ins_operands *ops, int max_ins_name);
>> };
>> @@ -97,7 +99,7 @@ struct annotate_args {
>> struct arch *arch__find(const char *name);
>> bool arch__is(struct arch *arch, const char *name);
>> 
>> -struct ins_ops *ins__find(struct arch *arch, const char *name);
>> +struct ins_ops *ins__find(struct arch *arch, const char *name, int raw_insn);
>> int ins__scnprintf(struct ins *ins, char *bf, size_t size,
>>    struct ins_operands *ops, int max_ins_name);
>> 
>> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
>> index 01fb25a1150a..7ea39362ecaf 100644
>> --- a/tools/perf/util/include/dwarf-regs.h
>> +++ b/tools/perf/util/include/dwarf-regs.h
>> @@ -1,6 +1,7 @@
>> /* SPDX-License-Identifier: GPL-2.0 */
>> #ifndef _PERF_DWARF_REGS_H_
>> #define _PERF_DWARF_REGS_H_
>> +#include "annotate.h"
>> 
>> #define DWARF_REG_PC  0xd3af9c /* random number */
>> #define DWARF_REG_FB  0xd3affb /* random number */
>> @@ -31,6 +32,8 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
>> }
>> #endif
>> 
>> +void get_arch_regs(int raw_insn, int is_source, struct annotated_op_loc *op_loc);
>> +
>> #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
>> /*
>>  * Arch should support fetching the offset of a register in pt_regs
>> -- 
>> 2.43.0
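
For reference, a rough standalone sketch of the field extraction discussed in this patch. This is not the patch's code. It uses a 32-bit unsigned type as suggested in the review. The helper names, the RT/RA shift values and the sign extension of the displacement are assumptions made here, based on the standard D/DS instruction layout; the PPC_RA/PPC_RT/PPC_D macros used by the patch are defined outside the quoted context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the instruction word. */
static inline uint32_t ppc_op(uint32_t insn) { return (insn >> 26) & 0x3f; }

/* RT and RA: assumed to be the two 5-bit fields following the opcode. */
static inline uint32_t ppc_rt(uint32_t insn) { return (insn >> 21) & 0x1f; }
static inline uint32_t ppc_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }

/* Same test as the quoted check_ppc_insn(): primary opcodes 32 to 63. */
static inline bool ppc_is_load_store(uint32_t insn)
{
	return (ppc_op(insn) & 0x20) != 0;
}

/*
 * Displacement of a D-form or DS-form instruction. DS-form (ld = 58,
 * std = 62) uses the low two bits as an extended opcode, so they are
 * masked off. Sign-extending the 16-bit field is an assumption of this
 * sketch; the quoted patch returns the masked value as is.
 */
static inline int32_t ppc_disp(uint32_t insn)
{
	uint32_t op = ppc_op(insn);
	uint32_t raw = insn & 0xffff;

	if (op == 58 || op == 62)
		raw &= 0xfffc;

	/* Portable sign extension of a 16-bit value. */
	return (int32_t)(raw ^ 0x8000) - 0x8000;
}

/* Checks the decoders against two encodings mentioned in this thread. */
static void ppc_decode_selftest(void)
{
	uint32_t ld = 0xe8810138;	/* ld  r4,312(r1) */
	uint32_t lwz = 0x81490000;	/* lwz r10,0(r9)  */

	assert(ppc_op(ld) == 58 && ppc_is_load_store(ld));
	assert(ppc_rt(ld) == 4 && ppc_ra(ld) == 1 && ppc_disp(ld) == 312);

	assert(ppc_op(lwz) == 32 && ppc_is_load_store(lwz));
	assert(ppc_rt(lwz) == 10 && ppc_ra(lwz) == 9 && ppc_disp(lwz) == 0);
}
```

X-form instructions (primary opcode 31) do not pass the 0x20 test, which matches the patch leaving them to the existing name-based lookup.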



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 13/16] tools/perf: Add support to use libcapstone in powerpc
  2024-06-25  6:08   ` Namhyung Kim
@ 2024-06-25 12:44     ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 12:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 25 Jun 2024, at 11:38 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Fri, Jun 14, 2024 at 10:56:28PM +0530, Athira Rajeev wrote:
>> Now perf uses the capstone library to disassemble the instructions in
>> x86. capstone is used (if available) to speed up perf annotate.
>> Currently it supports only the x86 architecture. This patch includes
>> changes to enable it on powerpc. For now, this method is used only for
>> the data type sort keys, and only the binary code (raw instruction) is
>> read. This is because the powerpc approach to understanding instructions
>> and register fields uses the raw instruction. "cs_disasm" is currently
>> not enabled: while attempting cs_disasm, some instructions were not
>> identified (e.g. extswsli, maddld) and it had to fall back to objdump.
>> Hence enabling "cs_disasm" is left as a TODO in a comment for powerpc.
> 
> Well.. I'm not sure if I understand it correctly but it seems this
> function effectively does nothing more than the raw disassemble.
> Can we simply drop this patch for now?  Or did I miss something?
> 
> Thanks,
> Namhyung

Hi Namhyung

I responded to this in my reply on patch 3 of this series.

Thanks
Athira

> 
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 143 insertions(+)
>> 
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index 43743ca4bdc9..987bff9f71c3 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -1592,6 +1592,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
>> }
>> }
>> 
>> +static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
>> + struct annotate_args *args)
>> +{
>> + struct annotation *notes = symbol__annotation(sym);
>> + struct map *map = args->ms.map;
>> + struct dso *dso = map__dso(map);
>> + struct nscookie nsc;
>> + u64 start = map__rip_2objdump(map, sym->start);
>> + u64 end = map__rip_2objdump(map, sym->end);
>> + u64 len = end - start;
>> + u64 offset;
>> + int i, fd, count;
>> + bool is_64bit = false;
>> + bool needs_cs_close = false;
>> + u8 *buf = NULL;
>> + struct find_file_offset_data data = {
>> + .ip = start,
>> + };
>> + csh handle;
>> + char disasm_buf[512];
>> + struct disasm_line *dl;
>> + u32 *line;
>> + bool disassembler_style = false;
>> +
>> + if (args->options->objdump_path)
>> + return -1;
>> +
>> + nsinfo__mountns_enter(dso->nsinfo, &nsc);
>> + fd = open(filename, O_RDONLY);
>> + nsinfo__mountns_exit(&nsc);
>> + if (fd < 0)
>> + return -1;
>> +
>> + if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
>> +    &is_64bit) == 0)
>> + goto err;
>> +
>> + if (!args->options->disassembler_style ||
>> + !strcmp(args->options->disassembler_style, "att"))
>> + disassembler_style = true;
>> +
>> + if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
>> + goto err;
>> +
>> + needs_cs_close = true;
>> +
>> + buf = malloc(len);
>> + if (buf == NULL)
>> + goto err;
>> +
>> + count = pread(fd, buf, len, data.offset);
>> + close(fd);
>> + fd = -1;
>> +
>> + if ((u64)count != len)
>> + goto err;
>> +
>> + line = (u32 *)buf;
>> +
>> + /* add the function address and name */
>> + scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
>> +  start, sym->name);
>> +
>> + args->offset = -1;
>> + args->line = disasm_buf;
>> + args->line_nr = 0;
>> + args->fileloc = NULL;
>> + args->ms.sym = sym;
>> +
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, &notes->src->source);
>> +
>> + /*
>> + * TODO: enable disassm for powerpc
>> + * count = cs_disasm(handle, buf, len, start, len, &insn);
>> + *
>> + * For now, only binary code is saved in disassembled line
>> + * to be used in "type" and "typeoff" sort keys. Each raw code
>> + * is 32 bit instruction. So use "len/4" to get the number of
>> + * entries.
>> + */
>> + count = len/4;
>> +
>> + for (i = 0, offset = 0; i < count; i++) {
>> + args->offset = offset;
>> + sprintf(args->line, "%x", line[i]);
>> +
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, &notes->src->source);
>> +
>> + offset += 4;
>> + }
>> +
>> + /* It failed in the middle */
>> + if (offset != len) {
>> + struct list_head *list = &notes->src->source;
>> +
>> + /* Discard all lines and fallback to objdump */
>> + while (!list_empty(list)) {
>> + dl = list_first_entry(list, struct disasm_line, al.node);
>> +
>> + list_del_init(&dl->al.node);
>> + disasm_line__free(dl);
>> + }
>> + count = -1;
>> + }
>> +
>> +out:
>> + if (needs_cs_close)
>> + cs_close(&handle);
>> + free(buf);
>> + return count < 0 ? count : 0;
>> +
>> +err:
>> + if (fd >= 0)
>> + close(fd);
>> + if (needs_cs_close) {
>> + struct disasm_line *tmp;
>> +
>> + /*
>> + * It probably failed in the middle of the above loop.
>> + * Release any resources it might add.
>> + */
>> + list_for_each_entry_safe(dl, tmp, &notes->src->source, al.node) {
>> + list_del(&dl->al.node);
>> + free(dl);
>> + }
>> + }
>> + count = -1;
>> + goto out;
>> +}
>> +
>> static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>> struct annotate_args *args)
>> {
>> @@ -1949,6 +2087,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>> err = symbol__disassemble_dso(symfs_filename, sym, args);
>> if (err == 0)
>> goto out_remove_tmp;
>> +#ifdef HAVE_LIBCAPSTONE_SUPPORT
>> + err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
>> + if (err == 0)
>> + goto out_remove_tmp;
>> +#endif
>> }
>> }
>> 
>> -- 
>> 2.43.0
>> 
> 
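
For reference, the quoted loop stores one text line per 32-bit instruction word, with "len/4" entries in total. A standalone sketch of that idea follows. The function names are hypothetical, and memcpy is used here instead of the patch's u32 pointer cast only to sidestep alignment concerns. Words are read in host byte order, as in the quoted loop.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of complete 32-bit instruction words in a buffer of len bytes. */
static size_t raw_insn_count(size_t len)
{
	return len / 4;
}

/*
 * Format the idx-th instruction word of buf as lowercase hex text.
 * Returns 0 on success, -1 if idx is past the last complete word.
 */
static int raw_insn_format(const uint8_t *buf, size_t len, size_t idx,
			   char *out, size_t outsz)
{
	uint32_t word;

	/* Eight hex digits plus the terminating NUL must fit. */
	assert(out != NULL && outsz >= 9);

	if (idx >= raw_insn_count(len))
		return -1;

	memcpy(&word, buf + idx * 4, sizeof(word));
	snprintf(out, outsz, "%" PRIx32, word);
	return 0;
}
```

A trailing partial word (len not a multiple of 4) is simply not counted in this sketch.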


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-06-25  6:17   ` Namhyung Kim
@ 2024-06-25 12:45     ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-25 12:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 25 Jun 2024, at 11:47 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Fri, Jun 14, 2024 at 10:56:29PM +0530, Athira Rajeev wrote:
>> There are cases where a global register variable is defined and
>> associated with a specified register. For example, in powerpc, two
>> registers are defined to represent variables:
>> 1. r13: represents local_paca
>> register struct paca_struct *local_paca asm("r13");
>> 
>> 2. r1: represents stack_pointer
>> register void *__stack_pointer asm("r1");
>> 
>> These regs are present in dwarf debug as DW_OP_reg as part of variables
>> in the cu_die (compile unit). These are not present in die search done
>> in the list of nested scopes since these are global register variables.
>> 
>> Example for local_paca represented by r13:
>> 
>> <<>>
>> <1><18dc6b4>: Abbrev Number: 128 (DW_TAG_variable)
>>    <18dc6b6>   DW_AT_name        : (indirect string, offset: 0x3861): local_paca
>>    <18dc6ba>   DW_AT_decl_file   : 48
>>    <18dc6bb>   DW_AT_decl_line   : 36
>>    <18dc6bc>   DW_AT_decl_column : 30
>>    <18dc6bd>   DW_AT_type        : <0x18dc6c3>
>>    <18dc6c1>   DW_AT_external    : 1
>>    <18dc6c1>   DW_AT_location    : 1 byte block: 5d    (DW_OP_reg13 (r13))
>> 
>> <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>    <18dc6c4>   DW_AT_byte_size   : 8
>>    <18dc6c4>   DW_AT_type        : <0x18dc353>
>> 
>> Where  DW_AT_type : <0x18dc6c3> further points to :
>> 
>> <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>    <18dc6c4>   DW_AT_byte_size   : 8
>>    <18dc6c4>   DW_AT_type        : <0x18dc353>
>> 
>> which belongs to:
>> 
>> <1><18dc353>: Abbrev Number: 67 (DW_TAG_structure_type)
>>    <18dc354>   DW_AT_name        : (indirect string, offset: 0x56cd): paca_struct
>>    <18dc358>   DW_AT_byte_size   : 2944
>>    <18dc35a>   DW_AT_alignment   : 128
>>    <18dc35b>   DW_AT_decl_file   : 48
>>    <18dc35c>   DW_AT_decl_line   : 61
>>    <18dc35d>   DW_AT_decl_column : 8
>>    <18dc35d>   DW_AT_sibling     : <0x18dc6b4>
>> <<>>
>> 
>> Similar is case with "r1".
>> 
>> <<>>
>> <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
>>    <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
>>    <18dd778>   DW_AT_decl_file   : 51
>>    <18dd779>   DW_AT_decl_line   : 1468
>>    <18dd77b>   DW_AT_decl_column : 24
>>    <18dd77c>   DW_AT_type        : <0x18da5cd>
>>    <18dd780>   DW_AT_external    : 1
>>    <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))
>> 
>> where 18da5cd is:
>> 
>> <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
>>    <18da5ce>   DW_AT_byte_size   : 8
>>    <18da5cf>   DW_AT_encoding    : 7   (unsigned)
>>    <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
>> <<>>
>> 
>> To identify the data type for these two special cases, iterate over the
>> variables in the CU die (compile unit) and match them against the
>> register. If the variable is a base type, i.e. die_get_real_type()
>> returns NULL, set the offset to zero. With these changes, the data types
>> "paca_struct" (for r13) and "long unsigned int" (for r1) are identified.
>> 
>> Snippet from ./perf report -s type,typeoff
>> 
>>    12.85%  long unsigned int  long unsigned int +0 (no field)
>>     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>>     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/util/annotate-data.c      | 40 ++++++++++++++++++++++++++++
>> tools/perf/util/annotate.c           |  8 ++++++
>> tools/perf/util/annotate.h           |  1 +
>> tools/perf/util/include/dwarf-regs.h |  1 +
>> 4 files changed, 50 insertions(+)
>> 
>> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
>> index 734acdd8c4b7..82232f2d8e16 100644
>> --- a/tools/perf/util/annotate-data.c
>> +++ b/tools/perf/util/annotate-data.c
>> @@ -1170,6 +1170,40 @@ static int find_data_type_block(struct data_loc_info *dloc,
>> return ret;
>> }
>> 
>> +/*
>> + * Handle cases where define a global register variable and
>> + * associate it with a specified register. These regs are
>> + * present in dwarf debug as DW_OP_reg as part of variables
>> + * in the cu_die (compile unit). Iterate over variables in the
>> + * cu_die and match with reg to identify data type die.
> 
> Ok, if they always point to the same type, you may cache the result and
> avoid the repeated search everytime.
Ok
> 
>> + */
>> +static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
>> + Dwarf_Die *type_die)
>> +{
>> + Dwarf_Die vr_die;
>> + int ret = -1;
>> + struct die_var_type *var_types = NULL;
>> +
>> + die_collect_vars(cu_die, &var_types);
>> + while (var_types) {
>> + if (var_types->reg == reg) {
>> + if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
>> + if (die_get_real_type(&vr_die, type_die) == NULL) {
>> + dloc->type_offset = 0;
>> + dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
>> + }
>> + pr_debug_type_name(type_die, TSR_KIND_TYPE);
>> + ret = 0;
>> + pr_debug_dtp("found by CU for %s (die:%#lx)\n",
>> + dwarf_diename(type_die), (long)dwarf_dieoffset(type_die));
>> + }
>> + break;
>> + }
>> + var_types = var_types->next;
>> + }
> 
> Please add 'delete_var_types(var_types);' here.
> 
> Thanks,
> Namhyung

Sure, will make these changes in V5

Thanks
Athira
> 
> 
>> + return ret;
>> +}
>> +
>> /* The result will be saved in @type_die */
>> static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
>> {
>> @@ -1217,6 +1251,12 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
>> pr_debug_dtp("CU for %s (die:%#lx)\n",
>>      dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
>> 
>> + if (loc->reg_type == DWARF_REG_GLOBAL) {
>> + ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
>> + if (!ret)
>> + goto out;
>> + }
>> +
>> if (reg == DWARF_REG_PC) {
>> if (get_global_var_type(&cu_die, dloc, dloc->ip, dloc->var_addr,
>> &offset, type_die)) {
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index bfa6420dc4b9..c7e4fd16e8b4 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -2431,6 +2431,14 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
>> op_loc->reg1 = DWARF_REG_PC;
>> }
>> 
>> + /* Global reg variable 13 and 1
>> +  * assign to DWARF_REG_GLOBAL
>> +  */
>> + if (arch__is(arch, "powerpc")) {
>> + if ((op_loc->reg1 == 13) || (op_loc->reg1 == 1))
>> + op_loc->reg_type = DWARF_REG_GLOBAL;
>> + }
>> +
>> mem_type = find_data_type(&dloc);
>> 
>> if (mem_type == NULL && is_stack_canary(arch, op_loc)) {
>> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
>> index 9ba772f46270..ad69842a8ebc 100644
>> --- a/tools/perf/util/annotate.h
>> +++ b/tools/perf/util/annotate.h
>> @@ -475,6 +475,7 @@ struct annotated_op_loc {
>> bool mem_ref;
>> bool multi_regs;
>> bool imm;
>> + int reg_type;
>> };
>> 
>> enum annotated_insn_ops {
>> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
>> index 7ea39362ecaf..a873c906a86b 100644
>> --- a/tools/perf/util/include/dwarf-regs.h
>> +++ b/tools/perf/util/include/dwarf-regs.h
>> @@ -5,6 +5,7 @@
>> 
>> #define DWARF_REG_PC  0xd3af9c /* random number */
>> #define DWARF_REG_FB  0xd3affb /* random number */
>> +#define DWARF_REG_GLOBAL 0xd3affc /* random number */
>> 
>> #ifdef HAVE_DWARF_SUPPORT
>> const char *get_arch_regstr(unsigned int n);
>> -- 
>> 2.43.0
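
A note on the cleanup requested above: the quoted loop advances var_types itself, so after the loop it no longer points at the head of the list, and a cleanup call placed there would presumably need the original head. A minimal sketch of searching such a list while keeping the head for cleanup is below. The struct var_entry type and the helpers are hypothetical stand-ins for illustration, not the perf die_var_type API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry in a collected-variables list. */
struct var_entry {
	struct var_entry *next;
	int reg;
	unsigned long die_off;
};

/* Prepend an entry; returns the new head (or the old one on failure). */
static struct var_entry *push_var(struct var_entry *head, int reg,
				  unsigned long die_off)
{
	struct var_entry *v = malloc(sizeof(*v));

	if (v == NULL)
		return head;
	v->reg = reg;
	v->die_off = die_off;
	v->next = head;
	return v;
}

/* Walk with a separate cursor so the caller's head stays valid. */
static const struct var_entry *find_reg(const struct var_entry *head, int reg)
{
	const struct var_entry *v;

	for (v = head; v != NULL; v = v->next) {
		if (v->reg == reg)
			return v;
	}
	return NULL;
}

static void free_vars(struct var_entry *head)
{
	while (head != NULL) {
		struct var_entry *next = head->next;

		free(head);
		head = next;
	}
}

/* Look up reg, copy out its DIE offset, then free the whole list. */
static int lookup_and_free(struct var_entry *head, int reg,
			   unsigned long *die_off)
{
	const struct var_entry *v = find_reg(head, reg);
	int ret = -1;

	assert(die_off != NULL);
	if (v != NULL) {
		*die_off = v->die_off;
		ret = 0;
	}
	free_vars(head);
	return ret;
}
```

Since r13 and r1 always map to the same variables, the copied-out offset is also the kind of value that could be cached, as suggested in the review.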



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-06-25 12:38     ` Athira Rajeev
@ 2024-06-25 18:39       ` Namhyung Kim
  2024-06-26  4:09         ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25 18:39 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, linux-kernel,
	linux-perf-users, linuxppc-dev, akanksha, maddy, kjain, disgoel

On Tue, Jun 25, 2024 at 06:08:49PM +0530, Athira Rajeev wrote:
> 
> 
> > On 25 Jun 2024, at 10:59 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > On Fri, Jun 14, 2024 at 10:56:18PM +0530, Athira Rajeev wrote:
> >> Add support to capture and parse raw instruction in powerpc.
> >> Currently, the perf tool infrastructure uses two ways to disassemble
> >> and understand the instruction. One is objdump and other option is
> >> via libcapstone.
> >> 
> >> Currently, the perf tool infrastructure passes the "--no-show-raw-insn"
> >> option to "objdump" when disassembling. An example from powerpc with
> >> this option for an instruction address is:
> >> 
> >> Snippet from:
> >> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> >> 
> >> c0000000010224b4: lwz     r10,0(r9)
> > 
> > What about removing --no-show-raw-insn and parse the raw byte code in
> > the output for powerpc?  I think it's better to support normal
> > annotation together.
> Hi Namhyung,
> 
> Yes, in another patch in the same series I have added support for normal annotation as well.
> Patch 5 includes changes to work with the binary code as well as the mnemonic representation.
> 
> Example representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8 ld r4,312(r1)
> 
> Patch 5 has changes to use "objdump" with --show-raw-insn to read the raw instruction and also to support normal annotation.

Ok, that's good!


> For data type profiling with only the sort keys (type, typeoff), there is no need to disassemble and then extract the raw byte code.
> The binary code can be read directly from the DSO, which is faster than going through objdump in this case.

Sounds like an optimization.  Then I think you'd better handle the
general case first and optimize later.  Probably you want to merge
patch 3 and 4 together.

Thanks,
Namhyung


> In summary, the current patchset uses the approach below:
> 
> 1. Read directly from the DSO using dso__data_read_offset if only "type, typeoff" is needed.
> 2. If reading directly from the DSO fails, fall back to libcapstone. Reading via libcapstone is faster than objdump.
> 3. If libcapstone is not supported, use objdump. The patchset has changes to handle objdump output created with --show-raw-insn on powerpc.
> 4. For normal perf report or perf annotate, use objdump as well.
> 
> NOTE:
> libcapstone is currently used only for reading the raw binary code; disassembly is not enabled. While attempting cs_disasm, some instructions were not identified (e.g. extswsli, maddld) and it had to fall back to objdump. Hence enabling "cs_disasm" is left as a TODO in a comment for powerpc (patch 13).
> 
> Thanks
> Athira
> 
> > 
> >> 
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. Also to find whether there is a memory
> >> reference in the operands, "memory_ref_char" field of objdump is used.
> >> For x86, "(" is used as memory_ref_char to tackle instructions of the
> >> form "mov  (%rax), %rcx".
> >> 
> >> On powerpc, instructions using "(" are not the only memory
> >> instructions. For example, the above instruction can also be of the
> >> extended form (X form) "lwzx r10,0,r19". In order to easily identify
> >> the instruction category and extract the source/target registers, the
> >> patch adds support for using the raw instruction on powerpc. The
> >> approach is to read the raw instruction directly from the DSO file
> >> using the "dso__data_read_offset" utility, which is already
> >> implemented in the perf infrastructure in "util/dso.c".
> >> 
> >> Example:
> >> 
> >> 38 01 81 e8     ld      r4,312(r1)
> >> 
> >> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> >> this translates to instruction form: "ld RT,DS(RA)" and binary code
> >> as:
> >> 
> >>   | 58 |  RT  |  RA |      DS       | |
> >>   -------------------------------------
> >>   0    6     11    16              30 31
> >> 
> >> Function "symbol__disassemble_dso" is updated to read raw instruction
> >> directly from DSO using dso__data_read_offset utility. In case of
> >> above example, this captures:
> >> line:    38 01 81 e8
> >> 
> >> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> >> ---
> >> tools/perf/util/disasm.c | 98 ++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 98 insertions(+)
> >> 
> >> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> >> index b5fe3a7508bb..f19496133bf0 100644
> >> --- a/tools/perf/util/disasm.c
> >> +++ b/tools/perf/util/disasm.c
> >> @@ -1586,6 +1586,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
> >> }
> >> #endif
> >> 
> >> +static int symbol__disassemble_dso(char *filename, struct symbol *sym,
> > 
> > Maybe rename to symbol__disassemble_raw() ?
> 
> This is specifically using dso__data_read_offset. Hence using symbol__disassemble_dso 
> > 
> >> + struct annotate_args *args)
> >> +{
> >> + struct annotation *notes = symbol__annotation(sym);
> >> + struct map *map = args->ms.map;
> >> + struct dso *dso = map__dso(map);
> >> + u64 start = map__rip_2objdump(map, sym->start);
> >> + u64 end = map__rip_2objdump(map, sym->end);
> >> + u64 len = end - start;
> >> + u64 offset;
> >> + int i, count;
> >> + u8 *buf = NULL;
> >> + char disasm_buf[512];
> >> + struct disasm_line *dl;
> >> + u32 *line;
> >> +
> >> + /* Return if objdump is specified explicitly */
> >> + if (args->options->objdump_path)
> >> + return -1;
> >> +
> >> + pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);
> > 
> > You may want to print the actual offset and remove the "using
> > dso__data_read_offset" part.
> 
> Ok Sure
> > 
> > Thanks,
> > Namhyung
> > 
> >> +
> >> + buf = malloc(len);
> >> + if (buf == NULL)
> >> + goto err;
> >> +
> >> + count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
> >> +
> >> + line = (u32 *)buf;
> >> +
> >> + if ((u64)count != len)
> >> + goto err;
> >> +
> >> + /* add the function address and name */
> >> + scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
> >> +   start, sym->name);
> >> +
> >> + args->offset = -1;
> >> + args->line = disasm_buf;
> >> + args->line_nr = 0;
> >> + args->fileloc = NULL;
> >> + args->ms.sym = sym;
> >> +
> >> + dl = disasm_line__new(args);
> >> + if (dl == NULL)
> >> + goto err;
> >> +
> >> + annotation_line__add(&dl->al, &notes->src->source);
> >> +
> >> + /* Each raw instruction is 4 byte */
> >> + count = len/4;
> >> +
> >> + for (i = 0, offset = 0; i < count; i++) {
> >> + args->offset = offset;
> >> + sprintf(args->line, "%x", line[i]);
> >> + dl = disasm_line__new(args);
> >> + if (dl == NULL)
> >> + goto err;
> >> +
> >> + annotation_line__add(&dl->al, &notes->src->source);
> >> + offset += 4;
> >> + }
> >> +
> >> + /* It failed in the middle */
> >> + if (offset != len) {
> >> + struct list_head *list = &notes->src->source;
> >> +
> >> + /* Discard all lines and fallback to objdump */
> >> + while (!list_empty(list)) {
> >> + dl = list_first_entry(list, struct disasm_line, al.node);
> >> +
> >> + list_del_init(&dl->al.node);
> >> + disasm_line__free(dl);
> >> + }
> >> + count = -1;
> >> + }
> >> +
> >> +out:
> >> + free(buf);
> >> + return count < 0 ? count : 0;
> >> +
> >> +err:
> >> + count = -1;
> >> + goto out;
> >> +}
> >> /*
> >>  * Possibly create a new version of line with tabs expanded. Returns the
> >>  * existing or new line, storage is updated if a new line is allocated. If
> >> @@ -1710,6 +1795,19 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
> >> strcpy(symfs_filename, tmp);
> >> }
> >> 
> >> + /*
> >> +  * For powerpc data type profiling, use the dso__data_read_offset
> >> +  * to read raw instruction directly and interpret the binary code
> >> +  * to understand instructions and register fields. For sort keys as
> >> +  * type and typeoff, disassemble to mnemonic notation is
> >> +  * not required in case of powerpc.
> >> +  */
> >> + if (arch__is(args->arch, "powerpc")) {
> >> + err = symbol__disassemble_dso(symfs_filename, sym, args);
> >> + if (err == 0)
> >> + goto out_remove_tmp;
> >> + }
> >> +
> >> #ifdef HAVE_LIBCAPSTONE_SUPPORT
> >> err = symbol__disassemble_capstone(symfs_filename, sym, args);
> >> if (err == 0)
> >> -- 
> >> 2.43.0
> 
> 
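
The fallback order summarized in this message (direct DSO read, then
libcapstone, then objdump) can be sketched as a list of backends tried in
turn. This is a minimal standalone sketch and not perf code: the backend_fn
type, the stub backends and disassemble_with_fallback() are hypothetical
names introduced for illustration; only the ordering comes from the thread.

```c
#include <stddef.h>

/* Hypothetical backend signature: returns 0 on success, negative on failure. */
typedef int (*backend_fn)(const char *filename);

struct backend {
	const char *name;
	backend_fn fn;
};

/* Stub backends standing in for the real ones (assumptions for illustration). */
static int read_dso_stub(const char *filename) { (void)filename; return -1; }
static int capstone_stub(const char *filename) { (void)filename; return -1; }
static int objdump_stub(const char *filename)  { (void)filename; return 0; }

/*
 * Try each backend in order and return the name of the first one that
 * succeeds, or NULL if all of them fail.
 */
static const char *disassemble_with_fallback(const struct backend *b, size_t n,
					     const char *filename)
{
	for (size_t i = 0; i < n; i++) {
		if (b[i].fn(filename) == 0)
			return b[i].name;
	}
	return NULL;
}

/* Order taken from the summary: DSO read, then capstone, then objdump. */
static const struct backend powerpc_backends[] = {
	{ "dso-read", read_dso_stub },
	{ "capstone", capstone_stub },
	{ "objdump",  objdump_stub },
};
```

With these stubs the first two backends fail, so the call falls through to
the objdump entry. Keeping the order in a table makes it easy to add the
general case first and slot an optimized path in front of it later.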


* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-25 12:42     ` Athira Rajeev
@ 2024-06-25 18:45       ` Namhyung Kim
  2024-06-26  4:08         ` Athira Rajeev
  0 siblings, 1 reply; 40+ messages in thread
From: Namhyung Kim @ 2024-06-25 18:45 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel

On Tue, Jun 25, 2024 at 06:12:51PM +0530, Athira Rajeev wrote:
> 
> 
> > On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
> >> Currently, the perf tool infrastructure uses the disasm_line__parse
> >> function to parse a disassembled line.
> >> 
> >> Example snippet from objdump:
> >> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> >> 
> >> c0000000010224b4: lwz     r10,0(r9)
> >> 
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. In powerpc, the approach for data type
> >> profiling uses raw instruction instead of result from objdump to identify
> >> the instruction category and extract the source/target registers.
> >> 
> >> Example: 38 01 81 e8     ld      r4,312(r1)
> >> 
> >> Here "38 01 81 e8" is the raw instruction representation. Add function
> >> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
> >> Also update "struct disasm_line" to save the binary code.
> >> With the change, function captures:
> >> 
> >> line -> "38 01 81 e8     ld      r4,312(r1)"
> >> raw instruction "38 01 81 e8"
> >> 
> >> The raw instruction is used later to extract the reg/offset fields.
> >> Macros are added to extract the opcode and register fields. "struct
> >> disasm_line" is updated to carry a 32-bit union of "bytes" and
> >> "raw_insn" (raw) holding the raw code. Function
> >> "disasm_line__parse_powerpc" fills in the raw instruction hex value,
> >> and the macros can then be used to get the opcode. There are no
> >> changes to the existing code paths, which parse the disassembled code.
> >> Architectures that use the instruction name with the present approach
> >> are not altered. Since this approach targets powerpc, the macro
> >> implementation is added only for powerpc for now.
> >> 
> >> Since disasm_line__parse is used in other cases (perf annotate) and
> >> not only for data type profiling, the powerpc callback includes changes
> >> to work with binary code as well as the mnemonic representation. Also,
> >> if the DSO read fails and libcapstone is not supported, the approach
> >> falls back to objdump. Hence the patch has changes to ensure that the
> >> objdump option also works well.
> >> 
> >> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> >> ---
> >> tools/include/linux/string.h                  |  2 +
> >> tools/lib/string.c                            | 13 ++++
> >> .../perf/arch/powerpc/annotate/instructions.c |  1 +
> >> tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 +++
> >> tools/perf/util/annotate.h                    |  5 +-
> >> tools/perf/util/disasm.c                      | 59 ++++++++++++++++++-
> >> 6 files changed, 87 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
> >> index db5c99318c79..0acb1fc14e19 100644
> >> --- a/tools/include/linux/string.h
> >> +++ b/tools/include/linux/string.h
> >> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
> >> 
> >> extern char *strim(char *);
> >> 
> >> +extern void remove_spaces(char *s);
> >> +
> >> extern void *memchr_inv(const void *start, int c, size_t bytes);
> >> #endif /* _TOOLS_LINUX_STRING_H_ */
> >> diff --git a/tools/lib/string.c b/tools/lib/string.c
> >> index 8b6892f959ab..3126d2cff716 100644
> >> --- a/tools/lib/string.c
> >> +++ b/tools/lib/string.c
> >> @@ -153,6 +153,19 @@ char *strim(char *s)
> >> return skip_spaces(s);
> >> }
> >> 
> >> +/*
> >> + * remove_spaces - Removes whitespaces from @s
> >> + */
> >> +void remove_spaces(char *s)
> >> +{
> >> + char *d = s;
> >> +
> >> + do {
> >> + while (*d == ' ')
> >> + ++d;
> >> + } while ((*s++ = *d++));
> >> +}
> >> +
> >> /**
> >>  * strreplace - Replace all occurrences of character in string.
> >>  * @s: The string to operate on.
> >> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
> >> index a3f423c27cae..d57fd023ef9c 100644
> >> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> >> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> >> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
> >> arch->initialized = true;
> >> arch->associate_instruction_ops = powerpc__associate_instruction_ops;
> >> arch->objdump.comment_char      = '#';
> >> + annotate_opts.show_asm_raw = true;
> > 
> > Right, I think this will add the raw insn in the output of objdump, no?
> > Why not using the information?
> 
> Shared response in previous patch

Ok, now I understand it's a fallback. :)

> > 
> >> }
> >> 
> >> return 0;
> >> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> >> index 0c4f4caf53ac..430623ca5612 100644
> >> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> >> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> >> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
> >> return roff->ptregs_offset;
> >> return -EINVAL;
> >> }
> >> +
> >> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
> >> +#define PPC_RA(a) (((a) >> 16) & 0x1f)
> >> +#define PPC_RT(t) (((t) >> 21) & 0x1f)
> >> +#define PPC_RB(b) (((b) >> 11) & 0x1f)
> >> +#define PPC_D(D) ((D) & 0xfffe)
> >> +#define PPC_DS(DS) ((DS) & 0xfffc)
> >> +#define OP_LD 58
> >> +#define OP_STD 62
> >> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> >> index d5c821c22f79..9ba772f46270 100644
> >> --- a/tools/perf/util/annotate.h
> >> +++ b/tools/perf/util/annotate.h
> >> @@ -113,7 +113,10 @@ struct annotation_line {
> >> struct disasm_line {
> >> struct ins  ins;
> >> struct ins_operands  ops;
> >> -
> >> + union {
> >> + u8 bytes[4];
> >> + u32 raw_insn;
> >> + } raw;
> >> /* This needs to be at the end. */
> >> struct annotation_line  al;
> >> };
> >> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> >> index b81cdcf4d6b4..1e8568738b38 100644
> >> --- a/tools/perf/util/disasm.c
> >> +++ b/tools/perf/util/disasm.c
> >> @@ -45,6 +45,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
> >> 
> >> static void ins__sort(struct arch *arch);
> >> static int disasm_line__parse(char *line, const char **namep, char **rawp);
> >> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
> >> 
> >> static __attribute__((constructor)) void symbol__init_regexpr(void)
> >> {
> >> @@ -844,6 +845,59 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
> >> return -1;
> >> }
> >> 
> >> +/*
> >> + * Parses the result captured from symbol__disassemble_*
> >> + * Example, line read from DSO file in powerpc:
> >> + * line:    38 01 81 e8
> >> + * opcode: fetched from arch specific get_opcode_insn
> >> + * rawp_insn: e8810138
> >> + *
> >> + * rawp_insn is used later to extract the reg/offset fields
> >> + */
> >> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
> >> +
> >> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
> >> +{
> >> + char *line = dl->al.line;
> >> + const char **namep = &dl->ins.name;
> >> + char **rawp = &dl->ops.raw;
> >> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
> >> + char *name = skip_spaces(name_raw_insn + 11);
> >> + int objdump = 0;
> >> +
> >> + if (strlen(line) > 11)
> >> + objdump = 1;
> >> +
> >> + if (name_raw_insn[0] == '\0')
> >> + return -1;
> >> +
> >> + if (objdump) {
> >> + *rawp = name + 1;
> >> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
> >> + ++*rawp;
> >> + tmp = (*rawp)[0];
> >> + (*rawp)[0] = '\0';
> >> +
> >> + *namep = strdup(name);
> >> + if (*namep == NULL)
> >> + return -1;
> >> +
> >> + (*rawp)[0] = tmp;
> >> + *rawp = strim(*rawp);
> >> + } else
> >> + *namep = "";

Then can you handle this logic under if (annotate_opts.show_raw_insn)
in disasm_line__parse() instead of adding a new function?

Thanks,
Namhyung


> >> +
> >> + tmp_raw_insn = strdup(name_raw_insn);
> >> + tmp_raw_insn[11] = '\0';
> >> + remove_spaces(tmp_raw_insn);
> >> +
> >> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
> >> + if (objdump)
> >> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
> > 
> > Hmm.. can you use a sscanf() instead?
> > 
> >  sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
> > 
> > Thanks,
> > Namhyung
> > 
> Sure will address in V5
> 
> Thanks
> Athira
> >> +
> >> + return 0;
> >> +}
> >> +
> >> static void annotation_line__init(struct annotation_line *al,
> >>   struct annotate_args *args,
> >>   int nr)
> >> @@ -897,7 +951,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
> >> goto out_delete;
> >> 
> >> if (args->offset != -1) {
> >> - if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
> >> + if (arch__is(args->arch, "powerpc")) {
> >> + if (disasm_line__parse_powerpc(dl) < 0)
> >> + goto out_free_line;
> >> + } else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
> >> goto out_free_line;
> >> 
> >> disasm_line__init_ins(dl, args->arch, &args->ms);
> >> -- 
> >> 2.43.0
> 
> 
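
As a standalone illustration of the raw-instruction parsing discussed in this
message, the sketch below reads the four leading hex bytes of an objdump
--show-raw-insn line with sscanf() and decodes the fields using the macros
posted in the patch. It is a sketch under stated assumptions, not the V5
code: parse_raw_insn() is a hypothetical helper, and the bytes are assumed
to be printed in little-endian memory order, as in the "38 01 81 e8" example.
The %hhx conversion is used because the destination is a byte array; a plain
%x would store a full unsigned int per conversion.

```c
#include <stdint.h>
#include <stdio.h>

/* Field extraction macros as posted in the patch. */
#define PPC_OP(op)	(((op) >> 26) & 0x3F)
#define PPC_RA(a)	(((a) >> 16) & 0x1f)
#define PPC_RT(t)	(((t) >> 21) & 0x1f)
#define PPC_DS(DS)	((DS) & 0xfffc)

/*
 * Hypothetical helper: parse the four leading hex bytes of a line such as
 * "38 01 81 e8     ld      r4,312(r1)" into a 32-bit instruction word.
 * Assumes the bytes appear in little-endian memory order.
 * Returns 0 on success, -1 if four bytes could not be read.
 */
static int parse_raw_insn(const char *line, uint32_t *insn)
{
	unsigned char b[4];

	/* %hhx stores into an unsigned char, matching the byte array. */
	if (sscanf(line, "%hhx %hhx %hhx %hhx", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;

	/* Assemble the word with byte 0 as the least significant byte. */
	*insn = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
		((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

For the example line, the parsed word is 0xe8810138, which the macros decode
as opcode 58, RT 4, RA 1 and DS 312, matching "ld r4,312(r1)".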


* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-25 18:45       ` Namhyung Kim
@ 2024-06-26  4:08         ` Athira Rajeev
  2024-06-26 21:17           ` Namhyung Kim
  0 siblings, 1 reply; 40+ messages in thread
From: Athira Rajeev @ 2024-06-26  4:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel



> On 26 Jun 2024, at 12:15 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Tue, Jun 25, 2024 at 06:12:51PM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
>>>> Currently, the perf tool infrastructure uses the disasm_line__parse
>>>> function to parse a disassembled line.
>>>> 
>>>> Example snippet from objdump:
>>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>>>> 
>>>> c0000000010224b4: lwz     r10,0(r9)
>>>> 
>>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>>> registers names and offset. In powerpc, the approach for data type
>>>> profiling uses raw instruction instead of result from objdump to identify
>>>> the instruction category and extract the source/target registers.
>>>> 
>>>> Example: 38 01 81 e8     ld      r4,312(r1)
>>>> 
>>>> Here "38 01 81 e8" is the raw instruction representation. Add function
>>>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
>>>> Also update "struct disasm_line" to save the binary code.
>>>> With the change, function captures:
>>>> 
>>>> line -> "38 01 81 e8     ld      r4,312(r1)"
>>>> raw instruction "38 01 81 e8"
>>>> 
>>>> The raw instruction is used later to extract the reg/offset fields.
>>>> Macros are added to extract the opcode and register fields. "struct
>>>> disasm_line" is updated to carry a 32-bit union of "bytes" and
>>>> "raw_insn" (raw) holding the raw code. Function
>>>> "disasm_line__parse_powerpc" fills in the raw instruction hex value,
>>>> and the macros can then be used to get the opcode. There are no
>>>> changes to the existing code paths, which parse the disassembled code.
>>>> Architectures that use the instruction name with the present approach
>>>> are not altered. Since this approach targets powerpc, the macro
>>>> implementation is added only for powerpc for now.
>>>> 
>>>> Since disasm_line__parse is used in other cases (perf annotate) and
>>>> not only for data type profiling, the powerpc callback includes changes
>>>> to work with binary code as well as the mnemonic representation. Also,
>>>> if the DSO read fails and libcapstone is not supported, the approach
>>>> falls back to objdump. Hence the patch has changes to ensure that the
>>>> objdump option also works well.
>>>> 
>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>> ---
>>>> tools/include/linux/string.h                  |  2 +
>>>> tools/lib/string.c                            | 13 ++++
>>>> .../perf/arch/powerpc/annotate/instructions.c |  1 +
>>>> tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 +++
>>>> tools/perf/util/annotate.h                    |  5 +-
>>>> tools/perf/util/disasm.c                      | 59 ++++++++++++++++++-
>>>> 6 files changed, 87 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
>>>> index db5c99318c79..0acb1fc14e19 100644
>>>> --- a/tools/include/linux/string.h
>>>> +++ b/tools/include/linux/string.h
>>>> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>>>> 
>>>> extern char *strim(char *);
>>>> 
>>>> +extern void remove_spaces(char *s);
>>>> +
>>>> extern void *memchr_inv(const void *start, int c, size_t bytes);
>>>> #endif /* _TOOLS_LINUX_STRING_H_ */
>>>> diff --git a/tools/lib/string.c b/tools/lib/string.c
>>>> index 8b6892f959ab..3126d2cff716 100644
>>>> --- a/tools/lib/string.c
>>>> +++ b/tools/lib/string.c
>>>> @@ -153,6 +153,19 @@ char *strim(char *s)
>>>> return skip_spaces(s);
>>>> }
>>>> 
>>>> +/*
>>>> + * remove_spaces - Removes whitespaces from @s
>>>> + */
>>>> +void remove_spaces(char *s)
>>>> +{
>>>> + char *d = s;
>>>> +
>>>> + do {
>>>> + while (*d == ' ')
>>>> + ++d;
>>>> + } while ((*s++ = *d++));
>>>> +}
>>>> +
>>>> /**
>>>> * strreplace - Replace all occurrences of character in string.
>>>> * @s: The string to operate on.
>>>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
>>>> index a3f423c27cae..d57fd023ef9c 100644
>>>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>>>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>>>> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>>>> arch->initialized = true;
>>>> arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>>>> arch->objdump.comment_char      = '#';
>>>> + annotate_opts.show_asm_raw = true;
>>> 
>>> Right, I think this will add the raw insn in the output of objdump, no?
>>> Why not using the information?
>> 
>> Shared response in previous patch
> 
> Ok, now I understand it's a fallback. :)
> 
>>> 
>>>> }
>>>> 
>>>> return 0;
>>>> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
>>>> index 0c4f4caf53ac..430623ca5612 100644
>>>> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
>>>> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
>>>> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
>>>> return roff->ptregs_offset;
>>>> return -EINVAL;
>>>> }
>>>> +
>>>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>>>> +#define PPC_RA(a) (((a) >> 16) & 0x1f)
>>>> +#define PPC_RT(t) (((t) >> 21) & 0x1f)
>>>> +#define PPC_RB(b) (((b) >> 11) & 0x1f)
>>>> +#define PPC_D(D) ((D) & 0xfffe)
>>>> +#define PPC_DS(DS) ((DS) & 0xfffc)
>>>> +#define OP_LD 58
>>>> +#define OP_STD 62
>>>> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
>>>> index d5c821c22f79..9ba772f46270 100644
>>>> --- a/tools/perf/util/annotate.h
>>>> +++ b/tools/perf/util/annotate.h
>>>> @@ -113,7 +113,10 @@ struct annotation_line {
>>>> struct disasm_line {
>>>> struct ins  ins;
>>>> struct ins_operands  ops;
>>>> -
>>>> + union {
>>>> + u8 bytes[4];
>>>> + u32 raw_insn;
>>>> + } raw;
>>>> /* This needs to be at the end. */
>>>> struct annotation_line  al;
>>>> };
>>>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>>>> index b81cdcf4d6b4..1e8568738b38 100644
>>>> --- a/tools/perf/util/disasm.c
>>>> +++ b/tools/perf/util/disasm.c
>>>> @@ -45,6 +45,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>>>> 
>>>> static void ins__sort(struct arch *arch);
>>>> static int disasm_line__parse(char *line, const char **namep, char **rawp);
>>>> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
>>>> 
>>>> static __attribute__((constructor)) void symbol__init_regexpr(void)
>>>> {
>>>> @@ -844,6 +845,59 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>>>> return -1;
>>>> }
>>>> 
>>>> +/*
>>>> + * Parses the result captured from symbol__disassemble_*
>>>> + * Example, line read from DSO file in powerpc:
>>>> + * line:    38 01 81 e8
>>>> + * opcode: fetched from arch specific get_opcode_insn
>>>> + * rawp_insn: e8810138
>>>> + *
>>>> + * rawp_insn is used later to extract the reg/offset fields
>>>> + */
>>>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>>>> +
>>>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
>>>> +{
>>>> + char *line = dl->al.line;
>>>> + const char **namep = &dl->ins.name;
>>>> + char **rawp = &dl->ops.raw;
>>>> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
>>>> + char *name = skip_spaces(name_raw_insn + 11);
>>>> + int objdump = 0;
>>>> +
>>>> + if (strlen(line) > 11)
>>>> + objdump = 1;
>>>> +
>>>> + if (name_raw_insn[0] == '\0')
>>>> + return -1;
>>>> +
>>>> + if (objdump) {
>>>> + *rawp = name + 1;
>>>> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
>>>> + ++*rawp;
>>>> + tmp = (*rawp)[0];
>>>> + (*rawp)[0] = '\0';
>>>> +
>>>> + *namep = strdup(name);
>>>> + if (*namep == NULL)
>>>> + return -1;
>>>> +
>>>> + (*rawp)[0] = tmp;
>>>> + *rawp = strim(*rawp);
>>>> + } else
>>>> + *namep = "";
> 
> Then can you handle this logic under if (annotate_opts.show_raw_insn)
> in disasm_line__parse() instead of adding a new function?
> 
> Thanks,
> Namhyung

Hi Namhyung,

We discussed having a per-arch disasm_line__parse() here:
https://lore.kernel.org/all/CAM9d7ci1LDa7moT2qDr2qK+DTNLU6ZBkmROnbdozAjuQLQfNog@mail.gmail.com/#t

So I added it as a new function: disasm_line__parse_powerpc.
Since it is not used by other archs, can we go with having a new function?

Thanks
Athira

> 
> 
>>>> +
>>>> + tmp_raw_insn = strdup(name_raw_insn);
>>>> + tmp_raw_insn[11] = '\0';
>>>> + remove_spaces(tmp_raw_insn);
>>>> +
>>>> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
>>>> + if (objdump)
>>>> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
>>> 
>>> Hmm.. can you use a sscanf() instead?
>>> 
>>> sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
>>> 
>>> Thanks,
>>> Namhyung
>>> 
>> Sure will address in V5
>> 
>> Thanks
>> Athira
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> static void annotation_line__init(struct annotation_line *al,
>>>>  struct annotate_args *args,
>>>>  int nr)
>>>> @@ -897,7 +951,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>>>> goto out_delete;
>>>> 
>>>> if (args->offset != -1) {
>>>> - if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>>>> + if (arch__is(args->arch, "powerpc")) {
>>>> + if (disasm_line__parse_powerpc(dl) < 0)
>>>> + goto out_free_line;
>>>> + } else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>>>> goto out_free_line;
>>>> 
>>>> disasm_line__init_ins(dl, args->arch, &args->ms);
>>>> -- 
>>>> 2.43.0
>> 
>> 
> 
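
The design question in this message (a separate per-arch parse function
versus a branch inside the generic one) can be illustrated with a small
dispatch sketch. Everything here is hypothetical and simplified: struct line,
parse_generic(), parse_powerpc() and select_parser() are stand-ins invented
for illustration and do not mirror perf's actual structures or signatures.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for perf's struct disasm_line. */
struct line {
	const char *text;
	int parsed_by_arch;	/* 1 if the arch-specific parser handled it */
};

typedef int (*parse_fn)(struct line *l);

/* Placeholder for the existing mnemonic-based parsing path. */
static int parse_generic(struct line *l)
{
	l->parsed_by_arch = 0;
	return l->text[0] ? 0 : -1;	/* reject empty lines */
}

/* Placeholder for raw-instruction parsing on powerpc. */
static int parse_powerpc(struct line *l)
{
	l->parsed_by_arch = 1;
	return l->text[0] ? 0 : -1;	/* reject empty lines */
}

/* Pick the parser by architecture name, defaulting to the generic one. */
static parse_fn select_parser(const char *arch_name)
{
	if (strcmp(arch_name, "powerpc") == 0)
		return parse_powerpc;
	return parse_generic;
}
```

Selecting the parser once keeps the generic path untouched for other
architectures, which is the property the per-arch function is meant to
preserve.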



* Re: [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-06-25 18:39       ` Namhyung Kim
@ 2024-06-26  4:09         ` Athira Rajeev
  0 siblings, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-26  4:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, linux-kernel,
	linux-perf-users, linuxppc-dev, akanksha, maddy, kjain, disgoel



> On 26 Jun 2024, at 12:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Tue, Jun 25, 2024 at 06:08:49PM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 25 Jun 2024, at 10:59 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> On Fri, Jun 14, 2024 at 10:56:18PM +0530, Athira Rajeev wrote:
>>>> Add support to capture and parse raw instruction in powerpc.
>>>> Currently, the perf tool infrastructure uses two ways to disassemble
>>>> and understand the instruction. One is objdump and other option is
>>>> via libcapstone.
>>>> 
>>>> Currently, the perf tool infrastructure uses the "--no-show-raw-insn"
>>>> option with "objdump" when disassembling. An example from powerpc with
>>>> this option for an instruction address is:
>>>> 
>>>> Snippet from:
>>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>>>> 
>>>> c0000000010224b4: lwz     r10,0(r9)
>>> 
>>> What about removing --no-show-raw-insn and parse the raw byte code in
>>> the output for powerpc?  I think it's better to support normal
>>> annotation together.
>> Hi Namhyung,
>> 
>> Yes, in another patch in the same series, I have added support for normal annotation as well.
>> Patch 5 includes changes to work with binary code as well as the mnemonic representation.
>> 
>> Example representation using --show-raw-insn in objdump gives result:
>> 
>> 38 01 81 e8 ld r4,312(r1)
>> 
>> Patch 5 has changes to use “objdump” with --show-raw-insn to read the raw instruction and also support normal annotation.
> 
> Ok, that's good!
> 
> 
>> For data type profiling with only the sort keys (type, typeoff), there is no need to disassemble and then extract the raw byte code.
>> The binary code can be read directly from the DSO, which is faster than going through objdump in this case.
> 
> Sounds like an optimization.  Then I think you'd better handle the
> general case first and optimize later.  Probably you want to merge
> patch 3 and 4 together.
> 
> Thanks,
> Namhyung

Sure, will do that.

Thanks
Athira
> 
> 
>> In summary, the current patchset uses the following approach:
>> 
>> 1. Read directly from the DSO using dso__data_read_offset if only “type, typeoff” is needed.
>> 2. If reading directly from the DSO fails, fall back to libcapstone. Reading with libcapstone is faster than objdump.
>> 3. If libcapstone is not supported, use objdump. The patchset has changes to handle objdump output created with --show-raw-insn on powerpc.
>> 4. For normal perf report or perf annotate, use objdump as well.
>> 
>> NOTE:
>> libcapstone is currently used only for reading raw binary code; disassembly is not enabled. While attempting cs_disasm, we observed that some instructions were not identified (e.g. extswsli, maddld), so it had to fall back to objdump. Hence enabling "cs_disasm" is left as a TODO comment for powerpc in patch 13.
>> 
>> Thanks
>> Athira
>> 
>>> 
>>>> 
>>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>>> registers names and offset. Also to find whether there is a memory
>>>> reference in the operands, "memory_ref_char" field of objdump is used.
>>>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>>>> form "mov  (%rax), %rcx".
>>>> 
>>>> On powerpc, instructions using "(" are not the only memory
>>>> instructions. For example, the above instruction can also be of the
>>>> extended form (X form) "lwzx r10,0,r19". In order to easily identify
>>>> the instruction category and extract the source/target registers, the
>>>> patch adds support for using the raw instruction on powerpc. The
>>>> approach is to read the raw instruction directly from the DSO file
>>>> using the "dso__data_read_offset" utility, which is already
>>>> implemented in the perf infrastructure in "util/dso.c".
>>>> 
>>>> Example:
>>>> 
>>>> 38 01 81 e8     ld      r4,312(r1)
>>>> 
>>>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>>>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>>>> as:
>>>> 
>>>>  | 58 |  RT  |  RA |      DS       | |
>>>>  -------------------------------------
>>>>  0    6     11    16              30 31
>>>> 
>>>> Function "symbol__disassemble_dso" is updated to read raw instruction
>>>> directly from DSO using dso__data_read_offset utility. In case of
>>>> above example, this captures:
>>>> line:    38 01 81 e8
>>>> 
>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>> ---
>>>> tools/perf/util/disasm.c | 98 ++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 98 insertions(+)
>>>> 
>>>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>>>> index b5fe3a7508bb..f19496133bf0 100644
>>>> --- a/tools/perf/util/disasm.c
>>>> +++ b/tools/perf/util/disasm.c
>>>> @@ -1586,6 +1586,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>>>> }
>>>> #endif
>>>> 
>>>> +static int symbol__disassemble_dso(char *filename, struct symbol *sym,
>>> 
>>> Maybe rename to symbol__disassemble_raw() ?
>> 
>> This is specifically using dso__data_read_offset. Hence using symbol__disassemble_dso
>>> 
>>>> + struct annotate_args *args)
>>>> +{
>>>> + struct annotation *notes = symbol__annotation(sym);
>>>> + struct map *map = args->ms.map;
>>>> + struct dso *dso = map__dso(map);
>>>> + u64 start = map__rip_2objdump(map, sym->start);
>>>> + u64 end = map__rip_2objdump(map, sym->end);
>>>> + u64 len = end - start;
>>>> + u64 offset;
>>>> + int i, count;
>>>> + u8 *buf = NULL;
>>>> + char disasm_buf[512];
>>>> + struct disasm_line *dl;
>>>> + u32 *line;
>>>> +
>>>> + /* Return if objdump is specified explicitly */
>>>> + if (args->options->objdump_path)
>>>> + return -1;
>>>> +
>>>> + pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);
>>> 
>>> You may want to print the actual offset and remove the "using
>>> dso__data_read_offset" part.
>> 
>> Ok Sure
>>> 
>>> Thanks,
>>> Namhyung
>>> 
>>>> +
>>>> + buf = malloc(len);
>>>> + if (buf == NULL)
>>>> + goto err;
>>>> +
>>>> + count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
>>>> +
>>>> + line = (u32 *)buf;
>>>> +
>>>> + if ((u64)count != len)
>>>> + goto err;
>>>> +
>>>> + /* add the function address and name */
>>>> + scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
>>>> +   start, sym->name);
>>>> +
>>>> + args->offset = -1;
>>>> + args->line = disasm_buf;
>>>> + args->line_nr = 0;
>>>> + args->fileloc = NULL;
>>>> + args->ms.sym = sym;
>>>> +
>>>> + dl = disasm_line__new(args);
>>>> + if (dl == NULL)
>>>> + goto err;
>>>> +
>>>> + annotation_line__add(&dl->al, &notes->src->source);
>>>> +
>>>> + /* Each raw instruction is 4 byte */
>>>> + count = len/4;
>>>> +
>>>> + for (i = 0, offset = 0; i < count; i++) {
>>>> + args->offset = offset;
>>>> + sprintf(args->line, "%x", line[i]);
>>>> + dl = disasm_line__new(args);
>>>> + if (dl == NULL)
>>>> + goto err;
>>>> +
>>>> + annotation_line__add(&dl->al, &notes->src->source);
>>>> + offset += 4;
>>>> + }
>>>> +
>>>> + /* It failed in the middle */
>>>> + if (offset != len) {
>>>> + struct list_head *list = &notes->src->source;
>>>> +
>>>> + /* Discard all lines and fallback to objdump */
>>>> + while (!list_empty(list)) {
>>>> + dl = list_first_entry(list, struct disasm_line, al.node);
>>>> +
>>>> + list_del_init(&dl->al.node);
>>>> + disasm_line__free(dl);
>>>> + }
>>>> + count = -1;
>>>> + }
>>>> +
>>>> +out:
>>>> + free(buf);
>>>> + return count < 0 ? count : 0;
>>>> +
>>>> +err:
>>>> + count = -1;
>>>> + goto out;
>>>> +}
>>>> /*
>>>> * Possibly create a new version of line with tabs expanded. Returns the
>>>> * existing or new line, storage is updated if a new line is allocated. If
>>>> @@ -1710,6 +1795,19 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>>>> strcpy(symfs_filename, tmp);
>>>> }
>>>> 
>>>> + /*
>>>> +  * For powerpc data type profiling, use the dso__data_read_offset
>>>> +  * to read raw instruction directly and interpret the binary code
>>>> +  * to understand instructions and register fields. For sort keys as
>>>> +  * type and typeoff, disassemble to mnemonic notation is
>>>> +  * not required in case of powerpc.
>>>> +  */
>>>> + if (arch__is(args->arch, "powerpc")) {
>>>> + err = symbol__disassemble_dso(symfs_filename, sym, args);
>>>> + if (err == 0)
>>>> + goto out_remove_tmp;
>>>> + }
>>>> +
>>>> #ifdef HAVE_LIBCAPSTONE_SUPPORT
>>>> err = symbol__disassemble_capstone(symfs_filename, sym, args);
>>>> if (err == 0)
>>>> -- 
>>>> 2.43.0
>> 
>> 
> 



* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-26  4:08         ` Athira Rajeev
@ 2024-06-26 21:17           ` Namhyung Kim
  2024-06-27  9:28             ` Athira Rajeev
  2024-06-30 11:10             ` Athira Rajeev
  0 siblings, 2 replies; 40+ messages in thread
From: Namhyung Kim @ 2024-06-26 21:17 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel

Hello,

On Wed, Jun 26, 2024 at 09:38:28AM +0530, Athira Rajeev wrote:
> 
> 
> > On 26 Jun 2024, at 12:15 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > On Tue, Jun 25, 2024 at 06:12:51PM +0530, Athira Rajeev wrote:
> >> 
> >> 
> >>> On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> >>> 
> >>> On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
> >>>> Currently, the perf tool infrastructure uses the disasm_line__parse
> >>>> function to parse a disassembled line.
> >>>> 
> >>>> Example snippet from objdump:
> >>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> >>>> 
> >>>> c0000000010224b4: lwz     r10,0(r9)
> >>>> 
> >>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >>>> registers names and offset. In powerpc, the approach for data type
> >>>> profiling uses raw instruction instead of result from objdump to identify
> >>>> the instruction category and extract the source/target registers.
> >>>> 
> >>>> Example: 38 01 81 e8     ld      r4,312(r1)
> >>>> 
> >>>> Here "38 01 81 e8" is the raw instruction representation. Add function
> >>>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
> >>>> Also update "struct disasm_line" to save the binary code.
> >>>> With the change, function captures:
> >>>> 
> >>>> line -> "38 01 81 e8     ld      r4,312(r1)"
> >>>> raw instruction "38 01 81 e8"
> >>>> 
> >>>> The raw instruction is used later to extract the reg/offset fields. Macros
> >>>> are added to extract opcode and register fields. "struct disasm_line"
> >>>> is updated to carry a 32-bit union of "bytes" and "raw_insn" to hold the
> >>>> raw code (raw). Function "disasm_line__parse_powerpc" fills the raw
> >>>> instruction hex value and can use macros to get the opcode. There are no
> >>>> changes in existing code paths, which parse the disassembled code.
> >>>> Architectures that use the instruction name with the present approach
> >>>> are not altered. Since this approach targets powerpc, the macro
> >>>> implementation is added only for powerpc as of now.
> >>>> 
> >>>> Since disasm_line__parse is used in other cases (perf annotate) and
> >>>> not only data type profiling, the powerpc callback includes changes to
> >>>> work with binary code as well as the mnemonic representation. Also, if
> >>>> the DSO read fails and libcapstone is not supported, the approach
> >>>> falls back to objdump. Hence the patch has changes to ensure the
> >>>> objdump path also works well.
> >>>> 
> >>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> >>>> ---
[SNIP]
> >>>> +/*
> >>>> + * Parses the result captured from symbol__disassemble_*
> >>>> + * Example, line read from DSO file in powerpc:
> >>>> + * line:    38 01 81 e8
> >>>> + * opcode: fetched from arch specific get_opcode_insn
> >>>> + * rawp_insn: e8810138
> >>>> + *
> >>>> + * rawp_insn is used later to extract the reg/offset fields
> >>>> + */
> >>>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
> >>>> +
> >>>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
> >>>> +{
> >>>> + char *line = dl->al.line;
> >>>> + const char **namep = &dl->ins.name;
> >>>> + char **rawp = &dl->ops.raw;
> >>>> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
> >>>> + char *name = skip_spaces(name_raw_insn + 11);
> >>>> + int objdump = 0;
> >>>> +
> >>>> + if (strlen(line) > 11)
> >>>> + objdump = 1;
> >>>> +
> >>>> + if (name_raw_insn[0] == '\0')
> >>>> + return -1;
> >>>> +
> >>>> + if (objdump) {
> >>>> + *rawp = name + 1;
> >>>> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
> >>>> + ++*rawp;
> >>>> + tmp = (*rawp)[0];
> >>>> + (*rawp)[0] = '\0';
> >>>> +
> >>>> + *namep = strdup(name);
> >>>> + if (*namep == NULL)
> >>>> + return -1;
> >>>> +
> >>>> + (*rawp)[0] = tmp;
> >>>> + *rawp = strim(*rawp);
> >>>> + } else
> >>>> + *namep = "";
> > 
> > Then can you handle this logic under if (annotate_opts.show_raw_insn)
> > in disasm_line__parse() instead of adding a new function?
> > 
> > Thanks,
> > Namhyung
> 
> Hi Namhyung,
> 
> We discussed having a per-arch disasm_line__parse() here:
> https://lore.kernel.org/all/CAM9d7ci1LDa7moT2qDr2qK+DTNLU6ZBkmROnbdozAjuQLQfNog@mail.gmail.com/#t
> 
> So I added it as a new function: disasm_line__parse_powerpc
> Since it is not used by other archs, can we go with a new function?

Ok, I thought it'd be quite different from disasm_line__parse() but it
seems that it's mostly similar except for the raw insn.  So I think it's
better to add the logic to the generic disasm_line__parse().  Sorry for
the inconvenience.

Thanks,
Namhyung

> >>>> +
> >>>> + tmp_raw_insn = strdup(name_raw_insn);
> >>>> + tmp_raw_insn[11] = '\0';
> >>>> + remove_spaces(tmp_raw_insn);
> >>>> +
> >>>> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
> >>>> + if (objdump)
> >>>> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
> >>> 
> >>> Hmm.. can you use a sscanf() instead?
> >>> 
> >>> sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
> >>> 
> >>> Thanks,
> >>> Namhyung
> >>> 
> >> Sure will address in V5
> >> 
> >> Thanks
> >> Athira
> >>>> +
> >>>> + return 0;
> >>>> +}
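
A minimal sketch of the sscanf() approach suggested above, assuming the "bytes" member is an array of four 8-bit values (the exact type in the series is not shown here). With 8-bit destinations, "%hhx" is the matching conversion rather than "%x". The names parse_raw_bytes and insn_from_le_bytes are hypothetical, and this only covers the space-separated byte form that objdump prints with raw instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a line such as "38 01 81 e8     ld      r4,312(r1)" into four bytes.
 * Returns 0 on success, -1 if the line does not start with four hex bytes.
 */
static int parse_raw_bytes(const char *line, uint8_t bytes[4])
{
	assert(line != NULL && bytes != NULL);

	/* %2hhx reads at most two hex digits into an unsigned char. */
	if (sscanf(line, "%2hhx %2hhx %2hhx %2hhx",
		   &bytes[0], &bytes[1], &bytes[2], &bytes[3]) != 4)
		return -1;
	return 0;
}

/* Combine the bytes, printed in little-endian order, into the insn word. */
static uint32_t insn_from_le_bytes(const uint8_t bytes[4])
{
	return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}
```

Combining the bytes explicitly avoids depending on the host byte order, which is what the be32_to_cpu() call in the quoted patch is compensating for.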


* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-26 21:17           ` Namhyung Kim
@ 2024-06-27  9:28             ` Athira Rajeev
  2024-06-30 11:10             ` Athira Rajeev
  1 sibling, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-27  9:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel



> On 27 Jun 2024, at 2:47 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Wed, Jun 26, 2024 at 09:38:28AM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 26 Jun 2024, at 12:15 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> On Tue, Jun 25, 2024 at 06:12:51PM +0530, Athira Rajeev wrote:
>>>> 
>>>> 
>>>>> On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>> 
>>>>> On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
>>>>>> Currently, the perf tool infrastructure uses the disasm_line__parse
>>>>>> function to parse a disassembled line.
>>>>>> 
>>>>>> Example snippet from objdump:
>>>>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>>>>>> 
>>>>>> c0000000010224b4: lwz     r10,0(r9)
>>>>>> 
>>>>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>>>>> registers names and offset. In powerpc, the approach for data type
>>>>>> profiling uses raw instruction instead of result from objdump to identify
>>>>>> the instruction category and extract the source/target registers.
>>>>>> 
>>>>>> Example: 38 01 81 e8     ld      r4,312(r1)
>>>>>> 
>>>>>> Here "38 01 81 e8" is the raw instruction representation. Add function
>>>>>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
>>>>>> Also update "struct disasm_line" to save the binary code.
>>>>>> With the change, function captures:
>>>>>> 
>>>>>> line -> "38 01 81 e8     ld      r4,312(r1)"
>>>>>> raw instruction "38 01 81 e8"
>>>>>> 
>>>>>> The raw instruction is used later to extract the reg/offset fields. Macros
>>>>>> are added to extract opcode and register fields. "struct disasm_line"
>>>>>> is updated to carry a 32-bit union of "bytes" and "raw_insn" to hold the
>>>>>> raw code (raw). Function "disasm_line__parse_powerpc" fills the raw
>>>>>> instruction hex value and can use macros to get the opcode. There are no
>>>>>> changes in existing code paths, which parse the disassembled code.
>>>>>> Architectures that use the instruction name with the present approach
>>>>>> are not altered. Since this approach targets powerpc, the macro
>>>>>> implementation is added only for powerpc as of now.
>>>>>> 
>>>>>> Since disasm_line__parse is used in other cases (perf annotate) and
>>>>>> not only data type profiling, the powerpc callback includes changes to
>>>>>> work with binary code as well as the mnemonic representation. Also, if
>>>>>> the DSO read fails and libcapstone is not supported, the approach
>>>>>> falls back to objdump. Hence the patch has changes to ensure the
>>>>>> objdump path also works well.
>>>>>> 
>>>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>>>> ---
> [SNIP]
>>>>>> +/*
>>>>>> + * Parses the result captured from symbol__disassemble_*
>>>>>> + * Example, line read from DSO file in powerpc:
>>>>>> + * line:    38 01 81 e8
>>>>>> + * opcode: fetched from arch specific get_opcode_insn
>>>>>> + * rawp_insn: e8810138
>>>>>> + *
>>>>>> + * rawp_insn is used later to extract the reg/offset fields
>>>>>> + */
>>>>>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>>>>>> +
>>>>>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
>>>>>> +{
>>>>>> + char *line = dl->al.line;
>>>>>> + const char **namep = &dl->ins.name;
>>>>>> + char **rawp = &dl->ops.raw;
>>>>>> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
>>>>>> + char *name = skip_spaces(name_raw_insn + 11);
>>>>>> + int objdump = 0;
>>>>>> +
>>>>>> + if (strlen(line) > 11)
>>>>>> + objdump = 1;
>>>>>> +
>>>>>> + if (name_raw_insn[0] == '\0')
>>>>>> + return -1;
>>>>>> +
>>>>>> + if (objdump) {
>>>>>> + *rawp = name + 1;
>>>>>> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
>>>>>> + ++*rawp;
>>>>>> + tmp = (*rawp)[0];
>>>>>> + (*rawp)[0] = '\0';
>>>>>> +
>>>>>> + *namep = strdup(name);
>>>>>> + if (*namep == NULL)
>>>>>> + return -1;
>>>>>> +
>>>>>> + (*rawp)[0] = tmp;
>>>>>> + *rawp = strim(*rawp);
>>>>>> + } else
>>>>>> + *namep = "";
>>> 
>>> Then can you handle this logic under if (annotate_opts.show_raw_insn)
>>> in disasm_line__parse() instead of adding a new function?
>>> 
>>> Thanks,
>>> Namhyung
>> 
>> Hi Namhyung,
>> 
>> We discussed having a per-arch disasm_line__parse() here:
>> https://lore.kernel.org/all/CAM9d7ci1LDa7moT2qDr2qK+DTNLU6ZBkmROnbdozAjuQLQfNog@mail.gmail.com/#t
>> 
>> So I added it as a new function: disasm_line__parse_powerpc
>> Since it is not used by other archs, can we go with a new function?
> 
> Ok, I thought it'd be quite different from disasm_line__parse() but it
> seems that it's mostly similar except for the raw insn.  So I think it's
> better to add the logic to the generic disasm_line__parse().  Sorry for
> the inconvenience.
> 
> Thanks,
> Namhyung

Sure

Thanks
Athira
> 
>>>>>> +
>>>>>> + tmp_raw_insn = strdup(name_raw_insn);
>>>>>> + tmp_raw_insn[11] = '\0';
>>>>>> + remove_spaces(tmp_raw_insn);
>>>>>> +
>>>>>> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
>>>>>> + if (objdump)
>>>>>> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
>>>>> 
>>>>> Hmm.. can you use a sscanf() instead?
>>>>> 
>>>>> sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
>>>>> 
>>>>> Thanks,
>>>>> Namhyung
>>>>> 
>>>> Sure will address in V5
>>>> 
>>>> Thanks
>>>> Athira
>>>>>> +
>>>>>> + return 0;
>>>>>> +}




* Re: [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-06-26 21:17           ` Namhyung Kim
  2024-06-27  9:28             ` Athira Rajeev
@ 2024-06-30 11:10             ` Athira Rajeev
  1 sibling, 0 replies; 40+ messages in thread
From: Athira Rajeev @ 2024-06-30 11:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ian Rogers,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Disha Goel



> On 27 Jun 2024, at 2:47 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Wed, Jun 26, 2024 at 09:38:28AM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 26 Jun 2024, at 12:15 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> On Tue, Jun 25, 2024 at 06:12:51PM +0530, Athira Rajeev wrote:
>>>> 
>>>> 
>>>>> On 25 Jun 2024, at 11:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>> 
>>>>> On Fri, Jun 14, 2024 at 10:56:20PM +0530, Athira Rajeev wrote:
>>>>>> Currently, the perf tool infrastructure uses the disasm_line__parse
>>>>>> function to parse a disassembled line.
>>>>>> 
>>>>>> Example snippet from objdump:
>>>>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>>>>>> 
>>>>>> c0000000010224b4: lwz     r10,0(r9)
>>>>>> 
>>>>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>>>>> registers names and offset. In powerpc, the approach for data type
>>>>>> profiling uses raw instruction instead of result from objdump to identify
>>>>>> the instruction category and extract the source/target registers.
>>>>>> 
>>>>>> Example: 38 01 81 e8     ld      r4,312(r1)
>>>>>> 
>>>>>> Here "38 01 81 e8" is the raw instruction representation. Add function
>>>>>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
>>>>>> Also update "struct disasm_line" to save the binary code.
>>>>>> With the change, function captures:
>>>>>> 
>>>>>> line -> "38 01 81 e8     ld      r4,312(r1)"
>>>>>> raw instruction "38 01 81 e8"
>>>>>> 
>>>>>> The raw instruction is used later to extract the reg/offset fields. Macros
>>>>>> are added to extract opcode and register fields. "struct disasm_line"
>>>>>> is updated to carry a 32-bit union of "bytes" and "raw_insn" to hold the
>>>>>> raw code (raw). Function "disasm_line__parse_powerpc" fills the raw
>>>>>> instruction hex value and can use macros to get the opcode. There are no
>>>>>> changes in existing code paths, which parse the disassembled code.
>>>>>> Architectures that use the instruction name with the present approach
>>>>>> are not altered. Since this approach targets powerpc, the macro
>>>>>> implementation is added only for powerpc as of now.
>>>>>> 
>>>>>> Since disasm_line__parse is used in other cases (perf annotate) and
>>>>>> not only data type profiling, the powerpc callback includes changes to
>>>>>> work with binary code as well as the mnemonic representation. Also, if
>>>>>> the DSO read fails and libcapstone is not supported, the approach
>>>>>> falls back to objdump. Hence the patch has changes to ensure the
>>>>>> objdump path also works well.
>>>>>> 
>>>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>>>> ---
> [SNIP]
>>>>>> +/*
>>>>>> + * Parses the result captured from symbol__disassemble_*
>>>>>> + * Example, line read from DSO file in powerpc:
>>>>>> + * line:    38 01 81 e8
>>>>>> + * opcode: fetched from arch specific get_opcode_insn
>>>>>> + * rawp_insn: e8810138
>>>>>> + *
>>>>>> + * rawp_insn is used later to extract the reg/offset fields
>>>>>> + */
>>>>>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>>>>>> +
>>>>>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
>>>>>> +{
>>>>>> + char *line = dl->al.line;
>>>>>> + const char **namep = &dl->ins.name;
>>>>>> + char **rawp = &dl->ops.raw;
>>>>>> + char tmp, *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
>>>>>> + char *name = skip_spaces(name_raw_insn + 11);
>>>>>> + int objdump = 0;
>>>>>> +
>>>>>> + if (strlen(line) > 11)
>>>>>> + objdump = 1;
>>>>>> +
>>>>>> + if (name_raw_insn[0] == '\0')
>>>>>> + return -1;
>>>>>> +
>>>>>> + if (objdump) {
>>>>>> + *rawp = name + 1;
>>>>>> + while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
>>>>>> + ++*rawp;
>>>>>> + tmp = (*rawp)[0];
>>>>>> + (*rawp)[0] = '\0';
>>>>>> +
>>>>>> + *namep = strdup(name);
>>>>>> + if (*namep == NULL)
>>>>>> + return -1;
>>>>>> +
>>>>>> + (*rawp)[0] = tmp;
>>>>>> + *rawp = strim(*rawp);
>>>>>> + } else
>>>>>> + *namep = "";
>>> 
>>> Then can you handle this logic under if (annotate_opts.show_raw_insn)
>>> in disasm_line__parse() instead of adding a new function?
>>> 
>>> Thanks,
>>> Namhyung
>> 
>> Hi Namhyung,
>> 
>> We discussed having a per-arch disasm_line__parse() here:
>> https://lore.kernel.org/all/CAM9d7ci1LDa7moT2qDr2qK+DTNLU6ZBkmROnbdozAjuQLQfNog@mail.gmail.com/#t
>> 
>> So I added it as a new function: disasm_line__parse_powerpc
>> Since it is not used by other archs, can we go with a new function?
> 
> Ok, I thought it'd be quite different from disasm_line__parse() but it
> seems that it's mostly similar except for the raw insn.  So I think it's
> better to add the logic to the generic disasm_line__parse().  Sorry for
> the inconvenience.
> 
> Thanks,
> Namhyung

Hi Namhyung

I think it’s better to keep it as a separate function. The raw_insn field and its length differ from arch to arch (powerpc instructions are 32 bits).
Also, disasm_line__parse needs some changes to handle cases where objdump is not used and the instruction is read directly from the DSO.
Keeping it as a separate function also makes it easier for other archs to later adopt the DSO read, raw instructions, or a different approach.
I will keep it as a separate function in V5 and add a comment describing the specific changes compared to the generic disasm_line__parse.

Thanks
Athira


> 
>>>>>> +
>>>>>> + tmp_raw_insn = strdup(name_raw_insn);
>>>>>> + tmp_raw_insn[11] = '\0';
>>>>>> + remove_spaces(tmp_raw_insn);
>>>>>> +
>>>>>> + dl->raw.raw_insn = strtol(tmp_raw_insn, NULL, 16);
>>>>>> + if (objdump)
>>>>>> + dl->raw.raw_insn = be32_to_cpu(strtol(tmp_raw_insn, NULL, 16));
>>>>> 
>>>>> Hmm.. can you use a sscanf() instead?
>>>>> 
>>>>> sscanf(line, "%x %x %x %x", &dl->raw.bytes[0], &dl->raw.bytes[1], ...)
>>>>> 
>>>>> Thanks,
>>>>> Namhyung
>>>>> 
>>>> Sure will address in V5
>>>> 
>>>> Thanks
>>>> Athira
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
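
A rough sketch of the separate per-arch parse function argued for in the reply above, assuming the arch description can carry an optional callback. All types and names here (struct line, struct arch_ops, parse_generic, parse_powerpc, parse_line) are simplified, hypothetical stand-ins and are not the perf structures or the code in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a disassembled line. */
struct line {
	const char *text;
	int parsed_by_arch;	/* set to 1 when an arch-specific parser ran */
};

/* Hypothetical arch description with an optional parse callback. */
struct arch_ops {
	const char *name;
	int (*parse)(struct line *l);	/* NULL: use the generic parser */
};

/* Generic path: expects mnemonic text only. */
static int parse_generic(struct line *l)
{
	l->parsed_by_arch = 0;
	return l->text[0] != '\0' ? 0 : -1;
}

/* Arch-specific path: would also capture the 32-bit raw instruction. */
static int parse_powerpc(struct line *l)
{
	l->parsed_by_arch = 1;
	return l->text[0] != '\0' ? 0 : -1;
}

/* Dispatch to the arch callback when present, else fall back to generic. */
static int parse_line(const struct arch_ops *arch, struct line *l)
{
	assert(arch != NULL && l != NULL && l->text != NULL);

	if (arch->parse)
		return arch->parse(l);
	return parse_generic(l);
}
```

With this shape, archs that do not set the callback keep the existing behaviour, and an arch with a different raw instruction width can supply its own parser later.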





Thread overview: 40+ messages
2024-06-14 17:26 [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
2024-06-14 17:26 ` [V4 01/16] tools/perf: Move the data structures related to register type to header file Athira Rajeev
2024-06-25  5:15   ` Namhyung Kim
2024-06-25 10:54     ` Athira Rajeev
2024-06-14 17:26 ` [V4 02/16] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
2024-06-14 17:26 ` [V4 03/16] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
2024-06-25  5:29   ` Namhyung Kim
2024-06-25 12:38     ` Athira Rajeev
2024-06-25 18:39       ` Namhyung Kim
2024-06-26  4:09         ` Athira Rajeev
2024-06-14 17:26 ` [V4 04/16] tools/perf: Use sort keys to determine whether to pick objdump to disassemble Athira Rajeev
2024-06-25  5:32   ` Namhyung Kim
2024-06-14 17:26 ` [V4 05/16] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
2024-06-25  5:39   ` Namhyung Kim
2024-06-25 12:42     ` Athira Rajeev
2024-06-25 18:45       ` Namhyung Kim
2024-06-26  4:08         ` Athira Rajeev
2024-06-26 21:17           ` Namhyung Kim
2024-06-27  9:28             ` Athira Rajeev
2024-06-30 11:10             ` Athira Rajeev
2024-06-14 17:26 ` [V4 06/16] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
2024-06-25  6:00   ` Namhyung Kim
2024-06-25 12:43     ` Athira Rajeev
2024-06-14 17:26 ` [V4 07/16] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc Athira Rajeev
2024-06-14 17:26 ` [V4 08/16] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
2024-06-14 17:26 ` [V4 09/16] tools/perf: Add more instructions for instruction tracking Athira Rajeev
2024-06-14 17:26 ` [V4 10/16] tools/perf: Update instruction tracking for powerpc Athira Rajeev
2024-06-14 17:26 ` [V4 11/16] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
2024-06-14 17:26 ` [V4 12/16] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
2024-06-14 17:26 ` [V4 13/16] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
2024-06-25  6:08   ` Namhyung Kim
2024-06-25 12:44     ` Athira Rajeev
2024-06-14 17:26 ` [V4 14/16] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
2024-06-25  6:17   ` Namhyung Kim
2024-06-25 12:45     ` Athira Rajeev
2024-06-14 17:26 ` [V4 15/16] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
2024-06-14 17:26 ` [V4 16/16] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
2024-06-20 15:31 ` [V4 00/16] Add data type profiling support for powerpc Athira Rajeev
2024-06-22  0:06   ` Namhyung Kim
2024-06-25 11:48     ` Athira Rajeev
