* [PATCH V8 00/15] Add data type profiling support for powerpc
@ 2024-07-18  8:43 Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 01/15] tools/perf: Move the data structures related to register type to header file Athira Rajeev
                   ` (15 more replies)
  0 siblings, 16 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
The patchset from Namhyung added support for data type profiling
in perf tool. This enabled support to associate PMU samples to data
types they refer using DWARF debug information. With the upstream
perf, currently it possible to run perf report or perf annotate to
view the data type information on x86.
Initial patchset posted here had changes need to enable data type
profiling support for powerpc.
https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
Main change were:
1. powerpc instruction nmemonic table to associate load/store
instructions with move_ops which is use to identify if instruction
is a memory access one.
2. To get register number and access offset from the given
instruction, code uses fields from "struct arch" -> objump.
Added entry for powerpc here.
3. A get_arch_regnum to return register number from the
register name string.
But the apporach used in the initial patchset used parsing of
disassembled code which the current perf tool implementation does.
Example: lwz     r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov  (%rax), %rcx".
In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, second patchset added support to use
raw instruction. With raw instruction, macros are added to extract opcode
and register fields.
Link to second patchset:
https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
Example representation using --show-raw-insn in objdump gives result:
38 01 81 e8     ld      r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
  _____________________________________
  | 58 |  RT  |  RA |      DS       | |
  -------------------------------------
0    6     11    16              30 31
Second patchset used "objdump" again to read the raw instruction.
But since there is no need to disassemble and binary code can be read
directly from the DSO, third patchset (ie this patchset) uses below
apporach. The apporach preferred in powerpc to parse sample for data
type profiling in V3 patchset is:
- Read directly from DSO using dso__data_read_offset
- If that fails for any case, fallback to using libcapstone
- If libcapstone is not supported, approach will use objdump
Patchset adds support to pick the opcode and reg fields from this
raw/binary instruction code. This approach came in from review comment
by Segher Boessenkool and Christophe for the initial patchset.
Apart from that, instruction tracking is enabled for powerpc.
These are handled in this patchset.
- Patch 1 is to rearrange register state type structures to header file
so that it can referred from other arch specific files
- Patch 2 is to make instruction tracking as a callback to"struct arch"
so that it can be implemented by other archs easily and defined in arch
specific files
- Patch 3 is to handle state type regs array size for x86 and powerpc
- Patch 4 adds support to capture and parse raw instruction in powerpc
using dso__data_read_offset utility
- Patch 4 also adds logic to support using objdump when doing default "perf
report" or "perf annotate" since it that needs disassembled instruction.
- Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
- Patch 6 update parameters for reg extract functions to use raw
instruction on powerpc
- Patch 7 updates ins__find to carry raw_insn and also adds parse
callback for memory instructions for powerpc
- Patch 8 add support to identify memory instructions of opcode 31 in
powerpc
- Patch 9 adds more instructions to support instruction tracking in powerpc
- Patch 10 and 11 handles instruction tracking for powerpc.
- Patch 12, 13 and 14 add support to use libcapstone in powerpc
- Patch 15 handles insn-stat option for perf annotate
Note:
- There are remaining unknowns (25%) as seen in annotate Instruction stats
below.
- This patchset is not tested on powerpc32. In next step of enhancements
along with handling remaining unknowns, plan to cover powerpc32 changes
based on how testing goes.
- Patch changes to handle global register support will be posted in
follow up after addressing review comment from Namhung here:
https://lore.kernel.org/linux-perf-users/CAM9d7cg5MYvLeOoBuKqp1pw7uvRfqCw1fXpLtgct0npL96JaYg@mail.gmail.com/T/#mb309184ebf5d2c9b539cd86d61ffe2a3696eac71
With the current patchset:
 ./perf record -a -e mem-loads sleep 1
 ./perf report -s type,typeoff --hierarchy --group --stdio
 ./perf annotate --data-type --insn-stat
perf annotate logs:
==================
Annotate Instruction stats
total 609, ok 266 (43.7%), bad 343 (56.3%)
  Name/opcode         :  Good   Bad
-----------------------------------------------------------
  58                  :   172   231
  32                  :    48    44
  34                  :    19    25
  OP_31_XOP_LDX       :     8    20
  40                  :     9    14
  OP_31_XOP_LWARX     :     5     1
  OP_31_XOP_LWZX      :     2     3
  OP_31_XOP_LDARX     :     3     0
  33                  :     0     2
  OP_31_XOP_LBZX      :     0     1
  OP_31_XOP_LWAX      :     0     1
  OP_31_XOP_LHZX      :     0     1
 
perf report logs:
=================
  Total Lost Samples: 0
  Samples: 1K of event 'mem-loads'
  Event count (approx.): 937238
    Overhead  Data Type / Data Type Offset
    ........  .........  ................
     1.73%     struct task_struct
        1.29%     struct task_struct +276 (flags)
        0.10%     struct task_struct +1872 (mm)
        0.08%     struct task_struct +3632 (thread.regs)
        0.08%     struct task_struct +1072 (sched_class)
        0.05%     struct task_struct +1928 ()
        0.03%     struct task_struct +1064 (dl_server)
        0.02%     struct task_struct +0 (thread_info.preempt_count)
        0.02%     struct task_struct +120 (thread_info)
        0.02%     struct task_struct +2216 (worker_private)
        0.02%     struct task_struct +24 (thread_info.slb_preload_nr)
        0.02%     struct task_struct +8 (thread_info.local_flags)
        0.01%     struct task_struct +1416 (nr_cpus_allowed)
     1.68%     struct file
        0.58%     struct file +176 (f_op)
        0.53%     struct file +24 (f_count.counter)
        0.46%     struct file +20 (f_mode)
        0.11%     struct file +200 (private_data)
     1.63%     struct rq
        0.76%     struct rq +2856 (cpu)
        0.21%     struct rq +2592 (curr)
        0.11%     struct rq +2632 (clock_update_flags)
        0.08%     struct rq +2600 (idle)
        0.06%     struct rq +2688 (clock_task)
        0.05%     struct rq +16 (numa_migrate_on)
        0.04%     struct rq +2864 (cfs_tasks.next)
Thanks
Athira Rajeev
Changelog:
From v7 -> v8:
  Split patches 1 to 14 and patch 18 as basic patches.
  Patch 15, 16 and 17 is related to global register handling
  and will be handled separately.
  Added Reviewed-and-tested-by from Kajol Jain
  Added Reviewed-by from Namhyung for patches 1 to 14, patch 18
From v6 -> v7:
- Addressed review comments from Namhyung
  Changed format string space to %-20s while printing
  instruction stats in patch 18.
  Use cmp_null in patch 17 while comparing var_name to
  properly sort with correct order.
From v5 -> v6:
- Addressed review comments from Namhyung
  Conditionally define TYPE_STATE_MAX_REGS based on arch.
  Added macro for defining width of the raw codes and spaces
  in disasm_line__parse_powerpc.
  Call disasm_line__parse from disasm_line__parse_powerpc
  for generic code.
  Renamed symbol__disassemble_dso to symbol__disassemble_raw.
  Fixed find_data_type_global_reg to correclty free var_types
  and change indent level.
  Fixed data_type_cmp and sort__typeoff_sort to include var_name
  in comparing data type entries.
  
From v4 -> v5:
- Addressed review comments from Namhyung
  Handle max number of type state regs as 16 for x86 and 32 for
  powerpc.
  Added generic support for objdump patch first and DSO read
  optimisation next
  combined patch 3 and patch 4 in patchseries V4 to one patch
  Changed reference for "raw_insn" to use "u32"
  Splitted "parse" callback patch changes and "ins__find" patch
  changes into two
  Instead of making weak function, added get_powerpc_regs to
  extract register and offset fields for powerpc
- Addressed complation fail when "dwarf.h" is not present ie
  elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT
  when including functions that use Dwarf references. Also
  conditionally include some of the header files.
From v3->v4:
- Addressed review comments from Ian by using capston_init from
  "util/print_insn.c" instead of "open_capston_handle".
- Addressed review comment from Namhyung by moving "opcode"
  field from "struct ins" to "struct disasm_line"
From v2->v3:
- Addressed review comments from Christophe and Namhyung for V2
- Changed the apporach in powerpc to parse sample for data
  type profiling as:
  Read directly from DSO using dso__data_read_offset
  If that fails for any case, fallback to using libcapstone
  If libcapstone is not supported, approach will use objdump
- Include instructions with opcode as 31 and correctly categorize
  them as memory or arithmetic instructions.
- Include more instructions for instruction tracking in powerpc
From v1->v2:
- Addressed suggestion from Christophe Leroy and Segher Boessenkool
  to use the binary code (raw insn) to fetch opcode, register and
  offset fields.
- Added support for instruction tracking in powerpc
- Find the register defined variables (r13 and r1 which points to
  local_paca and current_stack_pointer in powerpc)
Athira Rajeev (15):
  tools/perf: Move the data structures related to register type to
    header file
  tools/perf: Add "update_insn_state" callback function to handle arch
    specific instruction tracking
  tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in
    powerpc
  tools/perf: Add disasm_line__parse to parse raw instruction for
    powerpc
  tools/perf: Add support to capture and parse raw instruction in
    powerpc using dso__data_read_offset utility
  tools/perf: Update parameters for reg extract functions to use raw
    instruction on powerpc
  tools/perf: Add parse function for memory instructions in powerpc
  tools/perf: Add support to identify memory instructions of opcode 31
    in powerpc
  tools/perf: Add some of the arithmetic instructions to support
    instruction tracking in powerpc
  tools/perf: Add more instructions for instruction tracking
  tools/perf: Update instruction tracking for powerpc
  tools/perf: Make capstone_init non-static so that it can be used
    during symbol disassemble
  tools/perf: Use capstone_init and remove open_capstone_handle from
    disasm.c
  tools/perf: Add support to use libcapstone in powerpc
  tools/perf: Set instruction name to be used with insn-stat when using
    raw instruction
 tools/include/linux/string.h                  |   2 +
 tools/lib/string.c                            |  13 +
 tools/perf/arch/arm64/annotate/instructions.c |   3 +-
 .../arch/loongarch/annotate/instructions.c    |   6 +-
 .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++++
 tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
 tools/perf/arch/s390/annotate/instructions.c  |   5 +-
 tools/perf/arch/x86/annotate/instructions.c   | 377 ++++++++++++++
 tools/perf/builtin-annotate.c                 |   4 +-
 tools/perf/util/annotate-data.c               | 453 +----------------
 tools/perf/util/annotate-data.h               |  83 ++++
 tools/perf/util/annotate.c                    |  21 +-
 tools/perf/util/annotate.h                    |   5 +-
 tools/perf/util/disasm.c                      | 468 ++++++++++++++++--
 tools/perf/util/disasm.h                      |  19 +-
 tools/perf/util/include/dwarf-regs.h          |  11 +
 tools/perf/util/print_insn.c                  |  15 +-
 tools/perf/util/print_insn.h                  |   5 +
 18 files changed, 1301 insertions(+), 496 deletions(-)
-- 
2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH V8 01/15] tools/perf: Move the data structures related to register type to header file
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 02/15] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures. This is useful to find the data type in cases when the
register state gets transferred from one reg to another. Example, in
x86, "mov" instruction and in powerpc, "mr" instruction. Currently these
structures are defined in annotate-data.c and instruction tracking is
implemented only for x86. Move these data structures to
"annotate-data.h" header file so that other arch implementations can use
it in arch specific files as well.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c | 53 +------------------------------
 tools/perf/util/annotate-data.h | 56 +++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 52 deletions(-)
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 965da6c0b542..a4c7f98a75e3 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -31,15 +31,6 @@
 
 static void delete_var_types(struct die_var_type *var_types);
 
-enum type_state_kind {
-	TSR_KIND_INVALID = 0,
-	TSR_KIND_TYPE,
-	TSR_KIND_PERCPU_BASE,
-	TSR_KIND_CONST,
-	TSR_KIND_POINTER,
-	TSR_KIND_CANARY,
-};
-
 #define pr_debug_dtp(fmt, ...)					\
 do {								\
 	if (debug_type_profile)					\
@@ -140,49 +131,7 @@ static void pr_debug_location(Dwarf_Die *die, u64 pc, int reg)
 	}
 }
 
-/*
- * Type information in a register, valid when @ok is true.
- * The @caller_saved registers are invalidated after a function call.
- */
-struct type_state_reg {
-	Dwarf_Die type;
-	u32 imm_value;
-	bool ok;
-	bool caller_saved;
-	u8 kind;
-};
-
-/* Type information in a stack location, dynamically allocated */
-struct type_state_stack {
-	struct list_head list;
-	Dwarf_Die type;
-	int offset;
-	int size;
-	bool compound;
-	u8 kind;
-};
-
-/* FIXME: This should be arch-dependent */
-#define TYPE_STATE_MAX_REGS  16
-
-/*
- * State table to maintain type info in each register and stack location.
- * It'll be updated when new variable is allocated or type info is moved
- * to a new location (register or stack).  As it'd be used with the
- * shortest path of basic blocks, it only maintains a single table.
- */
-struct type_state {
-	/* state of general purpose registers */
-	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
-	/* state of stack location */
-	struct list_head stack_vars;
-	/* return value register */
-	int ret_reg;
-	/* stack pointer register */
-	int stack_reg;
-};
-
-static bool has_reg_type(struct type_state *state, int reg)
+bool has_reg_type(struct type_state *state, int reg)
 {
 	return (unsigned)reg < ARRAY_SIZE(state->regs);
 }
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 0a57d9f5ee78..cdb5cd8960bb 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -6,6 +6,11 @@
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include "annotate.h"
+
+#ifdef HAVE_DWARF_SUPPORT
+#include "debuginfo.h"
+#endif
 
 struct annotated_op_loc;
 struct debuginfo;
@@ -15,6 +20,15 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+enum type_state_kind {
+	TSR_KIND_INVALID = 0,
+	TSR_KIND_TYPE,
+	TSR_KIND_PERCPU_BASE,
+	TSR_KIND_CONST,
+	TSR_KIND_POINTER,
+	TSR_KIND_CANARY,
+};
+
 /**
  * struct annotated_member - Type of member field
  * @node: List entry in the parent list
@@ -143,6 +157,47 @@ struct annotated_data_stat {
 extern struct annotated_data_stat ann_data_stat;
 
 #ifdef HAVE_DWARF_SUPPORT
+/*
+ * Type information in a register, valid when @ok is true.
+ * The @caller_saved registers are invalidated after a function call.
+ */
+struct type_state_reg {
+	Dwarf_Die type;
+	u32 imm_value;
+	bool ok;
+	bool caller_saved;
+	u8 kind;
+};
+
+/* Type information in a stack location, dynamically allocated */
+struct type_state_stack {
+	struct list_head list;
+	Dwarf_Die type;
+	int offset;
+	int size;
+	bool compound;
+	u8 kind;
+};
+
+/* FIXME: This should be arch-dependent */
+#define TYPE_STATE_MAX_REGS  16
+
+/*
+ * State table to maintain type info in each register and stack location.
+ * It'll be updated when new variable is allocated or type info is moved
+ * to a new location (register or stack).  As it'd be used with the
+ * shortest path of basic blocks, it only maintains a single table.
+ */
+struct type_state {
+	/* state of general purpose registers */
+	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
+	/* state of stack location */
+	struct list_head stack_vars;
+	/* return value register */
+	int ret_reg;
+	/* stack pointer register */
+	int stack_reg;
+};
 
 /* Returns data type at the location (ip, reg, offset) */
 struct annotated_data_type *find_data_type(struct data_loc_info *dloc);
@@ -160,6 +215,7 @@ void global_var_type__tree_delete(struct rb_root *root);
 
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
+bool has_reg_type(struct type_state *state, int reg);
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 02/15] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 01/15] tools/perf: Move the data structures related to register type to header file Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc Athira Rajeev
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Add "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating instruction state is handled by static
function "update_insn_state_x86" which is defined in "annotate-data.c".
Make this as a callback for specific arch and move to archs specific
file "arch/x86/annotate/instructions.c" . This will help to add helper
function for other platforms in file:
"arch/<platform>/annotate/instructions.c and make changes/updates
easier.
Define callback "update_insn_state" as part of "struct arch", also make
some of the debug functions non-static so that it can be referenced from
other places.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/x86/annotate/instructions.c | 377 +++++++++++++++++++
 tools/perf/util/annotate-data.c             | 391 +-------------------
 tools/perf/util/annotate-data.h             |  23 ++
 tools/perf/util/disasm.c                    |   4 +
 tools/perf/util/disasm.h                    |  12 +
 5 files changed, 424 insertions(+), 383 deletions(-)
diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
index 5cdf457f5cbe..7b7d462c6c6b 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -206,3 +206,380 @@ static int x86__annotate_init(struct arch *arch, char *cpuid)
 	arch->initialized = true;
 	return err;
 }
+
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_x86(struct type_state *state,
+				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				  struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int fbreg = dloc->fbreg;
+	int fboff = 0;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	if (ins__is_call(&dl->ins)) {
+		struct symbol *func = dl->ops.target.sym;
+
+		if (func == NULL)
+			return;
+
+		/* __fentry__ will preserve all registers */
+		if (!strcmp(func->name, "__fentry__"))
+			return;
+
+		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
+
+		/* Otherwise invalidate caller-saved registers after call */
+		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
+			if (state->regs[i].caller_saved)
+				state->regs[i].ok = false;
+		}
+
+		/* Update register with the return type (if any) */
+		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
+			tsr = &state->regs[state->ret_reg];
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("call [%x] return -> reg%d",
+				     insn_offset, state->ret_reg);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		return;
+	}
+
+	if (!strncmp(dl->ins.name, "add", 3)) {
+		u64 imm_value = -1ULL;
+		int offset;
+		const char *var_name = NULL;
+		struct map_symbol *ms = dloc->ms;
+		u64 ip = ms->sym->start + dl->al.offset;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+		if (src->imm)
+			imm_value = src->offset;
+		else if (has_reg_type(state, src->reg1) &&
+			 state->regs[src->reg1].kind == TSR_KIND_CONST)
+			imm_value = state->regs[src->reg1].imm_value;
+		else if (src->reg1 == DWARF_REG_PC) {
+			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
+							   src->offset, dl);
+
+			if (get_global_var_info(dloc, var_addr,
+						&var_name, &offset) &&
+			    !strcmp(var_name, "this_cpu_off") &&
+			    tsr->kind == TSR_KIND_CONST) {
+				tsr->kind = TSR_KIND_PERCPU_BASE;
+				imm_value = tsr->imm_value;
+			}
+		}
+		else
+			return;
+
+		if (tsr->kind != TSR_KIND_PERCPU_BASE)
+			return;
+
+		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
+					&type_die) && offset == 0) {
+			/*
+			 * This is not a pointer type, but it should be treated
+			 * as a pointer.
+			 */
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_POINTER;
+			tsr->ok = true;
+
+			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
+				     insn_offset, imm_value, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		return;
+	}
+
+	if (strncmp(dl->ins.name, "mov", 3))
+		return;
+
+	if (dloc->fb_cfa) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+			fbreg = -1;
+	}
+
+	/* Case 1. register to register or segment:offset to register transfers */
+	if (!src->mem_ref && !dst->mem_ref) {
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+		if (dso__kernel(map__dso(dloc->ms->map)) &&
+		    src->segment == INSN_SEG_X86_GS && src->imm) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr;
+			int offset;
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			var_addr = src->offset;
+
+			if (var_addr == 40) {
+				tsr->kind = TSR_KIND_CANARY;
+				tsr->ok = true;
+
+				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
+					     insn_offset, dst->reg1);
+				return;
+			}
+
+			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
+						 &offset, &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
+				     insn_offset, var_addr, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+			return;
+		}
+
+		if (src->imm) {
+			tsr->kind = TSR_KIND_CONST;
+			tsr->imm_value = src->offset;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+				     insn_offset, tsr->imm_value, dst->reg1);
+			return;
+		}
+
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		tsr->type = state->regs[src->reg1].type;
+		tsr->kind = state->regs[src->reg1].kind;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			     insn_offset, src->reg1, dst->reg1);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+	/* Case 2. memory to register transers */
+	if (src->mem_ref && !dst->mem_ref) {
+		int sreg = src->reg1;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+retry:
+		/* Check stack variables with offset */
+		if (sreg == fbreg) {
+			struct type_state_stack *stack;
+			int offset = src->offset - fboff;
+
+			stack = find_stack_state(state, offset);
+			if (stack == NULL) {
+				tsr->ok = false;
+				return;
+			} else if (!stack->compound) {
+				tsr->type = stack->type;
+				tsr->kind = stack->kind;
+				tsr->ok = true;
+			} else if (die_get_member_type(&stack->type,
+						       offset - stack->offset,
+						       &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+			} else {
+				tsr->ok = false;
+				return;
+			}
+
+			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
+				     insn_offset, -offset, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* And then dereference the pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_TYPE &&
+			 die_deref_ptr_type(&state->regs[sreg].type,
+					    src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or check if it's a global variable */
+		else if (sreg == DWARF_REG_PC) {
+			struct map_symbol *ms = dloc->ms;
+			u64 ip = ms->sym->start + dl->al.offset;
+			u64 addr;
+			int offset;
+
+			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
+
+			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
+						 &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
+				     insn_offset, addr, dst->reg1);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		/* And check percpu access with base register */
+		else if (has_reg_type(state, sreg) &&
+			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr = src->offset;
+			int offset;
+
+			if (src->multi_regs) {
+				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
+
+				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
+				    state->regs[reg2].kind == TSR_KIND_CONST)
+					var_addr += state->regs[reg2].imm_value;
+			}
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			if (get_global_var_type(cu_die, dloc, ip, var_addr,
+						&offset, &type_die) &&
+			    die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+
+				if (src->multi_regs) {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
+						     insn_offset, src->offset, src->reg1,
+						     src->reg2, dst->reg1);
+				} else {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
+						     insn_offset, src->offset, sreg, dst->reg1);
+				}
+				pr_debug_type_name(&tsr->type, tsr->kind);
+			} else {
+				tsr->ok = false;
+			}
+		}
+		/* And then dereference the calculated pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_POINTER &&
+			 die_get_member_type(&state->regs[sreg].type,
+					     src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or try another register if any */
+		else if (src->multi_regs && sreg == src->reg1 &&
+			 src->reg1 != src->reg2) {
+			sreg = src->reg2;
+			goto retry;
+		}
+		else {
+			int offset;
+			const char *var_name = NULL;
+
+			/* it might be per-cpu variable (in kernel) access */
+			if (src->offset < 0) {
+				if (get_global_var_info(dloc, (s64)src->offset,
+							&var_name, &offset) &&
+				    !strcmp(var_name, "__per_cpu_offset")) {
+					tsr->kind = TSR_KIND_PERCPU_BASE;
+
+					pr_debug_dtp("mov [%x] percpu base reg%d\n",
+						     insn_offset, dst->reg1);
+				}
+			}
+
+			tsr->ok = false;
+		}
+	}
+	/* Case 3. register to memory transfers */
+	if (!src->mem_ref && dst->mem_ref) {
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok)
+			return;
+
+		/* Check stack variables with offset */
+		if (dst->reg1 == fbreg) {
+			struct type_state_stack *stack;
+			int offset = dst->offset - fboff;
+
+			tsr = &state->regs[src->reg1];
+
+			stack = find_stack_state(state, offset);
+			if (stack) {
+				/*
+				 * The source register is likely to hold a type
+				 * of member if it's a compound type.  Do not
+				 * update the stack variable type since we can
+				 * get the member type later by using the
+				 * die_get_member_type().
+				 */
+				if (!stack->compound)
+					set_stack_state(stack, offset, tsr->kind,
+							&tsr->type);
+			} else {
+				findnew_stack_state(state, offset, tsr->kind,
+						    &tsr->type);
+			}
+
+			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
+				     insn_offset, src->reg1, -offset);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/*
+		 * Ignore other transfers since it'd set a value in a struct
+		 * and won't change the type.
+		 */
+	}
+	/* Case 4. memory to memory transfers (not handled for now) */
+}
+#endif
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index a4c7f98a75e3..7a48c3d72b89 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -39,7 +39,7 @@ do {								\
 		pr_debug3(fmt, ##__VA_ARGS__);			\
 } while (0)
 
-static void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
 {
 	struct strbuf sb;
 	char *str;
@@ -390,7 +390,7 @@ static int check_variable(struct data_loc_info *dloc, Dwarf_Die *var_die,
 	return 0;
 }
 
-static struct type_state_stack *find_stack_state(struct type_state *state,
+struct type_state_stack *find_stack_state(struct type_state *state,
 						 int offset)
 {
 	struct type_state_stack *stack;
@@ -406,7 +406,7 @@ static struct type_state_stack *find_stack_state(struct type_state *state,
 	return NULL;
 }
 
-static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 			    Dwarf_Die *type_die)
 {
 	int tag;
@@ -433,7 +433,7 @@ static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 	}
 }
 
-static struct type_state_stack *findnew_stack_state(struct type_state *state,
+struct type_state_stack *findnew_stack_state(struct type_state *state,
 						    int offset, u8 kind,
 						    Dwarf_Die *type_die)
 {
@@ -537,7 +537,7 @@ void global_var_type__tree_delete(struct rb_root *root)
 	}
 }
 
-static bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
 				const char **var_name, int *var_offset)
 {
 	struct addr_location al;
@@ -611,7 +611,7 @@ static void global_var__collect(struct data_loc_info *dloc)
 	}
 }
 
-static bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
 				u64 ip, u64 var_addr, int *var_offset,
 				Dwarf_Die *type_die)
 {
@@ -722,381 +722,6 @@ static void update_var_state(struct type_state *state, struct data_loc_info *dlo
 	}
 }
 
-static void update_insn_state_x86(struct type_state *state,
-				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
-				  struct disasm_line *dl)
-{
-	struct annotated_insn_loc loc;
-	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
-	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
-	struct type_state_reg *tsr;
-	Dwarf_Die type_die;
-	u32 insn_offset = dl->al.offset;
-	int fbreg = dloc->fbreg;
-	int fboff = 0;
-
-	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
-		return;
-
-	if (ins__is_call(&dl->ins)) {
-		struct symbol *func = dl->ops.target.sym;
-
-		if (func == NULL)
-			return;
-
-		/* __fentry__ will preserve all registers */
-		if (!strcmp(func->name, "__fentry__"))
-			return;
-
-		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
-
-		/* Otherwise invalidate caller-saved registers after call */
-		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
-			if (state->regs[i].caller_saved)
-				state->regs[i].ok = false;
-		}
-
-		/* Update register with the return type (if any) */
-		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
-			tsr = &state->regs[state->ret_reg];
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("call [%x] return -> reg%d",
-				     insn_offset, state->ret_reg);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		return;
-	}
-
-	if (!strncmp(dl->ins.name, "add", 3)) {
-		u64 imm_value = -1ULL;
-		int offset;
-		const char *var_name = NULL;
-		struct map_symbol *ms = dloc->ms;
-		u64 ip = ms->sym->start + dl->al.offset;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-		if (src->imm)
-			imm_value = src->offset;
-		else if (has_reg_type(state, src->reg1) &&
-			 state->regs[src->reg1].kind == TSR_KIND_CONST)
-			imm_value = state->regs[src->reg1].imm_value;
-		else if (src->reg1 == DWARF_REG_PC) {
-			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
-							   src->offset, dl);
-
-			if (get_global_var_info(dloc, var_addr,
-						&var_name, &offset) &&
-			    !strcmp(var_name, "this_cpu_off") &&
-			    tsr->kind == TSR_KIND_CONST) {
-				tsr->kind = TSR_KIND_PERCPU_BASE;
-				imm_value = tsr->imm_value;
-			}
-		}
-		else
-			return;
-
-		if (tsr->kind != TSR_KIND_PERCPU_BASE)
-			return;
-
-		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
-					&type_die) && offset == 0) {
-			/*
-			 * This is not a pointer type, but it should be treated
-			 * as a pointer.
-			 */
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_POINTER;
-			tsr->ok = true;
-
-			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
-				     insn_offset, imm_value, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		return;
-	}
-
-	if (strncmp(dl->ins.name, "mov", 3))
-		return;
-
-	if (dloc->fb_cfa) {
-		u64 ip = dloc->ms->sym->start + dl->al.offset;
-		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
-
-		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
-			fbreg = -1;
-	}
-
-	/* Case 1. register to register or segment:offset to register transfers */
-	if (!src->mem_ref && !dst->mem_ref) {
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-		if (dso__kernel(map__dso(dloc->ms->map)) &&
-		    src->segment == INSN_SEG_X86_GS && src->imm) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr;
-			int offset;
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			var_addr = src->offset;
-
-			if (var_addr == 40) {
-				tsr->kind = TSR_KIND_CANARY;
-				tsr->ok = true;
-
-				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
-					     insn_offset, dst->reg1);
-				return;
-			}
-
-			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
-						 &offset, &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
-				     insn_offset, var_addr, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-			return;
-		}
-
-		if (src->imm) {
-			tsr->kind = TSR_KIND_CONST;
-			tsr->imm_value = src->offset;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
-				     insn_offset, tsr->imm_value, dst->reg1);
-			return;
-		}
-
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok) {
-			tsr->ok = false;
-			return;
-		}
-
-		tsr->type = state->regs[src->reg1].type;
-		tsr->kind = state->regs[src->reg1].kind;
-		tsr->ok = true;
-
-		pr_debug_dtp("mov [%x] reg%d -> reg%d",
-			     insn_offset, src->reg1, dst->reg1);
-		pr_debug_type_name(&tsr->type, tsr->kind);
-	}
-	/* Case 2. memory to register transers */
-	if (src->mem_ref && !dst->mem_ref) {
-		int sreg = src->reg1;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-retry:
-		/* Check stack variables with offset */
-		if (sreg == fbreg) {
-			struct type_state_stack *stack;
-			int offset = src->offset - fboff;
-
-			stack = find_stack_state(state, offset);
-			if (stack == NULL) {
-				tsr->ok = false;
-				return;
-			} else if (!stack->compound) {
-				tsr->type = stack->type;
-				tsr->kind = stack->kind;
-				tsr->ok = true;
-			} else if (die_get_member_type(&stack->type,
-						       offset - stack->offset,
-						       &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-			} else {
-				tsr->ok = false;
-				return;
-			}
-
-			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
-				     insn_offset, -offset, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* And then dereference the pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_TYPE &&
-			 die_deref_ptr_type(&state->regs[sreg].type,
-					    src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or check if it's a global variable */
-		else if (sreg == DWARF_REG_PC) {
-			struct map_symbol *ms = dloc->ms;
-			u64 ip = ms->sym->start + dl->al.offset;
-			u64 addr;
-			int offset;
-
-			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
-
-			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
-						 &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
-				     insn_offset, addr, dst->reg1);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		/* And check percpu access with base register */
-		else if (has_reg_type(state, sreg) &&
-			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr = src->offset;
-			int offset;
-
-			if (src->multi_regs) {
-				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
-
-				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
-				    state->regs[reg2].kind == TSR_KIND_CONST)
-					var_addr += state->regs[reg2].imm_value;
-			}
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			if (get_global_var_type(cu_die, dloc, ip, var_addr,
-						&offset, &type_die) &&
-			    die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-
-				if (src->multi_regs) {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
-						     insn_offset, src->offset, src->reg1,
-						     src->reg2, dst->reg1);
-				} else {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
-						     insn_offset, src->offset, sreg, dst->reg1);
-				}
-				pr_debug_type_name(&tsr->type, tsr->kind);
-			} else {
-				tsr->ok = false;
-			}
-		}
-		/* And then dereference the calculated pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_POINTER &&
-			 die_get_member_type(&state->regs[sreg].type,
-					     src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or try another register if any */
-		else if (src->multi_regs && sreg == src->reg1 &&
-			 src->reg1 != src->reg2) {
-			sreg = src->reg2;
-			goto retry;
-		}
-		else {
-			int offset;
-			const char *var_name = NULL;
-
-			/* it might be per-cpu variable (in kernel) access */
-			if (src->offset < 0) {
-				if (get_global_var_info(dloc, (s64)src->offset,
-							&var_name, &offset) &&
-				    !strcmp(var_name, "__per_cpu_offset")) {
-					tsr->kind = TSR_KIND_PERCPU_BASE;
-
-					pr_debug_dtp("mov [%x] percpu base reg%d\n",
-						     insn_offset, dst->reg1);
-				}
-			}
-
-			tsr->ok = false;
-		}
-	}
-	/* Case 3. register to memory transfers */
-	if (!src->mem_ref && dst->mem_ref) {
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok)
-			return;
-
-		/* Check stack variables with offset */
-		if (dst->reg1 == fbreg) {
-			struct type_state_stack *stack;
-			int offset = dst->offset - fboff;
-
-			tsr = &state->regs[src->reg1];
-
-			stack = find_stack_state(state, offset);
-			if (stack) {
-				/*
-				 * The source register is likely to hold a type
-				 * of member if it's a compound type.  Do not
-				 * update the stack variable type since we can
-				 * get the member type later by using the
-				 * die_get_member_type().
-				 */
-				if (!stack->compound)
-					set_stack_state(stack, offset, tsr->kind,
-							&tsr->type);
-			} else {
-				findnew_stack_state(state, offset, tsr->kind,
-						    &tsr->type);
-			}
-
-			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
-				     insn_offset, src->reg1, -offset);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/*
-		 * Ignore other transfers since it'd set a value in a struct
-		 * and won't change the type.
-		 */
-	}
-	/* Case 4. memory to memory transfers (not handled for now) */
-}
-
 /**
  * update_insn_state - Update type state for an instruction
  * @state: type state table
@@ -1115,8 +740,8 @@ static void update_insn_state_x86(struct type_state *state,
 static void update_insn_state(struct type_state *state, struct data_loc_info *dloc,
 			      Dwarf_Die *cu_die, struct disasm_line *dl)
 {
-	if (arch__is(dloc->arch, "x86"))
-		update_insn_state_x86(state, dloc, cu_die, dl);
+	if (dloc->arch->update_insn_state)
+		dloc->arch->update_insn_state(state, dloc, cu_die, dl);
 }
 
 /*
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index cdb5cd8960bb..6fe8ee8b8410 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -6,6 +6,7 @@
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include "dwarf-regs.h"
 #include "annotate.h"
 
 #ifdef HAVE_DWARF_SUPPORT
@@ -20,6 +21,14 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+#define pr_debug_dtp(fmt, ...)					\
+do {								\
+	if (debug_type_profile)					\
+		pr_info(fmt, ##__VA_ARGS__);			\
+	else							\
+		pr_debug3(fmt, ##__VA_ARGS__);			\
+} while (0)
+
 enum type_state_kind {
 	TSR_KIND_INVALID = 0,
 	TSR_KIND_TYPE,
@@ -216,6 +225,20 @@ void global_var_type__tree_delete(struct rb_root *root);
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
 bool has_reg_type(struct type_state *state, int reg);
+struct type_state_stack *findnew_stack_state(struct type_state *state,
+						int offset, u8 kind,
+						Dwarf_Die *type_die);
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+				Dwarf_Die *type_die);
+struct type_state_stack *find_stack_state(struct type_state *state,
+						int offset);
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+				u64 ip, u64 var_addr, int *var_offset,
+				Dwarf_Die *type_die);
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+				const char **var_name, int *var_offset);
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind);
+
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 72aec8f61b94..d2723ba024bf 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -12,6 +12,7 @@
 #include <subcmd/run-command.h>
 
 #include "annotate.h"
+#include "annotate-data.h"
 #include "build-id.h"
 #include "debug.h"
 #include "disasm.h"
@@ -145,6 +146,9 @@ static struct arch architectures[] = {
 			.memory_ref_char = '(',
 			.imm_char = '$',
 		},
+#ifdef HAVE_DWARF_SUPPORT
+		.update_insn_state = update_insn_state_x86,
+#endif
 	},
 	{
 		.name = "powerpc",
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 3d381a043520..c835759c8e2b 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -4,11 +4,18 @@
 
 #include "map_symbol.h"
 
+#ifdef HAVE_DWARF_SUPPORT
+#include "dwarf-aux.h"
+#endif
+
 struct annotation_options;
 struct disasm_line;
 struct ins;
 struct evsel;
 struct symbol;
+struct data_loc_info;
+struct type_state;
+struct disasm_line;
 
 struct arch {
 	const char	*name;
@@ -32,6 +39,11 @@ struct arch {
 		char memory_ref_char;
 		char imm_char;
 	} objdump;
+#ifdef HAVE_DWARF_SUPPORT
+	void		(*update_insn_state)(struct type_state *state,
+				struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				struct disasm_line *dl);
+#endif
 };
 
 struct ins {
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 01/15] tools/perf: Move the data structures related to register type to header file Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 02/15] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-23 19:06   ` Arnaldo Carvalho de Melo
  2024-07-18  8:43 ` [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined
to be 16. While checking if reg is valid using has_reg_type,
max value is checked using TYPE_STATE_MAX_REGS value. Define
this conditionally for powerpc.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.h | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 6fe8ee8b8410..992b7ce4bd11 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -189,7 +189,11 @@ struct type_state_stack {
 };
 
 /* FIXME: This should be arch-dependent */
+#ifdef __powerpc__
+#define TYPE_STATE_MAX_REGS  32
+#else
 #define TYPE_STATE_MAX_REGS  16
+#endif
 
 /*
  * State table to maintain type info in each register and stack location.
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (2 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-23 19:14   ` Arnaldo Carvalho de Melo
  2024-07-18  8:43 ` [PATCH V8 05/15] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Currently, the perf tool infrastructure disasm_line__parse function to
parse disassembled line.
Example snippet from objdump:
objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4:	lwz     r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. In powerpc, the approach for data type
profiling uses raw instruction instead of result from objdump to identify
the instruction category and extract the source/target registers.
Example: 38 01 81 e8     ld      r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code/
With the change, function captures:
line -> "38 01 81 e8     ld      r4,312(r1)"
raw instruction "38 01 81 e8"
Raw instruction is used later to extract the reg/offset fields. Macros
are added to extract opcode and register fields. "struct disasm_line"
is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
code (raw). Function "disasm_line__parse_powerpc fills the raw
instruction hex value and can use macros to get opcode. There is no
changes in existing code paths, which parses the disassembled code.
The size of raw instruction depends on architecture. In case of powerpc,
the parsing the disasm line needs to handle cases for reading binary code
directly from DSO as well as parsing the objdump result. Hence adding
the logic into separate function instead of updating "disasm_line__parse".
The architecture using the instruction name and present approach is
not altered. Since this approach targets powerpc, the macro
implementation is added for powerpc as of now.
Since the disasm_line__parse is used in other cases (perf annotate) and
not only data tye profiling, the powerpc callback includes changes to
work with binary code as well as mneumonic representation. Also in case
if the DSO read fails and libcapstone is not supported, the approach
fallback to use objdump as option. Hence as option, patch has changes to
ensure objdump option also works well.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/include/linux/string.h                  |  2 +
 tools/lib/string.c                            | 13 +++++
 .../perf/arch/powerpc/annotate/instructions.c |  1 +
 tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 ++++
 tools/perf/util/annotate.h                    |  5 +-
 tools/perf/util/disasm.c                      | 48 ++++++++++++++++++-
 6 files changed, 76 insertions(+), 2 deletions(-)
diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
index db5c99318c79..0acb1fc14e19 100644
--- a/tools/include/linux/string.h
+++ b/tools/include/linux/string.h
@@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
 
 extern char *strim(char *);
 
+extern void remove_spaces(char *s);
+
 extern void *memchr_inv(const void *start, int c, size_t bytes);
 #endif /* _TOOLS_LINUX_STRING_H_ */
diff --git a/tools/lib/string.c b/tools/lib/string.c
index 8b6892f959ab..3126d2cff716 100644
--- a/tools/lib/string.c
+++ b/tools/lib/string.c
@@ -153,6 +153,19 @@ char *strim(char *s)
 	return skip_spaces(s);
 }
 
+/*
+ * remove_spaces - Removes whitespaces from @s
+ */
+void remove_spaces(char *s)
+{
+	char *d = s;
+
+	do {
+		while (*d == ' ')
+			++d;
+	} while ((*s++ = *d++));
+}
+
 /**
  * strreplace - Replace all occurrences of character in string.
  * @s: The string to operate on.
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index a3f423c27cae..d57fd023ef9c 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 		arch->initialized = true;
 		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
 		arch->objdump.comment_char      = '#';
+		annotate_opts.show_asm_raw = true;
 	}
 
 	return 0;
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 0c4f4caf53ac..430623ca5612 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
 			return roff->ptregs_offset;
 	return -EINVAL;
 }
+
+#define PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_RA(a)	(((a) >> 16) & 0x1f)
+#define PPC_RT(t)	(((t) >> 21) & 0x1f)
+#define PPC_RB(b)	(((b) >> 11) & 0x1f)
+#define PPC_D(D)	((D) & 0xfffe)
+#define PPC_DS(DS)	((DS) & 0xfffc)
+#define OP_LD	58
+#define OP_STD	62
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index d5c821c22f79..9ba772f46270 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -113,7 +113,10 @@ struct annotation_line {
 struct disasm_line {
 	struct ins		 ins;
 	struct ins_operands	 ops;
-
+	union {
+		u8 bytes[4];
+		u32 raw_insn;
+	} raw;
 	/* This needs to be at the end. */
 	struct annotation_line	 al;
 };
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index d2723ba024bf..a53591a6111e 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -44,6 +44,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
 
 static void ins__sort(struct arch *arch);
 static int disasm_line__parse(char *line, const char **namep, char **rawp);
+static int disasm_line__parse_powerpc(struct disasm_line *dl);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -845,6 +846,48 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
 	return -1;
 }
 
+/*
+ * Parses the result captured from symbol__disassemble_*
+ * Example, line read from DSO file in powerpc:
+ * line:    38 01 81 e8
+ * opcode: fetched from arch specific get_opcode_insn
+ * rawp_insn: e8810138
+ *
+ * rawp_insn is used later to extract the reg/offset fields
+ */
+#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define	RAW_BYTES	11
+
+static int disasm_line__parse_powerpc(struct disasm_line *dl)
+{
+	char *line = dl->al.line;
+	const char **namep = &dl->ins.name;
+	char **rawp = &dl->ops.raw;
+	char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
+	char *name = skip_spaces(name_raw_insn + RAW_BYTES);
+	int objdump = 0;
+
+	if (strlen(line) > RAW_BYTES)
+		objdump = 1;
+
+	if (name_raw_insn[0] == '\0')
+		return -1;
+
+	if (objdump) {
+		disasm_line__parse(name, namep, rawp);
+	} else
+		*namep = "";
+
+	tmp_raw_insn = strndup(name_raw_insn, 11);
+	remove_spaces(tmp_raw_insn);
+
+	sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn);
+	if (objdump)
+		dl->raw.raw_insn = be32_to_cpu(dl->raw.raw_insn);
+
+	return 0;
+}
+
 static void annotation_line__init(struct annotation_line *al,
 				  struct annotate_args *args,
 				  int nr)
@@ -898,7 +941,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
 		goto out_delete;
 
 	if (args->offset != -1) {
-		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
+		if (arch__is(args->arch, "powerpc")) {
+			if (disasm_line__parse_powerpc(dl) < 0)
+				goto out_free_line;
+		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
 			goto out_free_line;
 
 		disasm_line__init_ins(dl, args->arch, &args->ms);
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 05/15] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (3 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 06/15] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Add support to capture and parse raw instruction in powerpc.
Currently, the perf tool infrastructure uses two ways to disassemble
and understand the instruction. One is objdump and other option is
via libcapstone.
Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
with "objdump" while disassemble. Example from powerpc with this option
for an instruction address is:
Snippet from:
objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4:	lwz     r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov  (%rax), %rcx".
In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, patch adds support to use raw
instruction for powerpc. Approach used is to read the raw instruction
directly from the DSO file using "dso__data_read_offset" utility which
is already implemented in perf infrastructure in "util/dso.c".
Example:
38 01 81 e8     ld      r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
   | 58 |  RT  |  RA |      DS       | |
   -------------------------------------
   0    6     11    16              30 31
Function "symbol__disassemble_dso" is updated to read raw instruction
directly from DSO using dso__data_read_offset utility. In case of
above example, this captures:
line:    38 01 81 e8
The above works well when perf report is invoked with only sort keys for
data type ie type and typeoff. Because there is no instruction level
annotation needed if only data type information is requested for. For
annotating sample, along with type and typeoff sort key, "sym" sort key
is also needed. And by default invoking just "perf report" uses sort key
"sym" that displays the symbol information.
With approach changes in powerpc which first reads DSO for raw
instruction, "perf annotate" and "perf report" + a key breaks since
it doesn't do the instruction level disassembly.
Snippet of result from perf report:
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work  /usr/bin/pmlogger [Percent: local period]
Percent│        ea230010
       │        3a550010
       │        3a600000
       │        38f60001
       │        39490008
       │        42400438
 51.44 │        81290008
       │        7d485378
Here, raw instruction is displayed in the output instead of human
readable annotated form.
One way to get the appropriate data is to specify "--objdump path", by
which code annotation will be done. But the default behaviour will be
changed. To fix this breakage, check if "sym" sort key is set. If so
fallback and use the libcapstone/objdump way of disassmbling the sample.
With the changes and "perf report"
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work  /usr/bin/pmlogger [Percent: local period]
Percent│        ld        r17,16(r3)
       │        addi      r18,r21,16
       │        li        r19,0
       │ 8b0:   rldicl    r10,r10,63,33
       │        addi      r10,r10,1
       │        mtctr     r10
       │      ↓ b         8e4
       │ 8c0:   addi      r7,r22,1
       │        addi      r10,r9,8
       │      ↓ bdz       d00
 51.44 │        lwz       r9,8(r9)
       │        mr        r8,r10
       │        cmpw      r20,r9
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c | 101 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index a53591a6111e..646290b043b2 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -25,6 +25,7 @@
 #include "srcline.h"
 #include "symbol.h"
 #include "util.h"
+#include "sort.h"
 
 static regex_t	 file_lineno;
 
@@ -1634,6 +1635,91 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 }
 #endif
 
+static int symbol__disassemble_raw(char *filename, struct symbol *sym,
+					struct annotate_args *args)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	struct map *map = args->ms.map;
+	struct dso *dso = map__dso(map);
+	u64 start = map__rip_2objdump(map, sym->start);
+	u64 end = map__rip_2objdump(map, sym->end);
+	u64 len = end - start;
+	u64 offset;
+	int i, count;
+	u8 *buf = NULL;
+	char disasm_buf[512];
+	struct disasm_line *dl;
+	u32 *line;
+
+	/* Return if objdump is specified explicitly */
+	if (args->options->objdump_path)
+		return -1;
+
+	pr_debug("Reading raw instruction from : %s using dso__data_read_offset\n", filename);
+
+	buf = malloc(len);
+	if (buf == NULL)
+		goto err;
+
+	count = dso__data_read_offset(dso, NULL, sym->start, buf, len);
+
+	line = (u32 *)buf;
+
+	if ((u64)count != len)
+		goto err;
+
+	/* add the function address and name */
+	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
+		  start, sym->name);
+
+	args->offset = -1;
+	args->line = disasm_buf;
+	args->line_nr = 0;
+	args->fileloc = NULL;
+	args->ms.sym = sym;
+
+	dl = disasm_line__new(args);
+	if (dl == NULL)
+		goto err;
+
+	annotation_line__add(&dl->al, ¬es->src->source);
+
+	/* Each raw instruction is 4 byte */
+	count = len/4;
+
+	for (i = 0, offset = 0; i < count; i++) {
+		args->offset = offset;
+		sprintf(args->line, "%x", line[i]);
+		dl = disasm_line__new(args);
+		if (dl == NULL)
+			goto err;
+
+		annotation_line__add(&dl->al, ¬es->src->source);
+		offset += 4;
+	}
+
+	/* It failed in the middle */
+	if (offset != len) {
+		struct list_head *list = ¬es->src->source;
+
+		/* Discard all lines and fallback to objdump */
+		while (!list_empty(list)) {
+			dl = list_first_entry(list, struct disasm_line, al.node);
+
+			list_del_init(&dl->al.node);
+			disasm_line__free(dl);
+		}
+		count = -1;
+	}
+
+out:
+	free(buf);
+	return count < 0 ? count : 0;
+
+err:
+	count = -1;
+	goto out;
+}
 /*
  * Possibly create a new version of line with tabs expanded. Returns the
  * existing or new line, storage is updated if a new line is allocated. If
@@ -1758,6 +1844,21 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 		strcpy(symfs_filename, tmp);
 	}
 
+	/*
+	 * For powerpc data type profiling, use the dso__data_read_offset
+	 * to read raw instruction directly and interpret the binary code
+	 * to understand instructions and register fields. For sort keys as
+	 * type and typeoff, disassemble to mnemonic notation is
+	 * not required in case of powerpc.
+	 */
+	if (arch__is(args->arch, "powerpc")) {
+		if (sort_order && !strstr(sort_order, "sym")) {
+			err = symbol__disassemble_raw(symfs_filename, sym, args);
+			if (err == 0)
+				goto out_remove_tmp;
+		}
+	}
+
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 	err = symbol__disassemble_capstone(symfs_filename, sym, args);
 	if (err == 0)
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 06/15] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (4 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 05/15] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 07/15] tools/perf: Add parse function for memory instructions in powerpc Athira Rajeev
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset. The implementation addresses
the D-form, X-form, DS-form instructions. Adds "mem_ref" field to check
whether source/target has memory reference. Add function
"get_powerpc_regs" which will set these fields: reg1, reg2, offset
depending of where it is source or target ops.
Update "parse" callback for "struct ins_ops" to also pass "struct
disasm_line" as argument. This is needed in parse functions where opcode
is used to determine whether to set multi_regs and other fields
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/arm64/annotate/instructions.c |  3 +-
 .../arch/loongarch/annotate/instructions.c    |  6 ++-
 tools/perf/arch/powerpc/util/dwarf-regs.c     | 44 +++++++++++++++++++
 tools/perf/arch/s390/annotate/instructions.c  |  5 ++-
 tools/perf/util/annotate.c                    | 19 ++++++--
 tools/perf/util/disasm.c                      | 19 +++++---
 tools/perf/util/disasm.h                      |  5 ++-
 tools/perf/util/include/dwarf-regs.h          | 11 +++++
 8 files changed, 96 insertions(+), 16 deletions(-)
diff --git a/tools/perf/arch/arm64/annotate/instructions.c b/tools/perf/arch/arm64/annotate/instructions.c
index 4af0c3a0f86e..f86d9f4798bd 100644
--- a/tools/perf/arch/arm64/annotate/instructions.c
+++ b/tools/perf/arch/arm64/annotate/instructions.c
@@ -11,7 +11,8 @@ struct arm64_annotate {
 
 static int arm64_mov__parse(struct arch *arch __maybe_unused,
 			    struct ins_operands *ops,
-			    struct map_symbol *ms __maybe_unused)
+			    struct map_symbol *ms __maybe_unused,
+			    struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *endptr;
 
diff --git a/tools/perf/arch/loongarch/annotate/instructions.c b/tools/perf/arch/loongarch/annotate/instructions.c
index 21cc7e4149f7..ab43b1ab51e3 100644
--- a/tools/perf/arch/loongarch/annotate/instructions.c
+++ b/tools/perf/arch/loongarch/annotate/instructions.c
@@ -5,7 +5,8 @@
  * Copyright (C) 2020-2023 Loongson Technology Corporation Limited
  */
 
-static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int loongarch_call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *c, *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -51,7 +52,8 @@ static struct ins_ops loongarch_call_ops = {
 	.scnprintf = call__scnprintf,
 };
 
-static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int loongarch_jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	struct map *map = ms->map;
 	struct symbol *sym = ms->sym;
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 430623ca5612..104c7ae5c433 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -107,3 +107,47 @@ int regs_query_register_offset(const char *name)
 #define PPC_DS(DS)	((DS) & 0xfffc)
 #define OP_LD	58
 #define OP_STD	62
+
+static int get_source_reg(u32 raw_insn)
+{
+	return PPC_RA(raw_insn);
+}
+
+static int get_target_reg(u32 raw_insn)
+{
+	return PPC_RT(raw_insn);
+}
+
+static int get_offset_opcode(u32 raw_insn)
+{
+	int opcode = PPC_OP(raw_insn);
+
+	/* DS- form */
+	if ((opcode == OP_LD) || (opcode == OP_STD))
+		return PPC_DS(raw_insn);
+	else
+		return PPC_D(raw_insn);
+}
+
+/*
+ * Fills the required fields for op_loc depending on if it
+ * is a source or target.
+ * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
+ * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
+ * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
+ */
+void get_powerpc_regs(u32 raw_insn, int is_source,
+		struct annotated_op_loc *op_loc)
+{
+	if (is_source)
+		op_loc->reg1 = get_source_reg(raw_insn);
+	else
+		op_loc->reg1 = get_target_reg(raw_insn);
+
+	if (op_loc->multi_regs)
+		op_loc->reg2 = PPC_RB(raw_insn);
+
+	/* TODO: Implement offset handling for X Form */
+	if ((op_loc->mem_ref) && (PPC_OP(raw_insn) != 31))
+		op_loc->offset = get_offset_opcode(raw_insn);
+}
diff --git a/tools/perf/arch/s390/annotate/instructions.c b/tools/perf/arch/s390/annotate/instructions.c
index da5aa3e1f04c..eeac25cca699 100644
--- a/tools/perf/arch/s390/annotate/instructions.c
+++ b/tools/perf/arch/s390/annotate/instructions.c
@@ -2,7 +2,7 @@
 #include <linux/compiler.h>
 
 static int s390_call__parse(struct arch *arch, struct ins_operands *ops,
-			    struct map_symbol *ms)
+			    struct map_symbol *ms, struct disasm_line *dl __maybe_unused)
 {
 	char *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -52,7 +52,8 @@ static struct ins_ops s390_call_ops = {
 
 static int s390_mov__parse(struct arch *arch __maybe_unused,
 			   struct ins_operands *ops,
-			   struct map_symbol *ms __maybe_unused)
+			   struct map_symbol *ms __maybe_unused,
+			   struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *endptr;
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1451caf25e77..ce99db291c5e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2123,20 +2123,33 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 	for_each_insn_op_loc(loc, i, op_loc) {
 		const char *insn_str = ops->source.raw;
 		bool multi_regs = ops->source.multi_regs;
+		bool mem_ref = ops->source.mem_ref;
 
 		if (i == INSN_OP_TARGET) {
 			insn_str = ops->target.raw;
 			multi_regs = ops->target.multi_regs;
+			mem_ref = ops->target.mem_ref;
 		}
 
 		/* Invalidate the register by default */
 		op_loc->reg1 = -1;
 		op_loc->reg2 = -1;
 
-		if (insn_str == NULL)
-			continue;
+		if (insn_str == NULL) {
+			if (!arch__is(arch, "powerpc"))
+				continue;
+		}
 
-		if (strchr(insn_str, arch->objdump.memory_ref_char)) {
+		/*
+		 * For powerpc, call get_powerpc_regs function which extracts the
+		 * required fields for op_loc, ie reg1, reg2, offset from the
+		 * raw instruction.
+		 */
+		if (arch__is(arch, "powerpc")) {
+			op_loc->mem_ref = mem_ref;
+			op_loc->multi_regs = multi_regs;
+			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
+		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
 			op_loc->mem_ref = true;
 			op_loc->multi_regs = multi_regs;
 			extract_reg_offset(arch, insn_str, op_loc);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 646290b043b2..8e45f0874e03 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -256,7 +256,8 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2)
 	return arch->ins_is_fused(arch, ins1, ins2);
 }
 
-static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *endptr, *tok, *name;
 	struct map *map = ms->map;
@@ -351,7 +352,8 @@ static inline const char *validate_comma(const char *c, struct ins_operands *ops
 	return c;
 }
 
-static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int jump__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	struct map *map = ms->map;
 	struct symbol *sym = ms->sym;
@@ -510,7 +512,8 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
 	return 0;
 }
 
-static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms)
+static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+		struct disasm_line *dl __maybe_unused)
 {
 	ops->locked.ops = zalloc(sizeof(*ops->locked.ops));
 	if (ops->locked.ops == NULL)
@@ -525,7 +528,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 		goto out_free_ops;
 
 	if (ops->locked.ins.ops->parse &&
-	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms) < 0)
+	    ops->locked.ins.ops->parse(arch, ops->locked.ops, ms, NULL) < 0)
 		goto out_free_ops;
 
 	return 0;
@@ -596,7 +599,8 @@ static bool check_multi_regs(struct arch *arch, const char *op)
 	return count > 1;
 }
 
-static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
+static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *comment, prev;
 
@@ -674,7 +678,8 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
-static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
+static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
+		struct disasm_line *dl __maybe_unused)
 {
 	char *target, *comment, *s, prev;
 
@@ -815,7 +820,7 @@ static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, str
 	if (!dl->ins.ops)
 		return;
 
-	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms) < 0)
+	if (dl->ins.ops->parse && dl->ins.ops->parse(arch, &dl->ops, ms, dl) < 0)
 		dl->ins.ops = NULL;
 }
 
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index c835759c8e2b..30be0a94ea04 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -62,6 +62,7 @@ struct ins_operands {
 		bool	offset_avail;
 		bool	outside;
 		bool	multi_regs;
+		bool	mem_ref;
 	} target;
 	union {
 		struct {
@@ -69,6 +70,7 @@ struct ins_operands {
 			char	*name;
 			u64	addr;
 			bool	multi_regs;
+			bool	mem_ref;
 		} source;
 		struct {
 			struct ins	    ins;
@@ -83,7 +85,8 @@ struct ins_operands {
 
 struct ins_ops {
 	void (*free)(struct ins_operands *ops);
-	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms);
+	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map_symbol *ms,
+			struct disasm_line *dl);
 	int (*scnprintf)(struct ins *ins, char *bf, size_t size,
 			 struct ins_operands *ops, int max_ins_name);
 };
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 01fb25a1150a..75b28dcc8317 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _PERF_DWARF_REGS_H_
 #define _PERF_DWARF_REGS_H_
+#include "annotate.h"
 
 #define DWARF_REG_PC  0xd3af9c /* random number */
 #define DWARF_REG_FB  0xd3affb /* random number */
@@ -31,6 +32,16 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
 }
 #endif
 
+#if !defined(__powerpc__) || !defined(HAVE_DWARF_SUPPORT)
+static inline void get_powerpc_regs(u32 raw_insn __maybe_unused, int is_source __maybe_unused,
+		struct annotated_op_loc *op_loc __maybe_unused)
+{
+	return;
+}
+#else
+void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc *op_loc);
+#endif
+
 #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
 /*
  * Arch should support fetching the offset of a register in pt_regs
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 07/15] tools/perf: Add parse function for memory instructions in powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (5 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 06/15] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 08/15] tools/perf: Add support to identify memory instructions of opcode 31 " Athira Rajeev
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset. The implementation addresses
the D-form, X-form, DS-form instructions. Two main functions are added.
New parse function "load_store__parse" as instruction ops parser for
memory instructions. Unlink other parser (like mov__parse), this parser
fills in the "multi_regs" field for source/target and new added "mem_ref"
field. No other fields are set because, here there is no need to parse the
disassembled code and arch specific macros will take care of extracting
offset and regs which is easier and will be precise.
In powerpc, all instructions with a primary opcode from 32 to 63
are memory instructions. Update "ins__find" function to have "raw_insn"
also as a parameter.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 16 ++++++
 tools/perf/util/disasm.c                      | 54 +++++++++++++++++--
 tools/perf/util/disasm.h                      |  2 +-
 3 files changed, 66 insertions(+), 6 deletions(-)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index d57fd023ef9c..b084423d8477 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -49,6 +49,22 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 	return ops;
 }
 
+#define PPC_OP(op)      (((op) >> 26) & 0x3F)
+
+static struct ins_ops *check_ppc_insn(u32 raw_insn)
+{
+	int opcode = PPC_OP(raw_insn);
+
+	/*
+	 * Instructions with opcode 32 to 63 are memory
+	 * instructions in powerpc
+	 */
+	if ((opcode & 0x20))
+		return &load_store_ops;
+
+	return NULL;
+}
+
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 8e45f0874e03..b30277a930c0 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -37,6 +37,7 @@ static struct ins_ops mov_ops;
 static struct ins_ops nop_ops;
 static struct ins_ops lock_ops;
 static struct ins_ops ret_ops;
+static struct ins_ops load_store_ops;
 
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
@@ -522,7 +523,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
 		goto out_free_ops;
 
-	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
+	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name, 0);
 
 	if (ops->locked.ins.ops == NULL)
 		goto out_free_ops;
@@ -678,6 +679,37 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
+		struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			ops->raw);
+}
+
+/*
+ * Sets the fields: multi_regs and "mem_ref".
+ * "mem_ref" is set for ops->source which is later used to
+ * fill the objdump->memory_ref-char field. This ops is currently
+ * used by powerpc and since binary instruction code is used to
+ * extract opcode, regs and offset, no other parsing is needed here
+ */
+static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
+		struct map_symbol *ms __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	ops->source.mem_ref = true;
+	ops->source.multi_regs = false;
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static struct ins_ops load_store_ops = {
+	.parse     = load_store__parse,
+	.scnprintf = load_store__scnprintf,
+};
+
 static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused,
 		struct disasm_line *dl __maybe_unused)
 {
@@ -769,11 +801,23 @@ static void ins__sort(struct arch *arch)
 	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins_ops *__ins__find(struct arch *arch, const char *name)
+static struct ins_ops *__ins__find(struct arch *arch, const char *name, u32 raw_insn)
 {
 	struct ins *ins;
 	const int nmemb = arch->nr_instructions;
 
+	if (arch__is(arch, "powerpc")) {
+		/*
+		 * For powerpc, identify the instruction ops
+		 * from the opcode using raw_insn.
+		 */
+		struct ins_ops *ops;
+
+		ops = check_ppc_insn(raw_insn);
+		if (ops)
+			return ops;
+	}
+
 	if (!arch->sorted_instructions) {
 		ins__sort(arch);
 		arch->sorted_instructions = true;
@@ -803,9 +847,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name)
 	return ins ? ins->ops : NULL;
 }
 
-struct ins_ops *ins__find(struct arch *arch, const char *name)
+struct ins_ops *ins__find(struct arch *arch, const char *name, u32 raw_insn)
 {
-	struct ins_ops *ops = __ins__find(arch, name);
+	struct ins_ops *ops = __ins__find(arch, name, raw_insn);
 
 	if (!ops && arch->associate_instruction_ops)
 		ops = arch->associate_instruction_ops(arch, name);
@@ -815,7 +859,7 @@ struct ins_ops *ins__find(struct arch *arch, const char *name)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
 {
-	dl->ins.ops = ins__find(arch, dl->ins.name);
+	dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
 
 	if (!dl->ins.ops)
 		return;
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 30be0a94ea04..c1bb1e484bfb 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -105,7 +105,7 @@ struct annotate_args {
 struct arch *arch__find(const char *name);
 bool arch__is(struct arch *arch, const char *name);
 
-struct ins_ops *ins__find(struct arch *arch, const char *name);
+struct ins_ops *ins__find(struct arch *arch, const char *name, u32 raw_insn);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 		   struct ins_operands *ops, int max_ins_name);
 
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 08/15] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (6 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 07/15] tools/perf: Add parse function for memory instructions in powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 09/15] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
There are memory instructions in powerpc with opcode as 31.
Example: "ldx RT,RA,RB" , Its X form is as below:
  ______________________________________
  | 31 |  RT  |  RA |  RB |   21     |/|
  --------------------------------------
  0    6     11    16    21         30 31
The opcode for "ldx" is 31. There are other instructions also with
opcode 31 which are memory insn like ldux, stbx, lwzx, lhaux
But all instructions with opcode 31 are not memory. Example is add
instruction: "add RT,RA,RB"
The value in bit 21-30 [ 21 for ldx ] is different for these
instructions. Patch uses this value to assign instruction ops for these
cases. The naming convention and value to identify these are picked from
defines in "arch/powerpc/include/asm/ppc-opcode.h"
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 107 +++++++++++++++++-
 tools/perf/util/disasm.c                      |   3 +
 2 files changed, 108 insertions(+), 2 deletions(-)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index b084423d8477..1ffb64c6bd0d 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -49,18 +49,121 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 	return ops;
 }
 
-#define PPC_OP(op)      (((op) >> 26) & 0x3F)
+#define PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_21_30(R)	(((R) >> 1) & 0x3ff)
+
+struct insn_offset {
+	const char	*name;
+	int		value;
+};
+
+/*
+ * There are memory instructions with opcode 31 which are
+ * of X Form, Example:
+ * ldx RT,RA,RB
+ * ______________________________________
+ * | 31 |  RT  |  RA |  RB |   21     |/|
+ * --------------------------------------
+ * 0    6     11    16    21         30 31
+ *
+ * But all instructions with opcode 31 are not memory.
+ * Example: add RT,RA,RB
+ *
+ * Use bits 21 to 30 to check memory insns with 31 as opcode.
+ * In ins_array below, for ldx instruction:
+ * name => OP_31_XOP_LDX
+ * value => 21
+ */
+
+static struct insn_offset ins_array[] = {
+	{ .name = "OP_31_XOP_LXSIWZX",  .value = 12, },
+	{ .name = "OP_31_XOP_LWARX",	.value = 20, },
+	{ .name = "OP_31_XOP_LDX",	.value = 21, },
+	{ .name = "OP_31_XOP_LWZX",	.value = 23, },
+	{ .name = "OP_31_XOP_LDUX",	.value = 53, },
+	{ .name = "OP_31_XOP_LWZUX",	.value = 55, },
+	{ .name = "OP_31_XOP_LXSIWAX",  .value = 76, },
+	{ .name = "OP_31_XOP_LDARX",    .value = 84, },
+	{ .name = "OP_31_XOP_LBZX",	.value = 87, },
+	{ .name = "OP_31_XOP_LVX",      .value = 103, },
+	{ .name = "OP_31_XOP_LBZUX",    .value = 119, },
+	{ .name = "OP_31_XOP_STXSIWX",  .value = 140, },
+	{ .name = "OP_31_XOP_STDX",	.value = 149, },
+	{ .name = "OP_31_XOP_STWX",	.value = 151, },
+	{ .name = "OP_31_XOP_STDUX",	.value = 181, },
+	{ .name = "OP_31_XOP_STWUX",	.value = 183, },
+	{ .name = "OP_31_XOP_STBX",	.value = 215, },
+	{ .name = "OP_31_XOP_STVX",     .value = 231, },
+	{ .name = "OP_31_XOP_STBUX",	.value = 247, },
+	{ .name = "OP_31_XOP_LHZX",	.value = 279, },
+	{ .name = "OP_31_XOP_LHZUX",	.value = 311, },
+	{ .name = "OP_31_XOP_LXVDSX",   .value = 332, },
+	{ .name = "OP_31_XOP_LWAX",	.value = 341, },
+	{ .name = "OP_31_XOP_LHAX",	.value = 343, },
+	{ .name = "OP_31_XOP_LWAUX",	.value = 373, },
+	{ .name = "OP_31_XOP_LHAUX",	.value = 375, },
+	{ .name = "OP_31_XOP_STHX",	.value = 407, },
+	{ .name = "OP_31_XOP_STHUX",	.value = 439, },
+	{ .name = "OP_31_XOP_LXSSPX",   .value = 524, },
+	{ .name = "OP_31_XOP_LDBRX",	.value = 532, },
+	{ .name = "OP_31_XOP_LSWX",	.value = 533, },
+	{ .name = "OP_31_XOP_LWBRX",	.value = 534, },
+	{ .name = "OP_31_XOP_LFSUX",    .value = 567, },
+	{ .name = "OP_31_XOP_LXSDX",    .value = 588, },
+	{ .name = "OP_31_XOP_LSWI",	.value = 597, },
+	{ .name = "OP_31_XOP_LFDX",     .value = 599, },
+	{ .name = "OP_31_XOP_LFDUX",    .value = 631, },
+	{ .name = "OP_31_XOP_STXSSPX",  .value = 652, },
+	{ .name = "OP_31_XOP_STDBRX",	.value = 660, },
+	{ .name = "OP_31_XOP_STXWX",	.value = 661, },
+	{ .name = "OP_31_XOP_STWBRX",	.value = 662, },
+	{ .name = "OP_31_XOP_STFSX",	.value = 663, },
+	{ .name = "OP_31_XOP_STFSUX",	.value = 695, },
+	{ .name = "OP_31_XOP_STXSDX",   .value = 716, },
+	{ .name = "OP_31_XOP_STSWI",	.value = 725, },
+	{ .name = "OP_31_XOP_STFDX",	.value = 727, },
+	{ .name = "OP_31_XOP_STFDUX",	.value = 759, },
+	{ .name = "OP_31_XOP_LXVW4X",   .value = 780, },
+	{ .name = "OP_31_XOP_LHBRX",	.value = 790, },
+	{ .name = "OP_31_XOP_LXVD2X",   .value = 844, },
+	{ .name = "OP_31_XOP_LFIWAX",	.value = 855, },
+	{ .name = "OP_31_XOP_LFIWZX",	.value = 887, },
+	{ .name = "OP_31_XOP_STXVW4X",  .value = 908, },
+	{ .name = "OP_31_XOP_STHBRX",	.value = 918, },
+	{ .name = "OP_31_XOP_STXVD2X",  .value = 972, },
+	{ .name = "OP_31_XOP_STFIWX",	.value = 983, },
+};
+
+static int cmp_offset(const void *a, const void *b)
+{
+	const struct insn_offset *val1 = a;
+	const struct insn_offset *val2 = b;
+
+	return (val1->value - val2->value);
+}
 
 static struct ins_ops *check_ppc_insn(u32 raw_insn)
 {
 	int opcode = PPC_OP(raw_insn);
+	int mem_insn_31 = PPC_21_30(raw_insn);
+	struct insn_offset *ret;
+	struct insn_offset mem_insns_31_opcode = {
+		"OP_31_INSN",
+		mem_insn_31
+	};
 
 	/*
 	 * Instructions with opcode 32 to 63 are memory
 	 * instructions in powerpc
 	 */
-	if ((opcode & 0x20))
+	if ((opcode & 0x20)) {
 		return &load_store_ops;
+	} else if (opcode == 31) {
+		/* Check for memory instructions with opcode 31 */
+		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
+		if (ret != NULL)
+			return &load_store_ops;
+	}
 
 	return NULL;
 }
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index b30277a930c0..d39ff19ea081 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -698,6 +698,9 @@ static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operan
 {
 	ops->source.mem_ref = true;
 	ops->source.multi_regs = false;
+	/* opcode 31 is of X form */
+	if (PPC_OP(dl->raw.raw_insn) == 31)
+		ops->source.multi_regs = true;
 
 	ops->target.mem_ref = false;
 	ops->target.multi_regs = false;
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 09/15] tools/perf: Add some of the arithmetic instructions to support instruction tracking in powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (7 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 08/15] tools/perf: Add support to identify memory instructions of opcode 31 " Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 10/15] tools/perf: Add more instructions for instruction tracking Athira Rajeev
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Data type profiling has concept of instruction tracking.
Example sequence in powerpc:
	ld      r10,264(r3)
	mr      r31,r3
	<<after some sequence>
	ld      r9,312(r31)
or differently
	lwz	r10,264(r3)
	add	r31, r3, RB
	lwz	r9, 0(r31)
If a sample is hit at "lwz r9, 0(r31)", data type of r31 depends
on previous instruction sequence here. So to track the previous
instructions, patch adds changes to identify some of the arithmetic
instructions which are having opcode as 31. Since memory instructions
also has cases with opcode 31, use the bits 22:30 to filter the
arithmetic instructions here. Also there are instructions with just
two operands like addme, addze. Patch adds new instructions ops
"arithmetic_ops" to handle this
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 49 ++++++++++++++++++
 tools/perf/util/disasm.c                      | 51 +++++++++++++++++++
 2 files changed, 100 insertions(+)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index 1ffb64c6bd0d..aa5ee09fa28f 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -51,6 +51,7 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 
 #define PPC_OP(op)	(((op) >> 26) & 0x3F)
 #define PPC_21_30(R)	(((R) >> 1) & 0x3ff)
+#define PPC_22_30(R)	(((R) >> 1) & 0x1ff)
 
 struct insn_offset {
 	const char	*name;
@@ -134,6 +135,44 @@ static struct insn_offset ins_array[] = {
 	{ .name = "OP_31_XOP_STFIWX",	.value = 983, },
 };
 
+/*
+ * Arithmetic instructions which are having opcode as 31.
+ * These instructions are tracked to save the register state
+ * changes. Example:
+ *
+ * lwz	r10,264(r3)
+ * add	r31, r3, r3
+ * lwz	r9, 0(r31)
+ *
+ * Here instruction tracking needs to identify the "add"
+ * instruction and save data type of r3 to r31. If a sample
+ * is hit at next "lwz r9, 0(r31)", by this instruction tracking,
+ * data type of r31 can be resolved.
+ */
+static struct insn_offset arithmetic_ins_op_31[] = {
+	{ .name = "SUB_CARRY_XO_FORM",  .value = 8, },
+	{ .name = "MUL_HDW_XO_FORM1",   .value = 9, },
+	{ .name = "ADD_CARRY_XO_FORM",  .value = 10, },
+	{ .name = "MUL_HW_XO_FORM1",    .value = 11, },
+	{ .name = "SUB_XO_FORM",        .value = 40, },
+	{ .name = "MUL_HDW_XO_FORM",    .value = 73, },
+	{ .name = "MUL_HW_XO_FORM",     .value = 75, },
+	{ .name = "SUB_EXT_XO_FORM",    .value = 136, },
+	{ .name = "ADD_EXT_XO_FORM",    .value = 138, },
+	{ .name = "SUB_ZERO_EXT_XO_FORM",       .value = 200, },
+	{ .name = "ADD_ZERO_EXT_XO_FORM",       .value = 202, },
+	{ .name = "SUB_EXT_XO_FORM2",   .value = 232, },
+	{ .name = "MUL_DW_XO_FORM",     .value = 233, },
+	{ .name = "ADD_EXT_XO_FORM2",   .value = 234, },
+	{ .name = "MUL_W_XO_FORM",      .value = 235, },
+	{ .name = "ADD_XO_FORM",	.value = 266, },
+	{ .name = "DIV_DW_XO_FORM1",    .value = 457, },
+	{ .name = "DIV_W_XO_FORM1",     .value = 459, },
+	{ .name = "DIV_DW_XO_FORM",	.value = 489, },
+	{ .name = "DIV_W_XO_FORM",	.value = 491, },
+};
+
+
 static int cmp_offset(const void *a, const void *b)
 {
 	const struct insn_offset *val1 = a;
@@ -163,6 +202,16 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
 		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
 		if (ret != NULL)
 			return &load_store_ops;
+		else {
+			mem_insns_31_opcode.value = PPC_22_30(raw_insn);
+			ret = bsearch(&mem_insns_31_opcode, arithmetic_ins_op_31, ARRAY_SIZE(arithmetic_ins_op_31),
+					sizeof(arithmetic_ins_op_31[0]), cmp_offset);
+			if (ret != NULL)
+				return &arithmetic_ops;
+			/* Bits 21 to 30 has value 444 for "mr" insn ie, OR X form */
+			if (PPC_21_30(raw_insn) == 444)
+				return &arithmetic_ops;
+		}
 	}
 
 	return NULL;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index d39ff19ea081..801d57287a35 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -38,6 +38,7 @@ static struct ins_ops nop_ops;
 static struct ins_ops lock_ops;
 static struct ins_ops ret_ops;
 static struct ins_ops load_store_ops;
+static struct ins_ops arithmetic_ops;
 
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
@@ -679,6 +680,56 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+#define PPC_22_30(R)    (((R) >> 1) & 0x1ff)
+#define MINUS_EXT_XO_FORM	234
+#define SUB_EXT_XO_FORM		232
+#define	ADD_ZERO_EXT_XO_FORM	202
+#define	SUB_ZERO_EXT_XO_FORM	200
+
+static int arithmetic__scnprintf(struct ins *ins, char *bf, size_t size,
+		struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			ops->raw);
+}
+
+/*
+ * Sets the fields: multi_regs and "mem_ref".
+ * "mem_ref" is set for ops->source which is later used to
+ * fill the objdump->memory_ref-char field. This ops is currently
+ * used by powerpc and since binary instruction code is used to
+ * extract opcode, regs and offset, no other parsing is needed here.
+ *
+ * Dont set multi regs for 4 cases since it has only one operand
+ * for source:
+ * - Add to Minus One Extended XO-form ( Ex: addme, addmeo )
+ * - Subtract From Minus One Extended XO-form ( Ex: subfme )
+ * - Add to Zero Extended XO-form ( Ex: addze, addzeo )
+ * - Subtract From Zero Extended XO-form ( Ex: subfze )
+ */
+static int arithmetic__parse(struct arch *arch __maybe_unused, struct ins_operands *ops,
+		struct map_symbol *ms __maybe_unused, struct disasm_line *dl)
+{
+	int opcode = PPC_OP(dl->raw.raw_insn);
+
+	ops->source.mem_ref = false;
+	if (opcode == 31) {
+		if ((opcode != MINUS_EXT_XO_FORM) && (opcode != SUB_EXT_XO_FORM) \
+				&& (opcode != ADD_ZERO_EXT_XO_FORM) && (opcode != SUB_ZERO_EXT_XO_FORM))
+			ops->source.multi_regs = true;
+	}
+
+	ops->target.mem_ref = false;
+	ops->target.multi_regs = false;
+
+	return 0;
+}
+
+static struct ins_ops arithmetic_ops = {
+	.parse     = arithmetic__parse,
+	.scnprintf = arithmetic__scnprintf,
+};
+
 static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
 		struct ins_operands *ops, int max_ins_name)
 {
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 10/15] tools/perf: Add more instructions for instruction tracking
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (8 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 09/15] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 11/15] tools/perf: Update instruction tracking for powerpc Athira Rajeev
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Add few more instructions and use opcode as search key
to find if it is supported by the architecture. Added ones
are: addi, addic, addic., addis, subfic and mulli
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/annotate/instructions.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index aa5ee09fa28f..aa25a336d8d0 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -172,6 +172,14 @@ static struct insn_offset arithmetic_ins_op_31[] = {
 	{ .name = "DIV_W_XO_FORM",	.value = 491, },
 };
 
+static struct insn_offset arithmetic_two_ops[] = {
+	{ .name = "mulli",      .value = 7, },
+	{ .name = "subfic",     .value = 8, },
+	{ .name = "addic",      .value = 12, },
+	{ .name = "addic.",     .value = 13, },
+	{ .name = "addi",       .value = 14, },
+	{ .name = "addis",      .value = 15, },
+};
 
 static int cmp_offset(const void *a, const void *b)
 {
@@ -212,6 +220,12 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
 			if (PPC_21_30(raw_insn) == 444)
 				return &arithmetic_ops;
 		}
+	} else {
+		mem_insns_31_opcode.value = opcode;
+		ret = bsearch(&mem_insns_31_opcode, arithmetic_two_ops, ARRAY_SIZE(arithmetic_two_ops),
+				sizeof(arithmetic_two_ops[0]), cmp_offset);
+		if (ret != NULL)
+			return &arithmetic_ops;
 	}
 
 	return NULL;
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 11/15] tools/perf: Update instruction tracking for powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (9 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 10/15] tools/perf: Add more instructions for instruction tracking Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 12/15] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:
ld      r10,264(r3)
mr      r31,r3
<<after some sequence>
ld      r9,312(r31)
Consider ithe sample is pointing to: "ld r9,312(r31)".
Here the memory reference is hit at "312(r31)" where 312 is the offset
and r31 is the source register. Previous instruction sequence shows that
register state of r3 is moved to r31. So to identify the data type for r31
access, the previous instruction ("mr") needs to be tracked and the
state type entry has to be updated. Current instruction tracking support
in perf tools infrastructure is specific to x86. Patch adds this support
for powerpc as well.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 59 +++++++++++++++++++
 tools/perf/util/annotate-data.c               |  9 ++-
 tools/perf/util/disasm.c                      |  3 +
 3 files changed, 70 insertions(+), 1 deletion(-)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index aa25a336d8d0..af1032572bf3 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -231,6 +231,65 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
 	return NULL;
 }
 
+/*
+ * Instruction tracking function to track register state moves.
+ * Example sequence:
+ *    ld      r10,264(r3)
+ *    mr      r31,r3
+ *    <<after some sequence>
+ *    ld      r9,312(r31)
+ *
+ * Previous instruction sequence shows that register state of r3
+ * is moved to r31. update_insn_state_powerpc tracks these state
+ * changes
+ */
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_powerpc(struct type_state *state,
+		struct data_loc_info *dloc, Dwarf_Die * cu_die __maybe_unused,
+		struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	/*
+	 * Value 444 for bits 21:30 is for "mr"
+	 * instruction. "mr" is extended OR. So set the
+	 * source and destination reg correctly
+	 */
+	if (PPC_21_30(dl->raw.raw_insn) == 444) {
+		int src_reg = src->reg1;
+
+		src->reg1 = dst->reg1;
+		dst->reg1 = src_reg;
+	}
+
+	if (!has_reg_type(state, dst->reg1))
+		return;
+
+	tsr = &state->regs[dst->reg1];
+
+	if (!has_reg_type(state, src->reg1) ||
+			!state->regs[src->reg1].ok) {
+		tsr->ok = false;
+		return;
+	}
+
+	tsr->type = state->regs[src->reg1].type;
+	tsr->kind = state->regs[src->reg1].kind;
+	tsr->ok = true;
+
+	pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			insn_offset, src->reg1, dst->reg1);
+	pr_debug_type_name(&tsr->type, tsr->kind);
+}
+#endif /* HAVE_DWARF_SUPPORT */
+
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 7a48c3d72b89..734acdd8c4b7 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1080,6 +1080,13 @@ static int find_data_type_insn(struct data_loc_info *dloc,
 	return ret;
 }
 
+static int arch_supports_insn_tracking(struct data_loc_info *dloc)
+{
+	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")))
+		return 1;
+	return 0;
+}
+
 /*
  * Construct a list of basic blocks for each scope with variables and try to find
  * the data type by updating a type state table through instructions.
@@ -1094,7 +1101,7 @@ static int find_data_type_block(struct data_loc_info *dloc,
 	int ret = -1;
 
 	/* TODO: other architecture support */
-	if (!arch__is(dloc->arch, "x86"))
+	if (!arch_supports_insn_tracking(dloc))
 		return -1;
 
 	prev_dst_ip = dst_ip = dloc->ip;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 801d57287a35..a839f037bdaf 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -157,6 +157,9 @@ static struct arch architectures[] = {
 	{
 		.name = "powerpc",
 		.init = powerpc__annotate_init,
+#ifdef HAVE_DWARF_SUPPORT
+		.update_insn_state = update_insn_state_powerpc,
+#endif
 	},
 	{
 		.name = "riscv64",
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 12/15] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (10 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 11/15] tools/perf: Update instruction tracking for powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 13/15] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
symbol__disassemble_capstone in util/disasm.c calls function
open_capstone_handle to open/init the capstone. We already have a
capstone_init function in "util/print_insn.c". But capstone_init
is defined as a static function in util/print_insn.c. Change this and
also add the function in print_insn.h
The open_capstone_handle checks the disassembler_style option from
annotation_options to decide whether to set CS_OPT_SYNTAX_ATT.
Add that logic in capstone_init also and by default set it to true.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/print_insn.c | 12 +++++++++---
 tools/perf/util/print_insn.h |  5 +++++
 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index a950e9157d2d..a76aae81d7a0 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -32,7 +32,7 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 #include <capstone/capstone.h>
 
-static int capstone_init(struct machine *machine, csh *cs_handle, bool is64)
+int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disassembler_style)
 {
 	cs_arch arch;
 	cs_mode mode;
@@ -62,7 +62,13 @@ static int capstone_init(struct machine *machine, csh *cs_handle, bool is64)
 	}
 
 	if (machine__normalized_is(machine, "x86")) {
-		cs_option(*cs_handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
+		/*
+		 * In case of using capstone_init while symbol__disassemble
+		 * setting CS_OPT_SYNTAX_ATT depends if disassembler_style opts
+		 * is set via annotation args
+		 */
+		if (disassembler_style)
+			cs_option(*cs_handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
 		/*
 		 * Resolving address operands to symbols is implemented
 		 * on x86 by investigating instruction details.
@@ -122,7 +128,7 @@ ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpum
 	int ret;
 
 	/* TODO: Try to initiate capstone only once but need a proper place. */
-	ret = capstone_init(machine, &cs_handle, is64bit);
+	ret = capstone_init(machine, &cs_handle, is64bit, true);
 	if (ret < 0)
 		return ret;
 
diff --git a/tools/perf/util/print_insn.h b/tools/perf/util/print_insn.h
index 07d11af3fc1c..2c8ee41c4a5d 100644
--- a/tools/perf/util/print_insn.h
+++ b/tools/perf/util/print_insn.h
@@ -19,4 +19,9 @@ ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpum
 			 bool is64bit, const uint8_t *code, size_t code_size,
 			 uint64_t ip, int *lenp, int print_opts, FILE *fp);
 
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+#include <capstone/capstone.h>
+int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disassembler_style);
+#endif
+
 #endif /* PERF_PRINT_INSN_H */
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 13/15] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (11 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 12/15] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-18  8:43 ` [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
capstone_init is made availbale for all archs to use and updated to
enable support for CS_ARCH_PPC as well. Patch removes
open_capstone_handle and uses capstone_init in all the places.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c     | 42 +++++++++++-------------------------
 tools/perf/util/print_insn.c |  3 +++
 2 files changed, 15 insertions(+), 30 deletions(-)
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index a839f037bdaf..a848e6f5f05a 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -26,6 +26,7 @@
 #include "symbol.h"
 #include "util.h"
 #include "sort.h"
+#include "print_insn.h"
 
 static regex_t	 file_lineno;
 
@@ -1510,32 +1511,6 @@ symbol__disassemble_bpf_image(struct symbol *sym,
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
 #include <capstone/capstone.h>
 
-static int open_capstone_handle(struct annotate_args *args, bool is_64bit,
-				csh *handle)
-{
-	struct annotation_options *opt = args->options;
-	cs_mode mode = is_64bit ? CS_MODE_64 : CS_MODE_32;
-
-	/* TODO: support more architectures */
-	if (!arch__is(args->arch, "x86"))
-		return -1;
-
-	if (cs_open(CS_ARCH_X86, mode, handle) != CS_ERR_OK)
-		return -1;
-
-	if (!opt->disassembler_style ||
-	    !strcmp(opt->disassembler_style, "att"))
-		cs_option(*handle, CS_OPT_SYNTAX, CS_OPT_SYNTAX_ATT);
-
-	/*
-	 * Resolving address operands to symbols is implemented
-	 * on x86 by investigating instruction details.
-	 */
-	cs_option(*handle, CS_OPT_DETAIL, CS_OPT_ON);
-
-	return 0;
-}
-
 struct find_file_offset_data {
 	u64 ip;
 	u64 offset;
@@ -1632,6 +1607,7 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 	cs_insn *insn;
 	char disasm_buf[512];
 	struct disasm_line *dl;
+	bool disassembler_style = false;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -1646,7 +1622,11 @@ static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 			    &is_64bit) == 0)
 		goto err;
 
-	if (open_capstone_handle(args, is_64bit, &handle) < 0)
+	if (!args->options->disassembler_style ||
+			!strcmp(args->options->disassembler_style, "att"))
+		disassembler_style = true;
+
+	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
@@ -1966,9 +1946,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 	}
 
 #ifdef HAVE_LIBCAPSTONE_SUPPORT
-	err = symbol__disassemble_capstone(symfs_filename, sym, args);
-	if (err == 0)
-		goto out_remove_tmp;
+	if (arch__is(args->arch, "x86")) {
+		err = symbol__disassemble_capstone(symfs_filename, sym, args);
+		if (err == 0)
+			goto out_remove_tmp;
+	}
 #endif
 
 	err = asprintf(&command,
diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index a76aae81d7a0..79dec5ab3bef 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -52,6 +52,9 @@ int capstone_init(struct machine *machine, csh *cs_handle, bool is64, bool disas
 	} else if (machine__normalized_is(machine, "s390")) {
 		arch = CS_ARCH_SYSZ;
 		mode = CS_MODE_BIG_ENDIAN;
+	} else if (machine__normalized_is(machine, "powerpc")) {
+		arch = CS_ARCH_PPC;
+		mode = CS_MODE_64;
 	} else {
 		return -1;
 	}
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (12 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 13/15] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-24 21:00   ` Arnaldo Carvalho de Melo
  2024-07-31 13:51   ` kernel test robot
  2024-07-18  8:43 ` [PATCH V8 15/15] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
  2024-07-23 20:07 ` [PATCH V8 00/15] Add data type profiling support for powerpc Arnaldo Carvalho de Melo
  15 siblings, 2 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Now perf uses the capstone library to disassemble the instructions in
x86. capstone is used (if available) for perf annotate to speed up.
Currently it only supports x86 architecture. Patch includes changes to
enable this in powerpc. For now, only for data type sort keys, this
method is used and only binary code (raw instruction) is read. This is
because powerpc approach to understand instructions and reg fields uses
raw instruction. The "cs_disasm" is currently not enabled. While
attempting to do cs_disasm, observation is that some of the instructions
were not identified (ex: extswsli, maddld) and it had to fallback to use
objdump. Hence enabling "cs_disasm" is added in comment section as a
TODO for powerpc.
Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index a848e6f5f05a..63681df6482b 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1585,6 +1585,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
 	}
 }
 
+static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
+					struct annotate_args *args)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	struct map *map = args->ms.map;
+	struct dso *dso = map__dso(map);
+	struct nscookie nsc;
+	u64 start = map__rip_2objdump(map, sym->start);
+	u64 end = map__rip_2objdump(map, sym->end);
+	u64 len = end - start;
+	u64 offset;
+	int i, fd, count;
+	bool is_64bit = false;
+	bool needs_cs_close = false;
+	u8 *buf = NULL;
+	struct find_file_offset_data data = {
+		.ip = start,
+	};
+	csh handle;
+	char disasm_buf[512];
+	struct disasm_line *dl;
+	u32 *line;
+	bool disassembler_style = false;
+
+	if (args->options->objdump_path)
+		return -1;
+
+	nsinfo__mountns_enter(dso->nsinfo, &nsc);
+	fd = open(filename, O_RDONLY);
+	nsinfo__mountns_exit(&nsc);
+	if (fd < 0)
+		return -1;
+
+	if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
+			    &is_64bit) == 0)
+		goto err;
+
+	if (!args->options->disassembler_style ||
+			!strcmp(args->options->disassembler_style, "att"))
+		disassembler_style = true;
+
+	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
+		goto err;
+
+	needs_cs_close = true;
+
+	buf = malloc(len);
+	if (buf == NULL)
+		goto err;
+
+	count = pread(fd, buf, len, data.offset);
+	close(fd);
+	fd = -1;
+
+	if ((u64)count != len)
+		goto err;
+
+	line = (u32 *)buf;
+
+	/* add the function address and name */
+	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
+		  start, sym->name);
+
+	args->offset = -1;
+	args->line = disasm_buf;
+	args->line_nr = 0;
+	args->fileloc = NULL;
+	args->ms.sym = sym;
+
+	dl = disasm_line__new(args);
+	if (dl == NULL)
+		goto err;
+
+	annotation_line__add(&dl->al, ¬es->src->source);
+
+	/*
+	 * TODO: enable disassm for powerpc
+	 * count = cs_disasm(handle, buf, len, start, len, &insn);
+	 *
+	 * For now, only binary code is saved in disassembled line
+	 * to be used in "type" and "typeoff" sort keys. Each raw code
+	 * is 32 bit instruction. So use "len/4" to get the number of
+	 * entries.
+	 */
+	count = len/4;
+
+	for (i = 0, offset = 0; i < count; i++) {
+		args->offset = offset;
+		sprintf(args->line, "%x", line[i]);
+
+		dl = disasm_line__new(args);
+		if (dl == NULL)
+			goto err;
+
+		annotation_line__add(&dl->al, ¬es->src->source);
+
+		offset += 4;
+	}
+
+	/* It failed in the middle */
+	if (offset != len) {
+		struct list_head *list = ¬es->src->source;
+
+		/* Discard all lines and fallback to objdump */
+		while (!list_empty(list)) {
+			dl = list_first_entry(list, struct disasm_line, al.node);
+
+			list_del_init(&dl->al.node);
+			disasm_line__free(dl);
+		}
+		count = -1;
+	}
+
+out:
+	if (needs_cs_close)
+		cs_close(&handle);
+	free(buf);
+	return count < 0 ? count : 0;
+
+err:
+	if (fd >= 0)
+		close(fd);
+	if (needs_cs_close) {
+		struct disasm_line *tmp;
+
+		/*
+		 * It probably failed in the middle of the above loop.
+		 * Release any resources it might add.
+		 */
+		list_for_each_entry_safe(dl, tmp, ¬es->src->source, al.node) {
+			list_del(&dl->al.node);
+			free(dl);
+		}
+	}
+	count = -1;
+	goto out;
+}
+
 static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
 					struct annotate_args *args)
 {
@@ -1942,6 +2080,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 			err = symbol__disassemble_raw(symfs_filename, sym, args);
 			if (err == 0)
 				goto out_remove_tmp;
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+			err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
+			if (err == 0)
+				goto out_remove_tmp;
+#endif
 		}
 	}
 
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH V8 15/15] tools/perf: Set instruction name to be used with insn-stat when using raw instruction
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (13 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
@ 2024-07-18  8:43 ` Athira Rajeev
  2024-07-23 20:07 ` [PATCH V8 00/15] Add data type profiling support for powerpc Arnaldo Carvalho de Melo
  15 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-18  8:43 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher,
	christophe.leroy
  Cc: linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	atrajeev, kjain, hbathini, disgoel
Since the "ins.name" is not set while using raw instruction,
perf annotate with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
  Name      :  Good   Bad
  -----------------------------------------------------------
            :   419   196
Patch sets "dl->ins.name" in arch specific function "check_ppc_insn"
while initialising "struct disasm_line". Also update "ins_find" function
to pass "struct disasm_line" as a parameter so as to set its name field
in arch specific call.
With the patch changes:
  Annotate Instruction stats
  total 609, ok 446 (73.2%), bad 163 (26.8%)
  Name/opcode         :  Good   Bad
  -----------------------------------------------------------
  58                  :   323    80
  32                  :    49    43
  34                  :    33    11
  OP_31_XOP_LDX       :     8    20
  40                  :    23     0
  OP_31_XOP_LWARX     :     5     1
  OP_31_XOP_LWZX      :     2     3
  OP_31_XOP_LDARX     :     3     0
  33                  :     0     2
  OP_31_XOP_LBZX      :     0     1
  OP_31_XOP_LWAX      :     0     1
  OP_31_XOP_LHZX      :     0     1
Reviewed-and-tested-by: Kajol Jain<kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c  | 18 +++++++++++++++---
 tools/perf/builtin-annotate.c                  |  4 ++--
 tools/perf/util/annotate.c                     |  2 +-
 tools/perf/util/disasm.c                       | 10 +++++-----
 tools/perf/util/disasm.h                       |  2 +-
 5 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index af1032572bf3..ede9eeade0ab 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -189,8 +189,9 @@ static int cmp_offset(const void *a, const void *b)
 	return (val1->value - val2->value);
 }
 
-static struct ins_ops *check_ppc_insn(u32 raw_insn)
+static struct ins_ops *check_ppc_insn(struct disasm_line *dl)
 {
+	int raw_insn = dl->raw.raw_insn;
 	int opcode = PPC_OP(raw_insn);
 	int mem_insn_31 = PPC_21_30(raw_insn);
 	struct insn_offset *ret;
@@ -198,19 +199,30 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
 		"OP_31_INSN",
 		mem_insn_31
 	};
+	char name_insn[32];
 
 	/*
 	 * Instructions with opcode 32 to 63 are memory
 	 * instructions in powerpc
 	 */
 	if ((opcode & 0x20)) {
+		/*
+		 * Set name in case of raw instruction to
+		 * opcode to be used in insn-stat
+		 */
+		if (!strlen(dl->ins.name)) {
+			sprintf(name_insn, "%d", opcode);
+			dl->ins.name = strdup(name_insn);
+		}
 		return &load_store_ops;
 	} else if (opcode == 31) {
 		/* Check for memory instructions with opcode 31 */
 		ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
-		if (ret != NULL)
+		if (ret) {
+			if (!strlen(dl->ins.name))
+				dl->ins.name = strdup(ret->name);
 			return &load_store_ops;
-		else {
+		} else {
 			mem_insns_31_opcode.value = PPC_22_30(raw_insn);
 			ret = bsearch(&mem_insns_31_opcode, arithmetic_ins_op_31, ARRAY_SIZE(arithmetic_ins_op_31),
 					sizeof(arithmetic_ins_op_31[0]), cmp_offset);
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index b10b7f005658..cf60392b1c19 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -396,10 +396,10 @@ static void print_annotate_item_stat(struct list_head *head, const char *title)
 	printf("total %d, ok %d (%.1f%%), bad %d (%.1f%%)\n\n", total,
 	       total_good, 100.0 * total_good / (total ?: 1),
 	       total_bad, 100.0 * total_bad / (total ?: 1));
-	printf("  %-10s: %5s %5s\n", "Name", "Good", "Bad");
+	printf("  %-20s: %5s %5s\n", "Name/opcode", "Good", "Bad");
 	printf("-----------------------------------------------------------\n");
 	list_for_each_entry(istat, head, list)
-		printf("  %-10s: %5d %5d\n", istat->name, istat->good, istat->bad);
+		printf("  %-20s: %5d %5d\n", istat->name, istat->good, istat->bad);
 	printf("\n");
 }
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ce99db291c5e..a2ee4074f768 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2229,7 +2229,7 @@ static struct annotated_item_stat *annotate_data_stat(struct list_head *head,
 		return NULL;
 
 	istat->name = strdup(name);
-	if (istat->name == NULL) {
+	if ((istat->name == NULL) || (!strlen(istat->name))) {
 		free(istat);
 		return NULL;
 	}
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 63681df6482b..cd283c42195c 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -859,7 +859,7 @@ static void ins__sort(struct arch *arch)
 	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins_ops *__ins__find(struct arch *arch, const char *name, u32 raw_insn)
+static struct ins_ops *__ins__find(struct arch *arch, const char *name, struct disasm_line *dl)
 {
 	struct ins *ins;
 	const int nmemb = arch->nr_instructions;
@@ -871,7 +871,7 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name, u32 raw_
 		 */
 		struct ins_ops *ops;
 
-		ops = check_ppc_insn(raw_insn);
+		ops = check_ppc_insn(dl);
 		if (ops)
 			return ops;
 	}
@@ -905,9 +905,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name, u32 raw_
 	return ins ? ins->ops : NULL;
 }
 
-struct ins_ops *ins__find(struct arch *arch, const char *name, u32 raw_insn)
+struct ins_ops *ins__find(struct arch *arch, const char *name, struct disasm_line *dl)
 {
-	struct ins_ops *ops = __ins__find(arch, name, raw_insn);
+	struct ins_ops *ops = __ins__find(arch, name, dl);
 
 	if (!ops && arch->associate_instruction_ops)
 		ops = arch->associate_instruction_ops(arch, name);
@@ -917,7 +917,7 @@ struct ins_ops *ins__find(struct arch *arch, const char *name, u32 raw_insn)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
 {
-	dl->ins.ops = ins__find(arch, dl->ins.name, dl->raw.raw_insn);
+	dl->ins.ops = ins__find(arch, dl->ins.name, dl);
 
 	if (!dl->ins.ops)
 		return;
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index c1bb1e484bfb..f56beedeb9da 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -105,7 +105,7 @@ struct annotate_args {
 struct arch *arch__find(const char *name);
 bool arch__is(struct arch *arch, const char *name);
 
-struct ins_ops *ins__find(struct arch *arch, const char *name, u32 raw_insn);
+struct ins_ops *ins__find(struct arch *arch, const char *name, struct disasm_line *dl);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 		   struct ins_operands *ops, int max_ins_name);
 
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc
  2024-07-18  8:43 ` [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc Athira Rajeev
@ 2024-07-23 19:06   ` Arnaldo Carvalho de Melo
  2024-07-24  5:38     ` Athira Rajeev
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-07-23 19:06 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, hbathini, disgoel
On Thu, Jul 18, 2024 at 02:13:46PM +0530, Athira Rajeev wrote:
> TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined
> to be 16. While checking if reg is valid using has_reg_type,
> max value is checked using TYPE_STATE_MAX_REGS value. Define
> this conditionally for powerpc.
So what would happen if I get a perf.data file on a powerpc system and
then try to do data-type profiling on a x86 system?
I'm processing this now, but please consider fixing this up in some
other fashion, I think we have support for collecting registers in a way
that perf.data has all that is needed for us to print them in a cross
arch way, no?
I see there is the FIXME there, ok.
- Arnaldo
 
> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/annotate-data.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
> index 6fe8ee8b8410..992b7ce4bd11 100644
> --- a/tools/perf/util/annotate-data.h
> +++ b/tools/perf/util/annotate-data.h
> @@ -189,7 +189,11 @@ struct type_state_stack {
>  };
>  
>  /* FIXME: This should be arch-dependent */
> +#ifdef __powerpc__
> +#define TYPE_STATE_MAX_REGS  32
> +#else
>  #define TYPE_STATE_MAX_REGS  16
> +#endif
>  
>  /*
>   * State table to maintain type info in each register and stack location.
> -- 
> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-07-18  8:43 ` [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
@ 2024-07-23 19:14   ` Arnaldo Carvalho de Melo
  2024-07-24  5:38     ` Athira Rajeev
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-07-23 19:14 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, hbathini, disgoel
On Thu, Jul 18, 2024 at 02:13:47PM +0530, Athira Rajeev wrote:
> Currently, the perf tool infrastructure disasm_line__parse function to
> parse disassembled line.
> 
> Example snippet from objdump:
> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> 
> c0000000010224b4:	lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. In powerpc, the approach for data type
> profiling uses raw instruction instead of result from objdump to identify
> the instruction category and extract the source/target registers.
> 
> Example: 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. Add function
> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
> Also update "struct disasm_line" to save the binary code/
> With the change, function captures:
> 
> line -> "38 01 81 e8     ld      r4,312(r1)"
> raw instruction "38 01 81 e8"
> 
> Raw instruction is used later to extract the reg/offset fields. Macros
> are added to extract opcode and register fields. "struct disasm_line"
> is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
> code (raw). Function "disasm_line__parse_powerpc fills the raw
> instruction hex value and can use macros to get opcode. There is no
> changes in existing code paths, which parses the disassembled code.
> The size of raw instruction depends on architecture. In case of powerpc,
> the parsing the disasm line needs to handle cases for reading binary code
> directly from DSO as well as parsing the objdump result. Hence adding
> the logic into separate function instead of updating "disasm_line__parse".
> The architecture using the instruction name and present approach is
> not altered. Since this approach targets powerpc, the macro
> implementation is added for powerpc as of now.
> 
> Since the disasm_line__parse is used in other cases (perf annotate) and
> not only data tye profiling, the powerpc callback includes changes to
> work with binary code as well as mneumonic representation. Also in case
> if the DSO read fails and libcapstone is not supported, the approach
> fallback to use objdump as option. Hence as option, patch has changes to
> ensure objdump option also works well.
> 
> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/include/linux/string.h                  |  2 +
>  tools/lib/string.c                            | 13 +++++
>  .../perf/arch/powerpc/annotate/instructions.c |  1 +
>  tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 ++++
>  tools/perf/util/annotate.h                    |  5 +-
>  tools/perf/util/disasm.c                      | 48 ++++++++++++++++++-
>  6 files changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
> index db5c99318c79..0acb1fc14e19 100644
> --- a/tools/include/linux/string.h
> +++ b/tools/include/linux/string.h
> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>  
>  extern char *strim(char *);
>  
> +extern void remove_spaces(char *s);
> +
>  extern void *memchr_inv(const void *start, int c, size_t bytes);
>  #endif /* _TOOLS_LINUX_STRING_H_ */
> diff --git a/tools/lib/string.c b/tools/lib/string.c
> index 8b6892f959ab..3126d2cff716 100644
> --- a/tools/lib/string.c
> +++ b/tools/lib/string.c
> @@ -153,6 +153,19 @@ char *strim(char *s)
>  	return skip_spaces(s);
>  }
>  
> +/*
> + * remove_spaces - Removes whitespaces from @s
> + */
> +void remove_spaces(char *s)
> +{
> +	char *d = s;
> +
> +	do {
> +		while (*d == ' ')
> +			++d;
> +	} while ((*s++ = *d++));
> +}
> +
>  /**
>   * strreplace - Replace all occurrences of character in string.
>   * @s: The string to operate on.
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
> index a3f423c27cae..d57fd023ef9c 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>  		arch->initialized = true;
>  		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>  		arch->objdump.comment_char      = '#';
> +		annotate_opts.show_asm_raw = true;
>  	}
>  
>  	return 0;
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index 0c4f4caf53ac..430623ca5612 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
>  			return roff->ptregs_offset;
>  	return -EINVAL;
>  }
> +
> +#define PPC_OP(op)	(((op) >> 26) & 0x3F)
> +#define PPC_RA(a)	(((a) >> 16) & 0x1f)
> +#define PPC_RT(t)	(((t) >> 21) & 0x1f)
> +#define PPC_RB(b)	(((b) >> 11) & 0x1f)
> +#define PPC_D(D)	((D) & 0xfffe)
> +#define PPC_DS(DS)	((DS) & 0xfffc)
> +#define OP_LD	58
> +#define OP_STD	62
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index d5c821c22f79..9ba772f46270 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -113,7 +113,10 @@ struct annotation_line {
>  struct disasm_line {
>  	struct ins		 ins;
>  	struct ins_operands	 ops;
> -
> +	union {
> +		u8 bytes[4];
> +		u32 raw_insn;
> +	} raw;
>  	/* This needs to be at the end. */
>  	struct annotation_line	 al;
>  };
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index d2723ba024bf..a53591a6111e 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -44,6 +44,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>  
>  static void ins__sort(struct arch *arch);
>  static int disasm_line__parse(char *line, const char **namep, char **rawp);
> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
>  
>  static __attribute__((constructor)) void symbol__init_regexpr(void)
>  {
> @@ -845,6 +846,48 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>  	return -1;
>  }
>  
> +/*
> + * Parses the result captured from symbol__disassemble_*
> + * Example, line read from DSO file in powerpc:
> + * line:    38 01 81 e8
> + * opcode: fetched from arch specific get_opcode_insn
> + * rawp_insn: e8810138
> + *
> + * rawp_insn is used later to extract the reg/offset fields
> + */
> +#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
> +#define	RAW_BYTES	11
> +
> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
> +{
> +	char *line = dl->al.line;
> +	const char **namep = &dl->ins.name;
> +	char **rawp = &dl->ops.raw;
> +	char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
> +	char *name = skip_spaces(name_raw_insn + RAW_BYTES);
> +	int objdump = 0;
> +
> +	if (strlen(line) > RAW_BYTES)
> +		objdump = 1;
> +
> +	if (name_raw_insn[0] == '\0')
> +		return -1;
> +
> +	if (objdump) {
> +		disasm_line__parse(name, namep, rawp);
> +	} else
> +		*namep = "";
> +
> +	tmp_raw_insn = strndup(name_raw_insn, 11);
Not  checking the result of strndup(), I'll try to add the check on an
extra pass after I read the patches.
- Arnaldo
> +	remove_spaces(tmp_raw_insn);
> +
> +	sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn);
> +	if (objdump)
> +		dl->raw.raw_insn = be32_to_cpu(dl->raw.raw_insn);
> +
> +	return 0;
> +}
> +
>  static void annotation_line__init(struct annotation_line *al,
>  				  struct annotate_args *args,
>  				  int nr)
> @@ -898,7 +941,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>  		goto out_delete;
>  
>  	if (args->offset != -1) {
> -		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
> +		if (arch__is(args->arch, "powerpc")) {
> +			if (disasm_line__parse_powerpc(dl) < 0)
> +				goto out_free_line;
> +		} else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>  			goto out_free_line;
When the if has {}, the else should as well.
Documentation/process/coding-style.rst has this documented.
This is minor, I'm pointing out so that you take into account for the
future.
- Arnaldo
>  
>  		disasm_line__init_ins(dl, args->arch, &args->ms);
> -- 
> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 00/15] Add data type profiling support for powerpc
  2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
                   ` (14 preceding siblings ...)
  2024-07-18  8:43 ` [PATCH V8 15/15] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
@ 2024-07-23 20:07 ` Arnaldo Carvalho de Melo
  2024-07-24  7:59   ` Athira Rajeev
  15 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-07-23 20:07 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, hbathini, disgoel
On Thu, Jul 18, 2024 at 02:13:43PM +0530, Athira Rajeev wrote:
> The patchset from Namhyung added support for data type profiling
> in perf tool. This enabled support to associate PMU samples to data
> types they refer using DWARF debug information. With the upstream
> perf, currently it possible to run perf report or perf annotate to
> view the data type information on x86.
> 
> Initial patchset posted here had changes need to enable data type
> profiling support for powerpc.
I have this in tmp.perf-tools-next now, will do more tests before moving
to perf-tools-next.
I see that Namhyung reviewed the patch series, and that it was tested,
in general I thought we need to go on improving things go avoid having
arch specific code in the non-arch code, I guess right now its important
to land these to try and encourage people to try data-type profiling
more widely, hopefully finding and improving data structures in the
kernel.
We need to write about such cases so as to provide success stories on
using perf for data-type profiling.
Thanks for working on this!
- Arnaldo
 
> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
> 
> Main change were:
> 1. powerpc instruction nmemonic table to associate load/store
> instructions with move_ops which is use to identify if instruction
> is a memory access one.
> 2. To get register number and access offset from the given
> instruction, code uses fields from "struct arch" -> objump.
> Added entry for powerpc here.
> 3. A get_arch_regnum to return register number from the
> register name string.
> 
> But the apporach used in the initial patchset used parsing of
> disassembled code which the current perf tool implementation does.
> 
> Example: lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, second patchset added support to use
> raw instruction. With raw instruction, macros are added to extract opcode
> and register fields.
> Link to second patchset:
> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
> 
> Example representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
>   _____________________________________
>   | 58 |  RT  |  RA |      DS       | |
>   -------------------------------------
> 0    6     11    16              30 31
> 
> Second patchset used "objdump" again to read the raw instruction.
> But since there is no need to disassemble and binary code can be read
> directly from the DSO, third patchset (ie this patchset) uses below
> apporach. The apporach preferred in powerpc to parse sample for data
> type profiling in V3 patchset is:
> - Read directly from DSO using dso__data_read_offset
> - If that fails for any case, fallback to using libcapstone
> - If libcapstone is not supported, approach will use objdump
> 
> Patchset adds support to pick the opcode and reg fields from this
> raw/binary instruction code. This approach came in from review comment
> by Segher Boessenkool and Christophe for the initial patchset.
> 
> Apart from that, instruction tracking is enabled for powerpc.
> 
> These are handled in this patchset.
> 
> - Patch 1 is to rearrange register state type structures to header file
> so that it can referred from other arch specific files
> - Patch 2 is to make instruction tracking as a callback to"struct arch"
> so that it can be implemented by other archs easily and defined in arch
> specific files
> - Patch 3 is to handle state type regs array size for x86 and powerpc
> - Patch 4 adds support to capture and parse raw instruction in powerpc
> using dso__data_read_offset utility
> - Patch 4 also adds logic to support using objdump when doing default "perf
> report" or "perf annotate" since it that needs disassembled instruction.
> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
> - Patch 6 update parameters for reg extract functions to use raw
> instruction on powerpc
> - Patch 7 updates ins__find to carry raw_insn and also adds parse
> callback for memory instructions for powerpc
> - Patch 8 add support to identify memory instructions of opcode 31 in
> powerpc
> - Patch 9 adds more instructions to support instruction tracking in powerpc
> - Patch 10 and 11 handles instruction tracking for powerpc.
> - Patch 12, 13 and 14 add support to use libcapstone in powerpc
> - Patch 15 handles insn-stat option for perf annotate
> 
> Note:
> - There are remaining unknowns (25%) as seen in annotate Instruction stats
> below.
> - This patchset is not tested on powerpc32. In next step of enhancements
> along with handling remaining unknowns, plan to cover powerpc32 changes
> based on how testing goes.
> - Patch changes to handle global register support will be posted in
> follow up after addressing review comment from Namhung here:
> https://lore.kernel.org/linux-perf-users/CAM9d7cg5MYvLeOoBuKqp1pw7uvRfqCw1fXpLtgct0npL96JaYg@mail.gmail.com/T/#mb309184ebf5d2c9b539cd86d61ffe2a3696eac71
> 
> With the current patchset:
> 
>  ./perf record -a -e mem-loads sleep 1
>  ./perf report -s type,typeoff --hierarchy --group --stdio
>  ./perf annotate --data-type --insn-stat
> 
> perf annotate logs:
> ==================
> 
> Annotate Instruction stats
> total 609, ok 266 (43.7%), bad 343 (56.3%)
> 
>   Name/opcode         :  Good   Bad
> -----------------------------------------------------------
>   58                  :   172   231
>   32                  :    48    44
>   34                  :    19    25
>   OP_31_XOP_LDX       :     8    20
>   40                  :     9    14
>   OP_31_XOP_LWARX     :     5     1
>   OP_31_XOP_LWZX      :     2     3
>   OP_31_XOP_LDARX     :     3     0
>   33                  :     0     2
>   OP_31_XOP_LBZX      :     0     1
>   OP_31_XOP_LWAX      :     0     1
>   OP_31_XOP_LHZX      :     0     1
>  
> perf report logs:
> =================
> 
>   Total Lost Samples: 0
> 
>   Samples: 1K of event 'mem-loads'
>   Event count (approx.): 937238
> 
>     Overhead  Data Type / Data Type Offset
>     ........  .........  ................
>      1.73%     struct task_struct
>         1.29%     struct task_struct +276 (flags)
>         0.10%     struct task_struct +1872 (mm)
>         0.08%     struct task_struct +3632 (thread.regs)
>         0.08%     struct task_struct +1072 (sched_class)
>         0.05%     struct task_struct +1928 ()
>         0.03%     struct task_struct +1064 (dl_server)
>         0.02%     struct task_struct +0 (thread_info.preempt_count)
>         0.02%     struct task_struct +120 (thread_info)
>         0.02%     struct task_struct +2216 (worker_private)
>         0.02%     struct task_struct +24 (thread_info.slb_preload_nr)
>         0.02%     struct task_struct +8 (thread_info.local_flags)
>         0.01%     struct task_struct +1416 (nr_cpus_allowed)
>      1.68%     struct file
>         0.58%     struct file +176 (f_op)
>         0.53%     struct file +24 (f_count.counter)
>         0.46%     struct file +20 (f_mode)
>         0.11%     struct file +200 (private_data)
>      1.63%     struct rq
>         0.76%     struct rq +2856 (cpu)
>         0.21%     struct rq +2592 (curr)
>         0.11%     struct rq +2632 (clock_update_flags)
>         0.08%     struct rq +2600 (idle)
>         0.06%     struct rq +2688 (clock_task)
>         0.05%     struct rq +16 (numa_migrate_on)
>         0.04%     struct rq +2864 (cfs_tasks.next)
> 
> Thanks
> Athira Rajeev
> 
> Changelog:
> >From v7 -> v8:
>   Split patches 1 to 14 and patch 18 as basic patches.
>   Patch 15, 16 and 17 is related to global register handling
>   and will be handled separately.
>   Added Reviewed-and-tested-by from Kajol Jain
>   Added Reviewed-by from Namhyung for patches 1 to 14, patch 18
> 
> >From v6 -> v7:
> - Addressed review comments from Namhyung
>   Changed format string space to %-20s while printing
>   instruction stats in patch 18.
>   Use cmp_null in patch 17 while comparing var_name to
>   properly sort with correct order.
> 
> >From v5 -> v6:
> - Addressed review comments from Namhyung
>   Conditionally define TYPE_STATE_MAX_REGS based on arch.
>   Added macro for defining width of the raw codes and spaces
>   in disasm_line__parse_powerpc.
>   Call disasm_line__parse from disasm_line__parse_powerpc
>   for generic code.
>   Renamed symbol__disassemble_dso to symbol__disassemble_raw.
>   Fixed find_data_type_global_reg to correclty free var_types
>   and change indent level.
>   Fixed data_type_cmp and sort__typeoff_sort to include var_name
>   in comparing data type entries.
>   
> >From v4 -> v5:
> - Addressed review comments from Namhyung
>   Handle max number of type state regs as 16 for x86 and 32 for
>   powerpc.
>   Added generic support for objdump patch first and DSO read
>   optimisation next
>   combined patch 3 and patch 4 in patchseries V4 to one patch
>   Changed reference for "raw_insn" to use "u32"
>   Splitted "parse" callback patch changes and "ins__find" patch
>   changes into two
>   Instead of making weak function, added get_powerpc_regs to
>   extract register and offset fields for powerpc
> - Addressed complation fail when "dwarf.h" is not present ie
>   elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT
>   when including functions that use Dwarf references. Also
>   conditionally include some of the header files.
> 
> >From v3->v4:
> - Addressed review comments from Ian by using capston_init from
>   "util/print_insn.c" instead of "open_capston_handle".
> - Addressed review comment from Namhyung by moving "opcode"
>   field from "struct ins" to "struct disasm_line"
> 
> >From v2->v3:
> - Addressed review comments from Christophe and Namhyung for V2
> - Changed the apporach in powerpc to parse sample for data
>   type profiling as:
>   Read directly from DSO using dso__data_read_offset
>   If that fails for any case, fallback to using libcapstone
>   If libcapstone is not supported, approach will use objdump
> - Include instructions with opcode as 31 and correctly categorize
>   them as memory or arithmetic instructions.
> - Include more instructions for instruction tracking in powerpc
> 
> >From v1->v2:
> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
>   to use the binary code (raw insn) to fetch opcode, register and
>   offset fields.
> - Added support for instruction tracking in powerpc
> - Find the register defined variables (r13 and r1 which points to
>   local_paca and current_stack_pointer in powerpc)
> 
> Athira Rajeev (15):
>   tools/perf: Move the data structures related to register type to
>     header file
>   tools/perf: Add "update_insn_state" callback function to handle arch
>     specific instruction tracking
>   tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in
>     powerpc
>   tools/perf: Add disasm_line__parse to parse raw instruction for
>     powerpc
>   tools/perf: Add support to capture and parse raw instruction in
>     powerpc using dso__data_read_offset utility
>   tools/perf: Update parameters for reg extract functions to use raw
>     instruction on powerpc
>   tools/perf: Add parse function for memory instructions in powerpc
>   tools/perf: Add support to identify memory instructions of opcode 31
>     in powerpc
>   tools/perf: Add some of the arithmetic instructions to support
>     instruction tracking in powerpc
>   tools/perf: Add more instructions for instruction tracking
>   tools/perf: Update instruction tracking for powerpc
>   tools/perf: Make capstone_init non-static so that it can be used
>     during symbol disassemble
>   tools/perf: Use capstone_init and remove open_capstone_handle from
>     disasm.c
>   tools/perf: Add support to use libcapstone in powerpc
>   tools/perf: Set instruction name to be used with insn-stat when using
>     raw instruction
> 
>  tools/include/linux/string.h                  |   2 +
>  tools/lib/string.c                            |  13 +
>  tools/perf/arch/arm64/annotate/instructions.c |   3 +-
>  .../arch/loongarch/annotate/instructions.c    |   6 +-
>  .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++++
>  tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
>  tools/perf/arch/s390/annotate/instructions.c  |   5 +-
>  tools/perf/arch/x86/annotate/instructions.c   | 377 ++++++++++++++
>  tools/perf/builtin-annotate.c                 |   4 +-
>  tools/perf/util/annotate-data.c               | 453 +----------------
>  tools/perf/util/annotate-data.h               |  83 ++++
>  tools/perf/util/annotate.c                    |  21 +-
>  tools/perf/util/annotate.h                    |   5 +-
>  tools/perf/util/disasm.c                      | 468 ++++++++++++++++--
>  tools/perf/util/disasm.h                      |  19 +-
>  tools/perf/util/include/dwarf-regs.h          |  11 +
>  tools/perf/util/print_insn.c                  |  15 +-
>  tools/perf/util/print_insn.h                  |   5 +
>  18 files changed, 1301 insertions(+), 496 deletions(-)
> 
> -- 
> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc
  2024-07-23 19:06   ` Arnaldo Carvalho de Melo
@ 2024-07-24  5:38     ` Athira Rajeev
  0 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-24  5:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Adrian Hunter, Ian Rogers, Namhyung Kim,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Hari Bathini, Disha Goel
> On 24 Jul 2024, at 12:36 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, Jul 18, 2024 at 02:13:46PM +0530, Athira Rajeev wrote:
>> TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined
>> to be 16. While checking if reg is valid using has_reg_type,
>> max value is checked using TYPE_STATE_MAX_REGS value. Define
>> this conditionally for powerpc.
> 
> So what would happen if I get a perf.data file on a powerpc system and
> then try to do data-type profiling on a x86 system?
> 
> I'm processing this now, but please consider fixing this up in some
> other fashion, I think we have support for collecting registers in a way
> that perf.data has all that is needed for us to print them in a cross
> arch way, no?
> 
> I see there is the FIXME there, ok.
> 
> - Arnaldo
> 
Hi Arnaldo
Sure,  thanks.
>> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
>> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
>> Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/util/annotate-data.h | 4 ++++
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
>> index 6fe8ee8b8410..992b7ce4bd11 100644
>> --- a/tools/perf/util/annotate-data.h
>> +++ b/tools/perf/util/annotate-data.h
>> @@ -189,7 +189,11 @@ struct type_state_stack {
>> };
>> 
>> /* FIXME: This should be arch-dependent */
>> +#ifdef __powerpc__
>> +#define TYPE_STATE_MAX_REGS  32
>> +#else
>> #define TYPE_STATE_MAX_REGS  16
>> +#endif
>> 
>> /*
>>  * State table to maintain type info in each register and stack location.
>> -- 
>> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc
  2024-07-23 19:14   ` Arnaldo Carvalho de Melo
@ 2024-07-24  5:38     ` Athira Rajeev
  0 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-24  5:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Adrian Hunter, Ian Rogers, Namhyung Kim, segher,
	christophe.leroy, linux-kernel, linux-perf-users, linuxppc-dev,
	akanksha, maddy, kjain, hbathini, disgoel
> On 24 Jul 2024, at 12:44 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, Jul 18, 2024 at 02:13:47PM +0530, Athira Rajeev wrote:
>> Currently, the perf tool infrastructure disasm_line__parse function to
>> parse disassembled line.
>> 
>> Example snippet from objdump:
>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>> 
>> c0000000010224b4: lwz     r10,0(r9)
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. In powerpc, the approach for data type
>> profiling uses raw instruction instead of result from objdump to identify
>> the instruction category and extract the source/target registers.
>> 
>> Example: 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. Add function
>> "disasm_line__parse_powerpc" to handle parsing of raw instruction.
>> Also update "struct disasm_line" to save the binary code/
>> With the change, function captures:
>> 
>> line -> "38 01 81 e8     ld      r4,312(r1)"
>> raw instruction "38 01 81 e8"
>> 
>> Raw instruction is used later to extract the reg/offset fields. Macros
>> are added to extract opcode and register fields. "struct disasm_line"
>> is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
>> code (raw). Function "disasm_line__parse_powerpc fills the raw
>> instruction hex value and can use macros to get opcode. There is no
>> changes in existing code paths, which parses the disassembled code.
>> The size of raw instruction depends on architecture. In case of powerpc,
>> the parsing the disasm line needs to handle cases for reading binary code
>> directly from DSO as well as parsing the objdump result. Hence adding
>> the logic into separate function instead of updating "disasm_line__parse".
>> The architecture using the instruction name and present approach is
>> not altered. Since this approach targets powerpc, the macro
>> implementation is added for powerpc as of now.
>> 
>> Since the disasm_line__parse is used in other cases (perf annotate) and
>> not only data tye profiling, the powerpc callback includes changes to
>> work with binary code as well as mneumonic representation. Also in case
>> if the DSO read fails and libcapstone is not supported, the approach
>> fallback to use objdump as option. Hence as option, patch has changes to
>> ensure objdump option also works well.
>> 
>> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
>> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/include/linux/string.h                  |  2 +
>> tools/lib/string.c                            | 13 +++++
>> .../perf/arch/powerpc/annotate/instructions.c |  1 +
>> tools/perf/arch/powerpc/util/dwarf-regs.c     |  9 ++++
>> tools/perf/util/annotate.h                    |  5 +-
>> tools/perf/util/disasm.c                      | 48 ++++++++++++++++++-
>> 6 files changed, 76 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
>> index db5c99318c79..0acb1fc14e19 100644
>> --- a/tools/include/linux/string.h
>> +++ b/tools/include/linux/string.h
>> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>> 
>> extern char *strim(char *);
>> 
>> +extern void remove_spaces(char *s);
>> +
>> extern void *memchr_inv(const void *start, int c, size_t bytes);
>> #endif /* _TOOLS_LINUX_STRING_H_ */
>> diff --git a/tools/lib/string.c b/tools/lib/string.c
>> index 8b6892f959ab..3126d2cff716 100644
>> --- a/tools/lib/string.c
>> +++ b/tools/lib/string.c
>> @@ -153,6 +153,19 @@ char *strim(char *s)
>> return skip_spaces(s);
>> }
>> 
>> +/*
>> + * remove_spaces - Removes whitespaces from @s
>> + */
>> +void remove_spaces(char *s)
>> +{
>> + char *d = s;
>> +
>> + do {
>> + while (*d == ' ')
>> + ++d;
>> + } while ((*s++ = *d++));
>> +}
>> +
>> /**
>>  * strreplace - Replace all occurrences of character in string.
>>  * @s: The string to operate on.
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
>> index a3f423c27cae..d57fd023ef9c 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -55,6 +55,7 @@ static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>> arch->initialized = true;
>> arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>> arch->objdump.comment_char      = '#';
>> + annotate_opts.show_asm_raw = true;
>> }
>> 
>> return 0;
>> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> index 0c4f4caf53ac..430623ca5612 100644
>> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
>> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> @@ -98,3 +98,12 @@ int regs_query_register_offset(const char *name)
>> return roff->ptregs_offset;
>> return -EINVAL;
>> }
>> +
>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>> +#define PPC_RA(a) (((a) >> 16) & 0x1f)
>> +#define PPC_RT(t) (((t) >> 21) & 0x1f)
>> +#define PPC_RB(b) (((b) >> 11) & 0x1f)
>> +#define PPC_D(D) ((D) & 0xfffe)
>> +#define PPC_DS(DS) ((DS) & 0xfffc)
>> +#define OP_LD 58
>> +#define OP_STD 62
>> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
>> index d5c821c22f79..9ba772f46270 100644
>> --- a/tools/perf/util/annotate.h
>> +++ b/tools/perf/util/annotate.h
>> @@ -113,7 +113,10 @@ struct annotation_line {
>> struct disasm_line {
>> struct ins ins;
>> struct ins_operands ops;
>> -
>> + union {
>> + u8 bytes[4];
>> + u32 raw_insn;
>> + } raw;
>> /* This needs to be at the end. */
>> struct annotation_line al;
>> };
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index d2723ba024bf..a53591a6111e 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -44,6 +44,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>> 
>> static void ins__sort(struct arch *arch);
>> static int disasm_line__parse(char *line, const char **namep, char **rawp);
>> +static int disasm_line__parse_powerpc(struct disasm_line *dl);
>> 
>> static __attribute__((constructor)) void symbol__init_regexpr(void)
>> {
>> @@ -845,6 +846,48 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>> return -1;
>> }
>> 
>> +/*
>> + * Parses the result captured from symbol__disassemble_*
>> + * Example, line read from DSO file in powerpc:
>> + * line:    38 01 81 e8
>> + * opcode: fetched from arch specific get_opcode_insn
>> + * rawp_insn: e8810138
>> + *
>> + * rawp_insn is used later to extract the reg/offset fields
>> + */
>> +#define PPC_OP(op) (((op) >> 26) & 0x3F)
>> +#define RAW_BYTES 11
>> +
>> +static int disasm_line__parse_powerpc(struct disasm_line *dl)
>> +{
>> + char *line = dl->al.line;
>> + const char **namep = &dl->ins.name;
>> + char **rawp = &dl->ops.raw;
>> + char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
>> + char *name = skip_spaces(name_raw_insn + RAW_BYTES);
>> + int objdump = 0;
>> +
>> + if (strlen(line) > RAW_BYTES)
>> + objdump = 1;
>> +
>> + if (name_raw_insn[0] == '\0')
>> + return -1;
>> +
>> + if (objdump) {
>> + disasm_line__parse(name, namep, rawp);
>> + } else
>> + *namep = "";
>> +
>> + tmp_raw_insn = strndup(name_raw_insn, 11);
> 
> Not  checking the result of strndup(), I'll try to add the check on an
> extra pass after I read the patches.
> 
> - Arnaldo
> 
>> + remove_spaces(tmp_raw_insn);
>> +
>> + sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn);
>> + if (objdump)
>> + dl->raw.raw_insn = be32_to_cpu(dl->raw.raw_insn);
>> +
>> + return 0;
>> +}
>> +
>> static void annotation_line__init(struct annotation_line *al,
>>  struct annotate_args *args,
>>  int nr)
>> @@ -898,7 +941,10 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>> goto out_delete;
>> 
>> if (args->offset != -1) {
>> - if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>> + if (arch__is(args->arch, "powerpc")) {
>> + if (disasm_line__parse_powerpc(dl) < 0)
>> + goto out_free_line;
>> + } else if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
>> goto out_free_line;
> 
> When the if has {}, the else should as well.
> Documentation/process/coding-style.rst has this documented.
> 
> This is minor, I'm pointing out so that you take into account for the
> future.
> 
> - Arnaldo
Thanks for pointing out. I will take care of it in the future
Athira
> 
>> 
>> disasm_line__init_ins(dl, args->arch, &args->ms);
>> -- 
>> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 00/15] Add data type profiling support for powerpc
  2024-07-23 20:07 ` [PATCH V8 00/15] Add data type profiling support for powerpc Arnaldo Carvalho de Melo
@ 2024-07-24  7:59   ` Athira Rajeev
  0 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-24  7:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Adrian Hunter, Ian Rogers, Namhyung Kim,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, kjain, hbathini,
	disgoel
> On 24 Jul 2024, at 1:37 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, Jul 18, 2024 at 02:13:43PM +0530, Athira Rajeev wrote:
>> The patchset from Namhyung added support for data type profiling
>> in perf tool. This enabled support to associate PMU samples to data
>> types they refer using DWARF debug information. With the upstream
>> perf, currently it possible to run perf report or perf annotate to
>> view the data type information on x86.
>> 
>> Initial patchset posted here had changes need to enable data type
>> profiling support for powerpc.
> 
> I have this in tmp.perf-tools-next now, will do more tests before moving
> to perf-tools-next.
> 
> I see that Namhyung reviewed the patch series, and that it was tested,
> in general I thought we need to go on improving things go avoid having
> arch specific code in the non-arch code, I guess right now its important
> to land these to try and encourage people to try data-type profiling
> more widely, hopefully finding and improving data structures in the
> kernel.
> 
> We need to write about such cases so as to provide success stories on
> using perf for data-type profiling.
> 
> Thanks for working on this!
Sure, Thanks Arnaldo for taking look at the patchset and pulling to tmp.perf-tools-next
- Athira
> 
> - Arnaldo
> 
>> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
>> 
>> Main change were:
>> 1. powerpc instruction nmemonic table to associate load/store
>> instructions with move_ops which is use to identify if instruction
>> is a memory access one.
>> 2. To get register number and access offset from the given
>> instruction, code uses fields from "struct arch" -> objump.
>> Added entry for powerpc here.
>> 3. A get_arch_regnum to return register number from the
>> register name string.
>> 
>> But the apporach used in the initial patchset used parsing of
>> disassembled code which the current perf tool implementation does.
>> 
>> Example: lwz     r10,0(r9)
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. Also to find whether there is a memory
>> reference in the operands, "memory_ref_char" field of objdump is used.
>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>> form "mov  (%rax), %rcx".
>> 
>> In case of powerpc, not all instructions using "(" are the only memory
>> instructions. Example, above instruction can also be of extended form (X
>> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
>> and extract the source/target registers, second patchset added support to use
>> raw instruction. With raw instruction, macros are added to extract opcode
>> and register fields.
>> Link to second patchset:
>> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
>> 
>> Example representation using --show-raw-insn in objdump gives result:
>> 
>> 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>> as:
>>  _____________________________________
>>  | 58 |  RT  |  RA |      DS       | |
>>  -------------------------------------
>> 0    6     11    16              30 31
>> 
>> Second patchset used "objdump" again to read the raw instruction.
>> But since there is no need to disassemble and binary code can be read
>> directly from the DSO, third patchset (ie this patchset) uses below
>> apporach. The apporach preferred in powerpc to parse sample for data
>> type profiling in V3 patchset is:
>> - Read directly from DSO using dso__data_read_offset
>> - If that fails for any case, fallback to using libcapstone
>> - If libcapstone is not supported, approach will use objdump
>> 
>> Patchset adds support to pick the opcode and reg fields from this
>> raw/binary instruction code. This approach came in from review comment
>> by Segher Boessenkool and Christophe for the initial patchset.
>> 
>> Apart from that, instruction tracking is enabled for powerpc.
>> 
>> These are handled in this patchset.
>> 
>> - Patch 1 is to rearrange register state type structures to header file
>> so that it can referred from other arch specific files
>> - Patch 2 is to make instruction tracking as a callback to"struct arch"
>> so that it can be implemented by other archs easily and defined in arch
>> specific files
>> - Patch 3 is to handle state type regs array size for x86 and powerpc
>> - Patch 4 adds support to capture and parse raw instruction in powerpc
>> using dso__data_read_offset utility
>> - Patch 4 also adds logic to support using objdump when doing default "perf
>> report" or "perf annotate" since it that needs disassembled instruction.
>> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
>> - Patch 6 update parameters for reg extract functions to use raw
>> instruction on powerpc
>> - Patch 7 updates ins__find to carry raw_insn and also adds parse
>> callback for memory instructions for powerpc
>> - Patch 8 add support to identify memory instructions of opcode 31 in
>> powerpc
>> - Patch 9 adds more instructions to support instruction tracking in powerpc
>> - Patch 10 and 11 handles instruction tracking for powerpc.
>> - Patch 12, 13 and 14 add support to use libcapstone in powerpc
>> - Patch 15 handles insn-stat option for perf annotate
>> 
>> Note:
>> - There are remaining unknowns (25%) as seen in annotate Instruction stats
>> below.
>> - This patchset is not tested on powerpc32. In next step of enhancements
>> along with handling remaining unknowns, plan to cover powerpc32 changes
>> based on how testing goes.
>> - Patch changes to handle global register support will be posted in
>> follow up after addressing review comment from Namhung here:
>> https://lore.kernel.org/linux-perf-users/CAM9d7cg5MYvLeOoBuKqp1pw7uvRfqCw1fXpLtgct0npL96JaYg@mail.gmail.com/T/#mb309184ebf5d2c9b539cd86d61ffe2a3696eac71
>> 
>> With the current patchset:
>> 
>> ./perf record -a -e mem-loads sleep 1
>> ./perf report -s type,typeoff --hierarchy --group --stdio
>> ./perf annotate --data-type --insn-stat
>> 
>> perf annotate logs:
>> ==================
>> 
>> Annotate Instruction stats
>> total 609, ok 266 (43.7%), bad 343 (56.3%)
>> 
>>  Name/opcode         :  Good   Bad
>> -----------------------------------------------------------
>>  58                  :   172   231
>>  32                  :    48    44
>>  34                  :    19    25
>>  OP_31_XOP_LDX       :     8    20
>>  40                  :     9    14
>>  OP_31_XOP_LWARX     :     5     1
>>  OP_31_XOP_LWZX      :     2     3
>>  OP_31_XOP_LDARX     :     3     0
>>  33                  :     0     2
>>  OP_31_XOP_LBZX      :     0     1
>>  OP_31_XOP_LWAX      :     0     1
>>  OP_31_XOP_LHZX      :     0     1
>> 
>> perf report logs:
>> =================
>> 
>>  Total Lost Samples: 0
>> 
>>  Samples: 1K of event 'mem-loads'
>>  Event count (approx.): 937238
>> 
>>    Overhead  Data Type / Data Type Offset
>>    ........  .........  ................
>>     1.73%     struct task_struct
>>        1.29%     struct task_struct +276 (flags)
>>        0.10%     struct task_struct +1872 (mm)
>>        0.08%     struct task_struct +3632 (thread.regs)
>>        0.08%     struct task_struct +1072 (sched_class)
>>        0.05%     struct task_struct +1928 ()
>>        0.03%     struct task_struct +1064 (dl_server)
>>        0.02%     struct task_struct +0 (thread_info.preempt_count)
>>        0.02%     struct task_struct +120 (thread_info)
>>        0.02%     struct task_struct +2216 (worker_private)
>>        0.02%     struct task_struct +24 (thread_info.slb_preload_nr)
>>        0.02%     struct task_struct +8 (thread_info.local_flags)
>>        0.01%     struct task_struct +1416 (nr_cpus_allowed)
>>     1.68%     struct file
>>        0.58%     struct file +176 (f_op)
>>        0.53%     struct file +24 (f_count.counter)
>>        0.46%     struct file +20 (f_mode)
>>        0.11%     struct file +200 (private_data)
>>     1.63%     struct rq
>>        0.76%     struct rq +2856 (cpu)
>>        0.21%     struct rq +2592 (curr)
>>        0.11%     struct rq +2632 (clock_update_flags)
>>        0.08%     struct rq +2600 (idle)
>>        0.06%     struct rq +2688 (clock_task)
>>        0.05%     struct rq +16 (numa_migrate_on)
>>        0.04%     struct rq +2864 (cfs_tasks.next)
>> 
>> Thanks
>> Athira Rajeev
>> 
>> Changelog:
>>> From v7 -> v8:
>>  Split patches 1 to 14 and patch 18 as basic patches.
>>  Patch 15, 16 and 17 is related to global register handling
>>  and will be handled separately.
>>  Added Reviewed-and-tested-by from Kajol Jain
>>  Added Reviewed-by from Namhyung for patches 1 to 14, patch 18
>> 
>>> From v6 -> v7:
>> - Addressed review comments from Namhyung
>>  Changed format string space to %-20s while printing
>>  instruction stats in patch 18.
>>  Use cmp_null in patch 17 while comparing var_name to
>>  properly sort with correct order.
>> 
>>> From v5 -> v6:
>> - Addressed review comments from Namhyung
>>  Conditionally define TYPE_STATE_MAX_REGS based on arch.
>>  Added macro for defining width of the raw codes and spaces
>>  in disasm_line__parse_powerpc.
>>  Call disasm_line__parse from disasm_line__parse_powerpc
>>  for generic code.
>>  Renamed symbol__disassemble_dso to symbol__disassemble_raw.
>>  Fixed find_data_type_global_reg to correclty free var_types
>>  and change indent level.
>>  Fixed data_type_cmp and sort__typeoff_sort to include var_name
>>  in comparing data type entries.
>> 
>>> From v4 -> v5:
>> - Addressed review comments from Namhyung
>>  Handle max number of type state regs as 16 for x86 and 32 for
>>  powerpc.
>>  Added generic support for objdump patch first and DSO read
>>  optimisation next
>>  combined patch 3 and patch 4 in patchseries V4 to one patch
>>  Changed reference for "raw_insn" to use "u32"
>>  Splitted "parse" callback patch changes and "ins__find" patch
>>  changes into two
>>  Instead of making weak function, added get_powerpc_regs to
>>  extract register and offset fields for powerpc
>> - Addressed complation fail when "dwarf.h" is not present ie
>>  elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT
>>  when including functions that use Dwarf references. Also
>>  conditionally include some of the header files.
>> 
>>> From v3->v4:
>> - Addressed review comments from Ian by using capston_init from
>>  "util/print_insn.c" instead of "open_capston_handle".
>> - Addressed review comment from Namhyung by moving "opcode"
>>  field from "struct ins" to "struct disasm_line"
>> 
>>> From v2->v3:
>> - Addressed review comments from Christophe and Namhyung for V2
>> - Changed the apporach in powerpc to parse sample for data
>>  type profiling as:
>>  Read directly from DSO using dso__data_read_offset
>>  If that fails for any case, fallback to using libcapstone
>>  If libcapstone is not supported, approach will use objdump
>> - Include instructions with opcode as 31 and correctly categorize
>>  them as memory or arithmetic instructions.
>> - Include more instructions for instruction tracking in powerpc
>> 
>>> From v1->v2:
>> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
>>  to use the binary code (raw insn) to fetch opcode, register and
>>  offset fields.
>> - Added support for instruction tracking in powerpc
>> - Find the register defined variables (r13 and r1 which points to
>>  local_paca and current_stack_pointer in powerpc)
>> 
>> Athira Rajeev (15):
>>  tools/perf: Move the data structures related to register type to
>>    header file
>>  tools/perf: Add "update_insn_state" callback function to handle arch
>>    specific instruction tracking
>>  tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in
>>    powerpc
>>  tools/perf: Add disasm_line__parse to parse raw instruction for
>>    powerpc
>>  tools/perf: Add support to capture and parse raw instruction in
>>    powerpc using dso__data_read_offset utility
>>  tools/perf: Update parameters for reg extract functions to use raw
>>    instruction on powerpc
>>  tools/perf: Add parse function for memory instructions in powerpc
>>  tools/perf: Add support to identify memory instructions of opcode 31
>>    in powerpc
>>  tools/perf: Add some of the arithmetic instructions to support
>>    instruction tracking in powerpc
>>  tools/perf: Add more instructions for instruction tracking
>>  tools/perf: Update instruction tracking for powerpc
>>  tools/perf: Make capstone_init non-static so that it can be used
>>    during symbol disassemble
>>  tools/perf: Use capstone_init and remove open_capstone_handle from
>>    disasm.c
>>  tools/perf: Add support to use libcapstone in powerpc
>>  tools/perf: Set instruction name to be used with insn-stat when using
>>    raw instruction
>> 
>> tools/include/linux/string.h                  |   2 +
>> tools/lib/string.c                            |  13 +
>> tools/perf/arch/arm64/annotate/instructions.c |   3 +-
>> .../arch/loongarch/annotate/instructions.c    |   6 +-
>> .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++++
>> tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
>> tools/perf/arch/s390/annotate/instructions.c  |   5 +-
>> tools/perf/arch/x86/annotate/instructions.c   | 377 ++++++++++++++
>> tools/perf/builtin-annotate.c                 |   4 +-
>> tools/perf/util/annotate-data.c               | 453 +----------------
>> tools/perf/util/annotate-data.h               |  83 ++++
>> tools/perf/util/annotate.c                    |  21 +-
>> tools/perf/util/annotate.h                    |   5 +-
>> tools/perf/util/disasm.c                      | 468 ++++++++++++++++--
>> tools/perf/util/disasm.h                      |  19 +-
>> tools/perf/util/include/dwarf-regs.h          |  11 +
>> tools/perf/util/print_insn.c                  |  15 +-
>> tools/perf/util/print_insn.h                  |   5 +
>> 18 files changed, 1301 insertions(+), 496 deletions(-)
>> 
>> -- 
>> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc
  2024-07-18  8:43 ` [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
@ 2024-07-24 21:00   ` Arnaldo Carvalho de Melo
  2024-07-26 12:35     ` Athira Rajeev
  2024-07-31 13:51   ` kernel test robot
  1 sibling, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-07-24 21:00 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy,
	linux-kernel, linux-perf-users, linuxppc-dev, akanksha, maddy,
	kjain, hbathini, disgoel
On Thu, Jul 18, 2024 at 02:13:57PM +0530, Athira Rajeev wrote:
> Now perf uses the capstone library to disassemble the instructions in
> x86. capstone is used (if available) for perf annotate to speed up.
> Currently it only supports x86 architecture. Patch includes changes to
> enable this in powerpc. For now, only for data type sort keys, this
> method is used and only binary code (raw instruction) is read. This is
> because powerpc approach to understand instructions and reg fields uses
> raw instruction. The "cs_disasm" is currently not enabled. While
> attempting to do cs_disasm, observation is that some of the instructions
> were not identified (ex: extswsli, maddld) and it had to fallback to use
> objdump. Hence enabling "cs_disasm" is added in comment section as a
> TODO for powerpc.
> 
> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Is this building on ppc?
Here, when building on a x86-64 machine with:
ake -C tools/perf EXTRA_CFLAGS=-DREFCNT_CHECKING=1 O=/tmp/build/perf-tools-next/
That I got to by the usual:
make -C tools/perf build-test
I get:
util/disasm.c: In function ‘symbol__disassemble_capstone_powerpc’:
util/disasm.c:1618:34: error: ‘struct dso’ has no member named ‘nsinfo’
 1618 |         nsinfo__mountns_enter(dso->nsinfo, &nsc);
      |                                  ^~
make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/util/disasm.o] Error 1
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
make[2]: *** [Makefile.perf:762: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
make[1]: *** [Makefile.perf:265: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
⬢[acme@toolbox perf-tools-next]$
And then when looking at this changeset I noticed that you added a
function and then only called it if:
 static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
                                        struct annotate_args *args)
 {
@@ -1945,6 +2083,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
                        err = symbol__disassemble_raw(symfs_filename, sym, args);
                        if (err == 0)
                                goto out_remove_tmp;
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+                       err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
+                       if (err == 0)
+                               goto out_remove_tmp;
+#endif
                }
        }
but the symbol__disassemble_capstone_powerpc() is being unconditionally
built, i.e. it is not surrounded by '#ifdef HAVE_LIBCAPSTONE_SUPPORT'
And then there:
dso->nsinfo
It should have been:
⬢[acme@toolbox perf-tools-next]$ git diff
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 2a1c657d6aa66cc2..99081f37c5daba60 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1615,7 +1615,7 @@ static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *s
        if (args->options->objdump_path)
                return -1;
 
-       nsinfo__mountns_enter(dso->nsinfo, &nsc);
+       nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
        fd = open(filename, O_RDONLY);
        nsinfo__mountns_exit(&nsc);
        if (fd < 0)
⬢[acme@toolbox perf-tools-next]$
So it is critical that you try using:
make -C tools/perf build-test
Before asking for review.
Please do that next time.
I'm fixing this up this time so that we can make progress.
Best regards,
- Arnaldo
> ---
>  tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 143 insertions(+)
> 
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index a848e6f5f05a..63681df6482b 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -1585,6 +1585,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
>  	}
>  }
>  
> +static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
> +					struct annotate_args *args)
> +{
> +	struct annotation *notes = symbol__annotation(sym);
> +	struct map *map = args->ms.map;
> +	struct dso *dso = map__dso(map);
> +	struct nscookie nsc;
> +	u64 start = map__rip_2objdump(map, sym->start);
> +	u64 end = map__rip_2objdump(map, sym->end);
> +	u64 len = end - start;
> +	u64 offset;
> +	int i, fd, count;
> +	bool is_64bit = false;
> +	bool needs_cs_close = false;
> +	u8 *buf = NULL;
> +	struct find_file_offset_data data = {
> +		.ip = start,
> +	};
> +	csh handle;
> +	char disasm_buf[512];
> +	struct disasm_line *dl;
> +	u32 *line;
> +	bool disassembler_style = false;
> +
> +	if (args->options->objdump_path)
> +		return -1;
> +
> +	nsinfo__mountns_enter(dso->nsinfo, &nsc);
> +	fd = open(filename, O_RDONLY);
> +	nsinfo__mountns_exit(&nsc);
> +	if (fd < 0)
> +		return -1;
> +
> +	if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
> +			    &is_64bit) == 0)
> +		goto err;
> +
> +	if (!args->options->disassembler_style ||
> +			!strcmp(args->options->disassembler_style, "att"))
> +		disassembler_style = true;
> +
> +	if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
> +		goto err;
> +
> +	needs_cs_close = true;
> +
> +	buf = malloc(len);
> +	if (buf == NULL)
> +		goto err;
> +
> +	count = pread(fd, buf, len, data.offset);
> +	close(fd);
> +	fd = -1;
> +
> +	if ((u64)count != len)
> +		goto err;
> +
> +	line = (u32 *)buf;
> +
> +	/* add the function address and name */
> +	scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
> +		  start, sym->name);
> +
> +	args->offset = -1;
> +	args->line = disasm_buf;
> +	args->line_nr = 0;
> +	args->fileloc = NULL;
> +	args->ms.sym = sym;
> +
> +	dl = disasm_line__new(args);
> +	if (dl == NULL)
> +		goto err;
> +
> +	annotation_line__add(&dl->al, ¬es->src->source);
> +
> +	/*
> +	 * TODO: enable disassm for powerpc
> +	 * count = cs_disasm(handle, buf, len, start, len, &insn);
> +	 *
> +	 * For now, only binary code is saved in disassembled line
> +	 * to be used in "type" and "typeoff" sort keys. Each raw code
> +	 * is 32 bit instruction. So use "len/4" to get the number of
> +	 * entries.
> +	 */
> +	count = len/4;
> +
> +	for (i = 0, offset = 0; i < count; i++) {
> +		args->offset = offset;
> +		sprintf(args->line, "%x", line[i]);
> +
> +		dl = disasm_line__new(args);
> +		if (dl == NULL)
> +			goto err;
> +
> +		annotation_line__add(&dl->al, ¬es->src->source);
> +
> +		offset += 4;
> +	}
> +
> +	/* It failed in the middle */
> +	if (offset != len) {
> +		struct list_head *list = ¬es->src->source;
> +
> +		/* Discard all lines and fallback to objdump */
> +		while (!list_empty(list)) {
> +			dl = list_first_entry(list, struct disasm_line, al.node);
> +
> +			list_del_init(&dl->al.node);
> +			disasm_line__free(dl);
> +		}
> +		count = -1;
> +	}
> +
> +out:
> +	if (needs_cs_close)
> +		cs_close(&handle);
> +	free(buf);
> +	return count < 0 ? count : 0;
> +
> +err:
> +	if (fd >= 0)
> +		close(fd);
> +	if (needs_cs_close) {
> +		struct disasm_line *tmp;
> +
> +		/*
> +		 * It probably failed in the middle of the above loop.
> +		 * Release any resources it might add.
> +		 */
> +		list_for_each_entry_safe(dl, tmp, ¬es->src->source, al.node) {
> +			list_del(&dl->al.node);
> +			free(dl);
> +		}
> +	}
> +	count = -1;
> +	goto out;
> +}
> +
>  static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>  					struct annotate_args *args)
>  {
> @@ -1942,6 +2080,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>  			err = symbol__disassemble_raw(symfs_filename, sym, args);
>  			if (err == 0)
>  				goto out_remove_tmp;
> +#ifdef HAVE_LIBCAPSTONE_SUPPORT
> +			err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
> +			if (err == 0)
> +				goto out_remove_tmp;
> +#endif
>  		}
>  	}
>  
> -- 
> 2.43.0
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc
  2024-07-24 21:00   ` Arnaldo Carvalho de Melo
@ 2024-07-26 12:35     ` Athira Rajeev
  0 siblings, 0 replies; 25+ messages in thread
From: Athira Rajeev @ 2024-07-26 12:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Adrian Hunter, Ian Rogers, Namhyung Kim,
	Segher Boessenkool, Christophe Leroy, LKML, linux-perf-users,
	linuxppc-dev, akanksha, Madhavan Srinivasan, Kajol Jain,
	Hari Bathini, Disha Goel
> On 25 Jul 2024, at 2:30 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, Jul 18, 2024 at 02:13:57PM +0530, Athira Rajeev wrote:
>> Now perf uses the capstone library to disassemble the instructions in
>> x86. capstone is used (if available) for perf annotate to speed up.
>> Currently it only supports x86 architecture. Patch includes changes to
>> enable this in powerpc. For now, only for data type sort keys, this
>> method is used and only binary code (raw instruction) is read. This is
>> because powerpc approach to understand instructions and reg fields uses
>> raw instruction. The "cs_disasm" is currently not enabled. While
>> attempting to do cs_disasm, observation is that some of the instructions
>> were not identified (ex: extswsli, maddld) and it had to fallback to use
>> objdump. Hence enabling "cs_disasm" is added in comment section as a
>> TODO for powerpc.
>> 
>> Reviewed-and-tested-by: Kajol Jain <kjain@linux.ibm.com>
>> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> 
> Is this building on ppc?
> 
Hi Arnaldo
Yes, I did compile with and without capstone devel support also.
But didn’t hit the compilation issue. 
...                             libcapstone: [ OFF ]
  LINK    perf
...                             libcapstone: [ on  ]
  LINK    perf
> Here, when building on a x86-64 machine with:
> 
> ake -C tools/perf EXTRA_CFLAGS=-DREFCNT_CHECKING=1 O=/tmp/build/perf-tools-next/
> 
> That I got to by the usual:
> 
> make -C tools/perf build-test
> 
> I get:
> 
> util/disasm.c: In function ‘symbol__disassemble_capstone_powerpc’:
> util/disasm.c:1618:34: error: ‘struct dso’ has no member named ‘nsinfo’
> 1618 |         nsinfo__mountns_enter(dso->nsinfo, &nsc);
>      |                                  ^~
> make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/util/disasm.o] Error 1
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
> make[2]: *** [Makefile.perf:762: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
> make[1]: *** [Makefile.perf:265: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
> make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> ⬢[acme@toolbox perf-tools-next]$
> 
> And then when looking at this changeset I noticed that you added a
> function and then only called it if:
> 
> static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>                                        struct annotate_args *args)
> {
> @@ -1945,6 +2083,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>                        err = symbol__disassemble_raw(symfs_filename, sym, args);
>                        if (err == 0)
>                                goto out_remove_tmp;
> +#ifdef HAVE_LIBCAPSTONE_SUPPORT
> +                       err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
> +                       if (err == 0)
> +                               goto out_remove_tmp;
> +#endif
>                }
>        }
> 
> but the symbol__disassemble_capstone_powerpc() is being unconditionally
> built, i.e. it is not surrounded by '#ifdef HAVE_LIBCAPSTONE_SUPPORT'
> 
> And then there:
> 
> dso->nsinfo
> 
> It should have been:
> 
> ⬢[acme@toolbox perf-tools-next]$ git diff
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 2a1c657d6aa66cc2..99081f37c5daba60 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -1615,7 +1615,7 @@ static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *s
>        if (args->options->objdump_path)
>                return -1;
> 
> -       nsinfo__mountns_enter(dso->nsinfo, &nsc);
> +       nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
>        fd = open(filename, O_RDONLY);
>        nsinfo__mountns_exit(&nsc);
>        if (fd < 0)
> ⬢[acme@toolbox perf-tools-next]$
> 
> So it is critical that you try using:
> 
> make -C tools/perf build-test
> 
> Before asking for review.
> 
> Please do that next time.
Ok Arnaldo, Sure, I will ensure to follow this every time
Thanks,
Athira
> 
> I'm fixing this up this time so that we can make progress.
> 
> Best regards,
> 
> - Arnaldo
> 
> 
>> ---
>> tools/perf/util/disasm.c | 143 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 143 insertions(+)
>> 
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index a848e6f5f05a..63681df6482b 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -1585,6 +1585,144 @@ static void print_capstone_detail(cs_insn *insn, char *buf, size_t len,
>> }
>> }
>> 
>> +static int symbol__disassemble_capstone_powerpc(char *filename, struct symbol *sym,
>> + struct annotate_args *args)
>> +{
>> + struct annotation *notes = symbol__annotation(sym);
>> + struct map *map = args->ms.map;
>> + struct dso *dso = map__dso(map);
>> + struct nscookie nsc;
>> + u64 start = map__rip_2objdump(map, sym->start);
>> + u64 end = map__rip_2objdump(map, sym->end);
>> + u64 len = end - start;
>> + u64 offset;
>> + int i, fd, count;
>> + bool is_64bit = false;
>> + bool needs_cs_close = false;
>> + u8 *buf = NULL;
>> + struct find_file_offset_data data = {
>> + .ip = start,
>> + };
>> + csh handle;
>> + char disasm_buf[512];
>> + struct disasm_line *dl;
>> + u32 *line;
>> + bool disassembler_style = false;
>> +
>> + if (args->options->objdump_path)
>> + return -1;
>> +
>> + nsinfo__mountns_enter(dso->nsinfo, &nsc);
>> + fd = open(filename, O_RDONLY);
>> + nsinfo__mountns_exit(&nsc);
>> + if (fd < 0)
>> + return -1;
>> +
>> + if (file__read_maps(fd, /*exe=*/true, find_file_offset, &data,
>> +    &is_64bit) == 0)
>> + goto err;
>> +
>> + if (!args->options->disassembler_style ||
>> + !strcmp(args->options->disassembler_style, "att"))
>> + disassembler_style = true;
>> +
>> + if (capstone_init(maps__machine(args->ms.maps), &handle, is_64bit, disassembler_style) < 0)
>> + goto err;
>> +
>> + needs_cs_close = true;
>> +
>> + buf = malloc(len);
>> + if (buf == NULL)
>> + goto err;
>> +
>> + count = pread(fd, buf, len, data.offset);
>> + close(fd);
>> + fd = -1;
>> +
>> + if ((u64)count != len)
>> + goto err;
>> +
>> + line = (u32 *)buf;
>> +
>> + /* add the function address and name */
>> + scnprintf(disasm_buf, sizeof(disasm_buf), "%#"PRIx64" <%s>:",
>> +  start, sym->name);
>> +
>> + args->offset = -1;
>> + args->line = disasm_buf;
>> + args->line_nr = 0;
>> + args->fileloc = NULL;
>> + args->ms.sym = sym;
>> +
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, ¬es->src->source);
>> +
>> + /*
>> + * TODO: enable disassm for powerpc
>> + * count = cs_disasm(handle, buf, len, start, len, &insn);
>> + *
>> + * For now, only binary code is saved in disassembled line
>> + * to be used in "type" and "typeoff" sort keys. Each raw code
>> + * is 32 bit instruction. So use "len/4" to get the number of
>> + * entries.
>> + */
>> + count = len/4;
>> +
>> + for (i = 0, offset = 0; i < count; i++) {
>> + args->offset = offset;
>> + sprintf(args->line, "%x", line[i]);
>> +
>> + dl = disasm_line__new(args);
>> + if (dl == NULL)
>> + goto err;
>> +
>> + annotation_line__add(&dl->al, ¬es->src->source);
>> +
>> + offset += 4;
>> + }
>> +
>> + /* It failed in the middle */
>> + if (offset != len) {
>> + struct list_head *list = ¬es->src->source;
>> +
>> + /* Discard all lines and fallback to objdump */
>> + while (!list_empty(list)) {
>> + dl = list_first_entry(list, struct disasm_line, al.node);
>> +
>> + list_del_init(&dl->al.node);
>> + disasm_line__free(dl);
>> + }
>> + count = -1;
>> + }
>> +
>> +out:
>> + if (needs_cs_close)
>> + cs_close(&handle);
>> + free(buf);
>> + return count < 0 ? count : 0;
>> +
>> +err:
>> + if (fd >= 0)
>> + close(fd);
>> + if (needs_cs_close) {
>> + struct disasm_line *tmp;
>> +
>> + /*
>> + * It probably failed in the middle of the above loop.
>> + * Release any resources it might add.
>> + */
>> + list_for_each_entry_safe(dl, tmp, ¬es->src->source, al.node) {
>> + list_del(&dl->al.node);
>> + free(dl);
>> + }
>> + }
>> + count = -1;
>> + goto out;
>> +}
>> +
>> static int symbol__disassemble_capstone(char *filename, struct symbol *sym,
>> struct annotate_args *args)
>> {
>> @@ -1942,6 +2080,11 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>> err = symbol__disassemble_raw(symfs_filename, sym, args);
>> if (err == 0)
>> goto out_remove_tmp;
>> +#ifdef HAVE_LIBCAPSTONE_SUPPORT
>> + err = symbol__disassemble_capstone_powerpc(symfs_filename, sym, args);
>> + if (err == 0)
>> + goto out_remove_tmp;
>> +#endif
>> }
>> }
>> 
>> -- 
>> 2.43.0
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc
  2024-07-18  8:43 ` [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
  2024-07-24 21:00   ` Arnaldo Carvalho de Melo
@ 2024-07-31 13:51   ` kernel test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-07-31 13:51 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: oe-lkp, lkp, Kajol Jain, Namhyung Kim, linux-perf-users,
	linux-kernel, acme, jolsa, adrian.hunter, irogers, segher,
	christophe.leroy, linuxppc-dev, akanksha, maddy, atrajeev,
	hbathini, disgoel, oliver.sang
Hello,
kernel test robot noticed "perf-stat-tests.perf.make.fail" on:
commit: bcda270a3f755557316ac08c0b53c2fa23731ecc ("[PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc")
url: https://github.com/intel-lab-lkp/linux/commits/Athira-Rajeev/tools-perf-Move-the-data-structures-related-to-register-type-to-header-file/20240718-170432
base: https://git.kernel.org/cgit/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link: https://lore.kernel.org/all/20240718084358.72242-15-atrajeev@linux.vnet.ibm.com/
patch subject: [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc
in testcase: perf-stat-tests
version: 
with following parameters:
compiler: gcc-13
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407312114.ee900833-oliver.sang@intel.com
Makefile.config:699: Warning: Disabled BPF skeletons as clang (clang) is missing
  PERF_VERSION = 6.10.0-rc3
util/disasm.c: In function ▒~@~Xsymbol__disassemble_capstone_powerpc▒~@~Y:
util/disasm.c:1615:34: error: ▒~@~Xstruct dso▒~@~Y has no member named ▒~@~Xnsinfo▒~@~Y
 1615 |         nsinfo__mountns_enter(dso->nsinfo, &nsc);
      |                                  ^~
make[4]: *** [/usr/src/perf_selftests-x86_64-rhel-8.3-bpf-bcda270a3f755557316ac08c0b53c2fa23731ecc/tools/build/Makefile.build:105: util/disasm.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [/usr/src/perf_selftests-x86_64-rhel-8.3-bpf-bcda270a3f755557316ac08c0b53c2fa23731ecc/tools/build/Makefile.build:158: util] Error 2
make[2]: *** [Makefile.perf:762: perf-util-in.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile.perf:265: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240731/202407312114.ee900833-oliver.sang@intel.com
-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-07-31 13:51 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-18  8:43 [PATCH V8 00/15] Add data type profiling support for powerpc Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 01/15] tools/perf: Move the data structures related to register type to header file Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 02/15] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 03/15] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc Athira Rajeev
2024-07-23 19:06   ` Arnaldo Carvalho de Melo
2024-07-24  5:38     ` Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 04/15] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc Athira Rajeev
2024-07-23 19:14   ` Arnaldo Carvalho de Melo
2024-07-24  5:38     ` Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 05/15] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 06/15] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 07/15] tools/perf: Add parse function for memory instructions in powerpc Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 08/15] tools/perf: Add support to identify memory instructions of opcode 31 " Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 09/15] tools/perf: Add some of the arithmetic instructions to support instruction tracking " Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 10/15] tools/perf: Add more instructions for instruction tracking Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 11/15] tools/perf: Update instruction tracking for powerpc Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 12/15] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 13/15] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c Athira Rajeev
2024-07-18  8:43 ` [PATCH V8 14/15] tools/perf: Add support to use libcapstone in powerpc Athira Rajeev
2024-07-24 21:00   ` Arnaldo Carvalho de Melo
2024-07-26 12:35     ` Athira Rajeev
2024-07-31 13:51   ` kernel test robot
2024-07-18  8:43 ` [PATCH V8 15/15] tools/perf: Set instruction name to be used with insn-stat when using raw instruction Athira Rajeev
2024-07-23 20:07 ` [PATCH V8 00/15] Add data type profiling support for powerpc Arnaldo Carvalho de Melo
2024-07-24  7:59   ` Athira Rajeev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).