From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1494B13CA99; Tue, 16 Jul 2024 14:19:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721139558; cv=none; b=o/w1x7jE4/I9eHe8cvUMLB61gaymPoiMOhqMxJH1JlLtgEThqv1jgeJd5s8ZMfMQl5nSEb7KneaxSpxWiRKBVM9YbZ+ozdFM+R5L3YL+tTL+vDAhl3o/CGcDEttd3NXZ8VpYGYLBnJC2AaMU8WvGCSE/0Ny0nRcb8AwvZE2Mwi0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721139558; c=relaxed/simple; bh=BFrTvLpHcAteLe2pHcDadeGS7PKm36+K6zclLIqIB9A=; h=Message-ID:Date:Subject:To:Cc:References:From:In-Reply-To: Content-Type:MIME-Version; b=lDPbacY69QAPi/NCcfUHc7wcSULlY26CI/CyBru7sGWlPOCu2lo5kUaDdS+8j9FEmqN1/cRurJF0mH73cl6P7fKMm8VLz+iJXbC4Jc/Q2gfPPyrNYXFc7/vNlPaUJWnUsCiMCj6uc0pFv1ar4fNmrZKMBRV+dPPlGyecvSwgEQk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=DFLsMH9I; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="DFLsMH9I" Received: from pps.filterd (m0353724.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 46GDuUGk026025; Tue, 16 Jul 2024 14:18:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h= message-id:date:subject:to:cc:references:from:in-reply-to :content-type:content-transfer-encoding:mime-version; s=pp1; bh= h1v4LZQZvqZnWXqhrnfgI2rlC5cXoPHALSq1t/1GHP0=; b=DFLsMH9IwXZjKWYl WgWuJn7VWgpLvL5cvWzB1/AcxC0E0y/LVMV7qnc0uUOLelMeRiDojFB3ekxNtWr+ x7MIKkNfyTn7FkjGrwefTm87J6Jjclih9sZ5KhleCp/InMUeXqRH+bcZtc15FgHj q5PRfCbdJKzXg8i+1VdWZuU4VGmI2nIKcDRlbJxKCdlUiepFkYOC22+TF/enq1Hn mju+RuP2LQufE75tgsEIONIBm1prlJAxvocdlvVgN951HPfEpSnMAoK1Y8dbky/y dzqWGqE2r52yXX5ZTjsByjVg5z0IaImwgCtFxqG4nQ7pWPy+KM9bsp+J5rT+SW4E 6WBgxw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 40dr10gc71-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 16 Jul 2024 14:18:50 +0000 (GMT) Received: from m0353724.ppops.net (m0353724.ppops.net [127.0.0.1]) by pps.reinject (8.18.0.8/8.18.0.8) with ESMTP id 46GEIn9W027797; Tue, 16 Jul 2024 14:18:49 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 40dr10gc6w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 16 Jul 2024 14:18:49 +0000 (GMT) Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 46GBFU0q023036; Tue, 16 Jul 2024 14:18:48 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 40c64m43m8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 16 Jul 2024 14:18:48 +0000 Received: from smtpav02.fra02v.mail.ibm.com (smtpav02.fra02v.mail.ibm.com [10.20.54.101]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 46GEIhqG32768370 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 16 Jul 2024 14:18:45 GMT Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E4A9E20070; Tue, 16 Jul 2024 14:18:42 +0000 (GMT) Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1AD6420040; Tue, 16 Jul 2024 14:18:40 +0000 (GMT) Received: from [9.171.28.24] (unknown [9.171.28.24]) by smtpav02.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 16 Jul 2024 14:18:39 +0000 (GMT) Message-ID: <09f03150-bc98-446a-aa67-680b74bc51a2@linux.ibm.com> Date: Tue, 16 Jul 2024 19:48:39 +0530 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH V7 00/18] Add data type profiling support for powerpc To: Athira Rajeev , acme@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, irogers@google.com, namhyung@kernel.org, segher@kernel.crashing.org, christophe.leroy@csgroup.eu Cc: linux-kernel@vger.kernel.org, akanksha@linux.ibm.com, linux-perf-users@vger.kernel.org, maddy@linux.ibm.com, disgoel@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org References: <20240713165529.59298-1-atrajeev@linux.vnet.ibm.com> Content-Language: en-US From: kajoljain In-Reply-To: <20240713165529.59298-1-atrajeev@linux.vnet.ibm.com> Content-Type: text/plain; charset=UTF-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: KJn1ICs6rkHRo3xyFbAkZkPFJ1SfJM8H X-Proofpoint-ORIG-GUID: RCho9bFruxI5wJ7SdQ-Gchb2EKrhsVyq Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16 definitions=2024-07-15_19,2024-07-16_02,2024-05-17_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 malwarescore=0 phishscore=0 impostorscore=0 mlxscore=0 bulkscore=0 adultscore=0 clxscore=1015 suspectscore=0 mlxlogscore=999 priorityscore=1501 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2406140001 definitions=main-2407160103 Patchset looks fine to me. Tested-by: Kajol Jain Reviewed-by: Kajol Jain Thanks, Kajol Jain On 7/13/24 22:25, Athira Rajeev wrote: > The patchset from Namhyung added support for data type profiling > in perf tool. This enabled support to associate PMU samples to data > types they refer using DWARF debug information. With the upstream > perf, currently it possible to run perf report or perf annotate to > view the data type information on x86. > > Initial patchset posted here had changes need to enable data type > profiling support for powerpc. > > https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/ > > Main change were: > 1. powerpc instruction nmemonic table to associate load/store > instructions with move_ops which is use to identify if instruction > is a memory access one. > 2. To get register number and access offset from the given > instruction, code uses fields from "struct arch" -> objump. > Added entry for powerpc here. > 3. A get_arch_regnum to return register number from the > register name string. > > But the apporach used in the initial patchset used parsing of > disassembled code which the current perf tool implementation does. > > Example: lwz r10,0(r9) > > This line "lwz r10,0(r9)" is parsed to extract instruction name, > registers names and offset. Also to find whether there is a memory > reference in the operands, "memory_ref_char" field of objdump is used. > For x86, "(" is used as memory_ref_char to tackle instructions of the > form "mov (%rax), %rcx". > > In case of powerpc, not all instructions using "(" are the only memory > instructions. Example, above instruction can also be of extended form (X > form) "lwzx r10,0,r19". Inorder to easy identify the instruction category > and extract the source/target registers, second patchset added support to use > raw instruction. With raw instruction, macros are added to extract opcode > and register fields. > Link to second patchset: > https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/ > > Example representation using --show-raw-insn in objdump gives result: > > 38 01 81 e8 ld r4,312(r1) > > Here "38 01 81 e8" is the raw instruction representation. In powerpc, > this translates to instruction form: "ld RT,DS(RA)" and binary code > as: > _____________________________________ > | 58 | RT | RA | DS | | > ------------------------------------- > 0 6 11 16 30 31 > > Second patchset used "objdump" again to read the raw instruction. > But since there is no need to disassemble and binary code can be read > directly from the DSO, third patchset (ie this patchset) uses below > apporach. The apporach preferred in powerpc to parse sample for data > type profiling in V3 patchset is: > - Read directly from DSO using dso__data_read_offset > - If that fails for any case, fallback to using libcapstone > - If libcapstone is not supported, approach will use objdump > > Patchset adds support to pick the opcode and reg fields from this > raw/binary instruction code. This approach came in from review comment > by Segher Boessenkool and Christophe for the initial patchset. > > Apart from that, instruction tracking is enabled for powerpc and > support function is added to find variables defined as registers > Example, in powerpc, below two registers are > defined to represent variable: > 1. r13: represents local_paca > register struct paca_struct *local_paca asm("r13"); > > 2. r1: represents stack_pointer > register void *__stack_pointer asm("r1"); > > These are handled in this patchset. > > - Patch 1 is to rearrange register state type structures to header file > so that it can referred from other arch specific files > - Patch 2 is to make instruction tracking as a callback to"struct arch" > so that it can be implemented by other archs easily and defined in arch > specific files > - Patch 3 is to handle state type regs array size for x86 and powerpc > - Patch 4 adds support to capture and parse raw instruction in powerpc > using dso__data_read_offset utility > - Patch 4 also adds logic to support using objdump when doing default "perf > report" or "perf annotate" since it that needs disassembled instruction. > - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc > - Patch 6 update parameters for reg extract functions to use raw > instruction on powerpc > - Patch 7 updates ins__find to carry raw_insn and also adds parse > callback for memory instructions for powerpc > - Patch 8 add support to identify memory instructions of opcode 31 in > powerpc > - Patch 9 adds more instructions to support instruction tracking in powerpc > - Patch 10 and 11 handles instruction tracking for powerpc. > - Patch 12, 13 and 14 add support to use libcapstone in powerpc > - Patch 15 and patch 16 handles support to find global register variables > - PAtch 17 updates data type compare functions data_type_cmp and > sort__typeoff_sort to include var_name along with type_name in > comparison. > - Patch 18 handles insn-stat option for perf annotate > > Note: > - There are remaining unknowns (25%) as seen in annotate Instruction stats > below. > - This patchset is not tested on powerpc32. In next step of enhancements > along with handling remaining unknowns, plan to cover powerpc32 changes > based on how testing goes. > > With the current patchset: > > ./perf record -a -e mem-loads sleep 1 > ./perf report -s type,typeoff --hierarchy --group --stdio > ./perf annotate --data-type --insn-stat > > perf annotate logs: > ================== > > > Annotate Instruction stats > total 609, ok 446 (73.2%), bad 163 (26.8%) > > Name/opcode : Good Bad > ----------------------------------------------------------- > 58 : 323 80 > 32 : 49 43 > 34 : 33 11 > OP_31_XOP_LDX : 8 20 > 40 : 23 0 > OP_31_XOP_LWARX : 5 1 > OP_31_XOP_LWZX : 2 3 > OP_31_XOP_LDARX : 3 0 > 33 : 0 2 > OP_31_XOP_LBZX : 0 1 > OP_31_XOP_LWAX : 0 1 > OP_31_XOP_LHZX : 0 1 > > perf report logs: > ================= > > Total Lost Samples: 0 > > Samples: 1K of event 'mem-loads' > Event count (approx.): 937238 > > Overhead Data Type Data Type Offset > ........ ......... ................ > 48.60% (unknown) (unknown) +0 (no field) > 11.42% long unsigned int long unsigned int +0 (current_stack_pointer) > 4.68% struct paca_struct struct paca_struct +2312 (__current) > 4.57% struct paca_struct struct paca_struct +2354 (irq_soft_mask) > 2.69% struct paca_struct struct paca_struct +2808 (canary) > 2.68% struct paca_struct struct paca_struct +8 (paca_index) > 2.24% struct paca_struct struct paca_struct +48 (data_offset) > 1.43% long unsigned int long unsigned int +0 (no field) > 1.41% struct vm_fault struct vm_fault +0 (vma) > 1.29% struct task_struct struct task_struct +276 (flags) > 1.03% struct pt_regs struct pt_regs +264 (user_regs.msr) > 0.90% struct security_hook_list struct security_hook_list +0 (list.next) > 0.76% struct irq_desc struct irq_desc +304 (irq_data.chip) > 0.76% struct rq struct rq +2856 (cpu) > 0.72% long long unsigned int long long unsigned int +0 (no field) > > Thanks > Athira Rajeev > > Changelog: > From v6 -> v7: > - Addressed review comments from Namhyung > Changed format string space to %-20s while printing > instruction stats in patch 18. > Use cmp_null in patch 17 while comparing var_name to > properly sort with correct order. > > From v5 -> v6: > - Addressed review comments from Namhyung > Conditionally define TYPE_STATE_MAX_REGS based on arch. > Added macro for defining width of the raw codes and spaces > in disasm_line__parse_powerpc. > Call disasm_line__parse from disasm_line__parse_powerpc > for generic code. > Renamed symbol__disassemble_dso to symbol__disassemble_raw. > Fixed find_data_type_global_reg to correclty free var_types > and change indent level. > Fixed data_type_cmp and sort__typeoff_sort to include var_name > in comparing data type entries. > > From v4 -> v5: > - Addressed review comments from Namhyung > Handle max number of type state regs as 16 for x86 and 32 for > powerpc. > Added generic support for objdump patch first and DSO read > optimisation next > combined patch 3 and patch 4 in patchseries V4 to one patch > Changed reference for "raw_insn" to use "u32" > Splitted "parse" callback patch changes and "ins__find" patch > changes into two > Instead of making weak function, added get_powerpc_regs to > extract register and offset fields for powerpc > - Addressed complation fail when "dwarf.h" is not present ie > elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT > when including functions that use Dwarf references. Also > conditionally include some of the header files. > > From v3->v4: > - Addressed review comments from Ian by using capston_init from > "util/print_insn.c" instead of "open_capston_handle". > - Addressed review comment from Namhyung by moving "opcode" > field from "struct ins" to "struct disasm_line" > > From v2->v3: > - Addressed review comments from Christophe and Namhyung for V2 > - Changed the apporach in powerpc to parse sample for data > type profiling as: > Read directly from DSO using dso__data_read_offset > If that fails for any case, fallback to using libcapstone > If libcapstone is not supported, approach will use objdump > - Include instructions with opcode as 31 and correctly categorize > them as memory or arithmetic instructions. > - Include more instructions for instruction tracking in powerpc > > From v1->v2: > - Addressed suggestion from Christophe Leroy and Segher Boessenkool > to use the binary code (raw insn) to fetch opcode, register and > offset fields. > - Added support for instruction tracking in powerpc > - Find the register defined variables (r13 and r1 which points to > local_paca and current_stack_pointer in powerpc) > > Athira Rajeev (18): > tools/perf: Move the data structures related to register type to > header file > tools/perf: Add "update_insn_state" callback function to handle arch > specific instruction tracking > tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in > powerpc > tools/perf: Add disasm_line__parse to parse raw instruction for > powerpc > tools/perf: Add support to capture and parse raw instruction in > powerpc using dso__data_read_offset utility > tools/perf: Update parameters for reg extract functions to use raw > instruction on powerpc > tools/perf: Add parse function for memory instructions in powerpc > tools/perf: Add support to identify memory instructions of opcode 31 > in powerpc > tools/perf: Add some of the arithmetic instructions to support > instruction tracking in powerpc > tools/perf: Add more instructions for instruction tracking > tools/perf: Update instruction tracking for powerpc > tools/perf: Make capstone_init non-static so that it can be used > during symbol disassemble > tools/perf: Use capstone_init and remove open_capstone_handle from > disasm.c > tools/perf: Add support to use libcapstone in powerpc > tools/perf: Add support to find global register variables using > find_data_type_global_reg > tools/perf: Add support for global_die to capture name of variable in > case of register defined variable > tools/perf: Update data_type_cmp and sort__typeoff_sort function to > include var_name in comparison > tools/perf: Set instruction name to be used with insn-stat when using > raw instruction > > tools/include/linux/string.h | 2 + > tools/lib/string.c | 13 + > tools/perf/arch/arm64/annotate/instructions.c | 3 +- > .../arch/loongarch/annotate/instructions.c | 6 +- > .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++ > tools/perf/arch/powerpc/util/dwarf-regs.c | 53 ++ > tools/perf/arch/s390/annotate/instructions.c | 5 +- > tools/perf/arch/x86/annotate/instructions.c | 377 ++++++++++++ > tools/perf/builtin-annotate.c | 4 +- > tools/perf/util/annotate-data.c | 544 ++++-------------- > tools/perf/util/annotate-data.h | 83 +++ > tools/perf/util/annotate.c | 29 +- > tools/perf/util/annotate.h | 6 +- > tools/perf/util/disasm.c | 468 +++++++++++++-- > tools/perf/util/disasm.h | 19 +- > tools/perf/util/dwarf-aux.c | 1 + > tools/perf/util/dwarf-aux.h | 1 + > tools/perf/util/include/dwarf-regs.h | 12 + > tools/perf/util/print_insn.c | 15 +- > tools/perf/util/print_insn.h | 5 + > tools/perf/util/sort.c | 25 +- > tools/perf/util/sort.h | 1 + > 22 files changed, 1421 insertions(+), 505 deletions(-) >