Date: Mon, 23 May 2022 09:43:10 -0300
From: Arnaldo Carvalho de Melo
To: Jiri Olsa
Cc: Leo Yan, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
    Namhyung Kim, Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li,
    Li Huafei, German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/11] perf c2c: Support display for Arm64
References: <20220518055729.1869566-1-leo.yan@linaro.org>
X-Mailing-List: linux-perf-users@vger.kernel.org

On Mon, May 23, 2022 at 10:43:47AM +0200, Jiri Olsa wrote:
> On Wed, May 18, 2022 at 01:57:18PM +0800, Leo Yan wrote:
> > Arm64 Neoverse CPUs support data source in Arm SPE trace, which allows
> > us to detect cache line contention and transfers.
> >
> > Unlike the x86 architecture, Arm SPE trace data cannot provide the
> > 'HITM' snooping flag. Ali Saidi has a patch set v9 "perf: arm-spe:
> > Decode SPE source and use for perf c2c" [1] which introduces the 'peer'
> > flag and synthesizes memory samples with this flag.
> >
> > Based on patch set [1], this patch set finishes the second half of the
> > work: it consumes the 'peer' flag in the perf c2c tool and adds an
> > extra 'peer' display mode.

Ok, I'll look at the base patch set...

> > Patches 01, 02 and 03 support 'N/A' metrics for store operations.
> >
> > Patches 04 and 05 add statistics and dimensions for memory samples
> > with the peer flag.
> >
> > Patches 06, 07 and 08 are refactoring: they give the code more general
> > names, making it easier to extend the display modes without being
> > strictly bound to HITM tags.
> >
> > Patches 09, 10 and 11 add the 'peer' display mode, update the
> > documentation and make 'peer' the default display mode on Arm64.
> >
> > This patch set has been verified for both x86 and Arm64 memory samples.
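A hypothetical sketch, not code from this series, of how a display mode could select the "costly snoop" count that cache lines are sorted on. The mode strings follow the existing `perf c2c report -d/--display` values (`tot`, `rmt`, `lcl`) plus the new `peer`. Making `peer` the default on Arm64 reflects patch 11. Every type and function name here (`line_counts`, `parse_display`, `default_display`, `costly_snoops`) is illustrative, and the architecture strings it checks ("arm64", "aarch64") are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical display modes; names are illustrative only. */
enum display_mode {
	DISPLAY_TOT,     /* local + remote HITM */
	DISPLAY_RMT,     /* remote HITM only */
	DISPLAY_LCL,     /* local HITM only */
	DISPLAY_PEER,    /* peer snoops (Arm SPE has no HITM) */
	DISPLAY_INVALID,
};

/* Hypothetical per-cache-line load counters a mode can sort on. */
struct line_counts {
	unsigned int lcl_hitm;
	unsigned int rmt_hitm;
	unsigned int ld_peer;
};

/* Map a --display style string to a mode. */
enum display_mode parse_display(const char *s)
{
	if (strcmp(s, "tot") == 0)
		return DISPLAY_TOT;
	if (strcmp(s, "rmt") == 0)
		return DISPLAY_RMT;
	if (strcmp(s, "lcl") == 0)
		return DISPLAY_LCL;
	if (strcmp(s, "peer") == 0)
		return DISPLAY_PEER;
	return DISPLAY_INVALID;
}

/* Assumed arch strings: default to 'peer' on Arm64, 'tot' elsewhere. */
enum display_mode default_display(const char *arch)
{
	if (strcmp(arch, "arm64") == 0 || strcmp(arch, "aarch64") == 0)
		return DISPLAY_PEER;
	return DISPLAY_TOT;
}

/* The count the selected mode treats as "costly" for this line. */
unsigned int costly_snoops(const struct line_counts *c, enum display_mode mode)
{
	switch (mode) {
	case DISPLAY_TOT:
		return c->lcl_hitm + c->rmt_hitm;
	case DISPLAY_RMT:
		return c->rmt_hitm;
	case DISPLAY_LCL:
		return c->lcl_hitm;
	case DISPLAY_PEER:
		return c->ld_peer;
	case DISPLAY_INVALID:
		break;
	}
	assert(!"invalid display mode");
	return 0;
}
```

With the counts from the tables quoted below, x86 cache line 0 (LclHitm 252) yields 252 under `tot`, while the single Arm line (Peer 99, no HITM) yields 99 only in `peer` mode. That is why a HITM-based default would show nothing useful on Arm64.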
> >
> > The display result with x86 memory samples:
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> > # Index Address Node PA cnt Hitm Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> > # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> > #
> > 0 0x55c8971f0080 0 1967 66.14% 252 252 0 0 6044 3550 2494 2024 470 0 528 2672 78 20 252 0 0 0 0
> > 1 0x55c8971f00c0 0 1 33.86% 129 129 0 0 914 914 0 0 0 0 272 374 52 87 129 0 0 0 0
> >
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > # ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
> > # Num RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
> > # ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................. ....................... ....
> > #
> > -------------------------------------------------------------------------------
> > 0 0 252 0 2024 470 0 0x55c8971f0080
> > -------------------------------------------------------------------------------
> > 0.00% 12.30% 0.00% 0.00% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e9 0 1313 863 0 1222 3 [.] 0x00000000000013e9 false_sharing.exe false_sharing.exe[13e9] 0
> > 0.00% 0.79% 0.00% 90.51% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e2 0 1800 878 0 3029 3 [.] 0x00000000000013e2 false_sharing.exe false_sharing.exe[13e2] 0
> > 0.00% 0.00% 0.00% 9.49% 100.00% 0.00% 0x0 0 1 0x55c8971ed3f4 0 0 0 0 662 3 [.] 0x00000000000013f4 false_sharing.exe false_sharing.exe[13f4] 0
> > 0.00% 86.90% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed447 0 141 103 0 1131 2 [.] 0x0000000000001447 false_sharing.exe false_sharing.exe[1447] 0
> >
> > -------------------------------------------------------------------------------
> > 1 0 129 0 0 0 0 0x55c8971f00c0
> > -------------------------------------------------------------------------------
> > 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed455 0 88 94 0 914 2 [.] 0x0000000000001455 false_sharing.exe false_sharing.exe[1455] 0
> >
> >
> > The display result with Arm SPE memory samples:
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > # ----------- Cacheline ---------- Snoop ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> > # Index Address Node PA cnt Peer Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> > # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> > #
> > 0 0xaaaac17d6000 N/A 0 100.00% 0 0 0 99 18851 18851 0 0 0 0 0 18752 0 99 0 0 0 0 0
> >
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > # ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
> > # Num RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
> > # ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................ ............... ....
> > #
> > -------------------------------------------------------------------------------
> > 0 0 0 99 0 0 0 0xaaaac17d6000
> > -------------------------------------------------------------------------------
> > 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0
> > 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0
> >
> > [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> >
> > Changes from v2:
> > * Updated patch 04 to account the PEER flag in both the cache level
> >   metrics and ld_peer;
> > * Updated the documentation for metric 'rmt_hit', which accounts for
> >   all remote accesses (including remote DRAM and any upward caches).
>
> LGTM
>
> Acked-by: Jiri Olsa
>
> thanks,
> jirka
>
> >
> > Changes from v1:
> > * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
> >   operations, so they align with patch set [1] for store samples.
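A second hypothetical sketch of the accounting described in patches 01-05 and the v2 changelog. It assumes simplified sample and counter types; `sample`, `c2c_counts`, `account_sample` and `stores_consistent` are illustrative names, not perf's own statistics code. Stores with no reported memory level go into an `st_na` bucket. A load carrying the peer flag is counted under the cache level that served it and also in `ld_peer`. That double accounting is visible in the Arm SPE cache-line table above: the 99 peer loads also appear in the LLC LclHit column, and Loads = 18752 + 99 = 18851.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified memory levels; LVL_NA means none reported. */
enum mem_lvl { LVL_NA, LVL_L1, LVL_L2, LVL_LLC, LVL_DRAM };

/* Hypothetical decoded memory sample. */
struct sample {
	bool is_store;
	enum mem_lvl lvl;
	bool hit;  /* stores only: did the store hit in L1? */
	bool peer; /* loads only: 'peer' snoop flag synthesized from Arm SPE */
};

/* Hypothetical per-cache-line counters. */
struct c2c_counts {
	unsigned int loads, stores;
	unsigned int st_l1hit, st_l1miss, st_na;
	unsigned int ld_l1hit, ld_l2hit, ld_llchit, ld_dram;
	unsigned int ld_peer;
};

void account_sample(struct c2c_counts *c, const struct sample *s)
{
	assert(c && s);

	if (s->is_store) {
		c->stores++;
		if (s->lvl == LVL_NA)
			c->st_na++;     /* no memory level available */
		else if (s->hit)
			c->st_l1hit++;
		else
			c->st_l1miss++;
		return;
	}

	c->loads++;
	/* Count the load at the level that served it... */
	switch (s->lvl) {
	case LVL_L1:
		c->ld_l1hit++;
		break;
	case LVL_L2:
		c->ld_l2hit++;
		break;
	case LVL_LLC:
		c->ld_llchit++;
		break;
	case LVL_DRAM:
		c->ld_dram++;
		break;
	case LVL_NA:
		break;
	}
	/* ...and additionally (not instead) as a peer snoop. */
	if (s->peer)
		c->ld_peer++;
}

/* Each store lands in exactly one store bucket (x86 line 0: 2494 = 2024 + 470 + 0). */
bool stores_consistent(const struct c2c_counts *c)
{
	return c->stores == c->st_l1hit + c->st_l1miss + c->st_na;
}
```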
> >
> >
> > Leo Yan (11):
> >   perf mem: Add stats for store operation with no available memory level
> >   perf c2c: Add dimensions for 'N/A' metrics of store operation
> >   perf c2c: Update documentation for store metric 'N/A'
> >   perf mem: Add statistics for peer snooping
> >   perf c2c: Add dimensions for peer load operations
> >   perf c2c: Use explicit names for display macros
> >   perf c2c: Rename dimension from 'percent_hitm' to
> >     'percent_costly_snoop'
> >   perf c2c: Refactor node header
> >   perf c2c: Sort on peer snooping for load operations
> >   perf c2c: Update documentation for new display option 'peer'
> >   perf c2c: Use 'peer' as default display for Arm64
> >
> >  tools/perf/Documentation/perf-c2c.txt |  34 ++-
> >  tools/perf/builtin-c2c.c              | 357 ++++++++++++++++++++------
> >  tools/perf/util/mem-events.c          |  25 +-
> >  tools/perf/util/mem-events.h          |   2 +
> >  4 files changed, 331 insertions(+), 87 deletions(-)
> >
> > --
> > 2.25.1

--

- Arnaldo
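A hypothetical sketch of the percentage columns; `costly_snoop_share` and `format_share` are illustrative names. In the quoted cache-line tables, the first percentage column is consistent with each line's costly snoops divided by the sum over the listed lines. For x86 that gives 252 / (252 + 129) = 66.14% and 129 / 381 = 33.86%. For the single Arm line it gives 99 / 99 = 100.00%. This is presumably the quantity that patch 07 ('percent_hitm' renamed to 'percent_costly_snoop') generalizes from HITM to peer snoops.

```c
#include <assert.h>
#include <stdio.h>

/* Share (in percent) of all costly snoops that fall on one cache line. */
double costly_snoop_share(unsigned int line, unsigned int total)
{
	assert(total > 0);
	return 100.0 * line / total;
}

/* Format like the tables ("66.14%"); buf needs 8 bytes for up to "100.00%". */
void format_share(char *buf, size_t len, unsigned int line, unsigned int total)
{
	snprintf(buf, len, "%.2f%%", costly_snoop_share(line, total));
}
```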