From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Jiri Olsa <olsajiri@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Namhyung Kim <namhyung@kernel.org>, Like Xu <likexu@tencent.com>,
Alyssa Ross <hi@alyssa.is>, Ian Rogers <irogers@google.com>,
Kajol Jain <kjain@linux.ibm.com>,
Adam Li <adamli@amperemail.onmicrosoft.com>,
Li Huafei <lihuafei1@huawei.com>,
German Gomez <german.gomez@arm.com>,
James Clark <james.clark@arm.com>,
Kan Liang <kan.liang@linux.intel.com>,
Ali Saidi <alisaidi@amazon.com>, Joe Mario <jmario@redhat.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/11] perf c2c: Support display for Arm64
Date: Mon, 23 May 2022 09:43:10 -0300 [thread overview]
Message-ID: <YouBXricVunh1NhJ@kernel.org> (raw)
In-Reply-To: <YotJQwZVfBtrEyuI@krava>
Em Mon, May 23, 2022 at 10:43:47AM +0200, Jiri Olsa escreveu:
> On Wed, May 18, 2022 at 01:57:18PM +0800, Leo Yan wrote:
> > Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> > us to detect cache line contention and transfers.
> >
> > Unlike x86 architecture, Arm SPE trace data cannot provide 'HITM'
> > snooping flag, Ali Said has a patch set v9 "perf: arm-spe: Decode SPE
> > source and use for perf c2c" [1] which introduces 'peer' flag and
> > synthesizes memory samples with this flag.
> >
> > Based on patch set [1], this patch set is to finish the second half work
> > to consume the 'peer' flag in perf c2c tool, it adds an extra display
> > 'peer' mode.
Ok, I'll look at the base patch set...
> > Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.
> >
> > Patches 04 and 05 adds statistics and dimensions for memory samples with
> > peer flag.
> >
> > Patches 06, 07, 08 are for refactoring, it refines the code with more
> > general naming so this can allow us to easier to extend display modes
> > but not strictly bound to HITM tags.
> >
> > Patches 09, 10 and 11 are to extend display 'peer' mode, it also updates
> > the document and also changes to use 'peer' mode as default mode on
> > Arm64 arches.
> >
> > This patch set has been verified for both x86 and Arm64 memory samples.
> >
> > The display result with x86 memory samples:
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> > # Index Address Node PA cnt Hitm Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> > # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> > #
> > 0 0x55c8971f0080 0 1967 66.14% 252 252 0 0 6044 3550 2494 2024 470 0 528 2672 78 20 252 0 0 0 0
> > 1 0x55c8971f00c0 0 1 33.86% 129 129 0 0 914 914 0 0 0 0 272 374 52 87 129 0 0 0 0
> >
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > # ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
> > # Num RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
> > # ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................. ....................... ....
> > #
> > -------------------------------------------------------------------------------
> > 0 0 252 0 2024 470 0 0x55c8971f0080
> > -------------------------------------------------------------------------------
> > 0.00% 12.30% 0.00% 0.00% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e9 0 1313 863 0 1222 3 [.] 0x00000000000013e9 false_sharing.exe false_sharing.exe[13e9] 0
> > 0.00% 0.79% 0.00% 90.51% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e2 0 1800 878 0 3029 3 [.] 0x00000000000013e2 false_sharing.exe false_sharing.exe[13e2] 0
> > 0.00% 0.00% 0.00% 9.49% 100.00% 0.00% 0x0 0 1 0x55c8971ed3f4 0 0 0 0 662 3 [.] 0x00000000000013f4 false_sharing.exe false_sharing.exe[13f4] 0
> > 0.00% 86.90% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed447 0 141 103 0 1131 2 [.] 0x0000000000001447 false_sharing.exe false_sharing.exe[1447] 0
> >
> > -------------------------------------------------------------------------------
> > 1 0 129 0 0 0 0 0x55c8971f00c0
> > -------------------------------------------------------------------------------
> > 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed455 0 88 94 0 914 2 [.] 0x0000000000001455 false_sharing.exe false_sharing.exe[1455] 0
> >
> >
> > The display result with Arm SPE memory samples:
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > # ----------- Cacheline ---------- Snoop ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> > # Index Address Node PA cnt Peer Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> > # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> > #
> > 0 0xaaaac17d6000 N/A 0 100.00% 0 0 0 99 18851 18851 0 0 0 0 0 18752 0 99 0 0 0 0 0
> >
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > # ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
> > # Num RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
> > # ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................ ............... ....
> > #
> > -------------------------------------------------------------------------------
> > 0 0 0 99 0 0 0 0xaaaac17d6000
> > -------------------------------------------------------------------------------
> > 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0
> > 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0
> >
> > [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> >
> > Changes from v2:
> > * Updated patch 04 to account metrics for both cache level and ld_peer
> > for PEER flag;
> > * Updated document for metric 'rmt_hit' which is accounted for all
> > remote accesses (include remote DRAM and any upward caches).
>
> LGTM
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
> thanks,
> jirka
>
> >
> > Changes from v1:
> > * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
> > operations, so can align with the patch set [1] for store samples.
> >
> >
> > Leo Yan (11):
> > perf mem: Add stats for store operation with no available memory level
> > perf c2c: Add dimensions for 'N/A' metrics of store operation
> > perf c2c: Update documentation for store metric 'N/A'
> > perf mem: Add statistics for peer snooping
> > perf c2c: Add dimensions for peer load operations
> > perf c2c: Use explicit names for display macros
> > perf c2c: Rename dimension from 'percent_hitm' to
> > 'percent_costly_snoop'
> > perf c2c: Refactor node header
> > perf c2c: Sort on peer snooping for load operations
> > perf c2c: Update documentation for new display option 'peer'
> > perf c2c: Use 'peer' as default display for Arm64
> >
> > tools/perf/Documentation/perf-c2c.txt | 34 ++-
> > tools/perf/builtin-c2c.c | 357 ++++++++++++++++++++------
> > tools/perf/util/mem-events.c | 25 +-
> > tools/perf/util/mem-events.h | 2 +
> > 4 files changed, 331 insertions(+), 87 deletions(-)
> >
> > --
> > 2.25.1
> >
--
- Arnaldo
prev parent reply other threads:[~2022-05-23 12:43 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-18 5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
2022-05-18 5:57 ` [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
2022-05-18 5:57 ` [PATCH v3 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
2022-05-18 5:57 ` [PATCH v3 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
2022-05-18 5:57 ` [PATCH v3 04/11] perf mem: Add statistics for peer snooping Leo Yan
2022-05-23 12:38 ` Arnaldo Carvalho de Melo
2022-05-23 12:46 ` Arnaldo Carvalho de Melo
2022-05-18 5:57 ` [PATCH v3 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
2022-05-18 5:57 ` [PATCH v3 06/11] perf c2c: Use explicit names for display macros Leo Yan
2022-05-18 5:57 ` [PATCH v3 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
2022-05-18 5:57 ` [PATCH v3 08/11] perf c2c: Refactor node header Leo Yan
2022-05-18 5:57 ` [PATCH v3 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
2022-05-18 5:57 ` [PATCH v3 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
2022-05-18 5:57 ` [PATCH v3 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
2022-05-23 8:43 ` [PATCH v3 00/11] perf c2c: Support " Jiri Olsa
2022-05-23 12:43 ` Arnaldo Carvalho de Melo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YouBXricVunh1NhJ@kernel.org \
--to=acme@kernel.org \
--cc=adamli@amperemail.onmicrosoft.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=alisaidi@amazon.com \
--cc=german.gomez@arm.com \
--cc=hi@alyssa.is \
--cc=irogers@google.com \
--cc=james.clark@arm.com \
--cc=jmario@redhat.com \
--cc=kan.liang@linux.intel.com \
--cc=kjain@linux.ibm.com \
--cc=leo.yan@linaro.org \
--cc=lihuafei1@huawei.com \
--cc=likexu@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=olsajiri@gmail.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).