linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Alyssa Ross <hi@alyssa.is>, Ian Rogers <irogers@google.com>,
	Like Xu <likexu@tencent.com>, Kajol Jain <kjain@linux.ibm.com>,
	Li Huafei <lihuafei1@huawei.com>, Joe Mario <jmario@redhat.com>,
	Adam Li <adam.li@amperecomputing.com>,
	German Gomez <german.gomez@arm.com>,
	James Clark <james.clark@arm.com>,
	Ali Saidi <alisaidi@amazon.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 00/12] perf c2c: Support display for Arm64
Date: Fri, 3 Jun 2022 21:33:10 +0200	[thread overview]
Message-ID: <Ypph9lA1qavXXZI5@kernel.org> (raw)
In-Reply-To: <20220530114036.3225544-1-leo.yan@linaro.org>

Em Mon, May 30, 2022 at 07:40:24PM +0800, Leo Yan escreveu:
> Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> us to detect cache line contention and transfers.
> 
> This patch set is based on Ali Said's patch set v9 "perf: arm-spe: Decode SPE
> source and use for perf c2c" [1] and Ali's patch set doesn't need any
> change in this new round.

IIRC there is a kernel part there, please let me know when that part
gets merged so that I can process this 12 patches long set.

- Arnaldo
 
> To clearly show peer loads and express the local peer loads and remote
> peer lodes, this patch introduces three new metrics 'lcl_peer',
> 'rmt_peer' and 'tot_peer'.  The display 'peer' mode uses metric
> 'tot_peer' for sorting cache lines.
> 
> Patches 01-05 adds statistics for memory samples, and add dimensions for
> peer metrics.
> 
> Patches 06-09 are for refactoring, it refines the code with more general
> naming so this can allow us to easier to extend display modes but not
> strictly bound to HITM tags.
> 
> Patches 10-11 are to extend display 'peer' mode, and also changes to use
> 'peer' mode as default mode for Arm64 arches.
> 
> Patch 12 updates document to describe the new dimensions for peer
> metrics.
> 
> This patch set has been verified for both x86 and Arm64 memory samples.
> 
> Known issues:  Joe reminded there have an issue in patch set v3 that the
> cache line metric shows 'N/A' for node, this is because Arm SPE trace
> data doesn't contain physical address and leads to perf c2c tool fails
> to find matched node range if physical address is zero.  This issue is
> addressed in a separte patch [2].  Since I am still using the old
> perf data file (I have no Neoverse platforms), the output result still
> shows the Node field is 'N/A'.
> 
> Another thing is we need to enhance data source setting for old Arm
> platforms.  As discussed, German would follow up this task later.
> 
> The latest patch set has been uploaded on the git server [3].
> 
> The display result with x86 memory samples:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x55c8971f0080     0    1967   66.14%      252      252        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
>         1      0x55c8971f00c0     0       1   33.86%      129      129        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        ----- HITM -----  ------- Store Refs ------  --------- Data address ---------                      ---------- cycles ----------    Total       cpu                                     Shared                               
>   #   Num  RmtHitm  LclHitm   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load  records       cnt                  Symbol             Object              Source:Line  Node
>   # .....  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
>   #
>     ----------------------------------------------------------------------
>         0        0      252     2024      470        0      0x55c8971f0080
>     ----------------------------------------------------------------------
>              0.00%   12.30%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
>              0.00%    0.79%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
>              0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
>              0.00%   86.90%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0
> 
>     ----------------------------------------------------------------------
>         1        0      129        0        0        0      0x55c8971f00c0
>     ----------------------------------------------------------------------
>              0.00%  100.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0
> 
> 
> The display result with Arm SPE:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------     Peer  ------- Load Peer -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt    Snoop    Total    Local   Remote  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0xaaaac17d6000   N/A       0  100.00%       99       99        0    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        -- Peer Snoop --  ------- Store Refs ------  --------- Data address ---------                      ---------- cycles ----------    Total       cpu                                    Shared                       
>   #   Num      Rmt      Lcl   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt peer  lcl peer      load  records       cnt                  Symbol            Object      Source:Line  Node{cpus %peers %stores}
>   # .....  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
>   #
>     ----------------------------------------------------------------------
>         0        0       99        0        0        0      0xaaaac17d6000
>     ----------------------------------------------------------------------
>              0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0       375        43    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 2 100.0%    n/a}
>              0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0       180       173      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 2 100.0%    n/a}
> 
> 
> Changes from v3:
> * Changed to display remote and local peer accesses (Joe);
> * Fixed the usage info for display types (Joe);
> * Do not display HITM dimensions when use 'peer' display, and HITM
>   display doesn't show any 'peer' dimensions (James);
> * Split to smaller patches for adding dimensions of peer operations;
> * Updated documentation to reflect the latest GUI and stdio.
> 
> Changes from v2:
> * Updated patch 04 to account metrics for both cache level and ld_peer
>   for PEER flag;
> * Updated document for metric 'rmt_hit' which is accounted for all
>   remote accesses (include remote DRAM and any upward caches).
> 
> Changes from v1:
> * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
>   operations, so can align with the patch set [1] for store samples.
> 
> 
> [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> [2] https://lore.kernel.org/lkml/20220530083645.253432-1-leo.yan@linaro.org/
> [3] https://git.linaro.org/people/leo.yan/linux-spe.git/ branch: perf_c2c_arm_spe_peer_v4
> 
> 
> Leo Yan (12):
>   perf mem: Add statistics for peer snooping
>   perf c2c: Output statistics for peer snooping
>   perf c2c: Add dimensions for peer load operations
>   perf c2c: Add dimensions of peer metrics for cache line view
>   perf c2c: Add mean dimensions for peer operations
>   perf c2c: Use explicit names for display macros
>   perf c2c: Rename dimension from 'percent_hitm' to
>     'percent_costly_snoop'
>   perf c2c: Refactor node header
>   perf c2c: Refactor display string
>   perf c2c: Sort on peer snooping for load operations
>   perf c2c: Use 'peer' as default display for Arm64
>   perf c2c: Update documentation for new display option 'peer'
> 
>  tools/perf/Documentation/perf-c2c.txt |  30 +-
>  tools/perf/builtin-c2c.c              | 454 ++++++++++++++++++++------
>  tools/perf/util/mem-events.c          |  28 +-
>  tools/perf/util/mem-events.h          |   3 +
>  4 files changed, 403 insertions(+), 112 deletions(-)
> 
> -- 
> 2.25.1

-- 

- Arnaldo

  parent reply	other threads:[~2022-06-03 19:33 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-30 11:40 [PATCH v4 00/12] perf c2c: Support display for Arm64 Leo Yan
2022-05-30 11:40 ` [PATCH v4 01/12] perf mem: Add statistics for peer snooping Leo Yan
2022-05-30 11:40 ` [PATCH v4 02/12] perf c2c: Output " Leo Yan
2022-05-30 11:40 ` [PATCH v4 03/12] perf c2c: Add dimensions for peer load operations Leo Yan
2022-05-30 11:40 ` [PATCH v4 04/12] perf c2c: Add dimensions of peer metrics for cache line view Leo Yan
2022-05-30 11:40 ` [PATCH v4 05/12] perf c2c: Add mean dimensions for peer operations Leo Yan
2022-05-30 11:40 ` [PATCH v4 06/12] perf c2c: Use explicit names for display macros Leo Yan
2022-05-30 11:40 ` [PATCH v4 07/12] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
2022-05-30 11:40 ` [PATCH v4 08/12] perf c2c: Refactor node header Leo Yan
2022-05-30 11:40 ` [PATCH v4 09/12] perf c2c: Refactor display string Leo Yan
2022-05-30 11:40 ` [PATCH v4 10/12] perf c2c: Sort on peer snooping for load operations Leo Yan
2022-05-30 11:40 ` [PATCH v4 11/12] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
2022-05-30 11:40 ` [PATCH v4 12/12] perf c2c: Update documentation for new display option 'peer' Leo Yan
2022-05-31 18:44 ` [PATCH v4 00/12] perf c2c: Support display for Arm64 Joe Mario
2022-06-01 10:25   ` Leo Yan
2022-06-02 16:59     ` Ian Rogers
2022-06-03  3:36       ` Leo Yan
2022-06-02 17:11     ` Ali Saidi
2022-06-03  3:46       ` Leo Yan
2022-06-03 19:33 ` Arnaldo Carvalho de Melo [this message]
2022-06-04  1:40   ` Leo Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ypph9lA1qavXXZI5@kernel.org \
    --to=acme@kernel.org \
    --cc=adam.li@amperecomputing.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alisaidi@amazon.com \
    --cc=german.gomez@arm.com \
    --cc=hi@alyssa.is \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=leo.yan@linaro.org \
    --cc=lihuafei1@huawei.com \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).