public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@linaro.org>
To: Ali Saidi <alisaidi@amazon.com>
Cc: Nick.Forrington@arm.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, andrew.kilroy@arm.com,
	benh@kernel.crashing.org, german.gomez@arm.com,
	james.clark@arm.com, john.garry@huawei.com, jolsa@kernel.org,
	kjain@linux.ibm.com, lihuafei1@huawei.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	mark.rutland@arm.com, mathieu.poirier@linaro.org,
	mingo@redhat.com, namhyung@kernel.org, peterz@infradead.org,
	will@kernel.org
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores
Date: Thu, 31 Mar 2022 20:19:02 +0800	[thread overview]
Message-ID: <20220331121902.GA1704284@leoy-ThinkPad-X240s> (raw)
In-Reply-To: <20220329143214.12707-1-alisaidi@amazon.com>

Hi Ali,

On Tue, Mar 29, 2022 at 02:32:14PM +0000, Ali Saidi wrote:

[...]

> > I still think we should consider to extend the memory levels to
> > demonstrate clear momory hierarchy on Arm archs, I personally like the
> > definitions for "PEER_CORE", "LCL_CLSTR", "PEER_CLSTR" and "SYS_CACHE",
> > though these cache levels are not precise like L1/L2/L3 levels, they can
> > help us to map very well for the cache topology on Arm archs and without
> > any confusion.  We could take this as an enhancement if you don't want
> > to bother the current patch set's upstreaming.
> 
> I'd like to do this in a separate patch, but I have one other proposal. The
> Neoverse cores L2 is strictly inclusive of the L1, so even if it's in the L1,
> it's also in the L2. Given that the Graviton systems and afaik the Ampere
> systems don't have any cache between the L2 and the SLC, thus anything from
> PEER_CORE, LCL_CLSTR, or PEER_CLSTR would hit in the L2, perhaps we
> should just set L2 for these cases? German, are you good with this for now? 

If we use a single cache level (no matterh it's L2 or ANY_CACHE) for
these data sources, it's hard for users to understand what's the cost
for the memory operations.  So here I suggested for these new cache
levels is not only about cache level, it's more about the information
telling the memory operation's cost.

[...]

> > Alternatively, I think it's good to pick up the patch series "perf c2c:
> > Sort cacheline with all loads" [1], rather than relying on HITM tag, the
> > patch series extends a new option "-d all" for perf c2c, so it displays
> > the suspecious false sharing cache lines based on load/store ops and
> > thread infos.  The main reason for holding on th patch set is due to we
> > cannot verify it with Arm SPE at that time point, as the time being Arm
> > SPE trace data was absent both store ops and data source packets.
> 
> Looking at examples I don't, at least from my system, data-source isn't set for
> stores, only for loads.

Ouch ...  If data source is not set for store operation, then all store
samples will absent cache level info.  Or should we set ANY_CACHE as
cache level for store operations?

> > I perfer to set PERF_MEM_SNOOP_HIT flag in this patch set and we can
> > upstream the patch series "perf c2c: Sort cacheline with all loads"
> > (only needs upstreaming patches 01, 02, 03, 10, 11, the rest patches
> > have been merged in the mainline kernel).
> > 
> > If this is fine for you, I can respin the patch series for "perf c2c".
> > Or any other thoughts?
> 
> I think this is a nice option to have in the tool-box, but from my point of
> view, I'd like someone who is familiar with c2c output on x86 to come to an
> arm64 system and be able to zero in on a ping-ponging line like they would
> otherwise. Highlighting a line that is moving between cores frequently which is
> likely in the exclusive state by tagging it an HITM accomplishes this and will
> make it easier to find these cases.  Your approach also has innaccurancies and
> wouldn't be able to differentiate between core X accessing a line a lot followed
> by core Y acessing a line alot vs the cores ping-ponging.  Yes, I agree that we
> will "overcount" HITM, but I don't think this is particularly bad and it does
> specifically highlight the core-2-core transfers that are likely a performance
> issue easily and it will result in easier identification of areas of false or
> true sharing and improve performance.

I don't want to block this patch set by this part, and either I don't
want to introduce any confusion for later users, especially I think
users who in later use this tool but it's hard for them to be aware any
assumptions in this discussion thread.  So two options would be fine
for me:

Option 1: if you and Arm mates can confirm that inaccuracy caused by
setting HITM is low (e.g. 2%-3% inaccuracy that introduced by directly
set HITM), I think this could be acceptable.  Otherwise, please
consider option 2.

Option 2: by default we set PERF_MEM_SNOOP_HIT flag since now actually
we have no info to support HITM.  Then use a new patch to add an extra
option (say '--coarse-hitm') for 'perf c2c' tool, a user can explictly
specify this option for 'perf c2c' command; when a user specifies this
option it means that the user understands and accepts inaccuracy by
forcing to use PERF_MEM_SNOOP_HITM flag.  I think you could refer to
the option '--stitch-lbr' for adding an option for 'perf c2c' tool.

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-03-31 12:20 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-24 18:33 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-03-24 18:33 ` [PATCH v4 1/4] tools: arm64: Import cputype.h Ali Saidi
2022-03-25 18:39   ` Arnaldo Carvalho de Melo
2022-03-25 18:58     ` Ali Saidi
2022-03-25 19:42     ` Arnaldo Carvalho de Melo
2022-03-26  5:49       ` Leo Yan
2022-03-26 13:59         ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
2022-03-26 13:47   ` Leo Yan
2022-03-26 13:52     ` Arnaldo Carvalho de Melo
2022-03-26 13:56       ` Leo Yan
2022-03-26 14:04         ` Arnaldo Carvalho de Melo
2022-03-26 19:43     ` Ali Saidi
2022-03-27  9:09       ` Leo Yan
2022-03-28  3:08       ` Ali Saidi
2022-03-28 13:05         ` Leo Yan
2022-03-29 13:34           ` Shuai Xue
2022-03-29 14:32           ` Ali Saidi
2022-03-31 12:19             ` Leo Yan [this message]
2022-03-31 12:28             ` German Gomez
2022-03-31 12:44               ` Leo Yan
2022-04-03 20:33                 ` Ali Saidi
2022-04-04 15:12                   ` Leo Yan
2022-04-06 21:00                     ` Ali Saidi
2022-04-08  1:06                       ` Leo Yan
2022-04-07 15:24                 ` German Gomez
2022-04-08  1:18                   ` Leo Yan
2022-03-24 18:33 ` [PATCH v4 3/4] perf mem: Support mem_lvl_num in c2c command Ali Saidi
2022-03-26 13:54   ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 4/4] perf mem: Support HITM for when mem_lvl_num is any Ali Saidi
2022-03-26  6:23   ` Leo Yan
2022-03-26 13:30     ` Arnaldo Carvalho de Melo
2022-03-26 19:14     ` Ali Saidi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220331121902.GA1704284@leoy-ThinkPad-X240s \
    --to=leo.yan@linaro.org \
    --cc=Nick.Forrington@arm.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alisaidi@amazon.com \
    --cc=andrew.kilroy@arm.com \
    --cc=benh@kernel.crashing.org \
    --cc=german.gomez@arm.com \
    --cc=james.clark@arm.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox