From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: Ali Saidi <alisaidi@amazon.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, german.gomez@arm.com,
benh@kernel.crashing.org, Nick.Forrington@arm.com,
alexander.shishkin@linux.intel.com, andrew.kilroy@arm.com,
james.clark@arm.com, john.garry@huawei.com, jolsa@kernel.org,
kjain@linux.ibm.com, lihuafei1@huawei.com, mark.rutland@arm.com,
mathieu.poirier@linaro.org, mingo@redhat.com,
namhyung@kernel.org, peterz@infradead.org, will@kernel.org
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores
Date: Sat, 26 Mar 2022 10:52:01 -0300 [thread overview]
Message-ID: <Yj8age/PSIouUiKy@kernel.org> (raw)
In-Reply-To: <20220326134754.GD20556@leoy-ThinkPad-X240s>
Em Sat, Mar 26, 2022 at 09:47:54PM +0800, Leo Yan escreveu:
> Hi Ali, German,
>
> On Thu, Mar 24, 2022 at 06:33:21PM +0000, Ali Saidi wrote:
>
> [...]
>
> > +static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
> > + union perf_mem_data_src *data_src)
> > {
> > - union perf_mem_data_src data_src = { 0 };
> > + /*
> > + * Even though four levels of cache hierarchy are possible, no known
> > + * production Neoverse systems currently include more than three levels
> > + * so for the time being we assume three exist. If a production system
> > + * is built with four the this function would have to be changed to
> > + * detect the number of levels for reporting.
> > + */
> >
> > - if (record->op == ARM_SPE_LD)
> > - data_src.mem_op = PERF_MEM_OP_LOAD;
> > - else
> > - data_src.mem_op = PERF_MEM_OP_STORE;
>
> Firstly, apologize that I didn't give clear idea when Ali sent patch sets
> v2 and v3.
Ok, removing this as well.
Thanks for reviewing.
- Arnaldo
> IMHO, we need to consider two kinds of information which can guide us
> for a reliable implementation. The first thing is to summarize the data
> source configuration for x86 PEBS, we can dive in more details for this
> part; the second thing is we can refer to the AMBA architecture document
> ARM IHI 0050E.b, section 11.1.2 'Crossing a chip-to-chip interface' and
> its sub section 'Suggested DataSource values', which would help us
> much for mapping the cache topology to Arm SPE data source.
>
> As a result, I summarized the data source configurations for PEBS and
> Arm SPE Neoverse in the spreadsheet:
> https://docs.google.com/spreadsheets/d/11YmjG0TyRjH7IXgvsREFgTg3AVtxh2dvLloRK1EdNjU/edit?usp=sharing
>
> Please see below comments.
>
> > + switch (record->source) {
> > + case ARM_SPE_NV_L1D:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> > + break;
>
> I think we need to set the field 'mem_snoop' for L1 cache hit:
>
> data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
>
> For L1 cache hit, it doesn't involve snooping.
>
> > + case ARM_SPE_NV_L2:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> > + break;
>
> Ditto:
>
> data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
>
> > + case ARM_SPE_NV_PEER_CORE:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> Peer core contains its local L1 cache, so I think we can set the
> memory level L1 to indicate this case.
>
> For this data source type and below types, though they indicate
> the snooping happens, but it doesn't mean the data in the cache line
> is in 'modified' state. If set flag PERF_MEM_SNOOP_HITM, I personally
> think this will mislead users when report the result.
>
> I prefer we set below fields for ARM_SPE_NV_PEER_CORE:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
>
> > + break;
> > + /*
> > + * We don't know if this is L1, L2 but we do know it was a cache-2-cache
> > + * transfer, so set SNOOP_HITM
> > + */
> > + case ARM_SPE_NV_LCL_CLSTR:
>
> For ARM_SPE_NV_LCL_CLSTR, it fetches the data from the shared cache in
> the cluster level, it should happen in L2 cache:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L2;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
>
> > + case ARM_SPE_NV_PEER_CLSTR:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
> > + break;
>
> This type can snoop from L1 or L2 cache in the peer cluster, so it
> makes sense to set cache level as PERF_MEM_LVLNUM_ANY_CACHE. But here
> should use the snoop type PERF_MEM_SNOOP_HIT, so:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> > + /*
> > + * System cache is assumed to be L3
> > + */
> > + case ARM_SPE_NV_SYS_CACHE:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> > + break;
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
>
> > + /*
> > + * We don't know what level it hit in, except it came from the other
> > + * socket
> > + */
> > + case ARM_SPE_NV_REMOTE:
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> > + break;
>
> The type ARM_SPE_NV_REMOTE is a snooping operation and it can happen
> in any cache levels in remote chip:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> > + case ARM_SPE_NV_DRAM:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
> > + break;
>
> We can set snoop as PERF_MEM_SNOOP_MISS for DRAM data source:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT;
> data_src->mem_snoop = PERF_MEM_SNOOP_MISS;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
>
> The rest of this patch looks good to me.
>
> Thanks,
> Leo
>
> > + default:
> > + break;
> > + }
> > +}
> >
> > +static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
> > + union perf_mem_data_src *data_src)
> > +{
> > if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
> > - data_src.mem_lvl = PERF_MEM_LVL_L3;
> > + data_src->mem_lvl = PERF_MEM_LVL_L3;
> >
> > if (record->type & ARM_SPE_LLC_MISS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> > + data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> > else
> > - data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> > } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
> > - data_src.mem_lvl = PERF_MEM_LVL_L1;
> > + data_src->mem_lvl = PERF_MEM_LVL_L1;
> >
> > if (record->type & ARM_SPE_L1D_MISS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> > + data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> > else
> > - data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> > }
> >
> > if (record->type & ARM_SPE_REMOTE_ACCESS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> > + data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> > +}
> > +
> > +static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
> > +{
> > + union perf_mem_data_src data_src = { 0 };
> > + bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
> > +
> > + if (record->op & ARM_SPE_LD)
> > + data_src.mem_op = PERF_MEM_OP_LOAD;
> > + else
> > + data_src.mem_op = PERF_MEM_OP_STORE;
> > +
> > + if (is_neoverse)
> > + arm_spe__synth_data_source_neoverse(record, &data_src);
> > + else
> > + arm_spe__synth_data_source_generic(record, &data_src);
> >
> > if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
> > data_src.mem_dtlb = PERF_MEM_TLB_WK;
> > @@ -446,7 +525,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
> > u64 data_src;
> > int err;
> >
> > - data_src = arm_spe__synth_data_source(record);
> > + data_src = arm_spe__synth_data_source(record, spe->midr);
> >
> > if (spe->sample_flc) {
> > if (record->type & ARM_SPE_L1D_MISS) {
> > @@ -1183,6 +1262,8 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
> > struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
> > size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
> > struct perf_record_time_conv *tc = &session->time_conv;
> > + const char *cpuid = perf_env__cpuid(session->evlist->env);
> > + u64 midr = strtol(cpuid, NULL, 16);
> > struct arm_spe *spe;
> > int err;
> >
> > @@ -1202,6 +1283,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
> > spe->machine = &session->machines.host; /* No kvm support */
> > spe->auxtrace_type = auxtrace_info->type;
> > spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
> > + spe->midr = midr;
> >
> > spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
> >
> > --
> > 2.32.0
> >
--
- Arnaldo
next prev parent reply other threads:[~2022-03-26 13:52 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-24 18:33 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-03-24 18:33 ` [PATCH v4 1/4] tools: arm64: Import cputype.h Ali Saidi
2022-03-25 18:39 ` Arnaldo Carvalho de Melo
2022-03-25 18:58 ` Ali Saidi
2022-03-25 19:42 ` Arnaldo Carvalho de Melo
2022-03-26 5:49 ` Leo Yan
2022-03-26 13:59 ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
2022-03-26 13:47 ` Leo Yan
2022-03-26 13:52 ` Arnaldo Carvalho de Melo [this message]
2022-03-26 13:56 ` Leo Yan
2022-03-26 14:04 ` Arnaldo Carvalho de Melo
2022-03-26 19:43 ` Ali Saidi
2022-03-27 9:09 ` Leo Yan
2022-03-28 3:08 ` Ali Saidi
2022-03-28 13:05 ` Leo Yan
2022-03-29 13:34 ` Shuai Xue
2022-03-29 14:32 ` Ali Saidi
2022-03-31 12:19 ` Leo Yan
2022-03-31 12:28 ` German Gomez
2022-03-31 12:44 ` Leo Yan
2022-04-03 20:33 ` Ali Saidi
2022-04-04 15:12 ` Leo Yan
2022-04-06 21:00 ` Ali Saidi
2022-04-08 1:06 ` Leo Yan
2022-04-07 15:24 ` German Gomez
2022-04-08 1:18 ` Leo Yan
2022-03-24 18:33 ` [PATCH v4 3/4] perf mem: Support mem_lvl_num in c2c command Ali Saidi
2022-03-26 13:54 ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 4/4] perf mem: Support HITM for when mem_lvl_num is any Ali Saidi
2022-03-26 6:23 ` Leo Yan
2022-03-26 13:30 ` Arnaldo Carvalho de Melo
2022-03-26 19:14 ` Ali Saidi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yj8age/PSIouUiKy@kernel.org \
--to=acme@kernel.org \
--cc=Nick.Forrington@arm.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=alisaidi@amazon.com \
--cc=andrew.kilroy@arm.com \
--cc=benh@kernel.crashing.org \
--cc=german.gomez@arm.com \
--cc=james.clark@arm.com \
--cc=john.garry@huawei.com \
--cc=jolsa@kernel.org \
--cc=kjain@linux.ibm.com \
--cc=leo.yan@linaro.org \
--cc=lihuafei1@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.poirier@linaro.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).