All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Ellerman <mpe@ellerman.id.au>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kajol Jain <kjain@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	mingo@redhat.com, acme@kernel.org, jolsa@kernel.org,
	namhyung@kernel.org, linux-perf-users@vger.kernel.org,
	ak@linux.intel.com, maddy@linux.ibm.com,
	atrajeev@linux.vnet.ibm.com, rnsastry@linux.ibm.com,
	yao.jin@linux.intel.com, ast@kernel.org, daniel@iogearbox.net,
	songliubraving@fb.com, kan.liang@linux.intel.com,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	paulus@samba.org
Subject: Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses
Date: Thu, 09 Sep 2021 22:45:54 +1000	[thread overview]
Message-ID: <87czphnchp.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <YTiBqbxe7ieqY2OE@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:
> On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote:
>> Kajol Jain <kjain@linux.ibm.com> writes:
>
>> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> > index f92880a15645..030b3e990ac3 100644
>> > --- a/include/uapi/linux/perf_event.h
>> > +++ b/include/uapi/linux/perf_event.h
>> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src {
>> >  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>> >  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>> >  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>> > -/* 5-0xa available */
>> > +#define PERF_MEM_LVLNUM_OC_L2	0x05 /* On Chip L2 */
>> > +#define PERF_MEM_LVLNUM_OC_L3	0x06 /* On Chip L3 */
>> 
>> The obvious use for 5 is for "L5" and so on.
>> 
>> I'm not sure adding new levels is the best idea, because these don't fit
>> neatly into the hierarchy, they are off to the side.
>> 
>> 
>> I wonder if we should use the remote field.
>> 
>> ie. for another core's L2 we set:
>> 
>>   mem_lvl = PERF_MEM_LVL_L2
>>   mem_remote = 1
>
> This mixes APIs (see below), IIUC the correct usage would be something
> like: lvl_num=L2 remote=1

Aha, I was wondering how lvl and lvl_num were supposed to interact.

>> Which would mean "remote L2", but not remote enough to be
>> lvl = PERF_MEM_LVL_REM_CCE1.
>> 
>> It would be printed by the existing tools/perf code as "Remote L2", vs
>> "Remote cache (1 hop)", which seems OK.
>> 
>> 
>> ie. we'd be able to express:
>> 
>>   Current core's L2: LVL_L2
>>   Other core's L2:   LVL_L2 | REMOTE
>>   Other chip's L2:   LVL_REM_CCE1 | REMOTE
>> 
>> And similarly for L3.
>> 
>> I think that makes sense? Unless people think remote should be reserved
>> to mean on another chip, though we already have REM_CCE1 for that.
>
> IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of
> the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of
> course, ABIs being what they are, we get to support both :/ But I'm not
> sure mixing them is a great idea.

OK.

> Also, clearly this could use a comment...
>
> The 'new' composite doesnt have a hops field because the hardware that
> nessecitated that change doesn't report it, but we could easily add a
> field there.
>
> Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
> corresponding PERF_MEM_HOPS_{NA, 0..6}

It's really 7 if we use remote && hop = 0 to mean the first hop.

If we're wanting to use some of the hop levels to represent
intra-chip/package hops then we could possibly use them all on a really
big system.

eg. you could imagine something like:

 L2 | 		        - local L2
 L2 | REMOTE | HOPS_0	- L2 of neighbour core
 L2 | REMOTE | HOPS_1	- L2 of near core on same chip (same 1/2 of chip)
 L2 | REMOTE | HOPS_2	- L2 of far core on same chip (other 1/2 of chip)
 L2 | REMOTE | HOPS_3	- L2 of sibling chip in same package
 L2 | REMOTE | HOPS_4	- L2 on separate package 1 hop away
 L2 | REMOTE | HOPS_5	- L2 on separate package 2 hops away
 L2 | REMOTE | HOPS_6	- L2 on separate package 3 hops away


Whether it's useful to represent all those levels I'm not sure, but it's
probably good if we have the ability.

I guess I'm 50/50 on whether that's enough levels, or whether we want
another bit to allow for future growth.

> Then I suppose you can encode things like:
> 
> 	L2			- local L2
> 	L2 | REMOTE		- remote L2 at an unspecified distance (NA)
> 	L2 | REMOTE | HOPS_0	- remote L2 on the same node
> 	L2 | REMOTE | HOPS_1	- remote L2 on a node 1 removed
> 
> Would that work?

Yeah that looks good to me.

cheers

WARNING: multiple messages have this Message-ID (diff)
From: Michael Ellerman <mpe@ellerman.id.au>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mark.rutland@arm.com, atrajeev@linux.vnet.ibm.com,
	ak@linux.intel.com, daniel@iogearbox.net, rnsastry@linux.ibm.com,
	alexander.shishkin@linux.intel.com,
	Kajol Jain <kjain@linux.ibm.com>,
	linux-kernel@vger.kernel.org, acme@kernel.org, ast@kernel.org,
	linux-perf-users@vger.kernel.org, yao.jin@linux.intel.com,
	mingo@redhat.com, paulus@samba.org, maddy@linux.ibm.com,
	jolsa@kernel.org, namhyung@kernel.org, songliubraving@fb.com,
	linuxppc-dev@lists.ozlabs.org, kan.liang@linux.intel.com
Subject: Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses
Date: Thu, 09 Sep 2021 22:45:54 +1000	[thread overview]
Message-ID: <87czphnchp.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <YTiBqbxe7ieqY2OE@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:
> On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote:
>> Kajol Jain <kjain@linux.ibm.com> writes:
>
>> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> > index f92880a15645..030b3e990ac3 100644
>> > --- a/include/uapi/linux/perf_event.h
>> > +++ b/include/uapi/linux/perf_event.h
>> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src {
>> >  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>> >  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>> >  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>> > -/* 5-0xa available */
>> > +#define PERF_MEM_LVLNUM_OC_L2	0x05 /* On Chip L2 */
>> > +#define PERF_MEM_LVLNUM_OC_L3	0x06 /* On Chip L3 */
>> 
>> The obvious use for 5 is for "L5" and so on.
>> 
>> I'm not sure adding new levels is the best idea, because these don't fit
>> neatly into the hierarchy, they are off to the side.
>> 
>> 
>> I wonder if we should use the remote field.
>> 
>> ie. for another core's L2 we set:
>> 
>>   mem_lvl = PERF_MEM_LVL_L2
>>   mem_remote = 1
>
> This mixes APIs (see below), IIUC the correct usage would be something
> like: lvl_num=L2 remote=1

Aha, I was wondering how lvl and lvl_num were supposed to interact.

>> Which would mean "remote L2", but not remote enough to be
>> lvl = PERF_MEM_LVL_REM_CCE1.
>> 
>> It would be printed by the existing tools/perf code as "Remote L2", vs
>> "Remote cache (1 hop)", which seems OK.
>> 
>> 
>> ie. we'd be able to express:
>> 
>>   Current core's L2: LVL_L2
>>   Other core's L2:   LVL_L2 | REMOTE
>>   Other chip's L2:   LVL_REM_CCE1 | REMOTE
>> 
>> And similarly for L3.
>> 
>> I think that makes sense? Unless people think remote should be reserved
>> to mean on another chip, though we already have REM_CCE1 for that.
>
> IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of
> the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of
> course, ABIs being what they are, we get to support both :/ But I'm not
> sure mixing them is a great idea.

OK.

> Also, clearly this could use a comment...
>
> The 'new' composite doesnt have a hops field because the hardware that
> nessecitated that change doesn't report it, but we could easily add a
> field there.
>
> Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
> corresponding PERF_MEM_HOPS_{NA, 0..6}

It's really 7 if we use remote && hop = 0 to mean the first hop.

If we're wanting to use some of the hop levels to represent
intra-chip/package hops then we could possibly use them all on a really
big system.

eg. you could imagine something like:

 L2 | 		        - local L2
 L2 | REMOTE | HOPS_0	- L2 of neighbour core
 L2 | REMOTE | HOPS_1	- L2 of near core on same chip (same 1/2 of chip)
 L2 | REMOTE | HOPS_2	- L2 of far core on same chip (other 1/2 of chip)
 L2 | REMOTE | HOPS_3	- L2 of sibling chip in same package
 L2 | REMOTE | HOPS_4	- L2 on separate package 1 hop away
 L2 | REMOTE | HOPS_5	- L2 on separate package 2 hops away
 L2 | REMOTE | HOPS_6	- L2 on separate package 3 hops away


Whether it's useful to represent all those levels I'm not sure, but it's
probably good if we have the ability.

I guess I'm 50/50 on whether that's enough levels, or whether we want
another bit to allow for future growth.

> Then I suppose you can encode things like:
> 
> 	L2			- local L2
> 	L2 | REMOTE		- remote L2 at an unspecified distance (NA)
> 	L2 | REMOTE | HOPS_0	- remote L2 on the same node
> 	L2 | REMOTE | HOPS_1	- remote L2 on a node 1 removed
> 
> Would that work?

Yeah that looks good to me.

cheers

  reply	other threads:[~2021-09-09 12:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-04  6:49 [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses Kajol Jain
2021-09-04  6:49 ` Kajol Jain
2021-09-04  6:49 ` [PATCH 2/3] " Kajol Jain
2021-09-04  6:49   ` Kajol Jain
2021-09-04  6:49 ` [PATCH 3/3] powerpc/perf: Fix data source encodings for power10 Kajol Jain
2021-09-04  6:49   ` Kajol Jain
2021-09-08  7:17 ` [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses Michael Ellerman
2021-09-08  7:17   ` Michael Ellerman
2021-09-08  9:26   ` Peter Zijlstra
2021-09-08  9:26     ` Peter Zijlstra
2021-09-09 12:45     ` Michael Ellerman [this message]
2021-09-09 12:45       ` Michael Ellerman
2021-09-09 14:36       ` Peter Zijlstra
2021-09-09 14:36         ` Peter Zijlstra
2021-09-14 10:40         ` Michael Ellerman
2021-09-14 10:40           ` Michael Ellerman
2021-09-14 11:49           ` Peter Zijlstra
2021-09-14 11:49             ` Peter Zijlstra
2021-09-16 10:57             ` Michael Ellerman
2021-09-16 10:57               ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87czphnchp.fsf@mpe.ellerman.id.au \
    --to=mpe@ellerman.id.au \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ast@kernel.org \
    --cc=atrajeev@linux.vnet.ibm.com \
    --cc=daniel@iogearbox.net \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kjain@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=rnsastry@linux.ibm.com \
    --cc=songliubraving@fb.com \
    --cc=yao.jin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.