From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)
Date: Wed, 20 Jan 2010 13:40:08 -0000
Message-ID: <000d01ca99d6$14edcba0$3ec962e0$@deacon@arm.com>
In-Reply-To: <201001151630.07874.jpihet@mvista.com>
Hi Jean,
Sorry for the delay in getting back to you, I've had a few technical
problems with my machine. Anyway, here we go:
* Jean Pihet wrote:
<snip>
> > 0x0c is HW_BRANCH_INSTRUCTIONS and 0x10 is HW_BRANCH_MISSES.
> > 0x12 is the number of predictable branch instructions executed, so the
> > mispredict rate is 0x10/0x12. These events are defined for v7, so A8 should
> > take these definitions too.
> From the spec I read 0x0c is 'SW write of the PC', is that equivalent to
> HW_BRANCH_INSTRUCTIONS?
This event counts:
- All branch instructions
- Instructions that explicitly write the PC
- Exception generating instructions
I think this is suitable for HW_BRANCH_INSTRUCTIONS, but if anybody feels
differently then maybe we should reconsider.
> For A8 I am using:
> - ARMV7_PERFCTR_PC_BRANCH_TAKEN (0x53),
> - ARMV7_PERFCTR_PC_BRANCH_FAILED (0x52)
>
> For A9 it is unsupported for now.
>
> Do you think I should use 0x0c and 0x10 for both A8 and A9? How to get the
> accesses and misses count directly?
I think we should define the `standard' set (i.e. those that perf supports by
name) using the v7 events, so in this case we should use 0x0c and 0x10 for both
A8 and A9. The core-specific definitions can then always be accessed as raw events.
As I mentioned, I think this is important if people decide to compare the counts
between two cores.
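As a rough illustration of the mapping discussed above, here is a minimal
sketch in C. Only the event numbers (0x0c, 0x10, 0x12) come from this thread;
the enum, macro and function names are hypothetical and are not the names used
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic event indices; not the perf core's own enum. */
enum generic_hw_event {
	GENERIC_BRANCH_INSTRUCTIONS,
	GENERIC_BRANCH_MISSES,
	GENERIC_EVENT_MAX
};

/* Event numbers quoted in the discussion (names are assumptions). */
#define V7_EVT_PC_WRITE            0x0c /* branches, PC writes, exceptions */
#define V7_EVT_BRANCH_MISPRED      0x10 /* mispredicted branches */
#define V7_EVT_BRANCH_PREDICTABLE  0x12 /* predictable branches executed */

/* One table shared by A8 and A9, so named events stay comparable. */
static const uint8_t v7_generic_map[GENERIC_EVENT_MAX] = {
	[GENERIC_BRANCH_INSTRUCTIONS] = V7_EVT_PC_WRITE,
	[GENERIC_BRANCH_MISSES]       = V7_EVT_BRANCH_MISPRED,
};

static inline uint8_t v7_map_generic(enum generic_hw_event ev)
{
	assert(ev < GENERIC_EVENT_MAX);
	return v7_generic_map[ev];
}

/* Mispredict rate as count(0x10) / count(0x12); 0.0 if nothing was counted. */
static inline double mispredict_rate(uint64_t mispredicted,
				     uint64_t predictable)
{
	if (predictable == 0)
		return 0.0;
	return (double)mispredicted / (double)predictable;
}
```

Core-specific events such as 0x52 and 0x53 on A8 would stay out of this table
and be requested as raw events instead.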
> > We could use 0x01 for icache miss, 0x03 for dcache miss and 0x04 for dcache
> > access.
> Ok changed to the following. Is that correct?
> Note that A8 uses specific events for I cache in order to make them comparable
> to each other. I cache miss could use 0x01 also. Cf. remark below for more.
>
> Cortex-A8:
> - D cache access: ARMV7_PERFCTR_DCACHE_ACCESS (0x04),
> - D cache miss: ARMV7_PERFCTR_DCACHE_REFILL (0x03) instead of
> ARMV7_PERFCTR_L1_DATA_MISS (0x49),
> - I cache access: ARMV7_PERFCTR_L1_DATA_MISS (0x50),
> - I cache miss: ARMV7_PERFCTR_L1_INST_MISS (0x4a).
>
> Cortex-A9:
> - D cache access: ARMV7_PERFCTR_DCACHE_ACCESS (0x04),
> - D cache miss: ARMV7_PERFCTR_DCACHE_REFILL (0x03),
> - I cache access: Not supported,
> - I cache miss: ARMV7_PERFCTR_IFETCH_MISS (0x01).
Hmm, this is an interesting one. I suppose comparison between events on a given
core (i.e. A8) is preferable, so I agree with you here. Due to the lack of I-cache
access events on A9, there's nothing we can do to get a fair cross-core comparison.
[minor note: You've called the I-cache access event ARMV7_PERFCTR_L1_DATA_MISS!]
> > > +	[C(L1I)] = {
> > > +		[C(OP_READ)] = {
> > > +			[C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> > > +			[C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> > > +		},
> > > +		[C(OP_WRITE)] = {
> > > +			[C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> > > +			[C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> > > +		},
> > > +		[C(OP_PREFETCH)] = {
> > > +			[C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
> > > +			[C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
> > > +		},
> > > +	},
> >
> > Same thing here. I'd suggest using 0x01 instead of 0x4a.
> Ok is it preferred to keep the ARMV7_PERFCTR_L1_ events for both accesses and
> misses in order to make the event counts comparable to each other? On the
> other hand using 0x01 allows the comparison between A8 and A9.
> I am OK to change it, just let me know.
After thinking about this above, I agree with you; let's use the
ARMV7_PERFCTR_L1_ events to allow for event comparisons on the A8. Comparing with
an A9 is a non-starter because the I-cache accesses can't be counted there.
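To summarise the L1 mapping we seem to have converged on, here is a small
sketch in C. The event numbers are the ones listed earlier in this mail;
pairing 0x50 with the name ARMV7_PERFCTR_L1_INST is an assumption based on the
quoted patch, and the enums, table name, helpers and the CACHE_OP_UNSUPPORTED
value are local to this example.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_OP_UNSUPPORTED        0xFFFF /* sentinel local to this sketch */

#define ARMV7_PERFCTR_IFETCH_MISS   0x01
#define ARMV7_PERFCTR_DCACHE_REFILL 0x03
#define ARMV7_PERFCTR_DCACHE_ACCESS 0x04
#define ARMV7_PERFCTR_L1_INST_MISS  0x4a /* Cortex-A8 specific */
#define ARMV7_PERFCTR_L1_INST       0x50 /* Cortex-A8 specific (assumed name) */

/* Hypothetical indices, standing in for the C(...) macros in the patch. */
enum core   { CORE_A8, CORE_A9, CORE_MAX };
enum cache  { CACHE_L1D, CACHE_L1I, CACHE_MAX };
enum result { RESULT_ACCESS, RESULT_MISS, RESULT_MAX };

static const uint16_t l1_read_events[CORE_MAX][CACHE_MAX][RESULT_MAX] = {
	[CORE_A8] = {
		[CACHE_L1D] = { ARMV7_PERFCTR_DCACHE_ACCESS,
				ARMV7_PERFCTR_DCACHE_REFILL },
		/* A8-specific pair, so access and miss counts are comparable. */
		[CACHE_L1I] = { ARMV7_PERFCTR_L1_INST,
				ARMV7_PERFCTR_L1_INST_MISS },
	},
	[CORE_A9] = {
		[CACHE_L1D] = { ARMV7_PERFCTR_DCACHE_ACCESS,
				ARMV7_PERFCTR_DCACHE_REFILL },
		/* No I-cache access event on A9; only misses can be counted. */
		[CACHE_L1I] = { CACHE_OP_UNSUPPORTED,
				ARMV7_PERFCTR_IFETCH_MISS },
	},
};

static inline uint16_t l1_read_event(enum core c, enum cache k, enum result r)
{
	assert(c < CORE_MAX && k < CACHE_MAX && r < RESULT_MAX);
	return l1_read_events[c][k][r];
}

/* A miss ratio needs both the access and the miss event to be countable. */
static inline int l1_miss_ratio_available(enum core c, enum cache k)
{
	return l1_read_event(c, k, RESULT_ACCESS) != CACHE_OP_UNSUPPORTED &&
	       l1_read_event(c, k, RESULT_MISS) != CACHE_OP_UNSUPPORTED;
}
```

With this layout an I-cache miss ratio is available on A8 but not on A9, which
is the reason a cross-core I-cache comparison doesn't work.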
> > > +/*
> > > + * Available counters
> > > + */
> > > +#define ARMV7_CNT0 0 /* First event counter */
> > > +#define ARMV7_CCNT 31 /* Cycle counter */
> > > +
> > > +#define ARMV7_A8_CNTMAX 5 /* Cortex-A8: up to 4 counters + CCNT */
> > > +#define ARMV7_A9_CNTMAX 32 /* Cortex-A9: up to 31 counters + CCNT*/
> >
> > Actually, A9 has a maximum number of 6 event counters + CCNT.
> Cf. remark above. The code is generic enough and supports up to the 1+31
> events as defined in the A8 and A9 TRMs. The number of counters is
> dynamically read from the PMNC registers. Should that be compared against the
> given maximum (1+4 for A8, 1+6 for A9)? That looks like overkill.
Sure, I was just referring to ARMV7_A9_CNTMAX being artificially high.
You'll never see more than 6 event counters on an A9.
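For illustration only, a sketch in C of what tighter limits could look like.
It assumes the number of event counters is the N field in bits [15:11] of the
PMNC value; the macro and function names are hypothetical, and the function
takes the register value as an argument rather than reading the hardware. The
clamp is optional: as you say, trusting the value read from the register may
well be enough.

```c
#include <assert.h>
#include <stdint.h>

#define PMNC_N_SHIFT           11   /* N field: number of event counters */
#define PMNC_N_MASK            0x1f

#define A8_MAX_EVENT_COUNTERS  4    /* Cortex-A8: up to 4 counters + CCNT */
#define A9_MAX_EVENT_COUNTERS  6    /* Cortex-A9: up to 6 counters + CCNT */

/* Number of event counters reported by a PMNC register value. */
static inline unsigned int pmnc_event_counters(uint32_t pmnc)
{
	return (pmnc >> PMNC_N_SHIFT) & PMNC_N_MASK;
}

/*
 * Total usable counters (event counters + cycle counter), clamped to the
 * per-core maximum in case the register reports more than expected.
 */
static inline unsigned int total_counters(uint32_t pmnc, unsigned int core_max)
{
	unsigned int n = pmnc_event_counters(pmnc);

	assert(core_max <= PMNC_N_MASK);
	if (n > core_max)
		n = core_max;
	return n + 1; /* +1 for CCNT */
}
```

A caller would pass the value read from the PMNC register and the limit for
the detected core, e.g. total_counters(pmnc, A9_MAX_EVENT_COUNTERS).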
> > It might also be
> > worth adding a cpu_architecture() check to the v6 test just in case a
> > v7 core conflicts with the mask.
> Jamie, what do you think?
I forgot that it looked at the MMU. Oh well, the ordering will have to matter.
Cheers,
Will