From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)
Date: Mon, 4 Jan 2010 16:52:01 -0000 [thread overview]
Message-ID: <000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com> (raw)
In-Reply-To: <200912291458.20609.jpihet@mvista.com>
Hi Jean,
Thanks for this. Feedback inline.
* Jean Pihet wrote:
> Here is the updated patch after testing on HW.
> I will rebase it on Jamie's latest patch set as soon as they are out.
>
> Feedback is welcome!
> Some remarks and questions:
>
> 1) The number of available counters can reach 32 on ARMv7, so the macro
> ARMPMU_MAX_HWEVENTS is now defined as 32. Is that correct?
I think index 0 is reserved for the events array, so this should probably be 33.
Hopefully that doesn't break any bitmasks.
> 2) Please note that the Cortex-A9 events do not easily map to the predefined
> events. Cf. armv7_a9_perf_map and armv7_a9_perf_cache_map in the code.
> - the PERF_COUNT_HW_INSTRUCTIONS event is not found. It looks like the number
> of instructions is calculated by adding events numbers (events from 0x70 till
> 0x74: MAIN_UNIT_EXECUTED_INST, SECOND_UNIT_EXECUTED_INST,
> LD_ST_UNIT_EXECUTED_INST, FP_EXECUTED_INST and NEON_EXECUTED_INST),
Event 0x68 - `Instructions coming out of the core renaming stage'
can be used as an approximation for PERF_COUNT_HW_INSTRUCTIONS.
> - the HW_BRANCH events are not found
0x0c is HW_BRANCH_INSTRUCTIONS and 0x10 is HW_BRANCH_MISSES.
0x12 is the number of predictable branch instructions executed, so the mispredict
rate is 0x10/0x12. These events are defined for v7, so A8 should take these
definitions too.
> - the global cache events 0x50 and 0x51 define the COHERENT_LINE_HIT and
> COHERENT_LINE_MISS events, is that correct?
0x50 is COHERENT_LINE_MISS and 0x51 is COHERENT_LINE_HIT. These are only available
on A9 with SMP.
> - L1 and L2 cache events are not found. Those could be available in separate
> PL310 registers, TBC
We could use 0x01 for icache miss, 0x03 for dcache miss and 0x04 for dcache access.
> - no TLB events excepted the ITLB_MISS event are found.
I think 0x05 is DTLB_MISS.
<snip>
> +/*
> + * ARMv7 Cortex-A8 and Cortex-A9 Performance Events handling code.
> + *
> + * Copied from ARMv6 code, with the low level code inspired
> + * by the ARMv7 Oprofile code.
> + *
> + * Cortex-A8 has up to 4 configurable performance counters and
> + * a single cycle counter.
> + * Cortex-A9 has up to 31 configurable performance counters and
> + * a single cycle counter.
> + *
> + * All counters can be enabled/disabled and IRQ masked separately. The cycle
> + * counter and all 4 performance counters together can be reset separately.
> + */
> +
> +#define ARMV7_PMU_CORTEX_A8_NAME "ARMv7 Cortex-A8"
> +
> +#define ARMV7_PMU_CORTEX_A9_NAME "ARMv7 Cortex-A9"
> +
> +/* Common ARMv7 event types */
> +enum armv7_perf_types {
> + ARMV7_PERFCTR_PMNC_SW_INCR = 0x00,
> + ARMV7_PERFCTR_IFETCH_MISS = 0x01,
> + ARMV7_PERFCTR_ITLB_MISS = 0x02,
> + ARMV7_PERFCTR_DCACHE_REFILL = 0x03,
> + ARMV7_PERFCTR_DCACHE_ACCESS = 0x04,
> + ARMV7_PERFCTR_DTLB_REFILL = 0x05,
> + ARMV7_PERFCTR_DREAD = 0x06,
> + ARMV7_PERFCTR_DWRITE = 0x07,
> +
> + ARMV7_PERFCTR_EXC_TAKEN = 0x09,
> + ARMV7_PERFCTR_EXC_EXECUTED = 0x0A,
> + ARMV7_PERFCTR_CID_WRITE = 0x0B,
> + ARMV7_PERFCTR_PC_WRITE = 0x0C,
> + ARMV7_PERFCTR_PC_IMM_BRANCH = 0x0D,
> + ARMV7_PERFCTR_UNALIGNED_ACCESS = 0x0F,
> + ARMV7_PERFCTR_PC_BRANCH_MIS_PRED = 0x10,
> + ARMV7_PERFCTR_CLOCK_CYCLES = 0x11,
> +
> + ARMV7_PERFCTR_PC_BRANCH_MIS_USED = 0x12,
> +
> + ARMV7_PERFCTR_CPU_CYCLES = 0xFF
> +};
For consistency, I think it's better to stick to either MISS or REFILL.
Events 0x01 and 0x03 are the same, but for instructions and data respectively.
<snip>
> +static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
> + [PERF_COUNT_HW_CACHE_OP_MAX]
> + [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> + [C(L1D)] = {
> + /*
> + * The performance counters don't differentiate between read
> + * and write accesses/misses so this isn't strictly correct,
> + * but it's the best we can do. Writes and reads get
> + * combined.
> + */
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_DCACHE_ACCESS,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_DATA_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_DCACHE_ACCESS,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_DATA_MISS,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
> + [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
> + },
> + },
You're using the A8-only event 0x49 [ARMV7_PERFCTR_L1_DATA_MISS]. This also
counts hash misses in the address translation during the cache lookup procedure,
even if the resulting access is a [slightly slower] hit. I think you're better off
using event 0x03. In fact, I'd try to use architecturally defined events whenever
you can because that allows for comparisons between v7 cores.
> + [C(L1I)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
> + [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
> + },
> + },
Same thing here. I'd suggest using 0x01 instead of 0x4a.
<snip>
> +/*
> + * Available counters
> + */
> +#define ARMV7_CNT0 0 /* First event counter */
> +#define ARMV7_CCNT 31 /* Cycle counter */
> +
> +#define ARMV7_A8_CNTMAX 5 /* Cortex-A8: up to 4 counters + CCNT */
> +#define ARMV7_A9_CNTMAX 32 /* Cortex-A9: up to 31 counters + CCNT*/
Actually, A9 has a maximum number of 6 event counters + CCNT.
<snip>
> +#define CPUID_V6_MASK 0x7F000
> +#define CPUID_V6_BITS 0x7B000
> +
> +#define CPUID_CORTEX_A8_BITS 0xC080
> +#define CPUID_CORTEX_A8_MASK 0xFFF0
> +
> +#define CPUID_CORTEX_A9_BITS 0xC090
> +#define CPUID_CORTEX_A9_MASK 0xFFF0
Just define CPUID_V7_MASK.
> + unsigned long cpuid = read_cpuid_id();
> +
> + /*
> + * ARMv6 detection
> + */
> + if (CPUID_V6_BITS == (cpuid & CPUID_V6_MASK)) {
> + armpmu = &armv6pmu;
> + memcpy(armpmu_perf_cache_map, armv6_perf_cache_map,
> + sizeof(armv6_perf_cache_map));
> + perf_max_events = armv6pmu.num_events;
> + }
> + /*
> + * ARMv7 detection
> + */
> + else if (cpu_architecture() == CPU_ARCH_ARMv7) {
> + /*
> + * Cortex-A8 detection
> + */
> + if ((cpuid & CPUID_CORTEX_A8_MASK) == CPUID_CORTEX_A8_BITS) {
> + armv7pmu.name = ARMV7_PMU_CORTEX_A8_NAME;
> + memcpy(armpmu_perf_cache_map, armv7_a8_perf_cache_map,
> + sizeof(armv7_a8_perf_cache_map));
> + armv7pmu.event_map = armv7_a8_pmu_event_map;
> + armpmu = &armv7pmu;
> + } else
> + /*
> + * Cortex-A9 detection
> + */
> + if ((cpuid & CPUID_CORTEX_A9_MASK)
> + == CPUID_CORTEX_A9_BITS) {
> + armv7pmu.name = ARMV7_PMU_CORTEX_A9_NAME;
> + memcpy(armpmu_perf_cache_map,
> + armv7_a9_perf_cache_map,
> + sizeof(armv7_a9_perf_cache_map));
> + armv7pmu.event_map = armv7_a9_pmu_event_map;
> + armpmu = &armv7pmu;
> + } else
> + perf_max_events = -1;
The A9 code indentation is a level too deep I think. It might also be
worth adding a cpu_architecture() check to the v6 test just in case a
v7 core conflicts with the mask.
I hope that helps. Please let me know what you think about the event
numberings.
Cheers,
Will
next prev parent reply other threads:[~2010-01-04 16:52 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-15 11:15 ARMv6 performance counters v3 Jamie Iles
2009-12-15 11:15 ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Jamie Iles
2009-12-15 11:15 ` [PATCH 2/5] arm/oprofile: reserve the PMU when starting Jamie Iles
2009-12-15 11:15 ` [PATCH 3/5] arm: use the spinlocked, generic atomic64 support Jamie Iles
2009-12-15 11:15 ` [PATCH 4/5] arm: enable support for software perf events Jamie Iles
2009-12-15 11:15 ` [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6 Jamie Iles
2009-12-15 14:29 ` Will Deacon
2009-12-15 15:02 ` Jamie Iles
2009-12-15 15:05 ` Will Deacon
2009-12-15 15:19 ` Jamie Iles
2009-12-15 15:30 ` Peter Zijlstra
2009-12-15 15:36 ` Jamie Iles
2009-12-16 10:54 ` Jamie Iles
2009-12-16 11:04 ` Will Deacon
2009-12-16 11:19 ` Jamie Iles
2009-12-18 17:05 ` Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6) Jean Pihet
2009-12-19 10:29 ` Jamie Iles
2009-12-19 10:53 ` Ingo Molnar
2009-12-21 11:32 ` Jean Pihet
2009-12-21 11:29 ` Jean Pihet
2009-12-21 11:04 ` Will Deacon
2009-12-21 11:43 ` Jean Pihet
2009-12-21 12:10 ` Will Deacon
2009-12-21 12:43 ` Jamie Iles
2009-12-21 13:35 ` Jean Pihet
2009-12-22 16:51 ` Jean Pihet
2009-12-28 7:57 ` Ingo Molnar
2009-12-29 13:52 ` Jean Pihet
2009-12-29 16:32 ` Jamie Iles
2010-01-06 15:16 ` Michał Nazarewicz
2010-01-06 15:30 ` Jamie Iles
2010-01-07 17:02 ` Michał Nazarewicz
2009-12-29 13:58 ` Jean Pihet
2010-01-04 16:52 ` Will Deacon [this message]
2010-01-15 15:30 ` Jean Pihet
2010-01-15 15:39 ` Jamie Iles
2010-01-15 15:43 ` Jean Pihet
2010-01-15 15:49 ` Jamie Iles
2010-01-20 13:40 ` Will Deacon
2010-01-08 22:17 ` Woodruff, Richard
2010-01-15 15:34 ` Jean Pihet
2009-12-15 14:13 ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Will Deacon
2009-12-15 14:36 ` Jamie Iles
2009-12-15 17:06 ` Will Deacon
2009-12-17 16:14 ` Will Deacon
2009-12-17 16:27 ` Jamie Iles
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com' \
--to=will.deacon@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).