From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)
Date: Mon, 4 Jan 2010 16:52:01 -0000 [thread overview]
Message-ID: <000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com> (raw)
In-Reply-To: <200912291458.20609.jpihet@mvista.com>
Hi Jean,
Thanks for this. Feedback inline.
* Jean Pihet wrote:
> Here is the updated patch after testing on HW.
> I will rebase it on Jamie's latest patch set as soon as they are out.
>
> Feedback is welcome!
> Some remarks and questions:
>
> 1) The number of available counters can reach 32 on ARMv7, so the macro
> ARMPMU_MAX_HWEVENTS is now defined as 32. Is that correct?
I think index 0 is reserved for the events array, so this should probably be 33.
Hopefully that doesn't break any bitmasks.
> 2) Please note that the Cortex-A9 events do not easily map to the predefined
> events. Cf. armv7_a9_perf_map and armv7_a9_perf_cache_map in the code.
> - the PERF_COUNT_HW_INSTRUCTIONS event is not found. It looks like the number
> of instructions is calculated by adding events numbers (events from 0x70 till
> 0x74: MAIN_UNIT_EXECUTED_INST, SECOND_UNIT_EXECUTED_INST,
> LD_ST_UNIT_EXECUTED_INST, FP_EXECUTED_INST and NEON_EXECUTED_INST),
Event 0x68 - `Instructions coming out of the core renaming stage'
can be used as an approximation for PERF_COUNT_HW_INSTRUCTIONS.
> - the HW_BRANCH events are not found
0x0c is HW_BRANCH_INSTRUCTIONS and 0x10 is HW_BRANCH_MISSES.
0x12 is the number of predictable branch instructions executed, so the mispredict
rate is 0x10/0x12. These events are defined for v7, so A8 should take these
definitions too.
> - the global cache events 0x50 and 0x51 define the COHERENT_LINE_HIT and
> COHERENT_LINE_MISS events, is that correct?
0x50 is COHERENT_LINE_MISS and 0x51 is COHERENT_LINE_HIT. These are only available
on A9 with SMP.
> - L1 and L2 cache events are not found. Those could be available in separate
> PL310 registers, TBC
We could use 0x01 for icache miss, 0x03 for dcache miss and 0x04 for dcache access.
> - no TLB events excepted the ITLB_MISS event are found.
I think 0x05 is DTLB_MISS.
<snip>
> +/*
> + * ARMv7 Cortex-A8 and Cortex-A9 Performance Events handling code.
> + *
> + * Copied from ARMv6 code, with the low level code inspired
> + * by the ARMv7 Oprofile code.
> + *
> + * Cortex-A8 has up to 4 configurable performance counters and
> + * a single cycle counter.
> + * Cortex-A9 has up to 31 configurable performance counters and
> + * a single cycle counter.
> + *
> + * All counters can be enabled/disabled and IRQ masked separately. The cycle
> + * counter and all 4 performance counters together can be reset separately.
> + */
> +
> +#define ARMV7_PMU_CORTEX_A8_NAME "ARMv7 Cortex-A8"
> +
> +#define ARMV7_PMU_CORTEX_A9_NAME "ARMv7 Cortex-A9"
> +
> +/* Common ARMv7 event types */
> +enum armv7_perf_types {
> + ARMV7_PERFCTR_PMNC_SW_INCR = 0x00,
> + ARMV7_PERFCTR_IFETCH_MISS = 0x01,
> + ARMV7_PERFCTR_ITLB_MISS = 0x02,
> + ARMV7_PERFCTR_DCACHE_REFILL = 0x03,
> + ARMV7_PERFCTR_DCACHE_ACCESS = 0x04,
> + ARMV7_PERFCTR_DTLB_REFILL = 0x05,
> + ARMV7_PERFCTR_DREAD = 0x06,
> + ARMV7_PERFCTR_DWRITE = 0x07,
> +
> + ARMV7_PERFCTR_EXC_TAKEN = 0x09,
> + ARMV7_PERFCTR_EXC_EXECUTED = 0x0A,
> + ARMV7_PERFCTR_CID_WRITE = 0x0B,
> + ARMV7_PERFCTR_PC_WRITE = 0x0C,
> + ARMV7_PERFCTR_PC_IMM_BRANCH = 0x0D,
> + ARMV7_PERFCTR_UNALIGNED_ACCESS = 0x0F,
> + ARMV7_PERFCTR_PC_BRANCH_MIS_PRED = 0x10,
> + ARMV7_PERFCTR_CLOCK_CYCLES = 0x11,
> +
> + ARMV7_PERFCTR_PC_BRANCH_MIS_USED = 0x12,
> +
> + ARMV7_PERFCTR_CPU_CYCLES = 0xFF
> +};
For consistency, I think it's better to stick to either MISS or REFILL.
Events 0x01 and 0x03 are the same, but for instructions and data respectively.
<snip>
> +static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
> + [PERF_COUNT_HW_CACHE_OP_MAX]
> + [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> + [C(L1D)] = {
> + /*
> + * The performance counters don't differentiate between read
> + * and write accesses/misses so this isn't strictly correct,
> + * but it's the best we can do. Writes and reads get
> + * combined.
> + */
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_DCACHE_ACCESS,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_DATA_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_DCACHE_ACCESS,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_DATA_MISS,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
> + [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
> + },
> + },
You're using the A8-only event 0x49 [ARMV7_PERFCTR_L1_DATA_MISS]. This also
counts hash misses in the address translation during the cache lookup procedure,
even if the resulting access is a [slightly slower] hit. I think you're better off
using event 0x03. In fact, I'd try to use architecturally defined events whenever
you can because that allows for comparisons between v7 cores.
> + [C(L1I)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = ARMV7_PERFCTR_L1_INST,
> + [C(RESULT_MISS)] = ARMV7_PERFCTR_L1_INST_MISS,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
> + [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
> + },
> + },
Same thing here. I'd suggest using 0x01 instead of 0x4a.
<snip>
> +/*
> + * Available counters
> + */
> +#define ARMV7_CNT0 0 /* First event counter */
> +#define ARMV7_CCNT 31 /* Cycle counter */
> +
> +#define ARMV7_A8_CNTMAX 5 /* Cortex-A8: up to 4 counters + CCNT */
> +#define ARMV7_A9_CNTMAX 32 /* Cortex-A9: up to 31 counters + CCNT*/
Actually, A9 has a maximum number of 6 event counters + CCNT.
<snip>
> +#define CPUID_V6_MASK 0x7F000
> +#define CPUID_V6_BITS 0x7B000
> +
> +#define CPUID_CORTEX_A8_BITS 0xC080
> +#define CPUID_CORTEX_A8_MASK 0xFFF0
> +
> +#define CPUID_CORTEX_A9_BITS 0xC090
> +#define CPUID_CORTEX_A9_MASK 0xFFF0
Just define CPUID_V7_MASK.
> + unsigned long cpuid = read_cpuid_id();
> +
> + /*
> + * ARMv6 detection
> + */
> + if (CPUID_V6_BITS == (cpuid & CPUID_V6_MASK)) {
> + armpmu = &armv6pmu;
> + memcpy(armpmu_perf_cache_map, armv6_perf_cache_map,
> + sizeof(armv6_perf_cache_map));
> + perf_max_events = armv6pmu.num_events;
> + }
> + /*
> + * ARMv7 detection
> + */
> + else if (cpu_architecture() == CPU_ARCH_ARMv7) {
> + /*
> + * Cortex-A8 detection
> + */
> + if ((cpuid & CPUID_CORTEX_A8_MASK) == CPUID_CORTEX_A8_BITS) {
> + armv7pmu.name = ARMV7_PMU_CORTEX_A8_NAME;
> + memcpy(armpmu_perf_cache_map, armv7_a8_perf_cache_map,
> + sizeof(armv7_a8_perf_cache_map));
> + armv7pmu.event_map = armv7_a8_pmu_event_map;
> + armpmu = &armv7pmu;
> + } else
> + /*
> + * Cortex-A9 detection
> + */
> + if ((cpuid & CPUID_CORTEX_A9_MASK)
> + == CPUID_CORTEX_A9_BITS) {
> + armv7pmu.name = ARMV7_PMU_CORTEX_A9_NAME;
> + memcpy(armpmu_perf_cache_map,
> + armv7_a9_perf_cache_map,
> + sizeof(armv7_a9_perf_cache_map));
> + armv7pmu.event_map = armv7_a9_pmu_event_map;
> + armpmu = &armv7pmu;
> + } else
> + perf_max_events = -1;
The A9 code indentation is a level too deep I think. It might also be
worth adding a cpu_architecture() check to the v6 test just in case a
v7 core conflicts with the mask.
I hope that helps. Please let me know what you think about the event
numberings.
Cheers,
Will
next prev parent reply other threads:[~2010-01-04 16:52 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-15 11:15 ARMv6 performance counters v3 Jamie Iles
2009-12-15 11:15 ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Jamie Iles
2009-12-15 11:15 ` [PATCH 2/5] arm/oprofile: reserve the PMU when starting Jamie Iles
2009-12-15 11:15 ` [PATCH 3/5] arm: use the spinlocked, generic atomic64 support Jamie Iles
2009-12-15 11:15 ` [PATCH 4/5] arm: enable support for software perf events Jamie Iles
2009-12-15 11:15 ` [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6 Jamie Iles
2009-12-15 14:29 ` Will Deacon
2009-12-15 15:02 ` Jamie Iles
2009-12-15 15:05 ` Will Deacon
2009-12-15 15:19 ` Jamie Iles
2009-12-15 15:30 ` Peter Zijlstra
2009-12-15 15:36 ` Jamie Iles
2009-12-16 10:54 ` Jamie Iles
2009-12-16 11:04 ` Will Deacon
2009-12-16 11:19 ` Jamie Iles
2009-12-18 17:05 ` Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6) Jean Pihet
2009-12-19 10:29 ` Jamie Iles
2009-12-19 10:53 ` Ingo Molnar
2009-12-21 11:32 ` Jean Pihet
2009-12-21 11:29 ` Jean Pihet
2009-12-21 11:04 ` Will Deacon
2009-12-21 11:43 ` Jean Pihet
2009-12-21 12:10 ` Will Deacon
2009-12-21 12:43 ` Jamie Iles
2009-12-21 13:35 ` Jean Pihet
2009-12-22 16:51 ` Jean Pihet
2009-12-28 7:57 ` Ingo Molnar
2009-12-29 13:52 ` Jean Pihet
2009-12-29 16:32 ` Jamie Iles
2010-01-06 15:16 ` Michał Nazarewicz
2010-01-06 15:30 ` Jamie Iles
2010-01-07 17:02 ` Michał Nazarewicz
2009-12-29 13:58 ` Jean Pihet
2010-01-04 16:52 ` Will Deacon [this message]
2010-01-15 15:30 ` Jean Pihet
2010-01-15 15:39 ` Jamie Iles
2010-01-15 15:43 ` Jean Pihet
2010-01-15 15:49 ` Jamie Iles
2010-01-20 13:40 ` Will Deacon
2010-01-08 22:17 ` Woodruff, Richard
2010-01-15 15:34 ` Jean Pihet
2009-12-15 14:13 ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Will Deacon
2009-12-15 14:36 ` Jamie Iles
2009-12-15 17:06 ` Will Deacon
2009-12-17 16:14 ` Will Deacon
2009-12-17 16:27 ` Jamie Iles
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com' \
--to=will.deacon@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.