linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)
Date: Mon, 4 Jan 2010 16:52:01 -0000	[thread overview]
Message-ID: <000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com> (raw)
In-Reply-To: <200912291458.20609.jpihet@mvista.com>

Hi Jean,

Thanks for this. Feedback inline.

* Jean Pihet wrote:

> Here is the updated patch after testing on HW.
> I will rebase it on Jamie's latest patch set as soon as they are out.
> 
> Feedback is welcome!

> Some remarks and questions:
> 
> 1) The number of available counters can reach 32 on ARMv7, so the macro
> ARMPMU_MAX_HWEVENTS is now defined as 32. Is that correct?

I think index 0 is reserved for the events array, so this should probably be 33.
Hopefully that doesn't break any bitmasks.

> 2) Please note that the Cortex-A9 events do not easily map to the predefined
> events. Cf. armv7_a9_perf_map and armv7_a9_perf_cache_map in the code.
> - the PERF_COUNT_HW_INSTRUCTIONS event is not found. It looks like the number
> of instructions is calculated by adding events numbers (events from 0x70 till
> 0x74: MAIN_UNIT_EXECUTED_INST, SECOND_UNIT_EXECUTED_INST,
> LD_ST_UNIT_EXECUTED_INST, FP_EXECUTED_INST and NEON_EXECUTED_INST),

Event 0x68 - `Instructions coming out of the core renaming stage'
can be used as an approximation for PERF_COUNT_HW_INSTRUCTIONS.

> - the HW_BRANCH events are not found

0x0c is HW_BRANCH_INSTRUCTIONS and 0x10 is HW_BRANCH_MISSES.
0x12 is the number of predictable branch instructions executed, so the mispredict
rate is 0x10/0x12. These events are defined for v7, so A8 should take these
definitions too.

> - the global cache events 0x50 and 0x51 define the COHERENT_LINE_HIT and
> COHERENT_LINE_MISS events, is that correct?

0x50 is COHERENT_LINE_MISS and 0x51 is COHERENT_LINE_HIT. These are only available
on A9 with SMP.

> - L1 and L2 cache events are not found. Those could be available in separate
> PL310 registers, TBC

We could use 0x01 for icache miss, 0x03 for dcache miss and 0x04 for dcache access.

> - no TLB events excepted the ITLB_MISS event are found.

I think 0x05 is DTLB_MISS.

<snip>

> +/*
> + * ARMv7 Cortex-A8 and Cortex-A9 Performance Events handling code.
> + *
> + * Copied from ARMv6 code, with the low level code inspired
> + *  by the ARMv7 Oprofile code.
> + *
> + * Cortex-A8 has up to 4 configurable performance counters and
> + *  a single cycle counter.
> + * Cortex-A9 has up to 31 configurable performance counters and
> + *  a single cycle counter.
> + *
> + * All counters can be enabled/disabled and IRQ masked separately. The cycle
> + *  counter and all 4 performance counters together can be reset separately.
> + */
> +
> +#define ARMV7_PMU_CORTEX_A8_NAME		"ARMv7 Cortex-A8"
> +
> +#define ARMV7_PMU_CORTEX_A9_NAME		"ARMv7 Cortex-A9"
> +
> +/* Common ARMv7 event types */
> +enum armv7_perf_types {
> +	ARMV7_PERFCTR_PMNC_SW_INCR		= 0x00,
> +	ARMV7_PERFCTR_IFETCH_MISS		= 0x01,
> +	ARMV7_PERFCTR_ITLB_MISS			= 0x02,
> +	ARMV7_PERFCTR_DCACHE_REFILL		= 0x03,
> +	ARMV7_PERFCTR_DCACHE_ACCESS		= 0x04,
> +	ARMV7_PERFCTR_DTLB_REFILL		= 0x05,
> +	ARMV7_PERFCTR_DREAD			= 0x06,
> +	ARMV7_PERFCTR_DWRITE			= 0x07,
> +
> +	ARMV7_PERFCTR_EXC_TAKEN			= 0x09,
> +	ARMV7_PERFCTR_EXC_EXECUTED		= 0x0A,
> +	ARMV7_PERFCTR_CID_WRITE			= 0x0B,
> +	ARMV7_PERFCTR_PC_WRITE			= 0x0C,
> +	ARMV7_PERFCTR_PC_IMM_BRANCH		= 0x0D,
> +	ARMV7_PERFCTR_UNALIGNED_ACCESS		= 0x0F,
> +	ARMV7_PERFCTR_PC_BRANCH_MIS_PRED	= 0x10,
> +	ARMV7_PERFCTR_CLOCK_CYCLES		= 0x11,
> +
> +	ARMV7_PERFCTR_PC_BRANCH_MIS_USED	= 0x12,
> +
> +	ARMV7_PERFCTR_CPU_CYCLES		= 0xFF
> +};

For consistency, I think it's better to stick to either MISS or REFILL.
Events 0x01 and 0x03 are the same, but for instructions and data respectively.

<snip>

> +static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
> +					  [PERF_COUNT_HW_CACHE_OP_MAX]
> +					  [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> +	[C(L1D)] = {
> +		/*
> +		 * The performance counters don't differentiate between read
> +		 * and write accesses/misses so this isn't strictly correct,
> +		 * but it's the best we can do. Writes and reads get
> +		 * combined.
> +		 */
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_DCACHE_ACCESS,
> +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_DATA_MISS,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_DCACHE_ACCESS,
> +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_DATA_MISS,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
> +			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
> +		},
> +	},

You're using the A8-only event 0x49 [ARMV7_PERFCTR_L1_DATA_MISS]. This also
counts hash misses in the address translation during the cache lookup procedure,
even if the resulting access is a [slightly slower] hit. I think you're better off
using event 0x03. In fact, I'd try to use architecturally defined events whenever
you can because that allows for comparisons between v7 cores.

> +	[C(L1I)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L1_INST,
> +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_INST_MISS,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L1_INST,
> +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_INST_MISS,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
> +			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
> +		},
> +	},

Same thing here. I'd suggest using 0x01 instead of 0x4a.

<snip>

> +/*
> + * Available counters
> + */
> +#define ARMV7_CNT0 		0	/* First event counter */
> +#define ARMV7_CCNT 		31	/* Cycle counter */
> +
> +#define ARMV7_A8_CNTMAX		5	/* Cortex-A8: up to 4 counters + CCNT */
> +#define ARMV7_A9_CNTMAX		32	/* Cortex-A9: up to 31 counters + CCNT*/

Actually, A9 has a maximum number of 6 event counters + CCNT.

<snip>

> +#define CPUID_V6_MASK   	0x7F000
> +#define CPUID_V6_BITS   	0x7B000
> +
> +#define CPUID_CORTEX_A8_BITS	0xC080
> +#define CPUID_CORTEX_A8_MASK	0xFFF0
> +
> +#define CPUID_CORTEX_A9_BITS	0xC090
> +#define CPUID_CORTEX_A9_MASK	0xFFF0

Just define CPUID_V7_MASK.

> +	unsigned long cpuid = read_cpuid_id();
> +
> +	/*
> +	 * ARMv6 detection
> +	 */
> +	if (CPUID_V6_BITS == (cpuid & CPUID_V6_MASK)) {
> +		armpmu = &armv6pmu;
> +		memcpy(armpmu_perf_cache_map, armv6_perf_cache_map,
> +			sizeof(armv6_perf_cache_map));
> +		perf_max_events	= armv6pmu.num_events;
> +	}
> +	/*
> +	 * ARMv7 detection
> +	 */
> +	else if (cpu_architecture() == CPU_ARCH_ARMv7) {
> +		/*
> +		 * Cortex-A8 detection
> +		 */
> +		if ((cpuid & CPUID_CORTEX_A8_MASK) == CPUID_CORTEX_A8_BITS) {
> +			armv7pmu.name = ARMV7_PMU_CORTEX_A8_NAME;
> +			memcpy(armpmu_perf_cache_map, armv7_a8_perf_cache_map,
> +				sizeof(armv7_a8_perf_cache_map));
> +			armv7pmu.event_map = armv7_a8_pmu_event_map;
> +			armpmu = &armv7pmu;
> +		} else
> +		/*
> +		 * Cortex-A9 detection
> +		 */
> +			if ((cpuid & CPUID_CORTEX_A9_MASK)
> +			    == CPUID_CORTEX_A9_BITS) {
> +				armv7pmu.name = ARMV7_PMU_CORTEX_A9_NAME;
> +				memcpy(armpmu_perf_cache_map,
> +					armv7_a9_perf_cache_map,
> +					sizeof(armv7_a9_perf_cache_map));
> +				armv7pmu.event_map = armv7_a9_pmu_event_map;
> +				armpmu = &armv7pmu;
> +		} else
> +			perf_max_events = -1;

The A9 code indentation is a level too deep I think. It might also be
worth adding a cpu_architecture() check to the v6 test just in case a
v7 core conflicts with the mask.

I hope that helps. Please let me know what you think about the event
numberings.

Cheers,

Will

  reply	other threads:[~2010-01-04 16:52 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-15 11:15 ARMv6 performance counters v3 Jamie Iles
2009-12-15 11:15 ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Jamie Iles
2009-12-15 11:15   ` [PATCH 2/5] arm/oprofile: reserve the PMU when starting Jamie Iles
2009-12-15 11:15     ` [PATCH 3/5] arm: use the spinlocked, generic atomic64 support Jamie Iles
2009-12-15 11:15       ` [PATCH 4/5] arm: enable support for software perf events Jamie Iles
2009-12-15 11:15         ` [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6 Jamie Iles
2009-12-15 14:29           ` Will Deacon
2009-12-15 15:02             ` Jamie Iles
2009-12-15 15:05               ` Will Deacon
2009-12-15 15:19                 ` Jamie Iles
2009-12-15 15:30                   ` Peter Zijlstra
2009-12-15 15:36                     ` Jamie Iles
2009-12-16 10:54                       ` Jamie Iles
2009-12-16 11:04                         ` Will Deacon
2009-12-16 11:19                           ` Jamie Iles
2009-12-18 17:05           ` Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6) Jean Pihet
2009-12-19 10:29             ` Jamie Iles
2009-12-19 10:53               ` Ingo Molnar
2009-12-21 11:32                 ` Jean Pihet
2009-12-21 11:29               ` Jean Pihet
2009-12-21 11:04             ` Will Deacon
2009-12-21 11:43               ` Jean Pihet
2009-12-21 12:10                 ` Will Deacon
2009-12-21 12:43                   ` Jamie Iles
2009-12-21 13:35                     ` Jean Pihet
2009-12-22 16:51                       ` Jean Pihet
2009-12-28  7:57                         ` Ingo Molnar
2009-12-29 13:52                           ` Jean Pihet
2009-12-29 16:32                             ` Jamie Iles
2010-01-06 15:16                               ` Michał Nazarewicz
2010-01-06 15:30                                 ` Jamie Iles
2010-01-07 17:02                                   ` Michał Nazarewicz
2009-12-29 13:58                         ` Jean Pihet
2010-01-04 16:52                           ` Will Deacon [this message]
2010-01-15 15:30                             ` Jean Pihet
2010-01-15 15:39                               ` Jamie Iles
2010-01-15 15:43                                 ` Jean Pihet
2010-01-15 15:49                                   ` Jamie Iles
2010-01-20 13:40                               ` Will Deacon
2010-01-08 22:17                         ` Woodruff, Richard
2010-01-15 15:34                           ` Jean Pihet
2009-12-15 14:13   ` [PATCH 1/5] arm: provide a mechanism to reserve performance counters Will Deacon
2009-12-15 14:36     ` Jamie Iles
2009-12-15 17:06       ` Will Deacon
2009-12-17 16:14   ` Will Deacon
2009-12-17 16:27     ` Jamie Iles

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000101ca8d5e$3ce84e20$b6b8ea60$@deacon@arm.com' \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).