* [Qemu-devel] [kvm-unit-tests PATCH v8 0/3] ARM PMU tests @ 2016-11-08 18:17 Wei Huang 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw) To: cov Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis, croberts, alindsay, drjones Changes from v7: * Standardize PMU register accessor names and remove unused ones * Use bit defines instead of bit fields * Change the test configuration for pmu.flat * Update commit messages Note: 1) Current KVM code has bugs in handling PMCCFILTR writes. A fix (see below) is required for this unit test code to work correctly under KVM mode. https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html. 2) Because the code was changed, Drew's original reviewed-by needs to be re-acknowledged by him. -Wei Wei Huang (3): arm: Add PMU test arm: pmu: Check cycle count increases arm: pmu: Add CPI checking arm/Makefile.common | 3 +- arm/pmu.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++++++++ arm/unittests.cfg | 19 ++++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 arm/pmu.c -- 1.8.3.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test 2016-11-08 18:17 [Qemu-devel] [kvm-unit-tests PATCH v8 0/3] ARM PMU tests Wei Huang @ 2016-11-08 18:17 ` Wei Huang 2016-11-11 7:37 ` Andrew Jones 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang 2 siblings, 1 reply; 18+ messages in thread From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw) To: cov Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis, croberts, alindsay, drjones From: Christopher Covington <cov@codeaurora.org> Beginning with a simple sanity check of the control register, add a unit test for the ARM Performance Monitors Unit (PMU). Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Wei Huang <wei@redhat.com> --- arm/Makefile.common | 3 ++- arm/pmu.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++ arm/unittests.cfg | 5 ++++ 3 files changed, 80 insertions(+), 1 deletion(-) create mode 100644 arm/pmu.c diff --git a/arm/Makefile.common b/arm/Makefile.common index ccb554d..f98f422 100644 --- a/arm/Makefile.common +++ b/arm/Makefile.common @@ -11,7 +11,8 @@ endif tests-common = \ $(TEST_DIR)/selftest.flat \ - $(TEST_DIR)/spinlock-test.flat + $(TEST_DIR)/spinlock-test.flat \ + $(TEST_DIR)/pmu.flat all: test_cases diff --git a/arm/pmu.c b/arm/pmu.c new file mode 100644 index 0000000..0b29088 --- /dev/null +++ b/arm/pmu.c @@ -0,0 +1,73 @@ +/* + * Test the ARM Performance Monitors Unit (PMU). + * + * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License version 2.1 and + * only version 2.1 as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License + * for more details. + */ +#include "libcflat.h" + +#define PMU_PMCR_N_SHIFT 11 +#define PMU_PMCR_N_MASK 0x1f +#define PMU_PMCR_ID_SHIFT 16 +#define PMU_PMCR_ID_MASK 0xff +#define PMU_PMCR_IMP_SHIFT 24 +#define PMU_PMCR_IMP_MASK 0xff + +#if defined(__arm__) +static inline uint32_t pmcr_read(void) +{ + uint32_t ret; + + asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); + return ret; +} +#elif defined(__aarch64__) +static inline uint32_t pmcr_read(void) +{ + uint32_t ret; + + asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); + return ret; +} +#endif + +/* + * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't + * null. Also print out a couple other interesting fields for diagnostic + * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement + * event counters and therefore reports zero event counters, but hopefully + * support for at least the instructions event will be added in the future and + * the reported number of event counters will become nonzero. 
+ */ +static bool check_pmcr(void) +{ + uint32_t pmcr; + + pmcr = pmcr_read(); + + printf("PMU implementer: %c\n", + (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK); + printf("Identification code: 0x%x\n", + (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK); + printf("Event counters: %d\n", + (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK); + + return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; +} + +int main(void) +{ + report_prefix_push("pmu"); + + report("Control register", check_pmcr()); + + return report_summary(); +} diff --git a/arm/unittests.cfg b/arm/unittests.cfg index 3f6fa45..7645180 100644 --- a/arm/unittests.cfg +++ b/arm/unittests.cfg @@ -54,3 +54,8 @@ file = selftest.flat smp = $MAX_SMP extra_params = -append 'smp' groups = selftest + +# Test PMU support +[pmu] +file = pmu.flat +groups = pmu -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang @ 2016-11-11 7:37 ` Andrew Jones 0 siblings, 0 replies; 18+ messages in thread From: Andrew Jones @ 2016-11-11 7:37 UTC (permalink / raw) To: Wei Huang Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Tue, Nov 08, 2016 at 12:17:13PM -0600, Wei Huang wrote: > From: Christopher Covington <cov@codeaurora.org> > > Beginning with a simple sanity check of the control register, add > a unit test for the ARM Performance Monitors Unit (PMU). > > Signed-off-by: Christopher Covington <cov@codeaurora.org> > Signed-off-by: Wei Huang <wei@redhat.com> > --- > arm/Makefile.common | 3 ++- > arm/pmu.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > arm/unittests.cfg | 5 ++++ > 3 files changed, 80 insertions(+), 1 deletion(-) > create mode 100644 arm/pmu.c Reviewed-by: Andrew Jones <drjones@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-08 18:17 [Qemu-devel] [kvm-unit-tests PATCH v8 0/3] ARM PMU tests Wei Huang 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang @ 2016-11-08 18:17 ` Wei Huang 2016-11-11 7:43 ` Andrew Jones 2016-11-16 15:40 ` Andrew Jones 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang 2 siblings, 2 replies; 18+ messages in thread From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw) To: cov Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis, croberts, alindsay, drjones From: Christopher Covington <cov@codeaurora.org> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, even for the smallest delta of two subsequent reads. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Wei Huang <wei@redhat.com> --- arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) diff --git a/arm/pmu.c b/arm/pmu.c index 0b29088..d5e3ac3 100644 --- a/arm/pmu.c +++ b/arm/pmu.c @@ -14,6 +14,7 @@ */ #include "libcflat.h" +#define PMU_PMCR_E (1 << 0) #define PMU_PMCR_N_SHIFT 11 #define PMU_PMCR_N_MASK 0x1f #define PMU_PMCR_ID_SHIFT 16 @@ -21,6 +22,10 @@ #define PMU_PMCR_IMP_SHIFT 24 #define PMU_PMCR_IMP_MASK 0xff +#define PMU_CYCLE_IDX 31 + +#define NR_SAMPLES 10 + #if defined(__arm__) static inline uint32_t pmcr_read(void) { @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); return ret; } + +static inline void pmcr_write(uint32_t value) +{ + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); +} + +static inline void pmselr_write(uint32_t value) +{ + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); +} + +static inline void pmxevtyper_write(uint32_t value) +{ + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); +} + +/* + * While PMCCNTR 
can be accessed as a 64 bit coprocessor register, returning 64 + * bits doesn't seem worth the trouble when differential usage of the result is + * expected (with differences that can easily fit in 32 bits). So just return + * the lower 32 bits of the cycle count in AArch32. + */ +static inline uint32_t pmccntr_read(void) +{ + uint32_t cycles; + + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); + return cycles; +} + +static inline void pmcntenset_write(uint32_t value) +{ + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); +} + +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ +static inline void pmccfiltr_write(uint32_t value) +{ + pmselr_write(PMU_CYCLE_IDX); + pmxevtyper_write(value); +} #elif defined(__aarch64__) static inline uint32_t pmcr_read(void) { @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); return ret; } + +static inline void pmcr_write(uint32_t value) +{ + asm volatile("msr pmcr_el0, %0" : : "r" (value)); +} + +static inline uint32_t pmccntr_read(void) +{ + uint32_t cycles; + + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); + return cycles; +} + +static inline void pmcntenset_write(uint32_t value) +{ + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); +} + +static inline void pmccfiltr_write(uint32_t value) +{ + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); +} #endif /* @@ -63,11 +132,40 @@ static bool check_pmcr(void) return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; } +/* + * Ensure that the cycle counter progresses between back-to-back reads. 
+ */ +static bool check_cycles_increase(void) +{ + pmcr_write(pmcr_read() | PMU_PMCR_E); + + for (int i = 0; i < NR_SAMPLES; i++) { + unsigned long a, b; + + a = pmccntr_read(); + b = pmccntr_read(); + + if (a >= b) { + printf("Read %ld then %ld.\n", a, b); + return false; + } + } + + pmcr_write(pmcr_read() & ~PMU_PMCR_E); + + return true; +} + int main(void) { report_prefix_push("pmu"); + /* init for PMU event access, right now only care about cycle count */ + pmcntenset_write(1 << PMU_CYCLE_IDX); + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ + report("Control register", check_pmcr()); + report("Monotonically increasing cycle count", check_cycles_increase()); return report_summary(); } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang @ 2016-11-11 7:43 ` Andrew Jones 2016-11-11 19:55 ` Wei Huang 2016-11-16 15:40 ` Andrew Jones 1 sibling, 1 reply; 18+ messages in thread From: Andrew Jones @ 2016-11-11 7:43 UTC (permalink / raw) To: Wei Huang Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > From: Christopher Covington <cov@codeaurora.org> > > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > even for the smallest delta of two subsequent reads. > > Signed-off-by: Christopher Covington <cov@codeaurora.org> > Signed-off-by: Wei Huang <wei@redhat.com> > --- > arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 98 insertions(+) > > diff --git a/arm/pmu.c b/arm/pmu.c > index 0b29088..d5e3ac3 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -14,6 +14,7 @@ > */ > #include "libcflat.h" > > +#define PMU_PMCR_E (1 << 0) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK 0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -21,6 +22,10 @@ > #define PMU_PMCR_IMP_SHIFT 24 > #define PMU_PMCR_IMP_MASK 0xff > > +#define PMU_CYCLE_IDX 31 > + > +#define NR_SAMPLES 10 > + > #if defined(__arm__) > static inline uint32_t pmcr_read(void) > { > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > +} > + > +static inline void pmselr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > +} > + > +static inline void pmxevtyper_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > +} > + > +/* > + * 
While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 > + * bits doesn't seem worth the trouble when differential usage of the result is > + * expected (with differences that can easily fit in 32 bits). So just return > + * the lower 32 bits of the cycle count in AArch32. Like I said in the last review, I'd rather we not do this. We should return the full value and then the test case should confirm the upper 32 bits are zero. > + */ > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); > +} > + > +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ > +static inline void pmccfiltr_write(uint32_t value) > +{ > + pmselr_write(PMU_CYCLE_IDX); > + pmxevtyper_write(value); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("msr pmcr_el0, %0" : : "r" (value)); > +} > + > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); > +} > + > +static inline void pmccfiltr_write(uint32_t value) > +{ > + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > +} > #endif > > /* > @@ -63,11 +132,40 @@ static bool check_pmcr(void) > return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; > } > > +/* > + * Ensure that the cycle counter progresses between back-to-back reads. 
> + */ > +static bool check_cycles_increase(void) > +{ > + pmcr_write(pmcr_read() | PMU_PMCR_E); > + > + for (int i = 0; i < NR_SAMPLES; i++) { > + unsigned long a, b; > + > + a = pmccntr_read(); > + b = pmccntr_read(); > + > + if (a >= b) { > + printf("Read %ld then %ld.\n", a, b); > + return false; > + } > + } > + > + pmcr_write(pmcr_read() & ~PMU_PMCR_E); > + > + return true; > +} > + > int main(void) > { > report_prefix_push("pmu"); > > + /* init for PMU event access, right now only care about cycle count */ > + pmcntenset_write(1 << PMU_CYCLE_IDX); > + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ > + > report("Control register", check_pmcr()); > + report("Monotonically increasing cycle count", check_cycles_increase()); > > return report_summary(); > } > -- > 1.8.3.1 Besides needing to use u64's for registers that return u64's, it looks good to me. drew ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-11 7:43 ` Andrew Jones @ 2016-11-11 19:55 ` Wei Huang 2016-11-14 10:05 ` Andrew Jones 0 siblings, 1 reply; 18+ messages in thread From: Wei Huang @ 2016-11-11 19:55 UTC (permalink / raw) To: Andrew Jones Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On 11/11/2016 01:43 AM, Andrew Jones wrote: > On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >> From: Christopher Covington <cov@codeaurora.org> >> >> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >> even for the smallest delta of two subsequent reads. >> >> Signed-off-by: Christopher Covington <cov@codeaurora.org> >> Signed-off-by: Wei Huang <wei@redhat.com> >> --- >> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 98 insertions(+) >> >> diff --git a/arm/pmu.c b/arm/pmu.c >> index 0b29088..d5e3ac3 100644 >> --- a/arm/pmu.c >> +++ b/arm/pmu.c >> @@ -14,6 +14,7 @@ >> */ >> #include "libcflat.h" >> >> +#define PMU_PMCR_E (1 << 0) >> #define PMU_PMCR_N_SHIFT 11 >> #define PMU_PMCR_N_MASK 0x1f >> #define PMU_PMCR_ID_SHIFT 16 >> @@ -21,6 +22,10 @@ >> #define PMU_PMCR_IMP_SHIFT 24 >> #define PMU_PMCR_IMP_MASK 0xff >> >> +#define PMU_CYCLE_IDX 31 >> + >> +#define NR_SAMPLES 10 >> + >> #if defined(__arm__) >> static inline uint32_t pmcr_read(void) >> { >> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >> return ret; >> } >> + >> +static inline void pmcr_write(uint32_t value) >> +{ >> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); >> +} >> + >> +static inline void pmselr_write(uint32_t value) >> +{ >> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >> +} >> + >> +static inline void pmxevtyper_write(uint32_t value) >> +{ >> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >> +} >> + >> +/* >> + * While 
PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 >> + * bits doesn't seem worth the trouble when differential usage of the result is >> + * expected (with differences that can easily fit in 32 bits). So just return >> + * the lower 32 bits of the cycle count in AArch32. > > Like I said in the last review, I'd rather we not do this. We should > return the full value and then the test case should confirm the upper > 32 bits are zero. > Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit register. We can force it to a more coarse-grained cycle counter with PMCR.D bit=1 (see below). But it is still not a 64-bit register. ARMv8 PMCCNTR_EL0 is a 64-bit register. "The PMCR.D bit configures whether PMCCNTR increments once every clock cycle, or once every 64 clock cycles. " So I think the comment above in the code is an overstatement, which should be deleted or moved down to ARMv8 pmccntr_read() below. >> + */ >> +static inline uint32_t pmccntr_read(void) >> +{ >> + uint32_t cycles; >> + >> + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); >> + return cycles; >> +} >> + >> +static inline void pmcntenset_write(uint32_t value) >> +{ >> + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); >> +} >> + >> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ >> +static inline void pmccfiltr_write(uint32_t value) >> +{ >> + pmselr_write(PMU_CYCLE_IDX); >> + pmxevtyper_write(value); >> +} >> #elif defined(__aarch64__) >> static inline uint32_t pmcr_read(void) >> { >> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) >> asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); >> return ret; >> } >> + >> +static inline void pmcr_write(uint32_t value) >> +{ >> + asm volatile("msr pmcr_el0, %0" : : "r" (value)); >> +} >> + >> +static inline uint32_t pmccntr_read(void) >> +{ >> + uint32_t cycles; >> + >> + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); >> + return cycles; >> +} >> + >> +static inline void 
pmcntenset_write(uint32_t value) >> +{ >> + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); >> +} >> + >> +static inline void pmccfiltr_write(uint32_t value) >> +{ >> + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); >> +} >> #endif >> >> /* >> @@ -63,11 +132,40 @@ static bool check_pmcr(void) >> return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; >> } >> >> +/* >> + * Ensure that the cycle counter progresses between back-to-back reads. >> + */ >> +static bool check_cycles_increase(void) >> +{ >> + pmcr_write(pmcr_read() | PMU_PMCR_E); >> + >> + for (int i = 0; i < NR_SAMPLES; i++) { >> + unsigned long a, b; >> + >> + a = pmccntr_read(); >> + b = pmccntr_read(); >> + >> + if (a >= b) { >> + printf("Read %ld then %ld.\n", a, b); >> + return false; >> + } >> + } >> + >> + pmcr_write(pmcr_read() & ~PMU_PMCR_E); >> + >> + return true; >> +} >> + >> int main(void) >> { >> report_prefix_push("pmu"); >> >> + /* init for PMU event access, right now only care about cycle count */ >> + pmcntenset_write(1 << PMU_CYCLE_IDX); >> + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ >> + >> report("Control register", check_pmcr()); >> + report("Monotonically increasing cycle count", check_cycles_increase()); >> >> return report_summary(); >> } >> -- >> 1.8.3.1 > > Besides needing to use u64's for registers that return u64's, it > looks good to me. > > drew > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-11 19:55 ` Wei Huang @ 2016-11-14 10:05 ` Andrew Jones 2016-11-14 15:12 ` Christopher Covington 0 siblings, 1 reply; 18+ messages in thread From: Andrew Jones @ 2016-11-14 10:05 UTC (permalink / raw) To: Wei Huang Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, cov, kvmarm, shannon.zhao On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > > > On 11/11/2016 01:43 AM, Andrew Jones wrote: > > On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > >> From: Christopher Covington <cov@codeaurora.org> > >> > >> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > >> even for the smallest delta of two subsequent reads. > >> > >> Signed-off-by: Christopher Covington <cov@codeaurora.org> > >> Signed-off-by: Wei Huang <wei@redhat.com> > >> --- > >> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 98 insertions(+) > >> > >> diff --git a/arm/pmu.c b/arm/pmu.c > >> index 0b29088..d5e3ac3 100644 > >> --- a/arm/pmu.c > >> +++ b/arm/pmu.c > >> @@ -14,6 +14,7 @@ > >> */ > >> #include "libcflat.h" > >> > >> +#define PMU_PMCR_E (1 << 0) > >> #define PMU_PMCR_N_SHIFT 11 > >> #define PMU_PMCR_N_MASK 0x1f > >> #define PMU_PMCR_ID_SHIFT 16 > >> @@ -21,6 +22,10 @@ > >> #define PMU_PMCR_IMP_SHIFT 24 > >> #define PMU_PMCR_IMP_MASK 0xff > >> > >> +#define PMU_CYCLE_IDX 31 > >> + > >> +#define NR_SAMPLES 10 > >> + > >> #if defined(__arm__) > >> static inline uint32_t pmcr_read(void) > >> { > >> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > >> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > >> return ret; > >> } > >> + > >> +static inline void pmcr_write(uint32_t value) > >> +{ > >> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > >> +} > >> + > >> +static inline void pmselr_write(uint32_t value) > >> +{ > >> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > 
>> +} > >> + > >> +static inline void pmxevtyper_write(uint32_t value) > >> +{ > >> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > >> +} > >> + > >> +/* > >> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 > >> + * bits doesn't seem worth the trouble when differential usage of the result is > >> + * expected (with differences that can easily fit in 32 bits). So just return > >> + * the lower 32 bits of the cycle count in AArch32. > > > > Like I said in the last review, I'd rather we not do this. We should > > return the full value and then the test case should confirm the upper > > 32 bits are zero. > > > > Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > register. We can force it to a more coarse-grained cycle counter with > PMCR.D bit=1 (see below). But it is still not a 64-bit register. ARMv8 > PMCCNTR_EL0 is a 64-bit register. > > "The PMCR.D bit configures whether PMCCNTR increments once every clock > cycle, or once every 64 clock cycles. " > > So I think the comment above in the code is an overstatement, which > should be deleted or moved down to ARMv8 pmccntr_read() below. OK, please fix as appropriate, but for the v8 64-bit register, please don't drop the upper bits until after a unit test has a chance to check them. 
Thanks, drew > > >> + */ > >> +static inline uint32_t pmccntr_read(void) > >> +{ > >> + uint32_t cycles; > >> + > >> + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); > >> + return cycles; > >> +} > >> + > >> +static inline void pmcntenset_write(uint32_t value) > >> +{ > >> + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); > >> +} > >> + > >> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ > >> +static inline void pmccfiltr_write(uint32_t value) > >> +{ > >> + pmselr_write(PMU_CYCLE_IDX); > >> + pmxevtyper_write(value); > >> +} > >> #elif defined(__aarch64__) > >> static inline uint32_t pmcr_read(void) > >> { > >> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) > >> asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); > >> return ret; > >> } > >> + > >> +static inline void pmcr_write(uint32_t value) > >> +{ > >> + asm volatile("msr pmcr_el0, %0" : : "r" (value)); > >> +} > >> + > >> +static inline uint32_t pmccntr_read(void) > >> +{ > >> + uint32_t cycles; > >> + > >> + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); > >> + return cycles; > >> +} > >> + > >> +static inline void pmcntenset_write(uint32_t value) > >> +{ > >> + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); > >> +} > >> + > >> +static inline void pmccfiltr_write(uint32_t value) > >> +{ > >> + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > >> +} > >> #endif > >> > >> /* > >> @@ -63,11 +132,40 @@ static bool check_pmcr(void) > >> return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; > >> } > >> > >> +/* > >> + * Ensure that the cycle counter progresses between back-to-back reads. 
> >> + */ > >> +static bool check_cycles_increase(void) > >> +{ > >> + pmcr_write(pmcr_read() | PMU_PMCR_E); > >> + > >> + for (int i = 0; i < NR_SAMPLES; i++) { > >> + unsigned long a, b; > >> + > >> + a = pmccntr_read(); > >> + b = pmccntr_read(); > >> + > >> + if (a >= b) { > >> + printf("Read %ld then %ld.\n", a, b); > >> + return false; > >> + } > >> + } > >> + > >> + pmcr_write(pmcr_read() & ~PMU_PMCR_E); > >> + > >> + return true; > >> +} > >> + > >> int main(void) > >> { > >> report_prefix_push("pmu"); > >> > >> + /* init for PMU event access, right now only care about cycle count */ > >> + pmcntenset_write(1 << PMU_CYCLE_IDX); > >> + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ > >> + > >> report("Control register", check_pmcr()); > >> + report("Monotonically increasing cycle count", check_cycles_increase()); > >> > >> return report_summary(); > >> } > >> -- > >> 1.8.3.1 > > > > Besides needing to use u64's for registers that return u64's, it > > looks good to me. > > > > drew > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-14 10:05 ` Andrew Jones @ 2016-11-14 15:12 ` Christopher Covington 2016-11-15 22:50 ` Wei Huang 0 siblings, 1 reply; 18+ messages in thread From: Christopher Covington @ 2016-11-14 15:12 UTC (permalink / raw) To: Andrew Jones, Wei Huang Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, kvmarm, shannon.zhao Hi Drew, Wei, On 11/14/2016 05:05 AM, Andrew Jones wrote: > On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: >> >> >> On 11/11/2016 01:43 AM, Andrew Jones wrote: >>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >>>> From: Christopher Covington <cov@codeaurora.org> >>>> >>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >>>> even for the smallest delta of two subsequent reads. >>>> >>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> >>>> Signed-off-by: Wei Huang <wei@redhat.com> >>>> --- >>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> 1 file changed, 98 insertions(+) >>>> >>>> diff --git a/arm/pmu.c b/arm/pmu.c >>>> index 0b29088..d5e3ac3 100644 >>>> --- a/arm/pmu.c >>>> +++ b/arm/pmu.c >>>> @@ -14,6 +14,7 @@ >>>> */ >>>> #include "libcflat.h" >>>> >>>> +#define PMU_PMCR_E (1 << 0) >>>> #define PMU_PMCR_N_SHIFT 11 >>>> #define PMU_PMCR_N_MASK 0x1f >>>> #define PMU_PMCR_ID_SHIFT 16 >>>> @@ -21,6 +22,10 @@ >>>> #define PMU_PMCR_IMP_SHIFT 24 >>>> #define PMU_PMCR_IMP_MASK 0xff >>>> >>>> +#define PMU_CYCLE_IDX 31 >>>> + >>>> +#define NR_SAMPLES 10 >>>> + >>>> #if defined(__arm__) >>>> static inline uint32_t pmcr_read(void) >>>> { >>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >>>> return ret; >>>> } >>>> + >>>> +static inline void pmcr_write(uint32_t value) >>>> +{ >>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); >>>> +} >>>> + >>>> +static inline void pmselr_write(uint32_t 
value) >>>> +{ >>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >>>> +} >>>> + >>>> +static inline void pmxevtyper_write(uint32_t value) >>>> +{ >>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >>>> +} >>>> + >>>> +/* >>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 >>>> + * bits doesn't seem worth the trouble when differential usage of the result is >>>> + * expected (with differences that can easily fit in 32 bits). So just return >>>> + * the lower 32 bits of the cycle count in AArch32. >>> >>> Like I said in the last review, I'd rather we not do this. We should >>> return the full value and then the test case should confirm the upper >>> 32 bits are zero. >> >> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit >> register. We can force it to a more coarse-grained cycle counter with >> PMCR.D bit=1 (see below). But it is still not a 64-bit register. AArch32 System Register Descriptions Performance Monitors registers PMCCNTR, Performance Monitors Cycle Count Register To access the PMCCNTR when accessing as a 32-bit register: MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged To access the PMCCNTR when accessing as a 64-bit register: MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] Regards, Cov -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-14 15:12 ` Christopher Covington @ 2016-11-15 22:50 ` Wei Huang 2016-11-16 13:01 ` Andrew Jones 0 siblings, 1 reply; 18+ messages in thread From: Wei Huang @ 2016-11-15 22:50 UTC (permalink / raw) To: Christopher Covington, Andrew Jones Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, kvmarm, shannon.zhao On 11/14/2016 09:12 AM, Christopher Covington wrote: > Hi Drew, Wei, > > On 11/14/2016 05:05 AM, Andrew Jones wrote: >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: >>> >>> >>> On 11/11/2016 01:43 AM, Andrew Jones wrote: >>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >>>>> From: Christopher Covington <cov@codeaurora.org> >>>>> >>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >>>>> even for the smallest delta of two subsequent reads. >>>>> >>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> >>>>> Signed-off-by: Wei Huang <wei@redhat.com> >>>>> --- >>>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 98 insertions(+) >>>>> >>>>> diff --git a/arm/pmu.c b/arm/pmu.c >>>>> index 0b29088..d5e3ac3 100644 >>>>> --- a/arm/pmu.c >>>>> +++ b/arm/pmu.c >>>>> @@ -14,6 +14,7 @@ >>>>> */ >>>>> #include "libcflat.h" >>>>> >>>>> +#define PMU_PMCR_E (1 << 0) >>>>> #define PMU_PMCR_N_SHIFT 11 >>>>> #define PMU_PMCR_N_MASK 0x1f >>>>> #define PMU_PMCR_ID_SHIFT 16 >>>>> @@ -21,6 +22,10 @@ >>>>> #define PMU_PMCR_IMP_SHIFT 24 >>>>> #define PMU_PMCR_IMP_MASK 0xff >>>>> >>>>> +#define PMU_CYCLE_IDX 31 >>>>> + >>>>> +#define NR_SAMPLES 10 >>>>> + >>>>> #if defined(__arm__) >>>>> static inline uint32_t pmcr_read(void) >>>>> { >>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >>>>> return ret; >>>>> } >>>>> + >>>>> +static inline void pmcr_write(uint32_t value) >>>>> +{ >>>>> + asm volatile("mcr 
p15, 0, %0, c9, c12, 0" : : "r" (value)); >>>>> +} >>>>> + >>>>> +static inline void pmselr_write(uint32_t value) >>>>> +{ >>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >>>>> +} >>>>> + >>>>> +static inline void pmxevtyper_write(uint32_t value) >>>>> +{ >>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >>>>> +} >>>>> + >>>>> +/* >>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 >>>>> + * bits doesn't seem worth the trouble when differential usage of the result is >>>>> + * expected (with differences that can easily fit in 32 bits). So just return >>>>> + * the lower 32 bits of the cycle count in AArch32. >>>> >>>> Like I said in the last review, I'd rather we not do this. We should >>>> return the full value and then the test case should confirm the upper >>>> 32 bits are zero. >>> >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit >>> register. We can force it to a more coarse-grained cycle counter with >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > > AArch32 System Register Descriptions > Performance Monitors registers > PMCCNTR, Performance Monitors Cycle Count Register > > To access the PMCCNTR when accessing as a 32-bit register: > MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged > > To access the PMCCNTR when accessing as a 64-bit register: > MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 > MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] > Thanks. I did some research based on your info and came back with the following proposals (Cov, correct me if I am wrong): By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I think this 64-bit cycle register is only available when running under aarch32 compatibility mode on ARMv8 because it is not specified in A15 TRM. 
To further verify it, I tested 32-bit PMU code on QEMU with TCG mode. The result is that accessing the 64-bit PMCCNTR using either of the following assembly forms failed on A15:

asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
or
asm volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));

Given this difference, I think there are two solutions for a 64-bit AArch32 pmccntr_read(), as requested by Drew:

1) The PMU unit testing code detects whether it is running under ARMv7 or under AArch32 compatibility mode. When it is running on ARMv7, such as A15, use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32 bits to 0. Otherwise use "MRRC p15,0,<Rt>,<Rt2>,c9".

2) Return a 64-bit result from the ARM pmccntr_read(), but only use "MRC p15,0,<Rt>,c9,c13,0" and always clear the upper 32 bits to 0. This would be the same as the original code.

Thoughts?

-Wei

[1] A57 TRM,
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
[2] A15 TRM,
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438c/DDI0438C_cortex_a15_r2p0_trm.pdf

> Regards,
> Cov
>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-15 22:50 ` Wei Huang @ 2016-11-16 13:01 ` Andrew Jones 2016-11-16 16:08 ` Christopher Covington 0 siblings, 1 reply; 18+ messages in thread From: Andrew Jones @ 2016-11-16 13:01 UTC (permalink / raw) To: Wei Huang Cc: Christopher Covington, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: > > > On 11/14/2016 09:12 AM, Christopher Covington wrote: > > Hi Drew, Wei, > > > > On 11/14/2016 05:05 AM, Andrew Jones wrote: > >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > >>> > >>> > >>> On 11/11/2016 01:43 AM, Andrew Jones wrote: > >>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > >>>>> From: Christopher Covington <cov@codeaurora.org> > >>>>> > >>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > >>>>> even for the smallest delta of two subsequent reads. > >>>>> > >>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> > >>>>> Signed-off-by: Wei Huang <wei@redhat.com> > >>>>> --- > >>>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>> 1 file changed, 98 insertions(+) > >>>>> > >>>>> diff --git a/arm/pmu.c b/arm/pmu.c > >>>>> index 0b29088..d5e3ac3 100644 > >>>>> --- a/arm/pmu.c > >>>>> +++ b/arm/pmu.c > >>>>> @@ -14,6 +14,7 @@ > >>>>> */ > >>>>> #include "libcflat.h" > >>>>> > >>>>> +#define PMU_PMCR_E (1 << 0) > >>>>> #define PMU_PMCR_N_SHIFT 11 > >>>>> #define PMU_PMCR_N_MASK 0x1f > >>>>> #define PMU_PMCR_ID_SHIFT 16 > >>>>> @@ -21,6 +22,10 @@ > >>>>> #define PMU_PMCR_IMP_SHIFT 24 > >>>>> #define PMU_PMCR_IMP_MASK 0xff > >>>>> > >>>>> +#define PMU_CYCLE_IDX 31 > >>>>> + > >>>>> +#define NR_SAMPLES 10 > >>>>> + > >>>>> #if defined(__arm__) > >>>>> static inline uint32_t pmcr_read(void) > >>>>> { > >>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > >>>>> asm volatile("mrc p15, 0, 
%0, c9, c12, 0" : "=r" (ret)); > >>>>> return ret; > >>>>> } > >>>>> + > >>>>> +static inline void pmcr_write(uint32_t value) > >>>>> +{ > >>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > >>>>> +} > >>>>> + > >>>>> +static inline void pmselr_write(uint32_t value) > >>>>> +{ > >>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > >>>>> +} > >>>>> + > >>>>> +static inline void pmxevtyper_write(uint32_t value) > >>>>> +{ > >>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > >>>>> +} > >>>>> + > >>>>> +/* > >>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 > >>>>> + * bits doesn't seem worth the trouble when differential usage of the result is > >>>>> + * expected (with differences that can easily fit in 32 bits). So just return > >>>>> + * the lower 32 bits of the cycle count in AArch32. > >>>> > >>>> Like I said in the last review, I'd rather we not do this. We should > >>>> return the full value and then the test case should confirm the upper > >>>> 32 bits are zero. > >>> > >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > >>> register. We can force it to a more coarse-grained cycle counter with > >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > > > > AArch32 System Register Descriptions > > Performance Monitors registers > > PMCCNTR, Performance Monitors Cycle Count Register > > > > To access the PMCCNTR when accessing as a 32-bit register: > > MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > > MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged > > > > To access the PMCCNTR when accessing as a 64-bit register: > > MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 > > MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] > > > > Thanks. 
I did some research based on your info and came back with the > following proposals (Cov, correct me if I am wrong): > > By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I > think this 64-bit cycle register is only available when running under > aarch32 compatibility mode on ARMv8 because it is not specified in A15 > TRM. OK, I hadn't realized that there would be differences between v7 and AArch32. It looks like we need to add a function to the kvm-unit-tests framework that enables unit tests to make that distinction, because we'll want to explicitly test those differences in order to flush out emulation bugs. I see now that Appendix K5 of the v8 ARM ARM lists some differences, but this PMCCNTR difference isn't there... As v8-A32 is an update/extension of v7-A, I'd expect there to be a RES0 bit in some v7 ID register that, on v8, is no longer reserved and a 1. Unfortunately I just did some ARM doc skimming but can't find anything like that. As we currently only use the cortex-a15 for our v7 processor, then I guess we can just check MIDR, but yuck. Anyway, I'll send a patch for that. > To further verify it, I tested 32-bit pmu code on QEMU with TCG > mode. The result is: accessing 64-bit PMCCNTR using the following > assembly failed on A15: > > volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); > or > volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); > > Given this difference, I think there are two solutions for 64-bit > AArch32 pmccntr_read, as requested by Drew: > > 1) The PMU unit testing code tells if it is running under ARMv7 or under > AArch32-compability mode. When it is running ARMv7, such as A15, let us > use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise > use "MRRC p15,0,<Rt>,<Rt2>,c9". > > 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC > p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will > be the same as the original code. 
3) For the basic test do (2), but add an additional test for AArch32 mode that also does the MRRC. That way on AArch32 we test both access types. Going with (3) means we can finish this series off now and then post another patch later with the additional access, after my is_aarch32() patch, that I'll write now, gets merged. Thanks, drew > > Thoughts? > > -Wei > > [1] A57 TRM, > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf > [2] A15 TRM, > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438c/DDI0438C_cortex_a15_r2p0_trm.pdf > > > Regards, > > Cov > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-16 13:01 ` Andrew Jones @ 2016-11-16 16:08 ` Christopher Covington 2016-11-16 16:25 ` Andrew Jones 0 siblings, 1 reply; 18+ messages in thread From: Christopher Covington @ 2016-11-16 16:08 UTC (permalink / raw) To: Andrew Jones, Wei Huang Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm, Peter Maydell On 11/16/2016 08:01 AM, Andrew Jones wrote: > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: >> >> >> On 11/14/2016 09:12 AM, Christopher Covington wrote: >>> Hi Drew, Wei, >>> >>> On 11/14/2016 05:05 AM, Andrew Jones wrote: >>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: >>>>> >>>>> >>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote: >>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >>>>>>> From: Christopher Covington <cov@codeaurora.org> >>>>>>> >>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >>>>>>> even for the smallest delta of two subsequent reads. 
>>>>>>> >>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> >>>>>>> Signed-off-by: Wei Huang <wei@redhat.com> >>>>>>> --- >>>>>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>>> 1 file changed, 98 insertions(+) >>>>>>> >>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c >>>>>>> index 0b29088..d5e3ac3 100644 >>>>>>> --- a/arm/pmu.c >>>>>>> +++ b/arm/pmu.c >>>>>>> @@ -14,6 +14,7 @@ >>>>>>> */ >>>>>>> #include "libcflat.h" >>>>>>> >>>>>>> +#define PMU_PMCR_E (1 << 0) >>>>>>> #define PMU_PMCR_N_SHIFT 11 >>>>>>> #define PMU_PMCR_N_MASK 0x1f >>>>>>> #define PMU_PMCR_ID_SHIFT 16 >>>>>>> @@ -21,6 +22,10 @@ >>>>>>> #define PMU_PMCR_IMP_SHIFT 24 >>>>>>> #define PMU_PMCR_IMP_MASK 0xff >>>>>>> >>>>>>> +#define PMU_CYCLE_IDX 31 >>>>>>> + >>>>>>> +#define NR_SAMPLES 10 >>>>>>> + >>>>>>> #if defined(__arm__) >>>>>>> static inline uint32_t pmcr_read(void) >>>>>>> { >>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >>>>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >>>>>>> return ret; >>>>>>> } >>>>>>> + >>>>>>> +static inline void pmcr_write(uint32_t value) >>>>>>> +{ >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); >>>>>>> +} >>>>>>> + >>>>>>> +static inline void pmselr_write(uint32_t value) >>>>>>> +{ >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >>>>>>> +} >>>>>>> + >>>>>>> +static inline void pmxevtyper_write(uint32_t value) >>>>>>> +{ >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >>>>>>> +} >>>>>>> + >>>>>>> +/* >>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 >>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is >>>>>>> + * expected (with differences that can easily fit in 32 bits). So just return >>>>>>> + * the lower 32 bits of the cycle count in AArch32. >>>>>> >>>>>> Like I said in the last review, I'd rather we not do this. 
We should >>>>>> return the full value and then the test case should confirm the upper >>>>>> 32 bits are zero. >>>>> >>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit >>>>> register. We can force it to a more coarse-grained cycle counter with >>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. >>> >>> AArch32 System Register Descriptions >>> Performance Monitors registers >>> PMCCNTR, Performance Monitors Cycle Count Register >>> >>> To access the PMCCNTR when accessing as a 32-bit register: >>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt >>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged >>> >>> To access the PMCCNTR when accessing as a 64-bit register: >>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 >>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] >>> >> >> Thanks. I did some research based on your info and came back with the >> following proposals (Cov, correct me if I am wrong): >> >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I >> think this 64-bit cycle register is only available when running under >> aarch32 compatibility mode on ARMv8 because it is not specified in A15 >> TRM. That interpretation sounds really strange to me. My recollection is that the cycle counter was available as a 64 bit register in ARMv7 as well. I would expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference Manual is the complete and authoritative source. >> To further verify it, I tested 32-bit pmu code on QEMU with TCG >> mode. The result is: accessing 64-bit PMCCNTR using the following >> assembly failed on A15: >> >> volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); >> or >> volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); The PMU implementation on QEMU TCG mode is infantile. 
(I was trying to write these tests to help guide fixes and enhancements in a test-driven-development manner.) I would not trust QEMU TCG to behave properly here. If you want to execute those instructions, is there anything preventing you from doing it on hardware, or at least the Foundation Model?

>> Given this difference, I think there are two solutions for 64-bit
>> AArch32 pmccntr_read, as requested by Drew:
>>
>> 1) The PMU unit testing code tells if it is running under ARMv7 or under
>> AArch32-compability mode. When it is running ARMv7, such as A15, let us
>> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise
>> use "MRRC p15,0,<Rt>,<Rt2>,c9".
>>
>> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC
>> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will
>> be the same as the original code.
>
> 3) For the basic test do (2), but add an additional test for AArch32
> mode that also does the MRRC. That way on AArch32 we test both access
> types.

The upper bits being non-zero is an insane corner case. I'd really prefer to first build some momentum with checks for issues that are 1) likely to occur and 2) not too difficult to check, like whether PMCR is writeable (especially relevant to KVM mode, where it is not by default).

Thanks,
Cov

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-16 16:08 ` Christopher Covington @ 2016-11-16 16:25 ` Andrew Jones 2016-11-16 16:41 ` Christopher Covington 0 siblings, 1 reply; 18+ messages in thread From: Andrew Jones @ 2016-11-16 16:25 UTC (permalink / raw) To: Christopher Covington Cc: Wei Huang, alindsay, kvm, Peter Maydell, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote: > On 11/16/2016 08:01 AM, Andrew Jones wrote: > > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: > >> > >> > >> On 11/14/2016 09:12 AM, Christopher Covington wrote: > >>> Hi Drew, Wei, > >>> > >>> On 11/14/2016 05:05 AM, Andrew Jones wrote: > >>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > >>>>> > >>>>> > >>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote: > >>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > >>>>>>> From: Christopher Covington <cov@codeaurora.org> > >>>>>>> > >>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > >>>>>>> even for the smallest delta of two subsequent reads. 
> >>>>>>> > >>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> > >>>>>>> Signed-off-by: Wei Huang <wei@redhat.com> > >>>>>>> --- > >>>>>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>>>> 1 file changed, 98 insertions(+) > >>>>>>> > >>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c > >>>>>>> index 0b29088..d5e3ac3 100644 > >>>>>>> --- a/arm/pmu.c > >>>>>>> +++ b/arm/pmu.c > >>>>>>> @@ -14,6 +14,7 @@ > >>>>>>> */ > >>>>>>> #include "libcflat.h" > >>>>>>> > >>>>>>> +#define PMU_PMCR_E (1 << 0) > >>>>>>> #define PMU_PMCR_N_SHIFT 11 > >>>>>>> #define PMU_PMCR_N_MASK 0x1f > >>>>>>> #define PMU_PMCR_ID_SHIFT 16 > >>>>>>> @@ -21,6 +22,10 @@ > >>>>>>> #define PMU_PMCR_IMP_SHIFT 24 > >>>>>>> #define PMU_PMCR_IMP_MASK 0xff > >>>>>>> > >>>>>>> +#define PMU_CYCLE_IDX 31 > >>>>>>> + > >>>>>>> +#define NR_SAMPLES 10 > >>>>>>> + > >>>>>>> #if defined(__arm__) > >>>>>>> static inline uint32_t pmcr_read(void) > >>>>>>> { > >>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > >>>>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > >>>>>>> return ret; > >>>>>>> } > >>>>>>> + > >>>>>>> +static inline void pmcr_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +static inline void pmselr_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +static inline void pmxevtyper_write(uint32_t value) > >>>>>>> +{ > >>>>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* > >>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 > >>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is > >>>>>>> + * expected (with differences that can easily fit in 32 bits). 
So just return > >>>>>>> + * the lower 32 bits of the cycle count in AArch32. > >>>>>> > >>>>>> Like I said in the last review, I'd rather we not do this. We should > >>>>>> return the full value and then the test case should confirm the upper > >>>>>> 32 bits are zero. > >>>>> > >>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > >>>>> register. We can force it to a more coarse-grained cycle counter with > >>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > >>> > >>> AArch32 System Register Descriptions > >>> Performance Monitors registers > >>> PMCCNTR, Performance Monitors Cycle Count Register > >>> > >>> To access the PMCCNTR when accessing as a 32-bit register: > >>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > >>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged > >>> > >>> To access the PMCCNTR when accessing as a 64-bit register: > >>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 > >>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] > >>> > >> > >> Thanks. I did some research based on your info and came back with the > >> following proposals (Cov, correct me if I am wrong): > >> > >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I > >> think this 64-bit cycle register is only available when running under > >> aarch32 compatibility mode on ARMv8 because it is not specified in A15 > >> TRM. > > That interpretation sounds really strange to me. My recollection is that the > cycle counter was available as a 64 bit register in ARMv7 as well. I would > expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference > Manual is the complete and authoritative source. Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit. Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64. 
> > >> To further verify it, I tested 32-bit pmu code on QEMU with TCG > >> mode. The result is: accessing 64-bit PMCCNTR using the following > >> assembly failed on A15: > >> > >> volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); > >> or > >> volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); > > The PMU implementation on QEMU TCG mode is infantile. (I was trying to > write these tests to help guide fixes and enhancements in a > test-driven-development manner.) I would not trust QEMU TCG to behave > properly here. If you want to execute those instructions, is there anything > preventing you from doing it on hardware, or at least the Foundation Model? > > >> Given this difference, I think there are two solutions for 64-bit > >> AArch32 pmccntr_read, as requested by Drew: > >> > >> 1) The PMU unit testing code tells if it is running under ARMv7 or under > >> AArch32-compability mode. When it is running ARMv7, such as A15, let us > >> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise > >> use "MRRC p15,0,<Rt>,<Rt2>,c9". > >> > >> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC > >> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will > >> be the same as the original code. > > > > 3) For the basic test do (2), but add an additional test for AArch32 > > mode that also does the MRRC. That way on AArch32 we test both access > > types. > > The upper bits being non-zero is an insane corner case. The bits will most likely be zero, but how do you know without checking? Also, just invoking an mrrc vs. mrc is a big difference when testing emulation. We shouldn't skip it just because we expect it to give us a boring result. > > I'd really prefer to first build some momentum with checks for issues that > are A) likely to occur and 2) not too difficult to check, like whether PMCR > is writeable (especially relevant to KVM mode where it's not by default). 
Looks like we're on the right track for this starter series then. Thanks, drew > > Thanks, > Cov > > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code > Aurora Forum, a Linux Foundation Collaborative Project. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-16 16:25 ` Andrew Jones @ 2016-11-16 16:41 ` Christopher Covington 0 siblings, 0 replies; 18+ messages in thread From: Christopher Covington @ 2016-11-16 16:41 UTC (permalink / raw) To: Andrew Jones Cc: Wei Huang, alindsay, kvm, Peter Maydell, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On 11/16/2016 11:25 AM, Andrew Jones wrote: > On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote: >> On 11/16/2016 08:01 AM, Andrew Jones wrote: >>> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: >>>> >>>> >>>> On 11/14/2016 09:12 AM, Christopher Covington wrote: >>>>> Hi Drew, Wei, >>>>> >>>>> On 11/14/2016 05:05 AM, Andrew Jones wrote: >>>>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: >>>>>>> >>>>>>> >>>>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote: >>>>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >>>>>>>>> From: Christopher Covington <cov@codeaurora.org> >>>>>>>>> >>>>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >>>>>>>>> even for the smallest delta of two subsequent reads. 
>>>>>>>>> >>>>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org> >>>>>>>>> Signed-off-by: Wei Huang <wei@redhat.com> >>>>>>>>> --- >>>>>>>>> arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>>>>> 1 file changed, 98 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c >>>>>>>>> index 0b29088..d5e3ac3 100644 >>>>>>>>> --- a/arm/pmu.c >>>>>>>>> +++ b/arm/pmu.c >>>>>>>>> @@ -14,6 +14,7 @@ >>>>>>>>> */ >>>>>>>>> #include "libcflat.h" >>>>>>>>> >>>>>>>>> +#define PMU_PMCR_E (1 << 0) >>>>>>>>> #define PMU_PMCR_N_SHIFT 11 >>>>>>>>> #define PMU_PMCR_N_MASK 0x1f >>>>>>>>> #define PMU_PMCR_ID_SHIFT 16 >>>>>>>>> @@ -21,6 +22,10 @@ >>>>>>>>> #define PMU_PMCR_IMP_SHIFT 24 >>>>>>>>> #define PMU_PMCR_IMP_MASK 0xff >>>>>>>>> >>>>>>>>> +#define PMU_CYCLE_IDX 31 >>>>>>>>> + >>>>>>>>> +#define NR_SAMPLES 10 >>>>>>>>> + >>>>>>>>> #if defined(__arm__) >>>>>>>>> static inline uint32_t pmcr_read(void) >>>>>>>>> { >>>>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >>>>>>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >>>>>>>>> return ret; >>>>>>>>> } >>>>>>>>> + >>>>>>>>> +static inline void pmcr_write(uint32_t value) >>>>>>>>> +{ >>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static inline void pmselr_write(uint32_t value) >>>>>>>>> +{ >>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static inline void pmxevtyper_write(uint32_t value) >>>>>>>>> +{ >>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +/* >>>>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 >>>>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is >>>>>>>>> + * expected (with differences that can easily fit in 32 bits). 
So just return >>>>>>>>> + * the lower 32 bits of the cycle count in AArch32. >>>>>>>> >>>>>>>> Like I said in the last review, I'd rather we not do this. We should >>>>>>>> return the full value and then the test case should confirm the upper >>>>>>>> 32 bits are zero. >>>>>>> >>>>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit >>>>>>> register. We can force it to a more coarse-grained cycle counter with >>>>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. >>>>> >>>>> AArch32 System Register Descriptions >>>>> Performance Monitors registers >>>>> PMCCNTR, Performance Monitors Cycle Count Register >>>>> >>>>> To access the PMCCNTR when accessing as a 32-bit register: >>>>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt >>>>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged >>>>> >>>>> To access the PMCCNTR when accessing as a 64-bit register: >>>>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2 >>>>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32] >>>>> >>>> >>>> Thanks. I did some research based on your info and came back with the >>>> following proposals (Cov, correct me if I am wrong): >>>> >>>> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I >>>> think this 64-bit cycle register is only available when running under >>>> aarch32 compatibility mode on ARMv8 because it is not specified in A15 >>>> TRM. >> >> That interpretation sounds really strange to me. My recollection is that the >> cycle counter was available as a 64 bit register in ARMv7 as well. I would >> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference >> Manual is the complete and authoritative source. > > Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit. > Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64. 
Just looked it up as well in the good old ARM DDI 0406C.c and you're absolutely right. Sorry for the bad recollection. Cov -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang 2016-11-11 7:43 ` Andrew Jones @ 2016-11-16 15:40 ` Andrew Jones 1 sibling, 0 replies; 18+ messages in thread From: Andrew Jones @ 2016-11-16 15:40 UTC (permalink / raw) To: Wei Huang Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm Just crossed my mind that we're missing isb's. On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > From: Christopher Covington <cov@codeaurora.org> > > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > even for the smallest delta of two subsequent reads. > > Signed-off-by: Christopher Covington <cov@codeaurora.org> > Signed-off-by: Wei Huang <wei@redhat.com> > --- > arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 98 insertions(+) > > diff --git a/arm/pmu.c b/arm/pmu.c > index 0b29088..d5e3ac3 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -14,6 +14,7 @@ > */ > #include "libcflat.h" > > +#define PMU_PMCR_E (1 << 0) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK 0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -21,6 +22,10 @@ > #define PMU_PMCR_IMP_SHIFT 24 > #define PMU_PMCR_IMP_MASK 0xff > > +#define PMU_CYCLE_IDX 31 > + > +#define NR_SAMPLES 10 > + > #if defined(__arm__) > static inline uint32_t pmcr_read(void) > { > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > +} > + > +static inline void pmselr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); Probably want an isb here, users will call this and then immediately another PMU reg write, like is done below > +} > + > +static inline 
void pmxevtyper_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > +} > + > +/* > + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64 > + * bits doesn't seem worth the trouble when differential usage of the result is > + * expected (with differences that can easily fit in 32 bits). So just return > + * the lower 32 bits of the cycle count in AArch32. Also, while we're discussing confirming upper bits are as expected, I guess we should confirm no overflow too. We should clear the overflow bit PMOVSCLR_EL0.C before we use the counter, and then check it at some point to confirm it's as expected. I guess that could be separate test cases though. > + */ > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); > +} > + > +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ > +static inline void pmccfiltr_write(uint32_t value) > +{ > + pmselr_write(PMU_CYCLE_IDX); > + pmxevtyper_write(value); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("msr pmcr_el0, %0" : : "r" (value)); > +} > + > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); > +} > + > +static inline void pmccfiltr_write(uint32_t value) > +{ > + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > +} > #endif > > /* > @@ -63,11 +132,40 @@ static 
bool check_pmcr(void) > return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; > } > > +/* > + * Ensure that the cycle counter progresses between back-to-back reads. > + */ > +static bool check_cycles_increase(void) > +{ > + pmcr_write(pmcr_read() | PMU_PMCR_E); Need isb() here > + > + for (int i = 0; i < NR_SAMPLES; i++) { > + unsigned long a, b; > + > + a = pmccntr_read(); > + b = pmccntr_read(); > + > + if (a >= b) { > + printf("Read %ld then %ld.\n", a, b); > + return false; > + } > + } > + > + pmcr_write(pmcr_read() & ~PMU_PMCR_E); > + Need isb() here > + return true; > +} > + > int main(void) > { > report_prefix_push("pmu"); > > + /* init for PMU event access, right now only care about cycle count */ > + pmcntenset_write(1 << PMU_CYCLE_IDX); > + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ Need isb() here > + > report("Control register", check_pmcr()); > + report("Monotonically increasing cycle count", check_cycles_increase()); > > return report_summary(); > } > -- > 1.8.3.1 > > Thanks, drew ^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking 2016-11-08 18:17 [Qemu-devel] [kvm-unit-tests PATCH v8 0/3] ARM PMU tests Wei Huang 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang @ 2016-11-08 18:17 ` Wei Huang 2016-11-11 8:08 ` Andrew Jones 2016-11-16 15:45 ` Andrew Jones 2 siblings, 2 replies; 18+ messages in thread From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw) To: cov Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis, croberts, alindsay, drjones From: Christopher Covington <cov@codeaurora.org> Calculate the numbers of cycles per instruction (CPI) implied by ARM PMU cycle counter values. The code includes a strict checking facility intended for the -icount option in TCG mode in the configuration file. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Wei Huang <wei@redhat.com> --- arm/pmu.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++- arm/unittests.cfg | 14 ++++++++ 2 files changed, 114 insertions(+), 1 deletion(-) diff --git a/arm/pmu.c b/arm/pmu.c index d5e3ac3..09aff89 100644 --- a/arm/pmu.c +++ b/arm/pmu.c @@ -15,6 +15,7 @@ #include "libcflat.h" #define PMU_PMCR_E (1 << 0) +#define PMU_PMCR_C (1 << 2) #define PMU_PMCR_N_SHIFT 11 #define PMU_PMCR_N_MASK 0x1f #define PMU_PMCR_ID_SHIFT 16 @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value) pmselr_write(PMU_CYCLE_IDX); pmxevtyper_write(value); } + +/* + * Extra instructions inserted by the compiler would be difficult to compensate + * for, so hand assemble everything between, and including, the PMCR accesses + * to start and stop counting. 
+ */ +static inline void loop(int i, uint32_t pmcr) +{ + asm volatile( + " mcr p15, 0, %[pmcr], c9, c12, 0\n" + "1: subs %[i], %[i], #1\n" + " bgt 1b\n" + " mcr p15, 0, %[z], c9, c12, 0\n" + : [i] "+r" (i) + : [pmcr] "r" (pmcr), [z] "r" (0) + : "cc"); +} #elif defined(__aarch64__) static inline uint32_t pmcr_read(void) { @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value) { asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); } + +/* + * Extra instructions inserted by the compiler would be difficult to compensate + * for, so hand assemble everything between, and including, the PMCR accesses + * to start and stop counting. + */ +static inline void loop(int i, uint32_t pmcr) +{ + asm volatile( + " msr pmcr_el0, %[pmcr]\n" + "1: subs %[i], %[i], #1\n" + " b.gt 1b\n" + " msr pmcr_el0, xzr\n" + : [i] "+r" (i) + : [pmcr] "r" (pmcr) + : "cc"); +} #endif /* @@ -156,8 +191,71 @@ static bool check_cycles_increase(void) return true; } -int main(void) +/* + * Execute a known number of guest instructions. Only odd instruction counts + * greater than or equal to 3 are supported by the in-line assembly code. The + * control register (PMCR_EL0) is initialized with the provided value (allowing + * for example for the cycle counter or event counters to be reset). At the end + * of the exact instruction loop, zero is written to PMCR_EL0 to disable + * counting, allowing the cycle counter or event counters to be read at the + * leisure of the calling code. + */ +static void measure_instrs(int num, uint32_t pmcr) +{ + int i = (num - 1) / 2; + + assert(num >= 3 && ((num - 1) % 2 == 0)); + loop(i, pmcr); +} + +/* + * Measure cycle counts for various known instruction counts. Ensure that the + * cycle counter progresses (similar to check_cycles_increase() but with more + * instructions and using reset and stop controls). If supplied a positive, + * nonzero CPI parameter, also strictly check that every measurement matches + * it. 
Strict CPI checking is used to test -icount mode. + */ +static bool check_cpi(int cpi) +{ + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E; + + if (cpi > 0) + printf("Checking for CPI=%d.\n", cpi); + printf("instrs : cycles0 cycles1 ...\n"); + + for (int i = 3; i < 300; i += 32) { + int avg, sum = 0; + + printf("%d :", i); + for (int j = 0; j < NR_SAMPLES; j++) { + int cycles; + + measure_instrs(i, pmcr); + cycles =pmccntr_read(); + printf(" %d", cycles); + + if (!cycles || (cpi > 0 && cycles != i * cpi)) { + printf("\n"); + return false; + } + + sum += cycles; + } + avg = sum / NR_SAMPLES; + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", + sum, avg, i / avg, avg / i); + } + + return true; +} + +int main(int argc, char *argv[]) { + int cpi = 0; + + if (argc >= 1) + cpi = atol(argv[0]); + report_prefix_push("pmu"); /* init for PMU event access, right now only care about cycle count */ @@ -166,6 +264,7 @@ int main(void) report("Control register", check_pmcr()); report("Monotonically increasing cycle count", check_cycles_increase()); + report("Cycle/instruction ratio", check_cpi(cpi)); return report_summary(); } diff --git a/arm/unittests.cfg b/arm/unittests.cfg index 7645180..2050dc8 100644 --- a/arm/unittests.cfg +++ b/arm/unittests.cfg @@ -59,3 +59,17 @@ groups = selftest [pmu] file = pmu.flat groups = pmu + +# Test PMU support (TCG) with -icount IPC=1 +[pmu-tcg-icount-1] +file = pmu.flat +extra_params = -icount 0 -append '1' +groups = pmu +accel = tcg + +# Test PMU support (TCG) with -icount IPC=256 +[pmu-tcg-icount-256] +file = pmu.flat +extra_params = -icount 8 -append '256' +groups = pmu +accel = tcg -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang @ 2016-11-11 8:08 ` Andrew Jones 2016-11-11 16:10 ` Wei Huang 2016-11-16 15:45 ` Andrew Jones 1 sibling, 1 reply; 18+ messages in thread From: Andrew Jones @ 2016-11-11 8:08 UTC (permalink / raw) To: Wei Huang Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote: > From: Christopher Covington <cov@codeaurora.org> > > Calculate the numbers of cycles per instruction (CPI) implied by ARM > PMU cycle counter values. The code includes a strict checking facility > intended for the -icount option in TCG mode in the configuration file. > > Signed-off-by: Christopher Covington <cov@codeaurora.org> > Signed-off-by: Wei Huang <wei@redhat.com> > --- > arm/pmu.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++- > arm/unittests.cfg | 14 ++++++++ > 2 files changed, 114 insertions(+), 1 deletion(-) > > diff --git a/arm/pmu.c b/arm/pmu.c > index d5e3ac3..09aff89 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -15,6 +15,7 @@ > #include "libcflat.h" > > #define PMU_PMCR_E (1 << 0) > +#define PMU_PMCR_C (1 << 2) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK 0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value) > pmselr_write(PMU_CYCLE_IDX); > pmxevtyper_write(value); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. > + */ > +static inline void loop(int i, uint32_t pmcr) We should probably pick a more descriptive name for this function, as we intend to add many more PMU tests to this file. While at it, let's change 'i' to 'n', as it's the number of times to loop. 
> +{ > + asm volatile( > + " mcr p15, 0, %[pmcr], c9, c12, 0\n" > + "1: subs %[i], %[i], #1\n" > + " bgt 1b\n" > + " mcr p15, 0, %[z], c9, c12, 0\n" > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr), [z] "r" (0) > + : "cc"); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value) > { > asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. > + */ > +static inline void loop(int i, uint32_t pmcr) > +{ > + asm volatile( > + " msr pmcr_el0, %[pmcr]\n" > + "1: subs %[i], %[i], #1\n" > + " b.gt 1b\n" > + " msr pmcr_el0, xzr\n" > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr) > + : "cc"); > +} > #endif > > /* > @@ -156,8 +191,71 @@ static bool check_cycles_increase(void) > return true; > } > > -int main(void) > +/* > + * Execute a known number of guest instructions. Only odd instruction counts > + * greater than or equal to 3 are supported by the in-line assembly code. The > + * control register (PMCR_EL0) is initialized with the provided value (allowing > + * for example for the cycle counter or event counters to be reset). At the end > + * of the exact instruction loop, zero is written to PMCR_EL0 to disable > + * counting, allowing the cycle counter or event counters to be read at the > + * leisure of the calling code. > + */ > +static void measure_instrs(int num, uint32_t pmcr) > +{ > + int i = (num - 1) / 2; > + > + assert(num >= 3 && ((num - 1) % 2 == 0)); > + loop(i, pmcr); > +} > + > +/* > + * Measure cycle counts for various known instruction counts. Ensure that the > + * cycle counter progresses (similar to check_cycles_increase() but with more > + * instructions and using reset and stop controls). 
If supplied a positive, > + * nonzero CPI parameter, also strictly check that every measurement matches > + * it. Strict CPI checking is used to test -icount mode. > + */ > +static bool check_cpi(int cpi) > +{ > + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E; > + > + if (cpi > 0) > + printf("Checking for CPI=%d.\n", cpi); > + printf("instrs : cycles0 cycles1 ...\n"); > + > + for (int i = 3; i < 300; i += 32) { > + int avg, sum = 0; > + > + printf("%d :", i); > + for (int j = 0; j < NR_SAMPLES; j++) { > + int cycles; > + > + measure_instrs(i, pmcr); > + cycles =pmccntr_read(); ^ missing space > + printf(" %d", cycles); > + > + if (!cycles || (cpi > 0 && cycles != i * cpi)) { > + printf("\n"); > + return false; > + } > + > + sum += cycles; > + } > + avg = sum / NR_SAMPLES; > + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", > + sum, avg, i / avg, avg / i); > + } > + > + return true; > +} > + > +int main(int argc, char *argv[]) > { > + int cpi = 0; > + > + if (argc >= 1) > + cpi = atol(argv[0]); ^ should be '1', otherwise cpi is getting set to the string "arm/pmu.flat" pointer. How did this work when you tested it? 
> + > report_prefix_push("pmu"); > > /* init for PMU event access, right now only care about cycle count */ > @@ -166,6 +264,7 @@ int main(void) > > report("Control register", check_pmcr()); > report("Monotonically increasing cycle count", check_cycles_increase()); > + report("Cycle/instruction ratio", check_cpi(cpi)); > > return report_summary(); > } > diff --git a/arm/unittests.cfg b/arm/unittests.cfg > index 7645180..2050dc8 100644 > --- a/arm/unittests.cfg > +++ b/arm/unittests.cfg > @@ -59,3 +59,17 @@ groups = selftest > [pmu] > file = pmu.flat > groups = pmu > + > +# Test PMU support (TCG) with -icount IPC=1 > +[pmu-tcg-icount-1] > +file = pmu.flat > +extra_params = -icount 0 -append '1' > +groups = pmu > +accel = tcg > + > +# Test PMU support (TCG) with -icount IPC=256 > +[pmu-tcg-icount-256] > +file = pmu.flat > +extra_params = -icount 8 -append '256' > +groups = pmu > +accel = tcg The unittests.cfg looks good. Thanks, drew ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking 2016-11-11 8:08 ` Andrew Jones @ 2016-11-11 16:10 ` Wei Huang 0 siblings, 0 replies; 18+ messages in thread From: Wei Huang @ 2016-11-11 16:10 UTC (permalink / raw) To: Andrew Jones Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On 11/11/2016 02:08 AM, Andrew Jones wrote: > On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote: >> From: Christopher Covington <cov@codeaurora.org> >> >> Calculate the numbers of cycles per instruction (CPI) implied by ARM >> PMU cycle counter values. The code includes a strict checking facility >> intended for the -icount option in TCG mode in the configuration file. >> >> Signed-off-by: Christopher Covington <cov@codeaurora.org> >> Signed-off-by: Wei Huang <wei@redhat.com> >> --- >> arm/pmu.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++- >> arm/unittests.cfg | 14 ++++++++ >> 2 files changed, 114 insertions(+), 1 deletion(-) >> >> diff --git a/arm/pmu.c b/arm/pmu.c >> index d5e3ac3..09aff89 100644 >> --- a/arm/pmu.c >> +++ b/arm/pmu.c >> @@ -15,6 +15,7 @@ >> #include "libcflat.h" >> >> #define PMU_PMCR_E (1 << 0) >> +#define PMU_PMCR_C (1 << 2) >> #define PMU_PMCR_N_SHIFT 11 >> #define PMU_PMCR_N_MASK 0x1f >> #define PMU_PMCR_ID_SHIFT 16 >> @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value) >> pmselr_write(PMU_CYCLE_IDX); >> pmxevtyper_write(value); >> } >> + >> +/* >> + * Extra instructions inserted by the compiler would be difficult to compensate >> + * for, so hand assemble everything between, and including, the PMCR accesses >> + * to start and stop counting. >> + */ >> +static inline void loop(int i, uint32_t pmcr) > > We should probably pick a more descriptive name for this function, as > we intend to add many more PMU tests to this file. While at it, let's > change 'i' to 'n', as it's the number of times to loop. I will rename it to fixed_num_loop(). 
When more tests are added to it, we can standardize the name, e.g. *_test(). > >> +{ >> + asm volatile( >> + " mcr p15, 0, %[pmcr], c9, c12, 0\n" >> + "1: subs %[i], %[i], #1\n" >> + " bgt 1b\n" >> + " mcr p15, 0, %[z], c9, c12, 0\n" >> + : [i] "+r" (i) >> + : [pmcr] "r" (pmcr), [z] "r" (0) >> + : "cc"); >> +} >> #elif defined(__aarch64__) >> static inline uint32_t pmcr_read(void) >> { >> @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value) >> { >> asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); >> } >> + >> +/* >> + * Extra instructions inserted by the compiler would be difficult to compensate >> + * for, so hand assemble everything between, and including, the PMCR accesses >> + * to start and stop counting. >> + */ >> +static inline void loop(int i, uint32_t pmcr) >> +{ >> + asm volatile( >> + " msr pmcr_el0, %[pmcr]\n" >> + "1: subs %[i], %[i], #1\n" >> + " b.gt 1b\n" >> + " msr pmcr_el0, xzr\n" >> + : [i] "+r" (i) >> + : [pmcr] "r" (pmcr) >> + : "cc"); >> +} >> #endif >> >> /* >> @@ -156,8 +191,71 @@ static bool check_cycles_increase(void) >> return true; >> } >> >> -int main(void) >> +/* >> + * Execute a known number of guest instructions. Only odd instruction counts >> + * greater than or equal to 3 are supported by the in-line assembly code. The >> + * control register (PMCR_EL0) is initialized with the provided value (allowing >> + * for example for the cycle counter or event counters to be reset). At the end >> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable >> + * counting, allowing the cycle counter or event counters to be read at the >> + * leisure of the calling code. >> + */ >> +static void measure_instrs(int num, uint32_t pmcr) >> +{ >> + int i = (num - 1) / 2; >> + >> + assert(num >= 3 && ((num - 1) % 2 == 0)); >> + loop(i, pmcr); >> +} >> + >> +/* >> + * Measure cycle counts for various known instruction counts. 
Ensure that the >> + * cycle counter progresses (similar to check_cycles_increase() but with more >> + * instructions and using reset and stop controls). If supplied a positive, >> + * nonzero CPI parameter, also strictly check that every measurement matches >> + * it. Strict CPI checking is used to test -icount mode. >> + */ >> +static bool check_cpi(int cpi) >> +{ >> + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E; >> + >> + if (cpi > 0) >> + printf("Checking for CPI=%d.\n", cpi); >> + printf("instrs : cycles0 cycles1 ...\n"); >> + >> + for (int i = 3; i < 300; i += 32) { >> + int avg, sum = 0; >> + >> + printf("%d :", i); >> + for (int j = 0; j < NR_SAMPLES; j++) { >> + int cycles; >> + >> + measure_instrs(i, pmcr); >> + cycles =pmccntr_read(); > ^ missing space > >> + printf(" %d", cycles); >> + >> + if (!cycles || (cpi > 0 && cycles != i * cpi)) { >> + printf("\n"); >> + return false; >> + } >> + >> + sum += cycles; >> + } >> + avg = sum / NR_SAMPLES; >> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", >> + sum, avg, i / avg, avg / i); >> + } >> + >> + return true; >> +} >> + >> +int main(int argc, char *argv[]) >> { >> + int cpi = 0; >> + >> + if (argc >= 1) >> + cpi = atol(argv[0]); > ^ should be '1', otherwise cpi is > getting set to the string "arm/pmu.flat" pointer. How did this > work when you tested it? This is a bug and the value is always 0. With that said, it didn't change the testing result: other than in printf(), cpi is only used in "(!cycles || (cpi > 0 && cycles != i * cpi))", which is always evaluated as FALSE. So check_cpi() won't return early incorrectly. I will fix it. 
> >> + >> report_prefix_push("pmu"); >> >> /* init for PMU event access, right now only care about cycle count */ >> @@ -166,6 +264,7 @@ int main(void) >> >> report("Control register", check_pmcr()); >> report("Monotonically increasing cycle count", check_cycles_increase()); >> + report("Cycle/instruction ratio", check_cpi(cpi)); >> >> return report_summary(); >> } >> diff --git a/arm/unittests.cfg b/arm/unittests.cfg >> index 7645180..2050dc8 100644 >> --- a/arm/unittests.cfg >> +++ b/arm/unittests.cfg >> @@ -59,3 +59,17 @@ groups = selftest >> [pmu] >> file = pmu.flat >> groups = pmu >> + >> +# Test PMU support (TCG) with -icount IPC=1 >> +[pmu-tcg-icount-1] >> +file = pmu.flat >> +extra_params = -icount 0 -append '1' >> +groups = pmu >> +accel = tcg >> + >> +# Test PMU support (TCG) with -icount IPC=256 >> +[pmu-tcg-icount-256] >> +file = pmu.flat >> +extra_params = -icount 8 -append '256' >> +groups = pmu >> +accel = tcg > > The unittests.cfg looks good. > > Thanks, > drew > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking 2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang 2016-11-11 8:08 ` Andrew Jones @ 2016-11-16 15:45 ` Andrew Jones 1 sibling, 0 replies; 18+ messages in thread From: Andrew Jones @ 2016-11-16 15:45 UTC (permalink / raw) To: Wei Huang Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis, shannon.zhao, kvmarm On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote: > From: Christopher Covington <cov@codeaurora.org> > > Calculate the numbers of cycles per instruction (CPI) implied by ARM > PMU cycle counter values. The code includes a strict checking facility > intended for the -icount option in TCG mode in the configuration file. > > Signed-off-by: Christopher Covington <cov@codeaurora.org> > Signed-off-by: Wei Huang <wei@redhat.com> > --- > arm/pmu.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++- > arm/unittests.cfg | 14 ++++++++ > 2 files changed, 114 insertions(+), 1 deletion(-) > > diff --git a/arm/pmu.c b/arm/pmu.c > index d5e3ac3..09aff89 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -15,6 +15,7 @@ > #include "libcflat.h" > > #define PMU_PMCR_E (1 << 0) > +#define PMU_PMCR_C (1 << 2) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK 0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value) > pmselr_write(PMU_CYCLE_IDX); > pmxevtyper_write(value); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. 
> + */ > +static inline void loop(int i, uint32_t pmcr) > +{ > + asm volatile( > + " mcr p15, 0, %[pmcr], c9, c12, 0\n" isb > + "1: subs %[i], %[i], #1\n" > + " bgt 1b\n" > + " mcr p15, 0, %[z], c9, c12, 0\n" isb > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr), [z] "r" (0) > + : "cc"); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value) > { > asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. > + */ > +static inline void loop(int i, uint32_t pmcr) > +{ > + asm volatile( > + " msr pmcr_el0, %[pmcr]\n" isb > + "1: subs %[i], %[i], #1\n" > + " b.gt 1b\n" > + " msr pmcr_el0, xzr\n" isb > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr) > + : "cc"); > +} > #endif > > /* > @@ -156,8 +191,71 @@ static bool check_cycles_increase(void) > return true; > } > > -int main(void) > +/* > + * Execute a known number of guest instructions. Only odd instruction counts > + * greater than or equal to 3 are supported by the in-line assembly code. The > + * control register (PMCR_EL0) is initialized with the provided value (allowing > + * for example for the cycle counter or event counters to be reset). At the end > + * of the exact instruction loop, zero is written to PMCR_EL0 to disable > + * counting, allowing the cycle counter or event counters to be read at the > + * leisure of the calling code. > + */ > +static void measure_instrs(int num, uint32_t pmcr) > +{ > + int i = (num - 1) / 2; > + > + assert(num >= 3 && ((num - 1) % 2 == 0)); > + loop(i, pmcr); > +} > + > +/* > + * Measure cycle counts for various known instruction counts. Ensure that the > + * cycle counter progresses (similar to check_cycles_increase() but with more > + * instructions and using reset and stop controls). 
If supplied a positive, > + * nonzero CPI parameter, also strictly check that every measurement matches > + * it. Strict CPI checking is used to test -icount mode. > + */ > +static bool check_cpi(int cpi) > +{ > + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E; > + > + if (cpi > 0) > + printf("Checking for CPI=%d.\n", cpi); > + printf("instrs : cycles0 cycles1 ...\n"); > + > + for (int i = 3; i < 300; i += 32) { > + int avg, sum = 0; > + > + printf("%d :", i); > + for (int j = 0; j < NR_SAMPLES; j++) { > + int cycles; > + > + measure_instrs(i, pmcr); > + cycles =pmccntr_read(); > + printf(" %d", cycles); > + > + if (!cycles || (cpi > 0 && cycles != i * cpi)) { > + printf("\n"); > + return false; > + } > + > + sum += cycles; > + } > + avg = sum / NR_SAMPLES; > + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", > + sum, avg, i / avg, avg / i); > + } > + > + return true; > +} > + > +int main(int argc, char *argv[]) > { > + int cpi = 0; > + > + if (argc >= 1) > + cpi = atol(argv[0]); > + > report_prefix_push("pmu"); > > /* init for PMU event access, right now only care about cycle count */ > @@ -166,6 +264,7 @@ int main(void) > > report("Control register", check_pmcr()); > report("Monotonically increasing cycle count", check_cycles_increase()); > + report("Cycle/instruction ratio", check_cpi(cpi)); > > return report_summary(); > } > diff --git a/arm/unittests.cfg b/arm/unittests.cfg > index 7645180..2050dc8 100644 > --- a/arm/unittests.cfg > +++ b/arm/unittests.cfg > @@ -59,3 +59,17 @@ groups = selftest > [pmu] > file = pmu.flat > groups = pmu > + > +# Test PMU support (TCG) with -icount IPC=1 > +[pmu-tcg-icount-1] > +file = pmu.flat > +extra_params = -icount 0 -append '1' > +groups = pmu > +accel = tcg > + > +# Test PMU support (TCG) with -icount IPC=256 > +[pmu-tcg-icount-256] > +file = pmu.flat > +extra_params = -icount 8 -append '256' > +groups = pmu > +accel = tcg > -- > 1.8.3.1 > > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads: [~2016-11-16 16:41 UTC | newest]

Thread overview: 18+ messages
2016-11-08 18:17 [Qemu-devel] [kvm-unit-tests PATCH v8 0/3] ARM PMU tests Wei Huang
2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang
2016-11-11  7:37   ` Andrew Jones
2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang
2016-11-11  7:43   ` Andrew Jones
2016-11-11 19:55     ` Wei Huang
2016-11-14 10:05       ` Andrew Jones
2016-11-14 15:12         ` Christopher Covington
2016-11-15 22:50           ` Wei Huang
2016-11-16 13:01             ` Andrew Jones
2016-11-16 16:08               ` Christopher Covington
2016-11-16 16:25                 ` Andrew Jones
2016-11-16 16:41                   ` Christopher Covington
2016-11-16 15:40             ` Andrew Jones
2016-11-08 18:17 ` [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang
2016-11-11  8:08   ` Andrew Jones
2016-11-11 16:10     ` Wei Huang
2016-11-16 15:45   ` Andrew Jones